WorldWideScience

Sample records for multiple processor computer

  1. Introduction to programming multiple-processor computers

    International Nuclear Information System (INIS)

    Hicks, H.R.; Lynch, V.E.

    1985-04-01

    FORTRAN applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The latter allows a single job to use more than one processor simultaneously, with a consequent reduction in wall-clock time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concepts of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreproducible results, which makes debugging more difficult
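
    The hazard described in the last sentence is easy to reproduce on any shared-memory machine. The sketch below is a loose modern analogue, not the record's actual environment: the EVENTS and LOCKS named above are Cray multitasking-library primitives, and std::mutex stands in for a LOCK here. Two tasks update shared data with and without synchronization; the unsynchronized version gives a different answer on different runs, i.e., irreproducible results.

        // Illustrative sketch only: std::mutex plays the role of a LOCK.
        #include <iostream>
        #include <mutex>
        #include <thread>

        long counter = 0;        // shared data
        std::mutex lock;         // the "LOCK" guarding it

        void add_unsafe(int n) {                 // data race: no synchronization
            for (int i = 0; i < n; ++i) ++counter;
        }
        void add_safe(int n) {                   // strong synchronization
            for (int i = 0; i < n; ++i) {
                std::lock_guard<std::mutex> g(lock);
                ++counter;
            }
        }

        int main() {
            // Unsynchronized: the final value varies from run to run.
            std::thread a(add_unsafe, 1000000), b(add_unsafe, 1000000);
            a.join(); b.join();
            std::cout << "unsafe: " << counter << '\n';   // rarely 2000000

            counter = 0;
            std::thread c(add_safe, 1000000), d(add_safe, 1000000);
            c.join(); d.join();
            std::cout << "safe:   " << counter << '\n';   // always 2000000
        }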

  2. Multiple Embedded Processors for Fault-Tolerant Computing

    Science.gov (United States)

    Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

    2005-01-01

    A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
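
    A software analogue can illustrate the two-cores-plus-comparator principle; the sketch below is purely illustrative and all names are invented. The same computation runs on two redundant "cores" and a comparator flags any disagreement. As with the prototype, this detects an upset but cannot say which copy is wrong, which is why the planned four-processor, two-comparator version is needed to correct errors rather than merely detect them.

        #include <cstdint>
        #include <functional>
        #include <iostream>

        struct DualLockstep {
            std::function<uint32_t(uint32_t)> coreA, coreB;   // redundant processors
            bool compute(uint32_t input, uint32_t& out) {
                uint32_t a = coreA(input);
                uint32_t b = coreB(input);
                out = a;
                return a == b;   // comparator: false => upset detected
            }
        };

        int main() {
            auto work = [](uint32_t x) { return x * x + 1; };
            DualLockstep unit{work, work};
            uint32_t result;
            if (unit.compute(7, result))
                std::cout << "outputs agree: " << result << '\n';
            else
                std::cout << "mismatch: retry or raise fault\n";
        }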

  3. An introduction to programming multiple-processor computers

    International Nuclear Information System (INIS)

    Hicks, H.R.; Lynch, V.E.

    1986-01-01

    Fortran applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The latter allows a single job to use more than one processor simultaneously, with a consequent reduction in elapsed time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concepts of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreproducible results, which makes debugging more difficult

  4. Reducing the computational requirements for simulating tunnel fires by combining multiscale modelling and multiple processor calculation

    DEFF Research Database (Denmark)

    Vermesi, Izabella; Rein, Guillermo; Colella, Francesco

    2017-01-01

    Multiscale modelling of tunnel fires that uses a coupled 3D (fire area) and 1D (the rest of the tunnel) model is seen as the solution to the numerical problem of the large domains associated with long tunnels. The present study demonstrates the feasibility of the implementation of this method in FDS version 6.0, a widely used fire-specific, open source CFD software. Furthermore, it compares the reduction in simulation time given by multiscale modelling with the one given by the use of multiple processor calculation. This was done using a 1200 m long tunnel with a rectangular cross-section as a demonstration case. The multiscale implementation consisted of placing a 30 MW fire in the centre of a 400 m long 3D domain, along with two 400 m long 1D ducts on each side of it, that were again bounded by two nodes each. A fixed volume flow was defined in the upstream duct and the two models were coupled ...

  5. HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

    Science.gov (United States)

    van Dyk, Danny; Geveler, Markus; Mallach, Sven; Ribbrock, Dirk; Göddeke, Dominik; Gutwenger, Carsten

    2009-12-01

    We present HONEI, an open-source collection of libraries offering a hardware-oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI's SSE backend, and an additional 3-4 and 4-16 times faster execution on the Cell and a GPU, respectively. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.
    Program summary
    Program title: HONEI
    Catalogue identifier: AEDW_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GPLv2
    No. of lines in distributed program, including test data, etc.: 216 180
    No. of bytes in distributed program, including test data, etc.: 1 270 140
    Distribution format: tar.gz
    Programming language: C++
    Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3
    Operating system: Linux
    RAM: at least 500 MB free
    Classification: 4.8, 4.3, 6.1
    External routines: SSE: none; [1] for GPU, [2] for Cell backend
    Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the ...
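
    The backend idea can be pictured with a tag-dispatch pattern; the sketch below is a hypothetical miniature, not HONEI's real API. An operation is specialised once per backend tag, so application code names the backend in one place and otherwise stays hardware-agnostic.

        #include <cstddef>
        #include <vector>

        struct tags { struct CPU {}; struct SSE {}; };   // backend tags (illustrative)

        template <typename Backend> struct Sum;

        template <> struct Sum<tags::CPU> {              // portable reference backend
            static void value(std::vector<float>& r, const std::vector<float>& a,
                              const std::vector<float>& b) {
                for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
            }
        };
        // A Sum<tags::SSE> specialisation would hold the hand-optimised kernel;
        // applications call Sum<ChosenBackend>::value(...) unchanged.

        int main() {
            std::vector<float> a(8, 1.0f), b(8, 2.0f), r(8);
            Sum<tags::CPU>::value(r, a, b);
        }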

  6. Experiences with the ACPMAPS (Advanced Computer Program Multiple Array Processor System) 50 GFLOP system

    International Nuclear Information System (INIS)

    Fischler, M.

    1992-10-01

    The Fermilab Computer R&D and Theory departments have for several years collaborated on a multi-GFLOP (recently upgraded to 50 GFLOP) system for lattice gauge calculations. The primary emphasis is on flexibility and ease of algorithm development. This system (ACPMAPS) has been in use for some time, allowing theorists to produce QCD results with relevance for the analysis of experimental data. We present general observations about benefits of such a scientist-oriented system, and summarize some of the advances recently made. We also discuss what was discovered about features needed in a useful algorithm exploration platform. These lessons can be applied to the design and evaluation of future massively parallel systems (commercial or otherwise)

  7. Launching applications on compute and service processors running under different operating systems in scalable network of processor boards with routers

    Science.gov (United States)

    Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM

    2009-03-17

    A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure also permits easy physical scalability of the computing apparatus. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.

  8. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    These proceedings contain the articles presented at the named conference. These concern hardware and software for vector and parallel processors, numerical methods and algorithms for the computation on such processors, as well as applications of such methods to different fields of physics and related sciences. See hints under the relevant topics. (HSI)

  9. Computer Generated Inputs for NMIS Processor Verification

    International Nuclear Information System (INIS)

    J. A. Mullens; J. E. Breeding; J. A. McEvers; R. W. Wysor; L. G. Chiang; J. R. Lenarduzzi; J. T. Mihalczo; J. K. Mattingly

    2001-01-01

    Proper operation of the Nuclear Materials Identification System (NMIS) processor can be verified using computer-generated inputs [BIST (Built-In-Self-Test)] at the digital inputs. Preselected sequences of input pulses to all channels with known correlation functions are compared to the output of the processor. These types of verifications have been utilized in NMIS-type correlation processors at the Oak Ridge National Laboratory since 1984. The use of this test confirmed a malfunction in a NMIS processor at the All-Russian Scientific Research Institute of Experimental Physics (VNIIEF) in 1998. The NMIS processor boards were returned to the U.S. for repair and subsequently used in NMIS passive and active measurements with Pu at VNIIEF in 1999
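
    The verification idea can be sketched in a few lines: feed the chain a preselected sequence whose correlation function is known in advance, recompute the correlation through the processing path, and flag any deviation. The sequence, lag range, and tolerance below are invented for illustration.

        #include <cmath>
        #include <iostream>
        #include <vector>

        std::vector<double> autocorr(const std::vector<double>& x, int maxlag) {
            std::vector<double> r(maxlag + 1, 0.0);
            for (int lag = 0; lag <= maxlag; ++lag)
                for (std::size_t i = 0; i + lag < x.size(); ++i)
                    r[lag] += x[i] * x[i + lag];
            return r;
        }

        int main() {
            std::vector<double> test = {1, 0, 1, 1, 0, 1, 0, 0}; // preselected pulses
            std::vector<double> expected = autocorr(test, 3);    // known reference
            std::vector<double> measured = autocorr(test, 3);    // path under test

            for (int lag = 0; lag <= 3; ++lag)
                if (std::fabs(measured[lag] - expected[lag]) > 1e-12) {
                    std::cout << "BIST failure at lag " << lag << '\n';
                    return 1;
                }
            std::cout << "BIST passed\n";
        }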

  10. RISC Processors and High Performance Computing

    Science.gov (United States)

    Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.

  11. Aspects of computation on asynchronous parallel processors

    International Nuclear Information System (INIS)

    Wright, M.

    1989-01-01

    The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues

  12. Parallel computation for distributed parameter system-from vector processors to Adena computer

    Energy Technology Data Exchange (ETDEWEB)

    Nogi, T

    1983-04-01

    Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.

  13. Scientific Computing Kernels on the Cell Processor

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Samuel W.; Shalf, John; Oliker, Leonid; Kamil, Shoaib; Husbands, Parry; Yelick, Katherine

    2007-04-04

    The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the recently-released STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on a 3.2GHz Cell blade. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
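
    The flavour of such a performance model can be conveyed with a bound-and-bottleneck estimate (not the authors' actual model): a kernel's attainable rate is capped either by peak arithmetic throughput or by memory bandwidth times the kernel's arithmetic intensity. The peak, bandwidth, and per-nonzero cost figures below are illustrative assumptions.

        #include <algorithm>
        #include <iostream>

        int main() {
            double peak_gflops   = 14.6;   // illustrative double-precision peak
            double bandwidth_gbs = 25.6;   // illustrative memory bandwidth, GB/s

            // Sparse matrix-vector multiply: ~2 flops and ~12 bytes per nonzero.
            double flops_per_byte = 2.0 / 12.0;
            double attainable = std::min(peak_gflops, bandwidth_gbs * flops_per_byte);
            std::cout << "attainable GFLOP/s: " << attainable << '\n'; // memory-bound
        }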

  14. Recommending the heterogeneous cluster type multi-processor system computing

    International Nuclear Information System (INIS)

    Iijima, Nobukazu

    2010-01-01

    A real-time reactor simulator had been developed by reusing equipment of the Musashi reactor, and improving its performance became indispensable for research tools; the sampling rate was to be increased by introducing arithmetic units based on a multi-Digital Signal Processor (DSP) system (cluster). In order to realize heterogeneous cluster-type multi-processor system computing, the combination of two kinds of Control Processors (CPs), the Cluster Control Processor (CCP) and the System Control Processor (SCP), was proposed, with a Large System Control Processor (LSCP) for hierarchical clusters if needed. The fast computing performance of this system was well evaluated by simulation results for the simultaneous execution of plural jobs and also for pipeline processing between clusters, which showed that the system led to effective use of the existing system and enhancement of the cost performance. (T. Tanaka)

  15. Fast parallel computation of polynomials using few processors

    DEFF Research Database (Denmark)

    Valiant, Leslie; Skyum, Sven

    1981-01-01

    It is shown that any multivariate polynomial that can be computed sequentially in C steps and has degree d can be computed in parallel in O((log d)(log C + log d)) steps using only (Cd)^O(1) processors.
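
    The theorem itself restructures arbitrary arithmetic circuits, but the logarithmic-depth idea can be illustrated with something much simpler: summing a polynomial's terms in a balanced tree, where every pair at a given level could be combined by a separate processor in one parallel step, giving O(log d) levels for the final summation.

        #include <iostream>
        #include <utility>
        #include <vector>

        int main() {
            std::vector<double> a = {3, 0, 2, 5};   // p(x) = 3 + 2x^2 + 5x^3
            double x = 2.0;

            std::vector<double> terms(a.size());
            double p = 1.0;
            for (std::size_t i = 0; i < a.size(); ++i) { terms[i] = a[i] * p; p *= x; }

            int depth = 0;
            while (terms.size() > 1) {              // balanced pairwise reduction
                std::vector<double> next;
                for (std::size_t i = 0; i + 1 < terms.size(); i += 2)
                    next.push_back(terms[i] + terms[i + 1]); // pairs are independent
                if (terms.size() % 2) next.push_back(terms.back());
                terms = std::move(next);
                ++depth;
            }
            std::cout << "p(2) = " << terms[0]
                      << " after " << depth << " levels\n";  // p(2) = 51, 2 levels
        }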

  16. Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

    Science.gov (United States)

    Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

    2017-11-01

    The current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications and big data have created huge demand for data processing activities, and such throughput-intensive applications inherently contain data-level parallelism, which is better suited to SIMD-architecture-based GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are taken to compare the performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
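
    On the shared-memory side, such a comparison typically times an OpenMP parallel-for over a data-parallel kernel, as in the sketch below (compile with -fopenmp); a CUDA variant would launch the same per-element work as a kernel over a grid of threads. The problem size is an arbitrary example.

        #include <omp.h>
        #include <cstdio>
        #include <vector>

        int main() {
            const int n = 1 << 20;
            std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

            double t0 = omp_get_wtime();
            #pragma omp parallel for   // each element independent: data parallel
            for (int i = 0; i < n; ++i)
                c[i] = 2.0f * a[i] + b[i];
            double t1 = omp_get_wtime();

            std::printf("saxpy on %d elements: %.3f ms\n", n, (t1 - t0) * 1e3);
        }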

  17. MPC Related Computational Capabilities of ARMv7A Processors

    DEFF Research Database (Denmark)

    Frison, Gianluca; Jørgensen, John Bagterp

    2015-01-01

    In recent years, the mass market of mobile devices has pushed the demand for increasingly fast but cheap processors. ARM, the world leader in this sector, has developed the Cortex-A series of processors with focus on computationally intensive applications. If properly programmed, these processors are powerful enough to solve the complex optimization problems arising in MPC in real-time, while keeping the traditional low cost and low power consumption. This makes these processors ideal candidates for use in embedded MPC. In this paper, we investigate the floating-point capabilities of Cortex A7, A9 and A15 and show how to exploit the unique features of each processor to obtain the best performance, in the context of a novel implementation method for the linear-algebra routines used in MPC solvers. This method adapts high-performance computing techniques to the needs of embedded MPC. In particular ...

  18. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  19. Highway traffic simulation on multi-processor computers

    Energy Technology Data Exchange (ETDEWEB)

    Hanebutte, U.R.; Doss, E.; Tentner, A.M.

    1997-04-01

    A computer model has been developed to simulate highway traffic for various degrees of automation with a high level of fidelity in regard to driver control and vehicle characteristics. The model simulates vehicle maneuvering in a multi-lane highway traffic system and allows for the use of Intelligent Transportation System (ITS) technologies such as an Automated Intelligent Cruise Control (AICC). The structure of the computer model facilitates the use of parallel computers for the highway traffic simulation, since domain decomposition techniques can be applied in a straightforward fashion. In this model, the highway system (i.e. a network of road links) is divided into multiple regions; each region is controlled by a separate link manager residing on an individual processor. A graphical user interface augments the computer model by allowing for real-time interactive simulation control and interaction with each individual vehicle and road side infrastructure element on each link. Average speed and traffic volume data is collected at user-specified loop detector locations. Further, as a measure of safety, the so-called Time To Collision (TTC) parameter is recorded.
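
    The decomposition can be sketched as follows; all names and numbers are invented. Each region of road links is owned by a link manager (one per processor), and a vehicle reaching its region's boundary is handed off to the neighbouring manager.

        #include <cstdio>
        #include <vector>

        struct Vehicle { double position; };     // position along the whole highway

        struct LinkManager {
            double begin, end;                   // extent of this region [m]
            std::vector<Vehicle> vehicles;
        };

        int main() {
            const double highway_len = 9000.0;
            const int nregions = 3;              // one manager per processor
            std::vector<LinkManager> mgr(nregions);
            for (int r = 0; r < nregions; ++r)
                mgr[r] = {r * highway_len / nregions,
                          (r + 1) * highway_len / nregions, {}};

            mgr[0].vehicles = {{100.0}, {2990.0}};
            double dt = 1.0, speed = 30.0;       // uniform 30 m/s for brevity

            for (int r = 0; r < nregions; ++r) {
                auto& v = mgr[r].vehicles;
                for (std::size_t i = 0; i < v.size(); ) {
                    v[i].position += speed * dt;
                    if (v[i].position >= mgr[r].end && r + 1 < nregions) {
                        mgr[r + 1].vehicles.push_back(v[i]);  // boundary hand-off
                        v.erase(v.begin() + i);
                    } else ++i;
                }
            }
            std::printf("region 1 now holds %zu vehicle(s)\n",
                        mgr[1].vehicles.size());
        }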

  20. Fast Parallel Computation of Polynomials Using Few Processors

    DEFF Research Database (Denmark)

    Valiant, Leslie G.; Skyum, Sven; Berkowitz, S.

    1983-01-01

    It is shown that any multivariate polynomial of degree $d$ that can be computed sequentially in $C$ steps can be computed in parallel in $O((\log d)(\log C + \log d))$ steps using only $(Cd)^{O(1)}$ processors.

  1. Initial explorations of ARM processors for scientific computing

    International Nuclear Information System (INIS)

    Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; Muzaffar, Shahzad

    2014-01-01

    Power efficiency is becoming an ever more important metric for both high performance and high throughput computing. Over the course of the next decade it is expected that flops/watt will be a major driver for the evolution of computer architecture. Servers with large numbers of ARM processors, already ubiquitous in mobile computing, are a promising alternative to traditional x86-64 computing. We present the results of our initial investigations into the use of ARM processors for scientific computing applications. In particular we report the results from our work with a current generation ARMv7 development board to explore ARM-specific issues regarding the software development environment, operating system, performance benchmarks and issues for porting High Energy Physics software

  2. A design of a computer complex including vector processors

    International Nuclear Information System (INIS)

    Asai, Kiyoshi

    1982-12-01

    We, members of the Computing Center of the Japan Atomic Energy Research Institute, have been engaged for the past six years in research on the adaptability of vector processing to large-scale nuclear codes. The research has been done in collaboration with researchers and engineers of JAERI and a computer manufacturer. In this research, forty large-scale nuclear codes were investigated from the viewpoint of vectorization. Among them, twenty-six codes were actually vectorized and executed. As the result of the investigation, it is now estimated that about seventy percent of nuclear codes and seventy percent of JAERI's total CPU time are highly vectorizable. Based on the data obtained by the investigation, (1) the currently vectorizable CPU time, (2) the necessary number of vector processors, (3) the manpower necessary for vectorization of nuclear codes, (4) the computing speed, memory size, number of parallel I/O paths, and size and speed of the I/O buffer of a vector processor suitable for our applications, and (5) the necessary software and operational policy for use of vector processors are discussed; finally, (6) a computer complex including vector processors is presented in this report. (author)

  3. Graphics processor efficiency for realization of rapid tabular computations

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Us, S.A.; Shestakov, M.V.

    2016-01-01

    Capabilities of graphics processing units (GPU) and central processing units (CPU) have been investigated for realization of fast-calculation algorithms with the use of tabulated functions. The realization of tabulated functions is exemplified by the GPU/CPU architecture-based processors. Comparison is made between the operating efficiencies of GPU and CPU, employed for tabular calculations under different conditions of use. Recommendations are formulated for the use of graphics and central processors to speed up scientific and engineering computations through the use of tabulated functions

  4. Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

    Science.gov (United States)

    Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

    1994-01-01

    An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.

  5. Optimal neural computations require analog processors

    Energy Technology Data Exchange (ETDEWEB)

    Beiu, V.

    1998-12-31

    This paper discusses some of the limitations of hardware implementations of neural networks. The authors start by presenting neural structures and their biological inspirations, while mentioning the simplifications leading to artificial neural networks. Further, the focus will be on hardware imposed constraints. They will present recent results for three different alternatives of parallel implementations of neural networks: digital circuits, threshold gate circuits, and analog circuits. The area and the delay will be related to the neurons' fan-in and to the precision of their synaptic weights. The main conclusion is that hardware-efficient solutions require analog computations, and suggests the following two alternatives: (1) cope with the limitations imposed by silicon, by speeding up the computation of the elementary silicon neurons; (2) investigate solutions which would allow the use of the third dimension (e.g. using optical interconnections).

  6. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    Directory of Open Access Journals (Sweden)

    Kaufmann Michael

    2004-09-01

    Background: Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Results: Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions: By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
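
    Step (a) is embarrassingly parallel, which the following sketch makes concrete: the k(k-1)/2 pairwise alignments are enumerated once and farmed out to worker threads in a static cyclic split. align() is a placeholder for the real pairwise-alignment routine, and all sequences are invented.

        #include <cstdio>
        #include <mutex>
        #include <string>
        #include <thread>
        #include <vector>

        static int align(const std::string& s, const std::string& t) {
            return int(s.size() + t.size());     // placeholder "score"
        }

        int main() {
            std::vector<std::string> seqs = {"ACGT", "ACGGT", "TTGCA", "GATTACA"};
            std::vector<std::pair<int,int>> pairs;
            for (int i = 0; i < (int)seqs.size(); ++i)
                for (int j = i + 1; j < (int)seqs.size(); ++j)
                    pairs.push_back({i, j});     // all k(k-1)/2 independent tasks

            std::mutex io;
            auto worker = [&](int tid, int nthreads) {
                for (std::size_t k = tid; k < pairs.size(); k += nthreads) {
                    int score = align(seqs[pairs[k].first], seqs[pairs[k].second]);
                    std::lock_guard<std::mutex> g(io);
                    std::printf("pair (%d,%d) score %d\n",
                                pairs[k].first, pairs[k].second, score);
                }
            };
            int nthreads = 2;
            std::vector<std::thread> pool;
            for (int t = 0; t < nthreads; ++t) pool.emplace_back(worker, t, nthreads);
            for (auto& th : pool) th.join();
        }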

  7. Image matrix processor for fast multi-dimensional computations

    Science.gov (United States)

    Roberson, George P.; Skeate, Michael F.

    1996-01-01

    An apparatus for multi-dimensional computation which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination.

  8. Computing trends using graphic processor in high energy physics

    CERN Document Server

    Niculescu, Mihai

    2011-01-01

    One of the main challenges in High Energy Physics is to make fast analysis of the large amounts of experimental and simulated data. At LHC-CERN one p-p event is approximately 1 MB in size. The time taken to analyze the data and obtain fast results depends on high computational power. The main advantage of GPU (Graphic Processor Unit) programming over traditional CPU programming is that graphics cards bring a lot of computing power at a very low price. Today a huge number of applications (scientific, financial, etc.) have begun to be ported to or developed for the GPU, including Monte Carlo tools and data analysis tools for High Energy Physics. In this paper, we present the current status and trends in HEP computing using the GPU.

  9. Application of the Computer Capacity to the Analysis of Processors Evolution

    OpenAIRE

    Ryabko, Boris; Rakitskiy, Anton

    2017-01-01

    The notion of computer capacity was proposed in 2012, and this quantity has been estimated for computers of different kinds. In this paper we show that, when designing new processors, the manufacturers change the parameters that affect the computer capacity. This allows us to predict the values of parameters of future processors. As the main example we use Intel processors, due to the accessibility of detailed description of all their technical characteristics.

  10. Architecture and VHDL behavioural validation of a parallel processor dedicated to computer vision

    International Nuclear Information System (INIS)

    Collette, Thierry

    1992-01-01

    Speeding up image processing is mainly achieved using parallel computers: SIMD (single instruction stream, multiple data stream) processors have been developed, and have proven highly efficient for low-level image processing operations. Nevertheless, their performance drops for most intermediate- or high-level operations, mainly when random data reorganisations in processor memories are involved. The aim of this thesis was to extend SIMD computer capabilities to allow it to perform more efficiently at the intermediate level of image processing. The study of some representative algorithms of this class points out the limits of this computer. Nevertheless, these limits can be erased by architectural modifications. This led us to propose SYMPATIX, a new SIMD parallel computer. To validate its new concept, a behavioural model written in VHDL - Hardware Description Language - has been elaborated. With this model, the new computer's performance has been estimated by running image processing algorithm simulations. The VHDL modelling approach allows the top-down electronic design of the system, giving an easy coupling between system architectural modifications and their electronic cost. The obtained results show SYMPATIX to be an efficient computer for low- and intermediate-level image processing. It can be connected to a high-level computer, opening up the development of new computer vision applications. This thesis also presents a top-down design method, based on VHDL, intended for electronic system architects. (author) [fr

  11. The Potential of the Cell Processor for Scientific Computing

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Samuel; Shalf, John; Oliker, Leonid; Husbands, Parry; Kamil, Shoaib; Yelick, Katherine

    2005-10-14

    The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. We are the first to present quantitative Cell performance data on scientific kernels and show direct comparisons against leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1) architectures. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop both analytical models and simulators to predict kernel performance. Our work also explores the complexity of mapping several important scientific algorithms onto the Cell's unique architecture. Additionally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.

  12. Algorithms for computational fluid dynamics on parallel processors

    International Nuclear Information System (INIS)

    Van de Velde, E.F.

    1986-01-01

    A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries

  13. Programs for Testing Processor-in-Memory Computing Systems

    Science.gov (United States)

    Katz, Daniel S.

    2006-01-01

    The Multithreaded Microbenchmarks for Processor-In-Memory (PIM) Compilers, Simulators, and Hardware are computer programs arranged in a series for use in testing the performances of PIM computing systems, including compilers, simulators, and hardware. The programs at the beginning of the series test basic functionality; the programs at subsequent positions in the series test increasingly complex functionality. The programs are intended to be used while designing a PIM system, and can be used to verify that compilers, simulators, and hardware work correctly. The programs can also be used to enable designers of these system components to examine tradeoffs in implementation. Finally, these programs can be run on non-PIM hardware (either single-threaded or multithreaded) using the POSIX pthreads standard to verify that the benchmarks themselves operate correctly. [POSIX (Portable Operating System Interface for UNIX) is a set of standards that define how programs and operating systems interact with each other. pthreads is a library of pre-emptive thread routines that comply with one of the POSIX standards.]
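
    A minimal microbenchmark in the spirit described might look like the sketch below (not one of the actual programs): spawn POSIX threads, give each a private slot and a known amount of work, and verify the combined result. Link with -pthread. Checks of basic functionality like this come before anything more complex.

        #include <pthread.h>
        #include <cstdio>

        const int NTHREADS = 4, WORK = 100000;
        long partial[NTHREADS];

        void* worker(void* arg) {
            long id = (long)arg;
            long sum = 0;
            for (int i = 0; i < WORK; ++i) sum += 1;
            partial[id] = sum;               // private slot: no locking needed
            return nullptr;
        }

        int main() {
            pthread_t tid[NTHREADS];
            for (long t = 0; t < NTHREADS; ++t)
                pthread_create(&tid[t], nullptr, worker, (void*)t);
            long total = 0;
            for (int t = 0; t < NTHREADS; ++t) {
                pthread_join(tid[t], nullptr);
                total += partial[t];
            }
            std::printf("%s: total = %ld\n",
                        total == (long)NTHREADS * WORK ? "PASS" : "FAIL", total);
        }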

  14. Probabilistic programmable quantum processors with multiple copies of program states

    International Nuclear Information System (INIS)

    Brazier, Adam; Buzek, Vladimir; Knight, Peter L.

    2005-01-01

    We examine the execution of general U(1) transformations on programmable quantum processors. We show that, with only the minimal assumption of availability of copies of the 1-qubit program state, the apparent advantage of existing schemes proposed by G. Vidal et al. [Phys. Rev. Lett. 88, 047905 (2002)] and M. Hillery et al. [Phys. Rev. A 65, 022301 (2003)] to execute a general U(1) transformation with greater probability using complex program states appears not to hold

  15. Optical backplane interconnect switch for data processors and computers

    Science.gov (United States)

    Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

    1989-01-01

    An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.

  16. Study of an analog/logic processor for the design of an auto patch hybrid computer

    International Nuclear Information System (INIS)

    Koched, Hassen

    1976-01-01

    This paper presents the experimental study of an analog multiprocessor designed at SES/CEN-Saclay. An application of such a device as a basic component of an auto-patch hybrid computer is presented. First, a description of the processor and a presentation of the theoretical concepts which governed its design are given. Experiments on a hybrid computer are then presented. Finally, different systems of automatic patching are presented and conveniently modified for the use of such a processor. (author) [fr

  17. Input data requirements for special processors in the computation system containing the VENTURE neutronics code

    International Nuclear Information System (INIS)

    Vondy, D.R.; Fowler, T.B.; Cunningham, G.W.

    1979-07-01

    User input data requirements are presented for certain special processors in a nuclear reactor computation system. These processors generally read data in formatted form and generate binary interface data files. Some data processing is done to convert from the user oriented form to the interface file forms. The VENTURE diffusion theory neutronics code and other computation modules in this system use the interface data files which are generated

  18. Input data requirements for special processors in the computation system containing the VENTURE neutronics code

    International Nuclear Information System (INIS)

    Vondy, D.R.; Fowler, T.B.; Cunningham, G.W.

    1976-11-01

    This report presents user input data requirements for certain special processors in a nuclear reactor computation system. These processors generally read data in formatted form and generate binary interface data files. Some data processing is done to convert from the user-oriented form to the interface file forms. The VENTURE diffusion theory neutronics code and other computation modules in this system use the interface data files which are generated

  19. Computationally efficient implementation of sparse-tap FIR adaptive filters with tap-position control on Intel IA-32 processors

    OpenAIRE

    Hirano, Akihiro; Nakayama, Kenji

    2008-01-01

    This paper presents a computationally efficient implementation of sparse-tap FIR adaptive filters with tap-position control on Intel IA-32 processors with single-instruction multiple-data (SIMD) capability. In order to overcome the random-order memory access which prevents vectorization, block-based processing and a re-ordering buffer are introduced. A dynamic register allocation and the use of memory-to-register operations help the maximization of the loop-unrolling level. Up to 66 percent speedup ...
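
    A plain (non-SIMD) reference for the underlying filter clarifies where the random-order access comes from: with only a few active tap positions, each output gathers scattered history samples. The tap positions and weights below are invented for illustration.

        #include <cstdio>
        #include <vector>

        int main() {
            // Active taps at positions 0, 7, 19 (illustrative), with weights.
            std::vector<int>   pos = {0, 7, 19};
            std::vector<float> w   = {0.5f, -0.25f, 0.125f};

            std::vector<float> x(64, 1.0f);      // input samples
            std::vector<float> y(x.size(), 0.0f);

            for (std::size_t n = 0; n < x.size(); ++n)
                for (std::size_t k = 0; k < pos.size(); ++k)
                    if (n >= (std::size_t)pos[k])
                        y[n] += w[k] * x[n - pos[k]];  // scattered reads of x

            std::printf("y[20] = %f\n", y[20]);  // 0.5 - 0.25 + 0.125 = 0.375
        }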

  20. Computing prime factors with a Josephson phase qubit quantum processor

    Science.gov (United States)

    Lucero, Erik; Barends, R.; Chen, Y.; Kelly, J.; Mariantoni, M.; Megrant, A.; O'Malley, P.; Sank, D.; Vainsencher, A.; Wenner, J.; White, T.; Yin, Y.; Cleland, A. N.; Martinis, John M.

    2012-10-01

    A quantum processor can be used to exploit quantum mechanics to find the prime factors of composite numbers. Compiled versions of Shor's algorithm and Gauss sum factorizations have been demonstrated on ensemble quantum systems, photonic systems and trapped ions. Although proposed, these algorithms have yet to be shown using solid-state quantum bits. Using a number of recent qubit control and hardware advances, here we demonstrate a nine-quantum-element solid-state quantum processor and show three experiments to highlight its capabilities. We begin by characterizing the device with spectroscopy. Next, we produce coherent interactions between five qubits and verify bi- and tripartite entanglement through quantum state tomography. In the final experiment, we run a three-qubit compiled version of Shor's algorithm to factor the number 15, and successfully find the prime factors 48% of the time. Improvements in the superconducting qubit coherence times and more complex circuits should provide the resources necessary to factor larger composite numbers and run more intricate quantum algorithms.
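
    The classical post-processing behind the factoring experiment is compact enough to show directly: once the quantum order-finding step returns the period r of a modulo N, the factors follow from gcd(a^(r/2) +/- 1, N). For N = 15 with base a = 7 the period is r = 4, giving the factors 3 and 5.

        #include <cstdio>
        #include <numeric>   // std::gcd (C++17)

        int main() {
            long N = 15, a = 7, r = 4;   // powers of 7 mod 15: 7, 4, 13, 1, ...
            long half = 1;
            for (int i = 0; i < r / 2; ++i) half = (half * a) % N; // 7^2 mod 15 = 4
            std::printf("factors: %ld and %ld\n",
                        std::gcd(half - 1, N), std::gcd(half + 1, N)); // 3 and 5
        }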

  1. Fast algorithms for coordinate processors in Galois field for multiplicity t = 4, 5 and t > 5

    International Nuclear Information System (INIS)

    Nikityuk, N.M.

    1989-01-01

    Fast algorithms for solving the coordinate equations for special-purpose processors at multiplicity t = 4, 5 and t > 5 are described. A block diagram of a coordinate processor for t = 4 in the Galois field GF(2^m), solved by a table method, is presented. Economical algorithms for solving the coordinate equations by serial methods at t > 5 are described. The algorithms and devices proposed could be applied when creating fast processors in high energy physics spectrometers. 9 refs.; 3 figs
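
    The table method can be sketched for a small field. The sketch below uses GF(2^4) with the primitive polynomial x^4 + x + 1, a common textbook choice and not necessarily the paper's: log and antilog tables are built once, after which any product costs two lookups and an index addition.

        #include <cstdio>

        int main() {
            const int m = 4, q = 1 << m;         // field size 16
            int exp_t[2 * q], log_t[q];
            int x = 1;
            for (int i = 0; i < q - 1; ++i) {    // powers of the generator alpha
                exp_t[i] = x;
                log_t[x] = i;
                x <<= 1;
                if (x & q) x ^= (q | 0b0011);    // reduce by x^4 + x + 1
            }
            for (int i = q - 1; i < 2 * q; ++i) exp_t[i] = exp_t[i - (q - 1)];

            auto mul = [&](int a, int b) {       // table-based multiplication
                return (a == 0 || b == 0) ? 0 : exp_t[log_t[a] + log_t[b]];
            };
            std::printf("5 * 9 = %d in GF(16)\n", mul(5, 9));   // prints 11
        }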

  2. Vector and parallel processors in computational science. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Duff, I S; Reid, J K

    1985-01-01

    This volume contains papers from most of the invited talks and from several of the contributed talks and poster sessions presented at VAPP II. The contents present an extensive coverage of all important aspects of vector and parallel processors, including hardware, languages, numerical algorithms and applications. The topics covered include descriptions of new machines (both research and commercial machines), languages and software aids, and general discussions of whole classes of machines and their uses. Numerical methods papers include Monte Carlo algorithms, iterative and direct methods for solving large systems, finite elements, optimization, random number generation and mathematical software. The specific applications covered include neutron diffusion calculations, molecular dynamics, weather forecasting, lattice gauge calculations, fluid dynamics, flight simulation, cartography, image processing and cryptography. Most machines and architecture types are being used for these applications. many refs.

  3. Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

    OpenAIRE

    Catalán, Sandra; Igual, Francisco D.; Mayo, Rafael; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

    2015-01-01

    Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest for low-power high performance computing, this type of architectures is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware ...

  4. The Square Kilometre Array Science Data Processor. Preliminary compute platform design

    International Nuclear Information System (INIS)

    Broekema, P.C.; Nieuwpoort, R.V. van; Bal, H.E.

    2015-01-01

    The Square Kilometre Array is a next-generation radio-telescope, to be built in South Africa and Western Australia. It is currently in its detailed design phase, with procurement and construction scheduled to start in 2017. The SKA Science Data Processor is the high-performance computing element of the instrument, responsible for producing science-ready data. This is a major IT project, with the Science Data Processor expected to challenge the computing state of the art even in 2020. In this paper we introduce the preliminary Science Data Processor design and the principles that guide the design process, as well as the constraints to the design. We introduce a highly scalable and flexible system architecture capable of handling the SDP workload

  5. Bringing Algorithms to Life: Cooperative Computing Activities Using Students as Processors.

    Science.gov (United States)

    Bachelis, Gregory F.; And Others

    1994-01-01

    Presents cooperative computing activities in which each student plays the role of a switch or processor and acts out algorithms. Includes binary counting, finding the smallest card in a deck, sorting by selection and merging, adding and multiplying large numbers, and sieving for primes. (16 references) (Author/MKR)

  6. Mathematical Methods and Algorithms of Mobile Parallel Computing on the Base of Multi-core Processors

    Directory of Open Access Journals (Sweden)

    Alexander B. Bakulev

    2012-11-01

    This article deals with mathematical models and algorithms that provide mobility for the parallel representation of sequential programs in a high-level language. It presents a formal model of operating-environment process management, based on the proposed model of parallel program representation, describing the computation process on multi-core processors.

  7. Networking Micro-Processors for Effective Computer Utilization in Nursing

    OpenAIRE

    Mangaroo, Jewellean; Smith, Bob; Glasser, Jay; Littell, Arthur; Saba, Virginia

    1982-01-01

    Networking as a social entity has important implications for maximizing computer resources for improved utilization in nursing. This paper describes one process of networking complementary resources at three institutions: Prairie View A&M University, Texas A&M University, and the University of Texas School of Public Health, which has effected greater utilization of computers at the college. The results achieved in this project should have implications for nurses, users, and consumers in...

  8. Restricted access processor - An application of computer security technology

    Science.gov (United States)

    Mcmahon, E. M.

    1985-01-01

    This paper describes a security guard device that is currently being developed by Computer Sciences Corporation (CSC). The methods used to provide assurance that the system meets its security requirements include the system architecture, a system security evaluation, and the application of formal and informal verification techniques. The combination of state-of-the-art technology and the incorporation of new verification procedures results in a demonstration of the feasibility of computer security technology for operational applications.

  9. Scalable Parallelization of Skyline Computation for Multi-core Processors

    DEFF Research Database (Denmark)

    Chester, Sean; Sidlauskas, Darius; Assent, Ira

    2015-01-01

    The skyline is an important query operator for multi-criteria decision making. It reduces a dataset to only those points that offer optimal trade-offs of dimensions. In general, it is very expensive to compute. Recently, multi-core CPU algorithms have been proposed to accelerate the computation of the skyline. However, they do not sufficiently minimize dominance tests and so are not competitive with state-of-the-art sequential algorithms. In this paper, we introduce a novel multi-core skyline algorithm, Hybrid, which processes points in blocks. It maintains a shared, global skyline among all threads ...
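
    The primitive that dominates the cost is the dominance test, sketched below inside a simple sequential skyline loop (smaller is better in every dimension); block-based multi-core algorithms such as the one described aim precisely at minimising how often this test runs. The data points are invented.

        #include <cstdio>
        #include <vector>

        using Point = std::vector<double>;

        // p dominates q if p is at least as good in every dimension
        // and strictly better in at least one.
        bool dominates(const Point& p, const Point& q) {
            bool strictly = false;
            for (std::size_t d = 0; d < p.size(); ++d) {
                if (p[d] > q[d]) return false;
                if (p[d] < q[d]) strictly = true;
            }
            return strictly;
        }

        int main() {
            std::vector<Point> data = {{1, 4}, {2, 2}, {3, 3}, {4, 1}};
            std::vector<Point> skyline;
            for (const auto& p : data) {
                bool dominated = false;
                for (const auto& s : skyline)
                    if (dominates(s, p)) { dominated = true; break; }
                if (dominated) continue;
                std::vector<Point> kept;           // drop points p dominates
                for (auto& s : skyline) if (!dominates(p, s)) kept.push_back(s);
                kept.push_back(p);
                skyline = kept;
            }
            std::printf("skyline size: %zu\n", skyline.size()); // {1,4},{2,2},{4,1}
        }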

  10. Survey of ANL organization plans for word processors, personal computers, workstations, and associated software

    Energy Technology Data Exchange (ETDEWEB)

    Fenske, K.R.

    1991-11-01

    The Computing and Telecommunications Division (CTD) has compiled this Survey of ANL Organization Plans for Word Processors, Personal Computers, Workstations, and Associated Software to provide DOE and Argonne with a record of recent growth in the acquisition and use of personal computers, microcomputers, and word processors at ANL. Laboratory planners, service providers, and people involved in office automation may find the Survey useful. It is for internal use only, and any unauthorized use is prohibited. Readers of the Survey should use it as a reference that documents the plans of each organization for office automation, identifies appropriate planners and other contact people in those organizations, and encourages the sharing of this information among those people making plans for organizations and decisions about office automation. The Survey supplements information in both the ANL Statement of Site Strategy for Computing Workstations and the ANL Site Response for the DOE Information Technology Resources Long-Range Plan.

  11. Survey of ANL organization plans for word processors, personal computers, workstations, and associated software. Revision 3

    Energy Technology Data Exchange (ETDEWEB)

    Fenske, K.R.

    1991-11-01

    The Computing and Telecommunications Division (CTD) has compiled this Survey of ANL Organization Plans for Word Processors, Personal Computers, Workstations, and Associated Software to provide DOE and Argonne with a record of recent growth in the acquisition and use of personal computers, microcomputers, and word processors at ANL. Laboratory planners, service providers, and people involved in office automation may find the Survey useful. It is for internal use only, and any unauthorized use is prohibited. Readers of the Survey should use it as a reference that documents the plans of each organization for office automation, identifies appropriate planners and other contact people in those organizations, and encourages the sharing of this information among those people making plans for organizations and decisions about office automation. The Survey supplements information in both the ANL Statement of Site Strategy for Computing Workstations and the ANL Site Response for the DOE Information Technology Resources Long-Range Plan.

  12. Interface Message Processors for the ARPA Computer Network

    Science.gov (United States)

    1976-07-01

    ... (and then clear the location) as its primitive locking facility (i.e., as the necessary multiprocessor lock equivalent to Dijkstra semaphores) [37]. ... of the extra storage required for the redundant copies. There is the problem of maintaining synchronization of multiple-copy data bases in the presence of ... through any of the data base sites. Update synchronization: races between conflicting, "concurrent" update requests are resolved in a manner that ...

  13. PHANTOM: Practical Oblivious Computation in a Secure Processor

    Science.gov (United States)

    2014-05-16

    ... competitive relationship between the cloud provider and the client. For example, Netflix is hosted on Amazon's EC2 cloud infrastructure, while Amazon is a ...

  14. SAD PROCESSOR FOR MULTIPLE MACROBLOCK MATCHING IN FAST SEARCH VIDEO MOTION ESTIMATION

    Directory of Open Access Journals (Sweden)

    Nehal N. Shah

    2015-02-01

    Motion estimation is a very important but computationally complex task in video coding. The process of determining motion vectors based on the temporal correlation of consecutive frames is used for video compression. In order to reduce the computational complexity of motion estimation and maintain the quality of encoding during motion compensation, different fast search techniques are available. These block-based motion estimation algorithms use the sum of absolute differences (SAD) between the corresponding macroblock in the current frame and all the candidate macroblocks in the reference frame to identify the best match. Existing implementations can perform SAD between two blocks using a sequential or pipelined approach, but performing multi-operand SAD in a single clock cycle with optimized resources is the state of the art. In this paper various parallel architectures for computation of the fixed-block-size SAD are evaluated and a fast parallel SAD architecture with optimized resources is proposed. Further, a SAD processor is described with 9 processing elements which can be configured for any existing fast search block matching algorithm. The proposed SAD processor consumes 7% fewer adders compared to the existing implementation for one processing element. Using nine PEs it can process 84 HD frames per second in the worst case, which is a good outcome for real-time implementation. In the average case the architecture processes 325 HD frames per second.
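
    For orientation, a reference form of the kernel being parallelised: the SAD between a 16x16 macroblock of the current frame and one candidate block of the reference frame. A hardware SAD processor evaluates many candidates per cycle instead of one. The frame contents below are synthetic.

        #include <cstdint>
        #include <cstdio>
        #include <cstdlib>

        int sad16x16(const uint8_t* cur, const uint8_t* ref, int stride) {
            int sad = 0;
            for (int y = 0; y < 16; ++y)
                for (int x = 0; x < 16; ++x)
                    sad += std::abs(int(cur[y * stride + x]) -
                                    int(ref[y * stride + x]));
            return sad;
        }

        int main() {
            const int stride = 32;
            uint8_t cur[16 * stride], ref[16 * stride];
            for (int i = 0; i < 16 * stride; ++i) {
                cur[i] = uint8_t(i);
                ref[i] = uint8_t(i + 1);         // every pixel off by one
            }
            std::printf("SAD = %d\n", sad16x16(cur, ref, stride)); // 256 here
        }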

  15. Icarus: A 2-D Direct Simulation Monte Carlo (DSMC) Code for Multi-Processor Computers

    International Nuclear Information System (INIS)

    BARTEL, TIMOTHY J.; PLIMPTON, STEVEN J.; GALLIS, MICHAIL A.

    2001-01-01

    Icarus is a 2D Direct Simulation Monte Carlo (DSMC) code which has been optimized for the parallel computing environment. The code is based on the DSMC method of Bird [11.1] and models from free-molecular to continuum flowfields in either Cartesian (x, y) or axisymmetric (z, r) coordinates. Computational particles, representing a given number of molecules or atoms, are tracked as they have collisions with other particles or surfaces. Multiple species, internal energy modes (rotation and vibration), chemistry, and ion transport are modeled. A new trace species methodology for collisions and chemistry is used to obtain statistics for small species concentrations. Gas phase chemistry is modeled using steric factors derived from Arrhenius reaction rates or in a manner similar to continuum modeling. Surface chemistry is modeled with surface reaction probabilities; an optional site-density, energy-dependent coverage model is included. Electrons are modeled by either a local charge neutrality assumption or as discrete simulational particles. Ion chemistry is modeled with electron impact chemistry rates and charge exchange reactions. Coulomb collision cross-sections are used instead of Variable Hard Sphere values for ion-ion interactions. The electrostatic fields can either be externally input, computed from a Langmuir-Tonks model, or obtained from a Green's Function (Boundary Element) based Poisson solver. Icarus has been used for subsonic to hypersonic, chemically reacting, and plasma flows. The Icarus software package includes the grid generation, parallel processor decomposition, post-processing, and restart software. The commercial graphics package Tecplot is used for graphics display. All of the software packages are written in standard Fortran.

  16. DiFX: A software correlator for very long baseline interferometry using multi-processor computing environments

    OpenAIRE

    Deller, A. T.; Tingay, S. J.; Bailes, M.; West, C.

    2007-01-01

    We describe the development of an FX style correlator for Very Long Baseline Interferometry (VLBI), implemented in software and intended to run in multi-processor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high performance computing, such as multi-processor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of softw...

  17. A Versatile Image Processor For Digital Diagnostic Imaging And Its Application In Computed Radiography

    Science.gov (United States)

    Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.

    1986-06-01

    In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response are facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling a combined rate of up to 144 MBytes/sec. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. For illustrating the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing operations are presented.

  18. Assessing the Progress of Trapped-Ion Processors Towards Fault-Tolerant Quantum Computation

    Science.gov (United States)

    Bermudez, A.; Xu, X.; Nigmatullin, R.; O'Gorman, J.; Negnevitsky, V.; Schindler, P.; Monz, T.; Poschinger, U. G.; Hempel, C.; Home, J.; Schmidt-Kaler, F.; Biercuk, M.; Blatt, R.; Benjamin, S.; Müller, M.

    2017-10-01

    A quantitative assessment of the progress of small prototype quantum processors towards fault-tolerant quantum computation is a problem of current interest in experimental and theoretical quantum information science. We introduce a necessary and fair criterion for quantum error correction (QEC), which must be achieved in the development of these quantum processors before their sizes are sufficiently big to consider the well-known QEC threshold. We apply this criterion to benchmark the ongoing effort in implementing QEC with topological color codes using trapped-ion quantum processors and, more importantly, to guide the future hardware developments that will be required in order to demonstrate beneficial QEC with small topological quantum codes. In doing so, we present a thorough description of a realistic trapped-ion toolbox for QEC and a physically motivated error model that goes beyond standard simplifications in the QEC literature. We focus on laser-based quantum gates realized in two-species trapped-ion crystals in high-optical aperture segmented traps. Our large-scale numerical analysis shows that, with the foreseen technological improvements described here, this platform is a very promising candidate for fault-tolerant quantum computation.

  19. 35-We polymer electrolyte membrane fuel cell system for notebook computer using a compact fuel processor

    Science.gov (United States)

    Son, In-Hyuk; Shin, Woo-Cheol; Lee, Yong-Kul; Lee, Sung-Chul; Ahn, Jin-Gu; Han, Sang-Il; Kweon, Ho-Jin; Kim, Ju-Yong; Kim, Moon-Chan; Park, Jun-Yong

    A polymer electrolyte membrane fuel cell (PEMFC) system is developed to power a notebook computer. The system consists of a compact methanol-reforming system with a CO preferential oxidation unit, a 16-cell PEMFC stack, and a control unit for the management of the system with a d.c.-d.c. converter. The compact fuel-processor system (260 cm³) generates about 1.2 L min⁻¹ of reformate, which corresponds to 35 We, with a low CO concentration (<30 ppm, typically 0 ppm), and is thus proven to be capable of being targeted at notebook computers.

  20. 35-We polymer electrolyte membrane fuel cell system for notebook computer using a compact fuel processor

    Energy Technology Data Exchange (ETDEWEB)

    Son, In-Hyuk; Shin, Woo-Cheol; Lee, Sung-Chul; Ahn, Jin-Gu; Han, Sang-Il; Kweon, Ho-Jin; Kim, Ju-Yong; Park, Jun-Yong [Energy 1 Group, Energy Laboratory at Corporate R and D Center in Samsung SDI Co., Ltd., 575, Shin-dong, Yeongtong-gu, Suwon-si, Gyeonggi-do 443-731 (Korea)]; Lee, Yong-Kul [Department of Chemical Engineering, Dankook University, Youngin 448-701 (Korea)]; Kim, Moon-Chan [Department of Environmental Engineering, Chongju University, Chongju 360-764 (Korea)]

    2008-10-15

    A polymer electrolyte membrane fuel cell (PEMFC) system is developed to power a notebook computer. The system consists of a compact methanol-reforming system with a CO preferential oxidation unit, a 16-cell PEMFC stack, and a control unit for the management of the system with a d.c.-d.c. converter. The compact fuel-processor system (260 cm³) generates about 1.2 L min⁻¹ of reformate, which corresponds to 35 We, with a low CO concentration (<30 ppm, typically 0 ppm), and is thus proven to be capable of being targeted at notebook computers. (author)

  1. Efficient Backprojection-Based Synthetic Aperture Radar Computation with Many-Core Processors

    Directory of Open Access Journals (Sweden)

    Jongsoo Park

    2013-01-01

    Full Text Available Tackling computationally challenging problems with high efficiency often requires the combination of algorithmic innovation, advanced architecture, and thorough exploitation of parallelism. We demonstrate this synergy through synthetic aperture radar (SAR via backprojection, an image reconstruction method that can require hundreds of TFLOPS. Computation cost is significantly reduced by our new algorithm of approximate strength reduction; data movement cost is economized by software locality optimizations facilitated by advanced architecture support; parallelism is fully harnessed in various patterns and granularities. We deliver over 35 billion backprojections per second throughput per compute node on an Intel® Xeon® processor E5-2670-based cluster, equipped with Intel® Xeon Phi™ coprocessors. This corresponds to processing a 3K×3K image within a second using a single node. Our study can be extended to other settings: backprojection is applicable elsewhere including medical imaging, approximate strength reduction is a general code transformation technique, and many-core processors are emerging as a solution to energy-efficient computing.
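
    For orientation, the kernel being accelerated can be written in a few lines. The sketch below is a deliberately naive NumPy backprojection over an image grid, assuming range-compressed pulses and a known platform track; the paper's approximate strength reduction, locality optimizations, and the per-pixel phase correction are all omitted.

        import numpy as np

        def backproject(data, pos, dr, grid_x, grid_y):
            # data[p, r]: range-compressed samples of pulse p; pos[p]: platform position.
            image = np.zeros((len(grid_y), len(grid_x)), dtype=complex)
            for p in range(data.shape[0]):                    # one pulse at a time
                px, py, pz = pos[p]
                for iy, y in enumerate(grid_y):
                    for ix, x in enumerate(grid_x):
                        rng = np.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
                        bin_ = int(round(rng / dr))           # nearest range bin
                        if 0 <= bin_ < data.shape[1]:
                            # coherent accumulation; a phase factor exp(1j*4*pi*rng/wavelength)
                            # would be applied here in a real SAR backprojector
                            image[iy, ix] += data[p, bin_]
            return image

        grid = np.arange(-2.0, 3.0)                       # 5 x 5 image grid
        data = np.zeros((1, 100)); data[0, 55] = 1.0      # one echo at range ~5.5
        img = backproject(data, [(0.0, 0.0, 5.0)], 0.1, grid, grid)
        print(np.abs(img))   # a single pulse lights up a ring of constant range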

  2. Survey of ANL organization plans for word processors, personal computers, workstations, and associated software

    Energy Technology Data Exchange (ETDEWEB)

    Fenske, K.R.; Rockwell, V.S.

    1992-08-01

    The Computing and Telecommunications Division (CTD) has compiled this Survey of ANL Organization plans for Word Processors, Personal Computers, Workstations, and Associated Software (ANL/TM, Revision 4) to provide DOE and Argonne with a record of recent growth in the acquisition and use of personal computers, microcomputers, and word processors at ANL. Laboratory planners, service providers, and people involved in office automation may find the Survey useful. It is for internal use only, and any unauthorized use is prohibited. Readers of the Survey should use it as a reference document that (1) documents the plans of each organization for office automation, (2) identifies appropriate planners and other contact people in those organizations and (3) encourages the sharing of this information among those people making plans for organizations and decisions about office automation. The Survey supplements information in both the ANL Statement of Site Strategy for Computing Workstations (ANL/TM 458) and the ANL Site Response for the DOE Information Technology Resources Long-Range Plan (ANL/TM 466).

  3. Survey of ANL organization plans for word processors, personal computers, workstations, and associated software. Revision 4

    Energy Technology Data Exchange (ETDEWEB)

    Fenske, K.R.; Rockwell, V.S.

    1992-08-01

    The Computing and Telecommunications Division (CTD) has compiled this Survey of ANL Organization plans for Word Processors, Personal Computers, Workstations, and Associated Software (ANL/TM, Revision 4) to provide DOE and Argonne with a record of recent growth in the acquisition and use of personal computers, microcomputers, and word processors at ANL. Laboratory planners, service providers, and people involved in office automation may find the Survey useful. It is for internal use only, and any unauthorized use is prohibited. Readers of the Survey should use it as a reference document that (1) documents the plans of each organization for office automation, (2) identifies appropriate planners and other contact people in those organizations and (3) encourages the sharing of this information among those people making plans for organizations and decisions about office automation. The Survey supplements information in both the ANL Statement of Site Strategy for Computing Workstations (ANL/TM 458) and the ANL Site Response for the DOE Information Technology Resources Long-Range Plan (ANL/TM 466).

  4. Matrix-vector multiplication using digital partitioning for more accurate optical computing

    Science.gov (United States)

    Gary, C. K.

    1992-01-01

    Digital partitioning offers a flexible means of increasing the accuracy of an optical matrix-vector processor. This algorithm can be implemented with the same architecture required for a purely analog processor, which gives optical matrix-vector processors the ability to perform high-accuracy calculations at speeds comparable to or greater than those of electronic computers, as well as the ability to perform analog operations at a much greater speed. Digital partitioning is compared with digital multiplication by analog convolution, residue number systems, and redundant number representation in terms of the size and the speed required for an equivalent throughput, as well as in terms of the hardware requirements. Digital partitioning and digital multiplication by analog convolution are found to be the most efficient algorithms if coding time and hardware are considered, and the architecture for digital partitioning permits the use of analog computations to provide the greatest throughput for a single processor.
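
    A small sketch of the partitioning idea, under the illustrative assumption of non-negative 16-bit integers split into base-16 digits: each low-precision digit-plane product stands in for one pass through the analog optical stage, and the digital recombination restores full accuracy.

        import numpy as np

        BASE, DIGITS = 16, 4   # 16-bit operands split into 4-bit pieces

        def split(a):
            # Decompose integer array `a` into DIGITS low-precision planes.
            return [(a // BASE ** i) % BASE for i in range(DIGITS)]

        def partitioned_matvec(M, v):
            planes_M, planes_v = split(M), split(v)
            result = np.zeros(M.shape[0], dtype=np.int64)
            for i in range(DIGITS):
                for j in range(DIGITS):
                    # Low-accuracy product, as the analog stage would compute it.
                    partial = planes_M[i] @ planes_v[j]
                    result += partial * BASE ** (i + j)   # digital recombination
            return result

        rng = np.random.default_rng(1)
        M = rng.integers(0, 2 ** 16, (4, 4))
        v = rng.integers(0, 2 ** 16, 4)
        assert np.array_equal(partitioned_matvec(M, v), M @ v)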

  5. TMS320C25 Digital Signal Processor For 2-Dimensional Fast Fourier Transform Computation

    International Nuclear Information System (INIS)

    Ardisasmita, M. Syamsa

    1996-01-01

    The Fourier transform is one of the most important mathematical tools in signal processing and analysis, converting information from the time/spatial domain into the frequency domain. Even with implementation of Fast Fourier Transform algorithms on imaging data, discrete Fourier transform execution consumes a lot of time. Digital signal processors are designed specifically to perform computation-intensive digital signal processing algorithms. By taking advantage of the advanced architecture, parallel processing, and dedicated digital signal processing (DSP) instruction sets, this device can execute millions of DSP operations per second. The device architecture, and the characteristics and features suitable for fast Fourier transform applications and speed-up, are discussed
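
    The usual way such a DSP computes a 2-D transform is the row-column method, sketched below in NumPy under the assumption of a rectangular image; each 1-D pass corresponds to a batch of calls to the device's 1-D FFT kernel.

        import numpy as np

        def fft2_row_column(image):
            tmp = np.fft.fft(image, axis=1)   # 1-D FFT of every row
            return np.fft.fft(tmp, axis=0)    # then of every column

        x = np.random.default_rng(2).standard_normal((8, 8))
        assert np.allclose(fft2_row_column(x), np.fft.fft2(x))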

  6. Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

    Science.gov (United States)

    Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

    1987-01-01

    A capability was developed for rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets by implementing computer graphics modeling techniques on the Massively Parallel Processor (MPP), employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with these capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS), a data-independent environment for computer graphics data display, to provide easy access to users. Several problems that arose while developing these capabilities, independent of the actual use of the MPP, are also outlined.

  7. Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I

    Energy Technology Data Exchange (ETDEWEB)

    Schmalz, Mark S

    2011-07-24

    Statement of Problem - The Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation G' for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G -> G', which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient

  8. Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

    Science.gov (United States)

    Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

    2012-11-01

    In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performance and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel version of the application shows near-linear speed-up when using only the CPU cores, and executes more than 20 times faster when the GPU is used as well.

  9. Custom Hardware Processor to Compute a Figure of Merit for the Fit of X-Ray Diffraction

    International Nuclear Information System (INIS)

    Gomez-Pulido, P.J.A.; Vega-Rodriguez, M.A.; Sanchez-Perez, J.M.; Sanchez-Bajo, F.; Santos, S.P.D.

    2008-01-01

    A custom processor based on reconfigurable hardware technology is proposed in order to compute the figure of merit used to measure the quality of the fit of X-ray diffraction peaks. As experimental X-ray profiles can present many severely overlapped peaks, it is necessary to select the best model from a large set of reasonably good solutions. Determining the best solution is computationally intensive, because this is a hard combinatorial optimization problem. The proposed processors, working in parallel, increase the performance relative to a software implementation.

  10. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors.

    Science.gov (United States)

    Hines, Michael L; Eichner, Hubert; Schürmann, Felix

    2008-08-01

    Neuron tree topology equations can be split into two subtrees and solved on different processors with no change in accuracy, stability, or computational effort; communication costs involve only sending and receiving two double precision values by each subtree at each time step. Splitting cells is useful in attaining load balance in neural network simulations, especially when there is a wide range of cell sizes and the number of cells is about the same as the number of processors. For compute-bound simulations load balance results in almost ideal runtime scaling. Application of the cell splitting method to two published network models exhibits good runtime scaling on twice as many processors as could be effectively used with whole-cell balancing.

  11. EXPERIENCE WITH FPGA-BASED PROCESSOR CORE AS FRONT-END COMPUTER

    International Nuclear Information System (INIS)

    HOFF, L.T.

    2005-01-01

    The RHIC control system architecture follows the familiar "standard model". LINUX workstations are used as operator consoles. Front-end computers are distributed around the accelerator, close to equipment being controlled or monitored. These computers are generally based on VMEbus CPU modules running the VxWorks operating system. I/O is typically performed via the VMEbus, or via PMC daughter cards (via an internal PCI bus), or via on-board I/O interfaces (Ethernet or serial). Advances in FPGA size and sophistication now permit running virtual processor "cores" within the FPGA logic, including "cores" with advanced features such as memory management. Such systems offer certain advantages over traditional VMEbus front-end computers. Advantages include tighter coupling with FPGA logic, and therefore higher I/O bandwidth, and flexibility in packaging, possibly resulting in a lower noise environment and/or lower cost. This paper presents the experience acquired while porting the RHIC control system to a PowerPC 405 core within a Xilinx FPGA for use in low-level RF control

  12. Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors

    DEFF Research Database (Denmark)

    Liu, Weifeng; Vinter, Brian

    2015-01-01

    of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over...
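
    A minimal serial version of the underlying primitive may help: a segmented sum reduces each row's nonzero products independently, given a flag marking the first element of each segment. This sketch assumes the flags are known up front, whereas the method above predicts them speculatively and repairs mispredictions afterwards.

        def segmented_sum(values, head_flags):
            # head_flags[i] == 1 marks the start of a new segment; the first
            # flag is assumed to be 1 so every value belongs to some segment.
            sums = []
            for v, is_head in zip(values, head_flags):
                if is_head:
                    sums.append(0.0)
                sums[-1] += v
            return sums

        # Products 1..6 split into segments [1, 2], [3], [4, 5, 6]:
        print(segmented_sum([1, 2, 3, 4, 5, 6], [1, 0, 1, 1, 0, 0]))  # [3.0, 3.0, 15.0]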

  13. Embedded Processor Laboratory

    Data.gov (United States)

    Federal Laboratory Consortium — The Embedded Processor Laboratory provides the means to design, develop, fabricate, and test embedded computers for missile guidance electronics systems in support...

  14. Analysis of the computational requirements of a pulse-doppler radar signal processor

    CSIR Research Space (South Africa)

    Broich, R

    2012-05-01

    Full Text Available In an attempt to find an optimal processing architecture for radar signal processing applications, the different algorithms that are typically used in a pulse-Doppler radar signal processor are investigated. Radar algorithms are broken down...

  15. Computations on the massively parallel processor at the Goddard Space Flight Center

    Science.gov (United States)

    Strong, James P.

    1991-01-01

    Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
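
    The logistic equation mentioned for the second algorithm is simple enough to show directly. In the sketch below (an illustration, not the MPP code), r = 3.2 settles into an ordered 2-cycle while r = 4.0 wanders chaotically; on the MPP, many such iterations would run in parallel, one per processing element.

        def logistic_orbit(r, x0=0.3, n=50):
            # Iterate x_{k+1} = r * x_k * (1 - x_k) for n steps.
            xs = [x0]
            for _ in range(n):
                xs.append(r * xs[-1] * (1 - xs[-1]))
            return xs

        print(logistic_orbit(3.2)[-4:])  # ordered: oscillates between ~0.513 and ~0.799
        print(logistic_orbit(4.0)[-4:])  # chaotic: no discernible pattern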

  16. Image Matrix Processor for Volumetric Computations Final Report CRADA No. TSB-1148-95

    Energy Technology Data Exchange (ETDEWEB)

    Roberson, G. Patrick [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]; Browne, Jolyon [Advanced Research & Applications Corporation, Sunnyvale, CA (United States)]

    2018-01-22

    The development of an Image Matrix Processor (IMP) was proposed that would provide an economical means to perform rapid ray-tracing processes on volume "Giga Voxel" data sets. This was a multi-phased project. The objective of the first phase of the IMP project was to evaluate the practicality of implementing a workstation-based Image Matrix Processor for use in volumetric reconstruction and rendering using hardware simulation techniques. Additionally, ARACOR and LLNL worked together to identify and pursue further funding sources to complete a second phase of this project.

  17. The Fermilab Advanced Computer Program multi-array processor system (ACPMAPS): A site oriented supercomputer for theoretical physics

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    The ACP Multi-Array Processor System (ACPMAPS) is a highly cost effective, local memory parallel computer designed for floating point intensive grid based problems. The processing nodes of the system are single board array processors based on the FORTRAN and C programmable Weitek XL chip set. The nodes are connected by a network of very high bandwidth 16 port crossbar switches. The architecture is designed to achieve the highest possible cost effectiveness while maintaining a high level of programmability. The primary application of the machine at Fermilab will be lattice gauge theory. The hardware is supported by a transparent site oriented software system called CANOPY which shields theorist users from the underlying node structure. 4 refs., 2 figs

  18. Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm

    Directory of Open Access Journals (Sweden)

    J. I. Colless

    2018-02-01

    Full Text Available Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE, leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE has outlined a procedure for determining excited states that are central to dynamical processes. We use a superconducting-qubit-based processor to apply the QSE approach to the H_{2} molecule, extracting both ground and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.

  19. Computation of Molecular Spectra on a Quantum Processor with an Error-Resilient Algorithm

    Science.gov (United States)

    Colless, J. I.; Ramasesh, V. V.; Dahlen, D.; Blok, M. S.; Kimchi-Schwartz, M. E.; McClean, J. R.; Carter, J.; de Jong, W. A.; Siddiqi, I.

    2018-02-01

    Harnessing the full power of nascent quantum processors requires the efficient management of a limited number of quantum bits with finite coherent lifetimes. Hybrid algorithms, such as the variational quantum eigensolver (VQE), leverage classical resources to reduce the required number of quantum gates. Experimental demonstrations of VQE have resulted in calculation of Hamiltonian ground states, and a new theoretical approach based on a quantum subspace expansion (QSE) has outlined a procedure for determining excited states that are central to dynamical processes. We use a superconducting-qubit-based processor to apply the QSE approach to the H2 molecule, extracting both ground and excited states without the need for auxiliary qubits or additional minimization. Further, we show that this extended protocol can mitigate the effects of incoherent errors, potentially enabling larger-scale quantum simulations without the need for complex error-correction techniques.

  20. Design and Demonstration of RSFQ Processor Datapath for High Performance Computing

    Science.gov (United States)

    2014-09-30

    Publications include "Microarchitecture for Wide Datapath RSFQ Processors: Design Study," IEEE Transactions on Applied Superconductivity (06 2011): 787, doi: 10.1109/TASC... The demonstrated chip has a size of 4.4 mm x 3.0 mm and consists of ~4000 Josephson junctions; the total length of passive transmission lines (PTL) used in wiring is ~0.2 m.

  1. High performance graphics processor based computed tomography reconstruction algorithms for nuclear and other large scale applications.

    Energy Technology Data Exchange (ETDEWEB)

    Jimenez, Edward S. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)]; Orr, Laurel J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)]; Thompson, Kyle R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)]

    2013-09-01

    The goal of this work is to develop a fast computed tomography (CT) reconstruction algorithm based on graphics processing units (GPU) that achieves significant improvement over traditional central processing unit (CPU) based implementations. The main challenge in developing a CT algorithm that is capable of handling very large datasets is parallelizing the algorithm in such a way that data transfer does not hinder performance of the reconstruction algorithm. General Purpose Graphics Processing (GPGPU) is a new technology that the Science and Technology (S&T) community is starting to adopt in many fields where CPU-based computing is the norm. GPGPU programming requires a new approach to algorithm development that utilizes massively multi-threaded environments. Multi-threaded algorithms in general are difficult to optimize since performance bottlenecks occur that are non-existent in single-threaded algorithms, such as memory latencies. If an efficient GPU-based CT reconstruction algorithm can be developed, computational times could be improved by a factor of 20. Additionally, cost benefits will be realized as commodity graphics hardware could potentially replace expensive supercomputers and high-end workstations. This project will take advantage of the CUDA programming environment and attempt to parallelize the task in such a way that multiple slices of the reconstruction volume are computed simultaneously. This work will also take advantage of the GPU memory by utilizing asynchronous memory transfers, GPU texture memory, and (when possible) pinned host memory so that the memory transfer bottleneck inherent to GPGPU is amortized. Additionally, this work will take advantage of GPU-specific hardware (i.e. fast texture memory, pixel-pipelines, hardware interpolators, and varying memory hierarchy) that will allow for additional performance improvements.

  2. Design and implementation of the modified signed digit multiplication routine on a ternary optical computer.

    Science.gov (United States)

    Xu, Qun; Wang, Xianchao; Xu, Chao

    2017-06-01

    Multiplication on traditional electronic computers suffers from low calculating accuracy and long computation delays. To overcome these problems, the modified signed digit (MSD) multiplication routine is established based on the MSD system and the carry-free adder, and its parallel algorithm and optimization techniques are studied in detail. Drawing on a ternary optical computer's characteristics, the structured data processor is designed especially for the multiplication routine. Several ternary optical operators are constructed to perform M transformations and summations in parallel, which accelerates the iterative process of multiplication. In particular, the routine allocates the data bits of the ternary optical processor according to the digits of the multiplication input, so the accuracy of the calculation results can always satisfy the users. Finally, the routine is verified by simulation experiments, and the results fully comply with expectations. Compared with an electronic computer, the MSD multiplication routine is not only good at dealing with large-value data and high-precision arithmetic, but also maintains lower power consumption and shorter calculation delays.
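
    A software sketch of the number system involved may be useful. The code below works with little-endian signed-digit lists over {-1, 0, 1} in base 2 (produced here in non-adjacent form) and multiplies by summing digitwise partial products; the optical carry-free adder itself is replaced by ordinary integer accumulation, so this illustrates the representation rather than the hardware.

        def msd_to_int(digits):
            return sum(d * 2 ** i for i, d in enumerate(digits))

        def int_to_msd(n):
            # Non-adjacent form: a canonical MSD encoding with digits in {-1, 0, 1}.
            digits = []
            while n != 0:
                if n % 2:
                    d = 2 - (n % 4)      # +1 or -1, chosen so the next digit is 0
                    digits.append(d)
                    n -= d
                else:
                    digits.append(0)
                n //= 2
            return digits or [0]

        def msd_multiply(a, b):
            # Shift-and-add over signed digits; result returned in MSD form.
            acc = sum(da * db * 2 ** (i + j)
                      for i, da in enumerate(a) for j, db in enumerate(b))
            return int_to_msd(acc)

        a, b = int_to_msd(23), int_to_msd(-9)
        assert msd_to_int(msd_multiply(a, b)) == 23 * -9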

  3. The Heidelberg POLYP - a flexible and fault-tolerant poly-processor

    International Nuclear Information System (INIS)

    Maenner, R.; Deluigi, B.

    1981-01-01

    The Heidelberg poly-processor system POLYP is described. It is intended to be used in nuclear physics for reprocessing of experimental data, in high energy physics as a second-stage trigger processor, and generally in other applications requiring high computing power. The POLYP system consists of any number of I/O processors, processor modules (possibly of different types), global memory segments, and a host processor. All modules (up to several hundred) are connected by a multiple common-data-bus system; all processors, additionally, by a multiple sync-bus system for processor/task scheduling. All hardware and software are designed to be decentralized and free of bottlenecks. Most hardware faults, like single-bit errors in memory or multi-bit errors during transfers, are automatically corrected. Defective modules, buses, etc., can be removed with only a graceful degradation of the system throughput. (orig.)

  4. A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors

    DEFF Research Database (Denmark)

    Liu, Weifeng; Vinter, Brian

    2015-01-01

    General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle...... extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data...... memory space and efficiently utilizes the very limited on-chip scratchpad memory. Parallel insert operations of the nonzero entries are implemented through the GPU merge path algorithm that is experimentally found to be the fastest GPU merge approach. Load balancing builds on the number of necessary...
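
    As a reference point for the first irregularity named above, here is the classical serial row-by-row (Gustavson-style) formulation that any parallel SpGEMM must reorganize. The dict-of-dicts sparse format is an illustrative stand-in for CSR; the GPU merge-path insertions and load balancing are not modeled.

        def spgemm(A, B):
            # A and B map row -> {col: value}; returns C = A @ B in the same form.
            # The nonzero structure of each output row is unknown until computed.
            C = {}
            for i, row in A.items():
                acc = {}   # per-row accumulator for output row i
                for k, a_ik in row.items():
                    for j, b_kj in B.get(k, {}).items():
                        acc[j] = acc.get(j, 0.0) + a_ik * b_kj
                if acc:
                    C[i] = acc
            return C

        A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
        B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
        print(spgemm(A, B))  # {0: {1: 16.0}, 1: {0: 15.0}}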

  5. A 16-channel real-time digital processor for pulse-shape discrimination in multiplicity assay

    International Nuclear Information System (INIS)

    Joyce, Malcolm J.; Aspinall, M.D.; Cave, F.D.; Lavietes, A.

    2013-06-01

    In recent years, real-time neutron/γ-ray pulse-shape discrimination has become feasible with scintillator-based detectors that respond extremely quickly, on the order of 25 ns in terms of pulse width, and their application to a variety of nuclear material assays has been reported. For the in-situ analysis of nuclear materials, measurements are often based on the multiplicity assessment of spontaneous fission events. An example is the 240Pu-effective (240Pu-eff) assessment stemming from long-established techniques developed for 3He-based neutron coincidence counters when 3He was abundant and cheap. However, such measurements with scintillator detectors can be plagued by low detection efficiencies and low orders of coincidence (often limited to triples) if the number of detectors in use is similarly limited to 3-4. Conversely, an array of >10 detector modules, arranged to optimize efficiency and multiplicity sensitivity, shifts the emphasis in terms of performance requirement to the real-time digital analyzer and, critically, to the scope remaining in the temporal processing window of these systems. In this paper we report on the design, development and commissioning of a bespoke, 16-channel real-time pulse-shape discrimination analyzer specified for the materials assay challenge summarized above. The analyzer incorporates 16 dedicated and independent high-voltage supplies along with 16 independent digital processing channels offering pulse-shape discrimination at a rate of 3 x 10^6 events per second. These functions are configured from a dedicated graphical user interface, and all settings can be adjusted on the fly, with the analyzer effectively configured one-time-only (where desired) for subsequent plug-and-play connection, for example to a fuel-bundle organic scintillation detector array. (authors)
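
    A sketch of the per-pulse charge-comparison test such analyzers typically perform: neutron interactions in organic scintillators have a larger slow (tail) component than γ rays, so the tail-to-total integral ratio separates the two. The window positions and decay constants below are illustrative assumptions, not the instrument's settings.

        import numpy as np

        def tail_to_total(pulse, peak_idx, tail_start=10, window=60):
            total = pulse[peak_idx:peak_idx + window].sum()
            tail = pulse[peak_idx + tail_start:peak_idx + window].sum()
            return tail / total if total > 0 else 0.0

        t = np.arange(100)
        gamma = np.exp(-t / 5.0)                                 # fast component only
        neutron = np.exp(-t / 5.0) + 0.25 * np.exp(-t / 30.0)    # added slow tail
        print(tail_to_total(gamma, 0), tail_to_total(neutron, 0))  # neutron ratio is larger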

  6. A low-cost, high-performance, digital signal processor-based lock-in amplifier capable of measuring multiple frequency sweeps simultaneously

    International Nuclear Information System (INIS)

    Sonnaillon, Maximiliano Osvaldo; Bonetto, Fabian Jose

    2005-01-01

    A high-performance digital lock-in amplifier implemented on a low-cost digital signal processor (DSP) board is described. This lock-in is capable of simultaneously measuring multiple frequencies that change in time as frequency sweeps (chirps). The 32-bit DSP used has enough computing power to generate N=3 simultaneous reference signals and accurately measure the N=3 responses, operating as three lock-ins connected in parallel to a linear system. The lock-in stores the measured values in memory until they are downloaded to a personal computer (PC). The lock-in works in stand-alone mode and can be programmed and configured through the PC serial port. Downsampling and multiple filter stages were used in order to obtain a sharp roll-off and a long time constant in the filters. This makes measurements possible in the presence of high noise levels. Before each measurement, the lock-in performs an autocalibration that measures the frequency response of the analog output and input circuitry in order to compensate for departures from ideal operation. Improvements over previous lock-in implementations allow the frequency response of a system to be measured in a short time. Furthermore, the proposed implementation can measure how the frequency response changes with time, a characteristic that is very important in our biotechnological application. The number of simultaneous components that the lock-in can generate and measure can be extended, without reprogramming, simply by using other DSPs of the same family that are code compatible and work at higher clock frequencies.
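
    A minimal sketch of single-frequency digital lock-in demodulation, assuming a sampled input at rate fs: multiply by quadrature references and low-pass by averaging. The DSP implementation's multi-stage filters, downsampling, and chirp tracking are not reproduced.

        import numpy as np

        def lock_in(x, f_ref, fs):
            t = np.arange(len(x)) / fs
            i = np.mean(x * np.cos(2 * np.pi * f_ref * t))   # in-phase component
            q = np.mean(x * np.sin(2 * np.pi * f_ref * t))   # quadrature component
            amplitude = 2 * np.hypot(i, q)
            phase = np.arctan2(-q, i)
            return amplitude, phase

        fs, f = 10_000.0, 137.0
        t = np.arange(20_000) / fs
        x = 0.5 * np.cos(2 * np.pi * f * t + 0.3) \
            + np.random.default_rng(3).normal(0, 1, t.size)
        print(lock_in(x, f, fs))   # recovers amplitude ~0.5 and phase ~0.3 despite the noise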

  7. A low-cost, high-performance, digital signal processor-based lock-in amplifier capable of measuring multiple frequency sweeps simultaneously

    Energy Technology Data Exchange (ETDEWEB)

    Sonnaillon, Maximiliano Osvaldo; Bonetto, Fabian Jose [Laboratorio de Cavitacion y Biotecnologia, San Carlos de Bariloche (8400) (Argentina)]

    2005-02-01

    A high-performance digital lock-in amplifier implemented on a low-cost digital signal processor (DSP) board is described. This lock-in is capable of simultaneously measuring multiple frequencies that change in time as frequency sweeps (chirps). The 32-bit DSP used has enough computing power to generate N=3 simultaneous reference signals and accurately measure the N=3 responses, operating as three lock-ins connected in parallel to a linear system. The lock-in stores the measured values in memory until they are downloaded to a personal computer (PC). The lock-in works in stand-alone mode and can be programmed and configured through the PC serial port. Downsampling and multiple filter stages were used in order to obtain a sharp roll-off and a long time constant in the filters. This makes measurements possible in the presence of high noise levels. Before each measurement, the lock-in performs an autocalibration that measures the frequency response of the analog output and input circuitry in order to compensate for departures from ideal operation. Improvements over previous lock-in implementations allow the frequency response of a system to be measured in a short time. Furthermore, the proposed implementation can measure how the frequency response changes with time, a characteristic that is very important in our biotechnological application. The number of simultaneous components that the lock-in can generate and measure can be extended, without reprogramming, simply by using other DSPs of the same family that are code compatible and work at higher clock frequencies.

  8. Supercomputers and parallel computation. Based on the proceedings of a workshop on progress in the use of vector and array processors organised by the Institute of Mathematics and its Applications and held in Bristol, 2-3 September 1982

    International Nuclear Information System (INIS)

    Paddon, D.J.

    1984-01-01

    This book is based on the proceedings of a conference on parallel computing held in 1982. There are 18 papers which cover the following topics: VLSI parallel architectures, the theory of parallel computing and vector and array processor computing. One paper on 'Tough Problems in Reactor Design' is indexed separately. All the contributions are on research done in the United Kingdom. Although much of the experience in array processor computing is associated with the ICL distributed array processor (DAP) and this is reflected in the contributions, the research relating to the ICL DAP is relevant to all types of array processors. (UK)

  9. Possibilities of computer tomography in multiple sclerosis

    International Nuclear Information System (INIS)

    Vymazal, J.; Bauer, J.

    1983-01-01

    Computer tomography was performed in 41 patients with multiple sclerosis, the average age of the patients being 40.8 years. Native examinations were made in 17 patients, examinations with contrast medium in 19, and both methods were used in 5 patients. In 26 patients, i.e. almost two-thirds, cerebral atrophy was found, in 11 of a severe type. In 9 patients atrophy affected only the hemispheres, in 16 also the stem and cerebellum. The stem and cerebellum alone were affected in 1 patient. Hypodense foci were found in 21 patients, i.e. more than half of those examined; in 9 there were multiple foci. In most of the 19 examined patients the hypodense changes were in the hemispheres and only in 2 in the cerebellum and brain stem. No hyperdense changes were detected. The value and possibilities of computer tomography examinations in multiple sclerosis are discussed. (author)

  10. Analog Integrated Circuit Design for Spike Time Dependent Encoder and Reservoir in Reservoir Computing Processors

    Science.gov (United States)

    2018-01-01

    This multidisciplinary effort encompassed high-performance computing, nanotechnology, integrated circuits, and integrated systems. Subject terms: neuromorphic computing, neuron design, spike-time-dependent encoding.

  11. Quantum simulation of superconductors on quantum computers. Toward the first applications of quantum processors

    International Nuclear Information System (INIS)

    Dallaire-Demers, Pierre-Luc

    2016-01-01

    Quantum computers are the ideal platform for quantum simulations. Given enough coherent operations and qubits, such machines can be leveraged to simulate strongly correlated materials, where intricate quantum effects give rise to counter-intuitive macroscopic phenomena such as high-temperature superconductivity. Many phenomena of strongly correlated materials are encapsulated in the Fermi-Hubbard model. In general, no closed-form solution is known for lattices of more than one spatial dimension, but they can be numerically approximated using cluster methods. To model long-range effects such as order parameters, a powerful method to compute the cluster's Green's function consists in finding its self-energy through a variational principle. As is shown in this thesis, this allows the possibility of studying various phase transitions at finite temperature in the Fermi-Hubbard model. However, a classical cluster solver quickly hits an exponential wall in the memory (or computation time) required to store the computation variables. We show theoretically that the cluster solver can be mapped to a subroutine on a quantum computer whose quantum memory usage scales linearly with the number of orbitals in the simulated cluster and the number of measurements scales quadratically. We also provide a gate decomposition of the cluster Hamiltonian and a simple planar architecture for a quantum simulator that can also be used to simulate more general fermionic systems. We briefly analyze the Trotter-Suzuki errors and estimate the scaling properties of the algorithm for more complex applications. A quantum computer with a few tens of qubits could therefore simulate the thermodynamic properties of complex fermionic lattices inaccessible to classical supercomputers.

  12. Quantum simulation of superconductors on quantum computers. Toward the first applications of quantum processors

    Energy Technology Data Exchange (ETDEWEB)

    Dallaire-Demers, Pierre-Luc

    2016-10-07

    Quantum computers are the ideal platform for quantum simulations. Given enough coherent operations and qubits, such machines can be leveraged to simulate strongly correlated materials, where intricate quantum effects give rise to counter-intuitive macroscopic phenomena such as high-temperature superconductivity. Many phenomena of strongly correlated materials are encapsulated in the Fermi-Hubbard model. In general, no closed-form solution is known for lattices of more than one spatial dimension, but they can be numerically approximated using cluster methods. To model long-range effects such as order parameters, a powerful method to compute the cluster's Green's function consists in finding its self-energy through a variational principle. As is shown in this thesis, this allows the possibility of studying various phase transitions at finite temperature in the Fermi-Hubbard model. However, a classical cluster solver quickly hits an exponential wall in the memory (or computation time) required to store the computation variables. We show theoretically that the cluster solver can be mapped to a subroutine on a quantum computer whose quantum memory usage scales linearly with the number of orbitals in the simulated cluster and the number of measurements scales quadratically. We also provide a gate decomposition of the cluster Hamiltonian and a simple planar architecture for a quantum simulator that can also be used to simulate more general fermionic systems. We briefly analyze the Trotter-Suzuki errors and estimate the scaling properties of the algorithm for more complex applications. A quantum computer with a few tens of qubits could therefore simulate the thermodynamic properties of complex fermionic lattices inaccessible to classical supercomputers.

  13. Strategies for Sharing Seismic Data Among Multiple Computer Platforms

    Science.gov (United States)

    Baker, L. M.; Fletcher, J. B.

    2001-12-01

    the user. Commercial software packages, such as MatLab, also have the ability to share data in their own formats across multiple computer platforms. Our Fortran applications can create plot files in Adobe PostScript, Illustrator, and Portable Document Format (PDF) formats. Vendor support for reading these files is readily available on multiple computer platforms. We will illustrate by example our strategies for sharing seismic data among our multiple computer platforms, and we will discuss our positive and negative experiences. We will include our solutions for handling the different byte ordering, floating-point formats, and text file "end-of-line" conventions on the various computer platforms we use (6 different operating systems on 5 processor architectures).
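
    One concrete version of the byte-ordering strategy is to pin the interchange format explicitly rather than trust the host, as in this small Python sketch (an illustration, not the authors' Fortran tooling):

        import struct, sys

        samples = [0.1, -2.5, 3.75]

        # '>' forces big-endian ("network order") on disk; '<' would force
        # little-endian. Either way, the file is identical on every host.
        blob = struct.pack(f">{len(samples)}f", *samples)

        decoded = struct.unpack(f">{len(blob) // 4}f", blob)
        print(sys.byteorder, decoded)   # same decoded values on any platform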

  14. A low-cost vector processor boosting compute-intensive image processing operations

    Science.gov (United States)

    Adorf, Hans-Martin

    1992-01-01

    Low-cost vector processing (VP) is within reach of everyone seriously engaged in scientific computing. The advent of affordable add-on VP boards for standard workstations, complemented by mathematical/statistical libraries, is beginning to impact compute-intensive tasks such as image processing. A case in point is the restoration of distorted images from the Hubble Space Telescope. A low-cost implementation is presented of the standard Tarasko-Richardson-Lucy restoration algorithm on an Intel i860-based VP board which is seamlessly interfaced to a commercial, interactive image processing system. First experience is reported (including some benchmarks for standalone FFTs) and some conclusions are drawn.
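
    For reference, the Richardson-Lucy iteration itself is compact; the expense lies in the two convolutions per iteration, which is what the vector board accelerates. The sketch below assumes a shift-invariant, normalized PSF stored at the array origin and circular (FFT) boundary handling.

        import numpy as np

        def richardson_lucy(data, psf, iterations=25):
            otf = np.fft.fft2(psf)                      # PSF padded to the data shape
            estimate = np.full(data.shape, data.mean())
            for _ in range(iterations):
                blurred = np.real(np.fft.ifft2(np.fft.fft2(estimate) * otf))
                ratio = data / np.maximum(blurred, 1e-12)
                # Multiply by the correlation with the PSF (the adjoint of the blur).
                corr = np.real(np.fft.ifft2(np.fft.fft2(ratio) * np.conj(otf)))
                estimate *= corr
            return estimate

        # Demo: blur a point source, then recover its position.
        truth = np.zeros((32, 32)); truth[16, 16] = 1.0
        psf = np.zeros((32, 32)); psf[0, 0], psf[0, 1], psf[1, 0] = 0.5, 0.25, 0.25
        blurred = np.real(np.fft.ifft2(np.fft.fft2(truth) * np.fft.fft2(psf)))
        restored = richardson_lucy(blurred, psf, iterations=100)
        print(np.unravel_index(restored.argmax(), restored.shape))  # -> (16, 16)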

  15. AMD's 64-bit Opteron processor

    CERN Multimedia

    CERN. Geneva

    2003-01-01

    This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included.BiographiesDavid RichDavid directs AMD's efforts in high performance computing and also in the use of Opteron processors...

  16. Array processor architecture

    Science.gov (United States)

    Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

    1983-01-01

    A high speed parallel array data processing architecture, fashioned under a computational envelope approach, includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors operating normally quite independently of each other in a multiprocessing fashion. For data-dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not lock-stepped but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.

  17. Three-dimensional magnetic field computation on a distributed memory parallel processor

    International Nuclear Information System (INIS)

    Barion, M.L.

    1990-01-01

    The analysis of three-dimensional magnetic fields by finite element methods frequently proves too onerous a task for the computing resource on which it is attempted. When non-linear and transient effects are included, it may become impossible to calculate the field distribution to sufficient resolution. One approach to this problem is to exploit the natural parallelism in the finite element method via parallel processing. This paper reports on an implementation of a finite element code for non-linear three-dimensional low-frequency magnetic field calculation on Intel's iPSC/2

  18. Multiple network alignment on quantum computers

    Science.gov (United States)

    Daskin, Anmer; Grama, Ananth; Kais, Sabre

    2014-12-01

    Comparative analyses of graph-structured datasets underlie diverse problems. Examples of these problems include identification of conserved functional components (biochemical interactions) across species, structural similarity of large biomolecules, and recurring patterns of interactions in social networks. A large class of such analysis methods quantifies the topological similarity of nodes across networks. The resulting correspondence of nodes across networks, also called node alignment, can be used to identify invariant subgraphs across the input graphs. Given k graphs as input, alignment algorithms use topological information to assign a similarity score to each k-tuple of nodes, with elements (nodes) drawn from each of the input graphs. Nodes are considered similar if their neighbors are also similar. An alternate, equivalent view of these network alignment algorithms is to consider the Kronecker product of the input graphs and to identify high-ranked nodes in the Kronecker product graph. Conventional methods such as PageRank and HITS (Hypertext-Induced Topic Selection) can be used for this purpose. These methods typically require computation of the principal eigenvector of a suitably modified Kronecker product matrix of the input graphs. We adopt this alternate view of the problem to address the problem of multiple network alignment. Using the phase estimation algorithm, we show that the multiple network alignment problem can be efficiently solved on quantum computers. We characterize the accuracy and performance of our method and show that it can deliver exponential speedups over conventional (non-quantum) methods.
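
    A toy NumPy version of the classical (non-quantum) baseline described above, for two graphs: pairwise similarity scores are taken from the principal eigenvector of the Kronecker product of the adjacency matrices, found here by plain power iteration. Normalization and damping schemes used in practice are omitted.

        import numpy as np

        def alignment_scores(A, B, iters=100):
            K = np.kron(A, B)                       # pair (i, j) -> row i * n_B + j
            v = np.full(K.shape[0], 1.0 / K.shape[0])
            for _ in range(iters):                  # power iteration toward the
                v = K @ v                           # principal eigenvector
                v /= np.linalg.norm(v)
            return v.reshape(A.shape[0], B.shape[0])   # scores[i, j] for pair (i, j)

        # Two isomorphic triangles: by symmetry, all node pairs score equally.
        A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
        print(alignment_scores(A, A))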

  19. Design Principles for Synthesizable Processor Cores

    DEFF Research Database (Denmark)

    Schleuniger, Pascal; McKee, Sally A.; Karlsson, Sven

    2012-01-01

    As FPGAs get more competitive, synthesizable processor cores become an attractive choice for embedded computing. Currently popular commercial processor cores do not fully exploit current FPGA architectures. In this paper, we propose general design principles to increase instruction throughput...

  20. Dual-scale topology optoelectronic processor.

    Science.gov (United States)

    Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

    1991-12-15

    The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
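
    The generalization mentioned, replacing multiplication and summation with other operators, is easy to state in code. The sketch below is a plain Python illustration of the idea (not the D-STOP electronics): with (+, *) it is an ordinary matrix-vector product, while with (min, +) the same loop performs a shortest-path relaxation step.

        from functools import reduce

        def generalized_matvec(M, v, add, mul):
            # Matrix-vector "product" over a user-chosen (add, mul) operator pair.
            return [reduce(add, (mul(a, b) for a, b in zip(row, v))) for row in M]

        M = [[0, 7, 3], [7, 0, 2], [3, 2, 0]]
        v = [0, 10, 4]
        print(generalized_matvec(M, v, lambda x, y: x + y, lambda x, y: x * y))
        # ordinary linear algebra: [82, 8, 20]
        print(generalized_matvec(M, v, min, lambda x, y: x + y))
        # (min, +) shortest-path relaxation: [0, 6, 3]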

  1. Transportable GPU (General Processor Units) chip set technology for standard computer architectures

    Science.gov (United States)

    Fosdick, R. E.; Denison, H. C.

    1982-11-01

    The USAFR-developed GPU Chip Set has been utilized by Tracor to implement both USAF and Navy Standard 16-Bit Airborne Computer Architectures. Both configurations are currently being delivered into DOD full-scale development programs. Leadless Hermetic Chip Carrier packaging has facilitated implementation of both architectures on single 4-1/2 x 5 substrates. The CMOS and CMOS/SOS implementations of the GPU Chip Set have allowed both CPU implementations to use less than 3 watts of power each. Recent efforts by Tracor for USAF have included the definition of a next-generation GPU Chip Set that will retain the application-proven architecture of the current chip set while offering the added cost advantages of transportability across ISO-CMOS and CMOS/SOS processes and across numerous semiconductor manufacturers using a newly-defined set of common design rules. The Enhanced GPU Chip Set will increase speed by an approximate factor of 3 while significantly reducing chip counts and costs of standard CPU implementations.

  2. Allocating application to group of consecutive processors in fault-tolerant deadlock-free routing path defined by routers obeying same rules for path selection

    Science.gov (United States)

    Leung, Vitus J [Albuquerque, NM]; Phillips, Cynthia A [Albuquerque, NM]; Bender, Michael A [East Northport, NY]; Bunde, David P [Urbana, IL]

    2009-07-21

    In a multiple processor computing apparatus, directional routing restrictions and a logical channel construct permit fault tolerant, deadlock-free routing. Processor allocation can be performed by creating a linear ordering of the processors based on routing rules used for routing communications between the processors. The linear ordering can assume a loop configuration, and bin-packing is applied to this loop configuration. The interconnection of the processors can be conceptualized as a generally rectangular 3-dimensional grid, and the MC allocation algorithm is applied with respect to the 3-dimensional grid.

  3. Experimental technique for studying three-particle reactions in kinematically complete experiments using a two-processor complex based on M-400 computers

    International Nuclear Information System (INIS)

    Berezin, F.N.; Kisurin, V.A.; Nemets, O.F.; Ofengenden, R.G.; Pugach, V.M.; Pavlenko, Yu.N.; Patlan', Yu.V.; Savrasov, S.S.

    1981-01-01

    An experimental technique for the investigation of three-particle nuclear reactions in kinematically complete experiments is described. The technique provides storage of one-dimensional and two-dimensional energy spectra from several detectors. A block diagram of the measuring system using this technique is presented. The measuring system consists of analog equipment for fast-slow coincidences and of a two-processor complex based on M-400 computers with a common bus. The use of a two-processor complex, in which each computer has direct access to the memory of the other, makes it possible to separate the functions of data collection and on-line data presentation and to perform the necessary physical calculations. The software of the measuring complex, which includes programs written in ASSEMBLER for the first computer and functional programs written in BASIC for the second computer, is considered. The software of the first computer includes the DISPETCHER dialog control program, a driver package for controlling external devices, an applied program package, and system modules. The technique described was tested in an experiment investigating the d + 10B → α + α + α three-particle reaction at a deuteron energy of 13.6 MeV. A two-dimensional energy spectrum of the reaction obtained with the help of the described technique is presented. (in Russian)

  4. Multiple single-board-computer system for the KEK positron generator control

    International Nuclear Information System (INIS)

    Nakahara, Kazuo; Abe, Isamu; Enomoto, Atsushi; Otake, Yuji; Urano, Takao

    1986-01-01

    The KEK positron generator is controlled by means of a distributed microprocessor network. The control system is composed of three kinds of equipment: device controllers for the linac equipment, operation management stations, and a communication network. Each piece of linac equipment has its own microprocessor-based controller. A multiple single-board-computer (SBC) system is used for communication control and for equipment surveillance; it holds a database containing communication and linac equipment status information. Linac operation management, which should be the most flexible part of the control system, is separated from the multiple SBC system and is carried out by workstations. The principle that every processor executes only one task is maintained throughout the control system, which keeps the software architecture very simple. (orig.)

  5. Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA

    Science.gov (United States)

    Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei

    2013-03-01

    With the evolution of international wireless communication standards, the computational requirements for baseband signal processors keep increasing. Time-to-market pressure makes it impossible to completely redesign new processors for each evolving standard. Owing to their high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as a promising technology to replace traditional ASIC and FPGA designs. In addition, computation-intensive baseband functions process large amounts of data in parallel, which favors single instruction multiple data (SIMD) architectures for SDR platforms. An efficient way to prototype such SDR processors is therefore needed. In this paper we present a bit- and cycle-accurate model of a programmable SIMD SDR processor in the machine description language LISA, an architecture description language that enables rapid modeling at the architectural level. To evaluate the proposed processor, three common baseband functions, FFT, FIR digital filtering and matrix multiplication, were mapped onto the SDR platform. Analysis showed that the SDR processor achieved a performance gain of up to 47.1% relative to a comparable processor.

  6. Efficiently outsourcing multiparty computation under multiple keys

    NARCIS (Netherlands)

    Peter, Andreas; Tews, Erik; Katzenbeisser, Stefan

    2013-01-01

    Secure multiparty computation enables a set of users to evaluate certain functionalities on their respective inputs while keeping these inputs encrypted throughout the computation. In many applications, however, outsourcing these computations to an untrusted server is desirable, so that the server...

  7. Globe hosts launch of new processor

    CERN Multimedia

    2006-01-01

    Launch of the quad-core processor chip at the Globe. On 14 November, in a series of major media events around the world, the chip-maker Intel launched its new 'quad-core' processor. For the regions of Europe, the Middle East and Africa, the day-long launch event took place in CERN's Globe of Science and Innovation, with over 30 journalists in attendance, coming from as far away as Johannesburg and Dubai. CERN was a significant choice for the event: the first tests of this new generation of processor in Europe had been made at CERN over the preceding months, as part of CERN openlab, a research partnership with leading IT companies such as Intel, HP and Oracle. The event also provided the opportunity for the journalists to visit ATLAS and the CERN Computer Centre. The strategy of putting multiple processor cores on the same chip, which has been pursued by Intel and other chip-makers in the last few years, represents an important departure from the more traditional improvements in the sheer speed of such chips. ...

  8. Many-body simulations using an array processor

    International Nuclear Information System (INIS)

    Rapaport, D.C.

    1985-01-01

    Simulations of microscopic models of water and polypeptides using molecular dynamics and Monte Carlo techniques have been carried out with the aid of an FPS array processor. The computational techniques are discussed, with emphasis on the development and optimization of the software to take account of the special features of the processor. The computing requirements of these simulations exceed what could be reasonably carried out on a normal 'scientific' computer. While the FPS processor is highly suited to the kinds of models described, several other computationally intensive problems in statistical mechanics are outlined for which alternative processor architectures are more appropriate

  9. Fuzzy multiple linear regression: A computational approach

    Science.gov (United States)

    Juang, C. H.; Huang, X. H.; Fleming, J. W.

    1992-01-01

    This paper presents a new computational approach for performing fuzzy regression. In contrast to Bardossy's approach, the new approach, while dealing with fuzzy variables, closely follows the conventional regression technique. In this approach, the treatment of fuzzy input is more 'computational' than 'symbolic.' The following sections first outline the formulation of the new approach, then deal with its implementation and computational scheme, followed by examples that illustrate the new procedure.
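
    As a toy illustration of this 'computational' treatment of fuzzy input (not the authors' exact formulation), the sketch below represents each response as a symmetric triangular fuzzy number (center, spread), fits an ordinary least-squares line to the centers, and carries the spreads along as plain numbers:

        #include <cstdio>
        #include <vector>

        // Toy illustration (not the paper's exact procedure): each response is
        // a symmetric triangular fuzzy number (center c, spread s). A crisp
        // least-squares line y = a + b*x is fitted to the centers, and the
        // spreads are carried along numerically rather than symbolically.
        struct Fuzzy { double c, s; };

        int main() {
            std::vector<double> x = {1, 2, 3, 4, 5};
            std::vector<Fuzzy> y = {{2.1, 0.2}, {3.9, 0.3}, {6.2, 0.2},
                                    {8.0, 0.4}, {9.8, 0.3}};
            const int n = (int)x.size();
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (int i = 0; i < n; ++i) {
                sx += x[i]; sy += y[i].c; sxx += x[i] * x[i]; sxy += x[i] * y[i].c;
            }
            double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);  // slope
            double a = (sy - b * sx) / n;                          // intercept
            for (int i = 0; i < n; ++i)   // fitted center with observed spread
                std::printf("x=%g fit=%.3f +/- %.2f\n", x[i], a + b * x[i], y[i].s);
        }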

  10. A lock circuit for a multi-core processor

    DEFF Research Database (Denmark)

    2015-01-01

    An integrated circuit comprising multiple processor cores and a lock circuit that comprises a queue register with respective bits set or reset via respective connections dedicated to respective processor cores, whereby the queue register identifies those among the multiple processor cores that are enqueued in the queue register. Furthermore, the integrated circuit comprises a current register and a selector circuit configured to select a processor core and identify that processor core by a value in the current register. A selected processor core is a prioritized processor core among the enqueued cores. ...
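
    A rough software analogue of the claimed mechanism may help fix ideas; the patent claims a hardware circuit, so all names below are illustrative. Each core requests the lock by setting its dedicated bit in the queue register, and a selector grants the lock by writing one enqueued core's index into the current register:

        #include <atomic>
        #include <cstdio>

        // Software analogue of the queue-register lock (illustrative only;
        // the patent claims a hardware circuit). Each core requests the lock
        // by setting its dedicated bit; the selector grants the lock to one
        // enqueued core by writing its index into the 'current' register.
        constexpr int kCores = 8;
        std::atomic<unsigned> queue_reg{0};  // bit i set => core i enqueued
        std::atomic<int> current{-1};        // core currently holding the lock

        void request(int core) { queue_reg.fetch_or(1u << core); }
        void release(int core) {
            queue_reg.fetch_and(~(1u << core));
            current.store(-1);
        }

        // Selector: round-robin scan for the next enqueued core after 'last'.
        int select_next(int last) {
            unsigned q = queue_reg.load();
            for (int i = 1; i <= kCores; ++i) {
                int cand = (last + i) % kCores;
                if (q & (1u << cand)) { current.store(cand); return cand; }
            }
            return -1;  // nothing enqueued
        }

        int main() {
            request(2); request(5);
            std::printf("granted %d\n", select_next(-1));  // grants core 2
            release(2);
            std::printf("granted %d\n", select_next(2));   // grants core 5
        }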

  11. Secure Multiparty Quantum Computation for Summation and Multiplication.

    Science.gov (United States)

    Shi, Run-hua; Mu, Yi; Zhong, Hong; Cui, Jie; Zhang, Shun

    2016-01-21

    As a fundamental primitive, secure multiparty summation and multiplication can be used to build complex secure protocols for other multiparty computations, especially numerical computations. However, there is still a lack of systematic and efficient quantum methods for computing secure multiparty summation and multiplication. In this paper, we present a novel and efficient quantum approach to securely compute the summation and multiplication of multiparty private inputs. Compared to classical solutions, our proposed approach ensures unconditional security and perfect privacy protection based on the physical principles of quantum mechanics.
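
    For contrast with the quantum protocol, the standard classical baseline for secure summation, additive secret sharing in the honest-but-curious setting, is sketched below (illustrative only): each party splits its input into random shares modulo a public M, and only share sums are ever published:

        #include <cstdint>
        #include <cstdio>
        #include <random>
        #include <vector>

        // Classical baseline (the paper's protocol is quantum): each party
        // splits its private input into random additive shares mod M and
        // hands one share to every party; only per-party share sums are
        // published, so the total is recovered but no single input is.
        int main() {
            const uint64_t M = 1000000007ULL;            // public modulus
            std::vector<uint64_t> inputs = {42, 17, 99}; // private values
            const int n = (int)inputs.size();
            std::mt19937_64 rng(12345);
            std::uniform_int_distribution<uint64_t> d(0, M - 1);

            // shares[i][j] = share of party i's input handed to party j
            std::vector<std::vector<uint64_t>> shares(n, std::vector<uint64_t>(n));
            for (int i = 0; i < n; ++i) {
                uint64_t acc = 0;
                for (int j = 0; j + 1 < n; ++j) {
                    shares[i][j] = d(rng);
                    acc = (acc + shares[i][j]) % M;
                }
                shares[i][n - 1] = (inputs[i] % M + M - acc) % M;  // fixes the sum
            }
            uint64_t total = 0;
            for (int j = 0; j < n; ++j) {          // party j publishes one sum
                uint64_t col = 0;
                for (int i = 0; i < n; ++i) col = (col + shares[i][j]) % M;
                total = (total + col) % M;
            }
            std::printf("sum = %llu\n", (unsigned long long)total);  // 158
        }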

  12. Multithreaded Processors

    Indian Academy of Sciences (India)

    IAS Admin

    ...by modern computer designers due to the operating speed mismatch between processor and memory ... a modern mobile phone is more powerful than the computer aboard NASA's Apollo 11 ... Note that such a system stalls only when all threads are stalled. Assume there are ...

  13. Green Secure Processors: Towards Power-Efficient Secure Processor Design

    Science.gov (United States)

    Chhabra, Siddhartha; Solihin, Yan

    With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.

  14. Computer simulation of multiple dynamic photorefractive gratings

    DEFF Research Database (Denmark)

    Buchhave, Preben

    1998-01-01

    The benefits of a direct visualization of space-charge grating buildup are described. The visualization is carried out by a simple repetitive computer program, which simulates the basic processes in the band-transport model and displays the result graphically or in the form of numerical data. The...

  15. Computing multiple-output regression quantile regions

    Czech Academy of Sciences Publication Activity Database

    Paindaveine, D.; Šiman, Miroslav

    2012-01-01

    Roč. 56, č. 4 (2012), s. 840-853 ISSN 0167-9473 R&D Projects: GA MŠk(CZ) 1M06047 Institutional research plan: CEZ:AV0Z10750506 Keywords : halfspace depth * multiple-output regression * parametric linear programming * quantile regression Subject RIV: BA - General Mathematics Impact factor: 1.304, year: 2012 http://library.utia.cas.cz/separaty/2012/SI/siman-0376413.pdf

  16. An interactive parallel processor for data analysis

    International Nuclear Information System (INIS)

    Mong, J.; Logan, D.; Maples, C.; Rathbun, W.; Weaver, D.

    1984-01-01

    A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, the authors have been able to achieve computer amplification linearly proportional to the number of executing processors

  17. Processors and systems (picture processing)

    Energy Technology Data Exchange (ETDEWEB)

    Gemmar, P

    1983-01-01

    Automatic picture processing requires high-performance computers and high transmission capacities in the processor units. The author examines the possibilities of operating processors in parallel in order to accelerate picture processing, discusses a number of available processors and systems for picture processing, and illustrates their capabilities for particular types of processing. He stresses that the amount of storage required for picture processing is exceptionally high. The author concludes that it is as yet difficult to decide whether very large groups of simple processors or highly complex multiprocessor systems will provide the better solution; both approaches will be aided by the development of VLSI. New solutions have already been offered (systolic arrays and 3-D processing structures), but they too suffer losses when algorithms are not inherently parallel. Greater efforts must be made to produce suitable software for multiprocessor systems. Some possibilities for future picture-processing systems are discussed. 33 references.

  18. The LASS hardware processor

    International Nuclear Information System (INIS)

    Kunz, P.F.

    1976-01-01

    The problems of data analysis with hardware processors are reviewed and a description is given of a programmable processor. This processor, the 168/E, has been designed for use in the LASS multi-processor system; it has an execution speed comparable to the IBM 370/168 and uses the subset of IBM 370 instructions appropriate to the LASS analysis task. (Auth.)

  19. Accuracies Of Optical Processors For Adaptive Optics

    Science.gov (United States)

    Downie, John D.; Goodman, Joseph W.

    1992-01-01

    This paper analyzes the accuracies, and the accuracy requirements, of optical linear-algebra processors (OLAP's) in adaptive-optics imaging systems. OLAP's are much faster than digital electronic processors and eliminate some residual distortion, but the question is whether the errors introduced by the analog processing of an OLAP outweigh the advantage of its greater speed. The paper addresses this issue by estimating the accuracy required of a general OLAP so that it yields a smaller average residual wave-front aberration than a digital electronic processor computing at a given speed.

  20. Functional Verification of Enhanced RISC Processor

    OpenAIRE

    SHANKER NILANGI; SOWMYA L

    2013-01-01

    This paper presents the design and verification of a 32-bit enhanced RISC processor core with floating-point computation integrated within the core, designed to reduce cost and complexity. The three-stage pipelined 32-bit RISC processor is based on the ARM7 processor architecture, with a single-precision floating-point multiplier, a floating-point adder/subtractor for floating-point operations, and a 32 x 32 Booth multiplier added to the ARM7 integer core. The binary representati...

  1. Video frame processor

    International Nuclear Information System (INIS)

    Joshi, V.M.; Agashe, Alok; Bairi, B.R.

    1993-01-01

    This report provides a technical description of the Video Frame Processor (VFP) developed at Bhabha Atomic Research Centre. The instrument captures video images in CCIR format. Two memory planes, each with a capacity of 512 x 512 x 8 bits, enable storage of two video image frames. A stored image can be processed on-line, and on-line image subtraction can be carried out for image comparison. The VFP is a PC add-on board and is I/O mapped within the host IBM PC/AT compatible computer. (author). 9 refs., 4 figs., 19 photographs
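
    The on-line image subtraction performed between the two memory planes amounts to a clamped per-pixel difference; a minimal sketch (illustrative, not the board's firmware):

        #include <cstdint>
        #include <cstdio>
        #include <vector>

        // Sketch of the on-line image subtraction performed between the two
        // 512 x 512 x 8-bit memory planes (result clamped to the 8-bit range).
        int main() {
            const int N = 512 * 512;
            std::vector<uint8_t> plane_a(N, 130), plane_b(N, 120), diff(N);
            for (int i = 0; i < N; ++i) {
                int d = int(plane_a[i]) - int(plane_b[i]);
                diff[i] = (uint8_t)(d < 0 ? 0 : d);  // clamp negatives to 0
            }
            std::printf("diff[0] = %d\n", (int)diff[0]);  // 10
        }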

  2. Multibus-based parallel processor for simulation

    Science.gov (United States)

    Ogrady, E. P.; Wang, C.-H.

    1983-01-01

    A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.

  3. The study of Kruskal's and Prim's algorithms on the Multiple Instruction and Single Data stream computer system

    Directory of Open Access Journals (Sweden)

    A. Yu. Popov

    2015-01-01

    Bauman Moscow State Technical University is implementing a project to develop the operating principles of a computer system with a radically new architecture. A working model of the system has made it possible to evaluate the efficiency of the developed hardware and software. The experimental results presented in previous studies, together with an analysis of the operating principles of the new computer system, support conclusions about its efficiency in solving discrete optimization problems that involve the processing of sets. The new architecture is based on direct hardware support for the operations of discrete mathematics, reflected in special facilities for the processing of sets and data structures. Within the framework of the project, a special device was designed, a structure processor (SP), which improves performance without limiting the range of applications of the computer system. Previous works presented the basic principles of organizing the computational process in the MISD (Multiple Instructions, Single Data) system, showed the structure and features of the structure processor, and described the general principles of solving discrete optimization problems on graphs. This paper examines two minimum-spanning-tree search algorithms, namely Kruskal's and Prim's. It studies implementations of the algorithms for two SP operation modes: a coprocessor mode and a MISD mode. The paper presents an experimental comparison of the performance of the MISD system in coprocessor mode with mainframes.
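
    For reference, a conventional sequential Kruskal's algorithm with a union-find structure is sketched below; the set operations inside this loop are exactly the kind of work the structure processor supports in hardware:

        #include <algorithm>
        #include <cstdio>
        #include <numeric>
        #include <vector>

        // Conventional sequential Kruskal's algorithm with union-find, as a
        // reference; the paper maps these set operations onto the SP.
        struct Edge { int u, v, w; };

        int find(std::vector<int>& p, int x) {
            return p[x] == x ? x : p[x] = find(p, p[x]);  // path compression
        }

        int main() {
            const int n = 5;
            std::vector<Edge> edges = {{0,1,4},{0,2,1},{1,2,2},
                                       {1,3,5},{2,4,3},{3,4,6}};
            std::sort(edges.begin(), edges.end(),
                      [](const Edge& a, const Edge& b) { return a.w < b.w; });
            std::vector<int> parent(n);
            std::iota(parent.begin(), parent.end(), 0);
            int total = 0;
            for (const Edge& e : edges) {
                int ru = find(parent, e.u), rv = find(parent, e.v);
                if (ru != rv) {               // edge joins two components
                    parent[ru] = rv;
                    total += e.w;
                }
            }
            std::printf("MST weight = %d\n", total);  // 11
        }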

  4. High performance graphics processors for medical imaging applications

    International Nuclear Information System (INIS)

    Goldwasser, S.M.; Reynolds, R.A.; Talton, D.A.; Walsh, E.S.

    1989-01-01

    This paper describes a family of high-performance graphics processors with special hardware for interactive visualization of 3D human anatomy. The basic architecture expands to multiple parallel processors, each processor using pipelined arithmetic and logical units for high-speed rendering of Computed Tomography (CT), Magnetic Resonance (MR) and Positron Emission Tomography (PET) data. User-selectable display alternatives include multiple 2D axial slices, reformatted images in sagittal or coronal planes and shaded 3D views. Special facilities support applications requiring color-coded display of multiple datasets (such as radiation therapy planning), or dynamic replay of time-varying volumetric data (such as cine-CT or gated MR studies of the beating heart). The current implementation is a single processor system which generates reformatted images in true real time (30 frames per second), and shaded 3D views in a few seconds per frame. It accepts full scale medical datasets in their native formats, so that minimal preprocessing delay exists between data acquisition and display.

  5. An orthogonal wavelet division multiple-access processor architecture for LTE-advanced wireless/radio-over-fiber systems over heterogeneous networks

    Science.gov (United States)

    Mahapatra, Chinmaya; Leung, Victor CM; Stouraitis, Thanos

    2014-12-01

    The increase in internet traffic, number of users, and availability of mobile devices poses a challenge to wireless technologies. In the long-term evolution (LTE) advanced system, heterogeneous networks (HetNet) using centralized coordinated multipoint (CoMP) transmission of radio over optical fibers (LTE A-ROF) have provided a feasible way of satisfying user demands. In this paper, an orthogonal wavelet division multiple-access (OWDMA) processor architecture is proposed, which is shown to be better suited to LTE advanced systems than the orthogonal frequency division multiple access (OFDMA) used in LTE 3GPP rel.8 systems (3GPP, http://www.3gpp.org/DynaReport/36300.htm). ROF systems are a viable alternative to satisfy large data demands; hence, performance in ROF systems is also evaluated. To validate the architecture, the circuit is designed and synthesized on a Xilinx Virtex-6 field-programmable gate array (FPGA). The synthesis results show that the circuit performs with a clock period as short as 7.036 ns (i.e., a maximum clock frequency of 142.13 MHz) for a transform size of 512. A pipelined version of the architecture reduces the power consumption by approximately 89%. We compare our architecture with similar available architectures for resource utilization and timing, and provide a performance comparison with OFDMA systems for various quality metrics of communication systems. The OWDMA architecture is found to perform better than OFDMA in bit error rate (BER) versus signal-to-noise ratio (SNR) over the wireless channel as well as ROF media. It also gives higher throughput and mitigates the adverse effect of the peak-to-average power ratio (PAPR).

  6. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    Science.gov (United States)

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  7. Array processors based on Gaussian fraction-free method

    Energy Technology Data Exchange (ETDEWEB)

    Peng, S; Sedukhin, S [Aizu Univ., Aizuwakamatsu, Fukushima (Japan); Sedukhin, I

    1998-03-01

    The design of algorithmic array processors for solving linear systems of equations using the fraction-free Gaussian elimination method is presented. The design is based on a formal approach which systematically constructs a family of planar array processors. These array processors are synthesized and analyzed. It is shown that some of the array processors are optimal, within the framework of linear allocation of computations, in terms of the number of processing elements and the computing time. (author)
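
    The fraction-free (Bareiss) elimination underlying these arrays keeps every intermediate value integral, because each division is by the previous pivot and is exact. A scalar reference sketch (no pivoting, assumes nonzero pivots):

        #include <cstdio>
        #include <vector>

        // Scalar sketch of fraction-free (Bareiss) Gaussian elimination:
        // every division is by the previous pivot and is exact, so all
        // intermediate values stay integral. Returns det(A) for an n x n
        // integer matrix (no pivoting; assumes no zero pivot appears).
        long long bareiss_det(std::vector<std::vector<long long>> a) {
            const int n = (int)a.size();
            long long prev = 1;
            for (int k = 0; k + 1 < n; ++k) {
                for (int i = k + 1; i < n; ++i)
                    for (int j = k + 1; j < n; ++j)
                        a[i][j] = (a[k][k] * a[i][j] - a[i][k] * a[k][j]) / prev;
                prev = a[k][k];
            }
            return a[n - 1][n - 1];
        }

        int main() {
            std::printf("%lld\n", bareiss_det({{2,3,1},{4,1,5},{6,2,4}}));  // 32
        }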

  8. Very Long Instruction Word Processors

    Indian Academy of Sciences (India)

    Explicitly Parallel Instruction Computing (EPIC) is an instruction processing paradigm that has been in the spotlight due to its adoption by the next generation of Intel processors, starting with the IA-64. The EPIC processing paradigm is an evolution of the Very Long Instruction Word (VLIW) paradigm. This article gives an ...

  9. SWIMS: a small-angle multiple scattering computer code

    International Nuclear Information System (INIS)

    Sayer, R.O.

    1976-07-01

    SWIMS (Sigmund and WInterbon Multiple Scattering) is a computer code for calculation of the angular dispersion of ion beams that undergo small-angle, incoherent multiple scattering by gaseous or solid media. The code uses the tabulated angular distributions of Sigmund and Winterbon for a Thomas-Fermi screened Coulomb potential. The fraction of the incident beam scattered into a cone defined by the polar angle α is computed as a function of α for reduced thicknesses over the range 0.01 ≤ τ ≤ 10.0. 1 figure, 2 tables

  10. Building an organic computing device with multiple interconnected brains

    OpenAIRE

    Pais-Vieira, Miguel; Chiuffa, Gabriela; Lebedev, Mikhail; Yadav, Amol; Nicolelis, Miguel A. L.

    2015-01-01

    Recently, we proposed that Brainets, i.e. networks formed by multiple animal brains, cooperating and exchanging information in real time through direct brain-to-brain interfaces, could provide the core of a new type of computing device: an organic computer. Here, we describe the first experimental demonstration of such a Brainet, built by interconnecting four adult rat brains. Brainets worked by concurrently recording the extracellular electrical activity generated by populations of cortical ...

  11. Multiple-User, Multitasking, Virtual-Memory Computer System

    Science.gov (United States)

    Generazio, Edward R.; Roth, Don J.; Stang, David B.

    1993-01-01

    Computer system designed and programmed to serve multiple users in research laboratory. Provides for computer control and monitoring of laboratory instruments, acquisition and analysis of data from those instruments, and interaction with users via remote terminals. System provides fast access to shared central processing units and associated large (from megabytes to gigabytes) memories. Underlying concept of system also applicable to monitoring and control of industrial processes.

  12. Computed Tomography diagnosis of skeletal involvement in multiple myeloma

    International Nuclear Information System (INIS)

    Scutellari, Pier Nuccio; Galeotti, Roberto; Leprotti, Stefano; Piva, Nadia; Spanedda, Romedio

    1997-01-01

    The authors assess the role of Computed Tomography in the diagnosis and management of multiple myeloma (MM) and investigate whether Computed Tomography findings can influence the clinical approach, prognosis and treatment. 273 multiple myeloma patients were submitted to Computed Tomography from June 1994 to December 1996. The patients were 143 men and 130 women (mean age: 65 years); 143 were stage I, 38 stage II and 92 stage III according to Durie and Salmon's clinical classification. All patients were submitted to blood tests, spinal radiography and Computed Tomography, the latter with serial 5-mm scans over several vertebral bodies. Computed Tomography depicted vertebral arch and process involvement in 3 cases with the vertebral pedicle sign. Moreover, Computed Tomography proved superior to radiography in showing the spread of myelomatous masses into the soft tissues in a case with a solitary permeative lesion in the left pubic bone, which facilitated subsequent biopsy. As for extraosseous localizations, Computed Tomography demonstrated thoracic soft tissue (1 woman) and pelvic (1 man) involvement by myelomatous masses penetrating into surrounding tissues. In our series, only one case of osteosclerotic bone myeloma was observed, in the pelvis, associated with lytic abnormalities. Computed Tomography findings do not seem to improve the clinical approach and therapeutic management of the disease. Nevertheless, the authors recommend Computed Tomography in some myelomatous conditions, namely: a) in patients with focal bone pain but normal skeletal radiographs; b) in patients with M protein, bone marrow plasmocytosis and back pain, but with an inconclusive multiple myeloma diagnosis; c) to assess bone spread in regions which are anatomically complex or difficult to study with radiography, and to depict soft tissue involvement; d) for bone biopsy.

  13. Multitasking for flows about multiple body configurations using the chimera grid scheme

    Science.gov (United States)

    Dougherty, F. C.; Morgan, R. L.

    1987-01-01

    The multitasking of a finite-difference scheme using multiple overset meshes is described. In this chimera, or multiple overset mesh, approach, a multiple body configuration is mapped using a major grid about the main component of the configuration, with minor overset meshes used to map each additional component. This type of code is well suited to multitasking. Both steady and unsteady two-dimensional computations are run on parallel processors of a CRAY X-MP/48, usually with one mesh per processor. Flow field results are compared with single processor results to demonstrate the feasibility of running multiple mesh codes on parallel processors and to show the increase in efficiency.

  14. XL-100S microprogrammable processor

    International Nuclear Information System (INIS)

    Gorbunov, N.V.; Guzik, Z.; Sutulin, V.A.; Forytski, A.

    1983-01-01

    The XL-100S microprogrammable processor, which provides the multiprocessor operation mode in the XL system crate, is described. The processor meets the EUR 6500 CAMAC standards, addresses up to 4 Mbytes of memory, and interacts with 7 CAMAC branches. Eight external requests initiate operations preset by a sequence of microcommands held in a memory with a capacity of up to 64 kwords of 32 bits. The microprocessor architecture allows one to emulate the instruction sets of the majority of mini- or micro-computers, including floating point operations. The XL-100S processor may be used in various branches of experimental physics: for the control of physics experiment apparatus, fast selection of useful physical events, organization of input/output operations, including direct memory access, etc. The Am2900 microprocessor set is used as the component base. The device is made in the form of a single-width CAMAC module.

  15. Discrete Fourier transformation processor based on complex radix (−1 + j) number system

    Directory of Open Access Journals (Sweden)

    Anidaphi Shadap

    2017-02-01

    Complex radix (−1 + j) allows the arithmetic operations of complex numbers to be done without recourse to divide-and-conquer rules, which offers a significant speed improvement in complex-number computation circuitry. The design and hardware implementation of a complex radix (−1 + j) converter is introduced in this paper. Extensive simulation results are incorporated, and an application of this converter to the implementation of a discrete Fourier transformation (DFT) processor is presented. The functionality of the DFT processor has been verified in the Xilinx ISE design suite version 14.7, and performance parameters such as propagation delay and dynamic switching power consumption have been calculated with the Virtuoso platform in Cadence. The proposed DFT processor is implemented through conversion, multiplication and addition. The performance parameters, in terms of delay and power consumption, offer a significant improvement over other traditional implementations of the DFT processor.
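
    The conversion stage can be illustrated by the classic digit-extraction loop for radix (−1 + j), under which every Gaussian integer has a unique expansion with digits {0, 1}: a digit 1 is emitted whenever re + im is odd, and division by (−1 + j) uses the identity (−1 + j)(−1 − j) = 2. A software sketch (illustrative, not the paper's hardware converter):

        #include <cstdio>
        #include <string>

        // Convert a Gaussian integer re + im*j to radix (-1 + j) with digits
        // {0, 1}. A digit 1 is emitted when re + im is odd; division by
        // (-1 + j) uses z / (-1 + j) = z * (-1 - j) / 2, which is exact here.
        std::string to_base_m1pj(long long re, long long im) {
            if (re == 0 && im == 0) return "0";
            std::string digits;
            while (re != 0 || im != 0) {
                long long d = ((re + im) % 2 + 2) % 2;  // 1 iff re+im is odd
                digits.push_back(char('0' + d));
                re -= d;                                // make z divisible
                long long nre = (im - re) / 2;          // Re(z * (-1 - j) / 2)
                long long nim = -(re + im) / 2;         // Im(z * (-1 - j) / 2)
                re = nre; im = nim;
            }
            return std::string(digits.rbegin(), digits.rend());  // MSB first
        }

        int main() {
            std::printf("%s\n", to_base_m1pj(2, 0).c_str());  // prints 1100
        }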

  16. MULGRES: a computer program for stepwise multiple regression analysis

    Science.gov (United States)

    A. Jeff Martin

    1971-01-01

    MULGRES is a computer program source deck that is designed for multiple regression analysis employing the technique of stepwise deletion in the search for most significant variables. The features of the program, along with inputs and outputs, are briefly described, with a note on machine compatibility.

  17. Performance of Cloud Computing Centers with Multiple Priority Classes

    NARCIS (Netherlands)

    Ellens, W.; Zivkovic, Miroslav; Akkerboom, J.; Litjens, R.; van den Berg, Hans Leo

    In this paper we consider the general problem of resource provisioning within cloud computing. We analyze the problem of how to allocate resources to different clients such that the service level agreements (SLAs) for all of these clients are met. A model with multiple service request classes...

  18. Computer optimization of cutting yield from multiple ripped boards

    Science.gov (United States)

    A.R. Stern; K.A. McDonald

    1978-01-01

    RIPYLD is a computer program that optimizes the cutting yield from multiple-ripped boards. Decisions are based on automatically collected defect information, cutting bill requirements, and sawing variables. The yield of clear cuttings from a board is calculated for every possible permutation of specified rip widths and both the maximum and minimum percent yield...
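
    The core of such an optimization can be sketched as an exhaustive search (illustrative only; the real program also scores each combination against the defect data and cutting bill requirements): recursively try every rip width that still fits the remaining board width and keep the combination that wastes the least edge material:

        #include <cstdio>
        #include <vector>

        // Illustrative core of the yield search (the real program also scores
        // each combination against defect data and the cutting bill): try
        // every rip width that still fits the remaining board width, with
        // repetition, and keep the combination that wastes the least.
        void search(const std::vector<double>& widths, double remaining,
                    double used, double& best) {
            if (used > best) best = used;
            for (double w : widths)
                if (w <= remaining + 1e-9)
                    search(widths, remaining - w, used + w, best);
        }

        int main() {
            std::vector<double> rip = {3.25, 2.5, 1.75};  // candidate widths
            double board = 9.0, best = 0.0;               // board width
            search(rip, board, 0.0, best);
            std::printf("best usable width = %.2f of %.2f\n", best, board);
        }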

  19. Teacher regulation of multiple computer-supported collaborating groups

    NARCIS (Netherlands)

    Van Leeuwen, Anouschka; Janssen, Jeroen; Erkens, Gijsbert; Brekelmans, Mieke

    2015-01-01

    Teachers regulating groups of students during computer-supported collaborative learning (CSCL) face the challenge of orchestrating their guidance at student, group, and class level. During CSCL, teachers can monitor all student activity and interact with multiple groups at the same time. Not much is...

  1. Advanced Avionics and Processor Systems for a Flexible Space Exploration Architecture

    Science.gov (United States)

    Keys, Andrew S.; Adams, James H.; Smith, Leigh M.; Johnson, Michael A.; Cressler, John D.

    2010-01-01

    The Advanced Avionics and Processor Systems (AAPS) project, formerly known as the Radiation Hardened Electronics for Space Environments (RHESE) project, endeavors to develop advanced avionic and processor technologies anticipated to be used by NASA's currently evolving space exploration architectures. The AAPS project is a part of the Exploration Technology Development Program, which funds an entire suite of technologies aimed at enabling NASA's ability to explore beyond low earth orbit. NASA's Marshall Space Flight Center (MSFC) manages the AAPS project. AAPS uses a broad-scoped approach to developing avionic and processor systems. Investment areas include advanced electronic designs and technologies capable of providing environmental hardness, reconfigurable computing techniques, software tools for radiation effects assessment, and radiation environment modeling tools. Near-term emphasis within the multiple AAPS tasks focuses on developing prototype components using semiconductor processes and materials (such as Silicon-Germanium (SiGe)) to enhance a device's tolerance to radiation events and low temperature environments. As the SiGe technology will culminate in a delivered prototype this fiscal year, the project shifts its focus to developing low-power, high-efficiency total processor hardening techniques. In addition to processor development, the project endeavors to demonstrate techniques applicable to reconfigurable computing and partially reconfigurable Field Programmable Gate Arrays (FPGAs). This capability enables avionic architectures to use FPGA-based, radiation-tolerant processor boards that can serve in multiple physical locations throughout the spacecraft and perform multiple functions during the course of the mission. The individual tasks that comprise AAPS are diverse, yet united in the common endeavor to develop electronics capable of operating within the harsh environment of space. Specifically, the AAPS tasks for...

  2. A High Performance VLSI Computer Architecture For Computer Graphics

    Science.gov (United States)

    Chin, Chi-Yuan; Lin, Wen-Tai

    1988-10-01

    A VLSI computer architecture consisting of multiple processors is presented in this paper to satisfy the demands of modern computer graphics, e.g. high resolution, realistic animation, real-time display, etc. All processors share a global memory which is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy the specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection can be easily handled. With the current high density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.

  3. Probabilistic programmable quantum processors

    International Nuclear Information System (INIS)

    Buzek, V.; Ziman, M.; Hillery, M.

    2004-01-01

    We analyze how to improve the performance of probabilistic programmable quantum processors. We show how the probability of success of a probabilistic processor can be enhanced by using the processor in loops. In addition, we show that arbitrary SU(2) transformations of qubits can be encoded in the program state of a universal programmable probabilistic quantum processor. The probability of success of this processor can be enhanced by a systematic correction of errors via conditional loops. Finally, we show that all our results can be generalized to qudits. (Abstract Copyright [2004], Wiley Periodicals, Inc.)

  4. Numeric algorithms for parallel processors computer architectures with applications to the few-groups neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, S.K.

    1987-01-01

    A numeric algorithm and an associated computer code were developed for the rapid solution of the finite-difference representation of the few-group neutron-diffusion equations on parallel computers. Applications of the numeric algorithm on both SIMD (vector pipeline) and MIMD/SIMD (multi-CPU/vector pipeline) architectures were explored. The algorithm was successfully implemented in the two-group, 3-D neutron diffusion computer code named DIFPAR3D (DIFfusion PARallel 3-Dimension). Numerical-solution techniques used in the code include the Chebyshev polynomial acceleration technique in conjunction with the power method of outer iteration. For inner iterations, a parallel form of red-black (cyclic) line SOR, with automated determination of group-dependent relaxation factors and of the iteration numbers required to achieve a specified inner iteration error tolerance, is incorporated. The code employs a macroscopic depletion model with trace capability for selected fission products' transients and critical boron. In addition, moderator and fuel temperature feedback models are incorporated into the DIFPAR3D code for realistic simulation of power reactor cores. The physics models used were proven acceptable in separate benchmarking studies.
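
    The red-black coloring is what makes the inner iteration parallel: every point of one color depends only on points of the other color. A point (rather than line) red-black SOR sweep for a 2-D Laplace model problem, purely for illustration:

        #include <cstdio>
        #include <vector>

        // Point red-black SOR for a 2-D Laplace model problem. The code in
        // the paper uses *line* red-black SOR for the diffusion equations;
        // this point version just shows why the coloring parallelizes: all
        // points of one color depend only on points of the other color.
        int main() {
            const int n = 64;
            const double omega = 1.7;                   // relaxation factor
            std::vector<std::vector<double>> u(n, std::vector<double>(n, 0.0));
            for (int j = 0; j < n; ++j) u[0][j] = 1.0;  // fixed boundary
            for (int sweep = 0; sweep < 500; ++sweep)
                for (int color = 0; color < 2; ++color)     // red, then black
                    for (int i = 1; i + 1 < n; ++i)
                        for (int j = 1; j + 1 < n; ++j)
                            if ((i + j) % 2 == color) {
                                double gs = 0.25 * (u[i-1][j] + u[i+1][j]
                                                  + u[i][j-1] + u[i][j+1]);
                                u[i][j] += omega * (gs - u[i][j]);
                            }
            std::printf("u at center = %f\n", u[n/2][n/2]);
        }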

  5. Vertical Load Distribution for Cloud Computing via Multiple Implementation Options

    Science.gov (United States)

    Phan, Thomas; Li, Wen-Syan

    Cloud computing looks to deliver software as a provisioned service to end users, but the underlying infrastructure must be sufficiently scalable and robust. In our work, we focus on large-scale enterprise cloud systems and examine how enterprises may use a service-oriented architecture (SOA) to provide a streamlined interface to their business processes. To scale up the business processes, each SOA tier usually deploys multiple servers for load distribution and fault tolerance, a scenario which we term horizontal load distribution. One limitation of this approach is that load cannot be distributed further when all servers in the same tier are loaded. In complex multi-tiered SOA systems, a single business process may actually be implemented by multiple different computation pathways among the tiers, each with different components, in order to provide resilience and scalability. Such multiple implementation options gives opportunities for vertical load distribution across tiers. In this chapter, we look at a novel request routing framework for SOA-based enterprise computing with multiple implementation options that takes into account the options of both horizontal and vertical load distribution.

  6. On the implementation of the Ford-Fulkerson algorithm on the Multiple Instruction and Single Data computer system

    Directory of Open Access Journals (Sweden)

    A. Yu. Popov

    2014-01-01

    Algorithms for optimization in networks and directed graphs find broad application in practical tasks. However, with the large-scale introduction of information technologies into human activity, the requirements on input data volumes and solution speed keep growing. Although a large number of algorithms for various models of computers and computing systems have been studied and implemented, solving key optimization problems at realistic scale remains difficult. In this regard, the search for new, more efficient computing structures, as well as the updating of known algorithms, is of great current interest. This work considers an implementation of the maximum-flow search algorithm on a directed graph for the Multiple Instruction, Single Data (MISD) computer system developed at BMSTU. The key feature of this architecture is deep hardware support for operations over sets and data structures. The functions of storing and accessing them are realized in a specialized structure-processing processor (SP), which is capable of performing at the hardware level such operations as add, delete, search, intersect, complement, and merge. The advantage of such a system is the possibility of executing the set-access parts of computing tasks in parallel with the arithmetic and logical processing of information. Previous works presented the general principles of organizing the computational process and the features of programs implemented in the MISD system, described the structure and principles of functioning of the structure processor, showed the general principles of solving graph tasks in such a system, and experimentally studied the efficiency of the resulting algorithms. This work gives the command formats of the SP processor, offers a technique for updating the algorithms implemented in the MISD system, and suggests a variant of the Ford-Fulkerson algorithm...
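
    For readers outside the MISD context, a compact sequential Edmonds-Karp variant of Ford-Fulkerson (shortest augmenting paths found by BFS) is sketched below; the paper's contribution lies in mapping the set-oriented steps of such an algorithm onto the SP:

        #include <algorithm>
        #include <climits>
        #include <cstdio>
        #include <queue>
        #include <vector>

        // Sequential Edmonds-Karp (Ford-Fulkerson with BFS augmenting paths)
        // on an adjacency-matrix residual network, for reference; the paper
        // maps the set-oriented steps of such an algorithm onto the SP.
        int max_flow(std::vector<std::vector<int>> cap, int s, int t) {
            const int n = (int)cap.size();
            int flow = 0;
            for (;;) {
                std::vector<int> parent(n, -1);
                parent[s] = s;
                std::queue<int> q;
                q.push(s);
                while (!q.empty() && parent[t] == -1) {   // shortest path
                    int u = q.front(); q.pop();
                    for (int v = 0; v < n; ++v)
                        if (parent[v] == -1 && cap[u][v] > 0) {
                            parent[v] = u;
                            q.push(v);
                        }
                }
                if (parent[t] == -1) return flow;         // no path remains
                int aug = INT_MAX;                        // bottleneck
                for (int v = t; v != s; v = parent[v])
                    aug = std::min(aug, cap[parent[v]][v]);
                for (int v = t; v != s; v = parent[v]) {  // push + residual
                    cap[parent[v]][v] -= aug;
                    cap[v][parent[v]] += aug;
                }
                flow += aug;
            }
        }

        int main() {
            std::vector<std::vector<int>> cap = {
                {0, 3, 2, 0}, {0, 0, 1, 2}, {0, 0, 0, 3}, {0, 0, 0, 0}};
            std::printf("max flow = %d\n", max_flow(cap, 0, 3));  // 5
        }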

  7. A study on the effectiveness of lockup-free caches for a Reduced Instruction Set Computer (RISC) processor

    OpenAIRE

    Tharpe, Leonard.

    1992-01-01

    Approved for public release; distribution is unlimited. This thesis presents a simulation and analysis of the Reduced Instruction Set Computer (RISC) architecture and the effects of a lockup-free cache interface on RISC performance. RISC architectures achieve high performance by having a small but sufficient instruction set, with most instructions executing in one clock cycle. Current RISC performance ranges from 1.5 to 2.0 CPI. The goal of RISC is to attain a CPI of 1.0. The major hind...

  8. A high-accuracy optical linear algebra processor for finite element applications

    Science.gov (United States)

    Casasent, D.; Taylor, B. K.

    1984-01-01

    Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuracy to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables), which can exploit the speed of optical processors, require the 32-bit accuracy obtainable from digital machines. To obtain this required 32-bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantifies the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
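
    Multiplication by digital convolution, mentioned above, amounts to convolving the digit vectors of the operands and then propagating carries; this encoding is what lets a limited-dynamic-range analog processor deliver full-precision products. A scalar sketch:

        #include <cstdio>
        #include <vector>

        // Multiplication by digital convolution: the digit vectors of the two
        // operands (least-significant digit first) are convolved, then the
        // carries are propagated; this encoding lets a low-dynamic-range
        // analog processor produce full-precision products.
        std::vector<int> convmul(const std::vector<int>& a, const std::vector<int>& b) {
            std::vector<long long> conv(a.size() + b.size() - 1, 0);
            for (size_t i = 0; i < a.size(); ++i)       // raw digit convolution
                for (size_t j = 0; j < b.size(); ++j)
                    conv[i + j] += (long long)a[i] * b[j];
            std::vector<int> out;
            long long carry = 0;
            for (long long v : conv) {                  // carry propagation
                v += carry;
                out.push_back((int)(v % 10));
                carry = v / 10;
            }
            while (carry) { out.push_back((int)(carry % 10)); carry /= 10; }
            return out;                                 // LSD first
        }

        int main() {
            std::vector<int> p = convmul({7, 3}, {6, 2});  // 37 * 26 = 962
            for (auto it = p.rbegin(); it != p.rend(); ++it) std::printf("%d", *it);
            std::printf("\n");
        }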

  9. Computer studies of multiple-quantum spin dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Murdoch, J.B.

    1982-11-01

    The excitation and detection of multiple-quantum (MQ) transitions in Fourier transform NMR spectroscopy is an interesting problem in the quantum mechanical dynamics of spin systems as well as an important new technique for investigation of molecular structure. In particular, multiple-quantum spectroscopy can be used to simplify overly complex spectra or to separate the various interactions between a nucleus and its environment. The emphasis of this work is on computer simulation of spin-system evolution to better relate theory and experiment.

  10. A computer program for determining multiplicities of powder reflexions

    International Nuclear Information System (INIS)

    Rouse, K.D.; Cooper, M.J.

    1977-01-01

    A computer program has been written which determines the multiplicity factors for a given set of X-ray or neutron powder diffraction reflexions for crystals of any space group. The value of the multiplicity for each reflexion is determined from a look-up table which is indexed by the symmetry type, determined directly from the space-group number, and the reflexion type, determined from the Miller indices. There are no restrictions on the choice of indices which are used to specify the reflexions. (Auth.)

  11. Computing all hybridization networks for multiple binary phylogenetic input trees.

    Science.gov (United States)

    Albrecht, Benjamin

    2015-07-30

    The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior is interspecific hybridization events recombining genes of different species. An important approach to analyzing such events is the computation of hybridization networks. This work presents the first algorithm computing the hybridization number as well as a set of representative hybridization networks for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, containing an implementation of our algorithm, by comparing it to PIRNv2.0, which is so far the best available software computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. The algorithm is part of the software Hybroscale, which was developed specifically for the investigation of hybridization networks, including their computation and visualization. Hybroscale is freely available and runs on all three major operating systems. Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Moreover, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation.

  12. Parallel Computation on Multicore Processors Using Explicit Form of the Finite Element Method and C++ Standard Libraries

    Directory of Open Access Journals (Sweden)

    Rek Václav

    2016-11-01

    In this paper, modifications of an existing sequential code, written in the C or C++ programming language, for the calculation of various kinds of structures using the explicit form of the Finite Element Method (the Dynamic Relaxation Method, Explicit Dynamics) in the NEXX system are introduced. The NEXX system is the core of the engineering software NEXIS, Scia Engineer, RFEM and RENEX. It supports multithreaded execution, which can now be expressed at the level of the native C++ programming language using the standard libraries. Thanks to the high degree of abstraction that the contemporary C++ programming language provides, a library created in this way can be generalized for other uses of parallelism in computational mechanics.
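
    In the spirit of the paper, the sketch below parallelizes one explicit dynamic-relaxation step for a toy 1-D chain of nodes using only the C++ standard library; the model and all names are illustrative, not taken from the NEXX code:

        #include <cstdio>
        #include <thread>
        #include <vector>

        // Illustrative 1-D model: one explicit dynamic-relaxation step for a
        // chain of nodes, split across hardware threads using only the C++
        // standard library. Each thread writes a disjoint block of v, and u
        // is read-only during the step, so no locking is needed.
        int main() {
            const int n = 1000000;
            const double k = 1.0, m = 1.0, dt = 0.01, damping = 0.99;
            std::vector<double> u(n, 0.0), v(n, 0.0);
            u[n / 2] = 1.0;                              // initial displacement

            auto step_block = [&](int lo, int hi) {
                for (int i = lo; i < hi; ++i) {
                    double ul = (i > 0) ? u[i - 1] : 0.0;
                    double ur = (i + 1 < n) ? u[i + 1] : 0.0;
                    double f = k * (ul - 2.0 * u[i] + ur);  // internal force
                    v[i] = damping * (v[i] + dt * f / m);   // velocity update
                }
            };
            unsigned hc = std::thread::hardware_concurrency();
            const int nt = hc ? (int)hc : 4;
            std::vector<std::thread> pool;
            for (int t = 0; t < nt; ++t)   // partition nodes into blocks
                pool.emplace_back(step_block, t * n / nt, (t + 1) * n / nt);
            for (auto& th : pool) th.join();
            for (int i = 0; i < n; ++i) u[i] += dt * v[i];  // displacement
            std::printf("u[n/2] = %f\n", u[n / 2]);
        }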

  13. Neurovision processor for designing intelligent sensors

    Science.gov (United States)

    Gupta, Madan M.; Knopf, George K.

    1992-03-01

    A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi- functional vision sensor that performs a variety of information processing operations on time- varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.

  14. Computation of subsonic flow around airfoil systems with multiple separation

    Science.gov (United States)

    Jacob, K.

    1982-01-01

    A numerical method for computing the subsonic flow around multi-element airfoil systems was developed, allowing for flow separation at one or more elements. Besides multiple rear separation, short bubbles on the upper surface and cove bubbles can also be taken into account approximately. Compressibility effects for purely subsonic flow are likewise approximately accounted for. After its presentation, the method is applied to several examples and improved in some details. Finally, the present limitations and desirable extensions are discussed.

  15. Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions

    Directory of Open Access Journals (Sweden)

    Bérenger Bramas

    2018-04-01

    The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields. The High Performance Computing (HPC) community has therefore continuously invested a lot of effort to provide an efficient SpMV kernel on modern CPU architectures. Although it has been shown that block-based kernels help to achieve high performance, they are difficult to use in practice because of the zero padding they require. In the current paper, we propose new kernels using the AVX-512 instruction set, which makes it possible to use a blocking scheme without any zero padding in the matrix memory storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels highly optimized in assembly language. Considering that the optimal blocking size depends on the matrix, we also provide a method to predict the best kernel to be used utilizing a simple interpolation of results from previous executions. We compare the performance of our approach to that of the Intel MKL CSR kernel and the CSR5 open-source package on a set of standard benchmark matrices. We show that we can achieve significant improvements in many cases, both for sequential and for parallel executions. Finally, we provide the corresponding code in an open source library, called SPC5.
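
    For orientation, a baseline scalar CSR kernel for y = A*x is sketched below; the mask-based AVX-512 block kernels of the paper replace exactly this loop, and the comparison against the Intel MKL CSR kernel is made on the same operation:

        #include <cstdio>
        #include <vector>

        // Baseline CSR sparse matrix-vector product, y = A*x. The paper's
        // contribution is replacing this scalar loop with zero-padding-free,
        // mask-based AVX-512 block kernels; this version shows the data layout.
        void spmv_csr(const std::vector<int>& rowptr, const std::vector<int>& col,
                      const std::vector<double>& val, const std::vector<double>& x,
                      std::vector<double>& y) {
            const int rows = (int)rowptr.size() - 1;
            for (int r = 0; r < rows; ++r) {
                double acc = 0.0;
                for (int k = rowptr[r]; k < rowptr[r + 1]; ++k)
                    acc += val[k] * x[col[k]];   // gather from x by column index
                y[r] = acc;
            }
        }

        int main() {
            // 3x3 matrix [[4,0,1],[0,3,0],[2,0,5]] in CSR form
            std::vector<int> rowptr = {0, 2, 3, 5}, col = {0, 2, 1, 0, 2};
            std::vector<double> val = {4, 1, 3, 2, 5}, x = {1, 1, 1}, y(3);
            spmv_csr(rowptr, col, val, x, y);
            std::printf("%g %g %g\n", y[0], y[1], y[2]);   // 5 3 7
        }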

  16. Evaluation of myocardial ischemia by multiple detector computed tomography

    Energy Technology Data Exchange (ETDEWEB)

    Fernandes, Fabio Vieira, E-mail: rccury@me.com [Hospital do Coracao (HCor), Sao Paulo, SP (Brazil); Cury, Roberto Caldeira [Hospital Samaritano, Sao Paulo, SP (Brazil)

    2015-01-15

    For years, cardiovascular diseases have been the leading cause of death worldwide, with important social and economic consequences. Given this scenario, the search for a method capable of diagnosing coronary artery disease early and accurately has intensified. Coronary computed tomography angiography is already widely established for the stratification of coronary artery disease, and, more recently, computed tomography myocardial perfusion imaging has been providing relevant information by correlating ischemia with the coronary anatomy. The objective of this review is to describe the evaluation of myocardial ischemia by multiple detector computed tomography. The review draws on controlled clinical trials that show the possibility of a single method identifying the atherosclerotic load, the presence of coronary artery luminal narrowing and possible myocardial ischemia, by means of a fast, practical and reliable method validated by a multicenter study. (author)

  17. Invasive tightly coupled processor arrays

    CERN Document Server

    LARI, VAHID

    2016-01-01

    This book introduces new massively parallel multiprocessor system-on-chip (MPSoC) architectures called invasive tightly coupled processor arrays (TCPAs). It proposes strategies, architecture designs, and programming interfaces for invasive TCPAs that allow invading, and subsequently executing, loop programs with strict requirements or guarantees on non-functional execution qualities such as performance, power consumption, and reliability. For the first time, such a configurable processor array architecture consisting of locally interconnected VLIW processing elements can be claimed by programs, either in full or in part, using the principle of invasive computing. Invasive TCPAs provide unprecedented energy efficiency for the parallel execution of nested loop programs by avoiding any global memory access, unlike GPUs, and may even support loops with complex dependencies, such as loop-carried dependencies, that are not amenable to parallel execution on GPUs. For this purpose, the book proposes different invasion strategies for claiming a desire...

  18. Multithreading in vector processors

    Science.gov (United States)

    Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi

    2018-01-16

    In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
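
    The round-robin activation of a vectorized program-counter register can be illustrated with a small software sketch; the 4-thread count and the toy per-thread instruction memories are illustrative assumptions, not the patent's implementation.

        NUM_THREADS = 4                      # assumed thread count
        pc = [0] * NUM_THREADS               # the vectorized program counter register
        programs = [[f"op{t}_{i}" for i in range(3)] for t in range(NUM_THREADS)]

        for cycle in range(12):              # one thread issues per cycle
            t = cycle % NUM_THREADS          # round-robin selection
            if pc[t] < len(programs[t]):
                print(f"cycle {cycle:2d}: thread {t} executes {programs[t][pc[t]]}")
                pc[t] += 1                   # only the active thread's PC advances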

  20. Computer code MLCOSP for multiple-correlation and spectrum analysis with a hybrid computer

    International Nuclear Information System (INIS)

    Oguma, Ritsuo; Fujii, Yoshio; Usui, Hozumi; Watanabe, Koichi

    1975-10-01

    Usage of the computer code MLCOSP (Multiple Correlation and Spectrum) is described for a hybrid computer installed at JAERI. Functions of the hybrid computer and its terminal devices are utilized ingeniously in the code to reduce the complexity of the data handling that occurs in the analysis of multivariable experimental data, and to perform the analysis in perspective. Features of the code are as follows: experimental data can be fed to the digital computer through the analog part of the hybrid computer by connecting a data recorder; the computed results are displayed in figures, and hardcopies are taken when necessary; messages are shown on the terminal, so man-machine communication is possible; and data can also be entered through a keyboard, so case studies based on the results of the analysis are possible. (auth.)

  1. A high-speed digital signal processor for atmospheric radar, part 7.3A

    Science.gov (United States)

    Brosnahan, J. W.; Woodard, D. M.

    1984-01-01

    The Model SP-320 device is a monolithic realization of a complex general purpose signal processor, incorporating such features as a 32-bit ALU, a 16-bit x 16-bit combinatorial multiplier, and a 16-bit barrel shifter. The SP-320 is designed to operate as a slave processor to a host general purpose computer in applications such as coherent integration of a radar return signal in multiple ranges, or dedicated FFT processing. Presently available is an I/O module conforming to the Intel Multichannel interface standard; other I/O modules will be designed to meet specific user requirements. The main processor board includes input and output FIFO (First In First Out) memories, both with depths of 4096 W, to permit asynchronous operation between the source of data and the host computer. This design permits burst data rates in excess of 5 MW/s.
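
    Coherent integration of a radar return over multiple ranges, the slave-processor task mentioned above, amounts to summing complex samples per range gate across pulses. Below is a hedged sketch; the shapes and names are illustrative, not the SP-320 interface.

        import numpy as np

        n_pulses, n_ranges = 64, 512
        rng = np.random.default_rng(0)
        # simulated complex returns: one row per pulse, one column per range gate
        returns = (rng.normal(size=(n_pulses, n_ranges))
                   + 1j * rng.normal(size=(n_pulses, n_ranges)))

        integrated = returns.sum(axis=0)     # coherent sum in every range gate
        # For a phase-stable target the signal grows with n_pulses while noise
        # grows only with sqrt(n_pulses), which is why dedicating a fast slave
        # processor to this accumulation pays off.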

  2. Multi-processor network implementations in Multibus II and VME

    International Nuclear Information System (INIS)

    Briegel, C.

    1992-01-01

    ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)

  3. Evaluation of Network Reliability for Computer Networks with Multiple Sources

    Directory of Open Access Journals (Sweden)

    Yi-Kuei Lin

    2012-01-01

    Full Text Available Evaluating the reliability of a network with multiple sources to multiple sinks is a critical issue from the perspective of quality management. Due to the unrealistic definition of paths of network models in previous literature, existing models are not appropriate for real-world computer networks such as the Taiwan Advanced Research and Education Network (TWAREN). This paper proposes a modified stochastic-flow network model to evaluate the network reliability of a practical computer network with multiple sources where data is transmitted through several light paths (LPs). Network reliability is defined as the probability of delivering a specified amount of data from the sources to the sink. It is taken as a performance index to measure the service level of TWAREN. This paper studies the network reliability of the international portion of TWAREN from two sources (Taipei and Hsinchu) to one sink (New York) that goes through submarine and land surface cables between Taiwan and the United States.
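
    The reliability notion used here, the probability that at least a demanded amount of data can be delivered from the sources to the sink over links with random capacities, can be illustrated with a Monte Carlo sketch. The paper itself uses an exact stochastic-flow model over light paths; the topology, capacity states and demand below are made-up assumptions.

        import random
        import networkx as nx

        edges = [("Taipei", "relay", [0, 2, 4]),     # possible capacity states
                 ("Hsinchu", "relay", [0, 2, 4]),
                 ("relay", "NewYork", [0, 4, 8])]
        demand, trials, success = 6, 5000, 0
        random.seed(1)
        for _ in range(trials):
            G = nx.DiGraph()
            G.add_edge("src", "Taipei", capacity=demand)   # super-source joins
            G.add_edge("src", "Hsinchu", capacity=demand)  # the two sources
            for u, v, states in edges:
                G.add_edge(u, v, capacity=random.choice(states))
            flow, _ = nx.maximum_flow(G, "src", "NewYork")
            success += flow >= demand
        print("estimated reliability:", success / trials)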

  4. A data base processor semantics specification package

    Science.gov (United States)

    Fishwick, P. A.

    1983-01-01

    A Semantics Specification Package (DBPSSP) for the Intel Data Base Processor (DBP) is defined. DBPSSP serves as a collection of cross assembly tools that allow the analyst to assemble request blocks on the host computer for passage to the DBP. The assembly tools discussed in this report may be effectively used in conjunction with a DBP compatible data communications protocol to form a query processor, precompiler, or file management system for the database processor. The source modules representing the components of DBPSSP are fully commented and included.

  5. NMRFx Processor: a cross-platform NMR data processing program

    International Nuclear Information System (INIS)

    Norris, Michael; Fetler, Bayard; Marchant, Jan; Johnson, Bruce A.

    2016-01-01

    NMRFx Processor is a new program for the processing of NMR data. Written in the Java programming language, NMRFx Processor is a cross-platform application and runs on Linux, Mac OS X and Windows operating systems. The application can be run in both a graphical user interface (GUI) mode and from the command line. Processing scripts are written in the Python programming language and executed so that the low-level Java commands are automatically run in parallel on computers with multiple cores or CPUs. Processing scripts can be generated automatically from the parameters of NMR experiments or interactively constructed in the GUI. A wide variety of processing operations are provided, including methods for processing of non-uniformly sampled datasets using iterative soft thresholding. The interactive GUI also enables the use of the program as an educational tool for teaching basic and advanced techniques in NMR data analysis.

  6. NMRFx Processor: a cross-platform NMR data processing program

    Energy Technology Data Exchange (ETDEWEB)

    Norris, Michael; Fetler, Bayard [One Moon Scientific, Inc. (United States); Marchant, Jan [University of Maryland Baltimore County, Howard Hughes Medical Institute (United States); Johnson, Bruce A., E-mail: bruce.johnson@asrc.cuny.edu [One Moon Scientific, Inc. (United States)

    2016-08-15

    NMRFx Processor is a new program for the processing of NMR data. Written in the Java programming language, NMRFx Processor is a cross-platform application and runs on Linux, Mac OS X and Windows operating systems. The application can be run in both a graphical user interface (GUI) mode and from the command line. Processing scripts are written in the Python programming language and executed so that the low-level Java commands are automatically run in parallel on computers with multiple cores or CPUs. Processing scripts can be generated automatically from the parameters of NMR experiments or interactively constructed in the GUI. A wide variety of processing operations are provided, including methods for processing of non-uniformly sampled datasets using iterative soft thresholding. The interactive GUI also enables the use of the program as an educational tool for teaching basic and advanced techniques in NMR data analysis.

  7. A VLSI image processor via pseudo-mersenne transforms

    International Nuclear Information System (INIS)

    Sei, W.J.; Jagadeesh, J.M.

    1986-01-01

    The computational burden of image processing in medical fields, where a large amount of information must be processed quickly and accurately, has led to consideration of special-purpose image processor chip design for some time. The very large scale integration (VLSI) revolution has made it cost-effective and feasible to consider the design of special-purpose chips for medical imaging fields. This paper describes a VLSI CMOS chip suitable for parallel implementation of image processing algorithms and cyclic convolutions by using the Pseudo-Mersenne Number Transform (PMNT). The main advantages of the PMNT over the Fast Fourier Transform (FFT) are: (1) no multiplications are required; (2) integer arithmetic is used. The design and development of this processor, which operates on 32-point convolution or a 5 x 5 window image, are described.
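
    For reference, the computation a number-theoretic transform such as the PMNT accelerates is cyclic convolution in pure integer arithmetic modulo a pseudo-Mersenne number. Below is a direct (non-transform) sketch with an illustrative modulus, not necessarily the chip's; the PMNT computes the same result via transforms whose roots are powers of two, so the transform stage needs only shifts and adds rather than general multiplications.

        M = 2**13 - 1   # illustrative pseudo-Mersenne modulus

        def cyclic_convolve_mod(a, b, m=M):
            # direct O(n^2) integer reference implementation
            n = len(a)
            return [sum(a[j] * b[(i - j) % n] for j in range(n)) % m
                    for i in range(n)]

        print(cyclic_convolve_mod([1, 2, 3, 4], [4, 3, 2, 1]))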

  8. Keystone Business Models for Network Security Processors

    OpenAIRE

    Arthur Low; Steven Muegge

    2013-01-01

    Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor...

  9. Application of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the OECD/NRC BWR turbine trip benchmark and its performance on multi-processor computers

    International Nuclear Information System (INIS)

    Langenbuch, S.; Schmidt, K.D.; Velkov, K.

    2003-01-01

    The OECD/NRC BWR Turbine Trip (TT) Benchmark is investigated to perform code-to-code comparison of coupled codes, including a comparison to measured data which are available from turbine trip experiments at Peach Bottom 2. This benchmark problem for a BWR over-pressure transient represents a challenging application of coupled codes, which integrate 3-dimensional neutron kinetics into thermal-hydraulic system codes for best-estimate simulation of plant transients. Such transients are typical applications of coupled codes, which are usually run on powerful workstations using a single CPU. Nowadays, multi-CPU systems are much more readily available: powerful workstations already provide 4 to 8 CPUs, and computer centers give access to multi-processor systems with CPU counts on the order of 16 up to several hundred. Therefore, the performance of the coupled code Athlet-Quabox/Cubbox on multi-processor systems is studied. Different applications lead to changing requirements on code efficiency, because the amount of computer time spent in different parts of the code varies. This paper presents main results of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the BWR TT Benchmark, together with evaluations of the code performance on multi-processor computers. (authors)

  10. Distributed processor systems

    International Nuclear Information System (INIS)

    Zacharov, B.

    1976-01-01

    In recent years, there has been a growing tendency in high-energy physics and in other fields to solve computational problems by distributing tasks among the resources of inter-coupled processing devices and associated system elements. This trend has gained further momentum more recently with the increased availability of low-cost processors and with the development of the means of data distribution. In two lectures, the broad question of distributed computing systems is examined and the historical development of such systems reviewed. An attempt is made to examine the reasons for the existence of these systems and to discern the main trends for the future. The components of distributed systems are discussed in some detail and particular emphasis is placed on the importance of standards and conventions in certain key system components. The ideas and principles of distributed systems are discussed in general terms, but these are illustrated by a number of concrete examples drawn from the context of the high-energy physics environment. (Auth.)

  11. Optimal design of structures with multiple design variables per group and multiple loading conditions on the personal computer

    Science.gov (United States)

    Nguyen, D. T.; Rogers, J. L., Jr.

    1986-01-01

    A finite element based programming system for minimum weight design of a truss-type structure subjected to displacement, stress, and lower and upper bounds on design variables is presented. The programming system consists of a number of independent processors, each performing a specific task. These processors, however, are interfaced through a well-organized data base, thus making the tasks of modifying, updating, or expanding the programming system much easier in a friendly environment provided by many inexpensive personal computers. The proposed software can be viewed as an important step in achieving a 'dummy' finite element for optimization. The programming system has been implemented on both large and small computers (such as VAX, CYBER, IBM-PC, and APPLE) although the focus is on the latter. Examples are presented to demonstrate the capabilities of the code. The present programming system can be used stand-alone or as part of the multilevel decomposition procedure to obtain optimum design for very large scale structural systems. Furthermore, other related research areas such as developing optimization algorithms (or in the larger level: a structural synthesis program) for future trends in using parallel computers may also benefit from this study.

  12. Multi-Core Processor Memory Contention Benchmark Analysis Case Study

    Science.gov (United States)

    Simon, Tyler; McGalliard, James

    2009-01-01

    Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.

  13. VIRTUS: a multi-processor system in FASTBUS

    International Nuclear Information System (INIS)

    Ellett, J.; Jackson, R.; Ritter, R.; Schlein, P.; Yaeger, D.; Zweizig, J.

    1986-01-01

    VIRTUS is a system of parallel MC68000-based processors interconnected by FASTBUS that is used either on-line as an intelligent trigger component or off-line for full event processing. Each processor receives the complete set of data from one event. The host computer, a VAX 11/780, down-line loads all software to the processors, controls and monitors the functioning of all processors, and writes processed data to tape. Instructions, programs, and data are transferred among the processors and the host in the form of fixed format, variable length data blocks. (Auth.)

  14. A computer program for multiple decrement life table analyses.

    Science.gov (United States)

    Poole, W K; Cooley, P C

    1977-06-01

    Life table analysis has traditionally been the tool of choice in analyzing distributions of "survival" times when a parametric form for the survival curve could not be reasonably assumed. Chiang, in two papers [1,2], formalized the theory of life table analyses in a Markov chain framework and derived maximum likelihood estimates of the relevant parameters for the analyses. He also discussed how the techniques could be generalized to consider competing risks and follow-up studies. Although various computer programs exist for doing different types of life table analysis [3], to date there has not been a generally available, well-documented computer program to carry out multiple decrement analyses, either by Chiang's or any other method. This paper describes such a program developed by Research Triangle Institute. A user's manual, available at printing cost, supplements the contents of this paper with a discussion of the formulas used and the program listing.
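
    One step of a multiple-decrement computation can be sketched with the standard actuarial quantities; these are the textbook formulas, not necessarily the exact estimators implemented in the RTI program, and the counts are made up.

        l = 100_000                      # survivors entering the age interval
        deaths = {"cause_A": 150, "cause_B": 60, "cause_C": 30}   # made-up counts

        # crude probability of dying from each cause during the interval
        q = {cause: d / l for cause, d in deaths.items()}
        l_next = l - sum(deaths.values())    # survivors entering the next interval
        print(q, l_next)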

  15. An algebraic substructuring using multiple shifts for eigenvalue computations

    International Nuclear Information System (INIS)

    Ko, Jin Hwan; Jung, Sung Nam; Byun, Do Young; Bai, Zhaojun

    2008-01-01

    Algebraic substructuring (AS) is a state-of-the-art method in eigenvalue computations, especially for large-sized problems, but originally it was designed to calculate only the smallest eigenvalues. Recently, an updated version of AS has been introduced to calculate the interior eigenvalues over a specified range by using a shift concept; this variant is referred to as the shifted AS. In this work, we propose a method combining AS and the shifted AS that uses multiple shifts to compute a considerable number of eigensolutions of a large-sized problem, which is an emerging computational issue in the noise and vibration analysis of vehicle design. In addition, we investigate the accuracy of the shifted AS by presenting an error criterion. The proposed method has been applied to the FE model of an automobile body. The combined method yielded a higher efficiency without loss of accuracy in comparison to the original AS.
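
    The multiple-shift idea itself, sweeping several shifts across the range of interest and collecting the eigenvalues found near each, can be sketched with SciPy's shift-invert mode. A toy diagonal matrix stands in for the FE model here; the AS substructuring internals are not reproduced.

        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.linalg import eigsh

        n = 2000
        A = sp.diags([np.arange(1.0, n + 1)], [0], format="csc")  # toy "stiffness"

        found = set()
        for sigma in (50.0, 150.0, 250.0):           # the multiple shifts
            vals = eigsh(A, k=8, sigma=sigma, return_eigenvectors=False)
            found.update(np.round(vals, 6))          # eigenvalues nearest each shift
        print(sorted(found))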

  16. Monitoring system of multiple fire fighting based on computer vision

    Science.gov (United States)

    Li, Jinlong; Wang, Li; Gao, Xiaorong; Wang, Zeyong; Zhao, Quanke

    2010-10-01

    With the high demand for fire control in spacious buildings, computer vision is playing a more and more important role. This paper presents a new monitoring system for multiple fire fighting based on computer vision and color detection. The system locates the fire position and then extinguishes the fire by itself. In this paper, the system structure, working principle, fire orientation, hydrant angle adjustment and system calibration are described in detail; the design of the relevant hardware and software is also introduced. The principle and process of color detection and image processing are given as well. The system ran well in tests, showing high reliability, low cost, and easy node expansion, which gives it a bright prospect for application and popularization.
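
    A typical color-detection stage for such a system can be sketched with OpenCV HSV thresholding; the file name and threshold values below are illustrative assumptions, not the paper's calibrated values.

        import cv2

        frame = cv2.imread("hall_camera.jpg")        # hypothetical camera frame
        if frame is None:
            raise SystemExit("no frame available")
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        # flame-like pixels: reddish/orange hue, high saturation and brightness
        mask = cv2.inRange(hsv, (0, 120, 180), (35, 255, 255))
        m = cv2.moments(mask)
        if m["m00"] > 0:                             # a fire candidate was found
            cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
            print("aim hydrant toward pixel", (int(cx), int(cy)))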

  17. Neural Computations in a Dynamical System with Multiple Time Scales.

    Science.gov (United States)

    Mi, Yuanyuan; Lin, Xiaohan; Wu, Si

    2016-01-01

    Neural systems display rich short-term dynamics at various levels, e.g., spike-frequency adaptation (SFA) at the single-neuron level, and short-term facilitation (STF) and depression (STD) at the synapse level. These dynamical features typically cover a broad range of time scales and exhibit large diversity in different brain regions. It remains unclear what the computational benefit is for the brain to have such variability in short-term dynamics. In this study, we propose that the brain can exploit these dynamical features to implement multiple seemingly contradictory computations in a single neural circuit. To demonstrate this idea, we use a continuous attractor neural network (CANN) as a working model and include STF, SFA and STD with increasing time constants in its dynamics. Three computational tasks are considered: persistent activity, adaptation, and anticipative tracking. These tasks require conflicting neural mechanisms, and hence cannot be implemented by a single dynamical feature or any combination of features with similar time constants. However, with properly coordinated STF, SFA and STD, we show that the network is able to implement the three computational tasks concurrently. We hope this study will shed light on the understanding of how the brain orchestrates its rich dynamics at various levels to realize diverse cognitive functions.

  18. A computer interface for processing multi-parameter data of multiple event types

    International Nuclear Information System (INIS)

    Katayama, I.; Ogata, H.

    1980-01-01

    A logic circuit called a 'Raw Data Processor (RDP)', which functions as an interface between ADCs and the PDP-11 computer, has been developed at RCNP, Osaka University, for general use. It enables simultaneous data processing for up to 16 event types, and an arbitrary combination of up to 14 ADCs can be assigned to each event type by means of a pinboard matrix. The details of the RDP and its application are described. (orig.)

  19. Effect of processor temperature on film dosimetry

    International Nuclear Information System (INIS)

    Srivastava, Shiv P.; Das, Indra J.

    2012-01-01

    Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (d_max, 10 × 10 cm², 100 cm) to a given dose. An automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4–40.6°C (85–105°F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.

  20. Optical Associative Processors For Visual Perception

    Science.gov (United States)

    Casasent, David; Telfer, Brian

    1988-05-01

    We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given the most attention). We analyze the performance of both associative processors and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature space data), and the analysis of multiple objects in high noise (achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature space and symbolic data, as well as adaptive associative processors.

  1. Online Fastbus processor for LEP

    International Nuclear Information System (INIS)

    Mueller, H.

    1986-01-01

    The author describes the online computing aspects of Fastbus systems using a processor module which has been developed at CERN and is now available commercially. These General Purpose Master/Slaves (GPMS) are based on 68000/10 (or optionally 68020/68881) processors. Applications include use as event-filters (DELPHI), supervisory controllers, Fastbus stand-alone diagnostic tools, and multiprocessor array components. The direct mapping of single, 32-bit assembly instructions to execute Fastbus protocols makes the use of a GPM both simple and flexible. Loosely coupled processing in Fastbus networks is possible between GPM's as they support access semaphores and use a two port memory as I/O buffer for Fastbus. Both master and slave-ports support block transfers up to 20 Mbytes/s. The CERN standard Fastbus software and the MoniCa symbolic debugging monitor are available on the GPM with real time, multiprocessing support. (Auth.)

  2. Integrated fuel processor development

    International Nuclear Information System (INIS)

    Ahmed, S.; Pereira, C.; Lee, S. H. D.; Krumpelt, M.

    2001-01-01

    The Department of Energy's Office of Advanced Automotive Technologies has been supporting the development of fuel-flexible fuel processors at Argonne National Laboratory. These fuel processors will enable fuel cell vehicles to operate on fuels available through the existing infrastructure. The constraints of on-board space and weight require that these fuel processors be designed to be compact and lightweight, while meeting the performance targets for efficiency and gas quality needed for the fuel cell. This paper discusses the performance of a prototype fuel processor that has been designed and fabricated to operate with liquid fuels, such as gasoline, ethanol, methanol, etc. Rated for a capacity of 10 kWe (one-fifth of that needed for a car), the prototype fuel processor integrates the unit operations (vaporization, heat exchange, etc.) and processes (reforming, water-gas shift, preferential oxidation reactions, etc.) necessary to produce the hydrogen-rich gas (reformate) that will fuel the polymer electrolyte fuel cell stacks. The fuel processor work is being complemented by analytical and fundamental research. With the ultimate objective of meeting on-board fuel processor goals, these studies include: modeling fuel cell systems to identify design and operating features; evaluating alternative fuel processing options; and developing appropriate catalysts and materials. Issues and outstanding challenges that need to be overcome in order to develop practical, on-board devices are discussed

  3. Special processor for in-core control systems

    International Nuclear Information System (INIS)

    Golovanov, M.N.; Duma, V.R.; Levin, G.L.; Mel'nikov, A.V.; Polikanin, A.V.; Filatov, V.P.

    1978-01-01

    The BUTs-20 special processor is discussed, designed to control the units of the in-core control equipment which are incorporated into the VECTOR communication channel, and to provide preliminary data processing prior to computer calculations. A set of instructions and flowsheet of the processor, organization of its communication with memories and other units of the system are given. The processor components: a control unit and an arithmetic logical unit are discussed. It is noted that the special processor permits more effective utilization of the computer time

  4. Logistic Fuel Processor Development

    National Research Council Canada - National Science Library

    Salavani, Reza

    2004-01-01

    ... to light gases, then steam reform the light gases into a hydrogen-rich stream. This report documents the efforts in developing a fuel processor capable of providing hydrogen to a 3 kW fuel cell stack...

  5. 3081/E processor

    International Nuclear Information System (INIS)

    Kunz, P.F.; Gravina, M.; Oxoby, G.

    1984-04-01

    The 3081/E project was formed to prepare a much improved IBM mainframe emulator for the future. Its design is based on a large amount of experience in using the 168/E processor to increase available CPU power in both online and offline environments. The processor will be at least equal to the execution speed of a 370/168 and up to 1.5 times faster for heavy floating point code. A single processor will thus be at least four times more powerful than the VAX 11/780, and five processors on a system would equal at least the performance of the IBM 3081K. With its large memory space and simple but flexible high speed interface, the 3081/E is well suited for the online and offline needs of high energy physics in the future

  6. Logistic Fuel Processor Development

    National Research Council Canada - National Science Library

    Salavani, Reza

    2004-01-01

    The Air Base Technologies Division of the Air Force Research Laboratory has developed a logistic fuel processor that removes the sulfur content of the fuel and in the process converts logistic fuel...

  7. Adaptive signal processor

    Energy Technology Data Exchange (ETDEWEB)

    Walz, H.V.

    1980-07-01

    An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μs. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed.

  8. Adaptive signal processor

    International Nuclear Information System (INIS)

    Walz, H.V.

    1980-07-01

    An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μsec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed

  9. Supertracker: A Programmable Parallel Pipeline Arithmetic Processor For Auto-Cueing Target Processing

    Science.gov (United States)

    Mack, Harold; Reddi, S. S.

    1980-04-01

    Supertracker represents a programmable parallel pipeline computer architecture that has been designed to meet the real-time image processing requirements of auto-cueing target data processing. The prototype breadboard currently under development will be designed to perform input video preprocessing and processing for 525-line and 875-line TV format FLIR video, automatic display gain and contrast control, and automatic target cueing, classification, and tracking. The video preprocessor is capable of performing operations on full frames of video data in real time, e.g., frame integration, storage, 3 x 3 convolution, and neighborhood processing. The processor architecture is being implemented using bit-slice microprogrammable arithmetic processors operating in parallel. Each processor is capable of up to 20 million operations per second. Multiple frame memories are used for additional flexibility.

  10. Multiple single-element transducer photoacoustic computed tomography system

    Science.gov (United States)

    Kalva, Sandeep Kumar; Hui, Zhe Zhi; Pramanik, Manojit

    2018-02-01

    Light absorption by the chromophores (hemoglobin, melanin, water, etc.) present in any biological tissue results in a local temperature rise. This rise in temperature generates pressure waves due to the thermoelastic expansion of the tissue. In a circular scanning photoacoustic computed tomography (PACT) system, these pressure waves can be detected using a single-element ultrasound transducer (SUST) rotating a full 360° around the sample, or using a circular array transducer. A SUST takes several minutes to acquire the PA data around the sample, whereas a circular array transducer takes only a fraction of a second. Hence, for real-time imaging, circular array transducers are preferred. However, circular array transducers are custom made, expensive and not easily available in the market, whereas SUSTs are cheap and readily available, so using SUSTs for PACT systems remains cost effective. In order to reduce the scanning time to a few seconds, instead of using a single SUST (rotating 360°), multiple SUSTs can be used at the same time to acquire the PA data. This reduces the scanning time twofold with two SUSTs (each rotating 180°), or fourfold and eightfold with four SUSTs (rotating 90°) and eight SUSTs (rotating 45°), respectively. Here we show that with multiple SUSTs, similar PA images (numerical and experimental phantom data) can be obtained as with a single SUST.

  11. Continuous analog of multiplicative algebraic reconstruction technique for computed tomography

    Science.gov (United States)

    Tateishi, Kiyoko; Yamaguchi, Yusaku; Abou Al-Ola, Omar M.; Kojima, Takeshi; Yoshinaga, Tetsuya

    2016-03-01

    We propose a hybrid dynamical system as a continuous analog to the block-iterative multiplicative algebraic reconstruction technique (BI-MART), which is a well-known iterative image reconstruction algorithm for computed tomography. The hybrid system is described by a switched nonlinear system with a piecewise smooth vector field or differential equation and, for consistent inverse problems, the convergence of non-negatively constrained solutions to a globally stable equilibrium is guaranteed by the Lyapunov theorem. Namely, we can prove theoretically that a weighted Kullback-Leibler divergence measure can be a common Lyapunov function for the switched system. We show that discretizing the differential equation by using the first-order approximation (Euler's method) based on the geometric multiplicative calculus leads to the same iterative formula of the BI-MART with the scaling parameter as a time-step of numerical discretization. The present paper is the first to reveal that a kind of iterative image reconstruction algorithm is constructed by the discretization of a continuous-time dynamical system for solving tomographic inverse problems. Iterative algorithms with not only the Euler method but also the Runge-Kutta methods of lower-orders applied for discretizing the continuous-time system can be used for image reconstruction. A numerical example showing the characteristics of the discretized iterative methods is presented.
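
    The discrete update that the continuous-time system generalizes is the multiplicative MART step. A minimal sketch on a toy 3x3 system follows; the block-iterative variant processes subsets of rows per step, and the relaxation parameter lam plays the role of the time step discussed above.

        import numpy as np

        A = np.array([[1.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0],
                      [1.0, 0.0, 1.0]])   # toy system matrix (projections)
        b = np.array([2.0, 3.0, 3.0])     # measured data
        x = np.ones(3)                    # non-negative starting image
        lam = 1.0                         # relaxation parameter ("time step")
        for _ in range(200):
            for i in range(len(b)):
                ratio = b[i] / (A[i] @ x)
                x *= ratio ** (lam * A[i])    # multiplicative update keeps x >= 0
        print(x)                              # approaches the solution (1, 1, 2)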

  12. Neural Computations in a Dynamical System with Multiple Time Scales

    Directory of Open Access Journals (Sweden)

    Yuanyuan Mi

    2016-09-01

    Full Text Available Neural systems display rich short-term dynamics at various levels, e.g., spike-frequency adaptation (SFA) at single neurons, and short-term facilitation (STF) and depression (STD) at neuronal synapses. These dynamical features typically cover a broad range of time scales and exhibit large diversity in different brain regions. It remains unclear what the computational benefit for the brain to have such variability in short-term dynamics is. In this study, we propose that the brain can exploit such dynamical features to implement multiple seemingly contradictory computations in a single neural circuit. To demonstrate this idea, we use continuous attractor neural network (CANN) as a working model and include STF, SFA and STD with increasing time constants in their dynamics. Three computational tasks are considered, which are persistent activity, adaptation, and anticipative tracking. These tasks require conflicting neural mechanisms, and hence cannot be implemented by a single dynamical feature or any combination with similar time constants. However, with properly coordinated STF, SFA and STD, we show that the network is able to implement the three computational tasks concurrently. We hope this study will shed light on the understanding of how the brain orchestrates its rich dynamics at various levels to realize diverse cognitive functions.

  13. Simulation of a parallel processor on a serial processor: The neutron diffusion equation

    International Nuclear Information System (INIS)

    Honeck, H.C.

    1981-01-01

    Parallel processors could provide the nuclear industry with very high computing power at a very moderate cost. Will we be able to make effective use of this power? This paper explores the use of a very simple parallel processor for solving the neutron diffusion equation to predict power distributions in a nuclear reactor. We first describe a simple parallel processor and estimate its theoretical performance based on the current hardware technology. Next, we show how the parallel processor could be used to solve the neutron diffusion equation. We then present the results of some simulations of a parallel processor run on a serial processor and measure some of the expected inefficiencies. Finally we extrapolate the results to estimate how actual design codes would perform. We find that the standard numerical methods for solving the neutron diffusion equation are still applicable when used on a parallel processor. However, some simple modifications to these methods will be necessary if we are to achieve the full power of these new computers. (orig.) [de]
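
    The property that makes such methods parallel-friendly is visible in a Jacobi-style sweep: every interior point of the mesh is updated independently from the previous iterate, so each processing element can own a block of the mesh. A hedged 2-D sketch follows (a Laplace-like stand-in, not a production diffusion code).

        import numpy as np

        n = 64
        phi = np.zeros((n, n))
        phi[0, :] = 1.0                   # illustrative boundary source
        for _ in range(500):
            new = phi.copy()
            # each interior point depends only on the *previous* iterate,
            # so all updates can proceed in parallel
            new[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                                      phi[1:-1, :-2] + phi[1:-1, 2:])
            phi = new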

  14. Assembly processor program converts symbolic programming language to machine language

    Science.gov (United States)

    Pelto, E. V.

    1967-01-01

    Assembly processor program converts symbolic programming language to machine language. This program translates symbolic codes into computer-understandable instructions, assigns locations in storage for successive instructions, and computes locations from symbolic addresses.

  15. Functional unit for a processor

    NARCIS (Netherlands)

    Rohani, A.; Kerkhoff, Hans G.

    2013-01-01

    The invention relates to a functional unit for a processor, such as a Very Large Instruction Word Processor. The invention further relates to a processor comprising at least one such functional unit. The invention further relates to a functional unit and processor capable of mitigating the effect of

  16. Accuracy Limitations in Optical Linear Algebra Processors

    Science.gov (United States)

    Batsell, Stephen Gordon

    1990-01-01

    One of the limiting factors in applying optical linear algebra processors (OLAPs) to real-world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, noise from spatial variations across arrays, and from crosstalk. In this dissertation, we propose a second-order statistical model for an OLAP which incorporates all these system noise sources. We now apply this knowledge to determining upper and lower bounds on the achievable accuracy. This is accomplished by first translating the standard definition of accuracy used in electronic digital processors to analog optical processors. We then employ our second-order statistical model. Having determined a general accuracy equation, we consider limiting cases such as for ideal and noisy components. From the ideal case, we find the fundamental limitations on improving analog processor accuracy. From the noisy case, we determine the practical limitations based on both device and system noise sources. These bounds allow system trade-offs to be made both in the choice of architecture and in individual components in such a way as to maximize the accuracy of the processor. Finally, by determining the fundamental limitations, we show the system engineer when the accuracy desired can be achieved from hardware or architecture improvements and when it must come from signal pre-processing and/or post-processing techniques.

  17. A fast inner product processor based on equal alignments

    Energy Technology Data Exchange (ETDEWEB)

    Smith, S.P.; Torng, H.C.

    1985-11-01

    Inner product computation is an important operation, invoked repeatedly in matrix multiplications. A high-speed inner product processor can be very useful (among many possible applications) in real-time signal processing. This paper presents the design of a fast inner product processor, with appreciably reduced latency and cost. The inner product processor is implemented with a tree of carry-propagate or carry-save adders; this structure is obtained by incorporating three innovations in the conventional multiply/add tree: The leaf-multipliers are expanded into adder subtrees, thus achieving an O(log Nb) latency, where N denotes the number of elements in a vector and b the number of bits in each element. The partial products, to be summed in producing an inner product, are reordered according to their "minimum alignments." This reordering brings approximately a 20% savings in hardware, including adders and data paths. The reduction in adder widths also yields savings in carry propagation time for carry-propagate adders. For trees implemented with carry-save adders, the partial product reordering also serves to truncate the carry propagation chain in the final propagation stage by 2 log b - 1 positions, thus reducing the latency significantly further. A form of the Baugh and Wooley algorithm is adopted to implement two's complement notation with changes only in peripheral hardware.
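
    The latency advantage of the adder tree can be seen in a software analogue: summing the partial products pairwise gives O(log N) depth instead of the O(N) chain of a serial accumulator. This sketch mirrors the structure only; the alignment-based reordering and carry-save details are hardware concerns.

        def tree_sum(terms):
            # pairwise reduction: the number of "levels" is ceil(log2(len(terms)))
            while len(terms) > 1:
                if len(terms) % 2:
                    terms.append(0)
                terms = [terms[i] + terms[i + 1] for i in range(0, len(terms), 2)]
            return terms[0]

        def inner_product(a, b):
            return tree_sum([x * y for x, y in zip(a, b)])

        print(inner_product([1, 2, 3, 4], [5, 6, 7, 8]))   # 70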

  18. The UA1 upgrade calorimeter trigger processor

    International Nuclear Information System (INIS)

    Bains, M.; Charleton, D.; Ellis, N.; Garvey, J.; Gregory, J.; Jimack, M.P.; Jovanovic, P.; Kenyon, I.R.; Baird, S.A.; Campbell, D.; Cawthraw, M.; Coughlan, J.; Flynn, P.; Galagedera, S.; Grayer, G.; Halsall, R.; Shah, T.P.; Stephens, R.; Biddulph, P.; Eisenhandler, E.; Fensome, I.F.; Landon, M.; Robinson, D.; Oliver, J.; Sumorok, K.

    1990-01-01

    The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no dead time. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (orig.)

  19. The UA1 upgrade calorimeter trigger processor

    International Nuclear Information System (INIS)

    Bains, N.; Baird, S.A.; Biddulph, P.

    1990-01-01

    The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no deadtime. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (author)

  20. A prediction method for job runtimes on shared processors: Survey, statistical analysis and new avenues

    NARCIS (Netherlands)

    Dobber, A.M.; van der Mei, R.D.; Koole, G.M.

    2007-01-01

    Grid computing is an emerging technology by which huge numbers of processors over the world create a global source of processing power. Their collaboration makes it possible to perform computations that are too extensive to perform on a single processor. On a grid, processors may connect and

  1. Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

    Science.gov (United States)

    Downie, John D.; Goodman, Joseph W.

    1989-10-01

    The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.

  2. Temporal fringe pattern analysis with parallel computing

    International Nuclear Information System (INIS)

    Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

    2005-01-01

    Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution times were reduced by a factor of 1.6 when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reducing execution times in temporal fringe pattern analysis.
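
    The single-program multiple-data split can be sketched with a process pool: every worker runs the same per-frame routine on its share of the fringe frames. The analysis function below is a stand-in stub, not the authors' algorithm.

        from multiprocessing import Pool

        def analyse_frame(frame_id):
            return frame_id, frame_id ** 0.5     # stub for per-frame fringe analysis

        if __name__ == "__main__":
            with Pool(processes=4) as pool:      # four (virtual) processors
                results = pool.map(analyse_frame, range(1000))
            print(len(results), "frames processed")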

  3. Recursive Matrix Inverse Update On An Optical Processor

    Science.gov (United States)

    Casasent, David P.; Baranoski, Edward J.

    1988-02-01

    A high accuracy optical linear algebraic processor (OLAP) using the digital multiplication by analog convolution (DMAC) algorithm is described for use in an efficient matrix inverse update algorithm with speed and accuracy advantages. The solution for the parameters in the algorithm is addressed, and the advantages of optical over digital linear algebraic processors are advanced.
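
    A recursive inverse update of the kind such a processor accelerates is the classic rank-one (Sherman-Morrison) formula, sketched below for reference; the paper's DMAC-based optical formulation is not reproduced here.

        import numpy as np

        def sm_update(A_inv, u, v):
            # (A + u v^T)^{-1} from A^{-1} at O(n^2) cost instead of O(n^3)
            Au = A_inv @ u
            vA = v @ A_inv
            return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

        A = np.array([[4.0, 1.0], [2.0, 3.0]])
        A_inv = np.linalg.inv(A)
        u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
        print(np.allclose(sm_update(A_inv, u, v),
                          np.linalg.inv(A + np.outer(u, v))))   # True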

  4. 3081/E processor

    International Nuclear Information System (INIS)

    Kunz, P.F.; Gravina, M.; Oxoby, G.; Trang, Q.; Fucci, A.; Jacobs, D.; Martin, B.; Storr, K.

    1983-03-01

    Since the introduction of the 168/E, emulating processors have been successful over an amazingly wide range of applications. This paper will describe a second generation processor, the 3081/E. This new processor, which is being developed as a collaboration between SLAC and CERN, goes beyond just fixing the obvious faults of the 168/E. Not only will the 3081/E have much more memory space, incorporate many more IBM instructions, and have full double precision floating point arithmetic, but it will also have faster execution times and be much simpler to build, debug, and maintain. The simple interface and reasonable cost of the 168/E will be maintained for the 3081/E.

  5. Noise limitations in optical linear algebra processors.

    Science.gov (United States)

    Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

    1990-05-10

    A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.

  6. Sojourn time tails in processor-sharing systems

    NARCIS (Netherlands)

    Egorova, R.R.

    2009-01-01

    The processor-sharing discipline was originally introduced as a modeling abstraction for the design and performance analysis of the processing unit of a computer system. Under the processor-sharing discipline, all active tasks are assumed to be processed simultaneously, receiving an equal share of

  7. ACP/R3000 processors in data acquisition systems

    International Nuclear Information System (INIS)

    Deppe, J.; Areti, H.; Atac, R.

    1989-02-01

    We describe ACP/R3000 processor based data acquisition systems for high energy physics. This VME bus compatible processor board, with a computational power equivalent to 15 VAX 11/780s or better, contains 8 Mb of memory for event buffering and has a high speed secondary bus that allows data gathering from front end electronics. 2 refs., 3 figs

  8. A high-speed analog neural processor

    NARCIS (Netherlands)

    Masa, P.; Masa, Peter; Hoen, Klaas; Hoen, Klaas; Wallinga, Hans

    1994-01-01

    Targeted at high-energy physics research applications, our special-purpose analog neural processor can classify up to 70 dimensional vectors within 50 nanoseconds. The decision-making process of the implemented feedforward neural network enables this type of computation to tolerate weight

  9. The Central Trigger Processor (CTP)

    CERN Multimedia

    Franchini, Matteo

    2016-01-01

    The Central Trigger Processor (CTP) receives trigger information from the calorimeter and muon trigger processors, as well as from other sources of trigger. It makes the Level-1 decision (L1A) based on a trigger menu.

  10. Very Long Instruction Word Processors

    Indian Academy of Sciences (India)

    Pentium Processor have modified the processor architecture to exploit parallelism in a program. ... The type of operation itself is encoded using 14 bits. ... text of designing simple architectures with low power consumption and execute x86 ...

  11. Embedded processor extensions for image processing

    Science.gov (United States)

    Thevenin, Mathieu; Paindavoine, Michel; Letellier, Laurent; Heyrman, Barthélémy

    2008-04-01

    The advent of camera phones marks a new phase in embedded camera sales. By late 2009, the total number of camera phones will exceed that of both conventional and digital cameras shipped since the invention of photography. The use in mobile phones of applications like visiophony, matrix code readers and biometrics requires a high degree of component flexibility that image processors (IPs) have not, to date, been able to provide. For all these reasons, programmable processor solutions have become essential. This paper presents several techniques geared to speeding up image processors. It demonstrates that a twofold gain is possible for the complete image acquisition chain and the enhancement pipeline downstream of the video sensor. Such results confirm the potential of these computing systems for supporting future applications.

  12. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  13. The Molen Polymorphic Media Processor

    NARCIS (Netherlands)

    Kuzmanov, G.K.

    2004-01-01

    In this dissertation, we address high performance media processing based on a tightly coupled co-processor architectural paradigm. More specifically, we introduce a reconfigurable media augmentation of a general purpose processor and implement it into a fully operational processor prototype. The

  14. Dual-core Itanium Processor

    CERN Multimedia

    2006-01-01

    Intel’s first dual-core Itanium processor, code-named "Montecito", is a major release of Intel's Itanium 2 Processor Family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). Itanium 2 is much more powerful than its predecessor, with lower power consumption and thermal dissipation.

  15. Design and Delivery of Multiple Server-Side Computer Languages Course

    Science.gov (United States)

    Wang, Shouhong; Wang, Hai

    2011-01-01

    Given the emergence of service-oriented architecture, IS students need to be knowledgeable of multiple server-side computer programming languages to be able to meet the needs of the job market. This paper outlines the pedagogy of an innovative course of multiple server-side computer languages for the undergraduate IS majors. The paper discusses…

  16. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l, d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability of solving much larger instances of the (l, d) motif search problem using the resources available within a single Automata Processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology.
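
    The core test that each NFA encodes, whether a candidate motif occurs in a sequence with at most d substitutions, is easy to state in software; the Automata Processor's contribution is evaluating a huge number of such candidates in parallel, so this sketch is only the serial reference, not the paper's algorithm.

        def occurs(motif, seq, d):
            # True if some window of seq matches motif with <= d mismatches
            l = len(motif)
            return any(sum(a != b for a, b in zip(motif, seq[i:i + l])) <= d
                       for i in range(len(seq) - l + 1))

        print(occurs("ACGT", "TTACCTGG", 1))   # True: window "ACCT" differs in 1 place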

  17. High-Performance Linear Algebra Processor using FPGA

    National Research Council Canada - National Science Library

    Johnson, J

    2004-01-01

    With recent advances in FPGA (Field Programmable Gate Array) technology it is now feasible to use these devices to build special purpose processors for floating point intensive applications that arise in scientific computing...

  18. Multimode power processor

    Science.gov (United States)

    O'Sullivan, George A.; O'Sullivan, Joseph A.

    1999-01-01

    In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.

  19. MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

    Energy Technology Data Exchange (ETDEWEB)

    Barhen, Jacob [ORNL; Kerekes, Ryan A [ORNL; ST Charles, Jesse Lee [ORNL; Buckner, Mark A [ORNL

    2008-01-01

    performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations are carried out; at the 125 MHz clock rate this yields a peak performance of 16 TeraOPS (1.28 x 10^5 operations/cycle x 1.25 x 10^8 cycles/s = 1.6 x 10^13 operations per second). IBM Cell Broadband Engine. The Cell processor is the product of 5 years of sustained, intensive R&D collaboration (involving an investment of over $400M) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphics processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.

  20. MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

    International Nuclear Information System (INIS)

    Barhen, Jacob; Kerekes, Ryan A.; St Charles, Jesse Lee; Buckner, Mark A.

    2008-01-01

    performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations are carried out; at the 125 MHz clock rate this yields a peak performance of 16 TeraOPS (1.28 x 10^5 operations/cycle x 1.25 x 10^8 cycles/s = 1.6 x 10^13 operations per second). IBM Cell Broadband Engine. The Cell processor is the product of 5 years of sustained, intensive R and D collaboration (involving an investment of over $400M) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphics processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.

  1. Micro processors for plant protection

    International Nuclear Information System (INIS)

    McAffer, N.T.C.

    1976-01-01

    Microcomputers can be used satisfactorily in general protection duties, with economic advantages over hardwired systems. The reliability of such protection functions can be enhanced by keeping the task performed by each protection microprocessor simple and by avoiding any substantial dependence of one task on another. This implies that vital work done for any task is kept within it, that any communications between it and the outside are restricted to those for controlling data transfer, and that the amount of this data should be the minimum consistent with satisfactory task execution. Technology is changing rapidly, and devices may become obsolete and be supplanted by new ones before their theoretical reliability can be confirmed or otherwise by field service. This emphasises the need for users to pool device performance data so that effective reliability judgements can be made within the lifetime of the devices. (orig.) [de

  2. Treecode with a Special-Purpose Processor

    Science.gov (United States)

    Makino, Junichiro

    1991-08-01

    We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
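
    The saving comes from amortizing one node list over a whole group of nearby particles. A minimal host-side sketch (assuming a conventional Barnes-Hut opening criterion applied to the group's bounding sphere; the Node layout and the theta parameter are illustrative, not the GRAPE interface):

      #include <math.h>

      typedef struct Node {
          double x, y, z;               /* center of mass                   */
          double mass, size;            /* total mass and cell width        */
          struct Node *child[8];        /* leaves keep child[0] == NULL     */
      } Node;

      /* Collect every tree node that may act as a single gravity source for
       * ALL particles in a group cell with center gc[] and radius gr. The
       * host builds this list once per group; the backend then evaluates
       * the forces of the listed nodes on every particle of the group. */
      int build_list(const Node *n, const double gc[3], double gr,
                     double theta, const Node **list, int count) {
          double dx = n->x - gc[0], dy = n->y - gc[1], dz = n->z - gc[2];
          double dist = sqrt(dx*dx + dy*dy + dz*dz) - gr;  /* worst case */
          if (n->child[0] == NULL || (dist > 0.0 && n->size / dist < theta)) {
              list[count++] = n;        /* leaf, or far enough: accept */
          } else {
              for (int i = 0; i < 8; i++)
                  if (n->child[i])
                      count = build_list(n->child[i], gc, gr, theta,
                                         list, count);
          }
          return count;
      }

    Because one list serves many particles, the host-side floating-point work per timestep drops roughly in proportion to the group size, which is what keeps the fast backend rather than the host the limiting factor.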

  3. Simulation of a processor switching circuit with APLSV

    International Nuclear Information System (INIS)

    Dilcher, H.

    1979-01-01

    The report describes the simulation of a processor switching circuit with APL. Furthermore, an APL function is presented to simulate a processor in an assembly-like language. Together they serve as a tool for studying processor properties. By means of the programming function it is also possible to program other simulated processors. The processor is to be used in the real-time analysis of data that occur in high energy physics experiments. The data are already offered to the computer in digitized form. A typical data rate is 10 KB/sec. The data are structured in blocks; each block is 1 KB wide and independent of the others. A processor has to decide whether the block data belong to an event that is part of the background noise and can therefore be discarded, or whether the data should be saved for later evaluation. (orig./WB) [de

  4. Analysis of Computer Experiments with Multiple Noise Sources

    DEFF Research Database (Denmark)

    Dehlendorff, Christian; Kulahci, Murat; Andersen, Klaus Kaae

    2010-01-01

    In this paper we present a modeling framework for analyzing computer models with two types of variations. The paper is based on a case study of an orthopedic surgical unit, which has both controllable and uncontrollable factors. Our results show that this structure of variation can be modeled...

  5. Materials and nanosystems : interdisciplinary computational modeling at multiple scales

    International Nuclear Information System (INIS)

    Huber, S.E.

    2014-01-01

    Over the last five decades, computer simulation and numerical modeling have become valuable tools complementing the traditional pillars of science, experiment and theory. In this thesis, several applications of computer-based simulation and modeling shall be explored in order to address problems and open issues in chemical and molecular physics. Attention shall be paid especially to the different degrees of interrelatedness and multiscale-flavor, which may - at least to some extent - be regarded as inherent properties of computational chemistry. In order to do so, a variety of computational methods are used to study features of molecular systems which are of relevance in various branches of science and which correspond to different spatial and/or temporal scales. Proceeding from small to large measures, first, an application in astrochemistry, the investigation of spectroscopic and energetic aspects of carbonic acid isomers shall be discussed. In this respect, very accurate and hence at the same time computationally very demanding electronic structure methods like the coupled-cluster approach are employed. These studies are followed by the discussion of an application in the scope of plasma-wall interaction which is related to nuclear fusion research. There, the interactions of atoms and molecules with graphite surfaces are explored using density functional theory methods. The latter are computationally cheaper than coupled-cluster methods and thus allow the treatment of larger molecular systems, but yield less accuracy and especially reduced error control at the same time. The subsequently presented exploration of surface defects at low-index polar zinc oxide surfaces, which are of interest in materials science and surface science, is another surface science application. The necessity to treat even larger systems of several hundreds of atoms requires the use of approximate density functional theory methods. Thin gold nanowires consisting of several thousands of

  6. A pre- and post-processor for the ICOOL muon transport code

    International Nuclear Information System (INIS)

    Fawley, W.M.

    2001-01-01

    ICOOL[1] is a Fortran77 macroparticle transport code widely used by researchers to study the front end of a neutrino factory/muon collider[2]. In part due to the desire that ICOOL be usable over multiple computer platforms and operating systems, the code uses simple text files for input/output services. This choice, together with user-driven requests for an ever greater choice of lattice element type and configuration, has led to ICOOL input decks becoming rather difficult to compose and modify. Moreover, the lack of a standard graphical post-processor has prevented many ICOOL users from extracting all but the simplest results from the output files. Here I present two attempts to improve this situation: first, a simple but quite general graphical pre-processor (NIME) written in Tcl/Tk[3] that permits users to write and maintain ASCII-formatted input files by use of simple macro definitions and expansions; second, an interactive post-processor written in Fortran90 and NCAR graphics, which allows users to define, extract, and then examine the behavior of various particle subsets. In this paper I show some examples of use of both the pre- and post-processor for a standard ICOOL run.

  7. Evaluation of the Intel Sandy Bridge-EP server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2012-01-01

    In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing an 8-core “Sandy Bridge-EP” processor with Intel’s previous microarchitecture, the “Westmere-EP”. The Intel marketing names for these processors are “Xeon E5-2600 processor series” and “Xeon 5600 processor series”, respectively. Both processors are produced in a 32nm process, and both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores ...

  8. Trigger and decision processors

    International Nuclear Information System (INIS)

    Franke, G.

    1980-11-01

    In recent years there have been many attempts in high energy physics to make trigger and decision processes faster and more sophisticated. This became necessary due to a permanent increase in the number of sensitive detector elements in wire chambers and calorimeters, and it became possible because of rapid developments in integrated circuit technology. In this paper the present situation is reviewed. The discussion is mainly focussed upon event filtering by pure software methods and, on the more hardware-related side, microprogrammable processors as well as random access memory triggers. (orig.)

  9. Optical Finite Element Processor

    Science.gov (United States)

    Casasent, David; Taylor, Bradley K.

    1986-01-01

    A new high-accuracy optical linear algebra processor (OLAP) with many advantageous features is described. It achieves floating point accuracy, handles bipolar data by sign-magnitude representation, performs LU decomposition using only one channel, partitions easily, and takes data flow into account. A new application (finite element (FE) structural analysis) for OLAPs is introduced and the results of a case study are presented. Error sources in encoded OLAPs are addressed for the first time. Their modeling and simulation are discussed and quantitative data are presented. Dominant error sources and the effects of composite error sources are analyzed.

  10. Computer simulation of FT-NMR multiple pulse experiment

    Science.gov (United States)

    Allouche, A.; Pouzard, G.

    1989-04-01

    Using the product operator formalism in its real form, SIMULDENS expands the density matrix of a scalar-coupled nuclear spin system and simulates analytically a large variety of FT-NMR multiple pulse experiments. The observable transverse magnetizations are stored and can be combined to represent signal accumulation. The programming language is VAX PASCAL, but a Macintosh Turbo Pascal version is also available.

  11. Computer simulation of FT-NMR multiple pulse experiment

    International Nuclear Information System (INIS)

    Allouche, A.; Pouzard, G.

    1989-01-01

    Using the product operator formalism in its real form, SIMULDENS expands the density matrix of a scalar-coupled nuclear spin system and simulates analytically a large variety of FT-NMR multiple pulse experiments. The observable transverse magnetizations are stored and can be combined to represent signal accumulation. The programming language is VAX PASCAL, but a Macintosh Turbo Pascal version is also available. (orig.)

  12. Bank switched memory interface for an image processor

    International Nuclear Information System (INIS)

    Barron, M.; Downward, J.

    1980-09-01

    A commercially available image processor is interfaced to a PDP-11/45 through an 8K window of memory addresses. When the image processor was not in use, it was desired to be able to use the 8K address space as real memory. The standard method of accomplishing this would have been to use UNIBUS switches to switch in either the physical 8K bank of memory or the image processor memory. This method has the disadvantage of being rather expensive. As a simple alternative, a device was built to selectively enable or disable either an 8K bank of memory or the image processor memory. To enable the image processor under program control, GEN is contracted in size, the memory is disabled, a device partition for the image processor is created above GEN, and the image processor memory is enabled. The process is reversed to restore memory to GEN. The hardware to enable/disable the image and computer memories is controlled using spare bits from a DR-11K output register. The image processor and physical memory can be switched in or out on line with no adverse effects on the system's operation.
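
    In software the switch reduces to toggling those spare register bits. A sketch (the register address and bit assignments below are invented for illustration; the original used spare bits of a DR-11K output register):

      #include <stdint.h>

      /* Hypothetical control-register address and bit assignments. */
      #define CTRL_REG    (*(volatile uint16_t *)0164000)
      #define MEM_ENABLE  (1u << 0)     /* 8K RAM bank answers the window  */
      #define IMG_ENABLE  (1u << 1)     /* image processor answers instead */

      /* Route the 8K address window to the image processor memory. */
      void select_image_processor(void) {
          CTRL_REG &= ~MEM_ENABLE;      /* never enable both at once */
          CTRL_REG |= IMG_ENABLE;
      }

      /* Return the 8K window to ordinary system memory. */
      void select_memory(void) {
          CTRL_REG &= ~IMG_ENABLE;
          CTRL_REG |= MEM_ENABLE;
      }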

  13. Integrating multiple scientific computing needs via a Private Cloud infrastructure

    International Nuclear Information System (INIS)

    Bagnasco, S; Berzano, D; Brunetti, R; Lusso, S; Vallero, S

    2014-01-01

    In a typical scientific computing centre, diverse applications coexist and share a single physical infrastructure. An underlying Private Cloud facility eases the management and maintenance of heterogeneous use cases such as multipurpose or application-specific batch farms, Grid sites catering to different communities, parallel interactive data analysis facilities and others. It makes it possible to dynamically and efficiently allocate resources to any application and to tailor the virtual machines according to the applications' requirements. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques; for example, rolling updates can be performed easily while minimizing the downtime. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, which hosts a full-fledged WLCG Tier-2 site and a dynamically expandable PROOF-based Interactive Analysis Facility for the ALICE experiment at the CERN LHC, as well as several smaller scientific computing applications. The Private Cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem (used in two different configurations for worker- and service-class hypervisors) and the OpenWRT Linux distribution (used for network virtualization). A future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and by using mainstream contextualization tools like CloudInit.

  14. Design of Processors with Reconfigurable Microarchitecture

    Directory of Open Access Journals (Sweden)

    Andrey Mokhov

    2014-01-01

    Energy becomes a dominating factor for a wide spectrum of computations: from intensive data processing in “big data” companies resulting in large electricity bills, to infrastructure monitoring with wireless sensors relying on energy harvesting. In this context it is essential for a computation system to be adaptable to the power supply and the service demand, which often vary dramatically during runtime. In this paper we present an approach to building processors with reconfigurable microarchitecture capable of changing the way they fetch and execute instructions depending on energy availability and application requirements. We show how to use Conditional Partial Order Graphs to formally specify the microarchitecture of such a processor, explore the design possibilities for its instruction set, and synthesise the instruction decoder using correct-by-construction techniques. The paper is focused on the design methodology, which is evaluated by implementing a power-proportional version of Intel 8051 microprocessor.

  15. Lattice gauge theory using parallel processors

    International Nuclear Information System (INIS)

    Lee, T.D.; Chou, K.C.; Zichichi, A.

    1987-01-01

    The book's contents include: Lattice Gauge Theory Lectures: Introduction and Current Fermion Simulations; Monte Carlo Algorithms for Lattice Gauge Theory; Specialized Computers for Lattice Gauge Theory; Lattice Gauge Theory at Finite Temperature: A Monte Carlo Study; Computational Method - An Elementary Introduction to the Langevin Equation, Present Status of Numerical Quantum Chromodynamics; Random Lattice Field Theory; The GF11 Processor and Compiler; and The APE Computer and First Physics Results; Columbia Supercomputer Project: Parallel Supercomputer for Lattice QCD; Statistical and Systematic Errors in Numerical Simulations; Monte Carlo Simulation for LGT and Programming Techniques on the Columbia Supercomputer; Food for Thought: Five Lectures on Lattice Gauge Theory

  16. Security prospects through cloud computing by adopting multiple clouds

    DEFF Research Database (Denmark)

    Jensen, Meiko; Schwenk, Jörg; Bohli, Jens Matthias

    2011-01-01

    Clouds impose new security challenges, which are amongst the biggest obstacles when considering the usage of cloud services. This triggered a lot of research activities in this direction, resulting in a quantity of proposals targeting the various security threats. Besides the security issues coming...... with the cloud paradigm, it can also provide a new set of unique features which open the path towards novel security approaches, techniques and architectures. This paper initiates this discussion by contributing a concept which achieves security merits by making use of multiple distinct clouds at the same time....

  17. Quantum chemistry on a superconducting quantum processor

    Energy Technology Data Exchange (ETDEWEB)

    Kaicher, Michael P.; Wilhelm, Frank K. [Theoretical Physics, Saarland University, 66123 Saarbruecken (Germany); Love, Peter J. [Department of Physics and Astronomy, Tufts University, Medford, MA 02155 (United States)

    2016-07-01

    Quantum chemistry is the most promising civilian application for quantum processors to date. We study its adaptation to superconducting (sc) quantum systems, computing the ground state energy of LiH through a variational hybrid quantum classical algorithm. We demonstrate how interactions native to sc qubits further reduce the amount of quantum resources needed, pushing sc architectures as a near-term candidate for simulations of more complex atoms/molecules.

  18. Debugging in a multi-processor environment

    International Nuclear Information System (INIS)

    Spann, J.M.

    1981-01-01

    The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system utilizing a shared memory as the data exchange medium. Debugging more than one program in the multi-processor environment is a difficult process. This paper describes the new tools that were developed and how the testing of software is performed in the SCDS for the MFTF project.

  19. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    Science.gov (United States)

    Hristov, Ivan; Goranov, Goran; Hristova, Radoslava

    2018-02-01

    We consider a standing wave simulation that is interesting from a computational point of view, solving coupled 2D perturbed sine-Gordon equations. We present an OpenMP realization which exploits both thread and SIMD levels of parallelism. We test the OpenMP program on two energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named "Ivy Bridge-EP") in the HybriLIT cluster, and the Xeon Phi 7250 processor (code-named "Knights Landing", KNL). The results show 2 times better performance on the KNL processor.
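
    As a rough sketch of the two levels of parallelism (a generic explicit 2D wave-equation step, not the authors' perturbed sine-Gordon kernel; the grid sizes and coefficient c are placeholders):

      #define NX 4096
      #define NY 4096

      /* One explicit time step: the outer loop is split across OpenMP
       * threads, the inner loop is vectorized across SIMD lanes. */
      void step(double (*restrict unew)[NY], const double (*restrict u)[NY],
                const double (*restrict uold)[NY], double c)
      {
          #pragma omp parallel for schedule(static)
          for (int i = 1; i < NX - 1; i++) {
              #pragma omp simd
              for (int j = 1; j < NY - 1; j++) {
                  double lap = u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]
                             - 4.0 * u[i][j];
                  unew[i][j] = 2.0 * u[i][j] - uold[i][j] + c * lap;
              }
          }
      }

    On a many-core part such as the Xeon Phi, the outer loop feeds the numerous cores while the simd pragma fills the wide vector units, which is why exploiting both levels matters on that architecture.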

  20. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    Directory of Open Access Journals (Sweden)

    Hristov Ivan

    2018-01-01

    We consider a standing wave simulation that is interesting from a computational point of view, solving coupled 2D perturbed sine-Gordon equations. We present an OpenMP realization which exploits both thread and SIMD levels of parallelism. We test the OpenMP program on two energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named “Ivy Bridge-EP”) in the HybriLIT cluster, and the Xeon Phi 7250 processor (code-named “Knights Landing”, KNL). The results show 2 times better performance on the KNL processor.

  1. Accelerating Climate Simulations Through Hybrid Computing

    Science.gov (United States)

    Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark

    2009-01-01

    Unconventional multi-core processors (e.g., IBM Cell B/E and NVIDIA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) identical MPI implementation is required in both systems, and (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors, two IBM QS22 Cell blades, connected with Infiniband), allowing for seamlessly offloading compute-intensive functions to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approx. 10% network overhead.

  2. Accelerating molecular dynamic simulation on the cell processor and Playstation 3.

    Science.gov (United States)

    Luttmann, Edgar; Ensign, Daniel L; Vaidyanathan, Vishal; Houston, Mike; Rimon, Noam; Øland, Jeppe; Jayachandran, Guha; Friedrichs, Mark; Pande, Vijay S

    2009-01-30

    Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3's Cell processor over more traditional processors. (c) 2008 Wiley Periodicals, Inc.

  3. Inferring Human Activity in Mobile Devices by Computing Multiple Contexts.

    Science.gov (United States)

    Chen, Ruizhi; Chu, Tianxing; Liu, Keqiang; Liu, Jingbin; Chen, Yuwei

    2015-08-28

    This paper introduces a framework for inferring human activities in mobile devices by computing spatial contexts, temporal contexts, spatiotemporal contexts, and user contexts. A spatial context is a significant location that is defined as a geofence, which can be a node associated with a circle, or a polygon; a temporal context contains time-related information that can be, e.g., a local time tag, a time difference between geographical locations, or a timespan; a spatiotemporal context is defined as a dwelling length at a particular spatial context; and a user context includes user-related information that can be the user's mobility contexts, environmental contexts, psychological contexts or social contexts. Using the measurements of the built-in sensors and radio signals in mobile devices, we can snapshot a contextual tuple for every second including the aforementioned contexts. Given a contextual tuple, the framework evaluates the posterior probability of each candidate activity in real-time using a Naïve Bayes classifier. A large dataset containing 710,436 contextual tuples has been recorded for one week from an experiment carried out at Texas A&M University Corpus Christi with three participants. The test results demonstrate that the multi-context solution significantly outperforms the spatial-context-only solution. A classification accuracy of 61.7% is achieved for the spatial-context-only solution, while 88.8% is achieved for the multi-context solution.
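
    A compact sketch of the per-tuple classification step (a generic Naïve Bayes over discretized context features; the dimensions and table names are illustrative, not the paper's implementation):

      #include <math.h>

      #define N_ACT   5                 /* candidate activities            */
      #define N_FEAT  4                 /* spatial, temporal, ... contexts */
      #define N_VALS 16                 /* discretized values per feature  */

      /* prior[a] and lik[a][f][v] would be estimated from logged tuples. */
      double prior[N_ACT];
      double lik[N_ACT][N_FEAT][N_VALS];

      /* Return the activity with the maximal posterior for one tuple.
       * Sums of logs avoid underflow from products of small likelihoods. */
      int classify(const int tuple[N_FEAT]) {
          int best = 0;
          double best_lp = -INFINITY;
          for (int a = 0; a < N_ACT; a++) {
              double lp = log(prior[a]);
              for (int f = 0; f < N_FEAT; f++)
                  lp += log(lik[a][f][tuple[f]]);
              if (lp > best_lp) { best_lp = lp; best = a; }
          }
          return best;
      }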

  4. Modular multiple sensors information management for computer-integrated surgery.

    Science.gov (United States)

    Vaccarella, Alberto; Enquobahrie, Andinet; Ferrigno, Giancarlo; Momi, Elena De

    2012-09-01

    In the past 20 years, technological advancements have modified the concept of modern operating rooms (ORs) with the introduction of computer-integrated surgery (CIS) systems, which promise to enhance the outcomes, safety and standardization of surgical procedures. With CIS, different types of sensor (mainly position-sensing devices, force sensors and intra-operative imaging devices) are widely used. Recently, the need for a combined use of different sensors raised issues related to synchronization and spatial consistency of data from different sources of information. In this study, we propose a centralized, multi-sensor management software architecture for a distributed CIS system, which addresses sensor information consistency in both space and time. The software was developed as a data server module in a client-server architecture, using two open-source software libraries: Image-Guided Surgery Toolkit (IGSTK) and OpenCV. The ROBOCAST project (FP7 ICT 215190), which aims at integrating robotic and navigation devices and technologies in order to improve the outcome of the surgical intervention, was used as the benchmark. An experimental protocol was designed in order to prove the feasibility of a centralized module for data acquisition and to test the application latency when dealing with optical and electromagnetic tracking systems and ultrasound (US) imaging devices. Our results show that a centralized approach is suitable for minimizing synchronization errors; latency in the client-server communication was estimated to be 2 ms (median value) for tracking systems and 40 ms (median value) for US images. The proposed centralized approach proved to be adequate for neurosurgery requirements. Latency introduced by the proposed architecture does not affect tracking system performance in terms of frame rate and limits US images frame rate at 25 fps, which is acceptable for providing visual feedback to the surgeon in the OR. Copyright © 2012 John Wiley & Sons, Ltd.

  5. High-Speed General Purpose Genetic Algorithm Processor.

    Science.gov (United States)

    Hoseini Alinodehi, Seyed Pourya; Moshfe, Sajjad; Saber Zaeimian, Masoumeh; Khoei, Abdollah; Hadidi, Khairollah

    2016-07-01

    In this paper, an ultrafast steady-state genetic algorithm processor (GAP) is presented. Due to the heavy computational load of genetic algorithms (GAs), they usually take a long time to find optimum solutions. Hardware implementation is a significant approach to overcome the problem by speeding up the GAs procedure. Hence, we designed a digital CMOS implementation of GA in [Formula: see text] process. The proposed processor is not bounded to a specific application. Indeed, it is a general-purpose processor, which is capable of performing optimization in any possible application. Utilizing speed-boosting techniques, such as pipeline scheme, parallel coarse-grained processing, parallel fitness computation, parallel selection of parents, dual-population scheme, and support for pipelined fitness computation, the proposed processor significantly reduces the processing time. Furthermore, by relying on a built-in discard operator the proposed hardware may be used in constrained problems that are very common in control applications. In the proposed design, a large search space is achievable through the bit string length extension of individuals in the genetic population by connecting the 32-bit GAPs. In addition, the proposed processor supports parallel processing, in which the GAs procedure can be run on several connected processors simultaneously.
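
    For orientation, one steady-state iteration of the kind such hardware pipelines can be sketched in software as follows (the population size, one-max fitness, and replace-the-worst policy are placeholders; the processor overlaps these stages in a pipeline rather than running them serially):

      #include <stdlib.h>

      #define POP  256
      #define BITS 32                   /* one 32-bit GAP; chainable       */

      typedef unsigned int Chrom;

      static double fitness(Chrom c) {  /* placeholder: count set bits     */
          int ones = 0;
          while (c) { ones += (int)(c & 1u); c >>= 1; }
          return (double)ones;
      }

      static Chrom tournament(const Chrom *pop, const double *fit) {
          int a = rand() % POP, b = rand() % POP;
          return fit[a] > fit[b] ? pop[a] : pop[b];
      }

      /* One steady-state step: breed one child, replace the current worst. */
      void ga_step(Chrom *pop, double *fit) {
          Chrom p1 = tournament(pop, fit), p2 = tournament(pop, fit);
          int cut = rand() % BITS;
          Chrom mask = (cut == 0) ? 0u : (~0u << (BITS - cut));
          Chrom child = (p1 & mask) | (p2 & ~mask);   /* one-point crossover */
          child ^= 1u << (rand() % BITS);             /* single-bit mutation */
          int worst = 0;
          for (int i = 1; i < POP; i++)
              if (fit[i] < fit[worst]) worst = i;
          pop[worst] = child;                         /* discard-style step  */
          fit[worst] = fitness(child);
      }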

  6. Modcomp MAX IV System Processors reference guide

    Energy Technology Data Exchange (ETDEWEB)

    Cummings, J.

    1990-10-01

    A user almost always faces a big problem when having to learn to use a new computer system. The information necessary to use the system is often scattered throughout many different manuals. The user also faces the problem of extracting the information really needed from each manual. Very few computer vendors supply a single Users Guide or even a manual to help the new user locate the necessary manuals. Modcomp is no exception; Modcomp MAX IV requires that the user be familiar with the system file usage, which adds to the problem. At General Atomics there is an ever increasing need for new users to learn how to use the Modcomp computers. This paper was written to provide a condensed "Users Reference Guide" for Modcomp computer users. This manual should be of value not only to new users but to any users that are not Modcomp computer systems experts. This "Users Reference Guide" is intended to provide the basic information for the use of the various Modcomp System Processors necessary to create, compile, link-edit, and catalog a program. Only the information necessary to provide the user with a basic understanding of the System Processors is included. This document provides enough information for the majority of programmers to use the Modcomp computers without having to refer to any other manuals. A lot of emphasis has been placed on the file description and usage for each of the System Processors. This allows the user to understand how Modcomp MAX IV does things rather than just learning the system commands.

  7. Composable processor virtualization for embedded systems

    NARCIS (Netherlands)

    Molnos, A.M.; Milutinovic, A.; She, D.; Goossens, K.G.W.

    2010-01-01

    Processor virtualization divides a physical processor's time among a set of virtual machines, enabling efficient hardware utilization, application security and allowing co-existence of different operating systems on the same processor. Though initially intended for the server domain, virtualization

  8. Nuclear interactive evaluations on distributed processors

    International Nuclear Information System (INIS)

    Dix, G.E.; Congdon, S.P.

    1988-01-01

    BWR [boiling water reactor] nuclear design is a complicated process, involving trade-offs among a variety of conflicting objectives. Complex computer calculations are usually required for each design iteration. GE Nuclear Energy has implemented a system in which the evaluations are performed interactively on a large number of small microcomputers. This approach minimizes the time it takes to carry out design iterations even though the processor speeds are low compared with modern supercomputers. All of the desktop microcomputers are linked to a common data base via an Ethernet communications system so that design data can be shared and data quality can be maintained.

  9. Bulk-memory processor for data acquisition

    International Nuclear Information System (INIS)

    Nelson, R.O.; McMillan, D.E.; Sunier, J.W.; Meier, M.; Poore, R.V.

    1981-01-01

    To meet the diverse needs and data rate requirements at the Van de Graaff and Weapons Neutron Research (WNR) facilities, a bulk memory system has been implemented which includes a fast and flexible processor. This bulk memory processor (BMP) utilizes bit slice and microcode techniques and features a 24 bit wide internal architecture allowing direct addressing of up to 16 megawords of memory and histogramming up to 16 million counts per channel without overflow. The BMP is interfaced to the MOSTEK MK 8000 bulk memory system and to the standard MODCOMP computer I/O bus. Coding for the BMP both at the microcode level and with macro instructions is supported. The generalized data acquisition system has been extended to support the BMP in a manner transparent to the user

  10. A dedicated line-processor as used at the SHF

    International Nuclear Information System (INIS)

    Bevan, A.V.; Hatley, R.W.; Price, D.R.; Rankin, P.

    1985-01-01

    A hardwired trigger processor was used at the SLAC Hybrid Facility to find evidence for charged tracks originating from the fiducial volume of a 40-inch rapid-cycling bubble chamber. Straight-line projections of these tracks in the plane perpendicular to the applied magnetic field were searched for using data from three sets of proportional wire chambers (PWC). This information was made directly available to the processor by means of a special digitizing card. The results memory of the processor simulated read-only memory in a 168/E processor and was accessible by it. The 168/E controlled the issuing of a trigger command to the bubble chamber flash tubes. The same design of digitizer card used by the line processor was incorporated into the 168/E, again as read-only memory, which allowed it access to the raw data for continual monitoring of trigger integrity. The design logic of the trigger processor was verified by running real PWC data through a FORTRAN simulation of the hardware. This enabled the debugging to become highly automated, since a step-by-step, computer-controlled comparison of processor registers to simulation predictions could be made.

  11. WATERLOOP V2/64: A highly parallel machine for numerical computation

    Science.gov (United States)

    Ostlund, Neil S.

    1985-07-01

    Current technological trends suggest that the high performance scientific machines of the future are very likely to consist of a large number (greater than 1024) of processors connected and communicating with each other in some as yet undetermined manner. Such an assembly of processors should behave as a single machine in obtaining numerical solutions to scientific problems. However, the appropriate way of organizing both the hardware and software of such an assembly of processors is an unsolved and active area of research. It is particularly important to minimize the organizational overhead of interprocessor communication, global synchronization, and contention for shared resources if the performance of a large number (n) of processors is to be anything like the desirable n times the performance of a single processor. In many situations, adding a processor actually decreases the performance of the overall system, since the extra organizational overhead is larger than the extra processing power added. The systolic loop architecture is a new multiple processor architecture which attempts to solve the problem of how to organize a large number of asynchronous processors into an effective computational system while minimizing the organizational overhead. This paper gives a brief overview of the basic systolic loop architecture, systolic loop algorithms for numerical computation, and a 64-processor implementation of the architecture, WATERLOOP V2/64, which is being used as a testbed for exploring the hardware, software, and algorithmic aspects of the architecture.

  12. High-speed packet filtering utilizing stream processors

    Science.gov (United States)

    Hummel, Richard J.; Fulp, Errin W.

    2009-04-01

    Parallel firewalls offer a scalable architecture for the next generation of high-speed networks. While these parallel systems can be implemented using multiple firewalls, the latest generation of stream processors can provide similar benefits with a significantly reduced latency due to locality. This paper describes how the Cell Broadband Engine (CBE), a popular stream processor, can be used as a high-speed packet filter. Results show the CBE can potentially process packets arriving at a rate of 1 Gbps with a latency of less than 82 microseconds. Performance depends on how well the packet filtering process is translated to the unique stream processor architecture. For example, the method used for transmitting data and control messages among the pseudo-independent processor cores has a significant impact on performance. Experimental results also show the current limitations of a CBE operating system when used to process packets. Possible solutions to these issues are discussed.
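
    The primitive being parallelized is a scan of each packet's header fields against an ordered rule table. A first-match sketch (the 5-tuple rule layout is illustrative; on the CBE each SPE would run this over its own batch of packets):

      #include <stdbool.h>
      #include <stdint.h>

      typedef struct {
          uint32_t src, src_mask;       /* source address/prefix           */
          uint32_t dst, dst_mask;       /* destination address/prefix      */
          uint16_t dport_lo, dport_hi;  /* destination port range          */
          uint8_t  proto;               /* 0 = any protocol                */
          bool     accept;
      } Rule;

      typedef struct {
          uint32_t src, dst;
          uint16_t dport;
          uint8_t  proto;
      } Pkt;

      /* Return the action of the first matching rule, else the default. */
      bool filter(const Pkt *p, const Rule *rules, int n, bool dflt) {
          for (int i = 0; i < n; i++) {
              const Rule *r = &rules[i];
              if ((p->src & r->src_mask) == r->src &&
                  (p->dst & r->dst_mask) == r->dst &&
                  p->dport >= r->dport_lo && p->dport <= r->dport_hi &&
                  (r->proto == 0 || r->proto == p->proto))
                  return r->accept;
          }
          return dflt;
      }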

  13. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2010-12-01

    As processor architecture has become more advanced, Intel introduced its Core 2 Duo series of processors. The performance impact on Intel Core 2 Duo processors is analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT) at 1 GHz, 2 GHz and 3 GHz Intel Core 2 Duo series processor frequencies. Our results show the performance scalability in Intel Core 2 Duo series processors. Even though the AI benchmarks have similar execution times, they have dissimilar characteristics, which are identified using principal component analysis and a dendrogram. As the processor frequency increased from 1.8 GHz to 3.167 GHz, the execution time decreased by ~370 sec for AI workloads. In the case of Physics/Quantum Computing programs it was ~940 sec.

  14. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    Science.gov (United States)

    Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

    2017-07-01

    Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up a calculation process that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication of arbitrary size, because they assume a square, symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming of the graph partitioning approach. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
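
    The kernel itself is simple; what the hypergraph model decides is which rows (and which entries of x) each processing element owns, so that the work is balanced and communication volume is minimized. A serial sketch of the per-partition work, assuming the common compressed sparse row (CSR) storage (names are illustrative):

      /* y = A*x restricted to the rows assigned to one partition. */
      void spmv_partition(int row_begin, int row_end,
                          const int *row_ptr, const int *col_idx,
                          const double *val, const double *x, double *y) {
          for (int i = row_begin; i < row_end; i++) {
              double sum = 0.0;
              for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                  sum += val[k] * x[col_idx[k]];    /* gather from x */
              y[i] = sum;
          }
      }

    In a CUDA realization, each row (or a warp per row) typically becomes one thread's work, with the partitioning determining which x entries must be staged into each block's fast memory.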

  15. Integrated Markov-neural reliability computation method: A case for multiple automated guided vehicle system

    International Nuclear Information System (INIS)

    Fazlollahtabar, Hamed; Saidi-Mehrabad, Mohammad; Balakrishnan, Jaydeep

    2015-01-01

    This paper proposes an integrated Markovian and back-propagation neural network approach to compute the reliability of a system. Since states of failure occurrence are significant elements for accurate reliability computation, a Markovian-based reliability assessment method is designed. Due to the drawbacks shown by the Markovian model for steady-state reliability computations and by the neural network for the initial training pattern, an integration called Markov-neural is developed and evaluated. To show the efficiency of the proposed approach, comparative analyses are performed. Also, for managerial implications, an application case for multiple automated guided vehicles (AGVs) in manufacturing networks is conducted. - Highlights: • Integrated Markovian and back-propagation neural network approach to compute reliability. • Markovian-based reliability assessment method. • Managerial implications shown in an application case for multiple automated guided vehicles (AGVs) in manufacturing networks

  16. ESPRIT-like algorithm for computational-efficient angle estimation in bistatic multiple-input multiple-output radar

    Science.gov (United States)

    Gong, Jian; Lou, Shuntian; Guo, Yiduo

    2016-04-01

    An estimation of signal parameters via rotational invariance techniques-like (ESPRIT-like) algorithm is proposed to estimate the direction of arrival and direction of departure for bistatic multiple-input multiple-output (MIMO) radar. The properties of a noncircular signal and Euler's formula are first exploited to establish real-valued bistatic MIMO radar array data, composed of sine and cosine components. Then the receiving/transmitting selective matrices are constructed to obtain the receiving/transmitting rotational invariance factors. Since the rotational invariance factor is a cosine function, symmetrical mirror angle ambiguity may occur. Finally, a maximum likelihood function is used to avoid the estimation ambiguities. Compared with the existing ESPRIT, the proposed algorithm saves about 75% of the computational load owing to the real-valued ESPRIT algorithm. Simulation results confirm the effectiveness of the ESPRIT-like algorithm.

  17. Code compression for VLIW embedded processors

    Science.gov (United States)

    Piccinelli, Emiliano; Sannino, Roberto

    2004-04-01

    The implementation of processors for embedded systems involves various issues: the main constraints are cost, power dissipation and die area. On the other side, new terminals perform functions that require more computational flexibility and effort. Long code streams must be loaded into memories, which are expensive and power-consuming, to run on DSPs or CPUs. To overcome this issue, the "SlimCode" proprietary algorithm presented in this paper (patent-pending technology) can reduce the dimensions of the program memory. It can run offline and work directly on the binary code the compiler generates, compressing it and creating a new binary file, about 40% smaller than the original one, to be loaded into the program memory of the processor. The decompression unit will be a small ASIC, placed between the memory controller and the system bus of the processor, keeping the internal CPU architecture unchanged: this implies that the methodology is completely transparent to the core. We present comparisons versus the state-of-the-art IBM CodePack algorithm, along with its architectural implementation into the ST200 VLIW family core.

  18. Seismometer array station processors

    International Nuclear Information System (INIS)

    Key, F.A.; Lea, T.G.; Douglas, A.

    1977-01-01

    A description is given of the design, construction and initial testing of two types of Seismometer Array Station Processor (SASP), one to work with data stored on magnetic tape in analogue form, the other with data in digital form. The purpose of a SASP is to detect the short-period P waves recorded by a UK-type array of 20 seismometers and to edit these onto a digital library tape or disc. The edited data are then processed to obtain a rough location for the source and to produce seismograms (after optimum processing) for analysis by a seismologist. SASPs are an important component in the scheme for monitoring underground explosions advocated by the UK in the Conference of the Committee on Disarmament. With digital input a SASP can operate at 30 times real time using a linear detection process and at 20 times real time using the log detector of Weichert. Although the log detector is slower, it has the advantage over the linear detector that signals with lower signal-to-noise ratio can be detected and spurious large amplitudes are less likely to produce a detection. It is recommended, therefore, that where possible array data should be recorded in digital form for input to a SASP and that the log detector of Weichert be used. Trial runs show that a SASP is capable of detecting signals down to signal-to-noise ratios of about two with very few false detections, and at mid-continental array sites it should be capable of detecting most, if not all, the signals with magnitude above m_b 4.5; the UK argues that, given a suitable network, it is realistic to hope that sources of this magnitude and above can be detected and identified by seismological means alone. (author)
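
    The abstract does not spell out the detector internals, but the generic shape of such an event detector is a running power ratio: declare a detection when the short-term average power rises well above the long-term average. A sketch with invented window lengths and threshold:

      /* Return the sample index of the first detection, or -1 if none.
       * sta_n << lta_n; `ratio` is the trigger threshold. */
      int detect(const double *x, int n, int sta_n, int lta_n, double ratio) {
          for (int i = lta_n; i < n; i++) {
              double sta = 0.0, lta = 0.0;
              for (int j = i - sta_n; j < i; j++) sta += x[j] * x[j];
              for (int j = i - lta_n; j < i; j++) lta += x[j] * x[j];
              sta /= sta_n;
              lta /= lta_n;
              if (lta > 0.0 && sta / lta > ratio)
                  return i;
          }
          return -1;
      }

    A logarithmic detector in the spirit of Weichert's would instead compare log-compressed amplitudes, trading speed for sensitivity at low signal-to-noise ratios, which matches the trade-off described above.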

  19. Keystone Business Models for Network Security Processors

    Directory of Open Access Journals (Sweden)

    Arthur Low

    2013-07-01

    Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor” models nor the silicon intellectual-property licensing (“IP-licensing”) models allow small technology companies to successfully compete. This article describes an alternative approach that produces an ongoing stream of novel network security processors for niche markets through continuous innovation by both large and small companies. This approach, referred to here as the "business ecosystem model for network security processors", includes a flexible and reconfigurable technology platform, a “keystone” business model for the company that maintains the platform architecture, and an extended ecosystem of companies that both contribute and share in the value created by innovation. New opportunities for business model innovation by participating companies are made possible by the ecosystem model. This ecosystem model builds on: (i) the lessons learned from the experience of the first author as a senior integrated circuit architect for providers of public-key cryptography solutions and as the owner of a semiconductor startup, and (ii) the latest scholarly research on technology entrepreneurship, business models, platforms, and business ecosystems. This article will be of interest to all technology entrepreneurs, but it will be of particular interest to owners of small companies that provide security solutions and to specialized security professionals seeking to launch their own companies.

  20. Efficient quantum walk on a quantum processor

    Science.gov (United States)

    Qiang, Xiaogang; Loke, Thomas; Montanaro, Ashley; Aungskunsiri, Kanin; Zhou, Xiaoqi; O'Brien, Jeremy L.; Wang, Jingbo B.; Matthews, Jonathan C. F.

    2016-01-01

    The random walk formalism is used across a wide range of applications, from modelling share prices to predicting population genetics. Likewise, quantum walks have shown much potential as a framework for developing new quantum algorithms. Here we present explicit efficient quantum circuits for implementing continuous-time quantum walks on the circulant class of graphs. These circuits allow us to sample from the output probability distributions of quantum walks on circulant graphs efficiently. We also show that solving the same sampling problem for arbitrary circulant quantum circuits is intractable for a classical computer, assuming conjectures from computational complexity theory. This is a new link between continuous-time quantum walks and computational complexity theory and it indicates a family of tasks that could ultimately demonstrate quantum supremacy over classical computers. As a proof of principle, we experimentally implement the proposed quantum circuit on an example circulant graph using a two-qubit photonics quantum processor. PMID:27146471

  1. Real-time wavefront processors for the next generation of adaptive optics systems: a design and analysis

    Science.gov (United States)

    Truong, Tuan; Brack, Gary L.; Troy, Mitchell; Trinh, Thang; Shi, Fang; Dekany, Richard G.

    2003-02-01

    Adaptive optics (AO) systems currently under investigation will require at least a two-orders-of-magnitude increase in the number of actuators, which in turn translates to effectively a 10^4 increase in compute latency (the wavefront reconstruction is essentially a matrix-vector multiply, so the compute load grows roughly as the square of the actuator count). Since the performance of an AO system invariably improves as the compute latency decreases, it is important to study how today's computer systems will scale to address this expected increase in actuator utilization. This paper answers this question by characterizing the performance of a single deformable mirror (DM) Shack-Hartmann natural guide star AO system implemented on the present-generation digital signal processor (DSP) TMS320C6701 from Texas Instruments. We derive the compute latency of such a system in terms of a few basic parameters, such as the number of DM actuators, the number of data channels used to read out the camera pixels, the number of DSPs, the available memory bandwidth, as well as the inter-processor communication (IPC) bandwidth and the pixel transfer rate. We show how the results would scale for future systems that utilize multiple DMs and guide stars. We demonstrate that the principal performance bottleneck of such a system is the available memory bandwidth of the processors and, to a lesser extent, the IPC bandwidth. This paper concludes with suggestions for mitigating this bottleneck.

  2. Hardware Synchronization for Embedded Multi-Core Processors

    DEFF Research Database (Denmark)

    Stoif, Christian; Schoeberl, Martin; Liccardi, Benito

    2011-01-01

    Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper, establi...

  3. Computational multiple steady states for enzymatic esterification of ethanol and oleic acid in an isothermal CSTR.

    Science.gov (United States)

    Ho, Pang-Yen; Chuang, Guo-Syong; Chao, An-Chong; Li, Hsing-Ya

    2005-05-01

    The capacity of complex biochemical reaction networks (consisting of 11 coupled non-linear ordinary differential equations) to show multiple steady states was investigated. The system involved the esterification of ethanol and oleic acid by lipase in an isothermal continuous stirred tank reactor (CSTR). The Deficiency One Algorithm and Subnetwork Analysis were applied to determine the steady-state multiplicity. A set of rate constants and two corresponding steady states are computed. The phenomena of bistability, hysteresis and bifurcation are discussed. Moreover, the capacity for steady-state multiplicity is extended to the family of the studied reaction networks.

  4. Matrix multiplication operations with data pre-conditioning in a high performance computing architecture

    Science.gov (United States)

    Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

    2013-11-05

    Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicating the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.
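
    In scalar C the three steps map onto a small tile update (the vector width of 4 is illustrative; a_splat stands in for the load-and-splat target register):

      #define VW 4                      /* elements per vector register */

      /* C[i][j..j+VW) += A[i][k] * B[k][j..j+VW) for one (i, k) pair. */
      void fma_tile(double *c_row, const double *b_row, double a_ik) {
          double a_splat[VW], b_vec[VW];
          for (int l = 0; l < VW; l++) a_splat[l] = a_ik;    /* load and splat */
          for (int l = 0; l < VW; l++) b_vec[l]  = b_row[l]; /* vector load    */
          for (int l = 0; l < VW; l++)
              c_row[l] += a_splat[l] * b_vec[l];             /* multiply-add   */
      }

    Iterating this tile over k accumulates the partial products, mirroring the accumulation of partial products described above.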

  5. Global synchronization of parallel processors using clock pulse width modulation

    Science.gov (United States)

    Chen, Dong; Ellavsky, Matthew R.; Franke, Ross L.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Jeanson, Mark J.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Littrell, Daniel; Ohmacht, Martin; Reed, Don D.; Schenck, Brandon E.; Swetz, Richard A.

    2013-04-02

    A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and perform a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.

  6. A fast processor for di-lepton triggers

    CERN Document Server

    Kostarakis, P; Barsotti, E; Conetti, S; Cox, B; Enagonio, J; Haldeman, M; Haynes, W; Katsanevas, S; Kerns, C; Lebrun, P; Smith, H; Soszyniski, T; Stoffel, J; Treptow, K; Turkot, F; Wagner, R

    1981-01-01

    As a new application of the Fermilab ECL-CAMAC logic modules, a fast trigger processor was developed for Fermilab experiment E-537, aiming to measure high-mass di-muon production by antiprotons. The processor matches the hit information received from drift chambers and scintillation counters to find candidate muon tracks and determine their directions and momenta. The tracks are then paired to compute an invariant mass: when the computed mass falls within the desired range, the event is accepted. The process is accomplished in times of 5 to 10 microseconds, while achieving a trigger rate reduction of up to a factor of ten. (5 refs).
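
    The pairing step is compact enough to illustrate: form all track pairs, compute each invariant mass from the measured momenta, and accept the event when any mass clears the threshold. A hedged Python sketch (the muon mass hypothesis and threshold are illustrative; the real processor did this in hardwired ECL-CAMAC logic, not software):

        import itertools, math

        MUON_MASS = 0.1057  # GeV/c^2, assumed muon mass hypothesis

        def invariant_mass(p1, p2, m=MUON_MASS):
            """Invariant mass of a track pair from 3-momenta in GeV/c."""
            e1 = math.sqrt(m * m + sum(c * c for c in p1))
            e2 = math.sqrt(m * m + sum(c * c for c in p2))
            px, py, pz = (a + b for a, b in zip(p1, p2))
            return math.sqrt(max((e1 + e2) ** 2 - (px * px + py * py + pz * pz), 0.0))

        def accept_event(tracks, m_min=3.0):
            """Trigger decision: any pair above the preselectable minimum mass."""
            return any(invariant_mass(t1, t2) >= m_min
                       for t1, t2 in itertools.combinations(tracks, 2))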

  7. OLYMPUS system and development of its pre-processor

    International Nuclear Information System (INIS)

    Okamoto, Masao; Takeda, Tatsuoki; Tanaka, Masatoshi; Asai, Kiyoshi; Nakano, Koh.

    1977-08-01

    The OLYMPUS SYSTEM developed by K. V. Roberts et al. was converted and installed on the FACOM 230/75 computer system of the JAERI Computing Center. A pre-processor was also developed for the OLYMPUS SYSTEM. The OLYMPUS SYSTEM is very useful for the development, standardization and exchange of programs in thermonuclear fusion research and plasma physics. The pre-processor developed by the present authors is not only essential for the JAERI OLYMPUS SYSTEM, but also useful in the manipulation, creation and correction of program files. (auth.)

  8. A UNIX-based prototype biomedical virtual image processor

    International Nuclear Information System (INIS)

    Fahy, J.B.; Kim, Y.

    1987-01-01

    The authors have developed a multiprocess virtual image processor for the IBM PC/AT, in order to maximize image processing software portability for biomedical applications. An interprocess communication scheme, based on two-way metacode exchange, has been developed and verified for this purpose. Application programs call a device-independent image processing library, which transfers commands over a shared data bridge to one or more Autonomous Virtual Image Processors (AVIP). Each AVIP runs as a separate process in the UNIX operating system, and implements the device-independent functions on the image processor to which it corresponds. Application programs can control multiple image processors at a time, change the image processor configuration used at any time, and are completely portable among image processors for which an AVIP has been implemented. Run-time speeds have been found to be acceptable for higher-level functions, although rather slow for lower-level functions, owing to the overhead associated with sending commands and data over the shared data bridge.

  9. A General Cross-Layer Cloud Scheduling Framework for Multiple IoT Computer Tasks.

    Science.gov (United States)

    Wu, Guanlin; Bao, Weidong; Zhu, Xiaomin; Zhang, Xiongtao

    2018-05-23

    The diversity of IoT services and applications brings enormous challenges to improving the performance of scheduling multiple computer tasks in cross-layer cloud computing systems. Unfortunately, the commonly employed frameworks fail to adapt to the new patterns of the cross-layer cloud. To solve this issue, we design a new computer task scheduling framework for multiple IoT services in cross-layer cloud computing systems. Specifically, we first analyze the features of the cross-layer cloud and computer tasks. Then, we design the scheduling framework based on this analysis and present detailed models to illustrate the procedures of using the framework. With the proposed framework, the IoT services deployed in cross-layer cloud computing systems can dynamically select suitable algorithms and use resources more effectively to finish computer tasks with different objectives. Finally, algorithms are given based on the framework, and extensive experiments are given to validate its effectiveness as well as its superiority.

  10. Array processors: an introduction to their architecture, software, and applications in nuclear medicine

    International Nuclear Information System (INIS)

    King, M.A.; Doherty, P.W.; Rosenberg, R.J.; Cool, S.L.

    1983-01-01

    Array processors are "number crunchers" that dramatically enhance the processing power of nuclear medicine computer systems for applications dealing with the repetitive operations involved in digital image processing of large segments of data. The general architecture and the programming of array processors are introduced, along with some applications of array processors to the reconstruction of emission tomographic images, digital image enhancement, and functional image formation.

  11. Asymmetrical floating point array processors, their application to exploration and exploitation

    Energy Technology Data Exchange (ETDEWEB)

    Geriepy, B L

    1983-01-01

    An asymmetrical floating point array processor is a special-purpose scientific computer which operates under asymmetrical control of a host computer. Although an array processor can receive fixed point input and produce fixed point output, its primary mode of operation is floating point. The first generation of array processors was oriented towards time series information. The next generation of array processors has proved much more versatile, and their applicability ranges from petroleum reservoir simulation to speech synthesis. Array processors are becoming commonplace in mining, the primary usage being construction of grids, by usual methods or by kriging. The Australian mining community is among the world's leaders in regard to computer-assisted exploration and exploitation systems. Part of this leadership role must be providing guidance to computer vendors in regard to current and future requirements.

  12. Multiple-Choice versus Constructed-Response Tests in the Assessment of Mathematics Computation Skills.

    Science.gov (United States)

    Gadalla, Tahany M.

    The equivalence of multiple-choice (MC) and constructed-response (discrete) (CR-D) response formats as applied to mathematics computation at grade levels two to six was tested. The difference between total scores from the two response formats was tested for statistical significance, and the factor structure of items in both response formats was…

  13. An algorithm to compute a rule for division problems with multiple references

    Directory of Open Access Journals (Sweden)

    Sánchez Sánchez, Francisca J.

    2012-01-01

    In this paper we consider an extension of the classic division problem with claims: the division problem with multiple references. Hinojosa et al. (2012) provide a solution for this type of problem. The aim of this work is to extend their results by proposing an algorithm that calculates allocations based on these results. All computational details are provided in the paper.

  14. The MORPG-Based Learning System for Multiple Courses: A Case Study on Computer Science Curriculum

    Science.gov (United States)

    Liu, Kuo-Yu

    2015-01-01

    This study aimed at developing a Multiplayer Online Role Playing Game-based (MORPG) Learning system which enabled instructors to construct a game scenario and manage sharable and reusable learning content for multiple courses. It used the curriculum of "Introduction to Computer Science" as a study case to assess students' learning…

  15. The computer-aided design of a servo system as a multiple-criteria decision problem

    NARCIS (Netherlands)

    Udink ten Cate, A.J.

    1986-01-01

    This paper treats the selection of controller gains of a servo system as a multiple-criteria decision problem. In contrast to the usual optimization-based approaches to computer-aided design, inequality constraints are included in the problem as unconstrained objectives. This considerably simplifies

  16. On Combining Multiple-Instance Learning and Active Learning for Computer-Aided Detection of Tuberculosis

    NARCIS (Netherlands)

    Melendez Rodriguez, J.C.; Ginneken, B. van; Maduskar, P.; Philipsen, R.H.H.M.; Ayles, H.; Sanchez, C.I.

    2016-01-01

    The major advantage of multiple-instance learning (MIL) applied to a computer-aided detection (CAD) system is that it allows optimizing the latter with case-level labels instead of accurate lesion outlines as traditionally required for a supervised approach. As shown in previous work, a MIL-based

  17. The bit slice micro-processor 'GESPRO' as a project in the UA2 experiment

    CERN Document Server

    Becam, C; Delanghe, J; Fest, H M; Lecoq, J; Martin, H; Mencik, M; MerkeI, B; Meyer, J M; Perrin, M; Plothow, H; Rampazzo, J P; Schittly, A

    1981-01-01

    The bit slice micro-processor GESPRO is a CAMAC module plugged into a standard Elliott system crate via which it communicates as a slave with its host computer. It has full control of CAMAC as a master unit. GESPRO is a 24 bit machine with multi-mode memory addressing capacity of 64K words. The micro-processor structure uses 5 buses including pipe-line registers to mask access time and 16 interrupt levels. The micro-program memory capacity is 2K (RAM) words of 48 bits each. A special hardwired module allows floating point, as well as integer, multiplication of 24*24 bits, result in 48 bits, in about 200 ns. This micro-processor could be used in the UA2 data acquisition chain and trigger system for the following tasks: (a) online data reduction, i.e. to read DURANDAL, process the information resulting in accepting or rejecting the event; (b) readout and analysis of the accepted data; (c) preprocess the data. The UA2 version of GESPRO is under construction, programs and micro-programs are under development. Hard...

  18. The bit slice micro-processor 'GESPRO' as a project in the UA2 experiment

    International Nuclear Information System (INIS)

    Becam, C.; Bernaudin, P.; Delanghe, J.; Mencik, M.; Merkel, B.; Plothow, H.; Fest, H.M.; Lecoq, J.; Martin, H.; Meyer, J.M.

    1981-01-01

    The bit slice micro-processor GESPRO, as it is proposed for use in the UA 2 data acquisition chain and trigger system, is a CAMAC module plugged into a standard Elliott System crate via which it communicates as a slave with its host computer (ND, DEC). It has full control of CAMAC as a master unit. GESPRO is a 24 bit machine (150 ns effective cycle time) with multi-mode memory addressing capacity of 64 K words. The micro-processor structure uses 5 buses including pipe-line registers to mask access time and 16 interrupt levels. The micro-program memory capacity is 2 K (RAM) words of 48 bits each. A special hardwired module allows floating point (as well as integer) multiplication of 24 x 24 bits, result in 48 bits, in about 200 ns. This micro-processor could be used in the UA2 data acquisition chain and trigger system for the following tasks: a) online data reduction, i.e. to read DURANDAL (fast ADC's = the hardware trigger in the experiment), process the information (effective mass calculation, etc.) resulting in accepting or rejecting the event. b) readout and analysis of the accepted data (collect statistical information). c) preprocess the data (calculation of pointers, address decoding, etc.). The UA2 version of GESPRO is under construction, programs and micro-programs are under development. Hardware and software will be tested with simulated data. First results are expected in about one year from now. (orig.)

  19. Programmable level-1 trigger with 3D-Flow processor array

    International Nuclear Information System (INIS)

    Crosetto, D.

    1994-01-01

    The 3D-Flow parallel processing system is a new concept in processor architecture, system architecture, and assembly architecture. Compared to the electronics used in present systems, this approach reduces the cost and complexity of the hardware and allows easy assembly, disassembly, incremental upgrading, and maintenance of different interconnection topologies. The 3D-Flow parallel-processing system benefits high energy physics (HEP) by allowing: (1) common, less costly hardware to be used in different experiments, (2) new uses of existing installations, (3) tuning of the trigger based on the first analyzed data, and (4) selection of desired events directly from raw data. The goal of this parallel-processing architecture is to acquire multiple data in parallel (up to 100 million frames per second) and to process them at high speed, accomplishing digital filtering on the input data, pattern recognition (particle identification), data moving, and data formatting. The main features of the system are its programmability, scalability, high-speed communication, and low cost. The compactness of the 3D-Flow parallel-processing system in concert with the processor architecture allows processor interconnections to be mapped into the geometry of sensors (detectors in HEP) without large interconnection signal delay, enabling real-time pattern recognition. The overall 3D-Flow project has passed a major design review at Fermilab (reviewers included experts in computers, triggering, system assembly, and electronics).

  20. Computing NLTE Opacities -- Node Level Parallel Calculation

    Energy Technology Data Exchange (ETDEWEB)

    Holladay, Daniel [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2015-09-11

    Presentation. The goal: to produce a robust library capable of computing reasonably accurate opacities in-line, with the assumption of LTE relaxed (non-LTE). Near term: demonstrate acceleration of non-LTE opacity computation. Far term (if funded): connect to application codes with in-line capability and compute opacities. Study science problems. Use efficient algorithms that expose many levels of parallelism and utilize good memory access patterns for use on advanced architectures. Portability to multiple types of hardware including multicore processors, manycore processors such as KNL, GPUs, etc. Easily coupled to radiation hydrodynamics and thermal radiative transfer codes.

  1. Integer multiplication with overflow detection or saturation

    Energy Technology Data Exchange (ETDEWEB)

    Schulte, M.J.; Balzola, P.I.; Akkas, A.; Brocato, R.W.

    2000-01-11

    High-speed multiplication is frequently used in general-purpose and application-specific computer systems. These systems often support integer multiplication, where two n-bit integers are multiplied to produce a 2n-bit product. To prevent growth in word length, processors typically return the n least significant bits of the product and a flag that indicates whether or not overflow has occurred. Alternatively, some processors saturate results that overflow to the most positive or most negative representable number. This paper presents efficient methods for performing unsigned or two's complement integer multiplication with overflow detection or saturation. These methods have significantly less area and delay than conventional methods for integer multiplication with overflow detection and saturation.
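
    The overflow test reduces to checking whether the 2n-bit product fits in n bits. A pure-software Python sketch of the two's complement case (the paper's contribution is an efficient hardware realization, which is not reproduced here):

        def mul_overflow(a, b, n=32, saturate=False):
            """n-bit two's complement multiply -> (result, overflow flag).
            Optionally saturates to the most positive/negative representable value."""
            lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
            full = a * b                        # the exact 2n-bit product
            overflow = not (lo <= full <= hi)
            if overflow and saturate:
                return (hi if full > hi else lo), True
            trunc = full & ((1 << n) - 1)       # n least significant bits...
            if trunc > hi:
                trunc -= 1 << n                 # ...reinterpreted as signed
            return trunc, overflow

        print(mul_overflow(70000, 70000, n=32))                  # wraps, flags overflow
        print(mul_overflow(70000, 70000, n=32, saturate=True))   # clamps to 2^31 - 1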

  2. Optimal Processor Assignment for Pipeline Computations

    Science.gov (United States)

    1991-10-01

    ...the use of ratios: initially each task is assigned a processor; the remaining processors are distributed in proportion to the quantities f_i(1), 1 ≤ i ...

  3. Benchmarking NWP Kernels on Multi- and Many-core Processors

    Science.gov (United States)

    Michalakes, J.; Vachharajani, M.

    2008-12-01

    Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost-performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine-grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc., (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.

  4. Slowdown in the M/M/1 discriminatory processor-sharing queue

    NARCIS (Netherlands)

    Cheung, S.K.; Kim, Bara; Kim, Jeongsim

    2008-01-01

    We consider a queue with K job classes, Poisson arrivals, and exponentially distributed required service times in which a single processor serves according to the discriminatory processor-sharing (DPS) discipline. For this queue, we obtain the first and second moments of the slowdown, which

  5. Making CSB + -Trees Processor Conscious

    DEFF Research Database (Denmark)

    Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe

    2005-01-01

    Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose a systematic method for adapting CSB+-tree to new platforms. This work is a first step towards integrating CSB+-tree in MySQL’s heap storage manager.

  6. Java Processor Optimized for RTSJ

    Directory of Open Access Journals (Sweden)

    Tu Shiliang

    2007-01-01

    Due to the preeminent work of the real-time specification for Java (RTSJ), Java is increasingly expected to become the leading programming language for real-time systems. To provide a Java platform suitable for real-time applications, a Java processor which can directly execute Java bytecode is proposed in this paper. It provides efficient hardware support for some mechanisms specified in the RTSJ and offers a simpler programming model by ameliorating the scoped memory of the RTSJ. The worst case execution time (WCET) of the bytecodes implemented in this processor is predictable by employing the optimization method proposed in our previous work, in which all processing that interferes with predictability is handled before bytecode execution. A further advantage of this method is that it makes the implementation of the processor simpler and suited to a low-cost FPGA chip.

  7. An HDLC communication processor

    International Nuclear Information System (INIS)

    Brehmer, W.; Wawer, W.

    1981-09-01

    Systems for data acquisition and process control may comprise several intelligent stations with local computing power, each of them performing specific tasks in the control system. These stations generally are not independent of each other but are interconnected by the process being monitored or controlled. Therefore they must communicate with each other by transmitting or receiving messages which may contain instructions directed to the control system, status information and data from peripheral devices, variables which synchronize the execution of programs in the autonomous intelligent stations, and data for man-machine communication. This report describes a microprocessor based device which interfaces the I/O port of a host computer (CAMAC Dataway) to an HDLC bus system. The microprocessor (M6800) performs the HDLC protocol for a Multi-Point Unbalanced System. (orig.)

  8. Optical Array Processor: Laboratory Results

    Science.gov (United States)

    Casasent, David; Jackson, James; Vaerewyck, Gerard

    1987-01-01

    A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) is described and laboratory results on its performance in several practical engineering problems are presented. The applications include its use in the solution of a nonlinear matrix equation for optimal control and a parabolic Partial Differential Equation (PDE), the transient diffusion equation with two spatial variables. Frequency-multiplexed, analog and high accuracy non-base-two data encoding are used and discussed. A multi-processor OLAP architecture is described and partitioning and data flow issues are addressed.

  9. Fast processor for dilepton triggers

    International Nuclear Information System (INIS)

    Katsanevas, S.; Kostarakis, P.; Baltrusaitis, R.

    1983-01-01

    We describe a fast trigger processor, developed for and used in Fermilab experiment E-537, for selecting high-mass dimuon events produced by negative pions and anti-protons. The processor finds candidate tracks by matching hit information received from drift chambers and scintillation counters, and determines their momenta. Invariant masses are calculated for all possible pairs of tracks and an event is accepted if any invariant mass is greater than some preselectable minimum mass. The whole process, accomplished within 5 to 10 microseconds, achieves up to a ten-fold reduction in trigger rate

  10. List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

    Science.gov (United States)

    Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

    2014-03-01

    List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating list-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi co-processor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than for the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for multi-threading, and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to the co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with a Xeon Phi co-processor card and a top-of-the-range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging.

  11. An accurate projection algorithm for array processor based SPECT systems

    International Nuclear Information System (INIS)

    King, M.A.; Schwinger, R.B.; Cool, S.L.

    1985-01-01

    A data re-projection algorithm has been developed for use in single photon emission computed tomography (SPECT) on an array-processor-based computer system. The algorithm makes use of an accurate representation of pixel activity (uniform square pixel model of intensity distribution), and is rapidly performed due to the efficient handling of an array-based algorithm and the Fast Fourier Transform (FFT) on parallel processing hardware. The algorithm consists of using a pixel-driven nearest-neighbour projection operation to an array of subdivided projection bins. This result is then convolved with the projected uniform square pixel distribution before being compressed to original bin size. This distribution varies with projection angle and is explicitly calculated. The FFT combined with a frequency space multiplication is used instead of a spatial convolution for more rapid execution. The new algorithm was tested against other commonly used projection algorithms by comparing the accuracy of projections of a simulated transverse section of the abdomen against analytically determined projections of that transverse section. The new algorithm was found to yield comparable or better standard error and yet result in easier and more efficient implementation on parallel hardware. Applications of the algorithm include iterative reconstruction and attenuation correction schemes and evaluation of regions of interest in dynamic and gated SPECT.
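
    A condensed NumPy sketch of the described pipeline: nearest-neighbour drop into subdivided bins, FFT-based convolution with an angle-dependent pixel profile, then compression back to the original bin size. It is a simplification for illustration (a box kernel stands in for the exact projected square-pixel profile, which is trapezoidal):

        import numpy as np

        def project(image, angle, sub=4):
            ny, nx = image.shape
            ys, xs = np.mgrid[0:ny, 0:nx]
            # signed distance of each pixel centre along the projection direction
            t = (xs - nx / 2) * np.cos(angle) + (ys - ny / 2) * np.sin(angle)
            nbins = int(np.ceil(np.hypot(nx, ny))) * sub
            idx = np.clip(np.round(t * sub + nbins / 2).astype(int), 0, nbins - 1)
            fine = np.bincount(idx.ravel(), weights=image.ravel(), minlength=nbins)
            # projected width of a unit square pixel varies with angle
            w = max(1, int(round((abs(np.cos(angle)) + abs(np.sin(angle))) * sub)))
            kernel = np.zeros(nbins)
            kernel[:w] = 1.0 / w
            kernel = np.roll(kernel, -(w // 2))          # centre the box kernel
            smooth = np.real(np.fft.ifft(np.fft.fft(fine) * np.fft.fft(kernel)))
            return smooth.reshape(-1, sub).sum(axis=1)   # compress to original bins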

  12. Impacts of the IBM Cell Processor to Support Climate Models

    Science.gov (United States)

    Zhou, Shujia; Duffy, Daniel; Clune, Tom; Suarez, Max; Williams, Samuel; Halem, Milt

    2008-01-01

    NASA is interested in the performance and cost benefits of adapting its applications to the IBM Cell processor. However, its 256KB local memory per SPE and the new communication mechanism make it very challenging to port an application. We selected the solar radiation component of the NASA GEOS-5 climate model, which: (1) is representative of column physics (approximately 50% of computational time), (2) has a high computational load relative to transferring data from and to main memory, (3) performs independent calculations across multiple columns. We converted the baseline code (single-precision, Fortran) to C and ported it with manually SIMDizing 4 independent columns and found that a Cell with 8 SPEs can process 2274 columns per second. Compared with the baseline results, the Cell is approximately 5.2X, approximately 8.2X, approximately 15.1X faster than a core on Intel Woodcrest, Dempsey, and Itanium2, respectively. We believe this dramatic performance improvement makes a hybrid cluster with Cell and traditional nodes competitive.

  13. SAPIENS: Spreading Activation Processor for Information Encoded in Network Structures. Technical Report No. 296.

    Science.gov (United States)

    Ortony, Andrew; Radin, Dean I.

    The product of researchers' efforts to develop a computer processor which distinguishes between relevant and irrelevant information in the database, Spreading Activation Processor for Information Encoded in Network Structures (SAPIENS) exhibits (1) context sensitivity, (2) efficiency, (3) decreasing activation over time, (4) summation of…

  14. Processors for wavelet analysis and synthesis: NIFS and TI-C80 MVP

    Science.gov (United States)

    Brooks, Geoffrey W.

    1996-03-01

    Two processors are considered for image quadrature mirror filtering (QMF). The neuromorphic infrared focal-plane sensor (NIFS) is an existing prototype analog processor offering high speed spatio-temporal Gaussian filtering, which could be used for the QMF low-pass function, and difference of Gaussian filtering, which could be used for the QMF high-pass function. Although not designed specifically for wavelet analysis, the biologically-inspired system accomplishes the most computationally intensive part of QMF processing. The Texas Instruments (TI) TMS320C80 Multimedia Video Processor (MVP) is a 32-bit RISC master processor with four advanced digital signal processors (DSPs) on a single chip. Algorithm partitioning, memory management and other issues are considered for optimal performance. This paper presents these considerations with simulated results leading to processor implementation of high-speed QMF analysis and synthesis.

  15. Multi-processor developments in the United States for future high energy physics experiments and accelerators

    International Nuclear Information System (INIS)

    Gaines, I.

    1988-03-01

    The use of multi-processors for analysis and high-level triggering in High Energy Physics experiments, pioneered by the early emulator systems, has reached maturity, in particular with the multiple microprocessor systems in use at Fermilab. It is widely acknowledged that such systems will fulfill the major portion of the computing needs of future large experiments. Recent developments at Fermilab's Advanced Computer Program will make such systems even more powerful, cost-effective, and easier to use than they are at present. The next generation of microprocessors, already available, will provide CPU power of about one VAX 780 equivalent/$300, while supporting most VMS FORTRAN extensions and large (>8MB) amounts of memory. Low-cost, high-density mass storage devices (based on video tape cartridge technology) will allow parallel I/O to remove potential I/O bottlenecks in systems of over 1000 VAX equivalent processors. New interconnection schemes and system software will allow more flexible topologies and extremely high data bandwidth, especially for on-line systems. This talk will summarize the work at the Advanced Computer Program and the rest of the US in this field. 3 refs., 4 figs

  16. A Software Implementation of a Satellite Interface Message Processor.

    Science.gov (United States)

    Eastwood, Margaret A.; Eastwood, Lester F., Jr.

    A design for network control software for a computer network is described in which some nodes are linked by a communications satellite channel. It is assumed that the network has an ARPANET-like configuration; that is, that specialized processors at each node are responsible for message switching and network control. The purpose of the control…

  17. Sojourn time asymptotics in processor-sharing queues

    NARCIS (Netherlands)

    Borst, S.C.; Núñez Queija, R.; Zwart, B.

    2006-01-01

    Over the past few decades, the Processor-Sharing (PS) discipline has attracted a great deal of attention in the queueing literature. While the PS paradigm emerged in the sixties as an idealization of round-robin scheduling in time-shared computer systems, it has recently captured renewed interest as

  18. Elementary function calculation programs for the central processor-6

    International Nuclear Information System (INIS)

    Dobrolyubov, L.V.; Ovcharenko, G.A.; Potapova, V.A.

    1976-01-01

    Subprograms for the calculation of elementary functions are given for the central processor (CP AS-6). A procedure is described to obtain calculation formulae which represent the elementary functions as polynomials. Standard programs for random numbers are considered. All the programs described are based upon the algorithms of the respective programs for the BESM computer.

  19. Optimization of Particle-in-Cell Codes on RISC Processors

    Science.gov (United States)

    Decyk, Viktor K.; Karmesin, Steve Roy; Boer, Aeint de; Liewer, Paulette C.

    1996-01-01

    General strategies are developed to optimize particle-in-cell codes written in Fortran for RISC processors which are commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve the efficiency of arithmetic pipelines.
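
    One such cache strategy, keeping particles that touch the same grid cells adjacent in memory, can be shown in a few lines. A hedged NumPy sketch (array names and the sorting criterion are illustrative, not the paper's exact scheme):

        import numpy as np

        def reorder_by_cell(x, y, dx, dy, ncy):
            """Sort particle coordinate arrays by occupied grid cell so that
            deposits to the same grid points hit the same cache lines."""
            ix = (x / dx).astype(int)
            iy = (y / dy).astype(int)
            order = np.argsort(ix * ncy + iy, kind="stable")
            return x[order], y[order]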

  20. Multiple exciton generation in chiral carbon nanotubes: Density functional theory based computation

    Science.gov (United States)

    Kryjevski, Andrei; Mihaylov, Deyan; Kilina, Svetlana; Kilin, Dmitri

    2017-10-01

    We use a Boltzmann transport equation (BE) to study time evolution of a photo-excited state in a nanoparticle including phonon-mediated exciton relaxation and the multiple exciton generation (MEG) processes, such as exciton-to-biexciton multiplication and biexciton-to-exciton recombination. BE collision integrals are computed using Kadanoff-Baym-Keldysh many-body perturbation theory based on density functional theory simulations, including exciton effects. We compute internal quantum efficiency (QE), which is the number of excitons generated from an absorbed photon in the course of the relaxation. We apply this approach to chiral single-wall carbon nanotubes (SWCNTs), such as (6,2) and (6,5). We predict efficient MEG in the (6,2) and (6,5) SWCNTs within the solar spectrum range starting at the 2Eg energy threshold and with QE reaching ~1.6 at about 3Eg, where Eg is the electronic gap.

  1. Adaptation of a Fault-Tolerant Fpga-Based Launch Sequencer as a Cubesat Payload Processor

    Science.gov (United States)

    2014-06-01

    32-bit reduced instruction set computing (RISC) processor that interfaces with a universal asynchronous receiver/transmitter (UART) for a field... test a fault-tolerant reduced instruction set computer processor running a subset of the microprocessor without interlocked pipeline stages (MIPS) instruction...

  2. Wavelength-encoded OCDMA system using opto-VLSI processors.

    Science.gov (United States)

    Aljada, Muhsen; Alameh, Kamal

    2007-07-01

    We propose and experimentally demonstrate a 2.5 Gbits/sper user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.

  3. COMPUTING: International symposium

    International Nuclear Information System (INIS)

    Anon.

    1984-01-01

    Recent Developments in Computing, Processor, and Software Research for High Energy Physics, a four-day international symposium, was held in Guanajuato, Mexico, from 8-11 May, with 112 attendees from nine countries. The symposium was the third in a series of meetings exploring activities in leading-edge computing technology, in both processor and software research, and their effects on high energy physics. Topics covered included fixed-target on- and off-line reconstruction processors; lattice gauge and general theoretical processors and computing; multiprocessor projects; electron-positron collider on- and off-line reconstruction processors; the state of the art in university computer science and industry; software research; accelerator processors; and proton-antiproton collider on- and off-line reconstruction processors.

  4. Computing for particle physics. Report of the HEPAP subpanel on computer needs for the next decade

    International Nuclear Information System (INIS)

    1985-08-01

    The increasing importance of computation to the future progress in high energy physics is documented. Experimental computing demands are analyzed for the near future (four to ten years). The computer industry's plans for the near term and long term are surveyed as they relate to the solution of high energy physics computing problems. This survey includes large processors and the future role of alternatives to commercial mainframes. The needs for low speed and high speed networking are assessed, and the need for an integrated network for high energy physics is evaluated. Software requirements are analyzed. The role to be played by multiple processor systems is examined. The computing needs associated with elementary particle theory are briefly summarized. Computing needs associated with the Superconducting Super Collider are analyzed. Recommendations are offered for expanding computing capabilities in high energy physics and for networking between the laboratories

  5. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    OpenAIRE

    Hristov Ivan; Goranov Goran; Hristova Radoslava

    2018-01-01

    We consider a standing wave simulation that is interesting from a computational point of view, obtained by solving coupled 2D perturbed sine-Gordon equations. We make an OpenMP realization which exploits both the thread and SIMD levels of parallelism. We test the OpenMP program on two energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named “Ivy Bridge-EP”) in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named “Knights Landing”, KNL). The results show 2 times better per...
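
    For reference, the core of such a simulation is a stencil update that vectorizes naturally. A minimal sketch of one leapfrog step for the plain 2D sine-Gordon equation u_tt = u_xx + u_yy - sin(u); the coupling and perturbation terms of the paper are omitted, and NumPy array slicing stands in for the OpenMP thread/SIMD loops:

        import numpy as np

        def step(u_prev, u, dt, dx):
            """One leapfrog step; boundary values are held fixed."""
            lap = np.zeros_like(u)
            lap[1:-1, 1:-1] = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:]
                               + u[1:-1, :-2] - 4.0 * u[1:-1, 1:-1]) / dx ** 2
            u_next = 2.0 * u - u_prev + dt ** 2 * (lap - np.sin(u))
            return u, u_next      # the new (u_prev, u) pair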

  6. Matrix preconditioning: a robust operation for optical linear algebra processors.

    Science.gov (United States)

    Ghosh, A; Paparao, P

    1987-07-15

    Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.
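
    The effect is easy to demonstrate with the simplest preconditioner, symmetric diagonal (Jacobi) scaling, which is also naturally parallel since every entry is scaled independently. A sketch under that assumption (the paper's optical parallel algorithm is not reproduced here):

        import numpy as np

        def jacobi_precondition(A, b):
            """Return D^{-1/2} A D^{-1/2} and D^{-1/2} b for D = diag(A)."""
            d = 1.0 / np.sqrt(np.abs(np.diag(A)))
            return (A * d).T * d, b * d, d   # solve for y, then recover x = d * y

        A = np.array([[100.0, 1.0], [1.0, 1.0]])
        A_pre, b_pre, d = jacobi_precondition(A, np.ones(2))
        print(np.linalg.cond(A), "->", np.linalg.cond(A_pre))   # ~101 -> ~1.2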

  7. Ring-array processor distribution topology for optical interconnects

    Science.gov (United States)

    Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

    1992-01-01

    The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.

  8. Parallel Processor for 3D Recovery from Optical Flow

    Directory of Open Access Journals (Sweden)

    Jose Hugo Barron-Zambrano

    2009-01-01

    3D recovery from motion has received a major effort in computer vision systems in recent years. The main problem lies in the number of operations and memory accesses to be performed by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main feature is the maximum reuse of data and the low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.

  9. The associative memory system for the FTK processor at ATLAS

    CERN Document Server

    Magalotti, D; The ATLAS collaboration; Donati, S; Luciano, P; Piendibene, M; Giannetti, P; Lanza, A; Verzellesi, G; Sakellariou, Andreas; Billereau, W; Combe, J M

    2014-01-01

    In high energy physics experiments, the most interesting processes are very rare and hidden in an extremely large level of background. As the experiment complexity, accelerator backgrounds, and instantaneous luminosity increase, more effective and accurate data selection techniques are needed. The Fast TracKer processor (FTK) is a real time tracking processor designed for the ATLAS trigger upgrade. The FTK core is the Associative Memory system. It provides massive computing power to minimize the processing time of complex tracking algorithms executed online. This paper reports on the results and performance of a new prototype of Associative Memory system.
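
    The core operation is conceptually simple: compare incoming coarse detector hits against a large bank of precomputed track patterns ("roads"), which the hardware does for every stored pattern in parallel. A much-simplified sequential Python sketch (real AM chips also tolerate missing layers; the bank below is hypothetical):

        def match_roads(hits, bank):
            """Return the roads whose every (layer, bin) pattern hit is present."""
            return [road for road, pattern in bank.items()
                    if all(p in hits for p in pattern)]

        bank = {0: [(0, 3), (1, 4), (2, 5)],    # straight-ish candidate track
                1: [(0, 3), (1, 5), (2, 7)]}    # more curved candidate track
        print(match_roads({(0, 3), (1, 4), (2, 5), (2, 9)}, bank))   # -> [0]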

  10. Monte Carlo simulations on SIMD computer architectures

    International Nuclear Information System (INIS)

    Burmester, C.P.; Gronsky, R.; Wille, L.T.

    1992-01-01

    In this paper algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique to single instruction multiple data (SIMD) computer architectures are presented. In particular, implementation of the Ising model with nearest, next nearest, and long range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented, including lattice partitioning and the use of processor array spanning tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carlo updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures.
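
    The geometric parallelism mentioned above is commonly realized as a checkerboard decomposition: sites of one sublattice share no neighbours, so an entire sublattice can be updated in one SIMD step. A NumPy sketch for the nearest-neighbour Ising model (parameters illustrative, periodic boundaries assumed):

        import numpy as np

        rng = np.random.default_rng(0)

        def checkerboard_sweep(spins, beta, J=1.0):
            """One Metropolis sweep, updating each checkerboard sublattice at once."""
            ii, jj = np.indices(spins.shape)
            for parity in (0, 1):
                nbr = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
                       + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
                dE = 2.0 * J * spins * nbr       # energy cost of flipping each spin
                accept = rng.random(spins.shape) < np.exp(-beta * np.clip(dE, 0, None))
                flip = ((ii + jj) % 2 == parity) & accept
                spins = np.where(flip, -spins, spins)
            return spins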

  11. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

    Directory of Open Access Journals (Sweden)

    Fan Zhang

    2016-04-01

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO). However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX) method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate.

  12. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.

    Science.gov (United States)

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-04-07

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO). However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX) method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate.

  13. Efficient computation of the joint sample frequency spectra for multiple populations.

    Science.gov (United States)

    Kamm, John A; Terhorst, Jonathan; Song, Yun S

    2017-01-01

    A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study we demonstrate our improvements to numerical stability and computational complexity.
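
    As a point of reference for what momi generalizes: for a single panmictic population of constant size, the expected SFS has the textbook closed form E[xi_i] = theta / i, computed by the sketch below. The joint multi-population spectra that momi handles require far more machinery than this baseline:

        def expected_sfs(n, theta=1.0):
            """Expected counts of variants carried by i of n samples, i = 1..n-1."""
            return [theta / i for i in range(1, n)]

        print(expected_sfs(5))   # [1.0, 0.5, 0.333..., 0.25]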

  14. Distributed Factorization Computation on Multiple Volunteered Mobile Resource to Break RSA Key

    Science.gov (United States)

    Jaya, I.; Hardi, S. M.; Tarigan, J. T.; Zamzami, E. M.; Sihombing, P.

    2017-01-01

    Like other asymmetric encryption schemes, RSA can be cracked by a series of mathematical calculations: the private key used to decrypt the message can be computed from the public key. However, finding the private key may require a massive amount of calculation. In this paper, we propose a method to perform distributed computing to calculate RSA’s private key. The proposed method uses multiple volunteered mobile devices to contribute during the calculation process. Our objective is to demonstrate how the use of volunteered computing on mobile devices may be a feasible option to reduce the time required to break a weak RSA encryption and to observe the behavior and running time of the application on mobile devices.
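
    A toy sketch of the range-partitioning idea, with local processes standing in for volunteered devices. Trial division is used purely for illustration; the paper's exact algorithm is not specified here, and real RSA moduli are attacked with far stronger methods such as the number field sieve:

        from concurrent.futures import ProcessPoolExecutor
        import math

        def scan(args):
            """One worker checks one slice of odd candidate divisors."""
            n, lo, hi = args
            for d in range(lo | 1, hi, 2):
                if n % d == 0:
                    return d
            return None

        def distributed_factor(n, workers=4):
            """Split the search space [3, sqrt(n)] among workers."""
            if n % 2 == 0:
                return 2
            limit = math.isqrt(n) + 1
            step = max((limit - 3) // workers + 1, 1)
            chunks = [(n, lo, min(lo + step, limit)) for lo in range(3, limit, step)]
            with ProcessPoolExecutor(workers) as pool:
                for hit in pool.map(scan, chunks):
                    if hit:
                        return hit
            return n   # no divisor found: n is prime

        if __name__ == "__main__":
            print(distributed_factor(10403))   # -> 101, since 10403 = 101 * 103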

  15. VON WISPR Family Processors: Volume 1

    National Research Council Canada - National Science Library

    Wagstaff, Ronald

    1997-01-01

    ...) and the background noise they are embedded in. Processors utilizing those fluctuations such as the von WISPR Family Processors discussed herein, are methods or algorithms that preferentially attenuate the fluctuating signals and noise...

  16. Processor farming in two-level analysis of historical bridge

    Science.gov (United States)

    Krejčí, T.; Kruis, J.; Koudelka, T.; Šejnoha, M.

    2017-11-01

    This contribution presents a processor farming method in connection with a multi-scale analysis. In this method, each macro-scale integration point or each finite element is connected with a certain meso-scale problem represented by an appropriate representative volume element (RVE). The solution of a meso-scale problem then provides the effective parameters needed on the macro-scale. Such an analysis is suitable for parallel computing because the meso-scale problems can be distributed among many processors. The application of the processor farming method to a real-world masonry structure is illustrated by an analysis of Charles Bridge in Prague. The three-dimensional numerical model simulates the coupled heat and moisture transfer of one half of arch No. 3, and it is part of a complex hygro-thermo-mechanical analysis which has been developed to determine the influence of climatic loading on the current state of the bridge.

  17. Deterministic chaos in the processor load

    International Nuclear Information System (INIS)

    Halbiniak, Zbigniew; Jozwiak, Ireneusz J.

    2007-01-01

    In this article we present the results of research whose purpose was to identify the phenomenon of deterministic chaos in the processor load. We analysed the time series of the processor load during efficiency tests of database software. Our research was done on a Sparc Alpha processor working on the UNIX Sun Solaris 5.7 operating system. The conducted analyses proved the presence of the deterministic chaos phenomenon in the processor load in this particular case

  18. Scientific programming on massively parallel processor CP-PACS

    International Nuclear Information System (INIS)

    Boku, Taisuke

    1998-01-01

    The massively parallel processor CP-PACS targets a wide range of problems in computational physics, and its architecture has been designed to support varied numerical processing. This report outlines the CP-PACS, gives a programming example based on the Kernel CG benchmark from NAS Parallel Benchmarks version 1, and describes the two distinctive features of the CP-PACS architecture: the pseudo-vector processing mechanism and the parallel-processing tuning of scientific and technical computation utilizing the three-dimensional hyper-crossbar network. The CP-PACS uses PUs based on a RISC processor augmented with a pseudo-vector processing unit; pseudo-vector processing is realized as loop processing by scalar instructions. The features of the network connecting the PUs are explained, and the algorithm of the NPB version 1 Kernel CG is shown. The part of the main loop that takes the most processing time is the matrix-vector product (matvec), and its parallel processing is explained, along with the determination of the CPU computation time. As performance evaluation, the execution times, the short-vector processing of the pseudo-vector processor based on a sliding window, and a comparison with other parallel computers are reported. (K.I.)

  19. A computational procedure for finding multiple solutions of convective heat transfer equations

    International Nuclear Information System (INIS)

    Mishra, S; DebRoy, T

    2005-01-01

    In recent years numerical solutions of the convective heat transfer equations have provided significant insight into complex materials processing operations. However, these computational methods suffer from two major shortcomings. First, these procedures are designed to calculate temperature fields and cooling rates as output, and the unidirectional structure of these solutions precludes specification of these variables as input even when their desired values are known. Second, and more important, these procedures cannot determine multiple pathways or multiple sets of input variables to achieve a particular output from the convective heat transfer equations. Here we propose a new method that overcomes the aforementioned shortcomings of the commonly used solutions of the convective heat transfer equations. The procedure combines the conventional numerical solution methods with a real number based genetic algorithm (GA) to achieve bi-directionality, i.e. the ability to calculate the required input variables to achieve a specific output such as temperature field or cooling rate. More important, the ability of the GA to find a population of solutions enables this procedure to search for and find multiple sets of input variables, all of which can lead to the desired specific output. The proposed computational procedure has been applied to convective heat transfer in a liquid layer locally heated on its free surface by an electric arc, where various sets of input variables are computed to achieve a specific fusion zone geometry defined by an equilibrium temperature. Good agreement is achieved between the model predictions and the independent experimental results, indicating significant promise for the application of this procedure in finding multiple solutions of convective heat transfer equations.
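
    The bi-directional idea can be sketched compactly: a real-coded GA evolves a population of candidate inputs, scoring each by how closely a forward model reproduces the desired output, so several distinct input sets can survive. The forward model below is a made-up stand-in for the convective heat transfer solver, and all constants are illustrative:

        import random

        random.seed(1)

        def forward(q, v):
            """Toy surrogate: peak temperature from heat input q and travel speed v."""
            return 300.0 + 8.0 * q / (1.0 + 0.5 * v)

        def genetic_search(target, pop=40, gens=60, tol=1.0):
            sols = [(random.uniform(0, 100), random.uniform(0, 10)) for _ in range(pop)]
            for _ in range(gens):
                sols.sort(key=lambda s: abs(forward(*s) - target))
                elite = sols[: pop // 4]
                kids = []
                while len(kids) < pop - len(elite):
                    (q1, v1), (q2, v2) = random.sample(elite, 2)
                    a = random.random()                                # blend crossover
                    q = max(a * q1 + (1 - a) * q2 + random.gauss(0, 1.0), 0.0)
                    v = max(a * v1 + (1 - a) * v2 + random.gauss(0, 0.1), 0.0)
                    kids.append((q, v))                                # plus mutation
                sols = elite + kids
            return [s for s in sols if abs(forward(*s) - target) < tol]

        print(len(genetic_search(900.0)), "distinct (q, v) inputs reach ~900 K")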

  1. JPP: A Java Pre-Processor

    OpenAIRE

    Kiniry, Joseph R.; Cheong, Elaine

    1998-01-01

    The Java Pre-Processor, or JPP for short, is a parsing pre-processor for the Java programming language. Unlike its namesake (the C/C++ Pre-Processor, cpp), JPP provides functionality above and beyond simple textual substitution. JPP's capabilities include code beautification, code standard conformance checking, class and interface specification and testing, and documentation generation.

  2. Coordinated Energy Management in Heterogeneous Processors

    Directory of Open Access Journals (Sweden)

    Indrani Paul

    2014-01-01

    This paper examines energy management in a heterogeneous processor consisting of an integrated CPU–GPU for high-performance computing (HPC) applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types – a new and less understood problem. We examine the intra-node CPU–GPU frequency sensitivity of HPC applications on tightly coupled CPU–GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU–GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves measured average energy-delay squared (ED2) product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.

  3. High-Performance Computing Paradigm and Infrastructure

    CERN Document Server

    Yang, Laurence T

    2006-01-01

    With hyperthreading in Intel processors, hypertransport links in next-generation AMD processors, multi-core silicon in today's high-end microprocessors from IBM, and emerging grid computing, parallel and distributed computers have moved into the mainstream.

  4. HTGR core seismic analysis using an array processor

    International Nuclear Information System (INIS)

    Shatoff, H.; Charman, C.M.

    1983-01-01

    A Floating Point Systems array processor performs nonlinear dynamic analysis of the high-temperature gas-cooled reactor (HTGR) core with significant time and cost savings. The graphite HTGR core consists of approximately 8000 blocks of various shapes which are subject to motion and impact during a seismic event. Two-dimensional computer programs (CRUNCH2D, MCOCO) can perform explicit step-by-step dynamic analyses of up to 600 blocks for time-history motions. However, use of the two-dimensional codes was limited by the large cost and run times required. Three-dimensional analysis of the entire core, or even a large part of it, had been considered totally impractical. Because of the needs of the HTGR core seismic program, a Floating Point Systems array processor was used to enhance the computing performance of the two-dimensional core seismic computer programs, MCOCO and CRUNCH2D. This effort began by converting the computational algorithms used in the codes to a form which takes maximum advantage of the parallel and pipeline processors offered by the architecture of the Floating Point Systems array processor. The subsequent conversion of the vectorized FORTRAN coding to the array processor required a significant programming effort to make the system work on the General Atomic (GA) UNIVAC 1100/82 host. These efforts were quite rewarding, however, since the cost of running the codes has been reduced approximately 50-fold and the run time threefold. Core seismic analysis with large two-dimensional models has now become routine, and extension to three-dimensional analysis is feasible. These codes simulate the one-fifth-scale full-array HTGR core model. This paper compares the analysis with the test results for sine-sweep motion.

  5. FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point Mutants.

    Science.gov (United States)

    Bednar, David; Beerens, Koen; Sebestova, Eva; Bendl, Jaroslav; Khare, Sagar; Chaloupkova, Radka; Prokop, Zbynek; Brezovsky, Jan; Baker, David; Damborsky, Jiri

    2015-11-01

    There is great interest in increasing proteins' stability to enhance their utility as biocatalysts, therapeutics, diagnostics and nanomaterials. Directed evolution is a powerful, but experimentally strenuous approach. Computational methods offer attractive alternatives. However, due to the limited reliability of predictions and potentially antagonistic effects of substitutions, only single-point mutations are usually predicted in silico, experimentally verified and then recombined in multiple-point mutants. Thus, substantial screening is still required. Here we present FireProt, a robust computational strategy for predicting highly stable multiple-point mutants that combines energy- and evolution-based approaches with smart filtering to identify additive stabilizing mutations. FireProt's reliability and applicability was demonstrated by validating its predictions against 656 mutations from the ProTherm database. We demonstrate that thermostability of the model enzymes haloalkane dehalogenase DhaA and γ-hexachlorocyclohexane dehydrochlorinase LinA can be substantially increased (ΔTm = 24°C and 21°C) by constructing and characterizing only a handful of multiple-point mutants. FireProt can be applied to any protein for which a tertiary structure and homologous sequences are available, and will facilitate the rapid development of robust proteins for biomedical and biotechnological applications.

  6. FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point Mutants.

    Directory of Open Access Journals (Sweden)

    David Bednar

    2015-11-01

    There is great interest in increasing proteins' stability to enhance their utility as biocatalysts, therapeutics, diagnostics and nanomaterials. Directed evolution is a powerful, but experimentally strenuous approach. Computational methods offer attractive alternatives. However, due to the limited reliability of predictions and potentially antagonistic effects of substitutions, only single-point mutations are usually predicted in silico, experimentally verified and then recombined in multiple-point mutants. Thus, substantial screening is still required. Here we present FireProt, a robust computational strategy for predicting highly stable multiple-point mutants that combines energy- and evolution-based approaches with smart filtering to identify additive stabilizing mutations. FireProt's reliability and applicability was demonstrated by validating its predictions against 656 mutations from the ProTherm database. We demonstrate that thermostability of the model enzymes haloalkane dehalogenase DhaA and γ-hexachlorocyclohexane dehydrochlorinase LinA can be substantially increased (ΔTm = 24°C and 21°C) by constructing and characterizing only a handful of multiple-point mutants. FireProt can be applied to any protein for which a tertiary structure and homologous sequences are available, and will facilitate the rapid development of robust proteins for biomedical and biotechnological applications.

  7. Refficientlib: an efficient load-rebalanced adaptive mesh refinement algorithm for high-performance computational physics meshes

    OpenAIRE

    Baiges Aznar, Joan; Bayona Roa, Camilo Andrés

    2017-01-01

    In this paper we present a novel algorithm for adaptive mesh refinement in computational physics meshes in a distributed memory parallel setting. The proposed method is developed for nodally based parallel domain partitions where the nodes of the mesh belong to a single processor, whereas the elements can belong to multiple processors. Some of the main features of the algorithm presented in this paper a...

  8. Computed a multiple band metamaterial absorber and its application based on the figure of merit value

    Science.gov (United States)

    Chen, Chao; Sheng, Yuping; Jun, Wang

    2018-01-01

    A high-performance multiple-band metamaterial absorber, composed of two kinds of separated metal-particle sub-structures, is designed and computed with the Ansoft HFSS 10.0 software. The multiple-band absorption of the metamaterial absorber arises from the resonance of localized surface plasmon (LSP) modes excited near the edges of the metal particles. The damping constant of the gold layer is optimized to obtain a near-perfect absorption rate, and four kinds of dielectric layers are computed to achieve the best absorption performance. The absorption is further enhanced by optimizing the structural parameters (R = 75 nm, w = 80 nm). Moreover, a perfect absorption band is achieved through the plasmonic hybridization between LSP modes. The designed metamaterial absorber is highly sensitive to changes in the refractive index of a liquid, and a liquid refractive-index sensing strategy is proposed based on the computed figure of merit (FOM) of the absorber. High FOM values (116, 111, and 108) are achieved with three liquids (methanol, carbon tetrachloride, and carbon disulfide).
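
    The abstract quotes FOM values without stating the formula. A common definition for plasmonic refractive-index sensors, assumed here for illustration, is FOM = S/FWHM, where the sensitivity S is the resonance shift per refractive-index unit; the numbers below are made up for the sketch, not taken from the paper.

    # FOM = sensitivity / FWHM for a resonance-based refractive-index sensor
    # (assumed definition; illustrative numbers, not the paper's data).
    def fom(shift_nm, delta_n, fwhm_nm):
        sensitivity = shift_nm / delta_n    # nm per refractive-index unit
        return sensitivity / fwhm_nm        # in RIU^-1

    print(fom(shift_nm=23.2, delta_n=0.02, fwhm_nm=10.0))   # -> 116.0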

  9. Data driven processor 'Vertex Trigger' for B experiments

    International Nuclear Information System (INIS)

    Hartouni, E.P.

    1993-01-01

    Data Driven Processors (DDPs) are specialized computation engines configured to solve specific numerical problems, such as vertex reconstruction. The architecture of the DDP which is the subject of this talk was designed and implemented by W. Sippach and B.C. Knapp at Nevis Lab. in the early 1980's. This particular implementation allows multiple parallel streams of data to provide input to a heterogeneous collection of simple operators whose interconnections form an algorithm. The local data flow control allows this device to execute algorithms extremely quickly provided that care is taken in the layout of the algorithm. I/O rates of several hundred megabytes/second are routinely achieved, thus making DDPs attractive candidates for complex online calculations. The original question was 'can a DDP reconstruct tracks in a Silicon Vertex Detector, find events with a separated vertex and do it fast enough to be used as an online trigger?' Restating this inquiry as three questions and describing the answers to those questions is the subject of this talk. The three specific questions are: (1) Can an algorithm be found which reconstructs tracks in a planar geometry and no magnetic field? (2) Can separated vertices be recognized in some way? (3) Can the algorithm be implemented in the Nevis-UMass DDP and execute in 10-20 μs?

  10. Using multiple metaphors and multimodalities as a semiotic resource when teaching year 2 students computational strategies

    Science.gov (United States)

    Mildenhall, Paula; Sherriff, Barbara

    2017-06-01

    Recent research indicates that using multimodal learning experiences can be effective in teaching mathematics. Using a social semiotic lens within a participationist framework, this paper reports on a professional learning collaboration with a primary school teacher designed to explore the use of metaphors and modalities in mathematics instruction. This video case study was conducted in a year 2 classroom over two terms, with the focus on building children's understanding of computational strategies. The findings revealed that the teacher was able to successfully plan both multimodal and multiple metaphor learning experiences that acted as semiotic resources to support the children's understanding of abstract mathematics. The study also led to implications for teaching when using multiple metaphors and multimodalities.

  11. Computing multiple periodic solutions of nonlinear vibration problems using the harmonic balance method and Groebner bases

    Science.gov (United States)

    Grolet, Aurelien; Thouverez, Fabrice

    2015-02-01

    This paper is devoted to the study of vibration of mechanical systems with geometric nonlinearities. The harmonic balance method is used to derive systems of polynomial equations whose solutions give the frequency components of the possible steady states. Groebner basis methods are used for computing all solutions of the polynomial systems. This approach makes it possible to reduce the complete system to a unique polynomial equation in one variable driving all solutions of the problem. In addition, in order to decrease the number of variables, we propose to first work on the undamped system and to recover solutions of the damped system using a continuation on the damping parameter. The search for multiple solutions is illustrated on a simple system, where the influence of the number of retained harmonics is studied. Finally, the procedure is applied to a simple cyclic system and we give a representation of the multiple states versus frequency.
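
    As a minimal illustration of the harmonic balance plus Groebner basis workflow, the sketch below applies a single-harmonic balance to an undamped Duffing oscillator -- a stand-in for the paper's systems, with arbitrary parameter values -- and enumerates the multiple real steady-state amplitudes with SymPy.

    from sympy import symbols, groebner, Rational, real_roots, Poly

    a = symbols('a', real=True)
    eps, F, omega = Rational(1, 10), Rational(1, 2), Rational(3, 2)

    # Duffing oscillator x'' + x + eps*x**3 = F*cos(omega*t) with the
    # one-term ansatz x = a*cos(omega*t); balancing the cos(omega*t)
    # terms leaves a single polynomial in the amplitude a.
    balance = (1 - omega**2) * a + Rational(3, 4) * eps * a**3 - F

    # With one unknown the Groebner basis is the polynomial itself; for
    # several harmonics or degrees of freedom the same call reduces the
    # coupled system toward one driving polynomial, as described above.
    gb = groebner([balance], a, order='lex')

    # Each real root is a distinct steady-state amplitude at this
    # frequency: the multiple solutions the method enumerates.
    for r in real_roots(Poly(gb.exprs[0], a)):
        print(r.evalf(6))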

  12. A Lévy HJM Multiple-Curve Model with Application to CVA Computation

    DEFF Research Database (Denmark)

    Crépey, Stéphane; Grbac, Zorana; Ngor, Nathalie

    2015-01-01

    … the calibration to OTM swaptions guaranteeing that the model correctly captures volatility smile effects, and the calibration to co-terminal ATM swaptions ensuring an appropriate term structure of the volatility in the model. To account for counterparty risk and funding issues, we use the calibrated multiple-curve model as an underlying model for CVA computation. We follow a reduced-form methodology through which the problem of pricing the counterparty risk and funding costs can be reduced to a pre-default Markovian BSDE, or an equivalent semi-linear PDE. As an illustration, we study the case of a basis swap and a related swaption, for which we compute the counterparty risk and funding adjustments…

  13. The display of multiple images derived from emission computed assisted tomography (ECAT)

    International Nuclear Information System (INIS)

    Jackson, P.C.; Davies, E.R.; Goddard, P.R.; Wilde, R.P.H.

    1983-01-01

    In emission computed assisted tomography, a technique has been developed to display the multiple sections of an organ within a single image, such that three dimensional appreciation of the organ can be obtained, whilst also preserving functional information. The technique when tested on phantoms showed no obvious deterioration in resolution and when used clinically gave satisfactory visual results. Such a method should allow easier appreciation of the extent of a lesion through an organ and thus allow dimensions to be obtained by direct measurement. (U.K.)

  14. Rational calculation accuracy in acousto-optical matrix-vector processor

    Science.gov (United States)

    Oparin, V. V.; Tigin, Dmitry V.

    1994-01-01

    The high speed of parallel computations for a comparatively small-size processor and acceptable power consumption make the use of an acousto-optic matrix-vector multiplier (AOMVM) attractive for processing large amounts of information in real time. The limited accuracy of computations is an essential disadvantage of such a processor. Reduced accuracy requirements, however, allow for considerable simplification of the AOMVM architecture and a reduction of the demands on its components.

  15. Processor-in-memory-and-storage architecture

    Science.gov (United States)

    DeBenedictis, Erik

    2018-01-02

    A method and apparatus for performing reliable general-purpose computing. Each sub-core of a plurality of sub-cores of a processor core processes a same instruction at a same time. A code analyzer receives a plurality of residues that represents a code word corresponding to the same instruction and an indication of whether the code word is a memory address code or a data code from the plurality of sub-cores. The code analyzer determines whether the plurality of residues are consistent or inconsistent. The code analyzer and the plurality of sub-cores perform a set of operations based on whether the code word is a memory address code or a data code and a determination of whether the plurality of residues are consistent or inconsistent.
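
    The residue-based consistency check is sketched below as a redundant residue number system: the first three moduli carry the data and the fourth is redundant, so a single corrupted residue pushes the Chinese-remainder reconstruction outside the legitimate range. This is only an illustration of the general idea; the patented design's actual moduli and checking logic are not given in the abstract.

    from math import prod

    MODULI = (251, 253, 255, 256)     # pairwise-coprime moduli (assumed set)
    LEGIT = prod(MODULI[:3])          # legitimate code words are < 251*253*255

    def encode(word):
        # Each sub-core would hold one residue of the code word.
        return tuple(word % m for m in MODULI)

    def crt_reconstruct(residues):
        # Chinese-remainder reconstruction modulo the full product.
        M = prod(MODULI)
        x = 0
        for r, m in zip(residues, MODULI):
            Mi = M // m
            x += r * Mi * pow(Mi, -1, m)   # modular inverse (Python 3.8+)
        return x % M

    def consistent(residues):
        # A single residue error lands the reconstruction outside [0, LEGIT).
        return crt_reconstruct(residues) < LEGIT

    word = 0xBEEF42                   # any word below LEGIT
    residues = list(encode(word))
    print(consistent(residues))       # True  -> sub-cores agree
    residues[2] ^= 1                  # corrupt one sub-core's residue
    print(consistent(residues))       # False -> inconsistency detected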

  16. Satellite on-board real-time SAR processor prototype

    Science.gov (United States)

    Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

    2017-11-01

    A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically by applying dedicated Fourier transformations. This, however, can also be performed optically in real time; indeed, the first SAR images were processed optically. The optical Fourier processor architecture provides inherent parallel computing capabilities, allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel, yielding real-time performance, i.e. without a resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and
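
    The optical architecture evaluates the SAR matched filter with Fourier optics; digitally, the same operation is an FFT-based correlation of the signal history with the reference chirp. A one-dimensional toy version follows -- the parameters are illustrative, not those of ENVISAT ASAR.

    import numpy as np

    def matched_filter(raw, chirp):
        # Correlation via the FFT: multiply the data spectrum by the
        # conjugate of the reference-chirp spectrum.
        n = raw.shape[-1]
        R = np.fft.fft(raw, n, axis=-1)
        C = np.fft.fft(chirp, n)
        return np.fft.ifft(R * np.conj(C), axis=-1)

    # Toy scene: one point target, i.e. the reference chirp delayed by
    # 37 samples.
    n = 512
    t = np.arange(n)
    chirp = np.exp(1j * np.pi * 1e-3 * (t - n / 2) ** 2)
    raw = np.roll(chirp, 37)[np.newaxis, :]

    focused = np.abs(matched_filter(raw, chirp))
    print(int(np.argmax(focused[0])))   # peak at sample 37: target focused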

  17. Primary intestinal lymphangiectasia: Multiple detector computed tomography findings after direct lymphangiography.

    Science.gov (United States)

    Sun, Xiaoli; Shen, Wenbin; Chen, Xiaobai; Wen, Tingguo; Duan, Yongli; Wang, Rengui

    2017-10-01

    To analyse the findings of multiple detector computed tomography (MDCT) after direct lymphangiography in primary intestinal lymphangiectasia (PIL). Fifty-five patients with PIL were retrospectively reviewed. All patients underwent MDCT after direct lymphangiography. The pathologies of 16 patients were confirmed by surgery and the remaining 39 patients were confirmed by gastroendoscopy and/or capsule endoscopy. After direct lymphangiography, MDCT found intra- and extraintestinal as well as lymphatic vessel abnormalities. Among the intra- and extraintestinal disorders, 49 patients had varying degrees of intestinal dilatation, 46 had small bowel wall thickening, 9 had pleural and pericardial effusions, 21 had ascites, 41 had mesenteric oedema, 20 had mesenteric nodules and 9 had abdominal lymphatic cysts. Features of lymphatic vessel abnormalities included intestinal trunk reflux (43.6%, n = 24), lumbar trunk reflux (89.1%, n = 49), pleural and pulmonary lymph reflux (14.5%, n = 8), pericardial and mediastinal lymph reflux (16.4%, n = 9), mediastinal and pulmonary lymph reflux (18.2%, n = 10), and thoracic duct outlet obstruction (90.9%, n = 50). Multiple detector computed tomography after direct lymphangiography provides a safe and accurate examination method and is an excellent tool for the diagnosis of PIL. © 2017 The Royal Australian and New Zealand College of Radiologists.

  18. Computed tomography in multiple trauma patients. Technical aspects, work flow, and dose reduction

    International Nuclear Information System (INIS)

    Fellner, F.A.; Krieger, J.; Floery, D.; Lechner, N.

    2014-01-01

    Patients with severe, life-threatening trauma require a fast and accurate clinical and imaging diagnostic workup during the first phase of trauma management. Early whole-body computed tomography has clearly been proven to be the current standard of care for these patients. Imaging quality similar to that of routine imaging can be achieved in the multiple trauma setting, especially using rapid, latest-generation computed tomography (CT) scanners. This article provides a detailed view of the use of CT in patients with life-threatening trauma. A special focus is placed on radiological procedures in trauma units and on the methods for CT workup in routine cases and in challenging situations. Further foci are the potential for dose reduction in CT scans of multiple trauma patients and the examination of children with severe trauma. Various studies have demonstrated that early whole-body CT positively correlates with low morbidity and mortality and is clearly superior to the use of other imaging modalities. Optimal trauma unit management requires close cooperation between trauma surgeons, anesthesiologists and radiologists, whereby the radiologist is responsible for a rapid and accurate radiological workup and the rapid communication of imaging findings. However, even in the trauma setting, aspects of patient radiation doses should be kept in mind. (orig.) [de

  19. Seeing is believing: video classification for computed tomographic colonography using multiple-instance learning.

    Science.gov (United States)

    Wang, Shijun; McKenna, Matthew T; Nguyen, Tan B; Burns, Joseph E; Petrick, Nicholas; Sahiner, Berkman; Summers, Ronald M

    2012-05-01

    In this paper, we present development and testing results for a novel colonic polyp classification method for use as part of a computed tomographic colonography (CTC) computer-aided detection (CAD) system. Inspired by the interpretative methodology of radiologists using 3-D fly-through mode in CTC reading, we have developed an algorithm which utilizes sequences of images (referred to here as videos) for classification of CAD marks. For each CAD mark, we created a video composed of a series of intraluminal, volume-rendered images visualizing the detection from multiple viewpoints. We then framed the video classification question as a multiple-instance learning (MIL) problem. Since a positive (negative) bag may contain negative (positive) instances, which in our case depends on the viewing angles and camera distance to the target, we developed a novel MIL paradigm to accommodate this class of problems. We solved the new MIL problem by maximizing an L2-norm soft margin using semidefinite programming, which can optimize relevant parameters automatically. We tested our method by analyzing a CTC data set obtained from 50 patients from three medical centers. Our proposed method showed significantly better performance compared with several traditional MIL methods.
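
    For readers unfamiliar with the MIL framing, the sketch below shows the standard max-instance aggregation over a bag of viewpoint renderings: a CAD mark is called positive if its best-scoring viewpoint is positive. This is not the paper's semidefinite-programming solver, and the features and weights are synthetic placeholders.

    import numpy as np

    def bag_score(instances, w, b):
        # instances: one feature row per rendered viewpoint of a CAD mark.
        # A polyp need only look like a polyp from some angles, so the bag
        # is scored by its best instance.
        return float(np.max(instances @ w + b))

    rng = np.random.default_rng(0)
    w, b = rng.normal(size=8), 0.0          # placeholder linear scorer
    bag = rng.normal(size=(24, 8))          # 24 viewpoints x 8 features
    print(1 if bag_score(bag, w, b) > 0 else 0)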

  20. Computed tomography contrast media extravasation: treatment algorithm and immediate treatment by squeezing with multiple slit incisions.

    Science.gov (United States)

    Kim, Sue Min; Cook, Kyung Hoon; Lee, Il Jae; Park, Dong Ha; Park, Myong Chul

    2017-04-01

    In our hospital, an adverse event reporting system was initiated that alerts the plastic surgery department immediately after suspecting contrast media extravasation injury. This system is particularly important for a large volume of extravasation during power injector use. Between March 2011 and May 2015, a retrospective chart review was performed on all patients experiencing contrast media extravasation while being treated at our hospital. Immediate treatment by squeezing with multiple slit incisions was conducted for a portion of these patients. Eighty cases of extravasation were reported from approximately 218 000 computed tomography scans. The expected extravasation volume was larger than 50 ml, or severe pressure was felt on the affected limb in 23 patients. They were treated with multiple slit incisions followed by squeezing. Oedema of the affected limb disappeared after 1-2 hours after treatment, and the skin incisions healed within a week. We propose a set of guidelines for the initial management of contrast media extravasation injuries for a timely intervention. For large-volume extravasation cases, immediate management with multiple slit incisions is safe and effective in reducing the swelling quickly, preventing patient discomfort and decreasing skin and soft tissue problems. © 2016 Medicalhelplines.com Inc and John Wiley & Sons Ltd.

  1. Evolving Non-Dominated Parameter Sets for Computational Models from Multiple Experiments

    Science.gov (United States)

    Lane, Peter C. R.; Gobet, Fernand

    2013-03-01

    Creating robust, reproducible and optimal computational models is a key challenge for theorists in many sciences. Psychology and cognitive science face particular challenges as large amounts of data are collected and many models are not amenable to analytical techniques for calculating parameter sets. Particular problems are to locate the full range of acceptable model parameters for a given dataset, and to confirm the consistency of model parameters across different datasets. Resolving these problems will provide a better understanding of the behaviour of computational models, and so support the development of general and robust models. In this article, we address these problems using evolutionary algorithms to develop parameters for computational models against multiple sets of experimental data; in particular, we propose the 'speciated non-dominated sorting genetic algorithm' for evolving models in several theories. We discuss the problem of developing a model of categorisation using twenty-nine sets of data and models drawn from four different theories. We find that the evolutionary algorithms generate high quality models, adapted to provide a good fit to all available data.
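
    At the core of NSGA-style algorithms such as the speciated variant proposed above is the non-dominated filter over per-dataset fitness values. A minimal sketch with toy error tuples (lower is better) follows.

    def dominates(f, g):
        # f dominates g if it is no worse on every dataset and strictly
        # better on at least one.
        return (all(a <= b for a, b in zip(f, g))
                and any(a < b for a, b in zip(f, g)))

    def non_dominated(population, errors):
        front = []
        for i, f in enumerate(errors):
            if not any(dominates(errors[j], f)
                       for j in range(len(errors)) if j != i):
                front.append(population[i])
        return front

    # Three candidate parameter sets scored against two datasets (toy values).
    pop = ["theta_1", "theta_2", "theta_3"]
    errs = [(0.10, 0.30), (0.20, 0.10), (0.25, 0.35)]
    print(non_dominated(pop, errs))   # ['theta_1', 'theta_2']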

  2. FACC: A Novel Finite Automaton Based on Cloud Computing for the Multiple Longest Common Subsequences Search

    Directory of Open Access Journals (Sweden)

    Yanni Li

    2012-01-01

    Searching for the multiple longest common subsequences (MLCS) has significant applications in areas such as bioinformatics, information processing, and data mining. Although a few parallel MLCS algorithms have been proposed, their efficiency and effectiveness are not satisfactory given the increasing complexity and size of biological data. To overcome the shortcomings of the existing MLCS algorithms, and considering that the MapReduce parallel framework of cloud computing is a promising technology for cost-effective high-performance parallel computing, a novel finite automaton based on cloud computing, called FACC, is proposed under the MapReduce parallel framework, so as to obtain a more efficient and effective general parallel MLCS algorithm. FACC adopts the ideas of matched pairs and finite automata by preprocessing sequences, constructing successor tables, and building a common-subsequence finite automaton to search for the MLCS. Simulation experiments on a set of benchmarks from both real DNA and amino acid sequences have been conducted, and the results show that the proposed FACC algorithm outperforms the current leading parallel MLCS algorithm FAST-MLCS.
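
    For reference, the classic two-sequence LCS dynamic program that MLCS generalizes is sketched below; FACC's actual contribution -- the matched-pair finite automaton distributed over MapReduce -- is beyond this illustration.

    def lcs(s, t):
        # O(len(s)*len(t)) dynamic program for the two-sequence case.
        m, n = len(s), len(t)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                if s[i - 1] == t[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        # Backtrack to recover one longest common subsequence.
        out, i, j = [], m, n
        while i and j:
            if s[i - 1] == t[j - 1]:
                out.append(s[i - 1]); i -= 1; j -= 1
            elif dp[i - 1][j] >= dp[i][j - 1]:
                i -= 1
            else:
                j -= 1
        return ''.join(reversed(out))

    print(lcs("GATTACA", "GCATGCU"))   # one LCS of length 4, e.g. "GATC"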

  3. Cross-scale Efficient Tensor Contractions for Coupled Cluster Computations Through Multiple Programming Model Backends

    Energy Technology Data Exchange (ETDEWEB)

    Ibrahim, Khaled Z. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Epifanovsky, Evgeny [Q-Chem, Inc., Pleasanton, CA (United States); Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Krylov, Anna I. [Univ. of Southern California, Los Angeles, CA (United States). Dept. of Chemistry

    2016-07-26

    Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy-efficient manner. We achieve up to 240× speedup compared with the best optimized shared memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 & XC40, BlueGene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

  4. Application of Soft Computing Techniques and Multiple Regression Models for CBR prediction of Soils

    Directory of Open Access Journals (Sweden)

    Fatimah Khaleel Ibrahim

    2017-08-01

    Soft computing techniques such as Artificial Neural Networks (ANN) have improved predictive capability and have found application in geotechnical engineering. The aim of this research is to utilize soft computing techniques and Multiple Regression Models (MLR) for forecasting the California bearing ratio (CBR) of soil from its index properties. The CBR of a soil can be predicted from various soil-characterizing parameters with the assistance of MLR and ANN methods. The database was collected in the laboratory by conducting tests on 86 soil samples gathered from different projects in Basrah districts. Data obtained from the experimental results were used in the regression models and in the soft computing technique using artificial neural networks. The liquid limit, plasticity index, modified compaction and CBR tests were determined. In this work, different ANN and MLR models were formulated with different collections of inputs in order to recognize their significance for the prediction of CBR. The strengths of the developed models were examined in terms of the regression coefficient (R2), relative error (RE%) and mean square error (MSE) values. The results show that all of the proposed ANN models perform better than the MLR model, and a specific ANN model with all input parameters reveals better outcomes than the other ANN models.
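
    A hedged sketch of the comparison methodology described above: fit a multiple linear regression and a small neural network to predict CBR from index properties, then score both with R2 and MSE. Synthetic data stand in for the 86 Basrah samples, and the chosen features and network size are assumptions.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    n = 86                               # same sample count as the study
    X = np.column_stack([
        rng.uniform(25, 60, n),          # liquid limit, %
        rng.uniform(5, 30, n),           # plasticity index, %
        rng.uniform(1.6, 2.1, n),        # dry density, g/cm^3 (assumed feature)
    ])
    # Synthetic, mildly nonlinear CBR relation plus noise (illustrative only).
    y = 2 + 8 * X[:, 2] ** 2 - 0.08 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(0, 0.5, n)

    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

    models = {
        "MLR": LinearRegression(),
        "ANN": make_pipeline(StandardScaler(),
                             MLPRegressor(hidden_layer_sizes=(16,),
                                          max_iter=5000, random_state=0)),
    }
    for name, model in models.items():
        pred = model.fit(Xtr, ytr).predict(Xte)
        print(f"{name}: R2 = {r2_score(yte, pred):.3f}, "
              f"MSE = {mean_squared_error(yte, pred):.3f}")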

  5. Broadcasting collective operation contributions throughout a parallel computer

    Science.gov (United States)

    Faraj, Ahmad [Rochester, MN

    2012-02-21

    Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
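
    A toy simulation of the two-phase scheme just described: an intra-node exchange of contributions, followed by the serial processor transmission sequence across nodes (a final intra-node merge, implied rather than stated in the abstract, completes the all-gather). Node and processor counts are illustrative.

    NODES, PROCS = 3, 4

    # contribution[(n, p)] is processor p of node n's piece of the collective.
    contribution = {(n, p): f"c{n}{p}" for n in range(NODES) for p in range(PROCS)}
    received = {k: {v} for k, v in contribution.items()}

    # Phase 1: intra-node -- every processor hands its contribution to the
    # other processors on the same compute node.
    for n in range(NODES):
        node_set = {contribution[(n, p)] for p in range(PROCS)}
        for p in range(PROCS):
            received[(n, p)] |= node_set

    # Phase 2: inter-node -- following the serial processor transmission
    # sequence, processor p of each node sends its own contribution on the
    # designated link to processor p of every other node.
    for p in range(PROCS):
        for src in range(NODES):
            for dst in range(NODES):
                if dst != src:
                    received[(dst, p)].add(contribution[(src, p)])

    # Final intra-node merge: afterwards every processor holds everything.
    for n in range(NODES):
        everything = set().union(*(received[(n, p)] for p in range(PROCS)))
        for p in range(PROCS):
            received[(n, p)] |= everything

    print(all(len(s) == NODES * PROCS for s in received.values()))   # True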

  6. Role of whole-body 64-slice multidetector computed tomography in treatment planning for multiple myeloma.

    Science.gov (United States)

    Razek, Ahmed Abdel Khalek Abdel; Ezzat, Amany; Azmy, Emad; Tharwat, Nehal

    2013-08-01

    The authors evaluated the role of whole-body 64-slice multidetector computed tomography (WB-MDCT) in treatment planning for multiple myeloma. This was a prospective study of 28 consecutive patients with multiple myeloma (19 men, nine women; age range, 51-73 years; mean age, 60 years) who underwent WB-MDCT and conventional radiography (CR) of the skeleton. The images were interpreted for the presence of bony lesions, medullary lesions, fractures and extraosseous lesions. We evaluated any changes in treatment planning as a result of WB-MDCT findings. WB-MDCT was superior to CR for detecting bony lesions (p=0.001), especially of the spine (p=0.001) and thoracic cage (p=0.006). WB-MDCT upstaged 14 patients, with a significant difference in staging (p=0.002) between WB-MDCT and CR. Medullary involvement either focal (n=6) or diffuse (n=3) had a positive correlation with the overall score (r=0.790) and stage (r=0.618) of disease. Spine fractures were better detected at WB-MDCT (n=4) than at CR (n=2). Extraosseous soft tissue lesions (n=7) were detected only at WB-MDCT. Findings detected at the WB-MDCT led to changes in the patient's treatment plan in 39% of cases. Upstaging of seven patients (25%) altered the medical treatment plan, and four of 28 (14%) patients required additional radiotherapy (7%) and vertebroplasty (7%). We conclude that WB-MDCT has an impact on treatment planning and prognosis in patients with multiple myeloma, as it has high rate of detecting cortical and medullary bone lesions, spinal fracture and extraosseous lesions. This information may alter treatment planning in multiple myeloma due to disease upstaging and detection of spine fracture and extraosseous spinal lesions.

  7. Evaluation of the Intel Westmere-EP server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2010-01-01

    In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing the 6-core “Westmere-EP” processor with Intel’s previous generation of the same microarchitecture, the “Nehalem-EP”. The former is produced in a new 32nm process, the latter in 45nm. Both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Simultaneous Multi-Threading (SMT), the cache sizes available, the memory configuration installed, as well...

  8. Evaluation of the Intel Nehalem-EX server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2010-01-01

    In this paper we report on a set of benchmark results recently obtained by the CERN openlab by comparing the 4-socket, 32-core Intel Xeon X7560 server with the previous generation 4-socket server, based on the Xeon X7460 processor. The Xeon X7560 processor represents a major change in many respects, especially the memory sub-system, so it was important to make multiple comparisons. In most benchmarks the two 4-socket servers were compared. It should be underlined that both servers represent the “top of the line” in terms of frequency. However, in some cases, it was important to compare systems that integrated the latest processor features, such as QPI links, Symmetric multithreading and over-clocking via Turbo mode, and in such situations the X7560 server was compared to a dual socket L5520 based system with an identical frequency of 2.26 GHz. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following ...

  9. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  10. Measuring the Effect of Gender on Computer Attitudes among Pre-Service Teachers: A Multiple Indicators, Multiple Causes (MIMIC) Modeling

    Science.gov (United States)

    Teo, Timothy

    2010-01-01

    Purpose: The purpose of this paper is to examine the effect of gender on pre-service teachers' computer attitudes. Design/methodology/approach: A total of 157 pre-service teachers completed a survey questionnaire measuring their responses to four constructs which explain computer attitude. These were administered during the teaching term where…

  11. Alternative Water Processor Test Development

    Science.gov (United States)

    Pickering, Karen D.; Mitchell, Julie; Vega, Leticia; Adam, Niklas; Flynn, Michael; Wjee (er. Rau); Lunn, Griffin; Jackson, Andrew

    2012-01-01

    The Next Generation Life Support Project is developing an Alternative Water Processor (AWP) as a candidate water recovery system for long duration exploration missions. The AWP consists of a biological water processor (BWP) integrated with a forward osmosis secondary treatment system (FOST). The basis of the BWP is a membrane aerated biological reactor (MABR), developed in concert with Texas Tech University. Bacteria located within the MABR metabolize organic material in wastewater, converting approximately 90% of the total organic carbon to carbon dioxide. In addition, bacteria convert a portion of the ammonia-nitrogen present in the wastewater to nitrogen gas, through a combination of nitrification and denitrification. The effluent from the BWP system is low in organic contaminants, but high in total dissolved solids. The FOST system, integrated downstream of the BWP, removes dissolved solids through a combination of concentration-driven forward osmosis and pressure-driven reverse osmosis. The integrated system is expected to produce water with a total organic carbon less than 50 mg/l and dissolved solids that meet potable water requirements for spaceflight. This paper describes the test definition, the design of the BWP and FOST subsystems, and plans for integrated testing.

  12. The UA1 trigger processor

    International Nuclear Information System (INIS)

    Grayer, G.H.

    1981-01-01

    Experiment UA1 is a large multi-purpose spectrometer at the CERN proton-antiproton collider, scheduled for late 1981. The principal trigger is formed on the basis of the energy deposition in calorimeters. A trigger decision taken in under 2.4 microseconds can avoid dead time losses due to the bunched nature of the beam. To achieve this we have built fast 8-bit charge-to-digital converters followed by two identical digital processors tailored to the experiment. The outputs of groups of the 2440 photomultipliers in the calorimeters are summed to form a total of 288 input channels to the ADCs. In each processor, a look-up table in RAM is used to convert the digitised photomultiplier signals to energy; the processor sums combinations of input channels and also counts the number of clusters with electromagnetic or hadronic energy above pre-determined levels. Up to twelve combinations of these conditions, together with external information, may be combined in coincidence or in veto to form the final trigger. Provision has been made for testing using simulated data in an off-line mode, and sampling real data when on-line. (orig.)

  13. Computer facilities for ISABELLE data handling

    International Nuclear Information System (INIS)

    Kramer, M.A.; Love, W.A.; Miller, R.J.; Zeller, M.

    1977-01-01

    The analysis of data produced by ISABELLE experiments will need a large system of computers. An official group of prospective users and operators of that system should begin planning now. Included in the array will be a substantial computer system at each ISABELLE intersection in use. These systems must include enough computer power to keep experimenters aware of the health of the experiment. This will require at least one very fast, sophisticated processor in the system, the size depending on the experiment. Other features of the intersection systems must be a good, high-speed graphic display, the ability to record data on magnetic tape at 500 to 1000 KB, and a high-speed link to a central computer. The operating system software must support multiple interactive users. A substantially larger capacity computer system, shared by the six intersection region experiments, must be available with good turnaround for experimenters while ISABELLE is running. A computer support group will be required to maintain the computer system and to provide and maintain software common to all experiments. Special superfast computing hardware or special function processors constructed with microprocessor circuitry may be necessary both in the data gathering and data processing work. Thus both the local and central processors should be chosen with the possibility of interfacing such devices in mind.

  14. Data register and processor for multiwire chambers

    International Nuclear Information System (INIS)

    Karpukhin, V.V.

    1985-01-01

    A data register and a processor for receiving and processing data from the drift chambers of a device for investigating relativistic positronium are described. The data are delivered to the register input in the form of an 8-bit Gray code, memorized and transformed to a position code. The register information is delivered to the CAMAC trunk and to the front panel plug. The processor selects particle tracks in a horizontal plane of the facility. The maximum coordinate divergence ΔY and the minimum number of points on a track are set from the processor front panel. The processor solution time is 16 μs; the maximum number of simultaneously analyzed coordinates is 16.

  15. Sensitometric control of roentgen film processors

    International Nuclear Information System (INIS)

    Forsberg, H.; Karolinska Sjukhuset, Stockholm

    1987-01-01

    Monitoring of film processor performance is essential, since image quality, patient dose and costs are all influenced by it. A system for sensitometric constancy control of film processors and their associated components is described, together with three years' experience of its implementation on 17 film processors. Modern high-quality film processors have a stability that makes a test frequency of once a week sufficient to maintain adequate image quality. The test system is so sensitive that corrective actions have almost invariably been taken before any technical problem degraded the image quality to a visible degree. (orig.)

  16. Processor tradeoffs in distributed real-time systems

    Science.gov (United States)

    Krishna, C. M.; Shin, Kang G.; Bhandari, Inderpal S.

    1987-01-01

    The problem of the optimization of the design of real-time distributed systems is examined with reference to a class of computer architectures similar to the continuously reconfigurable multiprocessor flight control system structure, CM2FCS. Particular attention is given to the impact of processor replacement and the burn-in time on the probability of dynamic failure and mean cost. The solution is obtained numerically and interpreted in the context of real-time applications.

  17. Computer experiments of the time-sequence of individual steps in multiple Coulomb-excitation

    International Nuclear Information System (INIS)

    Boer, J. de; Dannhaueser, G.

    1982-01-01

    The way in which the multiple E2 steps in the Coulomb excitation of a rotational band of a nucleus follow one another is elucidated for selected examples using semiclassical computer experiments. The role a given transition plays in the excitation of a given final state is measured by a quantity named the 'importance function'. It is found that these functions, calculated for the highest rotational state, peak at times forming a sequence for the successive E2 transitions starting from the ground state. This sequential behaviour is used to approximately account for the effects on the projectile orbit of the sequential transfer of excitation energy and angular momentum from projectile to target. These orbits lead to deflection functions and cross sections similar to those obtained from a symmetrization procedure approximately accounting for the transfer of angular momentum and energy. (Auth.)

  18. Iodide and xenon enhancement of computed tomography (CT) in multiple sclerosis (MS)

    International Nuclear Information System (INIS)

    Radue, E.W.; Kendall, B.E.

    1978-01-01

    The characteristic findings on computed tomography (CT) in multiple sclerosis (MS) are discussed. In a series of 49 cases plain CT was normal in 21 (43%), cerebral atrophy alone was present in 17 (35%) and plaques were visible in 11 (23%). These were most often adjacent to the lateral ventricles (14 plaques) and in the parietal white matter (10 plaques). CT was performed after the intravenous administration of iodide in 16 of these cases. Two patients with low attenuation plaques were scanned with xenon enhancement; the plaques absorbed less xenon than the corresponding contralateral brain substance and additional, previously isodense plaques were revealed. In one case the white matter absorbed much less xenon than normal and its uptake relative to grey matter was reduced. (orig.) [de

  19. X-ray luminescence computed tomography imaging via multiple intensity weighted narrow beam irradiation

    Science.gov (United States)

    Feng, Bo; Gao, Feng; Zhao, Huijuan; Zhang, Limin; Li, Jiao; Zhou, Zhongxing

    2018-02-01

    The purpose of this work is to introduce and study a novel x-ray beam irradiation pattern for X-ray Luminescence Computed Tomography (XLCT), termed multiple intensity-weighted narrow-beam irradiation. The proposed XLCT imaging method is studied through simulations of x-ray and diffuse light propagation. The optical photons emitted by the x-ray-excitable nanophosphors were collected by optical fiber bundles from the right-side surface of the phantom. Image reconstruction is based on simulated measurements from 6 or 12 angular projections in a 3- or 5-beam scanning mode. The proposed XLCT imaging method is compared against constant-intensity-weighted narrow-beam XLCT. From the reconstructed XLCT images, we found that the Dice similarity and the quantitative ratio of the targets show a certain degree of improvement. The results demonstrate that the proposed method can offer high image quality and fast image acquisition simultaneously.

  20. Mining Emerging Patterns for Recognizing Activities of Multiple Users in Pervasive Computing

    DEFF Research Database (Denmark)

    Gu, Tao; Wu, Zhanqing; Wang, Liang

    2009-01-01

    Understanding and recognizing human activities from sensor readings is an important task in pervasive computing. Existing work on activity recognition mainly focuses on recognizing activities for a single user in a smart home environment. However, in real life, there are often multiple inhabitants… sensor readings in a home environment, and propose a novel pattern mining approach to recognize both single-user and multi-user activities in a unified solution. We exploit Emerging Pattern – a type of knowledge pattern that describes significant changes between classes of data – for constructing our… activity models, and propose an Emerging Pattern based Multi-user Activity Recognizer (epMAR) to recognize both single-user and multi-user activities. We conduct our empirical studies by collecting real-world activity traces done by two volunteers over a period of two weeks in a smart home environment…

  1. Three-dimensional multiple reciprocity boundary element method for one-group neutron diffusion eigenvalue computations

    International Nuclear Information System (INIS)

    Itagaki, Masafumi; Sahashi, Naoki.

    1996-01-01

    The multiple reciprocity method (MRM) in conjunction with the boundary element method has been employed to solve one-group eigenvalue problems described by the three-dimensional (3-D) neutron diffusion equation. The domain integral related to the fission source is transformed into a series of boundary-only integrals, with the aid of the higher order fundamental solutions based on the spherical and the modified spherical Bessel functions. Since each degree of the higher order fundamental solutions in the 3-D cases has a singularity of order (1/r), the above series of boundary integrals requires additional terms which do not appear in the 2-D MRM formulation. The critical eigenvalue itself can be also described using only boundary integrals. Test calculations show that Wielandt's spectral shift technique guarantees rapid and stable convergence of 3-D MRM computations. (author)

  2. A fast continuous magnetic field measurement system based on digital signal processors

    Energy Technology Data Exchange (ETDEWEB)

    Velev, G.V.; Carcagno, R.; DiMarco, J.; Kotelnikov, S.; Lamm, M.; Makulski, A.; /Fermilab; Maroussov, V.; /Purdue U.; Nehring, R.; Nogiec, J.; Orris, D.; /Fermilab; Poukhov, O.; Prakoshyn, F.; /Dubna, JINR; Schlabach, P.; Tompkins, J.C.; /Fermilab

    2005-09-01

    In order to study dynamic effects in accelerator magnets, such as the decay of the magnetic field during the dwell at injection and the rapid so-called "snapback" during the first few seconds of the resumption of the energy ramp, a fast continuous harmonics measurement system was required. A new magnetic field measurement system, based on the use of digital signal processors (DSP) and Analog to Digital (A/D) converters, was developed and prototyped at Fermilab. This system uses Pentek 6102 16-bit A/D converters and the Pentek 4288 DSP board with the SHARC ADSP-2106 family digital signal processor. It was designed to acquire multiple channels of data with a wide dynamic range of input signals, which are typically generated by a rotating coil probe. Data acquisition is performed under a RTOS, whereas processing and visualization are performed under a host computer. Firmware code was developed for the DSP to perform fast continuous readout of the A/D FIFO memory and integration over specified intervals, synchronized to the probe's rotation in the magnetic field. C, C++ and Java code was written to control the data acquisition devices and to process a continuous stream of data. The paper summarizes the characteristics of the system and presents the results of initial tests and measurements.

  3. A fast continuous magnetic field measurement system based on digital signal processors

    International Nuclear Information System (INIS)

    Velev, G.V.; Carcagno, R.; DiMarco, J.; Kotelnikov, S.; Lamm, M.; Makulski, A.; Maroussov, V.; Nehring, R.; Nogiec, J.; Orris, D.; Poukhov, O.; Prakoshyn, F.; Schlabach, P.; Tompkins, J.C.

    2005-01-01

    In order to study dynamic effects in accelerator magnets, such as the decay of the magnetic field during the dwell at injection and the rapid so-called "snapback" during the first few seconds of the resumption of the energy ramp, a fast continuous harmonics measurement system was required. A new magnetic field measurement system, based on the use of digital signal processors (DSP) and Analog to Digital (A/D) converters, was developed and prototyped at Fermilab. This system uses Pentek 6102 16-bit A/D converters and the Pentek 4288 DSP board with the SHARC ADSP-2106 family digital signal processor. It was designed to acquire multiple channels of data with a wide dynamic range of input signals, which are typically generated by a rotating coil probe. Data acquisition is performed under a RTOS, whereas processing and visualization are performed under a host computer. Firmware code was developed for the DSP to perform fast continuous readout of the A/D FIFO memory and integration over specified intervals, synchronized to the probe's rotation in the magnetic field. C, C++ and Java code was written to control the data acquisition devices and to process a continuous stream of data. The paper summarizes the characteristics of the system and presents the results of initial tests and measurements.
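
    The DSP's core task -- continuously reading the ADC stream and integrating it over intervals synchronized to the probe's rotation -- can be sketched digitally as below; the sampling rate, rotation speed and harmonic content are illustrative assumptions, not the Fermilab system's parameters.

    import numpy as np

    FS = 102_400        # ADC sample rate, Hz (assumed)
    REV_HZ = 1.0        # probe rotation frequency, Hz (assumed)
    BUCKETS = 128       # angular integration intervals per revolution

    samples_per_rev = int(FS / REV_HZ)
    t = np.arange(samples_per_rev) / FS

    # Toy induced coil voltage: dipole fundamental plus a small sextupole.
    v = np.cos(2 * np.pi * REV_HZ * t) + 0.01 * np.cos(3 * 2 * np.pi * REV_HZ * t)

    # Integrate the voltage over each angular interval: delta_flux[k] is the
    # flux change across bucket k of the revolution.
    delta_flux = v.reshape(BUCKETS, -1).sum(axis=1) / FS

    # Field harmonics then follow from an FFT over one revolution of buckets.
    spectrum = np.abs(np.fft.rfft(delta_flux))
    print(int(np.argmax(spectrum[1:]) + 1))   # dominant harmonic: 1 (dipole)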

  4. Precision analog signal processor for beam position measurements in electron storage rings

    International Nuclear Information System (INIS)

    Hinkson, J.A.; Unser, K.B.

    1995-05-01

    Beam position monitors (BPM) in electron and positron storage rings have evolved from simple systems composed of beam pickups, coaxial cables, multiplexing relays, and a single receiver (usually an analyzer) into very complex and costly systems of multiple receivers and processors. The older systems may have taken minutes to measure the circulating beam closed orbit. Today instrumentation designers are required to provide high-speed measurements of the beam orbit, often at the ring revolution frequency. In addition, the instruments must have very high accuracy and resolution. A BPM has been developed for the Advanced Light Source (ALS) in Berkeley which features high resolution and relatively low cost. The instrument has a single purpose: to measure the position of a stable stored beam. Because the pickup signals are multiplexed into a single receiver, and due to its narrow bandwidth, the receiver is not intended for single-turn studies. The receiver delivers normalized measurements of X and Y position entirely by analog means at nominally 1 V/mm. No computers are involved. No software is required. Bergoz, a French company specializing in precision beam instrumentation, integrated the ALS design in their new BPM analog signal processor module. Performance comparisons were made on the ALS. In this paper we report on the architecture and performance of the ALS prototype BPM.

  5. Precision analog signal processor for beam position measurements in electron storage rings

    International Nuclear Information System (INIS)

    Hinkson, J.A.; Unser, K.B.

    1995-01-01

    Beam position monitors (BPM) in electron and positron storage rings have evolved from simple systems composed of beam pickups, coaxial cables, multiplexing relays, and a single receiver (usually an analyzer) into very complex and costly systems of multiple receivers and processors. The older systems may have taken minutes to measure the circulating beam closed orbit. Today instrumentation designers are required to provide high-speed measurements of the beam orbit, often at the ring revolution frequency. In addition, the instruments must have very high accuracy and resolution. A BPM has been developed for the Advanced Light Source (ALS) in Berkeley which features high resolution and relatively low cost. The instrument has a single purpose: to measure the position of a stable stored beam. Because the pickup signals are multiplexed into a single receiver, and due to its narrow bandwidth, the receiver is not intended for single-turn studies. The receiver delivers normalized measurements of X and Y position entirely by analog means at nominally 1 V/mm. No computers are involved. No software is required. Bergoz, a French company specializing in precision beam instrumentation, integrated the ALS design in their new BPM analog signal processor module. Performance comparisons were made on the ALS. In this paper we report on the architecture and performance of the ALS prototype BPM.
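
    The normalization such a processor performs corresponds, in digital form, to the classic difference-over-sum of opposing pickup signals. That this is precisely the ALS/Bergoz scheme is an assumption here, and the sensitivity constant is illustrative.

    def beam_position_mm(v_a, v_b, k_mm=10.0):
        # v_a, v_b: amplitudes from opposing pickup electrodes; k_mm is a
        # geometry-dependent monitor constant (value illustrative). The
        # division normalizes out the beam intensity.
        return k_mm * (v_a - v_b) / (v_a + v_b)

    print(beam_position_mm(1.05, 0.95))   # +0.5 mm toward electrode A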

  6. Thermal Dissipation Efficiency in a Micro-Processor Using Carbon Nanotubes Based Composite

    Science.gov (United States)

    Thang, Bui Hung; Van Quang, Cao; Nghia, Van Trong; Hong, Phan Ngoc; Van Chuc, Nguyen; Tam, Ngo Thi Thanh; Quang, Le Dinh; Khang, Dao Duc; Khoi, Phan Hong; Minh, Phan Ngoc

    2009-09-01

    Modern electronic and optoelectronic devices such as μ-processors, light-emitting diodes and semiconductor lasers pose a challenging thermal dissipation problem, so finding an effective way to dissipate heat has become a very important issue. Carbon nanotubes (CNTs) are known to be among the most valuable materials with high thermal conductivity (2000 W/m.K, compared to 419 W/m.K for Ag). This suggests applying CNTs as an essential component of a thermal dissipation medium to improve the performance of computer processors and other high-power electronic devices. In this work, multi-walled carbon nanotube (MWCNT) based composites were utilized as the thermal dissipation medium for the micro-processor of a personal computer. MWCNTs at different concentrations were added to polyaniline, a commercial silicone thermal paste and a commercial silver thermal paste by mechanical methods. A personal computer with the following configuration was employed: Intel Pentium IV 3.066 GHz, 512 MB of RAM and Windows XP Service Pack 2 operating system. The thermal dissipation efficiency of the system was evaluated by directly measuring the temperature of the μ-processor during the operation of the computer at different CPU speeds. The measured results showed that the CNT-based composite could reduce the temperature of the μ-processor by more than 5 °C, and the time for the temperature of the μ-processor to rise was three times longer than when using commercial thermal paste.

  7. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

    2015-01-01

    The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed specifically to execute pattern matching with a high degree of parallelism. It finds track candidates at low resolution that are seeds for a full-resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been produced. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor, consisting of two types of boards: the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. We report on the performance of the intermedia...

  8. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Andreani, A; The ATLAS collaboration; Beccherle, R; Beretta, M; Cipriani, R; Citraro, S; Citterio, M; Colombo, A; Crescioli, F; Dimas, D; Donati, S; Giannetti, P; Kordas, K; Lanza, A; Liberali, V; Luciano, P; Magalotti, D; Neroutsos, P; Nikolaidis, S; Piendibene, M; Sakellariou, A; Shojaii, S; Sotiropoulou, C-L; Stabile, A

    2014-01-01

    The Associative Memory (AM) system of the FTK processor has been designed to perform pattern matching using the hit information of the ATLAS silicon tracker. The AM is the heart of the FTK, and it finds track candidates at low resolution that are seeds for a full-resolution track fitting. To solve the very challenging data traffic problems inside the FTK, multiple designs and tests have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor, consisting of the AM chip, an ASIC designed and optimized to perform pattern matching, and two types of boards: the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. Special relevance will be given to the AM chip design, which includes two custom cells optimized for low power consumption. We repo...

  9. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

    2015-01-01

    The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed to execute pattern matching with a high degree of parallelism. The AM system finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 828 2 Gbit/s serial links for a total in/out bandwidth of 56 Gb/s. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. ...

  10. Novel memory architecture for video signal processor

    Science.gov (United States)

    Hung, Jen-Sheng; Lin, Chia-Hsing; Jen, Chein-Wei

    1993-11-01

    An on-chip memory architecture for a video signal processor (VSP) is proposed. This memory structure is a two-level design that matches the different data localities in video applications. The upper level, Memory A, provides enough storage capacity to reduce the impact of limited chip I/O bandwidth, and the lower level, Memory B, provides enough data parallelism and flexibility to meet the requirements of multiple reconfigurable pipeline function units in a single VSP chip. The required memory size is determined by a memory-usage analysis of video algorithms and the number of function units. Both levels of memory adopt a dual-port memory scheme to sustain simultaneous read and write operations. In particular, Memory B uses multiple one-read-one-write memory banks to emulate a true multiport memory. One can therefore change the configuration of Memory B to several sets of memories with variable read/write ports by adjusting the bus switches, so that the numbers of read and write ports in the proposed memory meet the requirements of the data flow patterns in different video coding algorithms. We have finished a prototype memory design using 1.2-micrometer SPDM SRAM technology and will fabricate it through TSMC in Taiwan.

  11. A CNN-Specific Integrated Processor

    Directory of Open Access Journals (Sweden)

    Suleyman Malki

    2009-01-01

    Full Text Available Integrated Processors (IP) are algorithm-specific cores that either by programming or by configuration can be re-used within many microelectronic systems. This paper looks at how Cellular Neural Networks (CNN) can be realized as IP. First, current digital implementations are reviewed, and the memory-processor bandwidth issues are analyzed. Then a generic view is taken on the structure of the network, and a new intra-communication protocol based on rotating wheels is proposed. It is shown that this provides for guaranteed high performance with a minimal network interface. The resulting node is small and supports multi-level CNN designs, giving the system a 30-fold increase in capacity compared to classical designs. As it facilitates multiple operations on a single image, and single operations on multiple images, with minimal access to the external image memory, balancing the internal and external data transfer requirements optimizes the system operation. In conventional digital CNN designs, the treatment of boundary nodes requires additional logic to handle the CNN value propagation scheme. In the new architecture, only a slight modification of the existing cells is necessary to model the boundary effect. A typical prototype for visual pattern recognition will house 4096 CNN cells with a 2% overhead for making it an IP.

  12. Using of opportunities of graphic processors for acceleration of scientific and technical calculations

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Sereda, T.M.; Us, S.A.; Shestakov, M.V.

    2009-01-01

    The new capabilities of modern graphics processors (GPUs) for accelerating scientific and technical calculations by dividing a computing task between the central processor and the GPU are described. The use of NVIDIA CUDA technology to harness the parallel computing capabilities of the GPU in programs for some computationally intensive mathematical tasks is presented. Examples are given comparing the performance achieved when these tasks are computed without the GPU and when the NVIDIA CUDA capabilities of a GeForce 8800 graphics processor are used.
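
    The record above describes offloading numerically intensive loops to the GPU with CUDA. As a hedged illustration of the same idea in Python (the abstract's own code is not available), the sketch below uses Numba's CUDA bindings rather than the C toolkit the authors would have used; it assumes a machine with an NVIDIA GPU and the numba and numpy packages installed.

```python
import numpy as np
from numba import cuda

# Minimal sketch of CPU/GPU task splitting: the element-wise arithmetic runs
# on the GPU, while setup and verification stay on the CPU.
@cuda.jit
def saxpy(a, x, y, out):
    i = cuda.grid(1)                    # global thread index
    if i < out.size:
        out[i] = a * x[i] + y[i]

n = 1 << 20
x = np.random.rand(n).astype(np.float32)
y = np.random.rand(n).astype(np.float32)

d_x = cuda.to_device(x)                 # explicit host-to-device transfers
d_y = cuda.to_device(y)
d_out = cuda.device_array_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
saxpy[blocks, threads_per_block](np.float32(2.0), d_x, d_y, d_out)  # GPU launch

out = d_out.copy_to_host()
assert np.allclose(out, 2.0 * x + y)    # verified back on the CPU
```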

  13. Numerical Computation of Underground Inundation in Multiple Layers Using the Adaptive Transfer Method

    Directory of Open Access Journals (Sweden)

    Hyung-Jun Kim

    2018-01-01

    Full Text Available Extreme rainfall causes surface runoff to flow towards lowlands and subterranean facilities, such as subway stations and buildings with underground spaces in densely packed urban areas. These facilities and areas are therefore vulnerable to catastrophic submergence. However, flood modeling of underground space has not yet been adequately studied because there are difficulties in reproducing the associated multiple horizontal layers connected with staircases or elevators. This study proposes a convenient approach to simulate underground inundation when two layers are connected. The main facet of this approach is to compute the flow flux passing through staircases in an upper layer and to transfer the equivalent quantity to a lower layer. This is defined as the ‘adaptive transfer method’. This method overcomes the limitations of 2D modeling by introducing layers connecting concepts to prevent large variations in mesh sizes caused by complicated underlying obstacles or local details. Consequently, this study aims to contribute to the numerical analysis of flow in inundated underground spaces with multiple floors.
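
    As a sketch of the coupling step the abstract calls the 'adaptive transfer method', the fragment below removes one time step's worth of volume from an upper-layer cell and adds it to the connected lower-layer cell. The weir-type flux formula and the discharge coefficient are assumptions made to keep the example self-contained; the paper's actual flux relation may differ.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def staircase_flux(depth, width, cd=0.6):
    """Discharge (m^3/s) through a stair opening of given width (m);
    a weir-type closure with an assumed discharge coefficient cd."""
    return cd * width * math.sqrt(2.0 * G) * max(depth, 0.0) ** 1.5

def transfer_step(h_upper, h_lower, cell_area, width, dt):
    """Move one time step's volume from the upper-layer cell to the lower one."""
    volume = min(staircase_flux(h_upper, width) * dt, h_upper * cell_area)
    return h_upper - volume / cell_area, h_lower + volume / cell_area

# Example: 0.5 m of water over a 4 m^2 cell draining through a 1.2 m wide stair.
print(transfer_step(h_upper=0.5, h_lower=0.0, cell_area=4.0, width=1.2, dt=1.0))
```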

  14. Computed tomography imaging of early coronary artery lesions in stable individuals with multiple cardiovascular risk factors

    Directory of Open Access Journals (Sweden)

    Xi Yang

    2015-04-01

    Full Text Available OBJECTIVES: To investigate the prevalence, extent, severity, and features of coronary artery lesions in stable patients with multiple cardiovascular risk factors. METHODS: Seventy-seven patients with more than 3 cardiovascular risk factors were suspected of having coronary artery disease. Patients with high-risk factors and 39 controls with no risk factors were enrolled in the study. The related risk factors included hypertension, impaired glucose tolerance, dyslipidemia, smoking history, and overweight. The characteristics of coronary lesions were identified and evaluated by 64-slice coronary computed tomography angiography. RESULTS: The incidence of coronary atherosclerosis was higher in the high-risk group than in the no-risk group. The involved branches of the coronary artery, the diffusivity of the lesion, the degree of stenosis, and the nature of the plaques were significantly more severe in the high-risk group compared with the no-risk group (all p < 0.05. CONCLUSION: Among stable individuals with high-risk factors, early coronary artery lesions are common and severe. Computed tomography has promising value for the early screening of coronary lesions.

  15. The computational form of craving is a selective multiplication of economic value.

    Science.gov (United States)

    Konova, Anna B; Louie, Kenway; Glimcher, Paul W

    2018-04-17

    Craving is thought to be a specific desire state that biases choice toward the desired object, be it chocolate or drugs. A vast majority of people report having experienced craving of some kind. In its pathological form craving contributes to health outcomes in addiction and obesity. Yet despite its ubiquity and clinical relevance we still lack a basic neurocomputational understanding of craving. Here, using an instantaneous measure of subjective valuation and selective cue exposure, we identify a behavioral signature of a food craving-like state and advance a computational framework for understanding how this state might transform valuation to bias choice. We find desire induced by exposure to a specific high-calorie, high-fat/sugar snack good is expressed in subjects' momentary willingness to pay for this good. This effect is selective but not exclusive to the exposed good; rather, we find it generalizes to nonexposed goods in proportion to their subjective attribute similarity to the exposed ones. A second manipulation of reward size (number of snack units available for purchase) further suggested that a multiplicative gain mechanism supports the transformation of valuation during laboratory craving. These findings help explain how real-world food craving can result in behaviors inconsistent with preferences expressed in the absence of craving and open a path for the computational modeling of craving-like phenomena using a simple and repeatable experimental tool for assessing subjective states in economic terms. Copyright © 2018 the Author(s). Published by PNAS.
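
    A minimal sketch of the multiplicative-gain account suggested by the abstract: cue exposure scales a good's baseline willingness to pay by a gain that grows with the good's attribute similarity to the exposed good. The functional form and parameter values are illustrative assumptions, not the authors' fitted model.

```python
def cued_wtp(baseline_wtp, similarity, gain=1.5):
    """Willingness to pay after cue exposure; similarity in [0, 1]."""
    # gain > 1 inflates value multiplicatively; the boost shrinks with
    # decreasing similarity to the exposed good and vanishes at similarity 0.
    return baseline_wtp * (1.0 + (gain - 1.0) * similarity)

print(cued_wtp(2.00, 1.0))   # the exposed snack itself: 2.00 -> 3.00
print(cued_wtp(2.00, 0.4))   # a less similar snack: 2.00 -> 2.40
```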

  16. Producing chopped firewood with firewood processors

    International Nuclear Information System (INIS)

    Kaerhae, K.; Jouhiaho, A.

    2009-01-01

    The TTS Institute's research and development project studied both the productivity of new, chopped firewood processors (cross-cutting and splitting machines) suitable for professional and independent small-scale production, and the costs of the chopped firewood produced. Seven chopped firewood processors were tested in the research, six of which were sawing processors and one a shearing processor. The chopping work was carried out using wood feeding racks and a wood lifter. The work was also carried out without any feeding appliances. Altogether 132.5 solid m³ of wood were chopped in the time studies. The firewood processor used had the most significant impact on chopping work productivity. In addition to the firewood processor, the stem mid-diameter, the length of the raw material, and the length of the firewood were also found to affect productivity. The wood feeding systems also affected productivity. If there is a feeding rack and hydraulic grapple loader available for use in chopping firewood, then it is worth using the wood feeding rack. A wood lifter is only worth using with the largest stems (over 20 cm mid-diameter) if a feeding rack cannot be used. When producing chopped firewood from small-diameter wood, i.e. with a mid-diameter less than 10 cm, the costs of chopping work were over 10 EUR per solid m³ with sawing firewood processors. The shearing firewood processor with a guillotine blade achieved a cost level of 5 EUR per solid m³ when the mid-diameter of the chopped stem was 10 cm. In addition to the raw material, cost-efficient chopping work also requires several hundred annual operating hours with a firewood processor, which is difficult for individual firewood entrepreneurs to achieve. The operating hours of firewood processors can be increased to the required level by the joint use of processors by a number of firewood entrepreneurs. (author)

  17. Processor farming method for multi-scale analysis of masonry structures

    Science.gov (United States)

    Krejčí, Tomáš; Koudelka, Tomáš

    2017-07-01

    This paper describes a processor farming method for coupled heat and moisture transport in masonry using a two-level approach. The motivation for the two-level description comes from difficulties connected with masonry structures, where the size of the stone blocks is much larger than the thickness of the mortar layers, so a very fine finite element mesh has to be used. The two-level approach is suitable for parallel computing because nearly all computations can be performed independently with little synchronization. This approach is called processor farming. The master processor deals with the macro-scale level (the structure), and the slave processors deal with a homogenization procedure on the meso-scale level, which is represented by an appropriate representative volume element.
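
    The master/slave split described above maps naturally onto a task farm. The toy sketch below uses Python's multiprocessing module to farm independent meso-scale homogenization tasks out to slave processes; homogenize_rve is a hypothetical stand-in for the paper's coupled heat-and-moisture RVE solver.

```python
from multiprocessing import Pool

def homogenize_rve(macro_point):
    """Hypothetical meso-scale solve for one macro integration point."""
    temperature_grad, moisture_grad = macro_point
    # ... a real solver would resolve stone blocks and mortar layers here ...
    return (0.8 * temperature_grad, 0.5 * moisture_grad)  # effective fluxes

if __name__ == "__main__":
    # Macro-scale data prepared by the master processor (illustrative values).
    macro_points = [(1.0 + 0.1 * k, 0.2) for k in range(64)]
    with Pool(processes=4) as slaves:          # the farm of slave processors
        fluxes = slaves.map(homogenize_rve, macro_points)
    print(len(fluxes), "integration points homogenized")
```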

  18. On the Organization of Parallel Operation of Some Algorithms for Finding the Shortest Path on a Graph on a Computer System with Multiple Instruction Stream and Single Data Stream

    Directory of Open Access Journals (Sweden)

    V. E. Podol'skii

    2015-01-01

    Full Text Available The paper considers the implementation of the Bellman-Ford and Lee algorithms to find the shortest graph path on a computer system with multiple instruction streams and a single data stream (MISD). The MISD computer is a computer that executes commands of arithmetic-logical processing (on the CPU) and commands of structures processing (on the structures processor) in parallel on a single data stream. Transformation of sequential programs into MISD programs is a labor-intensive process because it requires a stream of arithmetic-logical processing to be manually separated from that of structures processing. Algorithms based on the processing of data structures (e.g., algorithms on graphs) show high performance on a MISD computer. The Bellman-Ford and Lee algorithms for finding the shortest path on a graph are representatives of these algorithms. They are applied in robotics for automatic planning of robot movement in situ. Modifications of the Bellman-Ford and Lee algorithms for finding the shortest graph path in coprocessor MISD mode, and the parallel MISD modifications of these algorithms, were first obtained in this article. Thus, this article continues a series of studies on the transformation of sequential algorithms into MISD ones (Dijkstra's and Ford-Fulkerson's algorithms) and is pronouncedly applied in nature. The article also presents the analysis results of the Bellman-Ford and Lee algorithms in MISD mode. The paper formulates the basic trends of a technique for parallelization of algorithms into an arithmetic-logical processing stream and a structures processing stream. Among the key areas for future research, development of a mathematical approach to provide a subsequently formalized and automated process of parallelizing sequential algorithms between the CPU and the structures processor is highlighted. Among the mathematical models that can be used in future studies there are graph models of algorithms (e.g., the dependency graph of a program). Due to the high...
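
    For reference, a plain sequential Bellman-Ford in Python is shown below, with comments marking the two streams the paper separates: the traversal of the edge structure (structures processing) and the relaxation arithmetic (arithmetic-logical processing). The graph in the usage line is an arbitrary example.

```python
def bellman_ford(n, edges, source):
    """edges: list of (u, v, w); returns shortest distances from source."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for _ in range(n - 1):              # at most n-1 relaxation sweeps
        changed = False
        for u, v, w in edges:           # structures-processing stream
            if dist[u] + w < dist[v]:   # arithmetic-logical stream
                dist[v] = dist[u] + w
                changed = True
        if not changed:                 # early exit when a sweep is idle
            break
    return dist

print(bellman_ford(4, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)], 0))
# -> [0, 3, 1, 4]
```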

  19. Towards a Process Algebra for Shared Processors

    DEFF Research Database (Denmark)

    Buchholtz, Mikael; Andersen, Jacob; Løvengreen, Hans Henrik

    2002-01-01

    We present initial work on a timed process algebra that models sharing of processor resources allowing preemption at arbitrary points in time. This enables us to model both the functional and the timely behaviour of concurrent processes executed on a single processor. We give a refinement relation...

  20. The communication processor of TUMULT-64

    NARCIS (Netherlands)

    Smit, Gerardus Johannes Maria; Jansen, P.G.

    1988-01-01

    Tumult (Twente University MULTi-processor system) is a modular extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type,

  1. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Ram Asaray SINGH

    2012-01-01

    High performance is a critical requirement for all microprocessor manufacturers. The present paper describes a comparison of the performance of two main Intel Xeon processor series (Type A: Intel Xeon X5260, X5460, E5450 and L5320; Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is based on a new family of processors from Intel, starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...

  2. Effects of mathematics computer games on special education students' multiplicative reasoning ability

    NARCIS (Netherlands)

    Bakker, M.; Heuvel-Panhuizen, M.H.A.M. van den; Robitzsch, A.

    2016-01-01

    This study examined the effects of a teacher-delivered intervention with online mathematics mini-games on special education students' multiplicative reasoning ability (multiplication and division). The games involved declarative, procedural, as well as conceptual knowledge of multiplicative reasoning.

  3. Development of a highly reliable CRT processor

    International Nuclear Information System (INIS)

    Shimizu, Tomoya; Saiki, Akira; Hirai, Kenji; Jota, Masayoshi; Fujii, Mikiya

    1996-01-01

    Although CRT processors have been employed by the main control board to reduce the operator's workload during monitoring, the control systems are still operated by hardware switches. For further advancement, direct controller operation through a display device is expected. A CRT processor providing direct controller operation must be as reliable as the hardware switches are. The authors are developing a new type of highly reliable CRT processor that enables direct controller operations. In this paper, we discuss the design principles behind a highly reliable CRT processor. The principles are defined by studies of software reliability and of the functional reliability of the monitoring and operation systems. The functional configuration of an advanced CRT processor is also addressed. (author)

  4. Online track processor for the CDF upgrade

    International Nuclear Information System (INIS)

    Thomson, E. J.

    2002-01-01

    A trigger track processor, called the eXtremely Fast Tracker (XFT), has been designed for the CDF upgrade. This processor identifies high transverse momentum (> 1.5 GeV/c) charged particles in the new central outer tracking chamber for CDF II. The XFT design is highly parallel to handle the input rate of 183 Gbits/s and output rate of 44 Gbits/s. The processor is pipelined and reports the result for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. The complete system has been installed and commissioned at CDF II. An overview of the track processor and performance in CDF Run II are presented

  5. Choosing processor array configuration by performance modeling for a highly parallel linear algebra algorithm

    International Nuclear Information System (INIS)

    Littlefield, R.J.; Maschhoff, K.J.

    1991-04-01

    Many linear algebra algorithms utilize an array of processors across which matrices are distributed. Given a particular matrix size and a maximum number of processors, what configuration of processors, i.e., what size and shape array, will execute the fastest? The answer to this question depends on tradeoffs between load balancing, communication startup and transfer costs, and computational overhead. In this paper we analyze in detail one algorithm: the blocked factored Jacobi method for solving dense eigensystems. A performance model is developed to predict execution time as a function of the processor array and matrix sizes, plus the basic computation and communication speeds of the underlying computer system. In experiments on a large hypercube (up to 512 processors), this model has been found to be highly accurate (mean error ∼ 2%) over a wide range of matrix sizes (10 x 10 through 200 x 200) and processor counts (1 to 512). The model reveals, and direct experiment confirms, that the tradeoffs mentioned above can be surprisingly complex and counterintuitive. We propose decision procedures based directly on the performance model to choose configurations for fastest execution. The model-based decision procedures are compared to a heuristic strategy and shown to be significantly better. 7 refs., 8 figs., 1 tab
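
    The decision procedure sketched below mirrors the paper's approach in miniature: enumerate the feasible Pr × Pc grids for a given processor count, predict the execution time of each from a cost model, and keep the fastest. The model form and coefficients here are invented placeholders, not the paper's calibrated model for the blocked factored Jacobi method.

```python
def predict_time(n, pr, pc, flop_rate=1e8, alpha=1e-4, beta=1e-7):
    """Toy execution-time model for an n x n problem on a pr x pc grid."""
    p = pr * pc
    compute = (n ** 3 / p) / flop_rate             # balanced arithmetic work
    # Row/column messages: startup (alpha) plus per-word (beta) transfer costs.
    comms = n * (alpha * (pr + pc) + beta * (n / pr + n / pc))
    return compute + comms

def best_grid(n, max_procs):
    """Pick the grid shape with the smallest predicted time."""
    grids = [(pr, max_procs // pr) for pr in range(1, max_procs + 1)
             if max_procs % pr == 0]
    return min(grids, key=lambda g: predict_time(n, *g))

print(best_grid(200, 512))   # the shape that best balances the tradeoffs
```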

  6. A Divide and Conquer Strategy for Scaling Weather Simulations with Multiple Regions of Interest

    Directory of Open Access Journals (Sweden)

    Preeti Malakar

    2013-01-01

    Full Text Available Accurate and timely prediction of weather phenomena, such as hurricanes and flash floods, requires high-fidelity compute-intensive simulations of multiple finer regions of interest within a coarse simulation domain. Current weather applications execute these nested simulations sequentially using all the available processors, which is sub-optimal due to their sub-linear scalability. In this work, we present a strategy for parallel execution of multiple nested domain simulations based on partitioning the 2-D processor grid into disjoint rectangular regions associated with each domain. We propose a novel combination of performance prediction, processor allocation methods, and topology-aware mapping of the regions on torus interconnects. Experiments on IBM Blue Gene systems using WRF show that the proposed strategies result in performance improvements of up to 33% with topology-oblivious mapping and up to an additional 7% with topology-aware mapping over the default sequential strategy.
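
    A toy version of the processor-allocation step, under the simplifying assumption that each nested domain's runtime is known from performance prediction and scales linearly with its processor share: split the columns of the 2-D processor grid in proportion to predicted cost, so the concurrently running domains finish at roughly the same time.

```python
def allocate_columns(total_cols, predicted_times):
    """Split processor-grid columns proportionally to per-domain cost."""
    total = sum(predicted_times)
    cols = [max(1, round(total_cols * t / total)) for t in predicted_times]
    cols[-1] += total_cols - sum(cols)   # absorb rounding in the last block
    return cols

# Three nested domains with predicted runtimes 10, 6 and 4 time units
# sharing a grid with 32 columns of processors.
print(allocate_columns(32, [10.0, 6.0, 4.0]))  # -> [16, 10, 6]
```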

  7. A Real-Time Sound Field Rendering Processor

    Directory of Open Access Journals (Sweden)

    Tan Yiyu

    2017-12-01

    Full Text Available Real-time sound field rendering is computationally intensive and memory-intensive. Traditional rendering systems based on computer simulations are limited by memory bandwidth and arithmetic units. The computation is time-consuming, and the sample rate of the output sound is low because of the long computation time at each time step. In this work, a processor with a hybrid architecture is proposed to speed up computation and improve the sample rate of the output sound, and an interface is developed for system scalability, allowing many chips to be simply cascaded to enlarge the simulated area. To render a three-minute Beethoven wave sound in a small shoe-box room with dimensions of 1.28 m × 1.28 m × 0.64 m, the field-programmable gate array (FPGA) based prototype machine with the proposed architecture carries out the sound rendering at run time, while a software simulation with OpenMP parallelization takes about 12.70 min on a personal computer (PC) with 32 GB of random access memory (RAM) and an Intel i7-6800K six-core processor running at 3.4 GHz. The throughput of the software simulation is about 194 M grids/s, while it is 51.2 G grids/s on the prototype machine, even though the clock frequency of the prototype machine is much lower than that of the PC. The rendering processor with one processing element (PE) and interfaces consumes about 238,515 gates when fabricated with the 0.18 µm process technology from ROHM Semiconductor Co., Ltd. (Kyoto, Japan), and the power consumption is about 143.8 mW.

  8. Nested dissection on a mesh-connected processor array

    International Nuclear Information System (INIS)

    Worley, P.H.; Schreiber, R.

    1986-01-01

    The authors present a parallel implementation of Gaussian elimination without pivoting using the nested dissection ordering for solving Ax=b, where A is an N × N symmetric positive definite matrix. If the graph of A is a √N × √N finite element mesh, then a parallel complexity of O(√N) can be achieved for Gaussian elimination with the nested dissection ordering. The authors' implementation achieves this parallel complexity on a two-dimensional MIMD processor array with N processors and nearest-neighbor interconnections. Thus nested dissection is a near-optimal algorithm for this problem on this interconnection topology. The parallel implementation on this architecture requires 158√N + O(log₂(√N)) parallel floating point multiplications. It is faster than a Kung-Leiserson systolic array for banded matrices for N ≥ 961, and faster than a serial implementation for N as small as 9.

  9. Performance of Distributed CFAR Processors in Pearson Distributed Clutter

    Directory of Open Access Journals (Sweden)

    Messali Zoubeida

    2007-01-01

    Full Text Available This paper deals with the distributed constant false alarm rate (CFAR) radar detection of targets embedded in heavy-tailed Pearson distributed clutter. In particular, we extend the results obtained for the cell averaging (CA), order statistics (OS), and censored mean level detector (CMLD) CFAR processors operating on positive alpha-stable (P&S) random variables to more general situations, specifically to the presence of interfering targets and distributed CFAR detectors. The receiver operating characteristics of the greatest-of (GO) and the smallest-of (SO) CFAR processors are also determined. The performance characteristics of distributed systems are presented and compared, both in homogeneous situations and in the presence of interfering targets. We demonstrate, via simulation results, that when the clutter is modelled as a positive alpha-stable distribution, the distributed systems offer robustness against multiple target situations, especially when using the "OR" fusion rule.
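
    For readers unfamiliar with the CA-CFAR scheme the paper builds on, the sketch below shows the basic single-sensor detector: the clutter level at the cell under test is estimated from surrounding training cells (guard cells excluded) and scaled by a threshold factor. All parameters are illustrative, and this plain-averaging sketch does not model the Pearson/alpha-stable clutter or the OS, CMLD, GO and SO variants analyzed in the paper.

```python
import numpy as np

def ca_cfar(power, num_train=16, num_guard=2, scale=4.0):
    """Cell-averaging CFAR over a 1-D power profile; returns a boolean mask."""
    n = len(power)
    detections = np.zeros(n, dtype=bool)
    half = num_train // 2 + num_guard
    for i in range(half, n - half):
        lead = power[i - half : i - num_guard]        # training cells before
        lag = power[i + num_guard + 1 : i + half + 1] # training cells after
        noise = np.mean(np.concatenate([lead, lag]))  # local clutter estimate
        detections[i] = power[i] > scale * noise      # adaptive threshold
    return detections

rng = np.random.default_rng(0)
power = rng.exponential(1.0, 500)       # synthetic clutter power
power[250] += 30.0                      # inject a strong target
print(np.flatnonzero(ca_cfar(power)))   # should include index 250
```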

  10. Performance of Distributed CFAR Processors in Pearson Distributed Clutter

    Directory of Open Access Journals (Sweden)

    Faouzi Soltani

    2007-01-01

    Full Text Available This paper deals with the distributed constant false alarm rate (CFAR) radar detection of targets embedded in heavy-tailed Pearson distributed clutter. In particular, we extend the results obtained for the cell averaging (CA), order statistics (OS), and censored mean level detector (CMLD) CFAR processors operating on positive alpha-stable (P&S) random variables to more general situations, specifically to the presence of interfering targets and distributed CFAR detectors. The receiver operating characteristics of the greatest-of (GO) and the smallest-of (SO) CFAR processors are also determined. The performance characteristics of distributed systems are presented and compared, both in homogeneous situations and in the presence of interfering targets. We demonstrate, via simulation results, that when the clutter is modelled as a positive alpha-stable distribution, the distributed systems offer robustness against multiple target situations, especially when using the “OR” fusion rule.

  11. Design and implementation of a high performance network security processor

    Science.gov (United States)

    Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

    2010-03-01

    The last few years have seen much significant progress in the field of application-specific processors. One example is network security processors (NSPs), which perform various cryptographic operations specified by network security protocols and help to offload the computation-intensive burden from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogeneous parallel crypto engine arrays lead to a Gbps-rate NSP, which is programmable with domain-specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.

  12. 3081/E processor and its on-line use

    International Nuclear Information System (INIS)

    Rankin, P.; Bricaud, B.; Gravina, M.

    1985-05-01

    The 3081/E is a second-generation emulator of an IBM mainframe. One of its applications will be to form part of the data acquisition system of the upgraded Mark II detector for data taking at the SLAC linear collider. Since the processor does not have direct connections to I/O devices, a FASTBUS interface will be provided to allow communication with both SLAC Scanner Processors (which are responsible for the accumulation of data at the crate level) and the experiment's VAX 8600 mainframe. The 3081/E's will supply a significant amount of on-line computing power to the experiment (a single 3081/E is equivalent to 4 to 5 VAX 11/780's). A major advantage of the 3081/E is that program development can be done on an IBM mainframe (such as the one used for off-line analysis), which gives the programmer access to a full range of debugging tools. The processor's performance can be continually monitored by comparing the results obtained using it to those given when the same program is run on an IBM computer. 9 refs

  13. Integration of multiple determinants in the neuronal computation of economic values.

    Science.gov (United States)

    Raghuraman, Anantha P; Padoa-Schioppa, Camillo

    2014-08-27

    Economic goods may vary on multiple dimensions (determinants). A central conjecture in decision neuroscience is that choices between goods are made by comparing subjective values computed through the integration of all relevant determinants. Previous work identified three groups of neurons in the orbitofrontal cortex (OFC) of monkeys engaged in economic choices: (1) offer value cells, which encode the value of individual offers; (2) chosen value cells, which encode the value of the chosen good; and (3) chosen juice cells, which encode the identity of the chosen good. In principle, these populations could be sufficient to generate a decision. Critically, previous work did not assess whether offer value cells (the putative input to the decision) indeed encode subjective values as opposed to physical properties of the goods, and/or whether offer value cells integrate multiple determinants. To address these issues, we recorded from the OFC while monkeys chose between risky outcomes. Confirming previous observations, three populations of neurons encoded the value of individual offers, the value of the chosen option, and the value-independent choice outcome. The activity of both offer value cells and chosen value cells encoded values defined by the integration of juice quantity and probability. Furthermore, both populations reflected the subjective risk attitude of the animals. We also found additional groups of neurons encoding the risk associated with a particular option, the risky nature of the chosen option, and whether the trial outcome was positive or negative. These results provide substantial support for the conjecture described above and for the involvement of OFC in good-based decisions. Copyright © 2014 the authors 0270-6474/14/3311583-21$15.00/0.
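
    The integration the abstract describes, an offer value computed from juice quantity and probability and reflecting a subject-specific risk attitude, can be written down in a couple of lines. The power-law probability weighting below is an assumed functional form for illustration; the paper's actual model may differ.

```python
def offer_value(quantity, probability, rho=1.0):
    """Integrated subjective value of a risky offer.

    rho = 1 is risk neutrality; rho < 1 overweights winning probabilities
    (risk seeking), rho > 1 underweights them (risk aversion). Assumed form.
    """
    return quantity * probability ** rho

# A risk-seeking chooser (rho < 1) may prefer 6 drops at p = 0.5 over a
# certain 3 drops, even though the two offers have equal expected value.
print(offer_value(6, 0.5, rho=0.8))   # ~3.45
print(offer_value(3, 1.0, rho=0.8))   # 3.00
```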

  14. Design, development and integration of a large scale multiple source X-ray computed tomography system

    International Nuclear Information System (INIS)

    Malcolm, Andrew A.; Liu, Tong; Ng, Ivan Kee Beng; Teng, Wei Yuen; Yap, Tsi Tung; Wan, Siew Ping; Kong, Chun Jeng

    2013-01-01

    X-ray Computed Tomography (CT) allows visualisation of the physical structures in the interior of an object without physically opening or cutting it. This technology supports a wide range of applications in the non-destructive testing, failure analysis, and performance evaluation of industrial products and components. Of the numerous factors that influence the performance characteristics of an X-ray CT system, the energy level in the X-ray spectrum to be used is one of the most significant. The ability of the X-ray beam to penetrate a given thickness of a specific material is directly related to the maximum available energy level in the beam. Higher energy levels allow penetration of thicker components made of denser materials. In response to local industry demand, and in support of on-going research activity in the area of 3D X-ray imaging for industrial inspection, the Singapore Institute of Manufacturing Technology (SIMTech) engaged in the design, development and integration of a large scale multiple source X-ray computed tomography system based on X-ray sources operating at higher energies than previously available in the Institute. The system consists of a large area direct digital X-ray detector (410 x 410 mm), a multiple-axis manipulator system, a 225 kV open tube microfocus X-ray source and a 450 kV closed tube millifocus X-ray source. The 225 kV X-ray source can be operated in either transmission or reflection mode. The body of the 6-axis manipulator system is fabricated from heavy-duty steel onto which high precision linear and rotary motors have been mounted in order to achieve high accuracy, stability and repeatability. A source-detector distance of up to 2.5 m can be achieved. The system is controlled by a proprietary X-ray CT operating system developed by SIMTech. The system currently can accommodate samples up to 0.5 x 0.5 x 0.5 m in size with weight up to 50 kg. These specifications will be increased to 1.0 x 1.0 x 1.0 m and 100 kg in future.

  15. CosmoTransitions: Computing cosmological phase transition temperatures and bubble profiles with multiple fields

    Science.gov (United States)

    Wainwright, Carroll L.

    2012-09-01

    I present a numerical package (CosmoTransitions) for analyzing finite-temperature cosmological phase transitions driven by single or multiple scalar fields. The package analyzes the different vacua of a theory to determine their critical temperatures (where the vacuum energy levels are degenerate), their supercooling temperatures, and the bubble wall profiles which separate the phases and describe their tunneling dynamics. I introduce a new method of path deformation to find the profiles of both thin- and thick-walled bubbles. CosmoTransitions is freely available for public use.
    Program summary. Program title: CosmoTransitions. Catalogue identifier: AEML_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEML_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 8775. No. of bytes in distributed program, including test data, etc.: 621096. Distribution format: tar.gz. Programming language: Python. Computer: Developed on a 2009 MacBook Pro; no computer-specific optimization was performed. Operating system: Designed and tested on Mac OS X 10.6.8; compatible with any OS with Python installed. RAM: Approximately 50 MB, mostly for loading plotting packages. Classification: 1.9, 11.1. External routines: SciPy, NumPy, matplotlib. Nature of problem: To analyze early-Universe finite-temperature phase transitions with multiple scalar fields, determine the phase structure of an input theory and the amount of supercooling at each phase transition, and find the bubble-wall profiles of the nucleated bubbles that drive the transitions. Solution method: To find the bubble-wall profile, the program assumes that tunneling happens along a fixed path in field space. This reduces the equations of motion to one dimension, which can then be solved using the overshoot/undershoot method.
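
    The one-dimensional solver mentioned at the end of the summary can be illustrated compactly. The sketch below implements the overshoot/undershoot shooting method for a single scalar field with an assumed quartic potential; it is an illustration of the method, not CosmoTransitions' actual API, and all parameter values are assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

EPS = 0.3
def dV(phi):
    # V(phi) = (phi**2 - 1)**2 / 4 - EPS * phi: false vacuum near phi = -0.85,
    # true vacuum near phi = +1.13, barrier near phi = -0.3 (all approximate).
    return phi ** 3 - phi - EPS

PHI_FALSE = -0.85

def shoot(phi0, friction=3.0, rho_max=60.0):
    """Integrate phi'' + (friction/rho) phi' = V'(phi) from rest at phi0."""
    rhs = lambda rho, y: [y[1], dV(y[0]) - friction / rho * y[1]]
    sol = solve_ivp(rhs, (1e-3, rho_max), [phi0, 0.0], max_step=0.05)
    # Overshoot: the field rolls past the false vacuum; otherwise undershoot.
    return "overshoot" if np.min(sol.y[0]) < PHI_FALSE - 1e-3 else "undershoot"

lo, hi = -0.2, 1.1       # bracket between the barrier region and true vacuum
for _ in range(40):      # bisect on the field value at the bubble center
    mid = 0.5 * (lo + hi)
    if shoot(mid) == "overshoot":
        hi = mid         # released too close to the true vacuum
    else:
        lo = mid
print("bubble-center field value phi(0) ~", round(0.5 * (lo + hi), 4))
```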

  16. Contrast-enhanced ultrasound and computed tomography findings of granulomatosis with polyangiitis presenting with multiple intrarenal microaneurysms: A case report.

    Science.gov (United States)

    Kim, Youe Ree; Lee, Young Hwan; Lee, Jong-Ho; Yoon, Kwon-Ha

    Granulomatosis with polyangiitis (GPA) is a systemic disorder that affects small- and medium-sized vessels in many organs. Although the kidneys are the second most commonly involved organ in patients with GPA, its manifestation as multiple intrarenal aneurysms is rare. We report an unusual manifestation of GPA with multiple intrarenal microaneurysms, as demonstrated by contrast-enhanced ultrasound and computed tomography. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. FPGA Based Intelligent Co-operative Processor in Memory Architecture

    DEFF Research Database (Denmark)

    Ahmed, Zaki; Sotudeh, Reza; Hussain, Dil Muhammad Akbar

    2011-01-01

    In a continuing effort to improve computer system performance, Processor-In-Memory (PIM) architecture has emerged as an alternative solution. PIM architecture incorporates computational units and control logic directly on the memory to provide immediate access to the data. To exploit the potential benefits of PIM, a concept of Co-operative Intelligent Memory (CIM) was developed by the intelligent system group of the University of Hertfordshire, based on the previously developed Co-operative Pseudo Intelligent Memory (CPIM). This paper provides an overview of the previous works (CPIM, CIM) and the realization...

  18. Formal characterizations of FA-based string processors

    CSIR Research Space (South Africa)

    Ngassam, EK

    2010-08-01

    Full Text Available Formal Characterizations of FA-based String Processors. Ernest Ketcha Ngassam (SAP Meraka UTD, Pretoria, South Africa; School of Computing, University of South Africa, Pretoria 0001), Bruce W. Watson and Derrick G. Kourie (Department of Computer Science...

  19. LASIP-III, a generalized processor for standard interface files

    International Nuclear Information System (INIS)

    Bosler, G.E.; O'Dell, R.D.; Resnik, W.M.

    1976-03-01

    The LASIP-III code was developed for processing Version III standard interface data files which have been specified by the Committee on Computer Code Coordination. This processor performs two distinct tasks, namely, transforming free-field format, BCD data into well-defined binary files and providing for printing and punching data in the binary files. While LASIP-III is exported as a complete free-standing code package, techniques are described for easily separating the processor into two modules, viz., one for creating the binary files and one for printing the files. The two modules can be separated into free-standing codes or they can be incorporated into other codes. Also, the LASIP-III code can be easily expanded for processing additional files, and procedures are described for such an expansion. 2 figures, 8 tables

  20. Efficacy of Code Optimization on Cache-Based Processors

    Science.gov (United States)

    VanderWijngaart, Rob F.; Saphir, William C.; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    In this paper a number of techniques for improving the cache performance of a representative piece of numerical software is presented. Target machines are popular processors from several vendors: MIPS R5000 (SGI Indy), MIPS R8000 (SGI PowerChallenge), MIPS R10000 (SGI Origin), DEC Alpha EV4 + EV5 (Cray T3D & T3E), IBM RS6000 (SP Wide-node), Intel PentiumPro (Ames' Whitney), Sun UltraSparc (NERSC's NOW). The optimizations all attempt to increase the locality of memory accesses. But they meet with rather varied and often counterintuitive success on the different computing platforms. We conclude that it may be genuinely impossible to obtain portable performance on the current generation of cache-based machines. At the least, it appears that the performance of modern commodity processors cannot be described with parameters defining the cache alone.

  1. A Time-Composable Operating System for the Patmos Processor

    DEFF Research Database (Denmark)

    Ziccardi, Marco; Schoeberl, Martin; Vardanega, Tullio

    2015-01-01

    In the last couple of decades we have witnessed a steady growth in the complexity and widespread adoption of real-time systems. In order to master the rising complexity in the timing behaviour of those systems, rightful attention has been given to the development of time-predictable computer architectures. A time-composable operating system, on top of a time-composable processor, facilitates incremental development, which is highly desirable for industry. This paper makes a twofold contribution. First, we present enhancements to the Patmos processor to allow achieving time composability at the operating system level. Second, we extend an existing time-composable operating system, TiCOS, to make best use of advanced Patmos hardware features in the pursuit of time composability.

  2. Color sensor and neural processor on one chip

    Science.gov (United States)

    Fiesler, Emile; Campbell, Shannon R.; Kempem, Lother; Duong, Tuan A.

    1998-10-01

    A low-cost, compact, and robust color sensor that can operate in real time under various environmental conditions can benefit many applications, including quality control, chemical sensing, food production, medical diagnostics, energy conservation, monitoring of hazardous waste, and recycling. Unfortunately, existing color sensors are either bulky and expensive or do not provide the required speed and accuracy. In this publication we describe the design of an accurate real-time color classification sensor, together with preprocessing and a subsequent neural network processor, integrated on a single complementary metal oxide semiconductor (CMOS) integrated circuit. This one-chip sensor and information processor will be low in cost, robust, and mass-producible using standard commercial CMOS processes. The performance of the chip and the feasibility of its manufacture are proven through computer simulations based on CMOS hardware parameters. Comparisons with competing methodologies show a significantly higher performance for our device.

  3. SPP: A data base processor data communications protocol

    Science.gov (United States)

    Fishwick, P. A.

    1983-01-01

    The design and implementation of a data communications protocol for the Intel Data Base Processor (DBP) is defined. The protocol is termed SPP (Service Port Protocol) since it enables data transfer between the host computer and the DBP service port. The protocol implementation is extensible in that it is explicitly layered and the protocol functionality is hierarchically organized. Extensive trace and performance capabilities have been supplied with the protocol software to permit optional efficient monitoring of the data transfer between the host and the Intel data base processor. Machine independence was considered to be an important attribute during the design and implementation of SPP. The protocol source is fully commented and is included in Appendix A of this report.

  4. Analytical Bounds on the Threads in IXP1200 Network Processor

    OpenAIRE

    Ramakrishna, STGS; Jamadagni, HS

    2003-01-01

    Increasing link speeds have placed an enormous burden on processing requirements, and processors are expected to carry out a variety of tasks. Network Processor (NP) [1] [2] is the blanket name given to processors that trade off flexibility and performance. Network Processors are offered by a number of vendors to take the main burden of the processing requirements of network-related operations from conventional processors. The Network Processors cover a spectrum of design trad...

  5. Multiple scattering of low energy rare gas ions: a comparison of experiment and computer simulation

    International Nuclear Information System (INIS)

    Heiland, W.; Taglauer, E.; Robinson, M.T.

    1976-01-01

    Some aspects of ion scattering below a few keV have been interpreted by multiple scattering. This can partly be simulated by chain or string models, where the single crystal surface is replaced by a chain of atoms. The computer program MARLOWE allows a simulation of solid-ion interaction which is much closer to reality; e.g., the crystal is three-dimensional and includes lattice vibrations, electronic stopping power, different scattering potentials, etc. It is shown that the energy of the reflected ions as a function of the primary energy, lattice constant, impact angle and scattering angle can be understood within the string model. These results of the string model are confirmed by the MARLOWE calculations. For an interpretation of the measured intensities the simple string model is insufficient, whereas with MARLOWE reasonable agreement with experimental data may be achieved if the thermal vibrations of the lattice atoms are taken into account. The experimental data include Ne⁺ → Ni, Ne⁺ → Ag and preliminary data on Ne⁺ → W. The screening parameters of the scattering potentials are estimated for these ion-atom combinations. The results allow some conclusions about surface Debye temperatures. (Auth.)

  6. 320-row detector computed tomography angiography findings of a case with multiple coronary artery course anomalies

    International Nuclear Information System (INIS)

    Akay, S.; Bozlar, U.; Demirkol, S.; Tasar, M.

    2012-01-01

    Full text: Introduction: Computed tomography angiography (CTA), with its three-dimensional imaging capability, is a very reliable imaging modality for the evaluation of the coronary arteries. Objectives and tasks: To discuss the 320-row detector CTA findings of a case with multiple coronary artery course anomalies. Materials and methods: A 46-year-old man with palpitation was admitted to the Cardiology department of our hospital. On electrocardiography, polymorphic ventricular early beats were observed. The patient was referred to the Radiology department for a CTA examination to assess a probable coronary artery anomaly. Results: On CTA, the left main coronary artery was short. Bridging causing nearly 75% luminal stenosis was observed in the middle part of the left anterior descending artery. The circumflex artery continued as the first obtuse marginal branch, and this branch separated into four branches in its middle part, which coursed subepicardially in the middle and distal parts. The right main coronary artery also had a subepicardial course in its middle and distal parts. Conclusion: Myocardial bridging is not a rare finding in routine clinical practice, but bridging in all three coronary arteries is very uncommon. Multidetector CTA is an effective and non-invasive imaging modality for understanding the normal anatomy and detecting congenital anomalies of the coronary arteries.

  7. Multiple-instance learning for computer-aided detection of tuberculosis

    Science.gov (United States)

    Melendez, J.; Sánchez, C. I.; Philipsen, R. H. H. M.; Maduskar, P.; van Ginneken, B.

    2014-03-01

    Detection of tuberculosis (TB) on chest radiographs (CXRs) is a hard problem. Therefore, to help radiologists or even take their place when they are not available, computer-aided detection (CAD) systems are being developed. In order to reach a performance comparable to that of human experts, the pattern recognition algorithms of these systems are typically trained on large CXR databases that have been manually annotated to indicate the abnormal lung regions. However, manually outlining those regions constitutes a time-consuming process that, besides, is prone to inconsistencies and errors introduced by interobserver variability and the absence of an external reference standard. In this paper, we investigate an alternative pattern classification method, namely multiple-instance learning (MIL), that does not require such detailed information for a CAD system to be trained. We have applied this alternative approach to a CAD system aimed at detecting textural lesions associated with TB. Only the case (or image) condition (normal or abnormal) was provided in the training stage. We compared the resulting performance with those achieved by several variations of a conventional system trained with detailed annotations. A database of 917 CXRs was constructed for experimentation. It was divided into two roughly equal parts that were used as training and test sets. The area under the receiver operating characteristic curve was utilized as a performance measure. Our experiments show that, by applying the investigated MIL approach, comparable results as with the aforementioned conventional systems are obtained in most cases, without requiring condition information at the lesion level.

  8. Dataflow formalisation of real-time streaming applications on a composable and predictable multi-processor SOC

    NARCIS (Netherlands)

    Nelson, A.T.; Goossens, K.G.W.; Akesson, K.B.

    2015-01-01

    Embedded systems often contain multiple applications, some of which have real-time requirements and whose performance must be guaranteed. To efficiently execute applications, modern embedded systems contain Globally Asynchronous Locally Synchronous (GALS) processors, network on chip, DRAM and SRAM

  9. Computerized nipple identification for multiple image analysis in computer-aided diagnosis

    International Nuclear Information System (INIS)

    Zhou Chuan; Chan Heangping; Paramagul, Chintana; Roubidoux, Marilyn A.; Sahiner, Berkman; Hadjiiski, Labomir M.; Petrick, Nicholas

    2004-01-01

    Correlation of information from multiple-view mammograms (e.g., MLO and CC views, bilateral views, or current and prior mammograms) can improve the performance of breast cancer diagnosis by radiologists or by computer. The nipple is a reliable and stable landmark on mammograms for the registration of multiple mammograms. However, accurate identification of nipple location on mammograms is challenging because of the variations in image quality and in the nipple projections, resulting in some nipples being nearly invisible on the mammograms. In this study, we developed a computerized method to automatically identify the nipple location on digitized mammograms. First, the breast boundary was obtained using a gradient-based boundary tracking algorithm, and then the gray level profiles along the inside and outside of the boundary were identified. A geometric convergence analysis was used to limit the nipple search to a region of the breast boundary. A two-stage nipple detection method was developed to identify the nipple location using the gray level information around the nipple, the geometric characteristics of nipple shapes, and the texture features of glandular tissue or ducts which converge toward the nipple. At the first stage, a rule-based method was designed to identify the nipple location by detecting significant changes of intensity along the gray level profiles inside and outside the breast boundary and the changes in the boundary direction. At the second stage, a texture orientation-field analysis was developed to estimate the nipple location based on the convergence of the texture pattern of glandular tissue or ducts towards the nipple. The nipple location was finally determined from the detected nipple candidates by a rule-based confidence analysis. In this study, 377 and 367 randomly selected digitized mammograms were used for training and testing the nipple detection algorithm, respectively. Two experienced radiologists identified the nipple locations

  10. The ATLAS fast tracker processor design

    CERN Document Server

    Volpi, Guido; Albicocco, Pietro; Alison, John; Ancu, Lucian Stefan; Anderson, James; Andari, Nansi; Andreani, Alessandro; Andreazza, Attilio; Annovi, Alberto; Antonelli, Mario; Asbah, Needa; Atkinson, Markus; Baines, J; Barberio, Elisabetta; Beccherle, Roberto; Beretta, Matteo; Biesuz, Nicolo Vladi; Blair, R E; Bogdan, Mircea; Boveia, Antonio; Britzger, Daniel; Bryant, Partick; Burghgrave, Blake; Calderini, Giovanni; Camplani, Alessandra; Cavaliere, Viviana; Cavasinni, Vincenzo; Chakraborty, Dhiman; Chang, Philip; Cheng, Yangyang; Citraro, Saverio; Citterio, Mauro; Crescioli, Francesco; Dawe, Noel; Dell'Orso, Mauro; Donati, Simone; Dondero, Paolo; Drake, G; Gadomski, Szymon; Gatta, Mauro; Gentsos, Christos; Giannetti, Paola; Gkaitatzis, Stamatios; Gramling, Johanna; Howarth, James William; Iizawa, Tomoya; Ilic, Nikolina; Jiang, Zihao; Kaji, Toshiaki; Kasten, Michael; Kawaguchi, Yoshimasa; Kim, Young Kee; Kimura, Naoki; Klimkovich, Tatsiana; Kolb, Mathis; Kordas, K; Krizka, Karol; Kubota, T; Lanza, Agostino; Li, Ho Ling; Liberali, Valentino; Lisovyi, Mykhailo; Liu, Lulu; Love, Jeremy; Luciano, Pierluigi; Luongo, Carmela; Magalotti, Daniel; Maznas, Ioannis; Meroni, Chiara; Mitani, Takashi; Nasimi, Hikmat; Negri, Andrea; Neroutsos, Panos; Neubauer, Mark; Nikolaidis, Spiridon; Okumura, Y; Pandini, Carlo; Petridou, Chariclia; Piendibene, Marco; Proudfoot, James; Rados, Petar Kevin; Roda, Chiara; Rossi, Enrico; Sakurai, Yuki; Sampsonidis, Dimitrios; Saxon, James; Schmitt, Stefan; Schoening, Andre; Shochet, Mel; Shoijaii, Jafar; Soltveit, Hans Kristian; Sotiropoulou, Calliope-Louisa; Stabile, Alberto; Swiatlowski, Maximilian J; Tang, Fukun; Taylor, Pierre Thor Elliot; Testa, Marianna; Tompkins, Lauren; Vercesi, V; Wang, Rui; Watari, Ryutaro; Zhang, Jianhong; Zeng, Jian Cong; Zou, Rui; Bertolucci, Federico

    2015-01-01

    The extended use of tracking information at the trigger level in the LHC is crucial for the trigger and data acquisition (TDAQ) system to fulfill its task. Precise and fast tracking is important to identify specific decay products of the Higgs boson or new phenomena, as well as to distinguish the contributions coming from the many collisions that occur at every bunch crossing. However, track reconstruction is among the most demanding tasks performed by the TDAQ computing farm; in fact, complete reconstruction at full Level-1 trigger accept rate (100 kHz) is not possible. In order to overcome this limitation, the ATLAS experiment is planning the installation of a dedicated processor, the Fast Tracker (FTK), which is aimed at achieving this goal. The FTK is a pipeline of high performance electronics, based on custom and commercial devices, which is expected to reconstruct, with high resolution, the trajectories of charged-particle tracks with a transverse momentum above 1 GeV, using the ATLAS inner tracker info...

  11. Development of Innovative Design Processor

    International Nuclear Information System (INIS)

    Park, Y.S.; Park, C.O.

    2004-01-01

    The nuclear design analysis requires time-consuming and error-prone model-input preparation, code runs, output analysis and quality assurance. To reduce human effort and improve design quality and productivity, the Innovative Design Processor (IDP) is being developed. IDP rests on two basic principles: document-oriented design and web-based design. In document-oriented design, the designer writes a design document, called an active document, and feeds it to a special program that produces the final document, complete with analyses, tables and plots, automatically. Active documents can be written with ordinary HTML editors or created automatically on the web, which is the other framework of IDP. Using an appropriate mix of server-side and client-side programming under the LAMP (Linux/Apache/MySQL/PHP) environment, the design process on the web is modeled as a design wizard so that even a novice designer can produce the design document easily. This automation using the IDP is now being implemented for all reload designs of Korea Standard Nuclear Power Plant (KSNP) type PWRs. Its introduction will allow a large reduction in KSNP reload design effort and provide a platform for the design and R and D tasks of KNFC. (authors)
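
    The "active document" workflow the abstract describes (a design document with embedded directives that a processor expands into the final document with computed results) can be illustrated with a minimal sketch. This is a hedged illustration only: the <calc> tag, the Python implementation and the example quantities are hypothetical stand-ins, not the IDP's actual HTML/PHP format under LAMP.

      import re

      def process_active_document(text, context):
          """Replace each <calc>expression</calc> directive with its computed value."""
          def run(match):
              # Evaluate the embedded expression against the design quantities only.
              value = eval(match.group(1), {"__builtins__": {}}, context)
              return f"{value:g}"
          return re.sub(r"<calc>(.*?)</calc>", run, text)

      source = "Cycle length: <calc>energy / power</calc> days."
      print(process_active_document(source, {"energy": 14560.0, "power": 28.0}))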

  12. Onboard spectral imager data processor

    Science.gov (United States)

    Otten, Leonard J.; Meigs, Andrew D.; Franklin, Abraham J.; Sears, Robert D.; Robison, Mark W.; Rafert, J. Bruce; Fronterhouse, Donald C.; Grotbeck, Ronald L.

    1999-10-01

    Previous papers have described the concept behind the MightySat II.1 program, the satellite's Fourier Transform imaging spectrometer's optical design, the design for the spectral imaging payload, and its initial qualification testing. This paper discusses the onboard data processing, which was designed to reduce the amount of downloaded data by an order of magnitude and to provide a demonstration of a smart spaceborne spectral imaging sensor. Two custom components, a spectral imager interface 6U VME card that moves data at over 30 MByte/sec, and four TI C-40 processors mounted to a second 6U VME and daughter card, are used to adapt the sensor to the spacecraft and provide the necessary high-speed processing. A system architecture that offers both onboard real-time image processing and high-speed post-collection analysis of the spectral data has been developed. In addition to the onboard processing of the raw data into a usable spectral data volume, one feature extraction technique has been incorporated. This algorithm operates on the basic interferometric data. The algorithm is integrated within the data compression process to search for uploadable feature descriptions.

  13. Hardware trigger processor for the MDT system

    CERN Document Server

    AUTHOR|(SzGeCERN)757787; The ATLAS collaboration; Hazen, Eric; Butler, John; Black, Kevin; Gastler, Daniel Edward; Ntekas, Konstantinos; Taffard, Anyes; Martinez Outschoorn, Verena; Ishino, Masaya; Okumura, Yasuyuki

    2017-01-01

    We are developing a low-latency hardware trigger processor for the Monitored Drift Tube (MDT) system in the ATLAS muon spectrometer. The processor will fit candidate muon tracks in the drift tubes in real time, significantly improving the momentum resolution provided by the dedicated trigger chambers. We present a novel pure-FPGA implementation of a Legendre transform segment finder, an alternative implementation based on associative memory, an ARM (Zynq) processor-based track fitter, and a compact ATCA carrier-board architecture. The ATCA architecture is designed to allow a modular, staged approach to deploying the system and to exploring alternative technologies.
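
    The Legendre transform segment finder mentioned above can be sketched in software: each drift circle (tube centre plus drift radius) votes for the straight lines tangent to it in a (theta, r) accumulator, and the most popular cell is taken as the track segment. A NumPy sketch under assumed binning and units; the actual FPGA firmware is not reproduced here.

      import numpy as np

      def legendre_segment(hits, n_theta=180, n_r=200, r_max=50.0):
          """Find the straight line tangent to the most drift circles (x, y, d)."""
          thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
          accumulator = np.zeros((n_theta, n_r), dtype=np.int32)
          for x, y, d in hits:
              base = x * np.cos(thetas) + y * np.sin(thetas)
              for r in (base - d, base + d):        # both tangents of the circle
                  bins = np.floor((r + r_max) / (2.0 * r_max) * n_r).astype(int)
                  valid = (bins >= 0) & (bins < n_r)
                  accumulator[np.arange(n_theta)[valid], bins[valid]] += 1
          i, j = np.unravel_index(accumulator.argmax(), accumulator.shape)
          return thetas[i], (j + 0.5) * (2.0 * r_max) / n_r - r_max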

  14. MP CBM-Z V1.0: design for a new CBM-Z gas-phase chemical mechanism architecture for next generation processors

    OpenAIRE

    Wang, Hui; Lin, Junmin; Wu, Qizhong; Chen, Huansheng; Tang, Xiao; Wang, Zifa; Chen, Xueshun; Cheng, Huaqiong; Wang, Lanning

    2018-01-01

    Precise and rapid air quality simulation and forecasting are limited by the computational performance of the air quality model, in which the gas-phase chemistry module is the most time-consuming function. In this study, we designed a new framework for the widely used Carbon Bond Mechanism Z (CBM-Z) gas-phase chemical kinetics kernel to exploit the Single Instruction Multiple Data (SIMD) technology of next-generation processors and improve its calculation performance. The...
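
    The restructuring the abstract describes, one instruction updating many grid cells at once, can be mimicked with array code: pack the cells contiguously and replace the per-cell scalar loop with whole-vector arithmetic. A sketch with an assumed toy two-species mechanism, not the actual CBM-Z rate equations.

      import numpy as np

      def step_scalar(conc, k, dt):
          for cell in range(conc.shape[0]):      # one grid cell per iteration
              loss = k * conc[cell, 0] * conc[cell, 1]
              conc[cell, 0] -= dt * loss
              conc[cell, 1] -= dt * loss

      def step_simd(conc, k, dt):
          loss = k * conc[:, 0] * conc[:, 1]     # all cells in one lane-packed op
          conc[:, 0] -= dt * loss
          conc[:, 1] -= dt * loss

      cells = np.random.default_rng(0).uniform(1.0, 2.0, size=(1024, 2))
      a, b = cells.copy(), cells.copy()
      step_scalar(a, k=1e-3, dt=60.0)
      step_simd(b, k=1e-3, dt=60.0)
      print(np.allclose(a, b))                   # same chemistry, vectorized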

  15. Optimizing the performance of streaming numerical kernels on the IBM Blue Gene/P PowerPC 450 processor

    KAUST Repository

    Malas, Tareq Majed Yasin; Ahmadia, Aron; Brown, Jed; Gunnels, John A.; Keyes, David E.

    2012-01-01

    Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution

  16. PERFORMANCE EVALUATION OF OR1200 PROCESSOR WITH EVOLUTIONARY PARALLEL HPRC USING GEP

    Directory of Open Access Journals (Sweden)

    R. Maheswari

    2012-04-01

    Full Text Available In this fast-computing era, most embedded systems require more computing power to complete complex functions/tasks in less time. One way to achieve this is to boost processor performance, which allows the processor core to run faster. This paper presents a novel technique for increasing performance through parallel HPRC (High Performance Reconfigurable Computing) in the CPU/DSP (Digital Signal Processor) unit of the OR1200 (Open Reduced Instruction Set Computer (RISC) 1200) using Gene Expression Programming (GEP), an evolutionary programming model. OR1200 is a soft-core RISC processor of the Intellectual Property cores that can efficiently run any modern operating system. In the manufacturing process of the OR1200, a parallel HPRC is placed internally in the integer execution pipeline unit of the CPU/DSP core to increase performance. The GEP parallel HPRC is activated/deactivated by triggering the signals (i) HPRC_Gene_Start and (ii) HPRC_Gene_End. A Verilog HDL (Hardware Description Language) functional code for the GEP parallel HPRC is developed and synthesised using XILINX ISE in the first part of the work, and a CoreMark processor core benchmark is used to test the performance of the OR1200 soft core in the second part. The results of the implementation show an overall speed-up of 20.59% with GEP-based parallel HPRC in the execution unit of the OR1200.

  17. Effects of Mathematics Computer Games on Special Education Students' Multiplicative Reasoning Ability

    Science.gov (United States)

    Bakker, Marjoke; van den Heuvel-Panhuizen, Marja; Robitzsch, Alexander

    2016-01-01

    This study examined the effects of a teacher-delivered intervention with online mathematics mini-games on special education students' multiplicative reasoning ability (multiplication and division). The games involved declarative, procedural, as well as conceptual knowledge of multiplicative relations, and were accompanied with teacher-led lessons…

  18. Effects of mathematics computer games on special education students’ multiplicative reasoning ability

    NARCIS (Netherlands)

    Bakker, M.|info:eu-repo/dai/nl/355337770; Van den Heuvel-Panhuizen, M.|info:eu-repo/dai/nl/069266255; Robitzsch, Alexander

    2016-01-01

    This study examined the effects of a teacher-delivered intervention with online mathematics mini-games on special education students’ multiplicative reasoning ability (multiplication and division). The games involved declarative, procedural, as well as conceptual knowledge of multiplicative

  19. Photonics and Fiber Optics Processor Lab

    Data.gov (United States)

    Federal Laboratory Consortium — The Photonics and Fiber Optics Processor Lab develops, tests and evaluates high speed fiber optic network components as well as network protocols. In addition, this...

  20. Broadband set-top box using MAP-CA processor

    Science.gov (United States)

    Bush, John E.; Lee, Woobin; Basoglu, Chris

    2001-12-01

    Advances in broadband access are expected to exert a profound impact on our everyday life. Broadband will be the key to the digital convergence of communication, computer and consumer equipment, and the common thread that facilitates this convergence comprises digital media and the Internet. To address this market, Equator Technologies, Inc., is developing the Dolphin broadband set-top box reference platform using its MAP-CA Broadband Signal Processor™ chip. The Dolphin reference platform is a universal media platform for the display and presentation of digital content on end-user entertainment systems. The objective of the Dolphin reference platform is to provide a complete set-top box system based on the MAP-CA processor. It includes all the necessary hardware and software components for the emerging broadcast and broadband digital media market based on IP protocols. Such a reference design requires broadband Internet access and high-performance digital signal processing. Because it uses the MAP-CA processor, the Dolphin reference platform is completely programmable, allowing various codecs to be implemented in software, such as MPEG-2, MPEG-4, H.263 and proprietary codecs. The software implementation also enables field upgrades to keep pace with evolving technology and industry demands.

  1. Further computer appreciation

    CERN Document Server

    Fry, T F

    2014-01-01

    Further Computer Appreciation is a comprehensive coverage of the principles and aspects of computer appreciation. The book starts by describing the development of computers from the first to the third generation, the development of processors and storage systems, and the present position of computers and future trends. The text tackles the basic elements, concepts and functions of digital computers, computer arithmetic, input media and devices, and computer output. The basic central processor functions, data storage and the organization of data by classification of computer files,

  2. Point and track-finding processors for multiwire chambers

    CERN Document Server

    Hansroul, M

    1973-01-01

    The hardware processors described below are designed to be used in conjunction with multi-wire chambers. They have the characteristic of being based on computational methods in contrast to analogue procedures. In a sense, they are hardware implementations of computer programs. But, being specially designed for their purpose, they are free of the restrictions imposed by the architecture of the computer on which the equivalent program is to run. The parallelism inherent in the algorithms can thus be fully exploited. Combined with the use of fast access scratch-pad memories and the non-sequential nature of the control program, the parallelism accounts for the fact that these processors are expected to execute 2-3 orders of magnitude faster than the equivalent Fortran programs on a CDC 7600 or 6600. As a consequence, methods which are simple and straightforward, but which are impractical because they require an exorbitant amount of computer time can on the contrary be very attractive for hardware implementation. ...

  3. Real time monitoring of electron processors

    International Nuclear Information System (INIS)

    Nablo, S.V.; Kneeland, D.R.; McLaughlin, W.L.

    1995-01-01

    A real time radiation monitor (RTRM) has been developed for monitoring the dose rate (current density) of electron beam processors. The system provides continuous monitoring of processor output, electron beam uniformity, and an independent measure of operating voltage or electron energy. In view of the device's ability to replace labor-intensive dosimetry in verification of machine performance on a real-time basis, its application to providing archival performance data for in-line processing is discussed. (author)

  4. Advanced topics in security computer system design

    International Nuclear Information System (INIS)

    Stachniak, D.E.; Lamb, W.R.

    1989-01-01

    The capability, performance, and speed of contemporary computer processors, plus the associated performance capability of the operating systems accommodating the processors, have enormously expanded the scope of possibilities for designers of nuclear power plant security computer systems. This paper addresses the choices that could be made by a designer of security computer systems working with contemporary computers and describes the improvement in functionality of contemporary security computer systems based on an optimally chosen design. Primary initial considerations concern the selection of (a) the computer hardware and (b) the operating system. Considerations for hardware selection concern processor and memory word length, memory capacity, and numerous processor features

  5. Challenges in computational materials science: Multiple scales, multi-physics and evolving discontinuities

    NARCIS (Netherlands)

    Borst, de R.

    2008-01-01

    Novel experimental possibilities, together with improvements in computer hardware as well as new concepts in computational mathematics and mechanics, in particular multiscale methods, are now, in principle, making it possible to derive and compute phenomena and material parameters at a macroscopic

  6. Roles of computed tomography and [18F]fluorodeoxyglucose-positron emission tomography/computed tomography in the characterization of multiple solitary solid lung nodules

    OpenAIRE

    Travaini, LL; Trifirò, G; Vigna, PD; Veronesi, G; De Pas, TM; Spaggiari, L; Paganelli, G; Bellomi, M

    2012-01-01

    The purpose of this study is to compare the performance of multidetector computed tomography (CT) and positron emission tomography/CT (PET/CT) with [18F]fluorodeoxyglucose in the diagnosis of multiple solitary lung nodules in 14 consecutive patients with suspicious lung cancer. CT and PET/CT findings were reviewed by a radiologist and nuclear medicine physician, respectively, blinded to the pathological diagnoses of lung cancer, considering nodule size, shape, and location (CT) and maximum st...

  7. Multicore Challenges and Benefits for High Performance Scientific Computing

    Directory of Open Access Journals (Sweden)

    Ida M.B. Nielsen

    2008-01-01

    Full Text Available Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.
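
    The hybrid message-passing/multi-threading model and the distributed-memory matrix multiply mentioned above can be sketched with Python's multiprocessing standing in for MPI ranks: row blocks of A are distributed to worker processes (the message-passing level), while the BLAS behind NumPy's matmul may use several threads inside each worker (the multi-threading level). The block count and matrix sizes are illustrative assumptions.

      import numpy as np
      from multiprocessing import Pool

      def block_matmul(args):
          a_block, b = args
          return a_block @ b               # threaded BLAS inside one "rank"

      if __name__ == "__main__":
          rng = np.random.default_rng(1)
          a, b = rng.standard_normal((512, 256)), rng.standard_normal((256, 128))
          blocks = np.array_split(a, 4)    # one row block per worker process
          with Pool(processes=4) as pool:
              c = np.vstack(pool.map(block_matmul, [(blk, b) for blk in blocks]))
          print(np.allclose(c, a @ b))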

  8. An FPGA computing demo core for space charge simulation

    International Nuclear Information System (INIS)

    Wu, Jinyuan; Huang, Yifei

    2009-01-01

    In accelerator physics, space charge simulation requires a large amount of computing power. In a particle system, each calculation requires time- and resource-consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computed using a memory look-up table addressed with the nine to ten most significant non-zero bits. At a 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. The temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.
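
    The memory look-up trick the record describes, addressing a table with the nine to ten most significant non-zero bits instead of computing the inverse square-root cube directly, can be sketched in software. A hedged illustration: the bit widths and the lazily built dictionary are stand-ins for the FPGA's fixed-size ROM.

      BITS = 10                                  # significant bits kept (record: 9-10)
      TABLE = {}

      def lut_inv_r3(r2_fixed):
          """Approximate (r^2)**(-3/2) for a positive fixed-point integer r^2."""
          shift = max(r2_fixed.bit_length() - BITS, 0)
          key = (shift, r2_fixed >> shift)       # leading bits address the table
          if key not in TABLE:                   # built lazily here; ROM on the FPGA
              TABLE[key] = ((key[1] << shift) or 1) ** -1.5
          return TABLE[key]

      r2 = 123456789
      print(lut_inv_r3(r2), r2 ** -1.5)          # close, using only shifts and a lookup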

  9. An FPGA computing demo core for space charge simulation

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Jinyuan; Huang, Yifei; /Fermilab

    2009-01-01

    In accelerator physics, space charge simulation requires a large amount of computing power. In a particle system, each calculation requires time- and resource-consuming operations such as multiplications, divisions, and square roots. Because of the flexibility of field programmable gate arrays (FPGAs), we implemented this task with efficient use of the available computing resources and completely eliminated non-calculating operations that are indispensable in regular micro-processors (e.g. instruction fetch, instruction decoding, etc.). We designed and tested a 16-bit demo core for computing Coulomb's force in an Altera Cyclone II FPGA device. To save resources, the inverse square-root cube operation in our design is computed using a memory look-up table addressed with the nine to ten most significant non-zero bits. At a 200 MHz internal clock, our demo core reaches a throughput of 200 M pairs/s/core, faster than a typical 2 GHz micro-processor by about a factor of 10. The temperature and power consumption of FPGAs were also lower than those of micro-processors. Fast and convenient, FPGAs can serve as alternatives to time-consuming micro-processors for space charge simulation.

  10. [Construction and analysis of a monitoring system with remote real-time multiple physiological parameters based on cloud computing].

    Science.gov (United States)

    Zhu, Lingyun; Li, Lianjie; Meng, Chunyan

    2014-12-01

    Existing multiple physiological parameter real-time monitoring systems have problems caused by the growing scale of data, such as insufficient server capacity for physiological data storage and analysis, so that data consistency cannot be guaranteed, and poor real-time performance. We therefore proposed a new solution for multiple physiological parameters with clustered background data storage and processing based on cloud computing. In our studies, batch processing for longitudinal analysis of patients' historical data was introduced. The process included resource virtualization in the IaaS layer of the cloud platform, construction of a real-time computing platform in the PaaS layer, reception and analysis of data streams in the SaaS layer, and the bottleneck problem of multi-parameter data transmission. The result was real-time transmission, storage and analysis of large amounts of physiological information. The simulation test results showed that the remote multiple physiological parameter monitoring system based on the cloud platform had obvious advantages in processing time and load balancing over the traditional server model. This architecture solved problems of traditional remote medical services, including long turnaround times, poor real-time analysis performance and lack of extensibility. It provides technical support for a "wearable wireless sensor plus mobile wireless transmission plus cloud computing service" mode moving towards home health monitoring of multiple physiological parameters.

  11. Negative base encoding in optical linear algebra processors

    Science.gov (United States)

    Perlee, C.; Casasent, D.

    1986-01-01

    In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
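
    The digit-convolution idea carries over to base -2 directly: convolving the negabinary digit strings of two numbers yields their product in a mixed (multi-valued) digit representation, which is then reduced to an ordinary integer. A NumPy sketch of the arithmetic only; the optical convolver itself is not modelled.

      import numpy as np

      def to_negabinary(n):
          digits = []
          while n != 0:
              n, r = divmod(n, -2)
              if r < 0:                  # keep every digit in {0, 1}
                  n, r = n + 1, r + 2
              digits.append(r)           # least significant digit first
          return digits or [0]

      def from_mixed(digits):
          return sum(int(d) * (-2) ** i for i, d in enumerate(digits))

      a, b = 57, -23
      mixed = np.convolve(to_negabinary(a), to_negabinary(b))   # the "analog" step
      print(from_mixed(mixed) == a * b)                         # True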

  12. MEDUSA - An overset grid flow solver for network-based parallel computer systems

    Science.gov (United States)

    Smith, Merritt H.; Pallis, Jani M.

    1993-01-01

    Continuing improvement in processing speed has made it feasible to solve the Reynolds-Averaged Navier-Stokes equations for simple three-dimensional flows on advanced workstations. Combining multiple workstations into a network-based heterogeneous parallel computer allows the application of programming principles learned on MIMD (Multiple Instruction Multiple Data) distributed memory parallel computers to the solution of larger problems. An overset-grid flow solution code has been developed which uses a cluster of workstations as a network-based parallel computer. Inter-process communication is provided by the Parallel Virtual Machine (PVM) software. Solution speed equivalent to one-third of a Cray-YMP processor has been achieved from a cluster of nine commonly used engineering workstation processors. Load imbalance and communication overhead are the principal impediments to parallel efficiency in this application.

  13. ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

    Directory of Open Access Journals (Sweden)

    Kim Taeho

    2010-09-01

    Full Text Available Background: There is an increasing demand to assemble and align large-scale biological sequence data sets. Commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because they lack a scalable high-performance computing (HPC) environment with greatly extended data storage capacity. Results: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align very large families of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-alignment computation. The computation efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called the "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions: ClustalXeed can now compute large volumes of biological sequence data that were not tractable in any other parallel or single MSA program. The main developments include: (1) the ability to tackle larger sequence alignment problems than possible with previous systems, through markedly improved storage-handling capabilities; (2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; and (3) support for both single-PC and distributed cluster systems.

  14. Development of an Advanced Digital Reactor Protection System Using Diverse Dual Processors to Prevent Common-Mode Failure

    International Nuclear Information System (INIS)

    Shin, Hyun Kook; Nam, Sang Ku; Sohn, Se Do; Chang, Hoon Seon

    2003-01-01

    The advanced digital reactor protection system (ADRPS) with diverse dual processors has been developed to prevent common-mode failure (CMF). The principle of diversity is applied to both hardware design and software design. For hardware diversity, two different types of CPUs are used for the bistable processor and local coincidence logic (LCL) processor. The Versa Module Eurocard-based single board computers are used for the CPU hardware platforms. The QNX operating system and the VxWorks operating system were selected for software diversity. Functional diversity is also applied to the input and output modules, and to the algorithm in the bistable processors and LCL processors. The characteristics of the newly developed digital protection system are described together with the preventive capability against CMF. Also, system reliability analysis is discussed. The evaluation results show that the ADRPS has a good preventive capability against the CMF and is a highly reliable reactor protection system

  15. Demonstration of two-qubit algorithms with a superconducting quantum processor.

    Science.gov (United States)

    DiCarlo, L; Chow, J M; Gambetta, J M; Bishop, Lev S; Johnson, B R; Schuster, D I; Majer, J; Blais, A; Frunzio, L; Girvin, S M; Schoelkopf, R J

    2009-07-09

    Quantum computers, which harness the superposition and entanglement of physical states, could outperform their classical counterparts in solving problems with technological impact, such as factoring large numbers and searching databases. A quantum processor executes algorithms by applying a programmable sequence of gates to an initialized register of qubits, which coherently evolves into a final state containing the result of the computation. Building a quantum processor is challenging because of the need to meet simultaneously requirements that are in conflict: state preparation, long coherence times, universal gate operations and qubit readout. Processors based on a few qubits have been demonstrated using nuclear magnetic resonance, cold ion trap and optical systems, but a solid-state realization has remained an outstanding challenge. Here we demonstrate a two-qubit superconducting processor and the implementation of the Grover search and Deutsch-Jozsa quantum algorithms. We use a two-qubit interaction, tunable in strength by two orders of magnitude on nanosecond timescales, which is mediated by a cavity bus in a circuit quantum electrodynamics architecture. This interaction allows the generation of highly entangled states with concurrence up to 94 per cent. Although this processor constitutes an important step in quantum computing with integrated circuits, continuing efforts to increase qubit coherence times, gate performance and register size will be required to fulfil the promise of a scalable technology.
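
    The linear algebra behind the Grover demonstration is compact enough to state directly: for two qubits, a single Grover iteration (oracle followed by the diffusion operator) maps the uniform superposition exactly onto the marked state. A NumPy sketch of the state-vector arithmetic only; the gate decomposition and the cavity-bus hardware are not modelled.

      import numpy as np

      marked = 0b10                              # basis state tagged by the oracle
      state = np.full(4, 0.5)                    # uniform two-qubit superposition
      oracle = np.eye(4); oracle[marked, marked] = -1.0
      diffusion = 2.0 * np.full((4, 4), 0.25) - np.eye(4)   # 2|s><s| - I
      state = diffusion @ (oracle @ state)
      print(np.abs(state) ** 2)                  # probability 1.0 on the marked state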

  16. Discussion paper for a highly parallel array processor-based machine

    International Nuclear Information System (INIS)

    Hagstrom, R.; Bolotin, G.; Dawson, J.

    1984-01-01

    The architectural plan for a quickly realizable implementation of a highly parallel special-purpose computer system with peak performance in the range of 6 billion floating point operations per second is discussed. The architecture is suited to lattice gauge theory computations of fundamental physics interest and may be applicable to a range of other numerically intensive computational problems. The plan is quickly realizable because it employs a maximum of commercially available hardware subsystems and because the architecture is software-transparent to the individual processors, allowing straightforward re-use of whatever commercially available operating systems and support software are suitable to run on the commercially produced processors. A tiny prototype instrument, designed along this architecture, has already been operated. A few elementary examples of programs which can run efficiently are presented. The large machine which the authors propose to build would be based upon a highly competent array processor, the ST-100 Array Processor, and specific design possibilities are discussed. The first step toward realizing this plan practically is to install a single ST-100 to allow algorithm development to proceed while a demonstration unit is built using two of the ST-100 Array Processors

  17. BWR thermohydraulics simulation on the AD-10 peripheral processor

    International Nuclear Information System (INIS)

    Wulff, W.; Cheng, H.S.; Lekach, S.V.; Mallen, A.N.

    1983-01-01

    This presentation demonstrates the feasibility of simulating plant transients and severe abnormal transients in nuclear power plants at much faster than real-time computing speeds on a low-cost, dedicated, interactive minicomputer. This is achieved by implementing advanced modeling techniques on modern, special-purpose peripheral processors for high-speed system simulation. The results of this demonstration will impact safety analyses, parametric studies, and studies of operator responses and control system failures, and will make possible the continuous on-line monitoring of plant performance and the detection and diagnosis of system or component failures

  18. Sn transport calculations on vector and parallel processors

    International Nuclear Information System (INIS)

    Rhoades, W.A.; Childs, R.L.

    1987-01-01

    The transport of radiation from the source to the location of people or equipment gives rise to some of the most challenging calculations. A problem may involve as many as a billion unknowns, each evaluated several times to resolve interdependence. Such calculations run many hours on a Cray computer, and a typical study involves many such calculations. This paper will discuss the steps taken to vectorize the DOT code, which solves transport problems in two space dimensions (2-D); the extension of this code to 3-D; and the plans for extension to parallel processors

  19. Computational Study of Shock/Plume Interactions Between Multiple Jets in Supersonic Crossflow

    Science.gov (United States)

    Tylczak, Erik B.

    The interaction of multiple jets in supersonic crossflow is simulated using hybrid Reynolds-Averaged Navier-Stokes and Large Eddy Simulation turbulence models. The blockage of a jet generates a curved bow shock, and in multi-jet flows, each shock impinges on the other fuel plumes. The curved nature of each shock generates vorticity directly, and the impingement of each shock on the vortical structures within the adjacent fuel plumes strengthens vortical structures already present. These stirring motions are the major driver of fuel-air mixing, and so mixing enhancement is predicted to occur in multi-port configurations. The primary geometry considered is that of the combustion duct at the Calspan-University at Buffalo Research Center 48" Large Energy National Shock (LENS) tunnel. This geometry was developed to be representative of the geometry and flow physics of the Flight 2 test vehicle of the Hypersonic International Flight Research Experimentation Program (HiFIRE-2). It takes the form of a symmetric pair of external compression ramps that feed an isolator of approximately 4" x 1" cross-section. Nine interdigitated flush-wall injectors, four on one wall and five on the other, inject hydrogen at an angle of 30 degrees to the freestream. Two freestream flow conditions are considered: approximately Mach 7.2 at a static temperature of 214 K and a density of 0.039 kg/m3 for the five-injector case, and approximately Mach 8.9 at a static temperature of 167 K and a density of 0.014 kg/m3 for the nine-injector case. Validation computations are performed on a single-port experiment with an imposed shock wave. Unsteady calculations are performed on five-port and nine-port configurations, and the five-port configuration is compared to calculations performed with only a single active port on the same geometry. Analysis of statistical data demonstrates enhanced mixing in the multi-port configurations in regions where shock impingement occurs.

  20. Performance evaluation for volumetric segmentation of multiple sclerosis lesions using MATLAB and computing engine in the graphical processing unit (GPU)

    Science.gov (United States)

    Le, Anh H.; Park, Young W.; Ma, Kevin; Jacobs, Colin; Liu, Brent J.

    2010-03-01

    Multiple Sclerosis (MS) is a progressive neurological disease affecting myelin pathways in the brain. Multiple lesions in the white matter can cause paralysis and severe motor disabilities of the affected patient. To address the inconsistency and user-dependency of manual lesion measurement on MRI, we have proposed a 3-D automated lesion quantification algorithm to enable objective and efficient lesion volume tracking. The computer-aided detection (CAD) of MS, written in MATLAB, utilizes the K-Nearest Neighbors (KNN) method to compute the probability of lesions on a per-voxel basis. Despite the highly optimized image-processing algorithms used in CAD development, integrating and evaluating MS CAD in the clinical workflow is technically challenging because the recursive nature of the algorithm demands high computation rates and memory bandwidth. In this paper, we present the development and evaluation of a computing engine in the graphical processing unit (GPU) with MATLAB for segmentation of MS lesions. The paper investigates the utilization of a high-end GPU for parallel computing of KNN in the MATLAB environment to improve algorithm performance. The integration is accomplished using NVIDIA's CUDA developmental toolkit for MATLAB. The results of this study will validate the practicality and effectiveness of the prototype MS CAD in a clinical setting. The GPU method may allow MS CAD to be rapidly integrated in an electronic patient record or any disease-centric health care system.
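
    The kernel being offloaded to the GPU is a per-voxel k-nearest-neighbour vote: each voxel's feature vector is compared against labelled training voxels, and the lesion probability is the fraction of lesion labels among its k nearest neighbours. A NumPy sketch; the features, k, and the toy data are illustrative assumptions rather than the published CAD pipeline.

      import numpy as np

      def knn_lesion_probability(voxels, train_feats, train_labels, k=10):
          # Pairwise squared distances, shape (n_voxels, n_train), all in parallel.
          d2 = ((voxels[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=2)
          nearest = np.argpartition(d2, k, axis=1)[:, :k]
          return train_labels[nearest].mean(axis=1)   # fraction of lesion votes

      rng = np.random.default_rng(7)
      train = rng.standard_normal((500, 4))
      labels = (train[:, 0] > 0.5).astype(float)      # toy "lesion" ground truth
      print(knn_lesion_probability(rng.standard_normal((3, 4)), train, labels))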

  1. Wing-Body Aeroelasticity Using Finite-Difference Fluid/Finite-Element Structural Equations on Parallel Computers

    Science.gov (United States)

    Byun, Chansup; Guruswamy, Guru P.; Kutler, Paul (Technical Monitor)

    1994-01-01

    In recent years significant advances have been made for parallel computers in both hardware and software. Now parallel computers have become viable tools in computational mechanics. Many application codes developed on conventional computers have been modified to benefit from parallel computers. Significant speedups in some areas have been achieved by parallel computations. For single-discipline use of both fluid dynamics and structural dynamics, computations have been made on wing-body configurations using parallel computers. However, only a limited amount of work has been completed in combining these two disciplines for multidisciplinary applications. The prime reason is the increased level of complication associated with a multidisciplinary approach. In this work, procedures to compute aeroelasticity on parallel computers using direct coupling of fluid and structural equations will be investigated for wing-body configurations. The parallel computer selected for computations is an Intel iPSC/860 computer which is a distributed-memory, multiple-instruction, multiple data (MIMD) computer with 128 processors. In this study, the computational efficiency issues of parallel integration of both fluid and structural equations will be investigated in detail. The fluid and structural domains will be modeled using finite-difference and finite-element approaches, respectively. Results from the parallel computer will be compared with those from the conventional computers using a single processor. This study will provide an efficient computational tool for the aeroelastic analysis of wing-body structures on MIMD type parallel computers.

  2. Architectural design and analysis of a programmable image processor

    International Nuclear Information System (INIS)

    Siyal, M.Y.; Chowdhry, B.S.; Rajput, A.Q.K.

    2003-01-01

    In this paper we present an architectural design and analysis of a programmable image processor, nicknamed Snake. The processor was designed with a high degree of parallelism to speed up a range of image processing operations. Data parallelism found in array processors has been included into the architecture of the proposed processor. The implementation of commonly used image processing algorithms and their performance evaluation are also discussed. The performance of Snake is also compared with other types of processor architectures. (author)

  3. Encryption and display of multiple-image information using computer-generated holography with modified GS iterative algorithm

    Science.gov (United States)

    Xiao, Dan; Li, Xiaowei; Liu, Su-Juan; Wang, Qiong-Hua

    2018-03-01

    In this paper, a new scheme for multiple-image encryption and display based on computer-generated holography (CGH) and maximum length cellular automata (MLCA) is presented. In the scheme, the computer-generated hologram, which carries the information of the three primitive images, is first generated by a modified Gerchberg-Saxton (GS) iterative algorithm using three different fractional orders in the fractional Fourier domain. The hologram is then encrypted using an MLCA mask. The ciphertext can be decrypted by combining the fractional orders and the rules of the MLCA. Numerical simulations and experimental display results verify the validity and feasibility of the proposed scheme.
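
    The loop that the modified GS algorithm builds on can be sketched as follows: iterate between the object and hologram planes, keeping the target amplitude in one plane and a phase-only constraint in the other, until the hologram phase converges. The plain FFT below is a simplifying assumption; the paper's modification instead uses three fractional Fourier orders.

      import numpy as np

      def gerchberg_saxton(target_amplitude, iterations=50, seed=0):
          rng = np.random.default_rng(seed)
          field = target_amplitude * np.exp(2j * np.pi * rng.random(target_amplitude.shape))
          for _ in range(iterations):
              hologram = np.exp(1j * np.angle(np.fft.fft2(field)))   # phase-only CGH
              field = target_amplitude * np.exp(1j * np.angle(np.fft.ifft2(hologram)))
          return np.angle(hologram)

      target = np.zeros((64, 64)); target[24:40, 24:40] = 1.0        # primitive image
      cgh_phase = gerchberg_saxton(target)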

  4. A programmable two-qubit quantum processor in silicon.

    Science.gov (United States)

    Watson, T F; Philips, S G J; Kawakami, E; Ward, D R; Scarlino, P; Veldhorst, M; Savage, D E; Lagally, M G; Friesen, Mark; Coppersmith, S N; Eriksson, M A; Vandersypen, L M K

    2018-03-29

    Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch-Jozsa algorithm and the Grover search algorithm, canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85-89 per cent and concurrences of 73-82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.

  5. A programmable two-qubit quantum processor in silicon

    Science.gov (United States)

    Watson, T. F.; Philips, S. G. J.; Kawakami, E.; Ward, D. R.; Scarlino, P.; Veldhorst, M.; Savage, D. E.; Lagally, M. G.; Friesen, Mark; Coppersmith, S. N.; Eriksson, M. A.; Vandersypen, L. M. K.

    2018-03-01

    Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch–Jozsa algorithm and the Grover search algorithm—canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85–89 per cent and concurrences of 73–82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.

  6. A High-Speed and Low-Energy-Consumption Processor for SVD-MIMO-OFDM Systems

    Directory of Open Access Journals (Sweden)

    Hiroki Iwaizumi

    2013-01-01

    Full Text Available A processor design for singular value decomposition (SVD and compression/decompression of feedback matrices, which are mandatory operations for SVD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM systems, is proposed and evaluated. SVD-MIMO is a transmission method for suppressing multistream interference and improving communication quality by beamforming. An application specific instruction-set processor (ASIP architecture is adopted to achieve flexibility in terms of operations and matrix size. The proposed processor realizes a high-speed/low-power design and real-time processing by the parallelization of floating-point units (FPUs and arithmetic instructions specialized in complex matrix operations.

  7. Visualization of unsteady computational fluid dynamics

    Science.gov (United States)

    Haimes, Robert

    1994-11-01

    A brief summary of the computer environment used for calculating three-dimensional unsteady Computational Fluid Dynamics (CFD) results is presented. Supercomputers as well as massively parallel processors (MPPs) and clusters of workstations acting as a single MPP (by concurrently working on the same task) provide the required computational bandwidth for CFD calculations of transient problems. The cluster of reduced instruction set computer (RISC) workstations is a recent development based on the low cost and high performance that workstation vendors provide. The cluster, with the proper software, can act as a multiple instruction/multiple data (MIMD) machine. A new set of software tools is being designed specifically to address the visualization of 3D unsteady CFD results in these environments. Three user manuals for the parallel version of Visual3, pV3, revision 1.00, make up the bulk of this report.

  8. Front end data link processor

    International Nuclear Information System (INIS)

    Wallace, J.J.

    1988-01-01

    It is possible to expand the data acquisition capabilities of an existing process computer to include other dedicated computer-based systems, provided each system has at least minimal data link capabilities. The following paper discusses the addition of three computer-based acquisition systems to a Honeywell 4500C (also designated the 45000) running the SEER system. Only one data link port was required to support the link. Each of the three specialized systems implemented data link protocols used by their suppliers in previous projects; none of the three were compatible with Honeywell's protocol. Part one of the following provides a generic overview of the project and would be relevant to the operator of any process system interested in expansion. Part two provides specific details of this project and may serve to provide performance benchmarks to those who wish to consider a similar project

  9. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel

    2013-10-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.

  10. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel; Hasanov, Khalid; Lastovetsky, Alexey

    2013-01-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.

  11. Computation of Effect Size for Moderating Effects of Categorical Variables in Multiple Regression

    Science.gov (United States)

    Aguinis, Herman; Pierce, Charles A.

    2006-01-01

    The computation and reporting of effect size estimates is becoming the norm in many journals in psychology and related disciplines. Despite the increased importance of effect sizes, researchers may not report them or may report inaccurate values because of a lack of appropriate computational tools. For instance, Pierce, Block, and Aguinis (2004)…

  12. Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

    Science.gov (United States)

    Downie, John D.

    1990-01-01

    A ground-based adaptive optics imaging telescope system attempts to improve image quality by detecting and correcting for atmospherically induced wavefront aberrations. The required control computations during each cycle take a finite amount of time. Longer time delays result in larger values of residual wavefront error variance, since the atmosphere continues to change during that time; a processor that completes the correction faster therefore leaves less residual error. Thus an optical processor may be well-suited for this task. This paper presents a study of the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for the adaptive optics application. An optimization of the adaptive optics correction algorithm with respect to an optical processor's degree of accuracy is also briefly discussed.

  13. Parallelising a molecular dynamics algorithm on a multi-processor workstation

    Science.gov (United States)

    Müller-Plathe, Florian

    1990-12-01

    The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transferred in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
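
    The neighbour-list idea itself is brief: every few steps, build a list of pairs closer than r_cut plus a skin, then evaluate forces or energies only over the listed pairs until particles have drifted too far. A serial NumPy sketch of that core; the master-slave parallel split described above is not shown.

      import numpy as np

      def build_neighbour_list(pos, r_cut, skin):
          d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
          i, j = np.where(np.triu(d < r_cut + skin, k=1))
          return list(zip(i, j))            # reusable until drift exceeds skin / 2

      def lj_energy(pos, pairs, r_cut):
          energy = 0.0
          for i, j in pairs:                # only listed candidate pairs are visited
              r = np.linalg.norm(pos[i] - pos[j])
              if r < r_cut:
                  energy += 4.0 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)
          return energy

      pos = np.random.default_rng(3).uniform(0.0, 6.0, size=(64, 3))
      pairs = build_neighbour_list(pos, r_cut=2.5, skin=0.3)
      print(len(pairs), lj_energy(pos, pairs, r_cut=2.5))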

  14. Real-time simulation of MHD/steam power plants by digital parallel processors

    International Nuclear Information System (INIS)

    Johnson, R.M.; Rudberg, D.A.

    1981-01-01

    Attention is given to a large FORTRAN-coded program which simulates the dynamic response of the MHD/steam plant on either a SEL 32/55 or VAX 11/780 computer. The code realizes a detailed first-principles model of the plant. Quite recently, in addition to the VAX 11/780, an AD-10 has been installed for use as a real-time simulation facility. The parallel-processor AD-10 is capable of simulating the MHD/steam plant at several times real-time rates. This is desirable in order to rapidly develop a large data base of varied plant operating conditions. The combined-cycle MHD/steam plant model is discussed, taking into account a number of disadvantages. The disadvantages can be overcome with the aid of an array processor used as an adjunct to the unit processor. The conversion of some computations for real-time simulation is considered

  15. Simulation of Particulate Flows Multi-Processor Machines with Distributed Memory

    Energy Technology Data Exchange (ETDEWEB)

    Uhlmann, M.

    2004-07-01

    We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI communication standard. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized and relies upon a parallel tri-diagonal solver; the Poisson problem is solved by means of a parallel multi-grid technique similar to MUDPACK. For the solid phase we employ a master-slave technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub-domain. The parallel efficiency for some preliminary computations is presented. (Author) 9 refs.

  16. VLSI Design of a Variable-Length FFT/IFFT Processor for OFDM-Based Communication Systems

    Directory of Open Access Journals (Sweden)

    Jen-Chih Kuo

    2003-12-01

    Full Text Available The technique of orthogonal frequency division multiplexing (OFDM) is famous for its robustness against frequency-selective fading channels. This technique has been widely used in many wired and wireless communication systems. In general, the fast Fourier transform (FFT) and inverse FFT (IFFT) operations are used as the modulation/demodulation kernel in OFDM systems, and the sizes of the FFT/IFFT operations vary across different applications of OFDM systems. In this paper, we design and implement a variable-length prototype FFT/IFFT processor to cover different specifications of OFDM applications. The cached-memory FFT architecture is our suggested VLSI system architecture for designing the prototype FFT/IFFT processor with low power consumption in mind. We also implement the twiddle-factor butterfly processing element (PE) based on the coordinate rotation digital computer (CORDIC) algorithm, which avoids the use of a conventional multiplication-and-accumulation unit and instead evaluates the trigonometric functions using only add-and-shift operations. Finally, we implement a variable-length prototype FFT/IFFT processor with TSMC 0.35 μm 1P4M CMOS technology. The simulation results show that the chip can perform 64- to 2048-point FFT/IFFT operations up to an 80 MHz operating frequency, which meets the speed requirement of most OFDM standards such as WLAN, ADSL, VDSL (256∼2K), DAB, and 2K-mode DVB.
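
    The CORDIC principle behind the twiddle-factor PE is worth stating concretely: a rotation by an arbitrary angle is decomposed into fixed micro-rotations whose tangents are powers of two, so each stage needs only shifts and adds, plus one constant scaling at the end. A floating-point Python sketch; the chip itself would use fixed-point shift-add hardware.

      import math

      ANGLES = [math.atan(2.0 ** -i) for i in range(16)]
      GAIN = math.prod(math.cos(a) for a in ANGLES)      # constant scale factor

      def cordic_rotate(x, y, angle):
          for i, step in enumerate(ANGLES):
              d = 1.0 if angle >= 0.0 else -1.0
              x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i   # shift-add only
              angle -= d * step
          return x * GAIN, y * GAIN

      print(cordic_rotate(1.0, 0.0, math.pi / 5))        # ~ (cos 36 deg, sin 36 deg)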

  17. High-Speed Computation of the Kleene Star in Max-Plus Algebraic System Using a Cell Broadband Engine

    Science.gov (United States)

    Goto, Hiroyuki

    This research addresses a high-speed computation method for the Kleene star of the weighted adjacency matrix in a max-plus algebraic system. We focus on systems whose precedence constraints are represented by a directed acyclic graph and implement it on a Cell Broadband Engine™ (CBE) processor. Since the resulting matrix gives the longest travel times between two adjacent nodes, it is often utilized in scheduling problem solvers for a class of discrete event systems. This research, in particular, attempts to achieve a speedup by using two approaches: parallelization and SIMDization (Single Instruction, Multiple Data), both of which can be accomplished by a CBE processor. The former refers to a parallel computation using multiple cores, while the latter is a method whereby multiple elements are computed by a single instruction. Using the implementation on a Sony PlayStation 3™ equipped with a CBE processor, we found that the SIMDization is effective regardless of the system's size and the number of processor cores used. We also found that the scalability of using multiple cores is remarkable especially for systems with a large number of nodes. In a numerical experiment where the number of nodes is 2000, we achieved a speedup of 20 times compared with the method without the above techniques.
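
    The computation itself is short to state in max-plus terms: with addition replaced by max and multiplication by +, the Kleene star A* = I (+) A (+) A^2 (+) ... of the weighted adjacency matrix of a DAG gives the longest travel time between every pair of nodes. A NumPy sketch in which broadcasting plays the role of the CBE's SIMD lanes; the multi-core parallelization is not shown.

      import numpy as np

      def maxplus_matmul(a, b):
          # (a (x) b)[i, j] = max_k (a[i, k] + b[k, j]), vectorised over k.
          return (a[:, :, None] + b[None, :, :]).max(axis=1)

      def kleene_star(a):
          n = a.shape[0]
          eye = np.full((n, n), -np.inf); np.fill_diagonal(eye, 0.0)
          star, power = eye, eye
          for _ in range(n - 1):            # a path in a DAG has fewer than n arcs
              power = maxplus_matmul(power, a)
              star = np.maximum(star, power)
          return star

      a = np.full((3, 3), -np.inf)
      a[0, 1], a[1, 2], a[0, 2] = 2.0, 3.0, 4.0   # weighted adjacency of a DAG
      print(kleene_star(a))                       # entry [0, 2] is max(4, 2 + 3) = 5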

  18. An Evaluation of an Ada Implementation of the Rete Algorithm for Embedded Flight Processors

    Science.gov (United States)

    1990-12-01

    computers was desired. The VAX VMS operating system has many built-in methods for determining program performance (including VAX PCA), but these methods... An overview of the target environment -- the MIL-STD-1750A VHSIC Avionic Modular Processor (VAMP), running under the Ada Avionics Real-Time Software (AARTS)... computers. MIL-STD-1750A, the Air Force's standard flight computer architecture, however, places severe constraints on applications software processing

  19. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    Science.gov (United States)

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing data has resulted in a shortage of efficient ultra-large biological sequence alignment approaches capable of coping with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g. files of more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on DNA and protein data sets larger than 1 GB showed that HAlign-II saves time and space and outperforms current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences; it shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II also provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II, with open-source code and datasets, is available at http://lab.malab.cn/soft/halign

  20. MUSIDH, multiple use of simulated demographic histories, a novel method to reduce computation time in microsimulation models of infectious diseases.

    Science.gov (United States)

    Fischer, E A J; De Vlas, S J; Richardus, J H; Habbema, J D F

    2008-09-01

    Microsimulation of infectious diseases requires simulating many life histories of interacting individuals. In particular, relatively rare infections such as leprosy need to be studied in very large populations. Computation time increases disproportionately with the size of the simulated population. We present a novel method, MUSIDH, an acronym for multiple use of simulated demographic histories, to reduce computation time. Demographic history refers to the processes of birth, death and all other demographic events that should be unrelated to the natural course of an infection (i.e., non-fatal infections). MUSIDH attaches a fixed number of infection histories to each demographic history, and these infection histories interact as if they were the infection histories of separate individuals. With two examples, mumps and leprosy, we show that the method can give a factor-50 reduction in computation time at the cost of a small loss in precision. The largest reductions are obtained for rare infections with complex demographic histories.
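
    The core idea, reusing one demographic history for many independent infection histories, fits in a few lines. The following Python sketch uses invented lifespan and infection-hazard models purely to show the structure; it is not the mumps or leprosy model from the paper.

      import random

      def simulate_demography(rng):
          # One expensive demographic history: birth and death times.
          birth = rng.uniform(0, 50)
          death = birth + rng.expovariate(1 / 60)   # hypothetical lifespan model
          return birth, death

      def simulate_infection(birth, death, rng, hazard=0.01):
          # One cheap, non-fatal infection history over the same lifespan.
          t = birth + rng.expovariate(hazard)
          return t if t < death else None           # infection time, or never

      rng = random.Random(1)
      histories_per_demography = 50                 # the "multiple use" factor
      for _ in range(1000):                         # 1000 demographic histories
          birth, death = simulate_demography(rng)
          infections = [simulate_infection(birth, death, rng)
                        for _ in range(histories_per_demography)]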

  1. Control structures for high speed processors

    Science.gov (United States)

    Maki, G. K.; Mankin, R.; Owsley, P. A.; Kim, G. M.

    1982-01-01

    A special processor was designed to function as a Reed-Solomon decoder with a throughput data rate in the MHz range. This data rate is significantly greater than is possible with conventional digital architectures. To achieve this rate, the processor design includes sequential, pipelined, distributed, and parallel processing. The processor was designed using a high-level register transfer language (RTL). The RTL can be used to describe how the different processes are implemented by the hardware. One problem of special interest was the development of dependent processes, which are analogous to software subroutines. For greater flexibility, the RTL control structure was implemented in ROM. The special-purpose hardware required approximately 1000 SSI and MSI components. The data rate throughput is 2.5 megabits/second, achieved through the use of pipelined and distributed processing. This can be compared with 800 kilobits/second in a recently proposed very-large-scale integration design of a Reed-Solomon encoder.

  2. Real time processor for array speckle interferometry

    Science.gov (United States)

    Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

    1989-02-01

    The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
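
    The per-frame pipeline is easy to state in NumPy. The sketch below shows the three steps (flat-fielding, 64 x 64 complex FFT, power-spectrum accumulation) that the hardware must finish within the 25 ms coherence time; the frames here are synthetic.

      import numpy as np

      def process_frames(frames, flat):
          # frames: (n, 64, 64) raw frames; flat: (64, 64) flat-field response
          power_sum = np.zeros((64, 64))
          for frame in frames:
              corrected = frame / flat              # flat-fielding
              spectrum = np.fft.fft2(corrected)     # 64x64 complex FFT
              power_sum += np.abs(spectrum) ** 2    # accumulate power spectrum
          return power_sum / len(frames)            # averaged power spectrum

      rng = np.random.default_rng(0)
      frames = rng.poisson(100, (32, 64, 64)).astype(float)   # synthetic frames
      avg_power = process_frames(frames, np.full((64, 64), 1.0))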

  3. Development methods for VLSI-processors

    International Nuclear Information System (INIS)

    Horninger, K.; Sandweg, G.

    1982-01-01

    The aim of this project, which was originally planned for 3 years, was the development of modern system and circuit concepts for VLSI processors having a 32-bit-wide data path. The result of this first year's work is the concept of a general-purpose processor. This processor is not only logically but also physically (on the chip) divided into four functional units: a microprogrammable instruction unit, an execution unit in slice technique, a fully associative cache memory and an I/O unit. For the ALU of the execution unit, circuits in PLA and slice techniques have been realized. On the basis of regularity, area consumption and achievable performance, the slice technique has been preferred. The designs utilize self-testing circuitry. (orig.) [de

  4. The ATLAS Level-1 Muon to Central Trigger Processor Interface

    CERN Document Server

    Berge, D; Farthouat, P; Haas, S; Klofver, P; Krasznahorkay, A; Messina, A; Pauly, T; Schuler, G; Spiwoks, R; Wengler, T; PH-EP

    2007-01-01

    The Muon to Central Trigger Processor Interface (MUCTPI) is part of the ATLAS Level-1 trigger system and connects the output of the muon trigger system to the Central Trigger Processor (CTP). At every bunch crossing (BC), the MUCTPI receives information on muon candidates from each of the 208 muon trigger sectors and calculates the total multiplicity for each of six transverse-momentum (pT) thresholds. This multiplicity value is then sent to the CTP, where it is used together with the input from the calorimeter trigger to make the final Level-1 Accept (L1A) decision. In addition, the MUCTPI provides summary information to the Level-2 trigger and to the data acquisition (DAQ) system for events selected at Level-1. This information is used to define the regions of interest (RoIs) that drive the Level-2 muon trigger processing. The MUCTPI system consists of a 9U VME chassis with a dedicated active backplane and 18 custom-designed modules. The design of the modules is based on state-of-the-art FPGA devices and special ...
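
    The multiplicity calculation itself is straightforward to sketch. In the Python sketch below, the six pT threshold values and the 3-bit saturating counters are assumptions for illustration; the real MUCTPI derives candidates from the 208 sector inputs in FPGA logic.

      def multiplicities(candidate_pts, thresholds=(4, 6, 10, 11, 15, 20)):
          # Count muon candidates at or above each pT threshold (GeV assumed).
          counts = []
          for thr in thresholds:
              n = sum(1 for pt in candidate_pts if pt >= thr)
              counts.append(min(n, 7))   # assume a 3-bit, saturating counter
          return counts

      print(multiplicities([5.2, 12.0, 21.5, 4.1]))   # -> [4, 2, 2, 2, 1, 1]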

  5. The computation of multiple MHD equilibria in axisymmetric and straight geometry

    International Nuclear Information System (INIS)

    Thomas, C.Ll.

    1979-01-01

    The details of the numerical methods used in codes for computing MHD equilibria in discrete conductor configurations are described with both code users and code writers in mind. Results produced by the codes have been successfully verified against analytic results and independent computations. The axisymmetric code has proved to be a valuable diagnostic aid for the TOSCA experiment. The user images of the codes are described in the appendices. (author)

  6. A Representational Approach to Knowledge and Multiple Skill Levels for Broad Classes of Computer Generated Forces

    Science.gov (United States)

    1997-12-01

    that I’ll turn my attention to that computer game we’ve talked so much about... Dave Van Veldhuizen and Scott Brown (soon-to-be Drs. Van Veldhuizen and...Industry Training Systems Conference. 1988. 37. Van Veldhuizen , D. A. and L. J Hutson. "A Design Methodology for Domain Inde- pendent Computer...proposed by Van Veld- huizen and Hutson (37), extends the general architecture to support both a domain- independent approach to implementing CGFs and

  7. Reducing adaptive optics latency using Xeon Phi many-core processors

    Science.gov (United States)

    Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah

    2015-11-01

    The next generation of Extremely Large Telescopes (ELTs) for astronomy will rely heavily on the performance of their adaptive optics (AO) systems. Real-time control is at the heart of the critical technologies that will enable telescopes to deliver the best possible science and will require a very significant extrapolation from current AO hardware existing for 4-10 m telescopes. Investigating novel real-time computing architectures and testing their eligibility against anticipated challenges is one of the main priorities of technology development for the ELTs. This paper investigates the suitability of the Intel Xeon Phi, which is a commercial off-the-shelf hardware accelerator. We focus on wavefront reconstruction performance, implementing a straightforward matrix-vector multiplication (MVM) algorithm. We present benchmarking results of the Xeon Phi on a real-time Linux platform, both as a standalone processor and integrated into an existing real-time controller (RTC). Performance of single and multiple Xeon Phis are investigated. We show that this technology has the potential of greatly reducing the mean latency and variations in execution time (jitter) of large AO systems. We present both a detailed performance analysis of the Xeon Phi for a typical E-ELT first-light instrument along with a more general approach that enables us to extend to any AO system size. We show that systematic and detailed performance analysis is an essential part of testing novel real-time control hardware to guarantee optimal science results.
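
    Since the paper's kernel is a single matrix-vector multiplication (MVM), a toy benchmark conveys what is being measured. The NumPy sketch below times repeated MVMs and reports mean latency and jitter; the matrix dimensions are invented stand-ins for an ELT-scale reconstructor, and NumPy on a CPU stands in for the Xeon Phi.

      import time
      import numpy as np

      n_slopes, n_actuators = 9232, 5316        # hypothetical ELT-scale sizes
      R = np.random.rand(n_actuators, n_slopes).astype(np.float32)
      s = np.random.rand(n_slopes).astype(np.float32)

      latencies = []
      for _ in range(100):
          t0 = time.perf_counter()
          a = R @ s                             # actuator commands = R * slopes
          latencies.append(time.perf_counter() - t0)

      print(f"mean {np.mean(latencies) * 1e3:.2f} ms, "
            f"jitter (std) {np.std(latencies) * 1e3:.3f} ms")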

  8. High-performance computing — an overview

    Science.gov (United States)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  9. Software-defined reconfigurable microwave photonics processor.

    Science.gov (United States)

    Pérez, Daniel; Gasulla, Ivana; Capmany, José

    2015-06-01

    We propose, for the first time to our knowledge, a software-defined reconfigurable microwave photonics signal processor architecture that can be integrated on a chip and is capable of performing all the main functionalities by suitable programming of its control signals. The basic configuration is presented and a thorough end-to-end design model derived that accounts for the performance of the overall processor taking into consideration the impact and interdependencies of both its photonic and RF parts. We demonstrate the model versatility by applying it to several relevant application examples.

  10. Parallel processor for fast event analysis

    International Nuclear Information System (INIS)

    Hensley, D.C.

    1983-01-01

    Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed in which up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 microsequencer, the Am29116 μP, and the Am29517 multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system.

  11. Time Manager Software for a Flight Processor

    Science.gov (United States)

    Zoerne, Roger

    2012-01-01

    Data analysis is a process of inspecting, cleaning, transforming, and modeling data to highlight useful information and suggest conclusions. Accurate timestamps and a timeline of vehicle events are needed to analyze flight data. By moving the timekeeping to the flight processor, there is no longer a need for a redundant time source. If each flight processor is initially synchronized to GPS, they can freewheel and maintain fairly accurate time throughout the flight with no additional GPS time messages received. However, additional GPS time messages will ensure even greater accuracy. When a timestamp is required, a gettime function is called that immediately reads the time-base register.
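
    A minimal Python sketch of that scheme, assuming a GPS fix pairs an absolute time with the local time base, after which timestamps freewheel from the time-base register (emulated here with a monotonic clock):

      import time

      class TimeManager:
          def __init__(self, gps_epoch_seconds):
              self.sync(gps_epoch_seconds)

          def sync(self, gps_epoch_seconds):
              # (Re)synchronize: pair a GPS time with the local time base.
              self._gps_ref = gps_epoch_seconds
              self._local_ref = time.monotonic()   # stand-in for the time-base register

          def gettime(self):
              # Freewheel: GPS reference plus elapsed local time.
              return self._gps_ref + (time.monotonic() - self._local_ref)

      tm = TimeManager(gps_epoch_seconds=1_400_000_000.0)
      stamp = tm.gettime()   # timestamp for a vehicle event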

  12. Temporal analysis and scheduling of hard real-time radios running on a multi-processor

    NARCIS (Netherlands)

    Moreira, O.

    2012-01-01

    On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor, while meeting each its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for

  13. Some questions of using the algebraic coding theory for construction of special-purpose processors in high energy physics spectrometers

    International Nuclear Information System (INIS)

    Nikityuk, N.M.

    1989-01-01

    The results of investigations into using algebraic coding theory for the creation of parallel encoders, majority coincidence schemes and coordinate processors for the first and second trigger levels are described. Concrete examples of the calculation and structure of a special-purpose processor using the table arithmetic method are given for multiplicity t ≤ 5. The question of using parallel and sequential syndrome coding methods for the registration of events with clusters is discussed. 30 refs.; 10 figs

  14. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2012-08-01

    High performance is a critical requirement for all microprocessor manufacturers. This paper compares the performance of two main Intel Xeon series (Type A: Intel Xeon X5260, X5460, E5450 and L5320; Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors builds on the family of Intel processors that began with the Pentium 4, and these processors can provide a performance boost for many key application areas in the modern generation. The scaling of performance across the two series has been analyzed using the performance numbers of 12 SPEC CPU2006 integer benchmarks, which exhibit significant differences in performance. The results and analysis can be used by performance engineers, scientists and developers to better understand performance scaling in modern-generation processors.

  15. Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

    Science.gov (United States)

    Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G

    2011-07-01

    In this paper we describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple graphics processing units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block-matching algorithm (BMA) uses the summed absolute difference (SAD) error criterion and full grid search (FS) for finding the optimal block displacement. In this evaluation we compared the execution time of GPU and CPU implementations for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 for an integer search grid and 1000 for a non-integer search grid. The additional speedup for the non-integer search grid comes from the fact that the GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data-splitting method across multiple cards, but an almost linear speedup with the number of cards is achievable. In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized non-full-grid-search CPU-based motion estimation methods, namely the implementation of the pyramidal Lucas-Kanade optical flow algorithm in OpenCV and the simplified unsymmetrical multi-hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed modest improvement even though the computational complexity of the FS GPU implementation is substantially higher than that of the non-FS CPU implementations. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames per second using two NVIDIA C1060 Tesla GPU cards.
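
    The kernel being accelerated is compact. Below is a plain NumPy reference of full-search SAD matching for one block, assuming a square block and a symmetric integer search range; the GPU implementation parallelizes exactly this loop nest over blocks and displacements.

      import numpy as np

      def full_search(block, window, search=8):
          # block: (B, B) block from the current frame; window: the
          # (B + 2*search)^2 search region around it in the reference frame.
          B = block.shape[0]
          best_sad, best_mv = np.inf, (0, 0)
          for dy in range(2 * search + 1):
              for dx in range(2 * search + 1):
                  cand = window[dy:dy + B, dx:dx + B].astype(int)
                  sad = np.abs(block.astype(int) - cand).sum()
                  if sad < best_sad:
                      best_sad, best_mv = sad, (dy - search, dx - search)
          return best_mv, best_sad

      rng = np.random.default_rng(0)
      ref = rng.integers(0, 256, (32, 32), dtype=np.uint8)
      blk = ref[12:20, 12:20]                      # 8x8 block, true shift (0, 0)
      window = ref[4:28, 4:28]                     # 8 + 2*8 = 24-pixel window
      print(full_search(blk, window, search=8))    # -> ((0, 0), 0)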

  16. Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture

    Science.gov (United States)

    Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

    2014-02-11

    Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
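
    The sequence of primitives described above (vector load, complex load-and-splat, cross multiply-add) can be emulated on interleaved real/imaginary lanes to check the arithmetic. The NumPy sketch below illustrates the data layout only; it is not the patented hardware mechanism.

      import numpy as np

      def complex_madd_interleaved(acc, a, b_scalar):
          # a: interleaved lanes [re0, im0, re1, im1, ...];
          # b_scalar: one complex value (re, im), "splatted" to every element.
          br, bi = b_scalar
          ar, ai = a[0::2], a[1::2]
          out = acc.copy()
          out[0::2] += ar * br - ai * bi      # real lanes of the partial product
          out[1::2] += ar * bi + ai * br      # imaginary lanes (cross terms)
          return out

      a = np.array([1.0, 2.0, 3.0, -1.0])     # two complex values: 1+2j, 3-1j
      acc = np.zeros(4)                       # accumulator of partial products
      acc = complex_madd_interleaved(acc, a, (2.0, 0.5))   # multiply by 2+0.5j
      print(acc)                              # [1.  4.5 6.5 0.5]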

  17. Special purpose processors for high energy physics applications

    International Nuclear Information System (INIS)

    Verkerk, C.

    1978-01-01

    Hardware processors are reviewed, ranging from very fast decision logic for the split-field magnet facility at CERN to a point-finding processor used to relieve the data-acquisition minicomputer of the task of monitoring the SPS experiment. Block diagrams of a decision-making processor, a point-finding processor, a coplanarity and opening-angle processor and a programmable track selector module are presented and discussed. The applications of fully programmable but slower processors on the one hand, and very fast and programmable decision logic on the other, are covered in this review.

  18. Automation of ORIGEN2 calculations for the transuranic waste baseline inventory database using a pre-processor and a post-processor

    International Nuclear Information System (INIS)

    Liscum-Powell, J.

    1997-06-01

    The purpose of the work described in this report was to automate ORIGEN2 calculations for the Waste Isolation Pilot Plant (WIPP) Transuranic Waste Baseline Inventory Database (WTWBID); this was done by developing a pre-processor to generate ORIGEN2 input files from WTWBID inventory files and a post-processor to remove excess information from the ORIGEN2 output files. The calculations performed with ORIGEN2 estimate the radioactive decay and buildup of various radionuclides in the waste streams identified in the WTWBID. The resulting radionuclide inventories are needed for performance assessment calculations for the WIPP site. The work resulted in the development of PreORG, which requires interaction with the user to generate ORIGEN2 input files on a site-by-site basis, and PostORG, which processes ORIGEN2 output into more manageable files. Both programs are written in the FORTRAN 77 computer language. After running PreORG, the user runs ORIGEN2 to generate the desired data; upon completion of the ORIGEN2 calculations, the user can run PostORG to make the output more manageable. All the programs run on a 386 PC or higher with a math co-processor, or on a computer platform running under the VMS operating system. The pre- and post-processors for ORIGEN2 were generated for use with Rev. 1 data of the WTWBID and can also be used with Rev. 2 and 3 data of the TWBID (Transuranic Waste Baseline Inventory Database)
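
    PostORG's job, trimming bulky ORIGEN2 output down to the rows of interest, can be pictured with a short filter. The sketch below is in Python for brevity (the report's tools are FORTRAN 77), and the column layout and nuclide list are invented for illustration; real ORIGEN2 output is formatted differently.

      # Keep only output lines whose first field names a nuclide of interest.
      NUCLIDES_OF_INTEREST = {"PU239", "PU240", "AM241", "U233", "U234"}

      def post_process(in_path, out_path):
          with open(in_path) as src, open(out_path, "w") as dst:
              for line in src:
                  fields = line.split()
                  if fields and fields[0].upper() in NUCLIDES_OF_INTEREST:
                      dst.write(line)

      post_process("origen2.out", "inventory_trimmed.out")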

  19. On-line trigger processor in PETRA/DORIS experiments at DESY

    CERN Document Server

    ölschläger, R

    1981-01-01

    Data on on-line trigger processors, presented at a poster session, are given. Brief details of the trigger processors at the CELLO, TASSO and ARGUS detectors are shown, including: general working method; IC technology; power consumption; logic elements for trigger decision; number of chambers; number of input wires; execution time; parameter variation; links to host computer; cost; test features. (0 refs).

  20. Multi-processor system for real-time flow estimation in medical ultrasound imaging

    DEFF Research Database (Denmark)

    Stetson, Paul F.; Jensen, Jesper Lomborg; Antonius, Peter

    1997-01-01

    the processed data. The generous bandwidth of the links makes it easy to balance the computational load among the processors.In order to manage the shared system memory and to make use of the parallel processing capabilities of the system, a real-time multitasking kernel has been developed. The kernel uses...

  1. Design of massively parallel hardware multi-processors for highly-demanding embedded applications

    NARCIS (Netherlands)

    Jozwiak, L.; Jan, Y.

    2013-01-01

    Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor system-on-a-chip

  2. Implementation of CT and IHT Processors for Invariant Object Recognition System

    Directory of Open Access Journals (Sweden)

    J. Turan jr.

    2004-12-01

    Full Text Available This paper presents PDL or ASIC implementation of key modules ofinvariant object recognition system based on the combination of theIncremental Hough transform (IHT, correlation and rapid transform(RT. The invariant object recognition system was represented partiallyin C++ language for general-purpose processor on personal computer andpartially described in VHDL code for implementation in PLD or ASIC.

  3. Scaling-up spatially-explicit ecological models using graphics processors

    NARCIS (Netherlands)

    Koppel, Johan van de; Gupta, Rohit; Vuik, Cornelis

    2011-01-01

    How the properties of ecosystems relate to spatial scale is a prominent topic in current ecosystem research. Despite this, spatially explicit models typically include only a limited range of spatial scales, mostly because of computing limitations. Here, we describe the use of graphics processors to

  4. Computer-generated holograms by multiple wavefront recording plane method with occlusion culling.

    Science.gov (United States)

    Symeonidou, Athanasia; Blinder, David; Munteanu, Adrian; Schelkens, Peter

    2015-08-24

    We propose a novel fast method for full-parallax computer-generated holograms with occlusion processing, suitable for volumetric data such as point clouds. A novel light-wave propagation strategy relying on the sequential use of the wavefront recording plane method is proposed, which employs look-up tables in order to reduce the computational complexity in the calculation of the fields. Also, a novel technique for occlusion culling with little additional computation cost is introduced. Additionally, the method applies a Gaussian distribution to the individual points in order to improve visual quality. Performance tests show that for a full-parallax high-definition CGH a speedup factor of more than 2,500 compared to the ray-tracing method can be achieved without hardware acceleration.

  5. Interplay of multiple synaptic plasticity features in filamentary memristive devices for neuromorphic computing

    Science.gov (United States)

    La Barbera, Selina; Vincent, Adrien F.; Vuillaume, Dominique; Querlioz, Damien; Alibart, Fabien

    2016-12-01

    Bio-inspired computing represents today a major challenge at different levels, ranging from material science for the design of innovative devices and circuits to computer science for the understanding of the key features required for processing natural data. In this paper, we propose a detailed analysis of resistive switching dynamics in electrochemical metallization cells for synaptic plasticity implementation. We show how the filament stability associated with the Joule effect during switching can be used to emulate key synaptic features such as the short-term to long-term plasticity transition and spike-timing-dependent plasticity. Furthermore, an interplay between these different synaptic features is demonstrated for object motion detection in a spike-based neuromorphic circuit. System-level simulation shows robust learning and promising synaptic operation, paving the way to complex bio-inspired computing systems composed of innovative memory devices.

  6. Implementation schemes in NMR of quantum processors and the Deutsch-Jozsa algorithm by using virtual spin representation

    International Nuclear Information System (INIS)

    Kessel, Alexander R.; Yakovleva, Natalia M.

    2002-01-01

    Schemes for the experimental realization of the main two-qubit processors for quantum computers and the Deutsch-Jozsa algorithm are derived in virtual spin representation. The results are applicable to any four quantum states that allow the required properties for quantum processor implementation, provided virtual spin representation is used for qubit encoding. A four-dimensional Hilbert space of nuclear spin 3/2 is considered in detail for this aim.

  7. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    Science.gov (United States)

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
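
    As a worked illustration of the simple-slopes technique these tools automate, the Python sketch below fits an MLR model with an interaction term on simulated data and evaluates the slope of the focal predictor at the moderator's mean and one SD on either side; it is a bare-bones stand-in, not the authors' software.

      import numpy as np

      rng = np.random.default_rng(0)
      n = 200
      x, z = rng.normal(size=n), rng.normal(size=n)
      y = 1 + 0.5 * x + 0.3 * z + 0.4 * x * z + rng.normal(size=n)

      # Fit y = b0 + b1*x + b2*z + b3*x*z by ordinary least squares.
      X = np.column_stack([np.ones(n), x, z, x * z])
      b, *_ = np.linalg.lstsq(X, y, rcond=None)

      # Simple slope of x at a moderator value zv is b1 + b3*zv.
      for zv in (z.mean() - z.std(), z.mean(), z.mean() + z.std()):
          print(f"slope of x at z={zv:+.2f}: {b[1] + b[3] * zv:.3f}")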

  8. Effects of playing mathematics computer games on primary school students' multiplicative reasoning ability

    NARCIS (Netherlands)

    Bakker, Marjoke; Van den Heuvel-Panhuizen, M.; Robitzsch, Alexander

    2015-01-01

    This study used a large-scale cluster randomized longitudinal experiment (N = 719; 35 schools) to investigate the effects of online mathematics mini-games on primary school students' multiplicative reasoning ability. The experiment included four conditions: playing at school, integrated in a lesson

  9. An Alternative Water Processor for Long Duration Space Missions

    Science.gov (United States)

    Barta, Daniel J.; Pickering, Karen D.; Meyer, Caitlin; Pennsinger, Stuart; Vega, Leticia; Flynn, Michael; Jackson, Andrew; Wheeler, Raymond

    2014-01-01

    A new wastewater recovery system has been developed that combines novel biological and physicochemical components for recycling wastewater on long-duration human space missions. Functionally, this Alternative Water Processor (AWP) would replace the Urine Processing Assembly on the International Space Station and reduce or eliminate the need for the multi-filtration beds of the Water Processing Assembly (WPA). At its center are two unique game-changing technologies: 1) a biological water processor (BWP) to mineralize organic forms of carbon and nitrogen and 2) an advanced membrane processor (Forward Osmosis Secondary Treatment) for removal of solids and inorganic ions. The AWP is designed for recycling larger quantities of wastewater from multiple sources expected during future exploration missions, including urine, hygiene (hand wash, shower, oral and shave) and laundry. The BWP utilizes a single-stage membrane-aerated biological reactor for simultaneous nitrification and denitrification. The Forward Osmosis Secondary Treatment (FOST) system uses a combination of forward osmosis (FO) and reverse osmosis (RO), is resistant to biofouling and can easily tolerate wastewaters high in non-volatile organics and solids associated with shower and/or hand washing. The BWP has been operated continuously for over 300 days. After startup, the mature biological system averaged 85% organic carbon removal and 44% nitrogen removal, close to the stoichiometric maximum based on available carbon. To date, the FOST has averaged 93% water recovery, with a maximum of 98%. If the wastewater is slightly acidified, ammonia rejection is optimal. This paper will provide a description of the technology and summarize results from ground-based testing using real wastewater.

  10. Directions in parallel processor architecture, and GPUs too

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.

  11. The breaking point of modern processor and platform technology

    CERN Document Server

    Nowak, A; Lazzaro, A; Leduc, J

    2011-01-01

    This work is an overview of state of the art processors used in High Energy Physics, their architecture and an extensive outline of the forthcoming technologies. Silicon process science and hardware design are making constant and rapid progress, and a solid grasp of these developments is imperative to the understanding of their possible future applications, which might include software strategy, optimizations, computing center operations and hardware acquisitions. In particular, the current issue of software and platform scalability is becoming more and more noticeable, and will develop in the near future with the growing core count of single chips and the approach of certain x86 architectural limits. Other topics brought forward include the hard, physical limits of innovation, the applicability of tried and tested computing formulas to modern technologies, as well as an analysis of viable alternate choices for continued development.

  12. Cassava processors' awareness of occupational and environmental ...

    African Journals Online (AJOL)

    A larger percentage (74.5%) of the respondents indicated that the Agricultural Development Programme (ADP) is their source of information. The result also showed that processor's awareness of occupational hazards associated with the different stages of cassava processing vary because their involvement in these stages

  13. Image Processing with the Micron Automata Processor

    OpenAIRE

    Goyens, Frank

    2017-01-01

    This thesis investigates image-processing applications on the Micron Automata Processor hardware. The hardware is compared with popular present-day hardware. The work also contains useful information and strategies for developing new applications. Findings include proof-of-concept algorithms and a practical application.

  14. 7 CFR 1215.14 - Processor.

    Science.gov (United States)

    2010-01-01

    7 CFR 1215.14 (2010) defines "Processor" for the purposes of the Popcorn Promotion, Research, and Consumer Information Order (Agricultural Marketing Service, Department of Agriculture). § 1215.14...

  15. Simplifying cochlear implant speech processor fitting

    NARCIS (Netherlands)

    Willeboer, C.

    2008-01-01

    Conventional fittings of the speech processor of a cochlear implant (CI) rely to a large extent on the implant recipient's subjective responses. For each of the 22 intracochlear electrodes the recipient has to indicate the threshold level (T-level) and comfortable loudness level (C-level) while

  16. Space Station Water Processor Process Pump

    Science.gov (United States)

    Parker, David

    1995-01-01

    This report presents the results of the development program conducted under contract NAS8-38250-12 related to the International Space Station (ISS) Water Processor (WP) Process Pump. The results of the process pump evaluation conducted in this program indicate that further development is required in order to achieve the performance and life requirements for the ISS WP.

  17. Interleaved Subtask Scheduling on Multi Processor SOC

    NARCIS (Netherlands)

    Zhe, M.

    2006-01-01

    The ever-progressing semiconductor processing technique has integrated more and more embedded processors on a single system-on-a-chip (SoC). With such powerful SoC platforms, and also due to the stringent time-to-market deadlines, many functionalities which used to be implemented in ASICs are

  18. User manual Dieka PreProcessor

    NARCIS (Netherlands)

    Valkering, Kasper

    2000-01-01

    This is the user manual belonging to the Dieka PreProcessor. This application was written by Wenhua Cao and revised and expanded by Kasper Valkering. The aim of this preprocessor is to be able to draw and mesh extrusion dies in ProEngineer, and do the FE calculation in Dieka. The preprocessor makes

  19. Event analysis using a massively parallel processor

    International Nuclear Information System (INIS)

    Bale, A.; Gerelle, E.; Messersmith, J.; Warren, R.; Hoek, J.

    1990-01-01

    This paper describes a system for performing histogramming of n-tuple data at interactive rates using a commercial SIMD processor array connected to a work-station running the well-known Physics Analysis Workstation software (PAW). Results indicate that an order of magnitude performance improvement over current RISC technology is easily achievable
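
    Histogramming n-tuple data is the operation the SIMD array accelerates; a NumPy sketch of the same fills (here over synthetic n-tuple columns) shows what must run at interactive rates:

      import numpy as np

      # Synthetic n-tuple: 100k events, 4 variables per event.
      ntuple = np.random.default_rng(0).normal(0, 1, (100_000, 4))

      # 1-D histogram of one n-tuple variable.
      counts, edges = np.histogram(ntuple[:, 0], bins=128, range=(-4, 4))

      # 2-D histogram of two n-tuple variables, as PAW would display.
      h2, xedges, yedges = np.histogram2d(ntuple[:, 0], ntuple[:, 1], bins=64)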

  20. THOR Fields and Wave Processor - FWP

    Science.gov (United States)

    Soucek, Jan; Rothkaehl, Hanna; Ahlen, Lennart; Balikhin, Michael; Carr, Christopher; Dekkali, Moustapha; Khotyaintsev, Yuri; Lan, Radek; Magnes, Werner; Morawski, Marek; Nakamura, Rumi; Uhlir, Ludek; Yearby, Keith; Winkler, Marek; Zaslavsky, Arnaud

    2017-04-01

    If selected, Turbulence Heating ObserveR (THOR) will become the first spacecraft mission dedicated to the study of plasma turbulence. The Fields and Waves Processor (FWP) is an integrated electronics unit for all electromagnetic field measurements performed by THOR. FWP will interface with all THOR field sensors: the electric field antennas of the EFI instrument, the MAG fluxgate magnetometer, and the search-coil magnetometer (SCM), and will perform signal digitization and on-board data processing. The FWP box will house multiple data-acquisition sub-units and signal analyzers, all sharing a common power supply and data processing unit and thus a single data and power interface to the spacecraft. Integrating all the electromagnetic field measurements in a single unit will improve the consistency of field measurements and the accuracy of time synchronization. The scientific value of highly sensitive electric and magnetic field measurements in space has been demonstrated by Cluster (among other spacecraft), and THOR instrumentation will further improve on this heritage. The large dynamic range of the instruments will be complemented by a thorough electromagnetic cleanliness program, which will prevent perturbation of the field measurements by interference from payload and platform subsystems. Taking advantage of the capabilities of modern electronics and the large telemetry bandwidth of THOR, FWP will provide multi-component electromagnetic field waveforms and spectral data products at high time resolution. Fully synchronized sampling of many signals will make it possible to resolve wave phase information and estimate wavelength via interferometric correlations between EFI probes. FWP will also implement a plasma resonance sounder and a digital plasma quasi-thermal-noise analyzer designed to provide high-cadence measurements of plasma density and temperature complementary to data from the particle instruments. FWP will rapidly transmit information about the magnetic field vector and spacecraft potential to the