single processor implementation: Topics by WorldWideScience.org

Sample records for single processor implementation

Multi-processor network implementations in Multibus II and VME

International Nuclear Information System (INIS)

Briegel, C.

1992-01-01

ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)
The hardware implementation of the CERN SPS ultrafast feedback processor demonstrator

CERN Document Server

Dusakto, J E; Fox, J D; Olsen, J; Rivetta, C H; Höfle, W

2013-01-01

An ultrafast 4GSa/s transverse feedback processor has been developed for proof-of-concept studies of feedback control of e-cloud driven and transverse mode coupled intra-bunch instabilities in the CERN SPS. This system consists of a high-speed ADC on the front end and equally fast DAC on the back end. All control and signal processing is implemented in FPGA logic. This system is capable of taking up to 16 sample slices across a single SPS bunch and processing each slice individually within a reconfigurable signal processor. This demonstrator system is a rapidly developed prototype, consisting of both commercial and custom-design components. It can stabilize the motion of a single particle bunch using closed loop feedback. The system can also run open loop as a high-speed arbitrary waveform generator and contains diagnostic features including a special ADC snapshot capture memory. This paper describes the overall system, the feedback processor and focuses on the hardware architecture, design ...
FASTBUS Standard Routines implementation for Fermilab embedded processor boards

International Nuclear Information System (INIS)

Pangburn, J.; Patrick, J.; Kent, S.; Oleynik, G.; Pordes, R.; Votava, M.; Heyes, G.; Watson, W.A. III

1992-10-01

In collaboration with CEBAF, Fermilab's Online Support Department and the CDF experiment have produced a new implementation of the IEEE FASTBUS Standard Routines for two embedded processor FASTBUS boards: the Fermilab Smart Crate Controller (FSCC) and the FASTBUS Readout Controller (FRC). Features of this implementation include: portability (to other embedded processor boards), remote source-level debugging, high speed, optional generation of very high-speed code for readout applications, and built-in Sun RPC support for execution of FASTBUS transactions and lists over the network
A single chip pulse processor for nuclear spectroscopy

International Nuclear Information System (INIS)

Hilsenrath, F.; Bakke, J.C.; Voss, H.D.

1985-01-01

A high performance digital pulse processor, integrated into a single gate array microcircuit, has been developed for spaceflight applications. The new approach takes advantage of the latest CMOS high speed A/D flash converters and low-power gated logic arrays. The pulse processor measures pulse height, pulse area and the required timing information (e.g. multi detector coincidence and pulse pile-up detection). The pulse processor features high throughput rate (e.g. 0.5 Mhz for 2 usec gausssian pulses) and improved differential linearity (e.g. + or - 0.2 LSB for a + or - 1 LSB A/D). Because of the parallel digital architecture of the device, the interface is microprocessor bus compatible. A satellite flight application of this module is presented for use in the X-ray imager and high energy particle spectrometers of the PEM experiment on the Upper Atmospheric Research Satellite
Single particle irradiation effect of digital signal processor

International Nuclear Information System (INIS)

Fan Si'an; Chen Kenan

2010-01-01

The single particle irradiation effect of high energy neutron on digital signal processor TMS320P25 in dynamic working condition has been studied. The influence of the single particle on the device has been explored through the acquired waveform and working current of TMS320P25. Analysis results, test data and test methods have also been presented. (authors)
An implementation of the SANE Virtual Processor using POSIX threads

NARCIS (Netherlands)

van Tol, M.W.; Jesshope, C.R.; Lankamp, M.; Polstra, S.

2009-01-01

The SANE Virtual Processor (SVP) is an abstract concurrent programming model that is both deadlock free and supports efficient implementation. It is captured by the μTC programming language. The work presented in this paper covers a portable implementation of this model as a C++ library on top of
Multiple Embedded Processors for Fault-Tolerant Computing

Science.gov (United States)

Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy

2005-01-01

A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
Digital implementation of the preloaded filter pulse processor

International Nuclear Information System (INIS)

Westphal, G.P.; Cadek, G.R.; Keroe, N.; Sauter, TH.; Thorwartl, P.C.

1995-01-01

Adapting it's processing time to the respective pulse intervals, the Preloaded Filter (PLF) pulse processor offers optimum resolution together with highest possible throughput rates. The PLF algorithm could be formulated in a recursive manner which made possible it's implementation by means of a large field-programmable gate array, as a fast, pipe-lined digital processor with 10 MHz maximum throughput rate. While pre-filter digitization by an ADC with 12 bit resolution and 10M Hz sampling rate resulted in a poorer resolution than that of an analog filter, a digital PLF based on an ADC with 14 bit resolution and 10 MHz sampling rate, surpassed high-quality analog filters in resolution, throughput rate and long-term stability. (author) 6 refs.; 7 figs
Safe and Efficient Support for Embeded Multi-Processors in ADA

Science.gov (United States)

Ruiz, Jose F.

2010-08-01

New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.
Design and implementation of a high performance network security processor

Science.gov (United States)

Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

2010-03-01

The last few years have seen many significant progresses in the field of application-specific processors. One example is network security processors (NSPs) that perform various cryptographic operations specified by network security protocols and help to offload the computation intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogenous parallel crypto engine arrays lead to a Gbps rate NSP, which is programmable with domain specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.
Implementation of an Ethernet-Based Communication Channel for the Patmos Processor

DEFF Research Database (Denmark)

Pezzarossa, Luca; Kenn Toft, Jakob; Lønbæk, Jesper

The Patmos processor, which is used as the intellectual property of the T-CREST platform, is only equipped with a RS-232 serial port for communication with the outside world. The serial port is a minimal input/output device with a limited speed and without native networking features. An Ethernet 10....../100BASE-T IEEE 802.3 based communication channel is a reliable and high speed communication interface (10/100 Mbits/s) that also supports networking. This technical report presents an implementation of an Ethernet-based communication channel for the Patmos processor, targeting the Terasic DE2......-115 development board. We have designed the hardware to interface the EthMac Ethernet controller from OpenCores to Patmos and to the physical chip of the development board, and we have implemented a software library to drive the controller and to support some essential protocols. The design was implemented...
DESIGN AND IMPLEMENTATION OF A VHDL PROCESSOR FOR DCT BASED IMAGE COMPRESSION

Directory of Open Access Journals (Sweden)

Md. Shabiul Islam

2017-11-01

Full Text Available This paper describes the design and implementation of a VHDL processor meant for performing 2D-Discrete Cosine Transform (DCT to use in image compression applications. The design flow starts from the system specification to implementation on silicon and the entire process is carried out using an advanced workstation based design environment for digital signal processing. The software allows the bit-true analysis to ensure that the designed VLSI processor satisfies the required specifications. The bit-true analysis is performed on all levels of abstraction (behavior, VHDL etc.. The motivation behind the work is smaller size chip area, faster processing, reducing the cost of the chip
A parallel FPGA implementation for real-time 2D pixel clustering for the ATLAS Fast Tracker Processor

International Nuclear Information System (INIS)

Sotiropoulou, C L; Gkaitatzis, S; Kordas, K; Nikolaidis, S; Petridou, C; Annovi, A; Beretta, M; Volpi, G

2014-01-01

The parallel 2D pixel clustering FPGA implementation used for the input system of the ATLAS Fast TracKer (FTK) processor is presented. The input system for the FTK processor will receive data from the Pixel and micro-strip detectors from inner ATLAS read out drivers (RODs) at full rate, for total of 760Gbs, as sent by the RODs after level-1 triggers. Clustering serves two purposes, the first is to reduce the high rate of the received data before further processing, the second is to determine the cluster centroid to obtain the best spatial measurement. For the pixel detectors the clustering is implemented by using a 2D-clustering algorithm that takes advantage of a moving window technique to minimize the logic required for cluster identification. The cluster detection window size can be adjusted for optimizing the cluster identification process. Additionally, the implementation can be parallelized by instantiating multiple cores to identify different clusters independently thus exploiting more FPGA resources. This flexibility makes the implementation suitable for a variety of demanding image processing applications. The implementation is robust against bit errors in the input data stream and drops all data that cannot be identified. In the unlikely event of missing control words, the implementation will ensure stable data processing by inserting the missing control words in the data stream. The 2D pixel clustering implementation is developed and tested in both single flow and parallel versions. The first parallel version with 16 parallel cluster identification engines is presented. The input data from the RODs are received through S-Links and the processing units that follow the clustering implementation also require a single data stream, therefore data parallelizing (demultiplexing) and serializing (multiplexing) modules are introduced in order to accommodate the parallelized version and restore the data stream afterwards. The results of the first hardware tests of
Considerations for control system software verification and validation specific to implementations using distributed processor architectures

International Nuclear Information System (INIS)

Munro, J.K. Jr.

1993-01-01

Until recently, digital control systems have been implemented on centralized processing systems to function in one of several ways: (1) as a single processor control system; (2) as a supervisor at the top of a hierarchical network of multiple processors; or (3) in a client-server mode. Each of these architectures uses a very different set of communication protocols. The latter two architectures also belong to the category of distributed control systems. Distributed control systems can have a central focus, as in the cases just cited, or be quite decentralized in a loosely coupled, shared responsibility arrangement. This last architecture is analogous to autonomous hosts on a local area network. Each of the architectures identified above will have a different set of architecture-associated issues to be addressed in the verification and validation activities during software development. This paper summarizes results of efforts to identify, describe, contrast, and compare these issues
Implementation of CT and IHT Processors for Invariant Object Recognition System

Directory of Open Access Journals (Sweden)

J. Turan jr.

2004-12-01

Full Text Available This paper presents PDL or ASIC implementation of key modules ofinvariant object recognition system based on the combination of theIncremental Hough transform (IHT, correlation and rapid transform(RT. The invariant object recognition system was represented partiallyin C++ language for general-purpose processor on personal computer andpartially described in VHDL code for implementation in PLD or ASIC.
Computationally efficient implementation of sarse-tap FIR adaptive filters with tap-position control on intel IA-32 processors

OpenAIRE

Hirano, Akihiro; Nakayama, Kenji

2008-01-01

This paper presents an computationally ef cient implementation of sparse-tap FIR adaptive lters with tapposition control on Intel IA-32 processors with single-instruction multiple-data (SIMD) capability. In order to overcome randomorder memory access which prevents a ectorization, a blockbased processing and a re-ordering buffer are introduced. A dynamic register allocation and the use of memory-to-register operations help the maximization of the loop-unrolling level. Up to 66percent speedup ...
Dynamic overset grid communication on distributed memory parallel processors

Science.gov (United States)

Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.

1993-01-01

A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.
Implementation of 4-way Superscalar Hash MIPS Processor Using FPGA

Science.gov (United States)

Sahib Omran, Safaa; Fouad Jumma, Laith

2018-05-01

Due to the quick advancements in the personal communications systems and wireless communications, giving data security has turned into a more essential subject. This security idea turns into a more confounded subject when next-generation system requirements and constant calculation speed are considered in real-time. Hash functions are among the most essential cryptographic primitives and utilized as a part of the many fields of signature authentication and communication integrity. These functions are utilized to acquire a settled size unique fingerprint or hash value of an arbitrary length of message. In this paper, Secure Hash Algorithms (SHA) of types SHA-1, SHA-2 (SHA-224, SHA-256) and SHA-3 (BLAKE) are implemented on Field-Programmable Gate Array (FPGA) in a processor structure. The design is described and implemented using a hardware description language, namely VHSIC “Very High Speed Integrated Circuit” Hardware Description Language (VHDL). Since the logical operation of the hash types of (SHA-1, SHA-224, SHA-256 and SHA-3) are 32-bits, so a Superscalar Hash Microprocessor without Interlocked Pipelines (MIPS) processor are designed with only few instructions that were required in invoking the desired Hash algorithms, when the four types of hash algorithms executed sequentially using the designed processor, the total time required equal to approximately 342 us, with a throughput of 4.8 Mbps while the required to execute the same four hash algorithms using the designed four-way superscalar is reduced to 237 us with improved the throughput to 5.1 Mbps.
Design Approach and Implementation of Application Specific Instruction Set Processor for SHA-3 BLAKE Algorithm

Science.gov (United States)

Zhang, Yuli; Han, Jun; Weng, Xinqian; He, Zhongzhu; Zeng, Xiaoyang

This paper presents an Application Specific Instruction-set Processor (ASIP) for the SHA-3 BLAKE algorithm family by instruction set extensions (ISE) from an RISC (reduced instruction set computer) processor. With a design space exploration for this ASIP to increase the performance and reduce the area cost, we accomplish an efficient hardware and software implementation of BLAKE algorithm. The special instructions and their well-matched hardware function unit improve the calculation of the key section of the algorithm, namely G-functions. Also, relaxing the time constraint of the special function unit can decrease its hardware cost, while keeping the high data throughput of the processor. Evaluation results reveal the ASIP achieves 335Mbps and 176Mbps for BLAKE-256 and BLAKE-512. The extra area cost is only 8.06k equivalent gates. The proposed ASIP outperforms several software approaches on various platforms in cycle per byte. In fact, both high throughput and low hardware cost achieved by this programmable processor are comparable to that of ASIC implementations.
Interleaved Subtask Scheduling on Multi Processor SOC

NARCIS (Netherlands)

Zhe, M.

2006-01-01

The ever-progressing semiconductor processing technique has integrated more and more embedded processors on a single system-on-achip (SoC). With such powerful SoC platforms, and also due to the stringent time-to-market deadlines, many functionalities which used to be implemented in ASICs are

gFEX, the ATLAS Calorimeter Level-1 Real Time Processor

CERN Document Server

AUTHOR|(SzGeCERN)759889; The ATLAS collaboration; Begel, Michael; Chen, Hucheng; Lanni, Francesco; Takai, Helio; Wu, Weihao

2016-01-01

The global feature extractor (gFEX) is a component of the Level-1 Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high momentum Higgs, W, & Z bosons, top quarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be packaged in an Advanced Telecommunications Computing Architecture (ATCA) module and implemented as a fast reconfigurable processor based on three Xilinx Vertex Ultra-scale FPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 276 optical fibers with the data transferred at the 40 MHz Large Hadron Collider (LHC) clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure all the processor Field-Programmable Gate Array (FPGAs), monitor board health, and interface to external signals. Now, the pre-prototype board which includes one ZYNQ and one Vertex-7 FPGA ...
gFEX, the ATLAS Calorimeter Level 1 Real Time Processor

CERN Document Server

Tang, Shaochun; The ATLAS collaboration

2015-01-01

The global feature extractor (gFEX) is a component of the Level-1Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high momentum Higgs, W, & Z bosons, top quarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be packaged in an Advanced Telecommunications Computing Architecture (ATCA) module and implemented as a fast reconfigurable processor based on three Xilinx Ultra-scale FPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 264 optical fibers with the data transferred at the 40 MHz LHC clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure all the processor FPGAs, monitor board health, and interface to external signals. Now, the pre-prototype board which includes one ZYNQ and one Vertex-7 FPGA has been designed for testing and verification. The performance ...
Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

OpenAIRE

Abdul Kareem PARCHUR; Ram Asaray SINGH

2012-01-01

High performance is a critical requirement to all microprocessors manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...
Dual-core Itanium Processor

CERN Multimedia

2006-01-01

Intel’s first dual-core Itanium processor, code-named "Montecito" is a major release of Intel's Itanium 2 Processor Family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). Itanium 2 is much more powerful than its predecessor. It has lower power consumption and thermal dissipation.
The Molen Polymorphic Media Processor

NARCIS (Netherlands)

Kuzmanov, G.K.

2004-01-01

In this dissertation, we address high performance media processing based on a tightly coupled co-processor architectural paradigm. More specifically, we introduce a reconfigurable media augmentation of a general purpose processor and implement it into a fully operational processor prototype. The
An efficient ASIC implementation of 16-channel on-line recursive ICA processor for real-time EEG system.

Science.gov (United States)

Fang, Wai-Chi; Huang, Kuan-Ju; Chou, Chia-Ching; Chang, Jui-Chung; Cauwenberghs, Gert; Jung, Tzyy-Ping

2014-01-01

This is a proposal for an efficient very-large-scale integration (VLSI) design, 16-channel on-line recursive independent component analysis (ORICA) processor ASIC for real-time EEG system, implemented with TSMC 40 nm CMOS technology. ORICA is appropriate to be used in real-time EEG system to separate artifacts because of its highly efficient and real-time process features. The proposed ORICA processor is composed of an ORICA processing unit and a singular value decomposition (SVD) processing unit. Compared with previous work [1], this proposed ORICA processor has enhanced effectiveness and reduced hardware complexity by utilizing a deeper pipeline architecture, shared arithmetic processing unit, and shared registers. The 16-channel random signals which contain 8-channel super-Gaussian and 8-channel sub-Gaussian components are used to analyze the dependence of the source components, and the average correlation coefficient is 0.95452 between the original source signals and extracted ORICA signals. Finally, the proposed ORICA processor ASIC is implemented with TSMC 40 nm CMOS technology, and it consumes 15.72 mW at 100 MHz operating frequency.
Synchronization of faulty processors in coarse-grained TMR protected partially reconfigurable FPGA designs

International Nuclear Information System (INIS)

Kretzschmar, U.; Gomez-Cornejo, J.; Astarloa, A.; Bidarte, U.; Ser, J. Del

2016-01-01

The expansion of FPGA technology in numerous application fields is a fact. Single Event Effects (SEE) are a critical factor for the reliability of FPGA based systems. For this reason, a number of researches have been studying fault tolerance techniques to harden different elements of FPGA designs. Using Partial Reconfiguration (PR) in conjunction with Triple Modular Redundancy (TMR) is an emerging approach in recent publications dealing with the implementation of fault tolerant processors on SRAM-based FPGAs. While these works pay great attention to the repair of erroneous instances by means of reconfiguration, the essential step of synchronizing the repaired processors is insufficiently addressed. In this context, this paper poses four different synchronization approaches for soft core processors, which balance differently the trade-off between synchronization speed and hardware overhead. All approaches are assessed in practice by synchronizing TMR protected PicoBlaze processors implemented on a Virtex-5 FPGA. Nevertheless all methods are of a general nature and can be applied for different processor architectures in a straightforward fashion. - Highlights: • Four different synchronization methods for faulty processors are proposed. • The methods balance between synchronization speed and hardware overhead. • They can be applied to TMR-protected reconfigurable FPGA designs. • The proposed schemes are implemented and tested in real hardware.
A scalable single-chip multi-processor architecture with on-chip RTOS kernel

NARCIS (Netherlands)

Theelen, B.D.; Verschueren, A.C.; Reyes Suarez, V.V.; Stevens, M.P.J.; Nunez, A.

2003-01-01

Now that system-on-chip technology is emerging, single-chip multi-processors are becoming feasible. A key problem of designing such systems is the complexity of their on-chip interconnects and memory architecture. It is furthermore unclear at what level software should be integrated. An example of a
High performance graphics processors for medical imaging applications

International Nuclear Information System (INIS)

Goldwasser, S.M.; Reynolds, R.A.; Talton, D.A.; Walsh, E.S.

1989-01-01

This paper describes a family of high- performance graphics processors with special hardware for interactive visualization of 3D human anatomy. The basic architecture expands to multiple parallel processors, each processor using pipelined arithmetic and logical units for high-speed rendering of Computed Tomography (CT), Magnetic Resonance (MR) and Positron Emission Tomography (PET) data. User-selectable display alternatives include multiple 2D axial slices, reformatted images in sagittal or coronal planes and shaded 3D views. Special facilities support applications requiring color-coded display of multiple datasets (such as radiation therapy planning), or dynamic replay of time- varying volumetric data (such as cine-CT or gated MR studies of the beating heart). The current implementation is a single processor system which generates reformatted images in true real time (30 frames per second), and shaded 3D views in a few seconds per frame. It accepts full scale medical datasets in their native formats, so that minimal preprocessing delay exists between data acquisition and display
Processors for wavelet analysis and synthesis: NIFS and TI-C80 MVP

Science.gov (United States)

Brooks, Geoffrey W.

1996-03-01

Two processors are considered for image quadrature mirror filtering (QMF). The neuromorphic infrared focal-plane sensor (NIFS) is an existing prototype analog processor offering high speed spatio-temporal Gaussian filtering, which could be used for the QMF low- pass function, and difference of Gaussian filtering, which could be used for the QMF high- pass function. Although not designed specifically for wavelet analysis, the biologically- inspired system accomplishes the most computationally intensive part of QMF processing. The Texas Instruments (TI) TMS320C80 Multimedia Video Processor (MVP) is a 32-bit RISC master processor with four advanced digital signal processors (DSPs) on a single chip. Algorithm partitioning, memory management and other issues are considered for optimal performance. This paper presents these considerations with simulated results leading to processor implementation of high-speed QMF analysis and synthesis.
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

Science.gov (United States)

Manolakos, Elias S.

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332
Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

Science.gov (United States)

Sharma, Anuj; Manolakos, Elias S

2015-01-01

Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.
Implementation schemes in NMR of quantum processors and the Deutsch-Jozsa algorithm by using virtual spin representation

International Nuclear Information System (INIS)

Kessel, Alexander R.; Yakovleva, Natalia M.

2002-01-01

Schemes of experimental realization of the main two-qubit processors for quantum computers and the Deutsch-Jozsa algorithm are derived in virtual spin representation. The results are applicable for every four quantum states allowing the required properties for quantum processor implementation if for qubit encoding, virtual spin representation is used. A four-dimensional Hilbert space of nuclear spin 3/2 is considered in detail for this aim
Programming the Linpack Benchmark for the IBM PowerXCell 8i Processor

Directory of Open Access Journals (Sweden)

Michael Kistler

2009-01-01

Full Text Available In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCenter QS22, which incorporates two IBM PowerXCell 8i1 processors. The PowerXCell 8i is a new implementation of the Cell Broadband Engine™2 architecture and contains a set of special-purpose processing cores known as Synergistic Processing Elements (SPEs. The SPEs can be used as computational accelerators to augment the main PowerPC processor. The added computational capability of the SPEs results in a peak double precision floating point capability of 108.8 GFLOPS. We explain how we modified the standard open source implementation of Linpack to accelerate key computational kernels using the SPEs of the PowerXCell 8i processors. We describe in detail the implementation and performance of the computational kernels and also explain how we employed the SPEs for high-speed data movement and reformatting. The result of these modifications is a Linpack benchmark optimized for the IBM PowerXCell 8i processor that achieves 170.7 GFLOPS on a BladeCenter QS22 with 32 GB of DDR2 SDRAM memory. Our implementation of Linpack also supports clusters of QS22s, and was used to achieve a result of 11.1 TFLOPS on a cluster of 84 QS22 blades. We compare our results on a single BladeCenter QS22 with the base Linpack implementation without SPE acceleration to illustrate the benefits of our optimizations.
Median and Morphological Specialized Processors for a Real-Time Image Data Processing

Directory of Open Access Journals (Sweden)

Kazimierz Wiatr

2002-01-01

Full Text Available This paper presents the considerations on selecting a multiprocessor MISD architecture for fast implementation of the vision image processing. Using the authorÃ¢Â€Â²s earlier experience with real-time systems, implementing of specialized hardware processors based on the programmable FPGA systems has been proposed in the pipeline architecture. In particular, the following processors are presented: median filter and morphological processor. The structure of a universal reconfigurable processor developed has been proposed as well. Experimental results are presented as delays on LCA level implementation for median filter, morphological processor, convolution processor, look-up-table processor, logic processor and histogram processor. These times compare with delays in general purpose processor and DSP processor.
Safety-critical Java on a time-predictable processor

DEFF Research Database (Denmark)

Korsholm, Stephan E.; Schoeberl, Martin; Puffitsch, Wolfgang

2015-01-01

For real-time systems the whole execution stack needs to be time-predictable and analyzable for the worst-case execution time (WCET). This paper presents a time-predictable platform for safety-critical Java. The platform consists of (1) the Patmos processor, which is a time-predictable processor......; (2) a C compiler for Patmos with support for WCET analysis; (3) the HVM, which is a Java-to-C compiler; (4) the HVM-SCJ implementation which supports SCJ Level 0, 1, and 2 (for both single and multicore platforms); and (5) a WCET analysis tool. We show that real-time Java programs translated to C...... and compiled to a Patmos binary can be analyzed by the AbsInt aiT WCET analysis tool. To the best of our knowledge the presented system is the second WCET analyzable real-time Java system; and the first one on top of a RISC processor....
A Geometric Algebra Co-Processor for Color Edge Detection

Directory of Open Access Journals (Sweden)

Biswajit Mishra

2015-01-01

Full Text Available This paper describes advancement in color edge detection, using a dedicated Geometric Algebra (GA co-processor implemented on an Application Specific Integrated Circuit (ASIC. GA provides a rich set of geometric operations, giving the advantage that many signal and image processing operations become straightforward and the algorithms intuitive to design. The use of GA allows images to be represented with the three R, G, B color channels defined as a single entity, rather than separate quantities. A novel custom ASIC is proposed and fabricated that directly targets GA operations and results in significant performance improvement for color edge detection. Use of the hardware described in this paper also shows that the convolution operation with the rotor masks within GA belongs to a class of linear vector filters and can be applied to image or speech signals. The contribution of the proposed approach has been demonstrated by implementing three different types of edge detection schemes on the proposed hardware. The overall performance gains using the proposed GA Co-Processor over existing software approaches are more than 3.2× faster than GAIGEN and more than 2800× faster than GABLE. The performance of the fabricated GA co-processor is approximately an order of magnitude faster than previously published results for hardware implementations.
Development of a new signal processor for tetralateral position sensitive detector based on single-chip microcomputer

International Nuclear Information System (INIS)

Huang Meizhen; Shi Longzhao; Wang Yuxing; Ni Yi; Li Zhenqing; Ding Haifeng

2006-01-01

An inherently nonlinear relation between the output current of the tetralateral position sensitive detector (PSD) and the position of the incident light spot has been found theoretically. Based on single-chip microcomputer and the theoretical relation between output current and position, a new signal processor capable of correcting nonlinearity and reducing position measurement deviation of tetralateral PSD was developed. A tetralateral PSD (S1200, 13x13 mm 2 , Hamamatsu Photonics K.K.) was measured with the new signal processor, a linear relation between the output position of the PSD, and the incident position of the light spot was obtained. In the 60% range of a 13x13 mm 2 active area, the position nonlinearity (rms) was 0.15% and the position measurement deviation (rms) was ±20 μm. Compared with traditional analog signal processor, the new signal processor is of better compatibility, lower cost, higher precision, and easier to be interfaced
Implementation of an EPICS IOC on an Embedded Soft Core Processor Using Field Programmable Gate Arrays

International Nuclear Information System (INIS)

Douglas Curry; Alicia Hofler; Hai Dong; Trent Allison; J. Hovater; Kelly Mahoney

2005-01-01

At Jefferson Lab, we have been evaluating soft core processors running an EPICS IOC over μClinux on our custom hardware. A soft core processor is a flexible CPU architecture that is configured in the FPGA as opposed to a hard core processor which is fixed in silicon. Combined with an on-board Ethernet port, the technology incorporates the IOC and digital control hardware within a single FPGA. By eliminating the general purpose computer IOC, the designer is no longer tied to a specific platform, e.g. PC, VME, or VXI, to serve as the intermediary between the high level controls and the field hardware. This paper will discuss the design and development process as well as specific applications for JLab's next generation low-level RF controls and Machine Protection Systems
Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

Directory of Open Access Journals (Sweden)

Abdul Kareem PARCHUR

2012-08-01

Full Text Available High performance is a critical requirement to all microprocessors manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310. The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many key application areas in modern generation. The scaling of performance in two major series of Intel Xeon processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310 has been analyzed using the performance numbers of 12 CPU2006 integer benchmarks, performance numbers that exhibit significant differences in performance. The results and analysis can be used by performance engineers, scientists and developers to better understand the performance scaling in modern generation processors.

Multi-Core Processor Memory Contention Benchmark Analysis Case Study

Science.gov (United States)

Simon, Tyler; McGalliard, James

2009-01-01

Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
3081/E processor

International Nuclear Information System (INIS)

Kunz, P.F.; Gravina, M.; Oxoby, G.

1984-04-01

The 3081/E project was formed to prepare a much improved IBM mainframe emulator for the future. Its design is based on a large amount of experience in using the 168/E processor to increase available CPU power in both online and offline environments. The processor will be at least equal to the execution speed of a 370/168 and up to 1.5 times faster for heavy floating point code. A single processor will thus be at least four times more powerful than the VAX 11/780, and five processors on a system would equal at least the performance of the IBM 3081K. With its large memory space and simple but flexible high speed interface, the 3081/E is well suited for the online and offline needs of high energy physics in the future
Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities

OpenAIRE

Bonifaci , Vincenzo; Brandenburg , Björn; D'Angelo , Gianlorenzo; Marchetti-Spaccamela , Alberto

2016-01-01

International audience; Many multiprocessor real-time operating systems offer the possibility to restrict the migrations of any task to a specified subset of processors by setting affinity masks. A notion of " strong arbitrary processor affinity scheduling " (strong APA scheduling) has been proposed; this notion avoids schedulability losses due to overly simple implementations of processor affinities. Due to potential overheads, strong APA has not been implemented so far in a real-time operat...
Discussion paper for a highly parallel array processor-based machine

International Nuclear Information System (INIS)

Hagstrom, R.; Bolotin, G.; Dawson, J.

1984-01-01

The architectural plant for a quickly realizable implementation of a highly parallel special-purpose computer system with peak performance in the range of 6 billion floating point operations per second is discussed. The architecture is suitable to Lattice Gauge theoretical computations of fundamental physics interest and may be applicable to a range of other problems which deal with numerically intensive computational problems. The plan is quickly realizable because it employs a maximum of commercially available hardware subsystems and because the architecture is software-transparent to the individual processors, allowing straightforward re-use of whatever commercially available operating-systems and support software that is suitable to run on the commercially-produced processors. A tiny prototype instrument, designed along this architecture has already operated. A few elementary examples of programs which can run efficiently are presented. The large machine which the authors would propose to build would be based upon a highly competent array-processor, the ST-100 Array Processor, and specific design possibilities are discussed. The first step toward realizing this plan practically is to install a single ST-100 to allow algorithm development to proceed while a demonstration unit is built using two of the ST-100 Array Processors
Quality-driven model-based design of multi-processor accelerators : an application to LDPC decoders

NARCIS (Netherlands)

Jan, Y.

2012-01-01

The recent spectacular progress in nano-electronic technology has enabled the implementation of very complex multi-processor systems on single chips (MPSoCs). However in parallel, new highly demanding complex embedded applications are emerging, in fields like communication and networking,
Development of Softcore Processor based RTU for FBR

International Nuclear Information System (INIS)

Gour, Aditya; Santhana Raj, A.; Behera, R.P.; Murali, N.; Swaminathan, P.

2010-01-01

Remote Terminal Units (RTU) are used to acquire analog/digital signals and generate potential free contact outputs and send the acquired data through LAN. The aim of this design is to develop a Soft-Core Processor based RTU by implementing the glue logic along with 8051 microcontroller present in existing RTUs into a single FPGA, so that component count and power consumption on the board will be reduced and thereby achieving a higher reliability than before. Implementation of glue logic was done using VHDL and Altium's TSK51 Softcore was used in place of 8051 microcontroller. (author)
Hardware trigger processor for the MDT system

CERN Document Server

AUTHOR|(SzGeCERN)757787; The ATLAS collaboration; Hazen, Eric; Butler, John; Black, Kevin; Gastler, Daniel Edward; Ntekas, Konstantinos; Taffard, Anyes; Martinez Outschoorn, Verena; Ishino, Masaya; Okumura, Yasuyuki

2017-01-01

We are developing a low-latency hardware trigger processor for the Monitored Drift Tube system in the Muon spectrometer. The processor will fit candidate Muon tracks in the drift tubes in real time, improving significantly the momentum resolution provided by the dedicated trigger chambers. We present a novel pure-FPGA implementation of a Legendre transform segment finder, an associative-memory alternative implementation, an ARM (Zynq) processor-based track fitter, and compact ATCA carrier board architecture. The ATCA architecture is designed to allow a modular, staged approach to deployment of the system and exploration of alternative technologies.
The communication processor of TUMULT-64

NARCIS (Netherlands)

Smit, Gerardus Johannes Maria; Jansen, P.G.

1988-01-01

Tumult (Twente University MULTi-processor system) is a modular extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type,
A Parallel FPGA Implementation for Real-Time 2D Pixel Clustering for the ATLAS Fast TracKer Processor

CERN Document Server

Sotiropoulou, C-L; The ATLAS collaboration; Annovi, A; Beretta, M; Kordas, K; Nikolaidis, S; Petridou, C; Volpi, G

2014-01-01

The parallel 2D pixel clustering FPGA implementation used for the input system of the ATLAS Fast TracKer (FTK) processor is presented. The input system for the FTK processor will receive data from the Pixel and micro-strip detectors from inner ATLAS read out drivers (RODs) at full rate, for total of 760Gbs, as sent by the RODs after level-1 triggers. Clustering serves two purposes, the first is to reduce the high rate of the received data before further processing, the second is to determine the cluster centroid to obtain the best spatial measurement. For the pixel detectors the clustering is implemented by using a 2D-clustering algorithm that takes advantage of a moving window technique to minimize the logic required for cluster identification. The cluster detection window size can be adjusted for optimizing the cluster identification process. Additionally, the implementation can be parallelized by instantiating multiple cores to identify different clusters independently thus exploiting more FPGA resources. ...
A Parallel FPGA Implementation for Real-Time 2D Pixel Clustering for the ATLAS Fast TracKer Processor

CERN Document Server

Sotiropoulou, C-L; The ATLAS collaboration; Annovi, A; Beretta, M; Kordas, K; Nikolaidis, S; Petridou, C; Volpi, G

2014-01-01

The parallel 2D pixel clustering FPGA implementation used for the input system of the ATLAS Fast TracKer (FTK) processor is presented. The input system for the FTK processor will receive data from the Pixel and micro-strip detectors from inner ATLAS read out drivers (RODs) at full rate, for total of 760Gbs, as sent by the RODs after level1 triggers. Clustering serves two purposes, the first is to reduce the high rate of the received data before further processing, the second is to determine the cluster centroid to obtain the best spatial measurement. For the pixel detectors the clustering is implemented by using a 2D-clustering algorithm that takes advantage of a moving window technique to minimize the logic required for cluster identification. The cluster detection window size can be adjusted for optimizing the cluster identification process. Additionally, the implementation can be parallelized by instantiating multiple cores to identify different clusters independently thus exploiting more FPGA resources. T...
A UNIX-based prototype biomedical virtual image processor

International Nuclear Information System (INIS)

Fahy, J.B.; Kim, Y.

1987-01-01

The authors have developed a multiprocess virtual image processor for the IBM PC/AT, in order to maximize image processing software portability for biomedical applications. An interprocess communication scheme, based on two-way metacode exchange, has been developed and verified for this purpose. Application programs call a device-independent image processing library, which transfers commands over a shared data bridge to one or more Autonomous Virtual Image Processors (AVIP). Each AVIP runs as a separate process in the UNIX operating system, and implements the device-independent functions on the image processor to which it corresponds. Application programs can control multiple image processors at a time, change the image processor configuration used at any time, and are completely portable among image processors for which an AVIP has been implemented. Run-time speeds have been found to be acceptable for higher level functions, although rather slow for lower level functions, owing to the overhead associated with sending commands and data over the shared data bridge
Java Processor Optimized for RTSJ

Directory of Open Access Journals (Sweden)

Tu Shiliang

2007-01-01

Full Text Available Due to the preeminent work of the real-time specification for Java (RTSJ, Java is increasingly expected to become the leading programming language in real-time systems. To provide a Java platform suitable for real-time applications, a Java processor which can execute Java bytecode is directly proposed in this paper. It provides efficient support in hardware for some mechanisms specified in the RTSJ and offers a simpler programming model through ameliorating the scoped memory of the RTSJ. The worst case execution time (WCET of the bytecodes implemented in this processor is predictable by employing the optimization method proposed in our previous work, in which all the processing interfering predictability is handled before bytecode execution. Further advantage of this method is to make the implementation of the processor simpler and suited to a low-cost FPGA chip.
Sensitometric control of roentgen film processors

International Nuclear Information System (INIS)

Forsberg, H.; Karolinska Sjukhuset, Stockholm

1987-01-01

Monitoring of film processors performance is essential since image quality, patient dose and costs are influenced by the performance. A system for sensitometric constancy control of film processors and their associated components is described. Experience with the system for 3 years is given when implemented on 17 film processors. Modern high quality film processors have a stability that makes a test frequency of once a week sufficient to maintain adequate image quality. The test system is so sensitive that corrective actions almost invariably have been taken before any technical problem degraded the image quality to a visible degree. (orig.)
Lipsi: Probably the Smallest Processor in the World

DEFF Research Database (Denmark)

Schoeberl, Martin

2018-01-01

While research on high-performance processors is important, it is also interesting to explore processor architectures at the other end of the spectrum: tiny processor cores for auxiliary functions. While it is common to implement small circuits for such functions, such as a serial port, in dedica...... at a minimal cost....
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

Science.gov (United States)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Single event effects and performance predictions for space applications of RISC processors

International Nuclear Information System (INIS)

Kimbrough, J.R.; Colella, N.J.; Denton, S.M.; Shaeffer, D.L.; Shih, D.; Wilburn, J.W.; Coakley, P.G.; Casteneda, C.; Koga, R.; Clark, D.A.; Ullmann, J.L.

1994-01-01

Proton and ion Single Event Phenomena (SEP) tests were performed on 32-b processors including R3000A's from all commercial manufacturers along with the Performance PR3400 family, Integrated Device Technology Inc. 79R3081, LSI Logic Corporation LR33000HC, and Intel i80960MX parts. The microprocessors had acceptable upset rates for operation in a low earth orbit or a lunar mission such as CLEMENTINE with a wide range in proton total dose failure. Even though R3000A devices are 60% smaller in physical area than R3000 devices, there was a 340% increase in device Single Event Upset (SEU) cross section. Software tests of varying complexity demonstrate that registers and other functional blocks using register architecture dominate the cross section. The current approach of giving a single upset cross section can lead to erroneous upset rates depending on the application software
Functional Verification of Enhanced RISC Processor

OpenAIRE

SHANKER NILANGI; SOWMYA L

2013-01-01

This paper presents design and verification of a 32-bit enhanced RISC processor core having floating point computations integrated within the core, has been designed to reduce the cost and complexity. The designed 3 stage pipelined 32-bit RISC processor is based on the ARM7 processor architecture with single precision floating point multiplier, floating point adder/subtractor for floating point operations and 32 x 32 booths multiplier added to the integer core of ARM7. The binary representati...
High-Speed General Purpose Genetic Algorithm Processor.

Science.gov (United States)

Hoseini Alinodehi, Seyed Pourya; Moshfe, Sajjad; Saber Zaeimian, Masoumeh; Khoei, Abdollah; Hadidi, Khairollah

2016-07-01

In this paper, an ultrafast steady-state genetic algorithm processor (GAP) is presented. Due to the heavy computational load of genetic algorithms (GAs), they usually take a long time to find optimum solutions. Hardware implementation is a significant approach to overcome the problem by speeding up the GAs procedure. Hence, we designed a digital CMOS implementation of GA in [Formula: see text] process. The proposed processor is not bounded to a specific application. Indeed, it is a general-purpose processor, which is capable of performing optimization in any possible application. Utilizing speed-boosting techniques, such as pipeline scheme, parallel coarse-grained processing, parallel fitness computation, parallel selection of parents, dual-population scheme, and support for pipelined fitness computation, the proposed processor significantly reduces the processing time. Furthermore, by relying on a built-in discard operator the proposed hardware may be used in constrained problems that are very common in control applications. In the proposed design, a large search space is achievable through the bit string length extension of individuals in the genetic population by connecting the 32-bit GAPs. In addition, the proposed processor supports parallel processing, in which the GAs procedure can be run on several connected processors simultaneously.
Architectural design and analysis of a programmable image processor

International Nuclear Information System (INIS)

Siyal, M.Y.; Chowdhry, B.S.; Rajput, A.Q.K.

2003-01-01

In this paper we present an architectural design and analysis of a programmable image processor, nicknamed Snake. The processor was designed with a high degree of parallelism to speed up a range of image processing operations. Data parallelism found in array processors has been included into the architecture of the proposed processor. The implementation of commonly used image processing algorithms and their performance evaluation are also discussed. The performance of Snake is also compared with other types of processor architectures. (author)
A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

Directory of Open Access Journals (Sweden)

Sergio Briguglio

2003-01-01

Full Text Available A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle in cell (PIC codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage and OpenMP (at the intra-node one. The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.

Firmware implementation of algorithms for the new topological processor in the ATLAS first level trigger

Energy Technology Data Exchange (ETDEWEB)

Maldaner, Stephan; Caputo, Regina; Schaefer, Ulrich; Tapprogge, Stefan [Universitaet Mainz, Staudingerweg 7, 55128 Mainz (Germany)

2013-07-01

After the upgrade of the Large Hadron Collider in 2013/2014 proton-proton collisions will be provided at a center-of-mass energy of up to 14 TeV with an instantaneous luminosity of at least 1 . 10{sup 34} cm{sup -2}s{sup -1}. During this upgrade a new FPGA based electronics system (Topological Processor) will be included in the ATLAS trigger chain to keep up with the increased rate of events. To reduce rates while maintaining high signal efficiency of the trigger the processor will make its decisions based upon topological criteria like angular cuts and mass calculations. As a hardware based trigger, it will have to fit into the tight first level trigger latency budget of 2.5 μs and thus provides the challenge of making decisions within very short time. Beside the latency, the main constraints on the algorithms are the required amount of logic resources of the FPGA which will be implemented as firmware. Therefore to be able to use as much information as possible, each module will be equipped with 2 state-of-the-art Xilinx Virtex 7 FPGAs to process the incoming data. This talk will present some of the topological algorithms and discuss properties of their implementation in firmware.
A Real-Time Marker-Based Visual Sensor Based on a FPGA and a Soft Core Processor.

Science.gov (United States)

Tayara, Hilal; Ham, Woonchul; Chong, Kil To

2016-12-15

This paper introduces a real-time marker-based visual sensor architecture for mobile robot localization and navigation. A hardware acceleration architecture for post video processing system was implemented on a field-programmable gate array (FPGA). The pose calculation algorithm was implemented in a System on Chip (SoC) with an Altera Nios II soft-core processor. For every frame, single pass image segmentation and Feature Accelerated Segment Test (FAST) corner detection were used for extracting the predefined markers with known geometries in FPGA. Coplanar PosIT algorithm was implemented on the Nios II soft-core processor supplied with floating point hardware for accelerating floating point operations. Trigonometric functions have been approximated using Taylor series and cubic approximation using Lagrange polynomials. Inverse square root method has been implemented for approximating square root computations. Real time results have been achieved and pixel streams have been processed on the fly without any need to buffer the input frame for further implementation.
A High Performance Multi-Core FPGA Implementation for 2D Pixel Clustering for the ATLAS Fast TracKer (FTK) Processor

CERN Document Server

Sotiropoulou, C-L; The ATLAS collaboration; Beretta, M; Gkaitatzis, S; Kordas, K; Nikolaidis, S; Petridou, C; Volpi, G

2014-01-01

The high performance multi-core 2D pixel clustering FPGA implementation used for the input system of the ATLAS Fast TracKer (FTK) processor is presented. The input system for the FTK processor will receive data from the Pixel and micro-strip detectors read out drivers (RODs) at 760Gbps, the full rate of level 1 triggers. Clustering is required as a method to reduce the high rate of the received data before further processing, as well as to determine the cluster centroid for obtaining obtain the best spatial measurement. Our implementation targets the pixel detectors and uses a 2D-clustering algorithm that takes advantage of a moving window technique to minimize the logic required for cluster identification. The design is fully generic and the cluster detection window size can be adjusted for optimizing the cluster identification process. Τhe implementation can be parallelized by instantiating multiple cores to identify different clusters independently thus exploiting more FPGA resources. This flexibility mak...
Implementation of the Two-Point Angular Correlation Function on a High-Performance Reconfigurable Computer

Directory of Open Access Journals (Sweden)

Volodymyr V. Kindratenko

2009-01-01

Full Text Available We present a parallel implementation of an algorithm for calculating the two-point angular correlation function as applied in the field of computational cosmology. The algorithm has been specifically developed for a reconfigurable computer. Our implementation utilizes a microprocessor and two reconfigurable processors on a dual-MAP SRC-6 system. The two reconfigurable processors are used as two application-specific co-processors. Two independent computational kernels are simultaneously executed on the reconfigurable processors while data pre-fetching from disk and initial data pre-processing are executed on the microprocessor. The overall end-to-end algorithm execution speedup achieved by this implementation is over 90× as compared to a sequential implementation of the algorithm executed on a single 2.8 GHz Intel Xeon microprocessor.
Towards a Process Algebra for Shared Processors

DEFF Research Database (Denmark)

Buchholtz, Mikael; Andersen, Jacob; Løvengreen, Hans Henrik

2002-01-01

We present initial work on a timed process algebra that models sharing of processor resources allowing preemption at arbitrary points in time. This enables us to model both the functional and the timely behaviour of concurrent processes executed on a single processor. We give a refinement relation...
The hardware track finder processor in CMS at CERN

International Nuclear Information System (INIS)

Kluge, A.

1997-07-01

The work covers the design of the Track Finder Processor in the high energy experiment CMS at CERN/Geneva. The task of this processor is to identify muons and to measure their transverse momentum. The Track Finder makes it possible to determine the physical relevance of each high energetic collision and to forward only interesting data to the data analysis units. Data of more than two hundred thousand detector cells are used to determine the location of muons and to measure their transverse momentum. Each 25 ns a new data set is generated. Measurement of location and transverse momentum of the muons can be terminated within 350 ns by using an ASIC. The classical method in high energy physics experiments is to employ a pattern comparison method. The predefined patterns are compared to the found patterns. The high number of data channels and the complex requirements to the spatial detector resolution do not permit to employ a pattern comparison method. A so called track following algorithm was designed, which is able to assemble complete tracks through the whole detector starting from single track segments. Instead of storing a high number of track patterns the problem is brought back to the algorithm level. Comprehensive simulations, employing the hardware simulation language VHDL, were conducted in order to optimize the algorithm and its hardware implementation. A FPGA (field program able gate array)-prototype was designed. A feasibility study to implement the track finder processor employing ASICs was conducted. (author)
Control structures for high speed processors

Science.gov (United States)

Maki, G. K.; Mankin, R.; Owsley, P. A.; Kim, G. M.

1982-01-01

A special processor was designed to function as a Reed Solomon decoder with throughput data rate in the Mhz range. This data rate is significantly greater than is possible with conventional digital architectures. To achieve this rate, the processor design includes sequential, pipelined, distributed, and parallel processing. The processor was designed using a high level language register transfer language. The RTL can be used to describe how the different processes are implemented by the hardware. One problem of special interest was the development of dependent processes which are analogous to software subroutines. For greater flexibility, the RTL control structure was implemented in ROM. The special purpose hardware required approximately 1000 SSI and MSI components. The data rate throughput is 2.5 megabits/second. This data rate is achieved through the use of pipelined and distributed processing. This data rate can be compared with 800 kilobits/second in a recently proposed very large scale integration design of a Reed Solomon encoder.
A survey of Tumult, a real-time multi-processor system

International Nuclear Information System (INIS)

Jansen, P.G.

1986-01-01

Tumult (Twente University MULTi processor system) is the name of an ongoing project aiming at the design and implementation of a modular extendible multiprocessor system. All memory is distributed and processors communicate in parallel via a fast and reliable local switching network instead of a shared bus. A distributed real-time operating system is being designed and implemented, consisting of a multi-tasking subsystem per processor. Processes can communicate via a message passing mechanism. Communication links and processes are dynamically created and disposed by the application. In this article a brief description of the system is given; communication aspects are emphasized. (Auth.)
Online track processor for the CDF upgrade

International Nuclear Information System (INIS)

Thomson, E. J.

2002-01-01

A trigger track processor, called the eXtremely Fast Tracker (XFT), has been designed for the CDF upgrade. This processor identifies high transverse momentum (> 1.5 GeV/c) charged particles in the new central outer tracking chamber for CDF II. The XFT design is highly parallel to handle the input rate of 183 Gbits/s and output rate of 44 Gbits/s. The processor is pipelined and reports the result for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. The complete system has been installed and commissioned at CDF II. An overview of the track processor and performance in CDF Run II are presented
Designing a dataflow processor using CλaSH

NARCIS (Netherlands)

Niedermeier, A.; Wester, Rinse; Wester, Rinse; Rovers, K.C.; Baaij, C.P.R.; Kuper, Jan; Smit, Gerardus Johannes Maria

2010-01-01

In this paper we show how a simple dataflow processor can be fully implemented using CλaSH, a high level HDL based on the functional programming language Haskell. The processor was described using Haskell, the CλaSH compiler was then used to translate the design into a fully synthesisable VHDL code.
Design and implementation of an ASIP-based cryptography processor for AES, IDEA, and MD5

Directory of Open Access Journals (Sweden)

Karim Shahbazi

2017-08-01

Full Text Available In this paper, a new 32-bit ASIP-based crypto processor for AES, IDEA, and MD5 is designed. The instruction-set consists of both general purpose and specific instructions for the above cryptographic algorithms. The proposed architecture has nine function units and two data buses. It has also two types of 32-bit instruction formats for executing Memory Reference (M.R., Register Reference (R.R., and Input/Output Reference (I/O R. instructions. The maximum achieved frequency is 166.916 MHz. The encoded output results of the encryption process of a 128-bit input block are obtained after 122, 146 and 170 clock cycles for AES-128, AES-192, and AES-256, respectively. Moreover, it takes 95 clock cycles to encrypt or decrypt a 64-bit input block by using IDEA. Finally, the MD5 hash algorithm requires 469 clock cycles to generate the coded outputs for a block of 512 bits. The performance of the proposed processor is compared to some previous and state-of-the-art implementations in terms of speed, latency, throughput, and flexibility.
Keystone Business Models for Network Security Processors

OpenAIRE

Arthur Low; Steven Muegge

2013-01-01

Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor...
Code compression for VLIW embedded processors

Science.gov (United States)

Piccinelli, Emiliano; Sannino, Roberto

2004-04-01

The implementation of processors for embedded systems implies various issues: main constraints are cost, power dissipation and die area. On the other side, new terminals perform functions that require more computational flexibility and effort. Long code streams must be loaded into memories, which are expensive and power consuming, to run on DSPs or CPUs. To overcome this issue, the "SlimCode" proprietary algorithm presented in this paper (patent pending technology) can reduce the dimensions of the program memory. It can run offline and work directly on the binary code the compiler generates, by compressing it and creating a new binary file, about 40% smaller than the original one, to be loaded into the program memory of the processor. The decompression unit will be a small ASIC, placed between the Memory Controller and the System bus of the processor, keeping unchanged the internal CPU architecture: this implies that the methodology is completely transparent to the core. We present comparisons versus the state-of-the-art IBM Codepack algorithm, along with its architectural implementation into the ST200 VLIW family core.
Ring-array processor distribution topology for optical interconnects

Science.gov (United States)

Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

1992-01-01

The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.
Reconfigurable lattice mesh designs for programmable photonic processors.

Science.gov (United States)

Pérez, Daniel; Gasulla, Ivana; Capmany, José; Soref, Richard A

2016-05-30

We propose and analyse two novel mesh design geometries for the implementation of tunable optical cores in programmable photonic processors. These geometries are the hexagonal and the triangular lattice. They are compared here to a previously proposed square mesh topology in terms of a series of figures of merit that account for metrics that are relevant to on-chip integration of the mesh. We find that that the hexagonal mesh is the most suitable option of the three considered for the implementation of the reconfigurable optical core in the programmable processor.
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors

DEFF Research Database (Denmark)

Liu, Weifeng; Vinter, Brian

2014-01-01

and its child nodes must be executed sequentially, and (2) heaps, even d-heaps (d-ary heaps or d-way heaps), cannot supply enough wide data parallelism to these processors. Recent research proposed more versatile asymmetric multicore processors (AMPs) that consist of two types of cores (latency......-oriented cores with high single-thread performance and throughput-oriented cores with wide vector processing capability), unified memory address space and faster synchronization mechanism among cores with different ISAs. To leverage the AMPs for the heap data structure, in this paper we propose ad......-heap, an efficient heap data structure that introduces an implicit bridge structure and properly apportions workloads to the two types of cores. We implement a batch k-selection algorithm and conduct experiments on simulated AMP environments composed of real CPUs and GPUs. In our experiments on two representative...
Phase space simulation of collisionless stellar systems on the massively parallel processor

International Nuclear Information System (INIS)

White, R.L.

1987-01-01

A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem
Fast track trigger processor for the OPAL detector at LEP

Energy Technology Data Exchange (ETDEWEB)

Carter, A A; Carter, J R; Ward, D R; Heuer, R D; Jaroslawski, S; Wagner, A

1986-09-20

A fast hardware track trigger processor being built for the OPAL experiment is described. The processor will analyse data from the central drift chambers of OPAL to determine whether any tracks come from the interaction region, and thereby eliminate background events. The processor will find tracks over a large angular range, vertical strokecos thetavertical stroke < or approx. 0.95. The design of the processor is described, together with a brief account of its hardware implementation for OPAL. The results of feasibility studies are also presented.
An Implementation of ARM 920T Processor-based Ultrasonic Spirometer and Improvement of Its Sensitivity

International Nuclear Information System (INIS)

Lee, Cheul Won; Kim, Young Kil

2005-01-01

The spirometer is a medical device that measures the instantaneous velocity of the respiratory gas flow capacity. It is used for testing the condition of the lung and patient monitoring. It measures the absolute capacity difference that includes the flow capacity signal. In this paper, by using an ultrasound sensor that reduce+ the error caused by the inertia and pressure it has improved the transmission and receiving signal. This has enabled patients with weak respiratory to use the spirometer. Also, by using the ARM 920T Processor, a precise and prompt detection system was implemented
On the effective parallel programming of multi-core processors

NARCIS (Netherlands)

Varbanescu, A.L.

2010-01-01

Multi-core processors are considered now the only feasible alternative to the large single-core processors which have become limited by technological aspects such as power consumption and heat dissipation. However, due to their inherent parallel structure and their diversity, multi-cores are

SPP: A data base processor data communications protocol

Science.gov (United States)

Fishwick, P. A.

1983-01-01

The design and implementation of a data communications protocol for the Intel Data Base Processor (DBP) is defined. The protocol is termed SPP (Service Port Protocol) since it enables data transfer between the host computer and the DBP service port. The protocol implementation is extensible in that it is explicitly layered and the protocol functionality is hierarchically organized. Extensive trace and performance capabilities have been supplied with the protocol software to permit optional efficient monitoring of the data transfer between the host and the Intel data base processor. Machine independence was considered to be an important attribute during the design and implementation of SPP. The protocol source is fully commented and is included in Appendix A of this report.
Reward-based learning under hardware constraints - Using a RISC processor embedded in a neuromorphic substrate

Directory of Open Access Journals (Sweden)

Simon eFriedmann

2013-09-01

Full Text Available In this study, we propose and analyze in simulations a new, highly flexible method of imple-menting synaptic plasticity in a wafer-scale, accelerated neuromorphic hardware system. Thestudy focuses on globally modulated STDP, as a special use-case of this method. Flexibility isachieved by embedding a general-purpose processor dedicated to plasticity into the wafer. Toevaluate the suitability of the proposed system, we use a reward modulated STDP rule in a spiketrain learning task. A single layer of neurons is trained to fire at specific points in time withonly the reward as feedback. This model is simulated to measure its performance, i.e. the in-crease in received reward after learning. Using this performance as baseline, we then simulatethe model with various constraints imposed by the proposed implementation and compare theperformance. The simulated constraints include discretized synaptic weights, a restricted inter-face between analog synapses and embedded processor, and mismatch of analog circuits. Wefind that probabilistic updates can increase the performance of low-resolution weights, a simpleinterface between analog synapses and processor is sufficient for learning, and performance isinsensitive to mismatch. Further, we consider communication latency between wafer and theconventional control computer system that is simulating the environment. This latency increasesthe delay, with which the reward is sent to the embedded processor. Because of the time continu-ous operation of the analog synapses, delay can cause a deviation of the updates as compared tothe not delayed situation. We find that for highly accelerated systems latency has to be kept to aminimum. This study demonstrates the suitability of the proposed implementation to emulatethe selected reward modulated STDP learning rule. It is therefore an ideal candidate for imple-mentation in an upgraded version of the wafer-scale system developed within the BrainScaleSproject.
Reward-based learning under hardware constraints-using a RISC processor embedded in a neuromorphic substrate.

Science.gov (United States)

Friedmann, Simon; Frémaux, Nicolas; Schemmel, Johannes; Gerstner, Wulfram; Meier, Karlheinz

2013-01-01

In this study, we propose and analyze in simulations a new, highly flexible method of implementing synaptic plasticity in a wafer-scale, accelerated neuromorphic hardware system. The study focuses on globally modulated STDP, as a special use-case of this method. Flexibility is achieved by embedding a general-purpose processor dedicated to plasticity into the wafer. To evaluate the suitability of the proposed system, we use a reward modulated STDP rule in a spike train learning task. A single layer of neurons is trained to fire at specific points in time with only the reward as feedback. This model is simulated to measure its performance, i.e., the increase in received reward after learning. Using this performance as baseline, we then simulate the model with various constraints imposed by the proposed implementation and compare the performance. The simulated constraints include discretized synaptic weights, a restricted interface between analog synapses and embedded processor, and mismatch of analog circuits. We find that probabilistic updates can increase the performance of low-resolution weights, a simple interface between analog synapses and processor is sufficient for learning, and performance is insensitive to mismatch. Further, we consider communication latency between wafer and the conventional control computer system that is simulating the environment. This latency increases the delay, with which the reward is sent to the embedded processor. Because of the time continuous operation of the analog synapses, delay can cause a deviation of the updates as compared to the not delayed situation. We find that for highly accelerated systems latency has to be kept to a minimum. This study demonstrates the suitability of the proposed implementation to emulate the selected reward modulated STDP learning rule. It is therefore an ideal candidate for implementation in an upgraded version of the wafer-scale system developed within the BrainScaleS project.
Single-chip serial channel enhances multi-processor systems

Energy Technology Data Exchange (ETDEWEB)

Millar, J.

1982-01-01

In this paper multiprocessor systems are described and explained. The impact that VLSI advancements are having on multiprocessor design is pointed out. The TMS 7041 single-chip microcomputer is described briefly, highlighting its multiprocessor communication capability. And finally, a typical multiprocessor system is shown, implementing the TMS 7041.
GA103: A microprogrammable processor for online filtering

International Nuclear Information System (INIS)

Calzas, A.; Danon, G.; Bouquet, B.

1981-01-01

GA 103 is a 16 bit microprogrammable processor which emulates the PDP 11 instruction set. It is based on the Am 2900 slices. It allows user-implemented microinstructions and addition of hardwired processors. It will perform on-line filtering tasks in the NA 14 experiment at CERN, based on the reconstruction of transverse momentum of photons detected in a lead glass calorimeter. (orig.)
Accelerating molecular dynamic simulation on the cell processor and Playstation 3.

Science.gov (United States)

Luttmann, Edgar; Ensign, Daniel L; Vaidyanathan, Vishal; Houston, Mike; Rimon, Noam; Øland, Jeppe; Jayachandran, Guha; Friedrichs, Mark; Pande, Vijay S

2009-01-30

Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3's Cell processor over more traditional processors. (c) 2008 Wiley Periodicals, Inc.
Natrium: Use of FPGA embedded processors for real-time data compression

Energy Technology Data Exchange (ETDEWEB)

Ammendola, R; Salamon, A; Salina, G [INFN Sezione di Roma Tor Vergata, Rome (Italy); Biagioni, A; Frezza, O; Cicero, F Lo; Lonardo, A; Rossetti, D; Simula, F; Tosoratto, L; Vicini, P [INFN Sezione di Roma, Rome (Italy)

2011-12-15

We present test results and characterization of a data compression system for the readout of the NA62 liquid krypton calorimeter trigger processor. The Level-0 electromagnetic calorimeter trigger processor of the NA62 experiment at CERN receives digitized data from the calorimeter main readout board. These data are stored on an on-board DDR2 RAM memory and read out upon reception of a Level-0 accept signal. The maximum raw data throughput from the trigger front-end cards is 2.6 Gbps. To readout these data over two Gbit Ethernet interfaces we investigated different implementations of a data compression system based on the Rice-Golomb coding: one is implemented in the FPGA as a custom block and one is implemented on the FPGA embedded processor running a C code. The two implementations are tested on a set of sample events and compared with respect to achievable readout bandwidth.
Natrium: Use of FPGA embedded processors for real-time data compression

International Nuclear Information System (INIS)

Ammendola, R; Salamon, A; Salina, G; Biagioni, A; Frezza, O; Cicero, F Lo; Lonardo, A; Rossetti, D; Simula, F; Tosoratto, L; Vicini, P

2011-01-01

We present test results and characterization of a data compression system for the readout of the NA62 liquid krypton calorimeter trigger processor. The Level-0 electromagnetic calorimeter trigger processor of the NA62 experiment at CERN receives digitized data from the calorimeter main readout board. These data are stored on an on-board DDR2 RAM memory and read out upon reception of a Level-0 accept signal. The maximum raw data throughput from the trigger front-end cards is 2.6 Gbps. To readout these data over two Gbit Ethernet interfaces we investigated different implementations of a data compression system based on the Rice-Golomb coding: one is implemented in the FPGA as a custom block and one is implemented on the FPGA embedded processor running a C code. The two implementations are tested on a set of sample events and compared with respect to achievable readout bandwidth.
Monte Carlo photon transport on shared memory and distributed memory parallel processors

International Nuclear Information System (INIS)

Martin, W.R.; Wan, T.C.; Abdel-Rahman, T.S.; Mudge, T.N.; Miura, K.

1987-01-01

Parallelized Monte Carlo algorithms for analyzing photon transport in an inertially confined fusion (ICF) plasma are considered. Algorithms were developed for shared memory (vector and scalar) and distributed memory (scalar) parallel processors. The shared memory algorithm was implemented on the IBM 3090/400, and timing results are presented for dedicated runs with two, three, and four processors. Two alternative distributed memory algorithms (replication and dispatching) were implemented on a hypercube parallel processor (1 through 64 nodes). The replication algorithm yields essentially full efficiency for all cube sizes; with the 64-node configuration, the absolute performance is nearly the same as with the CRAY X-MP. The dispatching algorithm also yields efficiencies above 80% in a large simulation for the 64-processor configuration
The Heidelberg POLYP - a flexible and fault-tolerant poly-processor

International Nuclear Information System (INIS)

Maenner, R.; Deluigi, B.

1981-01-01

The Heidelberg poly-processor system POLYP is described. It is intended to be used in nuclear physics for reprocessing of experimental data, in high energy physics as second-stage trigger processor, and generally in other applications requiring high-computing power. The POLYP system consists of any number of I/O-processors, processor modules (eventually of different types), global memory segments, and a host processor. All modules (up to several hundred) are connected by a multiple common-data-bus system; all processors, additionally, by a multiple sync bus system for processor/task-scheduling. All hard- and software is designed to be decentralized and free of bottle-necks. Most hardware-faults like single-bit errors in memory or multi-bit errors during transfers are automatically corrected. Defective modules, buses, etc., can be removed with only a graceful degradation of the system-throughput. (orig.)
SAD PROCESSOR FOR MULTIPLE MACROBLOCK MATCHING IN FAST SEARCH VIDEO MOTION ESTIMATION

Directory of Open Access Journals (Sweden)

Nehal N. Shah

2015-02-01

Full Text Available Motion estimation is a very important but computationally complex task in video coding. Process of determining motion vectors based on the temporal correlation of consecutive frame is used for video compression. In order to reduce the computational complexity of motion estimation and maintain the quality of encoding during motion compensation, different fast search techniques are available. These block based motion estimation algorithms use the sum of absolute difference (SAD between corresponding macroblock in current frame and all the candidate macroblocks in the reference frame to identify best match. Existing implementations can perform SAD between two blocks using sequential or pipeline approach but performing multi operand SAD in single clock cycle with optimized recourses is state of art. In this paper various parallel architectures for computation of the fixed block size SAD is evaluated and fast parallel SAD architecture is proposed with optimized resources. Further SAD processor is described with 9 processing elements which can be configured for any existing fast search block matching algorithm. Proposed SAD processor consumes 7% fewer adders compared to existing implementation for one processing elements. Using nine PE it can process 84 HD frames per second in worse case which is good outcome for real time implementation. In average case architecture process 325 HD frames per second.
A fast track trigger processor for the OPAL detector at LEP

International Nuclear Information System (INIS)

Carter, A.A.; Jaroslawski, S.; Wagner, A.

1986-01-01

A fast hardware track trigger processor being built for the OPAL experiment is described. The processor will analyse data from the central drift chambers of OPAL to determine whether any tracks come from the interaction region, and thereby eliminate background events. The processor will find tracks over a large angular range, vertical strokecos thetavertical stroke < or approx. 0.95. The design of the processor is described, together with a brief account of its hardware implementation for OPAL. The results of feasibility studies are also presented. (orig.)
Optical backplane interconnect switch for data processors and computers

Science.gov (United States)

Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

1989-01-01

An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.
XL-100S microprogrammable processor

International Nuclear Information System (INIS)

Gorbunov, N.V.; Guzik, Z.; Sutulin, V.A.; Forytski, A.

1983-01-01

The XL-100S microprogrammable processor providing the multiprocessor operation mode in the XL system crate is described. The processor meets the EUR 6500 CAMAC standards, address up to 4 Mbyte memory, and interacts with 7 CAMAC branchas. Eight external requests initiate operations preset by a sequence of microcommands in a memory of the capacity up to 64 kwords of 32-Git. The microprocessor architecture allows one to emulate commands of the majority of mini- or micro-computers, including floating point operations. The XL-100S processor may be used in various branches of experimental physics: for physical experiment apparatus control, fast selection of useful physical events, organization of the of input/output operations, organization of direct assess to memory included, etc. The Am2900 microprocessor set is used as an elementary base. The device is made in the form of a single width CAMAC module
A Taxonomy of Reconfigurable Single-/Multiprocessor Systems-on-Chip

Directory of Open Access Journals (Sweden)

Diana Göhringer

2009-01-01

Full Text Available Runtime adaptivity of hardware in processor architectures is a novel trend, which is under investigation in a variety of research labs all over the world. The runtime exchange of modules, implemented on a reconfigurable hardware, affects the instruction flow (e.g., in reconfigurable instruction set processors or the data flow, which has a strong impact on the performance of an application. Furthermore, the choice of a certain processor architecture related to the class of target applications is a crucial point in application development. A simple example is the domain of high-performance computing applications found in meteorology or high-energy physics, where vector processors are the optimal choice. A classification scheme for computer systems was provided in 1966 by Flynn where single/multiple data and instruction streams were combined to four types of architectures. This classification is now used as a foundation for an extended classification scheme including runtime adaptivity as further degree of freedom for processor architecture design. The developed scheme is validated by a multiprocessor system implemented on reconfigurable hardware as well as by a classification of existing static and reconfigurable processor systems.
Multi-processor data acquisition and monitoring systems for particle physics

International Nuclear Information System (INIS)

White, V.; Burch, B.; Eng, K.; Heinicke, P.; Pyatetsky, M.; Ritchie, D.

1983-01-01

A high speed distributed processing system, using PDP-11 and VAX processors, is being developed at Fermilab. The acquisition of data is done using one or more PDP-11s. Additional processors are connected to provide either data logging or extra data analysis capabilities. Within this framework, functional interchangeability of PDP-11 and VAX processors and of the PDP-11 operating systems, RT-11 and RSX-11M, has been maintained. Inter-processor connections have been implemented in a general way using the 5 megabit DR11-W hardware currently selected for the purpose. Using this approach the authors have been able to make use of several existing data acquisition and analysis packages, such as RT/MULTI, in a multi-processor system
A Software Implementation of a Satellite Interface Message Processor.

Science.gov (United States)

Eastwood, Margaret A.; Eastwood, Lester F., Jr.

A design for network control software for a computer network is described in which some nodes are linked by a communications satellite channel. It is assumed that the network has an ARPANET-like configuration; that is, that specialized processors at each node are responsible for message switching and network control. The purpose of the control…
First level trigger processor for the ZEUS calorimeter

International Nuclear Information System (INIS)

Dawson, J.W.; Talaga, R.L.; Burr, G.W.; Laird, R.J.; Smith, W.; Lackey, J.

1990-01-01

This paper discusses the design of the first level trigger processor for the ZEUS calorimeter. This processor accepts data from the 13,000 photomultipliers of the calorimeter which is topologically divided into 16 regions, and after regional preprocessing, performs logical and numerical operations which cross regional boundaries. Because the crossing period at the HERA collider is 96 ns, it is necessary that first-level trigger decisions be made in pipelined hardware. One microsecond is allowed for the processor to perform the required logical and numerical operations, during which time the data from ten crossings would be resident in the processor while being clocked through the pipelined hardware. The circuitry is implemented in 100K ECL, Advanced CMOS discrete devices, and programmable gate arrays, and operates in a VME environment. All tables and registers are written/read from VME, and all diagnostic codes are executed from VME. Preprocessed data flows into the processor at a rate of 5.2GB/s, and processed data flows from the processor to the Global First-Level Trigger at a rate of 700MB/s. The system allows for subsets of the logic to be configured by software and for various important variables to be histogrammed as they flow through the processor. 2 refs., 3 figs
First-level trigger processor for the ZEUS calorimeter

International Nuclear Information System (INIS)

Dawson, J.W.; Talaga, R.L.; Burr, G.W.; Laird, R.J.; Smith, W.; Lackey, J.

1990-01-01

The design of the first-level trigger processor for the Zeus calorimeter is discussed. This processor accepts data from the 13,000 photomultipliers of the calorimeter, which is topologically divided into 16 regions, and after regional preprocessing performs logical and numerical operations that cross regional boundaries. Because the crossing period at the HERA collider is 96 ns, it is necessary that first-level trigger decisions be made in pipelined hardware. One microsecond is allowed for the processor to perform the required logical and numerical operations, during which time the data from ten crossings would be resident in the processor while being clocked through the pipelined hardware. The circuitry is implemented in 100K emitter-coupled logic (ECL), advanced CMOS discrete devices and programmable gate arrays, and operates in a VME environment. All tables and registers are written/read from VME, and all diagnostic codes are executed from VME. Preprocessed data flows into the processor at a rate of 5.2 Gbyte/s, and processed data flows from the processor to the global first-level trigger at a rate of 70 Mbyte/s. The system allows for subsets of the logic to be configured by software and for various important variables to be histogrammed as they flow through the processor
Nested dissection on a mesh-connected processor array

International Nuclear Information System (INIS)

Worley, P.H.; Schreiber, R.

1986-01-01

The authors present a parallel implementation of Gaussian elimination without pivoting using the nested dissection ordering for solving Ax=b where A is an N x N symmetric positive definite matrix. If the graph of A is a √N x √N finite element mesh then a parallel complexity of O(√N) can be achieved for Gaussian elimination with the nested dissection ordering. The authors' implementation achieves this parallel complexity on a two dimensional MIMD processor array with N processors and nearest neighbors interconnections. Thus nested dissection is a near optimal algorithm for this problem on this interconnection topology. The parallel implementation on this architecture requires 158√N + O(log/sub 2/(√N)) parallel floating point multiplications. It is faster than a Kung-Leiserson systolic array for banded matrices for N≥961, and faster than a serial implementation for N as small as 9

Slowdown in the $M/M/1$ discriminatory processor-sharing queue

NARCIS (Netherlands)

Cheung, S.K.; Kim, Bara; Kim, Jeongsim

2008-01-01

We consider a queue with multiple K job classes, Poisson arrivals, and exponentially distributed required service times in which a single processor serves according to the discriminatory processor-sharing (DPS) discipline. For this queue, we obtain the first and second moments of the slowdown, which
High-speed packet filtering utilizing stream processors

Science.gov (United States)

Hummel, Richard J.; Fulp, Errin W.

2009-04-01

Parallel firewalls offer a scalable architecture for the next generation of high-speed networks. While these parallel systems can be implemented using multiple firewalls, the latest generation of stream processors can provide similar benefits with a significantly reduced latency due to locality. This paper describes how the Cell Broadband Engine (CBE), a popular stream processor, can be used as a high-speed packet filter. Results show the CBE can potentially process packets arriving at a rate of 1 Gbps with a latency less than 82 μ-seconds. Performance depends on how well the packet filtering process is translated to the unique stream processor architecture. For example the method used for transmitting data and control messages among the pseudo-independent processor cores has a significant impact on performance. Experimental results will also show the current limitations of a CBE operating system when used to process packets. Possible solutions to these issues will be discussed.
On the implementation of the Ford | Fulkerson algorithm on the Multiple Instruction and Single Data computer system

Directory of Open Access Journals (Sweden)

A. Yu. Popov

2014-01-01

Full Text Available Algorithms of optimization in networks and direct graphs find a broad application when solving the practical tasks. However, along with large-scale introduction of information technologies in human activity, requirements for volumes of input data and retrieval rate of solution are aggravated. In spite of the fact that by now the large number of algorithms for the various models of computers and computing systems have been studied and implemented, the solution of key problems of optimization for real dimensions of tasks remains difficult. In this regard search of new and more efficient computing structures, as well as update of known algorithms are of great current interest.The work considers an implementation of the search-end algorithm of the maximum flow on the direct graph for multiple instructions and single data computer system (MISD developed in BMSTU. Key feature of this architecture is deep hardware support of operations over sets and structures of data. Functions of storage and access to them are realized on the specialized processor of structures processing (SP which is capable to perform at the hardware level such operations as: add, delete, search, intersect, complete, merge, and others. Advantage of such system is possibility of parallel execution of parts of the computing tasks regarding the access to the sets to data structures simultaneously with arithmetic and logical processing of information.The previous works present the general principles of the computing process arrangement and features of programs implemented in MISD system, describe the structure and principles of functioning the processor of structures processing, show the general principles of the graph task solutions in such system, and experimentally study the efficiency of the received algorithms.The work gives command formats of the SP processor, offers the technique to update the algorithms realized in MISD system, suggests the option of Ford-Falkersona algorithm
Real time implementation of a linear predictive coding algorithm on digital signal processor DSP32C

International Nuclear Information System (INIS)

Sheikh, N.M.; Usman, S.R.; Fatima, S.

2002-01-01

Pulse Code Modulation (PCM) has been widely used in speech coding. However, due to its high bit rate. PCM has severe limitations in application where high spectral efficiency is desired, for example, in mobile communication, CD quality broadcasting system etc. These limitation have motivated research in bit rate reduction techniques. Linear predictive coding (LPC) is one of the most powerful complex techniques for bit rate reduction. With the introduction of powerful digital signal processors (DSP) it is possible to implement the complex LPC algorithm in real time. In this paper we present a real time implementation of the LPC algorithm on AT and T's DSP32C at a sampling frequency of 8192 HZ. Application of the LPC algorithm on two speech signals is discussed. Using this implementation , a bit rate reduction of 1:3 is achieved for better than tool quality speech, while a reduction of 1.16 is possible for speech quality required in military applications. (author)
A high-speed analog neural processor

NARCIS (Netherlands)

Masa, P.; Masa, Peter; Hoen, Klaas; Hoen, Klaas; Wallinga, Hans

1994-01-01

Targeted at high-energy physics research applications, our special-purpose analog neural processor can classify up to 70 dimensional vectors within 50 nanoseconds. The decision-making process of the implemented feedforward neural network enables this type of computation to tolerate weight
Parallel processor for fast event analysis

International Nuclear Information System (INIS)

Hensley, D.C.

1983-01-01

Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed where up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the AM29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system
MPC Related Computational Capabilities of ARMv7A Processors

DEFF Research Database (Denmark)

Frison, Gianluca; Jørgensen, John Bagterp

2015-01-01

In recent years, the mass market of mobile devices has pushed the demand for increasingly fast but cheap processors. ARM, the world leader in this sector, has developed the Cortex-A series of processors with focus on computationally intensive applications. If properly programmed, these processors...... are powerful enough to solve the complex optimization problems arising in MPC in real-time, while keeping the traditional low-cost and low-power consumption. This makes these processors ideal candidates for use in embedded MPC. In this paper, we investigate the floating-point capabilities of Cortex A7, A9...... and A15 and show how to exploit the unique features of each processor to obtain the best performance, in the context of a novel implementation method for the linear-algebra routines used in MPC solvers. This method adapts high-performance computing techniques to the needs of embedded MPC. In particular...
UA1 upgrade first-level calorimeter trigger processor

International Nuclear Information System (INIS)

Bains, N.; Charlton, D.; Ellis, N.; Garvey, J.; Gregory, J.; Jimack, M.P.; Jovanovic, P.; Kenyon, I.R.; Baird, S.A.; Campbell, D.; Cawthraw, M.; Coughlan, J.; Flynn, P.; Galagedera, S.; Grayer, G.; Halsall, R.; Shah, T.P.; Stephens, R.; Eisenhandler, E.; Fensome, I.; Landon, M.

1989-01-01

A new first-level trigger processor has been built for the UA1 experiment on the Cern SppS Collider. The processor exploits the fine granularity of the new UA1 uranium-TMP calorimeter to improve the selectivity of the trigger. The new electron trigger has improved hadron jet rejection, achieved by requiring low energy deposition around the electromagnetic cluster. A missing transverse energy trigger and a total energy trigger have also been implemented. (orig.)
Discrete Fourier transformation processor based on complex radix (−1 + j number system

Directory of Open Access Journals (Sweden)

Anidaphi Shadap

2017-02-01

Full Text Available Complex radix (−1 + j allows the arithmetic operations of complex numbers to be done without treating the divide and conquer rules, which offers the significant speed improvement of complex numbers computation circuitry. Design and hardware implementation of complex radix (−1 + j converter has been introduced in this paper. Extensive simulation results have been incorporated and an application of this converter towards the implementation of discrete Fourier transformation (DFT processor has been presented. The functionality of the DFT processor have been verified in Xilinx ISE design suite version 14.7 and performance parameters like propagation delay and dynamic switching power consumption have been calculated by Virtuoso platform in Cadence. The proposed DFT processor has been implemented through conversion, multiplication and addition. The performance parameter matrix in terms of delay and power consumption offered a significant improvement over other traditional implementation of DFT processor.
Quality-Driven Model-Based Design of MultiProcessor Embedded Systems for Highlydemanding Applications

DEFF Research Database (Denmark)

Jozwiak, Lech; Madsen, Jan

2013-01-01

The recent spectacular progress in modern nano-dimension semiconductor technology enabled implementation of a complete complex multi-processor system on a single chip (MPSoC), global networking and mobile wire-less communication, and facilitated a fast progress in these areas. New important...... accessible or distant) objects, installations, machines or devices, or even implanted in human or animal body can serve as examples. However, many of the modern embedded application impose very stringent functional and parametric demands. Moreover, the spectacular advances in microelectronics introduced...
APRON: A Cellular Processor Array Simulation and Hardware Design Tool

Science.gov (United States)

Barr, David R. W.; Dudek, Piotr

2009-12-01

We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool

Directory of Open Access Journals (Sweden)

David R. W. Barr

2009-01-01

Full Text Available We present a software environment for the efficient simulation of cellular processor arrays (CPAs. This software (APRON is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
On developing B-spline registration algorithms for multi-core processors

International Nuclear Information System (INIS)

Shackleford, J A; Kandasamy, N; Sharp, G C

2010-01-01

Spline-based deformable registration methods are quite popular within the medical-imaging community due to their flexibility and robustness. However, they require a large amount of computing time to obtain adequate results. This paper makes two contributions towards accelerating B-spline-based registration. First, we propose a grid-alignment scheme and associated data structures that greatly reduce the complexity of the registration algorithm. Based on this grid-alignment scheme, we then develop highly data parallel designs for B-spline registration within the stream-processing model, suitable for implementation on multi-core processors such as graphics processing units (GPUs). Particular attention is focused on an optimal method for performing analytic gradient computations in a data parallel fashion. CPU and GPU versions are validated for execution time and registration quality. Performance results on large images show that our GPU algorithm achieves a speedup of 15 times over the single-threaded CPU implementation whereas our multi-core CPU algorithm achieves a speedup of 8 times over the single-threaded implementation. The CPU and GPU versions achieve near-identical registration quality in terms of RMS differences between the generated vector fields.
Architectures for single-chip image computing

Science.gov (United States)

Gove, Robert J.

1992-04-01

This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.
A VAX-FPS Loosely-Coupled Array of Processors

International Nuclear Information System (INIS)

Grosdidier, G.

1987-03-01

The main features of a VAX-FPS Loosely-Coupled Array of Processors (LCAP) set-up and the implementation of a High Energy Physics tracking program for off-line purposes will be described. This LCAP consists of a VAX 11/750 host and two FPS 64 bit attached processors. Before analyzing the performances of this LCAP, its characteristics will be outlined, especially from a user's point of vue, and will be briefly compared to those of the IBM-FPS LCAP
Reducing Competitive Cache Misses in Modern Processor Architectures

OpenAIRE

Prisagjanec, Milcho; Mitrevski, Pece

2017-01-01

The increasing number of threads inside the cores of a multicore processor, and competitive access to the shared cache memory, become the main reasons for an increased number of competitive cache misses and performance decline. Inevitably, the development of modern processor architectures leads to an increased number of cache misses. In this paper, we make an attempt to implement a technique for decreasing the number of competitive cache misses in the first level of cache memory. This tec...
Parallelising a molecular dynamics algorithm on a multi-processor workstation

Science.gov (United States)

Müller-Plathe, Florian

1990-12-01

The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transfered in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
Development of a processor embedded timing unit for the synchronized operation in KSTAR

Energy Technology Data Exchange (ETDEWEB)

Lee, Woongryol, E-mail: wrlee@nfri.re.kr; Lee, Taegu; Hong, Jaesic

2016-11-15

Highlights: • Timing board for the synchronized tokamak operation. • Processor embedded distributed control system. • Single clock source and multiple trigger signal for the plasma diagnostics. • Delay compensation among the distributed timing boards. - Abstract: The Local Timing Unit (LTU) in KSTAR provides a single clock source and multiple trigger signals with flexible configuration. Over the past seven years, the LTU had a mechanical redesign and several firmware updates for the purpose of provision of a robust operation and precision timing signal. Now we have developed a third version of a local timing unit which has a standalone operation capability. The LTU is built in a cabinet mountable 1U PIZZA box and provides twelve signal output ports, a packet mirroring interface, and an LCD interface panel. The core functions of the LTU are implemented in a Field Programmable Gate Array (FPGA) which has an internal hardcore processor. The internal processor allows the use of Linux Operating System (OS) and the Experimental Physics and Industrial Control System (EPICS). All user level application functions are controllable through the EPICS, however the time critical internal functions are performed by the FPGA logic blocks same as the previous version. The new LTU provides pluggable output module so that we can easily extend the signal output port. The easy installation and effective replacement reduce the efforts of maintenance. This paper describes design, development, and commissioning results of the new KSTAR LTU.
Development of a processor embedded timing unit for the synchronized operation in KSTAR

International Nuclear Information System (INIS)

Lee, Woongryol; Lee, Taegu; Hong, Jaesic

2016-01-01

Highlights: • Timing board for the synchronized tokamak operation. • Processor embedded distributed control system. • Single clock source and multiple trigger signal for the plasma diagnostics. • Delay compensation among the distributed timing boards. - Abstract: The Local Timing Unit (LTU) in KSTAR provides a single clock source and multiple trigger signals with flexible configuration. Over the past seven years, the LTU had a mechanical redesign and several firmware updates for the purpose of provision of a robust operation and precision timing signal. Now we have developed a third version of a local timing unit which has a standalone operation capability. The LTU is built in a cabinet mountable 1U PIZZA box and provides twelve signal output ports, a packet mirroring interface, and an LCD interface panel. The core functions of the LTU are implemented in a Field Programmable Gate Array (FPGA) which has an internal hardcore processor. The internal processor allows the use of Linux Operating System (OS) and the Experimental Physics and Industrial Control System (EPICS). All user level application functions are controllable through the EPICS, however the time critical internal functions are performed by the FPGA logic blocks same as the previous version. The new LTU provides pluggable output module so that we can easily extend the signal output port. The easy installation and effective replacement reduce the efforts of maintenance. This paper describes design, development, and commissioning results of the new KSTAR LTU.
First Results of an “Artificial Retina” Processor Prototype

International Nuclear Information System (INIS)

Cenci, Riccardo; Bedeschi, Franco; Marino, Pietro; Morello, Michael J.; Ninci, Daniele; Piucci, Alessio; Punzi, Giovanni; Ristori, Luciano; Spinella, Franco; Stracka, Simone; Tonelli, Diego; Walsh, John

2016-01-01

We report on the performance of a specialized processor capable of reconstructing charged particle tracks in a realistic LHC silicon tracker detector, at the same speed of the readout and with sub-microsecond latency. The processor is based on an innovative pattern-recognition algorithm, called “artificial retina algorithm”, inspired from the vision system of mammals. A prototype of the processor has been designed, simulated, and implemented on Tel62 boards equipped with high-bandwidth Altera Stratix III FPGA devices. The prototype is the first step towards a real-time track reconstruction device aimed at processing complex events of high-luminosity LHC experiments at 40 MHz crossing rate

Comparison Of Hybrid Sorting Algorithms Implemented On Different Parallel Hardware Platforms

Directory of Open Access Journals (Sweden)

Dominik Zurek

2013-01-01

Full Text Available Sorting is a common problem in computer science. There are lot of well-known sorting algorithms created for sequential execution on a single processor. Recently, hardware platforms enable to create wide parallel algorithms. We have standard processors consist of multiple cores and hardware accelerators like GPU. The graphic cards with their parallel architecture give new possibility to speed up many algorithms. In this paper we describe results of implementation of a few different sorting algorithms on GPU cards and multicore processors. Then hybrid algorithm will be presented which consists of parts executed on both platforms, standard CPU and GPU.
MORPION: a fast hardware processor for straight line finding in MWPC

International Nuclear Information System (INIS)

Mur, M.

1980-02-01

A fast hardware processor for straight line finding in MWPC has been built in Saclay and successfully operated in the NA3 experiment at CERN. We give the motivations to build this processor, and describe the hardware implementation of the line finding algorithm. Finally its use and performance in NA3 are described
Run-time Adaptable VLIW Processors : Resources, Performance, Power Consumption, and Reliability Trade-offs

NARCIS (Netherlands)

Anjam, F.

2013-01-01

In this dissertation, we propose to combine programmability with reconfigurability by implementing an adaptable programmable VLIW processor in a reconfigurable hardware. The approach allows applications to be developed at high-level (C language level), while at the same time, the processor
A CNN-Specific Integrated Processor

Directory of Open Access Journals (Sweden)

Suleyman Malki

2009-01-01

Full Text Available Integrated Processors (IP are algorithm-specific cores that either by programming or by configuration can be re-used within many microelectronic systems. This paper looks at Cellular Neural Networks (CNN to become realized as IP. First current digital implementations are reviewed, and the memoryprocessor bandwidth issues are analyzed. Then a generic view is taken on the structure of the network, and a new intra-communication protocol based on rotating wheels is proposed. It is shown that this provides for guaranteed high-performance with a minimal network interface. The resulting node is small and supports multi-level CNN designs, giving the system a 30-fold increase in capacity compared to classical designs. As it facilitates multiple operations on a single image, and single operations on multiple images, with minimal access to the external image memory, balancing the internal and external data transfer requirements optimizes the system operation. In conventional digital CNN designs, the treatment of boundary nodes requires additional logic to handle the CNN value propagation scheme. In the new architecture, only a slight modification of the existing cells is necessary to model the boundary effect. A typical prototype for visual pattern recognition will house 4096 CNN cells with a 2% overhead for making it an IP.
Broadband set-top box using MAP-CA processor

Science.gov (United States)

Bush, John E.; Lee, Woobin; Basoglu, Chris

2001-12-01

Advances in broadband access are expected to exert a profound impact in our everyday life. It will be the key to the digital convergence of communication, computer and consumer equipment. A common thread that facilitates this convergence comprises digital media and Internet. To address this market, Equator Technologies, Inc., is developing the Dolphin broadband set-top box reference platform using its MAP-CA Broadband Signal ProcessorT chip. The Dolphin reference platform is a universal media platform for display and presentation of digital contents on end-user entertainment systems. The objective of the Dolphin reference platform is to provide a complete set-top box system based on the MAP-CA processor. It includes all the necessary hardware and software components for the emerging broadcast and the broadband digital media market based on IP protocols. Such reference design requires a broadband Internet access and high-performance digital signal processing. By using the MAP-CA processor, the Dolphin reference platform is completely programmable, allowing various codecs to be implemented in software, such as MPEG-2, MPEG-4, H.263 and proprietary codecs. The software implementation also enables field upgrades to keep pace with evolving technology and industry demands.
The Interface Between Redundant Processor Modules Of Safety Grade PLC Using Mass Storage DPRAM

International Nuclear Information System (INIS)

Hwang, Sung Jae; Song, Seong Hwan; No, Young Hun; Yun, Dong Hwa; Park, Gang Min; Kim, Min Gyu; Choi, Kyung Chul; Lee, Ui Taek

2010-01-01

Processor module of safety grade PLC (hereinafter called as POSAFE-Q) developed by POSCO ICT provides high reliability and safety. However, POSAFEQ would have suffered a malfunction when we think taking place of abnormal operation by exceptional environmental. POSAFE-Q would not able to conduct its function normally in such case. To prevent these situations, the necessity of redundant processor module has been raised. Therefore, redundant processor module, NCPU-2Q, has been developed which has not only functions of single processor module with high reliability and safety but also functions of redundant processor
A prediction method for job runtimes on shared processors: Survey, statistical analysis and new avenues

NARCIS (Netherlands)

Dobber, A.M.; van der Mei, R.D.; Koole, G.M.

2007-01-01

Grid computing is an emerging technology by which huge numbers of processors over the world create a global source of processing power. Their collaboration makes it possible to perform computations that are too extensive to perform on a single processor. On a grid, processors may connect and
An energy-efficient high-performance processor with reconfigurable data-paths using RSFQ circuits

International Nuclear Information System (INIS)

Takagi, Naofumi

2013-01-01

Highlights: ► An idea of a high-performance computer using RSFQ circuits is shown. ► An outline of processor with reconfigurable data-paths (RDPs) is shown. ► Architectural details of an SFQ-RDP are described. -- Abstract: We show recent progress in our research on an energy-efficient high-performance processor with reconfigurable data-paths (RDPs) using rapid single-flux-quantum (RSFQ) circuits. We mainly describe the architectural details of an RDP implemented using RSFQ circuits. An RDP consists of a lot of floating-point units (FPUs) and operand routing networks (ORNs) which connect the FPUs. We reconfigure the RDP to fit a computation, i.e., a group of floating-point operations, appearing in a ‘for’ loop of programs for numerical computations by setting the route in ORNs before the execution of the loop. In the RDP, a lot of FPUs work in parallel with pipelined fashion, and hence, very high-performance computation is achieved
Accuracy-Energy Configurable Sensor Processor and IoT Device for Long-Term Activity Monitoring in Rare-Event Sensing Applications

Directory of Open Access Journals (Sweden)

Daejin Park

2014-01-01

Full Text Available A specially designed sensor processor used as a main processor in IoT (internet-of-thing device for the rare-event sensing applications is proposed. The IoT device including the proposed sensor processor performs the event-driven sensor data processing based on an accuracy-energy configurable event-quantization in architectural level. The received sensor signal is converted into a sequence of atomic events, which is extracted by the signal-to-atomic-event generator (AEG. Using an event signal processing unit (EPU as an accelerator, the extracted atomic events are analyzed to build the final event. Instead of the sampled raw data transmission via internet, the proposed method delays the communication with a host system until a semantic pattern of the signal is identified as a final event. The proposed processor is implemented on a single chip, which is tightly coupled in bus connection level with a microcontroller using a 0.18 μm CMOS embedded-flash process. For experimental results, we evaluated the proposed sensor processor by using an IR- (infrared radio- based signal reflection and sensor signal acquisition system. We successfully demonstrated that the expected power consumption is in the range of 20% to 50% compared to the result of the basement in case of allowing 10% accuracy error.
A Modular Pipelined Processor for High Resolution Gamma-Ray Spectroscopy

Science.gov (United States)

Veiga, Alejandro; Grunfeld, Christian

2016-02-01

The design of a digital signal processor for gamma-ray applications is presented in which a single ADC input can simultaneously provide temporal and energy characterization of gamma radiation for a wide range of applications. Applying pipelining techniques, the processor is able to manage and synchronize very large volumes of streamed real-time data. Its modular user interface provides a flexible environment for experimental design. The processor can fit in a medium-sized FPGA device operating at ADC sampling frequency, providing an efficient solution for multi-channel applications. Two experiments are presented in order to characterize its temporal and energy resolution.
Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA

Science.gov (United States)

Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei

2013-03-01

With the development of international wireless communication standards, there is an increase in computational requirement for baseband signal processors. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to its high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as promising technology to replace traditional ASIC and FPGA fashions. In addition, there are large numbers of parallel data processed in computation-intensive functions, which fosters the development of single instruction multiple data (SIMD) architecture in SDR platform. So a new way must be found to prototype the SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in a machine description language LISA. LISA is a language for instruction set architecture which can gain rapid model at architectural level. In order to evaluate the availability of our proposed processor, three common baseband functions, FFT, FIR digital filter and matrix multiplication have been mapped on the SDR platform. Analytical results showed that the SDR processor achieved the maximum of 47.1% performance boost relative to the opponent processor.
Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors

Energy Technology Data Exchange (ETDEWEB)

Heybrock, Simon; Joo, Balint; Kalamkar, Dhiraj D; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan; Wettig, Tilo; Dubey, Pradeep

2014-12-01

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel Xeon Phi co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
Air-Lubricated Thermal Processor For Dry Silver Film

Science.gov (United States)

Siryj, B. W.

1980-09-01

Since dry silver film is processed by heat, it may be viewed on a light table only seconds after exposure. On the other hand, wet films require both bulky chemicals and substantial time before an image can be analyzed. Processing of dry silver film, although simple in concept, is not so simple when reduced to practice. The main concern is the effect of film temperature gradients on uniformity of optical film density. RCA has developed two thermal processors, different in implementation but based on the same philosophy. Pressurized air is directed to both sides of the film to support the film and to conduct the heat to the film. Porous graphite is used as the medium through which heat and air are introduced. The initial thermal processor was designed to process 9.5-inch-wide film moving at speeds ranging from 0.0034 to 0.008 inch per second. The processor configuration was curved to match the plane generated by the laser recording beam. The second thermal processor was configured to process 5-inch-wide film moving at a continuously variable rate ranging from 0.15 to 3.5 inches per second. Due to field flattening optics used in this laser recorder, the required film processing area was plane. In addition, this processor was sectioned in the direction of film motion, giving the processor the capability of varying both temperature and effective processing area.
Parallel computation for distributed parameter system-from vector processors to Adena computer

Energy Technology Data Exchange (ETDEWEB)

Nogi, T

1983-04-01

Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.
A fast inner product processor based on equal alignments

Energy Technology Data Exchange (ETDEWEB)

Smith, S.P.; Torng, H.C.

1985-11-01

Inner product computation is an important operation, invoked repeatedly in matrix multiplications. A high-speed inner product processor can be very useful (among many possible applications) in real-time signal processing. This paper presents the design of a fast inner product processor, with appreciably reduced latency and cost. The inner product processor is implemented with a tree of carry-propagate or carry-save adders; this structure is obtained with the incorporation of three innovations in the conventional multiply/add tree: The leaf-multipliers are expanded into adder subtrees, thus achieving an O(log Nb) latency, where N denotes the number of elements in a vector and b the number of bits in each element. The partial products, to be summed in producing an inner product, are reordered according to their ''minimum alignments.'' This reordering brings approximately a 20% savings in hardware-including adders and data paths. The reduction in adder widths also yields savings in carry propagation time for carry-propagate adders. For trees implemented with carry-save adders, the partial product reordering also serves to truncate the carry propagation chain in the final propagation stage by 2 log b - 1 positions, thus significantly reducing the latency further. A form of the Baugh and Wooley algorithm is adopted to implement two's complement notation with changes only in peripheral hardware.
Safety-critical Java on a Java processor

DEFF Research Database (Denmark)

Schoeberl, Martin; Rios Rivas, Juan Ricardo

2012-01-01

The safety-critical Java (SCJ) specification is developed within the Java Community Process under specification request number JSR 302. The specification is available as public draft, but details are still discussed by the expert group. In this stage of the specification we need prototype...... implementations of SCJ and first test applications that are written with SCJ, even when the specification is not finalized. The feedback from those prototype implementations is needed for final decisions. To help the SCJ expert group, a prototype implementation of SCJ on top of the Java optimized processor...
The Danish real-time SAR processor: first results

DEFF Research Database (Denmark)

Dall, Jørgen; Jørgensen, Jørn Hjelm; Netterstrøm, Anders

1993-01-01

A real-time processor (RTP) for the Danish airborne Synthetic Aperture Radar (SAR) has been designed and constructed at the Electromagnetics Institute. The implementation was completed in mid 1992, and since then the RTP has been operated successfully on several test and demonstration flights....... The processor is capable of focusing the entire swath of the raw SAR data into full resolution, and depending on the choice made by the on-board operator, either a high resolution one-look zoom image or a spatially multilooked overview image is displayed. After a brief design review, the paper addresses various...
Computations on the massively parallel processor at the Goddard Space Flight Center

Science.gov (United States)

Strong, James P.

1991-01-01

Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
Array processor architecture

Science.gov (United States)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules and from hence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occur with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode however, the processors are not locked-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
Very wide register : an asymmetric register file organization for low power embedded processors.

NARCIS (Netherlands)

Raghavan, P.; Lambrechts, A.; Jayapala, M.; Catthoor, F.; Verkest, D.T.M.L.; Corporaal, H.

2007-01-01

In current embedded systems processors, multi-ported register files are one of the most power hungry parts of the processor, even when they are clustered. This paper presents a novel register file architecture, which has single ported cells and asymmetric interfaces to the memory and to the

Stepping motor control processor reference manual. Volume I

International Nuclear Information System (INIS)

Holloway, F.W.; VanArsdall, P.J.; Suski, G.J.; Gant, R.G.; Rash, M.

1980-01-01

This manual is intended to serve several purposes. The first goal is to describe the capabilities and operation of the SMC processor package from an operator or user point of view. Secondly, the manual will describe in some detail the basic hardware elements and how they can be used effectively to implement a step motor control system. Practical information on the use, installation and checkout of the hardware set is presented in the following sections along with programming suggestions. Available related system software is described in this manual for reference and as an aid in understanding the system architecture. Section two presents an overview and operations manual of the SMC processor describing its composition and functional capabilities. Section three contains hardware descriptions in some detail for the LLL-designed hardware used in the SMC processor. Basic theory of operation and important features are explained
Green Secure Processors: Towards Power-Efficient Secure Processor Design

Science.gov (United States)

Chhabra, Siddhartha; Solihin, Yan

With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.
Bulk-memory processor for data acquisition

International Nuclear Information System (INIS)

Nelson, R.O.; McMillan, D.E.; Sunier, J.W.; Meier, M.; Poore, R.V.

1981-01-01

To meet the diverse needs and data rate requirements at the Van de Graaff and Weapons Neutron Research (WNR) facilities, a bulk memory system has been implemented which includes a fast and flexible processor. This bulk memory processor (BMP) utilizes bit slice and microcode techniques and features a 24 bit wide internal architecture allowing direct addressing of up to 16 megawords of memory and histogramming up to 16 million counts per channel without overflow. The BMP is interfaced to the MOSTEK MK 8000 bulk memory system and to the standard MODCOMP computer I/O bus. Coding for the BMP both at the microcode level and with macro instructions is supported. The generalized data acquisition system has been extended to support the BMP in a manner transparent to the user
Reconfigurable signal processor designs for advanced digital array radar systems

Science.gov (United States)

Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining

2017-05-01

The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
Probabilistic programmable quantum processors

International Nuclear Information System (INIS)

Buzek, V.; Ziman, M.; Hillery, M.

2004-01-01

We analyze how to improve performance of probabilistic programmable quantum processors. We show how the probability of success of the probabilistic processor can be enhanced by using the processor in loops. In addition, we show that an arbitrary SU(2) transformations of qubits can be encoded in program state of a universal programmable probabilistic quantum processor. The probability of success of this processor can be enhanced by a systematic correction of errors via conditional loops. Finally, we show that all our results can be generalized also for qudits. (Abstract Copyright [2004], Wiley Periodicals, Inc.)
The LASS hardware processor

International Nuclear Information System (INIS)

Kunz, P.F.

1976-01-01

The problems of data analysis with hardware processors are reviewed and a description is given of a programmable processor. This processor, the 168/E, has been designed for use in the LASS multi-processor system; it has an execution speed comparable to the IBM 370/168 and uses the subset of IBM 370 instructions appropriate to the LASS analysis task. (Auth.)
Wavelength-encoded OCDMA system using opto-VLSI processors.

Science.gov (United States)

Aljada, Muhsen; Alameh, Kamal

2007-07-01

We propose and experimentally demonstrate a 2.5 Gbits/sper user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.
Wavelength-encoded OCDMA system using opto-VLSI processors

Science.gov (United States)

Aljada, Muhsen; Alameh, Kamal

2007-07-01

We propose and experimentally demonstrate a 2.5 Gbits/sper user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.
A Model of Batch Scheduling for a Single Batch Processor with Additional Setups to Minimize Total Inventory Holding Cost of Parts of a Single Item Requested at Multi-due-date

Science.gov (United States)

Hakim Halim, Abdul; Ernawati; Hidayat, Nita P. A.

2018-03-01

This paper deals with a model of batch scheduling for a single batch processor on which a number of parts of a single items are to be processed. The process needs two kinds of setups, i. e., main setups required before processing any batches, and additional setups required repeatedly after the batch processor completes a certain number of batches. The parts to be processed arrive at the shop floor at the times coinciding with their respective starting times of processing, and the completed parts are to be delivered at multiple due dates. The objective adopted for the model is that of minimizing total inventory holding cost consisting of holding cost per unit time for a part in completed batches, and that in in-process batches. The formulation of total inventory holding cost is derived from the so-called actual flow time defined as the interval between arrival times of parts at the production line and delivery times of the completed parts. The actual flow time satisfies not only minimum inventory but also arrival and delivery just in times. An algorithm to solve the model is proposed and a numerical example is shown.
Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.

Science.gov (United States)

Han, Bing; Taha, Tarek M

2010-04-01

There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in favor of the more popular integrate and fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quadcore 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
The application of charge-coupled device processors in automatic-control systems

Science.gov (United States)

Mcvey, E. S.; Parrish, E. A., Jr.

1977-01-01

The application of charge-coupled device (CCD) processors to automatic-control systems is suggested. CCD processors are a new form of semiconductor component with the unique ability to process sampled signals on an analog basis. Specific implementations of controllers are suggested for linear time-invariant, time-varying, and nonlinear systems. Typical processing time should be only a few microseconds. This form of technology may become competitive with microprocessors and minicomputers in addition to supplementing them.
Online Fastbus processor for LEP

International Nuclear Information System (INIS)

Mueller, H.

1986-01-01

The author describes the online computing aspects of Fastbus systems using a processor module which has been developed at CERN and is now available commercially. These General Purpose Master/Slaves (GPMS) are based on 68000/10 (or optionally 68020/68881) processors. Applications include use as event-filters (DELPHI), supervisory controllers, Fastbus stand-alone diagnostic tools, and multiprocessor array components. The direct mapping of single, 32-bit assembly instructions to execute Fastbus protocols makes the use of a GPM both simple and flexible. Loosely coupled processing in Fastbus networks is possible between GPM's as they support access semaphores and use a two port memory as I/O buffer for Fastbus. Both master and slave-ports support block transfers up to 20 Mbytes/s. The CERN standard Fastbus software and the MoniCa symbolic debugging monitor are available on the GPM with real time, multiprocessing support. (Auth.)
Token-Aware Completion Functions for Elastic Processor Verification

Directory of Open Access Journals (Sweden)

Sudarshan K. Srinivasan

2009-01-01

Full Text Available We develop a formal verification procedure to check that elastic pipelined processor designs correctly implement their instruction set architecture (ISA specifications. The notion of correctness we use is based on refinement. Refinement proofs are based on refinement maps, which—in the context of this problem—are functions that map elastic processor states to states of the ISA specification model. Data flow in elastic architectures is complicated by the insertion of any number of buffers in any place in the design, making it hard to construct refinement maps for elastic systems in a systematic manner. We introduce token-aware completion functions, which incorporate a mechanism to track the flow of data in elastic pipelines, as a highly automated and systematic approach to construct refinement maps. We demonstrate the efficiency of the overall verification procedure based on token-aware completion functions using six elastic pipelined processor models based on the DLX architecture.
Point and track-finding processors for multiwire chambers

CERN Document Server

Hansroul, M

1973-01-01

The hardware processors described below are designed to be used in conjunction with multi-wire chambers. They have the characteristic of being based on computational methods in contrast to analogue procedures. In a sense, they are hardware implementations of computer programs. But, being specially designed for their purpose, they are free of the restrictions imposed by the architecture of the computer on which the equivalent program is to run. The parallelism inherent in the algorithms can thus be fully exploited. Combined with the use of fast access scratch-pad memories and the non-sequential nature of the control program, the parallelism accounts for the fact that these processors are expected to execute 2-3 orders of magnitude faster than the equivalent Fortran programs on a CDC 7600 or 6600. As a consequence, methods which are simple and straightforward, but which are impractical because they require an exorbitant amount of computer time can on the contrary be very attractive for hardware implementation. ...
Multibus-based parallel processor for simulation

Science.gov (United States)

Ogrady, E. P.; Wang, C.-H.

1983-01-01

A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.
Extending and implementing the Self-adaptive Virtual Processor for distributed memory architectures

NARCIS (Netherlands)

van Tol, M.W.; Koivisto, J.

2011-01-01

Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current
A Ravenscar-Java profile implementation

DEFF Research Database (Denmark)

Thomsen, Bent; Ravn, Anders Peter; Søndergaard, Hans

2006-01-01

This paper presents an implementation of the Ravenscar-Java profile. While most implementations of the profile are reference-implementations showing that it is possible to implement the profile, our implementation is aimed at industrial applications. It uses a dedicated real-time Java processor......, since we want to investigate if the Ravenscar-Java profile, implemented on a Java processor, is efficient for real applications. During the implementation some ambiguities and weaknesses of the profile were uncovered. However, test examples indicate that the profile is suitable for development...... of realistic real-time programs....
Simulation of a parallel processor on a serial processor: The neutron diffusion equation

International Nuclear Information System (INIS)

Honeck, H.C.

1981-01-01

Parallel processors could provide the nuclear industry with very high computing power at a very moderate cost. Will we be able to make effective use of this power. This paper explores the use of a very simple parallel processor for solving the neutron diffusion equation to predict power distributions in a nuclear reactor. We first describe a simple parallel processor and estimate its theoretical performance based on the current hardware technology. Next, we show how the parallel processor could be used to solve the neutron diffusion equation. We then present the results of some simulations of a parallel processor run on a serial processor and measure some of the expected inefficiencies. Finally we extrapolate the results to estimate how actual design codes would perform. We find that the standard numerical methods for solving the neutron diffusion equation are still applicable when used on a parallel processor. However, some simple modifications to these methods will be necessary if we are to achieve the full power of these new computers. (orig.) [de
Functional unit for a processor

NARCIS (Netherlands)

Rohani, A.; Kerkhoff, Hans G.

2013-01-01

The invention relates to a functional unit for a processor, such as a Very Large Instruction Word Processor. The invention further relates to a processor comprising at least one such functional unit. The invention further relates to a functional unit and processor capable of mitigating the effect of
An evaluation of safety-critical Java on a Java processor

OpenAIRE

Rios Rivas, Juan Ricardo; Schoeberl, Martin

2014-01-01

The safety-critical Java (SCJ) specification provides a restricted set of the Java language intended for applications that require certification. In order to test the specification, implementations are emerging and the need to evaluate those implementations in a systematic way is becoming important. In this paper we evaluate our SCJ implementation which is based on the Java Optimized Processor JOP and we measure different performance and timeliness criteria relevant to hard real-time systems....

Adaptive signal processor

Energy Technology Data Exchange (ETDEWEB)

Walz, H.V.

1980-07-01

An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 ..mu..sec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed.
Adaptive signal processor

International Nuclear Information System (INIS)

Walz, H.V.

1980-07-01

An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μsec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed
Parallel implementation of multireference coupled-cluster theories based on the reference-level parallelism

Energy Technology Data Exchange (ETDEWEB)

Brabec, Jiri; Pittner, Jiri; van Dam, Hubertus JJ; Apra, Edoardo; Kowalski, Karol

2012-02-01

A novel algorithm for implementing general type of multireference coupled-cluster (MRCC) theory based on the Jeziorski-Monkhorst exponential Ansatz [B. Jeziorski, H.J. Monkhorst, Phys. Rev. A 24, 1668 (1981)] is introduced. The proposed algorithm utilizes processor groups to calculate the equations for the MRCC amplitudes. In the basic formulation each processor group constructs the equations related to a specific subset of references. By flexible choice of processor groups and subset of reference-specific sufficiency conditions designated to a given group one can assure optimum utilization of available computing resources. The performance of this algorithm is illustrated on the examples of the Brillouin-Wigner and Mukherjee MRCC methods with singles and doubles (BW-MRCCSD and Mk-MRCCSD). A significant improvement in scalability and in reduction of time to solution is reported with respect to recently reported parallel implementation of the BW-MRCCSD formalism [J.Brabec, H.J.J. van Dam, K. Kowalski, J. Pittner, Chem. Phys. Lett. 514, 347 (2011)].
Multithreading in vector processors

Science.gov (United States)

Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi

2018-01-16

In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
Implementation of an FIR Band Pass Filter Using a Bit-Slice Processor.

Science.gov (United States)

1987-06-01

SYSTEM ’ SOFTWARE FIRMWARE HARDWARE Figure 2.1 Instruction Levels CRef. 5] 16 are microprogrammed (firmware) to enable physical control signals to the...Controllers and ALUs, pp. 9, 30-42, 70-71, Garland STPM Press, 1981. 6. Wolfe, C.F., "Bit-slice Processors Come To Mainframe Design," Electronics
Real-time wavefront processors for the next generation of adaptive optics systems: a design and analysis

Science.gov (United States)

Truong, Tuan; Brack, Gary L.; Troy, Mitchell; Trinh, Thang; Shi, Fang; Dekany, Richard G.

2003-02-01

Adaptive optics (AO) systems currently under investigation will require at least two orders of magitude increase in the number of actuators, which in turn translates to effectively a 104 increase in compute latency. Since the performance of an AO system invariably improves as the compute latency decreases, it is important to study how today's computer systems will scale to address this expected increase in actuator utilization. This paper answers this question by characterizing the performance of a single deformable mirror (DM) Shack-Hartmann natural guide star AO system implemented on the present-generation digital signal processor (DSP) TMS320C6701 from Texas Instruments. We derive the compute latency of such a system in terms of a few basic parameters, such as the number of DM actuators, the number of data channels used to read out the camera pixels, the number of DSPs, the available memory bandwidth, as well as the inter-processor communication (IPC) bandwidth and the pixel transfer rate. We show how the results would scale for future systems that utilizes multiple DMs and guide stars. We demonstrate that the principal performance bottleneck of such a system is the available memory bandwidth of the processors and to lesser extent the IPC bandwidth. This paper concludes with suggestions for mitigating this bottleneck.
Parallel Processor for 3D Recovery from Optical Flow

Directory of Open Access Journals (Sweden)

Jose Hugo Barron-Zambrano

2009-01-01

Full Text Available 3D recovery from motion has received a major effort in computer vision systems in the recent years. The main problem lies in the number of operations and memory accesses to be performed by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main feature is the maximum reuse of data and the low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.
Programmable optical processor chips: toward photonic RF filters with DSP-level flexibility and MHz-band selectivity

Directory of Open Access Journals (Sweden)

Xie Yiwei

2017-12-01

Full Text Available Integrated optical signal processors have been identified as a powerful engine for optical processing of microwave signals. They enable wideband and stable signal processing operations on miniaturized chips with ultimate control precision. As a promising application, such processors enables photonic implementations of reconfigurable radio frequency (RF filters with wide design flexibility, large bandwidth, and high-frequency selectivity. This is a key technology for photonic-assisted RF front ends that opens a path to overcoming the bandwidth limitation of current digital electronics. Here, the recent progress of integrated optical signal processors for implementing such RF filters is reviewed. We highlight the use of a low-loss, high-index-contrast stoichiometric silicon nitride waveguide which promises to serve as a practical material platform for realizing high-performance optical signal processors and points toward photonic RF filters with digital signal processing (DSP-level flexibility, hundreds-GHz bandwidth, MHz-band frequency selectivity, and full system integration on a chip scale.
Microlens array processor with programmable weight mask and direct optical input

Science.gov (United States)

Schmid, Volker R.; Lueder, Ernst H.; Bader, Gerhard; Maier, Gert; Siegordner, Jochen

1999-03-01

We present an optical feature extraction system with a microlens array processor. The system is suitable for online implementation of a variety of transforms such as the Walsh transform and DCT. Operating with incoherent light, our processor accepts direct optical input. Employing a sandwich- like architecture, we obtain a very compact design of the optical system. The key elements of the microlens array processor are a square array of 15 X 15 spherical microlenses on acrylic substrate and a spatial light modulator as transmissive mask. The light distribution behind the mask is imaged onto the pixels of a customized a-Si image sensor with adjustable gain. We obtain one output sample for each microlens image and its corresponding weight mask area as summation of the transmitted intensity within one sensor pixel. The resulting architecture is very compact and robust like a conventional camera lens while incorporating a high degree of parallelism. We successfully demonstrate a Walsh transform into the spatial frequency domain as well as the implementation of a discrete cosine transform with digitized gray values. We provide results showing the transformation performance for both synthetic image patterns and images of natural texture samples. The extracted frequency features are suitable for neural classification of the input image. Other transforms and correlations can be implemented in real-time allowing adaptive optical signal processing.
Satellite on-board real-time SAR processor prototype

Science.gov (United States)

Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

2017-11-01

A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and
Integrated fuel processor development

International Nuclear Information System (INIS)

Ahmed, S.; Pereira, C.; Lee, S. H. D.; Krumpelt, M.

2001-01-01

The Department of Energy's Office of Advanced Automotive Technologies has been supporting the development of fuel-flexible fuel processors at Argonne National Laboratory. These fuel processors will enable fuel cell vehicles to operate on fuels available through the existing infrastructure. The constraints of on-board space and weight require that these fuel processors be designed to be compact and lightweight, while meeting the performance targets for efficiency and gas quality needed for the fuel cell. This paper discusses the performance of a prototype fuel processor that has been designed and fabricated to operate with liquid fuels, such as gasoline, ethanol, methanol, etc. Rated for a capacity of 10 kWe (one-fifth of that needed for a car), the prototype fuel processor integrates the unit operations (vaporization, heat exchange, etc.) and processes (reforming, water-gas shift, preferential oxidation reactions, etc.) necessary to produce the hydrogen-rich gas (reformate) that will fuel the polymer electrolyte fuel cell stacks. The fuel processor work is being complemented by analytical and fundamental research. With the ultimate objective of meeting on-board fuel processor goals, these studies include: modeling fuel cell systems to identify design and operating features; evaluating alternative fuel processing options; and developing appropriate catalysts and materials. Issues and outstanding challenges that need to be overcome in order to develop practical, on-board devices are discussed
The fast tracker processor for hadronic collider triggers

CERN Document Server

Annovi, A; Bardi, A; Carosi, R; Dell'Orso, Mauro; D'Onofrio, M; Giannetti, P; Iannaccone, G; Morsani, F; Pietri, M; Varotto, G

2000-01-01

Perspective for precise and fast track reconstruction in future hadronic collider experiments are addressed. We discuss the feasibility of a pipelined highly parallelized processor dedicated to the implementation of a very fast algorithm. The algorithm is based on the use of a large bank of pre-stored combinations of trajectory points (patterns) for extremely complex tracking systems. The CMS experiment at LHC is used as a benchmark. Tracking data from the events selected by the level-1 trigger are sorted and filtered by the Fast Tracker processor at a rate of 100 kHz. This data organization allows the level-2 trigger logic to reconstruct full resolution traces with transverse momentum above few GeV and search secondary vertexes within typical level-2 times. 15 Refs.
Design of Processors with Reconfigurable Microarchitecture

Directory of Open Access Journals (Sweden)

Andrey Mokhov

2014-01-01

Full Text Available Energy becomes a dominating factor for a wide spectrum of computations: from intensive data processing in “big data” companies resulting in large electricity bills, to infrastructure monitoring with wireless sensors relying on energy harvesting. In this context it is essential for a computation system to be adaptable to the power supply and the service demand, which often vary dramatically during runtime. In this paper we present an approach to building processors with reconfigurable microarchitecture capable of changing the way they fetch and execute instructions depending on energy availability and application requirements. We show how to use Conditional Partial Order Graphs to formally specify the microarchitecture of such a processor, explore the design possibilities for its instruction set, and synthesise the instruction decoder using correct-by-construction techniques. The paper is focused on the design methodology, which is evaluated by implementing a power-proportional version of Intel 8051 microprocessor.
Coordinated Energy Management in Heterogeneous Processors

Directory of Open Access Journals (Sweden)

Indrani Paul

2014-01-01

Full Text Available This paper examines energy management in a heterogeneous processor consisting of an integrated CPU–GPU for high-performance computing (HPC applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types – a new and less understood problem. We examine the intra-node CPU–GPU frequency sensitivity of HPC applications on tightly coupled CPU–GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU–GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves measured average energy-delay squared (ED2 product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.
An evaluation of safety-critical Java on a Java processor

DEFF Research Database (Denmark)

Rios Rivas, Juan Ricardo; Schoeberl, Martin

2014-01-01

The safety-critical Java (SCJ) specification provides a restricted set of the Java language intended for applications that require certification. In order to test the specification, implementations are emerging and the need to evaluate those implementations in a systematic way is becoming important....... In this paper we evaluate our SCJ implementation which is based on the Java Optimized Processor JOP and we measure different performance and timeliness criteria relevant to hard real-time systems. Our implementation targets Level 0 and Level1 of the specification and to test it we use a series of micro...
Design and development of microblaze processor based Remote Terminal Units for Fast Breeder Reactors

International Nuclear Information System (INIS)

Gour, Aditya; Santhanaraj, A.; Behera, R.P.; Murali, N.; Satyamurty, S.A.V.

2013-01-01

Remote Terminal Units (RTUs) are single board remote data acquisition and control systems that are widely used in FBRs during all states of plant operation. Distributed Digital Control System (DDCS) architecture is being followed for the plant control and operation, which mandates the need for multiple sockets support in TCPIP Ethernet communication in an embedded system. Existing RTUs are 89C51 microcontroller based systems where the TCPIP communication is done using Wiznet Module. These modules can support maximum of four sockets and are already obsolete from the market. In this paper a new RTU design is described where the complete digital logic of a board is implemented in one single FPGA device using Soft-core processor and EMAC controller with multiple socket support for the Ethernet communication. This makes design more reliable and immune to obsolescence. (author)
Treecode with a Special-Purpose Processor

Science.gov (United States)

Makino, Junichiro

1991-08-01

We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
3081/E processor and its on-line use

International Nuclear Information System (INIS)

Rankin, P.; Bricaud, B.; Gravina, M.

1985-05-01

The 3081/E is a second generation emulator of a mainframe IBM. One of it's applications will be to form part of the data acquisition system of the upgraded Mark II detector for data taking at the SLAC linear collider. Since the processor does not have direct connections to I/O devices a FASTBUS interface will be provided to allow communication with both SLAC Scanner Processors (which are responsible for the accumulation of data at a crate level) and the experiment's VAX 8600 mainframe. The 3081/E's will supply a significant amount of on-line computing power to the experiment (a single 3081/E is equivalent to 4 to 5 VAX 11/780's). A major advantage of the 3081/E is that program development can be done on an IBM mainframe (such as the one used for off-line analysis) which gives the programmer access to a full range of debugging tools. The processor's performance can be continually monitored by comparison of the results obtained using it to those given when the same program is run on an IBM computer. 9 refs
Demonstration of two-qubit algorithms with a superconducting quantum processor.

Science.gov (United States)

DiCarlo, L; Chow, J M; Gambetta, J M; Bishop, Lev S; Johnson, B R; Schuster, D I; Majer, J; Blais, A; Frunzio, L; Girvin, S M; Schoelkopf, R J

2009-07-09

Quantum computers, which harness the superposition and entanglement of physical states, could outperform their classical counterparts in solving problems with technological impact-such as factoring large numbers and searching databases. A quantum processor executes algorithms by applying a programmable sequence of gates to an initialized register of qubits, which coherently evolves into a final state containing the result of the computation. Building a quantum processor is challenging because of the need to meet simultaneously requirements that are in conflict: state preparation, long coherence times, universal gate operations and qubit readout. Processors based on a few qubits have been demonstrated using nuclear magnetic resonance, cold ion trap and optical systems, but a solid-state realization has remained an outstanding challenge. Here we demonstrate a two-qubit superconducting processor and the implementation of the Grover search and Deutsch-Jozsa quantum algorithms. We use a two-qubit interaction, tunable in strength by two orders of magnitude on nanosecond timescales, which is mediated by a cavity bus in a circuit quantum electrodynamics architecture. This interaction allows the generation of highly entangled states with concurrence up to 94 per cent. Although this processor constitutes an important step in quantum computing with integrated circuits, continuing efforts to increase qubit coherence times, gate performance and register size will be required to fulfil the promise of a scalable technology.
Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

Science.gov (United States)

Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

2017-11-01

Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.

Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

Energy Technology Data Exchange (ETDEWEB)

Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl [Faculty of Physics, University of Warsaw, ul. Pasteura 5, 02-093 Warsaw (Poland); Rudnicki, Witold R. [Institute of Informatics, University of Białystok, ul. Konstantego Ciołkowskiego 1M, 15-245 Białystok (Poland); Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw (Poland); Majewski, Jacek A. [Faculty of Physics, University of Warsaw, ul. Pasteura 5, 02-093 Warsaw (Poland)

2016-09-15

The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.
Single channel analog pulse processor Asic for gas proportional counters and SI detector

International Nuclear Information System (INIS)

Chandratre, V.B.; Sarkar, Soumen; Kataria, S.K.; Viyogi, Y.P.

2005-01-01

The paper presents the design and development of a single channel pulse processor in short Singleplex ASIC targeted for gas proportional counters/Si detectors. The design is optimized for the dynamic range of +500 fC to -500 fC with provision for externally adjusted pole-zero cancellation. A dedicated filter based on the de-convolution principle is used for the cancellation of the long hyperbolic signal tail produced by the slow drift of ions, typical in gas proportional with the filter time constants derived from the actual detector input signal shape. The pole-zero adjustment can be done by external dc voltage to achieve perfect base-line recovery to 1% after 5 μs. The simulated 0 pf noise is 500 e - rms for the peaking time of 1.2 μs with noise slope of 7e - -. The gain is 3.4 mv/fC over the entire linear dynamic range with power dissipation of 13 mW. This design is a modified version of Indiplex chip with features dynamic range equal gain on both polarities with nearly same noise and serves as diagnostic chip for Indiplex. The chip can be used for radiation monitoring instruments. (author)
The fast tracker processor for hadron collider triggers

CERN Document Server

Annovi, A; Bardi, A; Carosi, R; Dell'Orso, Mauro; D'Onofrio, M; Giannetti, P; Iannaccone, G; Morsani, E; Pietri, M; Varotto, G

2001-01-01

Perspectives for precise and fast track reconstruction in future hadron collider experiments are addressed. We discuss the feasibility of a pipelined highly parallel processor dedicated to the implementation of a very fast tracking algorithm. The algorithm is based on the use of a large bank of pre-stored combinations of trajectory points, called patterns, for extremely complex tracking systems. The CMS experiment at LHC is used as a benchmark. Tracking data from the events selected by the level-1 trigger are sorted and filtered by the Fast Tracker processor at an input rate of 100 kHz. This data organization allows the level-2 trigger logic to reconstruct full resolution tracks with transverse momentum above a few GeV and search for secondary vertices within typical level-2 times. (15 refs).
A VLSI image processor via pseudo-mersenne transforms

International Nuclear Information System (INIS)

Sei, W.J.; Jagadeesh, J.M.

1986-01-01

The computational burden on image processing in medical fields where a large amount of information must be processed quickly and accurately has led to consideration of special-purpose image processor chip design for some time. The very large scale integration (VLSI) resolution has made it cost-effective and feasible to consider the design of special purpose chips for medical imaging fields. This paper describes a VLSI CMOS chip suitable for parallel implementation of image processing algorithms and cyclic convolutions by using Pseudo-Mersenne Number Transform (PMNT). The main advantages of the PMNT over the Fast Fourier Transform (FFT) are: (1) no multiplications are required; (2) integer arithmetic is used. The design and development of this processor, which operates on 32-point convolution or 5 x 5 window image, are described
Tinuso: A processor architecture for a multi-core hardware simulation platform

DEFF Research Database (Denmark)

Schleuniger, Pascal; Karlsson, Sven

2010-01-01

Multi-core systems have the potential to improve performance, energy and cost properties of embedded systems but also require new design methods and tools to take advantage of the new architectures. Due to the limited accuracy and performance of pure software simulators, we are working on a cycle...... accurate hardware simulation platform. We have developed the Tinuso processor architecture for this platform. Tinuso is a processor architecture optimized for FPGA implementation. The instruction set makes use of predicated instructions and supports C/C++ and assembly language programming. It is designed...... to be easy extendable to maintain the exibility required for the research on multi-core systems. Tinuso contains a co-processor interface to connect to a network interface. This interface allow for communication over an on-chip network. A clock frequency estimation study on a deeply pipelined Tinuso...
Comparing the Overhead of Lock-based and Lock-free Implementations of Priority Queues

DEFF Research Database (Denmark)

Passas, Stavros; Karlsson, Sven

2011-01-01

. In this paper, we compare a lock-free implementation of a priority queue with a lock-based implementation. We perform experiments with processors of different generations and observe large performance differences for lock-free data structures depending on the processor generation. The lock-free implementation...... performs much better on the most recent processor generation. We investigate this performance trend, using a set of micro-benchmarks and show a significant difference in the overhead of atomic operations between processor generations. The lock-free implementation executes approximately three times as many...
Matrix preconditioning: a robust operation for optical linear algebra processors.

Science.gov (United States)

Ghosh, A; Paparao, P

1987-07-15

Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.
Embedded SoPC Design with Nios II Processor and Verilog Examples

CERN Document Server

Chu, Pong P

2012-01-01

Explores the unique hardware programmability of FPGA-based embedded systems, using a learn-by-doing approach to introduce the concepts and techniques for embedded SoPC design with Verilog An SoPC (system on a programmable chip) integrates a processor, memory modules, I/O peripherals, and custom hardware accelerators into a single FPGA (field-programmable gate array) device. In addition to the customized software, customized hardware can be developed and incorporated into the embedded system as well-allowing us to configure the soft-core processor, create tailored I/O interfaces, and develop s
Design and evaluation of an architecture for a digital signal processor for instrumentation applications

Science.gov (United States)

Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos

1990-03-01

The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16- x 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.
The performances of R GPU implementations of the GMRES method

Directory of Open Access Journals (Sweden)

Bogdan Oancea

2018-03-01

Full Text Available Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on single-threaded model of computation, using only a small fraction of the computational power available now for most desktops and laptops. Modern statistical software packages rely on high performance implementations of the linear algebra routines there are at the core of several important leading edge statistical methods. In this paper we present a GPU implementation of the GMRES iterative method for solving linear systems. We compare the performance of this implementation with a pure single threaded version of the CPU. We also investigate the performance of our implementation using different GPU packages available now for R such as gmatrix, gputools or gpuR which are based on CUDA or OpenCL frameworks.
Study of low insertion loss and miniaturization wavelet transform and inverse transform processor using SAW devices.

Science.gov (United States)

Jiang, Hua; Lu, Wenke; Zhang, Guoan

2013-07-01

In this paper, we propose a low insertion loss and miniaturization wavelet transform and inverse transform processor using surface acoustic wave (SAW) devices. The new SAW wavelet transform devices (WTDs) use the structure with two electrode-widths-controlled (EWC) single phase unidirectional transducers (SPUDT-SPUDT). This structure consists of the input withdrawal weighting interdigital transducer (IDT) and the output overlap weighting IDT. Three experimental devices for different scales 2(-1), 2(-2), and 2(-3) are designed and measured. The minimum insertion loss of the three devices reaches 5.49dB, 4.81dB, and 5.38dB respectively which are lower than the early results. Both the electrode width and the number of electrode pairs are reduced, thus making the three devices much smaller than the early devices. Therefore, the method described in this paper is suitable for implementing an arbitrary multi-scale low insertion loss and miniaturization wavelet transform and inverse transform processor using SAW devices. Copyright © 2013 Elsevier B.V. All rights reserved.
The processor farm for online triggering and full event reconstruction of the HERA-B experiment at HERA

International Nuclear Information System (INIS)

Gellrich, A.; Dippel, R.; Gensch, U.; Kowallik, R.; Legrand, I.C.; Leich, H.; Sun, F.; Wegner, P.

1996-01-01

The main goal of the HERA-B experiment which start taking data in 1988 is to study CP violation in B decays. This article describes the concept and the planned implementation of a multi-processor system, called processor farm,as the last part of the data acquisition and trigger system of the HERA B experiment. The third level trigger task and a full online event reconstruction will be performed on this processor farm, consisting of more then 100 powerful RISC processors which are based on commercial hardware boards. The controlling will be done by a real-time operating system which provides a software development environment, including FORTRAN and C compilers. (author)
Supertracker: A Programmable Parallel Pipeline Arithmetic Processor For Auto-Cueing Target Processing

Science.gov (United States)

Mack, Harold; Reddi, S. S.

1980-04-01

Supertracker represents a programmable parallel pipeline computer architecture that has been designed to meet the real time image processing requirements of auto-cueing target data processing. The prototype bread-board currently under development will be designed to perform input video preprocessing and processing for 525-line and 875-line TV formats FLIR video, automatic display gain and contrast control, and automatic target cueing, classification, and tracking. The video preprocessor is capable of performing operations full frames of video data in real time, e.g., frame integration, storage, 3 x 3 convolution, and neighborhood processing. The processor architecture is being implemented using bit-slice microprogrammable arithmetic processors, operating in parallel. Each processor is capable of up to 20 million operations per second. Multiple frame memories are used for additional flexibility.
The micro-processor controlled process radiation monitoring system for reactor safety systems

International Nuclear Information System (INIS)

Mizuno, K.; Noguchi, A.; Kumagami, S.; Gotoh, Y.; Kumahara, T.; Arita, S.

1986-01-01

Digital computers are soon expected to be applied to various real-time safety and safety-related systems in nuclear power plants. Hitachi is now engaged in the development of a micro-processor controlled process radiation monitoring system, which operates on digital processing methods employed with a log ratemeter. A newly defined methodology of design and test procedures is being applied as a means of software program verification for these safety systems. Recently implemented micro-processor technology will help to achieve an advanced man-machine interface and highly reliable performance. (author)
Keystone Business Models for Network Security Processors

Directory of Open Access Journals (Sweden)

Arthur Low

2013-07-01

Full Text Available Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor” models nor the silicon intellectual-property licensing (“IP-licensing” models allow small technology companies to successfully compete. This article describes an alternative approach that produces an ongoing stream of novel network security processors for niche markets through continuous innovation by both large and small companies. This approach, referred to here as the "business ecosystem model for network security processors", includes a flexible and reconfigurable technology platform, a “keystone” business model for the company that maintains the platform architecture, and an extended ecosystem of companies that both contribute and share in the value created by innovation. New opportunities for business model innovation by participating companies are made possible by the ecosystem model. This ecosystem model builds on: i the lessons learned from the experience of the first author as a senior integrated circuit architect for providers of public-key cryptography solutions and as the owner of a semiconductor startup, and ii the latest scholarly research on technology entrepreneurship, business models, platforms, and business ecosystems. This article will be of interest to all technology entrepreneurs, but it will be of particular interest to owners of small companies that provide security solutions and to specialized security professionals seeking to launch their own companies.
In-Network Adaptation of Video Streams Using Network Processors

Directory of Open Access Journals (Sweden)

Mohammad Shorfuzzaman

2009-01-01

problem can be addressed, near the network edge, by applying dynamic, in-network adaptation (e.g., transcoding of video streams to meet available connection bandwidth, machine characteristics, and client preferences. In this paper, we extrapolate from earlier work of Shorfuzzaman et al. 2006 in which we implemented and assessed an MPEG-1 transcoding system on the Intel IXP1200 network processor to consider the feasibility of in-network transcoding for other video formats and network processor architectures. The use of “on-the-fly” video adaptation near the edge of the network offers the promise of simpler support for a wide range of end devices with different display, and so forth, characteristics that can be used in different types of environments.
JPP: A Java Pre-Processor

OpenAIRE

Kiniry, Joseph R.; Cheong, Elaine

1998-01-01

The Java Pre-Processor, or JPP for short, is a parsing pre-processor for the Java programming language. Unlike its namesake (the C/C++ Pre-Processor, cpp), JPP provides functionality above and beyond simple textual substitution. JPP's capabilities include code beautification, code standard conformance checking, class and interface specification and testing, and documentation generation.
Design of an ultra-low-power digital processor for passive UHF RFID tags

Energy Technology Data Exchange (ETDEWEB)

Shi Wanggen; Zhuang Yiqi; Li Xiaoming; Wang Xianghua; Jin Zhao; Wang Dan, E-mail: wanggen_shi@163.co [Key Laboratory of the Ministry of Education for Wide Band-Gap Semiconductor Materials and Devices, Institute of Microelectronics, Xidian University, Xi' an 710071 (China)

2009-04-15

A new architecture of digital processors for passive UHF radio-frequency identification tags is proposed. This architecture is based on ISO/IEC 18000-6C and targeted at ultra-low power consumption. By applying methods like system-level power management, global clock gating and low voltage implementation, the total power of the design is reduced to a few microwatts. In addition, an innovative way for the design of a true RNG is presented, which contributes to both low power and secure data transaction. The digital processor is verified by an integrated FPGA platform and implemented by the Synopsys design kit for ASIC flows. The design fits different CMOS technologies and has been taped out using the 2P4M 0.35 mum process of Chartered Semiconductor.
A Fault Detection Mechanism in a Data-flow Scheduled Multithreaded Processor

NARCIS (Netherlands)

Fu, J.; Yang, Q.; Poss, R.; Jesshope, C.R.; Zhang, C.

2014-01-01

This paper designs and implements the Redundant Multi-Threading (RMT) in a Data-flow scheduled MultiThreaded (DMT) multicore processor, called Data-flow scheduled Redundant Multi-Threading (DRMT). Meanwhile, It presents Asynchronous Output Comparison (AOC) for RMT techniques to avoid fault detection
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

Science.gov (United States)

Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

2014-03-01

List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating List-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for mutli-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging

A general model of concurrency and its implementation as many-core dynamic RISC processors

NARCIS (Netherlands)

Bernard, T.; Bousias, K.; Guang, L.; Jesshope, C.R.; Lankamp, M.; van Tol, M.W.; Zhang, L.

2008-01-01

This paper presents a concurrent execution model and its micro-architecture based on in-order RISC processors, which schedules instructions from large pools of contextualised threads. The model admits a strategy for programming chip multiprocessors using parallelising compilers based on existing
Upgrade of the PreProcessor System for the ATLAS Level-1 Calorimeter Trigger

CERN Document Server

Khomich, A

2010-01-01

The ATLAS Level-1 Calorimeter Trigger is a hardware-based pipelined system designed to identify high-pT objects in the ATLAS calorimeters within a fixed latency of 2.5\\,us. It consists of three subsystems: the PreProcessor which conditions and digitizes analogue signals and two digital processors. The majority of the PreProcessor's tasks are performed on a dense Multi-Chip Module(MCM) consisting of FADCs, a time-adjustment and digital processing ASICs, and LVDS serialisers designed and implemented in ten years old technologies. An MCM substitute, based on today's components (dual channel FADCs and FPGA), is being developed to profit from state-of-the-art electronics and to enhance the flexibility of the digital processing. Development and first test results are presented.
Upgrade of the PreProcessor System for the ATLAS LVL1 Calorimeter Trigger

CERN Document Server

Khomich, A; The ATLAS collaboration

2010-01-01

The ATLAS Level-1 Calorimeter Trigger is a hardware-based pipelined system designed to identify high-pT objects in the ATLAS calorimeters within a fixed latency of 2.5us. It consists of three subsystems: the PreProcessor which conditions and digitizes analogue signals and two digital processors. The majority of the PreProcessor's tasks are performed on a dense Multi-Chip Module(MCM) consisting of FADCs, a time-adjustment and digital processing ASICs, and LVDS serializers designed and implemented in ten years old technologies. An MCM substitute, based on today's components (dual channel FADCs and FPGA), is being developed to profit from state-of-the-art electronics and to enhance the flexibility of the digital processing. Development and first test results are presented.
Producing chopped firewood with firewood processors

International Nuclear Information System (INIS)

Kaerhae, K.; Jouhiaho, A.

2009-01-01

The TTS Institute's research and development project studied both the productivity of new, chopped firewood processors (cross-cutting and splitting machines) suitable for professional and independent small-scale production, and the costs of the chopped firewood produced. Seven chopped firewood processors were tested in the research, six of which were sawing processors and one shearing processor. The chopping work was carried out using wood feeding racks and a wood lifter. The work was also carried out without any feeding appliances. Altogether 132.5 solid m 3 of wood were chopped in the time studies. The firewood processor used had the most significant impact on chopping work productivity. In addition to the firewood processor, the stem mid-diameter, the length of the raw material, and of the firewood were also found to affect productivity. The wood feeding systems also affected productivity. If there is a feeding rack and hydraulic grapple loader available for use in chopping firewood, then it is worth using the wood feeding rack. A wood lifter is only worth using with the largest stems (over 20 cm mid-diameter) if a feeding rack cannot be used. When producing chopped firewood from small-diameter wood, i.e. with a mid-diameter less than 10 cm, the costs of chopping work were over 10 EUR solid m -3 with sawing firewood processors. The shearing firewood processor with a guillotine blade achieved a cost level of 5 EUR solid m -3 when the mid-diameter of the chopped stem was 10 cm. In addition to the raw material, the cost-efficient chopping work also requires several hundred annual operating hours with a firewood processor, which is difficult for individual firewood entrepreneurs to achieve. The operating hours of firewood processors can be increased to the required level by the joint use of the processors by a number of firewood entrepreneurs. (author)
Programs for Testing Processor-in-Memory Computing Systems

Science.gov (United States)

Katz, Daniel S.

2006-01-01

The Multithreaded Microbenchmarks for Processor-In-Memory (PIM) Compilers, Simulators, and Hardware are computer programs arranged in a series for use in testing the performances of PIM computing systems, including compilers, simulators, and hardware. The programs at the beginning of the series test basic functionality; the programs at subsequent positions in the series test increasingly complex functionality. The programs are intended to be used while designing a PIM system, and can be used to verify that compilers, simulators, and hardware work correctly. The programs can also be used to enable designers of these system components to examine tradeoffs in implementation. Finally, these programs can be run on non-PIM hardware (either single-threaded or multithreaded) using the POSIX pthreads standard to verify that the benchmarks themselves operate correctly. [POSIX (Portable Operating System Interface for UNIX) is a set of standards that define how programs and operating systems interact with each other. pthreads is a library of pre-emptive thread routines that comply with one of the POSIX standards.
DiFX: A software correlator for very long baseline interferometry using multi-processor computing environments

OpenAIRE

Deller, A. T.; Tingay, S. J.; Bailes, M.; West, C.

2007-01-01

We describe the development of an FX style correlator for Very Long Baseline Interferometry (VLBI), implemented in software and intended to run in multi-processor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high performance computing, such as multi-processor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of softw...
Unified and Modular Modeling and Functional Verification Framework of Real-Time Image Signal Processors

Directory of Open Access Journals (Sweden)

Abhishek Jain

2016-01-01

Full Text Available In VLSI industry, image signal processing algorithms are developed and evaluated using software models before implementation of RTL and firmware. After the finalization of the algorithm, software models are used as a golden reference model for the image signal processor (ISP RTL and firmware development. In this paper, we are describing the unified and modular modeling framework of image signal processing algorithms used for different applications such as ISP algorithms development, reference for hardware (HW implementation, reference for firmware (FW implementation, and bit-true certification. The universal verification methodology- (UVM- based functional verification framework of image signal processors using software reference models is described. Further, IP-XACT based tools for automatic generation of functional verification environment files and model map files are described. The proposed framework is developed both with host interface and with core using virtual register interface (VRI approach. This modeling and functional verification framework is used in real-time image signal processing applications including cellphone, smart cameras, and image compression. The main motivation behind this work is to propose the best efficient, reusable, and automated framework for modeling and verification of image signal processor (ISP designs. The proposed framework shows better results and significant improvement is observed in product verification time, verification cost, and quality of the designs.
NMR-MPar: A Fault-Tolerance Approach for Multi-Core and Many-Core Processors

Directory of Open Access Journals (Sweden)

Vanessa Vargas

2018-03-01

Full Text Available Multi-core and many-core processors are a promising solution to achieve high performance by maintaining a lower power consumption. However, the degree of miniaturization makes them more sensitive to soft-errors. To improve the system reliability, this work proposes a fault-tolerance approach based on redundancy and partitioning principles called N-Modular Redundancy and M-Partitions (NMR-MPar. By combining both principles, this approach allows multi-/many-core processors to perform critical functions in mixed-criticality systems. Benefiting from the capabilities of these devices, NMR-MPar creates different partitions that perform independent functions. For critical functions, it is proposed that N partitions with the same configuration participate of an N-modular redundancy system. In order to validate the approach, a case study is implemented on the KALRAY Multi-Purpose Processing Array (MPPA-256 many-core processor running two parallel benchmark applications. The traveling salesman problem and matrix multiplication applications were selected to test different device’s resources. The effectiveness of NMR-MPar is assessed by software-implemented fault-injection. For evaluation purposes, it is considered that the system is intended to be used in avionics. Results show the improvement of the application reliability by two orders of magnitude when implementing NMR-MPar on the system. Finally, this work opens the possibility to use massive parallelism for dependable applications in embedded systems.
ARTiS, an Asymmetric Real-Time Scheduler for Linux on Multi-Processor Architectures

OpenAIRE

Piel , Éric; Marquet , Philippe; Soula , Julien; Osuna , Christophe; Dekeyser , Jean-Luc

2005-01-01

The ARTiS system is a real-time extension of the GNU/Linux scheduler dedicated to SMP (Symmetric Multi-Processors) systems. It allows to mix High Performance Computing and real-time. ARTiS exploits the SMP architecture to guarantee the preemption of a processor when the system has to schedule a real-time task. The implementation is available as a modification of the Linux kernel, especially focusing (but not restricted to) IA-64 architecture. The basic idea of ARTiS is to assign a selected se...
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

Energy Technology Data Exchange (ETDEWEB)

Barhen, Jacob [ORNL; Kerekes, Ryan A [ORNL; ST Charles, Jesse Lee [ORNL; Buckner, Mark A [ORNL

2008-01-01

performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

International Nuclear Information System (INIS)

Barhen, Jacob; Kerekes, Ryan A.; St Charles, Jesse Lee; Buckner, Mark A.

2008-01-01

performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R and D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
Optical Associative Processors For Visual Perception"

Science.gov (United States)

Casasent, David; Telfer, Brian

1988-05-01

We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given most attention). We analyze the performance of both associative processors and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature space data), and the analysis of multiple objects in high noise (which is achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature space and symbolic data, as well as adaptive associative processors.
AMD's 64-bit Opteron processor

CERN Multimedia

CERN. Geneva

2003-01-01

This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included.BiographiesDavid RichDavid directs AMD's efforts in high performance computing and also in the use of Opteron processors...
Composable processor virtualization for embedded systems

NARCIS (Netherlands)

Molnos, A.M.; Milutinovic, A.; She, D.; Goossens, K.G.W.

2010-01-01

Processor virtualization divides a physical processor's time among a set of virual machines, enabling efficient hardware utilization, application security and allowing co-existence of different operating systems on the same processor. Through initially intended for the server domain, virtualization
VLSI System Implementation of 200 MHz, 8-bit, 90nm CMOS Arithmetic and Logic Unit (ALU Processor Controller

Directory of Open Access Journals (Sweden)

Fazal NOORBASHA

2012-08-01

Full Text Available In this present study includes the Very Large Scale Integration (VLSI system implementation of 200MHz, 8-bit, 90nm Complementary Metal Oxide Semiconductor (CMOS Arithmetic and Logic Unit (ALU processor control with logic gate design style and 0.12µm six metal 90nm CMOS fabrication technology. The system blocks and the behaviour are defined and the logical design is implemented in gate level in the design phase. Then, the logic circuits are simulated and the subunits are converted in to 90nm CMOS layout. Finally, in order to construct the VLSI system these units are placed in the floor plan and simulated with analog and digital, logic and switch level simulators. The results of the simulations indicates that the VLSI system can control different instructions which can divided into sub groups: transfer instructions, arithmetic and logic instructions, rotate and shift instructions, branch instructions, input/output instructions, control instructions. The data bus of the system is 16-bit. It runs at 200MHz, and operating power is 1.2V. In this paper, the parametric analysis of the system, the design steps and obtained results are explained.
A light hydrocarbon fuel processor producing high-purity hydrogen

Science.gov (United States)

Löffler, Daniel G.; Taylor, Kyle; Mason, Dylan

This paper discusses the design process and presents performance data for a dual fuel (natural gas and LPG) fuel processor for PEM fuel cells delivering between 2 and 8 kW electric power in stationary applications. The fuel processor resulted from a series of design compromises made to address different design constraints. First, the product quality was selected; then, the unit operations needed to achieve that product quality were chosen from the pool of available technologies. Next, the specific equipment needed for each unit operation was selected. Finally, the unit operations were thermally integrated to achieve high thermal efficiency. Early in the design process, it was decided that the fuel processor would deliver high-purity hydrogen. Hydrogen can be separated from other gases by pressure-driven processes based on either selective adsorption or permeation. The pressure requirement made steam reforming (SR) the preferred reforming technology because it does not require compression of combustion air; therefore, steam reforming is more efficient in a high-pressure fuel processor than alternative technologies like autothermal reforming (ATR) or partial oxidation (POX), where the combustion occurs at the pressure of the process stream. A low-temperature pre-reformer reactor is needed upstream of a steam reformer to suppress coke formation; yet, low temperatures facilitate the formation of metal sulfides that deactivate the catalyst. For this reason, a desulfurization unit is needed upstream of the pre-reformer. Hydrogen separation was implemented using a palladium alloy membrane. Packed beds were chosen for the pre-reformer and reformer reactors primarily because of their low cost, relatively simple operation and low maintenance. Commercial, off-the-shelf balance of plant (BOP) components (pumps, valves, and heat exchangers) were used to integrate the unit operations. The fuel processor delivers up to 100 slm hydrogen >99.9% pure with <1 ppm CO, <3 ppm CO 2. The
System Level Design of Reconfigurable Server Farms Using Elliptic Curve Cryptography Processor Engines

Directory of Open Access Journals (Sweden)

Sangook Moon

2014-01-01

Full Text Available As today’s hardware architecture becomes more and more complicated, it is getting harder to modify or improve the microarchitecture of a design in register transfer level (RTL. Consequently, traditional methods we have used to develop a design are not capable of coping with complex designs. In this paper, we suggest a way of designing complex digital logic circuits with a soft and advanced type of SystemVerilog at an electronic system level. We apply the concept of design-and-reuse with a high level of abstraction to implement elliptic curve crypto-processor server farms. With the concept of the superior level of abstraction to the RTL used with the traditional HDL design, we successfully achieved the soft implementation of the crypto-processor server farms as well as robust test bench code with trivial effort in the same simulation environment. Otherwise, it could have required error-prone Verilog simulations for the hardware IPs and other time-consuming jobs such as C/SystemC verification for the software, sacrificing more time and effort. In the design of the elliptic curve cryptography processor engine, we propose a 3X faster GF(2m serial multiplication architecture.
Implementation of Hardware Accelerators on Zynq

DEFF Research Database (Denmark)

Toft, Jakob Kenn

of the ARM Cortex-9 processor featured on the Zynq SoC, with regard to execution time, power dissipation and energy consumption. The implementation of the hardware accelerators were successful. Use of the Monte Carlo processor resulted in a significant increase in performance. The Telco hardware accelerator......In the recent years it has become obvious that the performance of general purpose processors are having trouble meeting the requirements of high performance computing applications of today. This is partly due to the relatively high power consumption, compared to the performance, of general purpose...... processors, which has made hardware accelerators an essential part of several datacentres and the worlds fastest super-computers. In this work, two different hardware accelerators were implemented on a Xilinx Zynq SoC platform mounted on the ZedBoard platform. The two accelerators are based on two different...
Design of an ultra-low-power digital processor for passive UHF RFID tags

International Nuclear Information System (INIS)

Shi Wanggen; Zhuang Yiqi; Li Xiaoming; Wang Xianghua; Jin Zhao; Wang Dan

2009-01-01

A new architecture of digital processors for passive UHF radio-frequency identification tags is proposed. This architecture is based on ISO/IEC 18000-6C and targeted at ultra-low power consumption. By applying methods like system-level power management, global clock gating and low voltage implementation, the total power of the design is reduced to a few microwatts. In addition, an innovative way for the design of a true RNG is presented, which contributes to both low power and secure data transaction. The digital processor is verified by an integrated FPGA platform and implemented by the Synopsys design kit for ASIC flows. The design fits different CMOS technologies and has been taped out using the 2P4M 0.35 μm process of Chartered Semiconductor.
Deterministic chaos in the processor load

International Nuclear Information System (INIS)

Halbiniak, Zbigniew; Jozwiak, Ireneusz J.

2007-01-01

In this article we present the results of research whose purpose was to identify the phenomenon of deterministic chaos in the processor load. We analysed the time series of the processor load during efficiency tests of database software. Our research was done on a Sparc Alpha processor working on the UNIX Sun Solaris 5.7 operating system. The conducted analyses proved the presence of the deterministic chaos phenomenon in the processor load in this particular case

A longitudinal multi-bunch feedback system using parallel digital signal processors

International Nuclear Information System (INIS)

Sapozhnikov, L.; Fox, J.D.; Olsen, J.J.; Oxoby, G.; Linscott, I.; Drago, A.; Serio, M.

1994-01-01

A programmable longitudinal feedback system based on four AT ampersand T 1610 digital signal processors has been developed as a component of the PEP-II R ampersand D program. This longitudinal quick prototype is a proof of concept for the PEP-II system and implements full-speed bunch-by-bunch signal processing for storage rings with bunch spacings of 4 ns. The design incorporates a phase-detector-based front end that digitizes the oscillation phases of bunches at the 250 MHz crossing rate, four programmable signal processors that compute correction signals, and a 250-MHz hold buffer/kicker driver stage that applies correction signals back on the beam. The design implements a general-purpose, table-driven downsampler that allows the system to be operated at several accelerator facilities. The hardware architecture of the signal processing is described, and the software algorithms used in the feedback signal computation are discussed. The system configuration used for tests at the LBL Advanced Light Source is presented
An "artificial retina" processor for track reconstruction at the full LHC crossing rate

Science.gov (United States)

Abba, A.; Bedeschi, F.; Caponio, F.; Cenci, R.; Citterio, M.; Cusimano, A.; Fu, J.; Geraci, A.; Grizzuti, M.; Lusardi, N.; Marino, P.; Morello, M. J.; Neri, N.; Ninci, D.; Petruzzo, M.; Piucci, A.; Punzi, G.; Ristori, L.; Spinella, F.; Stracka, S.; Tonelli, D.; Walsh, J.

2016-07-01

We present the latest results of an R&D study for a specialized processor capable of reconstructing, in a silicon pixel detector, high-quality tracks from high-energy collision events at 40 MHz. The processor applies a highly parallel pattern-recognition algorithm inspired to quick detection of edges in mammals visual cortex. After a detailed study of a real-detector application, demonstrating that online reconstruction of offline-quality tracks is feasible at 40 MHz with sub-microsecond latency, we are implementing a prototype using common high-bandwidth FPGA devices.
Image Matrix Processor for Volumetric Computations Final Report CRADA No. TSB-1148-95

Energy Technology Data Exchange (ETDEWEB)

Roberson, G. Patrick [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Browne, Jolyon [Advanced Research & Applications Corporation, Sunnyvale, CA (United States)

2018-01-22

The development of an Image Matrix Processor (IMP) was proposed that would provide an economical means to perform rapid ray-tracing processes on volume "Giga Voxel" data sets. This was a multi-phased project. The objective of the first phase of the IMP project was to evaluate the practicality of implementing a workstation-based Image Matrix Processor for use in volumetric reconstruction and rendering using hardware simulation techniques. Additionally, ARACOR and LLNL worked together to identify and pursue further funding sources to complete a second phase of this project.
Real-time tracking with a 3D-flow processor array

International Nuclear Information System (INIS)

Crosetto, D.

1993-01-01

The problem of real-time track-finding has been performed to date with CAM (Content Addressable Memories) or with fast coincidence logic, because the processing scheme was though to have much slower performance. Advances in technology together with a new architectural approach make it feasible to also explore the computing technique for real-time track finding thus giving the advantages of implementing algorithms that can find more parameters such as calculate the sagitta, curvature, pt, etc. with respect to the CAM approach. This report describes real-time track finding using a new computing approach technique based on the 3D-flow array processor system. This system consists of a fixed interconnection architexture scheme, allowing flexible algorithm implementation on a scalable platform. The 3D-Flow parallel processing system for track finding is scalable in size and performance by either increasing the number of processors, or increasing the speed or else the number of pipelined stages. The present article describes the conceptual idea and the design stage of the project
Real-time tracking with a 3D-Flow processor array

International Nuclear Information System (INIS)

Crosetto, D.

1993-06-01

The problem of real-time track-finding has been performed to date with CAM (Content Addressable Memories) or with fast coincidence logic, because the processing scheme was thought to have much slower performance. Advances in technology together with a new architectural approach make it feasible to also explore the computing technique for real-time track finding thus giving the advantages of implementing algorithms that can find more parameters such as calculate the sagitta, curvature, pt, etc., with respect to the CAM approach. The report describes real-time track finding using new computing approach technique based on the 3D-Flow array processor system. This system consists of a fixed interconnection architecture scheme, allowing flexible algorithm implementation on a scalable platform. The 3D-Flow parallel processing system for track finding is scalable in size and performance by either increasing the number of processors, or increasing the speed or else the number of pipelined stages. The present article describes the conceptual idea and the design stage of the project
Redesigned-Scale-Free CORDIC Algorithm Based FPGA Implementation of Window Functions to Minimize Area and Latency

Directory of Open Access Journals (Sweden)

Supriya Aggarwal

2012-01-01

Full Text Available One of the most important steps in spectral analysis is filtering, where window functions are generally used to design filters. In this paper, we modify the existing architecture for realizing the window functions using CORDIC processor. Firstly, we modify the conventional CORDIC algorithm to reduce its latency and area. The proposed CORDIC algorithm is completely scale-free for the range of convergence that spans the entire coordinate space. Secondly, we realize the window functions using a single CORDIC processor as against two serially connected CORDIC processors in existing technique, thus optimizing it for area and latency. The linear CORDIC processor is replaced by a shift-add network which drastically reduces the number of pipelining stages required in the existing design. The proposed design on an average requires approximately 64% less pipeline stages and saves up to 44.2% area. Currently, the processor is designed to implement Blackman windowing architecture, which with slight modifications can be extended to other widow functions as well. The details of the proposed architecture are discussed in the paper.
Fast volume reconstruction in positron emission tomography: Implementation of four algorithms on a high-performance scalable parallel platform

International Nuclear Information System (INIS)

Egger, M.L.; Scheurer, A.H.; Joseph, C.

1996-01-01

The issue of long reconstruction times in PET has been addressed from several points of view, resulting in an affordable dedicated system capable of handling routine 3D reconstruction in a few minutes per frame: on the hardware side using fast processors and a parallel architecture, and on the software side, using efficient implementations of computationally less intensive algorithms. Execution times obtained for the PRT-1 data set on a parallel system of five hybrid nodes, each combining an Alpha processor for computation and a transputer for communication, are the following (256 sinograms of 96 views by 128 radial samples): Ramp algorithm 56 s, Favor 81 s and reprojection algorithm of Kinahan and Rogers 187 s. The implementation of fast rebinning algorithms has shown our hardware platform to become communications-limited; they execute faster on a conventional single-processor Alpha workstation: single-slice rebinning 7 s, Fourier rebinning 22 s, 2D filtered backprojection 5 s. The scalability of the system has been demonstrated, and a saturation effect at network sizes above ten nodes has become visible; new T9000-based products lifting most of the constraints on network topology and link throughput are expected to result in improved parallel efficiency and scalability properties
Design of massively parallel hardware multi-processors for highly-demanding embedded applications

NARCIS (Netherlands)

Jozwiak, L.; Jan, Y.

2013-01-01

Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor system-on-a-chip
Neurovision processor for designing intelligent sensors

Science.gov (United States)

Gupta, Madan M.; Knopf, George K.

1992-03-01

A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi- functional vision sensor that performs a variety of information processing operations on time- varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.
VIRTUS: a multi-processor system in FASTBUS

International Nuclear Information System (INIS)

Ellett, J.; Jackson, R.; Ritter, R.; Schlein, P.; Yaeger, D.; Zweizig, J.

1986-01-01

VIRTUS is a system of parallel MC68000-based processors interconnected by FASTBUS that is used either on-line as an intelligent trigger component or off-line for full event processing. Each processor receives the complete set of data from one event. The host computer, a VAX 11/780, down-line loads all software to the processors, controls and monitors the functioning of all processors, and writes processed data to tape. Instructions, programs, and data are transferred among the processors and the host in the form of fixed format, variable length data blocks. (Auth.)
JIST: Just-In-Time Scheduling Translation for Parallel Processors

Directory of Open Access Journals (Sweden)

Giovanni Agosta

2005-01-01

Full Text Available The application fields of bytecode virtual machines and VLIW processors overlap in the area of embedded and mobile systems, where the two technologies offer different benefits, namely high code portability, low power consumption and reduced hardware cost. Dynamic compilation makes it possible to bridge the gap between the two technologies, but special attention must be paid to software instruction scheduling, a must for the VLIW architectures. We have implemented JIST, a Virtual Machine and JIT compiler for Java Bytecode targeted to a VLIW processor. We show the impact of various optimizations on the performance of code compiled with JIST through the experimental study on a set of benchmark programs. We report significant speedups, and increments in the number of instructions issued per cycle up to 50% with respect to the non-scheduling version of the JITcompiler. Further optimizations are discussed.
GOTHIC memory management : a multiprocessor shared single level store

OpenAIRE

Michel , Béatrice

1990-01-01

Gothic purpose is to build an object-oriented fault-tolerant distributed operating system for a local area network of multiprocessor workstations. This paper describes Gothic memory manager. It realizes the sharing of the secondary memory space between any process running on the Gothic system. Processes on different processors can communicate by sharing permanent information. The manager implements a shared single level storage with an invalidation protocol working on disk-pages to maintain s...
A lock circuit for a multi-core processor

DEFF Research Database (Denmark)

2015-01-01

An integrated circuit comprising a multiple processor cores and a lock circuit that comprises a queue register with respective bits set or reset via respective, connections dedicated to respective processor cores, whereby the queue register identifies those among the multiple processor cores...... that are enqueued in the queue register. Furthermore, the integrated circuit comprises a current register and a selector circuit configured to select a processor core and identify that processor core by a value in the current register. A selected processor core is a prioritized processor core among the cores...... configured with an integrated circuit; and a silicon die configured with an integrated circuit....
Color sensor and neural processor on one chip

Science.gov (United States)

Fiesler, Emile; Campbell, Shannon R.; Kempem, Lother; Duong, Tuan A.

1998-10-01

Low-cost, compact, and robust color sensor that can operate in real-time under various environmental conditions can benefit many applications, including quality control, chemical sensing, food production, medical diagnostics, energy conservation, monitoring of hazardous waste, and recycling. Unfortunately, existing color sensor are either bulky and expensive or do not provide the required speed and accuracy. In this publication we describe the design of an accurate real-time color classification sensor, together with preprocessing and a subsequent neural network processor integrated on a single complementary metal oxide semiconductor (CMOS) integrated circuit. This one-chip sensor and information processor will be low in cost, robust, and mass-producible using standard commercial CMOS processes. The performance of the chip and the feasibility of its manufacturing is proven through computer simulations based on CMOS hardware parameters. Comparisons with competing methodologies show a significantly higher performance for our device.
Nuclear interactive evaluations on distributed processors

International Nuclear Information System (INIS)

Dix, G.E.; Congdon, S.P.

1988-01-01

BWR [boiling water reactor] nuclear design is a complicated process, involving trade-offs among a variety of conflicting objectives. Complex computer calculations and usually required for each design iteration. GE Nuclear Energy has implemented a system where the evaluations are performed interactively on a large number of small microcomputers. This approach minimizes the time it takes to carry out design iterations even through the processor speeds are low compared with modern super computers. All of the desktop microcomputers are linked to a common data base via an ethernet communications system so that design data can be shared and data quality can be maintained
A frequency and sensitivity tunable microresonator array for high-speed quantum processor readout

Energy Technology Data Exchange (ETDEWEB)

Whittaker, J. D., E-mail: jwhittaker@dwavesys.com; Swenson, L. J.; Volkmann, M. H.; Spear, P.; Altomare, F.; Berkley, A. J.; Bunyk, P.; Harris, R.; Hilton, J. P.; Hoskinson, E.; Johnson, M. W.; Ladizinsky, E.; Lanting, T.; Oh, T.; Perminov, I.; Tolkacheva, E.; Yao, J. [D-Wave Systems, Inc., Burnaby, British Columbia V5G 4M9 (Canada); Bumble, B.; Day, P. K.; Eom, B. H. [Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California 91109 (United States); and others

2016-01-07

Superconducting microresonators have been successfully utilized as detection elements for a wide variety of applications. With multiplexing factors exceeding 1000 detectors per transmission line, they are the most scalable low-temperature detector technology demonstrated to date. For high-throughput applications, fewer detectors can be coupled to a single wire but utilize a larger per-detector bandwidth. For all existing designs, fluctuations in fabrication tolerances result in a non-uniform shift in resonance frequency and sensitivity, which ultimately limits the efficiency of bandwidth utilization. Here, we present the design, implementation, and initial characterization of a superconducting microresonator readout integrating two tunable inductances per detector. We demonstrate that these tuning elements provide independent control of both the detector frequency and sensitivity, allowing us to maximize the transmission line bandwidth utilization. Finally, we discuss the integration of these detectors in a multilayer fabrication stack for high-speed readout of the D-Wave quantum processor, highlighting the use of control and routing circuitry composed of single-flux-quantum loops to minimize the number of control wires at the lowest temperature stage.
Level Zero Trigger processor for the ultra rare kaon decay experiment—NA62

CERN Document Server

Chiozzi, S; Gianoli, A; Mila, G; Neri, I; Petrucci, F; Soldi, D

2016-01-01

n the NA62 experiment at CERN-SPS the communication between detectors and the Lowest Level (L0) trigger processor is performed via Ethernet packets, using the UDP protocol. The L0 Trigger Processor handles the signals from sub-detectors that take part to the trigger generation. In order to choose the best solution for its realization, two different approaches have been implemented. The first approach is fully based on a FPGA device while the second one joins an off-the-shelf PC to the FPGA. The performance of the two systems will be discussed and compared.
Special purpose processors for high energy physics applications

International Nuclear Information System (INIS)

Verkerk, C.

1978-01-01

The review on the subject of hardware processors from very fast decision logic for the split field magnet facility at CERN, to a point-finding processor used to relieve the data-acquisition minicomputer from the task of monitoring the SPS experiment is given. Block diagrams of decision making processor, point-finding processor, complanarity and opening angle processor and programmable track selector module are presented and discussed. The applications of fully programmable but slower processor on the one hand, and very fast and programmable decision logic on the other hand are given in this review
The Central Trigger Processor (CTP)

CERN Multimedia

Franchini, Matteo

2016-01-01

The Central Trigger Processor (CTP) receives trigger information from the calorimeter and muon trigger processors, as well as from other sources of trigger. It makes the Level-1 decision (L1A) based on a trigger menu.
An “artificial retina” processor for track reconstruction at the full LHC crossing rate

International Nuclear Information System (INIS)

Abba, A.; Bedeschi, F.; Caponio, F.; Cenci, R.; Citterio, M.; Cusimano, A.; Fu, J.; Geraci, A.; Grizzuti, M.; Lusardi, N.; Marino, P.; Morello, M.J.; Neri, N.; Ninci, D.; Petruzzo, M.; Piucci, A.; Punzi, G.; Ristori, L.; Spinella, F.

2016-01-01

We present the latest results of an R&D study for a specialized processor capable of reconstructing, in a silicon pixel detector, high-quality tracks from high-energy collision events at 40 MHz. The processor applies a highly parallel pattern-recognition algorithm inspired to quick detection of edges in mammals visual cortex. After a detailed study of a real-detector application, demonstrating that online reconstruction of offline-quality tracks is feasible at 40 MHz with sub-microsecond latency, we are implementing a prototype using common high-bandwidth FPGA devices.

An “artificial retina” processor for track reconstruction at the full LHC crossing rate

Energy Technology Data Exchange (ETDEWEB)

Abba, A. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Politecnico di Milano, Milano (Italy); Bedeschi, F. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Caponio, F. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Politecnico di Milano, Milano (Italy); Cenci, R., E-mail: riccardo.cenci@pi.infn.it [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Scuola Normale Superiore, Pisa (Italy); Citterio, M. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Cusimano, A. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Politecnico di Milano, Milano (Italy); Fu, J. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Geraci, A.; Grizzuti, M.; Lusardi, N. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Politecnico di Milano, Milano (Italy); Marino, P.; Morello, M.J. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Scuola Normale Superiore, Pisa (Italy); Neri, N. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Ninci, D. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Università di Pisa, Pisa (Italy); Petruzzo, M. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Università di Milano, Milano (Italy); Piucci, A.; Punzi, G. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Università di Pisa, Pisa (Italy); Ristori, L. [Fermi National Accelerator Laboratory, Batavia, IL (United States); Spinella, F. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); and others

2016-07-11

We present the latest results of an R&D study for a specialized processor capable of reconstructing, in a silicon pixel detector, high-quality tracks from high-energy collision events at 40 MHz. The processor applies a highly parallel pattern-recognition algorithm inspired to quick detection of edges in mammals visual cortex. After a detailed study of a real-detector application, demonstrating that online reconstruction of offline-quality tracks is feasible at 40 MHz with sub-microsecond latency, we are implementing a prototype using common high-bandwidth FPGA devices.
RTEMS SMP and MTAPI for Efficient Multi-Core Space Applications on LEON3/LEON4 Processors

Science.gov (United States)

Cederman, Daniel; Hellstrom, Daniel; Sherrill, Joel; Bloom, Gedare; Patte, Mathieu; Zulianello, Marco

2015-09-01

This paper presents the final result of an European Space Agency (ESA) activity aimed at improving the software support for LEON processors used in SMP configurations. One of the benefits of using a multicore system in a SMP configuration is that in many instances it is possible to better utilize the available processing resources by load balancing between cores. This however comes with the cost of having to synchronize operations between cores, leading to increased complexity. While in an AMP system one can use multiple instances of operating systems that are only uni-processor capable, a SMP system requires the operating system to be written to support multicore systems. In this activity we have improved and extended the SMP support of the RTEMS real-time operating system and ensured that it fully supports the multicore capable LEON processors. The targeted hardware in the activity has been the GR712RC, a dual-core core LEON3FT processor, and the functional prototype of ESA's Next Generation Multiprocessor (NGMP), a quad core LEON4 processor. The final version of the NGMP is now available as a product under the name GR740. An implementation of the Multicore Task Management API (MTAPI) has been developed as part of this activity to aid in the parallelization of applications for RTEMS SMP. It allows for simplified development of parallel applications using the task-based programming model. An existing space application, the Gaia Video Processing Unit, has been ported to RTEMS SMP using the MTAPI implementation to demonstrate the feasibility and usefulness of multicore processors for space payload software. The activity is funded by ESA under contract 4000108560/13/NL/JK. Gedare Bloom is supported in part by NSF CNS-0934725.
Very Long Instruction Word Processors

Indian Academy of Sciences (India)

Pentium Processor have modified the processor architecture to exploit parallelism in a program. .... The type of operation itself is encoded using 14 bits. .... text of designing simple architectures with low power consump- tion and execute x86 ...
Event Pre Processor for the CZT Detector on MIRAX

International Nuclear Information System (INIS)

Kendziorra, Eckhard; Schanz, Thomas; Distratis, Giuseppe; Suchy, Slawomir

2006-01-01

We describe the Event Pre Processor (EPP) for the Hard X-ray Imager (HXI) on MIRAX. The EPP provides on board data reduction and event filtering for the HXI Cadmium Zinc Telluride strip detector. Emphasis is placed upon the EPP requirements, its implementation as VHDL design in a Field Programmable Gate Array (FPGA), and the description of a test environment for both the VHDL code and the FPGA hardware
Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors

OpenAIRE

Catalán, Sandra; Herrero, José R.; Igual, Francisco D.; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

2015-01-01

Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS (and LAPACK) functionality for many current multi-threaded architectures,the adaption of these libraries for asymmetric multicore processors (AMPs)is still pending. In this paper we address this challenge by developing an asymmetry-aware implementation of the BLAS, based on the...
Single-instruction multiple-data execution

CERN Document Server

Hughes, Christopher J

2015-01-01

Having hit power limitations to even more aggressive out-of-order execution in processor cores, many architects in the past decade have turned to single-instruction-multiple-data (SIMD) execution to increase single-threaded performance. SIMD execution, or having a single instruction drive execution of an identical operation on multiple data items, was already well established as a technique to efficiently exploit data parallelism. Furthermore, support for it was already included in many commodity processors. However, in the past decade, SIMD execution has seen a dramatic increase in the set of
Fast Optimal Replica Placement with Exhaustive Search Using Dynamically Reconfigurable Processor

Directory of Open Access Journals (Sweden)

Hidetoshi Takeshita

2011-01-01

Full Text Available This paper proposes a new replica placement algorithm that expands the exhaustive search limit with reasonable calculation time. It combines a new type of parallel data-flow processor with an architecture tuned for fast calculation. The replica placement problem is to find a replica-server set satisfying service constraints in a content delivery network (CDN. It is derived from the set cover problem which is known to be NP-hard. It is impractical to use exhaustive search to obtain optimal replica placement in large-scale networks, because calculation time increases with the number of combinations. To reduce calculation time, heuristic algorithms have been proposed, but it is known that no heuristic algorithm is assured of finding the optimal solution. The proposed algorithm suits parallel processing and pipeline execution and is implemented on DAPDNA-2, a dynamically reconfigurable processor. Experiments show that the proposed algorithm expands the exhaustive search limit by the factor of 18.8 compared to the conventional algorithm search limit running on a Neumann-type processor.
PERFORMANCE EVALUATION OF OR1200 PROCESSOR WITH EVOLUTIONARY PARALLEL HPRC USING GEP

Directory of Open Access Journals (Sweden)

R. Maheswari

2012-04-01

Full Text Available In this fast computing era, most of the embedded system requires more computing power to complete the complex function/ task at the lesser amount of time. One way to achieve this is by boosting up the processor performance which allows processor core to run faster. This paper presents a novel technique of increasing the performance by parallel HPRC (High Performance Reconfigurable Computing in the CPU/DSP (Digital Signal Processor unit of OR1200 (Open Reduced Instruction Set Computer (RISC 1200 using Gene Expression Programming (GEP an evolutionary programming model. OR1200 is a soft-core RISC processor of the Intellectual Property cores that can efficiently run any modern operating system. In the manufacturing process of OR1200 a parallel HPRC is placed internally in the Integer Execution Pipeline unit of the CPU/DSP core to increase the performance. The GEP Parallel HPRC is activated /deactivated by triggering the signals i HPRC_Gene_Start ii HPRC_Gene_End. A Verilog HDL(Hardware Description language functional code for Gene Expression Programming parallel HPRC is developed and synthesised using XILINX ISE in the former part of the work and a CoreMark processor core benchmark is used to test the performance of the OR1200 soft core in the later part of the work. The result of the implementation ensures the overall speed-up increased to 20.59% by GEP based parallel HPRC in the execution unit of OR1200.
High performance in silico virtual drug screening on many-core processors

Science.gov (United States)

Price, James; Sessions, Richard B; Ibarra, Amaurys A

2015-01-01

Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel’s Xeon Phi and multi-core CPUs with SIMD instruction sets. PMID:25972727
High performance in silico virtual drug screening on many-core processors.

Science.gov (United States)

McIntosh-Smith, Simon; Price, James; Sessions, Richard B; Ibarra, Amaurys A

2015-05-01

Drug screening is an important part of the drug development pipeline for the pharmaceutical industry. Traditional, lab-based methods are increasingly being augmented with computational methods, ranging from simple molecular similarity searches through more complex pharmacophore matching to more computationally intensive approaches, such as molecular docking. The latter simulates the binding of drug molecules to their targets, typically protein molecules. In this work, we describe BUDE, the Bristol University Docking Engine, which has been ported to the OpenCL industry standard parallel programming language in order to exploit the performance of modern many-core processors. Our highly optimized OpenCL implementation of BUDE sustains 1.43 TFLOP/s on a single Nvidia GTX 680 GPU, or 46% of peak performance. BUDE also exploits OpenCL to deliver effective performance portability across a broad spectrum of different computer architectures from different vendors, including GPUs from Nvidia and AMD, Intel's Xeon Phi and multi-core CPUs with SIMD instruction sets.
Experimental testing of the noise-canceling processor.

Science.gov (United States)

Collins, Michael D; Baer, Ralph N; Simpson, Harry J

2011-09-01

Signal-processing techniques for localizing an acoustic source buried in noise are tested in a tank experiment. Noise is generated using a discrete source, a bubble generator, and a sprinkler. The experiment has essential elements of a realistic scenario in matched-field processing, including complex source and noise time series in a waveguide with water, sediment, and multipath propagation. The noise-canceling processor is found to outperform the Bartlett processor and provide the correct source range for signal-to-noise ratios below -10 dB. The multivalued Bartlett processor is found to outperform the Bartlett processor but not the noise-canceling processor. © 2011 Acoustical Society of America
Video rate morphological processor based on a redundant number representation

Science.gov (United States)

Kuczborski, Wojciech; Attikiouzel, Yianni; Crebbin, Gregory A.

1992-03-01

This paper presents a video rate morphological processor for automated visual inspection of printed circuit boards, integrated circuit masks, and other complex objects. Inspection algorithms are based on gray-scale mathematical morphology. Hardware complexity of the known methods of real-time implementation of gray-scale morphology--the umbra transform and the threshold decomposition--has prompted us to propose a novel technique which applied an arithmetic system without carrying propagation. After considering several arithmetic systems, a redundant number representation has been selected for implementation. Two options are analyzed here. The first is a pure signed digit number representation (SDNR) with the base of 4. The second option is a combination of the base-2 SDNR (to represent gray levels of images) and the conventional twos complement code (to represent gray levels of structuring elements). Operation principle of the morphological processor is based on the concept of the digit level systolic array. Individual processing units and small memory elements create a pipeline. The memory elements store current image windows (kernels). All operation primitives of processing units apply a unified direction of digit processing: most significant digit first (MSDF). The implementation technology is based on the field programmable gate arrays by Xilinx. This paper justified the rationality of a new approach to logic design, which is the decomposition of Boolean functions instead of Boolean minimization.
Design of 10Gbps optical encoder/decoder structure for FE-OCDMA system using SOA and opto-VLSI processors.

Science.gov (United States)

Aljada, Muhsen; Hwang, Seow; Alameh, Kamal

2008-01-21

In this paper we propose and experimentally demonstrate a reconfigurable 10Gbps frequency-encoded (1D) encoder/decoder structure for optical code division multiple access (OCDMA). The encoder is constructed using a single semiconductor optical amplifier (SOA) and 1D reflective Opto-VLSI processor. The SOA generates broadband amplified spontaneous emission that is dynamically sliced using digital phase holograms loaded onto the Opto-VLSI processor to generate 1D codewords. The selected wavelengths are injected back into the same SOA for amplifications. The decoder is constructed using single Opto-VLSI processor only. The encoded signal can successfully be retrieved at the decoder side only when the digital phase holograms of the encoder and the decoder are matched. The system performance is measured in terms of the auto-correlation and cross-correlation functions as well as the eye diagram.
Software and DVFS Tuning for Performance and Energy-Efficiency on Intel KNL Processors

Directory of Open Access Journals (Sweden)

Enrico Calore

2018-06-01

Full Text Available Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems. For this reason, it is important to understand the energy performance of these processors and to study strategies allowing their use in the most efficient way. In this work, we focus on the computing and energy performance of the Knights Landing Xeon Phi, the latest Intel many-core architecture processor for HPC applications. We consider the 64-core Xeon Phi 7230 and profile its performance and energy efficiency using both its on-chip MCDRAM and the off-chip DDR4 memory as the main storage for application data. As a benchmark application, we use a lattice Boltzmann code heavily optimized for this architecture and implemented using several different arrangements of the application data in memory (data-layouts, in short. We also assess the dependence of energy consumption on data-layouts, memory configurations (DDR4 or MCDRAM and the number of threads per core. We finally consider possible trade-offs between computing performance and energy efficiency, tuning the clock frequency of the processor using the Dynamic Voltage and Frequency Scaling (DVFS technique.
Introduction to programming multiple-processor computers

International Nuclear Information System (INIS)

Hicks, H.R.; Lynch, V.E.

1985-04-01

FORTRAN applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The latter allows a single job to use more than one processor simultaneously, with a consequent reduction in wall-clock time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concepts of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreproducible results, which makes debugging more difficult
Development of an Advanced Digital Reactor Protection System Using Diverse Dual Processors to Prevent Common-Mode Failure

International Nuclear Information System (INIS)

Shin, Hyun Kook; Nam, Sang Ku; Sohn, Se Do; Chang, Hoon Seon

2003-01-01

The advanced digital reactor protection system (ADRPS) with diverse dual processors has been developed to prevent common-mode failure (CMF). The principle of diversity is applied to both hardware design and software design. For hardware diversity, two different types of CPUs are used for the bistable processor and local coincidence logic (LCL) processor. The Versa Module Eurocard-based single board computers are used for the CPU hardware platforms. The QNX operating system and the VxWorks operating system were selected for software diversity. Functional diversity is also applied to the input and output modules, and to the algorithm in the bistable processors and LCL processors. The characteristics of the newly developed digital protection system are described together with the preventive capability against CMF. Also, system reliability analysis is discussed. The evaluation results show that the ADRPS has a good preventive capability against the CMF and is a highly reliable reactor protection system
Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

Science.gov (United States)

Bellerby, Tim

2015-04-01

PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks number of processors
VLSI Design of a Variable-Length FFT/IFFT Processor for OFDM-Based Communication Systems

Directory of Open Access Journals (Sweden)

Jen-Chih Kuo

2003-12-01

Full Text Available The technique of {orthogonal frequency division multiplexing (OFDM} is famous for its robustness against frequency-selective fading channel. This technique has been widely used in many wired and wireless communication systems. In general, the {fast Fourier transform (FFT} and {inverse FFT (IFFT} operations are used as the modulation/demodulation kernel in the OFDM systems, and the sizes of FFT/IFFT operations are varied in different applications of OFDM systems. In this paper, we design and implement a variable-length prototype FFT/IFFT processor to cover different specifications of OFDM applications. The cached-memory FFT architecture is our suggested VLSI system architecture to design the prototype FFT/IFFT processor for the consideration of low-power consumption. We also implement the twiddle factor butterfly {processing element (PE} based on the {{coordinate} rotation digital computer (CORDIC} algorithm, which avoids the use of conventional multiplication-and-accumulation unit, but evaluates the trigonometric functions using only add-and-shift operations. Finally, we implement a variable-length prototype FFT/IFFT processor with TSMC 0.35Ã¢Â€Â‰ÃŽÂ¼m 1P4M CMOS technology. The simulations results show that the chip can perform (64-2048-point FFT/IFFT operations up to 80Ã¢Â€Â‰MHz operating frequency which can meet the speed requirement of most OFDM standards such as WLAN, ADSL, VDSL (256Ã¢ÂˆÂ¼2K, DAB, and 2K-mode DVB.
Development of a highly reliable CRT processor

International Nuclear Information System (INIS)

Shimizu, Tomoya; Saiki, Akira; Hirai, Kenji; Jota, Masayoshi; Fujii, Mikiya

1996-01-01

Although CRT processors have been employed by the main control board to reduce the operator's workload during monitoring, the control systems are still operated by hardware switches. For further advancement, direct controller operation through a display device is expected. A CRT processor providing direct controller operation must be as reliable as the hardware switches are. The authors are developing a new type of highly reliable CRT processor that enables direct controller operations. In this paper, we discuss the design principles behind a highly reliable CRT processor. The principles are defined by studies of software reliability and of the functional reliability of the monitoring and operation systems. The functional configuration of an advanced CRT processor is also addressed. (author)
Computer Generated Inputs for NMIS Processor Verification

International Nuclear Information System (INIS)

J. A. Mullens; J. E. Breeding; J. A. McEvers; R. W. Wysor; L. G. Chiang; J. R. Lenarduzzi; J. T. Mihalczo; J. K. Mattingly

2001-01-01

Proper operation of the Nuclear Identification Materials System (NMIS) processor can be verified using computer-generated inputs [BIST (Built-In-Self-Test)] at the digital inputs. Preselected sequences of input pulses to all channels with known correlation functions are compared to the output of the processor. These types of verifications have been utilized in NMIS type correlation processors at the Oak Ridge National Laboratory since 1984. The use of this test confirmed a malfunction in a NMIS processor at the All-Russian Scientific Research Institute of Experimental Physics (VNIIEF) in 1998. The NMIS processor boards were returned to the U.S. for repair and subsequently used in NMIS passive and active measurements with Pu at VNIIEF in 1999

Efficient Programming for Multicore Processor Heterogeneity: OpenMP versus OmpSs

OpenAIRE

Butko , Anastasiia; Bruguier , Florent; Gamatié , Abdoulaye; Sassatelli , Gilles

2017-01-01

International audience; ARM single-ISA heterogeneous multicore processors combine high-performance big cores with power-efficient small cores. They aim at achieving a suitable balance between performance and energy. How- ever, a main challenge is to program such architectures so as to efficiently exploit their features. In this paper, we study the impact on performance and energy trade-offs of single-ISA architecture according to OpenMP 3.0 and the OmpSs programming models. We consider differ...
3081//sub E/ processor

International Nuclear Information System (INIS)

Kunz, P.F.; Gravina, M.; Oxoby, G.; Trang, Q.; Fucci, A.; Jacobs, D.; Martin, B.; Storr, K.

1983-03-01

Since the introduction of the 168//sub E/, emulating processors have been successful over an amazingly wide range of applications. This paper will describe a second generation processor, the 3081//sub E/. This new processor, which is being developed as a collaboration between SLAC and CERN, goes beyond just fixing the obvious faults of the 168//sub E/. Not only will the 3081//sub E/ have much more memory space, incorporate many more IBM instructions, and have much more memory space, incorporate many more IBM instructions, and have full double precision floating point arithmetic, but it will also have faster execution times and be much simpler to build, debug, and maintain. The simple interface and reasonable cost of the 168//sub E/ will be maintained for the 3081//sub E/
Multimode power processor

Science.gov (United States)

O'Sullivan, George A.; O'Sullivan, Joseph A.

1999-01-01

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.
PixonVision real-time video processor

Science.gov (United States)

Puetter, R. C.; Hier, R. G.

2007-09-01

PixonImaging LLC and DigiVision, Inc. have developed a real-time video processor, the PixonVision PV-200, based on the patented Pixon method for image deblurring and denoising, and DigiVision's spatially adaptive contrast enhancement processor, the DV1000. The PV-200 can process NTSC and PAL video in real time with a latency of 1 field (1/60 th of a second), remove the effects of aerosol scattering from haze, mist, smoke, and dust, improve spatial resolution by up to 2x, decrease noise by up to 6x, and increase local contrast by up to 8x. A newer version of the processor, the PV-300, is now in prototype form and can handle high definition video. Both the PV-200 and PV-300 are FPGA-based processors, which could be spun into ASICs if desired. Obvious applications of these processors include applications in the DOD (tanks, aircraft, and ships), homeland security, intelligence, surveillance, and law enforcement. If developed into an ASIC, these processors will be suitable for a variety of portable applications, including gun sights, night vision goggles, binoculars, and guided munitions. This paper presents a variety of examples of PV-200 processing, including examples appropriate to border security, battlefield applications, port security, and surveillance from unmanned aerial vehicles.
Processors and systems (picture processing)

Energy Technology Data Exchange (ETDEWEB)

Gemmar, P

1983-01-01

Automatic picture processing requires high performance computers and high transmission capacities in the processor units. The author examines the possibilities of operating processors in parallel in order to accelerate the processing of pictures. He therefore discusses a number of available processors and systems for picture processing and illustrates their capacities for special types of picture processing. He stresses the fact that the amount of storage required for picture processing is exceptionally high. The author concludes that it is as yet difficult to decide whether very large groups of simple processors or highly complex multiprocessor systems will provide the best solution. Both methods will be aided by the development of VLSI. New solutions have already been offered (systolic arrays and 3-d processing structures) but they also are subject to losses caused by inherently parallel algorithms. Greater efforts must be made to produce suitable software for multiprocessor systems. Some possibilities for future picture processing systems are discussed. 33 references.
A Low-Power ASIC Signal Processor for a Vestibular Prosthesis.

Science.gov (United States)

Töreyin, Hakan; Bhatti, Pamela T

2016-06-01

A low-power ASIC signal processor for a vestibular prosthesis (VP) is reported. Fabricated with TI 0.35 μm CMOS technology and designed to interface with implanted inertial sensors, the digitally assisted analog signal processor operates extensively in the CMOS subthreshold region. During its operation the ASIC encodes head motion signals captured by the inertial sensors as electrical pulses ultimately targeted for in-vivo stimulation of vestibular nerve fibers. To achieve this, the ASIC implements a coordinate system transformation to correct for misalignment between natural sensors and implanted inertial sensors. It also mimics the frequency response characteristics and frequency encoding mappings of angular and linear head motions observed at the peripheral sense organs, semicircular canals and otolith. Overall the design occupies an area of 6.22 mm (2) and consumes 1.24 mW when supplied with ± 1.6 V.
Special processor for in-core control systems

International Nuclear Information System (INIS)

Golovanov, M.N.; Duma, V.R.; Levin, G.L.; Mel'nikov, A.V.; Polikanin, A.V.; Filatov, V.P.

1978-01-01

The BUTs-20 special processor is discussed, designed to control the units of the in-core control equipment which are incorporated into the VECTOR communication channel, and to provide preliminary data processing prior to computer calculations. A set of instructions and flowsheet of the processor, organization of its communication with memories and other units of the system are given. The processor components: a control unit and an arithmetic logical unit are discussed. It is noted that the special processor permits more effective utilization of the computer time
Precision analog signal processor for beam position measurements in electron storage rings

International Nuclear Information System (INIS)

Hinkson, J.A.; Unser, K.B.

1995-05-01

Beam position monitors (BPM) in electron and positron storage rings have evolved from simple systems composed of beam pickups, coaxial cables, multiplexing relays, and a single receiver (usually a analyzer) into very complex and costly systems of multiple receivers and processors. The older may have taken minutes to measure the circulating beam closed orbit. Today instrumentation designers are required to provide high-speed measurements of the beam orbit, often at the ring revolution frequency. In addition the instruments must have very high accuracy and resolution. A BPM has been developed for the Advanced Light Source (ALS) in Berkeley which features high resolution and relatively low cost. The instrument has a single purpose; to measure position of a stable stored beam. Because the pickup signals are multiplexed into a single receiver, and due to its narrow bandwidth, the receiver is not intended for single-turn studies. The receiver delivers normalized measurements of X and Y position entirely by analog means at nominally 1 V/mm. No computers are involved. No software is required. Bergoz, a French company specializing in precision beam instrumentation, integrated the ALS design m their new BPM analog signal processor module. Performance comparisons were made on the ALS. In this paper we report on the architecture and performance of the ALS prototype BPM
A two-level parallel direct search implementation for arbitrarily sized objective functions

Energy Technology Data Exchange (ETDEWEB)

Hutchinson, S.A.; Shadid, N.; Moffat, H.K. [Sandia National Labs., Albuquerque, NM (United States)] [and others

1994-12-31

In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If, p, the number of processors available, is greater than or equal to 2p{sub 2} then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.
Real-time portable system for fabric defect detection using an ARM processor

Science.gov (United States)

Fernandez-Gallego, J. A.; Yañez-Puentes, J. P.; Ortiz-Jaramillo, B.; Alvarez, J.; Orjuela-Vargas, S. A.; Philips, W.

2012-06-01

Modern textile industry seeks to produce textiles as little defective as possible since the presence of defects can decrease the final price of products from 45% to 65%. Automated visual inspection (AVI) systems, based on image analysis, have become an important alternative for replacing traditional inspections methods that involve human tasks. An AVI system gives the advantage of repeatability when implemented within defined constrains, offering more objective and reliable results for particular tasks than human inspection. Costs of automated inspection systems development can be reduced using modular solutions with embedded systems, in which an important advantage is the low energy consumption. Among the possibilities for developing embedded systems, the ARM processor has been explored for acquisition, monitoring and simple signal processing tasks. In a recent approach we have explored the use of the ARM processor for defects detection by implementing the wavelet transform. However, the computation speed of the preprocessing was not yet sufficient for real time applications. In this approach we significantly improve the preprocessing speed of the algorithm, by optimizing matrix operations, such that it is adequate for a real time application. The system was tested for defect detection using different defect types. The paper is focused in giving a detailed description of the basis of the algorithm implementation, such that other algorithms may use of the ARM operations for fast implementations.
Modcomp MAX IV System Processors reference guide

Energy Technology Data Exchange (ETDEWEB)

Cummings, J.

1990-10-01

A user almost always faces a big problem when having to learn to use a new computer system. The information necessary to use the system is often scattered throughout many different manuals. The user also faces the problem of extracting the information really needed from each manual. Very few computer vendors supply a single Users Guide or even a manual to help the new user locate the necessary manuals. Modcomp is no exception to this, Modcomp MAX IV requires that the user be familiar with the system file usage which adds to the problem. At General Atomics there is an ever increasing need for new users to learn how to use the Modcomp computers. This paper was written to provide a condensed Users Reference Guide'' for Modcomp computer users. This manual should be of value not only to new users but any users that are not Modcomp computer systems experts. This Users Reference Guide'' is intended to provided the basic information for the use of the various Modcomp System Processors necessary to, create, compile, link-edit, and catalog a program. Only the information necessary to provide the user with a basic understanding of the Systems Processors is included. This document provides enough information for the majority of programmers to use the Modcomp computers without having to refer to any other manuals. A lot of emphasis has been placed on the file description and usage for each of the System Processors. This allows the user to understand how Modcomp MAX IV does things rather than just learning the system commands.
A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

Energy Technology Data Exchange (ETDEWEB)

Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel

2009-02-15

We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
Assessment of directionality performances: comparison between Freedom and CP810 sound processors.

Science.gov (United States)

Razza, Sergio; Albanese, Greta; Ermoli, Lucilla; Zaccone, Monica; Cristofari, Eliana

2013-10-01

To compare speech recognition in noise for the Nucleus Freedom and CP810 sound processors using different directional settings among those available in the SmartSound portfolio. Single-subject, repeated measures study. Tertiary care referral center. Thirty-one monoaurally and binaurally implanted subjects (24 children and 7 adults) were enrolled. They were all experienced Nucleus Freedom sound processor users and achieved a 100% open set word recognition score in quiet listening conditions. Each patient was fitted with the Freedom and the CP810 processor. The program setting incorporated Adaptive Dynamic Range Optimization (ADRO) and adopted the directional algorithm BEAM (both devices) and ZOOM (only on CP810). Speech reception threshold (SRT) was assessed in a free-field layout, with disyllabic word list and interfering multilevel babble noise in the 3 different pre-processing configurations. On average, CP810 improved significantly patients' SRTs as compared to Freedom SP after 1 hour of use. Instead, no significant difference was observed in patients' SRT between the BEAM and the ZOOM algorithm fitted in the CP810 processor. The results suggest that hardware developments achieved in the design of CP810 allow an immediate and relevant directional advantage as compared to the previous-generation Freedom device.
Effect of processor temperature on film dosimetry

International Nuclear Information System (INIS)

Srivastava, Shiv P.; Das, Indra J.

2012-01-01

Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (d max. , 10 × 10 cm 2 , 100 cm) to a given dose. An automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4–40.6°C (85–105°F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.
Implementation and analysis of a Navier-Stokes algorithm on parallel computers

Science.gov (United States)

Fatoohi, Raad A.; Grosch, Chester E.

1988-01-01

The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Bank switched memory interface for an image processor

International Nuclear Information System (INIS)

Barron, M.; Downward, J.

1980-09-01

A commercially available image processor is interfaced to a PDP-11/45 through an 8K window of memory addresses. When the image processor was not in use it was desired to be able to use the 8K address space as real memory. The standard method of accomplishing this would have been to use UNIBUS switches to switch in either the physical 8K bank of memory or the image processor memory. This method has the disadvantage of being rather expensive. As a simple alternative, a device was built to selectively enable or disable either an 8K bank of memory or the image processor memory. To enable the image processor under program control, GEN is contracted in size, the memory is disabled, a device partition for the image processor is created above GEN, and the image processor memory is enabled. The process is reversed to restore memory to GEN. The hardware to enable/disable the image and computer memories is controlled using spare bits from a DR-11K output register. The image processor and physical memory can be switched in or out on line with no adverse affects on the system's operation
VON WISPR Family Processors: Volume 1

National Research Council Canada - National Science Library

Wagstaff, Ronald

1997-01-01

...) and the background noise they are embedded in. Processors utilizing those fluctuations such as the von WISPR Family Processors discussed herein, are methods or algorithms that preferentially attenuate the fluctuating signals and noise...
Hardware Realization of an FPGA Processor – Operating System Call Offload and Experiences

DEFF Research Database (Denmark)

Hindborg, Andreas Erik; Schleuniger, Pascal; Jensen, Nicklas Bo

2014-01-01

Field-programmable gate arrays, FPGAs, are attractive implementation platforms for low-volume signal and image processing applications. The structure of FPGAs allows for an efficient implementation of parallel algorithms. Sequential algorithms, on the other hand, often perform better...... core that can be integrated in many signal and data processing platforms on FPGAs. We also show how we allow the processor to use operating system services. For a set of SPLASH-2 and SPEC CPU2006 benchmarks we show a speedup of up to 64% over a similar Xilinx MicroBlaze implementation while using 27...
An Evaluation of an Ada Implementation of the Rete Algorithm for Embedded Flight Processors

Science.gov (United States)

1990-12-01

computers was desired. The VAX VMS operating system has many built-in methods for determining program performance (including VAX PCA), but these methods... overviev , of the target environment-- the MIL-STD-1750A VHSIC Avionic Modular Processor ( VA.IP, running under the Ada Avionics Real-Time Software (AARTS... computers . Mil-STD-1750A, the Air Force’s standard flight computer architecture, however, places severe constraints on applications software processing
A Workload-Adaptive and Reconfigurable Bus Architecture for Multicore Processors

Directory of Open Access Journals (Sweden)

Shoaib Akram

2010-01-01

Full Text Available Interconnection networks for multicore processors are traditionally designed to serve a diversity of workloads. However, different workloads or even different execution phases of the same workload may benefit from different interconnect configurations. In this paper, we first motivate the need for workload-adaptive interconnection networks. Subsequently, we describe an interconnection network framework based on reconfigurable switches for use in medium-scale (up to 32 cores shared memory multicore processors. Our cost-effective reconfigurable interconnection network is implemented on a traditional shared bus interconnect with snoopy-based coherence, and it enables improved multicore performance. The proposed interconnect architecture distributes the cores of the processor into clusters with reconfigurable logic between clusters to support workload-adaptive policies for inter-cluster communication. Our interconnection scheme is complemented by interconnect-aware scheduling and additional interconnect optimizations which help boost the performance of multiprogramming and multithreaded workloads. We provide experimental results that show that the overall throughput of multiprogramming workloads (consisting of two and four programs can be improved by up to 60% with our configurable bus architecture. Similar gains can be achieved also for multithreaded applications as shown by further experiments. Finally, we present the performance sensitivity of the proposed interconnect architecture on shared memory bandwidth availability.

Hardware Realization of an FPGA Processor - Operating System Call Offload and Experiences

DEFF Research Database (Denmark)

Hindborg, Andreas Erik; Karlsson, Sven

2014-01-01

core that can be integrated in many signal and data processing platforms on FPGAs. We also show how we allow the processor to use operating system services. For a set of SPLASH-2 and SPEC2006 benchmarks we show an speedup of up to 64% over a similar Xilinx MicroBlaze implementation while using 27...
Many - body simulations using an array processor

International Nuclear Information System (INIS)

Rapaport, D.C.

1985-01-01

Simulations of microscopic models of water and polypeptides using molecular dynamics and Monte Carlo techniques have been carried out with the aid of an FPS array processor. The computational techniques are discussed, with emphasis on the development and optimization of the software to take account of the special features of the processor. The computing requirements of these simulations exceed what could be reasonably carried out on a normal 'scientific' computer. While the FPS processor is highly suited to the kinds of models described, several other computationally intensive problems in statistical mechanics are outlined for which alternative processor architectures are more appropriate
The design of a graphics processor

International Nuclear Information System (INIS)

Holmes, M.; Thorne, A.R.

1975-12-01

The design of a graphics processor is described which takes into account known and anticipated user requirements, the availability of cheap minicomputers, the state of integrated circuit technology, and the overall need to minimise cost for a given performance. The main user needs are the ability to display large high resolution pictures, and to dynamically change the user's view in real time by means of fast coordinate processing hardware. The transformations that can be applied to 2D or 3D coordinates either singly or in combination are: translation, scaling, mirror imaging, rotation, and the ability to map the transformation origin on to any point on the screen. (author)
LHCb base-line level-0 trigger 3D-Flow implementation

International Nuclear Information System (INIS)

Crosetto, Dario B.

1999-01-01

The LHCb Level-0 trigger implementation with the 3D-Flow system offers full programmability, allowing it to adapt to unexpected operating conditions and enabling new, unpredicted physics. The implementation is described in detail and refers to components and technology available today. The 3D-Flow Processor system is a new, technology-independent concept in very fast, real-time system architectures. Based on the replication of a single type of circuit of 100k gates, which communicates in six directions: bi-directional with North, East, West, and South neighbors, unidirectional from Top to Bottom, the system offers full programmability, modularity, ease of expansion and adaptation to the latest technology. A complete study of its applicability to the LHCb calorimeter triggers is presented. Full description of the input data handling, either in digital or mixed digital-analog form, of the data processing, and the transmission of results to the global level-0 trigger decision unit are provided. Any level-0 trigger algorithm (2x2, 3x3, 4x4, etc.) with up to 20 steps, can be implemented with zero dead-time, while sustaining input data rate (up to 32-bit per input channel, per bunch crossing) at 40 MHz. For each step, each 3D-Flow processor can execute up to 26 operations, inclusive of compare, ranging, finding local maxima, and efficient data exchange with neighboring channels. (One-to-one correspondence between input channel and trigger tower.) Populated with only two main types of components, front-end FPGAs and 3D-Flow processors, a single type of board, it is shown how the whole Level-0 calorimeter trigger can be accommodated into six crates (9U), each containing 16 identical boards. All 3D-Flow inter-chip Bottom to Top ports connection are all contained on the board (data are multiplexed 2 : 1, PCB traces are shorter than 6 cm); all 3D-flow inter-chip North, East, West, and South ports connections, between boards and crates, are multiplexed (8+2) : 1 and are
Application of functional IDDQ testing in a VLIW processor towards detection of aging degradation

NARCIS (Netherlands)

Kerkhoff, Hans G.; Zhao, Yong

2015-01-01

In this paper, functional IDDQ testing has been applied for a 90nm VLIW processor to effectively detect aging degradation. This technique can provide health data for reliability evaluation as used in e.g. prognostic software for lifetime prediction. The test environment for validation, implementing
Reconfigurable Secure Video Codec Based on DWT and AES Processor

Directory of Open Access Journals (Sweden)

Rached Tourki

2010-01-01

Full Text Available In this paper, we proposed a secure video codec based on the discrete wavelet transformation (DWT and the Advanced Encryption Standard (AES processor. Either, use of video coding with DWT or encryption using AES is well known. However, linking these two designs to achieve secure video coding is leading. The contributions of our work are as follows. First, a new method for image and video compression is proposed. This codec is a synthesis of JPEG and JPEG2000,which is implemented using Huffman coding to the JPEG and DWT to the JPEG2000. Furthermore, an improved motion estimation algorithm is proposed. Second, the encryptiondecryption effects are achieved by the AES processor. AES is aim to encrypt group of LL bands. The prominent feature of this method is an encryption of LL bands by AES-128 (128-bit keys, or AES-192 (192-bit keys, or AES-256 (256-bit keys.Third, we focus on a method that implements partial encryption of LL bands. Our approach provides considerable levels of security (key size, partial encryption, mode encryption, and has very limited adverse impact on the compression efficiency. The proposed codec can provide up to 9 cipher schemes within a reasonable software cost. Latency, correlation, PSNR and compression rate results are analyzed and shown.
Evaluation of the Intel Sandy Bridge-EP server processor

CERN Document Server

Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

2012-01-01

In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing an 8-core “Sandy Bridge-EP” processor with Intel’s previous microarchitecture, the “Westmere-EP”. The Intel marketing names for these processors are “Xeon E5-2600 processor series” and “Xeon 5600 processor series”, respectively. Both processors are produced in a 32nm process, and both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores ...
Array processors based on Gaussian fraction-free method

Energy Technology Data Exchange (ETDEWEB)

Peng, S; Sedukhin, S [Aizu Univ., Aizuwakamatsu, Fukushima (Japan); Sedukhin, I

1998-03-01

The design of algorithmic array processors for solving linear systems of equations using fraction-free Gaussian elimination method is presented. The design is based on a formal approach which constructs a family of planar array processors systematically. These array processors are synthesized and analyzed. It is shown that some array processors are optimal in the framework of linear allocation of computations and in terms of number of processing elements and computing time. (author)
Custom Hardware Processor to Compute a Figure of Merit for the Fit of X-Ray Diffraction

International Nuclear Information System (INIS)

Gomez-Pulido, P.J.A.; Vega-Rodriguez, M.A.; Sanchez-Perez, J.M.; Sanchez-Bajo, F.; Santos, S.P.D.

2008-01-01

A custom processor based on re configurable hardware technology is proposed in order to compute the figure of merit used to measure the quality of the fit of X-ray diffraction peaks. As the experimental X-ray profiles can present many peaks severely overlapped, it is necessary to select the best model among a large set of reasonably good solutions. Determining the best solution is computationally intensive, because this is a hard combinatorial optimization problem. The proposed processors, working in parallel, increase the performance relative to a software implementation.
Design Principles for Synthesizable Processor Cores

DEFF Research Database (Denmark)

Schleuniger, Pascal; McKee, Sally A.; Karlsson, Sven

2012-01-01

As FPGAs get more competitive, synthesizable processor cores become an attractive choice for embedded computing. Currently popular commercial processor cores do not fully exploit current FPGA architectures. In this paper, we propose general design principles to increase instruction throughput...
Data register and processor for multiwire chambers

International Nuclear Information System (INIS)

Karpukhin, V.V.

1985-01-01

A data register and a processor for data receiving and processing from drift chambers of a device for investigating relativistic positroniums are described. The data are delivered to the register input in the form of the Grey 8 bit code, memorized and transformed to a position code. The register information is delivered to the KAMAK trunk and to the front panel plug. The processor selects particle tracks in a horizontal plane of the facility. ΔY maximum coordinate divergence and minimum point quantity on the track are set from the processor front panel. Processor solution time is 16 μs maximum quantity of simultaneously analyzed coordinates is 16
Computation and parallel implementation for early vision

Science.gov (United States)

Gualtieri, J. Anthony

1990-01-01

The problem of early vision is to transform one or more retinal illuminance images-pixel arrays-to image representations built out of such primitive visual features such as edges, regions, disparities, and clusters. These transformed representations form the input to later vision stages that perform higher level vision tasks including matching and recognition. Researchers developed algorithms for: (1) edge finding in the scale space formulation; (2) correlation methods for computing matches between pairs of images; and (3) clustering of data by neural networks. These algorithms are formulated for parallel implementation of SIMD machines, such as the Massively Parallel Processor, a 128 x 128 array processor with 1024 bits of local memory per processor. For some cases, researchers can show speedups of three orders of magnitude over serial implementations.
Implementation of a cluster Beowulf

International Nuclear Information System (INIS)

Victorino Guzman, Jorge Enrique

2001-01-01

One of the simulation systems that put a great stress on computational resources and performance are the climatic models, with a high cost of implementation, making difficult its acquisition. An alternative that offers good performance at a reasonable cost is the construction of Cluster Beowulf that allows to emulate the behaviour of a computer with several processors. In the present article we discuss the requirements of hardware for the construction of the Cluster Beowulf, the software resources for the implementation of the model CCM3.6 and the performance of the Cluster Beowulf, of the Group of Investigation in Meteorology at the National University of Colombia, with different number of processors
Development methods for VLSI-processors

International Nuclear Information System (INIS)

Horninger, K.; Sandweg, G.

1982-01-01

The aim of this project, which was originally planed for 3 years, was the development of modern system and circuit concepts, for VLSI-processors having a 32 bit wide data path. The result of this first years work is the concept of a general purpose processor. This processor is not only logically but also physically (on the chip) divided into four functional units: a microprogrammable instruction unit, an execution unit in slice technique, a fully associative cache memory and an I/O unit. For the ALU of the execution unit circuits in PLA and slice techniques have been realized. On the basis of regularity, area consumption and achievable performance the slice technique has been prefered. The designs utilize selftesting circuitry. (orig.) [de
Precision analog signal processor for beam position measurements in electron storage rings

International Nuclear Information System (INIS)

Hinkson, J.A.; Unser, K.B.

1995-01-01

Beam position monitors (BPM) in electron and positron storage rings have evolved from simple systems composed of beam pickups, coaxial cables, multiplexing relays, and a single receiver (usually a analyzer) into very complex and costly systems of multiple receivers and processors. The older may have taken minutes to measure the circulating beam closed orbit. Today instrumentation designers are required to provide high-speed measurements of the beam orbit, often at the ring revolution frequency. In addition the instruments must have very high accuracy and resolution. A BPM has been developed for the Advanced Light Source (ALS) in Berkeley which features high resolution and relatively low cost. The instrument has a single purpose; to measure position of a stable stored beam. Because the pickup signals are multiplexed into a single receiver, and due to its narrow bandwidth, the receiver is not intended for single-turn studies. The receiver delivers normalized measurements of X and Y posit ion entirely by analog means at nominally 1 V/mm. No computers are involved. No software is required. Bergoz, a French company specializing in precision beam instrumentation, integrated the ALS design m their new BPM analog signal processor module. Performance comparisons were made on the ALS. In this paper we report on the architecture and performance of the ALS prototype BPM
The performance of an LSI-11/23 with a SKYMNK-Q array processor as a high speed front end processor

International Nuclear Information System (INIS)

Clark, D.L.

1983-01-01

The NSRL has recently installed a VAX-11/750 based data acquisition system which is networked to two LSI-11/23 satellite processors. Each of the LSI's are connected to CAMAC branch drivers. The LSI's have small array processors installed for use in preprocessing data. The objective is to provide an easy to use high speed processor that will relieve the VAX of some of the real-time data analysis tasks. The basic operation of the array processor and some of the results of performance tests are described
Embedded Processor Laboratory

Data.gov (United States)

Federal Laboratory Consortium — The Embedded Processor Laboratory provides the means to design, develop, fabricate, and test embedded computers for missile guidance electronics systems in support...
An introduction to programming multiple-processor computers

International Nuclear Information System (INIS)

Hicks, H.R.; Lynch, V.E.

1986-01-01

Fortran applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The later allows a single job to use more than one processor simultaneously, with a consequent reduction in elapsed time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concept of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreducible results, which makes debugging more difficult
Analytical Bounds on the Threads in IXP1200 Network Processor

OpenAIRE

Ramakrishna, STGS; Jamadagni, HS

2003-01-01

Increasing link speeds have placed enormous burden on the processing requirements and the processors are expected to carry out a variety of tasks. Network Processors (NP) [1] [2] is the blanket name given to the processors, which are traded for flexibility and performance. Network Processors are offered by a number of vendors; to take the main burden of processing requirement of network related operations from the conventional processors. The Network Processors cover a spectrum of design trad...
Automotive Fuel Processor Development and Demonstration with Fuel Cell Systems

Energy Technology Data Exchange (ETDEWEB)

Nuvera Fuel Cells

2005-04-15

The potential for fuel cell systems to improve energy efficiency and reduce emissions over conventional power systems has generated significant interest in fuel cell technologies. While fuel cells are being investigated for use in many applications such as stationary power generation and small portable devices, transportation applications present some unique challenges for fuel cell technology. Due to their lower operating temperature and non-brittle materials, most transportation work is focusing on fuel cells using proton exchange membrane (PEM) technology. Since PEM fuel cells are fueled by hydrogen, major obstacles to their widespread use are the lack of an available hydrogen fueling infrastructure and hydrogen's relatively low energy storage density, which leads to a much lower driving range than conventional vehicles. One potential solution to the hydrogen infrastructure and storage density issues is to convert a conventional fuel such as gasoline into hydrogen onboard the vehicle using a fuel processor. Figure 2 shows that gasoline stores roughly 7 times more energy per volume than pressurized hydrogen gas at 700 bar and 4 times more than liquid hydrogen. If integrated properly, the fuel processor/fuel cell system would also be more efficient than traditional engines and would give a fuel economy benefit while hydrogen storage and distribution issues are being investigated. Widespread implementation of fuel processor/fuel cell systems requires improvements in several aspects of the technology, including size, startup time, transient response time, and cost. In addition, the ability to operate on a number of hydrocarbon fuels that are available through the existing infrastructure is a key enabler for commercializing these systems. In this program, Nuvera Fuel Cells collaborated with the Department of Energy (DOE) to develop efficient, low-emission, multi-fuel processors for transportation applications. Nuvera's focus was on (1) developing fuel

On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

Directory of Open Access Journals (Sweden)

A. Averbuch

1994-01-01

Full Text Available Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS and on a distributed memory multiprocessor (a Transputer network. Our parallel implementation uses the Virtual Machine for Muli-Processors (VMMP, a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit well the above architectures when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.
The ATLAS Trigger Algorithms for General Purpose Graphics Processor Units

CERN Document Server

Tavares Delgado, Ademar; The ATLAS collaboration

2016-01-01

The ATLAS Trigger Algorithms for General Purpose Graphics Processor Units Type: Talk Abstract: We present the ATLAS Trigger algorithms developed to exploit General Purpose Graphics Processor Units. ATLAS is a particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system has two levels, hardware-based Level 1 and the High Level Trigger implemented in software running on a farm of commodity CPU. Performing the trigger event selection within the available farm resources presents a significant challenge that will increase future LHC upgrades. are being evaluated as a potential solution for trigger algorithms acceleration. Key factors determining the potential benefit of this new technology are the relative execution speedup, the number of GPUs required and the relative financial cost of the selected GPU. We have developed a trigger demonstrator which includes algorithms for reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Cal...
Accuracies Of Optical Processors For Adaptive Optics

Science.gov (United States)

Downie, John D.; Goodman, Joseph W.

1992-01-01

Paper presents analysis of accuracies and requirements concerning accuracies of optical linear-algebra processors (OLAP's) in adaptive-optics imaging systems. Much faster than digital electronic processor and eliminate some residual distortion. Question whether errors introduced by analog processing of OLAP overcome advantage of greater speed. Paper addresses issue by presenting estimate of accuracy required in general OLAP that yields smaller average residual aberration of wave front than digital electronic processor computing at given speed.
Making CSB + -Trees Processor Conscious

DEFF Research Database (Denmark)

Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe

2005-01-01

of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose......Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance...... a systematic method for adapting CSB+-tree to new platforms. This work is a first step towards integrating CSB+-tree in MySQL’s heap storage manager....
Recommending the heterogeneous cluster type multi-processor system computing

International Nuclear Information System (INIS)

Iijima, Nobukazu

2010-01-01

Real-time reactor simulator had been developed by reusing the equipment of the Musashi reactor and its performance improvement became indispensable for research tools to increase sampling rate with introduction of arithmetic units using multi-Digital Signal Processor(DSP) system (cluster). In order to realize the heterogeneous cluster type multi-processor system computing, combination of two kinds of Control Processor (CP) s, Cluster Control Processor (CCP) and System Control Processor (SCP), were proposed with Large System Control Processor (LSCP) for hierarchical cluster if needed. Faster computing performance of this system was well evaluated by simulation results for simultaneous execution of plural jobs and also pipeline processing between clusters, which showed the system led to effective use of existing system and enhancement of the cost performance. (T. Tanaka)
Parallel Implementation and Scaling of an Adaptive Mesh Discrete Ordinates Algorithm for Transport

International Nuclear Information System (INIS)

Howell, L H

2004-01-01

Block-structured adaptive mesh refinement (AMR) uses a mesh structure built up out of locally-uniform rectangular grids. In the BoxLib parallel framework used by the Raptor code, each processor operates on one or more of these grids at each refinement level. The decomposition of the mesh into grids and the distribution of these grids among processors may change every few timesteps as a calculation proceeds. Finer grids use smaller timesteps than coarser grids, requiring additional work to keep the system synchronized and ensure conservation between different refinement levels. In a paper for NECDC 2002 I presented preliminary results on implementation of parallel transport sweeps on the AMR mesh, conjugate gradient acceleration, accuracy of the AMR solution, and scalar speedup of the AMR algorithm compared to a uniform fully-refined mesh. This paper continues with a more in-depth examination of the parallel scaling properties of the scheme, both in single-level and multi-level calculations. Both sweeping and setup costs are considered. The algorithm scales with acceptable performance to several hundred processors. Trends suggest, however, that this is the limit for efficient calculations with traditional transport sweeps, and that modifications to the sweep algorithm will be increasingly needed as job sizes in the thousands of processors become common
A low power biomedical signal processor ASIC based on hardware software codesign.

Science.gov (United States)

Nie, Z D; Wang, L; Chen, W G; Zhang, T; Zhang, Y T

2009-01-01

A low power biomedical digital signal processor ASIC based on hardware and software codesign methodology was presented in this paper. The codesign methodology was used to achieve higher system performance and design flexibility. The hardware implementation included a low power 32bit RISC CPU ARM7TDMI, a low power AHB-compatible bus, and a scalable digital co-processor that was optimized for low power Fast Fourier Transform (FFT) calculations. The co-processor could be scaled for 8-point, 16-point and 32-point FFTs, taking approximate 50, 100 and 150 clock circles, respectively. The complete design was intensively simulated using ARM DSM model and was emulated by ARM Versatile platform, before conducted to silicon. The multi-million-gate ASIC was fabricated using SMIC 0.18 microm mixed-signal CMOS 1P6M technology. The die area measures 5,000 microm x 2,350 microm. The power consumption was approximately 3.6 mW at 1.8 V power supply and 1 MHz clock rate. The power consumption for FFT calculations was less than 1.5 % comparing with the conventional embedded software-based solution.
Assessing the Progress of Trapped-Ion Processors Towards Fault-Tolerant Quantum Computation

Science.gov (United States)

Bermudez, A.; Xu, X.; Nigmatullin, R.; O'Gorman, J.; Negnevitsky, V.; Schindler, P.; Monz, T.; Poschinger, U. G.; Hempel, C.; Home, J.; Schmidt-Kaler, F.; Biercuk, M.; Blatt, R.; Benjamin, S.; Müller, M.

2017-10-01

A quantitative assessment of the progress of small prototype quantum processors towards fault-tolerant quantum computation is a problem of current interest in experimental and theoretical quantum information science. We introduce a necessary and fair criterion for quantum error correction (QEC), which must be achieved in the development of these quantum processors before their sizes are sufficiently big to consider the well-known QEC threshold. We apply this criterion to benchmark the ongoing effort in implementing QEC with topological color codes using trapped-ion quantum processors and, more importantly, to guide the future hardware developments that will be required in order to demonstrate beneficial QEC with small topological quantum codes. In doing so, we present a thorough description of a realistic trapped-ion toolbox for QEC and a physically motivated error model that goes beyond standard simplifications in the QEC literature. We focus on laser-based quantum gates realized in two-species trapped-ion crystals in high-optical aperture segmented traps. Our large-scale numerical analysis shows that, with the foreseen technological improvements described here, this platform is a very promising candidate for fault-tolerant quantum computation.
Low Cost Design of an Advanced Encryption Standard (AES) Processor Using a New Common-Subexpression-Elimination Algorithm

Science.gov (United States)

Chen, Ming-Chih; Hsiao, Shen-Fu

In this paper, we propose an area-efficient design of Advanced Encryption Standard (AES) processor by applying a new common-expression-elimination (CSE) method to the sub-functions of various transformations required in AES. The proposed method reduces the area cost of realizing the sub-functions by extracting the common factors in the bit-level XOR/AND-based sum-of-product expressions of these sub-functions using a new CSE algorithm. Cell-based implementation results show that the AES processor with our proposed CSE method has significant area improvement compared with previous designs.
Preemptive and Non-Preemptive Real-Time UniProcessor Scheduling

OpenAIRE

George , Laurent; Rivierre , Nicolas; Spuri , Marco

1996-01-01

Projet REFLECS; Scheduling theory, as it applies to hard-real-time environment, has been widely studied in the last twenty years and it might be unclear to make it out within the plethora of results available. Our goal is first to collect in a single paper the results known for uniproces sor, non-idling, preemptive/non-preemptive, fixed/dynamic priority driven contexts, consid ering general task sets as a central figure for the description of possible processor loads. Second to establish new ...
A digital retina-like low-level vision processor.

Science.gov (United States)

Mertoguno, S; Bourbakis, N G

2003-01-01

This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k/spl times/m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.
The TMS34010 graphic processor - an architecture for image visualization in NMR tomography

International Nuclear Information System (INIS)

Slaets, Jan Frans Willem; Paiva, Maria Stela Veludo de; Almeida, Lirio O.B.

1989-01-01

This abstract presents a description of the minimum system implemented with the graphic processor TMS34010, which will be used in the reconstruction, treatment and interpretation f images obtained by NMR tomography. The project is being developed in the LIE (Electronic Instrumentation Laboratory), of the Sao Carlos Chemistry and Physical Institute, S P, Brazil and is already in operation
State-based Communication on Time-predictable Multicore Processors

DEFF Research Database (Denmark)

Sørensen, Rasmus Bo; Schoeberl, Martin; Sparsø, Jens

2016-01-01

Some real-time systems use a form of task-to-task communication called state-based or sample-based communication that does not impose any flow control among the communicating tasks. The concept is similar to a shared variable, where a reader may read the same value multiple times or may not read...... a given value at all. This paper explores time-predictable implementations of state-based communication in network-on-chip based multicore platforms through five algorithms. With the presented analysis of the implemented algorithms, the communicating tasks of one core can be scheduled independently...... of tasks on other cores. Assuming a specific time-predictable multicore processor, we evaluate how the read and write primitives of the five algorithms contribute to the worst-case execution time of the communicating tasks. Each of the five algorithms has specific capabilities that make them suitable...
Data collection from FASTBUS to a DEC UNIBUS processor through the UNIBUS-Processor Interface

International Nuclear Information System (INIS)

Larwill, M.; Barsotti, E.; Lesny, D.; Pordes, R.

1983-01-01

This paper describes the use of the UNIBUS Processor Interface, an interface between FASTBUS and the Digital Equipment Corporation UNIBUS. The UPI was developed by Fermilab and the University of Illinois. Details of the use of this interface in a high energy physics experiment at Fermilab are given. The paper includes a discussion of the operation of the UPI on the UNIBUS of a VAX-11, and plans for using the UPI to perform data acquisition from FASTBUS to a VAX-11 Processor
Atmel's New Rad-Hard Sparc V8 Processor 200Mhz & Low Power System on Chip

Science.gov (United States)

Ganry, Nicolas; Mantelet, Guy; Parkes, Steve; McClements, Chris

2014-08-01

The AT6981 is a new generation of processor designed for critical spaceflight applications, which combines a high-performance SPARC® V8 radiation hard processor, with enough on-chip memory for many aerospace applications and state-of-the-art SpaceWire networking technology from STAR- Dundee. The AT6981 is implemented in Atmel 90nm rad-hard technology, enabling 200 MHz operating speed for the processor with power consumption levels around 1W. This advanced technology allows strong system integration in a SoC with embedded peripherals like CAN, 1553, Ethernet, DDR and embedded memory with 1Mbytes SRAM. The device is ITAR- free and is developed in France by Atmel Aerospace having more than of 30years space experience. This paper describes this new SoC architecture and technical options considered to insure the best performances, the minimum power consumption and high reliability. This device will be available on the market in H2 2014 for evaluation with first flight models targeted end 2015.
Real-Time Adaptive Lossless Hyperspectral Image Compression using CCSDS on Parallel GPGPU and Multicore Processor Systems

Science.gov (United States)

Hopson, Ben; Benkrid, Khaled; Keymeulen, Didier; Aranki, Nazeeh; Klimesh, Matt; Kiely, Aaron

2012-01-01

The proposed CCSDS (Consultative Committee for Space Data Systems) Lossless Hyperspectral Image Compression Algorithm was designed to facilitate a fast hardware implementation. This paper analyses that algorithm with regard to available parallelism and describes fast parallel implementations in software for GPGPU and Multicore CPU architectures. We show that careful software implementation, using hardware acceleration in the form of GPGPUs or even just multicore processors, can exceed the performance of existing hardware and software implementations by up to 11x and break the real-time barrier for the first time for a typical test application.
New development for low energy electron beam processor

International Nuclear Information System (INIS)

Takei, Taro; Goto, Hitoshi; Oizumi, Matsutoshi; Hirakawa, Tetsuya; Ochi, Masafumi

2003-01-01

Newly developed low-energy electron beam (EB) processors that have unique designs and configurations compared to conventional ones enable electron-beam treatment of small three-dimensional objects, such as grain-like agricultural products and small plastic parts. As the EB processor can irradiate the products from the whole angles, the uniform EB treatment can be achieved at one time regardless the complex shapes of the product. Here presented are two new EB processors: the first system has cylindrical process zone, which allows three-dimensional objects to be irradiated with one-pass treatment. The second is a tube-type small EB processor, achieving not only its compactor design, but also higher beam extraction efficiency and flexible installation of the irradiation heads. The basic design of each processor and potential applications with them will be presented in this paper. (author)
Efficient quantum walk on a quantum processor

Science.gov (United States)

Qiang, Xiaogang; Loke, Thomas; Montanaro, Ashley; Aungskunsiri, Kanin; Zhou, Xiaoqi; O'Brien, Jeremy L.; Wang, Jingbo B.; Matthews, Jonathan C. F.

2016-01-01

The random walk formalism is used across a wide range of applications, from modelling share prices to predicting population genetics. Likewise, quantum walks have shown much potential as a framework for developing new quantum algorithms. Here we present explicit efficient quantum circuits for implementing continuous-time quantum walks on the circulant class of graphs. These circuits allow us to sample from the output probability distributions of quantum walks on circulant graphs efficiently. We also show that solving the same sampling problem for arbitrary circulant quantum circuits is intractable for a classical computer, assuming conjectures from computational complexity theory. This is a new link between continuous-time quantum walks and computational complexity theory and it indicates a family of tasks that could ultimately demonstrate quantum supremacy over classical computers. As a proof of principle, we experimentally implement the proposed quantum circuit on an example circulant graph using a two-qubit photonics quantum processor. PMID:27146471
BWR thermohydraulics simulation on the AD-10 peripheral processor

International Nuclear Information System (INIS)

Wulff, W.; Cheng, H.S.; Lekach, S.V.; Mallen, A.N.

1983-01-01

This presentation demonstrates the feasibility of simulating plant transients and severe abnormal transients in nuclear power plants at much faster than real-time computing speeds in a low-cost, dedicated, interactive minicomputer. This is achieved by implementing advanced modeling techniques in modern, special-purpose peripheral processors for high-speed system simulation. The results of this demonstration will impact safety analyses and parametric studies, studies on operator responses and control system failures and it will make possible the continuous on-line monitoring of plant performance and the detection and diagnosis of system or component failures
Addressing Thermal and Performance Variability Issues in Dynamic Processors

Energy Technology Data Exchange (ETDEWEB)

Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Llopis, Pablo [Univ. Carlos III de Madrid (Spain); Zhang, Kaicheng [Northwestern Univ., Evanston, IL (United States); Luo, Yingyi [Northwestern Univ., Evanston, IL (United States); Ogrenci-Memik, Seda [Northwestern Univ., Evanston, IL (United States); Memik, Gokhan [Northwestern Univ., Evanston, IL (United States); Sankaran, Rajesh [Argonne National Lab. (ANL), Argonne, IL (United States); Beckman, Pete [Argonne National Lab. (ANL), Argonne, IL (United States)

2017-03-01

As CMOS scaling nears its end, parameter variations (process, temperature and voltage) are becoming a major concern. To overcome parameter variations and provide stability, modern processors are becoming dynamic, opportunistically adjusting voltage and frequency based on thermal and energy constraints, which negatively impacts traditional bulk-synchronous parallelism-minded hardware and software designs. As node-level architecture is growing in complexity, implementing variation control mechanisms only with hardware can be a challenging task. In this paper we investigate a software strategy to manage hardwareinduced variations, leveraging low-level monitoring/controlling mechanisms.

Reconfigurable Secure Video Codec Based on DWT and AES Processor

OpenAIRE

Rached Tourki; M. Machhout; B. Bouallegue; M. Atri; M. Zeghid; D. Dia

2010-01-01

In this paper, we proposed a secure video codec based on the discrete wavelet transformation (DWT) and the Advanced Encryption Standard (AES) processor. Either, use of video coding with DWT or encryption using AES is well known. However, linking these two designs to achieve secure video coding is leading. The contributions of our work are as follows. First, a new method for image and video compression is proposed. This codec is a synthesis of JPEG and JPEG2000,which is implemented using Huffm...
The Level 0 Trigger Processor for the NA62 experiment

International Nuclear Information System (INIS)

Chiozzi, S.; Gamberini, E.; Gianoli, A.; Mila, G.; Neri, I.; Petrucci, F.; Soldi, D.

2016-01-01

In the NA62 experiment at CERN, the intense flux of particles requires a high-performance trigger for the data acquisition system. A Level 0 Trigger Processor (L0TP) was realized, performing the event selection based on trigger primitives coming from sub-detectors and reducing the trigger rate from 10 to 1 MHz. The L0TP is based on a commercial FPGA device and has been implemented in two different solutions. The performance of the two systems are highlighted and compared.
The Level 0 Trigger Processor for the NA62 experiment

Energy Technology Data Exchange (ETDEWEB)

Chiozzi, S. [INFN, Ferrara (Italy); Gamberini, E. [University of Ferrara and INFN, Ferrara (Italy); Gianoli, A. [INFN, Ferrara (Italy); Mila, G. [University of Turin and INFN, Turin (Italy); Neri, I., E-mail: neri@fe.infn.it [University of Ferrara and INFN, Ferrara (Italy); Petrucci, F. [University of Ferrara and INFN, Ferrara (Italy); Soldi, D. [University of Turin and INFN, Turin (Italy)

2016-07-11

In the NA62 experiment at CERN, the intense flux of particles requires a high-performance trigger for the data acquisition system. A Level 0 Trigger Processor (L0TP) was realized, performing the event selection based on trigger primitives coming from sub-detectors and reducing the trigger rate from 10 to 1 MHz. The L0TP is based on a commercial FPGA device and has been implemented in two different solutions. The performance of the two systems are highlighted and compared.
A data base processor semantics specification package

Science.gov (United States)

Fishwick, P. A.

1983-01-01

A Semantics Specification Package (DBPSSP) for the Intel Data Base Processor (DBP) is defined. DBPSSP serves as a collection of cross assembly tools that allow the analyst to assemble request blocks on the host computer for passage to the DBP. The assembly tools discussed in this report may be effectively used in conjunction with a DBP compatible data communications protocol to form a query processor, precompiler, or file management system for the database processor. The source modules representing the components of DBPSSP are fully commented and included.
Globe hosts launch of new processor

CERN Multimedia

2006-01-01

Launch of the quadecore processor chip at the Globe. On 14 November, in a series of major media events around the world, the chip-maker Intel launched its new 'quadcore' processor. For the regions of Europe, the Middle East and Africa, the day-long launch event took place in CERN's Globe of Science and Innovation, with over 30 journalists in attendance, coming from as far away as Johannesburg and Dubai. CERN was a significant choice for the event: the first tests of this new generation of processor in Europe had been made at CERN over the preceding months, as part of CERN openlab, a research partnership with leading IT companies such as Intel, HP and Oracle. The event also provided the opportunity for the journalists to visit ATLAS and the CERN Computer Centre. The strategy of putting multiple processor cores on the same chip, which has been pursued by Intel and other chip-makers in the last few years, represents an important departure from the more traditional improvements in the sheer speed of such chips. ...
Simulation of a processor switching circuit with APLSV

International Nuclear Information System (INIS)

Dilcher, H.

1979-01-01

The report describes the simulation of a processor switching circuit with APL. Furthermore an APL function is represented to simulate a processor in an assembly like language. Both together serve as a tool for studying processor properties. By means of the programming function it is also possible to program other simulated processors. The processor is to be used in the processing of data in real time analysis that occur in high energy physics experiments. The data are already offered to the computer in digitalized form. A typical data rate is at 10 KB/ sec. The data are structured in blocks. The particular blocks are 1 KB wide and are independent from each other. Aprocessor has to decide, whether the block data belong to an event that is part of the backround noise and can therefore be forgotten, or whether the data should be saved for a later evaluation. (orig./WB) [de
Accuracy Limitations in Optical Linear Algebra Processors

Science.gov (United States)

Batsell, Stephen Gordon

1990-01-01

One of the limiting factors in applying optical linear algebra processors (OLAPs) to real-world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, noise from spatial variations across arrays, and from crosstalk. In this dissertation, we propose a second-order statistical model for an OLAP which incorporates all these system noise sources. We now apply this knowledge to determining upper and lower bounds on the achievable accuracy. This is accomplished by first translating the standard definition of accuracy used in electronic digital processors to analog optical processors. We then employ our second-order statistical model. Having determined a general accuracy equation, we consider limiting cases such as for ideal and noisy components. From the ideal case, we find the fundamental limitations on improving analog processor accuracy. From the noisy case, we determine the practical limitations based on both device and system noise sources. These bounds allow system trade-offs to be made both in the choice of architecture and in individual components in such a way as to maximize the accuracy of the processor. Finally, by determining the fundamental limitations, we show the system engineer when the accuracy desired can be achieved from hardware or architecture improvements and when it must come from signal pre-processing and/or post-processing techniques.
Performance of a Processor for On-Board RFI Detection and Mitigation in MetOpSG Radiometers

DEFF Research Database (Denmark)

Skou, Niels; Kristensen, Steen Savstrup; Lahtinen, J.

2016-01-01

An RFI processor breadboard has been designed and developed for the second generation MetOp satellites. RFI detection is based on the anomalous amplitude, kurtosis, and cross-frequency algorithms. These are implemented in VHDL code in an FPGA. Thus algorithm performance can very well be assessed...... by proper code simulation. Such simulations show that the kurtosis algorithm as implemented works according to theory when subjected to pulsed sinusoidal and QPSK signals....
Description of the signal and background event mixing as implemented in the Marlin processor OverlayTiming

CERN Document Server

Schade, P

2011-01-01

This note documents OverlayTiming, a processor in the Marlin software frame- work. OverlayTiming can model the timing structure of a linear collider bunch train and offers the possibility to merge simulated physics events with beam-beam background events. In addition, a realistic structure of the detector readout can be imitated by defining readout time windows for each subdetector.
A dedicated line-processor as used at the SHF

International Nuclear Information System (INIS)

Bevan, A.V.; Hatley, R.W.; Price, D.R.; Rankin, P.

1985-01-01

A hardwired trigger processor was used at the SLAC Hybrid Facility to find evidence for charged tracks originating from the fiducial volume of a 40'' rapidcycling bubble chamber. Straight-line projections of these tracks in the plane perpendicular to the applied magnetic field were searched for using data from three sets of proportional wire chambers (PWC). This information was made directly available to the processor by means of a special digitizing card. The results memory of the processor simulated read-only memory in a 168/E processor and was accessible by it. The 168/E controlled the issuing of a trigger command to the bubble chamber flash tubes. The same design of digitizer card used by the line processor was incorporated into the 168/E, again as read only memory, which allowed it access to the raw data for continual monitoring of trigger integrity. The design logic of the trigger processor was verified by running real PWC data through a FORTRAN simulation of the hardware. This enabled the debugging to become highly automated since a step by step, computer controlled comparison of processor registers to simulation predictions could be made
A Dual Digital Signal Processor VME Board for Instrumentation and Control Applications

International Nuclear Information System (INIS)

H. Dong; R. Flood; C. Hovater; J. Musson

2001-01-01

A Dual Digital Signal Processing VME Board is being developed for the CEBAF Beam Current Monitor system at Jefferson Lab. It is a versatile general-purpose digital signal processing board using an open architecture, which allows for adaptation to various applications. The base design uses two independent Texas Instrument (TI) TMS320C6711, which are 900 MFLOPS floating-point digital signal processors (DSP). Applications that require a fixed point DSP can be implemented by replacing the baseline DSP with the pin-for-pin compatible TMS320C6211. Both parallel and serial protocols have been implemented for communicating with off board devices. The initial implementation makes use of TI Multi-channel Serial protocol and VME bus protocol. Other communication protocols can be implemented by reprogramming the FPGA. Each DSP is equipped with FLASH PROM and SDRAM for program and data storage. Additionally, each DSP has 16 bits of digital I/O, two digital analog converters, and two analog to digital converters. Dual 160 pins mezzanine connectors provide expansion capability without design modifications. The mezzanine interface conforms to the TI Expansion Daughter Card Interface standard. The design can be manufactured with a reduced chip set without redesigning the printed circuit board. For example, it can be implemented as a single-channel DSP with no analog I/O. The board supports JTAG 1149 boundary scan to facilitate testing, debugging, and programming. It is fully programmable using software development tools such as TI Code Composer Studio and a JTAG emulator such as Spectrum Digital DS510PP-PLUS. Using these tools allows one program the flash memory and FPGA through the JTAG ports, thus eliminating the need for a separate ROM/FPGA programmer. This work supported by U.S. DOE Contract No. DE-AC05-84ER40150
Single Event Upset Analysis: On-orbit performance of the Alpha Magnetic Spectrometer Digital Signal Processor Memory aboard the International Space Station

Science.gov (United States)

Li, Jiaqiang; Choutko, Vitaly; Xiao, Liyi

2018-03-01

Based on the collection of error data from the Alpha Magnetic Spectrometer (AMS) Digital Signal Processors (DSP), on-orbit Single Event Upsets (SEUs) of the DSP program memory are analyzed. The daily error distribution and time intervals between errors are calculated to evaluate the reliability of the system. The particle density distribution of International Space Station (ISS) orbit is presented and the effects from the South Atlantic Anomaly (SAA) and the geomagnetic poles are analyzed. The impact of solar events on the DSP program memory is carried out combining data analysis and Monte Carlo simulation (MC). From the analysis and simulation results, it is concluded that the area corresponding to the SAA is the main source of errors on the ISS orbit. Solar events can also cause errors on DSP program memory, but the effect depends on the on-orbit particle density.
Implementation of single qubit in QD ensembles

International Nuclear Information System (INIS)

Alegre, T.P. Mayer

2004-01-01

Full text: During the last decades the semiconductor industry has achieved the production of exponentially shrinking components. This fact points to fundamental limits of integration, making computation with single atoms or particles like an electron an ultimate goal. To get to this limit, quantum systems in solid state have to be manipulated in a controllable fashion. The assessment of quantum degrees of freedom for information processing may allow exponentially faster performance for certain classes of problems. The essential aspect to be explored in quantum information processing resides in the superposition of states that allows resources such as entangled states to be envisaged. The quest for the optimal system to host a quantum variable that is sufficiently isolated from the environment encompasses implementations spanning optical, atomic, molecular and solid state systems. In the solid state, a variety of proposals have come forth, each one having its own advantages and disadvantages. The main conclusion from these e efforts is that there is no decisive technology upon which quantum information devices will be built. Self-assembled quantum dots (SAQDs or QDs), can be grown with size uniformity that enables the observation of single electron loading events. They can in turn be used to controllably trap single electrons into discrete levels, atom-like, with their corresponding shells. Hund's rules and Pauli exclusion principle are observed in these nanostructures and are key in allowing and preserving a particular quantum state. Provided that one can trap one electron in a QD ensemble, the corresponding spin can be manipulated by an external magnetic field by either conventional Electron Spin Resonance (ESR) techniques or g-tensor modulation resonance (g-TMR). By analogy with Nuclear Magnetic Resonance, single qubit operations are proposed, which at some point in time should be scaled, provided that spin-spin interactions can be controlled. Read out can be
Consumer Electronics Processors for Critical Real-Time Systems: a (Failed) Practical Experience

OpenAIRE

Fernandez , Gabriel; Cazorla , Francisco; Abella , Jaume

2018-01-01

International audience; The convergence between consumer electronics and critical real-time markets has increased the need for hardware platforms able to deliver high performance as well as high (sustainable) performance guarantees. Using the ARM big.LITTLE architecture as example of those platforms, in this paper we report our experience with one of its implementations (the Qualcomm SnapDragon 810 processor) to derive performance bounds with measurement-based techniques. Our theoretical and ...
Evaluation of the Sentinel-3 Hydrologic Altimetry Processor prototypE (SHAPE) methods.

Science.gov (United States)

Benveniste, J.; Garcia-Mondéjar, A.; Bercher, N.; Fabry, P. L.; Roca, M.; Varona, E.; Fernandes, J.; Lazaro, C.; Vieira, T.; David, G.; Restano, M.; Ambrózio, A.

2017-12-01

Inland water scenes are highly variable, both in space and time, which leads to a much broader range of radar signatures than ocean surfaces. This applies to both LRM and "SAR" mode (SARM) altimetry. Nevertheless the enhanced along-track resolution of SARM altimeters should help improve the accuracy and precision of inland water height measurements from satellite. The SHAPE project - Sentinel-3 Hydrologic Altimetry Processor prototypE - which is funded by ESA through the Scientific Exploitation of Operational Missions Programme Element (contract number 4000115205/15/I-BG) aims at preparing for the exploitation of Sentinel-3 data over the inland water domain. The SHAPE Processor implements all of the steps necessary to derive rivers and lakes water levels and discharge from Delay-Doppler Altimetry and perform their validation against in situ data. The processor uses FBR CryoSat-2 and L1A Sentinel-3A data as input and also various ancillary data (proc. param., water masks, L2 corrections, etc.), to produce surface water levels. At a later stage, water level data are assimilated into hydrological models to derive river discharge. This poster presents the improvements obtained with the new methods and algorithms over the regions of interest (Amazon and Danube rivers, Vanern and Titicaca lakes).
Embedded processor extensions for image processing

Science.gov (United States)

Thevenin, Mathieu; Paindavoine, Michel; Letellier, Laurent; Heyrman, Barthélémy

2008-04-01

The advent of camera phones marks a new phase in embedded camera sales. By late 2009, the total number of camera phones will exceed that of both conventional and digital cameras shipped since the invention of photography. Use in mobile phones of applications like visiophony, matrix code readers and biometrics requires a high degree of component flexibility that image processors (IPs) have not, to date, been able to provide. For all these reasons, programmable processor solutions have become essential. This paper presents several techniques geared to speeding up image processors. It demonstrates that a gain of twice is possible for the complete image acquisition chain and the enhancement pipeline downstream of the video sensor. Such results confirm the potential of these computing systems for supporting future applications.
One-Chip Solution to Intelligent Robot Control: Implementing Hexapod Subsumption Architecture Using a Contemporary Microprocessor

Directory of Open Access Journals (Sweden)

Nikita Pashenkov

2008-11-01

Full Text Available This paper introduces a six-legged autonomous robot managed by a single controller and a software core modeled on subsumption architecture. We begin by discussing the features and capabilities of IsoPod, a new processor for robotics which has enabled a streamlined implementation of our project. We argue that this processor offers a unique set of hardware and software features, making it a practical development platform for robotics in general and for subsumption-based control architectures in particular. Next, we summarize original ideas on subsumption architecture implementation for a six-legged robot, as presented by its inventor Rodney Brooks in 1980's. A comparison is then made to a more recent example of a hexapod control architecture based on subsumption. The merits of both systems are analyzed and a new subsumption architecture layout is formulated as a response. We conclude with some remarks regarding the development of this project as a hint at new potentials for intelligent robot design, opened up by a recent development in embedded controller market.
Run Control Communication for the Upgrade of the ATLAS Muon-to-Central-Trigger-Processor Interface (MUCTPI)

CERN Document Server

AUTHOR|(INSPIRE)INSPIRE-00223859; The ATLAS collaboration; Armbruster, Aaron James; Carrillo-Montoya, German D.; Chelstowska, Magda Anna; Czodrowski, Patrick; Deviveiros, Pier-Olivier; Eifert, Till; Ellis, Nicolas; Galster, Gorm Aske Gram Krohn; Haas, Stefan; Helary, Louis; Lagkas Nikolos, Orestis; Marzin, Antoine; Pauly, Thilo; Ryjov, Vladimir; Schmieden, Kristof; Silva Oliveira, Marcos Vinicius; Stelzer, Harald Joerg; Vichoudis, Paschalis; Wengler, Thorsten; Farthouat, Philippe

2018-01-01

The Muon-to-Central Trigger Processor Interface (MUCTPI) of the ATLAS experiment at the Large Hadron Collider (LHC) at CERN will be upgraded to an ATCA blade system for Run 3. The new design requires development of new communication models for control, configuration and monitoring. A System-on-Chip (SoC) with a programmable logic part and a processor part will be used for communication to the run control system and to the MUCTPI processing FPGAs. Different approaches have been compared. First, we tried an available UDP-based implementation in firmware for the programmable logic. Although this approach works as expected, it does not provide any flexibility to extend the functionality to more complex operations, e.g. for serial protocols. Second, we used the SoC processor with an embedded Linux operating system and an application-specific software written in C++ using a TCP remote-procedure-call approach. The software is built and maintained using the Yocto/OpenEmbedded framework. This approach was successfully...
Run control communication for the upgrade of the ATLAS Muon-to-Central Trigger Processor Interface (MUCTPI)

CERN Document Server

AUTHOR|(INSPIRE)INSPIRE-00223859; The ATLAS collaboration; Armbruster, Aaron James; Carrillo-Montoya, German D.; Chelstowska, Magda Anna; Czodrowski, Patrick; Deviveiros, Pier-Olivier; Eifert, Till; Ellis, Nicolas; Farthouat, Philippe; Galster, Gorm Aske Gram Krohn; Haas, Stefan; Helary, Louis; Lagkas Nikolos, Orestis; Marzin, Antoine; Pauly, Thilo; Ryjov, Vladimir; Schmieden, Kristof; Silva Oliveira, Marcos Vinicius; Stelzer, Harald Joerg; Vichoudis, Paschalis; Wengler, Thorsten

The Muon-to-Central-Trigger-Processor Interface (MUCTPI) of the ATLAS experiment at the Large Hadron Collider (LHC) at CERN will be upgraded to an ATCA blade system for Run 3, starting in 2021. The new design requires development of new communication models for control, configuration and monitoring. A System-on-Chip (SoC) with a programmable logic part and a processor part will be used for communication to the run control system and to the MUCTPI processing FPGAs. Different approaches have been compared. First, we tried an available UDP-based implementation in firmware for the programmable logic. Although this approach works as expected, it does not provide any flexibility to extend the functionality to more complex operations, e.g. for serial protocols. Second, we used a SoC processor with an embedded Linux operating system and an application-specific software written in C++ using a TCP remote-procedure-call approach. The software is built and maintained using the framework of the Yocto Project. This approa...
An Alternative Water Processor for Long Duration Space Missions

Science.gov (United States)

Barta, Daniel J.; Pickering, Karen D.; Meyer, Caitlin; Pennsinger, Stuart; Vega, Leticia; Flynn, Michael; Jackson, Andrew; Wheeler, Raymond

2014-01-01

A new wastewater recovery system has been developed that combines novel biological and physicochemical components for recycling wastewater on long duration human space missions. Functionally, this Alternative Water Processor (AWP) would replace the Urine Processing Assembly on the International Space Station and reduce or eliminate the need for the multi-filtration beds of the Water Processing Assembly (WPA). At its center are two unique game changing technologies: 1) a biological water processor (BWP) to mineralize organic forms of carbon and nitrogen and 2) an advanced membrane processor (Forward Osmosis Secondary Treatment) for removal of solids and inorganic ions. The AWP is designed for recycling larger quantities of wastewater from multiple sources expected during future exploration missions, including urine, hygiene (hand wash, shower, oral and shave) and laundry. The BWP utilizes a single-stage membrane-aerated biological reactor for simultaneous nitrification and denitrification. The Forward Osmosis Secondary Treatment (FOST) system uses a combination of forward osmosis (FO) and reverse osmosis (RO), is resistant to biofouling and can easily tolerate wastewaters high in non-volatile organics and solids associated with shower and/or hand washing. The BWP has been operated continuously for over 300 days. After startup, the mature biological system averaged 85% organic carbon removal and 44% nitrogen removal, close to stoichiometric maximum based on available carbon. To date, the FOST has averaged 93% water recovery, with a maximum of 98%. If the wastewater is slighty acidified, ammonia rejection is optimal. This paper will provide a description of the technology and summarize results from ground-based testing using real wastewater

Realization Of Algebraic Processor For XML Documents Processing

International Nuclear Information System (INIS)

Georgiev, Bozhidar; Georgieva, Adriana

2010-01-01

In this paper, are presented some possibilities concerning the implementation of an algebraic method for XML hierarchical data processing which makes faster the XML search mechanism. Here is offered a different point of view for creation of advanced algebraic processor (with all necessary software tools and programming modules respectively). Therefore, this nontraditional approach for fast XML navigation with the presented algebraic processor may help to build an easier user-friendly interface provided XML transformations, which can avoid the difficulties in the complicated language constructions of XSL, XSLT and XPath. This approach allows comparatively simple search of XML hierarchical data by means of the following types of functions: specification functions and so named build-in functions. The choice of programming language Java may appear strange at first, but it isn't when you consider that the applications can run on different kinds of computers. The specific search mechanism based on the linear algebra theory is faster in comparison with MSXML parsers (on the basis of the developed examples with about 30%). Actually, there exists the possibility for creating new software tools based on the linear algebra theory, which cover the whole navigation and search techniques characterizing XSLT/XPath. The proposed method is able to replace more complicated operations in other SOA components.
Optimal control of a head-of-line processor sharing model with regular and opportunity customers

NARCIS (Netherlands)

Wijk, van A.C.C.

2011-01-01

Motivated by a workload control setting, we study a model where two types of customers are served by a single server according to the head-of-line processor sharing discipline. Regular customers and opportunity customers are arriving to the system according to two independent Poisson processes, each
Sojourn time tails in processor-sharing systems

NARCIS (Netherlands)

Egorova, R.R.

2009-01-01

The processor-sharing discipline was originally introduced as a modeling abstraction for the design and performance analysis of the processing unit of a computer system. Under the processor-sharing discipline, all active tasks are assumed to be processed simultaneously, receiving an equal share of
An interactive parallel processor for data analysis

International Nuclear Information System (INIS)

Mong, J.; Logan, D.; Maples, C.; Rathbun, W.; Weaver, D.

1984-01-01

A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, the authors have been able to achieve computer amplification linearly proportional to the number of executing processors
Optical Array Processor: Laboratory Results

Science.gov (United States)

Casasent, David; Jackson, James; Vaerewyck, Gerard

1987-01-01

A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) is described and laboratory results on its performance in several practical engineering problems are presented. The applications include its use in the solution of a nonlinear matrix equation for optimal control and a parabolic Partial Differential Equation (PDE), the transient diffusion equation with two spatial variables. Frequency-multiplexed, analog and high accuracy non-base-two data encoding are used and discussed. A multi-processor OLAP architecture is described and partitioning and data flow issues are addressed.
XOP: A second generation fast processor for on-line use in high energy physics experiments

International Nuclear Information System (INIS)

Lingjaerde, T.

1981-01-01

Processors for trigger calculations and data compression in high energy physics are characterized by a high data input capability combined with fas execution of relatively simple routines. In order to achieve the required performance it is advantageous to replace the classical computer instruction-set by microcoded instructions, the various fields of which control the internal subunits in parallel. The fast processor called ESOP is based on such a principle: the different operations are handled step by step by dedicated optimized modules under control of a central instruction unit. Thus, the arithmetic operations, address calculations, conditional checking, loop counts and next instruction evaluation all overlap in time. Based upon the experience from ESOP the architecture of a new processor 'XOP' is beginning to take shape which will be faster and easier to use. In this context the most important innovations are: easy handling of operands in the arithmetic unit by means of three data buses and large data files, a powerful data addressing unit for easy handling of vectors, as well as single operands, and a very flexible logic for conditional branching. Input/output will be made transparent through the introduction of internal fast processors which will be used in conjunction with powerful firmware as a software debugging aid. (orig.)
Resource and Performance Evaluations of Fixed Point QRD-RLS Systolic Array through FPGA Implementation

Science.gov (United States)

Yokoyama, Yoshiaki; Kim, Minseok; Arai, Hiroyuki

At present, when using space-time processing techniques with multiple antennas for mobile radio communication, real-time weight adaptation is necessary. Due to the progress of integrated circuit technology, dedicated processor implementation with ASIC or FPGA can be employed to implement various wireless applications. This paper presents a resource and performance evaluation of the QRD-RLS systolic array processor based on fixed-point CORDIC algorithm with FPGA. In this paper, to save hardware resources, we propose the shared architecture of a complex CORDIC processor. The required precision of internal calculation, the circuit area for the number of antenna elements and wordlength, and the processing speed will be evaluated. The resource estimation provides a possible processor configuration with a current FPGA on the market. Computer simulations assuming a fading channel will show a fast convergence property with a finite number of training symbols. The proposed architecture has also been implemented and its operation was verified by beamforming evaluation through a radio propagation experiment.
A Multiprogramming Aware OpenMP Implementation

Directory of Open Access Journals (Sweden)

Vasileios K. Barekas

2003-01-01

Full Text Available In this work, we present an OpenMP implementation suitable for multiprogrammed environments on Intel-based SMP systems. This implementation consists of a runtime system and a resource manager, while we use the NanosCompiler to transform OpenMP-coded applications into code with calls to our runtime system. The resource manager acts as the operating system scheduler for the applications built with our runtime system. It executes a custom made scheduling policy to distribute the available physical processors to the active applications. The runtime system cooperates with the resource manager in order to adapt each application's generated parallelism to the number of processors allocated to it, according to the resource manager scheduling policy. We use the OpenMP version of the NAS Parallel Benchmark suite in order to evaluate the performance of our implementation. In our experiments we compare the performance of our implementation with that of a commercial OpenMP implementation. The comparison proves that our approach performs better both on a dedicated and on a heavily multiprogrammed environment.
Fast processor for dilepton triggers

International Nuclear Information System (INIS)

Katsanevas, S.; Kostarakis, P.; Baltrusaitis, R.

1983-01-01

We describe a fast trigger processor, developed for and used in Fermilab experiment E-537, for selecting high-mass dimuon events produced by negative pions and anti-protons. The processor finds candidate tracks by matching hit information received from drift chambers and scintillation counters, and determines their momenta. Invariant masses are calculated for all possible pairs of tracks and an event is accepted if any invariant mass is greater than some preselectable minimum mass. The whole process, accomplished within 5 to 10 microseconds, achieves up to a ten-fold reduction in trigger rate
Design of RISC Processor Using VHDL and Cadence

Science.gov (United States)

Moslehpour, Saeid; Puliroju, Chandrasekhar; Abu-Aisheh, Akram

The project deals about development of a basic RISC processor. The processor is designed with basic architecture consisting of internal modules like clock generator, memory, program counter, instruction register, accumulator, arithmetic and logic unit and decoder. This processor is mainly used for simple general purpose like arithmetic operations and which can be further developed for general purpose processor by increasing the size of the instruction register. The processor is designed in VHDL by using Xilinx 8.1i version. The present project also serves as an application of the knowledge gained from past studies of the PSPICE program. The study will show how PSPICE can be used to simplify massive complex circuits designed in VHDL Synthesis. The purpose of the project is to explore the designed RISC model piece by piece, examine and understand the Input/ Output pins, and to show how the VHDL synthesis code can be converted to a simplified PSPICE model. The project will also serve as a collection of various research materials about the pieces of the circuit.
Efficient Backprojection-Based Synthetic Aperture Radar Computation with Many-Core Processors

Directory of Open Access Journals (Sweden)

Jongsoo Park

2013-01-01

Full Text Available Tackling computationally challenging problems with high efficiency often requires the combination of algorithmic innovation, advanced architecture, and thorough exploitation of parallelism. We demonstrate this synergy through synthetic aperture radar (SAR via backprojection, an image reconstruction method that can require hundreds of TFLOPS. Computation cost is significantly reduced by our new algorithm of approximate strength reduction; data movement cost is economized by software locality optimizations facilitated by advanced architecture support; parallelism is fully harnessed in various patterns and granularities. We deliver over 35 billion backprojections per second throughput per compute node on an Intel® Xeon® processor E5-2670-based cluster, equipped with Intel® Xeon Phi™ coprocessors. This corresponds to processing a 3K×3K image within a second using a single node. Our study can be extended to other settings: backprojection is applicable elsewhere including medical imaging, approximate strength reduction is a general code transformation technique, and many-core processors are emerging as a solution to energy-efficient computing.
Real time processor for array speckle interferometry

Science.gov (United States)

Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

1989-02-01

The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
A seasonal model of contracts between a monopsonistic processor and smallholder pepper producers in Costa Rica

NARCIS (Netherlands)

Sáenz Segura, F.; Haese, D' M.F.C.; Schipper, R.A.

2010-01-01

We model the contractual arrangements between smallholder pepper (Piper nigrum L.) producers and a single processor in Costa Rica. Producers in the El Roble settlement sell their pepper to only one processing firm, which exerts its monopsonistic bargaining power by setting the purchase price of
Design and implementation of an ASIP-based cryptography processor for AES, IDEA, and MD5

OpenAIRE

Karim Shahbazi; Mohammad Eshghi; Reza Faghih Mirzaee

2017-01-01

In this paper, a new 32-bit ASIP-based crypto processor for AES, IDEA, and MD5 is designed. The instruction-set consists of both general purpose and specific instructions for the above cryptographic algorithms. The proposed architecture has nine function units and two data buses. It has also two types of 32-bit instruction formats for executing Memory Reference (M.R.), Register Reference (R.R.), and Input/Output Reference (I/O R.) instructions. The maximum achieved frequency is 166.916 MHz. T...
Design of a dedicated processor for AC motor control implemented in a low cost FPGA

DEFF Research Database (Denmark)

Jakobsen, Uffe; Matzen, Torben N.

2008-01-01

of drives. Furthermore the softcore processor is designed with a system for plug in of external logic. Doing so shortens development time, since functionality is simply added to or removed from the softcore. The designer can then choose between resource usage on the FPGA and execution speed in more degrees....... The approach is tested for two different motor types, synchronousand hybrid switched reluctance motors, using a Spartan 3E FPGA. The impact of having ADC-communication in VHDL versus in assembler is also presented....
The Topological Processor for the future ATLAS Level-1 Trigger

CERN Document Server

Kahra, C; The ATLAS collaboration

2014-01-01

ATLAS is an experiment on the Large Hadron Collider (LHC), located at the European Organization for Nuclear Research (CERN) in Switzerland. By 2015 the LHC instantaneous luminosity will be increased from $10^{34}$ up to $3\\cdot 10^{34} \\mathrm{cm}^{-2} \\mathrm{s}^{-1}$. This places stringent operational and physical requirements on the ATLAS Trigger in order to reduce the 40MHz collision rate to a manageable event storage rate of 1kHz while at the same time, selecting those events that contain interesting physics events. The Level-1 Trigger is the first rate-reducing step in the ATLAS Trigger, with an output rate of 100kHz and decision latency of less than $2.5 \\mu \\mathrm{s}$. It is composed of the Calorimeter Trigger, the Muon Trigger and the Central Trigger Processor (CTP). In 2014, there will be a new electronics module: the Topological Processor (L1Topo). The L1Topo will make it possible, for the first time, to use detailed information from subdetectors in a single Level-1 module. This allows the determi...
Vector and parallel processors in computational science

International Nuclear Information System (INIS)

Duff, I.S.; Reid, J.K.

1985-01-01

These proceedings contain the articles presented at the named conference. These concern hardware and software for vector and parallel processors, numerical methods and algorithms for the computation on such processors, as well as applications of such methods to different fields of physics and related sciences. See hints under the relevant topics. (HSI)
Single particle detecting telescope system

International Nuclear Information System (INIS)

Yamamoto, I.; Tomiyama, T.; Iga, Y.; Komatsubara, T.; Kanada, M.; Yamashita, Y.; Wada, T.; Furukawa, S.

1981-01-01

We constructed the single particle detecting telescope system for detecting a fractionally charged particle. The telescope consists of position detecting counters, wall-less multi-cell chambers, single detecting circuits and microcomputer system as data I/0 processor. Especially, a frequency of double particle is compared the case of the single particle detecting with the case of an ordinary measurement
GPU: the biggest key processor for AI and parallel processing

Science.gov (United States)

Baji, Toru

2017-07-01

Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.
An accurate projection algorithm for array processor based SPECT systems

International Nuclear Information System (INIS)

King, M.A.; Schwinger, R.B.; Cool, S.L.

1985-01-01

A data re-projection algorithm has been developed for use in single photon emission computed tomography (SPECT) on an array processor based computer system. The algorithm makes use of an accurate representation of pixel activity (uniform square pixel model of intensity distribution), and is rapidly performed due to the efficient handling of an array based algorithm and the Fast Fourier Transform (FFT) on parallel processing hardware. The algorithm consists of using a pixel driven nearest neighbour projection operation to an array of subdivided projection bins. This result is then convolved with the projected uniform square pixel distribution before being compressed to original bin size. This distribution varies with projection angle and is explicitly calculated. The FFT combined with a frequency space multiplication is used instead of a spatial convolution for more rapid execution. The new algorithm was tested against other commonly used projection algorithms by comparing the accuracy of projections of a simulated transverse section of the abdomen against analytically determined projections of that transverse section. The new algorithm was found to yield comparable or better standard error and yet result in easier and more efficient implementation on parallel hardware. Applications of the algorithm include iterative reconstruction and attenuation correction schemes and evaluation of regions of interest in dynamic and gated SPECT

Toward an Ultralow-Power Onboard Processor for Tongue Drive System.

Science.gov (United States)

Viseh, Sina; Ghovanloo, Maysam; Mohsenin, Tinoosh

2015-02-01

The Tongue Drive System (TDS) is a new unobtrusive, wireless, and wearable assistive device that allows for real-time tracking of the voluntary tongue motion in the oral space for communication, control, and navigation applications. The latest TDS prototype appears as a wireless headphone and has been tested in human subject trials. However, the robustness of the external TDS (eTDS) in real-life outdoor conditions may not meet safety regulations because of the limited mechanical stability of the headset. The intraoral TDS (iTDS), which is in the shape of a dental retainer, firmly clasps to the upper teeth and resists sensor misplacement. However, the iTDS has more restrictions on its dimensions, limiting the battery size and consequently requiring a considerable reduction in its power consumption to operate over an extended period of two days on a single charge. In this brief, we propose an ultralow-power local processor for the TDS that performs all signal processing on the transmitter side, following the sensors. Assuming the TDS user on average issuing one command/s, implementing the computational engine reduces the data volume that needs to be wirelessly transmitted to a PC or smartphone by a factor of 1500×, from 12 kb/s to ~8 b/s. The proposed design is implemented on an ultralow-power IGLOO nano field-programmable gate array (FPGA) and is tested on AGLN250 prototype board. According to our post-place-and-route results, implementing the engine on the FPGA significantly drops the required data transmission, while an application-specific integrated circuit (ASIC) implementation in a 65-nm CMOS results in a 15× power saving compared to the FPGA solution and occupies a 0.02-mm 2 footprint. As a result, the power consumption and size of the iTDS will be significantly reduced through the use of a much smaller rechargeable battery. Moreover, the system can operate longer following every recharge, improving the iTDS usability.
Associative Memory Design for the Fast TracKer Processor (FTK)at ATLAS

CERN Document Server

Annovi, A; The ATLAS collaboration; Beretta, M; Bossini, E; Crescioli, F; Dell'Orso, M; Giannetti, P; Hoff, J; Liu, T; Liberali, V; Sacco, I; Schoening, A; Soltveit, H K; Stabile, A; Tripiccione, R

2011-01-01

We describe a VLSI processor for pattern recognition based on Content Addressable Memory (CAM) architecture, optimized for on-line track finding in high-energy physics experiments. A large CAM bank stores all trajectories of interest and extracts the ones compatible with a given event. This task is naturally parallelized by a CAM architecture able to output identified trajectories, recognized among a huge amount of possible combinations, in just a few 100 MHz clock cycles. We have developed this device (called the AMchip03 processor), using 180 nm technology, for the Silicon Vertex Trigger (SVT) upgrade at CDF [1] using a standard-cell VLSI design methodology. We propose a new design that introduces a full custom CAM cell and takes advantage of 65 nm technology. The customized design maximizes the pattern density, minimizes the power consumption and implements the functionalities needed for the planned Fast Tracker (FTK) [2], an ATLAS trigger upgrade project at LHC. We introduce a new variable resolution patt...
SP-100 Position Multiplexer and Analog Input Processor

International Nuclear Information System (INIS)

Syed, A.; Gilliland, K.; Shukla, J.N.

1992-01-01

This paper describes the design, implementation, and performance test results of an engineering model of the Position Multiplexer (MUX)-Analog Input Processor (AIP) System for the transmission and continuous measurements of Reflector Control Drive position in SP-100. This paper describes the work performed to determine the practical circuit limitations, investigate the circuit/component degradation of the multiplexer due to radiation, develop an interference cancellation technique, and evaluate the measurement accuracy as a function of resolver angle, temperature, radiation, and interference. The system developed performs a complex cross-correlation between the resolver excitation and the resolver sine cosine outputs, from which the precise resolver amplitude and phase can be determined while simultaneously eliminating virtually all uncorrelated interference
Acceleration of iterative tomographic reconstruction using graphics processors

International Nuclear Information System (INIS)

Belzunce, M.A.; Osorio, A.; Verrastro, C.A.

2009-01-01

Using iterative algorithms for image reconstruction in 3 D Positron Emission Tomography has shown to produce images with better quality than analytical methods. How ever, these algorithms are computationally expensive. New Graphic Processor Units (GPU) provides high performance at low cost and also programming tools that make possible to execute parallel algorithms easily in scientific applications. In this work, we try to achieve an acceleration of image reconstruction algorithms in 3 D PET by using a GPU. A parallel implementation of the algorithm ML-EM 3 D was developed using Siddon algorithm as Projector and Back-projector. Results show that accelerations of more than one order of magnitude can be achieved, keeping similar image quality. (author)
Logistic Fuel Processor Development

National Research Council Canada - National Science Library

Salavani, Reza

2004-01-01

The Air Base Technologies Division of the Air Force Research Laboratory has developed a logistic fuel processor that removes the sulfur content of the fuel and in the process converts logistic fuel...
Design And Implementation of Low Area/Power Elliptic Curve Digital Signature Hardware Core

Directory of Open Access Journals (Sweden)

Anissa Sghaier

2017-06-01

Full Text Available The Elliptic Curve Digital Signature Algorithm(ECDSA is the analog to the Digital Signature Algorithm(DSA. Based on the elliptic curve, which uses a small key compared to the others public-key algorithms, ECDSA is the most suitable scheme for environments where processor power and storage are limited. This paper focuses on the hardware implementation of the ECDSA over elliptic curveswith the 163-bit key length recommended by the NIST (National Institute of Standards and Technology. It offers two services: signature generation and signature verification. The proposed processor integrates an ECC IP, a Secure Hash Standard 2 IP (SHA-2 Ip and Random Number Generator IP (RNG IP. Thus, all IPs will be optimized, and different types of RNG will be implemented in order to choose the most appropriate one. A co-simulation was done to verify the ECDSA processor using MATLAB Software. All modules were implemented on a Xilinx Virtex 5 ML 50 FPGA platform; they require respectively 9670 slices, 2530 slices and 18,504 slices. FPGA implementations represent generally the first step for obtaining faster ASIC implementations. Further, the proposed design was also implemented on an ASIC CMOS 45-nm technology; it requires a 0.257 mm2 area cell achieving a maximum frequency of 532 MHz and consumes 63.444 (mW. Furthermore, in this paper, we analyze the security of our proposed ECDSA processor against the no correctness check for input points and restart attacks.
DBPQL: A view-oriented query language for the Intel Data Base Processor

Science.gov (United States)

Fishwick, P. A.

1983-01-01

An interactive query language (BDPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language, DBPQL, includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.
Compact optical processor for Hough and frequency domain features

Science.gov (United States)

Ott, Peter

1996-11-01

Shape recognition is necessary in a broad band of applications such as traffic sign or work piece recognition. It requires not only neighborhood processing of the input image pixels but global interconnection of them. The Hough transform (HT) performs such a global operation and it is well suited in the preprocessing stage of a shape recognition system. Translation invariant features can be easily calculated form the Hough domain. We have implemented on the computer a neural network shape recognition system which contains a HT, a feature extraction, and a classification layer. The advantage of this approach is that the total system can be optimized with well-known learning techniques and that it can explore the parallelism of the algorithms. However, the HT is a time consuming operation. Parallel, optical processing is therefore advantageous. Several systems have been proposed, based on space multiplexing with arrays of holograms and CGH's or time multiplexing with acousto-optic processors or by image rotation with incoherent and coherent astigmatic optical processors. We took up the last mentioned approach because 2D array detectors are read out line by line, so a 2D detector can achieve the same speed and is easier to implement. Coherent processing can allow the implementation of tilers in the frequency domain. Features based on wedge/ring, Gabor, or wavelet filters have been proven to show good discrimination capabilities for texture and shape recognition. The astigmatic lens system which is derived form the mathematical formulation of the HT is long and contains a non-standard, astigmatic element. By methods of lens transformation s for coherent applications we map the original design to a shorter lens with a smaller number of well separated standard elements and with the same coherent system response. The final lens design still contains the frequency plane for filtering and ray-tracing shows diffraction limited performance. Image rotation can be done
Fixed switching frequency applied in single-phase boost AC to DC converter

International Nuclear Information System (INIS)

Chen, T.-C.; Ren, T.-J.; Ou, J.-C.

2009-01-01

The fixed switching frequency control for a single-phase boost AC to DC converter to achieve a sinusoidal line current and unity power factor is proposed in this paper. The relation between the line current error and the fixed switching frequency was developed. For a limit line current error, the minimum switching frequency for a boost AC to DC converter can be achieved. The proposed scheme was implemented using a 32-bit digital signal processor TMS320C32. Simulations and experimental results demonstrate the feasibility and fast dynamic response of the proposed control strategy.
Block iterative restoration of astronomical images with the massively parallel processor

International Nuclear Information System (INIS)

Heap, S.R.; Lindler, D.J.

1987-01-01

A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images
Real time monitoring of electron processors

International Nuclear Information System (INIS)

Nablo, S.V.; Kneeland, D.R.; McLaughlin, W.L.

1995-01-01

A real time radiation monitor (RTRM) has been developed for monitoring the dose rate (current density) of electron beam processors. The system provides continuous monitoring of processor output, electron beam uniformity, and an independent measure of operating voltage or electron energy. In view of the device's ability to replace labor-intensive dosimetry in verification of machine performance on a real-time basis, its application to providing archival performance data for in-line processing is discussed. (author)
Onboard Data Processors for Planetary Ice-Penetrating Sounding Radars

Science.gov (United States)

Tan, I. L.; Friesenhahn, R.; Gim, Y.; Wu, X.; Jordan, R.; Wang, C.; Clark, D.; Le, M.; Hand, K. P.; Plaut, J. J.

2011-12-01

Among the many concerns faced by outer planetary missions, science data storage and transmission hold special significance. Such missions must contend with limited onboard storage, brief data downlink windows, and low downlink bandwidths. A potential solution to these issues lies in employing onboard data processors (OBPs) to convert raw data into products that are smaller and closely capture relevant scientific phenomena. In this paper, we present the implementation of two OBP architectures for ice-penetrating sounding radars tasked with exploring Europa and Ganymede. Our first architecture utilizes an unfocused processing algorithm extended from the Mars Advanced Radar for Subsurface and Ionosphere Sounding (MARSIS, Jordan et. al. 2009). Compared to downlinking raw data, we are able to reduce data volume by approximately 100 times through OBP usage. To ensure the viability of our approach, we have implemented, simulated, and synthesized this architecture using both VHDL and Matlab models (with fixed-point and floating-point arithmetic) in conjunction with Modelsim. Creation of a VHDL model of our processor is the principle step in transitioning to actual digital hardware, whether in a FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and successful simulation and synthesis strongly indicate feasibility. In addition, we examined the tradeoffs faced in the OBP between fixed-point accuracy, resource consumption, and data product fidelity. Our second architecture is based upon a focused fast back projection (FBP) algorithm that requires a modest amount of computing power and on-board memory while yielding high along-track resolution and improved slope detection capability. We present an overview of the algorithm and details of our implementation, also in VHDL. With the appropriate tradeoffs, the use of OBPs can significantly reduce data downlink requirements without sacrificing data product fidelity. Through the development
MAP3D: a media processor approach for high-end 3D graphics

Science.gov (United States)

Darsa, Lucia; Stadnicki, Steven; Basoglu, Chris

1999-12-01

Equator Technologies, Inc. has used a software-first approach to produce several programmable and advanced VLIW processor architectures that have the flexibility to run both traditional systems tasks and an array of media-rich applications. For example, Equator's MAP1000A is the world's fastest single-chip programmable signal and image processor targeted for digital consumer and office automation markets. The Equator MAP3D is a proposal for the architecture of the next generation of the Equator MAP family. The MAP3D is designed to achieve high-end 3D performance and a variety of customizable special effects by combining special graphics features with high performance floating-point and media processor architecture. As a programmable media processor, it offers the advantages of a completely configurable 3D pipeline--allowing developers to experiment with different algorithms and to tailor their pipeline to achieve the highest performance for a particular application. With the support of Equator's advanced C compiler and toolkit, MAP3D programs can be written in a high-level language. This allows the compiler to successfully find and exploit any parallelism in a programmer's code, thus decreasing the time to market of a given applications. The ability to run an operating system makes it possible to run concurrent applications in the MAP3D chip, such as video decoding while executing the 3D pipelines, so that integration of applications is easily achieved--using real-time decoded imagery for texturing 3D objects, for instance. This novel architecture enables an affordable, integrated solution for high performance 3D graphics.
Streaming Model Based Volume Ray Casting Implementation for Cell Broadband Engine

Directory of Open Access Journals (Sweden)

Jusub Kim

2009-01-01

Full Text Available Interactive high quality volume rendering is becoming increasingly more important as the amount of more complex volumetric data steadily grows. While a number of volumetric rendering techniques have been widely used, ray casting has been recognized as an effective approach for generating high quality visualization. However, for most users, the use of ray casting has been limited to datasets that are very small because of its high demands on computational power and memory bandwidth. However the recent introduction of the Cell Broadband Engine (Cell B.E. processor, which consists of 9 heterogeneous cores designed to handle extremely demanding computations with large streams of data, provides an opportunity to put the ray casting into practical use. In this paper, we introduce an efficient parallel implementation of volume ray casting on the Cell B.E. The implementation is designed to take full advantage of the computational power and memory bandwidth of the Cell B.E. using an intricate orchestration of the ray casting computation on the available heterogeneous resources. Specifically, we introduce streaming model based schemes and techniques to efficiently implement acceleration techniques for ray casting on Cell B.E. In addition to ensuring effective SIMD utilization, our method provides two key benefits: there is no cost for empty space skipping and there is no memory bottleneck on moving volumetric data for processing. Our experimental results show that we can interactively render practical datasets on a single Cell B.E. processor.
ARCIMBOLDO_LITE: single-workstation implementation and use.

Science.gov (United States)

Sammito, Massimo; Millán, Claudia; Frieske, Dawid; Rodríguez-Freire, Eloy; Borges, Rafael J; Usón, Isabel

2015-09-01

ARCIMBOLDO solves the phase problem at resolutions of around 2 Å or better through massive combination of small fragments and density modification. For complex structures, this imposes a need for a powerful grid where calculations can be distributed, but for structures with up to 200 amino acids in the asymmetric unit a single workstation may suffice. The use and performance of the single-workstation implementation, ARCIMBOLDO_LITE, on a pool of test structures with 40-120 amino acids and resolutions between 0.54 and 2.2 Å is described. Inbuilt polyalanine helices and iron cofactors are used as search fragments. ARCIMBOLDO_BORGES can also run on a single workstation to solve structures in this test set using precomputed libraries of local folds. The results of this study have been incorporated into an automated, resolution- and hardware-dependent parameterization. ARCIMBOLDO has been thoroughly rewritten and three binaries are now available: ARCIMBOLDO_LITE, ARCIMBOLDO_SHREDDER and ARCIMBOLDO_BORGES. The programs and libraries can be downloaded from http://chango.ibmb.csic.es/ARCIMBOLDO_LITE.
Fast digital processor for event selection according to particle number difference

International Nuclear Information System (INIS)

Basiladze, S.G.; Gus'kov, B.N.; Li Van Sun; Maksimov, A.N.; Parfenov, A.N.

1978-01-01

A fast digital processor for a magnetic spectrometer is described. It is used in experimental searches for charmed particles. The basic purpose of the processor is discriminating events in the difference of numbers of particles passing through two proportional chambers (PC). The processor consists of three units for detecting signals with PC, and a binary coder. The number of inputs of the processor is 32 for the first PC and 64 for the second. The difference in the number of particles discriminated is from 0 to 8. The resolution time is 180 ns. The processor is built in the CAMAC standard
Logistic Fuel Processor Development

National Research Council Canada - National Science Library

Salavani, Reza

2004-01-01

... to light gases then steam reform the light gases into hydrogen rich stream. This report documents the efforts in developing a fuel processor capable of providing hydrogen to a 3kW fuel cell stack...
A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

Science.gov (United States)

Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

2002-02-01

This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
The Trigger Processor and Trigger Processor Algorithms for the ATLAS New Small Wheel Upgrade

CERN Document Server

Lazovich, Tomo; The ATLAS collaboration

2015-01-01

The ATLAS New Small Wheel (NSW) is an upgrade to the ATLAS muon endcap detectors that will be installed during the next long shutdown of the LHC. Comprising both MicroMegas (MMs) and small-strip Thin Gap Chambers (sTGCs), this system will drastically improve the performance of the muon system in a high cavern background environment. The NSW trigger, in particular, will significantly reduce the rate of fake triggers coming from track segments in the endcap not originating from the interaction point. We will present an overview of the trigger, the proposed sTGC and MM trigger algorithms, and the hardware implementation of the trigger. In particular, we will discuss both the heart of the trigger, an ATCA system with FPGA-based trigger processors (using the same hardware platform for both MM and sTGC triggers), as well as the full trigger electronics chain, including dedicated cards for transmission of data via GBT optical links. Finally, we will detail the challenges of ensuring that the trigger electronics can ...
PRODUCTIVITY AND COSTS OF PROCESSOR WORKING IN STANDS OF Eucalyptus grandis Hill ex Maiden

Directory of Open Access Journals (Sweden)

Bernardo Carlos Tarnowski

2009-09-01

Full Text Available In the present work a time study was conducted with the objective of adjusting equations to estimate the time of activities, productivity, operational costs and the production of the processor used in a harvest operation of stands of Eucalyptus grandis in plain topography in the state of Bahia, Brazil. The operational cycle of the processor consisted of the time spent to process a tree, and was divided in to stages, which were assessed using the methotodology of single activity times. The sampling unit was the operational cycle of the machine. The statistical analysis was based on regression analysis considering the selection procedure “stepwise”. With the adjusted equations it was possible to estimate the productivity of the machine taking into account the of tree diameter. Considering an operational efficiency of 70 % under the circumstances of the study, the productivity of the processor was 25,8 m3 cc/h, the operational costs 47,90 US$/h and the production costs 1,86 US$/m3 cc. On the basis of the obtained results it can be concluded that the time of tree processing has varied directly according to the diameter increase diameter; the preparation time, contrary to the processing time, only shows a weak correlation with tree diameter; productivity of the processor is directly proportional to tree diameter, when expressed in volume and inversely proportional when expressed in tree number; the costs per cubic meter of wood processed varies inversely with of increased diameter; from the operational costs, fixed costs had the highest proportion followed by the variable costs, administrative costs and costs for manpower; the production costs of the processor decreased exponentially with increasing tree diameter.

Channel processor in 2D cluster finding algorithm for high energy physics application

International Nuclear Information System (INIS)

Paul, Rourab; Chakrabarti, Amlan; Mitra, Jubin; Khan, Shuaib A.; Nayak, Tapan; Mukherjee, Sanjoy

2016-01-01

In a Large Ion Collider Experiment (ALICE) at CERN 1 TB/s (approximately) data comes from front end electronics. Previously, we had 1 GBT link operated with a cluster clock frequencies of 133 MHz and 320 MHz in Run 1 and Run 2 respectively. The cluster algorithm proposed in Run 1 and 2 could not work in Run 3 as the data speed increased almost 20 times. Older version cluster algorithm receives data sequentially as a stream. It has 2 main sub processes - Channel Processor, Merging process. The initial step of channel processor finds a peak Q max and sums up pads (sensors) data from -2 time bin to +2 time bin in the time direction. The computed value stores in a register named cluster fragment data (cfd o ). The merging process merges cfd o in pad direction. The data streams in Run 2 comes sequentially, which processed by the channel processor and merging block in a sequential manner with very less resource over head. In Run 3 data comes parallely, 1600 data from 1600 pads of a single time instant comes at each 200 ns interval (5 MHz) which is very challenging to process in the budgeted resource platform of Arria 10 FPGA hardware with 250 to 320 MHz cluster clock
Advantages of an Automated Chemical Processor for XAFS Analysis of Novel Materials

International Nuclear Information System (INIS)

Calvin, S.; Carpenter, E. E.

2007-01-01

For synthetic chemists and material scientists, XAFS analysis is typically used after promising samples have already been identified. This is not an inherent limitation of the technique, which can be applied in a crude fashion quite rapidly, but is rather an artifact of the separation of laboratory from synchrotron, and of the nature of the allocation of beam time. One way around these difficulties is to use automated chemical processors; preferably with identical machines at the home institution and at the beamline. This allows identical samples to be synthesized on demand in both locations, so that the characterization resources of the home institution's laboratory can be applied immediately to samples synthesized at the beamline. In addition, the processor can be run in a combinatorial mode, so that the XAFS of many possible synthesis results are examined in a short time. Finally, spectra possessing features of interest can be identified by software, and those syntheses singled out for further study
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

Science.gov (United States)

Downie, John D.; Goodman, Joseph W.

1989-10-01

The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.
A new approach in simulating RF linacs using a general, linear real-time signal processor

International Nuclear Information System (INIS)

Young, A.; Jachim, S.P.

1991-01-01

Strict requirements on the tolerances of the amplitude and phase of the radio frequency (RF) cavity field are necessary to advance the field of accelerator technology. Due to these stringent requirements upon modern accelerators,a new approach of modeling and simulating is essential in developing and understanding their characteristics. This paper describes the implementation of a general, linear model of an RF cavity which is used to develop a real-time signal processor. This device fully emulates the response of an RF cavity upon receiving characteristic parameters (Q 0 , ω 0 , Δω, R S , Z 0 ). Simulating an RF cavity with a real-time signal processor is beneficial to an accelerator designer because the device allows one to answer fundamental questions on the response of the cavity to a particular stimulus without operating the accelerator. In particular, the complex interactions between the RF power and the control systems, the beam and cavity fields can simply be observed in a real-time domain. The signal processor can also be used upon initialization of the accelerator as a diagnostic device and as a dummy load for determining the closed-loop error of the control system. In essence, the signal processor is capable of providing information that allows an operator to determine whether the control systems and peripheral devices are operating properly without going through the tedious procedure of running the beam through a cavity
Evaluation of the Intel Westmere-EP server processor

CERN Document Server

Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

2010-01-01

In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing the 6-core “Westmere-EP” processor with Intel’s previous generation of the same microarchitecture, the “Nehalem-EP”. The former is produced in a new 32nm process, the latter in 45nm. Both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Simultaneous Multi-Threading (SMT), the cache sizes available, the memory configuration installed, as well...
Development of level 2 processor for the readout of TMC

International Nuclear Information System (INIS)

Arai, Y.; Ikeno, M.; Murata, T.; Sudo, F.; Emura, T.

1995-01-01

We have developed a prototype 8-bit processor for the level 2 data processing for the Time Memory Cell (TMC). The first prototype processor successfully runs with 18 MHz clock. The operation of same clock frequency as TMC (30 MHz) will be easily achieved with simple modifications. Although the processor is very primitive one but shows its powerful performance and flexibility. To realize the compact TMC/L2P (Level 2 Processor) system, it is better to include the microcode memory within the chip. Encoding logic of the microcode must be included to reduce the microcode memory in this case. (J.P.N.)
A programmable two-qubit quantum processor in silicon.

Science.gov (United States)

Watson, T F; Philips, S G J; Kawakami, E; Ward, D R; Scarlino, P; Veldhorst, M; Savage, D E; Lagally, M G; Friesen, Mark; Coppersmith, S N; Eriksson, M A; Vandersypen, L M K

2018-03-29

Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch-Josza algorithm and the Grover search algorithm-canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85-89 per cent and concurrences of 73-82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.
A programmable two-qubit quantum processor in silicon

Science.gov (United States)

Watson, T. F.; Philips, S. G. J.; Kawakami, E.; Ward, D. R.; Scarlino, P.; Veldhorst, M.; Savage, D. E.; Lagally, M. G.; Friesen, Mark; Coppersmith, S. N.; Eriksson, M. A.; Vandersypen, L. M. K.

2018-03-01

Now that it is possible to achieve measurement and control fidelities for individual quantum bits (qubits) above the threshold for fault tolerance, attention is moving towards the difficult task of scaling up the number of physical qubits to the large numbers that are needed for fault-tolerant quantum computing. In this context, quantum-dot-based spin qubits could have substantial advantages over other types of qubit owing to their potential for all-electrical operation and ability to be integrated at high density onto an industrial platform. Initialization, readout and single- and two-qubit gates have been demonstrated in various quantum-dot-based qubit representations. However, as seen with small-scale demonstrations of quantum computers using other types of qubit, combining these elements leads to challenges related to qubit crosstalk, state leakage, calibration and control hardware. Here we overcome these challenges by using carefully designed control techniques to demonstrate a programmable two-qubit quantum processor in a silicon device that can perform the Deutsch–Josza algorithm and the Grover search algorithm—canonical examples of quantum algorithms that outperform their classical analogues. We characterize the entanglement in our processor by using quantum-state tomography of Bell states, measuring state fidelities of 85–89 per cent and concurrences of 73–82 per cent. These results pave the way for larger-scale quantum computers that use spins confined to quantum dots.
Real-time implementation of logo detection on open source BeagleBoard

Science.gov (United States)

George, M.; Kehtarnavaz, N.; Estevez, L.

2011-03-01

This paper presents the real-time implementation of our previously developed logo detection and tracking algorithm on the open source BeagleBoard mobile platform. This platform has an OMAP processor that incorporates an ARM Cortex processor. The algorithm combines Scale Invariant Feature Transform (SIFT) with k-means clustering, online color calibration and moment invariants to robustly detect and track logos in video. Various optimization steps that are carried out to allow the real-time execution of the algorithm on BeagleBoard are discussed. The results obtained are compared to the PC real-time implementation results.
Review of trigger and on-line processors at SLAC

International Nuclear Information System (INIS)

Lankford, A.J.

1984-07-01

The role of trigger and on-line processors in reducing data rates to manageable proportions in e + e - physics experiments is defined not by high physics or background rates, but by the large event sizes of the general-purpose detectors employed. The rate of e + e - annihilation is low, and backgrounds are not high; yet the number of physics processes which can be studied is vast and varied. This paper begins by briefly describing the role of trigger processors in the e + e - context. The usual flow of the trigger decision process is illustrated with selected examples of SLAC trigger processing. The features are mentioned of triggering at the SLC and the trigger processing plans of the two SLC detectors: The Mark II and the SLD. The most common on-line processors at SLAC, the BADC, the SLAC Scanner Processor, the SLAC FASTBUS Controller, and the VAX CAMAC Channel, are discussed. Uses of the 168/E, 3081/E, and FASTBUS VAX processors are mentioned. The manner in which these processors are interfaced and the function they serve on line is described. Finally, the accelerator control system for the SLC is outlined. This paper is a survey in nature, and hence, relies heavily upon references to previous publications for detailed description of work mentioned here. 27 references, 9 figures, 1 table
SIFT - Design and analysis of a fault-tolerant computer for aircraft control. [Software Implemented Fault Tolerant systems

Science.gov (United States)

Wensley, J. H.; Lamport, L.; Goldberg, J.; Green, M. W.; Levitt, K. N.; Melliar-Smith, P. M.; Shostak, R. E.; Weinstock, C. B.

1978-01-01

SIFT (Software Implemented Fault Tolerance) is an ultrareliable computer for critical aircraft control applications that achieves fault tolerance by the replication of tasks among processing units. The main processing units are off-the-shelf minicomputers, with standard microcomputers serving as the interface to the I/O system. Fault isolation is achieved by using a specially designed redundant bus system to interconnect the processing units. Error detection and analysis and system reconfiguration are performed by software. Iterative tasks are redundantly executed, and the results of each iteration are voted upon before being used. Thus, any single failure in a processing unit or bus can be tolerated with triplication of tasks, and subsequent failures can be tolerated after reconfiguration. Independent execution by separate processors means that the processors need only be loosely synchronized, and a novel fault-tolerant synchronization method is described.
FPGA Implementation of Decimal Processors for Hardware Acceleration

DEFF Research Database (Denmark)

Borup, Nicolas; Dindorp, Jonas; Nannarelli, Alberto

2011-01-01

Applications in non-conventional number systems can benefit from accelerators implemented on reconfigurable platforms, such as Field Programmable Gate-Arrays (FPGAs). In this paper, we show that applications requiring decimal operations, such as the ones necessary in accounting or financial trans...... execution on the CPU of the hosting computer....
Fundamental physics issues of multilevel logic in developing a parallel processor.

Science.gov (United States)

Bandyopadhyay, Anirban; Miki, Kazushi

2007-06-01

In the last century, On and Off physical switches, were equated with two decisions 0 and 1 to express every information in terms of binary digits and physically realize it in terms of switches connected in a circuit. Apart from memory-density increase significantly, more possible choices in particular space enables pattern-logic a reality, and manipulation of pattern would allow controlling logic, generating a new kind of processor. Neumann's computer is based on sequential logic, processing bits one by one. But as pattern-logic is generated on a surface, viewing whole pattern at a time is a truly parallel processing. Following Neumann's and Shannons fundamental thermodynamical approaches we have built compatible model based on series of single molecule based multibit logic systems of 4-12 bits in an UHV-STM. On their monolayer multilevel communication and pattern formation is experimentally verified. Furthermore, the developed intelligent monolayer is trained by Artificial Neural Network. Therefore fundamental weak interactions for the building of truly parallel processor are explored here physically and theoretically.
GR712RC- Dual-Core Processor- Product Status

Science.gov (United States)

Sturesson, Fredrik; Habinc, Sandi; Gaisler, Jiri

2012-08-01

The GR712RC System-on-Chip (SoC) is a dual core LEON3FT system suitable for advanced high reliability space avionics. Fault tolerance features from Aeroflex Gaisler’s GRLIB IP library and an implementation using Ramon Chips RadSafe cell library enables superior radiation hardness.The GR712RC device has been designed to provide high processing power by including two LEON3FT 32- bit SPARC V8 processors, each with its own high- performance IEEE754 compliant floating-point-unit and SPARC reference memory management unit.This high processing power is combined with a large number of serial interfaces, ranging from high-speed links for data transfers to low-speed control buses for commanding and status acquisition.
Monte Carlo dose calculation using a cell processor based PlayStation 3 system

International Nuclear Information System (INIS)

Chow, James C L; Lam, Phil; Jaffray, David A

2012-01-01

This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR G ET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
Monte Carlo dose calculation using a cell processor based PlayStation 3 system

Science.gov (United States)

Chow, James C. L.; Lam, Phil; Jaffray, David A.

2012-02-01

This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR_GET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
Monte Carlo dose calculation using a cell processor based PlayStation 3 system

Energy Technology Data Exchange (ETDEWEB)

Chow, James C L; Lam, Phil; Jaffray, David A, E-mail: james.chow@rmp.uhn.on.ca [Department of Radiation Oncology, University of Toronto and Radiation Medicine Program, Princess Margaret Hospital, University Health Network, Toronto, Ontario M5G 2M9 (Canada)

2012-02-09

This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR{sub G}ET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
FY1995 study of design methodology and environment of high-performance processor architectures; 1995 nendo koseino processor architecture sekkeiho to sekkei kankyo no kenkyu

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-03-01

The aim of our project is to develop high-performance processor architectures for both general purpose and application-specific purpose. We also plan to develop basic softwares, such as compliers, and various design aid tools for those architectures. We are particularly interested in performance evaluation at architecture design phase, design optimization, automatic generation of compliers from processor designs, and architecture design methodologies combined with circuit layout. We have investigated both microprocessor architectures and design methodologies / environments for the processors. Our goal is to establish design technologies for high-performance, low-power, low-cost and highly-reliable systems in system-on-silicon era. We have proposed PPRAM architecture for high-performance system using DRAM and logic mixture technology, Softcore processor architecture for special purpose processors in embedded systems, and Power-Pro architecture for low power systems. We also developed design methodologies and design environments for the above architectures as well as a new method for design verification of microprocessors. (NEDO)
Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

Science.gov (United States)

Roy, Indranil; Aluru, Srinivas

2016-01-01

Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l,d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the micron automata processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology.
Software-defined reconfigurable microwave photonics processor.

Science.gov (United States)

Pérez, Daniel; Gasulla, Ivana; Capmany, José

2015-06-01

We propose, for the first time to our knowledge, a software-defined reconfigurable microwave photonics signal processor architecture that can be integrated on a chip and is capable of performing all the main functionalities by suitable programming of its control signals. The basic configuration is presented and a thorough end-to-end design model derived that accounts for the performance of the overall processor taking into consideration the impact and interdependencies of both its photonic and RF parts. We demonstrate the model versatility by applying it to several relevant application examples.

Software Optimization of Video Codecs on Pentium Processor with MMX Technology

Directory of Open Access Journals (Sweden)

Liu KJ Ray

2001-01-01

Full Text Available A key enabling technology for the proliferation of multimedia PC's is the availability of fast video codecs, which are the basic building blocks of many new multimedia applications. Since most industrial video coding standards (e.g., MPEG1, MPEG2, H.261, H.263 only specify the decoder syntax, there are a lot of rooms for optimization in a practical implementation. When considering a specific hardware platform like the PC, the algorithmic optimization must be considered in tandem with the architecture of the PC. Specifically, an algorithm that is optimal in the sense of number of operations needed may not be the fastest implementation on the PC. This is because special instructions are available which can perform several operations at once under special circumstances. In this work, we describe a fast implementation of H.263 video encoder for the Pentium processor with MMX technology. The described codec is adopted for video mail and video phone softwares used in IBM ThinkPad.
Constructing valid density matrices on an NMR quantum information processor via maximum likelihood estimation

Energy Technology Data Exchange (ETDEWEB)

Singh, Harpreet; Arvind; Dorai, Kavita, E-mail: kavita@iisermohali.ac.in

2016-09-07

Estimation of quantum states is an important step in any quantum information processing experiment. A naive reconstruction of the density matrix from experimental measurements can often give density matrices which are not positive, and hence not physically acceptable. How do we ensure that at all stages of reconstruction, we keep the density matrix positive? Recently a method has been suggested based on maximum likelihood estimation, wherein the density matrix is guaranteed to be positive definite. We experimentally implement this protocol on an NMR quantum information processor. We discuss several examples and compare with the standard method of state estimation. - Highlights: • State estimation using maximum likelihood method was performed on an NMR quantum information processor. • Physically valid density matrices were obtained every time in contrast to standard quantum state tomography. • Density matrices of several different entangled and separable states were reconstructed for two and three qubits.
Implementation of Scientific Computing Applications on the Cell Broadband Engine

Directory of Open Access Journals (Sweden)

Guochun Shi

2009-01-01

Full Text Available The Cell Broadband Engine architecture is a revolutionary processor architecture well suited for many scientific codes. This paper reports on an effort to implement several traditional high-performance scientific computing applications on the Cell Broadband Engine processor, including molecular dynamics, quantum chromodynamics and quantum chemistry codes. The paper discusses data and code restructuring strategies necessary to adapt the applications to the intrinsic properties of the Cell processor and demonstrates performance improvements achieved on the Cell architecture. It concludes with the lessons learned and provides practical recommendations on optimization techniques that are believed to be most appropriate.
Researching, building a soft-processor and Ethernet interface circuit using EDK

International Nuclear Information System (INIS)

Tuong Thi Thu Huong; Pham Ngoc Tuan; Truong Van Dat, Dang Lanh; Chau Thi Nhu Quynh

2014-01-01

The processor is an indispensable component in the measurement and automatic control systems. This report describes the fabrication of a soft-processor (32-bits, on-chip block RAM 64K, 50M clock, internal and peripheral bus) for receiving, sending and processing of data Ethernet packets. This processor is fabricated using the XPS component from EDK (Xilinx) software toolkit. After that, it is configured on the FPGA named Spartan XC3S500E circuit. A firmware of a processor for controlling the interface between processor and Ethernet port is written in C language and can play a role of a HOST (station) which has its own IP to connect to Ethernet network. Besides, there are some needed parts as follows: an Ethernet interfacing controller chip, a suitable cable providing a speed up to 100 Mbs and an application program running under Window XP environment written in LabView to communicate with soft-processor. (author)
A high-accuracy optical linear algebra processor for finite element applications

Science.gov (United States)

Casasent, D.; Taylor, B. K.

1984-01-01

Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
Redesign of an in-market food processor for manufacturing cost reduction using DFMA methodology

Directory of Open Access Journals (Sweden)

Akshay Harlalka

2016-01-01

Full Text Available Reducing the time and cost involved in the product development is very important to stay competitive in the market. Design for Manufacturing and Assembly (DFMA is a cost-reduction framework for designers to evaluate manufacturing aspects of a product design. A case study on an in-market food processor, this paper aims to demonstrate the significance of DFMA implementation on an Indian consumer product. Even though many DFMA case studies have been published till date, very few have dealt with the Indian consumer durables category. In this paper, various cost-reduction opportunities are identified in the design of a food processor manufactured by a reputed company in India. Using the DFMA study, design ideas are developed with an objective of reducing the overall manufacturing cost of the product. As a result of DFMA implementation, significant improvement in terms of the product’s architecture, assembly time and design efficiency is identified. Overall cost reduction of 0.25 USD (United States Dollar was achieved and an improvement in the Design for Assembly (DFA index from 15.99 to 19.93 is reported. This procedure utilized in this paper can be adopted for any consumer durable products of similar type and design improvements and cost reduction can be achieved.
The Fermilab Advanced Computer Program multi-array processor system (ACPMAPS): A site oriented supercomputer for theoretical physics

International Nuclear Information System (INIS)

Nash, T.; Areti, H.; Atac, R.

1988-08-01

The ACP Multi-Array Processor System (ACPMAPS) is a highly cost effective, local memory parallel computer designed for floating point intensive grid based problems. The processing nodes of the system are single board array processors based on the FORTRAN and C programmable Weitek XL chip set. The nodes are connected by a network of very high bandwidth 16 port crossbar switches. The architecture is designed to achieve the highest possible cost effectiveness while maintaining a high level of programmability. The primary application of the machine at Fermilab will be lattice gauge theory. The hardware is supported by a transparent site oriented software system called CANOPY which shields theorist users from the underlying node structure. 4 refs., 2 figs
Low voltage 80 KV to 125 KV electron processors

International Nuclear Information System (INIS)

Lauppi, U.V.

1999-01-01

The classic electron beam technology made use of accelerating energies in the voltage range of 300 to 800 kV. The first EB processors - built for the curing of coatings - operated at 300 kV. The products to be treated were thicker than a simple layer of coating with thicknesses up to 100g and more. It was only in the beginning of the 1970's that industrial EB processors with accelerating voltages below 300 kV appeared on the market. Our company developed the first commercial electron accelerator without a beam scanner. The new EB machine featured a linear cathode, emitting a shower or 'curtain' of electrons over the full width of the product. These units were much smaller than anv previous EB processors and dedicated to the curing of coatings and other thin layers. ESI's first EB units operated with accelerating voltages between 150 and 200 kV. In 1993 ESI announced the introduction of a new generation of Electrocure. EB processors operating at 120 kV, and in 1998, at the RadTech North America '98 Conference in Chicago, the introduction of an 80 kV electron beam processor under the designation Microbeam LV
Launching applications on compute and service processors running under different operating systems in scalable network of processor boards with routers

Science.gov (United States)

Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM

2009-03-17

A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure also permits easy physical scalability of the computing apparatus. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.
Particle simulation on a distributed memory highly parallel processor

International Nuclear Information System (INIS)

Sato, Hiroyuki; Ikesaka, Morio

1990-01-01

This paper describes parallel molecular dynamics simulation of atoms governed by local force interaction. The space in the model is divided into cubic subspaces and mapped to the processor array of the CAP-256, a distributed memory, highly parallel processor developed at Fujitsu Labs. We developed a new technique to avoid redundant calculation of forces between atoms in different processors. Experiments showed the communication overhead was less than 5%, and the idle time due to load imbalance was less than 11% for two model problems which contain 11,532 and 46,128 argon atoms. From the software simulation, the CAP-II which is under development is estimated to be about 45 times faster than CAP-256 and will be able to run the same problem about 40 times faster than Fujitsu's M-380 mainframe when 256 processors are used. (author)
Recursive Matrix Inverse Update On An Optical Processor

Science.gov (United States)

Casasent, David P.; Baranoski, Edward J.

1988-02-01

A high accuracy optical linear algebraic processor (OLAP) using the digital multiplication by analog convolution (DMAC) algorithm is described for use in an efficient matrix inverse update algorithm with speed and accuracy advantages. The solution of the parameters in the algorithm are addressed and the advantages of optical over digital linear algebraic processors are advanced.
Evaluation of the Intel Nehalem-EX server processor

CERN Document Server

Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

2010-01-01

In this paper we report on a set of benchmark results recently obtained by the CERN openlab by comparing the 4-socket, 32-core Intel Xeon X7560 server with the previous generation 4-socket server, based on the Xeon X7460 processor. The Xeon X7560 processor represents a major change in many respects, especially the memory sub-system, so it was important to make multiple comparisons. In most benchmarks the two 4-socket servers were compared. It should be underlined that both servers represent the “top of the line” in terms of frequency. However, in some cases, it was important to compare systems that integrated the latest processor features, such as QPI links, Symmetric multithreading and over-clocking via Turbo mode, and in such situations the X7560 server was compared to a dual socket L5520 based system with an identical frequency of 2.26 GHz. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following ...
Graphical user interface for TOUGH/TOUGH2 - development of database, pre-processor, and post-processor

Energy Technology Data Exchange (ETDEWEB)

Sato, Tatsuya; Okabe, Takashi; Osato, Kazumi [Geothermal Energy Research and Development Co., Ltd., Tokyo (Japan)

1995-03-01

One of the advantages of the TOUGH/TOUGH2 (Pruess, 1987 and 1991) is the modeling using {open_quotes}free shape{close_quotes} polygonal blocks. However, the treatment of three-dimensional information, particularly for TOUGH/TOUGH2 is not easy because of the {open_quotes}free shape{close_quotes} polygonal blocks. Therefore, we have developed a database named {open_quotes}GEOBASE{close_quotes} and a pre/post-processor named {open_quotes}GEOGRAPH{close_quotes} for TOUGH/TOUGH2 on engineering work station (EWS). {open_quotes}GEOGRAPH{close_quotes} is based on the ORACLE{sup *1} relational database manager system to access data sets of surface exploration (geology, geophysics, geochemistry, etc.), drilling (well trajectory, geological column, logging, etc.), well testing (production test, injection test, interference test, tracer test, etc.) and production/injection history.{open_quotes}GEOGRAPH{close_quotes} consists of {open_quotes}Pre-processor{close_quotes} that can construct the three-dimensional free shape reservoir modeling by mouse operation on X-window and {open_quotes}Post-processor{close_quotes} that can display several kinds of two/three-dimensional maps and X-Y plots to compile data on {open_quotes}GEOBASE{close_quotes} and result of TOUGH/TOUGH2 calculation. This paper shows concept of the systems and examples of utilization.
Speeding up the MATLAB complex networks package using graphic processors

International Nuclear Information System (INIS)

Zhang Bai-Da; Wu Jun-Jie; Li Xin; Tang Yu-Hua

2011-01-01

The availability of computers and communication networks allows us to gather and analyse data on a far larger scale than previously. At present, it is believed that statistics is a suitable method to analyse networks with millions, or more, of vertices. The MATLAB language, with its mass of statistical functions, is a good choice to rapidly realize an algorithm prototype of complex networks. The performance of the MATLAB codes can be further improved by using graphic processor units (GPU). This paper presents the strategies and performance of the GPU implementation of a complex networks package, and the Jacket toolbox of MATLAB is used. Compared with some commercially available CPU implementations, GPU can achieve a speedup of, on average, 11.3×. The experimental result proves that the GPU platform combined with the MATLAB language is a good combination for complex network research. (interdisciplinary physics and related areas of science and technology)
Biomass is beginning to threaten the wood-processors

International Nuclear Information System (INIS)

Beer, G.; Sobinkovic, B.

2004-01-01

In this issue an exploitation of biomass in Slovak Republic is analysed. Some new projects of constructing of the stoke-holds for biomass processing are published. The grants for biomass are ascending the prices of wood raw material, which is thus becoming less accessible for the wood-processors. An excessive wood export threatens the domestic processors
Computational Particle Dynamic Simulations on Multicore Processors (CPDMu) Final Report Phase I

Energy Technology Data Exchange (ETDEWEB)

Schmalz, Mark S

2011-07-24

Statement of Problem - Department of Energy has many legacy codes for simulation of computational particle dynamics and computational fluid dynamics applications that are designed to run on sequential processors and are not easily parallelized. Emerging high-performance computing architectures employ massively parallel multicore architectures (e.g., graphics processing units) to increase throughput. Parallelization of legacy simulation codes is a high priority, to achieve compatibility, efficiency, accuracy, and extensibility. General Statement of Solution - A legacy simulation application designed for implementation on mainly-sequential processors has been represented as a graph G. Mathematical transformations, applied to G, produce a graph representation {und G} for a high-performance architecture. Key computational and data movement kernels of the application were analyzed/optimized for parallel execution using the mapping G {yields} {und G}, which can be performed semi-automatically. This approach is widely applicable to many types of high-performance computing systems, such as graphics processing units or clusters comprised of nodes that contain one or more such units. Phase I Accomplishments - Phase I research decomposed/profiled computational particle dynamics simulation code for rocket fuel combustion into low and high computational cost regions (respectively, mainly sequential and mainly parallel kernels), with analysis of space and time complexity. Using the research team's expertise in algorithm-to-architecture mappings, the high-cost kernels were transformed, parallelized, and implemented on Nvidia Fermi GPUs. Measured speedups (GPU with respect to single-core CPU) were approximately 20-32X for realistic model parameters, without final optimization. Error analysis showed no loss of computational accuracy. Commercial Applications and Other Benefits - The proposed research will constitute a breakthrough in solution of problems related to efficient
Asymmetrical floating point array processors, their application to exploration and exploitation

Energy Technology Data Exchange (ETDEWEB)

Geriepy, B L

1983-01-01

An asymmetrical floating point array processor is a special-purpose scientific computer which operates under asymmetrical control of a host computer. Although an array processor can receive fixed point input and produce fixed point output, its primary mode of operation is floating point. The first generation of array processors was oriented towards time series information. The next generation of array processors has proved much more versatile and their applicability ranges from petroleum reservoir simulation to speech syntheses. Array processors are becoming commonplace in mining, the primary usage being construction of grids-by usual methods or by kriging. The Australian mining community is among the world's leaders in regard to computer-assisted exploration and exploitation systems. Part of this leadership role must be providing guidance to computer vendors in regard to current and future requirements.
The UA1 upgrade calorimeter trigger processor

International Nuclear Information System (INIS)

Bains, N.; Baird, S.A.; Biddulph, P.

1990-01-01

The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no deadtime. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (author)
The UA1 upgrade calorimeter trigger processor

International Nuclear Information System (INIS)

Bains, M.; Charleton, D.; Ellis, N.; Garvey, J.; Gregory, J.; Jimack, M.P.; Jovanovic, P.; Kenyon, I.R.; Baird, S.A.; Campbell, D.; Cawthraw, M.; Coughlan, J.; Flynn, P.; Galagedera, S.; Grayer, G.; Halsall, R.; Shah, T.P.; Stephens, R.; Biddulph, P.; Eisenhandler, E.; Fensome, I.F.; Landon, M.; Robinson, D.; Oliver, J.; Sumorok, K.

1990-01-01

The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no dead time. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (orig.)
Development of a 30-cm ion thruster thermal-vacuum power processor

Science.gov (United States)

Herron, B. G.

1976-01-01

The 30-cm Hg electron-bombardment ion thruster presently under development has reached engineering model status and is generally accepted as the prime propulsion thruster module to be used on the earliest solar electric propulsion missions. This paper presents the results of a related program to develop a transistorized 3-kW Thermal-Vacuum Breadboard (TVBB) Power Processor for this thruster. Emphasized in the paper are the implemented electrical and mechanical designs as well as the resultant system performance achieved over a range of test conditions. In addition, design modifications affording improved performance are identified and discussed.

Light scattering microscopy measurements of single nuclei compared with GPU-accelerated FDTD simulations

International Nuclear Information System (INIS)

Stark, Julian; Rothe, Thomas; Kienle, Alwin; Kieß, Steffen; Simon, Sven

2016-01-01

Single cell nuclei were investigated using two-dimensional angularly and spectrally resolved scattering microscopy. We show that even for a qualitative comparison of experimental and theoretical data, the standard Mie model of a homogeneous sphere proves to be insufficient. Hence, an accelerated finite-difference time-domain method using a graphics processor unit and domain decomposition was implemented to analyze the experimental scattering patterns. The measured cell nuclei were modeled as single spheres with randomly distributed spherical inclusions of different size and refractive index representing the nucleoli and clumps of chromatin. Taking into account the nuclear heterogeneity of a large number of inclusions yields a qualitative agreement between experimental and theoretical spectra and illustrates the impact of the nuclear micro- and nanostructure on the scattering patterns. (paper)
Light scattering microscopy measurements of single nuclei compared with GPU-accelerated FDTD simulations.

Science.gov (United States)

Stark, Julian; Rothe, Thomas; Kieß, Steffen; Simon, Sven; Kienle, Alwin

2016-04-07

Single cell nuclei were investigated using two-dimensional angularly and spectrally resolved scattering microscopy. We show that even for a qualitative comparison of experimental and theoretical data, the standard Mie model of a homogeneous sphere proves to be insufficient. Hence, an accelerated finite-difference time-domain method using a graphics processor unit and domain decomposition was implemented to analyze the experimental scattering patterns. The measured cell nuclei were modeled as single spheres with randomly distributed spherical inclusions of different size and refractive index representing the nucleoli and clumps of chromatin. Taking into account the nuclear heterogeneity of a large number of inclusions yields a qualitative agreement between experimental and theoretical spectra and illustrates the impact of the nuclear micro- and nanostructure on the scattering patterns.
Benchmarking NWP Kernels on Multi- and Many-core Processors

Science.gov (United States)

Michalakes, J.; Vachharajani, M.

2008-12-01

Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost- performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine- grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
Implementing An Image Understanding System Architecture Using Pipe

Science.gov (United States)

Luck, Randall L.

1988-03-01

This paper will describe PIPE and how it can be used to implement an image understanding system. Image understanding is the process of developing a description of an image in order to make decisions about its contents. The tasks of image understanding are generally split into low level vision and high level vision. Low level vision is performed by PIPE -a high performance parallel processor with an architecture specifically designed for processing video images at up to 60 fields per second. High level vision is performed by one of several types of serial or parallel computers - depending on the application. An additional processor called ISMAP performs the conversion from iconic image space to symbolic feature space. ISMAP plugs into one of PIPE's slots and is memory mapped into the high level processor. Thus it forms the high speed link between the low and high level vision processors. The mechanisms for bottom-up, data driven processing and top-down, model driven processing are discussed.
ACP/R3000 processors in data acquisition systems

International Nuclear Information System (INIS)

Deppe, J.; Areti, H.; Atac, R.

1989-02-01

We describe ACP/R3000 processor based data acquisition systems for high energy physics. This VME bus compatible processor board, with a computational power equivalent to 15 VAX 11/780s or better, contains 8 Mb of memory for event buffering and has a high speed secondary bus that allows data gathering from front end electronics. 2 refs., 3 figs
Methodology and Implementation on DSP of Heuristic Multiuser DS/CDMA Detectors

Directory of Open Access Journals (Sweden)

Alex Miyamoto Mussi

2010-12-01

Full Text Available The growing number of users of mobile communications networks and the scarcity of the electromagnetic spectrum make the use of diversity techniques and detection/decoding efficient, such as the use of multiple antennas at the transmitter and/or receiver, multiuser detection (MuD – Multiuser Detection, among others, have an increasingly prominent role in the telecommunications landscape. This paper presents a design methodology based on digital signal processors (DSP – Digital Signal Processor with a view to the implementation of multiuser heuristics detectors in systems DS/CDMA (Direct Sequence Code Division Multiple Access. Heuristics detection techniques result in near-optimal performance in order to approach the performance of maximum-likelihood (ML. In this work, was employed the DSP development platform called the C6713 DSK, which is based in Texas TMS320C6713 processor. The heuristics techniques proposed are based on well established algorithms in the literature. The efficiency of the algorithms implemented in DSP has been evaluated numerically by computing the measure of bit error rate (BER. Finally, the feasibility of implementation in DSP could then be verified by comparing results from multiple Monte-Carlo simulation in Matlab, with those obtained from implementation on DSP. It also demonstrates the effective increase in performance and system capacity of DS/CDMA with the use of heuristic multiuser detection techniques, implemented directly in the DSP.
Real-time trajectory optimization on parallel processors

Science.gov (United States)

Psiaki, Mark L.

1993-01-01

A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems, the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32-nodes instead of 1-node to solve a 64-stage Goddard problem.
The ATLAS Level-1 Central Trigger Processor (CTP)

CERN Document Server

Spiwoks, Ralf; Ellis, Nick; Farthouat, P; Gällnö, P; Haller, J; Krasznahorkay, A; Maeno, T; Pauly, T; Pessoa-Lima, H; Resurreccion-Arcas, I; Schuler, G; De Seixas, J M; Torga-Teixeira, R; Wengler, T

2005-01-01

The ATLAS Level-1 Central Trigger Processor (CTP) combines information from calorimeter and muon trigger processors and makes the final Level-1 Accept (L1A) decision on the basis of lists of selection criteria (trigger menus). In addition to the event-selection decision, the CTP also provides trigger summary information to the Level-2 trigger and the data acquisition system. It further provides accumulated and bunch-by-bunch scaler data for monitoring of the trigger, detector and beam conditions. The CTP is presented and results are shown from tests with the calorimeter adn muon trigger processors connected to detectors in a particle beam, as well as from stand-alone full-system tests in the laboratory which were used to validate the CTP.
A Processor-Sharing Scheduling Strategy for NFV Nodes

Directory of Open Access Journals (Sweden)

Giuseppe Faraci

2016-01-01

Full Text Available The introduction of the two paradigms SDN and NFV to “softwarize” the current Internet is making management and resource allocation two key challenges in the evolution towards the Future Internet. In this context, this paper proposes Network-Aware Round Robin (NARR, a processor-sharing strategy, to reduce delays in traversing SDN/NFV nodes. The application of NARR alleviates the job of the Orchestrator by automatically working at the intranode level, dynamically assigning the processor slices to the virtual network functions (VNFs according to the state of the queues associated with the output links of the network interface cards (NICs. An extensive simulation set is presented to show the improvements achieved with respect to two more processor-sharing strategies chosen as reference.
Processor farming method for multi-scale analysis of masonry structures

Science.gov (United States)

Krejčí, Tomáš; Koudelka, Tomáš

2017-07-01

This paper describes a processor farming method for a coupled heat and moisture transport in masonry using a two-level approach. The motivation for the two-level description comes from difficulties connected with masonry structures, where the size of stone blocks is much larger than the size of mortar layers and very fine finite element mesh has to be used. The two-level approach is suitable for parallel computing because nearly all computations can be performed independently with little synchronization. This approach is called processor farming. The master processor is dealing with the macro-scale level - the structure and the slave processors are dealing with a homogenization procedure on the meso-scale level which is represented by an appropriate representative volume element.
[Improving speech comprehension using a new cochlear implant speech processor].

Science.gov (United States)

Müller-Deile, J; Kortmann, T; Hoppe, U; Hessel, H; Morsnowski, A

2009-06-01

The aim of this multicenter clinical field study was to assess the benefits of the new Freedom 24 sound processor for cochlear implant (CI) users implanted with the Nucleus 24 cochlear implant system. The study included 48 postlingually profoundly deaf experienced CI users who demonstrated speech comprehension performance with their current speech processor on the Oldenburg sentence test (OLSA) in quiet conditions of at least 80% correct scores and who were able to perform adaptive speech threshold testing using the OLSA in noisy conditions. Following baseline measures of speech comprehension performance with their current speech processor, subjects were upgraded to the Freedom 24 speech processor. After a take-home trial period of at least 2 weeks, subject performance was evaluated by measuring the speech reception threshold with the Freiburg multisyllabic word test and speech intelligibility with the Freiburg monosyllabic word test at 50 dB and 70 dB in the sound field. The results demonstrated highly significant benefits for speech comprehension with the new speech processor. Significant benefits for speech comprehension were also demonstrated with the new speech processor when tested in competing background noise.In contrast, use of the Abbreviated Profile of Hearing Aid Benefit (APHAB) did not prove to be a suitably sensitive assessment tool for comparative subjective self-assessment of hearing benefits with each processor. Use of the preprocessing algorithm known as adaptive dynamic range optimization (ADRO) in the Freedom 24 led to additional improvements over the standard upgrade map for speech comprehension in quiet and showed equivalent performance in noise. Through use of the preprocessing beam-forming algorithm BEAM, subjects demonstrated a highly significant improved signal-to-noise ratio for speech comprehension thresholds (i.e., signal-to-noise ratio for 50% speech comprehension scores) when tested with an adaptive procedure using the Oldenburg
Hardware processor for tracking particles in an alternating-gradient synchrotron

International Nuclear Information System (INIS)

Johnson, M.; Avilez, C.

1987-01-01

We discuss the design and performance of special-purpose processors for tracking particles through an alternating-gradient synchrotron. We present block diagram designs for two hardware processors. Both processors use algorithms based on the 'kick' approximation, i.e., transport matrices are used for dipoles and quadrupoles, and the thin-lens approximation is used for all higher multipoles. The faster processor makes extensive use of memory look-up tables for evaluating functions. For the case of magnets with multipoles up to pole 30 and using one kick per magnet, this processor can track 19 particles through an accelerator at a rate that is only 220 times slower than the time it takes real particles to travel around the machine. For a model consisting of only thin lenses, it is only 150 times slower than real particles. An additional factor of 2 can be obtained with chips now becoming available. The number of magnets in the accelerator is limited only by the amount of memory available for storing magnet parameters. (author) 20 refs., 7 figs., 2 tabs
Performance testing of dosimetry processors, status of NRC rulemaking for improved personnel dosimetry processing, and some beta dosimetry and instrumentation problems observed by NRC regional inspectors

International Nuclear Information System (INIS)

Dennis, N.A.; Kinneman, J.D.; Costello, F.M.; White, J.R.; Nimitz, R.L.

1983-01-01

Early dosimetry processor performance studies conducted between 1967 and 1979 by several different investigators indicated that a significant percentage of personnel dosimetry processors may not be performing with a reasonable degree of accuracy. Results of voluntary performance testing of US personnel dosimetry processors against the final Health Physics Society Standard, Criteria for Testing Personnel Dosimetry Performance by the University of Michigan for the Nuclear Regulatory Commission (NRC) will be summarized with emphasis on processor performance in radiation categories involving beta particles and beta particles and photon mixtures. The current status of the NRC's regulatory program for improved personnel dosimetry processing will be reviewed. The NRC is proposing amendments to its regulations, 10 CFR Part 20, that would require its licensees to utilize specified personnel dosimetry services from processors accredited by the National Voluntary Laboratory Accreditation Program of the National Bureau of Standards. Details of the development and schedule for implementation of the program will be highlighted. Finally, selected beta dosimetry and beta instrumentation problems observed by NRC Regional Staff during inspections of NRC licensed facilities will be discussed
High-speed special-purpose processor for event selection by number of direct tracks

International Nuclear Information System (INIS)

Kalinnikov, V.A.; Krastev, V.R.; Chudakov, E.A.

1986-01-01

A processor which uses data on events from five detector planes is described. To increase economy and speed in parallel processing, the processor converts the input data to superposition code and recognizes tracks by a generated search mask. The resolving time of the processor is ≤300 nsec. The processor is CAMAC-compatible and uses ECL integrated circuits
Software verification and validation methodology for advanced digital reactor protection system using diverse dual processors to prevent common mode failure

International Nuclear Information System (INIS)

Son, Ki Chang; Shin, Hyun Kook; Lee, Nam Hoon; Baek, Seung Min; Kim, Hang Bae

2001-01-01

The Advanced Digital Reactor Protection System (ADRPS) with diverse dual processors is being developed by the National Research Lab of KOPEC for ADRPS development. One of the ADRPS goals is to develop digital Plant Protection System (PPS) free of Common Mode Failure (CMF). To prevent CMF, the principle of diversity is applied to both hardware design and software design. For the hardware diversity, two different types of CPUs are used for Bistable Processor and Local Coincidence Logic Processor. The VME based Single Board Computers (SBC) are used for the CPU hardware platforms. The QNX Operating System (OS) and the VxWorks OS are used for software diversity. Rigorous Software Verification and Validation (V and V) is also required to prevent CMF. In this paper, software V and V methodology for the ADRPS is described to enhance the ADRPS software reliability and to assure high quality of the ADRPS software
Implementation theory of distortion-invariant pattern recognition for optical and digital signal processing systems

Science.gov (United States)

Lhamon, Michael Earl

A pattern recognition system which uses complex correlation filter banks requires proportionally more computational effort than single-real valued filters. This introduces increased computation burden but also introduces a higher level of parallelism, that common computing platforms fail to identify. As a result, we consider algorithm mapping to both optical and digital processors. For digital implementation, we develop computationally efficient pattern recognition algorithms, referred to as, vector inner product operators that require less computational effort than traditional fast Fourier methods. These algorithms do not need correlation and they map readily onto parallel digital architectures, which imply new architectures for optical processors. These filters exploit circulant-symmetric matrix structures of the training set data representing a variety of distortions. By using the same mathematical basis as with the vector inner product operations, we are able to extend the capabilities of more traditional correlation filtering to what we refer to as "Super Images". These "Super Images" are used to morphologically transform a complicated input scene into a predetermined dot pattern. The orientation of the dot pattern is related to the rotational distortion of the object of interest. The optical implementation of "Super Images" yields feature reduction necessary for using other techniques, such as artificial neural networks. We propose a parallel digital signal processor architecture based on specific pattern recognition algorithms but general enough to be applicable to other similar problems. Such an architecture is classified as a data flow architecture. Instead of mapping an algorithm to an architecture, we propose mapping the DSP architecture to a class of pattern recognition algorithms. Today's optical processing systems have difficulties implementing full complex filter structures. Typically, optical systems (like the 4f correlators) are limited to phase
Unified Compact ECC-AES Co-Processor with Group-Key Support for IoT Devices in Wireless Sensor Networks

Directory of Open Access Journals (Sweden)

Luis Parrilla

2018-01-01

Full Text Available Security is a critical challenge for the effective expansion of all new emerging applications in the Internet of Things paradigm. Therefore, it is necessary to define and implement different mechanisms for guaranteeing security and privacy of data interchanged within the multiple wireless sensor networks being part of the Internet of Things. However, in this context, low power and low area are required, limiting the resources available for security and thus hindering the implementation of adequate security protocols. Group keys can save resources and communications bandwidth, but should be combined with public key cryptography to be really secure. In this paper, a compact and unified co-processor for enabling Elliptic Curve Cryptography along to Advanced Encryption Standard with low area requirements and Group-Key support is presented. The designed co-processor allows securing wireless sensor networks with independence of the communications protocols used. With an area occupancy of only 2101 LUTs over Spartan 6 devices from Xilinx, it requires 15% less area while achieving near 490% better performance when compared to cryptoprocessors with similar features in the literature.
Unified Compact ECC-AES Co-Processor with Group-Key Support for IoT Devices in Wireless Sensor Networks

Science.gov (United States)

Castillo, Encarnación; López-Ramos, Juan A.; Morales, Diego P.

2018-01-01

Security is a critical challenge for the effective expansion of all new emerging applications in the Internet of Things paradigm. Therefore, it is necessary to define and implement different mechanisms for guaranteeing security and privacy of data interchanged within the multiple wireless sensor networks being part of the Internet of Things. However, in this context, low power and low area are required, limiting the resources available for security and thus hindering the implementation of adequate security protocols. Group keys can save resources and communications bandwidth, but should be combined with public key cryptography to be really secure. In this paper, a compact and unified co-processor for enabling Elliptic Curve Cryptography along to Advanced Encryption Standard with low area requirements and Group-Key support is presented. The designed co-processor allows securing wireless sensor networks with independence of the communications protocols used. With an area occupancy of only 2101 LUTs over Spartan 6 devices from Xilinx, it requires 15% less area while achieving near 490% better performance when compared to cryptoprocessors with similar features in the literature. PMID:29337921
Unified Compact ECC-AES Co-Processor with Group-Key Support for IoT Devices in Wireless Sensor Networks.

Science.gov (United States)

Parrilla, Luis; Castillo, Encarnación; López-Ramos, Juan A; Álvarez-Bermejo, José A; García, Antonio; Morales, Diego P

2018-01-16

Security is a critical challenge for the effective expansion of all new emerging applications in the Internet of Things paradigm. Therefore, it is necessary to define and implement different mechanisms for guaranteeing security and privacy of data interchanged within the multiple wireless sensor networks being part of the Internet of Things. However, in this context, low power and low area are required, limiting the resources available for security and thus hindering the implementation of adequate security protocols. Group keys can save resources and communications bandwidth, but should be combined with public key cryptography to be really secure. In this paper, a compact and unified co-processor for enabling Elliptic Curve Cryptography along to Advanced Encryption Standard with low area requirements and Group-Key support is presented. The designed co-processor allows securing wireless sensor networks with independence of the communications protocols used. With an area occupancy of only 2101 LUTs over Spartan 6 devices from Xilinx, it requires 15% less area while achieving near 490% better performance when compared to cryptoprocessors with similar features in the literature.
Application of an array processor to the analysis of magnetic data for the Doublet III tokamak

International Nuclear Information System (INIS)

Wang, T.S.; Saito, M.T.

1980-08-01

Discussed herein is a fast computational technique employing the Floating Point Systems AP-190L array processor to analyze magnetic data for the Doublet III tokamak, a fusion research device. Interpretation of the experimental data requires the repeated solution of a free-boundary nonlinear partial differential equation, which describes the magnetohydrodynamic (MHD) equilibrium of the plasma. For this particular application, we have found that the array processor is only 1.4 and 3.5 times slower than the CDC-7600 and CRAY computers, respectively. The overhead on the host DEC-10 computer was kept to a minimum by chaining the complete Poisson solver and free-boundary algorithm into one single-load module using the vector function chainer (VFC). A simple time-sharing scheme for using the MHD code is also discussed

Monitoring the performance of off-site processors

International Nuclear Information System (INIS)

Miller, C.C.

1995-01-01

Commercial nuclear power plants have been able to utilize the latest technologies and achieve large volume reduction by obtaining off-site waste processor services. Although the use of such services reduce the burden of waste processing it also reduces the utility's control over the process. Monitoring the performance of off-site processors is important so that the utility is cognizant of the waste disposition for required regulatory reporting. In addition to obtaining data for Reg Guide 1.21 reporting, Performance monitoring is important to determine which vendor and which services to utilize. Off-site processor services were initially offered for the decontamination of metallic waste. Since that time the list of services has expanded to include supercompaction, survey for release, incineration and metal melting. The number of vendors offering off-site services has increased and the services they offer vary. processing rates vary between vendors and have different charge bases. Determining which vendor to use for what service can be complicated and confusing
3D Seismic Imaging through Reverse-Time Migration on Homogeneous and Heterogeneous Multi-Core Processors

Directory of Open Access Journals (Sweden)

Mauricio Araya-Polo

2009-01-01

Full Text Available Reverse-Time Migration (RTM is a state-of-the-art technique in seismic acoustic imaging, because of the quality and integrity of the images it provides. Oil and gas companies trust RTM with crucial decisions on multi-million-dollar drilling investments. But RTM requires vastly more computational power than its predecessor techniques, and this has somewhat hindered its practical success. On the other hand, despite multi-core architectures promise to deliver unprecedented computational power, little attention has been devoted to mapping efficiently RTM to multi-cores. In this paper, we present a mapping of the RTM computational kernel to the IBM Cell/B.E. processor that reaches close-to-optimal performance. The kernel proves to be memory-bound and it achieves a 98% utilization of the peak memory bandwidth. Our Cell/B.E. implementation outperforms a traditional processor (PowerPC 970MP in terms of performance (with an 15.0× speedup and energy-efficiency (with a 10.0× increase in the GFlops/W delivered. Also, it is the fastest RTM implementation available to the best of our knowledge. These results increase the practical usability of RTM. Also, the RTM-Cell/B.E. combination proves to be a strong competitor in the seismic arena.
Digital Signal Processor System for AC Power Drivers

Directory of Open Access Journals (Sweden)

Ovidiu Neamtu

2009-10-01

Full Text Available DSP (Digital Signal Processor is the bestsolution for motor control systems to make possible thedevelopment of advanced motor drive systems. The motorcontrol processor calculates the required motor windingvoltage magnitude and frequency to operate the motor atthe desired speed. A PWM (Pulse Width Modulationcircuit controls the on and off duty cycle of the powerinverter switches to vary the magnitude of the motorvoltages.
Early experience with the cochlear ESPrit ear-level speech processor in children.

Science.gov (United States)

Totten, C; Cope, Y; McCormick, B

2000-12-01

The ESPrit ear-level speech processor has recently become available in the United Kingdom for use with the Nucleus CI24M multichannel cochlear implant. We report on the use of this ear-level processor with 6 children, ages 8 to 15 years. In this study, all patients were initially fitted with the SPrint body-worn processor, this being a prerequisite for programming the ESPrit. Five of the children were fitted successfully with the ESPrit and are using their devices consistently. The results show that patient experience with the ESPrit has been favorable, although there have been some device and programming difficulties. Aided threshold measures show that the ESPrit processor performs at least as well as the SPrint processor, with a trend toward improved aided thresholds for the ESPrit processor compared with the SPrint processor. Further study of the functional benefit of both of these devices may confirm these potential gains. The ESPrit device currently has a disadvantage for children in that it does not support FM radio hearing aid use. Finally, caution is advised in the fitting of the ESPrit in very young children or inexperienced listeners, because of difficulties in monitoring device function.
Survey of cochlear implant user satisfaction with the Neptune™ waterproof sound processor

Directory of Open Access Journals (Sweden)

Jeroen J. Briaire

2016-04-01

Full Text Available A multi-center self-assessment survey was conducted to evaluate patient satisfaction with the Advanced Bionics Neptune™ waterproof sound processor used with the AquaMic™ totally submersible microphone. Subjective satisfaction with the different Neptune™ wearing options, comfort, ease of use, sound quality and use of the processor in a range of active and water related situations were assessed for 23 adults and 73 children, using an online and paper based questionnaire. Upgraded subjects compared their previous processor to the Neptune™. The Neptune™ was most popular for use in general sports and in the pool. Subjects were satisfied with the sound quality of the sound processor outside and under water and following submersion. Seventyeight percent of subjects rated waterproofness as being very useful and 83% of the newly implanted subjects selected waterproofness as one of the reasons why they chose the Neptune™ processor. Providing a waterproof sound processor is considered by cochlear implant recipients to be useful and important and is a factor in their processor choice. Subjects reported that they were satisfied with the Neptune™ sound quality, ease of use and different wearing options.
A word processor optimized for preparing journal articles and student papers.

Science.gov (United States)

Wolach, A H; McHale, M A

2001-11-01

A new Windows-based word processor for preparing journal articles and student papers is described. In addition to standard features found in word processors, the present word processor provides specific help in preparing manuscripts. Clicking on "Reference Help (APA Form)" in the "File" menu provides a detailed help system for entering the references in a journal article. Clicking on "Examples and Explanations of APA Form" provides a help system with examples of the various sections of a review article, journal article that has one experiment, or journal article that has two or more experiments. The word processor can automatically place the manuscript page header and page number at the top of each page using the form required by APA and Psychonomic Society journals. The "APA Form" submenu of the "Help" menu provides detailed information about how the word processor is optimized for preparing articles and papers.
Extended performance electric propulsion power processor design study. Volume 2: Technical summary

Science.gov (United States)

Biess, J. J.; Inouye, L. Y.; Schoenfeld, A. D.

1977-01-01

Electric propulsion power processor technology has processed during the past decade to the point that it is considered ready for application. Several power processor design concepts were evaluated and compared. Emphasis was placed on a 30 cm ion thruster power processor with a beam power rating supply of 2.2KW to 10KW for the main propulsion power stage. Extension in power processor performance were defined and were designed in sufficient detail to determine efficiency, component weight, part count, reliability and thermal control. A detail design was performed on a microprocessor as the thyristor power processor controller. A reliability analysis was performed to evaluate the effect of the control electronics redesign. Preliminary electrical design, mechanical design and thermal analysis were performed on a 6KW power transformer for the beam supply. Bi-Mod mechanical, structural and thermal control configurations were evaluated for the power processor and preliminary estimates of mechanical weight were determined.
Data driven processor 'Vertex Trigger' for B experiments

International Nuclear Information System (INIS)

Hartouni, E.P.

1993-01-01

Data Driven Processors (DDP's) are specialized computation engines configured to solve specific numerical problems, such as vertex reconstruction. The architecture of the DDP which is the subject of this talk was designed and implemented by W. Sippach and B.C. Knapp at Nevis Lab. in the early 1980's. This particular implementation allows multiple parallel streams of data to provide input to a heterogenous collection of simple operators whose interconnection form an algorithm. The local data flow control allows this device to execute algorithms extremely quickly provided that care is taken in the layout of the algorithm. I/O rates of several hundred megabytes/second are routinely achieved thus making DDP's attractive candidates for complex online calculations. The original question was open-quote can a DDP reconstruct tracks in a Silicon Vertex Detector, find events with a separated vertex and do it fast enough to be used as an online trigger?close-quote Restating this inquiry as three questions and describing the answers to the questions will be the subject of this talk. The three specific questions are: (1) Can an algorithm be found which reconstructs tracks in a planar geometry and no magnetic field; (2) Can separated vertices be recognized in some way; (3) Can the algorithm be implemented in the Nevis-UMass and DDP and execute in 10-20 μs?
Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

Science.gov (United States)

Hristov, Ivan; Goranov, Goran; Hristova, Radoslava

2018-02-01

We consider an interesting from computational point of view standing wave simulation by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors, (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and Xeon Phi 7250 processor (code-named "Knights Landing" (KNL). The results show 2 times better performance on KNL processor.
M7--a high speed digital processor for second level trigger selections

International Nuclear Information System (INIS)

Droege, T.F.; Gaines, I.; Turner, K.J.

1978-01-01

A digital processor is described which reconstructs mass and momentum as a second-level trigger selection. The processor is a five-address, microprogramed, pipelined, ECL machine with simultaneous memory access to four operands which load two parallel multipliers and an ALU. Source data modules are extensions of the processor
Experimental demonstration of selective quantum process tomography on an NMR quantum information processor

Science.gov (United States)

Gaikwad, Akshay; Rehal, Diksha; Singh, Amandeep; Arvind, Dorai, Kavita

2018-02-01

We present the NMR implementation of a scheme for selective and efficient quantum process tomography without ancilla. We generalize this scheme such that it can be implemented efficiently using only a set of measurements involving product operators. The method allows us to estimate any element of the quantum process matrix to a desired precision, provided a set of quantum states can be prepared efficiently. Our modified technique requires fewer experimental resources as compared to the standard implementation of selective and efficient quantum process tomography, as it exploits the special nature of NMR measurements to allow us to compute specific elements of the process matrix by a restrictive set of subsystem measurements. To demonstrate the efficacy of our scheme, we experimentally tomograph the processes corresponding to "no operation," a controlled-NOT (CNOT), and a controlled-Hadamard gate on a two-qubit NMR quantum information processor, with high fidelities.
Comparison of Small Baseline Interferometric SAR Processors for Estimating Ground Deformation

Directory of Open Access Journals (Sweden)

Wenyu Gong

2016-04-01

Full Text Available The small Baseline Synthetic Aperture Radar (SAR Interferometry (SBI technique has been widely and successfully applied in various ground deformation monitoring applications. Over the last decade, a variety of SBI algorithms have been developed based on the same fundamental concepts. Recently developed SBI toolboxes provide an open environment for researchers to apply different SBI methods for various purposes. However, there has been no thorough discussion that compares the particular characteristics of different SBI methods and their corresponding performance in ground deformation reconstruction. Thus, two SBI toolboxes that implement a total of four SBI algorithms were selected for comparison. This study discusses and summarizes the main differences, pros and cons of these four SBI implementations, which could help users to choose a suitable SBI method for their specific application. The study focuses on exploring the suitability of each SBI module under various data set conditions, including small/large number of interferograms, the presence or absence of larger time gaps, urban/vegetation ground coverage, and temporally regular/irregular ground displacement with multiple spatial scales. Within this paper we discuss the corresponding theoretical background of each SBI method. We present a performance analysis of these SBI modules based on two real data sets characterized by different environmental and surface deformation conditions. The study shows that all four SBI processors are capable of generating similar ground deformation results when the data set has sufficient temporal sampling and a stable ground backscatter mechanism like urban area. Strengths and limitations of different SBI processors were analyzed based on data set configuration and environmental conditions and are summarized in this paper to guide future users of SBI techniques.
SSC 254 Screen-Based Word Processors: Production Tests. The Lanier Word Processor.

Science.gov (United States)

Moyer, Ruth A.

Designed for use in Trident Technical College's Secretarial Lab, this series of 12 production tests focuses on the use of the Lanier Word Processor for a variety of tasks. In tests 1 and 2, students are required to type and print out letters. Tests 3 through 8 require students to reformat a text; make corrections on a letter; divide and combine…
Dark matter analysis of XENON100 data and cut development utilizing the novel PAX raw data processor

Energy Technology Data Exchange (ETDEWEB)

Wittweg, Christian [Institut fuer Kernphysik, Westfaelische Wilhelms-Universitaet, Muenster (Germany)

2016-07-01

The XENON100 experiment located at LNGS is aimed at the direct detection of weakly interacting massive particles (WIMPs). It utilizes an ultra-low background dual-phase xenon TPC which yields two separate scintillation signals that facilitate background discrimination and event selection. Limits on various interaction types have been published by the collaboration (Science 349 (2015) 6250, 851-854). In the analysis dark matter candidate events have to pass cuts with respect to data quality, consistency and physical features of the interaction. The former ones are implemented with regard to the used data processor's capabilities for noise discrimination and peak-finding. The Processor for Analyzing Xenon (PAX), developed for the XENON1T experiment, enhances these capabilities compared to XENON100. A greater robustness against noise and an increased peak-identification efficiency open up new opportunities for physically motivated cuts while rendering old ones obsolete. The poster will focus on the implementation of new cuts into the analysis chain. Both PAX and the xenon analysis will be introduced. A planned full-scale dark matter analysis of PAX-processed XENON100 data will be outlined.
Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

OpenAIRE

Abdul Kareem PARCHUR; Kuppangari Krishna RAO; Fazal NOORBASHA; Ram Asaray SINGH

2010-01-01

As the processor architecture becomes more advanced, Intel introduced its Intel Core 2 Duo series processors. Performance impact on Intel Core 2 Duo processors are analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT) @1 GHz, @2 GHz and @3 GHz Intel Core 2 Duo series processors frequency. Our results show the performance scalab...
Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

Directory of Open Access Journals (Sweden)

Hristov Ivan

2018-01-01

Full Text Available We consider an interesting from computational point of view standing wave simulation by solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors, (code-named “Ivy Bridge-EP” in the Hybrilit cluster, and Xeon Phi 7250 processor (code-named “Knights Landing” (KNL. The results show 2 times better performance on KNL processor.
HTGR core seismic analysis using an array processor

International Nuclear Information System (INIS)

Shatoff, H.; Charman, C.M.

1983-01-01

A Floating Point Systems array processor performs nonlinear dynamic analysis of the high-temperature gas-cooled reactor (HTGR) core with significant time and cost savings. The graphite HTGR core consists of approximately 8000 blocks of various shapes which are subject to motion and impact during a seismic event. Two-dimensional computer programs (CRUNCH2D, MCOCO) can perform explicit step-by-step dynamic analyses of up to 600 blocks for time-history motions. However, use of two-dimensional codes was limited by the large cost and run times required. Three-dimensional analysis of the entire core, or even a large part of it, had been considered totally impractical. Because of the needs of the HTGR core seismic program, a Floating Point Systems array processor was used to enhance computer performance of the two-dimensional core seismic computer programs, MCOCO and CRUNCH2D. This effort began by converting the computational algorithms used in the codes to a form which takes maximum advantage of the parallel and pipeline processors offered by the architecture of the Floating Point Systems array processor. The subsequent conversion of the vectorized FORTRAN coding to the array processor required a significant programming effort to make the system work on the General Atomic (GA) UNIVAC 1100/82 host. These efforts were quite rewarding, however, since the cost of running the codes has been reduced approximately 50-fold and the time threefold. The core seismic analysis with large two-dimensional models has now become routine and extension to three-dimensional analysis is feasible. These codes simulate the one-fifth-scale full-array HTGR core model. This paper compares the analysis with the test results for sine-sweep motion
A Hybrid Scheme Based on Pipelining and Multitasking in Mobile Application Processors for Advanced Video Coding

Directory of Open Access Journals (Sweden)

Muhammad Asif

2015-01-01

Full Text Available One of the key requirements for mobile devices is to provide high-performance computing at lower power consumption. The processors used in these devices provide specific hardware resources to handle computationally intensive video processing and interactive graphical applications. Moreover, processors designed for low-power applications may introduce limitations on the availability and usage of resources, which present additional challenges to the system designers. Owing to the specific design of the JZ47x series of mobile application processors, a hybrid software-hardware implementation scheme for H.264/AVC encoder is proposed in this work. The proposed scheme distributes the encoding tasks among hardware and software modules. A series of optimization techniques are developed to speed up the memory access and data transferring among memories. Moreover, an efficient data reusage design is proposed for the deblock filter video processing unit to reduce the memory accesses. Furthermore, fine grained macroblock (MB level parallelism is effectively exploited and a pipelined approach is proposed for efficient utilization of hardware processing cores. Finally, based on parallelism in the proposed design, encoding tasks are distributed between two processing cores. Experiments show that the hybrid encoder is 12 times faster than a highly optimized sequential encoder due to proposed techniques.
The hardware track finder processor in CMS at CERN

CERN Document Server

Kluge, A

1997-01-01

The work covers the design of the Track Finder Processor in the high energy experiment CMS (Compact Muon Solenoid, planned for 2005) at CERN/Geneva. The task of this processor is to identify muons and measure their transverse momentum. The track finder processor makes it possible to determine the physical relevance of each high energetic collision and to forward only interesting data to the data an alysis units. Data of more than two hundred thousand detector cells are used to determine the location of muons and measure their transverse momentum. Each 25 ns a new data set is generated. Measurem ent of location and transverse momentum of the muons can be terminated within 350 ns by using an ASIC (Application Specific Integrated Circuit). A pipeline architecture processes new data sets with th e required data rate of 40 MHz to ensure dead time free operation. In the framework of this study specifications and the overall concept of the track finder processor were worked out in detail. Simul ations were performed...
Multipurpose silicon photonics signal processor core.

Science.gov (United States)

Pérez, Daniel; Gasulla, Ivana; Crudgington, Lee; Thomson, David J; Khokhar, Ali Z; Li, Ke; Cao, Wei; Mashanovich, Goran Z; Capmany, José

2017-09-21

Integrated photonics changes the scaling laws of information and communication systems offering architectural choices that combine photonics with electronics to optimize performance, power, footprint, and cost. Application-specific photonic integrated circuits, where particular circuits/chips are designed to optimally perform particular functionalities, require a considerable number of design and fabrication iterations leading to long development times. A different approach inspired by electronic Field Programmable Gate Arrays is the programmable photonic processor, where a common hardware implemented by a two-dimensional photonic waveguide mesh realizes different functionalities through programming. Here, we report the demonstration of such reconfigurable waveguide mesh in silicon. We demonstrate over 20 different functionalities with a simple seven hexagonal cell structure, which can be applied to different fields including communications, chemical and biomedical sensing, signal processing, multiprocessor networks, and quantum information systems. Our work is an important step toward this paradigm.Integrated optical circuits today are typically designed for a few special functionalities and require complex design and development procedures. Here, the authors demonstrate a reconfigurable but simple silicon waveguide mesh with different functionalities.

Analysis of simulated triples γ-ray data on a 100-processor transputer system

International Nuclear Information System (INIS)

Loennroth, T.; Hattula, J.; Julin, R.; Lampinen, A.; Aspnaes, M.; Back, R.J.R.; Granlund, J.; Waxlax, P.

1993-01-01

We present a study of the administration and analysis of simulated nuclear 3-fold coincidences implemented on a 100-processor transputer system, which has a computational capacity of about 150 Mflops. From a simple model of a level scheme the corresponding 'data' is extracted in the form of three-dimensional addresses with background. The data structure and the analysis rate is studied. Further, a set of real data is used to estimate the rate of sort to be performed. The feasibility of using such a system to analyse large data spaces is discussed. (orig.)
Two-dimensional optoelectronic interconnect-processor and its operational bit error rate

Science.gov (United States)

Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.

2004-10-01

Two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system were designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and the photodetector (PD) arrays with corresponding wavelengths. We performed operation and bit-error-rate (BER) analysis on this free-space integrated 8x8 VCSEL optical interconnects driven by silicon-on-sapphire (SOS) circuits. Pseudo-random bit stream (PRBS) data sequence was used in operation of the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model of Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously with the digital eye-diagrams. Direct measurements on this interconnects were also taken on a standard BER tester for verification. We found that the results of two methods were in the same order and within 50% accuracy. The integrated interconnects were investigated in an optoelectronic processing architecture of digital halftoning image processor. Error diffusion networks implemented by the inherently parallel nature of photonics promise to provide high quality digital halftoned images.
On the efficacy of using the transfer-controlled procedure during periods of STP processor overloads in SS7 networks

Science.gov (United States)

Rumsewicz, Michael

1994-04-01

In this paper, we examine call completion performance, rather than message throughput, in a Common Channel Signaling network in which the processing resources, and not transmission resources, of a Signaling Transfer Point (STP) are overloaded. Specifically, we perform a transient analysis, via simulation, of a network consisting of a single Central Processor-based STP connecting many local exchanges. We consider the efficacy of using the Transfer Controlled (TFC) procedure when the network call attempt rate exceeds the processing capability of the STP. We find the following: (1) the success of the control depends critically on the rate at which TFC's are sent; (2) use of the TFC procedure in theevent of processor overload can provide reasonable call completion rates.
Reducing adaptive optics latency using Xeon Phi many-core processors

Science.gov (United States)

Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah

2015-11-01

The next generation of Extremely Large Telescopes (ELTs) for astronomy will rely heavily on the performance of their adaptive optics (AO) systems. Real-time control is at the heart of the critical technologies that will enable telescopes to deliver the best possible science and will require a very significant extrapolation from current AO hardware existing for 4-10 m telescopes. Investigating novel real-time computing architectures and testing their eligibility against anticipated challenges is one of the main priorities of technology development for the ELTs. This paper investigates the suitability of the Intel Xeon Phi, which is a commercial off-the-shelf hardware accelerator. We focus on wavefront reconstruction performance, implementing a straightforward matrix-vector multiplication (MVM) algorithm. We present benchmarking results of the Xeon Phi on a real-time Linux platform, both as a standalone processor and integrated into an existing real-time controller (RTC). Performance of single and multiple Xeon Phis are investigated. We show that this technology has the potential of greatly reducing the mean latency and variations in execution time (jitter) of large AO systems. We present both a detailed performance analysis of the Xeon Phi for a typical E-ELT first-light instrument along with a more general approach that enables us to extend to any AO system size. We show that systematic and detailed performance analysis is an essential part of testing novel real-time control hardware to guarantee optimal science results.
Application of the Computer Capacity to the Analysis of Processors Evolution

OpenAIRE

Ryabko, Boris; Rakitskiy, Anton

2017-01-01

The notion of computer capacity was proposed in 2012, and this quantity has been estimated for computers of different kinds. In this paper we show that, when designing new processors, the manufacturers change the parameters that affect the computer capacity. This allows us to predict the values of parameters of future processors. As the main example we use Intel processors, due to the accessibility of detailed description of all their technical characteristics.
Optimal processor for malfunction detection in operating nuclear reactor

International Nuclear Information System (INIS)

Ciftcioglu, O.

1990-01-01

An optimal processor for diagnosing operational transients in a nuclear reactor is described. Basic design of the processor involves real-time processing of noise signal obtained from a particular in core sensor and the optimality is based on minimum alarm failure in contrast to minimum false alarm criterion from the safe and reliable plant operation viewpoint
An updated program-controlled analog processor, model AP-006, for semiconductor detector spectrometers

International Nuclear Information System (INIS)

Shkola, N.F.; Shevchenko, Yu.A.

1989-01-01

An analog processor, model AP-006, is reported. The processor is a development of a series of spectrometric units based on a shaper of the type 'DL dif +TVS+gated ideal integrator'. Structural and circuits design features are described. The results of testing the processor in a setup with a Si(Li) detecting unit over an input count-rate range of up to 5x10 5 cps are presented. Processor applications are illustrated. (orig.)
Intelligent trigger processor for the crystal box

International Nuclear Information System (INIS)

Sanders, G.H.; Butler, H.S.; Cooper, M.D.

1981-01-01

A large solid angle modular NaI(Tl) detector with 432 phototubes and 88 trigger scintillators is being used to search simultaneously for three lepton flavor changing decays of muon. A beam of up to 10 6 muons stopping per second with a 6% duty factor would yield up to 1000 triggers per second from random triple coincidences. A reduction of the trigger rate to 10 Hz is required from a hardwired primary trigger processor described in this paper. Further reduction to < 1 Hz is achieved by a microprocessor based secondary trigger processor. The primary trigger hardware imposes voter coincidence logic, stringent timing requirements, and a non-adjacency requirement in the trigger scintillators defined by hardwired circuits. Sophisticated geometric requirements are imposed by a PROM-based matrix logic, and energy and vector-momentum cuts are imposed by a hardwired processor using LSI flash ADC's and digital arithmetic loci. The secondary trigger employs four satellite microprocessors to do a sparse data scan, multiplex the data acquisition channels and apply additional event filtering
40 CFR 80.840 - What requirements apply to transmix processors?

Science.gov (United States)

2010-07-01

... PROGRAMS (CONTINUED) REGULATION OF FUELS AND FUEL ADDITIVES Gasoline Toxics Gasoline Toxics Performance Requirements § 80.840 What requirements apply to transmix processors? Any transmix processor who produces gasoline or gasoline blendstock from transmix, or recovers gasoline or gasoline blendstock from transmix...
THOR Fields and Wave Processor - FWP

Science.gov (United States)

Soucek, Jan; Rothkaehl, Hanna; Ahlen, Lennart; Balikhin, Michael; Carr, Christopher; Dekkali, Moustapha; Khotyaintsev, Yuri; Lan, Radek; Magnes, Werner; Morawski, Marek; Nakamura, Rumi; Uhlir, Ludek; Yearby, Keith; Winkler, Marek; Zaslavsky, Arnaud

2017-04-01

If selected, Turbulence Heating ObserveR (THOR) will become the first spacecraft mission dedicated to the study of plasma turbulence. The Fields and Waves Processor (FWP) is an integrated electronics unit for all electromagnetic field measurements performed by THOR. FWP will interface with all THOR fields sensors: electric field antennas of the EFI instrument, the MAG fluxgate magnetometer, and search-coil magnetometer (SCM), and perform signal digitization and on-board data processing. FWP box will house multiple data acquisition sub-units and signal analyzers all sharing a common power supply and data processing unit and thus a single data and power interface to the spacecraft. Integrating all the electromagnetic field measurements in a single unit will improve the consistency of field measurement and accuracy of time synchronization. The scientific value of highly sensitive electric and magnetic field measurements in space has been demonstrated by Cluster (among other spacecraft) and THOR instrumentation will further improve on this heritage. Large dynamic range of the instruments will be complemented by a thorough electromagnetic cleanliness program, which will prevent perturbation of field measurements by interference from payload and platform subsystems. Taking advantage of the capabilities of modern electronics and the large telemetry bandwidth of THOR, FWP will provide multi-component electromagnetic field waveforms and spectral data products at a high time resolution. Fully synchronized sampling of many signals will allow to resolve wave phase information and estimate wavelength via interferometric correlations between EFI probes. FWP will also implement a plasma resonance sounder and a digital plasma quasi-thermal noise analyzer designed to provide high cadence measurements of plasma density and temperature complementary to data from particle instruments. FWP will rapidly transmit information about magnetic field vector and spacecraft potential to the
The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

CERN Document Server

Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

2015-01-01

The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed on purpose to execute pattern matching with a high degree of parallelism. It finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. We report on the performance of the intermedia...
The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

CERN Document Server

Andreani, A; The ATLAS collaboration; Beccherle, R; Beretta, M; Cipriani, R; Citraro, S; Citterio, M; Colombo, A; Crescioli, F; Dimas, D; Donati, S; Giannetti, P; Kordas, K; Lanza, A; Liberali, V; Luciano, P; Magalotti, D; Neroutsos, P; Nikolaidis, S; Piendibene, M; Sakellariou, A; Shojaii, S; Sotiropoulou, C-L; Stabile, A

2014-01-01

The Associative Memory (AM) system of the FTK processor has been designed to perform pattern matching using the hit information of the ATLAS silicon tracker. The AM is the heart of the FTK and it finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside the FTK, multiple designs and tests have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor consisting of the AM chip, an ASIC designed and optimized to perform pattern matching, and two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. Special relevance will be given to the AMchip design that includes two custom cells optimized for low consumption. We repo...
The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

CERN Document Server

Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

2015-01-01

The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed to execute pattern matching with a high degree of parallelism. The AM system finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 828 2 Gbit/s serial links for a total in/out bandwidth of 56 Gb/s. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. ...
A programmable systolic trigger processor for FERA bus data

International Nuclear Information System (INIS)

Appelquist, G.; Hovander, B.; Sellden, B.; Bohm, C.

1992-09-01

A generic CAMAC based trigger processor module for fast processing of large amounts of ADC data, has been designed. This module has been realised using complex programmable gate arrays (LCAs from XILINX). The gate arrays have been connected to memories and multipliers in such a way that different gate array configurations can cover a wide range of module applications. Using this module, it is possible to construct complex trigger processors. The module uses both the fast ECL FERA bus and the CAMAC bus for inputs and outputs. The latter, however, is primarily used for set-up and control but may also be used for data output. Large numbers of ADCs can be served by a hierarchical arrangement of trigger processor modules, processing ADC data with pipe-line arithmetics producing the final result at the apex of the pyramid. The trigger decision will be transmitted to the data acquisition system via a logic signal while numeric results may be extracted by the CAMAC controller. The trigger processor was originally developed for the proposed neutral particle search experiment at CERN, NUMASS. There it was designed to serve as a second level trigger processor. It was required to correct all ADC raw data for efficiency and pedestal, calculate the total calorimeter energy, obtain the optimal time of flight data and calculate the particle mass. A suitable mass cut would then deliver the trigger decision. More complex triggers were also considered. (au)
Emergency product generation for disaster management using RISAT and DMSAR quick look SAR processors

Science.gov (United States)

Desai, Nilesh; Sharma, Ritesh; Kumar, Saravana; Misra, Tapan; Gujraty, Virendra; Rana, SurinderSingh

2006-12-01

generate full-swath (6 to 75 Kms) DMSAR images in 1m / 3m / 5m / 10m / 30m resolution SAR operating modes. For RISAT mission, this generic Quick Look SAR Processor will be mainly used for browse product generation at NRSA-Shadnagar (SAN) ground receive station. RISAT QLP/NRTP is also proposed to provide an alternative emergency SAR product generation chain. For this, the S/C aux data appended in Onboard SAR Frame Format (x, y, z, x', y', z', roll, pitch, yaw) and predicted orbit from previous days Orbit Determination data will be used. The QLP / NRTP will produce ground range images in real / near real time. For emergency data product generation, additional Off-line tasks like geo-tagging, masking, QC etc needs to be performed on the processed image. The QLP / NRTP would generate geo-tagged images from the annotation data available from the SAR P/L data itself. Since the orbit & attitude information are taken as it is, the location accuracy will be poorer compared to the product generated using ADIF, where smoothened attitude and orbit are made available. Additional tasks like masking, output formatting and Quality checking of the data product will be carried out at Balanagar, NRSA after the image annotated data from QLP / NRTP is sent to Balanagar. The necessary interfaces to the QLP/NRTP for Emergency product generation are also being worked out. As is widely acknowledged, QLP/NRTP for RISAT and DMSAR is an ambitious effort and the technology of future. It is expected that by the middle of next decade, the next generation SAR missions worldwide will have onboard SAR Processors of varying capabilities and generate SAR Data products and Information products onboard instead of SAR raw data. Thus, it is also envisaged that these activities related to QLP/NRTP implementation for RISAT ground segment and DMSAR will be a significant step which will directly feed into the development of onboard real time processing systems for ISRO's future space borne SAR missions. This paper
CASPER: Embedding Power Estimation and Hardware-Controlled Power Management in a Cycle-Accurate Micro-Architecture Simulation Platform for Many-Core Multi-Threading Heterogeneous Processors

Directory of Open Access Journals (Sweden)

Arun Ravindran

2012-02-01

Full Text Available Despite the promising performance improvement observed in emerging many-core architectures in high performance processors, high power consumption prohibitively affects their use and marketability in the low-energy sectors, such as embedded processors, network processors and application specific instruction processors (ASIPs. While most chip architects design power-efficient processors by finding an optimal power-performance balance in their design, some use sophisticated on-chip autonomous power management units, which dynamically reduce the voltage or frequencies of idle cores and hence extend battery life and reduce operating costs. For large scale designs of many-core processors, a holistic approach integrating both these techniques at different levels of abstraction can potentially achieve maximal power savings. In this paper we present CASPER, a robust instruction trace driven cycle-accurate many-core multi-threading micro-architecture simulation platform where we have incorporated power estimation models of a wide variety of tunable many-core micro-architectural design parameters, thus enabling processor architects to explore a sufficiently large design space and achieve power-efficient designs. Additionally CASPER is designed to accommodate cycle-accurate models of hardware controlled power management units, enabling architects to experiment with and evaluate different autonomous power-saving mechanisms to study the run-time power-performance trade-offs in embedded many-core processors. We have implemented two such techniques in CASPER–Chipwide Dynamic Voltage and Frequency Scaling, and Performance Aware Core-Specific Frequency Scaling, which show average power savings of 35.9% and 26.2% on a baseline 4-core SPARC based architecture respectively. This power saving data accounts for the power consumption of the power management units themselves. The CASPER simulation platform also provides users with complete support of SPARCV9
LHCb base-line level-0 trigger 3D-flow implementation

CERN Document Server

Crosetto, D

1999-01-01

The LHCb Level-0 trigger implementation with the 3D-Flow system offers full programmability, allowing it to adapt to unexpected operating conditions and enabling new, unpredicted physics. The implementation is described in detail and refers to components and technology available today. The 3D-Flow Processor system is a new, technology-independent concept in very fast, real-time system architectures. Based on the replication of a single type of circuit of 100 k gates, which communicates in six directions: bi-directional with North, East, West, and South neighbors, unidirectional from Top to Bottom, the system offers full programmability, modularity, ease of expansion and adaptation to the latest technology. A complete study of its applicability to the LHCb calorimeter triggers is presented. Full description of the input data handling, either in digital or mixed digital-analog form, of the data processing, and the transmission of results to the global level-0 trigger decision unit are provided. Any level-0 trig...
7 CFR 1215.14 - Processor.

Science.gov (United States)

2010-01-01

... 7 Agriculture 10 2010-01-01 2010-01-01 false Processor. 1215.14 Section 1215.14 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (MARKETING AGREEMENTS... CONSUMER INFORMATION Popcorn Promotion, Research, and Consumer Information Order Definitions § 1215.14...
PetaScale calculations of the electronic structures of nanostructures with hundreds of thousands of processors

International Nuclear Information System (INIS)

Wang, Lin-Wang; Zhao, Zhengji; Meza, Juan

2006-01-01

Density functional theory (DFT) is the most widely used ab initio method in material simulations. It accounts for 75% of the NERSC allocation time in the material science category. The DFT can be used to calculate the electronic structure, the charge density, the total energy and the atomic forces of a material system. With the advance of the HPC power and new algorithms, DFT can now be used to study thousand atom systems in some limited ways (e.g, a single selfconsistent calculation without atomic relaxation). But there are many problems which either requires much larger systems (e.g, >100,000 atoms), or many total energy calculation steps (e.g. for molecular dynamics or atomic relaxations). Examples include: grain boundary, dislocation energies and atomic structures, impurity transport and clustering in semiconductors, nanostructure growth, electronic structures of nanostructures and their internal electric fields. Due to the O(N 3 ) scaling of the conventional DFT algorithms (as implemented in codes like Qbox, Paratec, Petots), these problems are beyond the reach even for petascale computers. As the proposed petascale computers might have millions of processors, new computational paradigms and algorithms are needed to solve the above large scale problems. In particular, O(N) scaling algorithms with parallelization capability up to millions of processors are needed. For a large material science problem, a natural approach to achieve this goal is by divide-and-conquer method: to spatially divide the system into many small pieces, and solve each piece by a small local group of processors. This solves the O(N) scaling and the parallelization problem at the same time. However, the challenge of this approach is for how to divide the system into small pieces and how to patch them up without the trace of the spatial division. Here, we present a linear scaling 3 dimensional fragment (LS3DF) method which uses a novel division-patching scheme that cancels out the
Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

Directory of Open Access Journals (Sweden)

Abdul Kareem PARCHUR

2010-12-01

Full Text Available As the processor architecture becomes more advanced, Intel introduced its Intel Core 2 Duo series processors. Performance impact on Intel Core 2 Duo processors are analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT @1 GHz, @2 GHz and @3 GHz Intel Core 2 Duo series processors frequency. Our results show the performance scalability in Intel Core 2 Duo series processors. Even though AI benchmarks have similar execution time, they have dissimilar characteristics which are identified using principal component analysis and dendogram. As the processor frequency increased from 1.8 GHz to 3.167 GHz the execution time is decreased by ~370 sec for AI workloads. In the case of Physics/Quantum Computing programs it was ~940 sec.

Photonics and Fiber Optics Processor Lab

Data.gov (United States)

Federal Laboratory Consortium — The Photonics and Fiber Optics Processor Lab develops, tests and evaluates high speed fiber optic network components as well as network protocols. In addition, this...
Invasive tightly coupled processor arrays

CERN Document Server

LARI, VAHID

2016-01-01

This book introduces new massively parallel computer (MPSoC) architectures called invasive tightly coupled processor arrays. It proposes strategies, architecture designs, and programming interfaces for invasive TCPAs that allow invading and subsequently executing loop programs with strict requirements or guarantees of non-functional execution qualities such as performance, power consumption, and reliability. For the first time, such a configurable processor array architecture consisting of locally interconnected VLIW processing elements can be claimed by programs, either in full or in part, using the principle of invasive computing. Invasive TCPAs provide unprecedented energy efficiency for the parallel execution of nested loop programs by avoiding any global memory access such as GPUs and may even support loops with complex dependencies such as loop-carried dependencies that are not amenable to parallel execution on GPUs. For this purpose, the book proposes different invasion strategies for claiming a desire...
Reaction-diffusion path planning in a hybrid chemical and cellular-automaton processor

International Nuclear Information System (INIS)

Adamatzky, Andrew; Lacy Costello, Benjamin de

2003-01-01

To find the shortest collision-free path in a room containing obstacles we designed a chemical processor and coupled it with a cellular-automaton processor. In the chemical processor obstacles are represented by sites of high concentration of potassium iodide and a planar substrate is saturated with palladium chloride. Potassium iodide diffuses into the substrate and reacts with palladium chloride. A dark coloured precipitate of palladium iodide is formed almost everywhere except sites where two or more diffusion wavefronts collide. The less coloured sites are situated at the furthest distance from obstacles. Thus, the chemical processor develops a repulsive field, generated by obstacles. A snapshot of the chemical processor is inputted to a cellular automaton. The automaton behaves like a discrete excitable media; also, every cell of the automaton is supplied with a pointer that shows an origin of the cell's excitation. The excitation spreads along the cells corresponding to precipitate depleted sites of the chemical processor. When the destination-site is excited, waves travel on the lattice and update the orientations of the pointers. Thus, the automaton constructs a spanning tree, made of pointers, that guides a traveler towards the destination point. Thus, the automaton medium generates an attractive field and combination of this attractive field with the repulsive field, generated by the chemical processor, provides us with a solution of the collision-free path problem
16-Bit RISC Processor Design for Convolution Application

OpenAIRE

Anand Nandakumar Shardul

2013-01-01

In this project, we propose a 16-bit non-pipelined RISC processor, which is used for signal processing applications. The processor consists of the blocks, namely, program counter, clock control unit, ALU, IDU and registers. Advantageous architectural modifications have been made in the incremented circuit used in program counter and carry select adder unit of the ALU in the RISC CPU core. Furthermore, a high speed and low power modified modifies multiplier has been designed and introduced in ...
Assessment of Single European Sky Implementation in the Functional Airspace Block Central Europe

Directory of Open Access Journals (Sweden)

Tomislav Mihetec

2017-12-01

implementation is performed through sub-regional grouping of Air Navigation Service Providers in a form of Functional Airspace Blocks. This paper analyses the level of implementation of ATM-related projects in the Functional Airspace Block Central Europe and their relation to other Functional Airspace Blocks defined in Europe. From this paper it is obvious that even though the planning of Single European Sky projects is based on the collaborative implementation of Functional Airspace Block level, the real implementation is fragmented and based on national levels.
The Impact of 3D Stacking and Technology Scaling on the Power and Area of Stereo Matching Processors

Directory of Open Access Journals (Sweden)

Seung-Ho Ok

2017-02-01

Full Text Available Recently, stereo matching processors have been adopted in real-time embedded systems such as intelligent robots and autonomous vehicles, which require minimal hardware resources and low power consumption. Meanwhile, thanks to the through-silicon via (TSV, three-dimensional (3D stacking technology has emerged as a practical solution to achieving the desired requirements of a high-performance circuit. In this paper, we present the benefits of 3D stacking and process technology scaling on stereo matching processors. We implemented 2-tier 3D-stacked stereo matching processors with GlobalFoundries 130-nm and Nangate 45-nm process design kits and compare them with their two-dimensional (2D counterparts to identify comprehensive design benefits. In addition, we examine the findings from various analyses to identify the power benefits of 3D-stacked integrated circuit (IC and device technology advancements. From experiments, we observe that the proposed 3D-stacked ICs, compared to their 2D IC counterparts, obtain 43% area, 13% power, and 14% wire length reductions. In addition, we present a logic partitioning method suitable for a pipeline-based hardware architecture that minimizes the use of TSVs.
Modal Processor Effects Inspired by Hammond Tonewheel Organs

Directory of Open Access Journals (Sweden)

Kurt James Werner

2016-06-01

Full Text Available In this design study, we introduce a novel class of digital audio effects that extend the recently introduced modal processor approach to artificial reverberation and effects processing. These pitch and distortion processing effects mimic the design and sonics of a classic additive-synthesis-based electromechanical musical instrument, the Hammond tonewheel organ. As a reverb effect, the modal processor simulates a room response as the sum of resonant filter responses. This architecture provides precise, interactive control over the frequency, damping, and complex amplitude of each mode. Into this framework, we introduce two types of processing effects: pitch effects inspired by the Hammond organ’s equal tempered “tonewheels”, “drawbar” tone controls, vibrato/chorus circuit, and distortion effects inspired by the pseudo-sinusoidal shape of its tonewheels and electromagnetic pickup distortion. The result is an effects processor that imprints the Hammond organ’s sonics onto any audio input.
The performances of coffee processors and coffee market in the Republic of Serbia

Directory of Open Access Journals (Sweden)

Nuševa Daniela

2017-01-01

Full Text Available The main aim of this paper is to investigate the performances of coffee processors and coffee market in Serbia based on the market concentration analysis, profitability analysis, and profitability determinants analysis. The research was based on the sample of 40 observations of coffee processing companies divided into two groups: large and small coffee processors. The results indicate that two large coffee processors have dominant market share. Even though the Serbian coffee market is an oligopolistic, profitability analysis indicates that small coffee processors have a significant better profitability ratio than large coffee processors. Furthermore, results show that profitability ratio is positively related to the inventory turnover and negatively related to the market share.
Hardware Synchronization for Embedded Multi-Core Processors

DEFF Research Database (Denmark)

Stoif, Christian; Schoeberl, Martin; Liccardi, Benito

2011-01-01

Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper, establi......Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper...
The microelectronic and photonic test bed RISC processor and DRAM memory stack experiments

International Nuclear Information System (INIS)

Clark, K.A.; Meehan, T.J.

1999-01-01

This paper reports on the on-orbit data obtained from the MPTB RISC Processor Experiment, containing three Integrated Device Technologies R3081 processors. During operations, nine SEUs were observed in the processors, and four SEUs were observed in the memory and/or support circuitry. (authors)
Digital image processing software system using an array processor

International Nuclear Information System (INIS)

Sherwood, R.J.; Portnoff, M.R.; Journeay, C.H.; Twogood, R.E.

1981-01-01

A versatile array processor-based system for general-purpose image processing was developed. At the heart of this system is an extensive, flexible software package that incorporates the array processor for effective interactive image processing. The software system is described in detail, and its application to a diverse set of applications at LLNL is briefly discussed. 4 figures, 1 table
Embedded Processor Based Automatic Temperature Control of VLSI Chips

Directory of Open Access Journals (Sweden)

Narasimha Murthy Yayavaram

2009-01-01

Full Text Available This paper presents embedded processor based automatic temperature control of VLSI chips, using temperature sensor LM35 and ARM processor LPC2378. Due to the very high packing density, VLSI chips get heated very soon and if not cooled properly, the performance is very much affected. In the present work, the sensor which is kept very near proximity to the IC will sense the temperature and the speed of the fan arranged near to the IC is controlled based on the PWM signal generated by the ARM processor. A buzzer is also provided with the hardware, to indicate either the failure of the fan or overheating of the IC. The entire process is achieved by developing a suitable embedded C program.
Scientific programming on massively parallel processor CP-PACS

International Nuclear Information System (INIS)

Boku, Taisuke

1998-01-01

The massively parallel processor CP-PACS takes various problems of calculation physics as the object, and it has been designed so that its architecture has been devised to do various numerical processings. In this report, the outline of the CP-PACS and the example of programming in the Kernel CG benchmark in NAS Parallel Benchmarks, version 1, are shown, and the pseudo vector processing mechanism and the parallel processing tuning of scientific and technical computation utilizing the three-dimensional hyper crossbar net, which are two great features of the architecture of the CP-PACS are described. As for the CP-PACS, the PUs based on RISC processor and added with pseudo vector processor are used. Pseudo vector processing is realized as the loop processing by scalar command. The features of the connection net of PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The part that takes the time for processing most in the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The time for the computation by the CPU is determined. As the evaluation of the performance, the evaluation of the time for execution, the short vector processing of pseudo vector processor based on slide window, and the comparison with other parallel computers are reported. (K.I.)
Investigation of Large Scale Cortical Models on Clustered Multi-Core Processors

Science.gov (United States)

2013-02-01

Playstation 3 with 6 available SPU cores outperforms the Intel Xeon processor (with 4 cores) by about 1.9 times for the HTM model and by 2.4 times...runtime breakdowns of the HTM and Dean models respectively on the Cell processor (on the Playstation 3) and the Intel Xeon processor ( 4 thread...YOUR FORM TO THE ABOVE ORGANIZATION. 1. REPORT DATE (DD-MM-YYYY) 2. REPORT TYPE 3. DATES COVERED (From - To) 4 . TITLE AND SUBTITLE 5a. CONTRACT NUMBER
Compilation Techniques Specific for a Hardware Cryptography-Embedded Multimedia Mobile Processor

Directory of Open Access Journals (Sweden)

Masa-aki FUKASE

2007-12-01

Full Text Available The development of single chip VLSI processors is the key technology of ever growing pervasive computing to answer overall demands for usability, mobility, speed, security, etc. We have so far developed a hardware cryptography-embedded multimedia mobile processor architecture, HCgorilla. Since HCgorilla integrates a wide range of techniques from architectures to applications and languages, one-sided design approach is not always useful. HCgorilla needs more complicated strategy, that is, hardware/software (H/S codesign. Thus, we exploit the software support of HCgorilla composed of a Java interface and parallelizing compilers. They are assumed to be installed in servers in order to reduce the load and increase the performance of HCgorilla-embedded clients. Since compilers are the essence of software's responsibility, we focus in this article on our recent results about the design, specifications, and prototyping of parallelizing compilers for HCgorilla. The parallelizing compilers are composed of a multicore compiler and a LIW compiler. They are specified to abstract parallelism from executable serial codes or the Java interface output and output the codes executable in parallel by HCgorilla. The prototyping compilers are written in Java. The evaluation by using an arithmetic test program shows the reasonability of the prototyping compilers compared with hand compilers.
Automation of ORIGEN2 calculations for the transuranic waste baseline inventory database using a pre-processor and a post-processor

International Nuclear Information System (INIS)

Liscum-Powell, J.

1997-06-01

The purpose of the work described in this report was to automate ORIGEN2 calculations for the Waste Isolation Pilot Plant (WIPP) Transuranic Waste Baseline Inventory Database (WTWBID); this was done by developing a pre-processor to generate ORIGEN2 input files from WWBID inventory files and a post-processor to remove excess information from the ORIGEN2 output files. The calculations performed with ORIGEN2 estimate the radioactive decay and buildup of various radionuclides in the waste streams identified in the WTWBID. The resulting radionuclide inventories are needed for performance assessment calculations for the WIPP site. The work resulted in the development of PreORG, which requires interaction with the user to generate ORIGEN2 input files on a site-by-site basis, and PostORG, which processes ORIGEN2 output into more manageable files. Both programs are written in the FORTRAN 77 computer language. After running PreORG, the user will run ORIGEN2 to generate the desired data; upon completion of ORIGEN2 calculations, the user can run PostORG to process the output to make it more manageable. All the programs run on a 386 PC or higher with a math co-processor or a computer platform running under VMS operating system. The pre- and post-processors for ORIGEN2 were generated for use with Rev. 1 data of the WTWBID and can also be used with Rev. 2 and 3 data of the TWBID (Transuranic Waste Baseline Inventory Database)
A design of a computer complex including vector processors

International Nuclear Information System (INIS)

Asai, Kiyoshi

1982-12-01

We, members of the Computing Center, Japan Atomic Energy Research Institute have been engaged for these six years in the research of adaptability of vector processing to large-scale nuclear codes. The research has been done in collaboration with researchers and engineers of JAERI and a computer manufacturer. In this research, forty large-scale nuclear codes were investigated from the viewpoint of vectorization. Among them, twenty-six codes were actually vectorized and executed. As the results of the investigation, it is now estimated that about seventy percents of nuclear codes and seventy percents of our total amount of CPU time of JAERI are highly vectorizable. Based on the data obtained by the investigation, (1)currently vectorizable CPU time, (2)necessary number of vector processors, (3)necessary manpower for vectorization of nuclear codes, (4)computing speed, memory size, number of parallel 1/0 paths, size and speed of 1/0 buffer of vector processor suitable for our applications, (5)necessary software and operational policy for use of vector processors are discussed, and finally (6)a computer complex including vector processors is presented in this report. (author)
Digital control card based on digital signal processor

International Nuclear Information System (INIS)

Hou Shigang; Yin Zhiguo; Xia Le

2008-01-01

A digital control card based on digital signal processor was developed. Two Freescale DSP-56303 processors were utilized to achieve 3 channels proportional- integral-differential regulations. The card offers high flexibility for 100 MeV cyclotron RF system development. It was used as feedback controller in low level radio frequency control prototype, with the feedback gain parameters continuously adjustable. By using high precision analog to digital converter with 500 kHz sampling rate, a regulation bandwidth of 20 kHz was achieved. (authors)
Tunable microwave signal generation based on an Opto-DMD processor and a photonic crystal fiber

International Nuclear Information System (INIS)

Wang Tao; Sang Xin-Zhu; Yan Bin-Bin; Li Yan; Song Fei-Jun; Zhang Xia; Wang Kui-Ru; Yuan Jin-Hui; Yu Chong-Xiu; Ai Qi; Chen Xiao; Zhang Ying; Chen Gen-Xiang; Xiao Feng; Kamal Alameh

2014-01-01

Frequency-tunable microwave signal generation is proposed and experimentally demonstrated with a dual-wavelength single-longitudinal-mode (SLM) erbium-doped fiber ring laser based on a digital Opto-DMD processor and four-wave mixing (FWM) in a high-nonlinear photonic crystal fiber (PCF). The high-nonlinear PCF is employed for the generation of the FWM to obtain stable and uniform dual-wavelength oscillation. Two different short passive sub-ring cavities in the main ring cavity serve as mode filters to make SLM lasing. The two lasing wavelengths are electronically selected by loading different gratings on the Opto-DMD processor controlled with a computer. The wavelength spacing can be smartly adjusted from 0.165 nm to 1.08 nm within a tuning accuracy of 0.055 nm. Two microwave signals at 17.23 GHz and 27.47 GHz are achieved. The stability of the microwave signal is discussed. The system has the ability to generate a 137.36-GHz photonic millimeter signal at room temperature
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System.

Science.gov (United States)

Zhang, Zhen; Ma, Cheng; Zhu, Rong

2017-08-23

Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.