WorldWideScience

Sample records for vlsi processor architectures

  1. A novel VLSI processor architecture for supercomputing arrays

    Science.gov (United States)

    Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.

    1993-01-01

    Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different class of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP,IP,OP,CM,R,SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process has led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask programmable PAcube arrays.

  2. A novel reconfigurable optical interconnect architecture using an Opto-VLSI processor and a 4-f imaging system.

    Science.gov (United States)

    Shen, Mingya; Xiao, Feng; Alameh, Kamal

    2009-12-07

    A novel reconfigurable optical interconnect architecture for on-board high-speed data transmission is proposed and experimentally demonstrated. The interconnect architecture is based on the use of an Opto-VLSI processor in conjunction with a 4-f imaging system to achieve reconfigurable chip-to-chip or board-to-board data communications. By reconfiguring the phase hologram of an Opto-VLSI processor, optical data generated by a vertical Cavity Surface Emitting Laser (VCSEL) associated to a chip (or a board) is arbitrarily steered to the photodetector associated to another chip (or another board). Experimental results show that the optical interconnect losses range from 5.8dB to 9.6dB, and that the maximum crosstalk level is below -36dB. The proposed architecture is tested for high-speed data transmission, and measured eye diagrams display good eye opening for data rate of up to 10Gb/s.

  3. Design and Realization of Array Signal Processor VLSI Architecture for Phased Array System

    Directory of Open Access Journals (Sweden)

    D. Govind Rao

    2016-08-01

    Full Text Available A method for implementing an array signal processor for phased array radars. The array signal processor can receive planar array antenna inputs and can process it. It is based on the application of Adaptive Digital beam formers using FPGAs. Adaptive filter algorithm used here is Inverse Q-R Decomposition based Recursive Least Squares (IQRD-RLS [1] algorithm. Array signal processor based on FPGAs is suitable in the areas of Phased Array Radar receiver, where speed, accuracy and numerical stability are of utmost important. Using IQRD-RLS algorithm, optimal weights are calculated in much less time compared to conventional QRD-RLS algorithm. A customized multiple FPGA board comprising three Kintex-7 FPGAs is employed to implement array signal processor. The proposed architecture can form multiple beams from planar array antenna elements

  4. VLSI Processor For Vector Quantization

    Science.gov (United States)

    Tawel, Raoul

    1995-01-01

    Pixel intensities in each kernel compared simultaneously with all code vectors. Prototype high-performance, low-power, very-large-scale integrated (VLSI) circuit designed to perform compression of image data by vector-quantization method. Contains relatively simple analog computational cells operating on direct or buffered outputs of photodetectors grouped into blocks in imaging array, yielding vector-quantization code word for each such block in sequence. Scheme exploits parallel-processing nature of vector-quantization architecture, with consequent increase in speed.

  5. Bilinear Interpolation Image Scaling Processor for VLSI

    Directory of Open Access Journals (Sweden)

    Ms. Pawar Ashwini Dilip

    2014-05-01

    Full Text Available We introduce image scaling processor using VLSI technique. It consist of Bilinear interpolation, clamp filter and a sharpening spatial filter. Bilinear interpolation algorithm is popular due to its computational efficiency and image quality. But resultant image consist of blurring edges and aliasing artifacts after scaling. To reduce the blurring and aliasing artifacts sharpening spatial filter and clamp filters are used as pre-filter. These filters are realized by using T-model and inversed T-model convolution kernels. To reduce the memory buffer and computing resources for proposed image processor design two T-model or inversed T-model filters are combined into combined filter which requires only one line buffer memory. Also, to reduce hardware cost Reconfigurable calculation unit (RCUis invented. The VLSI architecture in this work can achieve 280 MHz with 6.08-K gate counts, and its core area is 30 378 μm2 synthesized by a 0.13-μm CMOS process

  6. VLSI Architectures for Computing DFT's

    Science.gov (United States)

    Truong, T. K.; Chang, J. J.; Hsu, I. S.; Reed, I. S.; Pei, D. Y.

    1986-01-01

    Simplifications result from use of residue Fermat number systems. System of finite arithmetic over residue Fermat number systems enables calculation of discrete Fourier transform (DFT) of series of complex numbers with reduced number of multiplications. Computer architectures based on approach suitable for design of very-large-scale integrated (VLSI) circuits for computing DFT's. General approach not limited to DFT's; Applicable to decoding of error-correcting codes and other transform calculations. System readily implemented in VLSI.

  7. Wavelength-encoded OCDMA system using opto-VLSI processors.

    Science.gov (United States)

    Aljada, Muhsen; Alameh, Kamal

    2007-07-01

    We propose and experimentally demonstrate a 2.5 Gbits/sper user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.

  8. Geometric Design Rule Check of VLSI Layouts in Mesh Connected Processors

    Directory of Open Access Journals (Sweden)

    S. K. Nandy

    1994-01-01

    Full Text Available Design Rule Checking is a compute-intensive VLSI CAD tool. In this paper we propose a parallel algorithm to perform Design Rule Check (DRC of Layout geometries in a VLSI layout. The algorithm assumes the parallel architecture to be a two-dimensional mesh of processors. The algorithm is based on a linear quadtree representation of the layout. Through a complexity analysis it is shown that it is possible to achieve a linear speedup in DRC with respect to the number of processors.

  9. Opto-VLSI-based reconfigurable free-space optical interconnects architecture

    DEFF Research Database (Denmark)

    Aljada, Muhsen; Alameh, Kamal; Chung, Il-Sug;

    2007-01-01

    is the Opto-VLSI processor which can be driven by digital phase steering and multicasting holograms that reconfigure the optical interconnects between the input and output ports. The optical interconnects architecture is experimentally demonstrated at 2.5 Gbps using high-speed 1×3 VCSEL array and 1......This paper presents a short-distance reconfigurable high-speed optical interconnects architecture employing a Vertical Cavity Surface Emitting Laser (VCSEL) array, Opto-very-large-scale-integrated (Opto-VLSI) processors, and a photodetector (PD) array. The core component of the architecture......×3 photoreceiver array in conjunction with two 1×4096 pixel Opto-VLSI processors. The minimisation of the crosstalk between the output ports is achieved by appropriately aligning the VCSEL and PD elements with respect to the Opto-VLSI processors and driving the latter with optimal steering phase holograms....

  10. VLSI digital demodulator co-processor

    Science.gov (United States)

    Stephen, Karen J.; Buznitsky, Mitchell A.; Lindsey, Mark J.

    A demodulation coprocessor that incorporates into a single VLSI package a number of important arithmetic functions commonly encountered in demodulation processing is developed. The LD17 demodulator is designed for use in a digital modem as a companion to any of the commercially available digital signal processing (DSP) microprocessors. The LD17 includes an 8-b complex multiplier-accumulator (MAC), a programmable tone generator, a preintegrator, a dedicated noncoherent differential phase-shift keying (DPSK) calculator, and a program/data sequencer. By using a simple generic interface and small but powerful instruction set, the LD17 has the capability to operate in several architectural schemes with a minimum of glue logic. Speed, size, and power constraints will dictate which of these schemes is best for a particular application. The LD17 will be implemented in a 1.5-micron DLM CMOS gate array and packaged in an 84-pin JLCC. With the LD17 and its memory, the real-time processing compatibility of a typical DSP microprocessor can be extended to sampling rates from hundreds to thousands of kilosamples per second.

  11. Specification for a reconfigurable optoelectronic VLSI processor suitable for digital signal processing.

    Science.gov (United States)

    Fey, D; Kasche, B; Burkert, C; Tschäche, O

    1998-01-10

    A concept for a parallel digital signal processor based on opticalinterconnections and optoelectronic VLSI circuits is presented. Itis shown that the proper combination of optical communication, architecture, and algorithms allows a throughput that outperformspurely electronic solutions. The usefulness of low-level algorithmsfrom the add-and-shift class is emphasized. These algorithms leadto fine-grain, massively parallel on-chip processor architectures withhigh demands for optical off-chip interconnections. A comparativeperformance analysis shows the superiority of a bit-serialarchitecture. This architecture is mapped onto an optoelectronicthree-dimensional circuit, and the necessary optical interconnectionscheme is specified.

  12. Phase-Synchronization Early Epileptic Seizure Detector VLSI Architecture.

    Science.gov (United States)

    Abdelhalim, K; Smolyakov, V; Genov, R

    2011-10-01

    A low-power VLSI processor architecture that computes in real time the magnitude and phase-synchronization of two input neural signals is presented. The processor is a part of an envisioned closed-loop implantable microsystem for adaptive neural stimulation. The architecture uses three CORDIC processing cores that require shift-and-add operations but no multiplication. The 10-bit processor synthesized and prototyped in a standard 1.2 V 0.13 μm CMOS technology utilizes 41,000 logic gates. It dissipates 3.6 μW per input pair, and provides 1.7 kS/s per-channel throughput when clocked at 2.5 MHz. The power scales linearly with the number of input channels or the sampling rate. The efficacy of the processor in early epileptic seizure detection is validated on human intracranial EEG data.

  13. Embedded Processor Based Automatic Temperature Control of VLSI Chips

    Directory of Open Access Journals (Sweden)

    Narasimha Murthy Yayavaram

    2009-01-01

    Full Text Available This paper presents embedded processor based automatic temperature control of VLSI chips, using temperature sensor LM35 and ARM processor LPC2378. Due to the very high packing density, VLSI chips get heated very soon and if not cooled properly, the performance is very much affected. In the present work, the sensor which is kept very near proximity to the IC will sense the temperature and the speed of the fan arranged near to the IC is controlled based on the PWM signal generated by the ARM processor. A buzzer is also provided with the hardware, to indicate either the failure of the fan or overheating of the IC. The entire process is achieved by developing a suitable embedded C program.

  14. Hybrid VLSI/QCA Architecture for Computing FFTs

    Science.gov (United States)

    Fijany, Amir; Toomarian, Nikzad; Modarres, Katayoon; Spotnitz, Matthew

    2003-01-01

    A data-processor architecture that would incorporate elements of both conventional very-large-scale integrated (VLSI) circuitry and quantum-dot cellular automata (QCA) has been proposed to enable the highly parallel and systolic computation of fast Fourier transforms (FFTs). The proposed circuit would complement the QCA-based circuits described in several prior NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855) Vol. 27, No. 1 (January 2003), page 32; and Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35. The cited prior articles described the limitations of very-large-scale integrated (VLSI) circuitry and the major potential advantage afforded by QCA. To recapitulate: In a VLSI circuit, signal paths that are required not to interact with each other must not cross in the same plane. In contrast, for reasons too complex to describe in the limited space available for this article, suitably designed and operated QCAbased signal paths that are required not to interact with each other can nevertheless be allowed to cross each other in the same plane without adverse effect. In principle, this characteristic could be exploited to design compact, coplanar, simple (relative to VLSI) QCA-based networks to implement complex, advanced interconnection schemes.

  15. New Generation Processor Architecture Research

    Institute of Scientific and Technical Information of China (English)

    Chen Hongsong(陈红松); Hu Mingzeng; Ji Zhenzhou

    2003-01-01

    With the rapid development of microelectronics and hardware,the use of ever faster micro-processors and new architecture must be continued to meet tomorrow′s computing needs. New processor microarchitectures are needed to push performance further and to use higher transistor counts effectively.At the same time,aiming at different usages,the processor has been optimized in different aspects,such as high performace,low power consumption,small chip area and high security. SOC (System on chip)and SCMP (Single Chip Multi Processor) constitute the main processor system architecture.

  16. High-speed (2.5 Gbps) reconfigurable inter-chip optical interconnects using opto-VLSI processors.

    Science.gov (United States)

    Aljada, Muhsen; Alameh, Kamal E; Lee, Yong-Tak; Chung, Il-Sug

    2006-07-24

    Reconfigurablele optical interconnects enable flexible and high-performance communication in multi-chip architectures to be arbitrarily adapted, leading to efficient parallel signal processing. The use of Opto-VLSI processors as beam steerers and multicasters for reconfigurable inter-chip optical interconnection is discussed. We demonstrate, as proof-of-concept, 2.5 Gbps reconfigurable optical interconnects between an 850nm vertical cavity surface emitting lasers (VCSEL) array and a photodiode (PD) array integrated onto a PCB by driving two Opto-VLSI processors with steering and multicasting digital phase holograms. The architecture is experimentally demonstrated through three scenarios showing its flexibility to perform single, multicasting, and parallel reconfigurable optical interconnects. To our knowledge, this is the first reported high-speed reconfigurable N-to-N optical interconnects architecture, which will have a significant impact on the flexibility and efficiency of large shared-memory multiprocessor machines.

  17. Implementing neural architectures using analog VLSI circuits

    Science.gov (United States)

    Maher, Mary Ann C.; Deweerth, Stephen P.; Mahowald, Misha A.; Mead, Carver A.

    1989-05-01

    Analog very large-scale integrated (VLSI) technology can be used not only to study and simulate biological systems, but also to emulate them in designing artificial sensory systems. A methodology for building these systems in CMOS VLSI technology has been developed using analog micropower circuit elements that can be hierarchically combined. Using this methodology, experimental VLSI chips of visual and motor subsystems have been designed and fabricated. These chips exhibit behavior similar to that of biological systems, and perform computations useful for artificial sensory systems.

  18. Experimental demonstration of a tunable laser using an SOA and an Opto-VLSI Processor.

    Science.gov (United States)

    Aljada, Muhsen; Zheng, Rong; Alameh, Kamal; Lee, Yong-Tak

    2007-07-23

    In this paper we propose and experimentally demonstrate a tunable laser structure cascading a semiconductor optical amplifier (SOA) that generates broadband amplified spontaneous emission and a reflective Opto-VLSI processor that dynamically reflects arbitrarily wavelengths and injects them back into the SOA, thus synthesizing an output signal of variable wavelength. The wavelength tunablility is performed using digital phase holograms uploaded on the Opto-VLSI processor. Experimental results demonstrate a tuning range from 1524nm to 1534nm, and show that the proposed tunable laser structure has a stable performance.

  19. A VLSI Processor Design of Real-Time Data Compression for High-Resolution Imaging Radar

    Science.gov (United States)

    Fang, W.

    1994-01-01

    For the high-resolution imaging radar systems, real-time data compression of raw imaging data is required to accomplish the science requirements and satisfy the given communication and storage constraints. The Block Adaptive Quantizer (BAQ) algorithm and its associated VLSI processor design have been developed to provide a real-time data compressor for high-resolution imaging radar systems.

  20. A Novel Efficient VLSI Architecture for IEEE 754 Floating point multiplier using Modified CSA

    Directory of Open Access Journals (Sweden)

    Nishi Pandey

    2015-10-01

    Full Text Available Due to advancement of new technology in the field of VLSI and Embedded system, there is an increasing demand of high speed and low power consumption processor. Speed of processor greatly depends on its multiplier as well as adder performance. In spite of complexity involved in floating point arithmetic, its implementation is increasing day by day. Due to which high speed adder architecture become important. Several adder architecture designs have been developed to increase the efficiency of the adder. In this paper, we introduce an architecture that performs high speed IEEE 754 floating point multiplier using modified carry select adder (CSA. Modified CSA depend on booth encoder (BEC Technique. Booth encoder, Mathematics is an ancient Indian system of Mathematics. Here we are introduced two carry select based design. These designs are implementation Xilinx Vertex device family

  1. An efficient interpolation filter VLSI architecture for HEVC standard

    Science.gov (United States)

    Zhou, Wei; Zhou, Xin; Lian, Xiaocong; Liu, Zhenyu; Liu, Xiaoxiang

    2015-12-01

    The next-generation video coding standard of High-Efficiency Video Coding (HEVC) is especially efficient for coding high-resolution video such as 8K-ultra-high-definition (UHD) video. Fractional motion estimation in HEVC presents a significant challenge in clock latency and area cost as it consumes more than 40 % of the total encoding time and thus results in high computational complexity. With aims at supporting 8K-UHD video applications, an efficient interpolation filter VLSI architecture for HEVC is proposed in this paper. Firstly, a new interpolation filter algorithm based on the 8-pixel interpolation unit is proposed in this paper. It can save 19.7 % processing time on average with acceptable coding quality degradation. Based on the proposed algorithm, an efficient interpolation filter VLSI architecture, composed of a reused data path of interpolation, an efficient memory organization, and a reconfigurable pipeline interpolation filter engine, is presented to reduce the implement hardware area and achieve high throughput. The final VLSI implementation only requires 37.2k gates in a standard 90-nm CMOS technology at an operating frequency of 240 MHz. The proposed architecture can be reused for either half-pixel interpolation or quarter-pixel interpolation, which can reduce the area cost for about 131,040 bits RAM. The processing latency of our proposed VLSI architecture can support the real-time processing of 4:2:0 format 7680 × 4320@78fps video sequences.

  2. VLSI architecture of NEO spike detection with noise shaping filter and feature extraction using informative samples.

    Science.gov (United States)

    Hoang, Linh; Yang, Zhi; Liu, Wentai

    2009-01-01

    An emerging class of multi-channel neural recording systems aims to simultaneously monitor the activity of many neurons by miniaturizing and increasing the number of recording channels. Vast volume of data from the recording systems, however, presents a challenge for processing and transmitting wirelessly. An on-chip neural signal processor is needed for filtering uninterested recording samples and performing spike sorting. This paper presents a VLSI architecture of a neural signal processor that can reliably detect spike via a nonlinear energy operator, enhance spike signal over noise ratio by a noise shaping filter, and select meaningful recording samples for clustering by using informative samples. The architecture is implemented in 90-nm CMOS process, occupies 0.2 mm(2), and consumes 0.5 mW of power.

  3. Tunable multi-wavelength fiber lasers based on an Opto-VLSI processor and optical amplifiers.

    Science.gov (United States)

    Xiao, Feng; Alameh, Kamal; Lee, Yong Tak

    2009-12-07

    A multi-wavelength tunable fiber laser based on the use of an Opto-VLSI processor in conjunction with different optical amplifiers is proposed and experimentally demonstrated. The Opto-VLSI processor can simultaneously select any part of the gain spectrum from each optical amplifier into its associated fiber ring, leading to a multiport tunable fiber laser source. We experimentally demonstrate a 3-port tunable fiber laser source, where each output wavelength of each port can independently be tuned within the C-band with a wavelength step of about 0.05 nm. Experimental results demonstrate a laser linewidth as narrow as 0.05 nm and an optical side-mode-suppression-ratio (SMSR) of about 35 dB. The demonstrated three fiber lasers have excellent stability at room temperature and output power uniformity less than 0.5 dB over the whole C-band.

  4. VLSI ARCHITECTURE OF AN AREA EFFICIENT IMAGE INTERPOLATION

    Directory of Open Access Journals (Sweden)

    John Moses C

    2014-05-01

    Full Text Available Image interpolation is widely used in many image processing applications, such as digital camera, mobile phone, tablet and display devices. Image interpolation is a method of estimating the new data points within the range of discrete set of known data points. Image interpolation can also be referred as image scaling, image resizing, image re-sampling and image zooming. This paper presents VLSI (Very Large Scale Integration architecture of an area efficient image interpolation algorithm for any two dimensional (2-D image scalar. This architecture is implemented in FPGA (Field Programmable Gate Array and the performance of this system is simulated using Xilinx system generator and synthesized using Xilinx ISE smulation tool. Various VLSI parameters such as combinational path delay, CPU time, memory usage, number of LUTs (Look Up Tables are measured from the synthesis report.

  5. VLSI neural system architecture for finite ring recursive reduction.

    Science.gov (United States)

    Zhang, D; Jullien, G A

    1996-12-01

    The use of neural-like networks to implement finite ring computations has been presented in a previous paper. This paper develops efficient VLSI neural system architecture for the finite ring recursive reduction (FRRR), including module reduction, MSB carry iteration and feedforward processing. These techniques deal with the basic principles involved in constructing a FRRR, and their implementations are efficiently matched to the VLSI medium. Compared with the other structure models for finite ring computation (e.g. modification of binary arithmetic logic and bit-steered ROM's), the FRRR structure has the lowest area complexity in silicon while maintaining a high throughput rate. Examples of several implementations are used to illustrate the effectiveness of the FRRR architecture.

  6. Digital VLSI algorithms and architectures for support vector machines.

    Science.gov (United States)

    Anguita, D; Boni, A; Ridella, S

    2000-06-01

    In this paper, we propose some very simple algorithms and architectures for a digital VLSI implementation of Support Vector Machines. We discuss the main aspects concerning the realization of the learning phase of SVMs, with special attention on the effects of fixed-point math for computing and storing the parameters of the network. Some experiments on two classification problems are described that show the efficiency of the proposed methods in reaching optimal solutions with reasonable hardware requirements.

  7. A VLSI architecture for simplified arithmetic Fourier transform algorithm

    Science.gov (United States)

    Reed, Irving S.; Shih, Ming-Tang; Truong, T. K.; Hendon, E.; Tufts, D. W.

    1992-01-01

    The arithmetic Fourier transform (AFT) is a number-theoretic approach to Fourier analysis which has been shown to perform competitively with the classical FFT in terms of accuracy, complexity, and speed. Theorems developed in a previous paper for the AFT algorithm are used here to derive the original AFT algorithm which Bruns found in 1903. This is shown to yield an algorithm of less complexity and of improved performance over certain recent AFT algorithms. A VLSI architecture is suggested for this simplified AFT algorithm. This architecture uses a butterfly structure which reduces the number of additions by 25 percent of that used in the direct method.

  8. A pipeline VLSI design of fast singular value decomposition processor for real-time EEG system based on on-line recursive independent component analysis.

    Science.gov (United States)

    Huang, Kuan-Ju; Shih, Wei-Yeh; Chang, Jui Chung; Feng, Chih Wei; Fang, Wai-Chi

    2013-01-01

    This paper presents a pipeline VLSI design of fast singular value decomposition (SVD) processor for real-time electroencephalography (EEG) system based on on-line recursive independent component analysis (ORICA). Since SVD is used frequently in computations of the real-time EEG system, a low-latency and high-accuracy SVD processor is essential. During the EEG system process, the proposed SVD processor aims to solve the diagonal, inverse and inverse square root matrices of the target matrices in real time. Generally, SVD requires a huge amount of computation in hardware implementation. Therefore, this work proposes a novel design concept for data flow updating to assist the pipeline VLSI implementation. The SVD processor can greatly improve the feasibility of real-time EEG system applications such as brain computer interfaces (BCIs). The proposed architecture is implemented using TSMC 90 nm CMOS technology. The sample rate of EEG raw data adopts 128 Hz. The core size of the SVD processor is 580×580 um(2), and the speed of operation frequency is 20MHz. It consumes 0.774mW of power during the 8-channel EEG system per execution time.

  9. Efficient VLSI architecture for training radial basis function networks.

    Science.gov (United States)

    Fan, Zhe-Cheng; Hwang, Wen-Jyi

    2013-03-19

    This paper presents a novel VLSI architecture for the training of radial basis function (RBF) networks. The architecture contains the circuits for fuzzy C-means (FCM) and the recursive Least Mean Square (LMS) operations. The FCM circuit is designed for the training of centers in the hidden layer of the RBF network. The recursive LMS circuit is adopted for the training of connecting weights in the output layer. The architecture is implemented by the field programmable gate array (FPGA). It is used as a hardware accelerator in a system on programmable chip (SOPC) for real-time training and classification. Experimental results reveal that the proposed RBF architecture is an effective alternative for applications where fast and efficient RBF training is desired.

  10. Efficient VLSI Architecture for Training Radial Basis Function Networks

    Directory of Open Access Journals (Sweden)

    Wen-Jyi Hwang

    2013-03-01

    Full Text Available This paper presents a novel VLSI architecture for the training of radial basis function (RBF networks. The architecture contains the circuits for fuzzy C-means (FCM and the recursive Least Mean Square (LMS operations. The FCM circuit is designed for the training of centers in the hidden layer of the RBF network. The recursive LMS circuit is adopted for the training of connecting weights in the output layer. The architecture is implemented by the field programmable gate array (FPGA. It is used as a hardware accelerator in a system on programmable chip (SOPC for real-time training and classification. Experimental results reveal that the proposed RBF architecture is an effective alternative for applications where fast and efficient RBF training is desired.

  11. VLSI design

    CERN Document Server

    Einspruch, Norman G

    1986-01-01

    VLSI Electronics Microstructure Science, Volume 14: VLSI Design presents a comprehensive exposition and assessment of the developments and trends in VLSI (Very Large Scale Integration) electronics. This volume covers topics that range from microscopic aspects of materials behavior and device performance to the comprehension of VLSI in systems applications. Each article is prepared by a recognized authority. The subjects discussed in this book include VLSI processor design methodology; the RISC (Reduced Instruction Set Computer); the VLSI testing program; silicon compilers for VLSI; and special

  12. Critical review of programmable media processor architectures

    Science.gov (United States)

    Berg, Stefan G.; Sun, Weiyun; Kim, Donglok; Kim, Yongmin

    1998-12-01

    In the past several years, there has been a surge of new programmable mediaprocessors introduced to provide an alternative solution to ASICs and dedicated hardware circuitries in the multimedia PC and embedded consumer electronics markets. These processors attempt to combine the programmability of multimedia-enhanced general purpose processors with the performance and low cost of dedicated hardware. We have reviewed five current multimedia architectures and evaluated their strengths and weaknesses.

  13. Implementation Issues for Algorithmic VLSI (Very Large Scale Integration) Processor Arrays.

    Science.gov (United States)

    1984-10-01

    analysis of the various algorithms are described in Appendiccs 5.A, 5.B and 5.C. A note on notation: Following Ottmann ei aL [40], the variable n is used...redundant operations OK. Ottmann log i I log 1 up to n wasted processors. X-tree topology. Atallah log n I 1 redundant operations OK. up to n wasted...for Computing Machinery 14(2):203-241, April, 1967. 40] Thomas A. Ottmann , Arnold L. Rosenberg and Larry J. Stockmeyer. A dictionary machine (for VLSI

  14. Design of Analog VLSI Architecture for DCT

    Directory of Open Access Journals (Sweden)

    M.Thiruveni

    2012-08-01

    Full Text Available When implementing real-time DSP algorithms on digital circuits, the system is always constrained by limited speed, accuracy and roundoff noise. These limitations must be taken into account for the design and implementation stages. Doubling the dynamic rate of theanalog DCT is expensive, whereas in digital DCT an addition of 1 bit in data path is adequate. This paper proposes a novel approach ofanalog CMOS implementation technique for Digital Signal Processing (DSP algorithms to reduce the area and power requirement in theexisting Digital CMOS implementations. Discrete Cosine Transform (DCT with signed coefficients have been designed andimplemented in this paper. The problems of digital DCTs viz., quantization error, round-off noise, high power consumption and largearea are overcome by the proposed implementation. It can be used to develop the architecture design of DFT, DST and DHT.

  15. Optical linear algebra processors - Architectures and algorithms

    Science.gov (United States)

    Casasent, David

    1986-01-01

    Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.

  16. Optical linear algebra processors - Architectures and algorithms

    Science.gov (United States)

    Casasent, David

    1986-01-01

    Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.

  17. Intrusion Detection Architecture Utilizing Graphics Processors

    Directory of Open Access Journals (Sweden)

    Branislav Madoš

    2012-12-01

    Full Text Available With the thriving technology and the great increase in the usage of computer networks, the risk of having these network to be under attacks have been increased. Number of techniques have been created and designed to help in detecting and/or preventing such attacks. One common technique is the use of Intrusion Detection Systems (IDS. Today, number of open sources and commercial IDS are available to match enterprises requirements. However, the performance of these systems is still the main concern. This paper examines perceptions of intrusion detection architecture implementation, resulting from the use of graphics processor. It discusses recent research activities, developments and problems of operating systems security. Some exploratory evidence is presented that shows capabilities of using graphical processors and intrusion detection systems. The focus is on how knowledge experienced throughout the graphics processor inclusion has played out in the design of intrusion detection architecture that is seen as an opportunity to strengthen research expertise.

  18. VLSI Design of a Variable-Length FFT/IFFT Processor for OFDM-Based Communication Systems

    Directory of Open Access Journals (Sweden)

    Jen-Chih Kuo

    2003-12-01

    Full Text Available The technique of {orthogonal frequency division multiplexing (OFDM} is famous for its robustness against frequency-selective fading channel. This technique has been widely used in many wired and wireless communication systems. In general, the {fast Fourier transform (FFT} and {inverse FFT (IFFT} operations are used as the modulation/demodulation kernel in the OFDM systems, and the sizes of FFT/IFFT operations are varied in different applications of OFDM systems. In this paper, we design and implement a variable-length prototype FFT/IFFT processor to cover different specifications of OFDM applications. The cached-memory FFT architecture is our suggested VLSI system architecture to design the prototype FFT/IFFT processor for the consideration of low-power consumption. We also implement the twiddle factor butterfly {processing element (PE} based on the {{coordinate} rotation digital computer (CORDIC} algorithm, which avoids the use of conventional multiplication-and-accumulation unit, but evaluates the trigonometric functions using only add-and-shift operations. Finally, we implement a variable-length prototype FFT/IFFT processor with TSMC 0.35 μm 1P4M CMOS technology. The simulations results show that the chip can perform (64-2048-point FFT/IFFT operations up to 80 MHz operating frequency which can meet the speed requirement of most OFDM standards such as WLAN, ADSL, VDSL (256∼2K, DAB, and 2K-mode DVB.

  19. On VLSI Design of Rank-Order Filtering using DCRAM Architecture.

    Science.gov (United States)

    Lin, Meng-Chun; Dung, Lan-Rong

    2008-02-01

    This paper addresses on VLSI design of rank-order filtering (ROF) with a maskable memory for real-time speech and image processing applications. Based on a generic bit-sliced ROF algorithm, the proposed design uses a special-defined memory, called the dual-cell random-access memory (DCRAM), to realize major operations of ROF: threshold decomposition and polarization. Using the memory-oriented architecture, the proposed ROF processor can benefit from high flexibility, low cost and high speed. The DCRAM can perform the bit-sliced read, partial write, and pipelined processing. The bit-sliced read and partial write are driven by maskable registers. With recursive execution of the bit-slicing read and partial write, the DCRAM can effectively realize ROF in terms of cost and speed. The proposed design has been implemented using TSMC 0.18 μm 1P6M technology. As shown in the result of physical implementation, the core size is 356.1 × 427.7μm(2) and the VLSI implementation of ROF can operate at 256 MHz for 1.8V supply.

  20. An Efficient Circulant MIMO Equalizer for CDMA Downlink: Algorithm and VLSI Architecture

    Directory of Open Access Journals (Sweden)

    Cavallaro Joseph R

    2006-01-01

    Full Text Available We present an efficient circulant approximation-based MIMO equalizer architecture for the CDMA downlink. This reduces the direct matrix inverse (DMI of size with complexity to some FFT operations with complexity and the inverse of some submatrices. We then propose parallel and pipelined VLSI architectures with Hermitian optimization and reduced-state FFT for further complexity optimization. Generic VLSI architectures are derived for the high-order receiver from partitioned submatrices. This leads to more parallel VLSI design with further complexity reduction. Comparative study with both the conjugate-gradient and DMI algorithms shows very promising performance/complexity tradeoff. VLSI design space in terms of area/time efficiency is explored extensively for layered parallelism and pipelining with a Catapult C high-level-synthesis methodology.

  1. VLSI based FFT Processor with Improvement in Computation Speed and Area Reduction

    Directory of Open Access Journals (Sweden)

    M.Sheik Mohamed

    2013-06-01

    Full Text Available In this paper, a modular approach is presented to develop parallel pipelined architectures for the fast Fourier transform (FFT processor. The new pipelined FFT architecture has the advantage of underutilized hardware based on the complex conjugate of final stage results without increasing the hardware complexity. The operating frequency of the new architecture can be decreased that in turn reduces the power consumption. A comparison of area and computing time are drawn between the new design and the previous architectures. The new structure is synthesized using Xilinx ISE and simulated using ModelSim Starter Edition. The designed FFT algorithm is realized in our processor to reduce the number of complex computations.

  2. VLSI Implementation of Hybrid Algorithm Architecture for Speech Enhancement

    Directory of Open Access Journals (Sweden)

    Jigar Shah

    2012-07-01

    Full Text Available The speech enhancement techniques are required to improve the speech signal quality without causing any offshoot in many applications. Recently the growing use of cellular and mobile phones, hands free systems, VoIP phones, voice messaging service, call service centers etc. require efficient real time speech enhancement and detection strategies to make them superior over conventional speech communication systems. The speech enhancement algorithms are required to deal with additive noise and convolutive distortion that occur in any wireless communication system. Also the single channel (one microphone signal is available in real environments. Hence a single channel hybrid algorithm is used which combines minimum mean square error-log spectral amplitude (MMSE-LSA algorithm for additive noise removal and the relative spectral amplitude (RASTA algorithm for reverberation cancellation. The real time and embedded implementation on directly available DSP platforms like TMS320C6713 shows some defects. Hence the VLSI implementation using semi-custom (e.g. FPGA or full-custom approach is required. One such architecture is proposed in this paper.

  3. Efficient VLSI architecture of CAVLC decoder with power optimized

    Institute of Scientific and Technical Information of China (English)

    CHEN Guang-hua; HU Deng-ji; ZHANG Jin-yi; ZHENG Wei-feng; ZENG Wei-min

    2009-01-01

    This paper presents an efficient VLSI architecture of the contest-based adaptive variable length code (CAVLC) decoder with power optimized for the H.264/advanced video coding (AVC) standard. In the proposed design, according to the regularity of the codewords, the first one detector is used to solve the low efficiency and high power dissipation problem within the traditional method of table-searching. Considering the relevance of the data used in the process of runbefore's decoding,arithmetic operation is combined with finite state machine (FSM), which achieves higher decoding efficiency. According to the CAVLC decoding flow, clock gating is employed in the module level and the register level respectively, which reduces 43% of the overall dynamic power dissipation. The proposed design can decode every syntax element in one clock cycle. When the proposed design is synthesized at the clock constraint of 100 MHz, the synthesis result shows that the design costs 11 300gates under a 0.25 μm CMOS technology, which meets the demand of real time decoding in the H.264/AVC standard.

  4. Parallel optimization algorithms and their implementation in VLSI design

    Science.gov (United States)

    Lee, G.; Feeley, J. J.

    1991-01-01

    Two new parallel optimization algorithms based on the simplex method are described. They may be executed by a SIMD parallel processor architecture and be implemented in VLSI design. Several VLSI design implementations are introduced. An application example is reported to demonstrate that the algorithms are effective.

  5. An efficient VLSI implementation of on-line recursive ICA processor for real-time multi-channel EEG signal separation.

    Science.gov (United States)

    Shih, Wei-Yeh; Liao, Jui-Chieh; Huang, Kuan-Ju; Fang, Wai-Chi; Cauwenberghs, Gert; Jung, Tzyy-Ping

    2013-01-01

    This paper presents an efficient VLSI implementation of on-line recursive ICA (ORICA) processor for real-time multi-channel EEG signal separation. The proposed design contains a system control unit, a whitening unit, a singular value decomposition unit, a floating matrix multiply unit and, and an ORICA weight training unit. Because the input sample rate of the ORICA processor is 128 Hz, the ORICA processor should produce independent components before the next sample is input in 1/128 s. Under the timing constraints of commutating multi-channel ORICA in real time, the design of the ORICA processor is a mixed architecture, which is designed as different hardware parallelism according to the complexity of processing units. The shared arithmetic processing unit and shared register can reduce hardware complexity and power consumption. The proposed design is implemented used TSMC 90 nm CMOS technology with 8-channel EEG processing in 128 Hz sample rate of raw data and consumes 2.827 mW at 50 MHz clock rate. The performance of the proposed design is also shown to reach 0.0078125 s latency after each EEG sample time, and the average correlation coefficient between the original source signals and extracted ORICA signals for each 1 s frame is 0.9763.

  6. High-Level Synthesis of VLSI Processors for Intelligent Integrated SystemsBased on Logic-in-Memory Structure

    Science.gov (United States)

    Kudoh, Takao; Kameyama, Michitaka

    One of the most serious problems in recent VLSI systems is data transfer bottleneck between memories and processing elements. To solve the problem, a model of highly parallel VLSI processors for intelligent integrated systems is presented. A logic-in-memory module composed of a processing element, a register and a local memory is defined as a basic building block to form a regular parallel structure. The data transfer between adjacent modules are done simply in a single clock period by a shift-register chain. A high-level synthesis method is discussed on the hardware model, when a data-dependency graph corresponding to a processing algorithm is given. We must simultaneously consider both scheduling and allocation for the time optimization problem under a constraint of an chip area. That is, we consider the best scheduling together with allocation such that the processing time becomes minimum under a constraint of a fixed number of modules. Not only an exhaustive enumeration method but also a branch-and-bound method is proposed for the problem. As a result, it is made clear that the proposed high-level synthesis method is very effective to design special-purpose VLSI processors free from data transfer bottleneck.

  7. Novel broadband reconfigurable optical add-drop multiplexer employing custom fiber arrays and Opto-VLSI processors.

    Science.gov (United States)

    Xiao, Feng; Juswardy, Budi; Alameh, Kamal; Lee, Yong Tak

    2008-08-04

    A reconfigurable optical add/drop multiplexer (ROADM) structure based on using a custom-made fiber array and an Opto-VLSI processor is proposed and demonstrated. The fiber array consists of N pairs of angled fibers corresponding to N channels, each of which can independently perform add, drop, and thru functions through a reconfigurable Opto-VLSI beam steerer. Experimental results show that the ROADM structure can attain an average add, drop/thru insertion loss of 5.5 dB and a uniformity of 0.3 dB over a wide bandwidth from 1524 nm to 1576 nm, and a drop/thru crosstalk level as small as -40 dB.

  8. Design of 10Gbps optical encoder/decoder structure for FE-OCDMA system using SOA and opto-VLSI processors.

    Science.gov (United States)

    Aljada, Muhsen; Hwang, Seow; Alameh, Kamal

    2008-01-21

    In this paper we propose and experimentally demonstrate a reconfigurable 10Gbps frequency-encoded (1D) encoder/decoder structure for optical code division multiple access (OCDMA). The encoder is constructed using a single semiconductor optical amplifier (SOA) and 1D reflective Opto-VLSI processor. The SOA generates broadband amplified spontaneous emission that is dynamically sliced using digital phase holograms loaded onto the Opto-VLSI processor to generate 1D codewords. The selected wavelengths are injected back into the same SOA for amplifications. The decoder is constructed using single Opto-VLSI processor only. The encoded signal can successfully be retrieved at the decoder side only when the digital phase holograms of the encoder and the decoder are matched. The system performance is measured in terms of the auto-correlation and cross-correlation functions as well as the eye diagram.

  9. VLSI neuroprocessors

    Science.gov (United States)

    Kemeny, Sabrina E.

    1994-01-01

    Electronic and optoelectronic hardware implementations of highly parallel computing architectures address several ill-defined and/or computation-intensive problems not easily solved by conventional computing techniques. The concurrent processing architectures developed are derived from a variety of advanced computing paradigms including neural network models, fuzzy logic, and cellular automata. Hardware implementation technologies range from state-of-the-art digital/analog custom-VLSI to advanced optoelectronic devices such as computer-generated holograms and e-beam fabricated Dammann gratings. JPL's concurrent processing devices group has developed a broad technology base in hardware implementable parallel algorithms, low-power and high-speed VLSI designs and building block VLSI chips, leading to application-specific high-performance embeddable processors. Application areas include high throughput map-data classification using feedforward neural networks, terrain based tactical movement planner using cellular automata, resource optimization (weapon-target assignment) using a multidimensional feedback network with lateral inhibition, and classification of rocks using an inner-product scheme on thematic mapper data. In addition to addressing specific functional needs of DOD and NASA, the JPL-developed concurrent processing device technology is also being customized for a variety of commercial applications (in collaboration with industrial partners), and is being transferred to U.S. industries. This viewgraph p resentation focuses on two application-specific processors which solve the computation intensive tasks of resource allocation (weapon-target assignment) and terrain based tactical movement planning using two extremely different topologies. Resource allocation is implemented as an asynchronous analog competitive assignment architecture inspired by the Hopfield network. Hardware realization leads to a two to four order of magnitude speed-up over conventional

  10. VLSI System Implementation of 200 MHz, 8-bit, 90nm CMOS Arithmetic and Logic Unit (ALU) Processor Controller

    OpenAIRE

    2012-01-01

    In this present study includes the Very Large Scale Integration (VLSI) system implementation of 200MHz, 8-bit, 90nm Complementary Metal Oxide Semiconductor (CMOS) Arithmetic and Logic Unit (ALU) processor control with logic gate design style and 0.12µm six metal 90nm CMOS fabrication technology. The system blocks and the behaviour are defined and the logical design is implemented in gate level in the design phase. Then, the logic circuits are simulated and the subunits are converted in to 90n...

  11. VLSI System Implementation of 200 MHz, 8-bit, 90nm CMOS Arithmetic and Logic Unit (ALU Processor Controller

    Directory of Open Access Journals (Sweden)

    Fazal NOORBASHA

    2012-08-01

    Full Text Available In this present study includes the Very Large Scale Integration (VLSI system implementation of 200MHz, 8-bit, 90nm Complementary Metal Oxide Semiconductor (CMOS Arithmetic and Logic Unit (ALU processor control with logic gate design style and 0.12µm six metal 90nm CMOS fabrication technology. The system blocks and the behaviour are defined and the logical design is implemented in gate level in the design phase. Then, the logic circuits are simulated and the subunits are converted in to 90nm CMOS layout. Finally, in order to construct the VLSI system these units are placed in the floor plan and simulated with analog and digital, logic and switch level simulators. The results of the simulations indicates that the VLSI system can control different instructions which can divided into sub groups: transfer instructions, arithmetic and logic instructions, rotate and shift instructions, branch instructions, input/output instructions, control instructions. The data bus of the system is 16-bit. It runs at 200MHz, and operating power is 1.2V. In this paper, the parametric analysis of the system, the design steps and obtained results are explained.

  12. Temporal Partitioning and Multi-Processor Scheduling for Reconfigurable Architectures

    DEFF Research Database (Denmark)

    Popp, Andreas; Le Moullec, Yannick; Koch, Peter

    This poster presentation outlines a proposed framework for handling mapping of signal processing applications to heterogeneous reconfigurable architectures. The methodology consists of an extension to traditional multi-processor scheduling by creating a separate HW track for generation of groups...... of tasks that are handled similarly to SW processes in a traditional multi-processor scheduling context....

  13. MICROTHREAD BASED (MTB) COARSE GRAINED FAULT TOLERANCE SUPERSCALAR PROCESSOR ARCHITECTURE

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Fault tolerance in microprocessor systems has become a popular topic of architecture research.Much work has been done at different levels to accomplish reliability against soft errors, and some fault tolerance architectures have been proposed. But little attention is paid to the thread level superscalar fault tolerance.This letter introduces microthread concept into superscalar processor fault tolerance domain, and puts forward a novel fault tolerance architecture, namely, MicroThread Based (MTB) coarse grained transient fault tolerance superscalar processor architecture, then discusses some detailed implementations.

  14. VLSI Architectures for Sliding-Window-Based Space-Time Turbo Trellis Code Decoders

    Directory of Open Access Journals (Sweden)

    Georgios Passas

    2012-01-01

    Full Text Available The VLSI implementation of SISO-MAP decoders used for traditional iterative turbo coding has been investigated in the literature. In this paper, a complete architectural model of a space-time turbo code receiver that includes elementary decoders is presented. These architectures are based on newly proposed building blocks such as a recursive add-compare-select-offset (ACSO unit, A-, B-, Γ-, and LLR output calculation modules. Measurements of complexity and decoding delay of several sliding-window-technique-based MAP decoder architectures and a proposed parameter set lead to defining equations and comparison between those architectures.

  15. Acoustooptic linear algebra processors - Architectures, algorithms, and applications

    Science.gov (United States)

    Casasent, D.

    1984-01-01

    Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

  16. Acoustooptic linear algebra processors - Architectures, algorithms, and applications

    Science.gov (United States)

    Casasent, D.

    1984-01-01

    Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

  17. VLSI architectures for modern error-correcting codes

    CERN Document Server

    Zhang, Xinmiao

    2015-01-01

    Error-correcting codes are ubiquitous. They are adopted in almost every modern digital communication and storage system, such as wireless communications, optical communications, Flash memories, computer hard drives, sensor networks, and deep-space probing. New-generation and emerging applications demand codes with better error-correcting capability. On the other hand, the design and implementation of those high-gain error-correcting codes pose many challenges. They usually involve complex mathematical computations, and mapping them directly to hardware often leads to very high complexity. VLSI

  18. FPGA Based Intelligent Co-operative Processor in Memory Architecture

    DEFF Research Database (Denmark)

    Ahmed, Zaki; Sotudeh, Reza; Hussain, Dil Muhammad Akbar

    2011-01-01

    In a continuing effort to improve computer system performance, Processor-In-Memory (PIM) architecture has emerged as an alternative solution. PIM architecture incorporates computational units and control logic directly on the memory to provide immediate access to the data. To exploit the potentia...

  19. FPGA Based Intelligent Co-operative Processor in Memory Architecture

    DEFF Research Database (Denmark)

    Ahmed, Zaki; Sotudeh, Reza; Hussain, Dil Muhammad Akbar

    2011-01-01

    In a continuing effort to improve computer system performance, Processor-In-Memory (PIM) architecture has emerged as an alternative solution. PIM architecture incorporates computational units and control logic directly on the memory to provide immediate access to the data. To exploit the potential...

  20. Design and Implementation of Quintuple Processor Architecture Using FPGA

    Directory of Open Access Journals (Sweden)

    P.Annapurna

    2014-09-01

    Full Text Available The advanced quintuple processor core is a design philosophy that has become a mainstream in Scientific and engineering applications. Increasing performance and gate capacity of recent FPGA devices permit complex logic systems to be implemented on a single programmable device. The embedded multiprocessors face a new problem with thread synchronization. It is caused by the distributed memory, when thread synchronization is violated the processors can access the same value at the same time. Basically the processor performance can be increased by adopting clock scaling technique and micro architectural Enhancements. Therefore, Designed a new Architecture called Advanced Concurrent Computing. This is implemented on the FPGA chip using VHDL. The advanced Concurrent Computing architecture performs a simultaneous use of both parallel and distributed computing. The full architecture of quintuple processor core designed for realistic to perform arithmetic, logical, shifting and bit manipulation operations. The proposed advanced quintuple processor core contains Homogeneous RISC processors, added with pipelined processing units, multi bus organization and I/O ports along with the other functional elements required to implement embedded SOC solutions. The designed quintuple performance issues like area, speed and power dissipation and propagation delay are analyzed at 90nm process technology using Xilinx tool.

  1. VLSI Architecture for Configurable and Low-Complexity Design of Hard-Decision Viterbi Decoding Algorithm

    Directory of Open Access Journals (Sweden)

    Rachmad Vidya Wicaksana Putra

    2016-06-01

    Full Text Available Convolutional encoding and data decoding are fundamental processes in convolutional error correction. One of the most popular error correction methods in decoding is the Viterbi algorithm. It is extensively implemented in many digital communication applications. Its VLSI design challenges are about area, speed, power, complexity and configurability. In this research, we specifically propose a VLSI architecture for a configurable and low-complexity design of a hard-decision Viterbi decoding algorithm. The configurable and low-complexity design is achieved by designing a generic VLSI architecture, optimizing each processing element (PE at the logical operation level and designing a conditional adapter. The proposed design can be configured for any predefined number of trace-backs, only by changing the trace-back parameter value. Its computational process only needs N + 2 clock cycles latency, with N is the number of trace-backs. Its configurability function has been proven for N = 8, N = 16, N = 32 and N = 64. Furthermore, the proposed design was synthesized and evaluated in Xilinx and Altera FPGA target boards for area consumption and speed performance.

  2. VLSI architectures for computing multiplications and inverses in GF(2-m)

    Science.gov (United States)

    Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.; Omura, J. K.; Reed, I. S.

    1983-01-01

    Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that are easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. A pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2m). With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.

  3. VLSI architectures for computing multiplications and inverses in GF(2m)

    Science.gov (United States)

    Wang, C. C.; Truong, T. K.; Shao, H. M.; Deutsch, L. J.; Omura, J. K.

    1985-01-01

    Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that are easily realized on VLSI chips. Massey and Omura recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. A pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2m). With the simple squaring property of the normal-basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable and, therefore, naturally suitable for VLSI implementation.

  4. Message-Driven Processor Architecture Version 11

    Science.gov (United States)

    1988-08-18

    UNCLASSIFIED . $CUUIT. v A$SIf9CAYON Or IMIS SAGE ’Whlken Dese E,...’lld) __ REPO_Or T CU NT PAGE ateREAD INSTRUCTIONS REPORT DOCUmtNTATION PAGE...fields instead of 2. This reflects the change in machine topology from 2D to 3D . Also, the NNR is no longer set to zero on a reset; it is left to...an X field, a Y field and a Z field indicating the position of the node in the 3D network grid. Its value identifies the processor on the network and

  5. Parallel processor simulator for multiple optic channel architectures

    Science.gov (United States)

    Wailes, Tom S.; Meyer, David G.

    1992-12-01

    A parallel processing architecture based on multiple channel optical communication is described and compared with existing interconnection strategies for parallel computers. The proposed multiple channel architecture (MCA) uses MQW-DBR lasers to provide a large number of independent, selectable channels (or virtual buses) for data transport. Arbitrary interconnection patterns as well as machine partitions can be emulated via appropriate channel assignments. Hierarchies of parallel architectures and simultaneous execution of parallel tasks are also possible. Described are a basic overview of the proposed architecture, various channel allocation strategies that can be utilized by the MCA, and a summary of advantages of the MCA compared with traditional interconnection techniques. Also describes is a comprehensive multiple processor simulator that has been developed to execute parallel algorithms using the MCA as a data transport mechanism between processors and memory units. Simulation results -- including average channel load, effective channel utilization, and average network latency for different algorithms and different transmission speeds -- are also presented.

  6. Architecture and Design of Medical Processor Units for Medical Networks

    CERN Document Server

    Ahamed, Syed V; 10.5121/ijcnc.2010.2602

    2011-01-01

    This paper introduces analogical and deductive methodologies for the design medical processor units (MPUs). From the study of evolution of numerous earlier processors, we derive the basis for the architecture of MPUs. These specialized processors perform unique medical functions encoded as medical operational codes (mopcs). From a pragmatic perspective, MPUs function very close to CPUs. Both processors have unique operation codes that command the hardware to perform a distinct chain of subprocesses upon operands and generate a specific result unique to the opcode and the operand(s). In medical environments, MPU decodes the mopcs and executes a series of medical sub-processes and sends out secondary commands to the medical machine. Whereas operands in a typical computer system are numerical and logical entities, the operands in medical machine are objects such as such as patients, blood samples, tissues, operating rooms, medical staff, medical bills, patient payments, etc. We follow the functional overlap betw...

  7. Reversible machine code and its abstract processor architecture

    DEFF Research Database (Denmark)

    Axelsen, Holger Bock; Glück, Robert; Yokoyama, Tetsuo

    2007-01-01

    A reversible abstract machine architecture and its reversible machine code are presented and formalized. For machine code to be reversible, both the underlying control logic and each instruction must be reversible. A general class of machine instruction sets was proven to be reversible, building ...... on our concept of reversible updates. The presentation is abstract and can serve as a guideline for a family of reversible processor designs. By example, we illustrate programming principles for the abstract machine architecture formalized in this paper....

  8. Opto-VLSI-based photonic true-time delay architecture for broadband adaptive nulling in phased array antennas.

    Science.gov (United States)

    Juswardy, Budi; Xiao, Feng; Alameh, Kamal

    2009-03-16

    This paper proposes a novel Opto-VLSI-based tunable true-time delay generation unit for adaptively steering the nulls of microwave phased array antennas. Arbitrary single or multiple true-time delays can simultaneously be synthesized for each antenna element by slicing an RF-modulated broadband optical source and routing specific sliced wavebands through an Opto-VLSI processor to a high-dispersion fiber. Experimental results are presented, which demonstrate the principle of the true-time delay unit through the generation of 5 arbitrary true-time delays of up to 2.5 ns each.

  9. An Efficient VLSI Architecture for Multi-Channel Spike Sorting Using a Generalized Hebbian Algorithm.

    Science.gov (United States)

    Chen, Ying-Lun; Hwang, Wen-Jyi; Ke, Chi-En

    2015-08-13

    A novel VLSI architecture for multi-channel online spike sorting is presented in this paper. In the architecture, the spike detection is based on nonlinear energy operator (NEO), and the feature extraction is carried out by the generalized Hebbian algorithm (GHA). To lower the power consumption and area costs of the circuits, all of the channels share the same core for spike detection and feature extraction operations. Each channel has dedicated buffers for storing the detected spikes and the principal components of that channel. The proposed circuit also contains a clock gating system supplying the clock to only the buffers of channels currently using the computation core to further reduce the power consumption. The architecture has been implemented by an application-specific integrated circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture has lower power consumption and hardware area costs for real-time multi-channel spike detection and feature extraction.

  10. VLSI architecture of a K-best detector for MIMO-OFDM wireless communication systems

    Energy Technology Data Exchange (ETDEWEB)

    Jian Haifang; Shi Yin, E-mail: jhf@semi.ac.c [Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083 (China)

    2009-07-15

    The K-best detector is considered as a promising technique in the MIMO-OFDM detection because of its good performance and low complexity. In this paper, a new K-best VLSI architecture is presented. In the proposed architecture, the metric computation units (MCUs) expand each surviving path only to its partial branches, based on the novel expansion scheme, which can predetermine the branches' ascending order by their local distances. Then a distributed sorter sorts out the new K surviving paths from the expanded branches in pipelines. Compared to the conventional K-best scheme, the proposed architecture can approximately reduce fundamental operations by 50% and 75% for the 16-QAM and the 64-QAM cases, respectively, and, consequently, lower the demand on the hardware resource significantly. Simulation results prove that the proposed architecture can achieve a performance very similar to conventional K-best detectors. Hence, it is an efficient solution to the K-best detector's VLSI implementation for high-throughput MIMO-OFDM systems.

  11. Vlsi implementation of flexible architecture for decision tree classification in data mining

    Science.gov (United States)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    The Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a terrific raise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. In a number of the solutions developed for this problem, most accepted one is Decision Tree Classification (DTC) that gives high precision while handling very large amount of data. This paper presents VLSI implementation of flexible architecture for Decision Tree classification in data mining using c4.5 algorithm.

  12. A Parallel-based Lifting Algorithm and VLSI Architecture for DWT

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    A novel Parallel-Based Lifting Algorithm (PBLA) for Discrete Wavelet Transform (DWT), exploiting the parallelism of arithmetic operations in all lifting steps, is proposed in this paper. It leads to reduce the critical path latency of computation, and to reduce the complexity of hardware implementation as well. The detailed derivation on the proposed algorithm, as well as the resulting Very Large Scale Integration (VLSI) architecture, is introduced, taking the 9/7 DWT as an example but without loss of generality. In comparison with the Conventional Lifting Algorithm Based Implementation (CLABI), the critical path latency of the proposed architecture is reduced by more than half from (4Tm + 8Ta)to Tm + 4Ta, and is competitive to that of Convolution-Based Implementation (CBI), but the new implementation will save significantly in hardware. The experimental results demonstrate that the proposed architecture has good performance in both increasing working frequency and reducing area.

  13. A unified approach to VLSI layout automation and algorithm mapping on processor arrays

    Science.gov (United States)

    Venkateswaran, N.; Pattabiraman, S.; Srinivasan, Vinoo N.

    1993-01-01

    Development of software tools for designing supercomputing systems is highly complex and cost ineffective. To tackle this a special purpose PAcube silicon compiler which integrates different design levels from cell to processor arrays has been proposed. As a part of this, we present in this paper a novel methodology which unifies the problems of Layout Automation and Algorithm Mapping.

  14. Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

    Directory of Open Access Journals (Sweden)

    Ausif Mahmood

    1996-01-01

    Full Text Available The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.

  15. Design and VLSI Implementation of Anticollision Enabled Robot Processor Using RFID Technology

    Directory of Open Access Journals (Sweden)

    Joyashree Bag

    2012-12-01

    Full Text Available RFID is a low power wireless emerging technology which has given rise to highly promising applications in real life. It can be employed for robot navigation. In multi-robot environment, when many robots are moving in the same work space, there is a possibility of their physical collision with themselves as well as with physical objects. In the present work, we have proposed and developed a processor incorporating smart algorithm for avoiding such collisions with the help of RFID technology and implemented it by using VHDL. The design procedure and the simulated results are very useful in designing and implementing a practical RFID system. The RTL schematic view of the processor is achieved by successfully synthesizing the proposed design.KEYWORDS

  16. An Efficient VLSI Architecture of the Enhanced Three Step Search Algorithm

    Science.gov (United States)

    Biswas, Baishik; Mukherjee, Rohan; Saha, Priyabrata; Chakrabarti, Indrajit

    2016-09-01

    The intense computational complexity of any video codec is largely due to the motion estimation unit. The Enhanced Three Step Search is a popular technique that can be adopted for fast motion estimation. This paper proposes a novel VLSI architecture for the implementation of the Enhanced Three Step Search Technique. A new addressing mechanism has been introduced which enhances the speed of operation and reduces the area requirements. The proposed architecture when implemented in Verilog HDL on Virtex-5 Technology and synthesized using Xilinx ISE Design Suite 14.1 achieves a critical path delay of 4.8 ns while the area comes out to be 2.9K gate equivalent. It can be incorporated in commercial devices like smart-phones, camcorders, video conferencing systems etc.

  17. A novel VLSI architecture of arithmetic encoder with reduced memory in SPIHT

    Science.gov (United States)

    Liu, Kai; Li, YunSong; Belyaev, Eugeniy

    2010-08-01

    The paper presents a context-based arithmetic coder's VLSI architecture used in SPIHT with reduced memory, which is used for high speed real-time applications. For hardware implementation, a dedicated context model is proposed for the coder. Each context can be processed in parallel and high speed operators are used for interval calculations. An embedded register array is used for cumulative frequency update. As a result, the coder can consume one symbol at each clock cycle. After FPGA synthesis and simulation, the throughput of our coder is comparable with those of similar hardware architectures used in ASIC technology. Especially, the memory capacity of the coder is smaller than those of corresponding systems.

  18. A Low Cost VLSI Architecture for Spike Sorting Based on Feature Extraction with Peak Search.

    Science.gov (United States)

    Chang, Yuan-Jyun; Hwang, Wen-Jyi; Chen, Chih-Chang

    2016-12-07

    The goal of this paper is to present a novel VLSI architecture for spike sorting with high classification accuracy, low area costs and low power consumption. A novel feature extraction algorithm with low computational complexities is proposed for the design of the architecture. In the feature extraction algorithm, a spike is separated into two portions based on its peak value. The area of each portion is then used as a feature. The algorithm is simple to implement and less susceptible to noise interference. Based on the algorithm, a novel architecture capable of identifying peak values and computing spike areas concurrently is proposed. To further accelerate the computation, a spike can be divided into a number of segments for the local feature computation. The local features are subsequently merged with the global ones by a simple hardware circuit. The architecture can also be easily operated in conjunction with the circuits for commonly-used spike detection algorithms, such as the Non-linear Energy Operator (NEO). The architecture has been implemented by an Application-Specific Integrated Circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture is well suited for real-time multi-channel spike detection and feature extraction requiring low hardware area costs, low power consumption and high classification accuracy.

  19. A Low Cost VLSI Architecture for Spike Sorting Based on Feature Extraction with Peak Search

    Directory of Open Access Journals (Sweden)

    Yuan-Jyun Chang

    2016-12-01

    Full Text Available The goal of this paper is to present a novel VLSI architecture for spike sorting with high classification accuracy, low area costs and low power consumption. A novel feature extraction algorithm with low computational complexities is proposed for the design of the architecture. In the feature extraction algorithm, a spike is separated into two portions based on its peak value. The area of each portion is then used as a feature. The algorithm is simple to implement and less susceptible to noise interference. Based on the algorithm, a novel architecture capable of identifying peak values and computing spike areas concurrently is proposed. To further accelerate the computation, a spike can be divided into a number of segments for the local feature computation. The local features are subsequently merged with the global ones by a simple hardware circuit. The architecture can also be easily operated in conjunction with the circuits for commonly-used spike detection algorithms, such as the Non-linear Energy Operator (NEO. The architecture has been implemented by an Application-Specific Integrated Circuit (ASIC with 90-nm technology. Comparisons to the existing works show that the proposed architecture is well suited for real-time multi-channel spike detection and feature extraction requiring low hardware area costs, low power consumption and high classification accuracy.

  20. VLSI electronics microstructure science

    CERN Document Server

    1981-01-01

    VLSI Electronics: Microstructure Science, Volume 3 evaluates trends for the future of very large scale integration (VLSI) electronics and the scientific base that supports its development.This book discusses the impact of VLSI on computer architectures; VLSI design and design aid requirements; and design, fabrication, and performance of CCD imagers. The approaches, potential, and progress of ultra-high-speed GaAs VLSI; computer modeling of MOSFETs; and numerical physics of micron-length and submicron-length semiconductor devices are also elaborated. This text likewise covers the optical linewi

  1. An Efficient VLSI Architecture for Multi-Channel Spike Sorting Using a Generalized Hebbian Algorithm

    Directory of Open Access Journals (Sweden)

    Ying-Lun Chen

    2015-08-01

    Full Text Available A novel VLSI architecture for multi-channel online spike sorting is presented in this paper. In the architecture, the spike detection is based on nonlinear energy operator (NEO, and the feature extraction is carried out by the generalized Hebbian algorithm (GHA. To lower the power consumption and area costs of the circuits, all of the channels share the same core for spike detection and feature extraction operations. Each channel has dedicated buffers for storing the detected spikes and the principal components of that channel. The proposed circuit also contains a clock gating system supplying the clock to only the buffers of channels currently using the computation core to further reduce the power consumption. The architecture has been implemented by an application-specific integrated circuit (ASIC with 90-nm technology. Comparisons to the existing works show that the proposed architecture has lower power consumption and hardware area costs for real-time multi-channel spike detection and feature extraction.

  2. Improved FFSBM Algorithm and Its VLSI Architecture for AVS Video Standard

    Institute of Scientific and Technical Information of China (English)

    Li Zhang; Don Xie; Di Wu

    2006-01-01

    The Video part of AVS (Audio Video Coding Standard) has been finalized recently. It has adopted variable block size motion compensation to improve its coding efficiency. This has brought heavy computation burden when it is applied to compress the HDTV (high definition television) content. Based on the original FFSBM (fast full search blocking matching),this paper proposes an improved FFSBM algorithm to adaptively reduce the complexity of motion estimation according to the actual motion intensity. The main idea of the proposed algorithm is to use the statistical distribution of MVD (motion vector difference). A VLSI (very large scale integration) architecture is also proposed to implement the improved motion estimation algorithm. Experimental results show that this algorithm-hardware co-design gives better tradeoff of gate-count and throughput than the existing ones and is a proper solution for the variable block size motion estimation in AVS.

  3. A radial basis function neurocomputer implemented with analog VLSI circuits

    Science.gov (United States)

    Watkins, Steven S.; Chau, Paul M.; Tawel, Raoul

    1992-01-01

    An electronic neurocomputer which implements a radial basis function neural network (RBFNN) is described. The RBFNN is a network that utilizes a radial basis function as the transfer function. The key advantages of RBFNNs over existing neural network architectures include reduced learning time and the ease of VLSI implementation. This neurocomputer is based on an analog/digital hybrid design and has been constructed with both custom analog VLSI circuits and a commercially available digital signal processor. The hybrid architecture is selected because it offers high computational performance while compensating for analog inaccuracies, and it features the ability to model large problems.

  4. Architecture-level performance/power tradeoff in network processor design

    Institute of Scientific and Technical Information of China (English)

    CHEN Hong-song; JI Zhen-zhou; HU Ming-zeng

    2007-01-01

    Network processors are used in the core node of network to flexibly process packet streams. With the increase of performance, the power of network processor increases fast, and power and cooling become a bottleneck. Architecture-level power conscious design must go beyond low-level circuit design. Architectural power and performance tradeoff should be considered at the same time. Simulation is an efficient method to design modern network processor before making chip. In order to achieve the tradeoff between performance and power,the processor simulator is used to design the architecture of network processor. Using Netbench, Commubench benchmark and processor simulator-SimpleScalar, the performance and power of network processor are quantitatively evaluated. New performance tradeoff evaluation metric is proposed to analyze the architecture of network processor. Based on the high performance Intel IXP 2800 Network processor configuration, optimized instruction fetch width and speed 、instruction issue width, instruction window size are analyzed and selected. Simulation results show that the tradeoff design method makes the usage of network processor more effectively. The optimal key parameters of network processor are important in architecture-level design. It is meaningful for the next generation network processor design.

  5. FY1995 study of design methodology and environment of high-performance processor architectures; 1995 nendo koseino processor architecture sekkeiho to sekkei kankyo no kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    The aim of our project is to develop high-performance processor architectures for both general purpose and application-specific purpose. We also plan to develop basic softwares, such as compliers, and various design aid tools for those architectures. We are particularly interested in performance evaluation at architecture design phase, design optimization, automatic generation of compliers from processor designs, and architecture design methodologies combined with circuit layout. We have investigated both microprocessor architectures and design methodologies / environments for the processors. Our goal is to establish design technologies for high-performance, low-power, low-cost and highly-reliable systems in system-on-silicon era. We have proposed PPRAM architecture for high-performance system using DRAM and logic mixture technology, Softcore processor architecture for special purpose processors in embedded systems, and Power-Pro architecture for low power systems. We also developed design methodologies and design environments for the above architectures as well as a new method for design verification of microprocessors. (NEDO)

  6. Soft-core dataflow processor architecture optimised for radar signal processing: Article

    CSIR Research Space (South Africa)

    Broich, R

    2014-10-01

    Full Text Available an iterative design methodology to propose a novel softcore streaming processor architecture. The datapaths of this architecture are arranged in a circular pattern, with multiple operands simultaneously flowing between switching multiplexers and functional...

  7. VLSI ARCHITECTURE FOR IMAGE COMPRESSION THROUGH ADDER MINIMIZATION TECHNIQUE AT DCT STRUCTURE

    Directory of Open Access Journals (Sweden)

    N.R. Divya

    2014-08-01

    Full Text Available Data compression plays a vital role in multimedia devices to present the information in a succinct frame. Initially, the DCT structure is used for Image compression, which has lesser complexity and area efficient. Similarly, 2D DCT also has provided reasonable data compression, but implementation concern, it calls more multipliers and adders thus its lead to acquire more area and high power consumption. To contain an account of all, this paper has been dealt with VLSI architecture for image compression using Rom free DA based DCT (Discrete Cosine Transform structure. This technique provides high-throughput and most suitable for real-time implementation. In order to achieve this image matrix is subdivided into odd and even terms then the multiplication functions are removed by shift and add approach. Kogge_Stone_Adder techniques are proposed for obtaining a bit-wise image quality which determines the new trade-off levels as compared to the previous techniques. Overall the proposed architecture produces reduced memory, low power consumption and high throughput. MATLAB is used as a funding tool for receiving an input pixel and obtaining output image. Verilog HDL is used for implementing the design, Model Sim for simulation, Quatres II is used to synthesize and obtain details about power and area.

  8. A Design Methodology for Folded, Pipelined Architectures in VLSI Applications using Projective Space Lattices

    CERN Document Server

    Sharma, Hrishikesh

    2011-01-01

    Semi-parallel, or folded, VLSI architectures are used whenever hardware resources need to be saved at design time. Most recent applications that are based on Projective Geometry (PG) based balanced bipartite graph also fall in this category. In this paper, we provide a high-level, top-down design methodology to design optimal semi-parallel architectures for applications, whose Data Flow Graph (DFG) is based on PG bipartite graph. Such applications have been found e.g. in error-control coding and matrix computations. Unlike many other folding schemes, the topology of connections between physical elements does not change in this methodology. Another advantage is the ease of implementation. To lessen the throughput loss due to folding, we also incorporate a pipelining strategy in the design methodology. A complete decoder has been prototyped for proof of concept, and is publicly available. Another specific high-performance design of an LDPC decoder based on this methodology was worked out in past, and has been p...

  9. High-performance VLSI architectures for turbo decoders with QPP interleaver

    Science.gov (United States)

    Verma, Shivani; Kumar, S.

    2015-04-01

    This paper analyses different VLSI architectures for 3GPP LTE/LTE-advanced turbo decoders for trade-offs in terms of throughput and area requirement. Data flow graphs for standard SISO MAP (maximum a posteriori) turbo decoder, SW - SISO MAP turbo decoder, PW SISO MAP turbo decoder have been presented, thus analysing their performance. Two variants of quadratic permutation polynomial (QPP) interleaver have been proposed which tend to simplify the complexity of 'mod' operator implementation and provide best compromise between area, delay and power dissipation. Implementation of decoder using one variant of QPP interleaver has also been discussed. A novel approach for area optimisation has been proposed to reduce required number of interleavers for parallel window turbo decoder. Multi-port memory has also been used for parallel turbo decoder. To increase the throughput without any effective increase in area complexity, circuit-level pipelining and retiming have been used. Proposed architectures have been synthesised using Synopsys Design Compiler using 45-nm CMOS technology.

  10. Directions in parallel processor architecture, and GPUs too

    CERN Document Server

    CERN. Geneva

    2014-01-01

    Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.

  11. Digital signal processor and its application. ; Animation processing DSP architectures. Digital signal processor to sono oyo. ; Dogazo shoriyo DSP architecture

    Energy Technology Data Exchange (ETDEWEB)

    Murakami, T.; Ohira, H. (Mitsubishi Electric Corp., Tokyo (Japan))

    1991-12-20

    A description is given on the internationally standardized animation coding system, and existing and next generation type image processing digital signal processor (DSP) architectures. The internationally standardized animation coding system stratifies images into segments of picture element, frame, and block, with each stratum given exclusive processing. A TV conference and a TV telephone conversation require a huge amount of animation image data. To process these data on a real-time basis, the current video image processing system takes a multi DSP configuration. Methods to split the loads and allocate each load fixedly to each DSP are classified into splitting the loads in the units of coding processing function, the object images, and the loads according to amounts of loads to be calculated. These splitting methods are applied to each stratum processing. The process and splitting corresponding to each stratum processing improved the efficiency. Since the method is a software-based processing, it can be applied not only to the irternationally standardized system, but also to the vector quantization system. Although the present LSI technology is not sufficiently capable to mount the architectures meeting the stratified configuration on one chip, an architecture that specializes the functions in each stratum has a possibility to serve as one chip DSP. 14 refs., 5 figs., 4 tabs.

  12. Motion-sensor fusion-based gesture recognition and its VLSI architecture design for mobile devices

    Science.gov (United States)

    Zhu, Wenping; Liu, Leibo; Yin, Shouyi; Hu, Siqi; Tang, Eugene Y.; Wei, Shaojun

    2014-05-01

    With the rapid proliferation of smartphones and tablets, various embedded sensors are incorporated into these platforms to enable multimodal human-computer interfaces. Gesture recognition, as an intuitive interaction approach, has been extensively explored in the mobile computing community. However, most gesture recognition implementations by now are all user-dependent and only rely on accelerometer. In order to achieve competitive accuracy, users are required to hold the devices in predefined manner during the operation. In this paper, a high-accuracy human gesture recognition system is proposed based on multiple motion sensor fusion. Furthermore, to reduce the energy overhead resulted from frequent sensor sampling and data processing, a high energy-efficient VLSI architecture implemented on a Xilinx Virtex-5 FPGA board is also proposed. Compared with the pure software implementation, approximately 45 times speed-up is achieved while operating at 20 MHz. The experiments show that the average accuracy for 10 gestures achieves 93.98% for user-independent case and 96.14% for user-dependent case when subjects hold the device randomly during completing the specified gestures. Although a few percent lower than the conventional best result, it still provides competitive accuracy acceptable for practical usage. Most importantly, the proposed system allows users to hold the device randomly during operating the predefined gestures, which substantially enhances the user experience.

  13. VLSI design of an RSA encryption/decryption chip using systolic array based architecture

    Science.gov (United States)

    Sun, Chi-Chia; Lin, Bor-Shing; Jan, Gene Eu; Lin, Jheng-Yi

    2016-09-01

    This article presents the VLSI design of a configurable RSA public key cryptosystem supporting the 512-bit, 1024-bit and 2048-bit based on Montgomery algorithm achieving comparable clock cycles of current relevant works but with smaller die size. We use binary method for the modular exponentiation and adopt Montgomery algorithm for the modular multiplication to simplify computational complexity, which, together with the systolic array concept for electric circuit designs effectively, lower the die size. The main architecture of the chip consists of four functional blocks, namely input/output modules, registers module, arithmetic module and control module. We applied the concept of systolic array to design the RSA encryption/decryption chip by using VHDL hardware language and verified using the TSMC/CIC 0.35 m 1P4 M technology. The die area of the 2048-bit RSA chip without the DFT is 3.9 × 3.9 mm2 (4.58 × 4.58 mm2 with DFT). Its average baud rate can reach 10.84 kbps under a 100 MHz clock.

  14. NASA Space Engineering Research Center for VLSI systems design

    Science.gov (United States)

    1991-01-01

    This annual review reports the center's activities and findings on very large scale integration (VLSI) systems design for 1990, including project status, financial support, publications, the NASA Space Engineering Research Center (SERC) Symposium on VLSI Design, research results, and outreach programs. Processor chips completed or under development are listed. Research results summarized include a design technique to harden complementary metal oxide semiconductors (CMOS) memory circuits against single event upset (SEU); improved circuit design procedures; and advances in computer aided design (CAD), communications, computer architectures, and reliability design. Also described is a high school teacher program that exposes teachers to the fundamentals of digital logic design.

  15. DFT algorithms for bit-serial GaAs array processor architectures

    Science.gov (United States)

    Mcmillan, Gary B.

    1988-01-01

    Systems and Processes Engineering Corporation (SPEC) has developed an innovative array processor architecture for computing Fourier transforms and other commonly used signal processing algorithms. This architecture is designed to extract the highest possible array performance from state-of-the-art GaAs technology. SPEC's architectural design includes a high performance RISC processor implemented in GaAs, along with a Floating Point Coprocessor and a unique Array Communications Coprocessor, also implemented in GaAs technology. Together, these data processors represent the latest in technology, both from an architectural and implementation viewpoint. SPEC has examined numerous algorithms and parallel processing architectures to determine the optimum array processor architecture. SPEC has developed an array processor architecture with integral communications ability to provide maximum node connectivity. The Array Communications Coprocessor embeds communications operations directly in the core of the processor architecture. A Floating Point Coprocessor architecture has been defined that utilizes Bit-Serial arithmetic units, operating at very high frequency, to perform floating point operations. These Bit-Serial devices reduce the device integration level and complexity to a level compatible with state-of-the-art GaAs device technology.

  16. Opto-VLSI-based N × M wavelength selective switch.

    Science.gov (United States)

    Xiao, Feng; Alameh, Kamal

    2013-07-29

    In this paper, we propose and experimentally demonstrate a novel N × M wavelength selective switch (WSS) architecture based on the use of an Opto-VLSI processor. Through a two-stage beamsteering process, wavelength channels from any input optical fiber port can be switched into any output optical fiber port. A proof-of-concept 2 × 3 WSS structure is developed, demonstrating flexible wavelength selective switching with an insertion loss around 15 dB.

  17. Design and Implementation of 64-Bit Execute Stage for VLIW Processor Architecture on FPGA

    Directory of Open Access Journals (Sweden)

    Manju Rani

    2012-07-01

    Full Text Available FPGA implementation of 64-bit execute unit for VLIW processor, and improve power representation have been done in this paper. VHDL is used to modelled this architecture. VLIW stands for Very Long Instruction Word. This Processor Architecture is based on parallel processing in which more than one instruction is executed in parallel. This architecture is used to increase the instruction throughput. So this is the base of the modern Superscalar Processors. Basically VLIW is a RISC Processor. The difference is it contains long instruction as compared to RISC. This stage of the pipeline executes the instruction. This is the stage where the ALU (arithmetic logic unit is located. Execute stage are synthesized and targeted for Xilinx Virtex 4 FPGA and the results calculated for 64-bit Execute stage improve the power as compared to previous work done.

  18. VLSI design

    CERN Document Server

    Basu, D K

    2014-01-01

    Very Large Scale Integrated Circuits (VLSI) design has moved from costly curiosity to an everyday necessity, especially with the proliferated applications of embedded computing devices in communications, entertainment and household gadgets. As a result, more and more knowledge on various aspects of VLSI design technologies is becoming a necessity for the engineering/technology students of various disciplines. With this goal in mind the course material of this book has been designed to cover the various fundamental aspects of VLSI design, like Categorization and comparison between various technologies used for VLSI design Basic fabrication processes involved in VLSI design Design of MOS, CMOS and Bi CMOS circuits used in VLSI Structured design of VLSI Introduction to VHDL for VLSI design Automated design for placement and routing of VLSI systems VLSI testing and testability The various topics of the book have been discussed lucidly with analysis, when required, examples, figures and adequate analytical and the...

  19. FPGA wavelet processor design using language for instruction-set architectures (LISA)

    Science.gov (United States)

    Meyer-Bäse, Uwe; Vera, Alonzo; Rao, Suhasini; Lenk, Karl; Pattichis, Marios

    2007-04-01

    The design of an microprocessor is a long, tedious, and error-prone task consisting of typically three design phases: architecture exploration, software design (assembler, linker, loader, profiler), architecture implementation (RTL generation for FPGA or cell-based ASIC) and verification. The Language for instruction-set architectures (LISA) allows to model a microprocessor not only from instruction-set but also from architecture description including pipelining behavior that allows a design and development tool consistency over all levels of the design. To explore the capability of the LISA processor design platform a.k.a. CoWare Processor Designer we present in this paper three microprocessor designs that implement a 8/8 wavelet transform processor that is typically used in today's FBI fingerprint compression scheme. We have designed a 3 stage pipelined 16 bit RISC processor (NanoBlaze). Although RISC μPs are usually considered "fast" processors due to design concept like constant instruction word size, deep pipelines and many general purpose registers, it turns out that DSP operations consume essential processing time in a RISC processor. In a second step we have used design principles from programmable digital signal processor (PDSP) to improve the throughput of the DWT processor. A multiply-accumulate operation along with indirect addressing operation were the key to achieve higher throughput. A further improvement is possible with today's FPGA technology. Today's FPGAs offer a large number of embedded array multipliers and it is now feasible to design a "true" vector processor (TVP). A multiplication of two vectors can be done in just one clock cycle with our TVP, a complete scalar product in two clock cycles. Code profiling and Xilinx FPGA ISE synthesis results are provided that demonstrate the essential improvement that a TVP has compared with traditional RISC or PDSP designs.

  20. Architecture and Design of Medical Processor Units for Medical Networks

    Directory of Open Access Journals (Sweden)

    Syed V. Ahamed

    2010-11-01

    Full Text Available This paper1 introduces analogical and deductive methodologies for the design medical processor units(MPUs. From the study of evolution of numerous earlier processors, we derive the basis for thearchitecture of MPUs. These specialized processors perform unique medical functions encoded as medicaloperational codes (mopcs. From a pragmatic perspective, MPUs function very close to CPUs. Bothprocessors have unique operation codes that command the hardware to perform a distinct chain of subprocessesupon operands and generate a specific result unique to the opcode and the operand(s. Inmedical environments, MPU decodes the mopcs and executes a series of medical sub-processes and sendsout secondary commands to the medical machine. Whereas operands in a typical computer system arenumerical and logical entities, the operands in medical machine are objects such as such as patients, bloodsamples, tissues, operating rooms, medical staff, medical bills, patient payments, etc. We follow thefunctional overlap between the two processes and evolve the design of medical computer systems andnetworks.

  1. An efficient hardware architecture of a scalable elliptic curve crypto-processor over GF(2n)

    Science.gov (United States)

    Tawalbeh, Lo'ai; Tenca, Alexandre; Park, Song; Koc, Cetin

    2005-08-01

    This paper presents a scalable Elliptic Curve Crypto-Processor (ECCP) architecture for computing the point multiplication for curves defined over the binary extension fields (GF(2n)). This processor computes modular inverse and Montgomery modular multiplication using a new effcient algorithm. The scalability feature of the proposed crypto-processor allows a fixed-area datapath to handle operands of any size. Also, the word size of the datapath can be adjusted to meet the area and performance requirements. On the other hand, the processor is reconfigurable in the sense that the user has the ability to choose the value of the field parameter (n). Experimental results show that the proposed crypto-processor is competitive with many other previous designs.

  2. MGSim - simulation tools for multi-core processor architectures

    NARCIS (Netherlands)

    Lankamp, M.; Poss, R.; Yang, Q.; Fu, J.; Uddin, I.; Jesshope, C.R.

    2013-01-01

    MGSim is an open source discrete event simulator for on-chip hardware components, developed at the University of Amsterdam. It is intended to be a research and teaching vehicle to study the fine-grained hardware/software interactions on many-core and hardware multithreaded processors. It includes su

  3. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    Science.gov (United States)

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  4. The Distributed Network Processor: a novel off-chip and on-chip interconnection network architecture

    CERN Document Server

    Biagioni, Andrea; Lonardo, Alessandro; Paolucci, Pier Stanislao; Perra, Mersia; Rossetti, Davide; Sidore, Carlo; Simula, Francesco; Tosoratto, Laura; Vicini, Piero

    2012-01-01

    One of the most demanding challenges for the designers of parallel computing architectures is to deliver an efficient network infrastructure providing low latency, high bandwidth communications while preserving scalability. Besides off-chip communications between processors, recent multi-tile (i.e. multi-core) architectures face the challenge for an efficient on-chip interconnection network between processor's tiles. In this paper, we present a configurable and scalable architecture, based on our Distributed Network Processor (DNP) IP Library, targeting systems ranging from single MPSoCs to massive HPC platforms. The DNP provides inter-tile services for both on-chip and off-chip communications with a uniform RDMA style API, over a multi-dimensional direct network with a (possibly) hybrid topology.

  5. Tinuso: A processor architecture for a multi-core hardware simulation platform

    DEFF Research Database (Denmark)

    Schleuniger, Pascal; Karlsson, Sven

    2010-01-01

    Multi-core systems have the potential to improve performance, energy and cost properties of embedded systems but also require new design methods and tools to take advantage of the new architectures. Due to the limited accuracy and performance of pure software simulators, we are working on a cycle...... accurate hardware simulation platform. We have developed the Tinuso processor architecture for this platform. Tinuso is a processor architecture optimized for FPGA implementation. The instruction set makes use of predicated instructions and supports C/C++ and assembly language programming. It is designed...... to be easy extendable to maintain the exibility required for the research on multi-core systems. Tinuso contains a co-processor interface to connect to a network interface. This interface allow for communication over an on-chip network. A clock frequency estimation study on a deeply pipelined Tinuso...

  6. A Reversible Processor Architecture and its Reversible Logic Design

    DEFF Research Database (Denmark)

    Thomsen, Michael Kirkedal; Axelsen, Holger Bock; Glück, Robert

    2012-01-01

    We describe the design of a purely reversible computing architecture, Bob, and its instruction set, BobISA. The special features of the design include a simple, yet expressive, locally-invertible instruction set, and fully reversible control logic and address calculation. We have designed...... an architecture with an ISA that is expressive enough to serve as the target for a compiler from a high-level structured reversible programming language. All-in-all, this paper demonstrates that the design of a complete reversible computing architecture is possible and can serve as the core of a programmable...

  7. Design of Variable Width Barrel Shifter for High Speed Processor Architecture

    Directory of Open Access Journals (Sweden)

    Rajeev Kumar

    2012-04-01

    Full Text Available Microprocessor is the brain of the computer. It works as the Central Processing Unit of the computer. It contains Arithmetic Logical Unit (ALU that performs the arithmetic operations such as Addition, Subtraction, Multiplication and Division. It also performs the Logical operations such as AND, NAND, OR, NOR, EXOR, EXNOR and NOT. It also contains register file to store the operand in load/store instructions in RISC Processor Architecture. Control Unit genetares the control signals that synchronize the operation of the processor which tells the microarchitecture which operation is done at which time. Now during the multiplication partial product is shifted and added. So shifter is an important part of the processor architecture. Barrel Shifter is an important combinational logic block. It was incorporated in 386 processor and is also used in microcontroller design. Intel has since moved to software implemented shifters in the Pentium 4 Processor Architecture but AMD still uses it. Here the design of the variable width barrel shifter is presented in which we can shift 4bit, 8bit, 16bit, and 32bit and maximum 64bit partial product during multiplication. Functionality is check using Modelsim 6.4a.Now to generate the gate level netlist Xilinx ISE 9.2i is used.

  8. Scalable architecture for a room temperature solid-state quantum information processor.

    Science.gov (United States)

    Yao, N Y; Jiang, L; Gorshkov, A V; Maurer, P C; Giedke, G; Cirac, J I; Lukin, M D

    2012-04-24

    The realization of a scalable quantum information processor has emerged over the past decade as one of the central challenges at the interface of fundamental science and engineering. Here we propose and analyse an architecture for a scalable, solid-state quantum information processor capable of operating at room temperature. Our approach is based on recent experimental advances involving nitrogen-vacancy colour centres in diamond. In particular, we demonstrate that the multiple challenges associated with operation at ambient temperature, individual addressing at the nanoscale, strong qubit coupling, robustness against disorder and low decoherence rates can be simultaneously achieved under realistic, experimentally relevant conditions. The architecture uses a novel approach to quantum information transfer and includes a hierarchy of control at successive length scales. Moreover, it alleviates the stringent constraints currently limiting the realization of scalable quantum processors and will provide fundamental insights into the physics of non-equilibrium many-body quantum systems.

  9. Extending and implementing the Self-adaptive Virtual Processor for distributed memory architectures

    NARCIS (Netherlands)

    van Tol, M.W.; Koivisto, J.

    2011-01-01

    Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current i

  10. Extending and implementing the Self-adaptive Virtual Processor for distributed memory architectures

    NARCIS (Netherlands)

    van Tol, M.W.; Koivisto, J.

    2011-01-01

    Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current

  11. Reversible machine code and its abstract processor architecture

    DEFF Research Database (Denmark)

    Axelsen, Holger Bock; Glück, Robert; Yokoyama, Tetsuo

    2007-01-01

    A reversible abstract machine architecture and its reversible machine code are presented and formalized. For machine code to be reversible, both the underlying control logic and each instruction must be reversible. A general class of machine instruction sets was proven to be reversible, building...

  12. Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture

    Institute of Scientific and Technical Information of China (English)

    郑方; 李宏亮; 吕晖; 过锋; 许晓红; 谢向辉

    2015-01-01

    Due to advances in semiconductor techniques, many-core processors have been widely used in high performance computing. However, many applications still cannot be carried out efficiently due to the memory wall, which has become a bottleneck in many-core processors. In this paper, we present a novel heterogeneous many-core processor architecture named deeply fused many-core (DFMC) for high performance computing systems. DFMC integrates management processing ele-ments (MPEs) and computing processing elements (CPEs), which are heterogeneous processor cores for different application features with a unified ISA (instruction set architecture), a unified execution model, and share-memory that supports cache coherence. The DFMC processor can alleviate the memory wall problem by combining a series of cooperative computing techniques of CPEs, such as multi-pattern data stream transfer, efficient register-level communication mechanism, and fast hardware synchronization technique. These techniques are able to improve on-chip data reuse and optimize memory access performance. This paper illustrates an implementation of a full system prototype based on FPGA with four MPEs and 256 CPEs. Our experimental results show that the effect of the cooperative computing techniques of CPEs is significant, with DGEMM (double-precision matrix multiplication) achieving an efficiency of 94%, FFT (fast Fourier transform) obtaining a performance of 207 GFLOPS and FDTD (finite-difference time-domain) obtaining a performance of 27 GFLOPS.

  13. Novel CCD image processor for Z-plane architecture

    Science.gov (United States)

    Kemeny, S. E.; Eid, E.-S.; Fossum, E. R.

    1989-09-01

    The use of charge-coupled device (CCD) circuits in Z-plane architectures for focal-plane image processing is discussed. The low-power, compact layout nature of CCDs makes them attractive for Z-plane application. Three application areas are addressed: non-uniformity compensation using CCD MDAC circuits, neighborhood image processing functions implemented with CCD circuits, and the use of CCDs for buffering multiple image frames. Such buffering enables spatial-temporal image transformation for lossless compression.

  14. Advanced Avionics and Processor Systems for a Flexible Space Exploration Architecture

    Science.gov (United States)

    Keys, Andrew S.; Adams, James H.; Smith, Leigh M.; Johnson, Michael A.; Cressler, John D.

    2010-01-01

    The Advanced Avionics and Processor Systems (AAPS) project, formerly known as the Radiation Hardened Electronics for Space Environments (RHESE) project, endeavors to develop advanced avionic and processor technologies anticipated to be used by NASA s currently evolving space exploration architectures. The AAPS project is a part of the Exploration Technology Development Program, which funds an entire suite of technologies that are aimed at enabling NASA s ability to explore beyond low earth orbit. NASA s Marshall Space Flight Center (MSFC) manages the AAPS project. AAPS uses a broad-scoped approach to developing avionic and processor systems. Investment areas include advanced electronic designs and technologies capable of providing environmental hardness, reconfigurable computing techniques, software tools for radiation effects assessment, and radiation environment modeling tools. Near-term emphasis within the multiple AAPS tasks focuses on developing prototype components using semiconductor processes and materials (such as Silicon-Germanium (SiGe)) to enhance a device s tolerance to radiation events and low temperature environments. As the SiGe technology will culminate in a delivered prototype this fiscal year, the project emphasis shifts its focus to developing low-power, high efficiency total processor hardening techniques. In addition to processor development, the project endeavors to demonstrate techniques applicable to reconfigurable computing and partially reconfigurable Field Programmable Gate Arrays (FPGAs). This capability enables avionic architectures the ability to develop FPGA-based, radiation tolerant processor boards that can serve in multiple physical locations throughout the spacecraft and perform multiple functions during the course of the mission. The individual tasks that comprise AAPS are diverse, yet united in the common endeavor to develop electronics capable of operating within the harsh environment of space. Specifically, the AAPS tasks for

  15. Stagnant Timing investigation of Embedded Software on Advanced Processor Architectures

    Directory of Open Access Journals (Sweden)

    M.Shankar

    2012-01-01

    Full Text Available Most processors today are embedded inproducts like mobile phones, microwave owns, weldingmachines etc and are not used in PC’s as many believeSince some of these embedded computers are used in time-critical or safety-critical systems it is very important thatthe behaviour of these systems are well known. One part ofthat is to know the Worst Case Execution Time (WCET ofthe different tasks in the embedded system. First,shortcomings in current as well as future standards tocontrolling the power grid are outlined. From theseeconomic and safety threats, we derive an immediate needto invest in research on the protection of the power grid,both from the perspective of cyber attacks and distributedcontrol system problems. Second, current software designpractice does not adequately verify and validate worst-casetiming scenarios that have to be guaranteed in order tomeet deadlines in safety-critical embedded systems. Thisequally applies to avionics and the automotive industry,both of which are increasingly requiring their suppliers toprovide variable bounds on worst-case execution time ofsoftware.

  16. VLSI design

    CERN Document Server

    Chandrasetty, Vikram Arkalgud

    2011-01-01

    This book provides insight into the practical design of VLSI circuits. It is aimed at novice VLSI designers and other enthusiasts who would like to understand VLSI design flows. Coverage includes key concepts in CMOS digital design, design of DSP and communication blocks on FPGAs, ASIC front end and physical design, and analog and mixed signal design. The approach is designed to focus on practical implementation of key elements of the VLSI design process, in order to make the topic accessible to novices. The design concepts are demonstrated using software from Mathworks, Xilinx, Mentor Graphic

  17. VLSI metallization

    CERN Document Server

    Einspruch, Norman G; Gildenblat, Gennady Sh

    1987-01-01

    VLSI Electronics Microstructure Science, Volume 15: VLSI Metallization discusses the various issues and problems related to VLSI metallization. It details the available solutions and presents emerging trends.This volume is comprised of 10 chapters. The two introductory chapters, Chapter 1 and 2 serve as general references for the electrical and metallurgical properties of thin conducting films. Subsequent chapters review the various aspects of VLSI metallization. The order of presentation has been chosen to follow the common processing sequence. In Chapter 3, some relevant metal deposition tec

  18. Scalable Architecture for a Room Temperature Solid-State Quantum Information Processor

    CERN Document Server

    Yao, Norman Y; Gorshkov, Alexey V; Maurer, Peter C; Giedke, Geza; Cirac, J Ignacio; Lukin, Mikhail D

    2010-01-01

    The realization of a scalable quantum information processor has emerged over the past decade as one of the central challenges at the interface of fundamental science and engineering. Much progress has been made towards this goal. Indeed, quantum operations have been demonstrated on several trapped ion qubits, and other solid-state systems are approaching similar levels of control. Extending these techniques to achieve fault-tolerant operations in larger systems with more qubits remains an extremely challenging goal, in part, due to the substantial technical complexity of current implementations. Here, we propose and analyze an architecture for a scalable, solid-state quantum information processor capable of operating at or near room temperature. The architecture is applicable to realistic conditions, which include disorder and relevant decoherence mechanisms, and includes a hierarchy of control at successive length scales. Our approach is based upon recent experimental advances involving Nitrogen-Vacancy colo...

  19. High-performance hardware architecture of elliptic curve cryptography processor over GF(2163)

    Institute of Scientific and Technical Information of China (English)

    Yong-ping DAN; Xue-cheng ZOU; Zheng-lin LIU; Yu HAN; Li-hua YI

    2009-01-01

    We propose a novel high-performance hardware architecture of processor for elliptic curve scalar multiplication based on the Lopez-Dahab algorithm over GF(2163) in polynomial basis representation. The processor can do all the operations using an efficient modular arithmetic logic unit, which includes an addition unit, a square and a carefully designed multiplication unit. In the proposed architecture, multiplication, addition, and square can be performed in parallel by the decomposition of computation. The point addition and point doubling iteration operations can be performed in six multiplications by optimization and solution of data dependency. The implementation results based on Xilinx Virtexll XC2V6000 FPGA show that the proposed design can do random elliptic curve scalar multiplication GF(2163) in 34.11 μs, occupying 2821 registers and 13 376 LUTs.

  20. An FFT Performance Model for Optimizing General-Purpose Processor Architecture

    Institute of Scientific and Technical Information of China (English)

    Ling Li; Yun-Ji Chen; Dao-Fu Liu; Cheng Qian; Wei-Wu Hu

    2011-01-01

    General-purpose processor (GPP) is an important platform for fast Fourier transform (FFT),due to its flexibility,reliability and practicality.FFT is a representative application intensive in both computation and memory access,optimizing the FFT performance of a GPP also benefits the performances of many other applications.To facilitate the analysis of FFT,this paper proposes a theoretical model of the FFT processing.The model gives out a tight lower bound of the runtime of FFT on a GPP,and guides the architecture optimization for GPP as well.Based on the model,two theorems on optimization of architecture parameters are deduced,which refer to the lower bounds of register number and memory bandwidth.Experimental results on different processor architectures (including Intel Core i7 and Godson-3B) validate the performance model.The above investigations were adopted in the development of Godson-3B,which is an industrial GPP.The optimization techniques deduced from our performance model improve the FFT performance by about 40%,while incurring only 0.8% additional area cost.Consequently,Godson-3B solves the 1024-point single-precision complex FFT in 0.368 μs with about 40 Watt power consumption,and has the highest performance-per-watt in complex FFT among processors as far as we know.This work could benefit optimization of other GPPs as well.

  1. GASP-PL/I Simulation of Integrated Avionic System Processor Architectures. M.S. Thesis

    Science.gov (United States)

    Brent, G. A.

    1978-01-01

    A development study sponsored by NASA was completed in July 1977 which proposed a complete integration of all aircraft instrumentation into a single modular system. Instead of using the current single-function aircraft instruments, computers compiled and displayed inflight information for the pilot. A processor architecture called the Team Architecture was proposed. This is a hardware/software approach to high-reliability computer systems. A follow-up study of the proposed Team Architecture is reported. GASP-PL/1 simulation models are used to evaluate the operating characteristics of the Team Architecture. The problem, model development, simulation programs and results at length are presented. Also included are program input formats, outputs and listings.

  2. A VLSI design concept for parallel iterative algorithms

    Directory of Open Access Journals (Sweden)

    C. C. Sun

    2009-05-01

    Full Text Available Modern VLSI manufacturing technology has kept shrinking down to the nanoscale level with a very fast trend. Integration with the advanced nano-technology now makes it possible to realize advanced parallel iterative algorithms directly which was almost impossible 10 years ago. In this paper, we want to discuss the influences of evolving VLSI technologies for iterative algorithms and present design strategies from an algorithmic and architectural point of view. Implementing an iterative algorithm on a multiprocessor array, there is a trade-off between the performance/complexity of processors and the load/throughput of interconnects. This is due to the behavior of iterative algorithms. For example, we could simplify the parallel implementation of the iterative algorithm (i.e., processor elements of the multiprocessor array in any way as long as the convergence is guaranteed. However, the modification of the algorithm (processors usually increases the number of required iterations which also means that the switch activity of interconnects is increasing. As an example we show that a 25×25 full Jacobi EVD array could be realized into one single FPGA device with the simplified μ-rotation CORDIC architecture.

  3. A HIGH-PERFORMANCE VLSI ARCHITECTURE OF EBCOT BLOCK CODING IN JPEG2000

    Institute of Scientific and Technical Information of China (English)

    Liu Kai; Wu Chengke; Li Yunsong

    2006-01-01

    The paper presents a new architecture composed of bit plane-parallel coder for Embedded Block Coding with Optimized Truncation (EBCOT) entropy encoder used in JPEG2000. In the architecture, the coding information of each bit plane can be obtained simultaneously and processed parallel. Compared with other architectures, it has advantages of high parallelism, and no waste clock cycles for a single point. The experimental results show that it reduces the processing time about 86% than that of bit plane sequential scheme. A Field Programmable Gate Array (FPGA) prototype chip is designed and simulation results show that it can process 512×512 gray-scaled images with more than 30 frames per second at 52MHz.

  4. A coherent VLSI design environment

    Science.gov (United States)

    Penfield, Paul, Jr.

    1988-05-01

    The CAD effort is centered on timing analysis and circuit simulation. Advances have been made in tightening the bounds of timing analysis. The superiority of the Gauss-Jacobi technique for matrix solution, over the Gauss-Seidel method, has been proven when the algorithms are implemented on massively parallel machines. In the circuits area, one result of importance is a new technique for calculating the highest frequency of operation of transistors with parasitic elements present. Work on a synthesis technique is under way. In the architecture area, many new results have been derived for parallel algorithms and complexity. One of the most astonishing is that a hypercube with a large number of faulty nodes can be used, with high probability, as another perfectly functioning hypercube of half the size, by using reconfiguration algorithms that are simple, fast, and require only local information. Also, the design of the message-driven processor is continuing, with several advances in architecture, software, communications, and ALU design. Many of these are being implemented in VLSI circuits. The theory work has as a central theme that the cost of communication should be included in complexity analyses. This has led to advances in models for computation, including volume-universal networks, routing, network flow, fault avoidance, queue management, and network simulation.

  5. The Fifth NASA Symposium on VLSI Design

    Science.gov (United States)

    1993-01-01

    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design.

  6. Parallel optical interconnects utilizing VLSI/FLC spatial light modulators

    Science.gov (United States)

    Genco, Sheryl M.

    1991-12-01

    Interconnection architectures are a cornerstone of parallel computing systems. However, interconnections can be a bottleneck in conventional computer architectures because of queuing structures that are necessary to handle the traffic through a switch at very high data rates and bandwidths. These issues must find new solutions to advance the state of the art in computing beyond the fundamental limit of silicon logic technology. Today's optoelectronic (OE) technology in particular VLSI/FLC spatial light modulators (SLMs) can provide a unique and innovative solution to these issues. This paper reports on the motivations for the system, describes the major areas of architectural requirements, discusses interconnection topologies and processor element alternatives, and documents an optical arbitration (i.e., control) scheme using `smart' SLMs and optical logic gates. The network topology is given in section 2.1 `Architectural Requirements -- Networks,' but it should be noted that the emphasis is on the optical control scheme (section 2.4) and the system.

  7. An implantable VLSI architecture for real time spike sorting in cortically controlled Brain Machine Interfaces.

    Science.gov (United States)

    Aghagolzadeh, Mehdi; Zhang, Fei; Oweiss, Karim

    2010-01-01

    Brain Machine Interface (BMI) systems demand real-time spike sorting to instantaneously decode the spike trains of simultaneously recorded cortical neurons. Real-time spike sorting, however, requires extensive computational power that is not feasible to implement in implantable BMI architectures, thereby requiring transmission of high-bandwidth raw neural data to an external computer. In this work, we describe a miniaturized, low power, programmable hardware module capable of performing this task within the resource constraints of an implantable chip. The module computes a sparse representation of the spike waveforms followed by "smart" thresholding. This cascade restricts the sparse representation to a subset of projections that preserve the discriminative features of neuron-specific spike waveforms. In addition, it further reduces telemetry bandwidth making it feasible to wirelessly transmit only the important biological information to the outside world, thereby improving the efficiency, practicality and viability of BMI systems in clinical applications.

  8. DATA BYPASSING ARCHITECTURE AND CIRCUIT DESIGN FOR 32-BIT DIGITAL SIGNAL PROCESSOR

    Institute of Scientific and Technical Information of China (English)

    Chen Xiaoyi; Yao Qingdong; Liu Peng

    2005-01-01

    This paper presents a design method of ByPassing Unit(BPU) in 32-bit Digital Signal Processor(DSP)-MD32. MD32 is realized in 0,18μm technology, 1.8V and 200 MHz working clock. It focuses on the Reduced Instruction Set Computer(RISC) architecture and DSP computation capability thoroughly, extends DSP with various addressing modes in a customized DSP pipeline stage architecture. The paper also discusses the architecture and circuit design of bypassing logic to fit MD32 architecture. The parallel execution of BPU with instruction decode in architecture level is applied to reduce time delay. The optimization of circuit that serial select with priority is analyzed in detail, and the result shows that about half of time delay is reduced after this optimization. Examples show that BPU is useful for improving the DSP's performance.The forwarding logic in MD32 realizes 8 data channels feedback and meets the working clock limit.

  9. Server-Based Data Push Architecture for Multi-Processor Environments

    Institute of Scientific and Technical Information of China (English)

    Xian-He Sun; Surendra Byna; Yong Chen

    2007-01-01

    Data access delay is a major bottleneck in utilizing current high-end computing (HEC) machines. Prefetching, where data is fetched before CPU demands for it, has been considered as an effective solution to masking data access delay. However, current client-initiated prefetching strategies, where a computing processor initiates prefetching instructions, have many limitations. They do not work well for applications with complex, non-contiguous data access patterns. While technology advances continue to increase the gap between computing and data access performance,trading computing power for reducing data access delay has become a natural choice. In this paper, we present a serverbased data-push approach and discuss its associated implementation mechanisms. In the server-push architecture, a dedicated server called Data Push Server (DPS) initiates and proactively pushes data closer to the client in time. Issues,such as what data to fetch, when to fetch, and how to push are studied. The SimpleScalar simulator is modified with a dedicated prefetching engine that pushes data for another processor to test DPS based prefetching. Simulation results show that L1 Cache miss rate can be reduced by up to 97% (71% on average) over a superscalar processor for SPEC CPU2000 benchmarks that have high cache miss rates.

  10. Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing

    Science.gov (United States)

    Sterling, T. L.; Zima, H. P.

    2002-01-01

    Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing. The paper concludes with a discussion of related research activities and an outlook to future work.

  11. Hardware Genetic Algorithm Optimization by Critical Path Analysis using a Custom VLSI Architecture

    Directory of Open Access Journals (Sweden)

    Farouk Smith

    2015-07-01

    Full Text Available This paper propose a Virtual-Field Programmable Gate Array (V-FPGA architecture that allows direct access to its configuration bits to facilitate hardware evolution, thereby allowing any combinational or sequential digital circuit to be realized. By using the V-FPGA, this paper investigates two possible ways of making evolutionary hardware systems more scalable: by optimizing the system’s genetic algorithm (GA; and by decomposing the solution circuit into smaller, evolvable sub-circuits. GA optimization is done by: omitting a canonical GA’s crossover operator (i.e. by using a 1+λ algorithm; applying evolution constraints; and optimizing the fitness function. A noteworthy contribution this research has made is the in-depth analysis of the phenotypes’ CPs. Through analyzing the CPs, it has been shown that a great amount of insight can be gained into a phenotype’s fitness. We found that as the number of columns in the Cartesian Genetic Programming array increases, so the likelihood of an external output being placed in the column decreases. Furthermore, the number of used LEs per column also substantially decreases per added column. Finally, we demonstrated the evolution of a state-decomposed control circuit. It was shown that the evolution of each state’s sub-circuit was possible, and suggest that modular evolution can be a successful tool when dealing with scalability.

  12. VLSI implementations for image communications

    CERN Document Server

    Pirsch, P

    1993-01-01

    The past few years have seen a rapid growth in image processing and image communication technologies. New video services and multimedia applications are continuously being designed. Essential for all these applications are image and video compression techniques. The purpose of this book is to report on recent advances in VLSI architectures and their implementation for video signal processing applications with emphasis on video coding for bit rate reduction. Efficient VLSI implementation for video signal processing spans a broad range of disciplines involving algorithms, architectures, circuits

  13. 低成本可调FFT处理器的超大规模集成电路设计%Low Cost VLSI Design of a Flexible FFT Processor

    Institute of Scientific and Technical Information of China (English)

    戴亦奇

    2011-01-01

    In this paper, a radix-22/23 based pipeline structure is presented, in order to implement a low-cost VLSI fast Fourier transform (FFT) processor. As well as reducing the steps of normal complex multiplications, it minimizes the memory words to get the FFT results with the single-path delay feedback (SDF) memory access method. As for the data-path in the pipeline FFT processor, the hybrid floating point data-scaling scheme is adopted to achieve enough signal-to-quantization-noise ratio with minimum data width and RAM requirements.%文章提出了一种以基-22/23为基础的流水线结构,用以实现低成本、超大规模集成电路(VLSI)的快速傅里叶变换(FFT)处理器设计。该处理器在减少普通复数乘法器级数的同时,通过单路延时反馈(SDF)存取方式,以最少的存储字来获得FFT结果。对于数据通路,我们采用了混合浮点的数据缩放方式,在保证信噪比的同时,降低了数据长度和RAM容量的需求。

  14. Associative Memory Design for the FastTrack Processor (FTK) at ATLAS

    CERN Document Server

    Annovi, A; The ATLAS collaboration; Volpi, G; Beccherle, R; Bossini, E; Crescioli, F; Dell'Orso, M; Giannetti, P; Amerio, S; Hoff, J; Liu, T; Sacco, I; Liberali, V; Stabile, A; Schoening, A; Soltveit, H; Tripiccione, R

    2011-01-01

    We propose a new generation of VLSI processor for pattern recognition based on Associative Memory architecture, optimized for on-line track finding in high-energy physics experiments. We describe the architecture, the technology studies and the prototype design of a new Associative Memory project: it maximizes the pattern density on ASICs, minimizes the power consumption and improves the functionality for the fast tracker processor proposed to upgrade the ATLAS trigger at LHC. Finally we will focus on possible future applications inside and outside high physics energy.

  15. Architecture-Aware Session Lookup Design for Inline Deep Inspection on Network Processors

    Institute of Scientific and Technical Information of China (English)

    XU Bo; HE Fei; XUE Yibo; LI Jun

    2009-01-01

    Today's firewalls and security gateways are required to not only block unauthorized accesses by authenticating packet headers,but also inspect flow payloads against malicious intrusions.Deep inspection emerges as a seamless integration of packet classification for access control and pattern matching for intrusion prevention.The two function blocks are linked together via well-designed session lookup schemes.This paper presents an architecture-aware session lookup scheme for deep inspection on network processors (NPs).Test results show that the proposed session data structure and integration approach can achieve the OC-48 line rate (2.5 Gbps) with inline stateful content inspection on the Intel IXP2850 NP.This work provides an insight into application design and implementation on NPs and principles for performance tuning of NP-based programming such as data allocation,task partitioning,latency hiding,and thread synchronization.

  16. ESAIR: A Behavior-Based Robotic Software Architecture on Multi-Core Processor Platforms

    Directory of Open Access Journals (Sweden)

    Chin-Yuan Tseng

    2013-03-01

    Full Text Available This paper introduces an Embedded Software Architecture for Intelligent Robot systems (ESAIR that addresses the issues of parallel thread executions on multi-core processor platforms. ESAIR provides a thread scheduling interface to improve the execution performance of a robot system by assigning a dedicated core to a running thread on the fly and dynamically rescheduling the priority of the thread. In the paper, we describe the object-oriented design and the control functions of ESAIR. The modular design of ESAIR helps improve the software quality, reliability and scalability in research and real practice. We prove the improvement by realizing ESAIR on an autonomous robot, named AVATAR. AVATAR implements various human-robot interactions, including speech recognition, human following, face recognition, speaker identification, etc. With the support of ESAIR, AVATAR can integrate a comprehensive set of behaviors and peripherals with better resource utilization.

  17. Evaluation of Dual-Launch Lunar Architectures Using the Mission Assessment Post Processor

    Science.gov (United States)

    Stewart, Shaun M.; Senent, Juan; Williams, Jacob; Condon, Gerald L.; Lee, David E.

    2010-01-01

    The National Aeronautics and Space Administrations (NASA) Constellation Program is currently designing a new transportation system to replace the Space Shuttle, support human missions to both the International Space Station (ISS) and the Moon, and enable the eventual establishment of an outpost on the lunar surface. The present Constellation architecture is designed to meet nominal capability requirements and provide flexibility sufficient for handling a host of contingency scenarios including (but not limited to) launch delays at the Earth. This report summarizes a body of work performed in support of the Review of U.S. Human Space Flight Committee. It analyzes three lunar orbit rendezvous dual-launch architecture options which incorporate differing methodologies for mitigating the effects of launch delays at the Earth. NASA employed the recently-developed Mission Assessment Post Processor (MAPP) tool to quickly evaluate vehicle performance requirements for several candidate approaches for conducting human missions to the Moon. The MAPP tool enabled analysis of Earth perturbation effects and Earth-Moon geometry effects on the integrated vehicle performance as it varies over the 18.6-year lunar nodal cycle. Results are provided summarizing best-case and worst-case vehicle propellant requirements for each architecture option. Additionally, the associated vehicle payload mass requirements at launch are compared between each architecture and against those of the Constellation Program. The current Constellation Program architecture assumes that the Altair lunar lander and Earth Departure Stage (EDS) vehicles are launched on a heavy lift launch vehicle. The Orion Crew Exploration Vehicle (CEV) is separately launched on a smaller man-rated vehicle. This strategy relaxes man-rating requirements for the heavy lift launch vehicle and has the potential to significantly reduce the cost of the overall architecture over the operational lifetime of the program. The crew launch

  18. An Asynchronous Low Power and High Performance VLSI Architecture for Viterbi Decoder Implemented with Quasi Delay Insensitive Templates.

    Science.gov (United States)

    Devi, T Kalavathi; Palaniappan, Sakthivel

    2015-01-01

    Convolutional codes are comprehensively used as Forward Error Correction (FEC) codes in digital communication systems. For decoding of convolutional codes at the receiver end, Viterbi decoder is often used to have high priority. This decoder meets the demand of high speed and low power. At present, the design of a competent system in Very Large Scale Integration (VLSI) technology requires these VLSI parameters to be finely defined. The proposed asynchronous method focuses on reducing the power consumption of Viterbi decoder for various constraint lengths using asynchronous modules. The asynchronous designs are based on commonly used Quasi Delay Insensitive (QDI) templates, namely, Precharge Half Buffer (PCHB) and Weak Conditioned Half Buffer (WCHB). The functionality of the proposed asynchronous design is simulated and verified using Tanner Spice (TSPICE) in 0.25 µm, 65 nm, and 180 nm technologies of Taiwan Semiconductor Manufacture Company (TSMC). The simulation result illustrates that the asynchronous design techniques have 25.21% of power reduction compared to synchronous design and work at a speed of 475 MHz.

  19. Phase-Based Binocular Perception of Motion in Depth: Cortical-Like Operators and Analog VLSI Architectures

    Science.gov (United States)

    Sabatini, Silvio P.; Solari, Fabio; Cavalleri, Paolo; Bisio, Giacomo Mario

    2003-12-01

    We present a cortical-like strategy to obtain reliable estimates of the motions of objects in a scene toward/away from the observer (motion in depth), from local measurements of binocular parameters derived from direct comparison of the results of monocular spatiotemporal filtering operations performed on stereo image pairs. This approach is suitable for a hardware implementation, in which such parameters can be gained via a feedforward computation (i.e., collection, comparison, and punctual operations) on the outputs of the nodes of recurrent VLSI lattice networks, performing local computations. These networks act as efficient computational structures for embedded analog filtering operations in smart vision sensors. Extensive simulations on both synthetic and real-world image sequences prove the validity of the approach that allows to gain high-level information about the 3D structure of the scene, directly from sensorial data, without resorting to explicit scene reconstruction.

  20. Phase-Based Binocular Perception of Motion in Depth: Cortical-Like Operators and Analog VLSI Architectures

    Directory of Open Access Journals (Sweden)

    Silvio P. Sabatini

    2003-06-01

    Full Text Available We present a cortical-like strategy to obtain reliable estimates of the motions of objects in a scene toward/away from the observer (motion in depth, from local measurements of binocular parameters derived from direct comparison of the results of monocular spatiotemporal filtering operations performed on stereo image pairs. This approach is suitable for a hardware implementation, in which such parameters can be gained via a feedforward computation (i.e., collection, comparison, and punctual operations on the outputs of the nodes of recurrent VLSI lattice networks, performing local computations. These networks act as efficient computational structures for embedded analog filtering operations in smart vision sensors. Extensive simulations on both synthetic and real-world image sequences prove the validity of the approach that allows to gain high-level information about the 3D structure of the scene, directly from sensorial data, without resorting to explicit scene reconstruction.

  1. Software implementation of floating-Point arithmetic on a reduced-Instruction-set processor

    Energy Technology Data Exchange (ETDEWEB)

    Gross, T.

    1985-11-01

    Current single chip implementations of reduced-instruction-set processors do not support hardware floating-point operations. Instead, floating-point operations have to be provided either by a coprocessor or by software. This paper discusses issues arising from a software implementation of floating-point arithmetic for the MIPS processor, an experimental VLSI architecture. Measurements indicate that an acceptable level of performance is achieved, but this approach is no substitute for a hardware accelerator if higher-precision results are required. This paper includes instruction profiles for the basic floating-point operations and evaluates the usefulness of some aspects of the instruction set.

  2. Next generation Associative Memory devices for the FTK tracking processor of the ATLAS experiment

    CERN Document Server

    Andreani, A; The ATLAS collaboration; Beccherle, B; Beretta, M; Citterio, M; Crescioli, F; Colombo, A; Giannetti, P; Liberali, V; Shojaii, J; Stabile, A

    2013-01-01

    The AMchip is a VLSI device that implements the associative memory function, a special content addressable memory specifically designed for high energy physics applications and first used in the CDF experiment at Tevatron. The 4th generation of AMchip has been developed for the core pattern recognition stage of the Fast TracKer (FTK) processor: a hardware processor for online reconstruction of particle trajectories at the ATLAS experiment at LHC. We present the architecture, design considerations, power consumption and performance measurements of the 4th generation of AMchip. We present also the design innovations toward the 5th generation and the first prototype results.

  3. VLSI signal processing technology

    CERN Document Server

    Swartzlander, Earl

    1994-01-01

    This book is the first in a set of forthcoming books focussed on state-of-the-art development in the VLSI Signal Processing area. It is a response to the tremendous research activities taking place in that field. These activities have been driven by two factors: the dramatic increase in demand for high speed signal processing, especially in consumer elec­ tronics, and the evolving microelectronic technologies. The available technology has always been one of the main factors in determining al­ gorithms, architectures, and design strategies to be followed. With every new technology, signal processing systems go through many changes in concepts, design methods, and implementation. The goal of this book is to introduce the reader to the main features of VLSI Signal Processing and the ongoing developments in this area. The focus of this book is on: • Current developments in Digital Signal Processing (DSP) pro­ cessors and architectures - several examples and case studies of existing DSP chips are discussed in...

  4. On-board neural processor design for intelligent multisensor microspacecraft

    Science.gov (United States)

    Fang, Wai-Chi; Sheu, Bing J.; Wall, James

    1996-03-01

    A compact VLSI neural processor based on the Optimization Cellular Neural Network (OCNN) has been under development to provide a wide range of support for an intelligent remote sensing microspacecraft which requires both high bandwidth communication and high- performance computing for on-board data analysis, thematic data reduction, synergy of multiple types of sensors, and other advanced smart-sensor functions. The OCNN is developed with emphasis on its capability to find global optimal solutions by using a hardware annealing method. The hardware annealing function is embedded in the network. It is a parallel version of fast mean-field annealing in analog networks, and is highly efficient in finding globally optimal solutions for cellular neural networks. The OCNN is designed to perform programmable functions for fine-grained processing with annealing control to enhance the output quality. The OCNN architecture is a programmable multi-dimensional array of neurons which are locally connected with their local neurons. Major design features of the OCNN neural processor includes massively parallel neural processing, hardware annealing capability, winner-take-all mechanism, digitally programmable synaptic weights, and multisensor parallel interface. A compact current-mode VLSI design feasibility of the OCNN neural processor is demonstrated by a prototype 5 X 5-neuroprocessor array chip in a 2-micrometers CMOS technology. The OCNN operation theory, architecture, design and implementation, prototype chip, and system applications have been investigated in detail and presented in this paper.

  5. HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

    Science.gov (United States)

    van Dyk, Danny; Geveler, Markus; Mallach, Sven; Ribbrock, Dirk; Göddeke, Dominik; Gutwenger, Carsten

    2009-12-01

    We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straight forward C++ code using HONEI's SSE backend, and additional 3-4 and 4-16 times faster execution on the Cell and a GPU. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development. Program summaryProgram title: HONEI Catalogue identifier: AEDW_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GPLv2 No. of lines in distributed program, including test data, etc.: 216 180 No. of bytes in distributed program, including test data, etc.: 1 270 140 Distribution format: tar.gz Programming language: C++ Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3 Operating system: Linux RAM: at least 500 MB free Classification: 4.8, 4.3, 6.1 External routines: SSE: none; [1] for GPU, [2] for Cell backend Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the

  6. An Architectural Approach for Decoding and Distributing Functions in FPUs in a Functional Processor System

    CERN Document Server

    Nair, T R Gopalakrishnan; Krutthika, H K

    2010-01-01

    The main goal of this research is to develop the concepts of a revolutionary processor system called Functional Processor System. The fairly novel work carried out in this proposal concentrates on decoding of function pipelines and distributing it in FPUs as a part of scheduling approach. As the functional programs are super-level programs that entails requirements only at functional level, decoding of functions and distribution of functions in the heterogeneous functional processor units are a challenge. We explored the possibilities of segregation of the functions from the application program and distributing the functions on the relevant FPUs by using address mapping techniques. Here we pursue the perception of feeding the functions into the processor farm rather than the processor fetching the instructions or functions and executing it. This work is carried out at theoretical levels and it requires a long way to go in the realization of this work in hardware perhaps with a large industrial team with a pra...

  7. CMOS VLSI Layout and Verification of a SIMD Computer

    Science.gov (United States)

    Zheng, Jianqing

    1996-01-01

    A CMOS VLSI layout and verification of a 3 x 3 processor parallel computer has been completed. The layout was done using the MAGIC tool and the verification using HSPICE. Suggestions for expanding the computer into a million processor network are presented. Many problems that might be encountered when implementing a massively parallel computer are discussed.

  8. VLSI electronics microstructure science

    CERN Document Server

    1982-01-01

    VLSI Electronics: Microstructure Science, Volume 4 reviews trends for the future of very large scale integration (VLSI) electronics and the scientific base that supports its development.This book discusses the silicon-on-insulator for VLSI and VHSIC, X-ray lithography, and transient response of electron transport in GaAs using the Monte Carlo method. The technology and manufacturing of high-density magnetic-bubble memories, metallic superlattices, challenge of education for VLSI, and impact of VLSI on medical signal processing are also elaborated. This text likewise covers the impact of VLSI t

  9. Three-dimensional optoelectronic stacked processor by use of free-space optical interconnection and three-dimensional VLSI chip stacks.

    Science.gov (United States)

    Li, Guoqiang; Huang, Dawei; Yuceturk, Emel; Marchand, Philippe J; Esener, Sadik C; Ozguz, Volkan H; Liu, Yue

    2002-01-10

    We present a demonstration system under the three-dimensional (3D) optoelectronic stacked processor consortium. The processor combines the advantages of optics in global, high-density, high-speed parallel interconnections with the density and computational power of 3D chip stacks. In particular, a compact and scalable optoelectronic switching system with a high bandwidth is designed. The system consists of three silicon chip stacks, each integrated with a single vertical-cavity-surface-emitting-laser-metal-semiconductor-metal detector array and an optical interconnection module. Any input signal at one end stack can be switched through the central crossbar stack to any output channel on the opposite end stack. The crossbar bandwidth is designed to be 256 Gb/s. For the free-space optical interconnection, a novel folded hybrid micro-macro optical system with a concave reflection mirror has been designed. The optics module can provide a high resolution, a large field of view, a high link efficiency, and low optical cross talk. It is also symmetric and modular. Off-the-shelf macro-optical components are used. The concave reflection mirror can significantly improve the image quality and tolerate a large misalignment of the optical components, and it can also compensate for the lateral shift of the chip stacks. Scaling of the macrolens can be used to adjust the interconnection length between the chip stacks or make the system more compact. The components are easy to align, and only passive alignment is required. Optics and electronics are separated until the final assembly step, and the optomechanic module can be removed and replaced. By use of 3D chip stacks, commercially available optical components, and simple passive packaging techniques, it is possible to achieve a high-performance optoelectronic switching system.

  10. Associative Memory design for the FastTrack processor (FTK) at ATLAS

    CERN Document Server

    Annovi, A; The ATLAS collaboration; Bossini, E; Crescioli, F; Dell'Orso, M; Giannetti, P; Piendibene, M; Sacco, I; Sartori, L; Tripiccione, R

    2010-01-01

    We propose a new generation of VLSI processor for pattern recognition based on Associative Memory architecture, optimized for on-line track finding in high-energy physics experiments. We describe the architecture, the technology studies and the prototype design of a new R&D Associative Memory project: it maximizes the pattern density on ASICs, minimizes the power consumption and improves the functionality for the Fast Tracker (FTK) proposed to upgrade the ATLAS trigger at LHC. Finally we will focus on possible future applications inside and outside High Physics Energy (HEP).

  11. Novel Highly Parallel and Systolic Architectures Using Quantum Dot-Based Hardware

    Science.gov (United States)

    Fijany, Amir; Toomarian, Benny N.; Spotnitz, Matthew

    1997-01-01

    VLSI technology has made possible the integration of massive number of components (processors, memory, etc.) into a single chip. In VLSI design, memory and processing power are relatively cheap and the main emphasis of the design is on reducing the overall interconnection complexity since data routing costs dominate the power, time, and area required to implement a computation. Communication is costly because wires occupy the most space on a circuit and it can also degrade clock time. In fact, much of the complexity (and hence the cost) of VLSI design results from minimization of data routing. The main difficulty in VLSI routing is due to the fact that crossing of the lines carrying data, instruction, control, etc. is not possible in a plane. Thus, in order to meet this constraint, the VLSI design aims at keeping the architecture highly regular with local and short interconnection. As a result, while the high level of integration has opened the way for massively parallel computation, practical and full exploitation of such a capability in many applications of interest has been hindered by the constraints on interconnection pattern. More precisely. the use of only localized communication significantly simplifies the design of interconnection architecture but at the expense of somewhat restricted class of applications. For example, there are currently commercially available products integrating; hundreds of simple processor elements within a single chip. However, the lack of adequate interconnection pattern among these processing elements make them inefficient for exploiting a large degree of parallelism in many applications.

  12. VLSI in medicine

    CERN Document Server

    Einspruch, Norman G

    1989-01-01

    VLSI Electronics Microstructure Science, Volume 17: VLSI in Medicine deals with the more important applications of VLSI in medical devices and instruments.This volume is comprised of 11 chapters. It begins with an article about medical electronics. The following three chapters cover diagnostic imaging, focusing on such medical devices as magnetic resonance imaging, neurometric analyzer, and ultrasound. Chapters 5, 6, and 7 present the impact of VLSI in cardiology. The electrocardiograph, implantable cardiac pacemaker, and the use of VLSI in Holter monitoring are detailed in these chapters. The

  13. VLSI placement

    Energy Technology Data Exchange (ETDEWEB)

    Hojat, S.

    1986-01-01

    The placement problem of assigning modules to module sites in a regular array must be addressed in VLSI and WSI. The placement problem of assigning heterogeneous modules to module sites in a regular array is NP-complete. The placement problem could be simplified if one could find a footprint with the property that all modules of the optimum placement occupy locations in the footprint, with no vacancies within the footprint region. If such footprints were known, they could be precomputed for each system size and the optimization problem would be reduced to a search of placements meeting the footprint constraint. The author shows that the placement problem could not be simplified by finding footprints. As result, several heuristic algorithms for the placement problem were developed and compared to each other and other established algorithms with respect to time complexity and performance measured, by the expected distance traversed by an intermodule message. Compared to previous algorithms, one new heuristic algorithm gave better performance in a shorter execution time on all test examples.

  14. Architectural considerations in the design of a superconducting quantum annealing processor

    OpenAIRE

    Bunyk, P. I.; Hoskinson, E.; Johnson, M. W.; Tolkacheva, E.; Altomare, F.; Berkley, A. J.; Harris, R; Hilton, J. P.; Lanting, T.; Whittaker, J

    2014-01-01

    We have developed a quantum annealing processor, based on an array of tunably coupled rf-SQUID flux qubits, fabricated in a superconducting integrated circuit process [1]. Implementing this type of processor at a scale of 512 qubits and 1472 programmable inter-qubit couplers and operating at ~ 20 mK has required attention to a number of considerations that one may ignore at the smaller scale of a few dozen or so devices. Here we discuss some of these considerations, and the delicate balance n...

  15. A soft-core processor architecture optimised for radar signal processing applications

    CSIR Research Space (South Africa)

    Broich, R

    2013-12-01

    Full Text Available Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.4 Datapath Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.5 Control Unit Architecture...View for FPGA, MATLAB/Sim- ulink plug-ins: Xilinx Sys- tem Generator, Altera DSP Builder) Graphical interconnection of different DSP blocks, multi- rate systems, fused datapath optimisations. Difficult to de- scribe and debug complex designs, no design trade...

  16. VLSI 'smart' I/O module development

    Science.gov (United States)

    Kirk, Dan

    The developmental history, design, and operation of the MIL-STD-1553A/B discrete and serial module (DSM) for the U.S. Navy AN/AYK-14(V) avionics computer are described and illustrated with diagrams. The ongoing preplanned product improvement for the AN/AYK-14(V) includes five dual-redundant MIL-STD-1553 channels based on DSMs. The DSM is a front-end processor for transferring data to and from a common memory, sharing memory with a host processor to provide improved 'smart' input/output performance. Each DSM comprises three hardware sections: three VLSI-6000 semicustomized CMOS arrays, memory units to support the arrays, and buffers and resynchronization circuits. The DSM hardware module design, VLSI-6000 design tools, controlware and test software, and checkout procedures (using a hardware simulator) are characterized in detail.

  17. VLSI Research

    Science.gov (United States)

    1984-04-01

    massive amounts of data pertaining to seismic exploration or weather observation require much more processing power. These scientific calculations...1« IC *• Number of Processors it 3* (a) 5g - *• * C > «i o •• u w »- a • c a. MM , / \\ i i T2C sp«r*ttoni •*l«y > M unit...algorithms can be divided into two categories; namely, single-input single-output (SISO) and multi-input multi- output ( MIMO ) systems. A highly

  18. ASIC Design of Floating-Point FFT Processor

    Institute of Scientific and Technical Information of China (English)

    陈禾; 赵忠武

    2004-01-01

    An application specific integrated circuit (ASIC) design of a 1024 points floating-point fast Fourier transform(FFT) processor is presented. It can satisfy the requirement of high accuracy FFT result in related fields. Several novel design techniques for floating-point adder and multiplier are introduced in detail to enhance the speed of the system. At the same time, the power consumption is decreased. The hardware area is effectively reduced as an improved butterfly processor is developed. There is a substantial increase in the performance of the design since a pipelined architecture is adopted, and very large scale integrated (VLSI) is easy to realize due to the regularity. A result of validation using field programmable gate array (FPGA) is shown at the end. When the system clock is set to 50 MHz, 204.8 μs is needed to complete the operation of FFT computation.

  19. Reconfigurable optical power splitter/combiner based on Opto-VLSI processing.

    Science.gov (United States)

    Mustafa, Haithem; Xiao, Feng; Alameh, Kamal

    2011-10-24

    A novel 1×4 reconfigurable optical splitter/combiner structure based on Opto-VLSI processor and 4-f imaging system with high resolution is proposed and experimentally demonstrated. By uploading optimized multicasting phase holograms onto the software-driven Opto-VLSI processor, an input optical signal is dynamically split into different output fiber ports with user-defined splitting ratios. Also, multiple input optical signals are dynamically combined with arbitrary user-defined weights.

  20. vPELS: An E-Learning Social Environment for VLSI Design with Content Security Using DRM

    Science.gov (United States)

    Dewan, Jahangir; Chowdhury, Morshed; Batten, Lynn

    2014-01-01

    This article provides a proposal for personal e-learning system (vPELS [where "v" stands for VLSI: very large scale integrated circuit])) architecture in the context of social network environment for VLSI Design. The main objective of vPELS is to develop individual skills on a specific subject--say, VLSI--and share resources with peers.…

  1. vPELS: An E-Learning Social Environment for VLSI Design with Content Security Using DRM

    Science.gov (United States)

    Dewan, Jahangir; Chowdhury, Morshed; Batten, Lynn

    2014-01-01

    This article provides a proposal for personal e-learning system (vPELS [where "v" stands for VLSI: very large scale integrated circuit])) architecture in the context of social network environment for VLSI Design. The main objective of vPELS is to develop individual skills on a specific subject--say, VLSI--and share resources with peers.…

  2. Transformational VLSI Design

    DEFF Research Database (Denmark)

    Rasmussen, Ole Steen

    This thesis introduces a formal approach to deriving VLSI circuits by the use of correctness-preserving transformations. Both the specification and the implementation are descibed by the relation based language Ruby. In order to prove the transformation rules a proof tool called RubyZF has been...... in connection with VLSI design are defined in terms of Pure Ruby and their properties proved. The design process is illustrated by several non-trivial examples of standard VLSI problems....

  3. Heterogeneous reconfigurable processors for real-time baseband processing from algorithm to architecture

    CERN Document Server

    Zhang, Chenxin; Öwall, Viktor

    2016-01-01

    This book focuses on domain-specific heterogeneous reconfigurable architectures, demonstrating for readers a computing platform which is flexible enough to support multiple standards, multiple modes, and multiple algorithms. The content is multi-disciplinary, covering areas of wireless communication, computing architecture, and circuit design. The platform described provides real-time processing capability with reasonable implementation cost, achieving balanced trade-offs among flexibility, performance, and hardware costs. The authors discuss efficient design methods for wireless communication processing platforms, from both an algorithm and architecture design perspective. Coverage also includes computing platforms for different wireless technologies and standards, including MIMO, OFDM, Massive MIMO, DVB, WLAN, LTE/LTE-A, and 5G. •Discusses reconfigurable architectures, including hardware building blocks such as processing elements, memory sub-systems, Network-on-Chip (NoC), and dynamic hardware reconfigur...

  4. A fast band–Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors

    Energy Technology Data Exchange (ETDEWEB)

    Aliaga, José I., E-mail: aliaga@uji.es [Depto. Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón (Spain); Alonso, Pedro [Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València (Spain); Badía, José M. [Depto. Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón (Spain); Chacón, Pablo [Dept. Biological Chemical Physics, Rocasolano Physics and Chemistry Institute, CSIC, Madrid (Spain); Davidović, Davor [Rudjer Bošković Institute, Centar za Informatiku i Računarstvo – CIR, Zagreb (Croatia); López-Blanco, José R. [Dept. Biological Chemical Physics, Rocasolano Physics and Chemistry Institute, CSIC, Madrid (Spain); Quintana-Ortí, Enrique S. [Depto. Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón (Spain)

    2016-03-15

    We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousands degrees of freedom and the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.

  5. Associative Pattern Recognition In Analog VLSI Circuits

    Science.gov (United States)

    Tawel, Raoul

    1995-01-01

    Winner-take-all circuit selects best-match stored pattern. Prototype cascadable very-large-scale integrated (VLSI) circuit chips built and tested to demonstrate concept of electronic associative pattern recognition. Based on low-power, sub-threshold analog complementary oxide/semiconductor (CMOS) VLSI circuitry, each chip can store 128 sets (vectors) of 16 analog values (vector components), vectors representing known patterns as diverse as spectra, histograms, graphs, or brightnesses of pixels in images. Chips exploit parallel nature of vector quantization architecture to implement highly parallel processing in relatively simple computational cells. Through collective action, cells classify input pattern in fraction of microsecond while consuming power of few microwatts.

  6. Fault tolerant, radiation hard, high performance digital signal processor

    Science.gov (United States)

    Holmann, Edgar; Linscott, Ivan R.; Maurer, Michael J.; Tyler, G. L.; Libby, Vibeke

    1990-01-01

    An architecture has been developed for a high-performance VLSI digital signal processor that is highly reliable, fault-tolerant, and radiation-hard. The signal processor, part of a spacecraft receiver designed to support uplink radio science experiments at the outer planets, organizes the connections between redundant arithmetic resources, register files, and memory through a shuffle exchange communication network. The configuration of the network and the state of the processor resources are all under microprogram control, which both maps the resources according to algorithmic needs and reconfigures the processing should a failure occur. In addition, the microprogram is reloadable through the uplink to accommodate changes in the science objectives throughout the course of the mission. The processor will be implemented with silicon compiler tools, and its design will be verified through silicon compilation simulation at all levels from the resources to full functionality. By blending reconfiguration with redundancy the processor implementation is fault-tolerant and reliable, and possesses the long expected lifetime needed for a spacecraft mission to the outer planets.

  7. Associative Memory design for the Fast TracK processor (FTK) at Atlas

    CERN Document Server

    Annovi, A; The ATLAS collaboration; Bossini, E; Crescioli, F; Dell'Orso, M; Giannetti, P; Piendibene, M; Sacco, I; Sartori, L; Tripiccione, R

    2010-01-01

    We describe a VLSI processor for pattern recognition based on Content Addressable Memory (CAM) architecture, optimized for on-line track finding in high-energy physics experiments. A large CAM bank stores all trajectories of interest and extracts the ones compatible with a given event. This task is naturally parallelized by a CAM architecture able to output identified trajectories, recognized among a huge amount of possible combinations, in just a few 100 MHz clock cycles. We have developed this device (called the AMchip03 processor), using 180 nm technology, for the Silicon Vertex Trigger (SVT) upgrade at CDF using a standard-cell VLSI design methodology. We propose now a new design (90 nm technology) where we introduce a full custom standard cell. This is a customized design that allows to maximize the pattern density and to minimize the power consumption. We discuss also possible future extensions based on 3-D technology. This processor has a flexible and easily configurable structure that makes it suitabl...

  8. TSC696E: a SPARC V8 processor with SEU protection built-in architecture

    Science.gov (United States)

    Corbiere, T.

    2002-12-01

    To fulfill the always increasing computing power required by airborne equipment's, ESA has carried out the development of the LEON, a SPARC v8 processor. This device, available as a soft macro, is commonly targeting commercial and space markets. Therefore, to resist against radiation inherent to space environment, its design is highly configurable so as SEU protection mechanisms can be implemented without significant penalty. The work described in this paper presents the design by Atmel(1) of a 0.35μm technology demonstrator implementing the Fault Tolerant version of the LEON aiming at the validation of the SPARC standard compliance as well as SEU protection efficiency verification. Both functional validation and radiation results are reviewed with special attention to SEU induced errors circumventing actions.

  9. Media processors using a new microsystem architecture designed for the Internet era

    Science.gov (United States)

    Wyland, David C.

    1999-12-01

    The demands of digital image processing, communications and multimedia applications are growing more rapidly than traditional design methods can fulfill them. Previously, only custom hardware designs could provide the performance required to meet the demands of these applications. However, hardware design has reached a crisis point. Hardware design can no longer deliver a product with the required performance and cost in a reasonable time for a reasonable risk. Software based designs running on conventional processors can deliver working designs in a reasonable time and with low risk but cannot meet the performance requirements. What is needed is a media processing approach that combines very high performance, a simple programming model, complete programmability, short time to market and scalability. The Universal Micro System (UMS) is a solution to these problems. The UMS is a completely programmable (including I/O) system on a chip that combines hardware performance with the fast time to market, low cost and low risk of software designs.

  10. Associative Memory Design for the Fast TracKer Processor (FTK) at ATLAS

    CERN Document Server

    Beretta, M; The ATLAS collaboration

    2011-01-01

    We describe a VLSI processor for pattern recognition based on Content Addressable Memory (CAM) architecture, optimized for on-line track finding in high-energy physics experiments. We have developed this device using 65 nm technology combining a full custom CAM cell with standard-cell control logic. The customized design maximizes the pattern density, minimizes the power consumption and implements the functionalities needed for the planned Fast Tracker (FTK) [2], an ATLAS trigger upgrade project at LHC. We introduce a new variable resolution pattern matching technique using “don’t care” bits to set the pattern-matching window for each pattern and each layer can be independently.

  11. Distributed processor allocation for launching applications in a massively connected processors complex

    Science.gov (United States)

    Pedretti, Kevin

    2008-11-18

    A compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. The compute processor allocators can share a common database of information pertinent to compute processor allocation. A communication path permits retrieval of information from the database independently of the compute processor allocators.

  12. HyperForest: A high performance multi-processor architecture for real-time intelligent systems

    Energy Technology Data Exchange (ETDEWEB)

    Garcia, P. Jr.; Rebeil, J.P. [Sandia National Labs., Albuquerque, NM (United States); Pollard, H. [Univ. of New Mexico, Albuquerque, NM (United States). Electrical Engineering and Computer Engineering Dept.

    1997-04-01

    Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expendability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doors for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE`s own production and disassembly activities.

  13. A High-Throughput, Adaptive FFT Architecture for FPGA-Based Space-Borne Data Processors

    Science.gov (United States)

    Nguyen, Kayla; Zheng, Jason; He, Yutao; Shah, Biren

    2010-01-01

    Historically, computationally-intensive data processing for space-borne instruments has heavily relied on ground-based computing resources. But with recent advances in functional densities of Field-Programmable Gate-Arrays (FPGAs), there has been an increasing desire to shift more processing on-board; therefore relaxing the downlink data bandwidth requirements. Fast Fourier Transforms (FFTs) are commonly used building blocks for data processing applications, with a growing need to increase the FFT block size. Many existing FFT architectures have mainly emphasized on low power consumption or resource usage; but as the block size of the FFT grows, the throughput is often compromised first. In addition to power and resource constraints, space-borne digital systems are also limited to a small set of space-qualified memory elements, which typically lag behind the commercially available counterparts in capacity and bandwidth. The bandwidth limitation of the external memory creates a bottleneck for a large, high-throughput FFT design with large block size. In this paper, we present the Multi-Pass Wide Kernel FFT (MPWK-FFT) architecture for a moderately large block size (32K) with considerations to power consumption and resource usage, as well as throughput. We will also show that the architecture can be easily adapted for different FFT block sizes with different throughput and power requirements. The result is completely contained within an FPGA without relying on external memories. Implementation results are summarized.

  14. Design and Implementation of an Efficient Software Communications Architecture Core Framework for a Digital Signal Processors Platform

    Directory of Open Access Journals (Sweden)

    Wael A. Murtada

    2011-01-01

    Full Text Available Problem statement: The Software Communications Architecture (SCA was developed to improve software reuse and interoperability in Software Defined Radios (SDR. However, there have been performance concerns since its conception. Arguably, the majority of the problems and inefficiencies associated with the SCA can be attributed to the assumption of modular distributed platforms relying on General Purpose Processors (GPPs to perform all signal processing. Approach: Significant improvements in cost and power consumption can be obtained by utilizing specialized and more efficient platforms. Digital Signal Processors (DSPs present such a platform and have been widely used in the communications industry. Improvements in development tools and middleware technology opened the possibility of fully integrating DSPs into the SCA. This approach takes advantage of the exceptional power, cost and performance characteristics of DSPs, while still enjoying the flexibility and portability of the SCA. Results: This study presents the design and implementation of an SCA Core Framework (CF for a TI TMS320C6416 DSP. The framework is deployed on a C6416 Device Cycle Accurate Simulator and TI C6416 Development board. The SCA CF is implemented by leveraging OSSIE, an open-source implementation of the SCA, to support the DSP platform. OIS’s ORBExpress DSP and DSP/BIOS are used as the middleware and operating system, respectively. A sample waveform was developed to demonstrate the framework’s functionality. Benchmark results for the framework and sample applications are provided. Conclusion: Benchmark results show that, using OIS ORBExpress DSP ORB middleware has an impact for decreasing the Software Memory Footprint and increasing the System Performance compared with PrismTech's e*ORB middleware.

  15. Memory Based Machine Intelligence Techniques in VLSI hardware

    OpenAIRE

    James, Alex Pappachen

    2012-01-01

    We briefly introduce the memory based approaches to emulate machine intelligence in VLSI hardware, describing the challenges and advantages. Implementation of artificial intelligence techniques in VLSI hardware is a practical and difficult problem. Deep architectures, hierarchical temporal memories and memory networks are some of the contemporary approaches in this area of research. The techniques attempt to emulate low level intelligence tasks and aim at providing scalable solutions to high ...

  16. Memory Based Machine Intelligence Techniques in VLSI hardware

    CERN Document Server

    James, Alex Pappachen

    2012-01-01

    We briefly introduce the memory based approaches to emulate machine intelligence in VLSI hardware, describing the challenges and advantages. Implementation of artificial intelligence techniques in VLSI hardware is a practical and difficult problem. Deep architectures, hierarchical temporal memories and memory networks are some of the contemporary approaches in this area of research. The techniques attempt to emulate low level intelligence tasks and aim at providing scalable solutions to high level intelligence problems such as sparse coding and contextual processing.

  17. ET: An Energy-efficient Processor Architecture for Embedded Tera-scale Computing%ET:一种能耗有效的高性能嵌入式处理器

    Institute of Scientific and Technical Information of China (English)

    杨乾明; 伍楠; 管茂林; 张春元; 全巍; 黄达飞

    2011-01-01

    As criterions and algorithms evolve and become molt complex,high performance embedded application demands the high performance and energy efficiency.The challenge,however,is how to turn the VLSI capability into the actual computing performance.This research proposed an energy efficient processor architecture named ET(Embedded Tera-scale Computing),which is composed of many lightweight VLIW processor cores.Also named small COILS.Each core executes a thread witll the mechanisms for explicitly managing the data and instructions.ET uses a hierarchical data registers to reduce the cost of delivering data,and the asymmetric and distributed instruction registers to deliver the instructions.In order to further reduce the energy,ET employs non-deep pipeline and simple control flow and optimizes the execution of loop body of applications.The primary result shows that ET Can achieve the 1TOPS performance and the 100GOPS/W efficiency when scaled to 40nm.%随着标准和算法的不断演进,高端嵌入式应用对性能和能耗提出了越来越高的要求.然而,能耗问题成为将VLSI潜力转换为实际应用需求的最大挑战,基于此,提出ET( Embedded Tera-scale Computing)处理器设计.ET以众多轻量级处理器(称为小核)来搭建目标处理器,每个小核都是一个基于显式数据和指令管理的VLIW处理器,能单独执行一个线程,采用层次化的寄存器文件和非对称全分布式指令寄存器来分别降低数据和指令的供应能耗.为了进一步降低功耗,ET处理器采用了较短的运算流水线和简单的循环控制结构,并面向应用领域针对循环体进行优化.初步的实验结果表明,在40nm工艺下,ET处理器可以获得单芯片1TOPS以上的性能,同时保持操作能效比在100GOPS/W以上.

  18. VLSI Universal Noiseless Coder

    Science.gov (United States)

    Rice, Robert F.; Lee, Jun-Ji; Fang, Wai-Chi

    1989-01-01

    Proposed universal noiseless coder (UNC) compresses stream of data signals for efficient transmission in channel of limited bandwidth. Noiseless in sense original data completely recoverable from output code. System built as very-large-scale integrated (VLSI) circuit, compressing data in real time at input rates as high as 24 Mb/s, and possibly faster, depending on specific design. Approach yields small, lightweight system operating reliably and consuming little power. Constructed as single, compact, low-power VLSI circuit chip. Design of VLSI circuit chip made specific to code algorithms. Entire UNC fabricated in single chip, worst-case power dissipation less than 1 W.

  19. Plasma processing for VLSI

    CERN Document Server

    Einspruch, Norman G

    1984-01-01

    VLSI Electronics: Microstructure Science, Volume 8: Plasma Processing for VLSI (Very Large Scale Integration) discusses the utilization of plasmas for general semiconductor processing. It also includes expositions on advanced deposition of materials for metallization, lithographic methods that use plasmas as exposure sources and for multiple resist patterning, and device structures made possible by anisotropic etching.This volume is divided into four sections. It begins with the history of plasma processing, a discussion of some of the early developments and trends for VLSI. The second section

  20. High-Throughput, Adaptive FFT Architecture for FPGA-Based Spaceborne Data Processors

    Science.gov (United States)

    NguyenKobayashi, Kayla; Zheng, Jason X.; He, Yutao; Shah, Biren N.

    2011-01-01

    Exponential growth in microelectronics technology such as field-programmable gate arrays (FPGAs) has enabled high-performance spaceborne instruments with increasing onboard data processing capabilities. As a commonly used digital signal processing (DSP) building block, fast Fourier transform (FFT) has been of great interest in onboard data processing applications, which needs to strike a reasonable balance between high-performance (throughput, block size, etc.) and low resource usage (power, silicon footprint, etc.). It is also desirable to be designed so that a single design can be reused and adapted into instruments with different requirements. The Multi-Pass Wide Kernel FFT (MPWK-FFT) architecture was developed, in which the high-throughput benefits of the parallel FFT structure and the low resource usage of Singleton s single butterfly method is exploited. The result is a wide-kernel, multipass, adaptive FFT architecture. The 32K-point MPWK-FFT architecture includes 32 radix-2 butterflies, 64 FIFOs to store the real inputs, 64 FIFOs to store the imaginary inputs, complex twiddle factor storage, and FIFO logic to route the outputs to the correct FIFO. The inputs are stored in sequential fashion into the FIFOs, and the outputs of each butterfly are sequentially written first into the even FIFO, then the odd FIFO. Because of the order of the outputs written into the FIFOs, the depth of the even FIFOs, which are 768 each, are 1.5 times larger than the odd FIFOs, which are 512 each. The total memory needed for data storage, assuming that each sample is 36 bits, is 2.95 Mbits. The twiddle factors are stored in internal ROM inside the FPGA for fast access time. The total memory size to store the twiddle factors is 589.9Kbits. This FFT structure combines the benefits of high throughput from the parallel FFT kernels and low resource usage from the multi-pass FFT kernels with desired adaptability. Space instrument missions that need onboard FFT capabilities such as the

  1. Study and Design for a Java Processor Based on RISC Architecture%基于RISC结构的Java处理器研究与设计

    Institute of Scientific and Technical Information of China (English)

    张金钟; 胡平

    2011-01-01

    文中结合PicoJava和JOP等一些经典的Java处理器的优势,设计了一种基于RISC结构的Java处理器.它充分利用了Java指令折叠技术和精简指令集处理器的优势,不仅降低了设计复杂度,而且在很大程度上提高了Java处理器的性能.%In this paper, a new Java processor was designed, which combines the advantages of the classic Java processors such as Picojava and JOP, has high performance and low complexity features, takes full advantage of Java instruction folding technology and the Reduced Instruction Set Computer architecture of advantage. So it not only reduces the design complexity, but has greatly improved the performance of Java processors.

  2. VLSI Reliability in Europe

    NARCIS (Netherlands)

    Verweij, Jan F.

    1993-01-01

    Several issue's regarding VLSI reliability research in Europe are discussed. Organizations involved in stimulating the activities on reliability by exchanging information or supporting research programs are described. Within one such program, ESPRIT, a technical interest group on IC reliability was

  3. Associative Memory Design for the Fast TracKer Processor (FTK)at ATLAS

    CERN Document Server

    Annovi, A; The ATLAS collaboration; Beretta, M; Bossini, E; Crescioli, F; Dell'Orso, M; Giannetti, P; Hoff, J; Liu, T; Liberali, V; Sacco, I; Schoening, A; Soltveit, H K; Stabile, A; Tripiccione, R

    2011-01-01

    We describe a VLSI processor for pattern recognition based on Content Addressable Memory (CAM) architecture, optimized for on-line track finding in high-energy physics experiments. A large CAM bank stores all trajectories of interest and extracts the ones compatible with a given event. This task is naturally parallelized by a CAM architecture able to output identified trajectories, recognized among a huge amount of possible combinations, in just a few 100 MHz clock cycles. We have developed this device (called the AMchip03 processor), using 180 nm technology, for the Silicon Vertex Trigger (SVT) upgrade at CDF [1] using a standard-cell VLSI design methodology. We propose a new design that introduces a full custom CAM cell and takes advantage of 65 nm technology. The customized design maximizes the pattern density, minimizes the power consumption and implements the functionalities needed for the planned Fast Tracker (FTK) [2], an ATLAS trigger upgrade project at LHC. We introduce a new variable resolution patt...

  4. Dual-core Itanium Processor

    CERN Multimedia

    2006-01-01

    Intel’s first dual-core Itanium processor, code-named "Montecito" is a major release of Intel's Itanium 2 Processor Family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). Itanium 2 is much more powerful than its predecessor. It has lower power consumption and thermal dissipation.

  5. Lithography for VLSI

    CERN Document Server

    Einspruch, Norman G

    1987-01-01

    VLSI Electronics Microstructure Science, Volume 16: Lithography for VLSI treats special topics from each branch of lithography, and also contains general discussion of some lithographic methods.This volume contains 8 chapters that discuss the various aspects of lithography. Chapters 1 and 2 are devoted to optical lithography. Chapter 3 covers electron lithography in general, and Chapter 4 discusses electron resist exposure modeling. Chapter 5 presents the fundamentals of ion-beam lithography. Mask/wafer alignment for x-ray proximity printing and for optical lithography is tackled in Chapter 6.

  6. Opto-VLSI-based tunable single-mode fiber laser.

    Science.gov (United States)

    Xiao, Feng; Alameh, Kamal; Lee, Tongtak

    2009-10-12

    A new tunable fiber ring laser structure employing an Opto-VLSI processor and an erbium-doped fiber amplifier (EDFA) is reported. The Opto-VLSI processor is able to dynamically select and couple a waveband from the gain spectrum of the EDFA into a fiber ring, leading to a narrow-linewidth high-quality tunable laser output. Experimental results demonstrate a tunable fiber laser of linewidth 0.05 nm and centre wavelength tuned over the C-band with a 0.05 nm step. The measured side mode suppression ratio (SMSR) is greater than 35 dB and the laser output power uniformity is better than 0.25 dB. The laser output is very stable at room temperature.

  7. Analog and VLSI circuits

    CERN Document Server

    Chen, Wai-Kai

    2009-01-01

    Featuring hundreds of illustrations and references, this book provides the information on analog and VLSI circuits. It focuses on analog integrated circuits, presenting the knowledge on monolithic device models, analog circuit cells, high performance analog circuits, RF communication circuits, and PLL circuits.

  8. Low-Power and Optimized VLSI Implementation of Compact Recursive Discrete Fourier Transform (RDFT Processor for the Computations of DFT and Inverse Modified Cosine Transform (IMDCT in a Digital Radio Mondiale (DRM and DRM+ Receiver

    Directory of Open Access Journals (Sweden)

    Sheau-Fang Lei

    2013-05-01

    Full Text Available This paper presents a compact structure of recursive discrete Fourier transform (RDFT with prime factor (PF and common factor (CF algorithms to calculate variable-length DFT coefficients. Low-power optimizations in VLSI implementation are applied to the proposed RDFT design. In the algorithm, for 256-point DFT computation, the results show that the proposed method greatly reduces the number of multiplications/additions/computational cycles by 97.40/94.31/46.50% compared to a recent approach. In chip realization, the core size and chip size are, respectively, 0.84 × 0.84 and 1.38 × 1.38 mm2. The power consumption for the 288- and 256-point DFT computations are, respectively, 10.2 (or 0.1051 and 11.5 (or 0.1176 mW at 25 (or 0.273 MHz simulated by NanoSim. It would be more efficient and more suitable than previous works for DRM and DRM+ applications.

  9. 软件无线电数字信号处理器体系结构研究%Software Defined Radio Digital Signal Processor Architecture Research

    Institute of Scientific and Technical Information of China (English)

    刘衡竹; 莫方政; 张波涛; 赵恒; 刘冬培; 陈艇; 周理

    2009-01-01

    Software defined radio (SDR) has won much interest for being considered to be in line with the trend of wireless communication developrnent. Now the digital signal processor (DSP) is the bottleneck of software defined radio. The advantages and disadvantages of diverse architecture of software defined radio digital signal processor are summarized, and then the trends of software defined radio digital signal processor are discussed.%软件无线电因被认为是无线通信技术未来的发展趋势而受到广泛关注.目前数字信号处理器是软件无线电发展的瓶颈.通过分析、比较目前几种较为典型的软件无线电数字信号处理器结构,归纳总结各种结构各自设计出发点和优缺点,并对软件无线电数字信号处理器的发展趋势做了展望.

  10. Design concepts for a virtualizable embedded MPSoC architecture enabling virtualization in embedded multi-processor systems

    CERN Document Server

    Biedermann, Alexander

    2014-01-01

    Alexander Biedermann presents a generic hardware-based virtualization approach, which may transform an array of any off-the-shelf embedded processors into a multi-processor system with high execution dynamism. Based on this approach, he highlights concepts for the design of energy aware systems, self-healing systems as well as parallelized systems. For the latter, the novel so-called Agile Processing scheme is introduced by the author, which enables a seamless transition between sequential and parallel execution schemes. The design of such virtualizable systems is further aided by introduction

  11. A Motion Adaptive De-interlacing Technique and VLSI Architecture%一种运动自适应去隔行技术及其VLSI结构

    Institute of Scientific and Technical Information of China (English)

    普玉伟; 叶兵; 曾德瑞; 蒋特林

    2011-01-01

    An efficient motion adaptive de-interlacing is proposed in this paper. The mixing pixels is classified to fast motion, slow motion or static region according to the motion detection of the same parity field, the corresponding interpolation method is used in different motion region. The edge detection uses the improved ELA algorithm which overcomes the traditional ELA algorithm's deficiency at processing horizontal edge, and the edge is preserved effectively. Compared with motion compensated algorithm, our proposed algorithm required lower computational complexity, and it is easier to implement by VLSI .The experiment shows that the proposed algorithm gains better de-interlacing and high peak signal-to-noise ratio.%提出一种有效的运动自适应去隔行算法.该算法通过对同极性的相邻场进行运动检测,把插值点所处的区域分为快速运动区域、慢速运动区域和静止区域,对不同的区域采用不同的插值算法.在边缘检测方面,采用改进型ELA算法克服了传统的ELA算法处理水平边缘方面的不足,使边缘得到有效保护.与运动补偿算法相比,该算法计算复杂度较低,易于VLSI实现.实验结果显示,该算法取得了良好的去隔行效果和较高的峰值信噪比.

  12. CASPER: Embedding Power Estimation and Hardware-Controlled Power Management in a Cycle-Accurate Micro-Architecture Simulation Platform for Many-Core Multi-Threading Heterogeneous Processors

    Directory of Open Access Journals (Sweden)

    Arun Ravindran

    2012-02-01

    Full Text Available Despite the promising performance improvement observed in emerging many-core architectures in high performance processors, high power consumption prohibitively affects their use and marketability in the low-energy sectors, such as embedded processors, network processors and application specific instruction processors (ASIPs. While most chip architects design power-efficient processors by finding an optimal power-performance balance in their design, some use sophisticated on-chip autonomous power management units, which dynamically reduce the voltage or frequencies of idle cores and hence extend battery life and reduce operating costs. For large scale designs of many-core processors, a holistic approach integrating both these techniques at different levels of abstraction can potentially achieve maximal power savings. In this paper we present CASPER, a robust instruction trace driven cycle-accurate many-core multi-threading micro-architecture simulation platform where we have incorporated power estimation models of a wide variety of tunable many-core micro-architectural design parameters, thus enabling processor architects to explore a sufficiently large design space and achieve power-efficient designs. Additionally CASPER is designed to accommodate cycle-accurate models of hardware controlled power management units, enabling architects to experiment with and evaluate different autonomous power-saving mechanisms to study the run-time power-performance trade-offs in embedded many-core processors. We have implemented two such techniques in CASPER–Chipwide Dynamic Voltage and Frequency Scaling, and Performance Aware Core-Specific Frequency Scaling, which show average power savings of 35.9% and 26.2% on a baseline 4-core SPARC based architecture respectively. This power saving data accounts for the power consumption of the power management units themselves. The CASPER simulation platform also provides users with complete support of SPARCV9

  13. Low complexity VLSI implementation of CORDIC-based exponent calculation for neural networks

    Science.gov (United States)

    Aggarwal, Supriya; Khare, Kavita

    2012-11-01

    This article presents a low hardware complexity for exponent calculations based on CORDIC. The proposed CORDIC algorithm is designed to overcome major drawbacks (scale-factor compensation, low range of convergence and optimal selection of micro-rotations) of the conventional CORDIC in hyperbolic mode of operation. The micro-rotations are identified using leading-one bit detection with uni-direction rotations to eliminate redundant iterations and improve throughput. The efficiency and performance of the processor are independent of the probability of rotation angles being known prior to implementation. The eight-staged pipelined architecture implementation requires an 8 × N ROM in the pre-processing unit for storing the initial coordinate values; it no longer requires the ROM for storing the elementary angles. It provides an area-time efficient design for VLSI implementation for calculating exponents in activation functions and Gaussain Potential Functions (GPF) in neural networks. The proposed CORDIC processor requires 32.68% less adders and 72.23% less registers compared to that of the conventional design. The proposed design when implemented on Virtex 2P (2vp50ff1148-6) device, dissipates 55.58% less power and has 45.09% less total gate count and 16.91% less delay as compared to Xilinx CORDIC Core. The detailed algorithm design along with FPGA implementation and area and time complexities is presented.

  14. Evaluating New Architectural Features Of The Intel(R) Xeon(R) 7500 Processor For Hpc Workloads

    OpenAIRE

    2011-01-01

    In this paper we take a look at what the Intel Xeon Processor 7500 family, code namedNehalem-EX, brings to high performance computing. We compare two families of Intel Xeonbased systems (Intel Xeon 7500 and Intel Xeon 5600) and present a performance evolutionof 16 node clusters based on these CPUs. We compare CPU generations utilizing dual socketplatforms and a cluster across a number of HPC benchmarks and focused on differentperformance field and aspect. We will evaluate also technologies an...

  15. Very Large Scale Integration (VLSI).

    Science.gov (United States)

    Yeaman, Andrew R. J.

    Very Large Scale Integration (VLSI), the state-of-the-art production techniques for computer chips, promises such powerful, inexpensive computing that, in the future, people will be able to communicate with computer devices in natural language or even speech. However, before full-scale VLSI implementation can occur, certain salient factors must be…

  16. A special purpose silicon compiler for designing supercomputing VLSI systems

    Science.gov (United States)

    Venkateswaran, N.; Murugavel, P.; Kamakoti, V.; Shankarraman, M. J.; Rangarajan, S.; Mallikarjun, M.; Karthikeyan, B.; Prabhakar, T. S.; Satish, V.; Venkatasubramaniam, P. R.

    1991-01-01

    Design of general/special purpose supercomputing VLSI systems for numeric algorithm execution involves tackling two important aspects, namely their computational and communication complexities. Development of software tools for designing such systems itself becomes complex. Hence a novel design methodology has to be developed. For designing such complex systems a special purpose silicon compiler is needed in which: the computational and communicational structures of different numeric algorithms should be taken into account to simplify the silicon compiler design, the approach is macrocell based, and the software tools at different levels (algorithm down to the VLSI circuit layout) should get integrated. In this paper a special purpose silicon (SPS) compiler based on PACUBE macrocell VLSI arrays for designing supercomputing VLSI systems is presented. It is shown that turn-around time and silicon real estate get reduced over the silicon compilers based on PLA's, SLA's, and gate arrays. The first two silicon compiler characteristics mentioned above enable the SPS compiler to perform systolic mapping (at the macrocell level) of algorithms whose computational structures are of GIPOP (generalized inner product outer product) form. Direct systolic mapping on PLA's, SLA's, and gate arrays is very difficult as they are micro-cell based. A novel GIPOP processor is under development using this special purpose silicon compiler.

  17. An 8×8/4×4 Adaptive Hadamard Transform Based FME VLSI Architecture for 4K×2K H.264/AVC Encoder

    Science.gov (United States)

    Fan, Yibo; Liu, Jialiang; Zhang, Dexue; Zeng, Xiaoyang; Chen, Xinhua

    Fidelity Range Extension (FRExt) (i.e. High Profile) was added to the H.264/AVC recommendation in the second version. One of the features included in FRExt is the Adaptive Block-size Transform (ABT). In order to conform to the FRExt, a Fractional Motion Estimation (FME) architecture is proposed to support the 8×8/4×4 adaptive Hadamard Transform (8×8/4×4 AHT). The 8×8/4×4 AHT circuit contributes to higher throughput and encoding performance. In order to increase the utilization of SATD (Sum of Absolute Transformed Difference) Generator (SG) in unit time, the proposed architecture employs two 8-pel interpolators (IP) to time-share one SG. These two IPs can work in turn to provide the available data continuously to the SG, which increases the data throughput and significantly reduces the cycles that are needed to process one Macroblock. Furthermore, this architecture also exploits the linear feature of Hadamard Transform to generate the quarter-pel SATD. This method could help to shorten the long datapath in the second-step of two-iteration FME algorithm. Finally, experimental results show that this architecture could be used in the applications requiring different performances by adjusting the supported modes and operation frequency. It can support the real-time encoding of the seven-mode 4K×2K@24fps or six-mode 4K×2K@30fps video sequences.

  18. Design Principles for Synthesizable Processor Cores

    DEFF Research Database (Denmark)

    Schleuniger, Pascal; McKee, Sally A.; Karlsson, Sven

    2012-01-01

    As FPGAs get more competitive, synthesizable processor cores become an attractive choice for embedded computing. Currently popular commercial processor cores do not fully exploit current FPGA architectures. In this paper, we propose general design principles to increase instruction throughput...... on FPGA-based processor cores: first, superpipelining enables higher-frequency system clocks, and second, predicated instructions circumvent costly pipeline stalls due to branches. To evaluate their effects, we develop Tinuso, a processor architecture optimized for FPGA implementation. We demonstrate...

  19. Microfluidic very large scale integration (VLSI) modeling, simulation, testing, compilation and physical synthesis

    CERN Document Server

    Pop, Paul; Madsen, Jan

    2016-01-01

    This book presents the state-of-the-art techniques for the modeling, simulation, testing, compilation and physical synthesis of mVLSI biochips. The authors describe a top-down modeling and synthesis methodology for the mVLSI biochips, inspired by microelectronics VLSI methodologies. They introduce a modeling framework for the components and the biochip architecture, and a high-level microfluidic protocol language. Coverage includes a topology graph-based model for the biochip architecture, and a sequencing graph to model for biochemical application, showing how the application model can be obtained from the protocol language. The techniques described facilitate programmability and automation, enabling developers in the emerging, large biochip market. · Presents the current models used for the research on compilation and synthesis techniques of mVLSI biochips in a tutorial fashion; · Includes a set of "benchmarks", that are presented in great detail and includes the source code of several of the techniques p...

  20. Reconfigurable Communication Processor:A New Approach for Network Processor

    Institute of Scientific and Technical Information of China (English)

    孙华; 陈青山; 张文渊

    2003-01-01

    As the traditional RISC +ASIC/ASSP approach for network processor design can not meet the today'srequirements, this paper described an alternate approach, Reconfigurable Processing Architecture, to boost theperformance to ASIC level while reserve the programmability of the traditional RISC based system. This papercovers both the hardware architecture and the software development environment architecture.

  1. Evaluating New Architectural Features Of The Intel(R Xeon(R 7500 Processor For Hpc Workloads

    Directory of Open Access Journals (Sweden)

    Paweł Gepner

    2011-01-01

    Full Text Available In this paper we take a look at what the Intel Xeon Processor 7500 family, code namedNehalem-EX, brings to high performance computing. We compare two families of Intel Xeonbased systems (Intel Xeon 7500 and Intel Xeon 5600 and present a performance evolutionof 16 node clusters based on these CPUs. We compare CPU generations utilizing dual socketplatforms and a cluster across a number of HPC benchmarks and focused on differentperformance field and aspect. We will evaluate also technologies and features like Intels HyperThreading Technology (HT and Intel Turbo Boost Technology (Turbo Mode and theperformance implication of these technologies for HPC.

  2. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    Science.gov (United States)

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep Learning algorithm is widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of its best-in-class recognition accuracy compared to hand-crafted algorithm and shallow learning based algorithms. Long learning time caused by its complex structure, however, limits its usage only in high-cost servers or many-core GPU platforms so far. On the other hand, the demand on customized pattern recognition within personal devices will grow gradually as more deep learning applications will be developed. This paper presents a SoC implementation to enable deep learning applications to run with low cost platforms such as mobile or portable devices. Different from conventional works which have adopted massively-parallel architecture, this work adopts task-flexible architecture and exploits multiple parallelism to cover complex functions of convolutional deep belief network which is one of popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable system. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power, and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.

  3. UW VLSI chip tester

    Science.gov (United States)

    McKenzie, Neil

    1989-12-01

    We present a design for a low-cost, functional VLSI chip tester. It is based on the Apple MacIntosh II personal computer. It tests chips that have up to 128 pins. All pin drivers of the tester are bidirectional; each pin is programmed independently as an input or an output. The tester can test both static and dynamic chips. Rudimentary speed testing is provided. Chips are tested by executing C programs written by the user. A software library is provided for program development. Tests run under both the Mac Operating System and A/UX. The design is implemented using Xilinx Logic Cell Arrays. Price/performance tradeoffs are discussed.

  4. The VLSI handbook

    CERN Document Server

    Chen, Wai-Kai

    2007-01-01

    Written by a stellar international panel of expert contributors, this handbook remains the most up-to-date, reliable, and comprehensive source for real answers to practical problems. In addition to updated information in most chapters, this edition features several heavily revised and completely rewritten chapters, new chapters on such topics as CMOS fabrication and high-speed circuit design, heavily revised sections on testing of digital systems and design languages, and two entirely new sections on low-power electronics and VLSI signal processing. An updated compendium of references and othe

  5. Mixed voltage VLSI design

    Science.gov (United States)

    Panwar, Ramesh; Rennels, David; Alkalaj, Leon

    1993-01-01

    A technique for minimizing the power dissipated in a Very Large Scale Integration (VLSI) chip by lowering the operating voltage without any significant penalty in the chip throughput even though low voltage operation results in slower circuits. Since the overall throughput of a VLSI chip depends on the speed of the critical path(s) in the chip, it may be possible to sustain the throughput rates attained at higher voltages by operating the circuits in the critical path(s) with a high voltage while operating the other circuits with a lower voltage to minimize the power dissipation. The interface between the gates which operate at different voltages is crucial for low power dissipation since the interface may possibly have high static current dissipation thus negating the gains of the low voltage operation. The design of a voltage level translator which does the interface between the low voltage and high voltage circuits without any significant static dissipation is presented. Then, the results of the mixed voltage design using a greedy algorithm on three chips for various operating voltages are presented.

  6. Optimal speech codec implementation on ARM9E (v5E architecture) RISC processor for next-generation mobile multimedia

    Science.gov (United States)

    Bangla, Ajay Kumar; Vinay, M. K.; Suresh Babu, P. V.

    2004-01-01

    The mobile phone is undergoing a rapid evolution from a voice and limited text-messaging device to a complete multimedia client. RISC processors are predominantly used in these devices due to low cost, time to market and power consumption. The growing demand for signal processing performance on these platforms has triggered a convergence of RISC, CISC and DSP technologies on to a single core/system. This convergence leads to a multitude of challenges for optimal usage of available processing power. Voice codecs, which have been traditionally implemented on DSP platforms, have been adapted to sole RISC platforms as well. In this paper, the issues involved in optimizing a standard vocoder for RISC-DSP convergence platform (DSP enhanced RISC platforms) are addressed. Our optimization techniques are based on identification of algorithms, which could exploit either the DSP features or the RISC features or both. A few algorithmic modifications have also been suggested. By a systematic application of these optimization techniques for a GSM-AMR (NB) codec on ARM9E core, we could achieve more than 77% improvement over the baseline codec and almost 33% over that optimized for a RISC platform (ARM9T) alone in terms of processing cycle requirements. The optimization techniques outlined are generic in nature and are applicable to other vocoders on similar 'application-platform" combinations.

  7. Embedded Processor Oriented Compiler Infrastructure

    Directory of Open Access Journals (Sweden)

    DJUKIC, M.

    2014-08-01

    Full Text Available In the recent years, research of special compiler techniques and algorithms for embedded processors broaden the knowledge of how to achieve better compiler performance in irregular processor architectures. However, industrial strength compilers, besides ability to generate efficient code, must also be robust, understandable, maintainable, and extensible. This raises the need for compiler infrastructure that provides means for convenient implementation of embedded processor oriented compiler techniques. Cirrus Logic Coyote 32 DSP is an example that shows how traditional compiler infrastructure is not able to cope with the problem. That is why the new compiler infrastructure was developed for this processor, based on research. in the field of embedded system software tools and experience in development of industrial strength compilers. The new infrastructure is described in this paper. Compiler generated code quality is compared with code generated by the previous compiler for the same processor architecture.

  8. Green Building between Tradition and Modernity Study Comparative Analysis between Conventional Methods and Updated Styles of Design and Architecture Processors

    Directory of Open Access Journals (Sweden)

    H Elshimy

    2017-03-01

    Full Text Available Green house   concept appeared from the ancient to the modern age ages and there is a tendency to use a traditional architecture with a pristine ecological environment areas and through sophisticated systems arrived to modern systems of the upgraded systems by Treatment architectural achieve environmental   sustainability   in   recent   years,   sustainability concept has become the common interest of numerous disciplines. The reason for this popularity is to perform the sustainable development. The Concept of Green Architecture, also known as "sustainable architecture” or “green house,” is the theory, science and style of buildings designed and constructed in accordance   with environmentally   friendly   principles.   Green house strives to minimize the number of resources consumed in the   building's  construction,   use   and   operation,   as  well  as curtailing  the  harm  done  to  the  environment  through  the emission, pollution and waste of its components.To design, construct, operate and maintain buildings energy, water and new materials are utilized as well as amounts of waste causing negative effects to health and environment is generated. In order to limit these effects and design environmentally sound and resource efficient buildings; "green building systems" must be introduced, clarified, understood and practiced.This paper aims at highlighting these difficult and complex issues of sustainability which encompass the scope of almost every aspect of human life.

  9. VLSI circuits for bidirectional interface to peripheral and visceral nerves.

    Science.gov (United States)

    Greenwald, Elliot; Wang, Qihong; Thakor, Nitish V

    2015-08-01

    This paper presents an architecture for sensing nerve signals and delivering functional electrical stimulation to peripheral and visceral nerves. The design is based on the very large scale integration (VLSI) technology and amenable to interface to microelectrodes and building a fully implantable system. The proposed stimulator was tested on the vagus nerve and is under further evaluation and testing of various visceral nerves and their functional effects on the innervated organs.

  10. An efficient ASIC implementation of 16-channel on-line recursive ICA processor for real-time EEG system.

    Science.gov (United States)

    Fang, Wai-Chi; Huang, Kuan-Ju; Chou, Chia-Ching; Chang, Jui-Chung; Cauwenberghs, Gert; Jung, Tzyy-Ping

    2014-01-01

    This is a proposal for an efficient very-large-scale integration (VLSI) design, 16-channel on-line recursive independent component analysis (ORICA) processor ASIC for real-time EEG system, implemented with TSMC 40 nm CMOS technology. ORICA is appropriate to be used in real-time EEG system to separate artifacts because of its highly efficient and real-time process features. The proposed ORICA processor is composed of an ORICA processing unit and a singular value decomposition (SVD) processing unit. Compared with previous work [1], this proposed ORICA processor has enhanced effectiveness and reduced hardware complexity by utilizing a deeper pipeline architecture, shared arithmetic processing unit, and shared registers. The 16-channel random signals which contain 8-channel super-Gaussian and 8-channel sub-Gaussian components are used to analyze the dependence of the source components, and the average correlation coefficient is 0.95452 between the original source signals and extracted ORICA signals. Finally, the proposed ORICA processor ASIC is implemented with TSMC 40 nm CMOS technology, and it consumes 15.72 mW at 100 MHz operating frequency.

  11. VLSI circuits for high speed data conversion

    Science.gov (United States)

    Wooley, Bruce A.

    1994-05-01

    The focus of research has been the study of fundamental issues in the design and testing of data conversion interfaces for high performance VLSI signal processing and communications systems. Because of the increased speed and density that accompany the continuing scaling of VLSI technologies, digital means of processing, communicating, and storing information are rapidly displacing their analog counterparts across a broadening spectrum of applications. In such systems, the limitations on system performance generally occur at the interfaces between the digital representation of information and the analog environment in which the system is embedded. Specific results of this research include the design and implementation of low-power BiCMOS comparators and sample-and-hold amplifiers operating at clock rates as high as 200 MHz, the design and integration of a 12-bit, 5 MHz CMOS A/D converter employing a two-step architecture and a novel self-calibrating comparator, the design and integration of an optoelectronic communications receiver front-end in a GaAs-on-Si technology, the initiation of research into the use of an active silicon substrate probe card for fully testing high-performance mixed-signal circuits at the wafer level, and a preliminary study of means for correcting dynamic errors in high-performance A/D converters.

  12. An Efficient Architecture Design of Reconfigurable Float-point FFT Processor%高效可配置浮点FFT处理器设计

    Institute of Scientific and Technical Information of China (English)

    桑红石; 高伟

    2012-01-01

    Large resource cost is the design bottleneck of high-precision float-point FFT processor,a novel R2/22SDF reconfigurable architecture using shared-butterfly which employs single-port-based FIFO.Radix 2/22algorithm and pipeline architecture,which is suitable for float-point design,can reduce the multiplicative complexity and improve the multiplication efficiency.The FIFO memory using double-width single-port ram can avoid the larger area and power coat of dual-port ram.Two butterfly units can be merged by the proposed shared-butterfly architecture,which solves the low utilization factor problem of traditional single-path-delay-feedback architecture.The float-point design cost is efficiently reduced and the calculator utilization factor is improved,compared with the traditional pipeline method.%为了克服高精度浮点FFT处理器具有较大资源开销的设计瓶颈,采用基于单口存储器的FIFO构建共享蝶形结构的R2/22SDF流水可配置结构.采用适合浮点设计的基2/22算法实现流水结构,不仅有利于可配置电路的实现,还能够有效减少复数乘法次数,提高复数乘法器的计算效率.采用双倍数据位宽的单口存储器实现FIFO存储器,有效避免了双口存储器面积和功耗较大的问题.改进的蝶形共享结构实现两级蝶形的合并,解决了单路径延迟反馈流水线结构蝶形单元利用率低的问题.与传统流水线结构FFT处理器设计相比,有效降低了浮点设计中的资源开销,提高了计算单元的利用效率.

  13. Tiled Multicore Processors

    Science.gov (United States)

    Taylor, Michael B.; Lee, Walter; Miller, Jason E.; Wentzlaff, David; Bratt, Ian; Greenwald, Ben; Hoffmann, Henry; Johnson, Paul R.; Kim, Jason S.; Psota, James; Saraf, Arvind; Shnidman, Nathan; Strumpen, Volker; Frank, Matthew I.; Amarasinghe, Saman; Agarwal, Anant

    For the last few decades Moore’s Law has continually provided exponential growth in the number of transistors on a single chip. This chapter describes a class of architectures, called tiled multicore architectures, that are designed to exploit massive quantities of on-chip resources in an efficient, scalable manner. Tiled multicore architectures combine each processor core with a switch to create a modular element called a tile. Tiles are replicated on a chip as needed to create multicores with any number of tiles. The Raw processor, a pioneering example of a tiled multicore processor, is examined in detail to explain the philosophy, design, and strengths of such architectures. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance. Central to achieving this goal is Raw’s ability to exploit all forms of parallelism, including ILP, DLP, TLP, and Stream parallelism. Raw approaches this challenge by implementing plenty of on-chip resources - including logic, wires, and pins - in a tiled arrangement, and exposing them through a new ISA, so that the software can take advantage of these resources for parallel applications. Compared to a traditional superscalar processor, Raw performs within a factor of 2x for sequential applications with a very low degree of ILP, about 2x-9x better for higher levels of ILP, and 10x-100x better when highly parallel applications are coded in a stream language or optimized by hand.

  14. Compilation Techniques Specific for a Hardware Cryptography-Embedded Multimedia Mobile Processor

    Directory of Open Access Journals (Sweden)

    Masa-aki FUKASE

    2007-12-01

    Full Text Available The development of single chip VLSI processors is the key technology of ever growing pervasive computing to answer overall demands for usability, mobility, speed, security, etc. We have so far developed a hardware cryptography-embedded multimedia mobile processor architecture, HCgorilla. Since HCgorilla integrates a wide range of techniques from architectures to applications and languages, one-sided design approach is not always useful. HCgorilla needs more complicated strategy, that is, hardware/software (H/S codesign. Thus, we exploit the software support of HCgorilla composed of a Java interface and parallelizing compilers. They are assumed to be installed in servers in order to reduce the load and increase the performance of HCgorilla-embedded clients. Since compilers are the essence of software's responsibility, we focus in this article on our recent results about the design, specifications, and prototyping of parallelizing compilers for HCgorilla. The parallelizing compilers are composed of a multicore compiler and a LIW compiler. They are specified to abstract parallelism from executable serial codes or the Java interface output and output the codes executable in parallel by HCgorilla. The prototyping compilers are written in Java. The evaluation by using an arithmetic test program shows the reasonability of the prototyping compilers compared with hand compilers.

  15. Compilation Techniques Specific for a Hardware Cryptography-Embedded Multimedia Mobile Processor

    Directory of Open Access Journals (Sweden)

    Masa-aki FUKASE

    2007-12-01

    Full Text Available The development of single chip VLSI processors is the key technology of ever growing pervasive computing to answer overall demands for usability, mobility, speed, security, etc. We have so far developed a hardware cryptography-embedded multimedia mobile processor architecture, HCgorilla. Since HCgorilla integrates a wide range of techniques from architectures to applications and languages, one-sided design approach is not always useful. HCgorilla needs more complicated strategy, that is, hardware/software (H/S codesign. Thus, we exploit the software support of HCgorilla composed of a Java interface and parallelizing compilers. They are assumed to be installed in servers in order to reduce the load and increase the performance of HCgorilla-embedded clients. Since compilers are the essence of software's responsibility, we focus in this article on our recent results about the design, specifications, and prototyping of parallelizing compilers for HCgorilla. The parallelizing compilers are composed of a multicore compiler and a LIW compiler. They are specified to abstract parallelism from executable serial codes or the Java interface output and output the codes executable in parallel by HCgorilla. The prototyping compilers are written in Java. The evaluation by using an arithmetic test program shows the reasonability of the prototyping compilers compared with hand compilers.

  16. Secure Co-processor and Billboard Manager Based Architecture Help to Protect & Store the Citrix Xenserver Based Virtual Data.

    Directory of Open Access Journals (Sweden)

    Debabrata Sarddar

    2014-01-01

    Full Text Available Any discussion of Cloud computing typically begins with virtualization. Virtualization is critical to cloud computing because it simplifies the delivery of services by providing a platform for optimizing complex IT resources in a scalable manner, which is what makes cloud computing so cost effective. Desktop virtualization, often called client virtualization, is a virtualization technology used to separate a computer desktop environment from the physical computer. Desktop virtualization is considered a type of client-server computing model because the "virtualized" desktop is stored on a centralized, or remote, server and not the physical machine being virtualized. Desktop virtualization "virtualizes desktop computers" and these virtual desktop environments are "served" to users on the network. In this paper, we proposed a secure cloud data center architecture that made by an application virtualization product like citrix xenapp/citrix xen desktop and with a proposed model that help us to encrypt and store the data like virtualized desktop or virtualized application in a suitable storage area.

  17. Synaptic dynamics in analog VLSI.

    Science.gov (United States)

    Bartolozzi, Chiara; Indiveri, Giacomo

    2007-10-01

    Synapses are crucial elements for computation and information transfer in both real and artificial neural systems. Recent experimental findings and theoretical models of pulse-based neural networks suggest that synaptic dynamics can play a crucial role for learning neural codes and encoding spatiotemporal spike patterns. Within the context of hardware implementations of pulse-based neural networks, several analog VLSI circuits modeling synaptic functionality have been proposed. We present an overview of previously proposed circuits and describe a novel analog VLSI synaptic circuit suitable for integration in large VLSI spike-based neural systems. The circuit proposed is based on a computational model that fits the real postsynaptic currents with exponentials. We present experimental data showing how the circuit exhibits realistic dynamics and show how it can be connected to additional modules for implementing a wide range of synaptic properties.

  18. VLSI IMPLEMENTATION OF CHANNEL ESTIMATION FOR MIMO-OFDM TRANSCEIVER

    Directory of Open Access Journals (Sweden)

    Joseph Gladwin Sekar

    2013-01-01

    Full Text Available In this study the VLSI architecture for MIMO-OFDM transceiver and the algorithm for the implementation of MMSE detection in MIMO-OFDM system is proposed. The implemented MIMO-OFDM system is capable of transmitting data at high throughput in physical layer and provides optimized hardware resources while achieving the same data rate. The proposed architecture has low latency, high throughput and efficient resource utilization. The result obtained is compared with the MATLAB results for verification. The main aim is to reduce the hardware complexity of the channel estimation.

  19. Efficient Algorithm and Architecture of Critical-Band Transform for Low-Power Speech Applications

    Directory of Open Access Journals (Sweden)

    Woon-Seng Gan

    2007-01-01

    Full Text Available An efficient algorithm and its corresponding VLSI architecture for the critical-band transform (CBT are developed to approximate the critical-band filtering of the human ear. The CBT consists of a constant-bandwidth transform in the lower frequency range and a Brown constant-Q transform (CQT in the higher frequency range. The corresponding VLSI architecture is proposed to achieve significant power efficiency by reducing the computational complexity, using pipeline and parallel processing, and applying the supply voltage scaling technique. A 21-band Bark scale CBT processor with a sampling rate of 16 kHz is designed and simulated. Simulation results verify its suitability for performing short-time spectral analysis on speech. It has a better fitting on the human ear critical-band analysis, significantly fewer computations, and therefore is more energy-efficient than other methods. With a 0.35 μm CMOS technology, it calculates a 160-point speech in 4.99 milliseconds at 234 kHz. The power dissipation is 15.6 μW at 1.1 V. It achieves 82.1% power reduction as compared to a benchmark 256-point FFT processor.

  20. Efficient Algorithm and Architecture of Critical-Band Transform for Low-Power Speech Applications

    Directory of Open Access Journals (Sweden)

    Gan Woon-Seng

    2007-01-01

    Full Text Available An efficient algorithm and its corresponding VLSI architecture for the critical-band transform (CBT are developed to approximate the critical-band filtering of the human ear. The CBT consists of a constant-bandwidth transform in the lower frequency range and a Brown constant- transform (CQT in the higher frequency range. The corresponding VLSI architecture is proposed to achieve significant power efficiency by reducing the computational complexity, using pipeline and parallel processing, and applying the supply voltage scaling technique. A 21-band Bark scale CBT processor with a sampling rate of 16 kHz is designed and simulated. Simulation results verify its suitability for performing short-time spectral analysis on speech. It has a better fitting on the human ear critical-band analysis, significantly fewer computations, and therefore is more energy-efficient than other methods. With a 0.35 m CMOS technology, it calculates a 160-point speech in 4.99 milliseconds at 234 kHz. The power dissipation is 15.6 W at 1.1 V. It achieves 82.1 power reduction as compared to a benchmark 256-point FFT processor.

  1. Design of Analog VLSI Architecture for DCT

    OpenAIRE

    2012-01-01

    When implementing real-time DSP algorithms on digital circuits, the system is always constrained by limited speed, accuracy and roundoff noise. These limitations must be taken into account for the design and implementation stages. Doubling the dynamic rate of theanalog DCT is expensive, whereas in digital DCT an addition of 1 bit in data path is adequate. This paper proposes a novel approach ofanalog CMOS implementation technique for Digital Signal Processing (DSP) algorithms to reduce the ar...

  2. A digital neuron-type processor and its VLSI design

    Science.gov (United States)

    Akel, H.; Habib, Mahmoud K.

    1989-05-01

    A set of neuron-type circuits elements based on logic gate circuits with multiinput multifan output capability is described. Three types of elements are introduced, one called the cell body with its dendritic inputs and synaptic junction, another representing the axon base, and the axon circuit. These three elements are cascaded to form a neuron-type processing element. The circuit performs input temporal and spatial summation as well as thresholding. The entire neuron circuit is simulated and a design is given using VSLI techniques.

  3. Design and Implementation of VLSI Prime Factor Algorithm Processor.

    Science.gov (United States)

    1987-12-01

    for A, 1i ’ 1 1 Fh Ai ,,r equjAtIInI. art, ( , 4 10 4,t’ 4 ( - /’ cr tht, (-arr% sur ma% akL, be represented as Figure 36 Carry Select Adder Blocking... Select Adder Blocking .......................................................... 81 Figure 37: ALU Adder Cell...ALU Logic Implementation............................................................ 81 viii J,.. in List of Figures (continued) Figure 36: Carry

  4. A cost-effective methodology for the design of massively-parallel VLSI functional units

    Science.gov (United States)

    Venkateswaran, N.; Sriram, G.; Desouza, J.

    1993-01-01

    In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.

  5. Recovery Act - CAREER: Sustainable Silicon -- Energy-Efficient VLSI Interconnect for Extreme-Scale Computing

    Energy Technology Data Exchange (ETDEWEB)

    Chiang, Patrick [Oregon State Univ., Corvallis, OR (United States)

    2014-01-31

    The research goal of this CAREER proposal is to develop energy-efficient, VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thou­ sands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on­ chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.

  6. VLSI mixed signal processing system

    Science.gov (United States)

    Alvarez, A.; Premkumar, A. B.

    1993-01-01

    An economical and efficient VLSI implementation of a mixed signal processing system (MSP) is presented in this paper. The MSP concept is investigated and the functional blocks of the proposed MSP are described. The requirements of each of the blocks are discussed in detail. A sample application using active acoustic cancellation technique is described to demonstrate the power of the MSP approach.

  7. Fundamentals of Microelectronics Processing (VLSI).

    Science.gov (United States)

    Takoudis, Christos G.

    1987-01-01

    Describes a 15-week course in the fundamentals of microelectronics processing in chemical engineering, which emphasizes the use of very large scale integration (VLSI). Provides a listing of the topics covered in the course outline, along with a sample of some of the final projects done by students. (TW)

  8. Making CSB + -Trees Processor Conscious

    DEFF Research Database (Denmark)

    Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe

    2005-01-01

    Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance...... of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose...... a systematic method for adapting CSB+-tree to new platforms. This work is a first step towards integrating CSB+-tree in MySQL’s heap storage manager....

  9. A Design Methodology for Optoelectronic VLSI

    Science.gov (United States)

    2007-01-01

    it for the layout of large-scale VLSI circuits such as bit-parallel datapaths , crossbars, RAMs, megacells and cores. These VLSI circuits have custom...by the 64-bit ALU and the 64-bit register file circuits. Typically, these VLSI circuits use a datapath layout style that creates a highly regular row...and column structure. The datapath layout style is preferred for multiple-bit processing circuits because it achieves uniform timing for all bits in a

  10. Making CSB+-Tree Processor Conscious

    DEFF Research Database (Denmark)

    Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe

    2005-01-01

    Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance...... of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose...

  11. A Coherent VLSI Environment

    Science.gov (United States)

    1987-03-31

    smallest and largest eigenvalues of YH and AminAH and Am,..AH represent the smallest and largest eigenvalues of YAH, respectively. Fig. 3b illustrates a...101, Princeton U. Press, Princeton, NJ, 1970. [17] G. Clark and R. Zippel, "Schema: An Architecture for Knowledge Based Design," International

  12. An Experimental Digital Image Processor

    Science.gov (United States)

    Cok, Ronald S.

    1986-12-01

    A prototype digital image processor for enhancing photographic images has been built in the Research Laboratories at Kodak. This image processor implements a particular version of each of the following algorithms: photographic grain and noise removal, edge sharpening, multidimensional image-segmentation, image-tone reproduction adjustment, and image-color saturation adjustment. All processing, except for segmentation and analysis, is performed by massively parallel and pipelined special-purpose hardware. This hardware runs at 10 MHz and can be adjusted to handle any size digital image. The segmentation circuits run at 30 MHz. The segmentation data are used by three single-board computers for calculating the tonescale adjustment curves. The system, as a whole, has the capability of completely processing 10 million three-color pixels per second. The grain removal and edge enhancement algorithms represent the largest part of the pipelined hardware, operating at over 8 billion integer operations per second. The edge enhancement is performed by unsharp masking, and the grain removal is done using a collapsed Walsh-hadamard transform filtering technique (U.S. Patent No. 4549212). These two algo-rithms can be realized using four basic processing elements, some of which have been imple-mented as VLSI semicustom integrated circuits. These circuits implement the algorithms with a high degree of efficiency, modularity, and testability. The digital processor is controlled by a Digital Equipment Corporation (DEC) PDP 11 minicomputer and can be interfaced to electronic printing and/or electronic scanning de-vices. The processor has been used to process over a thousand diagnostic images.

  13. Functional Verification of Enhanced RISC Processor

    OpenAIRE

    SHANKER NILANGI; SOWMYA L

    2013-01-01

    This paper presents design and verification of a 32-bit enhanced RISC processor core having floating point computations integrated within the core, has been designed to reduce the cost and complexity. The designed 3 stage pipelined 32-bit RISC processor is based on the ARM7 processor architecture with single precision floating point multiplier, floating point adder/subtractor for floating point operations and 32 x 32 booths multiplier added to the integer core of ARM7. The binary representati...

  14. Design Principles for Synthesizable Processor Cores

    DEFF Research Database (Denmark)

    Schleuniger, Pascal; McKee, Sally A.; Karlsson, Sven

    2012-01-01

    As FPGAs get more competitive, synthesizable processor cores become an attractive choice for embedded computing. Currently popular commercial processor cores do not fully exploit current FPGA architectures. In this paper, we propose general design principles to increase instruction throughput...... through the use of micro-benchmarks that our principles guide the design of a processor core that improves performance by an average of 38% over a similar Xilinx MicroBlaze configuration....

  15. VLSI Watermark Implementations and Applications

    OpenAIRE

    Shoshan, Yonatan; Fish, Alexander; Li, Xin; Jullien, Graham,; Yadid-Pecht, Orly

    2008-01-01

    This paper presents an up to date review of digital watermarking (WM) from a VLSI designer point of view. The reader is introduced to basic principles and terms in the field of image watermarking. It goes through a brief survey on WM theory, laying out common classification criterions and discussing important design considerations and trade-offs. Elementary WM properties such as robustness, computational complexity and their influence on image quality are discussed. Common att...

  16. Implementation of Plasmonics in VLSI

    Directory of Open Access Journals (Sweden)

    Shreya Bhattacharya

    2012-12-01

    Full Text Available This Paper presents the idea of Very Large Scale Integration (VLSI using Plasmonic Waveguides.Current VLSI techniques are facing challenges with respect to clock frequencies which tend to scale up, making it more difficult for the designers to distribute and maintain low clock skew between these high frequency clocks across the entire chip. Surface Plasmons are light waves that occur at a metal/dielectric interface, where a group of electrons is collectively moving back and forth. These waves are trapped near the surface as they interact with the plasma of electrons near the surface of the metal. The decay length of SPs into the metal is two orders of magnitude smaller than the wavelength of the light in air. This feature of SPs provides the possibility of localization and the guiding of light in sub wavelength metallic structures, and it can be used to construct miniaturized optoelectronic circuits with sub wavelength components. In this paper, various methods of doing the same have been discussed some of which include DLSPPW’s, Plasmon waveguides by self-assembly, Silicon-based plasmonic waveguides etc. Hence by using Plasmonic chips, the speed, size and efficiency of microprocessor chips can be revolutionized thus bringing a whole new dimension to VLSI design.

  17. Implementation of Plasmonics in VLSI

    Directory of Open Access Journals (Sweden)

    Shreya Bhattacharya

    2012-12-01

    Full Text Available This Paper presents the idea of Very Large Scale Integration (VLSI using Plasmonic Waveguides. Current VLSI techniques are facing challenges with respect to clock frequencies which tend to scale up, making it more difficult for the designers to distribute and maintain low clock skew between these high frequency clocks across the entire chip. Surface Plasmons are light waves that occur at a metal/dielectric interface, where a group of electrons is collectively moving back and forth. These waves are trapped near the surface as they interact with the plasma of electrons near the surface of the metal. The decay length of SPs into the metal is two orders of magnitude smaller than the wavelength of the light in air. This feature of SPs provides the possibility of localization and the guiding of light in sub wavelength metallic structures, and it can be used to construct miniaturized optoelectronic circuits with sub wavelength components. In this paper, various methods of doing the same have been discussed some of which include DLSPPW’s, Plasmon waveguides by self-assembly, Silicon-based plasmonic waveguides etc. Hence by using Plasmonic chips, the speed, size and efficiency of microprocessor chips can be revolutionized thus bringing a whole new dimension to VLSI design.

  18. DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency

    Directory of Open Access Journals (Sweden)

    David Raphaël

    2008-01-01

    Full Text Available Abstract Flexibility becomes a major concern for the development of multimedia and mobile communication systems, as well as classical high-performance and low-energy consumption constraints. The use of general-purpose processors solves flexibility problems but fails to cope with the increasing demand for energy efficiency. This paper presents the DART architecture based on the functional-level reconfiguration paradigm which allows a significant improvement in energy efficiency. DART is built around a hierarchical interconnection network allowing high flexibility while keeping the power overhead low. To enable specific optimizations, DART supports two modes of reconfiguration. The compilation framework is built using compilation and high-level synthesis techniques. A 3G mobile communication application has been implemented as a proof of concept. The energy distribution within the architecture and the physical implementation are also discussed. Finally, the VLSI design of a 0.13  m CMOS SoC implementing a specialized DART cluster is presented.

  19. Ultrafast Fourier-transform parallel processor

    Energy Technology Data Exchange (ETDEWEB)

    Greenberg, W.L.

    1980-04-01

    A new, flexible, parallel-processing architecture is developed for a high-speed, high-precision Fourier transform processor. The processor is intended for use in 2-D signal processing including spatial filtering, matched filtering and image reconstruction from projections.

  20. The TM3270 Media-processor

    NARCIS (Netherlands)

    van de Waerdt, J.W.

    2006-01-01

    I n this thesis, we present the TM3270 VLIW media-processor, the latest of TriMedia processors, and describe the innovations with respect to its prede- cessor: the TM3260. We describe enhancements to the load/store unit design, such as a new data prefetching technique, and architectural

  1. The TM3270 Media-processor

    NARCIS (Netherlands)

    van de Waerdt, J.W.

    2006-01-01

    I n this thesis, we present the TM3270 VLIW media-processor, the latest of TriMedia processors, and describe the innovations with respect to its prede- cessor: the TM3260. We describe enhancements to the load/store unit design, such as a new data prefetching technique, and architectural enhancements

  2. Spaceborne Processor Array

    Science.gov (United States)

    Chow, Edward T.; Schatzel, Donald V.; Whitaker, William D.; Sterling, Thomas

    2008-01-01

    A Spaceborne Processor Array in Multifunctional Structure (SPAMS) can lower the total mass of the electronic and structural overhead of spacecraft, resulting in reduced launch costs, while increasing the science return through dynamic onboard computing. SPAMS integrates the multifunctional structure (MFS) and the Gilgamesh Memory, Intelligence, and Network Device (MIND) multi-core in-memory computer architecture into a single-system super-architecture. This transforms every inch of a spacecraft into a sharable, interconnected, smart computing element to increase computing performance while simultaneously reducing mass. The MIND in-memory architecture provides a foundation for high-performance, low-power, and fault-tolerant computing. The MIND chip has an internal structure that includes memory, processing, and communication functionality. The Gilgamesh is a scalable system comprising multiple MIND chips interconnected to operate as a single, tightly coupled, parallel computer. The array of MIND components shares a global, virtual name space for program variables and tasks that are allocated at run time to the distributed physical memory and processing resources. Individual processor- memory nodes can be activated or powered down at run time to provide active power management and to configure around faults. A SPAMS system is comprised of a distributed Gilgamesh array built into MFS, interfaces into instrument and communication subsystems, a mass storage interface, and a radiation-hardened flight computer.

  3. Multithreading architecture

    CERN Document Server

    Nemirovsky, Mario

    2013-01-01

    Multithreaded architectures now appear across the entire range of computing devices, from the highest-performing general purpose devices to low-end embedded processors. Multithreading enables a processor core to more effectively utilize its computational resources, as a stall in one thread need not cause execution resources to be idle. This enables the computer architect to maximize performance within area constraints, power constraints, or energy constraints. However, the architectural options for the processor designer or architect looking to implement multithreading are quite extensive and

  4. First Cluster Algorithm Special Purpose Processor

    Science.gov (United States)

    Talapov, A. L.; Andreichenko, V. B.; Dotsenko S., Vi.; Shchur, L. N.

    We describe the architecture of the special purpose processor built to realize in hardware cluster Wolff algorithm, which is not hampered by a critical slowing down. The processor simulates two-dimensional Ising-like spin systems. With minor changes the same very effective architecture, which can be defined as a Memory Machine, can be used to study phase transitions in a wide range of models in two or three dimensions.

  5. Review: “Implementation of Feedforward and Feedback Neural Network for Signal Processing Using Analog VLSI Technology”

    Directory of Open Access Journals (Sweden)

    Miss. Rachana R. Patil

    2015-01-01

    Full Text Available Main focus of project is on implementation of Neural Network Architecture (NNA with on chip learning on Analog VLSI Technology for signal processing application. In the proposed paper the analog components like Gilbert Cell Multiplier (GCM, Neuron Activation Function (NAF are used to implement artificial NNA. Analog components used comprises of multiplier, adder and tan sigmoidal function circuit using MOS transistor. This Neural Architecture is trained using Back Propagation (BP Algorithm in analog domain with new techniques of weight storage. Layout design and verification of above design is carried out using VLSI Backend Microwind 3.1 software Tool. The technology used to design layout is 32 nm CMOS Technology

  6. The TMS34010 graphic processor - an architecture for image visualization in NMR tomography; O processador grafico TMS34010 - uma arquitetura para visualizacao de imagem em tomografia por RMN

    Energy Technology Data Exchange (ETDEWEB)

    Slaets, Jan Frans Willem; Paiva, Maria Stela Veludo de; Almeida, Lirio O.B

    1989-12-31

    This abstract presents a description of the minimum system implemented with the graphic processor TMS34010, which will be used in the reconstruction, treatment and interpretation f images obtained by NMR tomography. The project is being developed in the LIE (Electronic Instrumentation Laboratory), of the Sao Carlos Chemistry and Physical Institute, S P, Brazil and is already in operation 4 refs., 7 figs.

  7. Design and implementation of Gbps VLSI architecture of the cipher engine orienting to IEEE 802.11ac%面向802.11ac的安全加速引擎Gbps VLSI架构设计与实现

    Institute of Scientific and Technical Information of China (English)

    潘志鹏; 吴斌; 尉志伟; 叶甜春

    2015-01-01

    针对IEEE 802.11i协议中多种安全协议实现进行研究,结合以IEEE 802.11ac协议为代表的下一代无线局域网( WLAN)系统对高吞吐率的需求,提出了一种支持WEP/TKIP/CCMP协议的多模、高速安全加速引擎的大规模集成电路( VLSI)架构. 提出了基于哈希算法的密钥信息查找算法,缩小了查找时钟延迟. 基于复合域的运算方式实现高级加密标准( AES)算法,提出双AES运算核的并行架构实现计数器与密码分组链接( CCM)模式,提升运算吞吐率的同时也降低了引擎的响应延迟. 经过FPGA实现和ASIC流片验证表明,该安全加速引擎具备可重构性,处理延迟仅为33个时钟周期,在322 MHz工作频率下运算吞吐率可达3.747 Gbit/s.%In this paper, the implementation of multiple security protocols for IEEE 802.11i was researched. A very large scale integration ( VLSI) architecture of the multi-mode cipher engine supporting WEP/TKIP/CCMP proto-cols was presented taking into account the demand for high throughput of the next generation wireless local area net-work ( WLAN) system that is represented by IEEE 802.11ac. A key searching algorithm based on Hash scheme was proposed to reduce the lookup clock latency. For the high throughput hardware implementation of advanced encryp-tion standard ( AES) algorithm, composite field arithmetic was employed. In order to improve the data throughput and reduce the response time, dual AES computing core with parallel structure was used to implement the cipher block chaining message authentication code ( CCM) mode. The proposed design was implemented in both FPGA and ASIC. The results show that the cipher engine with reconfiguration architecture can achieve 33 clock cycles, and the computing throughput is 3.747 Gbit/s when the work frequency is 322 MHz.

  8. Invasive tightly coupled processor arrays

    CERN Document Server

    LARI, VAHID

    2016-01-01

    This book introduces new massively parallel computer (MPSoC) architectures called invasive tightly coupled processor arrays. It proposes strategies, architecture designs, and programming interfaces for invasive TCPAs that allow invading and subsequently executing loop programs with strict requirements or guarantees of non-functional execution qualities such as performance, power consumption, and reliability. For the first time, such a configurable processor array architecture consisting of locally interconnected VLIW processing elements can be claimed by programs, either in full or in part, using the principle of invasive computing. Invasive TCPAs provide unprecedented energy efficiency for the parallel execution of nested loop programs by avoiding any global memory access such as GPUs and may even support loops with complex dependencies such as loop-carried dependencies that are not amenable to parallel execution on GPUs. For this purpose, the book proposes different invasion strategies for claiming a desire...

  9. Concept of a Supervector Processor: A Vector Approach to Superscalar Processor, Design and Performance Analysis

    Directory of Open Access Journals (Sweden)

    Deepak Kumar, Ranjan Kumar Behera, K. S. Pandey

    2013-07-01

    Full Text Available To maximize the available performance is always a goal in microprocessor design. In this paper a new technique has been implemented which exploits the advantage of both superscalar and vector processing technique in a proposed processor called Supervector processor. Vector processor operates on array of data called vector and can greatly improve certain task such as numerical simulation and tasks which requires huge number crunching. On other handsuperscalar processor issues multiple instructions per cyclewhich can enhance the throughput. To implement parallelism multiple vector instructions were issued and executed per cycle in superscalar fashion. Case study has been done on various benchmarks to compare the performance of proposedsupervector processor architecture with superscalar and vectorprocessor architecture. Trimaran Framework has been used in order to evaluate the performance of the proposed supervector processor scheme.

  10. Surface and interface effects in VLSI

    CERN Document Server

    Einspruch, Norman G

    1985-01-01

    VLSI Electronics Microstructure Science, Volume 10: Surface and Interface Effects in VLSI provides the advances made in the science of semiconductor surface and interface as they relate to electronics. This volume aims to provide a better understanding and control of surface and interface related properties. The book begins with an introductory chapter on the intimate link between interfaces and devices. The book is then divided into two parts. The first part covers the chemical and geometric structures of prototypical VLSI interfaces. Subjects detailed include, the technologically most import

  11. Mapping of H.264 decoding on a multiprocessor architecture

    Science.gov (United States)

    van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.

    2003-05-01

    Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result from this, Systems-on-Chip (SoCs) contain a growing amount of fully programmable media processing devices as opposed to application-specific systems, which offered the most attractive solutions due to a high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, thereby making the cost of the functional units less significant. Moreover, the continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs, that are also scalable over the advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of e.g. a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed, enabling e.g. SD resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system, executing the same software. Experimental results show that the data communication considerably reduces up to 65% directly improving the overall performance. Apart from considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the

  12. On the design of multimedia software and future system architectures

    Science.gov (United States)

    de With, Peter H. N.; Jaspers, Egbert G.

    2004-04-01

    A principal challenge for reducing the cost for designing complex systems-on-chip is to pursue more generic systems for a broad range of products. For this purpose, we explore three new architectural concepts for state-of-art video applications. First, we discuss a reusable scalable hardware architecture employing a hierarchical communication network fitting with the natural hierarchy of the application. In a case study, we show that MPEG streaming in DTV occurs at high level, while subsystems communicate at lower levels. The second concept is a software design that scales over a number of processors to enable reuse over a range of VLSI process technologies. We explore this via an H.264 decoder implementation scaling nearly linearly over up to eight processors by applying data partitioning. The third topic is resource-scalability, which is required to satisfy realtime constraints in a system with a high amount of shared resources. An example complexity-scalable MPEG-2 coder scales the required cycle budget with a factor of three, in parallel with a smooth degradation of quality.

  13. Application analysis and communication aspects for future multimedia architectures

    Science.gov (United States)

    de With, Peter H. N.; Jaspers, Egbert G. T.; Mietens, Stephan

    2004-11-01

    A principal challenge for reducing the cost of complex systems-on-chip is to pursue more generic systems for a broad range of products. For this purpose, we explore three new architectural concepts for state-of-the-art video applications. First, we discuss a reusable scalable hardware architecture employing a hierarchical communication network fitting with the natural hierarchy of the application. In a case study, we show that MPEG streaming in DTV occurs at high level, while subsystems communicate at lower levels. The second concept is a software design that scales over a number of processors to enable reuse over a range of VLSI process technologies. We explore this via an H.264 decoder implementation that scales nearly linearly over up to eight processors by applying data partitioning. The third concept is resource-scalability, which is required to satisfy real-time constraints in a system with a high amount of shared resources. An example complexity-scalable MPEG-2 encoder scales the required cycle budget with a factor of three, in parallel with a smooth degradation of quality.

  14. VLSI implementation of neural networks.

    Science.gov (United States)

    Wilamowski, B M; Binfet, J; Kaynak, M O

    2000-06-01

    Currently, fuzzy controllers are the most popular choice for hardware implementation of complex control surfaces because they are easy to design. Neural controllers are more complex and hard to train, but provide an outstanding control surface with much less error than that of a fuzzy controller. There are also some problems that have to be solved before the networks can be implemented on VLSI chips. First, an approximation function needs to be developed because CMOS neural networks have an activation function different than any function used in neural network software. Next, this function has to be used to train the network. Finally, the last problem for VLSI designers is the quantization effect caused by discrete values of the channel length (L) and width (W) of MOS transistor geometries. Two neural networks were designed in 1.5 microm technology. Using adequate approximation functions solved the problem of activation function. With this approach, trained networks were characterized by very small errors. Unfortunately, when the weights were quantized, errors were increased by an order of magnitude. However, even though the errors were enlarged, the results obtained from neural network hardware implementations were superior to the results obtained with fuzzy system approach.

  15. Processor Cache

    NARCIS (Netherlands)

    Boncz, P.A.; Liu, L.; Özsu, M. Tamer

    2008-01-01

    To hide the high latencies of DRAM access, modern computer architecture now features a memory hierarchy that besides DRAM also includes SRAM cache memories, typically located on the CPU chip. Memory access first check these caches, which takes only a few cycles. Only if the needed data is not found,

  16. Event-driven neural integration and synchronicity in analog VLSI.

    Science.gov (United States)

    Yu, Theodore; Park, Jongkil; Joshi, Siddharth; Maier, Christoph; Cauwenberghs, Gert

    2012-01-01

    Synchrony and temporal coding in the central nervous system, as the source of local field potentials and complex neural dynamics, arises from precise timing relationships between spike action population events across neuronal assemblies. Recently it has been shown that coincidence detection based on spike event timing also presents a robust neural code invariant to additive incoherent noise from desynchronized and unrelated inputs. We present spike-based coincidence detection using integrate-and-fire neural membrane dynamics along with pooled conductance-based synaptic dynamics in a hierarchical address-event architecture. Within this architecture, we encode each synaptic event with parameters that govern synaptic connectivity, synaptic strength, and axonal delay with additional global configurable parameters that govern neural and synaptic temporal dynamics. Spike-based coincidence detection is observed and analyzed in measurements on a log-domain analog VLSI implementation of the integrate-and-fire neuron and conductance-based synapse dynamics.

  17. Verilog Implementation of 32-Bit CISC Processor

    Directory of Open Access Journals (Sweden)

    P.Kanaka Sirisha

    2016-04-01

    Full Text Available The Project deals with the design of the 32-Bit CISC Processor and modeling of its components using Verilog language. The Entire Processor uses 32-Bit bus to deal with all the registers and the memories. This Processor implements various arithmetic, logical, Data Transfer operations etc., using variable length instructions, which is the core property of the CISC Architecture. The Processor also supports various addressing modes to perform a 32-Bit instruction. Our Processor uses Harvard Architecture (i.e., to have a separate program and data memory and hence has different buses to negotiate with the Program Memory and Data Memory individually. This feature enhances the speed of our processor. Hence it has two different Program Counters to point to the memory locations of the Program Memory and Data Memory.Our processor has ‘Instruction Queuing’ which enables it to save the time needed to fetch the instruction and hence increases the speed of operation. ‘Interrupt Service Routine’ is provided in our Processor to make it address the Interrupts.

  18. Median and Morphological Specialized Processors for a Real-Time Image Data Processing

    Directory of Open Access Journals (Sweden)

    Kazimierz Wiatr

    2002-01-01

    Full Text Available This paper presents the considerations on selecting a multiprocessor MISD architecture for fast implementation of the vision image processing. Using the author′s earlier experience with real-time systems, implementing of specialized hardware processors based on the programmable FPGA systems has been proposed in the pipeline architecture. In particular, the following processors are presented: median filter and morphological processor. The structure of a universal reconfigurable processor developed has been proposed as well. Experimental results are presented as delays on LCA level implementation for median filter, morphological processor, convolution processor, look-up-table processor, logic processor and histogram processor. These times compare with delays in general purpose processor and DSP processor.

  19. Evaluating current processors performance and machines stability

    CERN Document Server

    Esposito, R; Tortone, G; Taurino, F M

    2003-01-01

    Accurately estimate performance of currently available processors is becoming a key activity, particularly in HENP environment, where high computing power is crucial. This document describes the methods and programs, opensource or freeware, used to benchmark processors, memory and disk subsystems and network connection architectures. These tools are also useful to stress test new machines, before their acquisition or before their introduction in a production environment, where high uptimes are requested.

  20. Percola: a special purpose programmable 64-BIT floating-point processor

    Energy Technology Data Exchange (ETDEWEB)

    Normand, J.M.

    1988-01-01

    The computer PERCOLA is designed for lengthy numerical simulations on a percolation problem in Statistical Mechanics of disordered media. The project that the computer is engaged on at present is intended to improve the true values of critical indices characteristic of the behaviour of electrical conductivity at percolation threshold in a system of random resistors. The architecture is based on an efficient highly iterative algorithm to compute the electrical conductivity of random resistor networks. This computer runs programs of percolation problems considered 10 percent faster than the Cray X-MP with the same 64-bit floating-point precision. Operating since May 1987, months of calculation have already been performed. Although best suited to a class of algorithms, the processor includes a powerful 32-bit integer random number generator and has many characteristics of general purpose 64-bit floating-point microprogrammable computers that can run programs for various type of problems with a peak performance of more than 25 Mflops. This high computing speed is not the result of one all-powerful feature, but a balance among several factors including supercomputer derived features such as mainly: concurrent functional units separately controlled from independent microcode fields, distinct buses for data, addresses and instructions, flexible and powerful address generators for matrix processing and an advanced pipeline design implementing high performance VLSI Weitek's and Analog Devices components.

  1. DESIGN OF A RECONFIGURABLE DSP PROCESSOR WITH BIT EFFICIENT RESIDUE NUMBER SYSTEM

    Directory of Open Access Journals (Sweden)

    Chaitali Biswas Dutta

    2012-10-01

    Full Text Available Residue Number System (RNS, which originates from the Chinese Remainder Theorem, offers a promising future in VLSI because of its carry-free operations in addition, subtraction and multiplication. This property of RNS is very helpful to reduce the complexity of calculation in many applications. A residue number system represents a large integer using a set of smaller integers, called residues. But the area overhead, cost and speed not only depend on this word length, but also the selection of moduli, which is a very crucial step for residue system. This parameter determines bit efficiency, area, frequency etc. In this paper a new moduli set selection technique is proposed to improve bit efficiency which can be used to construct a residue system for digital signal processing environment. Subsequently, it is theoretically proved and illustrated using examples, that the proposed solution gives better results than the schemes reported in the literature. The novelty of the architecture is shown by comparison the different schemes reported in the literature. Using the novel moduli set, a guideline for a Reconfigurable Processor is presented here that can process some predefined functions. As RNS minimizes the carry propagation, the scheme can be implemented in Real Time Signal Processing & other fields where high speed computations are required.

  2. Enabling Future Robotic Missions with Multicore Processors

    Science.gov (United States)

    Powell, Wesley A.; Johnson, Michael A.; Wilmot, Jonathan; Some, Raphael; Gostelow, Kim P.; Reeves, Glenn; Doyle, Richard J.

    2011-01-01

    Recent commercial developments in multicore processors (e.g. Tilera, Clearspeed, HyperX) have provided an option for high performance embedded computing that rivals the performance attainable with FPGA-based reconfigurable computing architectures. Furthermore, these processors offer more straightforward and streamlined application development by allowing the use of conventional programming languages and software tools in lieu of hardware design languages such as VHDL and Verilog. With these advantages, multicore processors can significantly enhance the capabilities of future robotic space missions. This paper will discuss these benefits, along with onboard processing applications where multicore processing can offer advantages over existing or competing approaches. This paper will also discuss the key artchitecural features of current commercial multicore processors. In comparison to the current art, the features and advancements necessary for spaceflight multicore processors will be identified. These include power reduction, radiation hardening, inherent fault tolerance, and support for common spacecraft bus interfaces. Lastly, this paper will explore how multicore processors might evolve with advances in electronics technology and how avionics architectures might evolve once multicore processors are inserted into NASA robotic spacecraft.

  3. Making CSB+-Tree Processor Conscious

    DEFF Research Database (Denmark)

    Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe

    2005-01-01

    Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance...... of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose...... a systematic method for adapting CSB+-tree to new platforms. This work is a first step towards integrating CSB+-tree in MySQL’s heap storage manager....

  4. Multiple core computer processor with globally-accessible local memories

    Energy Technology Data Exchange (ETDEWEB)

    Shalf, John; Donofrio, David; Oliker, Leonid

    2016-09-20

    A multi-core computer processor including a plurality of processor cores interconnected in a Network-on-Chip (NoC) architecture, a plurality of caches, each of the plurality of caches being associated with one and only one of the plurality of processor cores, and a plurality of memories, each of the plurality of memories being associated with a different set of at least one of the plurality of processor cores and each of the plurality of memories being configured to be visible in a global memory address space such that the plurality of memories are visible to two or more of the plurality of processor cores.

  5. Architectures for block Toeplitz systems

    NARCIS (Netherlands)

    Bouras, Ilias; Glentis, George-Othon; Kalouptsidis, Nicholas

    1996-01-01

    In this paper efficient VLSI architectures of highly concurrent algorithms for the solution of block linear systems with Toeplitz or near-to-Toeplitz entries are presented. The main features of the proposed scheme are the use of scalar only operations, multiplications/divisions and additions, and th

  6. Cascaded VLSI Chips Help Neural Network To Learn

    Science.gov (United States)

    Duong, Tuan A.; Daud, Taher; Thakoor, Anilkumar P.

    1993-01-01

    Cascading provides 12-bit resolution needed for learning. Using conventional silicon chip fabrication technology of VLSI, fully connected architecture consisting of 32 wide-range, variable gain, sigmoidal neurons along one diagonal and 7-bit resolution, electrically programmable, synaptic 32 x 31 weight matrix implemented on neuron-synapse chip. To increase weight nominally from 7 to 13 bits, synapses on chip individually cascaded with respective synapses on another 32 x 32 matrix chip with 7-bit resolution synapses only (without neurons). Cascade correlation algorithm varies number of layers effectively connected into network; adds hidden layers one at a time during learning process in such way as to optimize overall number of neurons and complexity and configuration of network.

  7. VLSI design techniques for floating-point computation

    Energy Technology Data Exchange (ETDEWEB)

    Bose, B. K.

    1988-01-01

    The thesis presents design techniques for floating-point computation in VLSI. A basis for area-time design decisions for arithmetic and memory operations is formulated from a study of computationally intensive programs. Tradeoffs in the design and implementation of an efficient coprocessor interface are studied, together with the implications of hardware support for the IEEE Floating-Point Standard. Algorithm area-time tradeoffs for basic arithmetic functions are analyzed in light of changing technology. Details of a single-chip floating-point unit designed in two-micron CMOS for SPUR are described, including special design considerations for very wide data paths. The pervasive effects of scaling technology on different levels of design are explored, from devices and circuits, through logic and micro-architecture, to algorithms and systems.

  8. Compact MOSFET models for VLSI design

    CERN Document Server

    Bhattacharyya, A B

    2009-01-01

    Practicing designers, students, and educators in the semiconductor field face an ever expanding portfolio of MOSFET models. In Compact MOSFET Models for VLSI Design , A.B. Bhattacharyya presents a unified perspective on the topic, allowing the practitioner to view and interpret device phenomena concurrently using different modeling strategies. Readers will learn to link device physics with model parameters, helping to close the gap between device understanding and its use for optimal circuit performance. Bhattacharyya also lays bare the core physical concepts that will drive the future of VLSI.

  9. VLSI implementation of a template subtraction algorithm for real-time stimulus artifact rejection.

    Science.gov (United States)

    Limnuson, Kanokwan; Lu, Hui; Chiel, Hillel J; Mohseni, Pedram

    2010-01-01

    In this paper, we present very-large-scale integrated (VLSI) implementation of a template subtraction algorithm for stimulus artifact rejection (SAR) in real time with applicability to closed-loop neuroprostheses. The SAR algorithm is based upon an infinite impulse response (IIR) temporal filtering technique, which can be efficiently implemented in VLSI with reduced power consumption and silicon area. We demonstrate that initialization of the memory within the system architecture using the first recorded stimulus artifact significantly decreases system response time as compared to the case without memory initialization. Two sets of pre-recorded neural data from an Aplysia californica are used to simulate the functionality of the proposed VLSI architecture in AMS 0.35 microm complementary metal-oxide-semiconductor (CMOS) technology. Depending upon the reproducibility in the shape of stimulus artifacts in vivo, the system eliminates virtually all artifacts in real time and recovers the extracellular neural activity with microW-level power consumption from 1.5 V.

  10. Technology development and circuit design for a parallel laser programmable floating-point application specific processor. Master's thesis

    Energy Technology Data Exchange (ETDEWEB)

    Scriber, M.W.

    1989-12-01

    The laser programmable floating point application specific processor (LPASP) is a new approach at rapid development of custom VLSI chips. The LPASP is a generic application specific processor that can be programmed to perform a specific function. The effort of this thesis is to develop and test the double precision floating point adder and the laser programmable read-only memory (LPROM) that are macrocells within the LPASP. In addition, the thesis analyzes the applicability of an LPASP parallel processing system. The double precision floating point adder is an adder/subtractor macrocell designed to comply with the IEEE double precision floating point standard. An 84-pin chip of the adder was fabricated using 2 micron feature sizes. The fastest processing time was measured at 120 nanoseconds over 23 worst case test vectors. The adder uses the optimized carry multiplexed (OCM) adder that was developed at AFIT. The OCM adder is a new adder architecture that uses four parallel carry paths to attain a performance time on the order of (cubed root of M) with a gate count on the order of O (n). The redundant logic associated with the parallel propagation banks is eliminated in the OCM adder so that the largest bit-slice of the adder contains only eight 2-to-1 multiplexer gates. A 57-bit adder was fabricated using 2 micron feature sizes. The processing time for the adder is 31 nsec.

  11. Technology Development and Circuit Design for a Parallel Laser Programmable Floating Point Application Specific Processor

    Science.gov (United States)

    1989-12-01

    The laser programmable floating point application specific processor (LPASP) is a new approach at rapid development of custom VLSI chips. The LPASP...double precision floating point adder and the laser programmable read-only memory (LPROM) that are macrocells within the LPASP. In addition, the...thesis analyzes the applicability of an LPASP parallel processing system. The double precision floating point adder is an adder/subtractor macrocell

  12. Impact of the Use of Object Request Broker Middleware for Inter-Component Communications in C6416 Digital Signal Processor Based Software Communications Architecture Radio Systems

    Directory of Open Access Journals (Sweden)

    Mohamed I. Yousef

    2012-01-01

    Full Text Available Problem statement: This study presents an in-depth analysis of the performance of Software Communications Architecture (SCA component-based waveform applications in terms of inter-component communications. The main limitation with SCA, in the context of embedded systems, is the additional cost introduced by the use of Object Request Broker (ORB middleware. The ORB middleware handles the interaction between components and objects in SCA distributed environment. This interaction should be highly efficient, due to the real time nature of SCA systems and transparent to the application programmer. Approach: We can achieve high efficiency in SCA systems by enhancing the Inter-Process Communications (IPC mechanisms in Operating systems (OS micro kernels, while we achieve transparency through Interface Definition Language (IDL. Different encoding mechanisms like “External Data Representation (XDR, Network Data Representation (NDR and Common Data Representation (CDR facilitate inter-component communication transparently and efficiently”. Marshalling procedures format data from the local machine representation to common network representations. A most common encoding mechanism for Common Object Request Broker Architecture (CORBA systems is CDR representation. Measurements have been performed with ORBExpress DSP as a CORBA distribution and Open Source SCA Implementation Embedded (OSSIE for SCA implementation. In order to perform these measurements we proposed two metrics for profiling the ORB that are invocation and marshalling. In addition, we propose three elements of data types to evaluate the performance of ORB middleware that are, Basic, Array and Sequence data types. Results: The CORBA bus is really the part, which brings an overhead to the SCA radio systems. This overhead is due to method invocations that have been carried out by ORB middleware. Conclusion: Performance benchmarks of ORBExpress DSP middleware show that, although using CORBA for

  13. SSI/MSI/LSI/VLSI/ULSI.

    Science.gov (United States)

    Alexander, George

    1984-01-01

    Discusses small-scale integrated (SSI), medium-scale integrated (MSI), large-scale integrated (LSI), very large-scale integrated (VLSI), and ultra large-scale integrated (ULSI) chips. The development and properties of these chips, uses of gallium arsenide, Josephson devices (two superconducting strips sandwiching a thin insulator), and future…

  14. Hardware Implementation of a Genetic Algorithm Based Canonical Singed Digit Multiplierless Fast Fourier Transform Processor for Multiband Orthogonal Frequency Division Multiplexing Ultra Wideband Applications

    Directory of Open Access Journals (Sweden)

    Mahmud Benhamid

    2009-01-01

    Full Text Available Problem statement: Ultra Wide Band (UWB technology has attracted many researchers' attention due to its advantages and its great potential for future applications. The physical layer standard of Multi-band Orthogonal Frequency Division Multiplexing (MB-OFDM UWB system is defined by ECMA International. In this standard, the data sampling rate from the analog-to-digital converter to the physical layer is up to 528 M sample sec-1. Therefore, it is a challenge to realize the physical layer especially the components with high computational complexity in Very Large Scale Integration (VLSI implementation. Fast Fourier Transform (FFT block which plays an important role in MB-OFDM system is one of these components. Furthermore, the execution time of this module is only 312.5 ns. Therefore, if employing the traditional approach, high power consumption and hardware cost of the processor will be needed to meet the strict specifications of the UWB system. The objective of this study was to design an Application Specific Integrated Circuit (ASIC FFT processor for this system. The specification was defined from the system analysis and literature research. Approach: Based on the algorithm and architecture analysis, a novel Genetic Algorithm (GA based Canonical Signed Digit (CSD Multiplier less 128-point FFT processor and its inverse (IFFT for MB-OFDM UWB systems had been proposed. The proposed pipelined architecture was based on the modified Radix-22 algorithm that had same number of multipliers as that of the conventional Radix-22. However, the multiplication complexity and the ROM memory needed for storing twiddle factors coefficients could be eliminated by replacing the conventional complex multipliers with a newly proposed GA optimized CSD constant multipliers. The design had been coded in Verilog HDL and targeted Xilinx Virtex-II FPGA series. It was fully implemented and tested on real hardware using Virtex-II FG456 prototype board and logic analyzer

  15. Processor-Dependent Malware... and codes

    CERN Document Server

    Desnos, Anthony; Filiol, Eric

    2010-01-01

    Malware usually target computers according to their operating system. Thus we have Windows malwares, Linux malwares and so on ... In this paper, we consider a different approach and show on a technical basis how easily malware can recognize and target systems selectively, according to the onboard processor chip. This technology is very easy to build since it does not rely on deep analysis of chip logical gates architecture. Floating Point Arithmetic (FPA) looks promising to define a set of tests to identify the processor or, more precisely, a subset of possible processors. We give results for different families of processors: AMD, Intel (Dual Core, Atom), Sparc, Digital Alpha, Cell, Atom ... As a conclusion, we propose two {\\it open problems} that are new, to the authors' knowledge.

  16. High speed matrix processors using floating point representation

    Energy Technology Data Exchange (ETDEWEB)

    Birkner, D.A.

    1980-01-01

    The author describes the architecture of a high-speed matrix processor which uses a floating-point format for data representation. It is shown how multipliers and other LSI devices are used in the design to obtain the high speed of the processor.

  17. A Simple and Affordable TTL Processor for the Classroom

    Science.gov (United States)

    Feinberg, Dave

    2007-01-01

    This paper presents a simple 4 bit computer processor design that may be built using TTL chips for less than $65. In addition to describing the processor itself in detail, we discuss our experience using the laboratory kit and its associated machine instruction set to teach computer architecture to high school students. (Contains 3 figures and 5…

  18. Open|SpeedShop Ease of Use Performance Analysis for Heterogenious Processor Systems Project

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose building upon the modular extensible architecture and existing capabilities of Open|SpeedShop to provide seamless, integrated, heterogeneous processor...

  19. Research into Self-Timed VLSI Circuits.

    Science.gov (United States)

    1984-10-22

    without reorganization. and identical processors are spaced around the store. In the Gauss- Seidel method [8] the grid point Computer simulation of the...convergence rates intermediate between those of grid point values are used throughout each iteration. the Jacobi and Gauss- Seidel methods are obtained... Seidel method , number of processors) of 40-60% is achieved with as when it uses one processor per grid point, it reduces to many as N processors, where

  20. Noise limitations in optical linear algebra processors.

    Science.gov (United States)

    Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

    1990-05-10

    A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.

  1. Floating-point multiple data stream digital signal processor

    Energy Technology Data Exchange (ETDEWEB)

    Fortier, M.; Corinthios, M.J.

    1982-01-01

    A microprogrammed multiple data stream digital signal processor is introduced. This floating-point processor is capable of implementing optimum Wiener filtering of signals, in general, and images in particular. Generalised spectral analysis transforms such as Fourier, Walsh, Hadamard, and generalised Walsh are efficiently implemented in a bit-slice microprocessor-based architecture. In this architecture, a microprogrammed sequencing section directly controls a central floating-point signal processing unit. Throughout, computations are performed on pipelined multiple complex data streams. 12 references.

  2. A Coherent VLSI Design Environment

    Science.gov (United States)

    1987-03-31

    In the figure, AminH and Ama.H represent the smallest and largest eigenvalues I of YH and AminAH and AmaAH represent the smallest and largest...Press, Princeton, NJ, 1970. [17] G. Clark and R. Zippel, "Schema: An Architecture for Knowledge Based Design," International Conference on Computer-Aided

  3. Compensating Inhomogeneities of Neuromorphic VLSI Devices Via Short-Term Synaptic Plasticity.

    Science.gov (United States)

    Bill, Johannes; Schuch, Klaus; Brüderle, Daniel; Schemmel, Johannes; Maass, Wolfgang; Meier, Karlheinz

    2010-01-01

    Recent developments in neuromorphic hardware engineering make mixed-signal VLSI neural network models promising candidates for neuroscientific research tools and massively parallel computing devices, especially for tasks which exhaust the computing power of software simulations. Still, like all analog hardware systems, neuromorphic models suffer from a constricted configurability and production-related fluctuations of device characteristics. Since also future systems, involving ever-smaller structures, will inevitably exhibit such inhomogeneities on the unit level, self-regulation properties become a crucial requirement for their successful operation. By applying a cortically inspired self-adjusting network architecture, we show that the activity of generic spiking neural networks emulated on a neuromorphic hardware system can be kept within a biologically realistic firing regime and gain a remarkable robustness against transistor-level variations. As a first approach of this kind in engineering practice, the short-term synaptic depression and facilitation mechanisms implemented within an analog VLSI model of I&F neurons are functionally utilized for the purpose of network level stabilization. We present experimental data acquired both from the hardware model and from comparative software simulations which prove the applicability of the employed paradigm to neuromorphic VLSI devices.

  4. C-slow Technique vs Multiprocessor in designing Low Area Customized Instruction set Processor for Embedded Applications

    CERN Document Server

    Akram, Muhammad Adeel; Sarfaraz, Muhammad Masood

    2012-01-01

    The demand for high performance embedded processors, for consumer electronics, is rapidly increasing for the past few years. Many of these embedded processors depend upon custom built Instruction Ser Architecture (ISA) such as game processor (GPU), multimedia processors, DSP processors etc. Primary requirement for consumer electronic industry is low cost with high performance and low power consumption. A lot of research has been evolved to enhance the performance of embedded processors through parallel computing. But some of them focus superscalar processors i.e. single processors with more resources like Instruction Level Parallelism (ILP) which includes Very Long Instruction Word (VLIW) architecture, custom instruction set extensible processor architecture and others require more number of processing units on a single chip like Thread Level Parallelism (TLP) that includes Simultaneous Multithreading (SMT), Chip Multithreading (CMT) and Chip Multiprocessing (CMP). In this paper, we present a new technique, n...

  5. APRON: A Cellular Processor Array Simulation and Hardware Design Tool

    Directory of Open Access Journals (Sweden)

    David R. W. Barr

    2009-01-01

    Full Text Available We present a software environment for the efficient simulation of cellular processor arrays (CPAs. This software (APRON is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.

  6. Embedded Processor Laboratory

    Data.gov (United States)

    Federal Laboratory Consortium — The Embedded Processor Laboratory provides the means to design, develop, fabricate, and test embedded computers for missile guidance electronics systems in support...

  7. Harnessing VLSI System Design with EDA Tools

    CERN Document Server

    Kamat, Rajanish K; Gaikwad, Pawan K; Guhilot, Hansraj

    2012-01-01

    This book explores various dimensions of EDA technologies for achieving different goals in VLSI system design. Although the scope of EDA is very broad and comprises diversified hardware and software tools to accomplish different phases of VLSI system design, such as design, layout, simulation, testability, prototyping and implementation, this book focuses only on demystifying the code, a.k.a. firmware development and its implementation with FPGAs. Since there are a variety of languages for system design, this book covers various issues related to VHDL, Verilog and System C synergized with EDA tools, using a variety of case studies such as testability, verification and power consumption. * Covers aspects of VHDL, Verilog and Handel C in one text; * Enables designers to judge the appropriateness of each EDA tool for relevant applications; * Omits discussion of design platforms and focuses on design case studies; * Uses design case studies from diversified application domains such as network on chip, hospital on...

  8. VLSI Microsystem for Rapid Bioinformatic Pattern Recognition

    Science.gov (United States)

    Fang, Wai-Chi; Lue, Jaw-Chyng

    2009-01-01

    A system comprising very-large-scale integrated (VLSI) circuits is being developed as a means of bioinformatics-oriented analysis and recognition of patterns of fluorescence generated in a microarray in an advanced, highly miniaturized, portable genetic-expression-assay instrument. Such an instrument implements an on-chip combination of polymerase chain reactions and electrochemical transduction for amplification and detection of deoxyribonucleic acid (DNA).

  9. Leak detection utilizing analog binaural (VLSI) techniques

    Science.gov (United States)

    Hartley, Frank T. (Inventor)

    1995-01-01

    A detection method and system utilizing silicon models of the traveling wave structure of the human cochlea to spatially and temporally locate a specific sound source in the presence of high noise pandemonium. The detection system combines two-dimensional stereausis representations, which are output by at least three VLSI binaural hearing chips, to generate a three-dimensional stereausis representation including both binaural and spectral information which is then used to locate the sound source.

  10. Modular VLSI Reed-Solomon Decoder

    Science.gov (United States)

    Hsu, In-Shek; Truong, Trieu-Kie

    1991-01-01

    Proposed Reed-Solomon decoder contains multiple very-large-scale integrated (VLSI) circuit chips of same type. Each chip contains sets of logic cells and subcells performing functions from all stages of decoding process. Full decoder assembled by concatenating chips, with selective utilization of cells in particular chips. Cost of development reduced by factor of 5. In addition, decoder programmable in field and switched between 8-bit and 10-bit symbol sizes.

  11. Modular VLSI Reed-Solomon Decoder

    Science.gov (United States)

    Hsu, In-Shek; Truong, Trieu-Kie

    1991-01-01

    Proposed Reed-Solomon decoder contains multiple very-large-scale integrated (VLSI) circuit chips of same type. Each chip contains sets of logic cells and subcells performing functions from all stages of decoding process. Full decoder assembled by concatenating chips, with selective utilization of cells in particular chips. Cost of development reduced by factor of 5. In addition, decoder programmable in field and switched between 8-bit and 10-bit symbol sizes.

  12. Generating Weighted Test Patterns for VLSI Chips

    Science.gov (United States)

    Siavoshi, Fardad

    1990-01-01

    Improved built-in self-testing circuitry for very-large-scale integrated (VLSI) digital circuits based on version of weighted-test-pattern-generation concept, in which ones and zeros in pseudorandom test patterns occur with probabilities weighted to enhance detection of certain kinds of faults. Requires fewer test patterns and less computation time and occupies less area on circuit chips. Easy to relate switching activity in outputs with fault-detection activity by use of probabilistic fault-detection techniques.

  13. SPROC: A multiple-processor DSP IC

    Science.gov (United States)

    Davis, R.

    1991-01-01

    A large, single-chip, multiple-processor, digital signal processing (DSP) integrated circuit (IC) fabricated in HP-Cmos34 is presented. The innovative architecture is best suited for analog and real-time systems characterized by both parallel signal data flows and concurrent logic processing. The IC is supported by a powerful development system that transforms graphical signal flow graphs into production-ready systems in minutes. Automatic compiler partitioning of tasks among four on-chip processors gives the IC the signal processing power of several conventional DSP chips.

  14. Wide-range, picoampere-sensitivity multichannel VLSI potentiostat for neurotransmitter sensing.

    Science.gov (United States)

    Murari, Kartikeya; Thakor, Nitish; Stanacevic, Milutin; Cauwenberghs, Gert

    2004-01-01

    Neurotransmitter sensing is critical in studying nervous pathways and neurological disorders. A 16-channel current-measuring VLSI potentiostat with multiple ranges from picoamperes to microamperes is presented for electrochemical detection of electroactive neurotransmitters like dopamine, nitric oxide etc. The analog-to-digital converter design employs a current-mode, first-order single-bit delta-sigma modulator architecture with a two-stage, digitally reconfigurable oversampling ratio for ranging the conversion scale. An integrated prototype is fabricated in CMOS technology, and experimentally characterized. Real-time multi-channel acquisition of dopamine concentration in vitro is performed with a microfabricated sensor array.

  15. VLSI Structure for an All Digital Receiver for CDMA PABX Handset

    Institute of Scientific and Technical Information of China (English)

    ZhouShidong; BiGuangguo

    1995-01-01

    In this paper,a VLSI architecture of a CDMA receiver is put forward for wirelesss PABX handset.To meet the critically low cost and power consumption requirement with neglectable per-formance degradation,some new techniques are employed to reduce hardware complexity,including base band processing,chip-rate sampling,low ADC resolution,absolute value detector,double branch acquisition ,and modified carrier phase compensation.Performance of experimental system fits well with theoretical predition ,and the practical SNR lose compared with ideal reception is about 2-3dB.

  16. Array processors in chemistry

    Energy Technology Data Exchange (ETDEWEB)

    Ostlund, N.S.

    1980-01-01

    The field of attached scientific processors (''array processors'') is surveyed, and an attempt is made to indicate their present and possible future use in computational chemistry. The current commercial products from Floating Point Systems, Inc., Datawest Corporation, and CSP, Inc. are discussed.

  17. Training probabilistic VLSI models on-chip to recognise biomedical signals under hardware nonidealities.

    Science.gov (United States)

    Jiang, P C; Chen, H

    2006-01-01

    VLSI implementation of probabilistic models is attractive for many biomedical applications. However, hardware non-idealities can prevent probabilistic VLSI models from modelling data optimally through on-chip learning. This paper investigates the maximum computational errors that a probabilistic VLSI model can tolerate when modelling real biomedical data. VLSI circuits capable of achieving the required precision are also proposed.

  18. Optoelectronic Correlator Architecture for Shift Invariant Target Recognition

    CERN Document Server

    Monjur, Mehjabin S; Tripathi, Renu; Donoghue, John; Shahriar, M S

    2013-01-01

    In this paper, we present theoretical details and the underlying architecture of a hybrid optoelectronic correlator that correlates images using SLMs, detectors and VLSI chips. The proposed architecture bypasses the nonlinear material such as photorefractive polymer film by using detectors instead, and the phase information is yet conserved by the interference of plane waves with the images.

  19. Analogue VLSI for probabilistic networks and spike-time computation.

    Science.gov (United States)

    Murray, A

    2001-02-01

    The history and some of the methods of analogue neural VLSI are described. The strengths of analogue techniques are described, along with residual problems to be solved. The nature of hardware-friendly and hardware-appropriate algorithms is reviewed and suggestions are offered as to where analogue neural VLSI's future lies.

  20. Trends and challenges in VLSI technology scaling towards 100 nm

    NARCIS (Netherlands)

    Rusu, S.; Sachdev, M.; Svensson, C.; Nauta, Bram

    Summary form only given. Moore's Law drives VLSI technology to continuous increases in transistor densities and higher clock frequencies. This tutorial will review the trends in VLSI technology scaling in the last few years and discuss the challenges facing process and circuit engineers in the 100nm

  1. 超大规模集成电路可调试性设计综述%Survey of Design-for-Debug of VLSI

    Institute of Scientific and Technical Information of China (English)

    钱诚; 沈海华; 陈天石; 陈云霁

    2012-01-01

    随着硬件复杂度的不断提高和并行软件调试的需求不断增长,可调试性设计已经成为集成电路设计中的重要内容.一方面,仅靠传统的硅前验证已经无法保证现代超大规模复杂集成电路设计验证的质量,因此作为硅后验证重要支撑技术的可调试性设计日渐成为大规模集成电路设计领域的研究热点.另一方面,并行程序的调试非常困难,很多细微的bug无法直接用传统的单步、断点等方法进行调试,如果没有专门的硬件支持,需要耗费极大的人力和物力.全面分析了现有的可调试性设计,在此基础上归纳总结了可调试性设计技术的主要研究方向并介绍了各个方向的研究进展,深入探讨了可调试性结构设计研究中的热点问题及其产生根源,给出了可调试性结构设计领域的发展趋势.%Design-for-debug (DFD) has become an important feature of modern VLSI. On the one hand, traditional pre-silicon verification methods are not sufficient to enssure the quality of modern complex VLSI designs, thus employing DFD to facilitate post-silicon verification has attracted wide interests from both academia and industry; on the other hand, debugging parallel program is a worldwide difficult problem, which cries out for DFD hardware supports. In this paper, we analyze the existing structures of DFD comprehensively and introduce different fields of DFD for debugging hardware and software. These fields contain various kinds of DFD infrastructures, such as the DFD infrastructure for the pipe line of processor, the system-on-chips (SOC) and the networks on multi-cores processor. We also introduce the recent researches on how to design the DFD infrastructures with certain processor architecture and how to use the DFD infrastructures to solve the debug problems in these different fields. The topologic of the whole infrastructure, the hardware design of components, the methods of analyzing signals, the

  2. Real time processor for array speckle interferometry

    Science.gov (United States)

    Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

    1989-01-01

    The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.

  3. ETHERNET PACKET PROCESSOR FOR SOC APPLICATION

    Directory of Open Access Journals (Sweden)

    Raja Jitendra Nayaka

    2012-07-01

    Full Text Available As the demand for Internet expands significantly in numbers of users, servers, IP addresses, switches and routers, the IP based network architecture must evolve and change. The design of domain specific processors that require high performance, low power and high degree of programmability is the bottleneck in many processor based applications. This paper describes the design of ethernet packet processor for system-on-chip (SoC which performs all core packet processing functions, including segmentation and reassembly, packetization classification, route and queue management which will speedup switching/routing performance. Our design has been configured for use with multiple projects ttargeted to a commercial configurable logic device the system is designed to support 10/100/1000 links with a speed advantage. VHDL has been used to implement and simulated the required functions in FPGA.

  4. Distributed Processor/Memory Architectures Design Program

    Science.gov (United States)

    1975-02-01

    plemnen ted onl tile imemory UUE 2. lrrir Defctceion BIU INP-UT IFFF MESSAGE Nfeir’nr~~noriir~f fases rie oiri~’ jCONTROL. cr’Ii eIItecion Ir id. ill...is discussed Iin Subsection MD.B.". Trhe ne\\t problem. which to a great extenit is of’ nanagement-decisiois nature. is to comec up with a sensible

  5. Modelling of Human Glottis in VLSI for Low Power Architectures

    CERN Document Server

    Raj, Nikhil

    2010-01-01

    The Glottal Source is an important component of voice as it can be considered as the excitation signal to the voice apparatus. Nowadays, new techniques of speech processing such as speech recognition and speech synthesis use the glottal closure and opening instants. Current models of the glottal waves derive their shape from approximate information rather than from exactly measured data. General method concentrate on assessment of the glottis opening using optical, acoustical methods, or on visualization of the larynx position using ultrasound, computer tomography or magnetic resonance imaging techniques. In this work, circuit model of Human Glottis using MOS is designed by exploiting fluid volume velocity to current, fluid pressure to voltage, and linear and nonlinear mechanical impedances to linear and nonlinear electrical impedances. The glottis modeled as current source includes linear, non-linear impedances to represent laminar and turbulent flow respectively, in vocal tract. The MOS modelling and simula...

  6. VLSI architecture for a Reed-Solomon decoder

    Science.gov (United States)

    Hsu, In-Shek (Inventor); Truong, Trieu-Kie (Inventor)

    1992-01-01

    A basic single-chip building block for a Reed-Solomon (RS) decoder system is partitioned into a plurality of sections, the first of which consists of a plurality of syndrome subcells each of which contains identical standard-basis finite-field multipliers that are programmable between 10 and 8 bit operation. A desired number of basic building blocks may be assembled to provide a RS decoder of any syndrome subcell size that is programmable between 10 and 8 bit operation.

  7. 3-D VLSI Architecture Implementation for Data Fusion Problems

    Science.gov (United States)

    Duong, T.; Weldon, D.; Thomas, T.

    1999-01-01

    This paper gives an overview of hardware implementation techniques employed in solving real-time classification problems using Neural Network, Principle Component Analysis (PCA), and Independent Component Analysis (ICA) techniques.

  8. Cascaded VLSI neural network architecture for on-line learning

    Science.gov (United States)

    Duong, Tuan A. (Inventor); Daud, Taher (Inventor); Thakoor, Anilkumar P. (Inventor)

    1995-01-01

    High-speed, analog, fully-parallel and asynchronous building blocks are cascaded for larger sizes and enhanced resolution. A hardware-compatible algorithm permits hardware-in-the-loop learning despite limited weight resolution. A comparison-intensive feature classification application has been demonstrated with this flexible hardware and new algorithm at high speed. This result indicates that these building block chips can be embedded as application-specific-coprocessors for solving real-world problems at extremely high data rates.

  9. VLSI Architecture Of A Binary Up/Down Counter

    Science.gov (United States)

    Hsu, In-Shek; Truong, Trieu-Kie; Reed, I. S.

    1988-01-01

    Identical stages contain relatively-few logic gates. New algorithm simplifies design of binary up/down counter. Design suitable for very-large-scale integrated circuits. Contains simple "pipeline" array of identical cells. Programmable logic unit converts increment and decrement input signals to "U" and "D" signals required by algorithm of counter.

  10. Adaptive WTA with an analog VLSI neuromorphic learning chip.

    Science.gov (United States)

    Häfliger, Philipp

    2007-03-01

    In this paper, we demonstrate how a particular spike-based learning rule (where exact temporal relations between input and output spikes of a spiking model neuron determine the changes of the synaptic weights) can be tuned to express rate-based classical Hebbian learning behavior (where the average input and output spike rates are sufficient to describe the synaptic changes). This shift in behavior is controlled by the input statistic and by a single time constant. The learning rule has been implemented in a neuromorphic very large scale integration (VLSI) chip as part of a neurally inspired spike signal image processing system. The latter is the result of the European Union research project Convolution AER Vision Architecture for Real-Time (CAVIAR). Since it is implemented as a spike-based learning rule (which is most convenient in the overall spike-based system), even if it is tuned to show rate behavior, no explicit long-term average signals are computed on the chip. We show the rule's rate-based Hebbian learning ability in a classification task in both simulation and chip experiment, first with artificial stimuli and then with sensor input from the CAVIAR system.

  11. Parallel VLSI design for the fast -D DWT core algorithm

    Institute of Scientific and Technical Information of China (English)

    WEI Benjie; LIU Mingye; ZHOU Yihua; CHENG Baodong

    2007-01-01

    By studying the core algorithm of a three-dimensional discrete wavelet transform (3-D DWT) in depth,this Paper divides it into three one-dimensional discrete wavelet transforms (1-D DWTs).Based on the implementation of a 3-D DWT software,a parallel architecture design of a very large-scale integration(VLSI)is produced.It needs three dual-port random-access memory(RAM)to store the temporary results and transpose the matrix,then builds up a pipeline model composed of the three 1-D DWTs.In the design.the finite state machine(FSM)is used well to control the flow.Compared with the serial mode.the experimental results of the post synthesized simulation show that the design method is correct and effective.It can increase the processing speed by about 66%.work at 59 MHz,and meet the real-time needs of the video encoder.

  12. VLSI Circuits for High Speed Data Conversion

    Science.gov (United States)

    1994-05-16

    Meeting, pp. 289-292, Sept. 199 1. [4] Behzad Razavi , "High-Speed, Nigh-Resolution Analog-to-Digital Conversion in VLSI Technologies, Ph.D. Thesis... Behzad Razavi and Bruce A. Wooley, "Design Techniques for High-Speed, High- Resolution Comparators," IEEE J. Solid-State Circuits, vol. 27, pp. 1916-192...Dec. 1992. [8] Behzad Razavi and Bruce A. Wooley, "A 12-Bkt 5-MSamplesoc Two-Step CMOS A/D Converter," IEEE J. Solid-State Circuits, vol. 27, pp

  13. Self arbitrated VLSI asynchronous sequential circuits

    Science.gov (United States)

    Whitaker, S.; Maki, G.

    1990-01-01

    A new class of asynchronous sequential circuits is introduced in this paper. The new design procedures are oriented towards producing asynchronous sequential circuits that are implemented with CMOS VLSI and take advantage of pass transistor technology. The first design algorithm utilizes a standard Single Transition Time (STT) state assignment. The second method introduces a new class of self synchronizing asynchronous circuits which eliminates the need for critical race free state assignments. These circuits arbitrate the transition path action by forcing the circuit to sequence through proper unstable states. These methods result in near minimum hardware since only the transition paths associated with state variable changes need to be implemented with pass transistor networks.

  14. Single Spin Logic Implementation of VLSI Adders

    CERN Document Server

    Shukla, Soumitra

    2011-01-01

    Some important VLSI adder circuits are implemented using quantum dots (qd) and Spin Polarized Scanning Tunneling Microscopy (SPSTM) in Single Spin Logic (SSL) paradigm. A simple comparison between these adder circuits shows that the mirror adder implementation in SSL does not carry any advantage over CMOS adder in terms of complexity and number of qds, opposite to the trend observed in their charge-based counterparts. On the contrary, the transmission gate adder, Static and Dynamic Manchester carry gate adders in SSL reduce the complexity and number of qds, in harmony with the trend shown in transistor adders.

  15. An Analog VLSI Saccadic Eye Movement System

    OpenAIRE

    1994-01-01

    In an effort to understand saccadic eye movements and their relation to visual attention and other forms of eye movements, we - in collaboration with a number of other laboratories - are carrying out a large-scale effort to design and build a complete primate oculomotor system using analog CMOS VLSI technology. Using this technology, a low power, compact, multi-chip system has been built which works in real-time using real-world visual inputs. We describe in this paper the performance of a...

  16. Communication Protocols Augmentation in VLSI Design Applications

    Directory of Open Access Journals (Sweden)

    Kanhu Charan Padhy

    2015-05-01

    Full Text Available With the advancement in communication System, the use of various protocols got a sharp rise in the different applications. Especially in the VLSI design for FPGAs, ASICS, CPLDs, the application areas got expanded to FPGA based technologies. Today, it has moved from commercial application to the defence sectors like missiles & aerospace controls. In this paper the use of FPGAs and its interface with various application circuits in the communication field for data (textual & visual & control transfer is discussed. To be specific, the paper discusses the use of FPGA in various communication protocols like SPI, I2C, and TMDS in synchronous mode in Digital System Design using VHDL/Verilog.

  17. VLSI binary multiplier using residue number systems

    Energy Technology Data Exchange (ETDEWEB)

    Barsi, F.; Di Cola, A.

    1982-01-01

    The idea of performing multiplication of n-bit binary numbers using a hardware based on residue number systems is considered. This paper develops the design of a VLSI chip deriving area and time upper bounds of a n-bit multiplier. To perform multiplication using residue arithmetic, numbers are converted from binary to residue representation and, after residue multiplication, the result is reconverted to the original notation. It is shown that the proposed design requires an area a=o(n/sup 2/ log n) and an execution time t=o(log/sup 2/n). 7 references.

  18. Benchmarking a DSP processor

    OpenAIRE

    Lennartsson, Per; Nordlander, Lars

    2002-01-01

    This Master thesis describes the benchmarking of a DSP processor. Benchmarking means measuring the performance in some way. In this report, we have focused on the number of instruction cycles needed to execute certain algorithms. The algorithms we have used in the benchmark are all very common in signal processing today. The results we have reached in this thesis have been compared to benchmarks for other processors, performed by Berkeley Design Technology, Inc. The algorithms were programm...

  19. Algorithms, architectures and information systems security

    CERN Document Server

    Sur-Kolay, Susmita; Nandy, Subhas C; Bagchi, Aditya

    2008-01-01

    This volume contains articles written by leading researchers in the fields of algorithms, architectures, and information systems security. The first five chapters address several challenging geometric problems and related algorithms. These topics have major applications in pattern recognition, image analysis, digital geometry, surface reconstruction, computer vision and in robotics. The next five chapters focus on various optimization issues in VLSI design and test architectures, and in wireless networks. The last six chapters comprise scholarly articles on information systems security coverin

  20. Technology computer aided design simulation for VLSI MOSFET

    CERN Document Server

    Sarkar, Chandan Kumar

    2013-01-01

    Responding to recent developments and a growing VLSI circuit manufacturing market, Technology Computer Aided Design: Simulation for VLSI MOSFET examines advanced MOSFET processes and devices through TCAD numerical simulations. The book provides a balanced summary of TCAD and MOSFET basic concepts, equations, physics, and new technologies related to TCAD and MOSFET. A firm grasp of these concepts allows for the design of better models, thus streamlining the design process, saving time and money. This book places emphasis on the importance of modeling and simulations of VLSI MOS transistors and

  1. The VLSI-PLM Board: Design, Construction, and Testing

    Science.gov (United States)

    1989-03-01

    Computer Aided Design CB- Xenologic Corporation’s X-1 cache board DAS - Digital Analysis System EECS - Electrical Engineering and Computer...PLM Board is to debug the VLSI-PLM Chip [STN88] and to interface the chip to the Xenologic Corporation’s X-1 cache board. The chip is a high...a wire-wrapped board designed for debugging VLSI-PLM [STN88] and connecting VLSI- PLM to the cache board of Xenologic Corporation’s X-1 system. The

  2. Explore the Performance of the ARM Processor Using JPEG

    Directory of Open Access Journals (Sweden)

    A.D. Jadhav

    2010-01-01

    Full Text Available Recently, the evolution of embedded systems has shown a strong trend towards application- specific, single- chip solutions. The ARM processor core is a leading RISC processor architecture in the embedded domain. The ARM family of processors supports a unique feature of code size reduction. In this paper it is illustrated using an embedded platform trying to design an image encoder, more specifically a JPEG encoder using ARM7TDMI processor. Here gray scale image is used and it is coded by using keil software and same procedure is repeated by using MATLAB software for compare the results with standard one. Successfully putting a new application of JPEG on ARM7 processor.

  3. Digital optical cellular image processor (DOCIP) - Experimental implementation

    Science.gov (United States)

    Huang, K.-S.; Sawchuk, A. A.; Jenkins, B. K.; Chavel, P.; Wang, J.-M.; Weber, A. G.; Wang, C.-H.; Glaser, I.

    1993-01-01

    We demonstrate experimentally the concept of the digital optical cellular image processor architecture by implementing one processing element of a prototype optical computer that includes a 54-gate processor, an instruction decoder, and electronic input-output interfaces. The processor consists of a two-dimensional (2-D) array of 54 optical logic gates implemented by use of a liquid-crystal light valve and a 2-D array of 53 subholograms to provide interconnections between gates. The interconnection hologram is fabricated by a computer-controlled optical system.

  4. CMOS VLSI Active-Pixel Sensor for Tracking

    Science.gov (United States)

    Pain, Bedabrata; Sun, Chao; Yang, Guang; Heynssens, Julie

    2004-01-01

    An architecture for a proposed active-pixel sensor (APS) and a design to implement the architecture in a complementary metal oxide semiconductor (CMOS) very-large-scale integrated (VLSI) circuit provide for some advanced features that are expected to be especially desirable for tracking pointlike features of stars. The architecture would also make this APS suitable for robotic- vision and general pointing and tracking applications. CMOS imagers in general are well suited for pointing and tracking because they can be configured for random access to selected pixels and to provide readout from windows of interest within their fields of view. However, until now, the architectures of CMOS imagers have not supported multiwindow operation or low-noise data collection. Moreover, smearing and motion artifacts in collected images have made prior CMOS imagers unsuitable for tracking applications. The proposed CMOS imager (see figure) would include an array of 1,024 by 1,024 pixels containing high-performance photodiode-based APS circuitry. The pixel pitch would be 9 m. The operations of the pixel circuits would be sequenced and otherwise controlled by an on-chip timing and control block, which would enable the collection of image data, during a single frame period, from either the full frame (that is, all 1,024 1,024 pixels) or from within as many as 8 different arbitrarily placed windows as large as 8 by 8 pixels each. A typical prior CMOS APS operates in a row-at-a-time ( grolling-shutter h) readout mode, which gives rise to exposure skew. In contrast, the proposed APS would operate in a sample-first/readlater mode, suppressing rolling-shutter effects. In this mode, the analog readout signals from the pixels corresponding to the windows of the interest (which windows, in the star-tracking application, would presumably contain guide stars) would be sampled rapidly by routing them through a programmable diagonal switch array to an on-chip parallel analog memory array. The

  5. Hardware multiplier processor

    Science.gov (United States)

    Pierce, Paul E.

    1986-01-01

    A hardware processor is disclosed which in the described embodiment is a memory mapped multiplier processor that can operate in parallel with a 16 bit microcomputer. The multiplier processor decodes the address bus to receive specific instructions so that in one access it can write and automatically perform single or double precision multiplication involving a number written to it with or without addition or subtraction with a previously stored number. It can also, on a single read command automatically round and scale a previously stored number. The multiplier processor includes two concatenated 16 bit multiplier registers, two 16 bit concatenated 16 bit multipliers, and four 16 bit product registers connected to an internal 16 bit data bus. A high level address decoder determines when the multiplier processor is being addressed and first and second low level address decoders generate control signals. In addition, certain low order address lines are used to carry uncoded control signals. First and second control circuits coupled to the decoders generate further control signals and generate a plurality of clocking pulse trains in response to the decoded and address control signals.

  6. Signal processor packaging design

    Science.gov (United States)

    McCarley, Paul L.; Phipps, Mickie A.

    1993-10-01

    The Signal Processor Packaging Design (SPPD) program was a technology development effort to demonstrate that a miniaturized, high throughput programmable processor could be fabricated to meet the stringent environment imposed by high speed kinetic energy guided interceptor and missile applications. This successful program culminated with the delivery of two very small processors, each about the size of a large pin grid array package. Rockwell International's Tactical Systems Division in Anaheim, California developed one of the processors, and the other was developed by Texas Instruments' (TI) Defense Systems and Electronics Group (DSEG) of Dallas, Texas. The SPPD program was sponsored by the Guided Interceptor Technology Branch of the Air Force Wright Laboratory's Armament Directorate (WL/MNSI) at Eglin AFB, Florida and funded by SDIO's Interceptor Technology Directorate (SDIO/TNC). These prototype processors were subjected to rigorous tests of their image processing capabilities, and both successfully demonstrated the ability to process 128 X 128 infrared images at a frame rate of over 100 Hz.

  7. Recursive Architecture for the Forward and Inverse Modified Discrete Cosine Transfer

    Institute of Scientific and Technical Information of China (English)

    LUJunming; JIANGGuoxiong; LINZhenghui

    2003-01-01

    Recursive algorithms have been found very effective for realization using software and very large scale integrated circuit (VLSI) techniques. In this paper, an efficient recursive algorithm for the forward and inverse modified discrete cosine transfer (IMDCT) with arbitrary length is presented. This new recursive structure for the transform kernels of the MDCT and IMDCT can reduce the computational complexity. The proposed regular ar-chitecture is particularly suitable for parallel VLSI realiza-tion.

  8. Token-Aware Completion Functions for Elastic Processor Verification

    Directory of Open Access Journals (Sweden)

    Sudarshan K. Srinivasan

    2009-01-01

    Full Text Available We develop a formal verification procedure to check that elastic pipelined processor designs correctly implement their instruction set architecture (ISA specifications. The notion of correctness we use is based on refinement. Refinement proofs are based on refinement maps, which—in the context of this problem—are functions that map elastic processor states to states of the ISA specification model. Data flow in elastic architectures is complicated by the insertion of any number of buffers in any place in the design, making it hard to construct refinement maps for elastic systems in a systematic manner. We introduce token-aware completion functions, which incorporate a mechanism to track the flow of data in elastic pipelines, as a highly automated and systematic approach to construct refinement maps. We demonstrate the efficiency of the overall verification procedure based on token-aware completion functions using six elastic pipelined processor models based on the DLX architecture.

  9. The 1992 4th NASA SERC Symposium on VLSI Design

    Science.gov (United States)

    Whitaker, Sterling R.

    1992-01-01

    Papers from the fourth annual NASA Symposium on VLSI Design, co-sponsored by the IEEE, are presented. Each year this symposium is organized by the NASA Space Engineering Research Center (SERC) at the University of Idaho and is held in conjunction with a quarterly meeting of the NASA Data System Technology Working Group (DSTWG). One task of the DSTWG is to develop new electronic technologies that will meet next generation electronic data system needs. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The NASA SERC is proud to offer, at its fourth symposium on VLSI design, presentations by an outstanding set of individuals from national laboratories, the electronics industry, and universities. These speakers share insights into next generation advances that will serve as a basis for future VLSI design.

  10. Interaction of algorithm and implementation for analog VLSI stereo vision

    Science.gov (United States)

    Hakkarainen, J. M.; Little, James J.; Lee, Hae-Seung; Wyatt, John L., Jr.

    1991-07-01

    Design of a high-speed stereo vision system in analog VLSI technology is reported. The goal is to determine how the advantages of analog VLSI--small area, high speed, and low power-- can be exploited, and how the effects of its principal disadvantages--limited accuracy, inflexibility, and lack of storage capacity--can be minimized. Three stereo algorithms are considered, and a simulation study is presented to examine details of the interaction between algorithm and analog VLSI implementation. The Marr-Poggio-Drumheller algorithm is shown to be best suited for analog VLSI implementation. A CCD/CMOS stereo system implementation is proposed, capable of operation at 6000 image frame pairs per second for 48 X 48 images, and faster than frame rate operation on 256 X 256 binocular image pairs.

  11. DART: A Functional-Level Reconfigurable Architecture for High Energy Efficiency

    Directory of Open Access Journals (Sweden)

    Sébastien Pillement

    2007-12-01

    Full Text Available Flexibility becomes a major concern for the development of multimedia and mobile communication systems, as well as classical high-performance and low-energy consumption constraints. The use of general-purpose processors solves flexibility problems but fails to cope with the increasing demand for energy efficiency. This paper presents the DART architecture based on the functional-level reconfiguration paradigm which allows a significant improvement in energy efficiency. DART is built around a hierarchical interconnection network allowing high flexibility while keeping the power overhead low. To enable specific optimizations, DART supports two modes of reconfiguration. The compilation framework is built using compilation and high-level synthesis techniques. A 3G mobile communication application has been implemented as a proof of concept. The energy distribution within the architecture and the physical implementation are also discussed. Finally, the VLSI design of a 0.13 μm CMOS SoC implementing a specialized DART cluster is presented.

  12. Finite Field Arithmetic Architecture Based on Cellular Array

    Directory of Open Access Journals (Sweden)

    Kee-Won Kim

    2015-05-01

    Full Text Available Recently, various finite field arithmetic structures are introduced for VLSI circuit implementation on cryptosystems and error correcting codes. In this study, we present an efficient finite field arithmetic architecture based on cellular semi-systolic array for Montgomery multiplication by choosing a proper Montgomery factor which is highly suitable for the design on parallel structures. Therefore, our architecture has reduced a time complexity by 50% compared to typical architecture.

  13. NASA Space Engineering Research Center for VLSI System Design

    Science.gov (United States)

    1993-01-01

    This annual report outlines the activities of the past year at the NASA SERC on VLSI Design. Highlights for this year include the following: a significant breakthrough was achieved in utilizing commercial IC foundries for producing flight electronics; the first two flight qualified chips were designed, fabricated, and tested and are now being delivered into NASA flight systems; and a new technology transfer mechanism has been established to transfer VLSI advances into NASA and commercial systems.

  14. Design and Verification of High-Speed VLSI Physical Design

    Institute of Scientific and Technical Information of China (English)

    Dian Zhou; Rui-Ming Li

    2005-01-01

    With the rapid development of deep submicron (DSM) VLSI circuit designs, many issues such as time closure and power consumption are making the physical designs more and more challenging. In this review paper we provide readers with some recent progress of the VLSI physical designs. The recent developments of floorplanning and placement,interconnect effects, modeling and delay, buffer insertion and wire sizing, circuit order reduction, power grid analysis,parasitic extraction, and clock signal distribution are briefly reviewed.

  15. The Milstar Advanced Processor

    Science.gov (United States)

    Tjia, Khiem-Hian; Heely, Stephen D.; Morphet, John P.; Wirick, Kevin S.

    The Milstar Advanced Processor (MAP) is a 'drop-in' replacement for its predecessor which preserves existing interfaces with other Milstar satellite processors and minimizes the impact of such upgrading to already-developed application software. In addition to flight software development, and hardware development that involves the application of VHSIC technology to the electrical design, the MAP project is developing two sophisticated and similar test environments. High density RAM and ROM are employed by the MAP memory array. Attention is given to the fine-pitch VHSIC design techniques and lead designs used, as well as the tole of TQM and concurrent engineering in the development of the MAP manufacturing process.

  16. VLSI Circuit Configuration Using Satisfiability Logic in Hopfield Network

    Directory of Open Access Journals (Sweden)

    Mohd Asyraf Mansor

    2016-09-01

    Full Text Available Very large scale integration (VLSI circuit comprises of integrated circuit (IC with transistors in a single chip, widely used in many sophisticated electronic devices. In our paper, we proposed VLSI circuit design by implementing satisfiability problem in Hopfield neural network as circuit verification technique. We restrict our logic construction to 2-Satisfiability (2-SAT and 3- Satisfiability (3-SAT clauses in order to suit with the transistor configuration in VLSI circuit. In addition, we developed VLSI circuit based on Hopfield neural network in order to detect any possible error earlier than the manual circuit design. Microsoft Visual C++ 2013 is used as a platform for training, testing and validating of our proposed design. Hence, the performance of our proposed technique evaluated based on global VLSI configuration, circuit accuracy and the runtime. It has been observed that the VLSI circuits (HNN-2SAT and HNN-3SAT circuit developed by proposed design are better than the conventional circuit due to the early error detection in our circuit.

  17. Hardware Synchronization for Embedded Multi-Core Processors

    DEFF Research Database (Denmark)

    Stoif, Christian; Schoeberl, Martin; Liccardi, Benito

    2011-01-01

    Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper, establi......Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper...

  18. VLSI Design of a Turbo Decoder

    Science.gov (United States)

    Fang, Wai-Chi

    2007-01-01

    A very-large-scale-integrated-circuit (VLSI) turbo decoder has been designed to serve as a compact, high-throughput, low-power, lightweight decoder core of a receiver in a data-communication system. In a typical contemplated application, such a decoder core would be part of a single integrated circuit that would include the rest of the receiver circuitry and possibly some or all of the transmitter circuitry, all designed and fabricated together according to an advanced communication-system-on-a-chip design concept. Turbo codes are forward-error-correction (FEC) codes. Relative to older FEC codes, turbo codes enable communication at lower signal-to-noise ratios and offer greater coding gain. In addition, turbo codes can be implemented by relatively simple hardware. Therefore, turbo codes have been adopted as standard for some advanced broadband communication systems.

  19. Analog VLSI neural network integrated circuits

    Science.gov (United States)

    Kub, F. J.; Moon, K. K.; Just, E. A.

    1991-01-01

    Two analog very large scale integration (VLSI) vector matrix multiplier integrated circuit chips were designed, fabricated, and partially tested. They can perform both vector-matrix and matrix-matrix multiplication operations at high speeds. The 32 by 32 vector-matrix multiplier chip and the 128 by 64 vector-matrix multiplier chip were designed to perform 300 million and 3 billion multiplications per second, respectively. An additional circuit that has been developed is a continuous-time adaptive learning circuit. The performance achieved thus far for this circuit is an adaptivity of 28 dB at 300 KHz and 11 dB at 15 MHz. This circuit has demonstrated greater than two orders of magnitude higher frequency of operation than any previous adaptive learning circuit.

  20. Relaxation Based Electrical Simulation for VLSI Circuits

    Directory of Open Access Journals (Sweden)

    S. Rajkumar

    2012-06-01

    Full Text Available Electrical circuit simulation was one of the first CAD tools developed for IC design. The conventional circuit simulators like SPICE and ASTAP were designed initially for the cost effective analysis of circuits containing a few hundred transistors or less. A number of approaches have been used to improve the performances of congenital circuit simulators for the analysis of large circuits. Thereafter relaxation methods was proposed to provide more accurate waveforms than standard circuit simulators with up to two orders of magnitude speed improvement for large circuits. In this paper we have tried to highlights recently used waveform and point relaxation techniques for simulation of VLSI circuits. We also propose a simple parallelization technique and experimentally demonstrate that we can solve digital circuits with tens of million transistors in a few hours.

  1. Multi-net optimization of VLSI interconnect

    CERN Document Server

    Moiseev, Konstantin; Wimer, Shmuel

    2015-01-01

    This book covers layout design and layout migration methodologies for optimizing multi-net wire structures in advanced VLSI interconnects. Scaling-dependent models for interconnect power, interconnect delay and crosstalk noise are covered in depth, and several design optimization problems are addressed, such as minimization of interconnect power under delay constraints, or design for minimal delay in wire bundles within a given routing area. A handy reference or a guide for design methodologies and layout automation techniques, this book provides a foundation for physical design challenges of interconnect in advanced integrated circuits.  • Describes the evolution of interconnect scaling and provides new techniques for layout migration and optimization, focusing on multi-net optimization; • Presents research results that provide a level of design optimization which does not exist in commercially-available design automation software tools; • Includes mathematical properties and conditions for optimal...

  2. PLA realizations for VLSI state machines

    Science.gov (United States)

    Gopalakrishnan, S.; Whitaker, S.; Maki, G.; Liu, K.

    1990-01-01

    A major problem associated with state assignment procedures for VLSI controllers is obtaining an assignment that produces minimal or near minimal logic. The key item in Programmable Logic Array (PLA) area minimization is the number of unique product terms required by the design equations. This paper presents a state assignment algorithm for minimizing the number of product terms required to implement a finite state machine using a PLA. Partition algebra with predecessor state information is used to derive a near optimal state assignment. A maximum bound on the number of product terms required can be obtained by inspecting the predecessor state information. The state assignment algorithm presented is much simpler than existing procedures and leads to the same number of product terms or less. An area-efficient PLA structure implemented in a 1.0 micron CMOS process is presented along with a summary of the performance for a controller implemented using this design procedure.

  3. Interactive Digital Signal Processor

    Science.gov (United States)

    Mish, W. H.

    1985-01-01

    Interactive Digital Signal Processor, IDSP, consists of set of time series analysis "operators" based on various algorithms commonly used for digital signal analysis. Processing of digital signal time series to extract information usually achieved by applications of number of fairly standard operations. IDSP excellent teaching tool for demonstrating application for time series operators to artificially generated signals.

  4. Beyond processor sharing

    NARCIS (Netherlands)

    Aalto, S.; Ayesta, U.; Borst, S.C.; Misra, V.; Núñez Queija, R.

    2007-01-01

    While the (Egalitarian) Processor-Sharing (PS) discipline offers crucial insights in the performance of fair resource allocation mechanisms, it is inherently limited in analyzing and designing differentiated scheduling algorithms such as Weighted Fair Queueing and Weighted Round-Robin. The Discrimin

  5. Conversion via software of a simd processor into a mimd processor

    Energy Technology Data Exchange (ETDEWEB)

    Guzman, A.; Gerzso, M.; Norkin, K.B.; Vilenkin, S.Y.

    1983-01-01

    A method is described which takes a pure LISP program and automatically decomposes it via automatic parallelization into several parts, one for each processor of an SIMD architecture. Each of these parts is a different execution flow, i.e., a different program. The execution of these different programs by an SIMD architecture is examined. The method has been developed in some detail for the PS-2000, an SIMD Soviet multiprocessor, making it behave like AHR, a Mexican MIMD multi-microprocessor. Both the PS-2000 and AHR execute a pure LISP program in parallel; its decomposition into >n> pieces, their synchronization, scheduling, etc., are performed by the system (hardware and software). In order to achieve simultaneous execution of different programs in an SIMD processor, the method uses a scheme of node scheduling and node exportation. 14 references.

  6. Design and Evaluation of Fault-Tolerant VLSI/WSI Processor Arrays.

    Science.gov (United States)

    1987-12-31

    way that production quantitiis. However, the -arR-- adata item is not only used when it is input advantages of this technology are nut fully FIR, IIR...rla’lt fP i m iodhile is r , -aIict i tn o’ raftlier low% wh~en N i. very big . ’lf’s i 𔃾n -V4let2t Ito Ill’ statement thlat, regardless of the relia

  7. The Central Trigger Processor (CTP)

    CERN Multimedia

    Franchini, Matteo

    2016-01-01

    The Central Trigger Processor (CTP) receives trigger information from the calorimeter and muon trigger processors, as well as from other sources of trigger. It makes the Level-1 decision (L1A) based on a trigger menu.

  8. VLSI circuit techniques and technologies for ultrahigh speed data conversion interfaces

    Science.gov (United States)

    Wooley, Bruce A.

    1991-04-01

    The performance of digital VLSI signal processing and communications systems is often limited by the data conversion interfaces between digital system-level components and the analog environment in which those components are embedded. The focus of this program has been research into the fundamental nature of such interfaces in systems that digitally process high-bandwidth signals for purposes such as radar imaging, high-resolution graphics, high-definition video, mobile and fiber-optic communications, and broadband instrumentation. Effort has been devoted to the study of both generic circuit functions, such as sampling and comparison, and architectural alternatives relevant to the implementation of high-speed data converters in present and emerging VLSI technologies. Specific results of the research include the design and realization of novel low-power CMOS and BiCMOS sampled-data comparators operating at rates as high as 200 MHz, the exploration of various design approaches to the implementation of high-speed sample-and-hold circuits in CMOS and BiCMOS technologies, and the design of a subranging CMOS analog-to-digital converter that provides 12-bit resolution at a conversion rate of 10 MHz.

  9. A Domain Specific DSP Processor

    OpenAIRE

    Tell, Eric

    2001-01-01

    This thesis describes the design of a domain specific DSP processor. The thesis is divided into two parts. The first part gives some theoretical background, describes the different steps of the design process (both for DSP processors in general and for this project) and motivates the design decisions made for this processor. The second part is a nearly complete design specification. The intended use of the processor is as a platform for hardware acceleration units. Support for this has howe...

  10. Processor register error correction management

    Science.gov (United States)

    Bose, Pradip; Cher, Chen-Yong; Gupta, Meeta S.

    2016-12-27

    Processor register protection management is disclosed. In embodiments, a method of processor register protection management can include determining a sensitive logical register for executable code generated by a compiler, generating an error-correction table identifying the sensitive logical register, and storing the error-correction table in a memory accessible by a processor. The processor can be configured to generate a duplicate register of the sensitive logical register identified by the error-correction table.

  11. Embedding Binary Tree in VLSI/WSI Processor Array

    Institute of Scientific and Technical Information of China (English)

    陈宗汉

    1996-01-01

    Many reconfiguration schemes for fault-tolerant binary tree architectures have been proposed in the literature[1-6].The VLSI layouts of most previous studies are based on the classical H-tree layout,resulting in low area utilization and likely an unnecessarily high maufacturing cost simply due to the waste of a significant portion of silicon area. In this paper,we present an area-efficient approach to the reconfigurable binary tree architecture.Area utilization and interconnection complexity of our design compare favorably with the other known approaches.In the reliability analysis,we take into account the fact that accepted chips(after fabrication)are with different degrees of redundancy initially,so as to obtain results which better reflect real situations.

  12. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2010-12-01

    Full Text Available As the processor architecture becomes more advanced, Intel introduced its Intel Core 2 Duo series processors. Performance impact on Intel Core 2 Duo processors are analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT @1 GHz, @2 GHz and @3 GHz Intel Core 2 Duo series processors frequency. Our results show the performance scalability in Intel Core 2 Duo series processors. Even though AI benchmarks have similar execution time, they have dissimilar characteristics which are identified using principal component analysis and dendogram. As the processor frequency increased from 1.8 GHz to 3.167 GHz the execution time is decreased by ~370 sec for AI workloads. In the case of Physics/Quantum Computing programs it was ~940 sec.

  13. Time-Predictable Computer Architecture

    Directory of Open Access Journals (Sweden)

    Schoeberl Martin

    2009-01-01

    Full Text Available Today's general-purpose processors are optimized for maximum throughput. Real-time systems need a processor with both a reasonable and a known worst-case execution time (WCET. Features such as pipelines with instruction dependencies, caches, branch prediction, and out-of-order execution complicate WCET analysis and lead to very conservative estimates. In this paper, we evaluate the issues of current architectures with respect to WCET analysis. Then, we propose solutions for a time-predictable computer architecture. The proposed architecture is evaluated with implementation of some features in a Java processor. The resulting processor is a good target for WCET analysis and still performs well in the average case.

  14. Cache Energy Optimization Techniques For Modern Processors

    Energy Technology Data Exchange (ETDEWEB)

    Mittal, Sparsh [ORNL

    2013-01-01

    Modern multicore processors are employing large last-level caches, for example Intel's E7-8800 processor uses 24MB L3 cache. Further, with each CMOS technology generation, leakage energy has been dramatically increasing and hence, leakage energy is expected to become a major source of energy dissipation, especially in last-level caches (LLCs). The conventional schemes of cache energy saving either aim at saving dynamic energy or are based on properties specific to first-level caches, and thus these schemes have limited utility for last-level caches. Further, several other techniques require offline profiling or per-application tuning and hence are not suitable for product systems. In this book, we present novel cache leakage energy saving schemes for single-core and multicore systems; desktop, QoS, real-time and server systems. Also, we present cache energy saving techniques for caches designed with both conventional SRAM devices and emerging non-volatile devices such as STT-RAM (spin-torque transfer RAM). We present software-controlled, hardware-assisted techniques which use dynamic cache reconfiguration to configure the cache to the most energy efficient configuration while keeping the performance loss bounded. To profile and test a large number of potential configurations, we utilize low-overhead, micro-architecture components, which can be easily integrated into modern processor chips. We adopt a system-wide approach to save energy to ensure that cache reconfiguration does not increase energy consumption of other components of the processor. We have compared our techniques with state-of-the-art techniques and have found that our techniques outperform them in terms of energy efficiency and other relevant metrics. The techniques presented in this book have important applications in improving energy-efficiency of higher-end embedded, desktop, QoS, real-time, server processors and multitasking systems. This book is intended to be a valuable guide for both

  15. Conversion of an 8-bit to a 16-bit Soft-core RISC Processor

    Directory of Open Access Journals (Sweden)

    Ahmad Jamal Salim

    2013-03-01

    Full Text Available The demand for 8-bit processors nowadays is still going strong despite efforts by manufacturers in producing higher end microcontroller solutions to the mass market. Low-end processor offers a simple, low-cost and fast solution especially on I/O applications development in embedded system. However, due to architectural constraint, complex calculation could not be performed efficiently on 8-bit processor. This paper presents the conversion method from an 8-bit to a 16-bit Reduced Instruction Set Computer (RISC processor in a soft-core reconfigurable platform in order to extend its capability in handling larger data sets thus enabling intensive calculations process. While the conversion expands the data bus width to 16-bit, it also maintained the simple architecture design of an 8-bit processor.The expansion also provides more room for improvement to the processor’s performance. The modified architecture is successfully simulated in CPUSim together with its new instruction set architecture (ISA. Xilinx Virtex-6 platform is utilized to execute and verified the architecture. Results show that the modified 16-bit RISC architecture only required 17% more register slice on Field Programmable Gate Array (FPGA implementation which is a slight increase compared to the original 8-bit RISC architecture. A test program containing instruction sets that handle 16-bit data are also simulated and verified. As the 16-bit architecture is described as a soft-core, further modifications could be performed in order to customize the architecture to suit any specific applications.

  16. A Shared Memory Module for Asynchronous Arrays of Processors

    Directory of Open Access Journals (Sweden)

    Zhiyi Yu

    2007-05-01

    Full Text Available A shared memory module connecting multiple independently clocked processors is presented. The memory module itself is independently clocked, supports hardware address generation, mutual exclusion, and multiple addressing modes. The architecture supports independent address generation and data generation/consumption by different processors which increases efficiency and simplifies programming for many embedded and DSP tasks. Simultaneous access by different processors is arbitrated using a least-recently-serviced priority scheme. Simulations show high throughputs over a variety of memory loads. A standard cell implementation shares an 8 K-word SRAM among four processors, and can support a 64 K-word SRAM with no additional changes. It cycles at 555 MHz and occupies 1.2 mm2 in 0.18 μm CMOS.

  17. A Shared Memory Module for Asynchronous Arrays of Processors

    Directory of Open Access Journals (Sweden)

    Meeuwsen MichaelJ

    2007-01-01

    Full Text Available A shared memory module connecting multiple independently clocked processors is presented. The memory module itself is independently clocked, supports hardware address generation, mutual exclusion, and multiple addressing modes. The architecture supports independent address generation and data generation/consumption by different processors which increases efficiency and simplifies programming for many embedded and DSP tasks. Simultaneous access by different processors is arbitrated using a least-recently-serviced priority scheme. Simulations show high throughputs over a variety of memory loads. A standard cell implementation shares an 8 K-word SRAM among four processors, and can support a 64 K-word SRAM with no additional changes. It cycles at 555 MHz and occupies 1.2 mm2 in 0.18 μm CMOS.

  18. Space and frequency-multiplexed optical linear algebra processor - Fabrication and initial tests

    Science.gov (United States)

    Casasent, D.; Jackson, J.

    1986-01-01

    A new optical linear algebra processor architecture is described. Space and frequency-multiplexing are used to accommodate bipolar and complex-valued data. A fabricated laboratory version of this processor is described, the electronic support system used is discussed, and initial test data obtained on it are presented.

  19. Stereoscopic Optical Signal Processor

    Science.gov (United States)

    Graig, Glenn D.

    1988-01-01

    Optical signal processor produces two-dimensional cross correlation of images from steroscopic video camera in real time. Cross correlation used to identify object, determines distance, or measures movement. Left and right cameras modulate beams from light source for correlation in video detector. Switch in position 1 produces information about range of object viewed by cameras. Position 2 gives information about movement. Position 3 helps to identify object.

  20. Concurrent Smalltalk on the Message-Driven Processor

    Science.gov (United States)

    1991-09-01

    NDPSim -x 2 -y 2 - maize Ox1O00 ::Coamoa:Coamoa.m NewFact.mdp Message-Driven Processor Simulator Version 7.0 Rev B Accompanies MDP Architecture Document...it could peel invocations of recursive functions forever. However, the sin- gle pass of inlining does not mean that functions are only inlined one

  1. Processor Management in the Tera MTA Computer System,

    Science.gov (United States)

    1993-01-01

    This paper describes the processor scheduling issues specific to the Tera MTA (Multi Threaded Architecture) computer system and presents solutions to...classic scheduling problems. The Tera MTA exploits parallelism at all levels, from fine-grained instruction-level parallelism within a single

  2. Knowledge-based synthesis of custom VLSI physical design tools: First steps

    Science.gov (United States)

    Setliff, Dorothy E.; Rutenbar, Rob A.

    A description is given of how program synthesis techniques can be applied to the synthesis of technology-sensitive VLSI physical design tools. Physical design refers to the process of reducing a structural description of a piece of hardware down to the geometric layout of an integrated circuit. Successful physical design tools must cope with shifting technology and application environments. The goal is to automatically generate a tool's implementation to match the application. The authors describe a synthesis architecture that combines knowledge of the application domain and knowledge of generic programming mechanics. The approach uses a very high-level language to describe algorithms, domain and programming knowledge to select appropriate algorithms and data structures, and code generation to arrive at final executable code. Results are presented detailing the performance and implementation of ELF, a prototype generator for wire-routing applications. Comparisons between a hand-crafted router and an automatically synthesized router are presented.

  3. VLSI design of 3D display processing chip for binocular stereo displays

    Institute of Scientific and Technical Information of China (English)

    Ge Chenyang; Zheng Nanning

    2010-01-01

    In order to develop the core chip supporting binocular stereo displays for head mounted display(HMD)and glasses-TV,a very large scale integrated(VLSI)design scheme is proposed by using a pipeline architecture for 3D display processing chip(HMD100).Some key techniques including stereo display processing and high precision video scaling based bicubic interpolation,and their hardware implementations which improve the image quality are presented.The proposed HMD100 chip is verified by the field-programmable gate array(FPGA).As one of innovative and high integration SoC chips,HMD100 is designed by a digital and analog mixed circuit.It can support binocular stereo display,has better scaling effect and integration.Hence it is applicable in virtual reality(VR),3D games and other microdisplay domains.

  4. High Speed Continuous-Time Bandpass Σ∆ADC for Mixed Signal VLSI Chips

    Directory of Open Access Journals (Sweden)

    P.A.HarshaVardhini

    2012-04-01

    Full Text Available With the unremitting progress in VLSI technology, there is a commensurate increase in performance demand on analog to digital converter and are now being applied to wide band communication systems. sigma Delta (Σ∆ converter is a popular technique for obtaining high resolution with relatively small bandwidth. Σ∆ ADCs which trade sampling speed for resolution can benefit from the speed advantages of nm-CMOS technologies. This paper compares various Band pass sigma Delta ADC architectures, both continuous-time and discrete-time, in respect of power consumption and SNDR. Design of 2nd order multi bit continuous time band pass Σ∆ modulator is discussed with the methods to resolve DAC non-idealities.

  5. High Speed Continuous-Time Bandpass Σ∆ADC for Mixed Signal VLSI Chips

    Directory of Open Access Journals (Sweden)

    M.Madhavi Latha

    2012-05-01

    Full Text Available With the unremitting progress in VLSI technology, there is a commensurate increase in performance demand on analog to digital converter and are now being applied to wideband communication systems. sigma Delta (Σ∆ converter is a popular technique for obtaining high resolution with relatively small bandwidth. Σ∆ ADCs which trade sampling speed for resolution can benefit from the speed advantages of nm-CMOS technologies. This paper compares various Band pass sigma Delta ADC architectures, both continuous-time and discrete-time, in respect of power consumption and SNDR. Design of 2nd order multibit continuous time band pass Σ∆ modulator is discussed with the methods to resolve DAC non-idealities.

  6. A dynamic CMOS multiplier for analog VLSI based on exponential pulse-decay modulation

    Science.gov (United States)

    Massengill, Lloyd W.

    1991-03-01

    A clocked, charge-based, CMOS modulator circuit is presented. The circuit, which performs a semilinear multiplication function, has applications in arrayed analog VLSI architectures such as parallel filters and neural network systems. The design presented is simple in structure, uses no operational amplifiers for the actual multiplication function, and uses no power in the static mode. Two-quadrant weighting of an input signal is accomplished by control of the magnitude and decay time of an exponential current pulse, resulting in the delivery of charge packets to a shared capacitive summing bus. The cell is modular in structure and can be fabricated in a standard CMOS process. An analytical derivation of the operation of the circuit, SPICE simulations, and MOSIS fabrication results are presented. The simulation studies indicate that the circuit is inherently tolerant to temperature effects, absolute device sizing errors, and clock-feedthrough transients.

  7. An Opto-VLSI-based reconfigurable optical adddrop multiplexer employing an off-axis 4-f imaging system.

    Science.gov (United States)

    Shen, Mingya; Xiao, Feng; Ahderom, Selam; Alameh, Kamal

    2009-08-03

    A novel reconfigurable optical add-drop multiplexer (ROADM) structure is proposed and demonstrated experimentally. The ROADM structure employs two arrayed waveguide gratings (AWGs), an array of optical fiber pairs, an array of 4-f imaging microlenses that are offset in relation to the axis of symmetry of the fiber pairs, and a reconfigurable Opto-VLSI processor that switches various wavelength channels between the fiber pairs to achieve add or drop multiplexing. Experimental results are shown, which demonstrate the principle of add/drop multiplexing with crosstalk of less than -27dB and insertion loss of less than 8dB over the Cband for drop and through operation modes.

  8. ProperCAD: A portable object-oriented parallel environment for VLSI CAD

    Science.gov (United States)

    Ramkumar, Balkrishna; Banerjee, Prithviraj

    1993-01-01

    Most parallel algorithms for VLSI CAD proposed to date have one important drawback: they work efficiently only on machines that they were designed for. As a result, algorithms designed to date are dependent on the architecture for which they are developed and do not port easily to other parallel architectures. A new project under way to address this problem is described. A Portable object-oriented parallel environment for CAD algorithms (ProperCAD) is being developed. The objectives of this research are (1) to develop new parallel algorithms that run in a portable object-oriented environment (CAD algorithms using a general purpose platform for portable parallel programming called CARM is being developed and a C++ environment that is truly object-oriented and specialized for CAD applications is also being developed); and (2) to design the parallel algorithms around a good sequential algorithm with a well-defined parallel-sequential interface (permitting the parallel algorithm to benefit from future developments in sequential algorithms). One CAD application that has been implemented as part of the ProperCAD project, flat VLSI circuit extraction, is described. The algorithm, its implementation, and its performance on a range of parallel machines are discussed in detail. It currently runs on an Encore Multimax, a Sequent Symmetry, Intel iPSC/2 and i860 hypercubes, a NCUBE 2 hypercube, and a network of Sun Sparc workstations. Performance data for other applications that were developed are provided: namely test pattern generation for sequential circuits, parallel logic synthesis, and standard cell placement.

  9. VLSI micro- and nanophotonics science, technology, and applications

    CERN Document Server

    Lee, El-Hang; Razeghi, Manijeh; Jagadish, Chennupati

    2011-01-01

    Addressing the growing demand for larger capacity in information technology, VLSI Micro- and Nanophotonics: Science, Technology, and Applications explores issues of science and technology of micro/nano-scale photonics and integration for broad-scale and chip-scale Very Large Scale Integration photonics. This book is a game-changer in the sense that it is quite possibly the first to focus on ""VLSI Photonics"". Very little effort has been made to develop integration technologies for micro/nanoscale photonic devices and applications, so this reference is an important and necessary early-stage pe

  10. Handbook of VLSI chip design and expert systems

    CERN Document Server

    Schwarz, A F

    1993-01-01

    Handbook of VLSI Chip Design and Expert Systems provides information pertinent to the fundamental aspects of expert systems, which provides a knowledge-based approach to problem solving. This book discusses the use of expert systems in every possible subtask of VLSI chip design as well as in the interrelations between the subtasks.Organized into nine chapters, this book begins with an overview of design automation, which can be identified as Computer-Aided Design of Circuits and Systems (CADCAS). This text then presents the progress in artificial intelligence, with emphasis on expert systems.

  11. AN ALGORITHM FOR ASSEMBLING A COMMON IMAGE OF VLSI LAYOUT

    Directory of Open Access Journals (Sweden)

    Y. Y. Lankevich

    2015-01-01

    Full Text Available We consider problem of assembling a common image of VLSI layout. Common image is composedof frames obtained by electron microscope photographing. Many frames require a lot of computation for positioning each frame inside the common image. Employing graphics processing units enables acceleration of computations. We realize algorithms and programs for assembling a common image of VLSI layout. Specificity of this work is to use abilities of CUDA to reduce computation time. Experimental results show efficiency of the proposed programs.

  12. Software-Reconfigurable Processors for Spacecraft

    Science.gov (United States)

    Farrington, Allen; Gray, Andrew; Bell, Bryan; Stanton, Valerie; Chong, Yong; Peters, Kenneth; Lee, Clement; Srinivasan, Jeffrey

    2005-01-01

    A report presents an overview of an architecture for a software-reconfigurable network data processor for a spacecraft engaged in scientific exploration. When executed on suitable electronic hardware, the software performs the functions of a physical layer (in effect, acts as a software radio in that it performs modulation, demodulation, pulse-shaping, error correction, coding, and decoding), a data-link layer, a network layer, a transport layer, and application-layer processing of scientific data. The software-reconfigurable network processor is undergoing development to enable rapid prototyping and rapid implementation of communication, navigation, and scientific signal-processing functions; to provide a long-lived communication infrastructure; and to provide greatly improved scientific-instrumentation and scientific-data-processing functions by enabling science-driven in-flight reconfiguration of computing resources devoted to these functions. This development is an extension of terrestrial radio and network developments (e.g., in the cellular-telephone industry) implemented in software running on such hardware as field-programmable gate arrays, digital signal processors, traditional digital circuits, and mixed-signal application-specific integrated circuits (ASICs).

  13. Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA

    Science.gov (United States)

    Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei

    2013-03-01

    With the development of international wireless communication standards, there is an increase in computational requirement for baseband signal processors. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to its high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as promising technology to replace traditional ASIC and FPGA fashions. In addition, there are large numbers of parallel data processed in computation-intensive functions, which fosters the development of single instruction multiple data (SIMD) architecture in SDR platform. So a new way must be found to prototype the SDR processors efficiently. In this paper we present a bit-and-cycle accurate model of programmable SIMD SDR processors in a machine description language LISA. LISA is a language for instruction set architecture which can gain rapid model at architectural level. In order to evaluate the availability of our proposed processor, three common baseband functions, FFT, FIR digital filter and matrix multiplication have been mapped on the SDR platform. Analytical results showed that the SDR processor achieved the maximum of 47.1% performance boost relative to the opponent processor.

  14. A Systolic Array RLS Processor

    OpenAIRE

    Asai, T.; Matsumoto, T.

    2000-01-01

    This paper presents the outline of the systolic array recursive least-squares (RLS) processor prototyped primarily with the aim of broadband mobile communication applications. To execute the RLS algorithm effectively, this processor uses an orthogonal triangularization technique known in matrix algebra as QR decomposition for parallel pipelined processing. The processor board comprises 19 application-specific integrated circuit chips, each with approximately one million gates. Thirty-two bit ...

  15. AMD's 64-bit Opteron processor

    CERN Document Server

    CERN. Geneva

    2003-01-01

    This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included.BiographiesDavid RichDavid directs AMD's efforts in high performance computing and also in the use of Opteron processors...

  16. Emerging Trends in Embedded Processors

    Directory of Open Access Journals (Sweden)

    Gurvinder Singh

    2014-05-01

    Full Text Available An Embedded Processors is simply a µProcessors that has been “Embedded” into a device. Embedded systems are important part of human life. For illustration, one cannot visualize life without mobile phones for personal communication. Embedded systems are used in many places like healthcare, automotive, daily life, and in different offices and industries.Embedded Processors develop new research area in the field of hardware designing.

  17. Communication Efficient Multi-processor FFT

    Science.gov (United States)

    Lennart Johnsson, S.; Jacquemin, Michel; Krawitz, Robert L.

    1992-10-01

    Computing the fast Fourier transform on a distributed memory architecture by a direct pipelined radix-2, a bi-section, or a multisection algorithm, all yield the same communications requirement, if communication for all FFT stages can be performed concurrently, the input data is in normal order, and the data allocation is consecutive. With a cyclic data allocation, or bit-reversed input data and a consecutive allocation, multi-sectioning offers a reduced communications requirement by approximately a factor of two. For a consecutive data allocation, normal input order, a decimation-in-time FFT requires that P/ N + d-2 twiddle factors be stored for P elements distributed evenly over N processors, and the axis that is subject to transformation be distributed over 2 d processors. No communication of twiddle factors is required. The same storage requirements hold for a decimation-in-frequency FFT, bit-reversed input order, and consecutive data allocation. The opposite combination of FFT type and data ordering requires a factor of log 2N more storage for N processors. The peak performance for a Connection Machine system CM-200 implementation is 12.9 Gflops/s in 32-bit precision, and 10.7 Gflops/s in 64-bit precision for unordered transforms local to each processor. The corresponding execution rates for ordered transforms are 11.1 Gflops/s and 8.5 Gflops/s, respectively. For distributed one- and two-dimensional transforms the peak performance for unordered transforms exceeds 5 Gflops/s in 32-bit precision and 3 Gflops/s in 64-bit precision. Three-dimensional transforms execute at a slightly lower rate. Distributed ordered transforms execute at a rate of about {1}/{2}to {2}/{3} of the unordered transforms.

  18. Breadboard Signal Processor for Arraying DSN Antennas

    Science.gov (United States)

    Jongeling, Andre; Sigman, Elliott; Chandra, Kumar; Trinh, Joseph; Soriano, Melissa; Navarro, Robert; Rogstad, Stephen; Goodhart, Charles; Proctor, Robert; Jourdan, Michael; hide

    2008-01-01

    A recently developed breadboard version of an advanced signal processor for arraying many antennas in NASA s Deep Space Network (DSN) can accept inputs in a 500-MHz-wide frequency band from six antennas. The next breadboard version is expected to accept inputs from 16 antennas, and a following developed version is expected to be designed according to an architecture that will be scalable to accept inputs from as many as 400 antennas. These and similar signal processors could also be used for combining multiple wide-band signals in non-DSN applications, including very-long-baseline interferometry and telecommunications. This signal processor performs functions of a wide-band FX correlator and a beam-forming signal combiner. [The term "FX" signifies that the digital samples of two given signals are fast Fourier transformed (F), then the fast Fourier transforms of the two signals are multiplied (X) prior to accumulation.] In this processor, the signals from the various antennas are broken up into channels in the frequency domain (see figure). In each frequency channel, the data from each antenna are correlated against the data from each other antenna; this is done for all antenna baselines (that is, for all antenna pairs). The results of the correlations are used to obtain calibration data to align the antenna signals in both phase and delay. Data from the various antenna frequency channels are also combined and calibration corrections are applied. The frequency-domain data thus combined are then synthesized back to the time domain for passing on to a telemetry receiver

  19. Keystone Business Models for Network Security Processors

    Directory of Open Access Journals (Sweden)

    Arthur Low

    2013-07-01

    Full Text Available Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor” models nor the silicon intellectual-property licensing (“IP-licensing” models allow small technology companies to successfully compete. This article describes an alternative approach that produces an ongoing stream of novel network security processors for niche markets through continuous innovation by both large and small companies. This approach, referred to here as the "business ecosystem model for network security processors", includes a flexible and reconfigurable technology platform, a “keystone” business model for the company that maintains the platform architecture, and an extended ecosystem of companies that both contribute and share in the value created by innovation. New opportunities for business model innovation by participating companies are made possible by the ecosystem model. This ecosystem model builds on: i the lessons learned from the experience of the first author as a senior integrated circuit architect for providers of public-key cryptography solutions and as the owner of a semiconductor startup, and ii the latest scholarly research on technology entrepreneurship, business models, platforms, and business ecosystems. This article will be of interest to all technology entrepreneurs, but it will be of particular interest to owners of small companies that provide security solutions and to specialized security professionals seeking to launch their own companies.

  20. Coordinated Energy Management in Heterogeneous Processors

    Directory of Open Access Journals (Sweden)

    Indrani Paul

    2014-01-01

    Full Text Available This paper examines energy management in a heterogeneous processor consisting of an integrated CPU–GPU for high-performance computing (HPC applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types – a new and less understood problem. We examine the intra-node CPU–GPU frequency sensitivity of HPC applications on tightly coupled CPU–GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU–GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves measured average energy-delay squared (ED2 product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.

  1. Efficiency of Cache Mechanism for Network Processors

    Institute of Scientific and Technical Information of China (English)

    XU Bo; CHANG Jian; HUANG Shimeng; XUE Yibo; LI Jun

    2009-01-01

    With the explosion of network bandwidth and the ever-changing requirements for diverse net-work-based applications, the traditional processing architectures, i.e., general purpose processor (GPP) and application specific integrated circuits (ASIC) cannot provide sufficient flexibility and high performance at the same time. Thus, the network processor (NP) has emerged as an altemative to meet these dual demands for today's network processing. The NP combines embedded multi-threaded cores with a dch memory hierarchy that can adapt to different networking circumstances when customized by the application developers. In to-day's NP architectures, muitithreading prevails over cache mechanism, which has achieved great success in GPP to hide memory access latencies. This paper focuses on the efficiency of the cache mechanism in an NP. Theoretical timing models of packet processing are established for evaluating cache efficiency and experi-ments are performed based on real-life network backbone traces. Testing results show that an improvement of neady 70% can be gained in throughput with assistance from the cache mechanism. Accordingly, the cache mechanism is still efficient and irreplaceable in network processing, despite the existing of multithreading.

  2. Reconfigurable signal processor designs for advanced digital array radar systems

    Science.gov (United States)

    Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining

    2017-05-01

    The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.

  3. A 16-Bit Fully Functional Single Cycle Processor

    Directory of Open Access Journals (Sweden)

    Nidhi Maheshwari

    2011-08-01

    Full Text Available The existing commercial microprocessors are provided as black box units, with which users are unable to monitor internal signals and operation process, neither can they modify the original structure. Inorder to solve this problem 16-bit fully functional single cycle processor is designed in terms of its architecture and its functional capabilities. The procedure of design and verification for a 16-bit processor is introduced in this paper. The key architecture elements are being described, as well as the hardware block diagram and internal structure. The summary of instruction set is presented. This processor is modify as a Very High Speed Integrated Circuit Hardware Description Language (VHDL and gives access to every internal signal. In order to consume fewer resources, the design of arithmetic logical unit (ALU is optimized. The RTL views and verified simulation results of processor are shown in this paper. The synthesis report of the design is also described. The design architecture is written in Very High Speed Integrated Circuit Hardware Description Language (VHDL code using Xilinx ISE 9.2i tool for synthesis and simulation.

  4. Parallel Processor for 3D Recovery from Optical Flow

    Directory of Open Access Journals (Sweden)

    Jose Hugo Barron-Zambrano

    2009-01-01

    Full Text Available 3D recovery from motion has received a major effort in computer vision systems in the recent years. The main problem lies in the number of operations and memory accesses to be performed by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main feature is the maximum reuse of data and the low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.

  5. Boolean approaches to graph embeddings related to VLSI

    Institute of Scientific and Technical Information of China (English)

    刘彦佩

    2001-01-01

    This paper discusses the development of Boolean methods in some topics on graph em-beddings which are related to VLSI. They are mainly the general theory of graph embeddability, the orientabilities of a graph and the rectilinear layout of an electronic circuit.

  6. Artificial immune system algorithm in VLSI circuit configuration

    Science.gov (United States)

    Mansor, Mohd. Asyraf; Sathasivam, Saratha; Kasihmuddin, Mohd Shareduwan Mohd

    2017-08-01

    In artificial intelligence, the artificial immune system is a robust bio-inspired heuristic method, extensively used in solving many constraint optimization problems, anomaly detection, and pattern recognition. This paper discusses the implementation and performance of artificial immune system (AIS) algorithm integrated with Hopfield neural networks for VLSI circuit configuration based on 3-Satisfiability problems. Specifically, we emphasized on the clonal selection technique in our binary artificial immune system algorithm. We restrict our logic construction to 3-Satisfiability (3-SAT) clauses in order to outfit with the transistor configuration in VLSI circuit. The core impetus of this research is to find an ideal hybrid model to assist in the VLSI circuit configuration. In this paper, we compared the artificial immune system (AIS) algorithm (HNN-3SATAIS) with the brute force algorithm incorporated with Hopfield neural network (HNN-3SATBF). Microsoft Visual C++ 2013 was used as a platform for training, simulating and validating the performances of the proposed network. The results depict that the HNN-3SATAIS outperformed HNN-3SATBF in terms of circuit accuracy and CPU time. Thus, HNN-3SATAIS can be used to detect an early error in the VLSI circuit design.

  7. Tungsten and other refractory metals for VLSI applications II

    Energy Technology Data Exchange (ETDEWEB)

    Broadbent, E.K.

    1987-01-01

    This book presents papers on tungsten and other refractory metals for VLSI applications. Topics include the following: Selectivity loss and nucleation on insulators, fundamental reaction and growth studies, chemical vapor deposition of tungsten, chemical vapor deposition of molybdenum, reactive ion etching of refractory metal films; and properties of refractory metals deposited by sputtering.

  8. An Interactive Multimedia Learning Environment for VLSI Built with COSMOS

    Science.gov (United States)

    Angelides, Marios C.; Agius, Harry W.

    2002-01-01

    This paper presents Bigger Bits, an interactive multimedia learning environment that teaches students about VLSI within the context of computer electronics. The system was built with COSMOS (Content Oriented semantic Modelling Overlay Scheme), which is a modelling scheme that we developed for enabling the semantic content of multimedia to be used…

  9. Scientific Computing Kernels on the Cell Processor

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Samuel W.; Shalf, John; Oliker, Leonid; Kamil, Shoaib; Husbands, Parry; Yelick, Katherine

    2007-04-04

    The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the recently-released STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on a 3.2GHz Cell blade. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.

  10. Simultaneous multithreaded processor enhanced for multimedia applications

    Science.gov (United States)

    Mombers, Friederich; Thomas, Michel

    1999-12-01

    The paper proposes a new media processor architecture specifically designed to handle state-of-the-art multimedia encoding and decoding tasks. To achieve this, the architecture efficiently exploit Data-, Instruction- and Thread-Level parallelisms while continuously adapting its computational resources to reach the most appropriate parallelism level among all the concurrent encoding/decoding processes. Looking at the implementation constraints, several critical choices were adopted that solve the interconnection delay problem, lower the cache misses and pipeline stalls effects and reduce register files and memory size by adopting a clustered Simultaneous Multithreaded Architecture. We enhanced the classic model to exploit both Instruction and Data Level Parallelism through vector instructions. The vector extension is well justified for multimedia workload and improves code density, crossbars complexity, register file ports and decoding logic area while it still provides an efficient way to fully exploit a large set of functional units. An MPEG-2 encoding algorithms based on Hybrid Genetic search has been implemented that show the efficiency of the architecture to adapt its resources allocation to better fulfill the application requirements.

  11. Template Matching on Parallel Architectures,

    Science.gov (United States)

    1985-07-01

    memory. The processors run asynchronously. Thus according to Hynn’s categories the Butterfl . is a MIMD machine. The processors of the Butterfly are...Generalized Butterfly Architecture This section describes timings for pattern matching on the generalized Butterfl .. Ihe implementations on the Butterfly...these algorithms. Thus the best implementation of the techniques on the generalized Butterfl % are the same as the implementation on the real Butterfly

  12. Macrocell Builder: IP-Block-Based Design Environment for High-Throughput VLSI Dedicated Digital Signal Processing Systems

    Directory of Open Access Journals (Sweden)

    Urard Pascal

    2006-01-01

    Full Text Available We propose an efficient IP-block-based design environment for high-throughput VLSI systems. The flow generates SystemC register-transfer-level (RTL architecture, starting from a Matlab functional model described as a netlist of functional IP. The refinement model inserts automatically control structures to manage delays induced by the use of RTL IPs. It also inserts a control structure to coordinate the execution of parallel clocked IP. The delays may be managed by registers or by counters included in the control structure. The flow has been used successfully in three real-world DSP systems. The experimentations show that the approach can produce efficient RTL architecture and allows to save huge amount of time.

  13. An Analogue VLSI Implementation of the Meddis Inner Hair Cell Model

    Directory of Open Access Journals (Sweden)

    McEwan Alistair

    2003-01-01

    Full Text Available The Meddis inner hair cell model is a widely accepted, but computationally intensive computer model of mammalian inner hair cell function. We have produced an analogue VLSI implementation of this model that operates in real time in the current domain by using translinear and log-domain circuits. The circuit has been fabricated on a chip and tested against the Meddis model for (a rate level functions for onset and steady-state response, (b recovery after masking, (c additivity, (d two-component adaptation, (e phase locking, (f recovery of spontaneous activity, and (g computational efficiency. The advantage of this circuit, over other electronic inner hair cell models, is its nearly exact implementation of the Meddis model which can be tuned to behave similarly to the biological inner hair cell. This has important implications on our ability to simulate the auditory system in real time. Furthermore, the technique of mapping a mathematical model of first-order differential equations to a circuit of log-domain filters allows us to implement real-time neuromorphic signal processors for a host of models using the same approach.

  14. VLSI Implementation of Novel Class of High Speed Pipelined Digital Signal Processing Filter for Wireless Receivers

    Directory of Open Access Journals (Sweden)

    Rozita Teymourzadeh

    2010-01-01

    Full Text Available Problem statement: The need for high performance transceiver with high Signal to Noise Ratio (SNR has driven the communication system to utilize latest technique identified as over sampling systems. It was the most economical modulator and decimation in communication system. It has been proven to increase the SNR and is used in many high performance systems such as in the Analog to Digital Converter (ADC for wireless transceiver. Approach: This research presented the design of the novel class of decimation and its VLSI implementation which was the sub-component in the over sampling technique. The design and realization of main unit of decimation stage that was the Cascaded Integrator Comb (CIC filter, the associated half band filters and the droop correction are also designed. The Verilog HDL code in Xilinx ISE environment has been derived to describe the proposed advanced CIC filter properties. Consequently, Virtex-II FPGA board was used to implement and test the design on the real hardware. The ASIC design implementation was performed accordingly and resulted power and area measurement on chip core layout. Results: The proposed design focused on the trade-off between the high speed and the low power consumption as well as the silicon area and high resolution for the chip implementation which satisfies wireless communication systems. The synthesis report illustrates the maximum clock frequency of 332 MHz with the active core area of 0.308×0.308 mm2. Conclusion: It can be concluded that VLSI implementation of proposed filter architecture is an enabler in solving problems that affect communication capability in DSP application.

  15. Low-complexity systolic architecture for inversion

    Institute of Scientific and Technical Information of China (English)

    Yuan Danshou; Rong Mengtian

    2006-01-01

    A modified extended binary Euclid's algorithm which is more regularly iterative for computing an inversion in GF(2m) is presented. Based on above modified algorithm, a serial-in serial-out architecture is proposed. It has area complexity of O(m), latency of 5m-2, and throughput of 1/m. Compared with other serial systolic architectures, the proposed one has the smallest area complexity, shorter latency. It is highly regular, modular, and thus well suited for high-speed VLSI design.

  16. Reconfigurable Architecture for Minimizing the Network Delays in the Multi-core Systems

    Directory of Open Access Journals (Sweden)

    D. Venkatavara Prasad

    2015-03-01

    Full Text Available Noc architecture performs better comparing to bus based when the number of processors is small. On the other hand bus based performs better than noc when number of the processors is large. This leads to new architecture which is hybrid bus based architecture where each node is packet switched in a mesh network of noc architecture that contains bus based system with small number of processors. Few results showed that this hybrid architecture performs optimally better than either purely noc based or purely bus based architecture. Hybrid architecture contains a processor connected to the bus, the bus in turn connected to the router. Each processor contains a private Level 1 (L1 cache. When hybrid architecture is preferable, the optimal number of processors on each bus subsystem varies based on the application. Hence the proposed architecture allows scalable bus-based multiprocessor subsystems on each node in the NoC. This system provides a multi-bus execution environment where each processor is connected to a bus and the bus-based subsystems communicate via routers connected in a mesh-style configuration. The system can be reconfigured to vary the number of bus subsystems and the number of processors on each subsystem. This architecture provides reliability and adaptability and reduces the network delays. Implementing and presenting the details of architecture and experimental results using ns2 indicating the advantages of this architecture.

  17. High Performance Ethernet Packet Processor Core for Next Generation Networks

    Directory of Open Access Journals (Sweden)

    Raja Jitendra Nayaka

    2012-10-01

    Full Text Available As the demand for high speed Internet significantly increasing to meet the requirement of large datatransfers, real-time communication and High Definition ( HD multimedia transfer over IP, the IP basednetwork products architecture must evolve and change. Application specific processors require highperformance, low power and high degree of programmability is the limitation in many general processorbased applications. This paper describes the design of Ethernet packet processor for system-on-chip (SoCwhich performs all core packet processing functions, including segmentation and reassembly, packetizationclassification, route and queue management which will speedup switching/routing performance making itmore suitable for Next Generation Networks (NGN. Ethernet packet processor design can be configuredfor use with multiple projects targeted to a FPGA device the system is designed to support 1/10/20/40/100Gigabit links with a speed and performance advantage. VHDL has been used to implement and simulatedthe required functions in FPGA.

  18. Stepping motor control processor reference manual. Volume I

    Energy Technology Data Exchange (ETDEWEB)

    Holloway, F.W.; VanArsdall, P.J.; Suski, G.J.; Gant, R.G.; Rash, M.

    1980-06-06

    This manual is intended to serve several purposes. The first goal is to describe the capabilities and operation of the SMC processor package from an operator or user point of view. Secondly, the manual will describe in some detail the basic hardware elements and how they can be used effectively to implement a step motor control system. Practical information on the use, installation and checkout of the hardware set is presented in the following sections along with programming suggestions. Available related system software is described in this manual for reference and as an aid in understanding the system architecture. Section two presents an overview and operations manual of the SMC processor describing its composition and functional capabilities. Section three contains hardware descriptions in some detail for the LLL-designed hardware used in the SMC processor. Basic theory of operation and important features are explained.

  19. Modal Processor Effects Inspired by Hammond Tonewheel Organs

    Directory of Open Access Journals (Sweden)

    Kurt James Werner

    2016-06-01

    Full Text Available In this design study, we introduce a novel class of digital audio effects that extend the recently introduced modal processor approach to artificial reverberation and effects processing. These pitch and distortion processing effects mimic the design and sonics of a classic additive-synthesis-based electromechanical musical instrument, the Hammond tonewheel organ. As a reverb effect, the modal processor simulates a room response as the sum of resonant filter responses. This architecture provides precise, interactive control over the frequency, damping, and complex amplitude of each mode. Into this framework, we introduce two types of processing effects: pitch effects inspired by the Hammond organ’s equal tempered “tonewheels”, “drawbar” tone controls, vibrato/chorus circuit, and distortion effects inspired by the pseudo-sinusoidal shape of its tonewheels and electromagnetic pickup distortion. The result is an effects processor that imprints the Hammond organ’s sonics onto any audio input.

  20. Fault Tolerance Mechanism in Chip Many-Core Processors

    Institute of Scientific and Technical Information of China (English)

    ZHANG Lei; HAN Yinhe; LI Huawei; LI Xiaowei

    2007-01-01

    As semiconductor technology advances, there will be billions of transistors on a single chip. Chip many-core processors are emerging to take advantage of these greater transistor densities to deliver greater performance. Effective fault tolerance techniques are essential to improve the yield of such complex chips. In this paper, a core-level redundancy scheme called N+M is proposed to improve N-core processors'yield by providing M spare cores. In such architecture, topology is an important factor because it greatly affects the processors'performance. The concept of logical topology and a topology reconfiguration problem are introduced, which is able to transparently provide target topology with lowest performance degradation as the presence of faulty cores on-chip. A row rippling and column stealing (RRCS) algorithm is also proposed. Results show that PRCS can give solutions with average 13.8% degradation with negligible computing time.

  1. Microprocessor architectures RISC, CISC and DSP

    CERN Document Server

    Heath, Steve

    1995-01-01

    'Why are there all these different processor architectures and what do they all mean? Which processor will I use? How should I choose it?' Given the task of selecting an architecture or design approach, both engineers and managers require a knowledge of the whole system and an explanation of the design tradeoffs and their effects. This is information that rarely appears in data sheets or user manuals. This book fills that knowledge gap.Section 1 provides a primer and history of the three basic microprocessor architectures. Section 2 describes the ways in which the architectures react with the

  2. High-level language computer architecture

    CERN Document Server

    Chu, Yaohan

    1975-01-01

    High-Level Language Computer Architecture offers a tutorial on high-level language computer architecture, including von Neumann architecture and syntax-oriented architecture as well as direct and indirect execution architecture. Design concepts of Japanese-language data processing systems are discussed, along with the architecture of stack machines and the SYMBOL computer system. The conceptual design of a direct high-level language processor is also described.Comprised of seven chapters, this book first presents a classification of high-level language computer architecture according to the pr

  3. Performance Analysis of Embedded Processor%嵌入式处理器的性能分析方法

    Institute of Scientific and Technical Information of China (English)

    盛永杰; 刘博勤; 陈章龙

    2003-01-01

    This paper introduces the architecture of current embedded processors firstly. It focuses on some importantfactors that affect performance of embedded processor ,and proposes some performance analyzing methods. At the lastpart, a generic performance analysis system is illustrated to clarify the idea mentioned before with a short example.

  4. Scientific programming on massively parallel processor CP-PACS

    Energy Technology Data Exchange (ETDEWEB)

    Boku, Taisuke [Tsukuba Univ., Ibaraki (Japan). Inst. of Information Sciences and Electronics

    1998-03-01

    The massively parallel processor CP-PACS takes various problems of calculation physics as the object, and it has been designed so that its architecture has been devised to do various numerical processings. In this report, the outline of the CP-PACS and the example of programming in the Kernel CG benchmark in NAS Parallel Benchmarks, version 1, are shown, and the pseudo vector processing mechanism and the parallel processing tuning of scientific and technical computation utilizing the three-dimensional hyper crossbar net, which are two great features of the architecture of the CP-PACS are described. As for the CP-PACS, the PUs based on RISC processor and added with pseudo vector processor are used. Pseudo vector processing is realized as the loop processing by scalar command. The features of the connection net of PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The part that takes the time for processing most in the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The time for the computation by the CPU is determined. As the evaluation of the performance, the evaluation of the time for execution, the short vector processing of pseudo vector processor based on slide window, and the comparison with other parallel computers are reported. (K.I.)

  5. Low complexity reconfigurable architecture for the 5/3 and 9/7 discrete wavelet transform

    Institute of Scientific and Technical Information of China (English)

    Xiong Chengyi; Tian Jinwen; Liu Jian

    2006-01-01

    Efficient reconfigurable VLSI architecture for 1-D 5/3 and 9/7 wavelet transforms adopted in JPEG2000 proposal, based on lifting scheme is proposed. The embedded decimation technique based on fold and time multiplexing, as well as embedded boundary data extension technique, is adopted to optimize the design of the architecture. These reduce significantly the required numbers of the multipliers, adders and registers, as well as the amount of accessing external memory, and lead to decrease efficiently the hardware cost and power consumption of the design. The architecture is designed to generate an output per clock cycle, and the detailed component and the approximation of the input signal are available alternately. Experimental simulation and comparison results are presented, which demonstrate that the proposed architecture has lower hardware complexity, thus it is adapted for embedded applications. The presented architecture is simple, regular and scalable, and well suited for VLSI implementation.

  6. Dense and Sparse Matrix Operations on the Cell Processor

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Samuel W.; Shalf, John; Oliker, Leonid; Husbands,Parry; Yelick, Katherine

    2005-05-01

    The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.

  7. Advanced symbolic analysis for VLSI systems methods and applications

    CERN Document Server

    Shi, Guoyong; Tlelo Cuautle, Esteban

    2014-01-01

    This book provides comprehensive coverage of the recent advances in symbolic analysis techniques for design automation of nanometer VLSI systems. The presentation is organized in parts of fundamentals, basic implementation methods and applications for VLSI design. Topics emphasized include  statistical timing and crosstalk analysis, statistical and parallel analysis, performance bound analysis and behavioral modeling for analog integrated circuits . Among the recent advances, the Binary Decision Diagram (BDD) based approaches are studied in depth. The BDD-based hierarchical symbolic analysis approaches, have essentially broken the analog circuit size barrier. In particular, this book   • Provides an overview of classical symbolic analysis methods and a comprehensive presentation on the modern  BDD-based symbolic analysis techniques; • Describes detailed implementation strategies for BDD-based algorithms, including the principles of zero-suppression, variable ordering and canonical reduction; • Int...

  8. VLSI Design with Alliance Free CAD Tools: an Implementation Example

    Directory of Open Access Journals (Sweden)

    Chávez-Bracamontes Ramón

    2015-07-01

    Full Text Available This paper presents the methodology used for a digital integrated circuit design that implements the communication protocol known as Serial Peripheral Interface, using the Alliance CAD System. The aim of this paper is to show how the work of VLSI design can be done by graduate and undergraduate students with minimal resources and experience. The physical design was sent to be fabricated using the CMOS AMI C5 process that features 0.5 micrometer in transistor size, sponsored by the MOSIS Educational Program. Tests were made on a platform that transfers data from inertial sensor measurements to the designed SPI chip, which in turn sends the data back on a parallel bus to a common microcontroller. The results show the efficiency of the employed methodology in VLSI design, as well as the feasibility of ICs manufacturing from school projects that have insufficient or no source of funding

  9. VLSI physical design analyzer: A profiling and data mining tool

    Science.gov (United States)

    Somani, Shikha; Verma, Piyush; Madhavan, Sriram; Batarseh, Fadi; Pack, Robert C.; Capodieci, Luigi

    2015-03-01

    Traditional physical design verification tools employ a deck of known design rules, each of which has a pre-defined pass/fail criteria associated with it. While passing a design rule deck is a necessary condition for a VLSI design to be manufacturable, it is not sufficient. Other physical design profiling decks that attempt to obtain statistical information about the various critical dimensions in the VLSI design lack a systematic methodology for rule enumeration. These decks are often inadequate, unable to extract all the interlayer and intralayer dimensions in a design that have a correlation with process yield. The Physical Design Analyzer is a comprehensive design analysis tool built with the objective of exhaustively exploring design-process correlations to increase the wafer yield.

  10. A novel 3D algorithm for VLSI floorplanning

    Science.gov (United States)

    Rani, D. Gracia N.; Rajaram, S.; Sudarasan, Athira

    2013-01-01

    3-D VLSI circuit is becoming a hot issue because of its potential of enhancing performance, while it is also facing challenges such as the increased complexity on floorplanning and placement in VLSI Physical design. Efficient 3-D floorplan representations are needed to handle the placement optimization in new circuit designs. We analyze and categorize some state-of-the-art 3-D representations, and propose a Ternary tree model for 3-D nonslicing floorplans, extending the B*tree from 2D.This paper proposes a novel optimization algorithm for packing of 3D rectangular blocks. The new techniques considered are Differential evolutionary algorithm (DE) is very fast in that it evaluates the feasibility of a Ternary tree representation. Experimental results based on MCNC benchmark with constraints show that our proposed Differential Evolutionary (DE) can quickly produce optimal solutions.

  11. VLSI design for fault-dictionary based testability

    Science.gov (United States)

    Miller, Charles D.

    The fault-dictionary approach to isolating failures in digital circuits provides inferior isolation accuracy compared to that which is now generally attained with other isolation methods. This limitation is particularly apparent when circuits which use bidirectional bus configurations are being tested. For this reason, fault-dictionary-based isolation has serious economic implications when testing digital circuits which use expensive VLSI or HSIC devices. However, by incorporating relatively minor circuit additions into the design of VLSI and HSIC devices, the normal set/scan or equivalent testability pins can additionally serve to improve actual fault-isolation accuracy. The described additions for improving fault-dictionary-based fault isolation require little semiconductor area, and one configuration even serves to prevent bus-drive conflicts.

  12. Trace-based post-silicon validation for VLSI circuits

    CERN Document Server

    Liu, Xiao

    2014-01-01

    This book first provides a comprehensive coverage of state-of-the-art validation solutions based on real-time signal tracing to guarantee the correctness of VLSI circuits.  The authors discuss several key challenges in post-silicon validation and provide automated solutions that are systematic and cost-effective.  A series of automatic tracing solutions and innovative design for debug (DfD) techniques are described, including techniques for trace signal selection for enhancing visibility of functional errors, a multiplexed signal tracing strategy for improving functional error detection, a tracing solution for debugging electrical errors, an interconnection fabric for increasing data bandwidth and supporting multi-core debug, an interconnection fabric design and optimization technique to increase transfer flexibility and a DfD design and associated tracing solution for improving debug efficiency and expanding tracing window. The solutions presented in this book improve the validation quality of VLSI circuit...

  13. Fast Forwarding with Network Processors

    OpenAIRE

    Lefèvre, Laurent; Lemoine, E.; Pham, C; Tourancheau, B.

    2003-01-01

    Forwarding is a mechanism found in many network operations. Although a regular workstation is able to perform forwarding operations it still suffers from poor performances when compared to dedicated hardware machines. In this paper we study the possibility of using Network Processors (NPs) to improve the capability of regular workstations to forward data. We present a simple model and an experimental study demonstrating that even though NPs are less powerful than Host Processors (HPs) they ca...

  14. Floating-point array processors evolve into tailorable systems

    Energy Technology Data Exchange (ETDEWEB)

    Kotelly, G.

    1983-05-12

    Recently introduced 32-bit floating point array processors (APs) combine configuration flexibility, integrated hardware/software system architecture and real-time computational power to meet a variety of application requirements. APS have now evolved into general-purpose boxes and PC boards which readily adapt to changing OEM needs. Contributing to this greater AP versatility are a variety of hardware and software features. These features are described and the range of available products is surveyed.

  15. High speed optical object recognition processor with massive holographic memory

    Science.gov (United States)

    Chao, T.; Zhou, H.; Reyes, G.

    2002-01-01

    Real-time object recognition using a compact grayscale optical correlator will be introduced. A holographic memory module for storing a large bank of optimum correlation filters, to accommodate the large data throughput rate needed for many real-world applications, has also been developed. System architecture of the optical processor and the holographic memory will be presented. Application examples of this object recognition technology will also be demonstrated.

  16. Implementation Of A Prototype Digital Optical Cellular Image Processor (DOCIP)

    Science.gov (United States)

    Huang, K. S.; Sawchuk, A. A.; Jenkins, B. K.; Chavel, P.; Wang, J. M.; Weber, A. G.; Wang, C. H.; Glaser, I.

    1989-02-01

    A processing element of a prototype digital optical cellular image processor (DOCIP) is implemented to demonstrate a particular parallel computing and interconnection architecture. This experimental digital optical computing system consists of a 2-D array of 54 optical logic gates, a 2-D array of 53 subholograms to provide interconnections between gates, and electronic input/output interfaces. The multi-facet interconnection hologram used in this system is fabricated by a computer-controlled optical system to offer very flexible interconnections.

  17. Vlsi Implementation of Edge Detection for Images

    Directory of Open Access Journals (Sweden)

    T. Mahalakshmi

    2012-12-01

    Full Text Available Edge is the boundary between the image and its background. Edge detection in general is defined as the local maxima obtained from high pass filters, but an optimized edge detector should mark the edges with respect to luminance or brightness changes. It is easy to obtain them in software implementation but for hardware implementation there is an issue with percentage of accuracy and processing time. This study discusses various edge detection algorithms and proposes an optimized edge detector which provides the solution for mentioned above issue. Since FPGA provides practical solutions for most of the image processing problems, the proposed architecture has been developed using Matlab System generator. Experimental results show the accuracy of edge detected using proposed architecture.

  18. A systematic method for configuring VLSI networks of spiking neurons.

    Science.gov (United States)

    Neftci, Emre; Chicca, Elisabetta; Indiveri, Giacomo; Douglas, Rodney

    2011-10-01

    An increasing number of research groups are developing custom hybrid analog/digital very large scale integration (VLSI) chips and systems that implement hundreds to thousands of spiking neurons with biophysically realistic dynamics, with the intention of emulating brainlike real-world behavior in hardware and robotic systems rather than simply simulating their performance on general-purpose digital computers. Although the electronic engineering aspects of these emulation systems is proceeding well, progress toward the actual emulation of brainlike tasks is restricted by the lack of suitable high-level configuration methods of the kind that have already been developed over many decades for simulations on general-purpose computers. The key difficulty is that the dynamics of the CMOS electronic analogs are determined by transistor biases that do not map simply to the parameter types and values used in typical abstract mathematical models of neurons and their networks. Here we provide a general method for resolving this difficulty. We describe a parameter mapping technique that permits an automatic configuration of VLSI neural networks so that their electronic emulation conforms to a higher-level neuronal simulation. We show that the neurons configured by our method exhibit spike timing statistics and temporal dynamics that are the same as those observed in the software simulated neurons and, in particular, that the key parameters of recurrent VLSI neural networks (e.g., implementing soft winner-take-all) can be precisely tuned. The proposed method permits a seamless integration between software simulations with hardware emulations and intertranslatability between the parameters of abstract neuronal models and their emulation counterparts. Most important, our method offers a route toward a high-level task configuration language for neuromorphic VLSI systems.

  19. VLSI Design with Alliance Free CAD Tools: an Implementation Example

    OpenAIRE

    Chávez-Bracamontes Ramón; García-López Reyna Itzel; Gurrola-Navarro Marco Antonio; Bandala-Sánchez Manuel

    2015-01-01

    This paper presents the methodology used for a digital integrated circuit design that implements the communication protocol known as Serial Peripheral Interface, using the Alliance CAD System. The aim of this paper is to show how the work of VLSI design can be done by graduate and undergraduate students with minimal resources and experience. The physical design was sent to be fabricated using the CMOS AMI C5 process that features 0.5 micrometer in transistor size, sponsored ...

  20. DESIGN AND ANALOG VLSI IMPLEMENTATION OF ARTIFICIAL NEURAL NETWORK

    OpenAIRE

    2011-01-01

    Nature has evolved highly advanced systems capable of performing complex computations, adoption and learning using analog computations. Furthermore nature has evolved techniques to deal with imprecise analog computations by using redundancy and massive connectivity. In this paper we are making use of Artificial Neural Network to demonstrate the way in which the biological system processes in analog domain. We are using 180nm CMOS VLSI technology for implementing circuits which ...

  1. Design of a VLSI Decoder for Partially Structured LDPC Codes

    Directory of Open Access Journals (Sweden)

    Fabrizio Vacca

    2008-01-01

    of their parity matrix can be partitioned into two disjoint sets, namely, the structured and the random ones. For the proposed class of codes a constructive design method is provided. To assess the value of this method the constructed codes performance are presented. From these results, a novel decoding method called split decoding is introduced. Finally, to prove the effectiveness of the proposed approach a whole VLSI decoder is designed and characterized.

  2. Diseño digital : una perspectiva VLSI-CMOS

    OpenAIRE

    Alcubilla González, Ramón; Pons Nin, Joan; Bardés Llorensí, Daniel

    1996-01-01

    Bibliografia El presente texto aporta el material necesario para un curso introductorio de Electrónica Digital. Incluye los conceptos fundamentales de diseño clásico de circuitos lógicos combinacionales y secuenciales. Adicionalmente se introducen aspectos de diseño de circuitos integrados con tecnología VLSI-CMOS. Se ha incidido particularmente en los elementos de autoaprendizaje mediante la inclusión de numerosos ejemplos y problemas.

  3. Scaling the ion trap quantum processor.

    Science.gov (United States)

    Monroe, C; Kim, J

    2013-03-08

    Trapped atomic ions are standards for quantum information processing, serving as quantum memories, hosts of quantum gates in quantum computers and simulators, and nodes of quantum communication networks. Quantum bits based on trapped ions enjoy a rare combination of attributes: They have exquisite coherence properties, they can be prepared and measured with nearly 100% efficiency, and they are readily entangled with each other through the Coulomb interaction or remote photonic interconnects. The outstanding challenge is the scaling of trapped ions to hundreds or thousands of qubits and beyond, at which scale quantum processors can outperform their classical counterparts in certain applications. We review the latest progress and prospects in that effort, with the promise of advanced architectures and new technologies, such as microfabricated ion traps and integrated photonics.

  4. A microprogrammable versatile processor (MVP) for digital signal and data processing

    Science.gov (United States)

    Tsou, H. E.; Nix, W. C.; Weir, E. M.; Smith, J. J.; Kushner, M. L.

    A simple, functionally partitioned, data bus-oriented, horizontally-microprogrammed processor architecture has been developed which is ideal for low cost signal processing, emulation, and data processing applications. Due to its functional modularity and bus oriented architecture, new technologies can be inserted without perturbing the design and the support software. The processor has separate RALUs for data processing and address generation and is the first signal processor to use the TRW 16 x 16-bit multiplier and the AMD RALUs. It allows for growth by adding attached arithmetic processors, microprocessors, etc. Support software includes register transfer-level HOL, assembler, loader, linker, diagnostics, library, executive and I/O drivers. Applications include speech processing, coding/decoding, modulation/demodulation, signal processing and emulation.

  5. Robust Bioinformatics Recognition with VLSI Biochip Microsystem

    Science.gov (United States)

    Lue, Jaw-Chyng L.; Fang, Wai-Chi

    2006-01-01

    A microsystem architecture for real-time, on-site, robust bioinformatic patterns recognition and analysis has been proposed. This system is compatible with on-chip DNA analysis means such as polymerase chain reaction (PCR)amplification. A corresponding novel artificial neural network (ANN) learning algorithm using new sigmoid-logarithmic transfer function based on error backpropagation (EBP) algorithm is invented. Our results show the trained new ANN can recognize low fluorescence patterns better than the conventional sigmoidal ANN does. A differential logarithmic imaging chip is designed for calculating logarithm of relative intensities of fluorescence signals. The single-rail logarithmic circuit and a prototype ANN chip are designed, fabricated and characterized.

  6. VLSI (Very Large Scale Integration) Design Tools Reference Manual - Release 1.0.

    Science.gov (United States)

    1983-10-01

    34" SUBCXT Sabna N1 < N2 N3 ... > 1_V/NW VLSI Release 1 -18- * SPICE User’s Guide UW/NW VLSI Consortium Examples: .SUBCKT OPAMP 12 3 4 A circuit definition... OPAMP This card must be the last one for any subcircuit definition. The subcircuit name, if included, indicates which subcircuit definition is being

  7. Neural network surface acoustic wave RF signal processor for digital modulation recognition.

    Science.gov (United States)

    Kavalov, Dimitar; Kalinin, Victor

    2002-09-01

    An architecture of a surface acoustic wave (SAW) processor based on an artificial neural network is proposed for an automatic recognition of different types of digital passband modulation. Three feed-forward networks are trained to recognize filtered and unfiltered binary phase shift keying (BPSK) and quadrature phase shift keying (QPSK) signals, as well as unfiltered BPSK, QPSK, and 16 quadrature amplitude (16QAM) signals. Performance of the processor in the presence of additive white Gaussian noise (AWGN) is simulated. The influence of second-order effects in SAW devices, phase, and amplitude errors on the performance of the processor also is studied.

  8. Multiple-Morphs Adaptive Stream Architecture

    Institute of Scientific and Technical Information of China (English)

    Mei Wen; Nan Wu; Hai-Yan Li; Chun-Yuan Zhang

    2005-01-01

    In modern VLSI technology, hundreds of thousands of arithmetic units fit on a 1cm2 chip. The challenge is supplying them with instructions and data. Stream architecture is able to solve the problem well. However, the applications suited for typical stream architecture are limited. This paper presents the definition of regular stream and irregular stream,and then describes MASA (Multiple-morphs Adaptive Stream Architecture) prototype system which supports different execution models according to applications' stream characteristics. This paper first discusses MASA architecture and stream model, and then explores the features and advantages of MASA through mapping stream applications to hardware.Finally MASA is evaluated by ten benchmarks. The result is encouraging.

  9. Liquid state machine with dendritically enhanced readout for low-power, neuromorphic VLSI implementations.

    Science.gov (United States)

    Roy, Subhrajit; Banerjee, Amitava; Basu, Arindam

    2014-10-01

    In this paper, we describe a new neuro-inspired, hardware-friendly readout stage for the liquid state machine (LSM), a popular model for reservoir computing. Compared to the parallel perceptron architecture trained by the p-delta algorithm, which is the state of the art in terms of performance of readout stages, our readout architecture and learning algorithm can attain better performance with significantly less synaptic resources making it attractive for VLSI implementation. Inspired by the nonlinear properties of dendrites in biological neurons, our readout stage incorporates neurons having multiple dendrites with a lumped nonlinearity (two compartment model). The number of synaptic connections on each branch is significantly lower than the total number of connections from the liquid neurons and the learning algorithm tries to find the best 'combination' of input connections on each branch to reduce the error. Hence, the learning involves network rewiring (NRW) of the readout network similar to structural plasticity observed in its biological counterparts. We show that compared to a single perceptron using analog weights, this architecture for the readout can attain, even by using the same number of binary valued synapses, up to 3.3 times less error for a two-class spike train classification problem and 2.4 times less error for an input rate approximation task. Even with 60 times larger synapses, a group of 60 parallel perceptrons cannot attain the performance of the proposed dendritically enhanced readout. An additional advantage of this method for hardware implementations is that the 'choice' of connectivity can be easily implemented exploiting address event representation (AER) protocols commonly used in current neuromorphic systems where the connection matrix is stored in memory. Also, due to the use of binary synapses, our proposed method is more robust against statistical variations.

  10. Libera Electron Beam Position Processor

    CERN Document Server

    Ursic, Rok

    2005-01-01

    Libera is a product family delivering unprecedented possibilities for either building powerful single station solutions or architecting complex feedback systems in the field of accelerator instrumentation and controls. This paper presents functionality and field performance of its first member, the electron beam position processor. It offers superior performance with multiple measurement channels delivering simultaneously position measurements in digital format with MHz kHz and Hz bandwidths. This all-in-one product, facilitating pulsed and CW measurements, is much more than simply a high performance beam position measuring device delivering micrometer level reproducibility with sub-micrometer resolution. Rich connectivity options and innate processing power make it a powerful feedback building block. By interconnecting multiple Libera electron beam position processors one can build a low-latency high throughput orbit feedback system without adding additional hardware. Libera electron beam position processor ...

  11. Java Processor Optimized for RTSJ

    Directory of Open Access Journals (Sweden)

    Tu Shiliang

    2007-01-01

    Full Text Available Due to the preeminent work of the real-time specification for Java (RTSJ, Java is increasingly expected to become the leading programming language in real-time systems. To provide a Java platform suitable for real-time applications, a Java processor which can execute Java bytecode is directly proposed in this paper. It provides efficient support in hardware for some mechanisms specified in the RTSJ and offers a simpler programming model through ameliorating the scoped memory of the RTSJ. The worst case execution time (WCET of the bytecodes implemented in this processor is predictable by employing the optimization method proposed in our previous work, in which all the processing interfering predictability is handled before bytecode execution. Further advantage of this method is to make the implementation of the processor simpler and suited to a low-cost FPGA chip.

  12. Monte Carlo simulations on SIMD computer architectures

    Energy Technology Data Exchange (ETDEWEB)

    Burmester, C.P.; Gronsky, R. [Lawrence Berkeley Lab., CA (United States); Wille, L.T. [Florida Atlantic Univ., Boca Raton, FL (United States). Dept. of Physics

    1992-03-01

    Algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique to single instruction multiple data (SMM) computer architectures are presented. In particular, implementation of the Ising model with nearest, next nearest, and long range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented including lattice partitioning and the use of processor array spanning tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carlo updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures.

  13. Cluster Algorithm Special Purpose Processor

    Science.gov (United States)

    Talapov, A. L.; Shchur, L. N.; Andreichenko, V. B.; Dotsenko, Vl. S.

    We describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.

  14. Cluster algorithm special purpose processor

    Energy Technology Data Exchange (ETDEWEB)

    Talapov, A.L.; Shchur, L.N.; Andreichenko, V.B.; Dotsenko, V.S. (Landau Inst. for Theoretical Physics, GSP-1 117940 Moscow V-334 (USSR))

    1992-08-10

    In this paper, the authors describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.

  15. General Architecture and Instruction Set Enhancements for Multimedia Applications

    Directory of Open Access Journals (Sweden)

    Mansour Assaf

    2007-12-01

    Full Text Available The present day multimedia applications (MMAs are driving the computing industry as every application being developed is using multimedia in one or the other way. Computer architects are building computer systems with powerful processors to handle the MMAs. There have been tremendous changes in the design of the processors to handle different types of MMAs. We see a lot of such application specific processors today in the industry; different architectures have been proposed for processing MMAs such as VLIW, superscalar (general-purpose processor enhanced with a multimedia extension such as MMX, vector architecture, SIMD architectures, and reconfigurable computing devices. Many of the General Purpose Processors (GPPs require coprocessors to handle graphics and sound and usually those processors are either expensive or incompatible. Keeping these and the demands MMAs in mind designers have made changes to GPPs; many GPP Vendors have added instructions to their Instruction Set Architecture (ISA. All these processors use similar techniques to execute multimedia instructions. This survey paper investigates the enhancements made to the GPPS in their general Architecture as well as the ISA. We will present the many different techniques used by GPP designers to handle MMAs, the present day GPP available architectures, compare different techniques, and concludes this survey.

  16. RASSP signal processing architectures

    Science.gov (United States)

    Shirley, Fred; Bassett, Bob; Letellier, J. P.

    1995-06-01

    The rapid prototyping of application specific signal processors (RASSP) program is an ARPA/tri-service effort to dramatically improve the process by which complex digital systems, particularly embedded signal processors, are specified, designed, documented, manufactured, and supported. The domain of embedded signal processing was chosen because it is important to a variety of military and commercial applications as well as for the challenge it presents in terms of complexity and performance demands. The principal effort is being performed by two major contractors, Lockheed Sanders (Nashua, NH) and Martin Marietta (Camden, NJ). For both, improvements in methodology are to be exercised and refined through the performance of individual 'Demonstration' efforts. The Lockheed Sanders' Demonstration effort is to develop an infrared search and track (IRST) processor. In addition, both contractors' results are being measured by a series of externally administered (by Lincoln Labs) six-month Benchmark programs that measure process improvement as a function of time. The first two Benchmark programs are designing and implementing a synthetic aperture radar (SAR) processor. Our demonstration team is using commercially available VME modules from Mercury Computer to assemble a multiprocessor system scalable from one to hundreds of Intel i860 microprocessors. Custom modules for the sensor interface and display driver are also being developed. This system implements either proprietary or Navy owned algorithms to perform the compute-intensive IRST function in real time in an avionics environment. Our Benchmark team is designing custom modules using commercially available processor ship sets, communication submodules, and reconfigurable logic devices. One of the modules contains multiple vector processors optimized for fast Fourier transform processing. Another module is a fiberoptic interface that accepts high-rate input data from the sensors and provides video-rate output data to a

  17. The hardware track finder processor in CMS at CERN

    CERN Document Server

    Kluge, A

    1997-01-01

    The work covers the design of the Track Finder Processor in the high energy experiment CMS (Compact Muon Solenoid, planned for 2005) at CERN/Geneva. The task of this processor is to identify muons and measure their transverse momentum. The track finder processor makes it possible to determine the physical relevance of each high energetic collision and to forward only interesting data to the data an alysis units. Data of more than two hundred thousand detector cells are used to determine the location of muons and measure their transverse momentum. Each 25 ns a new data set is generated. Measurem ent of location and transverse momentum of the muons can be terminated within 350 ns by using an ASIC (Application Specific Integrated Circuit). A pipeline architecture processes new data sets with th e required data rate of 40 MHz to ensure dead time free operation. In the framework of this study specifications and the overall concept of the track finder processor were worked out in detail. Simul ations were performed...

  18. Performance-Optimum Superscalar Architecture for Embedded Applications

    CERN Document Server

    Alipour, Mehdi

    2012-01-01

    Embedded applications are widely used in portable devices such as wireless phones, personal digital assistants, laptops, etc. High throughput and real time requirements are especially important in such data-intensive tasks. Therefore, architectures that provide the required performance are the most desirable. On the other hand, processor performance is severely related to the average memory access delay, number of processor registers and also size of the instruction window and superscalar parameters. Therefore, cache, register file and superscalar parameters are the major architectural concerns in designing a superscalar architecture for embedded processors. Although increasing cache and register file size leads to performance improvements in high performance embedded processors, the increased area, power consumption and memory delay are the overheads of these techniques. This paper explores the effect of cache, register file and superscalar parameters on the processor performance to specify the optimum size ...

  19. Emergent Auditory Feature Tuning in a Real-Time Neuromorphic VLSI System.

    Science.gov (United States)

    Sheik, Sadique; Coath, Martin; Indiveri, Giacomo; Denham, Susan L; Wennekers, Thomas; Chicca, Elisabetta

    2012-01-01

    Many sounds of ecological importance, such as communication calls, are characterized by time-varying spectra. However, most neuromorphic auditory models to date have focused on distinguishing mainly static patterns, under the assumption that dynamic patterns can be learned as sequences of static ones. In contrast, the emergence of dynamic feature sensitivity through exposure to formative stimuli has been recently modeled in a network of spiking neurons based on the thalamo-cortical architecture. The proposed network models the effect of lateral and recurrent connections between cortical layers, distance-dependent axonal transmission delays, and learning in the form of Spike Timing Dependent Plasticity (STDP), which effects stimulus-driven changes in the pattern of network connectivity. In this paper we demonstrate how these principles can be efficiently implemented in neuromorphic hardware. In doing so we address two principle problems in the design of neuromorphic systems: real-time event-based asynchronous communication in multi-chip systems, and the realization in hybrid analog/digital VLSI technology of neural computational principles that we propose underlie plasticity in neural processing of dynamic stimuli. The result is a hardware neural network that learns in real-time and shows preferential responses, after exposure, to stimuli exhibiting particular spectro-temporal patterns. The availability of hardware on which the model can be implemented, makes this a significant step toward the development of adaptive, neurobiologically plausible, spike-based, artificial sensory systems.

  20. A neuromorphic VLSI device for implementing 2-D selective attention systems.

    Science.gov (United States)

    Indiveri, G

    2001-01-01

    Selective attention is a mechanism used to sequentially select and process salient subregions of the input space, while suppressing inputs arriving from nonsalient regions. By processing small amounts of sensory information in a serial fashion, rather than attempting to process all the sensory data in parallel, this mechanism overcomes the problem of flooding limited processing capacity systems with sensory inputs. It is found in many biological systems and can be a useful engineering tool for developing artificial systems that need to process in real-time sensory data. In this paper we present a neuromorphic hardware model of a selective attention mechanism implemented on a very large scale integration (VLSI) chip, using analog circuits. The chip makes use of a spike-based representation for receiving input signals, transmitting output signals and for shifting the selection of the attended input stimulus over time. It can be interfaced to neuromorphic sensors and actuators, for implementing multichip selective attention systems. We describe the characteristics of the circuits used in the architecture and present experimental data measured from the system.

  1. A Single Chip VLSI Implementation of a QPSK/SQPSK Demodulator for a VSAT Receiver Station

    Science.gov (United States)

    Kwatra, S. C.; King, Brent

    1995-01-01

    This thesis presents a VLSI implementation of a QPSK/SQPSK demodulator. It is designed to be employed in a VSAT earth station that utilizes the FDMA/TDM link. A single chip architecture is used to enable this chip to be easily employed in the VSAT system. This demodulator contains lowpass filters, integrate and dump units, unique word detectors, a timing recovery unit, a phase recovery unit and a down conversion unit. The design stages start with a functional representation of the system by using the C programming language. Then it progresses into a register based representation using the VHDL language. The layout components are designed based on these VHDL models and simulated. Component generators are developed for the adder, multiplier, read-only memory and serial access memory in order to shorten the design time. These sub-components are then block routed to form the main components of the system. The main components are block routed to form the final demodulator.

  2. Emergent auditory feature tuning in a real-time neuromorphic VLSI system

    Directory of Open Access Journals (Sweden)

    Sadique eSheik

    2012-02-01

    Full Text Available Many sounds of ecological importance, such as communication calls, are characterised by time-varying spectra. However, most neuromorphic auditory models to date have focused on distinguishing mainly static patterns, under the assumption that dynamic patterns can be learned as sequences of static ones. In contrast, the emergence of dynamic feature sensitivity through exposure to formative stimuli has been recently modeled in a network of spiking neurons based on the thalamocortical architecture. The proposed network models the effect of lateral and recurrent connections between cortical layers, distance-dependent axonal transmission delays, and learning in the form of Spike Timing Dependent Plasticity (STDP, which effects stimulus-driven changes in the pattern of network connectivity. In this paper we demonstrate how these principles can be efficiently implemented in neuromorphic hardware. In doing so we address two principle problems in the design of neuromorphic systems: real-time event-based asynchronous communication in multi-chip systems, and the realization in hybrid analog/digital VLSI technology of neural computational principles that we propose underlie plasticity in neural processing of dynamic stimuli. The result is a hardware neural network that learns in real-time and shows preferential responses, after exposure, to stimuli exhibiting particular spectrotemporal patterns. The availability of hardware on which the model can be implemented, makes this a significant step towards the development of adaptive, neurobiologically plausible, spike-based, artificial sensory systems.

  3. Pipelining and bypassing in a RISC/DSP processor

    Science.gov (United States)

    Yu, Guojun; Yao, Qingdong; Liu, Peng; Jiang, Zhidi; Li, Fuping

    2005-03-01

    This paper proposes pipelining and bypassing unit (BPU) design method in our 32-bit RISC/DSP processor: MediaDsp3201 (briefly, MD32). MD32 is realized in 0.18μm technology, 1.8v, 200MHz working clock and can achieve 200 million/s Multiply-Accumulate (MAC) operations. It merges RISC architecture and DSP computation capability thoroughly, achieves fundamental RISC, extended DSP and single instruction multiple data (SIMD) instruction set with various addressing modes in a unified and customized DSP pipeline stage architecture. We will first describe the pipeline structure of MD32, comparing it to typical RISC-style pipeline structure. And then we will study the validity of two bypassing schemes in terms of their effectiveness in resolving pipeline data hazards: Centralized and Distributed BPU design strategy (CBPU and DBPU). A bypassing circuit chain model is given for DBPU, which register read is only placed at ID pipe stage. Considering the processor"s working clock which is decided by the pipeline time delay, the optimization of circuit that serial select with priority is also analyzed in detail since the BPU consists of a long serial path for combination logic. Finally, the performance improvement is analyzed.

  4. Imaging with polycrystalline mercuric iodide detectors using VLSI readout

    Energy Technology Data Exchange (ETDEWEB)

    Turchetta, R.; Dulinski, W.; Husson, D.; Riester, J.L.; Schieber, M.; Zuck, A.; Melekhov, L.; Saado, Y.; Hermon, H.; Nissenbaum, J

    1999-06-01

    Potentially low cost and large area polycrystalline mercuric iodide room-temperature radiation detectors, with thickness of 100-600 {mu}m have been successfully tested with dedicated low-noise, low-power mixed signal VLSI electronics which can be used for compact, imaging solutions. The detectors are fabricated by depositing HgI{sub 2} directly on an insulating substrate having electrodes in the form of microstrips and pixels with an upper continuous electrode. The deposition is made either by direct evaporation or by screen printing HgI{sub 2} mixed with glue such as Poly-Vinyl-Butiral. The properties of these first-generation detectors are quite uniform from one detector to another. Also for each single detector the response is quite uniform and no charge loss in the inter-electrode space have been detected. Because of the low cost and of the polycrystallinity, detectors can be potentially fabricated in any size and shape, using standard ceramic technology equipment, which is an attractive feature where low cost and large area applications are needed. The detectors which act as radiation counters have been tested with a beta source as well as in a high-energy beam of 100 GeV muons at CERN, connected to VLSI, low noise electronics. Charge collection efficiency and uniformity have been studied. The charge is efficiently collected even in the space between strips indicating that fill factors of 100% could be reached in imaging applications with direct detection of radiation. Single photon counting capability is reached with VLSI electronics. These results show the potential of this material for applications demanding position sensitive, radiation resistant, room-temperature operating radiation detectors, where position resolution is essential, as it can be found in some applications in high-energy physics, nuclear medicine and astrophysics.

  5. VLSI implementations of threshold logic-a comprehensive survey.

    Science.gov (United States)

    Beiu, V; Quintana, J M; Avedillo, M J

    2003-01-01

    This paper is an in-depth review on silicon implementations of threshold logic gates that covers several decades. In this paper, we will mention early MOS threshold logic solutions and detail numerous very-large-scale integration (VLSI) implementations including capacitive (switched capacitor and floating gate with their variations), conductance/current (pseudo-nMOS and output-wired-inverters, including a plethora of solutions evolved from them), as well as many differential solutions. At the end, we will briefly mention other implementations, e.g., based on negative resistance devices and on single electron technologies.

  6. Crystal growth and evaluation of silicon for VLSI and ULSI

    CERN Document Server

    Eranna, Golla

    2014-01-01

    PrefaceAbout the AuthorIntroductionSilicon: The SemiconductorWhy Single CrystalsRevolution in Integrated Circuit Fabrication Technology and the Art of Device MiniaturizationUse of Silicon as a SemiconductorSilicon Devices for Boolean ApplicationsIntegration of Silicon Devices and the Art of Circuit MiniaturizationMOS and CMOS Devices for Digital ApplicationsLSI, VLSI, and ULSI Circuits and ApplicationsSilicon for MEMS ApplicationsSummaryReferencesSilicon: The Key Material for Integrated Circuit Fabrication TechnologyIntroductionPreparation of Raw Silicon MaterialMetallurgical-Grade SiliconPuri

  7. Formal verification an essential toolkit for modern VLSI design

    CERN Document Server

    Seligman, Erik; Kumar, M V Achutha Kiran

    2015-01-01

    Formal Verification: An Essential Toolkit for Modern VLSI Design presents practical approaches for design and validation, with hands-on advice for working engineers integrating these techniques into their work. Building on a basic knowledge of System Verilog, this book demystifies FV and presents the practical applications that are bringing it into mainstream design and validation processes at Intel and other companies. The text prepares readers to effectively introduce FV in their organization and deploy FV techniques to increase design and validation productivity. Presents formal verific

  8. Low-power Analog VLSI Implementation of Wavelet Transform

    Institute of Scientific and Technical Information of China (English)

    ZHANG Jiang-hong

    2009-01-01

    For applications requiring low-power, low-voltage and real-time, a novel analog VLSI implementation of continuous Marr wavelet transform based on CMOS log-domain integrator is proposed.Mart wavelet is approximated by a parameterized class of function and with Levenbery-Marquardt nonlinear least square method,the optimum parameters of this function are obtained.The circuits of implementating Mart wavelet transform are composed of analog filter whose impulse response is the required wavelet.The filter design is based on IFLF structure with CMOS log-domain integrators as the main building blocks.SPICE simulations indicate an excellent approximations of ideal wavelet.

  9. VLSI implementation of a fairness ATM buffer system

    DEFF Research Database (Denmark)

    Nielsen, J.V.; Dittmann, Lars; Madsen, Jens Kargaard

    1996-01-01

    This paper presents a VLSI implementation of a resource allocation scheme, based on the concept of weighted fair queueing. The design can be used in asynchronous transfer mode (ATM) networks to ensure fairness and robustness. Weighted fair queueing is a scheduling and buffer management scheme...... that can provide a resource allocation policy and enforcement of this policy. It can be used in networks in order to provide defined allocation policies (fairness) and improve network robustness. The presented design illustrates how the theoretical weighted fair queueing model can be approximated...

  10. An adaptive, lossless data compression algorithm and VLSI implementations

    Science.gov (United States)

    Venbrux, Jack; Zweigle, Greg; Gambles, Jody; Wiseman, Don; Miller, Warner H.; Yeh, Pen-Shu

    1993-01-01

    This paper first provides an overview of an adaptive, lossless, data compression algorithm originally devised by Rice in the early '70s. It then reports the development of a VLSI encoder/decoder chip set developed which implements this algorithm. A recent effort in making a space qualified version of the encoder is described along with several enhancements to the algorithm. The performance of the enhanced algorithm is compared with those from other currently available lossless compression techniques on multiple sets of test data. The results favor our implemented technique in many applications.

  11. A VLSI Algorithm for Calculating the Treee to Tree Distance

    Institute of Scientific and Technical Information of China (English)

    徐美瑞; 刘小林

    1993-01-01

    Given two ordered,labeled trees βand α,to find the distance from tree β to tree α is an important problem in many fields,for example,the pattern recognition field.In this paper,a VLSI algorithm for calculating the tree-to-tree distance is presented.The computation structure of the algorithm is a 2-D Mesh with the size m&n.and the time is O(m=n),where m,n are the numbers of nodes of the tree βand tree α,respectively.

  12. Fast, Massively Parallel Data Processors

    Science.gov (United States)

    Heaton, Robert A.; Blevins, Donald W.; Davis, ED

    1994-01-01

    Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.

  13. ASSP Advanced Sensor Signal Processor.

    Science.gov (United States)

    1984-06-01

    transfer data sad cimeds . When a Processor receives the required data (Image) md/or oamand, that data will be operated on B-3 I I I autonomouly. The...BAN is provided by two separately controled DMA address generator chips (Am29o40). Each of these DMA chips create an 8 bit address. One DMA chip gene

  14. Development of an integrated circuit VLSI used for time measurement and selective read out in the front end electronics of the DIRC for the Babar experience at SLAC; Developpement d'un circuit integre VLSI assurant mesure de temps et lecture selective dans l'electronique frontale du compteur DIRC de l'experience babar a slac

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, B

    1999-07-01

    This thesis deals with the design the development and the tests of an integrated circuit VLSI, supplying selective read and time measure for 16 channels. This circuit has been developed for a experiment of particles physics, BABAR, that will take place at SLAC (Stanford Linear Accelerator Center). A first part describes the physical stakes of the experiment, the electronic architecture and the place of the developed circuit in the research program. The second part presents the technical drawings of the circuit, the prototypes leading to the final design and the validity tests. (A.L.B.)

  15. Cassava processors' awareness of occupational and environmental ...

    African Journals Online (AJOL)

    Cassava processors' awareness of occupational and environmental hazards ... Majority of the respondents also complained of lack of water (78.4%), lack of ... so as to reduce the problems faced by cassava processors during processing.

  16. Architecture on Architecture

    DEFF Research Database (Denmark)

    Olesen, Karen

    2016-01-01

    This paper will discuss the challenges faced by architectural education today. It takes as its starting point the double commitment of any school of architecture: on the one hand the task of preserving the particular knowledge that belongs to the discipline of architecture, and on the other hand...... that is not scientific or academic but is more like a latent body of data that we find embedded in existing works of architecture. This information, it is argued, is not limited by the historical context of the work. It can be thought of as a virtual capacity – a reservoir of spatial configurations that can...... the autonomy of architecture, not as an esoteric concept but as a valid source of information in a pragmatic design practice, may help us overcome the often-proclaimed dichotomy between formal autonomy and a societally committed architecture. It follows that in architectural education there can be a close...

  17. Radiation tolerant back biased CMOS VLSI

    Science.gov (United States)

    Maki, Gary K. (Inventor); Gambles, Jody W. (Inventor); Hass, Kenneth J. (Inventor)

    2003-01-01

    A CMOS circuit formed in a semiconductor substrate having improved immunity to total ionizing dose radiation, improved immunity to radiation induced latch up, and improved immunity to a single event upset. The architecture of the present invention can be utilized with the n-well, p-well, or dual-well processes. For example, a preferred embodiment of the present invention is described relative to a p-well process wherein the p-well is formed in an n-type substrate. A network of NMOS transistors is formed in the p-well, and a network of PMOS transistors is formed in the n-type substrate. A contact is electrically coupled to the p-well region and is coupled to first means for independently controlling the voltage in the p-well region. Another contact is electrically coupled to the n-type substrate and is coupled to second means for independently controlling the voltage in the n-type substrate. By controlling the p-well voltage, the effective threshold voltages of the n-channel transistors both drawn and parasitic can be dynamically tuned. Likewise, by controlling the n-type substrate, the effective threshold voltages of the p-channel transistors both drawn and parasitic can also be dynamically tuned. Preferably, by optimizing the threshold voltages of the n-channel and p-channel transistors, the total ionizing dose radiation effect will be neutralized and lower supply voltages can be utilized for the circuit which would result in the circuit requiring less power.

  18. Digital design and computer architecture

    CERN Document Server

    Harris, David

    2010-01-01

    Digital Design and Computer Architecture is designed for courses that combine digital logic design with computer organization/architecture or that teach these subjects as a two-course sequence. Digital Design and Computer Architecture begins with a modern approach by rigorously covering the fundamentals of digital logic design and then introducing Hardware Description Languages (HDLs). Featuring examples of the two most widely-used HDLs, VHDL and Verilog, the first half of the text prepares the reader for what follows in the second: the design of a MIPS Processor. By the end of D

  19. QPACE -- a QCD parallel computer based on Cell processors

    CERN Document Server

    Baier, H; Drochner, M; Eicker, N; Fischer, U; Fodor, Z; Frommer, A; Gomez, C; Goldrian, G; Heybrock, S; Hierl, D; Hüsken, M; Huth, T; Krill, B; Lauritsen, J; Lippert, T; Maurer, T; Meyer, N; Nobile, A; Ouda, I; Pivanti, M; Pleiter, D; Schäfer, A; Schick, H; Schifano, F; Simma, H; Solbrig, S; Streuer, T; Sulanke, K -H; Tripiccione, R; Vogt, J -S; Wettig, T; Winter, F

    2009-01-01

    QPACE is a novel parallel computer which has been developed to be primarily used for lattice QCD simulations. The compute power is provided by the IBM PowerXCell 8i processor, an enhanced version of the Cell processor that is used in the Playstation 3. The QPACE nodes are interconnected by a custom, application optimized 3-dimensional torus network implemented on an FPGA. To achieve the very high packaging density of 26 TFlops per rack a new water cooling concept has been developed and successfully realized. In this paper we give an overview of the architecture and highlight some important technical details of the system. Furthermore, we provide initial performance results and report on the installation of 8 QPACE racks providing an aggregate peak performance of 200 TFlops.

  20. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. ""Massively parallel"" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVI- DIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  1. JIST: Just-In-Time Scheduling Translation for Parallel Processors

    Directory of Open Access Journals (Sweden)

    Giovanni Agosta

    2005-01-01

    Full Text Available The application fields of bytecode virtual machines and VLIW processors overlap in the area of embedded and mobile systems, where the two technologies offer different benefits, namely high code portability, low power consumption and reduced hardware cost. Dynamic compilation makes it possible to bridge the gap between the two technologies, but special attention must be paid to software instruction scheduling, a must for the VLIW architectures. We have implemented JIST, a Virtual Machine and JIT compiler for Java Bytecode targeted to a VLIW processor. We show the impact of various optimizations on the performance of code compiled with JIST through the experimental study on a set of benchmark programs. We report significant speedups, and increments in the number of instructions issued per cycle up to 50% with respect to the non-scheduling version of the JITcompiler. Further optimizations are discussed.

  2. Investigating the Performance of an Adiabatic Quantum Optimization Processor

    CERN Document Server

    Rose, Geordie; Dickson, Neil G; Hamze, Firas; Amin, M H S; Drew-Brook, Marshall; Chudak, Fabian A; Bunyk, Paul I; Macready, William G

    2010-01-01

    We calculate median adiabatic times (in seconds) of a specific superconducting adiabatic quantum processor for an NP-hard Ising spin glass instance class with up to N=128 binary variables. To do so, we ran high performance Quantum Monte Carlo simulations on a large-scale Internet-based computing platform. We compare the median adiabatic times with the median running times of two classical solvers and find that, for problems with up to 128 variables, the adiabatic times for the simulated processor architecture are about 4 and 6 orders of magnitude shorter than the two classical solvers' times. This performance difference shows that, even in the potential absence of a scaling advantage, adiabatic quantum optimization may outperform classical solvers.

  3. Image processing algorithm acceleration using reconfigurable macro processor model

    Institute of Scientific and Technical Information of China (English)

    孙广富; 陈华明; 卢焕章

    2004-01-01

    The concept and advantage of reconfigurable technology is introduced. A kind of processor architecture of reconfigurable macro processor (RMP) model based on FPGA array and DSP is put forward and has been implemented.Two image algorithms are developed: template-based automatic target recognition and zone labeling. One is estimating for motion direction in the infrared image background, another is line picking-up algorithm based on image zone labeling and phase grouping technique. It is a kind of "hardware" function that can be called by the DSP in high-level algorithm.It is also a kind of hardware algorithm of the DSP. The results of experiments show the reconfigurable computing technology based on RMP is an ideal accelerating means to deal with the high-speed image processing tasks. High real time performance is obtained in our two applications on RMP.

  4. In-Network Adaptation of Video Streams Using Network Processors

    Directory of Open Access Journals (Sweden)

    Mohammad Shorfuzzaman

    2009-01-01

    problem can be addressed, near the network edge, by applying dynamic, in-network adaptation (e.g., transcoding of video streams to meet available connection bandwidth, machine characteristics, and client preferences. In this paper, we extrapolate from earlier work of Shorfuzzaman et al. 2006 in which we implemented and assessed an MPEG-1 transcoding system on the Intel IXP1200 network processor to consider the feasibility of in-network transcoding for other video formats and network processor architectures. The use of “on-the-fly” video adaptation near the edge of the network offers the promise of simpler support for a wide range of end devices with different display, and so forth, characteristics that can be used in different types of environments.

  5. VLSI implementation of a new LMS-based algorithm for noise removal in ECG signal

    Science.gov (United States)

    Satheeskumaran, S.; Sabrigiriraj, M.

    2016-06-01

    Least mean square (LMS)-based adaptive filters are widely deployed for removing artefacts in electrocardiogram (ECG) due to less number of computations. But they posses high mean square error (MSE) under noisy environment. The transform domain variable step-size LMS algorithm reduces the MSE at the cost of computational complexity. In this paper, a variable step-size delayed LMS adaptive filter is used to remove the artefacts from the ECG signal for improved feature extraction. The dedicated digital Signal processors provide fast processing, but they are not flexible. By using field programmable gate arrays, the pipelined architectures can be used to enhance the system performance. The pipelined architecture can enhance the operation efficiency of the adaptive filter and save the power consumption. This technique provides high signal-to-noise ratio and low MSE with reduced computational complexity; hence, it is a useful method for monitoring patients with heart-related problem.

  6. Spike timing dependent plasticity (STDP) can ameliorate process variations in neuromorphic VLSI.

    Science.gov (United States)

    Cameron, Katherine; Boonsobhak, Vasin; Murray, Alan; Renshaw, David

    2005-11-01

    A transient-detecting very large scale integration (VLSI) pixel is described, suitable for use in a visual-processing, depth-recovery algorithm based upon spike timing. A small array of pixels is coupled to an adaptive system, based upon spike timing dependent plasticity (STDP), that aims to reduce the effect of VLSI process variations on the algorithm's performance. Results from 0.35 microm CMOS temporal differentiating pixels and STDP circuits show that the system is capable of adapting to substantially reduce the effects of process variations without interrupting the algorithm's natural processes. The concept is generic to all spike timing driven processing algorithms in a VLSI.

  7. Methodology of Efficient Energy Design for Noisy Deep Submicron VLSI Chips

    Institute of Scientific and Technical Information of China (English)

    WANGJun

    2004-01-01

    Power dissipation is becoming increasingly important as technology continues to scale. This paper describes a way to consider the dynamic, static and shortcircuit power dissipation simultaneously for making complete, quantitative prediction on the total power dissipation of noisy VLSI chip. Especially, this new method elucidates the mechanism of power dissipation caused by the intrinsic noise of deep submicron VLSI chip. To capture the noise dependency of efficient energy design strategies for VLSI chip, the simulation of two illustrative cases are observed. Finally, the future works are proposed for the optimum tradeoff among the power, speed and area, which includes the use of floating-body partially depleted silicon-on-insulator CMOS technology.

  8. 40 CFR 791.45 - Processors.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 31 2010-07-01 2010-07-01 true Processors. 791.45 Section 791.45 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) TOXIC SUBSTANCES CONTROL ACT (CONTINUED) DATA REIMBURSEMENT Basis for Proposed Order § 791.45 Processors. (a) Generally, processors will be...

  9. A high-accuracy optical linear algebra processor for finite element applications

    Science.gov (United States)

    Casasent, D.; Taylor, B. K.

    1984-01-01

    Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.

  10. Raexplore: Enabling Rapid, Automated Architecture Exploration for Full Applications

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Yao [Argonne National Lab. (ANL), Argonne, IL (United States); Balaprakash, Prasanna [Argonne National Lab. (ANL), Argonne, IL (United States); Meng, Jiayuan [Argonne National Lab. (ANL), Argonne, IL (United States); Morozov, Vitali [Argonne National Lab. (ANL), Argonne, IL (United States); Parker, Scott [Argonne National Lab. (ANL), Argonne, IL (United States); Kumaran, Kalyan [Argonne National Lab. (ANL), Argonne, IL (United States)

    2014-12-01

    We present Raexplore, a performance modeling framework for architecture exploration. Raexplore enables rapid, automated, and systematic search of architecture design space by combining hardware counter-based performance characterization and analytical performance modeling. We demonstrate Raexplore for two recent manycore processors IBM Blue- Gene/Q compute chip and Intel Xeon Phi, targeting a set of scientific applications. Our framework is able to capture complex interactions between architectural components including instruction pipeline, cache, and memory, and to achieve a 3–22% error for same-architecture and cross-architecture performance predictions. Furthermore, we apply our framework to assess the two processors, and discover and evaluate a list of architectural scaling options for future processor designs.

  11. Testing interconnected VLSI circuits in the Big Viterbi Decoder

    Science.gov (United States)

    Onyszchuk, I. M.

    1991-01-01

    The Big Viterbi Decoder (BVD) is a powerful error-correcting hardware device for the Deep Space Network (DSN), in support of the Galileo and Comet Rendezvous Asteroid Flyby (CRAF)/Cassini Missions. Recently, a prototype was completed and run successfully at 400,000 or more decoded bits per second. This prototype is a complex digital system whose core arithmetic unit consists of 256 identical very large scale integration (VLSI) gate-array chips, 16 on each of 16 identical boards which are connected through a 28-layer, printed-circuit backplane using 4416 wires. Special techniques were developed for debugging, testing, and locating faults inside individual chips, on boards, and within the entire decoder. The methods are based upon hierarchical structure in the decoder, and require that chips or boards be wired themselves as Viterbi decoders. The basic procedure consists of sending a small set of known, very noisy channel symbols through a decoder, and matching observables against values computed by a software simulation. Also, tests were devised for finding open and short-circuited wires which connect VLSI chips on the boards and through the backplane.

  12. New VLSI complexity results for threshold gate comparison

    Energy Technology Data Exchange (ETDEWEB)

    Beiu, V.

    1996-12-31

    The paper overviews recent developments concerning optimal (from the point of view of size and depth) implementations of COMPARISON using threshold gates. We detail a class of solutions which also covers another particular solution, and spans from constant to logarithmic depths. These circuit complexity results are supplemented by fresh VLSI complexity results having applications to hardware implementations of neural networks and to VLSI-friendly learning algorithms. In order to estimate the area (A) and the delay (T), as well as the classical AT{sup 2}, we shall use the following {open_quote}cost functions{close_quote}: (i) the connectivity (i.e., sum of fan-ins) and the number-of-bits for representing the weights and thresholds are used as closer approximations of the area; while (ii) the fan-ins and the length of the wires are used for closer estimates of the delay. Such approximations allow us to compare the different solutions-which present very interesting fan-in dependent depth-size and area-delay tradeoffs - with respect to AT{sup 2}.

  13. A fast neural-network algorithm for VLSI cell placement.

    Science.gov (United States)

    Aykanat, Cevdet; Bultan, Tevfik; Haritaoğlu, Ismail

    1998-12-01

    Cell placement is an important phase of current VLSI circuit design styles such as standard cell, gate array, and Field Programmable Gate Array (FPGA). Although nondeterministic algorithms such as Simulated Annealing (SA) were successful in solving this problem, they are known to be slow. In this paper, a neural network algorithm is proposed that produces solutions as good as SA in substantially less time. This algorithm is based on Mean Field Annealing (MFA) technique, which was successfully applied to various combinatorial optimization problems. A MFA formulation for the cell placement problem is derived which can easily be applied to all VLSI design styles. To demonstrate that the proposed algorithm is applicable in practice, a detailed formulation for the FPGA design style is derived, and the layouts of several benchmark circuits are generated. The performance of the proposed cell placement algorithm is evaluated in comparison with commercial automated circuit design software Xilinx Automatic Place and Route (APR) which uses SA technique. Performance evaluation is conducted using ACM/SIGDA Design Automation benchmark circuits. Experimental results indicate that the proposed MFA algorithm produces comparable results with APR. However, MFA is almost 20 times faster than APR on the average.

  14. Real-time optical processor prototype for remote SAR applications

    Science.gov (United States)

    Marchese, Linda; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Bourqui, Pascal; Legros, Mathieu; Desnoyers, Nichola; Guillot, Ludovic; Mercier, Luc; Savard, Maxime; Martel, Anne; Châteauneuf, François; Bergeron, Alain

    2009-09-01

    A Compact Real-Time Optical SAR Processor has been successfully developed and tested. SAR, or Synthetic Aperture Radar, is a powerful tool providing enhanced day and night imaging capabilities. SAR systems typically generate large amounts of information generally in the form of complex data that are difficult to compress. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Indeed, the first SAR images have been optically processed. The optical processor architecture provides inherent parallel computing capabilities that can be used advantageously for the SAR data processing. Onboard SAR image generation would provide local access to processed information paving the way for real-time decision-making. This could eventually benefit navigation strategy and instrument orientation decisions. Moreover, for interplanetary missions, onboard analysis of images could provide important feature identification clues and could help select the appropriate images to be transmitted to Earth, consequently helping bandwidth management. This could ultimately reduce the data throughput requirements and related transmission bandwidth. This paper reviews the design of a compact optical SAR processor prototype that would reduce power, weight, and size requirements and reviews the analysis of SAR image generation using the table-top optical processor. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and size are reviewed. Results of image generation from simulated point targets as well as real satellite-acquired raw data are presented.

  15. Communications systems and methods for subsea processors

    Science.gov (United States)

    Gutierrez, Jose; Pereira, Luis

    2016-04-26

    A subsea processor may be located near the seabed of a drilling site and used to coordinate operations of underwater drilling components. The subsea processor may be enclosed in a single interchangeable unit that fits a receptor on an underwater drilling component, such as a blow-out preventer (BOP). The subsea processor may issue commands to control the BOP and receive measurements from sensors located throughout the BOP. A shared communications bus may interconnect the subsea processor and underwater components and the subsea processor and a surface or onshore network. The shared communications bus may be operated according to a time division multiple access (TDMA) scheme.

  16. Analysis of the computational requirements of a pulse-doppler radar signal processor

    CSIR Research Space (South Africa)

    Broich, R

    2012-05-01

    Full Text Available architectures [1]. These simplifications are often degrading to algorithmic performance and thus to the entire radar system. In this paper the different computational operations that are used in pulse-Doppler radar signal processing are explored, in order...H z to 10 GH z Fig. 1. Radar signal processor (RSP) flow of operations purpose computer architectures [3]. An abstract machine, in which only memory reads, writes, additions and multiplica- tions are considered to be significant operations...

  17. New Methodologies for Parallel Architecture

    Institute of Scientific and Technical Information of China (English)

    Dong-Rui Fan; Xiao-Wei Li; Guo-Jie Li

    2011-01-01

    Moore's law continues to grant computer architects ever more transistors in the foreseeable future,and parallelism is the key to continued performance scaling in modern microprocessors.In this paper,the achievements in our research project,which is supported by the National Basic Research 973 Program of China,on parallel architecture,are systematically presented.The innovative approaches and techniques to solve the significant problems in parallel architecture design are summarized,including architecture level optimization,compiler and languag~supported technologies,reliability,power-performance efficient design,test and verification challenges,and platform building.Two prototype chips,a multiheavy-core Godson-3 and a many-light-core Godson-T,are described to demonstrate the highly scalable and reconfigurable parallel architecture designs.We also present some of our achievements appearing in ISCA,MICRO,ISSCC,HPCA,PLDI,PACT,IJCAI,Hot Chips,DATE,IEEE Trans.VLSI,IEEE Micro,IEEE Trans.Computers,etc.

  18. A 2-D Convolver Architecture For Real-Time Image Processing

    Science.gov (United States)

    Landeta, David; Malinowski, Chris W.

    1989-08-01

    This paper presents a novel architecture for two VLSI ICs, an 8-bit and 12-bit version, which execute real-time 3x3 kernel image convolutions at rates exceeding 10 ms per 512x512 pixel frame (at a 30 MHz external clock rate). The ICs are capable of performing "on-the-fly" convolutions of images without any need for external input image buffers. Both symmetric and asymmetric coefficient kernels are supported, with coefficient precision up to 12 bits. Nine on-chip multiplier-accumulators maintain double-precision accuracy for maximum precision of the results and minimum roundoff noise. In addition, an on-chip ALU can be switched into the pixel datapath to perform simultaneous pixel-point operations on the incoming data. Thus, operations such as thresholding, inversion, shifts, and double frame arithmetic can be performed on the pixels with no extra speed penalty. Flexible internal datapaths of the processors provide easy means for cascadability of several devices if larger image arrays need to be processed. Moreover, larger convolution kernels, such as 6x6, can easily be supported with no speed penalty by employing two or more convolvers. On-chip delay buffers can be programmed to any desired raster line width up to 1024 pixels. The delay buffers may also be bypassed when direct "Sum-Of-Products" operation of the multipliers is required; such as when external frame buffer address sequencing is desired. These features make the convolvers suitable for applications such as affine and bilinear interpolation, one-dimensional convolution (FIR filtration), and matrix operations. Several examples of applications illustrating stand-alone and cascade mode operation of the ICs will be discussed.

  19. Two-dimensional convolver architecture for real-time image processing

    Science.gov (United States)

    Landeta, David; Malinowski, Chris W.

    1989-11-01

    This paper presents a novel architecture for two VLSI ICs, an 8-bit and 12-bit version, which execute real-time 3x3 kernel image convolutions at rates exceeding 10 ms per 512x512 pixel frame (at a 30 MHz external clock rate). The ICs are capable of performing "on-the-fly" convolutions of images without any need for external input image buffers. Both symmetric and asymmetric coefficient kernels are supported, with coefficient precision up to 12 bits. Nine on-chip multiplier-accumulators maintain double-precision accuracy for maximum precision of the results and minimum roundoff noise. In addition, an on-chip ALU can be switched into the pixel datapath to perform simultaneous pixel-point operations on the incoming data. Thus, operations such as thresholding, inversion, shifts, and double frame arithmetic can be performed on the pixels with no extra speed penalty. Flexible internal datapaths of the processors provide easy means for cascadability of several devices if larger image arrays need to be processed. Moreover, larger convolution kernels, such as 6x6, can easily be supported with no speed penalty by employing two or more convolvers. On-chip delay buffers can be programmed to any desired raster line width up to 1024 pixels. The delay buffers may also be bypassed when direct "Sum-Of-Products" operation of the multipliers is required; such as when external frame buffer address sequencing is desired. These features make the convolvers suitable for applications such as affine and bilinear interpolation, one-dimensional convolution (FIR filtration), and matrix operations. Several examples of applications illustrating stand-alone and cascade mode operation of the ICs will be discussed

  20. Challenges of Algebraic Multigrid across Multicore Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Baker, A H; Gamblin, T; Schulz, M; Yang, U M

    2010-04-12

    Algebraic multigrid (AMG) is a popular solver for large-scale scientific computing and an essential component of many simulation codes. AMG has shown to be extremely efficient on distributed-memory architectures. However, when executed on modern multicore architectures, we face new challenges that can significantly deteriorate AMG's performance. We examine its performance and scalability on three disparate multicore architectures: a cluster with four AMD Opteron Quad-core processors per node (Hera), a Cray XT5 with two AMD Opteron Hex-core processors per node (Jaguar), and an IBM BlueGene/P system with a single Quad-core processor (Intrepid). We discuss our experiences on these platforms and present results using both an MPI-only and a hybrid MPI/OpenMP model. We also discuss a set of techniques that helped to overcome the associated problems, including thread and process pinning and correct memory associations.

  1. A wearable real-time image processor for a vision prosthesis.

    Science.gov (United States)

    Tsai, D; Morley, J W; Suaning, G J; Lovell, N H

    2009-09-01

    Rapid progress in recent years has made implantable retinal prostheses a promising therapeutic option in the near future for patients with macular degeneration or retinitis pigmentosa. Yet little work on devices that encode visual images into electrical stimuli have been reported to date. This paper presents a wearable image processor for use as the external module of a vision prosthesis. It is based on a dual-core microprocessor architecture and runs the Linux operating system. A set of image-processing algorithms executes on the digital signal processor of the device, which may be controlled remotely via a standard desktop computer. The results indicate that a highly flexible and configurable image processor can be built with the dual-core architecture. Depending on the image-processing requirements, general-purpose embedded microprocessors alone may be inadequate for implementing image-processing strategies required by retinal prostheses.

  2. FPGA Based Implementation of Pipelined 32-bit RISC Processor with Floating Point Unit

    Directory of Open Access Journals (Sweden)

    Jinde Vijay Kumar

    2014-04-01

    Full Text Available This paper presents 32-bit RISC processor with floating point unit to be designed using pipelined architecture; through this we can improve the speed of the operation as well as overall performance. This processor is developed especially for Arithmetic operations of both fixed and floating point numbers, branch and logical functions. The proposed architecture is able to prevent pipelining from flushing when branch instruction occurs and able to provide halt support. Floating point operations are widely used these days for many applications ranging from graphics application to medical imaging. Thus, the processor can be used for diversified application area. The necessary code is written in the hardware description language Verilog HDL. Quartus II 10.1 suite is used for software development; Modelsim is used for simulations and then implementation on Altera DE 2 FPGA board. Keywords -

  3. Low-Power and High Speed 128-Point Pipline FFT/IFFT Processor for OFDM Applications

    Directory of Open Access Journals (Sweden)

    D. Rajaveerappa

    2012-03-01

    Full Text Available This paper represents low power and high speed 128-point pipelined Fast Fourier Transform (FFT and its inverse Fast Fourier Transform (IFFT processor for OFDM. The Modified architecture also provides concept of ROM module and variable length support from 128~2048 point for FFT/IFFT for OFDM applications such as digital audio broadcasting (DAB, digital video broadcasting-terrestrial (DVB-T, asymmetric digital subscriber loop (ADSL and very-high-speed digital subscriber loop (VDSL. The 128-point architecture consists of an optimized pipeline implementation based on Radix-2 butterfly processor Element. To reduce power consumption and chip area, special current-mode SRAMs are adopted to replace shift registers in the delay lines. In low-power operation, when the supply voltage is scaled down to 2.3 V, the processor consumes 176mW when it runs at 17.8 MHz.

  4. Turbo decoder architecture for beyond-4G applications

    CERN Document Server

    Wong, Cheng-Chi

    2013-01-01

    This book describes the most recent techniques for turbo decoder implementation, especially for 4G and beyond 4G applications. The authors reveal techniques for the design of high-throughput decoders for future telecommunication systems, enabling designers to reduce hardware cost and shorten processing time. Coverage includes an explanation of VLSI implementation of the turbo decoder, from basic functional units to advanced parallel architecture. The authors discuss both hardware architecture techniques and experimental results, showing the variations in area/throughput/performance with respec

  5. A VLSI Architecture Evaluation of a Syntax Element Level Parallel Arithmetic Entropy Coder for Parallel H.264 Encoder%一种用于并行H.264编码器的语法元素级分组并行算术编码器体系结构的评估

    Institute of Scientific and Technical Information of China (English)

    陈胜刚; 陈书明; 谷会涛; 刘尧

    2012-01-01

    设计了一种语法元素指令流驱动的全流水CABAC(Context-based Adaptive Binary Arithmetic Coding)熵编码VLSI结构,并对提出的语法元素级分组并行算术编码器的体系结构进行了设计和开销评估.该并行方法可以与现有符号级并行算法正交,可同时使用,适合大规模片上并行视频编码器;相比标准CABAC,增加约55%的晶体管即可实现2倍以上的符号处理加速比和>1Gbin/s的吞吐率.%This paper proposes a new CABAC pipeline driving by syntax element instructions and a GABAC-based parallel arithmetic entropy coder for on-chip large-scale parallel H.264 video coders is also presented in this paper. Furthermore, the hardware architecture and transistor cost of the new parallel entropy coder are evaluated. This new parallel entropy coder can cooperate with the traditional symbol-level parallel algorithms and it suits the manycore platform well.Compared with the traditional CABAC hardware,the new parallel entropy coder can double its throughput to > 1Gbins/s by a cost of 55% transistors.

  6. The UA1 trigger processor

    CERN Document Server

    Grayer, G H

    1981-01-01

    Experiment UA1 is a large multipurpose spectrometer at the CERN proton-antiproton collider. The principal trigger is formed on the basis of the energy deposition in calorimeters. A trigger decision taken in under 2.4 microseconds can avoid dead-time losses due to the bunched nature of the beam. To achieve this fast 8-bit charge to digital converters have been built followed by two identical digital processors tailored to the experiment. The outputs of groups of the 2440 photomultipliers in the calorimeters are summed to form a total of 288 input channels to the ADCs. A look-up table in RAM is used to convert the digitised photomultiplier signals to energy in one processor, and to transverse energy in the other. Each processor forms four sums from a chosen combination of input channels, and also counts the number of clusters with electromagnetic or hadronic energy above pre-determined levels. Up to twelve combinations of these conditions, together with external information, may be combined in coincidence or in...

  7. Taxonomy of Data Prefetching for Multicore Processors

    Institute of Scientific and Technical Information of China (English)

    Surendra Byna; Yong Chen; Xian-He Sun

    2009-01-01

    Data prefetching is an effective data access latency hiding technique to mask the CPU stall caused by cache misses and to bridge the performance gap between processor and memory. With hardware and/or software support, data prefetching brings data closer to a processor before it is actually needed. Many prefetching techniques have been developed for single-core processors. Recent developments in processor technology have brought multicore processors into mainstream.While some of the single-core prefetching techniques are directly applicable to multicore processors, numerous novel strategies have been proposed in the past few years to take advantage of multiple cores. This paper aims to provide a comprehensive review of the state-of-the-art prefetching techniques, and proposes a taxonomy that classifies various design concerns in developing a prefetching strategy, especially for multicore processors. We compare various existing methods through analysis as well.

  8. Real-time simulation of biologically realistic stochastic neurons in VLSI.

    Science.gov (United States)

    Chen, Hsin; Saighi, Sylvain; Buhry, Laure; Renaud, Sylvie

    2010-09-01

    Neuronal variability has been thought to play an important role in the brain. As the variability mainly comes from the uncertainty in biophysical mechanisms, stochastic neuron models have been proposed for studying how neurons compute with noise. However, most papers are limited to simulating stochastic neurons in a digital computer. The speed and the efficiency are thus limited especially when a large neuronal network is of concern. This brief explores the feasibility of simulating the stochastic behavior of biological neurons in a very large scale integrated (VLSI) system, which implements a programmable and configurable Hodgkin-Huxley model. By simply injecting noise to the VLSI neuron, various stochastic behaviors observed in biological neurons are reproduced realistically in VLSI. The noise-induced variability is further shown to enhance the signal modulation of a neuron. These results point toward the development of analog VLSI systems for exploring the stochastic behaviors of biological neuronal networks in large scale.

  9. Vertically Coupled Microring Resonator Filter :Versatile Building Block for VLSI Filter Circuits

    Institute of Scientific and Technical Information of China (English)

    Yasuo; Kokubun

    2003-01-01

    In this review, the recent progress in the development of vertically coupled micro-ring resonator filters is summarized and the potential applications of the filters leading to the development of VLSI photonics are described.

  10. Vertically Coupled Microring Resonator Filter : Versatile Building Block for VLSI Filter Circuits

    Institute of Scientific and Technical Information of China (English)

    Yasuo Kokubun

    2003-01-01

    In this review, the recent progress in the development of vertically coupled micro-ring resonator filters is summarized and the potential applications of the filters leading to the development of VLSI photonics are described.

  11. Application of evolutionary algorithms for multi-objective optimization in VLSI and embedded systems

    CERN Document Server

    2015-01-01

    This book describes how evolutionary algorithms (EA), including genetic algorithms (GA) and particle swarm optimization (PSO) can be utilized for solving multi-objective optimization problems in the area of embedded and VLSI system design. Many complex engineering optimization problems can be modelled as multi-objective formulations. This book provides an introduction to multi-objective optimization using meta-heuristic algorithms, GA and PSO, and how they can be applied to problems like hardware/software partitioning in embedded systems, circuit partitioning in VLSI, design of operational amplifiers in analog VLSI, design space exploration in high-level synthesis, delay fault testing in VLSI testing, and scheduling in heterogeneous distributed systems. It is shown how, in each case, the various aspects of the EA, namely its representation, and operators like crossover, mutation, etc. can be separately formulated to solve these problems. This book is intended for design engineers and researchers in the field ...

  12. Adaptive Optoelectronic Eyes: Hybrid Sensor/Processor Architectures

    Science.gov (United States)

    2006-11-13

    J.  Lange , C. von der Malsburg, R. P. Würtz, and W. Konen, “Distortion Invariant Object Recognition Adaptive Optoelectronic Eyes: Hybrid Sensor...Meeting, Dallas, Texas, (November, 1998). 17.  G. Sáry, G. Kovács, K. Köteles, G.  Benedek , J. Fiser, and I. Biederman, “Selectivity Variations in Monkey

  13. A system architecture, processor, and communication protocol for secure implants

    NARCIS (Netherlands)

    C. Strydis (Christos); R.M. Seepers (Robert); P. Peris-Lopez (Pedro); D. Siskos (Dimitrios); I. Sourdis (Ioannis)

    2013-01-01

    textabstractSecure and energy-efficient communication between Implantable Medical Devices (IMDs) and authorized external users is attracting increasing attention these days. However, there currently exists no systematic approach to the problem, while solutions from neighboring fields, such as wirele

  14. A system architecture, processor, and communication protocol for secure implants

    NARCIS (Netherlands)

    C. Strydis (Christos); R.M. Seepers (Robert); P. Peris-Lopez (Pedro); D. Siskos (Dimitrios); I. Sourdis (Ioannis)

    2013-01-01

    textabstractSecure and energy-efficient communication between Implantable Medical Devices (IMDs) and authorized external users is attracting increasing attention these days. However, there currently exists no systematic approach to the problem, while solutions from neighboring fields, such as

  15. SBNR (Signed Binary Number Representations) Digital Signal Processor Architecture.

    Science.gov (United States)

    1987-05-31

    from Con’trolling Office) IS. SECURITY CLASS. (of Chia report) Defense Contract Audit Agency UNCLASSIFIED Denver Branch Office 158. OZCL ASSi FICATION...is shifted up and We contend that it holds promise of many new commercially written into the next higher row of PFe as each new video line available...presented at the nt. density (number of PFe per wafer) and in computational power, Con(. on Parallel Processing, 1985 we can also expect distribution of

  16. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    Directory of Open Access Journals (Sweden)

    Anuj Sharma

    2015-01-01

    F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  17. Towards a Systematic Exploration of the Optimization Space for Many-Core Processors

    NARCIS (Netherlands)

    Fang, J.

    2014-01-01

    The architecture diversity of many-core processors - with their different types of cores, and memory hierarchies - makes the old model of reprogramming every application for every platform infeasible. Therefore, inter-platform portability has become a desirable feature of programming models. While

  18. Towards a Systematic Exploration of the Optimization Space for Many-Core Processors

    NARCIS (Netherlands)

    Fang, J.

    2014-01-01

    The architecture diversity of many-core processors - with their different types of cores, and memory hierarchies - makes the old model of reprogramming every application for every platform infeasible. Therefore, inter-platform portability has become a desirable feature of programming models. While f

  19. Architectural Prototyping

    DEFF Research Database (Denmark)

    Bardram, Jakob; Christensen, Henrik Bærbak; Hansen, Klaus Marius

    2004-01-01

    ' concerns with respect to a system under development. An architectural prototype is primarily a learning and communication vehicle used to explore and experiment with alternative architectural styles, features, and patterns in order to balance different architectural qualities. The use of architectural......A major part of software architecture design is learning how specific architectural designs balance the concerns of stakeholders. We explore the notion of "architectural prototypes", correspondingly architectural prototyping, as a means of using executable prototypes to investigate stakeholders...

  20. CMOS VLSI Hyperbolic Tangent Function & its Derivative Circuits for Neuron Implementation

    Directory of Open Access Journals (Sweden)

    Hussein CHIBLE,

    2013-10-01

    Full Text Available The hyperbolic tangent function and its derivative are key essential element in analog signal processing and especially in analog VLSI implementation of neuron of artificial neural networks. The main conditions of these types of circuits are the small silicon area, and the low power consumption. The objective of this paper is to study and design CMOS VLSI hyperbolic tangent function and its derivative circuit for neural network implementation. A circuit is designed and the results are presented