Rational calculation accuracy in acousto-optical matrix-vector processor
Oparin, V. V.; Tigin, Dmitry V.
1994-01-01
The high speed of parallel computation, combined with comparatively small size and acceptable power consumption, makes the acousto-optic matrix-vector multiplier (AOMVM) attractive for processing large amounts of information in real time. The limited accuracy of its computations is an essential disadvantage of such a processor. Reduced accuracy requirements, however, allow considerable simplification of the AOMVM architecture and relax the demands on its components.
Matrix-vector multiplication using digital partitioning for more accurate optical computing
Gary, C. K.
1992-01-01
Digital partitioning offers a flexible means of increasing the accuracy of an optical matrix-vector processor. This algorithm can be implemented with the same architecture required for a purely analog processor, which gives optical matrix-vector processors the ability to perform high-accuracy calculations at speeds comparable with or greater than electronic computers as well as the ability to perform analog operations at a much greater speed. Digital partitioning is compared with digital multiplication by analog convolution, residue number systems, and redundant number representation in terms of the size and the speed required for an equivalent throughput as well as in terms of the hardware requirements. Digital partitioning and digital multiplication by analog convolution are found to be the most efficient algorithms if coding time and hardware are considered, and the architecture for digital partitioning permits the use of analog computations to provide the greatest throughput for a single processor.
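The idea of partitioning can be pictured with a minimal sketch. It assumes non-negative integer operands that are split into small fixed-width slices; each slice-by-slice product is a low-dynamic-range matrix-vector product of the kind an analog pass could perform, and the partial results are recombined digitally by shifting. The function names, the slice width, and the number of slices are illustrative assumptions and are not taken from the paper.

```python
def split_slices(value, bits_per_slice, num_slices):
    """Split a non-negative integer into little-endian slices of bits_per_slice bits."""
    mask = (1 << bits_per_slice) - 1
    return [(value >> (bits_per_slice * i)) & mask for i in range(num_slices)]


def low_precision_matvec(matrix, vector):
    """Stand-in for one analog pass: a plain matrix-vector product on small integers."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def partitioned_matvec(matrix, vector, bits_per_slice=4, num_slices=2):
    """Full-precision product assembled from low-precision slice products."""
    # Slice i of the matrix holds bits [i*k, (i+1)*k) of every matrix element.
    mat_slices = [
        [[split_slices(a, bits_per_slice, num_slices)[i] for a in row] for row in matrix]
        for i in range(num_slices)
    ]
    vec_slices = [
        [split_slices(x, bits_per_slice, num_slices)[j] for x in vector]
        for j in range(num_slices)
    ]
    result = [0] * len(matrix)
    for i, m_slice in enumerate(mat_slices):
        for j, v_slice in enumerate(vec_slices):
            partial = low_precision_matvec(m_slice, v_slice)
            shift = bits_per_slice * (i + j)  # digital weight of this slice pair
            for r, p in enumerate(partial):
                result[r] += p << shift
    return result


if __name__ == "__main__":
    A = [[200, 17], [3, 255]]
    x = [100, 45]
    print(partitioned_matvec(A, x))
```

With 8-bit operands and two 4-bit slices, four low-precision passes reproduce the exact integer product; in a real optical system each pass would carry analog error, which this sketch does not model.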
Perlee, Caroline J.; Casasent, David P.
1990-09-01
Error sources in an optical matrix-vector processor are analyzed in terms of their effect on the performance of the algorithms used to solve a set of nonlinear and linear algebraic equations. A direct and an iterative algorithm are used to solve a nonlinear time-dependent case study from computational fluid dynamics. A simulator which emulates the data flow and number representation of the optical linear algebra processor (OLAP) is used to study these error effects. The ability of each algorithm to tolerate or correct the error sources is quantified. These results are extended to the general case of solving nonlinear and linear algebraic equations on the optical system.
A high-accuracy optical linear algebra processor for finite element applications
Casasent, D.; Taylor, B. K.
1984-01-01
Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuracy to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantifies the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
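Multiplication by digital convolution can be illustrated with a small sketch. It assumes non-negative integers encoded as little-endian bit sequences: convolving the two bit sequences gives a mixed-radix result whose digits may exceed 1, and weighting digit k by 2^k recovers the ordinary product. The function names and the 8-bit default width are assumptions made for this example, not details from the report.

```python
def to_bits(value, width):
    """Little-endian binary digits of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def convolve(a, b):
    """Discrete linear convolution of two digit sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def multiply_by_convolution(x, y, width=8):
    """Multiply two integers by convolving their bit sequences."""
    mixed_radix = convolve(to_bits(x, width), to_bits(y, width))
    # Each digit is at most `width`, so the analog stage only needs to
    # resolve width + 1 levels; the weighted sum resolves the carries.
    product = sum(digit << k for k, digit in enumerate(mixed_radix))
    return product, mixed_radix


if __name__ == "__main__":
    product, digits = multiply_by_convolution(13, 11)
    print(product, max(digits))
```

The point of the encoding is visible in the digit bound: the convolution output never exceeds the word width, so a processor with modest dynamic range can still contribute to a long-word product.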
Recursive Matrix Inverse Update On An Optical Processor
Casasent, David P.; Baranoski, Edward J.
1988-02-01
A high-accuracy optical linear algebraic processor (OLAP) using the digital multiplication by analog convolution (DMAC) algorithm is described for use in an efficient matrix inverse update algorithm with speed and accuracy advantages. The solution of the parameters in the algorithm is addressed, and the advantages of optical over digital linear algebraic processors are advanced.
Matrix preconditioning: a robust operation for optical linear algebra processors.
Ghosh, A; Paparao, P
1987-07-15
Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.
Integrated optic vector-matrix multiplier
Watts, Michael R [Albuquerque, NM]
2011-09-27
A vector-matrix multiplier is disclosed which uses N different wavelengths of light that are modulated with amplitudes representing elements of an N×1 vector and combined to form an input wavelength-division multiplexed (WDM) light stream. The input WDM light stream is split into N streamlets from which each wavelength of the light is individually coupled out and modulated for a second time using an input signal representing elements of an M×N matrix, and is then coupled into an output waveguide for each streamlet to form an output WDM light stream which is detected to generate a product of the vector and matrix. The vector-matrix multiplier can be formed as an integrated optical circuit using either waveguide amplitude modulators or ring resonator amplitude modulators.
Design and experimental verification for optical module of optical vector-matrix multiplier.
Zhu, Weiwei; Zhang, Lei; Lu, Yangyang; Zhou, Ping; Yang, Lin
2013-06-20
Optical computing is a new method to implement signal processing functions. The multiplication between a vector and a matrix is an important arithmetic algorithm in the signal processing domain. The optical vector-matrix multiplier (OVMM) is an optoelectronic system to carry out this operation, which consists of an electronic module and an optical module. In this paper, we propose an optical module for OVMM. To eliminate the cross talk and make full use of the optical elements, an elaborately designed structure that involves spherical lenses and cylindrical lenses is utilized in this optical system. The optical design software package ZEMAX is used to optimize the parameters and simulate the whole system. Finally, experimental data are obtained to evaluate the overall performance of the system. The results of both simulation and experiment indicate that the constructed system can successfully implement the multiplication between a 16 by 16 matrix and a vector of dimension 16.
Habiby, Sarry F.
1987-01-01
The design and implementation of a digital (numerical) optical matrix-vector multiplier are presented. The objective is to demonstrate the operation of an optical processor designed to minimize computation time in performing a practical computing application. This is done by using the large array of processing elements in a Hughes liquid crystal light valve, and relying on the residue arithmetic representation, a holographic optical memory, and position coded optical look-up tables. In the design, all operations are performed in effectively one light valve response time regardless of matrix size. The features of the design allowing fast computation include the residue arithmetic representation, the mapping approach to computation, and the holographic memory. In addition, other features of the work include a practical light valve configuration for efficient polarization control, a model for recording multiple exposures in silver halides with equal reconstruction efficiency, and using light from an optical fiber for a reference beam source in constructing the hologram. The design can be extended to implement larger matrix arrays without increasing computation time.
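The residue arithmetic representation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the moduli (5, 7, 9, 11) are chosen here only because they are pairwise coprime, and the modular operations are done with ordinary integer arithmetic rather than with the position-coded optical look-up tables of the actual design. The useful property it shows is that each modulus channel is computed independently and without carries, then the result is decoded with the Chinese remainder theorem.

```python
from math import prod

# Hypothetical moduli for illustration; pairwise coprime, range 5*7*9*11 = 3465.
MODULI = (5, 7, 9, 11)


def residue_matvec(matrix, vector):
    """Matrix-vector product carried out separately in each modulus channel."""
    results = []
    for row in matrix:
        digits = []
        for m in MODULI:
            acc = 0
            for a, x in zip(row, vector):
                # Only small residues are ever multiplied or added.
                acc = (acc + (a % m) * (x % m)) % m
            digits.append(acc)
        results.append(tuple(digits))
    return results


def from_residues(residues):
    """Decode a residue tuple with the Chinese remainder theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m


if __name__ == "__main__":
    A = [[12, 7], [3, 20]]
    x = [15, 9]
    print([from_residues(r) for r in residue_matvec(A, x)])
```

The decoded values are exact only while every true result stays below the product of the moduli, so the moduli set has to be sized to the largest expected output.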
Acoustooptic linear algebra processors - Architectures, algorithms, and applications
Casasent, D.
1984-01-01
Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products on such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
International Nuclear Information System (INIS)
Underwood, D.
1986-01-01
Simple examples of finding tracks by Fourier transform with filter or correlation function are presented. Possibilities for using this method in more complicated real situations and the processing times which might be achieved are discussed. The method imitates the simplest examples in the literature on optical pattern recognition and optical processing. The possible benefits of the method are in speed of processing in the optical Fourier transform wherein an entire picture is processed simultaneously. The speed of a computer vector processor may be competitive with present electro-optical devices. 2 refs., 6 figs
Multithreading in vector processors
Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi
2018-01-16
In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
Real-time optical laboratory solution of parabolic differential equations
Casasent, David; Jackson, James
1988-01-01
An optical laboratory matrix-vector processor is used to solve parabolic differential equations (the transient diffusion equation with two space variables and time) by an explicit algorithm. This includes optical matrix-vector nonbase-2 encoded laboratory data, the combination of nonbase-2 and frequency-multiplexed data on such processors, a high-accuracy optical laboratory solution of a partial differential equation, new data partitioning techniques, and a discussion of a multiprocessor optical matrix-vector architecture.
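An explicit scheme maps naturally onto a matrix-vector processor because every time step is one matrix-vector product. The sketch below assumes a standard forward-time, centered-space discretization of the two-dimensional diffusion equation on an n×n interior grid with zero boundary values; the grid size, the parameter r = αΔt/h², and the function names are assumptions for illustration and are not taken from the paper.

```python
def build_update_matrix(n, r):
    """Matrix A such that u_next = A @ u for the explicit 2-D diffusion scheme.

    r = alpha * dt / h**2; the scheme is stable for r <= 0.25.
    """
    size = n * n
    A = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            k = i * n + j
            A[k][k] = 1.0 - 4.0 * r
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    A[k][ii * n + jj] = r  # neighbours outside the grid are zero
    return A


def matvec(A, u):
    """One matrix-vector product, the operation the optical processor performs."""
    return [sum(a * v for a, v in zip(row, u)) for row in A]


def advance(A, u, steps):
    """Advance the solution by repeated matrix-vector products."""
    for _ in range(steps):
        u = matvec(A, u)
    return u


if __name__ == "__main__":
    n, r = 3, 0.2
    A = build_update_matrix(n, r)
    u0 = [0.0] * (n * n)
    u0[4] = 1.0  # unit of heat at the grid centre
    print(advance(A, u0, 1))
```

Because the update matrix is fixed, it can be loaded once and reused for every step, which is why explicit time stepping suits hardware built around a fast, repeated matrix-vector product.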
Dual-scale topology optoelectronic processor.
Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H
1991-12-15
The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors
DEFF Research Database (Denmark)
Liu, Weifeng; Vinter, Brian
2015-01-01
of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over...
Optical Array Processor: Laboratory Results
Casasent, David; Jackson, James; Vaerewyck, Gerard
1987-01-01
A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) is described and laboratory results on its performance in several practical engineering problems are presented. The applications include its use in the solution of a nonlinear matrix equation for optimal control and a parabolic Partial Differential Equation (PDE), the transient diffusion equation with two spatial variables. Frequency-multiplexed, analog and high accuracy non-base-two data encoding are used and discussed. A multi-processor OLAP architecture is described and partitioning and data flow issues are addressed.
A design of a computer complex including vector processors
International Nuclear Information System (INIS)
Asai, Kiyoshi
1982-12-01
For the past six years, members of the Computing Center of the Japan Atomic Energy Research Institute (JAERI) have studied the adaptability of vector processing to large-scale nuclear codes. The research has been done in collaboration with researchers and engineers of JAERI and a computer manufacturer. In this research, forty large-scale nuclear codes were investigated from the viewpoint of vectorization. Among them, twenty-six codes were actually vectorized and executed. As a result of the investigation, it is now estimated that about seventy percent of nuclear codes and seventy percent of the total CPU time at JAERI are highly vectorizable. Based on the data obtained by the investigation, this report discusses (1) currently vectorizable CPU time, (2) the necessary number of vector processors, (3) the manpower needed to vectorize nuclear codes, (4) the computing speed, memory size, number of parallel I/O paths, and size and speed of the I/O buffer of a vector processor suitable for our applications, and (5) the software and operational policy needed to use vector processors, and finally (6) presents a computer complex including vector processors. (author)
An efficient parallel algorithm for matrix-vector multiplication
Energy Technology Data Exchange (ETDEWEB)
Hendrickson, B.; Leland, R.; Plimpton, S.
1993-03-01
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
Vector and parallel processors in computational science
International Nuclear Information System (INIS)
Duff, I.S.; Reid, J.K.
1985-01-01
These proceedings contain the articles presented at the named conference. These concern hardware and software for vector and parallel processors, numerical methods and algorithms for the computation on such processors, as well as applications of such methods to different fields of physics and related sciences. See hints under the relevant topics. (HSI)
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY
Energy Technology Data Exchange (ETDEWEB)
Barhen, Jacob [ORNL; Kerekes, Ryan A [ORNL; ST Charles, Jesse Lee [ORNL; Buckner, Mark A [ORNL
2008-01-01
performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125 MHz. At each clock cycle, 128K multiply-and-add operations are carried out, which yields a peak performance of 16 tera-operations per second (TeraOPS). IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.
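The quoted peak figure can be checked with a short calculation. It assumes that one full 256x256 matrix-vector product completes per clock cycle and that each multiply and each add counts as one operation; both are assumptions, since the truncated abstract does not spell out the counting.

```latex
\begin{align*}
256 \times 256 &= 65{,}536 \ \text{multiply-add pairs per cycle}
               = 131{,}072 \ (\approx 128\text{K}) \ \text{operations per cycle},\\
131{,}072 \times 125 \times 10^{6}\ \text{s}^{-1} &\approx 1.64 \times 10^{13}\ \text{operations per second}
               \approx 16\ \text{TeraOPS}.
\end{align*}
```

Under these assumptions the 128K figure is a per-cycle count, and the 16 TeraOPS figure follows from multiplying it by the clock rate.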
Accuracies Of Optical Processors For Adaptive Optics
Downie, John D.; Goodman, Joseph W.
1992-01-01
The paper presents an analysis of the accuracies, and the accuracy requirements, of optical linear-algebra processors (OLAPs) in adaptive-optics imaging systems. OLAPs are much faster than digital electronic processors and can eliminate some residual distortion. The question is whether the errors introduced by the analog processing of an OLAP outweigh the advantage of greater speed. The paper addresses this issue by estimating the accuracy required in a general OLAP for it to yield a smaller average residual wavefront aberration than a digital electronic processor computing at a given speed.
Matrix-Vector Based Fast Fourier Transformations on SDR Architectures
Directory of Open Access Journals (Sweden)
Y. He
2008-05-01
Today Discrete Fourier Transforms (DFTs) are applied in various radio standards based on OFDM (Orthogonal Frequency Division Multiplex). It is important to gain a fast computational speed for the DFT, which is usually achieved by using specialized Fast Fourier Transform (FFT) engines. However, in face of the Software Defined Radio (SDR) development, more general (parallel) processor architectures are often desirable, which are not tailored to FFT computations. Therefore, alternative approaches are required to reduce the complexity of the DFT. Starting from a matrix-vector based description of the FFT idea, we will present different factorizations of the DFT matrix, which allow a reduction of the complexity that lies between the original DFT and the minimum FFT complexity. The computational complexities of these factorizations and their suitability for implementation on different processor architectures are investigated.
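A minimal sketch of the matrix-vector view: the DFT is a product with the n×n DFT matrix, and a single radix-2 factorization stage replaces it with two half-size DFT products plus a twiddle-and-combine step. This one-stage split is only one example of an intermediate factorization, chosen here for illustration; it is not claimed to be one of the factorizations studied in the paper, and the function names are assumptions.

```python
import cmath


def dft_matrix(n):
    """Dense n-by-n DFT matrix with entries w**(k*m), w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (k * m) for m in range(n)] for k in range(n)]


def matvec(A, x):
    """Plain complex matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dft_direct(x):
    """DFT as a single n-by-n matrix-vector product (n*n multiplications)."""
    return matvec(dft_matrix(len(x)), x)


def dft_one_stage(x):
    """DFT via one radix-2 stage: two (n/2)-point matrix-vector products."""
    n = len(x)
    half = n // 2
    F = dft_matrix(half)
    even = matvec(F, x[0::2])  # DFT of even-indexed samples
    odd = matvec(F, x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    print(dft_one_stage(x)[0])
```

For even n this stage needs roughly n²/2 + n/2 complex multiplications instead of n², while still consisting mostly of dense matrix-vector products, which is the kind of trade-off a general parallel processor can exploit.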
Habiby, Sarry F.; Collins, Stuart A., Jr.
1987-01-01
The design and implementation of a digital (numerical) optical matrix-vector multiplier are presented. A Hughes liquid crystal light valve, the residue arithmetic representation, and a holographic optical memory are used to construct position coded optical look-up tables. All operations are performed in effectively one light valve response time with a potential for a high information density.
Vectorization of phase space Monte Carlo code in FACOM vector processor VP-200
International Nuclear Information System (INIS)
Miura, Kenichi
1986-01-01
This paper describes the vectorization techniques for Monte Carlo codes in Fujitsu's Vector Processor System. The phase space Monte Carlo code FOWL is selected as a benchmark, and scalar and vector performances are compared. The vectorized kernel Monte Carlo routine, which contains heavily nested IF tests, runs up to 7.9 times faster in vector mode than in scalar mode. The overall performance improvement of the vectorized FOWL code over the original scalar code reaches a factor of 3.3. The results of this study strongly indicate that supercomputers can be a powerful tool for Monte Carlo simulations in high energy physics. (Auth.)
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems
Downie, John D.; Goodman, Joseph W.
1989-10-01
The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.
Parallel computation for distributed parameter system-from vector processors to Adena computer
Energy Technology Data Exchange (ETDEWEB)
Nogi, T
1983-04-01
Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.
Performance of direct and iterative algorithms on an optical systolic processor
Ghosh, A. K.; Casasent, D.; Neuman, C. P.
1985-11-01
The frequency-multiplexed optical linear algebra processor (OLAP) is treated in detail with attention to its performance in the solution of systems of linear algebraic equations (LAEs). General guidelines suitable for most OLAPs, including digital-optical processors, are advanced concerning system and component error source models, guidelines for appropriate use of direct and iterative algorithms, the dominant error sources, and the effect of multiple simultaneous error sources. Specific results are advanced on the quantitative performance of both direct and iterative algorithms in the solution of systems of LAEs and in the solution of nonlinear matrix equations. Acoustic attenuation is found to dominate iterative algorithms and detector noise to dominate direct algorithms. The effect of multiple spatial errors is found to be additive. A theoretical expression for the amount of acoustic attenuation allowed is advanced and verified. Simulations and experimental data are included.
Vector and parallel processors in computational science
International Nuclear Information System (INIS)
Duff, I.S.; Reid, J.K.
1985-01-01
This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems
Downie, John D.
1990-01-01
A ground-based adaptive optics imaging telescope system attempts to improve image quality by detecting and correcting for atmospherically induced wavefront aberrations. The required control computations during each cycle will take a finite amount of time. Longer time delays result in larger values of residual wavefront error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper presents a study of the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for the adaptive optics application. An optimization of the adaptive optics correction algorithm with respect to an optical processor's degree of accuracy is also briefly discussed.
Photonics and Fiber Optics Processor Lab
Federal Laboratory Consortium — The Photonics and Fiber Optics Processor Lab develops, tests and evaluates high speed fiber optic network components as well as network protocols. In addition, this...
Optical backplane interconnect switch for data processors and computers
Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.
1989-01-01
An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.
Inverse Operation of Four-dimensional Vector Matrix
H J Bao; A J Sang; H X Chen
2011-01-01
This study is part of a new series that defines and proves multidimensional vector matrix mathematics, including the four-dimensional vector matrix determinant, the four-dimensional vector matrix inverse, and related properties. The authors introduce new concepts of multidimensional vector matrix mathematics with numerous applications in engineering, mathematics, video conferencing, 3D TV, and other fields.
International Nuclear Information System (INIS)
Dubois, J.; Calvin, Ch.; Dubois, J.; Petiton, S.
2011-01-01
This paper presents a parallelized hybrid single-vector Arnoldi algorithm for computing approximations to eigenpairs of a nonsymmetric matrix. We are interested in the use of accelerators and multi-core units to speed up the Arnoldi process. The main goal is to propose a parallel version of the Arnoldi solver, which can efficiently use multiple multi-core processors or multiple graphics processing units (GPUs) in a mixed coarse and fine grain fashion. In the proposed algorithms, this is achieved by auto-tuning the matrix-vector product before starting the Arnoldi eigensolver, as well as by reorganizing the data and global communications so that communication time is reduced. The execution time, performance, and scalability are assessed with well-known dense and sparse test matrices on multiple Nehalems, GT200 NVidia Tesla, and next generation Fermi Tesla. With one processor, we see a performance speedup of 2 to 3x when using all the physical cores, and a total speedup of 2 to 8x when adding a GPU to this multi-core unit, and hence a speedup of 4 to 24x compared to the sequential solver. (authors)
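For orientation, the sequential Arnoldi process that such a solver parallelizes can be sketched in a few lines. This is the textbook single-vector iteration with modified Gram-Schmidt orthogonalization, written in plain Python as an assumption-laden illustration; it contains none of the auto-tuning, GPU offloading, or communication reorganization described in the paper. It does show why the matrix-vector product is the kernel worth accelerating: it is called once per iteration and dominates the cost for large matrices.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product, the dominant kernel of the iteration."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def arnoldi(A, v0, m):
    """Run up to m Arnoldi steps.

    Returns the orthonormal basis Q (list of vectors) and the
    (m+1)-by-m upper Hessenberg matrix H with A Q_m = Q_{m+1} H.
    """
    norm0 = math.sqrt(dot(v0, v0))
    Q = [[x / norm0 for x in v0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, Q[j])
        for i in range(j + 1):  # modified Gram-Schmidt
            H[i][j] = dot(Q[i], w)
            w = [wk - H[i][j] * qk for wk, qk in zip(w, Q[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        if H[j + 1][j] < 1e-12:  # invariant subspace found, stop early
            break
        Q.append([wk / H[j + 1][j] for wk in w])
    return Q, H


if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 2.0, 4.0]]
    Q, H = arnoldi(A, [1.0, 0.0, 0.0], 2)
    print(len(Q), H[0][0], H[1][0])
```

The eigenvalues of the leading m×m block of H (the Ritz values) approximate eigenvalues of A; computing them is omitted here to keep the sketch short.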
Ring-array processor distribution topology for optical interconnects
Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.
1992-01-01
The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.
Stokes-vector and Mueller-matrix polarimetry [Invited].
Azzam, R M A
2016-07-01
This paper reviews the current status of instruments for measuring the full 4×1 Stokes vector S, which describes the state of polarization (SOP) of totally or partially polarized light, and the 4×4 Mueller matrix M, which determines how the SOP is transformed as light interacts with a material sample or an optical element or system. The principle of operation of each instrument is briefly explained by using the Stokes-Mueller calculus. The development of fast, automated, imaging, and spectroscopic instruments over the last 50 years has greatly expanded the range of applications of optical polarimetry and ellipsometry in almost every branch of science and technology. Current challenges and future directions of this important branch of optics are also discussed.
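In the Stokes-Mueller calculus the output state is the matrix-vector product S_out = M S_in. The short sketch below applies the standard Mueller matrix of an ideal horizontal linear polarizer to unpolarized light; the helper names are assumptions for this example, and no specific instrument from the review is being modeled.

```python
import math

# Mueller matrix of an ideal linear polarizer with horizontal transmission axis.
HORIZONTAL_POLARIZER = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]


def apply_mueller(M, S):
    """Transform a 4x1 Stokes vector S by a 4x4 Mueller matrix M."""
    return [sum(m * s for m, s in zip(row, S)) for row in M]


def degree_of_polarization(S):
    """Ratio of polarized intensity to total intensity, between 0 and 1."""
    s0, s1, s2, s3 = S
    return math.sqrt(s1 * s1 + s2 * s2 + s3 * s3) / s0


if __name__ == "__main__":
    unpolarized = [1.0, 0.0, 0.0, 0.0]
    out = apply_mueller(HORIZONTAL_POLARIZER, unpolarized)
    print(out, degree_of_polarization(out))
```

Half of the unpolarized intensity is transmitted and the output is fully horizontally polarized, which is the behavior expected of an ideal polarizer.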
Scientific programming on massively parallel processor CP-PACS
International Nuclear Information System (INIS)
Boku, Taisuke
1998-01-01
The massively parallel processor CP-PACS targets a wide range of problems in computational physics, and its architecture was designed to support various kinds of numerical processing. This report outlines the CP-PACS and gives a programming example using the Kernel CG benchmark from the NAS Parallel Benchmarks (NPB), version 1. It describes the two major features of the CP-PACS architecture: the pseudo vector processing mechanism, and the parallel-processing tuning of scientific and technical computations that use the three-dimensional hyper crossbar network. The processing units (PUs) of the CP-PACS are based on a RISC processor extended with a pseudo vector processor. Pseudo vector processing is realized as loop processing by scalar instructions. The features of the network connecting the PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The most time-consuming part of the main loop is the matrix-vector product (matvec), and its parallel processing is explained. The computation time on the CPU is determined. The performance evaluation reports the execution time, the short-vector processing of the pseudo vector processor based on the slide window, and a comparison with other parallel computers. (K.I.)
Accuracy Limitations in Optical Linear Algebra Processors
Batsell, Stephen Gordon
1990-01-01
One of the limiting factors in applying optical linear algebra processors (OLAPs) to real-world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, noise from spatial variations across arrays, and from crosstalk. In this dissertation, we propose a second-order statistical model for an OLAP which incorporates all these system noise sources. We now apply this knowledge to determining upper and lower bounds on the achievable accuracy. This is accomplished by first translating the standard definition of accuracy used in electronic digital processors to analog optical processors. We then employ our second-order statistical model. Having determined a general accuracy equation, we consider limiting cases such as for ideal and noisy components. From the ideal case, we find the fundamental limitations on improving analog processor accuracy. From the noisy case, we determine the practical limitations based on both device and system noise sources. These bounds allow system trade-offs to be made both in the choice of architecture and in individual components in such a way as to maximize the accuracy of the processor. Finally, by determining the fundamental limitations, we show the system engineer when the accuracy desired can be achieved from hardware or architecture improvements and when it must come from signal pre-processing and/or post-processing techniques.
Vector Array Processor software development. Final report, September 1, 1978-October 14, 1979
International Nuclear Information System (INIS)
Drummond, W.E.
1979-10-01
During the performance period of this contract, from September 1, 1978, to October 14, 1979 (including an extension from April 30, 1979, to October 14, 1979), Austin Research Associates developed and debugged for assembly errors a complete library of utility programs for a Vector Array Processor. The Vector Array Processor is a modified AP-120B array processor having four large serial memories with data paths to the arithmetic pipelines. This hardware arrangement allows the array processor to operate with a speed of 12 million floating point operations per second for many classes of calculations with internal data flow rates up to 30 million words per second. The configuration of the data paths is under program control, and a general setup subroutine was written. The utility subroutines are available to the user through simple Fortran call statements from the host. Anticipated completion of hardware modifications is in the spring of 1980, and then the machine will be available to the Department of Energy for evaluation of the utilities. The option of future development of a compiler for exploitation of the utility library by a high-level language was explored. 9 figures, 7 tables
Optical chirp z-transform processor with a simplified architecture.
Ngo, Nam Quoc
2014-12-29
Using a simplified chirp z-transform (CZT) algorithm based on the discrete-time convolution method, this paper presents the synthesis of a simplified architecture of a reconfigurable optical chirp z-transform (OCZT) processor based on the silica-based planar lightwave circuit (PLC) technology. In the simplified architecture of the reconfigurable OCZT, the required number of optical components is small and there are no waveguide crossings which make fabrication easy. The design of a novel type of optical discrete Fourier transform (ODFT) processor as a special case of the synthesized OCZT is then presented to demonstrate its effectiveness. The designed ODFT can be potentially used as an optical demultiplexer at the receiver of an optical fiber orthogonal frequency division multiplexing (OFDM) transmission system.
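The convolution form of the chirp z-transform, and the DFT as its special case, can be sketched numerically as follows. This is a software illustration under stated assumptions, not the optical PLC design: it uses the standard identity nk = (n² + k² − (k − n)²)/2, and the names `czt_direct` and `czt_convolution` are illustrative.

```python
import cmath

def czt_direct(x, M, W, A=1.0):
    """Direct evaluation: X[k] = sum_n x[n] * A**(-n) * W**(n*k)."""
    N = len(x)
    return [sum(x[n] * A ** (-n) * W ** (n * k) for n in range(N))
            for k in range(M)]

def czt_convolution(x, M, W, A=1.0):
    """Same transform written as chirp multiply, convolve, chirp multiply."""
    N = len(x)

    def chirp(t):
        return W ** (t * t / 2.0)

    # Step 1: pre-multiply the input by a chirp.
    g = [x[n] * A ** (-n) * chirp(n) for n in range(N)]
    out = []
    for k in range(M):
        # Step 2: discrete convolution with the inverse chirp.
        acc = sum(g[n] / chirp(k - n) for n in range(N))
        # Step 3: post-multiply by a chirp.
        out.append(chirp(k) * acc)
    return out

# With A = 1 and W = exp(-2*pi*i/N) the CZT reduces to the N-point DFT.
N = 4
W = cmath.exp(-2j * cmath.pi / N)
dft = czt_convolution([1, 2, 3, 4], N, W)
```

The convolution in step 2 is written as a plain double loop for clarity; a fast implementation would evaluate it with FFTs.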
Reducing adaptive optics latency using Xeon Phi many-core processors
Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah
2015-11-01
The next generation of Extremely Large Telescopes (ELTs) for astronomy will rely heavily on the performance of their adaptive optics (AO) systems. Real-time control is at the heart of the critical technologies that will enable telescopes to deliver the best possible science and will require a very significant extrapolation from current AO hardware existing for 4-10 m telescopes. Investigating novel real-time computing architectures and testing their eligibility against anticipated challenges is one of the main priorities of technology development for the ELTs. This paper investigates the suitability of the Intel Xeon Phi, which is a commercial off-the-shelf hardware accelerator. We focus on wavefront reconstruction performance, implementing a straightforward matrix-vector multiplication (MVM) algorithm. We present benchmarking results of the Xeon Phi on a real-time Linux platform, both as a standalone processor and integrated into an existing real-time controller (RTC). The performance of single and multiple Xeon Phis is investigated. We show that this technology has the potential of greatly reducing the mean latency and variations in execution time (jitter) of large AO systems. We present both a detailed performance analysis of the Xeon Phi for a typical E-ELT first-light instrument along with a more general approach that enables us to extend to any AO system size. We show that systematic and detailed performance analysis is an essential part of testing novel real-time control hardware to guarantee optimal science results.
Image Matrix Processor for Volumetric Computations Final Report CRADA No. TSB-1148-95
Energy Technology Data Exchange (ETDEWEB)
Roberson, G. Patrick [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Browne, Jolyon [Advanced Research & Applications Corporation, Sunnyvale, CA (United States)
2018-01-22
The development of an Image Matrix Processor (IMP) was proposed that would provide an economical means to perform rapid ray-tracing processes on volume "Giga Voxel" data sets. This was a multi-phased project. The objective of the first phase of the IMP project was to evaluate the practicality of implementing a workstation-based Image Matrix Processor for use in volumetric reconstruction and rendering using hardware simulation techniques. Additionally, ARACOR and LLNL worked together to identify and pursue further funding sources to complete a second phase of this project.
Eisenman, Richard L
2005-01-01
This outstanding text and reference applies matrix ideas to vector methods, using physical ideas to illustrate and motivate mathematical concepts but employing a mathematical continuity of development rather than a physical approach. The author, who taught at the U.S. Air Force Academy, dispenses with the artificial barrier between vectors and matrices--and more generally, between pure and applied mathematics. Motivated examples introduce each idea, with interpretations of physical, algebraic, and geometric contexts, in addition to generalizations to theorems that reflect the essential structur
Fractal vector optical fields.
Pan, Yue; Gao, Xu-Zhen; Cai, Meng-Qiang; Zhang, Guan-Lin; Li, Yongnan; Tu, Chenghou; Wang, Hui-Tian
2016-07-15
We introduce the concept of a fractal, which provides an alternative approach for flexibly engineering the optical fields and their focal fields. We propose, design, and create a new family of optical fields: fractal vector optical fields, which build a bridge between the fractal and vector optical fields. The fractal vector optical fields have polarization states exhibiting fractal geometry, and may also involve the phase and/or amplitude simultaneously. The results reveal that the focal fields exhibit self-similarity, and the hierarchy of the fractal has the "weeding" role. The fractal can be used to engineer the focal field.
Simple and practical approach for computing the ray Hessian matrix in geometrical optics.
Lin, Psang Dain
2018-02-01
A method is proposed for simplifying the computation of the ray Hessian matrix in geometrical optics by replacing the angular variables in the system variable vector with their equivalent cosine and sine functions. The variable vector of a boundary surface is similarly defined in such a way as to exclude any angular variables. It is shown that the proposed formulations reduce the computation time of the Hessian matrix by around 10 times compared to the previous method reported by the current group in Advanced Geometrical Optics (2016). Notably, the method proposed in this study involves only polynomial differentiation, i.e., trigonometric function calls are not required. As a consequence, the computation complexity is significantly reduced. Five illustrative examples are given. The first three examples show that the proposed method is applicable to the determination of the Hessian matrix for any pose matrix, irrespective of the order in which the rotation and translation motions are specified. The last two examples demonstrate the use of the proposed Hessian matrix in determining the axial and lateral chromatic aberrations of a typical optical system.
Sn transport calculations on vector and parallel processors
International Nuclear Information System (INIS)
Rhoades, W.A.; Childs, R.L.
1987-01-01
The transport of radiation from the source to the location of people or equipment gives rise to some of the most challenging of calculations. A problem may involve as many as a billion unknowns, each evaluated several times to resolve interdependence. Such calculations run many hours on a Cray computer, and a typical study involves many such calculations. This paper will discuss the steps taken to vectorize the DOT code, which solves transport problems in two space dimensions (2-D); the extension of this code to 3-D; and the plans for extension to parallel processors
Acousto-Optical Vector Matrix Product Processor: Implementation Issues
1989-04-25
power by a factor of 3.8. The acoustic velocity in longitudinal TeO2 is 4200 m/s, almost the same as the 4100 m/s acoustic velocity in dense flint glass ...field via an Interaction Model AOD150 dense flint glass Bragg Cell. The cell’s specifications are listed in the table below. BRAGG CELL SPECIFICATIONS...39 ns intervals). Since the speed of sound in dense flint glass is 4100 m/s, the acoustic field generated in a 10 µs interval is distributed over a 4.1
Elliptic-symmetry vector optical fields.
Pan, Yue; Li, Yongnan; Li, Si-Min; Ren, Zhi-Cheng; Kong, Ling-Jun; Tu, Chenghou; Wang, Hui-Tian
2014-08-11
We present in principle and demonstrate experimentally a new kind of vector fields: elliptic-symmetry vector optical fields. This is a significant development in vector fields, as this breaks the cylindrical symmetry and enriches the family of vector fields. Due to the presence of an additional degree of freedom, which is the interval between the foci in the elliptic coordinate system, the elliptic-symmetry vector fields are more flexible than the cylindrical vector fields for controlling the spatial structure of polarization and for engineering the focusing fields. The elliptic-symmetry vector fields can find many specific applications from optical trapping to optical machining and so on.
Optical Finite Element Processor
Casasent, David; Taylor, Bradley K.
1986-01-01
A new high-accuracy optical linear algebra processor (OLAP) with many advantageous features is described. It achieves floating point accuracy, handles bipolar data by sign-magnitude representation, performs LU decomposition using only one channel, easily partitions and considers data flow. A new application (finite element (FE) structural analysis) for OLAPs is introduced and the results of a case study presented. Error sources in encoded OLAPs are addressed for the first time. Their modeling and simulation are discussed and quantitative data are presented. Dominant error sources and the effects of composite error sources are analyzed.
Parallel Processor for 3D Recovery from Optical Flow
Directory of Open Access Journals (Sweden)
Jose Hugo Barron-Zambrano
2009-01-01
3D recovery from motion has been the subject of major effort in computer vision in recent years. The main problem lies in the number of operations and memory accesses required by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main features are the maximum reuse of data and the low number of clock cycles needed to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.
Energy Technology Data Exchange (ETDEWEB)
McLay, R.T.; Carey, G.F.
1996-12-31
In this study we consider parallel solution of sparse linear systems arising from discretized PDEs. As part of our continuing work on our parallel PCG Solver package, we have made improvements in two areas. The first is improving the performance of the matrix-vector product. Here on regular finite-difference grids, we are able to use the cache memory more efficiently for smaller domains or where there are multiple degrees of freedom. The second problem of interest in the present work is the construction of preconditioners in the context of the parallel PCG solver we are developing. Here the problem is partitioned over a set of processor subdomains and the matrix-vector product for PCG is carried out in parallel for overlapping grid subblocks. For problems of scaled speedup, the actual rate of convergence of the unpreconditioned system deteriorates as the mesh is refined. Multigrid and subdomain strategies provide a logical approach to resolving the problem. We consider the parallel trade-offs between communication and computation and provide a complexity analysis of a representative algorithm. Some preliminary calculations using the parallel package and comparisons with other preconditioners are provided together with parallel performance results.
Matrix elements of a hyperbolic vector operator under SO(2,1)
International Nuclear Information System (INIS)
Zettili, N.; Boukahil, A.
2003-01-01
We deal here with the use of Wigner–Eckart type arguments to calculate the matrix elements of a hyperbolic vector operator V by expressing them in terms of reduced matrix elements. In particular, we focus on calculating the matrix elements of this vector operator within the basis of the hyperbolic angular momentum T whose components T_1, T_2, T_3 satisfy an SO(2,1) Lie algebra. We show that the commutation rules between the components of V and T can be inferred from the algebra of ordinary angular momentum. We then show that, by analogy to the Wigner–Eckart theorem, we can calculate the matrix elements of V within a representation where T^2 and T_3 are jointly diagonal. (author)
Finding a Hadamard matrix by simulated annealing of spin vectors
Bayu Suksmono, Andriyan
2017-05-01
Reformulation of a combinatorial problem into optimization of a statistical-mechanics system enables finding a better solution using heuristics derived from a physical process, such as simulated annealing (SA). In this paper, we present a Hadamard matrix (H-matrix) searching method based on SA on an Ising model. By equivalence, an H-matrix can be converted into a seminormalized Hadamard (SH) matrix, whose first column is a unit vector and whose remaining columns are vectors with equal numbers of -1 and +1, called SH-vectors. We define SH spin vectors as representations of the SH vectors, which play a role similar to that of the spins in the Ising model. The topology of the lattice is generalized into a graph, whose edges represent orthogonality relationships among the SH spin vectors. Starting from a randomly generated quasi H-matrix Q, which is a matrix similar to the SH-matrix without imposing orthogonality, we perform the SA. The transitions of Q are conducted by random exchange of a {+, -} spin pair within the SH spin vectors, following the Metropolis update rule. Upon transition toward zero energy, the Q-matrix evolves along a Markov chain toward an orthogonal matrix, at which point the H-matrix is said to be found. We demonstrate the capability of the proposed method to find some low-order H-matrices, including ones that cannot trivially be constructed by the Sylvester method.
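The ingredients of such a search can be sketched as follows. This is a minimal sketch under assumptions: the energy used here (sum of squared inner products between distinct columns) is chosen only so that zero energy means mutually orthogonal columns, and is not necessarily the paper's Hamiltonian. The function names are illustrative, and the Sylvester construction is included only as a known zero-energy reference.

```python
import math

def sylvester(k):
    """Hadamard matrix of order 2**k by the Sylvester doubling construction."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def energy(Q):
    """Assumed energy: sum over column pairs of (inner product)**2.

    It is zero exactly when all columns of the +/-1 matrix Q are
    mutually orthogonal, i.e. when Q is a Hadamard matrix.
    """
    cols = list(zip(*Q))
    n = len(cols)
    return sum(sum(a * b for a, b in zip(cols[i], cols[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

def metropolis_accept(delta_e, temperature, u):
    """Metropolis rule: always accept downhill moves, accept uphill moves
    with probability exp(-delta_e / temperature); u is uniform in [0, 1)."""
    return delta_e <= 0 or u < math.exp(-delta_e / temperature)

reference = sylvester(2)      # order-4 Hadamard matrix
```

In an annealing loop, a candidate move would swap one +1 and one -1 entry within a column, compute the change in `energy`, and pass it to `metropolis_accept` with a fresh random `u` while the temperature is lowered.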
Noise limitations in optical linear algebra processors.
Batsell, S G; Jong, T L; Walkup, J F; Krile, T F
1990-05-10
A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.
Optical linear algebra processors - Architectures and algorithms
Casasent, David
1986-01-01
Attention is given to the component design and optical configuration features of a generic optical linear algebra processor (OLAP) architecture, as well as the large number of OLAP architectures, number representations, algorithms and applications encountered in current literature. Number-representation issues associated with bipolar and complex-valued data representations, high-accuracy (including floating point) performance, and the base or radix to be employed, are discussed, together with case studies on a space-integrating frequency-multiplexed architecture and a hybrid space-integrating and time-integrating multichannel architecture.
Directory of Open Access Journals (Sweden)
Pablo Soto-Quiros
2015-01-01
This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool to analyze the spectra of vector-valued discrete-time signals. This parallel implementation is developed in terms of a mathematical framework with a set of block matrix operations. These block matrix operations contribute to analysis, design, and implementation of parallel algorithms in multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that there is an advantage in using multicore processors and a parallel computing environment to reduce the high execution time. Additionally, speedup increases when the number of logical processors and the length of the signal increase.
On the estimation of matrix elements for optical transitions in semiconductors
International Nuclear Information System (INIS)
Hassan, A.R.
1992-09-01
A semi-empirical method is used to calculate the numerical values of the interband momentum matrix elements of the allowed optical transitions in semiconductors. This method is based on the evaluation of the ratio of the two-photon and one-photon absorption coefficients and the comparison of the result with the corresponding experimental values in a number of semiconductors, both for direct and indirect transition processes. The numerical values of the momentum matrix elements are compared with the available theoretical calculations. The result is found to agree fairly well with the corresponding values computed using k · p perturbation theory. (author). 19 refs, 2 figs, 2 tabs
An efficient optical architecture for sparsely connected neural networks
Hine, Butler P., III; Downie, John D.; Reid, Max B.
1990-01-01
An architecture for a general-purpose optical neural network processor is presented in which the interconnections and weights are formed by directing coherent beams holographically, thereby making use of the space-bandwidth products of the recording medium for sparsely interconnected networks more efficiently than the commonly used vector-matrix multiplier, since all of the hologram area is in use. An investigation is made of the use of computer-generated holograms recorded on such updatable media as thermoplastic materials, in order to define the interconnections and weights of a neural network processor; attention is given to limits on interconnection densities, diffraction efficiencies, and weighting accuracies possible with such an updatable thin film holographic device.
Space and frequency-multiplexed optical linear algebra processor - Fabrication and initial tests
Casasent, D.; Jackson, J.
1986-01-01
A new optical linear algebra processor architecture is described. Space and frequency-multiplexing are used to accommodate bipolar and complex-valued data. A fabricated laboratory version of this processor is described, the electronic support system used is discussed, and initial test data obtained on it are presented.
Vector and parallel processors in computational science. Proceedings
Energy Technology Data Exchange (ETDEWEB)
Duff, I S; Reid, J K
1985-01-01
This volume contains papers from most of the invited talks and from several of the contributed talks and poster sessions presented at VAPP II. The contents present an extensive coverage of all important aspects of vector and parallel processors, including hardware, languages, numerical algorithms and applications. The topics covered include descriptions of new machines (both research and commercial machines), languages and software aids, and general discussions of whole classes of machines and their uses. Numerical methods papers include Monte Carlo algorithms, iterative and direct methods for solving large systems, finite elements, optimization, random number generation and mathematical software. The specific applications covered include neutron diffusion calculations, molecular dynamics, weather forecasting, lattice gauge calculations, fluid dynamics, flight simulation, cartography, image processing and cryptography. Most machines and architecture types are being used for these applications. many refs.
Vectorization of KENO IV code and an estimate of vector-parallel processing
International Nuclear Information System (INIS)
Asai, Kiyoshi; Higuchi, Kenji; Katakura, Jun-ichi; Kurita, Yutaka.
1986-10-01
The multi-group criticality safety code KENO IV has been vectorized and tested on FACOM VP-100 vector processor. At first the vectorized KENO IV on a scalar processor became slower than the original one by a factor of 1.4 because of the overhead introduced by the vectorization. Making modifications of algorithms and techniques for vectorization, the vectorized version has become faster than the original one by a factor of 1.4 and 3.0 on the vector processor for sample problems of complex and simple geometries, respectively. For further speedup of the code, some improvements on compiler and hardware, especially on addition of Monte Carlo pipelines to the vector processor, are discussed. Finally a pipelined parallel processor system is proposed and its performance is estimated. (author)
Microlens array processor with programmable weight mask and direct optical input
Schmid, Volker R.; Lueder, Ernst H.; Bader, Gerhard; Maier, Gert; Siegordner, Jochen
1999-03-01
We present an optical feature extraction system with a microlens array processor. The system is suitable for online implementation of a variety of transforms such as the Walsh transform and DCT. Operating with incoherent light, our processor accepts direct optical input. Employing a sandwich-like architecture, we obtain a very compact design of the optical system. The key elements of the microlens array processor are a square array of 15 × 15 spherical microlenses on acrylic substrate and a spatial light modulator as transmissive mask. The light distribution behind the mask is imaged onto the pixels of a customized a-Si image sensor with adjustable gain. We obtain one output sample for each microlens image and its corresponding weight mask area as summation of the transmitted intensity within one sensor pixel. The resulting architecture is very compact and robust like a conventional camera lens while incorporating a high degree of parallelism. We successfully demonstrate a Walsh transform into the spatial frequency domain as well as the implementation of a discrete cosine transform with digitized gray values. We provide results showing the transformation performance for both synthetic image patterns and images of natural texture samples. The extracted frequency features are suitable for neural classification of the input image. Other transforms and correlations can be implemented in real-time allowing adaptive optical signal processing.
Directory of Open Access Journals (Sweden)
Xie Yiwei
2017-12-01
Integrated optical signal processors have been identified as a powerful engine for optical processing of microwave signals. They enable wideband and stable signal processing operations on miniaturized chips with ultimate control precision. As a promising application, such processors enable photonic implementations of reconfigurable radio frequency (RF) filters with wide design flexibility, large bandwidth, and high-frequency selectivity. This is a key technology for photonic-assisted RF front ends that opens a path to overcoming the bandwidth limitation of current digital electronics. Here, the recent progress of integrated optical signal processors for implementing such RF filters is reviewed. We highlight the use of a low-loss, high-index-contrast stoichiometric silicon nitride waveguide which promises to serve as a practical material platform for realizing high-performance optical signal processors and points toward photonic RF filters with digital signal processing (DSP)-level flexibility, hundreds-GHz bandwidth, MHz-band frequency selectivity, and full system integration on a chip scale.
Efficient implementations of block sparse matrix operations on shared memory vector machines
International Nuclear Information System (INIS)
Washio, T.; Maruyama, K.; Osoda, T.; Doi, S.; Shimizu, F.
2000-01-01
In this paper, we propose vectorization and shared memory-parallelization techniques for block-type random sparse matrix operations in finite element (FEM) applications. Here, a block corresponds to unknowns on one node in the FEM mesh and we assume that the block size is constant over the mesh. First, we discuss some basic vectorization ideas (the jagged diagonal (JAD) format and the segmented scan algorithm) for the sparse matrix-vector product. Then, we extend these ideas to the shared memory parallelization. After that, we show that the techniques can be applied not only to the sparse matrix-vector product but also to the sparse matrix-matrix product, the incomplete or complete sparse LU factorization and preconditioning. Finally, we report the performance evaluation results obtained on an NEC SX-4 shared memory vector machine for linear systems in some FEM applications. (author)
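The jagged diagonal (JAD) idea mentioned above can be illustrated with a short sketch. It is a plain-Python illustration under assumptions, not the authors' NEC SX-4 code: it handles scalar entries rather than the block entries of the paper, and the names `to_jad` and `jad_matvec` are illustrative. Rows are sorted by decreasing nonzero count, and the d-th nonzeros of all rows are stored contiguously so that the inner loop is long and free of dependences.

```python
def to_jad(dense):
    """Build JAD storage (perm, vals, cols, ptr) from a dense matrix."""
    rows = [[(j, v) for j, v in enumerate(r) if v != 0] for r in dense]
    # Row permutation: longest rows first (sorted() is stable).
    perm = sorted(range(len(rows)), key=lambda i: -len(rows[i]))
    max_nnz = len(rows[perm[0]]) if rows else 0
    vals, cols, ptr = [], [], [0]
    for d in range(max_nnz):
        for i in perm:
            if len(rows[i]) <= d:
                break                 # all later rows are at most this long
            c, v = rows[i][d]
            cols.append(c)
            vals.append(v)
        ptr.append(len(vals))         # end of jagged diagonal d
    return perm, vals, cols, ptr

def jad_matvec(perm, vals, cols, ptr, x):
    """Compute y = A @ x from JAD storage."""
    y_perm = [0.0] * len(perm)
    for d in range(len(ptr) - 1):
        start = ptr[d]
        # Vectorizable loop: one update per row still active in diagonal d.
        for i in range(ptr[d + 1] - start):
            y_perm[i] += vals[start + i] * x[cols[start + i]]
    y = [0.0] * len(perm)
    for i, row in enumerate(perm):    # undo the row permutation
        y[row] = y_perm[i]
    return y

A = [[1, 0, 2],
     [0, 3, 0],
     [4, 5, 6]]
y = jad_matvec(*to_jad(A), [1, 2, 3])
```

The length of each inner loop equals the number of rows with more than d nonzeros, which is why JAD suits vector machines better than a row-by-row format when rows are short.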
Lin, Yongping; Zhang, Xiyang; He, Youwu; Cai, Jianyong; Li, Hui
2018-02-01
The Jones matrix and the Mueller matrix are the main tools for studying polarization devices. The Mueller matrix can also be used in biological tissue research to obtain complete tissue properties, but commercial optical coherence tomography systems do not provide the relevant analysis functions. Based on LabVIEW, a near real-time display method for the Mueller matrix image of biological tissue is developed, which gives the corresponding phase retardance image simultaneously. A quarter-wave plate was placed at 45° in the sample arm. Experimental results of the two orthogonal channels show that the phase retardance based on the fixed incident light vector mode and the Mueller matrix based on the dynamic incident light vector mode can provide an effective analysis method for the existing system.
Parallel Sparse Matrix - Vector Product
DEFF Research Database (Denmark)
Alexandersen, Joe; Lazarov, Boyan Stefanov; Dammann, Bernd
This technical report contains a case study of a sparse matrix-vector product routine, implemented for parallel execution on a compute cluster with both pure MPI and hybrid MPI-OpenMP solutions. C++ classes for sparse data types were developed and the report shows how these classes can be used...
Optical linear algebra processors - Noise and error-source modeling
Casasent, D.; Ghosh, A.
1985-01-01
The modeling of system and component noise and error sources in optical linear algebra processors (OLAPs) is considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
Optical linear algebra processors: noise and error-source modeling.
Casasent, D; Ghosh, A
1985-06-01
The modeling of system and component noise and error sources in optical linear algebra processors (OLAP's) is considered, with attention to the frequency-multiplexed OLAP. General expressions are obtained for the output produced as a function of various component errors and noise. A digital simulator for this model is discussed.
High-speed vector-processing system of the MELCOM-COSMO 900II
Energy Technology Data Exchange (ETDEWEB)
Masuda, K; Mori, H; Fujikake, J; Sasaki, Y
1983-01-01
Progress in scientific and technical calculations has led to a growing demand for high-speed vector calculations. Mitsubishi Electric has developed an integrated array processor and automatic-vectorizing Fortran compiler as an option for the MELCOM-COSMO 900II computer system. This facilitates the performance of vector calculations and matrix calculations, achieving significant gains in cost-effectiveness. The article outlines the high-speed vector system, includes discussion of compiler structuring, and cites examples of effective system application. 1 reference.
Fast sparse matrix-vector multiplication by partitioning and reordering
Yzelman, A.N.
2011-01-01
The thesis introduces a cache-oblivious method for the sparse matrix-vector (SpMV) multiplication, which is an important computational kernel in many applications. The method works by permuting rows and columns of the input matrix so that the resulting reordered matrix induces cache-friendly
Directory of Open Access Journals (Sweden)
Bérenger Bramas
2018-04-01
The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields. The High Performance Computing (HPC) community has therefore continuously invested a lot of effort to provide an efficient SpMV kernel on modern CPU architectures. Although it has been shown that block-based kernels help to achieve high performance, they are difficult to use in practice because of the zero padding they require. In the current paper, we propose new kernels using the AVX-512 instruction set, which makes it possible to use a blocking scheme without any zero padding in the matrix memory storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels highly optimized in assembly language. Considering that the optimal blocking size depends on the matrix, we also provide a method to predict the best kernel to be used utilizing a simple interpolation of results from previous executions. We compare the performance of our approach to that of the Intel MKL CSR kernel and the CSR5 open-source package on a set of standard benchmark matrices. We show that we can achieve significant improvements in many cases, both for sequential and for parallel executions. Finally, we provide the corresponding code in an open source library, called SPC5.
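To make the contrast concrete, the sketch below shows a scalar CSR SpMV and a hypothetical mask-based block layout that stores only nonzeros plus one bitmask per block, so no zero padding is kept. The block layout, the aligned block grid, and all names (`BLOCK`, `to_mask_blocks`, `mask_block_row_dot`) are assumptions for illustration; this is not the SPC5 format or its AVX-512 kernels.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Baseline y = A @ x with A in compressed sparse row (CSR) form."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

BLOCK = 8  # assumed block width: eight doubles fit in one 512-bit register

def to_mask_blocks(dense_row):
    """Split one row into (start_col, bitmask, packed_nonzeros) blocks.

    Bit b of the mask is set when column start_col + b holds a nonzero;
    only those nonzeros are stored, so no padding zeros are kept.
    """
    blocks = []
    for start in range(0, len(dense_row), BLOCK):
        mask, packed = 0, []
        for bit, v in enumerate(dense_row[start:start + BLOCK]):
            if v != 0:
                mask |= 1 << bit
                packed.append(v)
        if mask:
            blocks.append((start, mask, packed))
    return blocks

def mask_block_row_dot(blocks, x):
    """Dot product of one mask-blocked row with x."""
    acc = 0.0
    for start, mask, packed in blocks:
        k = 0
        for bit in range(BLOCK):
            if (mask >> bit) & 1:     # a vector kernel would expand by mask
                acc += packed[k] * x[start + bit]
                k += 1
    return acc

# 3x3 example in CSR form: [[1, 0, 2], [0, 3, 0], [4, 5, 6]]
y = csr_matvec([0, 2, 3, 6], [0, 2, 1, 0, 1, 2], [1, 2, 3, 4, 5, 6], [1, 2, 3])
```

In a vectorized kernel the inner bit loop would be replaced by a single masked load that expands the packed values into a register, which is the role the abstract attributes to AVX-512.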
Vectorization at the KENO-IV code
International Nuclear Information System (INIS)
Asai, K.; Higuchi, K.; Katakura, J.
1986-01-01
The multigroup criticality safety code KENO-IV has been vectorized and tested on the FACOM VP-100 vector processor. At first, the vectorized KENO-IV on a scalar processor was slower than the original one by a factor of 1.4 because of the overhead introduced by vectorization. Making modifications of algorithms and techniques for vectorization, the vectorized version has become faster than the original one by a factor of 1.4 on the vector processor. For further speedup of the code, some improvements on compiler and hardware, especially on addition of Monte Carlo pipelines to the vector processor, are discussed
Intelligent trigger processor for the crystal box
International Nuclear Information System (INIS)
Sanders, G.H.; Butler, H.S.; Cooper, M.D.
1981-01-01
A large solid angle modular NaI(Tl) detector with 432 phototubes and 88 trigger scintillators is being used to search simultaneously for three lepton flavor changing decays of the muon. A beam of up to 10^6 muons stopping per second with a 6% duty factor would yield up to 1000 triggers per second from random triple coincidences. A reduction of the trigger rate to 10 Hz is required from a hardwired primary trigger processor described in this paper. Further reduction to < 1 Hz is achieved by a microprocessor based secondary trigger processor. The primary trigger hardware imposes voter coincidence logic, stringent timing requirements, and a non-adjacency requirement in the trigger scintillators defined by hardwired circuits. Sophisticated geometric requirements are imposed by a PROM-based matrix logic, and energy and vector-momentum cuts are imposed by a hardwired processor using LSI flash ADC's and digital arithmetic logic. The secondary trigger employs four satellite microprocessors to do a sparse data scan, multiplex the data acquisition channels and apply additional event filtering.
Kraus, Wayne A; Wagner, Albert F
1986-04-01
A triatomic classical trajectory code has been modified by extensive vectorization of the algorithms to achieve much improved performance on an FPS 164 attached processor. Extensive timings on both the FPS 164 and a VAX 11/780 with floating point accelerator are presented as a function of the number of trajectories simultaneously run. The timing tests involve a potential energy surface of the LEPS variety and trajectories with 1000 time steps. The results indicate that vectorization results in timing improvements on both the VAX and the FPS. For larger numbers of trajectories run simultaneously, up to a factor of 25 improvement in speed occurs between VAX and FPS vectorized code. Copyright © 1986 John Wiley & Sons, Inc.
Optoelectronic switch matrix as a look-up table for residue arithmetic.
Macdonald, R I
1987-10-01
The use of optoelectronic matrix switches to perform look-up table functions in residue arithmetic processors is proposed. In this application, switchable detector arrays give the advantage of a greatly reduced requirement for optical sources by comparison with previous optoelectronic residue processors.
On the Vectorization of FIR Filterbanks
Directory of Open Access Journals (Sweden)
Barbedo Jayme Garcia Arnal
2007-01-01
This paper presents a vectorization technique to implement FIR filterbanks. The word vectorization, in the context of this work, refers to a strategy in which all iterative operations are replaced by equivalent vector and matrix operations. This approach allows the increasing parallelism of the most recent computer processors and systems to be properly exploited. The vectorization techniques are applied to two kinds of FIR filterbanks (conventional and recursive), and are presented in such a way that they can be easily extended to any kind of FIR filterbank. The vectorization approach is compared to other kinds of implementation that do not exploit the parallelism, and also to a previous FIR filter vectorization approach. The tests were performed in Matlab and C, in order to explore different aspects of the proposed technique.
On the Vectorization of FIR Filterbanks
Directory of Open Access Journals (Sweden)
Amauri Lopes
2007-01-01
This paper presents a vectorization technique to implement FIR filterbanks. The word vectorization, in the context of this work, refers to a strategy in which all iterative operations are replaced by equivalent vector and matrix operations. This approach allows the increasing parallelism of the most recent computer processors and systems to be properly exploited. The vectorization techniques are applied to two kinds of FIR filterbanks (conventional and recursive), and are presented in such a way that they can be easily extended to any kind of FIR filterbank. The vectorization approach is compared to other kinds of implementation that do not exploit the parallelism, and also to a previous FIR filter vectorization approach. The tests were performed in Matlab and C, in order to explore different aspects of the proposed technique.
NASF transposition network: A computing network for unscrambling p-ordered vectors
Lim, R. S.
1979-01-01
The viewpoints of design, programming, and application of the transposition network (TN) are presented. The TN is a programmable combinational logic network that connects 521 memory modules to 512 processors. The unscrambling of p-ordered vectors to 1-ordered vectors in one cycle is described. The TN design is based upon the concept of cyclic groups from abstract algebra and primitive roots and indices from number theory. The programming of the TN is very simple, requiring only 20 bits: 10 bits for offset control and 10 bits for barrel switch shift control. This simple control is executed by the control unit (CU), not the processors. Any memory access by a processor must be coordinated with the CU and wait for all other processors to come to a synchronization point. These wait and synchronization events can degrade the performance of a computation. The TN application is for multidimensional data manipulation, matrix processing, and data sorting, and can also perform a perfect shuffle. Unlike other more complicated and powerful permutation networks, the TN cannot, if possible at all, unscramble non-p-ordered vectors in one cycle.
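The role of the prime module count (521) in the abstract above can be illustrated with a small sketch. It assumes, as an interpretation not spelled out in the abstract, that a p-ordered vector stores element i at address base + i*p and that the memory module is the address modulo 521. The names `modules_for_stride` and `is_conflict_free` are hypothetical, and the sketch models only the addressing, not the TN hardware.

```python
NUM_MODULES = 521      # memory modules (a prime number)
NUM_PROCESSORS = 512   # processors, one vector element each


def modules_for_stride(p, base=0):
    """Memory module hit by each of the 512 elements of a stride-p vector.

    Assumption: element i lives at address base + i*p, and the module
    number is the address modulo NUM_MODULES.
    """
    return [(base + i * p) % NUM_MODULES for i in range(NUM_PROCESSORS)]


def is_conflict_free(p, base=0):
    """True if all 512 elements fall in distinct memory modules."""
    mods = modules_for_stride(p, base)
    return len(set(mods)) == len(mods)


# Because 521 is prime, every stride that is not a multiple of 521
# spreads the 512 elements over 512 distinct modules.
all_ok = all(is_conflict_free(p) for p in range(1, NUM_MODULES))
print(all_ok, is_conflict_free(NUM_MODULES))
```

Under this assumption, a power-of-two module count would already produce conflicts for even strides, which is one common reason for choosing a prime number of modules.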
Optimizing Vector-Quantization Processor Architecture for Intelligent Query-Search Applications
Xu, Huaiyu; Mita, Yoshio; Shibata, Tadashi
2002-04-01
The architecture of a very large scale integration (VLSI) vector-quantization processor (VQP) has been optimized to develop a general-purpose intelligent query-search agent. The agent performs a similarity-based search in a large-volume database. Although similarity-based search processing is computationally very expensive, latency-free searches have become possible due to the highly parallel maximum-likelihood search architecture of the VQP chip. Three architectures of the VQP chip have been studied and their performances are compared. In order to give reasonable searching results according to the different policies, the concept of penalty function has been introduced into the VQP. An E-commerce real-estate agency system has been developed using the VQP chip implemented in a field-programmable gate array (FPGA) and the effectiveness of such an agency system has been demonstrated.
Optical Associative Processors For Visual Perception
Casasent, David; Telfer, Brian
1988-05-01
We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given most attention). We analyze the performance of both associative processors and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature space data), and the analysis of multiple objects in high noise (which is achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature space and symbolic data, as well as adaptive associative processors.
Auto-tuning Dense Vector and Matrix-vector Operations for Fermi GPUs
DEFF Research Database (Denmark)
Sørensen, Hans Henrik Brandenborg
2012-01-01
applications. As examples, we develop single-precision CUDA kernels for the Euclidian norm (SNRM2) and the matrix-vector multiplication (SGEMV). The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture). We show that auto-tuning can be successfully applied to achieve high performance...
Energy Technology Data Exchange (ETDEWEB)
Adachi, Masaaki; Ogasawara, Shinobu; Kume, Etsuo [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Ishizuki, Shigeru; Nemoto, Toshiyuki; Kawasaki, Nobuo; Kawai, Wataru [Fujitsu Ltd., Tokyo (Japan); Yatake, Yo-ichi [Hitachi Ltd., Tokyo (Japan)
2001-02-01
Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system, the AP3000 system, the SX-4 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 18 codes in fiscal 1999. These results are reported in 3 parts, i.e., the vectorization and the parallelization part on vector processors, the parallelization part on scalar processors and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this vectorization and parallelization on vector processors part, the vectorization of Relativistic Molecular Orbital Calculation code RSCAT, a microscopic transport code for high energy nuclear collisions code JAM, three-dimensional non-steady thermal-fluid analysis code STREAM, Relativistic Density Functional Theory code RDFT and High Speed Three-Dimensional Nodal Diffusion code MOSRA-Light on the VPP500 system and the SX-4 system are described. (author)
Vector optical fields with bipolar symmetry of linear polarization.
Pan, Yue; Li, Yongnan; Li, Si-Min; Ren, Zhi-Cheng; Si, Yu; Tu, Chenghou; Wang, Hui-Tian
2013-09-15
We focus on a new kind of vector optical field with bipolar symmetry of linear polarization instead of cylindrical and elliptical symmetries, enriching members of family of vector optical fields. We design theoretically and generate experimentally the demanded vector optical fields and then explore some novel tightly focusing properties. The geometric configurations of states of polarization provide additional degrees of freedom assisting in engineering the field distribution at the focus to the specific applications such as lithography, optical trapping, and material processing.
The optical analogy for vector fields
Parker, E. N. (Editor)
1991-01-01
This paper develops the optical analogy for a general vector field. The optical analogy allows the examination of certain aspects of a vector field that are not otherwise readily accessible. In particular, in the cases of a stationary Eulerian flow v of an ideal fluid and a magnetostatic field B, the vectors v and B have surface loci in common with their curls. The intrinsic discontinuities around local maxima in absolute values of v and B take the form of vortex sheets and current sheets, respectively, the former playing a fundamental role in the development of hydrodynamic turbulence and the latter playing a major role in heating the X-ray coronas of stars and galaxies.
Discrete-ordinate method with matrix exponential for a pseudo-spherical atmosphere: Vector case
International Nuclear Information System (INIS)
Doicu, A.; Trautmann, T.
2009-01-01
The paper is devoted to the extension of the matrix-exponential formalism for the scalar radiative transfer to the vector case. Using basic results of the theory of matrix-exponential functions we provide a compact and versatile formulation of the vector radiative transfer. As in the scalar case, we operate with the concept of the layer equation incorporating the level values of the Stokes vector. The matrix exponentials which enter in the expression of the layer equation are computed by using the matrix eigenvalue method and the Padé approximation. A discussion of the computational efficiency of the proposed method for both an aerosol-loaded atmosphere and a cloudy atmosphere is also provided.
International Nuclear Information System (INIS)
Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi
2000-03-01
Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system, the AP3000 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this vectorization and parallelization on vector processors part, the vectorization of General Tokamak Circuit Simulation Program code GTCSP, the vectorization and parallelization of Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, Eddy Current Analysis code EDDYCAL, Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of Monte Carlo N-Particle Transport code MCNP4B2, Plasma Hydrodynamics code using Cubic Interpolated Propagation Method PHCIP and Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon are described. In the porting part, the porting of Monte Carlo N-Particle Transport code MCNP4B2 and Reactor Safety Analysis code RELAP5 on the AP3000 are described. (author)
Optical cage generated by azimuthal- and radial-variant vector beams.
Man, Zhongsheng; Bai, Zhidong; Li, Jinjian; Zhang, Shuoshuo; Li, Xiaoyu; Zhang, Yuquan; Ge, Xiaolu; Fu, Shenggui
2018-05-01
We propose a method to generate an optical cage using azimuthal- and radial-variant vector beams in a high numerical aperture optical system. A new kind of vector beam that has azimuthal- and radial-variant polarization states is proposed and demonstrated theoretically. Then, an integrated analytical model to calculate the electromagnetic field and Poynting vector distributions of the input azimuthal- and radial-variant vector beams is derived and built based on the vector diffraction theory of Richards and Wolf. From calculations, a full polarization-controlled optical cage is obtained by simply tailoring the radial index of the polarization, the uniformity U of which is up to 0.7748, and the cleanness C is zero. Additionally, a perfect optical cage can be achieved with U=1, and C=0 by introducing an amplitude modulation; its magnetic field and energy flow are also demonstrated in detail. Such optical cages may be helpful in applications such as optical trapping and high-resolution imaging.
Gamow-Jordan vectors and non-reducible density operators from higher-order S-matrix poles
International Nuclear Information System (INIS)
Bohm, A.; Loewe, M.; Maxson, S.; Patuleanu, P.; Puentmann, C.; Gadella, M.
1997-01-01
In analogy to Gamow vectors that are obtained from first-order resonance poles of the S-matrix, one can also define higher-order Gamow vectors which are derived from higher-order poles of the S-matrix. An S-matrix pole of r-th order at z_R = E_R - iΓ/2 leads to r generalized eigenvectors of order k = 0, 1, ..., r-1, which are also Jordan vectors of degree (k+1) with generalized eigenvalue (E_R - iΓ/2). The Gamow-Jordan vectors are elements of a generalized complex eigenvector expansion, whose form suggests the definition of a state operator (density matrix) for the microphysical decaying state of this higher-order pole. This microphysical state is a mixture of non-reducible components. In spite of the fact that the k-th order Gamow-Jordan vectors have the polynomial time-dependence which one always associates with higher-order poles, the microphysical state obeys a purely exponential decay law. © 1997 American Institute of Physics.
Chen, Rui-Pin; Chen, Zhaozhong; Chew, Khian-Hooi; Li, Pei-Gang; Yu, Zhongliang; Ding, Jianping; He, Sailing
2015-05-29
A caustic vector vortex optical field is experimentally generated and demonstrated by a caustic-based approach. The desired caustic with arbitrary acceleration trajectories, as well as the structured states of polarization (SoP) and vortex orders located in different positions in the field cross-section, is generated by imposing the corresponding spatial phase function in a vector vortex optical field. Our study reveals that different spin and orbital angular momentum flux distributions (including opposite directions) in different positions in the cross-section of a caustic vector vortex optical field can be dynamically managed during propagation by intentionally choosing the initial polarization and vortex topological charges, as a result of the modulation of the caustic phase. We find that the SoP in the field cross-section rotates during propagation due to the existence of the vortex. The unique structured feature of the caustic vector vortex optical field opens the possibility of multi-manipulation of optical angular momentum fluxes and SoP, leading to more complex manipulation of the optical field scenarios. Thus this approach further expands the functionality of an optical system.
Bounds on achievable accuracy in analog optical linear-algebra processors
Batsell, Stephen G.; Walkup, John F.; Krile, Thomas F.
1990-07-01
Upper and lower bounds on the number of bits of accuracy achievable are determined by applying a second-order statistical model to the linear algebra processor. The use of bounds was found necessary due to the strong signal-dependence of the noise at the output of the optical linear algebra processor (OLAP). 1. ACCURACY BOUNDS One of the limiting factors in applying OLAPs to real world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, spatial variations across arrays, and crosstalk. We have previously examined these noise sources and determined a general model for the output noise mean and variance. The model demonstrates a strong signal-dependency in the noise at the output of the processor which has been confirmed by our experiments. We define accuracy similar to its definition for an analog signal input to an analog-to-digital (A/D) converter. The number of bits of accuracy achievable is related to the log (base 2) of the number of separable levels at the A/D converter output. The number of separable levels is found by dividing the dynamic range by m times the standard deviation of the signal, σ. Here m determines the error rate in the A/D conversion. The dynamic range can be expressed as the
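The accuracy definition quoted in the abstract above (bits as the base-2 logarithm of the number of separable levels, with levels equal to the dynamic range divided by m times the standard deviation) can be written out as a short sketch. It assumes a single, signal-independent standard deviation, which is a simplification: the abstract stresses that the noise is strongly signal-dependent, which is why the authors work with bounds. The function name `bits_of_accuracy` and the numbers are hypothetical.

```python
import math


def bits_of_accuracy(dynamic_range, sigma, m):
    """Bits of accuracy = log2(number of separable levels).

    levels = dynamic_range / (m * sigma), where m sets the tolerated
    error rate of the A/D conversion. Assumes a constant sigma.
    """
    levels = dynamic_range / (m * sigma)
    return math.log2(levels)


# Hypothetical numbers: unit dynamic range, sigma = 1/1024, m = 4
bits = bits_of_accuracy(dynamic_range=1.0, sigma=1.0 / 1024, m=4)
print(bits)
```

With a signal-dependent sigma, evaluating the same expression at the smallest and largest sigma over the signal range gives one simple way to obtain an upper and a lower estimate.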
Negative base encoding in optical linear algebra processors
Perlee, C.; Casasent, D.
1986-01-01
In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
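A minimal sketch of the two ideas in the abstract above: encoding signed integers in base -2 without a separate sign bit, and multiplying by convolving the digit sequences, which yields the product in mixed (non-carried) representation. The helper names `to_base_neg2`, `convolve`, and `from_mixed` are hypothetical, and the code models only the arithmetic, not an optical implementation.

```python
def to_base_neg2(n):
    """Return the base -2 digits (0 or 1) of integer n, least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, -2)
        if r < 0:            # force the remainder into {0, 1}
            n, r = n + 1, r + 2
        digits.append(r)
    return digits


def convolve(a, b):
    """Discrete convolution of two digit sequences (no carries applied)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def from_mixed(digits, base=-2):
    """Evaluate a mixed-representation digit sequence back to an integer."""
    return sum(d * base ** k for k, d in enumerate(digits))


a_digits = to_base_neg2(3)     # 1 - 2 + 4
b_digits = to_base_neg2(-3)    # 1 + 0 + 4 - 8
mixed = convolve(a_digits, b_digits)
product = from_mixed(mixed)
print(a_digits, b_digits, mixed, product)
```

The convolution output contains digits larger than 1; converting it to a proper base -2 string is the final carry step that the abstract describes as easy.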
International Nuclear Information System (INIS)
Sanghavi, Suniti; Davis, Anthony B.; Eldering, Annmarie
2014-01-01
In this paper, we build up on the scalar model smartMOM to arrive at a formalism for linearized vector radiative transfer based on the matrix operator method (vSmartMOM). Improvements have been made with respect to smartMOM in that a novel method of computing intensities for the exact viewing geometry (direct raytracing) without interpolation between quadrature points has been implemented. Also, the truncation method employed for dealing with highly peaked phase functions has been changed to a vector adaptation of Wiscombe's delta-m method. These changes enable speedier and more accurate radiative transfer computations by eliminating the need for a large number of quadrature points and coefficients for generalized spherical functions. We verify our forward model against the benchmarking results of Kokhanovsky et al. (2010) [22]. All non-zero Stokes vector elements are found to show agreement up to mostly the seventh significant digit for the Rayleigh atmosphere. Intensity computations for aerosol and cloud show an agreement of well below 0.03% and 0.05% at all viewing angles except around the solar zenith angle (60°), where most radiative models demonstrate larger variances due to the strongly forward-peaked phase function. We have for the first time linearized vector radiative transfer based on the matrix operator method with respect to aerosol optical and microphysical parameters. We demonstrate this linearization by computing Jacobian matrices for all Stokes vector elements for a multi-angular and multispectral measurement setup. We use these Jacobians to compare the aerosol information content of measurements using only the total intensity component against those using the idealized measurements of full Stokes vector [I,Q,U,V] as well as the more practical use of only [I,Q,U]. As expected, we find for the considered example that the accuracy of the retrieved parameters improves when the full Stokes vector is used. The information content for the full Stokes
MODELING OF DYNAMIC SYSTEMS WITH MODULATION BY MEANS OF KRONECKER VECTOR-MATRIX REPRESENTATION
Directory of Open Access Journals (Sweden)
A. S. Vasilyev
2015-09-01
The paper deals with modeling of dynamic systems with modulation by the possibilities of state-space method. This method, being the basis of modern control theory, is based on the possibilities of vector-matrix formalism of linear algebra and helps to solve various problems of technical control of continuous and discrete nature invariant with respect to the dimension of their “input-output” objects. Unfortunately, it turned its back on the wide group of control systems whose hardware environment modulates signals. The marked system deficiency is partially offset by this paper, which proposes Kronecker vector-matrix representations for purposes of system representation of processes with signal modulation. The main result is vector-matrix representation of processes with modulation with no formal difference from continuous systems. It has been found that abilities of these representations could be effectively used in research of systems with modulation. Obtained model representations of processes with modulation are best adapted to the state-space method. These approaches for counting eigenvalues of Kronecker matrix summaries, that are matrix basis of model representations of processes described by Kronecker vector products, give the possibility to use modal direction in research of dynamics for systems with modulation. It is shown that the use of controllability for eigenvalues of general matrixes applied to Kronecker structures enabled to divide successfully eigenvalue spectrum into directed and not directed components. Obtained findings including design problems for models of dynamic processes with modulation based on the features of Kronecker vector and matrix structures, invariant with respect to the dimension of input-output relations, are applicable in the development of alternate current servo drives.
Barr, David; Basden, Alastair; Dipper, Nigel; Schwartz, Noah; Vick, Andy; Schnetler, Hermine
2014-08-01
We present wavefront reconstruction acceleration of high-order AO systems using an Intel Xeon Phi processor. The Xeon Phi is a coprocessor providing many integrated cores and designed for accelerating compute intensive, numerical codes. Unlike other accelerator technologies, it allows virtually unchanged C/C++ to be recompiled to run on the Xeon Phi, giving the potential of making development, upgrade and maintenance faster and less complex. We benchmark the Xeon Phi in the context of AO real-time control by running a matrix vector multiply (MVM) algorithm. We investigate variability in execution time and demonstrate a substantial speed-up in loop frequency. We examine the integration of a Xeon Phi into an existing RTC system and show that performance improvements can be achieved with limited development effort.
Energy Technology Data Exchange (ETDEWEB)
Kuhlemann, Verena [Emory Univ., Atlanta, GA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2013-10-28
Matrix-vector multiplication is the key operation in any Krylov-subspace iteration method. We are interested in Krylov methods applied to problems associated with the graph Laplacian arising from large scale-free graphs. Furthermore, computations with graphs of this type on parallel distributed-memory computers are challenging. This is due to the fact that scale-free graphs have a degree distribution that follows a power law, and currently available graph partitioners are not efficient for such an irregular degree distribution. The lack of a good partitioning leads to excessive interprocessor communication requirements during every matrix-vector product. Here, we present an approach to alleviate this problem based on embedding the original irregular graph into a more regular one by disaggregating (splitting up) vertices in the original graph. The matrix-vector operations for the original graph are performed via a factored triple matrix-vector product involving the embedding graph. And even though the latter graph is larger, we are able to decrease the communication requirements considerably and improve the performance of the matrix-vector product.
Some Algorithms for the Conditional Mean Vector and Covariance Matrix
Directory of Open Access Journals (Sweden)
John F. Monahan
2006-08-01
We consider here the problem of computing the mean vector and covariance matrix for a conditional normal distribution, considering especially a sequence of problems where the conditioning variables are changing. The sweep operator provides one simple general approach that is easy to implement and update. A second, more goal-oriented general method avoids explicit computation of the vector and matrix, while enabling easy evaluation of the conditional density for likelihood computation or easy generation from the conditional distribution. The covariance structure that arises from the special case of an ARMA(p, q) time series can be exploited for substantial improvements in computational efficiency.
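The sweep-operator approach mentioned in the abstract above can be sketched as follows. The sign convention used here (pivot replaced by -1/pivot) is one common textbook form and is an assumption; it is not necessarily the variant used in the paper. Sweeping the covariance matrix on the conditioning variables leaves the regression coefficients and the conditional covariance in the remaining block. The 3x3 covariance, the mean vector, and the observed values are hypothetical.

```python
from fractions import Fraction


def sweep(a, k):
    """Return a copy of the symmetric matrix `a` swept on pivot index k."""
    n = len(a)
    pivot = a[k][k]
    # Schur-complement update for all entries
    b = [[a[i][j] - a[i][k] * a[k][j] / pivot for j in range(n)]
         for i in range(n)]
    # Pivot row and column become scaled copies of the originals
    for i in range(n):
        b[i][k] = b[k][i] = a[i][k] / pivot
    b[k][k] = -1 / pivot
    return b


# Hypothetical covariance of (x0, x1, x2) and mean vector, in exact arithmetic
cov = [[Fraction(v) for v in row] for row in [[4, 2, 1], [2, 3, 1], [1, 1, 2]]]
mean = [Fraction(1), Fraction(0), Fraction(0)]

# Condition x0 on x1 and x2 by sweeping on indices 1 and 2
swept = sweep(sweep(cov, 1), 2)
cond_var = swept[0][0]                 # Var(x0 | x1, x2)
coef = [swept[0][1], swept[0][2]]      # regression coefficients on x1, x2

observed = {1: Fraction(5), 2: Fraction(5)}
cond_mean = mean[0] + sum(swept[0][j] * (observed[j] - mean[j]) for j in observed)
print(cond_var, coef, cond_mean)
```

Changing the conditioning set only requires sweeping one more index in or out, which is presumably what makes the approach easy to update in the sequence-of-problems setting the abstract describes.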
Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.
2017-07-01
Calculation of the matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up the calculation process, which usually takes a long time. Graph partitioning techniques that have been discussed in previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that the matrix is square and symmetric. Hypergraph partitioning techniques overcome this shortcoming of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
High-Performance Matrix-Vector Multiplication on the GPU
DEFF Research Database (Denmark)
Sørensen, Hans Henrik Brandenborg
2012-01-01
In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing...
Special processor for in-core control systems
International Nuclear Information System (INIS)
Golovanov, M.N.; Duma, V.R.; Levin, G.L.; Mel'nikov, A.V.; Polikanin, A.V.; Filatov, V.P.
1978-01-01
The BUTs-20 special processor is discussed, designed to control the units of the in-core control equipment which are incorporated into the VECTOR communication channel, and to provide preliminary data processing prior to computer calculations. A set of instructions and flowsheet of the processor, organization of its communication with memories and other units of the system are given. The processor components: a control unit and an arithmetic logical unit are discussed. It is noted that the special processor permits more effective utilization of the computer time
Linear Matrix Inequalities for Analysis and Control of Linear Vector Second-Order Systems
DEFF Research Database (Denmark)
Adegas, Fabiano Daher; Stoustrup, Jakob
2015-01-01
SUMMARY Many dynamical systems are modeled as vector second-order differential equations. This paper presents analysis and synthesis conditions in terms of LMI with explicit dependence in the coefficient matrices of vector second-order systems. These conditions benefit from the separation between the Lyapunov matrix and the system matrices by introducing matrix multipliers, which potentially reduce conservativeness in hard control problems. Multipliers facilitate the usage of parameter-dependent Lyapunov functions as certificates of stability of uncertain and time-varying vector second-order systems. The conditions introduced in this work have the potential to increase the practice of analyzing and controlling systems directly in vector second-order form. Copyright © 2014 John Wiley & Sons, Ltd.
Applicability of vector processing to large-scale nuclear codes
International Nuclear Information System (INIS)
Ishiguro, Misako; Harada, Hiroo; Matsuura, Toshihiko; Okuda, Motoi; Ohta, Fumio; Umeya, Makoto.
1982-03-01
To meet the growing trend of computational requirements in JAERI, introduction of a high-speed computer with a vector processing facility (a vector processor) is desirable in the near future. To make effective use of a vector processor, appropriate optimization of nuclear codes to pipelined-vector architecture is vital, which will pose new problems concerning code development and maintenance. In this report, vector processing efficiency is assessed with respect to large-scale nuclear codes by examining the following items: 1) The present feature of computational load in JAERI is analyzed by compiling the computer utilization statistics. 2) Vector processing efficiency is estimated for the ten heavily-used nuclear codes by analyzing their dynamic behaviors run on a scalar machine. 3) Vector processing efficiency is measured for the other five nuclear codes by using the current vector processors, FACOM 230-75 APU and CRAY-1. 4) Effectiveness of applying a high-speed vector processor to nuclear codes is evaluated by taking account of the characteristics in JAERI jobs. Problems of vector processors are also discussed from the viewpoints of code performance and ease of use. (author)
Optical propagators in vector and spinor theories by path integral formalism
International Nuclear Information System (INIS)
Linares, J.
1993-01-01
The construction of an extended parabolic (wide-angle) vector and spinor wave theory is presented. For that, optical propagators in monochromatic vector light optics and monoenergetic spinor electron optics are evaluated by the path integral formalism. The auxiliary parameter method introduced by Fock and the Feynman-Dyson perturbative series are used. The proposed theory supplies, by a generalized Fermat's principle, the Mukunda-Simon-Sudarshan transformation for the passage from scalar to vector light (or spinor electron) optics in an asymptotic approximation. (author). 19 refs
Lhamon, Michael Earl
A pattern recognition system which uses complex correlation filter banks requires proportionally more computational effort than single-real valued filters. This introduces increased computation burden but also introduces a higher level of parallelism, that common computing platforms fail to identify. As a result, we consider algorithm mapping to both optical and digital processors. For digital implementation, we develop computationally efficient pattern recognition algorithms, referred to as, vector inner product operators that require less computational effort than traditional fast Fourier methods. These algorithms do not need correlation and they map readily onto parallel digital architectures, which imply new architectures for optical processors. These filters exploit circulant-symmetric matrix structures of the training set data representing a variety of distortions. By using the same mathematical basis as with the vector inner product operations, we are able to extend the capabilities of more traditional correlation filtering to what we refer to as "Super Images". These "Super Images" are used to morphologically transform a complicated input scene into a predetermined dot pattern. The orientation of the dot pattern is related to the rotational distortion of the object of interest. The optical implementation of "Super Images" yields feature reduction necessary for using other techniques, such as artificial neural networks. We propose a parallel digital signal processor architecture based on specific pattern recognition algorithms but general enough to be applicable to other similar problems. Such an architecture is classified as a data flow architecture. Instead of mapping an algorithm to an architecture, we propose mapping the DSP architecture to a class of pattern recognition algorithms. Today's optical processing systems have difficulties implementing full complex filter structures. Typically, optical systems (like the 4f correlators) are limited to phase
Ghosh, A
1988-08-01
Lanczos and conjugate gradient algorithms are important in computational linear algebra. In this paper, a parallel pipelined realization of these algorithms on a ring of optical linear algebra processors is described. The flow of data is designed to minimize the idle times of the optical multiprocessor and the redundancy of computations. The effects of optical round-off errors on the solutions obtained by the optical Lanczos and conjugate gradient algorithms are analyzed, and it is shown that optical preconditioning can improve the accuracy of these algorithms substantially. Algorithms for optical preconditioning and results of numerical experiments on solving linear systems of equations arising from partial differential equations are discussed. Since the Lanczos algorithm is used mostly with sparse matrices, a folded storage scheme to represent sparse matrices on spatial light modulators is also described.
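As a point of reference for the abstract above, the following is a minimal sketch of the plain (unpreconditioned) conjugate gradient iteration on a digital machine. It is not the optical pipelined realization the paper describes; the function names `matvec`, `dot` and `conjugate_gradient` are illustrative assumptions. The matrix-vector product inside the loop is the step an optical linear algebra processor would take over.

```python
def matvec(A, x):
    """Dense matrix-vector product; the operation an optical processor would perform."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A (illustrative sketch)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the starting guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Each iteration needs one matrix-vector product and two inner products, which is why round-off in those operations (the concern raised in the abstract) feeds directly into the accuracy of the solution.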
Compact optical processor for Hough and frequency domain features
Ott, Peter
1996-11-01
Shape recognition is necessary in a broad band of applications such as traffic sign or work piece recognition. It requires not only neighborhood processing of the input image pixels but global interconnection of them. The Hough transform (HT) performs such a global operation and it is well suited to the preprocessing stage of a shape recognition system. Translation invariant features can be easily calculated from the Hough domain. We have implemented on the computer a neural network shape recognition system which contains a HT, a feature extraction, and a classification layer. The advantage of this approach is that the total system can be optimized with well-known learning techniques and that it can exploit the parallelism of the algorithms. However, the HT is a time consuming operation. Parallel, optical processing is therefore advantageous. Several systems have been proposed, based on space multiplexing with arrays of holograms and CGHs or time multiplexing with acousto-optic processors or by image rotation with incoherent and coherent astigmatic optical processors. We took up the last mentioned approach because 2D array detectors are read out line by line, so a 2D detector can achieve the same speed and is easier to implement. Coherent processing can allow the implementation of filters in the frequency domain. Features based on wedge/ring, Gabor, or wavelet filters have been proven to show good discrimination capabilities for texture and shape recognition. The astigmatic lens system which is derived from the mathematical formulation of the HT is long and contains a non-standard, astigmatic element. By methods of lens transformations for coherent applications we map the original design to a shorter lens with a smaller number of well separated standard elements and with the same coherent system response. The final lens design still contains the frequency plane for filtering and ray-tracing shows diffraction limited performance. Image rotation can be done
Aljada, Muhsen; Hwang, Seow; Alameh, Kamal
2008-01-21
In this paper we propose and experimentally demonstrate a reconfigurable 10Gbps frequency-encoded (1D) encoder/decoder structure for optical code division multiple access (OCDMA). The encoder is constructed using a single semiconductor optical amplifier (SOA) and 1D reflective Opto-VLSI processor. The SOA generates broadband amplified spontaneous emission that is dynamically sliced using digital phase holograms loaded onto the Opto-VLSI processor to generate 1D codewords. The selected wavelengths are injected back into the same SOA for amplifications. The decoder is constructed using single Opto-VLSI processor only. The encoded signal can successfully be retrieved at the decoder side only when the digital phase holograms of the encoder and the decoder are matched. The system performance is measured in terms of the auto-correlation and cross-correlation functions as well as the eye diagram.
Pavlichin, Dmitri S.; Mabuchi, Hideo
2014-06-01
Nanoscale integrated photonic devices and circuits offer a path to ultra-low power computation at the few-photon level. Here we propose an optical circuit that performs a ubiquitous operation: the controlled, random-access readout of a collection of stored memory phases or, equivalently, the computation of the inner product of a vector of phases with a binary "selector" vector, where the arithmetic is done modulo 2π and the result is encoded in the phase of a coherent field. This circuit, a collection of cascaded interferometers driven by a coherent input field, demonstrates the use of coherence as a computational resource, and of the use of recently-developed mathematical tools for modeling optical circuits with many coupled parts. The construction extends in a straightforward way to the computation of matrix-vector and matrix-matrix products, and, with the inclusion of an optical feedback loop, to the computation of a "weighted" readout of stored memory phases. We note some applications of these circuits for error correction and for computing tasks requiring fast vector inner products, e.g. statistical classification and some machine learning algorithms.
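The arithmetic described in the abstract above can be illustrated numerically. The sketch below is only a model of the operation (an inner product of phases with a binary selector, taken modulo 2π and carried by the phase of a unit-amplitude field); it does not model the interferometer circuit itself, and the name `selected_phase_field` is a hypothetical one chosen for illustration.

```python
import cmath
import math


def selected_phase_field(phases, selector):
    """Return a unit-amplitude complex field whose phase is the sum of the
    selected stored phases. Multiplying phase factors makes the modulo-2*pi
    wrap-around automatic."""
    field = complex(1.0, 0.0)
    for phi, bit in zip(phases, selector):
        if bit:
            field *= cmath.exp(1j * phi)
    return field


stored = [0.5, 1.0, 2.0]
field = selected_phase_field(stored, [1, 1, 0])
readout = cmath.phase(field)                 # close to 0.5 + 1.0 = 1.5 rad
wrapped = selected_phase_field([math.pi, math.pi], [1, 1])  # 2*pi wraps to phase 0
```

Because the result lives in the phase of the field, two selections whose phase sums differ by a multiple of 2π are indistinguishable, which is exactly the modulo-2π arithmetic the abstract refers to.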
Accelerating Matrix-Vector Multiplication on Hierarchical Matrices Using Graphical Processing Units
Boukaram, W.
2015-03-25
Large dense matrices arise from the discretization of many physical phenomena in computational sciences. In statistics very large dense covariance matrices are used for describing random fields and processes. One can, for instance, describe distribution of dust particles in the atmosphere, concentration of mineral resources in the earth's crust or uncertain permeability coefficient in reservoir modeling. When the problem size grows, storing and computing with the full dense matrix becomes prohibitively expensive both in terms of computational complexity and physical memory requirements. Fortunately, these matrices can often be approximated by a class of data sparse matrices called hierarchical matrices (H-matrices) where various sub-blocks of the matrix are approximated by low rank matrices. These matrices can be stored in memory that grows linearly with the problem size. In addition, arithmetic operations on these H-matrices, such as matrix-vector multiplication, can be completed in almost linear time. Originally the H-matrix technique was developed for the approximation of stiffness matrices coming from partial differential and integral equations. Parallelizing these arithmetic operations on the GPU has been the focus of this work and we will present work done on the matrix vector operation on the GPU using the KSPARSE library.
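To make the saving from low-rank sub-blocks concrete, here is a minimal sketch of multiplying one such block by a vector. It assumes the block is stored in factored form as U V^T with U of size m×k and V of size n×k; the function name `low_rank_matvec` is an illustrative assumption and has no connection to the KSPARSE library mentioned above.

```python
def low_rank_matvec(U, V, x):
    """Compute (U V^T) x without forming the m-by-n block.

    U is m-by-k and V is n-by-k (lists of rows). The cost is O(k (m + n))
    multiply-adds instead of O(m n) for the dense block."""
    k = len(U[0])
    # First contract with V: t = V^T x, a vector of length k.
    t = [sum(V[j][c] * x[j] for j in range(len(x))) for c in range(k)]
    # Then expand with U: y = U t, a vector of length m.
    return [sum(U[i][c] * t[c] for c in range(k)) for i in range(len(U))]


# Rank-1 example: U V^T = [[3, 4, 5], [6, 8, 10]]
U = [[1.0], [2.0]]
V = [[3.0], [4.0], [5.0]]
y = low_rank_matvec(U, V, [1.0, 1.0, 1.0])
```

A hierarchical matrix applies this to many off-diagonal blocks of different sizes and uses ordinary dense products only for the small blocks near the diagonal.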
Updating optical pseudoinverse associative memories.
Telfer, B; Casasent, D
1989-07-01
Selected algorithms for adding to and deleting from optical pseudoinverse associative memories are presented and compared. New realizations of pseudoinverse updating methods using vector inner product matrix bordering and reduced-dimensionality Karhunen-Loeve approximations (which have been used for updating optical filters) are described in the context of associative memories. Greville's theorem is reviewed and compared with the Widrow-Hoff algorithm. Kohonen's gradient projection method is expressed in a different form suitable for optical implementation. The data matrix memory is also discussed for comparison purposes. Memory size, speed and ease of updating, and key vector requirements are the comparison criteria used.
Mixed Analog/Digital Matrix-Vector Multiplier for Neural Network Synapses
DEFF Research Database (Denmark)
Lehmann, Torsten; Bruun, Erik; Dietrich, Casper
1996-01-01
In this work we present a hardware efficient matrix-vector multiplier architecture for artificial neural networks with digitally stored synapse strengths. We present a novel technique for manipulating bipolar inputs based on an analog two's complements method and an accurate current rectifier...
A fast inner product processor based on equal alignments
Energy Technology Data Exchange (ETDEWEB)
Smith, S.P.; Torng, H.C.
1985-11-01
Inner product computation is an important operation, invoked repeatedly in matrix multiplications. A high-speed inner product processor can be very useful (among many possible applications) in real-time signal processing. This paper presents the design of a fast inner product processor, with appreciably reduced latency and cost. The inner product processor is implemented with a tree of carry-propagate or carry-save adders; this structure is obtained with the incorporation of three innovations in the conventional multiply/add tree: The leaf-multipliers are expanded into adder subtrees, thus achieving an O(log Nb) latency, where N denotes the number of elements in a vector and b the number of bits in each element. The partial products, to be summed in producing an inner product, are reordered according to their "minimum alignments." This reordering brings approximately a 20% savings in hardware, including adders and data paths. The reduction in adder widths also yields savings in carry propagation time for carry-propagate adders. For trees implemented with carry-save adders, the partial product reordering also serves to truncate the carry propagation chain in the final propagation stage by 2 log b - 1 positions, thus significantly reducing the latency further. A form of the Baugh and Wooley algorithm is adopted to implement two's complement notation with changes only in peripheral hardware.
Satellite on-board real-time SAR processor prototype
Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François
2017-11-01
A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and
Vector-Parallel processing of the successive overrelaxation method
International Nuclear Information System (INIS)
Yokokawa, Mitsuo
1988-02-01
The successive overrelaxation method, called the SOR method, is one of the iterative methods for solving linear systems of equations, and it has been calculated serially with a natural ordering in many nuclear codes. After the appearance of vector processors, this natural SOR method was replaced by parallel algorithms such as the hyperplane or red-black method, in which the calculation order is modified. These methods are suitable for vector processors, and higher-speed calculation can be obtained compared with the natural SOR method on vector processors. In this report, a new scheme named the 4-colors SOR method is proposed. We find that the 4-colors SOR method can be executed on vector-parallel processors and that it gives the highest-speed calculation among all SOR methods according to results of the vector-parallel execution on the Alliant FX/8 multiprocessor system. It is also shown that the theoretical optimal acceleration parameters are equal among five different ordering SOR methods, and the differences between the convergence rates of these SOR methods are examined. (author)
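The reordering idea can be sketched with the red-black variant mentioned in the abstract above (not the 4-colors scheme the report proposes, whose details are not given here). The sketch assumes a five-point discretization of -∇²u = f on a square grid with fixed boundary values; the function name `red_black_sor` and its parameters are illustrative.

```python
def red_black_sor(u, f, h, omega, sweeps):
    """Red-black SOR for the five-point Poisson stencil (illustrative sketch).

    u: square grid (list of lists) holding boundary values and an interior guess;
    f: right-hand side on the same grid; h: grid spacing; omega: relaxation factor.
    Points with (i + j) even are updated first, then points with (i + j) odd.
    Within one colour no point depends on another of the same colour, so each
    half-sweep could be executed as a single vector or parallel operation."""
    n = len(u)
    for _ in range(sweeps):
        for colour in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != colour:
                        continue
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                           + u[i][j - 1] + u[i][j + 1]
                                           + h * h * f[i][j])
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u


# Laplace problem: boundary fixed at 1, interior starts at 0.
n = 6
grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
        for i in range(n)]
rhs = [[0.0] * n for _ in range(n)]
result = red_black_sor(grid, rhs, h=1.0 / (n - 1), omega=1.2, sweeps=200)
```

In the natural ordering each update depends on the value just computed for its neighbour, which forces serial execution; splitting the grid into colours removes that dependence within a colour.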
Deri, Robert J.; DeGroot, Anthony J.; Haigh, Ronald E.
2002-01-01
As the performance of individual elements within parallel processing systems increases, increased communication capability between distributed processor and memory elements is required. There is great interest in using fiber optics to improve interconnect communication beyond that attainable using electronic technology. Several groups have considered WDM, star-coupled optical interconnects. The invention uses a fiber optic transceiver to provide low latency, high bandwidth channels for such interconnects using a robust multimode fiber technology. Instruction-level simulation is used to quantify the bandwidth, latency, and concurrency required for such interconnects to scale to 256 nodes, each operating at 1 GFLOPS performance. Performance has been shown to scale to approximately 100 GFLOPS for scientific application kernels using a small number of wavelengths (8 to 32), only one wavelength received per node, and achievable optoelectronic bandwidth and latency.
Collapse dynamics of a vector vortex optical field with inhomogeneous states of polarization
International Nuclear Information System (INIS)
Chen, Rui-Pin; Zhao, Ting-Yu; Zhang, Xiaobo; Zhong, Li-Xin; Chew, Khian-Hooi
2015-01-01
Based on a pair of coupled 2D nonlinear Schrödinger equations, the collapse dynamics of a vector field with hybrid states of polarization (SoP) in a Kerr medium is demonstrated. The critical power for an optical field to collapse is present, and the full vectorial numerical simulations provide detailed information about the evolution and partial collapse of the vector field in a Kerr medium. Our results reveal that the optical field prefers to collapse in linearly-polarization, as a result of the self-focusing effect difference in linearly, elliptically and circularly polarized components. The SoP in the field cross-section changes and propagates with a spiral trajectory when the vector beams are imposed with a vortex. The vectorial effect on the collapse of a vector optical field can prevail over the noise even though it reaches 10% amplitude of the optical field. The unique feature of these structured collapses of a vector optical field may lead to new phenomena in the interaction of light with matter. (paper)
Migration of vectorized iterative solvers to distributed memory architectures
Energy Technology Data Exchange (ETDEWEB)
Pommerell, C. [AT&T Bell Labs., Murray Hill, NJ (United States)]; Ruehl, R. [CSCS-ETH, Manno (Switzerland)]
1994-12-31
Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved; smaller problems are often better handled by direct methods. Opportunity arises from the formulation of the iterative methods in terms of simple linear algebra operations, even if this "natural" parallelism is not easy to exploit in irregularly structured sparse matrices and with good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts are geared to vectorizing or parallelizing the dominating operation (structured or unstructured sparse matrix-vector multiplication), or to increasing locality and parallelism by reformulating the algorithm (reducing global synchronization in inner products or local data exchange in preconditioners). Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, parallel computers with physically distributed memory and a better price/performance ratio have been offered by vendors as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they are considering migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. They want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.
A high-speed analog neural processor
Masa, Peter; Hoen, Klaas; Wallinga, Hans
1994-01-01
Targeted at high-energy physics research applications, our special-purpose analog neural processor can classify up to 70 dimensional vectors within 50 nanoseconds. The decision-making process of the implemented feedforward neural network enables this type of computation to tolerate weight
Integrated optical circuits for numerical computation
Verber, C. M.; Kenan, R. P.
1983-01-01
The development of integrated optical circuits (IOC) for numerical-computation applications is reviewed, with a focus on the use of systolic architectures. The basic architecture criteria for optical processors are shown to be the same as those proposed by Kung (1982) for VLSI design, and the advantages of IOCs over bulk techniques are indicated. The operation and fabrication of electrooptic grating structures are outlined, and the application of IOCs of this type to an existing 32-bit, 32-Mbit/sec digital correlator, a proposed matrix multiplier, and a proposed pipeline processor for polynomial evaluation is discussed. The problems arising from the inherent nonlinearity of electrooptic gratings are considered. Diagrams and drawings of the application concepts are provided.
Scientific Computing Kernels on the Cell Processor
Energy Technology Data Exchange (ETDEWEB)
Williams, Samuel W.; Shalf, John; Oliker, Leonid; Kamil, Shoaib; Husbands, Parry; Yelick, Katherine
2007-04-04
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the recently-released STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on a 3.2GHz Cell blade. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
A direct derivation of the exact Fisher information matrix of Gaussian vector state space models
Klein, A.A.B.; Neudecker, H.
2000-01-01
This paper deals with a direct derivation of Fisher's information matrix of vector state space models for the general case, by which is meant the establishment of the matrix as a whole and not element by element. The method to be used is matrix differentiation, see [4]. We assume the model to be
Samba, A. S.
1985-01-01
The problem of solving banded linear systems by direct (non-iterative) techniques on the Vector Processor System (VPS) 32 supercomputer is considered. Two efficient direct methods for solving banded linear systems on the VPS 32 are described. The vector cyclic reduction (VCR) algorithm is discussed in detail. The performance of the VCR on a three parameter model problem is also illustrated. The VCR is an adaptation of the conventional point cyclic reduction algorithm. The second direct method is the Customized Reduction of Augmented Triangles (CRAT). CRAT has the dominant characteristics of an efficient VPS 32 algorithm. CRAT is tailored to the pipeline architecture of the VPS 32 and as a consequence the algorithm is implicitly vectorizable.
Directory of Open Access Journals (Sweden)
A. A. Zolotin
2015-07-01
Posteriori inference is one of the three kinds of probabilistic-logic inferences in the probabilistic graphical models theory and the base for processing of knowledge patterns with probabilistic uncertainty using Bayesian networks. The paper deals with a task of local posteriori inference description in algebraic Bayesian networks that represent a class of probabilistic graphical models by means of matrix-vector equations. The latter are essentially based on the use of tensor product of matrices, Kronecker degree and Hadamard product. Matrix equations for calculating posteriori probabilities vectors within posteriori inference in knowledge patterns with quanta propositions are obtained. Similar equations of the same type have already been discussed within the confines of the theory of algebraic Bayesian networks, but they were built only for the case of posteriori inference in the knowledge patterns on the ideals of conjuncts. During synthesis and development of matrix-vector equations on quanta propositions probability vectors, a number of earlier results concerning normalizing factors in posteriori inference and assignment of linear projective operator with a selector vector was adapted. We consider all three types of incoming evidences - deterministic, stochastic and inaccurate - combined with scalar and interval estimation of probability truth of propositional formulas in the knowledge patterns. Linear programming problems are formed. Their solution gives the desired interval values of posterior probabilities in the case of inaccurate evidence or interval estimates in a knowledge pattern. That sort of description of a posteriori inference gives the possibility to extend the set of knowledge pattern types that we can use in the local and global posteriori inference, as well as simplify complex software implementation by use of existing third-party libraries, effectively supporting submission and processing of matrices and vectors when
Gain in computational efficiency by vectorization in the dynamic simulation of multi-body systems
Amirouche, F. M. L.; Shareef, N. H.
1991-01-01
An improved technique for the identification and extraction of the exact quantities associated with the degrees of freedom at the element as well as the flexible body level is presented. It is implemented in the dynamic equations of motion based on the recursive formulation of Kane et al. (1987) and presented in a matrix form, integrating the concepts of strain energy, the finite-element approach, modal analysis, and reduction of equations. This technique eliminates the CPU-intensive matrix multiplication operations in the code's hot spots for the dynamic simulation of the interconnected rigid and flexible bodies. A study of a simple robot with flexible links is presented by comparing the execution times on a scalar machine and a vector processor with and without vector options. Performance figures demonstrating the substantial gains achieved by the technique are plotted.
Speculative dynamic vectorization to assist static vectorization in a HW/SW co-designed environment
Kumar, R.; Martinez, A.; Gonzalez, A.
2013-01-01
Compiler based static vectorization is used widely to extract data level parallelism from computation intensive applications. Static vectorization is very effective in vectorizing traditional array based applications. However, compilers' inability to reorder ambiguous memory references severely limits vectorization opportunities, especially in pointer rich applications. HW/SW co-designed processors provide an excellent opportunity to optimize the applications at runtime. The availability of dy...
Pseudo-Random Number Generators for Vector Processors and Multicore Processors
DEFF Research Database (Denmark)
Fog, Agner
2015-01-01
Large scale Monte Carlo applications need a good pseudo-random number generator capable of utilizing both the vector processing capabilities and multiprocessing capabilities of modern computers in order to get the maximum performance. The requirements for such a generator are discussed. New ways...
Matrix Optical Absorption in UV-MALDI MS.
Robinson, Kenneth N; Steven, Rory T; Bunch, Josephine
2018-03-01
In ultraviolet matrix-assisted laser desorption/ionization mass spectrometry (UV-MALDI MS) matrix compound optical absorption governs the uptake of laser energy, which in turn has a strong influence on experimental results. Despite this, quantitative absorption measurements are lacking for most matrix compounds. Furthermore, despite the use of UV-MALDI MS to detect a vast range of compounds, investigations into the effects of laser energy have been primarily restricted to single classes of analytes. We report the absolute solid state absorption spectra of the matrix compounds α-cyano-4-hydroxycinnamic acid (CHCA), para-nitroaniline (PNA), 2-mercaptobenzothiazole (MBT), 2,5-dihydroxybenzoic acid (2,5-DHB), and 2,4,6-trihydroxyacetophenone (THAP). The desorption/ionization characteristics of these matrix compounds with respect to laser fluence were investigated using mixed systems of matrix with either angiotensin II, PC(34:1) lipid standard, or haloperidol, acting as representatives for typical classes of analyte encountered in UV-MALDI MS. The first absolute solid phase spectra for PNA, MBT, and THAP are reported; additionally, inconsistencies between previously published spectra for CHCA are resolved. In light of these findings, suggestions are made for experimental optimization with regard to matrix and laser wavelength selection. The relationships between matrix optical cross-section and wavelength-dependent threshold fluence, fluence of maximum ion yield, and R, a new descriptor for the change in ion intensity with fluence, are described. A matrix cross-section of 1.3 × 10⁻¹⁷ cm⁻² was identified as a potential minimum for desorption/ionization of analytes.
A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs
Directory of Open Access Journals (Sweden)
Guixia He
2016-01-01
Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMVs on graphic processing units (GPUs), for example, CSR-scalar and CSR-vector, usually have poor performance due to irregular memory access patterns. This motivates us to propose a perfect CSR-based SpMV on the GPU that is called PCSR. PCSR involves two kernels and accesses CSR arrays in a fully coalesced manner by introducing a middle array, which greatly alleviates the deficiencies of CSR-scalar (rare coalescing) and CSR-vector (partial coalescing). Test results on a single C2050 GPU show that PCSR fully outperforms CSR-scalar, CSR-vector, and CSRMV and HYBMV in the vendor-tuned CUSPARSE library and is comparable with a most recently proposed CSR-based algorithm, CSR-Adaptive. Furthermore, we extend PCSR on a single GPU to multiple GPUs. Experimental results on four C2050 GPUs show that, whether or not the communication between GPUs is considered, PCSR on multiple GPUs achieves good performance and has high parallel efficiency.
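For readers unfamiliar with the CSR format named above, the following is a minimal serial sketch of a CSR-based SpMV. It is the plain row-by-row loop that the CSR-scalar GPU kernel assigns one row per thread; it is not the PCSR algorithm of the paper. The array names `row_ptr`, `col_idx` and `values` follow common usage and are assumptions here.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in compressed sparse row form.

    row_ptr[r] .. row_ptr[r + 1] delimits the nonzeros of row r;
    col_idx[k] and values[k] give the column and value of the k-th nonzero."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read at scattered positions; this irregular access is
            # what makes coalesced GPU memory access hard to achieve.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Rows with very different numbers of nonzeros give the per-row loops unequal lengths, which is a second source of inefficiency when each row is mapped to one GPU thread.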
Optical Associative Memory Model With Threshold Modification Using Complementary Vector
Bian, Shaoping; Xu, Kebin; Hong, Jing
1989-02-01
A new criterion to evaluate the similarity between two vectors in associative memory is presented. Based on it, an experimental study of an optical associative memory model with threshold modification using a complementary vector is carried out. This model is capable of eliminating the possibility of erroneous recall. Therefore the accuracy of reading out is improved.
Optical force exerted on a Rayleigh particle by a vector arbitrary-order Bessel beam
International Nuclear Information System (INIS)
Yang, Ruiping; Li, Renxian
2016-01-01
An analytical description of optical force on a Rayleigh particle by a vector Bessel beam is investigated. Linearly, radially, azimuthally, and circularly polarized Bessel beams are considered. The radial, azimuthal, and axial forces by a vector Bessel beam are numerically simulated. The effects of polarization, beam order, and half-cone angle on the optical force are discussed. For Bessel beams of larger half-cone angle, the non-paraxiality of beams plays an important role in optical forces. Numerical calculations show that optical forces, especially azimuthal forces, are very sensitive to the polarization of beams. - Highlights: • Optical force exerted on a Rayleigh particle by a vector Bessel beam is analytically derived. • Radial, azimuthal, and axial forces are numerically analyzed. • The effect of polarization, order of beam, and non-paraxiality is analyzed.
Modulated error diffusion CGHs for neural nets
Vermeulen, Pieter J. E.; Casasent, David P.
1990-05-01
New modulated error diffusion CGHs (computer generated holograms) for optical computing are considered. Specific attention is given to their use in optical matrix-vector, associative processor, neural net and optical interconnection architectures. We consider lensless CGH systems (many CGHs use an external Fourier transform (FT) lens), the Fresnel sampling requirements, the effects of finite CGH apertures (sample and hold inputs), dot size correction (for laser recorders), and new applications for this novel encoding method (that devotes attention to quantization noise effects).
Park, Kyoung-Duck; Raschke, Markus B.
2018-05-01
Controlling the propagation and polarization vectors in linear and nonlinear optical spectroscopy makes it possible to probe the anisotropy of optical responses, providing structural symmetry-selective contrast in optical imaging. Here we present a novel tilted antenna-tip approach to control the optical vector-field by breaking the axial symmetry of the nano-probe in tip-enhanced near-field microscopy. This gives rise to a localized plasmonic antenna effect with significantly enhanced optical field vectors with control of both in-plane and out-of-plane components. We use the resulting vector-field specificity in the symmetry selective nonlinear optical response of second-harmonic generation (SHG) for a generalized approach to optical nano-crystallography and -imaging. In tip-enhanced SHG imaging of monolayer MoS₂ films and single-crystalline ferroelectric YMnO₃, we reveal nano-crystallographic details of domain boundaries and domain topology with enhanced sensitivity and nanoscale spatial resolution. The approach is applicable to any anisotropic linear and nonlinear optical response, and provides for optical nano-crystallographic imaging of molecular or quantum materials.
Vectorization of Monte Carlo particle transport
International Nuclear Information System (INIS)
Burns, P.J.; Christon, M.; Schweitzer, R.; Lubeck, O.M.; Wasserman, H.J.; Simmons, M.L.; Pryor, D.V.
1989-01-01
This paper reports that fully vectorized versions of the Los Alamos National Laboratory benchmark code Gamteb, a Monte Carlo photon transport algorithm, were developed for the Cyber 205/ETA-10 and Cray X-MP/Y-MP architectures. Single-processor performance measurements of the vector and scalar implementations were modeled in a modified Amdahl's Law that accounts for additional data motion in the vector code. The performance and implementation strategy of the vector codes are related to architectural features of each machine. Speedups between fifteen and eighteen for Cyber 205/ETA-10 architectures, and about nine for CRAY X-MP/Y-MP architectures are observed. The best single processor execution time for the problem was 0.33 seconds on the ETA-10G, and 0.42 seconds on the CRAY Y-MP
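The abstract does not give the form of the modified Amdahl's Law, so the following Python sketch is only an assumption-laden illustration: it uses the standard Amdahl expression and adds one hypothetical term, `data_motion`, for extra vector-code overhead expressed as a fraction of the original scalar run time. The function name and all parameter values are made up for the example.

```python
def vector_speedup(vector_fraction, vector_gain, data_motion=0.0):
    """Speedup of a partly vectorized code relative to the scalar code.

    vector_fraction: share of scalar run time that is vectorizable (0..1).
    vector_gain:     speedup of the vectorized part alone (> 0).
    data_motion:     hypothetical extra cost of gather/scatter and data
                     reshuffling, as a fraction of scalar run time.
                     With data_motion = 0 this is plain Amdahl's Law.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must lie in [0, 1]")
    if vector_gain <= 0.0 or data_motion < 0.0:
        raise ValueError("vector_gain must be > 0 and data_motion >= 0")
    scalar_part = 1.0 - vector_fraction
    vector_part = vector_fraction / vector_gain
    return 1.0 / (scalar_part + vector_part + data_motion)


# Illustrative numbers only (not measurements from the paper).
print(vector_speedup(0.95, 20.0))        # plain Amdahl's Law
print(vector_speedup(0.95, 20.0, 0.02))  # same code with a data-motion penalty
```

Even a small data-motion term lowers the achievable speedup noticeably once the vectorizable fraction is high, which is the kind of effect such a modification is meant to capture.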
Embedded processor extensions for image processing
Thevenin, Mathieu; Paindavoine, Michel; Letellier, Laurent; Heyrman, Barthélémy
2008-04-01
The advent of camera phones marks a new phase in embedded camera sales. By late 2009, the total number of camera phones will exceed that of both conventional and digital cameras shipped since the invention of photography. Use in mobile phones of applications like visiophony, matrix code readers and biometrics requires a high degree of component flexibility that image processors (IPs) have not, to date, been able to provide. For all these reasons, programmable processor solutions have become essential. This paper presents several techniques geared to speeding up image processors. It demonstrates that a twofold gain is possible for the complete image acquisition chain and the enhancement pipeline downstream of the video sensor. Such results confirm the potential of these computing systems for supporting future applications.
Vector sparse representation of color image using quaternion matrix analysis.
Xu, Yi; Yu, Licheng; Xu, Hongteng; Zhang, Hao; Nguyen, Truong
2015-04-01
Traditional sparse image models treat a color image pixel as a scalar, either representing the color channels separately or concatenating them as a monochrome image. In this paper, we propose a vector sparse representation model for color images using quaternion matrix analysis. As a new tool for color image representation, its potential applications in several image-processing tasks are presented, including color image reconstruction, denoising, inpainting, and super-resolution. The proposed model represents the color image as a quaternion matrix, where a quaternion-based dictionary learning algorithm is presented using the K-quaternion singular value decomposition (QSVD) (generalized K-means clustering for QSVD) method. It conducts the sparse basis selection in quaternion space, which uniformly transforms the channel images to an orthogonal color space. In this new color space, it is significant that the inherent color structures can be completely preserved during vector reconstruction. Moreover, the proposed sparse model is more efficient than current sparse models for image restoration tasks due to lower redundancy between the atoms of different color channels. The experimental results demonstrate that the proposed sparse image model successfully avoids the hue bias issue and shows its potential as a general and powerful tool in the color image analysis and processing domain.
Rotations with Rodrigues' vector
International Nuclear Information System (INIS)
Pina, E
2011-01-01
The rotational dynamics was studied from the point of view of Rodrigues' vector. This vector is defined here by its connection with other forms of parametrization of the rotation matrix. The rotation matrix was expressed in terms of this vector. The angular velocity was computed using the components of Rodrigues' vector as coordinates. It appears to be a fundamental matrix that is used to express the components of the angular velocity, the rotation matrix and the angular momentum vector. The Hamiltonian formalism of rotational dynamics in terms of this vector uses the same matrix. The quantization of the rotational dynamics is performed with simple rules if one uses Rodrigues' vector and similar formal expressions for the quantum operators that mimic the Hamiltonian classical dynamics.
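As a concrete illustration of expressing the rotation matrix through Rodrigues' vector, the Python sketch below uses the common definition r = tan(θ/2)·n for a rotation by angle θ about the unit axis n, and the standard identity R = [(1 − r·r)I + 2rrᵀ + 2[r]×] / (1 + r·r). The sign and active/passive conventions are assumptions here and may differ from those adopted in the paper; the function names are illustrative.

```python
import math


def rodrigues_vector(axis, angle):
    """Return r = tan(angle / 2) * n, with n the normalized axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    scale = math.tan(angle / 2.0) / norm
    return tuple(scale * a for a in axis)


def rotation_from_rodrigues(r):
    """Build the 3x3 rotation matrix from a Rodrigues vector r."""
    rx, ry, rz = r
    r2 = rx * rx + ry * ry + rz * rz
    # Skew-symmetric cross-product matrix [r]x.
    cross = [[0.0, -rz, ry],
             [rz, 0.0, -rx],
             [-ry, rx, 0.0]]
    R = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            identity = 1.0 if i == j else 0.0
            R[i][j] = ((1.0 - r2) * identity
                       + 2.0 * r[i] * r[j]
                       + 2.0 * cross[i][j]) / (1.0 + r2)
    return R


def apply(R, v):
    """Multiply the 3x3 matrix R by the 3-vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


# A quarter turn about z should carry the x axis onto the y axis.
r = rodrigues_vector((0.0, 0.0, 1.0), math.pi / 2.0)
print(apply(rotation_from_rodrigues(r), (1.0, 0.0, 0.0)))
```

Note that r diverges as θ approaches π, so this parametrization is not suitable for half-turn rotations without special handling.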
Effective Vectorization with OpenMP 4.5
Energy Technology Data Exchange (ETDEWEB)
Huber, Joseph N. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Hernandez, Oscar R. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Lopez, Matthew Graham [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
2017-03-01
This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.
DEFF Research Database (Denmark)
Yang, Yukay
I consider multivariate (vector) time series models in which the error covariance matrix may be time-varying. I derive a test of constancy of the error covariance matrix against the alternative that the covariance matrix changes over time. I design a new family of Lagrange-multiplier tests against...... to consider multivariate volatility modelling....
International Nuclear Information System (INIS)
Paddon, D.J.
1984-01-01
This book is based on the proceedings of a conference on parallel computing held in 1982. There are 18 papers which cover the following topics: VLSI parallel architectures, the theory of parallel computing and vector and array processor computing. One paper on 'Tough Problems in Reactor Design' is indexed separately. All the contributions are on research done in the United Kingdom. Although much of the experience in array processor computing is associated with the ICL distributed array processor (DAP) and this is reflected in the contributions, the research relating to the ICL DAP is relevant to all types of array processors. (UK)
Xue, Min; Pan, Shilong; He, Chao; Guo, Ronghui; Zhao, Yongjiu
2013-11-15
A novel approach to increase the measurement range of the optical vector network analyzer (OVNA) based on optical single-sideband (OSSB) modulation is proposed and experimentally demonstrated. In the proposed system, each comb line in an optical frequency comb (OFC) is selected by an optical filter and used as the optical carrier for the OSSB-based OVNA. The frequency responses of an optical device-under-test (ODUT) are thus measured channel by channel. Because the comb lines in the OFC have fixed frequency spacing, by fitting the responses measured in all channels together, the magnitude and phase responses of the ODUT can be accurately achieved in a large range. A proof-of-concept experiment is performed. A measurement range of 105 GHz and a resolution of 1 MHz is achieved when a five-comb-line OFC with a frequency spacing of 20 GHz is applied to measure the magnitude and phase responses of a fiber Bragg grating.
Reconfigurable lattice mesh designs for programmable photonic processors.
Pérez, Daniel; Gasulla, Ivana; Capmany, José; Soref, Richard A
2016-05-30
We propose and analyse two novel mesh design geometries for the implementation of tunable optical cores in programmable photonic processors. These geometries are the hexagonal and the triangular lattice. They are compared here to a previously proposed square mesh topology in terms of a series of figures of merit that account for metrics that are relevant to on-chip integration of the mesh. We find that the hexagonal mesh is the most suitable option of the three considered for the implementation of the reconfigurable optical core in the programmable processor.
International Nuclear Information System (INIS)
Littlefield, R.J.; Maschhoff, K.J.
1991-04-01
Many linear algebra algorithms utilize an array of processors across which matrices are distributed. Given a particular matrix size and a maximum number of processors, what configuration of processors, i.e., what size and shape array, will execute the fastest? The answer to this question depends on tradeoffs between load balancing, communication startup and transfer costs, and computational overhead. In this paper we analyze in detail one algorithm: the blocked factored Jacobi method for solving dense eigensystems. A performance model is developed to predict execution time as a function of the processor array and matrix sizes, plus the basic computation and communication speeds of the underlying computer system. In experiments on a large hypercube (up to 512 processors), this model has been found to be highly accurate (mean error ∼ 2%) over a wide range of matrix sizes (10 x 10 through 200 x 200) and processor counts (1 to 512). The model reveals, and direct experiment confirms, that the tradeoffs mentioned above can be surprisingly complex and counterintuitive. We propose decision procedures based directly on the performance model to choose configurations for fastest execution. The model-based decision procedures are compared to a heuristic strategy and shown to be significantly better. 7 refs., 8 figs., 1 tab
Vectorization, parallelization and porting of nuclear codes (porting). Progress report fiscal 1998
International Nuclear Information System (INIS)
Nemoto, Toshiyuki; Kawai, Wataru; Ishizuki, Shigeru; Kawasaki, Nobuo; Kume, Etsuo; Adachi, Masaaki; Ogasawara, Shinobu
2000-03-01
Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system, the AP3000 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the porting. In this porting part, the porting of Monte Carlo N-Particle Transport code MCNP4B2 and Reactor Safety Analysis code RELAP5 on the AP3000 are described. In the vectorization and parallelization on vector processors part, the vectorization of General Tokamak Circuit Simulation Program code GTCSP, the vectorization and parallelization of Molecular Dynamics Ntv Simulation code MSP2, Eddy Current Analysis code EDDYCAL, Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of Monte Carlo N-Particle Transport code MCNP4B2, Plasma Hydrodynamics code using Cubic Interpolated propagation Method PHCIP and Vectorized Monte Carlo code (continuous energy model/multi-group model) MVP/GMVP on the Paragon are described. (author)
Space Vector Modulation for an Indirect Matrix Converter with Improved Input Power Factor
Directory of Open Access Journals (Sweden)
Nguyen Dinh Tuyen
2017-04-01
Pulse width modulation strategies have been developed for indirect matrix converters (IMCs) in order to improve their performance. In indirect matrix converters, the LC input filter is used to remove input current harmonics and electromagnetic interference problems. Unfortunately, due to the existence of the input filter, the input power factor is diminished, especially during operation at low voltage outputs. In this paper, a new space vector modulation (SVM) is proposed to compensate for the input power factor of the indirect matrix converter. Both computer simulation and experimental studies through hardware implementation were performed to verify the effectiveness of the proposed modulation strategy.
A High-Speed and Low-Energy-Consumption Processor for SVD-MIMO-OFDM Systems
Directory of Open Access Journals (Sweden)
Hiroki Iwaizumi
2013-01-01
A processor design for singular value decomposition (SVD) and compression/decompression of feedback matrices, which are mandatory operations for SVD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems, is proposed and evaluated. SVD-MIMO is a transmission method for suppressing multistream interference and improving communication quality by beamforming. An application specific instruction-set processor (ASIP) architecture is adopted to achieve flexibility in terms of operations and matrix size. The proposed processor realizes a high-speed/low-power design and real-time processing by the parallelization of floating-point units (FPUs) and arithmetic instructions specialized in complex matrix operations.
Vector optical fields with polarization distributions similar to electric and magnetic field lines.
Pan, Yue; Li, Si-Min; Mao, Lei; Kong, Ling-Jun; Li, Yongnan; Tu, Chenghou; Wang, Pei; Wang, Hui-Tian
2013-07-01
We present, design and generate a new kind of vector optical field with linear polarization distributions modeled on electric and magnetic field lines. The geometric configurations of "electric charges" and "magnetic charges" can engineer the spatial structure and symmetry of the polarization of the vector optical field, providing additional degrees of freedom that assist in controlling the field symmetry at the focus and allow the field distribution at the focus to be engineered for specific applications.
Air-Lubricated Thermal Processor For Dry Silver Film
Siryj, B. W.
1980-09-01
Since dry silver film is processed by heat, it may be viewed on a light table only seconds after exposure. On the other hand, wet films require both bulky chemicals and substantial time before an image can be analyzed. Processing of dry silver film, although simple in concept, is not so simple when reduced to practice. The main concern is the effect of film temperature gradients on uniformity of optical film density. RCA has developed two thermal processors, different in implementation but based on the same philosophy. Pressurized air is directed to both sides of the film to support the film and to conduct the heat to the film. Porous graphite is used as the medium through which heat and air are introduced. The initial thermal processor was designed to process 9.5-inch-wide film moving at speeds ranging from 0.0034 to 0.008 inch per second. The processor configuration was curved to match the plane generated by the laser recording beam. The second thermal processor was configured to process 5-inch-wide film moving at a continuously variable rate ranging from 0.15 to 3.5 inches per second. Due to field flattening optics used in this laser recorder, the required film processing area was plane. In addition, this processor was sectioned in the direction of film motion, giving the processor the capability of varying both temperature and effective processing area.
Image matrix processor for fast multi-dimensional computations
Roberson, George P.; Skeate, Michael F.
1996-01-01
An apparatus for multi-dimensional computation which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination.
The UA1 upgrade calorimeter trigger processor
International Nuclear Information System (INIS)
Bains, M.; Charleton, D.; Ellis, N.; Garvey, J.; Gregory, J.; Jimack, M.P.; Jovanovic, P.; Kenyon, I.R.; Baird, S.A.; Campbell, D.; Cawthraw, M.; Coughlan, J.; Flynn, P.; Galagedera, S.; Grayer, G.; Halsall, R.; Shah, T.P.; Stephens, R.; Biddulph, P.; Eisenhandler, E.; Fensome, I.F.; Landon, M.; Robinson, D.; Oliver, J.; Sumorok, K.
1990-01-01
The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no dead time. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (orig.)
The vector and parallel processing of MORSE code on Monte Carlo Machine
International Nuclear Information System (INIS)
Hasegawa, Yukihiro; Higuchi, Kenji.
1995-11-01
MORSE, a multi-group Monte Carlo code for particle transport, is modified for high performance computing on the Monte Carlo machine Monte-4. The method and the results are described. Monte-4 was specially developed to realize high performance computing of Monte Carlo codes for particle transport, for which high performance has been difficult to obtain in vector processing on conventional vector processors. Monte-4 has four vector processor units with special hardware called Monte Carlo pipelines. The vectorization and parallelization of the MORSE code and the performance evaluation on Monte-4 are described. (author)
Truong, Tuan; Brack, Gary L.; Troy, Mitchell; Trinh, Thang; Shi, Fang; Dekany, Richard G.
2003-02-01
Adaptive optics (AO) systems currently under investigation will require at least a two orders of magnitude increase in the number of actuators, which in turn translates to effectively a 10⁴ increase in compute latency. Since the performance of an AO system invariably improves as the compute latency decreases, it is important to study how today's computer systems will scale to address this expected increase in actuator utilization. This paper answers this question by characterizing the performance of a single deformable mirror (DM) Shack-Hartmann natural guide star AO system implemented on the present-generation digital signal processor (DSP) TMS320C6701 from Texas Instruments. We derive the compute latency of such a system in terms of a few basic parameters, such as the number of DM actuators, the number of data channels used to read out the camera pixels, the number of DSPs, the available memory bandwidth, as well as the inter-processor communication (IPC) bandwidth and the pixel transfer rate. We show how the results would scale for future systems that utilize multiple DMs and guide stars. We demonstrate that the principal performance bottleneck of such a system is the available memory bandwidth of the processors and, to a lesser extent, the IPC bandwidth. This paper concludes with suggestions for mitigating this bottleneck.
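The quadratic scaling of latency with actuator count follows if the wavefront reconstruction is treated as a dense matrix-vector multiply whose matrix must be streamed from memory each frame. The Python sketch below is a rough back-of-the-envelope model under that assumption, not the latency expression derived in the paper; the function name, the 4-byte element size, and all numerical values are hypothetical.

```python
def reconstructor_latency(n_actuators, n_slopes, n_dsps,
                          mem_bandwidth_bytes_per_s, bytes_per_element=4):
    """Estimate the time to stream the reconstruction matrix once.

    Assumes a memory-bound dense matrix-vector multiply
    (n_actuators x n_slopes) split evenly over n_dsps processors,
    ignoring pixel readout and inter-processor communication.
    """
    matrix_bytes = n_actuators * n_slopes * bytes_per_element
    return matrix_bytes / (n_dsps * mem_bandwidth_bytes_per_s)


# Hypothetical baseline and a system with 100x more actuators; the number
# of slope measurements is assumed to grow in proportion to the actuators.
baseline = reconstructor_latency(250, 500, 1, 400e6)
scaled = reconstructor_latency(25_000, 50_000, 1, 400e6)

print(f"baseline latency: {baseline * 1e3:.3f} ms")
print(f"scaled latency:   {scaled:.3f} s")
print(f"ratio:            {scaled / baseline:.0f}")
```

In this simplified model, adding DSPs reduces latency linearly, so offsetting a hundredfold increase in actuators would take on the order of ten thousand times the aggregate memory bandwidth.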
Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform
Xu, S.; Xue, W.; Lin, H.X.
2011-01-01
In this article, we discuss the performance modeling and optimization of Sparse Matrix-Vector Multiplication (SpMV) on NVIDIA GPUs using CUDA. SpMV has a very low computation-data ratio and its performance is mainly bound by the memory bandwidth. We propose optimization of SpMV based on ELLPACK from
Effect of processor temperature on film dosimetry
International Nuclear Information System (INIS)
Srivastava, Shiv P.; Das, Indra J.
2012-01-01
Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (d_max, 10 × 10 cm², 100 cm) to a given dose. An automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4–40.6°C (85–105°F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.
Fully-differential NNLO predictions for vector-boson pair production with MATRIX
Wiesemann, Marius; Kallweit, Stefan; Rathlev, Dirk
2016-01-01
We review the computations of the next-to-next-to-leading order (NNLO) QCD corrections to vector-boson pair production processes in proton–proton collisions and their implementation in the numerical code MATRIX. Our calculations include the leptonic decays of W and Z bosons, consistently taking into account all spin correlations, off-shell effects and non-resonant contributions. For massive vector-boson pairs we show inclusive cross sections, applying the respective mass windows chosen by ATLAS and CMS to define Z bosons from their leptonic decay products, as well as total cross sections for stable bosons. Moreover, we provide samples of differential distributions in fiducial phase-space regions inspired by typical selection cuts used by the LHC experiments. For the vast majority of measurements, the inclusion of NNLO corrections significantly improves the agreement of the Standard Model predictions with data.
Samlan, C T; Viswanathan, Nirmal K
2018-01-31
An electric field applied perpendicular to the direction of propagation of a paraxial beam through an optical crystal dynamically modifies the spin-orbit interaction (SOI), leading to the demonstration of a controllable spin-Hall effect of light (SHEL). The electro- and piezo-optic effects of the crystal modify the radially symmetric spatial variation in the fast-axis orientation of the crystal, resulting in a complex pattern with different topologies due to the symmetry-breaking effect of the applied field. This introduces a spatially-varying Pancharatnam-Berry type geometric phase onto the paraxial beam of light, leading to the observation of SHEL in addition to the spin-to-vortex conversion. A wave-vector resolved conoscopic Mueller matrix measurement and analysis provides a first glimpse of the SHEL in the biaxial crystal, identified via the appearance of weak circular birefringence. The emergence of field-controllable fast-axis orientation of the crystal and the resulting SHEL provides a new degree of freedom for affecting and controlling the spin and orbital angular momentum of photons to unravel the rich underlying physics of optical crystals and aid in the development of active photonic spin-Hall devices.
Monte Carlo photon transport on shared memory and distributed memory parallel processors
International Nuclear Information System (INIS)
Martin, W.R.; Wan, T.C.; Abdel-Rahman, T.S.; Mudge, T.N.; Miura, K.
1987-01-01
Parallelized Monte Carlo algorithms for analyzing photon transport in an inertially confined fusion (ICF) plasma are considered. Algorithms were developed for shared memory (vector and scalar) and distributed memory (scalar) parallel processors. The shared memory algorithm was implemented on the IBM 3090/400, and timing results are presented for dedicated runs with two, three, and four processors. Two alternative distributed memory algorithms (replication and dispatching) were implemented on a hypercube parallel processor (1 through 64 nodes). The replication algorithm yields essentially full efficiency for all cube sizes; with the 64-node configuration, the absolute performance is nearly the same as with the CRAY X-MP. The dispatching algorithm also yields efficiencies above 80% in a large simulation for the 64-processor configuration
Synthesis and characterization of optically transparent epoxy matrix nanocomposites
International Nuclear Information System (INIS)
Esposito Corcione, C.; Manera, M.G.; Maffezzoli, A.; Rella, R.
2009-01-01
In this work optically transparent nanocomposites were prepared and characterized from an optical and morphological point of view. An organically modified boehmite was added at different concentrations in a diglycidyl ether of bisphenol A (DGEBA) epoxy matrix, hardened with a polyether diamine. Nanocomposites were characterized structurally by X-ray diffraction (XRD), optically by UV-Vis-NIR spectrophotometry and their morphology was investigated by Atomic Force Microscopy (AFM). Morphological investigation reveals the presence of boehmite particles dispersed in the epoxy matrix in different dimensions ranging from ten to hundreds of nanometers; some aggregation in the particles is the tendency noticed in the AFM images. The acquisition of multiple AFM images in different areas of the sample was used for a statistical analysis of the volumetric distribution of boehmite aggregates. The obtained result, (3.6 ± 0.3)%vol, is well comparable to thermogravimetric analysis.
Vectorization of the KENO V.a criticality safety code
International Nuclear Information System (INIS)
Hollenbach, D.F.; Dodds, H.L.; Petrie, L.M.
1991-01-01
The development of the vector processor, which is used in the current generation of supercomputers and is beginning to be used in workstations, provides the potential for dramatic speed-up for codes that are able to process data as vectors. Unfortunately, the stochastic nature of Monte Carlo codes prevents the old scalar versions of these codes from taking advantage of vector processors. New Monte Carlo algorithms that process all the histories undergoing the same event as a batch are required. Recently, new vectorized Monte Carlo codes have been developed that show significant speed-ups when compared to their own scalar versions or to equivalent codes. This paper discusses the vectorization of an already existing and widely used criticality safety code, KENO V.a. All the changes made to KENO V.a are transparent to the user, making it possible to upgrade from the standard scalar version of KENO V.a to the vectorized version without learning a new code.
Ouyang, J; Perrie, W; Allegre, O J; Heil, T; Jin, Y; Fearon, E; Eckford, D; Edwardson, S P; Dearden, G
2015-05-18
Precise tailoring of optical vector beams is demonstrated, shaping their focal electric fields, and is used to create complex laser micro-patterning on a metal surface. A Spatial Light Modulator (SLM) and a micro-structured S-waveplate were integrated with a picosecond laser system and employed to structure the vector fields into radial and azimuthal polarizations with and without a vortex phase wavefront, as well as superposition states. Imprinting Laser Induced Periodic Surface Structures (LIPSS) elucidates the detailed vector fields around the focal region. In addition to clear azimuthal and radial plasmon surface structures, unique, variable logarithmic spiral micro-structures with a pitch Λ ∼ 1 μm, not observed previously, were imprinted on the surface, confirming unambiguously the complex 2D focal electric fields. We also show clearly how the Orbital Angular Momentum (OAM) associated with a helical wavefront induces rotation of vector fields along the optic axis of a focusing lens, as confirmed by the observed surface micro-structures.
Vector method for strain estimation in phase-sensitive optical coherence elastography
Matveyev, A. L.; Matveev, L. A.; Sovetsky, A. A.; Gelikonov, G. V.; Moiseev, A. A.; Zaitsev, V. Y.
2018-06-01
A noise-tolerant approach to strain estimation in phase-sensitive optical coherence elastography, robust to decorrelation distortions, is discussed. The method is based on evaluation of interframe phase-variation gradient, but its main feature is that the phase is singled out at the very last step of the gradient estimation. All intermediate steps operate with complex-valued optical coherence tomography (OCT) signals represented as vectors in the complex plane (hence, we call this approach the ‘vector’ method). In comparison with such a popular method as least-square fitting of the phase-difference slope over a selected region (even in the improved variant with amplitude weighting for suppressing small-amplitude noisy pixels), the vector approach demonstrates superior tolerance to both additive noise in the receiving system and speckle-decorrelation caused by tissue straining. Another advantage of the vector approach is that it obviates the usual necessity of error-prone phase unwrapping. Here, special attention is paid to modifications of the vector method that make it especially suitable for processing deformations with significant lateral inhomogeneity, which often occur in real situations. The method’s advantages are demonstrated using both simulated and real OCT scans obtained during reshaping of a collagenous tissue sample irradiated by an IR laser beam producing complex spatially inhomogeneous deformations.
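The idea described in this abstract, keeping the OCT signal complex through every intermediate step and extracting the phase only at the end, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `vector_phase_gradient`, the non-overlapping averaging window, and the array layout (depth along axis 0) are assumptions made for illustration only.

```python
import numpy as np

def vector_phase_gradient(frame_ref, frame_def, window=(8, 8)):
    """Estimate the axial gradient of the interframe phase variation.

    frame_ref, frame_def: complex OCT frames of equal shape (depth, lateral).
    Returns the gradient in radians per pixel, one value per averaging window.
    """
    # Interframe phase variation, kept as a complex vector (no angle taken yet).
    b = frame_def * np.conj(frame_ref)
    # Product of axially adjacent pixels: its argument is the local phase gradient.
    g = b[1:, :] * np.conj(b[:-1, :])
    wz, wx = window
    nz, nx = g.shape[0] // wz, g.shape[1] // wx
    # Sum the complex vectors in each window; strong pixels dominate the sum,
    # so amplitude weighting is implicit.
    g = g[:nz * wz, :nx * wx].reshape(nz, wz, nx, wx).sum(axis=(1, 3))
    # The phase is singled out only at the very last step.
    return np.angle(g)
```

Because only the argument of a summed complex product is taken, the accumulated interframe phase may exceed 2π over depth without any unwrapping step, provided the gradient per pixel stays within (−π, π].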
Analysis of Optical Fiber Complex Propagation Matrix on the Basis of Vortex Modes
DEFF Research Database (Denmark)
Lyubopytov, Vladimir S.; Tatarczak, Anna; Lu, Xiaofeng
2016-01-01
We propose and experimentally demonstrate a novel method for reconstruction of the complex propagation matrix of optical fibers supporting propagation of multiple vortex modes. This method is based on the azimuthal decomposition approach and allows the complex matrix elements to be determined...... by direct calculations. We apply the proposed method to demonstrate the feasibility of optical compensation for coupling between vortex modes in optical fiber....
Structuring Stokes correlation functions using vector-vortex beam
Kumar, Vijay; Anwar, Ali; Singh, R. P.
2018-01-01
Higher order statistical correlations of the optical vector speckle field, formed due to scattering of a vector-vortex beam, are explored. Here, we report on the experimental construction of the Stokes parameters covariance matrix, consisting of all possible spatial Stokes parameters correlation functions. We also propose and experimentally realize a new class of Stokes correlation functions, called Stokes field autocorrelation functions. It is observed that the Stokes correlation functions of the vector-vortex beam are reflected in the respective Stokes correlation functions of the corresponding vector speckle field. The major advantage of the proposed Stokes correlation functions is that they can be easily tuned by manipulating the polarization of the vector-vortex beam used to generate the vector speckle field, and that the phase information can be obtained directly from intensity measurements. Moreover, this approach leads to a complete experimental Stokes characterization of a broad range of random fields.
High-dimensional statistical inference: From vector to matrix
Zhang, Anru
Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary yet leads to sharp results. It is shown that, in compressed sensing, for any ε > 0, the conditions δ_k^A < 1/3 + ε, δ_k^A + θ_{k,k}^A < 1 + ε, or δ_{tk}^A < √((t − 1)/t) + ε are not sufficient to guarantee the exact recovery of all k-sparse signals for large k. A similar result also holds for matrix recovery. In addition, the conditions δ_k^A < 1/3, δ_k^A + θ_{k,k}^A < 1, δ_{tk}^A < √((t − 1)/t) and δ_r^M < 1/3, δ_r^M + θ_{r,r}^M < 1, δ_{tr}^M < √((t − 1)/t) are also shown to be sufficient, respectively, for stable recovery of approximately sparse signals and low-rank matrices in the noisy case. For the second part of the thesis, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The
CW-THz vector spectroscopy and imaging system based on 1.55-µm fiber-optics.
Kim, Jae-Young; Song, Ho-Jin; Yaita, Makoto; Hirata, Akihiko; Ajito, Katsuhiro
2014-01-27
We present a continuous-wave terahertz (THz) vector spectroscopy and imaging system based on a 1.5-µm fiber optic uni-traveling-carrier photodiode and InGaAs photo-conductive receiver. Using electro-optic (EO) phase modulators for THz phase control with shortened optical paths, the system achieves fast vector measurement with effective phase stabilization. Dynamic ranges of 100 dB · Hz and 75 dB · Hz at 300 GHz and 1 THz, and phase stability of 1.5° per minute are obtained. With the simultaneous measurement of absorbance and relative permittivity, we demonstrate non-destructive analyses of pharmaceutical cocrystals inside tablets within a few minutes.
Kreutzer, Moritz; Hager, Georg; Wellein, Gerhard; Fehske, Holger; Basermann, Achim; Bishop, Alan R.
2011-01-01
Sparse matrix-vector multiplication (spMVM) is the dominant operation in many sparse solvers. We investigate performance properties of spMVM with matrices of various sparsity patterns on the nVidia “Fermi” class of GPGPUs. A new “padded jagged diagonals storage” (pJDS) format is proposed which may substantially reduce the memory overhead intrinsic to the widespread ELLPACK-R scheme while making no assumptions about the matrix structure. In our test scenarios the pJDS format cuts the ...
Energy Technology Data Exchange (ETDEWEB)
Aktulga, Hasan Metin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Yang, Chao [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
2014-08-14
Obtaining highly accurate predictions of the properties of light atomic nuclei using the configuration interaction (CI) approach requires computing a few extremal eigenpairs of the many-body nuclear Hamiltonian matrix. In the Many-body Fermion Dynamics for nuclei (MFDn) code, a block eigensolver is used for this purpose. Due to the large size of the sparse matrices involved, a significant fraction of the time spent on the eigenvalue computations is associated with the multiplication of a sparse matrix (and the transpose of that matrix) with multiple vectors (SpMM and SpMM-T). Existing implementations of SpMM and SpMM-T significantly underperform expectations. Thus, in this paper, we present and analyze optimized implementations of SpMM and SpMM-T. We base our implementation on the compressed sparse blocks (CSB) matrix format and target systems with multi-core architectures. We develop a performance model that allows us to understand and estimate the performance characteristics of our SpMM kernel implementations, and demonstrate the efficiency of our implementation on a series of real-world matrices extracted from MFDn. In particular, we obtain a 3-4x speedup on the requisite operations over good implementations based on the commonly used compressed sparse row (CSR) matrix format. The improvements in the SpMM kernel suggest we may attain roughly a 40% speedup in the overall execution time of the block eigensolver used in MFDn.
Sparse Matrix-Vector Multiplication on Multicore and Accelerators
Energy Technology Data Exchange (ETDEWEB)
Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Bell, Nathan [NVIDIA Research, Santa Clara, CA (United States); Choi, Jee Whan [Georgia Inst. of Technology, Atlanta, GA (United States); Garland, Michael [NVIDIA Research, Santa Clara, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Vuduc, Richard [Georgia Inst. of Technology, Atlanta, GA (United States)
2010-12-07
This chapter consolidates recent work on the development of high performance multicore and accelerator-based implementations of sparse matrix-vector multiplication (SpMV). As an object of study, SpMV is an interesting computation for two key reasons. First, it appears widely in applications in scientific and engineering computing, financial and economic modeling, and information retrieval, among others, and is therefore of great practical interest. Second, it is simple to describe but challenging to implement well, since its performance is limited by a variety of factors, including low computational intensity, potentially highly irregular memory access behavior, and a strong input dependence that may be known only at run time. Thus, we believe SpMV is both practically important and a source of important insights into the algorithmic and implementation principles necessary for making effective use of state-of-the-art systems.
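To make the performance factors named in this abstract concrete, here is a minimal reference SpMV over the compressed sparse row (CSR) format. It is an illustrative sketch rather than code from the chapter; the function name `csr_spmv` and the small example matrix are assumptions.

```python
import numpy as np

def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values:  nonzero entries, row by row
    col_idx: column index of each nonzero
    row_ptr: row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    y = np.zeros(len(row_ptr) - 1, dtype=np.result_type(values, x))
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        # Gather from x through col_idx: the access pattern depends on the
        # sparsity structure, which is only known once the matrix is loaded.
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# Example matrix (assumed for illustration):
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
x = np.array([1.0, 2.0, 3.0])
y = csr_spmv(values, col_idx, row_ptr, x)
```

Each nonzero contributes a single multiply-add yet requires reading a value, a column index, and an indirectly addressed element of `x`, which is why the kernel tends to be limited by memory traffic rather than arithmetic.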
Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA
Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei
2013-03-01
With the development of international wireless communication standards, the computational requirements for baseband signal processors are increasing. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to their high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as a promising technology to replace traditional ASIC and FPGA approaches. In addition, computation-intensive functions process large amounts of parallel data, which fosters the development of single instruction multiple data (SIMD) architectures in SDR platforms. A new way must therefore be found to prototype SDR processors efficiently. In this paper we present a bit- and cycle-accurate model of programmable SIMD SDR processors in the machine description language LISA. LISA is a language for instruction set architectures which enables rapid modeling at the architectural level. In order to evaluate our proposed processor, three common baseband functions (FFT, FIR digital filter and matrix multiplication) have been mapped onto the SDR platform. Analytical results showed that the SDR processor achieved a maximum performance boost of 47.1% relative to the competing processor.
Off-diagonal helicity density matrix elements for vector mesons produced at LEP
International Nuclear Information System (INIS)
Anselmino, M.; Bertini, M.; Quintairos, P.
1997-05-01
Final state q q-bar interactions may give rise to nonzero values of the off-diagonal element ρ1,-1 of the helicity density matrix of vector mesons produced in e+e- annihilations, as confirmed by recent OPAL data on φ and D*'s. Predictions are given for ρ1,-1 of several mesons produced at large z and small p_T, collinear with the parent jet; the values obtained for φ and D* are in agreement with data. (author)
EISPACK, Subroutines for Eigenvalues, Eigenvectors, Matrix Operations
International Nuclear Information System (INIS)
Garbow, Burton S.; Cline, A.K.; Meyering, J.
1993-01-01
1 - Description of problem or function: EISPACK3 is a collection of 75 FORTRAN subroutines, both single- and double-precision, that compute the eigenvalues and eigenvectors of nine classes of matrices. The package can determine the eigensystem of complex general, complex Hermitian, real general, real symmetric, real symmetric band, real symmetric tridiagonal, special real tridiagonal, generalized real, and generalized real symmetric matrices. In addition, there are two routines which use the singular value decomposition to solve certain least squares problems. The individual subroutines are - Identification/Description: BAKVEC: Back transform vectors of matrix formed by FIGI; BALANC: Balance a real general matrix; BALBAK: Back transform vectors of matrix formed by BALANC; BANDR: Reduce sym. band matrix to sym. tridiag. matrix; BANDV: Find some vectors of sym. band matrix; BISECT: Find some values of sym. tridiag. matrix; BQR: Find some values of sym. band matrix; CBABK2: Back transform vectors of matrix formed by CBAL; CBAL: Balance a complex general matrix; CDIV: Perform division of two complex quantities; CG: Driver subroutine for a complex general matrix; CH: Driver subroutine for a complex Hermitian matrix; CINVIT: Find some vectors of complex Hess. matrix; COMBAK: Back transform vectors of matrix formed by COMHES; COMHES: Reduce complex matrix to complex Hess. (elementary); COMLR: Find all values of complex Hess. matrix (LR); COMLR2: Find all values/vectors of cmplx Hess. matrix (LR); COMQR: Find all values of complex Hessenberg matrix (QR); COMQR2: Find all values/vectors of cmplx Hess. matrix (QR); CORTB: Back transform vectors of matrix formed by CORTH; CORTH: Reduce complex matrix to complex Hess. (unitary); CSROOT: Find square root of complex quantity; ELMBAK: Back transform vectors of matrix formed by ELMHES; ELMHES: Reduce real matrix to real Hess. (elementary); ELTRAN: Accumulate transformations from ELMHES (for HQR2); EPSLON: Estimate unit roundoff
Vectorization with SIMD extensions speeds up reconstruction in electron tomography.
Agulleiro, J I; Garzón, E M; García, I; Fernández, J J
2010-06-01
Electron tomography allows structural studies of cellular structures at molecular detail. Large 3D reconstructions are needed to meet the resolution requirements. The processing time to compute these large volumes may be considerable, and so high performance computing techniques have traditionally been used. This work presents a vector approach to tomographic reconstruction that relies on the exploitation of the SIMD extensions available in modern processors in combination with other single-processor optimization techniques. This approach succeeds in producing full resolution tomograms with a significant reduction in processing time, as evaluated with the most common reconstruction algorithms, namely WBP and SIRT. The main advantage stems from the fact that this approach runs on standard computers without the need for specialized hardware, which facilitates the development, use and management of programs. Future trends in processor design open excellent opportunities for vector processing with processors' SIMD extensions in the field of 3D electron microscopy.
Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi
1994-01-01
An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
Xue, Min; Pan, Shilong; Zhao, Yongjiu
2015-02-15
A novel optical vector network analyzer (OVNA) based on optical single-sideband (OSSB) modulation and balanced photodetection is proposed and experimentally demonstrated, which can eliminate the measurement error induced by the high-order sidebands in the OSSB signal. According to the analytical model of the conventional OSSB-based OVNA, if the optical carrier in the OSSB signal is fully suppressed, the measurement result is exactly the high-order-sideband-induced measurement error. By splitting the OSSB signal after the optical device-under-test (ODUT) into two paths, removing the optical carrier in one path, and then detecting the two signals in the two paths using a balanced photodetector (BPD), high-order-sideband-induced measurement error can be ideally eliminated. As a result, accurate responses of the ODUT can be achieved without complex post-signal processing. A proof-of-concept experiment is carried out. The magnitude and phase responses of a fiber Bragg grating (FBG) measured by the proposed OVNA with different modulation indices are superimposed, showing that the high-order-sideband-induced measurement error is effectively removed.
Ewing, Anthony
2005-01-01
Solar sail propulsive performance is dependent on sail membrane optical properties and on sail membrane shape. Assumptions of an ideal sail (flat, perfect reflector) can result in errors which can affect spacecraft control, trajectory analyses, and overall evaluation of solar sail performance. A MATLAB(R) program has been developed to generate sail shape point cloud files for two square-architecture solar sail designs. Simple parabolic profiles are assumed for sail shape under solar pressure loading. These files are then input into the Solar Vectoring Evaluation Tool (SVET) software to determine the propulsive force vector, center of pressure, and moments about the sail body axes as a function of sail shape and optical properties. Also, the impact of the center-line angle, due to non-perfect optical properties, is addressed since this constrains sail force vector cone angle and is often overlooked when assuming ideal-reflector membranes. Preliminary sensitivity analysis using these tools aids in determining the key geometric and optical parameters that drive solar sail propulsive performance.
Quality assurance through constancy control for X-ray film processors
International Nuclear Information System (INIS)
Weberling, R.
1982-01-01
A control method to check the reproducibility of X-ray film processors, together with the necessary instruments, is presented. The application of a light sensitometer allows the production of test films daily, independent of X-ray exposures, X-ray film cassettes and X-ray intensifying screens. The optical densities on the test films are read by means of a densitometer and the results are plotted on a special control chart. Optical density limits of ±0.15 for the Speed Index and ±0.20 for the Contrast Index define the tolerance for X-ray film processors. The aims of this control method are uniform image quality, dose reduction and cost savings. (orig.)
DEFF Research Database (Denmark)
Zhang, Guanguan; Yang, Jian; Yang, Yongheng
2017-01-01
In order to reduce the Common-Mode Voltage (CMV) in three-to-five phase indirect matrix converters, three improved Space Vector Pulse Width Modulation (SVPWM) methods are proposed and discussed. The improved modulation schemes are achieved by reorganizing zero vectors from the inversion stage......) in the inversion stage, which results in a large amount of third-order harmonics in output currents. In addition, the method that utilizes two adjacent active current vectors (ACVs) and the method that uses two non-adjacent ACVs in the rectification stage have the same CMV peak value. By contrast, the latter...... achieves a lower Total Harmonic Distortion (THD) level of the output currents. Simulation results verify the effectiveness of the proposed methods....
Effect of a spiral phase on a vector optical field with hybrid polarization states
International Nuclear Information System (INIS)
Chen, Rui-Pin; Zhao, Tingyu; Zhong, Li-Xin; Chew, Khian-Hooi; Gu, Bing; Zhou, Guoquan
2015-01-01
The propagation dynamics of a vector field with inhomogeneous states of polarization (SoP) and an imposed vortex is studied using the angular spectrum method. The evolution of the SoP in the cross section of the field during propagation is analyzed numerically by means of the Stokes polarization parameters. The results indicate that the SoP in the field cross section rotates about the propagation axis during propagation due to the existence of a vortex. In addition, the interaction between the phase singularity and the polarization singularity leads to the creation or annihilation of the optical field in the central region. In particular, the distributions of the transverse energy flow and of both the spin and orbital optical angular momentum fluxes in the cross section of the vortex vector optical field depend sensitively on both the vortex and polarization topological charges. (paper)
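The Stokes polarization parameters used for this kind of analysis can be computed point by point from the two transverse field components. The sketch below is illustrative only and is not taken from the paper; the function name `stokes_parameters` is an assumption, and the sign of S3 follows one common convention (others flip it, which swaps the handedness labels).

```python
import numpy as np

def stokes_parameters(ex, ey):
    """Stokes parameters of a fully polarized field with complex components ex, ey.

    Works elementwise, so ex and ey may be 2-D arrays sampled over the beam
    cross section.
    """
    s0 = np.abs(ex) ** 2 + np.abs(ey) ** 2   # total intensity
    s1 = np.abs(ex) ** 2 - np.abs(ey) ** 2   # horizontal vs. vertical
    s2 = 2 * np.real(np.conj(ex) * ey)       # +45 deg vs. -45 deg
    s3 = 2 * np.imag(np.conj(ex) * ey)       # circular (sign is convention dependent)
    return s0, s1, s2, s3
```

Evaluating these maps on successive planes of a numerically propagated field gives the evolution of the polarization pattern; for a fully polarized field, s0**2 equals s1**2 + s2**2 + s3**2 at every point.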
International Nuclear Information System (INIS)
Qin Yiqiang
2006-01-01
A dual-periodic structure for quasi-phase-matching cascaded optical parametric interactions is proposed. Due to the coupling of reciprocal vectors between the original and the imposed periodic sequence, the reciprocal vectors and the corresponding effective nonlinear coefficients are no longer a simple combination of those of the two periodic structures. A new analytical expression for the effective nonlinear coefficients is derived. The degeneracy phenomena and the novel extinction rule resulting from the coupling of reciprocal vectors are identified and investigated. The corresponding physical nature is also discussed.
Image Jacobian Matrix Estimation Based on Online Support Vector Regression
Directory of Open Access Journals (Sweden)
Shangqin Mao
2012-10-01
Research into robotic visual servoing is an important area in the field of robotics. It has proven difficult to achieve successful results for machine vision and robotics in unstructured environments without using any a priori camera or kinematic models. In uncalibrated visual servoing, image Jacobian matrix estimation methods can be divided into two groups: online methods and offline methods. The offline method is not appropriate for most natural environments. The online method is robust but rough. Moreover, if the image feature configuration changes, the approximation procedure must be restarted. A novel approach based on an online support vector regression (OL-SVR) algorithm is proposed, which overcomes the drawbacks and combines the virtues just mentioned.
Ultrafast Optics: Vector Cavity Laser - Physics and Technology
2016-06-14
with a quasi-vector cavity both numerically and experimentally. It is expected that through the study a deep and comprehensive understanding on the...799-801, Jun. 1997. 31. L. M. Zhao, D. Y. Tang, J. Wu, X. Q. Fu, and S. C. Wen, "Noise-like pulse in a gain-guided soliton fiber laser," Opt...solitons in a ring fiber laser," Optics Communications 281 (22), 5614 (2008). 110. L. M. Zhao, D. Y. Tang, J. Wu, X. Q. Fu, and S. C. Wen, "Noise-like
Ultrafast Optics - Vector Cavity Lasers: Physics and Technology
2016-06-14
with a quasi-vector cavity both numerically and experimentally. It is expected that through the study a deep and comprehensive understanding on the...799-801, Jun. 1997. 31. L. M. Zhao, D. Y. Tang, J. Wu, X. Q. Fu, and S. C. Wen, "Noise-like pulse in a gain-guided soliton fiber laser," Opt...solitons in a ring fiber laser," Optics Communications 281 (22), 5614 (2008). 110. L. M. Zhao, D. Y. Tang, J. Wu, X. Q. Fu, and S. C. Wen, "Noise-like
Spatial optical (2+1)-dimensional scalar- and vector-solitons in saturable nonlinear media
Energy Technology Data Exchange (ETDEWEB)
Weilnau, C.; Traeger, D.; Schroeder, J.; Denz, C. [Institute of Applied Physics, Westfaelische Wilhelms-Universitaet Muenster, Corrensstr. 2/4, 48149 Muenster (Germany); Ahles, M.; Petter, J. [Institute of Applied Physics, Technische Universitaet Darmstadt, Hochschulstr. 6, 64289 Darmstadt (Germany)
2002-10-01
(2+1)-dimensional optical spatial solitons have become a major field of research in nonlinear physics throughout the last decade due to their potential in adaptive optical communication technologies. With the help of photorefractive crystals that supply the required type of nonlinearity for soliton generation, we are able to demonstrate experimentally the formation, the dynamic properties, and especially the interaction of solitary waves, which were so far only known from general soliton theory. Among the complex interaction scenarios of scalar solitons, we reveal a distinct behavior denoted as anomalous interaction, which is unique in soliton-supporting systems. Further on, we realize highly parallel, light-induced waveguide configurations based on photorefractive screening solitons that give rise to technical applications towards waveguide couplers and dividers as well as all-optical information processing devices where light is controlled by light itself. Finally, we demonstrate the generation, stability and propagation dynamics of multi-component or vector solitons, multipole transverse optical structures bearing a complex geometry. In analogy to the particle-light dualism of scalar solitons, various types of vector solitons can - in a broader sense - be interpreted as molecules of light. (Abstract Copyright [2002], Wiley Periodicals, Inc.)
Optical-limiting response of rare-earth metallo-phthalocyanine-doped copolymer matrix
Aneeshkumar, B.N.; Gopinath, P.; Vallabhan, C.P.G.; Nampoori, V.P.N.; Radhakrishnan, P.; Thomas, J.
2003-01-01
The nanosecond optical-limiting characteristics (at 532 nm) of some rare-earth metallo-phthalocyanines (Sm(Pc)2, Eu(Pc)2, and LaPc) doped in a copolymer matrix of poly(methyl methacrylate) and methyl 2-cyanoacrylate were studied for the first time to the authors' knowledge. The optical-limiting response is
Analysis of Few-Mode Multi-Core Fiber Splice Behavior Using an Optical Vector Network Analyzer
DEFF Research Database (Denmark)
Rommel, Simon; Mendinueta, Jose Manuel Delgado; Klaus, Werner
2017-01-01
The behavior of splices in a 3-mode 36-core fiber is analyzed using optical vector network analysis. Time-domain response analysis confirms splices may cause significant mode-mixing, while frequency-domain analysis shows splices may affect system level mode-dependent loss both positively and negatively.
A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors
DEFF Research Database (Denmark)
Liu, Weifeng; Vinter, Brian
2015-01-01
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle...... extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data...... memory space and efficiently utilizes the very limited on-chip scratchpad memory. Parallel insert operations of the nonzero entries are implemented through the GPU merge path algorithm that is experimentally found to be the fastest GPU merge approach. Load balancing builds on the number of necessary...
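The first difficulty listed in this abstract, that the number of nonzeros in the result is not known in advance, is visible even in a simple sequential formulation. The sketch below uses a row-by-row product with a hash-map accumulator; this is a deliberately swapped-in textbook technique, not the GPU merge-path scheme the abstract describes, and the function name `spgemm_rowwise` and the list-of-dicts storage are assumptions chosen for brevity.

```python
def spgemm_rowwise(a_rows, b_rows):
    """Compute C = A @ B for sparse matrices stored as one {column: value}
    dict per row."""
    c_rows = []
    for a_row in a_rows:
        acc = {}  # number of entries in this output row is unknown until done
        for k, a_val in a_row.items():
            # Scale row k of B by A[i, k] and merge it into the accumulator;
            # each update is an insert or an add at an unpredictable position.
            for j, b_val in b_rows[k].items():
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        c_rows.append(acc)
    return c_rows

# Example (assumed for illustration): A = [[1, 2], [0, 3]], B = [[0, 4], [5, 6]]
A = [{0: 1.0, 1: 2.0}, {1: 3.0}]
B = [{1: 4.0}, {0: 5.0, 1: 6.0}]
C = spgemm_rowwise(A, B)
```

The work per output row depends on which rows of B are touched, so rows can differ widely in cost, which is the load-balancing concern the abstract raises for parallel implementations.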
Optical Flow in a Smart Sensor Based on Hybrid Analog-Digital Architecture
Directory of Open Access Journals (Sweden)
Pablo Guzmán
2010-03-01
The purpose of this study is to develop a motion sensor (delivering optical flow estimations) using a platform that includes the sensor itself, focal plane processing resources, and co-processing resources on a general purpose embedded processor. All this is implemented on a single device as a SoC (System-on-a-Chip). Optical flow is the 2-D projection into the camera plane of the 3-D motion information presented at the world scenario. This motion representation is widespread, well known, and applied in the science community to solve a wide variety of problems. Most applications based on motion estimation require work in real-time; hence, this restriction must be taken into account. In this paper, we show an efficient approach to estimate the motion velocity vectors with an architecture based on a focal plane processor combined on-chip with a 32-bit NIOS II processor. Our approach relies on the simplification of the original optical flow model and its efficient implementation in a platform that combines an analog (focal-plane) and digital (NIOS II) processor. The system is fully functional and is organized in different stages where the early processing (focal plane) stage is mainly focused on pre-processing the input image stream to reduce the computational cost in the post-processing (NIOS II) stage. We present the employed co-design techniques and analyze this novel architecture. We evaluate the system’s performance and accuracy with respect to the different proposed approaches described in the literature. We also discuss the advantages of the proposed approach as well as the degree of efficiency which can be obtained from the focal plane processing capabilities of the system. The final outcome is a low cost smart sensor for optical flow computation with real-time performance and reduced power consumption that can be used for very diverse application domains.
Optical Flow in a Smart Sensor Based on Hybrid Analog-Digital Architecture
Guzmán, Pablo; Díaz, Javier; Agís, Rodrigo; Ros, Eduardo
2010-01-01
The purpose of this study is to develop a motion sensor (delivering optical flow estimations) using a platform that includes the sensor itself, focal plane processing resources, and co-processing resources on a general purpose embedded processor. All this is implemented on a single device as a SoC (System-on-a-Chip). Optical flow is the 2-D projection into the camera plane of the 3-D motion information presented at the world scenario. This motion representation is widespread, well known, and applied in the science community to solve a wide variety of problems. Most applications based on motion estimation require work in real-time; hence, this restriction must be taken into account. In this paper, we show an efficient approach to estimate the motion velocity vectors with an architecture based on a focal plane processor combined on-chip with a 32-bit NIOS II processor. Our approach relies on the simplification of the original optical flow model and its efficient implementation in a platform that combines an analog (focal-plane) and digital (NIOS II) processor. The system is fully functional and is organized in different stages where the early processing (focal plane) stage is mainly focused on pre-processing the input image stream to reduce the computational cost in the post-processing (NIOS II) stage. We present the employed co-design techniques and analyze this novel architecture. We evaluate the system’s performance and accuracy with respect to the different proposed approaches described in the literature. We also discuss the advantages of the proposed approach as well as the degree of efficiency which can be obtained from the focal plane processing capabilities of the system. The final outcome is a low cost smart sensor for optical flow computation with real-time performance and reduced power consumption that can be used for very diverse application domains. PMID:22319283
Wavelength-encoded OCDMA system using opto-VLSI processors.
Aljada, Muhsen; Alameh, Kamal
2007-07-01
We propose and experimentally demonstrate a 2.5 Gbits/s per user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.
Treecode with a Special-Purpose Processor
Makino, Junichiro
1991-08-01
We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
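The division of labour described above can be sketched in a few lines: the host builds one interaction list (node positions and masses) for a whole group of particles, and the backend evaluates the direct sum over that list at every position it is given. The sketch below is an assumption-laden illustration in plain Python, not GRAPE code; the function name `direct_forces`, the softening parameter `eps`, and the choice G = 1 are all hypothetical.

```python
import math


def direct_forces(targets, sources, masses, eps=0.0):
    """Gravitational acceleration (G = 1) at each target from all sources.

    This plays the role of the backend pipeline: it only sums pairwise
    contributions and knows nothing about the tree.
    """
    result = []
    for tx, ty, tz in targets:
        ax = ay = az = 0.0
        for (sx, sy, sz), m in zip(sources, masses):
            dx, dy, dz = sx - tx, sy - ty, sz - tz
            r2 = dx * dx + dy * dy + dz * dz + eps * eps
            if r2 == 0.0:
                continue  # skip a source that coincides with the target
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += m * dx * inv_r3
            ay += m * dy * inv_r3
            az += m * dz * inv_r3
        result.append((ax, ay, az))
    return result


# One interaction list shared by a (hypothetical) group of two particles.
shared_list_positions = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
shared_list_masses = [2.0, 4.0]
group = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
acc = direct_forces(group, shared_list_positions, shared_list_masses)
```

Because the list is built once per group rather than once per particle, the host-side tree traversal is amortized over all particles in the group, at the price of a somewhat longer list for the backend to sum.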
International Nuclear Information System (INIS)
Ganesan, Lakshmi Meena; Wirges, Werner; Gerhard, Reimund; Mellinger, Axel
2010-01-01
Polymer-dispersed liquid crystals (PDLCs) are composite materials that consist of micrometre-sized liquid-crystal (LC) droplets embedded in a polymer matrix. From ferroelectric poly(vinylidene fluoride-trifluoroethylene) (P(VDF-TrFE)) and a nematic LC, PDLC films containing 10 and 60 wt% LC were prepared, and their electro-optical and piezo-optical behaviour was investigated. The electric field that is generated by the application of mechanical stress leads to changes in the transmittance of the PDLC film through a combination of piezoelectric and electro-optical effects. Such a piezo-optical PDLC material may be useful, e.g., in sensing and visualization applications.
DEFF Research Database (Denmark)
Lee, Kyo-Beum; Blaabjerg, Frede
2004-01-01
This paper presents a new sensorless vector control system for high performance induction motor drives fed by a matrix converter with non-linearity compensation. The nonlinear voltage distortion that is caused by commutation delay and on-state voltage drop in switching device is corrected by a new...
DEFF Research Database (Denmark)
Rommel, Simon; Mendinueta, José Manuel Delgado; Klaus, Werner
2017-01-01
This paper discusses spatially diverse optical vector network analysis for space division multiplexing (SDM) component and system characterization, which is becoming essential as SDM is widely considered to increase the capacity of optical communication systems. Characterization of a 108-channel photonic lantern spatial multiplexer, coupled to a 36-core 3-mode fiber, is experimentally demonstrated, extracting the full impulse response and complex transfer function matrices as well as insertion loss (IL) and mode-dependent loss (MDL) data. Moreover, the mode-mixing behavior of fiber splices in the few-mode multi-core fiber and their impact on system IL and MDL are analyzed, finding splices to cause significant mode-mixing and to be non-negligible in system capacity analysis.
Energy Technology Data Exchange (ETDEWEB)
Paulsson, Bjorn N.P. [Paulsson, Inc., Van Nuys, CA (United States); Thornburg, Jon A. [Paulsson, Inc., Van Nuys, CA (United States); He, Ruiqing [Paulsson, Inc., Van Nuys, CA (United States)
2015-04-21
Seismic techniques are the dominant geophysical techniques for the characterization of subsurface structures and stratigraphy. The seismic techniques also dominate the monitoring and mapping of reservoir injection and production processes. Borehole seismology, of all the seismic techniques, despite its current shortcomings, has been shown to provide the highest resolution characterization and most precise monitoring results because it generates higher signal to noise ratio and higher frequency data than surface seismic techniques. The operational environments for borehole seismic instruments are, however, much more demanding than for surface seismic instruments, making both the instruments and the installation much more expensive. The current state-of-the-art borehole seismic instruments have not been robust enough for long term monitoring, compounding the problems with expensive instruments and installations. Furthermore, they have not been able to record the large bandwidth data available in boreholes, nor have they had the sensitivity to record small high frequency micro seismic events with high vector fidelity. To reliably achieve high resolution characterization and long term monitoring of Enhanced Geothermal Systems (EGS) sites, a new generation of borehole seismic instruments must therefore be developed and deployed. To address the critical site characterization and monitoring needs for EGS programs, the US Department of Energy (DOE) funded Paulsson, Inc. in 2010 to develop a fiber optic based ultra-large bandwidth clamped borehole seismic vector array capable of deploying up to one thousand 3C sensor pods suitable for deployment into ultra-high temperature and high pressure boreholes. Tests of the fiber optic seismic vector sensors developed with the DOE funding have shown that the new borehole seismic sensor technology is capable of generating outstanding high vector fidelity data with extremely large bandwidth: 0.01 – 6,000 Hz. Field tests have shown
Golomidov, Y. V.; Li, S. K.; Popov, S. A.; Smolov, V. B.
1986-01-01
After a classification and analysis of electronic and optoelectronic switching devices, the design principles and structure of a matrix optical switch are described. The switching and pair-exclusion operations in this type of switch are examined, and a method for the optical switching of communication channels is elaborated. Finally, attention is given to the structural organization of a parallel computer system with a matrix optical switch.
Video deraining and desnowing using temporal correlation and low-rank matrix completion.
Kim, Jin-Hwan; Sim, Jae-Young; Kim, Chang-Su
2015-09-01
A novel algorithm to remove rain or snow streaks from a video sequence using temporal correlation and low-rank matrix completion is proposed in this paper. Based on the observation that rain streaks are too small and move too fast to affect the optical flow estimation between consecutive frames, we obtain an initial rain map by subtracting temporally warped frames from a current frame. Then, we decompose the initial rain map into basis vectors based on the sparse representation, and classify those basis vectors into rain streak ones and outliers with a support vector machine. We then refine the rain map by excluding the outliers. Finally, we remove the detected rain streaks by employing a low-rank matrix completion technique. Furthermore, we extend the proposed algorithm to stereo video deraining. Experimental results demonstrate that the proposed algorithm detects and removes rain or snow streaks efficiently, outperforming conventional algorithms.
DEFF Research Database (Denmark)
Dridi, Kim; Bjarklev, Anders Overgaard
1999-01-01
An electromagnetic vector-field model for design of optical components based on the finite-difference time-domain method and radiation integrals is presented. Its ability to predict the optical electromagnetic dynamics in structures with complex material distribution is demonstrated. Theoretical...
Investigation of optical currents in coherent and partially coherent vector fields
DEFF Research Database (Denmark)
Angelsky, O. V.; Gorsky, M. P.; Maksimyak, P. P.
2011-01-01
We present the computer simulation results of the spatial distribution of the Poynting vector and illustrate motion of micro and nanoparticles in spatially inhomogeneously polarized fields. The influence of phase relations and the degree of mutual coherence of superimposing waves...... by polarization characteristics of an optical field alone, using nanoscale metallic particles has been shown experimentally....
HTGR core seismic analysis using an array processor
International Nuclear Information System (INIS)
Shatoff, H.; Charman, C.M.
1983-01-01
A Floating Point Systems array processor performs nonlinear dynamic analysis of the high-temperature gas-cooled reactor (HTGR) core with significant time and cost savings. The graphite HTGR core consists of approximately 8000 blocks of various shapes which are subject to motion and impact during a seismic event. Two-dimensional computer programs (CRUNCH2D, MCOCO) can perform explicit step-by-step dynamic analyses of up to 600 blocks for time-history motions. However, use of two-dimensional codes was limited by the large cost and run times required. Three-dimensional analysis of the entire core, or even a large part of it, had been considered totally impractical. Because of the needs of the HTGR core seismic program, a Floating Point Systems array processor was used to enhance computer performance of the two-dimensional core seismic computer programs, MCOCO and CRUNCH2D. This effort began by converting the computational algorithms used in the codes to a form which takes maximum advantage of the parallel and pipeline processors offered by the architecture of the Floating Point Systems array processor. The subsequent conversion of the vectorized FORTRAN coding to the array processor required a significant programming effort to make the system work on the General Atomic (GA) UNIVAC 1100/82 host. These efforts were quite rewarding, however, since the cost of running the codes has been reduced approximately 50-fold and the time threefold. The core seismic analysis with large two-dimensional models has now become routine and extension to three-dimensional analysis is feasible. These codes simulate the one-fifth-scale full-array HTGR core model. This paper compares the analysis with the test results for sine-sweep motion
Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms
Quintin, Jean-Noel
2013-10-01
Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms
Quintin, Jean-Noel; Hasanov, Khalid; Lastovetsky, Alexey
2013-01-01
Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
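As a rough illustration of why SUMMA does not need a square processor count, the following serial sketch simulates the flat (non-hierarchical) algorithm on a `pr` x `pc` grid. It is not HSUMMA and performs no real communication: the row and column broadcasts are only marked in comments, and the function name `summa`, the `panel` width parameter, and the list-of-lists storage are assumptions for illustration.

```python
def summa(A, B, pr, pc, panel=1):
    """Serially simulate SUMMA: C = A @ B on a pr x pc processor grid.

    Processor (r, c) owns one block of C. For each panel of the inner
    dimension it would receive a column panel of A (broadcast along its
    grid row) and a row panel of B (broadcast along its grid column),
    then apply a local rank-`panel` update.
    """
    n, inner, m = len(A), len(B), len(B[0])
    assert n % pr == 0 and m % pc == 0, "blocks must tile C evenly"
    rb, cb = n // pr, m // pc
    C = [[0.0] * m for _ in range(n)]
    for k0 in range(0, inner, panel):
        ks = range(k0, min(k0 + panel, inner))
        for r in range(pr):          # grid row
            for c in range(pc):      # grid column
                # Local update on the block owned by processor (r, c).
                for i in range(r * rb, (r + 1) * rb):
                    for j in range(c * cb, (c + 1) * cb):
                        C[i][j] += sum(A[i][k] * B[k][j] for k in ks)
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [8, 9, 10]]
C = summa(A, B, pr=2, pc=3)  # 6 simulated processors, a nonsquare count
```

In a real implementation the cost of the two broadcasts per panel is what grows with the processor count; the abstract's two-level hierarchy targets exactly that term.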
Architecture and evaluation of software-defined optical switching matrix for hybrid data centers
DEFF Research Database (Denmark)
Mehmeri, Victor; Vegas Olmos, Juan José; Tafur Monroy, Idelfonso
2016-01-01
A software architecture is proposed for hybrid packet/optical data centers employing programmable NETCONF-enabled optical switching matrix, and a performance evaluation is presented comparing hybrid and electrical-only architectures for elephant flows under different traffic patterns. Network...
International Nuclear Information System (INIS)
Asai, Kiyoshi; Shinozawa, Naohisa; Ishikawa, Hirohiko; Chino, Masamichi; Hayashi, Takashi
1983-02-01
Three computer codes, MATHEW and ADPIC of LLNL and GAMPUL of JAERI, for prediction of wind field, concentration and external exposure rate of airborne radioactive materials are vectorized and the results are presented. Using the continuity equation of incompressible flow as a constraint, MATHEW calculates the three-dimensional wind field by a variational method. Using the particle-in-cell method, ADPIC calculates the advection and diffusion of radioactive materials in the three-dimensional wind field and terrain, and gives the concentration of the materials in each cell of the domain. GAMPUL calculates the external exposure rate assuming a Gaussian plume type distribution of concentration. The vectorized code MATHEW attained a 7.8 times speedup on the FACOM230-75 APU vector processor. ADPIC and GAMPUL are estimated to attain 1.5 and 4 times speedup, respectively, on a CRAY-1 type vector processor. (author)
Energy Technology Data Exchange (ETDEWEB)
Singh, Harpreet; Arvind; Dorai, Kavita, E-mail: kavita@iisermohali.ac.in
2016-09-07
Estimation of quantum states is an important step in any quantum information processing experiment. A naive reconstruction of the density matrix from experimental measurements can often give density matrices which are not positive, and hence not physically acceptable. How do we ensure that at all stages of reconstruction, we keep the density matrix positive? Recently a method has been suggested based on maximum likelihood estimation, wherein the density matrix is guaranteed to be positive definite. We experimentally implement this protocol on an NMR quantum information processor. We discuss several examples and compare with the standard method of state estimation. - Highlights: • State estimation using maximum likelihood method was performed on an NMR quantum information processor. • Physically valid density matrices were obtained every time in contrast to standard quantum state tomography. • Density matrices of several different entangled and separable states were reconstructed for two and three qubits.
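One common way to keep a reconstructed density matrix physical at every step of a likelihood search is to parameterize it as ρ = T†T / Tr(T†T) with T lower triangular, so that any real parameter values yield a Hermitian, unit-trace, positive semidefinite matrix. Whether the paper uses exactly this parameterization is an assumption here; the single-qubit sketch below, with the hypothetical names `physical_density_matrix` and `eigenvalues_2x2`, only illustrates the idea.

```python
def physical_density_matrix(t1, t2, t3, t4):
    """Single-qubit rho = T^dagger T / Tr(T^dagger T), T lower triangular.

    t1, t2 are the real diagonal entries of T; t3 + i*t4 is the
    off-diagonal entry. At least one parameter must be nonzero.
    """
    T = [[complex(t1), 0j],
         [complex(t3, t4), complex(t2)]]
    # (T^dagger T)[i][j] = sum_k conj(T[k][i]) * T[k][j]
    M = [[sum(T[k][i].conjugate() * T[k][j] for k in range(2))
          for j in range(2)] for i in range(2)]
    trace = (M[0][0] + M[1][1]).real
    return [[M[i][j] / trace for j in range(2)] for i in range(2)]


def eigenvalues_2x2(rho):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = max(tr * tr - 4.0 * det, 0.0) ** 0.5
    return (tr - disc) / 2.0, (tr + disc) / 2.0


rho = physical_density_matrix(1.0, 1.0, 1.0, -1.0)
low, high = eigenvalues_2x2(rho)
```

A likelihood maximizer can then vary the four real parameters freely, since no choice of them can produce a negative eigenvalue, which is the failure mode of the naive linear inversion mentioned in the abstract.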
gFEX, the ATLAS Calorimeter Level-1 Real Time Processor
AUTHOR|(SzGeCERN)759889; The ATLAS collaboration; Begel, Michael; Chen, Hucheng; Lanni, Francesco; Takai, Helio; Wu, Weihao
2016-01-01
The global feature extractor (gFEX) is a component of the Level-1 Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high momentum Higgs, W, & Z bosons, top quarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be packaged in an Advanced Telecommunications Computing Architecture (ATCA) module and implemented as a fast reconfigurable processor based on three Xilinx Virtex UltraScale FPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 276 optical fibers with the data transferred at the 40 MHz Large Hadron Collider (LHC) clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure all the processor Field-Programmable Gate Arrays (FPGAs), monitor board health, and interface to external signals. Now, the pre-prototype board which includes one ZYNQ and one Virtex-7 FPGA ...
gFEX, the ATLAS Calorimeter Level 1 Real Time Processor
Tang, Shaochun; The ATLAS collaboration
2015-01-01
The global feature extractor (gFEX) is a component of the Level-1 Calorimeter trigger Phase-I upgrade for the ATLAS experiment. It is intended to identify patterns of energy associated with the hadronic decays of high momentum Higgs, W, & Z bosons, top quarks, and exotic particles in real time at the LHC crossing rate. The single processor board will be packaged in an Advanced Telecommunications Computing Architecture (ATCA) module and implemented as a fast reconfigurable processor based on three Xilinx UltraScale FPGAs. The board will receive coarse-granularity information from all the ATLAS calorimeters on 264 optical fibers with the data transferred at the 40 MHz LHC clock frequency. The gFEX will be controlled by a single system-on-chip processor, ZYNQ, that will be used to configure all the processor FPGAs, monitor board health, and interface to external signals. Now, the pre-prototype board which includes one ZYNQ and one Virtex-7 FPGA has been designed for testing and verification. The performance ...
Li, Wei; Liu, Jian Guo; Zhu, Ning Hua
2015-04-15
We report a novel optical vector network analyzer (OVNA) with improved accuracy based on polarization modulation and stimulated Brillouin scattering (SBS) assisted polarization pulling. The beating between adjacent higher-order optical sidebands which are generated because of the nonlinearity of an electro-optic modulator (EOM) introduces considerable error to the OVNA. In our scheme, the measurement error is significantly reduced by removing the even-order optical sidebands using polarization discrimination. The proposed approach is theoretically analyzed and experimentally verified. The experimental results show that the accuracy of the OVNA is greatly improved compared to a conventional OVNA.
Malas, Tareq Majed Yasin; Ahmadia, Aron; Brown, Jed; Gunnels, John A.; Keyes, David E.
2012-01-01
Several emerging petascale architectures use energy-efficient processors with vectorized computational units and in-order thread processing. On these architectures the sustained performance of streaming numerical kernels, ubiquitous in the solution
Shift-, rotation-, and scale-invariant shape recognition system using an optical Hough transform
Schmid, Volker R.; Bader, Gerhard; Lueder, Ernst H.
1998-02-01
We present a hybrid shape recognition system with an optical Hough transform processor. The features of the Hough space offer a separate cancellation of distortions caused by translations and rotations. Scale invariance is also provided by suitable normalization. The proposed system extends the capabilities of Hough transform based detection from only straight lines to areas bounded by edges. A very compact optical design is achieved by a microlens array processor accepting incoherent light as direct optical input and realizing the computationally expensive connections massively parallel. Our newly developed algorithm extracts rotation and translation invariant normalized patterns of bright spots on a 2D grid. A neural network classifier maps the 2D features via a nonlinear hidden layer onto the classification output vector. We propose initialization of the connection weights according to regions of activity specifically assigned to each neuron in the hidden layer using a competitive network. The presented system is designed for industry inspection applications. Presently we have demonstrated detection of six different machined parts in real-time. Our method yields very promising detection results of more than 96% correctly classified parts.
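The claim that Hough space separates translation from rotation can be checked with a small digital stand-in for the optical processor. In the normal parameterization ρ = x cos θ + y sin θ, shifting an edge changes only ρ of its peak, while rotating it about the origin changes only θ. The sketch below is a software illustration under those assumptions, not the microlens-array system from the abstract; `hough_peak` and the sample point sets are hypothetical.

```python
import math


def hough_peak(points, n_theta=180):
    """Return (theta_index, rho_bin, votes) of the strongest straight line.

    theta is sampled in n_theta steps over [0, pi), so with the default
    the index equals the angle in degrees; rho is rounded to an integer.
    """
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(t, rho)] = acc.get((t, rho), 0) + 1
    (t, rho), votes = max(acc.items(), key=lambda item: item[1])
    return t, rho, votes


edge = [(5, 10 * i) for i in range(10)]       # vertical edge at x = 5
shifted = [(x + 3, y) for x, y in edge]       # translated by 3 along x
rotated = [(-y, x) for x, y in edge]          # rotated by 90 degrees about the origin

peak_edge = hough_peak(edge)
peak_shifted = hough_peak(shifted)
peak_rotated = hough_peak(rotated)
```

Comparing the three peaks shows the shift appearing only in the ρ coordinate and the rotation only in the θ coordinate, which is what allows the two distortions to be normalized independently before classification.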
International Nuclear Information System (INIS)
Potton, R J
2004-01-01
The application of reciprocity principles in optics has a long history that goes back to Stokes, Lorentz, Helmholtz and others. Moreover, optical applications need to be seen in the context of applications of reciprocity in particle scattering, acoustics, seismology and the solution of inverse problems, generally. In some of these other fields vector wave propagation is, as it is in optics, of the essence. For this reason the simplified approach to light wave polarization developed by, and named for, Jones is explored initially to see how and to what extent it encompasses reciprocity. The characteristic matrix of a uniform dielectric layer, used in the analysis of interference filters and mirrors, is reciprocal except when the layer is magneto-optical. The way in which the reciprocal nature of a characteristic matrix can be recognized is discussed next. After this, work on the influence of more realistic attributes of a dielectric stack on reciprocity is reviewed. Some of the numerous technological applications of magneto-optic non-reciprocal media are identified and the potential of a new class of non-reciprocal components is briefly introduced. Finally, the extension of the classical reciprocity concept to systems containing components that have nonlinear optical response is briefly mentioned
Optical vector network analysis of ultranarrow transitions in 166Er3+ : 7LiYF4 crystal.
Kukharchyk, N; Sholokhov, D; Morozov, O; Korableva, S L; Cole, J H; Kalachev, A A; Bushev, P A
2018-02-15
We present optical vector network analysis (OVNA) of an isotopically purified 166Er3+:7LiYF4 crystal. The OVNA method is based on generation and detection of a modulated optical sideband by using a radio-frequency vector network analyzer. This technique is widely used in the field of microwave photonics for the characterization of optical responses of optical devices such as filters and high-Q resonators. However, dense solid-state atomic ensembles induce a large phase shift on one of the optical sidebands that results in the appearance of extra features on the measured transmission response. We present a simple theoretical model that accurately describes the observed spectra and helps to reconstruct the absorption profile of a solid-state atomic ensemble as well as corresponding change of the refractive index in the vicinity of atomic resonances.
Ultrafast Optics: Vector Cavity Fiber Lasers - Physics and Technology
2016-06-14
with a quasi-vector cavity both numerically and experimentally. It is expected that through the study a deep and comprehensive understanding on the...799-801, Jun. 1997. 31. L. M. Zhao, D. Y. Tang, J. Wu, X. Q. Fu, and S. C. Wen, "Noise-like pulse in a gain-guided soliton fiber laser," Opt...solitons in a ring fiber laser," Optics Communications 281 (22), 5614 (2008). 110. L. M. Zhao, D. Y. Tang, J. Wu, X. Q. Fu, and S. C. Wen, "Noise-like
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators
Abdelfattah, Ahmad
2016-05-11
KBLAS is an open-source, high-performance library that provides optimized kernels for a subset of Level 2 BLAS functionalities on CUDA-enabled GPUs. Since performance of dense matrix-vector multiplication is hindered by the overhead of memory accesses, a double-buffering optimization technique is employed to overlap data motion with computation. After identifying a proper set of tuning parameters, KBLAS efficiently runs on various GPU architectures while avoiding code rewriting and retaining compliance with the standard BLAS API. Another optimization technique ensures coalesced memory access when dealing with submatrices, especially for high-level dense linear algebra algorithms. All KBLAS kernels have been extended to a multi-GPU environment, which requires the introduction of new APIs. Considering general matrices, KBLAS is very competitive with existing state-of-the-art kernels and provides a smoother performance across a wide range of matrix dimensions. Considering symmetric and Hermitian matrices, KBLAS outperforms existing state-of-the-art implementations on all matrix sizes and achieves asymptotically up to 50% and 60% speedup against the best competitor on single-GPU and multi-GPU systems, respectively. Performance results also validate our performance model. A subset of KBLAS high-performance kernels have been integrated into NVIDIA's standard BLAS implementation (cuBLAS) for larger dissemination, starting from version 6.0. © 2016 ACM.
Lie-optic matrix algorithm for computer simulation of paraxial self ...
Indian Academy of Sciences (India)
It gives rise to a matrix method for self-focusing beam propagation that is ... are applicable for other media like linear optical fibers and to more general ..... operators for small slices of the plasma of thickness Δz each, it is advisable to work.
Principles of laser spectroscopy and quantum optics
Berman, Paul R
2011-01-01
Principles of Laser Spectroscopy and Quantum Optics is an essential textbook for graduate students studying the interaction of optical fields with atoms. It also serves as an ideal reference text for researchers working in the fields of laser spectroscopy and quantum optics. The book provides a rigorous introduction to the prototypical problems of radiation fields interacting with two- and three-level atomic systems. It examines the interaction of radiation with both atomic vapors and condensed matter systems, the density matrix and the Bloch vector, and applications involving linear absorptio
Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi
Stanic, Milan; Palomar, Oscar; Ratkovic, Ivan; Duric, Milovan; Unsal, Osman; Cristal, Adrian; Valero, Mateo
2014-01-01
Graph500 is a data-intensive application for high performance computing, and it is an increasingly important workload because graphs are a core part of most analytic applications. So far there is no work that examines whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instru...
Franklin, Joel N
2003-01-01
Mathematically rigorous introduction covers vector and matrix norms, the condition-number of a matrix, positive and irreducible matrices, much more. Only elementary algebra and calculus required. Includes problem-solving exercises. 1968 edition.
Energy Technology Data Exchange (ETDEWEB)
Wald, R M [Chicago Univ., Ill. (USA). Lab. for Astrophysics and Space Research
1975-11-01
Hawking's analysis of particle creation by black holes is extended by explicitly obtaining the expression for the quantum mechanical state vector Ψ which results from particle creation starting from the vacuum during gravitational collapse. We first discuss the quantum field theory of a Hermitian scalar field in an external potential or in a curved but asymptotically flat spacetime with no horizon present. Making the necessary modification for the case when a horizon is present, we apply this theory for a massless Hermitian scalar field to get the state vector describing the steady state emission at late times for particle creation during gravitational collapse to a Schwarzschild black hole. We find that the state vector describing particle creation from the vacuum decomposes into a simple product of state vectors for each individual mode. The density matrix describing emission of particles to infinity by this particle creation process is found to be identical to that of black body emission. Thus, black hole emission agrees in complete detail with black body emission (orig./BJ).
Hyper-systolic matrix multiplication
Lippert, Th.; Petkov, N.; Palazzari, P.; Schilling, K.
A novel parallel algorithm for matrix multiplication is presented. It is based on a 1-D hyper-systolic processor abstraction. The procedure can be implemented on all types of parallel systems. © 2001 Elsevier Science B.V. All rights reserved.
Efficient multitasking of Choleski matrix factorization on CRAY supercomputers
Overman, Andrea L.; Poole, Eugene L.
1991-01-01
A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
International Nuclear Information System (INIS)
He, Cenlin; Takano, Yoshi; Liou, Kuo-Nan; Yang, Ping; Li, Qinbin; Mackowski, Daniel W.
2016-01-01
We perform a comprehensive intercomparison of the geometric-optics surface-wave (GOS) approach, the superposition T-matrix method, and laboratory measurements for optical properties of fresh and coated/aged black carbon (BC) particles with complex structures. GOS and T-matrix calculations capture the measured optical (i.e., extinction, absorption, and scattering) cross sections of fresh BC aggregates, with 5–20% differences depending on particle size. We find that the T-matrix results tend to be lower than the measurements, due to uncertainty in theoretical approximations of realistic BC structures, particle property measurements, and numerical computations in the method. On the contrary, the GOS results are higher than the measurements (hence the T-matrix results) for BC radii 100 nm. We find good agreement (differences 100 nm. We find small deviations (≤10%) in asymmetry factors computed from the two methods for most BC coating structures and sizes, but several complex structures have 10–30% differences. This study provides the foundation for downstream application of the GOS approach in radiative transfer and climate studies. - Highlights: • The GOS and T-matrix methods capture laboratory measurements of BC optical properties. • The GOS results are consistent with the T-matrix results for BC optical properties. • BC optical properties vary remarkably with coating structures and sizes during aging.
Distributed CPU multi-core implementation of SIRT with vectorized matrix kernel for micro-CT
Energy Technology Data Exchange (ETDEWEB)
Gregor, Jens [Tennessee Univ., Knoxville, TN (United States)
2011-07-01
We describe an implementation of SIRT for execution using a cluster of multi-core PCs. Algorithmic techniques are presented for reducing the size and computational cost of a reconstruction including near-optimal relaxation, scalar preconditioning, orthogonalized ordered subsets, and data-driven focus of attention. Implementation wise, a scheme is outlined which provides each core mutex-free access to its local shared memory while also balancing the workload across the cluster, and the system matrix is computed on-the-fly using vectorized code. Experimental results show the efficacy of the approach. (orig.)
[Orthogonal Vector Projection Algorithm for Spectral Unmixing].
Song, Mei-ping; Xu, Xing-wei; Chang, Chein-I; An, Ju-bai; Yao, Li
2015-12-01
Spectral unmixing is an important part of hyperspectral technology and is essential for material quantity analysis in hyperspectral imagery. Most linear unmixing algorithms require matrix multiplication and matrix inversion or matrix determination. These operations are difficult to program and especially hard to realize in hardware. At the same time, the computational cost of these algorithms increases significantly as the number of endmembers grows. Here, based on the traditional Orthogonal Subspace Projection algorithm, a new method called Orthogonal Vector Projection is proposed using the orthogonality principle. It simplifies the process by avoiding matrix multiplication and inversion. It first computes the final orthogonal vector for each endmember spectrum via the Gram-Schmidt process. These orthogonal vectors are then used as projection vectors for the pixel signature. The unconstrained abundance can be obtained directly by projecting the signature onto the projection vectors and computing the ratio of the projected vector length to the orthogonal vector length. Compared with the Orthogonal Subspace Projection and Least Squares Error algorithms, this method does not need matrix inversion, which is computationally costly and hard to implement in hardware. It completes the orthogonalization process with repeated vector operations only, which makes it easy to apply in both parallel computation and hardware. The validity of the algorithm is proved by its relationship with the Orthogonal Subspace Projection and Least Squares Error algorithms. Its computational complexity is also compared with that of the other two algorithms and is the lowest of the three. Finally, experimental results on a synthetic image and a real image are provided as further evidence of the effectiveness of the method.
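The projection step described in this abstract can be illustrated with a small sketch. This is not the authors' code: the function names (`orthogonal_vector`, `unmix`) and the toy spectra are assumptions made for illustration, and the endmembers are assumed to be linearly independent. For each endmember, the sketch builds the vector orthogonal to all other endmembers with Gram-Schmidt, then obtains the unconstrained abundance as a ratio of two dot products, with no matrix inversion.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def subtract_projection(v, basis):
    """Remove from v its components along each vector of an orthogonal basis."""
    out = list(v)
    for b in basis:
        coeff = dot(out, b) / dot(b, b)
        out = [x - coeff * y for x, y in zip(out, b)]
    return out


def orthogonal_vector(endmembers, i):
    """Gram-Schmidt: orthogonalize all endmembers except i, then endmember i last.

    The result is the part of endmember i orthogonal to every other endmember.
    """
    basis = []
    for j, m in enumerate(endmembers):
        if j != i:
            basis.append(subtract_projection(m, basis))
    return subtract_projection(endmembers[i], basis)


def unmix(pixel, endmembers):
    """Unconstrained abundances: projected length divided by orthogonal length."""
    abundances = []
    for i in range(len(endmembers)):
        o = orthogonal_vector(endmembers, i)
        abundances.append(dot(pixel, o) / dot(o, o))
    return abundances


# Hypothetical 4-band spectra for three endmembers (illustrative values only).
endmembers = [
    [1.0, 0.2, 0.1, 0.0],
    [0.1, 1.0, 0.3, 0.2],
    [0.0, 0.2, 1.0, 0.5],
]
true_abundances = [0.5, 0.3, 0.2]
pixel = [
    sum(a * m[band] for a, m in zip(true_abundances, endmembers))
    for band in range(4)
]
estimated = unmix(pixel, endmembers)
```

For a noise-free pixel that is an exact linear mixture, `estimated` should match `true_abundances` up to rounding, because the orthogonal vector of endmember i has zero dot product with every other endmember.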
Abdelfattah, Ahmad
2016-05-23
Simulations of many multi-component PDE-based applications, such as petroleum reservoirs or reacting flows, are dominated by the solution, on each time step and within each Newton step, of large sparse linear systems. The standard solver is a preconditioned Krylov method. Along with application of the preconditioner, memory-bound Sparse Matrix-Vector Multiplication (SpMV) is the most time-consuming operation in such solvers. Multi-species models produce Jacobians with a dense block structure, where the block size can be as large as a few dozen. Failing to exploit this dense block structure vastly underutilizes hardware capable of delivering high performance on dense BLAS operations. This paper presents a GPU-accelerated SpMV kernel for block-sparse matrices. Dense matrix-vector multiplications within the sparse-block structure leverage optimization techniques from the KBLAS library, a high performance library for dense BLAS kernels. The design ideas of KBLAS can be applied to block-sparse matrices. Furthermore, a technique is proposed to balance the workload among thread blocks when there are large variations in the lengths of nonzero rows. Multi-GPU performance is highlighted. The proposed SpMV kernel outperforms existing state-of-the-art implementations using matrices with real structures from different applications. Copyright © 2016 John Wiley & Sons, Ltd.
Abdelfattah, Ahmad; Ltaief, Hatem; Keyes, David E.; Dongarra, Jack
2016-01-01
Simulations of many multi-component PDE-based applications, such as petroleum reservoirs or reacting flows, are dominated by the solution, on each time step and within each Newton step, of large sparse linear systems. The standard solver is a preconditioned Krylov method. Along with application of the preconditioner, memory-bound Sparse Matrix-Vector Multiplication (SpMV) is the most time-consuming operation in such solvers. Multi-species models produce Jacobians with a dense block structure, where the block size can be as large as a few dozen. Failing to exploit this dense block structure vastly underutilizes hardware capable of delivering high performance on dense BLAS operations. This paper presents a GPU-accelerated SpMV kernel for block-sparse matrices. Dense matrix-vector multiplications within the sparse-block structure leverage optimization techniques from the KBLAS library, a high performance library for dense BLAS kernels. The design ideas of KBLAS can be applied to block-sparse matrices. Furthermore, a technique is proposed to balance the workload among thread blocks when there are large variations in the lengths of nonzero rows. Multi-GPU performance is highlighted. The proposed SpMV kernel outperforms existing state-of-the-art implementations using matrices with real structures from different applications. Copyright © 2016 John Wiley & Sons, Ltd.
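To make the block-sparse structure concrete, the following is a minimal serial sketch of SpMV over a block compressed sparse row layout. It is not the GPU kernel from the paper and does not use KBLAS; the function name `bsr_spmv`, the storage layout (`row_ptr`, `col_idx`, `blocks`), and the tiny example matrix are all assumptions for illustration. The point it shows is that each stored nonzero is a small dense block, so the inner work is a dense matrix-vector product.

```python
def bsr_spmv(block_size, row_ptr, col_idx, blocks, x):
    """Compute y = A x for A stored as square dense blocks in block-CSR form.

    row_ptr[i]..row_ptr[i+1] index the blocks of block row i,
    col_idx[k] is the block column of block k,
    blocks[k] is a dense block_size x block_size list of rows.
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * block_size)
    for bi in range(n_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            x_off = col_idx[k] * block_size
            block = blocks[k]
            # Dense matrix-vector product on one block.
            for r in range(block_size):
                acc = 0.0
                for c in range(block_size):
                    acc += block[r][c] * x[x_off + c]
                y[bi * block_size + r] += acc
    return y


# A 4x4 matrix made of 2x2 blocks; block (1, 0) is entirely zero and not stored.
row_ptr = [0, 2, 3]
col_idx = [0, 1, 1]
blocks = [
    [[1.0, 2.0], [3.0, 4.0]],   # block (0, 0)
    [[0.0, 1.0], [1.0, 0.0]],   # block (0, 1)
    [[2.0, 0.0], [0.0, 2.0]],   # block (1, 1)
]
x = [1.0, 1.0, 2.0, 3.0]
y = bsr_spmv(2, row_ptr, col_idx, blocks, x)  # [6.0, 9.0, 4.0, 6.0]
```

In a GPU setting the two inner loops are where dense BLAS-style optimizations would apply, and the uneven number of blocks per block row is the source of the load imbalance the abstract mentions.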
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
Energy Technology Data Exchange (ETDEWEB)
Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Vuduc, Richard [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Shalf, John [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Yelick, Katherine [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Demmel, James [Univ. of California, Berkeley, CA (United States)
2007-01-01
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
Lectures on light nonlinear and quantum optics using the density matrix
Rand, Stephen C.
2016-01-01
This book bridges the gap between introductory quantum mechanics and the research front of modern optics and scientific fields that make use of light. While suitable as a reference for the specialist in quantum optics, it also targets non-specialists from other disciplines who need to understand light and its uses in research. It introduces a single analytic tool, the density matrix, to analyze complex optical phenomena encountered in traditional as well as cross-disciplinary research. It moves swiftly in a tight sequence from elementary to sophisticated topics in quantum optics, including optical tweezers, laser cooling, coherent population transfer, optical magnetism, electromagnetically induced transparency, squeezed light, and cavity quantum electrodynamics. A systematic approach starts with the simplest systems—stationary two-level atoms—then introduces atomic motion, adds more energy levels, and moves on to discuss first-, second-, and third-order coherence effects that are the basis for analyzing n...
A Geometric Algebra Co-Processor for Color Edge Detection
Directory of Open Access Journals (Sweden)
Biswajit Mishra
2015-01-01
This paper describes an advancement in color edge detection, using a dedicated Geometric Algebra (GA) co-processor implemented on an Application Specific Integrated Circuit (ASIC). GA provides a rich set of geometric operations, giving the advantage that many signal and image processing operations become straightforward and the algorithms intuitive to design. The use of GA allows images to be represented with the three R, G, B color channels defined as a single entity, rather than separate quantities. A novel custom ASIC is proposed and fabricated that directly targets GA operations and results in significant performance improvement for color edge detection. Use of the hardware described in this paper also shows that the convolution operation with the rotor masks within GA belongs to a class of linear vector filters and can be applied to image or speech signals. The contribution of the proposed approach has been demonstrated by implementing three different types of edge detection schemes on the proposed hardware. Compared with existing software approaches, the proposed GA co-processor is more than 3.2× faster than GAIGEN and more than 2800× faster than GABLE. The fabricated GA co-processor is approximately an order of magnitude faster than previously published results for hardware implementations.
Optical computing - an alternate approach to trigger processing
International Nuclear Information System (INIS)
Cleland, W.E.
1981-01-01
The enormous rate reduction factors required by most ISABELLE experiments suggest that we should examine every conceivable approach to trigger processing. One approach that has not received much attention by high energy physicists is optical data processing. The past few years have seen rapid advances in optoelectronic technology, stimulated mainly by the military and the communications industry. An intriguing question is whether one can utilize this technology together with the optical computing techniques that have been developed over the past two decades to develop a rapid trigger processor for high energy physics experiments. Optical data processing is a method for performing a few very specialized operations on data which is inherently two dimensional. Typical operations are the formation of convolution or correlation integrals between the input data and information stored in the processor in the form of an optical filter. Optical processors are classed as coherent or incoherent, according to the spatial coherence of the input wavefront. Typically, in a coherent processor a laser beam is modulated with a photographic transparency which represents the input data. In an incoherent processor, the input may be an incoherently illuminated transparency, but self-luminous objects, such as an oscilloscope trace, have also been used. We consider here an incoherent processor in which the input data is converted into an optical wavefront through the excitation of an array of point sources - either light emitting diodes or injection lasers
Catalán, Sandra; Igual, Francisco D.; Mayo, Rafael; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.
2015-01-01
Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware ...
Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)
1983-01-01
A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors normally operating quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not in lock-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
Improved parallel solution techniques for the integral transport matrix method
Energy Technology Data Exchange (ETDEWEB)
Zerr, R. Joseph, E-mail: rjz116@psu.edu [Department of Mechanical and Nuclear Engineering, The Pennsylvania State University, University Park, PA (United States); Azmy, Yousry Y., E-mail: yyazmy@ncsu.edu [Department of Nuclear Engineering, North Carolina State University, Burlington Engineering Laboratories, Raleigh, NC (United States)
2011-07-01
Alternative solution strategies to the parallel block Jacobi (PBJ) method for the solution of the global problem with the integral transport matrix method operators have been designed and tested. The most straightforward improvement to the Jacobi iterative method is the Gauss-Seidel alternative. The parallel red-black Gauss-Seidel (PGS) algorithm can improve on the number of iterations and reduce work per iteration by applying an alternating red-black color-set to the sub-domains and assigning multiple sub-domains per processor. A parallel GMRES(m) method was implemented as an alternative to stationary iterations. Computational results show that the PGS method can improve on the PBJ method execution time by up to 10× when eight sub-domains per processor are used. However, compared to traditional source iterations with diffusion synthetic acceleration, it is still approximately an order of magnitude slower. The best-performing cases are optically thick because sub-domains decouple, yielding faster convergence. Further tests revealed that 64 sub-domains per processor was the best performing level of sub-domain division. An acceleration technique that improves the convergence rate would greatly improve the ITMM. The GMRES(m) method with a diagonal block preconditioner consumes approximately the same time as the PBJ solver but could be improved by an as yet undeveloped, more efficient preconditioner. (author)
Improved parallel solution techniques for the integral transport matrix method
International Nuclear Information System (INIS)
Zerr, R. Joseph; Azmy, Yousry Y.
2011-01-01
Alternative solution strategies to the parallel block Jacobi (PBJ) method for the solution of the global problem with the integral transport matrix method operators have been designed and tested. The most straightforward improvement to the Jacobi iterative method is the Gauss-Seidel alternative. The parallel red-black Gauss-Seidel (PGS) algorithm can improve on the number of iterations and reduce work per iteration by applying an alternating red-black color-set to the sub-domains and assigning multiple sub-domains per processor. A parallel GMRES(m) method was implemented as an alternative to stationary iterations. Computational results show that the PGS method can improve on the PBJ method execution time by up to 10× when eight sub-domains per processor are used. However, compared to traditional source iterations with diffusion synthetic acceleration, it is still approximately an order of magnitude slower. The best-performing cases are optically thick because sub-domains decouple, yielding faster convergence. Further tests revealed that 64 sub-domains per processor was the best performing level of sub-domain division. An acceleration technique that improves the convergence rate would greatly improve the ITMM. The GMRES(m) method with a diagonal block preconditioner consumes approximately the same time as the PBJ solver but could be improved by an as yet undeveloped, more efficient preconditioner. (author)
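The difference between Jacobi and red-black Gauss-Seidel sweeps can be sketched on a much simpler model problem than the transport operators in the abstract. The example below is an assumption-laden illustration, not the ITMM code: it uses a pointwise (not block) iteration on the 1-D tridiagonal system 2u_i - u_{i-1} - u_{i+1} = b_i with zero boundary values, and all function names are hypothetical. In the red-black sweep, every point of one color depends only on points of the other color, so each half-sweep could be updated in parallel.

```python
def neighbors(u, i):
    """Left and right neighbor values, with zero boundary values outside the grid."""
    left = u[i - 1] if i > 0 else 0.0
    right = u[i + 1] if i < len(u) - 1 else 0.0
    return left, right


def jacobi_sweep(u, b):
    """Every point is updated from the previous iterate only."""
    new = u[:]
    for i in range(len(u)):
        left, right = neighbors(u, i)
        new[i] = 0.5 * (b[i] + left + right)
    return new


def red_black_sweep(u, b):
    """Update even (red) points first, then odd (black) points using fresh red values."""
    u = u[:]
    for color in (0, 1):
        for i in range(color, len(u), 2):
            left, right = neighbors(u, i)
            u[i] = 0.5 * (b[i] + left + right)
    return u


def residual_norm(u, b):
    """Maximum absolute residual of 2*u_i - u_{i-1} - u_{i+1} = b_i."""
    worst = 0.0
    for i in range(len(u)):
        left, right = neighbors(u, i)
        worst = max(worst, abs(b[i] - (2.0 * u[i] - left - right)))
    return worst


def solve(sweep, b, tol=1e-8, max_iters=100000):
    """Iterate a sweep from a zero initial guess until the residual drops below tol."""
    u = [0.0] * len(b)
    for iteration in range(1, max_iters + 1):
        u = sweep(u, b)
        if residual_norm(u, b) < tol:
            return u, iteration
    return u, max_iters


n = 15
b = [1.0] * n
u_jacobi, iters_jacobi = solve(jacobi_sweep, b)
u_rb, iters_rb = solve(red_black_sweep, b)
```

On this model problem the red-black iteration needs noticeably fewer sweeps than Jacobi to reach the same residual, which mirrors the iteration-count motivation given in the abstract; the speedups reported there come from the authors' own experiments and are not reproduced by this sketch.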
A vectorized Monte Carlo code for modeling photon transport in SPECT
International Nuclear Information System (INIS)
Smith, M.F.; Floyd, C.E. Jr.; Jaszczak, R.J.
1993-01-01
A vectorized Monte Carlo computer code has been developed for modeling photon transport in single photon emission computed tomography (SPECT). The code models photon transport in a uniform attenuating region and photon detection by a gamma camera. It is adapted from a history-based Monte Carlo code in which photon history data are stored in scalar variables and photon histories are computed sequentially. The vectorized code is written in FORTRAN77 and uses an event-based algorithm in which photon history data are stored in arrays and photon history computations are performed within DO loops. The indices of the DO loops range over the number of photon histories, and these loops may take advantage of the vector processing unit of our Stellar GS1000 computer for pipelined computations. Without the use of the vector processor the event-based code is faster than the history-based code because of numerical optimization performed during conversion to the event-based algorithm. When only the detection of unscattered photons is modeled, the event-based code executes 5.1 times faster with the use of the vector processor than without; when the detection of scattered and unscattered photons is modeled the speed increase is a factor of 2.9. Vectorization is a valuable way to increase the performance of Monte Carlo code for modeling photon transport in SPECT
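The contrast between the history-based and event-based organizations can be sketched as follows. This is only a structural illustration in Python, not the FORTRAN77 code described above: it models unscattered photons crossing a uniform slab, and the attenuation coefficient, slab thickness, and function names are assumed values chosen for the example. The history-based version keeps one photon's data in scalars and finishes it before starting the next; the event-based version stores each quantity in an array and performs one step for all photons at a time, which is the loop shape a vector unit can pipeline.

```python
import math
import random


def history_based(n_photons, mu, thickness, seed):
    """One photon at a time; per-photon data live in scalar variables."""
    rng = random.Random(seed)
    escaped = 0
    for _ in range(n_photons):
        u = 1.0 - rng.random()          # uniform in (0, 1], avoids log(0)
        path = -math.log(u) / mu        # sampled free path length
        if path > thickness:
            escaped += 1
    return escaped


def event_based(n_photons, mu, thickness, seed):
    """One event at a time; per-photon data live in arrays indexed by history."""
    rng = random.Random(seed)
    u = [1.0 - rng.random() for _ in range(n_photons)]          # event 1: draw
    path = [-math.log(ui) / mu for ui in u]                     # event 2: free path
    flags = [1 if p > thickness else 0 for p in path]           # event 3: escape test
    return sum(flags)


# Assumed illustrative values: mu in 1/cm, thickness in cm.
mu, thickness, n_photons = 0.15, 10.0, 20000
count_history = history_based(n_photons, mu, thickness, seed=1)
count_event = event_based(n_photons, mu, thickness, seed=1)
fraction = count_event / n_photons
expected = math.exp(-mu * thickness)
```

With the same seed both versions consume the random numbers in the same order, so the two counts agree exactly, and the escaped fraction should be close to exp(-mu * thickness). Plain Python lists do not give a vector speedup by themselves; the sketch only shows how the data layout and loop order change.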
Li, Hongye; Wan, Hongdan; Zhang, Zuxing; Sun, Bing; Zhang, Lin
2016-10-01
This paper investigates optical properties of few-mode fiber with non-uniform refractive index, namely: the few mode fiber with U-shape refractive index and the two-mode and four-mode few-mode fiber with bent radius. Finite element method is used to analyze the mode distributions based on their non-uniform refractive index. Effective mode control can be achieved through these few mode fibers to achieve vector beam generation. Finally, reflection spectra of a few-mode fiber Bragg grating are calculated theoretically and then measured under different bending conditions. Experimental results are in good accordance with the theoretical ones. These few mode fibers show potential applications in generation of cylindrical vector beam both for optical lasing and sensing systems.
DEFF Research Database (Denmark)
Lee, Kyo-Beum; Blaabjerg, Frede
2004-01-01
This paper presents a new sensorless vector control system for high performance induction motor drives fed by a matrix converter with a non-linearity compensation and disturbance observer. The nonlinear voltage distortion that is caused by commutation delay and on-state voltage drop in switching...
Green Secure Processors: Towards Power-Efficient Secure Processor Design
Chhabra, Siddhartha; Solihin, Yan
With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.
Nested dissection on a mesh-connected processor array
International Nuclear Information System (INIS)
Worley, P.H.; Schreiber, R.
1986-01-01
The authors present a parallel implementation of Gaussian elimination without pivoting using the nested dissection ordering for solving Ax=b where A is an N x N symmetric positive definite matrix. If the graph of A is a √N x √N finite element mesh then a parallel complexity of O(√N) can be achieved for Gaussian elimination with the nested dissection ordering. The authors' implementation achieves this parallel complexity on a two dimensional MIMD processor array with N processors and nearest neighbor interconnections. Thus nested dissection is a near optimal algorithm for this problem on this interconnection topology. The parallel implementation on this architecture requires 158√N + O(log₂(√N)) parallel floating point multiplications. It is faster than a Kung-Leiserson systolic array for banded matrices for N≥961, and faster than a serial implementation for N as small as 9.
Probabilistic programmable quantum processors
International Nuclear Information System (INIS)
Buzek, V.; Ziman, M.; Hillery, M.
2004-01-01
We analyze how to improve the performance of probabilistic programmable quantum processors. We show how the probability of success of the probabilistic processor can be enhanced by using the processor in loops. In addition, we show that arbitrary SU(2) transformations of qubits can be encoded in the program state of a universal programmable probabilistic quantum processor. The probability of success of this processor can be enhanced by a systematic correction of errors via conditional loops. Finally, we show that all our results can also be generalized to qudits. (Abstract Copyright [2004], Wiley Periodicals, Inc.)
Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors
Abdul Kareem PARCHUR; Ram Asaray SINGH
2012-01-01
High performance is a critical requirement for all microprocessor manufacturers. The present paper compares the performance of two main Intel Xeon processor series (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...
International Nuclear Information System (INIS)
Kunz, P.F.
1976-01-01
The problems of data analysis with hardware processors are reviewed and a description is given of a programmable processor. This processor, the 168/E, has been designed for use in the LASS multi-processor system; it has an execution speed comparable to the IBM 370/168 and uses the subset of IBM 370 instructions appropriate to the LASS analysis task. (Auth.)
Wysokiński, Karol; Filipowicz, Marta; Stańczyk, Tomasz; Lipiński, Stanisław; Napierała, Marek; Murawski, Michał; Nasiłowski, Tomasz
2017-10-01
A matrix of optical fiber sensors suitable for remote measurements is reported in this paper. The aim of the work was to monitor air quality with a device that does not need any electricity at the measurement site. The matrix consists of several sensors detecting carbon dioxide concentration, relative humidity and temperature. The sensors utilize active optical materials, which change their color when exposed to varied conditions. All the sensors are powered with standard light emitting diodes. Light is transmitted by an optical fiber from the light source to the active layer, which changes its color when the conditions change. This results in a change of attenuation of the light passing through the active layer. The modified light is then transmitted by another optical fiber to the detector, where a simple photoresistor is used. It is powered by a stabilized DC power supply and the current is measured. Since no expensive elements are needed to manufacture such a matrix of sensors, its price may be competitive with the price of devices already available on the market, while the matrix also exhibits other valuable properties.
An integrated processor for photonic quantum states using a broadband light–matter interface
International Nuclear Information System (INIS)
Saglamyurek, E; Sinclair, N; Slater, J A; Heshami, K; Oblak, D; Tittel, W
2014-01-01
Faithful storage and coherent manipulation of quantum optical pulses are key for long distance quantum communications and quantum computing. Combining these functions in a light–matter interface that can be integrated on-chip with other photonic quantum technologies, e.g. sources of entangled photons, is an important step towards these applications. To date there have only been a few demonstrations of coherent pulse manipulation utilizing optical storage devices compatible with quantum states, and that only in atomic gas media (making integration difficult) and with limited capabilities. Here we describe how a broadband waveguide quantum memory based on the atomic frequency comb (AFC) protocol can be used as a programmable processor for essentially arbitrary spectral and temporal manipulations of individual quantum optical pulses. Using weak coherent optical pulses at the few photon level, we experimentally demonstrate sequencing, time-to-frequency multiplexing and demultiplexing, splitting, interfering, temporal and spectral filtering, compressing and stretching as well as selective delaying. Our integrated light–matter interface offers high-rate, robust and easily configurable manipulation of quantum optical pulses and brings fully practical optical quantum devices one step closer to reality. Furthermore, as the AFC protocol is suitable for storage of intense light pulses, our processor may also find applications in classical communications. (paper)
Hyperbolic-symmetry vector fields.
Gao, Xu-Zhen; Pan, Yue; Cai, Meng-Qiang; Li, Yongnan; Tu, Chenghou; Wang, Hui-Tian
2015-12-14
We present and construct a new kind of orthogonal coordinate system, the hyperbolic coordinate system. We present and design a new kind of locally linearly polarized vector field, defined as the hyperbolic-symmetry vector field because the points with the same polarization form a series of hyperbolae. We experimentally demonstrate the generation of such hyperbolic-symmetry vector optical fields. In particular, we also study the modified hyperbolic-symmetry vector optical fields with twofold and fourfold symmetric states of polarization obtained by introducing mirror symmetry. The tight focusing behaviors of these vector fields are also investigated. In addition, we fabricate micro-structures on K9 glass surfaces with several tightly focused (modified) hyperbolic-symmetry vector field patterns, which demonstrates that the simulated tightly focused fields are in good agreement with the fabricated micro-structures.
Demonstration of two-qubit algorithms with a superconducting quantum processor.
DiCarlo, L; Chow, J M; Gambetta, J M; Bishop, Lev S; Johnson, B R; Schuster, D I; Majer, J; Blais, A; Frunzio, L; Girvin, S M; Schoelkopf, R J
2009-07-09
Quantum computers, which harness the superposition and entanglement of physical states, could outperform their classical counterparts in solving problems with technological impact, such as factoring large numbers and searching databases. A quantum processor executes algorithms by applying a programmable sequence of gates to an initialized register of qubits, which coherently evolves into a final state containing the result of the computation. Building a quantum processor is challenging because of the need to meet simultaneously requirements that are in conflict: state preparation, long coherence times, universal gate operations and qubit readout. Processors based on a few qubits have been demonstrated using nuclear magnetic resonance, cold ion trap and optical systems, but a solid-state realization has remained an outstanding challenge. Here we demonstrate a two-qubit superconducting processor and the implementation of the Grover search and Deutsch-Jozsa quantum algorithms. We use a two-qubit interaction, tunable in strength by two orders of magnitude on nanosecond timescales, which is mediated by a cavity bus in a circuit quantum electrodynamics architecture. This interaction allows the generation of highly entangled states with concurrence up to 94 per cent. Although this processor constitutes an important step in quantum computing with integrated circuits, continuing efforts to increase qubit coherence times, gate performance and register size will be required to fulfil the promise of a scalable technology.
Discrete Fourier transformation processor based on complex radix (−1 + j) number system
Directory of Open Access Journals (Sweden)
Anidaphi Shadap
2017-02-01
Complex radix (−1 + j) allows the arithmetic operations of complex numbers to be done without treating the divide and conquer rules, which offers a significant speed improvement in complex number computation circuitry. The design and hardware implementation of a complex radix (−1 + j) converter are introduced in this paper. Extensive simulation results have been incorporated, and an application of this converter towards the implementation of a discrete Fourier transformation (DFT) processor is presented. The functionality of the DFT processor has been verified in Xilinx ISE design suite version 14.7, and performance parameters such as propagation delay and dynamic switching power consumption have been calculated with the Virtuoso platform in Cadence. The proposed DFT processor has been implemented through conversion, multiplication and addition. The performance parameter matrix in terms of delay and power consumption offered a significant improvement over other traditional implementations of the DFT processor.
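As an illustration of the number system named in this abstract, the following is a minimal software sketch of conversion to and from base (−1 + j) for Gaussian integers, using binary digits 0 and 1. It is not the converter circuit of the paper; the function names `to_radix_m1pj` and `from_radix_m1pj` are hypothetical, and the digit rule used here is the textbook repeated-division method.

```python
def to_radix_m1pj(re, im):
    """Digits (0/1, least significant first) of re + im*j in base (-1 + j).

    Illustrative helper, not taken from the paper.
    """
    digits = []
    while re != 0 or im != 0:
        # j is congruent to 1 modulo (-1 + j), so the remainder digit
        # of re + im*j is (re + im) mod 2.
        d = (re + im) % 2
        digits.append(d)
        re -= d
        # Exact division: (re + im*j) / (-1 + j) = ((im - re) + (-re - im)*j) / 2
        re, im = (im - re) // 2, (-re - im) // 2
    return digits or [0]


def from_radix_m1pj(digits):
    """Evaluate digits (least significant first) back to (re, im) by Horner's rule."""
    re, im = 0, 0
    for d in reversed(digits):
        # Multiply the accumulator by (-1 + j), then add the next digit.
        re, im = -re - im + d, re - im
    return re, im


if __name__ == "__main__":
    # 2 is written 1100 in base (-1 + j); the list is least significant first.
    print(to_radix_m1pj(2, 0))
    print(from_radix_m1pj(to_radix_m1pj(3, -5)))
```

A single digit string encodes both the real and the imaginary part, which is why addition and multiplication can operate on one operand per complex number.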
Versatile generation of optical vector fields and vector beams using a non-interferometric approach.
Tripathi, Santosh; Toussaint, Kimani C
2012-05-07
We present a versatile, non-interferometric method for generating vector fields and vector beams which can produce all the states of polarization represented on a higher-order Poincaré sphere. The versatility and non-interferometric nature of this method is expected to enable exploration of various exotic properties of vector fields and vector beams. To illustrate this, we study the propagation properties of some vector fields and find that, in general, propagation alters both their intensity and polarization distribution, and more interestingly, converts some vector fields into vector beams. In the article, we also suggest a modified Jones vector formalism to represent vector fields and vector beams.
Parallel/vector algorithms for the spherical SN transport theory method
International Nuclear Information System (INIS)
Haghighat, A.; Mattis, R.E.
1990-01-01
This paper discusses vector and parallel processing of a 1-D curvilinear (i.e. spherical) SN transport theory algorithm on the Cornell National SuperComputer Facility (CNSF) IBM 3090/600E. Two different vector algorithms were developed and parallelized based on angular decomposition. It is shown that significant speedups are attainable. For example, for problems with large granularity, using 4 processors, the parallel/vector algorithm achieves speedups (for wall-clock time) of more than 4.5 relative to the old serial/scalar algorithm. Furthermore, this work has demonstrated the existing potential for the development of faster processing vector and parallel algorithms for multidimensional curvilinear geometries. (author)
Vector pulsing soliton of self-induced transparency in waveguide
International Nuclear Information System (INIS)
Adamashvili, G.T.
2015-01-01
A theory of an optical resonance vector pulsing soliton in a waveguide is developed. A thin transition layer containing semiconductor quantum dots forms the boundary between the waveguide and one of the connected media. Analytical and numerical solutions for the optical vector pulsing soliton in the waveguide are obtained. The vector pulsing soliton in the presence of excitonic and bi-excitonic excitations is compared with the soliton for waveguide TM-modes with parameters that can be used in modern optical experiments. It is shown that these nonlinear waves have significantly different parameters and shapes. - Highlights: • An optical vector pulsing soliton in a planar waveguide is presented. • Explicit forms of the optical vector pulsing soliton are obtained. • The vector pulsing soliton and the soliton have different parameters and profiles.
ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors
DEFF Research Database (Denmark)
Liu, Weifeng; Vinter, Brian
2014-01-01
and its child nodes must be executed sequentially, and (2) heaps, even d-heaps (d-ary heaps or d-way heaps), cannot supply enough wide data parallelism to these processors. Recent research proposed more versatile asymmetric multicore processors (AMPs) that consist of two types of cores (latency......-oriented cores with high single-thread performance and throughput-oriented cores with wide vector processing capability), unified memory address space and faster synchronization mechanism among cores with different ISAs. To leverage the AMPs for the heap data structure, in this paper we propose ad......-heap, an efficient heap data structure that introduces an implicit bridge structure and properly apportions workloads to the two types of cores. We implement a batch k-selection algorithm and conduct experiments on simulated AMP environments composed of real CPUs and GPUs. In our experiments on two representative...
Parallel Kalman filter track fit based on vector classes
Energy Technology Data Exchange (ETDEWEB)
Kisel, Ivan [GSI Helmholtzzentrum fuer Schwerionenforschung GmbH (Germany); Kretz, Matthias [Kirchhoff-Institut fuer Physik, Ruprecht-Karls Universitaet, Heidelberg (Germany); Kulakov, Igor [Goethe-Universitaet, Frankfurt am Main (Germany); National Taras Shevchenko University, Kyiv (Ukraine)
2010-07-01
Modern high energy physics experiments have to process terabytes of input data produced in particle collisions. The core of the data reconstruction in high energy physics is the Kalman filter. Therefore, developing a fast Kalman filter algorithm, which uses the maximum available power of modern processors, is important, in particular for the initial selection of events interesting for new physics. One of the processor features that can speed up the algorithm is a SIMD instruction set, which allows several data items to be packed into one register and operated on in one go, thus achieving more operations per clock cycle. Therefore a flexible and useful interface, which uses the SIMD instruction set on different CPU and GPU processor architectures, has been realized as a vector classes library. The Kalman filter based track fitting algorithm has been implemented with use of the vector classes. Fitting quality tests show good results, with residuals equal to 49 μm and 44 μm for the x and y track parameters and a relative momentum resolution of 0.7%. A fitting time of 0.053 μs per track has been achieved on an Intel Xeon X5550 with 8 cores at 2.6 GHz by additionally using Intel Threading Building Blocks.
Vectorization in quantum chemistry
International Nuclear Information System (INIS)
Saunders, V.R.
1987-01-01
It is argued that the optimal vectorization algorithm for many steps (and sub-steps) in a typical ab initio calculation of molecular electronic structure is quite strongly dependent on the target vector machine. Details such as the availability (or lack) of a given vector construct in the hardware, vector startup times and asymptotic rates must all be considered when selecting the optimal algorithm. Illustrations are drawn from: Gaussian integral evaluation, Fock matrix construction, 4-index transformation of molecular integrals, direct-CI methods, the matrix multiply operation. A cross comparison of practical implementations on the CDC Cyber 205, the Cray-1S and Cray-XMP machines is presented. To achieve portability while remaining optimal on a wide range of machines it is necessary to code all available algorithms in a machine independent manner, and to select the appropriate algorithm using a procedure which is based on machine dependent parameters. Most such parameters concern the timing of certain vector loop kernels, which can usually be derived from a 'bench-marking' routine executed prior to the calculation proper.
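The selection procedure described in this abstract (code every algorithm portably, then choose one from timings measured before the calculation proper) can be sketched as follows. This is only an illustration under assumptions: the helper `pick_fastest` and the two loop orderings of a matrix multiply are hypothetical stand-ins, not the kernels or the benchmarking routine of the paper.

```python
import time


def matmul_ijk(a, b):
    """Matrix multiply with the dot-product (i, j, k) loop order."""
    n, inner, m = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(m)]
            for i in range(n)]


def matmul_ikj(a, b):
    """Matrix multiply with the row-update (i, k, j) loop order."""
    n, inner, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for k in range(inner):
            aik = a[i][k]
            for j in range(m):
                c[i][j] += aik * b[k][j]
    return c


def pick_fastest(candidates, sample_args, repeats=3, timer=time.perf_counter):
    """Time each candidate on sample input and return the name of the quickest.

    The best of `repeats` runs is kept for each candidate to damp timing noise.
    """
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        elapsed = float("inf")
        for _ in range(repeats):
            start = timer()
            fn(*sample_args)
            elapsed = min(elapsed, timer() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


if __name__ == "__main__":
    kernels = {"ijk": matmul_ijk, "ikj": matmul_ikj}
    sample = [[float(i + j) for j in range(40)] for i in range(40)]
    chosen = pick_fastest(kernels, (sample, sample))
    print("selected kernel:", chosen)
```

The point of the design is that the choice is made from measured, machine-dependent parameters rather than being fixed in the source code, so the same program can remain near-optimal on different machines.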
Vector assembly of colloids on monolayer substrates
Jiang, Lingxiang; Yang, Shenyu; Tsang, Boyce; Tu, Mei; Granick, Steve
2017-06-01
The key to spontaneous and directed assembly is to encode the desired assembly information to building blocks in a programmable and efficient way. In computer graphics, raster graphics encodes images on a single-pixel level, conferring fine details at the expense of large file sizes, whereas vector graphics encrypts shape information into vectors that allow small file sizes and operational transformations. Here, we adapt this raster/vector concept to a 2D colloidal system and realize 'vector assembly' by manipulating particles on a colloidal monolayer substrate with optical tweezers. In contrast to raster assembly that assigns optical tweezers to each particle, vector assembly requires a minimal number of optical tweezers that allow operations like chain elongation and shortening. This vector approach enables simple uniform particles to form a vast collection of colloidal arenes and colloidenes, the spontaneous dissociation of which is achieved with precision and stage-by-stage complexity by simply removing the optical tweezers.
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms
Energy Technology Data Exchange (ETDEWEB)
Williams, Samuel; Oliker, Leonid; Vuduc, Richard; Shalf, John; Yelick, Katherine; Demmel, James
2008-10-16
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific-optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
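For readers unfamiliar with the kernel studied in this abstract, here is a minimal sketch of sparse matrix-vector multiply using the compressed sparse row (CSR) layout. CSR is a common baseline format; that the paper uses exactly this layout is an assumption, and the helper names `to_csr` and `spmv_csr` are illustrative.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends inside values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access x[col_idx[k]] is what makes SpMV memory-bound.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[4, 0, 0],
         [0, 0, 2],
         [1, 0, 3]]
    vals, cols, ptr = to_csr(A)
    print(spmv_csr(vals, cols, ptr, [1, 2, 3]))
```

Each stored nonzero contributes one multiply-add but requires loading a value, a column index and an irregularly addressed element of x, which is consistent with the abstract's description of the kernel as memory-bound.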
Video image processor on the Spacelab 2 Solar Optical Universal Polarimeter /SL2 SOUP/
Lindgren, R. W.; Tarbell, T. D.
1981-01-01
The SOUP instrument is designed to obtain diffraction-limited digital images of the sun with high photometric accuracy. The Video Processor originated from the requirement to provide onboard real-time image processing, both to reduce the telemetry rate and to provide meaningful video displays of scientific data to the payload crew. This original concept has evolved into a versatile digital processing system with a multitude of other uses in the SOUP program. The central element in the Video Processor design is a 16-bit central processing unit based on 2900 family bipolar bit-slice devices. All arithmetic, logical and I/O operations are under control of microprograms, stored in programmable read-only memory and initiated by commands from the LSI-11. Several functions of the Video Processor are described, including interface to the High Rate Multiplexer downlink, cosmetic and scientific data processing, scan conversion for crew displays, focus and exposure testing, and use as ground support equipment.
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.
Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut
2018-05-03
Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. Contact: rene.rahn@fu-berlin.de.
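The wavefront idea mentioned in this abstract rests on a property of the dynamic programming matrix: every cell on one anti-diagonal depends only on the two previous anti-diagonals, so those cells can be computed independently. The sketch below shows this for a global alignment score with a linear gap penalty. It is a plain serial illustration, not SeqAn code; the function name and the scoring values are assumptions.

```python
def nw_score_wavefront(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment score, filling the DP matrix one anti-diagonal at a time."""
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap
    # Anti-diagonal d holds the cells with i + j == d.
    for d in range(2, n + m + 1):
        # Cells in this inner loop do not depend on each other, so they are
        # the ones a SIMD lane or a thread could process concurrently.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            sub = match if s[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + sub,   # align s[i-1] with t[j-1]
                          H[i - 1][j] + gap,       # gap in t
                          H[i][j - 1] + gap)       # gap in s
    return H[n][m]


if __name__ == "__main__":
    print(nw_score_wavefront("ACGT", "AGT"))
```

The inter-sequence layout described in the abstract takes a different route, packing the same cell of many independent alignments into one SIMD register; the anti-diagonal order shown here corresponds to the parallelization of a single alignment.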
Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors
Directory of Open Access Journals (Sweden)
Abdul Kareem PARCHUR
2012-08-01
High performance is a critical requirement for all microprocessor manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented on the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many key application areas in the modern generation. The scaling of performance in two major series of Intel Xeon processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310) has been analyzed using the performance numbers of 12 CPU2006 integer benchmarks, performance numbers that exhibit significant differences in performance. The results and analysis can be used by performance engineers, scientists and developers to better understand the performance scaling in modern generation processors.
Simulation of a parallel processor on a serial processor: The neutron diffusion equation
International Nuclear Information System (INIS)
Honeck, H.C.
1981-01-01
Parallel processors could provide the nuclear industry with very high computing power at a very moderate cost. Will we be able to make effective use of this power? This paper explores the use of a very simple parallel processor for solving the neutron diffusion equation to predict power distributions in a nuclear reactor. We first describe a simple parallel processor and estimate its theoretical performance based on the current hardware technology. Next, we show how the parallel processor could be used to solve the neutron diffusion equation. We then present the results of some simulations of a parallel processor run on a serial processor and measure some of the expected inefficiencies. Finally we extrapolate the results to estimate how actual design codes would perform. We find that the standard numerical methods for solving the neutron diffusion equation are still applicable when used on a parallel processor. However, some simple modifications to these methods will be necessary if we are to achieve the full power of these new computers. (orig.)
Exploring query execution strategies for JIT vectorization and SIMD
T.K. Gubner (Tim); P.A. Boncz (Peter)
2017-01-01
This paper partially explores the design space for efficient query processors on future hardware that is rich in SIMD capabilities. It departs from two well-known approaches: (1) interpreted block-at-a-time execution (a.k.a. "vectorization") and (2) "data-centric" JIT compilation, as in
Rank-Optimized Logistic Matrix Regression toward Improved Matrix Data Classification.
Zhang, Jianguang; Jiang, Jianmin
2018-02-01
While existing logistic regression suffers from overfitting and often fails to consider structural information, we propose a novel matrix-based logistic regression to overcome these weaknesses. In the proposed method, 2D matrices are directly used to learn two groups of parameter vectors along each dimension without vectorization, which allows the proposed method to fully exploit the underlying structural information embedded inside the 2D matrices. Further, we add a joint [Formula: see text]-norm on two parameter matrices, which are organized by aligning each group of parameter vectors in columns. This added co-regularization term has two roles: enhancing the effect of regularization and optimizing the rank during the learning process. With our proposed fast iterative solution, we carried out extensive experiments. The results show that in comparison to both the traditional tensor-based methods and the vector-based regression methods, our proposed solution achieves better performance for matrix data classifications.
Efficacy of Code Optimization on Cache-based Processors
VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data, provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just as on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system. It can be argued that although some of the important
Functional unit for a processor
Rohani, A.; Kerkhoff, Hans G.
2013-01-01
The invention relates to a functional unit for a processor, such as a Very Long Instruction Word (VLIW) processor. The invention further relates to a processor comprising at least one such functional unit. The invention further relates to a functional unit and processor capable of mitigating the effect of
Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A
2013-11-05
Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.
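The sequence of operations in this abstract (vector load, load and splat, multiply-add, accumulate) can be emulated in software to make the data flow concrete. The sketch below models vector registers as Python lists; the helper names `splat`, `fma` and `matmul_splat` are hypothetical, and the choice of loading columns of the first matrix is an assumption made for illustration rather than a detail taken from the patent.

```python
def splat(x, width):
    """Emulate 'load and splat': replicate one element across a vector register."""
    return [x] * width


def fma(acc, u, v):
    """Emulate an element-wise multiply-add: acc + u * v."""
    return [a + p * q for a, p, q in zip(acc, u, v)]


def matmul_splat(A, B):
    """Compute C = A @ B, one column of C at a time, from accumulated partial products."""
    n, inner, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for j in range(m):
        acc = [0] * n                                # accumulator for column j of C
        for k in range(inner):
            a_col = [A[i][k] for i in range(n)]      # "vector load" of column k of A
            b_rep = splat(B[k][j], n)                # "load and splat" of element B[k][j]
            acc = fma(acc, a_col, b_rep)             # partial product accumulated
        for i in range(n):
            C[i][j] = acc[i]
    return C


if __name__ == "__main__":
    print(matmul_splat([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Replicating the scalar into every lane means the multiply-add needs no cross-lane shuffles: each lane computes one element of the output column on its own.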
Off-diagonal helicity density matrix elements for vector mesons produced in polarized e+e- processes
International Nuclear Information System (INIS)
Anselmino, M.; Murgia, F.; Quintairos, P.
1999-04-01
Final state q q-bar interactions give rise to non-zero values of the off-diagonal element ρ_{1,-1} of the helicity density matrix of vector mesons produced in e+e- annihilations, as confirmed by recent OPAL data on φ, D* and K*'s. New predictions are given for ρ_{1,-1} of several mesons produced at large x_E and small p_T, i.e. collinear with the parent jet, in the annihilation of polarized e+ and e-; the results depend strongly on the elementary dynamics and allow further non-trivial tests of the standard model. (author)
Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.
2017-10-01
The paper considers results of the design and modeling of continuously logical base cells (CL BC) based on current mirrors (CM), with functions of preliminary analogue and subsequent analogue-digital processing, for creating sensor multichannel analog-to-digital converters (SMC ADCs) and image processors (IP). IPs and SMC ADCs with vector or matrix parallel inputs and outputs need active basic photosensitive cells with an extended electronic circuit, which are considered in the paper. Such basic cells and the ADCs based on them have a number of advantages: high speed and reliability, simplicity, small power consumption, and a high integration level for linear and matrix structures. We show the design of the CL BC and of the photocurrent ADC, their various possible implementations, and their simulations. We consider CL BC for methods of selection and rank preprocessing, and a linear array of ADCs with conversion to binary codes and Gray codes. In contrast to our previous works, here we dwell more on analogue preprocessing schemes for signals of neighboring cells. We show how the introduction of simple nodes based on current mirrors extends the range of functions performed by the image processor. Each channel of the structure consists of several digital-analog cells (DC) of 15 to 35 CMOS transistors. The number of DCs does not exceed the number of digits of the formed code, and for an iteration type only one DC, complemented by a selection and holding device (SHD), is required. One channel of an ADC with iteration is based on one DC-(G) and an SHD, and it has only 35 CMOS transistors. In such ADCs a parallel code can easily be realized, as can a serial-parallel output code. The circuits and the results of their simulation with OrCAD are shown. The supply voltage of the DC is 1.8 to 3.3 V, the range of the input photocurrent is 0.1 to 24 μA, and the transformation time is 20 to 30 ns at 6-8 bit binary or Gray codes. The general power consumption of the ADC with iteration is only 50 to 100 μW, if the
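Since this abstract contrasts binary and Gray output codes, a short sketch of the relation between the two may help. It shows the standard reflected binary Gray code in software only; it says nothing about how the current-mirror cells of the paper form the code, and the function names are illustrative.

```python
def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Inverse mapping: XOR together all right shifts of the Gray code word."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


if __name__ == "__main__":
    # Print the 3-bit codes side by side (binary on the left, Gray on the right).
    for value in range(8):
        print(format(value, "03b"), format(binary_to_gray(value), "03b"))
```

Adjacent values differ in exactly one bit of their Gray codes, which is the usual reason for preferring Gray output in converters: an input sitting at a code boundary can be misread by at most one level.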
Cai, Meng-Qiang; Wang, Zhou-Xiang; Liang, Juan; Wang, Yan-Kun; Gao, Xu-Zhen; Li, Yongnan; Tu, Chenghou; Wang, Hui-Tian
2017-08-01
The scheme for generating vector optical fields should have not only high efficiency but also flexibility for satisfying the requirements of various applications. However, in general, high efficiency and flexibility are not compatible. Here we present and experimentally demonstrate a solution to directly, flexibly, and efficiently generate vector vortex optical fields (VVOFs) with a reflective phase-only liquid crystal spatial light modulator (LC-SLM) based on optical birefringence of liquid crystal molecules. To generate the VVOFs, this approach needs in principle only a half-wave plate, an LC-SLM, and a quarter-wave plate. This approach has some advantages, including a simple experimental setup, good flexibility, and high efficiency, making the approach very promising in some applications when higher power is needed. This approach has a generation efficiency of 44.0%, which is much higher than the 1.1% of the common path interferometric approach.
Energy Technology Data Exchange (ETDEWEB)
Walz, H.V.
1980-07-01
An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μsec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed.
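As a rough illustration of a clipped least-mean-square update, the sketch below uses the sign-data variant, in which each input sample is replaced by its sign in the weight update. Reading "quantized (clipped)" this way is an assumption; the report's fixed-point word lengths (8-bit weights, 16-bit update arithmetic, 12-bit error) are not modeled, and all names and parameter values are hypothetical.

```python
import random


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def clipped_lms_step(w, x, d, mu):
    """One adapt cycle: w_i <- w_i + mu * e * sign(x_i), with error e = d - w.x."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = d - y
    w_new = [wi + mu * e * sign(xi) for wi, xi in zip(w, x)]
    return w_new, e


def identify(true_w, steps=5000, mu=0.05, seed=0):
    """Adapt weights to match an unknown linear system from noiseless samples."""
    rng = random.Random(seed)
    w = [0.0] * len(true_w)
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]   # one sample per weight channel
        d = sum(t * xi for t, xi in zip(true_w, x))    # desired response
        w, _ = clipped_lms_step(w, x, d, mu)
    return w


if __name__ == "__main__":
    print(identify([0.5, -0.25, 0.75, 0.1]))
```

Replacing the input by its sign removes a full multiplication from every weight update, which is the kind of simplification that suits hardware built from discrete logic.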
Accuracy in Optical Information Processing
Timucin, Dogan Aslan
Low computational accuracy is an important obstacle for optical processors which blocks their way to becoming a practical reality and a serious challenger for classical computing paradigms. This research presents a comprehensive solution approach to the problem of accuracy enhancement in discrete analog optical information processing systems. Statistical analysis of a generic three-plane optical processor is carried out first, taking into account the effects of diffraction, interchannel crosstalk, and background radiation. Noise sources included in the analysis are photon, excitation, and emission fluctuations in the source array, transmission and polarization fluctuations in the modulator, and photoelectron, gain, dark, shot, and thermal noise in the detector array. Means and mutual coherence and probability density functions are derived for both optical and electrical output signals. Next, statistical models for a number of popular optoelectronic devices are studied. Specific devices considered here are light-emitting and laser diode sources, an ideal noiseless modulator and a Gaussian random-amplitude-transmittance modulator, p-i-n and avalanche photodiode detectors followed by electronic postprocessing, and ideal free-space geometrical-optics propagation and single-lens imaging systems. Output signal statistics are determined for various interesting device combinations by inserting these models into the general formalism. Finally, based on these special-case output statistics, results on accuracy limitations and enhancement in optical processors are presented. Here, starting with the formulation of the accuracy enhancement problem as (1) an optimal detection problem and (2) as a parameter estimation problem, the potential accuracy improvements achievable via the classical multiple-hypothesis-testing and maximum likelihood and Bayesian parameter estimation methods are demonstrated. Merits of using proper normalizing transforms which can potentially stabilize
International Nuclear Information System (INIS)
Hyan, Yong Sil; Kim, Heung Tae; Kwon, Dal Gwan; Choi, Myung Joon; Cheung, Hwan
1986-01-01
Recently, demand for automatic X-ray film processors has been increasing at university hospitals, general hospitals and private clinics, but various troubles caused by incorrect control have been found. The authors studied the function and activity of automatic X-ray film processors for 2 weeks, using the Kodak RPX-OMAT, Sakura GX3000 and Doosan parka 2000 processors, and the results obtained were as follows: 1. Automatic X-ray film processors have the advantage of rapid X-ray film processing, but incorrect handling of the developing and fixing agents brought about a great change in the contrast and optical density of the X-ray film images. 2. About 300 X-ray films could be finished with the same developing and fixing solution, without exchanging it for any other solution, in each automatic X-ray film processor.
A digital-signal-processor-based optical tomographic system for dynamic imaging of joint diseases
Lasker, Joseph M.
Over the last decade, optical tomography (OT) has emerged as a viable biomedical imaging modality. Various imaging systems have been developed that are employed in preclinical as well as clinical studies, mostly targeting breast imaging, brain imaging, and cancer related studies. Of particular interest are so-called dynamic imaging studies where one attempts to image changes in optical properties and/or physiological parameters as they occur during a system perturbation. To successfully perform dynamic imaging studies, great effort is put towards system development that offers increasingly enhanced signal-to-noise performance at ever shorter data acquisition times, thus capturing high fidelity tomographic data within narrower time periods. Towards this goal, I have developed in this thesis a dynamic optical tomography system that is, unlike currently available analog instrumentation, based on digital data acquisition and filtering techniques. At the core of this instrument is a digital signal processor (DSP) that collects, collates, and processes the digitized data set. Complementary protocols between the DSP and a complex programmable logic device synchronize the sampling process and organize data flow. Instrument control is implemented through a comprehensive graphical user interface which integrates automated calibration, data acquisition, and signal post-processing. Real-time data is generated at frame rates as high as 140 Hz. An extensive dynamic range (~190 dB) accommodates a wide scope of measurement geometries and tissue types. Performance analysis demonstrates very low system noise (~1 pW rms noise equivalent power), excellent signal precision (~0.04%-0.2%) and long term system stability (~1% over 40 min). Experiments on tissue phantoms validate spatial and temporal accuracy of the system. As a potential new application of dynamic optical imaging I present the first application of this method to use vascular hemodynamics as a means of characterizing
Optical vector network analyzer based on double-sideband modulation.
Jun, Wen; Wang, Ling; Yang, Chengwu; Li, Ming; Zhu, Ning Hua; Guo, Jinjin; Xiong, Liangming; Li, Wei
2017-11-01
We report an optical vector network analyzer (OVNA) based on double-sideband (DSB) modulation using a dual-parallel Mach-Zehnder modulator. The device under test (DUT) is measured twice with different modulation schemes. By post-processing the measurement results, the response of the DUT can be obtained accurately. Since DSB modulation is used in our approach, the measurement range is doubled compared with conventional single-sideband (SSB) modulation-based OVNA. Moreover, the measurement accuracy is improved by eliminating the even-order sidebands. The key advantage of the proposed scheme is that the measurement of a DUT with bandpass response can also be simply realized, which is a big challenge for the SSB-based OVNA. The proposed method is theoretically and experimentally demonstrated.
2006-01-01
Intel’s first dual-core Itanium processor, code-named "Montecito" is a major release of Intel's Itanium 2 Processor Family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). Itanium 2 is much more powerful than its predecessor. It has lower power consumption and thermal dissipation.
International Nuclear Information System (INIS)
Kunz, P.F.; Gravina, M.; Oxoby, G.
1984-04-01
The 3081/E project was formed to prepare a much improved IBM mainframe emulator for the future. Its design is based on a large amount of experience in using the 168/E processor to increase available CPU power in both online and offline environments. The processor will be at least equal to the execution speed of a 370/168 and up to 1.5 times faster for heavy floating point code. A single processor will thus be at least four times more powerful than the VAX 11/780, and five processors on a system would equal at least the performance of the IBM 3081K. With its large memory space and simple but flexible high speed interface, the 3081/E is well suited for the online and offline needs of high energy physics in the future
Does the Intel Xeon Phi processor fit HEP workloads?
Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.
2014-06-01
This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis a vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.
Vectorization, parallelization and porting of nuclear codes (porting). Progress report fiscal 1999
Energy Technology Data Exchange (ETDEWEB)
Kawasaki, Nobuo; Nemoto, Toshiyuki; Kawai, Wataru; Ishizuki, Shigeru [Fujitsu Ltd., Tokyo (Japan); Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Yatake, Yo-ichi [Hitachi Ltd., Tokyo (Japan)
2001-01-01
Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system, the AP3000 system, the SX-4 system and the Paragon system at the Center for Promotion of Computational Science and Engineering in the Japan Atomic Energy Research Institute. We dealt with 18 codes in fiscal 1999. These results are reported in 3 parts, i.e., the vectorization and the parallelization part on vector processors, the parallelization part on scalar processors and the porting part. In this report, we describe the porting. In this porting part, the porting of Assisted Model Building with Energy Refinement code version 5 (AMBER5), general purpose Monte Carlo codes for neutron and photon transport calculations based on continuous energy and multigroup methods (MVP/GMVP), automatic editing system for MCNP library code (autonj), neutron damage calculations for materials irradiations and neutron damage calculations for compounds code (SPECTER/SPECOMP), severe accident analysis code (MELCOR) and COolant Boiling in Rod Arrays, Two-Fluid code (COBRA-TF) on the VPP500 system and/or the AP3000 system are described. (author)
Ultrastructure of the extracellular matrix of bovine dura mater, optic nerve sheath and sclera.
Raspanti, M; Marchini, M; Della Pasqua, V; Strocchi, R; Ruggeri, A
1992-10-01
The sclera, the outermost sheath of the optic nerve and the dura mater have been investigated histologically and ultrastructurally. Although these tissues appear very similar under the light microscope, being dense connective tissues mainly composed of collagen bundles and a limited amount of cells and elastic fibres, they exhibit subtle differences on electron microscopy. In the dura and sclera collagen appears in the form of large, nonuniform fibrils, similar to those commonly found in tendons, while in the optic nerve sheath the fibrils appear smaller and uniform, similar to those commonly observed in reticular tissues, vessel walls and skin. Freeze-fracture also reveals these fibrils to have different subfibrillar architectures, straight or helical, which correspond to 2 distinct forms of collagen fibril previously described (Raspanti et al. 1989). The other extracellular matrix components also vary with the particular collagen fibril structure. Despite their common embryological derivation, the dura mater, optic nerve sheath and sclera exhibit diversification of their extracellular matrix consistent with the mechanical loads to which these tissues are subjected. Our observations indicate that the outermost sheath of the optic nerve resembles the epineurium of peripheral nerves rather than the dura to which it is commonly likened.
Integrated fuel processor development
International Nuclear Information System (INIS)
Ahmed, S.; Pereira, C.; Lee, S. H. D.; Krumpelt, M.
2001-01-01
The Department of Energy's Office of Advanced Automotive Technologies has been supporting the development of fuel-flexible fuel processors at Argonne National Laboratory. These fuel processors will enable fuel cell vehicles to operate on fuels available through the existing infrastructure. The constraints of on-board space and weight require that these fuel processors be designed to be compact and lightweight, while meeting the performance targets for efficiency and gas quality needed for the fuel cell. This paper discusses the performance of a prototype fuel processor that has been designed and fabricated to operate with liquid fuels, such as gasoline, ethanol, methanol, etc. Rated for a capacity of 10 kWe (one-fifth of that needed for a car), the prototype fuel processor integrates the unit operations (vaporization, heat exchange, etc.) and processes (reforming, water-gas shift, preferential oxidation reactions, etc.) necessary to produce the hydrogen-rich gas (reformate) that will fuel the polymer electrolyte fuel cell stacks. The fuel processor work is being complemented by analytical and fundamental research. With the ultimate objective of meeting on-board fuel processor goals, these studies include: modeling fuel cell systems to identify design and operating features; evaluating alternative fuel processing options; and developing appropriate catalysts and materials. Issues and outstanding challenges that need to be overcome in order to develop practical, on-board devices are discussed
Optical threshold secret sharing scheme based on basic vector operations and coherence superposition
Deng, Xiaopeng; Wen, Wei; Mi, Xianwu; Long, Xuewen
2015-04-01
We propose, to our knowledge for the first time, a simple optical algorithm for secret image sharing with the (2,n) threshold scheme based on basic vector operations and coherence superposition. The secret image to be shared is first divided into n shadow images by use of basic vector operations. In the reconstruction stage, the secret image can be retrieved by recording the intensity of the coherence superposition of any two shadow images. Compared with the published encryption techniques which focus narrowly on information encryption, the proposed method can realize information encryption as well as secret sharing, which further ensures the safety and integrity of the secret information and prevents power from being kept centralized and abused. The feasibility and effectiveness of the proposed method are demonstrated by numerical results.
Estimation of pure autoregressive vector models for revenue series ...
African Journals Online (AJOL)
This paper aims at applying multivariate approach to Box and Jenkins univariate time series modeling to three vector series. General Autoregressive Vector Models with time varying coefficients are estimated. The first vector is a response vector, while others are predictor vectors. By matrix expansion each vector, whether ...
Pigarev, Aleksey V.; Bazarov, Timur O.; Fedorov, Vladimir V.; Ryabushkin, Oleg A.
2018-02-01
Most modern systems of optical image registration are based on matrices of photosensitive semiconductor heterostructures. However, measurement of radiation intensities up to the level of several MW/cm2 using such detectors is a great challenge because semiconductor elements have a low optical damage threshold. Reflecting or absorbing filters that can be used for attenuation of radiation intensity, as a rule, distort the beam profile. Furthermore, semiconductor based devices have a relatively narrow measurement wavelength bandwidth. We introduce a novel matrix method of optical image registration. This approach doesn't require any attenuation when measuring high radiation intensities. The sensitive element is a matrix made of thin transparent piezoelectric crystals that absorb just a small part of the incident optical power. Each crystal element has its own set of intrinsic (acoustic) vibration modes. These modes can be excited due to the inverse piezoelectric effect when an external electric field is applied to the crystal sample, provided that the field frequency corresponds to one of the vibration mode frequencies. Such piezoelectric resonances (PR) can be observed by measuring the radiofrequency response spectrum of the crystal placed between the capacitor plates. PR frequencies strongly depend on the crystal temperature. Temperature calibration of PR frequencies is conducted under uniform heating conditions. When a crystal matrix is exposed to laser radiation, the incident power can be obtained separately for each crystal element by measuring its PR frequency kinetics, provided that the optical absorption coefficient is known. The operating wavelength range of such a sensor is restricted by the transmission bandwidth of the applied crystals. A plane matrix consisting of LiNbO3 crystals was assembled in order to demonstrate the possibility of application of the proposed approach. The crystal elements were placed between two electrodes forming a capacitor which
A class of parallel algorithms for computation of the manipulator inertia matrix
Fijany, Amir; Bejczy, Antal K.
1989-01-01
Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
First experience of vectorizing electromagnetic physics models for detector simulation
International Nuclear Information System (INIS)
Amadio, G; Bianchini, C; Apostolakis, J; Bitzes, G; Brun, R; Carminati, F; Gheata, A; Novak, M; Shadura, O; Wenzel, S; Bandieramonte, M; Canal, P; Elvira, D; Jun, S Y; Lima, G; Licht, J de Fine; Duhem, L; Presbyterian, M; Seghal, R
2015-01-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project. (paper)
First experience of vectorizing electromagnetic physics models for detector simulation
Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.
2015-12-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.
First experience of vectorizing electromagnetic physics models for detector simulation
Energy Technology Data Exchange (ETDEWEB)
Amadio, G. [Sao Paulo State U.; Apostolakis, J. [CERN; Bandieramonte, M. [Catania Astrophys. Observ.; Bianchini, C. [Mackenzie Presbiteriana U.; Bitzes, G. [CERN; Brun, R. [CERN; Canal, P. [Fermilab; Carminati, F. [CERN; Licht, J.de Fine [U. Copenhagen (main); Duhem, L. [Intel, Santa Clara; Elvira, D. [Fermilab; Gheata, A. [CERN; Jun, S. Y. [Fermilab; Lima, G. [Fermilab; Novak, M. [CERN; Presbyterian, M. [Bhabha Atomic Res. Ctr.; Shadura, O. [CERN; Seghal, R. [Bhabha Atomic Res. Ctr.; Wenzel, S. [CERN
2015-12-23
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.
Energy Technology Data Exchange (ETDEWEB)
Yeung, Yu-Hong; Pothen, Alex; Halappanavar, Mahantesh; Huang, Zhenyu
2017-10-09
We present an augmented matrix approach to update the solution to a linear system of equations when the coefficient matrix is modified by a few elements within a principal submatrix. This problem arises in the dynamic security analysis of a power grid, where operators need to perform $N-x$ contingency analysis, i.e., determine the state of the system when up to $x$ links from $N$ fail. Our algorithms augment the coefficient matrix to account for the changes in it, and then compute the solution to the augmented system without refactoring the modified matrix. We provide two algorithms, a direct method, and a hybrid direct-iterative method for solving the augmented system. We also exploit the sparsity of the matrices and vectors to accelerate the overall computation. Our algorithms are compared on three power grids with PARDISO, a parallel direct solver, and CHOLMOD, a direct solver with the ability to modify the Cholesky factors of the coefficient matrix. We show that our augmented algorithms outperform PARDISO (by two orders of magnitude), and CHOLMOD (by a factor of up to 5). Further, our algorithms scale better than CHOLMOD as the number of elements updated increases. The solutions are computed with high accuracy. Our algorithms are capable of computing $N-x$ contingency analysis on a $778K$ bus grid, updating a solution with $x=20$ elements in $1.6 \\times 10^{-2}$ seconds on an Intel Xeon processor.
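The update idea in the abstract above can be sketched on a tiny dense example. This is only one way to read it, under stated assumptions: the modified matrix is written as A + H D H^T, where H selects the changed indices and D holds the changes to the principal submatrix, and the augmented unknowns (x, y) satisfy A x + H y = b with y = D H^T x. The function names (lu_factor, lu_solve, solve_updated) are hypothetical, the matrices are dense, and none of the paper's sparsity handling or its hybrid direct-iterative variant is reproduced.

```python
# Sketch: solve (A + H D H^T) x = b while reusing the LU factors of A.
# All names are illustrative assumptions, not the paper's implementation.

def lu_factor(a):
    """In-place style LU with partial pivoting; returns (lu, perm)."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm

def lu_solve(factors, b):
    """Solve A x = b from precomputed factors (forward, then back substitution)."""
    lu, perm = factors
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y

def solve_updated(factors, b, idx, delta):
    """Solve the modified system; idx lists changed rows/cols, delta is D (m x m)."""
    n, m = len(b), len(idx)
    x0 = lu_solve(factors, b)  # A^{-1} b
    cols = []                  # columns of A^{-1} H, one reused-factor solve each
    for j in idx:
        e = [0.0] * n
        e[j] = 1.0
        cols.append(lu_solve(factors, e))
    # Small m x m system: (I + D H^T A^{-1} H) y = D H^T A^{-1} b
    s = [[(1.0 if r == c else 0.0)
          + sum(delta[r][k] * cols[c][idx[k]] for k in range(m))
          for c in range(m)] for r in range(m)]
    rhs = [sum(delta[r][k] * x0[idx[k]] for k in range(m)) for r in range(m)]
    y = lu_solve(lu_factor(s), rhs)
    # x = A^{-1} (b - H y)
    return [x0[i] - sum(cols[c][i] * y[c] for c in range(m)) for i in range(n)]
```

Only the small m x m system is factored after the change; the factors of the original matrix are reused for every other solve, which is the property the abstract emphasizes.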
Massively parallel performance of neutron transport response matrix algorithms
International Nuclear Information System (INIS)
Hanebutte, U.R.; Lewis, E.E.
1993-01-01
Massively parallel red/black response matrix algorithms for the solution of within-group neutron transport problems are implemented on the Connection Machines-2, 200 and 5. The response matrices are derived from the diamond-differences and linear-linear nodal discrete ordinate and variational nodal P3 approximations. The unaccelerated performance of the iterative procedure is examined relative to the maximum rated performances of the machines. The effects of processor partition size, of virtual processor ratio and of problem size are examined in detail. For the red/black algorithm, the ratio of inter-node communication to computing times is found to be quite small, normally of the order of ten percent or less. Performance increases with problem size and with virtual processor ratio, within the memory per physical processor limitation. Algorithm adaptation to coarser-grain machines is straightforward, with total computing time being virtually inversely proportional to the number of physical processors. (orig.)
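The red/black ordering named in the abstract above can be illustrated on a much simpler model problem. The sketch below is an assumption-laden stand-in: it applies red/black Gauss-Seidel sweeps to the five-point Laplace stencil on a small grid, not the response matrix iteration of the paper, and the function name red_black_sweeps is hypothetical. The point it shows is that nodes of one colour depend only on nodes of the other colour, so every node of a colour could be updated at the same time.

```python
# Sketch: red/black Gauss-Seidel on a 2D Laplace model problem.
# This is a generic illustration of the colouring, not the paper's algorithm.

def red_black_sweeps(grid, sweeps):
    """Update interior nodes in place; boundary values stay fixed."""
    rows, cols = len(grid), len(grid[0])
    for _ in range(sweeps):
        for color in (0, 1):  # 0 = "red" nodes, 1 = "black" nodes
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 == color:
                        # Neighbours all have the opposite colour, so the
                        # updates within one colour are mutually independent.
                        grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                             + grid[i][j - 1] + grid[i][j + 1])
    return grid
```

In this serial sketch the two inner loops run one node at a time; on a parallel machine each colour's half-sweep would be a single data-parallel step followed by a neighbour exchange.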
DEFF Research Database (Denmark)
Holbek, Simon
, if this significant reduction in the element count can still provide precise and robust 3-D vector flow estimates in a plane. The study concludes that the RC array is capable of estimating precise 3-D vector flow both in a plane and in a volume, despite the low channel count. However, some inherent new challenges...... ultrasonic vector flow estimation and bring it a step closer to a clinical application. A method for high frame rate 3-D vector flow estimation in a plane using the transverse oscillation method combined with a 1024 channel 2-D matrix array is presented. The proposed method is validated both through phantom...... hampers the task of real-time processing. In a second study, some of the issue with the 2-D matrix array are solved by introducing a 2-D row-column (RC) addressing array with only 62 + 62 elements. It is investigated both through simulations and via experimental setups in various flow conditions...
Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.
2017-11-01
The current trend in processor manufacturing focuses on multi-core architectures rather than on increasing the clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created a huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-architecture-based GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are taken to compare the performance of throughput computing applications using shared-memory programming in OpenMP and CUDA API based programming.
The Molen Polymorphic Media Processor
Kuzmanov, G.K.
2004-01-01
In this dissertation, we address high performance media processing based on a tightly coupled co-processor architectural paradigm. More specifically, we introduce a reconfigurable media augmentation of a general purpose processor and implement it into a fully operational processor prototype. The
XOP: A second generation fast processor for on-line use in high energy physics experiments
International Nuclear Information System (INIS)
Lingjaerde, T.
1981-01-01
Processors for trigger calculations and data compression in high energy physics are characterized by a high data input capability combined with fast execution of relatively simple routines. In order to achieve the required performance it is advantageous to replace the classical computer instruction-set by microcoded instructions, the various fields of which control the internal subunits in parallel. The fast processor called ESOP is based on such a principle: the different operations are handled step by step by dedicated optimized modules under control of a central instruction unit. Thus, the arithmetic operations, address calculations, conditional checking, loop counts and next instruction evaluation all overlap in time. Based upon the experience from ESOP the architecture of a new processor 'XOP' is beginning to take shape which will be faster and easier to use. In this context the most important innovations are: easy handling of operands in the arithmetic unit by means of three data buses and large data files, a powerful data addressing unit for easy handling of vectors, as well as single operands, and a very flexible logic for conditional branching. Input/output will be made transparent through the introduction of internal fast processors which will be used in conjunction with powerful firmware as a software debugging aid. (orig.)
Does the Intel Xeon Phi processor fit HEP workloads?
International Nuclear Information System (INIS)
Nowak, A; Bitzes, G; Dotti, A; Lazzaro, A; Jarp, S; Szostek, P; Valsan, L; Botezatu, M; Leduc, J
2014-01-01
This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis a vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.
Bodewig, E
1959-01-01
Matrix Calculus, Second Revised and Enlarged Edition focuses on systematic calculation with the building blocks of a matrix and rows and columns, shunning the use of individual elements. The publication first offers information on vectors, matrices, further applications, measures of the magnitude of a matrix, and forms. The text then examines eigenvalues and exact solutions, including the characteristic equation, eigenrows, extremum properties of the eigenvalues, bounds for the eigenvalues, elementary divisors, and bounds for the determinant. The text ponders on approximate solutions, as well
Acceleration of spiking neural network based pattern recognition on NVIDIA graphics processors.
Han, Bing; Taha, Tarek M
2010-04-01
There is currently a strong push in the research community to develop biological scale implementations of neuron based vision models. Systems at this scale are computationally demanding and generally utilize more accurate neuron models, such as the Izhikevich and the Hodgkin-Huxley models, in favor of the more popular integrate and fire model. We examine the feasibility of using graphics processing units (GPUs) to accelerate a spiking neural network based character recognition network to enable such large scale systems. Two versions of the network utilizing the Izhikevich and Hodgkin-Huxley models are implemented. Three NVIDIA general-purpose (GP) GPU platforms are examined, including the GeForce 9800 GX2, the Tesla C1060, and the Tesla S1070. Our results show that the GPGPUs can provide significant speedup over conventional processors. In particular, the fastest GPGPU utilized, the Tesla S1070, provided a speedup of 5.6 and 84.4 over highly optimized implementations on the fastest central processing unit (CPU) tested, a quadcore 2.67 GHz Xeon processor, for the Izhikevich and the Hodgkin-Huxley models, respectively. The CPU implementation utilized all four cores and the vector data parallelism offered by the processor. The results indicate that GPUs are well suited for this application domain.
Scaling and optimizing the Gysela code on a cluster of many-core processors
Latu, Guillaume; Asahi, Yuuichi; Bigot, Julien; Fehér, Tamás; Grandgirard, Virginie
2018-01-01
The current generation of the Xeon Phi Knights Landing (KNL) processor provides a highly multi-threaded environment on which regular programming models such as MPI/OpenMP can be used. This specific hardware offers both large memory bandwidth and large computing resources and is currently available on computing facilities. Many factors impact the performance achieved by applications, one of the key points is the efficient exploitation of SIMD vector units, another one is the memory access patt...
Halder, P.; Chakraborty, A.; Deb Roy, P.; Das, H. S.
2014-09-01
, including test data, etc.: 120226886 Distribution format: tar.gz Programming language: Java, Fortran95. Computer: Any Windows or Linux systems capable of hosting a java runtime environment, java3D and fortran95 compiler; Developed on 2.40 GHz Intel Core i3. Operating system: Any Windows or Linux systems capable of hosting a java runtime environment, java3D and fortran95 compiler. RAM: Ranging from a few Mbytes to several Gbytes, depending on the input parameters. Classification: 1.3. External routines: jfreechart-1.0.14 [1] (free plotting library for java), j3d-jre-1.5.2 [2] (3D visualization). Nature of problem: Optical properties of cosmic dust aggregates. Solution method: Java application based on Mackowski and Mischenko's Superposition T-Matrix code. Restrictions: The program is designed for single processor systems. Additional comments: The distribution file for this program is over 120 Mbytes and therefore is not delivered directly when Download or Email is requested. Instead a html file giving details of how the program can be obtained is sent. Running time: Ranging from few minutes to several hours, depending on the input parameters. References: [1] http://www.jfree.org/index.html [2] https://java3d.java.net/
EPR and optical spectroscopic studies of neutral free radicals in an adamantane matrix
International Nuclear Information System (INIS)
Jordan, J.E.
1975-03-01
Recent work in our laboratory has demonstrated that neutral free radicals produced by x-irradiation and trapped in adamantane exhibit exceedingly long lifetimes because of the lack of rapid diffusion in the solid matrix. This observation and the fact that samples can be pressed into pellets with high optical transparency in the visible and near uv regions of the spectrum suggested to us that this unique matrix might be used for studying the optical properties of free radicals. The results of a wide variety of experiments of this type are described in this thesis. These include experiments in which secondary free radicals are produced by photoinduced decomposition of primary free radicals by selective irradiation with visible light, the observation of strong optical absorption spectra of free radicals at room temperature using a Cary 14 spectrophotometer, the finding that certain free radicals exhibit strong, visible fluorescence when irradiated with uv light, and the discovery that the absorption intensity of multiplicity-forbidden transition in singlet and doublet state species is enhanced relative to spin-allowed transitions by at least three orders of magnitude. An analysis of these results in terms of molecular orbital theory is given, and experiments designed to obtain the epr spectra of electronically-excited states of free radicals are described
Chue-Sang, Joseph; Bai, Yuqiang; Stoff, Susan; Gonzalez, Mariacarla; Holness, Nola; Gomes, Jefferson; Jung, Ranu; Gandjbakhche, Amir; Chernomordik, Viktor V; Ramella-Roman, Jessica C
2017-08-01
Preterm birth (PTB) presents a serious medical health concern throughout the world. There is a high incidence of PTB in both developed and developing countries ranging from 11% to 15%, respectively. Recent research has shown that cervical collagen orientation and distribution changes during pregnancy may be useful in predicting PTB. Polarization imaging is an effective means to measure optical anisotropy in birefringent materials, such as the cervix's extracellular matrix. Noninvasive, full-field Mueller matrix polarimetry (MMP) imaging methodologies, and optical coherence tomography (OCT) imaging were used to assess cervical collagen content and structure in nonpregnant porcine cervices. We demonstrate that the highly ordered structure of the nonpregnant porcine cervix can be observed with MMP. Furthermore, when utilized ex vivo, OCT and MMP yield very similar results with a mean error of 3.46% between the two modalities.
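As background for the Mueller matrix polarimetry mentioned in the abstract above, the basic operation is a 4x4 Mueller matrix acting on a 4-element Stokes vector (I, Q, U, V). The sketch below uses the textbook Mueller matrix of an ideal linear polarizer as the optical element; it is not the authors' imaging pipeline or a tissue model, and the function names are hypothetical.

```python
# Sketch: apply a Mueller matrix to a Stokes vector (textbook ideal polarizer).
# Illustrative only; not the measurement or decomposition used in the study.
import math

def linear_polarizer(theta):
    """Mueller matrix of an ideal linear polarizer, axis at theta radians."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return [
        [0.5,     0.5 * c,     0.5 * s,     0.0],
        [0.5 * c, 0.5 * c * c, 0.5 * c * s, 0.0],
        [0.5 * s, 0.5 * c * s, 0.5 * s * s, 0.0],
        [0.0,     0.0,         0.0,         0.0],
    ]

def apply_mueller(m, stokes):
    """Output Stokes vector S_out = M * S_in."""
    return [sum(m[r][k] * stokes[k] for k in range(4)) for r in range(4)]

def degree_of_polarization(stokes):
    """sqrt(Q^2 + U^2 + V^2) / I, or 0 when there is no light."""
    i, q, u, v = stokes
    return math.sqrt(q * q + u * u + v * v) / i if i > 0 else 0.0

# Unpolarized light through a horizontal polarizer: half the intensity
# passes and the output is fully polarized.
out = apply_mueller(linear_polarizer(0.0), [1.0, 0.0, 0.0, 0.0])
print(out, degree_of_polarization(out))  # [0.5, 0.5, 0.0, 0.0] 1.0
```

A polarimeter works in the opposite direction: it probes the sample with several known input states and analyzes the outputs to recover the sample's 16 Mueller matrix elements, from which anisotropy measures are derived.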
The symbol coding language for the BUTs processor of in-core reactor control systems
International Nuclear Information System (INIS)
Vorob'ev, D.M.; Golovanov, M.N.; Levin, G.L.; Parfenova, T.K.; Filatov, V.P.
1978-01-01
A symbolic coding language is described; it has been developed to automate the writing of programs for in-core control systems. The systems use the ideology of the CAMAC-VECTOR system and include the BUTs-20 processor. The symbolic coding language has been developed as a programming language of the ASSEMBLER type. Operators of instructions and pseudo-instructions, the rules of reading in the text of the source program, and operator record formats are considered.
Kiniry, Joseph R.; Cheong, Elaine
1998-01-01
The Java Pre-Processor, or JPP for short, is a parsing pre-processor for the Java programming language. Unlike its namesake (the C/C++ Pre-Processor, cpp), JPP provides functionality above and beyond simple textual substitution. JPP's capabilities include code beautification, code standard conformance checking, class and interface specification and testing, and documentation generation.
Focusing behavior of the fractal vector optical fields designed by fractal lattice growth model.
Gao, Xu-Zhen; Pan, Yue; Zhao, Meng-Dan; Zhang, Guan-Lin; Zhang, Yu; Tu, Chenghou; Li, Yongnan; Wang, Hui-Tian
2018-01-22
We introduce a general fractal lattice growth model, significantly expanding the application scope of the fractal in the realm of optics. This model can be applied to construct various kinds of fractal "lattices" and then, combined with various "bases", to design a great diversity of fractal vector optical fields (F-VOFs). We also experimentally generate the F-VOFs and explore their universal focusing behaviors. Multiple focal spots can be flexibly engineered, and the optical tweezers experiment validates the simulated tight focusing fields, which means that this model allows the diversity of the focal patterns to flexibly trap and manipulate micrometer-sized particles. Furthermore, the recovery performance of the F-VOFs is also studied when the input fields and spatial frequency spectrum are obstructed, and the results confirm the robustness of the F-VOFs in both focusing and imaging processes, which is very useful in information transmission.
Pérez López, César
2014-01-01
MATLAB is a high-level language and environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications. The language, tools, and built-in math functions enable you to explore multiple approaches and reach a solution faster than with spreadsheets or traditional programming languages, such as C/C++ or Java. MATLAB Matrix Algebra introduces you to the MATLAB language with practical hands-on instructions and results, allowing you to quickly achieve your goals. Starting with a look at symbolic and numeric variables, with an emphasis on vector and matrix variables, you will go on to examine functions and operations that support vectors and matrices as arguments, including those based on analytic parent functions. Computational methods for finding eigenvalues and eigenvectors of matrices are detailed, leading to various matrix decompositions. Applications such as change of bases, the classification of quadratic forms and ...
Energy Technology Data Exchange (ETDEWEB)
Ushenko, V A; Sidor, M I [Yuriy Fedkovych Chernivtsi National University, Chernivtsi (Ukraine); Marchuk, Yu F; Pashkovskaya, N V; Andreichuk, D R [Bukovinian State Medical University, Chernivtsi (Ukraine)
2015-03-31
We report a model of Mueller-matrix description of optical anisotropy of protein networks in biological tissues with allowance for the linear birefringence and dichroism. The model is used to construct the reconstruction algorithms of coordinate distributions of phase shifts and the linear dichroism coefficient. In the statistical analysis of such distributions, we have found the objective criteria of differentiation between benign and malignant tissues of the female reproductive system. From the standpoint of evidence-based medicine, we have determined the operating characteristics (sensitivity, specificity and accuracy) of the Mueller-matrix reconstruction method of optical anisotropy parameters and demonstrated its effectiveness in the differentiation of benign and malignant tumours. (laser applications and other topics in quantum electronics)
R-Matrix Theory of Atomic Collisions Application to Atomic, Molecular and Optical Processes
Burke, Philip George
2011-01-01
Commencing with a self-contained overview of atomic collision theory, this monograph presents recent developments of R-matrix theory and its applications to a wide range of atomic, molecular and optical processes. These developments include electron and photon collisions with atoms, ions and molecules required in the analysis of laboratory and astrophysical plasmas, multiphoton processes required in the analysis of superintense laser interactions with atoms and molecules, and positron collisions with atoms and molecules required in antimatter studies of scientific and technological importance. Basic mathematical results and general and widely used R-matrix computer programs are summarized in the appendices.
International Nuclear Information System (INIS)
Rabb, Savelas A.; Olesik, John W.
2008-01-01
The ability to obtain high precision, high accuracy measurements in samples with complex matrices using High Performance Inductively Coupled Plasma-Optical Emission Spectroscopy (HP-ICP-OES) was investigated. The Common Analyte Internal Standard (CAIS) procedure was incorporated into the High Performance Inductively Coupled Plasma-Optical Emission Spectroscopy method to correct for matrix-induced changes in emission intensity ratios. Matrix matching and standard addition approaches to minimize matrix-induced errors when using High Performance Inductively Coupled Plasma-Optical Emission Spectroscopy were also assessed. The High Performance Inductively Coupled Plasma-Optical Emission Spectroscopy method was tested with synthetic solutions in a variety of matrices, alloy standard reference materials and geological reference materials.
Optical Doppler tomography based on a field programmable gate array
DEFF Research Database (Denmark)
Larsen, Henning Engelbrecht; Nilsson, Ronnie Thorup; Thrane, Lars
2008-01-01
We report the design of and results obtained by using a field programmable gate array (FPGA) to digitally process optical Doppler tomography signals. The processor fits into the analog signal path in an existing optical coherence tomography setup. We demonstrate both Doppler frequency and envelope...... extraction using the Hilbert transform, all in a single FPGA. An FPGA implementation has certain advantages over a general-purpose digital signal processor (DSP) because its processing elements operate in parallel, whereas the DSP is primarily a sequential processor....
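The Hilbert-transform step mentioned in this abstract can be illustrated in software. The following is only a minimal sketch, not the FPGA design of the record: it assumes a sampled real signal, builds the analytic signal with a plain DFT, and reads the envelope from its magnitude and the Doppler frequency from its sample-to-sample phase step. The function names and the sampling rate `fs` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (adequate for short test signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def analytic_signal(x):
    """Return x + j*Hilbert(x) by zeroing the negative-frequency bins."""
    n = len(x)
    spectrum = dft(x)
    gain = [0.0] * n
    gain[0] = 1.0                      # keep DC unchanged
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin unchanged
        for k in range(1, n // 2):
            gain[k] = 2.0              # double the positive frequencies
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    return dft([spectrum[k] * gain[k] for k in range(n)], inverse=True)


def envelope_and_frequency(x, fs):
    """Envelope |z[i]| and instantaneous frequency (Hz) from the analytic signal z."""
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]
    # The phase advance between consecutive samples gives the local frequency.
    frequency = [
        cmath.phase(z[i + 1] * z[i].conjugate()) * fs / (2 * math.pi)
        for i in range(len(z) - 1)
    ]
    return envelope, frequency
```

For a pure tone that fits an integer number of cycles into the window, the envelope is constant and the estimated frequency equals the tone frequency; real Doppler signals would need windowing and averaging, which this sketch omits.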
CLOUD DETECTION OF OPTICAL SATELLITE IMAGES USING SUPPORT VECTOR MACHINE
Directory of Open Access Journals (Sweden)
K.-Y. Lee
2016-06-01
Cloud covers are generally present in optical remote-sensing images, which limit the usage of acquired images and increase the difficulty of data analysis, such as image compositing, correction of atmosphere effects, calculations of vegetation indices, land cover classification, and land cover change detection. In previous studies, thresholding is a common and useful method in cloud detection. However, a selected threshold is usually suitable only for certain cases or local study areas, and it may fail in other cases. In other words, thresholding-based methods are data-sensitive. Besides, there are many exceptions to control, and the environment changes dynamically. Using the same threshold value on various data is not effective. In this study, a threshold-free method based on Support Vector Machine (SVM) is proposed, which can avoid the abovementioned problems. A statistical model is adopted to detect clouds instead of a subjective thresholding-based method, which is the main idea of this study. The features used in a classifier are the key to a successful classification. As a result, the Automatic Cloud Cover Assessment (ACCA) algorithm, which is based on physical characteristics of clouds, is used to distinguish the clouds and other objects. In the same way, the algorithm called Fmask (Zhu et al., 2012) uses a lot of thresholds and criteria to screen clouds, cloud shadows, and snow. Therefore, the algorithm of feature extraction is based on the ACCA algorithm and Fmask. Spatial and temporal information are also important for satellite images. Consequently, co-occurrence matrix and temporal variance with uniformity of the major principal axis are used in the proposed method. We aim to classify images into three groups: cloud, non-cloud and the others. In experiments, images acquired by the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and images containing the landscapes of agriculture, snow area, and island are tested. Experiment results demonstrate
Cloud Detection of Optical Satellite Images Using Support Vector Machine
Lee, Kuan-Yi; Lin, Chao-Hung
2016-06-01
Cloud covers are generally present in optical remote-sensing images, which limit the usage of acquired images and increase the difficulty of data analysis, such as image compositing, correction of atmosphere effects, calculations of vegetation indices, land cover classification, and land cover change detection. In previous studies, thresholding is a common and useful method in cloud detection. However, a selected threshold is usually suitable only for certain cases or local study areas, and it may fail in other cases. In other words, thresholding-based methods are data-sensitive. Besides, there are many exceptions to control, and the environment changes dynamically. Using the same threshold value on various data is not effective. In this study, a threshold-free method based on Support Vector Machine (SVM) is proposed, which can avoid the abovementioned problems. A statistical model is adopted to detect clouds instead of a subjective thresholding-based method, which is the main idea of this study. The features used in a classifier are the key to a successful classification. As a result, the Automatic Cloud Cover Assessment (ACCA) algorithm, which is based on physical characteristics of clouds, is used to distinguish the clouds and other objects. In the same way, the algorithm called Fmask (Zhu et al., 2012) uses a lot of thresholds and criteria to screen clouds, cloud shadows, and snow. Therefore, the algorithm of feature extraction is based on the ACCA algorithm and Fmask. Spatial and temporal information are also important for satellite images. Consequently, co-occurrence matrix and temporal variance with uniformity of the major principal axis are used in the proposed method. We aim to classify images into three groups: cloud, non-cloud and the others. In experiments, images acquired by the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and images containing the landscapes of agriculture, snow area, and island are tested. Experiment results demonstrate the detection
Producing chopped firewood with firewood processors
International Nuclear Information System (INIS)
Kaerhae, K.; Jouhiaho, A.
2009-01-01
The TTS Institute's research and development project studied both the productivity of new, chopped firewood processors (cross-cutting and splitting machines) suitable for professional and independent small-scale production, and the costs of the chopped firewood produced. Seven chopped firewood processors were tested in the research, six of which were sawing processors and one shearing processor. The chopping work was carried out using wood feeding racks and a wood lifter. The work was also carried out without any feeding appliances. Altogether 132.5 solid m³ of wood were chopped in the time studies. The firewood processor used had the most significant impact on chopping work productivity. In addition to the firewood processor, the stem mid-diameter, the length of the raw material, and the length of the firewood were also found to affect productivity. The wood feeding systems also affected productivity. If there is a feeding rack and hydraulic grapple loader available for use in chopping firewood, then it is worth using the wood feeding rack. A wood lifter is only worth using with the largest stems (over 20 cm mid-diameter) if a feeding rack cannot be used. When producing chopped firewood from small-diameter wood, i.e. with a mid-diameter less than 10 cm, the costs of chopping work were over 10 EUR solid m⁻³ with sawing firewood processors. The shearing firewood processor with a guillotine blade achieved a cost level of 5 EUR solid m⁻³ when the mid-diameter of the chopped stem was 10 cm. In addition to the raw material, cost-efficient chopping work also requires several hundred annual operating hours with a firewood processor, which is difficult for individual firewood entrepreneurs to achieve. The operating hours of firewood processors can be increased to the required level by the joint use of the processors by a number of firewood entrepreneurs. (author)
Opto-VLSI-based reconfigurable free-space optical interconnects architecture
DEFF Research Database (Denmark)
Aljada, Muhsen; Alameh, Kamal; Chung, Il-Sug
2007-01-01
is the Opto-VLSI processor which can be driven by digital phase steering and multicasting holograms that reconfigure the optical interconnects between the input and output ports. The optical interconnects architecture is experimentally demonstrated at 2.5 Gbps using high-speed 1×3 VCSEL array and 1......×3 photoreceiver array in conjunction with two 1×4096 pixel Opto-VLSI processors. The minimisation of the crosstalk between the output ports is achieved by appropriately aligning the VCSEL and PD elements with respect to the Opto-VLSI processors and driving the latter with optimal steering phase holograms....
Breaking the Memory Bottleneck with an Optical Data Path
National Research Council Canada - National Science Library
Fritts, Jason E; Chamberlain, Roger D
2005-01-01
.... Through a simulation-based performance analysis of a 1 GHz processor model, we provide a preliminary evaluation of the benefits of an optical processor-to-memory bus in both eliminating the bandwidth...
Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.
2014-03-01
The processor-memory performance gap, commonly referred to as the "Memory Wall" problem, is due to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.
Liu, Gui-Geng; Wang, Ke; Lee, Yun-Han; Wang, Dan; Li, Ping-Ping; Gou, Fangwang; Li, Yongnan; Tu, Chenghou; Wu, Shin-Tson; Wang, Hui-Tian
2018-02-15
Vortex vector optical fields (VVOFs) refer to a kind of vector optical field with an azimuth-variant polarization and a helical phase, simultaneously. Such a VVOF is defined by the topological index of the polarization singularity and the topological charge of the phase vortex. We present a simple method to measure the topological charge and index of VVOFs by using a space-variant half-wave plate (SV-HWP). The geometric phase grating of the SV-HWP diffracts a VVOF into ±1 orders with orthogonally left- and right-handed circular polarizations. By inserting a polarizer behind the SV-HWP, the two circular polarization states project into the linear polarization and then interfere with each other to form the interference pattern, which enables the direct measurement of the topological charge and index of VVOFs.
A program system for ab initio MO calculations on vector and parallel processing machines. Pt. 1
International Nuclear Information System (INIS)
Ernenwein, R.; Rohmer, M.M.; Benard, M.
1990-01-01
We present a program system for ab initio molecular orbital calculations on vector and parallel computers. The present article is devoted to the computation of one- and two-electron integrals over contracted Gaussian basis sets involving s-, p-, d- and f-type functions. The McMurchie and Davidson (MMD) algorithm has been implemented and parallelized by distributing over a limited number of logical tasks the calculation of the 55 relevant classes of integrals. All sections of the MMD algorithm have been efficiently vectorized, leading to a scalar/vector ratio of 5.8. Different algorithms are proposed and compared for an optimal vectorization of the contraction of the 'intermediate integrals' generated by the MMD formalism. Advantage is taken of the dynamic storage allocation for tuning the length of the vector loops (i.e. the size of the vectorization buffer) as a function of (i) the total memory available for the job, (ii) the number of logical tasks defined by the user (≤13), and (iii) the storage requested by each specific class of integrals. Test calculations carried out on a CRAY-2 computer show that the average number of finite integrals computed over a (s, p, d, f) CGTO basis set is about 1180000 per second and per processor. The combination of vectorization and parallelism on this 4-processor machine reduces the CPU time by a factor larger than 20 with respect to the scalar and sequential performance. (orig.)
Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A
2014-02-11
Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
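The sequence of operations in this abstract (vector load, load-and-splat, cross multiply-add, accumulation of partial products) can be mimicked in scalar code. The following is a minimal sketch under stated assumptions, not the mechanism of the record: complex values are stored as (real, imaginary) pairs, a Python list stands in for a vector register, and the function names are hypothetical.

```python
def cross_multiply_add(acc, a_vec, b_splat):
    """Add a_vec[i] * b to acc[i], with every value held as a (re, im) pair.

    b_splat is the single complex value that a load-and-splat would
    replicate across the second register.
    """
    b_re, b_im = b_splat
    out = []
    for (acc_re, acc_im), (a_re, a_im) in zip(acc, a_vec):
        out.append((
            acc_re + a_re * b_re - a_im * b_im,   # real part of the partial product
            acc_im + a_re * b_im + a_im * b_re,   # imaginary part (the cross terms)
        ))
    return out


def complex_matmul(A, B):
    """C = A * B, built column by column from accumulated partial products."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[(0.0, 0.0)] * m for _ in range(n)]
    for j in range(m):
        acc = [(0.0, 0.0)] * n                    # stands in for the result register
        for p in range(k):
            a_col = [A[i][p] for i in range(n)]   # "vector load" of column p of A
            acc = cross_multiply_add(acc, a_col, B[p][j])
        for i in range(n):
            C[i][j] = acc[i]
    return C
```

Each pass of the inner loop contributes one partial product to a whole column of the result, which is why the second operand only needs to be replicated rather than loaded as a full vector.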
Parallel optical information, concept, and response evolver: POINCARE
Caulfield, H. John; Caulfield, Kimberly
1991-08-01
It is now possible to build a nonlinear adaptive system which will incorporate many of the properties of the human mind, such as true originality in such skills as reasoning by analogy and reasoning by retrodiction, including literally unpredictable thoughts; and development of individual styles, personalities, expertise, etc. Like humans, these optical processors will have a rich 'subconscious' experience. Like humans, they will be clonable, but clones will develop differently as they experience the world differently, make different decisions, develop different habits, etc. In short, powerful optical processors with some of the properties normally associated with human intelligence can be made. This approach can result in a powerful optical processor with those properties. A demonstration chosen for simplicity of implementation is suggested. This could be the first computer of any type which uses quantum indeterminacy in an integral and important way.
Ces-VP: consultation expert system for vector programming of nuclear codes
International Nuclear Information System (INIS)
Fujisaki, Masahide; Makino, Mitsuhiro; Ishiguro, Misako
1988-08-01
Ces-VP is a prototype rule-based expert system for consultation on vector programming, based on the knowledge of vectorization of nuclear codes accumulated at JAERI over the past 10 years. Experts in vectorization can restructure nuclear codes for high performance on vector processors, since they have the know-how to choose the best technique among the many techniques acquired from past vectorization experience. Trial and error will be reduced if a beginner can easily use the know-how of experts. In this report, the contents of Ces-VP and its intention are shown first. Then, the method for acquiring the know-how of vectorization and the method for making rules from the know-how are described. The outline of Ces-VP implemented on the Fujitsu expert tool ESHELL is described. Finally, the availability of Ces-VP is evaluated from data gathered in practical use, and its present problems are discussed. (author)
Efficient probabilistic model checking on general purpose graphic processors
Bosnacki, D.; Edelkamp, S.; Sulewski, D.; Pasareanu, C.S.
2009-01-01
We present algorithms for parallel probabilistic model checking on general purpose graphic processing units (GPGPUs). For this purpose we exploit the fact that some of the basic algorithms for probabilistic model checking rely on matrix vector multiplication. Since this kind of linear algebraic
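As an illustration of how matrix-vector multiplication enters probabilistic model checking, the sketch below computes reachability probabilities in a small discrete-time Markov chain by repeated matrix-vector products. It is a minimal CPU example under stated assumptions, not the GPU algorithm of the record; the function name, the dense row-stochastic matrix `P`, and the fixed iteration count are hypothetical choices.

```python
def reach_probabilities(P, targets, iterations=200):
    """Approximate the probability of eventually reaching a target state.

    P is a row-stochastic transition matrix given as a list of rows, and
    targets is a set of state indices. Each iteration is one matrix-vector
    product followed by re-pinning the target states to probability 1.
    """
    n = len(P)
    x = [1.0 if s in targets else 0.0 for s in range(n)]
    for _ in range(iterations):
        y = [sum(P[s][t] * x[t] for t in range(n)) for s in range(n)]
        x = [1.0 if s in targets else y[s] for s in range(n)]
    return x
```

The product in the inner loop is independent for every row, which is the part that a data-parallel processor can evaluate concurrently; a practical implementation would use a sparse matrix format and a convergence test instead of a fixed iteration count.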
AMD's 64-bit Opteron processor
CERN. Geneva
2003-01-01
This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included. Biographies: David Rich. David directs AMD's efforts in high performance computing and also in the use of Opteron processors...
Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor
Byun, Chansup; Kepner, Jeremy; Arcand, William; Bestor, David; Bergeron, Bill; Gadepally, Vijay; Houle, Michael; Hubbell, Matthew; Jones, Michael; Klein, Anna; Michaleas, Peter; Milechin, Lauren; Mullen, Julie; Prout, Andrew; Rosa, Antonio
2017-01-01
Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running data analysis applications such as MATLAB and O...
Decoupled Vector-Fetch Architecture with a Scalarizing Compiler
Lee, Yunsup
2016-01-01
As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the prominent accelerators that have recently become more popular are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide range of data-parallel architectures and their parallel programming models and ...
Composable processor virtualization for embedded systems
Molnos, A.M.; Milutinovic, A.; She, D.; Goossens, K.G.W.
2010-01-01
Processor virtualization divides a physical processor's time among a set of virtual machines, enabling efficient hardware utilization, application security and allowing co-existence of different operating systems on the same processor. Though initially intended for the server domain, virtualization
Effective SIMD Vectorization for Intel Xeon Phi Coprocessors
Tian, Xinmin; Saito, Hideki; Preis, Serguei V.; Garcia, Eric N.; Kozhukhov, Sergey S.; Masten, Matt; Cherkasov, Aleksei G.; Panchenko, Nikolay
2015-01-01
Efficiently exploiting SIMD vector units is one of the most important aspects in achieving high performance of the application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A ...
Noniterative MAP reconstruction using sparse matrix representations.
Cao, Guangzhi; Bouman, Charles A; Webb, Kevin J
2009-09-01
We present a method for noniterative maximum a posteriori (MAP) tomographic reconstruction which is based on the use of sparse matrix representations. Our approach is to precompute and store the inverse matrix required for MAP reconstruction. This approach has generally not been used in the past because the inverse matrix is typically large and fully populated (i.e., not sparse). In order to overcome this problem, we introduce two new ideas. The first idea is a novel theory for the lossy source coding of matrix transformations which we refer to as matrix source coding. This theory is based on a distortion metric that reflects the distortions produced in the final matrix-vector product, rather than the distortions in the coded matrix itself. The resulting algorithms are shown to require orthonormal transformations of both the measurement data and the matrix rows and columns before quantization and coding. The second idea is a method for efficiently storing and computing the required orthonormal transformations, which we call a sparse-matrix transform (SMT). The SMT is a generalization of the classical FFT in that it uses butterflies to compute an orthonormal transform; but unlike an FFT, the SMT uses the butterflies in an irregular pattern, and is numerically designed to best approximate the desired transforms. We demonstrate the potential of the noniterative MAP reconstruction with examples from optical tomography. The method requires offline computation to encode the inverse transform. However, once these offline computations are completed, the noniterative MAP algorithm is shown to reduce both storage and computation by well over two orders of magnitude, as compared to linear iterative reconstruction methods.
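The butterfly structure described in this abstract can be sketched as a product of two-coordinate rotations. The example below is a minimal illustration, assuming each butterfly is a planar (Givens) rotation specified by a hypothetical triple (i, j, theta); it shows only how such a transform is applied and inverted, not how the rotations are designed.

```python
import math


def apply_smt(x, butterflies):
    """Apply a sequence of butterflies, each rotating coordinates i and j by theta.

    Every butterfly is orthonormal, so the whole transform is orthonormal
    and costs a few multiplications per butterfly, independent of len(x).
    """
    y = list(x)
    for i, j, theta in butterflies:
        c, s = math.cos(theta), math.sin(theta)
        yi, yj = y[i], y[j]
        y[i] = c * yi - s * yj
        y[j] = s * yi + c * yj
    return y


def invert_smt(y, butterflies):
    """Undo apply_smt by applying the opposite rotations in reverse order."""
    return apply_smt(y, [(i, j, -theta) for i, j, theta in reversed(butterflies)])
```

Unlike an FFT, nothing constrains which coordinate pairs appear in the list or in what order, which matches the irregular pattern the abstract mentions; storage is three numbers per butterfly instead of a dense matrix.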
Deterministic chaos in the processor load
International Nuclear Information System (INIS)
Halbiniak, Zbigniew; Jozwiak, Ireneusz J.
2007-01-01
In this article we present the results of research whose purpose was to identify the phenomenon of deterministic chaos in the processor load. We analysed the time series of the processor load during efficiency tests of database software. Our research was done on a Sparc Alpha processor working on the UNIX Sun Solaris 5.7 operating system. The analyses demonstrated the presence of deterministic chaos in the processor load in this particular case.
Fujino, Kan; Yamamoto, Yusuke; Daito, Takuji; Makino, Akiko; Honda, Tomoyuki; Tomonaga, Keizo
2017-09-01
Borna disease virus (BoDV), a prototype of mammalian bornavirus, is a non-segmented, negative strand RNA virus that often causes severe neurological disorders in infected animals, including horses and sheep. Unique among animal RNA viruses, BoDV transcribes and replicates non-cytopathically in the cell nucleus, leading to establishment of long-lasting persistent infection. This striking feature of BoDV indicates its potential as an RNA virus vector system. It has previously been demonstrated by our team that recombinant BoDV (rBoDV) lacking an envelope glycoprotein (G) gene develops persistent infections in transduced cells without loss of the viral genome. In this study, a novel non-transmissive rBoDV, rBoDV ΔMG, which lacks both matrix (M) and G genes in the genome, is reported. rBoDV-ΔMG expressing green fluorescence protein (GFP), rBoDV ΔMG-GFP, was efficiently generated in Vero/MG cells stably expressing both BoDV M and G proteins. Infection with rBoDV ΔMG-GFP was persistently maintained in the parent Vero cells without propagation within cell culture. The optimal ratio of M and G for efficient viral particle production by transient transfection of M and G expression plasmids into cells persistently infected with rBoDV ΔMG-GFP was also demonstrated. These findings indicate that the rBoDV ΔMG-based BoDV vector may provide an extremely safe virus vector system and could be a novel strategy for investigating the function of M and G proteins and the host range of bornaviruses. © 2017 The Societies and John Wiley & Sons Australia, Ltd.
Topological events on the lines of circular polarization in nonparaxial vector optical fields.
Freund, Isaac
2017-02-01
In nonparaxial vector optical fields, the following topological events are shown to occur in apparent violation of charge conservation: as one translates the observation plane along a line of circular polarization (a C line), the points on the line (C points) are seen to change not only the signs of their topological charges, but also their handedness, and, at turning points on the line, paired C points with the same topological charge and opposite handedness are seen to nucleate. These counter-intuitive events cannot occur in paraxial fields.
Energy Technology Data Exchange (ETDEWEB)
Minonzio, Jean-Gabriel; Talmant, Maryline; Laugier, Pascal, E-mail: jean-gabriel.minonzio@upmc.fr [UPMC Univ Paris 06, UMR 7623, LIP, 15 rue de l' ecole de medecine F-75005, Paris (France)
2011-01-01
Different quantitative ultrasound techniques are currently being developed for clinical assessment of human bone status. This paper is dedicated to axial transmission: emitters and receivers are linearly arranged on the same side of the skeletal site, preferentially the forearm. In several clinical studies, the signal velocity of the earliest temporal event has been shown to discriminate osteoporotic patients from healthy subjects. However, a multi-parameter approach might be relevant to improve bone diagnosis, and this could be achieved by accurate measurement of guided-wave wave vectors. For clinical purposes and easy access to the measurement site, the probe length is limited to about 10 mm. The limited number of acquisition scan points on such a short distance reduces the efficiency of conventional signal processing techniques, such as the spatio-temporal Fourier transform. The performance of time-frequency techniques was shown to be moderate in other studies. Thus, optimised signal processing is a critical point for a reliable estimate of guided mode wave vectors. Toward this end, a technique that takes advantage of both multiple emitters and multiple receivers is proposed. The guided mode wave vectors are obtained using a projection in the singular vector basis. The singular vectors are determined by the singular value decomposition of the transmission matrix between the two arrays at different frequencies. This technique enables us to accurately recover guided-wave wave vectors for a moderately large array.
Rommel, Simon; Mendinueta, José Manuel Delgado; Klaus, Werner; Sakaguchi, Jun; Olmos, Juan José Vegas; Awaji, Yoshinari; Monroy, Idelfonso Tafur; Wada, Naoya
2017-09-18
This paper discusses spatially diverse optical vector network analysis for space division multiplexing (SDM) component and system characterization, which is becoming essential as SDM is widely considered to increase the capacity of optical communication systems. Characterization of a 108-channel photonic lantern spatial multiplexer, coupled to a 36-core 3-mode fiber, is experimentally demonstrated, extracting the full impulse response and complex transfer function matrices as well as insertion loss (IL) and mode-dependent loss (MDL) data. Moreover, the mode-mixing behavior of fiber splices in the few-mode multi-core fiber and their impact on system IL and MDL are analyzed, finding splices to cause significant mode-mixing and to be non-negligible in system capacity analysis.
Application of an array processor to the analysis of magnetic data for the Doublet III tokamak
International Nuclear Information System (INIS)
Wang, T.S.; Saito, M.T.
1980-08-01
Discussed herein is a fast computational technique employing the Floating Point Systems AP-190L array processor to analyze magnetic data for the Doublet III tokamak, a fusion research device. Interpretation of the experimental data requires the repeated solution of a free-boundary nonlinear partial differential equation, which describes the magnetohydrodynamic (MHD) equilibrium of the plasma. For this particular application, we have found that the array processor is only 1.4 and 3.5 times slower than the CDC-7600 and CRAY computers, respectively. The overhead on the host DEC-10 computer was kept to a minimum by chaining the complete Poisson solver and free-boundary algorithm into one single-load module using the vector function chainer (VFC). A simple time-sharing scheme for using the MHD code is also discussed.
Pape, Dennis R.
1990-09-01
The present conference discusses topics in optical image processing, optical signal processing, acoustooptic spectrum analyzer systems and components, and optical computing. Attention is given to tradeoffs in nonlinearly recorded matched filters, miniature spatial light modulators, detection and classification using higher-order statistics of optical matched filters, rapid traversal of an image data base using binary synthetic discriminant filters, wideband signal processing for emitter location, an acoustooptic processor for autonomous SAR guidance, and sampling of Fresnel transforms. Also discussed are an acoustooptic RF signal-acquisition system, scanning acoustooptic spectrum analyzers, the effects of aberrations on acoustooptic systems, fast optical digital arithmetic processors, information utilization in analog and digital processing, optical processors for smart structures, and a self-organizing neural network for unsupervised learning.
Quantitative analysis of eyes and other optical systems in linear optics.
Harris, William F; Evans, Tanya; van Gool, Radboud D
2017-05-01
To show that 14-dimensional spaces of augmented point P and angle Q characteristics, matrices obtained from the ray transference, are suitable for quantitative analysis although only the latter define an inner-product space and only on it can one define distances and angles. The paper examines the nature of the spaces and their relationships to other spaces including symmetric dioptric power space. The paper makes use of linear optics, a three-dimensional generalization of Gaussian optics. Symmetric 2 × 2 dioptric power matrices F define a three-dimensional inner-product space which provides a sound basis for quantitative analysis (calculation of changes, arithmetic means, etc.) of refractive errors and thin systems. For general systems the optical character is defined by the dimensionally-heterogeneous 4 × 4 symplectic matrix S, the transference, or if explicit allowance is made for heterocentricity, the 5 × 5 augmented symplectic matrix T. Ordinary quantitative analysis cannot be performed on them because matrices of neither of these types constitute vector spaces. Suitable transformations have been proposed but because the transforms are dimensionally heterogeneous the spaces are not naturally inner-product spaces. The paper obtains 14-dimensional spaces of augmented point P and angle Q characteristics. The 14-dimensional space defined by the augmented angle characteristics Q is dimensionally homogenous and an inner-product space. A 10-dimensional subspace of the space of augmented point characteristics P is also an inner-product space. The spaces are suitable for quantitative analysis of the optical character of eyes and many other systems. Distances and angles can be defined in the inner-product spaces. The optical systems may have multiple separated astigmatic and decentred refracting elements. © 2017 The Authors Ophthalmic & Physiological Optics © 2017 The College of Optometrists.
International Nuclear Information System (INIS)
Baehler, P.; Bosco, N.; Lingjaerde, T.; Ljuslin, C.; Van Praag, A.; Werner, P.
1986-01-01
The XOP processor has been designed for trigger calculation and data compression in high energy physics experiments. Therefore, emphasis has been placed upon fast execution and high input/output rate. The fast execution is achieved by a wide instruction word holding operations which are executed concurrently. Thus, the arithmetic operations, data address calculations, data accessing, condition checking, loop count checking and next instruction evaluation all overlap in time. In conventional micro-processors these operations are performed sequentially. In addition, the instruction set comprises not only the classical computer instructions, but also specialized instructions suitable for trigger calculations, such as bit search, population count, loose compare and vector instructions. In order to achieve a high input/output rate, each XOP ECLine interface board is equipped with an input and an output port which fulfil the LeCroy ECLine specifications. The autonomous input port allows a data rate of 40 Mbytes/sec, while the program controlled output port allows 20 Mbytes/sec. For Fastbus based systems a dual Fastbus master interface is under design which allows a Fastbus multi-processor system to be built. This design is being done in collaboration with LAPP Annecy for the CERN LEP L3 experiment. Their scheme comprises 4-5 XOP processors, each of them with a master interface on a data input segment and a master interface on a data output segment. This paper describes the structure of the XOP processor, the interface capabilities and the software development and debugging tools. (Auth.)
Pulsewidth Modulation of Neutral-Point-Clamped Indirect Matrix Converter
DEFF Research Database (Denmark)
Blaabjerg, Frede; Poh Chiang, Loh; Gao, Feng
2008-01-01
An indirect matrix converter is an alternative "all-semiconductor" energy processor proposed recently for converting an ac source with fixed magnitude and frequency to a variable voltage and frequency supply that can meet the requirements of a particular industry application. In principle...
Neurovision processor for designing intelligent sensors
Gupta, Madan M.; Knopf, George K.
1992-03-01
A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi-functional vision sensor that performs a variety of information processing operations on time-varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.
Analysis method of beam pointing stability based on optical transmission matrix
Wang, Chuanchuan; Huang, PingXian; Li, Xiaotong; Cen, Zhaofen
2016-10-01
Many factors affect the beam pointing stability of an optical system; among them, element tolerance is one of the most important and common. In some large laser systems, it can cause the final micro beam spot on the image plane to deviate noticeably. It is therefore essential to analyze element tolerance effectively and accurately in theory. To make the analysis of beam pointing stability convenient, we consider the transmission of a single chief ray, rather than the whole beam, as an approximation of the deviation of the entire spot. Using optical matrices, we also reduce the complex process of light transmission to a product of many matrices. This lets us set up an element tolerance model, that is, a mathematical expression for the spot deviation in an optical system with element tolerances. In this way, quantitative theoretical analysis of beam pointing stability becomes possible. In the second half of the paper, we design an experiment to measure the spot deviation caused by element tolerance in a multipass optical system; we then adjust the tolerance step by step, compare the results with the data obtained from the tolerance model, and thereby verify the correctness of the tolerance model.
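The idea of tracing a single chief ray through a product of matrices can be sketched with paraxial ray-transfer matrices augmented by a third row and column, so that decentre and tilt errors appear as constant offsets. This is a textbook construction offered as an assumption; the abstract does not give the authors' actual tolerance model, and the element functions and numerical values below are hypothetical.

```python
from functools import reduce
import numpy as np

# Rays are (height y, angle theta, 1); lengths in mm, angles in rad.

def free_space(d):
    return np.array([[1.0, d, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

def thin_lens(f, decenter=0.0):
    # A lens shifted by `decenter` bends the ray about its own axis:
    # theta' = theta - (y - decenter) / f
    return np.array([[1.0, 0.0, 0.0], [-1.0 / f, 1.0, decenter / f], [0.0, 0.0, 1.0]])

def flat_mirror(tilt=0.0):
    # In the unfolded picture a mirror tilted by `tilt` deviates the ray by 2 * tilt.
    return np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 2.0 * tilt], [0.0, 0.0, 1.0]])

def system_matrix(elements):
    """Multiply element matrices in the order the ray meets them."""
    return reduce(lambda acc, m: m @ acc, elements, np.eye(3))

def chief_ray_offset(elements):
    """Spot displacement of an on-axis chief ray (y = 0, theta = 0) at the output."""
    y, theta, _ = system_matrix(elements) @ np.array([0.0, 0.0, 1.0])
    return y

# A lens decentred by 0.1 mm, observed in its focal plane.
lens_offset = chief_ray_offset([thin_lens(100.0, decenter=0.1), free_space(100.0)])

# A mirror tilted by 0.1 mrad followed by 2 m of propagation.
mirror_offset = chief_ray_offset([flat_mirror(tilt=1e-4), free_space(2000.0)])
```

For a multipass system the same element list would simply contain each mirror once per pass, which is why small tilts accumulate into a visible spot deviation.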
Refractive index inversion based on Mueller matrix method
Fan, Huaxi; Wu, Wenyuan; Huang, Yanhua; Li, Zhaozhao
2016-03-01
Based on the Stokes vector and the Jones vector, the correlation between Mueller matrix elements and refractive index was studied and the result simplified, and an expression for refractive index inversion was derived using the Mueller matrix. The Mueller matrix elements at different incident angles are simulated from the expression for specular reflection in order to analyze how the angle of incidence and the refractive index influence them; the simulation is verified by measuring the Mueller matrix elements of a polished metal surface. The research shows that, under specular reflection, the result of the Mueller matrix inversion is consistent with experiment, so the approach can serve as a refractive index inversion method and provides a new route for target detection and recognition technology.
Use of Lanczos vectors in fluid/structure interaction problems
International Nuclear Information System (INIS)
Jeans, R.; Mathews, I.C.
1992-01-01
The goals of any numerical computational technique used for the solution of structural acoustics problems in the exterior infinite domain should be accuracy with rapid convergence, robustness, and computational efficiency. A computer program has been developed to achieve each of these three goals. Accuracy and robustness in the numerical representation of the integral equations used to represent the infinite fluid were attained through the use of boundary element implementations of the surface Helmholtz integral equations. The computational efficiency was achieved through the use of Lanczos vectors to model the deformation characteristics of the structure. The authors have developed collocation and variational techniques to overcome the difficulties previously encountered in the numerical implementation of the hypersingular integral operator. The Cauchy singularity present in the integral formulation is made numerically amenable through the use of tangential derivatives in both the collocation and variational techniques. The variational approach has the advantage that the resulting added fluid mass term is symmetric and combines efficiently with a finite element approximation of the structural elastic response. Several different strategies making use of the Lanczos vectors have been investigated. The first involved the use of Lanczos vectors solely to characterize the structural response. This reduced form of the structural dynamical matrix was then substituted back into a Burton and Miller formulation of the acoustic problem. The second strategy investigated involved forming the complex Lanczos vectors of the dynamical matrix formed from the addition of a symmetrical added fluid matrix to the structural mass matrix. The size of the resultant matrix equation set solved at each frequency for this strategy is determined by the number of Lanczos vectors used. 19 refs., 10 figs., 2 tabs
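To make the role of Lanczos vectors concrete, here is a minimal sketch of the standard symmetric Lanczos iteration with full reorthogonalization. It operates on a single real symmetric matrix, which is a simplification: the report works with structural mass and stiffness matrices (and, in the second strategy, complex vectors), and the random test matrix and function name here are purely illustrative.

```python
import numpy as np

def lanczos(A, q0, m):
    """Return m orthonormal Lanczos vectors Q and the m x m tridiagonal T = Q^T A Q."""
    n = A.shape[0]
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    q = q0 / np.linalg.norm(q0)
    for j in range(m):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        # Full reorthogonalization: remove components along all vectors so far.
        w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return Q, T

# Illustrative symmetric positive definite matrix (not a real structural model).
rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
A = B @ B.T + 8.0 * np.eye(8)

Q, T = lanczos(A, rng.standard_normal(8), m=5)
```

The reduced matrix `T` has the size of the number of Lanczos vectors rather than of the original model, which mirrors the statement that the equation set solved at each frequency is determined by the number of vectors used.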
VIRTUS: a multi-processor system in FASTBUS
International Nuclear Information System (INIS)
Ellett, J.; Jackson, R.; Ritter, R.; Schlein, P.; Yaeger, D.; Zweizig, J.
1986-01-01
VIRTUS is a system of parallel MC68000-based processors interconnected by FASTBUS that is used either on-line as an intelligent trigger component or off-line for full event processing. Each processor receives the complete set of data from one event. The host computer, a VAX 11/780, down-line loads all software to the processors, controls and monitors the functioning of all processors, and writes processed data to tape. Instructions, programs, and data are transferred among the processors and the host in the form of fixed format, variable length data blocks. (Auth.)
Ansari, A H; Cherian, P J; Dereymaeker, A; Matic, V; Jansen, K; De Wispelaere, L; Dielman, C; Vervisch, J; Swarte, R M; Govaert, P; Naulaers, G; De Vos, M; Van Huffel, S
2016-09-01
After identifying the most seizure-relevant characteristics by a previously developed heuristic classifier, a data-driven post-processor using a novel set of features is applied to improve the performance. The main characteristics of the outputs of the heuristic algorithm are extracted by five sets of features including synchronization, evolution, retention, segment, and signal features. Then, a support vector machine and a decision making layer remove the falsely detected segments. Four datasets including 71 neonates (1023 h, 3493 seizures), recorded in two different university hospitals, are used to train and test the algorithm without removing the dubious seizures. The heuristic method resulted in a false alarm rate of 3.81 per hour and a good detection rate of 88% on the entire test databases. The post-processor effectively reduces the false alarm rate by 34% while the good detection rate decreases by 2%. This post-processing technique improves the performance of the heuristic algorithm. The structure of this post-processor is generic, improves our understanding of the core visually determined EEG features of neonatal seizures and is applicable for other neonatal seizure detectors. The post-processor significantly decreases the false alarm rate at the expense of a small reduction of the good detection rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Two-dimensional optoelectronic interconnect-processor and its operational bit error rate
Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.
2004-10-01
A two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system was designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and photodetector (PD) arrays of corresponding wavelength. We performed operation and bit-error-rate (BER) analysis on these free-space integrated 8x8 VCSEL optical interconnects driven by silicon-on-sapphire (SOS) circuits. A pseudo-random bit stream (PRBS) data sequence was used to operate the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model with a Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously from the digital eye diagrams. Direct measurements on these interconnects were also taken on a standard BER tester for verification. We found that the results of the two methods were of the same order and agreed to within 50%. The integrated interconnects were investigated in an optoelectronic processing architecture for a digital halftoning image processor. Error diffusion networks implemented using the inherently parallel nature of photonics promise to provide high-quality digital halftoned images.
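The abstract does not spell out how the BER is computed from the eye diagram. A common textbook estimate under the same Gaussian-noise assumption uses the Q factor of the eye, and the sketch below shows that estimate only; the function names and the level statistics are hypothetical, not the authors' measured values or method.

```python
import math

def q_factor(mu1, mu0, sigma1, sigma0):
    """Eye-diagram Q factor from the mean and standard deviation of the 1 and 0 rails."""
    return (mu1 - mu0) / (sigma1 + sigma0)

def ber_from_q(q):
    """Gaussian-noise BER estimate at the optimum decision threshold."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))

# Hypothetical rail statistics (arbitrary units) read from a sampled eye diagram.
q = q_factor(mu1=1.0, mu0=0.0, sigma1=1.0 / 12.0, sigma0=1.0 / 12.0)
ber = ber_from_q(q)
```

With these assumed statistics the Q factor is 6, which corresponds to the familiar BER of about 1e-9; a closed eye (Q = 0) gives 0.5, the same as guessing.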
A lock circuit for a multi-core processor
DEFF Research Database (Denmark)
2015-01-01
An integrated circuit comprising multiple processor cores and a lock circuit that comprises a queue register with respective bits set or reset via respective connections dedicated to respective processor cores, whereby the queue register identifies those among the multiple processor cores...... that are enqueued in the queue register. Furthermore, the integrated circuit comprises a current register and a selector circuit configured to select a processor core and identify that processor core by a value in the current register. A selected processor core is a prioritized processor core among the cores...... configured with an integrated circuit; and a silicon die configured with an integrated circuit....
Pape, Dennis R.
Consideration is given to the following topics: transition of optical processing into systems (TOPS), optical signal processing, optical signal processing devices, optical image processing, Russian optical information processing, optical interconnects, and optical computing. Particular papers are presented on an acoustooptic range-Doppler processor design for radar insertion, an optical SAR processor and target recognition system, an advanced magnetooptic spatial light modulator device development update, an algorithm for controlling speckle-noise parameters, optical image processing in Russia, a massively parallel optical interconnect for long data stream convolution, and a reprogrammable digital optical coprocessor. (For individual items see A93-27718 to A93-27723)
Sensitometric control of roentgen film processors
International Nuclear Information System (INIS)
Forsberg, H.; Karolinska Sjukhuset, Stockholm
1987-01-01
Monitoring of film processor performance is essential, since image quality, patient dose and costs are influenced by it. A system for sensitometric constancy control of film processors and their associated components is described. Three years of experience with the system, implemented on 17 film processors, is reported. Modern high quality film processors are stable enough that a test frequency of once a week is sufficient to maintain adequate image quality. The test system is so sensitive that corrective actions have almost invariably been taken before any technical problem degraded the image quality to a visible degree. (orig.)
Special purpose processors for high energy physics applications
International Nuclear Information System (INIS)
Verkerk, C.
1978-01-01
A review is given of hardware processors, ranging from very fast decision logic for the split field magnet facility at CERN to a point-finding processor used to relieve the data-acquisition minicomputer of the task of monitoring the SPS experiment. Block diagrams of the decision-making processor, point-finding processor, coplanarity and opening angle processor, and programmable track selector module are presented and discussed. The applications of fully programmable but slower processors on the one hand, and very fast and programmable decision logic on the other hand, are given in this review.
Increasing the computational efficient of digital cross correlation by a vectorization method
Chang, Ching-Yuan; Ma, Chien-Ching
2017-08-01
This study presents a vectorization method for use in MATLAB programming aimed at increasing the computational efficiency of digital cross correlation in sound and images, resulting in a speedup of 6.387 and 36.044 times compared with performance values obtained from looped expression. This work bridges the gap between matrix operations and loop iteration, preserving flexibility and efficiency in program testing. This paper uses numerical simulation to verify the speedup of the proposed vectorization method as well as experiments to measure the quantitative transient displacement response subjected to dynamic impact loading. The experiment involved the use of a high speed camera as well as a fiber optic system to measure the transient displacement in a cantilever beam under impact from a steel ball. Experimental measurement data obtained from the two methods are in excellent agreement in both the time and frequency domain, with discrepancies of only 0.68%. Numerical and experiment results demonstrate the efficacy of the proposed vectorization method with regard to computational speed in signal processing and high precision in the correlation algorithm. We also present the source code with which to build MATLAB-executable functions on Windows as well as Linux platforms, and provide a series of examples to demonstrate the application of the proposed vectorization method.
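The contrast between looped and vectorized cross correlation can be shown with a small NumPy analogue. This is not the authors' MATLAB code: the function names and test signals are hypothetical, and the index-matrix trick below is just one generic way to replace the loop with a single matrix product.

```python
import numpy as np

def xcorr_loop(f, g):
    """Sliding cross correlation c[k] = sum_i f[k + i] * g[i], written as explicit loops."""
    n, m = len(f), len(g)
    c = np.zeros(n - m + 1)
    for k in range(n - m + 1):
        for i in range(m):
            c[k] += f[k + i] * g[i]
    return c

def xcorr_vectorized(f, g):
    """Same result from one matrix-vector product over an index matrix."""
    n, m = len(f), len(g)
    idx = np.arange(n - m + 1)[:, None] + np.arange(m)[None, :]   # row k holds k, k+1, ..., k+m-1
    return f[idx] @ g

# Hypothetical signals: a noisy record and a short template.
rng = np.random.default_rng(1)
f = rng.standard_normal(200)
g = rng.standard_normal(16)

c_loop = xcorr_loop(f, g)
c_vec = xcorr_vectorized(f, g)
```

The vectorized form trades memory (the gathered `f[idx]` matrix) for speed, which is the usual cost of this style of vectorization.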
The Central Trigger Processor (CTP)
Franchini, Matteo
2016-01-01
The Central Trigger Processor (CTP) receives trigger information from the calorimeter and muon trigger processors, as well as from other sources of trigger. It makes the Level-1 decision (L1A) based on a trigger menu.
Adaptive track scheduling to optimize concurrency and vectorization in GeantV
International Nuclear Information System (INIS)
Apostolakis, J; Brun, R; Carminati, F; Gheata, A; Novak, M; Wenzel, S; Bandieramonte, M; Bitzes, G; Canal, P; Elvira, V D; Jun, S Y; Lima, G; Licht, J C De Fine; Duhem, L; Sehgal, R; Shadura, O
2015-01-01
The GeantV project is focused on the R&D of new particle transport techniques to maximize parallelism on multiple levels, profiting from the use of both SIMD instructions and co-processors for the CPU-intensive calculations specific to this type of application. In our approach, vectors of tracks belonging to multiple events and matching different locality criteria must be gathered and dispatched to algorithms having vector signatures. While the transport propagates tracks and changes their individual states, data locality becomes harder to maintain. The scheduling policy has to be changed to maintain efficient vectors while keeping an optimal level of concurrency. The model has complex dynamics requiring tuning the thresholds to switch between the normal regime and special modes, i.e. prioritizing events to allow flushing memory, adding new events in the transport pipeline to boost locality, dynamically adjusting the particle vector size or switching from vector to single-track mode when vectorization causes only overhead. Optimizing these parameters so that the scheduler's behaviour becomes self-adapting requires a comprehensive study, whose initial results are presented here. (paper)
Vectorization for Molecular Dynamics on Intel Xeon Phi Coprocessors
Yi, Hongsuk
2014-03-01
Many modern processors are capable of exploiting data-level parallelism through single instruction, multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512-bit vector registers for high-performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Tersoff potentials for covalently bonded solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism, combining tightly coupled thread-level and task-level parallelism with 512-bit vector registers. The simulation results show that the parallel performance of the SIMD implementations on Xeon Phi is clearly superior to that on the x86 CPU architecture.
Very Long Instruction Word Processors
Indian Academy of Sciences (India)
Pentium Processor have modified the processor architecture to exploit parallelism in a program. .... The type of operation itself is encoded using 14 bits. .... text of designing simple architectures with low power consump- tion and execute x86 ...
Chen, Dongsheng; Zeng, Nan; Xie, Qiaolin; He, Honghui; Tuchin, Valery V; Ma, Hui
2017-08-01
We investigate the polarization features corresponding to changes in the microstructure of nude mouse skin during immersion in a glycerol solution. By comparing the Mueller matrix imaging experiments and Monte Carlo simulations, we examine in detail how the Mueller matrix elements vary with the immersion time. The results indicate that the polarization features represented by Mueller matrix elements m22, m33 and m44 and the absolute values of m34 and m43 are sensitive to the immersion time. To gain deeper insight into how the microstructures of the skin vary during tissue optical clearing (TOC), we set up a sphere-cylinder birefringence model (SCBM) of the skin and carry out simulations corresponding to different TOC mechanisms. The good agreement between the experimental and simulated results confirms that Mueller matrix imaging combined with Monte Carlo simulation is potentially a powerful tool for revealing microscopic features of biological tissues.
Heading-vector navigation based on head-direction cells and path integration.
Kubie, John L; Fenton, André A
2009-05-01
Insect navigation is guided by heading vectors that are computed by path integration. Mammalian navigation models, on the other hand, are typically based on map-like place representations provided by hippocampal place cells. Such models compute optimal routes as a continuous series of locations that connect the current location to a goal. We propose a "heading-vector" model in which head-direction cells or their derivatives serve both as key elements in constructing the optimal route and as the straight-line guidance during route execution. The model is based on a memory structure termed the "shortcut matrix," which is constructed during the initial exploration of an environment when a set of shortcut vectors between sequential pairs of visited waypoint locations is stored. A mechanism is proposed for calculating and storing these vectors that relies on a hypothesized cell type termed an "accumulating head-direction cell." Following exploration, shortcut vectors connecting all pairs of waypoint locations are computed by vector arithmetic and stored in the shortcut matrix. On re-entry, when local view or place representations query the shortcut matrix with a current waypoint and goal, a shortcut trajectory is retrieved. Since the trajectory direction is in head-direction compass coordinates, navigation is accomplished by tracking the firing of head-direction cells that are tuned to the heading angle. Section 1 of the manuscript describes the properties of accumulating head-direction cells. It then shows how accumulating head-direction cells can store local vectors and perform vector arithmetic to perform path-integration-based homing. Section 2 describes the construction and use of the shortcut matrix for computing direct paths between any pair of locations that have been registered in the shortcut matrix. In the discussion, we analyze the advantages of heading-based navigation over map-based navigation. Finally, we survey behavioral evidence that nonhippocampal
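The vector arithmetic behind the "shortcut matrix" can be sketched numerically. The sketch assumes that the stored shortcut vectors between sequentially visited waypoints are plain 2-D displacements in a common compass frame, and it uses a cumulative sum as a compact stand-in for chaining those vectors; the function names and the three-waypoint path are hypothetical and make no claim about the neural mechanism proposed in the paper.

```python
import numpy as np

def build_shortcut_matrix(sequential_vectors):
    """From displacements between successive waypoints, derive vectors for all pairs.

    Returns S with S[i, j] = displacement from waypoint i to waypoint j.
    """
    steps = np.asarray(sequential_vectors, dtype=float)
    # Summing the sequential vectors along the path; the origin is arbitrary.
    positions = np.vstack([np.zeros(2), np.cumsum(steps, axis=0)])
    return positions[None, :, :] - positions[:, None, :]

def heading_and_distance(S, start, goal):
    """Compass heading (degrees, counterclockwise from +x) and straight-line distance."""
    dx, dy = S[start, goal]
    return np.degrees(np.arctan2(dy, dx)), np.hypot(dx, dy)

# Hypothetical exploration: waypoint 0 -> 1 -> 2 with two stored sequential vectors.
S = build_shortcut_matrix([(3.0, 0.0), (0.0, 4.0)])
heading_deg, distance = heading_and_distance(S, start=0, goal=2)
```

Querying the matrix with a current waypoint and a goal returns a direction and length directly, without planning a route through intermediate locations.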
Assessing the Progress of Trapped-Ion Processors Towards Fault-Tolerant Quantum Computation
Bermudez, A.; Xu, X.; Nigmatullin, R.; O'Gorman, J.; Negnevitsky, V.; Schindler, P.; Monz, T.; Poschinger, U. G.; Hempel, C.; Home, J.; Schmidt-Kaler, F.; Biercuk, M.; Blatt, R.; Benjamin, S.; Müller, M.
2017-10-01
A quantitative assessment of the progress of small prototype quantum processors towards fault-tolerant quantum computation is a problem of current interest in experimental and theoretical quantum information science. We introduce a necessary and fair criterion for quantum error correction (QEC), which must be achieved in the development of these quantum processors before their sizes are sufficiently big to consider the well-known QEC threshold. We apply this criterion to benchmark the ongoing effort in implementing QEC with topological color codes using trapped-ion quantum processors and, more importantly, to guide the future hardware developments that will be required in order to demonstrate beneficial QEC with small topological quantum codes. In doing so, we present a thorough description of a realistic trapped-ion toolbox for QEC and a physically motivated error model that goes beyond standard simplifications in the QEC literature. We focus on laser-based quantum gates realized in two-species trapped-ion crystals in high-optical aperture segmented traps. Our large-scale numerical analysis shows that, with the foreseen technological improvements described here, this platform is a very promising candidate for fault-tolerant quantum computation.
Chue-Sang, Joseph; Bai, Yuqiang; Stoff, Susan; Gonzalez, Mariacarla; Holness, Nola; Gomes, Jefferson; Jung, Ranu; Gandjbakhche, Amir; Chernomordik, Viktor V.; Ramella-Roman, Jessica C.
2017-08-01
Preterm birth (PTB) presents a serious medical health concern throughout the world. There is a high incidence of PTB in both developed and developing countries ranging from 11% to 15%, respectively. Recent research has shown that cervical collagen orientation and distribution changes during pregnancy may be useful in predicting PTB. Polarization imaging is an effective means to measure optical anisotropy in birefringent materials, such as the cervix's extracellular matrix. Noninvasive, full-field Mueller matrix polarimetry (MMP) imaging methodologies, and optical coherence tomography (OCT) imaging were used to assess cervical collagen content and structure in nonpregnant porcine cervices. We demonstrate that the highly ordered structure of the nonpregnant porcine cervix can be observed with MMP. Furthermore, when utilized ex vivo, OCT and MMP yield very similar results with a mean error of 3.46% between the two modalities.
Experimental testing of the noise-canceling processor.
Collins, Michael D; Baer, Ralph N; Simpson, Harry J
2011-09-01
Signal-processing techniques for localizing an acoustic source buried in noise are tested in a tank experiment. Noise is generated using a discrete source, a bubble generator, and a sprinkler. The experiment has essential elements of a realistic scenario in matched-field processing, including complex source and noise time series in a waveguide with water, sediment, and multipath propagation. The noise-canceling processor is found to outperform the Bartlett processor and provide the correct source range for signal-to-noise ratios below -10 dB. The multivalued Bartlett processor is found to outperform the Bartlett processor but not the noise-canceling processor. © 2011 Acoustical Society of America
Vectorizing and macrotasking Monte Carlo neutral particle algorithms
International Nuclear Information System (INIS)
Heifetz, D.B.
1987-04-01
Monte Carlo algorithms for computing neutral particle transport in plasmas have been vectorized and macrotasked. The techniques used are directly applicable to Monte Carlo calculations of neutron and photon transport, and Monte Carlo integration schemes in general. A highly vectorized code was achieved by calculating test flight trajectories in loops over arrays of flight data, isolating the conditional branches to as few loops as possible. A number of solutions are discussed to the problem of gaps appearing in the arrays due to completed flights, which impede vectorization. A simple and effective implementation of macrotasking is achieved by dividing the calculation of the test flight profile among several processors. A tree of random numbers is used to ensure reproducible results. The additional memory required for each task may preclude using a larger number of tasks. In future machines, the limit of macrotasking may be possible, with each test flight, and split test flight, being a separate task.
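A toy version of these ideas is sketched below: flights are advanced in whole-array operations, finished flights are removed by compaction so the arrays stay dense, and the work is split into tasks with independent, reproducible random streams. The 1-D slab problem, its parameters, and the use of NumPy's `SeedSequence.spawn` as a stand-in for the "tree of random numbers" are all assumptions made for illustration; the tasks run sequentially here rather than on separate processors.

```python
import numpy as np

def run_task(seed_seq, n_flights, thickness=2.0, p_absorb=0.3):
    """Track n_flights through a 1-D slab (lengths in mean free paths)."""
    rng = np.random.default_rng(seed_seq)
    x = np.zeros(n_flights)            # positions of flights still in progress
    mu = np.ones(n_flights)            # direction cosines; all start moving inward
    tally = {"transmitted": 0, "reflected": 0, "absorbed": 0}
    while x.size:
        x = x + mu * rng.exponential(1.0, x.size)      # one array-wide step
        out_right = x >= thickness
        out_left = x < 0.0
        inside = ~(out_right | out_left)
        absorbed = inside & (rng.random(x.size) < p_absorb)
        tally["transmitted"] += int(out_right.sum())
        tally["reflected"] += int(out_left.sum())
        tally["absorbed"] += int(absorbed.sum())
        alive = inside & ~absorbed
        x = x[alive]                                   # compaction closes the gaps
        mu = rng.uniform(-1.0, 1.0, x.size)            # isotropic scatter of survivors
    return tally

def simulate(n_flights, n_tasks, seed):
    """Split the flights among tasks, each with its own child random stream."""
    children = np.random.SeedSequence(seed).spawn(n_tasks)
    counts = [n_flights // n_tasks + (1 if i < n_flights % n_tasks else 0)
              for i in range(n_tasks)]
    total = {"transmitted": 0, "reflected": 0, "absorbed": 0}
    for child, count in zip(children, counts):
        for key, value in run_task(child, count).items():
            total[key] += value
    return total

result = simulate(n_flights=10_000, n_tasks=4, seed=2024)
repeat = simulate(n_flights=10_000, n_tasks=4, seed=2024)
```

Because each task derives its stream from the same root seed, the totals are identical from run to run regardless of the order in which tasks would complete.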
Residual, restarting and Richardson iteration for the matrix exponential
Bochev, Mikhail A.; Grimm, Volker; Hochbruck, Marlis
2013-01-01
A well-known problem in computing some matrix functions iteratively is the lack of a clear, commonly accepted residual notion. An important matrix function for which this is the case is the matrix exponential. Suppose the matrix exponential of a given matrix times a given vector has to be computed.
Residual, restarting and Richardson iteration for the matrix exponential
Bochev, Mikhail A.
2010-01-01
A well-known problem in computing some matrix functions iteratively is the lack of a clear, commonly accepted residual notion. An important matrix function for which this is the case is the matrix exponential. Assume the matrix exponential of a given matrix times a given vector has to be computed. We
Lee, Dukhyung; Kim, Dai-Sik
2016-01-01
We study light scattering off rectangular slot nano antennas on a metal film varying incident polarization and incident angle, to examine which field vector of light is more important: electric vector perpendicular to, versus magnetic vector parallel to the long axis of the rectangle. While vector Babinet’s principle would prefer magnetic field along the long axis for optimizing slot antenna function, convention and intuition most often refer to the electric field perpendicular to it. Here, we demonstrate experimentally that in accordance with vector Babinet’s principle, the incident magnetic vector parallel to the long axis is the dominant component, with the perpendicular incident electric field making a small contribution of the factor of 1/|ε|, the reciprocal of the absolute value of the dielectric constant of the metal, owing to the non-perfectness of metals at optical frequencies.
International Nuclear Information System (INIS)
Tint, M.
The contribution of the mesonic exchange effect to the conserved vector current in the first forbidden β-decay of Ra E is estimated under the headings: (1) The conserved vector current. (2) The CVC theory and the first forbidden β-decays. (3) Shell model calculations of some matrix elements. (4) Direct calculation of the exchange term. Considering the mesonic exchange effect in the axial vector current of β-decay, the partially conserved axial vector current theory and experimental results of the process p + p → d + π⁺ are examined. (U.K.)
All optical vector magnetometer, Phase I
National Aeronautics and Space Administration — This Phase I research project will investigate a novel method of operating an atomic magnetometer to simultaneously measure total magnetic fields and vector magnetic...
Automated Vectorization of Decision-Based Algorithms
James, Mark
2006-01-01
Virtually all existing vectorization algorithms are designed to only analyze the numeric properties of an algorithm and distribute those elements across multiple processors. This advances the state of the practice because it is the only known system, at the time of this reporting, that takes high-level statements and analyzes them for their decision properties and converts them to a form that allows them to automatically be executed in parallel. The software takes a high-level source program that describes a complex decision-based condition and rewrites it as a disjunctive set of component Boolean relations that can then be executed in parallel. This is important because parallel architectures are becoming more commonplace in conventional systems and they have always been present in NASA flight systems. This technology allows one to take existing condition-based code and automatically vectorize it so it naturally decomposes across parallel architectures.
Development of a highly reliable CRT processor
International Nuclear Information System (INIS)
Shimizu, Tomoya; Saiki, Akira; Hirai, Kenji; Jota, Masayoshi; Fujii, Mikiya
1996-01-01
Although CRT processors have been employed by the main control board to reduce the operator's workload during monitoring, the control systems are still operated by hardware switches. For further advancement, direct controller operation through a display device is expected. A CRT processor providing direct controller operation must be as reliable as the hardware switches are. The authors are developing a new type of highly reliable CRT processor that enables direct controller operations. In this paper, we discuss the design principles behind a highly reliable CRT processor. The principles are defined by studies of software reliability and of the functional reliability of the monitoring and operation systems. The functional configuration of an advanced CRT processor is also addressed. (author)
Computer Generated Inputs for NMIS Processor Verification
International Nuclear Information System (INIS)
J. A. Mullens; J. E. Breeding; J. A. McEvers; R. W. Wysor; L. G. Chiang; J. R. Lenarduzzi; J. T. Mihalczo; J. K. Mattingly
2001-01-01
Proper operation of the Nuclear Materials Identification System (NMIS) processor can be verified using computer-generated inputs [BIST (Built-In-Self-Test)] at the digital inputs. Preselected sequences of input pulses to all channels with known correlation functions are compared to the output of the processor. These types of verifications have been utilized in NMIS-type correlation processors at the Oak Ridge National Laboratory since 1984. The use of this test confirmed a malfunction in an NMIS processor at the All-Russian Scientific Research Institute of Experimental Physics (VNIIEF) in 1998. The NMIS processor boards were returned to the U.S. for repair and subsequently used in NMIS passive and active measurements with Pu at VNIIEF in 1999.
International Nuclear Information System (INIS)
Kunz, P.F.; Gravina, M.; Oxoby, G.; Trang, Q.; Fucci, A.; Jacobs, D.; Martin, B.; Storr, K.
1983-03-01
Since the introduction of the 168/E, emulating processors have been successful over an amazingly wide range of applications. This paper will describe a second generation processor, the 3081/E. This new processor, which is being developed as a collaboration between SLAC and CERN, goes beyond just fixing the obvious faults of the 168/E. Not only will the 3081/E have much more memory space, incorporate many more IBM instructions, and have full double precision floating point arithmetic, but it will also have faster execution times and be much simpler to build, debug, and maintain. The simple interface and reasonable cost of the 168/E will be maintained for the 3081/E.
Ushenko, A. G.; Dubolazov, A. V.; Ushenko, V. A.; Ushenko, Yu. A.; Sakhnovskiy, M. Y.; Pavlyukovich, O.; Pavlyukovich, N.; Novakovskaya, O.; Gorsky, M. P.
2016-09-01
A model for the Mueller-matrix description of the mechanisms of optical anisotropy that are typical of polycrystalline layers of histological sections of biological tissues and fluids (optical activity, birefringence, and linear and circular dichroism) is suggested. Through statistical analysis of the distributions of linear and circular birefringence and dichroism, objective criteria were determined for differentiating myocardium histological sections (determining the cause of death), films of blood plasma (liver pathology), peritoneal fluid (endometriosis of tissues of the female reproductive system), and urine (kidney disease). From the point of view of evidence-based medicine, the operational characteristics (sensitivity, specificity and accuracy) of the method of Mueller-matrix reconstruction of optical anisotropy parameters were found.
High-speed optical three-axis vector magnetometry based on nonlinear Hanle effect in rubidium vapor
Azizbekyan, Hrayr; Shmavonyan, Svetlana; Khanbekyan, Aleksandr; Movsisyan, Marina; Papoyan, Aram
2017-07-01
The magnetic-field-compensation optical vector magnetometer based on the nonlinear Hanle effect in alkali metal vapor allowing two-axis measurement operation has been further elaborated for three-axis performance, along with significant reduction of measurement time. The upgrade was achieved by implementing a two-beam resonant excitation configuration and a fast maximum searching algorithm. Results of the proof-of-concept experiments, demonstrating 1 μT B-field resolution, are presented. The applied interest and capability of the proposed technique is analyzed.
O'Sullivan, George A.; O'Sullivan, Joseph A.
1999-01-01
In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.
PixonVision real-time video processor
Puetter, R. C.; Hier, R. G.
2007-09-01
PixonImaging LLC and DigiVision, Inc. have developed a real-time video processor, the PixonVision PV-200, based on the patented Pixon method for image deblurring and denoising, and DigiVision's spatially adaptive contrast enhancement processor, the DV1000. The PV-200 can process NTSC and PAL video in real time with a latency of 1 field (1/60th of a second), remove the effects of aerosol scattering from haze, mist, smoke, and dust, improve spatial resolution by up to 2x, decrease noise by up to 6x, and increase local contrast by up to 8x. A newer version of the processor, the PV-300, is now in prototype form and can handle high definition video. Both the PV-200 and PV-300 are FPGA-based processors, which could be spun into ASICs if desired. Obvious applications of these processors include applications in the DOD (tanks, aircraft, and ships), homeland security, intelligence, surveillance, and law enforcement. If developed into an ASIC, these processors will be suitable for a variety of portable applications, including gun sights, night vision goggles, binoculars, and guided munitions. This paper presents a variety of examples of PV-200 processing, including examples appropriate to border security, battlefield applications, port security, and surveillance from unmanned aerial vehicles.
Processors and systems (picture processing)
Energy Technology Data Exchange (ETDEWEB)
Gemmar, P
1983-01-01
Automatic picture processing requires high performance computers and high transmission capacities in the processor units. The author examines the possibilities of operating processors in parallel in order to accelerate the processing of pictures. He therefore discusses a number of available processors and systems for picture processing and illustrates their capacities for special types of picture processing. He stresses the fact that the amount of storage required for picture processing is exceptionally high. The author concludes that it is as yet difficult to decide whether very large groups of simple processors or highly complex multiprocessor systems will provide the best solution. Both methods will be aided by the development of VLSI. New solutions have already been offered (systolic arrays and 3-d processing structures) but they also are subject to losses caused by inherently parallel algorithms. Greater efforts must be made to produce suitable software for multiprocessor systems. Some possibilities for future picture processing systems are discussed. 33 references.
Gu, Bing; Xu, Danfeng; Rui, Guanghao; Lian, Meng; Cui, Yiping; Zhan, Qiwen
2015-09-20
Generation of vectorial optical fields with arbitrary polarization distribution is of great interest in areas where exotic optical fields are desired. In this work, we experimentally demonstrate the versatile generation of linearly polarized vector fields, elliptically polarized vector fields, and circularly polarized vortex beams through introducing attenuators in a common-path interferometer. By means of Richards-Wolf vectorial diffraction method, the characteristics of the highly focused elliptically polarized vector fields are studied. The optical force and torque on a dielectric Rayleigh particle produced by these tightly focused vector fields are calculated and exploited for the stable trapping of dielectric Rayleigh particles. It is shown that the additional degree of freedom provided by the elliptically polarized vector field allows one to control the spatial structure of polarization, to engineer the focusing field, and to tailor the optical force and torque on a dielectric Rayleigh particle.
Matrix algebra theory, computations and applications in statistics
Gentle, James E
2017-01-01
This textbook for graduate and advanced undergraduate students presents the theory of matrix algebra for statistical applications, explores various types of matrices encountered in statistics, and covers numerical linear algebra. Matrix algebra is one of the most important areas of mathematics in data science and in statistical theory, and the second edition of this very popular textbook provides essential updates and comprehensive coverage on critical topics in mathematics in data science and in statistical theory. Part I offers a self-contained description of relevant aspects of the theory of matrix algebra for applications in statistics. It begins with fundamental concepts of vectors and vector spaces; covers basic algebraic properties of matrices and analytic properties of vectors and matrices in multivariate calculus; and concludes with a discussion on operations on matrices in solutions of linear systems and in eigenanalysis. Part II considers various types of matrices encountered in statistics, such as...
Integrated Optical Synthetic Aperture Radar Processor.
1987-09-01
acoustooptic cell was employed to input each radar return into a time-and-space integrating optical architecture comprised of several lenses, a CCD area array ... acoustooptic cell and parallel rib waveguide structure. During the course of the literature survey, we became aware of an elegant and potentially profound ... wave.) scatterer at (f , A(t) is the far-field pattern of the antenna. From the geometry of Si. 1. R can be written as [I-2R,/c - nT1 r(t) = A(nT) rectj
Modeling and Simulation of Matrix Converter
DEFF Research Database (Denmark)
Liu, Fu-rong; Klumpner, Christian; Blaabjerg, Frede
2005-01-01
This paper discusses the modeling and simulation of a matrix converter. Two models of the matrix converter are presented: one is based on indirect space vector modulation and the other is based on the power balance equation. The basis of these two models is given and the modeling process is introduced...
Functional Verification of Enhanced RISC Processor
SHANKER NILANGI; SOWMYA L
2013-01-01
This paper presents the design and verification of a 32-bit enhanced RISC processor core with floating point computation integrated within the core, designed to reduce cost and complexity. The 3-stage pipelined 32-bit RISC processor is based on the ARM7 processor architecture, with a single precision floating point multiplier, a floating point adder/subtractor for floating point operations, and a 32 × 32 Booth multiplier added to the integer core of the ARM7. The binary representati...
q-Virasoro constraints in matrix models
Energy Technology Data Exchange (ETDEWEB)
Nedelin, Anton [Dipartimento di Fisica, Università di Milano-Bicocca and INFN, sezione di Milano-Bicocca, Piazza della Scienza 3, I-20126 Milano (Italy); Department of Physics and Astronomy, Uppsala university,Box 516, SE-75120 Uppsala (Sweden); Zabzine, Maxim [Department of Physics and Astronomy, Uppsala university,Box 516, SE-75120 Uppsala (Sweden)
2017-03-20
The Virasoro constraints play an important role in the study of matrix models and in understanding the relation between matrix models and CFTs. Recently, localization calculations in supersymmetric gauge theories have produced new families of matrix models, about which we have very limited knowledge. We concentrate on the elliptic generalization of the hermitian matrix model, which corresponds to the calculation of the partition function on S^3 × S^1 for a vector multiplet. We derive the q-Virasoro constraints for this matrix model. We also observe some interesting algebraic properties of the q-Virasoro algebra.
Bank switched memory interface for an image processor
International Nuclear Information System (INIS)
Barron, M.; Downward, J.
1980-09-01
A commercially available image processor is interfaced to a PDP-11/45 through an 8K window of memory addresses. When the image processor was not in use, it was desirable to be able to use the 8K address space as real memory. The standard method of accomplishing this would have been to use UNIBUS switches to switch in either the physical 8K bank of memory or the image processor memory. This method has the disadvantage of being rather expensive. As a simple alternative, a device was built to selectively enable or disable either an 8K bank of memory or the image processor memory. To enable the image processor under program control, GEN is contracted in size, the memory is disabled, a device partition for the image processor is created above GEN, and the image processor memory is enabled. The process is reversed to restore memory to GEN. The hardware to enable/disable the image and computer memories is controlled using spare bits from a DR-11K output register. The image processor and physical memory can be switched in or out on line with no adverse effects on the system's operation.
The ATLAS Muon to Central Trigger Processor Interface Upgrade for the Run 3 of the LHC
Armbruster, Aaron James; The ATLAS collaboration; Chelstowska, Magda Anna
2017-01-01
To cope with the higher luminosity and physics cross-sections for the third run of the Large Hadron Collider (LHC) and beyond, the Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment at CERN is being upgraded. Part of the TDAQ system, the Muon to Central Trigger Processor Interface (MUCTPI) receives muon candidates information from each of the 208 barrel and endcap muon trigger sectors, counts muon candidates for each transverse momentum threshold and sends the result to the Central Trigger Processor (CTP). The MUCTPI takes into account the possible overlap between trigger sectors in order to avoid double counting of muon candidates. A full redesign and replacement of the existing MUCTPI is required in order to provide full-granularity muon position information at the bunch crossing rate to the Topological Trigger processor (L1Topo) and to be able to interface with the new sector logic modules. State-of-the-art FPGA technology and high-density ribbon fiber-optic transmitters and receivers is being...
The ATLAS Muon-to-Central Trigger Processor Interface Upgrade for the Run 3 of the LHC
Armbruster, Aaron James; The ATLAS collaboration
2017-01-01
To cope with the higher luminosity and physics cross-sections for the third run of the Large Hadron Collider (LHC) and beyond, the Trigger and Data Acquisition (TDAQ) system of the ATLAS experiment at CERN is being upgraded. Part of the TDAQ system, the Muon to Central Trigger Processor Interface (MUCTPI) receives muon candidates information from each of the 208 barrel and endcap muon trigger sectors, counts muon candidates for each transverse momentum threshold and sends the result to the Central Trigger Processor (CTP). The MUCTPI takes into account the possible overlap between trigger sectors in order to avoid double counting of muon candidates. A full redesign and replacement of the existing MUCTPI is required in order to provide full-granularity muon position information at the bunch crossing rate to the Topological Trigger processor (L1Topo) and to be able to interface with the new sector logic modules. State-of-the-art FPGA technology and high-density ribbon fiber-optic transmitters and receivers is being...
Variation in efficiency of parallel algorithms. [for study of stiffness matrices in planar trusses
Hayashi, A.; Melosh, R. J.; Utku, S.; Salama, M.
1985-01-01
The present study has the objective to investigate some iterative parallel-processor linear equation solving algorithms with respect to efficiency for analyses of typical linear engineering systems. Attention is given to a set of n linear equations, Ku = p, where K = an n x n positive definite, sparsely populated, symmetric matrix, u = an n x 1 vector of unknown responses, and p = an n x 1 vector of prescribed constants. This study is concerned with a hybrid method in which iteration is used to solve the problem, while a direct method is used on the local processor level. Variations in the efficiency of parallel algorithms are explored. Measures of the efficiency are based on computer experiments regarding the algorithms. For all the algorithms, the wall clock time is found to decrease as the number of processors increases.
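The hybrid scheme described here (iteration across processors, a direct solve inside each processor) can be sketched as a block-Jacobi iteration. This is an illustrative assumption, not the algorithm of the study: the abstract does not name the iteration used, and the function names, the block partition, and the fixed sweep count are hypothetical.

```python
def solve_dense(a, b):
    """Direct local solve: Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def block_jacobi(K, p, blocks, sweeps=200):
    """Solve K u = p; each index block stands for one processor's unknowns."""
    n = len(p)
    u = [0.0] * n
    for _ in range(sweeps):
        new_u = u[:]
        for idx in blocks:  # in a parallel run, one block per processor
            inside = set(idx)
            local_K = [[K[i][j] for j in idx] for i in idx]
            # Move the coupling to other blocks to the right-hand side,
            # using the previous sweep's values.
            rhs = [p[i] - sum(K[i][j] * u[j] for j in range(n) if j not in inside)
                   for i in idx]
            for i, val in zip(idx, solve_dense(local_K, rhs)):
                new_u[i] = val
        u = new_u
    return u
```

Block Jacobi is not guaranteed to converge for every symmetric positive definite matrix; it does for the diagonally dominant tridiagonal case used to check this sketch.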
Indian Academy of Sciences (India)
The Gram-Schmidt process is one of the first things one learns in a course ... We might want to stay as close to the experimental data as possible when converting these vectors to orthonormal ones demanded by the model. The process of finding the closest orthonormal .... is obtained by writing the matrix A = [aI, an], then.
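For reference, the Gram-Schmidt process that this excerpt takes as its starting point can be sketched as follows. This is the standard textbook procedure in its modified (numerically steadier) form, not the closest-orthonormal construction the article goes on to discuss; the function name and tolerance are assumptions.

```python
import math


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors, in the order given."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            # Remove the component of w along each vector already accepted.
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis
```

The result depends on the order of the input vectors: the first vector is only rescaled, while later ones are changed more.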
VON WISPR Family Processors: Volume 1
National Research Council Canada - National Science Library
Wagstaff, Ronald
1997-01-01
...) and the background noise they are embedded in. Processors utilizing those fluctuations such as the von WISPR Family Processors discussed herein, are methods or algorithms that preferentially attenuate the fluctuating signals and noise...
Many-body simulations using an array processor
International Nuclear Information System (INIS)
Rapaport, D.C.
1985-01-01
Simulations of microscopic models of water and polypeptides using molecular dynamics and Monte Carlo techniques have been carried out with the aid of an FPS array processor. The computational techniques are discussed, with emphasis on the development and optimization of the software to take account of the special features of the processor. The computing requirements of these simulations exceed what could be reasonably carried out on a normal 'scientific' computer. While the FPS processor is highly suited to the kinds of models described, several other computationally intensive problems in statistical mechanics are outlined for which alternative processor architectures are more appropriate
Sensing RF signals with the optical wideband converter
Valley, George C.; Sefler, George A.; Shaw, T. J.
2013-01-01
The optical wideband converter (OWC) is a system for measuring properties of RF signals in the GHz band without use of high speed electronics. In the OWC the RF signal is modulated on a repetitively pulsed optical field with a large wavelength chirp, the optical field is diffracted onto a spatial light modulator (SLM) whose pixels are modulated with pseudo-random bit sequences (PRBSs), and finally the optical field is directed to a photodiode and the resulting current integrated for each PRBS. When the number of PRBSs and measurements equals the number of SLM pixels, the RF signal can be obtained in principle by multiplying the measurement vector by the inverse of the square matrix given by the PRBSs and the properties of the optics. When the number of measurements is smaller than the number of pixels, a compressive sensing (CS) measurement can be performed, and sparse RF signals can be obtained using one of the standard CS recovery algorithms such as the penalized l1 norm (also known as basis pursuit) or one of the variants of matching pursuit. Accurate reconstruction of RF signals requires good calibration of the OWC. In this paper, we present results using the OWC for RF signals consisting of 2 sinusoids recovered using 3 techniques (matrix inversion, basis pursuit, and matching pursuit). We compare results obtained with orthogonal matching pursuit with nonlinear least squares to basis pursuit with an over-complete dictionary.
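Of the recovery techniques named in this abstract, plain matching pursuit is the simplest to sketch. The code below is a generic greedy version under stated assumptions, not the OWC processing chain: the function name is hypothetical, the dictionary is passed as a list of columns, and the variants the paper actually uses (orthogonal matching pursuit, nonlinear least squares) add refinement steps that are omitted here.

```python
def matching_pursuit(columns, y, n_iter):
    """Greedy sparse recovery of x from y = sum_j x[j] * columns[j]."""
    residual = list(y)
    coeffs = [0.0] * len(columns)
    for _ in range(n_iter):
        # Normalized correlation of every atom with the current residual.
        scores = [
            sum(a * r for a, r in zip(col, residual)) / sum(a * a for a in col) ** 0.5
            for col in columns
        ]
        best = max(range(len(columns)), key=lambda j: abs(scores[j]))
        col = columns[best]
        # Least-squares step along the chosen atom, then update the residual.
        step = sum(a * r for a, r in zip(col, residual)) / sum(a * a for a in col)
        coeffs[best] += step
        residual = [r - step * a for r, a in zip(residual, col)]
    return coeffs
```

With mutually orthogonal ±1 columns, a toy stand-in for a PRBS-derived matrix, each greedy step is exact; a genuine compressive measurement has fewer rows than unknowns and recovery is then only approximate.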
Multi-processor network implementations in Multibus II and VME
International Nuclear Information System (INIS)
Briegel, C.
1992-01-01
ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)
Support of the extremal measure in a vector equilibrium problem
International Nuclear Information System (INIS)
Lapik, M A
2006-01-01
A generalization of the Mhaskar-Saff functional is obtained for a vector equilibrium problem with an external field. As an application, the supports of the equilibrium measures are found in a special vector equilibrium problem with a Nikishin matrix.
de Sitter group as a symmetry for optical decoherence
International Nuclear Information System (INIS)
Baskal, S; Kim, Y S
2006-01-01
Stokes parameters form a Minkowskian 4-vector under various optical transformations. As a consequence, the resulting two-by-two density matrix constitutes a representation of the Lorentz group. The associated Poincare sphere is a geometric representation of the Lorentz group. Since the Lorentz group preserves the determinant of the density matrix, it cannot accommodate the decoherence process through the decaying off-diagonal elements of the density matrix, which leads to an increase in the value of the determinant. It is noted that the O(3, 2) de Sitter group contains two Lorentz subgroups. The change in the determinant in one Lorentz group can be compensated by the other. It is thus possible to describe the decoherence process as a symmetry transformation in the O(3, 2) space. It is shown also that these two coupled Lorentz groups can serve as a concrete example of Feynman's rest of the universe.
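The determinant argument can be made explicit with a short worked formula. The labelling of the Stokes parameters below is one common convention and is an assumption here; the paper may order or sign them differently, but the determinant comes out the same.

```latex
\rho = \frac{1}{2}
\begin{pmatrix}
S_0 + S_1 & S_2 - i S_3 \\
S_2 + i S_3 & S_0 - S_1
\end{pmatrix},
\qquad
\det\rho = \frac{1}{4}\left(S_0^2 - S_1^2 - S_2^2 - S_3^2\right).
```

The right-hand side is the Minkowskian norm of the Stokes 4-vector, which Lorentz transformations leave unchanged. Decaying off-diagonal elements shrink S_2 and S_3 while S_0 and S_1 stay fixed, so the determinant grows, which is why a single Lorentz group cannot describe decoherence.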
A low-cost vector processor boosting compute-intensive image processing operations
Adorf, Hans-Martin
1992-01-01
Low-cost vector processing (VP) is within reach of everyone seriously engaged in scientific computing. The advent of affordable add-on VP-boards for standard workstations complemented by mathematical/statistical libraries is beginning to impact compute-intensive tasks such as image processing. A case in point is the restoration of distorted images from the Hubble Space Telescope. A low-cost implementation is presented of the standard Tarasko-Richardson-Lucy restoration algorithm on an Intel i860-based VP-board which is seamlessly interfaced to a commercial, interactive image processing system. First experience is reported (including some benchmarks for standalone FFTs) and some conclusions are drawn.
Evaluation of the Intel Sandy Bridge-EP server processor
Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department
2012-01-01
In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing an 8-core “Sandy Bridge-EP” processor with Intel’s previous microarchitecture, the “Westmere-EP”. The Intel marketing names for these processors are “Xeon E5-2600 processor series” and “Xeon 5600 processor series”, respectively. Both processors are produced in a 32nm process, and both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores ...
Array processors based on Gaussian fraction-free method
Energy Technology Data Exchange (ETDEWEB)
Peng, S; Sedukhin, S [Aizu Univ., Aizuwakamatsu, Fukushima (Japan); Sedukhin, I
1998-03-01
The design of algorithmic array processors for solving linear systems of equations using the fraction-free Gaussian elimination method is presented. The design is based on a formal approach which constructs a family of planar array processors systematically. These array processors are synthesized and analyzed. It is shown that some array processors are optimal in the framework of linear allocation of computations and in terms of the number of processing elements and computing time. (author)
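Fraction-free Gaussian elimination is usually presented in the form due to Bareiss, and a sequential sketch of it may help in reading this abstract. It is an assumption that this is the variant the authors map onto their arrays, and the function name is hypothetical. The sketch computes a determinant, since the last pivot of the fraction-free scheme equals it up to the sign of any row swaps.

```python
def bareiss_determinant(matrix):
    """Determinant of a square integer matrix using only exact integer steps."""
    a = [row[:] for row in matrix]
    n = len(a)
    sign = 1
    prev = 1  # pivot of the previous step; divides every new entry exactly
    for k in range(n - 1):
        if a[k][k] == 0:
            swap = next((r for r in range(k + 1, n) if a[r][k] != 0), None)
            if swap is None:
                return 0  # the whole column is zero, so the matrix is singular
            a[k], a[swap] = a[swap], a[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                # The 2x2 cross-product is always divisible by the previous pivot.
                a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
        prev = a[k][k]
    return sign * a[n - 1][n - 1]
```

Every intermediate value stays an integer, which is what makes the method attractive for arrays of simple processing elements without floating-point hardware.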
Multiple Embedded Processors for Fault-Tolerant Computing
Bolotin, Gary; Watson, Robert; Katanyoutanant, Sunant; Burke, Gary; Wang, Mandy
2005-01-01
A fault-tolerant computer architecture has been conceived in an effort to reduce vulnerability to single-event upsets (spurious bit flips caused by impingement of energetic ionizing particles or photons). As in some prior fault-tolerant architectures, the redundancy needed for fault tolerance is obtained by use of multiple processors in one computer. Unlike prior architectures, the multiple processors are embedded in a single field-programmable gate array (FPGA). What makes this new approach practical is the recent commercial availability of FPGAs that are capable of having multiple embedded processors. A working prototype (see figure) consists of two embedded IBM PowerPC 405 processor cores and a comparator built on a Xilinx Virtex-II Pro FPGA. This relatively simple instantiation of the architecture implements an error-detection scheme. A planned future version, incorporating four processors and two comparators, would correct some errors in addition to detecting them.
Online track processor for the CDF upgrade
International Nuclear Information System (INIS)
Thomson, E. J.
2002-01-01
A trigger track processor, called the eXtremely Fast Tracker (XFT), has been designed for the CDF upgrade. This processor identifies high transverse momentum (> 1.5 GeV/c) charged particles in the new central outer tracking chamber for CDF II. The XFT design is highly parallel to handle the input rate of 183 Gbits/s and output rate of 44 Gbits/s. The processor is pipelined and reports the result for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. The complete system has been installed and commissioned at CDF II. An overview of the track processor and performance in CDF Run II are presented
Non-coaxial superposition of vector vortex beams.
Aadhi, A; Vaity, Pravin; Chithrabhanu, P; Reddy, Salla Gangi; Prabakar, Shashi; Singh, R P
2016-02-10
Vector vortex beams are classified into four types depending upon spatial variation in their polarization vector. We have generated all four of these types of vector vortex beams by using a modified polarization Sagnac interferometer with a vortex lens. Further, we have studied the non-coaxial superposition of two vector vortex beams. It is observed that the superposition of two vector vortex beams with the same polarization singularity leads to a beam with another kind of polarization singularity in their interaction region. The results may be of importance for ultrahigh security of polarization-encrypted data that utilizes vector vortex beams and for multiple optical trapping with non-coaxial superposition of vector vortex beams. We verified our experimental results with theory.
Design Principles for Synthesizable Processor Cores
DEFF Research Database (Denmark)
Schleuniger, Pascal; McKee, Sally A.; Karlsson, Sven
2012-01-01
As FPGAs get more competitive, synthesizable processor cores become an attractive choice for embedded computing. Currently popular commercial processor cores do not fully exploit current FPGA architectures. In this paper, we propose general design principles to increase instruction throughput...
Data register and processor for multiwire chambers
International Nuclear Information System (INIS)
Karpukhin, V.V.
1985-01-01
A data register and a processor for receiving and processing data from the drift chambers of a device for investigating relativistic positronium are described. The data are delivered to the register input as an 8-bit Gray code, stored, and transformed to a position code. The register information is delivered to the CAMAC highway and to the front-panel connector. The processor selects particle tracks in the horizontal plane of the facility. The maximum coordinate divergence ΔY and the minimum number of points on a track are set from the processor front panel. The processor decision time is 16 μs; the maximum number of simultaneously analyzed coordinates is 16.
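The Gray-to-position-code step mentioned in this abstract can be illustrated with a short sketch. It assumes the standard reflected-binary Gray code and takes "position code" to mean ordinary binary; the abstract does not specify the hardware logic, and the function names are hypothetical.

```python
def binary_to_gray(value: int) -> int:
    """Encode a non-negative integer as a reflected-binary Gray code word."""
    return value ^ (value >> 1)


def gray_to_binary(gray: int) -> int:
    """Decode a reflected-binary Gray code word back to plain binary."""
    binary = gray
    shift = gray >> 1
    # XOR together all right-shifted copies of the Gray word.
    while shift:
        binary ^= shift
        shift >>= 1
    return binary


if __name__ == "__main__":
    # Round-trip every 8-bit word, matching the 8-bit code in the abstract.
    for n in range(256):
        assert gray_to_binary(binary_to_gray(n)) == n
    print(format(binary_to_gray(8), "04b"))
```

Adjacent wire positions differ in exactly one bit in Gray code, which is the usual reason for using it in position readout.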
NMR-MPar: A Fault-Tolerance Approach for Multi-Core and Many-Core Processors
Directory of Open Access Journals (Sweden)
Vanessa Vargas
2018-03-01
Multi-core and many-core processors are a promising solution to achieve high performance while maintaining lower power consumption. However, the degree of miniaturization makes them more sensitive to soft errors. To improve system reliability, this work proposes a fault-tolerance approach based on redundancy and partitioning principles called N-Modular Redundancy and M-Partitions (NMR-MPar). By combining both principles, this approach allows multi-/many-core processors to perform critical functions in mixed-criticality systems. Benefiting from the capabilities of these devices, NMR-MPar creates different partitions that perform independent functions. For critical functions, it is proposed that N partitions with the same configuration participate in an N-modular redundancy system. In order to validate the approach, a case study is implemented on the KALRAY Multi-Purpose Processing Array (MPPA)-256 many-core processor running two parallel benchmark applications. The traveling salesman problem and matrix multiplication applications were selected to test different device resources. The effectiveness of NMR-MPar is assessed by software-implemented fault injection. For evaluation purposes, it is considered that the system is intended to be used in avionics. Results show an improvement in application reliability of two orders of magnitude when implementing NMR-MPar on the system. Finally, this work opens the possibility of using massive parallelism for dependable applications in embedded systems.
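The N-modular redundancy idea can be sketched as a majority voter over N replica outputs. This is a generic illustration and not the NMR-MPar implementation: the function names, the simulated bit flip, and the use of sequential calls in place of hardware partitions are all assumptions.

```python
from collections import Counter


def majority_vote(results):
    """Return the value agreed on by a strict majority of replicas."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among replicas")
    return value


def run_redundant(func, args, n=3, faulty_replica=None):
    """Run func n times and vote; optionally corrupt one replica's output."""
    outputs = []
    for replica in range(n):
        out = func(*args)
        if replica == faulty_replica:
            out ^= 1  # simulated single-bit soft error (integer results only)
        outputs.append(out)
    return majority_vote(outputs)


if __name__ == "__main__":
    # One corrupted replica out of three is masked by the voter.
    print(run_redundant(lambda a, b: a * b, (6, 7), n=3, faulty_replica=1))
```

With N = 3 a single faulty replica is outvoted; if all replicas disagree the voter raises instead of guessing.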
Optimization of the Brillouin operator on the KNL architecture
Dürr, Stephan
2018-03-01
Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand sides, Nthr = 256 threads, on lattices of size 32^3 × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.
Development methods for VLSI-processors
International Nuclear Information System (INIS)
Horninger, K.; Sandweg, G.
1982-01-01
The aim of this project, which was originally planned for 3 years, was the development of modern system and circuit concepts for VLSI processors having a 32-bit-wide data path. The result of this first year's work is the concept of a general-purpose processor. This processor is not only logically but also physically (on the chip) divided into four functional units: a microprogrammable instruction unit, an execution unit in slice technique, a fully associative cache memory, and an I/O unit. For the ALU of the execution unit, circuits in PLA and slice techniques have been realized. On the basis of regularity, area consumption, and achievable performance, the slice technique has been preferred. The designs utilize self-testing circuitry. (orig.)
The performance of an LSI-11/23 with a SKYMNK-Q array processor as a high speed front end processor
International Nuclear Information System (INIS)
Clark, D.L.
1983-01-01
The NSRL has recently installed a VAX-11/750 based data acquisition system which is networked to two LSI-11/23 satellite processors. Each of the LSIs is connected to CAMAC branch drivers. The LSIs have small array processors installed for use in preprocessing data. The objective is to provide an easy-to-use, high-speed processor that will relieve the VAX of some of the real-time data analysis tasks. The basic operation of the array processor and some of the results of performance tests are described.
Federal Laboratory Consortium — The Embedded Processor Laboratory provides the means to design, develop, fabricate, and test embedded computers for missile guidance electronics systems in support...
Effective SIMD Vectorization for Intel Xeon Phi Coprocessors
Directory of Open Access Journals (Sweden)
Xinmin Tian
2015-01-01
Efficiently exploiting SIMD vector units is one of the most important aspects in achieving high performance of the application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A set of workloads from several application domains is employed to conduct the performance study of our SIMD vectorization techniques. The performance results show that we achieved up to 12.5x performance gain on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x performance speedup from the seamless integration of SIMD vectorization and parallelization.
Multi-Core Processor Memory Contention Benchmark Analysis Case Study
Simon, Tyler; McGalliard, James
2009-01-01
Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
Architectural design and analysis of a programmable image processor
International Nuclear Information System (INIS)
Siyal, M.Y.; Chowdhry, B.S.; Rajput, A.Q.K.
2003-01-01
In this paper we present an architectural design and analysis of a programmable image processor, nicknamed Snake. The processor was designed with a high degree of parallelism to speed up a range of image processing operations. Data parallelism found in array processors has been included into the architecture of the proposed processor. The implementation of commonly used image processing algorithms and their performance evaluation are also discussed. The performance of Snake is also compared with other types of processor architectures. (author)
Gamow state vectors as functionals over subspaces of the nuclear space
International Nuclear Information System (INIS)
Bohm, A.
1979-12-01
Exponentially decaying Gamow state vectors are obtained from S-matrix poles in the lower half of the second sheet, and are defined as functionals over a subspace of the nuclear space, PHI. Exponentially growing Gamow state vectors are obtained from S-matrix poles in the upper half of the second sheet, and are defined as functionals over another subspace of PHI. On functionals over these two subspaces the dynamical group of time development splits into two semigroups
Mathuriya, Amrita; Luo, Ye; Benali, Anouar; Shulenburger, Luke; Kim, Jeongnim
2016-01-01
B-spline based orbital representations are widely used in Quantum Monte Carlo (QMC) simulations of solids, historically taking as much as 50% of the total run time. Random accesses to a large four-dimensional array make it challenging to efficiently utilize caches and wide vector units of modern CPUs. We present node-level optimizations of B-spline evaluations on multi/many-core shared memory processors. To increase SIMD efficiency and bandwidth utilization, we first apply data layout transfo...
Analytical Bounds on the Threads in IXP1200 Network Processor
Ramakrishna, STGS; Jamadagni, HS
2003-01-01
Increasing link speeds have placed enormous burden on the processing requirements and the processors are expected to carry out a variety of tasks. Network Processors (NP) [1] [2] is the blanket name given to the processors, which are traded for flexibility and performance. Network Processors are offered by a number of vendors; to take the main burden of processing requirement of network related operations from the conventional processors. The Network Processors cover a spectrum of design trad...
Optical currents in vector fields
DEFF Research Database (Denmark)
Angelsky, O. V.; Gorsky, M. P.; Maksimyak, P. P.
2011-01-01
The influence of phase relations and the degree of mutual coherence of superimposing waves in arrangements of two-wave superposition on the characteristics of the microparticle's motion has been analyzed. The prospects of studying temporal coherence using the proposed approach are discussed. For the first time, we have shown experimentally the possibility of diagnosing optical currents in liquids caused by the polarization characteristics of an optical field alone, using nanoscale metallic test particles.
A UNIX-based prototype biomedical virtual image processor
International Nuclear Information System (INIS)
Fahy, J.B.; Kim, Y.
1987-01-01
The authors have developed a multiprocess virtual image processor for the IBM PC/AT, in order to maximize image processing software portability for biomedical applications. An interprocess communication scheme, based on two-way metacode exchange, has been developed and verified for this purpose. Application programs call a device-independent image processing library, which transfers commands over a shared data bridge to one or more Autonomous Virtual Image Processors (AVIP). Each AVIP runs as a separate process in the UNIX operating system, and implements the device-independent functions on the image processor to which it corresponds. Application programs can control multiple image processors at a time, change the image processor configuration used at any time, and are completely portable among image processors for which an AVIP has been implemented. Run-time speeds have been found to be acceptable for higher level functions, although rather slow for lower level functions, owing to the overhead associated with sending commands and data over the shared data bridge
Orbit Classification of Qutrit via the Gram Matrix
International Nuclear Information System (INIS)
Tay, B. A.; Zainuddin, Hishamuddin
2008-01-01
We classify the orbits generated by unitary transformation on the density matrices of the three-state quantum systems (qutrits) via the Gram matrix. The Gram matrix is a real symmetric matrix formed from the Hilbert–Schmidt scalar products of the vectors lying in the tangent space to the orbits. The rank of the Gram matrix determines the dimensions of the orbits, which fall into three classes for qutrits. (general)
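The rank-to-dimension relation stated in this abstract can be checked numerically with a small sketch. It assumes the tangent vectors at a density matrix ρ are i[λ_a, ρ] for the eight Gell-Mann matrices λ_a; this construction is an assumption made for illustration and is not taken from the abstract, and all function names are hypothetical.

```python
import math

S3 = 1 / math.sqrt(3)
# The eight Gell-Mann matrices, a standard basis of su(3) generators.
GELL_MANN = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -1j, 0], [1j, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -1j], [0, 0, 0], [1j, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -1j], [0, 1j, 0]],
    [[S3, 0, 0], [0, S3, 0], [0, 0, -2 * S3]],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def tangent(gen, rho):
    """Tangent vector i[gen, rho] to the unitary orbit through rho."""
    ab, ba = matmul(gen, rho), matmul(rho, gen)
    return [[1j * (ab[i][j] - ba[i][j]) for j in range(3)] for i in range(3)]


def hs_product(a, b):
    """Hilbert-Schmidt scalar product Tr(a^dagger b); real for Hermitian a, b."""
    return sum(a[i][j].conjugate() * b[i][j]
               for i in range(3) for j in range(3)).real


def rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols, r = len(m), len(m[0]), 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= f * m[r][j]
        r += 1
    return r


def orbit_dimension(eigenvalues):
    """Orbit dimension for a diagonal qutrit state with the given spectrum."""
    rho = [[eigenvalues[i] if i == j else 0 for j in range(3)] for i in range(3)]
    tangents = [tangent(g, rho) for g in GELL_MANN]
    gram = [[hs_product(s, t) for t in tangents] for s in tangents]
    return rank(gram)


if __name__ == "__main__":
    for spectrum in ([0.5, 0.3, 0.2], [0.5, 0.25, 0.25], [1 / 3, 1 / 3, 1 / 3]):
        print(spectrum, orbit_dimension(spectrum))
```

Under these assumptions the three spectra (all eigenvalues distinct, two equal, all equal) give three different ranks, consistent with the three orbit classes mentioned in the abstract.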
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2017-08-01
For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
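For readers unfamiliar with the Kalman filter named in this abstract, a minimal scalar predict-and-update step may help. It is a textbook sketch with identity dynamics and a directly measured state, not the authors' track-fitting code, which works on multi-parameter track states; the function and parameter names are hypothetical.

```python
def kalman_step(x, p, z, q=0.0, r=1.0):
    """One predict + update step for a scalar state.

    x, p: prior estimate and its variance
    z:    new measurement of the state
    q, r: process-noise and measurement-noise variances
    """
    # Predict: identity dynamics, so only the uncertainty grows.
    x_pred = x
    p_pred = p + q
    # Update: blend prediction and measurement with the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    estimate, variance = 0.0, 1.0
    for hit in (2.0, 2.2, 1.9, 2.1):  # made-up measurements
        estimate, variance = kalman_step(estimate, variance, hit)
    print(estimate, variance)
```

Each step touches only the current estimate and one measurement. Processing many track candidates at once with steps of this kind is one way such a filter can be mapped onto vector units, though the abstract does not describe the authors' scheme at that level of detail.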
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs
Directory of Open Access Journals (Sweden)
Cerati Giuseppe
2017-01-01
For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs
Energy Technology Data Exchange (ETDEWEB)
Cerati, Giuseppe [Fermilab; Elmer, Peter [Princeton U.; Krutelyov, Slava [UC, San Diego; Lantz, Steven [Cornell U.; Lefebvre, Matthieu [Princeton U.; Masciovecchio, Mario [UC, San Diego; McDermott, Kevin [Cornell U.; Riley, Daniel [Cornell U., LNS; Tadel, Matevž [UC, San Diego; Wittich, Peter [Cornell U.; Würthwein, Frank [UC, San Diego; Yagil, Avi [UC, San Diego
2017-01-01
For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
The communication processor of TUMULT-64
Smit, Gerardus Johannes Maria; Jansen, P.G.
1988-01-01
Tumult (Twente University MULTi-processor system) is a modular extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type,
Distribution amplitudes of vector mesons
Energy Technology Data Exchange (ETDEWEB)
Braun, V.M. [Regensburg Univ. (Germany). Inst. fuer Theoretische Physik; Broemmel, D. [Deutsches Elektronen-Synchrotron, Hamburg (Germany); Goeckeler, M. [Regensburg Univ. (DE). Inst. fuer Theoretische Physik] (and others)
2007-11-15
Results are presented for the lowest moment of the distribution amplitude for the K* vector meson. Both longitudinal and transverse moments are investigated. We use two flavours of O(a) improved Wilson fermions, together with a non-perturbative renormalisation of the matrix element. (orig.)
High-Speed General Purpose Genetic Algorithm Processor.
Hoseini Alinodehi, Seyed Pourya; Moshfe, Sajjad; Saber Zaeimian, Masoumeh; Khoei, Abdollah; Hadidi, Khairollah
2016-07-01
In this paper, an ultrafast steady-state genetic algorithm processor (GAP) is presented. Due to the heavy computational load of genetic algorithms (GAs), they usually take a long time to find optimum solutions. Hardware implementation is a significant approach to overcoming this problem by speeding up the GA procedure. Hence, we designed a digital CMOS implementation of a GA in [Formula: see text] process. The proposed processor is not bound to a specific application. Indeed, it is a general-purpose processor, which is capable of performing optimization in any possible application. Utilizing speed-boosting techniques, such as a pipeline scheme, parallel coarse-grained processing, parallel fitness computation, parallel selection of parents, a dual-population scheme, and support for pipelined fitness computation, the proposed processor significantly reduces the processing time. Furthermore, by relying on a built-in discard operator, the proposed hardware may be used in constrained problems that are very common in control applications. In the proposed design, a large search space is achievable through the bit-string length extension of individuals in the genetic population by connecting the 32-bit GAPs. In addition, the proposed processor supports parallel processing, in which the GA procedure can be run on several connected processors simultaneously.
DEFF Research Database (Denmark)
2013-01-01
The present invention relates to an all-optical sensor utilizing effective index modulation of a waveguide and detection of a wavelength shift of reflected light and a force sensing system accommodating said optical sensor. One embodiment of the invention relates to a sensor system comprising at least one multimode light source, one or more optical sensors comprising a multimode sensor optical waveguide accommodating a distributed Bragg reflector, at least one transmitting optical waveguide for guiding light from said at least one light source to said one or more multimode sensor optical waveguides, a detector for measuring light reflected from said Bragg reflector in said one or more multimode sensor optical waveguides, and a data processor adapted for analyzing variations in the Bragg wavelength of at least one higher order mode of the reflected light.
Making CSB + -Trees Processor Conscious
DEFF Research Database (Denmark)
Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe
2005-01-01
Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose a systematic method for adapting CSB+-tree to new platforms. This work is a first step towards integrating CSB+-tree in MySQL’s heap storage manager.
Lipsi: Probably the Smallest Processor in the World
DEFF Research Database (Denmark)
Schoeberl, Martin
2018-01-01
While research on high-performance processors is important, it is also interesting to explore processor architectures at the other end of the spectrum: tiny processor cores for auxiliary functions. While it is common to implement small circuits for such functions, such as a serial port, in dedica...... at a minimal cost....
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor
Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.
2014-03-01
List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple-data threads per core, with each thread having a 512-bit single-instruction, multiple-data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating list-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi co-processor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for multi-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to the co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with a Xeon Phi co-processor card and a top-of-the-range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging.
Recommending the heterogeneous cluster type multi-processor system computing
International Nuclear Information System (INIS)
Iijima, Nobukazu
2010-01-01
A real-time reactor simulator had been developed by reusing equipment from the Musashi reactor, and improving its performance became indispensable for raising the sampling rate of this research tool by introducing arithmetic units based on a multi-Digital Signal Processor (DSP) system (cluster). To realize heterogeneous cluster-type multi-processor computing, a combination of two kinds of Control Processors (CPs), the Cluster Control Processor (CCP) and the System Control Processor (SCP), was proposed, with a Large System Control Processor (LSCP) added for hierarchical clusters if needed. The faster computing performance of this system was confirmed by simulation results for the simultaneous execution of multiple jobs and for pipeline processing between clusters, which showed that the system leads to effective use of the existing system and improved cost performance. (T. Tanaka)
Energy flow characteristics of vector X-Waves
Salem, Mohamed; Bagci, Hakan
2011-01-01
The vector form of X-Waves is obtained as a superposition of transverse electric and transverse magnetic polarized field components. It is shown that the signs of all components of the Poynting vector can be locally changed using carefully chosen complex amplitudes of the transverse electric and transverse magnetic polarization components. Negative energy flux density in the longitudinal direction can be observed in a bounded region around the centroid; in this region the local behavior of the wave field is similar to that of a wave field with negative energy flow. This peculiar energy flux phenomenon is of essential importance for electromagnetic and optical traps and tweezers, where the location and momenta of micro- and nanoparticles are manipulated by changing the Poynting vector, and in detection of invisibility cloaks. © 2011 Optical Society of America.
Optical-domain Compensation for Coupling between Optical Fiber Conjugate Vortex Modes
DEFF Research Database (Denmark)
Lyubopytov, Vladimir S.; Tatarczak, Anna; Lu, Xiaofeng
2016-01-01
We demonstrate for the first time optical-domain compensation for coupling between conjugate vortex modes in optical fibers. We introduce a novel method for reconstructing the complex propagation matrix of the optical fiber with straightforward implementation.
Trigonometric bases for matrix weighted Lp-spaces
DEFF Research Database (Denmark)
Nielsen, Morten
2010-01-01
We give a complete characterization of 2π-periodic matrix weights W for which the vector-valued trigonometric system forms a Schauder basis for the matrix weighted space Lp(T;W). Then trigonometric quasi-greedy bases for Lp(T;W) are considered. Quasi-greedy bases are systems for which the simple...
First level trigger processor for the ZEUS calorimeter
International Nuclear Information System (INIS)
Dawson, J.W.; Talaga, R.L.; Burr, G.W.; Laird, R.J.; Smith, W.; Lackey, J.
1990-01-01
This paper discusses the design of the first level trigger processor for the ZEUS calorimeter. This processor accepts data from the 13,000 photomultipliers of the calorimeter which is topologically divided into 16 regions, and after regional preprocessing, performs logical and numerical operations which cross regional boundaries. Because the crossing period at the HERA collider is 96 ns, it is necessary that first-level trigger decisions be made in pipelined hardware. One microsecond is allowed for the processor to perform the required logical and numerical operations, during which time the data from ten crossings would be resident in the processor while being clocked through the pipelined hardware. The circuitry is implemented in 100K ECL, Advanced CMOS discrete devices, and programmable gate arrays, and operates in a VME environment. All tables and registers are written/read from VME, and all diagnostic codes are executed from VME. Preprocessed data flows into the processor at a rate of 5.2GB/s, and processed data flows from the processor to the Global First-Level Trigger at a rate of 700MB/s. The system allows for subsets of the logic to be configured by software and for various important variables to be histogrammed as they flow through the processor. 2 refs., 3 figs
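The pipeline figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers given in the text and assumes 1 GB = 10^9 bytes; the constant and function names are hypothetical.

```python
CROSSING_PERIOD_NS = 96       # HERA bunch-crossing period from the abstract
PROCESSING_BUDGET_NS = 1000   # one microsecond allowed for the processor
INPUT_RATE_BYTES_PER_S = 5.2e9  # 5.2 GB/s, assuming decimal gigabytes


def crossings_in_flight(budget_ns=PROCESSING_BUDGET_NS,
                        period_ns=CROSSING_PERIOD_NS):
    """Whole crossings that fit inside the processing budget."""
    return budget_ns // period_ns


def bytes_per_crossing(rate=INPUT_RATE_BYTES_PER_S,
                       period_ns=CROSSING_PERIOD_NS):
    """Average input data volume arriving during one crossing period."""
    return rate * period_ns * 1e-9


if __name__ == "__main__":
    print(crossings_in_flight())   # consistent with "ten crossings" in the text
    print(bytes_per_crossing())    # roughly 500 bytes of preprocessed data
```

A 1 µs budget spans just over ten 96 ns crossings, which is why a decision cannot wait for one event to finish before the next arrives and the logic has to be pipelined.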
First-level trigger processor for the ZEUS calorimeter
International Nuclear Information System (INIS)
Dawson, J.W.; Talaga, R.L.; Burr, G.W.; Laird, R.J.; Smith, W.; Lackey, J.
1990-01-01
The design of the first-level trigger processor for the Zeus calorimeter is discussed. This processor accepts data from the 13,000 photomultipliers of the calorimeter, which is topologically divided into 16 regions, and after regional preprocessing performs logical and numerical operations that cross regional boundaries. Because the crossing period at the HERA collider is 96 ns, it is necessary that first-level trigger decisions be made in pipelined hardware. One microsecond is allowed for the processor to perform the required logical and numerical operations, during which time the data from ten crossings would be resident in the processor while being clocked through the pipelined hardware. The circuitry is implemented in 100K emitter-coupled logic (ECL), advanced CMOS discrete devices and programmable gate arrays, and operates in a VME environment. All tables and registers are written/read from VME, and all diagnostic codes are executed from VME. Preprocessed data flows into the processor at a rate of 5.2 Gbyte/s, and processed data flows from the processor to the global first-level trigger at a rate of 70 Mbyte/s. The system allows for subsets of the logic to be configured by software and for various important variables to be histogrammed as they flow through the processor
A digital retina-like low-level vision processor.
Mertoguno, S; Bourbakis, N G
2003-01-01
This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k × m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.
The Trigger Processor and Trigger Processor Algorithms for the ATLAS New Small Wheel Upgrade
Lazovich, Tomo; The ATLAS collaboration
2015-01-01
The ATLAS New Small Wheel (NSW) is an upgrade to the ATLAS muon endcap detectors that will be installed during the next long shutdown of the LHC. Comprising both MicroMegas (MMs) and small-strip Thin Gap Chambers (sTGCs), this system will drastically improve the performance of the muon system in a high cavern background environment. The NSW trigger, in particular, will significantly reduce the rate of fake triggers coming from track segments in the endcap not originating from the interaction point. We will present an overview of the trigger, the proposed sTGC and MM trigger algorithms, and the hardware implementation of the trigger. In particular, we will discuss both the heart of the trigger, an ATCA system with FPGA-based trigger processors (using the same hardware platform for both MM and sTGC triggers), as well as the full trigger electronics chain, including dedicated cards for transmission of data via GBT optical links. Finally, we will detail the challenges of ensuring that the trigger electronics can ...
SAD PROCESSOR FOR MULTIPLE MACROBLOCK MATCHING IN FAST SEARCH VIDEO MOTION ESTIMATION
Directory of Open Access Journals (Sweden)
Nehal N. Shah
2015-02-01
Motion estimation is a very important but computationally complex task in video coding. The process of determining motion vectors based on the temporal correlation of consecutive frames is used for video compression. To reduce the computational complexity of motion estimation while maintaining encoding quality during motion compensation, different fast search techniques are available. These block-based motion estimation algorithms use the sum of absolute differences (SAD) between the corresponding macroblock in the current frame and all candidate macroblocks in the reference frame to identify the best match. Existing implementations can perform SAD between two blocks using a sequential or pipelined approach, but performing multi-operand SAD in a single clock cycle with optimized resources is state of the art. In this paper, various parallel architectures for computing the fixed-block-size SAD are evaluated, and a fast parallel SAD architecture with optimized resources is proposed. Further, a SAD processor with 9 processing elements is described, which can be configured for any existing fast search block matching algorithm. The proposed SAD processor consumes 7% fewer adders than the existing implementation for one processing element. Using nine PEs, it can process 84 HD frames per second in the worst case, which is a good outcome for real-time implementation. In the average case, the architecture processes 325 HD frames per second.
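As a software illustration of the block-matching criterion described in this abstract, the following is a minimal sketch of SAD-based matching with an exhaustive search over a small window. It is not the paper's hardware architecture or one of its fast search patterns; the function names `sad` and `best_match`, the list-of-lists frame layout, and the search-window parameter are assumptions made for the example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_match(current, reference, top, left, size, search):
    """Exhaustively search a +/-search window in `reference` for the block
    of side `size` located at (top, left) in `current`.

    Returns (lowest SAD, (dy, dx)) where (dy, dx) is the motion vector.
    Hypothetical helper for illustration only.
    """
    cur_block = [row[left:left + size] for row in current[top:top + size]]
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            candidate = [row[x:x + size] for row in reference[y:y + size]]
            cost = sad(cur_block, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best
```

A fast search algorithm would evaluate only a subset of the candidate offsets visited by the two nested loops; the SAD cost itself stays the same.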
Directory of Open Access Journals (Sweden)
Salem Ibrahim Salem
2017-10-01
The chlorophyll-a (Chla) products of seven processors developed for the Medium Resolution Imaging Spectrometer (MERIS) sensor were evaluated. The seven processors, based on a neural network and band height, were assessed over an optically complex water body with Chla concentrations of 8.10–187.40 mg∙m−3 using 10-year MERIS archival data. These processors were adopted for the Ocean and Land Color Instrument (OLCI) sensor. Results indicated that four processors, two based on band height (the Maximum Chlorophyll Index (MCI_L1) and Fluorescence Line Height (FLH_L1)) and two based on a neural network (Eutrophic Lake (EUL) and Case 2 Regional (C2R)), possessed reasonable retrieval accuracy, with a coefficient of determination (R2) in the range of 0.42–0.65. However, these processors underestimated the retrieved Chla > 100 mg∙m−3, reflecting the limitation of the band height processors to eliminate the influence of non-phytoplankton matter and highlighting the need to train the neural network for highly turbid waters. MCI_L1 outperformed the other processors during the calibration and validation stages (R2 = 0.65, root mean square error (RMSE) = 22.18 mg∙m−3, mean absolute relative error (MARE) = 36.88%). In contrast, the results from the Boreal Lake (BOL) and Free University of Berlin (FUB) processors demonstrated their inadequacy to accurately retrieve Chla concentrations > 50 mg∙m−3, mainly due to the limitation of the training datasets, which resulted in a high MARE for BOL (56.20%) and FUB (57.00%). Mapping the spatial distribution of Chla concentrations across Lake Kasumigaura using the seven processors showed that all processors, except for BOL and FUB, were able to accurately capture the Chla distribution for moderate and high Chla concentrations. In addition, the MCI_L1 and C2R processors were evaluated over 10 years of monthly measured Chla, as they demonstrated the best retrieval accuracy from both groups (i.e. band height and neural network).
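The three accuracy metrics quoted in this abstract (R2, RMSE and MARE) can be computed as in the sketch below. The exact definitions used by the authors are not given in the abstract, so the formulas here are common textbook forms and should be read as assumptions: R2 is taken as the squared Pearson correlation between measured and retrieved values, and MARE as the mean of |retrieved − measured| / measured, expressed in percent.

```python
import math


def rmse(measured, retrieved):
    """Root mean square error, in the units of the inputs (e.g. mg/m^3)."""
    n = len(measured)
    return math.sqrt(sum((r - m) ** 2 for m, r in zip(measured, retrieved)) / n)


def mare(measured, retrieved):
    """Mean absolute relative error in percent (assumed definition)."""
    n = len(measured)
    return 100.0 * sum(abs(r - m) / m for m, r in zip(measured, retrieved)) / n


def r_squared(measured, retrieved):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_r = sum(retrieved) / n
    cov = sum((m - mean_m) * (r - mean_r) for m, r in zip(measured, retrieved))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_r = sum((r - mean_r) ** 2 for r in retrieved)
    return cov * cov / (var_m * var_r)
```

Note that a high R2 under this definition only indicates a strong linear relationship; a processor can still be biased, which is why RMSE and MARE are reported alongside it.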
Filtering and smoothing of state vector for diffuse state space models
Koopman, S.J.; Durbin, J.
2003-01-01
This paper presents exact recursions for calculating the mean and mean square error matrix of the state vector given the observations for the multi-variate linear Gaussian state-space model in the case where the initial state vector is (partially) diffuse.
Median and Morphological Specialized Processors for a Real-Time Image Data Processing
Directory of Open Access Journals (Sweden)
Kazimierz Wiatr
2002-01-01
This paper presents considerations on selecting a multiprocessor MISD architecture for fast implementation of vision image processing. Using the author's earlier experience with real-time systems, the implementation of specialized hardware processors based on programmable FPGA systems is proposed in a pipeline architecture. In particular, the following processors are presented: a median filter and a morphological processor. The structure of a universal reconfigurable processor is proposed as well. Experimental results are presented as delays at the LCA implementation level for the median filter, morphological processor, convolution processor, look-up-table processor, logic processor and histogram processor. These times are compared with delays in a general-purpose processor and a DSP processor.
Progress on adenovirus-vectored universal influenza vaccines.
Xiang, Kui; Ying, Guan; Yan, Zhou; Shanshan, Yan; Lei, Zhang; Hongjun, Li; Maosheng, Sun
2015-01-01
Influenza virus (IFV) infection causes serious health problems and heavy financial burdens each year worldwide. The classical inactivated influenza virus vaccine (IIVV) and live attenuated influenza vaccine (LAIV) must be updated regularly to match the new strains that evolve due to antigenic drift and antigenic shift. However, with the discovery of broadly neutralizing antibodies that recognize conserved antigens, and the CD8(+) T cell responses targeting viral internal proteins nucleoprotein (NP), matrix protein 1 (M1) and polymerase basic 1 (PB1), it is possible to develop a universal influenza vaccine based on the conserved hemagglutinin (HA) stem, NP, and matrix proteins. Recombinant adenovirus (rAd) is an ideal influenza vaccine vector because it has an ideal stability and safety profile, induces balanced humoral and cell-mediated immune responses due to activation of innate immunity, provides 'self-adjuvanting' activity, can mimic natural IFV infection, and confers seamless protection against mucosal pathogens. Moreover, this vector can be developed as a low-cost, rapid-response vaccine that can be quickly manufactured. Therefore, an adenovirus vector encoding conserved influenza antigens holds promise in the development of a universal influenza vaccine. This review will summarize the progress in adenovirus-vectored universal flu vaccines and discuss future novel approaches.
Java Processor Optimized for RTSJ
Directory of Open Access Journals (Sweden)
Tu Shiliang
2007-01-01
Due to the preeminent work of the Real-Time Specification for Java (RTSJ), Java is increasingly expected to become the leading programming language in real-time systems. To provide a Java platform suitable for real-time applications, a Java processor that can execute Java bytecode directly is proposed in this paper. It provides efficient hardware support for some mechanisms specified in the RTSJ and offers a simpler programming model by improving the scoped memory of the RTSJ. The worst-case execution time (WCET) of the bytecodes implemented in this processor is predictable by employing the optimization method proposed in our previous work, in which all processing that interferes with predictability is handled before bytecode execution. A further advantage of this method is that it makes the implementation of the processor simpler and suited to a low-cost FPGA chip.
Reverse ray tracing for transformation optics.
Hu, Chia-Yu; Lin, Chun-Hung
2015-06-29
Ray tracing is an important technique for predicting optical system performance. In the field of transformation optics, the Hamiltonian equations of motion for ray tracing are well known. The numerical solutions to the Hamiltonian equations of motion are affected by the complexities of the inhomogeneous and anisotropic indices of the optical device. Based on our knowledge, no previous work has been conducted on ray tracing for transformation optics with extreme inhomogeneity and anisotropicity. In this study, we present the use of 3D reverse ray tracing in transformation optics. The reverse ray tracing is derived from Fermat's principle based on a sweeping method instead of finding the full solution to ordinary differential equations. The sweeping method is employed to obtain the eikonal function. The wave vectors are then obtained from the gradient of that eikonal function map in the transformed space to acquire the illuminance. Because only the rays in the points of interest have to be traced, the reverse ray tracing provides an efficient approach to investigate the illuminance of a system. This approach is useful in any form of transformation optics where the material property tensor is a symmetric positive definite matrix. The performance and analysis of three transformation optics with inhomogeneous and anisotropic indices are explored. The ray trajectories and illuminances in these demonstration cases are successfully solved by the proposed reverse ray tracing method.
Data collection from FASTBUS to a DEC UNIBUS processor through the UNIBUS-Processor Interface
International Nuclear Information System (INIS)
Larwill, M.; Barsotti, E.; Lesny, D.; Pordes, R.
1983-01-01
This paper describes the use of the UNIBUS Processor Interface, an interface between FASTBUS and the Digital Equipment Corporation UNIBUS. The UPI was developed by Fermilab and the University of Illinois. Details of the use of this interface in a high energy physics experiment at Fermilab are given. The paper includes a discussion of the operation of the UPI on the UNIBUS of a VAX-11, and plans for using the UPI to perform data acquisition from FASTBUS to a VAX-11 Processor
Optical MSD symbolic substitution system based on a higher ordered rule
Reddy, A. K.; Mallikarjun, Tatipamula; Raina, J. P.
1992-12-01
The advantages provided by photonic computing have been well documented. An optical arithmetic processor has to take full advantage of the massive parallelism in optical signals. Such a processor, using the Modified Signed-Digit (MSD) number representation, is presented here, based on symbolic substitution logic. Higher-order symbolic substitution rules are formulated for the addition operation, which is carried out in just two steps. Based on the addition operation, the other arithmetic operations (subtraction, multiplication and division) are implemented. Finally, the usefulness of this MSD system is studied.
New development for low energy electron beam processor
International Nuclear Information System (INIS)
Takei, Taro; Goto, Hitoshi; Oizumi, Matsutoshi; Hirakawa, Tetsuya; Ochi, Masafumi
2003-01-01
Newly developed low-energy electron beam (EB) processors that have unique designs and configurations compared to conventional ones enable electron-beam treatment of small three-dimensional objects, such as grain-like agricultural products and small plastic parts. As the EB processor can irradiate the products from all angles, uniform EB treatment can be achieved at one time regardless of the complex shapes of the product. Two new EB processors are presented here: the first system has a cylindrical process zone, which allows three-dimensional objects to be irradiated with one-pass treatment. The second is a small tube-type EB processor, which achieves not only a compact design but also higher beam extraction efficiency and flexible installation of the irradiation heads. The basic design of each processor and their potential applications are presented in this paper. (author)
A data base processor semantics specification package
Fishwick, P. A.
1983-01-01
A Semantics Specification Package (DBPSSP) for the Intel Data Base Processor (DBP) is defined. DBPSSP serves as a collection of cross assembly tools that allow the analyst to assemble request blocks on the host computer for passage to the DBP. The assembly tools discussed in this report may be effectively used in conjunction with a DBP compatible data communications protocol to form a query processor, precompiler, or file management system for the database processor. The source modules representing the components of DBPSSP are fully commented and included.
Automatic SIMD vectorization of SSA-based control flow graphs
Karrenberg, Ralf
2015-01-01
Ralf Karrenberg presents Whole-Function Vectorization (WFV), an approach that allows a compiler to automatically create code that exploits data-parallelism using SIMD instructions. Data-parallel applications such as particle simulations, stock option price estimation or video decoding require the same computations to be performed on huge amounts of data. Without WFV, one processor core executes a single instance of a data-parallel function. WFV transforms the function to execute multiple instances at once using SIMD instructions. The author describes an advanced WFV algorithm that includes a v
Construction and decomposition of biorthogonal vector-valued wavelets with compact support
International Nuclear Information System (INIS)
Chen Qingjiang; Cao Huaixin; Shi Zhi
2009-01-01
In this article, we introduce vector-valued multiresolution analysis and the biorthogonal vector-valued wavelets with four-scale. The existence of a class of biorthogonal vector-valued wavelets with compact support associated with a pair of biorthogonal vector-valued scaling functions with compact support is discussed. A method for designing a class of biorthogonal compactly supported vector-valued wavelets with four-scale is proposed by virtue of multiresolution analysis and matrix theory. The biorthogonality properties concerning vector-valued wavelet packets are characterized with the aid of time-frequency analysis method and operator theory. Three biorthogonality formulas regarding them are presented.
Liquid lens: advances in adaptive optics
Casey, Shawn Patrick
2010-12-01
'Liquid lens' technologies promise significant advancements in machine vision and optical communications systems. Adaptations for machine vision, human vision correction, and optical communications are used to exemplify the versatile nature of this technology. Utilization of liquid lens elements allows the cost effective implementation of optical velocity measurement. The project consists of a custom image processor, camera, and interface. The images are passed into customized pattern recognition and optical character recognition algorithms. A single camera would be used for both speed detection and object recognition.
Globe hosts launch of new processor
2006-01-01
Launch of the quadcore processor chip at the Globe. On 14 November, in a series of major media events around the world, the chip-maker Intel launched its new 'quadcore' processor. For the regions of Europe, the Middle East and Africa, the day-long launch event took place in CERN's Globe of Science and Innovation, with over 30 journalists in attendance, coming from as far away as Johannesburg and Dubai. CERN was a significant choice for the event: the first tests of this new generation of processor in Europe had been made at CERN over the preceding months, as part of CERN openlab, a research partnership with leading IT companies such as Intel, HP and Oracle. The event also provided the opportunity for the journalists to visit ATLAS and the CERN Computer Centre. The strategy of putting multiple processor cores on the same chip, which has been pursued by Intel and other chip-makers in the last few years, represents an important departure from the more traditional improvements in the sheer speed of such chips. ...
Vector-matrix-quaternion, array and arithmetic packages: All HAL/S functions implemented in Ada
Klumpp, Allan R.; Kwong, David D.
1986-01-01
The HAL/S avionics programmers have enjoyed a variety of tools built into a language tailored to their special requirements. Ada is designed for a broader group of applications. Rather than providing built-in tools, Ada provides the elements with which users can build their own. Standard avionic packages remain to be developed. These must enable programmers to code in Ada as they have coded in HAL/S. The packages under development at JPL will provide all of the vector-matrix, array, and arithmetic functions described in the HAL/S manuals. In addition, the linear algebra package will provide all of the quaternion functions used in Shuttle steering and Galileo attitude control. Furthermore, using Ada's extensibility, many quaternion functions are being implemented as infix operations; equivalent capabilities were never implemented in HAL/S because doing so would entail modifying the compiler and expanding the language. With these packages, many HAL/S expressions will compile and execute in Ada, unchanged. Others can be converted simply by replacing the implicit HAL/S multiply operator with the Ada *. Errors will be trapped and identified. Input/output will be convenient and readable.
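The idea of exposing quaternion arithmetic through infix operators, which the abstract attributes to Ada's extensibility, can be illustrated in another language with operator overloading. The sketch below uses Python rather than Ada and is not the JPL package; the class name `Quaternion` and its field layout (scalar part first) are assumptions chosen for the example. The multiplication is the standard Hamilton product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quaternion:
    """Quaternion w + x*i + y*j + z*k (hypothetical illustrative type)."""
    w: float
    x: float
    y: float
    z: float

    def __mul__(self, other):
        """Hamilton product, exposed as the infix `*` operator."""
        return Quaternion(
            self.w * other.w - self.x * other.x - self.y * other.y - self.z * other.z,
            self.w * other.x + self.x * other.w + self.y * other.z - self.z * other.y,
            self.w * other.y - self.x * other.z + self.y * other.w + self.z * other.x,
            self.w * other.z + self.x * other.y - self.y * other.x + self.z * other.w,
        )

    def conjugate(self):
        """Negate the vector part; for unit quaternions this is the inverse."""
        return Quaternion(self.w, -self.x, -self.y, -self.z)
```

With this definition, composing two attitude rotations reads as `q_total = q2 * q1`, and the non-commutativity of the product (`i * j` differs from `j * i`) is visible directly in the operator form.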
Matrix light and pixel light: optical system architecture and requirements to the light source
Spinger, Benno; Timinger, Andreas L.
2015-09-01
Modern automotive headlamps enable improved functionality for more driving comfort and safety. Matrix or Pixel light headlamps are not restricted to either pure low beam functionality or pure high beam. Light in the direction of oncoming traffic is selectively switched off, potential hazards can be marked via an isolated beam and the illumination on the road can even follow a bend. The optical architectures that enable these advanced functionalities are diverse. Electromechanical shutters and lens units moved by electric motors were the first ways to realize these systems. Switching multiple LED light sources is a more elegant and mechanically robust solution. While many basic functionalities can already be realized with a limited number of LEDs, an increasing number of pixels will lead to more driving comfort and better visibility. The required optical system needs not only to generate a desired beam distribution with a high angular dynamic, but also needs to guarantee minimal stray light and cross talk between the different pixels. The direct projection of the LED array via a lens is a simple but not very efficient optical system. We discuss different optical elements for pre-collimating the light with minimal cross talk and improved contrast between neighboring pixels. Depending on the selected optical system, we derive the basic light source requirements: luminance, surface area, contrast, flux and color homogeneity.
XL-100S microprogrammable processor
International Nuclear Information System (INIS)
Gorbunov, N.V.; Guzik, Z.; Sutulin, V.A.; Forytski, A.
1983-01-01
The XL-100S microprogrammable processor, which provides the multiprocessor operation mode in the XL system crate, is described. The processor meets the EUR 6500 CAMAC standards, addresses up to 4 Mbyte of memory, and interacts with 7 CAMAC branches. Eight external requests initiate operations preset by a sequence of microcommands in a memory with a capacity of up to 64 kwords of 32 bits. The microprocessor architecture allows one to emulate commands of the majority of mini- or micro-computers, including floating point operations. The XL-100S processor may be used in various branches of experimental physics: for control of physical experiment apparatus, fast selection of useful physical events, organization of input/output operations, including direct memory access, etc. The Am2900 microprocessor set is used as the component base. The device is made in the form of a single-width CAMAC module.
The optics of secondary polarized proton beams
International Nuclear Information System (INIS)
Carey, D.C.
1990-05-01
Polarized protons can be produced by the parity-violating decay of either lambda or sigma hyperons. A secondary beam of polarized protons can then be produced without the difficult procedure of accelerating polarized protons. The preservation of the polarization while the protons are being transmitted to a final focus places stringent limitations on the optics of the beam line. The equations of motion of a polarized particle in a magnetic field have been solved to first order for quadrupole and dipole magnets. The lowest order terms indicate that the polarization vector will be restored to its original direction upon passage through a magnetic system if the momentum vector is unaltered. Higher-order terms may be derived by an expansion in commutators of the rotation matrix and its longitudinal derivative. The higher-order polarization rotation terms then arise from the non-commutativity of the rotation matrices for large angles in three-dimensional space. 5 refs., 3 figs
Simulation of a processor switching circuit with APLSV
International Nuclear Information System (INIS)
Dilcher, H.
1979-01-01
The report describes the simulation of a processor switching circuit with APL. Furthermore, an APL function is presented that simulates a processor in an assembly-like language. Together they serve as a tool for studying processor properties. By means of the programming function it is also possible to program other simulated processors. The processor is to be used for real-time analysis of data from high energy physics experiments. The data are delivered to the computer already digitized. A typical data rate is 10 KB/sec. The data are structured in blocks, each 1 KB wide and independent of the others. A processor has to decide whether the block data belong to an event that is part of the background noise and can therefore be discarded, or whether the data should be saved for later evaluation. (orig./WB)
Reference vectors in economic choice
Directory of Open Access Journals (Sweden)
Teycir Abdelghani GOUCHA
2013-07-01
In this paper, the introduction of the notion of a reference vector paves the way for a combination of classical and social approaches in the framework of referential preferences given by matrix groups. It is shown that individual demand arising from rational decision does not depend on that reference.
The Heidelberg POLYP - a flexible and fault-tolerant poly-processor
International Nuclear Information System (INIS)
Maenner, R.; Deluigi, B.
1981-01-01
The Heidelberg poly-processor system POLYP is described. It is intended to be used in nuclear physics for reprocessing of experimental data, in high energy physics as second-stage trigger processor, and generally in other applications requiring high-computing power. The POLYP system consists of any number of I/O-processors, processor modules (eventually of different types), global memory segments, and a host processor. All modules (up to several hundred) are connected by a multiple common-data-bus system; all processors, additionally, by a multiple sync bus system for processor/task-scheduling. All hard- and software is designed to be decentralized and free of bottle-necks. Most hardware-faults like single-bit errors in memory or multi-bit errors during transfers are automatically corrected. Defective modules, buses, etc., can be removed with only a graceful degradation of the system-throughput. (orig.)
International Nuclear Information System (INIS)
Ban, Ya.; Kotov, V.M.; Kharcharufkova, K.
1987-01-01
Algorithms for filtering track signals from bubble and streamer spark chambers read out by a CCD matrix of 256x288 elements are described. A microprogrammed RISC processor is used for preliminary processing and filtering of the data obtained. It makes it possible to recognize and filter track elements in a zone of 0.25 mm² in 0.17-0.20 s, which supports real-time operation.
Gaikwad, Akshay; Rehal, Diksha; Singh, Amandeep; Arvind, Dorai, Kavita
2018-02-01
We present the NMR implementation of a scheme for selective and efficient quantum process tomography without ancilla. We generalize this scheme such that it can be implemented efficiently using only a set of measurements involving product operators. The method allows us to estimate any element of the quantum process matrix to a desired precision, provided a set of quantum states can be prepared efficiently. Our modified technique requires fewer experimental resources as compared to the standard implementation of selective and efficient quantum process tomography, as it exploits the special nature of NMR measurements to allow us to compute specific elements of the process matrix by a restrictive set of subsystem measurements. To demonstrate the efficacy of our scheme, we experimentally tomograph the processes corresponding to "no operation," a controlled-NOT (CNOT), and a controlled-Hadamard gate on a two-qubit NMR quantum information processor, with high fidelities.
A Note on Inclusion Intervals of Matrix Singular Values
Directory of Open Access Journals (Sweden)
Shu-Yu Cui
2012-01-01
We establish an inclusion relation between two known inclusion intervals of matrix singular values in a special case. In addition, based on the use of positive scale vectors, a known inclusion interval of matrix singular values is also improved.
A dedicated line-processor as used at the SHF
International Nuclear Information System (INIS)
Bevan, A.V.; Hatley, R.W.; Price, D.R.; Rankin, P.
1985-01-01
A hardwired trigger processor was used at the SLAC Hybrid Facility to find evidence for charged tracks originating from the fiducial volume of a 40'' rapid-cycling bubble chamber. Straight-line projections of these tracks in the plane perpendicular to the applied magnetic field were searched for using data from three sets of proportional wire chambers (PWC). This information was made directly available to the processor by means of a special digitizing card. The results memory of the processor simulated read-only memory in a 168/E processor and was accessible by it. The 168/E controlled the issuing of a trigger command to the bubble chamber flash tubes. The same design of digitizer card used by the line processor was incorporated into the 168/E, again as read-only memory, which allowed it access to the raw data for continual monitoring of trigger integrity. The design logic of the trigger processor was verified by running real PWC data through a FORTRAN simulation of the hardware. This enabled the debugging to become highly automated, since a step-by-step, computer-controlled comparison of processor registers to simulation predictions could be made.
Petković, Dalibor; Shamshirband, Shahaboddin; Saboohi, Hadi; Ang, Tan Fong; Anuar, Nor Badrul; Rahman, Zulkanain Abdul; Pavlović, Nenad T.
2014-07-01
The quantitative assessment of image quality is an important consideration in any type of imaging system. The modulation transfer function (MTF) is a graphical description of the sharpness and contrast of an imaging system or of its individual components. The MTF is also known as the spatial frequency response. The MTF curve has different meanings according to the corresponding frequency. The MTF of an optical system specifies the contrast transmitted by the system as a function of image size, and is determined by the inherent optical properties of the system. In this study, polynomial and radial basis function (RBF) kernels are applied in Support Vector Regression (SVR) to estimate and predict the MTF value of the actual optical system according to experimental tests. Instead of minimizing the observed training error, SVR_poly and SVR_rbf attempt to minimize the generalization error bound so as to achieve generalized performance. The experimental results show that an improvement in predictive accuracy and capability of generalization can be achieved by the SVR_rbf approach in comparison to the SVR_poly soft computing methodology.
Spectral analysis of the UFBG-based acousto-optical modulator in V-I transmission matrix formalism
Wu, Liang-Ying; Pei, Li; Liu, Chao; Wang, Yi-Qun; Weng, Si-Jun; Wang, Jian-Shuai
2014-11-01
In this study, the V-I transmission matrix formalism (V-I method) is proposed to analyze the spectrum characteristics of the uniform fiber Bragg grating (FBG)-based acousto-optic modulators (UFBG-AOM). The simulation results demonstrate that both the amplitude of the acoustically induced strain and the frequency of the acoustic wave (AW) have an effect on the spectrum. Additionally, the wavelength spacing between the primary reflectivity peak and the secondary reflectivity peak is proportional to the acoustic frequency with the ratio 0.1425 nm/MHz. Meanwhile, we compare the amount of calculation. For the FBG whose period is M, the calculation of the V-I method is 4 × (2M - 1) in addition/subtraction, 8 × (2M - 1) in multiply/division and 2M in exponent arithmetic, which is almost a quarter of the multi-film method and transfer matrix (TM) method. The detailed analysis indicates that, compared with the conventional multi-film method and transfer matrix (TM) method, the V-I method is faster and less complex.
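The operation counts stated in this abstract can be tabulated for a given grating parameter M with a few lines of arithmetic. The sketch below simply evaluates the three expressions quoted above; the function name `vi_operation_counts` and the dictionary keys are assumptions, and the counts for the multi-film and TM methods are not reproduced because the abstract gives only their approximate ratio.

```python
def vi_operation_counts(m):
    """Operation counts of the V-I method for grating parameter M,
    using the expressions quoted in the abstract."""
    return {
        "add_sub": 4 * (2 * m - 1),   # additions and subtractions
        "mul_div": 8 * (2 * m - 1),   # multiplications and divisions
        "exp": 2 * m,                 # exponent evaluations
    }


counts = vi_operation_counts(1000)
print(counts)
```

All three counts grow linearly in M, so the reported advantage over the other methods is a constant factor rather than a change in asymptotic cost.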
Spectral analysis of the UFBG-based acousto-optical modulator in V–I transmission matrix formalism
International Nuclear Information System (INIS)
Wu Liang-Ying; Pei Li; Liu Chao; Wang Yi-Qun; Weng Si-Jun; Wang Jian-Shuai
2014-01-01
In this study, the V–I transmission matrix formalism (V–I method) is proposed to analyze the spectrum characteristics of the uniform fiber Bragg grating (FBG)-based acousto-optic modulators (UFBG-AOM). The simulation results demonstrate that both the amplitude of the acoustically induced strain and the frequency of the acoustic wave (AW) have an effect on the spectrum. Additionally, the wavelength spacing between the primary reflectivity peak and the secondary reflectivity peak is proportional to the acoustic frequency with the ratio 0.1425 nm/MHz. Meanwhile, we compare the amount of calculation. For the FBG whose period is M, the calculation of the V–I method is 4 × (2M – 1) in addition/subtraction, 8 × (2M – 1) in multiply/division and 2M in exponent arithmetic, which is almost a quarter of the multi-film method and transfer matrix (TM) method. The detailed analysis indicates that, compared with the conventional multi-film method and transfer matrix (TM) method, the V–I method is faster and less complex. (general)
Novel Optical Processor for Phased Array Antenna.
1992-10-20
parallel glass slide into the signal beam optical loop. The parallel glass acts like a variable phase shifter to the signal beam, simulating phase drift... A list of possible designs is given as follows: [table not recoverable from extraction; surviving headings are velocity, limit at 100 dB/cm and wavelength, and the first entry is longitudinal TeO2 at about 3 GHz] ...subject to achievable acoustic frequency, the preferred materials are the slow shear wave in TeO2, the fast shear wave in TeO2 or the shear waves in
Advances on geometric flux optical design method
García-Botella, Ángel; Fernández-Balbuena, Antonio Álvarez; Vázquez, Daniel
2017-09-01
Nonimaging optics is focused on the study of methods to design concentrator or illuminator systems. It can be included in the area of photometry and radiometry, and it is governed by the laws of geometrical optics. The field vector method, which starts with the definition of the irradiance vector E, is one of the techniques used in nonimaging optics. Called the "geometrical flux vector", it has provided ideal designs. The main property of this model is its ability to estimate how radiant energy is transferred by the optical system, using the concepts of field line, flux tube and pseudopotential surface, overcoming traditional raytrace methods. Nevertheless, this model has been developed only at an academic level, where the characteristic optical parameters are ideal rather than real and the studied geometries are simple. The main objective of the present paper is the application of the vector field method to the analysis and design of real concentration and illumination systems. We propose the development of a calculation tool for optical simulations by vector field, using algorithms based on Fermat's principle, as an alternative to traditional tools for optical simulations by raytrace, based on the laws of reflection and refraction. This new tool provides, first, traditional simulation results (efficiency, illuminance/irradiance calculations, angular distribution of light) with lower computation time: photometric information requires only a few tens of field lines, in comparison with the millions of rays needed nowadays. In addition, the tool will provide new information such as vector field maps produced by the system, composed of field lines and quasipotential surfaces. We show our first results with the vector field simulation tool.
Moritsuka, Fumi; Wada, Naoya; Sakamoto, Takahide; Kawanishi, Tetsuya; Komai, Yuki; Anzai, Shimako; Izutsu, Masayuki; Kodate, Kashiko
2007-06-11
In optical packet switching (OPS) and optical code division multiple access (OCDMA) systems, label generation and processing are key technologies. Recently, several label processors have been proposed and demonstrated. However, in order to recognize N different labels, N separate devices are required. Here, we propose and experimentally demonstrate a large-scale, multiple optical code (OC)-label generation and processing technology based on a multi-port, fully tunable optical spectrum synthesizer (OSS) and a multi-wavelength electro-optic frequency comb generator. The OSS can generate 80 different OC-labels simultaneously and can perform 80-parallel matched filtering. We also demonstrated its application to OCDMA.
Sojourn time tails in processor-sharing systems
Egorova, R.R.
2009-01-01
The processor-sharing discipline was originally introduced as a modeling abstraction for the design and performance analysis of the processing unit of a computer system. Under the processor-sharing discipline, all active tasks are assumed to be processed simultaneously, receiving an equal share of
An interactive parallel processor for data analysis
International Nuclear Information System (INIS)
Mong, J.; Logan, D.; Maples, C.; Rathbun, W.; Weaver, D.
1984-01-01
A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, the authors have been able to achieve computer amplification linearly proportional to the number of executing processors
Multi-gigabit optical interconnects for next-generation on-board digital equipment
Venet, Norbert; Favaro, Henri; Sotom, Michel; Maignan, Michel; Berthon, Jacques
2017-11-01
Parallel optical interconnects are experimentally assessed as a technology that may offer the high-throughput data communication capabilities required by the next-generation on-board digital processing units. An optical backplane interconnect was breadboarded, on the basis of a digital transparent processor that provides flexible connectivity and variable bandwidth in telecom missions with multi-beam antenna coverage. The unit selected for the demonstration required that more than tens of Gbit/s be supported by the backplane. The demonstration made use of commercial parallel optical link modules at 850 nm wavelength, with 12 channels running at up to 2.5 Gbit/s. A flexible optical fibre circuit was developed so as to route board-to-board connections. It was plugged into the optical transmitter and receiver modules through 12-fibre MPO connectors. BER below 10^-14 and optical link budgets in excess of 12 dB were measured, which would make it possible to integrate broadcasting. Integration of the optical backplane interconnect was successfully demonstrated by validating the overall digital processor functionality.
Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities
Bonifaci, Vincenzo; Brandenburg, Björn; D'Angelo, Gianlorenzo; Marchetti-Spaccamela, Alberto
2016-01-01
Many multiprocessor real-time operating systems offer the possibility to restrict the migrations of any task to a specified subset of processors by setting affinity masks. A notion of "strong arbitrary processor affinity scheduling" (strong APA scheduling) has been proposed; this notion avoids schedulability losses due to overly simple implementations of processor affinities. Due to potential overheads, strong APA has not been implemented so far in a real-time operat...
Hardware trigger processor for the MDT system
The ATLAS collaboration; Hazen, Eric; Butler, John; Black, Kevin; Gastler, Daniel Edward; Ntekas, Konstantinos; Taffard, Anyes; Martinez Outschoorn, Verena; Ishino, Masaya; Okumura, Yasuyuki
2017-01-01
We are developing a low-latency hardware trigger processor for the Monitored Drift Tube system in the Muon spectrometer. The processor will fit candidate Muon tracks in the drift tubes in real time, improving significantly the momentum resolution provided by the dedicated trigger chambers. We present a novel pure-FPGA implementation of a Legendre transform segment finder, an associative-memory alternative implementation, an ARM (Zynq) processor-based track fitter, and compact ATCA carrier board architecture. The ATCA architecture is designed to allow a modular, staged approach to deployment of the system and exploration of alternative technologies.
Makita, Shuichi; Kurokawa, Kazuhiro; Hong, Young-Joo; Miura, Masahiro; Yasuno, Yoshiaki
2016-04-01
This paper describes a complex correlation mapping algorithm for optical coherence angiography (cmOCA). The proposed algorithm avoids the signal-to-noise ratio dependence and exhibits low noise in vasculature imaging. The complex correlation coefficient of the signals, rather than that of the measured data, is estimated, and two-step averaging is introduced. Algorithms for motion artifact removal based on non-perfusing tissue detection using correlation are developed. The algorithms are implemented with Jones-matrix OCT. Simultaneous imaging of pigmented tissue and vasculature is also achieved using degree of polarization uniformity imaging with cmOCA. An application of cmOCA to in vivo posterior human eyes is presented to demonstrate that high-contrast images of patients' eyes can be obtained.
Makita, Shuichi; Kurokawa, Kazuhiro; Hong, Young-Joo; Miura, Masahiro; Yasuno, Yoshiaki
2016-01-01
This paper describes a complex correlation mapping algorithm for optical coherence angiography (cmOCA). The proposed algorithm avoids the signal-to-noise ratio dependence and exhibits low noise in vasculature imaging. The complex correlation coefficient of the signals, rather than that of the measured data, is estimated, and two-step averaging is introduced. Algorithms for motion artifact removal based on non-perfusing tissue detection using correlation are developed. The algorithms are implemented with Jones-matrix OCT. Simultaneous imaging of pigmented tissue and vasculature is also achieved using degree of polarization uniformity imaging with cmOCA. An application of cmOCA to in vivo posterior human eyes is presented to demonstrate that high-contrast images of patients' eyes can be obtained. PMID:27446673
Piqueras, M. A.; Mengual, T.; Navasquillo, O.; Sotom, M.; Caille, G.
2017-11-01
The evolution of broadband communication satellites shows a clear trend towards beam-forming and beam-switching systems with efficient multiple access schemes and wide bandwidths; for these to be economically viable, the communication price must be as low as possible. In such applications, the most demanding antenna concept is the Direct Radiating Array (DRA), since its use allows a flexible power allocation between beams and may tolerate failures in the active chains with low impact on the antenna radiating pattern. Forming multiple antenna beams, as for 'multimedia via satellite' missions, can be done mainly in three ways: in the microwave domain, or by digital or optical processors. Microwave beam-formers are strongly constrained by the mass and volume of microwave devices and waveguides; the bandwidth of digital processors is limited due to power consumption and complexity constraints; and microwave photonics is an enabling technology that can improve the performance of the antenna feeding network, overcoming the limitations of the traditional technology in the more demanding scenarios, and may overcome the conventional RF beam-former issues by accurately generating the very numerous time delays or phase shifts required in a DRA with a large number of beams and radiating elements. Integrated optics technology can play a crucial role as an alternative technology for implementing beam-forming structures for satellite applications thanks to its well-known advantages, such as low volume and weight, huge electrical bandwidth, electromagnetic interference immunity, low consumption, remote delivery capability with low attenuation (by carrying all microwave signals over optical fibres), and the robustness and precision that integrated optics exhibits. Under the ESA contract 4000105095/12/NL/RA the consortium formed by DAS Photonics, Thales Alenia Space and the Nanophotonic Technology Center of Valencia is developing a three-dimensional Optical Beamforming
Linear optical response of carbon nanotubes under axial magnetic field
Moradian, Rostam; Chegel, Raad; Behzad, Somayeh
2010-04-01
We considered single-walled carbon nanotubes (SWCNTs) as real three-dimensional (3D) systems in cylindrical coordinates. The optical matrix elements and linear susceptibility, χ(ω), are calculated in the tight-binding approximation in terms of the one-dimensional wave vector, kz, and the subband index, l. The optical frequency dependence of the linear susceptibility in an external axial magnetic field is investigated. We found that the axial magnetic field has two effects on the imaginary part of the linear susceptibility spectrum, in agreement with experimental results. The first effect is broadening and the second is splitting. We also found that for all metallic zigzag and armchair SWCNTs, the axial magnetic field leads to the creation of a peak with energy less than 1.5 eV, contrary to what is observed in the absence of a magnetic field.
Three Interpretations of the Matrix Equation Ax = b
Larson, Christine; Zandieh, Michelle
2013-01-01
Many of the central ideas in an introductory undergraduate linear algebra course are closely tied to a set of interpretations of the matrix equation Ax = b (A is a matrix, x and b are vectors): linear combination interpretations, systems interpretations, and transformation interpretations. We consider graphic and symbolic representations for each,…
Optical links in handheld multimedia devices
van Geffen, S.; Duis, J.; Miller, R.
2008-04-01
Ever emerging applications in handheld multimedia devices such as mobile phones, laptop computers, portable video games and digital cameras requiring increased screen resolutions are driving higher aggregate bitrates between host processor and display(s) enabling services such as mobile video conferencing, video on demand and TV broadcasting. Larger displays and smaller phones require complex mechanical 3D hinge configurations striving to combine maximum functionality with compact building volumes. Conventional galvanic interconnections such as Micro-Coax and FPC carrying parallel digital data between host processor and display module may produce Electromagnetic Interference (EMI) and bandwidth limitations caused by small cable size and tight cable bends. To reduce the number of signals through a hinge, the mobile phone industry, organized in the MIPI (Mobile Industry Processor Interface) alliance, is currently defining an electrical interface transmitting serialized digital data at speeds >1Gbps. This interface allows for electrical or optical interconnects. Above 1Gbps optical links may offer a cost effective alternative because of their flexibility, increased bandwidth and immunity to EMI. This paper describes the development of optical links for handheld communication devices. A cable assembly based on a special Plastic Optical Fiber (POF) selected for its mechanical durability is terminated with a small form factor molded lens assembly which interfaces between an 850nm VCSEL transmitter and a receiving device on the printed circuit board of the display module. A statistical approach based on a Lean Design For Six Sigma (LDFSS) roadmap for new product development tries to find an optimum link definition which will be robust and low cost meeting the power consumption requirements appropriate for battery operated systems.
Research and simulation of the decoupling transformation in AC motor vector control
He, Jiaojiao; Zhao, Zhongjie; Liu, Ken; Zhang, Yongping; Yao, Tuozhong
2018-04-01
The permanent magnet synchronous motor (PMSM) is a nonlinear, strongly coupled, multivariable complex object, and a decoupling transformation can solve its coupling problem. This paper gives a mathematical model of the PMSM and introduces the coordinate transformations used in PMSM vector control. Using matrix theory, the inductance matrix is diagonalized by a modal matrix through a change of coordinates, which makes the coupled quantities independent and separates the control of the excitation current from that of the torque current. The coordinate transformation matrix is derived, providing an approach to solving the coupling problem of AC motors. Finally, in the Matlab/Simulink environment, a simulation model of PMSM vector control is built by combining the PMSM model itself with the coordinate conversion modules; each part of the model is introduced and the simulation results are analyzed.
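As an illustration of the kind of coordinate transformation this abstract refers to, the following minimal sketch applies the textbook amplitude-invariant Clarke transform followed by the Park rotation. It assumes balanced three-phase currents and a known electrical rotor angle; the function names are hypothetical, and the matrices are the standard ones rather than necessarily those derived in the paper.

```python
import math


def clarke(ia, ib, ic):
    """Amplitude-invariant Clarke transform: (a, b, c) -> (alpha, beta)."""
    i_alpha = (2.0 / 3.0) * (ia - 0.5 * ib - 0.5 * ic)
    i_beta = (ib - ic) / math.sqrt(3.0)
    return i_alpha, i_beta


def park(i_alpha, i_beta, theta):
    """Park rotation by the electrical angle theta: (alpha, beta) -> (d, q)."""
    c, s = math.cos(theta), math.sin(theta)
    i_d = c * i_alpha + s * i_beta
    i_q = -s * i_alpha + c * i_beta
    return i_d, i_q


def abc_to_dq(ia, ib, ic, theta):
    """Map three phase currents to the rotating d-q frame."""
    i_alpha, i_beta = clarke(ia, ib, ic)
    return park(i_alpha, i_beta, theta)


if __name__ == "__main__":
    # Balanced currents of amplitude 10 A aligned with the rotor angle (assumed values).
    amplitude, theta = 10.0, 0.7
    ia = amplitude * math.cos(theta)
    ib = amplitude * math.cos(theta - 2.0 * math.pi / 3.0)
    ic = amplitude * math.cos(theta + 2.0 * math.pi / 3.0)
    print(abc_to_dq(ia, ib, ic, theta))
```

In the rotating frame the sinusoidal phase currents become two constant components, which is what allows the excitation (d) and torque (q) currents to be controlled separately.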
Towards a Process Algebra for Shared Processors
DEFF Research Database (Denmark)
Buchholtz, Mikael; Andersen, Jacob; Løvengreen, Hans Henrik
2002-01-01
We present initial work on a timed process algebra that models sharing of processor resources allowing preemption at arbitrary points in time. This enables us to model both the functional and the timely behaviour of concurrent processes executed on a single processor. We give a refinement relation...
High-speed packet filtering utilizing stream processors
Hummel, Richard J.; Fulp, Errin W.
2009-04-01
Parallel firewalls offer a scalable architecture for the next generation of high-speed networks. While these parallel systems can be implemented using multiple firewalls, the latest generation of stream processors can provide similar benefits with a significantly reduced latency due to locality. This paper describes how the Cell Broadband Engine (CBE), a popular stream processor, can be used as a high-speed packet filter. Results show the CBE can potentially process packets arriving at a rate of 1 Gbps with a latency of less than 82 μs. Performance depends on how well the packet filtering process is translated to the unique stream processor architecture. For example, the method used for transmitting data and control messages among the pseudo-independent processor cores has a significant impact on performance. Experimental results will also show the current limitations of a CBE operating system when used to process packets. Possible solutions to these issues will be discussed.
"Real-Time Optical Laboratory Linear Algebra Solution Of Partial Differential Equations"
Casasent, David; Jackson, James
1986-03-01
A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) employing space and frequency-multiplexing, new partitioning and data flow, and achieving high accuracy performance with a non base-2 number system is described. Laboratory data on the performance of this system and the solution of parabolic Partial Differential Equations (PDEs) is provided. A multi-processor OLAP system is also described for the first time. Its use in the solution of multiple banded matrices that frequently arise is then discussed. The utility and flexibility of this processor compared to digital systolic architectures should be apparent.
Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.
1986-06-01
In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response are facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, with a combined capacity of up to 144 MBytes/sec. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. To illustrate the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing operations are presented.
Fast processor for dilepton triggers
International Nuclear Information System (INIS)
Katsanevas, S.; Kostarakis, P.; Baltrusaitis, R.
1983-01-01
We describe a fast trigger processor, developed for and used in Fermilab experiment E-537, for selecting high-mass dimuon events produced by negative pions and anti-protons. The processor finds candidate tracks by matching hit information received from drift chambers and scintillation counters, and determines their momenta. Invariant masses are calculated for all possible pairs of tracks and an event is accepted if any invariant mass is greater than some preselectable minimum mass. The whole process, accomplished within 5 to 10 microseconds, achieves up to a ten-fold reduction in trigger rate
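The selection logic described in this abstract (compute the invariant mass of every pair of candidate tracks and accept the event if any pair exceeds a preset minimum) can be sketched in software as follows. This is only an illustration of the arithmetic, not the hardware processor itself; the function names, the representation of a track as a momentum 3-vector in GeV/c, and the threshold used in the demo are assumptions.

```python
import math
from itertools import combinations

MUON_MASS_GEV = 0.10566  # muon rest mass in GeV/c^2


def energy(p, mass=MUON_MASS_GEV):
    """Relativistic energy of a track with momentum 3-vector p (natural units)."""
    px, py, pz = p
    return math.sqrt(px * px + py * py + pz * pz + mass * mass)


def invariant_mass(p1, p2):
    """Invariant mass of a two-muon system: m^2 = (E1 + E2)^2 - |p1 + p2|^2."""
    e_sum = energy(p1) + energy(p2)
    px, py, pz = (a + b for a, b in zip(p1, p2))
    m_squared = e_sum * e_sum - (px * px + py * py + pz * pz)
    return math.sqrt(max(m_squared, 0.0))  # clamp tiny negative rounding errors


def accept_event(tracks, min_mass_gev):
    """Accept if any pair of candidate tracks exceeds the minimum mass."""
    return any(invariant_mass(a, b) > min_mass_gev
               for a, b in combinations(tracks, 2))


if __name__ == "__main__":
    # Hypothetical candidate tracks (momenta in GeV/c) and threshold.
    tracks = [(0.0, 0.0, 2.0), (0.0, 0.0, -2.0), (0.1, 0.0, 2.0)]
    print(accept_event(tracks, min_mass_gev=4.0))
```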
Algorithms for computational fluid dynamics on parallel processors
International Nuclear Information System (INIS)
Van de Velde, E.F.
1986-01-01
A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries
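For readers unfamiliar with the Lax-Wendroff scheme mentioned above, here is a minimal serial sketch for the linear advection equation u_t + a u_x = 0 on a periodic grid. It is the textbook form of the scheme, not the parallel implementation studied in the report; the function names and the test profile are illustrative assumptions.

```python
def lax_wendroff_step(u, courant):
    """Advance u by one time step; courant = a * dt / dx, periodic boundaries."""
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[i - 1]          # index -1 wraps around for i == 0
        right = u[(i + 1) % n]
        new[i] = (u[i]
                  - 0.5 * courant * (right - left)
                  + 0.5 * courant * courant * (right - 2.0 * u[i] + left))
    return new


def advect(u, courant, steps):
    """Apply the Lax-Wendroff update repeatedly."""
    for _ in range(steps):
        u = lax_wendroff_step(u, courant)
    return u


if __name__ == "__main__":
    profile = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
    print(advect(profile, courant=0.5, steps=4))
```

Each grid point depends only on its two neighbours, which is why schemes of this kind split naturally across processors by domain decomposition with a one-cell halo exchange.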
How random is a random vector?
International Nuclear Information System (INIS)
Eliazar, Iddo
2015-01-01
Over 80 years ago Samuel Wilks proposed that the “generalized variance” of a random vector is the determinant of its covariance matrix. To date, the notion and use of the generalized variance is confined only to very specific niches in statistics. In this paper we establish that the “Wilks standard deviation” –the square root of the generalized variance–is indeed the standard deviation of a random vector. We further establish that the “uncorrelation index” –a derivative of the Wilks standard deviation–is a measure of the overall correlation between the components of a random vector. Both the Wilks standard deviation and the uncorrelation index are, respectively, special cases of two general notions that we introduce: “randomness measures” and “independence indices” of random vectors. In turn, these general notions give rise to “randomness diagrams”—tangible planar visualizations that answer the question: How random is a random vector? The notion of “independence indices” yields a novel measure of correlation for Lévy laws. In general, the concepts and results presented in this paper are applicable to any field of science and engineering with random-vectors empirical data.
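The two quantities defined in this abstract, the generalized variance (determinant of the covariance matrix) and the Wilks standard deviation (its square root), can be computed from samples as in the sketch below. The use of the unbiased sample covariance (division by n - 1) and all function names are assumptions made for illustration; the uncorrelation index is not reproduced because its formula is not given here.

```python
import math


def covariance_matrix(samples):
    """Unbiased sample covariance of a list of equal-length vectors."""
    n, dim = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(dim)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
             for j in range(dim)]
            for i in range(dim)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def generalized_variance(samples):
    return determinant(covariance_matrix(samples))


def wilks_standard_deviation(samples):
    return math.sqrt(max(generalized_variance(samples), 0.0))


if __name__ == "__main__":
    data = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    print(generalized_variance(data), wilks_standard_deviation(data))
```

Perfectly correlated components give a singular covariance matrix, so the generalized variance drops to zero even when each component varies on its own.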
How random is a random vector?
Eliazar, Iddo
2015-12-01
Over 80 years ago Samuel Wilks proposed that the "generalized variance" of a random vector is the determinant of its covariance matrix. To date, the notion and use of the generalized variance is confined only to very specific niches in statistics. In this paper we establish that the "Wilks standard deviation" -the square root of the generalized variance-is indeed the standard deviation of a random vector. We further establish that the "uncorrelation index" -a derivative of the Wilks standard deviation-is a measure of the overall correlation between the components of a random vector. Both the Wilks standard deviation and the uncorrelation index are, respectively, special cases of two general notions that we introduce: "randomness measures" and "independence indices" of random vectors. In turn, these general notions give rise to "randomness diagrams"-tangible planar visualizations that answer the question: How random is a random vector? The notion of "independence indices" yields a novel measure of correlation for Lévy laws. In general, the concepts and results presented in this paper are applicable to any field of science and engineering with random-vectors empirical data.
Design of RISC Processor Using VHDL and Cadence
Moslehpour, Saeid; Puliroju, Chandrasekhar; Abu-Aisheh, Akram
The project deals with the development of a basic RISC processor. The processor is designed with a basic architecture consisting of internal modules such as a clock generator, memory, program counter, instruction register, accumulator, arithmetic and logic unit, and decoder. This processor is mainly used for simple general-purpose tasks such as arithmetic operations, and it can be further developed into a general-purpose processor by increasing the size of the instruction register. The processor is designed in VHDL using Xilinx 8.1i. The present project also serves as an application of the knowledge gained from past studies of the PSPICE program. The study will show how PSPICE can be used to simplify massive complex circuits designed in VHDL Synthesis. The purpose of the project is to explore the designed RISC model piece by piece, examine and understand the Input/Output pins, and show how the VHDL synthesis code can be converted to a simplified PSPICE model. The project will also serve as a collection of various research materials about the pieces of the circuit.
Introduction to Matrix Algebra, Student's Text, Unit 23.
Allen, Frank B.; And Others
Unit 23 in the SMSG secondary school mathematics series is a student text covering the following topics in matrix algebra: matrix operations, the algebra of 2 X 2 matrices, matrices and linear systems, representation of column matrices as geometric vectors, and transformations of the plane. Listed in the appendix are four research exercises in…
Regularization in Matrix Relevance Learning
Schneider, Petra; Bunte, Kerstin; Stiekema, Han; Hammer, Barbara; Villmann, Thomas; Biehl, Michael
In this paper, we present a regularization technique to extend recently proposed matrix learning schemes in learning vector quantization (LVQ). These learning algorithms extend the concept of adaptive distance measures in LVQ to the use of relevance matrices. In general, metric learning can
Real time processor for array speckle interferometry
Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos
1989-02-01
The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
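The processing chain named in this abstract (flat-field each frame, take a two-dimensional FFT, average the power spectrum over frames) can be sketched in plain Python as below. This is a software illustration only: flat-fielding is assumed to be a pixel-wise division by a flat response, frame sizes are assumed to be powers of two (64 x 64 in the abstract), and all function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2(frame):
    """2-D FFT: transform every row, then every column."""
    rows = [fft(row) for row in frame]
    cols = [fft(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def mean_power_spectrum(frames, flat):
    """Flat-field each frame, then average |FFT|^2 over all frames."""
    size = len(flat)
    acc = [[0.0] * size for _ in range(size)]
    for frame in frames:
        corrected = [[frame[i][j] / flat[i][j] for j in range(size)]
                     for i in range(size)]
        spectrum = fft2(corrected)
        for i in range(size):
            for j in range(size):
                acc[i][j] += abs(spectrum[i][j]) ** 2
    return [[value / len(frames) for value in row] for row in acc]


if __name__ == "__main__":
    size = 64
    flat = [[1.0] * size for _ in range(size)]
    frames = [[[float((i + j + k) % 7) for j in range(size)] for i in range(size)]
              for k in range(3)]
    spectrum = mean_power_spectrum(frames, flat)
    print(spectrum[0][0])
```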
vector bilinear autoregressive time series model and its superiority
African Journals Online (AJOL)
KEYWORDS: Linear time series, Autoregressive process, Autocorrelation function, Partial autocorrelation function,. Vector time .... important result on matrix algebra with respect to the spectral ..... application to covariance analysis of super-.
On the resultant property of the Fisher information matrix of a vector ARMA process
Klein, A.; Mélard, G.; Spreij, P.
2004-01-01
A matrix is called a multiple resultant matrix associated to two matrix polynomials when it becomes singular if and only if the two matrix polynomials have at least one common eigenvalue. In this paper a new multiple resultant matrix is introduced. It concerns the Fisher information matrix (FIM) of
Magnetic and magneto-optical properties of CdS:Mn quantum dots in PVA matrix
International Nuclear Information System (INIS)
Fediv, V I; Savchuk, A I; Frasunyak, V M; Makoviy, V V; Savchuk, O A
2010-01-01
We have studied the magnetic and magneto-optical properties of CdS:Mn quantum dots in a polyvinyl alcohol matrix synthesized by the co-precipitation method. The size of the quantum dots was estimated by means of absorption spectroscopy. The results of measurements of magnetic susceptibility as a function of temperature and of the spectral dependence of the Faraday rotation of CdS:Mn quantum dot / polyvinyl alcohol composites are presented. In this work, magnetic susceptibility was investigated by Faraday's method at temperatures of (78-300) K in magnetic fields of (0.05-0.8) T. The inverse magnetic susceptibility as a function of temperature follows a Curie-Weiss law. Formation of ferromagnetic coupling between magnetic ions is supposed. Magneto-optical Faraday rotation has been investigated in the wavelength region (400-700) nm at a temperature of 300 K in a magnetic field up to 5 T. The sign of the Verdet constant is found to be negative.
Scalar-vector unitarity mixing in ξ gauge
International Nuclear Information System (INIS)
Kaloshin, A.E.; Radzhabov, A.E.
2003-01-01
The effect of unitary mixing of scalar and vector fields in a general ξ gauge is studied. This effect takes place for nonconserved vector currents, and the ξ gauge generates some additional problems with the unphysical scalar field. Solutions of the Dyson-Schwinger equations are obtained and the renormalization of the full propagators is performed. The key feature of the renormalization is the use of the Ward identity, which relates different Green functions. It is found that using the Ward identity leads to the disappearance of the ξ dependence in the renormalized matrix element.
Stoykov, S.; Atanassov, E.; Margenov, S.
2016-10-01
Many scientific applications involve sparse or dense matrix operations, such as solving linear systems, matrix-matrix products, eigensolvers, etc. In structural nonlinear dynamics, the computation of periodic responses and the determination of the stability of the solution are of primary interest. The shooting method is widely used for obtaining periodic responses of nonlinear systems. The method involves simultaneous operations with sparse and dense matrices. One of the computationally expensive operations in the method is the multiplication of sparse by dense matrices. In the current work, a new algorithm for sparse matrix by dense matrix products is presented. The algorithm takes into account the structure of the sparse matrix, which is obtained by space discretization of the nonlinear Mindlin's plate equation of motion by the finite element method. The algorithm is developed to use the vector engine of Intel Xeon Phi coprocessors. It is compared with the standard sparse matrix by dense matrix algorithm and the one provided by Intel MKL, and it is shown that better algorithms can be developed by considering the properties of the sparse matrix.
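For context, the "standard" sparse-by-dense product that such work is usually compared against can be sketched with the compressed sparse row (CSR) format as follows. This is the generic baseline only, written in plain Python for clarity; it does not reproduce the structure-aware algorithm of the paper or any Intel MKL routine, and the function names are illustrative.

```python
def csr_from_dense(dense):
    """Build CSR arrays (values, column indices, row pointers) from a dense matrix."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_times_dense(values, col_idx, row_ptr, dense, n_cols):
    """Compute C = A @ B with A in CSR form and B a dense list of rows."""
    n_rows = len(row_ptr) - 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        out_row = out[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a = values[k]
            b_row = dense[col_idx[k]]
            # The innermost loop runs over contiguous entries of one dense row,
            # which is the part a vector unit can process in wide chunks.
            for j in range(n_cols):
                out_row[j] += a * b_row[j]
    return out


if __name__ == "__main__":
    a = [[1, 0, 2], [0, 0, 0], [0, 3, 0]]
    b = [[1, 2], [3, 4], [5, 6]]
    print(csr_times_dense(*csr_from_dense(a), b, n_cols=2))
```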
International Nuclear Information System (INIS)
Palaniappan, J; Wang, H; Ogin, S L; Thorne, A; Reed, G T; Tjin, S C
2005-01-01
A comparison is made between conventional (i.e. uniform) and chirped optical fibre Bragg gratings (FBGs) for the detection of matrix cracking damage in composite materials. Matrix cracking damage is generally the first type of visible damage to develop under load in the off-axis plies of laminated composites and is generally the precursor of more serious damage mechanisms, particularly delamination. The detection of this type of damage is thus important, particularly in aerospace applications. Using a uniform FBG, characteristic changes develop in the reflected spectrum which can be used to identify crack development in the composite. The additional advantage of using a chirped grating is that the crack position can also be located
A Vector Representation for Thermodynamic Relationships
Pogliani, Lionello
2006-01-01
The existing vector formalism method for thermodynamic relationship maintains tractability and uses accessible mathematics, which can be seen as a diverting and entertaining step into the mathematical formalism of thermodynamics and as an elementary application of matrix algebra. The method is based on ideas and operations apt to improve the…
MPC Related Computational Capabilities of ARMv7A Processors
DEFF Research Database (Denmark)
Frison, Gianluca; Jørgensen, John Bagterp
2015-01-01
In recent years, the mass market of mobile devices has pushed the demand for increasingly fast but cheap processors. ARM, the world leader in this sector, has developed the Cortex-A series of processors with focus on computationally intensive applications. If properly programmed, these processors are powerful enough to solve the complex optimization problems arising in MPC in real-time, while keeping the traditional low-cost and low-power consumption. This makes these processors ideal candidates for use in embedded MPC. In this paper, we investigate the floating-point capabilities of Cortex A7, A9 and A15 and show how to exploit the unique features of each processor to obtain the best performance, in the context of a novel implementation method for the linear-algebra routines used in MPC solvers. This method adapts high-performance computing techniques to the needs of embedded MPC. In particular...
Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.
2016-12-01
The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled with a Hg transport module to investigate mercury pollution. In this study, we present our work on porting the GNAQPMS model to the Intel Xeon Phi processor Knights Landing (KNL) to accelerate the model. KNL is the second-generation product adopting the Many Integrated Core (MIC) architecture. Compared with the first-generation Knights Corner (KNC), KNL has more new hardware features, and it can be used as a standalone processor as well as a coprocessor alongside another CPU. Using the VTune tool, the high-overhead modules in the GNAQPMS model were identified, including the CBMZ gas chemistry, the advection and convection module, and the wet deposition module. These high-overhead modules were accelerated by optimizing the code and using new techniques of KNL. The following optimization measures were taken: 1) changing the pure MPI parallel mode to a hybrid parallel mode with MPI and OpenMP; 2) vectorizing the code to use the 512-bit wide vector computation unit; 3) reducing unnecessary memory access and calculation; 4) reducing Thread Local Storage (TLS) for common variables with each OpenMP thread in CBMZ; 5) changing the way of global communication from file writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased on both the CPU and KNL platforms. The single-node test showed that the optimized version has a 2.6x speedup on the two-socket CPU platform and a 3.3x speedup on the one-socket KNL platform compared with the baseline version of the code, which means that KNL has a 1.29x speedup when compared with the two-socket CPU platform.
Energy Technology Data Exchange (ETDEWEB)
Wang, Wenbo [Imaging Unit, Integrative Oncology Department, BC Cancer Agency Research Center, 675 West 10th Avenue, Vancouver, British Columbia V5Z 1L3 (Canada); Department of Dermatology and Skin Science, University of British Columbia, 835 West 10th Avenue, Vancouver, British Columbia V5Z 4E8 (Canada); Department of Biomedical Engineering, University of British Columbia, KAIS 5500, 2332 Main Mall, Vancouver, British Columbia V6T 1Z4 (Canada); Wu, Zhenguo; Zhao, Jianhua; Lui, Harvey; Zeng, Haishan, E-mail: hzeng@bccrc.ca [Imaging Unit, Integrative Oncology Department, BC Cancer Agency Research Center, 675 West 10th Avenue, Vancouver, British Columbia V5Z 1L3 (Canada); Department of Dermatology and Skin Science, University of British Columbia, 835 West 10th Avenue, Vancouver, British Columbia V5Z 4E8 (Canada)
2016-06-15
Scanning speed and coupling efficiency of excitation light to optic fibers are two major technical challenges that limit the potential of fluorescence excitation-emission matrix (EEM) spectrometers for on-line applications and in vivo studies. In this paper, a novel EEM system, utilizing a supercontinuum white light source and acousto-optic tunable filters (AOTFs), was introduced and evaluated. The supercontinuum white light, generated by pumping a nonlinear photonic crystal fiber with an 800 nm femtosecond laser, was efficiently coupled into a bifurcated optic fiber bundle. High-speed EEM spectral scanning was achieved using AOTFs both for selecting the excitation wavelength and for scanning emission spectra. Using calibration lamps (neon and mercury argon), wavelength deviations were determined to vary from 0.18 nm to −0.70 nm within the spectral range of 500–850 nm. The spectral bandwidth of the filtered excitation light broadened twofold compared to that measured with monochromatic light between 650 nm and 750 nm. The EEM spectra for methanol solutions of laser dyes were successfully acquired with this rapid fluorometer using an integration time of 5 s.
International Nuclear Information System (INIS)
Wang, Wenbo; Wu, Zhenguo; Zhao, Jianhua; Lui, Harvey; Zeng, Haishan
2016-01-01
Scanning speed and coupling efficiency of excitation light to optic fibers are two major technical challenges that limit the potential of fluorescence excitation-emission matrix (EEM) spectrometers for on-line applications and in vivo studies. In this paper, a novel EEM system, utilizing a supercontinuum white light source and acousto-optic tunable filters (AOTFs), was introduced and evaluated. The supercontinuum white light, generated by pumping a nonlinear photonic crystal fiber with an 800 nm femtosecond laser, was efficiently coupled into a bifurcated optic fiber bundle. High-speed EEM spectral scanning was achieved using AOTFs both for selecting the excitation wavelength and for scanning emission spectra. Using calibration lamps (neon and mercury argon), wavelength deviations were determined to vary from 0.18 nm to −0.70 nm within the spectral range of 500–850 nm. The spectral bandwidth of the filtered excitation light broadened twofold compared to that measured with monochromatic light between 650 nm and 750 nm. The EEM spectra for methanol solutions of laser dyes were successfully acquired with this rapid fluorometer using an integration time of 5 s.
Reflection and transmission of normally incident full-vector X waves on planar interfaces
Salem, Mohamed
2011-12-23
The reflection and transmission of full-vector X waves normally incident on planar half-spaces and slabs are studied. For this purpose, X waves are expanded in terms of weighted vector Bessel beams; this new decomposition and reconstruction method offers a more lucid and intuitive interpretation of the physical phenomena observed upon the reflection or transmission of X waves when compared to the conventional plane-wave decomposition technique. Using the Bessel beam expansion approach, we have characterized changes in the field shape and the intensity distribution of the transmitted and reflected full-vector X waves. We have also identified a novel longitudinal shift, which is observed when a full-vector X wave is transmitted through a dielectric slab under frustrated total reflection condition. The results of our studies presented here are valuable in understanding the behavior of full-vector X waves when they are utilized in practical applications in electromagnetics, optics, and photonics, such as trap and tweezer setups, optical lithography, and immaterial probing. © 2011 Optical Society of America.
Logistic Fuel Processor Development
National Research Council Canada - National Science Library
Salavani, Reza
2004-01-01
The Air Base Technologies Division of the Air Force Research Laboratory has developed a logistic fuel processor that removes the sulfur content of the fuel and in the process converts logistic fuel...
Balagan, Semyon Anatolyevich; Nazarov, Vladimir U; Shevlyagin, Alexander Vladimirovich; Goroshko, Dmitrii L; Galkin, N G
2018-05-03
We develop an approach and present results of combined molecular dynamics and density functional theory calculations of the structural and optical properties of nanometer-sized crystallites embedded in a bulk crystalline matrix. The method is designed and implemented for both compatible and incompatible lattices of the nanocrystallite (NC) and the host matrix, where determining the optimal NC orientation relative to the matrix constitutes a challenging problem. We suggest and substantiate an expression for the cost function of the search algorithm, which is the energy per supercell generalized for a varying number of atoms in the latter. The epitaxial relationships at the Si/NC interfaces and the optical properties are obtained and found to be in reasonable agreement with experimental data. Dielectric functions show significant sensitivity to the NC's orientation relative to the matrix at energies below 0.5 eV. © 2018 IOP Publishing Ltd.
Balagan, Semyon A.; Nazarov, Vladimir U.; Shevlyagin, Alexander V.; Goroshko, Dmitrii L.; Galkin, Nikolay G.
2018-06-01
We develop an approach and present results of combined molecular dynamics and density functional theory calculations of the structural and optical properties of nanometer-sized crystallites embedded in a bulk crystalline matrix. The method is designed and implemented for both compatible and incompatible lattices of the nanocrystallite (NC) and the host matrix, where determining the optimal NC orientation relative to the matrix constitutes a challenging problem. We suggest and substantiate an expression for the cost function of the search algorithm, which is the energy per supercell generalized for a varying number of atoms in the latter. The epitaxial relationships at the Si/NC interfaces and the optical properties are obtained and found to be in reasonable agreement with experimental data. Dielectric functions show significant sensitivity to the NC’s orientation relative to the matrix at energies below 0.5 eV.
A matrix model from string field theory
Directory of Open Access Journals (Sweden)
Syoji Zeze
2016-09-01
We demonstrate that a Hermitian matrix model can be derived from level-truncated open string field theory with Chan-Paton factors. The Hermitian matrix is coupled with a scalar and U(N) vectors which are responsible for the D-brane at the tachyon vacuum. The effective potential for the scalar is evaluated for both finite and large N. An increase in the potential height is observed in both cases. The large-N matrix integral is identified with a system of N ZZ branes and a ghost FZZT brane.