WorldWideScience

Sample records for large intel paragon

  1. Applications Performance on NAS Intel Paragon XP/S-15

    Science.gov (United States)

    Saini, Subhash; Simon, Horst D.; Copper, D. M. (Technical Monitor)

    1994-01-01

    The Numerical Aerodynamic Simulation (NAS) Systems Division received an Intel Touchstone Sigma prototype model Paragon XP/S-15 in February 1993. The i860 XP microprocessor, with an integrated floating point unit and operating in dual-instruction mode, gives a peak performance of 75 million floating point operations per second (MFLOPS) for 64-bit floating point arithmetic. It is used in the Paragon XP/S-15 installed at NAS, NASA Ames Research Center. The NAS Paragon has 208 nodes and a peak performance of 15.6 GFLOPS. Here we report on early experience using the Paragon XP/S-15. We have tested its performance using both kernels and applications of interest to NAS. We have measured the performance of BLAS 1, 2 and 3, both assembly-coded and Fortran-coded, on the NAS Paragon XP/S-15. Furthermore, we have investigated the performance of a single-node one-dimensional FFT, a distributed two-dimensional FFT and a distributed three-dimensional FFT. Finally, we measured the performance of the NAS Parallel Benchmarks (NPB) on the Paragon and compared it with the performance obtained on other highly parallel machines, such as the CM-5, CRAY T3D, IBM SP1, etc. In particular, we investigated the following issues, which can strongly affect the performance of the Paragon: a. Impact of the operating system: Intel currently uses as a default the OSF/1 AD operating system from the Open Software Foundation (OSF). Paging of the OSF server, at 22 MB, to make more memory available for the application degrades performance. We found that when the limit of 26 MB per node, out of the 32 MB available, is reached, the application is paged out of main memory using virtual memory. When the application starts paging, performance is considerably reduced. We found that dynamic memory allocation can help application performance under certain circumstances. b. Impact of the data cache on the i860 XP: We measured the performance of the BLAS, both assembly coded and Fortran
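
    MFLOPS figures like those quoted above come from timing a kernel and dividing the floating point operation count by the elapsed time. As a minimal illustrative sketch (not code from the paper), the following C program times a Fortran-style BLAS 1 kernel (daxpy) and reports the rate; the problem size and repetition count are arbitrary choices:

```c
/* Minimal sketch (not from the paper): timing a BLAS 1 kernel (daxpy)
 * and reporting MFLOPS, the metric used in the abstract above. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static void daxpy(int n, double a, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];                  /* 2 flops per element */
}

int main(void)
{
    const int n = 1 << 20, reps = 100;
    double *x = malloc(n * sizeof *x), *y = malloc(n * sizeof *y);
    for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        daxpy(n, 0.5, x, y);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* daxpy performs 2*n floating point operations per call */
    printf("%.1f MFLOPS\n", 2.0 * n * reps / secs / 1e6);
    free(x); free(y);
    return 0;
}
```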

  2. Balancing Contention and Synchronization on the Intel Paragon

    Science.gov (United States)

    Bokhari, Shahid H.; Nicol, David M.

    1996-01-01

    The Intel Paragon is a mesh-connected distributed-memory parallel computer. It uses an oblivious and deterministic message routing algorithm; this permits us to develop highly optimized schedules for frequently needed communication patterns. The complete exchange is one such pattern. Several approaches are available for carrying it out on the mesh. We study an algorithm developed by Scott. This algorithm assumes that a communication link can carry one message at a time and that a node can only transmit one message at a time. It requires global synchronization to enforce a schedule of transmissions. Unfortunately, global synchronization has substantial overhead on the Paragon. At the same time, the powerful interconnection mechanism of this machine permits 2 or 3 messages to share a communication link with minor overhead. It can also overlap multiple message transmissions from the same node to some extent. We develop a generalization of Scott's algorithm that executes the complete exchange with a prescribed amount of contention. Schedules that incur greater contention require fewer synchronization steps. This permits us to trade off contention against synchronization overhead. We describe the performance of this algorithm and compare it with Scott's original algorithm as well as with a naive algorithm that does not take the interconnection structure into account. The bounded-contention algorithm is always better than Scott's algorithm and outperforms the naive algorithm for all but the smallest message sizes. The naive algorithm fails to work on meshes larger than 12 x 12. These results show that due consideration of processor interconnect and machine performance parameters is necessary to obtain peak performance from the Paragon and its successor mesh machines.
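
    For concreteness, the sketch below shows a barrier-synchronized complete exchange written in modern MPI (which postdates the Paragon's native NX library). It uses the simple rotation schedule rather than Scott's mesh-aware ordering, but it illustrates the structure being tuned: a schedule of transmissions with global synchronization between phases.

```c
/* Schematic barrier-synchronized complete exchange (all-to-all).
 * Scott's mesh schedule orders the phases to control link contention;
 * this sketch uses the simple rotation schedule, one barrier per phase. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

static void complete_exchange(char *sendbuf, char *recvbuf,
                              int msglen, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    memcpy(recvbuf + (size_t)rank * msglen,      /* own block: local copy */
           sendbuf + (size_t)rank * msglen, msglen);

    for (int phase = 1; phase < p; phase++) {
        int to   = (rank + phase) % p;           /* rotation schedule */
        int from = (rank - phase + p) % p;
        MPI_Sendrecv(sendbuf + (size_t)to * msglen, msglen, MPI_BYTE, to, 0,
                     recvbuf + (size_t)from * msglen, msglen, MPI_BYTE, from, 0,
                     comm, MPI_STATUS_IGNORE);
        MPI_Barrier(comm);    /* global synchronization enforces the schedule */
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    const int msglen = 1024;
    char *sb = malloc((size_t)p * msglen), *rb = malloc((size_t)p * msglen);
    memset(sb, rank & 0xff, (size_t)p * msglen);

    complete_exchange(sb, rb, msglen, MPI_COMM_WORLD);

    free(sb); free(rb);
    MPI_Finalize();
    return 0;
}
```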

  3. A new shared-memory programming paradigm for molecular dynamics simulations on the Intel Paragon

    International Nuclear Information System (INIS)

    D'Azevedo, E.F.; Romine, C.H.

    1994-12-01

    This report describes the use of shared memory emulation with DOLIB (Distributed Object Library) to simplify parallel programming on the Intel Paragon. A molecular dynamics application is used as an example to illustrate the use of the DOLIB shared memory library. SOTON-PAR, a parallel molecular dynamics code with explicit message-passing using a Lennard-Jones 6-12 potential, is rewritten using DOLIB primitives. The resulting code has no explicit message primitives and resembles a serial code. The new code can perform dynamic load balancing and achieves better performance than the original parallel code with explicit message-passing.
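
    A rough flavor of such shared-memory emulation, sketched here with MPI-3 one-sided operations as a modern stand-in (an anachronism: DOLIB predates MPI RMA and used interrupt-driven messages). The names below are illustrative, not the DOLIB API.

```c
/* Sketch of get/put-style access to a distributed global array: the
 * reading rank fetches remote data with no receive posted by the owner,
 * so the code reads almost like serial array access. Not the DOLIB API. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /* each rank exposes its block of a global array, 1024 doubles/rank */
    const int nlocal = 1024;
    double *block;
    MPI_Win win;
    MPI_Win_allocate(nlocal * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &block, &win);
    for (int i = 0; i < nlocal; i++) block[i] = rank;

    MPI_Win_fence(0, win);
    /* "global read": fetch element 7 of the block owned by the next rank */
    double val;
    int owner = (rank + 1) % p;
    MPI_Get(&val, 1, MPI_DOUBLE, owner, 7, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    printf("rank %d read %g from rank %d\n", rank, val, owner);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```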

  4. Communication overhead on the Intel Paragon, IBM SP2 and Meiko CS-2

    Science.gov (United States)

    Bokhari, Shahid H.

    1995-01-01

    Interprocessor communication overhead is a crucial measure of the power of parallel computing systems: its impact can severely limit the performance of parallel programs. This report presents measurements of communication overhead on three contemporary commercial multicomputer systems: the Intel Paragon, the IBM SP2 and the Meiko CS-2. In each case the time to communicate between processors is presented as a function of message length. The time for global synchronization and memory access is discussed. The performance of these machines in emulating hypercubes and executing random pairwise exchanges is also investigated. It is shown that the interprocessor communication time depends heavily on the specific communication pattern required. These observations contradict the commonly held belief that communication overhead on contemporary machines is independent of the placement of tasks on processors. The information presented in this report permits the evaluation of the efficiency of parallel algorithm implementations against standard baselines.
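
    The classic measurement behind such curves is a ping-pong test: time repeated round trips between two processors for each message length. A minimal MPI sketch (assumes at least two ranks):

```c
/* Minimal MPI ping-pong: one-way message time as a function of length. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    static char buf[1 << 20];            /* up to 1 MB messages */
    for (int len = 1; len <= 1 << 20; len *= 2) {
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int it = 0; it < 100; it++) {
            if (rank == 0) {             /* 100 round trips per length */
                MPI_Send(buf, len, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, len, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, len, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, len, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0)                   /* half a round trip = one-way time */
            printf("%8d bytes: %.1f us one-way\n",
                   len, (MPI_Wtime() - t0) / 100 / 2 * 1e6);
    }
    MPI_Finalize();
    return 0;
}
```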

  5. The parallel processing of EGS4 code on distributed memory scalar parallel computer: Intel Paragon XP/S15-256

    Energy Technology Data Exchange (ETDEWEB)

    Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

    1996-03-01

    The parallelization of the electromagnetic cascade Monte Carlo simulation code EGS4 on the distributed memory scalar parallel computer Intel Paragon XP/S15-256 is described. EGS4 has the feature that the calculation time for one incident particle differs greatly from particle to particle, because of the dynamic generation of secondary particles and the different behavior of each particle. Granularity for parallel processing, the parallel programming model, and the algorithm for parallel random number generation are discussed, and two methods, which allocate particles either dynamically or statically, are used to realize high-speed parallel processing of this code. Among the four problems chosen for performance evaluation, the speedup factors for three problems reached nearly 100 with 128 processors. It was found that when both the calculation time per incident particle and its dispersion are large, the dynamic particle allocation method, which averages the load across processors, is preferable. It was also found that when they are small, the static particle allocation method, which reduces the communication overhead, is preferable. Moreover, it is pointed out that to obtain accurate results it is necessary to use double precision variables in the EGS4 code. Finally, the workflow of program parallelization is analyzed, and tools for program parallelization are discussed in light of the experience gained from parallelizing EGS4. (author).
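
    A schematic of the dynamic particle-allocation strategy described above: a master rank deals out chunks of incident particles on demand, so that long histories do not unbalance the load. The EGS4 physics is reduced to a stub; constants are illustrative.

```c
/* Master-worker dynamic particle allocation (schematic). Run with >= 2 ranks. */
#include <mpi.h>

#define NPART 10000     /* incident particles to simulate */
#define CHUNK 10        /* particles handed out per request */
#define TAG_WORK 1
#define TAG_STOP 2

/* stand-in for EGS4 shower transport of n particles starting at 'first' */
static void track_particles(int first, int n) { (void)first; (void)n; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (rank == 0) {                     /* master: deal out chunks on demand */
        int next = 0, stopped = 0, dummy;
        MPI_Status st;
        while (stopped < p - 1) {
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (next < NPART) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                next += CHUNK;
            } else {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                stopped++;
            }
        }
    } else {                             /* worker: request, track, repeat */
        int first, dummy = 0;
        MPI_Status st;
        for (;;) {
            MPI_Send(&dummy, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            MPI_Recv(&first, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            int n = (first + CHUNK <= NPART) ? CHUNK : NPART - first;
            track_particles(first, n);   /* last chunk may be short */
        }
    }
    MPI_Finalize();
    return 0;
}
```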

  6. A fast random number generator for the Intel Paragon supercomputer

    Science.gov (United States)

    Gutbrod, F.

    1995-06-01

    A pseudo-random number generator is presented which makes optimal use of the architecture of the i860 microprocessor and which is expected to have a very long period. It is therefore a good candidate for use on the parallel supercomputer Paragon XP. In the assembler version, it needs 6.4 cycles per real*4 random number. A FORTRAN routine yields identical numbers, up to rare and minor rounding discrepancies, and needs 28 cycles. The FORTRAN performance on other microprocessors is somewhat better. Arguments for the quality of the generator and some numerical tests are given.
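
    Gutbrod's generator targets the i860 pipeline specifically; as a generic illustration of the same species, cheap float-valued random numbers from a lagged recurrence, here is an additive lagged-Fibonacci sketch with lags (55, 24). The constants and the crude seeding are illustrative only and are not the paper's algorithm.

```c
/* Illustrative additive lagged-Fibonacci float generator:
 * x_n = (x_{n-55} + x_{n-24}) mod 1, returned as a real*4 value. */
#include <stdio.h>

#define LAG_P 55
#define LAG_Q 24

static float state[LAG_P];
static int ip = 0, iq = LAG_P - LAG_Q;

/* crude seeding via an integer LCG; a real generator needs careful init */
static void fsrand(unsigned seed)
{
    for (int i = 0; i < LAG_P; i++) {
        seed = seed * 1664525u + 1013904223u;
        state[i] = (seed >> 8) * (1.0f / 16777216.0f);  /* 24-bit mantissa */
    }
}

static float frand(void)
{
    float x = state[ip] + state[iq];   /* additive lagged Fibonacci ... */
    if (x >= 1.0f) x -= 1.0f;          /* ... reduced modulo 1 */
    state[ip] = x;
    if (++ip == LAG_P) ip = 0;
    if (++iq == LAG_P) iq = 0;
    return x;                          /* uniform in [0,1) as real*4 */
}

int main(void)
{
    fsrand(12345u);
    for (int i = 0; i < 5; i++)
        printf("%f\n", frand());
    return 0;
}
```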

  7. Navier-Stokes Aerodynamic Simulation of the V-22 Osprey on the Intel Paragon MPP

    Science.gov (United States)

    Vadyak, Joseph; Shrewsbury, George E.; Narramore, Jim C.; Montry, Gary; Holst, Terry; Kwak, Dochan (Technical Monitor)

    1995-01-01

    The paper describes the development of a general three-dimensional multiple-grid-zone Navier-Stokes flowfield simulation program (ENS3D-MPP) designed for efficient execution on the Intel Paragon Massively Parallel Processor (MPP) supercomputer, and the subsequent application of this method to the prediction of the viscous flowfield about the V-22 Osprey tiltrotor vehicle. The flowfield simulation code solves the thin-layer or full Navier-Stokes equations for viscous flow modeling, or the Euler equations for inviscid flow modeling, on a structured multi-zone mesh. In the present paper only viscous simulations are shown. The governing difference equations are solved using a time-marching implicit approximate factorization method, with either TVD upwind or central differencing used for the convective terms and central differencing used for the viscous diffusion terms. Steady state or time accurate solutions can be calculated. The present paper focuses on steady state applications, although time accurate solution analysis is the ultimate goal of this effort. Laminar viscosity is calculated using Sutherland's law, and the Baldwin-Lomax two-layer algebraic turbulence model is used to compute the eddy viscosity. The simulation method uses an arbitrary-block, curvilinear grid topology. An automatic grid adaption scheme is incorporated which concentrates grid points in high density-gradient regions. A variety of user-specified boundary conditions are available. This paper presents the application of the scalable and superscalable versions to the steady state viscous flow analysis of the V-22 Osprey using a multiple-zone global mesh. The mesh consists of a series of sheared Cartesian grid blocks with polar grids embedded within, to better simulate the wing-tip-mounted nacelle. MPP solutions are shown in comparison to equivalent Cray C-90 results and also in comparison to experimental data. Discussions on meshing considerations, wall clock execution time

  8. Paragon

    Data.gov (United States)

    Department of Veterans Affairs — The Paragon and Tririga applications are project management programs utilized by CFM for construction programs. The contents of the databases are a compilation of...

  9. The new lattice code Paragon and its qualification for PWR core applications

    International Nuclear Information System (INIS)

    Ouisloumen, M.; Huria, H.C.; Mayhue, L.T.; Smith, R.M.; Kichty, M.J.; Matsumoto, H.; Tahara, Y.

    2003-01-01

    Paragon is a new two-dimensional transport code based on collision probability with the interface current method, written entirely in Fortran 90/95. The qualification of Paragon has been completed and the results are very good. This qualification included a number of critical experiments. Comparisons to the Monte Carlo code MCNP for a wide variety of PWR assembly lattice types were also performed. In addition, Paragon-based core simulator models have been compared against PWR plant startup and operational data for a large number of plants. Some results of these calculations, and also comparisons against models developed with a licensed Westinghouse lattice code, Phoenix-P, are presented. The qualification described in this paper provided the basis for the qualification of Paragon both as a validated transport code and as the nuclear data source for core simulator codes.

  10. EPRI depletion benchmark calculations using PARAGON

    International Nuclear Information System (INIS)

    Kucukboyaci, Vefa N.

    2015-01-01

    Highlights: • PARAGON depletion calculations are benchmarked against the EPRI reactivity decrement experiments. • Benchmarks cover a wide range of enrichments, burnups, cooling times, and burnable absorbers, and different depletion and storage conditions. • Results from the PARAGON-SCALE scheme are more conservative relative to the benchmark data. • ENDF/B-VII based data reduces the excess conservatism and brings the predictions closer to the benchmark reactivity decrement values. - Abstract: In order to conservatively apply burnup credit in spent fuel pool criticality analyses, code validation for both fresh and used fuel is required. Fresh fuel validation is typically done by modeling experiments from the “International Handbook.” A depletion validation can determine a bias and bias uncertainty for the worth of the isotopes not found in the fresh fuel critical experiments. Westinghouse’s burnup credit methodology uses PARAGON™ (the Westinghouse 2-D lattice physics code) and its 70-group cross-section library, which have been benchmarked, qualified, and licensed both as a standalone transport code and as a nuclear data source for core design simulations. A bias and bias uncertainty for the worth of depletion isotopes, however, are not available for PARAGON. Instead, the 5% decrement approach for depletion uncertainty is used, as set forth in the Kopp memo. Recently, EPRI developed a set of benchmarks based on a large set of power distribution measurements to ascertain reactivity biases. The depletion reactivity has been used to create 11 benchmark cases for burnups of 10, 20, 30, 40, 50, and 60 GWd/MTU and 3 cooling times (100 h, 5 years, and 15 years). These benchmark cases are analyzed with PARAGON and the SCALE package, and sensitivity studies are performed using different cross-section libraries based on ENDF/B-VI.3 and ENDF/B-VII data to assess that the 5% decrement approach is conservative for determining depletion uncertainty.

  11. Parking Analysis of Paragon Mall Semarang

    Directory of Open Access Journals (Sweden)

    Mudjiastuti Handajani

    2016-01-01

    Full Text Available The trip attraction of the Paragon Mall complex affects parking demand at that location. Research was therefore needed to obtain the required data by observing drivers and parking attendants. The purpose of this paper is to determine the volume of traffic attracted to the Paragon Mall shopping complex and to determine the parking demand of the complex. The data obtained were then processed to derive, among other things, the maximum parking accumulation, trip attraction, attracted traffic, and parking demand factors. The calculations show that the maximum accumulation occurs on Saturday: 173 cars and 295 motorcycles. The highest overall trip attraction occurred on Sunday, at 4.76 persons/100 m2 of gross floor area. The attracted traffic on Saturday was 219 pcu/hour for cars and 108 pcu/hour for motorcycles. Paragon Mall thus has considerable trip attraction and sufficient parking area, with 260 parking stalls for cars and 800 for motorcycles.

  12. Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor

    OpenAIRE

    Koskela, TS; Lobet, M

    2017-01-01

    In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science appl...

  13. New compilers speed up applications for Intel-based systems; Intel Compilers pave the way for Intel's Hyper-threading technology

    CERN Multimedia

    2002-01-01

    "Intel Corporation today introduced updated tools to help software developers optimize applications for Intel's expanding family of architectures with key innovations such as Intel's Hyper Threading Technology (1 page).

  14. The Chesty Puller Paragon: Leadership Dogma or Model Doctrine

    National Research Council Canada - National Science Library

    Quintrall, Mickey

    1997-01-01

    In this study, I examine whether or not the United States Marine Corps senior warrior leaders should continue to use heroic warriors from the 1942-52 era as contemporary paragons of tactical leadership...

  15. Intel: High Throughput Computing Collaboration: A CERN openlab / Intel collaboration

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    The Intel/CERN High Throughput Computing Collaboration studies the application of upcoming Intel technologies to the very challenging environment of the LHC trigger and data-acquisition systems. These systems will need to transport and process many terabits of data every second, in some cases with tight latency constraints. Parallelisation and tight integration of accelerators and classical CPU via Intel's OmniPath fabric are the key elements in this project.

  16. Parallelization of quantum molecular dynamics simulation code

    International Nuclear Information System (INIS)

    Kato, Kaori; Kunugi, Tomoaki; Shibahara, Masahiko; Kotake, Susumu

    1998-02-01

    A quantum molecular dynamics simulation code has been developed at the Kansai Research Establishment for the analysis of the thermalization of photon energies in molecules and materials. The simulation code is parallelized for both a scalar massively parallel computer (Intel Paragon XP/S75) and a vector parallel computer (Fujitsu VPP300/12). Scalable speed-up has been obtained on both parallel computers by distributing particle groups across processor units. By distributing work to processor units not only by particle group but also by the fine-grained per-particle calculations, high parallel performance is achieved on the Intel Paragon XP/S75. (author)

  17. Intel Galileo essentials

    CERN Document Server

    Grimmett, Richard

    2015-01-01

    This book is for anyone who has ever been curious about using the Intel Galileo to create electronics projects. Some programming background is useful, but if you know how to use a personal computer, with the aid of the step-by-step instructions in this book, you can construct complex electronics projects that use the Intel Galileo.

  18. Accessing Intel FPGAs for Acceleration

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    In this presentation, we will discuss the latest tools and products from Intel that enables FPGAs to be deployed as Accelerators. We will first talk about the Acceleration Stack for Intel Xeon CPU with FPGAs which makes it easy to create, verify, and execute functions on the Intel Programmable Acceleration Card in a Data Center. We will then talk about the OpenCL flow which allows parallel software developers to create FPGA systems and deploy them using the OpenCL standard. Next, we will talk about the Intel High-Level Synthesis compiler which can convert C++ code into custom RTL code optimized for Intel FPGAs. Lastly, we will focus on the task of running Machine Learning inference on the FPGA leveraging some of the tools we discussed. About the speaker Karl Qi is Sr. Staff Applications Engineer, Technical Training. He has been with the Customer Training department at Altera/Intel for 8 years. Most recently, he is responsible for all training content relating to High-Level Design tools, including the OpenCL...

  19. Intel Xeon Phi coprocessor high performance programming

    CERN Document Server

    Jeffers, James

    2013-01-01

    Authors Jim Jeffers and James Reinders spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel Xeon Phi coprocessor. They have distilled their own experiences coupled with insights from many expert customers, Intel Field Engineers, Application Engineers and Technical Consulting Engineers, to create this authoritative first book on the essentials of programming for this new architecture and these new products. This book is useful even before you ever touch a system with an Intel Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high performance microprocessors. Applying these techniques will generally increase your program performance on any system, and better prepare you for Intel Xeon Phi coprocessors and the Intel MIC architecture. It off...

  20. Experiences implementing the MPI standard on Sandia's lightweight kernels

    Energy Technology Data Exchange (ETDEWEB)

    Brightwell, R.; Greenberg, D.S.

    1997-10-01

    This technical report describes some lessons learned from implementing the Message Passing Interface (MPI) standard, and some proposed extensions to MPI, at Sandia. The implementations were developed using Sandia-developed lightweight kernels running on the Intel Paragon and Intel TeraFLOPS platforms. The motivations for this research are discussed, and a detailed analysis of several implementation issues is presented.

  1. Home automation with Intel Galileo

    CERN Document Server

    Dundar, Onur

    2015-01-01

    This book is for anyone who wants to learn Intel Galileo for home automation and cross-platform software development. No knowledge of programming with Intel Galileo is assumed, but knowledge of the C programming language is essential.

  2. Unlock performance secrets of next-gen Intel hardware

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    Intel® Xeon Phi Product. About the speaker: Zakhar is a software architect in Intel's SSG group. His current role is Parallel Studio architect, with a focus on SIMD vector parallelism assistance tools. Before that he was Intel Advisor XE software architect and software development team lead. Before joining Intel he was...

  3. Feasibility Study for Paragon - Bisti Solar Ranch

    Energy Technology Data Exchange (ETDEWEB)

    Benally, Thomas [Navajo Hopi Land Commission Office (NHLCO), Window Rock, AZ (United States)

    2015-06-01

    The Navajo Hopi Land Commission Office (NHLCO) and Navajo Nation (NN) plan to develop renewable energy (RE) projects on the Paragon-Bisti Ranch (PBR) lands, set aside under the Navajo Hopi Land Settlement Act (NHLSA) for the benefit of Relocatees. This feasibility study (FS), which was funded under a grant from DOE’s Tribal Energy Program (TEP), was prepared in order to explore the development of the 22,000-acre PBR in northwestern New Mexico for solar energy facilities. Topics covered include: • Site Selection • Analysis of RE, and a Preliminary Design • Transmission, Interconnection Concerns and Export Markets • Financial and Economic Analysis • Environmental Study • Socioeconomic and Cultural Factors • Next Steps.

  4. A scalable parallel algorithm for multiple objective linear programs

    Science.gov (United States)

    Wiecek, Malgorzata M.; Zhang, Hong

    1994-01-01

    This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLP's). Job balance, speedup and scalability are of primary interest in evaluating efficiency of the new algorithm. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLP's, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm gives especially good results for large and very large problems. Motivation and justification for solving such large MOLP's are also included.

  5. Effective SIMD Vectorization for Intel Xeon Phi Coprocessors

    OpenAIRE

    Tian, Xinmin; Saito, Hideki; Preis, Serguei V.; Garcia, Eric N.; Kozhukhov, Sergey S.; Masten, Matt; Cherkasov, Aleksei G.; Panchenko, Nikolay

    2015-01-01

    Efficiently exploiting SIMD vector units is one of the most important aspects in achieving high performance of the application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A ...

  6. Vectorization, parallelization and implementation of Quantum molecular dynamics codes (QQQF, MONTEV)

    Energy Technology Data Exchange (ETDEWEB)

    Kato, Kaori [High Energy Accelerator Research Organization, Tsukuba, Ibaraki (Japan); Kunugi, Tomoaki; Kotake, Susumu; Shibahara, Masahiko

    1998-03-01

    This report describes the parallelization, vectorization and implementation of two simulation codes, the quantum molecular dynamics simulation code QQQF and the photon Monte Carlo molecular dynamics simulation code MONTEV, which have been developed for the analysis of the thermalization of photon energies in molecules and materials. QQQF has been vectorized and parallelized on the Fujitsu VPP and then ported from the VPP to the Intel Paragon XP/S and parallelized there. MONTEV has been ported from the VPP to the Paragon and parallelized. (author)

  7. INTEL: Intel based systems move up in supercomputing ranks

    CERN Multimedia

    2002-01-01

    "The TOP500 supercomputer rankings released today at the Supercomputing 2002 conference show a dramatic increase in the number of Intel-based systems being deployed in high-performance computing (HPC) or supercomputing areas" (1/2 page).

  8. Effective SIMD Vectorization for Intel Xeon Phi Coprocessors

    Directory of Open Access Journals (Sweden)

    Xinmin Tian

    2015-01-01

    Full Text Available Efficiently exploiting SIMD vector units is one of the most important aspects in achieving high performance of the application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A set of workloads from several application domains is employed to conduct the performance study of our SIMD vectorization techniques. The performance results show that we achieved up to 12.5x performance gain on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x performance speedup from the seamless integration of SIMD vectorization and parallelization.
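
    To make "less-than-full-vector" loop vectorization concrete: the main loop runs on full vectors and the final partial iteration runs under a mask. A sketch using AVX-512 intrinsics (shown for illustration only; the first-generation Xeon Phi used the related IMCI instruction set, and this code assumes an AVX-512 capable CPU and compilation with -mavx512f):

```c
/* Less-than-full-vector handling: full 16-lane vectors in the main loop,
 * a masked partial vector for the remainder. Illustrative, not the
 * Intel compiler's generated code. */
#include <immintrin.h>
#include <stdio.h>

void saxpy(float a, const float *x, float *y, int n)
{
    int i;
    for (i = 0; i + 16 <= n; i += 16) {            /* full 16-lane vectors */
        __m512 v = _mm512_fmadd_ps(_mm512_set1_ps(a),
                                   _mm512_loadu_ps(x + i),
                                   _mm512_loadu_ps(y + i));
        _mm512_storeu_ps(y + i, v);
    }
    if (i < n) {                                   /* remainder: masked lanes */
        __mmask16 m = (__mmask16)((1u << (n - i)) - 1);
        __m512 v = _mm512_fmadd_ps(_mm512_set1_ps(a),
                                   _mm512_maskz_loadu_ps(m, x + i),
                                   _mm512_maskz_loadu_ps(m, y + i));
        _mm512_mask_storeu_ps(y + i, m, v);
    }
}

int main(void)
{
    float x[19], y[19];                            /* 19 = 16 + remainder of 3 */
    for (int i = 0; i < 19; i++) { x[i] = (float)i; y[i] = 1.0f; }
    saxpy(2.0f, x, y, 19);
    printf("y[18] = %f\n", y[18]);                 /* 2*18 + 1 = 37 */
    return 0;
}
```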

  9. Optimizing Performance of Combustion Chemistry Solvers on Intel's Many Integrated Core (MIC) Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Sitaraman, Hariswaran [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Grout, Ray W [National Renewable Energy Laboratory (NREL), Golden, CO (United States)

    2017-06-09

    This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero- and multidimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via a large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g., Intel Sandy Bridge/Ivy Bridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved here through the combination of careful memory layout, exposing multiple levels of fine-grained parallelism, and extensive use of vendor-supported libraries (Cilk Plus and the Math Kernel Library). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a speed-up by a factor of ~3 using the Intel 2013 compiler and ~1.5 using the Intel 2017 compiler for large chemical mechanisms, compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general purpose computational fluid dynamics codes.

  10. Scientific Computing and Apple's Intel Transition

    CERN Document Server

    CERN. Geneva

    2006-01-01

    Intel's published processor roadmap and how it may affect the future of personal and scientific computing. About the speaker: Eric Albert is Senior Software Engineer in Apple's Core Technologies group. During Mac OS X's transition to Intel processors he has worked on almost every part of the operating system, from the OS kernel and compiler tools to appli...

  11. [Intel random number generator-based true random number generator].

    Science.gov (United States)

    Huang, Feng; Shen, Hong

    2004-09-01

    To establish a true random number generator on the basis of certain Intel chips, random numbers were acquired by programming, using Microsoft Visual C++ 6.0, via register reads from the random number generator (RNG) unit of an Intel 815 chipset-based computer with the Intel Security Driver (ISD). We tested the generator with 500 random numbers using the NIST FIPS 140-1 and χ2 R-squared tests, and the results showed that the random numbers it generated satisfied the demands of independence and uniform distribution. We also statistically compared the random numbers generated by the Intel RNG-based true random number generator with those from a random number table, using the same amount of 7500 random numbers in the same value domain; the SD, SE and CV of the Intel RNG-based generator were less than those of the random number table. A u test of the two CVs revealed no significant difference between the two methods. The Intel RNG-based random number generator can produce high-quality random numbers with good independence and uniform distribution, and solves some problems associated with random number tables in the acquisition of random numbers.
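
    On later Intel CPUs the hardware entropy source is exposed directly through the RDRAND instruction rather than chipset register reads as described above; a minimal sketch (compile with -mrdrnd on GCC/Clang, and note it requires RDRAND-capable hardware):

```c
/* Reading hardware random numbers via RDRAND. This is the modern
 * successor to the 815-chipset RNG unit discussed in the abstract,
 * not the method used in that paper. */
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    unsigned int r;
    for (int i = 0; i < 5; i++) {
        /* _rdrand32_step returns 1 on success, 0 if entropy is not ready */
        while (!_rdrand32_step(&r))
            ;                       /* retry until the DRNG delivers */
        printf("%u\n", r);
    }
    return 0;
}
```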

  12. Using the Intel Math Kernel Library on Peregrine

    Science.gov (United States)

    Learn how to use the Intel Math Kernel Library (MKL) with Peregrine system software. Core math functions in MKL include BLAS, LAPACK, ScaLAPACK, sparse solvers, and fast Fourier transforms.
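
    A minimal example of calling one of those core math functions (BLAS dgemm through the CBLAS interface) from C, assuming an MKL installation that provides mkl.h; the link line typically comes from the MKL link-line advisor or the system's module environment:

```c
/* Minimal MKL CBLAS example: C = A * B for 2x2 row-major matrices. */
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    const int n = 2;
    double A[4] = {1, 2, 3, 4};      /* row-major 2x2 */
    double B[4] = {5, 6, 7, 8};
    double C[4] = {0, 0, 0, 0};

    /* C = 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);   /* 19 22 / 43 50 */
    return 0;
}
```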

  13. Comparative VME Performance Tests for MEN A20 Intel-L865 and RIO-3 PPC-LynxOS platforms

    CERN Document Server

    Andersen, M; CERN. Geneva. BE Department

    2009-01-01

    This benchmark note presents test results from reading values over VME using different methods and different sizes of data registers, running on two different platforms, Intel-L865 and PPC-LynxOS. We find that the PowerPC is a factor of 3 faster in accessing an array of contiguous VME memory locations. Block transfer and DMA read accesses are also tested and compared with conventional single-access reads.

  14. CERN welcomes Intel Science Fair winners

    CERN Multimedia

    Katarina Anthony

    2012-01-01

    This June, CERN welcomed twelve gifted young scientists aged 15-18 for a week-long visit to the Laboratory. These talented students were the winners of a special award co-funded by CERN and Intel, given yearly at the Intel International Science and Engineering Fair (ISEF).   The CERN award winners at the Intel ISEF 2012 Special Awards Ceremony. © Society for Science & the Public (SSP). The CERN award was set up back in 2009 as an opportunity to bring some of the best and brightest young minds to the Laboratory. The award winners are selected from among 1,500 talented students participating in ISEF – the world's largest pre-university science competition, in which students compete for more than €3 million in awards. “CERN gave an award – which was obviously this trip – to students studying physics, maths, electrical engineering and computer science,” says Benjamin Craig Bartlett, 17, from South Carolina, USA, wh...

  15. PARAGON-IPS: A Portable Imaging Software System For Multiple Generations Of Image Processing Hardware

    Science.gov (United States)

    Montelione, John

    1989-07-01

    Paragon-IPS is a comprehensive software system which is available on virtually all generations of image processing hardware. It is designed for an image processing department or a scientist and engineer who is doing image processing full-time. It is being used by leading R&D labs in government agencies and Fortune 500 companies. Applications include reconnaissance, non-destructive testing, remote sensing, medical imaging, etc.

  16. Intel Xeon Phi accelerated Weather Research and Forecasting (WRF) Goddard microphysics scheme

    Science.gov (United States)

    Mielikainen, J.; Huang, B.; Huang, A. H.-L.

    2014-12-01

    The Weather Research and Forecasting (WRF) model is a numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs. WRF development is done in collaboration around the globe, and the WRF is used by academic atmospheric scientists, weather forecasters at operational centers, and others. The WRF contains several physics components, of which the most time consuming is the microphysics. One microphysics scheme is the Goddard cloud microphysics scheme, a sophisticated cloud microphysics scheme in the WRF model. The Goddard microphysics scheme is very suitable for massively parallel computation, as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the Goddard scheme code. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs, rather than just kernels as the GPU does. The MIC coprocessor supports all important Intel development tools, so the development environment is familiar to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved the performance of the Goddard microphysics scheme on the Xeon Phi 7120P by a factor of 4.7×. In addition, the optimizations reduced the Goddard microphysics scheme's share of the total WRF processing time from 20.0% to 7.5%. Furthermore, the same optimizations

  17. Extension of the AMBER molecular dynamics software to Intel's Many Integrated Core (MIC) architecture

    Science.gov (United States)

    Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.

    2016-04-01

    We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, which forms part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ~35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when two Intel E5-2697 v2 processors (2×12 cores, 30 MB cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P, 1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.

  18. Parallel supercomputing: Advanced methods, algorithms, and software for large-scale linear and nonlinear problems

    Energy Technology Data Exchange (ETDEWEB)

    Carey, G.F.; Young, D.M.

    1993-12-31

    The program outlined here is directed to research on methods, algorithms, and software for distributed parallel supercomputers. Of particular interest are finite element methods and finite difference methods together with sparse iterative solution schemes for scientific and engineering computations of very large-scale systems. Both linear and nonlinear problems will be investigated. In the nonlinear case, applications with bifurcation to multiple solutions will be considered using continuation strategies. The parallelizable numerical methods of particular interest are a family of partitioning schemes embracing domain decomposition, element-by-element strategies, and multi-level techniques. The methods will be further developed incorporating parallel iterative solution algorithms with associated preconditioners in parallel computer software. The schemes will be implemented on distributed memory parallel architectures such as the CRAY MPP, Intel Paragon, the NCUBE3, and the Connection Machine. We will also consider other new architectures such as the Kendall-Square (KSQ) and proposed machines such as the TERA. The applications will focus on large-scale three-dimensional nonlinear flow and reservoir problems with strong convective transport contributions. These are legitimate grand challenge class computational fluid dynamics (CFD) problems of significant practical interest to DOE. The methods developed and algorithms will, however, be of wider interest.

  19. Experience with Intel's Many Integrated Core Architecture in ATLAS Software

    CERN Document Server

    Fleischmann, S; The ATLAS collaboration; Lavrijsen, W; Neumann, M; Vitillo, R

    2014-01-01

    Intel recently released the first commercial boards of its Many Integrated Core (MIC) Architecture. MIC is Intel's solution for the domain of throughput computing, currently dominated by general purpose programming on graphics processors (GPGPU). MIC allows the use of the more familiar x86 programming model and supports standard technologies such as OpenMP, MPI, and Intel's Threading Building Blocks. This should make it possible to develop for both throughput and latency devices using a single code base.

  20. Experience with Intel's Many Integrated Core Architecture in ATLAS Software

    CERN Document Server

    Fleischmann, S; The ATLAS collaboration; Lavrijsen, W; Neumann, M; Vitillo, R

    2013-01-01

    Intel recently released the first commercial boards of its Many Integrated Core (MIC) Architecture. MIC is Intel's solution for the domain of throughput computing, currently dominated by general purpose programming on graphics processors (GPGPU). MIC allows the use of the more familiar x86 programming model and supports standard technologies such as OpenMP, MPI, and Intel's Threading Building Blocks. This should make it possible to develop for both throughput and latency devices using a single code base.

  1. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Kuppangari Krishna RAO; Fazal NOORBASHA; Ram Asaray SINGH

    2010-01-01

    As processor architecture becomes more advanced, Intel introduced its Intel Core 2 Duo series processors. The performance impact on Intel Core 2 Duo processors is analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT) at 1 GHz, 2 GHz and 3 GHz Intel Core 2 Duo series processor frequencies. Our results show the performance scalab...

  2. A fast global sum on the coarse-grained scalar parallel computer

    International Nuclear Information System (INIS)

    Ohta, Toshio.

    1996-03-01

    A new global sum subroutine, which has the same function as the one provided on the Intel Paragon, has been developed. The algorithm is simple and faster: it is about 10 times faster than the original in the case of 128 nodes. The results are presented together with the characteristics, restrictions and extendability of the routine. (author)
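
    The report does not reproduce the algorithm here, but a common way to implement a fast global sum on 2^k nodes is recursive doubling: log2(P) pairwise exchange steps, after which every node holds the total. A hedged MPI sketch (assumes a power-of-two node count; this is an illustration, not necessarily the author's algorithm):

```c
/* Recursive-doubling global sum: at step k, exchange partial sums with
 * the rank whose k-th address bit differs. Requires power-of-two ranks. */
#include <mpi.h>
#include <stdio.h>

static double global_sum(double x, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    for (int mask = 1; mask < p; mask <<= 1) {
        int partner = rank ^ mask;            /* bit-flipped partner rank */
        double other;
        MPI_Sendrecv(&x, 1, MPI_DOUBLE, partner, 0,
                     &other, 1, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        x += other;                           /* every node ends with the total */
    }
    return x;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double s = global_sum((double)rank, MPI_COMM_WORLD);
    if (rank == 0) printf("sum = %g\n", s);   /* P*(P-1)/2 */
    MPI_Finalize();
    return 0;
}
```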

  3. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2012-08-01

    Full Text Available High performance is a critical requirement for all microprocessor manufacturers. The present paper describes the comparison of performance of two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320; Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented on the basis of a new family of processors from Intel, starting with the Pentium 4 processor. These processors can provide a performance boost for many key application areas in the modern generation. The scaling of performance in these two major series of Intel Xeon processors has been analyzed using the performance numbers of 12 CPU2006 integer benchmarks, performance numbers that exhibit significant differences in performance. The results and analysis can be used by performance engineers, scientists and developers to better understand the performance scaling in modern generation processors.

  4. Theorem Proving in Intel Hardware Design

    Science.gov (United States)

    O'Leary, John

    2009-01-01

    For the past decade, a framework combining model checking (symbolic trajectory evaluation) and higher-order logic theorem proving has been in production use at Intel. Our tools and methodology have been used to formally verify execution cluster functionality (including floating-point operations) for a number of Intel products, including the Pentium® 4 and Core™ i7 processors. Hardware verification in 2009 is much more challenging than it was in 1999: today's CPU chip designs contain many processor cores and significant firmware content. This talk will attempt to distill the lessons learned over the past ten years, discuss how they apply to today's problems, and outline some future directions.

  5. Revisiting Intel Xeon Phi optimization of Thompson cloud microphysics scheme in Weather Research and Forecasting (WRF) model

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen

    2015-10-01

    The Thompson cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme is very suitable for massively parallel computation, as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Thompson scheme incorporates a large number of improvements. Thus, we have optimized the speed of this important part of WRF. Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the Thompson microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on the Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is familiar to a vast number of CPU developers. However, getting maximum performance out of MICs requires some novel optimization techniques. New optimizations for an updated Thompson scheme are discussed in this paper. The optimizations improved the performance of the original Thompson code on the Xeon Phi 7120P by a factor of 1.8x. Furthermore, the same optimizations improved the performance of the Thompson scheme on a dual socket configuration of eight-core Intel Xeon E5-2670 CPUs by a factor of 1.8x compared to the original code.

  6. Adaptation of MPDATA Heterogeneous Stencil Computation to Intel Xeon Phi Coprocessor

    Directory of Open Access Journals (Sweden)

    Lukasz Szustak

    2015-01-01

    Full Text Available The multidimensional positive definite advection transport algorithm (MPDATA) belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adaptation of the 3D MPDATA algorithm to the Intel MIC architecture. In order to utilize available computing resources, we propose a (3+1)D decomposition of MPDATA heterogeneous stencil computations. This approach is based on a combination of the loop tiling and loop fusion techniques. It allows us to ease memory/communication bounds and better exploit the theoretical floating point efficiency of target computing platforms. An important method of improving the efficiency of the (3+1)D decomposition is partitioning of available cores/threads into work teams. It permits reducing inter-cache communication overheads. This method also increases opportunities for the efficient distribution of MPDATA computation onto available resources of the Intel MIC architecture, as well as Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, each containing two CPUs and an Intel Xeon Phi. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results, and executes MPDATA almost 2 times faster than two Intel Xeon E5-2697v2 CPUs.
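
    A toy illustration of the loop tiling and fusion idea (not the MPDATA kernels): two dependent stencil sweeps are fused inside one spatial tile so that the intermediate array stays in cache. Array names and the 1D stencil are illustrative, and halo handling at tile borders is omitted for brevity:

```c
/* Loop tiling + fusion sketch: sweep 2 consumes sweep 1's output while
 * it is still cache-resident within the tile. Boundary/halo handling
 * between tiles is deliberately omitted. */
#include <stdio.h>

#define N 1024
#define TILE 128

static double u[N], tmp[N], out[N];

int main(void)
{
    for (int i = 0; i < N; i++) u[i] = i % 7;

    for (int t = 1; t + TILE <= N - 1; t += TILE) {   /* one tile ... */
        for (int i = t; i < t + TILE; i++)            /* ... first sweep */
            tmp[i] = 0.5 * (u[i - 1] + u[i + 1]);
        for (int i = t + 1; i < t + TILE - 1; i++)    /* ... fused second sweep */
            out[i] = 0.5 * (tmp[i - 1] + tmp[i + 1]);
    }
    printf("out[100] = %g\n", out[100]);
    return 0;
}
```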

  7. Implementation of an Agent-Based Parallel Tissue Modelling Framework for the Intel MIC Architecture

    Directory of Open Access Journals (Sweden)

    Maciej Cytowski

    2017-01-01

    Full Text Available Timothy is a novel large scale modelling framework that allows simulation of biological processes involving different cellular colonies growing and interacting with a variable environment. Timothy was designed for execution on massively parallel High Performance Computing (HPC) systems. The high parallel scalability of the implementation allows for simulations of up to 10⁹ individual cells (i.e., simulations at tissue spatial scales of up to 1 cm³ in size). With the recent advancements of the Timothy model, it has become critical to ensure an appropriate performance level on emerging HPC architectures. For instance, the introduction of blood vessels supplying nutrients to the tissue is a very important step towards realistic simulations of complex biological processes, but it greatly increases the computational complexity of the model. In this paper, we describe the process of modernizing the application in order to achieve high computational performance on HPC hybrid systems based on the modern Intel® MIC architecture. Experimental results on the Intel Xeon Phi™ coprocessor x100 and the Intel Xeon Phi processor x200 are presented.

  8. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Ram Asaray SINGH

    2012-01-01

    High performance is a critical requirement for all microprocessor manufacturers. The present paper describes the comparison of performance of two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320; Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented on the basis of a new family of processors from Intel, starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...

  9. Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

    Energy Technology Data Exchange (ETDEWEB)

    Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.; Tallent, Nathan R.; Vishnu, Abhinav; Kerbyson, Darren J.

    2017-08-24

    Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD, and IBM, have architectural roadmaps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks, and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak scaling, sometimes encouraged by restricted GPU memory, NVLink is less important.

  10. MILC staggered conjugate gradient performance on Intel KNL

    OpenAIRE

    DeTar, Carleton; Doerfler, Douglas; Gottlieb, Steven; Jha, Ashish; Kalamkar, Dhiraj; Li, Ruizi; Toussaint, Doug

    2016-01-01

    We review our work done to optimize the staggered conjugate gradient (CG) algorithm in the MILC code for use with the Intel Knights Landing (KNL) architecture. KNL is the second generation Intel Xeon Phi processor. It is capable of massive thread parallelism, data parallelism, and high on-board memory bandwidth and is being adopted in supercomputing centers for scientific research. The CG solver consumes the majority of time in production running, so we have spent most of our effort on it. ...

  11. ENDF/B-VII.0 Based Library for Paragon - 313

    International Nuclear Information System (INIS)

    Huria, H.C.; Kucukboyaci, V.N.; Ouisloumen, M.

    2010-01-01

    A new 70-group library has been generated for the Westinghouse lattice physics code PARAGON using the ENDF/B-VII.0 nuclear data files. The new library retains the major features of the current library, including the number of energy groups and the reduction in the U-238 resonance integral. The upper bound for the up-scattering effects in the new library, however, has been moved from 2.1 eV to 4.0 eV for better MOX fuel predictions. The new library has been used to analyze standard benchmarks and also to compare the measured and predicted parameters for different types of Westinghouse and Combustion Engineering (CE) type operating reactor cores. Results indicate that the new library will not impact the reactivity, power distribution and temperature coefficient predictions over a wide range of physics design parameters; it will, however, improve the MOX core predictions. In other words, ENDF/B-VI.3 and ENDF/B-VII.0 produce similar results for reactor core calculations. (authors)

  12. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2010-12-01

    Full Text Available As processor architecture becomes more advanced, Intel introduced its Intel Core 2 Duo series processors. The performance impact on Intel Core 2 Duo processors is analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT) at 1 GHz, 2 GHz and 3 GHz Intel Core 2 Duo series processor frequencies. Our results show the performance scalability in Intel Core 2 Duo series processors. Even though the AI benchmarks have similar execution times, they have dissimilar characteristics, which are identified using principal component analysis and a dendrogram. As the processor frequency increased from 1.8 GHz to 3.167 GHz, the execution time decreased by ~370 sec for AI workloads. In the case of Physics/Quantum Computing programs it was ~940 sec.

  13. Parallelization of particle transport using Intel® TBB

    International Nuclear Information System (INIS)

    Apostolakis, J; Brun, R; Carminati, F; Gheata, A; Wenzel, S; Belogurov, S; Ovcharenko, E

    2014-01-01

    One of the current challenges in HEP computing is the development of particle propagation algorithms capable of efficiently using all performance aspects of modern computing devices. The Geant-Vector project at CERN has recently introduced an approach in this direction. This paper describes the implementation of a similar workflow using the Intel® Threading Building Blocks (Intel® TBB) library. This approach is intended to overcome the potential bottleneck of having a single dispatcher on many-core architectures and to result in better scalability compared to the initial pthreads-based version.

  14. Performance of a plasma fluid code on the Intel parallel computers

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Leboeuf, J.N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchstone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchstone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel Sigma machine gives an improvement factor close to 64 over the single-processor CRAY-2

  15. Performance of a plasma fluid code on the Intel parallel computers

    Science.gov (United States)

    Lynch, V. E.; Carreras, B. A.; Drake, J. B.; Leboeuf, J. N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchstone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchstone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel Sigma machine gives an improvement factor close to 64 over the single-processor CRAY-2.

  16. Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations

    Science.gov (United States)

    Bernaschi, M.; Bisson, M.; Salvadore, F.

    2014-10-01

    We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, both in a single system and in a cluster configuration, for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model using the over-relaxation algorithm. We also present data for a traditional high-end multi-core architecture, the Intel Sandy Bridge. The results show that although on the two Intel architectures it is possible to use basically the same code, the performance of the Intel MIC changes dramatically depending on (apparently) minor details. Another issue is that to obtain reasonable scalability with the Intel Xeon Phi coprocessor (the Phi is the coprocessor that implements the MIC architecture) in a cluster configuration, it is necessary to use the so-called offload mode, which reduces the performance of the single system. As for the GPU, the Kepler architecture offers a clear advantage with respect to the previous Fermi architecture while maintaining exactly the same source code. Scalability of the multi-GPU implementation remains very good when using the CPU as a communication co-processor for the GPU. All source codes are provided for inspection and for double-checking the results.

  17. Performance of a plasma fluid code on the Intel parallel computers

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Leboeuf, J.N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2. 12 refs

  18. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    Science.gov (United States)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    A simulation of breaking waves using the Navier-Stokes equations via the moving particle semi-implicit (MPS) method over a closed domain is given. The results show that parallel computing on a multicore architecture using the OpenMP platform can reduce the computational time to almost half of the serial time. A comparison of two computer architectures (AMD and Intel) is performed. The Intel architecture shows better CPU time than the AMD architecture; however, the AMD architecture gives slightly higher parallel efficiency than the Intel. For a simulation with 1512 particles, the CPU times on Intel and AMD are 12662.47 and 28282.30, respectively. Moreover, at the same number of particles, AMD attains 50.09 % efficiency and Intel up to 49.42 %.
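    The efficiency figures quoted follow the usual definition, efficiency = speedup / thread count. A toy sketch of such a measurement, with a placeholder kernel standing in for the actual MPS sweep (loop body and constants are illustrative, not from the paper):

```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

int main() {
    const int n = 1512;                       // particle count used in the paper
    std::vector<double> pos(n, 0.0), vel(n, 1.0);

    double t0 = omp_get_wtime();
    for (int step = 0; step < 10000; ++step) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            pos[i] += vel[i] * 1e-4;          // placeholder for the MPS kernel
    }
    double elapsed = omp_get_wtime() - t0;

    int p = omp_get_max_threads();
    std::printf("time %.2f s on %d threads\n", elapsed, p);
    // with a serial reference time T1:
    //   speedup = T1 / elapsed;  efficiency = speedup / p
    return 0;
}
```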

  19. Efficient Implementation of Many-body Quantum Chemical Methods on the Intel Xeon Phi Coprocessor

    Energy Technology Data Exchange (ETDEWEB)

    Apra, Edoardo; Klemm, Michael; Kowalski, Karol

    2014-12-01

    This paper presents the implementation and performance of the highly accurate CCSD(T) quantum chemistry method on the Intel Xeon Phi coprocessor within the context of the NWChem computational chemistry package. The widespread use of highly correlated methods in electronic structure calculations is contingent upon the interplay between advances in theory and the possibility of utilizing the ever-growing computer power of emerging heterogeneous architectures. We discuss the design decisions of our implementation as well as the optimizations applied to the compute kernels and data transfers between host and coprocessor. We show the feasibility of adopting the Intel Many Integrated Core Architecture and the Intel Xeon Phi coprocessor for developing efficient computational chemistry modeling tools. Remarkable scalability is demonstrated by benchmarks. Our solution scales up to a total of 62560 cores with the concurrent utilization of Intel Xeon processors and Intel Xeon Phi coprocessors.

  20. Intel Many Integrated Core (MIC) architecture optimization strategies for a memory-bound Weather Research and Forecasting (WRF) Goddard microphysics scheme

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Goddard cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. WRF is one of the most widely used weather prediction systems in the world, and its development is a collaborative effort around the globe. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the code of this important part of WRF. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs, rather than just kernels as GPUs do. The MIC coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers, although getting maximum performance out of the MIC requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 4.7x. Furthermore, the same optimizations improved performance on a dual socket Intel Xeon E5-2670 system by a factor of 2.8x compared to the original code.
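    One family of optimizations typical for this kind of memory-bound scheme is making the inner grid-point loops vectorize with unit-stride access, so the 512-bit vector units on the Xeon Phi (and AVX on the host Xeon) process many grid points per instruction. An illustrative sketch under that assumption (not the WRF source; function and variable names are invented):

```cpp
// Saturation-adjustment-style inner loop over grid points: unit-stride
// arrays plus an explicit SIMD hint let the compiler vectorize cleanly.
void saturationAdjust(int n, const float* __restrict qv,
                      const float* __restrict qsat, float* __restrict qc) {
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        float excess = qv[i] - qsat[i];          // supersaturation
        qc[i] += excess > 0.0f ? excess : 0.0f;  // condense excess vapor
    }
}
```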

  1. Intel Corporation osaleb Eesti koolitusprogrammis / Raivo Juurak

    Index Scriptorium Estoniae

    Juurak, Raivo, 1949-

    2001-01-01

    The Ministry of Education presented an information technology training programme in which Intel Corporation, one of the world's largest computer companies, participates. During the course, subject teachers are taught to use the possibilities of the Internet in their own lessons. The 50-hour courses will be conducted in all counties.

  2. 3-D electromagnetic plasma particle simulations on the Intel Delta parallel computer

    International Nuclear Information System (INIS)

    Wang, J.; Liewer, P.C.

    1994-01-01

    A three-dimensional electromagnetic PIC code has been developed on the 512-node Intel Touchstone Delta MIMD parallel computer. This code is based on the General Concurrent PIC algorithm which uses a domain decomposition to divide the computation among the processors. The 3D simulation domain can be partitioned into 1-, 2-, or 3-dimensional sub-domains. Particles must be exchanged between processors as they move among the subdomains. The Intel Delta allows one to use this code for very-large-scale simulations (i.e. over 10^8 particles and 10^6 grid cells). The parallel efficiency of this code is measured, and the overall code performance on the Delta is compared with that on Cray supercomputers. It is shown that the code runs with a high parallel efficiency of ≥ 95% for large size problems. The particle push time achieved is 115 nsecs/particle/time step for 162 million particles on 512 nodes. Compared with the performance on a single processor Cray C90, this represents a factor of 58 speedup. The code uses a finite-difference leapfrog method for the field solve, which is significantly more efficient than fast Fourier transforms on parallel computers. The performance of this code on the 128-node Cray T3D will also be discussed.
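    The time advance at the heart of each PIC step is the leapfrog update, with velocities staggered half a step from positions. A minimal electrostatic sketch (the full electromagnetic push adds the magnetic rotation, e.g. the Boris scheme; names here are illustrative, not the paper's code):

```cpp
struct Particle { double x[3], v[3]; };

// Leapfrog push for one particle: E is the field interpolated to the
// particle position, qm = q/m. The kick advances v from t-dt/2 to t+dt/2;
// the drift then advances x from t to t+dt using the mid-step velocity.
void leapfrogPush(Particle& p, const double E[3], double qm, double dt) {
    for (int k = 0; k < 3; ++k) {
        p.v[k] += qm * E[k] * dt;   // kick
        p.x[k] += p.v[k] * dt;      // drift
    }
}
```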

  3. Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing

    OpenAIRE

    Kanamori, Issaku; Matsufuru, Hideo

    2017-01-01

    We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel Xeon Phi Knights Landing (KNL). The most time consuming part of the numerical simulations of lattice QCD is the solver of a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of KNL, such as using intrinsics and manual prefetching, to the matrix multiplication an...

  4. 75 FR 21353 - Intel Corporation, Fab 20 Division, Including On-Site Leased Workers From Volt Technical...

    Science.gov (United States)

    2010-04-23

    ... DEPARTMENT OF LABOR Employment and Training Administration [TA-W-73,642] Intel Corporation, Fab 20... of Intel Corporation, Fab 20 Division, including on-site leased workers of Volt Technical Resources... Precision, Inc. were employed on-site at the Hillsboro, Oregon location of Intel Corporation, Fab 20...

  5. MILC staggered conjugate gradient performance on Intel KNL

    Energy Technology Data Exchange (ETDEWEB)

    Li, Ruizi [Indiana Univ., Bloomington, IN (United States). Dept. of Physics; Detar, Carleton [Univ. of Utah, Salt Lake City, UT (United States). Dept. of Physics and Astronomy; Doerfler, Douglas W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC); Gottlieb, Steven [Indiana Univ., Bloomington, IN (United States). Dept. of Physics; Jha, Asish [Intel Corp., Hillsboro, OR (United States). Software and Services Group; Kalamkar, Dhiraj [Intel Labs., Bangalore (India). Parallel Computing Lab.; Toussaint, Doug [Univ. of Arizona, Tucson, AZ (United States). Physics Dept.

    2016-11-03

    We review our work done to optimize the staggered conjugate gradient (CG) algorithm in the MILC code for use with the Intel Knights Landing (KNL) architecture. KNL is the second generation Intel Xeon Phi processor. It is capable of massive thread parallelism and data parallelism, has high on-board memory bandwidth, and is being adopted in supercomputing centers for scientific research. The CG solver consumes the majority of time in production running, so we have spent most of our effort on it. We compare performance of an MPI+OpenMP baseline version of the MILC code with a version incorporating the QPhiX staggered CG solver, for both one-node and multi-node runs.
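    For readers unfamiliar with the kernel, the structure of the (unpreconditioned) CG iteration is sketched below. In MILC the operator application would be the staggered Dslash-squared operator; here it is left as a user-supplied callback. This is a textbook sketch, not the QPhiX code:

```cpp
#include <vector>
#include <cmath>

using Vec = std::vector<double>;
using ApplyA = void (*)(const Vec& x, Vec& y);   // y = A*x, user supplied

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate gradient for A x = b with symmetric positive definite A.
// Each iteration costs one operator application, two dot products,
// and three vector updates. Returns the iteration count on convergence.
int cg(ApplyA A, const Vec& b, Vec& x, double tol, int maxIter) {
    const size_t n = b.size();
    Vec r(b), p(n), Ap(n);
    A(x, Ap);
    for (size_t i = 0; i < n; ++i) { r[i] -= Ap[i]; p[i] = r[i]; }
    double rr = dot(r, r);
    for (int it = 0; it < maxIter; ++it) {
        if (std::sqrt(rr) < tol) return it;
        A(p, Ap);
        double alpha = rr / dot(p, Ap);
        for (size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rrNew = dot(r, r);
        double beta = rrNew / rr;
        for (size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rrNew;
    }
    return maxIter;
}
```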

  6. Parallel community climate model: Description and user's guide

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  7. An INTEL 8080 microprocessor development system

    International Nuclear Information System (INIS)

    Horne, P.J.

    1977-01-01

    The INTEL 8080 has become one of the two most widely used microprocessors at CERN, the other being the MOTOROLA 6800. Even though this is the case, there have been, to date, only rudimentary facilities available for aiding the development of application programs for this microprocessor. An ideal development system is one which has a sophisticated editing and filing system, an assembler/compiler, and access to the microprocessor application. In many instances access to a PROM programmer is also required, as the application may utilize only PROMs for program storage. With these thoughts in mind, an INTEL 8080 microprocessor development system was implemented in the Proton Synchrotron (PS) Division. This system utilizes a PDP 11/45 as the editing and file-handling machine, and an MSC 8/MOD 80 microcomputer for assembling, PROM programming and debugging user programs at run time. The two machines are linked by an existing CAMAC crate system which will also provide the means of access to microprocessor applications in CAMAC and the interface of the development system to any other application. (Auth.)

  8. Vectorization for Molecular Dynamics on Intel Xeon Phi Coprocessors

    Science.gov (United States)

    Yi, Hongsuk

    2014-03-01

    Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512-bit vector registers for high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Tersoff potential for covalently bonded solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multiple levels of parallelism, combining tightly coupled thread- and task-level parallelism with the 512-bit vector registers. The simulation results show that the parallel performance of the SIMD implementation on the Xeon Phi is apparently superior to that of the x86 CPU architecture.
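    A sketch of the SIMD pattern involved, with an invented pair-interaction kernel standing in for the (many-body) Tersoff terms: each vector lane processes one neighbor, and the reduction accumulates the force on the central atom.

```cpp
// Force on one atom from its neighbor list. dx/dy/dz hold the distance
// components to each neighbor in unit-stride arrays so the loop vectorizes;
// the pair force below is a placeholder, not the Tersoff potential.
void forceOnAtom(int nNeighbors, const double* __restrict dx,
                 const double* __restrict dy, const double* __restrict dz,
                 double* fx, double* fy, double* fz) {
    double ax = 0.0, ay = 0.0, az = 0.0;
    #pragma omp simd reduction(+:ax,ay,az)
    for (int j = 0; j < nNeighbors; ++j) {
        double r2  = dx[j]*dx[j] + dy[j]*dy[j] + dz[j]*dz[j];
        double inv = 1.0 / (r2 * r2 * r2);        // r^-6, Lennard-Jones-like
        double w   = inv * (inv - 0.5) / r2;      // placeholder pair force
        ax += w * dx[j]; ay += w * dy[j]; az += w * dz[j];
    }
    *fx = ax; *fy = ay; *fz = az;
}
```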

  9. 75 FR 48338 - Intel Corporation; Analysis of Proposed Consent Order to Aid Public Comment

    Science.gov (United States)

    2010-08-10

    ... product road maps, its compilers, and product benchmarking (Sections VI, VII, and VIII). The Proposed... alleges that Intel's failure to fully disclose the changes it made to its compilers and libraries... benchmarking organizations the effects of its compiler redesign on non-Intel CPUs. Several benchmarking...

  10. Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor

    Energy Technology Data Exchange (ETDEWEB)

    Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack; Matveev, Zakhar

    2017-05-23

    In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for the Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science applications to achieve up to 4.6 times speed-up and prepare for future exascale architectures. # Goal/Relevance of Session The roofline model [1,2] is a powerful tool for analyzing the performance of applications with respect to the theoretical peak achievable on a given computer architecture. It allows one to graphically represent the performance of an application in terms of operational intensity, i.e. the ratio of flops performed to bytes moved from memory, in order to guide optimization efforts. Given the scale and complexity of modern science applications, it can often be a tedious task for the user to perform the analysis on the level of functions or loops to identify where performance gains can be made. With new Intel tools, it is now possible to automate this task, as well as base the estimates of peak performance on measurements rather than vendor specifications. The goal of this session is to demonstrate how the roofline feature of Intel Advisor can be used to balance memory vs. computation related optimization efforts and effectively identify performance bottlenecks. A series of typical optimization techniques (cache blocking, structure refactoring, data alignment, and vectorization), illustrated by the kernel cases, will be addressed. # Description of the codes ## XGC1 The XGC1 code [3] is a magnetic fusion Particle-In-Cell code that uses an unstructured mesh for its Poisson solver that allows it to accurately resolve the edge plasma of a magnetic fusion device. After
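    The roofline bound itself is one line of arithmetic: attainable performance is the minimum of the compute peak and the product of operational intensity and memory bandwidth. A sketch with illustrative numbers (not measurements from the session):

```cpp
#include <algorithm>
#include <cstdio>

// Attainable GFLOP/s under the roofline model: the lesser of the compute
// peak and (operational intensity x memory bandwidth).
double roofline(double peakGflops, double bwGBs, double flopsPerByte) {
    return std::min(peakGflops, flopsPerByte * bwGBs);
}

int main() {
    // e.g. a kernel doing 0.25 flop/byte on a 400 GB/s, 2000 GFLOP/s part
    std::printf("attainable: %.0f GFLOP/s\n", roofline(2000.0, 400.0, 0.25));
    return 0;   // prints 100: memory bound, far below the compute peak
}
```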

  11. PCG: A software package for the iterative solution of linear systems on scalar, vector and parallel computers

    Energy Technology Data Exchange (ETDEWEB)

    Joubert, W. [Los Alamos National Lab., NM (United States); Carey, G.F. [Univ. of Texas, Austin, TX (United States)

    1994-12-31

    A great need exists for high performance numerical software libraries transportable across parallel machines. This talk concerns the PCG package, which solves systems of linear equations by iterative methods on parallel computers. The features of the package are discussed, as well as techniques used to obtain high performance and transportability across architectures. Representative numerical results are presented for several machines including the Connection Machine CM-5, Intel Paragon and Cray T3D parallel computers.

  12. GW Calculations of Materials on the Intel Xeon-Phi Architecture

    Science.gov (United States)

    Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek; Biller, Ariel; Chelikowsky, James R.; Louie, Steven G.

    Intel Xeon-Phi processors are expected to power a large number of High-Performance Computing (HPC) systems around the United States and the world in the near future. We evaluate the ability of GW and prerequisite Density Functional Theory (DFT) calculations for materials to utilize the Xeon-Phi architecture. We describe the optimization process and performance improvements achieved. We find that the GW method, like other higher level many-body methods beyond standard local/semilocal approximations to Kohn-Sham DFT, is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-waves, band-pairs and frequencies. Support provided by the SCIDAC program, Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences. Grant Numbers DE-SC0008877 (Austin) and DE-AC02-05CH11231 (LBNL).

  13. Windows for Intel Macs

    CERN Document Server

    Ogasawara, Todd

    2008-01-01

    Even the most devoted Mac OS X user may need to use Windows XP, or may just be curious about XP and its applications. This Short Cut is a concise guide for OS X users who need to quickly get comfortable and become productive with Windows XP basics on their Macs. It covers security, networking, and applications. Mac users can easily install and use Windows thanks to Boot Camp and Parallels Desktop for Mac. Boot Camp lets an Intel-based Mac install and boot Windows XP on its own hard drive partition. Parallels Desktop for Mac uses virtualization technology to run Windows XP (or other operating systems

  14. Towards Porting a Real-World Seismological Application to the Intel MIC Architecture

    OpenAIRE

    V. Weinberg

    2014-01-01

    This whitepaper aims to discuss first experiences with porting an MPI-based real-world geophysical application to the new Intel Many Integrated Core (MIC) architecture. The selected code SeisSol is an application written in Fortran that can be used to simulate earthquake rupture and radiating seismic wave propagation in complex 3-D heterogeneous materials. The PRACE prototype cluster EURORA at CINECA, Italy, was accessed to analyse the MPI-performance of SeisSol on Intel Xeon Phi on both sing...

  15. Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Weather Research and Forecast (WRF) model is the most widely used community weather forecast and research model in the world. There are two distinct varieties of WRF. The Advanced Research WRF (ARW) is an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In this paper, we use the Intel Many Integrated Core (MIC) architecture to substantially increase the performance of a zonal advection subroutine. It is one of the most time consuming routines in the ARW dynamics core. Advection advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small-timestep pressure gradient tendency. We describe the challenges we met during the development of a high-speed dynamics code subroutine for the MIC architecture. Furthermore, lessons learned from the code optimization process are discussed. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 2.4x.

  16. Connecting Effective Instruction and Technology. Intel-elebration: Safari.

    Science.gov (United States)

    Burton, Larry D.; Prest, Sharon

    Intel-ebration is an attempt to integrate the following research-based instructional frameworks and strategies: (1) dimensions of learning; (2) multiple intelligences; (3) thematic instruction; (4) cooperative learning; (5) project-based learning; and (6) instructional technology. This paper presents a thematic unit on safari, using the…

  17. CAMSHIFT Tracker Design Experiments With Intel OpenCV and SAI

    National Research Council Canada - National Science Library

    Francois, Alexandre R

    2004-01-01

    ... (including multi-modal) systems, must be specifically addressed. This report describes design and implementation experiments for CAMSHIFT-based tracking systems using Intel's Open Computer Vision library and SAI...
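    For orientation, a minimal sketch of one CAMSHIFT tracking step using OpenCV's current C++ API (the report used the earlier C interface; variable and function names here are illustrative): a hue histogram is back-projected onto the frame and cv::CamShift adapts the search window to the probability mass.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>

// One CAMSHIFT step: back-project a precomputed hue histogram onto the
// current frame, then let CamShift shift and resize the search window.
cv::RotatedRect trackStep(const cv::Mat& frameBgr, const cv::Mat& hueHist,
                          cv::Rect& window) {
    cv::Mat hsv, hue, backproj;
    cv::cvtColor(frameBgr, hsv, cv::COLOR_BGR2HSV);
    int ch[] = {0, 0};                       // copy the hue channel only
    hue.create(hsv.size(), CV_8U);
    cv::mixChannels(&hsv, 1, &hue, 1, ch, 1);
    float range[] = {0, 180};
    const float* ranges[] = {range};
    cv::calcBackProject(&hue, 1, 0, hueHist, backproj, ranges);
    return cv::CamShift(backproj, window,
        cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1.0));
}
```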

  18. A Portable Parallel Implementation of the U.S. Navy Layered Ocean Model

    Science.gov (United States)

    1995-01-01

    [Only fragmentary front matter survives from this record. Recoverable information: authors A. J. Wallcraft (Planning Systems Inc.) and D. R. Moore (Imperial College, Dept. of Mathematics); presented at the 1° Encontro de Métodos Numéricos para Equações de Derivadas Parciais; target machines mentioned include Kendall Square, hypercubes, DEC Alpha, Cray T3D/E, Sun Sparc, Fujitsu AP1000, Intel i860 and Paragon.]

  19. Full cycle trigonometric function on Intel Quartus II Verilog

    Science.gov (United States)

    Mustapha, Muhazam; Zulkarnain, Nur Antasha

    2018-02-01

    This paper discusses an improvement on previous research on hardware based trigonometric calculations. The tangent function is also implemented to complete the set. The functions have been simulated using Quartus II, where the results are compared to the previous work. The number of bits has also been extended for each trigonometric function. The design is based on RTL due to its resource efficient nature. At the first stage, a technology independent test bench simulation was conducted on ModelSim due to its convenience in capturing simulation data, so that accuracy information can be obtained. At the second stage, Intel/Altera Quartus II is used to simulate on a technology dependent platform, particularly the one belonging to Intel/Altera itself. Real data on the number of logic elements used and the propagation delay have also been obtained.

  20. Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.

    Science.gov (United States)

    Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal

    2015-06-01

    Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.

  1. Trusted Computing Technologies, Intel Trusted Execution Technology.

    Energy Technology Data Exchange (ETDEWEB)

    Guise, Max Joseph; Wendt, Jeremy Daniel

    2011-01-01

    We describe the current state-of-the-art in Trusted Computing Technologies - focusing mainly on Intel's Trusted Execution Technology (TXT). This document is based on existing documentation and tests of two existing TXT-based systems: Intel's Trusted Boot and Invisible Things Lab's Qubes OS. We describe what features are lacking in current implementations, describe what a mature system could provide, and present a list of developments to watch. Critical systems perform operation-critical computations on high importance data. In such systems, the inputs, computation steps, and outputs may be highly sensitive. Sensitive components must be protected from both unauthorized release, and unauthorized alteration: Unauthorized users should not access the sensitive input and sensitive output data, nor be able to alter them; the computation contains intermediate data with the same requirements, and executes algorithms that the unauthorized should not be able to know or alter. Due to various system requirements, such critical systems are frequently built from commercial hardware, employ commercial software, and require network access. These hardware, software, and network system components increase the risk that sensitive input data, computation, and output data may be compromised.

  2. Radiation Failures in Intel 14nm Microprocessors

    Science.gov (United States)

    Bossev, Dobrin P.; Duncan, Adam R.; Gadlage, Matthew J.; Roach, Austin H.; Kay, Matthew J.; Szabo, Carl; Berger, Tammy J.; York, Darin A.; Williams, Aaron; LaBel, K.

    2016-01-01

    In this study the 14 nm Intel Broadwell 5th generation core series 5005U-i3 and 5200U-i5 were mounted on Dell Inspiron laptops, MSI Cubi and Gigabyte Brix barebones and tested with Windows 8 and CentOS 7 at idle. Heavy-ion-induced hard and catastrophic failures do not appear to be related to the Intel 14 nm Tri-Gate FinFET process. They originate from a small (9 μm × 140 μm) area on the 32 nm planar PCH die, not the CPU as initially speculated. The hard failures seem to be due to a SEE but the exact physical mechanism has yet to be identified. Some possibilities include latch-ups, charge ion trapping or implantation, ion channels, or a combination of those (in biased conditions). The mechanism of the catastrophic failures seems related to the presence of electric power (1.05 V core voltage). The 1064 nm laser mimics ionizing radiation and induces soft and hard failures as a direct result of electron-hole pair production, not heat. The 14 nm FinFET processes continue to look promising for space radiation environments.

  3. Real-time data acquisition and feedback control using Linux Intel computers

    International Nuclear Information System (INIS)

    Penaflor, B.G.; Ferron, J.R.; Piglowski, D.A.; Johnson, R.D.; Walker, M.L.

    2006-01-01

    This paper describes the experiences of the DIII-D programming staff in adapting Linux based Intel computing hardware for use in real-time data acquisition and feedback control systems. Due to the highly dynamic and unstable nature of magnetically confined plasmas in tokamak fusion experiments, real-time data acquisition and feedback control systems are in routine use with all major tokamaks. At DIII-D, plasmas are created and sustained using a real-time application known as the digital plasma control system (PCS). During each experiment, the PCS periodically samples data from hundreds of diagnostic signals and provides these data to control algorithms implemented in software. These algorithms compute the necessary commands to send to various actuators that affect plasma performance. The PCS consists of a group of rack mounted Intel Xeon computer systems running an in-house customized version of the Linux operating system tailored specifically to meet the real-time performance needs of the plasma experiments. This paper provides a more detailed description of the real-time computing hardware and custom developed software, including recent work to utilize dual Intel Xeon equipped computers within the PCS

  4. Analysis of Intel IA-64 Processor Support for Secure Systems

    National Research Council Canada - National Science Library

    Unalmis, Bugra

    2001-01-01

    .... Systems could be constructed for which serious security threats would be eliminated. This thesis explores the Intel IA-64 processor's hardware support and its relationship to software for building a secure system...

  5. Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study

    OpenAIRE

    Rucci, Enzo; De Giusti, Armando Eduardo; Naiouf, Marcelo

    2017-01-01

    Manycores are consolidating in the HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of the Intel Xeon Phi architecture. While optimizing applications on CPUs, GPUs and the first Xeon Phi has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm ...
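    For context, the Floyd-Warshall kernel is a triply nested min-plus update; the blocked variant studied for KNL tiles these loops so each tile stays resident in cache while the innermost loop vectorizes. A plain (unblocked) sketch of the same update rule:

```cpp
#include <vector>
#include <algorithm>

// All-pairs shortest paths on an n x n distance matrix d (row-major):
// after iteration k, d[i][j] is the shortest path using only
// intermediate vertices 0..k. The blocked variant applies this same
// rule tile by tile to improve cache locality.
void floydWarshall(int n, std::vector<double>& d) {
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i) {
            double dik = d[i * n + k];
            #pragma omp simd
            for (int j = 0; j < n; ++j)
                d[i * n + j] = std::min(d[i * n + j], dik + d[k * n + j]);
        }
}
```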

  6. Parallel decomposition of the tight-binding fictitious Lagrangian algorithm for molecular dynamics simulations of semiconductors

    International Nuclear Information System (INIS)

    Yeh, M.; Kim, J.; Khan, F.S.

    1995-01-01

    We present a parallel decomposition of the tight-binding fictitious Lagrangian algorithm for the Intel iPSC/860 and the Intel Paragon parallel computers. We show that it is possible to perform long simulations, of the order of 10 000 time steps, on semiconducting clusters consisting of as many as 512 atoms, on a time scale of the order of 20 h or less. We have made a very careful timing analysis of all parts of our code, and have identified the bottlenecks. We have also derived formulas which can predict the timing of our code, based on the number of processors, message passing bandwidth, floating point performance of each node, and the set up time for message passing, appropriate to the machine being used. The time of the simulation scales as the square of the number of particles, if the number of processors is made to scale linearly with the number of particles. We show that for a system as large as 512 atoms, the main bottleneck of the computation is the orthogonalization of the wave functions, which consumes about 90% of the total time of the simulation

  7. Lawrence Livermore National Laboratory selects Intel Itanium 2 processors for world's most powerful Linux cluster

    CERN Multimedia

    2003-01-01

    "Intel Corporation, system manufacturer California Digital and the University of California at Lawrence Livermore National Laboratory (LLNL) today announced they are building one of the world's most powerful supercomputers. The supercomputer project, codenamed "Thunder," uses nearly 4,000 Intel® Itanium® 2 processors... is expected to be complete in January 2004" (1 page).

  8. Experience with Intel's many integrated core architecture in ATLAS software

    International Nuclear Information System (INIS)

    Fleischmann, S; Neumann, M; Kama, S; Lavrijsen, W; Vitillo, R

    2014-01-01

    Intel recently released the first commercial boards of its Many Integrated Core (MIC) Architecture. MIC is Intel's solution for the domain of throughput computing, currently dominated by general purpose programming on graphics processors (GPGPU). MIC allows the use of the more familiar x86 programming model and supports standard technologies such as OpenMP, MPI, and Intel's Threading Building Blocks (TBB). This should make it possible to develop for both throughput and latency devices using a single code base. In ATLAS Software, track reconstruction has been shown to be a good candidate for throughput computing on GPGPU devices. In addition, the newly proposed offline parallel event-processing framework, GaudiHive, uses TBB for task scheduling. The MIC is thus, in principle, a good fit for this domain. In this paper, we report our experiences of porting to and optimizing ATLAS tracking algorithms for the MIC, comparing the programmability and relative cost/performance of the MIC against those of current GPGPUs and latency-optimized CPUs.

  9. Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

    Energy Technology Data Exchange (ETDEWEB)

    Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.; Tallent, Nathan R.; Vishnu, Abhinav; Kerbyson, Darren J.

    2017-07-03

    Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors, including NVIDIA, Intel, AMD and IBM, have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, CaffeNet, AlexNet and GoogleNet topologies using the Cifar10 and ImageNet datasets. The workloads are vendor optimized for each architecture. GPUs provide the highest overall raw performance. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and KNL can be competitive when considering performance/watt. Furthermore, NVLink is critical to GPU scaling.

  10. Investigating the Use of the Intel Xeon Phi for Event Reconstruction

    Science.gov (United States)

    Sherman, Keegan; Gilfoyle, Gerard

    2014-09-01

    The physics goal of Jefferson Lab is to understand how quarks and gluons form nuclei and it is being upgraded to a higher, 12-GeV beam energy. The new CLAS12 detector in Hall B will collect 5-10 terabytes of data per day and will require considerable computing resources. We are investigating tools, such as the Intel Xeon Phi, to speed up the event reconstruction. The Kalman Filter is one of the methods being studied. It is a linear algebra algorithm that estimates the state of a system by combining existing data and predictions of those measurements. The tools required to apply this technique (i.e. matrix multiplication, matrix inversion) are being written using C++ intrinsics for Intel's Xeon Phi Coprocessor, which uses the Many Integrated Cores (MIC) architecture. The Intel MIC is a new high-performance chip that connects to a host machine through the PCIe bus and is built to run highly vectorized and parallelized code making it a well-suited device for applications such as the Kalman Filter. Our tests of the MIC optimized algorithms needed for the filter show significant increases in speed. For example, matrix multiplication of 5x5 matrices on the MIC was able to run up to 69 times faster than the host core.
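    The predict-and-correct structure of the Kalman filter is easiest to see in the scalar case. A minimal sketch (the CLAS12 kernels above work with 5x5 matrices, but the algebra has the same shape; this is illustrative, not the lab's code):

```cpp
// One predict/update cycle of a scalar Kalman filter with an identity
// state model. x: state estimate, p: its variance, q: process noise,
// r: measurement noise, z: new measurement.
struct Kalman1D {
    double x = 0.0, p = 1.0;
    void step(double z, double q, double r) {
        p += q;                       // predict: uncertainty grows
        double k = p / (p + r);       // Kalman gain
        x += k * (z - x);             // update: blend prediction and data
        p *= (1.0 - k);               // updated (reduced) uncertainty
    }
};
```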

  11. Evaluation of the Intel Westmere-EX server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2011-01-01

    One year after the arrival of the Intel Xeon 7500 systems (“Nehalem-EX”), CERN openlab is presenting a set of benchmark results obtained when running on the new Xeon E7-4870 Processors, representing the “Westmere-EX” family. A modern 4-socket, 40-core system is confronted with the previous generation of expandable (“EX”) platforms, represented by a 4-socket, 32-core Intel Xeon X7560 based system – both being “top of the line” systems. Benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Symmetric MultiThreading (SMT), the cache sizes available, the configured memory topology, as well as the power configuration if throughput per watt is to be measured. As in previous activities, we have tried to do a good job of comparing like with like. In a “top of the line” comparison based on the HEPSPEC06 benchmark, the “We...

  12. Optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme for Intel Many Integrated Core (MIC) architecture

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.

    2015-05-01

    Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our results of optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers, although getting maximum performance out of the Xeon Phi requires some novel optimization techniques, which are discussed in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 1.3x.

  13. Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

    OpenAIRE

    Liu, Xu; Chen, Langshi; Firoz, Jesun S.; Qiu, Judy; Jiang, Lei

    2017-01-01

    Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we empirically evaluate various computing platforms including an Intel Xeon E5 CPU, an Nvidia GeForce GTX1070 GPU and a Xeon Phi 7210 processor codenamed Knights Landing (KNL) in the domain of parallel graph processing. We show that the KNL gains encouraging per...

  14. Intel Legend and CERN would build up high speed Internet

    CERN Multimedia

    2002-01-01

    Intel, Legend and the China Education and Research Network jointly announced on the 25th of April that they will cooperate to build the new generation high speed Internet over the next three years (1/2 page).

  15. A comparison of SuperLU solvers on the intel MIC architecture

    Science.gov (United States)

    Tuncel, Mehmet; Duran, Ahmet; Celebi, M. Serdar; Akaydin, Bora; Topkaya, Figen O.

    2016-10-01

    In many science and engineering applications, problems may result in solving a sparse linear system AX=B. For example, SuperLU_MCDT, a linear solver, was used for the large penta-diagonal matrices for 2D problems and hepta-diagonal matrices for 3D problems coming from an incompressible blood flow simulation (see [1]). It is important to test the status and potential improvements of state-of-the-art solvers on new technologies. In this work, sequential, multithreaded and distributed versions of SuperLU solvers (see [2]) are examined on the Intel Xeon Phi coprocessors using the offload programming model at the EURORA cluster of CINECA in Italy. We consider a portfolio of test matrices containing patterned matrices from UFMM ([3]) and randomly located matrices. This architecture can benefit from high parallelism and large vectors. We find that the sequential SuperLU benefited from up to 45 % performance improvement from the offload programming, depending on the sparse matrix type and the size of transferred and processed data.
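    The offload model referred to here is the Intel compiler's explicit offload extension for the Xeon Phi (since retired): data named in the in/out/inout clauses is copied over PCIe around the annotated region, which executes on the coprocessor. A minimal sketch, with invented function names:

```cpp
// Compiled with the classic Intel C++ compiler's offload extensions.
// Functions run on the coprocessor must be marked for MIC compilation.
__attribute__((target(mic))) void scale(double* a, int n, double s) {
    for (int i = 0; i < n; ++i) a[i] *= s;
}

void runOnMic(double* a, int n) {
    // a[0..n) is copied to the card, the call runs there, and the
    // result is copied back when the offload region completes.
    #pragma offload target(mic:0) inout(a : length(n))
    scale(a, n, 2.0);
}
```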

  16. Staggered Dslash Performance on Intel Xeon Phi Architecture

    OpenAIRE

    Li, Ruizi; Gottlieb, Steven

    2014-01-01

    The conjugate gradient (CG) algorithm is among the most essential and time consuming parts of lattice calculations with staggered quarks. We test the performance of CG and dslash, the key step in the CG algorithm, on the Intel Xeon Phi, also known as the Many Integrated Core (MIC) architecture. We try different parallelization strategies using MPI, OpenMP, and the vector processing units (VPUs).

  17. Massively parallel computation of PARASOL code on the Origin 3800 system

    International Nuclear Information System (INIS)

    Hosokawa, Masanari; Takizuka, Tomonori

    2001-10-01

    The divertor particle simulation code named PARASOL simulates open-field plasmas between divertor walls self-consistently by using an electrostatic PIC method and a binary collision Monte Carlo model. PARASOL, parallelized with MPI-1.1 for scalar parallel computers, ran on the Intel Paragon XP/S system. An SGI Origin 3800 system was newly installed (May 2001), and the parallel programming was improved at this switchover. As a result of the high-performance new hardware and this improvement, PARASOL is sped up by about 60 times with the same number of processors. (author)

  18. Global synchronization algorithms for the Intel iPSC/860

    Science.gov (United States)

    Seidel, Steven R.; Davis, Mark A.

    1992-01-01

    In a distributed memory multicomputer that has no global clock, global processor synchronization can only be achieved through software. Global synchronization algorithms are used in tridiagonal systems solvers, CFD codes, sequence comparison algorithms, and sorting algorithms. They are also useful for event simulation, debugging, and for solving mutual exclusion problems. For the Intel iPSC/860 in particular, global synchronization can be used to ensure the most effective use of the communication network for operations such as the shift, where each processor in a one-dimensional array or ring concurrently sends a message to its right (or left) neighbor. Three global synchronization algorithms are considered for the iPSC/860: the gsync() primitive provided by Intel, the PICL primitive sync0(), and a new recursive doubling synchronization (RDS) algorithm. The performance of these algorithms is compared to the performance predicted by communication models of both the long and forced message protocols. Measurements of the cost of shift operations preceded by global synchronization show that the RDS algorithm always synchronizes the nodes more precisely and costs only slightly more than the other two algorithms.
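    The recursive doubling idea: in round k, each node exchanges a token with the node whose id differs in bit k, so after log2(P) rounds every node has (transitively) heard from every other node. A sketch expressed in MPI for clarity (the paper's implementations used iPSC/860 primitives), assuming a power-of-two node count:

```cpp
#include <mpi.h>

// Recursive-doubling barrier: log2(P) pairwise exchanges synchronize
// all P ranks. Assumes P is a power of two.
void rdsBarrier(MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    char token = 0;
    for (int mask = 1; mask < size; mask <<= 1) {
        int partner = rank ^ mask;            // flip bit k of the rank id
        MPI_Sendrecv(&token, 1, MPI_CHAR, partner, 0,
                     &token, 1, MPI_CHAR, partner, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}
```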

  19. A new parallel molecular dynamics algorithm for organic systems

    International Nuclear Information System (INIS)

    Plimpton, S.; Hendrickson, B.; Heffelfinger, G.

    1993-01-01

    A new parallel algorithm for simulating bonded molecular systems such as polymers and proteins by molecular dynamics (MD) is presented. In contrast to methods that extract parallelism by breaking the spatial domain into sub-pieces, the new method does not require regular geometries or uniform particle densities to achieve high parallel efficiency. For very large, regular systems spatial methods are often the best choice, but in practice the new method is faster for systems with tens-of-thousands of atoms simulated on large numbers of processors. It is also several times faster than the techniques commonly used for parallelizing bonded MD that assign a subset of atoms to each processor and require all-to-all communication. Implementation of the algorithm in a CHARMm-like MD model with many body forces and constraint dynamics is discussed and timings on the Intel Delta and Paragon machines are given. Example calculations using the algorithm in simulations of polymers and liquid-crystal molecules will also be briefly discussed

  20. Parallelization of pressure equation solver for incompressible N-S equations

    International Nuclear Information System (INIS)

    Ichihara, Kiyoshi; Yokokawa, Mitsuo; Kaburaki, Hideo.

    1996-03-01

    A pressure equation solver in a code for 3-dimensional incompressible flow analysis has been parallelized by using the red-black SOR method and the PCG method on the Fujitsu VPP500, a vector parallel computer with distributed memory. For a comparison of scalability, the solver using the red-black SOR method has also been parallelized on the Intel Paragon, a scalar parallel computer with distributed memory. The scalability of the red-black SOR method on both the VPP500 and the Paragon was lost when the number of processor elements was increased; the reason for the loss of scalability on both systems is the increasing communication time between processor elements. In addition, parallelization by DO-loop division lowers the vectorization efficiency on the VPP500. For an effective implementation on the VPP500, a large scale problem which holds very long vectorized DO-loops in the parallel program should be solved. The PCG method with the red-black SOR method applied to the incomplete LU factorization (red-black PCG) requires more iteration steps than the normal PCG method with forward and backward substitution, in spite of the same number of floating point operations in a DO-loop of the incomplete LU factorization. The parallelized red-black PCG method has fewer merits than the parallelized red-black SOR method when the computational region has fewer grids, because of the low vectorization efficiency obtained with the red-black PCG method. (author)
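    The red-black ordering at the core of the method colors the grid like a checkerboard, so every point of one color depends only on points of the other color and all points of a color can be updated in parallel. A minimal 2-D sweep sketch (5-point stencil for the Poisson equation u_xx + u_yy = f; names are illustrative):

```cpp
// One red-black SOR sweep on an nx x ny grid stored row-major in u.
// h2 is the squared grid spacing, omega the over-relaxation factor.
void rbsorSweep(int nx, int ny, double* u, const double* f,
                double h2, double omega) {
    for (int color = 0; color < 2; ++color) {
        #pragma omp parallel for
        for (int i = 1; i < nx - 1; ++i)
            for (int j = 1 + (i + color) % 2; j < ny - 1; j += 2) {
                // Gauss-Seidel value from the 5-point Laplacian stencil
                double gs = 0.25 * (u[(i-1)*ny + j] + u[(i+1)*ny + j]
                                  + u[i*ny + j-1] + u[i*ny + j+1]
                                  - h2 * f[i*ny + j]);
                u[i*ny + j] += omega * (gs - u[i*ny + j]);
            }
    }
}
```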

  1. High-performance computing on the Intel Xeon Phi how to fully exploit MIC architectures

    CERN Document Server

    Wang, Endong; Shen, Bo; Zhang, Guangyong; Lu, Xiaowei; Wu, Qing; Wang, Yajuan

    2014-01-01

    The aim of this book is to explain to high-performance computing (HPC) developers how to utilize the Intel® Xeon Phi™ series products efficiently. To that end, it introduces some computing grammar, programming technology and optimization methods for using many-integrated-core (MIC) platforms and also offers tips and tricks for actual use, based on the authors' first-hand optimization experience.The material is organized in three sections. The first section, "Basics of MIC", introduces the fundamentals of MIC architecture and programming, including the specific Intel MIC programming environment

  2. Exploring synchrotron radiation capabilities: The ALS-Intel CRADA

    International Nuclear Information System (INIS)

    Gozzo, F.; Cossy-Favre, A.; Padmore, H.

    1997-01-01

    Synchrotron radiation spectroscopy and spectromicroscopy were applied, at the Advanced Light Source, to the analysis of materials and problems of interest to the commercial semiconductor industry. The authors discuss some of the results obtained at the ALS using existing capabilities, in particular the small spot ultra-ESCA instrument on beamline 7.0 and the AMS (Applied Material Science) endstation on beamline 9.3.2. The continuing trend towards smaller feature size and increased performance for semiconductor components has driven the semiconductor industry to invest in the development of sophisticated and complex instrumentation for the characterization of microstructures. Among the crucial milestones established by the Semiconductor Industry Association are the needs for high quality, defect free and extremely clean silicon wafers, very thin gate oxides, lithographies near 0.1 micron and advanced material interconnect structures. The requirements of future generations cannot be met with current industrial technologies. The purpose of the ALS-Intel CRADA (Cooperative Research And Development Agreement) is to explore, compare and improve the utility of synchrotron-based techniques for practical analysis of substrates of interest to semiconductor chip manufacturing. The first phase of the CRADA project consisted in exploring existing ALS capabilities and techniques on some problems of interest. Some of the preliminary results obtained on Intel samples are discussed here

  3. Protein Alignment on the Intel Xeon Phi Coprocessor

    OpenAIRE

    Ramstad, Jorun

    2015-01-01

    There is an increasing need for sensitive, high performance sequence alignment tools. With the growing databases of scientifically analyzed protein sequences, more compute power is necessary. Specialized architectures arise, and a transition from serial to specialized implementations is required. This thesis is a study of whether Intel's 60-core Xeon Phi coprocessor is a suitable architecture for the implementation of a sequence alignment tool. The performance relative to existing tools is eval...

  4. Intel·ligència emocional a maternal

    OpenAIRE

    Missé Cortina, Jordi

    2015-01-01

    Inclusion of emotional intelligence activities in nursery classes A and B to work on the acquisition of values such as self-esteem, respect, tolerance, etc. Practicum for the Psychology program on Educational Psychology.

  5. Communication overhead on the Intel iPSC-860 hypercube

    Science.gov (United States)

    Bokhari, Shahid H.

    1990-01-01

    Experiments were conducted on the Intel iPSC-860 hypercube in order to evaluate the overhead of interprocessor communication. It is demonstrated that: (1) contrary to popular belief, the distance between two communicating processors has a significant impact on communication time, (2) edge contention can increase communication time by a factor of more than 7, and (3) node contention has no measurable impact.

  6. Reflective memory recorder upgrade: an opportunity to benchmark PowerPC and Intel architectures for real time

    Science.gov (United States)

    Abuter, Roberto; Tischer, Helmut; Frahm, Robert

    2014-07-01

    Several high frequency loops are required to run the VLTI (Very Large Telescope Interferometer), e.g. for fringe tracking, angle tracking, vibration cancellation and data capture. All these loops rely on low latency real time computers based on the VME bus and the Motorola PowerPC hardware architecture. In this context, one highly demanding application in terms of cycle time, latency and data transfer volume is the VLTI centralized recording facility, the so-called RMN recorder (Reflective Memory Recorder). This application captures and transfers data flowing through the distributed memory of the system in real time. Some of the VLTI data producers run at frequencies up to 8 KHz. With the evolution from first generation instruments like MIDI, PRIMA, and AMBER, which use one or two baselines, to second generation instruments like MATISSE and GRAVITY, which will use all six baselines simultaneously, the quantity of signals has increased by at least a factor of six. This has led to a significant overload of the RMN recorder, which has reached the natural limits imposed by the underlying hardware. At the same time, new, more powerful computers, based on the Intel multicore families of CPUs and PCI buses, have become available. To improve the performance of the RMN recorder application and make it capable of coping with the demands of the new generation instruments, a slightly modified implementation has been developed and integrated into an Intel based multicore computer running the VxWorks real time operating system. The core of the application is based on the standard VLT software framework for instruments. The real time task reads from the reflective memory using the onboard DMA access, and captured data is transferred to the outside world via a TCP socket on a dedicated Ethernet connection. The diversity of the software and hardware involved makes this application suitable as a benchmarking platform. A

  7. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  8. Evaluating the transport layer of the ALFA framework for the Intel® Xeon Phi™ Coprocessor

    Science.gov (United States)

    Santogidis, Aram; Hirstius, Andreas; Lalis, Spyros

    2015-12-01

    The ALFA framework supports the software development of major High Energy Physics experiments. As part of our research effort to optimize the transport layer of ALFA, we focus on profiling its data transfer performance for inter-node communication on the Intel Xeon Phi Coprocessor. In this article we present the collected performance measurements with the related analysis of the results. The optimization opportunities that are discovered, help us to formulate the future plans of enabling high performance data transfer for ALFA on the Intel Xeon Phi architecture.

  9. Performance tuning Weather Research and Forecasting (WRF) Goddard longwave radiative transfer scheme on Intel Xeon Phi

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2015-10-01

    The next-generation mesoscale numerical weather prediction system, the Weather Research and Forecasting (WRF) model, is designed for dual use for forecasting and research. WRF offers multiple physics options that can be combined in any way. One of the physics options is radiance computation. The major source of energy for the earth's climate is solar radiation. Thus, it is imperative to accurately model the horizontal and vertical distribution of the heating. The Goddard solar radiative transfer model includes the absorption due to water vapor, ozone, oxygen, carbon dioxide, clouds and aerosols. The model computes the interactions among the absorption and scattering by clouds, aerosols, molecules and surface. Finally, fluxes are integrated over the entire longwave spectrum. In this paper, we present our results of optimizing the Goddard longwave radiative transfer scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools, so the development environment is a familiar one to a vast number of CPU developers, although getting maximum performance out of MICs requires some novel optimization techniques, which are discussed in this paper. The optimizations improved the performance of the original Goddard longwave radiative transfer scheme on Xeon Phi 7120P by a factor of 2.2x. Furthermore, the same optimizations improved the performance of the Goddard longwave radiative transfer scheme on a dual socket configuration of eight core Intel Xeon E5-2670 CPUs by a factor of 2.1x compared to the original Goddard longwave radiative transfer scheme code.

  10. Application of a parallel 3-dimensional hydrogeochemistry HPF code to a proposed waste disposal site at the Oak Ridge National Laboratory

    International Nuclear Information System (INIS)

    Gwo, Jin-Ping; Yeh, Gour-Tsyh

    1997-01-01

    The objectives of this study are (1) to parallelize a 3-dimensional hydrogeochemistry code and (2) to apply the parallel code to a proposed waste disposal site at the Oak Ridge National Laboratory (ORNL). The 2-dimensional hydrogeochemistry code HYDROGEOCHEM, developed at the Pennsylvania State University for coupled subsurface solute transport and chemical equilibrium processes, was first modified to accommodate 3-dimensional problem domains. A bi-conjugate gradient stabilized linear matrix solver was then incorporated to solve the matrix equation. We chose to parallelize the 3-dimensional code on the Intel Paragons at ORNL by using an HPF (High Performance Fortran) compiler developed at PGI. The data- and task-parallel algorithms available in the HPF compiler proved to be highly efficient for the geochemistry calculation. This calculation can be easily implemented in HPF and is perfectly parallel because the chemical speciation on one finite-element node is virtually independent of those on the others. The parallel code was applied to a subwatershed of the Melton Branch at ORNL. Chemical heterogeneity, in addition to the physical heterogeneities of the geological formations, has been identified as one of the major factors that affect the fate and transport of contaminants at ORNL. This study demonstrated an application of the 3-dimensional hydrogeochemistry code to the Melton Branch site. A uranium tailing problem that involved aqueous complexation and precipitation-dissolution was tested. Performance statistics were collected on the Intel Paragons at ORNL. Implications of these results for the further optimization of the code are discussed.

  11. Scientific Programming with High Performance Fortran: A Case Study Using the xHPF Compiler

    Directory of Open Access Journals (Sweden)

    Eric De Sturler

    1997-01-01

    Recently, the first commercial High Performance Fortran (HPF) subset compilers have appeared. This article reports on our experiences with the xHPF compiler of Applied Parallel Research, version 1.2, for the Intel Paragon. At this stage, we do not expect very high performance from our HPF programs, even though performance will eventually be of paramount importance for the acceptance of HPF. Instead, our primary objective is to study how to convert large Fortran 77 (F77) programs to HPF such that the compiler generates reasonably efficient parallel code. We report on a case study that identifies several problems when parallelizing code with HPF; most of these problems affect current HPF compiler technology in general, although some are specific to the xHPF compiler. We discuss our solutions from the perspective of the scientific programmer, and present timing results on the Intel Paragon. The case study comprises three programs of different complexity with respect to parallelization. We use the dense matrix-matrix product to show that the distribution of arrays and the order of nested loops significantly influence the performance of the parallel program. We use Gaussian elimination with partial pivoting to study the parallelization strategy of the compiler. There are various ways to structure this algorithm for a particular data distribution. This example shows how much effort may be demanded from the programmer to support the compiler in generating an efficient parallel implementation. Finally, we use a small application to show that the more complicated structure of a larger program may introduce problems for the parallelization, even though all subroutines of the application are easy to parallelize by themselves. The application consists of a finite volume discretization on a structured grid and a nested iterative solver. Our case study shows that it is possible to obtain reasonably efficient parallel programs with xHPF, although the compiler
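
    The loop-order observation for the dense matrix-matrix product can be made concrete in a few lines. In row-major storage, the i-k-j order below keeps the innermost loop unit-stride in both B and C, which is typically much faster than the textbook i-j-k order; the same locality reasoning drives the choice of array distribution in HPF. A sketch with illustrative names and sizes; C is assumed zero-initialized by the caller.

        #define N 512

        void matmul_ikj(const double A[N][N], const double B[N][N], double C[N][N])
        {
            for (int i = 0; i < N; i++)
                for (int k = 0; k < N; k++) {
                    double a = A[i][k];              /* reused across the whole j loop */
                    for (int j = 0; j < N; j++)
                        C[i][j] += a * B[k][j];      /* unit-stride access to B and C */
                }
        }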

  12. Multi-processing CTH: Porting legacy FORTRAN code to MP hardware

    Energy Technology Data Exchange (ETDEWEB)

    Bell, R.L.; Elrick, M.G.; Hertel, E.S. Jr.

    1996-12-31

    CTH is a family of codes developed at Sandia National Laboratories for use in modeling complex multi-dimensional, multi-material problems that are characterized by large deformations and/or strong shocks. A two-step, second-order accurate Eulerian solution algorithm is used to solve the mass, momentum, and energy conservation equations. CTH has historically been run on systems where the data are directly accessible to the CPU, such as workstations and vector supercomputers. Multiple CPUs can be used if all data are accessible to all CPUs. This is accomplished by placing compiler directives or subroutine calls within the source code. The CTH team has implemented this scheme for Cray shared memory machines under the Unicos operating system. This technique is effective, but difficult to port to other (similar) shared memory architectures because each vendor has a different format of directives or subroutine calls. A different model of high performance computing is one where many (>1,000) CPUs work on a portion of the entire problem and communicate by passing messages that contain boundary data. Most, if not all, codes that run effectively on parallel hardware were written with a parallel computing paradigm in mind. Modifying an existing code written for serial machines poses a significantly different set of challenges, which will be discussed. CTH, a legacy FORTRAN code, has been modified to allow for solutions on distributed memory parallel computers such as the IBM SP2, the Intel Paragon, the Cray T3D, or a network of workstations. The message passing version of CTH will be discussed and example calculations will be presented along with performance data. Current timing studies indicate that CTH is 2-3 times faster than equivalent C++ code written specifically for parallel hardware. CTH on the Intel Paragon exhibits linear speedup with problems that are scaled (constant problem size per node) for the number of parallel nodes.
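
    The boundary-data message passing described above reduces, in its simplest form, to a ghost-cell exchange between neighboring ranks. The sketch below shows this pattern for a 1-D slab decomposition using MPI; the names and the decomposition are illustrative and are not taken from the CTH source.

        #include <mpi.h>

        /* u[0] and u[nlocal+1] are ghost cells; u[1..nlocal] are owned cells. */
        void exchange_ghosts(double *u, int nlocal, MPI_Comm comm)
        {
            int rank, size;
            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &size);
            int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
            int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

            /* send my first owned cell left, receive my right ghost cell */
            MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  0,
                         &u[nlocal + 1], 1, MPI_DOUBLE, right, 0,
                         comm, MPI_STATUS_IGNORE);
            /* send my last owned cell right, receive my left ghost cell */
            MPI_Sendrecv(&u[nlocal],     1, MPI_DOUBLE, right, 1,
                         &u[0],          1, MPI_DOUBLE, left,  1,
                         comm, MPI_STATUS_IGNORE);
        }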

  13. Implementation of High-Order Multireference Coupled-Cluster Methods on Intel Many Integrated Core Architecture.

    Science.gov (United States)

    Aprà, E; Kowalski, K

    2016-03-08

    In this paper we discuss the implementation of the multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of the two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of parallel performance by task reordering, which has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects of efficient optimization and vectorization strategies.

  14. Empirical study of parallel LRU simulation algorithms

    Science.gov (United States)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. The two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The other two SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second is completely general, whereas the third presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
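
    The simple linear-cost approach mentioned above can be stated compactly in serial form: keep the referenced tags on an LRU stack and report the depth at which each reference's tag is found, with a cold miss reported when the tag is absent. A minimal C sketch with illustrative names and a fixed stack bound:

        #include <string.h>

        #define STACK_MAX 1024

        static unsigned stack_tags[STACK_MAX];   /* stack_tags[0] is most recently used */
        static int stack_len = 0;

        /* Returns the 0-based stack distance of the reference, or -1 on a cold miss. */
        int stack_distance(unsigned tag)
        {
            int depth = -1;
            for (int i = 0; i < stack_len; i++)          /* linear in the stack size */
                if (stack_tags[i] == tag) { depth = i; break; }

            if (depth >= 0) {
                /* hit: close the gap above the tag's old position */
                memmove(&stack_tags[1], &stack_tags[0], depth * sizeof(unsigned));
            } else {
                /* cold miss: push, dropping the LRU tag if the stack is full */
                int shift = (stack_len < STACK_MAX) ? stack_len : STACK_MAX - 1;
                memmove(&stack_tags[1], &stack_tags[0], shift * sizeof(unsigned));
                if (stack_len < STACK_MAX) stack_len++;
            }
            stack_tags[0] = tag;                         /* new MRU element */
            return depth;
        }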

  15. A Fast parallel tridiagonal algorithm for a class of CFD applications

    Science.gov (United States)

    Moitra, Stuti; Sun, Xian-He

    1996-01-01

    The parallel diagonal dominant (PDD) algorithm is an efficient tridiagonal solver. This paper presents a variation of the PDD algorithm, the reduced PDD algorithm. The new algorithm maintains the minimum communication of the PDD algorithm, but has a reduced operation count. The PDD algorithm itself also has a smaller operation count than the conventional sequential algorithm for many applications. An accuracy analysis is provided for the reduced PDD algorithm for symmetric Toeplitz tridiagonal (STT) systems. Implementation results on Langley's Intel Paragon and IBM SP2 show that both the PDD and reduced PDD algorithms are efficient and scalable.
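
    For context, the conventional sequential solver that the PDD family is compared against is the Thomas algorithm, sketched below. This is a minimal sketch assuming a diagonally dominant system (as in the STT systems analyzed), so no pivoting is performed; the names are illustrative.

        /* a: sub-diagonal, b: main diagonal (overwritten), c: super-diagonal,
           d: right-hand side, overwritten with the solution. */
        void thomas(int n, const double *a, double *b, const double *c, double *d)
        {
            for (int i = 1; i < n; i++) {        /* forward elimination */
                double m = a[i] / b[i - 1];
                b[i] -= m * c[i - 1];
                d[i] -= m * d[i - 1];
            }
            d[n - 1] /= b[n - 1];                /* back substitution */
            for (int i = n - 2; i >= 0; i--)
                d[i] = (d[i] - c[i] * d[i + 1]) / b[i];
        }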

  16. A performance study of sparse Cholesky factorization on INTEL iPSC/860

    Science.gov (United States)

    Zubair, M.; Ghose, M.

    1992-01-01

    The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.

  17. Time-efficient simulations of tight-binding electronic structures with Intel Xeon Phi™ many-core processors

    Science.gov (United States)

    Ryu, Hoon; Jeong, Yosang; Kang, Ji-Hoon; Cho, Kyu Nam

    2016-12-01

    Modelling of multi-million-atom semiconductor structures is important as it not only predicts the properties of physically realizable novel materials, but can also accelerate advanced device designs. This work describes a new Technology Computer-Aided Design (TCAD) tool for nanoelectronics modelling, which uses a sp3d5s∗ tight-binding approach to describe multi-million-atom structures and simulates their electronic structures with high performance computing (HPC), including atomic effects such as alloy and dopant disorders. Named the Quantum simulation tool for Advanced Nanoscale Devices (Q-AND), the tool shows good scalability on traditional multi-core HPC clusters, implying a strong capability for large-scale electronic structure simulations, with particularly remarkable performance enhancement on recent clusters of Intel Xeon Phi™ coprocessors. A review of a recent modelling study conducted to understand an experimental work on highly phosphorus-doped silicon nanowires is presented to demonstrate the utility of Q-AND. Having been developed via an Intel Parallel Computing Center project, Q-AND will be opened to the public to establish a sound framework for nanoelectronics modelling with advanced many-core HPC clusters. With details of the development methodology and an exemplary study of dopant electronics, this work presents a practical guideline for TCAD development for researchers in the field of computational nanoelectronics.

  18. Parallel computing by Monte Carlo codes MVP/GMVP

    International Nuclear Information System (INIS)

    Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

    2001-01-01

    The general-purpose Monte Carlo codes MVP/GMVP are well vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve further speedups, we parallelized the codes on different types of parallel computing platforms using the standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer Fujitsu VPP500, a distributed-memory massively parallel computer Intel Paragon, and the distributed-memory scalar-parallel computers Hitachi SR2201 and IBM SP2. Linear speedup could generally be obtained for large-scale problems, but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% for a PWR full-core calculation with more than 10 million histories, which took about 1.5 hours with massively parallel computing. (author)
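
    The message-passing structure behind such Monte Carlo parallelization is simple: each processing element runs an independent batch of histories with its own random stream, and the tallies are combined with a reduction. The MPI sketch below illustrates the pattern only; it is not the MVP/GMVP source, and all names and the history count are illustrative.

        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            long histories = 1000000 / size;     /* batch of histories per PE */
            srand(1234 + rank);                  /* independent random streams */
            double tally = 0.0;
            for (long i = 0; i < histories; i++)
                tally += rand() / (double)RAND_MAX;   /* stand-in for one history */

            double total = 0.0;                  /* combine tallies on rank 0 */
            MPI_Reduce(&tally, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("mean score = %f\n", total / (double)(histories * size));
            MPI_Finalize();
            return 0;
        }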

  19. DOLIB: Distributed Object Library

    Energy Technology Data Exchange (ETDEWEB)

    D'Azevedo, E.F.; Romine, C.H.

    1994-10-01

    This report describes the use and implementation of DOLIB (Distributed Object Library), a library of routines that emulates global or virtual shared memory on Intel multiprocessor systems. Access to a distributed global array is through explicit calls to gather and scatter. Advantages of using DOLIB include: dynamic allocation and freeing of huge (gigabyte) distributed arrays, both C and FORTRAN callable interfaces, and the ability to mix shared-memory and message-passing programming models for ease of use and optimal performance. DOLIB is independent of language and compiler extensions and requires no special operating system support. DOLIB also supports automatic caching of read-only data for high performance. The virtual shared memory support provided in DOLIB is well suited for implementing Lagrangian particle tracking techniques. We have also used DOLIB to create DONIO (Distributed Object Network I/O Library), which obtains over a 10-fold improvement in disk I/O performance on the Intel Paragon.
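
    In the gather/scatter style described above, all access to a distributed global array goes through explicit library calls. The sketch below illustrates the idea for Lagrangian particle tracking; the dolib_gather/dolib_scatter interface and the handle type shown are hypothetical shorthand for the report's description, not DOLIB's actual C API.

        typedef int dolib_array;   /* hypothetical handle to a distributed global array */

        /* Assumed semantics: copy global elements at the given indices into a
           local buffer, and write a local buffer back to the global indices. */
        void dolib_gather(dolib_array g, const int *idx, int n, double *local);
        void dolib_scatter(dolib_array g, const int *idx, int n, const double *local);

        /* Particle tracking reads field values at arbitrary global cells, which
           is why explicit gather on a virtual shared array fits the technique. */
        void advance_particles(dolib_array field, const int *cell_of_particle,
                               double *value, int nparticles)
        {
            dolib_gather(field, cell_of_particle, nparticles, value);
            /* ... move each particle using its gathered field value ... */
        }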

  1. Implementation of a 3-D nonlinear MHD [magnetohydrodynamics] calculation on the Intel hypercube

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Hicks, H.R.; Lawkins, W.F.

    1987-01-01

    The optimization of numerical schemes and increasing computer capabilities in the last ten years have improved the efficiency of 3-D nonlinear resistive MHD calculations by about two to three orders of magnitude. However, we are still very limited in performing these types of calculations. Hypercubes have a large number of processors with only local memory and bidirectional links among neighbors. The Intel Hypercube at Oak Ridge has 64 processors with 0.5 megabytes of memory per processor. The multiplicity of processors opens new possibilities for the treatment of such computations. The constraint on time and resources favored the approach of using the existing RSF code, which solves the reduced set of MHD equations as an initial value problem for a periodic cylindrical geometry. This code includes minimal physics and geometry, but contains the basic three-dimensionality and nonlinear structure of the equations. The code solves the reduced set of MHD equations by Fourier expansion in two angular coordinates and finite differences in the radial one. Due to the continuing interest in these calculations and the likelihood that future supercomputers will take greater advantage of parallelism, the present study was initiated by the ORNL Exploratory Studies Committee and funded entirely by Laboratory Discretionary Funds. The objectives of the study were: to ascertain the suitability of MHD calculations for parallel computation, to design and implement a parallel algorithm to perform the computations, and to evaluate the hypercube, in particular ORNL's Intel iPSC, for use in MHD computations.

  2. Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned

    OpenAIRE

    Georgios Goumas

    2014-01-01

    In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved both the evaluation of programming models, including OpenCL, POSIX threads, and OpenMP, and typical optimization strategies such as parallelization and vectorization. Since the straightforward porting of the already existing OpenCL version of the code encountered performance problems that required further analysis, we focused our efforts on the impl...

  3. Single event effect testing of the Intel 80386 family and the 80486 microprocessor

    International Nuclear Information System (INIS)

    Moran, A.; LaBel, K.; Gates, M.; Seidleck, C.; McGraw, R.; Broida, M.; Firer, J.; Sprehn, S.

    1996-01-01

    The authors present single event effect test results for the Intel 80386 microprocessor, the 80387 coprocessor, the 82380 peripheral device, and the 80486 microprocessor. Both single event upset and latchup conditions were monitored.

  4. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture

    International Nuclear Information System (INIS)

    Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; Alexeev, Yuri

    2017-01-01

    The Hartree-Fock (HF) method in the quantum chemistry package GAMESS represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals (ERIs) and the building of the Fock matrix. These are the central components of the main Self Consistent Field (SCF) loop, the key hotspot in Electronic Structure (ES) codes. By threading the MPI ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4x to 6x for large systems), but also achieve a significant (>2x) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel® Xeon Phi™ supercomputer. Here, scaling numbers are reported on up to 7,680 cores on Intel Xeon Phi coprocessors.
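
    The "threaded MPI ranks" pattern described above can be sketched as a hybrid MPI/OpenMP program: fewer MPI ranks, each spawning OpenMP threads that share the rank's data and split the work batches among themselves, which is what cuts the memory footprint relative to pure MPI. A minimal sketch with an illustrative stand-in workload; this is not GAMESS code.

        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int provided, rank;
            /* threads exist, but only the master thread makes MPI calls */
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            double fock_contrib = 0.0;
            /* threads of one rank share a single copy of rank-level data and
               divide the integral batches among themselves */
            #pragma omp parallel for reduction(+ : fock_contrib)
            for (int batch = 0; batch < 1024; batch++)
                fock_contrib += 1e-3 * batch;    /* stand-in for ERI/Fock work */

            double total;
            MPI_Reduce(&fock_contrib, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0) printf("total = %f\n", total);
            MPI_Finalize();
            return 0;
        }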

  5. Performance optimization of Qbox and WEST on Intel Knights Landing

    Science.gov (United States)

    Zheng, Huihuo; Knight, Christopher; Galli, Giulia; Govoni, Marco; Gygi, Francois

    We present the optimization of the electronic structure codes Qbox and WEST targeting the Intel® Xeon Phi™ processor, codenamed Knights Landing (KNL). Qbox is an ab-initio molecular dynamics code based on plane wave density functional theory (DFT) and WEST is a post-DFT code for excited state calculations within many-body perturbation theory. Both Qbox and WEST employ highly scalable algorithms which enable accurate large-scale electronic structure calculations on leadership class supercomputer platforms beyond 100,000 cores, such as Mira and Theta at the Argonne Leadership Computing Facility. In this work, features of the KNL architecture (e.g. hierarchical memory) are explored to achieve higher performance in key algorithms of the Qbox and WEST codes and to develop a road-map for further development targeting next-generation computing architectures. In particular, the optimizations of the Qbox and WEST codes on the KNL platform will target efficient large-scale electronic structure calculations of nanostructured materials exhibiting complex structures and prediction of their electronic and thermal properties for use in solar and thermal energy conversion devices. This work was supported by MICCoM, as part of Comp. Mats. Sci. Program funded by the U.S. DOE, Office of Sci., BES, MSE Division. This research used resources of the ALCF, which is a DOE Office of Sci. User Facility under Contract DE-AC02-06CH11357.
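
    One KNL feature mentioned above, the on-package MCDRAM, can be targeted explicitly (in flat memory mode) through the memkind library's hbw_* interface. Below is a minimal sketch for placing a bandwidth-critical array in MCDRAM with a DDR4 fallback; the function name and usage are illustrative, not Qbox or WEST code.

        #include <hbwmalloc.h>
        #include <stdlib.h>

        /* Allocate n doubles in MCDRAM when present, otherwise in DDR4.
           The caller must release with hbw_free() or free() to match. */
        double *alloc_work_array(size_t n)
        {
            if (hbw_check_available() == 0)      /* 0 means MCDRAM is available */
                return (double *)hbw_malloc(n * sizeof(double));
            return (double *)malloc(n * sizeof(double));
        }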

  6. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™

    OpenAIRE

    Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.

    2015-01-01

    We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high perfo...

  7. Using Intel Xeon Phi to accelerate the WRF TEMF planetary boundary layer scheme

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen

    2014-05-01

    The Weather Research and Forecasting (WRF) model is designed for numerical weather prediction and atmospheric research. The WRF software infrastructure consists of several components such as dynamic solvers and physics schemes. Numerical models are used to resolve the large-scale flow, while subgrid-scale parameterizations estimate small-scale properties (e.g., boundary layer turbulence and convection, clouds, radiation). These have a significant influence on the resolved scale due to the complex nonlinear nature of the atmosphere. For the cloudy planetary boundary layer (PBL), it is fundamental to parameterize vertical turbulent fluxes and subgrid-scale condensation in a realistic manner. A parameterization based on the Total Energy - Mass Flux (TEMF) approach, which unifies turbulence and moist convection components, produces better results than the other PBL schemes. For that reason, the TEMF scheme was chosen as the PBL scheme to optimize for the Intel Many Integrated Core (MIC) architecture, which allows developers to run code at trillions of calculations per second using a familiar programming model. In this paper, we present our optimization results for the TEMF planetary boundary layer scheme. The optimizations performed were quite generic in nature: they included vectorization of the code to utilize the vector units inside each core, and improved memory access by scalarizing some of the intermediate arrays. The results show that the optimization improved MIC performance by 14.8x. Furthermore, the optimizations increased CPU performance by 2.6x compared to the original multi-threaded code on a quad-core Intel Xeon E5-2603 running at 1.8 GHz. Compared to the optimized code running on a single CPU socket, the optimized MIC code is 6.2x faster.
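
    The "scalarizing" optimization mentioned above replaces an intermediate array, used only within a single loop iteration, by a scalar, removing memory traffic and making vectorization easier for the compiler. A minimal before/after sketch in C with illustrative names; this is not the TEMF source.

        #define NK 128

        /* before: per-level results staged through an intermediate array */
        void before(const float *a, const float *b, float *out)
        {
            float tmp[NK];
            for (int k = 0; k < NK; k++)
                tmp[k] = a[k] * b[k];
            for (int k = 0; k < NK; k++)
                out[k] = tmp[k] + 1.0f;
        }

        /* after: loops fused and the array replaced by a scalar temporary */
        void after(const float *a, const float *b, float *out)
        {
            #pragma omp simd
            for (int k = 0; k < NK; k++) {
                float tmp = a[k] * b[k];         /* scalar, lives in a register */
                out[k] = tmp + 1.0f;
            }
        }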

  8. Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor

    OpenAIRE

    Lu, Mian; Zhang, Lei; Huynh, Huynh Phung; Ong, Zhongliang; Liang, Yun; He, Bingsheng; Goh, Rick Siow Mong; Huynh, Richard

    2013-01-01

    With its ease of programming, flexibility, and efficiency, MapReduce has become one of the most popular frameworks for building big-data applications. MapReduce was originally designed for distributed computing, and has been extended to various architectures, e.g., multi-core CPUs, GPUs and FPGAs. In this work, we focus on optimizing the MapReduce framework on the Xeon Phi, which is the latest product released by Intel based on the Many Integrated Core Architecture. To the best of our knowledge...

  9. Parallel implementation of many-body mean-field equations

    International Nuclear Information System (INIS)

    Chinn, C.R.; Umar, A.S.; Vallieres, M.; Strayer, M.R.

    1994-01-01

    We describe the numerical methods used to solve the system of stiff, nonlinear partial differential equations resulting from the Hartree-Fock description of many-particle quantum systems, as applied to the structure of the nucleus. The solutions are performed on a three-dimensional Cartesian lattice. Discretization is achieved through the lattice basis-spline collocation method, in which quantum-state vectors and coordinate-space operators are expressed in terms of basis-spline functions on a spatial lattice. All numerical procedures reduce to a series of matrix-vector multiplications and other elementary operations, which we perform on a number of different computing architectures, including the Intel Paragon and the Intel iPSC/860 hypercube. Parallelization is achieved through a combination of mechanisms employing the Gram-Schmidt procedure, broadcasts, global operations, and domain decomposition of state vectors. We discuss the approach to the problems of limited node memory and node-to-node communication overhead inherent in using distributed-memory, multiple-instruction, multiple-data stream parallel computers. An algorithm was developed to reduce the communication overhead by pipelining some of the message passing procedures

  10. OpenMP-accelerated SWAT simulation using Intel C and FORTRAN compilers: Development and benchmark

    Science.gov (United States)

    Ki, Seo Jin; Sugimura, Tak; Kim, Albert S.

    2015-02-01

    We developed a practical method to accelerate execution of the Soil and Water Assessment Tool (SWAT) using open (free) computational resources. The SWAT source code (rev 622) was recompiled using a non-commercial Intel FORTRAN compiler on the Ubuntu 12.04 LTS Linux platform, and newly named iOMP-SWAT in this study. The GNU utilities make, gprof, and diff were used to develop the iOMP-SWAT package, profile memory usage, and verify that parallel and serial simulations produce identical results. Among the 302 SWAT subroutines, the slowest routines were identified using GNU gprof, and later modified using the Open Multi-Processing (OpenMP) library in an 8-core shared memory system. In addition, a C wrapping function was used to rapidly set large arrays to zero, cross-compiled with the original SWAT FORTRAN package. A universal speedup ratio of 2.3 was achieved using input data sets with a large number of hydrological response units. As we specifically focus on acceleration of a single SWAT run, the use of iOMP-SWAT for parameter calibrations will significantly improve the performance of SWAT optimization.
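
    A C function of the kind described, which zeroes a Fortran array through a single memset call, can be sketched as below. The trailing-underscore, pass-by-reference binding shown is the classic GNU Fortran convention and is an assumption here; the abstract does not give the actual wrapper name used in iOMP-SWAT.

        #include <string.h>

        /* Hypothetical wrapper; callable from Fortran as:  call zero_array(x, n)
           (Fortran passes arguments by reference, hence the pointer to n.) */
        void zero_array_(double *x, const int *n)
        {
            memset(x, 0, (size_t)(*n) * sizeof(double));
        }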

  11. Does the Intel Xeon Phi processor fit HEP workloads?

    Science.gov (United States)

    Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.

    2014-06-01

    This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  12. Profiling CPU-bound workloads on Intel Haswell-EP platforms

    CERN Document Server

    Guerri, Marco; Cristovao, Cordeiro; CERN. Geneva. IT Department

    2017-01-01

    With the increasing adoption of public and private cloud resources to support the demands in terms of computing capacity of the WLCG, the HEP community has begun studying several benchmarking applications aimed at continuously assessing the performance of virtual machines procured from commercial providers. In order to characterise the behaviour of these benchmarks, in-depth profiling activities have been carried out. In this document we outline our experience in profiling one specific application, the ATLAS Kit Validation, in an attempt to explain an unexpected distribution in the performance samples obtained on systems based on Intel Haswell-EP processors.

  13. Applications Performance Under MPL and MPI on NAS IBM SP2

    Science.gov (United States)

    Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    On July 5, 1994, an IBM Scalable POWERparallel System (IBM SP2) with 64 nodes was installed at the Numerical Aerodynamic Simulation (NAS) Facility. Each node of the NAS IBM SP2 is a "wide node" consisting of an RS 6000/590 workstation module with a 66.5 MHz clock, which can perform four floating point operations per clock for a peak performance of 266 Mflop/s. By the end of 1994, the 64 nodes of the IBM SP2 will be upgraded to 160 nodes with a peak performance of 42.5 Gflop/s. An overview of the IBM SP2 hardware is presented. A basic understanding of the architectural details of the RS 6000/590 will help application scientists in porting, optimizing, and tuning codes from other machines such as the CRAY C90 and the Paragon to the NAS SP2. Optimization techniques such as quad-word loading, effective utilization of the two floating point units, and data cache optimization on the RS 6000/590 are illustrated, with examples giving performance gains at each optimization step. The conversion of codes using Intel's message passing library NX to codes using the native Message Passing Library (MPL) and the Message Passing Interface (MPI) library available on the IBM SP2 is illustrated. In particular, we present the performance of the Fast Fourier Transform (FFT) kernel from the NAS Parallel Benchmarks (NPB) under MPL and MPI. We have also optimized some of the Fortran BLAS 2 and BLAS 3 routines; e.g., the optimized Fortran DAXPY runs at 175 Mflop/s and the optimized Fortran DGEMM at 230 Mflop/s per node. The performance of the NPB (Class B) on the IBM SP2 is compared with the CRAY C90, Intel Paragon, TMC CM-5E, and the CRAY T3D.

  14. Survey on present status and trend of parallel programming environments

    International Nuclear Information System (INIS)

    Takemiya, Hiroshi; Higuchi, Kenji; Honma, Ichiro; Ohta, Hirofumi; Kawasaki, Takuji; Imamura, Toshiyuki; Koide, Hiroshi; Akimoto, Masayuki.

    1997-03-01

    This report is intended to provide useful information on software tools for parallel programming, based on a survey of the parallel programming environments of the following six parallel computers installed at the Japan Atomic Energy Research Institute (JAERI): Fujitsu VPP300/500, NEC SX-4, Hitachi SR2201, Cray T94, IBM SP, and Intel Paragon. In addition, the present status of R and D on parallel software (parallel languages, compilers, debuggers, performance evaluation tools, and integrated tools) is reported. This survey was made as part of our project to develop basic software for a parallel programming environment, which is designed on the concept of STA (Seamless Thinking Aid to programmers). (author)

  15. Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes.

    Science.gov (United States)

    Einkemmer, Lukas

    2017-01-01

    Optimizing the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success, as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual-socket workstation and a speedup between 3.4 and 3.8 for adding a NVIDIA K80 to a dual-socket workstation.

  16. Why K-12 IT Managers and Administrators Are Embracing the Intel-Based Mac

    Science.gov (United States)

    Technology & Learning, 2007

    2007-01-01

    Over the past year, Apple has dramatically increased its share of the school computer marketplace--especially in the category of notebook computers. A recent study conducted by Grunwald Associates and Rockman et al. reports that one of the major reasons for this growth is Apple's introduction of the Intel processor to the entire line of Mac…

  17. Autonomous controller (JCAM 10) for CAMAC crate with 8080 (INTEL) microprocessor

    International Nuclear Information System (INIS)

    Gallice, P.; Mathis, M.

    1975-01-01

    The CAMAC crate autonomous controller JCAM-10 is designed around an INTEL 8080 microprocessor in association with 5K of RAM and 4K of REPROM memory. The concept of the module is described, in which data transfers between CAMAC modules and the memory are optimized both from the software point of view and for execution time. In effect, the JCAM-10 is a microcomputer with a set of 1000 peripheral units represented by the commercially available CAMAC modules.

  18. Mashup d'aplicacions basat en un buscador intel·ligent

    OpenAIRE

    Sancho Piqueras, Javier

    2010-01-01

    A mashup of functionalities built around an intelligent search engine, in this case aimed at courses, degree programmes, masters, and the like. The goal is to bring several applications together around a single purpose, here a search engine, while also providing tools for connectivity through web services and social networks.

  19. Deployment of the OSIRIS EM-PIC code on the Intel Knights Landing architecture

    Science.gov (United States)

    Fonseca, Ricardo

    2017-10-01

    Electromagnetic particle-in-cell (EM-PIC) codes such as OSIRIS have found widespread use in modelling the highly nonlinear and kinetic processes that occur in several relevant plasma physics scenarios, ranging from astrophysical settings to high-intensity laser-plasma interaction. Being computationally intensive, these codes require large-scale HPC systems and a continuous effort in adapting the algorithm to new hardware and computing paradigms. In this work, we report on our efforts to deploy the OSIRIS code on the new Intel Knights Landing (KNL) architecture. Unlike the previous generation (Knights Corner), these boards are standalone systems and introduce several new features, including the new AVX-512 instructions and on-package MCDRAM. We focus on the parallelization and vectorization strategies followed, as well as memory management, and present a detailed performance evaluation of the KNL code in comparison with the CPU code. This work was partially supported by Fundação para a Ciência e a Tecnologia (FCT), Portugal, through Grant No. PTDC/FIS-PLA/2940/2014.

  20. Performance Evaluation of an Intel Haswell- and Ivy Bridge-Based Supercomputer Using Scientific and Engineering Applications

    Science.gov (United States)

    Saini, Subhash; Hood, Robert T.; Chang, Johnny; Baron, John

    2016-01-01

    We present a performance evaluation conducted on a production supercomputer of the Intel Xeon Processor E5-2680v3, a twelve-core implementation of the fourth-generation Haswell architecture, and compare it with the Intel Xeon Processor E5-2680v2, an Ivy Bridge implementation of the third-generation Sandy Bridge architecture. Several new architectural features have been incorporated in Haswell, including improvements in all levels of the memory hierarchy as well as improvements to vector instructions and power management. We critically evaluate these new features of Haswell and compare with Ivy Bridge using several low-level benchmarks, including a subset of HPCC, HPCG, and four full-scale scientific and engineering applications. We also present a model that predicts the performance of HPCG and Cart3D within 5%, and of Overflow within 10% accuracy.

  1. Does the Intel Xeon Phi processor fit HEP workloads?

    International Nuclear Information System (INIS)

    Nowak, A; Bitzes, G; Dotti, A; Lazzaro, A; Jarp, S; Szostek, P; Valsan, L; Botezatu, M; Leduc, J

    2014-01-01

    This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  2. Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi

    OpenAIRE

    Stanic, Milan; Palomar, Oscar; Ratkovic, Ivan; Duric, Milovan; Unsal, Osman; Cristal, Adrian; Valero, Mateo

    2014-01-01

    Graph500 is a data intensive application for high performance computing and it is an increasingly important workload because graphs are a core part of most analytic applications. So far there is no work that examines whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instru...

  3. Newsgroups, Activist Publics, and Corporate Apologia: The Case of Intel and Its Pentium Chip.

    Science.gov (United States)

    Hearit, Keith Michael

    1999-01-01

    Applies J. Grunig's theory of publics to the phenomenon of Internet newsgroups using the case of the flawed Intel Pentium chip. Argues that technology facilitates the rapid movement of publics from the theoretical construct stage to the active stage. Illustrates some of the difficulties companies face in establishing their identity in cyberspace.…

  4. "Personified as Paragon of Suffering...... Optimistic Being of Achieving Normalcy:" A Conceptual Model Derived from Qualitative Research

    Science.gov (United States)

    Nayak, Shalini G; Pai, Mamatha Shivananda; George, Linu Sara

    2018-01-01

    Background: Conceptual models developed through qualitative research are based on the unique experiences of suffering and the individual adaptations of each participant. A wide array of problems are faced by head-and-neck cancer (HNC) patients due to disease pathology and treatment modalities, which are sufficient to influence quality of life (QOL). Men possess greater self-acceptance and are better equipped with intrapersonal strength to cope with stress compared to women. Methodology: A qualitative phenomenology study was conducted among seven women suffering from HNC, with the objective of understanding their experiences of suffering and describing the phenomenon. Data were collected by face-to-face, in-depth, open-ended interviews. Data were analyzed using Open Code software (OPC 4.0), following the steps of the Colaizzi process. Results: The phenomenon that emerged from the lived experiences of HNC women was "Personified as paragon of suffering...... optimistic being of achieving normalcy," with five major themes and 13 subthemes. Conclusion: The conceptual model developed with the phenomenological approach is very specific to women suffering from HNC, and will contribute to developing strategies to improve their QOL. PMID:29440812

  5. DBPQL: A view-oriented query language for the Intel Data Base Processor

    Science.gov (United States)

    Fishwick, P. A.

    1983-01-01

    An interactive query language (DBPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.

  6. Game-Based Experiential Learning in Online Management Information Systems Classes Using Intel's IT Manager 3

    Science.gov (United States)

    Bliemel, Michael; Ali-Hassan, Hossam

    2014-01-01

    For several years, we used Intel's flash-based game "IT Manager 3: Unseen Forces" as an experiential learning tool, where students had to act as a manager making real-time prioritization decisions about repairing computer problems, training and upgrading systems with better technologies as well as managing increasing numbers of technical…

  7. Acceleration of Blender Cycles Path-Tracing Engine Using Intel Many Integrated Core Architecture

    OpenAIRE

    Jaroš, Milan; Říha, Lubomír; Strakoš, Petr; Karásek, Tomáš; Vašatová, Alena; Jarošová, Marta; Kozubek, Tomáš

    2015-01-01

    This paper describes the acceleration of the most computationally intensive kernels of the Blender rendering engine, Blender Cycles, using the Intel Many Integrated Core architecture (MIC). The proposed parallelization, which uses OpenMP technology, also improves the performance of the rendering engine when running on multi-core CPUs and multi-socket servers. Although GPU acceleration is already implemented in Cycles, its functionality is limited. O...

  8. Parallel algorithms for unconstrained optimization by multisplitting with inexact subspace search - the abstract

    Energy Technology Data Exchange (ETDEWEB)

    Renaut, R.; He, Q. [Arizona State Univ., Tempe, AZ (United States)

    1994-12-31

    A new parallel iterative algorithm for unconstrained optimization by multisplitting is proposed. In this algorithm the original problem is split into a set of small optimization subproblems which are solved using well known sequential algorithms. These algorithms are iterative in nature, e.g. the DFP variable metric method. Here the authors use sequential algorithms based on an inexact subspace search, which is an extension of the usual idea of an inexact line search. Essentially, the idea of the inexact line search for nonlinear minimization is that at each iteration one only finds an approximate minimum in the line search direction. Hence by inexact subspace search, they mean that, instead of finding the minimum of the subproblem at each iteration, they do an incomplete downhill search to give an approximate minimum. Some convergence and numerical results for this algorithm are presented. Further, the original theory is generalized to the situation with a singular Hessian. Applications to nonlinear least squares problems are presented. Experimental results are presented for implementations on an Intel iPSC/860 Hypercube with 64 nodes as well as on the Intel Paragon.

  9. New parallel SOR method by domain partitioning

    Energy Technology Data Exchange (ETDEWEB)

    Xie, Dexuan [Courant Inst. of Mathematical Sciences New York Univ., NY (United States)

    1996-12-31

    In this paper, we propose and analyze a new parallel SOR method, the PSOR method, formulated by using domain partitioning together with an interprocessor data-communication technique. For the 5-point approximation to the Poisson equation on a square, we show that the ordering of the PSOR based on the strip partition leads to a consistently ordered matrix, and hence the PSOR and the SOR using the row-wise ordering have the same convergence rate. However, in general, the ordering used in PSOR may not be "consistently ordered". So, there is a need to analyze the convergence of PSOR directly. In this paper, we present a PSOR theory, and show that the PSOR method can have the same asymptotic rate of convergence as the corresponding sequential SOR method for a wide class of linear systems in which the matrix is "consistently ordered". Finally, we demonstrate the parallel performance of the PSOR method on four different message passing multiprocessors (a KSR1, the Intel Delta, an Intel Paragon and an IBM SP2), along with a comparison with the point Red-Black and four-color SOR methods.
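
    For contrast with the domain-partitioned PSOR, the point Red-Black SOR mentioned above colors the grid like a checkerboard so that all points of one color can be updated independently, and hence in parallel. A minimal C sketch of one sweep for the 5-point stencil, with illustrative names and sizes; f is assumed to hold the h-squared-scaled right-hand side.

        #define NG 256

        void rb_sor_sweep(double u[NG][NG], const double f[NG][NG], double omega)
        {
            for (int color = 0; color < 2; color++) {
                /* all points of one color update independently */
                #pragma omp parallel for
                for (int i = 1; i < NG - 1; i++)
                    for (int j = 1 + (i + color) % 2; j < NG - 1; j += 2) {
                        double gs = 0.25 * (u[i-1][j] + u[i+1][j]
                                          + u[i][j-1] + u[i][j+1] + f[i][j]);
                        u[i][j] += omega * (gs - u[i][j]);   /* SOR relaxation */
                    }
            }
        }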

  10. Optimizing the Betts-Miller-Janjic cumulus parameterization with Intel Many Integrated Core (MIC) architecture

    Science.gov (United States)

    Huang, Melin; Huang, Bormin; Huang, Allen H.-L.

    2015-10-01

    The schemes of cumulus parameterization are responsible for the sub-grid-scale effects of convective and/or shallow clouds, and are intended to represent vertical fluxes due to unresolved updrafts and downdrafts and compensating motion outside the clouds. Some schemes additionally provide cloud and precipitation field tendencies in the convective column, and momentum tendencies due to convective transport of momentum. The schemes all provide the convective component of surface rainfall. Betts-Miller-Janjic (BMJ) is one scheme fulfilling these purposes in the Weather Research and Forecasting (WRF) model. The National Centers for Environmental Prediction (NCEP) has worked to optimize the BMJ scheme for operational application. As there are no interactions among horizontal grid points, this scheme is very suitable for parallel computation. The Intel Xeon Phi Many Integrated Core (MIC) architecture, with its efficient parallelization and vectorization capabilities, allows us to optimize the BMJ scheme. Compared to the original code running on one CPU socket (eight cores) and on one CPU core of an Intel Xeon E5-2670, the MIC-based optimization of this scheme running on a Xeon Phi 7120P coprocessor improves performance by 2.4x and 17.0x, respectively.

  11. Application of Intel Many Integrated Core (MIC) accelerators to the Pleim-Xiu land surface scheme

    Science.gov (United States)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2015-10-01

    The land-surface model (LSM) is one physics component in the Weather Research and Forecasting (WRF) model. The LSM combines atmospheric information from the surface layer scheme, radiative forcing from the radiation scheme, and precipitation forcing from the microphysics and convective schemes with internal information on the land's state variables and land-surface properties, in order to provide heat and moisture fluxes over land points and sea-ice points. The Pleim-Xiu (PX) scheme is one LSM; it features three pathways for moisture fluxes: evapotranspiration, soil evaporation, and evaporation from wet canopies. To accelerate this scheme, we employ the Intel Xeon Phi Many Integrated Core (MIC) architecture, a many-core processor architecture with efficient parallelization and vectorization capabilities. Our results show that the MIC-based optimization of this scheme running on a Xeon Phi 7120P coprocessor improves performance by 2.3x and 11.7x compared to the original code running on one CPU socket (eight cores) and on one CPU core of an Intel Xeon E5-2670, respectively.

  12. Acceleration of Cherenkov angle reconstruction with the new Intel Xeon/FPGA compute platform for the particle identification in the LHCb Upgrade

    Science.gov (United States)

    Faerber, Christian

    2017-10-01

    The LHCb experiment at the LHC will upgrade its detector by 2018/2019 to a 'triggerless' readout scheme, where all the readout electronics and several sub-detector parts will be replaced. The new readout electronics will be able to read out the detector at 40 MHz. This increases the data bandwidth from the detector down to the Event Filter farm to 40 TBit/s, which also has to be processed to select the interesting proton-proton collisions for later storage. The architecture of a computing farm which can process this amount of data as efficiently as possible is a challenging task, and several compute accelerator technologies are being considered for use inside the new Event Filter farm. In the high performance computing sector more and more FPGA compute accelerators are used to improve compute performance and reduce power consumption (e.g. in the Microsoft Catapult project and the Bing search engine). For the LHCb upgrade, the use of an experimental FPGA-accelerated computing platform in the Event Building or in the Event Filter farm is likewise being considered and therefore tested. This platform from Intel hosts a general-purpose CPU and a high performance FPGA linked via a high speed link, which on this platform is a QPI link; an accelerator is implemented on the FPGA. The system used is a two-socket platform from Intel with a Xeon CPU and an FPGA. The FPGA has cache-coherent memory access to the main memory of the server and can collaborate with the CPU. As a first step, a computationally intensive algorithm to reconstruct Cherenkov angles for the LHCb RICH particle identification was successfully ported in Verilog to the Intel Xeon/FPGA platform and accelerated by a factor of 35. The same algorithm was then ported to the Intel Xeon/FPGA platform with OpenCL; the implementation work and the performance are compared. Another FPGA accelerator, the Nallatech 385 PCIe accelerator with the same Stratix V FPGA, was also tested for performance. The results show that the Intel

  13. Implementation of 5-layer thermal diffusion scheme in weather research and forecasting model with Intel Many Integrated Cores

    Science.gov (United States)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2014-10-01

    For weather forecasting and research, the Weather Research and Forecasting (WRF) model has been developed, consisting of several components such as dynamic solvers and physical simulation modules. WRF includes several Land-Surface Models (LSMs). The LSMs use atmospheric information, the radiative and precipitation forcing from the surface layer scheme, the radiation scheme, and the microphysics/convective scheme, together with the land's state variables and land-surface properties, to provide heat and moisture fluxes over land and sea-ice points. The WRF 5-layer thermal diffusion simulation is an LSM based on the MM5 5-layer soil temperature model with an energy budget that includes radiation, sensible, and latent heat flux. The WRF LSMs are very suitable for massively parallel computation as there are no interactions among horizontal grid points. The efficient parallelization and vectorization features of the Intel Many Integrated Core (MIC) architecture allow us to optimize this WRF 5-layer thermal diffusion scheme. In this work, we present the results of the computing performance of this scheme on the Intel MIC architecture. Our results show that the MIC-based optimization improved the performance of the first version of the multi-threaded code on a Xeon Phi 5110P by a factor of 2.1x. Likewise, the same CPU-based optimizations improved the performance on an Intel Xeon E5-2603 by a factor of 1.6x compared to the first version of the multi-threaded code.

  14. Applying the roofline performance model to the intel xeon phi knights landing processor

    OpenAIRE

    Doerfler, D; Deslippe, J; Williams, S; Oliker, L; Cook, B; Kurth, T; Lobet, M; Malas, T; Vay, JL; Vincenti, H

    2016-01-01

    © Springer International Publishing AG 2016. The Roofline Performance Model is a visually intuitive method used to bound the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we determine the Roofline for the Intel Knights Landing (KNL) processor, determining t...

  15. Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

    OpenAIRE

    Hofmann, Johannes; Treibig, Jan; Hager, Georg; Wellein, Gerhard

    2013-01-01

    We examine the Xeon Phi, which is based on Intel's Many Integrated Cores architecture, for its suitability to run the FDK algorithm--the most commonly used algorithm to perform the 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and means to enable sensible data sharing between threads despite the lack of a shared last level cache. Apart from parallelization, SIMD vectorization is critical for good performance on t...

  16. Acceleration of Monte Carlo simulation of photon migration in complex heterogeneous media using Intel many-integrated core architecture.

    Science.gov (United States)

    Gorshkov, Anton V; Kirillin, Mikhail Yu

    2015-08-01

    Over two decades, the Monte Carlo technique has become a gold standard in the simulation of light propagation in turbid media, including biotissues. Technological solutions provide further advances of this technique. The Intel Xeon Phi coprocessor is a new type of accelerator for highly parallel general purpose computing, which allows execution of a wide range of applications without substantial code modification. We present a technical approach for porting our previously developed Monte Carlo (MC) code for simulation of light transport in tissues to the Intel Xeon Phi coprocessor. We show that employing the accelerator allows reducing the computational time of MC simulation and obtaining a simulation speed-up comparable to a GPU. We demonstrate the performance of the developed code for simulation of light transport in the human head and determination of the measurement volume in near-infrared spectroscopy brain sensing.

  17. Evaluation of the Single-precision Floatingpoint Vector Add Kernel Using the Intel FPGA SDK for OpenCL

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Zheming [Argonne National Lab. (ANL), Argonne, IL (United States); Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Finkel, Hal [Argonne National Lab. (ANL), Argonne, IL (United States); Cappello, Franck [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-04-20

    Open Computing Language (OpenCL) is a high-level language that enables software programmers to explore Field Programmable Gate Arrays (FPGAs) for application acceleration. The Intel FPGA software development kit (SDK) for OpenCL allows a user to specify applications at a high level and explore the performance of low-level hardware acceleration. In this report, we present the FPGA performance and power consumption results of the single-precision floating-point vector add OpenCL kernel using the Intel FPGA SDK for OpenCL on the Nallatech 385A FPGA board. The board features an Arria 10 FPGA. We evaluate the FPGA implementations using the compute unit duplication and kernel vectorization optimization techniques. On the Nallatech 385A FPGA board, the maximum compute kernel bandwidth we achieve is 25.8 GB/s, approximately 76% of the peak memory bandwidth. The power consumption of the FPGA device when running the kernels ranges from 29W to 42W.
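
    The kernel under evaluation is the canonical single-precision vector add, which in OpenCL C is only a few lines; the compute-unit duplication and kernel vectorization knobs the report tunes are applied as attributes at kernel compile time rather than in this plain form. A sketch of the standard kernel:

        /* one work-item computes one output element */
        __kernel void vector_add(__global const float *a,
                                 __global const float *b,
                                 __global float *c)
        {
            int i = get_global_id(0);
            c[i] = a[i] + b[i];
        }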

  18. Performance Analysis of an Astrophysical Simulation Code on the Intel Xeon Phi Architecture

    OpenAIRE

    Noormofidi, Vahid; Atlas, Susan R.; Duan, Huaiyu

    2015-01-01

    We have developed the astrophysical simulation code XFLAT to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both CPU and Xeon Phi co-processors based on the Intel Many Integrated Core Architecture (MIC). We analyze the performance of XFLAT on configurations with CPU only, Xeon Phi only and both CPU and Xeon Phi. We also investigate the impact of I/O and the multi-n...

  19. Efficient sparse matrix-matrix multiplication for computing periodic responses by shooting method on Intel Xeon Phi

    Science.gov (United States)

    Stoykov, S.; Atanassov, E.; Margenov, S.

    2016-10-01

    Many scientific applications involve sparse or dense matrix operations, such as solving linear systems, matrix-matrix products, eigensolvers, etc. In structural nonlinear dynamics, the computation of periodic responses and the determination of the stability of the solution are of primary interest. The shooting method is widely used for obtaining periodic responses of nonlinear systems. The method simultaneously involves operations with sparse and dense matrices. One of the computationally expensive operations in the method is the multiplication of sparse by dense matrices. In the current work, a new algorithm for sparse matrix by dense matrix products is presented. The algorithm takes into account the structure of the sparse matrix, which is obtained by space discretization of the nonlinear Mindlin plate equation of motion by the finite element method. The algorithm is developed to use the vector engine of Intel Xeon Phi coprocessors. It is compared with the standard sparse matrix by dense matrix algorithm and the one developed by Intel MKL, and it is shown that by considering the properties of the sparse matrix, better algorithms can be developed.
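
    A baseline for the operation the paper improves on, a CSR-format sparse matrix times a row-major dense matrix, can be sketched in C as follows; the paper's algorithm additionally exploits the finite-element structure of the matrix, and all names here are illustrative:

        #include <stddef.h>

        /* C = A * B, with A sparse in CSR form (row_ptr, col_idx, val)
         * and B, C dense row-major. Rows are threaded with OpenMP; the
         * unit-stride inner loop over the columns of B is left to the
         * compiler's vectorizer. */
        void csr_times_dense(int n_rows, int n_cols_B,
                             const int *row_ptr, const int *col_idx,
                             const double *val, const double *B, double *C)
        {
            #pragma omp parallel for schedule(dynamic)
            for (int i = 0; i < n_rows; ++i) {
                double *Ci = &C[(size_t)i * n_cols_B];
                for (int j = 0; j < n_cols_B; ++j) Ci[j] = 0.0;
                for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
                    const double a = val[k];
                    const double *Bk = &B[(size_t)col_idx[k] * n_cols_B];
                    #pragma omp simd
                    for (int j = 0; j < n_cols_B; ++j)
                        Ci[j] += a * Bk[j];
                }
            }
        }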

  20. Software and DVFS Tuning for Performance and Energy-Efficiency on Intel KNL Processors

    Directory of Open Access Journals (Sweden)

    Enrico Calore

    2018-06-01

    Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems. For this reason, it is important to understand the energy performance of these processors and to study strategies allowing their use in the most efficient way. In this work, we focus on the computing and energy performance of the Knights Landing Xeon Phi, the latest Intel many-core architecture processor for HPC applications. We consider the 64-core Xeon Phi 7230 and profile its performance and energy efficiency using both its on-chip MCDRAM and the off-chip DDR4 memory as the main storage for application data. As a benchmark application, we use a lattice Boltzmann code heavily optimized for this architecture and implemented using several different arrangements of the application data in memory (data-layouts, in short). We also assess the dependence of energy consumption on data-layouts, memory configurations (DDR4 or MCDRAM) and the number of threads per core. We finally consider possible trade-offs between computing performance and energy efficiency, tuning the clock frequency of the processor using the Dynamic Voltage and Frequency Scaling (DVFS) technique.
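
    Frequency tuning of this kind is commonly driven through the standard Linux cpufreq sysfs interface; a minimal C sketch follows (it requires the "userspace" governor and root privileges, and whether the authors drove DVFS this way is not stated in the record):

        #include <stdio.h>

        /* Pin one core's DVFS operating point via the kernel's cpufreq
         * sysfs file; frequency is given in kHz, as the kernel expects.
         * Returns 0 on success, -1 on failure. */
        int set_cpu_khz(int cpu, long khz)
        {
            char path[128];
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed",
                     cpu);
            FILE *f = fopen(path, "w");
            if (!f) return -1;
            fprintf(f, "%ld\n", khz);
            return fclose(f) == 0 ? 0 : -1;
        }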

  1. Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing-based Cori supercomputer at NERSC

    Energy Technology Data Exchange (ETDEWEB)

    Doerfler, Douglas [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Austin, Brian [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Cook, Brandon [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Deslippe, Jack [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Kandalla, Krishna [Cray Inc, Bloomington, MN (United States); Mendygral, Peter [Cray Inc, Bloomington, MN (United States)

    2017-09-12

    There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi™ core is a fraction of that of a Xeon® core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.

  2. OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation

    Science.gov (United States)

    Young-S., Luis E.; Muruganandam, Paulsamy; Adhikari, Sadhan K.; Lončar, Vladimir; Vudragović, Dušan; Balaž, Antun

    2017-11-01

    reduce the execution time cannot be overemphasized. To address this issue, we provide here such OpenMP Fortran programs, optimized for both Intel and GNU Fortran compilers and capable of using all available CPU cores, which can significantly reduce the execution time. Summary of revisions: Previous Fortran programs [1] for solving the time-dependent GP equation in 1d, 2d, and 3d with different trap symmetries have been parallelized using the OpenMP interface to reduce the execution time on multi-core processors. There are six different trap symmetries considered, resulting in six programs for imaginary-time propagation and six for real-time propagation, 12 programs in total, included in the BEC-GP-OMP-FOR software package. All input data (number of atoms, scattering length, harmonic oscillator trap length, trap anisotropy, etc.) are conveniently placed at the beginning of each program, as before [2]. The present programs introduce a new input parameter, designated Number_of_Threads, which defines the number of CPU cores of the processor to be used in the calculation. If one sets the value 0 for this parameter, all available CPU cores will be used. For the most efficient calculation it is advisable to leave one CPU core unused for background system jobs. For example, on a machine with 20 CPU cores, such as the one we used for testing, it is advisable to use up to 19 CPU cores. However, the total number of used CPU cores can be divided among more than one job. For instance, one can run three simulations simultaneously using 10, 4, and 5 CPU cores, respectively, for a total of 19 CPU cores in use on a 20-core computer. The Fortran source programs are located in the directory src, and can be compiled by the make command using the makefile in the root directory BEC-GP-OMP-FOR of the software package. Examples of produced output files can be found in the directory output, although some large density files are omitted to save space. The programs calculate the values of
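
    The Number_of_Threads convention is straightforward to picture; a minimal C analogue of the described logic (the package itself is Fortran) might be:

        #include <omp.h>

        /* Number_of_Threads convention described above: 0 means "use all
         * available CPU cores"; any other value is used as given.
         * (Illustrative C analogue; the BEC-GP-OMP-FOR package is Fortran.) */
        void set_thread_count(int number_of_threads)
        {
            if (number_of_threads == 0)
                number_of_threads = omp_get_num_procs();
            omp_set_num_threads(number_of_threads);
        }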

  3. Transitioning to Intel-based Linux Servers in the Payload Operations Integration Center

    Science.gov (United States)

    Guillebeau, P. L.

    2004-01-01

    The MSFC Payload Operations Integration Center (POIC) is the focal point for International Space Station (ISS) payload operations. The POIC contains the facilities, hardware, software and communication interfaces necessary to support payload operations. ISS ground system support for processing and display of real-time spacecraft telemetry and command data has been operational for several years. The hardware components were reaching end of life, and vendor costs were increasing while ISS budgets were becoming severely constrained. Therefore it has been necessary to migrate the Unix portions of our ground systems to commodity-priced Intel-based Linux servers. The overall migration to Intel-based Linux servers in the control center involves changes to the hardware architecture, including networks, data storage, and highly available resources. This paper will concentrate on the Linux migration implementation for the software portion of our ground system. The migration began with 3.5 million lines of code running on Unix platforms, with separate servers for telemetry, command, payload information management systems, web, system control, remote server interface and databases. The Intel-based system is scheduled to be available for initial operational use by August 2004. This paper will address the Linux migration study approach, including the proof of concept, the criticality of customer buy-in, the importance of beginning with POSIX-compliant code, and the need for a smooth transition while maintaining operations. It will focus on the development approach, explaining the software lifecycle. Other aspects of development will be covered, including phased implementation, interim milestones, and metrics measurements and reporting mechanisms. This paper will also address the testing approach, covering all levels of testing including development, development integration, IV&V, user beta testing and acceptance testing. Test results, including performance numbers compared with the Unix servers, will be included.

  4. Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

    OpenAIRE

    Byun, Chansup; Kepner, Jeremy; Arcand, William; Bestor, David; Bergeron, Bill; Gadepally, Vijay; Houle, Michael; Hubbell, Matthew; Jones, Michael; Klein, Anna; Michaleas, Peter; Milechin, Lauren; Mullen, Julie; Prout, Andrew; Rosa, Antonio

    2017-01-01

    Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running data analysis applications such as MATLAB and O...

  5. High-throughput sockets over RDMA for the Intel Xeon Phi coprocessor

    CERN Document Server

    Santogidis, Aram

    2017-01-01

    In this paper we describe the design, implementation and performance of Trans4SCIF, a user-level socket-like transport library for the Intel Xeon Phi coprocessor. Trans4SCIF library is primarily intended for high-throughput applications. It uses RDMA transfers over the native SCIF support, in a way that is transparent for the application, which has the illusion of using conventional stream sockets. We also discuss the integration of Trans4SCIF with the ZeroMQ messaging library, used extensively by several applications running at CERN. We show that this can lead to a substantial, up to 3x, increase of application throughput compared to the default TCP/IP transport option.

  6. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    Science.gov (United States)

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-08-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51× faster on KNL and 2.77× faster on the CPU. Moreover, the optimised version ran at 26% lower average power on KNL than on the CPU. With the combined performance and energy
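
    Optimisation (2) amounts to presenting the compiler with unit-stride, dependence-free inner loops that map onto the 512-bit VPUs; a schematic flux-difference update in C (illustrative only, not the GNAQPMS source):

        /* Schematic 1-D flux-difference transport update: the loop is
         * unit-stride and free of loop-carried dependences, so each
         * iteration block maps directly onto the 512-bit vector units. */
        void advect_update(int n, const double *restrict q,
                           const double *restrict flux, double dtdx,
                           double *restrict q_new)
        {
            #pragma omp parallel for simd
            for (int i = 1; i < n; ++i)
                q_new[i] = q[i] - dtdx * (flux[i] - flux[i - 1]);
        }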

  7. An efficient communication scheme for solving Sn equations on message-passing multiprocessors

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1993-01-01

    Early models of Intel's hypercube multiprocessors, e.g., the iPSC/1 and iPSC/2, were characterized by the high latency of message passing. This relatively weak dependence of the communication penalty on the size of messages, in contrast to its strong dependence on the number of messages, justified using the Fan-in Fan-out algorithm (which implements a minimum spanning tree path) to perform global operations, such as global sums, etc. Recent models of message-passing computers, such as the iPSC/860 and the Paragon, have been found to possess much smaller latency, thus forcing a reexamination of the issue of performance optimization with respect to communication schemes. Essentially, the Fan-in Fan-out scheme minimizes the number of nonsimultaneous messages sent but not the volume of data traffic across the network. Furthermore, if a global operation is performed in conjunction with the message passing, a large fraction of the attached nodes remains idle as the number of utilized processors is halved in each step of the process. On the other hand, the Recursive Halving scheme offers the smallest communication cost for global operations but has some drawbacks
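
    The contrast is easy to see in code: a recursive-doubling global sum, a close relative of the Recursive Halving scheme, finishes in log2(P) pairwise exchanges with every node active at each step, whereas the Fan-in Fan-out tree idles half of the remaining nodes per step. A sketch follows (MPI is used for convenience; the machines discussed used Intel's native message-passing calls):

        #include <mpi.h>

        /* Recursive-doubling global sum: log2(P) pairwise exchange steps,
         * every rank active at each step. Assumes P is a power of two. */
        double global_sum(double local, MPI_Comm comm)
        {
            int rank, size;
            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &size);
            for (int mask = 1; mask < size; mask <<= 1) {
                double recv;
                MPI_Sendrecv(&local, 1, MPI_DOUBLE, rank ^ mask, 0,
                             &recv,  1, MPI_DOUBLE, rank ^ mask, 0,
                             comm, MPI_STATUS_IGNORE);
                local += recv;
            }
            return local;   /* every rank now holds the complete sum */
        }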

  8. Evaluation of the Intel Nehalem-EX server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2010-01-01

    In this paper we report on a set of benchmark results recently obtained by the CERN openlab by comparing the 4-socket, 32-core Intel Xeon X7560 server with the previous generation 4-socket server, based on the Xeon X7460 processor. The Xeon X7560 processor represents a major change in many respects, especially the memory sub-system, so it was important to make multiple comparisons. In most benchmarks the two 4-socket servers were compared. It should be underlined that both servers represent the “top of the line” in terms of frequency. However, in some cases, it was important to compare systems that integrated the latest processor features, such as QPI links, Symmetric multithreading and over-clocking via Turbo mode, and in such situations the X7560 server was compared to a dual socket L5520 based system with an identical frequency of 2.26 GHz. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following ...

  9. Initial results on computational performance of Intel Many Integrated Core (MIC) architecture: implementation of the Weather and Research Forecasting (WRF) Purdue-Lin microphysics scheme

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow and graupel. The scheme is very suitable for massively parallel computation, as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue-Lin scheme using Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi is a high-performance coprocessor consisting of up to 61 cores. The Xeon Phi is connected to a CPU via the PCI Express (PCIe) bus. In this paper, we will discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for the Xeon Phi. In particular, getting good performance required utilizing multiple cores and the wide vector operations, and making efficient use of memory. The results show that the optimizations improved performance of the original code on the Xeon Phi 5110P by a factor of 4.2x. Furthermore, the same optimizations improved performance on the Intel Xeon E5-2603 CPU by a factor of 1.2x compared to the original code.

  10. Evaluation of the Intel Sandy Bridge-EP server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2012-01-01

    In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing an 8-core “Sandy Bridge-EP” processor with Intel’s previous microarchitecture, the “Westmere-EP”. The Intel marketing names for these processors are “Xeon E5-2600 processor series” and “Xeon 5600 processor series”, respectively. Both processors are produced in a 32nm process, and both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores ...

  11. Multi-CPU plasma fluid turbulence calculations on a CRAY Y-MP C90

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Leboeuf, J.N.; Curtis, B.C.; Troutman, R.L.

    1993-01-01

    Significant improvements in real-time efficiency have been obtained for plasma fluid turbulence calculations by microtasking the nonlinear fluid code KITE, in which these calculations are implemented, on the CRAY Y-MP C90 at the National Energy Research Supercomputer Center (NERSC). The number of processors accessed concurrently scales linearly with problem size. Close to six concurrent processors have so far been obtained with a three-dimensional nonlinear production calculation at the currently allowed memory size of 80 Mword. With a calculation size corresponding to the maximum allowed memory of 200 Mword in the next system configuration, we expect to be able to access close to nine processors of the C90 concurrently, with a commensurate improvement in real-time efficiency. These improvements in performance are comparable to those expected from a massively parallel implementation of the same calculations on the Intel Paragon.

  12. Multi-CPU plasma fluid turbulence calculations on a CRAY Y-MP C90

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Leboeuf, J.N.; Curtis, B.C.; Troutman, R.L.

    1993-01-01

    Significant improvements in real-time efficiency have been obtained for plasma fluid turbulence calculations by microtasking the nonlinear fluid code KITE, in which these calculations are implemented, on the CRAY Y-MP C90 at the National Energy Research Supercomputer Center (NERSC). The number of processors accessed concurrently scales linearly with problem size. Close to six concurrent processors have so far been obtained with a three-dimensional nonlinear production calculation at the currently allowed memory size of 80 Mword. With a calculation size corresponding to the maximum allowed memory of 200 Mword in the next system configuration, they expect to be able to access close to ten processors of the C90 concurrently, with a commensurate improvement in real-time efficiency. These improvements in performance are comparable to those expected from a massively parallel implementation of the same calculations on the Intel Paragon.

  13. Evaluation of the OpenCL AES Kernel using the Intel FPGA SDK for OpenCL

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Zheming [Argonne National Lab. (ANL), Argonne, IL (United States); Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Finkel, Hal [Argonne National Lab. (ANL), Argonne, IL (United States); Cappello, Franck [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-04-20

    The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing systems. OpenCL extends the C-based programming language for developing portable codes on different platforms such as CPUs, graphics processing units (GPUs), digital signal processors (DSPs) and field-programmable gate arrays (FPGAs). The Intel FPGA software development kit (SDK) for OpenCL is a suite of tools that allows developers to abstract away the complex FPGA-based development flow for a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes FPGA-based development more accessible to software users, as the need for hybrid computing using CPUs and FPGAs is increasing. It can also significantly reduce the hardware development time, as users can evaluate different ideas with a high-level language without deep FPGA domain knowledge. In this report, we evaluate the performance of the AES kernel using the Intel FPGA SDK for OpenCL and the Nallatech 385A FPGA board. Compared to the M506 module, the board provides more hardware resources for a larger design exploration space. The kernel performance is measured with the compute kernel throughput, an upper bound to the FPGA throughput. The report presents the experimental results in detail. The Appendix lists the kernel source code.

  14. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS on Intel Xeon Phi processors

    Directory of Open Access Journals (Sweden)

    H. Wang

    2017-08-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51× faster on KNL and 2.77× faster on the CPU. Moreover, the optimised version ran at 26% lower average power on KNL than on the CPU. With the combined

  15. A parallel implementation of particle tracking with space charge effects on an INTEL iPSC/860

    International Nuclear Information System (INIS)

    Chang, L.; Bourianoff, G.; Cole, B.; Machida, S.

    1993-05-01

    Particle-tracking simulation is one of the scientific applications that is well-suited to parallel computations. At the Superconducting Super Collider, it has been theoretically and empirically demonstrated that particle tracking on a designed lattice can achieve very high parallel efficiency on a MIMD Intel iPSC/860 machine. The key to such success is the realization that the particles can be tracked independently without considering their interaction. The perfectly parallel nature of particle tracking is broken if the interaction effects between particles are included. The space charge introduces an electromagnetic force that will affect the motion of tracked particles in 3-D space. For accurate modeling of the beam dynamics with space charge effects, one needs to solve three-dimensional Maxwell field equations, usually by a particle-in-cell (PIC) algorithm. This will require each particle to communicate with its neighbor grids to compute the momentum changes at each time step. It is expected that the 3-D PIC method will degrade parallel efficiency of particle-tracking implementation on any parallel computer. In this paper, we describe an efficient scheme for implementing particle tracking with space charge effects on an INTEL iPSC/860 machine. Experimental results show that a parallel efficiency of 75% can be obtained

  16. Student Intern Ben Freed Competes as Finalist in Intel STS Competition, Three Other Interns Named Semifinalists | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer. Werner H. Kirstin (WHK) student intern Ben Freed was one of 40 finalists to compete in the Intel Science Talent Search (STS) in Washington, DC, in March. “It was seven intense days of interacting with amazing judges and incredibly smart and interesting students. We met President Obama, and then the MIT astronomy lab named minor planets after each

  17. Optimizing meridional advection of the Advanced Research WRF (ARW) dynamics for Intel Xeon Phi coprocessor

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.

    2015-05-01

    The most widely used community weather forecast and research model in the world is the Weather Research and Forecasting (WRF) model. Two distinct varieties of WRF exist. The one we are interested in, the Advanced Research WRF (ARW), is an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In this paper, we optimize a meridional (north-south direction) advection subroutine for the Intel Xeon Phi coprocessor. Advection is one of the most time-consuming routines in the ARW dynamics core. It advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small-timestep pressure gradient tendency. We will describe the challenges we met during the development of a high-speed dynamics code subroutine for the MIC architecture. Furthermore, lessons learned from the code optimization process will be discussed. The results show that the optimizations improved performance of the original code on the Xeon Phi 7120P by a factor of 1.2x.

  18. Les multituds intel·ligents com a generadores de dades massives : la intel·ligència col·lectiva al servei de la innovació social

    Directory of Open Access Journals (Sweden)

    Sanz, Sandra

    2015-06-01

    In recent decades there has been an increase in social mobilizations organized, mediated, narrated and coordinated through ICTs. These are examples of smart mobs that take advantage of the new communication media to organize themselves. Both through the number of messages exchanged and generated and through the interactions themselves, these smart mobs become an object of big data. Analysing them with the possibilities offered by data engineering can help detect the ideas constructed, as well as the knowledge shared, as products of collective intelligence. This would favour the reuse of this information to increase the knowledge of the collective and contribute to the development of social innovation. For this reason, this article points out the open questions and limitations that such analyses still present, and highlights the need for further work on the development of new methods and techniques of analysis.

  19. Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

    Science.gov (United States)

    Joslin, Ronald D.; Zubair, Mohammad

    1993-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and itself achieves less-than-ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and turns the computation into a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

  20. Evaluating the transport layer of the ALFA framework for the Intel(®) Xeon Phi(™) Coprocessor

    OpenAIRE

    Santogidis, Aram; Hirstius, Andreas; Lalis, Spyros

    2015-01-01

    The ALFA framework supports the software development of major High Energy Physics experiments. As part of our research effort to optimize the transport layer of ALFA, we focus on profiling its data transfer performance for inter-node communication on the Intel Xeon Phi Coprocessor. In this article we present the collected performance measurements with the related analysis of the results. The optimization opportunities that are discovered, help us to formulate the future plans of enabling high...

  1. Accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) model on Intel Xeon Phi processors

    OpenAIRE

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junming; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-01-01

    The GNAQPMS model is the global version of the Nested Air Quality Prediction Modelling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present our work of porting and optimizing the GNAQPMS model on the second generation Intel Xeon Phi processor codename “Knights Landing” (KNL). Compared with the first generation Xeon Phi coprocessor, KNL introduced many new hardware features such as a boo...

  2. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00014247; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea

    2017-01-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with detai...

  3. Multi-threaded ATLAS Simulation on Intel Knights Landing Processors

    CERN Document Server

    Farrell, Steven; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles

    2016-01-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), will be delivered to its users in two phases with the first phase online now and the second phase expected in mid-2016. Cori Phase 2 will be based on the KNL architecture and will contain over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a great use-case for the KNL architecture and supercomputers like Cori. Simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this presentation we will give an overview of the ATLAS simulation application with details on its multi-thr...

  4. Plasma turbulence calculations on the Intel iPSC/860 hypercube

    International Nuclear Information System (INIS)

    Lynch, V.E.; Ruiter, J.R.

    1990-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A serial algorithm used for plasma turbulence calculations was modified to allocate a radial region to each node. In this way, convolutions at a fixed radius are performed in parallel, and communication is limited to boundary values for each radial region. For a semi-implicit numerical scheme (tridiagonal matrix solver), there is a factor of 3 improvement in efficiency with the Intel iPSC/860 machine using 64 processors over a single-processor Cray-2. For block-tridiagonal matrix cases (fully implicit code), a second parallelization takes place: the Fourier components are distributed across nodes. In each node, the block-tridiagonal matrix is inverted for each of the allocated Fourier components. The algorithm for this second case has not yet been optimized. 10 refs., 4 figs

  5. Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors

    Energy Technology Data Exchange (ETDEWEB)

    Heybrock, Simon; Joo, Balint; Kalamkar, Dhiraj D; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan; Wettig, Tilo; Dubey, Pradeep

    2014-12-01

    The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel Xeon Phi co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.

  6. Emmarcar el debat: Lliure expressió contra propietat intel·lectual, els propers cinquanta anys

    Directory of Open Access Journals (Sweden)

    Eben Moglen

    2007-02-01

    Prof. Moglen explains and analyses, from a historical perspective, the profound social and legal revolution that results from digital technology when it is applied to every field: software, music and all kinds of creative works. In particular, he explains how digital technology is forcing a substantial modification (indeed, the disappearance) of intellectual property systems, and makes predictions for the near future of the IP markets.

  7. Experience with low-power x86 processors (Atom) for HEP usage. An initial analysis of the Intel® dual core Atom™ N330 processor

    CERN Document Server

    Balazs, G; Nowak, A; CERN. Geneva. IT Department

    2009-01-01

    In this paper we compare a system based on an Intel Atom N330 low-power processor to a modern Intel Xeon® dual-socket server using CERN IT’s standard criteria for comparing price-performance and performance per watt. The Xeon server corresponds to what is typically acquired as servers in the LHC Computing Grid. The comparisons used public pricing information from November 2008. After the introduction in section 1, section 2 describes the hardware and software setup. In section 3 we describe the power measurements we did and in section 4 we discuss the throughput performance results. In section 5 we summarize our initial conclusions. We then go on to describe our long term vision and possible future scenarios for using such low-power processors, and finally we list interesting development directions.

  8. ELT-scale Adaptive Optics real-time control with the Intel Xeon Phi Many Integrated Core Architecture

    Science.gov (United States)

    Jenkins, David R.; Basden, Alastair; Myers, Richard M.

    2018-05-01

    We propose a solution to the increased computational demands of Extremely Large Telescope (ELT) scale adaptive optics (AO) real-time control with the Intel Xeon Phi Knights Landing (KNL) Many Integrated Core (MIC) Architecture. The computational demands of an AO real-time controller (RTC) scale with the fourth power of telescope diameter, and so the next-generation ELTs require orders of magnitude more processing power for the RTC pipeline than existing systems. The Xeon Phi contains a large number (≥64) of low-power x86 CPU cores and high-bandwidth memory integrated into a single socketed server CPU package. The increased parallelism and memory bandwidth are crucial to providing the performance for reconstructing wavefronts with the required precision for ELT-scale AO. Here, we demonstrate that the Xeon Phi KNL is capable of performing ELT-scale single-conjugate AO real-time control computation at over 1.0 kHz with less than 20 μs RMS jitter. We have also shown that with a wavefront sensor camera attached, the KNL can process the real-time control loop at up to 966 Hz, the maximum frame rate of the camera, with jitter remaining below 20 μs RMS. Future studies will involve exploring the use of a cluster of Xeon Phis for the real-time control of the MCAO and MOAO regimes of AO. We find that the Xeon Phi is highly suitable for ELT AO real-time control.

  9. Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shan, Hongzhang [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); de Jong, Wibe [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-01-01

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.
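
    The "straightforward application of OpenMP to the deep loop nests" can be pictured with a schematic tensor contraction, threaded over the outer index pair; shapes and index order here are illustrative, not NWChem's:

        #include <stddef.h>

        /* Schematic contraction t3(a,b,c) += sum_d t2(a,d) * v(d,b,c):
         * the two outermost loops are collapsed and distributed over
         * OpenMP threads, in the spirit of the CCSD(T) triples loops. */
        void contract(int n, const double *restrict t2,
                      const double *restrict v, double *restrict t3)
        {
            #pragma omp parallel for collapse(2)
            for (int a = 0; a < n; ++a)
                for (int b = 0; b < n; ++b)
                    for (int c = 0; c < n; ++c) {
                        double s = 0.0;
                        for (int d = 0; d < n; ++d)
                            s += t2[(size_t)a * n + d]
                               * v[((size_t)d * n + b) * n + c];
                        t3[((size_t)a * n + b) * n + c] += s;
                    }
        }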

  10. Thread-Level Parallelization and Optimization of NWChem for the Intel MIC Architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shan, Hongzhang; Williams, Samuel; Jong, Wibe de; Oliker, Leonid

    2014-10-10

    In the multicore era it was possible to exploit the increase in on-chip parallelism by simply running multiple MPI processes per chip. Unfortunately, manycore processors' greatly increased thread- and data-level parallelism coupled with a reduced memory capacity demand an altogether different approach. In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures. As the next NERSC machine will be a self-hosted Intel MIC (Xeon Phi) based supercomputer, we leverage an existing MIC testbed at NERSC to evaluate our experiments. In order to proxy the fact that future MIC machines will not have a host processor, we run all of our experiments in native mode. We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was required to safely and efficiently thread the TEXAS integral package when constructing the Fock matrix. Ultimately, our new MPI+OpenMP hybrid implementations attain up to 65× better performance for the triples part of the CCSD(T) due in large part to the fact that the limited on-card memory limits the existing MPI implementation to a single process per card. Additionally, we obtain up to 1.6× better performance on Fock matrix constructions when compared with the best MPI implementations running multiple processes per card.

  11. Computationally efficient implementation of sarse-tap FIR adaptive filters with tap-position control on intel IA-32 processors

    OpenAIRE

    Hirano, Akihiro; Nakayama, Kenji

    2008-01-01

    This paper presents a computationally efficient implementation of sparse-tap FIR adaptive filters with tap-position control on Intel IA-32 processors with single-instruction multiple-data (SIMD) capability. In order to overcome the random-order memory access which prevents vectorization, a block-based processing and a re-ordering buffer are introduced. A dynamic register allocation and the use of memory-to-register operations help the maximization of the loop-unrolling level. Up to 66 percent speedup ...

  12. Solving Large Quadratic|Assignment Problems in Parallel

    DEFF Research Database (Denmark)

    Clausen, Jens; Perregaard, Michael

    1997-01-01

    … and recalculation of bounds between branchings when used in a parallel Branch-and-Bound algorithm. The algorithm has been implemented on a 16-processor MEIKO Computing Surface with Intel i860 processors. Computational results from the solution of a number of large QAPs, including the classical Nugent 20 … … processors, and have hence not been ideally suited for computations essentially involving non-vectorizable computations on integers. In this paper we investigate the combination of one of the best bound functions for a Branch-and-Bound algorithm (the Gilmore-Lawler bound) and various testing, variable binding …

  13. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    OpenAIRE

    H. Wang; H. Wang; H. Wang; H. Wang; H. Chen; H. Chen; Q. Wu; Q. Wu; J. Lin; X. Chen; X. Xie; R. Wang; R. Wang; X. Tang; Z. Wang

    2017-01-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (code...

  14. Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

    CERN Document Server

    Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

    2014-01-01

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).

  15. Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

    Science.gov (United States)

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

    2015-05-01

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).

  16. Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

    International Nuclear Information System (INIS)

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Muzaffar, Shahzad; Knight, Robert

    2015-01-01

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG). (paper)

  17. Evaluation of the Intel iWarp parallel processor for space flight applications

    Science.gov (United States)

    Hine, Butler P., III; Fong, Terrence W.

    1993-01-01

    The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.

  18. Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu

    2013-12-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers: IBM POWER4, POWER5+ and BlueGene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel's MPI benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore supercomputers. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore supercomputers. © 2013 Elsevier Inc.

  19. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™

    Science.gov (United States)

    Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.

    2016-01-01

    We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations. PMID:27298591
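
    The pattern itself is compact: seed a queue with active elements, then repeatedly pop an element and try to propagate its value to its neighbours, re-queueing any neighbour that changes. A 1-D grayscale-reconstruction skeleton in C is sketched below; the paper's versions are multi-dimensional, vectorized, and may use a priority queue:

        #include <stdlib.h>

        /* 1-D IWPP skeleton: grayscale reconstruction of `marker` under
         * `mask` (marker <= mask elementwise). All pixels are seeded for
         * simplicity; real implementations seed only the initial
         * wavefront after raster scans. */
        void iwpp_1d(int n, unsigned char *marker, const unsigned char *mask)
        {
            size_t cap = (size_t)n, head = 0, tail = 0;
            int *queue = malloc(cap * sizeof *queue);
            for (int i = 0; i < n; ++i) queue[tail++] = i;
            while (head < tail) {
                int i = queue[head++];
                for (int d = -1; d <= 1; d += 2) {   /* both neighbours */
                    int j = i + d;
                    if (j < 0 || j >= n) continue;
                    unsigned char v = marker[i] < mask[j] ? marker[i] : mask[j];
                    if (v > marker[j]) {             /* wavefront advances */
                        marker[j] = v;
                        if (tail == cap) {           /* grow the queue */
                            cap *= 2;
                            queue = realloc(queue, cap * sizeof *queue);
                        }
                        queue[tail++] = j;
                    }
                }
            }
            free(queue);
        }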

  20. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™.

    Science.gov (United States)

    Gomes, Jeremias M; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H

    2015-10-01

    We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations.

  1. MPIRUN: A Portable Loader for Multidisciplinary and Multi-Zonal Applications

    Science.gov (United States)

    Fineberg, Samuel A.; Woodrow, Thomas S. (Technical Monitor)

    1994-01-01

    Multidisciplinary and multi-zonal applications are an important class of applications in the area of Computational Aerosciences. In these codes, two or more distinct parallel programs or copies of a single program are utilized to model a single problem. To support such applications, it is common to use a programming model where a program is divided into several single program multiple data stream (SPMD) applications, each of which solves the equations for a single physical discipline or grid zone. These SPMD applications are then bound together to form a single multidisciplinary or multi-zonal program in which the constituent parts communicate via point-to-point message passing routines. One method for implementing the message passing portion of these codes is with the new Message Passing Interface (MPI) standard. Unfortunately, this standard only specifies the message passing portion of an application, but does not specify any portable mechanisms for loading an application. MPIRUN was developed to provide a portable means for loading MPI programs, and was specifically targeted at multidisciplinary and multi-zonal applications. Programs using MPIRUN for loading and MPI for message passing are then portable between all machines supported by MPIRUN. MPIRUN is currently implemented for the Intel iPSC/860, TMC CM5, IBM SP-1 and SP-2, Intel Paragon, and workstation clusters. Further, MPIRUN is designed to be simple enough to port easily to any system supporting MPI.
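
    MPIRUN's own loading mechanism is not detailed in the abstract, but the programming model it serves can be sketched with standard MPI calls: the ranks of one job are partitioned into per-discipline (or per-zone) SPMD groups, each with its own communicator, while cross-zone traffic uses point-to-point messages in MPI_COMM_WORLD. The half-and-half zone assignment below is an illustrative assumption, not MPIRUN's actual mechanism.

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int world_rank, world_size;
            MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
            MPI_Comm_size(MPI_COMM_WORLD, &world_size);

            // Assumed rule: the first half of the ranks model zone 0, the rest zone 1.
            int zone = (world_rank < world_size / 2) ? 0 : 1;
            MPI_Comm zone_comm;
            MPI_Comm_split(MPI_COMM_WORLD, zone, world_rank, &zone_comm);

            int zone_rank;
            MPI_Comm_rank(zone_comm, &zone_rank);          // rank within my SPMD group
            std::printf("world rank %d -> zone %d, zone rank %d\n",
                        world_rank, zone, zone_rank);

            MPI_Comm_free(&zone_comm);
            MPI_Finalize();
            return 0;
        }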

  2. Plasma Science and Applications at the Intel Science Fair: A Retrospective

    Science.gov (United States)

    Berry, Lee

    2009-11-01

    For the past five years, the Coalition for Plasma Science (CPS) has presented an award for a plasma project at the Intel International Science and Engineering Fair (ISEF). Eligible projects have ranged from grape-based plasma production in a microwave oven to observation of the effects of viscosity in a fluid model of quark-gluon plasma. Most projects have been aimed at applications, including fusion, thrusters, lighting, materials processing, and GPS improvements. However diagnostics (spectroscopy), technology (magnets), and theory (quark-gluon plasmas) have also been represented. All of the CPS award-winning projects so far have been based on experiments, with two awards going to women students and three to men. Since the award was initiated, both the number and quality of plasma projects has increased. The CPS expects this trend to continue, and looks forward to continuing its work with students who are excited about the possibilities of plasma. You too can share this excitement by judging at the 2010 fair in San Jose on May 11-12.

  3. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    Science.gov (United States)

    Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration

    2017-10-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.

  4. La responsabilitat davant la intel·ligència artificial en el comerç electrònic

    OpenAIRE

    Martín i Palomas, Elisabet

    2015-01-01

    This thesis considers the effect on liability arising from actions carried out autonomously by systems endowed with artificial intelligence, without the direct participation of any human being, in the areas most directly related to electronic commerce. To this end, it analyzes the activities carried out by some of the main international electronic commerce companies, such as the American group eBay and the Chinese group Alibaba. After developing the prin...

  5. Performance Evaluation of Multithreaded Geant4 Simulations Using an Intel Xeon Phi Cluster

    Directory of Open Access Journals (Sweden)

    P. Schweitzer

    2015-01-01

    Full Text Available The objective of this study is to evaluate the performance of Intel Xeon Phi hardware accelerators for Geant4 simulations, especially for multithreaded applications. We present the complete methodology to guide users through the compilation of their Geant4 applications on Phi processors. Then, we propose a series of benchmarks to compare the performance of Xeon CPUs and Phi processors for a Geant4 example dedicated to the simulation of electron dose point kernels, the TestEm12 example. First, we compare a distributed execution of a sequential version of the Geant4 example on both architectures before evaluating the multithreaded version of the Geant4 example. While Phi processors demonstrated their ability to accelerate computing time (up to a factor of 3.83) when distributing sequential Geant4 simulations, we do not reach the same level of speedup with the multithreaded version of the Geant4 example.

  6. Parallelization of 2-D lattice Boltzmann codes

    International Nuclear Information System (INIS)

    Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.

    1996-03-01

    Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. High parallel efficiencies of 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid are obtained. Performance models are developed to analyze the performance of the LB codes. Our performance models show that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability while keeping the available memory size of one processor element at its maximum. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method in general has a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author)
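
    A hedged sketch of the communication step that a 1-D domain decomposition implies for such a code: each rank owns a slab of grid rows plus one ghost row on each side, and swaps boundary rows with its neighbors every time step. The layout is an illustrative assumption, not the actual VPP500 or Paragon implementation.

        #include <mpi.h>
        #include <vector>

        // Halo exchange for a 1-D (row-slab) decomposition. Row 0 and row
        // local_rows+1 are ghost rows; rows 1..local_rows are owned.
        void exchange_halos(std::vector<double>& slab, int row_len,
                            int local_rows, MPI_Comm comm) {
            int rank, size;
            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &size);
            int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
            int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

            double* first_owned = &slab[1 * row_len];
            double* last_owned  = &slab[static_cast<std::size_t>(local_rows) * row_len];
            double* top_ghost   = &slab[0];
            double* bot_ghost   = &slab[static_cast<std::size_t>(local_rows + 1) * row_len];

            // Send my first owned row up; receive my bottom ghost from below.
            MPI_Sendrecv(first_owned, row_len, MPI_DOUBLE, up,   0,
                         bot_ghost,   row_len, MPI_DOUBLE, down, 0,
                         comm, MPI_STATUS_IGNORE);
            // Send my last owned row down; receive my top ghost from above.
            MPI_Sendrecv(last_owned,  row_len, MPI_DOUBLE, down, 1,
                         top_ghost,   row_len, MPI_DOUBLE, up,   1,
                         comm, MPI_STATUS_IGNORE);
        }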

  7. Parallelization of 2-D lattice Boltzmann codes

    Energy Technology Data Exchange (ETDEWEB)

    Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo

    1996-03-01

    Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. High parallel efficiencies of 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid are obtained. Performance models are developed to analyze the performance of the LB codes. Our performance models show that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability while keeping the available memory size of one processor element at its maximum. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method in general has a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author).

  8. Large Scale GW Calculations on the Cori System

    Science.gov (United States)

    Deslippe, Jack; Del Ben, Mauro; da Jornada, Felipe; Canning, Andrew; Louie, Steven

    The NERSC Cori system, powered by 9000+ Intel Xeon-Phi processors, represents one of the largest HPC systems for open-science in the United States and the world. We discuss the optimization of the GW methodology for this system, including both node level and system-scale optimizations. We highlight multiple large scale (thousands of atoms) case studies and discuss both absolute application performance and comparison to calculations on more traditional HPC architectures. We find that the GW method is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism across many layers of the system. This work was supported by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, as part of the Computational Materials Sciences Program.

  9. Modeling high-temperature superconductors and metallic alloys on the Intel IPSC/860

    Science.gov (United States)

    Geist, G. A.; Peyton, B. W.; Shelton, W. A.; Stocks, G. M.

    Oak Ridge National Laboratory has embarked on several computational Grand Challenges, which require the close cooperation of physicists, mathematicians, and computer scientists. One of these projects is the determination of the material properties of alloys from first principles and, in particular, the electronic structure of high-temperature superconductors. While the present focus of the project is on superconductivity, the approach is general enough to permit study of other properties of metallic alloys such as strength and magnetic properties. This paper describes the progress to date on this project. We include a description of a self-consistent KKR-CPA method, parallelization of the model, and the incorporation of a dynamic load balancing scheme into the algorithm. We also describe the development and performance of a consolidated KKR-CPA code capable of running on CRAYs, workstations, and several parallel computers without source code modification. Performance of this code on the Intel iPSC/860 is also compared to a CRAY 2, CRAY YMP, and several workstations. Finally, some density of state calculations of two perovskite superconductors are given.

  10. Application of Intel Many Integrated Core (MIC) architecture to the Yonsei University planetary boundary layer scheme in Weather Research and Forecasting model

    Science.gov (United States)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Weather Research and Forecasting (WRF) model has provided operational services worldwide in many areas and is linked to our daily activities, in particular during severe weather events. The Yonsei University (YSU) scheme is one of the planetary boundary layer (PBL) models in WRF. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transports in the whole atmospheric column; it determines the flux profiles within the well-mixed boundary layer and the stable layer, and thus provides atmospheric tendencies of temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. The YSU scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. To accelerate the computation process of the YSU scheme, we employ the Intel Many Integrated Core (MIC) architecture, a manycore processor design whose merits are efficient parallelization and vectorization. Our results show that the MIC-based optimization improved the performance of the first version of the multi-threaded code on the Xeon Phi 5110P by a factor of 2.4x. Furthermore, the same CPU-based optimizations improved the performance on the Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of the multi-threaded code.
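
    A minimal sketch of this embarrassingly parallel structure, assuming a hypothetical per-column routine ysu_column() standing in for the real physics: because columns never interact horizontally, the horizontal loop can be threaded (and, on MIC hardware, vectorized) with no communication.

        #include <cstddef>
        #include <vector>

        // Hypothetical stand-in for the real per-column YSU physics: a trivial
        // vertical smoother, present only so the sketch is self-contained.
        static void ysu_column(double* col, int nlev) {
            for (int k = 1; k < nlev; ++k) col[k] = 0.5 * (col[k] + col[k - 1]);
        }

        // Each horizontal grid point owns an independent vertical column, so the
        // horizontal loop is trivially parallel across threads.
        void pbl_tendencies(std::vector<double>& field, int ncols, int nlev) {
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < ncols; ++i)
                ysu_column(&field[static_cast<std::size_t>(i) * nlev], nlev);
        }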

  11. Investigation of Large Scale Cortical Models on Clustered Multi-Core Processors

    Science.gov (United States)

    2013-02-01

    The Playstation 3, with 6 available SPU cores, outperforms the Intel Xeon processor (with 4 cores) by about 1.9 times for the HTM model and by 2.4 times for the Dean model. Runtime breakdowns of the HTM and Dean models are reported on the Cell processor (in the Playstation 3) and on the Intel Xeon processor (4 threads).

  12. Evaluation of CHO Benchmarks on the Arria 10 FPGA using Intel FPGA SDK for OpenCL

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Zheming [Argonne National Lab. (ANL), Argonne, IL (United States); Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Finkel, Hal [Argonne National Lab. (ANL), Argonne, IL (United States); Cappello, Franck [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-05-23

    The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing systems. OpenCL extends the C-based programming language for developing portable codes on different platforms such as CPUs, graphics processing units (GPUs), digital signal processors (DSPs) and field-programmable gate arrays (FPGAs). The Intel FPGA SDK for OpenCL is a suite of tools that allows developers to abstract away the complex FPGA-based development flow in favor of a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes FPGA-based development more accessible to software users as the need for hybrid computing using CPUs and FPGAs increases. It can also significantly reduce the hardware development time, as users can evaluate different ideas in a high-level language without deep FPGA domain knowledge. Benchmarking an OpenCL-based framework is an effective way of analyzing the performance of a system by studying the execution of the benchmark applications. CHO is a suite of benchmark applications that provides support for OpenCL [1]. The authors presented CHO as an OpenCL port of the CHStone benchmark. Using the Altera OpenCL (AOCL) compiler to synthesize the benchmark applications, they listed the resource usage and performance of each kernel that could be successfully synthesized by the compiler. In this report, we evaluate the resource usage and performance of the CHO benchmark applications using the Intel FPGA SDK for OpenCL and a Nallatech 385A FPGA board that features an Arria 10 FPGA device. The focus of the report is to gain a better understanding of the resource usage and performance of the kernel implementations on Arria 10 FPGA devices compared to Stratix V FPGA devices. In addition, we also gain knowledge about the limitations of the current compiler when it fails to synthesize a benchmark

  13. 76 FR 20674 - Tribal Consultation Meetings

    Science.gov (United States)

    2011-04-13

    ...--Paragon Casino Resort, 6773 East Tunica Drive, Marksville, LA 71351. FOR FURTHER INFORMATION CONTACT... will take place Thursday, May 19, 2011, at the Paragon Casino Resort in Marksville, Louisiana...

  14. Inflatable Habitat with Integrated Primary and Secondary Structure, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Paragon Space Development Corp (Paragon) and Thin Red Line Aerospace (TRLA) proposes to explore the utilization of inflatable structures by designing a habitation...

  15. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    Science.gov (United States)

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  16. A dynamic programming approach for quickly estimating large network-based MEV models

    DEFF Research Database (Denmark)

    Mai, Tien; Frejinger, Emma; Fosgerau, Mogens

    2017-01-01

    We propose a way to estimate a family of static Multivariate Extreme Value (MEV) models with large choice sets in short computational time. The resulting model is also straightforward and fast to use for prediction. Following Daly and Bierlaire (2006), the correlation structure is defined by a ro...... to converge (4.3 h on an Intel(R) 3.2 GHz machine using a non-parallelized code). We also show that our approach allows to estimate a cross-nested logit model of 111 nests with a real data set of more than 100,000 observations in 14 h....

  17. Performance Issues in High Performance Fortran Implementations of Sensor-Based Applications

    Directory of Open Access Journals (Sweden)

    David R. O'hallaron

    1997-01-01

    Full Text Available Applications that get their inputs from sensors are an important and often overlooked application domain for High Performance Fortran (HPF). Such sensor-based applications typically perform regular operations on dense arrays, and often have latency and throughput requirements that can only be achieved with parallel machines. This article describes a study of sensor-based applications, including the fast Fourier transform, synthetic aperture radar imaging, narrowband tracking radar processing, multibaseline stereo imaging, and medical magnetic resonance imaging. The applications are written in a dialect of HPF developed at Carnegie Mellon, and are compiled by the Fx compiler for the Intel Paragon. The main results of the study are that (1) it is possible to realize good performance for realistic sensor-based applications written in HPF and (2) the performance of the applications is determined by the performance of three core operations: independent loops (i.e., loops with no dependences between iterations), reductions, and index permutations. The article discusses the implications for HPF implementations and introduces some simple tests that implementers and users can use to measure the efficiency of the loops, reductions, and index permutations generated by an HPF compiler.
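
    Toy versions of the three core operations, written with OpenMP rather than HPF for illustration; the sizes and data layouts are assumptions.

        #include <vector>

        // 1. Independent loop: no dependences between iterations.
        void scale(std::vector<double>& a, double s) {
            #pragma omp parallel for
            for (std::size_t i = 0; i < a.size(); ++i) a[i] *= s;
        }

        // 2. Reduction: all iterations combine into a single value.
        double sum(const std::vector<double>& a) {
            double total = 0.0;
            #pragma omp parallel for reduction(+ : total)
            for (std::size_t i = 0; i < a.size(); ++i) total += a[i];
            return total;
        }

        // 3. Index permutation: pure data movement, e.g. the bit-reversal step
        //    of an FFT. Writes are disjoint when perm is a true permutation.
        void permute(const std::vector<double>& in, std::vector<double>& out,
                     const std::vector<std::size_t>& perm) {
            #pragma omp parallel for
            for (std::size_t i = 0; i < in.size(); ++i) out[perm[i]] = in[i];
        }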

  18. Mapping robust parallel multigrid algorithms to scalable memory architectures

    Science.gov (United States)

    Overman, Andrea; Vanrosendale, John

    1993-01-01

    The convergence rate of standard multigrid algorithms degenerates on problems with stretched grids or anisotropic operators. The usual cure for this is the use of line or plane relaxation. However, multigrid algorithms based on line and plane relaxation have limited and awkward parallelism and are quite difficult to map effectively to highly parallel architectures. Newer multigrid algorithms that overcome anisotropy through the use of multiple coarse grids rather than relaxation are better suited to massively parallel architectures because they require only simple point-relaxation smoothers. In this paper, we look at the parallel implementation of a V-cycle multiple semicoarsened grid (MSG) algorithm on distributed-memory architectures such as the Intel iPSC/860 and Paragon computers. The MSG algorithms provide two levels of parallelism: parallelism within the relaxation or interpolation on each grid and across the grids on each multigrid level. Both levels of parallelism must be exploited to map these algorithms effectively to parallel architectures. This paper describes a mapping of an MSG algorithm to distributed-memory architectures that demonstrates how both levels of parallelism can be exploited. The result is a robust and effective multigrid algorithm for distributed-memory machines.

  19. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    Science.gov (United States)

    Dey, T.; Rodrigue, P.

    2015-07-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computationally intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel, and the 10^3 to 10^4 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector, and in a second step the sum of the radiological path, taking into account attenuation, is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization on both the Xeon Phi and the host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware-specific intrinsic instructions (so-called 'intrinsics') to allow manually-optimized vectorization. For parallelization, both OpenMP and ISPC tasking (based on pthreads) are evaluated. Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The examination
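
    A hedged sketch of the parallelization pattern described: threads over voxels, SIMD lanes over sample rays. detection_prob() is a hypothetical placeholder for the Embree-based ray tracing and radiological-path sum, not the authors' code.

        #include <cmath>
        #include <vector>

        // Placeholder for the per-(voxel, ray) detection probability; in the real
        // code this involves ray tracing against the detector and attenuation.
        static inline double detection_prob(int voxel, int ray) {
            return std::exp(-0.001 * ((voxel % 97) + ray));  // assumed toy model
        }

        void sensitivity_map(std::vector<double>& sens, int nvoxels, int nrays) {
            #pragma omp parallel for schedule(dynamic)       // threads over voxels
            for (int v = 0; v < nvoxels; ++v) {
                double s = 0.0;
                #pragma omp simd reduction(+ : s)            // vector lanes over rays
                for (int r = 0; r < nrays; ++r) s += detection_prob(v, r);
                sens[v] = s / nrays;
            }
        }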

  20. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    International Nuclear Information System (INIS)

    Dey, T.; Rodrigue, P.

    2015-01-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computationally intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel, and the 10^3 to 10^4 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector, and in a second step the sum of the radiological path, taking into account attenuation, is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization on both the Xeon Phi and the host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware-specific intrinsic instructions (so-called 'intrinsics') to allow manually-optimized vectorization. For parallelization, both OpenMP and ISPC tasking (based on pthreads) are evaluated. Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The

  1. 75 FR 26791 - General Motors Company, Formerly Known as General Motors Corporation, Mansfield Metal Center...

    Science.gov (United States)

    2010-05-12

    ...-Uniform Service, Cjbf, Llc, Ferrous Processing & Trading Co., Paragon Technologies and Severn Trent... leased from Aramark-Uniform Service, CJBF, LLC, Ferrous Processing & Trading Co., Paragon Technologies... Technologies and Severn Trent Services working on site at the Mansfield Metal Center, Mansfield, Ohio location...

  2. Analysis of the Intel 386 and i486 microprocessors for the Space Station Freedom Data Management System

    Science.gov (United States)

    Liu, Yuan-Kwei

    1991-01-01

    The feasibility of upgrading the Intel 386 microprocessor, which has been proposed as the baseline processor for the Space Station Freedom (SSF) Data Management System (DMS), to the more advanced i486 microprocessor is analyzed. The items compared between the two processors include the instruction set architecture, power consumption, the MIL-STD-883C Class S (Space) qualification schedule, and performance. The advantages of the i486 over the 386 are (1) lower power consumption and (2) higher floating point performance. The i486 on-chip cache does not have parity check or error detection and correction circuitry. The i486 with on-chip cache disabled, however, has lower integer performance than the 386 without cache, which is the current DMS design choice. Adding cache to the 386/386 DX memory hierarchy appears to be the most beneficial change to the current DMS design at this time.

  3. 75 FR 35030 - Formations of, Acquisitions by, and Mergers of Bank Holding Companies

    Science.gov (United States)

    2010-06-21

    ... outstanding shares of Paragon Commercial Corporation, and its subsidiary, Paragon Commercial Bank, both of... FEDERAL RESERVE SYSTEM Formations of, Acquisitions by, and Mergers of Bank Holding Companies The companies listed in this notice have applied to the Board for approval, pursuant to the Bank Holding Company...

  4. Mesa de coordenadas cartesianas (x,y para la perforación de materiales por medio de un microcontrolador 8051 de intel

    Directory of Open Access Journals (Sweden)

    Omar Yesid Flórez-Prada

    2001-01-01

    Full Text Available In our environment we are surrounded by a number of electronic systems that perform automatic operations according to parameters previously programmed by the operator. This paper presents the prototype of a two-coordinate table (Cartesian plane, X-Y) for drilling materials, which uses a development system based on the Intel® 8051 microcontroller. The system operates by sending the respective control commands to position the tool at different points in the work area of the table; the points are programmed beforehand by the operator interacting with the keyboard. To produce the movements of the table along X and Y, actuator devices are used that perform a linear motion, moving the tool the specified distance.

  5. 78 FR 38617 - Procedures for Establishing That an American Indian Group Exists as an Indian Tribe

    Science.gov (United States)

    2013-06-27

    ..... Seven Feathers Casino Resort, 146 Chief Miwaleta Lane, Canyonville, OR 97417, (541) 839-1111. July 25..., (800) 624-5572. July 29, 2013 9 a.m.-12 p.m 1 p.m.-4 p.m Petosky, Michigan.... Odawa Casino Resort... 9 a.m.-12 p.m 1 p.m.-4 p.m Marksville, Louisiana Paragon Casino Resort, 711 Paragon Place...

  6. Comparison of two accelerators for Monte Carlo radiation transport calculations, Nvidia Tesla M2090 GPU and Intel Xeon Phi 5110p coprocessor: A case study for X-ray CT imaging dose calculation

    International Nuclear Information System (INIS)

    Liu, T.; Xu, X.G.; Carothers, C.D.

    2015-01-01

    Highlights: • A new Monte Carlo photon transport code ARCHER-CT for CT dose calculations is developed to execute on the GPU and coprocessor. • ARCHER-CT is verified against MCNP. • The GPU code on an Nvidia M2090 GPU is 5.15–5.81 times faster than the parallel CPU code on an Intel X5650 6-core CPU. • The coprocessor code on an Intel Xeon Phi 5110p coprocessor is 3.30–3.38 times faster than the CPU code. - Abstract: Hardware accelerators are currently becoming increasingly important in boosting high performance computing systems. In this study, we tested the performance of two accelerator models, Nvidia Tesla M2090 GPU and Intel Xeon Phi 5110p coprocessor, using a new Monte Carlo photon transport package called ARCHER-CT we have developed for fast CT imaging dose calculation. The package contains three components, ARCHER-CT_CPU, ARCHER-CT_GPU and ARCHER-CT_COP, designed to be run on the multi-core CPU, GPU and coprocessor architectures respectively. A detailed GE LightSpeed Multi-Detector Computed Tomography (MDCT) scanner model and a family of voxel patient phantoms are included in the code to calculate absorbed dose to radiosensitive organs under user-specified scan protocols. The results from ARCHER agree well with those from the production code Monte Carlo N-Particle eXtended (MCNPX). It is found that all the code components are significantly faster than the parallel MCNPX run on 12 MPI processes, and that the GPU and coprocessor codes are 5.15–5.81 and 3.30–3.38 times faster than the parallel ARCHER-CT_CPU, respectively. The M2090 GPU performs better than the 5110p coprocessor in our specific test. Besides, the heterogeneous computation mode in which the CPU and the hardware accelerator work concurrently can increase the overall performance by 13–18%

  7. Evaluation of a capillary zone electrophoresis system versus a conventional agarose gel system for routine serum protein separation and monoclonal component typing.

    Science.gov (United States)

    Roudiere, L; Boularan, A M; Bonardet, A; Vallat, C; Cristol, J P; Dupuy, A M

    2006-01-01

    Capillary zone electrophoresis of serum proteins is increasingly gaining impact in clinical laboratories. During 2003, we compared the fully automated capillary electrophoresis (CE) system from Beckman (Paragon CZE 2000) with the agarose gel electrophoresis method from Sebia (Hydrasis-Hyris, AGE). This new study focused on the evaluation of analytical performance and a comparison including 115 fresh routine samples (group A) and a series of 97 frozen pathologic sera with suspicion of monoclonal protein (group B). Coefficients of variation (CVs %) for the five classical protein fractions were consistently low. Of the pathologic serum samples (group B), there were 90 in which we detected a monoclonal protein by immunofixation (IF) (immunosubtraction (IS) was not used). AGE and Paragon 2000 failed to detect 7 and 12 monoclonal proteins, respectively, leading to concordances of 92% for AGE and 87% for Paragon 2000 in identifying electrophoretic abnormalities in this group. Beta-globulin abnormalities and M paraprotein were well detected with Paragon 2000. Only 81% (21 vs 26) of the gammopathies were immunotyped with IS by two readers blinded to the IF immunotype. The Paragon 2000 is a reliable alternative to conventional agarose gel electrophoresis, combining the advantages of full automation (rapidity, ease of use and cost) with high analytical performance. Qualified interpretation of results requires an adaptation period, which could further improve concordance between the methods. Recently, this CE system has been improved by the manufacturer (Beckman) concerning the migration buffer and detection of beta-globulin abnormalities.

  8. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated

  9. Portable, parallel, reusable Krylov space codes

    Energy Technology Data Exchange (ETDEWEB)

    Smith, B.; Gropp, W. [Argonne National Lab., IL (United States)

    1994-12-31

    Krylov space accelerators are an important component of many algorithms for the iterative solution of linear systems. Each Krylov space method has its own particular advantages and disadvantages, therefore it is desirable to have a variety of them available, all with an identical, easy-to-use interface. A common complaint application programmers have with available software libraries for the iterative solution of linear systems is that they require the programmer to use the data structures provided by the library. The library is not able to work with the data structures of the application code. Hence, application programmers find themselves constantly recoding the Krylov space algorithms. The Krylov space package (KSP) is a data-structure-neutral implementation of a variety of Krylov space methods including preconditioned conjugate gradient, GMRES, BiCG-Stab, transpose-free QMR and CGS. Unlike all other software libraries for linear systems that the authors are aware of, KSP will work with any application code's data structures, in Fortran or C. Due to its data-structure-neutral design KSP runs unchanged on both sequential and parallel machines. KSP has been tested on workstations, the Intel i860 and Paragon, Thinking Machines CM-5 and the IBM SP1.
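
    The data-structure-neutral idea can be sketched by writing the Krylov loop against a user-supplied operator only: the solver never touches the matrix, so the application keeps its own storage format. The code below is plain unpreconditioned conjugate gradient over std::vector, a stand-in for the user's own types; it illustrates the design, not KSP's actual interface.

        #include <cmath>
        #include <functional>
        #include <vector>

        using Vec = std::vector<double>;
        using MatVec = std::function<void(const Vec&, Vec&)>;  // y = A*x, user-supplied

        static double dot(const Vec& a, const Vec& b) {
            double s = 0.0;
            for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
            return s;
        }

        // Unpreconditioned conjugate gradient: only the callback knows the matrix.
        void cg(const MatVec& A, const Vec& b, Vec& x, int maxit, double tol) {
            Vec r = b, p = b, Ap(b.size());
            A(x, Ap);                                          // r = b - A*x
            for (std::size_t i = 0; i < r.size(); ++i) { r[i] -= Ap[i]; p[i] = r[i]; }
            double rr = dot(r, r);
            for (int k = 0; k < maxit && std::sqrt(rr) > tol; ++k) {
                A(p, Ap);
                double alpha = rr / dot(p, Ap);
                for (std::size_t i = 0; i < x.size(); ++i) {
                    x[i] += alpha * p[i];
                    r[i] -= alpha * Ap[i];
                }
                double rr_new = dot(r, r);
                double beta = rr_new / rr;                     // Fletcher-Reeves update
                rr = rr_new;
                for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
            }
        }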

  10. Using Kokkos for Performant Cross-Platform Acceleration of Liquid Rocket Simulations

    Science.gov (United States)

    2017-05-08

    User-defined functors (like Thrust or Intel TBB); backends for Nvidia GPU, Intel Xeon, Xeon Phi, IBM Power8, and others; a "View" data structure that provides optimal data layout. The Kokkos framework described is designed for minimally-invasive operation alongside a large Fortran code: everything is controlled from Fortran through a thin interface that handles Kokkos initialization/finalization (void initialize(...); void finalize(...); TVProperties* gettvproperties();).
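
    A minimal Kokkos example of the portable-kernel pattern these fragments describe; this uses the public Kokkos API, not the briefing's Fortran-driven wrapper.

        #include <Kokkos_Core.hpp>

        int main(int argc, char** argv) {
            Kokkos::initialize(argc, argv);
            {
                const int n = 1 << 20;
                Kokkos::View<double*> x("x", n), y("y", n);  // backend-resident arrays
                const double a = 2.0;
                Kokkos::deep_copy(x, 1.0);                   // fill inputs
                Kokkos::deep_copy(y, 0.0);
                // The same lambda compiles to whichever backend (CUDA, OpenMP,
                // Xeon Phi, ...) the build selects.
                Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
                    y(i) += a * x(i);
                });
                Kokkos::fence();
            }
            Kokkos::finalize();
            return 0;
        }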

  11. A Monte Carlo study of the ''minus sign problem'' in the t-J model using an intel IPSC/860 hypercube

    International Nuclear Information System (INIS)

    Kovarik, M.D.; Barnes, T.; Tennessee Univ., Knoxville, TN

    1993-01-01

    We describe a Monte Carlo simulation of the 2-dimensional t-J model on an Intel iPSC/860 hypercube. The problem studied is the determination of the dispersion relation of a dynamical hole in the t-J model of the high temperature superconductors. Since this problem involves the motion of many fermions in more than one spatial dimension, it is representative of the class of systems that suffer from the ''minus sign problem'' of dynamical fermions which has made Monte Carlo simulation very difficult. We demonstrate that for small values of the hole hopping parameter one can extract the entire hole dispersion relation using the GRW Monte Carlo algorithm, which is a simulation of the Euclidean time Schroedinger equation, and present results on 4 x 4 and 6 x 6 lattices. We demonstrate that a qualitative picture at higher hopping parameters may be found by extrapolating weak hopping results where the minus sign problem is less severe. Generalization to physical hopping parameter values will only require use of an improved trial wavefunction for importance sampling

  12. Quantum Chemical Calculations Using Accelerators: Migrating Matrix Operations to the NVIDIA Kepler GPU and the Intel Xeon Phi.

    Science.gov (United States)

    Leang, Sarom S; Rendell, Alistair P; Gordon, Mark S

    2014-03-11

    Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special purpose devices or accelerators connected via an external interface such as a PCI bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Phi are two examples of such accelerators. Accelerators offer peak performances that can be well above those of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double precision general matrix multiply operations are endemic in electronic structure calculations, especially methods that include electron correlation, such as density functional theory, second order perturbation theory, and coupled cluster theory. The use of approaches that automatically determine whether to use the host or an accelerator, based on problem size, is explored, with computations that are occurring on the accelerator and/or the host. For data-transfers over PCI-e, the GPU provides the best overall performance for data sizes up to 4096 MB with consistent upload and download rates between 5-5.6 GB/s and 5.4-6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
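
    A sketch of the size-based dispatch idea, with the two backends passed in as callables (in practice a CPU BLAS and a GPU/Phi BLAS); the flop-count crossover threshold is an assumed tuning parameter, not a value from the paper.

        #include <cstdint>

        using GemmFn = void (*)(int m, int n, int k,
                                const double* A, const double* B, double* C);

        // Route a DGEMM to the host or the accelerator based on problem size:
        // small products stay on the host because PCI-e transfer overhead
        // dominates; large products have enough work to amortize the transfers.
        void dgemm_auto(int m, int n, int k,
                        const double* A, const double* B, double* C,
                        GemmFn host_gemm, GemmFn accel_gemm) {
            const std::uint64_t flops = 2ULL * m * n * k;    // 2*m*n*k flops
            const std::uint64_t crossover = 1ULL << 30;      // assumed tuning point
            if (flops < crossover)
                host_gemm(m, n, k, A, B, C);                 // transfer cost dominates
            else
                accel_gemm(m, n, k, A, B, C);                // compute dominates
        }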

  13. Heat dissipation for the Intel Core i5 processor using multiwalled carbon-nanotube-based ethylene glycol

    Energy Technology Data Exchange (ETDEWEB)

    Thang, Bui Hung; Trinh, Pham Van; Quang, Le Dinh; Khoi, Phan Hong; Minh, Phan Ngoc [Vietnam Academy of Science and Technology, Ho Chi Minh CIty (Viet Nam); Huong, Nguyen Thi [Hanoi University of Science, Hanoi (Viet Nam); Vietnam National University, Hanoi (Viet Nam)

    2014-08-15

    Carbon nanotubes (CNTs) are some of the most valuable materials with high thermal conductivity. The thermal conductivity of individual multiwalled carbon nanotubes (MWCNTs) grown by using chemical vapor deposition is 600 ± 100 Wm{sup -1}K{sup -1}, compared with the thermal conductivity of 419 Wm{sup -1}K{sup -1} of Ag. Carbon-nanotube-based liquids, a new class of nanomaterials, have shown many interesting properties and distinctive features offering potential in heat dissipation applications for electronic devices, such as computer microprocessors, high power LEDs, etc. In this work, a multiwalled carbon-nanotube-based liquid was made of well-dispersed hydroxyl-functional multiwalled carbon nanotubes (MWCNT-OH) in ethylene glycol (EG)/distilled water (DW) solutions by using Tween-80 surfactant and an ultrasonication method. The concentration of MWCNT-OH in EG/DW solutions ranged from 0.1 to 1.2 gram/liter. The dispersion of the MWCNT-OH-based EG/DW solutions was evaluated by using a Zeta-Sizer analyzer. The MWCNT-OH-based EG/DW solutions were used as coolants in the liquid cooling system for the Intel Core i5 processor. The thermal dissipation efficiency and the thermal response of the system were evaluated by directly measuring the temperature of the micro-processor using the Core Temp software and the temperature sensors built inside the micro-processor. The results confirmed the advantages of CNTs in thermal dissipation systems for computer processors and other high-power electronic devices.

  14. Heat dissipation for the Intel Core i5 processor using multiwalled carbon-nanotube-based ethylene glycol

    International Nuclear Information System (INIS)

    Thang, Bui Hung; Trinh, Pham Van; Quang, Le Dinh; Khoi, Phan Hong; Minh, Phan Ngoc; Huong, Nguyen Thi

    2014-01-01

    Carbon nanotubes (CNTs) are some of the most valuable materials with high thermal conductivity. The thermal conductivity of individual multiwalled carbon nanotubes (MWCNTs) grown by using chemical vapor deposition is 600 ± 100 Wm^-1K^-1, compared with the thermal conductivity of 419 Wm^-1K^-1 of Ag. Carbon-nanotube-based liquids, a new class of nanomaterials, have shown many interesting properties and distinctive features offering potential in heat dissipation applications for electronic devices, such as computer microprocessors, high power LEDs, etc. In this work, a multiwalled carbon-nanotube-based liquid was made of well-dispersed hydroxyl-functional multiwalled carbon nanotubes (MWCNT-OH) in ethylene glycol (EG)/distilled water (DW) solutions by using Tween-80 surfactant and an ultrasonication method. The concentration of MWCNT-OH in EG/DW solutions ranged from 0.1 to 1.2 gram/liter. The dispersion of the MWCNT-OH-based EG/DW solutions was evaluated by using a Zeta-Sizer analyzer. The MWCNT-OH-based EG/DW solutions were used as coolants in the liquid cooling system for the Intel Core i5 processor. The thermal dissipation efficiency and the thermal response of the system were evaluated by directly measuring the temperature of the micro-processor using the Core Temp software and the temperature sensors built inside the micro-processor. The results confirmed the advantages of CNTs in thermal dissipation systems for computer processors and other high-power electronic devices.

  15. Stereoscopic-3D display design: a new paradigm with Intel Adaptive Stable Image Technology [IA-SIT

    Science.gov (United States)

    Jain, Sunil

    2012-03-01

    Stereoscopic-3D (S3D) proliferation on personal computers (PCs) is mired by several technical and business challenges: a) viewing discomfort due to cross-talk amongst stereo images; b) high system cost; and c) restricted content availability. Users expect S3D visual quality to be better than, or at least equal to, what they are used to enjoying on 2D in terms of resolution, pixel density, color, and interactivity. Intel Adaptive Stable Image Technology (IA-SIT) is a foundational technology, successfully developed to resolve S3D system design challenges and deliver high quality 3D visualization at PC price points. Optimizations in the display driver, panel timing firmware, backlight hardware, eyewear optical stack, and sync mechanism combined can help accomplish this goal. Agnostic to refresh rate, IA-SIT will scale with the shrinking of display transistors and improvements in liquid crystal and LED materials. Industry could benefit greatly from the following calls to action: 1) Adopt 'IA-SIT S3D Mode' in panel specs (via VESA) to help panel makers monetize S3D; 2) Adopt 'IA-SIT Eyewear Universal Optical Stack' and algorithm (via CEA) to help PC peripheral makers develop stylish glasses; 3) Adopt 'IA-SIT Real Time Profile' for sub-100uS latency control (via BT Sig) to extend BT into S3D; and 4) Adopt 'IA-SIT Architecture' for monitors and TVs to monetize via PC attach.

  16. Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures

    Science.gov (United States)

    Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.

    2016-12-01

    The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
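
    A sketch of the compute-heavy assignment step of k-means, with the thread/SIMD split such a manycore port would use; the data layout and pragmas are illustrative, not the authors' implementation.

        #include <cfloat>
        #include <cstddef>
        #include <vector>

        // Assign each point to its nearest center. Threads split the points;
        // the inner distance loop is the candidate for wide SIMD lanes.
        void assign_clusters(const std::vector<double>& pts,      // n x dim, row-major
                             const std::vector<double>& centers,  // k x dim, row-major
                             std::vector<int>& label, int n, int k, int dim) {
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < n; ++i) {
                double best = DBL_MAX;
                int best_c = 0;
                const double* p = &pts[static_cast<std::size_t>(i) * dim];
                for (int c = 0; c < k; ++c) {
                    const double* ctr = &centers[static_cast<std::size_t>(c) * dim];
                    double d = 0.0;
                    #pragma omp simd reduction(+ : d)   // vectorized squared distance
                    for (int j = 0; j < dim; ++j) {
                        double diff = p[j] - ctr[j];
                        d += diff * diff;
                    }
                    if (d < best) { best = d; best_c = c; }
                }
                label[i] = best_c;
            }
        }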

  17. Simulating the Euclidean time Schroedinger equations using an Intel iPSC/860 hypercube: Application to the t-J model of high-Tc superconductivity

    International Nuclear Information System (INIS)

    Kovarik, M.D.; Barnes, T.; Tennessee Univ., Knoxville, TN

    1993-01-01

    We describe a Monte Carlo simulation of a dynamical fermion problem in two spatial dimensions on an Intel iPSC/860 hypercube. The problem studied is the determination of the dispersion relation of a dynamical hole in the t-J model of the high temperature superconductors. Since this problem involves the motion of many fermions in more than one spatial dimension, it is representative of the class of systems that suffer from the ''minus sign problem'' of dynamical fermions which has made Monte Carlo simulation very difficult. We demonstrate that for small values of the hole hopping parameter one can extract the entire hole dispersion relation using the GRW Monte Carlo algorithm, which is a simulation of the Euclidean time Schroedinger equation, and present results on 4 x 4 and 6 x 6 lattices. Generalization to physical hopping parameter values will only require use of an improved trial wavefunction for importance sampling

  18. Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture

    KAUST Repository

    AbdulJabbar, Mustafa Abdulmajeed; Al Farhan, Mohammed; Yokota, Rio; Keyes, David E.

    2017-01-01

    Manycore optimizations are essential for achieving performance worthy of anticipated exascale systems. Utilization of manycore chips is inevitable to attain the desired floating point performance of these energy-austere systems. In this work, we revisit ExaFMM, the open source Fast Multipole Method (FMM) library, in light of highly tuned shared-memory parallelization and detailed performance analysis on the new highly parallel Intel manycore architecture, Knights Landing (KNL). We assess scalability and performance gain using task-based parallelism of the FMM tree traversal. We also provide an in-depth analysis of the most computationally intensive part of the traversal kernel (i.e., the particle-to-particle (P2P) kernel), by comparing its performance across KNL and Broadwell architectures. We quantify different configurations that exploit the on-chip 512-bit vector units within different task-based threading paradigms. MPI communication-reducing and NUMA-aware approaches for the FMM’s global tree data exchange are examined with different cluster modes of KNL. By applying several algorithm- and architecture-aware optimizations for FMM, we show that the N-Body kernel on 256 threads of KNL achieves on average 2.8× speedup compared to the non-vectorized version, whereas on 56 threads of Broadwell, it achieves on average 2.9× speedup. In addition, the tree traversal kernel on KNL scales monotonically up to 256 threads with task-based programming models. The MPI-based communication-reducing algorithms show expected improvements of the data locality across the KNL on-chip network.
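
    A toy version of the P2P kernel discussed above: direct interactions between a target leaf and a source leaf, with a flat inner loop over sources of the kind that maps onto KNL's 512-bit vector units. The softening constant and structure-of-arrays layout are illustrative assumptions, not ExaFMM's code.

        #include <cmath>
        #include <vector>

        // Accumulate the potential at each target from all sources (O(n*m) direct sum).
        void p2p(const std::vector<double>& xs, const std::vector<double>& ys,
                 const std::vector<double>& zs, const std::vector<double>& q,
                 const std::vector<double>& xt, const std::vector<double>& yt,
                 const std::vector<double>& zt, std::vector<double>& phi) {
            const double eps2 = 1e-12;                 // assumed softening term
            #pragma omp parallel for schedule(static)  // threads over targets
            for (std::size_t i = 0; i < xt.size(); ++i) {
                double p = 0.0;
                #pragma omp simd reduction(+ : p)      // vector lanes over sources
                for (std::size_t j = 0; j < xs.size(); ++j) {
                    double dx = xt[i] - xs[j];
                    double dy = yt[i] - ys[j];
                    double dz = zt[i] - zs[j];
                    p += q[j] / std::sqrt(dx * dx + dy * dy + dz * dz + eps2);
                }
                phi[i] = p;
            }
        }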

  19. Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture

    KAUST Repository

    AbdulJabbar, Mustafa Abdulmajeed

    2017-07-31

    Manycore optimizations are essential for achieving performance worthy of anticipated exascale systems. Utilization of manycore chips is inevitable to attain the desired floating point performance of these energy-austere systems. In this work, we revisit ExaFMM, the open source Fast Multipole Method (FMM) library, in light of highly tuned shared-memory parallelization and detailed performance analysis on the new highly parallel Intel manycore architecture, Knights Landing (KNL). We assess scalability and performance gain using task-based parallelism of the FMM tree traversal. We also provide an in-depth analysis of the most computationally intensive part of the traversal kernel (i.e., the particle-to-particle (P2P) kernel), by comparing its performance across KNL and Broadwell architectures. We quantify different configurations that exploit the on-chip 512-bit vector units within different task-based threading paradigms. MPI communication-reducing and NUMA-aware approaches for the FMM’s global tree data exchange are examined with different cluster modes of KNL. By applying several algorithm- and architecture-aware optimizations for FMM, we show that the N-Body kernel on 256 threads of KNL achieves on average 2.8× speedup compared to the non-vectorized version, whereas on 56 threads of Broadwell, it achieves on average 2.9× speedup. In addition, the tree traversal kernel on KNL scales monotonically up to 256 threads with task-based programming models. The MPI-based communication-reducing algorithms show expected improvements of the data locality across the KNL on-chip network.

  20. Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu; Taylor, Valerie

    2013-01-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers: IBM POWER4, POWER5+ and BlueGene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel's MPI benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore supercomputers. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore supercomputers. © 2013 Elsevier Inc.

  1. Computation of large covariance matrices by SAMMY on graphical processing units and multicore CPUs

    International Nuclear Information System (INIS)

    Arbanas, G.; Dunn, M.E.; Wiarda, D.

    2011-01-01

    Computational power of Graphical Processing Units and multicore CPUs was harnessed by the nuclear data evaluation code SAMMY to speed up computations of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Library to compute the most time-consuming step. The 235 U RPCM computed previously using a triple-nested loop was re-computed using the NVIDIA implementation of the subroutine on a single Tesla Fermi Graphical Processing Unit, and also using the Intel's Math Kernel Library implementation on two different multicore CPU systems. A multiplication of two matrices of dimensions 16,000×20,000 that had previously taken days, took approximately one minute on the GPU. Comparable performance was achieved on a dual six-core CPU system. The magnitude of the speed-up suggests that these, or similar, combinations of hardware and libraries may be useful for large matrix operations in SAMMY. Uniform interfaces of standard linear algebra libraries make them a promising candidate for a programming framework of a new generation of SAMMY for the emerging heterogeneous computing platforms. (author)

  2. Computation of large covariance matrices by SAMMY on graphical processing units and multicore CPUs

    Energy Technology Data Exchange (ETDEWEB)

    Arbanas, G.; Dunn, M.E.; Wiarda, D., E-mail: arbanasg@ornl.gov, E-mail: dunnme@ornl.gov, E-mail: wiardada@ornl.gov [Oak Ridge National Laboratory, Oak Ridge, TN (United States)

    2011-07-01

    Computational power of Graphical Processing Units and multicore CPUs was harnessed by the nuclear data evaluation code SAMMY to speed up computations of large Resonance Parameter Covariance Matrices (RPCMs). This was accomplished by linking SAMMY to vendor-optimized implementations of the matrix-matrix multiplication subroutine of the Basic Linear Algebra Library to compute the most time-consuming step. The {sup 235}U RPCM computed previously using a triple-nested loop was re-computed using the NVIDIA implementation of the subroutine on a single Tesla Fermi Graphical Processing Unit, and also using the Intel's Math Kernel Library implementation on two different multicore CPU systems. A multiplication of two matrices of dimensions 16,000×20,000 that had previously taken days, took approximately one minute on the GPU. Comparable performance was achieved on a dual six-core CPU system. The magnitude of the speed-up suggests that these, or similar, combinations of hardware and libraries may be useful for large matrix operations in SAMMY. Uniform interfaces of standard linear algebra libraries make them a promising candidate for a programming framework of a new generation of SAMMY for the emerging heterogeneous computing platforms. (author)

  3. Children's Physical Activity While Gardening: Development of a Valid and Reliable Direct Observation Tool.

    Science.gov (United States)

    Myers, Beth M; Wells, Nancy M

    2015-04-01

    Gardens are a promising intervention to promote physical activity (PA) and foster health. However, because of the unique characteristics of gardening, no extant tool can capture PA, postures, and motions that take place in a garden. The Physical Activity Research and Assessment tool for Garden Observation (PARAGON) was developed to assess children's PA levels, tasks, postures, and motions, associations, and interactions while gardening. PARAGON uses momentary time sampling in which a trained observer watches a focal child for 15 seconds and then records behavior for 15 seconds. Sixty-five children (38 girls, 27 boys) at 4 elementary schools in New York State were observed over 8 days. During the observation, children simultaneously wore Actigraph GT3X+ accelerometers. The overall interrater reliability was 88% agreement, and Ebel was .97. Percent agreement values for activity level (93%), garden tasks (93%), motions (80%), associations (95%), and interactions (91%) also met acceptable criteria. Validity was established by previously validated PA codes and by expected convergent validity with accelerometry. PARAGON is a valid and reliable observation tool for assessing children's PA in the context of gardening.

  4. The planned reform of the Intellectual Property Commission of the Ministry of Culture and the "procedure for the restoration of legality", whose handling and resolution is to be entrusted to its proposed Second Section.

    Directory of Open Access Journals (Sweden)

    Pablo Ferrándiz

    2010-07-01

    Full Text Available The second final provision of the Sustainable Economy Bill that the Government has submitted to the Congress of Deputies would create a new Intellectual Property Commission within the Ministry of Culture, to be made up of two sections: the First Section and the Second Section. In particular, the Second Section is to be entrusted with the handling and resolution of a new administrative procedure, called the procedure for the restoration of legality, which may end with the adoption of measures restricting the provision of information society services, such as interrupting the service or removing the data, when those services infringe intellectual property rights, provided that the party responsible acts for profit, direct or indirect, or has caused or is liable to cause economic harm. Although the execution of such measures will require prior judicial authorization in the form of a court order, for which the competent body is the Central Contentious-Administrative Court of the Audiencia Nacional, assigning a public administration the power to resolve disputes or conflicts between private parties over rights (those of intellectual property) of a strictly private nature (that is, property rights, albeit special ones) raises more than a few questions when the matter is examined in light of the principle of the separation of powers enshrined in Article 117 of the Constitution, and calls into question the neutrality of the public administration, since the bill does not duly establish that, in so acting, it serves the general interest with objectivity, as required by Article 103 of the fundamental text.

  5. The kpx, a program analyzer for parallelization

    International Nuclear Information System (INIS)

    Matsuyama, Yuji; Orii, Shigeo; Ota, Toshiro; Kume, Etsuo; Aikawa, Hiroshi.

    1997-03-01

    The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, which shows how much execution time is spent in program segments. The second is ptool, which shows parallelization overhead on the Paragon system. The last is xtool, which shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN code on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)

  6. Using Intel's Knight Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

    Science.gov (United States)

    Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

    2016-12-01

    The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled with a mercury transport module to investigate mercury pollution. In this study, we present our work porting the GNAQPMS model to the Intel Xeon Phi processor, Knights Landing (KNL), to accelerate the model. KNL is the second-generation product adopting the Many Integrated Core (MIC) architecture. Compared with the first-generation Knights Corner (KNC), KNL adds new hardware features and can be used as a standalone processor as well as a coprocessor alongside another CPU. Profiling with the Intel VTune tool identified the high-overhead modules in the GNAQPMS model: the CBMZ gas chemistry, the advection and convection module, and the wet deposition module. These modules were accelerated by optimizing the code and using the new capabilities of KNL. The following optimization measures were taken: 1) changing the pure MPI parallel mode to a hybrid parallel mode with MPI and OpenMP; 2) vectorizing the code to use the 512-bit wide vector computation units; 3) reducing unnecessary memory accesses and calculations; 4) reducing Thread Local Storage (TLS) for common variables within each OpenMP thread in CBMZ; 5) changing global communication from file writing and reading to MPI functions. After optimization, the performance of GNAQPMS increased greatly on both the CPU and KNL platforms: single-node tests showed that the optimized version has a 2.6x speedup on a two-socket CPU platform and a 3.3x speedup on a single-socket KNL platform compared with the baseline code, which means the KNL node has a 1.29x speedup over the two-socket CPU platform.
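
    The first optimization measure, the hybrid MPI/OpenMP mode, follows a standard pattern; the C sketch below is illustrative only (the GNAQPMS source is not shown in the abstract), combining rank-level MPI communication with thread- and vector-level OpenMP loops:

        /* Hedged sketch of the hybrid MPI/OpenMP pattern described above:
         * one MPI rank per socket, OpenMP threads across its cores. */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int provided, rank;
            /* Request thread support since OpenMP threads coexist with MPI */
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            enum { N = 1000000 };
            static double conc[N];               /* stand-in for a tracer field */

            /* Thread-level parallelism over grid cells; "simd" hints the
             * 512-bit vector units mentioned in the abstract. */
            #pragma omp parallel for simd
            for (int i = 0; i < N; i++)
                conc[i] = conc[i] * 0.5 + 1.0;   /* placeholder chemistry update */

            double local = 0.0, global = 0.0;
            for (int i = 0; i < N; i++) local += conc[i];
            /* Process-level communication via MPI instead of file I/O */
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

            if (rank == 0) printf("global sum = %f\n", global);
            MPI_Finalize();
            return 0;
        }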

  7. Exact diagonalization of quantum lattice models on coprocessors

    Science.gov (United States)

    Siro, T.; Harju, A.

    2016-10-01

    We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
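
    A single step of the kind timed in this comparison is dominated by the matrix-vector product followed by orthogonalization against the two previous Lanczos vectors; the following serial C sketch (a dense stand-in, not the paper's sparse many-body kernels) shows the structure:

        /* Hedged sketch of one Lanczos iteration: w = H*v_j, then
         * orthogonalize against v_j and v_{j-1}. H is a dense stand-in;
         * the paper's Hamiltonians are sparse. Link with -lm. */
        #include <math.h>
        #include <stddef.h>

        void lanczos_step(size_t n, const double *H,      /* n x n, row-major */
                          const double *v_prev, const double *v_cur,
                          double *w, double *alpha, double *beta_next,
                          double beta)
        {
            /* w = H * v_cur - beta * v_prev (the dominant cost) */
            for (size_t i = 0; i < n; i++) {
                double s = 0.0;
                for (size_t j = 0; j < n; j++)
                    s += H[i * n + j] * v_cur[j];
                w[i] = s - beta * v_prev[i];
            }
            /* alpha_j = <v_j, w>, then w -= alpha_j * v_j */
            double a = 0.0;
            for (size_t i = 0; i < n; i++) a += v_cur[i] * w[i];
            for (size_t i = 0; i < n; i++) w[i] -= a * v_cur[i];
            /* beta_{j+1} = ||w|| */
            double b = 0.0;
            for (size_t i = 0; i < n; i++) b += w[i] * w[i];
            *alpha = a;
            *beta_next = sqrt(b);
        }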

  8. Parallel solution of the time-dependent Ginzburg-Landau equations and other experiences using BlockComm-Chameleon and PCN on the IBM SP, Intel iPSC/860, and clusters of workstations

    International Nuclear Information System (INIS)

    Coskun, E.

    1995-09-01

    Time-dependent Ginzburg-Landau (TDGL) equations are considered for modeling a thin-film, finite-size superconductor placed in a magnetic field. The problem then leads to the use of so-called natural boundary conditions. The computational domain is partitioned into subdomains and bond variables are used in obtaining the corresponding discrete system of equations. An efficient time-differencing method based on the forward Euler method is developed. Finally, a variable-strength magnetic field resulting in vortex motion in Type II high-T{sub c} superconducting films is introduced. The authors tackled the problem using two different state-of-the-art parallel computing tools: BlockComm/Chameleon and PCN. They had access to two high-performance distributed-memory supercomputers: the Intel iPSC/860 and IBM SP1. They also tested the codes using a cluster of Sun Sparc workstations as a parallel computing environment

  9. DEVELOPMENT OF THE INTEGRATED WATER RECOVERY ASSEMBLY (IRA) FOR RECYCLING HABITATION WASTEWATER STREAMS, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Paragon Space Development Corporation and our partner Research Institution Texas Tech University (TTU) propose to develop a spacecraft habitat wastewater recycling...

  10. How Managers' everyday decisions create or destroy your company's strategy.

    Science.gov (United States)

    Bower, Joseph L; Gilbert, Clark G

    2007-02-01

    Senior executives have long been frustrated by the disconnection between the plans and strategies they devise and the actual behavior of managers throughout the company. This article approaches the problem from the ground up, recognizing that every time a manager allocates resources, that decision moves the company either into or out of alignment with its announced strategy. A well-known story, Intel's exit from the memory business, illustrates this point. When discussing what businesses Intel should be in, Andy Grove asked Gordon Moore what they would do if Intel were a company that they had just acquired. When Moore answered, "Get out of memory," they decided to do just that. It turned out, though, that Intel's revenues from memory were by this time only 4% of total sales. Intel's lower-level managers had already exited the business. What Intel hadn't done was to shut down the flow of research funding into memory (which was still eating up one-third of all research expenditures); nor had the company announced its exit to the outside world. Because divisional and operating managers, as well as customers and capital markets, have such a powerful impact on the realized strategy of the firm, senior management might consider focusing less on the company's formal strategy and more on the processes by which the company allocates resources. Top managers must know the track record of the people who are making resource allocation proposals; recognize the strategic issues at stake; reach down to operational managers to work across division lines; frame resource questions to reflect the corporate perspective, especially when large sums of money are involved and conditions are highly uncertain; and create a new context that allows top executives to circumvent the regular resource allocation process when necessary.

  11. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    Science.gov (United States)

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture: cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
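
    For reference, the Smith-Waterman recurrence underlying the database scanning updates each dynamic-programming cell from three neighbors; a minimal scalar C sketch (the paper's cluster-, thread- and vector-level kernels are not reproduced in the abstract) is:

        /* Hedged sketch of the Smith-Waterman recurrence with a linear gap
         * penalty; real implementations vectorize this inner loop.
         * `gap` is a (negative) penalty, e.g. -2. Requires C99 VLAs. */
        #include <string.h>

        static int max4(int a, int b, int c, int d)
        {
            int m = a > b ? a : b;
            m = m > c ? m : c;
            return m > d ? m : d;
        }

        /* Best local-alignment score of query q (length m) vs subject s
         * (length n), using one rolling DP row. */
        int smith_waterman(const char *q, int m, const char *s, int n,
                           int match, int mismatch, int gap)
        {
            int best = 0;
            int H[n + 1], Hprev[n + 1];
            memset(Hprev, 0, sizeof Hprev);
            for (int i = 1; i <= m; i++) {
                H[0] = 0;
                for (int j = 1; j <= n; j++) {
                    int sub = (q[i-1] == s[j-1]) ? match : mismatch;
                    H[j] = max4(0,
                                Hprev[j-1] + sub,   /* diagonal */
                                Hprev[j] + gap,     /* up */
                                H[j-1] + gap);      /* left */
                    if (H[j] > best) best = H[j];
                }
                memcpy(Hprev, H, sizeof H);
            }
            return best;
        }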

  12. Macroprocessing is the computing design principle for the times

    CERN Multimedia

    2001-01-01

    In a keynote speech, Intel Corporation CEO Craig Barrett emphasized that "macroprocessing" provides innovative and cost-effective solutions that companies can customize and scale to match their own data needs. Barrett showcased examples of macroprocessing implementations from business, government and the scientific community, which use the power of Intel Architecture and Oracle9i Real Application Clusters to build large, complex and scalable database solutions. A testimonial from CERN explained how the need for high-performance computing for scientific research on sub-atomic particles was met by using clusters of Xeon processor-based servers.

  13. Employing Ionomer Membrane Technology to Extract Water from Brine, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Paragon Space Development Corporation proposes the use of an microporous-ionomer membrane pair to improve the robustness and effectiveness of membrane-based water...

  14. Tririga

    Data.gov (United States)

    Department of Veterans Affairs — The Paragon and Tririga Applications are project management programs utilized by CFM for construction programs. The contents of the databases are a compilation of...

  15. Communication strategies for angular domain decomposition of transport calculations on message passing multiprocessors

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1997-01-01

    The effect of three communication schemes for solving Arbitrarily High Order Transport (AHOT) methods of the Nodal type on parallel performance is examined via direct measurements and performance models. The target architecture in this study is Oak Ridge National Laboratory's 128-node Paragon XP/S 5 computer and the parallelization is based on the Parallel Virtual Machine (PVM) library. However, the conclusions reached can be easily generalized to a large class of message-passing platforms and communication software. The three schemes considered here are: (1) PVM's global operations (broadcast and reduce), which utilize the Paragon's corresponding native operations based on spanning-tree routing; (2) the Bucket algorithm, wherein the angular domain decomposition of the mesh sweep is complemented with a spatial domain decomposition of the accumulation process of the scalar flux from the angular flux and the convergence test; (3) a distributed-memory version of the Bucket algorithm that pushes the spatial domain decomposition one step farther by actually distributing the fixed source and flux iterates over the memories of the participating processes. Their conclusion is that the Bucket algorithm is the most efficient of the three if all participating processes have sufficient memory to hold the entire problem arrays. Otherwise, the third scheme becomes necessary, at an additional cost to speedup and parallel efficiency that is quantifiable via the parallel performance model
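
    For orientation, the global accumulation in scheme (1) maps directly onto a collective reduction; the hedged C sketch below uses MPI_Allreduce as a modern analogue of the PVM broadcast/reduce pair described above (the original work used PVM, and the Bucket algorithm instead reduces spatial sub-blocks separately):

        /* Hedged modern-MPI analogue of scheme (1): after each mesh sweep,
         * every process holds a partial scalar flux from its share of the
         * angular domain; one global reduction accumulates the total. */
        #include <mpi.h>

        void accumulate_flux(const double *phi_partial, double *phi_total,
                             int ncells)
        {
            MPI_Allreduce(phi_partial, phi_total, ncells,
                          MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        }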

  16. Safe, Non-Corrosive Dielectric Fluid for Stagnating Radiator Thermal Control System, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Paragon proposes to develop a single-loop, non-toxic, stagnating active pumped loop thermal control design for NASA's Orion or Lunar Surface Access Module (LSAM)...

  17. What's New in Computers

    Indian Academy of Sciences (India)

    In late 1996 Intel announced an enhancement of the Pentium processor architecture and christened it MMX technology, or multimedia extensions. The Intel MMX technology is purportedly the most significant enhancement to the Intel architecture since the extension of the x86 architecture to 32 bits in 1985 when Intel first ...

  18. Sound: Albums / Satanizer

    Index Scriptorium Estoniae

    Satanizer

    1998-01-01

    Brief introductions to the new albums Paragon Of Beauty "The Spring", Vibratsioon vol. 2 (an Estonian pop underground compilation), Crossing all over! vol. 7 (a wide-ranging double compilation), Rappers paradise V, and Monkey Maffia "Shoot the boss"

  19. Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2

    Science.gov (United States)

    Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad

    1995-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup results because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.

  20. Computational experience with a parallel algorithm for tetrangle inequality bound smoothing.

    Science.gov (United States)

    Rajan, K; Deo, N

    1999-09-01

    Determining molecular structure from interatomic distances is an important and challenging problem. Given a molecule with n atoms, lower and upper bounds on interatomic distances can usually be obtained only for a small subset of the n(n-1)/2 atom pairs, using NMR. Given the bounds so obtained on the distances between some of the atom pairs, it is often useful to compute tighter bounds on all the n(n-1)/2 pairwise distances. This process is referred to as bound smoothing. The initial lower and upper bounds for the pairwise distances not measured are usually assumed to be 0 and infinity. One method for bound smoothing is to use the limits imposed by the triangle inequality. The distance bounds so obtained can often be tightened further by applying the tetrangle inequality: the limits imposed on the six pairwise distances among a set of four atoms (instead of three for the triangle inequalities). The tetrangle inequality is expressed by the Cayley-Menger determinants. For every quadruple of atoms, each pass of the tetrangle inequality bound smoothing procedure finds upper and lower limits on each of the six distances in the quadruple. Applying the tetrangle inequalities to each of the C(n,4) quadruples requires O(n^4) time. Here, we propose a parallel algorithm for bound smoothing employing the tetrangle inequality. Each pass of our algorithm requires O(n^3 log n) time on a CREW PRAM (Concurrent Read Exclusive Write Parallel Random Access Machine) with O(n/log n) processors. An implementation of this parallel algorithm on the Intel Paragon XP/S and its performance are also discussed.
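
    For contrast with the tetrangle pass, the simpler triangle-inequality smoothing mentioned above can be written as a Floyd-Warshall-style sweep; the following hedged C sketch shows that baseline (the paper's parallel Cayley-Menger computation is substantially more involved and is not reproduced here):

        /* Hedged sketch of triangle-inequality bound smoothing over
         * symmetric upper/lower bound matrices ub and lb (caller keeps
         * them symmetric); the tetrangle pass tightens these further. */
        void triangle_smooth(int n, double **ub, double **lb)
        {
            for (int k = 0; k < n; k++)
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++) {
                        /* upper bound: d(i,j) <= ub(i,k) + ub(k,j) */
                        if (ub[i][j] > ub[i][k] + ub[k][j])
                            ub[i][j] = ub[i][k] + ub[k][j];
                        /* lower bound: d(i,j) >= lb(i,k) - ub(k,j) */
                        if (lb[i][j] < lb[i][k] - ub[k][j])
                            lb[i][j] = lb[i][k] - ub[k][j];
                    }
        }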

  1. ASACUSA measures microwave transition in antiprotonic helium

    CERN Document Server

    Eades, John

    2003-01-01

    The ASACUSA collaboration has reinforced its status as a paragon of precision physics by following up its impressive six parts in 10{sup 8} measurement of the antiproton's charge and mass with new measurements of its magnetism. (4 refs).

  2. The Intelence aNd pRezista Once A Day Study (INROADS): a multicentre, single-arm, open-label study of etravirine and darunavir/ritonavir as dual therapy in HIV-1-infected early treatment-experienced subjects.

    Science.gov (United States)

    Ruane, P J; Brinson, C; Ramgopal, M; Ryan, R; Coate, B; Cho, M; Kakuda, T N; Anderson, D

    2015-05-01

    Following antiretroviral therapy failure, patients are often treated with a three-drug regimen that includes two nucleoside/tide reverse transcriptase inhibitors [N(t)RTIs]. An alternative two-drug nucleoside-sparing regimen may decrease the pill burden and drug toxicities associated with the use of N(t)RTIs. The Intelence aNd pRezista Once A Day Study (INROADS; NCT01199939) evaluated the nucleoside-sparing regimen of etravirine 400 mg with darunavir/ritonavir 800/100 mg once daily in HIV-1-infected treatment-experienced subjects or treatment-naïve subjects with transmitted resistance. In this exploratory phase 2b, single-arm, open-label, multicentre, 48-week study, the primary endpoint was the proportion of subjects who achieved HIV-1 RNA suppression. The regimen was virologically efficacious and well tolerated in treatment-experienced subjects and in treatment-naïve subjects with transmitted resistance. © 2014 British HIV Association.

  3. 'Micro-8' micro-computer system

    International Nuclear Information System (INIS)

    Yagi, Hideyuki; Nakahara, Yoshinori; Yamada, Takayuki; Takeuchi, Norio; Koyama, Kinji

    1978-08-01

    The micro-computer Micro-8 system has been developed to organize a data exchange network between various instruments and a computer group including a large computer system. Used for packet exchangers and terminal controllers, the system consists of ten kinds of standard boards, including a CPU board with an INTEL-8080 single-chip processor. The CPU architecture, BUS architecture, interrupt control, and the functions of the standard boards are explained in circuit block diagrams. Operations of the basic I/O device, digital I/O board and communication adapter are described with definitions of the interrupt ramp status, I/O command, I/O mask, data register, etc. In the appendixes are circuit drawings, INTEL-8080 micro-processor specifications, BUS connections, I/O address mappings, jumper connections of address selection, and interface connections. (author)

  4. Software Aspects of IEEE Floating-Point Computations for Numerical Applications in High Energy Physics

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    hazards to be avoided. About the speaker: Jeffrey M. Arnold is a Senior Software Engineer in the Intel Compiler and Languages group at Intel Corporation. He has been part of the Digital->Compaq->Intel compiler organization for nearly 20 years; part of that time, he worked on both lo...

  5. Particle-in-Cell laser-plasma simulation on Xeon Phi coprocessors

    Science.gov (United States)

    Surmin, I. A.; Bastrakov, S. I.; Efimenko, E. S.; Gonoskov, A. A.; Korzhimanov, A. V.; Meyerov, I. B.

    2016-05-01

    This paper concerns the development of a high-performance implementation of the Particle-in-Cell method for plasma simulation on Intel Xeon Phi coprocessors. We discuss the suitability of the method for Xeon Phi architecture and present our experience in the porting and optimization of the existing parallel Particle-in-Cell code PICADOR. Direct porting without code modification gives performance on Xeon Phi close to that of an 8-core CPU on a benchmark problem with 50 particles per cell. We demonstrate step-by-step optimization techniques, such as improving data locality, enhancing parallelization efficiency and vectorization leading to an overall 4.2 × speedup on CPU and 7.5 × on Xeon Phi compared to the baseline version. The optimized version achieves 16.9 ns per particle update on an Intel Xeon E5-2660 CPU and 9.3 ns per particle update on an Intel Xeon Phi 5110P. For a real problem of laser ion acceleration in targets with surface grating, where a large number of macroparticles per cell is required, the speedup of Xeon Phi compared to CPU is 1.6 ×.
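
    Optimizations of the kind listed above (data locality, vectorization) usually start from a structure-of-arrays particle layout so that the push loop maps onto wide SIMD units; the following is a hedged C sketch, not PICADOR's actual kernel:

        /* Hedged sketch: structure-of-arrays particle push so the loop
         * auto-vectorizes on wide SIMD units such as the Xeon Phi's. */
        #include <stddef.h>

        typedef struct {          /* SoA layout improves data locality */
            double *x, *vx;       /* position and velocity components */
            double *Ex;           /* field interpolated to each particle */
            size_t n;
        } Particles;

        void push(Particles *p, double qm, double dt)   /* qm = q/m */
        {
            #pragma omp simd
            for (size_t i = 0; i < p->n; i++) {
                p->vx[i] += qm * p->Ex[i] * dt;   /* accelerate */
                p->x[i]  += p->vx[i] * dt;        /* drift */
            }
        }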

  6. Scalable Architectures : The Future is Multicore

    NARCIS (Netherlands)

    Juurlink, B.; Meenderinck, C.

    2008-01-01

    “The future is multicore”, it sounds like a slogan from an Intel commercial. Indeed, a few years ago Intel introduced the Core Duo microprocessor and, more recently, quad-core technology. But, is it all pure marketing? Was Intel tired of advertising with clock frequencies? Did they need another buzz

  7. Parallelizing ATLAS Reconstruction and Simulation: Issues and Optimization Solutions for Scaling on Multi- and Many-CPU Platforms

    International Nuclear Information System (INIS)

    Leggett, C; Jackson, K; Tatarkhanov, M; Yao, Y; Binet, S; Levinthal, D

    2011-01-01

    Thermal limitations have forced CPU manufacturers to shift from simply increasing clock speeds to improve processor performance toward producing chip designs with multi- and many-core architectures. Further, the cores themselves can run multiple threads with effectively zero-overhead context switching, allowing low-level resource sharing (Intel Hyperthreading). To maximize bandwidth and minimize memory latency, memory access has become non-uniform (NUMA). As manufacturers add more cores to each chip, a careful understanding of the underlying architecture is required in order to fully utilize the available resources. We present AthenaMP and the ATLAS event loop manager, the driver of the simulation and reconstruction engines, which have been rewritten to make use of multiple cores by means of event-based parallelism and final-stage I/O synchronization. However, initial studies on 8 and 16 core Intel architectures have shown marked non-linearities as parallel process counts increase, with as much as 30% reductions in event throughput in some scenarios. Since the Intel Nehalem architecture (both Gainestown and Westmere) will be the most common choice for the next round of hardware procurements, an understanding of these scaling issues is essential. Using hardware-based event counters and Intel's Performance Tuning Utility, we have studied the performance bottlenecks at the hardware level and discovered optimization schemes to maximize processor throughput. We have also produced optimization mechanisms, common to all large experiments, that address the extreme nature of today's HEP code, which, due to its size, places huge burdens on the memory infrastructure of today's processors.

  8. A scalable PC-based parallel computer for lattice QCD

    International Nuclear Information System (INIS)

    Fodor, Z.; Katz, S.D.; Pappa, G.

    2003-01-01

    A PC-based parallel computer for medium/large scale lattice QCD simulations is suggested. The Eoetvoes Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7GHz nodes. Gigabit Ethernet cards are used for nearest-neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered (Wilson) quarks on large lattices is around 70 (110) GFlops. The exceptional price/performance ratio is below $1/Mflop

  9. A scalable PC-based parallel computer for lattice QCD

    International Nuclear Information System (INIS)

    Fodor, Z.; Papp, G.

    2002-09-01

    A PC-based parallel computer for medium/large scale lattice QCD simulations is suggested. The Eoetvoes Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7 GHz nodes. Gigabit Ethernet cards are used for nearest-neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered (Wilson) quarks on large lattices is around 70 (110) GFlops. The exceptional price/performance ratio is below $1/Mflop. (orig.)

  10. Cognitive Medical Wireless Testbed System (COMWITS)

    Science.gov (United States)

    2016-11-01

    [DD882 report-form residue] This testbed merges two ARO grants. Listed hardware: 64-bit Intel Xeon Processor E5-1650v3 (6C, 3.5 GHz, Turbo, HT, 15M, 140W); Intel Core i7-3770 (3.4 GHz Quad Core, 77W); Dual Intel Xeon

  11. Sizing Analysis for Aircraft Utilizing Hybrid-Electric Propulsion Systems

    Science.gov (United States)

    2011-03-18

    "...world, the paragon of animals" -William Shakespeare. I would not have made it this far without the love and support of my parents. Their work-ethic... [front matter and table-of-contents residue] SIZING ANALYSIS FOR AIRCRAFT UTILIZING HYBRID-ELECTRIC PROPULSION SYSTEMS. I. Introduction. 1. Background. Physically

  12. The 'border within': inhabiting the border in Trieste

    NARCIS (Netherlands)

    Bialasiewicz, L.; Minca, C.

    2010-01-01

    In this paper we look to the Italian border city of Trieste: at various points in its past a cosmopolitan port, Austria's urbs europeissima, but also a battleground for competing understandings of territoriality, identity, and belonging, and a paragon of the violent application of an ethnoterritorial

  13. Implementation of a 3D plasma particle-in-cell code on a MIMD parallel computer

    International Nuclear Information System (INIS)

    Liewer, P.C.; Lyster, P.; Wang, J.

    1993-01-01

    A three-dimensional plasma particle-in-cell (PIC) code has been implemented on the Intel Delta MIMD parallel supercomputer using the General Concurrent PIC algorithm. The GCPIC algorithm uses a domain decomposition to divide the computation among the processors: A processor is assigned a subdomain and all the particles in it. Particles must be exchanged between processors as they move. Results are presented comparing the efficiency for 1-, 2- and 3-dimensional partitions of the three dimensional domain. This algorithm has been found to be very efficient even when a large fraction (e.g. 30%) of the particles must be exchanged at every time step. On the 512-node Intel Delta, up to 125 million particles have been pushed with an electrostatic push time of under 500 nsec/particle/time step
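
    In the GCPIC decomposition described above, each processor owns a spatial subdomain and must hand off particles that cross its boundary at every step; a hedged sketch of that exchange in MPI (the Delta-era code predates MPI, so the names here are illustrative) is:

        /* Hedged sketch of the particle hand-off in a 1-D domain
         * decomposition: particles leaving a subdomain are packed as
         * (x, vx) pairs and traded with the neighboring rank. */
        #include <mpi.h>

        /* Sends `nsend` outgoing particle records to `neighbor` and
         * receives incoming ones into recvbuf; returns the receive count. */
        int exchange(const double *sendbuf, int nsend,
                     double *recvbuf, int maxrecv, int neighbor)
        {
            MPI_Status st;
            int nrecv;
            MPI_Sendrecv(sendbuf, 2 * nsend, MPI_DOUBLE, neighbor, 0,
                         recvbuf, 2 * maxrecv, MPI_DOUBLE, neighbor, 0,
                         MPI_COMM_WORLD, &st);
            MPI_Get_count(&st, MPI_DOUBLE, &nrecv);
            return nrecv / 2;   /* two doubles per particle record */
        }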

  14. TH-A-19A-08: Intel Xeon Phi Implementation of a Fast Multi-Purpose Monte Carlo Simulation for Proton Therapy

    Energy Technology Data Exchange (ETDEWEB)

    Souris, K; Lee, J; Sterpin, E [Universite catholique de Louvain, Brussels (Belgium)

    2014-06-15

    Purpose: Recent studies have demonstrated the capability of graphics processing units (GPUs) to compute dose distributions using Monte Carlo (MC) methods within clinical time constraints. However, GPUs have a rigid vectorial architecture that favors the implementation of simplified particle transport algorithms, adapted to specific tasks. Our new, fast, and multipurpose MC code, named MCsquare, runs on Intel Xeon Phi coprocessors. This technology offers 60 independent cores, and therefore more flexibility to implement fast and yet generic MC functionalities, such as prompt gamma simulations. Methods: MCsquare implements several models and hence allows users to make their own tradeoff between speed and accuracy. A 200 MeV proton beam is simulated in a heterogeneous phantom using Geant4 and two configurations of MCsquare. The first one is the most conservative and accurate. The method of fictitious interactions handles the interfaces and secondary charged particles emitted in nuclear interactions are fully simulated. The second, faster configuration simplifies interface crossings and simulates only secondary protons after nuclear interaction events. Integral depth-dose and transversal profiles are compared to those of Geant4. Moreover, the production profile of prompt gammas is compared to PENH results. Results: Integral depth dose and transversal profiles computed by MCsquare and Geant4 are within 3%. The production of secondaries from nuclear interactions is slightly inaccurate at interfaces for the fastest configuration of MCsquare but this is unlikely to have any clinical impact. The computation time varies between 90 seconds for the most conservative settings to merely 59 seconds in the fastest configuration. Finally, prompt gamma profiles are also in very good agreement with PENH results. Conclusion: Our new, fast, and multi-purpose Monte Carlo code simulates prompt gammas and calculates dose distributions in less than a minute, which complies with clinical time constraints.

  15. TH-A-19A-08: Intel Xeon Phi Implementation of a Fast Multi-Purpose Monte Carlo Simulation for Proton Therapy

    International Nuclear Information System (INIS)

    Souris, K; Lee, J; Sterpin, E

    2014-01-01

    Purpose: Recent studies have demonstrated the capability of graphics processing units (GPUs) to compute dose distributions using Monte Carlo (MC) methods within clinical time constraints. However, GPUs have a rigid vectorial architecture that favors the implementation of simplified particle transport algorithms, adapted to specific tasks. Our new, fast, and multipurpose MC code, named MCsquare, runs on Intel Xeon Phi coprocessors. This technology offers 60 independent cores, and therefore more flexibility to implement fast and yet generic MC functionalities, such as prompt gamma simulations. Methods: MCsquare implements several models and hence allows users to make their own tradeoff between speed and accuracy. A 200 MeV proton beam is simulated in a heterogeneous phantom using Geant4 and two configurations of MCsquare. The first one is the most conservative and accurate. The method of fictitious interactions handles the interfaces and secondary charged particles emitted in nuclear interactions are fully simulated. The second, faster configuration simplifies interface crossings and simulates only secondary protons after nuclear interaction events. Integral depth-dose and transversal profiles are compared to those of Geant4. Moreover, the production profile of prompt gammas is compared to PENH results. Results: Integral depth dose and transversal profiles computed by MCsquare and Geant4 are within 3%. The production of secondaries from nuclear interactions is slightly inaccurate at interfaces for the fastest configuration of MCsquare but this is unlikely to have any clinical impact. The computation time varies between 90 seconds for the most conservative settings to merely 59 seconds in the fastest configuration. Finally, prompt gamma profiles are also in very good agreement with PENH results. Conclusion: Our new, fast, and multi-purpose Monte Carlo code simulates prompt gammas and calculates dose distributions in less than a minute, which complies with clinical time constraints.

  16. Object identification with deep learning using Intel DAAL on Knights Landing processor [Vidyo

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    The problem of object recognition is computationally expensive, especially when large amounts of data are involved. Recently, techniques in deep neural networks (DNNs), including convolutional neural networks and residual neural networks, have shown great recognition accuracy compared to traditional methods (artificial neural networks, decision trees, etc.). However, experience reveals that there are still a number of factors that limit scientists from deriving the full performance benefits of large DNNs. We summarize these challenges as follows: (1) the large number of hyperparameters that have to be tuned against the DNN during the training phase, leading to several data re-computations over a large design space, (2) the sheer volume of data used for training, resulting in prolonged training time, (3) how to effectively utilize the underlying hardware (compute, network and storage) to achieve maximum performance during this training phase. In this presentation, we discuss a cross-layer perspective into realizing effic...

  17. Follow the leader : Hewlett-Packard and its succession crisis

    NARCIS (Netherlands)

    Lachotzki, F.; Olson, M.A.

    2012-01-01

    A paragon of the Silicon Valley start-up ethos (and with a solid internal culture known as "The HP Way"), by the turn of the 21st century Hewlett-Packard had become a leading worldwide provider of personal computer equipment (particularly laptops), printers and other accessories. Yet shortly

  18. The strange birth of liberal Denmark

    DEFF Research Database (Denmark)

    Henriksen, Ingrid; Sharpe, Paul; Lampe, Markus

    2012-01-01

    into decline, in fact flourished. Key to the success of Danish agriculture was an early diversification towards dairy production. This article challenges this simple story which sees Denmark as something of a liberal paragon. Denmark's success owed much to a prudent use of trade policy which favoured dairy...

  19. Comparison of 2 accelerators of Monte Carlo radiation transport calculations, NVIDIA tesla M2090 GPU and Intel Xeon Phi 5110p coprocessor: a case study for X-ray CT Imaging Dose calculation

    International Nuclear Information System (INIS)

    Liu, T.; Xu, X.G.; Carothers, C.D.

    2013-01-01

    Hardware accelerators are currently becoming increasingly important in boosting high performance computing systems. In this study, we tested the performance of two accelerator models, NVIDIA Tesla M2090 GPU and Intel Xeon Phi 5110p coprocessor, using a new Monte Carlo photon transport package called ARCHER-CT we have developed for fast CT imaging dose calculation. The package contains three code variants, ARCHER-CT(CPU), ARCHER-CT(GPU) and ARCHER-CT(COP) to run in parallel on the multi-core CPU, GPU and coprocessor architectures respectively. A detailed GE LightSpeed Multi-Detector Computed Tomography (MDCT) scanner model and a family of voxel patient phantoms were included in the code to calculate absorbed dose to radiosensitive organs under specified scan protocols. The results from ARCHER agreed well with those from the production code Monte Carlo N-Particle eXtended (MCNPX). It was found that all the code variants were significantly faster than the parallel MCNPX running on 12 MPI processes, and that the GPU and coprocessor performed equally well, being 2.89-4.49 and 3.01-3.23 times faster than the parallel ARCHER-CT(CPU) running with 12 hyper-threads. (authors)

  20. Comparison of Two Accelerators for Monte Carlo Radiation Transport Calculations, NVIDIA Tesla M2090 GPU and Intel Xeon Phi 5110p Coprocessor: A Case Study for X-ray CT Imaging Dose Calculation

    Science.gov (United States)

    Liu, Tianyu; Xu, X. George; Carothers, Christopher D.

    2014-06-01

    Hardware accelerators are currently becoming increasingly important in boosting high performance computing systems. In this study, we tested the performance of two accelerator models, NVIDIA Tesla M2090 GPU and Intel Xeon Phi 5110p coprocessor, using a new Monte Carlo photon transport package called ARCHER-CT we have developed for fast CT imaging dose calculation. The package contains three code variants, ARCHER-CT(CPU), ARCHER-CT(GPU) and ARCHER-CT(COP), to run in parallel on the multi-core CPU, GPU and coprocessor architectures respectively. A detailed GE LightSpeed Multi-Detector Computed Tomography (MDCT) scanner model and a family of voxel patient phantoms were included in the code to calculate absorbed dose to radiosensitive organs under specified scan protocols. The results from ARCHER agreed well with those from the production code Monte Carlo N-Particle eXtended (MCNPX). It was found that all the code variants were significantly faster than the parallel MCNPX running on 12 MPI processes, and that the GPU and coprocessor performed equally well, being 2.89-4.49 and 3.01-3.23 times faster than the parallel ARCHER-CT(CPU) running with 12 hyperthreads.

  1. Fulltext PDF

    Indian Academy of Sciences (India)

    All calculations were performed on a PC computer using the WINGX program [20]. Molecular graphics were generated using the DIAMOND Version 2 [21] and MERCURY 2.4 [22] software. 2.3 Instrumentation. The infrared spectra were recorded on a Perkin Elmer (FT-IR) Paragon 1000 PC spectrometer in the range 4000-400 cm{sup -1}.

  2. The Community College President: Working with and through the Media to Advance the Institution

    Science.gov (United States)

    Carringer, Paul T.

    2013-01-01

    The purpose of this study was to examine how community college presidents successfully work with and through the media to advance their institutions. Four successful cases were studied. These success stories came from the list of Paragon Award winners selected annually by the National Council of Marketing and Public Relations (NCMPR) and be cross…

  3. Limited tender notification for the AMC of desktop computers and

    Indian Academy of Sciences (India)

    Brand:
    1. INTEL 82945 EXPRESS, P4 3.0 GHz, 1 x 2 GB + 1 x 1 GB = 3 GB DDR2, 160 GB SATA, Windows 7 Pro SP1 32-bit, CD/DVDRW, 17" LCD.
    2. INTEL G31PR, Pentium Dual E2160 1.8 GHz, 2 x 2 GB = 4 GB DDR2, 160 GB SATA, Windows 7 Pro SP1 32-bit, CD/DVDRW, 17" LCD.
    3. INTEL G31/33 EXPRESS, ...

  4. Proceedings of the Workshop on Future Directions in Computer Architecture and Software, Held in Charleston, South Carolina on 5-7 May 1986,

    Science.gov (United States)

    1986-08-30

    the Intel/Caltech Cosmic Cube; (6) massive SIMD array processors, such as the 11 Gflop IBM GF11, built specifically for physics quark modelling... randomly between 0 and 100 msec (or 0 to 80 message-interaction times); a separate communication processor (Intel 80186)... In this case, in-line a... machine was announced by IMS Associates. The processors were Intel 8080's. In 1983 a working hypercube, the 64-node Cosmic Cube, was demonstrated at

  5. Naval Science & Technology: Enabling the Future Force

    Science.gov (United States)

    2013-04-01

    [briefing-slide residue] Disruptive technologies: Laser Cooling, Spintronics, 1st U.S. Intel satellite GRAB, Semiconductors (GaAs, GaN, SiC), GPS... Payoff: innovative and game-changing; approved by Corporate Board; delivers prototype. Innovative Naval Prototypes (5-10 Year), Disruptive Technologies: Free Electron Laser, Integrated Topside, EM Railgun, Sea Base Enablers, Tactical Satellite, Large Displacement UUV, AACUS, Directed

  6. Step by step parallel programming method for molecular dynamics code

    International Nuclear Information System (INIS)

    Orii, Shigeo; Ohta, Toshio

    1996-07-01

    Parallel programming of a molecular dynamics simulation code is carried out with a step-by-step programming technique using the two-phase method. As a result, within a certain range of computing parameters, parallel performance is obtained with do-loop-level parallel programming, which distributes the calculation across processors according to the indices of do-loops, on the vector-parallel computer VPP500 and the scalar-parallel computer Paragon. It is also found that VPP500 shows parallel performance over a wider range of computing parameters. The reason is that the time cost of the program parts that cannot be reduced by do-loop-level parallel programming can be reduced to a negligible level by vectorization. After that, the time-consuming parts of the program are concentrated in fewer parts that can be accelerated by do-loop-level parallel programming. This report shows the step-by-step parallel programming method and the parallel performance of the molecular dynamics code on VPP500 and Paragon. (author)
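
    A hedged sketch of the do-loop-level decomposition described above, in C with OpenMP rather than the original Fortran on the VPP500/Paragon, distributes the outer index of an O(N^2) force loop across processors:

        /* Hedged sketch of do-loop-level decomposition: the outer index
         * of a molecular dynamics force loop is split across processors.
         * Compile with -fopenmp and link with -lm. */
        #include <math.h>

        void forces(long n, const double *x, double *f)
        {
            #pragma omp parallel for
            for (long i = 0; i < n; i++) {       /* decomposed loop index */
                double fi = 0.0;
                for (long j = 0; j < n; j++) {
                    if (j == i) continue;
                    double r = x[j] - x[i];
                    /* 1-D Lennard-Jones-like pair force, as a placeholder */
                    double inv  = 1.0 / r;
                    double inv6 = pow(inv, 6.0);
                    fi += 24.0 * inv * inv6 * (2.0 * inv6 - 1.0);
                }
                f[i] = fi;
            }
        }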

  7. 76 FR 32372 - Notice of Receipt of Complaint; Solicitation of Comments Relating to the Public Interest

    Science.gov (United States)

    2011-06-06

    ... Rica S.A. of Costa Rica, Intel Malaysia Sdn. Bhd of Malaysia, Intel (Philippines) of the Philippines... any public health, safety, or welfare concerns in the United States relating to the potential orders...

  8. Xeon Phi - A comparison between the newly introduced MIC architecture and a standard CPU through three types of problems.

    OpenAIRE

    Kristiansen, Joakim

    2016-01-01

    As Moore's law continues, processors keep getting more cores packed together on the chip. This thesis is an empirical study of the rather newly introduced Intel Many Integrated Core (IMIC) architecture found in the Intel Xeon Phi. With roughly 60 cores connected by a high-performance on-die interconnect, the Intel Xeon Phi makes an interesting candidate for High Performance Computing. By digging into parallel algorithms solving three well-known problems, our goal is to optimize, test and comp...

  9. Smart contracts on Bitcoin

    OpenAIRE

    Andreu Alemany, Josep Miquel

    2016-01-01

    This master's thesis provides an introduction to smart contracts. The work introduces the concept of the smart contract, its uses, and some existing examples. It then provides the notions of Bitcoin protocol transactions needed to implement a smart contract using the blockchain offered by the protocol. Finally, it explains the implementation of a smart contract using Bitcoin: a micropayment channel.

  10. Army Grit: Field Marshal Viscount Slims Key to Victory

    Science.gov (United States)

    2017-05-25

    Part Two: The Education of a Grit Paragon... Damon is a professor of psychology at Stanford Graduate School of Education; and Duckworth, Grit: The Power of Passion and Perseverance, 91... "visitors from the fighting troops, especially those who interfered with the pleasant tenor of staff life, were not really welcome." Later in his career

  11. African Instituted Churches in Southern Africa: Paragons of Regional ...

    African Journals Online (AJOL)

    sulaiman.adebowale

    2006-05-23

    reducing discussions on regional cooperation to purely economic and materialistic ... references to religion and culture in a document dealing with economic ... Economic Cooperation and Integration, Harare: SAPES Books.

  12. Cerebral arteriovenous malformation in Noonan's syndrome.

    OpenAIRE

    Schon, F.; Bowler, J.; Baraitser, M.

    1992-01-01

    Noonan's syndrome involves the association of multiple congenital abnormalities including neck webbing, pectus excavatum, facial anomalies with a variety of cardiac defects. In this paper the association of Noonan's syndrome with a large cerebral arteriovenous malformation is reported. Congenital cerebrovascular abnormalities are not a recognized feature of the syndrome. The paper also reviews previous reports of neurological associations with Noonan's syndrome, the commonest being mild intel...

  13. The quantum structure of matter grand challenge project: Large-scale 3-D solutions in relativistic quantum dynamics

    International Nuclear Information System (INIS)

    Wells, J.C.; Oberacker, V.E.; Umar, A.S.

    1993-01-01

    We describe the numerical methods used to solve the time-dependent Dirac equation on a three-dimensional Cartesian lattice. Efficient algorithms are required for computationally intensive studies of nonperturbative relativistic quantum dynamics. Discretization is achieved through the lattice basis-spline collocation method, in which quantum-state vectors and coordinate-space operators are expressed in terms of basis-spline functions on a spatial lattice. All numerical procedures reduce to a series of matrix-vector operations which we perform on the Intel iPSC/860 hypercube, making full use of parallelism. We discuss our solutions to the problems of limited node memory and node-to-node communication overhead inherent in using distributed-memory, multiple-instruction, multiple-data stream parallel computers

  14. SiRen: Leveraging Similar Regions for Efficient and Accurate Variant Calling

    Science.gov (United States)

    2015-05-30

    Cloudera, EMC2, Ericsson, Facebook, Guavus, HP, Huawei, Informatica, Intel, Microsoft, NetApp, Pivotal, Samsung, Schlumberger, Splunk, Virdata and VMware.

  15. Air Revitalization System Enables Excursions to the Stratosphere

    Science.gov (United States)

    2015-01-01

    Paragon Space Development Corporation, based in Tucson, Arizona, has had a long history of collaboration with NASA, including developing a modular air purification system under the Commercial Crew Development Program, designed to support the commercial space sector. Using that device and other NASA technology, startup company World View is now gearing up to take customers on helium balloon rides to the stratosphere.

  16. 78 FR 12354 - Certain Microprocessors, Components Thereof, and Products Containing Same; Termination of...

    Science.gov (United States)

    2013-02-22

    ... by accessing its Internet server ( http://www.usitc.gov ). The public record for this investigation... of Penang, Malaysia; and Intel Products (Chengdu) Ltd. of Chengdu, China (collectively, ``Intel... Commission's forthcoming opinion. The authority for the Commission's determination is contained in section...

  17. The parallel algorithm for the 2D discrete wavelet transform

    Science.gov (United States)

    Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel

    2018-04-01

    The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of points that represent the exchange of data; consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluating on multi-core CPUs, we consistently outperform the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
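
    For context, one lifting step of the CDF 5/3 wavelet, a common concrete instance of the separable lifting scheme discussed above, computes details and approximations in two sweeps; each such sweep is one of the "steps" whose count the proposed scheme reduces. A minimal hedged C sketch:

        /* Hedged sketch: one 1-D lifting step of the CDF 5/3 wavelet.
         * x has even length n; details land in d[], approximations in s[].
         * Boundaries use simple symmetric-style extension. */
        void lifting_53(const double *x, int n, double *s, double *d)
        {
            int half = n / 2;
            /* predict: d[i] = odd sample minus average of even neighbors */
            for (int i = 0; i < half; i++) {
                double right = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];
                d[i] = x[2 * i + 1] - 0.5 * (x[2 * i] + right);
            }
            /* update: s[i] = even sample plus quarter of neighboring details */
            for (int i = 0; i < half; i++) {
                double left = (i > 0) ? d[i - 1] : d[i];
                s[i] = x[2 * i] + 0.25 * (left + d[i]);
            }
        }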

  18. Bridging the Cyberspace Gap: Washington and Silicon Valley

    Science.gov (United States)

    2017-12-21

    Emerging Technologies and National Security and Director of the Digital and Cyberspace Policy Program at the Council on Foreign Relations. One of the... large public role in tracing cyberattacks to nation-states and other perpetrators. In addition, Alphabet, Amazon, Apple, Cisco, Facebook, IBM, Intel... 2013 after a number of public disputes. In December 2015, a terrorist killed 14 people in San Bernardino, California. The Federal Bureau of

  19. University-level implementation of the "Intel Educar para el Futuro" course: systematization of the experience in the Facultad de Educación of the Universidad de Costa Rica

    Directory of Open Access Journals (Sweden)

    Flora Eugenia Salas Madriz

    2003-01-01

    Full Text Available The Facultad de Educación of the Universidad de Costa Rica, aware of the impact and importance that digital educational technologies are acquiring in education at the national and global levels, has created the Advanced Educational Technologies Program (PROTEA) for development and research in this field. This article systematizes the experience of nine instructors at the Universidad de Costa Rica who collaborated in adapting the course "Intel Educar para el Futuro" (Intel Teach to the Future), taught by the Omar Dengo Foundation, to the university level. The course is aimed at primary and secondary educators, and its purpose is to put the tools of Microsoft Office at educators' disposal for developing teaching processes. Because of the differences in substance and form between primary and secondary education on the one hand and university education on the other, it was necessary to revise the manual and the methodology in order to adapt them to the needs of university teaching and of the training of trainers. In this regard, the experience and collaboration of the participating instructors was decisive in enabling the course to achieve its objectives of putting these tools at the service of university teaching

  20. A High Performance Computing Framework for Physics-based Modeling and Simulation of Military Ground Vehicles

    Science.gov (United States)

    2011-03-25

    cluster. The co-processing idea is the enabler of the heterogeneous computing concept advertised recently as the paradigm capable of delivering exascale... Petascale to Exascale: Extending Intel's HPC Commitment: http://download.intel.com/pressroom/archive/reference/ISC_2010_Skaugen_keynote.pdf

  1. ML-o-Scope: A Diagnostic Visualization System for Deep Machine Learning Pipelines

    Science.gov (United States)

    2014-05-16

    Huawei, Intel, Microsoft, NetApp, Pivotal, Splunk, Virdata, VMware, WANdisco and Yahoo!. ML-o-scope: a diagnostic visualization system for deep machine... References: [1] Bruna, J., and

  2. Towards Wearable Cognitive Assistance

    Science.gov (United States)

    2013-12-01

    [table residue: aggregate CPU clock speed by year, server vs. mobile device]
    2002: Itanium, 1 GHz vs. BlackBerry 5810, 133 MHz
    2007: Intel Core 2 (4 cores), 9.6 GHz vs. Apple iPhone, 412 MHz
    2011: Intel Xeon X5 (2x6 cores), 32 GHz vs. Samsung Galaxy S2 (2 cores), 2.4 GHz
    2013: Intel Xeon E5 (2x12 cores), 64 GHz vs. Samsung Galaxy S4 (4 cores), 6.4 GHz; Google Glass, OMAP 2.4 GHz
    ... diverse inputs: the language content and deep semantics of the words, the tone in which they are spoken, the facial expressions and eye movements with

  3. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    Science.gov (United States)

    Sun, Xian-He

    1997-01-01

    Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high-performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is: 1) developing highly accurate parallel numerical algorithms, 2) conducting preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporating newly developed algorithms into actual simulation packages. The work plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, (2) using a compact scheme to gain high-order accuracy in the numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm

  4. High Performance Programming Using Explicit Shared Memory Model on Cray T3D

    Science.gov (United States)

    Simon, Horst D.; Saini, Subhash; Grassi, Charles

    1994-01-01

    The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research adaptive Fortran (CRAFT) model, four programming methods (data parallel, work sharing, message-passing using PVM, and the explicit shared memory model) are available to users. However, at this time the data parallel and work sharing programming models are not available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times less than that obtained by using the explicit shared memory model. This degradation in performance is also seen on the CM-5, where the performance of applications using the native message-passing library CMMD is likewise about 4 to 5 times less than with data parallel methods. The issues involved while programming in the explicit shared memory model (such as barriers, synchronization, invalidating data cache, aligning data cache, etc.) are discussed. Comparative performance of the NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems such as the TMC CM-5, Intel Paragon, Cray C90, IBM SP1, etc. is presented.

  5. Enabling Computational Dynamics in Distributed Computing Environments Using a Heterogeneous Computing Template

    Science.gov (United States)

    2011-08-09

    heterogeneous computing concept advertised recently as the paradigm capable of delivering exascale flop rates by the end of the decade. In this framework... [3] Skaugen, K., Petascale to Exascale: Extending Intel’s HPC Commitment: http://download.intel.com

  6. ICE-DIP kicks off

    CERN Multimedia

    CERN Bulletin

    2013-01-01

    Last month, Marie Curie Actions* added a new member to its ranks: ICE-DIP (the Intel-CERN European Doctorate Industrial Program). The programme held its kick-off meeting on 18-19 February in Leixlip near Dublin, Ireland, at Intel’s premises.   Building on CERN’s long-standing relationship with Intel in the CERN openlab project, ICE-DIP brings together CERN and industrial partners, Intel and Xena Networks, to train five Early Stage ICT Researchers. These researchers will be funded by the European Commission and granted a CERN Fellow contract while enrolled in the doctoral programmes at partner universities Dublin City University and National University of Ireland Maynooth. The researchers will go on extended secondments to Intel Labs Europe locations across Europe during their three-year training programme. The primary focus of the ICE-DIP researchers will be the development of techniques for acquiring and processing data that are relevant for the trigger a...

  7. ICE-DIP closing workshop - Public session | 14 September

    CERN Multimedia

    2016-01-01

    ICE-DIP, the Intel-CERN European Doctorate Industrial Program, is a European Industrial Doctorate scheme led by CERN. The focus of the project, which launched in 2013, has been the development of techniques for acquiring and processing data that are relevant for the trigger and data-acquisition systems of the LHC experiments.   The results will be publicly presented in an open session on the afternoon of 14 September. Building on CERN’s long-standing relationship with Intel through CERN openlab, ICE-DIP brings together CERN, Intel and research universities to offer training to five PhD students in advanced information and communication technologies (ICT). These young researchers have been funded by the European Commission as fellows at CERN and enrolled in doctoral programmes at the National University of Ireland Maynooth and Dublin City University. They have each completed 18-month secondments at Intel locations around the world, gaining in-depth experience of the v...

  8. Ideal Directed-Energy System To Defeat Small Unmanned Aircraft System Swarms

    Science.gov (United States)

    2017-05-21

    large number of animate or inanimate things massed together and usually in motion.”19 Unlike bees that developed swarming behaviors over time...set multiple records in recent years. From 2015 to 2017, Intel increased the quantity of sUAS in their light shows conducted around the world from...successfully-tests-worlds-largest-micro-drone-swarm. 25 Ibid. 26 Chris Loterina, “Department Of Defense Tests Swarm Of 3D-Printed Micro-Drones Called Perdix

  9. Metaobjects as a programming tool / Robert William Lemke

    OpenAIRE

    Lemke, Robert William

    2010-01-01

    Computer applications can be described as largely rigid structures within which an information seeker must navigate in search of information - each screen, each transaction having underlying unique code. The larger the application, the higher the number of lines of code and the larger the size of the application executable. This study suggests an alternative pattern based approach, an approach driven by the information seeker. This alternative approach makes use of value embedded in intell...

  10. 76 FR 39895 - In the Matter of Certain Microprocessors, Components Thereof, and Products Containing Same...

    Science.gov (United States)

    2011-07-07

    ... Commission Rule 210.50(b)(1), 19 CFR 210.50(b)(1), the presiding administrative law judge shall take evidence...; Intel Malaysia Sdn. Bhd, Bayan Lepas Free Trade Zone, Phase III, Penang 11900, Malaysia; Intel...; Hewlett-Packard Company, 3000 Hanover Street, Palo Alto, CA 94304. (c) The Office of Unfair Import...

  11. A Digital Motion Control System for Large Telescopes

    Science.gov (United States)

    Hunter, T. R.; Wilson, R. W.; Kimberk, R.; Leiker, P. S.

    2001-05-01

    We have designed and programmed a digital motion control system for large telescopes, in particular, the 6-meter antennas of the Submillimeter Array on Mauna Kea. The system consists of a single robust, high-reliability microcontroller board which implements a two-axis velocity servo while monitoring and responding to critical safety parameters. Excellent tracking performance has been achieved with this system (0.3 arcsecond RMS at sidereal rate). The 24x24 centimeter four-layer printed circuit board contains a multitude of hardware devices: 40 digital inputs (for limit switches and fault indicators), 32 digital outputs (to enable/disable motor amplifiers and brakes), a quad 22-bit ADC (to read the motor tachometers), four 16-bit DACs (that provide torque signals to the motor amplifiers), a 32-LED status panel, a serial port to the LynxOS PowerPC antenna computer (RS422/460kbps), a serial port to the Palm Vx handpaddle (RS232/115kbps), and serial links to the low-resolution absolute encoders on the azimuth and elevation axes. Each section of the board employs independent ground planes and power supplies, with optical isolation on all I/O channels. The processor is an Intel 80C196KC 16-bit microcontroller running at 20MHz on an 8-bit bus. This processor executes an interrupt-driven, scheduler-based software system written in C and assembled into an EPROM with user-accessible variables stored in NVSRAM. Under normal operation, velocity update requests arrive at 100Hz from the position-loop servo process running independently on the antenna computer. A variety of telescope safety checks are performed at 279Hz including routine servicing of a 6 millisecond watchdog timer. Additional ADCs onboard the microcontroller monitor the winding temperature and current in the brushless three-phase drive motors. The PID servo gains can be dynamically changed in software. Calibration factors and software filters can be applied to the tachometer readings prior to the application of
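
    To make the control loop concrete, here is a minimal sketch of the kind of PID velocity-servo update such a controller runs each cycle. The structure (100 Hz velocity updates, a torque command to the motor amplifier, dynamically changeable gains) follows the description above, but every name, gain, and limit below is an illustrative assumption, not the SMA firmware.

        /* Sketch of a PID velocity-servo update, assuming a 100 Hz cycle.
         * All gains, limits and names are hypothetical. */
        typedef struct {
            double kp, ki, kd;   /* PID gains (dynamically changeable)  */
            double integral;     /* accumulated error                   */
            double prev_err;     /* error from the previous 10 ms cycle */
            double torque_limit; /* clamp for the DAC torque output     */
        } servo_t;

        /* Called at 100 Hz with requested and tachometer-measured
         * velocities; returns the torque command for the motor amp. */
        double servo_update(servo_t *s, double vel_cmd, double vel_meas)
        {
            const double dt = 0.01;            /* 100 Hz update period */
            double err = vel_cmd - vel_meas;
            s->integral += err * dt;
            double deriv = (err - s->prev_err) / dt;
            s->prev_err = err;

            double torque = s->kp * err + s->ki * s->integral + s->kd * deriv;
            if (torque >  s->torque_limit) torque =  s->torque_limit; /* saturate */
            if (torque < -s->torque_limit) torque = -s->torque_limit;
            return torque;
        }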

  12. The Intelcities Community of Practice: The Capacity-Building, Co-Design, Evaluation, and Monitoring of E-Government Services

    Science.gov (United States)

    Deakin, Mark; Lombardi, Patrizia; Cooper, Ian

    2011-01-01

    The paper examines the IntelCities Community of Practice (CoP) supporting the development of the organization's capacity-building, co-design, monitoring, and evaluation of e-government services. It begins by outlining the IntelCities CoP and goes on to set out the integrated model of electronically enhanced government (e-government) services…

  13. Genten: Software for Generalized Tensor Decompositions v. 1.0.0

    Energy Technology Data Exchange (ETDEWEB)

    2017-06-22

    Tensors, or multidimensional arrays, are a powerful mathematical means of describing multiway data. This software provides computational means for decomposing or approximating a given tensor in terms of smaller tensors of lower dimension, focusing on decomposition of large, sparse tensors. These techniques have applications in many scientific areas, including signal processing, linear algebra, computer vision, numerical analysis, data mining, graph analysis, neuroscience and more. The software is designed to take advantage of the parallelism present in emerging computer architectures, such as multi-core CPUs, many-core accelerators such as the Intel Xeon Phi, and computation-oriented GPUs, to enable efficient processing of large tensors.

  14. Dual-core Itanium Processor

    CERN Multimedia

    2006-01-01

    Intel’s first dual-core Itanium processor, code-named "Montecito", is a major release of Intel's Itanium 2 processor family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). It is much more powerful than its predecessor, with lower power consumption and thermal dissipation.

  15. Monte Carlo simulations of quantum systems on massively parallel supercomputers

    International Nuclear Information System (INIS)

    Ding, H.Q.

    1993-01-01

    A large class of quantum physics applications uses operator representations that are discrete integers by nature. This class includes magnetic properties of solids, interacting bosons modeling superfluids and Cooper pairs in superconductors, and Hubbard models for strongly correlated electron systems. This kind of application typically uses integer data representations and the resulting algorithms are dominated entirely by integer operations. The authors implemented an efficient algorithm for one such application on the Intel Touchstone Delta and iPSC/860. The algorithm uses a multispin coding technique which allows significant data compactification and efficient vectorization of Monte Carlo updates. The algorithm regularly switches between two data decompositions, corresponding naturally to different Monte Carlo updating processes and observable measurements, such that only nearest-neighbor communications are needed within a given decomposition. On 128 nodes of the Intel Delta, this algorithm updates 183 million spins per second (compared to 21 million on the CM-2 and 6.2 million on a Cray Y-MP). A systematic performance analysis shows a better than 90% efficiency in the parallel implementation
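
    The core of multispin coding is easy to show: spins are packed one per bit, so a single bitwise instruction acts on 64 lattice sites at once. The fragment below is a minimal sketch of that packing idea with hypothetical helper names; the accept mask stands in for the Metropolis acceptance test, and the full Delta/iPSC algorithm is considerably more involved.

        /* Multispin coding in miniature: 64 Ising spins per 64-bit word,
         * so one bitwise operation updates 64 lattice sites at once. */
        #include <stdint.h>

        /* One XOR flips every spin whose bit is set in 'accept'; in a
         * real update 'accept' comes from the Monte Carlo test. */
        static inline uint64_t flip_spins(uint64_t spins, uint64_t accept)
        {
            return spins ^ accept;
        }

        /* Antiparallel-neighbour indicator, 64 sites at a time: bit i is
         * 1 exactly where spin i and its packed neighbour disagree, which
         * is the per-site Ising bond energy up to a constant. */
        static inline uint64_t bond_energy_bits(uint64_t spins, uint64_t neighbour)
        {
            return spins ^ neighbour;
        }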

  16. A glimpse into the future for 12 young scientists

    CERN Multimedia

    Jordan Juras

    2011-01-01

    Last week, CERN received a visit from a gifted group of high school students. The winners of CERN’s Special Award at the Intel International Science and Engineering Fair (ISEF) were invited to spend a few days here and discover first-hand what it's like to work in such a complex environment and how best to enjoy oneself in this part of the world.   ISEF students sit with Wolfgang von Rüden outside the Globe. In early 2009, Craig Barrett, Intel’s chairman of the board at the time, visited CERN as part of Intel’s partnership in CERN openlab. He and Wolfgang von Rüden, former IT Department Head, agreed to create the CERN Special Award for the Intel International Science and Engineering Fair (ISEF) – a 5-day trip to CERN for 12 students, co-funded by CERN and Intel. The annual Intel ISEF is an aspiration for students who participate at the local high-school science fair level. Students who succeed there go on to compete at ISEF-affiliated ...

  17. First evaluation of the CPU, GPGPU and MIC architectures for real time particle tracking based on Hough transform at the LHC

    International Nuclear Information System (INIS)

    Halyo, V; LeGresley, P; Lujan, P; Karpusenko, V; Vladimirov, A

    2014-01-01

    Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's Xeon Phi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy efficient event selection, thus opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation multicore CPUs with the use of the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform will be evaluated for the first time on multi-core Intel i7-3770 and Intel Xeon E5-2697v2 CPUs, an NVIDIA Tesla K20c GPU, and an Intel Xeon Phi 7120 coprocessor. Preliminary time performance will be presented
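
    The mechanics of the Hough transform evaluated here can be sketched in a few lines: every hit votes for all parameter-space bins consistent with it, and track candidates emerge as accumulator maxima. The binning, detector scale, and array sizes below are illustrative assumptions, not the configuration benchmarked in the article; the independent votes in the loops are what make the method attractive for GPUs and the Xeon Phi.

        /* Straight-line Hough transform sketch: each hit (x, y) votes for
         * every (theta, rho) line through it; peaks in 'acc' are track
         * candidates. Binnings and scales are assumed, not the authors'. */
        #include <math.h>
        #include <string.h>
        #ifndef M_PI
        #define M_PI 3.14159265358979323846
        #endif

        #define N_THETA 180
        #define N_RHO   256
        #define RHO_MAX 100.0   /* assumed detector half-diagonal (cm) */

        static int acc[N_THETA][N_RHO];

        void hough_vote(const double *x, const double *y, int nhits)
        {
            memset(acc, 0, sizeof acc);
            for (int h = 0; h < nhits; h++) {
                for (int t = 0; t < N_THETA; t++) {
                    double theta = t * M_PI / N_THETA;
                    double rho = x[h] * cos(theta) + y[h] * sin(theta);
                    int r = (int)((rho + RHO_MAX) * (N_RHO - 1) / (2.0 * RHO_MAX));
                    if (r >= 0 && r < N_RHO)
                        acc[t][r]++;   /* independent votes parallelize well */
                }
            }
        }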

  18. Building Task-Oriented Applications: An Introduction to the Legion Programming Paradigm

    Science.gov (United States)

    2015-02-01

    Intel Corporation. Intel Xeon Phi Coprocessor Instruction Set Architecture Reference Manual, 7 Sep 2012. [accessed 2014 Jun 1]. https...algorithmic development rather than the idiosyncrasies of different computing architectures . Legion is part of a small but growing movement to treat...Multicore central processing units (CPUs) and various hardware accelerators exacerbate a complex architectural landscape that inevitably constrains

  19. Locally orderless registration code

    DEFF Research Database (Denmark)

    2012-01-01

    This is code for the TPAMI paper "Locally Orderless Registration". The code requires Intel Threading Building Blocks to be installed and is provided as 64-bit builds for Mac, Linux and Windows.

  20. Flavorings in Context: Spices and Herbs in Medieval Near East

    OpenAIRE

    Lewicka, Paulina B.

    2011-01-01

    Throughout history, the approach towards imported spices varied from culture to culture. In medieval and early post-medieval Europe, where spices became an exotic object of temporary desire, they were often used unskillfully and in a haphazard manner. In the Ottoman Constantinople, unlike in Europe, it was the moderate use of spices, and not overdosing them, that became a manifestation of status. As deliberate paragons of refinement, the Ottomans depreciated what they considered uncivilized w...

  1. CEIBA: a fast track plan with multiphase pumping by 750 m of water; CEIBA: a fast-track project with multiphase pumping through 750 m of water. Canyon Express: the deepest of the production networks. The Saibos FDS: a multi-purpose vessel adapted to the challenge of deepwater developments

    Energy Technology Data Exchange (ETDEWEB)

    Delaporte, M. [Paragon Litwin, 92 - Nanterre (France); Olsen, M. [Framo Engineering, Bergen (Norway); Bang, P.; Rijkens, F. [Total, La Defense 6, 92 - Courbevoie (France); Poirson, L. [Saibos sas, 78 - Guyancourt (France)

    2003-08-01

    The development of the CEIBA field, located 22 miles off the Equatorial Guinea coast, was the subject of an AFTP/SPE presentation on May 21 at Clamart. M. Delaporte, director of offshore studies and project development at Paragon Litwin, first presented an overview of the plan, and then M. Olsen, sales manager at Framo Engineering, gave details of the multiphase pumping equipment that distinguishes this development. (O.M.)

  2. Kalman Filter Tracking on Parallel Architectures

    International Nuclear Information System (INIS)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2016-01-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment
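
    The building block being ported is compact enough to show. Below is a one-dimensional Kalman filter predict/update step: real track fits propagate five- or six-parameter state vectors with full covariance matrices and a detector-dependent propagation model, so this scalar version is only a structural sketch and all names in it are illustrative.

        /* One-dimensional Kalman filter predict/update step. */
        typedef struct { double x; double P; } kstate_t; /* state, variance */

        void kalman_step(kstate_t *s, double F, double Q, double z, double R)
        {
            /* predict: propagate state and variance through model F */
            double x_pred = F * s->x;
            double P_pred = F * s->P * F + Q;

            /* update: blend prediction with measurement z (variance R) */
            double K = P_pred / (P_pred + R);   /* Kalman gain */
            s->x = x_pred + K * (z - x_pred);
            s->P = (1.0 - K) * P_pred;
        }

    Vectorizing this kernel, as the authors set out to do, amounts to running the same few arithmetic operations on many track candidates at once, which favors structure-of-arrays data layouts.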

  3. I processori Itanium "accelerano" il CERN

    CERN Multimedia

    2001-01-01

    "Il CERN è di nuovo in prima fila nell'abbinamento tra informatica e ricerca scientifica. L'organizzazione europea utilizzerà anche computer basati sui nuovi microprocessori Intel Itanium a 64 bit". - CERN is again at the forefront of the coupling of informatics and scientific research. The European organization will use computers with new 64 bit Intel Itanium microprocessors (0.5 page)

  4. LTE-Enhanced Cognitive Radio Network Testbed (LTE-CORNET)

    Science.gov (United States)

    2016-11-01

    ...Turbo, HT, 15M, 140W) Intel Core i7-3770 (3.4 GHz Quad Core, 77W) Dual Intel Xeon E5-2695 v4 (18C, 2.1GHz, 3.3GHz Turbo, 2400MHz, 45MB, 120W

  5. Embedded Systems Design with 8051 Microcontrollers

    DEFF Research Database (Denmark)

    Karakahayov, Zdravko; Winther, Ole; Christensen, Knud Smed

    Textbook on embedded microcontrollers. Example microcontroller family: Intel 8051, with special emphasis on the Philips 80C552. Structure, design examples and programming in C and assembler. Hardware-software codesign. EPROM emulator.

  6. Mask manufacturing improvement through capability definition and bottleneck line management

    Science.gov (United States)

    Strott, Al

    1994-02-01

    In 1989, Intel's internal mask operation limited itself to research and development activities and to re-inspection and pellicle application for externally manufactured masks. Recognizing the rising capital cost of leading-edge mask manufacturing, Intel's Mask Operation management decided to offset some of these costs by manufacturing more masks internally. This was the beginning of the challenge they set: to manufacture at least 50% of Intel's mask volume internally, at world-class performance levels. The first step in responding to this challenge was the completion of a comprehensive operation capability analysis. A series of bottleneck improvements by focus teams resulted in an average cycle time improvement to less than five days on all products and less than two days on critical products.

  7. A GPU offloading mechanism for LHCb

    International Nuclear Information System (INIS)

    Badalov, Alexey; Cardona, Xavier Vilasis; Perez, Daniel Hugo Campora; Zvyagin, Alexander; Neufeld, Niko

    2014-01-01

    The current computational infrastructure at LHCb is designed for sequential execution. It is possible to make use of modern multi-core machines by using multi-threaded algorithms and running multiple instances in parallel, but there is no way to make efficient use of specialized massively parallel hardware, such as graphics processing units and the Intel Xeon Phi. We extend the current infrastructure with an out-of-process computational server able to gather data from multiple instances and process them in large batches.

  8. USE OF THE MULTIMEDIA EXTENSIONS OF INTEL® PROCESSORS TO REDUCE THE NUMBER OF CYCLES NEEDED FOR PROGRAM EXECUTION

    OpenAIRE

    HOLANDA, Adriano de Jesus; RUIZ, Evandro Eduardo Seron; CARNEIRO, Antonio Adilton Oliveira

    2014-01-01

    The use of the multimedia extensions of current processors, whose registers perform the same operation on multiple data elements, should decrease the execution time of programs that apply the same operation to a large quantity of data. The aim of this work was to quantify the number of cycles needed to perform a two-dimensional cross-correlation calculation on a set of generated series with different numbers of elements, using a program written in Intel® x86-64 assembly language and the SSE extensio...
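
    The kind of SIMD kernel such a study counts cycles for can be illustrated with compiler intrinsics rather than raw assembly. The sketch below computes the inner product at the heart of a cross-correlation using SSE registers, four floats per operation; it assumes n is a multiple of 4 and is an illustration, not the authors' benchmark code.

        /* SSE sketch: four multiply-accumulates per instruction for the
         * inner product underlying a cross-correlation. Assumes n % 4 == 0. */
        #include <xmmintrin.h>   /* SSE intrinsics */

        float dot_sse(const float *a, const float *b, int n)
        {
            __m128 acc = _mm_setzero_ps();
            for (int i = 0; i < n; i += 4) {
                __m128 va = _mm_loadu_ps(a + i);          /* load 4 floats */
                __m128 vb = _mm_loadu_ps(b + i);
                acc = _mm_add_ps(acc, _mm_mul_ps(va, vb)); /* 4 MACs at once */
            }
            float tmp[4];
            _mm_storeu_ps(tmp, acc);                      /* horizontal sum */
            return tmp[0] + tmp[1] + tmp[2] + tmp[3];
        }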

  9. Mixed Precision Solver Scalable to 16000 MPI Processes for Lattice Quantum Chromodynamics Simulations on the Oakforest-PACS System

    OpenAIRE

    Boku, Taisuke; Ishikawa, Ken-Ichi; Kuramashi, Yoshinobu; Meadows, Lawrence

    2017-01-01

    Lattice Quantum Chromodynamics (Lattice QCD) is a quantum field theory formulated on a finite discretized space-time box, used to numerically compute the dynamics of quarks and gluons and thereby explore the nature of the subatomic world. Solving the equation of motion of quarks (the quark solver) is the most compute-intensive part of lattice QCD simulations and is one of the legacy HPC applications. We have developed a mixed-precision quark solver for a large Intel Xeon Phi (KNL) system named "Oakforest-PACS", empl...
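
    The mixed-precision idea itself is generic and worth sketching: do the bulk of the work in fast single precision, then correct the solution against a double-precision residual. The fragment below shows that iterative-refinement skeleton for a dense system; the inner solver solve_single is a hypothetical stand-in (a lattice QCD code would apply a Krylov solver to the lattice Dirac operator rather than a dense matrix, and would down-convert the operator internally).

        /* Mixed-precision iterative refinement: double-precision residual,
         * single-precision inner solve, double-precision correction. */
        #include <stddef.h>

        void residual(const double *A, const double *x, const double *b,
                      double *r, size_t n)           /* r = b - A*x */
        {
            for (size_t i = 0; i < n; i++) {
                double s = 0.0;
                for (size_t j = 0; j < n; j++) s += A[i*n + j] * x[j];
                r[i] = b[i] - s;
            }
        }

        void refine(const double *A, double *x, const double *b,
                    double *r, float *rf, float *ef, size_t n, int sweeps,
                    void (*solve_single)(const double *, const float *,
                                         float *, size_t))
        {
            for (int k = 0; k < sweeps; k++) {
                residual(A, x, b, r, n);              /* double precision */
                for (size_t i = 0; i < n; i++) rf[i] = (float)r[i];
                solve_single(A, rf, ef, n);           /* cheap inner solve */
                for (size_t i = 0; i < n; i++) x[i] += (double)ef[i];
            }
        }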

  10. Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms

    Energy Technology Data Exchange (ETDEWEB)

    Chang, Christopher H.; Long, Hai; Sides, Scott; Vaidhynathan, Deepthi; Jones, Wesley

    2015-10-15

    Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL's Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance--limitations to application parallelism, or resource contention among concurrently running but independent tasks, limit effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance for procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth; balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might come from enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once than fast things in order.

  11. Virtual Wingman: Harnessing the Future Unstructured Information Environment to Achieve Mission Success

    Science.gov (United States)

    2010-12-01

    Review reporter Erica Naone interviewed Amazon, Intel, Enomaly, and Sun Microsystems executives concerning cloud computing use. They suggested that... Cloud Computing—By using a thin...Jesús Favela, Alfredo Preciado, and Aurora Vizcaino. “Agent-Based Ambient Intelligence for Healthcare.” AI Communications 18, no. 3 (September 2005

  12. Cost/Performance Ratio Achieved by Using a Commodity-Based Cluster

    Science.gov (United States)

    Lopez, Isaac

    2001-01-01

    Researchers at the NASA Glenn Research Center acquired a commodity cluster based on Intel Corporation processors to compare its performance with that of a traditional UNIX cluster in the execution of aeropropulsion applications. Since the cost differential of the clusters was significant, a cost/performance ratio was calculated. After executing a propulsion application on both clusters, the researchers demonstrated a 9.4 cost/performance ratio in favor of the Intel-based cluster. These researchers use the Aeroshark cluster as one of the primary testbeds for developing NPSS parallel application codes and system software. The Aeroshark cluster provides 64 Intel Pentium II 400-MHz processors, housed in 32 nodes. Recently, APNASA - a code developed by a Government/industry team for the design and analysis of turbomachinery systems - was used for a simulation on Glenn's Aeroshark cluster.

  13. The design, validation, and performance of Grace

    Directory of Open Access Journals (Sweden)

    Ru Zhu

    2016-05-01

    The design, validation and performance of Grace, a GPU-accelerated micromagnetic simulation software package, are presented. The software adopts C++ Accelerated Massive Parallelism (C++ AMP) so that it runs on GPUs from various hardware vendors, including NVidia, AMD and Intel. At large simulation scales, a speedup factor of up to two orders of magnitude is observed compared to the CPU-based micromagnetic simulation software OOMMF. The software can run on high-end professional GPUs as well as budget personal laptops, and is free to download.

  14. Experimental data of co-crystals of Etravirine and L-tartaric acid

    Directory of Open Access Journals (Sweden)

    Mikal Rekdal

    2018-02-01

    Etravirine is a drug used alongside other medication in the treatment of HIV and is a non-nucleoside reverse transcriptase inhibitor. It is a BCS class IV drug, having low solubility and high permeability (Drugbank, https://www.drugbank.ca/drugs/DB06414 [1]). As a result, large doses of the drug are required for treatment: two pills have to be taken twice a day, making it a “pill burden” (Intelence, http://www.intelence.com/hcp/dosing/administration-options [2]). Therefore, attempts at co-crystallizing Etravirine are attractive, as the solubility of the drug tends to increase in this solid form (Schultheiss and Newman, 2009 [3]). In this study, Etravirine co-crystals were synthesized in the molar ratios 1:1, 1:2 and 2:1 with L-tartaric acid as the co-former. Both slow evaporation and physical mixing were performed to combine the components. DSC values of the final products are presented, as well as FTIR spectra to observe the altered intermolecular interactions. A chemical stability test was performed after seven days using area-under-curve data from an HPLC instrument. Keywords: Etravirine, Co-crystals, HPLC, FTIR instrument, DSC instrument

  15. Vectorization, parallelization and porting of nuclear codes (porting). Progress report fiscal 1998

    International Nuclear Information System (INIS)

    Nemoto, Toshiyuki; Kawai, Wataru; Ishizuki, Shigeru; Kawasaki, Nobuo; Kume, Etsuo; Adachi, Masaaki; Ogasawara, Shinobu

    2000-03-01

    Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system, the AP3000 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the porting. In this porting part, the porting of Monte Carlo N-Particle Transport code MCNP4B2 and Reactor Safety Analysis code RELAP5 on the AP3000 are described. In the vectorization and parallelization on vector processors part, the vectorization of General Tokamak Circuit Simulation Program code GTCSP, the vectorization and parallelization of Molecular Dynamics Ntv Simulation code MSP2, Eddy Current Analysis code EDDYCAL, Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of Monte Carlo N-Particle Transport code MCNP4B2, Plasma Hydrodynamics code using Cubic Interpolated propagation Method PHCIP and Vectorized Monte Carlo code (continuous energy model/multi-group model) MVP/GMVP on the Paragon are described. (author)

  16. Validation of clinical testing for warfarin sensitivity: comparison of CYP2C9-VKORC1 genotyping assays and warfarin-dosing algorithms.

    Science.gov (United States)

    Langley, Michael R; Booker, Jessica K; Evans, James P; McLeod, Howard L; Weck, Karen E

    2009-05-01

    Responses to warfarin (Coumadin) anticoagulation therapy are affected by genetic variability in both the CYP2C9 and VKORC1 genes. Validation of pharmacogenetic testing for warfarin responses includes demonstration of analytical validity of testing platforms and of the clinical validity of testing. We compared four platforms for determining the relevant single nucleotide polymorphisms (SNPs) in both CYP2C9 and VKORC1 that are associated with warfarin sensitivity (Third Wave Invader Plus, ParagonDx/Cepheid Smart Cycler, Idaho Technology LightCycler, and AutoGenomics Infiniti). Each method was examined for accuracy, cost, and turnaround time. All genotyping methods demonstrated greater than 95% accuracy for identifying the relevant SNPs (CYP2C9 *2 and *3; VKORC1 -1639 or 1173). The ParagonDx and Idaho Technology assays had the shortest turnaround and hands-on times. The Third Wave assay was readily scalable to higher test volumes but had the longest hands-on time. The AutoGenomics assay interrogated the largest number of SNPs but had the longest turnaround time. Four published warfarin-dosing algorithms (Washington University, UCSF, Louisville, and Newcastle) were compared for accuracy for predicting warfarin dose in a retrospective analysis of a local patient population on long-term, stable warfarin therapy. The predicted doses from both the Washington University and UCSF algorithms demonstrated the best correlation with actual warfarin doses.

  17. Validation of Clinical Testing for Warfarin Sensitivity

    Science.gov (United States)

    Langley, Michael R.; Booker, Jessica K.; Evans, James P.; McLeod, Howard L.; Weck, Karen E.

    2009-01-01

    Responses to warfarin (Coumadin) anticoagulation therapy are affected by genetic variability in both the CYP2C9 and VKORC1 genes. Validation of pharmacogenetic testing for warfarin responses includes demonstration of analytical validity of testing platforms and of the clinical validity of testing. We compared four platforms for determining the relevant single nucleotide polymorphisms (SNPs) in both CYP2C9 and VKORC1 that are associated with warfarin sensitivity (Third Wave Invader Plus, ParagonDx/Cepheid Smart Cycler, Idaho Technology LightCycler, and AutoGenomics Infiniti). Each method was examined for accuracy, cost, and turnaround time. All genotyping methods demonstrated greater than 95% accuracy for identifying the relevant SNPs (CYP2C9 *2 and *3; VKORC1 −1639 or 1173). The ParagonDx and Idaho Technology assays had the shortest turnaround and hands-on times. The Third Wave assay was readily scalable to higher test volumes but had the longest hands-on time. The AutoGenomics assay interrogated the largest number of SNPs but had the longest turnaround time. Four published warfarin-dosing algorithms (Washington University, UCSF, Louisville, and Newcastle) were compared for accuracy for predicting warfarin dose in a retrospective analysis of a local patient population on long-term, stable warfarin therapy. The predicted doses from both the Washington University and UCSF algorithms demonstrated the best correlation with actual warfarin doses. PMID:19324988

  18. Simulated Lunar Testing of Metabolic Heat Regenerated Temperature Swing Adsorption

    Science.gov (United States)

    Padilla, Sebastian A.; Bower, Chad E.; Iacomini, Christie S.; Paul, Heather L.

    2012-01-01

    Metabolic heat regenerated Temperature Swing Adsorption (MTSA) technology is being developed for thermal and carbon dioxide (CO2) control for a Portable Life Support System (PLSS), as well as water recycling. An Engineering Development Unit (EDU) of the MTSA Subassembly (MTSAS) was designed and assembled for optimized Martian operations, but also meets system requirements for lunar operations. For lunar operations the MTSA sorption cycle is driven via a vacuum swing between suit ventilation loop pressure and lunar vacuum. The focus of this effort was testing in a simulated lunar environment. This environment was simulated in Paragon's EHF vacuum chamber. The objective of the testing was to evaluate the full cycle performance of the MTSA Subassembly EDU, and to assess CO2 loading and pressure drop of the wash-coated aluminum reticulated foam sorbent bed. Lunar environment testing proved out the feasibility of pure vacuum swing operation, making MTSA a technology that can be tested and used on the Moon prior to going to Mars. Testing demonstrated better than expected CO2 loading on the sorbent and nearly replicated the equilibrium data from the sorbent manufacturer. This exceeded any of the previous sorbent loading tests performed by Paragon. Subsequently, the increased performance of the sorbent bed design indicates future designs will require less mass and volume than the current EDU, rendering MTSA very competitive for Martian PLSS applications.

  19. Perfmon2: a leap forward in performance monitoring

    International Nuclear Information System (INIS)

    Jarp, S; Jurga, R; Nowak, A

    2008-01-01

    This paper describes the software component, perfmon2, that is about to be added to the Linux kernel as the standard interface to the Performance Monitoring Unit (PMU) on common processors, including x86 (AMD and Intel), Sun SPARC, MIPS, IBM Power and Intel Itanium. It also describes a set of tools for doing performance monitoring in practice and details how the CERN openlab team has participated in the testing and development of these tools

  20. Perfmon2: a leap forward in performance monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Jarp, S; Jurga, R; Nowak, A [CERN, Geneva (Switzerland)], E-mail: Sverre.Jarp@cern.ch

    2008-07-15

    This paper describes the software component, perfmon2, that is about to be added to the Linux kernel as the standard interface to the Performance Monitoring Unit (PMU) on common processors, including x86 (AMD and Intel), Sun SPARC, MIPS, IBM Power and Intel Itanium. It also describes a set of tools for doing performance monitoring in practice and details how the CERN openlab team has participated in the testing and development of these tools.

  1. Perfmon2 a leap forward in performance monitoring

    CERN Document Server

    Jarp, S; Nowak, A

    2008-01-01

    This paper describes the software component, perfmon2, that is about to be added to the Linux kernel as the standard interface to the Performance Monitoring Unit (PMU) on common processors, including x86 (AMD and Intel), Sun SPARC, MIPS, IBM Power and Intel Itanium. It also describes a set of tools for doing performance monitoring in practice and details how the CERN openlab team has participated in the testing and development of these tools.

  2. OpenMP Parallelization and Optimization of Graph-based Machine Learning Algorithms

    Science.gov (United States)

    2016-05-01

    Understanding Application Data Movement Characteristics using Intel VTune Amplifier and Software Development Emulator tools, Intel Xeon Phi User Group...sured by a summation of the weights along the graph cut) for this problem. This is equivalent to assigning a scalar or vector value u_i to each ith data point...graph Laplacian [9]. By projecting all vectors onto this sub-eigenspace, the iteration step reduces to a simple coefficient update. 2.2 Semi-supervised

  3. Perfmon2: a leap forward in performance monitoring

    Science.gov (United States)

    Jarp, S.; Jurga, R.; Nowak, A.

    2008-07-01

    This paper describes the software component, perfmon2, that is about to be added to the Linux kernel as the standard interface to the Performance Monitoring Unit (PMU) on common processors, including x86 (AMD and Intel), Sun SPARC, MIPS, IBM Power and Intel Itanium. It also describes a set of tools for doing performance monitoring in practice and details how the CERN openlab team has participated in the testing and development of these tools.

  4. Contracting in Complex Operations: Toward Developing a Contracting Framework for Security Sector Reconstruction and Reform

    Science.gov (United States)

    2014-01-01

    Table A–2. Tactical Level Participants (41 total), covering Intel, NDS, ANA, ASOF, ANCOP, AUP, ABP and ALP roles, including 15 embedded advisors and trainers; entries list duty title, organization and rank (*** served on both strategic and tactical levels). ABP: Afghan Border Police; ANCOP: Afghan Civil Order Police; CAAT: COIN Advise and

  5. Electronics Industry Study Report: Semiconductors and Defense Electronics

    Science.gov (United States)

    2003-01-01

    Access Memory (DRAM) chips and microprocessors. Samsung, Micron, Hynix, and Infineon control almost three-fourths of the DRAM market, while Intel alone... 2002 semiconductor sales ranking: 1. Intel (U.S.), $23.7B in 2001, $24.0B in 2002, +1% change, 16.9% of the 2002 market; 2. Samsung Semiconductor (S. Korea), $6.3B... located in four major regions: the United States, Europe, Japan, and the Asia-Pacific region (includes South Korea, China, Singapore, Malaysia, Taiwan

  6. Crosstalk: The Journal of Defense Software Engineering. Volume 22, Number 2, February 2009

    Science.gov (United States)

    2009-02-01

    possible system attacks. by Ron Greenfield and Dr. Charley Tichenor Enforcing Static Program Properties to Enable Safety-Critical Use of Java Software...Assurance by Ron Greenfield and Dr. Charley Tichenor, and Dr. Kelvin Nilsen’s Enforcing Static Program Properties in Safety-Critical Java Software Components...01&lang=en>. 5. Shakespeare, William. The Tempest. 6. Intel. “How Chips are Made.” 2008 <www.intel.com/education/makingchips/preparation.htm>. 7

  7. NSW Executive Enhancements

    Science.gov (United States)

    1981-06-01

    attacked most effectively with networking. Indeed, it was the marked success of the Arpanet in providing programmers economical access to diverse... economically feasible in the present state of the art: it would have required...INTEL 8080 MPU. PLM80: a cross compiler for the INTEL 8080 MPU. MACRO20: a cross assembler for the AN/UYK-20 computer. CMS2-M: a cross compiler

  8. Wilson and Domainwall Kernels on Oakforest-PACS

    Science.gov (United States)

    Kanamori, Issaku; Matsufuru, Hideo

    2018-03-01

    We report the performance of Wilson and Domainwall kernels on a new Intel Xeon Phi Knights Landing based machine named Oakforest-PACS, which is co-hosted by the University of Tokyo and the University of Tsukuba and is currently the fastest machine in Japan. This machine uses Intel Omni-Path for the internode network. We compare performance among several types of implementation, including one that makes use of the Grid library. The code is incorporated into the code set Bridge++.

  9. A minimum operating system based on the SM5300.01 magnetic tape recorder for the Micro-8 computer

    International Nuclear Information System (INIS)

    Kartashov, S.V.

    1987-01-01

    An operating system (OS) for microcomputers based on the INTEL-8080 and 8085 microprocessors, oriented toward the use of a magnetic tape recorder, is described. This system comprises a tape-recorder manager and a file structure organization system (the nucleus of the OS), a symbolic text editor, a macroassembler, an interactive disassembler and a program for communication with an EC-computer. The OS makes it possible to develop, debug, store and run programs written in INTEL-8085 assembly language

  10. Secure and Efficient Regression Analysis Using a Hybrid Cryptographic Framework: Development and Evaluation.

    Science.gov (United States)

    Sadat, Md Nazmus; Jiang, Xiaoqian; Aziz, Md Momin Al; Wang, Shuang; Mohammed, Noman

    2018-03-05

    Machine learning is an effective data-driven tool that is being widely used to extract valuable patterns and insights from data. Specifically, predictive machine learning models are very important in health care for clinical data analysis. The machine learning algorithms that generate predictive models often require pooling data from different sources to discover statistical patterns or correlations among different attributes of the input data. The primary challenge is to fulfill one major objective: preserving the privacy of individuals while discovering knowledge from data. Our objective was to develop a hybrid cryptographic framework for performing regression analysis over distributed data in a secure and efficient way. Existing secure computation schemes are not suitable for processing the large-scale data that are used in cutting-edge machine learning applications. We designed, developed, and evaluated a hybrid cryptographic framework that can securely perform regression analysis, a fundamental machine learning algorithm, using somewhat homomorphic encryption and a newly introduced secure hardware component, Intel Software Guard Extensions (Intel SGX), to ensure both privacy and efficiency at the same time. Experimental results demonstrate that our proposed method provides a better trade-off between security and efficiency than solely secure hardware-based methods. Moreover, there is no approximation error: the computed model parameters are identical to the plaintext results. To the best of our knowledge, this kind of secure computation model using a hybrid cryptographic framework, leveraging both somewhat homomorphic encryption and Intel SGX, has not been proposed or evaluated before.
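
    For reference, the underlying regression computation that any such secure pipeline must reproduce is small. Below is a plain-text ordinary least squares fit for a single feature; the paper's contribution is computing exactly these coefficients without exposing the data, via homomorphic encryption plus SGX enclaves, none of which appears in this clear-text sketch.

        /* Clear-text baseline: ordinary least squares for one feature,
         * slope = cov(x, y) / var(x). A secure scheme must return the
         * same coefficients without revealing x or y. */
        void ols_fit(const double *x, const double *y, int n,
                     double *slope, double *intercept)
        {
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (int i = 0; i < n; i++) {
                sx += x[i]; sy += y[i];
                sxx += x[i] * x[i]; sxy += x[i] * y[i];
            }
            double denom = n * sxx - sx * sx;   /* n^2 * var(x) */
            *slope = (n * sxy - sx * sy) / denom;
            *intercept = (sy - *slope * sx) / n;
        }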

  11. Logical inference techniques for loop parallelization

    DEFF Research Database (Denmark)

    Oancea, Cosmin Eugen; Rauchwerger, Lawrence

    2012-01-01

    the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S={}, where S is a set expression representing array indexes. Using...... of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECT-CLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers....

  12. Many-Body Mean-Field Equations: Parallel implementation

    International Nuclear Information System (INIS)

    Vallieres, M.; Umar, S.; Chinn, C.; Strayer, M.

    1993-01-01

    We describe the implementation of Hartree-Fock Many-Body Mean-Field Equations on a Parallel Intel iPSC/860 hypercube. We first discuss the Nuclear Mean-Field approach in physical terms. Then we describe our parallel implementation of this approach on the Intel iPSC/860 hypercube. We discuss and compare the advantages and disadvantages of the domain partition versus the Hilbert space partition for this problem. We conclude by discussing some timing experiments on various computing platforms

  13. Internet of Things with Intel Galileo

    CERN Document Server

    de Sousa, Miguel

    2015-01-01

    This book employs an incremental, step-by-step approach to get you familiarized with everything from the basic terms, board components, and development environments to developing real projects. Each project will demonstrate how to use specific board components and tools. Both Galileo and Galileo Gen 2 are covered in this book.

  14. Recent advances in PC-Linux systems for electronic structure computations by optimized compilers and numerical libraries.

    Science.gov (United States)

    Yu, Jen-Shiang K; Yu, Chin-Hui

    2002-01-01

    One of the most frequently used packages for electronic structure research, GAUSSIAN 98, is compiled on Linux systems with various hardware configurations, including AMD Athlon (with the "Thunderbird" core), AthlonMP, and AthlonXP (with the "Palomino" core) systems as well as Intel Pentium 4 (with the "Willamette" core) machines. The default PGI FORTRAN compiler (pgf77) and the Intel FORTRAN compiler (ifc) are respectively employed with different architectural optimization options to compile GAUSSIAN 98 and test the performance improvement. In addition to the BLAS library included in revision A.11 of this package, the Automatically Tuned Linear Algebra Software (ATLAS) library is linked against the binary executables to improve the performance. Various Hartree-Fock, density-functional theory, and MP2 calculations are done for benchmarking purposes. It is found that the combination of ifc with the ATLAS library gives the best performance for GAUSSIAN 98 on all of these PC-Linux computers, with both AMD and Intel CPUs. Even on AMD systems, the Intel FORTRAN compiler invariably produces binaries with better performance than pgf77. The enhancement provided by the ATLAS library is more significant for post-Hartree-Fock calculations. The performance of a single CPU is potentially as good as that of an Alpha 21264A workstation or an SGI supercomputer. The floating-point marks from SpecFP2000 show trends similar to the results for the GAUSSIAN 98 package.

  15. FPGAs Emulate Microprocessors-A Successful Case for HFC NPP Digital I and C Upgrade

    International Nuclear Information System (INIS)

    Hsu, Allen; Crow, Ivan; Reese, Carl; Kim, Jong; Yang, Steve

    2014-01-01

    Field Programmable Gate Arrays (FPGAs), as programmable logic devices (PLDs), have gained a great deal of interest for implementing safety I and C applications in nuclear power plants (NPPs), largely owing to the FPGAs' potential advantages over the currently more common microprocessor-based digital I and C applications. First of all, FPGAs have adequate capabilities for most digital I and C applications in NPPs. Secondly, from a hardware perspective, FPGAs provide products with longer lifetimes, improved testability, and reduced drift compared with analog-based systems. Thirdly, from a software perspective, FPGAs can be made simpler and less reliant on complex software such as operating systems, which should make FPGAs easier to qualify for nuclear safety applications. Fourthly, FPGAs are less vulnerable to cyber attacks when they implement I and C systems that do not contain high-level, general-purpose software that may be easily subjected to malicious modification. Finally, FPGAs can bring cost reduction to an I and C digital upgrade, because FPGAs allow a simpler licensing process than microprocessor-based digital I and C and can be implemented more efficiently. This paper presents one successful case, the YGN Unit I and C upgrade, which used FPGA-based components to replace the obsolete Intel 8085 microprocessor-based controllers. In this case, FPGAs emulated the process of the existing microprocessors and interpreted the execution of CPU processing. More than 160 of the FPGA-based SBC-01 controllers replacing the Intel 8085 microprocessor-based printed circuit boards have been installed and have run successfully in safety I and C applications over the last five years. In this upgrade, the new FPGA-based controller board SBC-01 emulated the functions of the Intel 8085 microprocessor correctly. It is a successful and cost-effective upgrade. In this paper, the lifecycle design and implementation process and the rigorous V and V activities that were used in the

  16. FPGAs Emulate Microprocessors-A Successful Case for HFC NPP Digital I and C Upgrade

    Energy Technology Data Exchange (ETDEWEB)

    Hsu, Allen; Crow, Ivan; Reese, Carl; Kim, Jong; Yang, Steve [Doosan HF Controls Corp, Carrollton (United States)

    2014-08-15

    Field Programmable Gate Arrays (FPGAs), as programmable logic devices (PLDs), have gained a great deal of interest for implementing safety I and C applications in nuclear power plants (NPPs), largely owing to the FPGAs' potential advantages over the currently more common microprocessor-based digital I and C applications. First of all, FPGAs have adequate capabilities for most digital I and C applications in NPPs. Secondly, from a hardware perspective, FPGAs provide products with longer lifetimes, improved testability, and reduced drift compared with analog-based systems. Thirdly, from a software perspective, FPGAs can be made simpler and less reliant on complex software such as operating systems, which should make FPGAs easier to qualify for nuclear safety applications. Fourthly, FPGAs are less vulnerable to cyber attacks when they implement I and C systems that do not contain high-level, general-purpose software that may be easily subjected to malicious modification. Finally, FPGAs can bring cost reduction to an I and C digital upgrade, because FPGAs allow a simpler licensing process than microprocessor-based digital I and C and can be implemented more efficiently. This paper presents one successful case, the YGN Unit I and C upgrade, which used FPGA-based components to replace the obsolete Intel 8085 microprocessor-based controllers. In this case, FPGAs emulated the process of the existing microprocessors and interpreted the execution of CPU processing. More than 160 of the FPGA-based SBC-01 controllers replacing the Intel 8085 microprocessor-based printed circuit boards have been installed and have run successfully in safety I and C applications over the last five years. In this upgrade, the new FPGA-based controller board SBC-01 emulated the functions of the Intel 8085 microprocessor correctly. It is a successful and cost-effective upgrade. In this paper, the lifecycle design and implementation process and the rigorous V and V activities that were used in the

  17. Artillery Survivability Model

    Science.gov (United States)

    2016-06-01

    the simulation actually runs, but the graphics card does not render. Another feature of Unity is that the user can see profile data in the Unity Editor...optimized the number of waypoints on a laptop with an integrated Intel HD4400 graphics card. A dedicated graphics card would be better and faster...manipulating the x, y, and z values of the three-dimensional thread vector on a laptop that runs an Intel HD4400 on-board graphics card. In this function

  18. Parallel transposition of sparse data structures

    DEFF Research Database (Denmark)

    Wang, Hao; Liu, Weifeng; Hou, Kaixi

    2016-01-01

    Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel tr... transposition in the latest vendor-supplied library on an Intel multicore CPU platform, and the MergeTrans approach achieves an average 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor.
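
    The primitive in question is easy to state in code. Below is a sequential sketch of sparse-matrix transposition for a CSR matrix using the classic counting-sort scheme; the paper's ScanTrans and MergeTrans variants parallelize essentially these counting, prefix-sum, and scatter phases, which this sketch makes no attempt to do.

        /* Transpose a (rows x cols, nnz) CSR matrix A into AT in CSR
         * (equivalently, A in CSC), via counting sort. Output arrays
         * t_rowptr (cols+1), t_colidx (nnz), t_val (nnz) are caller-
         * allocated. Sequential version, for clarity. */
        #include <stdlib.h>

        void csr_transpose(int rows, int cols, int nnz,
                           const int *rowptr, const int *colidx,
                           const double *val,
                           int *t_rowptr, int *t_colidx, double *t_val)
        {
            /* 1. histogram: count entries per column of A */
            for (int c = 0; c <= cols; c++) t_rowptr[c] = 0;
            for (int k = 0; k < nnz; k++) t_rowptr[colidx[k] + 1]++;
            /* 2. prefix sum gives row starts of the transpose */
            for (int c = 0; c < cols; c++) t_rowptr[c + 1] += t_rowptr[c];
            /* 3. scatter each entry into its transposed position */
            int *next = malloc(cols * sizeof *next);
            for (int c = 0; c < cols; c++) next[c] = t_rowptr[c];
            for (int r = 0; r < rows; r++)
                for (int k = rowptr[r]; k < rowptr[r + 1]; k++) {
                    int dst = next[colidx[k]]++;
                    t_colidx[dst] = r;
                    t_val[dst] = val[k];
                }
            free(next);
        }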

  19. Supercomputing for molecular dynamics simulations handling multi-trillion particles in nanofluidics

    CERN Document Server

    Heinecke, Alexander; Horsch, Martin; Bungartz, Hans-Joachim

    2015-01-01

    This work presents modern implementations of relevant molecular dynamics algorithms using ls1 mardyn, a simulation program for engineering applications. The text focuses strictly on HPC-related aspects, covering implementation on HPC architectures, taking Intel Xeon and Intel Xeon Phi clusters as representatives of current platforms. The work describes distributed and shared-memory parallelization on these platforms, including load balancing, with a particular focus on the efficient implementation of the compute kernels. The text also discusses the software-architecture of the resulting code.

  20. Particle In Cell Codes on Highly Parallel Architectures

    Science.gov (United States)

    Tableman, Adam

    2014-10-01

    We describe strategies and examples of Particle-In-Cell codes running on Nvidia GPU and Intel Phi architectures. This includes basic implementations in skeleton codes and full-scale development versions (encompassing 1D, 2D, and 3D codes) in Osiris. Both the similarities and differences between Intel's and Nvidia's hardware will be examined. Work supported by grants NSF ACI 1339893, DOE DE SC 000849, DOE DE SC 0008316, DOE DE NA 0001833, and DOE DE FC02 04ER 54780.
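
    For readers unfamiliar with the method, the heart of any particle-in-cell code is a gather-push loop over particles. The one-dimensional electrostatic sketch below shows that loop (a field gather with linear weighting, then a leapfrog kick and drift with periodic boundaries); the grid size, charge-to-mass ratio, and all names are illustrative assumptions, not Osiris internals.

        /* 1D electrostatic PIC particle push: gather E at each particle
         * with linear weighting, then leapfrog kick and drift. */
        #define NG 64                      /* grid cells (assumed) */

        void push(double *xp, double *vp, int np, const double *E,
                  double dt, double qm, double L)
        {
            double dx = L / NG;
            for (int p = 0; p < np; p++) {
                /* linear gather of E from the two nearest grid points */
                int j = (int)(xp[p] / dx);
                double w = xp[p] / dx - j;
                double Ep = (1.0 - w) * E[j % NG] + w * E[(j + 1) % NG];

                vp[p] += qm * Ep * dt;              /* kick  */
                xp[p] += vp[p] * dt;                /* drift */
                while (xp[p] < 0)  xp[p] += L;      /* periodic boundary */
                while (xp[p] >= L) xp[p] -= L;
            }
        }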

  1. Lenovo Group Ltd. : Achieving Competitive Advantages for its Hardware Business in Emerging Markets by Developing a Sustainable Business Model for its Software and Peripherals Business-The Malaysia Scenario.

    OpenAIRE

    Chin, Andrew Beng Huat

    2010-01-01

    From a humble beginning in late 1984, not much different from how Hewlett-Packard Co. and Apple Inc. began in their family home garages, Lenovo Group Ltd., known as Legend Group Ltd. until 2003, is today the pride and joy of its home country, the People's Republic of China. By 1998, Lenovo had shipped its millionth personal computer (PC), and then-Intel chairman Andy Grove took a Legend PC back for Intel's museum collection. In 2003, Lenovo built and launched its second supercomput...

  2. Vectorization, parallelization and porting of nuclear codes (vectorization and parallelization). Progress report fiscal 1998

    International Nuclear Information System (INIS)

    Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi

    2000-03-01

    Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system, the AP3000 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this vectorization and parallelization on vector processors part, the vectorization of General Tokamak Circuit Simulation Program code GTCSP, the vectorization and parallelization of Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, Eddy Current Analysis code EDDYCAL, Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of Monte Carlo N-Particle Transport code MCNP4B2, Plasma Hydrodynamics code using Cubic Interpolated Propagation Method PHCIP and Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon are described. In the porting part, the porting of Monte Carlo N-Particle Transport code MCNP4B2 and Reactor Safety Analysis code RELAP5 on the AP3000 are described. (author)

  3. Comparison of performance of three commercial platforms for warfarin sensitivity genotyping.

    Science.gov (United States)

    Babic, Nikolina; Haverfield, Eden V; Burrus, Julie A; Lozada, Anthony; Das, Soma; Yeo, Kiang-Teck J

    2009-08-01

    We performed a 3-way comparison of the Osmetech eSensor, the AutoGenomics INFINITI, and a real-time PCR method (ParagonDx reagents/Stratagene RT-PCR platform) for their FDA-cleared warfarin panels, and for additional polymorphisms (CYP2C9*5, *6, and *11 and extended VKORC1 panels) where available. One hundred de-identified DNA samples were used in this IRB-approved study. Accuracy was determined by comparison of genotyping results across the three platforms. Any discrepancy was resolved by bi-directional sequencing. The CYP4F2 assay on the Osmetech was validated by bi-directional sequencing. Accuracies for CYP2C9*2 and *3 were 100% for all 3 platforms. VKORC1 3673 genotyping accuracies were 100% on the eSensor and 97% on the Infiniti. CYP2C9*5, *6 and *11 showed 100% concordance between the eSensor and Infiniti. The VKORC1 6484 and 9041 variants, compared between the ParagonDx and Infiniti analyzers, were 100% (6484) and 99% (9041) concordant. CYP4F2 was 100% concordant with sequencing results. The time required from automated DNA extraction to result was approximately 8 h on the Infiniti and 4 h on the eSensor and ParagonDx platforms. Overall, we observed excellent CYP2C9*2 and *3 genotyping accuracy for all three platforms. For VKORC1 3673 genotyping, the eSensor demonstrated slightly higher accuracy than the Infiniti, and CYP4F2 on the Osmetech was 100% accurate.

  4. Simulated Lunar Testing of Metabolic Heat Regenerated Temperature Swing Adsorption Technology

    Science.gov (United States)

    Padilla, Sebastian A.; Bower, Chad; Iacomini, Christie S.; Paul, H.

    2011-01-01

    Metabolic heat regenerated Temperature Swing Adsorption (MTSA) technology is being developed for thermal and carbon dioxide (CO2) control for a Portable Life Support System (PLSS), as well as water recycling. An Engineering Development Unit (EDU) of the MTSA subassembly was designed and assembled for optimized Martian operations, but also meets system requirements for lunar operations. For lunar operations the MTSA sorption cycle is driven via a vacuum swing between suit ventilation loop pressure and lunar vacuum. The focus of this effort is operations and testing in a simulated lunar environment. This environment was simulated in Paragon's EHF vacuum chamber. The objective of this testing was to evaluate the full cycle performance of the MTSA Subassembly EDU, and to assess CO2 loading and pressure drop of the wash coated aluminum reticulated foam sorbent bed. The lunar testing proved out the feasibility of pure vacuum swing operation, making MTSA a technology that can be tested and used on the Moon prior to going to Mars. Testing demonstrated better than expected CO2 loading on the sorbent and nearly replicated the equilibrium data from the sorbent manufacturer. This had not been achieved in any of the previous sorbent loading tests performed by Paragon. Consequently, the increased performance of the sorbent bed design indicates that future designs will require less mass and volume than the current EDU, rendering MTSA very competitive for Martian PLSS applications.

  5. Wisdom comes with age?

    CERN Multimedia

    2009-01-01

    ‘A relativistic generalization of the Navier-Stokes equations to quark-gluon plasmas’ – the work of a CERN physicist perhaps? No, actually it is the title of a high school student’s project! Thirteen of the world’s brightest young scientific minds were recently treated to a tour of CERN. The Bulletin finds out more. The Intel ISEF students during their visit to CERN. Thirteen science wunderkinds came to CERN for a three-day visit on 29 June. The high school students, aged between 16 and 18, were all winners of this year’s Intel International Science and Engineering Fair (Intel ISEF), the world’s largest pre-college science competition. As part of their prize they won a visit to CERN organized by the CERN openlab collaboration (see box). "The whole trip has been incredible, and this is my first time in Europe as well so that makes it even more exciting," said Ryan Alexander, just 16 years old, who won in the Energy and Tr...

  6. Web interfaces to relational databases

    Science.gov (United States)

    Carlisle, W. H.

    1996-01-01

    This report describes a project to extend the capabilities of a Virtual Research Center (VRC) for NASA's Advanced Concepts Office. The work was performed as part of NASA's 1995 Summer Faculty Fellowship program and involved the development of a prototype component of the VRC - a database system that provides data creation and access services within a room of the VRC. In support of VRC development, NASA has assembled a laboratory containing the variety of equipment expected to be used by scientists within the VRC. This laboratory contains the major hardware platforms (SUN, Intel, and Motorola processors) and their most common operating systems (UNIX, Windows NT, Windows for Workgroups, and MacOS). The SPARC 20 runs SUN Solaris 2.4, an Intel Pentium runs Windows NT and is installed on a different network from the other machines in the laboratory, a Pentium PC runs Windows for Workgroups, two Intel 386 machines run Windows 3.1, and finally, a PowerMacintosh and a Macintosh IIsi run MacOS.

  7. Writing parallel programs that work

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known, the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker: Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on the auto-parallelizing compiler (KAP), and was involved in th...

  8. First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

    CERN Document Server

    Halyo, V.; Lujan, P.; Karpusenko, V.; Vladimirov, A.

    2014-04-07

    Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's Xeon Phi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy efficient event selection, thus opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation multicore CPUs with the use of the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform will be evaluated for the first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU, and an Intel Xeon Phi...
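
    As a hedged illustration of the Hough-transform voting such trigger algorithms build on: each hit votes for every parameter-space curve passing through it, and track candidates emerge as peaks in an accumulator. The C++ sketch below uses an invented hit list, binning and a straight-line (rho, theta) parametrization for clarity; it is not the CMS/ATLAS implementation.

        #include <cmath>
        #include <cstdio>
        #include <vector>

        // Minimal (rho, theta) Hough voting over 2D hits: each hit votes for all
        // lines that could pass through it; accumulator peaks are track candidates.
        int main() {
            const int NT = 180, NR = 200;        // theta and rho bins (assumed)
            const double RMAX = 10.0;            // max |rho| covered (assumed units)
            std::vector<std::vector<int>> acc(NT, std::vector<int>(NR, 0));

            // Toy hits roughly along one line (stand-in for detector hits).
            const double hits[][2] = {{1.0, 1.1}, {2.0, 2.0}, {3.0, 3.1}, {4.0, 4.0}};

            for (const auto& h : hits) {
                for (int t = 0; t < NT; ++t) {
                    double theta = t * M_PI / NT;
                    double rho = h[0] * std::cos(theta) + h[1] * std::sin(theta);
                    int r = static_cast<int>((rho + RMAX) / (2 * RMAX) * NR);
                    if (r >= 0 && r < NR) ++acc[t][r];   // the vote
                }
            }

            // Report the best-supported line (a real trigger thresholds/clusters).
            int bt = 0, br = 0;
            for (int t = 0; t < NT; ++t)
                for (int r = 0; r < NR; ++r)
                    if (acc[t][r] > acc[bt][br]) { bt = t; br = r; }
            std::printf("peak: theta bin %d, rho bin %d, votes %d\n", bt, br, acc[bt][br]);
        }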

  9. Comparison of Software Technologies for Vectorization and Parallelization

    CERN Document Server

    Lazzaro, Alfio; Nowak, Andrzej; Valsan, Liviu

    2012-01-01

    This paper demonstrates how modern software development methodologies can be used to give an existing sequential application a considerable performance speed-up on modern x86 server systems. Whereas, in the past, speed-up was directly linked to the increase in clock frequency when moving to a more modern system, current x86 servers present a plethora of “performance dimensions” that need to be harnessed with great care. The application we used is a real-life data analysis example in C++ analyzing High Energy Physics data. The key software methods used are OpenMP, Intel Threading Building Blocks (TBB), Intel Cilk Plus, and the auto-vectorization capability of the Intel compiler (Composer XE). Somewhat surprisingly, the Message Passing Interface (MPI) is successfully added, although our focus is on single-node rather than multi-node performance optimization. The paper underlines the importance of algorithmic redesign in order to optimize each performance dimension and links this to close control of the memo...
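
    A hedged miniature of the key methods named above, thread parallelism plus compiler vectorization applied to a single loop, in C++ with OpenMP; the dot-product workload is invented and far simpler than the paper's HEP data analysis.

        #include <cstdio>
        #include <vector>

        // Thread-level parallelism (OpenMP) combined with SIMD vectorization on one
        // loop: each thread handles a chunk of iterations, and `simd` asks the
        // compiler to vectorize each chunk. Compile with e.g. -fopenmp -O2.
        int main() {
            const int N = 1 << 20;
            std::vector<double> x(N, 0.5), y(N, 2.0);
            double sum = 0.0;

            #pragma omp parallel for simd reduction(+ : sum)
            for (int i = 0; i < N; ++i)
                sum += x[i] * y[i];               // fused multiply-add candidate

            std::printf("dot = %f\n", sum);       // expect N * 1.0
        }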

  10. Comparison of the new intermediate complex atmospheric research (ICAR) model with the WRF model in a mesoscale catchment in Central Europe

    Science.gov (United States)

    Härer, Stefan; Bernhardt, Matthias; Gutmann, Ethan; Bauer, Hans-Stefan; Schulz, Karsten

    2017-04-01

    Until recently, a large gap existed in atmospheric downscaling strategies. On the one hand, computationally efficient statistical approaches are widely used; on the other hand, dynamical but CPU-intensive numerical atmospheric models like the Weather Research and Forecasting (WRF) model exist. The Intermediate Complexity Atmospheric Research (ICAR) model developed at NCAR (Boulder, Colorado, USA) addresses this gap by combining the strengths of both approaches: the process-based structure of a dynamical model and its applicability in a changing climate, as well as the speed of a parsimonious modelling approach, which facilitates the modelling of ensembles and offers a straightforward way to test new parametrization schemes and various input data sources. However, the ICAR model had not yet been tested in Europe or on gently undulating terrain. This study evaluates the ICAR model against WRF model runs in Central Europe for the first time, comparing a complete year of model results in the mesoscale Attert catchment (Luxembourg). In addition to these modelling results, we also describe the first implementation of ICAR on an Intel Phi architecture and consequently perform speed tests between the Vienna cluster, a standard workstation and the use of an Intel Phi coprocessor. Finally, the study gives an outlook on sensitivity studies using slightly different input data sources.

  11. Parallel Programming Application to Matrix Algebra in the Spectral Method for Control Systems Analysis, Synthesis and Identification

    Directory of Open Access Journals (Sweden)

    V. Yu. Kleshnin

    2016-01-01

    Full Text Available The article describes matrix algebra libraries based on modern parallel programming technologies for the Spectrum software, which can use a spectral method (the spectral form of mathematical description) to analyse, synthesise and identify deterministic and stochastic dynamical systems. The developed matrix algebra libraries use the following technologies: for CPUs, OmniThreadLibrary, OpenMP, Intel Threading Building Blocks and Intel Cilk Plus; for GPUs, nVidia CUDA, OpenCL, and Microsoft Accelerated Massive Parallelism. The libraries support matrices with real elements (single and double precision). The matrix dimensions are limited only by the 32-bit or 64-bit memory model and the computer configuration. The libraries are general-purpose and can be used beyond the Spectrum software; they can also find application in other projects that need to operate on large matrices. The article provides a comparative analysis of the developed libraries for various matrix operations (addition, subtraction, scalar multiplication, multiplication, matrix powers, tensor multiplication, transposition, matrix inversion, and solving systems of linear equations) through numerical experiments using different CPUs and GPUs. The article contains sample programs and performance test results for matrix multiplication, which requires the most computational resources of all the operations.
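
    As a hedged illustration of the kind of operation being benchmarked, a naive OpenMP matrix multiplication in C++ follows; the matrix size and loop order are illustrative, and libraries such as those described add blocking, vectorization and GPU paths on top of this.

        #include <cstdio>
        #include <vector>

        // Naive dense matrix multiply C = A * B parallelized over rows with OpenMP.
        // Illustrative only: production libraries block for cache and vectorize.
        int main() {
            const int N = 512;
            std::vector<double> A(N * N, 1.0), B(N * N, 2.0), C(N * N, 0.0);

            #pragma omp parallel for
            for (int i = 0; i < N; ++i)
                for (int k = 0; k < N; ++k) {         // i-k-j order: unit-stride inner loop
                    double a = A[i * N + k];
                    for (int j = 0; j < N; ++j)
                        C[i * N + j] += a * B[k * N + j];
                }

            std::printf("C[0][0] = %f\n", C[0]);       // expect 2*N = 1024
        }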

  12. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Cerati, Giuseppe [Fermilab; Elmer, Peter [Princeton U.; Krutelyov, Slava [UC, San Diego; Lantz, Steven [Cornell U., Phys. Dept.; Lefebvre, Matthieu [Princeton U.; Masciovecchio, Mario [UC, San Diego; McDermott, Kevin [Cornell U., Phys. Dept.; Riley, Daniel [Cornell U., Phys. Dept.; Tadel, Matevž [UC, San Diego; Wittich, Peter [Cornell U., Phys. Dept.; Würthwein, Frank [UC, San Diego; Yagil, Avi [UC, San Diego

    2017-11-16

    Faced with physical and energy density limitations on clock speed, contemporary microprocessor designers have increasingly turned to on-chip parallelism for performance gains. Examples include the Intel Xeon Phi, GPGPUs, and similar technologies. Algorithms should accordingly be designed with ample amounts of fine-grained parallelism if they are to realize the full performance of the hardware. This requirement can be challenging for algorithms that are naturally expressed as a sequence of small-matrix operations, such as the Kalman filter methods widely in use in high-energy physics experiments. In the High-Luminosity Large Hadron Collider (HL-LHC), for example, one of the dominant computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction; today, the most common track-finding methods are those based on the Kalman filter. Experience at the LHC, both in the trigger and offline, has shown that these methods are robust and provide high physics performance. Previously we reported the significant parallel speedups that resulted from our efforts to adapt Kalman-filter-based tracking to many-core architectures such as Intel Xeon Phi. Here we report on how effectively those techniques can be applied to more realistic detector configurations and event complexity.
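
    The small-matrix arithmetic at the heart of such trackers can be made concrete with a hedged, self-contained sketch: one predict/update cycle of a two-state Kalman filter in C++. All matrices, noise values and the measurement are toy assumptions; real track fitting uses larger state vectors and detector-derived covariances.

        #include <cstdio>

        // One predict/update cycle of a 2-state (position, velocity) Kalman filter
        // with a scalar position measurement -- the kind of small-matrix kernel
        // that track finding repeats millions of times per event. Toy values.
        int main() {
            double x[2] = {0.0, 1.0};                      // state estimate
            double P[2][2] = {{1.0, 0.0}, {0.0, 1.0}};     // covariance
            const double dt = 1.0, q = 0.01, r = 0.1;      // process/measurement noise
            const double z = 1.2;                          // measured position

            // Predict: x <- F x, P <- F P F^T + Q, with F = [[1, dt], [0, 1]].
            x[0] += dt * x[1];
            double P00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q;
            double P01 = P[0][1] + dt * P[1][1];
            double P10 = P[1][0] + dt * P[1][1];
            double P11 = P[1][1] + q;

            // Update with H = [1, 0]: gain K = P H^T / (H P H^T + R).
            double S = P00 + r;
            double K0 = P00 / S, K1 = P10 / S;
            double resid = z - x[0];
            x[0] += K0 * resid;
            x[1] += K1 * resid;
            P[0][0] = (1 - K0) * P00; P[0][1] = (1 - K0) * P01;   // P <- (I - K H) P
            P[1][0] = P10 - K1 * P00; P[1][1] = P11 - K1 * P01;

            std::printf("x = (%f, %f)\n", x[0], x[1]);
        }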

  13. The Development of Knowledge Management in the Oil and Gas Industry

    OpenAIRE

    Robert M. Grant

    2013-01-01

    A review of the knowledge management experiences of BP, Royal Dutch Shell, Chevron, ExxonMobil, ConocoPhillips, Halliburton, Schlumberger, Paragon Engineering Services, BHP, Marathon Oil, and Murphy Oil identifies two main types of knowledge management practice: the application of information and communication technologies to the transfer of explicit knowledge, and the use of person-to-person knowledge management techniques to facili...

  14. Walter Pater and the Language of Sculpture

    DEFF Research Database (Denmark)

    Østermark-Johansen, Lene

    Walter Pater and the Language of Sculpture is the first monograph to discuss the Victorian critic Walter Pater's attitude to sculpture. It brings together Pater's aesthetic theories with his theories on language and writing, to demonstrate how his ideas of the visual and written language ... the idea of rivalry (paragone) more broadly, examining Pater's concern with positioning himself as an art critic in the late Victorian art world. Situating Pater within centuries of European aesthetic theories as never before done, Walter Pater and the Language of Sculpture throws new light...

  15. Large-D gravity and low-D strings.

    Science.gov (United States)

    Emparan, Roberto; Grumiller, Daniel; Tanabe, Kentaro

    2013-06-21

    We show that in the limit of a large number of dimensions a wide class of nonextremal neutral black holes has a universal near-horizon limit. The limiting geometry is the two-dimensional black hole of string theory with a two-dimensional target space. Its conformal symmetry explains the properties of massless scalars found recently in the large-D limit. For black branes with string charges, the near-horizon geometry is that of the three-dimensional black strings of Horne and Horowitz. The analogies between the α' expansion in string theory and the large-D expansion in gravity suggest a possible effective string description of the large-D limit of black holes. We comment on applications to several subjects, in particular to the problem of critical collapse.

  16. Field Marshal Sir William J. Slim - Paragon of Moral and Ethical Courage

    National Research Council Canada - National Science Library

    Baylor, Richard

    1998-01-01

    ... during the most desperate and brutal times. This paper looks closely at Field Marshal Slim's ethical development and leadership during his younger years, his senior leader years, and his later years...

  17. Algunos aspectos sobre blockchains y smart contracts en educación superior

    OpenAIRE

    Amorós Poveda, Lucía

    2018-01-01

    The concepts of blockchains and smart contracts offer a sustainable alternative in higher education. With this aim, a review of both concepts and of their relation to the terms bitcoin, ledger, edublock and educoin is presented. Secondly, attention is given to higher-education networks based on blockchain technology, their link with smart contracts and the possibilities available today.

  18. Emotion Regulation Training for Treating Warfighters with Combat-Related PTSD Using Real-Time fMRI and EEG-Assisted Neurofeedback

    Science.gov (United States)

    2017-12-01

    BX equation to find B while minimizing the squared difference between Y and BX. We used the Intel Math Kernel Library (Intel® Math Kernel Library) ... virtually eliminate distance-dependent motion artifacts in resting-state fMRI. J. Appl. Math. 935154 (2013). Lowe, M.J., Mock, B.J., Sorenson, J.A., 1998 ... feature is IC energy. The energy of a discrete-time signal x is defined by $E_x = \sum_{n=-\infty}^{\infty} |x[n]|^2$ (2). Kurtosis can also be used for separating
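
    For the reconstructed energy feature above, a hedged C++ sketch computing E_x and kurtosis for a toy signal follows (not the report's pipeline, which operates on fMRI/EEG independent components):

        #include <cmath>
        #include <cstdio>
        #include <vector>

        // Energy E_x = sum_n |x[n]|^2 and kurtosis of a discrete-time signal,
        // two features mentioned for separating independent components.
        int main() {
            std::vector<double> x = {0.1, 1.5, -0.2, 0.3, -1.4, 0.2};  // toy signal
            const double n = static_cast<double>(x.size());

            double energy = 0.0, mean = 0.0;
            for (double v : x) { energy += v * v; mean += v; }
            mean /= n;

            double m2 = 0.0, m4 = 0.0;                 // central moments
            for (double v : x) {
                double d = v - mean;
                m2 += d * d; m4 += d * d * d * d;
            }
            m2 /= n; m4 /= n;
            double kurtosis = m4 / (m2 * m2);          // 3 for a Gaussian

            std::printf("energy = %f, kurtosis = %f\n", energy, kurtosis);
        }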

  19. Parallel algorithms for geometric connected component labeling on a hypercube multiprocessor

    Science.gov (United States)

    Belkhale, K. P.; Banerjee, P.

    1992-01-01

    Different algorithms for the geometric connected component labeling (GCCL) problem are defined, each of which involves d stages of message passing for a d-dimensional hypercube. The major idea is that in each stage a hypercube multiprocessor increases its knowledge of the domain. The algorithms under consideration include the QUAD algorithm for a small number of processors and the Overlap Quad algorithm for a large number of processors, subject to the locality of the connected sets. These algorithms differ in their run time, memory requirements, and message complexity. They were implemented on an Intel iPSC2/D4/MX hypercube.
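
    The d-stage message-passing structure is the part that generalizes: in stage k each node exchanges with the neighbor whose rank differs in bit k of its address. A hedged MPI sketch with a toy payload follows; the real algorithm merges connected-set data structures rather than integers.

        #include <cstdio>
        #include <mpi.h>

        // d-stage hypercube exchange: after d = log2(P) stages every node has
        // folded in data from the whole cube (here, a simple global sum).
        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);      // assume a power of two

            int local = rank + 1, remote = 0;          // stand-in for local labels
            for (int bit = 1; bit < size; bit <<= 1) { // one stage per dimension
                int partner = rank ^ bit;
                MPI_Sendrecv(&local, 1, MPI_INT, partner, 0,
                             &remote, 1, MPI_INT, partner, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                local += remote;                       // "merge" step of the stage
            }
            std::printf("rank %d: merged value %d\n", rank, local);
            MPI_Finalize();
        }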

  20. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    Science.gov (United States)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.

  1. GeantV: from CPU to accelerators

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Arora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Sehgal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are targeted first, resources such as GPGPUs, the Intel® Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof-of-concept GeantV prototype has been mainly engineered for CPUs with vector units, but we have foreseen from the early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach makes it possible to abstract out basic types such as scalar/vector, but also to formalize generic computation kernels transparently using library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it comes with the insulation of the core application and algorithms from the technology layer. This allows our application to be long-term maintainable and versatile to changes on the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel® Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs.
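
    The backend idea can be sketched in a few hedged lines of C++: kernels are templated on a backend type that supplies the basic types, so one kernel body serves scalar, SIMD or device builds. The backend struct and toy kernel below are illustrative, not GeantV code.

        #include <cstdio>

        // Sketch of a backend layer: kernels are written once against
        // Backend::Double and instantiated per backend. Real projects plug in
        // Vc/SIMD or CUDA types here; this scalar stub is illustrative.
        struct ScalarBackend {
            using Double = double;
            static constexpr int kSize = 1;            // vector width (1 = scalar)
        };

        template <typename Backend>
        typename Backend::Double EnergyLoss(typename Backend::Double p,
                                            typename Backend::Double mass) {
            // One generic kernel body serves every backend.
            return p * p / (2 * mass);                 // toy non-relativistic formula
        }

        int main() {
            double e = EnergyLoss<ScalarBackend>(3.0, 1.0);
            std::printf("E = %f\n", e);                // 4.5
        }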

  2. GeantV: from CPU to accelerators

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Arora, A; Apostolakis, J; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S; Lima, G; Duhem, L

    2016-01-01

    The GeantV project aims to research and develop the next-generation simulation software describing the passage of particles through matter. While modern CPU architectures are targeted first, resources such as GPGPUs, the Intel® Xeon Phi, Atom or ARM cannot be ignored anymore by HEP CPU-bound applications. The proof-of-concept GeantV prototype has been mainly engineered for CPUs with vector units, but we have foreseen from the early stages a bridge to arbitrary accelerators. A software layer consisting of architecture/technology-specific backends currently supports this concept. This approach makes it possible to abstract out basic types such as scalar/vector, but also to formalize generic computation kernels transparently using library- or device-specific constructs based on Vc, CUDA, Cilk+ or Intel intrinsics. While the main goal of this approach is portable performance, as a bonus it comes with the insulation of the core application and algorithms from the technology layer. This allows our application to be long-term maintainable and versatile to changes on the backend side. The paper presents the first results of basket-based GeantV geometry navigation on the Intel® Xeon Phi KNC architecture. We present the scalability and vectorization study, conducted using Intel performance tools, as well as our preliminary conclusions on the use of accelerators for GeantV transport. We also describe the current work and preliminary results for using the GeantV transport kernel on GPUs. (paper)

  3. Programación de Aplicaciones OpenCV sobre Sistemas Heterogéneos SoC-FPGA

    OpenAIRE

    Sanchis Cases, Francisco José

    2014-01-01

    OpenCV is a library of image-processing primitives that makes it possible to build state-of-the-art Computer Vision algorithms. OpenCV was originally developed by Intel in 1999 to showcase the processing power of Intel microprocessors, so most of the library is optimized to run on those processors, including the MMX and SSE extensions. http://en.wikipedia.org/wiki/OpenCV. It is currently widely used both by the scientific community and by p...

  4. High Performance Computing and Visualization Infrastructure for Simultaneous Parallel Computing and Parallel Visualization Research

    Science.gov (United States)

    2016-11-09

    (Only fragments of this report record are recoverable; the hardware configuration reads:) Broadcom 5720 QP 1Gb Network Daughter Card; (2) Intel Xeon E5-2680 v3 2.5GHz, 30M Cache, 9.60GT/s QPI, Turbo, HT, 12C/24T (120W...

  5. Anomaly detection in smart city wireless sensor networks

    OpenAIRE

    Garcia Font, Víctor

    2017-01-01

    This thesis proposes an intrusion detection platform to reveal attacks against the wireless sensor networks (WSNs) of smart cities. The platform is designed with the needs of smart city administrators in mind; they require access to a centralized architecture that can manage security alarms in a highly heterogeneous and distributed system. The thesis identifies the various p...

  6. Anomaly detection in smart city wireless sensor networks

    OpenAIRE

    García Font, Víctor

    2017-01-01

    This thesis proposes an intrusion detection platform to reveal attacks against the wireless sensor networks (WSNs) of smart cities. The platform is designed with the needs of smart city administrators in mind; they require access to a centralized architecture that can manage security alarms in a highly heterogeneous and distributed system. The thesis identifies the various p...

  7. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors

  8. Case for a field-programmable gate array multicore hybrid machine for an image-processing application

    Science.gov (United States)

    Rakvic, Ryan N.; Ives, Robert W.; Lira, Javier; Molina, Carlos

    2011-01-01

    General purpose computer designers have recently begun adding cores to their processors in order to increase performance. For example, Intel has adopted a homogeneous quad-core processor as a base for general purpose computing. PlayStation3 (PS3) game consoles contain a multicore heterogeneous processor known as the Cell, which is designed to perform complex image processing algorithms at a high level. Can modern image-processing algorithms utilize these additional cores? On the other hand, modern advancements in configurable hardware, most notably field-programmable gate arrays (FPGAs) have created an interesting question for general purpose computer designers. Is there a reason to combine FPGAs with multicore processors to create an FPGA multicore hybrid general purpose computer? Iris matching, a repeatedly executed portion of a modern iris-recognition algorithm, is parallelized on an Intel-based homogeneous multicore Xeon system, a heterogeneous multicore Cell system, and an FPGA multicore hybrid system. Surprisingly, the cheaper PS3 slightly outperforms the Intel-based multicore on a core-for-core basis. However, both multicore systems are beaten by the FPGA multicore hybrid system by >50%.

  9. Globe hosts launch of new processor

    CERN Multimedia

    2006-01-01

    Launch of the quadcore processor chip at the Globe. On 14 November, in a series of major media events around the world, the chip-maker Intel launched its new 'quadcore' processor. For the regions of Europe, the Middle East and Africa, the day-long launch event took place in CERN's Globe of Science and Innovation, with over 30 journalists in attendance, coming from as far away as Johannesburg and Dubai. CERN was a significant choice for the event: the first tests of this new generation of processor in Europe had been made at CERN over the preceding months, as part of CERN openlab, a research partnership with leading IT companies such as Intel, HP and Oracle. The event also provided the opportunity for the journalists to visit ATLAS and the CERN Computer Centre. The strategy of putting multiple processor cores on the same chip, which has been pursued by Intel and other chip-makers in the last few years, represents an important departure from the more traditional improvements in the sheer speed of such chips. ...

  10. A comparative critical analysis of modern task-parallel runtimes.

    Energy Technology Data Exchange (ETDEWEB)

    Wheeler, Kyle Bruce; Stark, Dylan; Murphy, Richard C.

    2012-12-01

    The rise in node-level parallelism has increased interest in task-based parallel runtimes for a wide array of application areas. Applications have a wide variety of task spawning patterns which frequently change during the course of application execution, based on the algorithm or solver kernel in use. Task scheduling and load balance regimes, however, are often highly optimized for specific patterns. This paper uses four basic task spawning patterns to quantify the impact of specific scheduling policy decisions on execution time. We compare the behavior of six publicly available tasking runtimes: Intel Cilk, Intel Threading Building Blocks (TBB), Intel OpenMP, GCC OpenMP, Qthreads, and High Performance ParalleX (HPX). With the exception of Qthreads, the runtimes prove to have schedulers that are highly sensitive to application structure. No runtime is able to provide the best performance in all cases, and those that do provide the best performance in some cases, unfortunately, provide extremely poor performance when application structure does not match the scheduler's assumptions.
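
    To make the notion of a task spawning pattern concrete, here is a hedged OpenMP sketch of one of the basic patterns such studies benchmark, recursive fork-join; the Fibonacci workload is a stand-in, and production code would add a sequential cutoff.

        #include <cstdio>

        // Recursive fork-join task spawning: each call spawns two child tasks and
        // waits for both. Compile with -fopenmp; numbers are toy.
        static long fib(int n) {
            if (n < 2) return n;
            long a, b;
            #pragma omp task shared(a)
            a = fib(n - 1);
            #pragma omp task shared(b)
            b = fib(n - 2);
            #pragma omp taskwait
            return a + b;
        }

        int main() {
            long r = 0;
            #pragma omp parallel
            #pragma omp single          // one thread seeds the task tree
            r = fib(20);
            std::printf("fib(20) = %ld\n", r);          // 6765
        }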

  11. Unstructured Computational Aerodynamics on Many Integrated Core Architecture

    KAUST Repository

    Al Farhan, Mohammed A.

    2016-06-08

    Shared memory parallelization of the flux kernel of PETSc-FUN3D, an unstructured tetrahedral mesh Euler flow code previously studied for distributed memory and multi-core shared memory, is evaluated on up to 61 cores per node and up to 4 threads per core. We explore several thread-level optimizations to improve flux kernel performance on the state-of-the-art many integrated core (MIC) Intel processor Xeon Phi “Knights Corner,” with a focus on strong thread scaling. While the linear algebraic kernel is bottlenecked by memory bandwidth for even modest numbers of cores sharing a common memory, the flux kernel, which arises in the control volume discretization of the conservation law residuals and in the formation of the preconditioner for the Jacobian by finite-differencing the conservation law residuals, is compute-intensive and is known to exploit effectively contemporary multi-core hardware. We extend study of the performance of the flux kernel to the Xeon Phi in three thread affinity modes, namely scatter, compact, and balanced, in both offload and native mode, with and without various code optimizations to improve alignment and reduce cache coherency penalties. Relative to baseline “out-of-the-box” optimized compilation, code restructuring optimizations provide about 3.8x speedup using the offload mode and about 5x speedup using the native mode. Even with these gains for the flux kernel, with respect to execution time the MIC simply achieves par with optimized compilation on a contemporary multi-core Intel CPU, the 16-core Sandy Bridge E5 2670. Nevertheless, the optimizations employed to reduce the data motion and cache coherency protocol penalties of the MIC are expected to be of value for CFD and many other unstructured applications as many-core architecture evolves. We explore large-scale distributed-shared memory performance on the Cray XC40 supercomputer, to demonstrate that optimizations employed on Phi hybridize to this context, where each of

  12. Unstructured Computational Aerodynamics on Many Integrated Core Architecture

    KAUST Repository

    Al Farhan, Mohammed A.; Kaushik, Dinesh K.; Keyes, David E.

    2016-01-01

    Shared memory parallelization of the flux kernel of PETSc-FUN3D, an unstructured tetrahedral mesh Euler flow code previously studied for distributed memory and multi-core shared memory, is evaluated on up to 61 cores per node and up to 4 threads per core. We explore several thread-level optimizations to improve flux kernel performance on the state-of-the-art many integrated core (MIC) Intel processor Xeon Phi “Knights Corner,” with a focus on strong thread scaling. While the linear algebraic kernel is bottlenecked by memory bandwidth for even modest numbers of cores sharing a common memory, the flux kernel, which arises in the control volume discretization of the conservation law residuals and in the formation of the preconditioner for the Jacobian by finite-differencing the conservation law residuals, is compute-intensive and is known to exploit effectively contemporary multi-core hardware. We extend study of the performance of the flux kernel to the Xeon Phi in three thread affinity modes, namely scatter, compact, and balanced, in both offload and native mode, with and without various code optimizations to improve alignment and reduce cache coherency penalties. Relative to baseline “out-of-the-box” optimized compilation, code restructuring optimizations provide about 3.8x speedup using the offload mode and about 5x speedup using the native mode. Even with these gains for the flux kernel, with respect to execution time the MIC simply achieves par with optimized compilation on a contemporary multi-core Intel CPU, the 16-core Sandy Bridge E5 2670. Nevertheless, the optimizations employed to reduce the data motion and cache coherency protocol penalties of the MIC are expected to be of value for CFD and many other unstructured applications as many-core architecture evolves. We explore large-scale distributed-shared memory performance on the Cray XC40 supercomputer, to demonstrate that optimizations employed on Phi hybridize to this context, where each of

  13. A distributed microcomputer-controlled system for data acquisition and power spectral analysis of EEG.

    Science.gov (United States)

    Vo, T D; Dwyer, G; Szeto, H H

    1986-04-01

    A relatively powerful and inexpensive microcomputer-based system for the spectral analysis of the EEG is presented. High resolution and speed are achieved through the use of recently available large-scale integrated circuit technology with enhanced functionality (the Intel 8087 math co-processor), which can perform transcendental functions rapidly. The versatility of the system is achieved with a hardware organization that has distributed data acquisition capability, performed by a microprocessor-based analog-to-digital converter with a large resident memory (Cyborg ISAAC-2000). Compiled BASIC programs and assembly-language subroutines perform, on-line or off-line, the fast Fourier transform and spectral analysis of the EEG, which is stored as soft as well as hard copy. Some results obtained from test application of the entire system in animal studies are presented.
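
    The quantity such a system computes can be shown with a hedged sketch: a direct O(N^2) DFT power spectrum in C++. The original used an FFT and the 8087 co-processor; the toy sinusoid below stands in for an EEG window.

        #include <cmath>
        #include <cstdio>
        #include <vector>

        // Power spectrum by direct DFT (O(N^2); an FFT computes the same thing
        // faster): P[k] = |sum_n x[n] e^{-2 pi i k n / N}|^2 / N.
        int main() {
            const int N = 64;
            std::vector<double> x(N);
            for (int n = 0; n < N; ++n)
                x[n] = std::sin(2 * M_PI * 8 * n / N);   // toy 8-cycles/window "EEG"

            for (int k = 0; k < N / 2; ++k) {
                double re = 0, im = 0;
                for (int n = 0; n < N; ++n) {
                    double w = 2 * M_PI * k * n / N;
                    re += x[n] * std::cos(w);
                    im -= x[n] * std::sin(w);
                }
                double p = (re * re + im * im) / N;
                if (p > 1e-6) std::printf("bin %d: power %f\n", k, p);  // peak at k=8
            }
        }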

  14. Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks

    Science.gov (United States)

    Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias

    2006-01-01

    The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray X1, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks is run on each of these systems. Additionally, we present Intel MPI Benchmarks (IMB) results to study the performance of 11 MPI communication functions on these systems.

  15. Optimization of the Brillouin operator on the KNL architecture

    Science.gov (United States)

    Dürr, Stephan

    2018-03-01

    Experiences with optimizing the matrix-times-vector application of the Brillouin operator on the Intel KNL processor are reported. Without adjustments to the memory layout, performance figures of 360 Gflop/s in single and 270 Gflop/s in double precision are observed. This is with Nc = 3 colors, Nv = 12 right-hand-sides, Nthr = 256 threads, on lattices of size 32³ × 64, using exclusively OMP pragmas. Interestingly, the same routine performs quite well on Intel Core i7 architectures, too. Some observations on the much harder Wilson fermion matrix-times-vector optimization problem are added.

  16. OntoWEDSS - An Ontology-based Environmental Decision-Support System for the management of Wastewater treatment plants

    OpenAIRE

    Ceccaroni, Luigi

    2001-01-01

    The contributions of this thesis unite two disciplines: environmental sciences (specifically, wastewater management) and computer science (specifically, artificial intelligence). Wastewater treatment as a discipline operates using a range of different approaches and methods that include: manual control, on-line automatic control, numerical or non-numerical modelling, statistical models and simulations. The thesis characterizes the interdisciplinary research on techniques of artificial intel...

  17. Formas contemporâneas de relação entre capital e tecnicidade : estudo sobre a gênese de microprocessadores de licença proprietária e livre

    OpenAIRE

    Stefano Schiavetto Amancio

    2014-01-01

    Abstract: The object of this dissertation is the study of the relation between technicity and capital, based on the concretization of the microprocessors of the Intel corporation in the period 1971-1999 and of the microprocessors of the free hardware community OpenCores in the period 1999-2013, and on how these have converted such technical objects into capital. It compares, on the one hand, how Intel has registered its microprocessors under proprietary licenses and invested in an international industry to commerci...

  18. Appropriating A Female Voice: Nicholas Breton And The Countess Of Pembroke

    Directory of Open Access Journals (Sweden)

    DASCĂL REGHINA

    2014-12-01

    Full Text Available The sixteenth century author Nicholas Breton appropriates a female voice in many of his writings, among which Marie Magdalens Loue and The Pilgrimage to Paradise joyned with the Countesse of Penbrookes Loue feature prominently. The Countess of Pembroke, celebrated by Aemilia Lanyer in her Salve Deus Rex Judaeorum as a paragon of female religious devotion, is often associated in Breton's texts with Mary Magdalene. This paper will analyse some of the anxieties engendered by this appropriation of voice and of the Magdalene figure, anxieties that prove to be disruptive of Elizabethan gender hierarchies.

  19. Walter Pater and the Language of Sculpture

    DEFF Research Database (Denmark)

    Østermark-Johansen, Lene

    are closely linked. Going beyond Pater's views on sculpture as an art form, this study traces the notion of relief (rilievo) and hybrid form in Pater, and his view of the writer as sculptor, a carver in language. Alongside her treatment of rilievo as a pervasive trope, Lene Østermark-Johansen also employs ... the idea of rivalry (paragone) more broadly, examining Pater's concern with positioning himself as an art critic in the late Victorian art world. Situating Pater within centuries of European aesthetic theories as never before done, Walter Pater and the Language of Sculpture throws new light...

  20. The Strange Birth of Liberal Denmark

    DEFF Research Database (Denmark)

    Henriksen, Ingrid; Lampe, Markus; Sharp, Paul Richard

    The usual story of the "first era of globalization" at the end of the nineteenth century sees Denmark as something of an outlier: a country which, like Britain, resisted the globalization backlash in the wake of the inflow of cheap grain from the New World, but where agriculture, rather than going ... into decline, in fact flourished. Key to the success of Danish agriculture was an early diversification towards dairy production. We dispute this simple story, which sees Denmark as something of a liberal paragon. Denmark's success owed much to a prudent use of trade policy which favoured dairy production...

  1. Wafer of Intel Pentium 4 Prescott Chips

    CERN Multimedia

    Silicon wafer with hundreds of Penryn cores (microprocessors). Around four times as many Prescott chips can be made per wafer as with the previous generation of Northwood-core Pentium 4 processors. The new chip is faster and cheaper.

  2. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    Science.gov (United States)

    Manolakos, Elias S.

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332

  3. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    Science.gov (United States)

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  4. Large deviations

    CERN Document Server

    Varadhan, S R S

    2016-01-01

    The theory of large deviations deals with rates at which probabilities of certain events decay as a natural parameter in the problem varies. This book, which is based on a graduate course on large deviations at the Courant Institute, focuses on three concrete sets of examples: (i) diffusions with small noise and the exit problem, (ii) large time behavior of Markov processes and their connection to the Feynman-Kac formula and the related large deviation behavior of the number of distinct sites visited by a random walk, and (iii) interacting particle systems, their scaling limits, and large deviations from their expected limits. For the most part the examples are worked out in detail, and in the process the subject of large deviations is developed. The book will give the reader a flavor of how large deviation theory can help in problems that are not posed directly in terms of large deviations. The reader is assumed to have some familiarity with probability, Markov processes, and interacting particle systems.

  5. Multimicroprocessor system for high-energy physics experiment applications

    International Nuclear Information System (INIS)

    Piska, K.; Falkenberg, W.; Glasneck, C.P.; Pflugbeil, W.

    1982-01-01

    An autonomous modular multicomputer system based on the INTEL 8080 for program development and for application to the high-energy physics experiment 'RISK' is presented. The associated microcomputers (a three-processor configuration is realized) with uniform software systems can perform, in parallel, the interactively-controlled processing and monitoring of data accessible in the common memory block coupled to the processors via the direct shared bus. Data are acquired into the common memory buffer by the main processor, which is linked by the CAMAC interface with the experimental apparatus and optionally with a large-size computer. One microcomputer can be connected with the magnetic tape unit used for data recording. (orig.)

  6. A new nonlinear conjugate gradient coefficient under strong Wolfe-Powell line search

    Science.gov (United States)

    Mohamed, Nur Syarafina; Mamat, Mustafa; Rivaie, Mohd

    2017-08-01

    A nonlinear conjugate gradient (CG) method plays an important role in solving large-scale unconstrained optimization problems. The method is widely used due to its simplicity, and it is known to possess the sufficient descent condition and global convergence properties. In this paper, a new nonlinear CG coefficient βk is presented, employing the strong Wolfe-Powell inexact line search. The performance of the new βk is tested in terms of the number of iterations and central processing unit (CPU) time, using MATLAB software on an Intel Core i7-3470 CPU. Numerical experimental results show that the new βk converges rapidly compared with other classical CG methods.
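
    Since the paper's new βk expression is not quoted here, the following C++ sketch shows only the surrounding machinery: nonlinear CG with a bisection line search enforcing the strong Wolfe-Powell conditions. Fletcher-Reeves is used as a stand-in coefficient and the quadratic objective is a toy assumption.

        #include <cmath>
        #include <cstdio>
        #include <limits>

        // Nonlinear CG on f(x,y) = (x-3)^2 + 10*(y+1)^2. A new beta_k such as the
        // paper's would replace the Fletcher-Reeves formula below.
        static double f(const double x[2]) {
            return (x[0] - 3) * (x[0] - 3) + 10 * (x[1] + 1) * (x[1] + 1);
        }
        static void grad(const double x[2], double g[2]) {
            g[0] = 2 * (x[0] - 3);
            g[1] = 20 * (x[1] + 1);
        }

        int main() {
            const double c1 = 1e-4, c2 = 0.1;           // strong Wolfe constants
            double x[2] = {0, 0}, g[2], d[2], gn[2], xn[2];
            grad(x, g);
            d[0] = -g[0]; d[1] = -g[1];

            for (int it = 0; it < 100; ++it) {
                double gg = g[0] * g[0] + g[1] * g[1];
                if (gg < 1e-18) break;
                double dg = d[0] * g[0] + d[1] * g[1];   // slope at a = 0 (negative)

                double a = 1, lo = 0, hi = std::numeric_limits<double>::infinity();
                for (int ls = 0; ls < 60; ++ls) {        // bisection Wolfe search
                    xn[0] = x[0] + a * d[0]; xn[1] = x[1] + a * d[1];
                    grad(xn, gn);
                    double slope = d[0] * gn[0] + d[1] * gn[1];
                    if (f(xn) > f(x) + c1 * a * dg) hi = a;    // decrease fails
                    else if (slope < c2 * dg)       lo = a;    // step too short
                    else if (slope > -c2 * dg)      hi = a;    // step too long
                    else break;                                // strong Wolfe holds
                    a = std::isinf(hi) ? 2 * a : 0.5 * (lo + hi);
                }
                double beta = (gn[0] * gn[0] + gn[1] * gn[1]) / gg; // Fletcher-Reeves
                x[0] = xn[0]; x[1] = xn[1];
                g[0] = gn[0]; g[1] = gn[1];
                d[0] = -g[0] + beta * d[0];
                d[1] = -g[1] + beta * d[1];
            }
            std::printf("minimizer ~ (%f, %f)\n", x[0], x[1]);     // expect (3, -1)
        }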

  7. Accelerating Climate and Weather Simulations through Hybrid Computing

    Science.gov (United States)

    Zhou, Shujia; Cruz, Carlos; Duffy, Daniel; Tucker, Robert; Purcell, Mark

    2011-01-01

    Unconventional multi- and many-core processors (e.g. IBM (R) Cell B.E.(TM) and NVIDIA (R) GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g. Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization (TM) (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand(R) (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead.

  8. Cross-categorization of legal concepts across boundaries of legal systems: in consideration of inferential links

    DEFF Research Database (Denmark)

    Glückstad, Fumiko Kano; Herlau, Tue; Schmidt, Mikkel Nørgaard

    2014-01-01

    This work contrasts Giovanni Sartor’s view of inferential semantics of legal concepts (Sartor in Artif Intell Law 17:217–251, 2009) with a probabilistic model of theory formation (Kemp et al. in Cognition 114:165–196, 2010). The work further explores possibilities of implementing Kemp’s probabili... and Griffiths in Behav Brain Sci 4:629–640, 2001), the probabilistic model of theory formation, i.e., the Infinite Relational Model (IRM) first introduced by Kemp et al. (The twenty-first national conference on artificial intelligence, 2006, Cognition 114:165–196, 2010) and its extended model, i.e., the normal ... to the International Standard Classification of Education. The main contribution of this work is the proposal of a conceptual framework of the cross-categorization approach that, inspired by Sartor (Artif Intell Law 17:217–251, 2009), attempts to explain reasoner’s inferential mechanisms....

  9. Time-domain seismic modeling in viscoelastic media for full waveform inversion on heterogeneous computing platforms with OpenCL

    Science.gov (United States)

    Fabien-Ouellet, Gabriel; Gloaguen, Erwan; Giroux, Bernard

    2017-03-01

    Full Waveform Inversion (FWI) aims at recovering the elastic parameters of the Earth by matching recordings of the ground motion with the direct solution of the wave equation. Modeling the wave propagation for realistic scenarios is computationally intensive, which limits the applicability of FWI. The current hardware evolution brings increasing parallel computing power that can speed up the computations in FWI. However, to take advantage of the diversity of parallel architectures presently available, new programming approaches are required. In this work, we explore the use of OpenCL to develop a portable code that can take advantage of the many parallel processor architectures now available. We present a program called SeisCL for 2D and 3D viscoelastic FWI in the time domain. The code computes the forward and adjoint wavefields using finite differences and outputs the gradient of the misfit function given by the adjoint state method. To demonstrate the code portability on different architectures, the performance of SeisCL is tested on three different devices: Intel CPUs, NVidia GPUs and the Intel Xeon Phi. Results show that the use of GPUs with OpenCL can speed up the computations by nearly two orders of magnitude over a single-threaded application on the CPU. Although OpenCL allows code portability, we show that some device-specific optimization is still required to get the best performance out of a specific architecture. Using OpenCL in conjunction with MPI allows the domain decomposition of large models on several devices located on different nodes of a cluster. For large enough models, the speedup of the domain decomposition varies quasi-linearly with the number of devices. Finally, we investigate two different approaches to compute the gradient by the adjoint state method and show the significant advantages of using OpenCL for FWI.
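
    The computational core such codes launch as OpenCL kernels is a per-grid-point stencil update. As a hedged stand-in (1D, acoustic and serial, rather than the paper's 2D/3D viscoelastic OpenCL version), one finite-difference time step looks like this in C++:

        #include <cstdio>
        #include <utility>
        #include <vector>

        // Second-order finite-difference time stepping of the 1D wave equation.
        // The inner loop body is exactly the work a GPU kernel applies per point.
        int main() {
            const int N = 400, STEPS = 500;
            const double c = 1.0, dx = 1.0, dt = 0.5;     // obeys CFL: c*dt/dx <= 1
            std::vector<double> prev(N, 0.0), cur(N, 0.0), next(N, 0.0);
            cur[N / 2] = 1.0;                             // point source at t = 0

            const double r2 = (c * dt / dx) * (c * dt / dx);
            for (int t = 0; t < STEPS; ++t) {
                for (int i = 1; i < N - 1; ++i)           // the stencil
                    next[i] = 2 * cur[i] - prev[i]
                            + r2 * (cur[i + 1] - 2 * cur[i] + cur[i - 1]);
                std::swap(prev, cur);
                std::swap(cur, next);
            }
            std::printf("u[N/2] after %d steps: %f\n", STEPS, cur[N / 2]);
        }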

  10. Le tableau vivant chez Raoul Ruiz : l’extension de la perception

    OpenAIRE

    Robert, Valentine

    2016-01-01

    Le tableau vivant (qui avait cours au XIXe siècle et qui consiste à faire incarner des compositions célèbres par des figurants immobiles, tenant la pose) est travaillé en motif par le cinéma de Raoul Ruiz. Chacune de ses ressources esthétiques y est explorée: sa valeur de "simulacre", de "paragone", de "réincarnation", mais aussi et surtout de dispositif de regard. Cette dernière dimension est au centre de cet article, qui étudie – surtout à partir de L'Hypothèse du tableau volé (1979), de Gé...

  11. Humans, elephants, diamonds and gold: patterns of intentional design in Girolamo Cardano's natural philosophy.

    Science.gov (United States)

    Giglioni, Guido

    2014-01-01

    Distancing himself from both Aristotelian and Epicurean models of natural change, and resisting delusions of anthropocentric grandeur, Cardano advanced a theory of teleology centred on the notion of non-human selfhood. In keeping with Plato, he argued that nature was ruled by the mind, meaning by "mind" a universal paragon of intelligibility instantiated through patterns of purposive action ("noetic" teleology). This allowed Cardano to defend a theory of natural finalism in which life was regarded as a primordial attribute of being, already in evidence in the most elementary forms of nature, whose main categories were ability to feign, self-interest, self-preservation and indefinite persistence.

  12. Large Neighborhood Search

    DEFF Research Database (Denmark)

    Pisinger, David; Røpke, Stefan

    2010-01-01

    Heuristics based on large neighborhood search have recently shown outstanding results in solving various transportation and scheduling problems. Large neighborhood search methods explore a complex neighborhood by use of heuristics. Using large neighborhoods makes it possible to find better candidate solutions in each iteration and hence traverse a more promising search path. Starting from the large neighborhood search method, we give an overview of very large scale neighborhood search methods and discuss recent variants and extensions like variable depth search and adaptive large neighborhood...
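
    A hedged skeleton of the destroy-and-repair loop that defines large neighborhood search follows; the toy objective (ordering integers to minimize adjacent differences), the segment destroy and the greedy repair are placeholders for problem-specific moves.

        #include <climits>
        #include <cstdio>
        #include <cstdlib>
        #include <random>
        #include <vector>

        // Large neighborhood search: destroy part of the incumbent, repair it
        // heuristically, keep improvements. Toy problem and moves throughout.
        int main() {
            std::mt19937 rng(42);
            std::vector<int> sol = {7, 2, 9, 4, 1, 8, 3, 6, 5, 0};

            auto cost = [](const std::vector<int>& s) {
                int c = 0;
                for (size_t i = 1; i < s.size(); ++i) c += std::abs(s[i] - s[i - 1]);
                return c;
            };

            int best = cost(sol);
            for (int it = 0; it < 2000; ++it) {
                std::vector<int> cand = sol;
                // Destroy: remove a random 3-element segment (a "large" move).
                std::uniform_int_distribution<size_t> pos(0, cand.size() - 3);
                size_t a = pos(rng);
                std::vector<int> removed(cand.begin() + a, cand.begin() + a + 3);
                cand.erase(cand.begin() + a, cand.begin() + a + 3);
                // Repair: greedily reinsert each element at its best position.
                for (int v : removed) {
                    size_t bestPos = 0; int bestC = INT_MAX;
                    for (size_t p = 0; p <= cand.size(); ++p) {
                        cand.insert(cand.begin() + p, v);
                        int c = cost(cand);
                        if (c < bestC) { bestC = c; bestPos = p; }
                        cand.erase(cand.begin() + p);
                    }
                    cand.insert(cand.begin() + bestPos, v);
                }
                if (cost(cand) < best) { best = cost(cand); sol = cand; }  // accept
            }
            std::printf("best cost: %d\n", best);         // optimum is 9 (sorted)
        }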

  13. CPU and GPU (Cuda Template Matching Comparison

    Directory of Open Access Journals (Sweden)

    Evaldas Borcovas

    2014-05-01

    Full Text Available Image processing, computer vision and other complicated optical information processing algorithms require large resources. It is often desired to execute algorithms in real time, and it is hard to fulfill such requirements with a single CPU processor. NVidia's CUDA technology enables the programmer to use the GPU resources in the computer. The current research was made with an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB RAM DDR3 and an NVidia GeForce GT320M CUDA-compatible graphics card (CPU I / GPU I), and an Intel Core i5-2500K 3.3 GHz processor with 4 GB RAM DDR3 and an NVidia GeForce GTX 560 CUDA-compatible graphics card (CPU II / GPU II). Additional libraries, OpenCV 2.1 and the CUDA-enabled OpenCV 2.4.0, were used for the testing. The main tests were made with the standard function MatchTemplate from the OpenCV libraries. The algorithm uses a main image and a template, and the influence of these factors was tested: the main image and template were resized, and the algorithm's computing time and performance in Gtpix/s were measured. According to the information obtained from the research, GPU computing using the hardware mentioned earlier is up to 24 times faster when processing a big amount of information. When the images are small, the performance of the CPU and GPU is not significantly different. The choice of template size influences the computation on the CPU. The difference in computing time between the GPUs can be explained by the number of cores they have.
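
    For reference, a minimal hedged C++ use of the benchmarked OpenCV routine follows; the file names are placeholders, and the headers follow the modern OpenCV 3/4 layout rather than the OpenCV 2.x versions used in the study.

        #include <cstdio>
        #include <opencv2/imgcodecs.hpp>
        #include <opencv2/imgproc.hpp>

        // Slide the template over the image, score every position with
        // matchTemplate, and take the best-scoring location.
        int main() {
            cv::Mat img = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);
            cv::Mat tpl = cv::imread("template.png", cv::IMREAD_GRAYSCALE);
            if (img.empty() || tpl.empty()) { std::puts("missing input images"); return 1; }

            cv::Mat result;                                  // (W-w+1) x (H-h+1) scores
            cv::matchTemplate(img, tpl, result, cv::TM_CCOEFF_NORMED);

            double minV, maxV; cv::Point minLoc, maxLoc;
            cv::minMaxLoc(result, &minV, &maxV, &minLoc, &maxLoc);
            std::printf("best match at (%d, %d), score %f\n", maxLoc.x, maxLoc.y, maxV);
        }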

  14. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Science.gov (United States)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2017-08-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  15. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Directory of Open Access Journals (Sweden)

    Cerati Giuseppe

    2017-01-01

    Full Text Available For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  16. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Cerati, Giuseppe [Fermilab; Elmer, Peter [Princeton U.; Krutelyov, Slava [UC, San Diego; Lantz, Steven [Cornell U.; Lefebvre, Matthieu [Princeton U.; Masciovecchio, Mario [UC, San Diego; McDermott, Kevin [Cornell U.; Riley, Daniel [Cornell U., LNS; Tadel, Matevž [UC, San Diego; Wittich, Peter [Cornell U.; Würthwein, Frank [UC, San Diego; Yagil, Avi [UC, San Diego

    2017-01-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  17. Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem

    KAUST Repository

    Haidar, Azzam

    2012-01-01

    Classical solvers for the dense symmetric eigenvalue problem suffer from the first step, which involves a reduction to tridiagonal form that is dominated by the cost of accessing memory during the panel factorization. The solution is to reduce the matrix to a banded form, which then requires the eigenvalues of the banded matrix to be computed. The standard divide and conquer algorithm can be modified for this purpose. The paper combines this insight with tile algorithms that can be scheduled via a dynamic runtime system to multicore architectures. A detailed analysis of performance and accuracy is included. Performance improvements of 14-fold and 4-fold speedups are reported relative to LAPACK and Intel's Math Kernel Library.
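    For readers who want to experiment with the banded intermediate form, the sketch below builds a random symmetric banded matrix and checks SciPy's banded eigensolver against a dense reference. It illustrates only the "eigenvalues of the banded matrix" step, not the paper's tile algorithms or runtime scheduling.

```python
import numpy as np
from scipy.linalg import eig_banded

rng = np.random.default_rng(0)
n, b = 600, 4                          # matrix size and semi-bandwidth

# Build a random symmetric banded matrix (the intermediate form a
# two-stage reduction produces before the divide-and-conquer phase).
A = np.zeros((n, n))
for k in range(b + 1):
    d = rng.standard_normal(n - k)
    A += np.diag(d, k) + (np.diag(d, -k) if k else 0)

# Pack the upper bands into LAPACK-style banded storage for eig_banded:
# row b - k holds the k-th superdiagonal, right-aligned.
ab = np.zeros((b + 1, n))
for k in range(b + 1):
    ab[b - k, k:] = np.diag(A, k)

w_banded = eig_banded(ab, eigvals_only=True)   # banded solver
w_dense = np.linalg.eigvalsh(A)                # dense reference
print(np.max(np.abs(w_banded - w_dense)))      # ~1e-13: they agree
```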

  18. Running Boot Camp

    CERN Document Server

    Toporek, Chuck

    2008-01-01

    When Steve Jobs jumped on stage at Macworld San Francisco 2006 and announced the new Intel-based Macs, the question wasn't if, but when someone would figure out a hack to get Windows XP running on these new "Mactels." Enter Boot Camp, a new system utility that helps you partition and install Windows XP on your Intel Mac. Boot Camp does all the heavy lifting for you. You won't need to open the Terminal and hack on system files or wave a chicken bone over your iMac to get XP running. This free program makes it easy for anyone to turn their Mac into a dual-boot Windows/OS X machine.

  19. MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    Science.gov (United States)

    DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug

    2018-03-01

    With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.
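    The staggered conjugate gradient named above is, at its core, the textbook CG iteration. A minimal NumPy version on a stand-in SPD system (a 1D Laplacian, not the staggered Dirac operator the MILC code actually inverts) looks like this:

```python
import numpy as np

def cg(A, b, tol=1e-8, max_iter=1000):
    """Conjugate gradient for a symmetric positive-definite A."""
    x = np.zeros_like(b)
    r = b - A @ x                # residual
    p = r.copy()                 # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Small SPD test problem: a 1D Laplacian, a loose stand-in for the
# operator the lattice solver inverts.
n = 100
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = cg(A, b)
print(np.linalg.norm(A @ x - b))   # ~1e-8
```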

  20. MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    Directory of Open Access Journals (Sweden)

    DeTar Carleton

    2018-01-01

    Full Text Available With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

  1. Computing on Knights and Kepler Architectures

    International Nuclear Information System (INIS)

    Bortolotti, G; Caberletti, M; Ferraro, A; Giacomini, F; Manzali, M; Maron, G; Salomoni, D; Crimi, G; Zanella, M

    2014-01-01

    A recent trend in scientific computing is the increasingly important role of co-processors, originally built to accelerate graphics rendering, and now used for general high-performance computing. The INFN Computing On Knights and Kepler Architectures (COKA) project focuses on assessing the suitability of co-processor boards for scientific computing in a wide range of physics applications, and on studying the best programming methodologies for these systems. Here we present in a comparative way our results in porting a Lattice Boltzmann code on two state-of-the-art accelerators: the NVIDIA K20X, and the Intel Xeon-Phi. We describe our implementations, analyze results and compare with a baseline architecture adopting Intel Sandy Bridge CPUs.

  2. Applications and development of communication models for the touchstone GAMMA and DELTA prototypes

    Science.gov (United States)

    Seidel, Steven R.

    1993-01-01

    The goal of this project was to develop models of the interconnection networks of the Intel iPSC/860 and DELTA multicomputers to guide the design of efficient algorithms for interprocessor communication in problems that commonly occur in CFD codes and other applications. Interprocessor communication costs of codes for message-passing architectures such as the iPSC/860 and DELTA significantly affect the level of performance that can be obtained from those machines. This project addressed several specific problems in the achievement of efficient communication on the Intel iPSC/860 hypercube and DELTA mesh. In particular, an efficient global processor synchronization algorithm was developed for the iPSC/860 and numerous broadcast algorithms were designed for the DELTA.
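    A broadcast on a hypercube is typically organized as recursive doubling: in step k, every node that already holds the message forwards it across dimension k, so 2^d nodes are covered in d steps. The sketch below only derives that schedule; node ids and the schedule format are illustrative, not the algorithms actually designed for the DELTA.

```python
# Minimal sketch (assumption: node ids are 0 .. 2**d - 1 on a
# d-dimensional hypercube; dimension k connects nodes differing
# in bit k of their id).
def broadcast_schedule(d, root=0):
    """Recursive-doubling broadcast: in step k, every node that already
    holds the message sends it across hypercube dimension k."""
    steps = []
    have = {root}
    for k in range(d):
        sends = [(src, src ^ (1 << k)) for src in sorted(have)]
        have |= {dst for _, dst in sends}
        steps.append(sends)
    return steps

for k, sends in enumerate(broadcast_schedule(3)):
    print(f"step {k}: {sends}")
# step 0: [(0, 1)]
# step 1: [(0, 2), (1, 3)]
# step 2: [(0, 4), (1, 5), (2, 6), (3, 7)]
```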

  3. Experiences and results multitasking a hydrodynamics code on global and local memory machines

    International Nuclear Information System (INIS)

    Mandell, D.

    1987-01-01

    A one-dimensional, time-dependent Lagrangian hydrodynamics code using a Godunov solution method has been multitasked for the Cray X-MP/48, the Intel iPSC hypercube, the Alliant FX series and the IBM RP3 computers. Actual multitasking results have been obtained for the Cray, Intel and Alliant computers, and simulated results were obtained for the Cray and RP3 machines. The differences in the methods required to multitask on each of the machines are discussed. Results are presented for a sample problem involving a shock wave moving down a channel. Comparisons are made between theoretical speedups, predicted by Amdahl's law, and the actual speedups obtained. The problems of debugging on the different machines are also described

  4. SPP: A data base processor data communications protocol

    Science.gov (United States)

    Fishwick, P. A.

    1983-01-01

    The design and implementation of a data communications protocol for the Intel Data Base Processor (DBP) is defined. The protocol is termed SPP (Service Port Protocol) since it enables data transfer between the host computer and the DBP service port. The protocol implementation is extensible in that it is explicitly layered and the protocol functionality is hierarchically organized. Extensive trace and performance capabilities have been supplied with the protocol software to permit optional efficient monitoring of the data transfer between the host and the Intel data base processor. Machine independence was considered to be an important attribute during the design and implementation of SPP. The protocol source is fully commented and is included in Appendix A of this report.

  5. Accelerating Twisted Mass LQCD with QPhiX

    Energy Technology Data Exchange (ETDEWEB)

    Schröck, Mario [INFN, Rome3; Simula, Silvano [INFN, Rome3; Strelchenko, Alexei [Fermilab

    2016-07-08

    We present the implementation of twisted mass fermion operators for the QPhiX library. We analyze the performance on the Intel Xeon Phi (Knights Corner) coprocessor as well as on Intel Xeon Haswell CPUs. In particular, we demonstrate that on the Xeon Phi 7120P the Dslash kernel is able to reach 80% of the theoretical peak bandwidth, while on a Xeon Haswell E5-2630 CPU our generated code for the Dslash operator with AVX2 instructions outperforms the corresponding implementation in the tmLQCD library by a factor of ~5× in single precision. We strong-scale the code up to 6.8 (14.1) Tflops in single (half) precision on 64 Xeon Haswell CPUs.

  6. Navier-Stokes Simulation of Airconditioning Facility of a Large Modern Computer Room

    Science.gov (United States)

    2005-01-01

    NASA recently assembled one of the world's fastest operational supercomputers to meet the agency's new high performance computing needs. This large-scale system, named Columbia, consists of 20 interconnected SGI Altix 512-processor systems, for a total of 10,240 Intel Itanium-2 processors. High-fidelity CFD simulations were performed for the NASA Advanced Supercomputing (NAS) computer room at Ames Research Center. The purpose of the simulations was to assess the adequacy of the existing air handling and conditioning system and make recommendations for changes in the design of the system if needed. The simulations were performed with NASA's OVERFLOW-2 CFD code which utilizes overset structured grids. A new set of boundary conditions was developed and added to the flow solver for modeling the room's air-conditioning and proper cooling of the equipment. Boundary condition parameters for the flow solver are based on cooler CFM (flow rate) ratings and some reasonable assumptions of flow and heat transfer data for the floor and central processing units (CPUs). The geometry modeling from blueprints and grid generation were handled by the NASA Ames software package Chimera Grid Tools (CGT). This geometric model was developed as a CGT-scripted template, which can be easily modified to accommodate any changes in shape and size of the room, locations and dimensions of the CPU racks, disk racks, coolers, power distribution units, and mass-storage system. The compute nodes are grouped in pairs of racks with an aisle in the middle. High-speed connection cables connect the racks with overhead cable trays. The cool air from the cooling units is pumped into the computer room from a sub-floor through perforated floor tiles. The CPU cooling fans draw cool air from the floor tiles, which run along the outside length of each rack, and eject warm air into the center aisle between the racks. This warm air is eventually drawn into the cooling units located near the walls of the room.

  7. Large scale electrolysers

    International Nuclear Information System (INIS)

    B Bello; M Junker

    2006-01-01

    Hydrogen production by water electrolysis represents nearly 4 % of the world hydrogen production. Future development of hydrogen vehicles will require large quantities of hydrogen. Installation of large scale hydrogen production plants will be needed. In this context, development of low cost large scale electrolysers that could use 'clean power' seems necessary. ALPHEA HYDROGEN, a European network and center of expertise on hydrogen and fuel cells, performed a study for its members in 2005 to evaluate the potential of large scale electrolysers to produce hydrogen in the future. The different electrolysis technologies were compared. Then, a state-of-the-art survey of the electrolysis modules currently available was made. A review of the large scale electrolysis plants that have been installed in the world was also carried out. The main projects related to large scale electrolysis were also listed. The economics of large scale electrolysers are discussed. The influence of energy prices on the hydrogen production cost by large scale electrolysis was evaluated. (authors)

  8. Parallel discrete ordinates algorithms on distributed and common memory systems

    International Nuclear Information System (INIS)

    Wienke, B.R.; Hiromoto, R.E.; Brickner, R.G.

    1987-01-01

    The S_n algorithm employs iterative techniques in solving the linear Boltzmann equation. These methods, both ordered and chaotic, were compared on both the Denelcor HEP and the Intel hypercube. Strategies are linked to the organization and accessibility of memory (common memory versus distributed memory architectures), with common concern for acquisition of global information. Apart from this, the inherent parallelism of the algorithm maps directly onto the two architectures. Results comparing execution times, speedup, and efficiency are based on a representative 16-group (full upscatter and downscatter) sample problem. Calculations were performed on both the Los Alamos National Laboratory (LANL) Denelcor HEP and the LANL Intel hypercube. The Denelcor HEP is a 64-bit multiple-instruction, multiple-data (MIMD) machine consisting of up to 16 process execution modules (PEMs), each capable of executing 64 processes concurrently. Each PEM can cooperate on a job, or run several unrelated jobs, and share a common global memory through a crossbar switch. The Intel hypercube, on the other hand, is a distributed memory system composed of 128 processing elements, each with its own local memory. Processing elements are connected in a nearest-neighbor hypercube configuration and sharing of data among processors requires execution of explicit message-passing constructs

  9. Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates

    KAUST Repository

    Malas, T.

    2015-07-02

    The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work we combine the ideas of multicore wavefront temporal blocking and diamond tiling to arrive at stencil update schemes that show large reductions in memory pressure compared to existing approaches. The resulting schemes show performance advantages in bandwidth-starved situations, which are exacerbated by the high bytes per lattice update case of variable coefficients. Our thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the CPU. We present performance results on a contemporary Intel processor.
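    To make the temporal-blocking idea concrete, the sketch below advances a 1D 3-point Jacobi stencil several time steps per block using overlapping halos. This is the simpler trapezoid variant rather than the paper's diamond tiles or thread groups, but the cache-locality principle, doing several time steps while a block is resident, is the same.

```python
import numpy as np

def sweep(u):
    """One Jacobi step of a 3-point stencil with fixed endpoints."""
    v = u.copy()
    v[1:-1] = 0.5 * (u[:-2] + u[2:])
    return v

def naive(u, T):
    for _ in range(T):
        u = sweep(u)
    return u

def temporally_blocked(u, T, block=64):
    """Advance T steps block by block: each block is loaded with a halo
    of width T, advanced T steps while (notionally) cache-resident, and
    its interior written back. This is the overlapping-trapezoid scheme;
    diamond tiling removes the redundant halo work."""
    n = len(u)
    out = u.copy()
    for start in range(0, n, block):
        lo, hi = max(0, start - T), min(n, start + block + T)
        tile = u[lo:hi].copy()
        for _ in range(T):
            tile = sweep(tile)
        end = min(start + block, n)
        out[start:end] = tile[start - lo:end - lo]
    return out

rng = np.random.default_rng(1)
u0 = rng.standard_normal(1000)
# The blocked traversal reproduces the naive sweeps exactly.
print(np.max(np.abs(naive(u0, 8) - temporally_blocked(u0, 8))))  # ~0.0
```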

  10. Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates

    KAUST Repository

    Malas, T.; Hager, G.; Ltaief, Hatem; Stengel, H.; Wellein, G.; Keyes, David E.

    2015-01-01

    The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work we combine the ideas of multicore wavefront temporal blocking and diamond tiling to arrive at stencil update schemes that show large reductions in memory pressure compared to existing approaches. The resulting schemes show performance advantages in bandwidth-starved situations, which are exacerbated by the high bytes per lattice update case of variable coefficients. Our thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the CPU. We present performance results on a contemporary Intel processor.

  11. Control and simulation tools (CAST) for the PC modelers

    International Nuclear Information System (INIS)

    Chan, K.S.; Lea, K.C.

    1990-01-01

    For the last couple of years, personal computer technology has received a steady stream of improvements: CPU processing power, fast floating-point coprocessors, super graphics, and very large, fast hard drives. Since Intel began shipping its 80386 CPU, it became practical to develop or execute a substantial amount of power plant software on a personal computer. With the introduction of the RISC-type personal workstation, complete simulators based on these new generations of computers will soon become a reality. As of today, although almost anybody can afford a personal computer, simulation support software is still rare or non-existent. CAST has been designed to support those users who want to develop or debug large power plant simulation programs on a personal computer or workstation. Another separate paper will also be presented to demonstrate the real world development and debugging tools offered by CAST

  12. Case Summary: Settlement Reached at Middlefield-Ellis-Whisman (MEW) Study Area to Address TCE Contamination

    Science.gov (United States)

    Case summary of the first amended consent decree with Intel Corporation and Raytheon Company to address trichloroethylene (TCE) contamination in residential and commercial buildings in Mountain View, California

  13. Real-time operating system for selected Intel processors

    Science.gov (United States)

    Pool, W. R.

    1980-01-01

    The rationale for system development is given along with reasons for not using vendor supplied operating systems. Although many system design and performance goals were dictated by problems with vendor supplied systems, other goals surfaced as a result of a design for a custom system able to span multiple projects. System development and management problems and areas that required redesign or major code changes for system implementation are examined as well as the relative successes of the initial projects. A generic description of the actual project is provided and the ongoing support requirements and future plans are discussed.

  14. Lexicography and artificial intelligence

    Directory of Open Access Journals (Sweden)

    Ramon Cerdà Massó

    2015-10-01

    Full Text Available Unlike most traditional approaches to grammar, today's formal and computational models make use of the lexicon as an essential part of the whole system. Differences between current theoretical and computational models are increasingly insignificant, although the latter are perhaps still characterized by giving full priority to minimalist solutions over more abstract considerations such as exhaustiveness or formal elegance. The paper introduces some strategies in linguistic computation for natural language processing (NLP) according to a typology based on functional complexity, describing in every case the scope and the role performed by the dictionary: pattern matching, semantic grammars, syntactic parsers, augmented transition networks, unification formalisms, case frame grammars, etc. It ends with an exploration into lexicographically oriented procedures of conceptual dependency, which are found beyond NLP, deep within the domain of artificial intelligence.

  15. Intel replaces the Pentium with new processors / Kuldar Kullasepp

    Index Scriptorium Estoniae

    Kullasepp, Kuldar, 1980-

    2006-01-01

    See also Postimees: in Russian, 19 June, p. 14. The world's largest processor manufacturer plans to switch almost entirely to the production of multi-core processors within a year, said Alex Roessler, Intel's European marketing manager

  16. Evaluation of the Intel Westmere-EP server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2010-01-01

    In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing the 6-core “Westmere-EP” processor with Intel’s previous generation of the same microarchitecture, the “Nehalem-EP”. The former is produced in a new 32nm process, the latter in 45nm. Both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Simultaneous Multi-Threading (SMT), the cache sizes available, the memory configuration installed, as well...

  17. The large-s field-reversed configuration experiment

    International Nuclear Information System (INIS)

    Hoffman, A.L.; Carey, L.N.; Crawford, E.A.; Harding, D.G.; DeHart, T.E.; McDonald, K.F.; McNeil, J.L.; Milroy, R.D.; Slough, J.T.; Maqueda, R.; Wurden, G.A.

    1993-01-01

    The Large-s Experiment (LSX) was built to study the formation and equilibrium properties of field-reversed configurations (FRCs) as the scale size increases. The dynamic, field-reversed theta-pinch method of FRC creation produces axial and azimuthal deformations and makes formation difficult, especially in large devices with large s (number of internal gyroradii) where it is difficult to achieve initial plasma uniformity. However, with the proper technique, these formation distortions can be minimized and are then observed to decay with time. This suggests that the basic stability and robustness of FRCs formed, and in some cases translated, in smaller devices may also characterize larger FRCs. Elaborate formation controls were included on LSX to provide the initial uniformity and symmetry necessary to minimize formation disturbances, and stable FRCs could be formed up to the design goal of s = 8. For s ≤ 4, the formation distortions decayed away completely, resulting in symmetric equilibrium FRCs with record confinement times up to 0.5 ms, agreeing with previous empirical scaling laws (τ∝sR). Above s = 4, reasonably long-lived (up to 0.3 ms) configurations could still be formed, but the initial formation distortions were so large that they never completely decayed away, and the equilibrium confinement was degraded from the empirical expectations. The LSX was only operational for 1 yr, and it is not known whether s = 4 represents a fundamental limit for good confinement in simple (no ion beam stabilization) FRCs or whether it simply reflects a limit of present formation technology. Ideally, s could be increased through flux buildup from neutral beams. Since the addition of kinetic or beam ions will probably be desirable for heating, sustainment, and further stabilization of magnetohydrodynamic modes at reactor-level s values, neutral beam injection is the next logical step in FRC development. 24 refs., 21 figs., 2 tabs

  18. Large electrostatic accelerators

    International Nuclear Information System (INIS)

    Jones, C.M.

    1984-01-01

    The paper is divided into four parts: a discussion of the motivation for the construction of large electrostatic accelerators, a description and discussion of several large electrostatic accelerators which have been recently completed or are under construction, a description of several recent innovations which may be expected to improve the performance of large electrostatic accelerators in the future, and a description of an innovative new large electrostatic accelerator whose construction is scheduled to begin next year

  19. EUV lithography for 30nm half pitch and beyond: exploring resolution, sensitivity, and LWR tradeoffs

    Science.gov (United States)

    Putna, E. Steve; Younkin, Todd R.; Chandhok, Manish; Frasure, Kent

    2009-03-01

    The International Technology Roadmap for Semiconductors (ITRS) denotes Extreme Ultraviolet (EUV) lithography as a leading technology option for realizing the 32nm half-pitch node and beyond. Readiness of EUV materials is currently one high risk area according to assessments made at the 2008 EUVL Symposium. The main development issue regarding EUV resist has been how to simultaneously achieve high sensitivity, high resolution, and low line width roughness (LWR). This paper describes the strategy and current status of EUV resist development at Intel Corporation. Data is presented utilizing Intel's Micro-Exposure Tool (MET), examining the feasibility of establishing a resist process that simultaneously exhibits ≤30 nm half-pitch (HP) L/S resolution at ≤10 mJ/cm² with ≤4 nm LWR.

  20. Fast control of a M.A. 23 manipulator

    International Nuclear Information System (INIS)

    Mouhamed, Mayez al.

    1981-07-01

    The present paper deals with the problem of the control of manipulator robots. Several methodologies used to define the basic elements of a command language are described. Our main interest lies in the movement coordination level. For this purpose we shall study in more detail the functions built around a module representing a geometrical model. Then the problem of effort analysis by computer is investigated. Two heuristic methods enabling the computation of the forces applied to the effector are proposed. The functions described above are implemented in a 16-bit microprocessor unit (Intel 8086) associated with a floating point coprocessor (Intel 8087). Finally, the performance of the control system, particularly the 8 ms command cycle and the low cost of the system, is discussed [fr]

  1. Computation cluster for Monte Carlo calculations

    Energy Technology Data Exchange (ETDEWEB)

    Petriska, M.; Vitazek, K.; Farkas, G.; Stacho, M.; Michalek, S. [Dep. Of Nuclear Physics and Technology, Faculty of Electrical Engineering and Information, Technology, Slovak Technical University, Ilkovicova 3, 81219 Bratislava (Slovakia)

    2010-07-01

    Two computation clusters based on Rocks Clusters 5.1 Linux distribution with Intel Core Duo and Intel Core Quad based computers were made at the Department of the Nuclear Physics and Technology. Clusters were used for Monte Carlo calculations, specifically for MCNP calculations applied in Nuclear reactor core simulations. Optimization for computation speed was made on hardware and software basis. Hardware cluster parameters, such as size of the memory, network speed, CPU speed, number of processors per computation, number of processors in one computer were tested for shortening the calculation time. For software optimization, different Fortran compilers, MPI implementations and CPU multi-core libraries were tested. Finally computer cluster was used in finding the weighting functions of neutron ex-core detectors of VVER-440. (authors)

  2. Computational algorithms for simulations in atmospheric optics.

    Science.gov (United States)

    Konyaev, P A; Lukin, V P

    2016-04-20

    A computer simulation technique for atmospheric and adaptive optics based on parallel programming is discussed. A parallel propagation algorithm is designed and a modified spectral-phase method for computer generation of 2D time-variant random fields is developed. Temporal power spectra of Laguerre-Gaussian beam fluctuations are considered as an example to illustrate the applications discussed. Implementation of the proposed algorithms using Intel MKL and IPP libraries and NVIDIA CUDA technology is shown to be very fast and accurate. The hardware system for the computer simulation is an off-the-shelf desktop with an Intel Core i7-4790K CPU operating at a turbo-speed frequency up to 5 GHz and an NVIDIA GeForce GTX-960 graphics accelerator with 1024 1.5 GHz processors.
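    A basic spectral method for generating a 2D random field, of which the paper's modified spectral-phase method with time variation is a refinement, filters white noise in Fourier space. A NumPy sketch with an illustrative Kolmogorov-like spectrum (the spectrum choice and regularization are assumptions, not the paper's parameters):

```python
import numpy as np

def random_field_2d(n, spectrum, rng):
    """Generate a real 2D random field whose power spectrum follows
    `spectrum(k)` by filtering complex white noise in Fourier space."""
    kx = np.fft.fftfreq(n)
    ky = np.fft.fftfreq(n)
    k = np.hypot(*np.meshgrid(kx, ky, indexing="ij"))
    amp = np.sqrt(spectrum(k))                     # spectral filter
    noise = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.fft.ifft2(amp * noise).real

rng = np.random.default_rng(0)
# Kolmogorov-like k^(-11/3) spectrum, regularized at k = 0.
phase_screen = random_field_2d(256, lambda k: (k**2 + 1e-6) ** (-11 / 6), rng)
print(phase_screen.shape, phase_screen.std())
```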

  3. Mold heating and cooling microprocessor conversion

    Science.gov (United States)

    Hoffman, D. P.

    1995-07-01

    Conversion of the microprocessors and software for the Mold Heating and Cooling (MHAC) pump package control systems was initiated to allow required system enhancements and provide data communications capabilities with the Plastics Information and Control System (PICS). The existing microprocessor-based control systems for the pump packages use an Intel 8088-based microprocessor board with a maximum of 64 Kbytes of program memory. The requirements for the system conversion were developed, and hardware has been selected to allow maximum reuse of existing hardware and software while providing the required additional capabilities and capacity. The new hardware will incorporate an Intel 80286-based microprocessor board with an 80287 math coprocessor, the system includes additional memory, I/O, and RS232 communication ports.

  4. Digital television: a new way to deliver information

    Science.gov (United States)

    Huang, Samson

    1998-12-01

    Digital television (DTV) is a new way to deliver video, audio, and other data. Why should TV be converted to digital? How does DTV work? What can we do with it? This paper provides some introduction about DTV, its history, and its roll-out plan. It then compares DTV with analog TV, and describes how DTV works. It also describes why the computer industry, as well as the consumer electronics industry, are both very interested I the DTV market. Next, it describes what Intel has done on DTV, including how we build a PC- based DTV, its test evaluation results, its new applications, and Intel's DTV station DMRL. This paper also describes remaining issues, our roadmap, vision, and future directions.

  5. Computation cluster for Monte Carlo calculations

    International Nuclear Information System (INIS)

    Petriska, M.; Vitazek, K.; Farkas, G.; Stacho, M.; Michalek, S.

    2010-01-01

    Two computation clusters based on Rocks Clusters 5.1 Linux distribution with Intel Core Duo and Intel Core Quad based computers were made at the Department of the Nuclear Physics and Technology. Clusters were used for Monte Carlo calculations, specifically for MCNP calculations applied in Nuclear reactor core simulations. Optimization for computation speed was made on hardware and software basis. Hardware cluster parameters, such as size of the memory, network speed, CPU speed, number of processors per computation, number of processors in one computer were tested for shortening the calculation time. For software optimization, different Fortran compilers, MPI implementations and CPU multi-core libraries were tested. Finally computer cluster was used in finding the weighting functions of neutron ex-core detectors of VVER-440. (authors)

  6. Ballmer, Barrett weigh in on security

    CERN Multimedia

    Sullivan, T

    2003-01-01

    ORLANDO, Fla. - Speaking in separate sessions Tuesday at the Gartner Symposium/ITxpo, Microsoft CEO Steve Ballmer and Intel's chief Craig Barrett discussed the problems of computer/network security (1/2 page).

  7. Mobility and powering of large detectors. Moving large detectors

    International Nuclear Information System (INIS)

    Thompson, J.

    1977-01-01

    The possibility is considered of moving large lepton detectors at ISABELLE for readying new experiments, detector modifications, and detector repair. A large annex (approximately 25 m x 25 m) would be built adjacent to the Lepton Hall separated from the Lepton Hall by a wall of concrete 11 m high x 12 m wide (for clearance of the detector) and approximately 3 m thick (for radiation shielding). A large pad would support the detector, the door, the cryogenic support system and the counting house. In removing the detector from the beam hall, one would push the pad into the annex, add a dummy beam pipe, bake out the beam pipe, and restack and position the wall on a small pad at the door. The beam could then operate again while experimenters could work on the large detector in the annex. A consideration and rough price estimate of various questions and proposed solutions are given

  8. Instantons and Large N

    Science.gov (United States)

    Mariño, Marcos

    2015-09-01

    Preface; Part I. Instantons: 1. Instantons in quantum mechanics; 2. Unstable vacua in quantum field theory; 3. Large order behavior and Borel summability; 4. Non-perturbative aspects of Yang-Mills theories; 5. Instantons and fermions; Part II. Large N: 6. Sigma models at large N; 7. The 1=N expansion in QCD; 8. Matrix models and matrix quantum mechanics at large N; 9. Large N QCD in two dimensions; 10. Instantons at large N; Appendix A. Harmonic analysis on S3; Appendix B. Heat kernel and zeta functions; Appendix C. Effective action for large N sigma models; References; Author index; Subject index.

  9. Lattice QCD at finite temperature and density from Taylor expansion

    Science.gov (United States)

    Steinbrecher, Patrick

    2017-01-01

    In the first part, I present an overview of recent Lattice QCD simulations at finite temperature and density. In particular, we discuss fluctuations of conserved charges: baryon number, electric charge and strangeness. These can be obtained from Taylor expanding the QCD pressure as a function of corresponding chemical potentials. Our simulations were performed using quark masses corresponding to physical pion mass of about 140 MeV and allow a direct comparison to experimental data from ultra-relativistic heavy ion beams at hadron colliders such as the Relativistic Heavy Ion Collider at Brookhaven National Laboratory and the Large Hadron Collider at CERN. In the second part, we discuss computational challenges for current and future exascale Lattice simulations with a focus on new silicon developments from Intel and NVIDIA.
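    For reference, the Taylor expansion in question is the standard expansion of the pressure in the baryon chemical potential, with the generalized susceptibilities computed on the lattice at zero chemical potential (odd terms vanish at μ_B = 0 by charge conjugation symmetry):

```latex
\frac{p}{T^4} \;=\; \sum_{n=0}^{\infty} \frac{\chi_n^B(T)}{n!}
\left(\frac{\mu_B}{T}\right)^{\!n},
\qquad
\chi_n^B(T) \;=\; \left.\frac{\partial^n\,(p/T^4)}
{\partial(\mu_B/T)^n}\right|_{\mu_B=0}.
```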

  10. Performance of a parallel algorithm for solving the neutron diffusion equation on the hypercube

    International Nuclear Information System (INIS)

    Kirk, B.L.; Azmy, Y.Y.

    1989-01-01

    The one-group, steady state neutron diffusion equation in two-dimensional Cartesian geometry is solved using the nodal method technique. By decoupling sets of equations representing the neutron current continuity along the length of rows and columns of computational cells, a new iterative algorithm is derived that is more suitable to solving large practical problems. This algorithm is highly parallelizable and is implemented on the Intel iPSC/2 hypercube in three versions which differ essentially in the total size of communicated data. Even though speedup was achieved, the efficiency is very low when many processors are used, leading to the conclusion that the hypercube is not as well suited for this algorithm as shared memory machines. 10 refs., 1 fig., 3 tabs

  11. Software Security and the "Building Security in Maturity" Model

    CERN Document Server

    CERN. Geneva

    2011-01-01

    Using the framework described in my book "Software Security: Building Security In" I will discuss and describe the state of the practice in software security. This talk is peppered with real data from the field, based on my work with several large companies as a Cigital consultant. As a discipline, software security has made great progress over the last decade. Of the sixty large-scale software security initiatives we are aware of, thirty-two---all household names---are currently included in the BSIMM study. Those companies among the thirty-two who graciously agreed to be identified include: Adobe, Aon, Bank of America, Capital One, The Depository Trust & Clearing Corporation (DTCC), EMC, Google, Intel, Intuit, McKesson, Microsoft, Nokia, QUALCOMM, Sallie Mae, Standard Life, SWIFT, Symantec, Telecom Italia, Thomson Reuters, VMware, and Wells Fargo. The BSIMM was created by observing and analyzing real-world data from thirty-two leading software security initiatives. The BSIMM can...

  12. Decryption-decompression of AES protected ZIP files on GPUs

    Science.gov (United States)

    Duong, Tan Nhat; Pham, Phong Hong; Nguyen, Duc Huu; Nguyen, Thuy Thanh; Le, Hung Duc

    2011-10-01

    AES is a strong encryption system, so decryption-decompression of AES encrypted ZIP files requires very large computing power and techniques for reducing the password space. This makes implementations of such techniques on common computing systems impractical. In [1], we reduced the original very large password search space to a much smaller one surely containing the correct password. Based on the reduced set of passwords, in this paper, we parallelize decryption, decompression and plain-text recognition for encrypted ZIP files by using CUDA computing technology on NVIDIA GeForce GTX295 graphics cards, to find the correct password. The experimental results have shown that the speed of decrypting, decompressing, recognizing plain text and finding the original password increases by about 45 to 180 times (depending on the number of GPUs) compared to sequential execution on the Intel Core 2 Quad Q8400 2.66 GHz. These results have demonstrated the potential applicability of GPUs in this cryptanalysis field.
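    Structurally, the search distributes the reduced password set over workers, each running a decrypt-inflate-recognize test. The Python sketch below shows only that orchestration pattern on CPU processes; try_password is a hypothetical placeholder, and the paper's actual test runs as CUDA kernels on the GPUs.

```python
# Minimal sketch (assumptions: `try_password` is a hypothetical stand-in
# for the AES-decrypt, inflate, and plain-text-recognition test; the
# candidate set stands in for the paper's reduced password space).
from multiprocessing import Pool

def try_password(pw):
    # Placeholder: derive the AES key from `pw`, decrypt the ZIP entry,
    # inflate it, and test whether the result looks like plain text.
    ok = False                  # always fails in this sketch
    return pw if ok else None

def search(passwords, workers=8):
    """Partition the reduced password set across workers; first hit wins."""
    with Pool(workers) as pool:
        for hit in pool.imap_unordered(try_password, passwords, chunksize=1024):
            if hit is not None:
                return hit
    return None

if __name__ == "__main__":
    candidates = (f"pw{i:04d}" for i in range(10_000))   # hypothetical set
    print(search(candidates))   # None: the placeholder never matches
```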

  13. High-channel-count, high-density microelectrode array for closed-loop investigation of neuronal networks.

    Science.gov (United States)

    Tsai, David; John, Esha; Chari, Tarun; Yuste, Rafael; Shepard, Kenneth

    2015-01-01

    We present a system for large-scale electrophysiological recording and stimulation of neural tissue with a planar topology. The recording system has 65,536 electrodes arranged in a 256 × 256 grid, with 25.5 μm pitch, and covering an area of approximately 42.6 mm². The recording chain has 8.66 μV rms input-referred noise over a 100 Hz to 10 kHz bandwidth while providing up to 66 dB of voltage gain. When recording from all electrodes in the array, it is capable of 10-kHz sampling per electrode. All electrodes can also perform patterned electrical microstimulation. The system produces ~1 GB/s of data when recording from the full array. To handle, store, and perform nearly real-time analyses of this large data stream, we developed a framework based around Xilinx FPGAs, Intel x86 CPUs and the NVIDIA Streaming Multiprocessors to interface with the electrode array.
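    The quoted ~1 GB/s figure is easy to sanity-check from the array parameters (assuming 16-bit samples, which the record does not state):

```python
# Quick data-rate check (assumption: 2 bytes per sample).
electrodes = 256 * 256          # 65,536 recording sites
rate_hz = 10_000                # 10-kHz sampling per electrode
bytes_per_sample = 2            # assumed sample width
print(electrodes * rate_hz * bytes_per_sample / 1e9, "GB/s")  # ~1.3 GB/s
```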

  14. The cost of conservative synchronization in parallel discrete event simulations

    Science.gov (United States)

    Nicol, David M.

    1990-01-01

    The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

  15. A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations

    International Nuclear Information System (INIS)

    Aktulga, Hasan Metin; Afibuzzaman, Md.; Williams, Samuel; Buluc, Aydin; Shao, Meiyue

    2017-01-01

    As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. Here, we consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We then present techniques to significantly improve the SpMM and the transpose operation SpMM^T by using the compressed sparse blocks (CSB) format. We achieve 3-4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor.
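    The bandwidth argument for SpMM is that one pass over the sparse matrix serves many vectors at once. A SciPy sketch using the common CSR baseline (CSB itself is not available in SciPy, and the sizes are illustrative):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, nvec = 100_000, 8                    # matrix dimension, block width

# Random sparse symmetric matrix in CSR format (the baseline format the
# paper's CSB kernels are compared against).
A = sp.random(n, n, density=1e-5, format="csr", random_state=0)
A = (A + A.T).tocsr()

X = rng.standard_normal((n, nvec))      # block of vectors

# SpMM: one sweep over A serves all nvec vectors, which is the bandwidth
# advantage block eigensolvers like LOBPCG have over single-vector Lanczos.
Y = A @ X
print(Y.shape)                          # (100000, 8)
```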

  16. Testing the Tester: Lessons Learned During the Testing of a State-of-the-Art Commercial 14nm Processor Under Proton Irradiation

    Science.gov (United States)

    Szabo, Carl M., Jr.; Duncan, Adam R.; Label, Kenneth A.

    2017-01-01

    Testing of an Intel 14nm desktop processor was conducted under proton irradiation. We share lessons learned, demonstrating that complex devices beget further complex challenges requiring practical and theoretical investigative expertise to solve.

  17. Tiigrihüppe Sihtasutus alustab koostööd firmaga / Signe Teder

    Index Scriptorium Estoniae

    Teder, Signe

    2001-01-01

    The Tiger Leap Foundation has started a new training programme for trainers together with Intel Corporation. The course content and methodology conform to the ICT competencies recommended for teachers as approved by the Ministry of Education and to the "Tiigrihüpe Plus" development plan

  18. Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem

    KAUST Repository

    Haidar, Azzam; Ltaief, Hatem; Dongarra, Jack

    2012-01-01

    dynamic runtime system to multicore architectures. A detailed analysis of performance and accuracy is included. Performance improvements of 14-fold and 4-fold speedups are reported relative to LAPACK and Intel's Math Kernel Library.

  19. Pooljuhtide sektorile oodatakse aasta lõpus kiiret kasvu / Annika Matson

    Index Scriptorium Estoniae

    Matson, Annika, 1976-

    2005-01-01

    According to Austria's largest business newspaper Wirtschaftsblatt, analysts all over the world forecast a sharp rise in turnover for chip makers. Diagrams, table: Intel is the analysts' favourite in the semiconductor sector

  20. 78 FR 9985 - Self-Regulatory Organizations; NYSE Arca, Inc.; Notice of Filing and Immediate Effectiveness of...

    Science.gov (United States)

    2013-02-12

    ... Symbol Cisco Systems, Inc CSCO Dell Inc DELL Facebook, Inc FB Intel Corporation INTC Microsoft... Reference Room, 100 F Street NE., Washington, DC 20549, on official business days between the hours of 10:00...

  1. Benchmarking hardware architecture candidates for the NFIRAOS real-time controller

    Science.gov (United States)

    Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre

    2014-07-01

    As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.
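    A minimal harness in the same spirit times a reconstructor-style matrix-vector product and reports mean and worst-case latency; the sizes here are illustrative placeholders, not the NFIRAOS dimensions.

```python
# Minimal timing sketch (assumption: the per-cycle workload is modeled
# as a single dense matrix-vector product with placeholder sizes).
import time
import numpy as np

n_act, n_slopes = 5000, 7000
rng = np.random.default_rng(0)
R = rng.standard_normal((n_act, n_slopes)).astype(np.float32)  # reconstructor
s = rng.standard_normal(n_slopes).astype(np.float32)           # slope vector

R @ s                                    # warm-up
dts = []
for _ in range(1000):
    t0 = time.perf_counter()
    a = R @ s                            # actuator commands for one AO cycle
    dts.append(time.perf_counter() - t0)

# A hard-real-time controller cares about the worst case, not the mean;
# unpredictable tails are what disqualified the Xeon Phi in this study.
print(f"mean {np.mean(dts)*1e6:.0f} us, max {np.max(dts)*1e6:.0f} us")
```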

  2. Hardware description ADSP-21020 40-bit floating point DSP as designed in a remotely controlled digital CW Doppler radar

    Science.gov (United States)

    Morrison, R. E.; Robinson, S. H.

    A continuous wave Doppler radar system has been designed which is portable, easily deployed, and remotely controlled. The heart of this system is a DSP/control board using Analog Devices' ADSP-21020 40-bit floating point digital signal processor (DSP) microprocessor. Two 18-bit audio A/D converters provide digital input to the DSP/controller board for near-real-time target detection. Program memory for the DSP is dual-ported with an Intel 87C51 microcontroller, allowing DSP code to be uploaded or downloaded from a central controlling computer. The 87C51 provides overall system control for the remote radar and includes a time-of-day/day-of-year real time clock, system identification (ID) switches, and input/output (I/O) expansion by an Intel 82C55 I/O expander.

  3. Graphics processing units accelerated semiclassical initial value representation molecular dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Tamascelli, Dario; Dambrosio, Francesco Saverio [Dipartimento di Fisica, Università degli Studi di Milano, via Celoria 16, 20133 Milano (Italy); Conte, Riccardo [Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322 (United States); Ceotto, Michele, E-mail: michele.ceotto@unimi.it [Dipartimento di Chimica, Università degli Studi di Milano, via Golgi 19, 20133 Milano (Italy)

    2014-05-07

    This paper presents a Graphics Processing Unit (GPU) implementation of the Semiclassical Initial Value Representation (SC-IVR) propagator for vibrational molecular spectroscopy calculations. The time-averaging formulation of the SC-IVR for power spectrum calculations is employed. Details about the GPU implementation of the semiclassical code are provided. Four molecules with an increasing number of atoms are considered and the GPU-calculated vibrational frequencies perfectly match the benchmark values. The computational time scaling of two GPUs (NVIDIA Tesla C2075 and Kepler K20) versus two CPUs (Intel Core i5 and Intel Xeon E5-2687W), and the critical issues related to the GPU implementation, are discussed. The resulting reduction in computational time and power consumption is significant and semiclassical GPU calculations are shown to be environmentally friendly.

  4. Parallel grid generation algorithm for distributed memory computers

    Science.gov (United States)

    Moitra, Stuti; Moitra, Anutosh

    1994-01-01

    A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levels of parallelism on multiple-instruction multiple-data machines is indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations based on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.

  5. Results from a MA16-based neural trigger in an experiment looking for beauty

    Energy Technology Data Exchange (ETDEWEB)

    Baldanza, C. [Istituto Nazionale di Fisica Nucleare, Bologna (Italy); Beichter, J. [Siemens AG, ZFE T ME2, 81730 Munich (Germany); Bisi, F. [Istituto Nazionale di Fisica Nucleare, Bologna (Italy); Bruels, N. [Siemens AG, ZFE T ME2, 81730 Munich (Germany); Bruschini, C. [INFN/Genoa, Via Dodecaneso 33, 16146 Genoa (Italy); Cotta-Ramusino, A. [Istituto Nazionale di Fisica Nucleare, Bologna (Italy); D`Antone, I. [Istituto Nazionale di Fisica Nucleare, Bologna (Italy); Malferrari, L. [Istituto Nazionale di Fisica Nucleare, Bologna (Italy); Mazzanti, P. [Istituto Nazionale di Fisica Nucleare, Bologna (Italy); Musico, P. [INFN/Genoa, Via Dodecaneso 33, 16146 Genoa (Italy); Novelli, P. [INFN/Genoa, Via Dodecaneso 33, 16146 Genoa (Italy); Odorici, F. [Istituto Nazionale di Fisica Nucleare, Bologna (Italy); Odorico, R. [Istituto Nazionale di Fisica Nucleare, Bologna (Italy); Passaseo, M. [CERN, 1211 Geneva 23 (Switzerland); Zuffa, M. [Istituto Nazionale di Fisica Nucleare, Bologna (Italy)

    1996-07-11

    Results from a neural-network trigger based on the digital MA16 chip of Siemens are reported. The neural trigger has been applied to data from the WA92 experiment, looking for beauty particles, which have been collected during a run in which a neural trigger module based on Intel's analog neural chip ETANN operated, as already reported. The MA16 board hosting the chip has a 16-bit I/O precision and a 53-bit precision for internal calculations. It operated at 50 MHz, yielding a response time for a 16 input-variable net of 3 μs for a Fisher discriminant (1-layer net) and of 6 μs for a 2-layer net. Results are compared with those previously obtained with the ETANN trigger. (orig.)
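    The Fisher discriminant mentioned above is a linear (1-layer) classifier with a closed-form weight vector. A NumPy sketch on toy two-class data (the data and cut are illustrative stand-ins, not the WA92 trigger variables):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class data standing in for 16 trigger input variables.
n, d = 5000, 16
bg = rng.normal(0.0, 1.0, (n, d))          # background-like events
sig = rng.normal(0.4, 1.0, (n, d))         # signal (beauty-like) events

# Fisher discriminant: w ∝ Sw^{-1} (m_sig - m_bg), i.e. a 1-layer linear net.
Sw = np.cov(bg.T) + np.cov(sig.T)          # within-class scatter
w = np.linalg.solve(Sw, sig.mean(0) - bg.mean(0))

scores_bg, scores_sig = bg @ w, sig @ w
cut = 0.5 * (scores_bg.mean() + scores_sig.mean())   # simple midpoint cut
eff = (scores_sig > cut).mean()            # signal efficiency
rej = (scores_bg <= cut).mean()            # background rejection
print(f"signal eff {eff:.2f}, background rej {rej:.2f}")
```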

  6. Development of an interface for an ultrareliable fault-tolerant control system and an electronic servo-control unit

    Science.gov (United States)

    Shaver, Charles; Williamson, Michael

    1986-01-01

    The NASA Ames Research Center sponsors a research program for the investigation of Intelligent Flight Control Actuation systems. The use of artificial intelligence techniques in conjunction with algorithmic techniques for autonomous, decentralized fault management of flight-control actuation systems is explored under this program. The design, development, and operation of the interface for laboratory investigation of this program are documented. The interface, architecturally based on the Intel 8751 microcontroller, is an interrupt-driven system designed to receive a digital message from an ultrareliable fault-tolerant control system (UFTCS). The interface links the UFTCS to an electronic servo-control unit, which controls a set of hydraulic actuators. It was necessary to build a UFTCS emulator (also based on the Intel 8751) to provide signal sources for testing the equipment.

  7. The development of the time-keeping clock with TS-1 single chip microcomputer.

    Science.gov (United States)

    Zhou, Jiguang; Li, Yongan

    The authors have developed a time-keeping clock with an Intel 8751 single-chip microcomputer that has been successfully used in a time-keeping station. The hardware and software design and the performance of the clock are introduced.

  8. High-Performance Computing Paradigm and Infrastructure

    CERN Document Server

    Yang, Laurence T

    2006-01-01

    With hyperthreading in Intel processors, hypertransport links in next-generation AMD processors, multi-core silicon in today's high-end microprocessors from IBM, and emerging grid computing, parallel and distributed computers have moved into the mainstream

  9. Flux in Tallinn

    Index Scriptorium Estoniae

    2004-01-01

    Club night "Flux in Tallinn" of the international electronic art symposium ISEA2004 at the club Bon Bon. Estonia was represented by Ropotator, Ars Intel Inc., Urmas Puhkan, Joel Tammik and Taavi Tulev (pseud. Wochtzchee). Club night coordinator: Andres Lõo

  10. Speaker Profiles and Abstracts

    Indian Academy of Sciences (India)

    and her students work in the area of condensed matter systems. She was ... These findings have helped to understand the assembly and dynamics ... His research areas include electronic design automation, formal methods and artificial intelligence.

  11. Design of an IoT architecture for smart cities and development of a smart lighting service

    OpenAIRE

    Hernández Jordán, Ana Leticia

    2013-01-01

    [ENGLISH] Smart cities have recently been pointed out by experts as an emerging market with enormous potential, which is expected to drive the digital economy forward in the coming years. Nowadays, cities hold half of the global population, consume 75% of the world's energy resources and emit 80% of the carbon that is harming the environment. Making a city "smart" is emerging as a strategy to mitigate the problems generated by urban population growth and rapid urbanization. In that way, by us...

  12. Discovering epistasis in large scale genetic association studies by exploiting graphics cards.

    Science.gov (United States)

    Chen, Gary K; Guo, Yunfei

    2013-12-03

    Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today's paradigm of testing independent hypotheses on each single nucleotide polymorphism (SNP) marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyses genome-wide can quickly become intractable because even modest-size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, requiring tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed to explicitly overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPU). We include tutorials on GPU technology, which will convey why they are growing in appeal with today's numerical scientists. One obvious advantage is the impressive density of microprocessor cores that are available on only a single GPU. Whereas high-end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2600 cores. Each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well due to the high degree of parallelism exposed in these problems. Papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.
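    The combinatorial blow-up comes from testing all SNP pairs. The toy sketch below screens pairs with a cheap product-interaction score, a hypothetical stand-in for the regression tests the reviewed GPU programs implement; on a real 500k-marker panel this loop over pairs is exactly what gets mapped onto GPU cores.

```python
# Minimal sketch (assumptions: genotypes coded 0/1/2; a simple product
# interaction score stands in for full logistic-regression tests).
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_samples, n_snps = 2000, 50            # tiny toy panel
G = rng.integers(0, 3, size=(n_samples, n_snps)).astype(np.float64)
y = rng.integers(0, 2, size=n_samples).astype(np.float64)

yc = y - y.mean()                       # centred case/control labels

def interaction_score(i, j):
    """Correlation between the label and the centred product term
    G[:, i] * G[:, j]: a cheap screen for pairwise epistasis."""
    x = G[:, i] * G[:, j]
    xc = x - x.mean()
    return abs(xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc) + 1e-12)

scores = {(i, j): interaction_score(i, j)
          for i, j in combinations(range(n_snps), 2)}
best = max(scores, key=scores.get)
print(best, scores[best])               # top-scoring SNP pair
```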

  13. Discovering epistasis in large scale genetic association studies by exploiting graphics cards

    Directory of Open Access Journals (Sweden)

    Gary K Chen

    2013-12-01

    Full Text Available Despite the enormous investments made in collecting DNA samples and generating germline variation data across thousands of individuals in modern genome-wide association studies (GWAS), progress has been frustratingly slow in explaining much of the heritability in common disease. Today’s paradigm of testing independent hypotheses on each SNP marker is unlikely to adequately reflect the complex biological processes in disease risk. Alternatively, modeling risk as an ensemble of SNPs that act in concert in a pathway, and/or interact non-additively on log risk for example, may be a more sensible way to approach gene mapping in modern studies. Implementing such analyses genome-wide can quickly become intractable, because even modest-size SNP panels on modern genotype arrays (500k markers) pose a combinatorial nightmare, requiring tens of billions of models to be tested for evidence of interaction. In this article, we provide an in-depth analysis of programs that have been developed to explicitly overcome these enormous computational barriers through the use of processors on graphics cards known as Graphics Processing Units (GPU). We include tutorials on GPU technology, which will convey why they are growing in appeal with today’s numerical scientists. One obvious advantage is the impressive density of microprocessor cores that are available on a single GPU. Whereas high-end servers feature up to 24 Intel or AMD CPU cores, the latest GPU offerings from nVidia feature over 2,600 cores. Each compute node may be outfitted with up to 4 GPU devices. Success on GPUs varies across problems. However, epistasis screens fare well due to the high degree of parallelism exposed in these problems. Papers that we review routinely report GPU speedups of over two orders of magnitude (>100x) over standard CPU implementations.
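
    A quick back-of-the-envelope sketch makes the combinatorial barrier above concrete. The following Python/NumPy fragment (written for this summary, not taken from any of the reviewed programs; the batch sizes and the product-term correlation used as an interaction proxy are illustrative assumptions) counts the pairwise models implied by a 500k-SNP panel and scores a small batch of SNP pairs with the kind of independent, data-parallel arithmetic that GPUs spread across thousands of cores:

    import itertools
    import numpy as np

    # Number of pairwise interaction models implied by a 500k-marker array.
    n_snps = 500_000
    n_pairs = n_snps * (n_snps - 1) // 2
    print(f"pairwise models to test: {n_pairs:.3e}")  # ~1.25e11

    # Toy data: a batch of SNPs (0/1/2 allele counts) and a quantitative trait.
    rng = np.random.default_rng(0)
    n_samples, n_batch = 1_000, 64
    genotypes = rng.integers(0, 3, size=(n_batch, n_samples))
    phenotype = rng.normal(size=n_samples)

    def interaction_scores(g, y):
        """Score each SNP pair by correlating the product term g_i * g_j
        with the phenotype -- a cheap proxy for a non-additive effect."""
        y_c = (y - y.mean()) / y.std()
        scores = {}
        for i, j in itertools.combinations(range(len(g)), 2):
            term = g[i] * g[j]
            s = term.std()
            if s > 0:
                scores[(i, j)] = abs((term - term.mean()) / s @ y_c) / len(y)
        return scores

    # Each (i, j) evaluation is independent of every other -- exactly the
    # structure that maps one model per GPU thread in the programs reviewed.
    best = max(interaction_scores(genotypes, phenotype).items(),
               key=lambda kv: kv[1])
    print("top-scoring pair in batch:", best)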

  14. Large electrostatic accelerators

    Energy Technology Data Exchange (ETDEWEB)

    Jones, C.M.

    1984-01-01

    The increasing importance of energetic heavy ion beams in the study of atomic physics, nuclear physics, and materials science has partially or wholly motivated the construction of a new generation of large electrostatic accelerators designed to operate at terminal potentials of 20 MV or above. In this paper, the author briefly discusses the status of these new accelerators and also discusses several recent technological advances which may be expected to further improve their performance. The paper is divided into four parts: (1) a discussion of the motivation for the construction of large electrostatic accelerators, (2) a description and discussion of several large electrostatic accelerators which have been recently completed or are under construction, (3) a description of several recent innovations which may be expected to improve the performance of large electrostatic accelerators in the future, and (4) a description of an innovative new large electrostatic accelerator whose construction is scheduled to begin next year. Due to time and space constraints, discussion is restricted to consideration of only tandem accelerators.

  15. Large electrostatic accelerators

    International Nuclear Information System (INIS)

    Jones, C.M.

    1984-01-01

    The increasing importance of energetic heavy ion beams in the study of atomic physics, nuclear physics, and materials science has partially or wholly motivated the construction of a new generation of large electrostatic accelerators designed to operate at terminal potentials of 20 MV or above. In this paper, the author briefly discusses the status of these new accelerators and also discusses several recent technological advances which may be expected to further improve their performance. The paper is divided into four parts: (1) a discussion of the motivation for the construction of large electrostatic accelerators, (2) a description and discussion of several large electrostatic accelerators which have been recently completed or are under construction, (3) a description of several recent innovations which may be expected to improve the performance of large electrostatic accelerators in the future, and (4) a description of an innovative new large electrostatic accelerator whose construction is scheduled to begin next year. Due to time and space constraints, discussion is restricted to consideration of only tandem accelerators

  16. The Paranoid Survivor (Paranoiline ellujääja) / Vello Rääk

    Index Scriptorium Estoniae

    Rääk, Vello

    2004-01-01

    On the life and career of Andras Grof, alias Andrew S. Grove, a Hungarian emigrant who rose to become a management guru in the USA, his management principles, and his most important achievements at the head of Intel, the world's largest microprocessor manufacturer. See also, in the same issue: Stormy times at Intel.

  17. "Arvuti koolis" ja arvutiõpetus õppekavas / Helen Vanganen, Jevgeni Košelev

    Index Scriptorium Estoniae

    Vanganen, Helen

    2002-01-01

    Nearly 1500 subject teachers took part in the course "Arvuti koolis" ("Computer in School") during the 2001/2002 school year. The teachers were introduced to the University of Oxford follow-up course; the original program, called "Intel Teach to the Future", deals directly with the application of information technology in the curriculum.

  18. Multi-core Microprocessors

    Indian Academy of Sciences (India)

    Based on empirical data, Gordon Moore .... there are numerous models of the same Intel microprocessor such as Pentium. 3). ... returns. The limit on instruction and thread-level processing coupled with ..... This style of parallel programming is.

  19. Large-scale high density 3D AMT for mineral exploration — A case history from volcanic massive sulfide Pb-Zn deposit with 2000 AMT sites

    Science.gov (United States)

    Chen, R.; Chen, S.; He, L.; Yao, H.; Li, H.; Xi, X.; Zhao, X.

    2017-12-01

    The EM method plays a key role in exploring volcanic massive sulfide (VMS) deposits, which carry high grades and high economic value. However, the performance of high-density 3D AMT in detecting deeply concealed VMS targets is not well established. The size of a typical VMS target is less than 100 m x 100 m x 50 m, so finding one at depth is a challenging task. We carried out a test in a VMS Pb-Zn deposit using high-density 3D AMT with a site spacing of 20 m and a profile spacing of 40-80 m. About 2000 AMT sites were acquired over an area of 2000 m x 1500 m. We then used a server with 8 CPUs (Intel Xeon E7-8880 v3, 2.3 GHz, 144 cores in total), 2048 GB RAM, and a 40 TB disk array to invert the 3D AMT data using integral-equation forward modeling and re-weighted conjugate-gradient inversion. The depth of the VMS ore body is about 600 m and its size is about 100 m x 100 m x 20 m, with a dip angle of about 45 degrees. We find that it is very hard to recover the location and shape of the ore body by 3D AMT inversion, even using the data from all AMT sites and frequencies. However, it is possible to recover the location and shape of the deeply concealed ore body if the inversion parameters are adjusted carefully. A new set of inversion parameters had to be found for the high-density 3D AMT data set: the parameters that work well for the Dublin Secret Model II (DSM 2) are not suitable for our real data, a problem that may be caused by differences in data density and in the number of frequencies. We found a good set of inversion parameters by comparing the shape and location of the ore body with the inversion result while trying different parameter settings, and applying the new parameters in a nearby area with high-density AMT sites improved the inversion result greatly.
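
    The parameter tuning described above follows a simple pattern that the sketch below makes explicit (a toy illustration, not the authors' code: invert is a hypothetical stand-in for the integral-equation inversion, and the trial grid is invented): try a set of inversion settings and keep the one whose recovered model best matches the known ore-body footprint.

    import numpy as np

    rng = np.random.default_rng(3)

    # Known ore-body footprint (e.g. from drilling), used to judge quality.
    target = np.zeros((20, 20))
    target[8:12, 8:12] = 1.0

    def invert(data, lam, n_iter):
        """Placeholder inversion: a smoothing-regularized fit to the data.
        Stands in for the integral-equation / conjugate-gradient inversion."""
        model = data.copy()
        for _ in range(n_iter):
            smooth = (np.roll(model, 1, 0) + np.roll(model, -1, 0) +
                      np.roll(model, 1, 1) + np.roll(model, -1, 1)) / 4.0
            model = (data + lam * smooth) / (1.0 + lam)
        return model

    # Synthetic "observed" image derived from the target plus noise.
    data = target + 0.3 * rng.normal(size=target.shape)

    # Try a grid of inversion settings and keep the one whose recovered
    # model best reproduces the known ore-body shape and location.
    trials = [(lam, it) for lam in (0.1, 1.0, 10.0) for it in (5, 50)]
    best = min(trials, key=lambda p: np.abs(invert(data, *p) - target).sum())
    print("best (lambda, iterations):", best)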

  20. Development of seismic tomography software for hybrid supercomputers

    Science.gov (United States)

    Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton

    2015-04-01

    Seismic tomography is a technique used for computing velocity model of geologic structure from first arrival travel times of seismic waves. The technique is used in processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As a consequence of development of seismic monitoring systems and increasing volume of seismic data, there is a growing need for new, more effective computational algorithms for use in seismic tomography applications with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and software package for such systems, to be used in processing of large volumes of seismic data (hundreds of gigabytes and more). These algorithms and software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using the eikonal equation solver, arrival times of seismic waves are computed based on assumed velocity model of geologic structure being analyzed. In order to solve the linearized inverse problem, tomographic matrix is computed that connects model adjustments with travel time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on target architectures is considered. During the first stage of this work, algorithms were developed for execution on
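
    The linearized step in the scheme above amounts to a regularized least-squares solve. As a minimal sketch (with assumed problem sizes and a random matrix standing in for the real tomographic operator; this is not the package under development), the damped normal equations (G^T G + lambda*I) dm = G^T dt can be solved matrix-free with conjugate gradients:

    import numpy as np

    def cg_solve(apply_A, b, iters=200, tol=1e-8):
        """Plain conjugate gradients for a symmetric positive-definite operator."""
        x = np.zeros_like(b)
        r = b - apply_A(x)
        p = r.copy()
        rs = r @ r
        for _ in range(iters):
            Ap = apply_A(p)
            alpha = rs / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    # Toy problem: G maps slowness updates in 200 cells to 400 travel-time
    # residuals; most cells are untouched by any given ray, hence the mask.
    rng = np.random.default_rng(1)
    n_rays, n_cells, lam = 400, 200, 1e-2
    G = rng.random((n_rays, n_cells)) * (rng.random((n_rays, n_cells)) < 0.05)
    dt = rng.normal(size=n_rays)  # observed-minus-predicted travel times

    # Damped normal equations, applied matrix-free:
    # (G^T G + lambda I) dm = G^T dt.
    apply_normal = lambda m: G.T @ (G @ m) + lam * m
    dm = cg_solve(apply_normal, G.T @ dt)
    print("model update norm:", np.linalg.norm(dm))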

  1. Intel suspected of violating competition law / Lauri Matsulevitsh

    Index Scriptorium Estoniae

    Matsulevitsh, Lauri

    2005-01-01

    In June, Intel's competitor Advanced Micro Devices (AMD) filed a complaint in a US court alleging that Intel has used improper methods to persuade computer makers not to use AMD products. Intel controls 90% of the market for microprocessors for personal computers running Windows software.

  2. Performance Evaluation of Hyper Threading Technology ...

    African Journals Online (AJOL)

    PROF. OLIVER OSUAGWA

    2015-12-01

    Dec 1, 2015 ... Architecture Using Microsoft Operating System Platform. 1 Okonta O.E., 2 Ajani ... this means operating systems and user programs can ..... access nature of the Intel® Core™ i7 processor based ..... Operating systems manage.

  3. 75 FR 54682 - Self-Regulatory Organizations; International Securities Exchange, LLC; Notice of Filing and...

    Science.gov (United States)

    2010-09-08

    .... (``MOT''), Newmont Mining Corporation (``NEM''), NetFlix Inc. (``NFLX''), NVIDIA Corporation (``NVDA... (``QQQQ''), Bank of America Corporation (``BAC''), Citigroup, Inc. (``C''), Standard and Poor's Depositary..., Inc. (``AAPL''), General Electric Company (``GE''), JPMorgan Chase & Co. (``JPM''), Intel Corporation...

  4. Inspiring Innovation

    CERN Multimedia

    CERN. Geneva

    2009-01-01

    Craig Barrett became Intel's fourth president in May 1997, chief executive officer in 1998 and chairman of the Board on May 18, 2005. Dr. Barrett also serves as Chairman of the United Nations Global Alliance for Information...

  5. Winckelmann, the beautiful allegory, and the overcoming of the 'paragone' between the arts

    Directory of Open Access Journals (Sweden)

    Claudia Valladão de Mattos

    2011-12-01

    Full Text Available The text analyzes the concept of Allegory in Winckelmann. We seek to show how the author makes a very particular use of the term, adopting it mainly in his analyses of painting. In Winckelmann's eyes, the concept of Allegory in painting seems to have favored both the realization of ut pictura poesis and the adoption of classical sculptures as a model for painting; at the same time, however, this choice implied a rejection of the narrative model privileged by the classical tradition of the seventeenth century, in favor of a model similar to that adopted by the great artists of the Baroque tradition of the period.

  6. Emotional foundations of cognitive control

    Science.gov (United States)

    Inzlicht, Michael; Bartholow, Bruce D.; Hirsh, Jacob B.

    2015-01-01

    Often seen as the paragon of higher cognition, here we suggest that cognitive control is dependent on emotion. Rather than asking whether control is influenced by emotion, we ask whether control itself can be understood as an emotional process. Reviewing converging evidence from cybernetics, animal research, cognitive neuroscience, and social and personality psychology, we suggest that cognitive control is initiated when goal conflicts evoke phasic changes to emotional primitives that both focus attention on the presence of goal conflicts and energize conflict resolution to support goal-directed behavior. Critically, we propose that emotion is not an inert byproduct of conflict but is instrumental in recruiting control. Appreciating the emotional foundations of control leads to testable predictions that can spur future research. PMID:25659515

  7. “Next Unto the Gods My Life Shall Be Spent in Contemplation of Him”: Margaret Cavendish’s Dramatised Widowhood in Bell in Campo (I

    Directory of Open Access Journals (Sweden)

    Bronk Katarzyna

    2017-12-01

    Full Text Available Margaret Cavendish (1623–1673) is nowadays remembered as one of the most outspoken female writers and playwrights of the mid-seventeenth century, one who openly promoted women's right to education and public displays of creativity. She thus paved the way for other female artists, such as her near contemporary, Aphra Behn. Although in her time seen as a harmless curiosity rather than a paragon to emulate, Cavendish managed to publish her plays along with more philosophical texts. Thanks to the re-discovery of female artists by feminist revisionism, her drama is now treated as a valuable source of knowledge on the values and norms of her class, gender and, more generally, English society in the seventeenth century.

  8. The Evolution of Physics: the growth of ideas from early concepts to relativity and quanta (L'evoluzione della fisica)

    CERN Document Server

    Einstein, Albert

    1965-01-01

    Published in English on the eve of the Second World War and immediately offered in translation, The Evolution of Physics had to wait until the end of the conflict for its publication in Italy. Since then (1948), this text has never ceased to be an essential point of reference for the very idea of popular science writing, and for physics in particular. Written by the protagonists of the relativistic and quantum revolution in physics, yet aimed at a non-specialist audience, the book you hold in your hands is the founding text of the modern popularization of scientific ideas, the touchstone ('pietra di paragone') of every other physics book, allowing the reader to grasp the extraordinary importance and revolutionary value of the turn physics took in the twentieth century.

  9. High performance cone-beam spiral backprojection with voxel-specific weighting

    International Nuclear Information System (INIS)

    Steckmann, Sven; Knaup, Michael; Kachelriess, Marc

    2009-01-01

    Cone-beam spiral backprojection is computationally highly demanding. At first sight, the backprojection requirements are similar to those of cone-beam backprojection from circular scans such as it is performed in the widely used Feldkamp algorithm. However, there is an additional complication: the illumination of each voxel, i.e., the range of angles over which the voxel is seen by the x-ray cone, is a complex function of the voxel position. In general, one needs to multiply a voxel-specific weight w(x, y, z, α) prior to adding a projection from angle α to a voxel at position x, y, z. Often, the weight function has no analytically closed form and must be numerically determined. Storage of the weights is prohibitive since the amount of memory required equals the number of voxels per spiral rotation times the number of projections from which a voxel receives contributions, and is therefore of the order of up to 10^12 floating point values for typical spiral scans. We propose a new algorithm that combines the spiral symmetry with the ability of today's 64 bit operating systems to store large amounts of precomputed weights, even above the 4 GB limit. Our trick is to backproject into slices that are rotated in the same manner as the spiral trajectory rotates. Using the spiral symmetry in this way allows one to exploit data-level parallelism and thereby to achieve a very high level of vectorization. An additional postprocessing step rotates these slices back to normal images. Our new backprojection algorithm achieves up to 17 giga voxel updates per second on our systems that are equipped with four standard Intel X7460 hexa core CPUs (Intel Xeon 7300 platform, 2.66 GHz, Intel Corporation). This equals the reconstruction of 344 images per second assuming that each slice consists of 512 x 512 pixels and receives contributions from 512 projections. Thereby, it is an order of magnitude faster than a highly optimized code that does not make use of the spiral symmetry. In its present version, the

  10. High performance cone-beam spiral backprojection with voxel-specific weighting

    Science.gov (United States)

    Steckmann, Sven; Knaup, Michael; Kachelrieß, Marc

    2009-06-01

    Cone-beam spiral backprojection is computationally highly demanding. At first sight, the backprojection requirements are similar to those of cone-beam backprojection from circular scans such as it is performed in the widely used Feldkamp algorithm. However, there is an additional complication: the illumination of each voxel, i.e., the range of angles over which the voxel is seen by the x-ray cone, is a complex function of the voxel position. In general, one needs to multiply a voxel-specific weight w(x, y, z, α) prior to adding a projection from angle α to a voxel at position x, y, z. Often, the weight function has no analytically closed form and must be numerically determined. Storage of the weights is prohibitive since the amount of memory required equals the number of voxels per spiral rotation times the number of projections from which a voxel receives contributions, and is therefore of the order of up to 10^12 floating point values for typical spiral scans. We propose a new algorithm that combines the spiral symmetry with the ability of today's 64 bit operating systems to store large amounts of precomputed weights, even above the 4 GB limit. Our trick is to backproject into slices that are rotated in the same manner as the spiral trajectory rotates. Using the spiral symmetry in this way allows one to exploit data-level parallelism and thereby to achieve a very high level of vectorization. An additional postprocessing step rotates these slices back to normal images. Our new backprojection algorithm achieves up to 17 giga voxel updates per second on our systems that are equipped with four standard Intel X7460 hexa core CPUs (Intel Xeon 7300 platform, 2.66 GHz, Intel Corporation). This equals the reconstruction of 344 images per second assuming that each slice consists of 512 × 512 pixels and receives contributions from 512 projections. Thereby, it is an order of magnitude faster than a highly optimized code that does not make use of the spiral symmetry. In its present version, the
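
    To make the voxel-specific weighting concrete, here is a minimal sketch (toy sizes, random data, and a simplified weight table indexed only by projection angle and in-slice voxel position; it is not the published implementation): a slice is accumulated projection by projection, with every voxel update scaled by its precomputed weight, a pure elementwise multiply-add over the whole slice, which is the data-level parallelism the abstract describes vectorizing.

    import numpy as np

    nx = ny = 64    # voxels per slice edge (512 in the paper; small here)
    n_proj = 128    # projections contributing to one slice
    rng = np.random.default_rng(2)

    # Toy inputs: one 1D projection row per angle, and a precomputed weight
    # table w(angle, voxel). Spiral symmetry is what makes storing such a
    # table once per rotation feasible; here it is simply random data.
    proj = rng.random((n_proj, nx))
    weights = rng.random((n_proj, ny, nx))

    # Accumulate the slice angle by angle: each voxel picks up the
    # projection sample for its detector column, scaled by its weight.
    slice_img = np.zeros((ny, nx))
    for a in range(n_proj):
        slice_img += weights[a] * proj[a][np.newaxis, :]

    print("mean of weighted backprojection:", slice_img.mean())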

  11. Performance of VPIC on Trinity

    Science.gov (United States)

    Nystrom, W. D.; Bergen, B.; Bird, R. F.; Bowers, K. J.; Daughton, W. S.; Guo, F.; Li, H.; Nam, H. A.; Pang, X.; Rust, W. N., III; Wohlbier, J.; Yin, L.; Albright, B. J.

    2016-10-01

    Trinity is a major new DOE computing resource which is going through final acceptance testing at Los Alamos National Laboratory. Trinity has several new and unique architectural features, including two compute partitions, one with dual-socket Intel Haswell Xeon compute nodes and one with Intel Knights Landing (KNL) Xeon Phi compute nodes. Additional unique features include the use of on-package high bandwidth memory (HBM) on the KNL nodes, the ability to configure the KNL nodes at run time with respect to HBM mode and on-die network topology in a variety of operational modes, and the use of solid state storage via burst buffer technology to reduce the time required to perform I/O. An effort is in progress to port and optimize VPIC for Trinity and evaluate its performance. Because VPIC was recently released as Open Source, it is being used as part of acceptance testing for Trinity and is participating in the Trinity Open Science Program, which has resulted in excellent collaboration with both Cray and Intel. Results will be presented on the performance of VPIC on both the Haswell and KNL partitions, for both single-node runs and runs at scale. Work performed under the auspices of the U.S. Dept. of Energy by Los Alamos National Security, LLC, Los Alamos National Laboratory under contract DE-AC52-06NA25396 and supported by the LANL LDRD program.

  12. Semiconductor Ion Implanters

    International Nuclear Information System (INIS)

    MacKinnon, Barry A.; Ruffell, John P.

    2011-01-01

    In 1953 the Raytheon CK722 transistor was priced at $7.60. Based upon this, an Intel Xeon Quad Core processor containing 820,000,000 transistors should list at $6.2 billion! Particle accelerator technology plays an important part in the remarkable story of why that Intel product can be purchased today for a few hundred dollars. Most people of the mid twentieth century would be astonished at the ubiquity of semiconductors in the products we now buy and use every day. Though relatively expensive in the nineteen-fifties, they now exist in a wide range of items, from high-end multicore microprocessors like the Intel product to disposable items containing 'only' hundreds or thousands of transistors, such as RFID chips and talking greeting cards. This historical development has been fueled by continuous advancement of the several individual technologies involved in the production of semiconductor devices, including ion implantation and the charged-particle beamlines at the heart of implant machines. In the course of its 40-year development, the worldwide implanter industry has reached annual sales levels around $2B, installed thousands of dedicated machines and directly employs thousands of workers. In all these measures it represents as much as, and possibly more than, any other industrial application of particle accelerator technology. This presentation discusses the history of implanter development. It touches on some of the people involved and on some of the developmental changes and challenges imposed as the requirements of the semiconductor industry evolved.

  13. Large bowel resection

    Science.gov (United States)

    ... blockage in the intestine due to scar tissue; colon cancer; diverticular disease (disease of the large bowel). Other reasons for bowel resection are: familial polyposis (polyps are growths on the lining of the colon or rectum); injuries that damage the large bowel ...

  14. Block Fusion on Dynamically Adaptive Spacetree Grids for Shallow Water Waves

    KAUST Repository

    Weinzierl, Tobias; Bader, Michael; Unterweger, Kristof; Wittmann, Roland

    2014-01-01

    granular blocks. We study the fusion with a state-of-the-art shallow water solver on an Intel Sandy Bridge and a Xeon Phi processor, where we anticipate their reaction to selected block optimisation and vectorisation.

  15. Peregrine System | High-Performance Computing | NREL

    Science.gov (United States)

    classes of nodes that users access. Login Nodes: Peregrine has four login nodes, each of which has Intel E5 ... /scratch file systems; the /mss file system is mounted on all login nodes. Compute Nodes: Peregrine has 2592 ...

  16. Modern pulsed spectrometer EPR for longitudinal relaxation time (T1) investigation - computer programs for measurement and data analysis

    International Nuclear Information System (INIS)

    Ilnicki, J.; Koziol, J.; Galinski, W.; Oles, T.; Kostrzewa, J.; Froncisz, W.

    1994-01-01

    The computerized control and data processing systems for a new spectrometer designed for nuclear magnetic resonance studies of biological samples are presented. Both programs were written for the INTEL 386 processor and run under the Windows 3.0 environment.

  17. simpboard –a mongodb implementation of a simplified online ...

    African Journals Online (AJOL)

    HOD

    boards are based on asynchronous text-based computer-mediated communication .... run on an Intel® Pentium® CPU N3520 running Microsoft Windows 10 ... The administrator module is invoked when the login type is admin. This displays ...

  18. Microcomputer-controlled ultrasonic data acquisition system. [LMFBR

    Energy Technology Data Exchange (ETDEWEB)

    Simpson, W.A. Jr.

    1978-11-01

    The large volume of ultrasonic data generated by computer-aided test procedures has necessitated the development of a mobile, high-speed data acquisition and storage system. This approach offers the decided advantage of on-site data collection and remote data processing. It also utilizes standard, commercially available ultrasonic instrumentation. This system is controlled by an Intel 8080A microprocessor. The MCS80-SDK microcomputer board was chosen, and magnetic tape is used as the storage medium. A detailed description is provided of both the hardware and software developed to interface the magnetic tape storage subsystem to Biomation 8100 and Biomation 805 waveform recorders. A boxcar integrator acquisition system is also described for use when signal averaging becomes necessary. Both assembly language and machine language listings are provided for the software.

  19. Microcomputer-controlled ultrasonic data acquisition system

    International Nuclear Information System (INIS)

    Simpson, W.A. Jr.

    1978-11-01

    The large volume of ultrasonic data generated by computer-aided test procedures has necessitated the development of a mobile, high-speed data acquisition and storage system. This approach offers the decided advantage of on-site data collection and remote data processing. It also utilizes standard, commercially available ultrasonic instrumentation. This system is controlled by an Intel 8080A microprocessor. The MCS80-SDK microcomputer board was chosen, and magnetic tape is used as the storage medium. A detailed description is provided of both the hardware and software developed to interface the magnetic tape storage subsystem to Biomation 8100 and Biomation 805 waveform recorders. A boxcar integrator acquisition system is also described for use when signal averaging becomes necessary. Both assembly language and machine language listings are provided for the software

  20. Large N Scalars

    DEFF Research Database (Denmark)

    Sannino, Francesco

    2016-01-01

    We construct effective Lagrangians, and corresponding counting schemes, valid to describe the dynamics of the lowest lying large N stable massive composite state emerging in strongly coupled theories. The large N counting rules can now be employed when computing quantum corrections via an effective...