WorldWideScience

Sample records for intel fortran compiler

  1. A Note on Compiling Fortran

    Energy Technology Data Exchange (ETDEWEB)

    Busby, L. E. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-09-01

    Fortran modules tend to serialize compilation of large Fortran projects, by introducing dependencies among the source files. If file A depends on file B, (A uses a module defined by B), you must finish compiling B before you can begin compiling A. Some Fortran compilers (Intel ifort, GNU gfortran and IBM xlf, at least) offer an option to ‘‘verify syntax’’, with the side effect of also producing any associated Fortran module files. As it happens, this option usually runs much faster than the object code generation and optimization phases. For some projects on some machines, it can be advantageous to compile in two passes: The first pass generates the module files, quickly; the second pass produces the object files, in parallel. We achieve a 3.8× speedup in the case study below.

  2. OpenMP-accelerated SWAT simulation using Intel C and FORTRAN compilers: Development and benchmark

    Science.gov (United States)

    Ki, Seo Jin; Sugimura, Tak; Kim, Albert S.

    2015-02-01

    We developed a practical method to accelerate execution of Soil and Water Assessment Tool (SWAT) using open (free) computational resources. The SWAT source code (rev 622) was recompiled using a non-commercial Intel FORTRAN compiler in Ubuntu 12.04 LTS Linux platform, and newly named iOMP-SWAT in this study. GNU utilities of make, gprof, and diff were used to develop the iOMP-SWAT package, profile memory usage, and check identicalness of parallel and serial simulations. Among 302 SWAT subroutines, the slowest routines were identified using GNU gprof, and later modified using Open Multiple Processing (OpenMP) library in an 8-core shared memory system. In addition, a C wrapping function was used to rapidly set large arrays to zero by cross compiling with the original SWAT FORTRAN package. A universal speedup ratio of 2.3 was achieved using input data sets of a large number of hydrological response units. As we specifically focus on acceleration of a single SWAT run, the use of iOMP-SWAT for parameter calibrations will significantly improve the performance of SWAT optimization.

  3. OpenMP GNU and Intel Fortran programs for solving the time-dependent Gross-Pitaevskii equation

    Science.gov (United States)

    Young-S., Luis E.; Muruganandam, Paulsamy; Adhikari, Sadhan K.; Lončar, Vladimir; Vudragović, Dušan; Balaž, Antun

    2017-11-01

    We present Open Multi-Processing (OpenMP) version of Fortran 90 programs for solving the Gross-Pitaevskii (GP) equation for a Bose-Einstein condensate in one, two, and three spatial dimensions, optimized for use with GNU and Intel compilers. We use the split-step Crank-Nicolson algorithm for imaginary- and real-time propagation, which enables efficient calculation of stationary and non-stationary solutions, respectively. The present OpenMP programs are designed for computers with multi-core processors and optimized for compiling with both commercially-licensed Intel Fortran and popular free open-source GNU Fortran compiler. The programs are easy to use and are elaborated with helpful comments for the users. All input parameters are listed at the beginning of each program. Different output files provide physical quantities such as energy, chemical potential, root-mean-square sizes, densities, etc. We also present speedup test results for new versions of the programs. Program files doi:http://dx.doi.org/10.17632/y8zk3jgn84.2 Licensing provisions: Apache License 2.0 Programming language: OpenMP GNU and Intel Fortran 90. Computer: Any multi-core personal computer or workstation with the appropriate OpenMP-capable Fortran compiler installed. Number of processors used: All available CPU cores on the executing computer. Journal reference of previous version: Comput. Phys. Commun. 180 (2009) 1888; ibid.204 (2016) 209. Does the new version supersede the previous version?: Not completely. It does supersede previous Fortran programs from both references above, but not OpenMP C programs from Comput. Phys. Commun. 204 (2016) 209. Nature of problem: The present Open Multi-Processing (OpenMP) Fortran programs, optimized for use with commercially-licensed Intel Fortran and free open-source GNU Fortran compilers, solve the time-dependent nonlinear partial differential (GP) equation for a trapped Bose-Einstein condensate in one (1d), two (2d), and three (3d) spatial dimensions for

  4. Scientific Programming with High Performance Fortran: A Case Study Using the xHPF Compiler

    Directory of Open Access Journals (Sweden)

    Eric De Sturler

    1997-01-01

    Full Text Available Recently, the first commercial High Performance Fortran (HPF subset compilers have appeared. This article reports on our experiences with the xHPF compiler of Applied Parallel Research, version 1.2, for the Intel Paragon. At this stage, we do not expect very High Performance from our HPF programs, even though performance will eventually be of paramount importance for the acceptance of HPF. Instead, our primary objective is to study how to convert large Fortran 77 (F77 programs to HPF such that the compiler generates reasonably efficient parallel code. We report on a case study that identifies several problems when parallelizing code with HPF; most of these problems affect current HPF compiler technology in general, although some are specific for the xHPF compiler. We discuss our solutions from the perspective of the scientific programmer, and presenttiming results on the Intel Paragon. The case study comprises three programs of different complexity with respect to parallelization. We use the dense matrix-matrix product to show that the distribution of arrays and the order of nested loops significantly influence the performance of the parallel program. We use Gaussian elimination with partial pivoting to study the parallelization strategy of the compiler. There are various ways to structure this algorithm for a particular data distribution. This example shows how much effort may be demanded from the programmer to support the compiler in generating an efficient parallel implementation. Finally, we use a small application to show that the more complicated structure of a larger program may introduce problems for the parallelization, even though all subroutines of the application are easy to parallelize by themselves. The application consists of a finite volume discretization on a structured grid and a nested iterative solver. Our case study shows that it is possible to obtain reasonably efficient parallel programs with xHPF, although the compiler

  5. Effective SIMD Vectorization for Intel Xeon Phi Coprocessors

    OpenAIRE

    Tian, Xinmin; Saito, Hideki; Preis, Serguei V.; Garcia, Eric N.; Kozhukhov, Sergey S.; Masten, Matt; Cherkasov, Aleksei G.; Panchenko, Nikolay

    2015-01-01

    Efficiently exploiting SIMD vector units is one of the most important aspects in achieving high performance of the application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A ...

  6. SVM Support in the Vienna Fortran Compilation System

    OpenAIRE

    Brezany, Peter; Gerndt, Michael; Sipkova, Viera

    1994-01-01

    Vienna Fortran, a machine-independent language extension to Fortran which allows the user to write programs for distributed-memory systems using global addresses, provides the forall-loop construct for specifying irregular computations that do not cause inter-iteration dependences. Compilers for distributed-memory systems generate code that is based on runtime analysis techniques and is only efficient if, in addition, aggressive compile-time optimizations are applied. Since these optimization...

  7. VFC: The Vienna Fortran Compiler

    Directory of Open Access Journals (Sweden)

    Siegfried Benkner

    1999-01-01

    Full Text Available High Performance Fortran (HPF offers an attractive high‐level language interface for programming scalable parallel architectures providing the user with directives for the specification of data distribution and delegating to the compiler the task of generating an explicitly parallel program. Available HPF compilers can handle regular codes quite efficiently, but dramatic performance losses may be encountered for applications which are based on highly irregular, dynamically changing data structures and access patterns. In this paper we introduce the Vienna Fortran Compiler (VFC, a new source‐to‐source parallelization system for HPF+, an optimized version of HPF, which addresses the requirements of irregular applications. In addition to extended data distribution and work distribution mechanisms, HPF+ provides the user with language features for specifying certain information that decisively influence a program’s performance. This comprises data locality assertions, non‐local access specifications and the possibility of reusing runtime‐generated communication schedules of irregular loops. Performance measurements of kernels from advanced applications demonstrate that with a high‐level data parallel language such as HPF+ a performance close to hand‐written message‐passing programs can be achieved even for highly irregular codes.

  8. Effective SIMD Vectorization for Intel Xeon Phi Coprocessors

    Directory of Open Access Journals (Sweden)

    Xinmin Tian

    2015-01-01

    Full Text Available Efficiently exploiting SIMD vector units is one of the most important aspects in achieving high performance of the application code running on Intel Xeon Phi coprocessors. In this paper, we present several effective SIMD vectorization techniques such as less-than-full-vector loop vectorization, Intel MIC specific alignment optimization, and small matrix transpose/multiplication 2D vectorization implemented in the Intel C/C++ and Fortran production compilers for Intel Xeon Phi coprocessors. A set of workloads from several application domains is employed to conduct the performance study of our SIMD vectorization techniques. The performance results show that we achieved up to 12.5x performance gain on the Intel Xeon Phi coprocessor. We also demonstrate a 2000x performance speedup from the seamless integration of SIMD vectorization and parallelization.

  9. PGHPF – An Optimizing High Performance Fortran Compiler for Distributed Memory Machines

    Directory of Open Access Journals (Sweden)

    Zeki Bozkus

    1997-01-01

    Full Text Available High Performance Fortran (HPF is the first widely supported, efficient, and portable parallel programming language for shared and distributed memory systems. HPF is realized through a set of directive-based extensions to Fortran 90. It enables application developers and Fortran end-users to write compact, portable, and efficient software that will compile and execute on workstations, shared memory servers, clusters, traditional supercomputers, or massively parallel processors. This article describes a production-quality HPF compiler for a set of parallel machines. Compilation techniques such as data and computation distribution, communication generation, run-time support, and optimization issues are elaborated as the basis for an HPF compiler implementation on distributed memory machines. The performance of this compiler on benchmark programs demonstrates that high efficiency can be achieved executing HPF code on parallel architectures.

  10. Recent advances in PC-Linux systems for electronic structure computations by optimized compilers and numerical libraries.

    Science.gov (United States)

    Yu, Jen-Shiang K; Yu, Chin-Hui

    2002-01-01

    One of the most frequently used packages for electronic structure research, GAUSSIAN 98, is compiled on Linux systems with various hardware configurations, including AMD Athlon (with the "Thunderbird" core), AthlonMP, and AthlonXP (with the "Palomino" core) systems as well as the Intel Pentium 4 (with the "Willamette" core) machines. The default PGI FORTRAN compiler (pgf77) and the Intel FORTRAN compiler (ifc) are respectively employed with different architectural optimization options to compile GAUSSIAN 98 and test the performance improvement. In addition to the BLAS library included in revision A.11 of this package, the Automatically Tuned Linear Algebra Software (ATLAS) library is linked against the binary executables to improve the performance. Various Hartree-Fock, density-functional theories, and the MP2 calculations are done for benchmarking purposes. It is found that the combination of ifc with ATLAS library gives the best performance for GAUSSIAN 98 on all of these PC-Linux computers, including AMD and Intel CPUs. Even on AMD systems, the Intel FORTRAN compiler invariably produces binaries with better performance than pgf77. The enhancement provided by the ATLAS library is more significant for post-Hartree-Fock calculations. The performance on one single CPU is potentially as good as that on an Alpha 21264A workstation or an SGI supercomputer. The floating-point marks by SpecFP2000 have similar trends to the results of GAUSSIAN 98 package.

  11. GRESS, FORTRAN Pre-compiler with Differentiation Enhancement

    International Nuclear Information System (INIS)

    1999-01-01

    1 - Description of program or function: The GRESS FORTRAN pre-compiler (SYMG) and run-time library are used to enhance conventional FORTRAN-77 programs with analytic differentiation of arithmetic statements for automatic differentiation in either forward or reverse mode. GRESS 3.0 is functionally equivalent to GRESS 2.1. GRESS 2.1 is an improved and updated version of the previous released GRESS 1.1. Improvements in the implementation of a the CHAIN option have resulted in a 70 to 85% reduction in execution time and up to a 50% reduction in memory required for forward chaining applications. 2 - Method of solution: GRESS uses a pre-compiler to analyze FORTRAN statements and determine the mathematical operations embodied in them. As each arithmetic assignment statement in a program is analyzed, SYMG generates the partial derivatives of the term on the left with respect to each floating-point variable on the right. The result of the pre-compilation step is a new FORTRAN program that can produce derivatives for any REAL (i.e., single or double precision) variable calculated by the model. Consequently, GRESS enhances FORTRAN programs or subprograms by adding the calculation of derivatives along with the original output. Derivatives from a GRESS enhanced model can be used internally (e.g., iteration acceleration) or externally (e.g., sensitivity studies). By calling GRESS run-time routines, derivatives can be propagated through the code via the chain rule (referred to as the CHAIN option) or accumulated to create an adjoint matrix (referred to as the ADGEN option). A third option, GENSUB, makes it possible to process a subset of a program (i.e., a do loop, subroutine, function, a sequence of subroutines, or a whole program) for calculating derivatives of dependent variables with respect to independent variables. A code enhanced with the GENSUB option can use forward mode, reverse mode, or a hybrid of the two modes. 3 - Restrictions on the complexity of the problem: GRESS

  12. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    Directory of Open Access Journals (Sweden)

    Hari Radhakrishnan

    2015-01-01

    Full Text Available This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.

  13. New compilers speed up applications for Intel-based systems; Intel Compilers pave the way for Intel's Hyper-threading technology

    CERN Multimedia

    2002-01-01

    "Intel Corporation today introduced updated tools to help software developers optimize applications for Intel's expanding family of architectures with key innovations such as Intel's Hyper Threading Technology (1 page).

  14. JLAPACK – Compiling LAPACK FORTRAN to Java

    Directory of Open Access Journals (Sweden)

    David M. Doolin

    1999-01-01

    Full Text Available The JLAPACK project provides the LAPACK numerical subroutines translated from their subset Fortran 77 source into class files, executable by the Java Virtual Machine (JVM and suitable for use by Java programmers. This makes it possible for Java applications or applets, distributed on the World Wide Web (WWW to use established legacy numerical code that was originally written in Fortran. The translation is accomplished using a special purpose Fortran‐to‐Java (source‐to‐source compiler. The LAPACK API will be considerably simplified to take advantage of Java’s object‐oriented design. This report describes the research issues involved in the JLAPACK project, and its current implementation and status.

  15. Numerical performance and throughput benchmark for electronic structure calculations in PC-Linux systems with new architectures, updated compilers, and libraries.

    Science.gov (United States)

    Yu, Jen-Shiang K; Hwang, Jenn-Kang; Tang, Chuan Yi; Yu, Chin-Hui

    2004-01-01

    A number of recently released numerical libraries including Automatically Tuned Linear Algebra Subroutines (ATLAS) library, Intel Math Kernel Library (MKL), GOTO numerical library, and AMD Core Math Library (ACML) for AMD Opteron processors, are linked against the executables of the Gaussian 98 electronic structure calculation package, which is compiled by updated versions of Fortran compilers such as Intel Fortran compiler (ifc/efc) 7.1 and PGI Fortran compiler (pgf77/pgf90) 5.0. The ifc 7.1 delivers about 3% of improvement on 32-bit machines compared to the former version 6.0. Performance improved from pgf77 3.3 to 5.0 is also around 3% when utilizing the original unmodified optimization options of the compiler enclosed in the software. Nevertheless, if extensive compiler tuning options are used, the speed can be further accelerated to about 25%. The performances of these fully optimized numerical libraries are similar. The double-precision floating-point (FP) instruction sets (SSE2) are also functional on AMD Opteron processors operated in 32-bit compilation, and Intel Fortran compiler has performed better optimization. Hardware-level tuning is able to improve memory bandwidth by adjusting the DRAM timing, and the efficiency in the CL2 mode is further accelerated by 2.6% compared to that of the CL2.5 mode. The FP throughput is measured by simultaneous execution of two identical copies of each of the test jobs. Resultant performance impact suggests that IA64 and AMD64 architectures are able to fulfill significantly higher throughput than the IA32, which is consistent with the SpecFPrate2000 benchmarks.

  16. MPI to Coarray Fortran: Experiences with a CFD Solver for Unstructured Meshes

    Directory of Open Access Journals (Sweden)

    Anuj Sharma

    2017-01-01

    Full Text Available High-resolution numerical methods and unstructured meshes are required in many applications of Computational Fluid Dynamics (CFD. These methods are quite computationally expensive and hence benefit from being parallelized. Message Passing Interface (MPI has been utilized traditionally as a parallelization strategy. However, the inherent complexity of MPI contributes further to the existing complexity of the CFD scientific codes. The Partitioned Global Address Space (PGAS parallelization paradigm was introduced in an attempt to improve the clarity of the parallel implementation. We present our experiences of converting an unstructured high-resolution compressible Navier-Stokes CFD solver from MPI to PGAS Coarray Fortran. We present the challenges, methodology, and performance measurements of our approach using Coarray Fortran. With the Cray compiler, we observe Coarray Fortran as a viable alternative to MPI. We are hopeful that Intel and open-source implementations could be utilized in the future.

  17. Programming in Fortran M

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.; Olson, R.; Tuecke, S.

    1993-08-01

    Fortran M is a small set of extensions to Fortran that supports a modular approach to the construction of sequential and parallel programs. Fortran M programs use channels to plug together processes which may be written in Fortran M or Fortran 77. Processes communicate by sending and receiving messages on channels. Channels and processes can be created dynamically, but programs remain deterministic unless specialized nondeterministic constructs are used. Fortran M programs can execute on a range of sequential, parallel, and networked computers. This report incorporates both a tutorial introduction to Fortran M and a users guide for the Fortran M compiler developed at Argonne National Laboratory. The Fortran M compiler, supporting software, and documentation are made available free of charge by Argonne National Laboratory, but are protected by a copyright which places certain restrictions on how they may be redistributed. See the software for details. The latest version of both the compiler and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/fortran-m at info.mcs.anl.gov.

  18. Installation of a new Fortran compiler and effective programming method on the vector supercomputer

    International Nuclear Information System (INIS)

    Nemoto, Toshiyuki; Suzuki, Koichiro; Watanabe, Kenji; Machida, Masahiko; Osanai, Seiji; Isobe, Nobuo; Harada, Hiroo; Yokokawa, Mitsuo

    1992-07-01

    The Fortran compiler, version 10 has been replaced with the new one, version 12 (V12) on the Fujitsu Computer system at JAERI since May, 1992. The benchmark test for the performance of the V12 compiler is carried out with 16 representative nuclear codes in advance of the installation of the compiler. The performance of the compiler is achieved by the factor of 1.13 in average. The effect of the enhanced functions of the compiler and the compatibility to the nuclear codes are also examined. The assistant tool for vectorization TOP10EX is developed. In this report, the results of the evaluation of the V12 compiler and the usage of the tools for vectorization are presented. (author)

  19. Programação orientada a objetos em FORTRAN

    OpenAIRE

    Beck, André Teófilo; Bazán, Felipe Alexander Vargas

    2011-01-01

    Este artigo apresenta conceitos fundamentais de programação orientada a objetos (OO) em FORTRAN. Em geral, os usuários de FORTRAN não estão familiarizados com estes conceitos, pois os compiladores desta linguagem não possuíam suporte para programação OO até o recente lançamento da versão 11.1 do compilador Intel Visual FORTRAN. Este compilador suporta a maioria das características de orientação a objetos do padrão FORTRAN 2003, permitindo a atualização de práticas de programaçã...

  20. Scientific Computing and Apple's Intel Transition

    CERN Document Server

    CERN. Geneva

    2006-01-01

    Intel's published processor roadmap and how it may affect the future of personal and scientific computing About the speaker: Eric Albert is Senior Software Engineer in Apple's Core Technologies group. During Mac OS X's transition to Intel processors he has worked on almost every part of the operating system, from the OS kernel and compiler tools to appli...

  1. The FORTRAN NALAP code adapted to a microcomputer compiler

    International Nuclear Information System (INIS)

    Lobo, Paulo David de Castro; Borges, Eduardo Madeira; Braz Filho, Francisco Antonio; Guimaraes, Lamartine Nogueira Frutuoso

    2010-01-01

    The Nuclear Energy Division of the Institute for Advanced Studies (IEAv) is conducting the TERRA project (TEcnologia de Reatores Rapidos Avancados), Technology for Advanced Fast Reactors project, aimed at a space reactor application. In this work, to attend the TERRA project, the NALAP code adapted to a microcomputer compiler called Compaq Visual Fortran (Version 6.6) is presented. This code, adapted from the light water reactor transient code RELAP 3B, simulates thermal-hydraulic responses for sodium cooled fast reactors. The strategy to run the code in a PC was divided in some steps mainly to remove unnecessary routines, to eliminate old statements, to introduce new ones and also to include extension precision mode. The source program was able to solve three sample cases under conditions of protected transients suggested in literature: the normal reactor shutdown, with a delay of 200 ms to start the control rod movement and a delay of 500 ms to stop the pumps; reactor scram after transient of loss of flow; and transients protected from overpower. Comparisons were made with results from the time when the NALAP code was acquired by the IEAv, back in the 80's. All the responses for these three simulations reproduced the calculations performed with the CDC compiler in 1985. Further modifications will include the usage of gas as coolant for the nuclear reactor to allow a Closed Brayton Cycle Loop - CBCL - to be used as a heat/electric converter. (author)

  2. The FORTRAN NALAP code adapted to a microcomputer compiler

    Energy Technology Data Exchange (ETDEWEB)

    Lobo, Paulo David de Castro; Borges, Eduardo Madeira; Braz Filho, Francisco Antonio; Guimaraes, Lamartine Nogueira Frutuoso, E-mail: plobo.a@uol.com.b, E-mail: eduardo@ieav.cta.b, E-mail: fbraz@ieav.cta.b, E-mail: guimarae@ieav.cta.b [Instituto de Estudos Avancados (IEAv/CTA), Sao Jose dos Campos, SP (Brazil)

    2010-07-01

    The Nuclear Energy Division of the Institute for Advanced Studies (IEAv) is conducting the TERRA project (TEcnologia de Reatores Rapidos Avancados), Technology for Advanced Fast Reactors project, aimed at a space reactor application. In this work, to attend the TERRA project, the NALAP code adapted to a microcomputer compiler called Compaq Visual Fortran (Version 6.6) is presented. This code, adapted from the light water reactor transient code RELAP 3B, simulates thermal-hydraulic responses for sodium cooled fast reactors. The strategy to run the code in a PC was divided in some steps mainly to remove unnecessary routines, to eliminate old statements, to introduce new ones and also to include extension precision mode. The source program was able to solve three sample cases under conditions of protected transients suggested in literature: the normal reactor shutdown, with a delay of 200 ms to start the control rod movement and a delay of 500 ms to stop the pumps; reactor scram after transient of loss of flow; and transients protected from overpower. Comparisons were made with results from the time when the NALAP code was acquired by the IEAv, back in the 80's. All the responses for these three simulations reproduced the calculations performed with the CDC compiler in 1985. Further modifications will include the usage of gas as coolant for the nuclear reactor to allow a Closed Brayton Cycle Loop - CBCL - to be used as a heat/electric converter. (author)

  3. 75 FR 48338 - Intel Corporation; Analysis of Proposed Consent Order to Aid Public Comment

    Science.gov (United States)

    2010-08-10

    ... product road maps, its compilers, and product benchmarking (Sections VI, VII, and VIII). The Proposed... alleges that Intel's failure to fully disclose the changes it made to its compilers and libraries... benchmarking organizations the effects of its compiler redesign on non-Intel CPUs. Several benchmarking...

  4. Performance Issues in High Performance Fortran Implementations of Sensor-Based Applications

    Directory of Open Access Journals (Sweden)

    David R. O'hallaron

    1997-01-01

    Full Text Available Applications that get their inputs from sensors are an important and often overlooked application domain for High Performance Fortran (HPF. Such sensor-based applications typically perform regular operations on dense arrays, and often have latency and through put requirements that can only be achieved with parallel machines. This article describes a study of sensor-based applications, including the fast Fourier transform, synthetic aperture radar imaging, narrowband tracking radar processing, multibaseline stereo imaging, and medical magnetic resonance imaging. The applications are written in a dialect of HPF developed at Carnegie Mellon, and are compiled by the Fx compiler for the Intel Paragon. The main results of the study are that (1 it is possible to realize good performance for realistic sensor-based applications written in HPF and (2 the performance of the applications is determined by the performance of three core operations: independent loops (i.e., loops with no dependences between iterations, reductions, and index permutations. The article discusses the implications for HPF implementations and introduces some simple tests that implementers and users can use to measure the efficiency of the loops, reductions, and index permutations generated by an HPF compiler.

  5. Accessing Intel FPGAs for Acceleration

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    In this presentation, we will discuss the latest tools and products from Intel that enables FPGAs to be deployed as Accelerators. We will first talk about the Acceleration Stack for Intel Xeon CPU with FPGAs which makes it easy to create, verify, and execute functions on the Intel Programmable Acceleration Card in a Data Center. We will then talk about the OpenCL flow which allows parallel software developers to create FPGA systems and deploy them using the OpenCL standard. Next, we will talk about the Intel High-Level Synthesis compiler which can convert C++ code into custom RTL code optimized for Intel FPGAs. Lastly, we will focus on the task of running Machine Learning inference on the FPGA leveraging some of the tools we discussed. About the speaker Karl Qi is Sr. Staff Applications Engineer, Technical Training. He has been with the Customer Training department at Altera/Intel for 8 years. Most recently, he is responsible for all training content relating to High-Level Design tools, including the OpenCL...

  6. FPP: A Fortran preprocessor

    International Nuclear Information System (INIS)

    Boyarski, A.

    1992-11-01

    FPP is a preprocessor which aids in porting Fortran source code across differing platforms. It provides conditional compilation features to enable or disable sections of code, and can modify file names in INCLUDE statements to a syntax suitable for a target platform. FPP is written Fortran 77, and runs on VM/CMS, VAX/VMS, UNIX, and PC/DOS SYSTEMS

  7. Reactive Geochemical Transport Modeling of Concentrated AqueousSolutions: Supplement to TOUGHREACT User's Guide for the PitzerIon-Interaction Model

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Guoxiang; Spycher, Nicolas; Xu, Tianfu; Sonnenthal, Eric; Steefel , Carl

    2006-12-15

    In this report, we present: -- The Pitzer ion-interactiontheory and models -- Input file requirements for using the TOUGHREACTPitzer ion-interaction model and associated databases -- Run-time errormessages -- Verification test cases and application examples. For themain code structure, features, overall solution methods, description ofinput/output files for parameters other than those specific to theimplemented Pitzer model, and error messages, see the TOUGHREACT User'sGuide (Xu et al., 2005). The TOUGHREACT Pitzer version runs on aDEC-alpha architecture CPU, under OSF1 V5.1, with Compaq Digital FortranCompiler. The compiler run-time libraries are required for execution aswell as compilation. The code also runs on Intel Pentium IV andhigher-version CPU-based machines with Compaq Visual Fortran Compiler orIntel Fortran Compiler (integrated with the Microsoft DevelopmentEnvironment). The minimum hardware configuration should include 1 GB RAMand 1 GB (2 GB recommended) of available disk space.

  8. Compiler issues associated with safety-related software

    International Nuclear Information System (INIS)

    Feinauer, L.R.

    1991-01-01

    A critical issue in the quality assurance of safety-related software is the ability of the software to produce identical results, independent of the host machine, operating system, or compiler version under which the software is installed. A study is performed using the VIPRE-0l, FREY-01, and RETRAN-02 safety-related codes. Results from an IBM 3083 computer are compared with results from a CYBER 860 computer. All three of the computer programs examined are written in FORTRAN; the VIPRE code uses the FORTRAN 66 compiler, whereas the FREY and RETRAN codes use the FORTRAN 77 compiler. Various compiler options are studied to determine their effect on the output between machines. Since the Control Data Corporation and IBM machines inherently represent numerical data differently, methods of producing equivalent accuracy of data representation were an important focus of the study. This paper identifies particular problems in the automatic double-precision option (AUTODBL) of the IBM FORTRAN 1.4.x series of compilers. The IBM FORTRAN version 2 compilers provide much more stable, reliable compilation for engineering software. Careful selection of compilers and compiler options can help guarantee identical results between different machines. To ensure reproducibility of results, the same compiler and compiler options should be used to install the program as were used in the development and testing of the program

  9. Towards Porting a Real-World Seismological Application to the Intel MIC Architecture

    OpenAIRE

    V. Weinberg

    2014-01-01

    This whitepaper aims to discuss first experiences with porting an MPI-based real-world geophysical application to the new Intel Many Integrated Core (MIC) architecture. The selected code SeisSol is an application written in Fortran that can be used to simulate earthquake rupture and radiating seismic wave propagation in complex 3-D heterogeneous materials. The PRACE prototype cluster EURORA at CINECA, Italy, was accessed to analyse the MPI-performance of SeisSol on Intel Xeon Phi on both sing...

  10. Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

    OpenAIRE

    Vanderbauwhede, Wim; Davidson, Gavin

    2017-01-01

    Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source to source compilation approach with whole-program analysis to automatically transform single-threaded FORTRAN 77 legacy code into Ope...

  11. MORTRAN-2, FORTRAN Language Extension with User-Supplied Macros

    International Nuclear Information System (INIS)

    Cook, A. James; Shustek, L.J.

    1980-01-01

    1 - Description of problem or function: MORTRAN2 is a FORTRAN language extension that permits a relatively easy transition from FORTRAN to a more convenient and structured language. Its features include free-field format; alphanumeric statement labels; flexible comment convention; nested block structure; for-by-to, do, while, until, loop, if-then-else, if-else, exit, and next statements; multiple assignment statements; conditional compilation; and automatic listing indentation. The language is implemented by a macro-based pre-processor and is further extensible by user-defined macros. 2 - Method of solution: The MORTRAN2 pre-processor may be regarded as a compiler whose object code is ANSI Standard FORTRAN. The MORTRAN2 language is dynamically defined by macros which are input at each use of the pre-processor. 3 - Restrictions on the complexity of the problem: The pre-processor output must be accepted by a FORTRAN compiler

  12. Scientific Programming in Fortran

    Directory of Open Access Journals (Sweden)

    W. Van Snyder

    2007-01-01

    Full Text Available The Fortran programming language was designed by John Backus and his colleagues at IBM to reduce the cost of programming scientific applications. IBM delivered the first compiler for its model 704 in 1957. IBM's competitors soon offered incompatible versions. ANSI (ASA at the time developed a standard, largely based on IBM's Fortran IV in 1966. Revisions of the standard were produced in 1977, 1990, 1995 and 2003. Development of a revision, scheduled for 2008, is under way. Unlike most other programming languages, Fortran is periodically revised to keep pace with developments in language and processor design, while revisions largely preserve compatibility with previous versions. Throughout, the focus on scientific programming, and especially on efficient generated programs, has been maintained.

  13. Classical Fortran programming for engineering and scientific applications

    CERN Document Server

    Kupferschmid, Michael

    2009-01-01

    IntroductionWhy Study Programming?The Evolution of FORTRANWhy Study FORTRAN?Classical FORTRANAbout This BookAdvice to InstructorsAbout the AuthorAcknowledgmentsDisclaimersHello, World!Case Study: A First FORTRAN ProgramCompiling the ProgramRunning a Program in UNIXOmissionsExpressions and Assignment StatementsConstantsVariables and Variable NamesArithmetic OperatorsFunction ReferencesExpressionsA

  14. High Performance Fortran for Aerospace Applications

    National Research Council Canada - National Science Library

    Mehrotra, Piyush

    2000-01-01

    .... HPF is a set of Fortran extensions designed to provide users with a high-level interface for programming data parallel scientific applications while delegating to the compiler/runtime system the task...

  15. An Initial Evaluation of the NAG f90 Compiler

    Directory of Open Access Journals (Sweden)

    Michael Metcalf

    1992-01-01

    Full Text Available A few weeks before the formal publication of the ISO Fortran 90 Standard, NAG announced the world's first f90 compiler. We have evaluated the compiler by using it to assess the impact of Fortran 90 on the CERN Program Library.

  16. Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor

    Energy Technology Data Exchange (ETDEWEB)

    Koskela, Tuomas S.; Lobet, Mathieu; Deslippe, Jack; Matveev, Zakhar

    2017-05-23

    recent optimizations to its collision kernel [4], most of the computing time is spent in the electron push (pushe) kernel, where these optimization efforts have been focused. The kernel code scaled well with MPI+OpenMP but had almost no automatic compiler vectorization, in part due to indirect memory addresses and in part due to low trip counts of low-level loops that would be candidates for vectorization. Particle blocking and sorting have been implemented to increase trip counts of low-level loops and improve memory locality, and OpenMP directives have been added to vectorize compute-intensive loops that were identified by Advisor. The optimizations have improved the performance of the pushe kernel 2x on Haswell processors and 1.7x on KNL. The KNL node-for-node performance has been brought to within 30% of a NERSC Cori phase I Haswell node and we expect to bridge this gap by reducing the memory footprint of compute intensive routines to improve cache reuse. ## PICSAR is a Fortran/Python high-performance Particle-In-Cell library targeting at MIC architectures first designed to be coupled with the PIC code WARP for the simulation of laser-matter interaction and particle accelerators. PICSAR also contains a FORTRAN stand-alone kernel for performance studies and benchmarks. A MPI domain decomposition is used between NUMA domains and a tile decomposition (cache-blocking) handled by OpenMP has been added for shared-memory parallelism and better cache management. The so-called current deposition and field gathering steps that compose the PIC time loop constitute major hotspots that have been rewritten to enable more efficient vectorization. Particle communications between tiles and MPI domain has been merged and parallelized. All considered, these improvements provide speedups of 3.1 for order 1 and 4.6 for order 3 interpolation shape factors on KNL configured in SNC4 quadrant flat mode. Performance is similar between a node of cori phase 1 and KNL at order 1 and better on

  17. Optimizing Performance of Combustion Chemistry Solvers on Intel's Many Integrated Core (MIC) Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Sitaraman, Hariswaran [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Grout, Ray W [National Renewable Energy Laboratory (NREL), Golden, CO (United States)

    2017-06-09

    This work investigates novel algorithm designs and optimization techniques for restructuring chemistry integrators in zero and multidimensional combustion solvers, which can then be effectively used on the emerging generation of Intel's Many Integrated Core/Xeon Phi processors. These processors offer increased computing performance via large number of lightweight cores at relatively lower clock speeds compared to traditional processors (e.g. Intel Sandybridge/Ivybridge) used in current supercomputers. This style of processor can be productively used for chemistry integrators that form a costly part of computational combustion codes, in spite of their relatively lower clock speeds. Performance commensurate with traditional processors is achieved here through the combination of careful memory layout, exposing multiple levels of fine grain parallelism and through extensive use of vendor supported libraries (Cilk Plus and Math Kernel Libraries). Important optimization techniques for efficient memory usage and vectorization have been identified and quantified. These optimizations resulted in a factor of ~ 3 speed-up using Intel 2013 compiler and ~ 1.5 using Intel 2017 compiler for large chemical mechanisms compared to the unoptimized version on the Intel Xeon Phi. The strategies, especially with respect to memory usage and vectorization, should also be beneficial for general purpose computational fluid dynamics codes.

  18. Fortran code for SU(3) lattice gauge theory with and without MPI checkerboard parallelization

    Science.gov (United States)

    Berg, Bernd A.; Wu, Hao

    2012-10-01

    We document plain Fortran and Fortran MPI checkerboard code for Markov chain Monte Carlo simulations of pure SU(3) lattice gauge theory with the Wilson action in D dimensions. The Fortran code uses periodic boundary conditions and is suitable for pedagogical purposes and small scale simulations. For the Fortran MPI code two geometries are covered: the usual torus with periodic boundary conditions and the double-layered torus as defined in the paper. Parallel computing is performed on checkerboards of sublattices, which partition the full lattice in one, two, and so on, up to D directions (depending on the parameters set). For updating, the Cabibbo-Marinari heatbath algorithm is used. We present validations and test runs of the code. Performance is reported for a number of currently used Fortran compilers and, when applicable, MPI versions. For the parallelized code, performance is studied as a function of the number of processors. Program summary Program title: STMC2LSU3MPI Catalogue identifier: AEMJ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEMJ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 26666 No. of bytes in distributed program, including test data, etc.: 233126 Distribution format: tar.gz Programming language: Fortran 77 compatible with the use of Fortran 90/95 compilers, in part with MPI extensions. Computer: Any capable of compiling and executing Fortran 77 or Fortran 90/95, when needed with MPI extensions. Operating system: Red Hat Enterprise Linux Server 6.1 with OpenMPI + pgf77 11.8-0, Centos 5.3 with OpenMPI + gfortran 4.1.2, Cray XT4 with MPICH2 + pgf90 11.2-0. Has the code been vectorised or parallelized?: Yes, parallelized using MPI extensions. Number of processors used: 2 to 11664 RAM: 200 Mega bytes per process. Classification: 11

  19. A Case Study of Some Issues in the Optimization of Fortran 90 Array Notation

    Directory of Open Access Journals (Sweden)

    John D. McCalpin

    1996-01-01

    Full Text Available Some issues in the relationship of coding style and compiler optimization are discussed with regard to Fortran 90 array notation. A review of several important Fortran 90 array constructs and their performance on vector and scalar hardware sets the stage for a more detailed example based on the kernel of a finite difference computational fluid dynamics model, specifically the nonlinear shallow water equations. Special attention is paid to the optimization of memory use and memory traffic. It is shown that the style of coding interacts with the rules of Fortran 90 and the current state of the art of Fortran 90 compilers to produce a fairly wide range of performance levels. Although performance degradations are typically small, a few cases of more serious loss of effciency are identified and discussed.

  20. LFK, FORTRAN Application Performance Test

    International Nuclear Information System (INIS)

    McMahon, F.H.

    1991-01-01

    1 - Description of program or function: LFK, the Livermore FORTRAN Kernels, is a computer performance test that measures a realistic floating-point performance range for FORTRAN applications. Informally known as the Livermore Loops test, the LFK test may be used as a computer performance test, as a test of compiler accuracy (via checksums) and efficiency, or as a hardware endurance test. The LFK test, which focuses on FORTRAN as used in computational physics, measures the joint performance of the computer CPU, the compiler, and the computational structures in units of Mega-flops/sec or Mflops. A C language version of subroutine KERNEL is also included which executes 24 samples of C numerical computation. The 24 kernels are a hydrodynamics code fragment, a fragment from an incomplete Cholesky conjugate gradient code, the standard inner product function of linear algebra, a fragment from a banded linear equations routine, a segment of a tridiagonal elimination routine, an example of a general linear recurrence equation, an equation of state fragment, part of an alternating direction implicit integration code, an integrate predictor code, a difference predictor code, a first sum, a first difference, a fragment from a two-dimensional particle-in-cell code, a part of a one-dimensional particle-in-cell code, an example of how casually FORTRAN can be written, a Monte Carlo search loop, an example of an implicit conditional computation, a fragment of a two-dimensional explicit hydrodynamics code, a general linear recurrence equation, part of a discrete ordinates transport program, a simple matrix calculation, a segment of a Planck distribution procedure, a two-dimensional implicit hydrodynamics fragment, and determination of the location of the first minimum in an array. 2 - Method of solution: CPU performance rates depend strongly on the maturity of FORTRAN compiler machine code optimization. The LFK test-bed executes the set of 24 kernels three times, resetting the DO

  1. Development of the static analyzer ANALYSIS/EX for FORTRAN programs

    International Nuclear Information System (INIS)

    Osanai, Seiji; Yokokawa, Mitsuo

    1993-08-01

    The static analyzer 'ANALYSIS' is the software tool for analyzing tree structure and COMMON regions of a FORTRAN program statically. With the installation of the new FORTRAN compiler, FORTRAN77EX(V12), to the computer system at JAERI, a new version of ANALYSIS, 'ANALYSIS/EX', has been developed to enhance its analyzing functions. In addition to the conventional functions of ANALYSIS, the ANALYSIS/EX is capable of analyzing of FORTRAN programs written in the FORTRAN77EX(V12) language grammar such as large-scale nuclear codes. The analyzing function of COMMON regions are also improved so as to obtain the relation between variables in COMMON regions in more detail. In this report, results of improvement and enhanced functions of the static analyzer ANALYSIS/EX are presented. (author)

  2. An INTEL 8080 microprocessor development system

    International Nuclear Information System (INIS)

    Horne, P.J.

    1977-01-01

    The INTEL 8080 has become one of the two most widely used microprocessors at CERN, the other being the MOTOROLA 6800. Even thouth this is the case, there have been, to date, only rudimentary facilities available for aiding the development of application programs for this microprocessor. An ideal development system is one which has a sophisticated editing and filing system, an assembler/compiler, and access to the microprocessor application. In many instances access to a PROM programmer is also required, as the application may utilize only PROMs for program storage. With these thoughts in mind, an INTEL 8080 microprocessor development system was implemented in the Proton Synchrotron (PS) Division. This system utilizes a PDP 11/45 as the editing and file-handling machine, and an MSC 8/MOD 80 microcomputer for assembling, PROM programming and debugging user programs at run time. The two machines are linked by an existing CAMAC crate system which will also provide the means of access to microprocessor applications in CAMAC and the interface of the development system to any other application. (Auth.)

  3. Multi-processing CTH: Porting legacy FORTRAN code to MP hardware

    Energy Technology Data Exchange (ETDEWEB)

    Bell, R.L.; Elrick, M.G.; Hertel, E.S. Jr.

    1996-12-31

    CTH is a family of codes developed at Sandia National Laboratories for use in modeling complex multi-dimensional, multi-material problems that are characterized by large deformations and/or strong shocks. A two-step, second-order accurate Eulerian solution algorithm is used to solve the mass, momentum, and energy conservation equations. CTH has historically been run on systems where the data are directly accessible to the cpu, such as workstations and vector supercomputers. Multiple cpus can be used if all data are accessible to all cpus. This is accomplished by placing compiler directives or subroutine calls within the source code. The CTH team has implemented this scheme for Cray shared memory machines under the Unicos operating system. This technique is effective, but difficult to port to other (similar) shared memory architectures because each vendor has a different format of directives or subroutine calls. A different model of high performance computing is one where many (> 1,000) cpus work on a portion of the entire problem and communicate by passing messages that contain boundary data. Most, if not all, codes that run effectively on parallel hardware were written with a parallel computing paradigm in mind. Modifying an existing code written for serial nodes poses a significantly different set of challenges that will be discussed. CTH, a legacy FORTRAN code, has been modified to allow for solutions on distributed memory parallel computers such as the IBM SP2, the Intel Paragon, Cray T3D, or a network of workstations. The message passing version of CTH will be discussed and example calculations will be presented along with performance data. Current timing studies indicate that CTH is 2--3 times faster than equivalent C++ code written specifically for parallel hardware. CTH on the Intel Paragon exhibits linear speed up with problems that are scaled (constant problem size per node) for the number of parallel nodes.

  4. Intel: High Throughput Computing Collaboration: A CERN openlab / Intel collaboration

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    The Intel/CERN High Throughput Computing Collaboration studies the application of upcoming Intel technologies to the very challenging environment of the LHC trigger and data-acquisition systems. These systems will need to transport and process many terabits of data every second, in some cases with tight latency constraints. Parallelisation and tight integration of accelerators and classical CPU via Intel's OmniPath fabric are the key elements in this project.

  5. Cloudy's Journey from FORTRAN to C, Why and How

    Science.gov (United States)

    Ferland, G. J.

    Cloudy is a large-scale plasma simulation code that is widely used across the astronomical community as an aid in the interpretation of spectroscopic data. The cover of the ADAS VI book featured predictions of the code. The FORTRAN 77 source code has always been freely available on the Internet, contributing to its widespread use. The coming of PCs and Linux has fundamentally changed the computing environment. Modern Fortran compilers (F90 and F95) are not freely available. A common-use code must be written in either FORTRAN 77 or C to be Open Source/GNU/Linux friendly. F77 has serious drawbacks - modern language constructs cannot be used, students do not have skills in this language, and it does not contribute to their future employability. It became clear that the code would have to be ported to C to have a viable future. I describe the approach I used to convert Cloudy from FORTRAN 77 with MILSPEC extensions to ANSI/ISO 89 C. Cloudy is now openly available as a C code, and will evolve to C++ as gcc and standard C++ mature. Cloudy looks to a bright future with a modern language.

  6. Extension of the AMBER molecular dynamics software to Intel's Many Integrated Core (MIC) architecture

    Science.gov (United States)

    Needham, Perri J.; Bhuiyan, Ashraf; Walker, Ross C.

    2016-04-01

    We present an implementation of explicit solvent particle mesh Ewald (PME) classical molecular dynamics (MD) within the PMEMD molecular dynamics engine, that forms part of the AMBER v14 MD software package, that makes use of Intel Xeon Phi coprocessors by offloading portions of the PME direct summation and neighbor list build to the coprocessor. We refer to this implementation as pmemd MIC offload and in this paper present the technical details of the algorithm, including basic models for MPI and OpenMP configuration, and analyze the resultant performance. The algorithm provides the best performance improvement for large systems (>400,000 atoms), achieving a ∼35% performance improvement for satellite tobacco mosaic virus (1,067,095 atoms) when 2 Intel E5-2697 v2 processors (2 ×12 cores, 30M cache, 2.7 GHz) are coupled to an Intel Xeon Phi coprocessor (Model 7120P-1.238/1.333 GHz, 61 cores). The implementation utilizes a two-fold decomposition strategy: spatial decomposition using an MPI library and thread-based decomposition using OpenMP. We also present compiler optimization settings that improve the performance on Intel Xeon processors, while retaining simulation accuracy.

  7. Visualization of Distributed Data Structures for High Performance Fortran-Like Languages

    Directory of Open Access Journals (Sweden)

    Rainer Koppler

    1997-01-01

    Full Text Available This article motivates the usage of graphics and visualization for efficient utilization of High Performance Fortran's (HPF's data distribution facilities. It proposes a graphical toolkit consisting of exploratory and estimation tools which allow the programmer to navigate through complex distributions and to obtain graphical ratings with respect to load distribution and communication. The toolkit has been implemented in a mapping design and visualization tool which is coupled with a compilation system for the HPF predecessor Vienna Fortran. Since this language covers a superset of HPF's facilities, the tool may also be used for visualization of HPF data structures.

  8. Kemari: A Portable High Performance Fortran System for Distributed Memory Parallel Processors

    Directory of Open Access Journals (Sweden)

    T. Kamachi

    1997-01-01

    Full Text Available We have developed a compilation system which extends High Performance Fortran (HPF in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives give both additional control to the user and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows integration of message-passing interface (MPI primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.

  9. Standard Fortran

    International Nuclear Information System (INIS)

    Marshall, N.H.

    1981-01-01

    Because of its vast software investment in Fortran programs, the nuclear community has an inherent interest in the evolution of Fortran. This paper reviews the impact of the new Fortran 77 standard and discusses the projected changes which can be expected in the future

  10. Intel Galileo essentials

    CERN Document Server

    Grimmett, Richard

    2015-01-01

    This book is for anyone who has ever been curious about using the Intel Galileo to create electronics projects. Some programming background is useful, but if you know how to use a personal computer, with the aid of the step-by-step instructions in this book, you can construct complex electronics projects that use the Intel Galileo.

  11. Fortran

    CERN Document Server

    Marateck, Samuel L

    1977-01-01

    FORTRAN is written for students who have no prior knowledge of computers or programming. The book aims to teach students how to program using the FORTRAN language.The publication first elaborates on an introduction to computers and programming, introduction to FORTRAN, and calculations and the READ statement. Discussions focus on flow charts, rounding numbers, strings, executing the program, the WRITE and FORMAT statements, performing an addition, input and output devices, and algorithms. The text then takes a look at functions and the IF statement and the DO Loop, the IF-THEN-ELSE and the WHI

  12. DTK C/Fortran Interface Development for NEAMS FSI Simulations

    Energy Technology Data Exchange (ETDEWEB)

    Slattery, Stuart R. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Lebrun-Grandie, Damien T. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2016-09-19

    This report documents the development of DataTransferKit (DTK) C and Fortran interfaces for fluid-structure-interaction (FSI) simulations in NEAMS. In these simulations, the codes Nek5000 and Diablo are being coupled within the SHARP framework to study flow-induced vibration (FIV) in reactor steam generators. We will review the current Nek5000/Diablo coupling algorithm in SHARP and the current state of the solution transfer scheme used in this implementation. We will then present existing DTK algorithms which may be used instead to provide an improvement in both flexibility and scalability of the current SHARP implementation. We will show how these can be used within the current FSI scheme using a new set of interfaces to the algorithms developed by this work. These new interfaces currently expose the mesh-free solution transfer algorithms in DTK, a C++ library, and are written in C and Fortran to enable coupling of both Nek5000 and Diablo in their native Fortran language. They have been compiled and tested on Cooley, the test-bed machine for Mira at ALCF.

  13. Intel Xeon Phi coprocessor high performance programming

    CERN Document Server

    Jeffers, James

    2013-01-01

    Authors Jim Jeffers and James Reinders spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel Xeon Phi coprocessor. They have distilled their own experiences coupled with insights from many expert customers, Intel Field Engineers, Application Engineers and Technical Consulting Engineers, to create this authoritative first book on the essentials of programming for this new architecture and these new products. This book is useful even before you ever touch a system with an Intel Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high performance microprocessors. Applying these techniques will generally increase your program performance on any system, and better prepare you for Intel Xeon Phi coprocessors and the Intel MIC architecture. It off...

  14. Automatic Loop Parallelization via Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities...... for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential...... benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...

  15. An environment for parallel structuring of Fortran programs

    International Nuclear Information System (INIS)

    Sridharan, K.; McShea, M.; Denton, C.; Eventoff, B.; Browne, J.C.; Newton, P.; Ellis, M.; Grossbard, D.; Wise, T.; Clemmer, D.

    1990-01-01

    The paper describes and illustrates an environment for interactive support of the detection and implementation of macro-level parallelism in Fortran programs. The approach couples algorithms for dependence analysis with both innovative techniques for complexity management and capabilities for the measurement and analysis of the parallel computation structures generated through use of the environment. The resulting environment is complementary to the more common approach of seeking local parallelism by loop unrolling, either by an automatic compiler or manually. (orig.)

  16. Emulation of MS DOS Operational System on the Autonomous Crate-Controller with I8086 microprocessor

    International Nuclear Information System (INIS)

    Hons, Z.; Cizek, P.; Streit, V.

    1988-01-01

    KM-DOS operating system for CAMAC autonomous crate-controller based on Intel 8086/8087 microprocessor connected with Pravec-16 IBM PC is described. The KM-DOS system fully emulates the MS DOS environment on the CAMAC controller. Thus ASSEMBLER, FORTRAN, C and PASCAL programs compiled and linked on IBM PC and compatible can be run on the CAMAC controller and parall work of both computers is enabled

  17. Home automation with Intel Galileo

    CERN Document Server

    Dundar, Onur

    2015-01-01

    This book is for anyone who wants to learn Intel Galileo for home automation and cross-platform software development. No knowledge of programming with Intel Galileo is assumed, but knowledge of the C programming language is essential.

  18. DB90: A Fortran Callable Relational Database Routine for Scientific and Engineering Computer Programs

    Science.gov (United States)

    Wrenn, Gregory A.

    2005-01-01

    This report describes a database routine called DB90 which is intended for use with scientific and engineering computer programs. The software is written in the Fortran 90/95 programming language standard with file input and output routines written in the C programming language. These routines should be completely portable to any computing platform and operating system that has Fortran 90/95 and C compilers. DB90 allows a program to supply relation names and up to 5 integer key values to uniquely identify each record of each relation. This permits the user to select records or retrieve data in any desired order.

  19. Unlock performance secrets of next-gen Intel hardware

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    Intel® Xeon Phi Product. About the speaker Zakhar is a software architect in Intel SSG group. His current role is Parallel Studio architect with focus on SIMD vector parallelism assistance tools. Before it he was working as Intel Advisor XE software architect and software development team-lead. Before joining Intel he was...

  20. FORTRAN program for calculating liquid-phase and gas-phase thermal diffusion column coefficients

    International Nuclear Information System (INIS)

    Rutherford, W.M.

    1980-01-01

    A computer program (COLCO) was developed for calculating thermal diffusion column coefficients from theory. The program, which is written in FORTRAN IV, can be used for both liquid-phase and gas-phase thermal diffusion columns. Column coefficients for the gas phase can be based on gas properties calculated from kinetic theory using tables of omega integrals or on tables of compiled physical properties as functions of temperature. Column coefficients for the liquid phase can be based on compiled physical property tables. Program listings, test data, sample output, and users manual are supplied for appendices

  1. Initial results on computational performance of Intel Many Integrated Core (MIC) architecture: implementation of the Weather and Research Forecasting (WRF) Purdue-Lin microphysics scheme

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydro meteors: water vapor, cloud water, raid, cloud ice, snow and graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue Lin scheme using Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi is a high performance coprocessor consists of up to 61 cores. The Xeon Phi is connected to a CPU via the PCI Express (PICe) bus. In this paper, we will discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for Xeon Phi. In particularly, getting a good performance required utilizing multiple cores, the wide vector operations and make efficient use of memory. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 4.2x. Furthermore, the same optimizations improved performance on Intel Xeon E5-2603 CPU by a factor of 1.2x compared to the original code.

  2. INTEL: Intel based systems move up in supercomputing ranks

    CERN Multimedia

    2002-01-01

    "The TOP500 supercomputer rankings released today at the Supercomputing 2002 conference show a dramatic increase in the number of Intel-based systems being deployed in high-performance computing (HPC) or supercomputing areas" (1/2 page).

  3. Applications Performance on NAS Intel Paragon XP/S - 15#

    Science.gov (United States)

    Saini, Subhash; Simon, Horst D.; Copper, D. M. (Technical Monitor)

    1994-01-01

    The Numerical Aerodynamic Simulation (NAS) Systems Division received an Intel Touchstone Sigma prototype model Paragon XP/S- 15 in February, 1993. The i860 XP microprocessor with an integrated floating point unit and operating in dual -instruction mode gives peak performance of 75 million floating point operations (NIFLOPS) per second for 64 bit floating point arithmetic. It is used in the Paragon XP/S-15 which has been installed at NAS, NASA Ames Research Center. The NAS Paragon has 208 nodes and its peak performance is 15.6 GFLOPS. Here, we will report on early experience using the Paragon XP/S- 15. We have tested its performance using both kernels and applications of interest to NAS. We have measured the performance of BLAS 1, 2 and 3 both assembly-coded and Fortran coded on NAS Paragon XP/S- 15. Furthermore, we have investigated the performance of a single node one-dimensional FFT, a distributed two-dimensional FFT and a distributed three-dimensional FFT Finally, we measured the performance of NAS Parallel Benchmarks (NPB) on the Paragon and compare it with the performance obtained on other highly parallel machines, such as CM-5, CRAY T3D, IBM SP I, etc. In particular, we investigated the following issues, which can strongly affect the performance of the Paragon: a. Impact of the operating system: Intel currently uses as a default an operating system OSF/1 AD from the Open Software Foundation. The paging of Open Software Foundation (OSF) server at 22 MB to make more memory available for the application degrades the performance. We found that when the limit of 26 NIB per node out of 32 MB available is reached, the application is paged out of main memory using virtual memory. When the application starts paging, the performance is considerably reduced. We found that dynamic memory allocation can help applications performance under certain circumstances. b. Impact of data cache on the i860/XP: We measured the performance of the BLAS both assembly coded and Fortran

  4. Portable parallel programming in a Fortran environment

    International Nuclear Information System (INIS)

    May, E.N.

    1989-01-01

    Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a ''nearly realistic'' lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs

  5. Roofline Analysis in the Intel® Advisor to Deliver Optimized Performance for applications on Intel® Xeon Phi™ Processor

    OpenAIRE

    Koskela, TS; Lobet, M

    2017-01-01

    In this session we show, in two case studies, how the roofline feature of Intel Advisor has been utilized to optimize the performance of kernels of the XGC1 and PICSAR codes in preparation for Intel Knights Landing architecture. The impact of the implemented optimizations and the benefits of using the automatic roofline feature of Intel Advisor to study performance of large applications will be presented. This demonstrates an effective optimization strategy that has enabled these science appl...

  6. MULTITASKER, Multitasking Kernel for C and FORTRAN Under UNIX

    International Nuclear Information System (INIS)

    Brooks, E.D. III

    1988-01-01

    1 - Description of program or function: MULTITASKER implements a multitasking kernel for the C and FORTRAN programming languages that runs under UNIX. The kernel provides a multitasking environment which serves two purposes. The first is to provide an efficient portable environment for the development, debugging, and execution of production multiprocessor programs. The second is to provide a means of evaluating the performance of a multitasking program on model multiprocessor hardware. The performance evaluation features require no changes in the application program source and are implemented as a set of compile- and run-time options in the kernel. 2 - Method of solution: The FORTRAN interface to the kernel is identical in function to the CRI multitasking package provided for the Cray XMP. This provides a migration path to high speed (but small N) multiprocessors once the application has been coded and debugged. With use of the UNIX m4 macro preprocessor, source compatibility can be achieved between the UNIX code development system and the target Cray multiprocessor. The kernel also provides a means of evaluating a program's performance on model multiprocessors. Execution traces may be obtained which allow the user to determine kernel overhead, memory conflicts between various tasks, and the average concurrency being exploited. The kernel may also be made to switch tasks every cpu instruction with a random execution ordering. This allows the user to look for unprotected critical regions in the program. These features, implemented as a set of compile- and run-time options, cause extra execution overhead which is not present in the standard production version of the kernel

  7. Computation cluster for Monte Carlo calculations

    Energy Technology Data Exchange (ETDEWEB)

    Petriska, M.; Vitazek, K.; Farkas, G.; Stacho, M.; Michalek, S. [Dep. Of Nuclear Physics and Technology, Faculty of Electrical Engineering and Information, Technology, Slovak Technical University, Ilkovicova 3, 81219 Bratislava (Slovakia)

    2010-07-01

    Two computation clusters based on Rocks Clusters 5.1 Linux distribution with Intel Core Duo and Intel Core Quad based computers were made at the Department of the Nuclear Physics and Technology. Clusters were used for Monte Carlo calculations, specifically for MCNP calculations applied in Nuclear reactor core simulations. Optimization for computation speed was made on hardware and software basis. Hardware cluster parameters, such as size of the memory, network speed, CPU speed, number of processors per computation, number of processors in one computer were tested for shortening the calculation time. For software optimization, different Fortran compilers, MPI implementations and CPU multi-core libraries were tested. Finally computer cluster was used in finding the weighting functions of neutron ex-core detectors of VVER-440. (authors)

  8. Computation cluster for Monte Carlo calculations

    International Nuclear Information System (INIS)

    Petriska, M.; Vitazek, K.; Farkas, G.; Stacho, M.; Michalek, S.

    2010-01-01

    Two computation clusters based on Rocks Clusters 5.1 Linux distribution with Intel Core Duo and Intel Core Quad based computers were made at the Department of the Nuclear Physics and Technology. Clusters were used for Monte Carlo calculations, specifically for MCNP calculations applied in Nuclear reactor core simulations. Optimization for computation speed was made on hardware and software basis. Hardware cluster parameters, such as size of the memory, network speed, CPU speed, number of processors per computation, number of processors in one computer were tested for shortening the calculation time. For software optimization, different Fortran compilers, MPI implementations and CPU multi-core libraries were tested. Finally computer cluster was used in finding the weighting functions of neutron ex-core detectors of VVER-440. (authors)

  9. Fortran 90 for scientists and engineers

    CERN Document Server

    Hahn, Brian

    1994-01-01

    The introduction of the Fortran 90 standard is the first significant change in the Fortran language in over 20 years. this book is designed for anyone wanting to learn Fortran for the first time or or a programmer who needs to upgrade from Fortran 77 to Fortran 90.Employing a practical, problem-based approach this book provides a comprehensive introduction to the language. More experienced programmers will find it a useful update to the new standard and will benefit from the emphasis on science and engineering applications.

  10. A Linear Algebra Framework for Static High Performance Fortran Code Distribution

    Directory of Open Access Journals (Sweden)

    Corinne Ancourt

    1997-01-01

    Full Text Available High Performance Fortran (HPF was developed to support data parallel programming for single-instruction multiple-data (SIMD and multiple-instruction multiple-data (MIMD machines with distributed memory. The programmer is provided a familiar uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to allocate arrays in the local memories, to assign computations to elementary processors, and to migrate data between processors when required. We show here that linear algebra is a powerful framework to encode HPF directives and to synthesize distributed code with space-efficient array allocation, tight loop bounds, and vectorized communications for INDEPENDENT loops. The generated code includes traditional optimizations such as guard elimination, message vectorization and aggregation, and overlap analysis. The systematic use of an affine framework makes it possible to prove the compilation scheme correct.

  11. Modern Fortran in practice

    NARCIS (Netherlands)

    Markus, A.

    2012-01-01

    From its earliest days, the Fortran programming language has been designed with computing efficiency in mind. The latest standard, Fortran 2008, incorporates a host of modern features, including object-orientation, array operations, user-defined types, and provisions for parallel computing. This

  12. [Intel random number generator-based true random number generator].

    Science.gov (United States)

    Huang, Feng; Shen, Hong

    2004-09-01

    To establish a true random number generator on the basis of certain Intel chips. The random numbers were acquired by programming using Microsoft Visual C++ 6.0 via register reading from the random number generator (RNG) unit of an Intel 815 chipset-based computer with Intel Security Driver (ISD). We tested the generator with 500 random numbers in NIST FIPS 140-1 and X(2) R-Squared test, and the result showed that the random number it generated satisfied the demand of independence and uniform distribution. We also compared the random numbers generated by Intel RNG-based true random number generator and those from the random number table statistically, by using the same amount of 7500 random numbers in the same value domain, which showed that the SD, SE and CV of Intel RNG-based random number generator were less than those of the random number table. The result of u test of two CVs revealed no significant difference between the two methods. Intel RNG-based random number generator can produce high-quality random numbers with good independence and uniform distribution, and solves some problems with random number table in acquisition of the random numbers.

  13. Using the Intel Math Kernel Library on Peregrine | High-Performance

    Science.gov (United States)

    Computing | NREL the Intel Math Kernel Library on Peregrine Using the Intel Math Kernel Library on Peregrine Learn how to use the Intel Math Kernel Library (MKL) with Peregrine system software. MKL architectures. Core math functions in MKL include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier

  14. Final Report, Center for Programming Models for Scalable Parallel Computing: Co-Array Fortran, Grant Number DE-FC02-01ER25505

    Energy Technology Data Exchange (ETDEWEB)

    Robert W. Numrich

    2008-04-22

    The major accomplishment of this project is the production of CafLib, an 'object-oriented' parallel numerical library written in Co-Array Fortran. CafLib contains distributed objects such as block vectors and block matrices along with procedures, attached to each object, that perform basic linear algebra operations such as matrix multiplication, matrix transpose and LU decomposition. It also contains constructors and destructors for each object that hide the details of data decomposition from the programmer, and it contains collective operations that allow the programmer to calculate global reductions, such as global sums, global minima and global maxima, as well as vector and matrix norms of several kinds. CafLib is designed to be extensible in such a way that programmers can define distributed grid and field objects, based on vector and matrix objects from the library, for finite difference algorithms to solve partial differential equations. A very important extra benefit that resulted from the project is the inclusion of the co-array programming model in the next Fortran standard called Fortran 2008. It is the first parallel programming model ever included as a standard part of the language. Co-arrays will be a supported feature in all Fortran compilers, and the portability provided by standardization will encourage a large number of programmers to adopt it for new parallel application development. The combination of object-oriented programming in Fortran 2003 with co-arrays in Fortran 2008 provides a very powerful programming model for high-performance scientific computing. Additional benefits from the project, beyond the original goal, include a programto provide access to the co-array model through access to the Cray compiler as a resource for teaching and research. Several academics, for the first time, included the co-array model as a topic in their courses on parallel computing. A separate collaborative project with LANL and PNNL showed how to

  15. Computer and compiler effects on code results: status report

    International Nuclear Information System (INIS)

    1996-01-01

    Within the framework of the international effort on the assessment of computer codes, which are designed to describe the overall reactor coolant system (RCS) thermalhydraulic response, core damage progression, and fission product release and transport during severe accidents, there has been a continuous debate as to whether the code results are influenced by different code users or by different computers or compilers. The first aspect, the 'Code User Effect', has been investigated already. In this paper the other aspects will be discussed and proposals are given how to make large system codes insensitive to different computers and compilers. Hardware errors and memory problems are not considered in this report. The codes investigated herein are integrated code systems (e. g. ESTER, MELCOR) and thermalhydraulic system codes with extensions for severe accident simulation (e. g. SCDAP/RELAP, ICARE/CATHARE, ATHLET-CD), and codes to simulate fission product transport (e. g. TRAPMELT, SOPHAEROS). Since all of these codes are programmed in Fortran 77, the discussion herein is based on this programming language although some remarks are made about Fortran 90. Some observations about different code results by using different computers are reported and possible reasons for this unexpected behaviour are listed. Then methods are discussed how to avoid portability problems

  16. Comparative VME Performance Tests for MEN A20 Intel-L865 and RIO-3 PPC-LynxOS platforms

    CERN Document Server

    Andersen, M; CERN. Geneva. BE Department

    2009-01-01

    This benchmark note presents test results from reading values over VME using different methods and different sizes of data registers, running on two different platforms Intel-L865 and PPC-LynxOS. We find that the PowerPC is a factor 3 faster in accessing an array of contiguous VME memory locations. Block transfer and DMA read accesses are also tested and compared with conventional single access reads.

  17. CERN welcomes Intel Science Fair winners

    CERN Multimedia

    Katarina Anthony

    2012-01-01

    This June, CERN welcomed twelve gifted young scientists aged 15-18 for a week-long visit of the Laboratory. These talented students were the winners of a special award co-funded by CERN and Intel, given yearly at the Intel International Science and Engineering Fair (ISEF).   The CERN award winners at the Intel ISEF 2012 Special Awards Ceremony. © Society for Science & the Public (SSP). The CERN award was set up back in 2009 as an opportunity to bring some of the best and brightest young minds to the Laboratory. The award winners are selected from among 1,500 talented students participating in ISEF – the world's largest pre-university science competition, in which students compete for more than €3 million in awards. “CERN gave an award – which was obviously this trip – to students studying physics, maths, electrical engineering and computer science,” says Benjamin Craig Bartlett, 17, from South Carolina, USA, wh...

  18. Running the EGS4 Monte Carlo code with Fortran 90 on a pentium computer

    International Nuclear Information System (INIS)

    Caon, M.; Bibbo, G.; Pattison, J.

    1996-01-01

    The possibility to run the EGS4 Monte Carlo code radiation transport system for medical radiation modelling on a microcomputer is discussed. This has been done using a Fortran 77 compiler with a 32-bit memory addressing system running under a memory extender operating system. In addition a virtual memory manager such as QEMM386 was required. It has successfully run on a SUN Sparcstation2. In 1995 faster Pentium-based microcomputers became available as did the Windows 95 operating system which can handle 32-bit programs, multitasking and provides its own virtual memory management. The paper describe how with simple modification to the batch files it was possible to run EGS4 on a Pentium under Fortran 90 and Windows 95. This combination of software and hardware is cheaper and faster than running it on a SUN Sparcstation2. 8 refs., 1 tab

  19. Running the EGS4 Monte Carlo code with Fortran 90 on a pentium computer

    Energy Technology Data Exchange (ETDEWEB)

    Caon, M. [Flinders Univ. of South Australia, Bedford Park, SA (Australia)]|[Univercity of South Australia, SA (Australia); Bibbo, G. [Womens and Childrens hospital, SA (Australia); Pattison, J. [Univercity of South Australia, SA (Australia)

    1996-09-01

    The possibility to run the EGS4 Monte Carlo code radiation transport system for medical radiation modelling on a microcomputer is discussed. This has been done using a Fortran 77 compiler with a 32-bit memory addressing system running under a memory extender operating system. In addition a virtual memory manager such as QEMM386 was required. It has successfully run on a SUN Sparcstation2. In 1995 faster Pentium-based microcomputers became available as did the Windows 95 operating system which can handle 32-bit programs, multitasking and provides its own virtual memory management. The paper describe how with simple modification to the batch files it was possible to run EGS4 on a Pentium under Fortran 90 and Windows 95. This combination of software and hardware is cheaper and faster than running it on a SUN Sparcstation2. 8 refs., 1 tab.

  20. Strategies and Experiences Using High Performance Fortran

    National Research Council Canada - National Science Library

    Shires, Dale

    2001-01-01

    .... High performance Fortran (HPF) is a relative new addition to the Fortran dialect It is an attempt to provide an efficient high-level Fortran parallel programming language for the latest generation of been debatable...

  1. Logical inference techniques for loop parallelization

    DEFF Research Database (Denmark)

    Oancea, Cosmin Eugen; Rauchwerger, Lawrence

    2012-01-01

    the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S={}, where S is a set expression representing array indexes. Using...... of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECT-CLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers....

  2. Using Kokkos for Performant Cross-Platform Acceleration of Liquid Rocket Simulations

    Science.gov (United States)

    2017-05-08

    defined functors (like Thrust or Intel TBB) Backends for Nvidia GPU, Intel Xeon, Xeon Phi , IBM Power8, others. “View” data structure provides optimal... Architecture of my Kokkos framework Designed for minimally-invasive operation alongside large Fortran code. Everything is controlled from Fortran through a...Controls Kokkos initialization/finalization void initialize(…); void finalize(…); TVProperties* gettvproperties(); Architecture of my Kokkos framework

  3. Expert Programmer versus Parallelizing Compiler: A Comparative Study of Two Approaches for Distributed Shared Memory

    Directory of Open Access Journals (Sweden)

    M. F. P. O'Boyle

    1996-01-01

    Full Text Available This article critically examines current parallel programming practice and optimizing compiler development. The general strategies employed by compiler and programmer to optimize a Fortran program are described, and then illustrated for a specific case by applying them to a well-known scientific program, TRED2, using the KSR-1 as the target architecture. Extensive measurement is applied to the resulting versions of the program, which are compared with a version produced by a commercial optimizing compiler, KAP. The compiler strategy significantly outperforms KAP and does not fall far short of the performance achieved by the programmer. Following the experimental section each approach is critiqued by the other. Perceived flaws, advantages, and common ground are outlined, with an eye to improving both schemes.

  4. Evaluation of CHO Benchmarks on the Arria 10 FPGA using Intel FPGA SDK for OpenCL

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Zheming [Argonne National Lab. (ANL), Argonne, IL (United States); Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Finkel, Hal [Argonne National Lab. (ANL), Argonne, IL (United States); Cappello, Franck [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-05-23

    The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing system. OpenCL extends the C-based programming language for developing portable codes on different platforms such as CPU, Graphics processing units (GPUs), Digital Signal Processors (DSPs) and Field Programmable Gate Arrays (FPGAs). The Intel FPGA SDK for OpenCL is a suite of tools that allows developers to abstract away the complex FPGA-based development flow for a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes the FPGA-based development more accessible to software users as the needs for hybrid computing using CPUs and FPGAs are increasing. It can also significantly reduce the hardware development time as users can evaluate different ideas with high-level language without deep FPGA domain knowledge. Benchmarking of OpenCL-based framework is an effective way for analyzing the performance of system by studying the execution of the benchmark applications. CHO is a suite of benchmark applications that provides support for OpenCL [1]. The authors presented CHO as an OpenCL port of the CHStone benchmark. Using Altera OpenCL (AOCL) compiler to synthesize the benchmark applications, they listed the resource usage and performance of each kernel that can be successfully synthesized by the compiler. In this report, we evaluate the resource usage and performance of the CHO benchmark applications using the Intel FPGA SDK for OpenCL and Nallatech 385A FPGA board that features an Arria 10 FPGA device. The focus of the report is to have a better understanding of the resource usage and performance of the kernel implementations using Arria-10 FPGA devices compared to Stratix-5 FPGA devices. In addition, we also gain knowledge about the limitations of the current compiler when it fails to synthesize a benchmark

  5. Experience with Intel's Many Integrated Core Architecture in ATLAS Software

    CERN Document Server

    Fleischmann, S; The ATLAS collaboration; Lavrijsen, W; Neumann, M; Vitillo, R

    2014-01-01

    Intel recently released the first commercial boards of its Many Integrated Core (MIC) Architecture. MIC is Intel's solution for the domain of throughput computing, currently dominated by general purpose programming on graphics processors (GPGPU). MIC allows the use of the more familiar x86 programming model and supports standard technologies such as OpenMP, MPI, and Intel's Threading Building Blocks. This should make it possible to develop for both throughput and latency devices using a single code base.\

  6. Experience with Intel's Many Integrated Core Architecture in ATLAS Software

    CERN Document Server

    Fleischmann, S; The ATLAS collaboration; Lavrijsen, W; Neumann, M; Vitillo, R

    2013-01-01

    Intel recently released the first commercial boards of its Many Integrated Core (MIC) Architecture. MIC is Intel's solution for the domain of throughput computing, currently dominated by general purpose programming on graphics processors (GPGPU). MIC allows the use of the more familiar x86 programming model and supports standard technologies such as OpenMP, MPI, and Intel's Threading Building Blocks. This should make it possible to develop for both throughput and latency devices using a single code base.\

  7. Exploiting first-class arrays in Fortran for accelerator programming

    International Nuclear Information System (INIS)

    Rasmussen, Craig E.; Weseloh, Wayne N.; Robey, Robert W.; Sottile, Matthew J.; Quinlan, Daniel; Overbey, Jeffrey

    2010-01-01

    Emerging architectures for high performance computing often are well suited to a data parallel programming model. This paper presents a simple programming methodology based on existing languages and compiler tools that allows programmers to take advantage of these systems. We will work with the array features of Fortran 90 to show how this infrequently exploited, standardized language feature is easily transformed to lower level accelerator code. Our transformations are based on a mapping from Fortran 90 to C++ code with OpenCL extensions. The sheer complexity of programming for clusters of many or multi-core processors with tens of millions threads of execution make the simplicity of the data parallel model attractive. Furthermore, the increasing complexity of todays applications (especially when convolved with the increasing complexity of the hardware) and the need for portability across hardware architectures make a higher-level and simpler programming model like data parallel attractive. The goal of this work has been to exploit source-to-source transformations that allow programmers to develop and maintain programs at a high-level of abstraction, without coding to a specific hardware architecture. Furthermore these transformations allow multiple hardware architectures to be targeted without changing the high-level source. It also removes the necessity for application programmers to understand details of the accelerator architecture or to know OpenCL.

  8. An Extensible Open-Source Compiler Infrastructure for Testing

    Energy Technology Data Exchange (ETDEWEB)

    Quinlan, D; Ur, S; Vuduc, R

    2005-12-09

    Testing forms a critical part of the development process for large-scale software, and there is growing need for automated tools that can read, represent, analyze, and transform the application's source code to help carry out testing tasks. However, the support required to compile applications written in common general purpose languages is generally inaccessible to the testing research community. In this paper, we report on an extensible, open-source compiler infrastructure called ROSE, which is currently in development at Lawrence Livermore National Laboratory. ROSE specifically targets developers who wish to build source-based tools that implement customized analyses and optimizations for large-scale C, C++, and Fortran90 scientific computing applications (on the order of a million lines of code or more). However, much of this infrastructure can also be used to address problems in testing, and ROSE is by design broadly accessible to those without a formal compiler background. This paper details the interactions between testing of applications and the ways in which compiler technology can aid in the understanding of those applications. We emphasize the particular aspects of ROSE, such as support for the general analysis of whole programs, that are particularly well-suited to the testing research community and the scale of the problems that community solves.

  9. LISP software generative compilation within the frame of a SLIP system; La compilation generative de programmes LISP dans le cadre d'un systeme SLIP

    Energy Technology Data Exchange (ETDEWEB)

    Sitbon, Andre

    1968-04-24

    After having outlined the limitations associated with the use of some programming languages (Fortran, Algol, assembler, and so on), and the interest of the use of the LISP structure and its associated language, the author notices that some problems remain regarding the memorisation of the computing process obtained by interpretation. Thus, he introduces a generative compiler which produces an executable programme, and which is written in a language very close to the used machine language, i.e. the FAP assembler language.

  10. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Kuppangari Krishna RAO; Fazal NOORBASHA; Ram Asaray SINGH

    2010-01-01

    As the processor architecture becomes more advanced, Intel introduced its Intel Core 2 Duo series processors. Performance impact on Intel Core 2 Duo processors are analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI) benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT) @1 GHz, @2 GHz and @3 GHz Intel Core 2 Duo series processors frequency. Our results show the performance scalab...

  11. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2012-08-01

    Full Text Available High performance is a critical requirement to all microprocessors manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310. The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many key application areas in modern generation. The scaling of performance in two major series of Intel Xeon processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310 has been analyzed using the performance numbers of 12 CPU2006 integer benchmarks, performance numbers that exhibit significant differences in performance. The results and analysis can be used by performance engineers, scientists and developers to better understand the performance scaling in modern generation processors.

  12. Theorem Proving in Intel Hardware Design

    Science.gov (United States)

    O'Leary, John

    2009-01-01

    For the past decade, a framework combining model checking (symbolic trajectory evaluation) and higher-order logic theorem proving has been in production use at Intel. Our tools and methodology have been used to formally verify execution cluster functionality (including floating-point operations) for a number of Intel products, including the Pentium(Registered TradeMark)4 and Core(TradeMark)i7 processors. Hardware verification in 2009 is much more challenging than it was in 1999 - today s CPU chip designs contain many processor cores and significant firmware content. This talk will attempt to distill the lessons learned over the past ten years, discuss how they apply to today s problems, outline some future directions.

  13. Adaptation of MPDATA Heterogeneous Stencil Computation to Intel Xeon Phi Coprocessor

    Directory of Open Access Journals (Sweden)

    Lukasz Szustak

    2015-01-01

    Full Text Available The multidimensional positive definite advection transport algorithm (MPDATA belongs to the group of nonoscillatory forward-in-time algorithms and performs a sequence of stencil computations. MPDATA is one of the major parts of the dynamic core of the EULAG geophysical model. In this work, we outline an approach to adaptation of the 3D MPDATA algorithm to the Intel MIC architecture. In order to utilize available computing resources, we propose the (3 + 1D decomposition of MPDATA heterogeneous stencil computations. This approach is based on combination of the loop tiling and fusion techniques. It allows us to ease memory/communication bounds and better exploit the theoretical floating point efficiency of target computing platforms. An important method of improving the efficiency of the (3 + 1D decomposition is partitioning of available cores/threads into work teams. It permits for reducing inter-cache communication overheads. This method also increases opportunities for the efficient distribution of MPDATA computation onto available resources of the Intel MIC architecture, as well as Intel CPUs. We discuss preliminary performance results obtained on two hybrid platforms, containing two CPUs and Intel Xeon Phi. The top-of-the-line Intel Xeon Phi 7120P gives the best performance results, and executes MPDATA almost 2 times faster than two Intel Xeon E5-2697v2 CPUs.

  14. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Ram Asaray SINGH

    2012-01-01

    High performance is a critical requirement to all microprocessors manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...

  15. MILC staggered conjugate gradient performance on Intel KNL

    OpenAIRE

    DeTar, Carleton; Doerfler, Douglas; Gottlieb, Steven; Jha, Ashish; Kalamkar, Dhiraj; Li, Ruizi; Toussaint, Doug

    2016-01-01

    We review our work done to optimize the staggered conjugate gradient (CG) algorithm in the MILC code for use with the Intel Knights Landing (KNL) architecture. KNL is the second gener- ation Intel Xeon Phi processor. It is capable of massive thread parallelism, data parallelism, and high on-board memory bandwidth and is being adopted in supercomputing centers for scientific research. The CG solver consumes the majority of time in production running, so we have spent most of our effort on it. ...

  16. LISP software generative compilation within the frame of a SLIP system

    International Nuclear Information System (INIS)

    Sitbon, Andre

    1968-01-01

    After having outlined the limitations associated with the use of some programming languages (Fortran, Algol, assembler, and so on), and the interest of the use of the LISP structure and its associated language, the author notices that some problems remain regarding the memorisation of the computing process obtained by interpretation. Thus, he introduces a generative compiler which produces an executable programme, and which is written in a language very close to the used machine language, i.e. the FAP assembler language

  17. A portable approach for PIC on emerging architectures

    Science.gov (United States)

    Decyk, Viktor

    2016-03-01

    A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers, is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programing, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.

  18. Performance of Artificial Intelligence Workloads on the Intel Core 2 Duo Series Desktop Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2010-12-01

    Full Text Available As the processor architecture becomes more advanced, Intel introduced its Intel Core 2 Duo series processors. Performance impact on Intel Core 2 Duo processors are analyzed using SPEC CPU INT 2006 performance numbers. This paper studied the behavior of Artificial Intelligence (AI benchmarks on Intel Core 2 Duo series processors. Moreover, we estimated the task completion time (TCT @1 GHz, @2 GHz and @3 GHz Intel Core 2 Duo series processors frequency. Our results show the performance scalability in Intel Core 2 Duo series processors. Even though AI benchmarks have similar execution time, they have dissimilar characteristics which are identified using principal component analysis and dendogram. As the processor frequency increased from 1.8 GHz to 3.167 GHz the execution time is decreased by ~370 sec for AI workloads. In the case of Physics/Quantum Computing programs it was ~940 sec.

  19. High Performance Object-Oriented Scientific Programming in Fortran 90

    Science.gov (United States)

    Norton, Charles D.; Decyk, Viktor K.; Szymanski, Boleslaw K.

    1997-01-01

    We illustrate how Fortran 90 supports object-oriented concepts by example of plasma particle computations on the IBM SP. Our experience shows that Fortran 90 and object-oriented methodology give high performance while providing a bridge from Fortran 77 legacy codes to modern programming principles. All of our object-oriented Fortran 90 codes execute more quickly thatn the equeivalent C++ versions, yet the abstraction modelling capabilities used for scentific programming are comparably powereful.

  20. Parallelization of particle transport using Intel® TBB

    International Nuclear Information System (INIS)

    Apostolakis, J; Brun, R; Carminati, F; Gheata, A; Wenzel, S; Belogurov, S; Ovcharenko, E

    2014-01-01

    One of the current challenges in HEP computing is the development of particle propagation algorithms capable of efficiently use all performance aspects of modern computing devices. The Geant-Vector project at CERN has recently introduced an approach in this direction. This paper describes the implementation of a similar workflow using the Intel(r) Threading Building Blocks (Intel(r) TBB) library. This approach is intended to overcome the potential bottleneck of having a single dispatcher on many-core architectures and to result in better scalability compared to the initial pthreads-based version.

  1. Performance of a plasma fluid code on the Intel parallel computers

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Leboeuf, J.N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel σ machine gives an improvement factor close to 64 over the single-processor CRAY-2

  2. Performance of a plasma fluid code on the Intel parallel computers

    Science.gov (United States)

    Lynch, V. E.; Carreras, B. A.; Drake, J. B.; Leboeuf, J. N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2.

  3. Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations

    Science.gov (United States)

    Bernaschi, M.; Bisson, M.; Salvadore, F.

    2014-10-01

    We present and compare the performances of two many-core architectures: the Nvidia Kepler and the Intel MIC both in a single system and in cluster configuration for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the Over-relaxation algorithm. We present data also for a traditional high-end multi-core architecture: the Intel Sandy Bridge. The results show that although on the two Intel architectures it is possible to use basically the same code, the performances of a Intel MIC change dramatically depending on (apparently) minor details. Another issue is that to obtain a reasonable scalability with the Intel Phi coprocessor (Phi is the coprocessor that implements the MIC architecture) in a cluster configuration it is necessary to use the so-called offload mode which reduces the performances of the single system. As to the GPU, the Kepler architecture offers a clear advantage with respect to the previous Fermi architecture maintaining exactly the same source code. Scalability of the multi-GPU implementation remains very good by using the CPU as a communication co-processor of the GPU. All source codes are provided for inspection and for double-checking the results.

  4. An Introduction to High Performance Fortran

    Directory of Open Access Journals (Sweden)

    John Merlin

    1995-01-01

    Full Text Available High Performance Fortran (HPF is an informal standard for extensions to Fortran 90 to assist its implementation on parallel architectures, particularly for data-parallel computation. Among other things, it includes directives for specifying data distribution across multiple memories, and concurrent execution features. This article provides a tutorial introduction to the main features of HPF.

  5. Application of Modern Fortran to Spacecraft Trajectory Design and Optimization

    Science.gov (United States)

    Williams, Jacob; Falck, Robert D.; Beekman, Izaak B.

    2018-01-01

    In this paper, applications of the modern Fortran programming language to the field of spacecraft trajectory optimization and design are examined. Modern object-oriented Fortran has many advantages for scientific programming, although many legacy Fortran aerospace codes have not been upgraded to use the newer standards (or have been rewritten in other languages perceived to be more modern). NASA's Copernicus spacecraft trajectory optimization program, originally a combination of Fortran 77 and Fortran 95, has attempted to keep up with modern standards and makes significant use of the new language features. Various algorithms and methods are presented from trajectory tools such as Copernicus, as well as modern Fortran open source libraries and other projects.

  6. Object-Oriented Scientific Programming with Fortran 90

    Science.gov (United States)

    Norton, C.

    1998-01-01

    Fortran 90 is a modern language that introduces many important new features beneficial for scientific programming. We discuss our experiences in plasma particle simulation and unstructured adaptive mesh refinement on supercomputers, illustrating the features of Fortran 90 that support the object-oriented methodology.

  7. DOLIB: Distributed Object Library

    Energy Technology Data Exchange (ETDEWEB)

    D' Azevedo, E.F.

    1994-01-01

    This report describes the use and implementation of DOLIB (Distributed Object Library), a library of routines that emulates global or virtual shared memory on Intel multiprocessor systems. Access to a distributed global array is through explicit calls to gather and scatter. Advantages of using DOLIB include: dynamic allocation and freeing of huge (gigabyte) distributed arrays, both C and FORTRAN callable interfaces, and the ability to mix shared-memory and message-passing programming models for ease of use and optimal performance. DOLIB is independent of language and compiler extensions and requires no special operating system support. DOLIB also supports automatic caching of read-only data for high performance. The virtual shared memory support provided in DOLIB is well suited for implementing Lagrangian particle tracking techniques. We have also used DOLIB to create DONIO (Distributed Object Network I/O Library), which obtains over a 10-fold improvement in disk I/O performance on the Intel Paragon.

  8. DOLIB: Distributed Object Library

    Energy Technology Data Exchange (ETDEWEB)

    D`Azevedo, E.F.; Romine, C.H.

    1994-10-01

    This report describes the use and implementation of DOLIB (Distributed Object Library), a library of routines that emulates global or virtual shared memory on Intel multiprocessor systems. Access to a distributed global array is through explicit calls to gather and scatter. Advantages of using DOLIB include: dynamic allocation and freeing of huge (gigabyte) distributed arrays, both C and FORTRAN callable interfaces, and the ability to mix shared-memory and message-passing programming models for ease of use and optimal performance. DOLIB is independent of language and compiler extensions and requires no special operating system support. DOLIB also supports automatic caching of read-only data for high performance. The virtual shared memory support provided in DOLIB is well suited for implementing Lagrangian particle tracking techniques. We have also used DOLIB to create DONIO (Distributed Object Network I/O Library), which obtains over a 10-fold improvement in disk I/O performance on the Intel Paragon.

  9. Fortran interface layer of the framework for developing particle simulator FDPS

    Science.gov (United States)

    Namekata, Daisuke; Iwasawa, Masaki; Nitadori, Keigo; Tanikawa, Ataru; Muranushi, Takayuki; Wang, Long; Hosono, Natsuki; Nomura, Kentaro; Makino, Junichiro

    2018-06-01

    Numerical simulations based on particle methods have been widely used in various fields including astrophysics. To date, various versions of simulation software have been developed by individual researchers or research groups in each field, through a huge amount of time and effort, even though the numerical algorithms used are very similar. To improve the situation, we have developed a framework, called FDPS (Framework for Developing Particle Simulators), which enables researchers to develop massively parallel particle simulation codes for arbitrary particle methods easily. Until version 3.0, FDPS provided an API (application programming interface) for the C++ programming language only. This limitation comes from the fact that FDPS is developed using the template feature in C++, which is essential to support arbitrary data types of particle. However, there are many researchers who use Fortran to develop their codes. Thus, the previous versions of FDPS require such people to invest much time to learn C++. This is inefficient. To cope with this problem, we developed a Fortran interface layer in FDPS, which provides API for Fortran. In order to support arbitrary data types of particle in Fortran, we design the Fortran interface layer as follows. Based on a given derived data type in Fortran representing particle, a PYTHON script provided by us automatically generates a library that manipulates the C++ core part of FDPS. This library is seen as a Fortran module providing an API of FDPS from the Fortran side and uses C programs internally to interoperate Fortran with C++. In this way, we have overcome several technical issues when emulating a `template' in Fortran. Using the Fortran interface, users can develop all parts of their codes in Fortran. We show that the overhead of the Fortran interface part is sufficiently small and a code written in Fortran shows a performance practically identical to the one written in C++.

  10. Performance of a plasma fluid code on the Intel parallel computers

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Leboeuf, J.N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2. 12 refs

  11. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    Directory of Open Access Journals (Sweden)

    Matthew O'keefe

    1995-01-01

    Full Text Available Massively parallel processors (MPPs hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

  12. Performance Evaluation of Multithreaded Geant4 Simulations Using an Intel Xeon Phi Cluster

    Directory of Open Access Journals (Sweden)

    P. Schweitzer

    2015-01-01

    Full Text Available The objective of this study is to evaluate the performances of Intel Xeon Phi hardware accelerators for Geant4 simulations, especially for multithreaded applications. We present the complete methodology to guide users for the compilation of their Geant4 applications on Phi processors. Then, we propose series of benchmarks to compare the performance of Xeon CPUs and Phi processors for a Geant4 example dedicated to the simulation of electron dose point kernels, the TestEm12 example. First, we compare a distributed execution of a sequential version of the Geant4 example on both architectures before evaluating the multithreaded version of the Geant4 example. If Phi processors demonstrated their ability to accelerate computing time (till a factor 3.83 when distributing sequential Geant4 simulations, we do not reach the same level of speedup when considering the multithreaded version of the Geant4 example.

  13. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    Science.gov (United States)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    Simulation of breaking waves by using Navier-Stokes equation via moving particle semi-implicit method (MPS) over close domain is given. The results show the parallel computing on multicore architecture using OpenMP platform can reduce the computational time almost half of the serial time. Here, the comparison using two computer architectures (AMD and Intel) are performed. The results using Intel architecture is shown better than AMD architecture in CPU time. However, in efficiency, the computer with AMD architecture gives slightly higher than the Intel. For the simulation by 1512 number of particles, the CPU time using Intel and AMD are 12662.47 and 28282.30 respectively. Moreover, the efficiency using similar number of particles, AMD obtains 50.09 % and Intel up to 49.42 %.

  14. Efficient Implementation of Many-body Quantum Chemical Methods on the Intel Xeon Phi Coprocessor

    Energy Technology Data Exchange (ETDEWEB)

    Apra, Edoardo; Klemm, Michael; Kowalski, Karol

    2014-12-01

    This paper presents the implementation and performance of the highly accurate CCSD(T) quantum chemistry method on the Intel Xeon Phi coprocessor within the context of the NWChem computational chemistry package. The widespread use of highly correlated methods in electronic structure calculations is contingent upon the interplay between advances in theory and the possibility of utilizing the ever-growing computer power of emerging heterogeneous architectures. We discuss the design decisions of our implementation as well as the optimizations applied to the compute kernels and data transfers between host and coprocessor. We show the feasibility of adopting the Intel Many Integrated Core Architecture and the Intel Xeon Phi coprocessor for developing efficient computational chemistry modeling tools. Remarkable scalability is demonstrated by benchmarks. Our solution scales up to a total of 62560 cores with the concurrent utilization of Intel Xeon processors and Intel Xeon Phi coprocessors.

  15. Compiler Technology for Parallel Scientific Computation

    Directory of Open Access Journals (Sweden)

    Can Özturan

    1994-01-01

    Full Text Available There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL. Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.

  16. Intel Corporation osaleb Eesti koolitusprogrammis / Raivo Juurak

    Index Scriptorium Estoniae

    Juurak, Raivo, 1949-

    2001-01-01

    Haridusministeeriumis tutvustati infotehnoloogiaalast koolitusprogrammi, milles osaleb maailma suuremaid arvutifirmasid Intel Corporation. Koolituskursuse käigus õpetatakse aineõpetajaid oma ainetundides interneti võimalusi kasutama. 50-tunnised kursused viiakse läbi kõigis maakondades

  17. Aspects of FORTRAN in large-scale programming

    International Nuclear Information System (INIS)

    Metcalf, M.

    1983-01-01

    In these two lectures I examine the following three questions: i) Why did high-energy physicists begin to use FORTRAN. ii) Why do high-energy physicists continue to use FORTRAN. iii) Will high-energy physicists always use FORTRAN. In order to find answers to these questions, it is necessary to look at the history of the language, its present position, and its likely future, and also to consider its manner of use, the topic of portability, and the competition from other languages. Here we think especially of early competition from ALGOL, the more recent spread in the use of PASCAL, and the appearance of a completely new and ambitious language, ADA. (orig.)

  18. Aspects of FORTRAN in large-scale programming

    CERN Document Server

    Metcalf, M

    1983-01-01

    In these two lectures I shall try to examine the following three questions: i) Why did high-energy physicists begin to use FORTRAN? ii) Why do high-energy physicists continue to use FORTRAN? iii) Will high-energy physicists always use FORTRAN? In order to find answers to these questions, it is necessary to look at the history of the language, its present position, and its likely future, and also to consider its manner of use, the topic of portability, and the competition from other languages. Here we think especially of early competition from ALGOL, the more recent spread in the use of PASCAL, and the appearance of a completely new and ambitious language, ADA.

  19. On the tradeoffs of programming language choice for numerical modelling in geoscience. A case study comparing modern Fortran, C++/Blitz++ and Python/NumPy.

    Science.gov (United States)

    Jarecka, D.; Arabas, S.; Fijalkowski, M.; Gaynor, A.

    2012-04-01

    The language of choice for numerical modelling in geoscience has long been Fortran. A choice of a particular language and coding paradigm comes with different set of tradeoffs such as that between performance, ease of use (and ease of abuse), code clarity, maintainability and reusability, availability of open source compilers, debugging tools, adequate external libraries and parallelisation mechanisms. The availability of trained personnel and the scale and activeness of the developer community is of importance as well. We present a short comparison study aimed at identification and quantification of these tradeoffs for a particular example of an object oriented implementation of a parallel 2D-advection-equation solver in Python/NumPy, C++/Blitz++ and modern Fortran. The main angles of comparison will be complexity of implementation, performance of various compilers or interpreters and characterisation of the "added value" gained by a particular choice of the language. The choice of the numerical problem is dictated by the aim to make the comparison useful and meaningful to geoscientists. Python is chosen as a language that traditionally is associated with ease of use, elegant syntax but limited performance. C++ is chosen for its traditional association with high performance but even higher complexity and syntax obscurity. Fortran is included in the comparison for its widespread use in geoscience often attributed to its performance. We confront the validity of these traditional views. We point out how the usability of a particular language in geoscience depends on the characteristics of the language itself and the availability of pre-existing software libraries (e.g. NumPy, SciPy, PyNGL, PyNIO, MPI4Py for Python and Blitz++, Boost.Units, Boost.MPI for C++). Having in mind the limited complexity of the considered numerical problem, we present a tentative comparison of performance of the three implementations with different open source compilers including CPython and

  20. Comparison of and conversion between different implementations of the FORTRAN programming language

    Science.gov (United States)

    Treinish, L.

    1980-01-01

    A guideline for computer programmers who may need to exchange FORTRAN programs between several computers is presented. The characteristics of the FORTRAN language available on three different types of computers are outlined, and procedures and other considerations for the transfer of programs from one type of FORTRAN to another are discussed. In addition, the variance of these different FORTRAN's from the FORTRAN 77 standard are discussed.

  1. 75 FR 21353 - Intel Corporation, Fab 20 Division, Including On-Site Leased Workers From Volt Technical...

    Science.gov (United States)

    2010-04-23

    ... DEPARTMENT OF LABOR Employment and Training Administration [TA-W-73,642] Intel Corporation, Fab 20... of Intel Corporation, Fab 20 Division, including on-site leased workers of Volt Technical Resources... Precision, Inc. were employed on-site at the Hillsboro, Oregon location of Intel Corporation, Fab 20...

  2. MILC staggered conjugate gradient performance on Intel KNL

    Energy Technology Data Exchange (ETDEWEB)

    Li, Ruiz [Indiana Univ., Bloomington, IN (United States). Dept. of Physics; Detar, Carleton [Univ. of Utah, Salt Lake City, UT (United States). Dept. of Physics and Astronomy; Doerfler, Douglas W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC); Gottlieb, Steven [Indiana Univ., Bloomington, IN (United States). Dept. of Physics; Jha, Asish [Intel Corp., Hillsboro, OR (United States). Sofware and Services Group; Kalamkar, Dhiraj [Intel Labs., Bangalore (India). Parallel Computing Lab.; Toussaint, Doug [Univ. of Arizona, Tucson, AZ (United States). Physics Dept.

    2016-11-03

    We review our work done to optimize the staggered conjugate gradient (CG) algorithm in the MILC code for use with the Intel Knights Landing (KNL) architecture. KNL is the second gener- ation Intel Xeon Phi processor. It is capable of massive thread parallelism, data parallelism, and high on-board memory bandwidth and is being adopted in supercomputing centers for scientific research. The CG solver consumes the majority of time in production running, so we have spent most of our effort on it. We compare performance of an MPI+OpenMP baseline version of the MILC code with a version incorporating the QPhiX staggered CG solver, for both one-node and multi-node runs.

  3. Formula Translation in Blitz++, NumPy and Modern Fortran: A Case Study of the Language Choice Tradeoffs

    Directory of Open Access Journals (Sweden)

    Sylwester Arabas

    2014-01-01

    Full Text Available Three object-oriented implementations of a prototype solver of the advection equation are introduced. The presented programs are based on Blitz++ (C++, NumPy (Python and Fortran's built-in array containers. The solvers constitute implementations of the Multidimensional Positive-Definite Advective Transport Algorithm (MPDATA. The introduced codes serve as examples for how the application of object-oriented programming (OOP techniques and new language constructs from C++11 and Fortran 2008 allow to reproduce the mathematical notation used in the literature within the program code. A discussion on the tradeoffs of the programming language choice is presented. The main angles of comparison are code brevity and syntax clarity (and hence maintainability and auditability as well as performance. All performance tests are carried out using free and open-source compilers. In the case of Python, a significant performance gain is observed when switching from the standard interpreter (CPython to the PyPy implementation of Python. Entire source code of all three implementations is embedded in the text and is licensed under the terms of the GNU GPL license.

  4. Replacing Fortran Namelists with JSON

    Science.gov (United States)

    Robinson, T. E., Jr.

    2017-12-01

    Maintaining a log of input parameters for a climate model is very important to understanding potential causes for answer changes during the development stages. Additionally, since modern Fortran is now interoperable with C, a more modern approach to software infrastructure to include code written in C is necessary. Merging these two separate facets of climate modeling requires a quality control for monitoring changes to input parameters and model defaults that can work with both Fortran and C. JSON will soon replace namelists as the preferred key/value pair input in the GFDL model. By adding a JSON parser written in C into the model, the input can be used by all functions and subroutines in the model, errors can be handled by the model instead of by the internal namelist parser, and the values can be output into a single file that is easily parsable by readily available tools. Input JSON files can handle all of the functionality of a namelist while being portable between C and Fortran. Fortran wrappers using unlimited polymorphism are crucial to allow for simple and compact code which avoids the need for many subroutines contained in an interface. Errors can be handled with more detail by providing information about location of syntax errors or typos. The output JSON provides a ground truth for values that the model actually uses by providing not only the values loaded through the input JSON, but also any default values that were not included. This kind of quality control on model input is crucial for maintaining reproducibility and understanding any answer changes resulting from changes in the input.

  5. READDATA: a FORTRAN 77 codeword input package

    International Nuclear Information System (INIS)

    Lander, P.A.

    1983-07-01

    A new codeword input package has been produced as a result of the incompatibility between different dialects of FORTRAN, especially when character variables are passed as parameters. This report is for those who wish to use a codeword input package with FORTRAN 77. The package, called ''Readdata'', attempts to combine the best features of its predecessors such as BINPUT and pseudo-BINPUT. (author)

  6. FASTPLOT, Interface Routines to MS FORTRAN Graphics Library

    International Nuclear Information System (INIS)

    1999-01-01

    1 - Description of program or function: FASTPLOT is a library of routines that can be used to interface with the Microsoft FORTRAN Graphics library (GRAPHICS.LIB). The FASTPLOT routines simplify the development of graphics applications and add capabilities such as histograms, Splines, symbols, and error bars. FASTPLOT also includes routines that can be used to create menus. 2 - Methods: FASTPLOT is a library of routines which must be linked with a user's FORTRAN programs that call any FASTPLOT routines. In addition, the user must link with the Microsoft FORTRAN Graphics library (GRAPHICS.LIB). 3 - Restrictions on the complexity of the problem: None noted

  7. C Versus Fortran-77 for Scientific Programming

    Directory of Open Access Journals (Sweden)

    Tom MacDonald

    1992-01-01

    Full Text Available The predominant programming language for numeric and scientific applications is Fortran-77 and supercomputers are primarily used to run large-scale numeric and scientific applications. Standard C* is not widely used for numerical and scientific programming, yet Standard C provides many desirable linguistic features not present in Fortran-77. Furthermore, the existence of a standard library and preprocessor eliminates the worst portability problems. A comparison of Standard C and Fortran-77 shows several key deficiencies in C that reduce its ability to adequately solve some numerical problems. Some of these problems have already been addressed by the C standard but others remain. Standard C with a few extensions and modifications could be suitable for all numerical applications and could become more popular in supercomputing environments.

  8. Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

    Science.gov (United States)

    Joslin, Ronald D.; Zubair, Mohammad

    1993-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine indicates less than ideal speedups. However with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise wall-normal and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

  9. Alternatives to FORTRAN in control systems

    International Nuclear Information System (INIS)

    Howell, J.A.; Wright, R.M.

    1985-01-01

    Control system software has traditionally been written in assembly language, FORTRAN, or Basic. Today there exist several high-level languages with features that make them convenient and effective in control systems. These features include bit manipulation, user-defined data types, character manipulation, and high-level logical operations. Some of theses languages are quite different from FORTRAN and yet are easy to read and use. We discuss several languages, their features that make them convenient for control systems, and give examples of their use. We focus particular attention on the language C, developed by Bell Laboratories

  10. IRRIGOGRAPHY AFTER PREPARATIONS OF PATIENTS WITH FORTRANS

    Directory of Open Access Journals (Sweden)

    Irena Jankovic

    2006-01-01

    Full Text Available Fortrans® is a laxative in the form of the powder which is used for making solution for oral application. Laxative effects are achieved over long linear polymer (polyethylene-glikol - PEG 4000 which binds water molecules, increasing thus the volume of fluids in the intestinal tract.Material of study comprises 150 irrigographies made at the Institute of Radiology of the Clinical Center in Nis in the period from January 2004 to Jun 2005.The preparation in these cases was done by Fortrans®. The contrast medium used was barium sulfate.The results of study are presented in illustrations and irrigograms.In conclusion,we can say that Fortrans® provides reliable, effective and simple preparation of patients for irrigography as well as for fast, comfortable and efficient endographic examination (irrigography. The obtained irrigograms are of satisfactory quality, showing sharp contrasts.

  11. Intel Xeon Phi accelerated Weather Research and Forecasting (WRF) Goddard microphysics scheme

    Science.gov (United States)

    Mielikainen, J.; Huang, B.; Huang, A. H.-L.

    2014-12-01

    The Weather Research and Forecasting (WRF) model is a numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs. The WRF development is a done in collaboration around the globe. Furthermore, the WRF is used by academic atmospheric scientists, weather forecasters at the operational centers and so on. The WRF contains several physics components. The most time consuming one is the microphysics. One microphysics scheme is the Goddard cloud microphysics scheme. It is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the Goddard scheme code. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs rather than just kernels as the GPU does. The MIC coprocessor supports all important Intel development tools. Thus, the development environment is one familiar to a vast number of CPU developers. Although, getting a maximum performance out of MICs will require using some novel optimization techniques. Those optimization techniques are discussed in this paper. The results show that the optimizations improved performance of Goddard microphysics scheme on Xeon Phi 7120P by a factor of 4.7×. In addition, the optimizations reduced the Goddard microphysics scheme's share of the total WRF processing time from 20.0 to 7.5%. Furthermore, the same optimizations

  12. Software Aspects of IEEE Floating-Point Computations for Numerical Applications in High Energy Physics

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    hazards to be avoided About the speaker Jeffrey M Arnold is a Senior Software Engineer in the Intel Compiler and Languages group at Intel Corporation. He has been part of the Digital->Compaq->Intel compiler organization for nearly 20 years; part of that time, he worked on both lo...

  13. Vectorization for Molecular Dynamics on Intel Xeon Phi Corpocessors

    Science.gov (United States)

    Yi, Hongsuk

    2014-03-01

    Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512 bit vector registers for the high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Terfoff potentials for covalent bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism computing. We combine thread-level parallelism using a tightly coupled thread-level and task-level parallelism with 512-bit vector register. The simulation results show that the parallel performance of SIMD implementations on Xeon Phi is apparently superior to their x86 CPU architecture.

  14. TOOLPACK1, Tools for Development and Maintenance of FORTRAN 77 Program

    International Nuclear Information System (INIS)

    Cowell, Wayne R.

    1993-01-01

    1 - Description of program or function: TOOLPACK1 consists of the following categories of software; (1) an integrated collection of tools intended to support the development and maintenance of FORTRAN 77 programs, in particular moderate-sized collections of mathematical software; (2) several user/Toolpack interfaces, one of which is selected for use at any particular installation; (3) three implementations of the tool/system interface, called TIE (Tool Interface to the Environment). The tools are written in FORTRAN 77 and are portable among TIE installations. The source contains symbolic constants as macro names and must be expanded with a suitable macro expander before being compiled and loaded. A portable macro expander is supplied in TOOLPACK1. The tools may be divided into three functional areas: general, documentation, and processing. One tool, the macro processor, Can be used in any of these categories. ISTDC: data comparison tool is designed mainly for comparing files of numeric values, and files with embedded text. ISTET Expands tabs. ISTFI: finds all the include files that a file needs. ISTGP Searches multiple files for occurrences of a regular expression. ISTHP: will provide limited help information about tools. ISTMP: The macro processor may be used to pre-process a file. The processor provides macro replacement, inclusion, conditional replacement, and processing capabilities for complex file processing. ISTSP: TIE-conforming version of the SPLIT utility to split up the concatenated files used on the tape. ISTSV: save/restore utility to save and restore sub-trees of the Portable File Store (PFS). ISTTD: text comparison tool. ISTVC: simple text file version controller. ISTAL: aids is a preprocessor that can be used to generate specific information from intermediate files created by other tools. The information that can be generated includes call-graphs, cross reference listings, segment execution frequencies, and symbol information. ISTAL can also strip

  15. Windows for Intel Macs

    CERN Document Server

    Ogasawara, Todd

    2008-01-01

    Even the most devoted Mac OS X user may need to use Windows XP, or may just be curious about XP and its applications. This Short Cut is a concise guide for OS X users who need to quickly get comfortable and become productive with Windows XP basics on their Macs. It covers: Security Networking ApplicationsMac users can easily install and use Windows thanks to Boot Camp and Parallels Desktop for Mac. Boot Camp lets an Intel-based Mac install and boot Windows XP on its own hard drive partition. Parallels Desktop for Mac uses virtualization technology to run Windows XP (or other operating systems

  16. Basic linear algebra subprograms for FORTRAN usage

    Science.gov (United States)

    Lawson, C. L.; Hanson, R. J.; Kincaid, D. R.; Krogh, F. T.

    1977-01-01

    A package of 38 low level subprograms for many of the basic operations of numerical linear algebra is presented. The package is intended to be used with FORTRAN. The operations in the package are dot products, elementary vector operations, Givens transformations, vector copy and swap, vector norms, vector scaling, and the indices of components of largest magnitude. The subprograms and a test driver are available in portable FORTRAN. Versions of the subprograms are also provided in assembly language for the IBM 360/67, the CDC 6600 and CDC 7600, and the Univac 1108.

  17. ARBUS: A FORTRAN tool for generating tree structure diagrams

    International Nuclear Information System (INIS)

    Ferrero, C.; Zanger, M.

    1992-02-01

    The FORTRAN77 stand-alone code ARBUS has been designed to aid the user by providing a tree structure diagram generating utility for computer programs written in FORTRAN language. This report is intended to describe the main purpose and features of ARBUS and to highlight some additional applications of the code by means of practical test cases. (orig.) [de

  18. Connecting Effective Instruction and Technology. Intel-elebration: Safari.

    Science.gov (United States)

    Burton, Larry D.; Prest, Sharon

    Intel-ebration is an attempt to integrate the following research-based instructional frameworks and strategies: (1) dimensions of learning; (2) multiple intelligences; (3) thematic instruction; (4) cooperative learning; (5) project-based learning; and (6) instructional technology. This paper presents a thematic unit on safari, using the…

  19. Data compilation of single pion photoproduction below 2 GeV

    International Nuclear Information System (INIS)

    Inagaki, Y.; Nakamura, T.; Ukai, K.

    1976-01-01

    The compilation of data of single pion photoproduction experiment below 2 GeV is presented with the keywords which specify the experiment. These data are written on a magnetic tape. Data format and the indices for the keywords are given. Various programs of using this tape are also presented. The results of the compilation are divided into two types. The one is the reference card on which the information of the experiment is given. The other is the data card. These reference and data cards are written using all A-type format on an original tape. The copy tapes are available, which are written by various types on request. There are two kinds of the copy tapes. The one is same as the original tape, and the other is the one different in the data card. Namely, this card is written by F-type following the data type. One experiment on this tape is represented by 3 kinds of the cards. One reference card with A-type format, many data cards with F-type format and one identifying card. Various programs which are written by FORTRAN are ready for these original and copy tapes. (Kato, T.)

  20. CAMSHIFT Tracker Design Experiments With Intel OpenCV and SAI

    National Research Council Canada - National Science Library

    Francois, Alexandre R

    2004-01-01

    ... (including multi-modal) systems, must be specifically addressed. This report describes design and implementation experiments for CAMSHIFT-based tracking systems using Intel's Open Computer Vision library and SAI...

  1. Full cycle trigonometric function on Intel Quartus II Verilog

    Science.gov (United States)

    Mustapha, Muhazam; Zulkarnain, Nur Antasha

    2018-02-01

    This paper discusses about an improvement of a previous research on hardware based trigonometric calculations. Tangent function will also be implemented to get a complete set. The functions have been simulated using Quartus II where the result will be compared to the previous work. The number of bits has also been extended for each trigonometric function. The design is based on RTL due to its resource efficient nature. At earlier stage, a technology independent test bench simulation was conducted on ModelSim due to its convenience in capturing simulation data so that accuracy information can be obtained. On second stage, Intel/Altera Quartus II will be used to simulate on technology dependent platform, particularly on the one belonging to Intel/Altera itself. Real data on no. logic elements used and propagation delay have also been obtained.

  2. Accelerating the Pace of Protein Functional Annotation With Intel Xeon Phi Coprocessors.

    Science.gov (United States)

    Feinstein, Wei P; Moreno, Juana; Jarrell, Mark; Brylinski, Michal

    2015-06-01

    Intel Xeon Phi is a new addition to the family of powerful parallel accelerators. The range of its potential applications in computationally driven research is broad; however, at present, the repository of scientific codes is still relatively limited. In this study, we describe the development and benchmarking of a parallel version of eFindSite, a structural bioinformatics algorithm for the prediction of ligand-binding sites in proteins. Implemented for the Intel Xeon Phi platform, the parallelization of the structure alignment portion of eFindSite using pragma-based OpenMP brings about the desired performance improvements, which scale well with the number of computing cores. Compared to a serial version, the parallel code runs 11.8 and 10.1 times faster on the CPU and the coprocessor, respectively; when both resources are utilized simultaneously, the speedup is 17.6. For example, ligand-binding predictions for 501 benchmarking proteins are completed in 2.1 hours on a single Stampede node equipped with the Intel Xeon Phi card compared to 3.1 hours without the accelerator and 36.8 hours required by a serial version. In addition to the satisfactory parallel performance, porting existing scientific codes to the Intel Xeon Phi architecture is relatively straightforward with a short development time due to the support of common parallel programming models by the coprocessor. The parallel version of eFindSite is freely available to the academic community at www.brylinski.org/efindsite.

  3. Trusted Computing Technologies, Intel Trusted Execution Technology.

    Energy Technology Data Exchange (ETDEWEB)

    Guise, Max Joseph; Wendt, Jeremy Daniel

    2011-01-01

    We describe the current state-of-the-art in Trusted Computing Technologies - focusing mainly on Intel's Trusted Execution Technology (TXT). This document is based on existing documentation and tests of two existing TXT-based systems: Intel's Trusted Boot and Invisible Things Lab's Qubes OS. We describe what features are lacking in current implementations, describe what a mature system could provide, and present a list of developments to watch. Critical systems perform operation-critical computations on high importance data. In such systems, the inputs, computation steps, and outputs may be highly sensitive. Sensitive components must be protected from both unauthorized release, and unauthorized alteration: Unauthorized users should not access the sensitive input and sensitive output data, nor be able to alter them; the computation contains intermediate data with the same requirements, and executes algorithms that the unauthorized should not be able to know or alter. Due to various system requirements, such critical systems are frequently built from commercial hardware, employ commercial software, and require network access. These hardware, software, and network system components increase the risk that sensitive input data, computation, and output data may be compromised.

  4. Radiation Failures in Intel 14nm Microprocessors

    Science.gov (United States)

    Bossev, Dobrin P.; Duncan, Adam R.; Gadlage, Matthew J.; Roach, Austin H.; Kay, Matthew J.; Szabo, Carl; Berger, Tammy J.; York, Darin A.; Williams, Aaron; LaBel, K.; hide

    2016-01-01

    In this study the 14 nm Intel Broadwell 5th generation core series 5005U-i3 and 5200U-i5 was mounted on Dell Inspiron laptops, MSI Cubi and Gigabyte Brix barebones and tested with Windows 8 and CentOS7 at idle. Heavy-ion-induced hard- and catastrophic failures do not appear to be related to the Intel 14nm Tri-Gate FinFET process. They originate from a small (9 m 140 m) area on the 32nm planar PCH die (not the CPU) as initially speculated. The hard failures seem to be due to a SEE but the exact physical mechanism has yet to be identified. Some possibilities include latch-ups, charge ion trapping or implantation, ion channels, or a combination of those (in biased conditions). The mechanism of the catastrophic failures seems related to the presence of electric power (1.05V core voltage). The 1064 nm laser mimics ionization radiation and induces soft- and hard failures as a direct result of electron-hole pair production, not heat. The 14nm FinFET processes continue to look promising for space radiation environments.

  5. A package of Linux scripts for the parallelization of Monte Carlo simulations

    Science.gov (United States)

    Badal, Andreu; Sempau, Josep

    2006-09-01

    sequential code. Program summary 1Title of program:clonEasy Catalogue identifier:ADYD_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADYD_v1_0 Program obtainable from:CPC Program Library, Queen's University of Belfast, Northern Ireland Computer for which the program is designed and others in which it is operable:Any computer with a Unix style shell (bash), support for the Secure Shell protocol and a FORTRAN compiler Operating systems under which the program has been tested:Linux (RedHat 8.0, SuSe 8.1, Debian Woody 3.1) Compilers:GNU FORTRAN g77 (Linux); g95 (Linux); Intel Fortran Compiler 7.1 (Linux) Programming language used:Linux shell (bash) script, FORTRAN 77 No. of bits in a word:32 No. of lines in distributed program, including test data, etc.:1916 No. of bytes in distributed program, including test data, etc.:18 202 Distribution format:tar.gz Nature of the physical problem:There are many situations where a Monte Carlo simulation involves a huge amount of CPU time. The parallelization of such calculations is a simple way of obtaining a relatively low statistical uncertainty using a reasonable amount of time. Method of solution:The presented collection of Linux scripts and auxiliary FORTRAN programs implement Secure Shell-based communication between a "master" computer and a set of "clones". The aim of this communication is to execute a code that performs a Monte Carlo simulation on all the clones simultaneously. The code is unique, but each clone is fed with a different set of random seeds. Hence, clonEasy effectively permits the parallelization of the calculation. Restrictions on the complexity of the program:clonEasy can only be used with programs that produce statistically independent results using the same code, but with a different sequence of random numbers. Users must choose the initialization values for the random number generator on each computer and combine the output from the different executions. A FORTRAN program to combine the final results is

  6. Real-time data acquisition and feedback control using Linux Intel computers

    International Nuclear Information System (INIS)

    Penaflor, B.G.; Ferron, J.R.; Piglowski, D.A.; Johnson, R.D.; Walker, M.L.

    2006-01-01

    This paper describes the experiences of the DIII-D programming staff in adapting Linux based Intel computing hardware for use in real-time data acquisition and feedback control systems. Due to the highly dynamic and unstable nature of magnetically confined plasmas in tokamak fusion experiments, real-time data acquisition and feedback control systems are in routine use with all major tokamaks. At DIII-D, plasmas are created and sustained using a real-time application known as the digital plasma control system (PCS). During each experiment, the PCS periodically samples data from hundreds of diagnostic signals and provides these data to control algorithms implemented in software. These algorithms compute the necessary commands to send to various actuators that affect plasma performance. The PCS consists of a group of rack mounted Intel Xeon computer systems running an in-house customized version of the Linux operating system tailored specifically to meet the real-time performance needs of the plasma experiments. This paper provides a more detailed description of the real-time computing hardware and custom developed software, including recent work to utilize dual Intel Xeon equipped computers within the PCS

  7. Analysis of Intel IA-64 Processor Support for Secure Systems

    National Research Council Canada - National Science Library

    Unalmis, Bugra

    2001-01-01

    .... Systems could be constructed for which serious security threats would be eliminated. This thesis explores the Intel IA-64 processor's hardware support and its relationship to software for building a secure system...

  8. Lawrence Livermore National Laboratory selects Intel Itanium 2 processors for world's most powerful Linux cluster

    CERN Multimedia

    2003-01-01

    "Intel Corporation, system manufacturer California Digital and the University of California at Lawrence Livermore National Laboratory (LLNL) today announced they are building one of the world's most powerful supercomputers. The supercomputer project, codenamed "Thunder," uses nearly 4,000 Intel® Itanium® 2 processors... is expected to be complete in January 2004" (1 page).

  9. Experience with Intel's many integrated core architecture in ATLAS software

    International Nuclear Information System (INIS)

    Fleischmann, S; Neumann, M; Kama, S; Lavrijsen, W; Vitillo, R

    2014-01-01

    Intel recently released the first commercial boards of its Many Integrated Core (MIC) Architecture. MIC is Intel's solution for the domain of throughput computing, currently dominated by general purpose programming on graphics processors (GPGPU). MIC allows the use of the more familiar x86 programming model and supports standard technologies such as OpenMP, MPI, and Intel's Threading Building Blocks (TBB). This should make it possible to develop for both throughput and latency devices using a single code base. In ATLAS Software, track reconstruction has been shown to be a good candidate for throughput computing on GPGPU devices. In addition, the newly proposed offline parallel event-processing framework, GaudiHive, uses TBB for task scheduling. The MIC is thus, in principle, a good fit for this domain. In this paper, we report our experiences of porting to and optimizing ATLAS tracking algorithms for the MIC, comparing the programmability and relative cost/performance of the MIC against those of current GPGPUs and latency-optimized CPUs.

  10. Scaling Deep Learning Workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

    Energy Technology Data Exchange (ETDEWEB)

    Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.; Tallent, Nathan R.; Vishnu, Abhinav; Kerbyson, Darren J.

    2017-07-03

    Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors --- including NVIDIA, Intel, AMD and IBM --- have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, CaffeNet, AlexNet and GoogleNet topologies using the Cifar10 and ImageNet datasets. The workloads are vendor optimized for each architecture. GPUs provide the highest overall raw performance. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and KNL can be competitive when considering performance/watt. Furthermore, NVLink is critical to GPU scaling.

  11. Investigating the Use of the Intel Xeon Phi for Event Reconstruction

    Science.gov (United States)

    Sherman, Keegan; Gilfoyle, Gerard

    2014-09-01

    The physics goal of Jefferson Lab is to understand how quarks and gluons form nuclei and it is being upgraded to a higher, 12-GeV beam energy. The new CLAS12 detector in Hall B will collect 5-10 terabytes of data per day and will require considerable computing resources. We are investigating tools, such as the Intel Xeon Phi, to speed up the event reconstruction. The Kalman Filter is one of the methods being studied. It is a linear algebra algorithm that estimates the state of a system by combining existing data and predictions of those measurements. The tools required to apply this technique (i.e. matrix multiplication, matrix inversion) are being written using C++ intrinsics for Intel's Xeon Phi Coprocessor, which uses the Many Integrated Cores (MIC) architecture. The Intel MIC is a new high-performance chip that connects to a host machine through the PCIe bus and is built to run highly vectorized and parallelized code making it a well-suited device for applications such as the Kalman Filter. Our tests of the MIC optimized algorithms needed for the filter show significant increases in speed. For example, matrix multiplication of 5x5 matrices on the MIC was able to run up to 69 times faster than the host core. The physics goal of Jefferson Lab is to understand how quarks and gluons form nuclei and it is being upgraded to a higher, 12-GeV beam energy. The new CLAS12 detector in Hall B will collect 5-10 terabytes of data per day and will require considerable computing resources. We are investigating tools, such as the Intel Xeon Phi, to speed up the event reconstruction. The Kalman Filter is one of the methods being studied. It is a linear algebra algorithm that estimates the state of a system by combining existing data and predictions of those measurements. The tools required to apply this technique (i.e. matrix multiplication, matrix inversion) are being written using C++ intrinsics for Intel's Xeon Phi Coprocessor, which uses the Many Integrated Cores (MIC

  12. Evaluation of the Intel Westmere-EX server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2011-01-01

    One year after the arrival of the Intel Xeon 7500 systems (“Nehalem-EX”), CERN openlab is presenting a set of benchmark results obtained when running on the new Xeon E7-4870 Processors, representing the “Westmere-EX” family. A modern 4-socket, 40-core system is confronted with the previous generation of expandable (“EX”) platforms, represented by a 4-socket, 32-core Intel Xeon X7560 based system – both being “top of the line” systems. Benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Symmetric MultiThreading (SMT), the cache sizes available, the configured memory topology, as well as the power configuration if throughput per watt is to be measured. As in previous activities, we have tried to do a good job of comparing like with like. In a “top of the line” comparison based on the HEPSPEC06 benchmark, the “We...

  13. Extracting UML Class Diagrams from Object-Oriented Fortran: ForUML

    Directory of Open Access Journals (Sweden)

    Aziz Nanthaamornphong

    2015-01-01

    Full Text Available Many scientists who implement computational science and engineering software have adopted the object-oriented (OO Fortran paradigm. One of the challenges faced by OO Fortran developers is the inability to obtain high level software design descriptions of existing applications. Knowledge of the overall software design is not only valuable in the absence of documentation, it can also serve to assist developers with accomplishing different tasks during the software development process, especially maintenance and refactoring. The software engineering community commonly uses reverse engineering techniques to deal with this challenge. A number of reverse engineering-based tools have been proposed, but few of them can be applied to OO Fortran applications. In this paper, we propose a software tool to extract unified modeling language (UML class diagrams from Fortran code. The UML class diagram facilitates the developers' ability to examine the entities and their relationships in the software system. The extracted diagrams enhance software maintenance and evolution. The experiments carried out to evaluate the proposed tool show its accuracy and a few of the limitations.

  14. Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

    Directory of Open Access Journals (Sweden)

    Guy Edjlali

    1997-01-01

    Full Text Available For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of high-performance Fortran (HPF-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.

  15. Optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme for Intel Many Integrated Core (MIC) architecture

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.-L.

    2015-05-01

    Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows the developers to run code at trillions of calculations per second using the familiar programming model. In this paper, we present our results of optimizing the updated Goddard shortwave radiation Weather Research and Forecasting (WRF) scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The co-processor supports all important Intel development tools. Thus, the development environment is familiar one to a vast number of CPU developers. Although, getting a maximum performance out of Xeon Phi will require using some novel optimization techniques. Those optimization techniques are discusses in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 1.3x.

  16. Comparison of PASCAL and FORTRAN for solving problems in the physical sciences

    Science.gov (United States)

    Watson, V. R.

    1981-01-01

    The paper compares PASCAL and FORTRAN for problem solving in the physical sciences, due to requests NASA has received to make PASCAL available on the Numerical Aerodynamic Simulator (scheduled to be operational in 1986). PASCAL disadvantages include the lack of scientific utility procedures equivalent to the IBM scientific subroutine package or the IMSL package which are available in FORTRAN. Advantages include a well-organized, easy to read and maintain writing code, range checking to prevent errors, and a broad selection of data types. It is concluded that FORTRAN may be the better language, although ADA (patterned after PASCAL) may surpass FORTRAN due to its ability to add complex and vector math, and the specify the precision and range of variables.

  17. Revisiting Intel Xeon Phi optimization of Thompson cloud microphysics scheme in Weather Research and Forecasting (WRF) model

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen

    2015-10-01

    The Thompson cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Thompson scheme incorporates a large number of improvements. Thus, we have optimized the speed of this important part of WRF. Intel Many Integrated Core (MIC) ushers in a new era of supercomputing speed, performance, and compatibility. It allows the developers to run code at trillions of calculations per second using the familiar programming model. In this paper, we present our results of optimizing the Thompson microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools. Thus, the development environment is familiar one to a vast number of CPU developers. Although, getting a maximum performance out of MICs will require using some novel optimization techniques. New optimizations for an updated Thompson scheme are discusses in this paper. The optimizations improved the performance of the original Thompson code on Xeon Phi 7120P by a factor of 1.8x. Furthermore, the same optimizations improved the performance of the Thompson on a dual socket configuration of eight core Intel Xeon E5-2670 CPUs by a factor of 1.8x compared to the original Thompson code.

  18. IFF, Full-Screen Input Menu Generator for FORTRAN Program

    International Nuclear Information System (INIS)

    Seidl, Albert

    1991-01-01

    1 - Description of program or function: The IFF-package contains input modules for use within FORTRAN programs. This package enables the programmer to easily include interactive menu-directed data input (module VTMEN1) and command-word processing (module INPCOM) into a FORTRAN program. 2 - Method of solution: No mathematical operations are performed. 3 - Restrictions on the complexity of the problem: Certain restrictions of use may arise from the dimensioning of arrays. Field lengths are defined via PARAMETER-statements

  19. Fortran Testing and Refactoring Infrastructure, Phase II

    Data.gov (United States)

    National Aeronautics and Space Administration — Tech-X proposes to develop a comprehensive Fortran testing and refactoring infrastructure that allows developers and scientists to leverage the benefits of...

  20. Emulating Multiple Inheritance in Fortran 2003/2008

    Directory of Open Access Journals (Sweden)

    Karla Morris

    2015-01-01

    in Fortran 2003. The design unleashes the power of the associated class relationships for modeling complicated data structures yet avoids the ambiguities that plague some multiple inheritance scenarios.

  1. Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

    OpenAIRE

    Liu, Xu; Chen, Langshi; Firoz, Jesun S.; Qiu, Judy; Jiang, Lei

    2017-01-01

    Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we empirically evaluate various computing platforms including an Intel Xeon E5 CPU, a Nvidia Geforce GTX1070 GPU and an Xeon Phi 7210 processor codenamed Knights Landing (KNL) in the domain of parallel graph processing. We show that the KNL gains encouraging per...

  2. FORTRAN data files transference from VAX/VMS to ALPHA/UNIX; Traspaso de ficheros FORTRAN de datos de VAX/VMS a ALPHA/UNIX

    Energy Technology Data Exchange (ETDEWEB)

    Sanchez, E.; Milligen, B. Ph van [CIEMAT (Spain)

    1997-09-01

    Several tools have been developed to access the TJ-IU databases, which currently reside in VAX/VMS servers, from the TJ-II Data Acquisition System DEC ALPHA 8400 server. The TJ-I/TJ-IU databases are not homogeneous and contain several types of data files, namely, SADE, CAMAC and FORTRAN unformatted files. The tools presented in this report allow one to transfer CAMAC and those FORTRAN unformatted files defined herein, from a VAX/VMS server, for data manipulation on the ALPHA/Digital UNIX server. (Author)

  3. Fortran Testing and Refactoring Infrastructure, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Tech-X proposes to develop a comprehensive Fortran testing and refactoring infrastructure that allows developers and scientists to leverage the benefits of a...

  4. How to Interface Fortran with Matlab

    OpenAIRE

    Sagastizábal , Claudia; Vige , Guillaume

    1995-01-01

    Projet PROMATH; We describe the general procedure for interfacing Fortran routines with Matlab. We explain how to write a mex-file and the associated gateway function. In particular, each different type of argument is considered in detail. We finish with an illustrative example

  5. Extending R packages to support 64-bit compiled code: An illustration with spam64 and GIMMS NDVI3g data

    Science.gov (United States)

    Gerber, Florian; Mösinger, Kaspar; Furrer, Reinhard

    2017-07-01

    Software packages for spatial data often implement a hybrid approach of interpreted and compiled programming languages. The compiled parts are usually written in C, C++, or Fortran, and are efficient in terms of computational speed and memory usage. Conversely, the interpreted part serves as a convenient user-interface and calls the compiled code for computationally demanding operations. The price paid for the user friendliness of the interpreted component is-besides performance-the limited access to low level and optimized code. An example of such a restriction is the 64-bit vector support of the widely used statistical language R. On the R side, users do not need to change existing code and may not even notice the extension. On the other hand, interfacing 64-bit compiled code efficiently is challenging. Since many R packages for spatial data could benefit from 64-bit vectors, we investigate strategies to efficiently pass 64-bit vectors to compiled languages. More precisely, we show how to simply extend existing R packages using the foreign function interface to seamlessly support 64-bit vectors. This extension is shown with the sparse matrix algebra R package spam. The new capabilities are illustrated with an example of GIMMS NDVI3g data featuring a parametric modeling approach for a non-stationary covariance matrix.

  6. Intel Legend and CERN would build up high speed Internet

    CERN Multimedia

    2002-01-01

    Intel, Legend and China Education and Research Network jointly announced on the 25th of April that they will be cooperating with each other to build up the new generation high speed internet, over the next three years (1/2 page).

  7. Introduction to modern Fortran for the Earth system sciences

    CERN Document Server

    Chirila, Dragos B

    2014-01-01

    This work provides a short "getting started" guide to Fortran 90/95. The main target audience consists of newcomers to the field of numerical computation within Earth system sciences (students, researchers or scientific programmers). Furthermore, readers accustomed to other programming languages may also benefit from this work, by discovering how some programming techniques they are familiar with map to Fortran 95. The main goal is to enable readers to quickly start using Fortran 95 for writing useful programs. It also introduces a gradual discussion of Input/Output facilities relevant for Earth system sciences, from the simplest ones to the more advanced netCDF library (which has become a de facto standard for handling the massive datasets used within Earth system sciences). While related works already treat these disciplines separately (each often providing much more information than needed by the beginning practitioner), the reader finds in this book a shorter guide which links them. Compared to other book...

  8. Automatic generation of Fortran programs for algebraic simulation models

    International Nuclear Information System (INIS)

    Schopf, W.; Rexer, G.; Ruehle, R.

    1978-04-01

    This report documents a generator program by which econometric simulation models formulated in an application-orientated language can be transformed automatically in a Fortran program. Thus the model designer is able to build up, test and modify models without the need of a Fortran programmer. The development of a computer model is therefore simplified and shortened appreciably; in chapter 1-3 of this report all rules are presented for the application of the generator to the model design. Algebraic models including exogeneous and endogeneous time series variables, lead and lag function can be generated. In addition, to these language elements, Fortran sequences can be applied to the formulation of models in the case of complex model interrelations. Automatically the generated model is a module of the program system RSYST III and is therefore able to exchange input and output data with the central data bank of the system and in connection with the method library modules can be used to handle planning problems. (orig.) [de

  9. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    Energy Technology Data Exchange (ETDEWEB)

    Lichtner, Peter C. [OFM Research, Redmond, WA (United States); Hammond, Glenn E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lu, Chuan [Idaho National Lab. (INL), Idaho Falls, ID (United States); Karra, Satish [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bisht, Gautam [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Andre, Benjamin [National Center for Atmospheric Research, Boulder, CO (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Mills, Richard [Intel Corporation, Portland, OR (United States); Univ. of Tennessee, Knoxville, TN (United States); Kumar, Jitendra [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2015-01-20

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 232 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.

  10. Staggered Dslash Performance on Intel Xeon Phi Architecture

    OpenAIRE

    Li, Ruizi; Gottlieb, Steven

    2014-01-01

    The conjugate gradient (CG) algorithm is among the most essential and time consuming parts of lattice calculations with staggered quarks. We test the performance of CG and dslash, the key step in the CG algorithm, on the Intel Xeon Phi, also known as the Many Integrated Core (MIC) architecture. We try different parallelization strategies using MPI, OpenMP, and the vector processing units (VPUs).

  11. Global synchronization algorithms for the Intel iPSC/860

    Science.gov (United States)

    Seidel, Steven R.; Davis, Mark A.

    1992-01-01

    In a distributed memory multicomputer that has no global clock, global processor synchronization can only be achieved through software. Global synchronization algorithms are used in tridiagonal systems solvers, CFD codes, sequence comparison algorithms, and sorting algorithms. They are also useful for event simulation, debugging, and for solving mutual exclusion problems. For the Intel iPSC/860 in particular, global synchronization can be used to ensure the most effective use of the communication network for operations such as the shift, where each processor in a one-dimensional array or ring concurrently sends a message to its right (or left) neighbor. Three global synchronization algorithms are considered for the iPSC/860: the gysnc() primitive provided by Intel, the PICL primitive sync0(), and a new recursive doubling synchronization (RDS) algorithm. The performance of these algorithms is compared to the performance predicted by communication models of both the long and forced message protocols. Measurements of the cost of shift operations preceded by global synchronization show that the RDS algorithm always synchronizes the nodes more precisely and costs only slightly more than the other two algorithms.

  12. Scaling deep learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

    Energy Technology Data Exchange (ETDEWEB)

    Gawande, Nitin A.; Landwehr, Joshua B.; Daily, Jeffrey A.; Tallent, Nathan R.; Vishnu, Abhinav; Kerbyson, Darren J.

    2017-08-24

    Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors --- including NVIDIA, Intel, AMD, and IBM --- have architectural road-maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This paper provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross section of convolutional neural net workloads: CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks; and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs. However, its importance is heavily dependent on neural network architecture. Furthermore, for weak-scaling --- sometimes encouraged by restricted GPU memory --- NVLink is less important.

  13. High-performance computing on the Intel Xeon Phi how to fully exploit MIC architectures

    CERN Document Server

    Wang, Endong; Shen, Bo; Zhang, Guangyong; Lu, Xiaowei; Wu, Qing; Wang, Yajuan

    2014-01-01

    The aim of this book is to explain to high-performance computing (HPC) developers how to utilize the Intel® Xeon Phi™ series products efficiently. To that end, it introduces some computing grammar, programming technology and optimization methods for using many-integrated-core (MIC) platforms and also offers tips and tricks for actual use, based on the authors' first-hand optimization experience.The material is organized in three sections. The first section, "Basics of MIC", introduces the fundamentals of MIC architecture and programming, including the specific Intel MIC programming environment

  14. A fast random number generator for the Intel Paragon supercomputer

    Science.gov (United States)

    Gutbrod, F.

    1995-06-01

    A pseudo-random number generator is presented which makes optimal use of the architecture of the i860-microprocessor and which is expected to have a very long period. It is therefore a good candidate for use on the parallel supercomputer Paragon XP. In the assembler version, it needs 6.4 cycles for a real∗4 random number. There is a FORTRAN routine which yields identical numbers up to rare and minor rounding discrepancies, and it needs 28 cycles. The FORTRAN performance on other microprocessors is somewhat better. Arguments for the quality of the generator and some numerical tests are given.

  15. Exploring synchrotron radiation capabilities: The ALS-Intel CRADA

    International Nuclear Information System (INIS)

    Gozzo, F.; Cossy-Favre, A.; Padmore, H.

    1997-01-01

    Synchrotron radiation spectroscopy and spectromicroscopy were applied, at the Advanced Light Source, to the analysis of materials and problems of interest to the commercial semiconductor industry. The authors discuss some of the results obtained at the ALS using existing capabilities, in particular the small spot ultra-ESCA instrument on beamline 7.0 and the AMS (Applied Material Science) endstation on beamline 9.3.2. The continuing trend towards smaller feature size and increased performance for semiconductor components has driven the semiconductor industry to invest in the development of sophisticated and complex instrumentation for the characterization of microstructures. Among the crucial milestones established by the Semiconductor Industry Association are the needs for high quality, defect free and extremely clean silicon wafers, very thin gate oxides, lithographies near 0.1 micron and advanced material interconnect structures. The requirements of future generations cannot be met with current industrial technologies. The purpose of the ALS-Intel CRADA (Cooperative Research And Development Agreement) is to explore, compare and improve the utility of synchrotron-based techniques for practical analysis of substrates of interest to semiconductor chip manufacturing. The first phase of the CRADA project consisted in exploring existing ALS capabilities and techniques on some problems of interest. Some of the preliminary results obtained on Intel samples are discussed here

  16. Implementation of an Agent-Based Parallel Tissue Modelling Framework for the Intel MIC Architecture

    Directory of Open Access Journals (Sweden)

    Maciej Cytowski

    2017-01-01

    Full Text Available Timothy is a novel large scale modelling framework that allows simulating of biological processes involving different cellular colonies growing and interacting with variable environment. Timothy was designed for execution on massively parallel High Performance Computing (HPC systems. The high parallel scalability of the implementation allows for simulations of up to 109 individual cells (i.e., simulations at tissue spatial scales of up to 1 cm3 in size. With the recent advancements of the Timothy model, it has become critical to ensure appropriate performance level on emerging HPC architectures. For instance, the introduction of blood vessels supplying nutrients to the tissue is a very important step towards realistic simulations of complex biological processes, but it greatly increased the computational complexity of the model. In this paper, we describe the process of modernization of the application in order to achieve high computational performance on HPC hybrid systems based on modern Intel® MIC architecture. Experimental results on the Intel Xeon Phi™ coprocessor x100 and the Intel Xeon Phi processor x200 are presented.

  17. HANSF 1.3 user's manual

    Energy Technology Data Exchange (ETDEWEB)

    PLYS, M.G.

    1999-05-21

    The HANSF analysis tool is an integrated model considering phenomena inside a multi-canister overpack (MCO) spent nuclear fuel container such as fuel oxidation, convective and radiative heat transfer, and the potential for fission product release. It may be used for all phases of spent fuel disposition including cold vacuum drying, transportation, and storage. This manual reflects HANSF version 1.3, a revised version of version 1.2a. HANSF 1.3 was written to add new models for axial nodalization, add new features for ease of usage, and correct errors. HANSF 1.3 is intended for use on personal computers such as IBM-compatible machines with Intel processors running under a DOS-type operating system. HANSF 1.3 is known to compile under Lahey TI and Digital Visual FORTRAN, Version 6.0, but this does not preclude operation in other environments.

  18. HANSF 1.3 user's manual

    International Nuclear Information System (INIS)

    PLYS, M.G.

    1999-01-01

    The HANSF analysis tool is an integrated model considering phenomena inside a multi-canister overpack (MCO) spent nuclear fuel container such as fuel oxidation, convective and radiative heat transfer, and the potential for fission product release. It may be used for all phases of spent fuel disposition including cold vacuum drying, transportation, and storage. This manual reflects HANSF version 1.3, a revised version of version 1.2a. HANSF 1.3 was written to add new models for axial nodalization, add new features for ease of usage, and correct errors. HANSF 1.3 is intended for use on personal computers such as IBM-compatible machines with Intel processors running under a DOS-type operating system. HANSF 1.3 is known to compile under Lahey TI and Digital Visual FORTRAN, Version 6.0, but this does not preclude operation in other environments

  19. HANSF 1.3 user's manual; TOPICAL

    International Nuclear Information System (INIS)

    PLYS, M.G.

    1999-01-01

    The HANSF analysis tool is an integrated model considering phenomena inside a multi-canister overpack (MCO) spent nuclear fuel container such as fuel oxidation, convective and radiative heat transfer, and the potential for fission product release. It may be used for all phases of spent fuel disposition including cold vacuum drying, transportation, and storage. This manual reflects HANSF version 1.3, a revised version of version 1.2a. HANSF 1.3 was written to add new models for axial nodalization, add new features for ease of usage, and correct errors. HANSF 1.3 is intended for use on personal computers such as IBM-compatible machines with Intel processors running under a DOS-type operating system. HANSF 1.3 is known to compile under Lahey TI and Digital Visual FORTRAN, Version 6.0, but this does not preclude operation in other environments

  20. Protein Alignment on the Intel Xeon Phi Coprocessor

    OpenAIRE

    Ramstad, Jorun

    2015-01-01

    There is an increasing need for sensitive, high perfomance sequence alignemnet tools. With the growing databases of scientificly analyzed protein sequences, more compute power is necessary. Specialized architectures arise, and a transition from serial to specialized implementationsis is required. This thesis is a study of whether Intel 60's cores Xeon Phi coprocessor is a suitable architecture for implementation of a sequence alignment tool. The performance relative to existing tools are eval...

  1. Intel·ligència emocional a maternal

    OpenAIRE

    Missé Cortina, Jordi

    2015-01-01

    Inclusió d'activitats d'intel·ligència emocional a maternal A i B per al treball de l'adquisició de valors com l'autoestima, el respecte, la tolerància, etc. Inclusión de actividades de inteligencia emocional en maternal A y B para el trabajo de la adquisición de valores como la autoestima, el respeto, la tolerancia, etc. Practicum for the Psychology program on Educational Psychology.

  2. Communication overhead on the Intel iPSC-860 hypercube

    Science.gov (United States)

    Bokhari, Shahid H.

    1990-01-01

    Experiments were conducted on the Intel iPSC-860 hypercube in order to evaluate the overhead of interprocessor communication. It is demonstrated that: (1) contrary to popular belief, the distance between two communicating processors has a significant impact on communication time, (2) edge contention can increase communication time by a factor of more than 7, and (3) node contention has no measurable impact.

  3. Implementation of the Next Generation Attenuation (NGA) ground-motion prediction equations in Fortran and R

    Science.gov (United States)

    Kaklamanos, James; Boore, David M.; Thompson, Eric M.; Campbell, Kenneth W.

    2010-01-01

    This report presents two methods for implementing the earthquake ground-motion prediction equations released in 2008 as part of the Next Generation Attenuation of Ground Motions (NGA-West, or NGA) project coordinated by the Pacific Earthquake Engineering Research Center (PEER). These models were developed for predicting ground-motion parameters for shallow crustal earthquakes in active tectonic regions (such as California). Of the five ground-motion prediction equations (GMPEs) developed during the NGA project, four models are implemented: the GMPEs of Abrahamson and Silva (2008), Boore and Atkinson (2008), Campbell and Bozorgnia (2008), and Chiou and Youngs (2008a); these models are abbreviated as AS08, BA08, CB08, and CY08, respectively. Since site response is widely recognized as an important influence of ground motions, engineering applications typically require that such effects be modeled. The model of Idriss (2008) is not implemented in our programs because it does not explicitly include site response, whereas the other four models include site response and use the same variable to describe the site condition (VS30). We do not intend to discourage the use of the Idriss (2008) model, but we have chosen to implement the other four NGA models in our programs for those users who require ground-motion estimates for various site conditions. We have implemented the NGA models by using two separate programming languages: Fortran and R (R Development Core Team, 2010). Fortran, a compiled programming language, has been used in the scientific community for decades. R is an object-oriented language and environment for statistical computing that is gaining popularity in the statistical and scientific community. Derived from the S language and environment developed at Bell Laboratories, R is an open-source language that is freely available at http://www.r-project.org/ (last accessed 11 January 2011). In R, the functions for computing the NGA equations can be loaded as an

  4. Application of a parallel 3-dimensional hydrogeochemistry HPF code to a proposed waste disposal site at the Oak Ridge National Laboratory

    International Nuclear Information System (INIS)

    Gwo, Jin-Ping; Yeh, Gour-Tsyh

    1997-01-01

    The objectives of this study are (1) to parallelize a 3-dimensional hydrogeochemistry code and (2) to apply the parallel code to a proposed waste disposal site at the Oak Ridge National Laboratory (ORNL). The 2-dimensional hydrogeochemistry code HYDROGEOCHEM, developed at the Pennsylvania State University for coupled subsurface solute transport and chemical equilibrium processes, was first modified to accommodate 3-dimensional problem domains. A bi-conjugate gradient stabilized linear matrix solver was then incorporated to solve the matrix equation. We chose to parallelize the 3-dimensional code on the Intel Paragons at ORNL by using an HPF (high performance FORTRAN) compiler developed at PGI. The data- and task-parallel algorithms available in the HPF compiler proved to be highly efficient for the geochemistry calculation. This calculation can be easily implemented in HPF formats and is perfectly parallel because the chemical speciation on one finite-element node is virtually independent of those on the others. The parallel code was applied to a subwatershed of the Melton Branch at ORNL. Chemical heterogeneity, in addition to physical heterogeneities of the geological formations, has been identified as one of the major factors that affect the fate and transport of contaminants at ORNL. This study demonstrated an application of the 3-dimensional hydrogeochemistry code on the Melton Branch site. A uranium tailing problem that involved in aqueous complexation and precipitation-dissolution was tested. Performance statistics was collected on the Intel Paragons at ORNL. Implications of these results on the further optimization of the code were discussed

  5. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  6. Evaluating the transport layer of the ALFA framework for the Intel® Xeon Phi™ Coprocessor

    Science.gov (United States)

    Santogidis, Aram; Hirstius, Andreas; Lalis, Spyros

    2015-12-01

    The ALFA framework supports the software development of major High Energy Physics experiments. As part of our research effort to optimize the transport layer of ALFA, we focus on profiling its data transfer performance for inter-node communication on the Intel Xeon Phi Coprocessor. In this article we present the collected performance measurements with the related analysis of the results. The optimization opportunities that are discovered, help us to formulate the future plans of enabling high performance data transfer for ALFA on the Intel Xeon Phi architecture.

  7. Performance tuning Weather Research and Forecasting (WRF) Goddard longwave radiative transfer scheme on Intel Xeon Phi

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2015-10-01

    Next-generation mesoscale numerical weather prediction system, the Weather Research and Forecasting (WRF) model, is a designed for dual use for forecasting and research. WRF offers multiple physics options that can be combined in any way. One of the physics options is radiance computation. The major source for energy for the earth's climate is solar radiation. Thus, it is imperative to accurately model horizontal and vertical distribution of the heating. Goddard solar radiative transfer model includes the absorption duo to water vapor,ozone, ozygen, carbon dioxide, clouds and aerosols. The model computes the interactions among the absorption and scattering by clouds, aerosols, molecules and surface. Finally, fluxes are integrated over the entire longwave spectrum.In this paper, we present our results of optimizing the Goddard longwave radiative transfer scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The coprocessor supports all important Intel development tools. Thus, the development environment is familiar one to a vast number of CPU developers. Although, getting a maximum performance out of MICs will require using some novel optimization techniques. Those optimization techniques are discusses in this paper. The optimizations improved the performance of the original Goddard longwave radiative transfer scheme on Xeon Phi 7120P by a factor of 2.2x. Furthermore, the same optimizations improved the performance of the Goddard longwave radiative transfer scheme on a dual socket configuration of eight core Intel Xeon E5-2670 CPUs by a factor of 2.1x compared to the original Goddard longwave radiative transfer scheme code.

  8. MAPLIB, Thermodynamics Materials Property Generator for FORTRAN Program

    International Nuclear Information System (INIS)

    Schumann, U.; Zimmerer, W. and others

    1978-01-01

    1 - Nature of physical problem solved: MAPLIB is a program system which is able to incorporate the values of the properties of any material in a form suitable for use in other computer programs. The data are implemented in FORTRAN functions. A utility program is provided to assist in library management. 2 - Method of solution: MAPLIB consists of the following parts: 1) Conventions for the data format. 2) Some integrated data. 3) A data access system (FORTRAN subroutine). 4) An utility program for updating and documentation of the actual library content. The central part is a set of FORTRAN functions, e.g. WL H2O v(t,p) (heat conduction of water vapor as a function of temperature and pressure), which compute the required data and which can be called by the user program. The data content of MAPLIB has been delivered by many persons. There was no systematic evaluation of the material. It is the responsibility of every user to check the data for physical accuracy. MAPLIB only serves as a library system for manipulation and storing of such data. 3 - Restrictions on the complexity of the problem: a) See responsibility as explained above. b) Up to 1000 data functions could be implemented. c) If too many data functions are included in MAPLIB, the storage requirements become excessive for application in users programs

  9. FORTRAN data files transference from VAX/VMS to ALPHA/UNIX

    International Nuclear Information System (INIS)

    Sanchez, E.; Milligen, B.Ph. van

    1997-01-01

    Several tools have been developed to access the TJ-I and TJ-IU databases, which currently reside in VAX/VMS servers, from the TJ-II Data Acquisition System DEC ALPHA 8400 server. The TJ-I/TJ-IU databases are not homogeneous and contain several types of data files, namely, SADE. CAMAC and FORTRAN un formatted files. The tools presented in this report allow one to transfer CAMAC and those FORTRAN un formatted files defined herein. from a VAX/VMS server, for data manipulation on the ALPHA/Digital UNIX server. (Author) 5 refs

  10. Numerical methods of mathematical optimization with Algol and Fortran programs

    CERN Document Server

    Künzi, Hans P; Zehnder, C A; Rheinboldt, Werner

    1971-01-01

    Numerical Methods of Mathematical Optimization: With ALGOL and FORTRAN Programs reviews the theory and the practical application of the numerical methods of mathematical optimization. An ALGOL and a FORTRAN program was developed for each one of the algorithms described in the theoretical section. This should result in easy access to the application of the different optimization methods.Comprised of four chapters, this volume begins with a discussion on the theory of linear and nonlinear optimization, with the main stress on an easily understood, mathematically precise presentation. In addition

  11. Intel Many Integrated Core (MIC) architecture optimization strategies for a memory-bound Weather Research and Forecasting (WRF) Goddard microphysics scheme

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Goddard cloud microphysics scheme is a sophisticated cloud microphysics scheme in the Weather Research and Forecasting (WRF) model. The WRF is a widely used weather prediction system in the world. It development is a done in collaborative around the globe. The Goddard microphysics scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. Compared to the earlier microphysics schemes, the Goddard scheme incorporates a large number of improvements. Thus, we have optimized the code of this important part of WRF. In this paper, we present our results of optimizing the Goddard microphysics scheme on Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi coprocessor is the first product based on Intel MIC architecture, and it consists of up to 61 cores connected by a high performance on-die bidirectional interconnect. The Intel MIC is capable of executing a full operating system and entire programs rather than just kernels as the GPU do. The MIC coprocessor supports all important Intel development tools. Thus, the development environment is familiar one to a vast number of CPU developers. Although, getting a maximum performance out of MICs will require using some novel optimization techniques. Those optimization techniques are discusses in this paper. The results show that the optimizations improved performance of the original code on Xeon Phi 7120P by a factor of 4.7x. Furthermore, the same optimizations improved performance on a dual socket Intel Xeon E5-2670 system by a factor of 2.8x compared to the original code.

  12. Dynamic Memory De-allocation in Fortran 95/2003 Derived Type Calculus

    Directory of Open Access Journals (Sweden)

    Damian W.I. Rouson

    2005-01-01

    Full Text Available Abstract data types developed for computational science and engineering are frequently modeled after physical objects whose state variables must satisfy governing differential equations. Generalizing the associated algebraic and differential operators to operate on the abstract data types facilitates high-level program constructs that mimic standard mathematical notation. For non-trivial expressions, multiple object instantiations must occur to hold intermediate results during the expression's evaluation. When the dimension of each object's state space is not specified at compile-time, the programmer becomes responsible for dynamically allocating and de-allocating memory for each instantiation. With the advent of allocatable components in Fortran 2003 derived types, the potential exists for these intermediate results to occupy a substantial fraction of a program's footprint in memory. This issue becomes particularly acute at the highest levels of abstraction where coarse-grained data structures predominate. This paper proposes a set of rules for de-allocating memory that has been dynamically allocated for intermediate results in derived type calculus, while distinguishing that memory from more persistent objects. The new rules are applied to the design of a polymorphic time integrator for integrating evolution equations governing dynamical systems. Associated issues of efficiency and design robustness are discussed.

  13. Implementation of High-Order Multireference Coupled-Cluster Methods on Intel Many Integrated Core Architecture.

    Science.gov (United States)

    Aprà, E; Kowalski, K

    2016-03-08

    In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.

  14. Pattern recognition in molecular dynamics. [FORTRAN

    Energy Technology Data Exchange (ETDEWEB)

    Zurek, W H; Schieve, W C [Texas Univ., Austin (USA)

    1977-07-01

    An algorithm for the recognition of the formation of bound molecular states in the computer simulation of a dilute gas is presented. Applications to various related problems in physics and chemistry are pointed out. Data structure and decision processes are described. Performance of the FORTRAN program based on the algorithm in cooperation with the molecular dynamics program is described and the results are presented.

  15. Engineering a compiler

    CERN Document Server

    Cooper, Keith D

    2012-01-01

    As computing has changed, so has the role of both the compiler and the compiler writer. The proliferation of processors, environments, and constraints demands an equally large number of compilers. To adapt, compiler writers retarget code generators, add optimizations, and work on issues such as code space or power consumption. Engineering a Compiler re-balances the curriculum for an introductory course in compiler construction to reflect the issues that arise in today's practice. Authors Keith Cooper and Linda Torczon convey both the art and the science of compiler construction and show best practice algorithms for the major problems inside a compiler. ·Focuses on the back end of the compiler-reflecting the focus of research and development over the last decade ·Applies the well-developed theory behind scanning and parsing to introduce concepts that play a critical role in optimization and code generation. ·Introduces the student to optimization through data-flow analysis, SSA form, and a selection of sc...

  16. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    Full Text Available To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  17. Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned

    OpenAIRE

    Georgios Goumas

    2014-01-01

    In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved both the evaluation of programming models including OpenCL, POSIX threads and OpenMP and typical optimization strategies like parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the code encountered performance problems that require further analysis, we focused our efforts on the impl...

  18. Single event effect testing of the Intel 80386 family and the 80486 microprocessor

    International Nuclear Information System (INIS)

    Moran, A.; LaBel, K.; Gates, M.; Seidleck, C.; McGraw, R.; Broida, M.; Firer, J.; Sprehn, S.

    1996-01-01

    The authors present single event effect test results for the Intel 80386 microprocessor, the 80387 coprocessor, the 82380 peripheral device, and on the 80486 microprocessor. Both single event upset and latchup conditions were monitored

  19. Bridging FORTRAN to object oriented paradigm for HEP data modeling task

    International Nuclear Information System (INIS)

    Huang, J.

    1993-12-01

    Object oriented (OO) technology appears to offer tangible benefits to the high energy physics (HEP) community. Yet many physicists view this newest software development used with much reservation and reluctance. Facing the reality of having to support the existing physics applications, which are written in FORTRAN, the software engineers in the Computer Engineering Group of the Physics Research Division at the Superconducting Super Collider Laboratory have accepted the challenge of mixing an old language with the new technology. This paper describes the experience and the techniques devised to fit FORTRAN into the OO paradigm (OOP)

  20. The Transition and Adoption to Modern Programming Concepts for Scientific Computing in Fortran

    Directory of Open Access Journals (Sweden)

    Charles D. Norton

    2007-01-01

    Full Text Available This paper describes our experiences in the early exploration of modern concepts introduced in Fortran90 for large-scale scientific programming. We review our early work in expressing object-oriented concepts based on the new Fortran90 constructs – foreign to most programmers at the time – our experimental work in applying them to various applications, the impact on the WG5/J3 standards committees to consider formalizing object-oriented constructs for later versions of Fortran, and work in exploring how other modern programming techniques such as Design Patterns can and have impacted our software development. Applications will be drawn from plasma particle simulation and finite element adaptive mesh refinement for solid earth crustal deformation modeling.

  1. Numerical integration subprogrammes in Fortran II-D

    Energy Technology Data Exchange (ETDEWEB)

    Fry, C. R.

    1966-12-15

    This note briefly describes some integration subprogrammes written in FORTRAN II-D for the IBM 1620-II at CARDE. These presented are two Newton-Cotes, Chebyshev polynomial summation, Filon's, Nordsieck's and optimum Runge-Kutta and predictor-corrector methods. A few miscellaneous numerical integration procedures are also mentioned covering statistical functions, oscillating integrands and functions occurring in electrical engineering.

  2. Program package for multicanonical simulations of U(1) lattice gauge theory-Second version

    Science.gov (United States)

    Bazavov, Alexei; Berg, Bernd A.

    2013-03-01

    A new version STMCMUCA_V1_1 of our program package is available. It eliminates compatibility problems of our Fortran 77 code, originally developed for the g77 compiler, with Fortran 90 and 95 compilers. New version program summaryProgram title: STMC_U1MUCA_v1_1 Catalogue identifier: AEET_v1_1 Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html Programming language: Fortran 77 compatible with Fortran 90 and 95 Computers: Any capable of compiling and executing Fortran code Operating systems: Any capable of compiling and executing Fortran code RAM: 10 MB and up depending on lattice size used No. of lines in distributed program, including test data, etc.: 15059 No. of bytes in distributed program, including test data, etc.: 215733 Keywords: Markov chain Monte Carlo, multicanonical, Wang-Landau recursion, Fortran, lattice gauge theory, U(1) gauge group, phase transitions of continuous systems Classification: 11.5 Catalogue identifier of previous version: AEET_v1_0 Journal Reference of previous version: Computer Physics Communications 180 (2009) 2339-2347 Does the new version supersede the previous version?: Yes Nature of problem: Efficient Markov chain Monte Carlo simulation of U(1) lattice gauge theory (or other continuous systems) close to its phase transition. Measurements and analysis of the action per plaquette, the specific heat, Polyakov loops and their structure factors. Solution method: Multicanonical simulations with an initial Wang-Landau recursion to determine suitable weight factors. Reweighting to physical values using logarithmic coding and calculating jackknife error bars. Reasons for the new version: The previous version was developed for the g77 compiler Fortran 77 version. Compiler errors were encountered with Fortran 90 and Fortran 95 compilers (specified below). Summary of revisions: epsilon=one/10**10 is replaced by epsilon/10.0D10 in the parameter statements of the subroutines u1_bmha.f, u1_mucabmha.f, u1wl

  3. Fortran code for generating random probability vectors, unitaries, and quantum states

    Directory of Open Access Journals (Sweden)

    Jonas eMaziero

    2016-03-01

    Full Text Available The usefulness of generating random configurations is recognized in many areas of knowledge. Fortran was born for scientific computing and has been one of the main programming languages in this area since then. And several ongoing projects targeting towards its betterment indicate that it will keep this status in the decades to come. In this article, we describe Fortran codes produced, or organized, for the generation of the following random objects: numbers, probability vectors, unitary matrices, and quantum state vectors and density matrices. Some matrix functions are also included and may be of independent interest.

  4. Compiler Feedback using Continuous Dynamic Compilation during Development

    DEFF Research Database (Denmark)

    Jensen, Nicklas Bo; Karlsson, Sven; Probst, Christian W.

    2014-01-01

    to optimization. This tool can help programmers understand what the optimizing compiler has done and suggest automatic source code changes in cases where the compiler refrains from optimizing. We have integrated our tool into an integrated development environment, interactively giving feedback as part...

  5. NSW Executive Enhancements

    Science.gov (United States)

    1981-06-01

    attacked most effectively with networking. Indeed, it was the marked success of the Arpanet in providing programmers economical access to diverse... economically feasible in the present -5- -**nm- ,-- ^ NSW Final Report for the Period Ending December 1980 state of the art: it would have required...INTEL 8080 MFU. PLM80: a cross compiler for the INTEL 8080 MPU . MACR020: a cross assembler for the AN/UYK-20 computer. CMS2-M: a cross compiler

  6. FORTRAN text correction with the CDC-1604-A console typewriter during reading a punched card program

    International Nuclear Information System (INIS)

    Kotorobaj, F.; Ruzhichka, Ya.; Stolyarskij, Yu.V.

    1977-01-01

    The paper describes FORTRAN text correction with the CDC 1604-A console typewriter during reading a punched card program. This method gives one more possibility of FORTRAN program correction during program's input to the CDC 1604-A computer. This essentially reduced the time necessary for punched card correction with other methods. Possibility of inputting desired number of punched cards one after another allows one writing small FORTRAN programs to computer core storage with simultaneous punching of the cards. The correction program has been written to the CDC 1604 COOP monitor

  7. User manual for two simple postscript output FORTRAN plotting routines

    Science.gov (United States)

    Nguyen, T. X.

    1991-01-01

    Graphics is one of the important tools in engineering analysis and design. However, plotting routines that generate output on high quality laser printers normally come in graphics packages, which tend to be expensive and system dependent. These factors become important for small computer systems or desktop computers, especially when only some form of a simple plotting routine is sufficient. With the Postscript language becoming popular, there are more and more Postscript laser printers now available. Simple, versatile, low cost plotting routines that can generate output on high quality laser printers are needed and standard FORTRAN language plotting routines using output in Postscript language seems logical. The purpose here is to explain two simple FORTRAN plotting routines that generate output in Postscript language.

  8. Multidimentional and Multi-Parameter Fortran-Based Curve Fitting ...

    African Journals Online (AJOL)

    This work briefly describes the mathematics behind the algorithm, and also elaborates how to implement it using FORTRAN 95 programming language. The advantage of this algorithm, when it is extended to surfaces and complex functions, is that it makes researchers to have a better trust during fitting. It also improves the ...

  9. An Introduction to Fortran Programming: An IPI Approach.

    Science.gov (United States)

    Fisher, D. D.; And Others

    This text is designed to give individually paced instruction in Fortran Programing. The text contains fifteen units. Unit titles include: Flowcharts, Input and Output, Loops, and Debugging. Also included is an extensive set of appendices. These were designed to contain a great deal of practical information necessary to the course. These appendices…

  10. FORTRAN programs for transient eddy current calculations using a perturbation-polynomial expansion technique

    International Nuclear Information System (INIS)

    Carpenter, K.H.

    1976-11-01

    A description is given of FORTRAN programs for transient eddy current calculations in thin, non-magnetic conductors using a perturbation-polynomial expansion technique. Basic equations are presented as well as flow charts for the programs implementing them. The implementation is in two steps--a batch program to produce an intermediate data file and interactive programs to produce graphical output. FORTRAN source listings are included for all program elements, and sample inputs and outputs are given for the major programs

  11. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™

    OpenAIRE

    Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.

    2015-01-01

    We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high perfo...

  12. A Computer Program to Compile a Flander-Amidon Interaction Analysis Matrix

    Science.gov (United States)

    Hardy, Robert C.

    1970-01-01

    A program was written in FORTRAN IV for an IBM 3600 to produce the Flanders-Amidon Interaction Analysis Matrix and to also produce percentages of certain p FORTRAN IV and V for the Univac 1108. (Editor/RT)

  13. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    Science.gov (United States)

    Dey, T.; Rodrigue, P.

    2015-07-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computational intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel and the 103 to 104 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector and in a second step the sum of the radiological path taking into account attenuation is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization either on the Xeon Phi and the Host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware specific intrinsic instructions (so-called `intrinsics') to allow manually-optimized vectorization. For parallelization either OpenMP and ISPC tasking (based on pthreads) are evaluated.Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The examination

  14. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    International Nuclear Information System (INIS)

    Dey, T.; Rodrigue, P.

    2015-01-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computational intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel and the 10 3 to 10 4 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector and in a second step the sum of the radiological path taking into account attenuation is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization either on the Xeon Phi and the Host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware specific intrinsic instructions (so-called 'intrinsics') to allow manually-optimized vectorization. For parallelization either OpenMP and ISPC tasking (based on pthreads) are evaluated.Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The

  15. Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor

    OpenAIRE

    Lu, Mian; Zhang, Lei; Huynh, Huynh Phung; Ong, Zhongliang; Liang, Yun; He, Bingsheng; Goh, Rick Siow Mong; Huynh, Richard

    2013-01-01

    With the ease-of-programming, flexibility and yet efficiency, MapReduce has become one of the most popular frameworks for building big-data applications. MapReduce was originally designed for distributed-computing, and has been extended to various architectures, e,g, multi-core CPUs, GPUs and FPGAs. In this work, we focus on optimizing the MapReduce framework on Xeon Phi, which is the latest product released by Intel based on the Many Integrated Core Architecture. To the best of our knowledge...

  16. Does the Intel Xeon Phi processor fit HEP workloads?

    Science.gov (United States)

    Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.

    2014-06-01

    This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis a vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  17. Profiling CPU-bound workloads on Intel Haswell-EP platforms

    CERN Document Server

    Guerri, Marco; Cristovao, Cordeiro; CERN. Geneva. IT Department

    2017-01-01

    With the increasing adoption of public and private cloud resources to support the demands in terms of computing capacity of the WLCG, the HEP community has begun studying several benchmarking applications aimed at continuously assessing the performance of virtual machines procured from commercial providers. In order to characterise the behaviour of these benchmarks, in-depth profiling activities have been carried out. In this document we outline our experience in profiling one specific application, the ATLAS Kit Validation, in an attempt to explain an unexpected distribution in the performance samples obtained on systems based on Intel Haswell-EP processors.

  18. Nonalgebraic symbol manipulators for use in scientific and engineering modeling: introducing the FORSE (FORtran Symobol Expander)

    International Nuclear Information System (INIS)

    Schultz, J.H.; Lettvin, J.D.

    1982-02-01

    A system of nonalgebraic symbol manipulators, called The FORSE (FORtran Symbol Expanders) has been developed to document and prepare input-output for Fortran programs. The use of documentation at the level of the individual equation is defended. The operation of The FORSE is described, along with user instructions and a worked example

  19. Existing Fortran interfaces to Trilinos in preparation for exascale ForTrilinos development

    Energy Technology Data Exchange (ETDEWEB)

    Evans, Katherine J. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Young, Mitchell T. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Collins, Benjamin S. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Johnson, Seth R. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Prokopenko, Andrey V. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Heroux, Michael A. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-03-01

    This report summarizes the current state of Fortran interfaces to the Trilinos library within several key applications of the Exascale Computing Program (ECP), with the aim of informing developers about strategies to develop ForTrilinos, an exascale-ready, Fortran interface software package within Trilinos. The two software projects assessed within are the DOE Office of Science's Accelerated Climate Model for Energy (ACME) atmosphere component, CAM, and the DOE Office of Nuclear Energy's core-simulator portion of VERA, a nuclear reactor simulation code. Trilinos is an object-oriented, C++ based software project, and spans a collection of algorithms and other enabling technologies such as uncertainty quantification and mesh generation. To date, Trilinos has enabled these codes to achieve large-scale simulation results, however the simulation needs of CAM and VERA-CS will approach exascale over the next five years. A Fortran interface to Trilinos that enables efficient use of programming models and more advanced algorithms is necessary. Where appropriate, the needs of the CAM and VERA-CS software to achieve their simulation goals are called out specifically. With this report, a design document and execution plan for ForTrilinos development can proceed.

  20. VizieR Online Data Catalog: Habitable zones around main-sequence stars (Kopparapu+, 2014)

    Science.gov (United States)

    Kopparapu, R. K.; Ramirez, R. M.; Schottelkotte, J.; Kasting, J. F.; Domagal-Goldman, S.; Eymet, V.

    2017-08-01

    Language: Fortran 90 Code tested under the following compilers/operating systems: ifort/CentOS linux Description of input data: No input necessary. Description of output data: Output files: HZs.dat, HZ_coefficients.dat System requirements: No major system requirement. Fortran compiler necessary. Calls to external routines: None. Additional comments: None (1 data file).

  1. Computer Language For Optimization Of Design

    Science.gov (United States)

    Scotti, Stephen J.; Lucas, Stephen H.

    1991-01-01

    SOL is computer language geared to solution of design problems. Includes mathematical modeling and logical capabilities of computer language like FORTRAN; also includes additional power of nonlinear mathematical programming methods at language level. SOL compiler takes SOL-language statements and generates equivalent FORTRAN code and system calls. Provides syntactic and semantic checking for recovery from errors and provides detailed reports containing cross-references to show where each variable used. Implemented on VAX/VMS computer systems. Requires VAX FORTRAN compiler to produce executable program.

  2. Calculation of normalised organ and effective doses to adult reference computational phantoms from contemporary computed tomography scanners

    International Nuclear Information System (INIS)

    Jansen, Jan T.M.; Shrimpton, Paul C.

    2010-01-01

    The general-purpose Monte Carlo radiation transport code MCNPX has been used to simulate photon transport and energy deposition in anthropomorphic phantoms due to the x-ray exposure from the Philips iCT 256 and Siemens Definition CT scanners, together with the previously studied General Electric 9800. The MCNPX code was compiled with the Intel FORTRAN compiler and run on a Linux PC cluster. A patch has been successfully applied to reduce computing times by about 4%. The International Commission on Radiological Protection (ICRP) has recently published the Adult Male (AM) and Adult Female (AF) reference computational voxel phantoms as successors to the Medical Internal Radiation Dose (MIRD) stylised hermaphrodite mathematical phantoms that form the basis for the widely-used ImPACT CT dosimetry tool. Comparisons of normalised organ and effective doses calculated for a range of scanner operating conditions have demonstrated significant differences in results (in excess of 30%) between the voxel and mathematical phantoms as a result of variations in anatomy. These analyses illustrate the significant influence of choice of phantom on normalised organ doses and the need for standardisation to facilitate comparisons of dose. Further such dose simulations are needed in order to update the ImPACT CT Patient Dosimetry spreadsheet for contemporary CT practice. (author)

  3. Implementation of Neutronics Analysis Code using the Features of Object Oriented Programming via Fortran90/95

    Energy Technology Data Exchange (ETDEWEB)

    Han, Tae Young; Cho, Beom Jin [KEPCO Nuclear Fuel, Daejeon (Korea, Republic of)

    2011-05-15

    The object-oriented programming (OOP) concept was radically established after 1990s and successfully involved in Fortran 90/95. The features of OOP are such as the information hiding, encapsulation, modularity and inheritance, which lead to producing code that satisfy three R's: reusability, reliability and readability. The major OOP concepts, however, except Module are not mainly used in neutronics analysis codes even though the code was written by Fortran 90/95. In this work, we show that the OOP concept can be employed to develop the neutronics analysis code, ASTRA1D (Advanced Static and Transient Reactor Analyzer for 1-Dimension), via Fortran90/95 and those can be more efficient and reasonable programming methods

  4. A FORTRAN program for a least-square fitting

    International Nuclear Information System (INIS)

    Yamazaki, Tetsuo

    1978-01-01

    A practical FORTRAN program for a least-squares fitting is presented. Although the method is quite usual, the program calculates not only the most satisfactory set of values of unknowns but also the plausible errors associated with them. As an example, a measured lateral absorbed-dose distribution in water for a narrow 25-MeV electron beam is fitted to a Gaussian distribution. (auth.)

  5. Evaluation of the Intel Xeon Phi 7120 and NVIDIA K80 as accelerators for two-dimensional panel codes.

    Science.gov (United States)

    Einkemmer, Lukas

    2017-01-01

    To optimize the geometry of airfoils for a specific application is an important engineering problem. In this context genetic algorithms have enjoyed some success as they are able to explore the search space without getting stuck in local optima. However, these algorithms require the computation of aerodynamic properties for a significant number of airfoil geometries. Consequently, for low-speed aerodynamics, panel methods are most often used as the inner solver. In this paper we evaluate the performance of such an optimization algorithm on modern accelerators (more specifically, the Intel Xeon Phi 7120 and the NVIDIA K80). For that purpose, we have implemented an optimized version of the algorithm on the CPU and Xeon Phi (based on OpenMP, vectorization, and the Intel MKL library) and on the GPU (based on CUDA and the MAGMA library). We present timing results for all codes and discuss the similarities and differences between the three implementations. Overall, we observe a speedup of approximately 2.5 for adding an Intel Xeon Phi 7120 to a dual socket workstation and a speedup between 3.4 and 3.8 for adding a NVIDIA K80 to a dual socket workstation.

  6. Ada Compiler Validation Summary Report: Certificate Number: 940325S1. 11352 DDC-I DACS Sun SPARC/Solaries to Pentium PM Bare Ada Cross Compiler System, Version 4.6.4 Sun SPARCclassic = Intel Pentium (Operated as Bare Machine) Based in Xpress Desktop (Intel Product Number: XBASE6E4F-B)

    Science.gov (United States)

    1994-03-25

    Best Available Copy REPORT DOCUMENTATION PAGE _V -ONC Uft Xaf. WO -" Am u~~ ns~ 940325SI. 11352 , AVV: 94ddc5OO_3d. Compiler: DACS Sun SPARC/ aonais to...Manual for the Ada Proarammina Language, ANSI/MIL-STD-1815A, February 1983 and ISO 8652-1987. [Pro92] Ada Coupiler Validation Procedures, Version 3.1...objectives found to be irrelevant for the given Ada implementation. ISO International Organization for Standardization. LRM The Ada standard, or

  7. Why K-12 IT Managers and Administrators Are Embracing the Intel-Based Mac

    Science.gov (United States)

    Technology & Learning, 2007

    2007-01-01

    Over the past year, Apple has dramatically increased its share of the school computer marketplace--especially in the category of notebook computers. A recent study conducted by Grunwald Associates and Rockman et al. reports that one of the major reasons for this growth is Apple's introduction of the Intel processor to the entire line of Mac…

  8. Autonomous controller (JCAM 10) for CAMAC crate with 8080 (INTEL) microprocessor

    International Nuclear Information System (INIS)

    Gallice, P.; Mathis, M.

    1975-01-01

    The CAMAC crate autonomous controller JCAM-10 is designed around an INTEL 8080 microprocessor in association with a 5K RAM and 4K REPROM memory. The concept of the module is described, in which data transfers between CAMAC modules and the memory are optimised from software point of view as well as from execution time. In fact, the JCAM-10 is a microcomputer with a set of 1000 peripheral units represented by the CAMAC modules commercially available

  9. Mashup d'aplicacions basat en un buscador intel·ligent

    OpenAIRE

    Sancho Piqueras, Javier

    2010-01-01

    Mashup de funcionalitats, basat en un cercador intel·ligent, en aquest cas pensat per a cursos, carreres màsters, etc. La finalitat és adjuntar diverses aplicacions amb l'únic propòsit que en aquest cas és un buscador però que també ens permet utilitzar eines per a la connectivitat mitjançant web Services, o xarxes socials. Mashup de funcionalidades, basado en un buscador inteligente, en este caso pensado para cursos, carreras másters, etc. La finalidad es juntar diversas aplicaciones con ...

  10. Performance Evaluation of an Intel Haswell- and Ivy Bridge-Based Supercomputer Using Scientific and Engineering Applications

    Science.gov (United States)

    Saini, Subhash; Hood, Robert T.; Chang, Johnny; Baron, John

    2016-01-01

    We present a performance evaluation conducted on a production supercomputer of the Intel Xeon Processor E5- 2680v3, a twelve-core implementation of the fourth-generation Haswell architecture, and compare it with Intel Xeon Processor E5-2680v2, an Ivy Bridge implementation of the third-generation Sandy Bridge architecture. Several new architectural features have been incorporated in Haswell including improvements in all levels of the memory hierarchy as well as improvements to vector instructions and power management. We critically evaluate these new features of Haswell and compare with Ivy Bridge using several low-level benchmarks including subset of HPCC, HPCG and four full-scale scientific and engineering applications. We also present a model to predict the performance of HPCG and Cart3D within 5%, and Overflow within 10% accuracy.

  11. The formal specification of abstract data types and their implementation in Fortran 90: implementation issues concerning the use of pointers

    Science.gov (United States)

    Maley, D.; Kilpatrick, P. L.; Schreiner, E. W.; Scott, N. S.; Diercksen, G. H. F.

    1996-10-01

    In this paper we continue our investigation into the development of computational-science software based on the identification and formal specification of Abstract Data Types (ADTs) and their implementation in Fortran 90. In particular, we consider the consequences of using pointers when implementing a formally specified ADT in Fortran 90. Our aim is to highlight the resulting conflict between the goal of information hiding, which is central to the ADT methodology, and the space efficiency of the implementation. We show that the issue of storage recovery cannot be avoided by the ADT user, and present a range of implementations of a simple ADT to illustrate various approaches towards satisfactory storage management. Finally, we propose a set of guidelines for implementing ADTs using pointers in Fortran 90. These guidelines offer a way gracefully to provide disposal operations in Fortran 90. Such an approach is desirable since Fortran 90 does not provide automatic garbage collection which is offered by many object-oriented languages including Eiffel, Java, Smalltalk, and Simula.

  12. Does the Intel Xeon Phi processor fit HEP workloads?

    International Nuclear Information System (INIS)

    Nowak, A; Bitzes, G; Dotti, A; Lazzaro, A; Jarp, S; Szostek, P; Valsan, L; Botezatu, M; Leduc, J

    2014-01-01

    This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis a vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  13. Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi

    OpenAIRE

    Stanic, Milan; Palomar, Oscar; Ratkovic, Ivan; Duric, Milovan; Unsal, Osman; Cristal, Adrian; Valero, Mateo

    2014-01-01

    Graph500 is a data intensive application for high performance computing and it is an increasingly important workload because graphs are a core part of most analytic applications. So far there is no work that examines if Graph500 is suitable for vectorization mostly due a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently released by Intel with new features such as a wide 512-bit vector unit and vector scatter/gather instru...

  14. Newsgroups, Activist Publics, and Corporate Apologia: The Case of Intel and Its Pentium Chip.

    Science.gov (United States)

    Hearit, Keith Michael

    1999-01-01

    Applies J. Grunig's theory of publics to the phenomenon of Internet newsgroups using the case of the flawed Intel Pentium chip. Argues that technology facilitates the rapid movement of publics from the theoretical construct stage to the active stage. Illustrates some of the difficulties companies face in establishing their identity in cyberspace.…

  15. 3-D electromagnetic plasma particle simulations on the Intel Delta parallel computer

    International Nuclear Information System (INIS)

    Wang, J.; Liewer, P.C.

    1994-01-01

    A three-dimensional electromagnetic PIC code has been developed on the 512 node Intel Touchstone Delta MIMD parallel computer. This code is based on the General Concurrent PIC algorithm which uses a domain decomposition to divide the computation among the processors. The 3D simulation domain can be partitioned into 1-, 2-, or 3-dimensional sub-domains. Particles must be exchanged between processors as they move among the subdomains. The Intel Delta allows one to use this code for very-large-scale simulations (i.e. over 10 8 particles and 10 6 grid cells). The parallel efficiency of this code is measured, and the overall code performance on the Delta is compared with that on Cray supercomputers. It is shown that their code runs with a high parallel efficiency of ≥ 95% for large size problems. The particle push time achieved is 115 nsecs/particle/time step for 162 million particles on 512 nodes. Comparing with the performance on a single processor Cray C90, this represents a factor of 58 speedup. The code uses a finite-difference leap frog method for field solve which is significantly more efficient than fast fourier transforms on parallel computers. The performance of this code on the 128 node Cray T3D will also be discussed

  16. Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing

    OpenAIRE

    Kanamori, Issaku; Matsufuru, Hideo

    2017-01-01

    We investigate implementation of lattice Quantum Chromodynamics (QCD) code on the Intel Xeon Phi Knights Landing (KNL). The most time consuming part of the numerical simulations of lattice QCD is a solver of linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we examine rather general methods for the SIMD architecture of KNL, such as using intrinsics and manual prefetching, to the matrix multiplication an...

  17. Optimizing zonal advection of the Advanced Research WRF (ARW) dynamics for Intel MIC

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Weather Research and Forecast (WRF) model is the most widely used community weather forecast and research model in the world. There are two distinct varieties of WRF. The Advanced Research WRF (ARW) is an experimental, advanced research version featuring very high resolution. The WRF Nonhydrostatic Mesoscale Model (WRF-NMM) has been designed for forecasting operations. WRF consists of dynamics code and several physics modules. The WRF-ARW core is based on an Eulerian solver for the fully compressible nonhydrostatic equations. In the paper, we will use Intel Intel Many Integrated Core (MIC) architecture to substantially increase the performance of a zonal advection subroutine for optimization. It is of the most time consuming routines in the ARW dynamics core. Advection advances the explicit perturbation horizontal momentum equations by adding in the large-timestep tendency along with the small timestep pressure gradient tendency. We will describe the challenges we met during the development of a high-speed dynamics code subroutine for MIC architecture. Furthermore, lessons learned from the code optimization process will be discussed. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 2.4x.

  18. Reflective memory recorder upgrade: an opportunity to benchmark PowerPC and Intel architectures for real time

    Science.gov (United States)

    Abuter, Roberto; Tischer, Helmut; Frahm, Robert

    2014-07-01

    Several high frequency loops are required to run the VLTI (Very Large Telescope Interferometer) 2, e.g. for fringe tracking11, 5, angle tracking, vibration cancellation, data capture. All these loops rely on low latency real time computers based on the VME bus, Motorola PowerPC14 hardware architecture. In this context, one highly demanding application in terms of cycle time, latency and data transfer volume is the VLTI centralized recording facility, so called, RMN recorder1 (Reflective Memory Recorder). This application captures and transfers data flowing through the distributed memory of the system in real time. Some of the VLTI data producers are running with frequencies up to 8 KHz. With the evolution from first generation instruments like MIDI3, PRIMA5, and AMBER4 which use one or two baselines, to second generation instruments like MATISSE10 and GRAVITY9 which will use all six baselines simultaneously, the quantity of signals has increased by, at least, a factor of six. This has led to a significant overload of the RMN recorder1 which has reached the natural limits imposed by the underlying hardware. At the same time, new, more powerful computers, based on the Intel multicore families of CPUs and PCI buses have become available. With the purpose of improving the performance of the RMN recorder1 application and in order to make it capable of coping with the demands of the new generation instruments, a slightly modified implementation has been developed and integrated into an Intel based multicore computer15 running the VxWorks17 real time operating system. The core of the application is based on the standard VLT software framework for instruments13. The real time task reads from the reflective memory using the onboard DMA access12 and captured data is transferred to the outside world via a TCP socket on a dedicated Ethernet connection. The diversity of the software and hardware that are involved makes this application suitable as a benchmarking platform. A

  19. Advanced compiler design and implementation

    CERN Document Server

    Muchnick, Steven S

    1997-01-01

    From the Foreword by Susan L. Graham: This book takes on the challenges of contemporary languages and architectures, and prepares the reader for the new compiling problems that will inevitably arise in the future. The definitive book on advanced compiler design This comprehensive, up-to-date work examines advanced issues in the design and implementation of compilers for modern processors. Written for professionals and graduate students, the book guides readers in designing and implementing efficient structures for highly optimizing compilers for real-world languages. Covering advanced issues in fundamental areas of compiler design, this book discusses a wide array of possible code optimizations, determining the relative importance of optimizations, and selecting the most effective methods of implementation. * Lays the foundation for understanding the major issues of advanced compiler design * Treats optimization in-depth * Uses four case studies of commercial compiling suites to illustrate different approache...

  20. A Fortran Program for Deep Space Sensor Analysis.

    Science.gov (United States)

    1984-12-14

    used to help p maintain currency to the deep space satellite catelog? Research Question Can a Fortran program be designed to evaluate the effectiveness ...Range ( AFETR ) Range p Measurements Laboratory (RML) is located in Malibar, .- Florida. Like GEODSS, Malibar uses a 48 inch telescope with a...phased out. This mode will evaluate the effect of the loss of the 3 Baker-Nunn sites to mode 3 Mode 5 through Mode 8 Modes 5 through 8 are identical to

  1. An Algebraic Machinery for Optimizing Data Motion for HPF

    Directory of Open Access Journals (Sweden)

    Jan-Jan Wu

    1997-01-01

    Full Text Available This paper describes a general compiler optimization technique that reduces communica tion over-head for FORTRAN-90 (and High Performance FORTRAN implementations on massively parallel machines.

  2. A brief description and comparison of programming languages FORTRAN, ALGOL, COBOL, PL/1, and LISP 1.5 from a critical standpoint

    Science.gov (United States)

    Mathur, F. P.

    1972-01-01

    Several common higher level program languages are described. FORTRAN, ALGOL, COBOL, PL/1, and LISP 1.5 are summarized and compared. FORTRAN is the most widely used scientific programming language. ALGOL is a more powerful language for scientific programming. COBOL is used for most commercial programming applications. LISP 1.5 is primarily a list-processing language. PL/1 attempts to combine the desirable features of FORTRAN, ALGOL, and COBOL into a single language.

  3. DBPQL: A view-oriented query language for the Intel Data Base Processor

    Science.gov (United States)

    Fishwick, P. A.

    1983-01-01

    An interactive query language (BDPQL) for the Intel Data Base Processor (DBP) is defined. DBPQL includes a parser generator package which permits the analyst to easily create and manipulate the query statement syntax and semantics. The prototype language, DBPQL, includes trace and performance commands to aid the analyst when implementing new commands and analyzing the execution characteristics of the DBP. The DBPQL grammar file and associated key procedures are included as an appendix to this report.

  4. Game-Based Experiential Learning in Online Management Information Systems Classes Using Intel's IT Manager 3

    Science.gov (United States)

    Bliemel, Michael; Ali-Hassan, Hossam

    2014-01-01

    For several years, we used Intel's flash-based game "IT Manager 3: Unseen Forces" as an experiential learning tool, where students had to act as a manager making real-time prioritization decisions about repairing computer problems, training and upgrading systems with better technologies as well as managing increasing numbers of technical…

  5. IMAGEP - A FORTRAN ALGORITHM FOR DIGITAL IMAGE PROCESSING

    Science.gov (United States)

    Roth, D. J.

    1994-01-01

    IMAGEP is a FORTRAN computer algorithm containing various image processing, analysis, and enhancement functions. It is a keyboard-driven program organized into nine subroutines. Within the subroutines are other routines, also, selected via keyboard. Some of the functions performed by IMAGEP include digitization, storage and retrieval of images; image enhancement by contrast expansion, addition and subtraction, magnification, inversion, and bit shifting; display and movement of cursor; display of grey level histogram of image; and display of the variation of grey level intensity as a function of image position. This algorithm has possible scientific, industrial, and biomedical applications in material flaw studies, steel and ore analysis, and pathology, respectively. IMAGEP is written in VAX FORTRAN for DEC VAX series computers running VMS. The program requires the use of a Grinnell 274 image processor which can be obtained from Mark McCloud Associates, Campbell, CA. An object library of the required GMR series software is included on the distribution media. IMAGEP requires 1Mb of RAM for execution. The standard distribution medium for this program is a 1600 BPI 9track magnetic tape in VAX FILES-11 format. It is also available on a TK50 tape cartridge in VAX FILES-11 format. This program was developed in 1991. DEC, VAX, VMS, and TK50 are trademarks of Digital Equipment Corporation.

  6. Reformulation RELAP5-3D in FORTRAN 95 and Results

    International Nuclear Information System (INIS)

    Mesina, George L.

    2010-01-01

    RELAP5-3D is a nuclear power plant code used worldwide for safety analysis, design, and operator training. In keeping with ongoing developments in the computing industry, we have re-architected the code in the FORTRAN 95 language, the current, fully-available, FORTRAN language. These changes include a complete reworking of the database and conversion of the source code to take advantage of new constructs. The improvements and impacts to the code are manifold. It is a completely machine-independent code that produces machine independent fluid property and plot files and expands to the exact size needed to accommodate the user's input. Runtime is generally better for larger input models. Other impacts of code conversion are improved code readability, reduced maintenance and development time, increased adaptability to new computing platforms, and increased code longevity. The conversion methodology, code improvements and testing upgrades are presented in a manner that will be useful to future conversion projects for other such large codes. Comparison between the pre- and post-conversion code are made on the basis of code metrics and code performance.

  7. Acceleration of Blender Cycles Path-Tracing Engine Using Intel Many Integrated Core Architecture

    OpenAIRE

    Jaroš , Milan; Říha , Lubomír; Strakoš , Petr; Karásek , Tomáš; Vašatová , Alena; Jarošová , Marta; Kozubek , Tomáš

    2015-01-01

    Part 2: Algorithms; International audience; This paper describes the acceleration of the most computationally intensive kernels of the Blender rendering engine, Blender Cycles, using Intel Many Integrated Core architecture (MIC). The proposed parallelization, which uses OpenMP technology, also improves the performance of the rendering engine when running on multi-core CPUs and multi-socket servers. Although the GPU acceleration is already implemented in Cycles, its functionality is limited. O...

  8. Pseudo-BINPUT, a free formal input package for Fortran programmes

    International Nuclear Information System (INIS)

    Gubbins, M.E.

    1977-11-01

    Pseudo - BINPUT is an input package for reading free format data in codeword control in a FORTRAN programme. To a large degree it mimics in function the Winfrith Subroutine Library routine BINPUT. By using calls of the data input package DECIN to mimic the input routine BINPUT, Pseudo - BINPUT combines some of the advantages of both systems. (U.K.)

  9. Optimizing the Betts-Miller-Janjic cumulus parameterization with Intel Many Integrated Core (MIC) architecture

    Science.gov (United States)

    Huang, Melin; Huang, Bormin; Huang, Allen H.-L.

    2015-10-01

    The schemes of cumulus parameterization are responsible for the sub-grid-scale effects of convective and/or shallow clouds, and intended to represent vertical fluxes due to unresolved updrafts and downdrafts and compensating motion outside the clouds. Some schemes additionally provide cloud and precipitation field tendencies in the convective column, and momentum tendencies due to convective transport of momentum. The schemes all provide the convective component of surface rainfall. Betts-Miller-Janjic (BMJ) is one scheme to fulfill such purposes in the weather research and forecast (WRF) model. National Centers for Environmental Prediction (NCEP) has tried to optimize the BMJ scheme for operational application. As there are no interactions among horizontal grid points, this scheme is very suitable for parallel computation. With the advantage of Intel Xeon Phi Many Integrated Core (MIC) architecture, efficient parallelization and vectorization essentials, it allows us to optimize the BMJ scheme. If compared to the original code respectively running on one CPU socket (eight cores) and on one CPU core with Intel Xeon E5-2670, the MIC-based optimization of this scheme running on Xeon Phi coprocessor 7120P improves the performance by 2.4x and 17.0x, respectively.

  10. Application of Intel Many Integrated Core (MIC) accelerators to the Pleim-Xiu land surface scheme

    Science.gov (United States)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2015-10-01

    The land-surface model (LSM) is one physics process in the weather research and forecast (WRF) model. The LSM includes atmospheric information from the surface layer scheme, radiative forcing from the radiation scheme, and precipitation forcing from the microphysics and convective schemes, together with internal information on the land's state variables and land-surface properties. The LSM is to provide heat and moisture fluxes over land points and sea-ice points. The Pleim-Xiu (PX) scheme is one LSM. The PX LSM features three pathways for moisture fluxes: evapotranspiration, soil evaporation, and evaporation from wet canopies. To accelerate the computation process of this scheme, we employ Intel Xeon Phi Many Integrated Core (MIC) Architecture as it is a multiprocessor computer structure with merits of efficient parallelization and vectorization essentials. Our results show that the MIC-based optimization of this scheme running on Xeon Phi coprocessor 7120P improves the performance by 2.3x and 11.7x as compared to the original code respectively running on one CPU socket (eight cores) and on one CPU core with Intel Xeon E5-2670.

  11. HAL/S-FC compiler system specifications

    Science.gov (United States)

    1976-01-01

    This document specifies the informational interfaces within the HAL/S-FC compiler, and between the compiler and the external environment. This Compiler System Specification is for the HAL/S-FC compiler and its associated run time facilities which implement the full HAL/S language. The HAL/S-FC compiler is designed to operate stand-alone on any compatible IBM 360/370 computer and within the Software Development Laboratory (SDL) at NASA/JSC, Houston, Texas.

  12. A Class-Specific Optimizing Compiler

    Directory of Open Access Journals (Sweden)

    Michael D. Sharp

    1993-01-01

    Full Text Available Class-specific optimizations are compiler optimizations specified by the class implementor to the compiler. They allow the compiler to take advantage of the semantics of the particular class so as to produce better code. Optimizations of interest include the strength reduction of class:: array address calculations, elimination of large temporaries, and the placement of asynchronous send/recv calls so as to achieve computation/communication overlap. We will outline our progress towards the implementation of a C++ compiler capable of incorporating class-specific optimizations.

  13. Acceleration of Cherenkov angle reconstruction with the new Intel Xeon/FPGA compute platform for the particle identification in the LHCb Upgrade

    Science.gov (United States)

    Faerber, Christian

    2017-10-01

    The LHCb experiment at the LHC will upgrade its detector by 2018/2019 to a ‘triggerless’ readout scheme, where all the readout electronics and several sub-detector parts will be replaced. The new readout electronics will be able to readout the detector at 40 MHz. This increases the data bandwidth from the detector down to the Event Filter farm to 40 TBit/s, which also has to be processed to select the interesting proton-proton collision for later storage. The architecture of such a computing farm, which can process this amount of data as efficiently as possible, is a challenging task and several compute accelerator technologies are being considered for use inside the new Event Filter farm. In the high performance computing sector more and more FPGA compute accelerators are used to improve the compute performance and reduce the power consumption (e.g. in the Microsoft Catapult project and Bing search engine). Also for the LHCb upgrade the usage of an experimental FPGA accelerated computing platform in the Event Building or in the Event Filter farm is being considered and therefore tested. This platform from Intel hosts a general CPU and a high performance FPGA linked via a high speed link which is for this platform a QPI link. On the FPGA an accelerator is implemented. The used system is a two socket platform from Intel with a Xeon CPU and an FPGA. The FPGA has cache-coherent memory access to the main memory of the server and can collaborate with the CPU. As a first step, a computing intensive algorithm to reconstruct Cherenkov angles for the LHCb RICH particle identification was successfully ported in Verilog to the Intel Xeon/FPGA platform and accelerated by a factor of 35. The same algorithm was ported to the Intel Xeon/FPGA platform with OpenCL. The implementation work and the performance will be compared. Also another FPGA accelerator the Nallatech 385 PCIe accelerator with the same Stratix V FPGA were tested for performance. The results show that the Intel

  14. Mass: Fortran program for calculating mass-absorption coefficients

    International Nuclear Information System (INIS)

    Nielsen, Aa.; Svane Petersen, T.

    1980-01-01

    Determinations of mass-absorption coefficients in the x-ray analysis of trace elements are an important and time consuming part of the arithmetic calculation. In the course of time different metods have been used. The program MASS calculates the mass-absorption coefficients from a given major element analysis at the x-ray wavelengths normally used in trace element determinations and lists the chemical analysis and the mass-absorption coefficients. The program is coded in FORTRAN IV, and is operational on the IBM 370/165 computer, on the UNIVAC 1110 and on PDP 11/05. (author)

  15. Implementation of 5-layer thermal diffusion scheme in weather research and forecasting model with Intel Many Integrated Cores

    Science.gov (United States)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2014-10-01

    For weather forecasting and research, the Weather Research and Forecasting (WRF) model has been developed, consisting of several components such as dynamic solvers and physical simulation modules. WRF includes several Land- Surface Models (LSMs). The LSMs use atmospheric information, the radiative and precipitation forcing from the surface layer scheme, the radiation scheme, and the microphysics/convective scheme all together with the land's state variables and land-surface properties, to provide heat and moisture fluxes over land and sea-ice points. The WRF 5-layer thermal diffusion simulation is an LSM based on the MM5 5-layer soil temperature model with an energy budget that includes radiation, sensible, and latent heat flux. The WRF LSMs are very suitable for massively parallel computation as there are no interactions among horizontal grid points. The features, efficient parallelization and vectorization essentials, of Intel Many Integrated Core (MIC) architecture allow us to optimize this WRF 5-layer thermal diffusion scheme. In this work, we present the results of the computing performance on this scheme with Intel MIC architecture. Our results show that the MIC-based optimization improved the performance of the first version of multi-threaded code on Xeon Phi 5110P by a factor of 2.1x. Accordingly, the same CPU-based optimizations improved the performance on Intel Xeon E5- 2603 by a factor of 1.6x as compared to the first version of multi-threaded code.

  16. Applying the roofline performance model to the intel xeon phi knights landing processor

    OpenAIRE

    Doerfler, D; Deslippe, J; Williams, S; Oliker, L; Cook, B; Kurth, T; Lobet, M; Malas, T; Vay, JL; Vincenti, H

    2016-01-01

    � Springer International Publishing AG 2016. The Roofline Performance Model is a visually intuitive method used to bound the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we determine the Roofline for the Intel Knights Landing (KNL) processor, determining t...

  17. Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

    OpenAIRE

    Hofmann, Johannes; Treibig, Jan; Hager, Georg; Wellein, Gerhard

    2013-01-01

    We examine the Xeon Phi, which is based on Intel's Many Integrated Cores architecture, for its suitability to run the FDK algorithm--the most commonly used algorithm to perform the 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and means to enable sensible data sharing between threads despite the lack of a shared last level cache. Apart from parallelization, SIMD vectorization is critical for good performance on t...

  18. Acceleration of Monte Carlo simulation of photon migration in complex heterogeneous media using Intel many-integrated core architecture.

    Science.gov (United States)

    Gorshkov, Anton V; Kirillin, Mikhail Yu

    2015-08-01

    Over two decades, the Monte Carlo technique has become a gold standard in simulation of light propagation in turbid media, including biotissues. Technological solutions provide further advances of this technique. The Intel Xeon Phi coprocessor is a new type of accelerator for highly parallel general purpose computing, which allows execution of a wide range of applications without substantial code modification. We present a technical approach of porting our previously developed Monte Carlo (MC) code for simulation of light transport in tissues to the Intel Xeon Phi coprocessor. We show that employing the accelerator allows reducing computational time of MC simulation and obtaining simulation speed-up comparable to GPU. We demonstrate the performance of the developed code for simulation of light transport in the human head and determination of the measurement volume in near-infrared spectroscopy brain sensing.

  19. VENTURE-PC 1.1, Reactor Analysis System with Sensitivity and Burnup

    International Nuclear Information System (INIS)

    2002-01-01

    1 - Description of program or function: The VENTURE program solves the usual neutronics eigenvalue, adjoint, fixed source, and criticality search problems. It treats up to three dimensions, maps power density, and does first-order perturbation analysis at the macroscopic cross section level. The BURNER code solves the nuclide chain equations to estimate the nuclide concentrations and burnup at the end of an exposure time or after a shutdown period. This package is based on the CCC-459/BOLD VENTURE IV code system developed at Oak Ridge National Laboratory. In January 1989 the University of Cincinnati contributed the first VENTURE-PC package to RSICC's collection. It was a subset of the mainframe version consisting of the VENTURE and BURNER modules plus several processing modules. VENTURE-PC was distributed as CCC-459 until July 1997 when a new version (with updated source code compatible with newer FORTRAN-77 compilers, some revisions, and extensions to solve much larger problems) was contributed by Argonne National Laboratory. The principle code modules included in the VENTURE-PC system are: VENTURE: Multigroup neutronics finite-difference diffusion theory. BURNER: Depletion calculation for reactor core analysis. Other modules within VENTURE-PC are: DVENTR: Venture input processor; DCRSPR: Neutron cross section processor; DUTLIN: Control file (CNTRL) input processor; DCMACR: Citation format cross section input processor; CRXSPR: Cross section processor; DENMAN: Fuel repositioning module. In August of 1999, Argonne again contributed an updated version of the code which overcomes problem size constraints caused by binary record length limits inherent to the Fortran 90 compiler. The need for long records is detected and avoided by sub-blocking them. Also, the latest Fortran 95 compiler offers substantial speed gains on the newest processors. The source code is updated to be compatible with either Fortran 90 or Fortran 95. In August 2002, the package was updated with

  20. Evaluation of the Single-precision Floatingpoint Vector Add Kernel Using the Intel FPGA SDK for OpenCL

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Zheming [Argonne National Lab. (ANL), Argonne, IL (United States); Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Finkel, Hal [Argonne National Lab. (ANL), Argonne, IL (United States); Cappello, Franck [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-04-20

    Open Computing Language (OpenCL) is a high-level language that enables software programmers to explore Field Programmable Gate Arrays (FPGAs) for application acceleration. The Intel FPGA software development kit (SDK) for OpenCL allows a user to specify applications at a high level and explore the performance of low-level hardware acceleration. In this report, we present the FPGA performance and power consumption results of the single-precision floating-point vector add OpenCL kernel using the Intel FPGA SDK for OpenCL on the Nallatech 385A FPGA board. The board features an Arria 10 FPGA. We evaluate the FPGA implementations using the compute unit duplication and kernel vectorization optimization techniques. On the Nallatech 385A FPGA board, the maximum compute kernel bandwidth we achieve is 25.8 GB/s, approximately 76% of the peak memory bandwidth. The power consumption of the FPGA device when running the kernels ranges from 29W to 42W.

  1. DISPPAK SUBPAK, MS FORTRAN Extended Subroutine Library

    International Nuclear Information System (INIS)

    Langer, S.

    1991-01-01

    1 - Description of program or function: DISPPAK is a set of routines for use with Microsoft FORTRAN programs that allows the flexible display of information on the screen of an IBM PC in both text and graphics modes. The text mode routines allow the cursor to be placed at an arbitrary point on the screen and text to be displayed at the cursor location, making it possible to create menus and other structured displays. A routine to set the color of the characters that these routines display is also provided. A set of line drawing routines is included for use with IBM's Color Graphics Adapter or an equivalent board (such as the Enhanced Graphics Adapter in CGA emulation mode). These routines support both pixel coordinates and a user-specified set of real number coordinates. SUBPAK is a function library which allows Microsoft FORTRAN programs to calculate random numbers, issue calls to the operating system, read individual characters from the keyboard, perform Boolean and shift operations, and communicate with the I/O ports of the IBM PC. In addition, peek and poke routines, a routine that returns the address of any variable, and routines that can access the system time and date are included. 2 - Method of solution: For the DISPPAK line drawing routines, the user selects a fraction of the screen to use for plotting, chooses the coordinates that refer to the lower-left and upper-right corners, and decides whether the mapping should be linear or logarithmic. Lines are then drawn between endpoints defined in terms of the user coordinate system. Out-of-range coordinates are forced to the border of the window before the line is drawn. 3 - Restrictions on the complexity of the problem: No support is provided for filled areas or text

  2. Compiling software for a hierarchical distributed processing system

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-12-31

    Compiling software for a hierarchical distributed processing system including providing to one or more compiling nodes software to be compiled, wherein at least a portion of the software to be compiled is to be executed by one or more nodes; compiling, by the compiling node, the software; maintaining, by the compiling node, any compiled software to be executed on the compiling node; selecting, by the compiling node, one or more nodes in a next tier of the hierarchy of the distributed processing system in dependence upon whether any compiled software is for the selected node or the selected node's descendents; sending to the selected node only the compiled software to be executed by the selected node or selected node's descendent.

  3. Performance Analysis of an Astrophysical Simulation Code on the Intel Xeon Phi Architecture

    OpenAIRE

    Noormofidi, Vahid; Atlas, Susan R.; Duan, Huaiyu

    2015-01-01

    We have developed the astrophysical simulation code XFLAT to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both CPU and Xeon Phi co-processors based on the Intel Many Integrated Core Architecture (MIC). We analyze the performance of XFLAT on configurations with CPU only, Xeon Phi only and both CPU and Xeon Phi. We also investigate the impact of I/O and the multi-n...

  4. Efficient sparse matrix-matrix multiplication for computing periodic responses by shooting method on Intel Xeon Phi

    Science.gov (United States)

    Stoykov, S.; Atanassov, E.; Margenov, S.

    2016-10-01

    Many of the scientific applications involve sparse or dense matrix operations, such as solving linear systems, matrix-matrix products, eigensolvers, etc. In what concerns structural nonlinear dynamics, the computations of periodic responses and the determination of stability of the solution are of primary interest. Shooting method iswidely used for obtaining periodic responses of nonlinear systems. The method involves simultaneously operations with sparse and dense matrices. One of the computationally expensive operations in the method is multiplication of sparse by dense matrices. In the current work, a new algorithm for sparse matrix by dense matrix products is presented. The algorithm takes into account the structure of the sparse matrix, which is obtained by space discretization of the nonlinear Mindlin's plate equation of motion by the finite element method. The algorithm is developed to use the vector engine of Intel Xeon Phi coprocessors. It is compared with the standard sparse matrix by dense matrix algorithm and the one developed by Intel MKL and it is shown that by considering the properties of the sparse matrix better algorithms can be developed.

  5. Optimization of Grillages Using Genetic Algorithms for Integrating Matlab and Fortran Environments

    Directory of Open Access Journals (Sweden)

    Darius Mačiūnas

    2013-02-01

    Full Text Available The purpose of the paper is to present technology applied for the global optimization of grillage-type pile foundations (further grillages. The goal of optimization is to obtain the optimal layout of pile placement in the grillages. The problem can be categorized as a topology optimization problem. The objective function is comprised of maximum reactive force emerging in a pile. The reactive force is minimized during the procedure of optimization during which variables enclose the positions of piles beneath connecting beams. Reactive forces in all piles are computed utilizing an original algorithm implemented in the Fortran programming language. The algorithm is integrated into the MatLab environment where the optimization procedure is executed utilizing a genetic algorithm. The article also describes technology enabling the integration of MatLab and Fortran environments. The authors seek to evaluate the quality of a solution to the problem analyzing experimental results obtained applying the proposed technology.

  6. Optimization of Grillages Using Genetic Algorithms for Integrating Matlab and Fortran Environments

    Directory of Open Access Journals (Sweden)

    Darius Mačiūnas

    2012-12-01

    Full Text Available The purpose of the paper is to present technology applied for the global optimization of grillage-type pile foundations (further grillages. The goal of optimization is to obtain the optimal layout of pile placement in the grillages. The problem can be categorized as a topology optimization problem. The objective function is comprised of maximum reactive force emerging in a pile. The reactive force is minimized during the procedure of optimization during which variables enclose the positions of piles beneath connecting beams. Reactive forces in all piles are computed utilizing an original algorithm implemented in the Fortran programming language. The algorithm is integrated into the MatLab environment where the optimization procedure is executed utilizing a genetic algorithm. The article also describes technology enabling the integration of MatLab and Fortran environments. The authors seek to evaluate the quality of a solution to the problem analyzing experimental results obtained applying the proposed technology.

  7. Evaluation of HAL/S language compilability using SAMSO's Compiler Writing System (CWS)

    Science.gov (United States)

    Feliciano, M.; Anderson, H. D.; Bond, J. W., III

    1976-01-01

    NASA/Langley is engaged in a program to develop an adaptable guidance and control software concept for spacecraft such as shuttle-launched payloads. It is envisioned that this flight software be written in a higher-order language, such as HAL/S, to facilitate changes or additions. To make this adaptable software transferable to various onboard computers, a compiler writing system capability is necessary. A joint program with the Air Force Space and Missile Systems Organization was initiated to determine if the Compiler Writing System (CWS) owned by the Air Force could be utilized for this purpose. The present study explores the feasibility of including the HAL/S language constructs in CWS and the effort required to implement these constructs. This will determine the compilability of HAL/S using CWS and permit NASA/Langley to identify the HAL/S constructs desired for their applications. The study consisted of comparing the implementation of the Space Programming Language using CWS with the requirements for the implementation of HAL/S. It is the conclusion of the study that CWS already contains many of the language features of HAL/S and that it can be expanded for compiling part or all of HAL/S. It is assumed that persons reading and evaluating this report have a basic familiarity with (1) the principles of compiler construction and operation, and (2) the logical structure and applications characteristics of HAL/S and SPL.

  8. FORTRAN subroutine for computing the optimal estimate of f(x)

    International Nuclear Information System (INIS)

    Gaffney, P.W.

    1980-10-01

    A FORTRAN subroutine called RANGE is presented that is designed to compute the optimal estimate of a function f given values of the function at n distinct points x 1 2 < ... < x/sub n/ and given a bound on one of the derivatives of f. We donate this estimate by Ω. It is optimal in the sense that the error abs value (f - Ω) has the smallest possible error bound

  9. C to VHDL compiler

    Science.gov (United States)

    Berdychowski, Piotr P.; Zabolotny, Wojciech M.

    2010-09-01

    The main goal of C to VHDL compiler project is to make FPGA platform more accessible for scientists and software developers. FPGA platform offers unique ability to configure the hardware to implement virtually any dedicated architecture, and modern devices provide sufficient number of hardware resources to implement parallel execution platforms with complex processing units. All this makes the FPGA platform very attractive for those looking for efficient heterogeneous, computing environment. Current industry standard in development of digital systems on FPGA platform is based on HDLs. Although very effective and expressive in hands of hardware development specialists, these languages require specific knowledge and experience, unreachable for most scientists and software programmers. C to VHDL compiler project attempts to remedy that by creating an application, that derives initial VHDL description of a digital system (for further compilation and synthesis), from purely algorithmic description in C programming language. This idea itself is not new, and the C to VHDL compiler combines the best approaches from existing solutions developed over many previous years, with the introduction of some new unique improvements.

  10. Transitioning to Intel-based Linux Servers in the Payload Operations Integration Center

    Science.gov (United States)

    Guillebeau, P. L.

    2004-01-01

    The MSFC Payload Operations Integration Center (POIC) is the focal point for International Space Station (ISS) payload operations. The POIC contains the facilities, hardware, software and communication interface necessary to support payload operations. ISS ground system support for processing and display of real-time spacecraft and telemetry and command data has been operational for several years. The hardware components were reaching end of life and vendor costs were increasing while ISS budgets were becoming severely constrained. Therefore it has been necessary to migrate the Unix portions of our ground systems to commodity priced Intel-based Linux servers. hardware architecture including networks, data storage, and highly available resources. This paper will concentrate on the Linux migration implementation for the software portion of our ground system. The migration began with 3.5 million lines of code running on Unix platforms with separate servers for telemetry, command, Payload information management systems, web, system control, remote server interface and databases. The Intel-based system is scheduled to be available for initial operational use by August 2004 The overall migration to Intel-based Linux servers in the control center involves changes to the This paper will address the Linux migration study approach including the proof of concept, criticality of customer buy-in and importance of beginning with POSlX compliant code. It will focus on the development approach explaining the software lifecycle. Other aspects of development will be covered including phased implementation, interim milestones and metrics measurements and reporting mechanisms. This paper will also address the testing approach covering all levels of testing including development, development integration, IV&V, user beta testing and acceptance testing. Test results including performance numbers compared with Unix servers will be included. need for a smooth transition while maintaining

  11. HAL/S-FC compiler system functional specification

    Science.gov (United States)

    1974-01-01

    Compiler organization is discussed, including overall compiler structure, internal data transfer, compiler development, and code optimization. The user, system, and SDL interfaces are described, along with compiler system requirements. Run-time software support package and restrictions and dependencies are also considered of the HAL/S-FC system.

  12. Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

    OpenAIRE

    Byun, Chansup; Kepner, Jeremy; Arcand, William; Bestor, David; Bergeron, Bill; Gadepally, Vijay; Houle, Michael; Hubbell, Matthew; Jones, Michael; Klein, Anna; Michaleas, Peter; Milechin, Lauren; Mullen, Julie; Prout, Andrew; Rosa, Antonio

    2017-01-01

    Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running data analysis applications such as MATLAB and O...

  13. SPARQL compiler for Bobox

    OpenAIRE

    Čermák, Miroslav

    2013-01-01

    The goal of the work is to design and implement a SPARQL compiler for the Bobox system. In addition to lexical and syntactic analysis corresponding to W3C standard for SPARQL language, it performs semantic analysis and optimization of queries. Compiler will constuct an appropriate model for execution in Bobox, that depends on the physical database schema.

  14. High-throughput sockets over RDMA for the Intel Xeon Phi coprocessor

    CERN Document Server

    Santogidis, Aram

    2017-01-01

    In this paper we describe the design, implementation and performance of Trans4SCIF, a user-level socket-like transport library for the Intel Xeon Phi coprocessor. Trans4SCIF library is primarily intended for high-throughput applications. It uses RDMA transfers over the native SCIF support, in a way that is transparent for the application, which has the illusion of using conventional stream sockets. We also discuss the integration of Trans4SCIF with the ZeroMQ messaging library, used extensively by several applications running at CERN. We show that this can lead to a substantial, up to 3x, increase of application throughput compared to the default TCP/IP transport option.

  15. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    Science.gov (United States)

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-08-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy

  16. Fortran programs for the time-dependent Gross-Pitaevskii equation in a fully anisotropic trap

    Science.gov (United States)

    Muruganandam, P.; Adhikari, S. K.

    2009-10-01

    Here we develop simple numerical algorithms for both stationary and non-stationary solutions of the time-dependent Gross-Pitaevskii (GP) equation describing the properties of Bose-Einstein condensates at ultra low temperatures. In particular, we consider algorithms involving real- and imaginary-time propagation based on a split-step Crank-Nicolson method. In a one-space-variable form of the GP equation we consider the one-dimensional, two-dimensional circularly-symmetric, and the three-dimensional spherically-symmetric harmonic-oscillator traps. In the two-space-variable form we consider the GP equation in two-dimensional anisotropic and three-dimensional axially-symmetric traps. The fully-anisotropic three-dimensional GP equation is also considered. Numerical results for the chemical potential and root-mean-square size of stationary states are reported using imaginary-time propagation programs for all the cases and compared with previously obtained results. Also presented are numerical results of non-stationary oscillation for different trap symmetries using real-time propagation programs. A set of convenient working codes developed in Fortran 77 are also provided for all these cases (twelve programs in all). In the case of two or three space variables, Fortran 90/95 versions provide some simplification over the Fortran 77 programs, and these programs are also included (six programs in all). Program summaryProgram title: (i) imagetime1d, (ii) imagetime2d, (iii) imagetime3d, (iv) imagetimecir, (v) imagetimesph, (vi) imagetimeaxial, (vii) realtime1d, (viii) realtime2d, (ix) realtime3d, (x) realtimecir, (xi) realtimesph, (xii) realtimeaxial Catalogue identifier: AEDU_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDU_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data

  17. Real time interrupt handling using FORTRAN IV plus under RSX-11M

    International Nuclear Information System (INIS)

    Schultz, D.E.

    1981-01-01

    A real-time data acquisition application for a linear accelerator is described. The important programming features of this application are use of connect to interrupt, a shared library, map to I/O page, and a shared data area. How you can provide rapid interrupt handling using these tools from FORTRAN IV PLUS is explained

  18. DDT: A Research Tool for Automatic Data Distribution in High Performance Fortran

    Directory of Open Access Journals (Sweden)

    Eduard AyguadÉ

    1997-01-01

    Full Text Available This article describes the main features and implementation of our automatic data distribution research tool. The tool (DDT accepts programs written in Fortran 77 and generates High Performance Fortran (HPF directives to map arrays onto the memories of the processors and parallelize loops, and executable statements to remap these arrays. DDT works by identifying a set of computational phases (procedures and loops. The algorithm builds a search space of candidate solutions for these phases which is explored looking for the combination that minimizes the overall cost; this cost includes data movement cost and computation cost. The movement cost reflects the cost of accessing remote data during the execution of a phase and the remapping costs that have to be paid in order to execute the phase with the selected mapping. The computation cost includes the cost of executing a phase in parallel according to the selected mapping and the owner computes rule. The tool supports interprocedural analysis and uses control flow information to identify how phases are sequenced during the execution of the application.

  19. Extending OpenMP for NUMA Machines

    Directory of Open Access Journals (Sweden)

    John Bircsak

    2000-01-01

    Full Text Available This paper describes extensions to OpenMP that implement data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures. Writing efficient parallel programs for NUMA architectures, which have characteristics of both shared-memory and distributed-memory architectures, requires that a programmer control the placement of data in memory and the placement of computations that operate on that data. Optimal performance is obtained when computations occur on processors that have fast access to the data needed by those computations. OpenMP -- designed for shared-memory architectures -- does not by itself address these issues. The extensions to OpenMP Fortran presented here have been mainly taken from High Performance Fortran. The paper describes some of the techniques that the Compaq Fortran compiler uses to generate efficient code based on these extensions. It also describes some additional compiler optimizations, and concludes with some preliminary results.

  20. Evaluation of the Intel Nehalem-EX server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2010-01-01

    In this paper we report on a set of benchmark results recently obtained by the CERN openlab by comparing the 4-socket, 32-core Intel Xeon X7560 server with the previous generation 4-socket server, based on the Xeon X7460 processor. The Xeon X7560 processor represents a major change in many respects, especially the memory sub-system, so it was important to make multiple comparisons. In most benchmarks the two 4-socket servers were compared. It should be underlined that both servers represent the “top of the line” in terms of frequency. However, in some cases, it was important to compare systems that integrated the latest processor features, such as QPI links, Symmetric multithreading and over-clocking via Turbo mode, and in such situations the X7560 server was compared to a dual socket L5520 based system with an identical frequency of 2.26 GHz. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following ...

  1. hepawk - A language for scanning high energy physics events

    International Nuclear Information System (INIS)

    Ohl, T.

    1992-01-01

    We present the programming language hepawk, designed for convenient scanning of data structures arising in the simulation of high energy physics events. The interpreter for this language has been implemented in FORTRAN-77, therefore hepawk runs on any machine with a FORTRAN-77 compiler. (orig.)

  2. Algorithmic synthesis using Python compiler

    Science.gov (United States)

    Cieszewski, Radoslaw; Romaniuk, Ryszard; Pozniak, Krzysztof; Linczuk, Maciej

    2015-09-01

    This paper presents a python to VHDL compiler. The compiler interprets an algorithmic description of a desired behavior written in Python and translate it to VHDL. FPGA combines many benefits of both software and ASIC implementations. Like software, the programmed circuit is flexible, and can be reconfigured over the lifetime of the system. FPGAs have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. This can be achieved by using many computational resources at the same time. Creating parallel programs implemented in FPGAs in pure HDL is difficult and time consuming. Using higher level of abstraction and High-Level Synthesis compiler implementation time can be reduced. The compiler has been implemented using the Python language. This article describes design, implementation and results of created tools.

  3. Developing the science product algorithm testbed for Chinese next-generation geostationary meteorological satellites: Fengyun-4 series

    Science.gov (United States)

    Min, Min; Wu, Chunqiang; Li, Chuan; Liu, Hui; Xu, Na; Wu, Xiao; Chen, Lin; Wang, Fu; Sun, Fenglin; Qin, Danyu; Wang, Xi; Li, Bo; Zheng, Zhaojun; Cao, Guangzhen; Dong, Lixin

    2017-08-01

    Fengyun-4A (FY-4A), the first of the Chinese next-generation geostationary meteorological satellites, launched in 2016, offers several advances over the FY-2: more spectral bands, faster imaging, and infrared hyperspectral measurements. To support the major objective of developing the prototypes of FY-4 science algorithms, two science product algorithm testbeds for imagers and sounders have been developed by the scientists in the FY-4 Algorithm Working Group (AWG). Both testbeds, written in FORTRAN and C programming languages for Linux or UNIX systems, have been tested successfully by using Intel/g compilers. Some important FY-4 science products, including cloud mask, cloud properties, and temperature profiles, have been retrieved successfully through using a proxy imager, Himawari-8/Advanced Himawari Imager (AHI), and sounder data, obtained from the Atmospheric InfraRed Sounder, thus demonstrating their robustness. In addition, in early 2016, the FY-4 AWG was developed based on the imager testbed—a near real-time processing system for Himawari-8/AHI data for use by Chinese weather forecasters. Consequently, robust and flexible science product algorithm testbeds have provided essential and productive tools for popularizing FY-4 data and developing substantial improvements in FY-4 products.

  4. Evaluation of the Intel Sandy Bridge-EP server processor

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

    2012-01-01

    In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing an 8-core “Sandy Bridge-EP” processor with Intel’s previous microarchitecture, the “Westmere-EP”. The Intel marketing names for these processors are “Xeon E5-2600 processor series” and “Xeon 5600 processor series”, respectively. Both processors are produced in a 32nm process, and both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores ...

  5. Implementation of a 3-D nonlinear MHD [magnetohydrodynamics] calculation on the Intel hypercube

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Hicks, H.R.; Lawkins, W.F.

    1987-01-01

    The optimization of numerical schemes and increasing computer capabilities in the last ten years have improved the efficiency of 3-D nonlinear resistive MHD calculations by about two to three orders of magnitude. However, we are still very limited in performing these types of calculations. Hypercubes have a large number of processors with only local memory and bidirectional links among neighbors. The Intel Hypercube at Oak Ridge has 64 processors with 0.5 megabytes of memory per processor. The multiplicity of processors opens new possibilities for the treatment of such computations. The constraint on time and resources favored the approach of using the existing RSF code which solves as an initial value problem the reduced set of MHD equations for a periodic cylindrical geometry. This code includes minimal physics and geometry, but contains the basic three dimensionality and nonlinear structure of the equations. The code solves the reduced set of MHD equations by Fourier expansion in two angular coordinates and finite differences in the radial one. Due to the continuing interest in these calculations and the likelihood that future supercomputers will take greater advantage of parallelism, the present study was initiated by the ORNL Exploratory Studies Committee and funded entirely by Laboratory Discretionary Funds. The objectives of the study were: to ascertain the suitability of MHD calculation for parallel computation, to design and implement a parallel algorithm to perform the computations, and to evaluate the hypercube, and in particular, ORNL's Intel iPSC, for use in MHD computations

  6. GPU Computing in Bayesian Inference of Realized Stochastic Volatility Model

    International Nuclear Information System (INIS)

    Takaishi, Tetsuya

    2015-01-01

    The realized stochastic volatility (RSV) model that utilizes the realized volatility as additional information has been proposed to infer volatility of financial time series. We consider the Bayesian inference of the RSV model by the Hybrid Monte Carlo (HMC) algorithm. The HMC algorithm can be parallelized and thus performed on the GPU for speedup. The GPU code is developed with CUDA Fortran. We compare the computational time in performing the HMC algorithm on GPU (GTX 760) and CPU (Intel i7-4770 3.4GHz) and find that the GPU can be up to 17 times faster than the CPU. We also code the program with OpenACC and find that appropriate coding can achieve the similar speedup with CUDA Fortran

  7. Compiling a 50-year journey

    DEFF Research Database (Denmark)

    Hutton, Graham; Bahr, Patrick

    2017-01-01

    Fifty years ago, John McCarthy and James Painter published the first paper on compiler verification, in which they showed how to formally prove the correctness of a compiler that translates arithmetic expressions into code for a register-based machine. In this article, we revisit this example...

  8. GW Calculations of Materials on the Intel Xeon-Phi Architecture

    Science.gov (United States)

    Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek; Biller, Ariel; Chelikowsky, James R.; Louie, Steven G.

    Intel Xeon-Phi processors are expected to power a large number of High-Performance Computing (HPC) systems around the United States and the world in the near future. We evaluate the ability of GW and pre-requisite Density Functional Theory (DFT) calculations for materials on utilizing the Xeon-Phi architecture. We describe the optimization process and performance improvements achieved. We find that the GW method, like other higher level Many-Body methods beyond standard local/semilocal approximations to Kohn-Sham DFT, is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-waves, band-pairs and frequencies. Support provided by the SCIDAC program, Department of Energy, Office of Science, Advanced Scientic Computing Research and Basic Energy Sciences. Grant Numbers DE-SC0008877 (Austin) and DE-AC02-05CH11231 (LBNL).

  9. Blocked All-Pairs Shortest Paths Algorithm on Intel Xeon Phi KNL Processor: A Case Study

    OpenAIRE

    Rucci, Enzo; De Giusti, Armando Eduardo; Naiouf, Marcelo

    2017-01-01

    Manycores are consolidating in HPC community as a way of improving performance while keeping power efficiency. Knights Landing is the recently released second generation of Intel Xeon Phi architec- ture.While optimizing applications on CPUs, GPUs and first Xeon Phi’s has been largely studied in the last years, the new features in Knights Landing processors require the revision of programming and optimization techniques for these devices. In this work, we selected the Floyd-Warshall algorithm ...

  10. Application of Pfortran and Co-Array Fortran in the Parallelization of the GROMOS96 Molecular Dynamics Module

    Directory of Open Access Journals (Sweden)

    Piotr Bała

    2001-01-01

    Full Text Available After at least a decade of parallel tool development, parallelization of scientific applications remains a significant undertaking. Typically parallelization is a specialized activity supported only partially by the programming tool set, with the programmer involved with parallel issues in addition to sequential ones. The details of concern range from algorithm design down to low-level data movement details. The aim of parallel programming tools is to automate the latter without sacrificing performance and portability, allowing the programmer to focus on algorithm specification and development. We present our use of two similar parallelization tools, Pfortran and Cray's Co-Array Fortran, in the parallelization of the GROMOS96 molecular dynamics module. Our parallelization started from the GROMOS96 distribution's shared-memory implementation of the replicated algorithm, but used little of that existing parallel structure. Consequently, our parallelization was close to starting with the sequential version. We found the intuitive extensions to Pfortran and Co-Array Fortran helpful in the rapid parallelization of the project. We present performance figures for both the Pfortran and Co-Array Fortran parallelizations showing linear speedup within the range expected by these parallelization methods.

  11. The Julia programming language: the future of scientific computing

    Science.gov (United States)

    Gibson, John

    2017-11-01

    Julia is an innovative new open-source programming language for high-level, high-performance numerical computing. Julia combines the general-purpose breadth and extensibility of Python, the ease-of-use and numeric focus of Matlab, the speed of C and Fortran, and the metaprogramming power of Lisp. Julia uses type inference and just-in-time compilation to compile high-level user code to machine code on the fly. A rich set of numeric types and extensive numerical libraries are built-in. As a result, Julia is competitive with Matlab for interactive graphical exploration and with C and Fortran for high-performance computing. This talk interactively demonstrates Julia's numerical features and benchmarks Julia against C, C++, Fortran, Matlab, and Python on a spectral time-stepping algorithm for a 1d nonlinear partial differential equation. The Julia code is nearly as compact as Matlab and nearly as fast as Fortran. This material is based upon work supported by the National Science Foundation under Grant No. 1554149.

  12. SLACINPT - a FORTRAN program that generates boundary data for the SLAC gun code

    International Nuclear Information System (INIS)

    Michel, W.L.; Hepburn, J.D.

    1982-03-01

    The FORTRAN program SLACINPT was written to simplify the preparation of boundary data for the SLAC gun code. In SLACINPT, the boundary is described by a sequence of straight line or arc segments. From these, the program generates the individual boundary mesh point data, required as input by the SLAC gun code

  13. Innovative Language-Based & Object-Oriented Structured AMR Using Fortran 90 and OpenMP

    Science.gov (United States)

    Norton, C.; Balsara, D.

    1999-01-01

    Parallel adaptive mesh refinement (AMR) is an important numerical technique that leads to the efficient solution of many physical and engineering problems. In this paper, we describe how AMR programing can be performed in an object-oreinted way using the modern aspects of Fortran 90 combined with the parallelization features of OpenMP.

  14. Parallel grid generation algorithm for distributed memory computers

    Science.gov (United States)

    Moitra, Stuti; Moitra, Anutosh

    1994-01-01

    A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levELs of parallelism on multiple instruction multiple data machines are indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations bAsed on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.

  15. An exploratory discussion on business files compilation

    International Nuclear Information System (INIS)

    Gao Chunying

    2014-01-01

    Business files compilation for an enterprise is a distillation and recreation of its spiritual wealth, from which the applicable information can be available to those who want to use it in a fast, extensive and precise way. Proceeding from the effects of business files compilation on scientific researches, productive constructions and developments, this paper in five points discusses the way how to define topics, analyze historical materials, search or select data and process it to an enterprise archives collection. Firstly, to expound the importance and necessity of business files compilation in production, operation and development of an company; secondly, to present processing methods from topic definition, material searching and data selection to final examination and correction; thirdly, to define principle and classification in order to make different categories and levels of processing methods available to business files compilation; fourthly, to discuss the specific method how to implement a file compilation through a documentation collection upon principle of topic definition gearing with demand; fifthly, to address application of information technology to business files compilation in view point of widely needs for business files so as to level up enterprise archives management. The present discussion focuses on the examination and correction principle of enterprise historical material compilation and the basic classifications as well as the major forms of business files compilation achievements. (author)

  16. Evaluation of the OpenCL AES Kernel using the Intel FPGA SDK for OpenCL

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Zheming [Argonne National Lab. (ANL), Argonne, IL (United States); Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Finkel, Hal [Argonne National Lab. (ANL), Argonne, IL (United States); Cappello, Franck [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-04-20

    The OpenCL standard is an open programming model for accelerating algorithms on heterogeneous computing system. OpenCL extends the C-based programming language for developing portable codes on different platforms such as CPU, Graphics processing units (GPUs), Digital Signal Processors (DSPs) and Field Programmable Gate Arrays (FPGAs). The Intel FPGA SDK for OpenCL is a suite of tools that allows developers to abstract away the complex FPGA-based development flow for a high-level software development flow. Users can focus on the design of hardware-accelerated kernel functions in OpenCL and then direct the tools to generate the low-level FPGA implementations. The approach makes the FPGA-based development more accessible to software users as the needs for hybrid computing using CPUs and FPGAs are increasing. It can also significantly reduce the hardware development time as users can evaluate different ideas with high-level language without deep FPGA domain knowledge. In this report, we evaluate the performance of the kernel using the Intel FPGA SDK for OpenCL and Nallatech 385A FPGA board. Compared to the M506 module, the board provides more hardware resources for a larger design exploration space. The kernel performance is measured with the compute kernel throughput, an upper bound to the FPGA throughput. The report presents the experimental results in details. The Appendix lists the kernel source code.

  17. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS on Intel Xeon Phi processors

    Directory of Open Access Journals (Sweden)

    H. Wang

    2017-08-01

    Full Text Available The Global Nested Air Quality Prediction Modeling System (GNAQPMS is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS, which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL. Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC, KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1 updating the pure Message Passing Interface (MPI parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2 fully employing the 512 bit wide vector processing units (VPUs on the KNL platform; (3 reducing unnecessary memory access to improve cache efficiency; (4 reducing the thread local storage (TLS in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5 changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined

  18. Time-efficient simulations of tight-binding electronic structures with Intel Xeon PhiTM many-core processors

    Science.gov (United States)

    Ryu, Hoon; Jeong, Yosang; Kang, Ji-Hoon; Cho, Kyu Nam

    2016-12-01

    Modelling of multi-million atomic semiconductor structures is important as it not only predicts properties of physically realizable novel materials, but can accelerate advanced device designs. This work elaborates a new Technology-Computer-Aided-Design (TCAD) tool for nanoelectronics modelling, which uses a sp3d5s∗ tight-binding approach to describe multi-million atomic structures, and simulate electronic structures with high performance computing (HPC), including atomic effects such as alloy and dopant disorders. Being named as Quantum simulation tool for Advanced Nanoscale Devices (Q-AND), the tool shows nice scalability on traditional multi-core HPC clusters implying the strong capability of large-scale electronic structure simulations, particularly with remarkable performance enhancement on latest clusters of Intel Xeon PhiTM coprocessors. A review of the recent modelling study conducted to understand an experimental work of highly phosphorus-doped silicon nanowires, is presented to demonstrate the utility of Q-AND. Having been developed via Intel Parallel Computing Center project, Q-AND will be open to public to establish a sound framework of nanoelectronics modelling with advanced HPC clusters of a many-core base. With details of the development methodology and exemplary study of dopant electronics, this work will present a practical guideline for TCAD development to researchers in the field of computational nanoelectronics.

  19. A parallel implementation of particle tracking with space charge effects on an INTEL iPSC/860

    International Nuclear Information System (INIS)

    Chang, L.; Bourianoff, G.; Cole, B.; Machida, S.

    1993-05-01

    Particle-tracking simulation is one of the scientific applications that is well-suited to parallel computations. At the Superconducting Super Collider, it has been theoretically and empirically demonstrated that particle tracking on a designed lattice can achieve very high parallel efficiency on a MIMD Intel iPSC/860 machine. The key to such success is the realization that the particles can be tracked independently without considering their interaction. The perfectly parallel nature of particle tracking is broken if the interaction effects between particles are included. The space charge introduces an electromagnetic force that will affect the motion of tracked particles in 3-D space. For accurate modeling of the beam dynamics with space charge effects, one needs to solve three-dimensional Maxwell field equations, usually by a particle-in-cell (PIC) algorithm. This will require each particle to communicate with its neighbor grids to compute the momentum changes at each time step. It is expected that the 3-D PIC method will degrade parallel efficiency of particle-tracking implementation on any parallel computer. In this paper, we describe an efficient scheme for implementing particle tracking with space charge effects on an INTEL iPSC/860 machine. Experimental results show that a parallel efficiency of 75% can be obtained

  20. Student Intern Ben Freed Competes as Finalist in Intel STS Competition, Three Other Interns Named Semifinalists | Poster

    Science.gov (United States)

    By Ashley DeVine, Staff Writer Werner H. Kirstin (WHK) student intern Ben Freed was one of 40 finalists to compete in the Intel Science Talent Search (STS) in Washington, DC, in March. “It was seven intense days of interacting with amazing judges and incredibly smart and interesting students. We met President Obama, and then the MIT astronomy lab named minor planets after each

  1. QEDMOD: Fortran program for calculating the model Lamb-shift operator

    Science.gov (United States)

    Shabaev, V. M.; Tupitsyn, I. I.; Yerokhin, V. A.

    2018-02-01

    We present Fortran package QEDMOD for computing the model QED operator hQED that can be used to account for the Lamb shift in accurate atomic-structure calculations. The package routines calculate the matrix elements of hQED with the user-specified one-electron wave functions. The operator can be used to calculate Lamb shift in many-electron atomic systems with a typical accuracy of few percent, either by evaluating the matrix element of hQED with the many-electron wave function, or by adding hQED to the Dirac-Coulomb-Breit Hamiltonian.

  2. Numerical computation of molecular integrals via optimized (vectorized) FORTRAN code

    International Nuclear Information System (INIS)

    Scott, T.C.; Grant, I.P.; Saunders, V.R.

    1997-01-01

    The calculation of molecular properties based on quantum mechanics is an area of fundamental research whose horizons have always been determined by the power of state-of-the-art computers. A computational bottleneck is the numerical calculation of the required molecular integrals to sufficient precision. Herein, we present a method for the rapid numerical evaluation of molecular integrals using optimized FORTRAN code generated by Maple. The method is based on the exploitation of common intermediates and the optimization can be adjusted to both serial and vectorized computations. (orig.)

  3. The FORTRAN-77 version of the Karlsruhe program system KAPROS

    International Nuclear Information System (INIS)

    Moritz, N.

    1985-02-01

    The FORTRAN-77 KAPROS-kernel includes some major changes compared with the version, which is described in the KfK-report 2254. The changes are documented in this report from the point of view of the system-programmer. This report is meant to be a supplement to the KfK-report 2254, assuming that the reader of this report is familiar with the KfK-report 2254. He also should be familiar with the IBM operating system MVS SP1.3.2 and the usual terms of data processing. (orig.) [de

  4. Proving correctness of compilers using structured graphs

    DEFF Research Database (Denmark)

    Bahr, Patrick

    2014-01-01

    it into a compiler implementation using a graph type along with a correctness proof. The implementation and correctness proof of a compiler using a tree type without explicit jumps is simple, but yields code duplication. Our method provides a convenient way of improving such a compiler without giving up the benefits...

  5. 12 CFR 411.600 - Semi-annual compilation.

    Science.gov (United States)

    2010-01-01

    ... 12 Banks and Banking 4 2010-01-01 2010-01-01 false Semi-annual compilation. 411.600 Section 411.600 Banks and Banking EXPORT-IMPORT BANK OF THE UNITED STATES NEW RESTRICTIONS ON LOBBYING Agency Reports § 411.600 Semi-annual compilation. (a) The head of each agency shall collect and compile the...

  6. ALGOL compiler. Syntax and semantic analysis

    International Nuclear Information System (INIS)

    Tarbouriech, Robert

    1971-01-01

    In this research thesis, the author reports the development of an ALGOL compiler which performs the main following tasks: systematic scan of the origin-programme to recognise the different components (identifiers, reserved words, constants, separators), analysis of the origin-programme structure to build up its statements and arithmetic expressions, processing of symbolic names (identifiers) to associate them with values they represent, and memory allocation for data and programme. Several issues are thus addressed: characteristics of the machine for which the compiler is developed, exact definition of the language (grammar, identifier and constant formation), syntax processing programme to provide the compiler with necessary elements (language vocabulary, precedence matrix), description of the first two phases of compilation: lexicographic analysis, and syntax analysis. The last phase (machine-code generation) is not addressed

  7. BEEC: An event generator for simulating the Bc meson production at an e+e- collider

    Science.gov (United States)

    Yang, Zhi; Wu, Xing-Gang; Wang, Xian-You

    2013-12-01

    distributed program, including test data, etc.: 114868 No. of bytes in distributed program, including test data, etc.: 963939 Distribution format: tar.gz Programming language: FORTRAN 77/90. Computer: Any computer with Fortran compiler, the program is tested with GNU Fortran compiler and Intel Fortran compiler. Operating system: UNIX, Linux and Windows. RAM: About 2.0 MB. Classification: 11.2. Nature of problem: Production of charmonium, (cb¯)-quarkonium and bottomonium via e+e- annihilation channel around the Z0 peak. Solution method: The production of heavy (QQ)-quarkonium (Q,Q‧=b,c) via e+e- annihilation are estimated by using the improved trace technology. The (QQ)-quarkonium in color-singlet 1S-wave state, 1P-wave state, and the color-octet 1S-wave states have been studied within the framework of non-relativistic QCD. The code with option can generate weighted and unweighted events conveniently, in particular, the unweighted events are generated by using an improved hit-and-miss approach so as to improve the generating efficiency. Restrictions: The generator is aimed at the production of double heavy quarkonium through e+e- annihilation at the Z0 peak. The considered processes are those that are associated with two heavy quark jets, which could provide sizable quarkonium events around the Z0 peak. Running time: It depends on which option one chooses to match PYTHIA when generating the heavy quarkonium events. Typically, for the production of the S-wave quarkonium states, if setting IDPP=2 (unweighted events), then it takes about 2 h on a 2.9 GHz AMD Athlon (tm) II×4 635 Processor machine to generate 105 events; if setting IDPP=3 (weighted events), it takes only ˜16 min to generate 105 events. For the production of the P-wave quarkonium states, the time will be almost one hundred times longer than the case of the S-wave quarkonium.

  8. Les multituds intel·ligents com a generadores de dades massives : la intel·ligència col·lectiva al servei de la innovació social

    Directory of Open Access Journals (Sweden)

    Sanz, Sandra

    2015-06-01

    Full Text Available Les últimes dècades es registra un increment de mobilitzacions socials organitzades, intervingudes, narrades i coordinades a través de les TIC. Són mostra de multituds intel·ligents (smart mobs que s'aprofiten dels nous mitjans de comunicació per organitzar-se. Tant pel nombre de missatges intercanviats i generats com per les pròpies interaccions generades, aquestes multituds intel·ligents es converteixen en objecte de les dades massives. La seva anàlisi a partir de les possibilitats que brinda l'enginyeria de dades pot contribuir a detectar idees construïdes com també sabers compartits fruit de la intel·ligència col·lectiva. Aquest fet afavoriria la reutilització d'aquesta informació per incrementar el coneixement del col·lectiu i contribuir al desenvolupament de la innovació social. És per això que en aquest article s'assenyalen els interrogants i les limitacions que encara presenten aquestes anàlisis i es posa en relleu la necessitat d'aprofundir en el desenvolupament de nous mètodes i tècniques d'anàlisi.En las últimas décadas se registra un incremento de movilizaciones sociales organizadas, mediadas, narradas y coordinadas a través de TICs. Son muestra de smart mobs o multitudes inteligentes que se aprovechan de los nuevos medios de comunicación para organizarse. Tanto por el número de mensajes intercambiados y generados como por las propias interacciones generadas, estas multitudes inteligentes se convierten en objeto del big data. Su análisis a partir de las posibilidades que brinda la ingeniería de datos puede contribuir a detectar ideas construidas así como saberes compartidos fruto de la inteligencia colectiva. Ello favorecería la reutilización de esta información para incrementar el conocimiento del colectivo y contribuir al desarrollo de la innovación social. Es por ello que en este artículo se señalan los interrogantes y limitaciones que todavía presentan estos análisis y se pone de relieve la

  9. Compilations and evaluations of nuclear structure and decay data

    International Nuclear Information System (INIS)

    Lorenz, A.

    1977-10-01

    This is the third issue of a report series on published and to-be-published compilations and evaluations of nuclear structure and decay (NSD) data. This compilation is published and distributed by the IAEA Nuclear Data Section approximately every six months. This compilation of compilations and evaluations is designed to keep the nuclear scientific community informed of the availability of compiled or evaluated NSD data, and contains references to laboratory reports, journal articles and books containing selected compilations and evaluations

  10. Evaluating the transport layer of the ALFA framework for the Intel(®) Xeon Phi(™) Coprocessor

    OpenAIRE

    Santogidis, Aram; Hirstius, Andreas; Lalis, Spyros

    2015-01-01

    The ALFA framework supports the software development of major High Energy Physics experiments. As part of our research effort to optimize the transport layer of ALFA, we focus on profiling its data transfer performance for inter-node communication on the Intel Xeon Phi Coprocessor. In this article we present the collected performance measurements with the related analysis of the results. The optimization opportunities that are discovered, help us to formulate the future plans of enabling high...

  11. Accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) model on Intel Xeon Phi processors

    OpenAIRE

    Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junming; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

    2017-01-01

    The GNAQPMS model is the global version of the Nested Air Quality Prediction Modelling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present our work of porting and optimizing the GNAQPMS model on the second generation Intel Xeon Phi processor codename “Knights Landing” (KNL). Compared with the first generation Xeon Phi coprocessor, KNL introduced many new hardware features such as a boo...

  12. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00014247; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea

    2017-01-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with detai...

  13. Multi-threaded ATLAS Simulation on Intel Knights Landing Processors

    CERN Document Server

    Farrell, Steven; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles

    2016-01-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), will be delivered to its users in two phases with the first phase online now and the second phase expected in mid-2016. Cori Phase 2 will be based on the KNL architecture and will contain over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a great use-case for the KNL architecture and supercomputers like Cori. Simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this presentation we will give an overview of the ATLAS simulation application with details on its multi-thr...

  14. A new shared-memory programming paradigm for molecular dynamics simulations on the Intel Paragon

    International Nuclear Information System (INIS)

    D'Azevedo, E.F.; Romine, C.H.

    1994-12-01

    This report describes the use of shared memory emulation with DOLIB (Distributed Object Library) to simplify parallel programming on the Intel Paragon. A molecular dynamics application is used as an example to illustrate the use of the DOLIB shared memory library. SOTON-PAR, a parallel molecular dynamics code with explicit message-passing using a Lennard-Jones 6-12 potential, is rewritten using DOLIB primitives. The resulting code has no explicit message primitives and resembles a serial code. The new code can perform dynamic load balancing and achieves better performance than the original parallel code with explicit message-passing

  15. A FORTRAN realization of the block adjustment of CCD frames

    Science.gov (United States)

    Yu, Yong; Tang, Zhenghong; Li, Jinling; Zhao, Ming

    A FORTRAN version realization of the block adjustment (BA) of overlapping CCD frames is developed. The flowchart is introduced including (a) data collection, (b) preprocessing, and (c) BA and object positioning. The subroutines and their functions are also demonstrated. The program package is tested by simulated data with/without the application of white noises. It is also preliminarily applied to the reduction of optical positions of four extragalactic radio sources. The results show that because of the increase in the sky coverage and number of reference stars, the precision of deducted positions is improved compared with single plate adjustment.

  16. Plasma turbulence calculations on the Intel iPSC/860 (rx) hypercube

    International Nuclear Information System (INIS)

    Lynch, V.E.; Ruiter, J.R.

    1990-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A serial algorithm used for plasma turbulence calculations was modified to allocate a radial region in each node. In this way, convolutions at a fixed radius are performed in parallel, and communication is limited to boundary values for each radial region. For a semi-implicity numerical scheme (tridiagonal matrix solver), there is a factor of 3 improvement in efficiency with the Intel iPSC/860 machine using 64 processors over a single-processor Cray-II. For block-tridiagonal matrix cases (fully implicit code), a second parallelization takes place. The Fourier components are distributed in nodes. In each node, the block-tridiagonal matrix is inverted for each of allocated Fourier components. The algorithm for this second case has not yet been optimized. 10 refs., 4 figs

  17. Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors

    Energy Technology Data Exchange (ETDEWEB)

    Heybrock, Simon; Joo, Balint; Kalamkar, Dhiraj D; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan; Wettig, Tilo; Dubey, Pradeep

    2014-12-01

    The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel Xeon Phi co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.

  18. Emmarcar el debat: Lliure expressió contra propietat intel·lectual, els propers cinquanta anys

    Directory of Open Access Journals (Sweden)

    Eben Moglen

    2007-02-01

    Full Text Available

    El Prof. Moglen explica i analitza, des d'una perspectiva històrica, la profunda revolució social i legal que resulta de la tecnologia digital quan aquesta s'aplica a tots els camps: programari, música i tot tipus de creacions. En concret, explica la manera en què la tecnologia digital està forçant una modificació substancial (desaparició dels sistemes de propietat intel·lectual i fa prediccions per al futur pròxim dels mercats de la PI.

  19. A browser-based tool for conversion between Fortran NAMELIST and XML/HTML

    Science.gov (United States)

    Naito, O.

    A browser-based tool for conversion between Fortran NAMELIST and XML/HTML is presented. It runs on an HTML5 compliant browser and generates reusable XML files to aid interoperability. It also provides a graphical interface for editing and annotating variables in NAMELIST, hence serves as a primitive code documentation environment. Although the tool is not comprehensive, it could be viewed as a test bed for integrating legacy codes into modern systems.

  20. A browser-based tool for conversion between Fortran NAMELIST and XML/HTML

    Directory of Open Access Journals (Sweden)

    O. Naito

    2017-01-01

    Full Text Available A browser-based tool for conversion between Fortran NAMELIST and XML/HTML is presented. It runs on an HTML5 compliant browser and generates reusable XML files to aid interoperability. It also provides a graphical interface for editing and annotating variables in NAMELIST, hence serves as a primitive code documentation environment. Although the tool is not comprehensive, it could be viewed as a test bed for integrating legacy codes into modern systems.

  1. Compilations and evaluations of nuclear structure and decay data

    International Nuclear Information System (INIS)

    Lorenz, A.

    1977-03-01

    This is the second issue of a report series on published and to-be-published compilations and evaluations of nuclear structure and decay (NSD) data. This compilation of compilations and evaluations is designed to keep the nuclear scientific community informed of the availability of compiled or evaluated NSD data, and contains references to laboratory reports, journal articles and books containing selected compilations and evaluations. It excludes references to ''mass-chain'' evaluations normally published in the ''Nuclear Data Sheets'' and ''Nuclear Physics''. The material contained in this compilation is sorted according to eight subject categories: general compilations; basic isotopic properties; nuclear structure properties; nuclear decay processes; half-lives, energies and spectra; nuclear decay processes: gamma-rays; nuclear decay processes: fission products; nuclear decay processes: (others); atomic processes

  2. Types: A data abstraction package in FORTRAN

    International Nuclear Information System (INIS)

    Youssef, S.

    1990-01-01

    TYPES is a collection of Fortran programs which allow the creation and manipulation of abstract ''data objects'' without the need for a preprocessor. Each data object is assigned a ''type'' as it is created which implies participation in a set of characteristic operations. Available types include scalars, logicals, ordered sets, stacks, queues, sequences, trees, arrays, character strings, block text, histograms, virtual and allocatable memories. A data object may contain integers, reals, or other data objects in any combination. In addition to the type specific operations, a set of universal utilities allows for copying input/output to disk, naming, editing, displaying, user input, interactive creation, tests for equality of contents or structure, machine to machine translation or source code creation for and data object. TYPES is available on VAX/VMS, SUN 3, SPARC, DEC/Ultrix, Silicon Graphics 4D and Cray/Unicos machines. The capabilities of the package are discussed together with characteristic applications and experience in writing the GVerify package

  3. CAPS OpenACC Compilers: Performance and Portability

    CERN Multimedia

    CERN. Geneva

    2013-01-01

    The announcement late 2011 of the new OpenACC directive-based programming standard supported by CAPS, CRAY and PGI compilers has open up the door to more scientific applications that can be ported on many-core systems. Following a porting methodology, this talk will first review the principles of programming with OpenACC and then the advanced features available in the CAPS compilers to further optimize OpenACC applications: library integration, tuning directives with auto-tune mechanisms to build applications adaptive to different GPUs. CAPS compilers use hardware vendors' backends such as NVIDIA CUDA and OpenCL making them the only OpenACC compilers supporting various many-core architectures. About the speaker Stéphane Bihan is co-funder and currently Director of Sales and Marketing at CAPS enterprise. He has held several R&D positions in companies such as ARC international plc in London, Canon Research Center France, ACE compiler experts in Amsterdam and the INRIA r...

  4. Using Intel Xeon Phi to accelerate the WRF TEMF planetary boundary layer scheme

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen

    2014-05-01

    The Weather Research and Forecasting (WRF) model is designed for numerical weather prediction and atmospheric research. The WRF software infrastructure consists of several components such as dynamic solvers and physics schemes. Numerical models are used to resolve the large-scale flow. However, subgrid-scale parameterizations are for an estimation of small-scale properties (e.g., boundary layer turbulence and convection, clouds, radiation). Those have a significant influence on the resolved scale due to the complex nonlinear nature of the atmosphere. For the cloudy planetary boundary layer (PBL), it is fundamental to parameterize vertical turbulent fluxes and subgrid-scale condensation in a realistic manner. A parameterization based on the Total Energy - Mass Flux (TEMF) that unifies turbulence and moist convection components produces a better result that the other PBL schemes. For that reason, the TEMF scheme is chosen as the PBL scheme we optimized for Intel Many Integrated Core (MIC), which ushers in a new era of supercomputing speed, performance, and compatibility. It allows the developers to run code at trillions of calculations per second using the familiar programming model. In this paper, we present our optimization results for TEMF planetary boundary layer scheme. The optimizations that were performed were quite generic in nature. Those optimizations included vectorization of the code to utilize vector units inside each CPU. Furthermore, memory access was improved by scalarizing some of the intermediate arrays. The results show that the optimization improved MIC performance by 14.8x. Furthermore, the optimizations increased CPU performance by 2.6x compared to the original multi-threaded code on quad core Intel Xeon E5-2603 running at 1.8 GHz. Compared to the optimized code running on a single CPU socket the optimized MIC code is 6.2x faster.

  5. Far-field Lorenz-Mie scattering in an absorbing host medium: Theoretical formalism and FORTRAN program

    Science.gov (United States)

    Mishchenko, Michael I.; Yang, Ping

    2018-01-01

    In this paper we make practical use of the recently developed first-principles approach to electromagnetic scattering by particles immersed in an unbounded absorbing host medium. Specifically, we introduce an actual computational tool for the calculation of pertinent far-field optical observables in the context of the classical Lorenz-Mie theory. The paper summarizes the relevant theoretical formalism, explains various aspects of the corresponding numerical algorithm, specifies the input and output parameters of a FORTRAN program available at https://www.giss.nasa.gov/staff/mmishchenko/Lorenz-Mie.html, and tabulates benchmark results useful for testing purposes. This public-domain FORTRAN program enables one to solve the following two important problems: (i) simulate theoretically the reading of a remote well-collimated radiometer measuring electromagnetic scattering by an individual spherical particle or a small random group of spherical particles; and (ii) compute the single-scattering parameters that enter the vector radiative transfer equation derived directly from the Maxwell equations.

  6. Far-Field Lorenz-Mie Scattering in an Absorbing Host Medium: Theoretical Formalism and FORTRAN Program

    Science.gov (United States)

    Mishchenko, Michael I.; Yang, Ping

    2018-01-01

    In this paper we make practical use of the recently developed first-principles approach to electromagnetic scattering by particles immersed in an unbounded absorbing host medium. Specifically, we introduce an actual computational tool for the calculation of pertinent far-field optical observables in the context of the classical Lorenzâ€"Mie theory. The paper summarizes the relevant theoretical formalism, explains various aspects of the corresponding numerical algorithm, specifies the input and output parameters of a FORTRAN program available at https://www.giss.nasa.gov/staff/mmishchenko/Lorenz-Mie.html, and tabulates benchmark results useful for testing purposes. This public-domain FORTRAN program enables one to solve the following two important problems: (i) simulate theoretically the reading of a remote well-collimated radiometer measuring electromagnetic scattering by an individual spherical particle or a small random group of spherical particles; and (ii) compute the single-scattering parameters that enter the vector radiative transfer equation derived directly from the Maxwell equations.

  7. HAL/S-360 compiler test activity report

    Science.gov (United States)

    Helmers, C. T.

    1974-01-01

    The levels of testing employed in verifying the HAL/S-360 compiler were as follows: (1) typical applications program case testing; (2) functional testing of the compiler system and its generated code; and (3) machine oriented testing of compiler implementation on operational computers. Details of the initial test plan and subsequent adaptation are reported, along with complete test results for each phase which examined the production of object codes for every possible source statement.

  8. Python based high-level synthesis compiler

    Science.gov (United States)

    Cieszewski, Radosław; Pozniak, Krzysztof; Romaniuk, Ryszard

    2014-11-01

    This paper presents a python based High-Level synthesis (HLS) compiler. The compiler interprets an algorithmic description of a desired behavior written in Python and map it to VHDL. FPGA combines many benefits of both software and ASIC implementations. Like software, the mapped circuit is flexible, and can be reconfigured over the lifetime of the system. FPGAs therefore have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. Creating parallel programs implemented in FPGAs is not trivial. This article describes design, implementation and first results of created Python based compiler.

  9. Experience with low-power x86 processors (Atom) for HEP usage. An initial analysis of the Intel® dual core Atom™ N330 processor

    CERN Document Server

    Balazs, G; Nowak, A; CERN. Geneva. IT Department

    2009-01-01

    In this paper we compare a system based on an Intel Atom N330 low-power processor to a modern Intel Xeon® dual-socket server using CERN IT’s standard criteria for comparing price-performance and performance per watt. The Xeon server corresponds to what is typically acquired as servers in the LHC Computing Grid. The comparisons used public pricing information from November 2008. After the introduction in section 1, section 2 describes the hardware and software setup. In section 3 we describe the power measurements we did and in section 4 we discuss the throughput performance results. In section 5 we summarize our initial conclusions. We then go on to describe our long term vision and possible future scenarios for using such low-power processors, and finally we list interesting development directions.

  10. Automatic Parallelization An Overview of Fundamental Compiler Techniques

    CERN Document Server

    Midkiff, Samuel P

    2012-01-01

    Compiling for parallelism is a longstanding topic of compiler research. This book describes the fundamental principles of compiling "regular" numerical programs for parallelism. We begin with an explanation of analyses that allow a compiler to understand the interaction of data reads and writes in different statements and loop iterations during program execution. These analyses include dependence analysis, use-def analysis and pointer analysis. Next, we describe how the results of these analyses are used to enable transformations that make loops more amenable to parallelization, and

  11. JTpack90: A parallel, object-based, Fortran 90 linear algebra package

    Energy Technology Data Exchange (ETDEWEB)

    Turner, J.A.; Kothe, D.B. [Los Alamos National Lab., NM (United States); Ferrell, R.C. [Cambridge Power Computing Associates, Ltd., Brookline, MA (United States)

    1997-03-01

    The authors have developed an object-based linear algebra package, currently with emphasis on sparse Krylov methods, driven primarily by needs of the Los Alamos National Laboratory parallel unstructured-mesh casting simulation tool Telluride. Support for a number of sparse storage formats, methods, and preconditioners have been implemented, driven primarily by application needs. They describe the object-based Fortran 90 approach, which enhances maintainability, performance, and extensibility, the parallelization approach using a new portable gather/scatter library (PGSLib), current capabilities and future plans, and present preliminary performance results on a variety of platforms.

  12. A software methodology for compiling quantum programs

    Science.gov (United States)

    Häner, Thomas; Steiger, Damian S.; Svore, Krysta; Troyer, Matthias

    2018-04-01

    Quantum computers promise to transform our notions of computation by offering a completely new paradigm. To achieve scalable quantum computation, optimizing compilers and a corresponding software design flow will be essential. We present a software architecture for compiling quantum programs from a high-level language program to hardware-specific instructions. We describe the necessary layers of abstraction and their differences and similarities to classical layers of a computer-aided design flow. For each layer of the stack, we discuss the underlying methods for compilation and optimization. Our software methodology facilitates more rapid innovation among quantum algorithm designers, quantum hardware engineers, and experimentalists. It enables scalable compilation of complex quantum algorithms and can be targeted to any specific quantum hardware implementation.

  13. A compiler for variational forms

    OpenAIRE

    Kirby, Robert C.; Logg, Anders

    2011-01-01

    As a key step towards a complete automation of the finite element method, we present a new algorithm for automatic and efficient evaluation of multilinear variational forms. The algorithm has been implemented in the form of a compiler, the FEniCS Form Compiler FFC. We present benchmark results for a series of standard variational forms, including the incompressible Navier-Stokes equations and linear elasticity. The speedup compared to the standard quadrature-based approach is impressive; in s...

  14. A performance study of sparse Cholesky factorization on INTEL iPSC/860

    Science.gov (United States)

    Zubair, M.; Ghose, M.

    1992-01-01

    The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.

  15. Compiler Construction Using Java, JavaCC, and Yacc

    CERN Document Server

    Dos Reis, Anthony J

    2012-01-01

    Broad in scope, involving theory, the application of that theory, and programming technology, compiler construction is a moving target, with constant advances in compiler technology taking place. Today, a renewed focus on do-it-yourself programming makes a quality textbook on compilers, that both students and instructors will enjoy using, of even more vital importance. This book covers every topic essential to learning compilers from the ground up and is accompanied by a powerful and flexible software package for evaluating projects, as well as several tutorials, well-defined projects, and tes

  16. Autogenic Feedback Training (Body Fortran) with Biofeedback and the Computer for Self-Improvement and Change.

    Science.gov (United States)

    Cassel, Russell N.; Sumintardja, Elmira Nasrudin

    1983-01-01

    Describes autogenic feedback training, which provides the basis whereby an individual is able to improve on well being through use of a technique described as "body fortran," implying that you program self as one programs a computer. Necessary requisites are described including relaxation training and the management of stress. (JAC)

  17. PIG 3 - A simple compiler for mercury

    Energy Technology Data Exchange (ETDEWEB)

    Bindon, D C [Computer Branch, Technical Assessments and Services Division, Atomic Energy Establishment, Winfrith, Dorchester, Dorset (United Kingdom)

    1961-06-15

    A short machine language compilation scheme is described; which will read programmes from paper tape, punched cards, or magnetic tape. The compiler occupies pages 8-15 of the ferrite store during translation. (author)

  18. PIG 3 - A simple compiler for mercury

    International Nuclear Information System (INIS)

    Bindon, D.C.

    1961-06-01

    A short machine language compilation scheme is described; which will read programmes from paper tape, punched cards, or magnetic tape. The compiler occupies pages 8-15 of the ferrite store during translation. (author)

  19. Compilation of data on elementary particles

    International Nuclear Information System (INIS)

    Trippe, T.G.

    1984-09-01

    The most widely used data compilation in the field of elementary particle physics is the Review of Particle Properties. The origin, development and current state of this compilation are described with emphasis on the features which have contributed to its success: active involvement of particle physicists; critical evaluation and review of the data; completeness of coverage; regular distribution of reliable summaries including a pocket edition; heavy involvement of expert consultants; and international collaboration. The current state of the Review and new developments such as providing interactive access to the Review's database are described. Problems and solutions related to maintaining a strong and supportive relationship between compilation groups and the researchers who produce and use the data are discussed

  20. Compilation of functional languages using flow graph analysis

    NARCIS (Netherlands)

    Hartel, Pieter H.; Glaser, Hugh; Wild, John M.

    A system based on the notion of a flow graph is used to specify formally and to implement a compiler for a lazy functional language. The compiler takes a simple functional language as input and generates C. The generated C program can then be compiled, and loaded with an extensive run-time system to

  1. Research and Practice of the News Map Compilation Service

    Science.gov (United States)

    Zhao, T.; Liu, W.; Ma, W.

    2018-04-01

    Based on the needs of the news media on the map, this paper researches on the news map compilation service, conducts demand research on the service of compiling news maps, designs and compiles the public authority base map suitable for media publication, and constructs the news base map material library. It studies the compilation of domestic and international news maps with timeliness and strong pertinence and cross-regional characteristics, constructs the hot news thematic gallery and news map customization services, conducts research on types of news maps, establish closer liaison and cooperation methods with news media, and guides news media to use correct maps. Through the practice of the news map compilation service, this paper lists two cases of news map preparation services used by different media, compares and analyses cases, summarizes the research situation of news map compilation service, and at the same time puts forward outstanding problems and development suggestions in the service of news map compilation service.

  2. RESEARCH AND PRACTICE OF THE NEWS MAP COMPILATION SERVICE

    Directory of Open Access Journals (Sweden)

    T. Zhao

    2018-04-01

    Full Text Available Based on the needs of the news media on the map, this paper researches on the news map compilation service, conducts demand research on the service of compiling news maps, designs and compiles the public authority base map suitable for media publication, and constructs the news base map material library. It studies the compilation of domestic and international news maps with timeliness and strong pertinence and cross-regional characteristics, constructs the hot news thematic gallery and news map customization services, conducts research on types of news maps, establish closer liaison and cooperation methods with news media, and guides news media to use correct maps. Through the practice of the news map compilation service, this paper lists two cases of news map preparation services used by different media, compares and analyses cases, summarizes the research situation of news map compilation service, and at the same time puts forward outstanding problems and development suggestions in the service of news map compilation service.

  3. Applications Performance Under MPL and MPI on NAS IBM SP2

    Science.gov (United States)

    Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

    1994-01-01

    On July 5, 1994, an IBM Scalable POWER parallel System (IBM SP2) with 64 nodes, was installed at the Numerical Aerodynamic Simulation (NAS) Facility Each node of NAS IBM SP2 is a "wide node" consisting of a RISC 6000/590 workstation module with a clock of 66.5 MHz which can perform four floating point operations per clock with a peak performance of 266 Mflop/s. By the end of 1994, 64 nodes of IBM SP2 will be upgraded to 160 nodes with a peak performance of 42.5 Gflop/s. An overview of the IBM SP2 hardware is presented. The basic understanding of architectural details of RS 6000/590 will help application scientists the porting, optimizing, and tuning of codes from other machines such as the CRAY C90 and the Paragon to the NAS SP2. Optimization techniques such as quad-word loading, effective utilization of two floating point units, and data cache optimization of RS 6000/590 is illustrated, with examples giving performance gains at each optimization step. The conversion of codes using Intel's message passing library NX to codes using native Message Passing Library (MPL) and the Message Passing Interface (NMI) library available on the IBM SP2 is illustrated. In particular, we will present the performance of Fast Fourier Transform (FFT) kernel from NAS Parallel Benchmarks (NPB) under MPL and MPI. We have also optimized some of Fortran BLAS 2 and BLAS 3 routines, e.g., the optimized Fortran DAXPY runs at 175 Mflop/s and optimized Fortran DGEMM runs at 230 Mflop/s per node. The performance of the NPB (Class B) on the IBM SP2 is compared with the CRAY C90, Intel Paragon, TMC CM-5E, and the CRAY T3D.

  4. Programs in Fortran language for reporting the results of the analyses by ICP emission spectroscopy

    International Nuclear Information System (INIS)

    Roca, M.

    1985-01-01

    Three programs, written in FORTRAN IV language, for reporting the results of the analyses by ICP emission spectroscopy from data stored in files on floppy disks have been developed. They are intended, respectively, for the analyses of: 1) waters, 2) granites and slates, and 3) different kinds of geological materials. (Author) 8 refs

  5. abc the aspectBench compiler for aspectJ a workbench for aspect-oriented programming language and compilers research

    DEFF Research Database (Denmark)

    Allan, Chris; Avgustinov, Pavel; Christensen, Aske Simon

    2005-01-01

    Aspect-oriented programming (AOP) is gaining popularity as a new way of modularising cross-cutting concerns. The aspectbench compiler (abc) is a new workbench for AOP research which provides an extensible research framework for both new language features and new compiler optimisations. This poste...

  6. NINJA: Java for High Performance Numerical Computing

    Directory of Open Access Journals (Sweden)

    José E. Moreira

    2002-01-01

    Full Text Available When Java was first introduced, there was a perception that its many benefits came at a significant performance cost. In the particularly performance-sensitive field of numerical computing, initial measurements indicated a hundred-fold performance disadvantage between Java and more established languages such as Fortran and C. Although much progress has been made, and Java now can be competitive with C/C++ in many important situations, significant performance challenges remain. Existing Java virtual machines are not yet capable of performing the advanced loop transformations and automatic parallelization that are now common in state-of-the-art Fortran compilers. Java also has difficulties in implementing complex arithmetic efficiently. These performance deficiencies can be attacked with a combination of class libraries (packages, in Java that implement truly multidimensional arrays and complex numbers, and new compiler techniques that exploit the properties of these class libraries to enable other, more conventional, optimizations. Two compiler techniques, versioning and semantic expansion, can be leveraged to allow fully automatic optimization and parallelization of Java code. Our measurements with the NINJA prototype Java environment show that Java can be competitive in performance with highly optimized and tuned Fortran code.

  7. A compilation of energy costs of physical activities.

    Science.gov (United States)

    Vaz, Mario; Karaolis, Nadine; Draper, Alizon; Shetty, Prakash

    2005-10-01

    There were two objectives: first, to review the existing data on energy costs of specified activities in the light of the recommendations made by the Joint Food and Agriculture Organization/World Health Organization/United Nations University (FAO/WHO/UNU) Expert Consultation of 1985. Second, to compile existing data on the energy costs of physical activities for an updated annexure of the current Expert Consultation on Energy and Protein Requirements. Electronic and manual search of the literature (predominantly English) to obtain published data on the energy costs of physical activities. The majority of the data prior to 1955 were obtained using an earlier compilation of Passmore and Durnin. Energy costs were expressed as physical activity ratio (PAR); the energy cost of the activity divided by either the measured or predicted basal metabolic rate (BMR). The compilation provides PARs for an expanded range of activities that include general personal activities, transport, domestic chores, occupational activities, sports and other recreational activities for men and women, separately, where available. The present compilation is largely in agreement with the 1985 compilation, for activities that are common to both compilations. The present compilation has been based on the need to provide data on adults for a wide spectrum of human activity. There are, however, lacunae in the available data for many activities, between genders, across age groups and in various physiological states.

  8. Computationally efficient implementation of sarse-tap FIR adaptive filters with tap-position control on intel IA-32 processors

    OpenAIRE

    Hirano, Akihiro; Nakayama, Kenji

    2008-01-01

    This paper presents an computationally ef cient implementation of sparse-tap FIR adaptive lters with tapposition control on Intel IA-32 processors with single-instruction multiple-data (SIMD) capability. In order to overcome randomorder memory access which prevents a ectorization, a blockbased processing and a re-ordering buffer are introduced. A dynamic register allocation and the use of memory-to-register operations help the maximization of the loop-unrolling level. Up to 66percent speedup ...

  9. Fiscal 2000 report on advanced parallelized compiler technology. Outlines; 2000 nendo advanced heiretsuka compiler gijutsu hokokusho (Gaiyo hen)

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-03-01

    Research and development was carried out concerning the automatic parallelized compiler technology which improves on the practical performance, cost/performance ratio, and ease of operation of the multiprocessor system now used for constructing supercomputers and expected to provide a fundamental architecture for microprocessors for the 21st century. Efforts were made to develop an automatic multigrain parallelization technology for extracting multigrain as parallelized from a program and for making full use of the same and a parallelizing tuning technology for accelerating parallelization by feeding back to the compiler the dynamic information and user knowledge to be acquired during execution. Moreover, a benchmark program was selected and studies were made to set execution rules and evaluation indexes for the establishment of technologies for subjectively evaluating the performance of parallelizing compilers for the existing commercial parallel processing computers, which was achieved through the implementation and evaluation of the 'Advanced parallelizing compiler technology research and development project.' (NEDO)

  10. Regulatory and technical reports compilation for 1980

    International Nuclear Information System (INIS)

    Oliu, W.E.; McKenzi, L.

    1981-04-01

    This compilation lists formal regulatory and technical reports and conference proceedings issued in 1980 by the US Nuclear Regulatory Commission. The compilation is divided into four major sections. The first major section consists of a sequential listing of all NRC reports in report-number order. The second major section of this compilation consists of a key-word index to report titles. The third major section contains an alphabetically arranged listing of contractor report numbers cross-referenced to their corresponding NRC report numbers. Finally, the fourth section is an errata supplement

  11. Development of PC based Monte Carlo simulations for the calculation of scanner-specific normalized organ doses from CT

    International Nuclear Information System (INIS)

    Jansen, J. T. M.; Shrimpton, P. C.; Zankl, M.

    2009-01-01

    This paper discusses the simulation of contemporary computed tomography (CT) scanners using Monte Carlo calculation methods to derive normalized organ doses, which enable hospital physicists to estimate typical organ and effective doses for CT examinations. The hardware used in a small PC-cluster at the Health Protection Agency (HPA) for these calculations is described. Investigations concerning optimization of software, including the radiation transport codes MCNP5 and MCNPX, and the Intel and PGI FORTRAN compilers, are presented in relation to results and calculation speed. Differences in approach for modelling the X-ray source are described and their influences are analysed. Comparisons with previously published calculations at HPA from the early 1990's proved satisfactory for the purposes of quality assurance and are presented in terms of organ dose ratios for whole body exposure and differences in organ location. Influences on normalized effective dose are discussed in relation to choice of cross section library, CT scanner technology (contemporary multi slice versus single slice), definition for effective dose (1990 and 2007 versions) and anthropomorphic phantom (mathematical and voxel). The results illustrate the practical need for the updated scanner-specific dose coefficients presently being calculated at HPA, in order to facilitate improved dosimetry for contemporary CT practice. (authors)

  12. The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran

    Directory of Open Access Journals (Sweden)

    Hamed Kargaran

    2016-04-01

    Full Text Available The implementation of Monte Carlo simulation on the CUDA Fortran requires a fast random number generation with good statistical properties on GPU. In this study, a GPU-based parallel pseudo random number generator (GPPRNG have been proposed to use in high performance computing systems. According to the type of GPU memory usage, GPU scheme is divided into two work modes including GLOBAL_MODE and SHARED_MODE. To generate parallel random numbers based on the independent sequence method, the combination of middle-square method and chaotic map along with the Xorshift PRNG have been employed. Implementation of our developed PPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of PRNG on a single CPU core for GLOBAL_MODE and SHARED_MODE, respectively. To evaluate the accuracy of our developed GPPRNG, its performance was compared to that of some other commercially available PPRNGs such as MATLAB, FORTRAN and Miller-Park algorithm through employing the specific standard tests. The results of this comparison showed that the developed GPPRNG in this study can be used as a fast and accurate tool for computational science applications.

  13. The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran

    Energy Technology Data Exchange (ETDEWEB)

    Kargaran, Hamed, E-mail: h-kargaran@sbu.ac.ir; Minuchehr, Abdolhamid; Zolfaghari, Ahmad [Department of nuclear engineering, Shahid Behesti University, Tehran, 1983969411 (Iran, Islamic Republic of)

    2016-04-15

    The implementation of Monte Carlo simulation on the CUDA Fortran requires a fast random number generation with good statistical properties on GPU. In this study, a GPU-based parallel pseudo random number generator (GPPRNG) have been proposed to use in high performance computing systems. According to the type of GPU memory usage, GPU scheme is divided into two work modes including GLOBAL-MODE and SHARED-MODE. To generate parallel random numbers based on the independent sequence method, the combination of middle-square method and chaotic map along with the Xorshift PRNG have been employed. Implementation of our developed PPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of PRNG on a single CPU core) for GLOBAL-MODE and SHARED-MODE, respectively. To evaluate the accuracy of our developed GPPRNG, its performance was compared to that of some other commercially available PPRNGs such as MATLAB, FORTRAN and Miller-Park algorithm through employing the specific standard tests. The results of this comparison showed that the developed GPPRNG in this study can be used as a fast and accurate tool for computational science applications.

  14. Compiling quantum circuits to realistic hardware architectures using temporal planners

    Science.gov (United States)

    Venturelli, Davide; Do, Minh; Rieffel, Eleanor; Frank, Jeremy

    2018-04-01

    To run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to take into account constraints on the hardware. For near-term hardware, with only limited means to mitigate decoherence, it is critical to minimize the duration of the circuit. We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quantum hardware. While our approach is general, we focus on compiling to superconducting hardware architectures with nearest neighbor constraints. Our initial experiments focus on compiling Quantum Alternating Operator Ansatz (QAOA) circuits whose high number of commuting gates allow great flexibility in the order in which the gates can be applied. That freedom makes it more challenging to find optimal compilations but also means there is a greater potential win from more optimized compilation than for less flexible circuits. We map this quantum circuit compilation problem to a temporal planning problem, and generated a test suite of compilation problems for QAOA circuits of various sizes to a realistic hardware architecture. We report compilation results from several state-of-the-art temporal planners on this test set. This early empirical evaluation demonstrates that temporal planning is a viable approach to quantum circuit compilation.

  15. A comparison of SuperLU solvers on the intel MIC architecture

    Science.gov (United States)

    Tuncel, Mehmet; Duran, Ahmet; Celebi, M. Serdar; Akaydin, Bora; Topkaya, Figen O.

    2016-10-01

    In many science and engineering applications, problems may result in solving a sparse linear system AX=B. For example, SuperLU_MCDT, a linear solver, was used for the large penta-diagonal matrices for 2D problems and hepta-diagonal matrices for 3D problems, coming from the incompressible blood flow simulation (see [1]). It is important to test the status and potential improvements of state-of-the-art solvers on new technologies. In this work, sequential, multithreaded and distributed versions of SuperLU solvers (see [2]) are examined on the Intel Xeon Phi coprocessors using offload programming model at the EURORA cluster of CINECA in Italy. We consider a portfolio of test matrices containing patterned matrices from UFMM ([3]) and randomly located matrices. This architecture can benefit from high parallelism and large vectors. We find that the sequential SuperLU benefited up to 45 % performance improvement from the offload programming depending on the sparse matrix type and the size of transferred and processed data.

  16. Experiences with the parallelisation of Monte Carlo problems

    International Nuclear Information System (INIS)

    Schmidt, F.; Dax, W.; Luger, M.

    1990-01-01

    Monte Carlo problems can be parallelized in a natural way. Therefore parallelisation of production codes can be performed quite easily provided the codes are written in FORTRAN and can be transferred to the parallel machine and this machine has a pseudo random number generator available. The MORSE code is a code which can be transferred. We have done this to the CRAY-2 and the 32 processor version of the TX2 which is a binary tree structured parallel machine based on INTEL 80286 processors. We are able to reach efficiencies up to 95% for realistic problems. Thus the same throughput as on one processor on the CRAY-2 could be reached. First experiments on the INTEL i860 based TX3 indicate an additional gain of a factor 100. This will permit the reconsideration of the Monte Carlo method in both nuclear engineering and as a general numerical tool. (author)

  17. GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

    OpenAIRE

    H. Wang; H. Wang; H. Wang; H. Wang; H. Chen; H. Chen; Q. Wu; Q. Wu; J. Lin; X. Chen; X. Xie; R. Wang; R. Wang; X. Tang; Z. Wang

    2017-01-01

    The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (code...

  18. An efficient MPI/OpenMP parallelization of the Hartree–Fock–Roothaan method for the first generation of Intel® Xeon Phi™ processor architecture

    International Nuclear Information System (INIS)

    Mironov, Vladimir; Moskovsky, Alexander; D’Mello, Michael; Alexeev, Yuri

    2017-01-01

    The Hartree-Fock (HF) method in the quantum chemistry package GAMESS represents one of the most irregular algorithms in computation today. Major steps in the calculation are the irregular computation of electron repulsion integrals (ERIs) and the building of the Fock matrix. These are the central components of the main Self Consistent Field (SCF) loop, the key hotspot in Electronic Structure (ES) codes. By threading the MPI ranks in the official release of the GAMESS code, we not only speed up the main SCF loop (4x to 6x for large systems), but also achieve a significant (>2x) reduction in the overall memory footprint. These improvements are a direct consequence of memory access optimizations within the MPI ranks. We benchmark our implementation against the official release of the GAMESS code on the Intel R Xeon PhiTM supercomputer. Here, scaling numbers are reported on up to 7,680 cores on Intel Xeon Phi coprocessors.

  19. "SMART": A Compact and Handy FORTRAN Code for the Physics of Stellar Atmospheres

    Science.gov (United States)

    Sapar, A.; Poolamäe, R.

    2003-01-01

    A new computer code SMART (Spectra from Model Atmospheres by Radiative Transfer) for computing the stellar spectra, forming in plane-parallel atmospheres, has been compiled by us and A. Aret. To guarantee wide compatibility of the code with shell environment, we chose FORTRAN-77 as programming language and tried to confine ourselves to common part of its numerous versions both in WINDOWS and LINUX. SMART can be used for studies of several processes in stellar atmospheres. The current version of the programme is undergoing rapid changes due to our goal to elaborate a simple, handy and compact code. Instead of linearisation (being a mathematical method of recurrent approximations) we propose to use the physical evolutionary changes or in other words relaxation of quantum state populations rates from LTE to NLTE has been studied using small number of NLTE states. This computational scheme is essentially simpler and more compact than the linearisation. This relaxation scheme enables using instead of the Λ-iteration procedure a physically changing emissivity (or the source function) which incorporates in itself changing Menzel coefficients for NLTE quantum state populations. However, the light scattering on free electrons is in the terms of Feynman graphs a real second-order quantum process and cannot be reduced to consequent processes of absorption and emission as in the case of radiative transfer in spectral lines. With duly chosen input parameters the code SMART enables computing radiative acceleration to the matter of stellar atmosphere in turbulence clumps. This also enables to connect the model atmosphere in more detail with the problem of the stellar wind triggering. Another problem, which has been incorporated into the computer code SMART, is diffusion of chemical elements and their isotopes in the atmospheres of chemically peculiar (CP) stars due to usual radiative acceleration and the essential additional acceleration generated by the light-induced drift. As

  20. Regulatory and technical reports: compilation for 1975-1978

    International Nuclear Information System (INIS)

    1982-04-01

    This brief compilation lists formal reports issued by the US Nuclear Regulatory Commission in 1975 through 1978 that were not listed in the Regulatory and Technical Reports Compilation for 1975 to 1978, NUREG-0304, Vol. 3. This compilation is divided into two sections. The first consists of a sequential listing of all reports in report-number order. The second section consists of an index developed from keywords in report titles and abstracts

  1. specsim: A Fortran-77 program for conditional spectral simulation in 3D

    Science.gov (United States)

    Yao, Tingting

    1998-12-01

    A Fortran 77 program, specsim, is presented for conditional spectral simulation in 3D domains. The traditional Fourier integral method allows generating random fields with a given covariance spectrum. Conditioning to local data is achieved by an iterative identification of the conditional phase information. A flowchart of the program is given to illustrate the implementation procedures of the program. A 3D case study is presented to demonstrate application of the program. A comparison with the traditional sequential Gaussian simulation algorithm emphasizes the advantages and drawbacks of the proposed algorithm.

  2. Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2

    Science.gov (United States)

    Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad

    1995-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the computational cost is dominated by FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 M ops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32-nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32 node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: Spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.

  3. Compilations and evaluations of nuclear structure and decay data

    International Nuclear Information System (INIS)

    Lorenz, A.

    1978-10-01

    This is the fourth issue of a report series on published and to-be-published compilations and evaluations of nuclear structure and decay (NSD) data. This compilation is published and distributed by the IAEA Nuclear Data Section every year. The material contained in this compilation is sorted according to eight subject categories: General compilations; basic isotopic properties; nuclear structure properties; nuclear decay processes, half-lives, energies and spectra; nuclear decay processes, gamma-rays; nuclear decay processes, fission products; nuclear decay processes (others); atomic processes

  4. Advanced C and C++ compiling

    CERN Document Server

    Stevanovic, Milan

    2014-01-01

    Learning how to write C/C++ code is only the first step. To be a serious programmer, you need to understand the structure and purpose of the binary files produced by the compiler: object files, static libraries, shared libraries, and, of course, executables.Advanced C and C++ Compiling explains the build process in detail and shows how to integrate code from other developers in the form of deployed libraries as well as how to resolve issues and potential mismatches between your own and external code trees.With the proliferation of open source, understanding these issues is increasingly the res

  5. Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

    CERN Document Server

    Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

    2014-01-01

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).

  6. Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

    Science.gov (United States)

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

    2015-05-01

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).

  7. Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

    International Nuclear Information System (INIS)

    Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Muzaffar, Shahzad; Knight, Robert

    2015-01-01

    Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG). (paper)

  8. A FORTRAN version implementation of block adjustment of CCD frames and its preliminary application

    Science.gov (United States)

    Yu, Y.; Tang, Z.-H.; Li, J.-L.; Zhao, M.

    2005-09-01

    A FORTRAN version implementation of the block adjustment (BA) of overlapping CCD frames is developed and its flowchart is shown. The program is preliminarily applied to obtain the optical positions of four extragalactic radio sources. The results show that because of the increase in the number and sky coverage of reference stars the precision of optical positions with BA is improved compared with the single CCD frame adjustment.

  9. HAPS, a Handy Analog Programming System

    DEFF Research Database (Denmark)

    Højberg, Kristian Søe

    1975-01-01

    HAPS (Hybrid Analog Programming System) is an analog compiler that can be run on a minicomputer in an interactive mode. Essentially HAPS is written in FORTRAN. The equations to be programmed for an ana log computer are read in by using a FORTRAN-like notation. The input must contain maximum...... and emphasizes the limitations HAPS puts on equation structure, types of computing circuit, scaling, and static testing....

  10. Compilation of Sandia Laboratories technical capabilities

    International Nuclear Information System (INIS)

    Lundergan, C.D.; Mead, P.L.

    1975-11-01

    This report is a compilation of 17 individual documents that together summarize the technical capabilities of Sandia Laboratories. Each document in this compilation contains details about a specific area of capability. Examples of application of the capability to research and development problems are provided. An eighteenth document summarizes the content of the other seventeen. Each of these documents was issued with a separate report number (SAND 74-0073A through SAND 74-0091, except -0078)

  11. Evaluation of the Intel iWarp parallel processor for space flight applications

    Science.gov (United States)

    Hine, Butler P., III; Fong, Terrence W.

    1993-01-01

    The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.

  12. PLOTGEOMX: a program for display of a neutron target assembly by means of a GHOST plotting system

    International Nuclear Information System (INIS)

    Clarke, J.H.

    1978-02-01

    The program PLOTGEOM has been modified to work on the A.E.R.E., Harwell IBM 370-167 computer using the GHOST graphics package. The control data routine has been altered to permit free format input and the program has been compiled and stored using the extended-H FORTRAN optimising compiler. (author)

  13. How to Build MCNP 6.2

    Energy Technology Data Exchange (ETDEWEB)

    Bull, Jeffrey S. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-11-13

    This presentation describes how to build MCNP 6.2. MCNP®* 6.2 can be compiled on Macs, PCs, and most Linux systems. It can also be built for parallel execution using both OpenMP and Messing Passing Interface (MPI) methods. MCNP6 requires Fortran, C, and C++ compilers to build the code.

  14. The inverse of winnowing: a FORTRAN subroutine and discussion of unwinnowing discrete data

    Science.gov (United States)

    Bracken, Robert E.

    2004-01-01

    This report describes an unwinnowing algorithm that utilizes a discrete Fourier transform, and a resulting Fortran subroutine that winnows or unwinnows a 1-dimensional stream of discrete data; the source code is included. The unwinnowing algorithm effectively increases (by integral factors) the number of available data points while maintaining the original frequency spectrum of a data stream. This has utility when an increased data density is required together with an availability of higher order derivatives that honor the original data.

  15. Mask industry quality assessment

    Science.gov (United States)

    Strott, Al; Bassist, Larry

    1994-12-01

    Product quality and timely delivery are two of the most important parameters in determining the success of a mask manufacturing facility. Because of the sensitivity of this data, very little was known about industry performance in these areas until an assessment was authored and presented at the 1993 BACUS Symposium by Larry Regis of Intel Corporation, Neil Paulsen of Intel Corporation, and James A. Reynolds of Reynolds Consulting. This data has been updated and will be published and presented at this year's BACUS Symposium. Contributor identities will again remain protected by utilizing Arthur Andersen & Company to compile the submittals. Participation was consistent with last year's representation of over 75% of the total merchant and captive mask volume in the United States. The data compiled includes shipments, customer return rate, customer return reasons from 1988 through Q2, 1994, performance to schedule, plate survival yield, and throughput time (TPT).

  16. Compilation of Sandia Laboratories technical capabilities

    Energy Technology Data Exchange (ETDEWEB)

    Lundergan, C. D.; Mead, P. L. [eds.

    1975-11-01

    This report is a compilation of 17 individual documents that together summarize the technical capabilities of Sandia Laboratories. Each document in this compilation contains details about a specific area of capability. Examples of application of the capability to research and development problems are provided. An eighteenth document summarizes the content of the other seventeen. Each of these documents was issued with a separate report number (SAND 74-0073A through SAND 74-0091, except -0078). (RWR)

  17. Extension of Alvis compiler front-end

    Energy Technology Data Exchange (ETDEWEB)

    Wypych, Michał; Szpyrka, Marcin; Matyasik, Piotr, E-mail: mwypych@agh.edu.pl, E-mail: mszpyrka@agh.edu.pl, E-mail: ptm@agh.edu.pl [AGH University of Science and Technology, Department of Applied Computer Science, Al. Mickiewicza 30, 30-059 Krakow (Poland)

    2015-12-31

    Alvis is a formal modelling language that enables possibility of verification of distributed concurrent systems. An Alvis model semantics finds expression in an LTS graph (labelled transition system). Execution of any language statement is expressed as a transition between formally defined states of such a model. An LTS graph is generated using a middle-stage Haskell representation of an Alvis model. Moreover, Haskell is used as a part of the Alvis language and is used to define parameters’ types and operations on them. Thanks to the compiler’s modular construction many aspects of compilation of an Alvis model may be modified. Providing new plugins for Alvis Compiler that support languages like Java or C makes possible using these languages as a part of Alvis instead of Haskell. The paper presents the compiler internal model and describes how the default specification language can be altered by new plugins.

  18. Semantics-Based Compiling: A Case Study in Type-Directed Partial Evaluation

    DEFF Research Database (Denmark)

    Danvy, Olivier; Vestergaard, René

    1996-01-01

    , block-structured, higher-order, call-by-value, allows subtyping, and obeys stack discipline. It is bigger than what is usually reported in the literature on semantics-based compiling and partial evaluation. Our compiling technique uses the first Futamura projection, i.e., we compile programs......-directed compilation, in the spirit of Scott and Strachey. Our conclusion is that lambda-calculus normalization suffices for compiling by specializing an interpreter....

  19. Semantics-based compiling: A case study in type-directed partial evaluation

    DEFF Research Database (Denmark)

    Danvy, Olivier; Vestergaard, René

    1996-01-01

    , block-structured, higher-order, call-by-value, allows subtyping, and obeys stack discipline. It is bigger than what is usually reported in the literature on semantics-based compiling and partial evaluation. Our compiling technique uses the first Futamura projection, i.e., we compile programs......-directed compilation, in the spirit of Scott and Strachey. Our conclusion is that lambda-calculus normalization suffices for compiling by specializing an interpreter....

  20. A Symmetric Approach to Compilation and Decompilation

    DEFF Research Database (Denmark)

    Ager, Mads Sig; Danvy, Olivier; Goldberg, Mayer

    2002-01-01

    Just as an interpreter for a source language can be turned into a compiler from the source language to a target language, we observe that an interpreter for a target language can be turned into a compiler from the target language to a source language. In both cases, the key issue is the choice of...

  1. ASSIST - a package of Fortran routines for handling input under specified syntax rules and for management of data structures

    International Nuclear Information System (INIS)

    Sinclair, J.E.

    1991-02-01

    The ASSIST package (A Structured Storage and Input Syntax Tool) provides for Fortran programs a means for handling data structures more general than those provided by the Fortran language, and for obtaining input to the program from a file or terminal according to specified syntax rules. The syntax-controlled input can be interactive, with automatic generation of prompts, and dialogue to correct any input errors. The range of syntax rules possible is sufficient to handle lists of numbers and character strings, keywords, commands with optional clauses, and many kinds of variable-format constructions, such as algebraic expressions. ASSIST was developed for use in two large programs for the analysis of safety of radioactive waste disposal facilities, but it should prove useful for a wide variety of applications. (author)

  2. Menhir: An Environment for High Performance Matlab

    Directory of Open Access Journals (Sweden)

    Stéphane Chauveau

    1999-01-01

    Full Text Available In this paper we present Menhir a compiler for generating sequential or parallel code from the Matlab language. The compiler has been designed in the context of using Matlab as a specification language. One of the major features of Menhir is its retargetability to generate parallel and sequential C or Fortran code. We present the compilation process and the target system description for Menhir. Preliminary performances are given and compared with MCC, the MathWorks Matlab compiler.

  3. Compilation of current high energy physics experiments

    International Nuclear Information System (INIS)

    1978-09-01

    This compilation of current high-energy physics experiments is a collaborative effort of the Berkeley Particle Data Group, the SLAC library, and the nine participating laboratories: Argonne (ANL), Brookhaven (BNL), CERN, DESY, Fermilab (FNAL), KEK, Rutherford (RHEL), Serpukhov (SERP), and SLAC. Nominally, the compilation includes summaries of all high-energy physics experiments at the above laboratories that were approved (and not subsequently withdrawn) before about June 1978, and had not completed taking of data by 1 January 1975. The experimental summaries are supplemented with three indexes to the compilation, several vocabulary lists giving names or abbreviations used, and a short summary of the beams at each of the laboratories (except Rutherford). The summaries themselves are included on microfiche

  4. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™

    Science.gov (United States)

    Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.

    2016-01-01

    We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations. PMID:27298591

  5. Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™.

    Science.gov (United States)

    Gomes, Jeremias M; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H

    2015-10-01

    We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel ® Xeon Phi ™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63 × on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7 × and 1.62 × , respectively, as compared to efficient CPU and GPU implementations.

  6. Compiler design handbook optimizations and machine code generation

    CERN Document Server

    Srikant, YN

    2003-01-01

    The widespread use of object-oriented languages and Internet security concerns are just the beginning. Add embedded systems, multiple memory banks, highly pipelined units operating in parallel, and a host of other advances and it becomes clear that current and future computer architectures pose immense challenges to compiler designers-challenges that already exceed the capabilities of traditional compilation techniques. The Compiler Design Handbook: Optimizations and Machine Code Generation is designed to help you meet those challenges. Written by top researchers and designers from around the

  7. The fortran programme for the calculation of the absorption and double scattering corrections in cross-section measurements with fast neutrons using the monte Carlo method (1963); Programme fortran pour le calcul des corrections d'absorption et de double diffusion dans les mesures de sections efficaces pour les neutrons rapides par la methode de monte-carlo (1963)

    Energy Technology Data Exchange (ETDEWEB)

    Fernandez, B [Commissariat a l' Energie Atomique, Saclay (France). Centre d' Etudes Nucleaires

    1963-07-01

    A calculation for double scattering and absorption corrections in fast neutron scattering experiments using Monte-Carlo method is given. Application to cylindrical target is presented in FORTRAN symbolic language. (author) [French] Un calcul des corrections de double diffusion et d'absorption dans les experiences de diffusion de neutrons rapides par la methode de Monte-Carlo est presente. L'application au cas d'une cible cylindrique est traitee en langage symbolique FORTRAN. (auteur)

  8. A FORTRAN program for an IBM PC compatible computer for calculating kinematical electron diffraction patterns

    International Nuclear Information System (INIS)

    Skjerpe, P.

    1989-01-01

    This report describes a computer program which is useful in transmission electron microscopy. The program is written in FORTRAN and calculates kinematical electron diffraction patterns in any zone axis from a given crystal structure. Quite large unit cells, containing up to 2250 atoms, can be handled by the program. The program runs on both the Helcules graphic card and the standard IBM CGA card

  9. Writing Compilers and Interpreters A Software Engineering Approach

    CERN Document Server

    Mak, Ronald

    2011-01-01

    Long-awaited revision to a unique guide that covers both compilers and interpreters Revised, updated, and now focusing on Java instead of C++, this long-awaited, latest edition of this popular book teaches programmers and software engineering students how to write compilers and interpreters using Java. You?ll write compilers and interpreters as case studies, generating general assembly code for a Java Virtual Machine that takes advantage of the Java Collections Framework to shorten and simplify the code. In addition, coverage includes Java Collections Framework, UML modeling, object-oriented p

  10. Programs in Fortran language for reporting the results of the analyses by ICP emission spectroscopy; Programas en lenguaje Fortran para la informacion de los resultados de los analisis efectuados mediante Espectroscopia Optica de emision con fuente de plasma

    Energy Technology Data Exchange (ETDEWEB)

    Roca, M

    1985-07-01

    Three programs, written in FORTRAN IV language, for reporting the results of the analyses by ICP emission spectroscopy from data stored in files on floppy disks have been developed. They are intended, respectively, for the analyses of: 1) waters, 2) granites and slates, and 3) different kinds of geological materials. (Author) 8 refs.

  11. ANIBAL - a Hybrid Computer Language for EAI 680-PDP 8/I, FPP 12

    DEFF Research Database (Denmark)

    Højberg, Kristian Søe

    1974-01-01

    and special hybrid computer commands. ANIBAL consists of a general-purpose analog interface subroutine ANI and the macro processor 8BAL (DECUS NO 8-497A.1). When a source program with FORTRAN and 8BAL statements is processed, the FORTRAN statements are transferred unchanged, while the 8BAL code is translated...... essentially to ANI sub-routine calls, which are defined in a macro library. The resulting code is translated by the standard FORTRAN compiler. The language is very flexible as the instructions can be changed and commands can be added to or excluded from the library ....

  12. Library of subroutines to produce one- and two-dimensional statistical distributions on the ES-1010 computer

    International Nuclear Information System (INIS)

    Vzorov, I.K.; Ivanov, V.V.

    1978-01-01

    A library of subroutines to produce 1- and 2-dimensional distribution on the ES-1010 computer is described. 1-dimensional distribution is represented as the histogram, 2-dimensional one is represented as the table. The library provides such opportunities as booking and deleting, filling and clearing histograms (tables), arithmetic operations with them, and printing histograms (tables) on the computer printer with variable printer line. All subroutines are written in FORTRAN-4 language and can be called from the program written in FORTRAN or in ASSEMBLER. This library can be implemented on all computer systems that offer a FORTRAN-4 compiler

  13. Numerical Schemes for Charged Particle Movement in PIC Simulations

    International Nuclear Information System (INIS)

    Kulhanek, P.

    2001-01-01

    A PIC model of plasma fibers is developed in the Department of Physics of the Czech Technical University for several years. The program code was written in FORTRAN 95, free-style (without compulsory columns). Fortran compiler and linker were used from Compaq Visual Fortran 6.1A embedded in the Microsoft Development studio GUI. Fully three-dimensional code with periodical boundary conditions was developed. Electromagnetic fields are localized on a grid and particles move freely through this grid. One of the partial problems of the PIC model is the numerical particle solver, which will be discussed in this paper. (author)

  14. Plasma Science and Applications at the Intel Science Fair: A Retrospective

    Science.gov (United States)

    Berry, Lee

    2009-11-01

    For the past five years, the Coalition for Plasma Science (CPS) has presented an award for a plasma project at the Intel International Science and Engineering Fair (ISEF). Eligible projects have ranged from grape-based plasma production in a microwave oven to observation of the effects of viscosity in a fluid model of quark-gluon plasma. Most projects have been aimed at applications, including fusion, thrusters, lighting, materials processing, and GPS improvements. However diagnostics (spectroscopy), technology (magnets), and theory (quark-gluon plasmas) have also been represented. All of the CPS award-winning projects so far have been based on experiments, with two awards going to women students and three to men. Since the award was initiated, both the number and quality of plasma projects has increased. The CPS expects this trend to continue, and looks forward to continuing its work with students who are excited about the possibilities of plasma. You too can share this excitement by judging at the 2010 fair in San Jose on May 11-12.

  15. Aviation Safety Modeling and Simulation (ASMM) Propulsion Fleet Modeling: A Tool for Semi-Automatic Construction of CORBA-based Applications from Legacy Fortran Programs

    Science.gov (United States)

    Sang, Janche

    2003-01-01

    Within NASA's Aviation Safety Program, NASA GRC participates in the Modeling and Simulation Project called ASMM. NASA GRC s focus is to characterize the propulsion systems performance from a fleet management and maintenance perspective by modeling and through simulation predict the characteristics of two classes of commercial engines (CFM56 and GE90). In prior years, the High Performance Computing and Communication (HPCC) program funded, NASA Glenn in developing a large scale, detailed simulations for the analysis and design of aircraft engines called the Numerical Propulsion System Simulation (NPSS). Three major aspects of this modeling included the integration of different engine components, coupling of multiple disciplines, and engine component zooming at appropriate level fidelity, require relatively tight coupling of different analysis codes. Most of these codes in aerodynamics and solid mechanics are written in Fortran. Refitting these legacy Fortran codes with distributed objects can increase these codes reusability. Aviation Safety s modeling and simulation use in characterizing fleet management has similar needs. The modeling and simulation of these propulsion systems use existing Fortran and C codes that are instrumental in determining the performance of the fleet. The research centers on building a CORBA-based development environment for programmers to easily wrap and couple legacy Fortran codes. This environment consists of a C++ wrapper library to hide the details of CORBA and an efficient remote variable scheme to facilitate data exchange between the client and the server model. Additionally, a Web Service model should also be constructed for evaluation of this technology s use over the next two- three years.

  16. Promising Compilation to ARMv8 POP

    OpenAIRE

    Podkopaev, Anton; Lahav, Ori; Vafeiadis, Viktor

    2017-01-01

    We prove the correctness of compilation of relaxed memory accesses and release-acquire fences from the "promising" semantics of [Kang et al. POPL'17] to the ARMv8 POP machine of [Flur et al. POPL'16]. The proof is highly non-trivial because both the ARMv8 POP and the promising semantics provide some extremely weak consistency guarantees for normal memory accesses; however, they do so in rather different ways. Our proof of compilation correctness to ARMv8 POP strengthens the results of the Kan...

  17. Computer applications in physics with FORTRAN, BASIC and C

    CERN Document Server

    Chandra, Suresh

    2014-01-01

    Because of encouraging response for first two editions of the book and for taking into account valuable suggestion from teachers as well as students, the text for Interpolation, Differentiation, Integration, Roots of an Equation, Solution of Simultaneous Equations, Eigenvalues and Eigenvectors of Matrix, Solution of Differential Equations, Solution of Partial Differential Equations, Monte Carlo Method and Simulation, Computation of some Functions is improved throughout and presented in a more systematic manner by using simple language. These techniques have vast applications in Science, Engineering and Technology. The C language is becoming popular in universities, colleges and engineering institutions. Besides the C language, programs are written in FORTRAN and BASIC languages. Consequently, this book has rather wide scope for its use. Each of the topics are developed in a systematic manner; thus making this book useful for graduate, postgraduate and engineering students. KEY FEATURES: Each topic is self exp...

  18. A comparative study of programming languages for next-generation astrodynamics systems

    Science.gov (United States)

    Eichhorn, Helge; Cano, Juan Luis; McLean, Frazer; Anderl, Reiner

    2018-03-01

    Due to the computationally intensive nature of astrodynamics tasks, astrodynamicists have relied on compiled programming languages such as Fortran for the development of astrodynamics software. Interpreted languages such as Python, on the other hand, offer higher flexibility and development speed thereby increasing the productivity of the programmer. While interpreted languages are generally slower than compiled languages, recent developments such as just-in-time (JIT) compilers or transpilers have been able to close this speed gap significantly. Another important factor for the usefulness of a programming language is its wider ecosystem which consists of the available open-source packages and development tools such as integrated development environments or debuggers. This study compares three compiled languages and three interpreted languages, which were selected based on their popularity within the scientific programming community and technical merit. The three compiled candidate languages are Fortran, C++, and Java. Python, Matlab, and Julia were selected as the interpreted candidate languages. All six languages are assessed and compared to each other based on their features, performance, and ease-of-use through the implementation of idiomatic solutions to classical astrodynamics problems. We show that compiled languages still provide the best performance for astrodynamics applications, but JIT-compiled dynamic languages have reached a competitive level of speed and offer an attractive compromise between numerical performance and programmer productivity.

  19. Design of methodology for incremental compiler construction

    Directory of Open Access Journals (Sweden)

    Pavel Haluza

    2011-01-01

    Full Text Available The paper deals with possibilities of the incremental compiler construction. It represents the compiler construction possibilities for languages with a fixed set of lexical units and for languages with a variable set of lexical units, too. The methodology design for the incremental compiler construction is based on the known algorithms for standard compiler construction and derived for both groups of languages. Under the group of languages with a fixed set of lexical units there belong languages, where each lexical unit has its constant meaning, e.g., common programming languages. For this group of languages the paper tries to solve the problem of the incremental semantic analysis, which is based on incremental parsing. In the group of languages with a variable set of lexical units (e.g., professional typographic system TEX, it is possible to change arbitrarily the meaning of each character on the input file at any time during processing. The change takes effect immediately and its validity can be somehow limited or is given by the end of the input. For this group of languages this paper tries to solve the problem case when we use macros temporarily changing the category of arbitrary characters.

  20. Compiling the First Monolingual Lusoga Dictionary

    Directory of Open Access Journals (Sweden)

    Minah Nabirye

    2011-10-01

    Full Text Available

    Abstract: In this research article a study is made of the approach followed to compile the first-ever monolingual dictionary for Lusoga. Lusoga is a Bantu language spoken in Uganda by slightly over two mil-lion people. Being an under-resourced language, the Lusoga orthography had to be designed, a grammar written, and a corpus built, before embarking on the compilation of the dictionary. This compilation was aimed at attaining an academic degree, hence requiring a rigorous research methodology. Firstly, the prevail-ing methods for compiling dictionaries were mainly practical and insufficient in explaining the theoretical linguistic basis for dictionary compilation. Since dictionaries are based on meaning, the theory of meaning was used to account for all linguistic data considered in dictionaries. However, meaning is considered at a very abstract level, far removed from the process of compiling dictionaries. Another theory, the theory of modularity, was used to bridge the gap between the theory of meaning and the compilation process. The modular theory explains how the different modules of a language contribute information to the different parts of the dictionary article or dictionary information in general. Secondly, the research also had to contend with the different approaches for analysing Bantu languages for Bantu and European audiences. A descrip-tion of the Bantu- and European-centred approaches to Bantu studies was undertaken in respect of (a the classification of Lusoga words, and (b the specification of their citations. As a result, Lusoga lexicography deviates from the prevailing Bantu classification and citation of nouns, adjectives and verbs in particular. The dictionary was tested on two separate occasions and all the feedback was considered in the compilation pro-cess. This article, then, gives an overall summary of all the steps involved in the compilation of the Eiwanika ly'Olusoga, i.e. the Monolingual Lusoga Dictionary

  1. Multi-threaded ATLAS simulation on Intel Knights Landing processors

    Science.gov (United States)

    Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration

    2017-10-01

    The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.

  2. AICPA allows low-cost options for compiled financial statements.

    Science.gov (United States)

    Reinstein, Alan; Luecke, Randall W

    2002-02-01

    The AICPA Accounting and Review Services Committee's (ARSC) SSARS No. 8, Amendment to Statement on Standards for Accounting and Review Services No. 1, Compilation and Review of Financial Statements, issued in October 2000, allows financial managers to provide plain-paper, compiled financial statements for the exclusive use of management. Such financial statements were disallowed in 1979 when the AICPA issued SSARS No. 1, Compilation and Review of Financial Statements. With the issuance of SSARS No. 8, financial managers can prepare plain-paper, compiled financial statements when third parties are not expected to rely on the financial statements, management acknowledges such restrictions in writing, and management acknowledges its primary responsibility for the adequacy of the financial statements.

  3. Electronic circuits for communications systems: A compilation

    Science.gov (United States)

    1972-01-01

    The compilation of electronic circuits for communications systems is divided into thirteen basic categories, each representing an area of circuit design and application. The compilation items are moderately complex and, as such, would appeal to the applications engineer. However, the rationale for the selection criteria was tailored so that the circuits would reflect fundamental design principles and applications, with an additional requirement for simplicity whenever possible.

  4. La responsabilitat davant la intel·ligència artificial en el comerç electrònic

    OpenAIRE

    Martín i Palomas, Elisabet

    2015-01-01

    Es planteja en aquesta tesi l'efecte produït sobre la responsabilitat derivada de les accions realitzades autònomament per sistemes dotats d'intel·ligència artificial, sense la participació directa de cap ésser humà, en els temes més directament relacionats amb el comerç electrònic. Per a això s'analitzen les activitats realitzades per algunes de les principals empreses internacionals de comerç electrònic, com el grup nord-americà eBay o el grup xinès Alibaba. Després de desenvolupar els prin...

  5. Compilation Techniques Specific for a Hardware Cryptography-Embedded Multimedia Mobile Processor

    Directory of Open Access Journals (Sweden)

    Masa-aki FUKASE

    2007-12-01

    Full Text Available The development of single chip VLSI processors is the key technology of ever growing pervasive computing to answer overall demands for usability, mobility, speed, security, etc. We have so far developed a hardware cryptography-embedded multimedia mobile processor architecture, HCgorilla. Since HCgorilla integrates a wide range of techniques from architectures to applications and languages, one-sided design approach is not always useful. HCgorilla needs more complicated strategy, that is, hardware/software (H/S codesign. Thus, we exploit the software support of HCgorilla composed of a Java interface and parallelizing compilers. They are assumed to be installed in servers in order to reduce the load and increase the performance of HCgorilla-embedded clients. Since compilers are the essence of software's responsibility, we focus in this article on our recent results about the design, specifications, and prototyping of parallelizing compilers for HCgorilla. The parallelizing compilers are composed of a multicore compiler and a LIW compiler. They are specified to abstract parallelism from executable serial codes or the Java interface output and output the codes executable in parallel by HCgorilla. The prototyping compilers are written in Java. The evaluation by using an arithmetic test program shows the reasonability of the prototyping compilers compared with hand compilers.

  6. The Journey of a Source Line: How your Code is Translated into a Controlled Flow of Electrons

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    In this series we help you understand the bits and pieces that make your code command the underlying hardware. A multitude of layers translate and optimize source code, written in compiled and interpreted programming languages such as C++, Python or Java, to machine language. We explain the role and behavior of the layers in question in a typical usage scenario. While our main focus is on compilers and interpreters, we also talk about other facilities - such as the operating system, instruction sets and instruction decoders.   Biographie: Andrzej Nowak runs TIK Services, a technology and innovation consultancy based in Geneva, Switzerland. In the recent past, he co-founded and sold an award-winning Fintech start-up focused on peer-to-peer lending. Earlier, Andrzej worked at Intel and in the CERN openlab. At openlab, he managed a lab collaborating with Intel and was part of the Chief Technology Office, which set up next-generation technology projects for CERN and the openlab partne...

  7. The Journey of a Source Line: How your Code is Translated into a Controlled Flow of Electrons

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    In this series we help you understand the bits and pieces that make your code command the underlying hardware. A multitude of layers translate and optimize source code, written in compiled and interpreted programming languages such as C++, Python or Java, to machine language. We explain the role and behavior of the layers in question in a typical usage scenario. While our main focus is on compilers and interpreters, we also talk about other facilities - such as the operating system, instruction sets and instruction decoders. Biographie: Andrzej Nowak runs TIK Services, a technology and innovation consultancy based in Geneva, Switzerland. In the recent past, he co-founded and sold an award-winning Fintech start-up focused on peer-to-peer lending. Earlier, Andrzej worked at Intel and in the CERN openlab. At openlab, he managed a lab collaborating with Intel and was part of the Chief Technology Office, which set up next-generation technology projects for CERN and the openlab partners.

  8. JAVA-Based Implementation of Monterey-Miami Parabolic Equation (MMPE) Model with Enhanced Visualization and Improved Method of Environmental Definition

    National Research Council Canada - National Science Library

    Ha, Yonghoon

    2000-01-01

    .... This approach requires the user to have installed both Matlab and Fortran compilers. The MMPE model and associated acoustic processing tools are now rewritten in the object-oriented language Java...

  9. 49 CFR 801.57 - Records compiled for law enforcement purposes.

    Science.gov (United States)

    2010-10-01

    ... 49 Transportation 7 2010-10-01 2010-10-01 false Records compiled for law enforcement purposes. 801... compiled for law enforcement purposes. Pursuant to 5 U.S.C. 552(b)(7), any records compiled for law or..., would disclose investigative procedures and practices, or would endanger the life or security of law...

  10. Communication overhead on the Intel Paragon, IBM SP2 and Meiko CS-2

    Science.gov (United States)

    Bokhari, Shahid H.

    1995-01-01

    Interprocessor communication overhead is a crucial measure of the power of parallel computing systems-its impact can severely limit the performance of parallel programs. This report presents measurements of communication overhead on three contemporary commercial multicomputer systems: the Intel Paragon, the IBM SP2 and the Meiko CS-2. In each case the time to communicate between processors is presented as a function of message length. The time for global synchronization and memory access is discussed. The performance of these machines in emulating hypercubes and executing random pairwise exchanges is also investigated. It is shown that the interprocessor communication time depends heavily on the specific communication pattern required. These observations contradict the commonly held belief that communication overhead on contemporary machines is independent of the placement of tasks on processors. The information presented in this report permits the evaluation of the efficiency of parallel algorithm implementations against standard baselines.

  11. On the Automatic Parallelization of Sparse and Irregular Fortran Programs

    Directory of Open Access Journals (Sweden)

    Yuan Lin

    1999-01-01

    Full Text Available Automatic parallelization is usually believed to be less effective at exploiting implicit parallelism in sparse/irregular programs than in their dense/regular counterparts. However, not much is really known because there have been few research reports on this topic. In this work, we have studied the possibility of using an automatic parallelizing compiler to detect the parallelism in sparse/irregular programs. The study with a collection of sparse/irregular programs led us to some common loop patterns. Based on these patterns new techniques were derived that produced good speedups when manually applied to our benchmark codes. More importantly, these parallelization methods can be implemented in a parallelizing compiler and can be applied automatically.

  12. NLEdit: A generic graphical user interface for Fortran programs

    Science.gov (United States)

    Curlett, Brian P.

    1994-01-01

    NLEdit is a generic graphical user interface for the preprocessing of Fortran namelist input files. The interface consists of a menu system, a message window, a help system, and data entry forms. A form is generated for each namelist. The form has an input field for each namelist variable along with a one-line description of that variable. Detailed help information, default values, and minimum and maximum allowable values can all be displayed via menu picks. Inputs are processed through a scientific calculator program that allows complex equations to be used instead of simple numeric inputs. A custom user interface is generated simply by entering information about the namelist input variables into an ASCII file. There is no need to learn a new graphics system or programming language. NLEdit can be used as a stand-alone program or as part of a larger graphical user interface. Although NLEdit is intended for files using namelist format, it can be easily modified to handle other file formats.

  13. An Efficient Compiler for Weighted Rewrite Rules

    OpenAIRE

    Mohri, Mehryar; Sproat, Richard

    1996-01-01

    Context-dependent rewrite rules are used in many areas of natural language and speech processing. Work in computational phonology has demonstrated that, given certain conditions, such rewrite rules can be represented as finite-state transducers (FSTs). We describe a new algorithm for compiling rewrite rules into FSTs. We show the algorithm to be simpler and more efficient than existing algorithms. Further, many of our applications demand the ability to compile weighted rules into weighted FST...

  14. HOPE: Just-in-time Python compiler for astrophysical computations

    Science.gov (United States)

    Akeret, Joel; Gamper, Lukas; Amara, Adam; Refregier, Alexandre

    2014-11-01

    HOPE is a specialized Python just-in-time (JIT) compiler designed for numerical astrophysical applications. HOPE focuses on a subset of the language and is able to translate Python code into C++ while performing numerical optimization on mathematical expressions at runtime. To enable the JIT compilation, the user only needs to add a decorator to the function definition. By using HOPE, the user benefits from being able to write common numerical code in Python while getting the performance of compiled implementation.

  15. Compiling the First Monolingual Lusoga Dictionary | Nabirye | Lexikos

    African Journals Online (AJOL)

    Another theory, the theory of modularity, was used to bridge the gap between the theory of meaning and the compilation process. The modular ... This article, then, gives an overall summary of all the steps involved in the compilation of the Eiwanika ly'Olusoga, i.e. the Monolingual Lusoga Dictionary. Keywords: lexicography ...

  16. Hybrid expert system implementation to determine core reload patterns

    International Nuclear Information System (INIS)

    Greek, K.J.; Robinson, A.H.

    1989-01-01

    Determining reactor reload fuel patterns is a computationally intensive problem solving process for which automation can be of significant benefit. Often much effort is expended in the search for an optimal loading. While any modern programming language could be used to automate solution, the specialized tools of artificial intelligence (AI) are the most efficient means of introducing the fuel management expert's knowledge into the search for an optimum reload pattern. Prior research in pressurized water reactor refueling strategies developed FORTRAN programs that automated an expert's basic knowledge to direct a search for an acceptable minimum peak power loading. The dissatisfaction with maintenance of compiled knowledge in FORTRAN programs has served as the motivation for the development of the SHUFFLE expert system. SHUFFLE is written in Smalltalk, an object-oriented programming language, and evaluates loadings as it generates them using a two-group, two-dimensional nodal power calculation compiled in a personal computer-based FORTRAN. This paper reviews the object-oriented representation developed to solve the core reload problem with an expert system tool and its operating prototype, SHUFFLE

  17. BGSUB and BGFIX: FORTRAN programs to correct Ge(Li) gamma-ray spectra for photopeaks from radionuclides in background

    International Nuclear Information System (INIS)

    Cutshall, N.H.; Larsen, I.L.

    1980-03-01

    Two FORTRAN programs which provide correction and error analysis for background photopeak contributions to low-level gamma-ray spectra are discussed. A peak-by-peak background subtraction approach is used instead of channel-by-channel correction. The accuracy of corrected results near background levels is substantially improved over uncorrected values

  18. The programming language EFL

    Science.gov (United States)

    Feldman, S. I.

    1978-01-01

    EFL is a comprehensive language designed to make it easy to write portable, understandable programs. It provides a rich set of data types and structures, a convenient operator set, and good control flow forms. The lexical form is easy to type and to read. Whenever possible, EFL uses the same forms that Ratfor does; in this sense EFL may be viewed as a superset of Ratfor. EFL is a well-defined language; this distinguishes it from most FORTRAN preprocessors which only add simple flow of control constructs to FORTRAN. The EFL compiler generates (possibly tailored) Standard FORTRAN as its output. EFL should catch and diagnose all syntax errors.

  19. Data compilation for particle-impact desorption, 2

    International Nuclear Information System (INIS)

    Oshiyama, Takashi; Nagai, Siro; Ozawa, Kunio; Takeutchi, Fujio.

    1985-07-01

    The particle impact desorption is one of the elementary processes of hydrogen recycling in controlled thermonuclear fusion reactors. We have surveyed the literature concerning the ion impact desorption and photon stimulated desorption published through the end of 1984 and compiled the data on the desorption cross sections and yields with the aid of a computer. This report presents the results of the compilation in graphs and tables as functions of incident energy, surface temperature and surface coverage. (author)

  20. Logical inference techniques for loop parallelization

    KAUST Repository

    Oancea, Cosmin E.; Rauchwerger, Lawrence

    2012-01-01

    This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S = Ø, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially-high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ⇒ S = Ø). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECTCLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers. Copyright © 2012 ACM.

  1. Logical inference techniques for loop parallelization

    KAUST Repository

    Oancea, Cosmin E.

    2012-01-01

    This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop\\'s memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S = Ø, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially-high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ⇒ S = Ø). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECTCLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers. Copyright © 2012 ACM.

  2. The Cost of being Object-Oriented: A Preliminary Study

    Directory of Open Access Journals (Sweden)

    Zoran Budimlić

    1999-01-01

    Full Text Available Since the introduction of the Java programming language, there has been widespread interest in the use Java for the high performance scientific computing. One major impediment to such use is the performance penalty paid relative to Fortran. To support our research on overcoming this penalty through compiler technology, we have developed a benchmark suite, called OwlPack, which is based on the popular LINPACK library. Although there are existing implementations of LINPACK in Java, most of these are produced by direct translation from Fortran. As such they do not reflect the style of programming that a good object‐oriented programmer would use in Java. Our goal is to investigate how to make object‐oriented scientific programming practical. Therefore we developed two object‐oriented versions of LINPACK in Java, a true polymorphic version and a “Lite” version designed for higher performance. We used these libraries to perform a detailed performance analysis using several leading Java compilers and virtual machines, comparing the performance of the object‐oriented versions of the benchmark with a version produced by direct translation from Fortran. Although Java implementations have been made great strides, they still fall short on programs that use the full power of Java’s object‐oriented features. Our ultimate goal is to drive research on compiler technology that will reward, rather than penalize good object‐oriented programming practice.

  3. Deployment of the OSIRIS EM-PIC code on the Intel Knights Landing architecture

    Science.gov (United States)

    Fonseca, Ricardo

    2017-10-01

    Electromagnetic particle-in-cell (EM-PIC) codes such as OSIRIS have found widespread use in modelling the highly nonlinear and kinetic processes that occur in several relevant plasma physics scenarios, ranging from astrophysical settings to high-intensity laser plasma interaction. Being computationally intensive, these codes require large scale HPC systems, and a continuous effort in adapting the algorithm to new hardware and computing paradigms. In this work, we report on our efforts on deploying the OSIRIS code on the new Intel Knights Landing (KNL) architecture. Unlike the previous generation (Knights Corner), these boards are standalone systems, and introduce several new features, include the new AVX-512 instructions and on-package MCDRAM. We will focus on the parallelization and vectorization strategies followed, as well as memory management, and present a detailed performance evaluation of code performance in comparison with the CPU code. This work was partially supported by Fundaçã para a Ciência e Tecnologia (FCT), Portugal, through Grant No. PTDC/FIS-PLA/2940/2014.

  4. Modeling high-temperature superconductors and metallic alloys on the Intel IPSC/860

    Science.gov (United States)

    Geist, G. A.; Peyton, B. W.; Shelton, W. A.; Stocks, G. M.

    Oak Ridge National Laboratory has embarked on several computational Grand Challenges, which require the close cooperation of physicists, mathematicians, and computer scientists. One of these projects is the determination of the material properties of alloys from first principles and, in particular, the electronic structure of high-temperature superconductors. While the present focus of the project is on superconductivity, the approach is general enough to permit study of other properties of metallic alloys such as strength and magnetic properties. This paper describes the progress to date on this project. We include a description of a self-consistent KKR-CPA method, parallelization of the model, and the incorporation of a dynamic load balancing scheme into the algorithm. We also describe the development and performance of a consolidated KKR-CPA code capable of running on CRAYs, workstations, and several parallel computers without source code modification. Performance of this code on the Intel iPSC/860 is also compared to a CRAY 2, CRAY YMP, and several workstations. Finally, some density of state calculations of two perovskite superconductors are given.

  5. Balancing Contention and Synchronization on the Intel Paragon

    Science.gov (United States)

    Bokhari, Shahid H.; Nicol, David M.

    1996-01-01

    The Intel Paragon is a mesh-connected distributed memory parallel computer. It uses an oblivious and deterministic message routing algorithm: this permits us to develop highly optimized schedules for frequently needed communication patterns. The complete exchange is one such pattern. Several approaches are available for carrying it out on the mesh. We study an algorithm developed by Scott. This algorithm assumes that a communication link can carry one message at a time and that a node can only transmit one message at a time. It requires global synchronization to enforce a schedule of transmissions. Unfortunately global synchronization has substantial overhead on the Paragon. At the same time the powerful interconnection mechanism of this machine permits 2 or 3 messages to share a communication link with minor overhead. It can also overlap multiple message transmission from the same node to some extent. We develop a generalization of Scott's algorithm that executes complete exchange with a prescribed contention. Schedules that incur greater contention require fewer synchronization steps. This permits us to tradeoff contention against synchronization overhead. We describe the performance of this algorithm and compare it with Scott's original algorithm as well as with a naive algorithm that does not take interconnection structure into account. The Bounded contention algorithm is always better than Scott's algorithm and outperforms the naive algorithm for all but the smallest message sizes. The naive algorithm fails to work on meshes larger than 12 x 12. These results show that due consideration of processor interconnect and machine performance parameters is necessary to obtain peak performance from the Paragon and its successor mesh machines.

  6. VIDEO-PC, SVGA Routines for FORTRAN on PC

    International Nuclear Information System (INIS)

    1994-01-01

    1 - Description of program or function: These primitives are the lowest level routines needed to perform super VGA graphics on a PC. A sample main program is included that exercises the primitives. Both Lahey and Microsoft FORTRAN's have graphics libraries. However, the libraries do not support 256 color graphics at resolutions greater than 320x200. The primitives bypass these libraries while still conforming to standard usage of BIOS. The supported graphics modes depend upon the PC graphics card and its memory. Super VGA resolutions of 640x480 and 800x600 have been tested on an ATI VGA Wonder card with 512 K memory and on several 80486 PC's (unknown manufacturers) at retail stores. 2 - Method of solution: Both Lahey and Microsoft primitives depend upon sending the correct parameters to the PC BIOS (basic input output system) as discussed in the references. 3 - Restrictions on the complexity of the problem: The primitives were developed for 256 color VGA applications. It is known, for instance, that some CGA and 16 color VGA video modes will not operate correctly using these primitives. Potential users should try the test program on their PC/video card combination to determine applicability

  7. FAIL-SAFE: Fault Aware IntelLigent Software for Exascale

    Science.gov (United States)

    2016-06-13

    and that these programs can continue to correct solutions. To broaden the impact of this research, we also needed to be able to ameliorate errors...designing an interface between the application and an introspection framework for resilience ( IFR ) based on the inference engine SHINE; (4) using...the ROSE compiler to translate annotations into reasoning rules for the IFR ; and (5) designing a Knowledge/Experience Database, which will store

  8. A Fortran program (RELAX3D) to solve the 3 dimensional Poisson (Laplace) equation

    International Nuclear Information System (INIS)

    Houtman, H.; Kost, C.J.

    1983-09-01

    RELAX3D is an efficient, user friendly, interactive FORTRAN program which solves the Poisson (Laplace) equation Λ 2 =p for a general 3 dimensional geometry consisting of Dirichlet and Neumann boundaries approximated to lie on a regular 3 dimensional mesh. The finite difference equations at these nodes are solved using a successive point-iterative over-relaxation method. A menu of commands, supplemented by HELP facility, controls the dynamic loading of the subroutine describing the problem case, the iterations to converge to a solution, and the contour plotting of any desired slices, etc

  9. FORTRAN computer programs to process Savannah River Laboratory hydrogeochemical and stream-sediment reconnaissance data

    International Nuclear Information System (INIS)

    Zinkl, R.J.; Shettel, D.L. Jr.; D'Andrea, R.F. Jr.

    1980-03-01

    FORTRAN computer programs have been written to read, edit, and reformat the hydrogeochemical and stream-sediment reconnaissance data produced by Savannah River Laboratory for the National Uranium Resource Evaluation program. The data are presorted by Savannah River Laboratory into stream sediment, ground water, and stream water for each 1 0 x 2 0 quadrangle. Extraneous information is eliminated, and missing analyses are assigned a specific value (-99999.0). Negative analyses are below the detection limit; the absolute value of a negative analysis is assumed to be the detection limit

  10. Application of Intel Many Integrated Core (MIC) architecture to the Yonsei University planetary boundary layer scheme in Weather Research and Forecasting model

    Science.gov (United States)

    Huang, Melin; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Weather Research and Forecasting (WRF) model provided operational services worldwide in many areas and has linked to our daily activity, in particular during severe weather events. The scheme of Yonsei University (YSU) is one of planetary boundary layer (PBL) models in WRF. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transports in the whole atmospheric column, determines the flux profiles within the well-mixed boundary layer and the stable layer, and thus provide atmospheric tendencies of temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. The YSU scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. To accelerate the computation process of the YSU scheme, we employ Intel Many Integrated Core (MIC) Architecture as it is a multiprocessor computer structure with merits of efficient parallelization and vectorization essentials. Our results show that the MIC-based optimization improved the performance of the first version of multi-threaded code on Xeon Phi 5110P by a factor of 2.4x. Furthermore, the same CPU-based optimizations improved the performance on Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of multi-threaded code.

  11. GKS-EZ programming manual for FORTRAN-77

    Energy Technology Data Exchange (ETDEWEB)

    Beach, R.C.

    1992-01-01

    A standard has now been adopted for subroutine packages that drive graphic devices. It is known as the Graphical Kernel system (GKS), and many commercial implementations of it are available. Unfortunately, it is a difficult system to learn, and certain functions that are important for scientific use are not provided. Although GKS can be used to achieve portability of graphic applications between graphic devices, computers, and operating systems, it can also be misused in this respect. In addition, it introduces the very real problem of portability between the various implementations of GKS. This document describes a set of FORTRAN-77 subroutines that may be used to control a wide variety of graphic devices and overcome most of these problems. Some of these subroutines are from GKS itself, while others are higher-level subroutines that call GKS subroutines. These subroutines are collectively known as GKS-EZ. The purpose is to supply someone who is not a specialist in computer graphics with a flexible, robust, and easy to learn graphics system. Users of GKS-EZ should not have much need for a full GKS manual; this document will supply all of the information to use GKS-EZ except for a few items. These missing items include the numeric identification of the supported graphic devices and the procedure for linking the GKS subroutines into a executable module.

  12. Compiled MPI: Cost-Effective Exascale Applications Development

    Energy Technology Data Exchange (ETDEWEB)

    Bronevetsky, G; Quinlan, D; Lumsdaine, A; Hoefler, T

    2012-04-10

    The complexity of petascale and exascale machines makes it increasingly difficult to develop applications that can take advantage of them. Future systems are expected to feature billion-way parallelism, complex heterogeneous compute nodes and poor availability of memory (Peter Kogge, 2008). This new challenge for application development is motivating a significant amount of research and development on new programming models and runtime systems designed to simplify large-scale application development. Unfortunately, DoE has significant multi-decadal investment in a large family of mission-critical scientific applications. Scaling these applications to exascale machines will require a significant investment that will dwarf the costs of hardware procurement. A key reason for the difficulty in transitioning today's applications to exascale hardware is their reliance on explicit programming techniques, such as the Message Passing Interface (MPI) programming model to enable parallelism. MPI provides a portable and high performance message-passing system that enables scalable performance on a wide variety of platforms. However, it also forces developers to lock the details of parallelization together with application logic, making it very difficult to adapt the application to significant changes in the underlying system. Further, MPI's explicit interface makes it difficult to separate the application's synchronization and communication structure, reducing the amount of support that can be provided by compiler and run-time tools. This is in contrast to the recent research on more implicit parallel programming models such as Chapel, OpenMP and OpenCL, which promise to provide significantly more flexibility at the cost of reimplementing significant portions of the application. We are developing CoMPI, a novel compiler-driven approach to enable existing MPI applications to scale to exascale systems with minimal modifications that can be made incrementally over

  13. Java application for the superposition T-matrix code to study the optical properties of cosmic dust aggregates

    Science.gov (United States)

    Halder, P.; Chakraborty, A.; Deb Roy, P.; Das, H. S.

    2014-09-01

    , including test data, etc.: 120226886 Distribution format: tar.gz Programming language: Java, Fortran95. Computer: Any Windows or Linux systems capable of hosting a java runtime environment, java3D and fortran95 compiler; Developed on 2.40 GHz Intel Core i3. Operating system: Any Windows or Linux systems capable of hosting a java runtime environment, java3D and fortran95 compiler. RAM: Ranging from a few Mbytes to several Gbytes, depending on the input parameters. Classification: 1.3. External routines: jfreechart-1.0.14 [1] (free plotting library for java), j3d-jre-1.5.2 [2] (3D visualization). Nature of problem: Optical properties of cosmic dust aggregates. Solution method: Java application based on Mackowski and Mischenko's Superposition T-Matrix code. Restrictions: The program is designed for single processor systems. Additional comments: The distribution file for this program is over 120 Mbytes and therefore is not delivered directly when Download or Email is requested. Instead a html file giving details of how the program can be obtained is sent. Running time: Ranging from few minutes to several hours, depending on the input parameters. References: [1] http://www.jfree.org/index.html [2] https://java3d.java.net/

  14. Performance optimization of Qbox and WEST on Intel Knights Landing

    Science.gov (United States)

    Zheng, Huihuo; Knight, Christopher; Galli, Giulia; Govoni, Marco; Gygi, Francois

    We present the optimization of electronic structure codes Qbox and WEST targeting the Intel®Xeon Phi™processor, codenamed Knights Landing (KNL). Qbox is an ab-initio molecular dynamics code based on plane wave density functional theory (DFT) and WEST is a post-DFT code for excited state calculations within many-body perturbation theory. Both Qbox and WEST employ highly scalable algorithms which enable accurate large-scale electronic structure calculations on leadership class supercomputer platforms beyond 100,000 cores, such as Mira and Theta at the Argonne Leadership Computing Facility. In this work, features of the KNL architecture (e.g. hierarchical memory) are explored to achieve higher performance in key algorithms of the Qbox and WEST codes and to develop a road-map for further development targeting next-generation computing architectures. In particular, the optimizations of the Qbox and WEST codes on the KNL platform will target efficient large-scale electronic structure calculations of nanostructured materials exhibiting complex structures and prediction of their electronic and thermal properties for use in solar and thermal energy conversion device. This work was supported by MICCoM, as part of Comp. Mats. Sci. Program funded by the U.S. DOE, Office of Sci., BES, MSE Division. This research used resources of the ALCF, which is a DOE Office of Sci. User Facility under Contract DE-AC02-06CH11357.

  15. Solidify, An LLVM pass to compile LLVM IR into Solidity

    Energy Technology Data Exchange (ETDEWEB)

    2017-07-12

    The software currently compiles LLVM IR into Solidity (Ethereum’s dominant programming language) using LLVM’s pass library. Specifically, his compiler allows us to convert an arbitrary DSL into Solidity. We focus specifically on converting Domain Specific Languages into Solidity due to their ease of use, and provable properties. By creating a toolchain to compile lightweight domain-specific languages into Ethereum's dominant language, Solidity, we allow non-specialists to effectively develop safe and useful smart contracts. For example lawyers from a certain firm can have a proprietary DSL that codifies basic laws safely converted to Solidity to be securely executed on the blockchain. In another example, a simple provenance tracking language can be compiled and securely executed on the blockchain.

  16. Compilation of current high-energy-physics experiments

    International Nuclear Information System (INIS)

    Wohl, C.G.; Kelly, R.L.; Armstrong, F.E.

    1980-04-01

    This is the third edition of a compilation of current high energy physics experiments. It is a collaborative effort of the Berkeley Particle Data Group, the SLAC library, and ten participating laboratories: Argonne (ANL), Brookhaven (BNL), CERN, DESY, Fermilab (FNAL), the Institute for Nuclear Study, Tokyo (INS), KEK, Rutherford (RHEL), Serpukhov (SERP), and SLAC. The compilation includes summaries of all high energy physics experiments at the above laboratories that (1) were approved (and not subsequently withdrawn) before about January 1980, and (2) had not completed taking of data by 1 January 1976

  17. Peregrine Software Toolchains | High-Performance Computing | NREL

    Science.gov (United States)

    Group (PGI) C/C++ and Fortran (partially supported) The PGI Accelerator compilers include NVIDIA GPU support via the directive-based OpenACC 2.5 programming model, as well as full support for NVIDIA CUDA C

  18. A Compilation of Internship Reports - 2012

    Energy Technology Data Exchange (ETDEWEB)

    Stegman M.; Morris, M.; Blackburn, N.

    2012-08-08

    This compilation documents all research project undertaken by the 2012 summer Department of Energy - Workforce Development for Teachers and Scientists interns during their internship program at Brookhaven National Laboratory.

  19. Semantics-Based Compiling: A Case Study in Type-Directed Partial Evaluation

    DEFF Research Database (Denmark)

    Danvy, Olivier; Vestergaard, René

    1996-01-01

    in the style of denotational semantics; – the output of the generated compiler is effectively three-address code, in the fashion and efficiency of the Dragon Book; – the generated compiler processes several hundred lines of source code per second. The source language considered in this case study is imperative......, block-structured, higher-order, call-by-value, allows subtyping, and obeys stack discipline. It is bigger than what is usually reported in the literature on semantics-based compiling and partial evaluation. Our compiling technique uses the first Futamura projection, i.e., we compile programs...... by specializing a definitional interpreter with respect to the program. Specialization is carried out using type-directed partial evaluation, which is a mild version of partial evaluation akin to lambda-calculus normalization. Our definitional interpreter follows the format of denotational semantics, with a clear...

  20. Compilation of results 1987

    International Nuclear Information System (INIS)

    1987-01-01

    A compilation is carried out which in concentrated form presents reports on research and development within the nuclear energy field covering a two and a half years period. The foregoing report was edited in December 1984. The projects are presendted with title, project number, responsible unit, person to contact and short result reports. The result reports consist of short summaries over each project. (L.F.)

  1. Controlling Laboratory Processes From A Personal Computer

    Science.gov (United States)

    Will, H.; Mackin, M. A.

    1991-01-01

    Computer program provides natural-language process control from IBM PC or compatible computer. Sets up process-control system that either runs without operator or run by workers who have limited programming skills. Includes three smaller programs. Two of them, written in FORTRAN 77, record data and control research processes. Third program, written in Pascal, generates FORTRAN subroutines used by other two programs to identify user commands with device-driving routines written by user. Also includes set of input data allowing user to define user commands to be executed by computer. Requires personal computer operating under MS-DOS with suitable hardware interfaces to all controlled devices. Also requires FORTRAN 77 compiler and device drivers written by user.

  2. Compiler-Assisted Multiple Instruction Rollback Recovery Using a Read Buffer. Ph.D. Thesis

    Science.gov (United States)

    Alewine, Neal Jon

    1993-01-01

    Multiple instruction rollback (MIR) is a technique to provide rapid recovery from transient processor failures and was implemented in hardware by researchers and slow in mainframe computers. Hardware-based MIR designs eliminate rollback data hazards by providing data redundancy implemented in hardware. Compiler-based MIR designs were also developed which remove rollback data hazards directly with data flow manipulations, thus eliminating the need for most data redundancy hardware. Compiler-assisted techniques to achieve multiple instruction rollback recovery are addressed. It is observed that data some hazards resulting from instruction rollback can be resolved more efficiently by providing hardware redundancy while others are resolved more efficiently with compiler transformations. A compiler-assisted multiple instruction rollback scheme is developed which combines hardware-implemented data redundancy with compiler-driven hazard removal transformations. Experimental performance evaluations were conducted which indicate improved efficiency over previous hardware-based and compiler-based schemes. Various enhancements to the compiler transformations and to the data redundancy hardware developed for the compiler-assisted MIR scheme are described and evaluated. The final topic deals with the application of compiler-assisted MIR techniques to aid in exception repair and branch repair in a speculative execution architecture.

  3. Fringe pattern demodulation using the one-dimensional continuous wavelet transform: field-programmable gate array implementation.

    Science.gov (United States)

    Abid, Abdulbasit

    2013-03-01

    This paper presents a thorough discussion of the proposed field-programmable gate array (FPGA) implementation for fringe pattern demodulation using the one-dimensional continuous wavelet transform (1D-CWT) algorithm. This algorithm is also known as wavelet transform profilometry. Initially, the 1D-CWT is programmed using the C programming language and compiled into VHDL using the ImpulseC tool. This VHDL code is implemented on the Altera Cyclone IV GX EP4CGX150DF31C7 FPGA. A fringe pattern image with a size of 512×512 pixels is presented to the FPGA, which processes the image using the 1D-CWT algorithm. The FPGA requires approximately 100 ms to process the image and produce a wrapped phase map. For performance comparison purposes, the 1D-CWT algorithm is programmed using the C language. The C code is then compiled using the Intel compiler version 13.0. The compiled code is run on a Dell Precision state-of-the-art workstation. The time required to process the fringe pattern image is approximately 1 s. In order to further reduce the execution time, the 1D-CWT is reprogramed using Intel Integrated Primitive Performance (IPP) Library Version 7.1. The execution time was reduced to approximately 650 ms. This confirms that at least sixfold speedup was gained using FPGA implementation over a state-of-the-art workstation that executes heavily optimized implementation of the 1D-CWT algorithm.

  4. Compilations and evaluations of nuclear structure and decay date

    International Nuclear Information System (INIS)

    Lorenz, A.

    The material contained in this compilation is sorted according to eight subject categories: 1. General Compilations; 2. Basic Isotopic Properties; 3. Nuclear Structure Properties; 4. Nuclear Decay Processes: Half-lives, Energies and Spectra; 5. Nuclear Decay Processes: Gamma-rays; 6. Nuclear Decay Processes: Fission Products; 7. Nuclear Decay Processes: (Others); 8. Atomic Processes

  5. LIONS: a new set of Fortran90 codes for the SPIRAL project at GANIL

    International Nuclear Information System (INIS)

    Bertrand, P.

    1995-01-01

    In this paper a set of new computer programs developed at GANIL is presented. These codes are used to study different parts of the SPIRAL project, in particular the dynamics in the CIME cyclotron and the new extraction system of the ECR ion sources. Three important modules are described: CHA3D for the evaluation of 3D electric fields with or without space charge effects, LIONS for the motion of ions and EXTRACT for the ECRIS extraction. These modules are written in Fortran90 in a ''data parallel scheme''. They work either on UNIX workstations or parallel and vectorial computers. (orig.)

  6. Writing parallel programs that work

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on auto-parallelizing compiler (KAP), and was involved in th...

  7. Comparison of Software Technologies for Vectorization and Parallelization

    CERN Document Server

    Lazzaro, Alfio; Nowak, Andrzej; Valsan, Liviu

    2012-01-01

    This paper demonstrates how modern software development methodologies can be used to give an existing sequential application a considerable performance speed-up on modern x86 server systems. Whereas, in the past, speed-up was directly linked to the increase in clock frequency when moving to a more modern system, current x86 servers present a plethora of “performance dimensions” that need to be harnessed with great care. The application we used is a real-life data analysis example in C++ analyzing High Energy Physics data. The key software methods used are OpenMP, Intel Threading Building Blocks (TBB), Intel Cilk Plus, and the auto-vectorization capability of the Intel compiler (Composer XE). Somewhat surprisingly, the Message Passing Interface (MPI) is successfully added, although our focus is on single-node rather than multi-node performance optimization. The paper underlines the importance of algorithmic redesign in order to optimize each performance dimension and links this to close control of the memo...

  8. 1988 Bulletin compilation and index

    International Nuclear Information System (INIS)

    1989-02-01

    This document is published to provide current information about the national program for managing spent fuel and high-level radioactive waste. This document is a compilation of issues from the 1988 calendar year. A table of contents and one index have been provided to assist in finding information

  9. 1988 Bulletin compilation and index

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1989-02-01

    This document is published to provide current information about the national program for managing spent fuel and high-level radioactive waste. This document is a compilation of issues from the 1988 calendar year. A table of contents and one index have been provided to assist in finding information.

  10. Automating Visualization Service Generation with the WATT Compiler

    Science.gov (United States)

    Bollig, E. F.; Lyness, M. D.; Erlebacher, G.; Yuen, D. A.

    2007-12-01

    As tasks and workflows become increasingly complex, software developers are devoting increasing attention to automation tools. Among many examples, the Automator tool from Apple collects components of a workflow into a single script, with very little effort on the part of the user. Tasks are most often described as a series of instructions. The granularity of the tasks dictates the tools to use. Compilers translate fine-grained instructions to assembler code, while scripting languages (ruby, perl) are used to describe a series of tasks at a higher level. Compilers can also be viewed as transformational tools: a cross-compiler can translate executable code written on one computer to assembler code understood on another, while transformational tools can translate from one high-level language to another. We are interested in creating visualization web services automatically, starting from stand-alone VTK (Visualization Toolkit) code written in Tcl. To this end, using the OCaml programming language, we have developed a compiler that translates Tcl into C++, including all the stubs, classes and methods to interface with gSOAP, a C++ implementation of the Soap 1.1/1.2 protocols. This compiler, referred to as the Web Automation and Translation Toolkit (WATT), is the first step towards automated creation of specialized visualization web services without input from the user. The WATT compiler seeks to automate all aspects of web service generation, including the transport layer, the division of labor and the details related to interface generation. The WATT compiler is part of ongoing efforts within the NSF funded VLab consortium [1] to facilitate and automate time-consuming tasks for the science related to understanding planetary materials. Through examples of services produced by WATT for the VLab portal, we will illustrate features, limitations and the improvements necessary to achieve the ultimate goal of complete and transparent automation in the generation of web

  11. Compilation of current high energy physics experiments - Sept. 1978

    Energy Technology Data Exchange (ETDEWEB)

    Addis, L.; Odian, A.; Row, G. M.; Ward, C. E. W.; Wanderer, P.; Armenteros, R.; Joos, P.; Groves, T. H.; Oyanagi, Y.; Arnison, G. T. J.; Antipov, Yu; Barinov, N.

    1978-09-01

    This compilation of current high-energy physics experiments is a collaborative effort of the Berkeley Particle Data Group, the SLAC library, and the nine participating laboratories: Argonne (ANL), Brookhaven (BNL), CERN, DESY, Fermilab (FNAL), KEK, Rutherford (RHEL), Serpukhov (SERP), and SLAC. Nominally, the compilation includes summaries of all high-energy physics experiments at the above laboratories that were approved (and not subsequently withdrawn) before about June 1978, and had not completed taking of data by 1 January 1975. The experimental summaries are supplemented with three indexes to the compilation, several vocabulary lists giving names or abbreviations used, and a short summary of the beams at each of the laboratories (except Rutherford). The summaries themselves are included on microfiche. (RWR)

  12. DrawCompileEvolve: Sparking interactive evolutionary art with human creations

    DEFF Research Database (Denmark)

    Zhang, Jinhong; Taarnby, Rasmus; Liapis, Antonios

    2015-01-01

    This paper presents DrawCompileEvolve, a web-based drawing tool which allows users to draw simple primitive shapes, group them together or define patterns in their groupings (e.g. symmetry, repetition). The user’s vector drawing is then compiled into an indirectly encoded genetic representation......, which can be evolved interactively, allowing the user to change the image’s colors, patterns and ultimately transform it. The human artist has direct control while drawing the initial seed of an evolutionary run and indirect control while interactively evolving it, thus making DrawCompileEvolve a mixed...

  13. Vectorization vs. compilation in query execution

    NARCIS (Netherlands)

    J. Sompolski (Juliusz); M. Zukowski (Marcin); P.A. Boncz (Peter)

    2011-01-01

    textabstractCompiling database queries into executable (sub-) programs provides substantial benefits comparing to traditional interpreted execution. Many of these benefits, such as reduced interpretation overhead, better instruction code locality, and providing opportunities to use SIMD

  14. ptchg: A FORTRAN program for point-charge calculations of electric field gradients (EFGs)

    Science.gov (United States)

    Spearing, Dane R.

    1994-05-01

    ptchg, a FORTRAN program, has been developed to calculate electric field gradients (EFG) around an atomic site in crystalline solids using the point-charge direct-lattice summation method. It uses output from the crystal structure generation program Atoms as its input. As an application of ptchg, a point-charge calculation of the EFG quadrupolar parameters around the oxygen site in SiO 2 cristobalite is demonstrated. Although point-charge calculations of electric field gradients generally are limited to ionic compounds, the computed quadrupolar parameters around the oxygen site in SiO 2 cristobalite, a highly covalent material, are in good agreement with the experimentally determined values from nuclear magnetic resonance (NMR) spectroscopy.

  15. Production compilation : A simple mechanism to model complex skill acquisition

    NARCIS (Netherlands)

    Taatgen, N.A.; Lee, F.J.

    2003-01-01

    In this article we describe production compilation, a mechanism for modeling skill acquisition. Production compilation has been developed within the ACT-Rational (ACT-R; J. R. Anderson, D. Bothell, M. D. Byrne, & C. Lebiere, 2002) cognitive architecture and consists of combining and specializing

  16. 32 CFR 806b.19 - Information compiled in anticipation of civil action.

    Science.gov (United States)

    2010-07-01

    ... 32 National Defense 6 2010-07-01 2010-07-01 false Information compiled in anticipation of civil action. 806b.19 Section 806b.19 National Defense Department of Defense (Continued) DEPARTMENT OF THE AIR... compiled in anticipation of civil action. Withhold records compiled in connection with a civil action or...

  17. Charged particle induced thermonuclear reaction rates: a compilation for astrophysics

    International Nuclear Information System (INIS)

    Grama, C.

    1999-01-01

    We report on the results of the European network NACRE (Nuclear Astrophysics Compilation of REaction rates). The principal reason for setting up the NACRE network has been the necessity of building up a well-documented and detailed compilation of rates for charged-particle induced reactions on stable targets up to Si and on unstable nuclei of special significance in astrophysics. This work is meant to supersede the only existing compilation of reaction rates issued by Fowler and collaborators. The main goal of NACRE network was the transparency in the procedure of calculating the rates. More specifically this compilation aims at: 1. updating the experimental and theoretical data; 2. distinctly identifying the sources of the data used in rate calculation; 3. evaluating the uncertainties and errors; 4. providing numerically integrated reaction rates; 5. providing reverse reaction rates and analytical approximations of the adopted rates. The cross section data and/or resonance parameters for a total of 86 charged-particle induced reactions are given and the corresponding reaction rates are calculated and given in tabular form. Uncertainties are analyzed and realistic upper and lower bounds of the rates are determined. The compilation is concerned with the reaction rates that are large enough for the target lifetimes shorter than the age of the Universe, taken equal to 15 x 10 9 y. The reaction rates are provided for temperatures lower than T = 10 10 K. In parallel with the rate compilation a cross section data base has been created and located at the site http://pntpm.ulb.ac.be/nacre..htm. (authors)

  18. Asian collaboration on nuclear reaction data compilation

    International Nuclear Information System (INIS)

    Aikawa, Masayuki; Furutachi, Naoya; Kato, Kiyoshi; Makinaga, Ayano; Devi, Vidya; Ichinkhorloo, Dagvadorj; Odsuren, Myagmarjav; Tsubakihara, Kohsuke; Katayama, Toshiyuki; Otuka, Naohiko

    2013-01-01

    Nuclear reaction data are essential for research and development in nuclear engineering, radiation therapy, nuclear physics and astrophysics. Experimental data must be compiled in a database and be accessible to nuclear data users. One of the nuclear reaction databases is the EXFOR database maintained by the International Network of Nuclear Reaction Data Centres (NRDC) under the auspices of the International Atomic Energy Agency. Recently, collaboration among the Asian NRDC members is being further developed under the support of the Asia-Africa Science Platform Program of the Japan Society for the Promotion of Science. We report the activity for three years to develop the Asian collaboration on nuclear reaction data compilation. (author)

  19. Fault-tolerant digital microfluidic biochips compilation and synthesis

    CERN Document Server

    Pop, Paul; Stuart, Elena; Madsen, Jan

    2016-01-01

    This book describes for researchers in the fields of compiler technology, design and test, and electronic design automation the new area of digital microfluidic biochips (DMBs), and thus offers a new application area for their methods.  The authors present a routing-based model of operation execution, along with several associated compilation approaches, which progressively relax the assumption that operations execute inside fixed rectangular modules.  Since operations can experience transient faults during the execution of a bioassay, the authors show how to use both offline (design time) and online (runtime) recovery strategies. The book also presents methods for the synthesis of fault-tolerant application-specific DMB architectures. ·         Presents the current models used for the research on compilation and synthesis techniques of DMBs in a tutorial fashion; ·         Includes a set of “benchmarks”, which are presented in great detail and includes the source code of most of the t...

  20. Perbandingan Bubble Sort dengan Insertion Sort pada Bahasa Pemrograman C dan Fortran

    Directory of Open Access Journals (Sweden)

    Reina Reina

    2013-12-01

    Full Text Available Sorting is a basic algorithm studied by students of computer science major. Sorting algorithm is the basis of other algorithms such as searching algorithm, pattern matching algorithm. Bubble sort is a popular basic sorting algorithm due to its easiness to be implemented. Besides bubble sort, there is insertion sort. It is lesspopular than bubble sort because it has more difficult algorithm. This paper discusses about process time between insertion sort and bubble sort with two kinds of data. First is randomized data, and the second is data of descending list. Comparison of process time has been done in two kinds of programming language that is C programming language and FORTRAN programming language. The result shows that bubble sort needs more time than insertion sort does.

  1. A survey of compiler development aids. [concerning lexical, syntax, and semantic analysis

    Science.gov (United States)

    Buckles, B. P.; Hodges, B. C.; Hsia, P.

    1977-01-01

    A theoretical background was established for the compilation process by dividing it into five phases and explaining the concepts and algorithms that underpin each. The five selected phases were lexical analysis, syntax analysis, semantic analysis, optimization, and code generation. Graph theoretical optimization techniques were presented, and approaches to code generation were described for both one-pass and multipass compilation environments. Following the initial tutorial sections, more than 20 tools that were developed to aid in the process of writing compilers were surveyed. Eight of the more recent compiler development aids were selected for special attention - SIMCMP/STAGE2, LANG-PAK, COGENT, XPL, AED, CWIC, LIS, and JOCIT. The impact of compiler development aids were assessed some of their shortcomings and some of the areas of research currently in progress were inspected.

  2. SPECFUN1, Portable Special FORTRAN Routines with Test Drivers

    International Nuclear Information System (INIS)

    Cody, W.J.

    1993-01-01

    1 - Description of program or function: SPECFUN is a collection of transportable FORTRAN subroutines and test drivers to evaluate certain special functions. The individual subroutines are - Name/Description: ALGAMA: Log gamma function, DAW: Dawson's integral, EI: Exponential integrals, ERF: Error function, ERFC: Complementary error function, GAMMA: Gamma function, I0: Bessel function I-sub-0, I1: Bessel function I-sub-1, J0Y0: Bessel functions J-sub-0 and Y-sub-0, J1Y1: Bessel functions J-sub-1 and Y-sub-1, K0: Bessel function K-sub-0, K1: Bessel function K-sub-1, PSI: Logarithmic derivative of the gamma function, REN: Random number generator, RIBESL: Bessel function I-sub-(N,ALPHA), RJBESL: Bessel function J-sub-(N,ALPHA), RKBESL: Bessel function K-sub-(N,ALPHA), RYBESL: Bessel function Y-sub-(N,ALPHA), MACHAR: Machine-dependent constants. 2 - Method of solution: SPECFUN generally uses rational mini-max approximations for functions of one variable and recurrence relations for functions of two or more variables. 3 - Restrictions on the complexity of the problem: Accuracy is targeted at between 18 and 20 significant decimal digits

  3. SMMP v. 3.0—Simulating proteins and protein interactions in Python and Fortran

    Science.gov (United States)

    Meinke, Jan H.; Mohanty, Sandipan; Eisenmenger, Frank; Hansmann, Ulrich H. E.

    2008-03-01

    We describe a revised and updated version of the program package SMMP. SMMP is an open-source FORTRAN package for molecular simulation of proteins within the standard geometry model. It is designed as a simple and inexpensive tool for researchers and students to become familiar with protein simulation techniques. SMMP 3.0 sports a revised API increasing its flexibility, an implementation of the Lund force field, multi-molecule simulations, a parallel implementation of the energy function, Python bindings, and more. Program summaryTitle of program:SMMP Catalogue identifier:ADOJ_v3_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADOJ_v3_0.html Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Licensing provisions:Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html Programming language used:FORTRAN, Python No. of lines in distributed program, including test data, etc.:52 105 No. of bytes in distributed program, including test data, etc.:599 150 Distribution format:tar.gz Computer:Platform independent Operating system:OS independent RAM:2 Mbytes Classification:3 Does the new version supersede the previous version?:Yes Nature of problem:Molecular mechanics computations and Monte Carlo simulation of proteins. Solution method:Utilizes ECEPP2/3, FLEX, and Lund potentials. Includes Monte Carlo simulation algorithms for canonical, as well as for generalized ensembles. Reasons for new version:API changes and increased functionality. Summary of revisions:Added Lund potential; parameters used in subroutines are now passed as arguments; multi-molecule simulations; parallelized energy calculation for ECEPP; Python bindings. Restrictions:The consumed CPU time increases with the size of protein molecule. Running time:Depends on the size of the simulated molecule.

  4. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    Science.gov (United States)

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  5. abc: The AspectBench Compiler for AspectJ

    DEFF Research Database (Denmark)

    Allan, Chris; Avgustinov, Pavel; Christensen, Aske Simon

    2005-01-01

    abc is an extensible, optimising compiler for AspectJ. It has been designed as a workbench for experimental research in aspect-oriented programming languages and compilers. We outline a programme of research in these areas, and we review how abc can help in achieving those research goals...

  6. Program pseudo-random number generator for microcomputers

    International Nuclear Information System (INIS)

    Ososkov, G.A.

    1980-01-01

    Program pseudo-random number generators (PNG) intended for the test of control equipment and communication channels are considered. In the case of 8-bit microcomputers it is necessary to assign 4 words of storage to allocate one random number. The proposed economical algorithms of the random number generation are based on the idea of the ''mixing'' of such quarters of the preceeding random number to obtain the next one. Test results of the PNG are displayed for two such generators. A FORTRAN variant of the PNG is presented along with a program realizing the PNG made on the base of the INTEL-8080 autocode

  7. Improved neural network modeling of inverse lens distortion

    CSIR Research Space (South Africa)

    De Villiers, JP

    2011-04-01

    Full Text Available . This allows points which are too far o the optical axis to fall on the CCD, to not be mapped erroneously onto the image. 5. RESULTS All timings in Tables 1, 2 and 3 were performed on the same computer. It had an Intel i7-960 CPU, and an NVidia GTX480 GPUs... in pixels for each ANN was averaged and is presented along with the RMS error of the best ANN, the timing is openMP (www.openMP.org) is API for multi-platform shared-memory parallel programming in C/C++ and Fortran. yCUDA (www.nvidia.com/object/cuda home...

  8. The Katydid system for compiling KEE applications to Ada

    Science.gov (United States)

    Filman, Robert E.; Bock, Conrad; Feldman, Roy

    1990-01-01

    Components of a system known as Katydid are developed in an effort to compile knowledge-based systems developed in a multimechanism integrated environment (KEE) to Ada. The Katydid core is an Ada library supporting KEE object functionality, and the other elements include a rule compiler, a LISP-to-Ada translator, and a knowledge-base dumper. Katydid employs translation mechanisms that convert LISP knowledge structures and rules to Ada and utilizes basic prototypes of a run-time KEE object-structure library module for Ada. Preliminary results include the semiautomatic compilation of portions of a simple expert system to run in an Ada environment with the described algorithms. It is suggested that Ada can be employed for AI programming and implementation, and the Katydid system is being developed to include concurrency and synchronization mechanisms.

  9. Proceedings of the workshop on Compilation of (Symbolic) Languages for Parallel Computers

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.; Tick, E. (comp.)

    1991-11-01

    This report comprises the abstracts and papers for the talks presented at the Workshop on Compilation of (Symbolic) Languages for Parallel Computers, held October 31--November 1, 1991, in San Diego. These unreferred contributions were provided by the participants for the purpose of this workshop; many of them will be published elsewhere in peer-reviewed conferences and publications. Our goal is planning this workshop was to bring together researchers from different disciplines with common problems in compilation. In particular, we wished to encourage interaction between researchers working in compilation of symbolic languages and those working on compilation of conventional, imperative languages. The fundamental problems facing researchers interested in compilation of logic, functional, and procedural programming languages for parallel computers are essentially the same. However, differences in the basic programming paradigms have led to different communities emphasizing different species of the parallel compilation problem. For example, parallel logic and functional languages provide dataflow-like formalisms in which control dependencies are unimportant. Hence, a major focus of research in compilation has been on techniques that try to infer when sequential control flow can safely be imposed. Granularity analysis for scheduling is a related problem. The single- assignment property leads to a need for analysis of memory use in order to detect opportunities for reuse. Much of the work in each of these areas relies on the use of abstract interpretation techniques.

  10. MATH77 - A LIBRARY OF MATHEMATICAL SUBPROGRAMS FOR FORTRAN 77, RELEASE 4.0

    Science.gov (United States)

    Lawson, C. L.

    1994-01-01

    MATH77 is a high quality library of ANSI FORTRAN 77 subprograms implementing contemporary algorithms for the basic computational processes of science and engineering. The portability of MATH77 meets the needs of present-day scientists and engineers who typically use a variety of computing environments. Release 4.0 of MATH77 contains 454 user-callable and 136 lower-level subprograms. Usage of the user-callable subprograms is described in 69 sections of the 416 page users' manual. The topics covered by MATH77 are indicated by the following list of chapter titles in the users' manual: Mathematical Functions, Pseudo-random Number Generation, Linear Systems of Equations and Linear Least Squares, Matrix Eigenvalues and Eigenvectors, Matrix Vector Utilities, Nonlinear Equation Solving, Curve Fitting, Table Look-Up and Interpolation, Definite Integrals (Quadrature), Ordinary Differential Equations, Minimization, Polynomial Rootfinding, Finite Fourier Transforms, Special Arithmetic , Sorting, Library Utilities, Character-based Graphics, and Statistics. Besides subprograms that are adaptations of public domain software, MATH77 contains a number of unique packages developed by the authors of MATH77. Instances of the latter type include (1) adaptive quadrature, allowing for exceptional generality in multidimensional cases, (2) the ordinary differential equations solver used in spacecraft trajectory computation for JPL missions, (3) univariate and multivariate table look-up and interpolation, allowing for "ragged" tables, and providing error estimates, and (4) univariate and multivariate derivative-propagation arithmetic. MATH77 release 4.0 is a subroutine library which has been carefully designed to be usable on any computer system that supports the full ANSI standard FORTRAN 77 language. It has been successfully implemented on a CRAY Y/MP computer running UNICOS, a UNISYS 1100 computer running EXEC 8, a DEC VAX series computer running VMS, a Sun4 series computer running Sun

  11. Analysis of computer programming languages

    International Nuclear Information System (INIS)

    Risset, Claude Alain

    1967-01-01

    This research thesis aims at trying to identify some methods of syntax analysis which can be used for computer programming languages while putting aside computer devices which influence the choice of the programming language and methods of analysis and compilation. In a first part, the author proposes attempts of formalization of Chomsky grammar languages. In a second part, he studies analytical grammars, and then studies a compiler or analytic grammar for the Fortran language

  12. DLVM: A modern compiler infrastructure for deep learning systems

    OpenAIRE

    Wei, Richard; Schwartz, Lane; Adve, Vikram

    2017-01-01

    Deep learning software demands reliability and performance. However, many of the existing deep learning frameworks are software libraries that act as an unsafe DSL in Python and a computation graph interpreter. We present DLVM, a design and implementation of a compiler infrastructure with a linear algebra intermediate representation, algorithmic differentiation by adjoint code generation, domain-specific optimizations and a code generator targeting GPU via LLVM. Designed as a modern compiler ...

  13. Parallelization of MCNP4 code by using simple FORTRAN algorithms

    International Nuclear Information System (INIS)

    Yazid, P.I.; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka.

    1993-12-01

    Simple FORTRAN algorithms, that rely only on open, close, read and write statements, together with disk files and some UNIX commands have been applied to parallelization of MCNP4. The code, named MCNPNFS, maintains almost all capabilities of MCNP4 in solving shielding problems. It is able to perform parallel computing on a set of any UNIX workstations connected by a network, regardless of the heterogeneity in hardware system, provided that all processors produce a binary file in the same format. Further, it is confirmed that MCNPNFS can be executed also on Monte-4 vector-parallel computer. MCNPNFS has been tested intensively by executing 5 photon-neutron benchmark problems, a spent fuel cask problem and 17 sample problems included in the original code package of MCNP4. Three different workstations, connected by a network, have been used to execute MCNPNFS in parallel. By measuring CPU time, the parallel efficiency is determined to be 58% to 99% and 86% in average. On Monte-4, MCNPNFS has been executed using 4 processors concurrently and has achieved the parallel efficiency of 79% in average. (author)

  14. MONTEC, an interactive fortran program to simulate radiation dose and dose-rate responses of populations

    International Nuclear Information System (INIS)

    Perry, K.A.; Szekely, J.G.

    1983-09-01

    The computer program MONTEC was written to simulate the distribution of responses in a population whose members are exposed to multiple radiation doses at variable dose rates. These doses and dose rates are randomly selected from lognormal distributions. The individual radiation responses are calculated from three equations, which include dose and dose-rate terms. Other response-dose/rate relationships or distributions can be incorporated by the user as the need arises. The purpose of this documentation is to provide a complete operating manual for the program. This version is written in FORTRAN-10 for the DEC system PDP-10

  15. Cross-compilation of ATLAS online software to the power PC-Vx works system

    International Nuclear Information System (INIS)

    Tian Yuren; Li Jin; Ren Zhengyu; Zhu Kejun

    2005-01-01

    BES III, selected ATLAS online software as a framework of its run-control system. BES III applied Power PC-VxWorks system on its front-end readout system, so it is necessary to cross-compile this software to PowerPC-VxWorks system. The article demonstrates several aspects related to this project, such as the structure and organization of the ATLAS online software, the application of CMT tool while cross-compiling, the selection and configuration of the cross-compiler, methods to solve various problems due to the difference of compiler and operating system etc. The software, after cross-compiling, can normally run, and makes up a complete run-control system with the software running on Linux system. (authors)

  16. Transportation legislative data base: State radioactive materials transportation statute compilation, 1989--1993

    International Nuclear Information System (INIS)

    1994-04-01

    The Transportation Legislative Data Base (TLDB) is a computer-based information service containing summaries of federal, state and certain local government statutes and regulations relating to the transportation of radioactive materials in the United States. The TLDB has been operated by the National Conference of State Legislatures (NCSL) under cooperative agreement with the US Department of Energy's (DOE) Office of Civilian Radioactive Waste Management since 1992. The data base system serves the legislative and regulatory information needs of federal, state, tribal and local governments, the affected private sector and interested members of the general public. Users must be approved by DOE and NCSL. This report is a state statute compilation that updates the 1989 compilation produced by Battelle Memorial Institute, the previous manager of the data base. This compilation includes statutes not included in the prior compilation, as well as newly enacted laws. Statutes not included in the prior compilation show an enactment date prior to 1989. Statutes that deal with low-level radioactive waste transportation are included in the data base as are statutes from the states of Alaska and Hawaii. Over 155 new entries to the data base are summarized in this compilation

  17. A Language for Specifying Compiler Optimizations for Generic Software

    Energy Technology Data Exchange (ETDEWEB)

    Willcock, Jeremiah J. [Indiana Univ., Bloomington, IN (United States)

    2007-01-01

    Compiler optimization is important to software performance, and modern processor architectures make optimization even more critical. However, many modern software applications use libraries providing high levels of abstraction. Such libraries often hinder effective optimization — the libraries are difficult to analyze using current compiler technology. For example, high-level libraries often use dynamic memory allocation and indirectly expressed control structures, such as iteratorbased loops. Programs using these libraries often cannot achieve an optimal level of performance. On the other hand, software libraries have also been recognized as potentially aiding in program optimization. One proposed implementation of library-based optimization is to allow the library author, or a library user, to define custom analyses and optimizations. Only limited systems have been created to take advantage of this potential, however. One problem in creating a framework for defining new optimizations and analyses is how users are to specify them: implementing them by hand inside a compiler is difficult and prone to errors. Thus, a domain-specific language for librarybased compiler optimizations would be beneficial. Many optimization specification languages have appeared in the literature, but they tend to be either limited in power or unnecessarily difficult to use. Therefore, I have designed, implemented, and evaluated the Pavilion language for specifying program analyses and optimizations, designed for library authors and users. These analyses and optimizations can be based on the implementation of a particular library, its use in a specific program, or on the properties of a broad range of types, expressed through concepts. The new system is intended to provide a high level of expressiveness, even though the intended users are unlikely to be compiler experts.

  18. Compiling the parallel programming language NestStep to the CELL processor

    OpenAIRE

    Holm, Magnus

    2010-01-01

    The goal of this project is to create a source-to-source compiler which will translate NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model. It adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code will compil...

  19. DET/MPS - THE GSFC ENERGY BALANCE PROGRAM, DIRECT ENERGY TRANSFER/MULTIMISSION SPACECRAFT MODULAR POWER SYSTEM (UNIX VERSION)

    Science.gov (United States)

    Jagielski, J. M.

    1994-01-01

    The DET/MPS programs model and simulate the Direct Energy Transfer and Multimission Spacecraft Modular Power System in order to aid both in design and in analysis of orbital energy balance. Typically, the DET power system has the solar array directly to the spacecraft bus, and the central building block of MPS is the Standard Power Regulator Unit. DET/MPS allows a minute-by-minute simulation of the power system's performance as it responds to various orbital parameters, focusing its output on solar array output and battery characteristics. While this package is limited in terms of orbital mechanics, it is sufficient to calculate eclipse and solar array data for circular or non-circular orbits. DET/MPS can be adjusted to run one or sequential orbits up to about one week, simulated time. These programs have been used on a variety of Goddard Space Flight Center spacecraft projects. DET/MPS is written in FORTRAN 77 with some VAX-type extensions. Any FORTRAN 77 compiler that includes VAX extensions should be able to compile and run the program with little or no modifications. The compiler must at least support free-form (or tab-delineated) source format and 'do do-while end-do' control structures. DET/MPS is available for three platforms: GSC-13374, for DEC VAX series computers running VMS, is available in DEC VAX Backup format on a 9-track 1600 BPI tape (standard distribution) or TK50 tape cartridge; GSC-13443, for UNIX-based computers, is available on a .25 inch streaming magnetic tape cartridge in UNIX tar format; and GSC-13444, for Macintosh computers running AU/X with either the NKR FORTRAN or AbSoft MacFORTRAN II compilers, is available on a 3.5 inch 800K Macintosh format diskette. Source code and test data are supplied. The UNIX version of DET requires 90K of main memory for execution. DET/MPS was developed in 1990. A/UX and Macintosh are registered trademarks of Apple Computer, Inc. VMS, DEC VAX and TK50 are trademarks of Digital Equipment Corporation. UNIX is a

  20. DET/MPS - THE GSFC ENERGY BALANCE PROGRAM, DIRECT ENERGY TRANSFER/MULTIMISSION SPACECRAFT MODULAR POWER SYSTEM (MACINTOSH A/UX VERSION)

    Science.gov (United States)

    Jagielski, J. M.

    1994-01-01

    The DET/MPS programs model and simulate the Direct Energy Transfer and Multimission Spacecraft Modular Power System in order to aid both in design and in analysis of orbital energy balance. Typically, the DET power system has the solar array directly to the spacecraft bus, and the central building block of MPS is the Standard Power Regulator Unit. DET/MPS allows a minute-by-minute simulation of the power system's performance as it responds to various orbital parameters, focusing its output on solar array output and battery characteristics. While this package is limited in terms of orbital mechanics, it is sufficient to calculate eclipse and solar array data for circular or non-circular orbits. DET/MPS can be adjusted to run one or sequential orbits up to about one week, simulated time. These programs have been used on a variety of Goddard Space Flight Center spacecraft projects. DET/MPS is written in FORTRAN 77 with some VAX-type extensions. Any FORTRAN 77 compiler that includes VAX extensions should be able to compile and run the program with little or no modifications. The compiler must at least support free-form (or tab-delineated) source format and 'do do-while end-do' control structures. DET/MPS is available for three platforms: GSC-13374, for DEC VAX series computers running VMS, is available in DEC VAX Backup format on a 9-track 1600 BPI tape (standard distribution) or TK50 tape cartridge; GSC-13443, for UNIX-based computers, is available on a .25 inch streaming magnetic tape cartridge in UNIX tar format; and GSC-13444, for Macintosh computers running AU/X with either the NKR FORTRAN or AbSoft MacFORTRAN II compilers, is available on a 3.5 inch 800K Macintosh format diskette. Source code and test data are supplied. The UNIX version of DET requires 90K of main memory for execution. DET/MPS was developed in 1990. A/UX and Macintosh are registered trademarks of Apple Computer, Inc. VMS, DEC VAX and TK50 are trademarks of Digital Equipment Corporation. UNIX is a

  1. DET/MPS - THE GSFC ENERGY BALANCE PROGRAM, DIRECT ENERGY TRANSFER/MULTIMISSION SPACECRAFT MODULAR POWER SYSTEM (DEC VAX VMS VERSION)

    Science.gov (United States)

    Jagielski, J. M.

    1994-01-01

    The DET/MPS programs model and simulate the Direct Energy Transfer and Multimission Spacecraft Modular Power System in order to aid both in design and in analysis of orbital energy balance. Typically, the DET power system has the solar array directly to the spacecraft bus, and the central building block of MPS is the Standard Power Regulator Unit. DET/MPS allows a minute-by-minute simulation of the power system's performance as it responds to various orbital parameters, focusing its output on solar array output and battery characteristics. While this package is limited in terms of orbital mechanics, it is sufficient to calculate eclipse and solar array data for circular or non-circular orbits. DET/MPS can be adjusted to run one or sequential orbits up to about one week, simulated time. These programs have been used on a variety of Goddard Space Flight Center spacecraft projects. DET/MPS is written in FORTRAN 77 with some VAX-type extensions. Any FORTRAN 77 compiler that includes VAX extensions should be able to compile and run the program with little or no modifications. The compiler must at least support free-form (or tab-delineated) source format and 'do do-while end-do' control structures. DET/MPS is available for three platforms: GSC-13374, for DEC VAX series computers running VMS, is available in DEC VAX Backup format on a 9-track 1600 BPI tape (standard distribution) or TK50 tape cartridge; GSC-13443, for UNIX-based computers, is available on a .25 inch streaming magnetic tape cartridge in UNIX tar format; and GSC-13444, for Macintosh computers running AU/X with either the NKR FORTRAN or AbSoft MacFORTRAN II compilers, is available on a 3.5 inch 800K Macintosh format diskette. Source code and test data are supplied. The UNIX version of DET requires 90K of main memory for execution. DET/MPS was developed in 1990. A/UX and Macintosh are registered trademarks of Apple Computer, Inc. VMS, DEC VAX and TK50 are trademarks of Digital Equipment Corporation. UNIX is a

  2. Compilation of solar abundance data

    International Nuclear Information System (INIS)

    Hauge, Oe.; Engvold, O.

    1977-01-01

    Interest in the previous compilations of solar abundance data by the same authors (ITA--31 and ITA--39) has led to this third, revised edition. Solar abundance data of 67 elements are tabulated and in addition upper limits for the abundances of 5 elements are listed. References are made to 167 papers. A recommended abundance value is given for each element. (JIW)

  3. Compilation of piping benchmark problems - Cooperative international effort

    Energy Technology Data Exchange (ETDEWEB)

    McAfee, W J [comp.

    1979-06-01

    This report is the culmination of an effort initiated in 1976 by the IWGFR to evaluate detailed and simplified analysis methods for piping systems with particular emphasis on piping bends. The procedure was to collect from participating member IWGFR countries descriptions of tests and test results for piping systems or bends, to compile, evaluate, and issue a selected number of these problems for analysis, and to compile and make a preliminary evaluation of the analysis results. The Oak Ridge National Laboratory agreed to coordinate this activity, including compilation of the original problems and the final analyses results. Of the problem descriptions submitted three were selected to be used. These were issued in December 1977. As a follow-on activity, addenda were issued that provided additional data or corrections to the original problem statement. A variety of both detailed and simplified analysis solutions were obtained. A brief comparative assessment of the analyses is contained in this report. All solutions submitted have also been included in order to provide users of this report the information necessary to make their own comparisons or evaluations.

  4. Compiling knowledge-based systems from KEE to Ada

    Science.gov (United States)

    Filman, Robert E.; Bock, Conrad; Feldman, Roy

    1990-01-01

    The dominant technology for developing AI applications is to work in a multi-mechanism, integrated, knowledge-based system (KBS) development environment. Unfortunately, systems developed in such environments are inappropriate for delivering many applications - most importantly, they carry the baggage of the entire Lisp environment and are not written in conventional languages. One resolution of this problem would be to compile applications from complex environments to conventional languages. Here the first efforts to develop a system for compiling KBS developed in KEE to Ada (trademark). This system is called KATYDID, for KEE/Ada Translation Yields Development Into Delivery. KATYDID includes early prototypes of a run-time KEE core (object-structure) library module for Ada, and translation mechanisms for knowledge structures, rules, and Lisp code to Ada. Using these tools, part of a simple expert system was compiled (not quite automatically) to run in a purely Ada environment. This experience has given us various insights on Ada as an artificial intelligence programming language, potential solutions of some of the engineering difficulties encountered in early work, and inspiration on future system development.

  5. Compilation of piping benchmark problems - Cooperative international effort

    International Nuclear Information System (INIS)

    McAfee, W.J.

    1979-06-01

    This report is the culmination of an effort initiated in 1976 by the IWGFR to evaluate detailed and simplified analysis methods for piping systems with particular emphasis on piping bends. The procedure was to collect from participating member IWGFR countries descriptions of tests and test results for piping systems or bends, to compile, evaluate, and issue a selected number of these problems for analysis, and to compile and make a preliminary evaluation of the analysis results. The Oak Ridge National Laboratory agreed to coordinate this activity, including compilation of the original problems and the final analyses results. Of the problem descriptions submitted three were selected to be used. These were issued in December 1977. As a follow-on activity, addenda were issued that provided additional data or corrections to the original problem statement. A variety of both detailed and simplified analysis solutions were obtained. A brief comparative assessment of the analyses is contained in this report. All solutions submitted have also been included in order to provide users of this report the information necessary to make their own comparisons or evaluations

  6. HAUFES : a FORTRAN code for the calculation of compound nuclear cross-sections by Hauser-Feshbach theory

    International Nuclear Information System (INIS)

    Viyogi, Y.P.; Ganguly, N.K.

    1975-01-01

    The FORTRAN code described in the report has been developed for the BESM-6 computer with a view to calculate the cross-section of reactions proceeding via the formation of compound nucleus for all open two-body reaction channels using Hauser-Feshbach theory with Moldauer's correction for the fluctuation of level widths. The code can also be used to analyse data from 'crystal blocking' experiments to obtain nuclear level densities. The report describes the input-output specifications along with a short account of the algorithm of the program. (author)

  7. A new Fortran 90 program to compute regular and irregular associated Legendre functions (new version announcement)

    Science.gov (United States)

    Schneider, Barry I.; Segura, Javier; Gil, Amparo; Guan, Xiaoxu; Bartschat, Klaus

    2018-04-01

    This is a revised and updated version of a modern Fortran 90 code to compute the regular Plm (x) and irregular Qlm (x) associated Legendre functions for all x ∈(- 1 , + 1) (on the cut) and | x | > 1 and integer degree (l) and order (m). The necessity to revise the code comes as a consequence of some comments of Prof. James Bremer of the UC//Davis Mathematics Department, who discovered that there were errors in the code for large integer degree and order for the normalized regular Legendre functions on the cut.

  8. Compilation of Instantaneous Source Functions for Varying ...

    African Journals Online (AJOL)

    Compilation of Instantaneous Source Functions for Varying Architecture of a Layered Reservoir with Mixed Boundaries and Horizontal Well Completion Part IV: Normal and Inverted Letter 'h' and 'H' Architecture.

  9. Compilation and analysis of Escherichia coli promoter DNA sequences.

    OpenAIRE

    Hawley, D K; McClure, W R

    1983-01-01

    The DNA sequence of 168 promoter regions (-50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5' end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter ...

  10. Using Peephole Optimization on Intermediate Code

    NARCIS (Netherlands)

    Tanenbaum, A.S.; van Staveren, H.; Stevenson, J.W.

    1982-01-01

    Many portable compilers generate an intermediate code that is subsequently translated into the target machine's assembly language. In this paper a stack-machine-based intermediate code suitable for algebraic languages (e.g., PASCAL, C, FORTRAN) and most byte-addressed mini- and microcomputers is

  11. Compiler-Agnostic Function Detection in Binaries

    NARCIS (Netherlands)

    Andriesse, D.A.; Slowinska, J.M.; Bos, H.J.

    2017-01-01

    We propose Nucleus, a novel function detection algorithm for binaries. In contrast to prior work, Nucleus is compiler-agnostic, and does not require any learning phase or signature information. Instead of scanning for signatures, Nucleus detects functions at the Control Flow Graph-level, making it

  12. Regulatory and technical reports (abstract index journal): Annual compilation for 1987

    International Nuclear Information System (INIS)

    1988-03-01

    This compilation consists of bibliographic data and abstracts for the formal regulatory and technical reports issued by the US Nuclear Regulatory Commission (NRC) Staff and its contractors. It is NRC's intention to publish this compilation quarterly and to cumulate it annually

  13. Software and DVFS Tuning for Performance and Energy-Efficiency on Intel KNL Processors

    Directory of Open Access Journals (Sweden)

    Enrico Calore

    2018-06-01

    Full Text Available Energy consumption of processors and memories is quickly becoming a limiting factor in the deployment of large computing systems. For this reason, it is important to understand the energy performance of these processors and to study strategies allowing their use in the most efficient way. In this work, we focus on the computing and energy performance of the Knights Landing Xeon Phi, the latest Intel many-core architecture processor for HPC applications. We consider the 64-core Xeon Phi 7230 and profile its performance and energy efficiency using both its on-chip MCDRAM and the off-chip DDR4 memory as the main storage for application data. As a benchmark application, we use a lattice Boltzmann code heavily optimized for this architecture and implemented using several different arrangements of the application data in memory (data-layouts, in short. We also assess the dependence of energy consumption on data-layouts, memory configurations (DDR4 or MCDRAM and the number of threads per core. We finally consider possible trade-offs between computing performance and energy efficiency, tuning the clock frequency of the processor using the Dynamic Voltage and Frequency Scaling (DVFS technique.

  14. Circus: A Replicated Procedure Call Facility

    Science.gov (United States)

    1984-08-01

    298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 client client stubs ...... ...... ..... ..... runtime libary stub compiler binding agent...runtime libary Figure 1: Structure of the Circus system replicated procedure call paired message protocol unreliable datagrams Figure 2: Circus...114-121. [11) Digit &! Equipment Corporation, Intel Corporation, a.nd Xerox Corporation. The Ethernet: A Local Area Networlc. September 1080. [12

  15. Compilation of Instantaneous Source Functions for Varying ...

    African Journals Online (AJOL)

    Compilation of Instantaneous Source Functions for Varying Architecture of a Layered Reservoir with Mixed Boundaries and Horizontal Well Completion Part III: B-Shaped Architecture with Vertical Well in the Upper Layer.

  16. LIONS: a new set of Fortran 90 codes for the SPIRAL project at GANIL

    International Nuclear Information System (INIS)

    Bertrand, P.

    1994-01-01

    A set of new computer programs developed at GANIL is presented; these codes are used to study different parts of the SPIRAL project (a new radioactive ion beam facility), and particularly the dynamics in the CIME cyclotron, its injection inflector, and the new extraction system of the ECR ion sources. Three important modules are described: CHA3D for the evaluation of 3D electric fields with or without space charge effects, LIONS for the motion of ions and EXTRACT for the ECRIS extraction. These modules are written in Fortran 90 in a ''data parallel scheme''. They work either on UNIX workstations or parallel and vectorial computers. (author). 5 figs., 5 refs

  17. Regulatory and technical reports. Compilation for second quarter 1982, April to June

    International Nuclear Information System (INIS)

    1982-08-01

    This compilation consists of bibliographic data and abstracts for the formal regulatory and technical reports issued by the US Nuclear Regulatory Commission (NRC) Staff and its contractors. It is NRC's intention to publish this compilation quarterly and to cumulate it annually. The main citations and abstracts in this compilation are listed in NUREG number order: NUREG-XXXX, NUREG/CP-XXXX, and NUREG/CR-XXXX. A detailed explanation of the entries precedes each index

  18. Gravity Data for Indiana (300 records compiled)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The gravity data (300 records) were compiled by Purdue University. This data base was received in February 1993. Principal gravity parameters include Free-air...

  19. High-speed vector-processing system of the MELCOM-COSMO 900II

    Energy Technology Data Exchange (ETDEWEB)

    Masuda, K; Mori, H; Fujikake, J; Sasaki, Y

    1983-01-01

    Progress in scientific and technical calculations has lead to a growing demand for high-speed vector calculations. Mitsubishi electric has developed an integrated array processor and automatic-vectorizing fortran compiler as an option for the MELCOM-COSMO 900II computer system. This facilitates the performance of vector calculations and matrix calculations, achieving significant gains in cost-effectiveness. The article outlines the high-speed vector system, includes discussion of compiler structuring, and cites examples of effective system application. 1 reference.

  20. Compiler Driven Code Comments and Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Karlsson, Sven

    2011-01-01

    . We demonstrate the ability of our tool to trans- form code, and suggest code refactoring that increase its amenability to optimization. The preliminary results shows that, with our tool-set, au- tomatic loop parallelization with the GNU C compiler, gcc, yields 8.6x best-case speedup over...