Flash-X, a multiphysics simulation software instrument [CL]

http://arxiv.org/abs/2208.11630


Flash-X is a highly composable multiphysics software system that can be used to simulate physical phenomena in several scientific domains. It derives some of its solvers from FLASH, which was first released in 2000. Flash-X has a new framework that relies on abstractions and asynchronous communications for performance portability across a range of increasingly heterogeneous hardware platforms. Flash-X is meant primarily for solving Eulerian formulations of applications with compressible and/or incompressible reactive flows. It also has a built-in, versatile Lagrangian framework that can be used in many different ways, including implementing tracers, particle-in-cell simulations, and immersed boundary methods.

Read this paper on arXiv…

A. Dubey, K. Weide, J. O’Neal, et al.
Thu, 25 Aug 22
17/43

Comments: 16 pages, 5 Figures, published open access in SoftwareX

sympy2c: from symbolic expressions to fast C/C++ functions and ODE solvers in Python [IMA]

http://arxiv.org/abs/2203.11945


Computer algebra systems play an important role in science as they facilitate the development of new theoretical models. The resulting symbolic equations are often implemented in a compiled programming language in order to provide fast and portable codes for practical applications. We describe sympy2c, a new Python package designed to bridge the gap between the symbolic development and the numerical implementation of a theoretical model. sympy2c translates symbolic equations implemented in the SymPy Python package to C/C++ code that is optimized using symbolic transformations. The resulting functions can be conveniently used as an extension module in Python. sympy2c is used within the PyCosmo Python package to solve the Einstein-Boltzmann equations, a large system of ODEs describing the evolution of linear perturbations in the Universe. After reviewing the functionalities and usage of sympy2c, we describe its implementation and optimization strategies. This includes, in particular, a novel approach to generate optimized ODE solvers making use of the sparsity of the symbolic Jacobian matrix. We demonstrate its performance using the Einstein-Boltzmann equations as a test case. sympy2c is widely applicable and may prove useful for various areas of computational physics. sympy2c is publicly available at https://cosmology.ethz.ch/research/software-lab/sympy2c.html
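
sympy2c’s own interface is described in the paper and at the link above; as a rough, hedged illustration of the kind of symbolic-to-C translation it automates, the following sketch uses only plain SymPy (its built-in C code printer and common-subexpression elimination) on a made-up expression:

    import sympy as sp

    x, y = sp.symbols("x y")
    expr = sp.sin(x) * sp.exp(-y**2) + sp.sqrt(x**2 + y**2)

    # cse() factors out repeated subexpressions, one of the symbolic
    # optimizations that can be applied before emitting C code.
    replacements, reduced = sp.cse(expr)
    for lhs, rhs in replacements:
        print(f"double {lhs} = {sp.ccode(rhs)};")
    print(f"double result = {sp.ccode(reduced[0])};")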

Read this paper on arXiv…

U. Schmitt, B. Moser, C. Lorenz, et al.
Thu, 24 Mar 22
40/56

Comments: 28 pages, 5 figures, 5 tables, Link to package: this https URL, the described package sympy2c is used within arXiv:2112.08395

Fast fully-reproducible serial/parallel Monte Carlo and MCMC simulations and visualizations via ParaMonte::Python library [CL]

http://arxiv.org/abs/2010.00724


ParaMonte::Python (standing for Parallel Monte Carlo in Python) is a serial and MPI-parallelized library of (Markov Chain) Monte Carlo (MCMC) routines for sampling mathematical objective functions, in particular the posterior distributions of parameters in Bayesian modeling and analysis in data science, machine learning, and scientific inference in general. In addition to providing access to fast, high-performance serial/parallel Monte Carlo and MCMC sampling routines, the ParaMonte::Python library provides extensive post-processing and visualization tools that aim to automate and streamline the process of model calibration and uncertainty quantification in Bayesian data analysis. Furthermore, the automatically enabled restart functionality of the ParaMonte::Python samplers ensures a seamless, fully deterministic into-the-future restart of Monte Carlo simulations, should any interruptions happen. The ParaMonte::Python library is MIT-licensed and is permanently maintained on GitHub at https://github.com/cdslaborg/paramonte/tree/master/src/interface/Python.
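
As a minimal, hedged sketch of the usage pattern (the class and argument names ParaDRAM, runSampler, ndim, and getLogFunc follow the library’s published examples and should be checked against the current ParaMonte::Python documentation), sampling a 4-dimensional standard Gaussian might look like this:

    import numpy as np
    import paramonte as pm

    def getLogFunc(point):
        # log-density of a standard multivariate normal, up to an additive constant
        return -0.5 * np.dot(point, point)

    pmpd = pm.ParaDRAM()   # Parallel Delayed-Rejection Adaptive Metropolis sampler
    pmpd.runSampler(ndim=4, getLogFunc=getLogFunc)
    # The generated chain and report files can then be read back for the
    # post-processing and visualization tools mentioned above.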

Read this paper on arXiv…

A. Shahmoradi, F. Bagheri and J. Osborne
Mon, 5 Oct 20
8/61

Comments: to be submitted to JOSS

$\mathtt{bimEX}$: A Mathematica package for exact computations in 3$+$1 bimetric relativity [CL]

http://arxiv.org/abs/1904.10464


We present $\mathtt{bimEX}$, a Mathematica package for exact computations in 3$+$1 bimetric relativity. It is based on the $\mathtt{xAct}$ bundle, which can handle computations involving both abstract tensors and their components. In this communication, we refer to the latter case as concrete computations. The package consists of two main parts. The first part involves the abstract tensors, and focuses on how to deal with multiple metrics in $\mathtt{xAct}$. The second part takes an ansatz for the primary variables in a chart as the input, and returns the covariant BSSN bimetric equations in components in that chart. Several functions are implemented to make this process as fast and user-friendly as possible. The package has been used and tested extensively in spherical symmetry and was the workhorse in obtaining the bimetric covariant BSSN equations and reproducing the bimetric 3$+$1 equations in the spherical polar chart.

Read this paper on arXiv…

F. Torsello
Thu, 25 Apr 19
5/58

Comments: 9 pages. The ancillary files contain the main paper with bibliographic tooltips. GitHub repository at this https URL

Towards new solutions for scientific computing: the case of Julia [IMA]

http://arxiv.org/abs/1812.01219


This year marks the consolidation of Julia (https://julialang.org/), a programming language designed for scientific computing, as its first stable version (1.0) was released in August 2018. Among its main features, expressiveness and high execution speed are the most prominent: the performance of Julia code is similar to that of statically compiled languages, yet Julia provides a nice interactive shell and fully supports Jupyter; moreover, it can transparently call external codes written in C, Fortran, and even Python and R without the need for wrappers. The usage of Julia in the astronomical community is growing, and a GitHub organization named JuliaAstro coordinates the development of packages. In this paper, we present the features and shortcomings of this language and discuss its application in astronomy and astrophysics.

Read this paper on arXiv…

M. Tomasi and M. Giordano
Wed, 5 Dec 18
1/73

Comments: To appear in the Proceedings of ADASS2018

Gravitational octree code performance evaluation on Volta GPU [CL]

http://arxiv.org/abs/1811.02761


In this study, the gravitational octree code originally optimized for the Fermi, Kepler, and Maxwell GPU architectures is adapted to the Volta architecture. The Volta architecture introduces independent thread scheduling, requiring either the insertion of explicit synchronizations at appropriate locations or the enforcement of the same implicit synchronizations as in the Pascal or earlier architectures by specifying -gencode arch=compute_60,code=sm_70. Performance measurements on Tesla V100, the current flagship GPU by NVIDIA, revealed that $N$-body simulations of the Andromeda galaxy model with $2^{23} = 8388608$ particles took $3.8 \times 10^{-2}$ s or $3.3 \times 10^{-2}$ s per step in the two cases, respectively. Tesla V100 achieves a 1.4- to 2.2-fold acceleration in comparison with Tesla P100, the flagship GPU of the previous generation. The observed speed-up of 2.2 is greater than 1.5, the ratio of the theoretical peak performance of the two GPUs. The independence of the units for integer operations from those for floating-point operations enables the overlapped execution of integer and floating-point operations, which hides the execution time of the integer operations and leads to a speed-up above the theoretical peak-performance ratio. Tesla V100 can execute $N$-body simulations with up to $25 \times 2^{20} = 26214400$ particles, taking $2.0 \times 10^{-1}$ s per step; this corresponds to $3.5$ TFlop/s, which is 22% of the single-precision theoretical peak performance.
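
The quoted ratios can be checked with back-of-the-envelope arithmetic; the nominal peak figures below are assumptions (standard single-precision peaks for the two cards), not numbers taken from the paper:

    v100_peak, p100_peak = 15.7, 10.6        # TFlop/s, nominal single-precision peaks (assumed)
    print(round(v100_peak / p100_peak, 2))   # ~1.48 -> the "1.5" peak-performance ratio
    print(round(3.5 / v100_peak, 2))         # ~0.22 -> the quoted 22% of peak
    print(25 * 2**20)                        # 26214400 particles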

Read this paper on arXiv…

Y. Miki
Thu, 8 Nov 18
55/72

Comments: 10 pages, 10 figures, 2 tables, submitted to Computer Physics Communications

Mathematical Foundations of the GraphBLAS [CL]

http://arxiv.org/abs/1606.05790


The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of matrix-based graph algorithms to the broadest possible audience. Mathematically, the GraphBLAS defines a core set of matrix-based graph operations that can be used to implement a wide class of graph algorithms in a wide range of programming environments. This paper provides an introduction to the mathematics of the GraphBLAS. Graphs represent connections between vertices with edges. Matrices can represent a wide range of graphs using adjacency matrices or incidence matrices. Adjacency matrices are often easier to analyze while incidence matrices are often better for representing data. Fortunately, the two are easily connected by matrix multiplication. A key feature of matrix mathematics is that a very small number of matrix operations can be used to manipulate a very wide range of graphs. This composability of a small number of operations is the foundation of the GraphBLAS. A standard such as the GraphBLAS can only be effective if it has low performance overhead. Performance measurements of prototype GraphBLAS implementations indicate that the overhead is low.
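
As a small, hedged illustration of the core idea (graph traversal written as matrix algebra), the sketch below runs a breadth-first search as repeated sparse matrix-vector products using plain SciPy rather than an actual GraphBLAS implementation; the graph is made up for the example:

    import numpy as np
    import scipy.sparse as sp

    # Adjacency matrix of a small directed graph: A[i, j] = 1 for an edge i -> j.
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
    rows, cols = zip(*edges)
    A = sp.csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(5, 5))

    frontier = np.zeros(5)
    frontier[0] = 1.0                      # start the BFS at vertex 0
    visited = frontier.astype(bool)
    level = 0
    while frontier.any():
        print(f"level {level}:", np.flatnonzero(frontier).tolist())
        reached = (A.T @ frontier) > 0     # vertices reachable in one step
        frontier = (reached & ~visited).astype(float)
        visited |= reached
        level += 1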

Read this paper on arXiv…

J. Kepner, P. Aaltonen, D. Bader, et al.
Tue, 21 Jun 16
72/75

Comments: 9 pages; 11 figures; accepted to IEEE High Performance Extreme Computing (HPEC) conference 2016

BEANS – a software package for distributed Big Data analysis [IMA]

http://arxiv.org/abs/1603.07342


BEANS is a new web-based tool, easy to install and maintain, for storing and analysing massive amounts of data in a distributed way. It provides a clear interface for querying, filtering, aggregating, and plotting data from an arbitrary number of datasets. Its main purpose is to simplify the process of storing, examining, and finding new relations in so-called Big Data.
BEANS was created in response to the growing need of the astronomical community for a versatile tool to store, analyse, and compare complex astrophysical numerical simulations with observations (e.g. simulations of the Galaxy or star clusters with the Gaia archive). However, the software was built in a general form and is ready for use in any other research field or open-source software project.

Read this paper on arXiv…

A. Hypki
Fri, 25 Mar 16
16/50

Comments: 14 pages, 6 figures, submitted to MNRAS, comments are welcome

Sapporo2: A versatile direct $N$-body library [IMA]

http://arxiv.org/abs/1510.04068


Astrophysical direct $N$-body methods were among the first production algorithms to be implemented using NVIDIA’s CUDA architecture. Now, almost seven years later, the GPU is the most widely used accelerator device in astronomy for simulating stellar systems. In this paper we present the implementation of the Sapporo2 $N$-body library, which allows researchers to use the GPU for $N$-body simulations with little to no effort. The first version, released five years ago, is actively used, but lacks advanced features, versatility in numerical precision, and support for higher-order integrators. In this updated version we have rebuilt the code from scratch and added support for OpenCL, multi-precision arithmetic, and higher-order integrators. We show how to tune these codes for different GPU architectures and how to continue utilizing the GPU optimally even when only a small number of particles ($N < 100$) is integrated. This careful tuning allows Sapporo2 to be faster than Sapporo1 even with the added options and double-precision data loads. The code runs on a range of NVIDIA and AMD GPUs in single and double precision. With the addition of OpenCL support, the library is also able to run on CPUs and other accelerators that support OpenCL.
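
Sapporo2 itself is a C++/CUDA/OpenCL library; as a hedged illustration of the underlying algorithm it accelerates, here is a plain-NumPy sketch of the direct-summation $O(N^2)$ force evaluation (with G = 1 and an arbitrary softening length):

    import numpy as np

    def direct_accelerations(pos, mass, eps=1e-4):
        """Pairwise gravitational accelerations by direct summation (G = 1)."""
        dx = pos[None, :, :] - pos[:, None, :]       # (N, N, 3) separations
        r2 = np.sum(dx * dx, axis=-1) + eps**2       # softened squared distances
        inv_r3 = r2**-1.5
        np.fill_diagonal(inv_r3, 0.0)                # no self-interaction
        return np.einsum("ij,j,ijk->ik", inv_r3, mass, dx)

    rng = np.random.default_rng(0)
    pos = rng.normal(size=(1024, 3))
    mass = np.full(1024, 1.0 / 1024)
    acc = direct_accelerations(pos, mass)
    print(acc.shape)                                 # (1024, 3)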

Read this paper on arXiv…

J. Bedorf, E. Gaburov and S. Zwart
Thu, 15 Oct 15
3/57

Comments: 15 pages, 7 figures. Accepted for publication in Computational Astrophysics and Cosmology

GenASiS Basics: Object-oriented utilitarian functionality for large-scale physics simulations [IMA]

http://arxiv.org/abs/1507.02506


Aside from numerical algorithms and problem setup, large-scale physics simulations on distributed-memory supercomputers require more basic utilitarian functionality, such as physical units and constants; display to the screen or standard output device; message passing; I/O to disk; and runtime parameter management and usage statistics. Here we describe and make available Fortran 2003 classes furnishing extensible object-oriented implementations of this sort of rudimentary functionality, along with individual ‘unit test’ programs and larger example problems demonstrating their use. These classes compose the Basics division of our developing astrophysics simulation code GenASiS (General Astrophysical Simulation System), but their fundamental nature makes them useful for physics simulations in many fields.

Read this paper on arXiv…

C. Cardall and R. Budiardja
Fri, 10 Jul 15
41/53

Comments: Computer Physics Communications in press

Remark on "Algorithm 916: Computing the Faddeyeva and Voigt functions": Efficiency Improvements and Fortran Translation [IMA]

http://arxiv.org/abs/1505.06848


This remark describes efficiency improvements to Algorithm 916 [Zaghloul and Ali 2011]. It is shown that the execution time required by the algorithm, when run at its highest accuracy, may be reduced by more than a factor of two. A better accuracy-versus-efficiency trade-off scheme is also implemented; this requires the user to supply the number of significant figures desired in the computed values as an extra input argument to the function. Using this trade-off, it is shown that the efficiency of the algorithm may be further improved significantly while maintaining reasonably accurate and safe results, free of the pitfalls and complete loss of accuracy seen in other competitive techniques. The current version of the code is provided in Matlab and Scilab, in addition to a Fortran translation prepared to meet the needs of real-world problems where very large numbers of function evaluations require the use of a compiled language. To fulfill this last requirement, a recently proposed reformed version of Humlicek’s w4 routine, shown to maintain the claimed accuracy of the algorithm over a wide and fine grid, is implemented in the present Fortran translation for the case of 4 significant figures. This latter modification ensures the reliability of the code when employed in the solution of practical problems requiring numerous evaluations of the function, for applications tolerating low-accuracy computations ($<10^{-4}$).
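
For readers who need the function in Python rather than the Matlab/Scilab/Fortran codes described here, the Faddeyeva function is also available as scipy.special.wofz; the sketch below (not the authors’ implementation) shows the standard relation between w(z) and the Voigt profile:

    import numpy as np
    from scipy.special import wofz

    def voigt(x, sigma, gamma):
        """Voigt profile: Gaussian (std sigma) convolved with Lorentzian (HWHM gamma)."""
        z = (x + 1j * gamma) / (sigma * np.sqrt(2.0))
        return wofz(z).real / (sigma * np.sqrt(2.0 * np.pi))

    x = np.linspace(-5.0, 5.0, 11)
    print(voigt(x, sigma=1.0, gamma=0.5))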

Read this paper on arXiv…

M. Zaghloul
Wed, 27 May 15
47/48

Comments: 11 pages, 5 tables, Under review

The NIFTY way of Bayesian signal inference [IMA]

http://arxiv.org/abs/1412.7160


We introduce NIFTY, “Numerical Information Field Theory”, a software package for the development of Bayesian signal inference algorithms that operate independently of any underlying spatial grid and its resolution. A large number of Bayesian and Maximum Entropy methods for 1D signal reconstruction, 2D imaging, and 3D tomography appear formally similar, but one often finds individualized implementations that are neither flexible nor easily transferable. Signal inference in the framework of NIFTY can be done in an abstract way, such that algorithms prototyped in 1D can be applied to real-world problems in higher-dimensional settings. As a versatile library, NIFTY is applicable to, and has already been applied in, 1D, 2D, 3D, and spherical settings. A recent application is the D3PO algorithm, targeting the non-trivial task of denoising, deconvolving, and decomposing photon observations in high-energy astronomy.
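
As a hedged illustration of the kind of algorithm NIFTY abstracts away from the grid (this is not NIFTY’s own API), here is a plain-NumPy 1D Wiener filter: with data $d = Rs + n$, signal covariance $S$, and noise covariance $N$, the posterior mean is $m = (S^{-1} + R^{T} N^{-1} R)^{-1} R^{T} N^{-1} d$. The power spectrum, response, and noise level are made up for the example.

    import numpy as np

    npix = 64
    k = np.fft.rfftfreq(npix, d=1.0)
    power = 1.0 / (1.0 + (k / 0.05) ** 2)         # assumed signal power spectrum

    # Circulant covariance of a stationary 1D signal, built from the power spectrum.
    corr = np.fft.irfft(power, n=npix)
    S = np.array([[corr[(i - j) % npix] for j in range(npix)] for i in range(npix)])
    R = np.eye(npix)                               # trivial response: direct measurement
    N = 0.1 * np.eye(npix)                         # white-noise covariance

    rng = np.random.default_rng(1)
    s = rng.multivariate_normal(np.zeros(npix), S)          # draw a signal realization
    d = R @ s + rng.multivariate_normal(np.zeros(npix), N)  # noisy data

    D = np.linalg.inv(np.linalg.inv(S) + R.T @ np.linalg.inv(N) @ R)
    m = D @ (R.T @ np.linalg.inv(N) @ d)           # Wiener-filter (posterior mean) reconstruction
    print(np.mean((m - s) ** 2), np.mean((d - s) ** 2))   # reconstruction error vs. raw-data error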

Read this paper on arXiv…

M. Selig
Wed, 24 Dec 14
18/37

Comments: 6 pages, 2 figures, refereed proceeding of the 33rd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2013), software available at this http URL and this http URL

External Use of TOPCAT's Plotting Library [IMA]

http://arxiv.org/abs/1410.8507


The table analysis application TOPCAT uses a custom Java plotting library for highly configurable, high-performance interactive or exported visualisations in two and three dimensions. We present here a variety of ways for end users or application developers to make use of this library outside of the TOPCAT application: via the command-line suite STILTS or its Jython variant JyStilts, via a traditional Java API, or by programmatically assigning values to a set of parameters in Java code or via some form of inter-process communication. The library has been built with large datasets in mind; interactive plots scale well up to several million points, and static output to standard graphics formats is possible for input data of unlimited size.

Read this paper on arXiv…

M. Taylor
Fri, 31 Oct 14
19/69

Comments: 4 pages, 1 figure

HOPE: A Python Just-In-Time compiler for astrophysical computations [IMA]

http://arxiv.org/abs/1410.4345


The Python programming language is becoming increasingly popular for scientific applications due to its simplicity, versatility, and the broad range of its libraries. A drawback of this dynamic language, however, is its low runtime performance, which limits its applicability for large simulations and for the analysis of large data sets, as is common in astrophysics and cosmology. While various frameworks have been developed to address this limitation, most focus on covering the complete language set, and either force the user to alter the code or are not able to reach the full speed of an optimised native compiled language. In order to combine the ease of Python and the speed of C++, we developed HOPE, a specialised Python just-in-time (JIT) compiler designed for numerical astrophysical applications. HOPE focuses on a subset of the language and is able to translate Python code into C++ while performing numerical optimisation on mathematical expressions at runtime. To enable the JIT compilation, the user only needs to add a decorator to the function definition. We assess the performance of HOPE by performing a series of benchmarks and compare its execution speed with that of plain Python, C++ and the other existing frameworks. We find that HOPE improves the performance compared to plain Python by a factor of 2 to 120, achieves speeds comparable to those of C++, and often exceeds the speed of the existing solutions. We discuss the differences between HOPE and the other frameworks, as well as future extensions of its capabilities. The fully documented HOPE package is available at this http URL and is published under the GPLv3 license on PyPI and GitHub.
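
As a hedged sketch of the decorator-based usage described above (the decorator name hope.jit follows the published package; whether a given function falls inside HOPE’s supported language subset should be checked against its documentation):

    import numpy as np
    import hope

    @hope.jit
    def kick(vel, acc, dt):
        # element-wise update; the first call triggers translation to C++ and compilation
        return vel + acc * dt

    vel = np.zeros(1000)
    acc = np.ones(1000)
    vel = kick(vel, acc, 0.01)
    print(vel[:3])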

Read this paper on arXiv…

J. Akeret, L. Gamper, A. Amara, et al.
Fri, 17 Oct 14
36/54

Comments: Submitted to Astronomy and Computing. 13 pages, 1 figure. The code is available at this http URL

CosmoMC Installation and Running Guidelines [IMA]

http://arxiv.org/abs/1409.1354


CosmoMC is a Fortran 95 Markov-Chain Monte-Carlo (MCMC) engine to explore the cosmological parameter space, plus a Python suite for plotting and presenting results (see this http URL). This document describes the installation of CosmoMC on a Linux system (Ubuntu 14.04.1 LTS, 64-bit). It is written for those who want to use the program in their scientific research but have little training in Linux or the program itself. Besides a step-by-step installation guide, we also give a brief introduction to running the program on both a desktop and a cluster. We share our way of generating the plots that are commonly used in the cosmology literature. For more information, one can refer to the CosmoCoffee forum (this http URL) or contact the authors of this document. Questions and comments would be much appreciated.
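
The Python plotting suite mentioned above is also available as the standalone GetDist package; a minimal sketch of loading a finished chain and producing the usual triangle plot is shown below (the chain root chains/test and the parameter names are placeholders, not actual CosmoMC defaults):

    from getdist import loadMCSamples, plots

    # Reads chains/test_1.txt, chains/test_2.txt, ... plus chains/test.paramnames
    samples = loadMCSamples("chains/test")
    g = plots.get_subplot_plotter()
    g.triangle_plot(samples, ["omegabh2", "omegach2", "H0"], filled=True)
    g.export("triangle.pdf")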

Read this paper on arXiv…

M. Li and P. Wang
Fri, 5 Sep 14
54/69

Comments: The aim of this article is to help undergraduate and postgraduate students get into the field of cosmology. Thus, it was not submitted to any particular journal and is publicly available. 10 pages in total, 0 figures

Achieving 100,000,000 database inserts per second using Accumulo and D4M [CL]

http://arxiv.org/abs/1406.4923


The Apache Accumulo database is an open source relaxed-consistency database that is widely used for government applications. Accumulo is designed to deliver high performance on unstructured data such as graphs of network data. This paper tests the performance of Accumulo using data from the Graph500 benchmark. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a 216-node cluster running the MIT SuperCloud software stack. A peak performance of over 100,000,000 database inserts per second was achieved, which is 100x larger than the highest previously published value for any other database. The performance scales linearly with the number of ingest clients, number of database servers, and data size. The performance was achieved by adapting several supercomputing techniques to this application: distributed arrays, domain decomposition, adaptive load balancing, and single-program-multiple-data programming.

Read this paper on arXiv…

J. Kepner, W. Arcand, D. Bestor, et al.
Fri, 20 Jun 14
2/48

Comments: 6 pages; to appear in IEEE High Performance Extreme Computing (HPEC) 2014