A code for two-dimensional frequency analysis using the Least Absolute Shrinkage and Selection Operator (Lasso) for multidisciplinary use [SSA]

http://arxiv.org/abs/2111.10931


In Kato and Uemura (2012), we introduced the Least Absolute Shrinkage and Selection Operator (Lasso) method, a kind of sparse modeling, to study the frequency structures of variable stars. A very high frequency resolution was achieved compared to traditional Fourier-type frequency analysis. The method has since been extended to two-dimensional frequency analysis to obtain dynamic spectra. This two-dimensional Lasso frequency analysis has yielded a wide range of results, including the separation of the orbital, superhump and negative superhump signals in Kepler data of SU UMa stars. In this paper, I briefly review the progress and applications of this method and present a full R code with examples of its usage. This code has been confirmed to detect the appearance of the orbital signal and the variation of the spin period after the eruption of the nova V1674 Her. The code can also be used for multidisciplinary purposes, and I provide applications to the analysis of avian vocalizations. I found fine structures in the call of the Eurasian wren ($\textit{Troglodytes troglodytes}$), which are likely used for species identification. This code could be a new tool for studying avian vocalizations with high temporal and frequency resolution. Interpretations of the power spectra of avian vocalizations will also be helpful in interpreting the power spectra of variable stars.
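
As a quick illustration of the underlying idea (a minimal sketch in Python rather than the paper's actual R code; the toy signal, frequency grid, and penalty strength are all illustrative assumptions): the light curve is regressed onto a dense dictionary of sine/cosine pairs, and the L1 penalty zeroes out most coefficients, yielding much sharper peaks than a Fourier periodogram.

```python
# Sketch of one-dimensional Lasso frequency analysis (not the paper's R code;
# signal, grid, and alpha are illustrative assumptions).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 30.0, 400))            # irregular sampling times
y = np.sin(2 * np.pi * 0.52 * t) + 0.3 * rng.normal(size=t.size)

freqs = np.linspace(0.05, 2.0, 2000)                # dense trial-frequency grid
X = np.hstack([np.sin(2 * np.pi * np.outer(t, freqs)),
               np.cos(2 * np.pi * np.outer(t, freqs))])

model = Lasso(alpha=0.05, max_iter=50000)           # L1 penalty => sparse spectrum
model.fit(X, y - y.mean())

# One amplitude per trial frequency, combining the sine and cosine parts.
amp = np.hypot(model.coef_[:freqs.size], model.coef_[freqs.size:])
print("strongest frequency:", freqs[np.argmax(amp)])
```

The two-dimensional (dynamic) spectrum described in the abstract follows by repeating such a fit in sliding time windows and stacking the resulting amplitude vectors.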

Read this paper on arXiv…

T. Kato
Tue, 23 Nov 21
46/84

Comments: 20 pages, 7 figures, with R code, VSOLJ Variable Star Bulletin 86

Habitability Models for Astrobiology [EPA]

http://arxiv.org/abs/2108.05417


Habitability has been generally defined as the capability of an environment to support life. Ecologists have been using Habitat Suitability Models (HSMs) for more than four decades to study the habitability of Earth from local to global scales. Astrobiologists have been proposing different habitability models for some time, with little integration and consistency among them, and functionally different from those used by ecologists. Habitability models are not only used to determine whether environments are habitable; they are also used to characterize what key factors are responsible for the gradual transition from low to high habitability states. Here we review and compare some of the different models used by ecologists and astrobiologists and suggest how they could be integrated into new habitability standards. Such standards will help to improve the comparison and characterization of potentially habitable environments, prioritize target selection, and support studies of correlations between habitability and biosignatures. Habitability models are the foundation of planetary habitability science, and the synergy between ecologists and astrobiologists is necessary to expand our understanding of the habitability of Earth, the Solar System, and extrasolar planets.
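
For intuition only, here is a minimal sketch of how an HSM-style index can encode a gradual (rather than binary) notion of habitability; the factor names and the geometric-mean aggregation are assumptions for illustration, not the standards proposed in the paper.

```python
# Toy habitat-suitability index: each factor is a suitability in [0, 1], and
# the geometric mean makes the index 0 whenever any single factor is 0, while
# intermediate values capture the low-to-high habitability gradient.
# Factor names and the aggregation rule are illustrative assumptions.
import numpy as np

def habitability_index(factors):
    values = np.asarray(list(factors.values()), dtype=float)
    if np.any((values < 0.0) | (values > 1.0)):
        raise ValueError("each suitability factor must lie in [0, 1]")
    return float(np.prod(values) ** (1.0 / values.size))

site = {"temperature": 0.9, "water_activity": 0.7, "energy_flux": 0.8}
print(habitability_index(site))   # ~0.80: habitable to a degree, not yes/no
```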

Read this paper on arXiv…

A. Méndez, E. Rivera-Valentín, D. Schulze-Makuch, et al.
Fri, 13 Aug 21
3/64

Comments: Published in Astrobiology, 21(8)

Fasano-Franceschini Test: an Implementation of a 2-Dimensional Kolmogorov-Smirnov test in R [CL]

http://arxiv.org/abs/2106.10539


The univariate Kolmogorov-Smirnov (KS) test is a non-parametric statistical test designed to assess whether a set of data is consistent with a given probability distribution (or, in the two-sample case, whether the two samples come from the same underlying distribution). The versatility of the KS test has made it a cornerstone of statistical analysis, and it is commonly used across the scientific disciplines. However, the test proposed by Kolmogorov and Smirnov does not naturally extend to multidimensional distributions. Here, we present the fasano.franceschini.test package, an R implementation of the 2-D KS two-sample test as defined by Fasano and Franceschini (1987). The fasano.franceschini.test package provides three improvements over the current 2-D KS test on the Comprehensive R Archive Network (CRAN): (i) the Fasano and Franceschini test has been shown to run in $O(n^2)$, versus the Peacock implementation, which runs in $O(n^3)$; (ii) the package implements a procedure for handling ties in the data; and (iii) the package implements a parallelized bootstrapping procedure for improved significance testing. Ultimately, the fasano.franceschini.test package presents a robust statistical test for analyzing random samples defined in two dimensions.
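
The quadrant-counting idea behind the test is simple enough to sketch from scratch (a naive Python illustration of the Fasano-Franceschini statistic, not the package's optimized R implementation; tie handling and significance testing are omitted):

```python
# Naive O(n^2) sketch of the Fasano & Franceschini (1987) two-sample statistic:
# for every observed point taken as an origin, compare the fractions of each
# sample falling into the four surrounding quadrants.
import numpy as np

def quadrant_fractions(origin, pts):
    x, y = origin
    return np.array([np.mean((pts[:, 0] >  x) & (pts[:, 1] >  y)),
                     np.mean((pts[:, 0] <= x) & (pts[:, 1] >  y)),
                     np.mean((pts[:, 0] >  x) & (pts[:, 1] <= y)),
                     np.mean((pts[:, 0] <= x) & (pts[:, 1] <= y))])

def ff_statistic(a, b):
    d = []
    for origins in (a, b):                 # origins from each sample in turn
        d.append(max(np.max(np.abs(quadrant_fractions(p, a)
                                   - quadrant_fractions(p, b)))
                     for p in origins))
    return 0.5 * (d[0] + d[1])

rng = np.random.default_rng(1)
a = rng.normal(size=(100, 2))
b = rng.normal(loc=0.5, size=(100, 2))
print(ff_statistic(a, b))                  # larger => samples differ more
```

The double loop over origins and points is the $O(n^2)$ cost quoted above; significance would then be assessed by resampling, analogous to the package's parallelized bootstrap.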

Read this paper on arXiv…

E. Ness-Cohn and R. Braun
Tue, 22 Jun 21
71/71

Comments: 8 pages, 4 figures

MatDRAM: A pure-MATLAB Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo Sampler [CL]

http://arxiv.org/abs/2010.04190


Markov Chain Monte Carlo (MCMC) algorithms are widely used for stochastic optimization, sampling, and integration of mathematical objective functions, in particular in the context of Bayesian inverse problems and parameter estimation. For decades, the algorithm of choice in MCMC simulations has been the Metropolis-Hastings (MH) algorithm. An advancement over the traditional MH-MCMC sampler is the Delayed-Rejection Adaptive Metropolis (DRAM) algorithm. In this paper, we present MatDRAM, a stochastic optimization, sampling, and Monte Carlo integration toolbox in MATLAB which implements a variant of the DRAM algorithm for exploring mathematical objective functions of arbitrary dimensions, in particular the posterior distributions of Bayesian models in data science, machine learning, and scientific inference. The design goals of MatDRAM include nearly full automation of MCMC simulations, user-friendliness, fully deterministic reproducibility, and restart functionality for simulations. We also discuss the implementation details of a technique to automatically monitor and ensure the diminishing adaptation of the proposal distribution of the DRAM algorithm, as well as a method of efficiently storing the resulting simulated Markov chains. The MatDRAM library is open-source, MIT-licensed, and permanently located and maintained as part of the ParaMonte library at https://github.com/cdslaborg/paramonte.
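
For readers unfamiliar with DRAM, here is a compact generic sketch of the algorithm in Python (an illustration of delayed rejection plus adaptive Metropolis, not MatDRAM's MATLAB implementation; all tuning constants are assumptions):

```python
# Generic Delayed-Rejection Adaptive Metropolis (DRAM) sketch. After a
# first-stage rejection, a second, narrower proposal is tried with an
# acceptance ratio that preserves detailed balance (Mira 2001); the proposal
# covariance is periodically adapted from the chain history (Haario et al.).
import numpy as np
from scipy.stats import multivariate_normal

def dram(log_post, x0, n_steps=5000, gamma=0.5, adapt_start=500, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    cov = np.eye(d)
    chain = np.empty((n_steps, d))
    lp_x = log_post(x)
    for i in range(n_steps):
        y1 = rng.multivariate_normal(x, cov)          # first-stage proposal
        lp_y1 = log_post(y1)
        a1 = min(1.0, np.exp(lp_y1 - lp_x))
        if rng.random() < a1:
            x, lp_x = y1, lp_y1
        else:                                         # delayed rejection
            y2 = rng.multivariate_normal(x, gamma**2 * cov)
            lp_y2 = log_post(y2)
            a1_rev = min(1.0, np.exp(lp_y1 - lp_y2))
            log_q = (multivariate_normal.logpdf(y1, mean=y2, cov=cov)
                     - multivariate_normal.logpdf(y1, mean=x, cov=cov))
            a2 = np.exp(lp_y2 - lp_x + log_q) * (1.0 - a1_rev) / (1.0 - a1)
            if rng.random() < a2:
                x, lp_x = y2, lp_y2
        chain[i] = x
        if i >= adapt_start and i % 100 == 0:         # adapt proposal covariance
            cov = 2.38**2 / d * np.cov(chain[: i + 1].T) + 1e-8 * np.eye(d)
    return chain

chain = dram(lambda p: -0.5 * float(np.sum(p**2)), x0=[3.0, -2.0])
print(chain[1000:].mean(axis=0))                      # ~ (0, 0) for this target
```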

Read this paper on arXiv…

S. Kumbhare and A. Shahmoradi
Mon, 12 Oct 20
22/59

Comments: N/A

Fast fully-reproducible serial/parallel Monte Carlo and MCMC simulations and visualizations via ParaMonte::Python library [CL]

http://arxiv.org/abs/2010.00724


ParaMonte::Python (standing for Parallel Monte Carlo in Python) is a serial and MPI-parallelized library of (Markov Chain) Monte Carlo (MCMC) routines for sampling mathematical objective functions, in particular the posterior distributions of parameters in Bayesian modeling and analysis in data science, machine learning, and scientific inference in general. In addition to providing access to fast, high-performance serial/parallel Monte Carlo and MCMC sampling routines, the ParaMonte::Python library provides extensive post-processing and visualization tools that aim to automate and streamline the process of model calibration and uncertainty quantification in Bayesian data analysis. Furthermore, the automatically-enabled restart functionality of ParaMonte::Python samplers ensures seamless, fully deterministic into-the-future restarts of Monte Carlo simulations, should any interruption happen. The ParaMonte::Python library is MIT-licensed and is permanently maintained on GitHub at https://github.com/cdslaborg/paramonte/tree/master/src/interface/Python.
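
A minimal usage sketch (following the quick-start pattern in the project's README at the time; treat the exact names `ParaDRAM` and `runSampler` as assumptions should the API have changed):

```python
# Sample a 4-D standard-normal log-density with ParaMonte's ParaDRAM sampler.
# The objective function below is a toy example, not from the paper.
import numpy as np
import paramonte as pm

def getLogFunc(point):
    return -0.5 * np.sum(np.asarray(point) ** 2)

pmpd = pm.ParaDRAM()
pmpd.runSampler(ndim=4, getLogFunc=getLogFunc)
# Output chain/report files are written to disk; the library's post-processing
# and visualization tools read these back for analysis.
```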

Read this paper on arXiv…

A. Shahmoradi, F. Bagheri and J. Osborne
Mon, 5 Oct 20
8/61

Comments: to be submitted to JOSS

From Allometry to Dimensionally Homogenous `Laws': Reformulation of the Metabolic Rate Relation [CL]

http://arxiv.org/abs/1707.02340


Meaningful laws of nature must be independent of the units employed to measure the variables. The principle of similitude (Rayleigh 1915), or dimensional homogeneity, states that only commensurable quantities (ones having the same dimension) may be compared; therefore, meaningful laws of nature must be homogeneous equations in their various units of measurement, a result which was formalized in the $\rm \Pi$ theorem (Vaschy 1892; Buckingham 1914). However, most relations in allometry do not satisfy this basic requirement, including the `3/4 Law' (Kleiber 1932) that relates the basal metabolic rate and body mass, which is sometimes claimed to be the most fundamental biological rate (Brown et al. 2004) and the closest thing to a law in the life sciences (West \& Brown 2004). Using the $\rm \Pi$ theorem, here we show that it is possible to construct a unique homogeneous equation for the metabolic rates, in agreement with data in the literature. We find that the variations in the dependence of the metabolic rates on body mass are secondary, coming from variations in the allometric dependence of the heart frequencies. This holds not only for different classes of animals (mammals, birds, invertebrates) but also for different exercise conditions (basal and maximal). Our results demonstrate that most of the differences found in the allometric exponents (White et al. 2007) are due to comparing incommensurable quantities, and that our dimensionally homogeneous formula unifies these differences into a single formulation. We discuss the ecological implications of this new formulation in the context of the Malthusian, Fenchel and Calder relations.
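
As a worked sketch of the dimensional argument (assuming, per the abstract, that the governing variables are the metabolic rate $B$, the body mass $M$ and the heart frequency $f_H$; the constant $c$ is introduced here purely for illustration):

```latex
% Variables and dimensions: B [J s^-1], M [kg], f_H [s^-1]; the constant
% c [J kg^-1] (a specific energy per heartbeat) is an illustrative assumption.
% The Pi theorem then admits a single dimensionless group,
\[ \Pi = \frac{B}{c \, M f_{\mathrm{H}}}, \]
% so dimensional homogeneity forces
\[ B \propto M f_{\mathrm{H}}. \]
% With the classic allometry f_H \propto M^{-1/4}, this recovers Kleiber's
% B \propto M^{3/4}; any change in the exponent of f_H(M) shifts the apparent
% exponent of B(M), which is the "secondary" variation the abstract describes.
```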

Read this paper on arXiv…

A. Escala
Tue, 11 Jul 17
71/74

Comments: Submitted. Comments are welcome (andres.escala@aya.yale.edu)

Noisy independent component analysis of auto-correlated components [CL]

http://arxiv.org/abs/1705.02344


We present a new method for the separation of superimposed, independent, auto-correlated components from noisy multi-channel measurements. The presented method simultaneously reconstructs and separates the components, taking all channels into account, and thereby increases the effective signal-to-noise ratio considerably, allowing separations even in the high-noise regime. Characteristics of the measurement instruments can be included, allowing for application in complex measurement situations. Independent posterior samples can be provided, permitting error estimates on all desired quantities. Using the concept of information field theory, the algorithm is not restricted to any dimensionality of the underlying space or discretization scheme thereof.
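
To make the setting concrete, here is a tiny sketch of the forward model implied by the abstract, $d = M s + n$, with independent auto-correlated components mixed into several channels (sizes, spectra and the mixing matrix are assumptions; the paper's information-field-theoretic inference itself is not reproduced):

```python
# Toy data for the separation problem: auto-correlated components are built
# by filtering white noise to an assumed falling power spectrum, mixed into
# multiple channels, and observed with additive noise.
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_comp, n_chan = 512, 2, 4

k = np.fft.rfftfreq(n_pix)
spectrum = 1.0 / (1.0 + (k / 0.02) ** 2)            # assumed correlation structure
s = np.array([np.fft.irfft(np.fft.rfft(rng.normal(size=n_pix)) * spectrum,
                           n=n_pix) for _ in range(n_comp)])

M = rng.normal(size=(n_chan, n_comp))               # unknown channel mixing
d = M @ s + 0.05 * rng.normal(size=(n_chan, n_pix)) # noisy multi-channel data

# The method of the paper would now jointly infer M and s from d; exploiting
# all channels at once is what raises the effective signal-to-noise ratio.
```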

Read this paper on arXiv…

J. Knollmuller and T. Ensslin
Tue, 9 May 17
42/82

Comments: N/A

Clustering with phylogenetic tools in astrophysics [IMA]

http://arxiv.org/abs/1606.00235


Phylogenetic approaches are finding more and more applications outside the field of biology. Astrophysics is no exception, since an overwhelming amount of multivariate data has appeared in the last twenty years or so. In particular, the diversification of galaxies throughout the evolution of the Universe quite naturally invokes phylogenetic approaches. We have demonstrated that Maximum Parsimony brings useful astrophysical results, and we now proceed toward the analysis of large datasets of galaxies. In this talk I present how we address the major difficulties toward this goal: the choice of the parameters, their discretization, and the analysis of a high number of objects with an unsupervised NP-hard classification technique like cladistics.

1. Introduction

How do galaxies form, and when? How did galaxies evolve and transform themselves to create the diversity we observe? What are the progenitors of present-day galaxies? To answer these big questions, observations throughout the Universe and physical modelling are obvious tools. But between these there is a key process, without which it would be impossible to extract digestible information from the complexity of these systems: classification.

One century ago, galaxies were discovered by Hubble. From images obtained in the visible range of wavelengths, he synthesized his observations through the usual process: classification. With only one parameter (the shape), qualitative and determined by eye, he found four categories: ellipticals, spirals, barred spirals and irregulars. This is the famous Hubble classification. He later hypothesized relationships between these classes, building the Hubble Tuning Fork. The Hubble classification has been refined, notably by de Vaucouleurs, and is still used as the only global classification of galaxies. Even though the physical relationships proposed by Hubble are no longer retained, the Hubble Tuning Fork is nearly always used to represent the classification of galaxy diversity under its new name, the Hubble sequence (e.g. Delgado-Serrano, 2012). Its success is impressive and can be understood by its simplicity, even its beauty, and by the many correlations found between the morphology of galaxies and their other properties. And one must admit that there is no alternative up to now, even though both the Hubble classification and diagram have been recognised as unsatisfactory. Among the most obvious flaws of this classification, one must mention its monovariate, qualitative, subjective and old-fashioned nature, as well as the difficulty of characterising the morphology of distant galaxies.

The first two most significant multivariate studies were by Watanabe et al. (1985) and Whitmore (1984). Since 2005, the number of studies attempting to go beyond the Hubble classification has increased greatly. Why, despite this, are the Hubble classification and its sequence still alive, with no alternative having yet emerged (Sandage, 2005)? My feeling is that the results of multivariate analyses are not easily integrated into a century-old practice of modeling the observations. In addition, extragalactic objects like galaxies, stellar clusters or stars do evolve. Astronomy now provides data on very distant objects, raising the question of the relationships between those and our present-day nearby galaxies. Clearly, this is a phylogenetic problem. Astrocladistics aims at exploring the use of phylogenetic tools in astrophysics (Fraix-Burnet et al., 2006a,b). We have proved that Maximum Parsimony (or cladistics) can be applied in astrophysics and provides a new exploration tool for the data (Fraix-Burnet et al., 2009, 2012; Cardone \& Fraix-Burnet, 2013). As far as the classification of galaxies is concerned, a larger number of objects must now be analysed. In this paper, I…
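
Since parameter discretization is named above as one of the major difficulties, a minimal sketch of that step may help (the parameter names, the number of states and the equal-frequency binning are illustrative assumptions, not the paper's actual choices):

```python
# Turn continuous galaxy parameters into a small number of integer character
# states, the input format required by Maximum Parsimony (cladistics) codes.
import numpy as np

rng = np.random.default_rng(3)
galaxies = {"log_mass": rng.normal(10.5, 0.6, 200),   # hypothetical parameters
            "color_gr": rng.normal(0.6, 0.2, 200)}

def to_states(values, n_states=8):
    """Equal-frequency binning of one parameter into integer states 0..n-1."""
    edges = np.quantile(values, np.linspace(0, 1, n_states + 1)[1:-1])
    return np.digitize(values, edges)

matrix = np.column_stack([to_states(v) for v in galaxies.values()])
print(matrix[:5])   # rows = galaxies (taxa), columns = discretized characters
```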

Read this paper on arXiv…

D. Fraix-Burnet
Thu, 2 Jun 16
56/60

Comments: Proceedings of the 60th World Statistics Congress of the International Statistical Institute, ISI2015, Jul 2015, Rio de Janeiro, Brazil

Multivariate Approaches to Classification in Extragalactic Astronomy [GA]

http://arxiv.org/abs/1508.06756


Clustering objects into synthetic groups is a natural activity of any science. Astrophysics is no exception and is now facing a deluge of data. For galaxies, the century-old Hubble classification and the Hubble tuning fork are still largely in use, together with numerous mono- or bivariate classifications, most often made by eye. However, a classification must be driven by the data, and sophisticated multivariate statistical tools are used more and more often. In this paper we review these different approaches in order to situate them in the general context of unsupervised and supervised learning. We insist on the astrophysical outcomes of these studies to show that multivariate analyses provide an obvious path toward a renewal of our classification of galaxies and are invaluable tools for investigating the physics and evolution of galaxies.

Read this paper on arXiv…

D. Fraix-Burnet, M. Thuillard and A. Chattopadhyay
Fri, 28 Aug 15
15/49

Comments: Open Access paper. DOI: 10.3389/fspas.2015.00003