A statistical model of stellar variability. I. FENRIR: a physics-based model of stellar activity, and its fast Gaussian process approximation [SSA]

http://arxiv.org/abs/2304.08489


The detection of terrestrial planets by radial velocity and photometry is hindered by the presence of stellar signals. These are often modeled as stationary Gaussian processes whose kernels are based on qualitative considerations, which do not fully leverage the existing physical understanding of stars. Our aim is to build a formalism that allows the knowledge of stellar activity to be transferred into practical data analysis methods. In particular, we aim to obtain kernels with physical parameters. This has two purposes: better modelling signals of stellar origin to find smaller exoplanets, and extracting information about the star from the statistical properties of the data. We consider several observational channels, such as photometry, radial velocity, and activity indicators, and build a model called FENRIR to represent their stochastic variations due to stellar surface inhomogeneities. We compute analytically the covariance of this multi-channel stochastic process, and implement it in the S+LEAF framework to reduce the cost of likelihood evaluations from $O(N^3)$ to $O(N)$. We also compute analytically higher-order cumulants of our FENRIR model, which quantify its non-Gaussianity. We obtain a fast Gaussian process framework with physical parameters, which we apply to the HARPS-N and SORCE observations of the Sun, and constrain a solar inclination compatible with the viewing geometry. We then discuss the application of our formalism to granulation. We exhibit non-Gaussianity in solar HARPS radial velocities, and argue that information is lost when stellar activity signals are assumed to be Gaussian. We finally discuss the origin of phase shifts between RVs and indicators, and how to build relevant activity indicators. We provide an open-source implementation of the FENRIR Gaussian process model with a Python interface.
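
The physics-based idea can be illustrated with a toy single-spot forward model (a simplified stand-in, not the FENRIR formalism itself): one dark spot rotating on an inclined star produces correlated perturbations in photometry and radial velocity, which is the shared structure that a multi-channel covariance encodes. All parameter values below are arbitrary.

```python
import numpy as np

# Toy single-spot model (illustrative only, not the FENRIR model): one dark
# spot of fractional area `a` at latitude `lat` on a star of inclination
# `inc`, rotating with period P [days]. Both channels derive from the same
# geometry, which is why their stochastic variations are correlated.
def spot_signals(t, P=25.0, inc=np.radians(80.0), lat=np.radians(20.0),
                 a=1e-3, vsini=2000.0):
    phase = 2.0 * np.pi * t / P
    theta = np.pi / 2.0 - lat                 # colatitude of the spot
    # spot position in the rotating stellar frame (z' = rotation axis)
    xs = np.sin(theta) * np.cos(phase)
    ys = np.sin(theta) * np.sin(phase)
    zs = np.cos(theta) * np.ones_like(phase)
    # tilt the rotation axis by the inclination (rotation about the x-axis);
    # the observer looks down the +z axis
    x = xs
    z = ys * np.sin(inc) + zs * np.cos(inc)
    mu = np.clip(z, 0.0, None)                # foreshortening, zero when hidden
    dflux = -a * mu                           # relative photometric deficit
    drv = -a * mu * vsini * x                 # RV shift [m/s] from the dark patch
    return dflux, drv

t = np.linspace(0.0, 100.0, 2000)
dflux, drv = spot_signals(t)                  # correlated photometry and RV
```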

Read this paper on arXiv…

N. Hara and J. Delisle
Tue, 18 Apr 23
79/80

Comments: Submitted to Astronomy \& Astrophysics

Lossy Compression of Large-Scale Radio Interferometric Data [IMA]

http://arxiv.org/abs/2304.07050


This work proposes to reduce visibility data volume using a baseline-dependent lossy compression technique that preserves smearing at the edges of the field-of-view. We exploit the fact that a low-rank approximation can describe the raw visibility data as a sum of basic components, where each basic component corresponds to a specific Fourier component of the sky distribution. As such, the entire visibility data set is represented as a collection of per-baseline data matrices instead of a single tensor. The proposed methods are formulated as follows: given the entire visibility data set, the first algorithm, named $simple~SVD$, projects the data into a regular sampling space of rank-$r$ data matrices. In this space, the data for all baselines have the same rank, which makes the compression factor equal across all baselines. The second algorithm, named $BDSVD$, projects the data into an irregular sampling space of rank-$r_{pq}$ data matrices. The subscript $pq$ indicates that the rank of the data matrix varies across baselines $pq$, which makes the compression factor baseline-dependent. MeerKAT and the European Very Long Baseline Interferometry Network are used as reference telescopes to evaluate and compare the performance of the proposed methods against traditional methods, such as traditional averaging and baseline-dependent averaging (BDA). For the same spatial resolution threshold, both $simple~SVD$ and $BDSVD$ achieve compression factors two orders of magnitude higher than traditional averaging and BDA. At the same space-saving rate, there is no loss of spatial resolution, and the noise variance in the data is reduced, improving the S/N by over $1.5$ dB at the edges of the field-of-view.
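
As a minimal sketch of the rank-$r$ truncation underlying the $simple~SVD$ idea (the array shape, rank, and data below are illustrative assumptions, not the paper's pipeline):

```python
import numpy as np

# Rank-r compression of one baseline's visibility matrix (times x channels):
# keep only the leading r singular triplets and store the factors.
rng = np.random.default_rng(0)
V = rng.normal(size=(512, 64)) + 1j * rng.normal(size=(512, 64))  # stand-in data

r = 8                                    # truncation rank
U, s, Vh = np.linalg.svd(V, full_matrices=False)
V_r = (U[:, :r] * s[:r]) @ Vh[:r, :]     # rank-r approximation

# store only the factors instead of the full matrix
stored = U[:, :r].size + s[:r].size + Vh[:r, :].size
compression_factor = V.size / stored
rel_error = np.linalg.norm(V - V_r) / np.linalg.norm(V)
print(compression_factor, rel_error)
```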

Read this paper on arXiv…

M. Atemkeng, S. Perkins, E. Seck, et. al.
Mon, 17 Apr 23
19/51

Comments: N/A

Geometric Methods for Spherical Data, with Applications to Cosmology [CEA]

http://arxiv.org/abs/2303.15278


This survey is devoted to recent developments in the statistical analysis of spherical data, with a view to applications in Cosmology. We will start from a brief discussion of Cosmological questions and motivations, arguing that most Cosmological observables are spherical random fields. Then, we will introduce some mathematical background on spherical random fields, including spectral representations and the construction of needlet and wavelet frames. We will then focus on some specific issues, including tools and algorithms for map reconstruction (\textit{i.e.}, separating the different physical components which contribute to the observed field), geometric tools for testing the assumptions of Gaussianity and isotropy, and multiple testing methods to detect contamination in the field due to point sources. Although these tools are introduced in the Cosmological context, they can be applied to other situations dealing with spherical data. Finally, we will discuss more recent and challenging issues such as the analysis of polarization data, which can be viewed as realizations of random fields taking values in spin fiber bundles.

Read this paper on arXiv…

J. Duque and D. Marinucci
Tue, 28 Mar 23
56/81

Comments: 25 pages, 6 figures

Simulation-based inference of Bayesian hierarchical models while checking for model misspecification [CL]

http://arxiv.org/abs/2209.11057


This paper presents recent methodological advances to perform simulation-based inference (SBI) of a general class of Bayesian hierarchical models (BHMs) while checking for model misspecification. Our approach is based on a two-step framework. First, the latent function that appears as the second layer of the BHM is inferred and used to diagnose possible model misspecification. Second, target parameters of the trusted model are inferred via SBI. Simulations used in the first step are recycled for score compression, which is necessary for the second step. As a proof of concept, we apply our framework to a prey-predator model built upon the Lotka-Volterra equations and involving complex observational processes.
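
The score compression step can be sketched with a standard linear (MOPED-like) compression to one summary per target parameter; this is a hedged illustration of the general idea under assumed Gaussian noise, not necessarily the exact compression used in the paper or in pySELFI.

```python
import numpy as np

# Linear score compression: t = dmu^T C^{-1} (d - mu0), one summary per
# parameter. d: data vector, mu0: fiducial model mean, dmu_dtheta: gradient
# of the mean w.r.t. the parameters, C: (assumed fixed) data covariance.
def score_compress(d, mu0, dmu_dtheta, C):
    Cinv_r = np.linalg.solve(C, d - mu0)
    return dmu_dtheta.T @ Cinv_r          # shape: (n_params,)

# toy numbers (placeholders, not the Lotka-Volterra example of the paper)
rng = np.random.default_rng(1)
n, p = 50, 2
C = np.eye(n)
mu0 = np.zeros(n)
dmu = rng.normal(size=(n, p))             # finite-difference gradients in practice
d = mu0 + rng.normal(size=n)
print(score_compress(d, mu0, dmu, C))
```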

Read this paper on arXiv…

F. Leclercq
Fri, 23 Sep 22
24/70

Comments: 6 pages, 2 figures. Accepted for publication as proceedings of MaxEnt’22 (18-22 July 2022, IHP, Paris, France, this https URL). The pySELFI code is publicly available at this http URL and on GitHub (this https URL)

Sample variance of rounded variables [CL]

http://arxiv.org/abs/2102.08483


If the rounding errors are assumed to be distributed independently of the intrinsic distribution of the random variable, the sample variance $s^2$ of the rounded variable is given by the sum of the true variance $\sigma^2$ and the variance of the rounding errors (which is equal to $w^2/12$, where $w$ is the size of the rounding window). Here, the exact expressions for the sample variance of rounded variables are examined, and the conditions under which the simple approximation $s^2=\sigma^2+w^2/12$ can be considered valid are discussed. In particular, if the underlying distribution $f$ belongs to a family of symmetric normalizable distributions such that $f(x)=\sigma^{-1}F(u)$, where $u=(x-\mu)/\sigma$ and $\mu$ and $\sigma^2$ are the mean and variance of the distribution, then the rounded sample variance scales like $s^2-(\sigma^2+w^2/12)\sim\sigma\Phi'(\sigma)$ as $\sigma\to\infty$, where $\Phi(\tau)=\int_{-\infty}^\infty{\rm d}u\,e^{iu\tau}F(u)$ is the characteristic function of $F(u)$. It follows that, roughly speaking, the approximation is valid for a slowly varying symmetric underlying distribution with a variance sufficiently larger than the size of the rounding unit.
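
A quick numerical check of the $s^2\approx\sigma^2+w^2/12$ approximation for a normal variable (the values of $\sigma$ and $w$ below are arbitrary):

```python
import numpy as np

# Compare the sample variance of a rounded normal variable with the
# Sheppard-corrected expectation sigma^2 + w^2/12.
rng = np.random.default_rng(0)
sigma, w = 3.0, 1.0
x = rng.normal(0.0, sigma, size=1_000_000)
x_rounded = np.round(x / w) * w          # round to a window of size w

print(x_rounded.var(ddof=1))             # sample variance of the rounded variable
print(sigma**2 + w**2 / 12.0)            # approximate expectation
```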

Read this paper on arXiv…

J. An
Thu, 18 Feb 21
55/66

Comments: N/A

Maximum entropy priors with derived parameters in a specified distribution [CL]

http://arxiv.org/abs/1804.08143


We propose a method for transforming probability distributions so that parameters of interest are forced into a specified distribution. We prove that this approach is the maximum entropy choice, and provide a motivating example applicable to neutrino hierarchy inference.

Read this paper on arXiv…

W. Handley and M. Millea
Tue, 24 Apr 18
17/87

Comments: 7 pages, 2 figures, Submitted to Bayesian Analysis

Two- and Multi-dimensional Curve Fitting using Bayesian Inference [CL]

http://arxiv.org/abs/1802.05339


Fitting models to data using Bayesian inference is quite common, but when each point in parameter space gives a curve, fitting the curve to a data set requires new nuisance parameters, which specify the metric embedding the one-dimensional curve into the higher-dimensional space occupied by the data. A generic formalism for curve fitting in the context of Bayesian inference is developed which shows how the aforementioned metric arises. The result is a natural generalization of previous works, and is compared to oft-used frequentist approaches and similar Bayesian techniques.

Read this paper on arXiv…

A. Steiner
Fri, 16 Feb 18
21/42

Comments: N/A

Probabilistic treatment of the uncertainty from the finite size of weighted Monte Carlo data [CL]

http://arxiv.org/abs/1712.01293


The finite size of Monte Carlo samples carries intrinsic uncertainty that can lead to a substantial bias in parameter estimation if it is neglected and the sample size is small. We introduce a probabilistic treatment of this problem by replacing the usual likelihood functions with novel generalized probability distributions that incorporate the finite statistics via suitable marginalization. These new PDFs are analytic, and can be used to replace the Poisson, multinomial, and sample-based unbinned likelihoods, which covers many use cases in high-energy physics. In the limit of infinite statistics, they reduce to the respective standard probability distributions. In the general case of arbitrary Monte Carlo weights, the expressions involve the fourth Lauricella function $F_D$, for which we find a new representation as a contour integral that allows an exact and efficient calculation. The result also entails a new expression for the probability generating function of the Dirichlet-multinomial distribution with integer parameters. We demonstrate the bias reduction of our approach with a typical toy Monte Carlo problem, estimating the normalization of a peak in a falling energy spectrum, and compare the results with previously published methods from the literature.

Read this paper on arXiv…

T. Glusenkamp
Wed, 6 Dec 17
41/71

Comments: 31 pages, 16 figures

Fast generation of isotropic Gaussian random fields on the sphere [CL]

http://arxiv.org/abs/1709.10314


The efficient simulation of isotropic Gaussian random fields on the unit sphere is a task encountered frequently in numerical applications. A fast algorithm, based on Markov properties and one-dimensional Fast Fourier Transforms, is presented that generates samples on an $n \times n$ grid in $O(n^2 \log n)$ operations. Furthermore, an efficient method to set up the necessary conditional covariance matrices is derived, and simulations demonstrate the performance of the algorithm.
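
For comparison, the standard harmonic-synthesis route to such fields (not the Markov/FFT algorithm of the paper) is a one-liner with healpy, given an angular power spectrum; the spectrum below is an arbitrary example.

```python
import numpy as np
import healpy as hp

# Isotropic Gaussian random field on the sphere via spherical-harmonic
# synthesis (shown only as a baseline; the paper's algorithm differs).
nside = 256
ell = np.arange(3 * nside)
cl = 1.0 / (1.0 + ell)**2           # example angular power spectrum C_ell

field = hp.synfast(cl, nside)       # Gaussian realization on a HEALPix grid
print(field.size)                   # 12 * nside**2 pixels
```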

Read this paper on arXiv…

P. Creasey and A. Lang
Mon, 2 Oct 17
28/47

Comments: 13 pages, 3 figures

GLASS: A General Likelihood Approximate Solution Scheme [IMA]

http://arxiv.org/abs/1708.08479


We present a technique for constructing suitable posterior probability distributions in situations for which the sampling distribution of the data is not known. This is very useful for modern scientific data analysis in the era of “big data”, for which exact likelihoods are commonly either unknown, prohibitively expensive to compute, or inapplicable because of systematic effects in the data. The scheme involves implicitly computing the changes in an approximate sampling distribution as model parameters are changed, via explicitly computed moments of statistics constructed from the data.

Read this paper on arXiv…

S. Gratton
Wed, 30 Aug 17
43/67

Comments: 14 pages, 4 figures

An unbiased estimator for the ellipticity from image moments [CEA]

http://arxiv.org/abs/1705.01109


An unbiased estimator for the ellipticity of an object in a noisy image is given in terms of the image moments. Three assumptions are made: i) the pixel noise is normally distributed, although with arbitrary covariance matrix, ii) the image moments are taken about a fixed centre, and iii) the point-spread function is known. The relevant combinations of image moments are then jointly normal and their covariance matrix can be computed. A particular estimator for the ratio of the means of jointly normal variates is constructed and used to provide the unbiased estimator of the ellipticity. Furthermore, an unbiased estimate of the covariance of the new estimator is also given.
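
For context, the usual plug-in (noise-biased) ellipticity from second image moments about a fixed centre looks like the sketch below; the paper's contribution is an estimator of this quantity that is unbiased under pixel noise. The image and centre here are illustrative.

```python
import numpy as np

# Standard moment-based ellipticity (biased in the presence of noise);
# the unbiased estimator of the paper replaces the naive ratio below.
def ellipticity(img, xc, yc):
    y, x = np.indices(img.shape)
    dx, dy = x - xc, y - yc
    f = img.sum()
    qxx = (img * dx * dx).sum() / f
    qyy = (img * dy * dy).sum() / f
    qxy = (img * dx * dy).sum() / f
    denom = qxx + qyy
    return (qxx - qyy) / denom, 2.0 * qxy / denom   # (e1, e2)

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
img = np.exp(-0.5 * (((xx - 32) / 6.0)**2 + ((yy - 32) / 3.0)**2))
img += 0.01 * rng.normal(size=img.shape)            # pixel noise
print(ellipticity(img, 32.0, 32.0))
```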

Read this paper on arXiv…

N. Tessore
Thu, 4 May 17
49/54

Comments: 7 pages, comments welcome

A geometric approach to non-linear correlations with intrinsic scatter [IMA]

http://arxiv.org/abs/1704.05466


We propose a new mathematical model for $(n-k)$-dimensional non-linear correlations with intrinsic scatter in $n$-dimensional data. The model is based on Riemannian geometry, and is naturally invariant under coordinate transformations. We combine the model with a Bayesian approach for estimating the parameters of the correlation relation and the intrinsic scatter. The approach is symmetric, with no explicit division into dependent and independent variables, and supports censored and truncated datasets with independent, arbitrary errors. We also derive analytic likelihoods for the typical astrophysical use case of linear relations in $n$-dimensional Euclidean space. We pay particular attention to the case of linear regression in two dimensions, and compare our results to existing methods. Finally, we apply our methodology to the well-known $M_{BH}$-$\sigma$ correlation between the mass of a supermassive black hole in the centre of a galactic bulge and the corresponding bulge velocity dispersion. The main result of our analysis is that the most likely slope of this correlation is $\sim 6$ for the datasets used, rather than the values in the range $\sim 4\text{-}5$ typically quoted in the literature for these data.

Read this paper on arXiv…

P. Pihajoki
Thu, 20 Apr 17
27/49

Comments: 19 pages, 5 figures. Submitted to MNRAS. Comments welcome

A study of periodograms standardized using training data sets and application to exoplanet detection [CL]

http://arxiv.org/abs/1702.02049


When the noise affecting time series is colored with unknown statistics, a difficulty for sinusoid detection is to control the true significance level of the test outcome. This paper investigates the possibility of using training data sets of the noise to improve this control. Specifically, we analyze the performance of various detectors applied to periodograms standardized using training data sets. Emphasis is put on sparse detection in the Fourier domain and on the limitation posed by the necessarily finite size of the training sets available in practice. We study the resulting false alarm and detection rates and show that standardization leads in some cases to powerful constant false alarm rate tests. The study is both analytical and numerical. Although analytical results are derived in an asymptotic regime, numerical results show that theory accurately describes the tests’ behaviour for moderately large sample sizes. Throughout the paper, an application of the considered periodogram standardization is presented for exoplanet detection in radial velocity data.
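
A minimal sketch of the standardization step (not the specific detectors analyzed in the paper): divide the data periodogram by the average periodogram of $L$ training noise series. The sizes, signal, and frequency below are illustrative.

```python
import numpy as np

# Standardize a periodogram by the mean periodogram of L training noise
# series; a large standardized peak is then a detection candidate.
def periodogram(x):
    x = np.atleast_2d(x)
    return np.abs(np.fft.rfft(x, axis=-1))**2 / x.shape[-1]

rng = np.random.default_rng(0)
N, L = 1024, 50
training = rng.normal(size=(L, N))               # stand-in noise training set
p_bar = periodogram(training).mean(axis=0)       # averaged training periodogram

t = np.arange(N)
data = rng.normal(size=N) + 0.3 * np.sin(2 * np.pi * 0.05 * t)
p_std = periodogram(data)[0] / p_bar             # standardized periodogram
print(np.argmax(p_std), p_std.max())             # candidate bin and peak value
```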

Read this paper on arXiv…

S. Sulis, D. Mary and L. Bigot
Wed, 8 Feb 17
38/65

Comments: 14 pages, Accepted in IEEE Transactions on Signal Processing

Lognormal Distribution of Cosmic Voids in Simulations and Mocks [CEA]

http://arxiv.org/abs/1612.03180


Following up on previous studies, we here complete a full analysis of the void size distributions of the Cosmic Void Catalog (CVC) based on three different simulation and mock catalogs: dark matter, haloes, and galaxies. Based on this analysis, we attempt to answer two questions: Is a 3-parameter log-normal distribution a good candidate for the void size distributions obtained from different types of environments? Is there a direct relation between the shape parameters of the void size distribution and the environmental effects? In an attempt to answer these questions, we find that all void size distributions of these data samples are well described by the 3-parameter log-normal distribution, whether the environment is dominated by dark matter, haloes or galaxies. In addition, the shape parameters of the 3-parameter log-normal void size distribution seem highly affected by environment, particularly by existing substructures. We therefore derive, directly from the simulated data, two quantitative relations given by linear equations: one between the skewness of the void size distribution and the maximum tree depth, and one between its variance and the maximum tree depth. In addition to this, we find that the percentage of voids with nonzero central density in the data sets is of critical importance. If the fraction of voids with nonzero central densities reaches or exceeds 3.84 percent in a simulation/mock sample, then a second population is observed in the void size distributions. This second population emerges as a second peak in the log-normal void size distribution at larger radii.
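
A 3-parameter log-normal (shape, location, scale) can be fitted to a sample of void effective radii with scipy in a few lines; the synthetic radii below are only a placeholder for the CVC samples.

```python
import numpy as np
from scipy import stats

# Maximum-likelihood fit of a 3-parameter log-normal to void radii.
rng = np.random.default_rng(0)
radii = stats.lognorm.rvs(s=0.5, loc=2.0, scale=8.0, size=5000,
                          random_state=rng)      # stand-in for CVC void radii

shape, loc, scale = stats.lognorm.fit(radii)     # 3-parameter ML fit
print(shape, loc, scale)
```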

Read this paper on arXiv…

E. Russell and J. Pycke
Tue, 13 Dec 16
7/77

Comments: 11 pages, 6 figures, accepted to ApJ main journal

Clustering with phylogenetic tools in astrophysics [IMA]

http://arxiv.org/abs/1606.00235


Phylogenetic approaches are finding more and more applications outside the field of biology. Astrophysics is no exception, since an overwhelming amount of multivariate data has appeared in the last twenty years or so. In particular, the diversification of galaxies throughout the evolution of the Universe quite naturally invokes phylogenetic approaches. We have demonstrated that Maximum Parsimony brings useful astrophysical results, and we now proceed toward the analyses of large datasets for galaxies. In this talk I present how we solve the major difficulties for this goal: the choice of the parameters, their discretization, and the analysis of a high number of objects with an unsupervised NP-hard classification technique like cladistics. 1. Introduction How do galaxies form, and when? How did galaxies evolve and transform themselves to create the diversity we observe? What are the progenitors of present-day galaxies? To answer these big questions, observations throughout the Universe and physical modelling are obvious tools. But between these, there is a key process, without which it would be impossible to extract some digestible information from the complexity of these systems. This is classification. One century ago, galaxies were discovered by Hubble. From images obtained in the visible range of wavelengths, he synthesized his observations through the usual process: classification. With only one parameter (the shape), which is qualitative and determined by eye, he found four categories: ellipticals, spirals, barred spirals and irregulars. This is the famous Hubble classification. He later hypothesized relationships between these classes, building the Hubble Tuning Fork. The Hubble classification has been refined, notably by de Vaucouleurs, and is still used as the only global classification of galaxies. Even though the physical relationships proposed by Hubble are not retained any more, the Hubble Tuning Fork is nearly always used to represent the classification of the galaxy diversity under its new name, the Hubble sequence (e.g. Delgado-Serrano, 2012). Its success is impressive and can be understood by its simplicity, even its beauty, and by the many correlations found between the morphology of galaxies and their other properties. And one must admit that there is no alternative up to now, even though both the Hubble classification and diagram have been recognised to be unsatisfactory. Among the most obvious flaws of this classification, one must mention its monovariate, qualitative, subjective and old-fashioned nature, as well as the difficulty of characterising the morphology of distant galaxies. The first two significant multivariate studies were by Watanabe et al. (1985) and Whitmore (1984). Since the year 2005, the number of studies attempting to go beyond the Hubble classification has increased greatly. Why, despite this, are the Hubble classification and its sequence still alive, with no alternative having yet emerged (Sandage, 2005)? My feeling is that the results of the multivariate analyses are not easily integrated into a one-century-old practice of modeling the observations. In addition, extragalactic objects like galaxies, stellar clusters or stars do evolve. Astronomy now provides data on very distant objects, raising the question of the relationships between those and our present-day nearby galaxies. Clearly, this is a phylogenetic problem. Astrocladistics aims at exploring the use of phylogenetic tools in astrophysics (Fraix-Burnet et al., 2006a,b). 
We have proved that Maximum Parsimony (or cladistics) can be applied in astrophysics and provides a new exploration tool of the data (Fraix-Burnet et al., 2009, 2012, Cardone \& Fraix-Burnet, 2013). As far as the classification of galaxies is concerned, a larger number of objects must now be analysed. In this paper, I

Read this paper on arXiv…

D. Fraix-Burnet
Thu, 2 Jun 16
56/60

Comments: Proceedings of the 60th World Statistics Congress of the International Statistical Institute, ISI2015, Jul 2015, Rio de Janeiro, Brazil

Spectral Kurtosis Statistics of Transient Signals [IMA]

http://arxiv.org/abs/1603.01158


We obtain analytical approximations for the expectation and variance of the Spectral Kurtosis estimator in the case of Gaussian and coherent transient time domain signals mixed with a quasi-stationary Gaussian background, which are suitable for practical estimations of their signal-to-noise ratio and duty-cycle relative to the instrumental integration time. We validate these analytical approximations by means of numerical simulations and demonstrate that such estimates are affected by statistical uncertainties that, for a suitable choice of the integration time, may not exceed a few percent. Based on these analytical results, we suggest a multiscale Spectral Kurtosis spectrometer design optimized for real-time detection of transient signals, automatic discrimination based on their statistical signature, and measurement of their properties.
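
For reference, the Spectral Kurtosis estimator built from $M$ accumulated raw power spectra takes the form sketched below; for a purely Gaussian signal it scatters around 1 in each channel. The block sizes are illustrative, and the paper's contribution concerns its behaviour when transient signals are mixed into the Gaussian background.

```python
import numpy as np

# Spectral Kurtosis estimator from M accumulated raw FFT power spectra:
# SK = (M+1)/(M-1) * (M * S2 / S1**2 - 1), with S1, S2 the per-channel sums
# of the power and of its square. For Gaussian noise SK ~ 1 in each channel.
rng = np.random.default_rng(0)
M, nfft = 256, 1024
x = rng.normal(size=(M, nfft))                 # M blocks of time-domain data
P = np.abs(np.fft.rfft(x, axis=1))**2          # raw power spectra

S1 = P.sum(axis=0)
S2 = (P**2).sum(axis=0)
sk = (M + 1.0) / (M - 1.0) * (M * S2 / S1**2 - 1.0)
print(sk[1:-1].mean(), sk[1:-1].std())         # ~1, excluding DC/Nyquist bins
```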

Read this paper on arXiv…

G. Nita
Fri, 4 Mar 16
51/61

Comments: 12 pages, 8 figures, to appear in MNRAS

Using hydrodynamical simulations of stellar atmospheres for periodogram standardization : application to exoplanet detection [CL]

http://arxiv.org/abs/1601.07375


Our aim is to devise a detection method for exoplanet signatures (multiple sinusoids) that is both powerful and robust to partially unknown statistics under the null hypothesis. In the considered application, the noise is mostly created by the stellar atmosphere, with statistics depending on the complicated interplay of several parameters. Recent progress in hydrodynamic (HD) simulations shows, however, that realistic stellar noise realizations can be numerically produced off-line by astrophysicists. We propose a detection method that is calibrated by HD simulations and analyze its performance. A comparison of the theoretical results with simulations on synthetic and real data shows that the proposed method is powerful and robust.

Read this paper on arXiv…

S. Sulis, D. Mary and L. Bigot
Fri, 29 Jan 16
10/52

Comments: 5 pages, 3 figures. This manuscript was submitted and accepted to the 41st IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2016

Bayesian model comparison in cosmology [CEA]

http://arxiv.org/abs/1503.03414


The standard Bayesian model comparison formalism cannot be applied to most cosmological models, as they lack well-motivated parameter priors. However, if the data set being used is separable, then it is possible to use some of the data to obtain the necessary parameter distributions, the rest of the data being retained for model comparison. While such methods are not fully prescriptive, they provide a route to applying Bayesian model comparison in cosmological situations where it could not otherwise be used.

Read this paper on arXiv…

D. Mortlock
Thu, 12 Mar 15
24/57

Comments: 4 pages, 2 figures; to appear in Statistical Challenges in 21st Century Cosmology, Proceedings IAU Symposium No. 306, A. H. Heavens & J.-L. Starck, eds

Bayesian inference of cosmic density fields from non-linear, scale-dependent, and stochastic biased tracers [CEA]

http://arxiv.org/abs/1408.2566


We present a Bayesian reconstruction algorithm to generate unbiased samples of the underlying dark matter field from galaxy redshift data. Our new contribution consists of implementing a non-Poisson likelihood including a deterministic non-linear and scale-dependent bias. In particular, we present the Hamiltonian equations of motion for the negative binomial (NB) probability distribution function. This permits us to efficiently sample the posterior distribution function of density fields given a sample of galaxies using the Hamiltonian Monte Carlo technique implemented in the Argo code. We have tested our algorithm with the Bolshoi N-body simulation, inferring the underlying dark matter density field from a subsample of the halo catalogue. Our method shows that we can draw nearly unbiased samples (compatible within 1-$\sigma$) from the posterior distribution up to scales of about k~1 h/Mpc in terms of power spectra and cell-to-cell correlations. We find that a Poisson likelihood yields reconstructions with power spectra deviating by more than 10% at k=0.2 h/Mpc. Our reconstruction algorithm is especially suited for emission-line galaxy data, for which a complex non-linear stochastic biasing treatment beyond Poissonity becomes indispensable.
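
To make the likelihood choice concrete, a negative-binomial log-likelihood for cell counts given expected (biased) counts can be sketched as below; the parameterization, dispersion value, and toy counts are assumptions for illustration and may differ in detail from the Argo implementation.

```python
import numpy as np
from scipy import stats

# Negative-binomial vs Poisson log-likelihood for counts-in-cells, given an
# expected count lam per cell and a dispersion parameter (illustrative).
def nb_loglike(counts, lam, dispersion):
    p = dispersion / (dispersion + lam)            # scipy's (n, p) convention
    return stats.nbinom.logpmf(counts, dispersion, p).sum()

rng = np.random.default_rng(0)
lam = rng.gamma(2.0, 5.0, size=1000)               # stand-in expected counts
counts = rng.poisson(lam)                          # toy data
print(nb_loglike(counts, lam, dispersion=4.0))
print(stats.poisson.logpmf(counts, lam).sum())     # Poisson comparison
```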

Read this paper on arXiv…

M. Ata, F. Kitaura and V. Muller
Wed, 13 Aug 14
10/57

Comments: 8 pages, 4 figures

Fitting FFT-derived Spectra: Theory, Tool, and Application to Solar Radio Spike Decomposition [SSA]

http://arxiv.org/abs/1406.2280


Spectra derived from fast Fourier transform (FFT) analysis of time-domain data intrinsically contain statistical fluctuations whose distribution depends on the number of accumulated spectra contributing to a measurement. The tail of this distribution, which is essential for separating the true signal from the statistical fluctuations, deviates noticeably from the normal distribution for a finite number of accumulations. In this paper we develop a theory to properly account for the statistical fluctuations when fitting a model to a given accumulated spectrum. The method is implemented in software for the purpose of automatically fitting a large body of such FFT-derived spectra. We apply this tool to analyze a portion of a dense cluster of spikes recorded by our FST instrument during a record-breaking event that occurred on 06 Dec 2006. The outcome of this analysis is briefly discussed.
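
The non-normal statistics in question can be seen directly: for Gaussian noise, a power-spectrum bin accumulated over $N$ segments follows a gamma distribution with shape $N$, which only approaches a normal distribution for large $N$. The segment count and bin choice below are illustrative.

```python
import numpy as np
from scipy import stats

# Empirical distribution of an N-segment accumulated power-spectrum bin for
# Gaussian noise: a gamma distribution with shape N, not a normal one.
rng = np.random.default_rng(0)
N, nfft, ntrial = 4, 256, 5000
x = rng.normal(size=(ntrial, N, nfft))
P = np.abs(np.fft.rfft(x, axis=-1))**2 / nfft      # raw spectra
P_acc = P.mean(axis=1)                             # N-segment accumulations

bin_vals = P_acc[:, 10]                            # one interior frequency bin
shape, loc, scale = stats.gamma.fit(bin_vals, floc=0.0)
print(shape)                                       # close to N (= 4)
```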

Read this paper on arXiv…

G. Nita, G. Fleishman, D. Gary, et. al.
Tue, 10 Jun 14
32/60

Comments: Accepted to ApJ, 57 pages, 16 figures

Fast Direct Methods for Gaussian Processes and the Analysis of NASA Kepler Mission Data [CL]

http://arxiv.org/abs/1403.6015


A number of problems in probability and statistics can be addressed using the multivariate normal (or multivariate Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the $n$-dimensional setting, however, it requires the inversion of an $n \times n$ covariance matrix, $C$, as well as the evaluation of its determinant, $\det(C)$. In many cases, the covariance matrix is of the form $C = \sigma^2 I + K$, where $K$ is computed using a specified kernel, which depends on the data and additional parameters (called hyperparameters in Gaussian process computations). The matrix $C$ is typically dense, causing standard direct methods for inversion and determinant evaluation to require $\mathcal O(n^3)$ work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix $C$ can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an $\mathcal O (n\log^2 n) $ algorithm for inversion, as discussed in Ambikasaran and Darve, $2013$. More importantly, we show that this factorization enables the evaluation of the determinant $\det(C)$, permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions about the kernel defining $K$. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels, and apply it to a real data set obtained from the $Kepler$ Mission.
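
The dense computation being accelerated is the standard Gaussian-process log-likelihood via a Cholesky factorization of $C=\sigma^2 I + K$, which costs $\mathcal O(n^3)$; a minimal numpy/scipy sketch is below (kernel and data are arbitrary). The paper's hierarchical factorization, as implemented for instance in the george package's HODLR solver, replaces this step with an $\mathcal O(n\log^2 n)$ solve.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Dense GP log-likelihood: factorize C = K + sigma^2 I once, then obtain
# both C^{-1} y and log det(C) from the Cholesky factor.
rng = np.random.default_rng(0)
n = 2000
t = np.sort(rng.uniform(0.0, 100.0, n))
y = rng.normal(size=n)

sigma2, amp, ell = 0.1, 1.0, 5.0
K = amp * np.exp(-0.5 * ((t[:, None] - t[None, :]) / ell)**2)
C = K + sigma2 * np.eye(n)

L, lower = cho_factor(C, lower=True)               # the O(n^3) step
alpha = cho_solve((L, lower), y)
logdet = 2.0 * np.sum(np.log(np.diag(L)))
loglike = -0.5 * (y @ alpha + logdet + n * np.log(2.0 * np.pi))
print(loglike)
```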

Read this paper on arXiv…

S. Ambikasaran, D. Foreman-Mackey, L. Greengard, et. al.
Tue, 25 Mar 14
50/79

Interpreting the Distance Correlation COMBO-17 Results [CEA]

http://arxiv.org/abs/1402.3230


The accurate classification of galaxies in large-sample astrophysical databases of galaxy clusters depends sensitively on the ability to distinguish between morphological types, especially at higher redshifts. This capability can be enhanced through a new statistical measure of association and correlation, called the {\it distance correlation coefficient}, which is more powerful than the classical Pearson measure of linear relationships between two variables. The distance correlation measure offers a more precise alternative to the classical measure since it is capable of detecting nonlinear relationships that may appear in astrophysical applications. We showed recently that the comparison between the distance and Pearson correlation coefficients can be used effectively to isolate potential outliers in various galaxy datasets, and this comparison has the ability to confirm the level of accuracy associated with the data. In this work, we elucidate the advantages of distance correlation when applied to large databases. We illustrate how this distance correlation measure can be used effectively as a tool to confirm nonlinear relationships between various variables in the COMBO-17 database, including the lengths of the major and minor axes, and the alternative redshift distribution. For these outlier pairs, the distance correlation coefficient is routinely higher than the Pearson coefficient since it is easier to detect nonlinear relationships with distance correlation. The V-shaped scatterplots of Pearson versus distance correlation coefficients also reveal the patterns with increasing redshift and the contributions of different galaxy types within each redshift range.
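
The distance correlation itself is straightforward to compute from pairwise distance matrices; the sketch below contrasts it with the Pearson coefficient on a synthetic nonlinear relation (the data are placeholders, not COMBO-17 columns).

```python
import numpy as np

# Sample distance correlation via double-centred distance matrices.
def distance_correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x**2 + 0.1 * rng.normal(size=500)              # nonlinear dependence
print(distance_correlation(x, y), np.corrcoef(x, y)[0, 1])
```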

Read this paper on arXiv…

M. Richards, D. Richards and E. Martinez-Gomez
Fri, 14 Feb 14
1/42

Automated Classification of Periodic Variable Stars detected by the Wide-field Infrared Survey Explorer [IMA]

http://arxiv.org/abs/1402.0125


We describe a methodology to classify periodic variable stars identified in the Wide-field Infrared Survey Explorer (WISE) full-mission single-exposure Source Database. This will assist in the future construction of a WISE periodic-Variable Source Database that assigns variables to specific science classes as constrained by the WISE observing cadence with statistically meaningful classification probabilities. We have analyzed the WISE light curves of 8273 variable stars identified in previous optical variability surveys (MACHO, GCVS, and ASAS) and show that Fourier decomposition techniques can be extended into the mid-IR to assist with their classification. Combined with other periodic light-curve features, this sample is then used to train a machine-learned classifier based on the random forest (RF) method. Consistent with previous classification studies of variable stars in general, the RF machine-learned classifier is superior to other methods in terms of accuracy, robustness against outliers, and relative immunity to features that carry little or redundant class information. For the three most common classes identified by WISE: Algols, RR Lyrae, and W Ursae Majoris type variables, we obtain classification efficiencies of 80.7%, 82.7%, and 84.5% respectively using cross-validation analyses, with 95% confidence intervals of approximately +/-2%. These accuracies are achieved at purity (or reliability) levels of 88.5%, 96.2%, and 87.8% respectively, similar to that achieved in previous automated classification studies of periodic variable stars.
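
The classification step can be sketched with scikit-learn's random forest and cross-validation; the feature matrix and labels below are random placeholders rather than the WISE light-curve features (period, Fourier amplitudes and phases, etc.).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Random-forest classification of periodic variables from light-curve
# features, evaluated by cross-validation (placeholder data).
rng = np.random.default_rng(0)
X = rng.normal(size=(8273, 10))                   # stand-in feature table
y = rng.integers(0, 3, size=8273)                 # e.g. Algol / RR Lyr / W UMa

clf = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)         # cross-validated accuracy
print(scores.mean(), scores.std())
```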

Read this paper on arXiv…

Tue, 4 Feb 14
24/69