Bayesian model checking: A comparison of tests [IMA]

Two procedures for checking Bayesian models are compared using a simple test problem based on the local Hubble expansion. Over four orders of magnitude, p-values derived from a global goodness-of-fit criterion for posterior probability density functions (Lucy 2017) agree closely with posterior predictive p-values. The former can therefore serve as an effective proxy for the difficult-to-calculate posterior predictive p-values.
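As a toy illustration of the posterior predictive p-values the abstract refers to (a hedged sketch on a generic Gaussian problem, not the paper's Hubble-expansion test), one can draw parameters from the posterior, simulate replicate data sets, and compare a discrepancy statistic against its observed value:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (not the paper's problem): data drawn from N(mu, 1) with a
# flat prior on mu, so the posterior for mu is N(ybar, 1/n).
y_obs = rng.normal(0.3, 1.0, size=50)
n = y_obs.size
ybar = y_obs.mean()

def T(y):
    """Discrepancy statistic; here simply the sample variance."""
    return y.var()

# Posterior predictive p-value by Monte Carlo: draw mu from the posterior,
# simulate replicate data, and compare T(y_rep) to T(y_obs).
n_rep = 10_000
mu_draws = rng.normal(ybar, 1.0 / np.sqrt(n), size=n_rep)
t_obs = T(y_obs)
t_rep = np.array([T(rng.normal(mu, 1.0, size=n)) for mu in mu_draws])
p_ppp = (t_rep >= t_obs).mean()
print(f"posterior predictive p-value: {p_ppp:.3f}")
```

The expense the abstract alludes to comes from the replicate simulations; a cheap global goodness-of-fit proxy avoids that loop.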

Read this paper on arXiv…

L. Lucy
Thu, 21 Dec 17

Comments: 4 pages, 3 figures. Submitted to Astronomy & Astrophysics

A posteriori noise estimation in variable data sets [CL]

Most physical data sets contain a stochastic contribution produced by measurement noise or other random sources along with the signal. Usually, neither the signal nor the noise is accurately known prior to the measurement, so that both have to be estimated a posteriori. We have studied a procedure to estimate the standard deviation of the stochastic contribution assuming normality and independence; it requires a sufficiently well-sampled data set to yield reliable results. The procedure is based on estimating the standard deviation in a sample of weighted sums of arbitrarily sampled data points and is identical to the so-called DER_SNR algorithm for specific parameter settings. To demonstrate the applicability of our procedure, we present applications to synthetic data, high-resolution spectra, and a large sample of space-based light curves. Finally, we give guidelines for applying the procedure in situations not explicitly considered here, to promote its adoption in data analysis.
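The DER_SNR special case mentioned in the abstract is compact enough to sketch. The estimator below follows the published DER_SNR recipe (a median of second differences, scaled to be unbiased for i.i.d. Gaussian noise); the synthetic-data check is this digest's own toy example, not the paper's:

```python
import numpy as np

def der_snr_noise(flux):
    """Estimate the noise standard deviation of a well-sampled signal
    with the DER_SNR estimator: the median of |2*F_i - F_{i-2} - F_{i+2}|
    suppresses smooth signal components, and the prefactor converts the
    median absolute second difference to a Gaussian sigma."""
    f = np.asarray(flux, dtype=float)
    f = f[np.isfinite(f)]
    return 1.482602 / np.sqrt(6.0) * np.median(
        np.abs(2.0 * f[2:-2] - f[:-4] - f[4:]))

# Check on synthetic data: a smooth sinusoid plus Gaussian noise of
# known standard deviation 0.05.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 5000)
flux = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.05, x.size)
print(der_snr_noise(flux))  # close to the true sigma of 0.05
```

The median makes the estimate robust to a small fraction of outliers, which is why the same recipe transfers from spectra to light curves.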

Read this paper on arXiv…

S. Czesla, T. Molle and J. Schmitt
Thu, 7 Dec 17

Comments: Accepted for publication in A&A

Inference of signals with unknown correlation structure from non-linear measurements [CL]

We present a method to reconstruct auto-correlated signals together with their auto-correlation structure from non-linear, noisy measurements for arbitrary monotonic non-linearities. In the presented formulation the algorithm provides a significant speedup compared to prior implementations, allowing for a wider range of applications. The non-linearity can be used to model instrument characteristics or to enforce properties of the underlying signal, such as positivity. Uncertainties on any posterior quantities can be provided via independent samples from an approximate posterior distribution. We demonstrate the method's applicability via three examples, using different measurement instruments, non-linearities, and dimensionalities, for both simulated measurements and real data.
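To make the positivity point concrete, here is a hypothetical toy version of such a non-linear measurement model (a sketch, not the paper's algorithm): a latent Gaussian field is passed through the monotonic non-linearity exp(), so the signal is strictly positive by construction, and a MAP estimate is found by plain gradient descent:

```python
import numpy as np

rng = np.random.default_rng(2)

# Latent Gaussian field xi; the physical signal s = exp(xi) is positive
# by construction and is observed with additive Gaussian noise.
n = 64
xi_true = np.cumsum(rng.normal(0.0, 0.05, n))  # smooth latent field
s_true = np.exp(xi_true)
sigma = 0.1
d = s_true + rng.normal(0.0, sigma, n)

def objective(xi, lam=5.0):
    """Negative log-posterior: Gaussian likelihood in the measurement
    plus a smoothness prior on first differences of the latent field."""
    return (0.5 * np.sum((d - np.exp(xi))**2) / sigma**2
            + 0.5 * lam * np.sum(np.diff(xi)**2))

# MAP estimate by gradient descent on the latent field.
lam, eta = 5.0, 5e-4
xi = np.zeros(n)
for _ in range(5000):
    s = np.exp(xi)
    grad = -(d - s) * s / sigma**2   # chain rule through exp()
    dxi = np.diff(xi)
    grad[1:] += lam * dxi            # gradient of the smoothness prior
    grad[:-1] -= lam * dxi
    xi -= eta * grad
s_map = np.exp(xi)                   # reconstruction, positive everywhere
```

The paper's method additionally infers the auto-correlation structure and provides approximate posterior samples; the sketch above fixes the prior by hand and returns only a point estimate.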

Read this paper on arXiv…

J. Knollmuller, T. Steininger and T. Ensslin
Thu, 9 Nov 17

Comments: N/A

On the use of the Edgeworth expansion in cosmology I: how to foresee and evade its pitfalls [CEA]

Non-linear gravitational collapse introduces non-Gaussian statistics into the matter fields of the late Universe. As the large-scale structure is the target of current and future observational campaigns, one would ideally like to have the full probability density function of these non-Gaussian fields. The only viable way we see to achieve this analytically, at least approximately and in the near future, is via the Edgeworth expansion. We hence rederive this expansion for Fourier modes of non-Gaussian fields and then continue by putting it into a wider statistical context than previously done. We show that in its original form, the Edgeworth expansion only works if the non-Gaussian signal is averaged away. This is counterproductive, since we target the parameter-dependent non-Gaussianities as a signal of interest. We hence alter the analysis at the decisive step and now provide a roadmap towards a controlled and unadulterated analysis of non-Gaussianities in structure formation (with the Edgeworth expansion). Our central result is that, although the Edgeworth expansion has pathological properties, these can be predicted and avoided in a careful manner. We also show that, despite the non-Gaussianity coupling all modes, the Edgeworth series may be applied to any desired subset of modes, since this is equivalent (to the level of the approximation) to marginalising over the excluded modes. In this first paper of a series, we restrict ourselves to the sampling properties of the Edgeworth expansion, i.e. how faithfully it reproduces the distribution of non-Gaussian data. A follow-up paper will detail its Bayesian use, when parameters are to be inferred.
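The pathology the abstract mentions is easy to exhibit at first order. The truncated Edgeworth series for a standardized variable with third cumulant kappa3 is phi(x) * (1 + kappa3/6 * He3(x)) with He3(x) = x^3 - 3x; it always integrates to one, but it is not a valid density for strong skewness, dipping below zero in the tails (a generic textbook illustration, not the paper's cosmological application):

```python
import numpy as np

def edgeworth_pdf(x, kappa3):
    """First-order Edgeworth approximation to the pdf of a standardized
    variable with third cumulant kappa3."""
    phi = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    he3 = x**3 - 3.0 * x
    return phi * (1.0 + kappa3 / 6.0 * he3)

x = np.linspace(-6.0, 6.0, 2001)
dx = x[1] - x[0]
for kappa3 in (0.1, 2.0):
    p = edgeworth_pdf(x, kappa3)
    # Trapezoidal normalization check: He3 integrates to zero against
    # phi, so the truncated series integrates to ~1 regardless of kappa3;
    # its minimum, however, turns visibly negative for large skewness.
    norm = np.sum(0.5 * (p[1:] + p[:-1])) * dx
    print(f"kappa3={kappa3}: integral={norm:.4f}, min p={p.min():.2e}")
```

Predicting where such negative excursions occur, and keeping the expansion away from them, is essentially the "foresee and evade" program of the title.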

Read this paper on arXiv…

E. Sellentin, A. Jaffe and A. Heavens
Tue, 12 Sep 17

Comments: 25 pages, 7 figures

Towards information optimal simulation of partial differential equations [CL]

Most simulation schemes for partial differential equations (PDEs) focus on minimizing a simple error norm of a discretized version of a field. This paper takes a fundamentally different approach; the discretized field is interpreted as data providing information about a real physical field that is unknown. This information is sought to be conserved by the scheme as the field evolves in time. Such an information theoretic approach to simulation was pursued before by information field dynamics (IFD). In this paper we work out the theory of IFD for nonlinear PDEs in a noiseless Gaussian approximation. The result is an action that can be minimized to obtain an informationally optimal simulation scheme. It can be brought into a closed form using field operators to calculate the appearing Gaussian integrals. The resulting simulation schemes are tested numerically in two instances for the Burgers equation. Their accuracy surpasses finite-difference schemes on the same resolution. The IFD scheme, however, has to be correctly informed on the subgrid correlation structure. In certain limiting cases we recover well-known simulation schemes like spectral Fourier Galerkin methods. We discuss implications of the approximations made.
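The abstract benchmarks IFD against finite differences and recovers spectral Fourier Galerkin methods as a limiting case. A minimal illustration of why spectral representations carry more information per grid point (this digest's own toy comparison, not the paper's Burgers experiments) is to differentiate a band-limited field on a coarse periodic grid both ways:

```python
import numpy as np

n = 16
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
f = np.sin(x)
exact = np.cos(x)

# Second-order central finite difference on the periodic grid.
h = x[1] - x[0]
fd = (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * h)

# Spectral derivative: multiply Fourier modes by i*k.
k = np.fft.fftfreq(n, d=h) * 2.0 * np.pi
spec = np.real(np.fft.ifft(1j * k * np.fft.fft(f)))

print(np.max(np.abs(fd - exact)))    # O(h^2) truncation error
print(np.max(np.abs(spec - exact)))  # machine precision for band-limited f
```

For non-band-limited fields the spectral advantage depends on the subgrid correlation structure, which is exactly the prior information the IFD scheme has to be "correctly informed" about.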

Read this paper on arXiv…

R. Leike and T. Ensslin
Tue, 12 Sep 17

Comments: N/A

Field dynamics inference via spectral density estimation [CL]

Stochastic differential equations (SDEs) are of utmost importance in various scientific and industrial areas. They are the natural description of dynamical processes whose precise equations of motion are either not known or too expensive to solve, e.g., when modeling Brownian motion. In some cases, the equations governing the dynamics of a physical system on macroscopic scales happen to be unknown, since they typically cannot be deduced from general principles. In this work, we describe how the underlying laws of a stochastic process can be approximated by the spectral density of the corresponding process. Furthermore, we show how this density can be inferred from possibly very noisy and incomplete measurements of the dynamical field. Generally, inverse problems like these can be tackled with the help of Information Field Theory (IFT). For now, we restrict ourselves to linear and autonomous processes; this is not a conceptual limitation and may be lifted in future work. To demonstrate its applicability, we apply our reconstruction algorithm to a time series and to spatio-temporal processes.
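The core idea, that a linear autonomous process leaves its dynamical law imprinted on the spectral density, can be illustrated without any IFT machinery (a hedged sketch, not the paper's reconstruction algorithm). For a discrete AR(1) process x_t = a*x_{t-1} + eps_t with unit-variance innovations, the theoretical spectrum is 1/|1 - a*exp(-iw)|^2, and averaged periodograms of realizations recover it:

```python
import numpy as np

rng = np.random.default_rng(3)

a, n, n_real = 0.8, 1024, 200
spectra = []
for _ in range(n_real):
    eps = rng.normal(0.0, 1.0, n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a * x[t - 1] + eps[t]   # AR(1) dynamics
    xf = np.fft.rfft(x)
    spectra.append(np.abs(xf)**2 / n)  # periodogram of one realization
p_est = np.mean(spectra, axis=0)

# Theoretical spectral density of the AR(1) process on the same grid.
w = 2.0 * np.pi * np.fft.rfftfreq(n)
p_true = 1.0 / np.abs(1.0 - a * np.exp(-1j * w))**2
```

Reading off the coefficient a from the estimated spectrum amounts to identifying the governing law; the paper's contribution is doing this from noisy, incomplete data with principled uncertainties.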

Read this paper on arXiv…

P. Frank, T. Steininger and T. Ensslin
Fri, 18 Aug 17

Comments: 12 pages, 9 figures

Massive data compression for parameter-dependent covariance matrices [CEA]

We show how the massive data compression algorithm MOPED can be used to reduce, by orders of magnitude, the number of simulated data sets needed to estimate the covariance matrix required for the analysis of Gaussian-distributed data. This is relevant when the covariance matrix cannot be calculated directly. The compression is especially valuable when the covariance matrix varies with the model parameters. In this case, it may be prohibitively expensive to run enough simulations to estimate the full covariance matrix throughout the parameter space. This compression may be particularly valuable for the next generation of weak-lensing surveys, such as those proposed for Euclid and LSST, for which the number of summary data (such as band power or shear correlation estimates) is very large, $\sim 10^4$, due to the large number of tomographic redshift bins that the data will be divided into. In the pessimistic case where the covariance matrix is estimated separately for all points in an MCMC analysis, this may require an unfeasible $10^9$ simulations. We show here that MOPED can reduce this number by a factor of 1000, or a factor of $\sim 10^6$ if some regularity in the covariance matrix is assumed, reducing the number of simulations required to a manageable $10^3$, making an otherwise intractable analysis feasible.
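The single-parameter version of MOPED is simple to sketch (a toy with a known, parameter-independent covariance and a hypothetical mean model, not the paper's weak-lensing pipeline): the weight vector b = C^{-1} dmu/dtheta compresses each length-n data vector to a single number y = b . x while preserving the Fisher information about theta:

```python
import numpy as np

# Toy model: data x ~ N(mu(theta), C) with fixed diagonal covariance.
n = 100
t = np.linspace(0.0, 1.0, n)

def mu(theta):
    return theta * np.sin(2.0 * np.pi * t)  # hypothetical mean model

C = np.diag(np.full(n, 0.2**2))             # known, parameter-independent
Cinv = np.linalg.inv(C)
dmu = np.sin(2.0 * np.pi * t)               # dmu/dtheta
b = Cinv @ dmu                              # MOPED weight vector

# Fisher information before and after compression.
F_full = dmu @ Cinv @ dmu                   # full n-dimensional data
y_var = b @ C @ b                           # variance of y = b . x
F_comp = (b @ dmu)**2 / y_var               # one compressed number
print(F_full, F_comp)  # identical: the compression is lossless here
```

With p parameters, Gram-Schmidt orthogonalization of the weight vectors yields p numbers instead of $10^4$, which is the source of the factor-of-1000 reduction in the number of covariance simulations quoted above.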

Read this paper on arXiv…

A. Heavens, E. Sellentin, D. Mijolla, et al.
Fri, 21 Jul 17

Comments: 7 pages. For submission to MNRAS