Separation of pulsar signals from noise with supervised machine learning algorithms [IMA]

We evaluate the performance of four different machine learning algorithms (ANN, Adaboost, GBC, XGBoost) in the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset consisting of pulsar candidates obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for cross-validation of the SPINN-based machine learning engine, which was used for the re-processing of the HTRU-S survey. We report a variety of quality metrics for all four algorithms. We apply a model-independent information theoretic approach to determine the features with the most predictive power, and compare the result with the feature importances reported by the machine learning algorithms, wherever possible. We find that the RMS distance between the folded profile and sub-integrations is the most important feature in Adaboost and XGBoost. In the case of GBC, we find the logarithm of the ratio of barycentric period and dispersion measure to be the most important feature. The information theoretic approach to feature importance yields a ranking that closely matches the one based on GBC. For all the aforementioned machine learning techniques, we report a recall of 100% with false positive rates of 0.15%, 0.077%, 0.1%, and 0.08% for ANN, Adaboost, GBC, and XGBoost respectively. Among the four algorithms, we find that Adaboost has the minimum overlap between the error rates as a function of threshold for detection of pulsars and RFI, and based on this criterion it can be considered the best.
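
To make the quoted metrics concrete, the following is a minimal sketch of how recall and false positive rate depend on the detection threshold applied to classifier scores. It is not the authors' code: the function name `recall_and_fpr`, the label convention (1 = pulsar, 0 = RFI or noise), and the toy labels and scores are all assumptions chosen purely for illustration.

```python
def recall_and_fpr(labels, scores, threshold):
    """Return (recall, false positive rate) at a given score threshold.

    labels: 1 for a pulsar, 0 for RFI/noise (assumed convention).
    scores: classifier outputs; a candidate is called a pulsar
            when its score is >= threshold.
    """
    tp = fn = fp = tn = 0
    for label, score in zip(labels, scores):
        predicted_pulsar = score >= threshold
        if label == 1 and predicted_pulsar:
            tp += 1  # pulsar correctly recovered
        elif label == 1:
            fn += 1  # pulsar missed
        elif predicted_pulsar:
            fp += 1  # non-pulsar flagged as pulsar
        else:
            tn += 1  # non-pulsar correctly rejected
    recall = tp / (tp + fn)  # fraction of true pulsars recovered
    fpr = fp / (fp + tn)     # fraction of non-pulsars wrongly flagged
    return recall, fpr


# Toy data, invented for illustration only (not from the paper).
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.2, 0.1, 0.05]

for threshold in (0.5, 0.8):
    recall, fpr = recall_and_fpr(labels, scores, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, FPR={fpr:.2f}")
```

Raising the threshold in this toy example lowers the false positive rate at the cost of recall, which is the trade-off behind comparing error rates as a function of threshold.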


S. Bethapudi and S. Desai
Tue, 18 Apr 17

Comments: 13 pages, 12 figures