We use machine learning (ML) to populate large dark matter-only simulations with baryonic galaxies. Our ML framework takes as input halo properties including halo mass, environment, spin, and recent growth history, and outputs central galaxy and overall halo baryonic properties including stellar mass, star formation rate (SFR), metallicity, and neutral hydrogen mass. We apply this to the MUFASA cosmological hydrodynamic simulation, and show that it recovers the mean trends of output quantities with halo mass highly accurately, including following the sharp drop in SFR and gas in quenched massive galaxies. However, the scatter around the mean relations is under-predicted. Examining galaxies individually, at $z=0$ the stellar mass and metallicity are accurately recovered ($\sigma\lesssim 0.2$~dex), but SFR and HI show larger scatter ($\sigma\gtrsim 0.3$~dex); these values improve somewhat at $z=1,2$. Remarkably, ML quantitatively recovers second parameter trends in galaxy properties, e.g. that galaxies with higher gas content and lower metallicity have higher SFR at a given $M_*$. Testing various ML algorithms, we find that none performs significantly better than the others. Ensembling the algorithms does not fare better, likely because of correlations between the algorithms and the fact that none of them predicts the large observed scatter around the mean properties. For the random forest, we find that halo mass and nearby ($\sim 200$~kpc) environment are the most important predictive variables, followed by growth history, while halo spin and $\sim$Mpc-scale environment carry little predictive power. Finally we study the impact of additionally inputting key baryonic properties $M_*$, SFR, and $Z$, as would be available e.g. from an equilibrium model, and show that providing the SFR in particular enables HI to be recovered substantially more accurately.
S. Agarwal, R. Dave and B. Bassett
Tue, 12 Dec 17
Comments: 15 pages, 10 figures, 1 table, submitted to MNRAS
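The pipeline described in the abstract (halo properties in, baryonic properties out, with feature importances from a random forest) can be sketched in a few lines of scikit-learn. This is a minimal illustrative sketch, not the authors' actual code: the synthetic mock data, feature names, and coefficients below are assumptions chosen only to mimic the qualitative importance ranking reported above (halo mass and ~200 kpc environment dominant, spin unimportant).

```python
# Hedged sketch of an ML halo-to-galaxy mapping with a random forest.
# All input data here are synthetic mocks, NOT the MUFASA training set.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000

# Mock input halo properties (illustrative units).
log_mhalo = rng.uniform(10.0, 14.0, n)   # halo mass [log Msun]
env_200kpc = rng.normal(0.0, 1.0, n)     # local (~200 kpc) environment proxy
spin = rng.normal(0.035, 0.01, n)        # halo spin parameter
growth = rng.normal(0.0, 0.3, n)         # recent growth-history proxy

# Mock "truth" for one output (stellar mass), built so that halo mass
# and environment dominate, with intrinsic scatter on top.
log_mstar = (1.5 * log_mhalo - 10.0
             + 0.3 * env_200kpc
             + 0.1 * growth
             + rng.normal(0.0, 0.15, n))

X = np.column_stack([log_mhalo, env_200kpc, spin, growth])
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, log_mstar)

# Scatter of the recovered relation, analogous to the sigma (dex) quoted
# in the abstract, and the per-feature importance ranking.
sigma = float(np.std(rf.predict(X) - log_mstar))
importances = dict(zip(["Mhalo", "env", "spin", "growth"],
                       rf.feature_importances_))
```

In practice one would train on a hydrodynamic simulation's matched halo-galaxy catalogue and predict into a dark matter-only run; `rf.feature_importances_` is what yields the kind of variable ranking (mass, then environment, then growth history) discussed in the abstract.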