Painting galaxies into dark matter halos using machine learning

We develop a machine learning (ML) framework to populate large dark matter-only simulations with baryonic galaxies. The framework takes halo properties as input, including halo mass, environment, spin, and recent growth history, and outputs central galaxy and halo baryonic properties, including stellar mass ($M_*$), star formation rate (SFR), metallicity ($Z$), and neutral ($\rm HI$) and molecular ($\rm H_2$) hydrogen masses. We apply this to the MUFASA cosmological hydrodynamic simulation and show that it accurately recovers the mean trends of the output quantities with halo mass, including the sharp drop in SFR and gas in quenched massive galaxies. However, the scatter around the mean relations is under-predicted. Examining galaxies individually, at $z=0$ the stellar mass and metallicity are accurately recovered ($\sigma\lesssim 0.2$~dex), but SFR and $\rm HI$ show larger scatter ($\sigma\gtrsim 0.3$~dex); these values improve somewhat at $z=1,2$. Remarkably, ML quantitatively recovers second-parameter trends in galaxy properties, e.g. that galaxies with higher gas content and lower metallicity have higher SFR at a given $M_*$. Testing various ML algorithms, we find that none performs significantly better than the others, nor does ensembling improve performance, likely because none of the algorithms reproduces the large observed scatter around the mean properties. For the random forest algorithm, we find that halo mass and nearby ($\sim 200$~kpc) environment are the most important predictive variables, followed by growth history, while halo spin and $\sim$Mpc-scale environment are not important. Finally, we study the impact of additionally inputting the key baryonic properties $M_*$, SFR and $Z$, as would be available e.g. from an equilibrium model, and show that providing the SFR in particular enables $\rm HI$ to be recovered substantially more accurately.
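To make the quoted per-galaxy accuracies concrete, the sketch below shows one common way a scatter in dex can be computed: as the standard deviation of the logarithmic residuals between predicted and true values. The abstract does not state the exact definition of $\sigma$ used, so this definition is an assumption, and the function name `scatter_dex` and the stellar masses are hypothetical, illustrative values rather than MUFASA data.

```python
import math
import statistics


def scatter_dex(true_values, predicted_values):
    """Return (mean offset, standard deviation) of log10(predicted / true), in dex.

    Assumes sigma is defined as the population standard deviation of the
    log residuals; the paper may use a different estimator.
    """
    residuals = [
        math.log10(pred) - math.log10(true)
        for true, pred in zip(true_values, predicted_values)
    ]
    return statistics.fmean(residuals), statistics.pstdev(residuals)


# Hypothetical stellar masses in solar masses (made up for illustration).
true_mstar = [1.0e9, 1.0e10, 1.0e11, 1.0e10]
pred_mstar = [2.0e9, 0.5e10, 1.0e11, 1.0e10]

bias, sigma = scatter_dex(true_mstar, pred_mstar)
print(f"mean offset = {bias:.3f} dex, scatter = {sigma:.3f} dex")
```

Here two of the four galaxies are off by a factor of two in opposite directions, so the mean offset vanishes while the scatter comes out at roughly 0.21 dex. A scatter of 0.2 dex therefore corresponds to typical errors of about a factor of 1.6 in the predicted quantity.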