Joint learning of super-resolution (SR) and inverse tone-mapping (ITM) has been explored recently, to convert legacy low resolution (LR) standard dynamic range (SDR) videos to high resolution (HR) high dynamic range (HDR) videos for the growing need of UHD HDR TV/broadcasting applications. However, previous CNN-based methods directly reconstruct the HR HDR frames from LR SDR frames, and are only trained with a simple L2 loss. In this paper, we take a divide-and-conquer approach in designing a novel GAN-based joint SR-ITM network, called JSI-GAN, which is composed of three task-specific subnets: an image reconstruction subnet, a detail restoration (DR) subnet and a local contrast enhancement (LCE) subnet. We delicately design these subnets so that they are appropriately trained for the intended purpose, learning a pair of pixel-wise 1D separable filters via the DR subnet for detail restoration and a pixel-wise 2D local filter by the LCE subnet for contrast enhancement. Moreover, to train the JSI-GAN effectively, we propose a novel detail GAN loss alongside the conventional GAN loss, which helps enhancing both local details and contrasts to reconstruct high quality HR HDR results. When all subnets are jointly trained well, the predicted HR HDR results of higher quality are obtained with at least 0.41 dB gain in PSNR over those generated by the previous methods.