Audio Model Blocks

FiLM Module

Introduced by Chen et al. in WaveGrad: Estimating Gradients for Waveform Generation

The Feature-wise Linear Modulation (FiLM) module combines information from both the noisy waveform and the input mel-spectrogram. It is used in the WaveGrad model. The authors also encode the iteration index $n$, which indicates the noise level of the input waveform, with the Transformer sinusoidal positional embedding. To condition on the noise level directly, $n$ is replaced by $\sqrt{\bar{\alpha}}$ multiplied by a linear scale $C = 5000$, so the embedding is computed on $C\sqrt{\bar{\alpha}}$. Given these inputs, the FiLM module produces both scale and bias vectors, which a UBlock uses for a feature-wise affine transformation:
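The noise-level conditioning can be sketched as a sinusoidal embedding of $C\sqrt{\bar{\alpha}}$. This is a minimal sketch under stated assumptions, not the WaveGrad code: it assumes NumPy, the base-10000 frequency ladder of the original Transformer, and concatenated (rather than interleaved) sine and cosine halves. The function name `noise_level_embedding` and its default width of 128 are hypothetical.

```python
import numpy as np

def noise_level_embedding(sqrt_alpha_bar, dim=128, scale=5000.0):
    """Hypothetical sinusoidal embedding of the scaled noise level C * sqrt(alpha_bar).

    sqrt_alpha_bar: array of shape (batch,) with values in (0, 1].
    Returns an array of shape (batch, dim).
    """
    assert dim % 2 == 0, "dim must be even (half sine, half cosine)"
    # Linear scale C = 5000 stretches the continuous noise level so that it spans
    # a range comparable to the integer positions of a positional encoding.
    pos = scale * np.asarray(sqrt_alpha_bar, dtype=np.float64)[:, None]  # (batch, 1)
    half = dim // 2
    # Geometric frequency ladder; base 10000 follows the Transformer (assumption).
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)  # (half,)
    angles = pos * freqs[None, :]  # (batch, half)
    # Concatenate sine and cosine halves (a common variant of interleaving).
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
```

For example, `noise_level_embedding(np.array([0.1, 0.5, 1.0]), dim=8)` returns a `(3, 8)` array. Because the input is continuous, the model can be queried at noise levels it never saw as discrete iteration indices.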

$$ \gamma\left(D, \sqrt{\bar{\alpha}}\right) \odot U + \zeta\left(D, \sqrt{\bar{\alpha}}\right) $$

where $\gamma$ and $\zeta$ are the scale and shift vectors produced by the FiLM module, $D$ is the output of the corresponding DBlock, $U$ is an intermediate output in the UBlock, and $\odot$ denotes element-wise multiplication.
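The affine transformation above can be illustrated with a toy FiLM block. This sketch is hypothetical: the class `FiLM`, the function `film_modulate`, the LeakyReLU slope of 0.2, and the random weights are illustrative assumptions, and pointwise (1x1) projections stand in for the convolutional layers of WaveGrad's FiLM module. $D$ and $U$ are arrays of shape (channels, time) at the same temporal resolution, and the noise embedding is passed in as a precomputed vector.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Slope 0.2 is an assumption for illustration.
    return np.where(x > 0, x, slope * x)

class FiLM:
    """Hypothetical minimal FiLM block mapping (D, noise embedding) to (gamma, zeta).

    Pointwise (1x1) projections replace the convolutions of the real module;
    weights are random and serve only to show the data flow and shapes.
    """

    def __init__(self, in_channels, out_channels, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 1.0 / np.sqrt(in_channels), (out_channels, in_channels))
        so = 1.0 / np.sqrt(out_channels)
        self.w_scale = rng.normal(0.0, so, (out_channels, out_channels))
        self.w_shift = rng.normal(0.0, so, (out_channels, out_channels))

    def __call__(self, D, noise_emb):
        # D: (in_channels, time) from the DBlock.
        # noise_emb: (out_channels,) embedding of C * sqrt(alpha_bar).
        h = leaky_relu(self.w_in @ D)   # (out_channels, time)
        h = h + noise_emb[:, None]      # broadcast the noise embedding over time
        gamma = self.w_scale @ h        # feature-wise scale
        zeta = self.w_shift @ h         # feature-wise shift
        return gamma, zeta

def film_modulate(U, gamma, zeta):
    # Feature-wise affine transformation: gamma ⊙ U + zeta.
    return gamma * U + zeta
```

Usage: with `film = FiLM(in_channels=4, out_channels=6)`, a DBlock output `D` of shape `(4, T)` and a UBlock activation `U` of shape `(6, T)`, calling `film_modulate(U, *film(D, noise_emb))` returns a `(6, T)` array. Setting `gamma` to ones and `zeta` to zeros leaves `U` unchanged, which shows that FiLM generalizes the identity mapping.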

Source: WaveGrad: Estimating Gradients for Waveform Generation




Task Papers Share
Speech Synthesis 5 45.45%
Image Generation 2 18.18%
Denoising 2 18.18%
Text-To-Speech Synthesis 2 18.18%