29 Mar 2024 • Barbara Toniella Corradini, Mustafa Shukor, Paul Couairon, Guillaume Couairon, Franco Scarselli, Matthieu Cord
The pipeline is as follows: the image is passed to both a captioner model (i.e., BLIP) and a diffusion model (i.e., Stable Diffusion) to generate a text description and a visual representation, respectively.
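The fan-out described above, where one input image feeds two independent models, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The captioner and the diffusion model are passed in as plain callables, and the names `run_pipeline`, `PipelineOutput`, `dummy_captioner`, and `dummy_diffusion_encoder` are hypothetical. No real BLIP or Stable Diffusion API is called. The dummy stand-ins only make the data flow runnable.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PipelineOutput:
    caption: str   # text description produced by the captioner (BLIP in the paper)
    features: Any  # visual representation produced by the diffusion model (Stable Diffusion in the paper)


def run_pipeline(
    image: Any,
    captioner: Callable[[Any], str],
    diffusion_encoder: Callable[[Any], Any],
) -> PipelineOutput:
    """Send the same image to both models independently (hypothetical helper)."""
    caption = captioner(image)          # branch 1: image -> text
    features = diffusion_encoder(image)  # branch 2: image -> visual representation
    return PipelineOutput(caption=caption, features=features)


# Hypothetical stand-ins for the real models, used only to exercise the data flow.
def dummy_captioner(image):
    return "a photo with %d pixels" % len(image)


def dummy_diffusion_encoder(image):
    return [p / 255 for p in image]  # pretend "features": normalized pixel values


out = run_pipeline([0, 255, 128], dummy_captioner, dummy_diffusion_encoder)
print(out.caption)   # a photo with 3 pixels
print(out.features)
```

Injecting the two models as callables keeps the sketch independent of any particular library. In practice each callable would wrap a real captioning model and a diffusion model's feature extractor.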