1 code implementation • Sensors 2021 • Ha Thi Phuong Thao, B T Balamurali, Gemma Roig, Dorien Herremans
The models that use all visual, audio, and text features simultaneously as their inputs performed better than those using features extracted from each modality separately.