Network pruning is commonly believed not only to reduce the computational cost of deep networks, but also to prevent overfitting by decreasing model capacity.
In our model, external knowledge is represented as sentence-level facts and graph-level facts, to suit the composite input of dialog history and image.
Visual dialogue is a challenging task that requires extracting implicit information from both visual (image) and textual (dialogue history) contexts.
The ability to generate detailed and non-repetitive responses is crucial for an agent to achieve human-like conversation.
In multi-person pose estimation, discriminating left joints from right joints is a persistently hard problem because the two look so similar.
More importantly, by visualizing the gate values we can tell which modality (visual or semantic) contributes more to answering the current question.
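To make the idea of reading modality contributions off a gate concrete, here is a minimal sketch. It assumes a single scalar sigmoid gate that mixes one visual and one semantic feature vector; the source does not specify the gate's form, so `gated_fusion`, `dominant_modality`, and the weight vectors are hypothetical names chosen for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, semantic, w_visual, w_semantic, bias=0.0):
    """Mix two equal-length feature vectors with one scalar gate (assumed form).

    A gate value near 1 favors the visual features; a value near 0
    favors the semantic features.
    """
    score = (
        sum(w * v for w, v in zip(w_visual, visual))
        + sum(w * s for w, s in zip(w_semantic, semantic))
        + bias
    )
    gate = sigmoid(score)
    fused = [gate * v + (1.0 - gate) * s for v, s in zip(visual, semantic)]
    return fused, gate


def dominant_modality(gate):
    """Report which modality the gate weights more heavily (ties go to visual)."""
    return "visual" if gate >= 0.5 else "semantic"
```

Inspecting `gate` for each question is the kind of visualization the sentence describes: a value well above 0.5 suggests the answer leaned on the image, and a value well below 0.5 suggests it leaned on the text.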
Ranked #6 on Visual Dialog on VisDial v0.9 val
On the constructed graph, we propose a Scene Graph Convolutional Network (SceneGCN) to jointly reason about object properties and relational semantics to reach the correct answer.
Recent advances in deep learning for both computer vision (CV) and natural language processing (NLP) provide a new way of understanding semantics, which lets us tackle more challenging tasks such as automatic description generation from natural images.
Semantic segmentation is one of the basic topics in computer vision; it aims to assign a semantic label to every pixel of an image.
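The per-pixel labeling described above can be sketched as an argmax over class scores. This is a minimal illustration, not any particular paper's method: it assumes some model has already produced one score map per class, and the function name `segment` and the `scores[class][row][col]` layout are hypothetical.

```python
def segment(scores):
    """Turn per-class score maps into a label map.

    scores[c][y][x] is the (assumed) score of class c at pixel (y, x).
    Each pixel receives the index of its highest-scoring class.
    """
    num_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [
            max(range(num_classes), key=lambda c: scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]
```

For a 2x2 image with two classes, `segment` returns a 2x2 grid of class indices, one per pixel.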
A dual-path neural network model is proposed for coupled feature learning in cross-modal information retrieval.
In this paper, we propose a data augmentation method using generative adversarial networks (GANs).
Under our learning policy, the Seq2Seq model can gradually learn mappings with noise.
In this paper, we explore a generative model for the task of generating unseen images with desired features.
Learning to generate colorful cartoon images from black-and-white sketches is not only an interesting research problem, but also has potential applications in digital entertainment.