We investigate various methods to encode positional information in transformer-based language models and propose a novel implementation named Rotary Position Embedding (RoPE).
Ranked #1 on Semantic Text Matching on CAIL2019-SCM - val
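The snippet above names RoPE without describing its mechanism. As the technique is commonly described, each consecutive pair of feature dimensions is rotated by an angle proportional to the token position, so the dot product of two rotated vectors depends only on their relative offset. The following is a minimal NumPy sketch under that assumption; the function name `rope`, the `base` default of 10000, and the even/odd pairing of dimensions are illustrative choices, not taken from the listing.

```python
import numpy as np

def rope(x, position, base=10000.0):
    """Rotate a 1-D feature vector x as if it sat at the given position.

    Hypothetical helper for illustration: dimensions (0,1), (2,3), ...
    are treated as 2-D pairs, and pair i is rotated by the angle
    position * base**(-2i/d).
    """
    d = x.shape[-1]
    assert d % 2 == 0, "feature dimension must be even"
    freqs = base ** (-np.arange(0, d, 2) / d)  # one frequency per pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x, dtype=float)
    out[0::2] = x_even * cos - x_odd * sin     # standard 2-D rotation
    out[1::2] = x_even * sin + x_odd * cos
    return out

# Usage: the attention score depends only on the position difference.
rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
score_a = rope(q, 3) @ rope(k, 7)     # offset 4
score_b = rope(q, 13) @ rope(k, 17)   # offset 4, shifted by 10
```

In this sketch `score_a` and `score_b` agree up to floating-point error, which is the relative-position property that motivates the rotary formulation.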
In contrast to previous approaches that either lack the ability to generalize to arbitrary identity or fail to preserve attributes like facial expression and gaze direction, our framework is capable of transferring the identity of an arbitrary source face into an arbitrary target face while preserving the attributes of the target face.
Ranked #1 on Face Swapping on FaceForensics++ (ID retrieval metric)
In this work, we propose to avoid manual annotation and generate a large-scale training dataset for video question answering making use of automatic cross-modal supervision.
Ranked #1 on Visual Question Answering on MSVD-QA (using extra training data)
We argue that immature data pipelines are preventing a large portion of industry practitioners from leveraging the latest research on recommender systems.
We present CLSRIL-23, a self-supervised audio pre-trained model that learns cross-lingual speech representations from raw audio across 23 Indic languages.
Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, and allow embedding the interface in IPython notebooks.
Recent outstanding results of supervised object detection in competitions and challenges are often associated with specific metrics and datasets.
The toolkit aims to help both developers and researchers in the whole process of designing segmentation models, training models, optimizing performance and inference speed, and deploying models.
ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view, thus capturing objects as well as image context at multiple scales.
Ranked #14 on Semantic Segmentation on PASCAL VOC 2012 val
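The idea of probing one feature layer at several sampling rates can be illustrated with a small 1-D sketch: the same kernel is applied in parallel with different dilation (atrous) rates, and each branch sees a wider effective field-of-view. This is a simplified NumPy illustration, not the authors' implementation; the names `atrous_conv1d`, `aspp_1d`, and `field_of_view`, the rates `(1, 2, 4)`, and the restriction to odd-length kernels are assumptions made for the example.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate):
    """Same-size 1-D atrous (dilated) cross-correlation with zero padding.

    Assumes an odd-length kernel so the padding is symmetric.
    """
    k = len(kernel)
    assert k % 2 == 1, "sketch assumes an odd kernel size"
    pad = (k - 1) * rate // 2
    padded = np.pad(signal, pad)
    out = np.zeros(len(signal))
    for i in range(len(signal)):
        for j in range(k):
            # Kernel taps are spaced `rate` samples apart.
            out[i] += kernel[j] * padded[i + j * rate]
    return out

def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several rates; one output row per rate."""
    return np.stack([atrous_conv1d(signal, kernel, r) for r in rates])

def field_of_view(kernel_size, rate):
    """Effective span covered by a dilated kernel."""
    return (kernel_size - 1) * rate + 1

# Usage: an impulse shows how far apart each branch samples.
impulse = np.zeros(9)
impulse[4] = 1.0
branches = aspp_1d(impulse, np.ones(3))   # shape (3, 9)
spans = [field_of_view(3, r) for r in (1, 2, 4)]   # [3, 5, 9]
```

A full ASPP module would typically use 2-D convolutions and fuse the branches afterwards; the sketch only shows how varying the rate changes the context each branch captures.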