no code implementations • 22 Mar 2024 • Zhichao Wei, Qingkun Su, Long Qin, Weizhi Wang
CLS embeddings serve two purposes: they augment the text embeddings, and, together with patch embeddings, they derive a small number of detail-rich subject embeddings; both are efficiently integrated into the diffusion model through a well-designed multimodal cross-attention mechanism.
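The cross-attention idea above can be sketched minimally: diffusion image features act as queries, and the keys/values are the concatenation of text embeddings and subject embeddings. This is an illustrative toy, not the paper's implementation; all names and shapes here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multimodal_cross_attention(image_feats, text_emb, subject_emb):
    """Toy sketch: image features (queries) attend jointly to text
    embeddings and subject embeddings (keys/values).
    Shapes: image_feats (n_q, d), text_emb (n_t, d), subject_emb (n_s, d)."""
    kv = np.concatenate([text_emb, subject_emb], axis=0)  # (n_t + n_s, d)
    scale = 1.0 / np.sqrt(image_feats.shape[-1])
    attn = softmax(image_feats @ kv.T * scale, axis=-1)   # (n_q, n_t + n_s)
    return attn @ kv                                      # (n_q, d)
```

A single attention map thus mixes both conditioning sources, which is one simple way text and subject information could be injected into each denoising step.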
no code implementations • 18 Mar 2024 • Zhenghao Zhang, Zuozhuo Dai, Long Qin, Weizhi Wang
Large-scale text-to-video models have shown remarkable abilities, but their direct application in video editing remains challenging due to limited available datasets.
no code implementations • 5 Mar 2024 • Weizhi Wang, Khalil Mrini, Linjie Yang, Sateesh Kumar, Yu Tian, Xifeng Yan, Heng Wang
Our MLM filter can generalize to different models and tasks, and be used as a drop-in replacement for CLIPScore.
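The "drop-in replacement" claim suggests a pluggable scoring interface: the filtering loop stays the same and only the scorer changes. A minimal sketch of that interface, with a stand-in scorer (the real MLM filter and CLIPScore are not reproduced here):

```python
def filter_pairs(pairs, score_fn, threshold):
    """Keep only image-text pairs whose quality score clears the threshold.
    `score_fn` is pluggable: CLIPScore, or an MLM-based quality scorer
    could be dropped in here (this scorer is a stand-in for illustration)."""
    return [pair for pair in pairs if score_fn(pair) >= threshold]

# Stand-in scorer: pretend longer captions score higher (illustration only).
toy_score = lambda pair: len(pair[1].split())
```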
1 code implementation • 21 Nov 2023 • Zuozhuo Dai, Zhenghao Zhang, Yao Yao, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang
Image animation is a key task in computer vision which aims to generate dynamic visual content from a static image.
no code implementations • 2 Nov 2023 • Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold
Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details.
no code implementations • NeurIPS 2023 • Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness.
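The decoupled-memory idea can be illustrated with a toy cache: past context vectors live outside the model and are retrieved by similarity, so the cache can be updated independently of model forward passes. The interface below is hypothetical, not the paper's API.

```python
import math

class DecoupledMemory:
    """Toy sketch of a decoupled long-term memory: past context vectors are
    cached outside the model and retrieved by cosine similarity, so the
    cache can grow or refresh without re-running the encoder."""

    def __init__(self):
        self._keys = []    # normalized key vectors
        self._values = []  # cached context payloads

    @staticmethod
    def _normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    def cache(self, key, value):
        # Store a normalized key alongside its long-term context value.
        self._keys.append(self._normalize(key))
        self._values.append(value)

    def retrieve(self, query, k=1):
        # Return the k cached values whose keys best match the query.
        q = self._normalize(query)
        sims = [sum(a * b for a, b in zip(q, kk)) for kk in self._keys]
        order = sorted(range(len(sims)), key=lambda i: -sims[i])
        return [self._values[i] for i in order[:k]]
```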
no code implementations • 7 Jun 2023 • Weizhi Wang, Hong Wang, Xifeng Yan
Therefore, to verify the order reasoning capability of current neural models on sequential tasks, we propose a challenging benchmark named STEPS.
1 code implementation • 10 May 2023 • Hong Wang, Xuan Luo, Weizhi Wang, Xifeng Yan
Large language models like ChatGPT have recently demonstrated impressive capabilities in natural language understanding and generation, enabling various applications including translation, essay writing, and chit-chatting.
1 code implementation • 23 May 2022 • Yichao Du, Weizhi Wang, Zhirui Zhang, Boxing Chen, Tong Xu, Jun Xie, Enhong Chen
End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters.
1 code implementation • 20 May 2022 • Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling by attending to both text context and visual knowledge in images.
1 code implementation • 21 Dec 2021 • Yichao Du, Zhirui Zhang, Weizhi Wang, Boxing Chen, Jun Xie, Tong Xu
In this paper, we attempt to model the joint probability of transcription and translation based on the speech input to directly leverage such triplet data.
Automatic Speech Recognition (ASR) +4
1 code implementation • Findings (EMNLP) 2021 • Weizhi Wang, Zhirui Zhang, Yichao Du, Boxing Chen, Jun Xie, Weihua Luo
However, it usually suffers from capturing spurious correlations between the output language and language-invariant semantics due to the maximum likelihood training objective, leading to poor transfer performance on zero-shot translation.
1 code implementation • 31 Aug 2021 • Weizhi Wang, Zhirui Zhang, Junliang Guo, Yinpei Dai, Boxing Chen, Weihua Luo
In this paper, we propose to formulate the task-oriented dialogue system as a purely natural language generation task, so as to fully leverage large-scale pre-trained models like GPT-2 and simplify complicated delexicalization preprocessing.
no code implementations • LREC 2014 • Guoyu Tang, Yunqing Xia, Weizhi Wang, Raymond Lau, Fang Zheng
We address the polysemy issue with a Bayesian model, and the synonymy issue by exploiting the Wikipedia redirections.
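The synonymy side of this approach can be sketched as redirect resolution: aliases are collapsed to one canonical title by following a Wikipedia-style redirect map. The function and data below are a toy illustration, not the paper's method.

```python
def resolve_redirects(term, redirects):
    """Follow Wikipedia-style redirects to a canonical title so that
    synonyms collapse to a single entry. `redirects` maps alias -> target;
    the `seen` set guards against redirect cycles."""
    seen = set()
    while term in redirects and term not in seen:
        seen.add(term)
        term = redirects[term]
    return term
```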