no code implementations • Findings (EMNLP) 2021 • Fuwei Cui, Hui Di, Hongjie Ren, Kazushige Ouchi, Ze Liu, Jinan Xu
Generative conversation systems tend to produce meaningless and generic responses, which significantly degrade the user experience.
no code implementations • 19 Dec 2024 • Junjie Zhou, Zheng Liu, Ze Liu, Shitao Xiao, Yueze Wang, Bo Zhao, Chen Jason Zhang, Defu Lian, Yongping Xiong
Despite the rapidly growing demand for multimodal retrieval, progress in this field remains severely constrained by a lack of training data.
1 code implementation • 21 Aug 2024 • Ze Liu, Jin Zhang, Chao Feng, Defu Lian, Jie Wang, Enhong Chen
Although advancements in deep learning have significantly enhanced the recommendation accuracy of deep recommendation models, these methods still suffer from low recommendation efficiency.
1 code implementation • 20 Apr 2024 • Guohao Wang, Ting Liu, Hongqiang Lyu, Ze Liu
The result highlights the effectiveness of biological language models in capturing both the order (sequence) and functional meaning (semantics) within genomes.
Ranked #1 on Temporal Sequences on f5C Dataset
1 code implementation • 27 Oct 2023 • Houwen Peng, Kan Wu, Yixuan Wei, Guoshuai Zhao, Yuxiang Yang, Ze Liu, Yifan Xiong, Ziyue Yang, Bolin Ni, Jingcheng Hu, Ruihang Li, Miaosen Zhang, Chen Li, Jia Ning, Ruizhe Wang, Zheng Zhang, Shuguang Liu, Joe Chau, Han Hu, Peng Cheng
In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs).
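The FP8 E4M3 format keeps 4 exponent and 3 mantissa bits, with a largest finite value of 448, and training systems pair it with per-tensor scaling factors to cover a tensor's dynamic range. A minimal NumPy sketch of a quantize-dequantize round trip in that format (an illustration of the number format only, not the paper's training system; the function name is hypothetical):

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_dequantize_e4m3(x: np.ndarray):
    """Simulate per-tensor FP8 (E4M3) quantization with a dynamic scale."""
    scale = np.max(np.abs(x)) / E4M3_MAX     # map tensor range onto FP8 range
    scaled = x / scale
    # Round the mantissa to 3 explicit bits: with scaled = mant * 2**exp and
    # 0.5 <= |mant| < 1, keeping 4 mantissa bits means rounding mant to 1/16.
    mant, exp = np.frexp(scaled)
    mant = np.round(mant * 16) / 16
    q = np.clip(np.ldexp(mant, exp), -E4M3_MAX, E4M3_MAX)
    return q * scale, scale
```

Because 4 mantissa bits are kept, the relative rounding error is bounded by 1/16, which is why low-bit training needs careful scaling rather than raw casting.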
no code implementations • 8 Oct 2023 • Ze Liu
Given the growing computational complexity of these models and the scarcity of large, high-quality datasets, this research focuses on transfer learning, especially on few-shot, low-resource, and customized datasets.
1 code implementation • 8 Aug 2023 • Yichao Shen, Zigang Geng, Yuhui Yuan, Yutong Lin, Ze Liu, Chunyu Wang, Han Hu, Nanning Zheng, Baining Guo
We introduce a highly performant 3D object detector for point clouds using the DETR framework.
Ranked #2 on 3D Object Detection on ScanNetV2
1 code implementation • CVPR 2023 • Zigang Geng, Chunyu Wang, Yixuan Wei, Ze Liu, Houqiang Li, Han Hu
Human pose is typically represented by a coordinate vector of body joints or their heatmap embeddings.
Ranked #1 on Pose Estimation on MPII Human Pose
1 code implementation • ICCV 2023 • Yixuan Wei, Han Hu, Zhenda Xie, Ze Liu, Zheng Zhang, Yue Cao, Jianmin Bao, Dong Chen, Baining Guo
Experiments suggest that the feature map distillation approach significantly boosts the fine-tuning performance of CLIP models on several typical downstream vision tasks.
no code implementations • 3 Nov 2022 • Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao
In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition.
Ranked #3 on Action Recognition In Videos on Kinetics-400
2 code implementations • 7 Jun 2022 • Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, Han Hu, Zilong Wang, Rafael Salas, Jithin Jose, Prabhat Ram, Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Yongqiang Xiong
On efficiency, Flex accelerates SwinV2-MoE, achieving up to 1.55x and 2.11x speedups in training and inference over Fairseq, respectively.
22 code implementations • CVPR 2022 • Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo
Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained on low-resolution images to downstream tasks with high-resolution inputs; and 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.
Ranked #4 on Image Classification on ImageNet V2 (using extra training data)
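The cosine-attention change replaces dot-product logits, which can grow without bound as feature norms increase in large models, with cosine similarities in [-1, 1] divided by a learnable temperature. A minimal single-head NumPy sketch of that attention computation (no position bias or multi-head logic; `tau` stands in for the learnable temperature):

```python
import numpy as np

def cosine_attention(q, k, v, tau=0.1):
    """Scaled cosine attention, a SwinV2-style stabilizer: logits are bounded
    cosine similarities, so they cannot blow up as feature norms grow."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = qn @ kn.T / tau                         # cosine / temperature
    logits -= logits.max(axis=-1, keepdims=True)     # stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

Each output row is a convex combination of the value rows, exactly as with standard attention; only the logit computation changes.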
15 code implementations • CVPR 2022 • Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, Han Hu
The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.
Ranked #28 on Action Classification on Kinetics-600 (using extra training data)
4 code implementations • ICCV 2021 • Ze Liu, Zheng Zhang, Yue Cao, Han Hu, Xin Tong
Instead of grouping local points to each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of an attention mechanism in Transformers (Vaswani et al., 2017), where the contribution of each point is automatically learned during network training.
Ranked #3 on 3D Object Detection on SUN-RGBD
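In this scheme each object candidate acts as a query that attends to every point feature, so the "grouping" is learned by softmax weights rather than fixed by a hand-designed local radius. A minimal single-query, single-head NumPy sketch (the function name is illustrative; the actual model uses multi-head attention with positional encodings):

```python
import numpy as np

def object_feature_from_all_points(query, point_feats):
    """One object candidate's feature as a softmax-weighted sum over ALL
    point features: each point's contribution is learned, not hand-crafted."""
    logits = point_feats @ query / np.sqrt(query.shape[0])  # (N,) logits
    logits -= logits.max()                                  # stable softmax
    w = np.exp(logits)
    w /= w.sum()               # learned contribution of every point
    return w @ point_feats     # (C,) aggregated object feature
```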
76 code implementations • ICCV 2021 • Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo
This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision.
Ranked #2 on Image Classification on OmniBenchmark
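Swin's key ingredients are self-attention computed within non-overlapping local windows, giving linear complexity in image size, and a shifted-window scheme that connects neighboring windows between consecutive layers. A minimal NumPy sketch of the partitioning and cyclic shift (helper names are hypothetical; the attention computation itself is omitted):

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping ws x ws windows;
    Swin computes self-attention independently inside each window."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def cyclic_shift(x, ws):
    """Shift the map by ws//2 so the next layer's windows straddle the
    previous layer's window borders (cross-window connection)."""
    return np.roll(x, shift=(-(ws // 2), -(ws // 2)), axis=(0, 1))
```

Alternating plain and shifted window layers is what lets purely local attention still propagate information across the whole feature map.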
1 code implementation • 25 Jan 2021 • Qian Chen, Ze Liu, Yi Zhang, Keren Fu, Qijun Zhao, Hongwei Du
The proposed model, named RD3D, aims at pre-fusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote the full integration of RGB and depth streams.
no code implementations • ICCVW 2021 • Zhuliang Yao, Yue Cao, Yutong Lin, Ze Liu, Zheng Zhang, Han Hu
Transformer-based vision architectures have attracted great attention because of their strong performance over convolutional neural networks (CNNs).
1 code implementation • 4 Nov 2020 • Qian Chen, Keren Fu, Ze Liu, Geng Chen, Hongwei Du, Bensheng Qiu, Ling Shao
Finally, we propose an effective layer-wise aggregation module to fuse the features extracted from the enhanced depth maps and RGB images for the accurate detection of salient objects.
1 code implementation • ECCV 2020 • Ze Liu, Han Hu, Yue Cao, Zheng Zhang, Xin Tong
Our investigation reveals that despite the different designs of these operators, all of them make surprisingly similar contributions to network performance under the same network inputs and feature counts, and achieve state-of-the-art accuracy on standard benchmarks.
Ranked #4 on 3D Semantic Segmentation on PartNet