1 code implementation • 27 Nov 2024 • Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, Lei Zhang
From the data perspective, we build a fully automated data engine and construct the Rexverse-2M dataset, which covers multiple granularities to support the joint training of perception and understanding.
1 code implementation • 21 Nov 2024 • Tianhe Ren, Yihao Chen, Qing Jiang, Zhaoyang Zeng, Yuda Xiong, Wenlong Liu, Zhengyu Ma, Junyi Shen, Yuan Gao, Xiaoke Jiang, Xingyu Chen, Zhuheng Song, Yuhong Zhang, Hongjie Huang, Han Gao, Shilong Liu, Hao Zhang, Feng Li, Kent Yu, Lei Zhang
DINO-X employs the same Transformer-based encoder-decoder architecture as Grounding DINO 1.5 to pursue an object-level representation for open-world object understanding.
1 code implementation • 14 Oct 2024 • Yuqi Li, Yao Lu, Zeyu Dong, Chuanguang Yang, Yihao Chen, Jianping Gou
Based on the similarity matrix derived from CKA, we employ Fisher Optimal Segmentation to partition the network into multiple segments, which provides a basis for removing layers in a segment-wise manner.
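As a rough illustration of the first step in the excerpt above, the sketch below builds a layer-by-layer similarity matrix with linear CKA. It assumes the linear variant of CKA and that each layer's activations are available as a (samples x features) NumPy array; the function names `linear_cka` and `cka_matrix` are hypothetical and not taken from the paper's code. The Fisher Optimal Segmentation step is not shown.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (samples, features)."""
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_matrix(layer_outputs: list) -> np.ndarray:
    """Symmetric matrix of pairwise linear CKA scores (hypothetical helper)."""
    n = len(layer_outputs)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            score = linear_cka(layer_outputs[i], layer_outputs[j])
            sim[i, j] = sim[j, i] = score
    return sim
```

Runs of adjacent layers with high mutual similarity in such a matrix are the natural candidates for being grouped into one segment.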
no code implementations • 13 Sep 2024 • Yihao Chen, Yue Yang
This swimmer is a flexible object equipped with a muscle model based on torsional springs.
no code implementations • 3 Sep 2024 • Yihao Chen, Haochen Wu, Nan Jiang, Xiang Xia, Qing Gu, Yunqi Hao, Pengfei Cai, Yu Guan, Jialong Wang, Weilin Xie, Lei Fang, Sian Fang, Yan Song, Wu Guo, Lin Liu, Minqiang Xu
In Track 2, we continued using the CM system from Track 1 and fused it with a CNN-based ASV system.
no code implementations • 14 Aug 2024 • Yihao Chen, Zefang Wang
Channel pruning is a promising method for accelerating and compressing convolutional neural networks.
2 code implementations • 16 May 2024 • Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang
Empirical results demonstrate the effectiveness of Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining a 54.3 AP on the COCO detection benchmark and a 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection.
Ranked #1 on Zero-Shot Object Detection on MSCOCO (using extra training data)
1 code implementation • 25 Feb 2024 • Zhimin Zhao, Yihao Chen, Abdul Ali Bangash, Bram Adams, Ahmed E. Hassan
In machine learning (ML), efficient asset management, including ML models, datasets, algorithms, and tools, is vital for resource optimization, consistent performance, and a streamlined development lifecycle.
1 code implementation • 17 Jan 2024 • Liye Chen, Biaoshun Li, Yihao Chen, Mujie Lin, Shipeng Zhang, Chenxin Li, Yu Pang, Ling Wang
Antibody-drug conjugates (ADCs) have revolutionized the field of cancer treatment in the era of precision medicine due to their ability to precisely target cancer cells and release highly effective drugs.
2 code implementations • 21 Dec 2023 • Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen
Extensive experiments on various zero-shot transfer tasks demonstrate the significantly advantageous performance of our TinySAM against counterpart methods.
no code implementations • 21 Nov 2023 • Shufa Wei, Xiaolong Xu, Xianbiao Qi, Xi Yin, Jun Xia, Jingyi Ren, Peijun Tang, Yuxiang Zhong, Yihao Chen, Xiaoqin Ren, Yuxin Liang, Liankai Huang, Kai Xie, Weikang Gui, Wei Tan, Shuanglong Sun, Yongquan Hu, Qinxian Liu, Nanjin Li, Chihao Dai, Lihua Wang, Xiaohui Liu, Lei Zhang, Yutao Xie
Our training corpus mainly consists of academic papers, theses, content from some academic domains, high-quality Chinese data, and others.
1 code implementation • 5 May 2023 • Zeyan Li, Junjie Chen, Yihao Chen, Chengyang Luo, Yiwei Zhao, Yongqian Sun, Kaixin Sui, Xiping Wang, Dapeng Liu, Xing Jin, Qi Wang, Dan Pei
Such attribute combinations are substantial clues to the underlying root causes and thus are called root causes of multidimensional data.
1 code implementation • 19 Apr 2023 • Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang
In contrast to previous practical tricks that address training instability by learning rate warmup, layer normalization, attention formulation, and weight initialization, we show that Lipschitz continuity is a more essential property to ensure training stability.
1 code implementation • CVPR 2023 • Yihao Chen, Xianbiao Qi, Jianan Wang, Lei Zhang
In this way, we can reduce the GPU memory consumption of contrastive loss computation from $\mathcal{O}(B^2)$ to $\mathcal{O}(\frac{B^2}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training, respectively.
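To make the memory reduction quoted above concrete, here is a small arithmetic sketch. It assumes that the dominant cost is the B x B similarity matrix and that each of the N GPUs keeps only the B/N rows for its local samples; that row-wise sharding is an assumption made for illustration, and `similarity_entries` is a hypothetical helper, not part of any released code.

```python
def similarity_entries(batch_size: int, num_gpus: int) -> tuple:
    """Return (full, sharded) counts of similarity-matrix entries held per GPU.

    full:    every GPU materializes the whole B x B matrix.
    sharded: every GPU keeps only its own B/N rows (assumed sharding scheme).
    """
    if batch_size % num_gpus != 0:
        raise ValueError("batch_size must be divisible by num_gpus")
    full = batch_size * batch_size
    local_rows = batch_size // num_gpus
    sharded = local_rows * batch_size
    return full, sharded


full, sharded = similarity_entries(batch_size=32768, num_gpus=64)
print(full // sharded)  # ratio equals the number of GPUs
```

Under these assumptions the per-GPU entry count shrinks by exactly a factor of N, which matches the stated change from B^2 to B^2/N.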
no code implementations • 28 Dec 2022 • He Cao, Jianan Wang, Tianhe Ren, Xianbiao Qi, Yihao Chen, Yuan YAO, Lei Zhang
We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND).
no code implementations • 23 Sep 2022 • Yu Zheng, Jinghan Peng, Yihao Chen, Yajun Zhang, Jialong Wang, Min Liu, Minqiang Xu
In the pre-training stage we reserve the speaker weights, and there are no positive samples to train them in this stage.
no code implementations • 22 Sep 2022 • Yu Zheng, Yihao Chen, Jinghan Peng, Yajun Zhang, Min Liu, Minqiang Xu
In the SV task fixed track, our system was a fusion of five models, and two models were fused in the SV task open track.
no code implementations • 17 Sep 2022 • Shunming Tao, Yihao Chen, Jingxing Wu, Duancheng Zhao, Hanxuan Cai, Ling Wang
Virus infection is one of the major diseases that seriously threaten human health.
no code implementations • 14 Apr 2022 • Jianye Pang, Cheng Jiang, Yihao Chen, Jianbo Chang, Ming Feng, Renzhi Wang, Jianhua Yao
Therefore, designing an elegant and efficient vision transformer learner for dense prediction in medical volume is promising and challenging.
1 code implementation • 12 Jul 2021 • Yuxiang Zhong, Xianbiao Qi, Shanjun Li, Dengyi Gu, Yihao Chen, Peiyang Ning, Rong Xiao
In this technical report, we present our 1st place solution for the ICDAR 2021 competition on mathematical formula detection (MFD).
no code implementations • 24 Jun 2021 • Yixuan Qiao, Hao Chen, Jun Wang, Yihao Chen, Xianbin Ye, Ziliang Li, Xianbiao Qi, Peng Gao, Guotong Xie
TextVQA requires models to read and reason about text in images to answer questions about them.
3 code implementations • 5 May 2021 • Jiaquan Ye, Xianbiao Qi, Yelin He, Yihao Chen, Dengyi Gu, Peng Gao, Rong Xiao
In our method, we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized based on MASTER [1], a robust image text recognition algorithm.
Ranked #2 on Table Recognition on PubTabNet
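The last of the four sub-tasks listed in this entry, box assignment, can be pictured with a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and assigns each detected text line to the first cell that contains the line's center point. This center-containment rule and the name `assign_lines_to_cells` are illustrative assumptions, not the authors' matching rule.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def center(box: Box) -> Tuple[float, float]:
    """Center point of an axis-aligned box."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0


def assign_lines_to_cells(cells: List[Box], lines: List[Box]) -> List[Optional[int]]:
    """For each text line, return the index of the first cell containing its center.

    Lines whose center falls in no cell are mapped to None.
    """
    assignments: List[Optional[int]] = []
    for line in lines:
        cx, cy = center(line)
        match: Optional[int] = None
        for idx, (x0, y0, x1, y1) in enumerate(cells):
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                match = idx
                break
        assignments.append(match)
    return assignments
```

A production system would likely need extra handling for lines that span several cells or fall outside every cell, which this sketch only reports as None.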
no code implementations • 5 May 2021 • Yelin He, Xianbiao Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, Rong Xiao
This paper presents our solution for the ICDAR 2021 Competition on Scientific Table Image Recognition to LaTeX.
no code implementations • 28 Oct 2020 • Yihao Chen, Alexander Lerch
Automatic lyrics generation has received attention from both music and AI communities for years.
1 code implementation • 24 Sep 2020 • Yihao Chen, Xin Tang, Xianbiao Qi, Chun-Guang Li, Rong Xiao
We conduct extensive experiments on benchmark datasets for different tasks, including node classification, link prediction, graph classification and graph regression, and confirm that the learned graph normalization leads to competitive results and that the learned weights suggest the appropriate normalization techniques for the specific task.
no code implementations • 23 Sep 2020 • Bingcong Li, Xin Tang, Xianbiao Qi, Yihao Chen, Rong Xiao
Thus, we propose a lightweight scene text recognition model named Hamming OCR.
no code implementations • 17 Mar 2020 • Di Wu, Yihao Chen, Xianbiao Qi, Yongjian Yu, Weixuan Chen, Rong Xiao
We utilise the overlay between the accurate mask prediction and less accurate mesh prediction to iteratively optimise the direct regressed 6D pose information with a focus on translation estimation.
7 code implementations • 7 Oct 2019 • Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, Xiang Bai
Attention-based scene text recognizers have achieved great success by leveraging a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture.