Search Results for author: Yihao Chen

Found 28 papers, 14 papers with code

ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

1 code implementation • 27 Nov 2024 • Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, Lei Zhang

From the data perspective, we build a fully automated data engine and construct the Rexverse-2M dataset which possesses multiple granularities to support the joint training of perception and understanding.

SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models

1 code implementation • 14 Oct 2024 • Yuqi Li, Yao Lu, Zeyu Dong, Chuanguang Yang, Yihao Chen, Jianping Gou

Based on the similarity matrix derived from CKA, we employ Fisher Optimal Segmentation to partition the network into multiple segments, which provides a basis for removing the layers in a segment-wise manner.

Computational Efficiency, Image Classification

An Effective Information Theoretic Framework for Channel Pruning

no code implementations • 14 Aug 2024 • Yihao Chen, Zefang Wang

Channel pruning is a promising method for accelerating and compressing convolutional neural networks.

Model Compression

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

2 code implementations • 16 May 2024 • Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

Empirical results demonstrate the effectiveness of Grounding DINO 1.5, with the Grounding DINO 1.5 Pro model attaining a 54.3 AP on the COCO detection benchmark and a 55.7 AP on the LVIS-minival zero-shot transfer benchmark, setting new records for open-set object detection.

 Ranked #1 on Zero-Shot Object Detection on MSCOCO (using extra training data)

Edge-computing, Few-Shot Object Detection, +2

An Empirical Study of Challenges in Machine Learning Asset Management

1 code implementation • 25 Feb 2024 • Zhimin Zhao, Yihao Chen, Abdul Ali Bangash, Bram Adams, Ahmed E. Hassan

In machine learning (ML), efficient asset management, including ML models, datasets, algorithms, and tools, is vital for resource optimization, consistent performance, and a streamlined development lifecycle.

Asset Management

ADCNet: a unified framework for predicting the activity of antibody-drug conjugates

1 code implementation • 17 Jan 2024 • Liye Chen, Biaoshun Li, Yihao Chen, Mujie Lin, Shipeng Zhang, Chenxin Li, Yu Pang, Ling Wang

Antibody-drug conjugates (ADCs) have revolutionized the field of cancer treatment in the era of precision medicine due to their ability to precisely target cancer cells and release highly effective drugs.

Activity Prediction, Language Modelling, +1

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

2 code implementations • 21 Dec 2023 • Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yihao Chen, Houqiang Li, Yunhe Wang, Xinghao Chen

Extensive experiments on various zero-shot transfer tasks demonstrate the significantly advantageous performance of our TinySAM against counterpart methods.

Knowledge Distillation, Quantization

Generic and Robust Root Cause Localization for Multi-Dimensional Data in Online Service Systems

1 code implementation • 5 May 2023 • Zeyan Li, Junjie Chen, Yihao Chen, Chengyang Luo, Yiwei Zhao, Yongqian Sun, Kaixin Sui, Xiping Wang, Dapeng Liu, Xing Jin, Qi Wang, Dan Pei

Such attribute combinations are substantial clues to the underlying root causes and thus are called root causes of multidimensional data.


LipsFormer: Introducing Lipschitz Continuity to Vision Transformers

1 code implementation • 19 Apr 2023 • Xianbiao Qi, Jianan Wang, Yihao Chen, Yukai Shi, Lei Zhang

In contrast to previous practical tricks that address training instability by learning rate warmup, layer normalization, attention formulation, and weight initialization, we show that Lipschitz continuity is a more essential property to ensure training stability.

DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

1 code implementation • CVPR 2023 • Yihao Chen, Xianbiao Qi, Jianan Wang, Lei Zhang

In this way, we can reduce the GPU memory consumption of contrastive loss computation from $\mathcal{O}(B^2)$ to $\mathcal{O}(\frac{B^2}{N})$, where $B$ and $N$ are the batch size and the number of GPUs used for training, respectively.

Contrastive Learning
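
To put the memory claim in the DisCo-CLIP entry above into numbers, the following is a minimal back-of-the-envelope sketch, not the authors' implementation. It assumes that the dominant cost is the B x B image-text similarity matrix, that values are stored as 4-byte floats, and that the batch size divides evenly across GPUs. The function name `similarity_matrix_bytes` is hypothetical.

```python
def similarity_matrix_bytes(batch_size: int, num_gpus: int = 1,
                            bytes_per_value: int = 4) -> int:
    """Bytes needed for the slice of the B x B similarity matrix on one GPU.

    Assumption: each GPU keeps only its own B/N rows of the matrix,
    so the per-GPU cost scales as B^2 / N instead of B^2.
    """
    rows_per_gpu = batch_size // num_gpus  # assumes B is divisible by N
    return rows_per_gpu * batch_size * bytes_per_value


B, N = 32768, 64  # illustrative values only, not taken from the paper

full = similarity_matrix_bytes(B)        # every GPU holds all B rows
sharded = similarity_matrix_bytes(B, N)  # each GPU holds B/N rows

print(f"full matrix per GPU:    {full / 2**30:.0f} GiB")
print(f"sharded matrix per GPU: {sharded / 2**20:.0f} MiB")
print(f"reduction factor:       {full // sharded}x")
```

Under these assumptions the per-GPU footprint shrinks by exactly the factor N, which matches the stated change from $\mathcal{O}(B^2)$ to $\mathcal{O}(\frac{B^2}{N})$.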

Exploring Vision Transformers as Diffusion Learners

no code implementations • 28 Dec 2022 • He Cao, Jianan Wang, Tianhe Ren, Xianbiao Qi, Yihao Chen, Yuan YAO, Lei Zhang

We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND).


The SpeakIn System Description for CNSRC2022

no code implementations • 22 Sep 2022 • Yu Zheng, Yihao Chen, Jinghan Peng, Yajun Zhang, Min Liu, Minqiang Xu

In the SV task fixed track, our system was a fusion of five models, and two models were fused in the SV task open track.

Retrieval, Speaker Recognition, +1

3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume

no code implementations • 14 Apr 2022 • Jianye Pang, Cheng Jiang, Yihao Chen, Jianbo Chang, Ming Feng, Renzhi Wang, Jianhua Yao

Therefore, designing an elegant and efficient vision transformer learner for dense prediction in medical volume is promising and challenging.

Inductive Bias

1st Place Solution for ICDAR 2021 Competition on Mathematical Formula Detection

1 code implementation • 12 Jul 2021 • Yuxiang Zhong, Xianbiao Qi, Shanjun Li, Dengyi Gu, Yihao Chen, Peiyang Ning, Rong Xiao

In this technical report, we present our 1st place solution for the ICDAR 2021 competition on mathematical formula detection (MFD).

PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML

3 code implementations • 5 May 2021 • Jiaquan Ye, Xianbiao Qi, Yelin He, Yihao Chen, Dengyi Gu, Peng Gao, Rong Xiao

In our method, we divide the table content recognition task into four sub-tasks: table structure recognition, text line detection, text line recognition, and box assignment. Our table structure recognition algorithm is customized based on MASTER [1], a robust image text recognition algorithm.

Line Detection, Table Recognition

Melody-Conditioned Lyrics Generation with SeqGANs

no code implementations • 28 Oct 2020 • Yihao Chen, Alexander Lerch

Automatic lyrics generation has received attention from both music and AI communities for years.

Learning Graph Normalization for Graph Neural Networks

1 code implementation • 24 Sep 2020 • Yihao Chen, Xin Tang, Xianbiao Qi, Chun-Guang Li, Rong Xiao

We conduct extensive experiments on benchmark datasets for different tasks, including node classification, link prediction, graph classification and graph regression, and confirm that the learned graph normalization leads to competitive results and that the learned weights suggest the appropriate normalization techniques for the specific task.

Graph Classification, Graph Regression, +2

Neural Mesh Refiner for 6-DoF Pose Estimation

no code implementations • 17 Mar 2020 • Di Wu, Yihao Chen, Xianbiao Qi, Yongjian Yu, Weixuan Chen, Rong Xiao

We utilise the overlay between the accurate mask prediction and the less accurate mesh prediction to iteratively optimise the directly regressed 6D pose information with a focus on translation estimation.

Autonomous Driving, Instance Segmentation, +4

MASTER: Multi-Aspect Non-local Network for Scene Text Recognition

7 code implementations • 7 Oct 2019 • Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, Xiang Bai

Attention-based scene text recognizers have gained huge success; they leverage a more compact intermediate representation to learn 1D or 2D attention with an RNN-based encoder-decoder architecture.

Decoder, Scene Text Recognition
