Search Results for author: Qi Qian

Found 40 papers, 20 papers with code

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

1 code implementation • 30 Nov 2023 • Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang

In this work, towards a more versatile copilot for academic paper writing, we mainly focus on strengthening the multi-modal diagram analysis ability of Multimodal LLMs.

Language Modelling Large Language Model

830


Stable Cluster Discrimination for Deep Clustering

1 code implementation • ICCV 2023 • Qi Qian

Meanwhile, one-stage methods are developed mainly for representation learning rather than clustering, where various constraints for cluster assignments are designed to avoid collapsing explicitly.

Ranked #1 on Unsupervised Image Classification on CIFAR-10

Clustering Deep Clustering +3


mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

2 code implementations • 7 Nov 2023 • Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks.

Ranked #11 on Visual Question Answering (VQA) on InfiMM-Eval

Language Modelling Large Language Model +1

1,909


UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model

2 code implementations • 8 Oct 2023 • Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Guohai Xu, Chenliang Li, Junfeng Tian, Qi Qian, Ji Zhang, Qin Jin, Liang He, Xin Alex Lin, Fei Huang

Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs.

Language Modelling Large Language Model +1

830


Graph Convolution Based Efficient Re-Ranking for Visual Retrieval

1 code implementation • 15 Jun 2023 • Yuqi Zhang, Qi Qian, Hongsong Wang, Chong Liu, Weihua Chen, Fan Wang

In particular, the plain GCR is extended for cross-camera retrieval and an improved feature propagation formulation is presented to leverage affinity relationships across different cameras.

Distributed Computing Image Retrieval +3


Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks

1 code implementation • 7 Jun 2023 • Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang

In addition, to facilitate a comprehensive evaluation of video-language models, we carefully build the largest human-annotated Chinese benchmarks covering three popular video-language tasks of cross-modal retrieval, video captioning, and video category classification.

Cross-Modal Retrieval Language Modelling +3

254


mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

1 code implementation • 27 Apr 2023 • Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

Our code, pre-trained model, instruction-tuned models, and evaluation set are available at https://github.com/X-PLUG/mPLUG-Owl.

Ranked #3 on Visual Question Answering (VQA) on HallusionBench

Visual Question Answering (VQA) Zero-Shot Video Question Answer

1,909


ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human

1 code implementation • 16 Apr 2023 • Junfeng Tian, Hehong Chen, Guohai Xu, Ming Yan, Xing Gao, Jianhai Zhang, Chenliang Li, Jiayi Liu, Wenshen Xu, Haiyang Xu, Qi Qian, Wei Wang, Qinghao Ye, Jiejing Zhang, Ji Zhang, Fei Huang, Jingren Zhou

In this paper, we present ChatPLUG, a Chinese open-domain dialogue system for digital human applications that instruction finetunes on a wide range of dialogue tasks in a unified internet-augmented format.

World Knowledge

300


Improved Visual Fine-tuning with Natural Language Supervision

1 code implementation • ICCV 2023 • Junyang Wang, Yuanhong Xu, Juhua Hu, Ming Yan, Jitao Sang, Qi Qian

Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data and mitigate the over-fitting problem on downstream vision tasks with limited training examples.


mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

4 code implementations • 1 Feb 2023 • Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou

In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules for modality collaboration and disentangling different modality modules to deal with modality entanglement.

Ranked #1 on Video Captioning on MSR-VTT

Action Classification Image Classification +7

6,000


HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

no code implementations • ICCV 2023 • Qinghao Ye, Guohai Xu, Ming Yan, Haiyang Xu, Qi Qian, Ji Zhang, Fei Huang

We achieve state-of-the-art results on 15 well-established video-language understanding and generation tasks, especially on temporal-oriented datasets (e.g., SSv2-Template and SSv2-Label) with 8.6% and 11.1% improvement respectively.

Ranked #1 on Visual Question Answering (VQA) on TGIF-QA

TGIF-Action TGIF-Frame +7


Semantic Data Augmentation based Distance Metric Learning for Domain Generalization

no code implementations • 2 Aug 2022 • Mengzhu Wang, Jianlong Yuan, Qi Qian, Zhibin Wang, Hao Li

Further, we provide an in-depth analysis of the mechanism and rationale behind our approach, which gives us a better understanding of why leveraging logits in lieu of features can help domain generalization.

Data Augmentation Domain Generalization +1


An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation

no code implementations • 25 May 2022 • Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan

With our empirical results obtained from 1,330 models, we provide the following main observations: 1) ERM combined with data augmentation can achieve state-of-the-art performance if we choose a proper pre-trained model respecting the data property; 2) specialized algorithms further improve the robustness on top of ERM when handling a specific type of distribution shift, e.g., GroupDRO for spurious correlation and CORAL for large-scale out-of-distribution data; 3) comparing different pre-training modes, architectures and data sizes, we provide novel observations about pre-training on distribution shift, which shed light on designing or selecting pre-training strategies for different kinds of distribution shifts.

Data Augmentation


RBGNet: Ray-based Grouping for 3D Object Detection

1 code implementation • CVPR 2022 • Haiyang Wang, Shaoshuai Shi, Ze Yang, Rongyao Fang, Qi Qian, Hongsheng Li, Bernt Schiele, LiWei Wang

In order to learn better representations of object shape to enhance cluster features for predicting 3D boxes, we propose a ray-based feature grouping module, which aggregates the point-wise features on object surfaces using a group of determined rays uniformly emitted from cluster centers.

Ranked #13 on 3D Object Detection on ScanNetV2

3D Object Detection Object +1


Reconstruction Task Finds Universal Winning Tickets

no code implementations • 23 Feb 2022 • Ruichen Li, Binghui Li, Qi Qian, LiWei Wang

Pruning well-trained neural networks is effective to achieve a promising accuracy-efficiency trade-off in computer vision regimes.

Image Reconstruction object-detection +1


Improved Fine-Tuning by Better Leveraging Pre-Training Data

no code implementations • 24 Nov 2021 • Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Xiangyang Ji, Antoni Chan, Rong Jin

The generalization result of using pre-training data shows that the excess risk bound on a target task can be improved when the appropriate pre-training data is included in fine-tuning.

Image Classification Learning Theory


Dash: Semi-Supervised Learning with Dynamic Thresholding

no code implementations • 1 Sep 2021 • Yi Xu, Lei Shang, Jinxing Ye, Qi Qian, Yu-Feng Li, Baigui Sun, Hao Li, Rong Jin

In this work we develop a simple yet powerful framework, whose key idea is to select a subset of training examples from the unlabeled data when performing existing SSL methods so that only the unlabeled examples with pseudo labels related to the labeled data will be used to train models.

Ranked #1 on Semi-Supervised Image Classification on CIFAR-10, 250 Labels

Semi-Supervised Image Classification


Unsupervised Visual Representation Learning by Online Constrained K-Means

1 code implementation • CVPR 2022 • Qi Qian, Yuanhong Xu, Juhua Hu, Hao Li, Rong Jin

Clustering assigns each instance a pseudo label that is then used to learn representations through discrimination.

Ranked #5 on Unsupervised Image Classification on CIFAR-10

Clustering Contrastive Learning +6


Why Does Multi-Epoch Training Help?

no code implementations • 13 May 2021 • Yi Xu, Qi Qian, Hao Li, Rong Jin

Stochastic gradient descent (SGD) has become the most attractive optimization method in training large-scale deep neural networks due to its simplicity, low computational cost in each updating step, and good performance.


A Theoretical Analysis of Learning with Noisily Labeled Data

no code implementations • 8 Apr 2021 • Yi Xu, Qi Qian, Hao Li, Rong Jin

Noisy labels are very common in deep supervised learning.


Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

1 code implementation • CVPR 2021 • Qiang Zhou, Chaohui Yu, Zhibin Wang, Qi Qian, Hao Li

To alleviate the confirmation bias problem and improve the quality of pseudo annotations, we further propose a co-rectify scheme based on Instant-Teaching, denoted as Instant-Teaching$^*$.

Ranked #12 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)

Object object-detection +2


Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

2 code implementations • 1 Feb 2021 • Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin

Compared with previous NAS methods, the proposed Zen-NAS is magnitude times faster on multiple server-side and mobile-side GPU platforms, with state-of-the-art accuracy on ImageNet.

Ranked #2 on Neural Architecture Search on ImageNet

Image Classification Neural Architecture Search

344


Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition

2 code implementations • ICCV 2021 • Ming Lin, Pichao Wang, Zhenhong Sun, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin

To address this issue, instead of using an accuracy predictor, we propose a novel zero-shot index dubbed Zen-Score to rank the architectures.

Neural Architecture Search Vocal Bursts Intensity Prediction

344


WeMix: How to Better Utilize Data Augmentation

no code implementations • 3 Oct 2020 • Yi Xu, Asaf Noy, Ming Lin, Qi Qian, Hao Li, Rong Jin

To this end, we develop two novel algorithms, termed "AugDrop" and "MixLoss", to correct the data bias in the data augmentation.

Data Augmentation


Improved Knowledge Distillation via Full Kernel Matrix Transfer

1 code implementation • 30 Sep 2020 • Qi Qian, Hao Li, Juhua Hu

Recently, a number of works propose to transfer the pairwise similarity between examples to distill relative information.

Knowledge Distillation Model Compression


Semi-Anchored Detector for One-Stage Object Detection

no code implementations • 10 Sep 2020 • Lei Chen, Qi Qian, Hao Li

The anchor-free strategy benefits the classification task but can lead to a sub-optimum for the regression task due to the lack of prior bounding boxes.

Classification General Classification +4


Neural Architecture Design for GPU-Efficient Networks

2 code implementations • 24 Jun 2020 • Ming Lin, Hesen Chen, Xiuyu Sun, Qi Qian, Hao Li, Rong Jin

To address this issue, we propose a general principle for designing GPU-efficient networks based on extensive empirical studies.

Neural Architecture Search

29,648


Towards Understanding Label Smoothing

no code implementations • 20 Jun 2020 • Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin

Label smoothing regularization (LSR) has had great success in training deep neural networks by stochastic algorithms such as stochastic gradient descent and its variants.


Weakly Supervised Representation Learning with Coarse Labels

1 code implementation • ICCV 2021 • Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Juhua Hu

To mitigate this challenge, we propose an algorithm to learn the fine-grained patterns for the target task, when only its coarse-class labels are available.

Learning with coarse labels Representation Learning


Hierarchically Robust Representation Learning

no code implementations • CVPR 2020 • Qi Qian, Juhua Hu, Hao Li

Experiments on benchmark data sets demonstrate the effectiveness of the robust deep representations.

Representation Learning


SoftTriple Loss: Deep Metric Learning Without Triplet Sampling

5 code implementations • ICCV 2019 • Qi Qian, Lei Shang, Baigui Sun, Juhua Hu, Hao Li, Rong Jin

The set of triplet constraints has to be sampled within the mini-batch.

Ranked #21 on Metric Learning on CUB-200-2011 (using extra training data)

Image Retrieval Metric Learning

197


DR Loss: Improving Object Detection by Distributional Ranking

1 code implementation • CVPR 2020 • Qi Qian, Lei Chen, Hao Li, Rong Jin

This architecture is efficient but can suffer from the imbalance issue with respect to two aspects: the inter-class imbalance between the number of candidates from foreground and background classes, and the intra-class imbalance in the hardness of background candidates, where only a few candidates are hard to identify.

Object object-detection +1

111


Robust Gaussian Process Regression for Real-Time High Precision GPS Signal Enhancement

no code implementations • 3 Jun 2019 • Ming Lin, Xiaomin Song, Qi Qian, Hao Li, Liang Sun, Shenghuo Zhu, Rong Jin

We validate the superiority of the proposed method in our real-time high precision positioning system against several popular state-of-the-art robust regression methods.

regression


Which Factorization Machine Modeling is Better: A Theoretical Answer with Optimal Guarantee

no code implementations • 30 Jan 2019 • Ming Lin, Shuang Qiu, Jieping Ye, Xiaomin Song, Qi Qian, Liang Sun, Shenghuo Zhu, Rong Jin

This bound is sub-optimal compared to the information-theoretic lower bound $\mathcal{O}(kd)$.


Large-scale Distance Metric Learning with Uncertainty

no code implementations • CVPR 2018 • Qi Qian, Jiasheng Tang, Hao Li, Shenghuo Zhu, Rong Jin

Furthermore, we can show that the metric is learned from latent examples only, but it can preserve the large margin property even for the original data.

Metric Learning


Robust Optimization over Multiple Domains

no code implementations • 19 May 2018 • Qi Qian, Shenghuo Zhu, Jiasheng Tang, Rong Jin, Baigui Sun, Hao Li

Hence, we propose to learn the model and the adversarial distribution simultaneously with the stochastic algorithm for efficiency.

BIG-bench Machine Learning Cloud Computing +1


Similarity Learning via Adaptive Regression and Its Application to Image Retrieval

no code implementations • 6 Dec 2015 • Qi Qian, Inci M. Baytas, Rong Jin, Anil Jain, Shenghuo Zhu

The similarity between pairs of images can be measured by the distances between their high dimensional representations, and the problem of learning the appropriate similarity is often addressed by distance metric learning.

Image Retrieval Metric Learning +2


Towards Making High Dimensional Distance Metric Learning Practical

no code implementations • 15 Sep 2015 • Qi Qian, Rong Jin, Lijun Zhang, Shenghuo Zhu

In this work, we present a dual random projection framework for DML with high dimensional data that explicitly addresses the limitation of dimensionality reduction for DML.

Dimensionality Reduction Metric Learning +1


Fine-Grained Visual Categorization via Multi-stage Metric Learning

no code implementations • CVPR 2015 • Qi Qian, Rong Jin, Shenghuo Zhu, Yuanqing Lin

To this end, we propose a multi-stage metric learning framework that divides the large-scale high dimensional learning problem into a series of simple subproblems, achieving $\mathcal{O}(d)$ computational complexity.

Fine-Grained Visual Categorization Metric Learning


Efficient Distance Metric Learning by Adaptive Sampling and Mini-Batch Stochastic Gradient Descent (SGD)

no code implementations • 3 Apr 2013 • Qi Qian, Rong Jin, Jin-Feng Yi, Lijun Zhang, Shenghuo Zhu

Although stochastic gradient descent (SGD) has been successfully applied to improve the efficiency of DML, it can still be computationally expensive because in order to ensure that the solution is a PSD matrix, it has to, at every iteration, project the updated distance metric onto the PSD cone, an expensive operation.

Computational Efficiency Metric Learning

