Search Results for author: Kaipeng Zhang

Found 31 papers, 21 papers with code

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks

42 code implementations • 11 Apr 2016 • Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, Yu Qiao

Face detection and alignment in unconstrained environment are challenging due to various poses, illuminations and occlusions.

Ranked #27 on Face Detection on WIDER Face (Easy)

Face Alignment Face Detection

9,917

A Discriminative Feature Learning Approach for Deep Face Recognition

1 code implementation • ECCV 2016 • Yandong Wen, Kaipeng Zhang, Zhifeng Li, Yu Qiao

In most of the available CNNs, the softmax loss function is used as the supervision signal to train the deep model.

Face Recognition Face Verification

939

Detecting Faces Using Inside Cascaded Contextual CNN

no code implementations • ICCV 2017 • Kaipeng Zhang, Zhanpeng Zhang, Hao Wang, Zhifeng Li, Yu Qiao, Wei Liu

Deep Convolutional Neural Networks (CNNs) achieve substantial improvements in face detection in the wild.

Face Detection

Super-Identity Convolutional Neural Network for Face Hallucination

no code implementations • ECCV 2018 • Kaipeng Zhang, Zhanpeng Zhang, Chia-Wen Cheng, Winston H. Hsu, Yu Qiao, Wei Liu, Tong Zhang

Face hallucination is a generative task to super-resolve the facial image with low resolution while human perception of face heavily relies on identity information.

Face Generation Face Hallucination +1

Bootstrap Model Ensemble and Rank Loss for Engagement Intensity Regression

no code implementations • 8 Jul 2019 • Kai Wang, Jianfei Yang, Da Guo, Kaipeng Zhang, Xiaojiang Peng, Yu Qiao

Based on our winning solution from last year, we mainly explore head features and body features with a bootstrap strategy and two novel loss functions in this paper.

regression

FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution

1 code implementation • 9 Mar 2020 • Zhanpeng Zhang, Kaipeng Zhang

One challenge of semantic segmentation is to deal with the object scale variations and leverage the context.

Real-Time Semantic Segmentation Super-Resolution

Neural Routing by Memory

no code implementations • NeurIPS 2021 • Kaipeng Zhang, Zhenqiang Li, Zhifeng Li, Wei Liu, Yoichi Sato

However, they use the same procedure sequence for all inputs, regardless of the intermediate features. This paper proffers a simple yet effective idea of constructing parallel procedures and assigning similar intermediate features to the same specialized procedures in a divide-and-conquer fashion.

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

1 code implementation • ICCV 2023 • Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Fei Chao, Rongrong Ji, Yu Qiao, Ping Luo

Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens.

Ranked #4 on Efficient ViTs on ImageNet-1K (with DeiT-S)

Efficient ViTs

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models

1 code implementation • 15 Jun 2023 • Peng Xu, Wenqi Shao, Kaipeng Zhang, Peng Gao, Shuo Liu, Meng Lei, Fanqing Meng, Siyuan Huang, Yu Qiao, Ping Luo

Large Vision-Language Models (LVLMs) have recently played a dominant role in multimodal vision-language learning.

Hallucination Image Captioning +3

360

Align, Adapt and Inject: Sound-guided Unified Image Generation

no code implementations • 20 Jun 2023 • Yue Yang, Kaipeng Zhang, Yuying Ge, Wenqi Shao, Zeyue Xue, Yu Qiao, Ping Luo

Then, we propose the audio adapter to adapt audio representation into an audio token enriched with specific semantics, which can be injected into a frozen T2I model flexibly.

Image Generation Retrieval +1

Meta-Transformer: A Unified Framework for Multimodal Learning

1 code implementation • 20 Jul 2023 • Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue

Multimodal learning aims to build models that can process and relate information from multiple modalities.

Time Series

1,436

Tiny LVLM-eHub: Early Multimodal Experiments with Bard

1 code implementation • 7 Aug 2023 • Wenqi Shao, Yutao Hu, Peng Gao, Meng Lei, Kaipeng Zhang, Fanqing Meng, Peng Xu, Siyuan Huang, Hongsheng Li, Yu Qiao, Ping Luo

Secondly, it conducts an in-depth analysis of LVLMs' predictions using the ChatGPT Ensemble Evaluation (CEE), which leads to a robust and accurate evaluation and exhibits improved alignment with human evaluation compared to the word matching approach.

Hallucination Visual Reasoning

360

Foundation Model is Efficient Multimodal Multitask Model Selector

1 code implementation • NeurIPS 2023 • Fanqing Meng, Wenqi Shao, Zhanglin Peng, Chonghe Jiang, Kaipeng Zhang, Yu Qiao, Ping Luo

This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predicting their performance on each multi-modal task without fine-tuning them, such as image recognition, referring, captioning, visual question answering, and text question answering.

Model Selection Question Answering +1

OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

2 code implementations • 25 Aug 2023 • Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo

LWC modulates the extreme values of weights by optimizing the clipping threshold.

Common Sense Reasoning Computational Efficiency +3

548

ImageBind-LLM: Multi-modality Instruction Tuning

2 code implementations • 7 Sep 2023 • Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder.

Instruction Following Text Generation

5,487

Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching

no code implementations • 8 Oct 2023 • Hao Zhang, Lumin Xu, Shenqi Lai, Wenqi Shao, Nanning Zheng, Ping Luo, Yu Qiao, Kaipeng Zhang

Current image-based keypoint detection methods for animal (including human) bodies and faces are generally divided into full-supervised and few-shot class-agnostic approaches.

Keypoint Detection

Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching

1 code implementation • 9 Oct 2023 • Ziyao Guo, Kai Wang, George Cazenavette, Hui Li, Kaipeng Zhang, Yang You

The ultimate goal of Dataset Distillation is to synthesize a small synthetic dataset such that a model trained on this synthetic set will perform equally well as a model trained on the full, real dataset.

Towards Unified and Effective Domain Generalization

1 code implementation • 16 Oct 2023 • Yiyuan Zhang, Kaixiong Gong, Xiaohan Ding, Kaipeng Zhang, Fangrui Lv, Kurt Keutzer, Xiangyu Yue

We propose UniDG, a novel and Unified framework for Domain Generalization that is capable of significantly enhancing the out-of-distribution generalization performance of foundation models regardless of their architectures.

Ranked #1 on Domain Generalization on TerraIncognita

Domain Generalization Out-of-Distribution Generalization

DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching

1 code implementation • 23 Oct 2023 • Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Kaipeng Zhang, Wei Jiang, Yang You

Dataset distillation plays a crucial role in creating compact datasets with similar training performance compared with original large-scale ones.

Transfer Learning

MLLMs-Augmented Visual-Language Representation Learning

1 code implementation • 30 Nov 2023 • Yanqing Liu, Kai Wang, Wenqi Shao, Ping Luo, Yu Qiao, Mike Zheng Shou, Kaipeng Zhang, Yang You

Visual-language pre-training has achieved remarkable success in many multi-modal tasks, largely attributed to the availability of large-scale image-text datasets.

Representation Learning Retrieval +1

OneLLM: One Framework to Align All Modalities with Language

1 code implementation • 6 Dec 2023 • Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue

In detail, we first train an image projection module to connect a vision encoder with LLM.

Ranked #73 on Visual Question Answering on MM-Vet

Question Answering Visual Question Answering

440

TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training

1 code implementation • 20 Dec 2023 • Yuqi Lin, Minghao Chen, Kaipeng Zhang, Hengjia Li, Mingming Li, Zheng Yang, Dongqin Lv, Binbin Lin, Haifeng Liu, Deng Cai

As a result, we dissect the preservation of patch-wise spatial information in CLIP and propose a local-to-global framework to obtain image tags.

Ranked #1 on Unsupervised Semantic Segmentation with Language-image Pre-training on PASCAL VOC

Classification Multi-Label Classification +3

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

1 code implementation • 4 Jan 2024 • Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo

Charts play a vital role in data visualization, understanding data patterns, and informed decision-making.

Data Visualization Decision Making +2

Two Trades is not Baffled: Condensing Graph via Crafting Rational Gradient Matching

1 code implementation • 7 Feb 2024 • Tianle Zhang, Yuchen Zhang, Kun Wang, Kai Wang, Beining Yang, Kaipeng Zhang, Wenqi Shao, Ping Liu, Joey Tianyi Zhou, Yang You

Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have raised growing concerns.

Graph Representation Learning

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

1 code implementation • 8 Feb 2024 • Peng Gao, Renrui Zhang, Chris Liu, Longtian Qiu, Siyuan Huang, Weifeng Lin, Shitian Zhao, Shijie Geng, Ziyi Lin, Peng Jin, Kaipeng Zhang, Wenqi Shao, Chao Xu, Conghui He, Junjun He, Hao Shao, Pan Lu, Hongsheng Li, Yu Qiao

We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX.

Ranked #4 on Video Question Answering on MVBench

Benchmarking Language Modelling +4

2,470

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation

1 code implementation • 18 Feb 2024 • Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question answering.

Question Answering Text Summarization

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

no code implementations • 22 Feb 2024 • Junting Chen, Yao Mu, Qiaojun Yu, Tianming Wei, Silang Wu, Zhecheng Yuan, Zhixuan Liang, Chao Yang, Kaipeng Zhang, Wenqi Shao, Yu Qiao, Huazhe Xu, Mingyu Ding, Ping Luo

To bridge this "ideal-to-real" gap, this paper presents RobotScript, a platform for 1) a deployable robot manipulation pipeline powered by code generation; and 2) a code generation benchmark for robot manipulation tasks in free-form natural language.

Code Generation Common Sense Reasoning +2

Towards Implicit Prompt For Text-To-Image Models

no code implementations • 4 Mar 2024 • Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo

We call for increased attention to the potential and risks of implicit prompts in the T2I community and further investigation into the capabilities and impacts of implicit prompts, advocating for a balanced approach that harnesses their benefits while mitigating their risks.

Position

AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Adversarial Visual-Instructions

no code implementations • 14 Mar 2024 • Hao Zhang, Wenqi Shao, Hong Liu, Yongqiang Ma, Ping Luo, Yu Qiao, Kaipeng Zhang

To bridge this gap, we introduce AVIBench, a framework designed to analyze the robustness of LVLMs when facing various adversarial visual-instructions (AVIs), including four types of image-based AVIs, ten types of text-based AVIs, and nine types of content bias AVIs (such as gender, violence, cultural, and racial biases, among others).

Fairness Language Modelling

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

1 code implementation • 31 Mar 2024 • Lirui Zhao, Yue Yang, Kaipeng Zhang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Rongrong Ji

Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research.

Language Modelling Large Language Model

Adapting LLaMA Decoder to Vision Transformer

no code implementations • 10 Apr 2024 • Jiahao Wang, Wenqi Shao, Mengzhao Chen, Chengyue Wu, Yong Liu, Kaipeng Zhang, Songyang Zhang, Kai Chen, Ping Luo

We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention causes an attention collapse issue, which makes network training fail.

Computational Efficiency Quantization +1
