Search Results for author: Li Yuan

Found 74 papers, 46 papers with code

Object Relation Detection Based on One-shot Learning

no code implementations 16 Jul 2018 Li Zhou, Jian Zhao, Jianshu Li, Li Yuan, Jiashi Feng

Detecting the relations among objects, such as "cat on sofa" and "person ride horse", is a crucial task in image understanding, and beneficial to bridging the semantic gap between images and natural language.

Object One-Shot Learning +1

The Unconstrained Ear Recognition Challenge 2019 - ArXiv Version With Appendix

no code implementations 11 Mar 2019 Žiga Emeršič, Aruna Kumar S. V., B. S. Harish, Weronika Gutfeter, Jalil Nourmohammadi Khiarak, Andrzej Pacut, Earnest Hansley, Mauricio Pamplona Segundo, Sudeep Sarkar, Hyeonjung Park, Gi Pyo Nam, Ig-Jae Kim, Sagar G. Sangodkar, Ümit Kaçar, Murvet Kirci, Li Yuan, Jishou Yuan, Haonan Zhao, Fei Lu, Junying Mao, Xiaoshuang Zhang, Dogucan Yaman, Fevziye Irem Eyiokur, Kadir Bulut Özler, Hazim Kemal Ekenel, Debbrota Paul Chowdhury, Sambit Bakshi, Pankaj K. Sa, Banshidhar Majhi, Peter Peer, Vitomir Štruc

The goal of the challenge is to assess the performance of existing ear recognition techniques on a challenging large-scale ear dataset and to analyze the performance of the technology from various viewpoints, such as generalization to unseen data characteristics, sensitivity to rotations, occlusions, and image resolution, and performance bias on sub-groups of subjects selected based on demographic criteria, i.e., gender and ethnicity.

Benchmarking Person Recognition

Few-shot Adaptive Faster R-CNN

no code implementations CVPR 2019 Tao Wang, Xiaopeng Zhang, Li Yuan, Jiashi Feng

To address these challenges, we first introduce a pairing mechanism over source and target features to alleviate the issue of insufficient target domain samples.

object-detection Object Detection +1

Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization

no code implementations 17 Apr 2019 Li Yuan, Francis EH Tay, Ping Li, Li Zhou, Jiashi Feng

The evaluator defines a learnable information preserving metric between original video and summary video and "supervises" the selector to identify the most informative frames to form the summary video.

Unsupervised Video Summarization

Distilling Object Detectors with Fine-grained Feature Imitation

3 code implementations CVPR 2019 Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng

To address the challenge of distilling knowledge in detection model, we propose a fine-grained feature imitation method exploiting the cross-location discrepancy of feature response.

Knowledge Distillation Object +2

Central Similarity Quantization for Efficient Image and Video Retrieval

1 code implementation CVPR 2020 Li Yuan, Tao Wang, Xiaopeng Zhang, Francis EH Tay, Zequn Jie, Wei Liu, Jiashi Feng

In this work, we propose a new global similarity metric, termed central similarity, with which the hash codes of similar data pairs are encouraged to approach a common center and those of dissimilar pairs to converge to different centers, improving hash learning efficiency and retrieval accuracy.

Quantization Retrieval +1
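The central-similarity idea can be sketched numerically: assign each class a fixed binary hash center (here, rows of a Sylvester Hadamard matrix, which are mutually well separated) and penalize the gap between predicted code probabilities and the assigned center. This is an illustrative sketch, not the paper's exact implementation; the function names are ours.

```python
import numpy as np

def hadamard_centers(n_classes, n_bits):
    # Binary hash centers from a Sylvester Hadamard matrix: rows are
    # mutually orthogonal, so centers are far apart in Hamming space.
    H = np.array([[1.0]])
    while H.shape[0] < max(n_classes, n_bits):
        H = np.block([[H, H], [H, -H]])
    return (H[:n_classes, :n_bits] > 0).astype(float)  # entries in {0, 1}

def central_similarity_loss(code_probs, labels, centers):
    # Binary cross-entropy pulling each predicted hash code (given as
    # per-bit probabilities) toward the center of its class.
    targets = centers[labels]                        # (batch, n_bits)
    p = np.clip(code_probs, 1e-7, 1.0 - 1e-7)
    bce = -(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))
    return float(bce.mean())
```

Codes that already sit on their class center incur near-zero loss, while an uninformative 0.5-per-bit code pays log 2 per bit.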

Revisiting Knowledge Distillation via Label Smoothing Regularization

2 code implementations CVPR 2020 Li Yuan, Francis E. H. Tay, Guilin Li, Tao Wang, Jiashi Feng

Without any extra computation cost, Tf-KD achieves up to 0.65% improvement on ImageNet over well-established baseline models, which is superior to label smoothing regularization.

Self-Knowledge Distillation
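The paper's connection between knowledge distillation and label smoothing can be illustrated with a minimal label-smoothing cross-entropy, i.e., distilling from a "virtual teacher" that puts uniform mass on the wrong classes. The function name and the eps value are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def label_smoothing_ce(logits, target, eps=0.1):
    # Cross-entropy against a softened target: the true class keeps
    # probability 1 - eps and the remaining eps is spread uniformly over
    # the other classes -- the regularization view of a virtual teacher.
    n = logits.shape[-1]
    logp = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    soft = np.full(n, eps / (n - 1))
    soft[target] = 1.0 - eps
    return float(-np.sum(soft * logp))
```

With eps = 0 this reduces to the standard cross-entropy; larger eps penalizes over-confident predictions more.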

YNU-HPCC at SemEval-2020 Task 8: Using a Parallel-Channel Model for Memotion Analysis

1 code implementation SEMEVAL 2020 Li Yuan, Jin Wang, Xue-jie Zhang

In recent years, the growing ubiquity of Internet memes on social media platforms, such as Facebook, Instagram, and Twitter, has become a topic of immense interest.

Emotion Recognition Sentiment Analysis +2

Exploring global diverse attention via pairwise temporal relation for video summarization

no code implementations 23 Sep 2020 Ping Li, Qinghao Ye, Luming Zhang, Li Yuan, Xianghua Xu, Ling Shao

In this paper, we propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention, called SUM-GDA, which adapts the attention mechanism from a global perspective to consider pairwise temporal relations of video frames.

Relation Video Summarization

Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes

no code implementations 16 Oct 2020 Li Yuan, Yichen Zhou, Shuning Chang, Ziyuan Huang, Yunpeng Chen, Xuecheng Nie, Tao Wang, Jiashi Feng, Shuicheng Yan

Prior works fail to deal with this problem in two aspects: (1) they do not utilize scene information; (2) they lack training data for crowded and complex scenes.

Action Recognition In Videos Semantic Segmentation

Towards Accurate Human Pose Estimation in Videos of Crowded Scenes

no code implementations 16 Oct 2020 Li Yuan, Shuning Chang, Xuecheng Nie, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

In this paper, we focus on improving human pose estimation in videos of crowded scenes from the perspectives of exploiting temporal context and collecting new data.

Optical Flow Estimation Pose Estimation

A Simple Baseline for Pose Tracking in Videos of Crowded Scenes

no code implementations 16 Oct 2020 Li Yuan, Shuning Chang, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Xuecheng Nie, Francis E. H. Tay, Jiashi Feng, Shuicheng Yan

This paper presents our solution to the ACM MM challenge on Large-scale Human-centric Video Analysis in Complex Events [lin2020human]; specifically, we focus on Track 3: Crowd Pose Tracking in Complex Events.

Multi-Object Tracking Optical Flow Estimation +1

Fooling the primate brain with minimal, targeted image manipulation

no code implementations 11 Nov 2020 Li Yuan, Will Xiao, Giorgia Dellaferrera, Gabriel Kreiman, Francis E. H. Tay, Jiashi Feng, Margaret S. Livingstone

Here we propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.

Adversarial Attack Image Manipulation

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

13 code implementations ICCV 2021 Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan

To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image into tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and the token length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study.

Image Classification Language Modelling
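The layer-wise T2T step can be sketched as a "soft split": reshape the token sequence back onto its 2D grid, then re-tokenize with overlapping windows so that neighboring tokens are concatenated into one token, shrinking the sequence length while growing the embedding. Window size, stride, and padding below are illustrative defaults, not the paper's exact settings.

```python
import numpy as np

def soft_split(tokens, h, w, k=3, stride=2, pad=1):
    # One Tokens-to-Token step (a sketch): view the token sequence as an
    # h x w grid and gather overlapping k x k windows, so each new token
    # is the concatenation of k*k neighboring old tokens.
    b, n, c = tokens.shape
    grid = tokens.reshape(b, h, w, c)
    grid = np.pad(grid, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    out = np.empty((b, out_h * out_w, k * k * c))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            win = grid[:, i*stride:i*stride + k, j*stride:j*stride + k, :]
            out[:, idx] = win.reshape(b, -1)
            idx += 1
    return out  # sequence length shrinks, channel dim grows k*k-fold
```

For a 4x4 grid of 2-dim tokens this reduces 16 tokens to 4, each of dimension 18.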

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

1 code implementation 31 Mar 2021 Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

It is well-known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essentially important for both optimization and generalization of deep networks.

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition

4 code implementations 23 Jun 2021 Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng

By realizing the importance of the positional information carried by 2D feature representations, unlike recent MLP-like models that encode the spatial information along the flattened spatial dimensions, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections.
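Encoding along one spatial axis with a linear projection can be sketched as follows; the function name and the plain per-axis linear map are simplifications of the paper's Permute-MLP, which additionally splits channels into segments.

```python
import numpy as np

def mix_height(x, w_h):
    # Mix information along the height axis only: every output row is a
    # learned linear combination of the input rows, applied independently
    # at each (width, channel) position. Width mixing is symmetric.
    # x: (H, W, C) feature map, w_h: (H, H) projection matrix.
    return np.einsum('gh,hwc->gwc', w_h, x)
```

With `w_h = np.eye(H)` the map is the identity; a full block alternates height, width, and channel projections.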

VOLO: Vision Outlooker for Visual Recognition

7 code implementations 24 Jun 2021 Li Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, Shuicheng Yan

Though recently the prevailing vision transformers (ViTs) have shown great potential of self-attention based models in ImageNet classification, their performance is still inferior to that of the latest SOTA CNNs if no extra data are provided.

Domain Generalization Image Classification +1

PnP-DETR: Towards Efficient Visual Analysis with Transformers

1 code implementation ICCV 2021 Tao Wang, Li Yuan, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.

object-detection Object Detection +1

Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction

1 code implementation 17 Dec 2021 Guangyan Chen, Meiling Wang, Yufeng Yue, Qingxiang Zhang, Li Yuan

Recent Transformer-based methods have achieved advanced performance in point cloud registration by utilizing advantages of the Transformer in order-invariance and modeling dependency to aggregate information.

Geometric Matching Point Cloud Registration

DynaMixer: A Vision MLP Architecture with Dynamic Mixing

2 code implementations 28 Jan 2022 Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu

In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield a good representation power for deep recognition models.

Image Classification

Masked Autoencoders for Point Cloud Self-supervised Learning

3 code implementations 13 Mar 2022 Yatian Pang, Wenxiao Wang, Francis E. H. Tay, Wei Liu, Yonghong Tian, Li Yuan

Then, a standard Transformer based autoencoder, with an asymmetric design and a shifting mask tokens operation, learns high-level latent features from unmasked point patches, aiming to reconstruct the masked point patches.

3D Part Segmentation Few-Shot 3D Point Cloud Classification +2
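The masking step behind this kind of pre-training can be sketched as a random split of point patches into visible and masked sets; the 0.6 ratio and function name here are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def random_mask(n_patches, ratio=0.6, seed=0):
    # Randomly partition point-patch indices: the encoder sees only the
    # visible patches, and the decoder reconstructs the masked ones.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_patches)
    n_mask = int(n_patches * ratio)
    return np.sort(perm[n_mask:]), np.sort(perm[:n_mask])  # visible, masked
```

The two index sets are disjoint and cover all patches, so every patch is either encoded or reconstructed.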

Improving Vision Transformers by Revisiting High-frequency Components

1 code implementation 3 Apr 2022 Jiawang Bai, Li Yuan, Shu-Tao Xia, Shuicheng Yan, Zhifeng Li, Wei Liu

Inspired by this finding, we first investigate the effects of existing techniques for improving ViT models from a new frequency perspective, and find that the success of some techniques (e.g., RandAugment) can be attributed to the better usage of the high-frequency components.

Domain Generalization Image Classification +1

Locality Guidance for Improving Vision Transformers on Tiny Datasets

1 code implementation 20 Jul 2022 Kehan Li, Runyi Yu, Zhennan Wang, Li Yuan, Guoli Song, Jie Chen

Therefore, our locality guidance approach is very simple and efficient, and can serve as a basic performance enhancement method for VTs on tiny datasets.

Spikformer: When Spiking Neural Network Meets Transformer

2 code implementations 29 Sep 2022 Zhaokun Zhou, Yuesheng Zhu, Chao He, YaoWei Wang, Shuicheng Yan, Yonghong Tian, Li Yuan

Spikformer (66.3M parameters), with a size comparable to SEW-ResNet-152 (60.2M, 69.26%), can achieve 74.81% top-1 accuracy on ImageNet using 4 time steps, which is the state of the art among directly trained SNN models.

Image Classification

ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation

no code implementations CVPR 2023 Kehan Li, Zhennan Wang, Zesen Cheng, Runyi Yu, Yian Zhao, Guoli Song, Chang Liu, Li Yuan, Jie Chen

Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS).

Image Segmentation Unsupervised Semantic Segmentation

Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging

1 code implementation 28 Nov 2022 Li Yuan, Yi Cai, Jin Wang, Qing Li

This paper is the first to propose jointly performing MNER and MRE as a joint multimodal entity-relation extraction task (JMERE).

graph construction named-entity-recognition +5

Rethinking Point Cloud Registration as Masking and Reconstruction

1 code implementation ICCV 2023 Guangyan Chen, Meiling Wang, Li Yuan, Yi Yang, Yufeng Yue

In this paper, a critical observation is made that the invisible parts of each point cloud can be directly utilized as inherent masks, and the aligned point cloud pair can be regarded as the reconstruction target.

Point Cloud Registration

Parallel Vertex Diffusion for Unified Visual Grounding

no code implementations 13 Mar 2023 Zesen Cheng, Kehan Li, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen

An intuitive materialization of our paradigm is Parallel Vertex Diffusion (PVD) to directly set vertex coordinates as the generation target and use a diffusion model to train and infer.

Visual Grounding

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

4 code implementations ICCV 2023 Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen

Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates|query).

Retrieval Video Retrieval

Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation

no code implementations ICCV 2023 Kehan Li, Yian Zhao, Zhennan Wang, Zesen Cheng, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen

Interactive segmentation enables users to segment as needed by providing cues of objects, which introduces human-computer interaction for many fields, such as image editing and medical image analysis.

Interactive Segmentation

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

4 code implementations CVPR 2023 Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen

Contrastive learning-based video-language representation learning approaches, e.g., CLIP, have achieved outstanding performance by pursuing semantic interaction upon pre-defined video-text pairs.

Contrastive Learning Question Answering +5

Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning

1 code implementation CVPR 2023 Zeyin Song, Yifan Zhao, Yujun Shi, Peixi Peng, Li Yuan, Yonghong Tian

However, in this work, we find that the CE loss is not ideal for the base session training as it suffers from poor class separation in terms of representations, which further degrades generalization to novel classes.

Contrastive Learning Few-Shot Class-Incremental Learning +1

Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders

no code implementations 25 Apr 2023 Heng Pan, Chenyang Liu, Wenxiao Wang, Li Yuan, Hongfa Wang, Zhifeng Li, Wei Liu

To study which type of deep features is appropriate for MIM as a learning target, we propose a simple MIM framework with a series of well-trained self-supervised models to convert an Image to a feature Vector as the learning target of MIM, where the feature extractor is also known as a teacher model.

Attribute Vocal Bursts Intensity Prediction

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

1 code implementation NeurIPS 2023 Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue

Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks.

Few-Shot Learning

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

4 code implementations 20 May 2023 Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen

In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings.

Retrieval Video Retrieval

Album Storytelling with Iterative Story-aware Captioning and Large Language Models

no code implementations 22 May 2023 Munan Ning, Yujia Xie, Dongdong Chen, Zeyin Song, Lu Yuan, Yonghong Tian, Qixiang Ye, Li Yuan

One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story.

Temporal Contrastive Learning for Spiking Neural Networks

no code implementations 23 May 2023 Haonan Qiu, Zeyin Song, Yanqi Chen, Munan Ning, Wei Fang, Tao Sun, Zhengyu Ma, Li Yuan, Yonghong Tian

However, in this work, we find that the method above is not ideal for SNN training, as it omits the temporal dynamics of SNNs and degrades performance quickly as inference time steps decrease.

Contrastive Learning

ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation

no code implementations 24 May 2023 Dongxu Yue, Qin Guo, Munan Ning, Jiaxi Cui, Yuesheng Zhu, Li Yuan

Despite the successful image reconstruction achieved by diffusion-based methods, there are still challenges in effectively manipulating fine-grained facial attributes with textual instructions. To address these issues and facilitate convenient manipulation of real facial images, we propose a novel approach that conducts text-driven image editing in the semantic latent space of a diffusion model.

Attribute Image Reconstruction

Auto-Spikformer: Spikformer Architecture Search

no code implementations 1 Jun 2023 Kaiwei Che, Zhaokun Zhou, Zhengyu Ma, Wei Fang, Yanqi Chen, Shuaijie Shen, Li Yuan, Yonghong Tian

The integration of self-attention mechanisms into Spiking Neural Networks (SNNs) has garnered considerable interest in the realm of advanced deep learning, primarily due to their biological properties.

ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases

1 code implementation 28 Jun 2023 Jiaxi Cui, Zongjian Li, Yang Yan, Bohua Chen, Li Yuan

Furthermore, we propose a self-attention method to enhance the ability of large models to overcome errors present in reference data, further optimizing the issue of model hallucinations at the model level and improving the problem-solving capabilities of large models.

Language Modelling Large Language Model +1

Spike-driven Transformer

1 code implementation NeurIPS 2023 Man Yao, Jiakui Hu, Zhaokun Zhou, Li Yuan, Yonghong Tian, Bo Xu, Guoqi Li

In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition.
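Property 2 above, binary spikes turning matrix multiplication into sparse addition, can be illustrated directly; this is a generic sketch of the idea, not the paper's kernels.

```python
import numpy as np

def spike_matmul(spikes, weights):
    # With binary (0/1) spike inputs, a matrix multiply reduces to
    # summing the weight rows selected by the active inputs -- no
    # multiplications, only additions.
    out = np.zeros((spikes.shape[0], weights.shape[1]))
    for b in range(spikes.shape[0]):
        out[b] = weights[spikes[b] > 0].sum(axis=0)  # additions only
    return out
```

The result matches an ordinary dense matmul whenever the inputs are binary, while silent (all-zero) inputs trigger no accumulation at all.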

Learning Sparse Neural Networks with Identity Layers

no code implementations 14 Jul 2023 Mingjian Ni, Guangyao Chen, Xiawu Zheng, Peixi Peng, Li Yuan, Yonghong Tian

Applying such theory, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which utilizes CKA to reduce feature similarity between layers and increase network sparsity.

Fully Transformer-Equipped Architecture for End-to-End Referring Video Object Segmentation

no code implementations 21 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Xianghua Xu

Referring Video Object Segmentation (RVOS) requires segmenting the object in video referred by a natural language query.

Object Referring Video Object Segmentation +4

Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation

no code implementations 21 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Huaxin Xiao, Binbin Lin, Xianghua Xu

Unsupervised Video Object Segmentation (VOS) aims at identifying the contours of primary foreground objects in videos without any prior knowledge.

Semantic Segmentation Unsupervised Video Object Segmentation +1

Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation

no code implementations 22 Sep 2023 Ping Li, Junjie Chen, Li Yuan, Xianghua Xu, Mingli Song

To alleviate expensive human labeling, semi-supervised semantic segmentation employs a few labeled images and an abundance of unlabeled images to predict the pixel-level label map of the same size.

Feature Importance Knowledge Distillation +1

Adversarial Attacks on Video Object Segmentation with Hard Region Discovery

no code implementations 25 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Jian Zhao, Xianghua Xu, Xiaoqin Zhang

Particularly, the gradients from the segmentation model are exploited to discover the easily confused regions, where it is difficult to distinguish pixel-wise objects from the background in a frame.

Autonomous Driving Object +5

LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples

1 code implementation 2 Oct 2023 Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Li Yuan

This phenomenon forces us to revisit that hallucination may be another view of adversarial examples, and it shares similar features with conventional adversarial examples as the basic feature of LLMs.

Hallucination

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

no code implementations 10 Oct 2023 Wangbo Yu, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, WenBo Hu, Long Quan, Ying Shan, Yonghong Tian

Our contributions are twofold: First, we propose a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods.

3D Generation Image to 3D +1

IDRNet: Intervention-Driven Relation Network for Semantic Segmentation

1 code implementation NeurIPS 2023 Zhenchao Jin, Xiaowei Hu, Lingting Zhu, Luchuan Song, Li Yuan, Lequan Yu

Next, a deletion diagnostics procedure is conducted to model relations of these semantic-level representations via perceiving the network outputs and the extracted relations are utilized to guide the semantic-level representations to interact with each other.

Relation Relation Network +1

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

no code implementations 18 Oct 2023 Xinhua Cheng, Tianyu Yang, Jianan Wang, Yu Li, Lei Zhang, Jian Zhang, Li Yuan

Recent text-to-3D generation methods achieve impressive 3D content creation capacity thanks to the advances in image diffusion models and optimizing strategies.

3D Generation Text to 3D

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

4 code implementations 16 Nov 2023 Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan

In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.

Language Modelling Large Language Model +2

Advancing Vision Transformers with Group-Mix Attention

1 code implementation 26 Nov 2023 Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo

The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value.

Image Classification object-detection +2

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

1 code implementation 27 Nov 2023 Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan

Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.

Decision Making Question Answering

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting

no code implementations 4 Dec 2023 Mingyue Guo, Li Yuan, Zhaoyi Yan, Binghui Chen, YaoWei Wang, Qixiang Ye

In this study, we propose mutual prompt learning (mPrompt), which leverages a regressor and a segmenter as guidance for each other, solving bias and inaccuracy caused by annotation variance while distinguishing foreground from background.

Crowd Counting

FreestyleRet: Retrieving Images from Style-Diversified Queries

1 code implementation 5 Dec 2023 Hao Li, Curise Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan

In this paper, we propose the Style-Diversified Query-Based Image Retrieval task, which enables retrieval based on various query styles.

Image Retrieval Retrieval

Machine Mindset: An MBTI Exploration of Large Language Models

1 code implementation 20 Dec 2023 Jiaxi Cui, Liuzhenghao Lv, Jing Wen, Rongsheng Wang, Jing Tang, Yonghong Tian, Li Yuan

We present a novel approach for integrating Myers-Briggs Type Indicator (MBTI) personality traits into large language models (LLMs), addressing the challenges of personality consistency in personalized AI.

Large Language Model Personality Alignment +2

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

1 code implementation 20 Dec 2023 Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan

The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.

3D Generation Image to 3D

Peer-review-in-LLMs: Automatic Evaluation Method for LLMs in Open-environment

1 code implementation 2 Feb 2024 Kun-Peng Ning, Shuo Yang, Yu-Yang Liu, Jia-Yu Yao, Zhen-Hui Liu, Yu Wang, Ming Pang, Li Yuan

Existing large language models (LLMs) evaluation methods typically focus on testing the performance on some closed-environment and domain-specific benchmarks with human annotations.

LLMBind: A Unified Modality-Task Integration Framework

no code implementations 22 Feb 2024 Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.

Audio Generation Image Segmentation +3

Optimal ANN-SNN Conversion with Group Neurons

1 code implementation 29 Feb 2024 Liuzhenghao Lv, Wei Fang, Li Yuan, Yonghong Tian

For instance, while converting artificial neural networks (ANNs) to SNNs circumvents the need for direct training of SNNs, it encounters issues related to conversion errors and high inference time delays.

A Logical Pattern Memory Pre-trained Model for Entailment Tree Generation

1 code implementation 11 Mar 2024 Li Yuan, Yi Cai, Haopeng Ren, Jiexin Wang

LMPM incorporates an external memory structure to learn and store the latent representations of logical patterns, which aids in generating logically consistent conclusions.

Envision3D: One Image to 3D with Anchor Views Interpolation

1 code implementation 13 Mar 2024 Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan

To address this issue, we propose a novel cascade diffusion framework, which decomposes the challenging dense views generation task into two tractable stages, namely anchor views generation and anchor views interpolation.

Image to 3D

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

2 code implementations 25 Mar 2024 Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian

ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation.

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

2 code implementations 7 Apr 2024 Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo

Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.

Text-to-Video Generation Video Generation

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark

no code implementations 15 Apr 2024 Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang

To further evaluate the IAA capability of MLLMs, we construct the UNIAA-Bench, which consists of three aesthetic levels: Perception, Description, and Assessment.

Language Modelling Large Language Model
