Search Results for author: Li Yuan

Found 74 papers, 46 papers with code

Object Relation Detection Based on One-shot Learning

no code implementations 16 Jul 2018 Li Zhou, Jian Zhao, Jianshu Li, Li Yuan, Jiashi Feng

Detecting the relations among objects, such as "cat on sofa" and "person ride horse", is a crucial task in image understanding, and beneficial to bridging the semantic gap between images and natural language.

Object One-Shot Learning +1

The Unconstrained Ear Recognition Challenge 2019 - ArXiv Version With Appendix

no code implementations 11 Mar 2019 Žiga Emeršič, Aruna Kumar S. V., B. S. Harish, Weronika Gutfeter, Jalil Nourmohammadi Khiarak, Andrzej Pacut, Earnest Hansley, Mauricio Pamplona Segundo, Sudeep Sarkar, Hyeonjung Park, Gi Pyo Nam, Ig-Jae Kim, Sagar G. Sangodkar, Ümit Kaçar, Murvet Kirci, Li Yuan, Jishou Yuan, Haonan Zhao, Fei Lu, Junying Mao, Xiaoshuang Zhang, Dogucan Yaman, Fevziye Irem Eyiokur, Kadir Bulut Özler, Hazim Kemal Ekenel, Debbrota Paul Chowdhury, Sambit Bakshi, Pankaj K. Sa, Banshidhar Majhi, Peter Peer, Vitomir Štruc

The goal of the challenge is to assess the performance of existing ear recognition techniques on a challenging large-scale ear dataset and to analyze the performance of the technology from various viewpoints, such as generalization to unseen data characteristics, sensitivity to rotations, occlusions, and image resolution, and performance bias on sub-groups of subjects selected based on demographic criteria, i.e., gender and ethnicity.

Benchmarking Person Recognition

Few-shot Adaptive Faster R-CNN

no code implementations CVPR 2019 Tao Wang, Xiaopeng Zhang, Li Yuan, Jiashi Feng

To address these challenges, we first introduce a pairing mechanism over source and target features to alleviate the issue of insufficient target domain samples.

object-detection Object Detection +1

Cycle-SUM: Cycle-consistent Adversarial LSTM Networks for Unsupervised Video Summarization

no code implementations 17 Apr 2019 Li Yuan, Francis EH Tay, Ping Li, Li Zhou, Jiashi Feng

The evaluator defines a learnable information preserving metric between original video and summary video and "supervises" the selector to identify the most informative frames to form the summary video.

Unsupervised Video Summarization

Distilling Object Detectors with Fine-grained Feature Imitation

3 code implementations CVPR 2019 Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng

To address the challenge of distilling knowledge in detection model, we propose a fine-grained feature imitation method exploiting the cross-location discrepancy of feature response.

Knowledge Distillation Object +2

Central Similarity Quantization for Efficient Image and Video Retrieval

1 code implementation CVPR 2020 Li Yuan, Tao Wang, Xiaopeng Zhang, Francis EH Tay, Zequn Jie, Wei Liu, Jiashi Feng

In this work, we propose a new global similarity metric, termed central similarity, with which the hash codes of similar data pairs are encouraged to approach a common center and those of dissimilar pairs to converge to different centers, improving hash learning efficiency and retrieval accuracy.

Quantization Retrieval +1
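The central-similarity idea can be sketched numerically: assign each class a fixed binary hash center (here, rows of a Sylvester Hadamard matrix, which are mutually well separated) and penalize the gap between predicted code probabilities and the assigned center. This is an illustrative sketch, not the paper's exact implementation; the function names are ours.

```python
import numpy as np

def hadamard_centers(n_classes, n_bits):
    # Binary hash centers from a Sylvester Hadamard matrix: rows are
    # mutually orthogonal, so centers are far apart in Hamming space.
    H = np.array([[1.0]])
    while H.shape[0] < max(n_classes, n_bits):
        H = np.block([[H, H], [H, -H]])
    return (H[:n_classes, :n_bits] > 0).astype(float)  # entries in {0, 1}

def central_similarity_loss(code_probs, labels, centers):
    # Binary cross-entropy pulling each predicted hash code (given as
    # per-bit probabilities) toward the center of its class.
    targets = centers[labels]                        # (batch, n_bits)
    p = np.clip(code_probs, 1e-7, 1.0 - 1e-7)
    bce = -(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))
    return float(bce.mean())
```

Codes that already sit on their class center incur near-zero loss, while an uninformative 0.5-per-bit code pays log 2 per bit.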

Revisiting Knowledge Distillation via Label Smoothing Regularization

2 code implementations CVPR 2020 Li Yuan, Francis E. H. Tay, Guilin Li, Tao Wang, Jiashi Feng

Without any extra computation cost, Tf-KD achieves up to 0.65% improvement on ImageNet over well-established baseline models, which is superior to label smoothing regularization.

Self-Knowledge Distillation
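The paper's connection between knowledge distillation and label smoothing can be illustrated with a minimal label-smoothing cross-entropy, i.e., distilling from a "virtual teacher" that puts uniform mass on the wrong classes. The function name and the eps value are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def label_smoothing_ce(logits, target, eps=0.1):
    # Cross-entropy against a softened target: the true class keeps
    # probability 1 - eps and the remaining eps is spread uniformly over
    # the other classes -- the regularization view of a virtual teacher.
    n = logits.shape[-1]
    logp = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    soft = np.full(n, eps / (n - 1))
    soft[target] = 1.0 - eps
    return float(-np.sum(soft * logp))
```

With eps = 0 this reduces to the standard cross-entropy; larger eps penalizes over-confident predictions more.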

YNU-HPCC at SemEval-2020 Task 8: Using a Parallel-Channel Model for Memotion Analysis

1 code implementation SEMEVAL 2020 Li Yuan, Jin Wang, Xue-jie Zhang

In recent years, the growing ubiquity of Internet memes on social media platforms, such as Facebook, Instagram, and Twitter, has become a topic of immense interest.

Emotion Recognition Sentiment Analysis +2

Exploring global diverse attention via pairwise temporal relation for video summarization

no code implementations 23 Sep 2020 Ping Li, Qinghao Ye, Luming Zhang, Li Yuan, Xianghua Xu, Ling Shao

In this paper, we propose an efficient convolutional neural network architecture for video SUMmarization via Global Diverse Attention, called SUM-GDA, which adapts the attention mechanism from a global perspective to consider pairwise temporal relations of video frames.

Relation Video Summarization

Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes

no code implementations 16 Oct 2020 Li Yuan, Yichen Zhou, Shuning Chang, Ziyuan Huang, Yunpeng Chen, Xuecheng Nie, Tao Wang, Jiashi Feng, Shuicheng Yan

Prior works fail to deal with this problem in two aspects: (1) they do not utilize scene information; (2) they lack training data for crowded and complex scenes.

Action Recognition In Videos Semantic Segmentation

Towards Accurate Human Pose Estimation in Videos of Crowded Scenes

no code implementations 16 Oct 2020 Li Yuan, Shuning Chang, Xuecheng Nie, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

In this paper, we focus on improving human pose estimation in videos of crowded scenes from the perspectives of exploiting temporal context and collecting new data.

Optical Flow Estimation Pose Estimation

A Simple Baseline for Pose Tracking in Videos of Crowded Scenes

no code implementations 16 Oct 2020 Li Yuan, Shuning Chang, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Xuecheng Nie, Francis E. H. Tay, Jiashi Feng, Shuicheng Yan

This paper presents our solution to the ACM MM challenge on Large-scale Human-centric Video Analysis in Complex Events [lin2020human]; specifically, we focus on Track 3: Crowd Pose Tracking in Complex Events.

Multi-Object Tracking Optical Flow Estimation +1

Fooling the primate brain with minimal, targeted image manipulation

no code implementations 11 Nov 2020 Li Yuan, Will Xiao, Giorgia Dellaferrera, Gabriel Kreiman, Francis E. H. Tay, Jiashi Feng, Margaret S. Livingstone

Here we propose an array of methods for creating minimal, targeted image perturbations that lead to changes in both neuronal activity and perception as reflected in behavior.

Adversarial Attack Image Manipulation

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

13 code implementations ICCV 2021 Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan

To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image into tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and the token length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study.

Image Classification Language Modelling
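The layer-wise T2T step can be sketched as a "soft split": reshape the token sequence back onto its 2D grid, then re-tokenize with overlapping windows so that neighboring tokens are concatenated into one token, shrinking the sequence length while growing the embedding. Window size, stride, and padding below are illustrative defaults, not the paper's exact settings.

```python
import numpy as np

def soft_split(tokens, h, w, k=3, stride=2, pad=1):
    # One Tokens-to-Token step (a sketch): view the token sequence as an
    # h x w grid and gather overlapping k x k windows, so each new token
    # is the concatenation of k*k neighboring old tokens.
    b, n, c = tokens.shape
    grid = tokens.reshape(b, h, w, c)
    grid = np.pad(grid, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    out_h = (h + 2 * pad - k) // stride + 1
    out_w = (w + 2 * pad - k) // stride + 1
    out = np.empty((b, out_h * out_w, k * k * c))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            win = grid[:, i*stride:i*stride + k, j*stride:j*stride + k, :]
            out[:, idx] = win.reshape(b, -1)
            idx += 1
    return out  # sequence length shrinks, channel dim grows k*k-fold
```

For a 4x4 grid of 2-dim tokens this reduces 16 tokens to 4, each of dimension 18.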

Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization

1 code implementation 31 Mar 2021 Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama

It is well-known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essentially important for both optimization and generalization of deep networks.

Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition

4 code implementations 23 Jun 2021 Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng

By realizing the importance of the positional information carried by 2D feature representations, unlike recent MLP-like models that encode the spatial information along the flattened spatial dimensions, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections.
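Encoding along one spatial axis with a linear projection can be sketched as follows; the function name and the plain per-axis linear map are simplifications of the paper's Permute-MLP, which additionally splits channels into segments.

```python
import numpy as np

def mix_height(x, w_h):
    # Mix information along the height axis only: every output row is a
    # learned linear combination of the input rows, applied independently
    # at each (width, channel) position. Width mixing is symmetric.
    # x: (H, W, C) feature map, w_h: (H, H) projection matrix.
    return np.einsum('gh,hwc->gwc', w_h, x)
```

With `w_h = np.eye(H)` the map is the identity; a full block alternates height, width, and channel projections.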

VOLO: Vision Outlooker for Visual Recognition

7 code implementations 24 Jun 2021 Li Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, Shuicheng Yan

Though recently the prevailing vision transformers (ViTs) have shown great potential of self-attention based models in ImageNet classification, their performance is still inferior to that of the latest SOTA CNNs if no extra data are provided.

Domain Generalization Image Classification +1

PnP-DETR: Towards Efficient Visual Analysis with Transformers

1 code implementation ICCV 2021 Tao Wang, Li Yuan, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.

object-detection Object Detection +1

Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction

1 code implementation 17 Dec 2021 Guangyan Chen, Meiling Wang, Yufeng Yue, Qingxiang Zhang, Li Yuan

Recent Transformer-based methods have achieved advanced performance in point cloud registration by utilizing advantages of the Transformer in order-invariance and modeling dependency to aggregate information.

Geometric Matching Point Cloud Registration

DynaMixer: A Vision MLP Architecture with Dynamic Mixing

2 code implementations 28 Jan 2022 Ziyu Wang, Wenhao Jiang, Yiming Zhu, Li Yuan, Yibing Song, Wei Liu

In contrast with vision transformers and CNNs, the success of MLP-like models shows that simple information fusion operations among tokens and channels can yield a good representation power for deep recognition models.

Image Classification

Masked Autoencoders for Point Cloud Self-supervised Learning

3 code implementations 13 Mar 2022 Yatian Pang, Wenxiao Wang, Francis E. H. Tay, Wei Liu, Yonghong Tian, Li Yuan

Then, a standard Transformer based autoencoder, with an asymmetric design and a shifting mask tokens operation, learns high-level latent features from unmasked point patches, aiming to reconstruct the masked point patches.

3D Part Segmentation Few-Shot 3D Point Cloud Classification +2
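The masking step behind this kind of pre-training can be sketched as a random split of point patches into visible and masked sets; the 0.6 ratio and function name here are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def random_mask(n_patches, ratio=0.6, seed=0):
    # Randomly partition point-patch indices: the encoder sees only the
    # visible patches, and the decoder reconstructs the masked ones.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_patches)
    n_mask = int(n_patches * ratio)
    return np.sort(perm[n_mask:]), np.sort(perm[:n_mask])  # visible, masked
```

The two index sets are disjoint and cover all patches, so every patch is either encoded or reconstructed.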

Improving Vision Transformers by Revisiting High-frequency Components

1 code implementation 3 Apr 2022 Jiawang Bai, Li Yuan, Shu-Tao Xia, Shuicheng Yan, Zhifeng Li, Wei Liu

Inspired by this finding, we first investigate the effects of existing techniques for improving ViT models from a new frequency perspective, and find that the success of some techniques (e.g., RandAugment) can be attributed to the better usage of the high-frequency components.

Domain Generalization Image Classification +1

Locality Guidance for Improving Vision Transformers on Tiny Datasets

1 code implementation 20 Jul 2022 Kehan Li, Runyi Yu, Zhennan Wang, Li Yuan, Guoli Song, Jie Chen

Therefore, our locality guidance approach is very simple and efficient, and can serve as a basic performance enhancement method for VTs on tiny datasets.

Spikformer: When Spiking Neural Network Meets Transformer

2 code implementations 29 Sep 2022 Zhaokun Zhou, Yuesheng Zhu, Chao He, YaoWei Wang, Shuicheng Yan, Yonghong Tian, Li Yuan

Spikformer (66.3M parameters), with a size comparable to SEW-ResNet-152 (60.2M, 69.26%), can achieve 74.81% top-1 accuracy on ImageNet using 4 time steps, which is the state of the art among directly trained SNN models.

Image Classification

ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation

no code implementations CVPR 2023 Kehan Li, Zhennan Wang, Zesen Cheng, Runyi Yu, Yian Zhao, Guoli Song, Chang Liu, Li Yuan, Jie Chen

Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS).

Image Segmentation Unsupervised Semantic Segmentation

Joint Multimodal Entity-Relation Extraction Based on Edge-enhanced Graph Alignment Network and Word-pair Relation Tagging

1 code implementation 28 Nov 2022 Li Yuan, Yi Cai, Jin Wang, Qing Li

This paper is the first to propose jointly performing MNER and MRE as a joint multimodal entity-relation extraction task (JMERE).

graph construction named-entity-recognition +5

Rethinking Point Cloud Registration as Masking and Reconstruction

1 code implementation ICCV 2023 Guangyan Chen, Meiling Wang, Li Yuan, Yi Yang, Yufeng Yue

In this paper, a critical observation is made that the invisible parts of each point cloud can be directly utilized as inherent masks, and the aligned point cloud pair can be regarded as the reconstruction target.

Point Cloud Registration

Parallel Vertex Diffusion for Unified Visual Grounding

no code implementations 13 Mar 2023 Zesen Cheng, Kehan Li, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen

An intuitive materialization of our paradigm is Parallel Vertex Diffusion (PVD) to directly set vertex coordinates as the generation target and use a diffusion model to train and infer.

Visual Grounding

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

4 code implementations ICCV 2023 Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen

Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates|query).

Retrieval Video Retrieval

Multi-granularity Interaction Simulation for Unsupervised Interactive Segmentation

no code implementations ICCV 2023 Kehan Li, Yian Zhao, Zhennan Wang, Zesen Cheng, Peng Jin, Xiangyang Ji, Li Yuan, Chang Liu, Jie Chen

Interactive segmentation enables users to segment as needed by providing cues of objects, which introduces human-computer interaction for many fields, such as image editing and medical image analysis.

Interactive Segmentation

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

4 code implementations CVPR 2023 Peng Jin, Jinfa Huang, Pengfei Xiong, Shangxuan Tian, Chang Liu, Xiangyang Ji, Li Yuan, Jie Chen

Contrastive learning-based video-language representation learning approaches, e.g., CLIP, have achieved outstanding performance by pursuing semantic interaction upon pre-defined video-text pairs.

Contrastive Learning Question Answering +5

Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning

1 code implementation CVPR 2023 Zeyin Song, Yifan Zhao, Yujun Shi, Peixi Peng, Li Yuan, Yonghong Tian

However, in this work, we find that the CE loss is not ideal for the base session training as it suffers from poor class separation in terms of representations, which further degrades generalization to novel classes.

Contrastive Learning Few-Shot Class-Incremental Learning +1

Img2Vec: A Teacher of High Token-Diversity Helps Masked AutoEncoders

no code implementations 25 Apr 2023 Heng Pan, Chenyang Liu, Wenxiao Wang, Li Yuan, Hongfa Wang, Zhifeng Li, Wei Liu

To study which type of deep features is appropriate for MIM as a learning target, we propose a simple MIM framework with a series of well-trained self-supervised models to convert an Image to a feature Vector as the learning target of MIM, where the feature extractor is also known as a teacher model.

Attribute Vocal Bursts Intensity Prediction

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

1 code implementation NeurIPS 2023 Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue

Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks.

Few-Shot Learning

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

4 code implementations 20 May 2023 Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen

In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings.

Retrieval Video Retrieval

Album Storytelling with Iterative Story-aware Captioning and Large Language Models

no code implementations 22 May 2023 Munan Ning, Yujia Xie, Dongdong Chen, Zeyin Song, Lu Yuan, Yonghong Tian, Qixiang Ye, Li Yuan

One natural approach is to use caption models to describe each photo in the album, and then use LLMs to summarize and rewrite the generated captions into an engaging story.

Temporal Contrastive Learning for Spiking Neural Networks

no code implementations 23 May 2023 Haonan Qiu, Zeyin Song, Yanqi Chen, Munan Ning, Wei Fang, Tao Sun, Zhengyu Ma, Li Yuan, Yonghong Tian

However, in this work, we find that the method above is not ideal for SNN training, as it omits the temporal dynamics of SNNs and degrades performance quickly as inference time steps decrease.

Contrastive Learning

ChatFace: Chat-Guided Real Face Editing via Diffusion Latent Space Manipulation

no code implementations 24 May 2023 Dongxu Yue, Qin Guo, Munan Ning, Jiaxi Cui, Yuesheng Zhu, Li Yuan

Despite the successful image reconstruction achieved by diffusion-based methods, there are still challenges in effectively manipulating fine-grained facial attributes with textual instructions. To address these issues and facilitate convenient manipulation of real facial images, we propose a novel approach that conducts text-driven image editing in the semantic latent space of a diffusion model.

Attribute Image Reconstruction

Auto-Spikformer: Spikformer Architecture Search

no code implementations 1 Jun 2023 Kaiwei Che, Zhaokun Zhou, Zhengyu Ma, Wei Fang, Yanqi Chen, Shuaijie Shen, Li Yuan, Yonghong Tian

The integration of self-attention mechanisms into Spiking Neural Networks (SNNs) has garnered considerable interest in the realm of advanced deep learning, primarily due to their biological properties.

ChatLaw: Open-Source Legal Large Language Model with Integrated External Knowledge Bases

1 code implementation 28 Jun 2023 Jiaxi Cui, Zongjian Li, Yang Yan, Bohua Chen, Li Yuan

Furthermore, we propose a self-attention method to enhance the ability of large models to overcome errors present in reference data, further optimizing the issue of model hallucinations at the model level and improving the problem-solving capabilities of large models.

Language Modelling Large Language Model +1

Spike-driven Transformer

1 code implementation NeurIPS 2023 Man Yao, Jiakui Hu, Zhaokun Zhou, Li Yuan, Yonghong Tian, Bo Xu, Guoqi Li

In this paper, we incorporate the spike-driven paradigm into Transformer by the proposed Spike-driven Transformer with four unique properties: 1) Event-driven, no calculation is triggered when the input of Transformer is zero; 2) Binary spike communication, all matrix multiplications associated with the spike matrix can be transformed into sparse additions; 3) Self-attention with linear complexity at both token and channel dimensions; 4) The operations between spike-form Query, Key, and Value are mask and addition.
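Property 2 above, binary spikes turning matrix multiplication into sparse addition, can be illustrated directly; this is a generic sketch of the idea, not the paper's kernels.

```python
import numpy as np

def spike_matmul(spikes, weights):
    # With binary (0/1) spike inputs, a matrix multiply reduces to
    # summing the weight rows selected by the active inputs -- no
    # multiplications, only additions.
    out = np.zeros((spikes.shape[0], weights.shape[1]))
    for b in range(spikes.shape[0]):
        out[b] = weights[spikes[b] > 0].sum(axis=0)  # additions only
    return out
```

The result matches an ordinary dense matmul whenever the inputs are binary, while silent (all-zero) inputs trigger no accumulation at all.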

Learning Sparse Neural Networks with Identity Layers

no code implementations 14 Jul 2023 Mingjian Ni, Guangyao Chen, Xiawu Zheng, Peixi Peng, Li Yuan, Yonghong Tian

Applying such theory, we propose a plug-and-play CKA-based Sparsity Regularization for sparse network training, dubbed CKA-SR, which utilizes CKA to reduce feature similarity between layers and increase network sparsity.

Fully Transformer-Equipped Architecture for End-to-End Referring Video Object Segmentation

no code implementations 21 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Xianghua Xu

Referring Video Object Segmentation (RVOS) requires segmenting the object in video referred by a natural language query.

Object Referring Video Object Segmentation +4

Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation

no code implementations 21 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Huaxin Xiao, Binbin Lin, Xianghua Xu

Unsupervised Video Object Segmentation (VOS) aims at identifying the contours of primary foreground objects in videos without any prior knowledge.

Semantic Segmentation Unsupervised Video Object Segmentation +1

Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation

no code implementations 22 Sep 2023 Ping Li, Junjie Chen, Li Yuan, Xianghua Xu, Mingli Song

To alleviate expensive human labeling, semi-supervised semantic segmentation employs a few labeled images and an abundance of unlabeled images to predict the pixel-level label map of the same size.

Feature Importance Knowledge Distillation +1

Adversarial Attacks on Video Object Segmentation with Hard Region Discovery

no code implementations 25 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Jian Zhao, Xianghua Xu, Xiaoqin Zhang

Particularly, the gradients from the segmentation model are exploited to discover the easily confused regions, where it is difficult to distinguish pixel-wise objects from the background in a frame.

Autonomous Driving Object +5

LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples

1 code implementation 2 Oct 2023 Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Li Yuan

This phenomenon forces us to revisit that hallucination may be another view of adversarial examples, and it shares similar features with conventional adversarial examples as the basic feature of LLMs.

Hallucination

HiFi-123: Towards High-fidelity One Image to 3D Content Generation

no code implementations 10 Oct 2023 Wangbo Yu, Li Yuan, Yan-Pei Cao, Xiangjun Gao, Xiaoyu Li, WenBo Hu, Long Quan, Ying Shan, Yonghong Tian

Our contributions are twofold: First, we propose a Reference-Guided Novel View Enhancement (RGNV) technique that significantly improves the fidelity of diffusion-based zero-shot novel view synthesis methods.

3D Generation Image to 3D +1

IDRNet: Intervention-Driven Relation Network for Semantic Segmentation

1 code implementation NeurIPS 2023 Zhenchao Jin, Xiaowei Hu, Lingting Zhu, Luchuan Song, Li Yuan, Lequan Yu

Next, a deletion diagnostics procedure is conducted to model relations of these semantic-level representations via perceiving the network outputs and the extracted relations are utilized to guide the semantic-level representations to interact with each other.

Relation Relation Network +1

Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts

no code implementations 18 Oct 2023 Xinhua Cheng, Tianyu Yang, Jianan Wang, Yu Li, Lei Zhang, Jian Zhang, Li Yuan

Recent text-to-3D generation methods achieve impressive 3D content creation capacity thanks to the advances in image diffusion models and optimizing strategies.

3D Generation Text to 3D

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

4 code implementations 16 Nov 2023 Bin Lin, Yang Ye, Bin Zhu, Jiaxi Cui, Munan Ning, Peng Jin, Li Yuan

In this work, we unify visual representation into the language feature space to advance the foundational LLM towards a unified LVLM.

Language Modelling Large Language Model +2

Advancing Vision Transformers with Group-Mix Attention

1 code implementation 26 Nov 2023 Chongjian Ge, Xiaohan Ding, Zhan Tong, Li Yuan, Jiangliu Wang, Yibing Song, Ping Luo

The attention map is computed based on the mixtures of tokens and group proxies and used to re-combine the tokens and groups in Value.

Image Classification object-detection +2

Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

1 code implementation 27 Nov 2023 Munan Ning, Bin Zhu, Yujia Xie, Bin Lin, Jiaxi Cui, Lu Yuan, Dongdong Chen, Li Yuan

Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries.

Decision Making Question Answering

Regressor-Segmenter Mutual Prompt Learning for Crowd Counting

no code implementations 4 Dec 2023 Mingyue Guo, Li Yuan, Zhaoyi Yan, Binghui Chen, YaoWei Wang, Qixiang Ye

In this study, we propose mutual prompt learning (mPrompt), which leverages a regressor and a segmenter as guidance for each other, solving bias and inaccuracy caused by annotation variance while distinguishing foreground from background.

Crowd Counting

FreestyleRet: Retrieving Images from Style-Diversified Queries

1 code implementation 5 Dec 2023 Hao Li, Curise Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan

In this paper, we propose the Style-Diversified Query-Based Image Retrieval task, which enables retrieval based on various query styles.

Image Retrieval Retrieval

Machine Mindset: An MBTI Exploration of Large Language Models

1 code implementation 20 Dec 2023 Jiaxi Cui, Liuzhenghao Lv, Jing Wen, Rongsheng Wang, Jing Tang, Yonghong Tian, Li Yuan

We present a novel approach for integrating Myers-Briggs Type Indicator (MBTI) personality traits into large language models (LLMs), addressing the challenges of personality consistency in personalized AI.

Large Language Model Personality Alignment +2

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

1 code implementation 20 Dec 2023 Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan

The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.

3D Generation Image to 3D

Peer-review-in-LLMs: Automatic Evaluation Method for LLMs in Open-environment

1 code implementation 2 Feb 2024 Kun-Peng Ning, Shuo Yang, Yu-Yang Liu, Jia-Yu Yao, Zhen-Hui Liu, Yu Wang, Ming Pang, Li Yuan

Existing large language models (LLMs) evaluation methods typically focus on testing the performance on some closed-environment and domain-specific benchmarks with human annotations.

LLMBind: A Unified Modality-Task Integration Framework

no code implementations 22 Feb 2024 Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.

Audio Generation Image Segmentation +3

Optimal ANN-SNN Conversion with Group Neurons

1 code implementation 29 Feb 2024 Liuzhenghao Lv, Wei Fang, Li Yuan, Yonghong Tian

For instance, while converting artificial neural networks (ANNs) to SNNs circumvents the need for direct training of SNNs, it encounters issues related to conversion errors and high inference time delays.

A Logical Pattern Memory Pre-trained Model for Entailment Tree Generation

1 code implementation 11 Mar 2024 Li Yuan, Yi Cai, Haopeng Ren, Jiexin Wang

LMPM incorporates an external memory structure to learn and store the latent representations of logical patterns, which aids in generating logically consistent conclusions.

Envision3D: One Image to 3D with Anchor Views Interpolation

1 code implementation 13 Mar 2024 Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan

To address this issue, we propose a novel cascade diffusion framework, which decomposes the challenging dense views generation task into two tractable stages, namely anchor views generation and anchor views interpolation.

Image to 3D

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

2 code implementations 25 Mar 2024 Chenlin Zhou, Han Zhang, Zhaokun Zhou, Liutao Yu, Liwei Huang, Xiaopeng Fan, Li Yuan, Zhengyu Ma, Huihui Zhou, Yonghong Tian

ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation.

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

2 code implementations 7 Apr 2024 Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo

Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions.

Text-to-Video Generation Video Generation

UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark

no code implementations 15 Apr 2024 Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang

To further evaluate the IAA capability of MLLMs, we construct the UNIAA-Bench, which consists of three aesthetic levels: Perception, Description, and Assessment.

Language Modelling Large Language Model
