Search Results for author: Yi Yang

Found 505 papers, 248 papers with code

Benchmarking Intersectional Biases in NLP

1 code implementation NAACL 2022 John Lalor, Yi Yang, Kendall Smith, Nicole Forsgren, Ahmed Abbasi

While much work has highlighted biases embedded in state-of-the-art language models, and more recent efforts have focused on how to debias, research assessing the fairness and performance of biased/debiased models on downstream prediction tasks has been limited.

Benchmarking BIG-bench Machine Learning +1

Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts

no code implementations ACL 2022 Yue Guo, Yi Yang, Ahmed Abbasi

Specifically, we propose a variant of the beam search method to automatically search for biased prompts such that the cloze-style completions are the most different with respect to different demographic groups.


Buy Tesla, Sell Ford: Assessing Implicit Stock Market Preference in Pre-trained Language Models

no code implementations ACL 2022 Chengyu Chuang, Yi Yang

Given the prevalence of NLP models in financial decision making systems, this work raises the awareness of their potential implicit preferences in the stock markets.

Decision Making

Content-Consistent Matching for Domain Adaptive Semantic Segmentation

1 code implementation ECCV 2020 Guangrui Li, Guoliang Kang, Wu Liu, Yunchao Wei, Yi Yang

The target of CCM is to acquire those synthetic images that share similar distribution with the real ones in the target domain, so that the domain gap can be naturally alleviated by employing the content-consistent synthetic images for training.

Domain Adaptation Semantic Segmentation +1

Joint Conditional Diffusion Model for Image Restoration with Mixed Degradations

no code implementations 11 Apr 2024 Yufeng Yue, Meng Yu, Luojie Yang, Yi Yang

Image restoration is rather challenging in adverse weather conditions, especially when multiple degradations occur simultaneously.

Image Restoration

CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers

no code implementations 10 Apr 2024 Longwei Zou, Qingyang Wang, Han Zhao, Jiangang Kong, Yi Yang, Yangdong Deng

The fast-growing large scale language models are delivering unprecedented performance on almost all natural language processing tasks.


LGSDF: Continual Global Learning of Signed Distance Fields Aided by Local Updating

2 code implementations 8 Apr 2024 Yufeng Yue, Yinan Deng, Jiahui Wang, Yi Yang

Implicit reconstruction of ESDF (Euclidean Signed Distance Field) involves training a neural network to regress the signed distance from any point to the nearest obstacle, which has the advantages of lightweight storage and continuous querying.

Self-Supervised Learning

Visual Knowledge in the Big Model Era: Retrospect and Prospect

no code implementations 5 Apr 2024 Wenguan Wang, Yi Yang, Yunhe Pan

Visual knowledge is a new form of knowledge representation that can encapsulate visual concepts and their relations in a succinct, comprehensive, and interpretable manner, with a deep root in cognitive psychology.

Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks

no code implementations 4 Apr 2024 Lei Zhang, YuHang Zhou, Yi Yang, Xinbo Gao

Despite providing high-performance solutions for computer vision tasks, the deep neural network (DNN) model has been proved to be extremely vulnerable to adversarial attacks.

Adversarial Defense Adversarial Robustness +1

Improving Bird's Eye View Semantic Segmentation by Task Decomposition

no code implementations 2 Apr 2024 Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin

In the first stage, we train a BEV autoencoder to reconstruct the BEV segmentation maps given corrupted noisy latent representation, which urges the decoder to learn fundamental knowledge of typical BEV patterns.

Autonomous Driving Bird's-Eye View Semantic Segmentation +2

Clustering for Protein Representation Learning

no code implementations 30 Mar 2024 Ruijie Quan, Wenguan Wang, Fan Ma, Hehe Fan, Yi Yang

We select the highest-scoring clusters and use their medoid nodes for the next iteration of clustering, until we obtain a hierarchical and informative representation of the protein.

Clustering Protein Folding +1

Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity

no code implementations 29 Mar 2024 Ruijie Quan, Wenguan Wang, Zhibo Tian, Fan Ma, Yi Yang

Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface.

Brain Computer Interface Image Reconstruction +1

Neural Clustering based Visual Representation Learning

1 code implementation 26 Mar 2024 Guikun Chen, Xia Li, Yi Yang, Wenguan Wang

In this work, we propose feature extraction with clustering (FEC), a conceptually elegant yet surprisingly ad-hoc interpretable neural clustering framework, which views feature extraction as a process of selecting representatives from data and thus automatically captures the underlying data distribution.

Clustering Representation Learning

Clustering Propagation for Universal Medical Image Segmentation

1 code implementation 25 Mar 2024 Yuhang Ding, Liulei Li, Wenguan Wang, Yi Yang

This enables knowledge acquired from prior slices to assist in the segmentation of the current slice, further efficiently bridging the communication between remote slices using mere 2D networks.

Clustering Image Segmentation +4

Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs

no code implementations 24 Mar 2024 Zhuoyi Peng, Yi Yang

We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases.

Self-Supervised Learning Semantic Similarity +1

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing

no code implementations 24 Mar 2024 Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang

We find that the crux of the issue stems from the imprecise distribution of attention weights across designated regions, including inaccurate text-to-attribute control and attention leakage.

Attribute Video Editing

Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval

no code implementations 24 Mar 2024 Yucheng Suo, Fan Ma, Linchao Zhu, Yi Yang

The pseudo-word tokens generated in this stream are explicitly aligned with fine-grained semantics in the text embedding space.

Attribute Image Retrieval +2

Ghost Sentence: A Tool for Everyday Users to Copyright Data from Large Language Models

no code implementations 23 Mar 2024 Shuai Zhao, Linchao Zhu, Ruijie Quan, Yi Yang

Once these concealed passphrases in user documents, referred to as ghost sentences, are identified in the generated content of LLMs, users can be sure that their data was used for training.


LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels

1 code implementation 22 Mar 2024 Tuo Feng, Wenguan Wang, Fan Ma, Yi Yang

Consequently, it is essential to develop LiDAR perception methods that are both efficient and effective.

Volumetric Environment Representation for Vision-Language Navigation

1 code implementation 21 Mar 2024 Rui Liu, Wenguan Wang, Yi Yang

To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.

Multi-Task Learning Navigate +2

Beyond Surface Similarity: Detecting Subtle Semantic Shifts in Financial Narratives

no code implementations 21 Mar 2024 Jiaxin Liu, Yi Yang, Kar Yan Tam

In this paper, we introduce the Financial-STS task, a financial domain-specific NLP task designed to measure the nuanced semantic similarity between pairs of financial narratives.

Decision Making Semantic Similarity +2

OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

1 code implementation 14 Mar 2024 Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue

In this work, we propose OpenGraph, the first open-vocabulary hierarchical graph representation designed for large-scale outdoor environments.

Zero-Shot Learning

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

1 code implementation 10 Mar 2024 Wenhao Wang, Yi Yang

In this paper, we introduce VidProM, the first large-scale dataset comprising 1.67 million unique text-to-video prompts from real users.

Copy Detection Image Generation +3

DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network

no code implementations 6 Mar 2024 Xiangquan Gui, Binxuan Zhang, Li Li, Yi Yang

To solve such problems, in this paper, we (1) propose DLP-GAN (Draw Modern Chinese Landscape Photos with Generative Adversarial Network), an unsupervised cross-domain image translation framework with a novel asymmetric cycle mapping, and (2) introduce a generator based on a dense-fusion module to match different translation directions.

Generative Adversarial Network Translation

RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules

1 code implementation 5 Mar 2024 Miaomiao Li, Jiaqi Zhu, Yang Wang, Yi Yang, Yilin Li, Hongan Wang

Weakly supervised text classification (WSTC), also called zero-shot or dataless text classification, has attracted increasing attention due to its applicability in classifying a mass of texts within the dynamic and open Web environment, since it requires only a limited set of seed words (label names) for each category instead of labeled data.

Pseudo Label text-classification +1

Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States

no code implementations 15 Feb 2024 Hanyu Duan, Yi Yang, Kar Yan Tam

More specifically, we check whether and how an LLM reacts differently in its hidden states when it answers a question right versus when it hallucinates.


ProtChatGPT: Towards Understanding Proteins with Large Language Models

no code implementations 15 Feb 2024 Chao Wang, Hehe Fan, Ruijie Quan, Yi Yang

The protein first undergoes protein encoders and PLP-former to produce protein embeddings, which are then projected by the adapter to conform with the LLM.

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

no code implementations 9 Feb 2024 Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang

Specifically, we incorporate the FLAME into both 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, driving 3D Gaussian points by rigging each point to a FLAME mesh.

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

1 code implementation 8 Feb 2024 Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang

Lastly, we aggregate all the shaded instances to provide the necessary information for accurately generating multiple instances in stable diffusion (SD).

Attribute Conditional Text-to-Image Synthesis +1

BootsTAP: Bootstrapped Training for Tracking-Any-Point

2 code implementations 1 Feb 2024 Carl Doersch, Yi Yang, Dilara Gokay, Pauline Luc, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ross Goroshin, João Carreira, Andrew Zisserman

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes.

CapHuman: Capture Your Moments in Parallel Universes

1 code implementation 1 Feb 2024 Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang

Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.

Image Generation

Retrosynthesis prediction enhanced by in-silico reaction data augmentation

no code implementations 31 Jan 2024 Xu Zhang, Yiming Mo, Wenguan Wang, Yi Yang

As a response, we exploit easy-to-access unpaired data (i.e., one component of a product-reactant(s) pair) for generating in-silico paired data to facilitate model training.

Data Augmentation Retrosynthesis

DeFlow: Decoder of Scene Flow Network in Autonomous Driving

2 code implementations 29 Jan 2024 Qingwen Zhang, Yi Yang, Heng Fang, Ruoyu Geng, Patric Jensfelt

Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene, especially for aiding tasks in autonomous driving.

Autonomous Driving

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

1 code implementation 27 Jan 2024 Yixuan Tang, Yi Yang

We hope MultiHop-RAG will be a valuable resource for the community in developing effective RAG systems, thereby facilitating greater adoption of LLMs in practice.

Benchmarking Retrieval

Product-Level Try-on: Characteristics-preserving Try-on with Realistic Clothes Shading and Wrinkles

no code implementations 20 Jan 2024 Yanlong Zang, Han Yang, Jiaxu Miao, Yi Yang

Image-based virtual try-on systems, which fit new garments onto human portraits, are gaining research attention. An ideal pipeline should preserve the static features of clothes (like textures and logos) while also generating dynamic elements (e.g., shadows, folds) that adapt to the model's pose and environment. Previous works fail specifically in generating dynamic features, as they preserve the warped in-shop clothes trivially by composition with a predicted alpha mask. To break the dilemma of over-preservation and texture loss, we propose a novel diffusion-based Product-level virtual try-on pipeline, i.e., PLTON, which can preserve the fine details of logos and embroideries while producing realistic clothes shading and wrinkles. The main insights are threefold: 1) Adaptive Dynamic Rendering: We take a pre-trained diffusion model as a generative prior and tame it with image features, training a dynamic extractor from scratch to generate dynamic tokens that preserve high-fidelity semantic information.

Denoising Virtual Try-on

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

1 code implementation 19 Jan 2024 Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang

(2) Equipping the visual and text encoder with separated prompts failed to mitigate the visual-text modality gap.

Retrieval Video Retrieval

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

1 code implementation 16 Jan 2024 Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang

Recent LLM-driven visual agents mainly focus on solving image-based tasks, which limits their ability to understand dynamic scenes, making it far from real-life applications like guiding students in laboratory experiments and identifying their mistakes.


AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents

no code implementations 12 Jan 2024 Yuanzhi Liang, Linchao Zhu, Yi Yang

To address this challenge, we introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.


MS-DETR: Efficient DETR Training with Mixed Supervision

1 code implementation 8 Jan 2024 Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang

The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates.

Object object-detection +1

GD^2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

no code implementations 1 Jan 2024 Xiao Pan, Zongxin Yang, Shuai Bai, Yi Yang

Targeting these issues, we propose the GD$^2$-NeRF, a Generative Detail compensation framework via GAN and Diffusion that is both inference-time finetuning-free and with vivid plausible details.

Image to 3D Novel View Synthesis +1

Human101: Training 100+FPS Human Gaussians in 100s from 1 View

1 code implementation 23 Dec 2023 MingWei Li, Jiachen Tao, Zongxin Yang, Yi Yang

In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS.

Model Stealing Attack against Recommender System

no code implementations 18 Dec 2023 Zhihao Zhu, Rui Fan, Chenwang Wu, Yi Yang, Defu Lian, Enhong Chen

Some adversarial attacks have achieved model stealing attacks against recommender systems, to some extent, by collecting abundant training data of the target model (target data) or making a mass of queries.

Recommendation Systems

Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity

no code implementations 18 Dec 2023 Zhihao Zhu, Chenwang Wu, Rui Fan, Yi Yang, Defu Lian, Enhong Chen

Recent research demonstrates that GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.

Active Learning Graph Classification +1

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

no code implementations 12 Dec 2023 Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang

This amplifies the effect of visual tokens on text generation, especially when the relative distance is longer between visual and text tokens.

Hallucination Position +2

DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers

1 code implementation 11 Dec 2023 Sarin Chandy, Varun Gangal, Yi Yang, Gabriel Maggiotti

DYAD is based on a bespoke near-sparse matrix structure which approximates the dense "weight" matrix W that matrix-multiplies the input in the typical realization of such a layer, a.k.a. DENSE.


SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

no code implementations 10 Dec 2023 Zechuan Zhang, Zongxin Yang, Yi Yang

A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction.

Learning from One Continuous Video Stream

no code implementations 1 Dec 2023 João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman

We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling.

Data Augmentation Future prediction

AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text

no code implementations 29 Nov 2023 Jianfeng Zhang, Xuanmeng Zhang, Huichao Zhang, Jun Hao Liew, Chenxu Zhang, Yi Yang, Jiashi Feng

We study the problem of creating high-fidelity and animatable 3D avatars from only textual descriptions.

FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax

no code implementations 27 Nov 2023 Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang

Text-to-video (T2V) generation is a rapidly growing research area that aims to translate the scenes, objects, and actions within complex video text into a sequence of coherent visual frames.

Video Generation

Scalable AI Generative Content for Vehicular Network Semantic Communication

no code implementations 23 Nov 2023 Hao Feng, Yi Yang, Zhu Han

Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data.

Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation

no code implementations 21 Nov 2023 Mu Chen, Zhedong Zheng, Yi Yang

Based on such observation, we propose a depth-aware framework to explicitly leverage depth estimation to mix the categories and facilitate the two complementary tasks, i.e., segmentation and depth learning in an end-to-end manner.

Depth Estimation Scene Segmentation +2

Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement

no code implementations 20 Nov 2023 Yanyan Wei, Zhao Zhang, Jiahuan Ren, Xiaogang Xu, Richang Hong, Yi Yang, Shuicheng Yan, Meng Wang

The generalization capability of existing image restoration and enhancement (IRE) methods is constrained by the limited pre-trained datasets, making it difficult to handle agnostic inputs such as different degradation levels and scenarios beyond their design scopes.

Image Restoration Language Modelling

Cut-and-Paste: Subject-Driven Video Editing with Attention Control

no code implementations 20 Nov 2023 Zhichao Zuo, Zhao Zhang, Yan Luo, Yang Zhao, Haijun Zhang, Yi Yang, Meng Wang

This paper presents a novel framework termed Cut-and-Paste for real-world semantic video editing under the guidance of a text prompt and an additional reference image.

Object Video Editing

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields

1 code implementation 20 Nov 2023 Zhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, Yi Yang

Different from existing methods that consider cross-view and along-epipolar information independently, EVE-NeRF conducts the view-epipolar feature aggregation in an entangled manner by injecting the scene-invariant appearance continuity and geometry consistency priors to the aggregation process.

Generalizable Novel View Synthesis

Exploring the Relationship between In-Context Learning and Instruction Tuning

no code implementations 17 Nov 2023 Hanyu Duan, Yixuan Tang, Yi Yang, Ahmed Abbasi, Kar Yan Tam

In this work, we explore the relationship between ICL and IT by examining how the hidden states of LLMs change in these two paradigms.

In-Context Learning

Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads

no code implementations 17 Nov 2023 Yi Yang, Hanyu Duan, Ahmed Abbasi, John P. Lalor, Kar Yan Tam

Although a burgeoning literature has emerged on stereotypical bias mitigation in PLMs, such as work on debiasing gender and racial stereotyping, how such biases manifest and behave internally within PLMs remains largely unknown.

Fairness Language Modelling

Human-Centric Autonomous Systems With LLMs for User Command Reasoning

1 code implementation 14 Nov 2023 Yi Yang, Qingwen Zhang, Ci Li, Daniel Simões Marta, Nazre Batool, John Folkesson

The evolution of autonomous driving has made remarkable advancements in recent years, evolving into a tangible reality.

Autonomous Driving Binary Classification

Text Augmented Spatial-aware Zero-shot Referring Image Segmentation

no code implementations 27 Oct 2023 Yucheng Suo, Linchao Zhu, Yi Yang

This task aims to identify the instance mask that is most related to a referring expression without training on pixel-level annotations.

Image Segmentation Referring Expression +4

RDBench: ML Benchmark for Relational Databases

no code implementations 25 Oct 2023 Zizhao Zhang, Yi Yang, Lutong Zou, He Wen, Tao Feng, Jiaxuan You

Benefiting from high-quality datasets and standardized evaluation metrics, machine learning (ML) has achieved sustained progress and widespread applications.


PPFL: A Personalized Federated Learning Framework for Heterogeneous Population

no code implementations 22 Oct 2023 Hao Di, Yi Yang, Haishan Ye, Xiangyu Chang

Personalization aims to characterize individual preferences and is widely applied across many fields.

Personalized Federated Learning

Fast and Accurate Factual Inconsistency Detection Over Long Documents

1 code implementation 19 Oct 2023 Barrett Martin Lattimer, Patrick Chen, Xinyuan Zhang, Yi Yang

We introduce SCALE (Source Chunking Approach for Large-scale inconsistency Evaluation), a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy.

Chunking Natural Language Inference +2

Is ChatGPT a Financial Expert? Evaluating Language Models on Financial Natural Language Processing

no code implementations 19 Oct 2023 Yue Guo, Zian Xu, Yi Yang

This study compares the performance of encoder-only language models and the decoder-only language models.

Language Modelling

FinEntity: Entity-level Sentiment Classification for Financial Texts

1 code implementation 19 Oct 2023 Yixuan Tang, Yi Yang, Allen H Huang, Andy Tam, Justin Z Tang

In this work, we introduce an entity-level sentiment classification dataset, called FinEntity, that annotates financial entity spans and their sentiment (positive, neutral, and negative) in financial news.

Classification Sentiment Analysis +1

IcoCap: Improving Video Captioning by Compounding Images

no code implementations IEEE Transactions on Multimedia 2023 Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang

Video captioning is a more challenging task compared to image captioning, primarily due to differences in content density.

Ranked #5 on Video Captioning on VATEX (using extra training data)

Image Captioning Video Captioning

GETAvatar: Generative Textured Meshes for Animatable Human Avatars

no code implementations ICCV 2023 Xuanmeng Zhang, Jianfeng Zhang, Rohan Chacko, Hongyi Xu, Guoxian Song, Yi Yang, Jiashi Feng

We study the problem of 3D-aware full-body human generation, aiming at creating animatable human avatars with high-quality textures and geometries.

Image Generation

LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning

no code implementations ICCV 2023 Liulei Li, Wenguan Wang, Yi Yang

Current high-performance semantic segmentation models are purely data-driven sub-symbolic approaches and blind to the structured nature of the visual world.

Segmentation Semantic Parsing +1

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction

1 code implementation NeurIPS 2023 Zechuan Zhang, Li Sun, Zongxin Yang, Ling Chen, Yi Yang

Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing.

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

1 code implementation 18 Sep 2023 Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao

However, existing methods exhibit two limitations: 1) they address video temporal features and audio-visual interactive features separately, disregarding the inherent spatial-temporal dependence of combined audio and video, and 2) they inadequately introduce audio constraints and object-level information during the decoding stage, resulting in segmentation outcomes that fail to comply with audio directives.

Video Segmentation Video Semantic Segmentation

RMP: A Random Mask Pretrain Framework for Motion Prediction

1 code implementation 16 Sep 2023 Yi Yang, Qingwen Zhang, Thomas Gilles, Nazre Batool, John Folkesson

As the pretraining technique is growing in popularity, little work has been done on pretrained learning-based motion prediction methods in autonomous driving.

Autonomous Driving motion prediction +1

InvestLM: A Large Language Model for Investment using Financial Domain Instruction Tuning

1 code implementation 15 Sep 2023 Yi Yang, Yixuan Tang, Kar Yan Tam

We present a new financial domain large language model, InvestLM, tuned on LLaMA-65B (Touvron et al., 2023), using a carefully curated instruction dataset related to financial investment.

Language Modelling Large Language Model

MC-NeRF: Multi-Camera Neural Radiance Fields for Multi-Camera Image Acquisition Systems

no code implementations 14 Sep 2023 Yu Gao, Lutong Su, Hao Liang, Yufeng Yue, Yi Yang, Mengyin Fu

In this paper, we propose MC-NeRF, a method that enables joint optimization of both intrinsic and extrinsic parameters alongside NeRF.

Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring

1 code implementation 13 Sep 2023 Dongwei Ren, Wei Shang, Yi Yang, WangMeng Zuo

To aggregate long-term sharp features from detected sharp frames, we utilize a global Transformer with multi-scale matching capability.


Editing 3D Scenes via Text Prompts without Retraining

no code implementations 10 Sep 2023 Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

To tackle these issues, we introduce a text-driven editing method, termed DN2N, which allows for the direct acquisition of a NeRF model with universal editing capabilities, eliminating the requirement for retraining.

3D scene Editing 3D Scene Reconstruction +2

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

1 code implementation 4 Sep 2023 Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang

We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite the recent significant progress in text-based human motion generation, existing methods often prioritize fitting training motions at the expense of action diversity.

Ranked #2 on Motion Synthesis on HumanML3D (using extra training data)

Language Modelling Motion Synthesis

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

no code implementations 30 Aug 2023 Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Guangyao Zhou, Raia Hadsell, Lourdes Agapito, Jon Scholz

For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly.

AIoT-Based Drum Transcription Robot using Convolutional Neural Networks

no code implementations 29 Aug 2023 Yukun Su, Yi Yang

With the development of information technology, robot technology has made great progress in various fields.

Drum Transcription Music Transcription

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

1 code implementation ICCV 2023 Yuanyou Xu, Zongxin Yang, Yi Yang

Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS).

Object Representation Learning +6

Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation

no code implementations ICCV 2023 Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang

Recent advances in semi-supervised semantic segmentation have been heavily reliant on pseudo labeling to compensate for limited labeled data, disregarding the valuable relational knowledge among semantic concepts.

Segmentation Semi-Supervised Semantic Segmentation

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

no code implementations ICCV 2023 Jinyu Chen, Wenguan Wang, Si Liu, Hongsheng Li, Yi Yang

CCPD transfers the fundamental, point-to-point wayfinding skill that is well trained on the large-scale PointGoal task to ORAN, so as to help ORAN to better master audio-visual navigation with far fewer training samples.

Decision Making Transfer Learning +1

Compositional Feature Augmentation for Unbiased Scene Graph Generation

1 code implementation ICCV 2023 Lin Li, Guikun Chen, Jun Xiao, Yi Yang, Chunping Wang, Long Chen

Specifically, we first decompose each relation triplet feature into two components: intrinsic feature and extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively.

Graph Generation Relation +1

Bird's-Eye-View Scene Graph for Vision-Language Navigation

1 code implementation ICCV 2023 Rui Liu, Xiaohan Wang, Wenguan Wang, Yi Yang

Vision-language navigation (VLN), which entails an agent to navigate 3D environments following human instructions, has shown great advances.

Navigate Vision-Language Navigation

DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation

no code implementations 31 Jul 2023 Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli

The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved the first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.

Action Segmentation Human-Object Interaction Detection +2

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

1 code implementation ICCV 2023 Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

Our method includes an encoder-decoder transformer architecture to fuse 2D and 3D representations for achieving 2D&3D aligned results in a coarse-to-fine manner and a novel 3D joint contrastive learning approach for adding explicitly global supervision for the 3D feature space.

Contrastive Learning Human Mesh Recovery

Clustering based Point Cloud Representation Learning for 3D Analysis

1 code implementation ICCV 2023 Tuo Feng, Wenguan Wang, Xiaohan Wang, Yi Yang, Qinghua Zheng

The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations.

Clustering Point Cloud Segmentation +2

Tachikuma: Understading Complex Interactions with Multi-Character and Novel Objects by Large Language Models

1 code implementation 24 Jul 2023 Yuanzhi Liang, Linchao Zhu, Yi Yang

MOE challenges models to understand characters' intentions and accurately determine their actions within intricate contexts involving multi-character and novel object interactions.

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering

no code implementations ICCV 2023 Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang

However, such SPC-based representation i) optimizes under the volatile observation space which leads to the pose-misalignment between training and inference stages, and ii) lacks the global relationships among human parts that is critical for handling the incomplete painted SMPL.

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

no code implementations 13 Jul 2023 Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia

Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars.

Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration

1 code implementation 10 Jul 2023 Meng Li, Yahan Yu, Yi Yang, Guanghao Ren, Jian Wang

In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration.

Image Registration Semantic Segmentation

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation

no code implementations 5 Jul 2023 Jiahao Li, Yuanyou Xu, Zongxin Yang, Yi Yang, Yueting Zhuang

The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object segmentation.

Object Position +4

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking

no code implementations 5 Jul 2023 Yuanyou Xu, Jiahao Li, Zongxin Yang, Yi Yang, Yueting Zhuang

MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8.

Object Segmentation +4

Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition

1 code implementation 3 Jul 2023 Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang

In real-world scenarios, collected and annotated data often exhibit the characteristics of multiple classes and long-tailed distribution.

Learning with noisy labels Multi-Label Classification +1

Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023

1 code implementation 15 Jun 2023 Jiayi Shao, Xiaohan Wang, Ruijie Quan, Yi Yang

This report presents ReLER submission to two tracks in the Ego4D Episodic Memory Benchmark in CVPR 2023, including Natural Language Queries and Moment Queries.

Moment Queries Natural Language Queries

Shuffled Autoregression For Motion Interpolation

no code implementations 10 Jun 2023 Shuo Huang, Jia Jia, Zongxin Yang, Wei Wang, Haozhe Wu, Yi Yang, Junliang Xing

However, motion interpolation is a more complex problem that takes isolated poses (e.g., only one start pose and one end pose) as input.

Motion Interpolation

Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval

no code implementations 3 Jun 2023 Xu Zhang, Zhedong Zheng, Xiaohan Wang, Yi Yang

We propose a novel Consensus Network (Css-Net) that self-adaptively learns from noisy triplets to minimize the negative effects of triplet ambiguity.

Image Retrieval Image Retrieval with Multi-Modal Query +1

A Feature Reuse Framework with Texture-adaptive Aggregation for Reference-based Super-Resolution

1 code implementation 2 Jun 2023 Xiaoyong Mei, Yi Yang, Ming Li, Changqin Huang, Kai Zhang, Pietro Lió

In this study, we propose a feature reuse framework that guides the step-by-step texture reconstruction process through different stages, reducing the negative impacts of perceptual and adversarial loss.

Image Super-Resolution Reference-based Super-Resolution

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

1 code implementation 29 May 2023 Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

Given a single test sample, the VLM is forced to maximize the CLIP reward between the input and sampled results from the VLM output distribution.

Image Captioning Image Classification +5

Whitening-based Contrastive Learning of Sentence Embeddings

1 code implementation 28 May 2023 Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang

Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning due to better alignment.

Contrastive Learning Semantic Textual Similarity +4

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

1 code implementation 23 May 2023 Shuai Zhao, Xiaohan Wang, Linchao Zhu, Ruijie Quan, Yi Yang

With such merits, we transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon image and text encoders of CLIP.

Ranked #1 on Scene Text Recognition on WOST (using extra training data)

Language Modelling Scene Text Recognition

Gloss-Free End-to-End Sign Language Translation

1 code implementation 22 May 2023 Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, Yi Yang

In this paper, we tackle the problem of sign language translation (SLT) without gloss annotations.

Sign Language Translation Translation

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

no code implementations 22 May 2023 Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng

Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.

Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)

Question Answering Retrieval +6

PTGB: Pre-Train Graph Neural Networks for Brain Network Analysis

1 code implementation 20 May 2023 Yi Yang, Hejie Cui, Carl Yang

The human brain is the central hub of the neurobiological system, controlling behavior and cognition in complex ways.

Transfer Learning Unsupervised Pre-training

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

1 code implementation NeurIPS 2023 Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue

Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks.

Few-Shot Learning

Pyramid Diffusion Models For Low-light Image Enhancement

1 code implementation 17 May 2023 Dewei Zhou, Zongxin Yang, Yi Yang

Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement.

Denoising Image Generation +1

Segment and Track Anything

1 code implementation 11 May 2023 Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang

This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.

Autonomous Driving Object Tracking

Video Object Segmentation in Panoptic Wild Scenes

2 code implementations 8 May 2023 Yuanyou Xu, Zongxin Yang, Yi Yang

Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales.

Object Semantic Segmentation +2

Feature-compatible Progressive Learning for Video Copy Detection

2 code implementations 20 Apr 2023 Wenhao Wang, Yifan Sun, Yi Yang

Video Copy Detection (VCD) has been developed to identify instances of unauthorized or duplicated video content.

Copy Detection Video Similarity

DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection

no code implementations CVPR 2023 Zongheng Tang, Yifan Sun, Si Liu, Yi Yang

Second, through our design, the object queries and the foreground query in the decoder share consensus on the class semantics, therefore making the strong and weak supervision mutually benefit each other for domain alignment.

Object Detection Weakly Supervised Object Detection

TransHP: Image Classification with Hierarchical Prompting

1 code implementation NeurIPS 2023 Wenhao Wang, Yifan Sun, Wei Li, Yi Yang

This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.

Classification Image Classification

Efficient Multimodal Fusion via Interactive Prompting

no code implementations CVPR 2023 Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang

Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era.

PVD-AL: Progressive Volume Distillation with Active Learning for Efficient Conversion Between Different NeRF Architectures

1 code implementation 8 Apr 2023 Shuangkang Fang, Yufeng Wang, Yi Yang, Weixin Xu, Heng Wang, Wenrui Ding, Shuchang Zhou

To address this limitation and maximize the potential of each architecture, we propose Progressive Volume Distillation with Active Learning (PVD-AL), a systematic distillation method that enables any-to-any conversions between different architectures.

3D Reconstruction Novel View Synthesis

GIF: A General Graph Unlearning Strategy via Influence Function

1 code implementation 6 Apr 2023 Jiancan Wu, Yi Yang, Yuchun Qian, Yongduo Sui, Xiang Wang, Xiangnan He

Then, we recognize the crux to the inability of traditional influence function for graph unlearning, and devise Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to a $\epsilon$-mass perturbation in deleted data.

Machine Unlearning

Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time

2 code implementations CVPR 2023 Wei Shang, Dongwei Ren, Yi Yang, Hongzhi Zhang, Kede Ma, WangMeng Zuo

Moreover, on the seemingly implausible x16 interpolation task, our method outperforms existing methods by more than 1.5 dB in terms of PSNR.

Contrastive Learning Deblurring +2

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

1 code implementation CVPR 2023 Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

However, using a single kind of modeling structure is difficult to balance the learning of short-term and long-term temporal correlations, and may bias the network to one of them, leading to undesirable predictions like global location shift, temporal inconsistency, and insufficient local details.

3D human pose and shape estimation

CRRS: Concentric Rectangles Regression Strategy for Multi-point Representation on Fisheye Images

no code implementations 26 Mar 2023 Xihan Wang, Xi Xu, Yu Gao, Yi Yang, Yufeng Yue, Mengyin Fu

Compared with previous work on multi-point representation, the experiments show that CRRS can improve training performance in both accuracy and stability.


Exploring Expression-related Self-supervised Learning for Affective Behaviour Analysis

1 code implementation 18 Mar 2023 Fanglei Xue, Yifan Sun, Yi Yang

This paper explores an expression-related self-supervised learning (SSL) method (ContraWarping) to perform expression classification in the 5th Affective Behavior Analysis in-the-wild (ABAW) competition.

Self-Supervised Learning

Unsupervised Facial Expression Representation Learning with Contrastive Local Warping

1 code implementation 16 Mar 2023 Fanglei Xue, Yifan Sun, Yi Yang

Therefore, given a facial image, ContraWarping employs some global transformations and local warping to generate its positive and negative samples and sets up a novel contrastive learning framework.

Contrastive Learning Facial Expression Recognition +4

Lana: A Language-Capable Navigator for Instruction Following and Generation

1 code implementation CVPR 2023 Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang

Recently, visual-language navigation (VLN) -- entailing robot agents to follow navigation instructions -- has shown great advance.

Instruction Following Text Generation

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

1 code implementation 6 Mar 2023 Wei Li, Linchao Zhu, Longyin Wen, Yi Yang

This decoder is both data-efficient and computation-efficient: 1) it only requires the text data for training, easing the burden on the collection of paired data.

Image Captioning Text Generation

Soft Prompt Guided Joint Learning for Cross-Domain Sentiment Analysis

no code implementations 1 Mar 2023 Jingli Shi, Weihua Li, Quan Bai, Yi Yang, Jianhua Jiang

Aspect term extraction is a fundamental task in fine-grained sentiment analysis, which aims at detecting customer's opinion targets from reviews on product or service.

Sentiment Analysis Term Extraction +1

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding

no code implementations 22 Jan 2023 Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang

To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

Temporal Perceiving Video-Language Pre-training

no code implementations 18 Jan 2023 Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang

Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization which matches the subset of texts with the video features.

Contrastive Learning Moment Retrieval +7

Analogical Inference Enhanced Knowledge Graph Embedding

1 code implementation 3 Jan 2023 Zhen Yao, Wen Zhang, Mingyang Chen, Yufeng Huang, Yi Yang, Huajun Chen

And in AnKGE, we train an analogy function for each level of analogical inference with the original element embedding from a well-trained KGE model as input, which outputs the analogical object embedding.

Knowledge Graph Embedding Knowledge Graphs +1

Knowledge-guided Causal Intervention for Weakly-supervised Object Localization

1 code implementation 3 Jan 2023 Feifei Shao, Yawei Luo, Fei Gao, Yi Yang, Jun Xiao

Previous weakly-supervised object localization (WSOL) methods aim to expand activation map discriminative areas to cover the whole objects, yet neglect two inherent challenges when relying solely on image-level labels.

Knowledge Distillation Object +1

MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects

1 code implementation ICCV 2023 Yuanzhi Liang, Xiaohan Wang, Linchao Zhu, Yi Yang

Experimental results and visualizations, based on a large-scale dataset PartNet-Mobility, show the effectiveness of MAAL in learning multi-modal data and solving the 3D articulated object affordance problem.


Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection

1 code implementation ICCV 2023 Liangqi Li, Jiaxu Miao, Dahu Shi, Wenming Tan, Ye Ren, Yi Yang, ShiLiang Pu

Current methods for open-vocabulary object detection (OVOD) rely on a pre-trained vision-language model (VLM) to acquire the recognition ability.

Knowledge Distillation Language Modelling +2

Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation

1 code implementation ICCV 2023 Heng Zhao, Shenxing Wei, Dahu Shi, Wenming Tan, Zheyang Li, Ye Ren, Xing Wei, Yi Yang, ShiLiang Pu

Taking the symmetry properties of objects into consideration, we design a symmetry-aware matching loss to facilitate the learning of dense point-wise geometry features and improve the performance considerably.

6D Pose Estimation 6D Pose Estimation using RGB +3

Adversarially Masking Synthetic To Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation

no code implementations CVPR 2023 Guangrui Li, Guoliang Kang, Xiaohan Wang, Yunchao Wei, Yi Yang

With the help of adversarial training, the masking module can learn to generate source masks to mimic the pattern of irregular target noise, thereby narrowing the domain gap.

Point Cloud Segmentation Semantic Segmentation

Rethinking Point Cloud Registration as Masking and Reconstruction

1 code implementation ICCV 2023 Guangyan Chen, Meiling Wang, Li Yuan, Yi Yang, Yufeng Yue

In this paper, a critical observation is made that the invisible parts of each point cloud can be directly utilized as inherent masks, and the aligned point cloud pair can be regarded as the reconstruction target.

Point Cloud Registration

ProD: Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification

no code implementations CVPR 2023 Tianyi Ma, Yifan Sun, Zongxin Yang, Yi Yang

Based on these two common practices, the key point of ProD is using the prompting mechanism in the transformer to disentangle the domain-general (DG) and domain-specific (DS) knowledge from the backbone feature.

Cross-Domain Few-Shot Domain Generalization +1

PointListNet: Deep Learning on 3D Point Lists

no code implementations CVPR 2023 Hehe Fan, Linchao Zhu, Yi Yang, Mohan Kankanhalli

Deep neural networks on regular 1D lists (e.g., natural languages) and irregular 3D sets (e.g., point clouds) have made tremendous achievements.

Context-Aware Pretraining for Efficient Blind Image Decomposition

1 code implementation CVPR 2023 Chao Wang, Zhedong Zheng, Ruijie Quan, Yifan Sun, Yi Yang

(2) The conventional paradigm usually focuses on mining the abnormal pattern of a superimposed image to separate the noise, which de facto conflicts with the primary image restoration task.

Attribute Image Reconstruction +1

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations CVPR 2023 Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Action Classification Action Recognition +3

StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition

no code implementations 25 Dec 2022 Xiaolong Shen, Zhedong Zheng, Yi Yang

As its name suggests, it is made up of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling.

Optical Flow Estimation Sign Language Recognition

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

1 code implementation CVPR 2023 Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou

To build Video Question Answering (VideoQA) systems capable of assisting humans in daily activities, seeking answers from long-form videos with diverse and complex events is a must.

Question Answering Video Question Answering +2

One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation

1 code implementation 29 Nov 2022 Shuangkang Fang, Weixin Xu, Heng Wang, Yi Yang, Yufeng Wang, Shuchang Zhou

In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions.

Ranked #1 on Novel View Synthesis on NeRF (Average PSNR metric)

3D Reconstruction Neural Rendering +1

A Light-weight, Effective and Efficient Model for Label Aggregation in Crowdsourcing

no code implementations 19 Nov 2022 Yi Yang, Zhong-Qiu Zhao, Quan Bai, Qing Liu, Weihua Li

Due to the dynamic nature, the proposed algorithms can also estimate true labels online without re-visiting historical data.

Stereo Image Rain Removal via Dual-View Mutual Attention

no code implementations 18 Nov 2022 Yanyan Wei, Zhao Zhang, ZhongQiu Zhao, Yang Zhao, Richang Hong, Yi Yang

Stereo images, containing left and right view images with disparity, are utilized in solving low-vision tasks recently, e.g., rain removal and super-resolution.