Search Results for author: Yi Yang

Found 487 papers, 242 papers with code

Benchmarking Intersectional Biases in NLP

1 code implementation NAACL 2022 John Lalor, Yi Yang, Kendall Smith, Nicole Forsgren, Ahmed Abbasi

While much work has highlighted biases embedded in state-of-the-art language models, and more recent efforts have focused on how to debias, research assessing the fairness and performance of biased/debiased models on downstream prediction tasks has been limited.

Benchmarking BIG-bench Machine Learning +1

Buy Tesla, Sell Ford: Assessing Implicit Stock Market Preference in Pre-trained Language Models

no code implementations ACL 2022 Chengyu Chuang, Yi Yang

Given the prevalence of NLP models in financial decision making systems, this work raises the awareness of their potential implicit preferences in the stock markets.

Decision Making

Content-Consistent Matching for Domain Adaptive Semantic Segmentation

1 code implementation ECCV 2020 Guangrui Li, Guoliang Kang, Wu Liu, Yunchao Wei, Yi Yang

The target of CCM is to acquire those synthetic images that share similar distribution with the real ones in the target domain, so that the domain gap can be naturally alleviated by employing the content-consistent synthetic images for training.

Domain Adaptation Semantic Segmentation +1

Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts

no code implementations ACL 2022 Yue Guo, Yi Yang, Ahmed Abbasi

Specifically, we propose a variant of the beam search method to automatically search for biased prompts such that the cloze-style completions are the most different with respect to different demographic groups.

Fairness

OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

1 code implementation 14 Mar 2024 Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue

Environment maps endowed with sophisticated semantics are pivotal for facilitating seamless interaction between robots and humans, enabling them to effectively carry out various tasks.

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

1 code implementation 10 Mar 2024 Wenhao Wang, Yi Yang

In this paper, we introduce VidProM, the first large-scale dataset comprising 1.67 million unique text-to-video prompts from real users.

Copy Detection Image Generation +3

DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network

no code implementations 6 Mar 2024 Xiangquan Gui, Binxuan Zhang, Li Li, Yi Yang

To solve such problems, in this paper, we (1) propose DLP-GAN (Draw Modern Chinese Landscape Photos with Generative Adversarial Network), an unsupervised cross-domain image translation framework with a novel asymmetric cycle mapping, and (2) introduce a generator based on a dense-fusion module to match different translation directions.

Generative Adversarial Network Translation

RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules

1 code implementation 5 Mar 2024 Miaomiao Li, Jiaqi Zhu, Yang Wang, Yi Yang, Yilin Li, Hongan Wang

Weakly supervised text classification (WSTC), also called zero-shot or dataless text classification, has attracted increasing attention due to its applicability in classifying a mass of texts within the dynamic and open Web environment, since it requires only a limited set of seed words (label names) for each category instead of labeled data.

Pseudo Label text-classification +1

ProtChatGPT: Towards Understanding Proteins with Large Language Models

no code implementations 15 Feb 2024 Chao Wang, Hehe Fan, Ruijie Quan, Yi Yang

The protein first undergoes protein encoders and PLP-former to produce protein embeddings, which are then projected by the adapter to conform with the LLM.

Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States

no code implementations 15 Feb 2024 Hanyu Duan, Yi Yang, Kar Yan Tam

More specifically, we check whether and how an LLM reacts differently in its hidden states when it answers a question right versus when it hallucinates.

Hallucination

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

no code implementations 9 Feb 2024 Zhenglin Zhou, Fan Ma, Hehe Fan, Yi Yang

Specifically, we incorporate the FLAME into both 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, driving 3D Gaussian points by rigging each point to a FLAME mesh.

MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis

1 code implementation 8 Feb 2024 Dewei Zhou, You Li, Fan Ma, Xiaoting Zhang, Yi Yang

Lastly, we aggregate all the shaded instances to provide the necessary information for accurately generating multiple instances in stable diffusion (SD).

Attribute Conditional Text-to-Image Synthesis +1

CapHuman: Capture Your Moments in Parallel Universes

1 code implementation 1 Feb 2024 Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang

Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.

Image Generation

BootsTAP: Bootstrapped Training for Tracking-Any-Point

2 code implementations 1 Feb 2024 Carl Doersch, Yi Yang, Dilara Gokay, Pauline Luc, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ross Goroshin, João Carreira, Andrew Zisserman

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes.

Retrosynthesis prediction enhanced by in-silico reaction data augmentation

no code implementations 31 Jan 2024 Xu Zhang, Yiming Mo, Wenguan Wang, Yi Yang

As a response, we exploit easy-to-access unpaired data (i.e., one component of product-reactant(s) pair) for generating in-silico paired data to facilitate model training.

Data Augmentation Retrosynthesis

DeFlow: Decoder of Scene Flow Network in Autonomous Driving

1 code implementation 29 Jan 2024 Qingwen Zhang, Yi Yang, Heng Fang, Ruoyu Geng, Patric Jensfelt

Scene flow estimation determines a scene's 3D motion field, by predicting the motion of points in the scene, especially for aiding tasks in autonomous driving.

Autonomous Driving

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

1 code implementation 27 Jan 2024 Yixuan Tang, Yi Yang

We hope MultiHop-RAG will be a valuable resource for the community in developing effective RAG systems, thereby facilitating greater adoption of LLMs in practice.

Benchmarking Retrieval

Product-Level Try-on: Characteristics-preserving Try-on with Realistic Clothes Shading and Wrinkles

no code implementations 20 Jan 2024 Yanlong Zang, Han Yang, Jiaxu Miao, Yi Yang

Image-based virtual try-on systems, which fit new garments onto human portraits, are gaining research attention. An ideal pipeline should preserve the static features of clothes (like textures and logos) while also generating dynamic elements (e.g., shadows, folds) that adapt to the model's pose and environment. Previous works fail specifically in generating dynamic features, as they trivially preserve the warped in-shop clothes by composition with a predicted alpha mask. To break the dilemma of over-preservation and texture loss, we propose a novel diffusion-based Product-level virtual try-on pipeline, i.e., PLTON, which can preserve the fine details of logos and embroideries while producing realistic clothes shading and wrinkles. The main insights are threefold: 1) Adaptive Dynamic Rendering: We take a pre-trained diffusion model as a generative prior and tame it with image features, training a dynamic extractor from scratch to generate dynamic tokens that preserve high-fidelity semantic information.

Denoising Virtual Try-on

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

1 code implementation 19 Jan 2024 Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang

(2) Equipping the visual and text encoder with separated prompts failed to mitigate the visual-text modality gap.

Retrieval Video Retrieval

DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models

1 code implementation 16 Jan 2024 Zongxin Yang, Guikun Chen, Xiaodi Li, Wenguan Wang, Yi Yang

Recent LLM-driven visual agents mainly focus on solving image-based tasks, which limits their ability to understand dynamic scenes and keeps them far from real-life applications like guiding students in laboratory experiments and identifying their mistakes.

Scheduling

AntEval: Evaluation of Social Interaction Competencies in LLM-Driven Agents

no code implementations 12 Jan 2024 Yuanzhi Liang, Linchao Zhu, Yi Yang

To address this challenge, we introduce the Multi-Agent Interaction Evaluation Framework (AntEval), encompassing a novel interaction framework and evaluation methods.

Informativeness

MS-DETR: Efficient DETR Training with Mixed Supervision

1 code implementation 8 Jan 2024 Chuyang Zhao, Yifan Sun, Wenhao Wang, Qiang Chen, Errui Ding, Yi Yang, Jingdong Wang

The traditional training procedure using one-to-one supervision in the original DETR lacks direct supervision for the object detection candidates.

Object object-detection +1

GD^2-NeRF: Generative Detail Compensation via GAN and Diffusion for One-shot Generalizable Neural Radiance Fields

no code implementations 1 Jan 2024 Xiao Pan, Zongxin Yang, Shuai Bai, Yi Yang

Targeting these issues, we propose GD^2-NeRF, a Generative Detail compensation framework via GAN and Diffusion that requires no inference-time finetuning and produces vivid, plausible details.

Image to 3D Novel View Synthesis +1

Human101: Training 100+FPS Human Gaussians in 100s from 1 View

1 code implementation 23 Dec 2023 MingWei Li, Jiachen Tao, Zongxin Yang, Yi Yang

In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS.

Model Stealing Attack against Recommender System

no code implementations 18 Dec 2023 Zhihao Zhu, Rui Fan, Chenwang Wu, Yi Yang, Defu Lian, Enhong Chen

Some adversarial attacks have achieved model stealing attacks against recommender systems, to some extent, by collecting abundant training data of the target model (target data) or making a mass of queries.

Recommendation Systems

Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity

no code implementations 18 Dec 2023 Zhihao Zhu, Chenwang Wu, Rui Fan, Yi Yang, Defu Lian, Enhong Chen

Recent research demonstrates that GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.

Active Learning Graph Classification +1

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

no code implementations 12 Dec 2023 Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang

This amplifies the effect of visual tokens on text generation, especially when the relative distance is longer between visual and text tokens.

Hallucination Position +2

DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers

1 code implementation 11 Dec 2023 Sarin Chandy, Varun Gangal, Yi Yang, Gabriel Maggiotti

DYAD is based on a bespoke near-sparse matrix structure which approximates the dense "weight" matrix W that matrix-multiplies the input in the typical realization of such a layer, a.k.a. DENSE.

Descriptive

SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

no code implementations 10 Dec 2023 Zechuan Zhang, Zongxin Yang, Yi Yang

A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction.

Learning from One Continuous Video Stream

no code implementations 1 Dec 2023 João Carreira, Michael King, Viorica Pătrăucean, Dilara Gokay, Cătălin Ionescu, Yi Yang, Daniel Zoran, Joseph Heyward, Carl Doersch, Yusuf Aytar, Dima Damen, Andrew Zisserman

We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling.

Data Augmentation Future prediction

AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text

no code implementations 29 Nov 2023 Jianfeng Zhang, Xuanmeng Zhang, Huichao Zhang, Jun Hao Liew, Chenxu Zhang, Yi Yang, Jiashi Feng

We study the problem of creating high-fidelity and animatable 3D avatars from only textual descriptions.

FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax

no code implementations 27 Nov 2023 Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang

Text-to-video (T2V) generation is a rapidly growing research area that aims to translate the scenes, objects, and actions within complex video text into a sequence of coherent visual frames.

Video Generation

Scalable AI Generative Content for Vehicular Network Semantic Communication

no code implementations 23 Nov 2023 Hao Feng, Yi Yang, Zhu Han

Experimental results suggest that the proposed method surpasses the baseline in perceiving vehicles in blind spots and effectively compresses communication data.

Transferring to Real-World Layouts: A Depth-aware Framework for Scene Adaptation

no code implementations 21 Nov 2023 Mu Chen, Zhedong Zheng, Yi Yang

Based on such observation, we propose a depth-aware framework to explicitly leverage depth estimation to mix the categories and facilitate the two complementary tasks, i.e., segmentation and depth learning in an end-to-end manner.

Depth Estimation Scene Segmentation +2

Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields

1 code implementation 20 Nov 2023 Zhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, Yi Yang

Different from existing methods that consider cross-view and along-epipolar information independently, EVE-NeRF conducts the view-epipolar feature aggregation in an entangled manner by injecting the scene-invariant appearance continuity and geometry consistency priors to the aggregation process.

Generalizable Novel View Synthesis

Clarity ChatGPT: An Interactive and Adaptive Processing System for Image Restoration and Enhancement

no code implementations 20 Nov 2023 Yanyan Wei, Zhao Zhang, Jiahuan Ren, Xiaogang Xu, Richang Hong, Yi Yang, Shuicheng Yan, Meng Wang

The generalization capability of existing image restoration and enhancement (IRE) methods is constrained by the limited pre-trained datasets, making it difficult to handle agnostic inputs such as different degradation levels and scenarios beyond their design scopes.

Image Restoration Language Modelling

Cut-and-Paste: Subject-Driven Video Editing with Attention Control

no code implementations 20 Nov 2023 Zhichao Zuo, Zhao Zhang, Yan Luo, Yang Zhao, Haijun Zhang, Yi Yang, Meng Wang

This paper presents a novel framework termed Cut-and-Paste for real-world semantic video editing under the guidance of a text prompt and an additional reference image.

Object Video Editing

Exploring the Relationship between In-Context Learning and Instruction Tuning

no code implementations 17 Nov 2023 Hanyu Duan, Yixuan Tang, Yi Yang, Ahmed Abbasi, Kar Yan Tam

In this work, we explore the relationship between ICL and IT by examining how the hidden states of LLMs change in these two paradigms.

In-Context Learning

Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads

no code implementations 17 Nov 2023 Yi Yang, Hanyu Duan, Ahmed Abbasi, John P. Lalor, Kar Yan Tam

Although a burgeoning literature has emerged on stereotypical bias mitigation in PLMs, such as work on debiasing gender and racial stereotyping, how such biases manifest and behave internally within PLMs remains largely unknown.

Fairness Language Modelling

Human-Centric Autonomous Systems With LLMs for User Command Reasoning

1 code implementation 14 Nov 2023 Yi Yang, Qingwen Zhang, Ci Li, Daniel Simões Marta, Nazre Batool, John Folkesson

The evolution of autonomous driving has made remarkable advancements in recent years, evolving into a tangible reality.

Autonomous Driving Binary Classification

Text Augmented Spatial-aware Zero-shot Referring Image Segmentation

no code implementations 27 Oct 2023 Yucheng Suo, Linchao Zhu, Yi Yang

This task aims to identify the instance mask that is most related to a referring expression without training on pixel-level annotations.

Image Segmentation Referring Expression +4

RDBench: ML Benchmark for Relational Databases

no code implementations 25 Oct 2023 Zizhao Zhang, Yi Yang, Lutong Zou, He Wen, Tao Feng, Jiaxuan You

Benefiting from high-quality datasets and standardized evaluation metrics, machine learning (ML) has achieved sustained progress and widespread applications.

Benchmarking

PPFL: A Personalized Federated Learning Framework for Heterogeneous Population

no code implementations 22 Oct 2023 Hao Di, Yi Yang, Haishan Ye, Xiangyu Chang

Personalization aims to characterize individual preferences and is widely applied across many fields.

Personalized Federated Learning

Fast and Accurate Factual Inconsistency Detection Over Long Documents

1 code implementation 19 Oct 2023 Barrett Martin Lattimer, Patrick Chen, Xinyuan Zhang, Yi Yang

We introduce SCALE (Source Chunking Approach for Large-scale inconsistency Evaluation), a task-agnostic model for detecting factual inconsistencies using a novel chunking strategy.

Chunking Natural Language Inference +2

FinEntity: Entity-level Sentiment Classification for Financial Texts

1 code implementation 19 Oct 2023 Yixuan Tang, Yi Yang, Allen H Huang, Andy Tam, Justin Z Tang

In this work, we introduce an entity-level sentiment classification dataset, called FinEntity, that annotates financial entity spans and their sentiment (positive, neutral, and negative) in financial news.

Classification Sentiment Analysis +1

Is ChatGPT a Financial Expert? Evaluating Language Models on Financial Natural Language Processing

no code implementations 19 Oct 2023 Yue Guo, Zian Xu, Yi Yang

This study compares the performance of encoder-only language models and the decoder-only language models.

Language Modelling

IcoCap: Improving Video Captioning by Compounding Images

no code implementations IEEE Transactions on Multimedia 2023 Yuanzhi Liang, Linchao Zhu, Xiaohan Wang, Yi Yang

Video captioning is a more challenging task compared to image captioning, primarily due to differences in content density.

Ranked #4 on Video Captioning on VATEX (using extra training data)

Image Captioning Video Captioning

GETAvatar: Generative Textured Meshes for Animatable Human Avatars

no code implementations ICCV 2023 Xuanmeng Zhang, Jianfeng Zhang, Rohan Chacko, Hongyi Xu, Guoxian Song, Yi Yang, Jiashi Feng

We study the problem of 3D-aware full-body human generation, aiming at creating animatable human avatars with high-quality textures and geometries.

Image Generation

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction

1 code implementation NeurIPS 2023 Zechuan Zhang, Li Sun, Zongxin Yang, Ling Chen, Yi Yang

Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing.

LOGICSEG: Parsing Visual Semantics with Neural Logic Learning and Reasoning

no code implementations ICCV 2023 Liulei Li, Wenguan Wang, Yi Yang

Current high-performance semantic segmentation models are purely data-driven sub-symbolic approaches and blind to the structured nature of the visual world.

Segmentation Semantic Parsing +1

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

1 code implementation 18 Sep 2023 Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao

However, existing methods exhibit two limitations: 1) they address video temporal features and audio-visual interactive features separately, disregarding the inherent spatial-temporal dependence of combined audio and video, and 2) they inadequately introduce audio constraints and object-level information during the decoding stage, resulting in segmentation outcomes that fail to comply with audio directives.

Video Segmentation Video Semantic Segmentation

RMP: A Random Mask Pretrain Framework for Motion Prediction

1 code implementation 16 Sep 2023 Yi Yang, Qingwen Zhang, Thomas Gilles, Nazre Batool, John Folkesson

As the pretraining technique is growing in popularity, little work has been done on pretrained learning-based motion prediction methods in autonomous driving.

Autonomous Driving motion prediction +1

InvestLM: A Large Language Model for Investment using Financial Domain Instruction Tuning

1 code implementation 15 Sep 2023 Yi Yang, Yixuan Tang, Kar Yan Tam

We present a new financial domain large language model, InvestLM, tuned on LLaMA-65B (Touvron et al., 2023), using a carefully curated instruction dataset related to financial investment.

Language Modelling Large Language Model

MC-NeRF: Multi-Camera Neural Radiance Fields for Multi-Camera Image Acquisition Systems

no code implementations 14 Sep 2023 Yu Gao, Lutong Su, Hao Liang, Yufeng Yue, Yi Yang, Mengyin Fu

Neural Radiance Fields (NeRF) employ multi-view images for 3D scene representation and have shown remarkable performance.

Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring

1 code implementation 13 Sep 2023 Dongwei Ren, Wei Shang, Yi Yang, WangMeng Zuo

To aggregate long-term sharp features from detected sharp frames, we utilize a global Transformer with multi-scale matching capability.

Deblurring

Editing 3D Scenes via Text Prompts without Retraining

1 code implementation 10 Sep 2023 Shuangkang Fang, Yufeng Wang, Yi Yang, Yi-Hsuan Tsai, Wenrui Ding, Shuchang Zhou, Ming-Hsuan Yang

To tackle these issues, we introduce a text-driven editing method, termed DN2N, which allows for the direct acquisition of a NeRF model with universal editing capabilities, eliminating the requirement for retraining.

3D scene Editing 3D Scene Reconstruction +2

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion

1 code implementation 4 Sep 2023 Yunhong Lou, Linchao Zhu, Yaxiong Wang, Xiaohan Wang, Yi Yang

We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions while preserving motion diversity. Despite the recent significant progress in text-based human motion generation, existing methods often prioritize fitting training motions at the expense of action diversity.

Ranked #2 on Motion Synthesis on HumanML3D (using extra training data)

Language Modelling Motion Synthesis

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

no code implementations 30 Aug 2023 Mel Vecerik, Carl Doersch, Yi Yang, Todor Davchev, Yusuf Aytar, Guangyao Zhou, Raia Hadsell, Lourdes Agapito, Jon Scholz

For robots to be useful outside labs and specialized factories we need a way to teach them new useful behaviors quickly.

AIoT-Based Drum Transcription Robot using Convolutional Neural Networks

no code implementations 29 Aug 2023 Yukun Su, Yi Yang

With the development of information technology, robot technology has made great progress in various fields.

Drum Transcription Music Transcription

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

1 code implementation ICCV 2023 Yuanyou Xu, Zongxin Yang, Yi Yang

Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS).

Object Representation Learning +6

Logic-induced Diagnostic Reasoning for Semi-supervised Semantic Segmentation

no code implementations ICCV 2023 Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang

Recent advances in semi-supervised semantic segmentation have been heavily reliant on pseudo labeling to compensate for limited labeled data, disregarding the valuable relational knowledge among semantic concepts.

Segmentation Semi-Supervised Semantic Segmentation

Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

1 code implementation ICCV 2023 Jinyu Chen, Wenguan Wang, Si Liu, Hongsheng Li, Yi Yang

CCPD transfers the fundamental, point-to-point wayfinding skill that is well trained on the large-scale PointGoal task to ORAN, so as to help ORAN to better master audio-visual navigation with far fewer training samples.

Decision Making Transfer Learning +1

Compositional Feature Augmentation for Unbiased Scene Graph Generation

1 code implementation ICCV 2023 Lin Li, Guikun Chen, Jun Xiao, Yi Yang, Chunping Wang, Long Chen

Specifically, we first decompose each relation triplet feature into two components: intrinsic feature and extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively.

Graph Generation Relation +1

Bird's-Eye-View Scene Graph for Vision-Language Navigation

no code implementations ICCV 2023 Rui Liu, Xiaohan Wang, Wenguan Wang, Yi Yang

Vision-language navigation (VLN), which requires an agent to navigate 3D environments following human instructions, has shown great advances.

Navigate Vision-Language Navigation

JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery

1 code implementation ICCV 2023 Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

Our method includes an encoder-decoder transformer architecture to fuse 2D and 3D representations for achieving 2D & 3D aligned results in a coarse-to-fine manner and a novel 3D joint contrastive learning approach for adding explicit global supervision for the 3D feature space.

Contrastive Learning Human Mesh Recovery

DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation

no code implementations 31 Jul 2023 Yue Zhang, Hehe Fan, Yi Yang, Mohan Kankanhalli

The proposed method, named Mixture of Depth and Point cloud video experts (DPMix), achieved the first place in the 4D Action Segmentation Track of the HOI4D Challenge 2023.

Action Segmentation Human-Object Interaction Detection +2

Clustering based Point Cloud Representation Learning for 3D Analysis

1 code implementation ICCV 2023 Tuo Feng, Wenguan Wang, Xiaohan Wang, Yi Yang, Qinghua Zheng

The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations.

Clustering Point Cloud Segmentation +2

Tachikuma: Understading Complex Interactions with Multi-Character and Novel Objects by Large Language Models

1 code implementation 24 Jul 2023 Yuanzhi Liang, Linchao Zhu, Yi Yang

MOE challenges models to understand characters' intentions and accurately determine their actions within intricate contexts involving multi-character and novel object interactions.

TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering

no code implementations ICCV 2023 Xiao Pan, Zongxin Yang, Jianxin Ma, Chang Zhou, Yi Yang

However, such SPC-based representation i) optimizes under the volatile observation space, which leads to pose misalignment between the training and inference stages, and ii) lacks the global relationships among human parts that are critical for handling the incomplete painted SMPL.

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion

no code implementations 13 Jul 2023 Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia

Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars.

Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration

1 code implementation 10 Jul 2023 Meng Li, Yahan Yu, Yi Yang, Guanghao Ren, Jian Wang

In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration.

Image Registration Semantic Segmentation

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised Video Object Segmentation

no code implementations 5 Jul 2023 Jiahao Li, Yuanyou Xu, Zongxin Yang, Yi Yang, Yueting Zhuang

The Associating Objects with Transformers (AOT) framework has exhibited exceptional performance in a wide range of complex scenarios for video object segmentation.

Object Position +4

ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: TREK-150 Single Object Tracking

no code implementations 5 Jul 2023 Yuanyou Xu, Jiahao Li, Zongxin Yang, Yi Yang, Yueting Zhuang

MSDeAOT efficiently propagates object masks from previous frames to the current frame using two feature scales of 16 and 8.

Object Segmentation +4

Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition

1 code implementation 3 Jul 2023 Chao Liang, Zongxin Yang, Linchao Zhu, Yi Yang

In real-world scenarios, collected and annotated data often exhibit the characteristics of multiple classes and long-tailed distribution.

Learning with noisy labels Multi-Label Classification +1

Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023

1 code implementation 15 Jun 2023 Jiayi Shao, Xiaohan Wang, Ruijie Quan, Yi Yang

This report presents the ReLER submission to two tracks in the Ego4D Episodic Memory Benchmark in CVPR 2023, including Natural Language Queries and Moment Queries.

Moment Queries Natural Language Queries

Shuffled Autoregression For Motion Interpolation

no code implementations 10 Jun 2023 Shuo Huang, Jia Jia, Zongxin Yang, Wei Wang, Haozhe Wu, Yi Yang, Junliang Xing

However, motion interpolation is a more complex problem that takes isolated poses (e.g., only one start pose and one end pose) as input.

Motion Interpolation

Relieving Triplet Ambiguity: Consensus Network for Language-Guided Image Retrieval

no code implementations 3 Jun 2023 Xu Zhang, Zhedong Zheng, Xiaohan Wang, Yi Yang

We propose a novel Consensus Network (Css-Net) that self-adaptively learns from noisy triplets to minimize the negative effects of triplet ambiguity.

Image Retrieval Image Retrieval with Multi-Modal Query +1

A Feature Reuse Framework with Texture-adaptive Aggregation for Reference-based Super-Resolution

1 code implementation 2 Jun 2023 Xiaoyong Mei, Yi Yang, Ming Li, Changqin Huang, Kai Zhang, Pietro Lió

In this study, we propose a feature reuse framework that guides the step-by-step texture reconstruction process through different stages, reducing the negative impacts of perceptual and adversarial loss.

Image Super-Resolution Reference-based Super-Resolution

Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models

1 code implementation 29 May 2023 Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

Given a single test sample, the VLM is forced to maximize the CLIP reward between the input and sampled results from the VLM output distribution.

Image Captioning Image Classification +5

Whitening-based Contrastive Learning of Sentence Embeddings

1 code implementation 28 May 2023 Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang

Consequently, using multiple positive samples with enhanced diversity further improves contrastive learning due to better alignment.

Contrastive Learning Semantic Textual Similarity +4

CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model

1 code implementation 23 May 2023 Shuai Zhao, Xiaohan Wang, Linchao Zhu, Ruijie Quan, Yi Yang

With such merits, we transform CLIP into a scene text reader and introduce CLIP4STR, a simple yet effective STR method built upon image and text encoders of CLIP.

Ranked #1 on Scene Text Recognition on WOST (using extra training data)

Language Modelling Scene Text Recognition

Gloss-Free End-to-End Sign Language Translation

1 code implementation 22 May 2023 Kezhou Lin, Xiaohan Wang, Linchao Zhu, Ke Sun, Bang Zhang, Yi Yang

In this paper, we tackle the problem of sign language translation (SLT) without gloss annotations.

Sign Language Translation Translation

VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending

no code implementations 22 May 2023 Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng

Towards this goal, we propose a novel video-text pre-training method dubbed VLAB: Video Language pre-training by feature Adapting and Blending, which transfers CLIP representations to video pre-training tasks and develops unified video multimodal models for a wide range of video-text tasks.

 Ranked #1 on Visual Question Answering (VQA) on MSVD-QA (using extra training data)

Question Answering Retrieval +6

PTGB: Pre-Train Graph Neural Networks for Brain Network Analysis

1 code implementation 20 May 2023 Yi Yang, Hejie Cui, Carl Yang

The human brain is the central hub of the neurobiological system, controlling behavior and cognition in complex ways.

Transfer Learning Unsupervised Pre-training

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

1 code implementation NeurIPS 2023 Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue

Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks.

Few-Shot Learning

Pyramid Diffusion Models For Low-light Image Enhancement

1 code implementation 17 May 2023 Dewei Zhou, Zongxin Yang, Yi Yang

Recovering noise-covered details from low-light images is challenging, and the results given by previous methods leave room for improvement.

Denoising Image Generation +1

Segment and Track Anything

1 code implementation 11 May 2023 Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang

This report presents a framework called Segment And Track Anything (SAMTrack) that allows users to precisely and effectively segment and track any object in a video.

Autonomous Driving Object Tracking

Video Object Segmentation in Panoptic Wild Scenes

2 code implementations 8 May 2023 Yuanyou Xu, Zongxin Yang, Yi Yang

Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales.

Object Semantic Segmentation +2

Feature-compatible Progressive Learning for Video Copy Detection

2 code implementations 20 Apr 2023 Wenhao Wang, Yifan Sun, Yi Yang

Video Copy Detection (VCD) has been developed to identify instances of unauthorized or duplicated video content.

Copy Detection Video Similarity

DETR with Additional Global Aggregation for Cross-domain Weakly Supervised Object Detection

no code implementations CVPR 2023 Zongheng Tang, Yifan Sun, Si Liu, Yi Yang

Second, through our design, the object queries and the foreground query in the decoder share consensus on the class semantics, therefore making the strong and weak supervision mutually benefit each other for domain alignment.

object-detection Weakly Supervised Object Detection

TransHP: Image Classification with Hierarchical Prompting

1 code implementation NeurIPS 2023 Wenhao Wang, Yifan Sun, Wei Li, Yi Yang

This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.

Classification Image Classification

Efficient Multimodal Fusion via Interactive Prompting

no code implementations CVPR 2023 Yaowei Li, Ruijie Quan, Linchao Zhu, Yi Yang

Large-scale pre-training has brought unimodal fields such as computer vision and natural language processing to a new era.

PVD-AL: Progressive Volume Distillation with Active Learning for Efficient Conversion Between Different NeRF Architectures

1 code implementation 8 Apr 2023 Shuangkang Fang, Yufeng Wang, Yi Yang, Weixin Xu, Heng Wang, Wenrui Ding, Shuchang Zhou

To address this limitation and maximize the potential of each architecture, we propose Progressive Volume Distillation with Active Learning (PVD-AL), a systematic distillation method that enables any-to-any conversions between different architectures.

3D Reconstruction Novel View Synthesis

GIF: A General Graph Unlearning Strategy via Influence Function

1 code implementation 6 Apr 2023 Jiancan Wu, Yi Yang, Yuchun Qian, Yongduo Sui, Xiang Wang, Xiangnan He

Then, we identify why the traditional influence function fails for graph unlearning, and devise Graph Influence Function (GIF), a model-agnostic unlearning method that can efficiently and accurately estimate parameter changes in response to an $\epsilon$-mass perturbation in deleted data.

Machine Unlearning

Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time

2 code implementations CVPR 2023 Wei Shang, Dongwei Ren, Yi Yang, Hongzhi Zhang, Kede Ma, WangMeng Zuo

Moreover, on the seemingly implausible x16 interpolation task, our method outperforms existing methods by more than 1.5 dB in terms of PSNR.

Contrastive Learning Deblurring +2

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

1 code implementation CVPR 2023 Xiaolong Shen, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, Yi Yang

However, it is difficult for a single kind of modeling structure to balance the learning of short-term and long-term temporal correlations, and it may bias the network toward one of them, leading to undesirable predictions like global location shift, temporal inconsistency, and insufficient local details.

3D human pose and shape estimation

CRRS: Concentric Rectangles Regression Strategy for Multi-point Representation on Fisheye Images

no code implementations 26 Mar 2023 Xihan Wang, Xi Xu, Yu Gao, Yi Yang, Yufeng Yue, Mengyin Fu

Compared with previous work on multi-point representation, the experiments show that CRRS can improve training performance in both accuracy and stability.

regression

Exploring Expression-related Self-supervised Learning for Affective Behaviour Analysis

1 code implementation 18 Mar 2023 Fanglei Xue, Yifan Sun, Yi Yang

This paper explores an expression-related self-supervised learning (SSL) method (ContraWarping) to perform expression classification in the 5th Affective Behavior Analysis in-the-wild (ABAW) competition.

Self-Supervised Learning

Unsupervised Facial Expression Representation Learning with Contrastive Local Warping

1 code implementation 16 Mar 2023 Fanglei Xue, Yifan Sun, Yi Yang

Therefore, given a facial image, ContraWarping employs some global transformations and local warping to generate its positive and negative samples and sets up a novel contrastive learning framework.

Contrastive Learning Facial Expression Recognition +4

Lana: A Language-Capable Navigator for Instruction Following and Generation

1 code implementation CVPR 2023 Xiaohan Wang, Wenguan Wang, Jiayi Shao, Yi Yang

Recently, visual-language navigation (VLN), which requires robot agents to follow navigation instructions, has shown great advances.

Instruction Following Text Generation

DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training

1 code implementation 6 Mar 2023 Wei Li, Linchao Zhu, Longyin Wen, Yi Yang

This decoder is both data-efficient and computation-efficient: 1) it only requires the text data for training, easing the burden on the collection of paired data.

Image Captioning Text Generation

Soft Prompt Guided Joint Learning for Cross-Domain Sentiment Analysis

no code implementations 1 Mar 2023 Jingli Shi, Weihua Li, Quan Bai, Yi Yang, Jianhua Jiang

Aspect term extraction is a fundamental task in fine-grained sentiment analysis, which aims at detecting customers' opinion targets in reviews of products or services.

Sentiment Analysis Term Extraction +1

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding

no code implementations 22 Jan 2023 Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang

To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

Temporal Perceiving Video-Language Pre-training

no code implementations 18 Jan 2023 Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang

Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization which matches the subset of texts with the video features.

Contrastive Learning Moment Retrieval +7

Knowledge-guided Causal Intervention for Weakly-supervised Object Localization

1 code implementation 3 Jan 2023 Feifei Shao, Yawei Luo, Fei Gao, Yi Yang, Jun Xiao

Previous weakly-supervised object localization (WSOL) methods aim to expand activation map discriminative areas to cover the whole objects, yet neglect two inherent challenges when relying solely on image-level labels.

Knowledge Distillation Object +1

Analogical Inference Enhanced Knowledge Graph Embedding

1 code implementation 3 Jan 2023 Zhen Yao, Wen Zhang, Mingyang Chen, Yufeng Huang, Yi Yang, Huajun Chen

And in AnKGE, we train an analogy function for each level of analogical inference with the original element embedding from a well-trained KGE model as input, which outputs the analogical object embedding.

Knowledge Graph Embedding Knowledge Graphs +1

ProD: Prompting-To-Disentangle Domain Knowledge for Cross-Domain Few-Shot Image Classification

no code implementations CVPR 2023 Tianyi Ma, Yifan Sun, Zongxin Yang, Yi Yang

Based on these two common practices, the key point of ProD is using the prompting mechanism in the transformer to disentangle the domain-general (DG) and domain-specific (DS) knowledge from the backbone feature.

Cross-Domain Few-Shot Domain Generalization +1

Learning Symmetry-Aware Geometry Correspondences for 6D Object Pose Estimation

1 code implementation ICCV 2023 Heng Zhao, Shenxing Wei, Dahu Shi, Wenming Tan, Zheyang Li, Ye Ren, Xing Wei, Yi Yang, ShiLiang Pu

Taking the symmetry properties of objects into consideration, we design a symmetry-aware matching loss to facilitate the learning of dense point-wise geometry features and improve the performance considerably.

6D Pose Estimation 6D Pose Estimation using RGB +3

MAAL: Multimodality-Aware Autoencoder-Based Affordance Learning for 3D Articulated Objects

1 code implementation ICCV 2023 Yuanzhi Liang, Xiaohan Wang, Linchao Zhu, Yi Yang

Experimental results and visualizations, based on a large-scale dataset PartNet-Mobility, show the effectiveness of MAAL in learning multi-modal data and solving the 3D articulated object affordance problem.

Object

Context-Aware Pretraining for Efficient Blind Image Decomposition

1 code implementation CVPR 2023 Chao Wang, Zhedong Zheng, Ruijie Quan, Yifan Sun, Yi Yang

(2) The conventional paradigm usually focuses on mining the abnormal pattern of a superimposed image to separate the noise, which de facto conflicts with the primary image restoration task.

Attribute Image Reconstruction +1

Distilling DETR with Visual-Linguistic Knowledge for Open-Vocabulary Object Detection

1 code implementation ICCV 2023 Liangqi Li, Jiaxu Miao, Dahu Shi, Wenming Tan, Ye Ren, Yi Yang, ShiLiang Pu

Current methods for open-vocabulary object detection (OVOD) rely on a pre-trained vision-language model (VLM) to acquire the recognition ability.

Knowledge Distillation Language Modelling +2

PointListNet: Deep Learning on 3D Point Lists

no code implementations CVPR 2023 Hehe Fan, Linchao Zhu, Yi Yang, Mohan Kankanhalli

Deep neural networks on regular 1D lists (e.g., natural languages) and irregular 3D sets (e.g., point clouds) have made tremendous achievements.

Rethinking Point Cloud Registration as Masking and Reconstruction

1 code implementation ICCV 2023 Guangyan Chen, Meiling Wang, Li Yuan, Yi Yang, Yufeng Yue

In this paper, a critical observation is made that the invisible parts of each point cloud can be directly utilized as inherent masks, and the aligned point cloud pair can be regarded as the reconstruction target.

Point Cloud Registration

Adversarially Masking Synthetic To Mimic Real: Adaptive Noise Injection for Point Cloud Segmentation Adaptation

no code implementations CVPR 2023 Guangrui Li, Guoliang Kang, Xiaohan Wang, Yunchao Wei, Yi Yang

With the help of adversarial training, the masking module can learn to generate source masks to mimic the pattern of irregular target noise, thereby narrowing the domain gap.

Point Cloud Segmentation Semantic Segmentation

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations CVPR 2023 Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Action Classification Action Recognition +3

StepNet: Spatial-temporal Part-aware Network for Sign Language Recognition

no code implementations 25 Dec 2022 Xiaolong Shen, Zhedong Zheng, Yi Yang

As the name implies, StepNet consists of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling.

Optical Flow Estimation Sign Language Recognition

MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering

1 code implementation CVPR 2023 Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou

To build Video Question Answering (VideoQA) systems capable of assisting humans in daily activities, seeking answers from long-form videos with diverse and complex events is a must.

Question Answering Video Question Answering +2

One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation

1 code implementation 29 Nov 2022 Shuangkang Fang, Weixin Xu, Heng Wang, Yi Yang, Yufeng Wang, Shuchang Zhou

In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions.

 Ranked #1 on Novel View Synthesis on NeRF (Average PSNR metric)

3D Reconstruction Neural Rendering +1

A Light-weight, Effective and Efficient Model for Label Aggregation in Crowdsourcing

no code implementations 19 Nov 2022 Yi Yang, Zhong-Qiu Zhao, Quan Bai, Qing Liu, Weihua Li

Due to the dynamic nature, the proposed algorithms can also estimate true labels online without re-visiting historical data.

Stereo Image Rain Removal via Dual-View Mutual Attention

no code implementations 18 Nov 2022 Yanyan Wei, Zhao Zhang, ZhongQiu Zhao, Yang Zhao, Richang Hong, Yi Yang

Stereo images, containing left and right view images with disparity, have recently been utilized in solving low-vision tasks, e.g., rain removal and super-resolution.

Disparity Estimation Image Restoration +2

ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022

1 code implementation 17 Nov 2022 Jiayi Shao, Xiaohan Wang, Yi Yang

Moreover, in order to better capture the long-term temporal dependencies in the long videos, we propose a segment-level recurrence mechanism.

Moment Queries Temporal Action Localization

Exploiting Contrastive Learning and Numerical Evidence for Confusing Legal Judgment Prediction

no code implementations 15 Nov 2022 Leilei Gan, Baokui Li, Kun Kuang, Yating Zhang, Lei Wang, Luu Anh Tuan, Yi Yang, Fei Wu

Given the fact description text of a legal case, legal judgment prediction (LJP) aims to predict the case's charge, law article and penalty term.

Contrastive Learning

PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation

1 code implementation 14 Nov 2022 Mu Chen, Zhedong Zheng, Yi Yang, Tat-Seng Chua

In an attempt to fill this gap, we propose a unified pixel- and patch-wise self-supervised learning framework, called PiPa, for domain adaptive semantic segmentation that facilitates intra-image pixel-wise correlations and patch-wise semantic consistency against different contexts.

Self-Supervised Learning Semantic Segmentation +2

An Improved End-to-End Multi-Target Tracking Method Based on Transformer Self-Attention

no code implementations 11 Nov 2022 Yong Hong, Deren Li, Shupei Luo, Xin Chen, Yi Yang, Mi Wang

This study proposes an improved end-to-end multi-target tracking algorithm that adapts to multi-view multi-scale scenes based on the self-attentive mechanism of the transformer's encoder-decoder structure.

Multiple Object Tracking

Learning Cross-view Geo-localization Embeddings via Dynamic Weighted Decorrelation Regularization

no code implementations 10 Nov 2022 Tingyu Wang, Zhedong Zheng, Zunjie Zhu, Yuhan Gao, Yi Yang, Chenggang Yan

Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform.

NoiSER: Noise is All You Need for Low-Light Image Enhancement

no code implementations 9 Nov 2022 Zhao Zhang, Suiyi Zhao, Xiaojie Jin, Mingliang Xu, Yi Yang, Shuicheng Yan

In this paper, we present an embarrassingly simple yet effective solution to a seemingly impossible mission, low-light image enhancement (LLIE) without access to any task-related data.

Low-Light Image Enhancement regression

TAP-Vid: A Benchmark for Tracking Any Point in a Video

3 code implementations 7 Nov 2022 Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move.

Optical Flow Estimation Point Tracking

Simple Primitives with Feasibility- and Contextuality-Dependence for Open-World Compositional Zero-shot Learning

no code implementations 5 Nov 2022 Zhe Liu, Yun Li, Lina Yao, Xiaojun Chang, Wei Fang, XiaoJun Wu, Yi Yang

We design Semantic Attention (SA) and generative Knowledge Disentanglement (KD) to learn the dependence of feasibility and contextuality, respectively.

Compositional Zero-Shot Learning Disentanglement

Decoupled Cross-Scale Cross-View Interaction for Stereo Image Enhancement in The Dark

no code implementations 2 Nov 2022 Huan Zheng, Zhao Zhang, Jicong Fan, Richang Hong, Yi Yang, Shuicheng Yan

Specifically, we present a decoupled interaction module (DIM) that aims for sufficient dual-view information interaction.

Image Enhancement

Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing

no code implementations 28 Oct 2022 Wenguan Wang, Yi Yang, Fei Wu

Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of Artificial Intelligence (AI) for many years.

Tele-Knowledge Pre-training for Fault Analysis

1 code implementation 20 Oct 2022 Zhuo Chen, Wen Zhang, Yufeng Huang, Mingyang Chen, Yuxia Geng, Hongtao Yu, Zhen Bi, Yichi Zhang, Zhen Yao, Wenting Song, Xinliang Wu, Yi Yang, Mingyi Chen, Zhaoyang Lian, YingYing Li, Lei Cheng, Huajun Chen

In this work, we share our experience on tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents.

Language Modelling

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

2 code implementations 18 Oct 2022 Zongxin Yang, Yi Yang

To solve such a problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach.

Object Semantic Segmentation +2

Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation

no code implementations 18 Oct 2022 Ruijun Li, Weihua Li, Yi Yang, Hanyu Wei, Jianhua Jiang, Quan Bai

Recently, diffusion models have been proven to perform remarkably well in text-to-image synthesis tasks in a number of studies, immediately presenting new study opportunities for image generation.

Language Modelling Text-to-Image Generation

Feature-Proxy Transformer for Few-Shot Segmentation

2 code implementations 13 Oct 2022 Jian-Wei Zhang, Yifan Sun, Yi Yang, Wei Chen

With a rethink of recent advances, we find that the current FSS framework has deviated far from the supervised segmentation framework: Given the deep features, FSS methods typically use an intricate decoder to perform sophisticated pixel-wise matching, while the supervised segmentation methods use a simple linear classification head.

Few-Shot Semantic Segmentation Segmentation +1

Sparse Teachers Can Be Dense with Knowledge

1 code implementation 8 Oct 2022 Yi Yang, Chen Zhang, Dawei Song

Recent advances in distilling pretrained language models have discovered that, besides the expressiveness of knowledge, the student-friendliness should be taken into consideration to realize a truly knowledgeable teacher.

GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models

2 code implementations 5 Oct 2022 Chen Liang, Wenguan Wang, Jiaxu Miao, Yi Yang

Going beyond this, we propose GMMSeg, a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class).

Segmentation Semantic Segmentation

Seeing Through the Noisy Dark: Towards Real-world Low-Light Image Enhancement and Denoising

no code implementations 2 Oct 2022 Jiahuan Ren, Zhao Zhang, Richang Hong, Mingliang Xu, Yi Yang, Shuicheng Yan

Low-light image enhancement (LLIE) aims at improving the illumination and visibility of dark images with lighting noise.

Attribute Denoising +1

Slimmable Networks for Contrastive Self-supervised Learning

no code implementations 30 Sep 2022 Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang

In this work, we present a one-stage solution to obtain pre-trained small models without the need for extra teachers, namely, slimmable networks for contrastive self-supervised learning (SlimCLR).

Contrastive Learning Knowledge Distillation +1

Boost CTR Prediction for New Advertisements via Modeling Visual Content

no code implementations 23 Sep 2022 Tan Yu, Zhipeng Jin, Jie Liu, Yi Yang, Hongliang Fei, Ping Li

To overcome the limitations of behavior ID features in modeling new ads, we exploit the visual content in ads to boost the performance of CTR prediction models.

Click-Through Rate Prediction Quantization

Instance As Identity: A Generic Online Paradigm for Video Instance Segmentation

1 code implementation 5 Aug 2022 Feng Zhu, Zongxin Yang, Xin Yu, Yi Yang, Yunchao Wei

In this work, we propose a new online VIS paradigm named Instance As Identity (IAI), which models temporal information for both detection and tracking in an efficient way.

Instance Segmentation Semantic Segmentation +1

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation

1 code implementation 3 Aug 2022 Xingchen Li, Long Chen, Wenbo Ma, Yi Yang, Jun Xiao

However, we argue that most existing WSSGG works only focus on object-consistency, which means the grounded regions should have the same object category label as text entities.

Graph Generation Object +1

GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning

no code implementations 3 Aug 2022 Benyuan Sun, Jin Dai, Zihao Liang, Congying Liu, Yi Yang, Bo Bai

SIMT lays the foundation of pre-training with large-scale multi-task multi-domain datasets and is proved essential for stable training in our GPPF experiments.

Multi-Task Learning

V$^2$L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval

1 code implementation 26 Jul 2022 Wenhao Wang, Yifan Sun, Zongxin Yang, Yi Yang

While model ensemble is common, we show that combining the vision models and vision-language models brings particular benefits from their complementarity and is a key factor to our superiority.

Metric Learning Retrieval

Doge Tickets: Uncovering Domain-general Language Models by Playing Lottery Tickets

1 code implementation 20 Jul 2022 Yi Yang, Chen Zhang, Benyou Wang, Dawei Song

To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets).

Domain Generalization

MHR-Net: Multiple-Hypothesis Reconstruction of Non-Rigid Shapes from 2D Views

1 code implementation 19 Jul 2022 Haitian Zeng, Xin Yu, Jiaxu Miao, Yi Yang

We propose MHR-Net, a novel method for recovering Non-Rigid Shapes from Motion (NRSfM).

ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022

1 code implementation 1 Jul 2022 Naiyuan Liu, Xiaohan Wang, Xiaobo Li, Yi Yang, Yueting Zhuang

In this report, we present the ReLER@ZJU-Alibaba submission to the Ego4D Natural Language Queries (NLQ) Challenge in CVPR 2022.

Data Augmentation Natural Language Queries

Data-Efficient Brain Connectome Analysis via Multi-Task Meta-Learning

1 code implementation 9 Jun 2022 Yi Yang, Yanqiao Zhu, Hejie Cui, Xuan Kan, Lifang He, Ying Guo, Carl Yang

Specifically, we propose to meta-train the model on datasets of large sample sizes and transfer the knowledge to small datasets.

Meta-Learning

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

1 code implementation ICLR 2021 Hehe Fan, Xin Yu, Yuhang Ding, Yi Yang, Mohan Kankanhalli

Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.

3D Action Recognition Semantic Segmentation

A Benchmark and Asymmetrical-Similarity Learning for Practical Image Copy Detection

1 code implementation 24 May 2022 Wenhao Wang, Yifan Sun, Yi Yang

Moreover, this paper further reveals a unique difficulty for solving the hard negative problem in ICD, i.e., there is a fundamental conflict between current metric learning and ICD.

Copy Detection Metric Learning

Joint Representation Learning and Keypoint Detection for Cross-view Geo-localization

1 code implementation IEEE Transactions on Image Processing (TIP) 2022 Jinliang Lin, Zhedong Zheng, Zhun Zhong, Zhiming Luo, Shaozi Li, Yi Yang, Nicu Sebe

Inspired by the human visual system for mining local patterns, we propose a new framework called RK-Net to jointly learn the discriminative Representation and detect salient Keypoints with a single Network.

Drone navigation Drone-view target localization +3

A Simple Yet Efficient Method for Adversarial Word-Substitute Attack

no code implementations 7 May 2022 Tianle Li, Yi Yang

This research highlights that an adversary can fool a deep NLP model with much less cost.

text-classification Text Classification

CenterCLIP: Token Clustering for Efficient Text-Video Retrieval

1 code implementation 2 May 2022 Shuai Zhao, Linchao Zhu, Xiaohan Wang, Yi Yang

In this paper, to reduce the number of redundant video tokens, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones.

Ranked #9 on Video Retrieval on MSVD (using extra training data)

Clustering Retrieval +1

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective

1 code implementation 27 Apr 2022 Zhedong Zheng, Jiayin Zhu, Wei Ji, Yi Yang, Tat-Seng Chua

This research aims to study a self-supervised 3D clothing reconstruction method, which recovers the geometry shape and texture of human clothing from a single image.

3D Reconstruction Person Re-Identification +2

Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives

no code implementations 25 Apr 2022 Shaoning Xiao, Long Chen, Kaifeng Gao, Zhao Wang, Yi Yang, Zhimeng Zhang, Jun Xiao

From the view of feature, we break down the video into trajectories and first leverage trajectory feature in VideoQA to enhance the alignment between two modalities.

Question Answering Video Question Answering

Bidirectional Self-Training with Multiple Anisotropic Prototypes for Domain Adaptive Semantic Segmentation

1 code implementation 16 Apr 2022 Yulei Lu, Yawei Luo, Li Zhang, Zheyang Li, Yi Yang, Jun Xiao

A thriving trend in domain adaptive segmentation is to generate high-quality pseudo labels for the target domain and retrain the segmentor on them.

Pseudo Label Semantic Segmentation +2

Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis

1 code implementation CVPR 2022 Xuanmeng Zhang, Zhedong Zheng, Daiheng Gao, Bang Zhang, Pan Pan, Yi Yang

To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints.

3D-Aware Image Synthesis

Unified Transformer Tracker for Object Tracking

1 code implementation CVPR 2022 Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan

Although UniTrack demonstrates that a shared appearance model with multiple heads can be used to tackle individual tracking tasks, it fails to exploit the large-scale tracking datasets for training and performs poorly on single object tracking.

Multiple Object Tracking Object

In-N-Out Generative Learning for Dense Unsupervised Video Segmentation

1 code implementation 29 Mar 2022 Xiao Pan, Peike Li, Zongxin Yang, Huiling Zhou, Chang Zhou, Hongxia Yang, Jingren Zhou, Yi Yang

By contrast, pixel-level optimization is more explicit, however, it is sensitive to the visual quality of training data and is not robust to object deformation.

Contrastive Learning Semantic Segmentation +3

Automated Progressive Learning for Efficient Training of Vision Transformers

1 code implementation CVPR 2022 Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang

First, we develop a strong manual baseline for progressive learning of ViTs, by introducing momentum growth (MoGrow) to bridge the gap brought by model growth.

Deep Hierarchical Semantic Segmentation

2 code implementations CVPR 2022 Liulei Li, Tianfei Zhou, Wenguan Wang, Jianwu Li, Yi Yang

In this paper, we instead address hierarchical semantic segmentation (HSS), which aims at structured, pixel-wise description of visual observation in terms of a class hierarchy.

Multi-Label Classification Segmentation +1

Visual Abductive Reasoning

1 code implementation CVPR 2022 Chen Liang, Wenguan Wang, Tianfei Zhou, Yi Yang

In this paper, we propose a new task and dataset, Visual Abductive Reasoning (VAR), for examining abductive reasoning ability of machine intelligence in everyday visual situations.

Benchmarking Sentence +1

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

1 code implementation CVPR 2022 Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang

To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

Scalable Video Object Segmentation with Identification Mechanism

2 code implementations 22 Mar 2022 Zongxin Yang, Jiaxu Miao, Yunchao Wei, Wenguan Wang, Xiaohan Wang, Yi Yang

This paper delves into the challenges of achieving scalable and effective multi-object modeling for semi-supervised Video Object Segmentation (VOS).

Object Segmentation +3

Bridging the Source-to-target Gap for Cross-domain Person Re-Identification with Intermediate Domains

1 code implementation 3 Mar 2022 Yongxing Dai, Yifan Sun, Jun Liu, Zekun Tong, Yi Yang, Ling-Yu Duan

Instead of directly aligning the source and target domains against each other, we propose to align the source and target domains against their intermediate domains for a smooth knowledge transfer.

Domain Generalization Person Re-Identification +1

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning

no code implementations 25 Feb 2022 Feifei Shao, Yawei Luo, Ping Liu, Jie Chen, Yi Yang, Yulei Lu, Jun Xiao

To deploy SSDR-AL in a more practical scenario, we design a noise-aware iterative labeling strategy to confront the "noisy annotation" problem introduced by the previous "dominant labeling" strategy in superpoints.

Active Learning Semantic Segmentation
