Search Results for author: Jingjing Chen

Found 52 papers, 32 papers with code

Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models

no code implementations 19 Apr 2024 Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Yu-Gang Jiang

Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making presuppositions based on established facts and extrapolating potential outcomes.

Benchmarking counterfactual +3

From Canteen Food to Daily Meals: Generalizing Food Recognition to More Practical Scenarios

no code implementations 12 Mar 2024 Guoshan Liu, Yang Jiao, Jingjing Chen, Bin Zhu, Yu-Gang Jiang

These two datasets are used to evaluate the transferability of approaches from the well-curated food image domain to the everyday-life food image domain.

Food Recognition

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

1 code implementation 12 Mar 2024 Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

To address this issue, we propose a novel LMM architecture named Lumen, a Large multimodal model with versatile vision-centric capability enhancement.

Concept Alignment Language Modelling

Doubly Abductive Counterfactual Inference for Text-based Image Editing

1 code implementation 5 Mar 2024 Xue Song, Jiequan Cui, Hanwang Zhang, Jingjing Chen, Richang Hong, Yu-Gang Jiang

Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of the single-image fine-tuning.

counterfactual Counterfactual Inference +2

Open-Vocabulary Video Relation Extraction

1 code implementation 25 Dec 2023 Wentao Tian, Zheng Wang, Yuqian Fu, Jingjing Chen, Lechao Cheng

A comprehensive understanding of videos is inseparable from describing the action with its contextual action-object interactions.

Action Classification Action Understanding +3

FoodLMM: A Versatile Food Assistant using Large Multi-modal Model

no code implementations 22 Dec 2023 Yuehao Yin, Huiyan Qi, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo

In the second stage, we construct a multi-round conversation dataset and a reasoning segmentation dataset to fine-tune the model, enabling it to conduct professional dialogues and generate segmentation masks based on complex reasoning in the food domain.

Food Recognition Multi-Task Learning +3

On the Importance of Spatial Relations for Few-shot Action Recognition

no code implementations 14 Aug 2023 Yilun Zhang, Yuqian Fu, Xingjun Ma, Lizhe Qi, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

We are thus motivated to investigate the importance of spatial relations and propose a more accurate few-shot action recognition method that leverages both spatial and temporal information.

Few-Shot action recognition Few Shot Action Recognition +1

NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario

2 code implementations 24 May 2023 Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, Yu-Gang Jiang

We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues.

Autonomous Driving Question Answering +1

Transferability Estimation Based On Principal Gradient Expectation

no code implementations 29 Nov 2022 Huiyan Qi, Lechao Cheng, Jingjing Chen, Yue Yu, Xue Song, Zunlei Feng, Yu-Gang Jiang

Transfer learning aims to improve the performance of target tasks by transferring knowledge acquired in source tasks.

Transfer Learning

ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning

1 code implementation 11 Oct 2022 Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang

Concretely, to solve the data imbalance problem between the source data with sufficient examples and the auxiliary target data with limited examples, we build our model under the umbrella of multi-expert learning.

cross-domain few-shot learning Knowledge Distillation

TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning

1 code implementation 11 Oct 2022 Linhai Zhuo, Yuqian Fu, Jingjing Chen, Yixin Cao, Yu-Gang Jiang

The proposed TGDM framework contains a Mixup-3T network for learning classifiers and a dynamic ratio generation network (DRGN) for learning the optimal mix ratio.

cross-domain few-shot learning Transfer Learning

Text-driven Video Prediction

no code implementations 6 Oct 2022 Xue Song, Jingjing Chen, Bin Zhu, Yu-Gang Jiang

Specifically, appearance and motion components are provided by the image and caption separately.

Causal Inference Video Generation +1

Locate before Answering: Answer Guided Question Localization for Video Question Answering

no code implementations 5 Oct 2022 Tianwen Qian, Ran Cui, Jingjing Chen, Pai Peng, Xiaowei Guo, Yu-Gang Jiang

Considering the fact that the question often remains concentrated in a short temporal range, we propose to first locate the question to a segment in the video and then infer the answer using the located segment only.

Question Answering Video Question Answering

Enhancing the Self-Universality for Transferable Targeted Attacks

1 code implementation CVPR 2023 Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

Our new attack method is proposed based on the observation that highly universal adversarial perturbations tend to be more transferable for targeted attacks.

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

1 code implementation CVPR 2023 Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang

Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporate 2D semantics via cross-modal interaction or fusion techniques.

3D Object Detection Autonomous Driving +1

Unsupervised High-Resolution Portrait Gaze Correction and Animation

1 code implementation 1 Jul 2022 Jichao Zhang, Jingjing Chen, Hao Tang, Enver Sangineto, Peng Wu, Yan Yan, Nicu Sebe, Wei Wang

Solving this problem using an unsupervised method remains an open problem, especially for high-resolution face images in the wild, which are not easy to annotate with gaze and head pose labels.

Image Inpainting Vocal Bursts Intensity Prediction

Cross-lingual Adaptation for Recipe Retrieval with Mixup

no code implementations 8 May 2022 Bin Zhu, Chong-Wah Ngo, Jingjing Chen, Wing-Kwong Chan

To bridge the domain gap, a recipe mixup loss is proposed to enforce that the intermediate domain lies on the shortest geodesic path between the source and target domains in the recipe embedding space.

Retrieval Unsupervised Domain Adaptation

Adaptive Split-Fusion Transformer

1 code implementation 26 Apr 2022 Zixuan Su, Hao Zhang, Jingjing Chen, Lei Pang, Chong-Wah Ngo, Yu-Gang Jiang

Neural networks for visual content understanding have recently evolved from convolutional ones (CNNs) to transformers.

Image Classification

ObjectFormer for Image Manipulation Detection and Localization

no code implementations CVPR 2022 Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang

Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection.

Image Manipulation Image Manipulation Detection

Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning

1 code implementation 15 Mar 2022 Yuqian Fu, Yu Xie, Yanwei Fu, Jingjing Chen, Yu-Gang Jiang

The key challenge of CD-FSL lies in the huge data shift between source and target domains, which is typically in the form of totally different visual styles.

cross-domain few-shot learning Self-Supervised Learning

Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding

no code implementations 10 Mar 2022 Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

Recently, one-stage visual grounders have attracted considerable attention because they offer accuracy comparable to two-stage grounders with significantly higher efficiency.

Object Visual Grounding

MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

1 code implementation 10 Mar 2022 Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

3D dense captioning is a recently-proposed novel task, where point clouds contain more geometric information than the 2D counterpart.

3D dense captioning Dense Captioning +3

Cross-Modal Transferable Adversarial Attacks from Images to Videos

no code implementations CVPR 2022 Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

This paper investigates the transferability of adversarial perturbation across different modalities, i.e., leveraging adversarial perturbation generated on white-box image models to attack black-box video models.

Video Recognition

Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation

no code implementations 10 Dec 2021 Tianyi Liu, Zuxuan Wu, Wenhan Xiong, Jingjing Chen, Yu-Gang Jiang

Our experiments show that there is a trade-off between understanding tasks and generation tasks while using the same model, and a feasible way to improve both tasks is to use more data.

Image-text matching Language Modelling +8

BiC-Net: Learning Efficient Spatio-Temporal Relation for Text-Video Retrieval

2 code implementations 29 Oct 2021 Ning Han, Jingjing Chen, Chuhao Shi, Yawen Zeng, Guangyi Xiao, Hao Chen

The task of text-video retrieval, which aims to understand the correspondence between language and vision, has gained increasing attention in recent years.

Cross-Modal Retrieval Relation +3

Attacking Video Recognition Models with Bullet-Screen Comments

1 code implementation 29 Oct 2021 Kai Chen, Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

On both the UCF-101 and HMDB-51 datasets, our BSC attack method can achieve about a 90% fooling rate when attacking three mainstream video recognition models, while occluding only < 8% of the area in the video.

Adversarial Attack Adversarial Attack on Video Classification +2

Boosting the Transferability of Video Adversarial Examples via Temporal Translation

1 code implementation 18 Oct 2021 Zhipeng Wei, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

To this end, we propose to boost the transferability of video adversarial examples for black-box attacks on video recognition models.

Adversarial Attack Translation +1

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation

1 code implementation 9 Oct 2021 Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma

Referring Image Segmentation (RIS) aims at segmenting the target object from an image referred by one given natural language expression.

Image Segmentation Retrieval +2

Self-supervised Learning for Semi-supervised Temporal Language Grounding

no code implementations 23 Sep 2021 Fan Luo, Shaoxiang Chen, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang

Given a text description, Temporal Language Grounding (TLG) aims to localize temporal boundaries of the segments that contain the specified semantics in an untrimmed video.

Contrastive Learning Pseudo Label +2

Towards Transferable Adversarial Attacks on Vision Transformers

2 code implementations 9 Sep 2021 Zhipeng Wei, Jingjing Chen, Micah Goldblum, Zuxuan Wu, Tom Goldstein, Yu-Gang Jiang

We evaluate the transferability of attacks on state-of-the-art ViTs, CNNs and robustly trained CNNs.

Cross-domain Contrastive Learning for Unsupervised Domain Adaptation

1 code implementation 10 Jun 2021 Rui Wang, Zuxuan Wu, Zejia Weng, Jingjing Chen, Guo-Jun Qi, Yu-Gang Jiang

Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain.

Clustering Contrastive Learning +3

VideoLT: Large-scale Long-tailed Video Recognition

1 code implementation ICCV 2021 Xing Zhang, Zuxuan Wu, Zejia Weng, Huazhu Fu, Jingjing Chen, Yu-Gang Jiang, Larry Davis

In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition.

Image Classification Video Recognition

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

2 code implementations 20 Apr 2021 Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang

The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images.

DeepFake Detection Face Swapping +1

WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection

1 code implementation 5 Jan 2021 Bojia Zi, Minghao Chang, Jingjing Chen, Xingjun Ma, Yu-Gang Jiang

WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop and test the effectiveness of deepfake detectors against real-world deepfakes.

DeepFake Detection Face Swapping

Colonoscopy Polyp Detection: Domain Adaptation From Medical Report Images to Real-time Videos

no code implementations 31 Dec 2020 Zhi-Qin Zhan, Huazhu Fu, Yan-Yao Yang, Jingjing Chen, Jie Liu, Yu-Gang Jiang

However, there are several issues between the image-based training and video-based inference, including domain differences, lack of positive samples, and temporal smoothness.

Domain Adaptation

Multi-modal Cooking Workflow Construction for Food Recipes

no code implementations 20 Aug 2020 Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yu-Gang Jiang, Tat-Seng Chua

Understanding food recipes requires anticipating the implicit causal effects of cooking actions, such that the recipe can be converted into a graph describing the temporal workflow of the recipe.

Common Sense Reasoning Decoder

Dual In-painting Model for Unsupervised Gaze Correction and Animation in the Wild

1 code implementation 9 Aug 2020 Jichao Zhang, Jingjing Chen, Hao Tang, Wei Wang, Yan Yan, Enver Sangineto, Nicu Sebe

In this paper we address the problem of unsupervised gaze correction in the wild, presenting a solution that works without the need for precise annotations of the gaze angle and the head pose.

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

5 code implementations Interspeech 2020 Jingjing Chen, Qirong Mao, Dong Liu

By introducing an improved transformer, elements in speech sequences can interact directly, which enables DPTNet to model the speech sequences with direct context-awareness.

Speech Separation Audio and Speech Processing Sound

Coarse-to-Fine Gaze Redirection with Numerical and Pictorial Guidance

1 code implementation 7 Apr 2020 Jingjing Chen, Jichao Zhang, Enver Sangineto, Jiayuan Fan, Tao Chen, Nicu Sebe

In this paper, we propose to alleviate these problems by means of a novel gaze redirection framework which exploits both a numerical and a pictorial direction guidance, jointly with a coarse-to-fine learning strategy.

gaze redirection Image Generation

Clean-Label Backdoor Attacks on Video Recognition Models

1 code implementation CVPR 2020 Shihao Zhao, Xingjun Ma, Xiang Zheng, James Bailey, Jingjing Chen, Yu-Gang Jiang

We propose the use of a universal adversarial trigger as the backdoor trigger to attack video recognition models, a situation where backdoor attacks are likely to be challenged by the above 4 strict conditions.

Backdoor Attack backdoor defense +2

Heuristic Black-box Adversarial Attacks on Video Recognition Models

1 code implementation 21 Nov 2019 Zhipeng Wei, Jingjing Chen, Xingxing Wei, Linxi Jiang, Tat-Seng Chua, Fengfeng Zhou, Yu-Gang Jiang

To overcome this challenge, we propose a heuristic black-box attack model that generates adversarial perturbations only on the selected frames and regions.

Adversarial Attack Video Recognition

GazeCorrection:Self-Guided Eye Manipulation in the wild using Self-Supervised Generative Adversarial Networks

no code implementations arXiv 2019 Jichao Zhang, Meng Sun, Jingjing Chen, Hao Tang, Yan Yan, Xueying Qin, Nicu Sebe

Gaze correction aims to redirect the person's gaze into the camera by manipulating the eye region, and it can be considered as a specific image resynthesis problem.

Resynthesis

Probabilistic Forecasting of the Masses and Radii of Other Worlds

5 code implementations 29 Mar 2016 Jingjing Chen, David M. Kipping

By conditioning our model upon a sample spanning dwarf planets to late-type stars, Forecaster can predict the mass (or radius) from the radius (or mass) for objects covering nine orders-of-magnitude in mass.

Earth and Planetary Astrophysics Instrumentation and Methods for Astrophysics
