Search Results for author: Juncheng Li

Found 71 papers, 37 papers with code

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

1 code implementation · 22 Nov 2023 · Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.

Attribute counterfactual +3

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer

no code implementations · 21 Nov 2023 · Wenqiao Zhang, Zheqi Lv, Hao Zhou, Jia-Wei Liu, Juncheng Li, Mengze Li, Siliang Tang, Yueting Zhuang

Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate. This setting neglects the more practical scenario where training data are collected from multiple sources.

Domain Adaptation Transfer Learning

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback

no code implementations · 21 Nov 2023 · Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks.

Logical Reasoning

Towards Complex-query Referring Image Segmentation: A Novel Benchmark

no code implementations · 29 Sep 2023 · Wei Ji, Li Li, Hao Fei, Xiangyan Liu, Xun Yang, Juncheng Li, Roger Zimmermann

Referring Image Understanding (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms.

Image Segmentation Semantic Segmentation

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval

no code implementations · 19 Aug 2023 · Kaihang Pan, Juncheng Li, Hongye Song, Hao Fei, Wei Ji, Shuo Zhang, Jun Lin, Xiaozhong Liu, Siliang Tang

Recent studies have shown that dense retrieval models, lacking dedicated training data, struggle to perform well across diverse retrieval tasks, as different retrieval tasks often entail distinct search intents.

Retrieval Text-to-Image Generation

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

1 code implementation · 8 Aug 2023 · Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Hanwang Zhang, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Yueting Zhuang

This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task.

Image Captioning Instruction Following

Multi-Scale Prototypical Transformer for Whole Slide Image Classification

no code implementations · 5 Jul 2023 · Saisai Ding, Jun Wang, Juncheng Li, Jun Shi

The prototypical Transformer (PT) is developed to reduce redundant instances in bags by integrating prototypical learning into the Transformer architecture.

Classification Image Classification +1

Weakly Supervised Lesion Detection and Diagnosis for Breast Cancers with Partially Annotated Ultrasound Images

no code implementations · 12 Jun 2023 · Jian Wang, Liang Qiao, Shichong Zhou, Jin Zhou, Jun Wang, Juncheng Li, Shihui Ying, Cai Chang, Jun Shi

To address this issue, a novel Two-Stage Detection and Diagnosis Network (TSDDNet) is proposed based on weakly supervised learning to enhance diagnostic accuracy of the ultrasound-based CAD for breast cancers.

Lesion Detection Weakly-supervised Learning

Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification

no code implementations · 25 May 2023 · Saisai Ding, Juncheng Li, Jun Wang, Shihui Ying, Jun Shi

The key idea of MEGT is to adopt two independent Efficient Graph-based Transformer (EGT) branches to process the low-resolution and high-resolution patch embeddings (i.e., tokens in a Transformer) of WSIs, respectively, and then fuse these tokens via a multi-scale feature fusion module (MFFM).

Image Classification whole slide images

Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments Using a Synthetic Benchmark

1 code implementation · 25 May 2023 · Juncheng Li, David J. Cappelleri

This paper presents Sim-Suction, a robust object-aware suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints, designed to pick up unknown objects from cluttered environments.

Physical Simulations

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document

1 code implementation · 23 May 2023 · Xiangnan Chen, Qian Xiao, Juncheng Li, Duo Dong, Jun Lin, Xiaozhong Liu, Siliang Tang

GOSE begins by generating preliminary relation predictions on entity pairs extracted from a scanned image of the document.

Relation Relation Extraction

Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration

1 code implementation · 22 May 2023 · Qifan Yu, Juncheng Li, Wentao Ye, Siliang Tang, Yueting Zhuang

Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.

Data Augmentation Prompt Engineering +1

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions

no code implementations · 21 May 2023 · Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang

To improve the consistency between adjacent frames of generated videos, we propose the Frame Difference Loss, which is incorporated during the training process.

Attribute Image Generation +2

Sim-MEES: Modular End-Effector System Grasping Dataset for Mobile Manipulators in Cluttered Environments

1 code implementation · 17 May 2023 · Juncheng Li, David J. Cappelleri

Our dataset generation process combines analytic models and dynamic simulations of the entire cluttered environment to provide accurate grasp labels.

Fast MRI Reconstruction via Edge Attention

1 code implementation · 22 Apr 2023 · Hanhui Yang, Juncheng Li, Lok Ming Lui, Shihui Ying, Jun Shi, Tieyong Zeng

To solve this problem, we propose a lightweight and accurate Edge Attention MRI Reconstruction Network (EAMRI) to reconstruct images with edge guidance.

MRI Reconstruction

EWT: Efficient Wavelet-Transformer for Single Image Denoising

no code implementations · 13 Apr 2023 · Juncheng Li, Bodong Cheng, Ying Chen, Guangwei Gao, Tieyong Zeng

Transformer-based image denoising methods have achieved encouraging results in the past year.

Image Denoising

PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution

no code implementations · 24 Mar 2023 · Hansheng Guo, Juncheng Li, Guangwei Gao, Zhi Li, Tieyong Zeng

Stereo image super-resolution aims to boost the performance of image super-resolution by exploiting the supplementary information provided by binocular systems.

Stereo Image Super-Resolution

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

1 code implementation · ICCV 2023 · Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang

Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner.

Graph Generation Language Modelling +1

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

no code implementations · ICCV 2023 · Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang

Prompt tuning, a recently emerging paradigm, enables powerful vision-language pre-trained models to adapt to downstream tasks in a parameter- and data-efficient way by learning "soft prompts" that condition the frozen pre-trained models.

Domain Generalization Few-Shot Learning +1

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding

no code implementations · 22 Jan 2023 · Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang

To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning

1 code implementation · CVPR 2023 · Wei Ji, Renjie Liang, Zhedong Zheng, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Mengze Li, Tat-Seng Chua

Moreover, we treat the uncertainty score of frames in a video as a whole, and estimate the difficulty of each video, which can further relieve the burden of video selection.

Active Learning Moment Retrieval +1

Efficient Image Super-Resolution with Feature Interaction Weighted Hybrid Network

no code implementations · 29 Dec 2022 · Wenjie Li, Juncheng Li, Guangwei Gao, Weihong Deng, Jian Yang, Guo-Jun Qi, Chia-Wen Lin

Recently, great progress has been made in single-image super-resolution (SISR) based on deep learning technology.

Image Super-Resolution

DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention

no code implementations · 24 Nov 2022 · Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang

Furthermore, we show that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, optimizing the attention in bilinear form.

LEMMA
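
The Johnson-Lindenstrauss lemma mentioned in the DBA entry says that a random linear map into a much lower dimension roughly preserves pairwise distances. The sketch below illustrates only that classic property, using a Gaussian random projection in plain Python. It is not DBA's bilinear low-rank attention. The function names and dimensions are illustrative assumptions.

```python
import math
import random

def random_projection_matrix(d_in, d_out, seed=0):
    """Gaussian d_out x d_in matrix scaled by 1/sqrt(d_out) (classic JL construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)] for _ in range(d_out)]

def project(matrix, vec):
    """Multiply a d_out x d_in matrix by a length-d_in vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def max_distortion(points, d_out, seed=0):
    """Largest relative change in any pairwise distance after projecting to d_out dims."""
    matrix = random_projection_matrix(len(points[0]), d_out, seed)
    projected = [project(matrix, p) for p in points]
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            original = distance(points[i], points[j])
            reduced = distance(projected[i], projected[j])
            worst = max(worst, abs(reduced / original - 1.0))
    return worst
```

Consider 500-dimensional random points projected to 256 dimensions. The worst relative change in pairwise distance is typically small. Lowering `d_out` increases the distortion, which is the trade-off the lemma quantifies.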

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

1 code implementation · 3 Aug 2022 · Juncheng Li, Junlin Xie, Linchao Zhu, Long Qian, Siliang Tang, Wenqiao Zhang, Haochen Shi, Shengyu Zhang, Longhui Wei, Qi Tian, Yueting Zhuang

In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles.

Emotion Classification Temporal Action Localization +1

Quantized Consensus under Data-Rate Constraints and DoS Attacks: A Zooming-In and Holding Approach

no code implementations · 18 Jul 2022 · Maopeng Ran, Shuai Feng, Juncheng Li, Lihua Xie

This paper is concerned with the quantized consensus problem for uncertain nonlinear multi-agent systems under data-rate constraints and Denial-of-Service (DoS) attacks.

Quantization

Masked Autoencoders that Listen

3 code implementations · 13 Jul 2022 · Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer

Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.

Ranked #2 on Speaker Identification on VoxCeleb1 (using extra training data)

Audio Classification Representation Learning +1
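
Audio-MAE's encoder sees only the patches that survive a high masking ratio. Below is a minimal sketch of MAE-style random masking over patch indices, in plain Python. The 0.8 ratio and 512-patch count in the usage note are assumed example values. `random_mask` and `select_tokens` are hypothetical helpers, not the released code.

```python
import random

def random_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) for MAE-style pre-training.

    Only the visible indices would be fed to the encoder; the decoder is
    later asked to reconstruct the masked positions.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)                      # uniform random permutation of patches
    num_keep = int(num_patches * (1.0 - mask_ratio))
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked

def select_tokens(tokens, indices):
    """Gather the patch tokens the encoder actually processes."""
    return [tokens[i] for i in indices]
```

For example, `random_mask(512, 0.8, seed=0)` keeps 102 of 512 patches for the encoder, because `int(512 * 0.2)` rounds down. The remaining 410 patches are left for reconstruction. Encoding only this small visible subset keeps the encoder's cost low at high masking ratios.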

Snow Mask Guided Adaptive Residual Network for Image Snow Removal

no code implementations · 11 Jul 2022 · Bodong Cheng, Juncheng Li, Ying Chen, Shuyi Zhang, Tieyong Zeng

Recently, some methods have been proposed for snow removal, and most of them directly take snow images as the optimization target.

Image Restoration object-detection +4

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

no code implementations · 9 Jul 2022 · Wenqiao Zhang, Jiannan Guo, Mengze Li, Haochen Shi, Shengyu Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang

In this scenario, the input image serves as an intuitive context and background for the search, while the corresponding language expressly requests new traits on how specific characteristics of the query image should be modified in order to get the intended target image.

Content-Based Image Retrieval counterfactual +2

Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution

1 code implementation · 6 Jul 2022 · Wenjie Li, Juncheng Li, Guangwei Gao, Jiantao Zhou, Jian Yang, Guo-Jun Qi

Recently, Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks due to their ability to extract global features.

Image Super-Resolution

NTIRE 2022 Challenge on Stereo Image Super-Resolution: Methods and Results

no code implementations · 20 Apr 2022 · Longguang Wang, Yulan Guo, Yingqian Wang, Juncheng Li, Shuhang Gu, Radu Timofte

In this paper, we summarize the 1st NTIRE challenge on stereo image super-resolution (restoration of rich details in a pair of low-resolution stereo images) with a focus on new solutions and results.

Stereo Image Super-Resolution

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

1 code implementation · CVPR 2022 · Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang

To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Semantic correspondence Sentence

Feature Distillation Interaction Weighting Network for Lightweight Image Super-Resolution

1 code implementation · 16 Dec 2021 · Guangwei Gao, Wenjie Li, Juncheng Li, Fei Wu, Huimin Lu, Yi Yu

Single-image super-resolution (SISR) based on convolutional neural networks has made great progress in recent years.

Image Super-Resolution

From Beginner to Master: A Survey for Deep Learning-based Single-Image Super-Resolution

1 code implementation · 29 Sep 2021 · Juncheng Li, Zehua Pei, Tieyong Zeng

In this survey, we give an overview of DL-based SISR methods and group them according to their targets, such as reconstruction efficiency, reconstruction accuracy, and perceptual accuracy.

Image Quality Assessment Image Super-Resolution

FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation

1 code implementation · 2 Sep 2021 · Guangwei Gao, Guoan Xu, Juncheng Li, Yi Yu, Huimin Lu, Jian Yang

Specifically, FBSNet employs a symmetrical encoder-decoder structure with two branches: a semantic information branch and a spatial detail branch.

Autonomous Driving Drone navigation +1

Transformer for Single Image Super-Resolution

1 code implementation · 25 Aug 2021 · Zhisheng Lu, Juncheng Li, Hong Liu, Chaoyan Huang, Linlin Zhang, Tieyong Zeng

LTB is composed of a series of Efficient Transformers (ET), which require only a small amount of GPU memory thanks to the specially designed Efficient Multi-Head Attention (EMHA).

Image Super-Resolution

Structure-Preserving Deraining with Residue Channel Prior Guidance

1 code implementation · ICCV 2021 · Qiaosi Yi, Juncheng Li, Qinyan Dai, Faming Fang, Guixu Zhang, Tieyong Zeng

Although these methods can remove part of the rain streaks, it is difficult for them to adapt to real-world scenarios and restore high-quality rain-free images with clear and accurate structures.

Single Image Deraining

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

no code implementations · ICCV 2021 · Juncheng Li, Siliang Tang, Linchao Zhu, Haochen Shi, Xuanwen Huang, Fei Wu, Yi Yang, Yueting Zhuang

Secondly, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network from three hierarchies.

Feedback Network for Mutually Boosted Stereo Image Super-Resolution and Disparity Estimation

1 code implementation · 2 Jun 2021 · Qinyan Dai, Juncheng Li, Qiaosi Yi, Faming Fang, Guixu Zhang

Besides the cross-view information exploitation in the low-resolution (LR) space, HR representations produced by the SR process are utilized to perform HR disparity estimation with higher accuracy, through which the HR features can be aggregated to generate a finer SR result.

Disparity Estimation Image Reconstruction +1

Lightweight Image Super-Resolution with Multi-scale Feature Interaction Network

no code implementations · 24 Mar 2021 · Zhengxue Wang, Guangwei Gao, Juncheng Li, Yi Yu, Huimin Lu

Recently, the single image super-resolution (SISR) approaches with deep and complex convolutional neural network structures have achieved promising performance.

Image Super-Resolution

Efficient and Accurate Multi-scale Topological Network for Single Image Dehazing

no code implementations · 24 Feb 2021 · Qiaosi Yi, Juncheng Li, Faming Fang, Aiwen Jiang, Guixu Zhang

To achieve this, we propose a Multi-scale Topological Network (MSTN) to fully explore the features at different scales.

feature selection Image Dehazing +1

Scale-Aware Network with Regional and Semantic Attentions for Crowd Counting under Cluttered Background

no code implementations · 5 Jan 2021 · Qiaosi Yi, Yunxing Liu, Aiwen Jiang, Juncheng Li, Kangfu Mei, Mingwen Wang

Although the emergence of deep learning has greatly promoted the development of this field, crowd counting under cluttered background is still a serious challenge.

Crowd Counting Density Estimation

Robust Meta-learning with Noise via Eigen-Reptile

no code implementations · 1 Jan 2021 · Dong Chen, Lingfei Wu, Siliang Tang, Fangli Xu, Juncheng Li, Chang Zong, Chilie Tan, Yueting Zhuang

In particular, we first cast the meta-overfitting problem (overfitting on sampling and label noise) as a gradient noise problem, since the few available samples cause the meta-learner to overfit on the existing examples (clean or corrupted) of an individual task at every gradient step.

Few-Shot Learning

Revisiting Factorizing Aggregated Posterior in Learning Disentangled Representations

no code implementations · 12 Sep 2020 · Ze Cheng, Juncheng Li, Chenxu Wang, Jixuan Gu, Hao Xu, Xinjian Li, Florian Metze

In this paper, we provide a theoretical explanation that low total correlation of sampled representation cannot guarantee low total correlation of the mean representation.

MDCN: Multi-scale Dense Cross Network for Image Super-Resolution

1 code implementation · 30 Aug 2020 · Juncheng Li, Faming Fang, Jiaqian Li, Kangfu Mei, Guixu Zhang

Among them, MDCB aims to detect multi-scale features and make maximal use of the flow of image features at different scales, HFDB focuses on adaptively recalibrating channel-wise feature responses to achieve feature distillation, and DRB attempts to reconstruct SR images with different upsampling factors in a single model.

Dynamic Reconstruction Image Super-Resolution

Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling

no code implementations · 11 Aug 2020 · Jiacheng Li, Siliang Tang, Juncheng Li, Jun Xiao, Fei Wu, ShiLiang Pu, Yueting Zhuang

In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting.

Meta-Learning Visual Storytelling

Disentangle Perceptual Learning through Online Contrastive Learning

no code implementations · 24 Jun 2020 · Kangfu Mei, Yao Lu, Qiaosi Yi, Hao-Yu Wu, Juncheng Li, Rui Huang

Perceptual learning approaches like perceptual loss are empirically powerful for such tasks, but they usually rely on a pre-trained classification network to provide features, which are not necessarily optimal in terms of visual perception of image transformation.

Contrastive Learning feature selection

Towards Zero-shot Learning for Automatic Phonemic Transcription

no code implementations · 26 Feb 2020 · Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. Black, Florian Metze

The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes.

Zero-Shot Learning

Fast Loop Closure Detection via Binary Content

no code implementations · 25 Feb 2020 · Han Wang, Juncheng Li, Maopeng Ran, Lihua Xie

Our method is compared with state-of-the-art loop closure detection methods, and the results show that it outperforms traditional methods in both recall rate and speed.

Image Retrieval Loop Closure Detection +2

HighEr-Resolution Network for Image Demosaicing and Enhancing

1 code implementation · 19 Nov 2019 · Kangfu Mei, Juncheng Li, Jiajie Zhang, Hao-Yu Wu, Jie Li, Rui Huang

However, plenty of studies have shown that global information is crucial for image restoration tasks like image demosaicing and enhancing.

Demosaicking

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

no code implementations · CVPR 2020 · Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang

Visual navigation is the task of training an embodied agent to navigate intelligently to a target object (e.g., a television) using only visual observations.

Object reinforcement-learning +3

Walking with MIND: Mental Imagery eNhanceD Embodied QA

no code implementations · 5 Aug 2019 · Juncheng Li, Siliang Tang, Fei Wu, Yueting Zhuang

The experimental results and further analysis show that the agent with the MIND module is superior to its counterparts not only in EQA performance but also in many other aspects, such as route planning, behavioral interpretation, and the ability to generalize from a few examples.

Adversarial camera stickers: A physical camera-based attack on deep learning systems

1 code implementation · 21 Mar 2019 · Juncheng Li, Frank R. Schmidt, J. Zico Kolter

In this work, we consider an alternative question: is it possible to fool deep classifiers, over all perceived objects of a certain type, by physically manipulating the camera itself?

A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling

3 code implementations · 22 Oct 2018 · Yun Wang, Juncheng Li, Florian Metze

This paper compares five types of pooling functions both theoretically and experimentally, with special focus on their localization performance.

Sound Audio and Speech Processing
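
The comparison covers five multiple instance learning pooling functions. Each aggregates frame-level event probabilities into a single clip-level probability. The sketch below writes them out in plain Python as they are commonly stated: max, average, linear softmax, exponential softmax and attention. The paper's exact formulations may differ in detail. The attention weights, which a real model would learn, are simply passed in here.

```python
import math

def max_pool(probs):
    """Clip probability = most confident frame."""
    return max(probs)

def average_pool(probs):
    """Clip probability = mean over all frames."""
    return sum(probs) / len(probs)

def linear_softmax_pool(probs):
    """Each frame weighted by its own probability: sum(y^2) / sum(y)."""
    total = sum(probs)
    return sum(p * p for p in probs) / total if total > 0 else 0.0

def exp_softmax_pool(probs):
    """Each frame weighted by exp(y): sum(y * e^y) / sum(e^y)."""
    weights = [math.exp(p) for p in probs]
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)

def attention_pool(probs, weights):
    """Weighted mean with caller-supplied non-negative weights (learned in a real model).

    The weights must not all be zero.
    """
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total
```

Take frame probabilities [0.1, 0.2, 0.9, 0.8]. Max pooling returns 0.9 and average pooling returns 0.5. Linear softmax gives (0.01 + 0.04 + 0.81 + 0.64) / 2.0 = 0.75, because more confident frames receive more weight. Max pooling passes gradient through a single frame, whereas the softmax variants let every frame contribute in proportion to its score.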

Progressive Feature Fusion Network for Realistic Image Dehazing

1 code implementation · 4 Oct 2018 · Kangfu Mei, Aiwen Jiang, Juncheng Li, Mingwen Wang

Most of them follow a classic atmospheric scattering model which is an elegant simplified physical model based on the assumption of single-scattering and homogeneous atmospheric medium.

Image Dehazing Single Image Dehazing

An Effective Single-Image Super-Resolution Model Using Squeeze-and-Excitation Networks

1 code implementation · 3 Oct 2018 · Kangfu Mei, Aiwen Jiang, Juncheng Li, Jihua Ye, Mingwen Wang

Recent work on single-image super-resolution has concentrated on improving performance by enhancing spatial encoding between convolutional layers.

Image Super-Resolution

Multi-scale Residual Network for Image Super-Resolution

1 code implementation · ECCV 2018 · Juncheng Li, Faming Fang, Kangfu Mei, Guixu Zhang

Meanwhile, we let these features interact with each other to obtain the most effective image information; we call this structure the Multi-scale Residual Block (MSRB).

Image Super-Resolution

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval

1 code implementation · ICMR 2018 · Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury

Constructing a joint representation invariant across different modalities (e.g., video, language) is of significant importance in many multimedia applications.

Retrieval Text Retrieval +1

A Comparison of deep learning methods for environmental sound

1 code implementation · 20 Mar 2017 · Juncheng Li, Wei Dai, Florian Metze, Shuhui Qu, Samarjit Das

On these features, we apply five models: Gaussian Mixture Model (GMM), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Convolutional Deep Neural Network (CNN) and i-vector.

Avg

Learning Filter Banks Using Deep Learning For Acoustic Signals

no code implementations · 29 Nov 2016 · Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

Based on the procedure of log Mel-filter banks, we design a filter bank learning layer.
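
The filter-bank entry builds its learnable layer on the log Mel-filter bank procedure. Below is a sketch of that fixed procedure in plain Python. It assumes the common 2595·log10(1 + f/700) Mel scale and triangular filters. How the paper parameterizes and trains its layer is not shown, and all names are illustrative.

```python
import math

def hz_to_mel(hz):
    """Common Mel-scale formula (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(num_filters, num_fft_bins, sample_rate):
    """Triangular filters over FFT bins 0..num_fft_bins-1 spanning 0..sample_rate/2 Hz."""
    nyquist = sample_rate / 2.0
    low, high = hz_to_mel(0.0), hz_to_mel(nyquist)
    # num_filters + 2 band edges, equally spaced on the Mel scale.
    mel_points = [low + (high - low) * i / (num_filters + 1) for i in range(num_filters + 2)]
    # Convert each edge to a fractional FFT-bin position.
    bins = [mel_to_hz(m) / nyquist * (num_fft_bins - 1) for m in mel_points]
    filters = []
    for f in range(1, num_filters + 1):
        left, center, right = bins[f - 1], bins[f], bins[f + 1]
        row = []
        for k in range(num_fft_bins):
            if left <= k <= center:
                row.append((k - left) / (center - left))    # rising edge
            elif center < k <= right:
                row.append((right - k) / (right - center))  # falling edge
            else:
                row.append(0.0)
        filters.append(row)
    return filters

def log_filter_bank_energies(power_spectrum, filters, eps=1e-10):
    """Apply each filter to one frame's power spectrum and take the log energy."""
    return [math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
            for row in filters]
```

For example, `mel_filter_bank(40, 257, 16000)` builds 40 triangular filters over the 257 bins of a 512-point FFT at 16 kHz, with weights between 0 and 1. A learnable variant could initialize a trainable weight matrix from these rows and update it by gradient descent. That training step is outside this sketch.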

Very Deep Convolutional Neural Networks for Raw Waveforms

10 code implementations · 1 Oct 2016 · Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, Samarjit Das

Our CNNs, with up to 34 weight layers, are efficient to optimize over very long sequences (e.g., vector of size 32000), necessary for processing acoustic waveforms.

Representation Learning
