Search Results for author: Juncheng Li

Found 89 papers, 45 papers with code

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation

no code implementations 8 Dec 2024 Leigang Qu, Haochuan Li, Wenjie Wang, Xiang Liu, Juncheng Li, Liqiang Nie, Tat-Seng Chua

To adapt SILMM to LMMs with continuous features, we propose a diversity mechanism to obtain diverse representations and a kernel-based continuous DPO for alignment.

Diversity Prompt Engineering +1

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

1 code implementation 5 Dec 2024 Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan

HumanEdit bridges this gap by employing human annotators to construct data pairs and administrators to provide feedback.

Unified Generative and Discriminative Training for Multi-modal Large Language Models

no code implementations 1 Nov 2024 Wei Chow, Juncheng Li, Qifan Yu, Kaihang Pan, Hao Fei, Zhiqi Ge, Shuai Yang, Siliang Tang, Hanwang Zhang, Qianru Sun

Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval, yet struggles with complex scenarios requiring fine-grained semantic differentiation.

Dynamic Time Warping Image-text Classification +4

RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection

no code implementations 2 Oct 2024 Bingchen Miao, Wenqiao Zhang, Juncheng Li, Siliang Tang, Zhaocheng Li, Haochen Shi, Jun Xiao, Yueting Zhuang

To address this practical challenge, we introduce a first-of-its-kind study that comprehensively investigates Modality-Incomplete Industrial Anomaly Detection (MIIAD), to consider the imperfect learning environment in which the multimodal information may be incomplete.

Anomaly Detection Philosophy

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration

no code implementations 30 Sep 2024 Kaihang Pan, Zhaoyu Fan, Juncheng Li, Qifan Yu, Hao Fei, Siliang Tang, Richang Hong, Hanwang Zhang, Qianru Sun

In this paper, we propose UniKE, a novel multimodal editing method that establishes a unified perspective and paradigm for intrinsic knowledge editing and external knowledge resorting.

knowledge editing

Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images

no code implementations 24 Aug 2024 Tianxiang Huang, Jing Shi, Ge Jin, Juncheng Li, Jun Wang, Jun Du, Jun Shi

In this work, we propose a novel hip landmark detection model by integrating the Topological GCN (TGCN) with an Improved Conformer (TGCN-ICF) into a unified framework to improve detection performance.

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition

1 code implementation 19 Aug 2024 Tianwei Lin, Jiang Liu, Wenqiao Zhang, Zhaocheng Li, Yang Dai, Haoyuan Li, Zhelun Yu, Wanggui He, Juncheng Li, Hao Jiang, Siliang Tang, Yueting Zhuang

Considering this, we introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts, and thus achieving the right balance of effectiveness and efficiency: (i) For collaboration, a novel knowledge-sharing and -organizing mechanism is devised to appropriately reduce the scale of matrix operations, thereby boosting the training and inference speed.

Multi-Task Learning parameter-efficient fine-tuning +1

Auto-Encoding Morph-Tokens for Multimodal LLM

1 code implementation 3 May 2024 Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow, Shuicheng Yan, Tat-Seng Chua, Yueting Zhuang, Hanwang Zhang

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge.

Image Reconstruction MORPH

Sim-Grasp: Learning 6-DOF Grasp Policies for Cluttered Environments Using a Synthetic Benchmark

1 code implementation 1 May 2024 Juncheng Li, David J. Cappelleri

In this paper, we present Sim-Grasp, a robust 6-DOF two-finger grasping system that integrates advanced language models for enhanced object manipulation in cluttered environments.

Object

WorldGPT: Empowering LLM as Multimodal World Model

1 code implementation 28 Apr 2024 Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang

As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.

Language Modelling Large Language Model +1

LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation

no code implementations 21 Apr 2024 Haoyu Zheng, Wenqiao Zhang, Yaoke Wang, Hao Zhou, Jiang Liu, Juncheng Li, Zheqi Lv, Siliang Tang, Yueting Zhuang

Revolutionary advancements in text-to-image models have unlocked new dimensions for sophisticated content creation, e.g., text-conditioned image editing, allowing us to edit diverse images that convey highly complex visual concepts according to textual guidance.

Image Generation Image Morphing +2

Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales

no code implementations 17 Apr 2024 Minghe Gao, Shuang Chen, Liang Pang, Yuan Yao, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

Their ability to execute intricate compositional reasoning tasks is also constrained, culminating in a stagnation of learning progression for these models.

Hallucination

Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution

no code implementations CVPR 2024 Longguang Wang, Juncheng Li, Yingqian Wang, Qingyong Hu, Yulan Guo

The difficulty of acquiring high-resolution (HR) and low-resolution (LR) image pairs in real scenarios limits the performance of existing learning-based image super-resolution (SR) methods in the real world.

Diversity Image Generation +1

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

1 code implementation CVPR 2024 Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.

Attribute counterfactual +3

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer

no code implementations CVPR 2024 Wenqiao Zhang, Zheqi Lv, Hao Zhou, Jia-Wei Liu, Juncheng Li, Mengze Li, Siliang Tang, Yueting Zhuang

Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate. This setting neglects the more practical scenario where training data are collected from multiple sources.

Diversity Domain Adaptation +1

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback

no code implementations 21 Nov 2023 Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Zheqi Lv, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks.

Logical Reasoning

Towards Complex-query Referring Image Segmentation: A Novel Benchmark

no code implementations 29 Sep 2023 Wei Ji, Li Li, Hao Fei, Xiangyan Liu, Xun Yang, Juncheng Li, Roger Zimmermann

Referring Image Segmentation (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms.

Image Segmentation Semantic Segmentation

I3: Intent-Introspective Retrieval Conditioned on Instructions

no code implementations 19 Aug 2023 Kaihang Pan, Juncheng Li, Wenjie Wang, Hao Fei, Hongye Song, Wei Ji, Jun Lin, Xiaozhong Liu, Tat-Seng Chua, Siliang Tang

Recent studies indicate that dense retrieval models struggle to perform well on a wide variety of retrieval tasks that lack dedicated training data, as different retrieval tasks often entail distinct search intents.

Retrieval Text-to-Image Generation

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

1 code implementation 8 Aug 2023 Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Hanwang Zhang, Yueting Zhuang

This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task.

Caption Generation Image Captioning +1

Multi-Scale Prototypical Transformer for Whole Slide Image Classification

no code implementations 5 Jul 2023 Saisai Ding, Jun Wang, Juncheng Li, Jun Shi

The PT is developed to reduce redundant instances in bags by integrating prototypical learning into the Transformer architecture.

Classification Image Classification +1

Weakly Supervised Lesion Detection and Diagnosis for Breast Cancers with Partially Annotated Ultrasound Images

no code implementations 12 Jun 2023 Jian Wang, Liang Qiao, Shichong Zhou, Jin Zhou, Jun Wang, Juncheng Li, Shihui Ying, Cai Chang, Jun Shi

To address this issue, a novel Two-Stage Detection and Diagnosis Network (TSDDNet) is proposed based on weakly supervised learning to enhance the diagnostic accuracy of ultrasound-based CAD for breast cancers.

Lesion Detection Weakly-supervised Learning

Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification

no code implementations 25 May 2023 Saisai Ding, Juncheng Li, Jun Wang, Shihui Ying, Jun Shi

The key idea of MEGT is to adopt two independent Efficient Graph-based Transformer (EGT) branches to process the low-resolution and high-resolution patch embeddings (i.e., tokens in a Transformer) of WSIs, respectively, and then fuse these tokens via a multi-scale feature fusion module (MFFM).

Image Classification whole slide images

Sim-Suction: Learning a Suction Grasp Policy for Cluttered Environments Using a Synthetic Benchmark

1 code implementation 25 May 2023 Juncheng Li, David J. Cappelleri

This paper presents Sim-Suction, a robust object-aware suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints, designed to pick up unknown objects from cluttered environments.

Physical Simulations

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document

1 code implementation 23 May 2023 Xiangnan Chen, Qian Xiao, Juncheng Li, Duo Dong, Jun Lin, Xiaozhong Liu, Siliang Tang

GOSE initiates by generating preliminary relation predictions on entity pairs extracted from a scanned image of the document.

Relation Relation Extraction

Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration

1 code implementation 22 May 2023 Qifan Yu, Juncheng Li, Wentao Ye, Siliang Tang, Yueting Zhuang

Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.

Data Augmentation Prompt Engineering +1

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions

no code implementations 21 May 2023 Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang

We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions.

Attribute Image Generation +2

Sim-MEES: Modular End-Effector System Grasping Dataset for Mobile Manipulators in Cluttered Environments

1 code implementation 17 May 2023 Juncheng Li, David J. Cappelleri

Our dataset generation process combines analytic models and dynamic simulations of the entire cluttered environment to provide accurate grasp labels.

Fast MRI Reconstruction via Edge Attention

1 code implementation 22 Apr 2023 Hanhui Yang, Juncheng Li, Lok Ming Lui, Shihui Ying, Jun Shi, Tieyong Zeng

To solve this problem, we propose a lightweight and accurate Edge Attention MRI Reconstruction Network (EAMRI) to reconstruct images with edge guidance.

MRI Reconstruction

EWT: Efficient Wavelet-Transformer for Single Image Denoising

no code implementations 13 Apr 2023 Juncheng Li, Bodong Cheng, Ying Chen, Guangwei Gao, Tieyong Zeng

Transformer-based image denoising methods have achieved encouraging results in the past year.

Image Denoising

PFT-SSR: Parallax Fusion Transformer for Stereo Image Super-Resolution

no code implementations 24 Mar 2023 Hansheng Guo, Juncheng Li, Guangwei Gao, Zhi Li, Tieyong Zeng

Stereo image super-resolution aims to boost the performance of image super-resolution by exploiting the supplementary information provided by binocular systems.

Stereo Image Super-Resolution

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

1 code implementation ICCV 2023 Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang

Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner.

Graph Generation Language Modelling +1

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization

1 code implementation 22 Mar 2023 Kaihang Pan, Juncheng Li, Hongye Song, Jun Lin, Xiaozhong Liu, Siliang Tang

Though effective, prompt tuning under few-shot settings relies heavily on a good initialization of soft prompts.

Domain Generalization Few-Shot Learning

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models

no code implementations ICCV 2023 Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang

Prompt tuning, a recently emerging paradigm, enables powerful vision-language pre-training models to adapt to downstream tasks in a parameter- and data-efficient way by learning "soft prompts" to condition frozen pre-training models.

Domain Generalization Few-Shot Learning +1

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding

no code implementations 22 Jan 2023 Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang

To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Diversity Semantic correspondence +1

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-Based Active Learning

1 code implementation CVPR 2023 Wei Ji, Renjie Liang, Zhedong Zheng, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Mengze Li, Tat-Seng Chua

Moreover, we treat the uncertainty score of frames in a video as a whole, and estimate the difficulty of each video, which can further relieve the burden of video selection.

Active Learning Moment Retrieval +1

Efficient Image Super-Resolution with Feature Interaction Weighted Hybrid Network

1 code implementation 29 Dec 2022 Wenjie Li, Juncheng Li, Guangwei Gao, Weihong Deng, Jian Yang, Guo-Jun Qi, Chia-Wen Lin

Lightweight image super-resolution aims to reconstruct high-resolution images from low-resolution images using low computational costs.

Image Super-Resolution

DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention

no code implementations 24 Nov 2022 Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang

Furthermore, we show that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, optimizing the attention in bilinear form.

LEMMA

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

1 code implementation 3 Aug 2022 Juncheng Li, Junlin Xie, Linchao Zhu, Long Qian, Siliang Tang, Wenqiao Zhang, Haochen Shi, Shengyu Zhang, Longhui Wei, Qi Tian, Yueting Zhuang

In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles.

Emotion Classification Temporal Action Localization +1

Quantized Consensus under Data-Rate Constraints and DoS Attacks: A Zooming-In and Holding Approach

no code implementations 18 Jul 2022 Maopeng Ran, Shuai Feng, Juncheng Li, Lihua Xie

This paper is concerned with the quantized consensus problem for uncertain nonlinear multi-agent systems under data-rate constraints and Denial-of-Service (DoS) attacks.

Quantization

Masked Autoencoders that Listen

4 code implementations 13 Jul 2022 Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer

Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.

Ranked #4 on Speaker Identification on VoxCeleb1 (using extra training data)

Audio Classification Decoder +2

Snow Mask Guided Adaptive Residual Network for Image Snow Removal

no code implementations 11 Jul 2022 Bodong Cheng, Juncheng Li, Ying Chen, Shuyi Zhang, Tieyong Zeng

Recently, some methods have been proposed for snow removal, and most of them treat snow images directly as the optimization object.

Image Restoration object-detection +4

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval

no code implementations 9 Jul 2022 Wenqiao Zhang, Jiannan Guo, Mengze Li, Haochen Shi, Shengyu Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang

In this scenario, the input image serves as an intuitive context and background for the search, while the corresponding language expressly requests new traits on how specific characteristics of the query image should be modified in order to get the intended target image.

Content-Based Image Retrieval counterfactual +2

Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution

1 code implementation 6 Jul 2022 Wenjie Li, Juncheng Li, Guangwei Gao, Jiantao Zhou, Jian Yang, Guo-Jun Qi

Recently, Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks due to their ability to extract global features.

Image Super-Resolution

NTIRE 2022 Challenge on Stereo Image Super-Resolution: Methods and Results

no code implementations 20 Apr 2022 Longguang Wang, Yulan Guo, Yingqian Wang, Juncheng Li, Shuhang Gu, Radu Timofte

In this paper, we summarize the 1st NTIRE challenge on stereo image super-resolution (restoration of rich details in a pair of low-resolution stereo images) with a focus on new solutions and results.

Stereo Image Super-Resolution

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning

1 code implementation CVPR 2022 Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang

To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.

Diversity Semantic correspondence +1

Feature Distillation Interaction Weighting Network for Lightweight Image Super-Resolution

1 code implementation 16 Dec 2021 Guangwei Gao, Wenjie Li, Juncheng Li, Fei Wu, Huimin Lu, Yi Yu

Convolutional neural network-based single-image super-resolution (SISR) has made great progress in recent years.

Image Super-Resolution

A Systematic Survey of Deep Learning-based Single-Image Super-Resolution

1 code implementation 29 Sep 2021 Juncheng Li, Zehua Pei, Wenjie Li, Guangwei Gao, Longguang Wang, Yingqian Wang, Tieyong Zeng

This is an exhaustive survey of SISR, which can help researchers better understand SISR and inspire more exciting research in this field.

Deep Learning Image Quality Assessment +2

FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation

1 code implementation 2 Sep 2021 Guangwei Gao, Guoan Xu, Juncheng Li, Yi Yu, Huimin Lu, Jian Yang

Specifically, FBSNet employs a symmetrical encoder-decoder structure with two branches: a semantic information branch and a spatial detail branch.

Autonomous Driving Decoder +2

Transformer for Single Image Super-Resolution

1 code implementation 25 Aug 2021 Zhisheng Lu, Juncheng Li, Hong Liu, Chaoyan Huang, Linlin Zhang, Tieyong Zeng

LTB is composed of a series of Efficient Transformers (ET), which occupy little GPU memory thanks to the specially designed Efficient Multi-Head Attention (EMHA).

Image Super-Resolution

Structure-Preserving Deraining with Residue Channel Prior Guidance

1 code implementation ICCV 2021 Qiaosi Yi, Juncheng Li, Qinyan Dai, Faming Fang, Guixu Zhang, Tieyong Zeng

Although these methods can remove part of the rain streaks, it is difficult for them to adapt to real-world scenarios and restore high-quality rain-free images with clear and accurate structures.

Single Image Deraining

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

no code implementations ICCV 2021 Juncheng Li, Siliang Tang, Linchao Zhu, Haochen Shi, Xuanwen Huang, Fei Wu, Yi Yang, Yueting Zhuang

Secondly, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network from three hierarchies.

Feedback Network for Mutually Boosted Stereo Image Super-Resolution and Disparity Estimation

1 code implementation 2 Jun 2021 Qinyan Dai, Juncheng Li, Qiaosi Yi, Faming Fang, Guixu Zhang

Besides the cross-view information exploitation in the low-resolution (LR) space, HR representations produced by the SR process are utilized to perform HR disparity estimation with higher accuracy, through which the HR features can be aggregated to generate a finer SR result.

Disparity Estimation Image Reconstruction +1

Lightweight Image Super-Resolution with Multi-scale Feature Interaction Network

no code implementations 24 Mar 2021 Zhengxue Wang, Guangwei Gao, Juncheng Li, Yi Yu, Huimin Lu

Recently, single image super-resolution (SISR) approaches with deep and complex convolutional neural network structures have achieved promising performance.

Image Super-Resolution

Efficient and Accurate Multi-scale Topological Network for Single Image Dehazing

no code implementations 24 Feb 2021 Qiaosi Yi, Juncheng Li, Faming Fang, Aiwen Jiang, Guixu Zhang

To achieve this, we propose a Multi-scale Topological Network (MSTN) to fully explore the features at different scales.

feature selection Image Dehazing +1

Scale-Aware Network with Regional and Semantic Attentions for Crowd Counting under Cluttered Background

no code implementations 5 Jan 2021 Qiaosi Yi, Yunxing Liu, Aiwen Jiang, Juncheng Li, Kangfu Mei, Mingwen Wang

Although the emergence of deep learning has greatly promoted the development of this field, crowd counting under cluttered backgrounds is still a serious challenge.

Crowd Counting Density Estimation +1

Robust Meta-learning with Noise via Eigen-Reptile

no code implementations 1 Jan 2021 Dong Chen, Lingfei Wu, Siliang Tang, Fangli Xu, Juncheng Li, Chang Zong, Chilie Tan, Yueting Zhuang

In particular, we first cast the meta-overfitting problem (overfitting on sampling and label noise) as a gradient noise problem, since having few available samples causes the meta-learner to overfit on existing examples (clean or corrupted) of an individual task at every gradient step.

Few-Shot Learning

Revisiting Factorizing Aggregated Posterior in Learning Disentangled Representations

no code implementations 12 Sep 2020 Ze Cheng, Juncheng Li, Chenxu Wang, Jixuan Gu, Hao Xu, Xinjian Li, Florian Metze

In this paper, we provide a theoretical explanation that low total correlation of the sampled representation cannot guarantee low total correlation of the mean representation.

MDCN: Multi-scale Dense Cross Network for Image Super-Resolution

1 code implementation 30 Aug 2020 Juncheng Li, Faming Fang, Jiaqian Li, Kangfu Mei, Guixu Zhang

Among them, MDCB aims to detect multi-scale features and maximize the use of image feature flow at different scales, HFDB focuses on adaptively recalibrating channel-wise feature responses to achieve feature distillation, and DRB attempts to reconstruct SR images with different upsampling factors in a single model.

Dynamic Reconstruction Image Super-Resolution

Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling

no code implementations 11 Aug 2020 Jiacheng Li, Siliang Tang, Juncheng Li, Jun Xiao, Fei Wu, ShiLiang Pu, Yueting Zhuang

In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting.

Meta-Learning Visual Storytelling

Disentangle Perceptual Learning through Online Contrastive Learning

no code implementations 24 Jun 2020 Kangfu Mei, Yao Lu, Qiaosi Yi, Hao-Yu Wu, Juncheng Li, Rui Huang

Perceptual learning approaches like perceptual loss are empirically powerful for such tasks, but they usually rely on a pre-trained classification network to provide features, which are not necessarily optimal in terms of visual perception of image transformation.

Contrastive Learning feature selection +1

Towards Zero-shot Learning for Automatic Phonemic Transcription

no code implementations 26 Feb 2020 Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. Black, Florian Metze

The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes.

Zero-Shot Learning

Fast Loop Closure Detection via Binary Content

no code implementations 25 Feb 2020 Han Wang, Juncheng Li, Maopeng Ran, Lihua Xie

Our method is compared with state-of-the-art loop closure detection methods, and the results show that it outperforms traditional methods in both recall rate and speed.

Image Retrieval Loop Closure Detection +2

HighEr-Resolution Network for Image Demosaicing and Enhancing

1 code implementation 19 Nov 2019 Kangfu Mei, Juncheng Li, Jiajie Zhang, Hao-Yu Wu, Jie Li, Rui Huang

However, plenty of studies have shown that global information is crucial for image restoration tasks like image demosaicing and enhancing.

Demosaicking

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

no code implementations CVPR 2020 Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang

Visual navigation is the task of training an embodied agent to intelligently navigate to a target object (e.g., a television) using only visual observations.

Deep Reinforcement Learning Object +4

Walking with MIND: Mental Imagery eNhanceD Embodied QA

no code implementations 5 Aug 2019 Juncheng Li, Siliang Tang, Fei Wu, Yueting Zhuang

The experimental results and further analysis prove that the agent with the MIND module is superior to its counterparts not only in EQA performance but in many other aspects such as route planning, behavioral interpretation, and the ability to generalize from a few examples.

Adversarial camera stickers: A physical camera-based attack on deep learning systems

1 code implementation 21 Mar 2019 Juncheng Li, Frank R. Schmidt, J. Zico Kolter

In this work, we consider an alternative question: is it possible to fool deep classifiers, over all perceived objects of a certain type, by physically manipulating the camera itself?

A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling

3 code implementations 22 Oct 2018 Yun Wang, Juncheng Li, Florian Metze

This paper compares five types of pooling functions both theoretically and experimentally, with a special focus on their localization performance.

Sound Audio and Speech Processing

Progressive Feature Fusion Network for Realistic Image Dehazing

1 code implementation 4 Oct 2018 Kangfu Mei, Aiwen Jiang, Juncheng Li, Mingwen Wang

Most of them follow a classic atmospheric scattering model, an elegant simplified physical model based on the assumption of single scattering and a homogeneous atmospheric medium.

4k Decoder +2

An Effective Single-Image Super-Resolution Model Using Squeeze-and-Excitation Networks

1 code implementation 3 Oct 2018 Kangfu Mei, Aiwen Jiang, Juncheng Li, Jihua Ye, Mingwen Wang

Recent works on single-image super-resolution are concentrated on improving performance through enhancing spatial encoding between convolutional layers.

Image Super-Resolution

Multi-scale Residual Network for Image Super-Resolution

1 code implementation ECCV 2018 Juncheng Li, Faming Fang, Kangfu Mei, Guixu Zhang

Meanwhile, we let these features interact with each other to get the most efficacious image information; we call this structure the Multi-scale Residual Block (MSRB).

Image Super-Resolution

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval

1 code implementation ICMR 2018 Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury

Constructing a joint representation invariant across different modalities (e.g., video, language) is of significant importance in many multimedia applications.

Image-text Retrieval Text Retrieval +1

A Comparison of deep learning methods for environmental sound

1 code implementation 20 Mar 2017 Juncheng Li, Wei Dai, Florian Metze, Shuhui Qu, Samarjit Das

On these features, we apply five models: Gaussian Mixture Model (GMM), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Convolutional Deep Neural Network (CNN) and i-vector.

Avg Deep Learning

Learning Filter Banks Using Deep Learning For Acoustic Signals

no code implementations 29 Nov 2016 Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das

Based on the procedure of log Mel-filter banks, we design a filter bank learning layer.

Deep Learning

Very Deep Convolutional Neural Networks for Raw Waveforms

10 code implementations 1 Oct 2016 Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, Samarjit Das

Our CNNs, with up to 34 weight layers, are efficient to optimize over very long sequences (e.g., vector of size 32000), necessary for processing acoustic waveforms.

Representation Learning
