no code implementations • 9 Dec 2024 • Qifan Yu, Zhebei Shen, Zhongqi Yue, Yang Wu, Wenqiao Zhang, Yunfei Li, Juncheng Li, Siliang Tang, Yueting Zhuang
Instruction tuning fine-tunes pre-trained Multi-modal Large Language Models (MLLMs) to handle real-world tasks.
no code implementations • 8 Dec 2024 • Leigang Qu, Haochuan Li, Wenjie Wang, Xiang Liu, Juncheng Li, Liqiang Nie, Tat-Seng Chua
To adapt SILMM to LMMs with continuous features, we propose a diversity mechanism to obtain diverse representations and a kernel-based continuous DPO for alignment.
1 code implementation • 5 Dec 2024 • Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan
HumanEdit bridges this gap by employing human annotators to construct data pairs and administrators to provide feedback.
no code implementations • 24 Nov 2024 • Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang Tang, Hanwang Zhang, Yueting Zhuang
Instruction-based image editing aims to modify specific image elements with natural language instructions.
no code implementations • 1 Nov 2024 • Wei Chow, Juncheng Li, Qifan Yu, Kaihang Pan, Hao Fei, Zhiqi Ge, Shuai Yang, Siliang Tang, Hanwang Zhang, Qianru Sun
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval, yet struggles with complex scenarios requiring fine-grained semantic differentiation.
no code implementations • 2 Oct 2024 • Bingchen Miao, Wenqiao Zhang, Juncheng Li, Siliang Tang, Zhaocheng Li, Haochen Shi, Jun Xiao, Yueting Zhuang
To address this practical challenge, we introduce a first-of-its-kind study that comprehensively investigates Modality-Incomplete Industrial Anomaly Detection (MIIAD), to consider the imperfect learning environment in which the multimodal information may be incomplete.
no code implementations • 30 Sep 2024 • Kaihang Pan, Zhaoyu Fan, Juncheng Li, Qifan Yu, Hao Fei, Siliang Tang, Richang Hong, Hanwang Zhang, Qianru Sun
In this paper, we propose UniKE, a novel multimodal editing method that establishes a unified perspective and paradigm for intrinsic knowledge editing and external knowledge resorting.
1 code implementation • 27 Sep 2024 • Hongzhe Huang, Zhewen Yu, Jiang Liu, Li Cai, Dian Jiao, Wenqiao Zhang, Siliang Tang, Juncheng Li, Hao Jiang, Haoyuan Li, Yueting Zhuang
Recent advances in Multi-modal Large Language Models (MLLMs), such as LLaVA-series models, are driven by massive machine-generated instruction-following data tuning.
no code implementations • 25 Sep 2024 • Longguang Wang, Yulan Guo, Juncheng Li, Hongda Liu, Yang Zhao, Yingqian Wang, Zhi Jin, Shuhang Gu, Radu Timofte
This paper summarizes the 3rd NTIRE challenge on stereo image super-resolution (SR) with a focus on new solutions and results.
no code implementations • 24 Aug 2024 • Tianxiang Huang, Jing Shi, Ge Jin, Juncheng Li, Jun Wang, Jun Du, Jun Shi
In this work, we propose a novel hip landmark detection model by integrating the Topological GCN (TGCN) with an Improved Conformer (TGCN-ICF) into a unified framework to improve detection performance.
1 code implementation • 19 Aug 2024 • Tianwei Lin, Jiang Liu, Wenqiao Zhang, Zhaocheng Li, Yang Dai, Haoyuan Li, Zhelun Yu, Wanggui He, Juncheng Li, Hao Jiang, Siliang Tang, Yueting Zhuang
Considering this, we introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts, and thus achieving the right balance of effectiveness and efficiency: (i) For collaboration, a novel knowledge-sharing and -organizing mechanism is devised to appropriately reduce the scale of matrix operations, thereby boosting the training and inference speed.
Ranked #187 on Visual Question Answering on MM-Vet
1 code implementation • 3 May 2024 • Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow, Shuicheng Yan, Tat-Seng Chua, Yueting Zhuang, Hanwang Zhang
For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge.
1 code implementation • 1 May 2024 • Juncheng Li, David J. Cappelleri
In this paper, we present Sim-Grasp, a robust 6-DOF two-finger grasping system that integrates advanced language models for enhanced object manipulation in cluttered environments.
1 code implementation • 28 Apr 2024 • Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang
As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.
no code implementations • 21 Apr 2024 • Haoyu Zheng, Wenqiao Zhang, Yaoke Wang, Hao Zhou, Jiang Liu, Juncheng Li, Zheqi Lv, Siliang Tang, Yueting Zhuang
Revolutionary advancements in text-to-image models have unlocked new dimensions for sophisticated content creation, e.g., text-conditioned image editing, allowing us to edit the diverse images that convey highly complex visual concepts according to the textual guidance.
no code implementations • 17 Apr 2024 • Minghe Gao, Shuang Chen, Liang Pang, Yuan YAO, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua
Their ability to execute intricate compositional reasoning tasks is also constrained, culminating in a stagnation of learning progression for these models.
1 code implementation • 20 Mar 2024 • Wenqiao Zhang, Tianwei Lin, Jiang Liu, Fangxun Shu, Haoyuan Li, Lei Zhang, He Wanggui, Hao Zhou, Zheqi Lv, Hao Jiang, Juncheng Li, Siliang Tang, Yueting Zhuang
Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks.
Ranked #189 on Visual Question Answering on MM-Vet
1 code implementation • 18 Feb 2024 • Long Qian, Juncheng Li, Yu Wu, Yaobo Ye, Hao Fei, Tat-Seng Chua, Yueting Zhuang, Siliang Tang
Large Language Models (LLMs) demonstrate remarkable proficiency in comprehending and handling text-based tasks.
no code implementations • CVPR 2024 • Xinyi Jiang, Guoming Wang, Junhao Guo, Juncheng Li, Wenqiao Zhang, Rongxing Lu, Siliang Tang
On MM-Vet, our method improves the score from 31.1 to 32.4.
no code implementations • CVPR 2024 • Longguang Wang, Juncheng Li, Yingqian Wang, Qingyong Hu, Yulan Guo
The difficulty of acquiring high-resolution (HR) and low-resolution (LR) image pairs in real scenarios limits the performance of existing learning-based image super-resolution (SR) methods in the real world.
1 code implementation • CVPR 2024 • Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang
Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.
no code implementations • CVPR 2024 • Wenqiao Zhang, Zheqi Lv, Hao Zhou, Jia-Wei Liu, Juncheng Li, Mengze Li, Siliang Tang, Yueting Zhuang
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate. This setting neglects the more practical scenario where training data are collected from multiple sources.
no code implementations • 21 Nov 2023 • Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Zheqi Lv, Wenqiao Zhang, Siliang Tang, Yueting Zhuang
Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks.
no code implementations • 29 Sep 2023 • Wei Ji, Li Li, Hao Fei, Xiangyan Liu, Xun Yang, Juncheng Li, Roger Zimmermann
Referring Image Understanding (RIS) has been extensively studied over the past decade, leading to the development of advanced algorithms.
1 code implementation • 27 Sep 2023 • Wenjie Li, Mei Wang, Kai Zhang, Juncheng Li, Xiaoming Li, Yuhang Zhang, Guangwei Gao, Weihong Deng, Chia-Wen Lin
We also discuss notable benchmarks commonly utilized in the field.
no code implementations • 19 Aug 2023 • Kaihang Pan, Juncheng Li, Wenjie Wang, Hao Fei, Hongye Song, Wei Ji, Jun Lin, Xiaozhong Liu, Tat-Seng Chua, Siliang Tang
Recent studies indicate that dense retrieval models struggle to perform well on a wide variety of retrieval tasks that lack dedicated training data, as different retrieval tasks often entail distinct search intents.
1 code implementation • 8 Aug 2023 • Juncheng Li, Kaihang Pan, Zhiqi Ge, Minghe Gao, Wei Ji, Wenqiao Zhang, Tat-Seng Chua, Siliang Tang, Hanwang Zhang, Yueting Zhuang
This shortcoming results in MLLMs' underperformance in comprehending demonstrative instructions consisting of multiple, interleaved, and multimodal instructions that demonstrate the required context to complete a task.
no code implementations • 5 Jul 2023 • Saisai Ding, Jun Wang, Juncheng Li, Jun Shi
The PT is developed to reduce redundant instances in bags by integrating prototypical learning into the Transformer architecture.
no code implementations • 12 Jun 2023 • Jian Wang, Liang Qiao, Shichong Zhou, Jin Zhou, Jun Wang, Juncheng Li, Shihui Ying, Cai Chang, Jun Shi
To address this issue, a novel Two-Stage Detection and Diagnosis Network (TSDDNet) is proposed based on weakly supervised learning to enhance diagnostic accuracy of the ultrasound-based CAD for breast cancers.
no code implementations • 25 May 2023 • Saisai Ding, Juncheng Li, Jun Wang, Shihui Ying, Jun Shi
The key idea of MEGT is to adopt two independent Efficient Graph-based Transformer (EGT) branches to process the low-resolution and high-resolution patch embeddings (i.e., tokens in a Transformer) of WSIs, respectively, and then fuse these tokens via a multi-scale feature fusion module (MFFM).
1 code implementation • 25 May 2023 • Juncheng Li, David J. Cappelleri
This paper presents Sim-Suction, a robust object-aware suction grasp policy for mobile manipulation platforms with dynamic camera viewpoints, designed to pick up unknown objects from cluttered environments.
1 code implementation • 23 May 2023 • Xiangnan Chen, Qian Xiao, Juncheng Li, Duo Dong, Jun Lin, Xiaozhong Liu, Siliang Tang
GOSE initiates by generating preliminary relation predictions on entity pairs extracted from a scanned image of the document.
1 code implementation • 22 May 2023 • Qifan Yu, Juncheng Li, Wentao Ye, Siliang Tang, Yueting Zhuang
Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.
no code implementations • 21 May 2023 • Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang
We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions.
1 code implementation • 17 May 2023 • Juncheng Li, David J. Cappelleri
Our dataset generation process combines analytic models and dynamic simulations of the entire cluttered environment to provide accurate grasp labels.
1 code implementation • 22 Apr 2023 • Hanhui Yang, Juncheng Li, Lok Ming Lui, Shihui Ying, Jun Shi, Tieyong Zeng
To solve this problem, we propose a lightweight and accurate Edge Attention MRI Reconstruction Network (EAMRI) to reconstruct images with edge guidance.
no code implementations • 13 Apr 2023 • Juncheng Li, Bodong Cheng, Ying Chen, Guangwei Gao, Tieyong Zeng
Transformer-based image denoising methods have achieved encouraging results in the past year.
no code implementations • 24 Mar 2023 • Hansheng Guo, Juncheng Li, Guangwei Gao, Zhi Li, Tieyong Zeng
Stereo image super-resolution aims to boost the performance of image super-resolution by exploiting the supplementary information provided by binocular systems.
1 code implementation • ICCV 2023 • Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang
Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner.
1 code implementation • 22 Mar 2023 • Kaihang Pan, Juncheng Li, Hongye Song, Jun Lin, Xiaozhong Liu, Siliang Tang
Though effective, prompt tuning under few-shot settings relies heavily on a good initialization of soft prompts.
no code implementations • ICCV 2023 • Juncheng Li, Minghe Gao, Longhui Wei, Siliang Tang, Wenqiao Zhang, Mengze Li, Wei Ji, Qi Tian, Tat-Seng Chua, Yueting Zhuang
Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-training models to adapt to downstream tasks in a parameter- and data-efficient way, by learning the "soft prompts" to condition frozen pre-training models.
1 code implementation • 21 Feb 2023 • Guoan Xu, Juncheng Li, Guangwei Gao, Huimin Lu, Jian Yang, Dong Yue
In the past decade, convolutional neural networks (CNNs) have shown prominence for semantic segmentation.
no code implementations • 22 Jan 2023 • Juncheng Li, Siliang Tang, Linchao Zhu, Wenqiao Zhang, Yi Yang, Tat-Seng Chua, Fei Wu, Yueting Zhuang
To systematically benchmark the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.
1 code implementation • CVPR 2023 • Wei Ji, Renjie Liang, Zhedong Zheng, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Mengze Li, Tat-Seng Chua
Moreover, we treat the uncertainty score of frames in a video as a whole, and estimate the difficulty of each video, which can further relieve the burden of video selection.
1 code implementation • 29 Dec 2022 • Wenjie Li, Juncheng Li, Guangwei Gao, Weihong Deng, Jian Yang, Guo-Jun Qi, Chia-Wen Lin
Lightweight image super-resolution aims to reconstruct high-resolution images from low-resolution images using low computational costs.
no code implementations • 24 Nov 2022 • Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang
Furthermore, we show that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, optimizing the attention in bilinear form.
1 code implementation • 4 Aug 2022 • Juncheng Li, Xin He, Longhui Wei, Long Qian, Linchao Zhu, Lingxi Xie, Yueting Zhuang, Qi Tian, Siliang Tang
Large-scale vision-language pre-training has shown impressive advances in a wide range of downstream tasks.
1 code implementation • 3 Aug 2022 • Juncheng Li, Junlin Xie, Linchao Zhu, Long Qian, Siliang Tang, Wenqiao Zhang, Haochen Shi, Shengyu Zhang, Longhui Wei, Qi Tian, Yueting Zhuang
In this paper, we introduce a new task, named Temporal Emotion Localization in videos (TEL), which aims to detect human emotions and localize their corresponding temporal boundaries in untrimmed videos with aligned subtitles.
no code implementations • 18 Jul 2022 • Maopeng Ran, Shuai Feng, Juncheng Li, Lihua Xie
This paper is concerned with the quantized consensus problem for uncertain nonlinear multi-agent systems under data-rate constraints and Denial-of-Service (DoS) attacks.
4 code implementations • 13 Jul 2022 • Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer
Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers.
Ranked #4 on Speaker Identification on VoxCeleb1 (using extra training data)
no code implementations • 11 Jul 2022 • Bodong Cheng, Juncheng Li, Ying Chen, Shuyi Zhang, Tieyong Zeng
Recently, some methods have been proposed for snow removal, and most of them treat the snowy image directly as the optimization object.
no code implementations • 9 Jul 2022 • Wenqiao Zhang, Jiannan Guo, Mengze Li, Haochen Shi, Shengyu Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang
In this scenario, the input image serves as an intuitive context and background for the search, while the corresponding language expressly requests new traits on how specific characteristics of the query image should be modified in order to get the intended target image.
1 code implementation • 6 Jul 2022 • Wenjie Li, Juncheng Li, Guangwei Gao, Jiantao Zhou, Jian Yang, Guo-Jun Qi
Recently, Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks due to the ability of global feature extraction.
no code implementations • 29 Apr 2022 • Juncheng Li, Hanhui Yang, Qiaosi Yi, Faming Fang, Guangwei Gao, Tieyong Zeng, Guixu Zhang
Single image denoising (SID) has achieved significant breakthroughs with the development of deep learning.
1 code implementation • 28 Apr 2022 • Guangwei Gao, Zhengxue Wang, Juncheng Li, Wenjie Li, Yi Yu, Tieyong Zeng
Single-image super-resolution (SISR) has achieved significant breakthroughs with the development of deep learning.
no code implementations • 20 Apr 2022 • Longguang Wang, Yulan Guo, Yingqian Wang, Juncheng Li, Shuhang Gu, Radu Timofte
In this paper, we summarize the 1st NTIRE challenge on stereo image super-resolution (restoration of rich details in a pair of low-resolution stereo images) with a focus on new solutions and results.
1 code implementation • 19 Apr 2022 • Guangwei Gao, Zixiang Xu, Juncheng Li, Jian Yang, Tieyong Zeng, Guo-Jun Qi
Then, we design an efficient Feature Refinement Module (FRM) to enhance the encoded features.
1 code implementation • CVPR 2022 • Juncheng Li, Junlin Xie, Long Qian, Linchao Zhu, Siliang Tang, Fei Wu, Yi Yang, Yueting Zhuang, Xin Eric Wang
To systematically measure the compositional generalizability of temporal grounding models, we introduce a new Compositional Temporal Grounding task and construct two new dataset splits, i.e., Charades-CG and ActivityNet-CG.
1 code implementation • 16 Dec 2021 • Guangwei Gao, Wenjie Li, Juncheng Li, Fei Wu, Huimin Lu, Yi Yu
Convolutional neural network-based single-image super-resolution (SISR) has made great progress in recent years.
no code implementations • 13 Dec 2021 • Wenqiao Zhang, Haochen Shi, Jiannan Guo, Shengyu Zhang, Qingpeng Cai, Juncheng Li, Sihui Luo, Yueting Zhuang
We propose the Multimodal relAtional Graph adversarIal inferenCe (MAGIC) framework for diverse and unpaired TextCap.
1 code implementation • 29 Sep 2021 • Juncheng Li, Zehua Pei, Wenjie Li, Guangwei Gao, Longguang Wang, Yingqian Wang, Tieyong Zeng
This is an exhaustive survey of SISR, which can help researchers better understand SISR and inspire more exciting research in this field.
1 code implementation • 2 Sep 2021 • Guangwei Gao, Guoan Xu, Juncheng Li, Yi Yu, Huimin Lu, Jian Yang
Specifically, FBSNet employs a symmetrical encoder-decoder structure with two branches, semantic information branch and spatial detail branch.
1 code implementation • 25 Aug 2021 • Zhisheng Lu, Juncheng Li, Hong Liu, Chaoyan Huang, Linlin Zhang, Tieyong Zeng
LTB is composed of a series of Efficient Transformers (ET), which occupy little GPU memory thanks to the specially designed Efficient Multi-Head Attention (EMHA).
1 code implementation • ICCV 2021 • Qiaosi Yi, Juncheng Li, Qinyan Dai, Faming Fang, Guixu Zhang, Tieyong Zeng
Although these methods can remove part of the rain streaks, it is difficult for them to adapt to real-world scenarios and restore high-quality rain-free images with clear and accurate structures.
no code implementations • ICCV 2021 • Juncheng Li, Siliang Tang, Linchao Zhu, Haochen Shi, Xuanwen Huang, Fei Wu, Yi Yang, Yueting Zhuang
Secondly, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network from three hierarchies.
1 code implementation • 2 Jun 2021 • Qinyan Dai, Juncheng Li, Qiaosi Yi, Faming Fang, Guixu Zhang
Besides the cross-view information exploitation in the low-resolution (LR) space, HR representations produced by the SR process are utilized to perform HR disparity estimation with higher accuracy, through which the HR features can be aggregated to generate a finer SR result.
no code implementations • 24 Mar 2021 • Zhengxue Wang, Guangwei Gao, Juncheng Li, Yi Yu, Huimin Lu
Recently, the single image super-resolution (SISR) approaches with deep and complex convolutional neural network structures have achieved promising performance.
no code implementations • 24 Feb 2021 • Qiaosi Yi, Juncheng Li, Faming Fang, Aiwen Jiang, Guixu Zhang
To achieve this, we propose a Multi-scale Topological Network (MSTN) to fully explore the features at different scales.
no code implementations • 5 Jan 2021 • Qiaosi Yi, Yunxing Liu, Aiwen Jiang, Juncheng Li, Kangfu Mei, Mingwen Wang
Although the emergence of deep learning has greatly promoted the development of this field, crowd counting under cluttered background is still a serious challenge.
no code implementations • 1 Jan 2021 • Dong Chen, Lingfei Wu, Siliang Tang, Fangli Xu, Juncheng Li, Chang Zong, Chilie Tan, Yueting Zhuang
In particular, we first cast the meta-overfitting problem (overfitting on sampling and label noise) as a gradient noise problem, since few available samples cause the meta-learner to overfit on existing examples (clean or corrupted) of an individual task at every gradient step.
no code implementations • 12 Sep 2020 • Ze Cheng, Juncheng Li, Chenxu Wang, Jixuan Gu, Hao Xu, Xinjian Li, Florian Metze
In this paper, we provide a theoretical explanation that low total correlation of sampled representation cannot guarantee low total correlation of the mean representation.
1 code implementation • 30 Aug 2020 • Juncheng Li, Faming Fang, Jiaqian Li, Kangfu Mei, Guixu Zhang
Among them, MDCB aims to detect multi-scale features and maximize the use of image feature flow at different scales, HFDB focuses on adaptively recalibrating channel-wise feature responses to achieve feature distillation, and DRB attempts to reconstruct SR images with different upsampling factors in a single model.
no code implementations • 11 Aug 2020 • Jiacheng Li, Siliang Tang, Juncheng Li, Jun Xiao, Fei Wu, ShiLiang Pu, Yueting Zhuang
In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting.
no code implementations • 24 Jun 2020 • Kangfu Mei, Yao Lu, Qiaosi Yi, Hao-Yu Wu, Juncheng Li, Rui Huang
Perceptual learning approaches like perceptual loss are empirically powerful for such tasks but they usually rely on the pre-trained classification network to provide features, which are not necessarily optimal in terms of visual perception of image transformation.
1 code implementation • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham Neubig, Alan W. Black, Florian Metze
Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages.
no code implementations • 26 Feb 2020 • Xinjian Li, Siddharth Dalmia, David R. Mortensen, Juncheng Li, Alan W. Black, Florian Metze
The difficulty of this task is that phoneme inventories often differ between the training languages and the target language, making it infeasible to recognize unseen phonemes.
no code implementations • 25 Feb 2020 • Han Wang, Juncheng Li, Maopeng Ran, Lihua Xie
Our method is compared with the state-of-the-art loop closure detection methods, and the results show that it outperforms the traditional methods in both recall rate and speed.
1 code implementation • 19 Nov 2019 • Kangfu Mei, Juncheng Li, Jiajie Zhang, Hao-Yu Wu, Jie Li, Rui Huang
However, plenty of studies have shown that global information is crucial for image restoration tasks like image demosaicing and enhancing.
no code implementations • CVPR 2020 • Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang
Visual navigation is a task of training an embodied agent by intelligently navigating to a target object (e.g., television) using only visual observations.
no code implementations • 5 Aug 2019 • Juncheng Li, Siliang Tang, Fei Wu, Yueting Zhuang
The experimental results and further analysis prove that the agent with the MIND module is superior to its counterparts not only in EQA performance but in many other aspects such as route planning, behavioral interpretation, and the ability to generalize from a few examples.
1 code implementation • 21 Mar 2019 • Juncheng Li, Frank R. Schmidt, J. Zico Kolter
In this work, we consider an alternative question: is it possible to fool deep classifiers, over all perceived objects of a certain type, by physically manipulating the camera itself?
3 code implementations • 22 Oct 2018 • Yun Wang, Juncheng Li, Florian Metze
This paper compares five types of pooling functions both theoretically and experimentally, with special focus on their performance of localization.
Sound Audio and Speech Processing
1 code implementation • 4 Oct 2018 • Kangfu Mei, Aiwen Jiang, Juncheng Li, Mingwen Wang
Most of them follow a classic atmospheric scattering model which is an elegant simplified physical model based on the assumption of single-scattering and homogeneous atmospheric medium.
1 code implementation • 3 Oct 2018 • Kangfu Mei, Aiwen Jiang, Juncheng Li, Jihua Ye, Mingwen Wang
Recent works on single-image super-resolution are concentrated on improving performance through enhancing spatial encoding between convolutional layers.
1 code implementation • ECCV 2018 • Juncheng Li, Faming Fang, Kangfu Mei, Guixu Zhang
Meanwhile, we let these features interact with each other to get the most efficacious image information, we call this structure Multi-scale Residual Block (MSRB).
1 code implementation • ICMR 2018 • Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury
Constructing a joint representation invariant across different modalities (e.g., video, language) is of significant importance in many multimedia applications.
Ranked #37 on Video Retrieval on MSR-VTT
1 code implementation • 20 Mar 2017 • Juncheng Li, Wei Dai, Florian Metze, Shuhui Qu, Samarjit Das
On these features, we apply five models: Gaussian Mixture Model (GMM), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Convolutional Deep Neural Network (CNN) and i-vector.
no code implementations • 29 Nov 2016 • Shuhui Qu, Juncheng Li, Wei Dai, Samarjit Das
Based on the procedure of log Mel-filter banks, we design a filter bank learning layer.
10 code implementations • 1 Oct 2016 • Wei Dai, Chia Dai, Shuhui Qu, Juncheng Li, Samarjit Das
Our CNNs, with up to 34 weight layers, are efficient to optimize over very long sequences (e.g., vector of size 32000), necessary for processing acoustic waveforms.