1 code implementation • 9 Jan 2025 • Qi Chen, Changli Wu, Jiayi Ji, Yiwei Ma, Danni Yang, Xiaoshuai Sun
To tackle intent ambiguity, we designed a Prompt-Aware Decoder (PAD) that guides the decoding process by deriving task-driven signals from the interaction between the expression and visual features.
1 code implementation • 10 Dec 2024 • Jinlu Zhang, Jiji Tang, Rongsheng Zhang, Tangjie Lv, Xiaoshuai Sun
Story visualization has gained increasing attention in artificial intelligence.
1 code implementation • 5 Dec 2024 • Bo Tong, Bokai Lai, Yiyi Zhou, Gen Luo, Yunhang Shen, Ke Li, Xiaoshuai Sun, Rongrong Ji
Despite a big leap forward in capability, multimodal large language models (MLLMs) tend to behave like a sloth in practical use, i. e., slow response and large latency.
Ranked #68 on Visual Question Answering on MM-Vet
1 code implementation • 3 Dec 2024 • Changli Wu, Qi Chen, Jiayi Ji, Haowei Wang, Yiwei Ma, You Huang, Gen Luo, Hao Fei, Xiaoshuai Sun, Rongrong Ji
The RG-SAN consists of the Text-driven Localization Module (TLM) and the Rule-guided Weak Supervision (RWS) strategy.
1 code implementation • 29 Nov 2024 • Qiong Wu, Wenhao Lin, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji
In particular, we reveal that visual tokens will stop contributing to reasoning when the text tokens receive enough image information, yielding obvious visual redundancy.
no code implementations • 25 Nov 2024 • Yubin Gu, Yuan Meng, Xiaoshuai Sun, Jiayi Ji, Weijian Ruan, Rongrong Ji
In this paper, we propose a novel multiple-in-one IR model that can effectively restore images with both single and mixed degradations.
no code implementations • 22 Nov 2024 • Yijun Fan, Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji
To our knowledge, this is the first method to generate 3D objects from any modality prompts.
no code implementations • 17 Oct 2024 • Yaxin Luo, Gen Luo, Jiayi Ji, Yiyi Zhou, Xiaoshuai Sun, Zhiqiang Shen, Rongrong Ji
In $\gamma$-MoD, a novel metric is proposed to guide the deployment of MoDs in the MLLM, namely rank of attention maps (ARank).
Ranked #166 on Visual Question Answering on MM-Vet
1 code implementation • 6 Oct 2024 • Ke Sun, Shen Chen, Taiping Yao, Hong Liu, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji
The rapid progress of Deepfake technology has made face swapping highly realistic, raising concerns about the malicious use of fabricated facial content.
1 code implementation • 26 Aug 2024 • Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun, Rongrong Ji
We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models.
1 code implementation • 19 Aug 2024 • Mingrui Wu, Oucheng Huang, Jiayi Ji, Jiale Li, Xinyue Cai, Huafeng Kuang, Jianzhuang Liu, Xiaoshuai Sun, Rongrong Ji
In this work, we propose a training-free, trajectory-based controllable T2I approach, termed TraDiffusion.
1 code implementation • 11 Aug 2024 • Ziyin Zhou, Ke Sun, Zhongxi Chen, Huafeng Kuang, Xiaoshuai Sun, Rongrong Ji
The rapid progress in generative models has given rise to the critical task of AI-Generated Content Stealth (AIGC-S), which aims to create AI-generated images that can evade both forensic detectors and human inspection.
1 code implementation • 31 Jul 2024 • Mingrui Wu, Xinyue Cai, Jiayi Ji, Jiale Li, Oucheng Huang, Gen Luo, Hao Fei, Guannan Jiang, Xiaoshuai Sun, Rongrong Ji
We observe that attention, as the core module of MLLMs, connects text prompt tokens and visual tokens, ultimately determining the final results.
2 code implementations • 30 Jul 2024 • Changli Wu, Yihang Liu, Jiayi Ji, Yiwei Ma, Haowei Wang, Gen Luo, Henghui Ding, Xiaoshuai Sun, Rongrong Ji
3D Referring Expression Segmentation (3D-RES) is dedicated to segmenting a specific instance within a 3D space based on a natural language description.
1 code implementation • 23 Jul 2024 • Yiwei Ma, Zhibin Wang, Xiaoshuai Sun, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji
However, the quadratic complexity of the vision encoder in MLLMs constrains the resolution of input images.
Ranked #162 on Visual Question Answering on MM-Vet
1 code implementation • 7 Jul 2024 • Danni Yang, Ruohan Dong, Jiayi Ji, Yiwei Ma, Haowei Wang, Xiaoshuai Sun, Rongrong Ji
Specifically, we introduce the DiffPNG framework, a straightforward yet effective approach that fully capitalizes on the diffusion's architecture for segmentation by decomposing the process into a sequence of localization, segmentation, and refinement steps.
1 code implementation • 7 Jul 2024 • Zhipeng Qian, Yiwei Ma, Zhekai Lin, Jiayi Ji, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji
3D referring expression comprehension (3DREC) and segmentation (3DRES) have overlapping objectives, indicating their potential for collaboration.
1 code implementation • 24 Jun 2024 • Mingrui Wu, Jiayi Ji, Oucheng Huang, Jiale Li, Yuhang Wu, Xiaoshuai Sun, Rongrong Ji
We identify three types of relationship co-occurrences that lead to hallucinations: relationship-relationship, subject-relationship, and relationship-object.
no code implementations • 17 Jun 2024 • Zhipeng Qian, Pei Zhang, Baosong Yang, Kai Fan, Yiwei Ma, Derek F. Wong, Xiaoshuai Sun, Rongrong Ji
This paper introduces AnyTrans, an all-encompassing framework for the task-Translate AnyText in the Image (TATI), which includes multilingual text translation and text fusion within images.
no code implementations • 9 Jun 2024 • Yiwei Ma, Xiaoshuai Sun, Jiayi Ji, Guannan Jiang, Weilin Zhuang, Rongrong Ji
To address this issue, we propose an effective bi-directional one-to-many embedding paradigm that offers a clear optimization direction for each sample, thus mitigating the optimization problem.
1 code implementation • 3 Jun 2024 • Danni Yang, Jiayi Ji, Yiwei Ma, Tianyu Guo, Haowei Wang, Xiaoshuai Sun, Rongrong Ji
These strategies are designed to extract the most accurate masks from SAM's output, thus guiding the training of the student model with enhanced precision.
1 code implementation • 1 Jun 2024 • Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Xiaopeng Hong, Yongjian Wu, Rongrong Ji
This paper explores a novel dynamic network for vision and language tasks, where the inferring structure is customized on the fly for different inputs.
no code implementations • 2 May 2024 • Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji
Recent advancements in automatic 3D avatar generation guided by text have made significant progress.
1 code implementation • 31 Mar 2024 • Xiaorui Huang, Gen Luo, Chaoyang Zhu, Bo Tong, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji
Recently, Segment Anything Model (SAM) has become a research hotspot in the fields of multimedia and computer vision, which exhibits powerful yet versatile capabilities on various (un) conditional image segmentation tasks.
1 code implementation • 27 Mar 2024 • Zhongxi Chen, Ke Sun, Ziyin Zhou, Xianming Lin, Xiaoshuai Sun, Liujuan Cao, Rongrong Ji
The rapid progress in deep learning has given rise to hyper-realistic facial forgery methods, leading to concerns related to misinformation and security risks.
1 code implementation • Proceedings of the AAAI Conference on Artificial Intelligence 2024 • Mingrui Wu, Yuqi Liu, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji
To address this challenge, we introduce a simple Disentangled HOI Detection (DHD) model for detecting novel relationships by integrating an open-set object detector with a Visual Language Model (VLM).
1 code implementation • 22 Mar 2024 • Qiong Wu, Weihao Ye, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji
In this paper, we propose a novel parameter and computation efficient tuning method for Multi-modal Large Language Models (MLLMs), termed Efficient Attention Skipping (EAS).
1 code implementation • 11 Mar 2024 • Jinlu Zhang, Yiyi Zhou, Qiancheng Zheng, Xiaoxiong Du, Gen Luo, Jun Peng, Xiaoshuai Sun, Rongrong Ji
Text-to-3D-aware face (T3D Face) generation and manipulation is an emerging research hot spot in machine learning, which still suffers from low efficiency and poor quality.
1 code implementation • 5 Mar 2024 • Gen Luo, Yiyi Zhou, Yuxin Zhang, Xiawu Zheng, Xiaoshuai Sun, Rongrong Ji
Contrary to previous works, we study this problem from the perspective of image resolution, and reveal that a combination of low- and high-resolution visual features can effectively mitigate this shortcoming.
Ranked #152 on Visual Question Answering on MM-Vet
1 code implementation • 15 Jan 2024 • Siyu Zou, Jiji Tang, Yiyi Zhou, Jing He, Chaoyi Zhao, Rongsheng Zhang, Zhipeng Hu, Xiaoshuai Sun
In particular, InstDiffEdit aims to employ the cross-modal attention ability of existing diffusion models to achieve instant mask guidance during the diffusion steps.
1 code implementation • CVPR 2024 • Sihan Liu, Yiwei Ma, Xiaoqing Zhang, Haowei Wang, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing, delineating specific regions in aerial images as described by textual queries.
1 code implementation • 30 Nov 2023 • Yiwei Ma, Yijun Fan, Jiayi Ji, Haowei Wang, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji
Nevertheless, a substantial domain gap exists between 2D images and 3D assets, primarily attributed to variations in camera-related attributes and the exclusive presence of foreground objects.
1 code implementation • 1 Nov 2023 • Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang, Xiaoshuai Sun
To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data, e. g., referring points or grounding boxes, for efficient RES training.
1 code implementation • 27 Oct 2023 • Danni Yang, Jiayi Ji, Xiaoshuai Sun, Haowei Wang, Yinan Li, Yiwei Ma, Rongrong Ji
Remarkably, our SS-PNG-NW+ outperforms fully-supervised models with only 30% and 50% supervision data, exceeding their performance by 0. 8% and 1. 1% respectively.
1 code implementation • 17 Oct 2023 • Haowei Wang, Jiayi Ji, Tianyu Guo, Yilong Yang, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji
To address this, we introduce two cascading modules based on the barycenter of the mask, which are Coordinate Guided Aggregation (CGA) and Barycenter Driven Localization (BDL), responsible for segmentation and detection, respectively.
1 code implementation • 14 Oct 2023 • Jiayi Ji, Haowei Wang, Changli Wu, Yiwei Ma, Xiaoshuai Sun, Rongrong Ji
The rising importance of 3D understanding, pivotal in computer vision, autonomous driving, and robotics, is evident.
1 code implementation • NeurIPS 2023 • Qiong Wu, Wei Yu, Yiyi Zhou, Shubin Huang, Xiaoshuai Sun, Rongrong Ji
In this paper, we aim at parameter and computation efficient transfer learning (PCETL) for VLP models.
1 code implementation • 31 Aug 2023 • Changli Wu, Yiwei Ma, Qi Chen, Haowei Wang, Gen Luo, Jiayi Ji, Xiaoshuai Sun
In 3D Referring Expression Segmentation (3D-RES), the earlier approach adopts a two-stage paradigm, extracting segmentation proposals and then matching them with referring expressions.
no code implementations • 11 Aug 2023 • Ke Sun, Shen Chen, Taiping Yao, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji
In this paper, we focus on a novel and challenging problem: Continual Face Forgery Detection (CFFD), which aims to efficiently learn from new forgery attacks without forgetting previous ones.
1 code implementation • 6 Aug 2023 • Haowei Wang, Jiji Tang, Jiayi Ji, Xiaoshuai Sun, Rongsheng Zhang, Yiwei Ma, Minda Zhao, Lincheng Li, Zeng Zhao, Tangjie Lv, Rongrong Ji
Insufficient synergy neglects the idea that a robust 3D representation should align with the joint vision-language space, rather than independently aligning with each modality.
no code implementations • 31 Jul 2023 • Ke Sun, Shen Chen, Taiping Yao, Haozhe Yang, Xiaoshuai Sun, Shouhong Ding, Rongrong Ji
To address this issues, in this paper, we propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
1 code implementation • 30 Jun 2023 • Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Tongliang Liu, Rongrong Ji, DaCheng Tao
Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight.
1 code implementation • 1 Jun 2023 • Shubin Huang, Qiong Wu, Yiyi Zhou, WeiJie Chen, Rongsheng Zhang, Xiaoshuai Sun, Rongrong Ji
In addition, we also experiment DVP with the recently popular adapter approach to keep the most parameters of PLMs intact when adapting to VL tasks, helping PLMs achieve a quick shift between single- and multi-modal tasks.
1 code implementation • NeurIPS 2023 • Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, Rongrong Ji
To validate MMA, we apply it to a recent LLM called LLaMA and term this formed large vision-language instructed model as LaVIN.
1 code implementation • ICCV 2023 • Yiwei Ma, Xiaioqing Zhang, Xiaoshuai Sun, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji
Text-driven 3D stylization is a complex and crucial task in the fields of computer vision (CV) and computer graphics (CG), aimed at transforming a bare mesh to fit a target text.
1 code implementation • CVPR 2022 • Peng Mi, Jianghang Lin, Yiyi Zhou, Yunhang Shen, Gen Luo, Xiaoshuai Sun, Liujuan Cao, Rongrong Fu, Qiang Xu, Rongrong Ji
In this paper, we study teacher-student learning from the perspective of data initialization and propose a novel algorithm called Active Teacher(Source code are available at: \url{https://github. com/HunterJ-Lin/ActiveTeacher}) for semi-supervised object detection (SSOD).
1 code implementation • 22 Feb 2023 • Gen Luo, Yiyi Zhou, Lei Jin, Xiaoshuai Sun, Rongrong Ji
In addition to this challenge, we also reveal two key issues in one-stage SSOD, which are low-quality pseudo-labeling and multi-task optimization conflict, respectively.
1 code implementation • 16 Feb 2023 • Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji
Experimental results show the superior performance and efficiency of RepAdapter than the state-of-the-art PETL methods.
1 code implementation • 13 Feb 2023 • Yiwei Ma, Jiayi Ji, Xiaoshuai Sun, Yiyi Zhou, Rongrong Ji
In this paper, we study the local visual modeling with grid features for image captioning, which is critical for generating accurate and detailed captions.
1 code implementation • 9 Jan 2023 • Haowei Wang, Jiayi Ji, Yiyi Zhou, Yongjian Wu, Xiaoshuai Sun
Extensive experiments on the PNG benchmark dataset demonstrate the effectiveness and efficiency of our method.
no code implementations • CVPR 2023 • Lei Jin, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Annan Shu, Rongrong Ji
Based on RefCLIP, we further propose the first model-agnostic weakly supervised training scheme for existing REC models, where RefCLIP acts as a mature teacher to generate pseudo-labels for teaching common REC models.
no code implementations • CVPR 2023 • Jiamu Sun, Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji
In this paper, we present the first attempt of semi-supervised learning for REC and propose a strong baseline method called RefTeacher.
2 code implementations • 11 Oct 2022 • Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, DaCheng Tao
One of the popular solutions is Sharpness-Aware Minimization (SAM), which smooths the loss landscape via minimizing the maximized change of training loss when adding a perturbation to the weight.
1 code implementation • CVPR 2023 • Jingjia Huang, Yinan Li, Jiashi Feng, Xinglong Wu, Xiaoshuai Sun, Rongrong Ji
We then introduce \textbf{Clover}\textemdash a Correlated Video-Language pre-training method\textemdash towards a universal Video-Language model for solving multiple video understanding tasks with neither performance nor efficiency compromise.
Ranked #1 on Video Question Answering on LSMDC-FiB
2 code implementations • 15 Jul 2022 • Yiwei Ma, Guohai Xu, Xiaoshuai Sun, Ming Yan, Ji Zhang, Rongrong Ji
However, cross-grained contrast, which is the contrast between coarse-grained representations and fine-grained representations, has rarely been explored in prior research.
Ranked #12 on Video Retrieval on MSVD
1 code implementation • 17 Apr 2022 • Gen Luo, Yiyi Zhou, Jiamu Sun, Xiaoshuai Sun, Rongrong Ji
But the most encouraging finding is that with much less training overhead and parameters, SimREC can still achieve better performance than a set of large-scale pre-trained models, e. g., UNITER and VILLA, portraying the special role of REC in existing V&L research.
1 code implementation • 16 Apr 2022 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yan Wang, Liujuan Cao, Yongjian Wu, Feiyue Huang, Rongrong Ji
Despite the exciting performance, Transformer is criticized for its excessive parameters and computation cost.
1 code implementation • 2 Apr 2022 • Jing He, Yiyi Zhou, Qi Zhang, Jun Peng, Yunhang Shen, Xiaoshuai Sun, Chao Chen, Rongrong Ji
Pixel synthesis is a promising research paradigm for image generation, which can well exploit pixel-wise prior knowledge for generation.
1 code implementation • 1 Apr 2022 • Mingrui Wu, Jiaxin Gu, Yunhang Shen, Mingbao Lin, Chao Chen, Xiaoshuai Sun
Extensive experiments on HICO-Det dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs.
Human-Object Interaction Detection Knowledge Distillation +4
3 code implementations • 30 Mar 2022 • Chaoyang Zhu, Yiyi Zhou, Yunhang Shen, Gen Luo, Xingjia Pan, Mingbao Lin, Chao Chen, Liujuan Cao, Xiaoshuai Sun, Rongrong Ji
In this paper, we propose a simple yet universal network termed SeqTR for visual grounding tasks, e. g., phrase localization, referring expression comprehension (REC) and segmentation (RES).
Ranked #11 on Referring Expression Segmentation on RefCOCO testB
no code implementations • 13 Mar 2022 • Chengpeng Dai, Fuhai Chen, Xiaoshuai Sun, Rongrong Ji, Qixiang Ye, Yongjian Wu
Recently, automatic video captioning has attracted increasing attention, where the core challenge lies in capturing the key semantic items, like objects and actions as well as their spatial-temporal correlations from the redundant frames and semantic content.
no code implementations • 12 Mar 2022 • Fuhai Chen, Xuri Ge, Xiaoshuai Sun, Yue Gao, Jianzhuang Liu, Fufeng Chen, Wenjie Li
The key of referring expression comprehension lies in capturing the cross-modal visual-linguistic relevance.
no code implementations • CVPR 2022 • Mingrui Wu, Xuying Zhang, Xiaoshuai Sun, Yiyi Zhou, Chao Chen, Jiaxin Gu, Xing Sun, Rongrong Ji
Current Image captioning (IC) methods predict textual words sequentially based on the input visual information from the visual feature extractor and the partially generated sentence information.
1 code implementation • 17 Oct 2021 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Yongjian Wu, Yue Gao, Rongrong Ji
Based on the LaConv module, we further build the first fully language-driven convolution network, termed as LaConvNet, which can unify the visual recognition and multi-modal reasoning in one forward structure.
1 code implementation • CVPR 2021 • Xuying Zhang, Xiaoshuai Sun, Yunpeng Luo, Jiayi Ji, Yiyi Zhou, Yongjian Wu, Feiyue Huang, Rongrong Ji
Then, we build a BERTbased language model to extract language context and propose Adaptive-Attention (AA) module on top of a transformer decoder to adaptively measure the contribution of visual and language cues before making decisions for word prediction.
1 code implementation • 16 Jan 2021 • Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, Liujuan Cao, Yongjian Wu, Feiyue Huang, Chia-Wen Lin, Rongrong Ji
Descriptive region features extracted by object detection networks have played an important role in the recent advancements of image captioning.
1 code implementation • ICCV 2021 • Yiyi Zhou, Tianhe Ren, Chaoyang Zhu, Xiaoshuai Sun, Jianzhuang Liu, Xinghao Ding, Mingliang Xu, Rongrong Ji
Due to the superior ability of global dependency modeling, Transformer and its variants have become the primary choice of many vision-and-language tasks.
1 code implementation • 13 Dec 2020 • Jiayi Ji, Yunpeng Luo, Xiaoshuai Sun, Fuhai Chen, Gen Luo, Yongjian Wu, Yue Gao, Rongrong Ji
The latter contains a Global Adaptive Controller that can adaptively fuse the global information into the decoder to guide the caption generation.
no code implementations • 1 Dec 2020 • Mingbao Lin, Rongrong Ji, Xiaoshuai Sun, Baochang Zhang, Feiyue Huang, Yonghong Tian, DaCheng Tao
To achieve fast online adaptivity, a class-wise updating method is developed to decompose the binary code learning and alternatively renew the hash functions in a class-wise fashion, which well addresses the burden on large amounts of training batches.
2 code implementations • CVPR 2020 • Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng, Rongrong Ji
In addition, we address a key challenge in this multi-task setup, i. e., the prediction conflict, with two innovative designs namely, Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS).
Generalized Referring Expression Comprehension Referring Expression +2
1 code implementation • 7 Dec 2019 • Yiyi Zhou, Rongrong Ji, Gen Luo, Xiaoshuai Sun, Jinsong Su, Xinghao Ding, Chia-Wen Lin, Qi Tian
Referring Expression Comprehension (REC) is an emerging research spot in computer vision, which refers to detecting the target region in an image given an text description.
no code implementations • NeurIPS 2019 • Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang
To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap) executed in a novel structured encoder-inferer-decoder schema.
no code implementations • 20 Nov 2019 • Sheng Jin, Shangchen Zhou, Yao Liu, Chao Chen, Xiaoshuai Sun, Hongxun Yao, Xian-Sheng Hua
In this paper, we propose a novel Semi-supervised Self-pace Adversarial Hashing method, named SSAH to solve the above problems in a unified framework.
no code implementations • 21 Oct 2019 • Shen Chen, Liujuan Cao, Mingbao Lin, Yan Wang, Xiaoshuai Sun, Chenglin Wu, Jingfei Qiu, Rongrong Ji
Specifically, we utilize an off-the-shelf algorithm to generate a binary Hadamard codebook to satisfy the requirement of bit independence and bit balance, which subsequently serves as the desired outputs of the hash functions learning.
1 code implementation • 18 Oct 2019 • Haozhe Xie, Hongxun Yao, Shangchen Zhou, Shengping Zhang, Xiaoshuai Sun, Wenxiu Sun
Inferring the 3D shape of an object from an RGB image has shown impressive results, however, existing methods rely primarily on recognizing the most similar 3D model from the training set to solve the problem.
no code implementations • 14 Oct 2019 • Ying Zheng, Hongxun Yao, Xiaoshuai Sun, Shengping Zhang, Sicheng Zhao, Fatih Porikli
Conventional methods for this task often rely on the availability of the temporal order of sketch strokes, additional cues acquired from different modalities and supervised augmentation of sketch datasets with real images, which also limit the applicability and feasibility of these methods in real scenarios.
no code implementations • 14 Oct 2019 • Ying Zheng, Hongxun Yao, Xiaoshuai Sun
First, we propose a homogeneous transformation method to address the problem of domain adaptation.
no code implementations • 9 Oct 2019 • Fuhai Chen, Rongrong Ji, Chengpeng Dai, Xiaoshuai Sun, Chia-Wen Lin, Jiayi Ji, Baochang Zhang, Feiyue Huang, Liujuan Cao
Specially, we propose a novel Structured-Spatial Semantic Embedding model for image deblurring (termed S3E-Deblur), which introduces a novel Structured-Spatial Semantic tree model (S3-tree) to bridge two basic tasks in computer vision: image deblurring (ImD) and image captioning (ImC).
no code implementations • 7 Aug 2019 • Chen Shen, Rongrong Ji, Fuhai Chen, Xiaoshuai Sun, Xiangming Li
Specifically, the proposed module first embeds the scene concepts into factored weights explicitly and attends the visual information extracted from the input image.
no code implementations • 6 Aug 2019 • Rongrong Ji, Ke Li, Yan Wang, Xiaoshuai Sun, Feng Guo, Xiaowei Guo, Yongjian Wu, Feiyue Huang, Jiebo Luo
In this paper, we address the problem of monocular depth estimation when only a limited number of training image-depth pairs are available.
1 code implementation • NeurIPS 2019 • Jie Hu, Rongrong Ji, Shengchuan Zhang, Xiaoshuai Sun, Qixiang Ye, Chia-Wen Lin, Qi Tian
Learning representations with diversified information remains as an open problem.
no code implementations • 31 May 2019 • Mingbao Lin, Rongrong Ji, Shen Chen, Feng Zheng, Xiaoshuai Sun, Baochang Zhang, Liujuan Cao, Guodong Guo, Feiyue Huang
In this paper, we propose to model the similarity distributions between the input data and the hashing codes, upon which a novel supervised online hashing method, dubbed as Similarity Distribution based Online Hashing (SDOH), is proposed, to keep the intrinsic semantic relationship in the produced Hamming space.
1 code implementation • 11 May 2019 • Mingbao Lin, Rongrong Ji, Hong Liu, Xiaoshuai Sun, Shen Chen, Qi Tian
We then treat the learning of hash functions as a set of binary classification problems to fit the assigned target code.
5 code implementations • ICCV 2019 • Haozhe Xie, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Shengping Zhang
Then, a context-aware fusion module is introduced to adaptively select high-quality reconstructions for each part (e. g., table legs) from different coarse 3D volumes to obtain a fused 3D volume.
Ranked #4 on 3D Object Reconstruction on Data3D−R2N2
1 code implementation • 29 Jan 2019 • Mingbao Lin, Rongrong Ji, Hong Liu, Xiaoshuai Sun, Yongjian Wu, Yunsheng Wu
In this paper, we propose a novel supervised online hashing method, termed Balanced Similarity for Online Discrete Hashing (BSODH), to solve the above problems in a unified framework.
no code implementations • 9 Nov 2018 • Xiaoshuai Sun
The two pathways characterize both long-term and short-term attention cues and are integrated dynamically using maxima normalization.
no code implementations • 4 Jul 2018 • Sheng Jin, Hongxun Yao, Xiaoshuai Sun, Shangchen Zhou, Lei Zhang, Xian-Sheng Hua
As the core of DSaH, the saliency loss guides the attention network to mine discriminative regions from pairs of images.
no code implementations • CVPR 2018 • Fuhai Chen, Rongrong Ji, Xiaoshuai Sun, Yongjian Wu, Jinsong Su
In offline optimization, we adopt an end-to-end formulation, which jointly trains the visual tree parser, the structured relevance and diversity constraints, as well as the LSTM based captioning model.
no code implementations • 8 May 2018 • Zheng Xu, Xitong Yang, Xue Li, Xiaoshuai Sun
We propose a novel deep neural network architecture for the challenging problem of single image dehazing, which aims to recover the clear image from a degraded hazy image.
no code implementations • CVPR 2013 • Xiaoshuai Sun, Xin-Jing Wang, Hongxun Yao, Lei Zhang
In this paper, we propose a computational model of visual representativeness by integrating cognitive theories of representativeness heuristics with computer vision and machine learning techniques.