no code implementations • Findings (EMNLP) 2021 • Jingwen Xu, Jing Zhang, Xirui Ke, Yuxiao Dong, Hong Chen, Cuiping Li, Yongbin Liu
Its general process is to first encode the implicit relation of an entity pair and then match the relation of a query entity pair with the relations of the reference entity pairs.
no code implementations • SemEval (NAACL) 2022 • Jing Zhang, Yujin Wang
Online misogynous meme detection is an image/text multimodal classification task; the complicated relationship between image and text challenges an intelligent system's ability to learn modality fusion.
no code implementations • ACL 2022 • Yimeng Zhuang, Jing Zhang, Mei Tu
(2) A sparse attention matrix estimation module, which predicts dominant elements of an attention matrix based on the output of the previous hidden state cross module.
no code implementations • Findings (EMNLP) 2021 • Yu Feng, Jing Zhang, Gaole He, Wayne Xin Zhao, Lemao Liu, Quan Liu, Cuiping Li, Hong Chen
Knowledge Base Question Answering (KBQA) aims to answer natural language questions posed over knowledge bases (KBs).
1 code implementation • ACL 2022 • Daniel Zhang-li, Jing Zhang, Jifan Yu, Xiaokang Zhang, Peng Zhang, Jie Tang, Juanzi Li
We investigate the usage of entity linking (EL) in downstream tasks and present the first modularized EL toolkit for easy task adaptation.
no code implementations • 25 Apr 2024 • Shen Zhang, Haojie Zhang, Jing Zhang, Xudong Zhang, Yimeng Zhuang, Jinting Wu
In human-computer interaction, it is crucial for agents to respond to humans by understanding their emotions.
no code implementations • 24 Apr 2024 • Ting Luo, Jing Zhang, Yingwei Qiu, Li Zhang, Yaohua Hu, Zhuliang Yu, Zhen Liang
The proposed MDDD includes four main modules: manifold feature transformation, dynamic distribution alignment, classifier learning, and ensemble learning.
1 code implementation • 22 Apr 2024 • Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai
To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 thousand hours.
no code implementations • 17 Apr 2024 • Xinmei Huang, Haoyang Li, Jing Zhang, Xinxin Zhao, Zhiming Yao, Yiyan Li, Zhuohao Yu, Tieying Zhang, Hong Chen, Cuiping Li
Database knob tuning is a critical challenge in the database community, aiming to optimize knob values to enhance database performance for specific workloads.
2 code implementations • 10 Apr 2024 • Xiaokang Zhang, Zijun Yao, Jing Zhang, Kaifeng Yun, Jifan Yu, Juanzi Li, Jie Tang
Detecting non-factual content is a longstanding goal for increasing the trustworthiness of large language model (LLM) generations.
no code implementations • 8 Apr 2024 • Haimei Zhao, Jing Zhang, Zhuo Chen, Shanshan Zhao, DaCheng Tao
We apply UniMix to two main setups: 1) unsupervised domain adaptation, adapting the model from the clear-weather source domain to the adverse-weather target domain; 2) domain generalization, learning a model that generalizes well to unseen scenes in adverse weather.
1 code implementation • 6 Apr 2024 • Pengxiao Han, Changkun Ye, Jieming Zhou, Jing Zhang, Jie Hong, Xuesong Li
We propose a new approach, the Latent-based Diffusion Model for Long-tailed Recognition (LDMLR), as a feature augmentation method to tackle the issue.
no code implementations • 4 Apr 2024 • Jifan Yu, Xiaohan Zhang, Yifan Xu, Xuanyu Lei, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li
Recently, knowledge-grounded dialogue generation models, which deliberately invoke external knowledge resources to produce more informative responses, have also proven effective in reducing hallucination.
no code implementations • 4 Apr 2024 • Zhongkai Wu, Ziyu Wan, Jing Zhang, Jing Liao, Dong Xu
Instead of reconstructing a blurred NeRF by averaging inconsistencies, we introduce a novel approach using Generative Adversarial Networks (GANs) for NeRF generation to better accommodate the geometric and appearance inconsistencies present in the multi-view images.
1 code implementation • 2 Apr 2024 • Shasha Guo, Lizi Liao, Jing Zhang, Yanling Wang, Cuiping Li, Hong Chen
Knowledge base question generation (KBQG) aims to generate natural language questions from a set of triplet facts extracted from a KB.
no code implementations • 28 Mar 2024 • Kexin Shi, Jing Zhang, Linjiajie Fang, Wenjia Wang, BingYi Jing
In implicit collaborative filtering, hard negative mining techniques are developed to accelerate and enhance the recommendation model learning.
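For readers new to the idea, hard negative mining in implicit collaborative filtering can be illustrated generically: for a user, draw a few candidate items the user has not interacted with and keep the one the current model scores highest, so training focuses on the most confusable negatives. This is a minimal sketch of that generic idea (in the spirit of dynamic negative sampling), not the method of the paper above; the dot-product scoring, the list-based embeddings, and the helper names `dot` and `hard_negative` are all hypothetical.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors (hypothetical scoring function)."""
    return sum(x * y for x, y in zip(a, b))


def hard_negative(user_vec, item_vecs, positives, num_candidates, rng):
    """Sample candidate negatives and return the index of the highest-scoring one.

    user_vec: the user's embedding (list of floats), assumed given.
    item_vecs: list of item embeddings, indexed by item id.
    positives: set of item ids the user has interacted with.
    """
    n_items = len(item_vecs)
    if len(positives) >= n_items:
        raise ValueError("no negative items available for this user")
    candidates = []
    # Rejection-sample items the user has NOT interacted with.
    while len(candidates) < num_candidates:
        j = rng.randrange(n_items)
        if j not in positives:
            candidates.append(j)
    # The "hardest" negative is the candidate the current model ranks highest.
    return max(candidates, key=lambda j: dot(user_vec, item_vecs[j]))


rng = random.Random(0)
items = [[5.0, 0.0], [1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
print(hard_negative([1.0, 0.0], items, {0}, num_candidates=50, rng=rng))
```

With enough candidates the sampler almost surely finds item 2, the highest-scoring item the user has not interacted with; with few candidates it trades hardness for cost.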
1 code implementation • 28 Mar 2024 • Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-li, Jifan Yu, Shu Zhao, Juanzi Li, Jie Tang
We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios.
no code implementations • 28 Mar 2024 • Xiaodong Chen, Yuxuan Hu, Jing Zhang
Based on this phenomenon, we propose LLM-Streamline, which consists of two parts: layer pruning, where we remove a set of consecutive layers with the lowest importance in the model according to the target sparsity; and layer replacement, where we train a lightweight model to substitute the pruned layers, thereby mitigating the performance degradation caused by pruning.
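The layer-pruning step described above can be pictured as a sliding-window search: given one importance score per layer and a target sparsity, remove the contiguous block of layers whose total importance is smallest. The sketch below assumes the per-layer importance scores are already computed (how the paper measures importance is not stated in this snippet), uses a hypothetical function name, and omits the layer-replacement step entirely.

```python
def lowest_importance_window(importance, sparsity):
    """Return (start, end) of the contiguous layer block to prune; end is exclusive.

    importance: one precomputed score per layer (assumed given).
    sparsity: fraction of layers to remove, e.g. 0.25.
    """
    num_layers = len(importance)
    n_prune = round(sparsity * num_layers)
    if not 0 < n_prune <= num_layers:
        raise ValueError("sparsity must remove between 1 and all layers")
    # Running sum over a window of n_prune consecutive layers.
    window = sum(importance[:n_prune])
    best_sum, best_start = window, 0
    for start in range(1, num_layers - n_prune + 1):
        # Slide the window one layer to the right.
        window += importance[start + n_prune - 1] - importance[start - 1]
        if window < best_sum:
            best_sum, best_start = window, start
    return best_start, best_start + n_prune


scores = [0.9, 0.8, 0.1, 0.2, 0.05, 0.7]
start, end = lowest_importance_window(scores, 0.5)
layers = list(range(len(scores)))
kept = layers[:start] + layers[end:]  # layers that survive pruning
print(start, end, kept)
```

The sliding window keeps the search linear in the number of layers. Restricting the search to consecutive layers is what allows a single lightweight module to stand in for the removed block.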
1 code implementation • 27 Mar 2024 • Xiaofeng Cong, Jie Gui, Jing Zhang, JunMing Hou, Hao Shen
There are two distinctions between nighttime and daytime haze.
no code implementations • 23 Mar 2024 • Sihan Ma, Qiong Cao, Jing Zhang, DaCheng Tao
This paper addresses the problem of generating 3D interactive human motion from text.
no code implementations • 21 Mar 2024 • Peipei Song, Jing Zhang, Piotr Koniusz, Nick Barnes
Existing eye fixation prediction methods perform the mapping from input images to the corresponding dense fixation maps generated from raw fixation points.
1 code implementation • 20 Mar 2024 • Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, HaoNan Guo, Bo Du, DaCheng Tao, Liangpei Zhang
However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks.
Ranked #1 on Semantic Segmentation on SpaceNet 1 (using extra training data)
no code implementations • 19 Mar 2024 • Jing Zhang, Irving Fang, Juexiao Zhang, Hao Wu, Akshat Kaushik, Alice Rodriguez, Hanwen Zhao, Zhuo Zheng, Radu Iovita, Chen Feng
Most importantly, the LUWA dataset provides an underexplored opportunity for vision and learning communities and complements existing image classification problems on common objects.
1 code implementation • 18 Mar 2024 • Yanling Wang, Jing Zhang, Lingxi Zhang, Lixin Liu, Yuxiao Dong, Cuiping Li, Hong Chen, Hongzhi Yin
Open-world semi-supervised learning (Open-world SSL) for node classification, that classifies unlabeled nodes into seen classes or multiple novel classes, is a practical but under-explored problem in the graph community.
1 code implementation • 17 Mar 2024 • Jing Zhang, Liang Zheng, Dan Guo, Meng Wang
This paper develops small vision language models for understanding visual art: given an artwork, the models identify its emotion category and explain this prediction in natural language.
no code implementations • 9 Mar 2024 • Daniel Zhang-li, Nianyi Lin, Jifan Yu, Zheyuan Zhang, Zijun Yao, Xiaokang Zhang, Lei Hou, Jing Zhang, Juanzi Li
Recent advancements in pretraining have demonstrated that modern Large Language Models (LLMs) possess the capability to effectively learn arithmetic operations.
1 code implementation • 26 Feb 2024 • Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen
To address the limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task.
no code implementations • 20 Feb 2024 • Chao Xue, Di Liang, Pengfei Wang, Jing Zhang
In the real world, many facts contained in KGs are time-constrained; thus, temporal KGQA has received increasing attention.
1 code implementation • 14 Feb 2024 • Zhexiong Liu, Jing Zhang, Jiaying Lu, Wenjing Ma, Joyce C Ho
Logical reasoning is critically needed in problem-solving and decision-making.
no code implementations • 8 Feb 2024 • Zhikai Li, Xuewen Liu, Jing Zhang, Qingyi Gu
In particular, for the former, we introduce a learnable per-channel dual clipping scheme, which is designed to efficiently identify outliers in the unbalanced activations with fine granularity.
no code implementations • 7 Feb 2024 • Xin Zhao, Shiyu Hu, Yipei Wang, Jing Zhang, Yimin Hu, Rongshuai Liu, Haibin Ling, Yin Li, Renshu Li, Kun Liu, Jiadong Li
These challenges are especially manifested in videos captured by unmanned aerial vehicles (UAV), where the target is usually far away from the camera and often with significant motion relative to the camera.
no code implementations • 4 Feb 2024 • Weizheng Lu, Jiaming Zhang, Jing Zhang, Yueguo Chen
Tables, typically two-dimensional and structured to store large amounts of data, are essential in daily activities like database queries, spreadsheet calculations, and generating reports from web tables.
no code implementations • 1 Feb 2024 • Fanzhe Fu, Junru Chen, Jing Zhang, Carl Yang, Lvbin Ma, Yang Yang
Time-series data presents limitations stemming from data quality issues, bias and vulnerabilities, and generalization problems.
1 code implementation • 1 Feb 2024 • Jitao Sang, Yuhang Wang, Jing Zhang, Yanxu Zhu, Chao Kong, Junhong Ye, Shuyu Wei, Jinlin Xiao
In the first phase, based on human supervision, the quality of weak supervision is enhanced through a combination of scalable oversight and ensemble learning, reducing the capability gap between weak teachers and strong students.
1 code implementation • 31 Jan 2024 • Maoyuan Ye, Jing Zhang, Juhua Liu, Chenyu Liu, BaoCai Yin, Cong Liu, Bo Du, DaCheng Tao
In terms of the AMG mode, Hi-SAM segments text stroke foreground masks initially, then samples foreground points for hierarchical text mask generation and achieves layout analysis in passing.
Ranked #1 on Hierarchical Text Segmentation on HierText
no code implementations • 28 Jan 2024 • Bowen Tang, Long Yan, Jing Zhang, Qian Yu, Lu Sheng, Dong Xu
Firstly, to recover the virtual features of the base data, we model the CLIP features of base class images as samples from a von Mises-Fisher (vMF) distribution based on the pre-trained classifier.
1 code implementation • 13 Jan 2024 • Haibin He, Maoyuan Ye, Jing Zhang, Juhua Liu, DaCheng Tao
In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance.
1 code implementation • 8 Jan 2024 • Yihao Li, Philippe Zhang, Yubo Tan, Jing Zhang, Zhihan Wang, Weili Jiang, Pierre-Henri Conze, Mathieu Lamard, Gwenolé Quellec, Mostafa El Habib Daho
As for Task 3 (prediction of spherical equivalent), we have designed a deep regression model based on the data distribution of the dataset and employed an integration strategy to enhance the model's prediction accuracy.
no code implementations • 2 Jan 2024 • Jing Zhang, Tengfei Zhao, Shiyu Hu, Xin Zhao
Cryo-electron microscopy (cryo-EM) has achieved near-atomic level resolution of biomolecules by reconstructing 2D micrographs.
1 code implementation • 27 Dec 2023 • XiMing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, Qian Yu
However, existing text-to-SVG generation methods lack editability and struggle with visual quality and result diversity.
1 code implementation • 27 Dec 2023 • Xiawei Li, Qingyuan Xu, Jing Zhang, Tianyi Zhang, Qian Yu, Lu Sheng, Dong Xu
The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB) and is further refined by normalizing the classifier weights to alleviate the detrimental effects of the long-tailed distribution without requiring a prior on the category distribution.
1 code implementation • 25 Dec 2023 • Yuxiang Yang, Yingqi Deng, Yufei Xu, Jing Zhang
Animal Pose Estimation and Tracking (APT) is a critical task in detecting and monitoring the keypoints of animals across a series of video frames, which is essential for understanding animal behavior.
2 code implementations • 22 Dec 2023 • Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, ZongYuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang
Specifically, we achieve this by proposing (1) Collaborative Prompts that describe instrument structures via collaborating category-level and part-level texts; (2) Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding that adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenarios.
1 code implementation • 20 Dec 2023 • Zhangbin Li, Dan Guo, Jinxing Zhou, Jing Zhang, Meng Wang
These selected pairs are constrained to have larger similarity values than the mismatched pairs.
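A common way to impose a constraint of this kind is a margin-based hinge loss that penalizes any mismatched pair whose similarity comes within a margin of the selected pair's similarity. The sketch below is a generic illustration only, not the loss defined in the paper; the function name and the default margin of 0.2 are hypothetical.

```python
def margin_ranking_loss(pos_sim, neg_sims, margin=0.2):
    """Average hinge loss; zero once pos_sim beats every negative by >= margin.

    pos_sim: similarity of a selected (matched) pair.
    neg_sims: similarities of mismatched pairs.
    margin: hypothetical minimum gap required between the two.
    """
    if not neg_sims:
        return 0.0
    # Each mismatched pair contributes only if it violates the margin.
    losses = [max(0.0, margin - (pos_sim - s)) for s in neg_sims]
    return sum(losses) / len(losses)


print(margin_ranking_loss(0.9, [0.1, 0.2]))  # every negative is far enough below
print(margin_ranking_loss(0.5, [0.5, 0.0]))  # the first negative violates the margin
```

Because the loss is zero for pairs already separated by the margin, the gradient concentrates on the mismatched pairs that are hardest to tell apart from the selected ones.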
no code implementations • 18 Dec 2023 • Nilakshan Kunananthaseelan, Jing Zhang, Mehrtash Harandi
We introduce a language-grounded visual prompting method to adapt the visual encoder of vision-language models for downstream tasks.
no code implementations • 13 Dec 2023 • Yuanbo Wen, Tao Gao, ZiQi Li, Jing Zhang, Ting Chen
Haze obscures remote sensing images, hindering valuable information extraction.
1 code implementation • 30 Nov 2023 • Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang
We will provide public APIs for evaluating AlignBench with CritiqueLLM to facilitate the evaluation of LLMs' Chinese alignment.
1 code implementation • 29 Nov 2023 • Wenquan Lu, Yufei Xu, Jing Zhang, Chaoyue Wang, DaCheng Tao
Given a generated image that failed due to malformed hands, we utilize ControlNet modules to re-inject such correct hand information.
no code implementations • 27 Nov 2023 • Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang
Although soft prompt tuning is effective in efficiently adapting Vision-Language (V&L) models for downstream tasks, it shows limitations in dealing with distribution shifts.
no code implementations • 26 Nov 2023 • Weijie Jin, Jing Zhang, Chao-Kai Wen, Shi Jin, Xiao Li, Shuangfeng Han
Reconfigurable intelligent surfaces (RIS) can improve signal propagation environments by adjusting the phase of the incident signal.
1 code implementation • 22 Nov 2023 • Zhe Zhang, Gaochang Wu, Jing Zhang, Chunhua Shen, DaCheng Tao, Tianyou Chai
To solve the challenge, we propose a novel DA-STC method for domain adaptive video semantic segmentation, which incorporates a bidirectional multi-level spatio-temporal fusion module and a category-aware spatio-temporal feature alignment module to facilitate consistent learning for domain-invariant features.
1 code implementation • 13 Nov 2023 • Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, Ji Zhang, Jitao Sang
Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.
1 code implementation • 12 Nov 2023 • Zhaoyuan Yang, Zhengyang Yu, Zhiwei Xu, Jaskirat Singh, Jing Zhang, Dylan Campbell, Peter Tu, Richard Hartley
We present a diffusion-based image morphing approach with perceptually-uniform sampling (IMPUS) that produces smooth, direct and realistic interpolations given an image pair.
no code implementations • 7 Nov 2023 • Hao liu, Jinrui Gan, Xiaoxuan Fan, Yi Zhang, Chuanxian Luo, Jing Zhang, Guangxin Jiang, Yucheng Qian, Changwei Zhao, Huan Ma, Zhenyu Guo
In this paper, we first point out that the unification of task objectives and adaptation for task difficulty are critical for bridging the gap between time series masked reconstruction and forecasting.
1 code implementation • ICCV 2023 • Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai
To achieve this, our ECMVAE factorizes the representation of each modality into a modality-shared representation and a modality-specific representation.
no code implementations • 26 Sep 2023 • Guozhong Zheng, Jiqiang Zhang, Jing Zhang, Weiran Cai, Li Chen
In the pairwise scenario, we reveal that high levels of trust and trustworthiness emerge when individuals appreciate both their historical experience and returns in the future.
1 code implementation • ICCV 2023 • Zhexiong Wan, Yuxin Mao, Jing Zhang, Yuchao Dai
Recently, methods that fuse RGB images and point clouds have been proposed to jointly estimate 2D optical flow and 3D scene flow.
no code implementations • 23 Sep 2023 • Shasha Guo, Jing Zhang, Xirui Ke, Cuiping Li, Hong Chen
The above insights make diversifying question generation an intriguing task, where the first challenge is evaluation metrics for diversity.
no code implementations • 19 Sep 2023 • Yuanbo Wen, Tao Gao, ZiQi Li, Jing Zhang, Ting Chen
This module leverages dimension-wise queries that are independent of the input features and employs global context-aware attention (GCA) to capture essential features while avoiding the entanglement of redundant or irrelevant information.
no code implementations • 18 Sep 2023 • Xingyu Yang, Daqing Liu, Heng Zhang, Yong Luo, Chaoyue Wang, Jing Zhang
Composed image retrieval is a type of image retrieval task in which the user provides a reference image as a starting point, together with a text specifying how to shift from that starting point to the desired target image.
1 code implementation • 6 Sep 2023 • Jinglong Wang, Xiawei Li, Jing Zhang, Qingyuan Xu, Qin Zhou, Qian Yu, Lu Sheng, Dong Xu
Pre-trained text-image discriminative models such as CLIP have been explored for open-vocabulary semantic segmentation, with unsatisfactory results due to the loss of crucial localization information and awareness of object shapes.
1 code implementation • 5 Sep 2023 • Yuxiang Yang, Yingqi Deng, Jing Zhang, Jiahao Nie, Zheng-Jun Zha
The spatial information indicating objects' spatial adjacency across consecutive frames is crucial for effective object tracking.
no code implementations • 31 Aug 2023 • Yuxuan Hu, Jing Zhang, Zhe Zhao, Chen Zhao, Xiaodong Chen, Cuiping Li, Hong Chen
Structured pruning is a widely used technique for reducing the size of pre-trained language models (PLMs), but current methods often overlook the potential of compressing the hidden dimension (d) in PLMs, a dimension critical to model size and efficiency.
no code implementations • 24 Aug 2023 • Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Yonggang Wen
In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples.
1 code implementation • 17 Aug 2023 • Wenxi Yue, Jing Zhang, Kun Hu, Yong Xia, Jiebo Luo, Zhiyong Wang
However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline.
no code implementations • 12 Aug 2023 • Xingyu Zhou, Le Liang, Jing Zhang, Chao-Kai Wen, Shi Jin
However, optimal MIMO detection is associated with a complexity that grows exponentially with the MIMO dimensions and quickly becomes impractical.
1 code implementation • 7 Aug 2023 • Yinjie Zhao, Lichen Zhao, Qian Yu, Jing Zhang, Lu Sheng, Dong Xu
The first is a Distortion Mapping Module, which guides the model to pre-adapt to distorted features globally.
1 code implementation • 31 Jul 2023 • Mengqi He, Jing Zhang, Zhaoyuan Yang, Mingyi He, Nick Barnes, Yuchao Dai
We analyze the performance of semantic segmentation models wrt.
no code implementations • 31 Jul 2023 • Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai
We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio.
1 code implementation • ICCV 2023 • Ruikai Cui, Shi Qiu, Saeed Anwar, Jiawei Liu, Chaoyue Xing, Jing Zhang, Nick Barnes
Point cloud completion aims to recover the complete shape based on a partial observation.
1 code implementation • ICCV 2023 • Mingjin Zhang, Chi Zhang, Qiming Zhang, Jie Guo, Xinbo Gao, Jing Zhang
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
1 code implementation • ICCV 2023 • Jiawei Liu, Changkun Ye, Shan Wang, Ruikai Cui, Jing Zhang, Kaihao Zhang, Nick Barnes
To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP) which learns a unique label perturbation level for each training image.
1 code implementation • 21 Jul 2023 • Jie Qi, Jing Zhang, Miroslav Krstic
The recently introduced DeepONet operator-learning framework for PDE control is extended from the results for basic hyperbolic and parabolic PDEs to an advanced hyperbolic class that involves delays on both the state and the system output or input.
no code implementations • 19 Jul 2023 • Mochu Xiang, Jing Zhang, Nick Barnes, Yuchao Dai
Effectively measuring and modeling the reliability of a trained model is essential to the real-world deployment of monocular depth estimation (MDE) models.
no code implementations • 10 Jul 2023 • Aixuan Li, Jing Zhang, Yunqiu Lv, Tong Zhang, Yiran Zhong, Mingyi He, Yuchao Dai
In this case, salient objects are typically non-camouflaged, and camouflaged objects are usually not salient.
1 code implementation • 7 Jul 2023 • Yunqiu Lv, Jing Zhang, Nick Barnes, Yuchao Dai
Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the task of bounding-box-level localization and pixel-level segmentation.
no code implementations • 6 Jul 2023 • Peter Tu, Zhaoyuan Yang, Richard Hartley, Zhiwei Xu, Jing Zhang, Yiwei Fu, Dylan Campbell, Jaskirat Singh, Tianyu Wang
This paper begins with a description of methods for estimating image probability density functions that reflect the observation that such data is usually constrained to lie in restricted regions of the high-dimensional image space: not every pattern of pixels is an image.
1 code implementation • 5 Jul 2023 • Nicholas Heller, Fabian Isensee, Dasha Trofimova, Resha Tejpaul, Zhongchen Zhao, Huai Chen, Lisheng Wang, Alex Golts, Daniel Khapun, Daniel Shats, Yoel Shoshan, Flora Gilboa-Solomon, Yasmeen George, Xi Yang, Jianpeng Zhang, Jing Zhang, Yong Xia, Mengran Wu, Zhiyang Liu, Ed Walczak, Sean McSweeney, Ranveer Vasdev, Chris Hornung, Rafat Solaiman, Jamee Schoephoerster, Bailey Abernathy, David Wu, Safa Abdulkadir, Ben Byun, Justice Spriggs, Griffin Struyk, Alexandra Austin, Ben Simpson, Michael Hagstrom, Sierra Virnig, John French, Nitin Venkatesh, Sarah Chan, Keenan Moore, Anna Jacobsen, Susan Austin, Mark Austin, Subodh Regmi, Nikolaos Papanikolopoulos, Christopher Weight
Overall, KiTS21 facilitated a significant advancement in the state of the art in kidney tumor segmentation and provides useful insights applicable to the field of semantic segmentation as a whole.
1 code implementation • 4 Jul 2023 • Dingjun Wu, Jing Zhang, Xinmei Huang
The knowledge-augmented deep learning paradigm refers to a paradigm in which domain knowledge is identified and integrated into deep models.
1 code implementation • 3 Jul 2023 • Yonglin Li, Jing Zhang, Xiao Teng, Long Lan
However, it lacks proficiency in referring video object segmentation (RVOS) due to the need for precise user-interactive prompts and a limited understanding of different modalities, such as language and vision.
1 code implementation • 29 Jun 2023 • Sihan Ma, Qiong Cao, Hongwei Yi, Jing Zhang, DaCheng Tao
Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane.
no code implementations • 26 Jun 2023 • Lingxi Zhang, Jing Zhang, Yanling Wang, Shulin Cao, Xinmei Huang, Cuiping Li, Hong Chen, Juanzi Li
The generalization problem on KBQA has drawn considerable attention.
1 code implementation • NeurIPS 2023 • XiMing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, Dong Xu
Although they are trained mainly on images, we discover that pretrained diffusion models show impressive power in guiding sketch synthesis.
1 code implementation • 19 Jun 2023 • Ting Zhe, YongQian Li, Jing Zhang, Yong Luo, Han Hu, Bo Du, Yonggang Wen, DaCheng Tao
We represent the action information in each hand interaction region as a triplet, resulting in a total of 878 action triplets.
1 code implementation • 13 Jun 2023 • Ge-Peng Ji, Jing Zhang, Dylan Campbell, Huan Xiong, Nick Barnes
Unlike existing fully-supervised approaches, we rethink colorectal polyp segmentation from an out-of-distribution perspective with a simple but effective self-supervised learning approach.
1 code implementation • 6 Jun 2023 • Fusheng Hao, Fengxiang He, Yikai Wang, Fuxiang Wu, Jing Zhang, Jun Cheng, DaCheng Tao
Massive human-related data is collected to train neural networks for computer vision tasks.
1 code implementation • 6 Jun 2023 • Aixuan Li, Yuxin Mao, Jing Zhang, Yuchao Dai
In particular, following the principle of disentangled representation learning, we introduce a mutual information upper bound with a mutual information minimization regularizer to encourage the disentangled representation of each modality for salient object detection.
1 code implementation • 5 Jun 2023 • Xinlei Niu, Christian Walder, Jing Zhang, Charles Patrick Martin
We propose a unified approach to obtain structured sparse optimal paths in the latent space of a variational autoencoder (VAE) using dynamic programming and Gumbel propagation.
1 code implementation • 31 May 2023 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao
In this paper, we present DeepSolo++, a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously.
Ranked #1 on Text Spotting on Inverse-Text
no code implementations • 20 May 2023 • Ming Xu, Jing Zhang
In this framework, we first develop an embedding module that contains a sampling algorithm (MGWalk) and an encoder network to learn latent representation for each road segment.
1 code implementation • 4 May 2023 • Chenzhan Shang, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Jing Zhang
In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.
2 code implementations • NeurIPS 2023 • Di Wang, Jing Zhang, Bo Du, Minqiang Xu, Lin Liu, DaCheng Tao, Liangpei Zhang
In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS.
no code implementations • 3 May 2023 • Tao Chen, Liang Lv, Di Wang, Jing Zhang, Yue Yang, Zeyang Zhao, Chen Wang, Xiaowei Guo, Hao Chen, Qingye Wang, Yufei Xu, Qiming Zhang, Bo Du, Liangpei Zhang, DaCheng Tao
With the world population rapidly increasing, transforming our agrifood systems to be more productive, efficient, safe, and sustainable is crucial to mitigate potential food shortages.
1 code implementation • 2 May 2023 • Haibin He, Jing Zhang, Mengyang Xu, Juhua Liu, Bo Du, DaCheng Tao
Video text spotting refers to localizing, recognizing, and tracking textual elements such as captions, logos, license plates, signs, and other forms of text within consecutive video frames.
2 code implementations • 23 Apr 2023 • Jiahao Nie, Zhiwei He, Yuxiang Yang, Zhengyi Bao, Mingyu Gao, Jing Zhang
By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment.
1 code implementation • 19 Apr 2023 • Kunping Huang, Sen Zhang, Jing Zhang, DaCheng Tao
This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.
1 code implementation • 19 Apr 2023 • Liang Zhang, Anwen Hu, Jing Zhang, Shuo Hu, Qin Jin
Taking into account the length of product manuals and the fact that a question is always related to a small number of pages, MPMQA can be naturally split into two subtasks: retrieving most related pages and then generating multimodal answers.
2 code implementations • 19 Apr 2023 • Di Wang, Jing Zhang, Bo Du, Liangpei Zhang, DaCheng Tao
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.
no code implementations • 16 Apr 2023 • Peilin Chen, Hong Wen, Jing Zhang, Fuyu Lv, Zhao Li, Qijie Shen, Wanjie Tao, Ying Zhou, Chao Zhang
Online travel platforms (OTPs), e.g., Ctrip.com or Fliggy.com, can effectively provide travel-related products or services to users.
no code implementations • 14 Apr 2023 • Jinlong Fan, Jing Zhang, DaCheng Tao
Experiments on multiple human avatars demonstrate that our UVA achieves competitive results in novel view synthesis and novel pose rendering while enabling local and independent editing of geometry and appearance.
1 code implementation • 10 Apr 2023 • Jizhizi Li, Jing Zhang, DaCheng Tao
Image matting refers to extracting a precise alpha matte from natural images, and it plays a critical role in various downstream applications such as image editing.
no code implementations • 4 Apr 2023 • Qijie Shen, Hong Wen, Jing Zhang, Qi Rao
Specifically, SIE extracts the user's short-term interests by integrating three fundamental interest encoders, namely query-dependent, target-dependent, and causal-dependent interest encoders, and delivers the resulting representation to the LIE module, which captures the user's long-term interests through an attention mechanism over the short-term interests from SIE.
1 code implementation • 1 Apr 2023 • Jiahao Nie, Zhiwei He, Yuxiang Yang, Xudong Lv, Mingyu Gao, Jing Zhang
Incorporating this transformer-based voting scheme into 3D RPN, a novel Siamese method dubbed GLT-T is developed for 3D single object tracking on point clouds.
2 code implementations • 29 Mar 2023 • Haimei Zhao, Qiming Zhang, Shanshan Zhao, Zhe Chen, Jing Zhang, DaCheng Tao
Multi-view camera-based 3D object detection has become popular due to its low cost, but accurately inferring 3D geometry solely from camera data remains challenging and may lead to inferior performance.
1 code implementation • 27 Mar 2023 • Qiming Zhang, Jing Zhang, Yufei Xu, DaCheng Tao
Window-based attention has become a popular choice in vision transformers due to its superior performance, lower computational complexity, and smaller memory footprint.
no code implementations • ICCV 2023 • Yiqian Wu, Jing Zhang, Hongbo Fu, Xiaogang Jin
To better validate our pose-conditional 3D-aware generators, we develop a new FID measure to evaluate the 3D-level performance.
no code implementations • 21 Mar 2023 • Jing Zhang, Chuanwen Li, Jianzhong Qi, Jiayuan He
We first introduce various types of class imbalance in federated learning, after which we review existing methods for estimating the extent of class imbalance without needing to know the actual data, thereby preserving data privacy.
1 code implementation • 19 Mar 2023 • Kang Liao, Lang Nie, Shujuan Huang, Chunyu Lin, Jing Zhang, Yao Zhao, Moncef Gabbouj, DaCheng Tao
In this paper, we provide a comprehensive survey of learning-based camera calibration techniques, by analyzing their strengths and limitations.
1 code implementation • ICCV 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang
Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks; it tunes only a small number of parameters while freezing the vast majority, easing the storage burden and optimization difficulty.
1 code implementation • 2 Mar 2023 • Siyuan Yan, Jing Zhang, Nick Barnes
To effectively model the two types of uncertainty, we introduce a Bayesian generative model to simultaneously estimate the posterior distribution of model parameters and its predictions.
1 code implementation • 2 Mar 2023 • Qi Zheng, Daqing Liu, Chaoyue Wang, Jing Zhang, Dadong Wang, DaCheng Tao
Vision-and-language navigation (VLN) simulates a visual agent that follows natural-language navigation instructions in real-world scenes.
no code implementations • 1 Mar 2023 • Chao Xue, Wei Liu, Shuai Xie, Zhenfang Wang, Jiaxing Li, Xuyang Peng, Liang Ding, Shanshan Zhao, Qiong Cao, Yibo Yang, Fengxiang He, Bohua Cai, Rongcheng Bian, Yiyan Zhao, Heliang Zheng, Xiangyang Liu, Dongkai Liu, Daqing Liu, Li Shen, Chang Li, Shijin Zhang, Yukang Zhang, Guanpu Chen, Shixiang Chen, Yibing Zhan, Jing Zhang, Chaoyue Wang, DaCheng Tao
Automated machine learning (AutoML) seeks to build ML models with minimal human effort.
1 code implementation • 28 Feb 2023 • Jing Zhang, Xiaokang Zhang, Daniel Zhang-li, Jifan Yu, Zijun Yao, Zeyao Ma, Yiqi Xu, Haohua Wang, Xiaohan Zhang, Nianyi Lin, Sunrui Lu, Juanzi Li, Jie Tang
We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese, using a search engine to access Internet knowledge.
no code implementations • 24 Feb 2023 • Chao Xue, Di Liang, Sirui Wang, Wei Wu, Jing Zhang
To alleviate this problem, we propose a novel Dual Path Modeling Framework to enhance the model's ability to perceive subtle differences in sentence pairs by separately modeling affinity and difference semantics.
1 code implementation • 23 Feb 2023 • Bo Chen, Jing Zhang, Fanjin Zhang, Tianyi Han, Yuqing Cheng, Xiaoyan Li, Yuxiao Dong, Jie Tang
The toolkit is at https://github.com/THUDM/WhoIsWho.
1 code implementation • 12 Feb 2023 • Haoyang Li, Jing Zhang, Cuiping Li, Hong Chen
Due to the structural property of SQL queries, the seq2seq model takes responsibility for parsing both the schema items (i.e., tables and columns) and the skeleton (i.e., SQL keywords).
Ranked #1 on Semantic Parsing on spider
1 code implementation • 10 Feb 2023 • Jie Zhou, Qian Yu, Chuan Luo, Jing Zhang
In recent years, thanks to the rapid development of deep learning (DL), DL-based multi-task learning (MTL) has made significant progress, and it has been successfully applied to recommendation systems (RS).
no code implementations • 7 Feb 2023 • Jinlong Fan, Jing Zhang, Zhi Hou, DaCheng Tao
In this paper, we propose AniPixel, a novel animatable and generalizable human avatar reconstruction method that leverages pixel-aligned features for body geometry prediction and RGB color blending.
1 code implementation • 30 Jan 2023 • Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong
To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
1 code implementation • NeurIPS 2023 • Jing Zhang, Chi Zhang, Wenjia Wang, Bing-Yi Jing
Due to the inability to interact with the environment, offline reinforcement learning (RL) methods face the challenge of estimating the Out-of-Distribution (OOD) points.
1 code implementation • 13 Jan 2023 • Jie Gui, Tuo Chen, Jing Zhang, Qiong Cao, Zhenan Sun, Hao Luo, DaCheng Tao
Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance.
no code implementations • ICCV 2023 • Haotian Wang, Haoang Chi, Wenjing Yang, Zhipeng Lin, Mingyang Geng, Long Lan, Jing Zhang, DaCheng Tao
As a complement to SDPA, we also propose Target-Combined Deployment Authorization (TPDA), where unauthorized domains are partially accessible, and simplify the DSO method to a perturbation operation on the pseudo predictions, referred to as Target-Dependent Domain-Specified Optimization (TDSO).
1 code implementation • CVPR 2023 • Wenju Sun, Qingyong Li, Jing Zhang, Wen Wang, Yangli-ao Geng
BMKP decouples the functions of learning and knowledge remembering via a bilevel-memory design: a working memory responsible for adaptive model learning, to ensure plasticity, and a long-term memory in charge of enduringly storing the knowledge incorporated within the learned model, to guarantee stability.
1 code implementation • CVPR 2023 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao
Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions.
no code implementations • CVPR 2023 • Xinyu Tian, Jing Zhang, Mochu Xiang, Yuchao Dai
Most of the existing salient object detection (SOD) models focus on improving the overall model performance, without explicitly explaining the discrepancy between the training and testing distributions.
1 code implementation • 15 Dec 2022 • Jianzhi Long, Jicang Cai, Abdullah Al-Battal, Shiwei Jin, Jing Zhang, DaCheng Tao, Truong Nguyen
Ultrasound is progressing toward becoming an affordable and versatile solution to medical imaging.
1 code implementation • 10 Dec 2022 • Lei Ding, Jing Zhang, Kai Zhang, Haitao Guo, Bing Liu, Lorenzo Bruzzone
Semantic Change Detection (SCD) refers to the task of simultaneously extracting the changed areas and the semantic categories (before and after the changes) in Remote Sensing Images (RSIs).
Ranked #1 on Change Detection on SECOND
1 code implementation • 7 Dec 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao
In this paper, we show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model dubbed ViTPose.
Ranked #1 on Animal Pose Estimation on AP-10K (using extra training data)
1 code implementation • 5 Dec 2022 • Meng Lan, Jing Zhang, Lefei Zhang, DaCheng Tao
Recently, the joint learning framework (JOINT) integrates matching based transductive reasoning and online inductive learning to achieve accurate and robust semi-supervised video object segmentation (SVOS).
no code implementations • 24 Nov 2022 • Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan, Adem Atmaca, Timon Höfer, Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao, Lars Sommer, Raphael Spraul, Hangyue Zhao, Hongpu Zhang, Yanyun Zhao, Jan Lukas Augustin, Eui-ik Jeon, Impyeong Lee, Luca Zedda, Andrea Loddo, Cecilia Di Ruberto, Sagar Verma, Siddharth Gupta, Shishir Muralidhara, Niharika Hegde, Daitao Xing, Nikolaos Evangeliou, Anthony Tzes, Vojtěch Bartl, Jakub Špaňhel, Adam Herout, Neelanjan Bhowmik, Toby P. Breckon, Shivanand Kundargi, Tejas Anvekar, Chaitra Desai, Ramesh Ashok Tabib, Uma Mudengudi, Arpita Vats, Yang song, Delong Liu, Yonglin Li, Shuman Li, Chenhao Tan, Long Lan, Vladimir Somers, Christophe De Vleeschouwer, Alexandre Alahi, Hsiang-Wei Huang, Cheng-Yen Yang, Jenq-Neng Hwang, Pyong-Kun Kim, Kwangju Kim, Kyoungoh Lee, Shuai Jiang, Haiwen Li, Zheng Ziqiang, Tuan-Anh Vu, Hai Nguyen-Truong, Sai-Kit Yeung, Zhuang Jia, Sophia Yang, Chih-Chung Hsu, Xiu-Yu Hou, Yu-An Jhang, Simon Yang, Mau-Tsuen Yang
The 1st Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection.
2 code implementations • 20 Nov 2022 • Jiahao Nie, Zhiwei He, Yuxiang Yang, Mingyu Gao, Jing Zhang
Technically, a global-local transformer (GLT) module is employed to integrate object- and patch-aware prior into seed point features to effectively form strong feature representation for geometric positions of the seed points, thus providing more robust and accurate cues for offset learning.
1 code implementation • CVPR 2023 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Tongliang Liu, Bo Du, DaCheng Tao
In this paper, we present DeepSolo, a simple DETR-like baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously.
Ranked #1 on Text Spotting on Total-Text (using extra training data)
1 code implementation • 13 Nov 2022 • Ruikai Cui, Shi Qiu, Saeed Anwar, Jing Zhang, Nick Barnes
Unsupervised point cloud completion aims to infer the whole geometry of a partial object observation without requiring partial-complete correspondence.
1 code implementation • 10 Nov 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger
We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.
Ranked #1 on Optical Flow Estimation on Sintel-clean
no code implementations • 3 Nov 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao
Self-supervised pre-training vision transformer (ViT) via masked image modeling (MIM) has been proven very effective.
1 code implementation • 27 Oct 2022 • Qizhou Wang, Feng Liu, Yonggang Zhang, Jing Zhang, Chen Gong, Tongliang Liu, Bo Han
Out-of-distribution (OOD) detection aims to identify OOD data based on representations extracted from well-trained deep models.
Ranked #20 on Out-of-Distribution Detection on ImageNet-1k vs Places
no code implementations • 26 Oct 2022 • Zhaoyuan Yang, Zhiwei Xu, Jing Zhang, Richard Hartley, Peter Tu
In this work, we formulate a novel framework for adversarial robustness using the manifold hypothesis.
no code implementations • 17 Oct 2022 • Jing Zhang, Zhao Li, Jiqiang Zhang, Lin Ma, Guozhong Zheng, Li Chen
Here we show that oscillatory behaviors naturally emerge if incomplete information is incorporated into the cooperation evolution of a non-Markov model.
no code implementations • Future Generation Computer Systems 2022 • Jing Zhang, Qihan Huang, Yirui Huang, Qian Ding, Pei-Wei Tsai
Open Data Processing Services (ODPS) offers vast storage capacity and high efficiency, and it collects and stores large volumes of data.
1 code implementation • 24 Sep 2022 • Lichen Zhao, Daigang Cai, Jing Zhang, Lu Sheng, Dong Xu, Rui Zheng, Yinjie Zhao, Lipeng Wang, Xibo Fan
We also propose a new 3D VQA framework to effectively predict the completely visually grounded and explainable answer.
1 code implementation • 19 Sep 2022 • Haimei Zhao, Jing Zhang, Zhuo Chen, Bo Yuan, DaCheng Tao
Compared with the photometric consistency loss as well as the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features as well as the high tolerance of voxel density to the aforementioned challenges.
1 code implementation • 31 Aug 2022 • ZiMing Wang, Xiaoliang Huo, Zhenghao Chen, Jing Zhang, Lu Sheng, Dong Xu
In addition to previous methods that seek correspondences by hand-crafted or learnt geometric features, recent point cloud registration methods have tried to apply RGB-D data to achieve more accurate correspondence.
2 code implementations • 28 Aug 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao
Due to the diversity of interactive affordance, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels.
no code implementations • 23 Aug 2022 • Zhou Yang, Jing Zhang, Chao Zhou
A robust control problem is considered in this paper, where the controlled stochastic differential equations (SDEs) include ambiguity parameters and their coefficients satisfy non-Lipschitz continuous and non-linear growth conditions, and the objective function is expressed as a backward stochastic differential equation (BSDE) with the generator depending on the value function.
no code implementations • 20 Aug 2022 • Jiawei Liu, Jing Zhang, Ruikai Cui, Kaihao Zhang, Weihao Li, Nick Barnes
We propose a new setting that relaxes an assumption in the conventional Co-Salient Object Detection (CoSOD) setting by allowing the presence of "noisy images" which do not show the shared co-salient object.
no code implementations • 15 Aug 2022 • Jing Zhang, Athanasios Tsiligkaridis, Hiroshi Taguchi, Arvind Raghunathan, Daniel Nikovski
We propose a Predictive Group Elevator Scheduler that uses predictive information on passenger arrivals from a Transformer-based destination predictor and a linear regression model that predicts remaining time to destinations.
2 code implementations • 8 Aug 2022 • Di Wang, Qiming Zhang, Yufei Xu, Jing Zhang, Bo Du, DaCheng Tao, Liangpei Zhang
Large-scale vision foundation models have made significant progress in visual tasks on natural images, with vision transformers being the primary choice due to their good scalability and representation ability.
Ranked #1 on Aerial Scene Classification on AID (50% as trainset)
no code implementations • 28 Jul 2022 • Hai Yang, Yuhang Sheng, Yi Jiang, Xiaoyang Fang, Dongdong Li, Jing Zhang, Zhe Wang
In addition, Subtype-Former also achieved outstanding results in pan-cancer subtyping, which can help analyze the commonalities and differences across various cancer types at the molecular level.
no code implementations • 20 Jul 2022 • Yaqian Liang, Shanshan Zhao, Baosheng Yu, Jing Zhang, Fazhi He
We first randomly mask some patches of the mesh and feed the corrupted mesh into Mesh Transformers.
1 code implementation • 18 Jul 2022 • Ziqiang Li, Chaoyue Wang, Heliang Zheng, Jing Zhang, Bin Li
Since data augmentation strategies have largely alleviated the training instability, how to further improve the generative performance of DE-GANs has become a research hotspot.
1 code implementation • 16 Jul 2022 • Haimei Zhao, Jing Zhang, Sen Zhang, DaCheng Tao
A naive way is to accomplish them independently in a sequential or parallel manner, but this has several drawbacks: 1) the depth and VO results suffer from the inherent scale ambiguity issue; 2) the BEV layout is directly predicted from the front-view image without using any depth-related information, although the depth map contains useful geometry clues for inferring scene layouts.
1 code implementation • 14 Jul 2022 • Dingfeng Shi, Yujie Zhong, Qiong Cao, Jing Zhang, Lin Ma, Jia Li, DaCheng Tao
Moreover, we propose two losses to facilitate and stabilize the training of action classification.
Ranked #15 on Temporal Action Localization on THUMOS’14
no code implementations • 14 Jul 2022 • Zhe Chen, Jing Zhang, Yufei Xu, DaCheng Tao
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF) which aims to mitigate the gap between features from different levels and form a comprehensive object representation to achieve better detection performance.
1 code implementation • 11 Jul 2022 • Sen Zhang, Jing Zhang, DaCheng Tao
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
1 code implementation • 11 Jul 2022 • Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong
To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
1 code implementation • 10 Jul 2022 • Maoyuan Ye, Jing Zhang, Shanshan Zhao, Juhua Liu, Bo Du, DaCheng Tao
However, these methods, built upon the detection transformer framework, might achieve sub-optimal training efficiency and performance due to coarse positional query modeling. In addition, the point label form exploited in previous works implies the human reading order, which, in our observation, impedes detection robustness.
Ranked #3 on Scene Text Detection on SCUT-CTW1500
no code implementations • 7 Jul 2022 • Yiping Yuan, Jing Zhang, Shaunak Chatterjee, Shipeng Yu, Romer Rosales
In particular, we provide an online use case on notification delivery time optimization to show how we make better decisions, drive more user engagement, and provide more value to users.
no code implementations • 6 Jul 2022 • Jiazhen Lou, Hong Wen, Fuyu Lv, Jing Zhang, Tengfei Yuan, Zhao Li
Recommender Systems (RS), as efficient tools for discovering items of interest to users from very large corpora, have attracted more and more attention from academia and industry.
1 code implementation • CVPR 2023 • Xincheng Yao, Ruoqi Li, Jing Zhang, Jun Sun, Chongyang Zhang
In this way, our model can form a more explicit and discriminative decision boundary to distinguish known and also unseen anomalies from normal samples more effectively.
Ranked #3 on Supervised Anomaly Detection on MVTec AD (using extra training data)
1 code implementation • CVPR 2023 • Xu Zhang, Wen Wang, Zhe Chen, Yufei Xu, Jing Zhang, DaCheng Tao
Motivated by the progress of visual-language research, we propose that pre-trained language models (e.g., CLIP) can facilitate animal pose estimation by providing rich prior knowledge for describing animal keypoints in text.
no code implementations • 19 Jun 2022 • Jing Zhang
Big data have the characteristics of enormous volume, high velocity, diversity, value-sparsity, and uncertainty, which make learning knowledge from them full of challenges.
4 code implementations • 12 Jun 2022 • Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, DaCheng Tao
Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking.
1 code implementation • 11 Jun 2022 • Wei Li, Qiming Zhang, Jing Zhang, Zhen Huang, Xinmei Tian, DaCheng Tao
To address these issues, we establish a new high-quality dataset named RealRain-1k, consisting of 1,120 high-resolution paired clean and rainy images with low- and high-density rain streaks, respectively.
1 code implementation • CVPR 2023 • Jizhizi Li, Jing Zhang, DaCheng Tao
Different from conventional image matting, which either requires user-defined scribbles/trimap to extract a specific foreground object or directly extracts all the foreground objects in the image indiscriminately, we introduce a new task named Referring Image Matting (RIM) in this paper, which aims to extract the meticulous alpha matte of the specific object that best matches the given natural language description, thus enabling a more natural and simpler instruction for image matting.
Ranked #1 on Referring Image Matting (RefMatte-RW100) on RefMatte
1 code implementation • 23 May 2022 • Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Nick Barnes, Deng-Ping Fan
With the above understanding about camouflaged objects, we present the first triple-task learning framework to simultaneously localize, segment, and rank camouflaged objects, indicating the conspicuousness level of camouflage.
no code implementations • 11 May 2022 • Mengqi He, Jing Zhang, Wenxin Yu
However, as a large amount of background is excluded, the foreground bounding box region contains a less complex background, making it possible to perform handcrafted features-based saliency detection with only the cropped foreground region.
1 code implementation • 7 May 2022 • Yuanbo Wen, Tao Gao, Jing Zhang, Kaihao Zhang, Ting Chen
This approach comprises two key modules, a rain streaks removal network (R$^2$Net) focusing on accurate rain removal, and a details reconstruction network (DRNet) designed to recover the textural details of rain-free images.
1 code implementation • CVPR 2022 • Xin Lin, Changxing Ding, Jing Zhang, Yibing Zhan, DaCheng Tao
Scene graph generation (SGG) aims to detect objects and predict the relationships between each pair of objects.
no code implementations • CVPR 2022 • Xianing Chen, Qiong Cao, Yujie Zhong, Jing Zhang, Shenghua Gao, DaCheng Tao
Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.
5 code implementations • 26 Apr 2022 • Yufei Xu, Jing Zhang, Qiming Zhang, DaCheng Tao
In this paper, we show the surprisingly good capabilities of plain vision transformers for pose estimation from various aspects, namely simplicity in model structure, scalability in model size, flexibility in training paradigm, and transferability of knowledge between models, through a simple baseline model called ViTPose.
Ranked #1 on Pose Estimation on COCO test-dev
no code implementations • 21 Apr 2022 • Long Lan, Xiao Teng, Jing Zhang, Xiang Zhang, DaCheng Tao
To purify the label noise, we propose to take advantage of the knowledge of a teacher model in an offline scheme.
1 code implementation • 19 Apr 2022 • Jing Zhang, Jianwen Xie, Nick Barnes, Ping Li
We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution.
no code implementations • 18 Apr 2022 • Ziqiang Li, Beihao Xia, Jing Zhang, Chaoyue Wang, Bin Li
Generative Adversarial Networks (GANs) have achieved remarkable success in image synthesis.
2 code implementations • 18 Apr 2022 • Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao
Attention within windows has been widely explored in vision transformers to balance performance, computational complexity, and memory footprint.
1 code implementation • 6 Apr 2022 • Sanqing Qu, Guang Chen, Jing Zhang, Zhijun Li, Wei He, DaCheng Tao
Source-free Domain Adaptation (SFDA) aims to adapt a pre-trained source model to the unlabeled target domain without accessing the well-labeled source data, which is a much more practical setting due to the data privacy, security, and transmission issues.
2 code implementations • 6 Apr 2022 • Di Wang, Jing Zhang, Bo Du, Gui-Song Xia, DaCheng Tao
To this end, we train different networks from scratch with the help of MillionAID, the largest RS scene recognition dataset to date, to obtain a series of RS pretrained backbones, including both convolutional neural networks (CNN) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks.
Ranked #1 on Aerial Scene Classification on UCM (80% as trainset)
2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang
In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.
Ranked #21 on Semantic Segmentation on ADE20K
1 code implementation • 31 Mar 2022 • Sihan Ma, Jizhizi Li, Jing Zhang, He Zhang, DaCheng Tao
P3M-10k consists of 10,421 high-resolution face-blurred portrait images along with high-quality alpha mattes, which enables us to systematically evaluate both trimap-free and trimap-based matting methods and obtain some useful findings about model generalization ability under the privacy-preserving training (PPT) setting.
Ranked #1 on Image Matting on P3M-10k
no code implementations • 26 Mar 2022 • Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, HuaWei Shen, HUI ZHANG, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan YAO, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, LiWei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.
no code implementations • 18 Mar 2022 • Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu
The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate data bias problem; 2) MGT module effectively uses the multi-grained features and Transformer framework to generate the long medical report.
2 code implementations • CVPR 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao
To empower an agent with such ability, this paper proposes a task of affordance grounding from exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision.
2 code implementations • 17 Mar 2022 • Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, DaCheng Tao
Besides, we introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
no code implementations • 11 Mar 2022 • Sen Zhang, Jing Zhang, DaCheng Tao
In this paper, we propose a unified information theoretic framework for learning-motivated methods aimed at odometry estimation, a crucial component of many robotics and vision tasks such as navigation and virtual reality where relative camera poses are required in real time.
no code implementations • 11 Mar 2022 • Sen Zhang, Jing Zhang, DaCheng Tao
In this work, we propose VRVO, a novel framework for retrieving the absolute scale from virtual data that can be easily obtained from modern simulation environments, whereas in the real domain no stereo or ground-truth data are required in either the training or inference phases.
1 code implementation • ACL 2022 • Jing Zhang, Xiaokang Zhang, Jifan Yu, Jian Tang, Jie Tang, Cuiping Li, Hong Chen
Recent works on knowledge base question answering (KBQA) retrieve subgraphs for easier reasoning.
5 code implementations • 21 Feb 2022 • Qiming Zhang, Yufei Xu, Jing Zhang, DaCheng Tao
Vision transformers have shown great potential in various computer vision tasks owing to their strong capability to model long-range dependency using the self-attention mechanism.
Ranked #2 on Image Classification on ImageNet ReaL
1 code implementation • 5 Feb 2022 • Qijie Shen, Hong Wen, Wanjie Tao, Jing Zhang, Fuyu Lv, Zulong Chen, Zhao Li
In many classical e-commerce platforms, personalized recommendation has been proven to be of great business value, which can improve user satisfaction and increase the revenue of platforms.
1 code implementation • 6 Jan 2022 • Chen Chen, Zhe Chen, Jing Zhang, DaCheng Tao
We observe that the prevailing set abstraction design for down-sampling points may maintain too much unimportant background information that can affect feature learning for detecting objects.
1 code implementation • 5 Jan 2022 • Wenju Sun, Qingyong Li, Jing Zhang, Danyu Wang, Wen Wang, Yangli-ao Geng
DisCOIL follows the basic principle of POC, but it adopts variational auto-encoders (VAE) instead of other well-established one-class classifiers (e.g., deep SVDD), because a trained VAE can not only identify the probability of an input sample belonging to a class but also generate pseudo samples of the class to assist in learning new tasks.
1 code implementation • CVPR 2022 • Mingjin Zhang, Rui Zhang, Yuxiang Yang, Haichen Bai, Jing Zhang, Jie Guo
The TOAA block computes low-level information with an attention mechanism in both row and column directions and fuses it with high-level information to capture the shape characteristics of targets and suppress noise.
no code implementations • CVPR 2022 • Daigang Cai, Lichen Zhao, Jing Zhang, Lu Sheng, Dong Xu
Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules.
1 code implementation • 28 Dec 2021 • Meng Lan, Jing Zhang, Fengxiang He, Lefei Zhang
Semi-supervised video object segmentation (VOS) refers to segmenting the target object in remaining frames given its annotation in the first frame, which has been actively studied in recent years.
no code implementations • 28 Dec 2021 • Jiawei Liu, Jing Zhang, Nick Barnes
We study semi-supervised salient object detection, with access to a small number of labeled samples and a large number of unlabeled samples.
no code implementations • 27 Dec 2021 • Xiaofeng Pan, Ming Li, Jing Zhang, Keren Yu, Luping Wang, Hong Wen, Chengjun Mao, Bo Cao
Finally, we develop an Ensemble Prediction Network (EPN), which incorporates the outputs of FRN and DMN to make the final CVR prediction.
no code implementations • NeurIPS 2021 • Jing Zhang, Jianwen Xie, Nick Barnes, Ping Li
In this paper, we take a step further by proposing a novel generative vision transformer with latent variables following an informative energy-based prior for salient object detection.
1 code implementation • 27 Dec 2021 • Xiaofeng Pan, Yibin Shen, Jing Zhang, Xu He, Yang Huang, Hong Wen, Chengjun Mao, Bo Cao
In this paper, we propose a novel CTR model named MOEF for recommendations under frequent changes of occasions.
1 code implementation • AAAI 2022 • Yue He, Chen Chen, Jing Zhang, Juhua Liu, Fengxiang He, Chaoyue Wang, Bo Du
Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity.
Ranked #9 on Scene Text Recognition on ICDAR2015 (using extra training data)
1 code implementation • 12 Dec 2021 • Yu Feng, Jing Zhang, Xiaokang Zhang, Lemao Liu, Cuiping Li, Hong Chen
Embedding-based methods are popular for Knowledge Base Question Answering (KBQA), but few current models have numerical reasoning skills, so they struggle to answer ordinal-constrained questions.
1 code implementation • CVPR 2022 • Zhe Chen, Jing Zhang, DaCheng Tao
Then, a glimpse-based decoder is introduced to provide refined detection results based on both the glimpse features and the attention modeling outputs of the previous stage.
Ranked #1 on Object Detection on MS COCO (GFlops metric)
2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.