Search Results for author: Jun Xiao

Found 99 papers, 37 papers with code

De-Biased Court's View Generation with Causality

no code implementations EMNLP 2020 Yiquan Wu, Kun Kuang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Jun Xiao, Yueting Zhuang, Luo Si, Fei Wu

Court's view generation is a novel but essential task for legal AI, aiming at improving the interpretability of judgment prediction results and enabling automatic legal document generation.

counterfactual Text Generation

From Easy to Hard: Learning Curricular Shape-aware Features for Robust Panoptic Scene Graph Generation

no code implementations 12 Jul 2024 Hanrong Shi, Lin Li, Jun Xiao, Yueting Zhuang, Long Chen

Despite remarkable progress in PSG, almost all existing methods neglect the importance of shape-aware features, which inherently focus on the contours and boundaries of objects.

Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution

no code implementations 16 Jun 2024 Cuixin Yang, Rongkang Dong, Jun Xiao, Cong Zhang, Kin-Man Lam, Fei Zhou, Guoping Qiu

As virtual and augmented reality applications gain popularity, omnidirectional image (ODI) super-resolution has become increasingly important.

ERP Image Super-Resolution

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

no code implementations 16 Jun 2024 Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao

Inspired by the huge success of large language models (LLMs) and following GPT (generative pre-trained transformer), we bring causal (i.e., unidirectional) generation into VDMs, and use past frames as a prompt to generate future frames.

Video Generation

$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation

no code implementations 27 May 2024 Weiquan Wang, Jun Xiao, Chunping Wang, Wei Liu, Zhao Wang, Long Chen

Continuous diffusion models have demonstrated their effectiveness in addressing the inherent uncertainty and indeterminacy in monocular 3D human pose estimation (HPE).

Monocular 3D Human Pose Estimation Quantization

FreeTuner: Any Subject in Any Style with Training-free Diffusion

no code implementations 23 May 2024 Youcan Xu, Zhen Wang, Jun Xiao, Wei Liu, Long Chen

With the advance of diffusion models, various personalized image generation methods have been proposed.

Disentanglement Image Generation +1

AudioScenic: Audio-Driven Video Scene Editing

no code implementations 25 Apr 2024 Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang

AudioScenic exploits the inherent properties of audio, namely, audio magnitude and frequency, to guide the editing process, aiming to control the temporal dynamics and enhance the temporal consistency.

Neural Interaction Energy for Multi-Agent Trajectory Prediction

no code implementations 25 Apr 2024 Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang

In this study, we introduce a framework called Multi-Agent Trajectory prediction via neural interaction Energy (MATE).

Trajectory Prediction

Existence Is Chaos: Enhancing 3D Human Motion Prediction with Uncertainty Consideration

no code implementations 21 Mar 2024 Zhihao Wang, Yulin Zhou, Ningyu Zhang, Xiaosong Yang, Jun Xiao, Zhao Wang

We believe our work could provide a novel perspective to consider the uncertainty quality for the general motion prediction task and encourage the studies in this field.

Decoder Human motion prediction +1

Distributionally Generative Augmentation for Fair Facial Attribute Classification

1 code implementation CVPR 2024 Fengda Zhang, Qianpei He, Kun Kuang, Jiashuo Liu, Long Chen, Chao Wu, Jun Xiao, Hanwang Zhang

This work proposes a novel, generation-based two-stage framework to train a fair FAC model on biased data without additional annotation.

Attribute Classification +2

Let's Rectify Step by Step: Improving Aspect-based Sentiment Analysis with Diffusion Models

1 code implementation 23 Feb 2024 Shunyu Liu, Jie Zhou, Qunxi Zhu, Qin Chen, Qingchun Bai, Jun Xiao, Liang He

Aspect-Based Sentiment Analysis (ABSA) stands as a crucial task in predicting the sentiment polarity associated with identified aspects within text.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +1

Towards Progressive Multi-Frequency Representation for Image Warping

1 code implementation CVPR 2024 Jun Xiao, Zihang Lyu, Cong Zhang, Yakun Ju, Changjian Shui, Kin-Man Lam

Image warping, a classic task in computer vision, aims to use geometric transformations to change the appearance of images.

Image Super-Resolution

DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism

no code implementations 25 Nov 2023 Zhen Wang, Xinyun Jiang, Jun Xiao, Tao Chen, Long Chen

The denoising process involves the explicit predictions of edit operations and corresponding content words, refining reference captions through iterative step-wise editing.

Caption Generation Denoising +1

Compositional Zero-shot Learning via Progressive Language-based Observations

no code implementations 23 Nov 2023 Lin Li, Guikun Chen, Jun Xiao, Long Chen

Compositional zero-shot learning aims to recognize unseen state-object compositions by leveraging known primitives (state and object) during training.

Compositional Zero-Shot Learning

Informative Data Mining for One-Shot Cross-Domain Semantic Segmentation

no code implementations ICCV 2023 Yuxi Wang, Jian Liang, Jun Xiao, Shuqi Mei, Yuran Yang, Zhaoxiang Zhang

One-shot domain adaptation methods attempt to overcome these challenges by transferring the pre-trained source model to the target domain using only one target sample.

Domain Adaptation Semantic Segmentation +1

CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation

1 code implementation 18 Sep 2023 Kexin Li, Zongxin Yang, Lei Chen, Yi Yang, Jun Xiao

However, existing methods exhibit two limitations: 1) they address video temporal features and audio-visual interactive features separately, disregarding the inherent spatial-temporal dependence of combined audio and video, and 2) they inadequately introduce audio constraints and object-level information during the decoding stage, resulting in segmentation outcomes that fail to comply with audio directives.

Video Segmentation Video Semantic Segmentation

Compositional Feature Augmentation for Unbiased Scene Graph Generation

1 code implementation ICCV 2023 Lin Li, Guikun Chen, Jun Xiao, Yi Yang, Chunping Wang, Long Chen

Specifically, we first decompose each relation triplet feature into two components: intrinsic feature and extrinsic feature, which correspond to the intrinsic characteristics and extrinsic contexts of a relation triplet, respectively.

Diversity Graph Generation +2

Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation

no code implementations 30 Jul 2023 Wenqing Wang, Kaifeng Gao, Yawei Luo, Tao Jiang, Fei Gao, Jian Shao, Jianwen Sun, Jun Xiao

Video-based scene graph generation (VidSGG) is an approach that aims to represent video content in a dynamic graph by identifying visual entities and their relationships.

Graph Generation Missing Labels +2

Improved Neural Radiance Fields Using Pseudo-depth and Fusion

no code implementations 27 Jul 2023 Jingliang Li, Qiang Zhou, Chaohui Yu, Zhengda Lu, Jun Xiao, Zhibin Wang, Fan Wang

To make the constructed volumes as close as possible to the surfaces of objects in the scene and the rendered depth more accurate, we propose to perform depth prediction and radiance field reconstruction simultaneously.

Depth Estimation Depth Prediction +1

Improving Reference-based Distinctive Image Captioning with Contrastive Rewards

no code implementations 25 Jun 2023 Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen

A recent DIC method proposes to generate distinctive captions by comparing the target image with a set of semantically similar reference images, i.e., reference-based DIC (Ref-DIC).

Benchmarking Contrastive Learning +1

Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models

1 code implementation NeurIPS 2023 Lin Li, Jun Xiao, Guikun Chen, Jian Shao, Yueting Zhuang, Long Chen

To dynamically fuse different cues, we further introduce a chain-of-thought method that prompts LLMs to generate reasonable weights for different visual cues.

Relation

TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding

no code implementations 19 May 2023 Chenchi Zhang, Jun Xiao, Lei Chen, Jian Shao, Long Chen

In this paper, we argue that their poor interpretability is attributed to the holistic prompt generation and inference process.

Sentence Visual Grounding

Generalized Universal Domain Adaptation with Generative Flow Networks

no code implementations 8 May 2023 Didi Zhu, Yinchuan Li, Yunfeng Shao, Jianye Hao, Fei Wu, Kun Kuang, Jun Xiao, Chao Wu

We introduce a new problem in unsupervised domain adaptation, termed as Generalized Universal Domain Adaptation (GUDA), which aims to achieve precise prediction of all target labels including unknown categories.

Universal Domain Adaptation Unsupervised Domain Adaptation

Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection

1 code implementation 1 Feb 2023 Kaifeng Gao, Long Chen, Hanwang Zhang, Jun Xiao, Qianru Sun

Without bells and whistles, our RePro achieves a new state-of-the-art performance on two VidVRD benchmarks of not only the base training object and predicate categories, but also the unseen ones.

Object Relation +1

Knowledge-guided Causal Intervention for Weakly-supervised Object Localization

1 code implementation 3 Jan 2023 Feifei Shao, Yawei Luo, Fei Gao, Yi Yang, Jun Xiao

Previous weakly-supervised object localization (WSOL) methods aim to expand activation map discriminative areas to cover the whole objects, yet neglect two inherent challenges when relying solely on image-level labels.

Knowledge Distillation Object +1

SAViT: Structure-Aware Vision Transformer Pruning via Collaborative Optimization

1 code implementation NIPS 2022 Zheng Chuanyang, Zheyang Li, Kai Zhang, Zhi Yang, Wenming Tan, Jun Xiao, Ye Ren, ShiLiang Pu

In this paper, we introduce joint importance, which integrates essential structural-aware interactions between components for the first time, to perform collaborative pruning.

object-detection Object Detection

DS-MVSNet: Unsupervised Multi-view Stereo via Depth Synthesis

no code implementations 13 Aug 2022 Jingliang Li, Zhengda Lu, Yiqun Wang, Ying Wang, Jun Xiao

To mine the information in probability volume, we creatively synthesize the source depths by splattering the probability volume and depth hypotheses to source views.

Deep Progressive Feature Aggregation Network for High Dynamic Range Imaging

no code implementations 4 Aug 2022 Jun Xiao, Qian Ye, Tianshan Liu, Cong Zhang, Kin-Man Lam

The primary challenges are ghosting artifacts caused by object motion between low dynamic range images and distorted content in under- and overexposed regions.

Vocal Bursts Intensity Prediction

Online Video Super-Resolution with Convolutional Kernel Bypass Graft

no code implementations 4 Aug 2022 Jun Xiao, Xinyang Jiang, Ningxin Zheng, Huan Yang, Yifan Yang, Yuqing Yang, Dongsheng Li, Kin-Man Lam

Then, our proposed CKBG method enhances this lightweight base model by bypassing the original network with "kernel grafts", which are extra convolutional kernels containing the prior knowledge of external pretrained image SR models.

Transfer Learning Video Super-Resolution

Integrating Object-aware and Interaction-aware Knowledge for Weakly Supervised Scene Graph Generation

1 code implementation 3 Aug 2022 Xingchen Li, Long Chen, Wenbo Ma, Yi Yang, Jun Xiao

However, we argue that most existing WSSGG works only focus on object-consistency, which means the grounded regions should have the same object category label as text entities.

Graph Generation Object +1

Rethinking the Evaluation of Unbiased Scene Graph Generation

no code implementations 3 Aug 2022 Xingchen Li, Long Chen, Jian Shao, Shaoning Xiao, Songyang Zhang, Jun Xiao

Current Scene Graph Generation (SGG) methods tend to predict frequent predicate categories and fail to recognize rare ones due to the severe imbalanced distribution of predicates.

Diversity Graph Generation +1

Unified Normalization for Accelerating and Stabilizing Transformers

1 code implementation 2 Aug 2022 Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, ShiLiang Pu

To tackle these issues, we propose Unified Normalization (UN), which can speed up inference by being fused with other linear operations and achieve performance on par with LN.

Rethinking the Reference-based Distinctive Image Captioning

1 code implementation 22 Jul 2022 Yangjun Mao, Long Chen, Zhihong Jiang, Dong Zhang, Zhimeng Zhang, Jian Shao, Jun Xiao

Unfortunately, reference images used by existing Ref-DIC works are easy to distinguish: these reference images only resemble the target image at scene-level and have few common objects, such that a Ref-DIC model can trivially generate distinctive captions even without considering the reference images.

Attribute Benchmarking +1

Explicit Image Caption Editing

1 code implementation 20 Jul 2022 Zhen Wang, Long Chen, Wenbo Ma, Guangxing Han, Yulei Niu, Jian Shao, Jun Xiao

Given an image and a reference caption, the image caption editing task aims to correct the misalignment errors and generate a refined caption.

Sentence

Rethinking Data Augmentation for Robust Visual Question Answering

1 code implementation 18 Jul 2022 Long Chen, Yuhang Zheng, Jun Xiao

Unfortunately, to guarantee that augmented samples have reasonable ground-truth answers, they manually design a set of heuristic rules for several question types, which severely limits their generalization ability.

Data Augmentation Knowledge Distillation +2

Learning Regularized Multi-Scale Feature Flow for High Dynamic Range Imaging

no code implementations 6 Jul 2022 Qian Ye, Masanori Suganuma, Jun Xiao, Takayuki Okatani

Reconstructing ghosting-free high dynamic range (HDR) images of dynamic scenes from a set of multi-exposure images is a challenging task, especially with large object motion and occlusions, leading to visible artifacts using existing methods.

Vocal Bursts Intensity Prediction

The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation

1 code implementation CVPR 2022 Lin Li, Long Chen, Yifeng Huang, Zhimeng Zhang, Songyang Zhang, Jun Xiao

Then, in Pos-NSD, we use a clustering-based algorithm to divide all positive samples into multiple sets, and treat the samples in the noisiest set as noisy positive samples.

Graph Generation Out-of-Distribution Detection +2

A Knowledge-Enhanced Adversarial Model for Cross-lingual Structured Sentiment Analysis

no code implementations 31 May 2022 Qi Zhang, Jie Zhou, Qin Chen, Qingchun Bai, Jun Xiao, Liang He

Notably, we propose a Knowledge-Enhanced Adversarial Model (KEAM) with both implicit distributed and explicit structural knowledge to enhance the cross-lingual transfer.

Cross-Lingual Transfer Sentiment Analysis

Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives

no code implementations 25 Apr 2022 Shaoning Xiao, Long Chen, Kaifeng Gao, Zhao Wang, Yi Yang, Zhimeng Zhang, Jun Xiao

From the view of feature, we break down the video into trajectories and first leverage trajectory feature in VideoQA to enhance the alignment between two modalities.

Question Answering Video Question Answering

Bidirectional Self-Training with Multiple Anisotropic Prototypes for Domain Adaptive Semantic Segmentation

1 code implementation 16 Apr 2022 Yulei Lu, Yawei Luo, Li Zhang, Zheyang Li, Yi Yang, Jun Xiao

A thriving trend in domain adaptive segmentation is to generate high-quality pseudo labels for the target domain and retrain the segmentor on them.

Pseudo Label Semantic Segmentation +2

DepthGAN: GAN-based Depth Generation of Indoor Scenes from Semantic Layouts

no code implementations 22 Mar 2022 Yidi Li, Yiqun Wang, Zhengda Lu, Jun Xiao

Limited by the computational efficiency and accuracy, generating complex 3D scenes remains a challenging problem for existing generation networks.

Computational Efficiency

Active Learning for Point Cloud Semantic Segmentation via Spatial-Structural Diversity Reasoning

no code implementations 25 Feb 2022 Feifei Shao, Yawei Luo, Ping Liu, Jie Chen, Yi Yang, Yulei Lu, Jun Xiao

To deploy SSDR-AL in a more practical scenario, we design a noise-aware iterative labeling strategy to confront the "noisy annotation" problem introduced by the previous "dominant labeling" strategy in superpoints.

Active Learning Diversity +1

Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs

1 code implementation CVPR 2022 Kaifeng Gao, Long Chen, Yulei Niu, Jian Shao, Jun Xiao

To this end, we propose a new classification-then-grounding framework for VidSGG, which can avoid all the three overlooked drawbacks.

Predicate Classification

Relational Graph Learning for Grounded Video Description Generation

no code implementations 2 Dec 2021 Wenqiao Zhang, Xin Eric Wang, Siliang Tang, Haizhou Shi, Haocheng Shi, Jun Xiao, Yueting Zhuang, William Yang Wang

Such a setting can help explain the decisions of captioning models and prevents the model from hallucinating object words in its description.

Graph Learning Hallucination +2

Consensus Graph Representation Learning for Better Grounded Image Captioning

no code implementations 2 Dec 2021 Wenqiao Zhang, Haochen Shi, Siliang Tang, Jun Xiao, Qiang Yu, Yueting Zhuang

Contemporary visual captioning models frequently hallucinate objects that are not actually in a scene, due to visual misclassification or over-reliance on priors, resulting in semantic inconsistency between the visual information and the target lexical words.

Graph Representation Learning Hallucination +1

Unified Group Fairness on Federated Learning

no code implementations 9 Nov 2021 Fengda Zhang, Kun Kuang, Yuxuan Liu, Long Chen, Chao Wu, Fei Wu, Jiaxun Lu, Yunfeng Shao, Jun Xiao

We validate the advantages of the FMDA-M algorithm with various kinds of distribution shift settings in experiments, and the results show that FMDA-M algorithm outperforms the existing fair FL algorithms on unified group fairness.

Attribute Fairness +1

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

1 code implementation 3 Oct 2021 Long Chen, Yuhang Zheng, Yulei Niu, Hanwang Zhang, Jun Xiao

Specifically, CSST is composed of two parts: Counterfactual Samples Synthesizing (CSS) and Counterfactual Samples Training (CST).

counterfactual Question Answering +1

Natural Language Video Localization with Learnable Moment Proposals

1 code implementation EMNLP 2021 Shaoning Xiao, Long Chen, Jian Shao, Yueting Zhuang, Jun Xiao

Given an untrimmed video and a natural language query, Natural Language Video Localization (NLVL) aims to identify the video moment described by the query.

Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation

no code implementations 3 Sep 2021 Jiahui Li, Kun Kuang, Lin Li, Long Chen, Songyang Zhang, Jian Shao, Jun Xiao

Deep neural networks have demonstrated remarkable performance in many data-driven and prediction-oriented applications, and sometimes even perform better than humans.

Financial Analysis Medical Diagnosis

Video Relation Detection via Tracklet based Visual Transformer

1 code implementation 19 Aug 2021 Kaifeng Gao, Long Chen, Yifeng Huang, Jun Xiao

Video Visual Relation Detection (VidVRD) has received significant attention from the community in recent years.

Decoder Relation +1

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

no code implementations 1 Jun 2021 Jiahui Li, Kun Kuang, Baoxiang Wang, Furui Liu, Long Chen, Fei Wu, Jun Xiao

Specifically, Shapley Value and its desired properties are leveraged in deep MARL to credit any combinations of agents, which grants us the capability to estimate the individual credit for each agent.

counterfactual Multi-agent Reinforcement Learning +4

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

no code implementations 26 May 2021 Feifei Shao, Long Chen, Jian Shao, Wei Ji, Shaoning Xiao, Lu Ye, Yueting Zhuang, Jun Xiao

With the success of deep neural networks in object detection, both WSOD and WSOL have received unprecedented attention.

Object object-detection +2

VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching

no code implementations 12 May 2021 Chenchi Zhang, Wenbo Ma, Jun Xiao, Hanwang Zhang, Jian Shao, Yueting Zhuang, Long Chen

In this paper, we argue that these methods overlook an obvious mismatch between the roles of proposals in the two stages: they generate proposals solely based on the detection confidence (i.e., query-agnostic), hoping that the proposals contain all instances mentioned in the text query (i.e., query-aware).

Image-text matching Referring Expression +2

Improving Weakly-supervised Object Localization via Causal Intervention

1 code implementation 21 Apr 2021 Feifei Shao, Yawei Luo, Li Zhang, Lu Ye, Siliang Tang, Yi Yang, Jun Xiao

Recently emerged weakly supervised object localization (WSOL) methods can learn to localize an object in the image using only image-level labels.

Object Weakly-Supervised Object Localization

Efficient Ring-topology Decentralized Federated Learning with Deep Generative Models for Industrial Artificial Intelligent

no code implementations 15 Apr 2021 Zhao Wang, Yifan Hu, Jun Xiao, Chao Wu

A novel ring FL topology as well as a map-reduce based synchronizing method are designed in the proposed RDFL to improve decentralized FL performance and bandwidth utilization.

Federated Learning

Human-like Controllable Image Captioning with Verb-specific Semantic Roles

1 code implementation CVPR 2021 Long Chen, Zhihong Jiang, Jun Xiao, Wei Liu

However, we argue that almost all existing objective control signals have overlooked two indispensable characteristics of an ideal control signal: 1) Event-compatible: all visual contents referred to in a single sentence should be compatible with the described activity.

Caption Generation controllable image captioning +3

Boundary Proposal Network for Two-Stage Natural Language Video Localization

no code implementations 15 Mar 2021 Shaoning Xiao, Long Chen, Songyang Zhang, Wei Ji, Jian Shao, Lu Ye, Jun Xiao

State-of-the-art NLVL methods are almost all one-stage, and can typically be grouped into two categories: 1) anchor-based approach: it first pre-defines a series of video segment candidates (e.g., by sliding window), and then does classification for each candidate; 2) anchor-free approach: it directly predicts the probabilities for each video frame as a boundary or intermediate frame inside the positive segment.

Vocal Bursts Valence Prediction

Kinetic Energy Distribution of Fragments for Thermal Neutron-Induced $^{235}$U and $^{239}$Pu Fission Reactions

no code implementations 24 Dec 2020 Xiaojun Sun, Haiyuan Peng, Liying Xie, Kai Zhang, Yan Liang, Yinlu Han, Nengchuan Su, Jie Yan, Jun Xiao, Junjie Sun

(2) Every complementary pair of the primary fission fragments is approximately described as two ellipsoids with large deformation at the scission moment.

Nuclear Theory

ROBY: Evaluating the Robustness of a Deep Model by its Decision Boundaries

no code implementations 18 Dec 2020 Jinyin Chen, Zhen Wang, Haibin Zheng, Jun Xiao, Zhaoyan Ming

This work proposes a generic evaluation metric ROBY, a novel attack-independent robustness measure based on the model's decision boundaries.

GFL: A Decentralized Federated Learning Framework Based On Blockchain

no code implementations 21 Oct 2020 Yifan Hu, YuHang Zhou, Jun Xiao, Chao Wu

Federated learning (FL) is a rapidly growing field and many centralized and decentralized FL frameworks have been proposed.

Data Poisoning Federated Learning

Federated Unsupervised Representation Learning

no code implementations 18 Oct 2020 Fengda Zhang, Kun Kuang, Zhaoyang You, Tao Shen, Jun Xiao, Yin Zhang, Chao Wu, Yueting Zhuang, Xiaolin Li

FURL poses two new challenges: (1) data distribution shift (Non-IID distribution) among clients would make local models focus on different categories, leading to the inconsistency of representation spaces.

Federated Learning Representation Learning

Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding

1 code implementation 3 Sep 2020 Long Chen, Wenbo Ma, Jun Xiao, Hanwang Zhang, Shih-Fu Chang

The prevailing framework for solving referring expression grounding is based on a two-stage process: 1) detecting proposals with an object detector and 2) grounding the referent to one of the proposals.

Referring Expression Vocal Bursts Valence Prediction

Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling

no code implementations 11 Aug 2020 Jiacheng Li, Siliang Tang, Juncheng Li, Jun Xiao, Fei Wu, ShiLiang Pu, Yueting Zhuang

In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting.

Meta-Learning Visual Storytelling

Accurate Lung Nodules Segmentation with Detailed Representation Transfer and Soft Mask Supervision

no code implementations 29 Jul 2020 Changwei Wang, Rongtao Xu, Shibiao Xu, Weiliang Meng, Jun Xiao, Xiaopeng Zhang

Then, a novel Network with detailed representation transfer and Soft Mask supervision (DSNet) is proposed to process the input low-resolution images of lung nodules into high-quality segmentation results.

Computed Tomography (CT) Lesion Segmentation +3

Hierarchical Fashion Graph Network for Personalized Outfit Recommendation

1 code implementation 26 May 2020 Xingchen Li, Xiang Wang, Xiangnan He, Long Chen, Jun Xiao, Tat-Seng Chua

Fashion outfit recommendation has attracted increasing attention from online shopping services and fashion communities. Distinct from other scenarios (e.g., social networking or content sharing) which recommend a single item (e.g., a friend or picture) to a user, outfit recommendation predicts user preference on a set of well-matched fashion items. Hence, performing high-quality personalized outfit recommendation should satisfy two requirements: 1) the compatibility of fashion items and 2) consistency with user preference.

Counterfactual Samples Synthesizing for Robust Visual Question Answering

2 code implementations CVPR 2020 Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, ShiLiang Pu, Yueting Zhuang

To reduce the language biases, several recent works introduce an auxiliary question-only model to regularize the training of targeted VQA model, and achieve dominating performance on VQA-CP.

 Ranked #1 on Visual Question Answering (VQA) on VQA-CP (using extra training data)

counterfactual Question Answering +1

Evaluation Framework For Large-scale Federated Learning

1 code implementation 3 Mar 2020 Lifeng Liu, Fengda Zhang, Jun Xiao, Chao Wu

Federated learning is proposed as a machine learning setting to enable distributed edge devices, such as mobile phones, to collaboratively learn a shared prediction model while keeping all the training data on device, which can not only take full advantage of data distributed across millions of nodes to train a good model but also protect data privacy.

Federated Learning

Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States

1 code implementation 9 Feb 2020 Yunan Ye, Hengzhi Pei, Boxin Wang, Pin-Yu Chen, Yada Zhu, Jun Xiao, Bo Li

Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity -- the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty -- the financial market is versatile and non-stationary.

Management reinforcement-learning +1

DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization

no code implementations IJCNLP 2019 Chujie Lu, Long Chen, Chilie Tan, Xiaolin Li, Jun Xiao

In this paper, we focus on natural language video localization: localizing (i.e., grounding) a natural language description in a long and untrimmed video sequence.

Video Dialog via Progressive Inference and Cross-Transformer

no code implementations IJCNLP 2019 Weike Jin, Zhou Zhao, Mao Gu, Jun Xiao, Furu Wei, Yueting Zhuang

Video dialog is a new and challenging task, which requires the agent to answer questions combining video information with dialog history.

Answer Generation Question Answering +4

Weak Supervision Enhanced Generative Network for Question Generation

no code implementations 1 Jul 2019 Yutong Wang, Jiyuan Zheng, Qijiong Liu, Zhou Zhao, Jun Xiao, Yueting Zhuang

More specifically, we devise a discriminator, Relation Guider, to capture the relations between the whole passage and the associated answer and then the Multi-Interaction mechanism is deployed to transfer the knowledge dynamically for our question generation system.

Decoder Question Answering +2

Galaxy Learning -- A Position Paper

no code implementations 22 Apr 2019 Chao Wu, Jun Xiao, Gang Huang, Fei Wu

Model training, as well as the communication, is achieved with blockchain and its smart contracts.

BIG-bench Machine Learning Position

Counterfactual Critic Multi-Agent Training for Scene Graph Generation

no code implementations ICCV 2019 Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, ShiLiang Pu, Shih-Fu Chang

CMAT is a multi-agent policy gradient method that frames objects as cooperative agents, and then directly maximizes a graph-level metric as the reward.

counterfactual Graph Generation +2

CIAN: Cross-Image Affinity Net for Weakly Supervised Semantic Segmentation

1 code implementation 27 Nov 2018 Junsong Fan, Zhao-Xiang Zhang, Tieniu Tan, Chunfeng Song, Jun Xiao

Weakly supervised semantic segmentation with only image-level labels saves considerable human effort otherwise spent annotating pixel-level labels.

Segmentation Weakly supervised segmentation +2

Textually Guided Ranking Network for Attentional Image Retweet Modeling

no code implementations 24 Oct 2018 Zhou Zhao, Hanbing Zhan, Lingtao Meng, Jun Xiao, Jun Yu, Min Yang, Fei Wu, Deng Cai

In this paper, we study the problem of image retweet prediction in social media, which predicts whether a user will repost image tweets from their followees.

Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks

1 code implementation CVPR 2018 Long Chen, Hanwang Zhang, Jun Xiao, Wei Liu, Shih-Fu Chang

We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training.

General Classification Zero-Shot Learning

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks

7 code implementations 15 Aug 2017 Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat-Seng Chua

Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating the second-order feature interactions.

regression

Graph-Theoretic Spatiotemporal Context Modeling for Video Saliency Detection

no code implementations 25 Jul 2017 Lina Wei, Fangfang Wang, Xi Li, Fei Wu, Jun Xiao

As a result, a key issue in video saliency detection is how to effectively capture the intrinsic properties of atomic video structures as well as their associated contextual interactions along the spatial and temporal dimensions.

Video Saliency Detection

Video Question Answering via Attribute-Augmented Attention Network Learning

no code implementations 20 Jul 2017 Yunan Ye, Zhou Zhao, Yimeng Li, Long Chen, Jun Xiao, Yueting Zhuang

Video Question Answering is a challenging problem in visual information retrieval, which provides the answer to the referenced video content according to the question.

Attribute Information Retrieval +6

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

2 code implementations CVPR 2017 Long Chen, Hanwang Zhang, Jun Xiao, Liqiang Nie, Jian Shao, Wei Liu, Tat-Seng Chua

Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image.

Image Captioning Sentence

Metric Learning Driven Multi-Task Structured Output Optimization for Robust Keypoint Tracking

no code implementations 4 Dec 2014 Liming Zhao, Xi Li, Jun Xiao, Fei Wu, Yueting Zhuang

As an important and challenging problem in computer vision and graphics, keypoint-based object tracking is typically formulated in a spatio-temporal statistical learning framework.

Metric Learning Object Tracking
