no code implementations • 2 May 2024 • Peng Chu, Jiang Wang, Andre Abrantes
The development of Audio Description (AD) has been a pivotal step forward in making video content more accessible and inclusive.
3 code implementations • 9 Mar 2024 • Chengjie Zhang, Jiang Wang, He Kong
Asynchronous microphone array calibration is a prerequisite for many robot audition applications.
no code implementations • CVPR 2024 • Zichen Miao, Jiang Wang, Ze Wang, Zhengyuan Yang, Lijuan Wang, Qiang Qiu, Zicheng Liu
We also show the effectiveness of our RL fine-tuning framework in enhancing the diversity of image generation with different types of diffusion models, including class-conditional models and text-conditional models, e.g., StableDiffusion.
no code implementations • 7 Jun 2023 • Andre Abrantes, Jiang Wang, Peng Chu, Quanzeng You, Zicheng Liu
We introduce a novel framework called RefineVIS for Video Instance Segmentation (VIS) that achieves good object association between frames and accurate segmentation masks by iteratively refining the representations using sequence context.
Ranked #5 on Video Instance Segmentation on YouTube-VIS 2021 (using extra training data)
1 code implementation • CVPR 2023 • Chung-Ching Lin, Jiang Wang, Kun Luo, Kevin Lin, Linjie Li, Lijuan Wang, Zicheng Liu
The most recent efforts in video matting have focused on eliminating trimap dependency since trimap annotations are expensive and trimap-based methods are less adaptable for real-time applications.
1 code implementation • CVPR 2023 • Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu
In this paper, we show that a binary latent space can be explored for compact yet expressive image representations.
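To make the idea of a binary latent space concrete, here is a minimal NumPy sketch of binarizing a real-valued latent vector into {0, 1} codes. This is a generic illustration of a binary latent bottleneck, not the paper's architecture; the function name and the straight-through detail are hypothetical.

```python
import numpy as np

def binary_latent_forward(h, rng=None):
    """Quantize real-valued features h into binary codes in {0, 1}.

    The forward pass thresholds sigmoid probabilities (or samples a
    Bernoulli when rng is given); in a trained model, a straight-through
    estimator would copy gradients through the non-differentiable step.
    """
    probs = 1.0 / (1.0 + np.exp(-h))               # Bernoulli probabilities
    if rng is not None:                            # stochastic binarization
        codes = (rng.random(h.shape) < probs).astype(np.float32)
    else:                                          # deterministic threshold
        codes = (probs > 0.5).astype(np.float32)
    return codes

h = np.array([[-2.0, 0.5, 3.0, -0.1]])
codes = binary_latent_forward(h)
print(codes)  # deterministic codes: [[0. 1. 1. 0.]]
```

The compactness comes from each latent dimension costing a single bit rather than a float.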
no code implementations • 2 Feb 2023 • Ze Wang, Jiang Wang, Zicheng Liu, Qiang Qiu
In the proposed framework, we model energy estimation and data restoration as the forward and backward passes of a single network without any auxiliary components, e.g., an extra decoder.
no code implementations • 14 Jun 2022 • Quanzeng You, Jiang Wang, Peng Chu, Andre Abrantes, Zicheng Liu
We propose a consistent end-to-end video instance segmentation framework with Inter-Frame Recurrent Attention to model both the temporal instance consistency for adjacent frames and the global temporal context.
1 code implementation • 26 Mar 2022 • Jiang Wang, Filip Ilievski, Pedro Szekely, Ke-Thia Yao
Experiments on legacy benchmarks and a new large benchmark, DWD, show that augmenting the knowledge graph with quantities and years is beneficial for predicting both entities and numbers, as KGA outperforms the vanilla models and other relevant baselines.
no code implementations • CVPR 2023 • Shiqi Lin, Zhizheng Zhang, Zhipeng Huang, Yan Lu, Cuiling Lan, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Amey Parulkar, Viraj Navkal, Zhibo Chen
Improving the generalization ability of Deep Neural Networks (DNNs) is critical for their practical uses, which has been a longstanding challenge.
no code implementations • 25 Jan 2022 • Yinhan Wang, Jiang Wang, Shipeng Fan
An active defense against an incoming missile requires information about it, including its guidance-law parameter and first-order lateral time constant.
no code implementations • CVPR 2022 • Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha
In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID.
Tasks: Domain Adaptive Person Re-Identification, Knowledge Distillation, +4
no code implementations • 30 Nov 2021 • Xiaotian Han, Quanzeng You, Chunyu Wang, Zhizheng Zhang, Peng Chu, Houdong Hu, Jiang Wang, Zicheng Liu
This dataset provides a more reliable benchmark of multi-camera, multi-object tracking systems in cluttered and crowded environments.
Ranked #2 on Object Tracking on MMPTRACK
no code implementations • 1 Apr 2021 • Peng Chu, Jiang Wang, Quanzeng You, Haibin Ling, Zicheng Liu
TransMOT effectively models the interactions of a large number of objects by arranging the trajectories of the tracked objects as a set of sparse weighted graphs, and constructing a spatial graph transformer encoder layer, a temporal transformer encoder layer, and a spatial graph transformer decoder layer based on the graphs.
Ranked #2 on Multi-Object Tracking on 2DMOT15 (using extra training data)
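As an illustration of the graph-construction step described above, here is a hedged NumPy sketch that builds a sparse weighted spatial graph over tracked boxes, using IoU as the edge weight and a threshold to keep the graph sparse. The exact weighting and the transformer layers of TransMOT are not reproduced; names and the threshold are hypothetical.

```python
import numpy as np

def spatial_graph(boxes, iou_thresh=0.1):
    """Build a sparse weighted adjacency matrix over tracked boxes.

    The edge weight between two objects is their IoU; pairs below
    iou_thresh get no edge. Boxes are (x1, y1, x2, y2).
    """
    n = len(boxes)
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            xa1, ya1, xa2, ya2 = boxes[i]
            xb1, yb1, xb2, yb2 = boxes[j]
            iw = max(0.0, min(xa2, xb2) - max(xa1, xb1))   # overlap width
            ih = max(0.0, min(ya2, yb2) - max(ya1, yb1))   # overlap height
            inter = iw * ih
            union = ((xa2 - xa1) * (ya2 - ya1)
                     + (xb2 - xb1) * (yb2 - yb1) - inter)
            iou = inter / union if union > 0 else 0.0
            if iou >= iou_thresh:
                adj[i, j] = adj[j, i] = iou                # symmetric edge
    return adj

boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
adj = spatial_graph(boxes)
```

Distant objects (like the third box) get no edge at all, which is what makes the per-frame graph sparse even with many tracked objects.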
no code implementations • 9 Mar 2021 • Zichao Liu, Jiang Wang, Shaoming He, Hyo-Sang Shin, Antonios Tsourdos
This paper investigates the impact-time-control problem and proposes a learning-based computational guidance algorithm to solve it.
1 code implementation • 22 Jul 2020 • Brooke E. Husic, Nicholas E. Charron, Dominik Lemm, Jiang Wang, Adrià Pérez, Maciej Majewski, Andreas Krämer, Yaoyi Chen, Simon Olsson, Gianni de Fabritiis, Frank Noé, Cecilia Clementi
A previous study [ACS Cent. Sci. 5, 755 (2019)] demonstrated that the existence of such a variational limit enables the use of a supervised machine learning framework to generate a coarse-grained force field, which can then be used for simulation in the coarse-grained space.
no code implementations • 4 May 2020 • Jiang Wang, Stefan Chmiela, Klaus-Robert Müller, Frank Noé, Cecilia Clementi
Using ensemble learning and stratified sampling, we propose a 2-layer training scheme that enables GDML to learn an effective coarse-grained model.
no code implementations • 30 Dec 2019 • Xiaojie Jin, Jiang Wang, Joshua Slocum, Ming-Hsuan Yang, Shengyang Dai, Shuicheng Yan, Jiashi Feng
In this paper, we propose the resource constrained differentiable architecture search (RC-DARTS) method to learn architectures that are significantly smaller and faster while achieving comparable accuracy.
6 code implementations • CVPR 2020 • Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le
We show that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.
Ranked #4 on Domain Generalization on VizWiz-Classification
no code implementations • 4 Dec 2018 • Jiang Wang, Simon Olsson, Christoph Wehmeyer, Adrià Pérez, Nicholas E. Charron, Gianni de Fabritiis, Frank Noé, Cecilia Clementi
We show that CGnets can capture all-atom explicit-solvent free energy surfaces with models using only a few coarse-grained beads and no solvent, while classical coarse-graining methods fail to capture crucial features of the free energy surface.
no code implementations • ICCV 2019 • Jiyang Gao, Jiang Wang, Shengyang Dai, Li-Jia Li, Ram Nevatia
Compared to standard Faster RCNN, it contains three highlights: an ensemble of two classification heads and a distillation head to avoid overfitting on noisy labels and improve mining precision; masking the negative-sample loss in the box predictor to avoid harm from false-negative labels; and training the box-regression head only on seed annotations to eliminate harm from the inaccurate boundaries of mined bounding boxes.
no code implementations • ICLR 2018 • Xu Chen, Jiang Wang, Hao Ge
This formulation shows the connection between the standard GAN training process and the primal-dual subgradient methods for convex optimization.
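For reference, the minimax objective such an analysis starts from is the standard GAN value function (a textbook statement, not the paper's specific primal-dual formulation):

```latex
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

Roughly, treating the discriminator as a dual variable and the generated distribution as the primal variable, the alternating gradient updates on D and G resemble the ascent/descent steps of a primal-dual subgradient method on a convex program.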
no code implementations • CVPR 2017 • Yin Cui, Feng Zhou, Jiang Wang, Xiao Liu, Yuanqing Lin, Serge Belongie
We demonstrate how to approximate kernels such as Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner.
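For context, a standard way to approximate the Gaussian RBF kernel with explicit feature maps is random Fourier features, sketched below in NumPy. Note the paper's compact explicit maps are a different, deterministic construction up to a given order; this randomized version is only a generic illustration, and all names are hypothetical.

```python
import numpy as np

def rff_features(X, n_features=4000, gamma=0.5, seed=0):
    """Random Fourier features z(x) such that z(x) . z(y) approximates
    the Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Z = rff_features(X)
approx = Z[0] @ Z[1]                                  # approximate kernel value
exact = np.exp(-0.5 * np.sum((X[0] - X[1]) ** 2))     # exact RBF value
```

The approximation error shrinks as the number of features grows, which is exactly the dimension/accuracy trade-off that compact explicit maps aim to improve.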
no code implementations • 20 May 2016 • Xiao Liu, Jiang Wang, Shilei Wen, Errui Ding, Yuanqing Lin
By designing a novel reward strategy, we are able to learn to locate regions that are spatially and semantically distinctive with a reinforcement learning algorithm.
1 code implementation • CVPR 2016 • Jiang Wang, Yi Yang, Junhua Mao, Zhiheng Huang, Chang Huang, Wei Xu
While deep convolutional neural networks (CNNs) have shown great success in single-label image classification, it is important to note that real-world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image.
no code implementations • 22 Mar 2016 • Xiao Liu, Tian Xia, Jiang Wang, Yi Yang, Feng Zhou, Yuanqing Lin
Fine-grained recognition is challenging due to its subtle local inter-class differences versus large intra-class variations such as poses.
no code implementations • ICCV 2015 • Chunshui Cao, Xian-Ming Liu, Yi Yang, Yinan Yu, Jiang Wang, Zilei Wang, Yongzhen Huang, Liang Wang, Chang Huang, Wei Xu, Deva Ramanan, Thomas S. Huang
While feedforward deep convolutional neural networks (CNNs) have been a great success in computer vision, it is important to remember that the human visual cortex generally contains more feedback connections than forward connections.
no code implementations • 18 Nov 2015 • Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, Ram Nevatia
ABC-CNN determines an attention map for an image-question pair by convolving the image feature map with configurable convolutional kernels derived from the question's semantics.
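The convolution-with-a-question-derived-kernel step can be sketched in NumPy as follows. This is a deliberate simplification (a single 1x1 kernel over channels rather than the paper's configurable spatial kernels), and all names are hypothetical.

```python
import numpy as np

def question_configured_attention(feat_map, q_emb, proj):
    """Sketch of ABC-CNN-style attention: a convolutional kernel is
    derived from the question embedding, convolved with the image
    feature map, and softmax-normalized into an attention map.

    feat_map: (C, H, W); q_emb: (d,); proj: (d, C) maps the question
    embedding into a 1x1 kernel over channels.
    """
    kernel = proj.T @ q_emb                                   # (C,) kernel
    scores = np.tensordot(kernel, feat_map, axes=([0], [0]))  # (H, W) scores
    e = np.exp(scores - scores.max())                         # stable softmax
    return e / e.sum()                                        # sums to 1

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))   # toy image feature map
q = rng.normal(size=16)             # toy question embedding
P = rng.normal(size=(16, 8))        # toy question-to-kernel projection
att = question_configured_attention(feat, q, P)
```

Different questions yield different kernels, so the same image produces different attention maps depending on what is asked.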
no code implementations • CVPR 2016 • Liang-Chieh Chen, Yi Yang, Jiang Wang, Wei Xu, Alan L. Yuille
We adapt a state-of-the-art semantic image segmentation model, which we jointly train with multi-scale input images and the attention model.
no code implementations • CVPR 2016 • Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, Wei Xu
The sentence generator produces one simple short sentence that describes a specific short video interval.
1 code implementation • ICCV 2015 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille
In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning, but also makes the model more suitable for the novel concept learning task.
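The transposed weight sharing scheme can be sketched in a few lines of NumPy: the softmax decoder reuses the word-embedding matrix transposed, so adding a novel concept only requires appending one embedding row rather than retraining a separate output layer. Function names here are hypothetical.

```python
import numpy as np

def decode_logits(hidden, embed):
    """Transposed weight sharing: the output layer's weight matrix is
    the transpose of the word embedding matrix, so the decoder and the
    embedding share parameters.

    hidden: (d,) hidden state; embed: (vocab, d) embedding matrix.
    """
    return embed @ hidden                  # logits over the vocabulary

rng = np.random.default_rng(1)
embed = rng.normal(size=(5, 8))            # 5-word toy vocabulary
h = rng.normal(size=8)
logits = decode_logits(h, embed)           # shape (5,)

# Learning a novel concept = appending one embedding row;
# the decoder needs no new parameters and old logits are unchanged.
embed2 = np.vstack([embed, rng.normal(size=(1, 8))])
logits2 = decode_logits(h, embed2)         # shape (6,)
```

This is what makes the model suited to novel-concept learning: only the one new row needs to be estimated from the few examples of the new word.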
2 code implementations • 20 Dec 2014 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan Yuille
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions.
no code implementations • 4 Oct 2014 • Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille
In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images.
no code implementations • CVPR 2014 • Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song-Chun Zhu
We present a novel multiview spatio-temporal AND-OR graph (MST-AOG) representation for cross-view action recognition, i.e., recognition performed on video from an unknown and unseen view.
6 code implementations • CVPR 2014 • Jiang Wang, Yang Song, Thomas Leung, Chuck Rosenberg, Jinbin Wang, James Philbin, Bo Chen, Ying Wu
This paper proposes a deep ranking model that employs deep learning techniques to learn a similarity metric directly from images. It has higher learning capability than models based on hand-crafted features.
no code implementations • CVPR 2013 • Shengfeng He, Qingxiong Yang, Rynson W. H. Lau, Jiang Wang, Ming-Hsuan Yang
A robust tracking framework based on locality sensitive histograms is proposed, which consists of two main components: a new tracking feature that is robust to illumination changes, and a novel multi-region tracking algorithm that runs in real time even with hundreds of regions.
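A locality sensitive histogram weights each pixel's contribution by a factor that decays exponentially with its distance from the target position, H_p(b) = sum_q alpha^|p-q| [I_q in bin b], and can be computed in two recursive passes. Below is a 1-D NumPy sketch of this computation (a simplified illustration of the idea from the paper; names are hypothetical).

```python
import numpy as np

def locality_sensitive_histogram(img, n_bins, alpha):
    """1-D locality sensitive histogram:
    H_p(b) = sum_q alpha^{|p-q|} [img[q] falls in bin b],
    computed with one left-to-right and one right-to-left recursive
    pass, i.e., O(n_bins) work per pixel instead of a full rescan.

    img: 1-D integer array with values in [0, n_bins).
    """
    n = len(img)
    Q = np.zeros((n, n_bins))
    Q[np.arange(n), img] = 1.0                 # one-hot bin indicator
    left = np.zeros_like(Q)
    right = np.zeros_like(Q)
    left[0] = Q[0]
    for p in range(1, n):                      # left-to-right pass
        left[p] = Q[p] + alpha * left[p - 1]
    right[-1] = Q[-1]
    for p in range(n - 2, -1, -1):             # right-to-left pass
        right[p] = Q[p] + alpha * right[p + 1]
    return left + right - Q                    # pixel p counted once

hist = locality_sensitive_histogram(np.array([0, 1, 1, 0]), n_bins=2, alpha=0.5)
```

The exponential weighting is what makes the feature robust to smooth illumination changes while remaining cheap enough for real-time multi-region tracking.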