Search Results for author: Lin Ma

Found 101 papers, 50 papers with code

DiP: Learning Discriminative Implicit Parts for Person Re-Identification

1 code implementation 24 Dec 2022 Dengjie Li, Siyu Chen, Yujie Zhong, Lin Ma

In person re-identification (ReID) tasks, many works explore the learning of part features to improve the performance over global image features.

Person Re-Identification

Multiple Object Tracking Challenge Technical Report for Team MT_IoT

1 code implementation 7 Dec 2022 Feng Yan, Zhiheng Li, Weixin Luo, Zequn Jie, Fan Liang, Xiaolin Wei, Lin Ma

This is a brief technical report of our proposed method for Multiple-Object Tracking (MOT) Challenge in Complex Environments.

Ranked #2 on Multi-Object Tracking on DanceTrack (using extra training data)

Human Detection Multi-Object Tracking +1

AeDet: Azimuth-invariant Multi-view 3D Object Detection

1 code implementation 22 Nov 2022 Chengjian Feng, Zequn Jie, Yujie Zhong, Xiangxiang Chu, Lin Ma

However, the typical convolution ignores the radial symmetry of the BEV features and increases the difficulty of the detector optimization.

3D Object Detection Depth Estimation +2

HAM: Hierarchical Attention Model with High Performance for 3D Visual Grounding

no code implementations 22 Oct 2022 Jiaming Chen, Weixin Luo, Xiaolin Wei, Lin Ma, Wei Zhang

To simplify the pipeline, we carefully investigate 3D visual grounding and summarize three fundamental problems about how to develop an end-to-end model with high performance for this task.

Visual Grounding

Oscillatory cooperation prevalence emerges from misperception

no code implementations 17 Oct 2022 Jing Zhang, Zhao Li, Jiqiang Zhang, Lin Ma, Guozhong Zheng, Li Chen

Here we show that oscillatory behaviors naturally emerge if incomplete information is incorporated into the cooperation evolution of a non-Markov model.

Planning Assembly Sequence with Graph Transformer

1 code implementation 11 Oct 2022 Lin Ma, Jiangtao Gong, Hao Xu, Hao Chen, Hao Zhao, Wenbing Huang, Guyue Zhou

In this paper, we present a graph-transformer based framework for the ASP problem which is trained and demonstrated on a self-collected ASP database.

Contrastive Video-Language Learning with Fine-grained Frame Sampling

no code implementations 10 Oct 2022 Zixu Wang, Yujie Zhong, Yishu Miao, Lin Ma, Lucia Specia

However, even in paired video-text segments, only a subset of the frames is semantically relevant to the corresponding text, with the remainder representing noise, and the ratio of noisy frames is higher for longer videos.

Question Answering Representation Learning +3

Contextual Modeling for 3D Dense Captioning on Point Clouds

no code implementations 8 Oct 2022 Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma

With such global and local contextual modeling strategies, our proposed model can effectively characterize the object representations and contextual information and thereby generate comprehensive and detailed descriptions of the located objects.

3D dense captioning Dense Captioning

SoccerNet 2022 Challenges Results

7 code implementations 5 Oct 2022 Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei Zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

Action Spotting Camera Calibration +3

Reweighting Clicks with Dwell Time in Recommendation

no code implementations 19 Sep 2022 Ruobing Xie, Lin Ma, Shaoliang Zhang, Feng Xia, Leyu Lin

Precisely, we first define a new behavior named valid read, which helps to select high-quality click instances for different users and items via dwell time.

Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation

1 code implementation 16 Sep 2022 Jinlong Li, Zequn Jie, Xu Wang, Xiaolin Wei, Lin Ma

To tackle this issue, this paper proposes an Expansion and Shrinkage scheme based on the offset learning in the deformable convolution, to sequentially improve the recall and precision of the located object in the two respective stages.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

Weakly Supervised Semantic Segmentation via Progressive Patch Learning

1 code implementation 16 Sep 2022 Jinlong Li, Zequn Jie, Xu Wang, Yu Zhou, Xiaolin Wei, Lin Ma

"Progressive Patch Learning" further extends the feature destruction and patch learning to multi-level granularities in a progressive manner.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

1 code implementation 7 Sep 2022 Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang

Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporate 2D semantics via cross-modal interaction or fusion techniques.

3D Object Detection Autonomous Driving +1

MT-Net Submission to the Waymo 3D Detection Leaderboard

no code implementations 11 Jul 2022 Shaoxiang Chen, Zequn Jie, Xiaolin Wei, Lin Ma

In this technical report, we introduce our submission to the Waymo 3D Detection leaderboard.

3D Object Detection

Cycle-Interactive Generative Adversarial Network for Robust Unsupervised Low-Light Enhancement

no code implementations 3 Jul 2022 Zhangkai Ni, Wenhan Yang, Hanli Wang, Shiqi Wang, Lin Ma, Sam Kwong

Getting rid of the fundamental limitations in fitting to the paired training data, recent unsupervised low-light enhancement methods excel in adjusting illumination and contrast of images.

Low-Light Image Enhancement

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

2 code implementations 30 Mar 2022 Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.

Language Modelling

MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

1 code implementation 10 Mar 2022 Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

3D dense captioning is a recently-proposed novel task, where point clouds contain more geometric information than the 2D counterpart.

3D dense captioning Dense Captioning

A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach

no code implementations 10 Mar 2022 Xiaohan Lan, Yitian Yuan, Xin Wang, Long Chen, Zhi Wang, Lin Ma, Wenwu Zhu

New benchmarking results indicate that our proposed evaluation protocols can better monitor the research progress.

Benchmarking

Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding

no code implementations 10 Mar 2022 Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

However, exploring relationships among these suspected objects in the one-stage visual grounding paradigm is non-trivial due to two core problems: (1) no object proposals are available as the basis on which to select suspected objects and perform relationship modeling; (2) compared with those irrelevant to the text query, suspected objects are more confusing, as they may share similar semantics, be entangled with certain relationships, etc., and thereby more easily mislead the model's prediction.

Visual Grounding

Probabilistic fair behaviors spark its boost in the Ultimatum Game: the strength of good Samaritans

no code implementations 12 Feb 2022 Guozhong Zheng, Jiqiang Zhang, Rizhou Liang, Lin Ma, Li Chen

Behavioral experiments on the Ultimatum Game have shown that human beings have a remarkable preference for fair play, contradicting the predictions of game theory.

Fairness

Syntax Customized Video Captioning by Imitating Exemplar Sentences

1 code implementation 2 Dec 2021 Yitian Yuan, Lin Ma, Wenwu Zhu

Enhancing the diversity of sentences to describe video contents is an important problem arising in recent video captioning research.

Video Captioning

Controllable Video Captioning with an Exemplar Sentence

1 code implementation 2 Dec 2021 Yitian Yuan, Lin Ma, Jingwen Wang, Wenwu Zhu

In this paper, we investigate a novel and challenging task, namely controllable video captioning with an exemplar sentence.

Video Captioning

Sensoring and Application of Multimodal Data for the Detection of Freezing of Gait in Parkinson's Disease

no code implementations 9 Oct 2021 Wei Zhang, Debin Huang, Hantao Li, Lipeng Wang, Yanzhao Wei, Kang Pan, Lin Ma, Huanhuan Feng, Jing Pan, Yuzhu Guo

The accurate and reliable detection or prediction of freezing of gait (FOG) is important for fall prevention in Parkinson's Disease (PD) and for studying the physiological transitions during the occurrence of FOG.

Electroencephalogram (EEG)

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation

1 code implementation 9 Oct 2021 Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma

Referring Image Segmentation (RIS) aims at segmenting the target object from an image referred by one given natural language expression.

Image Segmentation Retrieval +1

Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection

1 code implementation 4 Aug 2021 Chen Zhang, Runmin Cong, Qinwei Lin, Lin Ma, Feng Li, Yao Zhao, Sam Kwong

For the cross-modality interaction in feature encoder, existing methods either indiscriminately treat RGB and depth modalities, or only habitually utilize depth cues as auxiliary information of the RGB branch.

object-detection RGB-D Salient Object Detection +1

Discriminative-Generative Representation Learning for One-Class Anomaly Detection

no code implementations 27 Jul 2021 Xuan Xia, Xizhou Pan, Xing He, Jingfei Zhang, Ning Ding, Lin Ma

As a kind of generative self-supervised learning method, generative adversarial nets have been widely studied in the field of anomaly detection.

Anomaly Detection Representation Learning +1

Sentence-level Online Handwritten Chinese Character Recognition

no code implementations 4 Jul 2021 Yunxin Li, Qian Yang, Qingcai Chen, Lin Ma, Baotian Hu, Xiaolong Wang, Yuxin Ding

Single online handwritten Chinese character recognition (single OLHCCR) has achieved prominent performance.

Word Embeddings

GlyphCRM: Bidirectional Encoder Representation for Chinese Character with its Glyph

no code implementations 1 Jul 2021 Yunxin Li, Yu Zhao, Baotian Hu, Qingcai Chen, Yang Xiang, Xiaolong Wang, Yuxin Ding, Lin Ma

Previous works indicate that the glyph of Chinese characters contains rich semantic information and has the potential to enhance the representation of Chinese characters.

Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior

no code implementations 9 May 2021 Kaihao Zhang, Wenhan Luo, Yanjiang Yu, Wenqi Ren, Fang Zhao, Changsheng Li, Lin Ma, Wei Liu, Hongdong Li

We first use a coarse deraining network to reduce the rain streaks on the input images, and then adopt a pre-trained semantic segmentation network to extract semantic features from the coarse derained image.

Benchmarking Rain Removal +1

Relation-aware Instance Refinement for Weakly Supervised Visual Grounding

1 code implementation CVPR 2021 Yongfei Liu, Bo Wan, Lin Ma, Xuming He

Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding.

Scene Understanding Visual Grounding +1

Take More Positives: An Empirical Study of Contrastive Learing in Unsupervised Person Re-Identification

no code implementations 12 Jan 2021 Xuanyu He, Wei Zhang, Ran Song, Qian Zhang, Xiangyuan Lan, Lin Ma

By studying two unsupervised person re-ID methods in a cross-method way, we point out that a hard negative problem is handled implicitly by their designs of data augmentations and PK sampler, respectively.

Contrastive Learning Unsupervised Person Re-Identification

Similarity Reasoning and Filtration for Image-Text Matching

1 code implementation 5 Jan 2021 Haiwen Diao, Ying Zhang, Lin Ma, Huchuan Lu

Image-text matching plays a critical role in bridging the vision and language, and great progress has been made by exploiting the global alignment between image and sentence, or local alignments between regions and words.

Cross-Modal Retrieval Image Retrieval +1

Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network

1 code implementation 30 Dec 2020 Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, Sam Kwong

In this paper, we present an unsupervised image enhancement generative adversarial network (UEGAN), which learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner, rather than learning on a large number of paired images.

Image Enhancement L2 Regularization

Unpaired Image Enhancement with Quality-Attention Generative Adversarial Network

no code implementations 30 Dec 2020 Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, Sam Kwong

The key novelty of the proposed QAGAN lies in the injected QAM for the generator such that it learns domain-relevant quality attention directly from the two domains.

Image Enhancement

Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis

3 code implementations 18 Nov 2020 Wen Liu, Zhixin Piao, Zhi Tu, Wenhan Luo, Lin Ma, Shenghua Gao

Also, we build a new dataset, namely iPER dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis.

Denoising Image Generation +1

Intrinsic Relationship Reasoning for Small Object Detection

no code implementations 2 Sep 2020 Kui Fu, Jia Li, Lin Ma, Kai Mu, Yonghong Tian

In this paper, we propose a novel context reasoning approach for small object detection which models and infers the intrinsic semantic and spatial layout relationships between objects.

object-detection Small Object Detection

Consensus-Aware Visual-Semantic Embedding for Image-Text Matching

1 code implementation ECCV 2020 Haoran Wang, Ying Zhang, Zhong Ji, Yanwei Pang, Lin Ma

In this paper, we propose a Consensus-aware Visual-Semantic Embedding (CVSE) model to incorporate the consensus information, namely the commonsense knowledge shared between both modalities, into image-text matching.

Image Captioning Retrieval +2

Deblurring by Realistic Blurring

1 code implementation CVPR 2020 Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Bjorn Stenger, Wei Liu, Hongdong Li

To address this problem, we propose a new method which combines two GAN models, i.e., a learning-to-Blur GAN (BGAN) and learning-to-DeBlur GAN (DBGAN), in order to learn a better model for image deblurring by primarily learning how to blur images.

Deblurring Image Deblurring

Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos

no code implementations 16 Mar 2020 Yijun Song, Jingwen Wang, Lin Ma, Zhou Yu, Jun Yu

The task of temporally grounding textual queries in videos is to localize one video segment that semantically corresponds to the given query.

Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video

no code implementations 25 Jan 2020 Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, Kwan-Yee K. Wong

In this paper, we study the problem of weakly-supervised temporal grounding of sentence in video.

Fine-grained Image-to-Image Transformation towards Visual Recognition

no code implementations CVPR 2020 Wei Xiong, Yutong He, Yixuan Zhang, Wenhan Luo, Lin Ma, Jiebo Luo

In this paper, we aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image, which can thereby benefit the subsequent fine-grained image recognition and few-shot learning tasks.

Few-Shot Learning Fine-Grained Image Recognition

Coupled Network for Robust Pedestrian Detection with Gated Multi-Layer Feature Extraction and Deformable Occlusion Handling

no code implementations 18 Dec 2019 Tianrui Liu, Wenhan Luo, Lin Ma, Jun-Jie Huang, Tania Stathaki, Tianhong Dai

Ablation studies have validated the effectiveness of both the proposed gated multi-layer feature extraction sub-network and the deformable occlusion handling sub-network.

Occlusion Handling Pedestrian Detection

LaFIn: Generative Landmark Guided Face Inpainting

1 code implementation 26 Nov 2019 Yang Yang, Xiaojie Guo, Jiayi Ma, Lin Ma, Haibin Ling

It is challenging to inpaint face images in the wild, due to the large variation of appearance, such as different poses, expressions and occlusions.

Facial Inpainting

EDIT: Exemplar-Domain Aware Image-to-Image Translation

1 code implementation 24 Nov 2019 Yuanbin Fu, Jiayi Ma, Lin Ma, Xiaojie Guo

The underlying principle is that, for images from multiple domains, the content features can be obtained by a uniform extractor, while (re-)stylization is achieved by mapping the extracted features specifically to different purposes (domains and exemplars).

Image-to-Image Translation Translation

Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations

1 code implementation NeurIPS 2019 Xu Wang, Jingming He, Lin Ma

In this paper, we propose one novel model for point cloud semantic segmentation, which exploits both the local and global structures within the point cloud based on the contextual point representations.

Graph Attention Semantic Segmentation

Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

1 code implementation NeurIPS 2019 Yitian Yuan, Lin Ma, Jingwen Wang, Wei Liu, Wenwu Zhu

Temporal sentence grounding in videos aims to detect and localize one target video segment, which semantically corresponds to a given sentence.

Context-Gated Convolution

1 code implementation ECCV 2020 Xudong Lin, Lin Ma, Wei Liu, Shih-Fu Chang

As such, being aware of the global context, the modulated convolution kernel of our proposed CGC can better extract representative local patterns and compose discriminative features.

Ranked #59 on Image Classification on ObjectNet (using extra training data)

Action Recognition Image Classification +1

Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis

2 code implementations ICCV 2019 Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, Shenghua Gao

In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose and shape, which can not only model the joint location and rotation but also characterize the personalized body shape.

Denoising Novel View Synthesis

Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction

1 code implementation 11 Sep 2019 Jingwen Wang, Lin Ma, Wenhao Jiang

The task of temporally grounding language queries in videos is to temporally localize the best matched video segment corresponding to a given language (sentence).

Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network

1 code implementation ICCV 2019 Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Jingwen Wang, Wei Liu

In this paper, we propose to guide the video caption generation with Part-of-Speech (POS) information, based on a gated fusion of multiple representations of input videos.

POS Video Captioning

Sentence Specified Dynamic Video Thumbnail Generation

1 code implementation 12 Aug 2019 Yitian Yuan, Lin Ma, Wenwu Zhu

With the tremendous growth of videos over the Internet, video thumbnails, providing video content previews, are becoming increasingly crucial to influencing users' online searching experiences.

Position Focused Attention Network for Image-Text Matching

1 code implementation 23 Jul 2019 Yaxiong Wang, Hao Yang, Xueming Qian, Lin Ma, Jing Lu, Biao Li, Xin Fan

Then, an attention mechanism is proposed to model the relations between the image region and blocks and generate the valuable position feature, which will be further utilized to enhance the region expression and model a more reliable relationship between the visual image and the textual sentence.

Text Matching

Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning

no code implementations 3 Jun 2019 Wei Zhang, Bairui Wang, Lin Ma, Wei Liu

Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a reconstruction network (RecNet) in a novel encoder-decoder-reconstructor architecture, which leverages both forward (video to sentence) and backward (sentence to video) flows for video captioning.

reinforcement-learning Reinforcement Learning (RL) +1

Hallucinating Optical Flow Features for Video Classification

1 code implementation 28 May 2019 Yongyi Tang, Lin Ma, Lianqiang Zhou

However, extracting motion information, specifically in the form of optical flow features, is extremely computationally expensive, especially for large-scale video classification.

Classification General Classification +2

Spatio-temporal Video Re-localization by Warp LSTM

no code implementations CVPR 2019 Yang Feng, Lin Ma, Wei Liu, Jiebo Luo

The need for efficiently finding the video content a user wants is increasing because of the explosion of user-generated videos on the Web.

Retrieval Video Retrieval

PFLD: A Practical Facial Landmark Detector

18 code implementations 28 Feb 2019 Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling

Being accurate, efficient, and compact is essential to a facial landmark detector for practical use.

Face Alignment Facial Landmark Detection

Hierarchical Photo-Scene Encoder for Album Storytelling

no code implementations 2 Feb 2019 Bairui Wang, Lin Ma, Wei Zhang, Wenhao Jiang, Feng Zhang

In this paper, we propose a novel model with a hierarchical photo-scene encoder and a reconstructor for the task of album storytelling.

Visual Storytelling

Real-Time Referring Expression Comprehension by Single-Stage Grounding Network

no code implementations 9 Dec 2018 Xinpeng Chen, Lin Ma, Jingyuan Chen, Zequn Jie, Wei Liu, Jiebo Luo

Experiments on RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate that our proposed SSG without relying on any region proposals can achieve comparable performance with other advanced models.

Referring Expression Referring Expression Comprehension

Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation

no code implementations NeurIPS 2018 Wenqi Ren, Jiawei Zhang, Lin Ma, Jinshan Pan, Xiaochun Cao, WangMeng Zuo, Wei Liu, Ming-Hsuan Yang

In this paper, we present a deep convolutional neural network to capture the inherent properties of image degradation, which can handle different kernels and saturated pixels in a unified framework.

Deblurring

Multi-granularity Generator for Temporal Action Proposal

no code implementations CVPR 2019 Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang

In this paper, we propose a multi-granularity generator (MGG) to perform the temporal action proposal from different granularity perspectives, relying on the video visual features equipped with the position embedding information.

Action Recognition Temporal Action Proposal Generation

Unsupervised Image Captioning

1 code implementation CVPR 2019 Yang Feng, Lin Ma, Wei Liu, Jiebo Luo

Instead of relying on manually labeled image-sentence pairs, our proposed model merely requires an image set, a sentence corpus, and an existing visual concept detector.

Image Captioning

Temporally Grounding Natural Sentence in Video

no code implementations EMNLP 2018 Jingyuan Chen, Xinpeng Chen, Lin Ma, Zequn Jie, Tat-Seng Chua

We introduce an effective and efficient method that grounds (i.e., localizes) natural sentences in long, untrimmed video sequences.

Video Captioning

Non-local NetVLAD Encoding for Video Classification

no code implementations 29 Sep 2018 Yongyi Tang, Xing Zhang, Jingwen Wang, Shaoxiang Chen, Lin Ma, Yu-Gang Jiang

This paper describes our solution for the 2nd YouTube-8M video understanding challenge organized by Google AI.

Classification General Classification +3

Baidu Apollo Auto-Calibration System - An Industry-Level Data-Driven and Learning based Vehicle Longitude Dynamic Calibrating Algorithm

1 code implementation 30 Aug 2018 Fan Zhu, Lin Ma, Xin Xu, Dingfeng Guo, Xiao Cui, Qi Kong

Since manual calibration is not sustainable once entering into mass production stage for industrial purposes, we here introduce a machine-learning based auto-calibration system for autonomous driving vehicles.

Autonomous Driving BIG-bench Machine Learning

Video Re-localization

1 code implementation ECCV 2018 Yang Feng, Lin Ma, Wei Liu, Tong Zhang, Jiebo Luo

We first exploit and reorganize the videos in ActivityNet to form a new dataset for video re-localization research, which consists of about 10,000 videos of diverse visual appearances associated with localized boundary information.

Copy Detection

Recurrent Fusion Network for Image Captioning

no code implementations ECCV 2018 Wenhao Jiang, Lin Ma, Yu-Gang Jiang, Wei Liu, Tong Zhang

In this paper, in order to exploit the complementary information from multiple encoders, we propose a novel Recurrent Fusion Network (RFNet) for tackling image captioning.

Image Captioning

Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks

no code implementations ECCV 2018 Minjun Li, Hao-Zhi Huang, Lin Ma, Wei Liu, Tong Zhang, Yu-Gang Jiang

Recent studies on unsupervised image-to-image translation have made remarkable progress by training a pair of generative adversarial networks with a cycle-consistent loss.

Translation Unsupervised Image-To-Image Translation

Safe Element Screening for Submodular Function Minimization

no code implementations ICML 2018 Weizhong Zhang, Bin Hong, Lin Ma, Wei Liu, Tong Zhang

Relying on this study, we subsequently propose a novel safe screening method to quickly identify the elements guaranteed to be included (we refer to them as active) or excluded (inactive) in the final optimal solution of SFM during the optimization process.

Combinatorial Optimization Sparse Learning

Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamic

no code implementations 7 May 2018 Yongyi Tang, Lin Ma, Wei Liu, Wei-Shi Zheng

Human motion prediction aims at generating future frames of human motion based on an observed sequence of skeletons.

Human motion prediction motion prediction

Learning to Guide Decoding for Image Captioning

no code implementations 3 Apr 2018 Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu

Recently, much progress has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task.

Image Captioning

Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

1 code implementation CVPR 2018 Jingwen Wang, Wenhao Jiang, Lin Ma, Wei Liu, Yong Xu

We propose a bidirectional proposal method that effectively exploits both past and future contexts to make proposal predictions.

Dense Video Captioning

Reconstruction Network for Video Captioning

3 code implementations CVPR 2018 Bairui Wang, Lin Ma, Wei Zhang, Wei Liu

Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture, which leverages both the forward (video to sentence) and backward (sentence to video) flows for video captioning.

Video Captioning

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present

1 code implementation CVPR 2018 Xinpeng Chen, Lin Ma, Wenhao Jiang, Jian Yao, Wei Liu

Recently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning, code captioning, and so on.

Image Captioning

Adversarial Spatio-Temporal Learning for Video Deblurring

1 code implementation 28 Mar 2018 Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Wei Liu, Hongdong Li

To tackle the second challenge, we leverage the developed DBLRNet as a generator in the GAN (generative adversarial network) architecture, and employ a content loss in addition to an adversarial loss for efficient adversarial training.

Deblurring

Neural Stereoscopic Image Style Transfer

no code implementations ECCV 2018 Xinyu Gong, HaoZhi Huang, Lin Ma, Fumin Shen, Wei Liu, Tong Zhang

While each view of the stereoscopic pair is processed in an individual path, a novel feature aggregation strategy is proposed to effectively share information between the two paths.

Style Transfer

Normalized Direction-preserving Adam

1 code implementation ICLR 2018 Zijun Zhang, Lin Ma, Zongpeng Li, Chuan Wu

Adaptive optimization algorithms, such as Adam and RMSprop, have shown better optimization performance than stochastic gradient descent (SGD) in some scenarios.

General Classification

Real-Time Neural Style Transfer for Videos

no code implementations CVPR 2017 Hao-Zhi Huang, Hao Wang, Wenhan Luo, Lin Ma, Wenhao Jiang, Xiaolong Zhu, Zhifeng Li, Wei Liu

More specifically, a hybrid loss is proposed to capitalize on the content information of input frames, the style information of a given style image, and the temporal information of consecutive frames.

Style Transfer Video Style Transfer

Adaptive Neighboring Selection Algorithm Based on Curvature Prediction in Manifold Learning

no code implementations 13 Apr 2017 Lin Ma, Caifa Zhou, Xi Liu, Yubin Xu

By verifying the proposed algorithm on embedding the Swiss roll from R3 to R2 based on the LLE and ISOMAP algorithms, the simulation results show that the proposed adaptive neighboring selection algorithm is feasible and able to find the optimal value of K, yielding a relatively small residual variance and better visualization of the results.

Dimensionality Reduction

Joint Semi-supervised RSS Dimensionality Reduction and Fingerprint Based Algorithm for Indoor Localization

no code implementations 12 Apr 2017 Caifa Zhou, Lin Ma, Xuezhi Tan

Another significant innovation of this paper is combining the fingerprint-based algorithm with the CM-SDE algorithm to improve the accuracy of indoor localization.

Dimensionality Reduction Indoor Localization

Local Subspace Collaborative Tracking

no code implementations ICCV 2015 Lin Ma, Xiaoqin Zhang, Weiming Hu, Junliang Xing, Jiwen Lu, Jie Zhou

To address this, this paper presents a local subspace collaborative tracking method for robust visual tracking, where multiple linear and nonlinear subspaces are learned to better model the nonlinear relationship of object appearances.

Object Tracking Visual Tracking

Multiple Feature Fusion via Weighted Entropy for Visual Tracking

no code implementations ICCV 2015 Lin Ma, Jiwen Lu, Jianjiang Feng, Jie Zhou

It is desirable to combine multiple feature descriptors to improve the visual tracking performance because different features can provide complementary information to describe objects of interest.

Visual Object Tracking Visual Tracking

Learning to Answer Questions From Image Using Convolutional Neural Network

no code implementations 1 Jun 2015 Lin Ma, Zhengdong Lu, Hang Li

We demonstrate the efficacy of our proposed model on the DAQUAR and COCO-QA datasets, which are two benchmark datasets for image QA, significantly outperforming the state of the art.

General Classification Question Answering +1

Multimodal Convolutional Neural Networks for Matching Image and Sentence

2 code implementations ICCV 2015 Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li

In this paper, we propose multimodal convolutional neural networks (m-CNNs) for matching image and sentence.

Retrieval
