Search Results for author: Lin Ma

Found 135 papers, 68 papers with code

Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models

no code implementations12 Jun 2024 Shimin Chen, Yitian Yuan, Shaoxiang Chen, Zequn Jie, Lin Ma

Amidst the advancements in image-based Large Vision-Language Models (image-LVLM), the transition to video-based models (video-LVLM) is hindered by the limited availability of quality video data.

Video Understanding

AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning

1 code implementation CVPR 2024 Duojun Huang, Xinyu Xiong, Jie Ma, Jichang Li, Zequn Jie, Lin Ma, Guanbin Li

In this paper, we propose a novel framework, termed AlignSAM, designed for automatic prompting for aligning SAM to an open context through reinforcement learning.

reinforcement-learning Segmentation

TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing

no code implementations27 May 2024 Xinyu Zhang, Mengxue Kang, Fei Wei, Shuang Xu, Yuhe Liu, Lin Ma

By providing the diffusion models with knowledge of the generated prompt and image mask, our models generate images with a superior understanding of instructions.

Image Generation Text-based Image Editing

Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs

no code implementations23 May 2024 Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Yifan Lu, Yerui Sun, Lin Ma, Yuchen Xie

We introduce Integer Scale, a novel post-training quantization scheme for large language models that effectively resolves the inference bottleneck in current fine-grained quantization approaches while maintaining similar accuracies.

Quantization

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

1 code implementation18 May 2024 Yunxin Li, Shenyuan Jiang, Baotian Hu, Longyue Wang, Wanqi Zhong, Wenhan Luo, Lin Ma, Min Zhang

Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities.

Visual Question Answering

Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

no code implementations9 May 2024 Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, Jin-Gang Yu, Gui-Song Xia, Jiayi Ma

We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task.

Auxiliary Learning Neural Architecture Search

Matten: Video Generation with Mamba-Attention

no code implementations5 May 2024 Yu Gao, Jiancheng Huang, Xiaopeng Sun, Zequn Jie, Yujie Zhong, Lin Ma

In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation.

Video Generation

LaSagnA: Language-based Segmentation Assistant for Complex Queries

1 code implementation12 Apr 2024 Cong Wei, Haoxian Tan, Yujie Zhong, Yujiu Yang, Lin Ma

Recent advancements have empowered Large Language Models for Vision (vLLMs) to generate detailed perceptual outcomes, including bounding boxes and masks.

Segmentation Semantic Segmentation

UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection

1 code implementation7 Apr 2024 Yingsen Zeng, Yujie Zhong, Chengjian Feng, Lin Ma

Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while Moment Retrieval (MR) aims to identify the events described by open-ended natural language within untrimmed videos.

Action Detection Moment Queries +4

Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models

1 code implementation12 Mar 2024 Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

To address this issue, we propose a novel LMM architecture named Lumen, a Large multimodal model with versatile vision-centric capability enhancement.

Concept Alignment Instruction Following +2

Misalignment-Robust Frequency Distribution Loss for Image Transformation

1 code implementation CVPR 2024 Zhangkai Ni, Juncheng Wu, Zian Wang, Wenhan Yang, Hanli Wang, Lin Ma

This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution, which heavily rely on precisely aligned paired datasets with pixel-level alignments.

Image Enhancement Style Transfer +1

A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

1 code implementation21 Feb 2024 Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang

For this setting, previous methods utilize visual and textual encoders to encode the image and keywords and employ a language model-based decoder to generate the product description.

In-Context Learning Language Modelling +2

Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs

no code implementations19 Feb 2024 Naihao Deng, Zhenjie Sun, Ruiqi He, Aman Sikka, Yulong Chen, Lin Ma, Yue Zhang, Rada Mihalcea

In this paper, we investigate the effectiveness of various LLMs in interpreting tabular data through different prompting strategies and data formats.

Fact Checking Question Answering

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

no code implementations CVPR 2024 Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma

The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector.

Object object-detection +1

LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs

no code implementations29 Jan 2024 Shaoxiang Chen, Zequn Jie, Lin Ma

To address this issue, we propose to apply an efficient Mixture of Experts (MoE) design, which is a sparse Mixture of LoRA Experts (MoLE) for instruction finetuning MLLMs.

Language Modelling Large Language Model

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

2 code implementations19 Jan 2024 Chenyu Wang, Weixin Luo, Qianyu Chen, Haonan Mai, Jindi Guo, Sixun Dong, Xiaohua, Xuan, Zhengxin Li, Lin Ma, Shenghua Gao

Recently, the astonishing performance of large language models (LLMs) in natural language comprehension and generation tasks triggered lots of exploration of using them as central controllers to build agent systems.

Language Modelling Large Language Model

Weakly-Supervised 3D Visual Grounding based on Visual Linguistic Alignment

no code implementations15 Dec 2023 Xiaoxu Xu, Yitian Yuan, Qiudan Zhang, Wenhui Wu, Zequn Jie, Lin Ma, Xu Wang

During the inference stage, the learned text-3D correspondence will help us ground the text queries to the 3D target objects even without 2D images.

3D visual grounding Natural Language Queries +1

ColNeRF: Collaboration for Generalizable Sparse Input Neural Radiance Field

1 code implementation14 Dec 2023 Zhangkai Ni, Peiqi Yang, Wenhan Yang, Hanli Wang, Lin Ma, Sam Kwong

Through this, we construct a novel collaborative module that aligns information from various views and meanwhile imposes self-supervised constraints to ensure multi-view consistency in both geometry and appearance.

Novel View Synthesis

EKGNet: A 10.96μW Fully Analog Neural Network for Intra-Patient Arrhythmia Classification

no code implementations24 Oct 2023 Benyamin Haghi, Lin Ma, Sahin Lale, Anima Anandkumar, Azita Emami

We present an integrated approach by combining analog computing and deep learning for electrocardiogram (ECG) arrhythmia classification.

Classification

DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation

no code implementations19 Oct 2023 Bangbang Yang, Wenqi Dong, Lin Ma, WenBo Hu, Xiao Liu, Zhaopeng Cui, Yuewen Ma

To ensure meaningful and aligned textures to the scene, we develop a novel coarse-to-fine panoramic texture generation approach with dual texture alignment, which both considers the geometry and texture cues of the captured scenes.

Texture Synthesis

SoccerNet 2023 Challenges Results

2 code implementations12 Sep 2023 Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng

More information on the tasks, challenges, and leaderboards are available on https://www. soccer-net. org.

Action Spotting Camera Calibration +4

Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields

no code implementations ICCV 2023 WenBo Hu, Yuling Wang, Lin Ma, Bangbang Yang, Lin Gao, Xiao Liu, Yuewen Ma

Despite the tremendous progress in neural radiance fields (NeRF), we still face a dilemma of the trade-off between quality and efficiency, e. g., MipNeRF presents fine-detailed and anti-aliased renderings but takes days for training, while Instant-ngp can accomplish the reconstruction in a few minutes but suffers from blurring or aliasing when rendering at various distances or resolutions due to ignoring the sampling area.

PaVa: a novel Path-based Valley-seeking clustering algorithm

no code implementations13 Jun 2023 Lin Ma, Conan Liu, Tiefeng Ma, Shuangzhe Liu

First, the clusters are wrapped in spherical shells after the distance transformation, making the extraction process efficient even with clusters of arbitrary shape.

Clustering

E2E-LOAD: End-to-End Long-form Online Action Detection

1 code implementation ICCV 2023 Shuqiang Cao, Weixin Luo, Bairui Wang, Wei zhang, Lin Ma

Furthermore, we propose a novel and efficient inference mechanism that accelerates heavy spatial-temporal exploration.

Online Action Detection

Bridging the Gap Between End-to-end and Non-End-to-end Multi-Object Tracking

2 code implementations22 May 2023 Feng Yan, Weixin Luo, Yujie Zhong, Yiyang Gan, Lin Ma

Existing end-to-end Multi-Object Tracking (e2e-MOT) methods have not surpassed non-end-to-end tracking-by-detection methods.

Multi-Object Tracking Video Object Tracking

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

1 code implementation8 May 2023 Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang

This makes the language model well-suitable for such multi-modal reasoning scenario on joint textual and visual clues.

Language Modelling

LMEye: An Interactive Perception Network for Large Language Models

1 code implementation5 May 2023 Yunxin Li, Baotian Hu, Xinyu Chen, Lin Ma, Yong Xu, Min Zhang

LMEye addresses this issue by allowing the LLM to request the desired visual information aligned with various human instructions, which we term as the dynamic visual information interaction.

Language Modelling Large Language Model +1

A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text

1 code implementation3 May 2023 Yunxin Li, Baotian Hu, Yuxin Ding, Lin Ma, Min Zhang

Inspired by the Divide-and-Conquer algorithm and dual-process theory, in this paper, we regard linguistically complex texts as compound proposition texts composed of multiple simple proposition sentences and propose an end-to-end Neural Divide-and-Conquer Reasoning framework, dubbed NDCR.

Image Retrieval Logical Reasoning +1

Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network

1 code implementation ICCV 2023 Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma

Recently, the open-vocabulary semantic segmentation problem has attracted increasing attention and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pretrained visual-language model.

Classification Language Modelling +4

Adaptive Sparse Pairwise Loss for Object Re-Identification

1 code implementation CVPR 2023 Xiao Zhou, Yujie Zhong, Zhen Cheng, Fan Liang, Lin Ma

To address this problem, we propose a novel loss paradigm termed Sparse Pairwise (SP) loss that only leverages few appropriate pairs for each class in a mini-batch, and empirically demonstrate that it is sufficient for the ReID tasks.

Object

DiP: Learning Discriminative Implicit Parts for Person Re-Identification

1 code implementation24 Dec 2022 Dengjie Li, Siyu Chen, Yujie Zhong, Lin Ma

In person re-identification (ReID) tasks, many works explore the learning of part features to improve the performance over global image features.

Person Re-Identification Position

Multiple Object Tracking Challenge Technical Report for Team MT_IoT

1 code implementation7 Dec 2022 Feng Yan, Zhiheng Li, Weixin Luo, Zequn Jie, Fan Liang, Xiaolin Wei, Lin Ma

This is a brief technical report of our proposed method for Multiple-Object Tracking (MOT) Challenge in Complex Environments.

Ranked #8 on Multi-Object Tracking on DanceTrack (using extra training data)

Human Detection Multi-Object Tracking +2

AeDet: Azimuth-invariant Multi-view 3D Object Detection

1 code implementation CVPR 2023 Chengjian Feng, Zequn Jie, Yujie Zhong, Xiangxiang Chu, Lin Ma

However, the typical convolution ignores the radial symmetry of the BEV features and increases the difficulty of the detector optimization.

3D Object Detection Depth Estimation +3

Learning Point-Language Hierarchical Alignment for 3D Visual Grounding

1 code implementation22 Oct 2022 Jiaming Chen, Weixin Luo, Ran Song, Xiaolin Wei, Lin Ma, Wei zhang

This paper presents a novel hierarchical alignment model (HAM) that learns multi-granularity visual and linguistic representations in an end-to-end manner.

3D visual grounding Sentence +1

Oscillatory cooperation prevalence emerges from misperception

no code implementations17 Oct 2022 Jing Zhang, Zhao Li, Jiqiang Zhang, Lin Ma, Guozhong Zheng, Li Chen

Here we show that oscillatory behaviors naturally emerge if incomplete information is incorporated into the cooperation evolution of a non-Markov model.

Planning Assembly Sequence with Graph Transformer

1 code implementation11 Oct 2022 Lin Ma, Jiangtao Gong, Hao Xu, Hao Chen, Hao Zhao, Wenbing Huang, Guyue Zhou

In this paper, we present a graph-transformer based framework for the ASP problem which is trained and demonstrated on a self-collected ASP database.

Contrastive Video-Language Learning with Fine-grained Frame Sampling

no code implementations10 Oct 2022 Zixu Wang, Yujie Zhong, Yishu Miao, Lin Ma, Lucia Specia

However, even in paired video-text segments, only a subset of the frames are semantically relevant to the corresponding text, with the remainder representing noise; where the ratio of noisy frames is higher for longer videos.

Question Answering Representation Learning +3

Contextual Modeling for 3D Dense Captioning on Point Clouds

no code implementations8 Oct 2022 Yufeng Zhong, Long Xu, Jiebo Luo, Lin Ma

With such global and local contextual modeling strategies, our proposed model can effectively characterize the object representations and contextual information and thereby generate comprehensive and detailed descriptions of the located objects.

3D dense captioning Dense Captioning +2

SoccerNet 2022 Challenges Results

7 code implementations5 Oct 2022 Silvio Giancola, Anthony Cioppa, Adrien Deliège, Floriane Magera, Vladimir Somers, Le Kang, Xin Zhou, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdulrahman Darwish, Adrien Maglo, Albert Clapés, Andreas Luyts, Andrei Boiarov, Artur Xarles, Astrid Orcesi, Avijit Shah, Baoyu Fan, Bharath Comandur, Chen Chen, Chen Zhang, Chen Zhao, Chengzhi Lin, Cheuk-Yiu Chan, Chun Chuen Hui, Dengjie Li, Fan Yang, Fan Liang, Fang Da, Feng Yan, Fufu Yu, Guanshuo Wang, H. Anthony Chan, He Zhu, Hongwei Kan, Jiaming Chu, Jianming Hu, Jianyang Gu, Jin Chen, João V. B. Soares, Jonas Theiner, Jorge De Corte, José Henrique Brito, Jun Zhang, Junjie Li, Junwei Liang, Leqi Shen, Lin Ma, Lingchi Chen, Miguel Santos Marques, Mike Azatov, Nikita Kasatkin, Ning Wang, Qiong Jia, Quoc Cuong Pham, Ralph Ewerth, Ran Song, RenGang Li, Rikke Gade, Ruben Debien, Runze Zhang, Sangrok Lee, Sergio Escalera, Shan Jiang, Shigeyuki Odashima, Shimin Chen, Shoichi Masui, Shouhong Ding, Sin-wai Chan, Siyu Chen, Tallal El-Shabrawy, Tao He, Thomas B. Moeslund, Wan-Chi Siu, Wei zhang, Wei Li, Xiangwei Wang, Xiao Tan, Xiaochuan Li, Xiaolin Wei, Xiaoqing Ye, Xing Liu, Xinying Wang, Yandong Guo, YaQian Zhao, Yi Yu, YingYing Li, Yue He, Yujie Zhong, Zhenhua Guo, Zhiheng Li

The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team.

Action Spotting Camera Calibration +3

Reweighting Clicks with Dwell Time in Recommendation

no code implementations19 Sep 2022 Ruobing Xie, Lin Ma, Shaoliang Zhang, Feng Xia, Leyu Lin

Precisely, we first define a new behavior named valid read, which helps to select high-quality click instances for different users and items via dwell time.

valid

Expansion and Shrinkage of Localization for Weakly-Supervised Semantic Segmentation

1 code implementation16 Sep 2022 Jinlong Li, Zequn Jie, Xu Wang, Xiaolin Wei, Lin Ma

To tackle with this issue, this paper proposes an Expansion and Shrinkage scheme based on the offset learning in the deformable convolution, to sequentially improve the recall and precision of the located object in the two respective stages.

Object Weakly supervised Semantic Segmentation +1

Weakly Supervised Semantic Segmentation via Progressive Patch Learning

1 code implementation16 Sep 2022 Jinlong Li, Zequn Jie, Xu Wang, Yu Zhou, Xiaolin Wei, Lin Ma

"Progressive Patch Learning" further extends the feature destruction and patch learning to multi-level granularities in a progressive manner.

Weakly supervised Semantic Segmentation Weakly-Supervised Semantic Segmentation

MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

1 code implementation CVPR 2023 Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang

Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporate 2D semantics via cross-modal interaction or fusion techniques.

3D Object Detection Autonomous Driving +1

MT-Net Submission to the Waymo 3D Detection Leaderboard

no code implementations11 Jul 2022 Shaoxiang Chen, Zequn Jie, Xiaolin Wei, Lin Ma

In this technical report, we introduce our submission to the Waymo 3D Detection leaderboard.

3D Object Detection

Cycle-Interactive Generative Adversarial Network for Robust Unsupervised Low-Light Enhancement

no code implementations3 Jul 2022 Zhangkai Ni, Wenhan Yang, Hanli Wang, Shiqi Wang, Lin Ma, Sam Kwong

Getting rid of the fundamental limitations in fitting to the paired training data, recent unsupervised low-light enhancement methods excel in adjusting illumination and contrast of images.

Generative Adversarial Network Low-Light Image Enhancement

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

2 code implementations30 Mar 2022 Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.

Language Modelling Object

MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes

1 code implementation10 Mar 2022 Yang Jiao, Shaoxiang Chen, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

3D dense captioning is a recently-proposed novel task, where point clouds contain more geometric information than the 2D counterpart.

3D dense captioning Dense Captioning +3

Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding

no code implementations10 Mar 2022 Yang Jiao, Zequn Jie, Jingjing Chen, Lin Ma, Yu-Gang Jiang

Recently, one-stage visual grounders attract high attention due to their comparable accuracy but significantly higher efficiency than two-stage grounders.

Object Visual Grounding

Probabilistic fair behaviors spark its boost in the Ultimatum Game: the strength of good Samaritans

no code implementations12 Feb 2022 Guozhong Zheng, Jiqiang Zhang, Rizhou Liang, Lin Ma, Li Chen

Behavioral experiments on the Ultimatum Game have shown that we human beings have remarkable preference in fair play, contradicting the predictions by the game theory.

Fairness

Syntax Customized Video Captioning by Imitating Exemplar Sentences

1 code implementation2 Dec 2021 Yitian Yuan, Lin Ma, Wenwu Zhu

Enhancing the diversity of sentences to describe video contents is an important problem arising in recent video captioning research.

Decoder Sentence +2

Controllable Video Captioning with an Exemplar Sentence

1 code implementation2 Dec 2021 Yitian Yuan, Lin Ma, Jingwen Wang, Wenwu Zhu

In this paper, we investigate a novel and challenging task, namely controllable video captioning with an exemplar sentence.

Caption Generation Decoder +3

Two-stage Visual Cues Enhancement Network for Referring Image Segmentation

1 code implementation9 Oct 2021 Yang Jiao, Zequn Jie, Weixin Luo, Jingjing Chen, Yu-Gang Jiang, Xiaolin Wei, Lin Ma

Referring Image Segmentation (RIS) aims at segmenting the target object from an image referred by one given natural language expression.

Image Segmentation Retrieval +2

Sensoring and Application of Multimodal Data for the Detection of Freezing of Gait in Parkinson's Disease

no code implementations9 Oct 2021 Wei zhang, Debin Huang, Hantao Li, Lipeng Wang, Yanzhao Wei, Kang Pan, Lin Ma, Huanhuan Feng, Jing Pan, Yuzhu Guo

The accurate and reliable detection or prediction of freezing of gaits (FOG) is important for fall prevention in Parkinson's Disease (PD) and studying the physiological transitions during the occurrence of FOG.

EEG valid

Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection

1 code implementation4 Aug 2021 Chen Zhang, Runmin Cong, Qinwei Lin, Lin Ma, Feng Li, Yao Zhao, Sam Kwong

For the cross-modality interaction in feature encoder, existing methods either indiscriminately treat RGB and depth modalities, or only habitually utilize depth cues as auxiliary information of the RGB branch.

object-detection RGB-D Salient Object Detection +1

Discriminative-Generative Representation Learning for One-Class Anomaly Detection

no code implementations27 Jul 2021 Xuan Xia, Xizhou Pan, Xing He, Jingfei Zhang, Ning Ding, Lin Ma

As a kind of generative self-supervised learning methods, generative adversarial nets have been widely studied in the field of anomaly detection.

Anomaly Detection Representation Learning +1

Sentence-level Online Handwritten Chinese Character Recognition

no code implementations4 Jul 2021 Yunxin Li, Qian Yang, Qingcai Chen, Lin Ma, Baotian Hu, Xiaolong Wang, Yuxin Ding

Single online handwritten Chinese character recognition~(single OLHCCR) has achieved prominent performance.

Sentence Word Embeddings

GlyphCRM: Bidirectional Encoder Representation for Chinese Character with its Glyph

no code implementations1 Jul 2021 Yunxin Li, Yu Zhao, Baotian Hu, Qingcai Chen, Yang Xiang, Xiaolong Wang, Yuxin Ding, Lin Ma

Previous works indicate that the glyph of Chinese characters contains rich semantic information and has the potential to enhance the representation of Chinese characters.

Beyond Monocular Deraining: Parallel Stereo Deraining Network Via Semantic Prior

no code implementations9 May 2021 Kaihao Zhang, Wenhan Luo, Yanjiang Yu, Wenqi Ren, Fang Zhao, Changsheng Li, Lin Ma, Wei Liu, Hongdong Li

We first use a coarse deraining network to reduce the rain streaks on the input images, and then adopt a pre-trained semantic segmentation network to extract semantic features from the coarse derained image.

Benchmarking Rain Removal +1

Relation-aware Instance Refinement for Weakly Supervised Visual Grounding

1 code implementation CVPR 2021 Yongfei Liu, Bo Wan, Lin Ma, Xuming He

Visual grounding, which aims to build a correspondence between visual objects and their language entities, plays a key role in cross-modal scene understanding.

Object Relation +3

Take More Positives: An Empirical Study of Contrastive Learing in Unsupervised Person Re-Identification

no code implementations12 Jan 2021 Xuanyu He, Wei zhang, Ran Song, Qian Zhang, Xiangyuan Lan, Lin Ma

By studying two unsupervised person re-ID methods in a cross-method way, we point out a hard negative problem is handled implicitly by their designs of data augmentations and PK sampler respectively.

Contrastive Learning Unsupervised Person Re-Identification

Similarity Reasoning and Filtration for Image-Text Matching

1 code implementation5 Jan 2021 Haiwen Diao, Ying Zhang, Lin Ma, Huchuan Lu

Image-text matching plays a critical role in bridging the vision and language, and great progress has been made by exploiting the global alignment between image and sentence, or local alignments between regions and words.

Image Retrieval Image-text matching +3

Unpaired Image Enhancement with Quality-Attention Generative Adversarial Network

no code implementations30 Dec 2020 Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, Sam Kwong

The key novelty of the proposed QAGAN lies in the injected QAM for the generator such that it learns domain-relevant quality attention directly from the two domains.

Generative Adversarial Network Image Enhancement

Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network

1 code implementation30 Dec 2020 Zhangkai Ni, Wenhan Yang, Shiqi Wang, Lin Ma, Sam Kwong

In this paper, we present an unsupervised image enhancement generative adversarial network (UEGAN), which learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner, rather than learning on a large number of paired images.

Generative Adversarial Network Image Enhancement +1

Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis

2 code implementations18 Nov 2020 Wen Liu, Zhixin Piao, Zhi Tu, Wenhan Luo, Lin Ma, Shenghua Gao

Also, we build a new dataset, namely iPER dataset, for the evaluation of human motion imitation, appearance transfer, and novel view synthesis.

Denoising Image Generation +1

Intrinsic Relationship Reasoning for Small Object Detection

no code implementations2 Sep 2020 Kui Fu, Jia Li, Lin Ma, Kai Mu, Yonghong Tian

In this paper, we propose a novel context reasoning approach for small object detection which models and infers the intrinsic semantic and spatial layout relationships between objects.

Object object-detection +1

Consensus-Aware Visual-Semantic Embedding for Image-Text Matching

1 code implementation ECCV 2020 Haoran Wang, Ying Zhang, Zhong Ji, Yanwei Pang, Lin Ma

In this paper, we propose a Consensus-aware Visual-Semantic Embedding (CVSE) model to incorporate the consensus information, namely the commonsense knowledge shared between both modalities, into image-text matching.

Image Captioning Image-text matching +3

Deblurring by Realistic Blurring

1 code implementation CVPR 2020 Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Bjorn Stenger, Wei Liu, Hongdong Li

To address this problem, we propose a new method which combines two GAN models, i. e., a learning-to-Blur GAN (BGAN) and learning-to-DeBlur GAN (DBGAN), in order to learn a better model for image deblurring by primarily learning how to blur images.

Deblurring Image Deblurring

Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos

no code implementations16 Mar 2020 Yijun Song, Jingwen Wang, Lin Ma, Zhou Yu, Jun Yu

The task of temporally grounding textual queries in videos is to localize one video segment that semantically corresponds to the given query.

Sentence

Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video

no code implementations25 Jan 2020 Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, Kwan-Yee K. Wong

In this paper, we study the problem of weakly-supervised temporal grounding of sentence in video.

Sentence

Fine-grained Image-to-Image Transformation towards Visual Recognition

no code implementations CVPR 2020 Wei Xiong, Yutong He, Yixuan Zhang, Wenhan Luo, Lin Ma, Jiebo Luo

In this paper, we aim at transforming an image with a fine-grained category to synthesize new images that preserve the identity of the input image, which can thereby benefit the subsequent fine-grained image recognition and few-shot learning tasks.

Few-Shot Learning Fine-Grained Image Recognition

Coupled Network for Robust Pedestrian Detection with Gated Multi-Layer Feature Extraction and Deformable Occlusion Handling

no code implementations18 Dec 2019 Tianrui Liu, Wenhan Luo, Lin Ma, Jun-Jie Huang, Tania Stathaki, Tianhong Dai

Ablation studies have validated the effectiveness of both the proposed gated multi-layer feature extraction sub-network and the deformable occlusion handling sub-network.

Occlusion Handling Pedestrian Detection

LaFIn: Generative Landmark Guided Face Inpainting

1 code implementation26 Nov 2019 Yang Yang, Xiaojie Guo, Jiayi Ma, Lin Ma, Haibin Ling

It is challenging to inpaint face images in the wild, due to the large variation of appearance, such as different poses, expressions and occlusions.

Attribute Facial Inpainting

EDIT: Exemplar-Domain Aware Image-to-Image Translation

1 code implementation24 Nov 2019 Yuanbin Fu, Jiayi Ma, Lin Ma, Xiaojie Guo

The principle behind is that, for images from multiple domains, the content features can be obtained by a uniform extractor, while (re-)stylization is achieved by mapping the extracted features specifically to different purposes (domains and exemplars).

Generative Adversarial Network Image-to-Image Translation +1

Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations

1 code implementation NeurIPS 2019 Xu Wang, Jingming He, Lin Ma

In this paper, we propose one novel model for point cloud semantic segmentation, which exploits both the local and global structures within the point cloud based on the contextual point representations.

Graph Attention Semantic Segmentation

Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos

1 code implementation NeurIPS 2019 Yitian Yuan, Lin Ma, Jingwen Wang, Wei Liu, Wenwu Zhu

Temporal sentence grounding in videos aims to detect and localize one target video segment, which semantically corresponds to a given sentence.

Sentence Temporal Sentence Grounding

Context-Gated Convolution

1 code implementation ECCV 2020 Xudong Lin, Lin Ma, Wei Liu, Shih-Fu Chang

As such, being aware of the global context, the modulated convolution kernel of our proposed CGC can better extract representative local patterns and compose discriminative features.

Ranked #61 on Image Classification on ObjectNet (using extra training data)

Action Recognition Image Classification +1

Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis

2 code implementations ICCV 2019 Wen Liu, Zhixin Piao, Jie Min, Wenhan Luo, Lin Ma, Shenghua Gao

In this paper, we propose to use a 3D body mesh recovery module to disentangle the pose and shape, which can not only model the joint location and rotation but also characterize the personalized body shape.

Denoising Novel View Synthesis

Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction

1 code implementation11 Sep 2019 Jingwen Wang, Lin Ma, Wenhao Jiang

The task of temporally grounding language queries in videos is to temporally localize the best matched video segment corresponding to a given language (sentence).

Sentence

Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network

1 code implementation ICCV 2019 Bairui Wang, Lin Ma, Wei zhang, Wenhao Jiang, Jingwen Wang, Wei Liu

In this paper, we propose to guide the video caption generation with Part-of-Speech (POS) information, based on a gated fusion of multiple representations of input videos.

Caption Generation Decoder +3

Sentence Specified Dynamic Video Thumbnail Generation

1 code implementation12 Aug 2019 Yitian Yuan, Lin Ma, Wenwu Zhu

With the tremendous growth of videos over the Internet, video thumbnails, providing video content previews, are becoming increasingly crucial to influencing users' online searching experiences.

Sentence

Position Focused Attention Network for Image-Text Matching

1 code implementation23 Jul 2019 Yaxiong Wang, Hao Yang, Xueming Qian, Lin Ma, Jing Lu, Biao Li, Xin Fan

Then, an attention mechanism is proposed to model the relations between the image region and blocks and generate the valuable position feature, which will be further utilized to enhance the region expression and model a more reliable relationship between the visual image and the textual sentence.

Image-text matching Position +2

Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video

1 code implementation ACL 2019 Zhenfang Chen, Lin Ma, Wenhan Luo, Kwan-Yee K. Wong

In this paper, we address a novel task, namely weakly-supervised spatio-temporally grounding natural sentence in video.

object-detection Sentence +1

Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning

no code implementations3 Jun 2019 Wei Zhang, Bairui Wang, Lin Ma, Wei Liu

Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a reconstruction network (RecNet) in a novel encoder-decoder-reconstructor architecture, which leverages both forward (video to sentence) and backward (sentence to video) flows for video captioning.

Decoder reinforcement-learning +3

Hallucinating Optical Flow Features for Video Classification

1 code implementation28 May 2019 Yongyi Tang, Lin Ma, Lianqiang Zhou

However, extracting motion information, specifically in the form of optical flow features, is extremely computationally expensive, especially for large-scale video classification.

Classification General Classification +3

Spatio-temporal Video Re-localization by Warp LSTM

no code implementations CVPR 2019 Yang Feng, Lin Ma, Wei Liu, Jiebo Luo

The need for efficiently finding the video content a user wants is increasing because of the erupting of user-generated videos on the Web.

Retrieval Video Retrieval

PFLD: A Practical Facial Landmark Detector

18 code implementations28 Feb 2019 Xiaojie Guo, Siyuan Li, Jinke Yu, Jiawan Zhang, Jiayi Ma, Lin Ma, Wei Liu, Haibin Ling

Being accurate, efficient, and compact is essential to a facial landmark detector for practical use.

Face Alignment Facial Landmark Detection

Hierarchical Photo-Scene Encoder for Album Storytelling

no code implementations2 Feb 2019 Bairui Wang, Lin Ma, Wei zhang, Wenhao Jiang, Feng Zhang

In this paper, we propose a novel model with a hierarchical photo-scene encoder and a reconstructor for the task of album storytelling.

Decoder Image-guided Story Ending Generation

Real-Time Referring Expression Comprehension by Single-Stage Grounding Network

no code implementations9 Dec 2018 Xinpeng Chen, Lin Ma, Jingyuan Chen, Zequn Jie, Wei Liu, Jiebo Luo

Experiments on RefCOCO, RefCOCO+, and RefCOCOg datasets demonstrate that our proposed SSG without relying on any region proposals can achieve comparable performance with other advanced models.

Attribute Referring Expression +1

Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation

no code implementations NeurIPS 2018 Wenqi Ren, Jiawei Zhang, Lin Ma, Jinshan Pan, Xiaochun Cao, WangMeng Zuo, Wei Liu, Ming-Hsuan Yang

In this paper, we present a deep convolutional neural network to capture the inherent properties of image degradation, which can handle different kernels and saturated pixels in a unified framework.

Deblurring

Multi-granularity Generator for Temporal Action Proposal

no code implementations CVPR 2019 Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang

In this paper, we propose a multi-granularity generator (MGG) to perform the temporal action proposal from different granularity perspectives, relying on the video visual features equipped with the position embedding information.

Action Recognition Temporal Action Proposal Generation

Unsupervised Image Captioning

1 code implementation CVPR 2019 Yang Feng, Lin Ma, Wei Liu, Jiebo Luo

Instead of relying on manually labeled image-sentence pairs, our proposed model merely requires an image set, a sentence corpus, and an existing visual concept detector.

Image Captioning Sentence

Temporally Grounding Natural Sentence in Video

no code implementations EMNLP 2018 Jingyuan Chen, Xinpeng Chen, Lin Ma, Zequn Jie, Tat-Seng Chua

We introduce an effective and efficient method that grounds (i. e., localizes) natural sentences in long, untrimmed video sequences.

Sentence Video Captioning

Non-local NetVLAD Encoding for Video Classification

no code implementations29 Sep 2018 Yongyi Tang, Xing Zhang, Jingwen Wang, Shaoxiang Chen, Lin Ma, Yu-Gang Jiang

This paper describes our solution for the 2$^\text{nd}$ YouTube-8M video understanding challenge organized by Google AI.

Classification General Classification +3

Baidu Apollo Auto-Calibration System - An Industry-Level Data-Driven and Learning based Vehicle Longitude Dynamic Calibrating Algorithm

1 code implementation30 Aug 2018 Fan Zhu, Lin Ma, Xin Xu, Dingfeng Guo, Xiao Cui, Qi Kong

Since manual calibration is not sustainable once entering into mass production stage for industrial purposes, we here introduce a machine-learning based auto-calibration system for autonomous driving vehicles.

Autonomous Driving BIG-bench Machine Learning

Video Re-localization

1 code implementation ECCV 2018 Yang Feng, Lin Ma, Wei Liu, Tong Zhang, Jiebo Luo

We first exploit and reorganize the videos in ActivityNet to form a new dataset for video re-localization research, which consists of about 10, 000 videos of diverse visual appearances associated with localized boundary information.

Copy Detection

Recurrent Fusion Network for Image Captioning

no code implementations ECCV 2018 Wenhao Jiang, Lin Ma, Yu-Gang Jiang, Wei Liu, Tong Zhang

In this paper, in order to exploit the complementary information from multiple encoders, we propose a novel Recurrent Fusion Network (RFNet) for tackling image captioning.

Decoder Image Captioning

Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks

no code implementations ECCV 2018 Minjun Li, Hao-Zhi Huang, Lin Ma, Wei Liu, Tong Zhang, Yu-Gang Jiang

Recent studies on unsupervised image-to-image translation have made a remarkable progress by training a pair of generative adversarial networks with a cycle-consistent loss.

Translation Unsupervised Image-To-Image Translation

Safe Element Screening for Submodular Function Minimization

no code implementations ICML 2018 Weizhong Zhang, Bin Hong, Lin Ma, Wei Liu, Tong Zhang

Relying on this study, we subsequently propose a novel safe screening method to quickly identify the elements guaranteed to be included (we refer to them as active) or excluded (inactive) in the final optimal solution of SFM during the optimization process.

Combinatorial Optimization Sparse Learning

Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamic

no code implementations7 May 2018 Yongyi Tang, Lin Ma, Wei Liu, Wei-Shi Zheng

Human motion prediction aims at generating future frames of human motion based on an observed sequence of skeletons.

Human motion prediction motion prediction

Learning to Guide Decoding for Image Captioning

no code implementations3 Apr 2018 Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu

Recently, much advance has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task.

Attribute Decoder +1

Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

1 code implementation CVPR 2018 Jingwen Wang, Wenhao Jiang, Lin Ma, Wei Liu, Yong Xu

We propose a bidirectional proposal method that effectively exploits both past and future contexts to make proposal predictions.

Decoder Dense Video Captioning

Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present

1 code implementation CVPR 2018 Xinpeng Chen, Lin Ma, Wenhao Jiang, Jian Yao, Wei Liu

Recently, caption generation with an encoder-decoder framework has been extensively studied and applied in different domains, such as image captioning, code captioning, and so on.

Caption Generation Decoder +1

Reconstruction Network for Video Captioning

3 code implementations CVPR 2018 Bairui Wang, Lin Ma, Wei zhang, Wei Liu

Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture, which leverages both the forward (video to sentence) and backward (sentence to video) flows for video captioning.

Decoder Sentence +1

Adversarial Spatio-Temporal Learning for Video Deblurring

1 code implementation28 Mar 2018 Kaihao Zhang, Wenhan Luo, Yiran Zhong, Lin Ma, Wei Liu, Hongdong Li

To tackle the second challenge, we leverage the developed DBLRNet as a generator in the GAN (generative adversarial network) architecture, and employ a content loss in addition to an adversarial loss for efficient adversarial training.

Deblurring Generative Adversarial Network

Neural Stereoscopic Image Style Transfer

no code implementations ECCV 2018 Xinyu Gong, HaoZhi Huang, Lin Ma, Fumin Shen, Wei Liu, Tong Zhang

While each view of the stereoscopic pair is processed in an individual path, a novel feature aggregation strategy is proposed to effectively share information between the two paths.

Style Transfer

Normalized Direction-preserving Adam

1 code implementation ICLR 2018 Zijun Zhang, Lin Ma, Zongpeng Li, Chuan Wu

Adaptive optimization algorithms, such as Adam and RMSprop, have shown better optimization performance than stochastic gradient descent (SGD) in some scenarios.

General Classification

Real-Time Neural Style Transfer for Videos

no code implementations CVPR 2017 Hao-Zhi Huang, Hao Wang, Wenhan Luo, Lin Ma, Wenhao Jiang, Xiaolong Zhu, Zhifeng Li, Wei Liu

More specifically, a hybrid loss is proposed to capitalize on the content information of input frames, the style information of a given style image, and the temporal information of consecutive frames.

Style Transfer Video Style Transfer

Adaptive Neighboring Selection Algorithm Based on Curvature Prediction in Manifold Learning

no code implementations13 Apr 2017 Lin Ma, Caifa Zhou, Xi Liu, Yubin Xu

By verifying the proposed algorithm on embedding Swiss roll from R3 to R2 based on LLE and ISOMAP algorithm, the simulation results show that the proposed adaptive neighboring selection algorithm is feasible and able to find the optimal value of K, making the residual variance relatively small and better visualization of the results.

Dimensionality Reduction

Joint Semi-supervised RSS Dimensionality Reduction and Fingerprint Based Algorithm for Indoor Localization

no code implementations12 Apr 2017 Caifa Zhou, Lin Ma, Xuezhi Tan

Another significant innovation of this paper is jointing the fingerprint based algorithm with CM-SDE algorithm to improve the localization accuracy of indoor localization.

Dimensionality Reduction Indoor Localization

Multiple Feature Fusion via Weighted Entropy for Visual Tracking

no code implementations ICCV 2015 Lin Ma, Jiwen Lu, Jianjiang Feng, Jie zhou

It is desirable to combine multiple feature descriptors to improve the visual tracking performance because different features can provide complementary information to describe objects of interest.

Object Visual Object Tracking +1

Local Subspace Collaborative Tracking

no code implementations ICCV 2015 Lin Ma, Xiaoqin Zhang, Weiming Hu, Junliang Xing, Jiwen Lu, Jie zhou

To address this, this paper presents a local subspace collaborative tracking method for robust visual tracking, where multiple linear and nonlinear subspaces are learned to better model the nonlinear relationship of object appearances.

Object Object Tracking +1

Learning to Answer Questions From Image Using Convolutional Neural Network

no code implementations1 Jun 2015 Lin Ma, Zhengdong Lu, Hang Li

We demonstrate the efficacy of our proposed model on the DAQUAR and COCO-QA datasets, which are two benchmark datasets for the image QA, with the performances significantly outperforming the state-of-the-art.

General Classification Question Answering +2

Cannot find the paper you are looking for? You can Submit a new open access paper.