Search Results for author: Heng Wang

Found 97 papers, 47 papers with code

Proposal-based Video Completion

no code implementations ECCV 2020 Yuan-Ting Hu, Heng Wang, Nicolas Ballas, Kristen Grauman, Alexander G. Schwing

Video inpainting is an important technique for a wide variety of applications from video content editing to video restoration.

Image Inpainting object-detection +4

Theoretical Analysis of Near-Field MIMO Channel Capacity and Mid-Band Experimental Validation

no code implementations19 Jun 2025 Haiyang Miao, Jianhua Zhang, Pan Tang, Heng Wang, Lei Tian, Guangyi Liu

With the increase of multiple-input-multiple-output (MIMO) array size and carrier frequency, near-field MIMO communications will become crucial in 6G wireless networks.

Generalizable LLM Learning of Graph Synthetic Data with Reinforcement Learning

no code implementations1 Jun 2025 Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xinyun Liu, Yulia Tsvetkov

We then compare them against existing settings on both in-domain synthetic tasks and out-of-domain real-world tasks with implicit graph structures such as multi-hop QA, structured planning, and more.

Kimi-VL Technical Report

1 code implementation10 Apr 2025 Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, HaoNing Wu, Haotian Yao, Haoyu Lu, Heng Wang, Hongcheng Gao, Huabin Zheng, Jiaming Li, Jianlin Su, Jianzhou Wang, Jiaqi Deng, Jiezhong Qiu, Jin Xie, Jinhong Wang, Jingyuan Liu, Junjie Yan, Kun Ouyang, Liang Chen, Lin Sui, Longhui Yu, Mengfan Dong, Mengnan Dong, Nuo Xu, Pengyu Cheng, Qizheng Gu, Runjie Zhou, Shaowei Liu, Sihan Cao, Tao Yu, Tianhui Song, Tongtong Bai, Wei Song, Weiran He, Weixiao Huang, Weixin Xu, Xiaokun Yuan, Xingcheng Yao, Xingzhe Wu, Xinxing Zu, Xinyu Zhou, Xinyuan Wang, Y. Charles, Yan Zhong, Yang Li, Yangyang Hu, Yanru Chen, Yejie Wang, Yibo Liu, Yibo Miao, Yidao Qin, Yimin Chen, Yiping Bao, Yiqin Wang, Yongsheng Kang, Yuanxin Liu, Yulun Du, Yuxin Wu, Yuzhi Wang, Yuzi Yan, Zaida Zhou, Zhaowei Li, Zhejun Jiang, Zheng Zhang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Zijia Zhao, Ziwei Chen, Zongyu Lin

We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2. 8B parameters in its language decoder (Kimi-VL-A3B).

Long-Context Understanding Mathematical Reasoning +4

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

no code implementations1 Apr 2025 Weizhi Wang, Yu Tian, Linjie Yang, Heng Wang, Xifeng Yan

We open-source all aspects of our work, including compute-efficient and data-efficient training details, data filtering methods, sequence packing scripts, pre-training data in WebDataset format, FSDP-based training codebase, and both base and instruction-tuned model checkpoints.

Large Language Model Multimodal Large Language Model

ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object

1 code implementation CVPR 2025 Zhe Shan, Yang Liu, Lei Zhou, Cheng Yan, Heng Wang, Xia Xie

In this work, we propose ROS-SAM, a method designed to achieve high-quality interactive segmentation while preserving generalization across diverse remote sensing data.

Domain Adaptation Interactive Segmentation +2

BannerAgency: Advertising Banner Design with Multimodal LLM Agents

no code implementations14 Mar 2025 Heng Wang, Yotaro Shimose, Shingo Takamatsu

Advertising banners are critical for capturing user attention and enhancing advertising campaign effectiveness.

Marketing

Reward Shaping to Mitigate Reward Hacking in RLHF

1 code implementation26 Feb 2025 Jiayi Fu, Xuandong Zhao, Chengyuan Yao, Heng Wang, Qi Han, Yanghua Xiao

Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human values.

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

1 code implementation17 Feb 2025 Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, HongYu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, Jianchang Wu, Jiangjie Zhen, Ranchen Ming, Song Yuan, Xuelin Zhang, Yu Zhou, Bingxin Li, Buyun Ma, Hongyuan Wang, Kang An, Wei Ji, Wen Li, Xuan Wen, Xiangwen Kong, Yuankai Ma, Yuanwei Liang, Yun Mou, Bahtiyar Ahmidi, Bin Wang, Bo Li, Changxin Miao, Chen Xu, Chenrun Wang, Dapeng Shi, Deshan Sun, Dingyuan Hu, Dula Sai, Enle Liu, Guanzhe Huang, Gulin Yan, Heng Wang, Haonan Jia, Haoyang Zhang, Jiahao Gong, Junjing Guo, Jiashuai Liu, Jiahong Liu, Jie Feng, Jie Wu, Jiaoren Wu, Jie Yang, Jinguo Wang, Jingyang Zhang, Junzhe Lin, Kaixiang Li, Lei Xia, Li Zhou, Liang Zhao, Longlong Gu, Mei Chen, Menglin Wu, Ming Li, Mingxiao Li, Mingliang Li, Mingyao Liang, Na Wang, Nie Hao, Qiling Wu, Qinyuan Tan, Ran Sun, Shuai Shuai, Shaoliang Pang, Shiliang Yang, Shuli Gao, Shanshan Yuan, SiQi Liu, Shihong Deng, Shilei Jiang, Sitong Liu, Tiancheng Cao, Tianyu Wang, Wenjin Deng, Wuxun Xie, Weipeng Ming, Wenqing He, Wen Sun, Xin Han, Xin Huang, Xiaomin Deng, Xiaojia Liu, Xin Wu, Xu Zhao, Yanan Wei, Yanbo Yu, Yang Cao, Yangguang Li, Yangzhen Ma, Yanming Xu, Yaoyu Wang, Yaqiang Shi, Yilei Wang, Yizhuang Zhou, Yinmin Zhong, Yang Zhang, Yaoben Wei, Yu Luo, Yuanwei Lu, Yuhe Yin, Yuchu Luo, Yuanhao Ding, Yuting Yan, Yaqi Dai, Yuxiang Yang, Zhe Xie, Zheng Ge, Zheng Sun, Zhewei Huang, Zhichao Chang, Zhisheng Guan, Zidong Yang, Zili Zhang, Binxing Jiao, Daxin Jiang, Heung-Yeung Shum, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu

Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following.

Instruction Following Voice Cloning

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

3 code implementations14 Feb 2025 Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, Bizhu Huang, Bo wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu, Chenguang Yu, Dapeng Shi, Dingyuan Hu, Enle Liu, Gang Yu, Ge Yang, Guanzhe Huang, Gulin Yan, Haiyang Feng, Hao Nie, Haonan Jia, Hanpeng Hu, Hanqi Chen, Haolong Yan, Heng Wang, Hongcheng Guo, Huilin Xiong, Huixin Xiong, Jiahao Gong, Jianchang Wu, Jiaoren Wu, Jie Wu, Jie Yang, Jiashuai Liu, Jiashuo Li, Jingyang Zhang, Junjing Guo, Junzhe Lin, Kaixiang Li, Lei Liu, Lei Xia, Liang Zhao, Liguo Tan, Liwen Huang, Liying Shi, Ming Li, Mingliang Li, Muhua Cheng, Na Wang, Qiaohui Chen, Qinglin He, Qiuyan Liang, Quan Sun, Ran Sun, Rui Wang, Shaoliang Pang, Shiliang Yang, Sitong Liu, SiQi Liu, Shuli Gao, Tiancheng Cao, Tianyu Wang, Weipeng Ming, Wenqing He, Xu Zhao, Xuelin Zhang, Xianfang Zeng, Xiaojia Liu, Xuan Yang, Yaqi Dai, Yanbo Yu, Yang Li, Yineng Deng, Yingming Wang, Yilei Wang, Yuanwei Lu, Yu Chen, Yu Luo, Yuchu Luo, Yuhe Yin, Yuheng Feng, Yuxiang Yang, Zecheng Tang, Zekai Zhang, Zidong Yang, Binxing Jiao, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Heung-Yeung Shum, Daxin Jiang

We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length.

Video Generation Video Reconstruction

CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

2 code implementations13 Feb 2025 Xintao Wang, Heng Wang, Yifei Zhang, Xinfeng Yuan, Rui Xu, Jen-tse Huang, Siyu Yuan, Haoran Guo, Jiangjie Chen, Wei Wang, Yanghua Xiao, Shuchang Zhou

It provides authentic dialogues with real-world intricacies, as well as diverse data types such as conversation setups, character experiences and internal thoughts.

Fast Prompt Alignment for Text-to-Image Generation

1 code implementation11 Dec 2024 Khalil Mrini, Hanlin Lu, Linjie Yang, Weilin Huang, Heng Wang

Text-to-image generation has advanced rapidly, yet aligning complex textual prompts with generated visuals remains challenging, especially with intricate object relationships and fine-grained details.

In-Context Learning Text to Image Generation +2

Gotta Hear Them All: Sound Source Aware Vision to Audio Generation

1 code implementation23 Nov 2024 Wei Guo, Heng Wang, Jianbo Ma, Weidong Cai

By addressing V2A generation at the sound-source level, SSV2A surpasses state-of-the-art methods in both generation fidelity and relevance as evidenced by extensive experiments.

All Audio Generation

Real-time and Downtime-tolerant Fault Diagnosis for Railway Turnout Machines (RTMs) Empowered with Cloud-Edge Pipeline Parallelism

no code implementations4 Nov 2024 Fan Wu, Muhammad Bilal, Haolong Xiang, Heng Wang, Jinjun Yu, Xiaolong Xu

In this paper, an edge-cloud collaborative early-warning system is proposed to enable real-time and downtime-tolerant fault diagnosis of RTMs, providing a new paradigm for the deployment of models in safety-critical scenarios.

Fault Diagnosis

DINeuro: Distilling Knowledge from 2D Natural Images via Deformable Tubular Transferring Strategy for 3D Neuron Reconstruction

no code implementations29 Oct 2024 Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Yui Lo, Yuqian Chen, Lauren J. O'Donnell, Weidong Cai

Reconstructing neuron morphology from 3D light microscope imaging data is critical to aid neuroscientists in analyzing brain networks and neuroanatomy.

DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction

1 code implementation30 Sep 2024 Zhen Yang, Yanpeng Dong, Heng Wang, Lichao Ma, Zijian Cui, Qi Liu, Haoran Pei

Multi-sensor fusion significantly enhances the accuracy and robustness of 3D semantic occupancy prediction, which is crucial for autonomous driving and robotics.

3D Object Detection 3D Semantic Occupancy Prediction +4

Enhancing Advanced Visual Reasoning Ability of Large Language Models

no code implementations21 Sep 2024 Zhiyuan Li, Dongnan Liu, Chaoyi Zhang, Heng Wang, Tengfei Xue, Weidong Cai

Recent advancements in Vision-Language (VL) research have sparked new benchmarks for complex visual reasoning, challenging models' advanced reasoning ability.

In-Context Learning Visual Reasoning

Explaining Datasets in Words: Statistical Models with Natural Language Parameters

1 code implementation13 Sep 2024 Ruiqi Zhong, Heng Wang, Dan Klein, Jacob Steinhardt

To make sense of massive data, we often fit simplified models and then interpret the parameters; for example, we cluster the text embeddings and then interpret the mean parameters of each cluster.

Clustering Language Modeling +2

Future Does Matter: Boosting 3D Object Detection with Temporal Motion Estimation in Point Cloud Sequences

no code implementations6 Sep 2024 Rui Yu, Runkai Zhao, Cong Nie, Heng Wang, Huaicheng Yan, Meng Wang

Specifically, Motion-Guided Feature Aggregation (MGFA) is proposed to utilize the object trajectory from previous and future motion states to model spatial-temporal correlations into gaussian heatmap over a driving sequence.

3D Object Detection Motion Estimation +4

Can LLM Graph Reasoning Generalize beyond Pattern Memorization?

1 code implementation23 Jun 2024 Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xiaochuang Han, Tianxing He, Yulia Tsvetkov

To this end, we propose the NLGift benchmark, an evaluation suite of LLM graph reasoning generalization: whether LLMs could go beyond semantic, numeric, structural, reasoning patterns in the synthetic training data and improve utility on real-world graph-based tasks.

Memorization

Autoregressive Pretraining with Mamba in Vision

1 code implementation11 Jun 2024 Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.

Mamba

Dance Any Beat: Blending Beats with Visuals in Dance Video Generation

no code implementations15 May 2024 Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai

This task enables the dance generation of specific individuals without requiring keypoint annotations, making it more versatile and applicable to various situations.

Image to Video Generation Optical Flow Estimation +2

Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images

no code implementations4 May 2024 Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai

To address this limitation, we aim to distill the consensus knowledge from massive natural image data to aid the segmentation model in learning the complex neuron structures.

Segmentation

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

no code implementations15 Apr 2024 Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie

This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200, 000 edits.

Attribute

Digital Twin Channel for 6G: Concepts, Architectures and Potential Applications

no code implementations19 Mar 2024 Heng Wang, Jianhua Zhang, Gaofeng Nie, Li Yu, Zhiqiang Yuan, Tongjie Li, Jialin Wang, Guangyi Liu

Digital twin channel (DTC) is the real-time mapping of a wireless channel from the physical world to the digital world, which is expected to provide significant performance enhancements for the sixth-generation (6G) air-interface design.

Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters

no code implementations5 Mar 2024 Weizhi Wang, Khalil Mrini, Linjie Yang, Sateesh Kumar, Yu Tian, Xifeng Yan, Heng Wang

Our MLM filter can generalize to different models and tasks, and be used as a drop-in replacement for CLIPScore.

Hy-DAT: A Tool to Address Hydropower Modeling Gaps Using Interdependency, Efficiency Curves, and Unit Dispatch Models

no code implementations28 Feb 2024 Dewei Wang, Bhaskar Mitra, Sameer Nekkalapu, Sohom Datta, Bibi Matthew, Rounak Meyur, Heng Wang, Slaven Kincic

As the power system continues to be flooded with intermittent resources, it becomes more important to accurately assess the role of hydro and its impact on the power grid.

DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection

1 code implementation16 Feb 2024 Herun Wan, Shangbin Feng, Zhaoxuan Tan, Heng Wang, Yulia Tsvetkov, Minnan Luo

Large language models are limited by challenges in factuality and hallucinations to be directly employed off-the-shelf for judging the veracity of news articles, where factual accuracy is paramount.

Articles Misinformation

VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens

no code implementations CVPR 2024 Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang

This amplifies the effect of visual tokens on text generation especially when the relative distance is longer between visual and text tokens.

Hallucination Position +3

Video Recognition in Portrait Mode

1 code implementation CVPR 2024 Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang, Heng Wang

While existing datasets mainly comprise landscape mode videos, our paper seeks to introduce portrait mode videos to the research community and highlight the unique challenges associated with this video format.

Data Augmentation Video Recognition

Vista-LLaMA: Reliable Video Narrator via Equal Distance to Visual Tokens

no code implementations12 Dec 2023 Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang

This amplifies the effect of visual tokens on text generation, especially when the relative distance is longer between visual and text tokens.

Hallucination Position +2

InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal Large Language Models

no code implementations20 Nov 2023 Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang

To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks.

GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks

no code implementations2 Nov 2023 Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold

Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details.

Image Generation Image to text

Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models

no code implementations4 Oct 2023 Jianglong Ye, Peng Wang, Kejie Li, Yichun Shi, Heng Wang

Specifically, we decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions.

Image to 3D Novel View Synthesis

Resolving Knowledge Conflicts in Large Language Models

1 code implementation2 Oct 2023 Yike Wang, Shangbin Feng, Heng Wang, Weijia Shi, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov

To this end, we introduce an evaluation framework for simulating contextual knowledge conflicts and quantitatively evaluating to what extent LLMs achieve these goals.

The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering

no code implementations27 Sep 2023 Haichao Yu, Yu Tian, Sateesh Kumar, Linjie Yang, Heng Wang

DataComp is a new benchmark dedicated to evaluating different methods for data filtering.

Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development

1 code implementation24 Sep 2023 Runkai Zhao, Yuwen Heng, Heng Wang, Yuanda Gao, Shilei Liu, Changhao Yao, Jiawen Chen, Weidong Cai

Advanced Driver-Assistance Systems (ADAS) have successfully integrated learning-based techniques into vehicle perception and decision-making.

3D Lane Detection Decision Making

Dataset Condensation via Generative Model

no code implementations14 Sep 2023 David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang, Song Bai, Mike Zheng Shou

Dataset condensation aims to condense a large dataset with a lot of training samples into a small set.

Dataset Condensation model

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

1 code implementation18 Aug 2023 Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, Weidong Cai

In this paper, we propose a lightweight solution to this problem by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM.

Video-to-Sound Generation

Exploring the Role of Audio in Video Captioning

no code implementations21 Jun 2023 YuHan Shen, Linjie Yang, Longyin Wen, Haichao Yu, Ehsan Elhamifar, Heng Wang

Recent focus in video captioning has been on designing architectures that can consume both video and text modalities, and using large-scale video datasets with text transcripts for pre-training, such as HowTo100M.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Can Language Models Solve Graph Problems in Natural Language?

2 code implementations NeurIPS 2023 Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, Yulia Tsvetkov

We then propose Build-a-Graph Prompting and Algorithmic Prompting, two instruction-based approaches to enhance LLMs in solving natural language graph problems.

In-Context Learning Knowledge Probing +2

Detecting Spoilers in Movie Reviews with External Movie Knowledge and User Networks

1 code implementation22 Apr 2023 Heng Wang, Wenqian Zhang, Yuyang Bai, Zhaoxuan Tan, Shangbin Feng, Qinghua Zheng, Minnan Luo

We then propose MVSD, a novel Multi-View Spoiler Detection framework that takes into account the external knowledge about movies and user activities on movie review platforms.

Graph Neural Network

Progressive Volume Distillation with Active Learning for Efficient NeRF Architecture Conversion

1 code implementation8 Apr 2023 Shuangkang Fang, Yufeng Wang, Yi Yang, Weixin Xu, Heng Wang, Wenrui Ding, Shuchang Zhou

For instance, PVD-AL can distill an MLP-based model from a Hashtables-based model at a 10~20X faster speed and 0. 8dB~2dB higher PSNR than training the MLP-based model from scratch.

3D Reconstruction NeRF +1

$R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition

no code implementations6 Apr 2023 Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang

Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.

Feature Correlation Reranking +2

Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision

no code implementations9 Mar 2023 Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran

Many top-down architectures for instance segmentation achieve significant success when trained and tested on pre-defined closed-world taxonomy.

Open-World Instance Segmentation Segmentation +1

Temporal Perceiving Video-Language Pre-training

no code implementations18 Jan 2023 Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang

Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization which matches the subset of texts with the video features.

Contrastive Learning Moment Retrieval +7

One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation

1 code implementation29 Nov 2022 Shuangkang Fang, Weixin Xu, Heng Wang, Yi Yang, Yufeng Wang, Shuchang Zhou

In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions.

 Ranked #1 on Novel View Synthesis on NeRF (Average PSNR metric)

3D Reconstruction All +3

Towards Generalisable Audio Representations for Audio-Visual Navigation

no code implementations1 Jun 2022 Shunqi Mao, Chaoyi Zhang, Heng Wang, Weidong Cai

In audio-visual navigation (AVN), an intelligent agent needs to navigate to a constantly sound-making object in complex 3D environments based on its audio and visual perceptions.

Contrastive Learning Data Augmentation +2

Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds

1 code implementation22 Apr 2022 Heng Wang, Chaoyi Zhang, Jianhui Yu, Weidong Cai

Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level 3D scene understanding.

3D dense captioning 3D Object Detection +7

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

1 code implementation CVPR 2022 Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran

From PA we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks we train GGNs and significantly outperform the SOTA on open-world instance segmentation on various benchmarks including COCO, LVIS, ADE20K, and UVO.

Diversity Open-World Instance Segmentation +1

Canonical Mean Filter for Almost Zero-Shot Multi-Task classification

no code implementations8 Apr 2022 Yong Li, Heng Wang, Xiang Ye

Motivated by ANIL, we rethink the role of adaption in the feature extractor of CNAPs, which is a state-of-the-art representative few-shot method.

PyTorchVideo: A Deep Learning Library for Video Understanding

1 code implementation18 Nov 2021 Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.

Deep Learning Self-Supervised Learning +1

Voxel-wise Cross-Volume Representation Learning for 3D Neuron Reconstruction

no code implementations14 Aug 2021 Heng Wang, Chaoyi Zhang, Jianhui Yu, Yang song, SiQi Liu, Wojciech Chrzanowski, Weidong Cai

Recently, a series of deep learning based segmentation methods have been proposed to improve the quality of raw 3D optical image stacks by removing noises and restoring neuronal structures from low-contrast background.

Decoder Representation Learning +1

Towards Understanding the Effectiveness of Attention Mechanism

no code implementations29 Jun 2021 Xiang Ye, Zihang He, Heng Wang, Yong Li

Instead, we verify the crucial role of feature map multiplication in attention mechanism and uncover a fundamental impact of feature map multiplication on the learned landscapes of CNNs: with the high order non-linearity brought by the feature map multiplication, it played a regularization role on CNNs, which made them learn smoother and more stable landscapes near real samples compared to vanilla CNNs.

Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories

no code implementations CVPR 2021 Xitong Yang, Haoqi Fan, Lorenzo Torresani, Larry Davis, Heng Wang

The standard way of training video models entails sampling at each iteration a single clip from a video and optimizing the clip prediction with respect to the video-level label.

Action Detection Action Recognition +1

Is Space-Time Attention All You Need for Video Understanding?

16 code implementations9 Feb 2021 Gedas Bertasius, Heng Wang, Lorenzo Torresani

We present a convolution-free approach to video classification built exclusively on self-attention over space and time.

Action Classification Action Recognition +6

Single Neuron Segmentation using Graph-based Global Reasoning with Auxiliary Skeleton Loss from 3D Optical Microscope Images

no code implementations22 Jan 2021 Heng Wang, Yang song, Chaoyi Zhang, Jianhui Yu, SiQi Liu, Hanchuan Peng, Weidong Cai

One of the critical steps in improving accurate single neuron reconstruction from three-dimensional (3D) optical microscope images is the neuronal structure segmentation.

Segmentation

Interactive Prototype Learning for Egocentric Action Recognition

no code implementations ICCV 2021 Xiaohan Wang, Linchao Zhu, Heng Wang, Yi Yang

To avoid these additional costs, we propose an end-to-end Interactive Prototype Learning (IPL) framework to learn better active object representations by leveraging the motion cues from the actor.

Action Recognition Object +1

From W-Net to CDGAN: Bi-temporal Change Detection via Deep Learning Techniques

1 code implementation14 Mar 2020 Bin Hou, Qingjie Liu, Heng Wang, Yunhong Wang

Traditional change detection methods usually follow the image differencing, change feature extraction and classification framework, and their performance is limited by such simple image domain differencing and also the hand-crafted features.

Change Detection Generative Adversarial Network

Region and Object based Panoptic Image Synthesis through Conditional GANs

no code implementations14 Dec 2019 Heng Wang, Donghao Zhang, Yang song, Heng Huang, Mei Chen, Weidong Cai

Our contribution consists of the proposal of a significant task worth investigating and a naive baseline of solving it.

Image-to-Image Translation Translation

Incorporating Graph Attention Mechanism into Knowledge Graph Reasoning Based on Deep Reinforcement Learning

no code implementations IJCNLP 2019 Heng Wang, Shuangyin Li, Rong pan, Mingzhi Mao

Meanwhile, a novel mechanism of reinforcement learning is proposed by forcing an agent to walk forward every step to avoid the agent stalling at the same entity node constantly.

Deep Reinforcement Learning Graph Attention +2

FASTER Recurrent Networks for Efficient Video Classification

no code implementations10 Jun 2019 Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang

FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities.

Action Classification Action Recognition +3

Large-scale weakly-supervised pre-training for video action recognition

3 code implementations CVPR 2019 Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan

Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?

Ranked #2 on Egocentric Activity Recognition on EPIC-KITCHENS-55 (Actions Top-1 (S2) metric)

Action Classification Action Recognition +3

Video Classification with Channel-Separated Convolutional Networks

7 code implementations ICCV 2019 Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli

It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.

Action Classification Action Recognition +4

Defeats GAN: A Simpler Model Outperforms in Knowledge Representation Learning

no code implementations3 Apr 2019 Heng Wang, Mingzhi Mao

The goal of knowledge representation learning is to embed entities and relations into a low-dimensional, continuous vector space.

Link Prediction Representation Learning

Multi-task Learning for Chinese Word Usage Errors Detection

no code implementations3 Apr 2019 Jinbin Zhang, Heng Wang

Chinese word usage errors often occur in non-native Chinese learners' writing.

Multi-Task Learning POS +2

Overview of CAIL2018: Legal Judgment Prediction Competition

2 code implementations13 Oct 2018 Haoxi Zhong, Chaojun Xiao, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Yansong Feng, Xianpei Han, Zhen Hu, Heng Wang, Jianfeng Xu

In this paper, we give an overview of the Legal Judgment Prediction (LJP) competition at Chinese AI and Law challenge (CAIL2018).

Articles Prediction

CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction

3 code implementations4 Jul 2018 Chaojun Xiao, Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Yansong Feng, Xianpei Han, Zhen Hu, Heng Wang, Jianfeng Xu

In this paper, we introduce the \textbf{C}hinese \textbf{AI} and \textbf{L}aw challenge dataset (CAIL2018), the first large-scale Chinese legal dataset for judgment prediction.

Articles Prediction +1

Devon: Deformable Volume Network for Learning Optical Flow

no code implementations20 Feb 2018 Yao Lu, Jack Valmadre, Heng Wang, Juho Kannala, Mehrtash Harandi, Philip H. S. Torr

State-of-the-art neural network models estimate large displacement optical flow in multi-resolution and use warping to propagate the estimation between two resolutions.

Optical Flow Estimation

Text Generation Based on Generative Adversarial Nets with Latent Variable

1 code implementation1 Dec 2017 Heng Wang, Zengchang Qin, Tao Wan

We propose the VGAN model where the generative model is composed of recurrent neural network and VAE.

Language Modeling Language Modelling +1

Concept Drift Detection and Adaptation with Hierarchical Hypothesis Testing

no code implementations25 Jul 2017 Shujian Yu, Zubin Abraham, Heng Wang, Mohak Shah, Yantao Wei, José C. Príncipe

A fundamental issue for statistical classification models in a streaming environment is that the joint distribution between predictor and response variables changes over time (a phenomenon also known as concept drifts), such that their classification performance deteriorates dramatically.

Drift Detection General Classification +1

Face Aging Effect Simulation using Hidden Factor Analysis Joint Sparse Representation

no code implementations4 Nov 2015 Hongyu Yang, Di Huang, Yunhong Wang, Heng Wang, Yuanyan Tang

Face aging simulation has received rising investigations nowadays, whereas it still remains a challenge to generate convincing and natural age-progressed face images.

Concept Drift Detection for Streaming Data

1 code implementation4 Apr 2015 Heng Wang, Zubin Abraham

Common statistical prediction models often require and assume stationarity in the data.

Drift Detection

Active Community Detection in Massive Graphs

2 code implementations30 Dec 2014 Heng Wang, Da Zheng, Randal Burns, Carey Priebe

A canonical problem in graph mining is the detection of dense communities.

Social and Information Networks Physics and Society

Cannot find the paper you are looking for? You can Submit a new open access paper.