no code implementations • ECCV 2020 • Yuan-Ting Hu, Heng Wang, Nicolas Ballas, Kristen Grauman, Alexander G. Schwing
Video inpainting is an important technique for a wide variety of applications from video content editing to video restoration.
no code implementations • 19 Jun 2025 • Haiyang Miao, Jianhua Zhang, Pan Tang, Heng Wang, Lei Tian, Guangyi Liu
With the increase of multiple-input-multiple-output (MIMO) array size and carrier frequency, near-field MIMO communications will become crucial in 6G wireless networks.
1 code implementation • 10 Jun 2025 • Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, HongYu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, Zixin Zhang, Bin Wang, Bo Li, Buyun Ma, Changxin Miao, Changyi Wan, Chen Xu, Dapeng Shi, Dingyuan Hu, Enle Liu, Guanzhe Huang, Gulin Yan, Hanpeng Hu, Haonan Jia, Jiahao Gong, Jiaoren Wu, Jie Wu, Jie Yang, Junzhe Lin, Kaixiang Li, Lei Xia, Longlong Gu, Ming Li, Nie Hao, Ranchen Ming, Shaoliang Pang, SiQi Liu, Song Yuan, Tiancheng Cao, Wen Li, Wenqing He, Xu Zhao, Xuelin Zhang, Yanbo Yu, Yinmin Zhong, Yu Zhou, Yuanwei Liang, Yuanwei Lu, Yuxiang Yang, Zidong Yang, Zili Zhang, Binxing Jiao, Heung-Yeung Shum, Jiansheng Chen, Jing Li, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Daxin Jiang, Shuchang Zhou, Chen Hu
This work contributes a promising solution for end-to-end LALMs and highlights the critical role of token-based vocoder in enhancing overall performance for AQAA tasks.
no code implementations • 1 Jun 2025 • Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xinyun Liu, Yulia Tsvetkov
We then compare them against existing settings on both in-domain synthetic tasks and out-of-domain real-world tasks with implicit graph structures such as multi-hop QA, structured planning, and more.
1 code implementation • 24 Apr 2025 • Haokai Zhang, Shengtao Zhang, Zijian Cai, Heng Wang, Ruixuan Zhu, Zinan Zeng, Minnan Luo
User bias is calculated through dynamic graph modeling of review history.
1 code implementation • 10 Apr 2025 • Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, HaoNing Wu, Haotian Yao, Haoyu Lu, Heng Wang, Hongcheng Gao, Huabin Zheng, Jiaming Li, Jianlin Su, Jianzhou Wang, Jiaqi Deng, Jiezhong Qiu, Jin Xie, Jinhong Wang, Jingyuan Liu, Junjie Yan, Kun Ouyang, Liang Chen, Lin Sui, Longhui Yu, Mengfan Dong, Mengnan Dong, Nuo Xu, Pengyu Cheng, Qizheng Gu, Runjie Zhou, Shaowei Liu, Sihan Cao, Tao Yu, Tianhui Song, Tongtong Bai, Wei Song, Weiran He, Weixiao Huang, Weixin Xu, Xiaokun Yuan, Xingcheng Yao, Xingzhe Wu, Xinxing Zu, Xinyu Zhou, Xinyuan Wang, Y. Charles, Yan Zhong, Yang Li, Yangyang Hu, Yanru Chen, Yejie Wang, Yibo Liu, Yibo Miao, Yidao Qin, Yimin Chen, Yiping Bao, Yiqin Wang, Yongsheng Kang, Yuanxin Liu, Yulun Du, Yuxin Wu, Yuzhi Wang, Yuzi Yan, Zaida Zhou, Zhaowei Li, Zhejun Jiang, Zheng Zhang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Zijia Zhao, Ziwei Chen, Zongyu Lin
We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2. 8B parameters in its language decoder (Kimi-VL-A3B).
no code implementations • 1 Apr 2025 • Weizhi Wang, Yu Tian, Linjie Yang, Heng Wang, Xifeng Yan
We open-source all aspects of our work, including compute-efficient and data-efficient training details, data filtering methods, sequence packing scripts, pre-training data in WebDataset format, FSDP-based training codebase, and both base and instruction-tuned model checkpoints.
1 code implementation • CVPR 2025 • Zhe Shan, Yang Liu, Lei Zhou, Cheng Yan, Heng Wang, Xia Xie
In this work, we propose ROS-SAM, a method designed to achieve high-quality interactive segmentation while preserving generalization across diverse remote sensing data.
no code implementations • 14 Mar 2025 • Heng Wang, Yotaro Shimose, Shingo Takamatsu
Advertising banners are critical for capturing user attention and enhancing advertising campaign effectiveness.
1 code implementation • 26 Feb 2025 • Jiayi Fu, Xuandong Zhao, Chengyuan Yao, Heng Wang, Qi Han, Yanghua Xiao
Reinforcement Learning from Human Feedback (RLHF) is essential for aligning large language models (LLMs) with human values.
1 code implementation • 17 Feb 2025 • Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, HongYu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, Jianchang Wu, Jiangjie Zhen, Ranchen Ming, Song Yuan, Xuelin Zhang, Yu Zhou, Bingxin Li, Buyun Ma, Hongyuan Wang, Kang An, Wei Ji, Wen Li, Xuan Wen, Xiangwen Kong, Yuankai Ma, Yuanwei Liang, Yun Mou, Bahtiyar Ahmidi, Bin Wang, Bo Li, Changxin Miao, Chen Xu, Chenrun Wang, Dapeng Shi, Deshan Sun, Dingyuan Hu, Dula Sai, Enle Liu, Guanzhe Huang, Gulin Yan, Heng Wang, Haonan Jia, Haoyang Zhang, Jiahao Gong, Junjing Guo, Jiashuai Liu, Jiahong Liu, Jie Feng, Jie Wu, Jiaoren Wu, Jie Yang, Jinguo Wang, Jingyang Zhang, Junzhe Lin, Kaixiang Li, Lei Xia, Li Zhou, Liang Zhao, Longlong Gu, Mei Chen, Menglin Wu, Ming Li, Mingxiao Li, Mingliang Li, Mingyao Liang, Na Wang, Nie Hao, Qiling Wu, Qinyuan Tan, Ran Sun, Shuai Shuai, Shaoliang Pang, Shiliang Yang, Shuli Gao, Shanshan Yuan, SiQi Liu, Shihong Deng, Shilei Jiang, Sitong Liu, Tiancheng Cao, Tianyu Wang, Wenjin Deng, Wuxun Xie, Weipeng Ming, Wenqing He, Wen Sun, Xin Han, Xin Huang, Xiaomin Deng, Xiaojia Liu, Xin Wu, Xu Zhao, Yanan Wei, Yanbo Yu, Yang Cao, Yangguang Li, Yangzhen Ma, Yanming Xu, Yaoyu Wang, Yaqiang Shi, Yilei Wang, Yizhuang Zhou, Yinmin Zhong, Yang Zhang, Yaoben Wei, Yu Luo, Yuanwei Lu, Yuhe Yin, Yuchu Luo, Yuanhao Ding, Yuting Yan, Yaqi Dai, Yuxiang Yang, Zhe Xie, Zheng Ge, Zheng Sun, Zhewei Huang, Zhichao Chang, Zhisheng Guan, Zidong Yang, Zili Zhang, Binxing Jiao, Daxin Jiang, Heung-Yeung Shum, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu
Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following.
3 code implementations • 14 Feb 2025 • Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, Bizhu Huang, Bo wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu, Chenguang Yu, Dapeng Shi, Dingyuan Hu, Enle Liu, Gang Yu, Ge Yang, Guanzhe Huang, Gulin Yan, Haiyang Feng, Hao Nie, Haonan Jia, Hanpeng Hu, Hanqi Chen, Haolong Yan, Heng Wang, Hongcheng Guo, Huilin Xiong, Huixin Xiong, Jiahao Gong, Jianchang Wu, Jiaoren Wu, Jie Wu, Jie Yang, Jiashuai Liu, Jiashuo Li, Jingyang Zhang, Junjing Guo, Junzhe Lin, Kaixiang Li, Lei Liu, Lei Xia, Liang Zhao, Liguo Tan, Liwen Huang, Liying Shi, Ming Li, Mingliang Li, Muhua Cheng, Na Wang, Qiaohui Chen, Qinglin He, Qiuyan Liang, Quan Sun, Ran Sun, Rui Wang, Shaoliang Pang, Shiliang Yang, Sitong Liu, SiQi Liu, Shuli Gao, Tiancheng Cao, Tianyu Wang, Weipeng Ming, Wenqing He, Xu Zhao, Xuelin Zhang, Xianfang Zeng, Xiaojia Liu, Xuan Yang, Yaqi Dai, Yanbo Yu, Yang Li, Yineng Deng, Yingming Wang, Yilei Wang, Yuanwei Lu, Yu Chen, Yu Luo, Yuchu Luo, Yuhe Yin, Yuheng Feng, Yuxiang Yang, Zecheng Tang, Zekai Zhang, Zidong Yang, Binxing Jiao, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Heung-Yeung Shum, Daxin Jiang
We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length.
2 code implementations • 13 Feb 2025 • Xintao Wang, Heng Wang, Yifei Zhang, Xinfeng Yuan, Rui Xu, Jen-tse Huang, Siyu Yuan, Haoran Guo, Jiangjie Chen, Wei Wang, Yanghua Xiao, Shuchang Zhou
It provides authentic dialogues with real-world intricacies, as well as diverse data types such as conversation setups, character experiences and internal thoughts.
4 code implementations • 7 Jan 2025 • Nvidia, :, Niket Agarwal, Arslan Ali, Maciej Bala, Yogesh Balaji, Erik Barker, Tiffany Cai, Prithvijit Chattopadhyay, Yongxin Chen, Yin Cui, Yifan Ding, Daniel Dworakowski, Jiaojiao Fan, Michele Fenzi, Francesco Ferroni, Sanja Fidler, Dieter Fox, Songwei Ge, Yunhao Ge, Jinwei Gu, Siddharth Gururani, Ethan He, Jiahui Huang, Jacob Huffman, Pooya Jannaty, Jingyi Jin, Seung Wook Kim, Gergely Klár, Grace Lam, Shiyi Lan, Laura Leal-Taixe, Anqi Li, Zhaoshuo Li, Chen-Hsuan Lin, Tsung-Yi Lin, Huan Ling, Ming-Yu Liu, Xian Liu, Alice Luo, Qianli Ma, Hanzi Mao, Kaichun Mo, Arsalan Mousavian, Seungjun Nah, Sriharsha Niverty, David Page, Despoina Paschalidou, Zeeshan Patel, Lindsey Pavao, Morteza Ramezanali, Fitsum Reda, Xiaowei Ren, Vasanth Rao Naik Sabavat, Ed Schmerling, Stella Shi, Bartosz Stefaniak, Shitao Tang, Lyne Tchapmi, Przemek Tredak, Wei-Cheng Tseng, Jibin Varghese, Hao Wang, Haoxiang Wang, Heng Wang, Ting-Chun Wang, Fangyin Wei, Xinyue Wei, Jay Zhangjie Wu, Jiashu Xu, Wei Yang, Lin Yen-Chen, Xiaohui Zeng, Yu Zeng, Jing Zhang, Qinsheng Zhang, Yuxuan Zhang, Qingqing Zhao, Artur Zolkowski
In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups.
1 code implementation • 11 Dec 2024 • Khalil Mrini, Hanlin Lu, Linjie Yang, Weilin Huang, Heng Wang
Text-to-image generation has advanced rapidly, yet aligning complex textual prompts with generated visuals remains challenging, especially with intricate object relationships and fine-grained details.
1 code implementation • 23 Nov 2024 • Wei Guo, Heng Wang, Jianbo Ma, Weidong Cai
By addressing V2A generation at the sound-source level, SSV2A surpasses state-of-the-art methods in both generation fidelity and relevance as evidenced by extensive experiments.
no code implementations • 4 Nov 2024 • Fan Wu, Muhammad Bilal, Haolong Xiang, Heng Wang, Jinjun Yu, Xiaolong Xu
In this paper, an edge-cloud collaborative early-warning system is proposed to enable real-time and downtime-tolerant fault diagnosis of RTMs, providing a new paradigm for the deployment of models in safety-critical scenarios.
no code implementations • 29 Oct 2024 • Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Yui Lo, Yuqian Chen, Lauren J. O'Donnell, Weidong Cai
Reconstructing neuron morphology from 3D light microscope imaging data is critical to aid neuroscientists in analyzing brain networks and neuroanatomy.
1 code implementation • 30 Sep 2024 • Zhen Yang, Yanpeng Dong, Heng Wang, Lichao Ma, Zijian Cui, Qi Liu, Haoran Pei
Multi-sensor fusion significantly enhances the accuracy and robustness of 3D semantic occupancy prediction, which is crucial for autonomous driving and robotics.
Ranked #1 on
Prediction Of Occupancy Grid Maps
on Occ3D-nuScenes
no code implementations • 22 Sep 2024 • Changsheng Zhao, Jianhua Zhang, Yuxiang Zhang, Lei Tian, Heng Wang, Hanyuan Jiang, Yameng Liu, Wenjun Chen, Tao Jiang, Guangyi Liu
Integrated sensing and communication (ISAC) has been recognized as the key technology in the vision of the sixth generation (6G) era.
no code implementations • 21 Sep 2024 • Zhiyuan Li, Dongnan Liu, Chaoyi Zhang, Heng Wang, Tengfei Xue, Weidong Cai
Recent advancements in Vision-Language (VL) research have sparked new benchmarks for complex visual reasoning, challenging models' advanced reasoning ability.
1 code implementation • 13 Sep 2024 • Ruiqi Zhong, Heng Wang, Dan Klein, Jacob Steinhardt
To make sense of massive data, we often fit simplified models and then interpret the parameters; for example, we cluster the text embeddings and then interpret the mean parameters of each cluster.
no code implementations • 11 Sep 2024 • Yan-Bo Lin, Yu Tian, Linjie Yang, Gedas Bertasius, Heng Wang
We present a framework for learning to generate background music from video inputs.
no code implementations • 6 Sep 2024 • Rui Yu, Runkai Zhao, Cong Nie, Heng Wang, Huaicheng Yan, Meng Wang
Specifically, Motion-Guided Feature Aggregation (MGFA) is proposed to utilize the object trajectory from previous and future motion states to model spatial-temporal correlations into gaussian heatmap over a driving sequence.
1 code implementation • 15 Aug 2024 • Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Weidong Cai
However, will these causalities remain straightforward for Vision Large Language Models (VLLMs) when only visual hints are provided?
1 code implementation • 23 Jun 2024 • Yizhuo Zhang, Heng Wang, Shangbin Feng, Zhaoxuan Tan, Xiaochuang Han, Tianxing He, Yulia Tsvetkov
To this end, we propose the NLGift benchmark, an evaluation suite of LLM graph reasoning generalization: whether LLMs could go beyond semantic, numeric, structural, reasoning patterns in the synthetic training data and improve utility on real-world graph-based tasks.
1 code implementation • 11 Jun 2024 • Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie
The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.
no code implementations • 15 May 2024 • Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai
This task enables the dance generation of specific individuals without requiring keypoint annotations, making it more versatile and applicable to various situations.
no code implementations • 4 May 2024 • Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai
To address this limitation, we aim to distill the consensus knowledge from massive natural image data to aid the segmentation model in learning the complex neuron structures.
no code implementations • 15 Apr 2024 • Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie
This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200, 000 edits.
no code implementations • 19 Mar 2024 • Heng Wang, Jianhua Zhang, Gaofeng Nie, Li Yu, Zhiqiang Yuan, Tongjie Li, Jialin Wang, Guangyi Liu
Digital twin channel (DTC) is the real-time mapping of a wireless channel from the physical world to the digital world, which is expected to provide significant performance enhancements for the sixth-generation (6G) air-interface design.
no code implementations • 8 Mar 2024 • Zinan Zeng, Sen Ye, Zijian Cai, Heng Wang, YuHan Liu, Haokai Zhang, Minnan Luo
For instance, the metadata and the corresponding user's information of a review could be helpful.
no code implementations • 5 Mar 2024 • Weizhi Wang, Khalil Mrini, Linjie Yang, Sateesh Kumar, Yu Tian, Xifeng Yan, Heng Wang
Our MLM filter can generalize to different models and tasks, and be used as a drop-in replacement for CLIPScore.
no code implementations • 28 Feb 2024 • Dewei Wang, Bhaskar Mitra, Sameer Nekkalapu, Sohom Datta, Bibi Matthew, Rounak Meyur, Heng Wang, Slaven Kincic
As the power system continues to be flooded with intermittent resources, it becomes more important to accurately assess the role of hydro and its impact on the power grid.
1 code implementation • 16 Feb 2024 • Herun Wan, Shangbin Feng, Zhaoxuan Tan, Heng Wang, Yulia Tsvetkov, Minnan Luo
Large language models are limited by challenges in factuality and hallucinations to be directly employed off-the-shelf for judging the veracity of news articles, where factual accuracy is paramount.
no code implementations • CVPR 2024 • Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang
This amplifies the effect of visual tokens on text generation especially when the relative distance is longer between visual and text tokens.
1 code implementation • CVPR 2024 • Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang, Heng Wang
While existing datasets mainly comprise landscape mode videos, our paper seeks to introduce portrait mode videos to the research community and highlight the unique challenges associated with this video format.
1 code implementation • 16 Dec 2023 • Mingfei Han, Linjie Yang, Xiaojun Chang, Heng Wang
A human need to capture both the event in every shot and associate them together to understand the story behind it.
Ranked #1 on
video narration captioning
on Shot2Story20K
no code implementations • 12 Dec 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Yuchen Xian, Jiashi Feng, Yi Yang
This amplifies the effect of visual tokens on text generation, especially when the relative distance is longer between visual and text tokens.
Ranked #2 on
Question Answering
on NExT-QA (Open-ended VideoQA)
no code implementations • 20 Nov 2023 • Xiaotian Han, Quanzeng You, Yongfei Liu, Wentao Chen, Huangjie Zheng, Khalil Mrini, Xudong Lin, Yiqi Wang, Bohan Zhai, Jianbo Yuan, Heng Wang, Hongxia Yang
To mitigate this issue, we manually curate a benchmark dataset specifically designed for MLLMs, with a focus on complex reasoning tasks.
no code implementations • 2 Nov 2023 • Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold
Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details.
no code implementations • 4 Oct 2023 • Jianglong Ye, Peng Wang, Kejie Li, Yichun Shi, Heng Wang
Specifically, we decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions.
1 code implementation • 2 Oct 2023 • Yike Wang, Shangbin Feng, Heng Wang, Weijia Shi, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov
To this end, we introduce an evaluation framework for simulating contextual knowledge conflicts and quantitatively evaluating to what extent LLMs achieve these goals.
no code implementations • 27 Sep 2023 • Haichao Yu, Yu Tian, Sateesh Kumar, Linjie Yang, Heng Wang
DataComp is a new benchmark dedicated to evaluating different methods for data filtering.
1 code implementation • 24 Sep 2023 • Runkai Zhao, Yuwen Heng, Heng Wang, Yuanda Gao, Shilei Liu, Changhao Yao, Jiawen Chen, Weidong Cai
Advanced Driver-Assistance Systems (ADAS) have successfully integrated learning-based techniques into vehicle perception and decision-making.
no code implementations • 14 Sep 2023 • David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang, Song Bai, Mike Zheng Shou
Dataset condensation aims to condense a large dataset with a lot of training samples into a small set.
1 code implementation • 18 Aug 2023 • Heng Wang, Jianbo Ma, Santiago Pascual, Richard Cartwright, Weidong Cai
In this paper, we propose a lightweight solution to this problem by leveraging foundation models, specifically CLIP, CLAP, and AudioLDM.
Ranked #2 on
Video-to-Sound Generation
on VGG-Sound
1 code implementation • 27 Jul 2023 • Zhiyuan Li, Dongnan Liu, Heng Wang, Chaoyi Zhang, Weidong Cai
Recently, training an image captioner without annotated image-sentence pairs has gained traction.
1 code implementation • ICCV 2023 • Cheng-En Wu, Yu Tian, Haichao Yu, Heng Wang, Pedro Morgado, Yu Hen Hu, Linjie Yang
Vision-language models such as CLIP learn a generic text-image embedding from large-scale training data.
no code implementations • 21 Jun 2023 • YuHan Shen, Linjie Yang, Longyin Wen, Haichao Yu, Ehsan Elhamifar, Heng Wang
Recent focus in video captioning has been on designing architectures that can consume both video and text modalities, and using large-scale video datasets with text transcripts for pre-training, such as HowTo100M.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
2 code implementations • NeurIPS 2023 • Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, Yulia Tsvetkov
We then propose Build-a-Graph Prompting and Algorithmic Prompting, two instruction-based approaches to enhance LLMs in solving natural language graph problems.
1 code implementation • 22 Apr 2023 • Heng Wang, Wenqian Zhang, Yuyang Bai, Zhaoxuan Tan, Shangbin Feng, Qinghua Zheng, Minnan Luo
We then propose MVSD, a novel Multi-View Spoiler Detection framework that takes into account the external knowledge about movies and user activities on movie review platforms.
1 code implementation • 8 Apr 2023 • Shuangkang Fang, Yufeng Wang, Yi Yang, Weixin Xu, Heng Wang, Wenrui Ding, Shuchang Zhou
For instance, PVD-AL can distill an MLP-based model from a Hashtables-based model at a 10~20X faster speed and 0. 8dB~2dB higher PSNR than training the MLP-based model from scratch.
no code implementations • 6 Apr 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
1 code implementation • CVPR 2023 • Shuhong Chen, Kevin Zhang, Yichun Shi, Heng Wang, Yiheng Zhu, Guoxian Song, Sizhe An, Janus Kristjansson, Xiao Yang, Matthias Zwicker
We propose PAniC-3D, a system to reconstruct stylized 3D character heads directly from illustrated (p)ortraits of (ani)me (c)haracters.
no code implementations • 9 Mar 2023 • Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran
Many top-down architectures for instance segmentation achieve significant success when trained and tested on pre-defined closed-world taxonomy.
no code implementations • 18 Jan 2023 • Fan Ma, Xiaojie Jin, Heng Wang, Jingjia Huang, Linchao Zhu, Jiashi Feng, Yi Yang
Specifically, text-video localization consists of moment retrieval, which predicts start and end boundaries in videos given the text description, and text localization which matches the subset of texts with the video features.
1 code implementation • CVPR 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
1 code implementation • 29 Nov 2022 • Shuangkang Fang, Weixin Xu, Heng Wang, Yi Yang, Yufeng Wang, Shuchang Zhou
In this paper, we propose Progressive Volume Distillation (PVD), a systematic distillation method that allows any-to-any conversions between different architectures, including MLP, sparse or low-rank tensors, hashtables and their compositions.
Ranked #1 on
Novel View Synthesis
on NeRF
(Average PSNR metric)
1 code implementation • 15 Oct 2022 • Runkai Zhao, Heng Wang, Chaoyi Zhang, Weidong Cai
In this paper, we propose a novel framework for 3D neuron reconstruction.
1 code implementation • 9 Jun 2022 • Shangbin Feng, Zhaoxuan Tan, Herun Wan, Ningnan Wang, Zilong Chen, Binchi Zhang, Qinghua Zheng, Wenqian Zhang, Zhenyu Lei, Shujie Yang, Xinshun Feng, Qingyue Zhang, Hongrui Wang, YuHan Liu, Yuyang Bai, Heng Wang, Zijian Cai, Yanbo Wang, Lijing Zheng, Zihan Ma, Jundong Li, Minnan Luo
Twitter bot detection has become an increasingly important task to combat misinformation, facilitate social media moderation, and preserve the integrity of the online discourse.
no code implementations • 1 Jun 2022 • Shunqi Mao, Chaoyi Zhang, Heng Wang, Weidong Cai
In audio-visual navigation (AVN), an intelligent agent needs to navigate to a constantly sound-making object in complex 3D environments based on its audio and visual perceptions.
1 code implementation • 22 Apr 2022 • Heng Wang, Chaoyi Zhang, Jianhui Yu, Weidong Cai
Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level 3D scene understanding.
Ranked #8 on
3D dense captioning
on ScanRefer Dataset
1 code implementation • CVPR 2022 • Weiyao Wang, Matt Feiszli, Heng Wang, Jitendra Malik, Du Tran
From PA we construct a large set of pseudo-ground-truth instance masks; combined with human-annotated instance masks we train GGNs and significantly outperform the SOTA on open-world instance segmentation on various benchmarks including COCO, LVIS, ADE20K, and UVO.
no code implementations • 8 Apr 2022 • Yong Li, Heng Wang, Xiang Ye
Motivated by ANIL, we rethink the role of adaption in the feature extractor of CNAPs, which is a state-of-the-art representative few-shot method.
1 code implementation • 9 Dec 2021 • Jianhui Yu, Chaoyi Zhang, Heng Wang, Dingxin Zhang, Yang song, Tiange Xiang, Dongnan Liu, Weidong Cai
General point clouds have been increasingly investigated for different tasks, and recently Transformer-based networks are proposed for point cloud analysis.
Ranked #1 on
3D Point Cloud Classification
on IntrA
1 code implementation • 18 Nov 2021 • Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer
We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing.
no code implementations • ICCV 2021 • Xinyu Gong, Heng Wang, Zheng Shou, Matt Feiszli, Zhangyang Wang, Zhicheng Yan
We design a multivariate search space, including 6 search variables to capture a wide variety of choices in designing two-stream models.
no code implementations • 14 Aug 2021 • Heng Wang, Chaoyi Zhang, Jianhui Yu, Yang song, SiQi Liu, Wojciech Chrzanowski, Weidong Cai
Recently, a series of deep learning based segmentation methods have been proposed to improve the quality of raw 3D optical image stacks by removing noises and restoring neuronal structures from low-contrast background.
no code implementations • 29 Jun 2021 • Xiang Ye, Zihang He, Heng Wang, Yong Li
Instead, we verify the crucial role of feature map multiplication in attention mechanism and uncover a fundamental impact of feature map multiplication on the learned landscapes of CNNs: with the high order non-linearity brought by the feature map multiplication, it played a regularization role on CNNs, which made them learn smoother and more stable landscapes near real samples compared to vanilla CNNs.
no code implementations • ICCV 2021 • Weiyao Wang, Matt Feiszli, Heng Wang, Du Tran
Current state-of-the-art object detection and segmentation methods work well under the closed-world assumption.
no code implementations • CVPR 2021 • Xitong Yang, Haoqi Fan, Lorenzo Torresani, Larry Davis, Heng Wang
The standard way of training video models entails sampling at each iteration a single clip from a video and optimizing the clip prediction with respect to the video-level label.
16 code implementations • 9 Feb 2021 • Gedas Bertasius, Heng Wang, Lorenzo Torresani
We present a convolution-free approach to video classification built exclusively on self-attention over space and time.
Ranked #1 on
Video Question Answering
on Howto100M-QA
no code implementations • 22 Jan 2021 • Heng Wang, Yang song, Chaoyi Zhang, Jianhui Yu, SiQi Liu, Hanchuan Peng, Weidong Cai
One of the critical steps in improving accurate single neuron reconstruction from three-dimensional (3D) optical microscope images is the neuronal structure segmentation.
no code implementations • ICCV 2021 • Xiaohan Wang, Linchao Zhu, Heng Wang, Yi Yang
To avoid these additional costs, we propose an end-to-end Interactive Prototype Learning (IPL) framework to learn better active object representations by leveraging the motion cues from the actor.
1 code implementation • 14 Mar 2020 • Bin Hou, Qingjie Liu, Heng Wang, Yunhong Wang
Traditional change detection methods usually follow the image differencing, change feature extraction and classification framework, and their performance is limited by such simple image domain differencing and also the hand-crafted features.
no code implementations • 19 Dec 2019 • Xingyi Duan, Baoxin Wang, Ziyue Wang, Wentao Ma, Yiming Cui, Dayong Wu, Shijin Wang, Ting Liu, Tianxiang Huo, Zhen Hu, Heng Wang, Zhiyuan Liu
We present a Chinese judicial reading comprehension (CJRC) dataset which contains approximately 10K documents and almost 50K questions with answers.
no code implementations • 14 Dec 2019 • Heng Wang, Donghao Zhang, Yang song, Heng Huang, Mei Chen, Weidong Cai
Our contribution consists of the proposal of a significant task worth investigating and a naive baseline of solving it.
2 code implementations • 20 Nov 2019 • Chaojun Xiao, Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Tianyang Zhang, Xianpei Han, Zhen Hu, Heng Wang, Jianfeng Xu
In this paper, we introduce CAIL2019-SCM, Chinese AI and Law 2019 Similar Case Matching dataset.
no code implementations • IJCNLP 2019 • Heng Wang, Shuangyin Li, Rong pan, Mingzhi Mao
Meanwhile, a novel mechanism of reinforcement learning is proposed by forcing an agent to walk forward every step to avoid the agent stalling at the same entity node constantly.
no code implementations • 10 Jun 2019 • Linchao Zhu, Laura Sevilla-Lara, Du Tran, Matt Feiszli, Yi Yang, Heng Wang
FASTER aims to leverage the redundancy between neighboring clips and reduce the computational cost by learning to aggregate the predictions from models of different complexities.
Ranked #29 on
Action Recognition
on UCF101
no code implementations • CVPR 2020 • Heng Wang, Du Tran, Lorenzo Torresani, Matt Feiszli
Motion is a salient cue to recognize actions in video.
Ranked #115 on
Action Classification
on Kinetics-400
3 code implementations • CVPR 2019 • Deepti Ghadiyaram, Matt Feiszli, Du Tran, Xueting Yan, Heng Wang, Dhruv Mahajan
Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?
Ranked #2 on
Egocentric Activity Recognition
on EPIC-KITCHENS-55
(Actions Top-1 (S2) metric)
7 code implementations • ICCV 2019 • Du Tran, Heng Wang, Lorenzo Torresani, Matt Feiszli
It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.
Ranked #1 on
Action Recognition
on Sports-1M
no code implementations • 3 Apr 2019 • Heng Wang, Mingzhi Mao
The goal of knowledge representation learning is to embed entities and relations into a low-dimensional, continuous vector space.
no code implementations • 3 Apr 2019 • Jinbin Zhang, Heng Wang
Chinese word usage errors often occur in non-native Chinese learners' writing.
2 code implementations • 13 Oct 2018 • Haoxi Zhong, Chaojun Xiao, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Yansong Feng, Xianpei Han, Zhen Hu, Heng Wang, Jianfeng Xu
In this paper, we give an overview of the Legal Judgment Prediction (LJP) competition at Chinese AI and Law challenge (CAIL2018).
no code implementations • ECCV 2018 • Jamie Ray, Heng Wang, Du Tran, YuFei Wang, Matt Feiszli, Lorenzo Torresani, Manohar Paluri
The videos retrieved by the search engines are then veried for correctness by human annotators.
3 code implementations • 4 Jul 2018 • Chaojun Xiao, Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Yansong Feng, Xianpei Han, Zhen Hu, Heng Wang, Jianfeng Xu
In this paper, we introduce the \textbf{C}hinese \textbf{AI} and \textbf{L}aw challenge dataset (CAIL2018), the first large-scale Chinese legal dataset for judgment prediction.
no code implementations • 20 Feb 2018 • Yao Lu, Jack Valmadre, Heng Wang, Juho Kannala, Mehrtash Harandi, Philip H. S. Torr
State-of-the-art neural network models estimate large displacement optical flow in multi-resolution and use warping to propagate the estimation between two resolutions.
1 code implementation • 1 Dec 2017 • Heng Wang, Zengchang Qin, Tao Wan
We propose the VGAN model where the generative model is composed of recurrent neural network and VAE.
24 code implementations • CVPR 2018 • Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann Lecun, Manohar Paluri
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.
Ranked #3 on
Action Recognition
on Sports-1M
no code implementations • 25 Jul 2017 • Shujian Yu, Zubin Abraham, Heng Wang, Mohak Shah, Yantao Wei, José C. Príncipe
A fundamental issue for statistical classification models in a streaming environment is that the joint distribution between predictor and response variables changes over time (a phenomenon also known as concept drifts), such that their classification performance deteriorates dramatically.
no code implementations • 4 Nov 2015 • Hongyu Yang, Di Huang, Yunhong Wang, Heng Wang, Yuanyan Tang
Face aging simulation has received rising investigations nowadays, whereas it still remains a challenge to generate convincing and natural age-progressed face images.
no code implementations • 21 Apr 2015 • Heng Wang, Dan Oneata, Jakob Verbeek, Cordelia Schmid
We also use the homography to cancel out camera motion from the optical flow.
1 code implementation • 4 Apr 2015 • Heng Wang, Zubin Abraham
Common statistical prediction models often require and assume stationarity in the data.
2 code implementations • 30 Dec 2014 • Heng Wang, Da Zheng, Randal Burns, Carey Priebe
A canonical problem in graph mining is the detection of dense communities.
Social and Information Networks Physics and Society