no code implementations • 4 Feb 2025 • Xueqing Deng, Qihang Yu, Ali Athar, Chenglin Yang, Linjie Yang, Xiaojie Jin, Xiaohui Shen, Liang-Chieh Chen
This dataset sets a new benchmark for evaluating models on joint panoptic segmentation and grounded captioning tasks, addressing the need for high-quality, detailed image-text annotations in multi-modal learning.
no code implementations • 31 Dec 2024 • Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang
Diffusion models have gained tremendous success in text-to-image generation, yet still lag behind on visual understanding tasks, an area dominated by autoregressive vision-language models.
1 code implementation • 11 Dec 2024 • Khalil Mrini, Hanlin Lu, Linjie Yang, Weilin Huang, Heng Wang
Text-to-image generation has advanced rapidly, yet aligning complex textual prompts with generated visuals remains challenging, especially with intricate object relationships and fine-grained details.
no code implementations • 11 Sep 2024 • Yan-Bo Lin, Yu Tian, Linjie Yang, Gedas Bertasius, Heng Wang
We present a framework for learning to generate background music from video inputs.
no code implementations • 9 Sep 2024 • Henghui Ding, Lingyi Hong, Chang Liu, Ning Xu, Linjie Yang, Yuchen Fan, Deshui Miao, Yameng Gu, Xin Li, Zhenyu He, YaoWei Wang, Ming-Hsuan Yang, Jinming Chai, Qin Ma, Junpei Zhang, Licheng Jiao, Fang Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Xu Liu, Lingling Li, Hao Fang, Feiyu Pan, Xiankai Lu, Wei zhang, Runmin Cong, Tuyen Tran, Bin Cao, Yisi Zhang, Hanyi Wang, Xingjian He, Jing Liu
Despite the promising performance of current video segmentation models on existing benchmarks, these models still struggle with complex scenes.
1 code implementation • 11 Jun 2024 • Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie
The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.
no code implementations • 5 Mar 2024 • Weizhi Wang, Khalil Mrini, Linjie Yang, Sateesh Kumar, Yu Tian, Xifeng Yan, Heng Wang
Our MLM filter can generalize to different models and tasks, and be used as a drop-in replacement for CLIPScore.
1 code implementation • CVPR 2024 • Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang, Heng Wang
While existing datasets mainly comprise landscape mode videos, our paper seeks to introduce portrait mode videos to the research community and highlight the unique challenges associated with this video format.
1 code implementation • 16 Dec 2023 • Mingfei Han, Linjie Yang, Xiaojun Chang, Heng Wang
A human needs to capture the event in every shot and associate the shots together to understand the story behind the video.
Ranked #1 on video narration captioning on Shot2Story20K
no code implementations • 8 Oct 2023 • Haogeng Liu, Qihang Fan, Tingkai Liu, Linjie Yang, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task.
no code implementations • 3 Oct 2023 • Xueqing Deng, Qi Fan, Xiaojie Jin, Linjie Yang, Peng Wang
Specifically, SFA consists of external adapters and internal adapters, which are applied sequentially over a transformer model.
no code implementations • 27 Sep 2023 • Haichao Yu, Yu Tian, Sateesh Kumar, Linjie Yang, Heng Wang
DataComp is a new benchmark dedicated to evaluating different methods for data filtering.
1 code implementation • 23 Jul 2023 • Yiming Cui, Linjie Yang, Haichao Yu
Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query.
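The query mechanism described above can be sketched as a single cross-attention step, where each learned query aggregates information from the encoder's feature tokens. This is a generic illustration with made-up dimensions, not the paper's exact architecture:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def query_cross_attention(queries, features):
    """Each learned detection query attends over the feature tokens to
    gather evidence for one object prediction (dimensions illustrative)."""
    attn = softmax(queries @ features.T / np.sqrt(queries.shape[-1]))
    return attn @ features  # one aggregated descriptor per query
```

A downstream head would then map each per-query descriptor to a box location and a category label.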
1 code implementation • ICCV 2023 • Cheng-En Wu, Yu Tian, Haichao Yu, Heng Wang, Pedro Morgado, Yu Hen Hu, Linjie Yang
Vision-language models such as CLIP learn a generic text-image embedding from large-scale training data.
no code implementations • 21 Jun 2023 • YuHan Shen, Linjie Yang, Longyin Wen, Haichao Yu, Ehsan Elhamifar, Heng Wang
Recent focus in video captioning has been on designing architectures that can consume both video and text modalities, and using large-scale video datasets with text transcripts for pre-training, such as HowTo100M.
Automatic Speech Recognition (ASR)
no code implementations • 6 Apr 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
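The matching step can be sketched as nearest-neighbour retrieval over global image descriptors. This is a generic VPR baseline with descriptor extraction omitted, not the paper's method:

```python
import numpy as np

def retrieve_place(query_desc, ref_descs):
    """Return the index of the most similar reference descriptor by cosine
    similarity, plus the similarity score (a standard retrieval baseline)."""
    q = query_desc / np.linalg.norm(query_desc)
    r = ref_descs / np.linalg.norm(ref_descs, axis=1, keepdims=True)
    sims = r @ q
    best = int(np.argmax(sims))
    return best, float(sims[best])
```

The query image's location is then taken from the metadata of the retrieved reference image.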
1 code implementation • 15 Mar 2023 • Yiming Cui, Linjie Yang
With Transformer-based object detectors achieving better performance on image-domain tasks, recent works have begun to extend those methods to video object detection.
1 code implementation • CVPR 2023 • Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang
Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.
1 code implementation • 16 Nov 2022 • Taojiannan Yang, Linjie Yang, Xiaojie Jin, Chen Chen
In this paper, we revisit these training-free metrics and find that: (1) the number of parameters (#Param), the most straightforward training-free metric, is overlooked in previous works but is surprisingly effective, and (2) recent training-free metrics largely rely on the #Param information to rank networks.
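The #Param proxy can be illustrated on a toy fully connected search space; the helper names and MLP parameterization below are my own, for illustration only:

```python
def mlp_param_count(layer_widths):
    """#Param of a fully connected network: weights plus biases per layer."""
    return sum(w_in * w_out + w_out
               for w_in, w_out in zip(layer_widths, layer_widths[1:]))

def rank_by_params(candidates):
    """Rank candidate architectures by the #Param proxy, largest first —
    the 'most straightforward training-free metric' referred to above."""
    return sorted(candidates, key=mlp_param_count, reverse=True)
```

Despite its simplicity, ranking by parameter count requires no training and no forward passes, which is what makes it a natural baseline for training-free NAS metrics.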
no code implementations • 12 Jul 2022 • Yiming Cui, Linjie Yang, Ding Liu
Object detection is a basic computer vision task that localizes and categorizes objects in a given image.
1 code implementation • 25 Aug 2021 • Shanchuan Lin, Linjie Yang, Imran Saleemi, Soumyadip Sengupta
We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.
1 code implementation • CVPR 2021 • Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo
Lastly, we propose an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space and finds optimal architectures given various tasks and computation resources.
no code implementations • 16 May 2021 • Haichao Yu, Linjie Yang, Humphrey Shi
Post-training quantization methods use a set of calibration data to compute quantization ranges for network parameters and activations.
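The calibration step described above can be sketched as follows: track the activation range over a few calibration batches, then use that range to derive an affine quantization scale and zero-point. The function names and the uint8 range are illustrative assumptions, not the paper's method:

```python
import numpy as np

def calibrate_range(calibration_batches):
    """Track the global min/max of values over a set of calibration batches."""
    lo, hi = np.inf, -np.inf
    for batch in calibration_batches:
        lo = min(lo, float(batch.min()))
        hi = max(hi, float(batch.max()))
    return lo, hi

def quantize_uint8(x, lo, hi):
    """Affine-quantize x to uint8 using a range computed from calibration data."""
    scale = (hi - lo) / 255.0
    zero_point = np.round(-lo / scale)
    q = np.clip(np.round(x / scale) + zero_point, 0, 255)
    return q.astype(np.uint8), scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale
```

For values inside the calibrated range, the round-trip error is bounded by one quantization step; values outside the range are clipped, which is why the choice of calibration data matters.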
1 code implementation • CVPR 2021 • Xueyan Zou, Linjie Yang, Ding Liu, Yong Jae Lee
To achieve this goal, it is necessary to find correspondences from neighbouring frames to faithfully hallucinate the unknown content.
1 code implementation • ICLR 2022 • Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo
(4) Thorough studies of NCP on inter-, cross-, and intra-tasks highlight the importance of cross-task neural architecture design, i.e., multitask neural architectures and architecture transfer between different tasks.
5 code implementations • 22 Mar 2021 • Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng
In this paper, we show that, unlike convolutional neural networks (CNNs), which can be improved by stacking more convolutional layers, the performance of ViTs saturates quickly when they are scaled deeper.
Ranked #464 on Image Classification on ImageNet
1 code implementation • ICCV 2021 • Daquan Zhou, Xiaojie Jin, Xiaochen Lian, Linjie Yang, Yujing Xue, Qibin Hou, Jiashi Feng
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
1 code implementation • 7 Dec 2020 • Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video.
Ranked #32 on Video Instance Segmentation on YouTube-VIS validation
1 code implementation • 4 Jul 2020 • Linjie Yang, Qing Jin
Model quantization helps to reduce model size and latency of deep neural networks.
2 code implementations • CVPR 2020 • Yingwei Li, Xiaojie Jin, Jieru Mei, Xiaochen Lian, Linjie Yang, Cihang Xie, Qihang Yu, Yuyin Zhou, Song Bai, Alan Yuille
However, embedding NL blocks in mobile neural networks has rarely been explored, mainly due to the following challenges: 1) NL blocks generally have a heavy computation cost, which makes them difficult to apply in applications where computational resources are limited, and 2) discovering an optimal configuration to embed NL blocks into mobile neural networks remains an open problem.
Ranked #60 on Neural Architecture Search on ImageNet
3 code implementations • 21 Dec 2019 • Qing Jin, Linjie Yang, Zhenyu Liao
To deal with this problem, we propose a simple yet effective technique, named scale-adjusted training (SAT), to comply with the discovered rules and facilitate efficient training.
1 code implementation • CVPR 2020 • Qing Jin, Linjie Yang, Zhenyu Liao
With our proposed techniques applied to a range of models including MobileNet-V1/V2 and ResNet-50, we demonstrate that the bit-width of weights and activations is a new option for adaptively executable deep neural networks, offering a distinct opportunity for an improved accuracy-efficiency trade-off as well as instant adaptation to platform constraints in real-world applications.
1 code implementation • ICLR 2020 • Jieru Mei, Yingwei Li, Xiaochen Lian, Xiaojie Jin, Linjie Yang, Alan Yuille, Jianchao Yang
We propose a fine-grained search space comprised of atomic blocks, a minimal search unit that is much smaller than the ones used in recent NAS algorithms.
Ranked #61 on Neural Architecture Search on ImageNet
no code implementations • 25 Sep 2019 • Qing Jin, Linjie Yang, Zhenyu Liao
To deal with this problem, we propose a simple yet effective technique, named scale-adjusted training (SAT), to comply with the discovered rules and facilitate efficient training.
no code implementations • 30 Jul 2019 • Zhengyuan Yang, Yuncheng Li, Linjie Yang, Ning Zhang, Jiebo Luo
The core idea is to first convert sparse weak labels such as keypoints into an initial estimate of body part masks, and then iteratively refine the part mask predictions.
5 code implementations • ICCV 2019 • Linjie Yang, Yuchen Fan, Ning Xu
The goal of this new task is simultaneous detection, segmentation and tracking of instances in videos.
Ranked #40 on Video Instance Segmentation on YouTube-VIS validation
1 code implementation • 19 Apr 2019 • Ruotian Luo, Ning Zhang, Bohyung Han, Linjie Yang
We present a novel problem setting in zero-shot learning: zero-shot object recognition and detection in context.
1 code implementation • CVPR 2019 • Jonghwan Mun, Linjie Yang, Zhou Ren, Ning Xu, Bohyung Han
Dense video captioning is an extremely challenging task since accurate and coherent description of events in a video requires holistic understanding of video contents as well as contextual reasoning of individual events.
4 code implementations • ICLR 2019 • Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang
Instead of training individual networks with different width configurations, we train a shared network with switchable batch normalization.
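The width-switching idea can be sketched as a batch-norm layer that keeps one independent set of statistics per supported width, while the weights themselves are shared and sliced. This is an illustrative NumPy version with made-up class name and hyperparameters, not the paper's implementation:

```python
import numpy as np

class SwitchableBatchNorm:
    """One set of batch-norm statistics per supported width, so each width
    configuration of the shared network normalizes with its own statistics."""
    def __init__(self, widths, momentum=0.1, eps=1e-5):
        self.eps = eps
        self.momentum = momentum
        # independent running statistics for each width configuration
        self.running_mean = {w: np.zeros(w) for w in widths}
        self.running_var = {w: np.ones(w) for w in widths}

    def __call__(self, x, width, training=True):
        x = x[:, :width]  # slice the shared features to the active width
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean[width] = (1 - m) * self.running_mean[width] + m * mean
            self.running_var[width] = (1 - m) * self.running_var[width] + m * var
        else:
            mean = self.running_mean[width]
            var = self.running_var[width]
        return (x - mean) / np.sqrt(var + self.eps)
```

The key design point is that only the (cheap) normalization statistics are duplicated per width; all convolutional and linear weights remain shared across configurations.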
no code implementations • 6 Sep 2018 • Ning Xu, Linjie Yang, Yuchen Fan, Dingcheng Yue, Yuchen Liang, Jianchao Yang, Thomas Huang
End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset contains only 90 short video clips.
4 code implementations • ECCV 2018 • Ning Xu, Linjie Yang, Yuchen Fan, Jianchao Yang, Dingcheng Yue, Yuchen Liang, Brian Price, Scott Cohen, Thomas Huang
End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset contains only 90 short video clips.
Ranked #12 on Video Object Segmentation on YouTube-VOS 2018 (F-Measure (Unseen) metric)
1 code implementation • CVPR 2018 • Linjie Yang, Yanran Wang, Xuehan Xiong, Jianchao Yang, Aggelos K. Katsaggelos
Video object segmentation aims to segment a specific object throughout a video sequence, given only an annotated first frame.
Ranked #1 on One-shot visual object segmentation on YouTube-VOS 2018 (Jaccard (Seen) metric)
1 code implementation • CVPR 2017 • Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li
The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase.
3 code implementations • CVPR 2015 • Linjie Yang, Ping Luo, Chen Change Loy, Xiaoou Tang
Updated on 24/09/2015: This update provides preliminary experiment results for fine-grained classification on the surveillance data of CompCars.
Ranked #6 on Fine-Grained Image Classification on CompCars