no code implementations • 16 Jan 2025 • Nanye Ma, Shangyuan Tong, HaoLin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, Saining Xie
In this work, we explore the inference-time scaling behavior of diffusion models beyond increasing denoising steps and investigate how the generation performance can further improve with increased computation.
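One simple way to picture inference-time scaling beyond denoising steps is best-of-N sampling: spend extra compute drawing several candidates from different initial noises and keep the one a verifier scores highest. The sketch below is illustrative only; `generate` and `verifier` are hypothetical stand-ins, not the paper's actual sampler or verifier.

```python
import random

def best_of_n_sample(generate, verifier, n_candidates=8, seed=0):
    """Inference-time scaling sketch (assumed interface, not the paper's API):
    draw several candidates from different initial noises and keep the one
    the verifier scores highest."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]  # toy initial noise
        sample = generate(noise)       # stand-in for a diffusion sampler
        score = verifier(sample)       # stand-in for a verifier/reward model
        if score > best_score:
            best, best_score = sample, score
    return best, best_score

# Toy demo: the "generator" is the identity; the verifier prefers small noise.
sample, score = best_of_n_sample(lambda z: z, lambda x: -sum(v * v for v in x))
```

With a fixed seed, a larger candidate budget explores a superset of noises, so the best score can only improve as N grows.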
no code implementations • 12 Dec 2024 • Yihong Sun, Hao Zhou, Liangzhe Yuan, Jennifer J. Sun, Yandong Li, Xuhui Jia, Hartwig Adam, Bharath Hariharan, Long Zhao, Ting Liu
We explore a novel video creation experience, namely Video Creation by Demonstration.
1 code implementation • 15 Nov 2024 • Yandong Li, Francesco Monticone
Similar to algorithms, which consume time and memory to run, hardware requires resources to function.
no code implementations • 16 Oct 2024 • Lichang Chen, Hexiang Hu, Mingda Zhang, YiWen Chen, Zifeng Wang, Yandong Li, Pranav Shyam, Tianyi Zhou, Heng Huang, Ming-Hsuan Yang, Boqing Gong
To address this, OmnixR offers two evaluation variants: (1) a synthetic subset: a dataset generated automatically by translating text into multiple modalities (audio, images, video, and hybrids) with Omnify.
no code implementations • 5 Oct 2024 • Long Zhao, Sanghyun Woo, Ziyu Wan, Yandong Li, Han Zhang, Boqing Gong, Hartwig Adam, Xuhui Jia, Ting Liu
Current visual tokenization methods rely on a traditional autoencoder framework, where the encoder compresses data into latent representations, and the decoder reconstructs the original input.
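The autoencoder framework the abstract refers to has a simple shape: an encoder compresses data into a low-dimensional latent, and a decoder maps the latent back to the input space. A deliberately toy sketch (truncation as "encoding", zero-padding as "decoding"; not the paper's learned tokenizer):

```python
def encode(x, k=2):
    """Toy 'encoder': compress a vector to its first k components
    (a stand-in for a learned compression into latent space)."""
    return x[:k]

def decode(z, n=4):
    """Toy 'decoder': reconstruct by padding the latent back to full size."""
    return z + [0.0] * (n - len(z))

x = [1.0, 2.0, 0.0, 0.0]
z = encode(x)                 # latent representation, dim 2
x_hat = decode(z)             # reconstruction of the original input
error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
```

In a real tokenizer both maps are learned networks trained to minimize exactly this kind of reconstruction error.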
1 code implementation • CVPR 2024 • Jingyao Xu, Yuetong Lu, Yandong Li, Siyang Lu, Dongdong Wang, Xiang Wei
Diffusion models (DMs) usher in a new era of generative modeling and offer more opportunities for efficiently generating high-quality, realistic data samples.
no code implementations • CVPR 2024 • Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia
We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.
no code implementations • 5 Dec 2023 • Shaoan Xie, Yang Zhao, Zhisheng Xiao, Kelvin C. K. Chan, Yandong Li, Yanwu Xu, Kun Zhang, Tingbo Hou
Our extensive experiments demonstrate the superior performance of our method in terms of visual quality, identity preservation, and text control, showcasing its effectiveness in the context of text-guided subject-driven image inpainting.
no code implementations • 14 Apr 2023 • Yu-Chuan Su, Kelvin C. K. Chan, Yandong Li, Yang Zhao, Han Zhang, Boqing Gong, Huisheng Wang, Xuhui Jia
Our approach greatly reduces the overhead for personalized image generation and is more applicable in many potential applications.
no code implementations • 5 Apr 2023 • Xuhui Jia, Yang Zhao, Kelvin C. K. Chan, Yandong Li, Han Zhang, Boqing Gong, Tingbo Hou, Huisheng Wang, Yu-Chuan Su
This paper proposes a method for generating images of customized objects specified by users.
no code implementations • 5 Apr 2023 • Kai Han, Xiaohu Huang, Yandong Li, Sagar Vaze, Jie Li, Xuhui Jia
In this paper, we reconsider the recognition problem and task a vision-language model with assigning class names to images given only a large (essentially unconstrained) vocabulary of categories as prior information.
no code implementations • 2 Mar 2023 • Chenyan Wu, Yimu Pan, Yandong Li, James Z. Wang
Test-time adaptation (TTA) is a technique used to reduce distribution gaps between the training and testing sets by leveraging unlabeled test data during inference.
no code implementations • CVPR 2023 • Hong-You Chen, Yandong Li, Yin Cui, Mingda Zhang, Wei-Lun Chao, Li Zhang
We study the problem of how to train a "personalization-friendly" model such that given only the task descriptions, the model can be adapted to different end-users' needs, e.g., for accurately classifying different subsets of objects.
no code implementations • 25 May 2022 • Chenyan Wu, Yandong Li, Xianfeng Tang, James Wang
Our method works as follows: first, to model the multi-human environment, it processes multi-human 2D poses and builds a novel heterogeneous graph, where nodes from different people and within one person are connected to capture inter-human interactions and represent the body geometry (i.e., skeleton and mesh structure).
Ranked #5 on 3D Multi-Person Pose Estimation on MuPoTS-3D
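The heterogeneous graph described above can be pictured as two edge types over (person, joint) nodes: intra-person edges following the skeleton, and inter-person edges connecting people to model interactions. A purely illustrative construction (not the paper's exact graph; connecting matching joints across people is an assumption made for the sketch):

```python
def build_multi_human_graph(poses, skeleton_edges):
    """Sketch of a heterogeneous pose graph: intra-person edges follow the
    skeleton; inter-person edges link the same joint across people."""
    nodes = [(p, j) for p in range(len(poses)) for j in range(len(poses[p]))]
    intra = [((p, a), (p, b))
             for p in range(len(poses)) for a, b in skeleton_edges]
    inter = [((p, j), (q, j))
             for p in range(len(poses)) for q in range(len(poses))
             for j in range(len(poses[p])) if p < q]
    return {"nodes": nodes, "intra_edges": intra, "inter_edges": inter}

# Two people, three joints each; the skeleton chains joints 0-1 and 1-2.
poses = [[(0, 0), (0, 1), (0, 2)], [(5, 0), (5, 1), (5, 2)]]
g = build_multi_human_graph(poses, skeleton_edges=[(0, 1), (1, 2)])
```

Separating the two edge types is what makes the graph heterogeneous: a message-passing network can then learn distinct weights for within-body and between-person relations.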
no code implementations • 19 Apr 2022 • Wencong Xu, Yandong Li, Bingshu Chen, Yue Hu, Jianxu Li, Zijing Zeng
The experimental results show that the PD direction-finding error is 3.39°, which meets the requirements for partial discharge DOA estimation using inspection robots in substations.
no code implementations • CVPR 2022 • Yang Zhao, Yu-Chuan Su, Chun-Te Chu, Yandong Li, Marius Renn, Yukun Zhu, Changyou Chen, Xuhui Jia
While existing approaches for face restoration make significant progress in generating high-quality faces, they often fail to preserve facial features and cannot authentically reconstruct the faces.
1 code implementation • NeurIPS 2021 • Tai-Yu Pan, Cheng Zhang, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao
We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size.
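The recipe really is that simple: divide each class's predicted score by its training sample size raised to a power, then renormalize. A minimal sketch of this reweighting (`gamma` is an assumed name for the temperature-like exponent):

```python
def norcal(scores, class_sizes, gamma=0.5):
    """NorCal-style post-hoc calibration sketch: divide each class score by
    its training sample size raised to gamma, then renormalize to sum to 1."""
    adjusted = [s / (n ** gamma) for s, n in zip(scores, class_sizes)]
    total = sum(adjusted)
    return [a / total for a in adjusted]

# A rare class (10 training samples) is boosted relative to a frequent one
# (10,000 samples), even though its raw score was lower.
calibrated = norcal([0.6, 0.4], class_sizes=[10000, 10])
```

Because the adjustment is applied only at inference, no retraining is needed, which is what makes the recipe "simple and straightforward".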
3 code implementations • CVPR 2021 • Dan Kondratyuk, Liangzhe Yuan, Yandong Li, Li Zhang, Mingxing Tan, Matthew Brown, Boqing Gong
We present Mobile Video Networks (MoViNets), a family of computation and memory efficient video networks that can operate on streaming video for online inference.
Ranked #3 on Action Classification on Charades
1 code implementation • ICCV 2021 • Cheng Zhang, Tai-Yu Pan, Yandong Li, Hexiang Hu, Dong Xuan, Soravit Changpinyo, Boqing Gong, Wei-Lun Chao
Many objects do not appear frequently enough in complex scenes (e.g., certain handbags in living rooms) for training an accurate object detector, but are often found frequently by themselves (e.g., in product images).
1 code implementation • CVPR 2021 • Yandong Li, Xuhui Jia, Ruoxin Sang, Yukun Zhu, Bradley Green, Liqiang Wang, Boqing Gong
This paper is concerned with ranking many pre-trained deep neural networks (DNNs), called checkpoints, for transfer learning to a downstream task.
Ranked #6 on Transferability on classification benchmark
no code implementations • ECCV 2020 • Yandong Li, Di Huang, Danfeng Qin, Liqiang Wang, Boqing Gong
They fail to improve object detectors in their vanilla forms due to the domain gap between the Web images and curated datasets.
1 code implementation • CVPR 2020 • Dongdong Wang, Yandong Li, Liqiang Wang, Boqing Gong
The other is that the number of images used for knowledge distillation should be small; otherwise, it would defeat our goal of reducing dependence on large-scale datasets.
1 code implementation • CVPR 2020 • Yandong Li, Yu Cheng, Zhe Gan, Licheng Yu, Liqiang Wang, Jingjing Liu
We propose a new task towards a more practical application of image generation: high-quality image synthesis from salient object layout.
no code implementations • ICLR 2020 • Yunhui Guo, Mingrui Liu, Yandong Li, Liqiang Wang, Tianbao Yang, Tajana Rosing
We evaluate the effectiveness of traditional attack methods such as FGSM and PGD. The results show that A-GEM still possesses strong continual learning ability in the presence of adversarial examples in the memory and simple defense techniques such as label smoothing can further alleviate the adversarial effects.
no code implementations • 21 Nov 2019 • Yunhui Guo, Yandong Li, Liqiang Wang, Tajana Rosing
Fine-tuning is a popular transfer learning technique for deep neural networks where a few rounds of training are applied to the parameters of a pre-trained model to adapt them to a new task.
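The transfer-learning recipe described here can be sketched in a few lines: treat the pre-trained model as a frozen feature extractor and apply a few rounds of training to a small head on top. This is an illustrative stand-in (plain logistic regression on precomputed features), not the paper's experimental setup.

```python
import math

def finetune_head(features, labels, steps=50, lr=0.5):
    """Fine-tuning sketch: the pre-trained extractor is frozen (features are
    precomputed); only a small linear head is trained for a few rounds."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(steps):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                      # gradient of the logistic loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Toy "pre-extracted" features: positives have a larger first component.
feats = [[2.0, 0.1], [1.8, 0.0], [-2.0, 0.1], [-1.5, 0.0]]
w, b = finetune_head(feats, [1, 1, 0, 0])
```

Only the head's parameters move; in full fine-tuning, some or all backbone parameters would be updated as well, at higher cost.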
1 code implementation • 20 Aug 2019 • Xianfeng Tang, Yandong Li, Yiwei Sun, Huaxiu Yao, Prasenjit Mitra, Suhang Wang
To optimize PA-GNN for a poisoned graph, we design a meta-optimization algorithm that trains PA-GNN to penalize perturbations using clean graphs and their adversarial counterparts, and transfers such ability to improve the robustness of PA-GNN on the poisoned graph.
Ranked #25 on Node Classification on Pubmed
no code implementations • ICLR 2019 • Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong
In other words, there is a population of adversarial examples, instead of only one, for any input to a DNN.
1 code implementation • 1 May 2019 • Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong
Powerful adversarial attack methods are vital for understanding how to construct robust deep neural networks (DNNs) and for thoroughly testing defense techniques.
no code implementations • 25 Feb 2019 • Xianfeng Tang, Boqing Gong, Yanwei Yu, Huaxiu Yao, Yandong Li, Haiyong Xie, Xiaoyu Wang
In this paper, we propose a novel framework for the citywide traffic volume inference using both dense GPS trajectories and incomplete trajectories captured by camera surveillance systems.
1 code implementation • 3 Feb 2019 • Yunhui Guo, Yandong Li, Rogerio Feris, Liqiang Wang, Tajana Rosing
A model aware of the relationships between different domains can also be trained to work on new domains with fewer resources.
8 code implementations • 5 Nov 2018 • Dongliang He, Zhichao Zhou, Chuang Gan, Fu Li, Xiao Liu, Yandong Li, Li-Min Wang, Shilei Wen
In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial temporal network (StNet) architecture for both local and global spatial-temporal modeling in videos.
no code implementations • ECCV 2018 • Yandong Li, Liqiang Wang, Tianbao Yang, Boqing Gong
The large volume of video content and high viewing frequency demand automatic video summarization algorithms, of which a key property is the capability of modeling diversity.
1 code implementation • ICCV 2017 • Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong
Many seemingly distant annotations (e.g., semantic segmentation and visual question answering (VQA)) are inherently connected in that they reveal different levels and perspectives of human understanding about the same visual scenes, and even the same set of images (e.g., of COCO).
no code implementations • 12 Aug 2017 • Yunlong Bian, Chuang Gan, Xiao Liu, Fu Li, Xiang Long, Yandong Li, Heng Qi, Jie zhou, Shilei Wen, Yuanqing Lin
Experiment results on the challenging Kinetics dataset demonstrate that our proposed temporal modeling approaches can significantly improve existing approaches in the large-scale video recognition tasks.
Ranked #167 on Action Classification on Kinetics-400
1 code implementation • 14 Jul 2017 • Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie zhou, Shilei Wen
This paper describes our solution for the video recognition task of the Google Cloud and YouTube-8M Video Understanding Challenge that ranked the 3rd place.