no code implementations • 24 Apr 2025 • Xu Ma, Peize Sun, Haoyu Ma, Hao Tang, Chih-Yao Ma, Jialiang Wang, Kunpeng Li, Xiaoliang Dai, Yujun Shi, Xuan Ju, Yushi Hu, Artsiom Sanakoyeu, Felix Juefei-Xu, Ji Hou, Junjiao Tian, Tao Xu, Tingbo Hou, Yen-Cheng Liu, Zecheng He, Zijian He, Matt Feiszli, Peizhao Zhang, Peter Vajda, Sam Tsai, Yun Fu
A primary limitation is the substantial number of image tokens required for AR models, which constrains both training and inference efficiency, as well as image resolution.
no code implementations • 8 Apr 2025 • Xichen Pan, Satya Narayan Shukla, Aashu Singh, Zhuokai Zhao, Shlok Kumar Mishra, Jialiang Wang, Zhiyang Xu, Jiuhai Chen, Kunpeng Li, Felix Juefei-Xu, Ji Hou, Saining Xie
Unified multimodal models aim to integrate understanding (text output) and generation (pixel output), but aligning these different modalities within a single architecture often demands complex training recipes and careful data balancing.
no code implementations • 30 Mar 2025 • Cong Wei, Bo Sun, Haoyu Ma, Ji Hou, Felix Juefei-Xu, Zecheng He, Xiaoliang Dai, Luxin Zhang, Kunpeng Li, Tingbo Hou, Animesh Sinha, Peter Vajda, Wenhu Chen
We introduce Talking Characters, a more realistic task of generating talking character animations directly from speech and text.
1 code implementation • 6 Mar 2025 • Yifei HUANG, Jilan Xu, Baoqi Pei, Yuping He, Guo Chen, Mingfang Zhang, Lijin Yang, Zheng Nie, Jinyao Liu, Guoshun Fan, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Xinyuan Chen, Yaohui Wang, Yali Wang, Yu Qiao, LiMin Wang
We present Vinci, a vision-language system designed to provide real-time, comprehensive AI assistance on portable devices.
no code implementations • 2 Mar 2025 • Tomohiro Ando, Jushan Bai, Kunpeng Li, Yong Song
The proposed model captures the dynamic structure of panel data and high-dimensional cross-sectional dependence, and allows for heterogeneous regression coefficients.
no code implementations • 4 Feb 2025 • Feng Liang, Haoyu Ma, Zecheng He, Tingbo Hou, Ji Hou, Kunpeng Li, Xiaoliang Dai, Felix Juefei-Xu, Samaneh Azadi, Animesh Sinha, Peizhao Zhang, Peter Vajda, Diana Marculescu
This challenge arises due to the lack of a mechanism to link each concept with its specific reference image.
1 code implementation • 30 Dec 2024 • Yifei HUANG, Jilan Xu, Baoqi Pei, Yuping He, Guo Chen, Lijin Yang, Xinyuan Chen, Yaohui Wang, Zheng Nie, Jinyao Liu, Guoshun Fan, Dechen Lin, Fang Fang, Kunpeng Li, Chang Yuan, Yali Wang, Yu Qiao, LiMin Wang
We introduce Vinci, a real-time embodied smart assistant built upon an egocentric vision-language model.
2 code implementations • 17 Oct 2024 • Adam Polyak, Amit Zohar, Andrew Brown, Andros Tjandra, Animesh Sinha, Ann Lee, Apoorv Vyas, Bowen Shi, Chih-Yao Ma, Ching-Yao Chuang, David Yan, Dhruv Choudhary, Dingkang Wang, Geet Sethi, Guan Pang, Haoyu Ma, Ishan Misra, Ji Hou, Jialiang Wang, Kiran Jagadeesh, Kunpeng Li, Luxin Zhang, Mannat Singh, Mary Williamson, Matt Le, Matthew Yu, Mitesh Kumar Singh, Peizhao Zhang, Peter Vajda, Quentin Duval, Rohit Girdhar, Roshan Sumbaly, Sai Saketh Rambhatla, Sam Tsai, Samaneh Azadi, Samyak Datta, Sanyuan Chen, Sean Bell, Sharadh Ramaswamy, Shelly Sheynin, Siddharth Bhattacharya, Simran Motwani, Tao Xu, Tianhe Li, Tingbo Hou, Wei-Ning Hsu, Xi Yin, Xiaoliang Dai, Yaniv Taigman, Yaqiao Luo, Yen-Cheng Liu, Yi-Chiao Wu, Yue Zhao, Yuval Kirstain, Zecheng He, Zijian He, Albert Pumarola, Ali Thabet, Artsiom Sanakoyeu, Arun Mallya, Baishan Guo, Boris Araya, Breena Kerr, Carleigh Wood, Ce Liu, Cen Peng, Dimitry Vengertsev, Edgar Schonfeld, Elliot Blanchard, Felix Juefei-Xu, Fraylie Nord, Jeff Liang, John Hoffman, Jonas Kohler, Kaolin Fire, Karthik Sivakumar, Lawrence Chen, Licheng Yu, Luya Gao, Markos Georgopoulos, Rashel Moritz, Sara K. Sampson, Shikai Li, Simone Parmeggiani, Steve Fine, Tara Fowler, Vladan Petrovic, Yuming Du
Our models set a new state-of-the-art on multiple tasks: text-to-video synthesis, video personalization, video editing, video-to-audio generation, and text-to-audio generation.
no code implementations • CVPR 2024 • Feng Liang, Bichen Wu, Jialiang Wang, Licheng Yu, Kunpeng Li, Yinan Zhao, Ishan Misra, Jia-Bin Huang, Peizhao Zhang, Peter Vajda, Diana Marculescu
This enables our model to perform video synthesis by editing the first frame with any prevalent I2I model and then propagating the edits to successive frames.
no code implementations • CVPR 2024 • Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou
Central to our approach is a user-defined 3D semantic proxy room that outlines a rough room layout based on semantic bounding boxes and a textual description of the overall room style.
no code implementations • 27 Sep 2023 • Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.
1 code implementation • 23 Dec 2022 • Xu Ma, Huan Wang, Can Qin, Kunpeng Li, Xingchen Zhao, Jie Fu, Yun Fu
Vision Transformers have recently shown great promise for many vision tasks, thanks to their insightful architecture design and attention mechanism.
1 code implementation • CVPR 2023 • Feng Liang, Bichen Wu, Xiaoliang Dai, Kunpeng Li, Yinan Zhao, Hang Zhang, Peizhao Zhang, Peter Vajda, Diana Marculescu
To address this, we propose to finetune CLIP on a collection of masked image regions and their corresponding text descriptions.
Ranked #4 on Semantic Segmentation on Replica
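The finetuning objective described in this entry, pairing masked image regions with their text descriptions, is contrastive in the CLIP style. A minimal sketch of the symmetric InfoNCE loss, using random stand-in embeddings (the encoder names and shapes here are illustrative, not the paper's):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss used for CLIP-style finetuning.

    img_emb, txt_emb: (N, D) arrays of paired embeddings; row i of each
    is a matched (masked-region, description) pair.
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # (N, N) similarity matrix

    def xent(l):  # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)      # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # Average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
region_emb = rng.normal(size=(8, 32))                    # stand-ins for masked-crop features
text_emb = region_emb + 0.1 * rng.normal(size=(8, 32))   # roughly aligned description features
loss = clip_contrastive_loss(region_emb, text_emb)
```

Well-aligned pairs drive the diagonal logits up and the loss toward zero; mismatched pairs raise it.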
1 code implementation • 24 Sep 2022 • Jiamian Wang, Kunpeng Li, Yulun Zhang, Xin Yuan, Zhiqiang Tao
Besides, CASSI entangles the spatial and spectral information into a 2D measurement, placing a barrier for information disentanglement and modeling.
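The CASSI "entanglement" of spatial and spectral information can be illustrated with the standard single-disperser forward model: each band is modulated by a coded aperture, sheared by the disperser, and summed on a 2D sensor. A toy sketch (real systems differ in dispersion step and geometry):

```python
import numpy as np

def cassi_forward(x, mask):
    """Toy SD-CASSI forward model.

    x:    (H, W, B) hyperspectral cube
    mask: (H, W) binary coded aperture
    Each band is masked, shifted one pixel per band index along the
    width axis, then accumulated into a single 2D measurement.
    """
    H, W, B = x.shape
    y = np.zeros((H, W + B - 1))
    for b in range(B):
        y[:, b:b + W] += mask * x[:, :, b]   # shift-and-sum onto the sensor
    return y

rng = np.random.default_rng(0)
cube = rng.random((4, 5, 3))                       # tiny 3-band cube
mask = (rng.random((4, 5)) > 0.5).astype(float)
meas = cassi_forward(cube, mask)                   # (4, 7) entangled measurement
```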
no code implementations • 29 Aug 2022 • Kunpeng Li, Guangcui Shao, Naijun Yang, Xiao Fang, Yang Song
Customer Lifetime Value (LTV) is the expected total revenue that a single user brings to a business.
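As a worked example of the definition above, a common textbook approximation (an assumption here, not the paper's model) sums per-period revenue under a constant retention probability:

```python
def ltv(arpu, retention, horizon=36):
    """Hypothetical discrete-time LTV sketch: average revenue per user per
    period, summed over periods weighted by the probability the user is
    still retained. Converges to arpu / (1 - retention) as horizon grows."""
    return sum(arpu * retention ** t for t in range(horizon))

# e.g. $10/month ARPU with 80% monthly retention approaches $50 total
value = ltv(10.0, 0.8, horizon=120)
```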
1 code implementation • ICLR 2022 • Yue Bai, Huan Wang, Zhiqiang Tao, Kunpeng Li, Yun Fu
In this work, we regard the winning ticket from LTH as a subnetwork in a trainable condition and take its performance as our benchmark. We then proceed from a complementary direction to articulate the Dual Lottery Ticket Hypothesis (DLTH): a randomly selected subnetwork from a randomly initialized dense network can be transformed into a trainable condition and achieve performance comparable to LTH -- random tickets in a given lottery pool can be transformed into winning tickets.
2 code implementations • 12 Oct 2021 • Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu
Current Sign Language Recognition (SLR) methods usually extract features via deep neural networks and suffer from overfitting due to limited and noisy data.
no code implementations • CVPR 2021 • Yulun Zhang, Kai Li, Kunpeng Li, Yun Fu
They also fail to sense the entire space of the input, which is critical for high-quality MR image SR. To address those problems, we propose squeeze-and-excitation reasoning attention networks (SERAN) for accurate MR image SR. We squeeze attention from the global spatial information of the input and obtain global descriptors.
Ranked #2 on Image Super-Resolution on IXI
3 code implementations • 16 Mar 2021 • Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu
Sign language is commonly used by deaf or speech-impaired people to communicate, but requires significant effort to master.
Ranked #2 on Sign Language Recognition on AUTSL (using extra training data)
no code implementations • 11 Jan 2021 • Kunpeng Li, Zizhao Zhang, Guanhang Wu, Xuehan Xiong, Chen-Yu Lee, Zhichao Lu, Yun Fu, Tomas Pfister
To address this issue, we introduce a new method for pre-training video action recognition models using queried web videos.
no code implementations • 1 Jan 2021 • Kunpeng Li, Zizhao Zhang, Guanhang Wu, Xuehan Xiong, Chen-Yu Lee, Yun Fu, Tomas Pfister
To address this issue, we introduce a new method for pre-training video action recognition models using queried web videos.
no code implementations • 21 Sep 2020 • Jianqing Fan, Kunpeng Li, Yuan Liao
This paper makes a selective survey on the recent development of the factor model and its application on statistical learnings.
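The factor model surveyed here has the standard form x_it = λ_i'f_t + e_it, with the factor space estimated by principal components. A small simulation sketch (variable names are illustrative):

```python
import numpy as np

# Simulate an N x T panel from an r-factor model, then recover a low-rank
# fit via the principal-components (SVD) estimator common in this literature.
rng = np.random.default_rng(0)
N, T, r = 100, 200, 2
loadings = rng.normal(size=(N, r))        # lambda_i, one row per series
factors = rng.normal(size=(r, T))         # f_t, one column per period
X = loadings @ factors + 0.1 * rng.normal(size=(N, T))  # x_it = λ_i' f_t + e_it

# Top-r singular vectors span the estimated factor/loading spaces
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_fit = (U[:, :r] * s[:r]) @ Vt[:r]       # rank-r common component
```

With idiosyncratic noise this small, the rank-r fit explains nearly all of the panel's variation.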
1 code implementation • CVPR 2020 • Kunpeng Li, Chen Fang, Zhaowen Wang, Seokhwan Kim, Hailin Jin, Yun Fu
Compared with other tutorial media such as text, it is very popular among both novice and experienced users for learning new skills, owing to its visual guidance and ease of understanding.
1 code implementation • CVPR 2020 • Kai Li, Yulun Zhang, Kunpeng Li, Yun Fu
The recent flourish of deep learning in various tasks is largely accredited to the rich and accessible labeled data.
no code implementations • ICCV 2019 • Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu
With weights sharing and domain adversary training, this knowledge can be successfully transferred by regularizing the network's response to the same category in the target domain.
2 code implementations • ICCV 2019 • Kunpeng Li, Yulun Zhang, Kai Li, Yuanyuan Li, Yun Fu
It outperforms the current best method by 6.8% relatively for image retrieval and 4.8% relatively for caption retrieval on MS-COCO (Recall@1 using the 1K test set).
Ranked #8 on Image Retrieval on Flickr30K 1K test
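The Recall@1 metric quoted above can be computed from a cross-modal similarity matrix where row i and column i are the ground-truth pair. A minimal sketch:

```python
import numpy as np

def recall_at_k(sim, k=1):
    """Recall@K for retrieval from an (N, N) similarity matrix whose
    diagonal entries correspond to the true image-caption pairs."""
    ranks = np.argsort(-sim, axis=1)   # candidates sorted best-first per query
    hits = (ranks[:, :k] == np.arange(len(sim))[:, None]).any(axis=1)
    return hits.mean()

# Toy example: queries 0 and 1 rank their match first; query 2 does not
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.3],
                [0.5, 0.4, 0.1]])
```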
2 code implementations • ICLR 2019 • Yulun Zhang, Kunpeng Li, Kai Li, Bineng Zhong, Yun Fu
To address this issue, we design local and non-local attention blocks to extract features that capture the long-range dependencies between pixels and pay more attention to the challenging parts.
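The non-local attention idea mentioned here, letting every position attend to every other to capture long-range pixel dependencies, can be sketched as a bare self-attention block (embedding projections omitted for brevity; not the paper's exact architecture):

```python
import numpy as np

def nonlocal_attention(x):
    """Minimal non-local block over a flattened feature map.

    x: (n, c) array of n spatial positions with c channels each.
    Every position aggregates features from all positions, weighted
    by a row-wise softmax over pairwise affinities.
    """
    n, c = x.shape
    attn = x @ x.T / np.sqrt(c)                       # pairwise affinities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)           # row-wise softmax
    return x + attn @ x                               # residual aggregation

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))     # 6 positions, 4 channels
y = nonlocal_attention(x)       # same shape, globally mixed features
```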
1 code implementation • 18 Aug 2018 • Kai Li, Zhengming Ding, Kunpeng Li, Yulun Zhang, Yun Fu
To ensure scalability and separability, a softmax-like function is formulated to push apart the positive and negative support sets.
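The "softmax-like function" pushing apart positive and negative support sets can be sketched as a softmax over set-level similarity scores (a hedged illustration; the paper's exact form may differ):

```python
import numpy as np

def support_softmax_loss(query, pos_proto, neg_protos, tau=1.0):
    """Softmax-style loss over support-set similarities: the query's
    similarity to the positive prototype competes against its
    similarities to all negative prototypes.

    query, pos_proto: (D,) vectors; neg_protos: (M, D) matrix.
    """
    s_pos = pos_proto @ query / tau
    s_neg = neg_protos @ query / tau
    z = np.concatenate(([s_pos], s_neg))
    z = z - z.max()                                   # numerical stability
    return -(z[0] - np.log(np.exp(z).sum()))          # -log softmax(pos)

pos = np.array([1.0, 0.0])
negs = np.array([[0.0, 1.0], [-1.0, 0.0]])
easy = support_softmax_loss(np.array([1.0, 0.0]), pos, negs)  # near positive
hard = support_softmax_loss(np.array([0.0, 1.0]), pos, negs)  # near a negative
```

Minimizing this pulls queries toward their positive set and pushes them from the negatives.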
20 code implementations • ECCV 2018 • Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, Yun Fu
To solve these problems, we propose the very deep residual channel attention networks (RCAN).
Ranked #21 on Image Super-Resolution on BSD100 - 4x upscaling
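The channel attention at the heart of RCAN follows the squeeze-and-excitation pattern: pool each channel to a global statistic, pass it through a bottleneck, and gate the channels with a sigmoid. A simplified numpy sketch (weights are random stand-ins for the learned 1x1 convolutions):

```python
import numpy as np

def channel_attention(feat, w_down, w_up):
    """SE/RCAN-style channel attention, simplified.

    feat:   (C, H, W) feature maps
    w_down: (C//r, C) bottleneck-down weights
    w_up:   (C, C//r) bottleneck-up weights
    """
    desc = feat.mean(axis=(1, 2))                  # squeeze: per-channel stats
    hidden = np.maximum(w_down @ desc, 0)          # excitation with ReLU
    gate = 1 / (1 + np.exp(-(w_up @ hidden)))      # per-channel sigmoid weight
    return feat * gate[:, None, None]              # rescale each channel

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 5, 5))
w_down = rng.normal(size=(2, 8))                   # reduction ratio r = 4
w_up = rng.normal(size=(8, 2))
out = channel_attention(feat, w_down, w_up)
```

Because the gate lies in (0, 1), each channel is adaptively attenuated rather than amplified.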
2 code implementations • CVPR 2018 • Kunpeng Li, Ziyan Wu, Kuan-Chuan Peng, Jan Ernst, Yun Fu
Weakly supervised learning with only coarse labels can obtain visual explanations of deep neural networks, such as attention maps, by back-propagating gradients.
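The gradient-derived attention maps mentioned here can be illustrated with a CAM-style sketch: for a linear head over global average pooling, the gradient of the class score with respect to each feature map is just its head weight, so the explanation is a weighted, rectified sum of channels (a simplified illustration, not the paper's exact method):

```python
import numpy as np

def grad_attention_map(feats, class_weights):
    """CAM-style attention map for a GAP + linear classifier head.

    feats:         (C, H, W) feature maps from the last conv layer
    class_weights: (C,) head weights for the target class, which equal
                   the (scaled) gradient of the class score w.r.t. GAP(feats)
    """
    cam = np.tensordot(class_weights, feats, axes=1)   # sum_c w_c * A_c
    cam = np.maximum(cam, 0)                           # keep positive evidence
    return cam / cam.max() if cam.max() > 0 else cam   # normalize to [0, 1]

feats = np.abs(np.random.default_rng(0).normal(size=(4, 6, 6)))
weights = np.ones(4)                                   # toy class weights
cam = grad_attention_map(feats, weights)               # coarse attention map
```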