1 code implementation • 29 May 2025 • Xiangdong Zhang, Jiaqi Liao, Shaofeng Zhang, Fanqing Meng, Xiangpeng Wan, Junchi Yan, Yu Cheng
Recent advancements in text-to-video (T2V) diffusion models have enabled high-fidelity and realistic video synthesis.
no code implementations • 21 May 2025 • Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, Chenchen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, Jinbao Xue, Jun Xia, Junqiang Zheng, Kai Liu, Kai Zhang, Kai Zheng, Kejiao Li, Keyao Wang, Lan Jiang, Lixin Liu, Lulu Wu, Mengyuan Huang, Peijie Yu, Peiqi Wang, Qian Wang, Qianbiao Xiang, Qibin Liu, Qingfeng Sun, Richard Guo, Ruobing Xie, Saiyong Yang, Shaohua Chen, Shihui Hu, Shuai Li, Shuaipeng Li, Shuang Chen, Suncong Zheng, Tao Yang, Tian Zhang, TingHao Yu, Weidong Han, Weijie Liu, Weijin Zhou, Weikang Wang, Wesleye Chen, Xiao Feng, Xiaoqin Ren, Xingwu Sun, Xiong Kuang, Xuemeng Huang, Xun Cao, Yanfeng Chen, Yang Du, Yang Zhen, Yaping Deng, Yi Shen, Yigeng Hong, Yiqi Chen, Yiqing Huang, Yuchi Deng, Yue Mao, Yulong Wang, Yuyuan Zeng, Zenan Xu, Zhanhui Kang, Zhenxiang Yan, Zheng Fang, Zhichao Hu, Zhongzhi Chen, Zhuoyu Li, Zongwei Li, Alex Yan, Ande Liang, Baitong Liu, Beiping Pan, Bin Xing, Binghong Wu, Bingxin Qu, Bolin Ni, Boyu Wu, Chen Li, Cheng Jiang, Cheng Zhang, Chengjun Liu, Chengxu Yang, Chiyu Wang, Chong Zha, Daisy Yi, Di Wang, Fanyang Lu, Fei Chen, Feifei Liu, Feng Zheng, Guanghua Yu, Guiyang Li, Guohua Wang, Haisheng Lin, Han Liu, Han Wang, Hao Fei, Hao Lu, Haoqing Jiang, Haoran Sun, Haotian Zhu, Huangjin Dai, Huankui Chen, Huawen Feng, Huihui Cai, Huxin Peng, Jackson Lv, Jiacheng Shi, Jiahao Bu, Jianbo Li, Jianglu Hu, Jiangtao Guan, Jianing Xu, Jianwei Cai, Jiarong Zhang, Jiawei Song, Jie Jiang, Jie Liu, Jieneng Yang, Jihong Zhang, Jin lv, Jing Zhao, Jinjian Li, JinXing Liu, Jun Zhao, Juntao Guo, Kai Wang, Kan Wu, Lei Fu, Lei He, Lei Wang, Li Liu, Liang Dong, Liya Zhan, Long Cheng, Long Xu, Mao Zheng, Meng Liu, Mengkang Hu, Nanli Chen, Peirui Chen, Peng He, Pengju Pan, Pengzhi Wei, Qi Yang, Qi Yi, Roberts Wang, Rongpeng Chen, Rui Sun, Rui Yang, Ruibin Chen, Ruixu Zhou, Shaofeng Zhang, Sheng Zhang, Shihao Xu, Shuaishuai Chang, Shulin Liu, Siqi Wang, Songjia Feng, Songling Yuan, Tao Zhang, Tianjiao Lang, Tongkai Li, Wei Deng, Wei Li, Weichao Wang, Weigang Zhang, Weixuan Sun, Wen Ouyang, Wenxiang Jiao, Wenzhi Sun, Wenzhuo Jia, Xiang Zhang, Xiangyu He, Xianshun Ren, Xiaoying Zhu, Xiaolong Guo, Xiaoxue Li, Xiaoyu Ma, Xican Lu, Xinhua Feng, Xinting Huang, Xinyu Guan, Xirui Li, Xu Zhang, Xudong Gao, Xun Luo, Xuxiang Qi, Yangkun Chen, Yangyu Tao, Yanling Xiao, Yantao Mai, Yanze Chen, Yao Ding, Yeting Yang, YiFan Song, Yifan Yang, Yijiao Zhu, Yinhe Wu, Yixian Liu, Yong Yang, Yuanjun Cai, Yuanlin Tu, Yue Zhang, Yufei Huang, YuHang Zhou, Yuhao Jiang, Yuhong Liu, Yuhui Hu, YuJin Lin, Yun Yang, Yunhao Wang, Yusong Zhang, Zekun Wu, Zelong Zhang, Zhan Yu, Zhaoliang Yang, Zhe Zhao, Zheng Li, Zhenyu Huang, Zhiguang Liu, Zhiqing Kui, Zhiyin Zeng, Zhiyuan Xiong, Zhuo Han, Zifan Wu, Zigang Geng, Zilong Zhao, Ziyan Tang, Ziyuan Zhu, Zonglei Zhu, Zhijiang Xu
As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model.
1 code implementation • CVPR 2025 • Yi Yu, Botao Ren, Peiyuan Zhang, Mingxin Liu, Junwei Luo, Shaofeng Zhang, Feipeng Da, Junchi Yan, Xue Yang
To the best of our knowledge, Point2RBox-v2 is the first approach to explore the spatial layout among instances for learning point-supervised oriented object detection (OOD).
no code implementations • 13 Nov 2024 • Qiang Zhou, Shaofeng Zhang, Nianzu Yang, Ye Qian, Hao Li
Furthermore, MVideo supports motion condition editing and composition, facilitating the generation of videos with more complex actions.
1 code implementation • 4 Nov 2024 • Yan Li, Weiwei Guo, Xue Yang, Ning Liao, Shaofeng Zhang, Yi Yu, Wenxian Yu, Junchi Yan
In this paper, we put forth a novel formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD), which can detect objects beyond the training categories without the costly collection of new labeled data.
1 code implementation • 16 Aug 2024 • Xiangdong Zhang, Shaofeng Zhang, Junchi Yan
These methods typically include an encoder that takes the visible (normalized) patches and their corresponding patch centers (positions) as input, and a decoder that takes the encoder output together with the centers (positions) of the masked patches to reconstruct each point in those masked patches.
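A minimal sketch of this encoder/decoder interface is given below, assuming a PyTorch setting; the MLP patch embedding, transformer blocks, dimensions, and toy shapes are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the masked point modeling interface described above: the encoder
# sees only visible patches (plus their centers); the decoder receives the
# encoder output plus masked-patch centers and reconstructs the masked points.
# All module choices and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class MaskedPointAutoencoder(nn.Module):
    def __init__(self, pts_per_patch=32, dim=256):
        super().__init__()
        self.pts_per_patch = pts_per_patch
        # Embeds each (normalized) visible patch of points into a token.
        self.patch_embed = nn.Sequential(
            nn.Linear(pts_per_patch * 3, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        # Positional embedding from patch centers, shared by encoder and decoder.
        self.pos_embed = nn.Linear(3, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2
        )
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2
        )
        # Predicts the points of each masked patch from its decoded token.
        self.head = nn.Linear(dim, pts_per_patch * 3)

    def forward(self, vis_patches, vis_centers, mask_centers):
        # vis_patches: (B, Nv, P, 3) normalized points of the visible patches
        # vis_centers: (B, Nv, 3); mask_centers: (B, Nm, 3)
        B, Nv, P, _ = vis_patches.shape
        tokens = self.patch_embed(vis_patches.flatten(2)) + self.pos_embed(vis_centers)
        latent = self.encoder(tokens)  # encoder sees visible patches only
        Nm = mask_centers.shape[1]
        mask_tok = self.mask_token.expand(B, Nm, -1) + self.pos_embed(mask_centers)
        decoded = self.decoder(torch.cat([latent, mask_tok], dim=1))[:, Nv:]
        return self.head(decoded).view(B, Nm, P, 3)  # reconstructed masked points


# Toy usage: 2 clouds, 24 visible patches, 8 masked patches, 32 points each.
model = MaskedPointAutoencoder()
pred = model(torch.randn(2, 24, 32, 3), torch.randn(2, 24, 3), torch.randn(2, 8, 3))
print(pred.shape)  # torch.Size([2, 8, 32, 3])
```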
2 code implementations • 26 Jun 2024 • Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan
We propose a novel text-to-video (T2V) generation benchmark, ChronoMagic-Bench, to evaluate the temporal and metamorphic capabilities of T2V models (e.g., Sora and Lumiere) in time-lapse video generation.
1 code implementation • 28 Jan 2024 • Shaofeng Zhang, Jinfa Huang, Qiang Zhou, Zhibin Wang, Fan Wang, Jiebo Luo, Junchi Yan
At inference, we generate images with arbitrary expansion multiples by inputting an anchor image and its corresponding positional embeddings.
1 code implementation • 14 Nov 2023 • Jinpei Guo, Shaofeng Zhang, Runzhong Wang, Chang Liu, Junchi Yan
Meanwhile, on Pascal VOC, QueryTrans improves the accuracy of NGMv2 from $80.1\%$ to $\mathbf{83.3\%}$, and BBGM from $79.0\%$ to $\mathbf{84.5\%}$.
Ranked #1 on Graph Matching on PASCAL VOC (matching accuracy metric)
1 code implementation • NeurIPS 2023 • Junkun Yuan, Xinyu Zhang, Hao Zhou, Jian Wang, Zhongwei Qiu, Zhiyin Shao, Shaofeng Zhang, Sifan Long, Kun Kuang, Kun Yao, Junyu Han, Errui Ding, Lanfen Lin, Fei Wu, Jingdong Wang
To further capture human characteristics, we propose a structure-invariant alignment loss that enforces different masked views, guided by the human part prior, to be closely aligned for the same image.
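As a rough illustration of such an alignment objective (not the paper's exact loss), one can pull together part-level features of two differently masked views of the same image; the cosine-similarity form and part-pooled inputs below are assumptions.

```python
# Hedged sketch of a structure-invariant alignment loss: part-level features
# from two differently masked views of the same image are aligned. The
# cosine-similarity form is an assumption, not the paper's formulation.
import torch
import torch.nn.functional as F


def structure_invariant_alignment(parts_view1, parts_view2):
    # parts_view*: (B, K, D) features for K human parts per image, one per view.
    z1 = F.normalize(parts_view1, dim=-1)
    z2 = F.normalize(parts_view2, dim=-1)
    # 1 - cosine similarity, averaged over parts and the batch.
    return (1.0 - (z1 * z2).sum(dim=-1)).mean()


loss = structure_invariant_alignment(torch.randn(4, 6, 256), torch.randn(4, 6, 256))
print(loss.item())
```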
no code implementations • 10 Oct 2023 • Ning Liao, Shaofeng Zhang, Renqiu Xia, Min Cao, Yu Qiao, Junchi Yan
Instead of evaluating the models directly, in this paper, we try to evaluate the Vision-Language Instruction-Tuning (VLIT) datasets.
1 code implementation • 3 Aug 2023 • Qiang Zhou, Chaohui Yu, Shaofeng Zhang, Sitong Wu, Zhibing Wang, Fan Wang
To this end, we propose to extract features corresponding to regional objects as soft prompts for LLM, which provides a straightforward and scalable approach and eliminates the need for LLM fine-tuning.
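A minimal sketch of the general idea of feeding region features to a language model as soft prompts follows; the linear projector, dimensions, and the stand-in "LLM" (a plain TransformerEncoder) are assumptions, not the paper's actual pipeline, which would use a frozen pretrained LLM.

```python
# Sketch: project regional visual features into the LLM embedding space and
# prepend them as soft-prompt tokens before the text tokens. The projector,
# sizes, and the stand-in language model below are illustrative assumptions.
import torch
import torch.nn as nn

d_vis, d_llm, vocab = 512, 768, 32000
projector = nn.Linear(d_vis, d_llm)      # the trainable bridge to the LLM space
token_embed = nn.Embedding(vocab, d_llm)
llm_stub = nn.TransformerEncoder(        # stand-in for a frozen pretrained LLM
    nn.TransformerEncoderLayer(d_llm, nhead=8, batch_first=True), num_layers=2
)

region_feats = torch.randn(2, 4, d_vis)  # features of 4 regional objects
text_ids = torch.randint(0, vocab, (2, 16))  # tokenized instruction
soft_prompts = projector(region_feats)       # (2, 4, d_llm)
inputs = torch.cat([soft_prompts, token_embed(text_ids)], dim=1)
hidden = llm_stub(inputs)                    # (2, 20, d_llm)
print(hidden.shape)
```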
no code implementations • 23 Jun 2023 • Shaofeng Zhang, Feng Zhu, Rui Zhao, Junchi Yan
On classification tasks, for ViT-S, ADCLR achieves 77.5% top-1 accuracy on ImageNet with linear probing, outperforming our baseline (DINO, without our devised techniques as a plug-in) by 0.5%.
no code implementations • 8 Dec 2022 • Hengrui Zhang, Qitian Wu, Yu Wang, Shaofeng Zhang, Junchi Yan, Philip S. Yu
Contrastive learning methods based on InfoNCE loss are popular in node representation learning tasks on graph-structured data.
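For reference, the InfoNCE objective these methods build on can be written as a cross-entropy over pairwise similarities; a minimal sketch follows, where the cosine similarity and the temperature of 0.2 are common choices assumed here rather than taken from the paper.

```python
# Minimal InfoNCE loss: for each node embedding in view 1, its positive is the
# same node in view 2, and all other nodes in view 2 serve as negatives.
import torch
import torch.nn.functional as F


def info_nce(z1, z2, temperature=0.2):
    # z1, z2: (N, D) embeddings of the same N nodes under two augmented views.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature      # (N, N) similarity matrix
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)


print(info_nce(torch.randn(128, 64), torch.randn(128, 64)).item())
```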
no code implementations • 28 Apr 2022 • Shaofeng Zhang, Feng Zhu, Junchi Yan, Rui Zhao, Xiaokang Yang
Scalability is an important consideration for deep graph neural networks.
no code implementations • CVPR 2022 • Shaofeng Zhang, Lyn Qiu, Feng Zhu, Junchi Yan, Hengrui Zhang, Rui Zhao, Hongyang Li, Xiaokang Yang
Existing symmetric contrastive learning methods suffer from collapse (complete or dimensional) or from the quadratic complexity of their objectives.
no code implementations • 29 Sep 2021 • Hengrui Zhang, Qitian Wu, Shaofeng Zhang, Junchi Yan, David Wipf, Philip S. Yu
In this paper, we propose ESCo (Effective and Scalable Contrastive), a new contrastive framework which is essentially an instantiation of the Information Bottleneck principle under self-supervised learning settings.
no code implementations • ICLR 2022 • Shaofeng Zhang, Feng Zhu, Junchi Yan, Rui Zhao, Xiaokang Yang
The two proposed methods (FCL, ICL) can be combined into Zero-CL, where ``Zero'' means negative samples are \textbf{zero} relevant, which allows Zero-CL to completely discard negative pairs, i.e., to train with \textbf{zero} negative samples.
no code implementations • 29 Sep 2021 • Shaofeng Zhang, Meng Liu, Junchi Yan, Hengrui Zhang, Lingxiao Huang, Pinyan Lu, Xiaokang Yang
Negative pairs are essential in contrastive learning, where they play the role of avoiding degenerate solutions.
no code implementations • 1 Jan 2021 • Shaofeng Zhang, Junchi Yan, Xiaokang Yang
Despite their success in perception over the last decade, deep neural networks are also known to be ravenous for labeled training data, which limits their applicability to real-world problems.
no code implementations • NeurIPS 2020 • Shaofeng Zhang, Meng Liu, Junchi Yan
Ensembling is a general way of improving the accuracy and stability of learning models, especially their generalization ability on small datasets.