no code implementations • 14 Mar 2025 • Jianhong Bai, Menghan Xia, Xiao Fu, Xintao Wang, Lianrui Mu, Jinwen Cao, Zuozhu Liu, Haoji Hu, Xiang Bai, Pengfei Wan, Di Zhang
However, altering camera trajectories of a given video remains under-explored, despite its importance in the field of video creation.
no code implementations • 13 Mar 2025 • Yunxiao Wang, Meng Liu, Rui Shao, Haoyu Zhang, Bin Wen, Fan Yang, Tingting Gao, Di Zhang, Liqiang Nie
Video large language models have achieved remarkable performance in tasks such as video question answering; however, their temporal understanding remains suboptimal.
no code implementations • 12 Mar 2025 • Haoyu Zhang, Qiaohui Chu, Meng Liu, Yunxiao Wang, Bin Wen, Fan Yang, Tingting Gao, Di Zhang, YaoWei Wang, Liqiang Nie
To address these challenges, we propose learning the mapping between exocentric and egocentric domains, leveraging the extensive exocentric knowledge within existing MLLMs to enhance egocentric video understanding.
no code implementations • 9 Mar 2025 • Xukun Zhou, Fengxin Li, Ming Chen, Yan Zhou, Pengfei Wan, Di Zhang, Hongyan Liu, Jun He, Zhaoxin Fan
Audio-driven human gesture synthesis is a crucial task with broad applications in virtual avatars, human-computer interaction, and creative content generation.
no code implementations • 4 Mar 2025 • Zhen Yang, Guibao Shen, Liang Hou, Mushui Liu, Luozhou Wang, Xin Tao, Pengfei Wan, Di Zhang, Ying-Cong Chen
In this paper, we propose RectifiedHR, a straightforward and efficient solution for training-free high-resolution image generation.
no code implementations • 28 Feb 2025 • Xiao Wang, Jingyun Hua, WeiHong Lin, Yuanxing Zhang, Fuzheng Zhang, Jianlong Wu, Di Zhang, Liqiang Nie
Recent Multi-modal Large Language Models (MLLMs) have made great progress in video understanding.
no code implementations • 22 Feb 2025 • Rui Li, Peiyi Wang, Jingyuan Ma, Di Zhang, Lei Sha, Zhifang Sui
Large Language Models (LLMs) have gained increasing attention for their remarkable capacity, alongside concerns about safety arising from their potential to produce harmful content.
no code implementations • 19 Feb 2025 • Hao Yi, Qingyang Li, Yulan Hu, Fuzheng Zhang, Di Zhang, Yong liu
Recently, enhancing the numerical and logical reasoning capability of Large Language Models (LLMs) has emerged as a research hotspot.
no code implementations • 19 Feb 2025 • Borui Liao, Yulong Xu, Jiao Ou, Kaiyuan Yang, Weihua Jian, Pengfei Wan, Di Zhang
Full-Duplex Speech Dialogue Systems (Full-Duplex SDS) have significantly enhanced the naturalness of human-machine interaction by enabling real-time bidirectional communication.
no code implementations • 18 Feb 2025 • Leiyu Pan, Zhenpeng Su, Minxuan Lv, Yizhe Xiong, Xiangwen Zhang, Zijia Lin, Hui Chen, Jungong Han, Guiguang Ding, Cheng Luo, Di Zhang, Kun Gai, Deyi Xiong
Moreover, we find that Finedeep achieves optimal results when balancing depth and width, specifically by adjusting the number of expert sub-layers and the number of experts per sub-layer.
no code implementations • 18 Feb 2025 • Minxuan Lv, Zhenpeng Su, Leiyu Pan, Yizhe Xiong, Zijia Lin, Hui Chen, Wei Zhou, Jungong Han, Guiguang Ding, Cheng Luo, Di Zhang, Kun Gai, Songlin Hu
As large language models continue to scale, computational costs and resource consumption have emerged as significant challenges.
no code implementations • 17 Feb 2025 • Jiaze Li, Yaya Shi, Zongyang Ma, Haoran Xu, Feng Cheng, Huihui Xiao, Ruiwen Kang, Fan Yang, Tingting Gao, Di Zhang
Enhancing the fine-grained instance spatiotemporal motion perception capabilities of Video Large Language Models is crucial for improving their temporal and general video understanding.
no code implementations • 12 Feb 2025 • Qinghe Wang, Yawen Luo, Xiaoyu Shi, Xu Jia, Huchuan Lu, Tianfan Xue, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai
In the first stage, we design an interactive workflow that allows users to intuitively construct 3D-aware conditional signals by positioning object bounding boxes and defining camera movements within the 3D space.
no code implementations • 5 Feb 2025 • Di Zhang
This work highlights the potential of using GNNs for solving optimization problems in finance and provides a promising approach for real-time arbitrage detection in dynamic financial markets.
no code implementations • 27 Jan 2025 • Xuqiang Shao, Yuqi Zhang, Di Zhang, Tianxiang Gao, Xinyuan Liu, Zhiran Gan, Fanshun Meng, Hao Li, Weijie Yang
This paper addresses the challenges of creating efficient and high-quality datasets for machine learning potential functions.
no code implementations • 23 Jan 2025 • Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang
Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist.
no code implementations • 14 Jan 2025 • Jiwen Yu, Yiran Qin, Xintao Wang, Pengfei Wan, Di Zhang, Xihui Liu
In this paper, we present GameFactory, a framework focused on exploring scene generalization in game video generation.
no code implementations • 8 Jan 2025 • Yuzhou Huang, Ziyang Yuan, Quande Liu, Qiulin Wang, Xintao Wang, Ruimao Zhang, Pengfei Wan, Di Zhang, Kun Gai
To address these challenges, we introduce ConceptMaster, an innovative framework that effectively tackles the critical issues of identity decoupling while maintaining concept fidelity in customized videos.
no code implementations • 26 Dec 2024 • Haonan He, Yuchen Ren, Yining Tang, Ziyang Xu, Junxian Li, Minghao Yang, Di Zhang, Dong Yuan, Tao Chen, Shufei Zhang, Yuqiang Li, Nanqing Dong, Wanli Ouyang, Dongzhan Zhou, Peng Ye
Large language models have already demonstrated their formidable capabilities in general domains, ushering in a revolutionary transformation.
1 code implementation • 12 Dec 2024 • Yuanhui Huang, Wenzhao Zheng, Yuan Gao, Xin Tao, Pengfei Wan, Di Zhang, Jie zhou, Jiwen Lu
As videos are observations of the underlying evolving world, we propose to model the long-term developments in a latent space and use VGMs to film them into videos.
no code implementations • 10 Dec 2024 • Haoran Lian, Junmin Chen, Wei Huang, Yizhe Xiong, Wenping Hu, Guiguang Ding, Hui Chen, Jianwei Niu, Zijia Lin, Fuzheng Zhang, Di Zhang
In this paper, we introduce a novel single-stage continual pretraining method, Head-Adaptive Rotary Position Encoding (HARPE), to equip LLMs with long context modeling capabilities while simplifying the training process.
no code implementations • 10 Dec 2024 • Xiao Fu, Xian Liu, Xintao Wang, Sida Peng, Menghan Xia, Xiaoyu Shi, Ziyang Yuan, Pengfei Wan, Di Zhang, Dahua Lin
Previous methods on controllable video generation primarily leverage 2D control signals to manipulate object motions and have achieved remarkable synthesis results.
1 code implementation • 10 Dec 2024 • Jianhong Bai, Menghan Xia, Xintao Wang, Ziyang Yuan, Xiao Fu, Zuozhu Liu, Haoji Hu, Pengfei Wan, Di Zhang
Recent advancements in video diffusion models have shown exceptional abilities in simulating real-world dynamics and maintaining 3D consistency.
no code implementations • 10 Dec 2024 • Zixuan Ye, Huijuan Huang, Xintao Wang, Pengfei Wan, Di Zhang, Wenhan Luo
Style control has been popular in video generation models.
no code implementations • 27 Nov 2024 • Di Zhang, Junxian Li, Jingdi Lei, Xunzhi Wang, Yujie Liu, Zonglin Yang, Jiatong Li, Weida Wang, Suorong Yang, Jianbo Wu, Peng Ye, Wanli Ouyang, Dongzhan Zhou
In this approach, the Reasoner generates reasoning responses according to text prompts, which can evolve iteratively as a policy based on feedback from the Critic.
no code implementations • 25 Nov 2024 • Yuanyang Yin, Yaqi Zhao, Mingwu Zheng, Ke Lin, Jiarong Ou, Rui Chen, Victor Shea-Jay Huang, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang, Kun Gai
Achieving optimal performance of video diffusion transformers within a given data and compute budget is crucial due to their high training costs.
no code implementations • 25 Nov 2024 • Hao Yi, Qingyang Li, Yulan Hu, Fuzheng Zhang, Di Zhang, Yong liu
To address these issues, we propose a high-quality VQA preference dataset, called Multiple Multimodal Artificial Intelligence Preference Datasets in VQA (MMAIP-V), which is constructed by sampling from the response distribution set and using an external scoring function for response evaluation.
no code implementations • 22 Nov 2024 • Jiahao Hu, Tianxiong Zhong, Xuebo Wang, Boyuan Jiang, Xingye Tian, Fei Yang, Pengfei Wan, Di Zhang
VIVID-10M is the first large-scale hybrid image-video local editing dataset aimed at reducing data construction and model training costs, which comprises 9.7M samples that encompass a wide range of video editing tasks.
no code implementations • arXiv preprint 2024 • Jiatong Li, Yunqing Liu, Wei Liu, Jingdi Lei, Di Zhang, Wenqi Fan, Dongzhan Zhou, Yuqiang Li, Qing Li
Previous endeavours often treat the molecule as a general SMILES string or molecular graph, neglecting the fine-grained alignments between the molecular sub-structures and the descriptive textual phrases, which are crucial for accurate and explainable predictions.
Ranked #2 on Text-based de novo Molecule Generation on ChEBI-20
no code implementations • 22 Nov 2024 • Jiatong Li, Yunqing Liu, Wei Liu, Jingdi Lei, Di Zhang, Wenqi Fan, Dongzhan Zhou, Yuqiang Li, Qing Li
Previous endeavours often treat the molecule as a general SMILES string or molecular graph, neglecting the fine-grained alignments between the molecular sub-structures and the descriptive textual phrases, which are crucial for accurate and explainable predictions.
no code implementations • 21 Nov 2024 • Zhuoman Liu, Weicai Ye, Yan Luximon, Pengfei Wan, Di Zhang
Realistic simulation of dynamic scenes requires accurately capturing diverse material properties and modeling complex object interactions grounded in physical principles.
no code implementations • 20 Nov 2024 • Zhicong Li, Jiahao Wang, Zhishu Jiang, Hangyu Mao, Zhongxia Chen, Jiazhen Du, Yuanxing Zhang, Fuzheng Zhang, Di Zhang, Yong liu
In this paper, we introduce DMQR-RAG, a Diverse Multi-Query Rewriting framework designed to improve the performance of both document retrieval and final responses in RAG.
no code implementations • 7 Nov 2024 • Xingyu Lu, Yuhang Hu, Changyi Liu, Tianke Zhang, Zhenyu Yang, Zhixiang Ding, Shengsheng Qian, Meng Du, Ruiwen Kang, Kaiyu Tang, Fan Yang, Tingting Gao, Di Zhang, Hai-Tao Zheng, Bin Wen
In this work, we define mathematical problem-solving as a process of transiting from an initial unsolved state to the final resolved state, and propose Kwai-STaR framework, which transforms LLMs into State-Transition Reasoners to improve their intuitive reasoning capabilities.
no code implementations • 10 Oct 2024 • Qiuheng Wang, Yukai Shi, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang
As visual generation technologies continue to advance, the scale of video datasets has expanded rapidly, and the quality of these datasets is critical to the performance of video generation models.
1 code implementation • 3 Oct 2024 • Di Zhang, Jianbo Wu, Jingdi Lei, Tong Che, Jiatong Li, Tong Xie, Xiaoshui Huang, Shufei Zhang, Marco Pavone, Yuqiang Li, Wanli Ouyang, Dongzhan Zhou
This paper presents an advanced mathematical problem-solving framework, LLaMA-Berry, for enhancing the mathematical reasoning ability of Large Language Models (LLMs).
no code implementations • 29 Sep 2024 • Di Zhang, Bowen Lv, Hai Zhang, Feifan Yang, Junqiao Zhao, Hang Yu, Chang Huang, Hongtu Zhou, Chen Ye, Changjun Jiang
Perceiving the pre-eminence of image reconstruction in representation learning, we propose SMG (Separated Models for Generalization), a novel approach that exploits image reconstruction for generalization.
no code implementations • 29 Sep 2024 • Xiao Wang, Jianlong Wu, Zijia Lin, Fuzheng Zhang, Di Zhang, Liqiang Nie
For iterative refinement, we first leverage a video-language model to generate synthetic annotations, resulting in a refined dataset.
no code implementations • 25 Sep 2024 • Yujian Zheng, Yuda Qiu, Leyang Jin, Chongyang Ma, Haibin Huang, Di Zhang, Pengfei Wan, Xiaoguang Han
Our experiments demonstrate that reconstructing braided and un-braided 3D hair from single-view images via a unified approach is possible, and our method achieves state-of-the-art performance in recovering complex hairstyles.
no code implementations • 23 Sep 2024 • Yihong Tang, Jiao Ou, Che Liu, Fuzheng Zhang, Di Zhang, Kun Gai
Role-playing is an emerging application in the field of Human-Computer Interaction (HCI), primarily implemented through the alignment training of a large language model (LLM) with assigned characters.
no code implementations • 21 Aug 2024 • Yuanyang Yin, Yaqi Zhao, YaJie Zhang, Ke Lin, Jiahao Wang, Xin Tao, Pengfei Wan, Di Zhang, Baoqun Yin, Wentao Zhang
Multimodal Large Language Models (MLLMs) have recently demonstrated remarkable perceptual and reasoning abilities, typically comprising a Vision Encoder, an Adapter, and a Large Language Model (LLM).
Ranked #71 on Visual Question Answering on MM-Vet
no code implementations • 20 Aug 2024 • Chen Peng, Di Zhang, Urbashi Mitra
In this paper, the causal bandit problem is investigated, in which the objective is to select an optimal sequence of interventions on nodes in a causal graph.
1 code implementation • 14 Aug 2024 • Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Yaotian Yang, Xinrui Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Wei Li, Shufei Zhang, Mao Su, Wanli Ouyang, Yuqiang Li, Dongzhan Zhou
We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks.
no code implementations • 13 Aug 2024 • Liangdong Qiu, Chengxing Yu, Yanran Li, Zhao Wang, Haibin Huang, Chongyang Ma, Di Zhang, Pengfei Wan, Xiaoguang Han
Although humans have the innate ability to imagine multiple possible actions from videos, it remains an extraordinary challenge for computers due to the intricate camera movements and montages.
no code implementations • 9 Aug 2024 • Mu Lin, Di Zhang, Ben Chen, Hang Zheng
The water market is a contemporary marketplace for water trading and is deemed to be one of the most efficient instruments for improving social welfare.
no code implementations • 30 Jul 2024 • Di Zhang, Suvrajeet Sen
Moreover, it significantly enhances both speed and accuracy of the optimization process.
no code implementations • 19 Jul 2024 • Kaibing Chen, Dong Shen, Hanwen Zhong, Huasong Zhong, Kui Xia, Di Xu, Wei Yuan, Yifei Hu, Bin Wen, Tianke Zhang, Changyi Liu, Dewen Fan, Huihui Xiao, JiaHong Wu, Fan Yang, Size Li, Di Zhang
However, when dealing with long sequences of visual signals or inputs such as videos, the self-attention mechanism of language models can lead to significant computational overhead.
1 code implementation • 19 Jul 2024 • Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia
Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms.
no code implementations • 13 Jul 2024 • Tianrui Ji, Yuntian Hou, Di Zhang
Through this comprehensive survey of Kolmogorov-Arnold Networks (KAN), we have gained a thorough understanding of its theoretical foundation, architectural design, application scenarios, and current research progress.
1 code implementation • 3 Jul 2024 • Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang
Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability.
1 code implementation • 28 Jun 2024 • Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li
The Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs).
1 code implementation • 17 Jun 2024 • Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Kun Gai, Ji-Rong Wen
Hallucination detection is a challenging task for large language models (LLMs), and existing studies heavily rely on powerful closed-source LLMs such as GPT-4.
1 code implementation • 11 Jun 2024 • Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang
This paper introduces the MCT Self-Refine (MCTSr) algorithm, an innovative integration of Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS), designed to enhance performance in complex mathematical reasoning tasks.
1 code implementation • 6 Jun 2024 • Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui
Diffusion models have demonstrated great success in text-to-video (T2V) generation.
no code implementations • 31 May 2024 • Jinchao Zhu, Yuxuan Wang, Siyuan Pan, Pengfei Wan, Di Zhang, Gao Huang
1) For the tuning method, we design a model assembly strategy to reconstruct a lightweight model while preserving performance through distillation.
no code implementations • 24 May 2024 • Chenxi Sun, Hongzhi Zhang, Zijia Lin, Jingyuan Zhang, Fuzheng Zhang, Zhongyuan Wang, Bin Chen, Chengru Song, Di Zhang, Kun Gai, Deyi Xiong
The core of our approach is the observation that a pre-trained language model can confidently predict multiple contiguous tokens, forming the basis for a lexical unit, in which these contiguous tokens could be decoded in parallel.
1 code implementation • CVPR 2024 • Sixian Zhang, Bohan Wang, Junqiang Wu, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang
Current metrics for text-to-image models typically rely on statistical metrics which inadequately represent the real preference of humans.
1 code implementation • 29 Apr 2024 • Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang
Based on this, we quantify the faithfulness of a concept explanation via perturbation.
no code implementations • 17 Apr 2024 • Jiao Ou, Jiayu Wu, Che Liu, Fuzheng Zhang, Di Zhang, Kun Gai
In this paper, we propose to explicitly capture the complex rules to help the user simulator pose diverse and in-depth instructions.
no code implementations • 15 Apr 2024 • Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang
To further evaluate the IAA capability of MLLMs, we construct the UNIAA-Bench, which consists of three aesthetic levels: Perception, Description, and Assessment.
1 code implementation • 9 Apr 2024 • Xiuqi Deng, Lu Xu, Xiyao Li, Jinkai Yu, Erpeng Xue, Zhongyuan Wang, Di Zhang, Zhaojie Liu, Guorui Zhou, Yang song, Na Mou, Shen Jiang, Han Li
In this paper, we propose an industrial multimodal recommendation framework named EM3: End-to-end training of Multimodal Model and ranking Model, which sufficiently utilizes multimodal information and allows personalized ranking tasks to directly train the core modules in the multimodal model to obtain more task-oriented content features, without overburdening resource consumption.
1 code implementation • 29 Mar 2024 • Luozhou Wang, Ziyang Mai, Guibao Shen, Yixun Liang, Xin Tao, Pengfei Wan, Di Zhang, Yijun Li, Yingcong Chen
In this work, we present a novel approach for motion customization in video generation, addressing the widespread gap in the exploration of motion representation within video generative models.
no code implementations • 21 Mar 2024 • Yiquan Chen, Yingchao Lyu, Di Zhang
Deep reinforcement learning has made significant progress in games with imperfect information, but its performance in the card game Doudizhu (Chinese Poker/Fight the Landlord) remains unsatisfactory.
2 code implementations • 12 Mar 2024 • Weijia Wu, Zhuang Li, YuChao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang
We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.
no code implementations • 6 Mar 2024 • Di Zhang, Moyang Wang, Joseph Mango, Xiang Li, Xianrui Xu
Given these advancements, there has been a surge in novel methods employing reinforcement learning to tackle spatial resource allocation problems.
1 code implementation • 26 Feb 2024 • Zhexin Zhang, Yida Lu, Jingyuan Ma, Di Zhang, Rui Li, Pei Ke, Hao Sun, Lei Sha, Zhifang Sui, Hongning Wang, Minlie Huang
The safety of Large Language Models (LLMs) has gained increasing attention in recent years, but a comprehensive approach for detecting safety issues within LLMs' responses in an aligned, customizable and explainable manner is still lacking.
no code implementations • 16 Feb 2024 • Yihong Tang, Jiao Ou, Che Liu, Fuzheng Zhang, Di Zhang, Kun Gai
Experiments on models improved by RoleAD indicate that our adversarial dataset ameliorates this deficiency, with the improvements demonstrating a degree of generalizability in ordinary scenarios.
1 code implementation • 10 Feb 2024 • Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Wanli Ouyang, Dongzhan Zhou, Shufei Zhang, Mao Su, Han-sen Zhong, Yuqiang Li
However, the community lacks an LLM specifically designed for chemistry.
1 code implementation • 5 Feb 2024 • Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang song, Kun Gai, Yadong Mu
In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.
Ranked #3 on Text-to-Video Generation on MSR-VTT
no code implementations • 5 Feb 2024 • Shiyuan Yang, Liang Hou, Haibin Huang, Chongyang Ma, Pengfei Wan, Di Zhang, Xiaodong Chen, Jing Liao
In practice, users often desire the ability to control object motion and camera movement independently for customized video creation.
no code implementations • 22 Jan 2024 • Lihua Jian, Songlei Xiong, Han Yan, Xiaoguang Niu, Shaowu Wu, Di Zhang
The DIIM is designed by modifying the vanilla cross-attention mechanism, which can promote the extraction of the discrepancy information of the source images.
1 code implementation • 11 Jan 2024 • Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Junchen Wan, Fuzheng Zhang, Di Zhang, Ji-Rong Wen
To address it, we propose a new RL method named RLMEC that incorporates a generative model as the reward model, which is trained by the erroneous solution rewriting task under the minimum editing constraint, and can produce token-level rewards for RL training.
2 code implementations • 27 Dec 2023 • Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, ZhengJun Zha, Haibin Huang, Chongyang Ma
I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model.
1 code implementation • 24 Nov 2023 • Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang
In this paper, we introduce an information-enriched diffusion model for paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation.
no code implementations • 14 Nov 2023 • Lei Lin, Jiayi Fu, Pengli Liu, Qingyang Li, Yan Gong, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai
Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting usually causes repetitiveness and local optimality.
1 code implementation • 3 Nov 2023 • Jiao Ou, Junda Lu, Che Liu, Yihong Tang, Fuzheng Zhang, Di Zhang, Kun Gai
In this paper, we propose DialogBench, a dialogue evaluation benchmark that contains 12 dialogue tasks to probe the capabilities that LLMs should have as human-like dialogue systems.
no code implementations • 17 Oct 2023 • Huan Yuan, Chao Liao, Jianchao Tan, Peng Yao, Jiyuan Jia, Bin Chen, Chengru Song, Di Zhang
To alleviate the disadvantages of both categories of methods, we propose to unify static and dynamic compression techniques to obtain an input-adaptive compressed model, which can better balance the total compression ratio and the model performance.
no code implementations • 17 Oct 2023 • Peng Yao, Chao Liao, Jiyuan Jia, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang
Deep neural networks have achieved great success due to the increasing amounts of data and diverse, effective neural network designs.
no code implementations • 11 Oct 2023 • Yuchong Sun, Che Liu, Kun Zhou, Jinwen Huang, Ruihua Song, Wayne Xin Zhao, Fuzheng Zhang, Di Zhang, Kun Gai
In this paper, we introduce Parrot, a solution aiming to enhance multi-turn instruction following for LLMs.
no code implementations • 11 Oct 2023 • Jiayi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, Zhirui Yang, ShengNan Zhang, Xue Zheng, Yan Li, Yuliang Liu, Xucheng Ye, Yiqiao Liao, Chao Liao, Bin Chen, Chengru Song, Junchen Wan, Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai
Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning.
Ranked #95 on Arithmetic Reasoning on GSM8K (using extra training data)
1 code implementation • 9 Sep 2023 • Yang Jin, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu
Specifically, we introduce a well-designed visual tokenizer to translate the non-linguistic image into a sequence of discrete tokens like a foreign language that LLM can read.
1 code implementation • 9 Aug 2023 • Jue Chen, Huan Yuan, Jianchao Tan, Bin Chen, Chengru Song, Di Zhang
We propose an improved end-to-end Minimax optimization method for this sparse learning problem to better balance the model performance and the computation efficiency.
no code implementations • 24 Jun 2023 • Xiao Zhang, Hai Zhang, Hongtu Zhou, Chang Huang, Di Zhang, Chen Ye, Junqiao Zhao
In this paper, we propose a method to construct a boundary that discriminates safe and unsafe states.
no code implementations • 22 Feb 2023 • Haoran Yin, Jiaojiao Xiong, Yu Zhou, Chi Zhang, Di Zhang, Xizhang Wei, Yanqun Tang
Delay-Doppler waveform design has been considered as a promising solution to achieve reliable communication under high-mobility channels for the space-air-ground-integrated networks (SAGIN).
no code implementations • 19 Jan 2023 • Chris Egersdoerfer, Dong Dai, Di Zhang
With the increasing prevalence of scalable file systems in the context of High Performance Computing (HPC), the importance of accurate anomaly detection on runtime logs is increasing.
no code implementations • 20 Oct 2022 • Di Zhang, Youzhou Zhou
2) It satisfies the connectivity constraint, that is, all currencies are guaranteed to be tradable.
no code implementations • 4 Jul 2022 • Di Zhang, Qiang Niu, Youzhou Zhou
2) If variational inference (VI) is used for state estimation, it runs much faster than Monte Carlo (MC) methods since the calculation of the posterior uses only basic arithmetic operations.
1 code implementation • 11 Apr 2022 • Yuanxing Zhang, Langshi Chen, Siran Yang, Man Yuan, Huimin Yi, Jie Zhang, Jiamang Wang, Jianbo Dong, Yunlong Xu, Yue Song, Yong Li, Di Zhang, Wei Lin, Lin Qu, Bo Zheng
However, we observe that GPU devices in training recommender systems are underutilized, and they cannot attain the expected throughput improvement that has been achieved in the CV and NLP areas.
no code implementations • 28 Mar 2022 • Zhirong Xu, Shiyang Wen, Junshan Wang, Guojun Liu, Liang Wang, Zhi Yang, Lei Ding, Yan Zhang, Di Zhang, Jian Xu, Bo Zheng
Moreover, to deploy AMCAD in Taobao, one of the largest ecommerce platforms with hundreds of millions of users, we design an efficient two-layer online retrieval framework for the task of graph-based advertisement retrieval.
no code implementations • 8 Sep 2021 • Di Zhang
The original lottery ticket hypothesis performs pruning and weight resetting after training convergence, exposing it to the problem of forgetting learned knowledge and a potentially high training cost.
no code implementations • 31 May 2021 • An Yang, Junyang Lin, Rui Men, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Jiamang Wang, Yong Li, Di Zhang, Wei Lin, Lin Qu, Jingren Zhou, Hongxia Yang
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computation cost, and thus they have become a trend in model scaling.
no code implementations • 30 Mar 2021 • Feng Li, Zhenrui Chen, Pengjie Wang, Yi Ren, Di Zhang, Xiaoyu Zhu
Moreover, it is difficult for users to jump out of their specific historical behaviors for possible interest exploration, namely the weak generalization problem.
no code implementations • 27 Feb 2021 • Xuewan Zhang, Dalong Zhang, Liuqing Yang, Gangtao Han, Hsiao-Hwa Chen, Di Zhang
Thus, BER performance of the proposed codebook design approach outperforms that of the existing codebook design schemes in both uncoded and coded SCMA systems, especially for large-size codebooks.
no code implementations • 24 Feb 2021 • Yibo Wang, Siqi Jiang, Jingkuan Xiao, Xiaofan Cai, Di Zhang, Ping Wang, Guodong Ma, Yaqing Han, Jiabei Huang, Kenji Watanabe, Takashi Taniguchi, Alexander S. Mayorov, Geliang Yu
Van der Waals (vdW) assembly of two-dimensional materials has been long recognized as a powerful tool to create unique systems with properties that cannot be found in natural compounds.
Mesoscale and Nanoscale Physics Materials Science
no code implementations • 10 Feb 2021 • Haijing Zhou, Junjie Cao, Jingwei Lian, Di Zhang
Approximate analytical formulas describing the dark matter abundance and cross section in the scattering with nucleons are used to illustrate a dependence on theoretical parameters in neutralino and Higgs sectors.
High Energy Physics - Phenomenology
no code implementations • 9 Feb 2021 • Di Zhang, Shun Zhou
For the first time, the Wilson coefficients of all the relevant six-dimensional operators are computed by carrying out the one-loop matching between the effective theory and full seesaw model, and applied to calculate the total rates of radiative decays of charged leptons.
High Energy Physics - Phenomenology High Energy Physics - Experiment
1 code implementation • 20 Oct 2019 • Di Zhang, Dong Dai, Youbiao He, Forrest Sheng Bao, Bing Xie
Today high-performance computing (HPC) platforms are still dominated by batch jobs.
no code implementations • 25 Sep 2019 • Yu He, Shiyang Wen, Wenjin Wu, Yan Zhang, Siran Yang, Yuan Wei, Di Zhang, Guojie Song, Wei Lin, Liang Wang, Bo Zheng
The Graph Convolutional Network (GCN) and its variants are powerful models for graph representation learning and have recently achieved great success on many graph-based applications.
no code implementations • 3 Jan 2019 • Michael Wojnowicz, Di Zhang, Glenn Chisholm, Xuan Zhao, Matt Wolff
However, the recent development of randomized principal component analysis (RPCA) has opened up the possibility of obtaining approximate principal components on very large datasets.
no code implementations • 11 Jun 2018 • Hao Dong, Shuai Li, Dongchang Xu, Yi Ren, Di Zhang
The training of Deep Neural Networks usually needs tremendous computing resources.