no code implementations • CCL 2022 • Tian Huang, Yanqiu Shao, Wei Li
“语义依存图是NLP处理语义的深层分析方法, 能够对句子中词与词之间的语义进行分析。该文针对古代汉语特点, 在制定古代汉语语义依存图标注规范的基础上, 以《二十四史》为语料来源, 完成标注了规模为3000句的古代汉语语义依存图库, 标注一致性的kappa值为78. 83%。通过与现代汉语语义依存图库的对比, 对依存图库基本情况进行统计, 分析古代汉语的语义特色和规律。统计显示, 古代汉语语义分布宏观上符合齐普夫定律, 在语义事件描述上具有强烈的历史性叙事和正式文体特征, 如以人物纪传为中心, 时间、地点等周边角色描述细致, 叙事语言冷静客观, 缺少描述情态、语气、程度、时间状态等的修饰词语等。 "
no code implementations • ECCV 2020 • Niamul Quader, Juwei Lu, Peng Dai, Wei Li
State-of-the-art approaches to video-based action and gesture recognition often employ two key concepts: First, they employ multistream processing; second, they use an ensemble of convolutional networks.
Ranked #1 on
Action Classification
on Jester test
1 code implementation • EMNLP 2021 • Moye Chen, Wei Li, Jiachen Liu, Xinyan Xiao, Hua Wu, Haifeng Wang
Comparing with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) directly outputs an integrate summary in the form of sub-graph which is more informative and coherent.
1 code implementation • ECCV 2020 • Niamul Quader, Md Mafijul Islam Bhuiyan, Juwei Lu, Peng Dai, Wei Li
We propose novel approaches for simultaneously identifying important weights of a convolutional neural network (ConvNet) and providing more attention to the important weights during training.
1 code implementation • Findings (ACL) 2022 • Wei Li, Yuhan Song, Qi Su, Yanqiu Shao
Word Segmentation is a fundamental step for understanding Chinese language.
no code implementations • COLING 2022 • Kun Zhang, Yunqi Qiu, Yuanzhuo Wang, Long Bai, Wei Li, Xuhui Jiang, HuaWei Shen, Xueqi Cheng
Complex question generation over knowledge bases (KB) aims to generate natural language questions involving multiple KB relations or functional constraints.
no code implementations • CCL 2020 • Xiaodong Yan, Xiaoqing Xie, Yu Zou, Wei Li
Seq2seq神经网络模型在中英文文本摘要的研究中取得了良好的效果, 但在低资源语言的文本摘要研究还处于探索阶段, 尤其是在藏语中。此外, 目前还没有大规模的标注语料库进行摘要提取。本文提出了一种生成藏文新闻摘要的统一模型。利用TextRank算法解决了藏语标注训练数据不足的问题。然后, 采用两层双GRU神经网络提取代表原始新闻的句子, 减少冗余信息。最后, 使用基于注意力机制的Seq2Seq来生成理解式摘要。同时, 我们加入了指针网络来处理未登录词的问题。实验结果表明, ROUGE-1评分比传统模型提高了2%。 关键词:文本摘要;藏文;TextRank; 指针网络;Bi-GRU
no code implementations • CCL 2022 • Kuai Yu, Yanqiu Shao, Wei Li
“基于深度学习的有监督机器翻译取得了良好的效果, 但训练过程中需要大量质量较高的对齐语料。对于中文古今翻译场景, 高质量的平行语料并不多, 而粗对齐的篇章、段语料比较容易获得, 因此语料对齐很有研究价值和研究必要。在传统双语平行语料的句子对齐研究中, 传统方法根据双语文本中的长度、词汇、共现文字等语法信息, 建立一个综合评判标准来衡量两个句对之间相似度。此类方法虽然在单句对齐上取得了较好的效果, 但是对于句子语义匹配的能力有限, 并且在一些多对多的对齐模式上的性能表现不佳。在本文中我们提出尝试利用现在发展迅速且具有强大语义表示能力的预训练语言模型来考虑双语的语义信息, 但是单独使用预训练语言模型只能考虑相对局部的信息, 因此我们提出采用基于动态规划算法的强化学习训练目标来整合段落全局信息, 并且进行无监督训练。实验结果证明我们提出的方法训练得到的模型性能优于此前获得最好表现的基线模型, 尤其相较于传统模型难以处理的多对多对齐模式下, 性能提升较大。”
1 code implementation • 12 May 2025 • Wei Li, Ming Hu, Guoan Wang, Lihao Liu, Kaijin Zhou, Junzhi Ning, Xin Guo, ZongYuan Ge, Lixu Gu, Junjun He
In ophthalmic surgery, developing an AI system capable of interpreting surgical videos and predicting subsequent operations requires numerous ophthalmic surgical videos with high-quality annotations, which are difficult to collect due to privacy concerns and labor consumption.
no code implementations • 11 May 2025 • Binbin Wei, Yuhang Zhang, Shishun Tian, Muxin Liao, Wei Li, Wenbin Zou
Hence, we propose a novel framework, namely Depth-Sensitive Soft Suppression with RGB-D inter-modal stylization flow (DSSS), focusing on learning domain-invariant features from depth maps for the DG semantic segmentation.
no code implementations • 21 Apr 2025 • Miaomiao Cai, Simiao Li, Wei Li, Xudong Huang, Hanting Chen, Jie Hu, Yunhe Wang
Recent advances in diffusion models have improved Real-World Image Super-Resolution (Real-ISR), but existing methods lack human feedback integration, risking misalignment with human preference and may leading to artifacts, hallucinations and harmful content generation.
no code implementations • 19 Apr 2025 • Peixi Wu, Bosong Chai, Menghua Zheng, Wei Li, Zhangchi Hu, Jie Chen, Zheyu Zhang, Hebei Li, Xiaoyan Sun
Due to the poor performance of simply transferring Mamba to 3D SNNs, SPM is designed to utilize both the sequence modeling capabilities of Mamba and the temporal feature extraction of SNNs.
1 code implementation • 12 Apr 2025 • Song Yang, Haotian Fu, Herui Zhang, Peng Zhang, Wei Li, Dongrui Wu
Decoding brain signals accurately and efficiently is crucial for intra-cortical brain-computer interfaces.
no code implementations • 12 Apr 2025 • Jiaxing Xu, Kai He, Yue Tang, Wei Li, Mengcheng Lan, Xia Dong, Yiping Ke, Mengling Feng
In this paper, we present BrainPrompt, an innovative framework that enhances Graph Neural Networks (GNNs) by integrating Large Language Models (LLMs) with knowledge-driven prompts, enabling more effective capture of complex, non-imaging information and external knowledge for neurological disease identification.
no code implementations • 10 Apr 2025 • Yichun Yin, Wenyong Huang, Kaikai Song, Yehui Tang, Xueyu Wu, Wei Guo, Peng Guo, Yaoyuan Wang, Xiaojun Meng, Yasheng Wang, Dong Li, Can Chen, Dandan Tu, Yin Li, Fisher Yu, Ruiming Tang, Yunhe Wang, Baojun Wang, Bin Wang, Bo wang, Boxiao Liu, Changzheng Zhang, Duyu Tang, Fei Mi, Hui Jin, Jiansheng Wei, Jiarui Qin, Jinpeng Li, Jun Zhao, Liqun Deng, Lin Li, Minghui Xu, Naifu Zhang, Nianzu Zheng, Qiang Li, Rongju Ruan, Shengjun Cheng, Tianyu Guo, wei he, Wei Li, Weiwen Liu, Wulong Liu, Xinyi Dai, Yonghan Dong, Yu Pan, Yue Li, YuFei Wang, YuJun Li, Yunsheng Ni, Zhe Liu, Zhenhe Zhang, Zhicheng Liu
Our exploration demonstrates that Ascend NPUs are capable of efficiently and effectively training dense models with more than 100 billion parameters.
no code implementations • 30 Mar 2025 • Kai Liu, Wei Li, Lai Chen, Shengqiong Wu, Yanhao Zheng, Jiayi Ji, Fan Zhou, Rongxin Jiang, Jiebo Luo, Hao Fei, Tat-Seng Chua
This paper introduces JavisDiT, a novel Joint Audio-Video Diffusion Transformer designed for synchronized audio-video generation (JAVG).
1 code implementation • 27 Mar 2025 • Haote Yang, Xingjian Wei, Jiang Wu, Noémi Ligeti-Nagy, Jiaxing Sun, Yinfan Wang, Zijian Győző Yang, Junyuan Gao, JingChao Wang, Bowen Jiang, Shasha Wang, Nanjun Yu, Zihao Zhang, Shixin Hong, Hongwei Liu, Wei Li, Songyang Zhang, Dahua Lin, Lijun Wu, Gábor Prószéky, Conghui He
We introduce OpenHuEval, the first benchmark for LLMs focusing on the Hungarian language and specifics.
1 code implementation • 27 Mar 2025 • Size Wu, Wenwei Zhang, Lumin Xu, Sheng Jin, Zhonghua Wu, Qingyi Tao, Wentao Liu, Wei Li, Chen Change Loy
Unifying visual understanding and generation within a single multimodal framework remains a significant challenge, as the two inherently heterogeneous tasks require representations at different levels of granularity.
1 code implementation • 26 Mar 2025 • Tianqi Liu, Zihao Huang, Zhaoxi Chen, Guangcong Wang, Shoukang Hu, Liao Shen, Huiqiang Sun, Zhiguo Cao, Wei Li, Ziwei Liu
We present Free4D, a novel tuning-free framework for 4D scene generation from a single image.
1 code implementation • 25 Mar 2025 • Yudong Yang, Jimin Zhuang, Guangzhi Sun, Changli Tang, Yixuan Li, Peihan Li, Yifan Jiang, Wei Li, Zejun Ma, Chao Zhang
Audio often serves as an auxiliary modality in video understanding tasks of audio-visual large language models (LLMs), merely assisting in the comprehension of visual information.
1 code implementation • 25 Mar 2025 • Yang Ren, Hai Jiang, Menglong Yang, Wei Li, Shuaicheng Liu
RAW-to-sRGB mapping, or the simulation of the traditional camera image signal processor (ISP), aims to generate DSLR-quality sRGB images from raw data captured by smartphone sensors.
no code implementations • 24 Mar 2025 • Monan Zhou, Shenyang Xu, Zhaorui Liu, Zhaowen Wang, Feng Yu, Wei Li, Baoqiang Han
Data are crucial in various computer-related fields, including music information retrieval (MIR), an interdisciplinary area bridging computer science and music.
no code implementations • 18 Mar 2025 • Yixuan Li, Changli Tang, Jimin Zhuang, Yudong Yang, Guangzhi Sun, Wei Li, Zejun Ma, Chao Zhang
Human vision is dynamic and continuous.
no code implementations • 18 Mar 2025 • Pengcheng Zhou, Lantian Zhang, Wei Li
Semi-supervised learning is of great significance in medical image segmentation by exploiting unlabeled data.
no code implementations • 14 Mar 2025 • Zixu Cheng, Jian Hu, Ziquan Liu, Chenyang Si, Wei Li, Shaogang Gong
Human processes video reasoning in a sequential spatio-temporal reasoning logic, we first identify the relevant frames ("when") and then analyse the spatial relationships ("where") between key objects, and finally leverage these relationships to draw inferences ("what").
1 code implementation • 7 Mar 2025 • Dingkun Liu, Siyang Li, Ziwei Wang, Wei Li, Dongrui Wu
A non-invasive brain-computer interface (BCI) enables direct interaction between the user and external devices, typically via electroencephalogram (EEG) signals.
no code implementations • 7 Mar 2025 • Hongwei Yi, Tian Ye, Shitong Shao, Xuancheng Yang, Jiantong Zhao, Hanzhong Guo, Terrance Wang, Qingyu Yin, Zeke Xie, Lei Zhu, Wei Li, Michael Lingelbach, Daquan Zhou
We present MagicInfinite, a novel diffusion Transformer (DiT) framework that overcomes traditional portrait animation limitations, delivering high-fidelity results across diverse character types-realistic humans, full-body figures, and stylized anime characters.
no code implementations • 5 Mar 2025 • Tao Wang, Yinghui Wang, Yanxing Liang, Liangyi Huang, Jinlong Yang, Wei Li, Xiaojuan Ning
The issue concerning the significant decline in the stability of feature extraction for images subjected to large-angle affine transformations, where the angle exceeds 50 degrees, still awaits a satisfactory solution.
no code implementations • 5 Mar 2025 • Dian Xu, Shanshan Wang, Wei Li, Weibing Deng, Feng Gao, Jianmin Shen
The results demonstrate that the KAN can indeed predict the critical points of percolation models.
no code implementations • 5 Mar 2025 • Wei Li, Bing Hu, Rui Shao, Leyang Shen, Liqiang Nie
However, existing online video assistants often sacrifice assistant efficacy for real-time efficiency by processing low-frame-rate videos with coarse-grained visual features. To overcome the trade-off between efficacy and efficiency, we propose "Fast & Slow Video-Language Thinker" as an onLIne videO assistaNt, LION-FS, achieving real-time, proactive, temporally accurate, and contextually precise responses.
no code implementations • 2 Mar 2025 • Liao Shen, Tianqi Liu, Huiqiang Sun, Jiaqi Li, Zhiguo Cao, Wei Li, Chen Change Loy
We also introduce a synthetic dataset to assess refocusing capabilities and the model's ability to learn precise lens parameters.
1 code implementation • 28 Feb 2025 • Xueyun Tian, Wei Li, Bingbing Xu, Yige Yuan, Yuanzhuo Wang, HuaWei Shen
Experiments show that MIGE excels in both subject-driven generation and instruction-based editing while setting a state-of-the-art in the new task of instruction-based subject-driven editing.
1 code implementation • 24 Feb 2025 • Hao Gu, Wei Li, Lujun Li, Qiyuan Zhu, Mark Lee, Shengjie Sun, Wei Xue, Yike Guo
Based on observations of expert diversity, we decompose their weights into a shared base weight and unique delta weights.
no code implementations • 24 Feb 2025 • Miaomiao Cai, Guanjie Wang, Wei Li, Zhijun Tu, Hanting Chen, Shaohui Lin, Jie Hu
In the field of autoregressive (AR) image generation, models based on the 'next-token prediction' paradigm of LLMs have shown comparable performance to diffusion models by reducing inductive biases.
1 code implementation • 18 Feb 2025 • Ziwei Wang, Siyang Li, Xiaoqing Chen, Wei Li, Dongrui Wu
Two contrastive modules are further introduced: a cross-view contrastive module that enforces consistency of original and augmented views, and a cross-model contrastive module that aligns features extracted from both branches.
no code implementations • 17 Feb 2025 • Guangzhi Sun, Yudong Yang, Jimin Zhuang, Changli Tang, Yixuan Li, Wei Li, Zejun Ma, Chao Zhang
video-SALMONN-o1 achieves 3-8% accuracy improvements over the LLaVA-OneVision baseline across different video reasoning benchmarks.
1 code implementation • 15 Feb 2025 • Quanmin Wei, Penglin Dai, Wei Li, Bingyi Liu, Xiao Wu
However, training a robust collaborative perception model requires collecting sufficient training data that covers all possible collaboration scenarios, which is impractical due to intolerable deployment costs.
1 code implementation • 12 Feb 2025 • Wei Li, Wen Luo, Guangyue Peng, Houfeng Wang
In this paper, we propose a novel retrieval method based on natural language grammatical error explanations (GEE) to address this issue.
no code implementations • 10 Feb 2025 • Jian Hu, Zixu Cheng, Chenyang Si, Wei Li, Shaogang Gong
Multi-modal Large Language Models (MLLMs) struggle with long videos due to the need for excessive visual tokens.
2 code implementations • 24 Jan 2025 • JIA YU, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, Zhenjiang Jin, Jiantao Qiu, Shasha Wang, Zhongying Tu, Dahua Lin, Yu Wang, Yu Qiao, Yanfeng Wang, Conghui He
This paper introduces the open-source dataset WanJuanSiLu, designed to provide high-quality training corpora for low-resource languages, thereby advancing the research and development of multilingual models.
1 code implementation • 22 Jan 2025 • Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, DaCheng Tao
Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge.
no code implementations • 7 Jan 2025 • Dengzhao Fang, Jipeng Qiang, Yi Zhu, Yunhao Yuan, Wei Li, Yan Liu
Research on text simplification has primarily focused on lexical and sentence-level changes.
no code implementations • 6 Jan 2025 • Zhuoxuan Liang, Wei Li, Dalin Zhang, Yidan Chen, Zhihong Wang, Xiangping Zheng, Moustafa Youssef
Based on graph neural networks (GNNs) extracting the relationships between physical and virtual sensors, ISK can infer the measurements of virtual sensors from physical sensors.
no code implementations • 5 Jan 2025 • Jinkun Han, Wei Li, Zhipeng Cai, Yingshu Li
Micro-video recommendation is attracting global attention and becoming a popular daily service for people of all ages.
no code implementations • 5 Jan 2025 • Jinkun Han, Wei Li, Yingshu Li, Zhipeng Cai
Current recommendation systems recommend goods by considering users' historical behaviors, social relations, ratings, and other multi-modals.
no code implementations • 2 Jan 2025 • Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang
This work provides essential insights and tools for advancing forward pass methods to overcome forgetting.
1 code implementation • 1 Jan 2025 • Mingjia Li, Shuang Li, Tongrui Su, Longhui Yuan, Jian Liang, Wei Li
Capitalizing on the complementary advantages of generative and discriminative models has always been a compelling vision in machine learning, backed by a growing body of research.
no code implementations • 24 Dec 2024 • Pufan Zou, Shijia Zhao, Weijie Huang, Qiming Xia, Chenglu Wen, Wei Li, Cheng Wang
Our proposed AdaCo can effectively mitigate the performance limitations of label-free learning networks in 3D semantic segmentation tasks.
1 code implementation • 24 Dec 2024 • Zhibin Wang, Yanxin Cai, Jiayi Zhou, Yangming Zhang, Tianyu Li, Wei Li, Xun Liu, Guoqing Wang, Yang Yang
Efficient compression of remote sensing imagery is a critical solution to alleviate these burdens on satellites.
no code implementations • 18 Dec 2024 • Qingyu Zheng, Guijun Han, Wei Li, Lige Cao, Gongfu Zhou, Haowen Wu, Qi Shao, Ru Wang, Xiaobo Wu, Xudong Cui, Hong Li, Xuan Wang
To fuse multi-source data and reconstruct the nonlinear evolution missing from observations, geoscientists are developing future-oriented DA methods.
no code implementations • 18 Dec 2024 • Jiaping Ren, Jiahao Xiang, Hongfei Gao, Jinchuan Zhang, Yiming Ren, Yuexin Ma, Yi Wu, Ruigang Yang, Wei Li
Fuel efficiency is a crucial aspect of long-distance cargo transportation by oil-powered trucks that economize on costs and decrease carbon emissions.
1 code implementation • 12 Dec 2024 • Yunshuai Zhou, Junbo Qiao, Jincheng Liao, Wei Li, Simiao Li, Jiao Xie, Yunhang Shen, Jie Hu, Shaohui Lin
However, previous KD methods for image restoration overlook the state of the student during the distillation, adopting a fixed solution space that limits the capability of KD.
1 code implementation • 12 Dec 2024 • Pan Zhang, Xiaoyi Dong, Yuhang Cao, Yuhang Zang, Rui Qian, Xilin Wei, Lin Chen, Yifei Li, Junbo Niu, Shuangrui Ding, Qipeng Guo, Haodong Duan, Xin Chen, Han Lv, Zheng Nie, Min Zhang, Bin Wang, Wenwei Zhang, Xinyue Zhang, Jiaye Ge, Wei Li, Jingwen Li, Zhongying Tu, Conghui He, Xingcheng Zhang, Kai Chen, Yu Qiao, Dahua Lin, Jiaqi Wang
Recent advancements in multimodal large language models (MLLMs) have made significant strides in open-world understanding.
1 code implementation • 12 Dec 2024 • Ziqi Zhou, Weize Quan, Hailin Shi, Wei Li, Lili Wang, Dong-Ming Yan
Audio-driven talking head generation necessitates seamless integration of audio and visual data amidst the challenges posed by diverse input portraits and intricate correlations between audio and facial motions.
no code implementations • 2 Dec 2024 • Hamzah A. A. M. Qaid, Bo Zhang, Dan Li, See-Kiong Ng, Wei Li
We assess the fault diagnosis capabilities of four open-sourced LLMs based on the FD-LLM framework, and evaluate the models' adaptability and generalizability under various operational conditions and machine components, namely for traditional fault diagnosis, cross-operational conditions, and cross-machine component settings.
no code implementations • 27 Nov 2024 • Xu Chen, Keke He, Junwei Zhu, Yanhao Ge, Wei Li, Chengjie Wang
Face swapping aims to generate results that combine the identity from the source with attributes from the target.
no code implementations • 26 Nov 2024 • Juefei He, Yuexing Peng, Wei Li, Junchuan Yu, Daqing Ge, Wei Xiang
Relic landslide, formed over a long period, possess the potential for reactivation, making them a hazardous geological phenomenon.
no code implementations • 23 Nov 2024 • Ming Hu, Kun Yuan, Yaling Shen, Feilong Tang, Xiaohao Xu, Lin Zhou, Wei Li, Ying Chen, Zhongxing Xu, Zelin Peng, Siyuan Yan, Vinkle Srivastav, Diping Song, Tianbin Li, Danli Shi, Jin Ye, Nicolas Padoy, Nassir Navab, Junjun He, ZongYuan Ge
Surgical practice involves complex visual interpretation, procedural skills, and advanced medical knowledge, making surgical vision-language pretraining (VLP) particularly challenging due to this complexity and the limited availability of annotated data.
2 code implementations • 23 Nov 2024 • Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, Saizhuo Wang, Kun Zhang, Yuanzhuo Wang, Wen Gao, Lionel Ni, Jian Guo
Accurate and consistent evaluation is crucial for decision-making across numerous fields, yet it remains a challenging task due to inherent subjectivity, variability, and scale.
1 code implementation • 21 Nov 2024 • Tianbin Li, Yanzhou Su, Wei Li, Bin Fu, Zhe Chen, Ziyan Huang, Guoan Wang, Chenglong Ma, Ying Chen, Ming Hu, Yanjun Li, Pengcheng Chen, Xiaowei Hu, Zhongying Deng, Yuanfeng Ji, Jin Ye, Yu Qiao, Junjun He
Despite significant advancements in general AI, its effectiveness in the medical domain is limited by the lack of specialized medical knowledge.
1 code implementation • 20 Nov 2024 • Yige Yuan, Bingbing Xu, Hexiang Tan, Fei Sun, Teng Xiao, Wei Li, HuaWei Shen, Xueqi Cheng
Confidence calibration in LLMs, i. e., aligning their self-assessed confidence with the actual accuracy of their responses, enabling them to self-evaluate the correctness of their outputs.
no code implementations • 13 Nov 2024 • Jiaxin Wan, Lin Liu, Haoran Wang, Liangwei Li, Wei Li, Shuheng Kou, Runtian Li, Jiayi Tang, Juanxiu Liu, Jing Zhang, Xiaohui Du, Ruqian Hao
Total hip arthroplasty (THA) relies on accurate landmark detection from radiographic images, but unstructured data caused by irregular patient postures or occluded anatomical markers pose significant challenges for existing methods.
5 code implementations • 13 Nov 2024 • Yingqi Gao, Yifu Liu, Xiaoxia Li, Xiaorong Shi, Yin Zhu, Yiming Wang, Shiqi Li, Wei Li, Yuntao Hong, Zhiling Luo, Jinyang Gao, Liyu Mou, Yu Li
On the other hand, we implement the ICL approach with an example selection method based on named entity recognition to prevent overemphasis on entities.
Ranked #1 on
Text-To-SQL
on SQL-Eval
1 code implementation • 6 Nov 2024 • Yu Guan, Qinrong Cai, Wei Li, Qiuyun Fan, Dong Liang, Qiegen Liu
To tackle these challenges, we introduce subspace diffusion model with orthogonal decomposition, a method (referred to as Sub-DM) that restrict the diffusion process via projections onto subspace as the k-space data distribution evolves toward noise.
no code implementations • 31 Oct 2024 • Zheng Ruan, Ruixuan Liu, Shimin Chen, Mengying Zhou, Xinquan Yang, Wei Li, Chen Chen, Wei Shen
In the task of dense video captioning of Soccernet dataset, we propose to generate a video caption of each soccer action and locate the timestamp of the caption.
no code implementations • 30 Oct 2024 • Zhan Zhuang, Xiequn Wang, Yulong Zhang, Wei Li, Yu Zhang, Ying WEI
We refer to this method as Cooperative LoRA (CopRA).
no code implementations • 29 Oct 2024 • Yifu Sun, xulong Zhang, Monan Zhou, Wei Li
Music emotion recognition (MER) aims to identify the emotions conveyed in a given musical piece.
no code implementations • 23 Oct 2024 • Feiyan Feng, Tianyu Liu, Hong Wang, Jun Zhao, Wei Li, Yanshen Sun
Therefore, this paper proposes a novel PGDiffSeg (Prior-Guided Diffusion Denoising Model with Parameter-Shared Attention) that applies diffusion denoising methods to breast cancer medical image segmentation, accurately recovering the affected areas from Gaussian noise.
no code implementations • 14 Oct 2024 • Junbo Qiao, Jincheng Liao, Wei Li, Yulun Zhang, Yong Guo, Yi Wen, Zhangxizi Qiu, Jiao Xie, Jie Hu, Shaohui Lin
State Space Models (SSM), such as Mamba, have shown strong representation ability in modeling long-range dependency with linear complexity, achieving successful applications from high-level to low-level vision tasks.
1 code implementation • 11 Oct 2024 • Wei Li, Luyao Zhu, Yang song, Ruixi Lin, Rui Mao, Yang You
In contrast, we advanced three safety alignment strategies to strengthen (by 12. 05%) the safety guardrail of LLMs.
no code implementations • 10 Oct 2024 • Shanyan Guan, Yanhao Ge, Ying Tai, Jian Yang, Wei Li, Mingyu You
Recent advancements in text-to-image diffusion models have shown remarkable creative capabilities with textual prompts, but generating personalized instances based on specific subjects, known as subject-driven generation, remains challenging.
no code implementations • 10 Oct 2024 • Junyi Cao, Shanyan Guan, Yanhao Ge, Wei Li, Xiaokang Yang, Chao Ma
While humans effortlessly discern intrinsic dynamics and adapt to new scenarios, modern AI systems often struggle.
no code implementations • 9 Oct 2024 • Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang
To address potential catastrophic forgetting of non-captioning abilities due to mrDPO, we propose rebirth tuning, which finetunes the pre-DPO LLM by using the captions generated by the mrDPO-trained model as supervised labels.
1 code implementation • TGRS 2024 • Qizhi Xu, Jiuchen Chen, Xinyu Yan, Wei Li
To address this problem, we proposed the multiscale residual fusion network (MRF-Net) to remove thin cloud from infrared remote sensing imagery.
no code implementations • 3 Oct 2024 • Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, Chunyuan Li
The development of video large multimodal models (LMMs) has been hindered by the difficulty of curating large amounts of high-quality raw data from the web.
no code implementations • 27 Sep 2024 • Yinjian Wang, Wei Li, Yuanyuan Gui, Qian Du, James E. Fowler
Hyperspectral super-resolution is commonly accomplished by the fusing of a hyperspectral imaging of low spatial resolution with a multispectral image of high spatial resolution, and many tensor-based approaches to this task have been recently proposed.
2 code implementations • 27 Sep 2024 • Bin Wang, Chao Xu, Xiaomeng Zhao, Linke Ouyang, Fan Wu, Zhiyuan Zhao, Rui Xu, Kaiwen Liu, Yuan Qu, FuKai Shang, Bo Zhang, Liqun Wei, Zhihao Sui, Wei Li, Botian Shi, Yu Qiao, Dahua Lin, Conghui He
Document content analysis has been a crucial research area in computer vision.
1 code implementation • 17 Sep 2024 • Jiaxing Xu, Kai He, Mengcheng Lan, Qingtian Bian, Wei Li, Tieying Li, Yiping Ke, Miao Qiao
It generates a prior-knowledge-enhanced contrast graph to address the distribution shifts across sub-populations by a two-stream attention mechanism.
no code implementations • 22 Aug 2024 • Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du
This bias allows the tracker to directly use the visual features obtained from the false-color images generated by hyperspectral images without the need to extract spectral features.
1 code implementation • 20 Aug 2024 • Jiawei Han, Kaiqi Liu, Wei Li, Guangzhi Chen
Specifically, the point cloud is initially separated into independent point sets by category to provide initial conditions for the generation of feature subspaces.
Ranked #13 on
Semantic Segmentation
on S3DIS Area5
no code implementations • 18 Aug 2024 • Jiadong Feng, Wei Li, Zhao Wei, Yong Xu, Juhong Wang, Hui Li
However, developers may not follow the same naming conventions and the same variable may have different variable names in different implementations, bringing a challenge to deep learning based code search methods that rely on explicit variable correspondences to understand source code.
no code implementations • 14 Aug 2024 • CanYi Liu, Wei Li, Youchen, Zhang, Hui Li, Rongrong Ji
Built on top of coarse-grained adaption for capturing inter-item relations, DARec is further enhanced with (1) context masking that models intra-item relations to help LLM better understand token and item semantics in the context of SRS, (2) collaborative knowledge injection that helps LLM incorporate long-term collaborative knowledge, and (3) a dynamic adaption mechanism that uses Bayesian optimization to flexibly choose layer-wise adapter architectures in order to better incorporate different sequential information.
1 code implementation • 14 Aug 2024 • Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Yaotian Yang, Xinrui Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Wei Li, Shufei Zhang, Mao Su, Wanli Ouyang, Yuqiang Li, Dongzhan Zhou
We benchmark ChemVLM against a range of open-source and proprietary multimodal large language models on various tasks.
1 code implementation • 9 Aug 2024 • Zeyu Yang, Nan Song, Wei Li, Xiatian Zhu, Li Zhang, Philip H. S. Torr
To demonstrate the effectiveness of the proposed strategy, we design DeepInteraction++, a multi-modal interaction framework characterized by a multi-modal representational interaction encoder and a multi-modal predictive interaction decoder.
no code implementations • 8 Aug 2024 • Wan Li, Xinyun Zhong, Wei Li, Song Zhang, Moheng Rong, Yan Xi, Peng Yuan, Zechen Wang, Xiaolei Jiang, Rongxi Yi, Hui Tang, Yang Chen, Chaohui Tong, Zhan Wu, Feng Wang
The experimental results confirm the effectiveness of the respiratory subtraction method and the proposed quantitative evaluation metric in assessing lung tumor treatment.
1 code implementation • 6 Aug 2024 • Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao
Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields.
no code implementations • 6 Aug 2024 • Xiaofeng Mao, Zhengkai Jiang, Qilin Wang, Chencan Fu, Jiangning Zhang, Jiafu Wu, Yabiao Wang, Chengjie Wang, Wei Li, Mingmin Chi
In an attempt to bridge this research gap, we introduce a novel Masked Diffusion Transformer for co-speech gesture generation, referred to as MDT-A2G, which directly implements the denoising process on gesture sequences.
no code implementations • 31 Jul 2024 • Jichuan Zhang, Wei Li, Shuang Cheng, Ya-Li Li, Shengjin Wang
These new object queries are aggregated with those from previous phases to adapt both old and new knowledge well.
2 code implementations • 31 Jul 2024 • Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, Bobbie Chern, Charlotte Caucheteux, Chaya Nayak, Chloe Bi, Chris Marra, Chris McConnell, Christian Keller, Christophe Touret, Chunyang Wu, Corinne Wong, Cristian Canton Ferrer, Cyrus Nikolaidis, Damien Allonsius, Daniel Song, Danielle Pintz, Danny Livshits, Danny Wyatt, David Esiobu, Dhruv Choudhary, Dhruv Mahajan, Diego Garcia-Olano, Diego Perino, Dieuwke Hupkes, Egor Lakomkin, Ehab AlBadawy, Elina Lobanova, Emily Dinan, Eric Michael Smith, Filip Radenovic, Francisco Guzmán, Frank Zhang, Gabriel Synnaeve, Gabrielle Lee, Georgia Lewis Anderson, Govind Thattai, Graeme Nail, Gregoire Mialon, Guan Pang, Guillem Cucurell, Hailey Nguyen, Hannah Korevaar, Hu Xu, Hugo Touvron, Iliyan Zarov, Imanol Arrieta Ibarra, Isabel Kloumann, Ishan Misra, Ivan Evtimov, Jack Zhang, Jade Copet, Jaewon Lee, Jan Geffert, Jana Vranes, Jason Park, Jay Mahadeokar, Jeet Shah, Jelmer Van der Linde, Jennifer Billock, Jenny Hong, Jenya Lee, Jeremy Fu, Jianfeng Chi, Jianyu Huang, Jiawen Liu, Jie Wang, Jiecao Yu, Joanna Bitton, Joe Spisak, Jongsoo Park, Joseph Rocca, Joshua Johnstun, Joshua Saxe, Junteng Jia, Kalyan Vasuden Alwala, Karthik Prasad, Kartikeya Upasani, Kate Plawiak, Ke Li, Kenneth Heafield, Kevin Stone, Khalid El-Arini, Krithika Iyer, Kshitiz Malik, Kuenley Chiu, Kunal Bhalla, Kushal Lakhotia, Lauren Rantala-Yeary, Laurens van der Maaten, Lawrence Chen, Liang Tan, Liz Jenkins, Louis Martin, Lovish Madaan, Lubo Malo, Lukas Blecher, Lukas Landzaat, Luke de Oliveira, Madeline Muzzi, Mahesh Pasupuleti, Mannat Singh, Manohar Paluri, Marcin Kardas, Maria Tsimpoukelli, Mathew Oldham, Mathieu Rita, Maya Pavlova, Melanie Kambadur, Mike Lewis, Min Si, Mitesh Kumar Singh, Mona Hassan, Naman Goyal, Narjes Torabi, Nikolay Bashlykov, Nikolay Bogoychev, Niladri Chatterji, Ning Zhang, Olivier Duchenne, Onur Çelebi, Patrick Alrassy, Pengchuan Zhang, Pengwei Li, Petar Vasic, Peter Weng, Prajjwal Bhargava, Pratik Dubal, Praveen Krishnan, Punit Singh Koura, Puxin Xu, Qing He, Qingxiao Dong, Ragavan Srinivasan, Raj Ganapathy, Ramon Calderer, Ricardo Silveira Cabral, Robert Stojnic, Roberta Raileanu, Rohan Maheswari, Rohit Girdhar, Rohit Patel, Romain Sauvestre, Ronnie Polidoro, Roshan Sumbaly, Ross Taylor, Ruan Silva, Rui Hou, Rui Wang, Saghar Hosseini, Sahana Chennabasappa, Sanjay Singh, Sean Bell, Seohyun Sonia Kim, Sergey Edunov, Shaoliang Nie, Sharan Narang, Sharath Raparthy, Sheng Shen, Shengye Wan, Shruti Bhosale, Shun Zhang, Simon Vandenhende, Soumya Batra, Spencer Whitman, Sten Sootla, Stephane Collot, Suchin Gururangan, Sydney Borodinsky, Tamar Herman, Tara Fowler, Tarek Sheasha, Thomas Georgiou, Thomas Scialom, Tobias Speckbacher, Todor Mihaylov, Tong Xiao, Ujjwal Karn, Vedanuj Goswami, Vibhor Gupta, Vignesh Ramanathan, Viktor Kerkez, Vincent Gonguet, Virginie Do, Vish Vogeti, Vítor Albiero, Vladan Petrovic, Weiwei Chu, Wenhan Xiong, Wenyin Fu, Whitney Meers, Xavier Martinet, Xiaodong Wang, Xiaofang Wang, Xiaoqing Ellen Tan, Xide Xia, Xinfeng Xie, Xuchao Jia, Xuewei Wang, Yaelle Goldschlag, Yashesh Gaur, Yasmine Babaei, Yi Wen, Yiwen Song, Yuchen Zhang, Yue Li, Yuning Mao, Zacharie Delpierre Coudert, Zheng Yan, Zhengxing Chen, Zoe Papakipos, Aaditya Singh, Aayushi Srivastava, Abha Jain, Adam Kelsey, Adam Shajnfeld, Adithya Gangidi, Adolfo Victoria, Ahuva Goldstand, Ajay Menon, Ajay Sharma, Alex Boesenberg, Alexei Baevski, Allie Feinstein, Amanda Kallet, Amit Sangani, Amos Teo, Anam Yunus, Andrei Lupu, Andres Alvarado, Andrew Caples, Andrew Gu, Andrew Ho, Andrew Poulton, Andrew Ryan, Ankit Ramchandani, Annie Dong, Annie Franco, Anuj Goyal, Aparajita Saraf, Arkabandhu Chowdhury, Ashley Gabriel, Ashwin Bharambe, Assaf Eisenman, Azadeh Yazdan, Beau James, Ben Maurer, Benjamin Leonhardi, Bernie Huang, Beth Loyd, Beto De Paola, Bhargavi Paranjape, Bing Liu, Bo Wu, Boyu Ni, Braden Hancock, Bram Wasti, Brandon Spence, Brani Stojkovic, Brian Gamido, Britt Montalvo, Carl Parker, Carly Burton, Catalina Mejia, Ce Liu, Changhan Wang, Changkyu Kim, Chao Zhou, Chester Hu, Ching-Hsiang Chu, Chris Cai, Chris Tindal, Christoph Feichtenhofer, Cynthia Gao, Damon Civin, Dana Beaty, Daniel Kreymer, Daniel Li, David Adkins, David Xu, Davide Testuggine, Delia David, Devi Parikh, Diana Liskovich, Didem Foss, Dingkang Wang, Duc Le, Dustin Holland, Edward Dowling, Eissa Jamil, Elaine Montgomery, Eleonora Presani, Emily Hahn, Emily Wood, Eric-Tuan Le, Erik Brinkman, Esteban Arcaute, Evan Dunbar, Evan Smothers, Fei Sun, Felix Kreuk, Feng Tian, Filippos Kokkinos, Firat Ozgenel, Francesco Caggioni, Frank Kanayet, Frank Seide, Gabriela Medina Florez, Gabriella Schwarz, Gada Badeer, Georgia Swee, Gil Halpern, Grant Herman, Grigory Sizov, Guangyi, Zhang, Guna Lakshminarayanan, Hakan Inan, Hamid Shojanazeri, Han Zou, Hannah Wang, Hanwen Zha, Haroun Habeeb, Harrison Rudolph, Helen Suk, Henry Aspegren, Hunter Goldman, Hongyuan Zhan, Ibrahim Damlaj, Igor Molybog, Igor Tufanov, Ilias Leontiadis, Irina-Elena Veliche, Itai Gat, Jake Weissman, James Geboski, James Kohli, Janice Lam, Japhet Asher, Jean-Baptiste Gaya, Jeff Marcus, Jeff Tang, Jennifer Chan, Jenny Zhen, Jeremy Reizenstein, Jeremy Teboul, Jessica Zhong, Jian Jin, Jingyi Yang, Joe Cummings, Jon Carvill, Jon Shepard, Jonathan McPhie, Jonathan Torres, Josh Ginsburg, Junjie Wang, Kai Wu, Kam Hou U, Karan Saxena, Kartikay Khandelwal, Katayoun Zand, Kathy Matosich, Kaushik Veeraraghavan, Kelly Michelena, Keqian Li, Kiran Jagadeesh, Kun Huang, Kunal Chawla, Kyle Huang, Lailin Chen, Lakshya Garg, Lavender A, Leandro Silva, Lee Bell, Lei Zhang, Liangpeng Guo, Licheng Yu, Liron Moshkovich, Luca Wehrstedt, Madian Khabsa, Manav Avalani, Manish Bhatt, Martynas Mankus, Matan Hasson, Matthew Lennie, Matthias Reso, Maxim Groshev, Maxim Naumov, Maya Lathi, Meghan Keneally, Miao Liu, Michael L. Seltzer, Michal Valko, Michelle Restrepo, Mihir Patel, Mik Vyatskov, Mikayel Samvelyan, Mike Clark, Mike Macey, Mike Wang, Miquel Jubert Hermoso, Mo Metanat, Mohammad Rastegari, Munish Bansal, Nandhini Santhanam, Natascha Parks, Natasha White, Navyata Bawa, Nayan Singhal, Nick Egebo, Nicolas Usunier, Nikhil Mehta, Nikolay Pavlovich Laptev, Ning Dong, Norman Cheng, Oleg Chernoguz, Olivia Hart, Omkar Salpekar, Ozlem Kalinli, Parkin Kent, Parth Parekh, Paul Saab, Pavan Balaji, Pedro Rittner, Philip Bontrager, Pierre Roux, Piotr Dollar, Polina Zvyagina, Prashant Ratanchandani, Pritish Yuvraj, Qian Liang, Rachad Alao, Rachel Rodriguez, Rafi Ayub, Raghotham Murthy, Raghu Nayani, Rahul Mitra, Rangaprabhu Parthasarathy, Raymond Li, Rebekkah Hogan, Robin Battey, Rocky Wang, Russ Howes, Ruty Rinott, Sachin Mehta, Sachin Siby, Sai Jayesh Bondu, Samyak Datta, Sara Chugh, Sara Hunt, Sargun Dhillon, Sasha Sidorov, Satadru Pan, Saurabh Mahajan, Saurabh Verma, Seiji Yamamoto, Sharadh Ramaswamy, Shaun Lindsay, Sheng Feng, Shenghao Lin, Shengxin Cindy Zha, Shishir Patil, Shiva Shankar, Shuqiang Zhang, Sinong Wang, Sneha Agarwal, Soji Sajuyigbe, Soumith Chintala, Stephanie Max, Stephen Chen, Steve Kehoe, Steve Satterfield, Sudarshan Govindaprasad, Sumit Gupta, Summer Deng, Sungmin Cho, Sunny Virk, Suraj Subramanian, Sy Choudhury, Sydney Goldman, Tal Remez, Tamar Glaser, Tamara Best, Thilo Koehler, Thomas Robinson, Tianhe Li, Tianjun Zhang, Tim Matthews, Timothy Chou, Tzook Shaked, Varun Vontimitta, Victoria Ajayi, Victoria Montanez, Vijai Mohan, Vinay Satish Kumar, Vishal Mangla, Vlad Ionescu, Vlad Poenaru, Vlad Tiberiu Mihailescu, Vladimir Ivanov, Wei Li, Wenchen Wang, WenWen Jiang, Wes Bouaziz, Will Constable, Xiaocheng Tang, Xiaojian Wu, Xiaolan Wang, Xilun Wu, Xinbo Gao, Yaniv Kleinman, Yanjun Chen, Ye Hu, Ye Jia, Ye Qi, Yenda Li, Yilin Zhang, Ying Zhang, Yossi Adi, Youngjin Nam, Yu, Wang, Yu Zhao, Yuchen Hao, Yundi Qian, Yunlu Li, Yuzi He, Zach Rait, Zachary DeVito, Zef Rosnbrick, Zhaoduo Wen, Zhenyu Yang, Zhiwei Zhao, Zhiyu Ma
This paper presents a new set of foundation models, called Llama 3.
Ranked #3 on
Question Answering
on PeerQA
no code implementations • 14 Jul 2024 • Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang
At the sequence level, we propose a sequence correction and re-generation (SCRG) strategy.
no code implementations • 11 Jul 2024 • Zujin Guo, Wei Li, Chen Change Loy
Motion modeling is critical in flow-based Video Frame Interpolation (VFI).
no code implementations • 11 Jul 2024 • Runyuan Ma, Wei Li, FuKai Shang
Construction techniques and methods for public fine-tuning datasets of Large Language Models (LLMs), including data generation and data augmentation among others, are detailed.
2 code implementations • 10 Jul 2024 • Feng Li, Renrui Zhang, Hao Zhang, Yuanhan Zhang, Bo Li, Wei Li, Zejun Ma, Chunyuan Li
To this end, we introduce LLaVA-NeXT-Interleave, which simultaneously tackles Multi-image, Multi-frame (video), Multi-view (3D), and Multi-patch (single-image) scenarios in LMMs.
Ranked #6 on
Zero-Shot Video Question Answer
on VNBench
no code implementations • 7 Jul 2024 • Qiqi He, Xuchen Song, Weituo Hao, Ju-Chiang Wang, Wei-Tsung Lu, Wei Li
For the case where the artist information is available, we extend the audio-based model to take multimodal inputs and develop a framework, called MultiModal Contrastive (MMC) learning, to enhance the training.
1 code implementation • 3 Jul 2024 • Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
This long-context capability allows IXC-2. 5 to excel in tasks requiring extensive input and output contexts.
Ranked #4 on
Video Question Answering
on TVBench
1 code implementation • 2 Jul 2024 • Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
However, their annotating requirements are impractical for real-world images or videos, posing challenges toward real-world applications on current avatar creation methods.
1 code implementation • 28 Jun 2024 • Wei Li, Jingyang Zhang, Pheng-Ann Heng, Lixu Gu
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources.
no code implementations • 25 Jun 2024 • Van Tung Pham, Yist Lin, Tao Han, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang
Finally, we explore training and inference methods to mitigate high insertion errors.
1 code implementation • 24 Jun 2024 • Yirui Chen, Xudong Huang, Quan Zhang, Wei Li, Mingjian Zhu, Qiangyu Yan, Simiao Li, Hanting Chen, Hailin Hu, Jie Yang, Wei Liu, Jie Hu
The extraordinary ability of generative models emerges as a new trend in image editing and generating realistic images, posing a serious threat to the trustworthiness of multimedia data and driving the research of image manipulation detection and location (IMDL).
1 code implementation • 22 Jun 2024 • Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang
To obtain fine-grained temporal information required by speech understanding, while keeping efficient for other video elements, this paper proposes a novel multi-resolution causal Q-Former (MRC Q-Former) structure to connect pre-trained audio-visual encoders and the backbone large language model.
1 code implementation • 12 Jun 2024 • Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, Min Dou, Changyao Tian, Xizhou Zhu, Lewei Lu, Yushi Chen, Junjun He, Zhongying Tu, Tong Lu, Yali Wang, LiMin Wang, Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai
In this paper, we introduce OmniCorpus, a 10 billion-scale image-text interleaved dataset.
no code implementations • 12 Jun 2024 • Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang
This paper addresses challenges in integrating new languages into a pre-trained multilingual automatic speech recognition (mASR) system, particularly in scenarios where training data for existing languages is limited or unavailable.
no code implementations • 11 Jun 2024 • Wen Luo, Tianshu Shen, Wei Li, Guangyue Peng, Richeng Xuan, Houfeng Wang, Xi Yang
Leveraging HalluDial, we conduct a comprehensive meta-evaluation of LLMs' hallucination evaluation capabilities in information-seeking dialogues and introduce a specialized judge language model, HalluJudge.
2 code implementations • 9 Jun 2024 • Size Wu, Sheng Jin, Wenwei Zhang, Lumin Xu, Wentao Liu, Wei Li, Chen Change Loy
To address this issue, we present F-LMM -- grounding frozen off-the-shelf LMMs in human-AI conversations -- a straightforward yet effective design based on the fact that word-pixel correspondences conducive to visual grounding inherently exist in the attention weights of well-trained LMMs.
1 code implementation • 9 Jun 2024 • Wei Li, Pin-Yu Chen, Sijia Liu, Ren Wang
PSBD is motivated by an intriguing Prediction Shift (PS) phenomenon, where poisoned models' predictions on clean data often shift away from true labels towards certain other labels with dropout applied during inference, while backdoor samples exhibit less PS.
no code implementations • 6 Jun 2024 • Wei Li, William Bishop, Alice Li, Chris Rawles, Folawiyo Campbell-Ajala, Divya Tyamagundlu, Oriana Riva
Moreover, AndroidControl is the most diverse computer control dataset to date, including 14, 548 unique tasks over 833 Android apps, thus allowing us to conduct in-depth analysis of the model performance in and out of the domain of the training data.
no code implementations • 4 Jun 2024 • Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin
The dispersion of data sources and diversity of data formats often lead to inefficiencies in data retrieval and processing, significantly impeding the progress of AI research and applications.
no code implementations • 29 May 2024 • Lianlei Shan, Wenzhang Zhou, Wei Li, Xingyu Ding
During incrementally learning novel classes, the data distribution of old classes will be destroyed, leading to catastrophic forgetting.
no code implementations • 28 May 2024 • Lianlei Shan, Wenzhang Zhou, Wei Li, Xingyu Ding
The joint task of the two is dubbed Learning with Selective Forgetting (LSF).
1 code implementation • 28 May 2024 • Wei Li, Houfeng Wang
Grammatical error correction (GEC) is a task dedicated to rectifying texts with minimal edits, which can be decoupled into two components: detection and correction.
no code implementations • 28 May 2024 • Xingyu Ding, Lianlei Shan, Guiqin Zhao, Meiqi Wu, Wenzhang Zhou, Wei Li
Then we further optimize the attention method which plays an important role in segmentation but has huge computation complexity.
no code implementations • 25 May 2024 • Junlin Song, Yuzhuo Chen, Yuan YAO, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li
To address this gap, we develop a computer-aided diagnostic model focusing on white matter regions in brain MRI by employing radiomics and machine learning methods.
1 code implementation • 23 May 2024 • Christopher Rawles, Sarah Clinckemaillie, Yifan Chang, Jonathan Waltz, Gabrielle Lau, Marybeth Fair, Alice Li, William Bishop, Wei Li, Folawiyo Campbell-Ajala, Daniel Toyama, Robert Berry, Divya Tyamagundlu, Timothy Lillicrap, Oriana Riva
AndroidWorld and the experiments in this paper are available at github. com/google-research/android_world.
no code implementations • 22 May 2024 • Liyu Chen, Huaao Tang, Yi Wen, Hanting Chen, Wei Li, Junchao Liu, Jie Hu
To address these issues, we propose the Collaboration of Teachers Framework (CTF), which consists of multiple pairs of teacher and student models for training.
1 code implementation • 22 May 2024 • Wei Li, Hehe Fan, Yongkang Wong, Mohan Kankanhalli, Yi Yang
Recent advancements in image understanding have benefited from the extensive use of web image-text pairs.
no code implementations • 21 May 2024 • Yue Han, Junwei Zhu, Keke He, Xu Chen, Yanhao Ge, Wei Li, Xiangtai Li, Jiangning Zhang, Chengjie Wang, Yong liu
We observe that both face reenactment/swapping tasks essentially involve combinations of target structure, ID and attribute.
1 code implementation • 20 May 2024 • Tianqi Liu, Guangcong Wang, Shoukang Hu, Liao Shen, Xinyi Ye, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu
We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes.
no code implementations • 8 May 2024 • Sijing Xie, Chengxin Zhao, Nan Sun, Wei Li, Hefei Ling
To improve the robustness of the decoder against stronger noise, this paper proposes to introduce a denoise module between the noise layer and the decoder.
no code implementations • 7 May 2024 • Dingrui Wang, Zheyuan Lai, Yuda Li, Yi Wu, Yuexin Ma, Johannes Betz, Ruigang Yang, Wei Li
Furthermore, a new metric named clamped temporal error (CTE) is proposed to give a more comprehensive evaluation of prediction performance, especially in time-sensitive emergency events of subseconds.
no code implementations • 29 Apr 2024 • Wei Li, Ren Ma, Jiang Wu, Chenya Gu, Jiahui Peng, Jinyang Len, Songyang Zhang, Hang Yan, Dahua Lin, Conghui He
In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture.
no code implementations • 25 Apr 2024 • Zhiwei Dong, Xi Zhu, Xiya Cao, Ran Ding, Wei Li, Caifa Zhou, Yongliang Wang, Qiangbo Liu
B\'{e}zierFormer formulate queries as B\'{e}zier control points and incorporate a novel B\'{e}zier curve attention mechanism.
1 code implementation • 25 Apr 2024 • Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Min Dou, Lewei Lu, Xizhou Zhu, Tong Lu, Dahua Lin, Yu Qiao, Jifeng Dai, Wenhai Wang
Compared to both open-source and proprietary models, InternVL 1. 5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks.
Ranked #6 on
Multiple-choice
on Neptune-Full
3 code implementations • 16 Apr 2024 • Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi
In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.
2 code implementations • 9 Apr 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.
Ranked #47 on
Visual Question Answering
on MM-Vet
1 code implementation • 9 Apr 2024 • Junbo Qiao, Wei Li, Haizhen Xie, Hanting Chen, Yunshuai Zhou, Zhijun Tu, Jie Hu, Shaohui Lin
Extensive experiments on multiple image processing tasks (e. g., image super-resolution (SR), JPEG artifact reduction, and image denoising) demonstrate the superiority of LIPT on both latency and PSNR.
no code implementations • 3 Apr 2024 • Simiao Li, Yun Zhang, Wei Li, Hanting Chen, Wenjia Wang, BingYi Jing, Shaohui Lin, Jie Hu
Knowledge distillation (KD) is a promising yet challenging model compression technique that transfers rich learning representations from a well-performing but cumbersome teacher model to a compact student model.
1 code implementation • 1 Apr 2024 • Ang Bian, Wei Li, Hangjie Yuan, Chengrong Yu, Mang Wang, Zixiang Zhao, Aojun Lu, Pengliang Ji, Tao Feng
A general framework of C-Flat applied to all CL categories and a thorough comparison with loss minima optimizer and flat minima based CL approaches is presented in this paper, showing that our method can boost CL performance in almost all cases.
no code implementations • 31 Mar 2024 • Zhijun Tu, Kunpeng Du, Hanting Chen, Hailing Wang, Wei Li, Jie Hu, Yunhe Wang
Recent advances have demonstrated the powerful capability of transformer architecture in image restoration.
3 code implementations • 26 Mar 2024 • Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, JIA YU, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).
Ranked #5 on
Long-Context Understanding
on Ada-LEval (BestAnswer)
no code implementations • CVPR 2024 • Quan Zhang, Xiaoyu Liu, Wei Li, Hanting Chen, Junchao Liu, Jie Hu, Zhiwei Xiong, Chun Yuan, Yunhe Wang
SPD leverages a self-distillation manner to distill the fused semantic priors to boost the performance of original IR models.
2 code implementations • 25 Mar 2024 • Daoguang Zan, Ailun Yu, Wei Liu, Dong Chen, Bo Shen, Wei Li, Yafen Yao, Yongshun Gong, Xiaolin Chen, Bei guan, Zhiguang Yang, Yongji Wang, Qianxiang Wang, Lizhen Cui
For feedback-based evaluation, we develop a VSCode plugin for CodeS and engage 30 participants in conducting empirical studies.
1 code implementation • CVPR 2024 • Junbo Yin, Jianbing Shen, Runnan Chen, Wei Li, Ruigang Yang, Pascal Frossard, Wenguan Wang
HSF applies Point-to-Grid and Grid-to-Region transformers to capture the multimodal scene context at different granularities.
1 code implementation • 21 Mar 2024 • Jiaxing Sun, Weiquan Huang, Jiang Wu, Chenya Gu, Wei Li, Songyang Zhang, Hang Yan, Conghui He
We introduce CHARM, the first benchmark for comprehensively and in-depth evaluating the commonsense reasoning ability of large language models (LLMs) in Chinese, which covers both globally known and Chinese-specific commonsense.
2 code implementations • 20 Mar 2024 • Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and the 2 object detection datasets under the zero-shot recognition setting.
no code implementations • 15 Mar 2024 • Hakim Sidahmed, Samrat Phatale, Alex Hutcheson, Zhuonan Lin, Zhang Chen, Zac Yu, Jarvis Jin, Simral Chaudhary, Roman Komarytsia, Christiane Ahlheim, Yonghao Zhu, Bowen Li, Saravanan Ganesh, Bill Byrne, Jessica Hoffmann, Hassan Mansoor, Wei Li, Abhinav Rastogi, Lucas Dixon
In this work, we empirically evaluate the setup of Parameter Efficient Reinforcement Learning from Human Feedback (PE-RLHF) that leverages LoRA fine-tuning for Reward Modeling, and Reinforcement Learning.
1 code implementation • 12 Mar 2024 • Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng
After instruction tuning, KnowCoder further exhibits strong generalization ability on unseen schemas and achieves up to $\textbf{12. 5%}$ and $\textbf{21. 9%}$, compared to sota baselines, under the zero-shot setting and the low resource setting, respectively.
Ranked #1 on
UIE
on ACE 2005-EAE
no code implementations • 9 Mar 2024 • Hanzheng Wang, Wei Li, Xiang-Gen Xia, Qian Du, Jing Tian
Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking.
no code implementations • 29 Feb 2024 • Jiantao Qiu, Haijun Lv, Zhenjiang Jin, Rui Wang, Wenchang Ning, JIA YU, Chaobin Zhang, Zhenxiang Li, Pei Chu, Yuan Qu, Jin Shi, Lindong Lu, Runyu Peng, Zhiyuan Zeng, Huanze Tang, Zhikai Lei, Jiawei Hong, Keyu Chen, Zhaoye Fei, Ruiliang Xu, Wei Li, Zhongying Tu, Lin Dahua, Yu Qiao, Hang Yan, Conghui He
To evaluate the quality and utility of the dataset, we trained 1B-parameter and 3B-parameter models using WanJuan-CC and another dataset, RefinedWeb.
1 code implementation • 23 Feb 2024 • Xuhui Jiang, Yinghan Shen, Zhichao Shi, Chengjin Xu, Wei Li, Zixuan Li, Jian Guo, HuaWei Shen, Yuanzhuo Wang
To address the constraints of limited input KG data, ChatEA introduces a KG-code translation module that translates KG structures into a format understandable by LLMs, thereby allowing LLMs to utilize their extensive background knowledge to improve EA accuracy.
no code implementations • 22 Feb 2024 • Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov
In Visual SLAM, achieving accurate feature matching consumes a significant amount of time, severely impacting the real-time performance of the system.
no code implementations • 21 Feb 2024 • Shaojie Zhang, Yinghui Wang, Jiaxing Ma, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov
Feature matching is a fundamental and crucial process in visual SLAM, and precision has always been a challenging issue in feature matching.
no code implementations • 18 Feb 2024 • Shaojie Zhang, Yinghui Wang, Bin Nan, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov
To address the issue of increased triangulation uncertainty caused by selecting views with small camera baselines in Structure from Motion (SFM) view selection, this paper proposes a robust error-resistant view selection method.
no code implementations • 15 Feb 2024 • Shaojie Zhang, Yinghui Wang, Bin Nan, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov
To address the issue of feature descriptors being ineffective in representing grayscale feature information when images undergo high affine transformations, leading to a rapid decline in feature matching accuracy, this paper proposes a region feature descriptor based on simulating affine transformations using classification.
no code implementations • 11 Feb 2024 • Shaojie Zhang, Yinghui Wang, Peixuan Liu, Wei Li, Jinlong Yang, Tao Yan, Yukai Wang, Liangyi Huang, Mingfeng Wang, Ibragim R. Atadjanov
The images captured by Wireless Capsule Endoscopy (WCE) always exhibit specular reflections, and removing highlights while preserving the color and texture in the region remains a challenge.
1 code implementation • 29 Jan 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension.
Ranked #18 on
Visual Question Answering
on MM-Vet v2
1 code implementation • 25 Jan 2024 • Wei Li, Daoguang Zan, Bei guan, Ailun Yu, Xiaolin Chen, Yongji Wang
Code large language models (Code LLMs) have demonstrated remarkable performance in code generation.
no code implementations • 24 Jan 2024 • Wei Li, Xue Xu, Jiachen Liu, Xinyan Xiao
This paper presents UNIMO-G, a simple multimodal conditional diffusion framework that operates on multimodal prompts with interleaved textual and visual inputs, which demonstrates a unified ability for both text-driven and subject-driven image generation.
1 code implementation • CVPR 2024 • Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy
In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.
no code implementations • 17 Jan 2024 • Xiaolin Chen, Daoguang Zan, Wei Li, Bei guan, Yongji Wang
Specifically, the malicious participant initially employs semi-supervised learning to train a surrogate target model.
no code implementations • 2 Jan 2024 • Shuang Li, Ke Li, Wei Li, Ming Yang
Constrained multi-objective optimization problems (CMOPs) pervade real-world applications in science, engineering, and design.
1 code implementation • 31 Dec 2023 • Yue Han, Jiangning Zhang, Junwei Zhu, Xiangtai Li, Yanhao Ge, Wei Li, Chengjie Wang, Yong liu, Xiaoming Liu, Ying Tai
This work presents FaceX framework, a novel facial generalist model capable of handling diverse facial tasks simultaneously.
1 code implementation • 25 Dec 2023 • Li Xiang, Junbo Yin, Wei Li, Cheng-Zhong Xu, Ruigang Yang, Jianbing Shen
Specifically, DMA builds a domain-mixing 3D instance bank for the teacher and student models during training, resulting in aligned data representation.
no code implementations • 20 Dec 2023 • Xisheng Li, Wei Li, Pinhao Song, Mingjun Zhang, Jie zhou
The inherent characteristics and light fluctuations of water bodies give rise to the huge difference between different layers and regions in underwater environments.
no code implementations • 15 Dec 2023 • Wei Li, Fu-Lin Hsu, Will Bishop, Folawiyo Campbell-Ajala, Max Lin, Oriana Riva
Automation systems that can autonomously drive application user interfaces to complete user tasks are of great benefit, especially when users are situationally or permanently impaired.
no code implementations • 15 Dec 2023 • Feng Lu, Wei Li, Yifei Sun, Cheng Song, Yufei Ren, Albert Y. Zomaya
Artificial intelligence (AI) has immense potential in time series prediction, but most explainable tools have limited capabilities in providing a systematic understanding of important features over time.
no code implementations • 13 Dec 2023 • Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe Wang
Post-training quantization (PTQ) has played a key role in compressing large language models (LLMs) with ultra-low costs.
no code implementations • 13 Dec 2023 • Guodong Xiong, Xinyan Ma, Wei Li, Jiaqi Cao, Jian Zhong, Yicong Su
With the development of artificial intelligence and unmanned equipment, human-machine hybrid formations will be the main focus in future combat formations.
1 code implementation • 12 Dec 2023 • Mingjian Zhu, Hanting Chen, Mouxiao Huang, Wei Li, Hailin Hu, Jie Hu, Yunhe Wang
The misuse of AI imagery can have harmful societal effects, prompting the creation of detectors to combat issues like the spread of fake news.
no code implementations • 1 Dec 2023 • Chaoyang Zhang, Yanan Li, Shen Chen, Siwei Fan, Wei Li
We first use a single-layer neural network to merge individual node features in the graph, and then adjust the aggregation weights of neighboring entities by incorporating influence factors.
1 code implementation • IEEE Transactions on Geoscience and Remote Sensing 2023 • Yingjie Tang, Shou Feng, Chunhui Zhao, Yuanze Fan, Qian Shi, Wei Li, and Ran Tao
This module utilizes wavelet transform for frequency decoupling, separating features into different components based on their frequency magnitudes.
no code implementations • 20 Nov 2023 • Dian Xu, Shanshan Wang, Weibing Deng, Feng Gao, Wei Li, Jianmin Shen
This paper suggests that, by inputting the largest cluster rather than the original configuration into the learning model, unsupervised learning can indeed predict the critical point of the percolation model.
no code implementations • 9 Nov 2023 • Qinghua Lin, Zuoyong Li, Kun Zeng, Haoyi Fan, Wei Li, Xiaoguang Zhou
Considering the limited quantity of labeled video data, we propose a semi-supervised fire detection model called FireMatch, which is based on consistency regularization and adversarial distribution alignment.
no code implementations • 6 Nov 2023 • Chenzhong Gao, Wei Li
This paper aims at providing an effective multi-modal images invariant feature extraction and matching algorithm for the application of multi-source data analysis.
1 code implementation • 31 Oct 2023 • Yihang Li, Shuichiro Shimizu, Chenhui Chu, Sadao Kurohashi, Wei Li
In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous, and videos are guaranteed helpful for disambiguation.
1 code implementation • 20 Oct 2023 • Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of at least three types of sounds: speech, audio events, and music.
1 code implementation • 15 Oct 2023 • Dichucheng Li, Yinghao Ma, Weixing Wei, Qiuqiang Kong, Yulun Wu, Mingjin Che, Fan Xia, Emmanouil Benetos, Wei Li
Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks.
no code implementations • 9 Oct 2023 • Xin Liu, Wei Li, Dazhi Zhan, Yu Pan, Xin Ma, Yu Ding, Zhisong Pan
Federated learning (FL) is a widely employed distributed paradigm for collaboratively training machine learning models from multiple clients without sharing local data.
2 code implementations • 9 Oct 2023 • Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
Audio-visual large language models (LLM) have drawn significant attention, yet the fine-grained combination of both input streams is rather under-explored, which is challenging but necessary for LLMs to understand general video inputs.
1 code implementation • 8 Oct 2023 • Wei Li, Ruifeng Bian, Wenyi Zhao, Weijin Xu, Huihua Yang
To address these concerns, we propose a novel Cross-head mutual mean-teaching Network (CMMT-Net) incorporated strong-weak data augmentation, thereby benefitting both self-training and consistency learning.
no code implementations • 7 Oct 2023 • Monan Zhou, Shangda Wu, Shaohua Ji, Zijin Li, Wei Li
Unlike previous studies that focused on the effect of piano performance techniques on sound quality, this study evaluates the inherent sound quality of different pianos.
no code implementations • 29 Sep 2023 • Runnan Chen, Xinge Zhu, Nenglun Chen, Dawei Wang, Wei Li, Yuexin Ma, Ruigang Yang, Tongliang Liu, Wenping Wang
In this paper, we propose Model2Scene, a novel paradigm that learns free 3D scene representation from Computer-Aided Design (CAD) models and languages.
2 code implementations • 26 Sep 2023 • Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.
Ranked #9 on
Visual Question Answering (VQA)
on InfiMM-Eval
no code implementations • 26 Sep 2023 • Hailing Wang, Wei Li, Yuanyuan Xi, Jie Hu, Hanting Chen, Longyu Li, Yunhe Wang
By matching similar patches between frames, objects with large motion ranges in dynamic scenes can be aligned, which can effectively alleviate the generation of artifacts.
no code implementations • 25 Sep 2023 • Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
Q-Former-based LLMs can generalise well to out-of-domain datasets, where 12% relative WER reductions over the Whisper baseline ASR model were achieved on the Eval2000 test set without using any in-domain training data from Switchboard.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+4
1 code implementation • 25 Sep 2023 • Yun Zhang, Wei Li, Simiao Li, Hanting Chen, Zhijun Tu, Wenjia Wang, BingYi Jing, Shaohui Lin, Jie Hu
Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to compact student models.
Ranked #32 on
Image Super-Resolution
on Urban100 - 4x upscaling
1 code implementation • 23 Sep 2023 • Monan Zhou, Xiaobing Li, Feng Yu, Wei Li
The EMelodyGen system focuses on emotional melody generation in ABC notation controlled by the musical feature template.
1 code implementation • 22 Sep 2023 • Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy
We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation.
no code implementations • 21 Sep 2023 • Yidong Liu, FuKai Shang, Fang Wang, Rui Xu, Jun Wang, Wei Li, Yao Li, Conghui He
With the advancement of deep learning technologies, general-purpose large models such as GPT-4 have demonstrated exceptional capabilities across various domains.
2 code implementations • 12 Sep 2023 • Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be'ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng
More information on the tasks, challenges, and leaderboards are available on https://www. soccer-net. org.
2 code implementations • 24 Aug 2023 • Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He
A practical solution to this problem would be to utilize the available multimodal large language models (MLLMs) to generate instruction data for vision-language tasks.
1 code implementation • 21 Aug 2023 • Conghui He, Zhenjiang Jin, Chao Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Dahua Lin
The rise in popularity of ChatGPT and GPT-4 has significantly accelerated the development of large models, leading to the creation of numerous impressive large language models(LLMs) and multimodal large language models (MLLMs).
no code implementations • 16 Aug 2023 • Ji Zhang, Xiao Wu, Zhi-Qi Cheng, Qi He, Wei Li
Anomaly segmentation plays a pivotal role in identifying atypical objects in images, crucial for hazard detection in autonomous driving systems.
no code implementations • 11 Aug 2023 • Liang Chen, Yifei Yin, Hao Shi, Jingfei He, Wei Li
To address these challenges, we propose a Self-supervised Despeckling Strategy for SAR images (SDS-SAR) that relies solely on speckled intensity data for training.
no code implementations • 7 Aug 2023 • Cheng Wang, Wei Li
Image deraining is a challenging task that involves restoring degraded images affected by rain streaks.
no code implementations • ICCV 2023 • Siao Liu, Zhaoyu Chen, Yang Liu, Yuzheng Wang, Dingkang Yang, Zhile Zhao, Ziqing Zhou, Xie Yi, Wei Li, Wenqiang Zhang, Zhongxue Gan
In particular, CG2A develops a Gradient Agreement Solver to adaptively balance the varying gradient magnitudes, and introduces a Soft Gradient Surgery strategy to alleviate the gradient conflicts.
1 code implementation • 7 Jul 2023 • Zhengdao Li, Wei Li, Kai Hwang
The AGC-net is constructed by the Adaptive Graph Convolution (AGC) based on a novel context attention mechanism, which consists of a set of graph wavelets with various learnable scales.
no code implementations • 7 Jun 2023 • Xi Zhu, Xiya Cao, Zhiwei Dong, Caifa Zhou, Qiangbo Liu, Wei Li, Yongliang Wang
We also provide a new scene-level BEV map evaluation setting along with the corresponding baseline for a more comprehensive comparison.
1 code implementation • CVPR 2023 • Yuchao Wang, Jingjing Fei, Haochen Wang, Wei Li, Tianpeng Bao, Liwei Wu, Rui Zhao, Yujun Shen
In this way, we manage to close the gap between the feature areas of different categories, resulting in a more balanced representation.
1 code implementation • 29 May 2023 • Yuhang Zang, Wei Li, Jun Han, Kaiyang Zhou, Chen Change Loy
Moreover, we present ContextDET, a unified multimodal model that is capable of end-to-end differentiable modeling of visual-language contexts, so as to locate, identify, and associate visual objects with language inputs for human-AI interaction.
no code implementations • 28 May 2023 • Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li
Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
no code implementations • 25 May 2023 • Wenhao Cheng, Junbo Yin, Wei Li, Ruigang Yang, Jianbing Shen
In this work, we propose a new multi-modal visual grounding task, termed LiDAR Grounding.
1 code implementation • 23 May 2023 • Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang
To this end, we propose T2S-DA, which we interpret as a form of pulling Target to Source for Domain Adaptation, encouraging the model in learning similar cross-domain features.
no code implementations • 19 May 2023 • Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma
Specifically, we first pre-train the model using a reconstruction loss function, by masking phones and their durations jointly on a large amount of unlabeled speech and text prompts.
1 code implementation • 17 May 2023 • Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu
Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.
Ranked #1 on
Question Answering
on StrategyQA
1 code implementation • 2023 2023 • Chunhui Zhao, Yingjie Tang, Shou Feng, Yuanze Fan, Wei Li, Ran Tao, and Lifu Zhang
With the development of remote sensing technology, high-resolution (HR) remote sensing optical images have gradually become the main source of change detection data.
no code implementations • 8 May 2023 • Wei Li, Xiangxu Meng, Chuhao Chen, Jianing Chen
In this paper, we carefully examine the opposing properties of CI and CD, and raise a practical question that has not been effectively answered, e. g.,"How to effectively mix the CI and CD properties of time series to achieve better predictive performance?"
1 code implementation • NeurIPS 2023 • Wenhao Wang, Yifan Sun, Wei Li, Yi Yang
This paper explores a hierarchical prompting mechanism for the hierarchical image classification (HIC) task.
1 code implementation • IEEE Transactions on Geoscience and Remote Sensing 2023 • Mingsong Li, Wei Li, Yikun Liu, Yuwen Huang, and Gongping Yang.
Subsequently, based on distance covariance descriptor, a dual channel distance covariance representation (DC-DCR) module is proposed for modeling unified spectral-spatial feature representations and exploring spectral-spatial relationships, especially linear and nonlinear interdependence in spectral domain.
Ranked #1 on
Hyperspectral Image Classification
on Indian Pines
(OA@5%perclass metric)
Hyperspectral image analysis
Hyperspectral Image Classification
+1
1 code implementation • CVPR 2023 • Zeren Chen, Gengshi Huang, Wei Li, Jianing Teng, Kun Wang, Jing Shao, Chen Change Loy, Lu Sheng
In this work, we present Siamese DETR, a Siamese self-supervised pretraining approach for the Transformer architecture in DETR.