Search Results for author: Yu Wang

Found 524 papers, 222 papers with code

HIT: Nested Named Entity Recognition via Head-Tail Pair and Token Interaction

no code implementations EMNLP 2020 Yu Wang, Yun Li, Hanghang Tong, Ziye Zhu

Specifically, we design (1) Head-Tail Detector based on the multi-head self-attention mechanism and bi-affine classifier to detect boundary tokens, and (2) Token Interaction Tagger based on traditional sequence labeling approaches to characterize the internal token connection within the boundary.

named-entity-recognition Named Entity Recognition +2
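As a rough illustration of the head-tail pairing idea above, the sketch below scores every (head, tail) token pair with a biaffine product over contextual encodings; the layer sizes and names are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class BiaffineBoundaryScorer(nn.Module):
    """Minimal sketch: score each (head, tail) token pair with a
    biaffine form h_i^T U t_j over contextual encodings."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.head_mlp = nn.Linear(hidden, hidden)  # project candidate head tokens
        self.tail_mlp = nn.Linear(hidden, hidden)  # project candidate tail tokens
        self.U = nn.Parameter(torch.randn(hidden, hidden))  # biaffine interaction

    def forward(self, enc: torch.Tensor) -> torch.Tensor:
        # enc: (batch, seq_len, hidden), e.g. self-attention outputs
        h = torch.relu(self.head_mlp(enc))
        t = torch.relu(self.tail_mlp(enc))
        # score[b, i, j] = h_i^T U t_j for every head/tail candidate pair
        return torch.einsum("bih,hk,bjk->bij", h, self.U, t)
```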

Neural Malware Control with Deep Reinforcement Learning

no code implementations ICLR 2019 Yu Wang, Jack W. Stokes, Mady Marinescu

Antimalware products are a key component in detecting malware attacks, and their engines typically execute unknown programs in a sandbox prior to running them on the native operating system.

Deep Reinforcement Learning reinforcement-learning +1

CNNSAT: Fast, Accurate Boolean Satisfiability using Convolutional Neural Networks

no code implementations ICLR 2019 Yu Wang, Fengjuan Gao, Amin Alipour, Linzhang Wang, Xuandong Li, Zhendong Su

Boolean satisfiability (SAT) is one of the most well-known NP-complete problems and has been extensively studied.

Pseudo-Masked Language Models for Unified Language Model Pre-Training

1 code implementation ICML 2020 Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Decoder Language Modeling +3

Double Negative Recognition Based on Rules: Taking "不v1不v2" as an Example

no code implementations CCL 2020 Yu Wang

"不v1不v2" is one of the typical double negative structures in Chinese. It covers a variety of double negative patterns, including "不 + auxiliary verb + 不 + v2" (不得不去), "不 + 是 + 不v2" (不是不好), and the predicate-object structure "不v1...不v2" (不认为他不去), making the situation complex. Taking "不v1不v2" as an example and drawing on concepts such as metalinguistic negation, verb factivity, and negation focus, this paper conducts a comprehensive investigation of "不v1不v2" and formulates a recognition strategy for this double negative structure. Based on the strategy, an automatic recognition program for double negation is designed, supplemented along the way with lexicons such as an auxiliary-verb list and a non-factive-verb list. Finally, the program is run on a corpus of 28,033 sentences, achieving a precision of 97.87% and a recall of about 93.10%.

Research on Automatic Recognition of the Double Negation Structure

no code implementations CCL 2022 Yu Wang, Yulin Yuan

The double negation structure is a special construction that expresses an affirmative meaning through two negations, and its presence has an important impact on semantic judgment and sentiment classification in natural language processing. Taking "¬¬P ⇒ P" as the criterion, this paper exhaustively examines all "negative word + negative word" structures in modern Chinese and classifies double negation structures into 3 major categories and 25 subcategories, covering 132 common double negation structures or constructions. Drawing on theories of verb factivity, negation focus, and semantic versus pragmatic negation, the paper summarizes three conditions under which double negation structures hold, and accordingly designs and implements a rule-based automatic recognition program for double negation structures. In experiments the program achieves a precision of 98.85%, a recall of 98.90%, and an F1 score of 98.85%. In addition, from 96,281 corpus sentences the program extracted 8,640 sentences containing double negation structures with a precision of about 99%, providing possible corpus support for subsequent statistics-based deep learning models.

Negation
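To make the rule-based setting concrete, here is a minimal, hypothetical sketch that flags candidate double negations by locating two nearby negators; the real system relies on much richer lexicons (auxiliary verbs, non-factive verbs) and 25 fine-grained patterns.

```python
# Hypothetical negator list and window size, for illustration only.
NEGATORS = {"不", "没", "没有", "非", "未"}

def has_double_negation(tokens, window=3):
    """Flag sequences where two negators occur close together,
    e.g. 不得不 ('cannot but', i.e. 'must')."""
    positions = [i for i, tok in enumerate(tokens) if tok in NEGATORS]
    return any(b - a <= window for a, b in zip(positions, positions[1:]))

print(has_double_negation(["他", "不", "得", "不", "去"]))  # True
```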

Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots

no code implementations18 Apr 2025 Zongyuan Chen, Yan Xia, Jiayuan Liu, Jijia Liu, Wenhao Tang, Jiayu Chen, Feng Gao, Longfei Ma, Hongen Liao, Yu Wang, Chao Yu, Boyu Zhang, Fei Xing

In this study, we present a soft robotic system designed for surgical applications and propose a hysteresis-aware whole-body neural network model that accurately captures and predicts the soft robot's whole-body motion, including its hysteretic behavior.

Meta-Learning and Knowledge Discovery based Physics-Informed Neural Network for Remaining Useful Life Prediction

1 code implementation18 Apr 2025 Yu Wang, Shujie Liu, Shuai Lv, Gengshuo Liu

Predicting the remaining useful life (RUL) of rotating machinery is critical for industrial safety and maintenance, but existing methods struggle with scarce target-domain data and unclear degradation dynamics.

Meta-Learning

Chain-of-Thought Prompting for Out-of-Distribution Samples: A Latent-Variable Study

1 code implementation17 Apr 2025 Yu Wang, Fu-Chieh Chang, Pei-Yuan Wu

Chain-of-Thought (CoT) prompting has emerged as a powerful technique to improve in-context learning (ICL) in large language models (LLMs) by breaking complex reasoning into intermediate steps.

In-Context Learning

Sleep-time Compute: Beyond Inference Scaling at Test-time

1 code implementation17 Apr 2025 Kevin Lin, Charlie Snell, Yu Wang, Charles Packer, Sarah Wooders, Ion Stoica, Joseph E. Gonzalez

Scaling test-time compute has emerged as a key ingredient for enabling large language models (LLMs) to solve difficult problems, but comes with high latency and inference cost.

NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

1 code implementation17 Apr 2025 Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, YuFei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, YuTing Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, Qirui Yang, Fangpu Zhang, Yunlong Lin, Sixiang Chen, Guoxi Huang, Ruirui Lin, Yan Zhang, Jingyu Yang, Huanjing Yue, Jiyuan Chen, Qiaosi Yi, Hongjun Wang, Chenxi Xie, Shuai Li, Yuhui Wu, Kaiyi Ma, Jiakui Hu, Juncheng Li, Liwen Pan, Guangwei Gao, Wenjie Li, Zhenyu Jin, Heng Guo, Zhanyu Ma, YuBo Wang, Jinghua Wang, Wangzhi Xing, Anjusree Karnavar, Diqi Chen, Mohammad Aminul Islam, Hao Yang, Ruikun Zhang, Liyuan Pan, Qianhao Luo, XinCao, Han Zhou, Yan Min, Wei Dong, Jun Chen, Taoyi Wu, Weijia Dou, Yu Wang, Shengjie Zhao, Yongcheng Huang, Xingyu Han, Anyan Huang, Hongtao Wu, Hong Wang, Yefeng Zheng, Abhijeet Kumar, Aman Kumar, Marcos V. Conde, Paula Garrido, Daniel Feijoo, Juan C. Benito, Guanglu Dong, Xin Lin, Siyuan Liu, Tianheng Zheng, Jiayu Zhong, Shouyi Wang, Xiangtai Li, Lanqing Guo, Lu Qi, Chao Ren, Shuaibo Wang, Shilong Zhang, Wanyu Zhou, Yunze Wu, Qinzhong Tan, Jieyuan Pei, Zhuoxuan Li, Jiayu Wang, Haoyu Bian, Haoran Sun, Subhajit Paul, Ni Tang, Junhao Huang, Zihan Cheng, Hongyun Zhu, Yuehan Wu, Kaixin Deng, Hang Ouyang, Tianxin Xiao, Fan Yang, Zhizun Luo, Zeyu Xiao, Zhuoyuan Li, Nguyen Pham Hoang Le, An Dinh Thien, Son T. Luu, Kiet Van Nguyen, Ronghua Xu, Xianmin Tian, Weijian Zhou, Jiacheng Zhang, Yuqian Chen, Yihang Duan, Yujie Wu, Suresh Raikwar, Arsh Garg, Kritika, Jianhua Zheng, Xiaoshan Ma, Ruolin Zhao, Yongyu Yang, Yongsheng Liang, Guiming Huang, Qiang Li, Hongbin Zhang, Xiangyu Zheng, A. N. Rajagopalan

This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images.

Raindrop Removal Rain Removal +1

VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate

no code implementations16 Apr 2025 Zhihang Yuan, Rui Xie, Yuzhang Shang, Hanling Zhang, Siyuan Wang, Shengen Yan, Guohao Dai, Yu Wang

In this paper, we exploit the inherent temporal non-uniformity of real-world videos and observe that videos exhibit dynamic information density, with high-motion segments demanding greater detail preservation than static scenes.

Video Generation

DeepSelective: Feature Gating and Representation Matching for Interpretable Clinical Prediction

no code implementations15 Apr 2025 Ruochi Zhang, Qian Yang, Xiaoyang Wang, Haoran Wu, Qiong Zhou, Yu Wang, Kewei Li, Yueying Wang, Yusi Fan, Jiale Zhang, Lan Huang, Chang Liu, Fengfeng Zhou

The rapid accumulation of Electronic Health Records (EHRs) has transformed healthcare by providing valuable data that enhance clinical predictions and diagnoses.

Data Compression Decision Making +3

QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models

1 code implementation15 Apr 2025 Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Yu Wang

In typical multimodal tasks, such as Visual Question Answering (VQA), adversarial attacks targeting a specific image and question can lead large vision-language models (LVLMs) to provide incorrect answers.

Question Answering Visual Question Answering

Towards Distribution Matching between Collaborative and Language Spaces for Generative Recommendation

1 code implementation10 Apr 2025 Yi Zhang, Yiwen Zhang, Yu Wang, Tong Chen, Hongzhi Yin

This work addresses this issue by proposing a model-agnostic generative recommendation framework called DMRec, which introduces a probabilistic meta-network to bridge the outputs of LMs with user interactions, thereby enabling an equivalent probabilistic modeling process.

VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation

no code implementations5 Apr 2025 Yuhao Wang, Heyang Liu, Ziyang Cheng, Ronghua Wu, Qunshan Gu, Yanfeng Wang, Yu Wang

Speech large language models (LLMs) have emerged as a prominent research focus in speech processing.

DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image

1 code implementation2 Apr 2025 Jijun Xiang, Xuan Zhu, Xianqi Wang, Yu Wang, Hong Zhang, Fei Guo, Xin Yang

To address these challenges, we propose a novel completion-based method, named DEPTHOR, featuring advances in both the training strategy and model architecture.

Depth Completion Monocular Depth Estimation +1

On Data Synthesis and Post-training for Visual Abstract Reasoning

no code implementations2 Apr 2025 Ke Zhu, Yu Wang, JiangJiang Liu, Qunyi Xie, Shanshan Liu, Gang Zhang

This paper is a pioneering work attempting to address abstract visual reasoning (AVR) problems for large vision-language models (VLMs).

Visual Reasoning

RARE: Retrieval-Augmented Reasoning Modeling

1 code implementation30 Mar 2025 Zhengren Wang, Jiayang Yu, Dongsheng Ma, Zhe Chen, Yu Wang, Zhiyu Li, Feiyu Xiong, Yanfeng Wang, Weinan E, Linpeng Tang, Wentao Zhang

Domain-specific intelligence demands specialized knowledge and sophisticated reasoning for problem-solving, posing significant challenges for large language models (LLMs) that struggle with knowledge hallucination and inadequate reasoning capabilities under constrained parameter budgets.

Hallucination Memorization +1

LaViC: Adapting Large Vision-Language Models to Visually-Aware Conversational Recommendation

1 code implementation30 Mar 2025 Hyunsik Jeon, Satoshi Koide, Yu Wang, Zhankui He, Julian McAuley

To address this challenge, we propose LaViC (Large Vision-Language Conversational Recommendation Framework), a novel approach that integrates compact image representations into dialogue-based recommendation systems.

Conversational Recommendation Recommendation Systems

DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers

no code implementations28 Mar 2025 Hanling Zhang, Rundong Su, Zhihang Yuan, Pengtao Chen, Mingzhu Shen, Yibo Fan, Shengen Yan, Guohao Dai, Yu Wang

Text-to-image generation models, especially Multimodal Diffusion Transformers (MMDiT), have shown remarkable progress in generating high-quality images.

2k Text-to-Image Generation

Advancements in Natural Language Processing: Exploring Transformer-Based Architectures for Text Understanding

no code implementations26 Mar 2025 Tianhao Wu, Yu Wang, Ngoc Quach

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text.

Model Selection

AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models

no code implementations24 Mar 2025 Le Qiu, Zelai Xu, Qixin Tan, Wenhao Tang, Chao Yu, Yu Wang

Assessing the safety of autonomous driving policy is of great importance, and reinforcement learning (RL) has emerged as a powerful method for discovering critical vulnerabilities in driving policies.

Autonomous Driving Reinforcement Learning (RL)

FROG: Fair Removal on Graphs

no code implementations23 Mar 2025 Ziheng Chen, Jiali Cheng, Gabriele Tolomei, Sijia Liu, Hadi Amiri, Yu Wang, Kaushiki Nag, Lu Lin

As compliance with privacy regulations becomes increasingly critical, the growing demand for data privacy has highlighted the significance of machine unlearning in many real world applications, such as social network and recommender systems, many of which can be represented as graph-structured data.

Fairness Machine Unlearning +1

Reducing Class-wise Confusion for Incremental Learning with Disentangled Manifolds

1 code implementation22 Mar 2025 Huitong Chen, Yu Wang, Yan Fan, Guosong Jiang, QinGhua Hu

Thus, the representation stability and capability of class distributions are enhanced, alleviating the potential class-wise confusion problem.

class-incremental learning Class Incremental Learning +1

BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors

1 code implementation22 Mar 2025 Yu Wang, Junxian Mu, Hongzhi Huang, Qilong Wang, Pengfei Zhu, QinGhua Hu

We first empirically and theoretically explore the role of foregrounds and backgrounds in open set recognition and disclose that: 1) backgrounds that correlate with foregrounds can mislead the model and cause failures when it encounters 'partially' known images; 2) backgrounds unrelated to foregrounds can serve as auxiliary known outliers and provide regularization via global average pooling.

Open Set Learning
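The first finding suggests a simple mixing regularizer that breaks fore-background co-occurrence. The sketch below pastes each foreground onto a background taken from another image; the mask format and mixing rule are assumptions, not necessarily the exact BackMix recipe.

```python
import torch

def mix_backgrounds(images: torch.Tensor, fg_masks: torch.Tensor) -> torch.Tensor:
    """images: (N, C, H, W); fg_masks: (N, 1, H, W) with 1 on the foreground.
    Each foreground keeps its pixels; background pixels come from a
    randomly chosen donor image, removing fore-background priors."""
    donors = images[torch.randperm(len(images))]
    return fg_masks * images + (1 - fg_masks) * donors
```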

Probabilistic Prompt Distribution Learning for Animal Pose Estimation

1 code implementation20 Mar 2025 Jiyong Rao, Brian Nlong Zhao, Yu Wang

To this end, we propose a novel probabilistic prompting approach to fully explore textual descriptions, which could alleviate the diversity issues caused by the long-tail property and increase the adaptability of prompts to unseen category instances.

Animal Pose Estimation Diversity +1

Empowering GraphRAG with Knowledge Filtering and Integration

no code implementations18 Mar 2025 Kai Guo, Harry Shomer, Shenglai Zeng, Haoyu Han, Yu Wang, Jiliang Tang

GraphRAG-Integration employs a logits-based selection strategy to balance external knowledge from GraphRAG with the LLM's intrinsic reasoning, reducing over-reliance on retrievals.

MEET: A Million-Scale Dataset for Fine-Grained Geospatial Scene Classification with Zoom-Free Remote Sensing Imagery

no code implementations14 Mar 2025 Yansheng Li, Yuning Wu, Gong Cheng, Chao Tao, Bo Dang, Yu Wang, Jiahao Zhang, Chuge Zhang, Yiting Liu, Xu Tang, Jiayi Ma, Yongjun Zhang

To address this limitation, we introduce the Million-scale finE-grained geospatial scEne classification dataseT (MEET), which contains over 1.03 million zoom-free remote sensing scene samples, manually annotated into 80 fine-grained categories.

Classification Scene Classification

RankPO: Preference Optimization for Job-Talent Matching

1 code implementation13 Mar 2025 Yafei Zhang, Murray Wang, Yu Wang, Xiaohui Wang

By fine-tuning with RankPO, we achieve a balanced model that retains relatively good performance in the original tasks while significantly improving the alignment with AI preferences.

Contrastive Learning

Partial differential equation system for binarization of degraded document images

no code implementations11 Mar 2025 Youjin Liu, Yu Wang

In this system, the first equation is designed to estimate the background component, incorporating both diffusion and fidelity terms.

Binarization

Gradient-guided Attention Map Editing: Towards Efficient Contextual Hallucination Mitigation

no code implementations11 Mar 2025 Yu Wang, Jiaxin Zhang, Xiang Gao, Wendi Cui, Peng Li, Kamalika Das

In tasks like summarization and open-book question answering (QA), Large Language Models (LLMs) often encounter "contextual hallucination", where they produce irrelevant or incorrect responses despite having access to accurate source information.

Computational Efficiency Hallucination +1

High-Precision Transformer-Based Visual Servoing for Humanoid Robots in Aligning Tiny Objects

no code implementations6 Mar 2025 Jialong Xue, Wei Gao, Yu Wang, Chao Ji, Dongdong Zhao, Shi Yan, Shiwu Zhang

High-precision tiny object alignment remains a common and critical challenge for humanoid robots in real-world scenarios.

Object

DSVD: Dynamic Self-Verify Decoding for Faithful Generation in Large Language Models

no code implementations5 Mar 2025 YiQiu Guo, Yuchen Yang, Zhe Chen, Pingjie Wang, Yusheng Liao, Ya Zhang, Yanfeng Wang, Yu Wang

The reliability of large language models remains a critical challenge, particularly due to their susceptibility to hallucinations and factual inaccuracies during text generation.

Hallucination Text Generation

SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion

1 code implementation3 Mar 2025 Xuan Zhu, Jijun Xiang, Xianqi Wang, Longliang Liu, Yu Wang, Hong Zhang, Fei Guo, Xin Yang

However, due to the manufacturing constraints of compact devices and the inherent physical principles of imaging, dToF depth maps are sparse and noisy.

Depth Completion

AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms

1 code implementation26 Feb 2025 Yuwei Yan, Yu Shang, Qingbin Zeng, Yu Li, Keyu Zhao, Zhiheng Zheng, Xuefei Ning, Tianji Wu, Shengen Yan, Yu Wang, Fengli Xu, Yong Li

The AgentSociety Challenge is the first competition in the Web Conference that aims to explore the potential of Large Language Model (LLM) agents in modeling user behavior and enhancing recommender systems on web platforms.

Language Modeling Language Modelling +2

AVD2: Accident Video Diffusion for Accident Video Description

no code implementations20 Feb 2025 Cheng Li, Keyuan Zhou, Tong Liu, Yu Wang, Mingqiao Zhuang, Huan-ang Gao, Bu Jin, Hao Zhao

Traffic accidents present complex challenges for autonomous driving, often featuring unpredictable scenarios that hinder accurate system interpretation and responses. Nonetheless, prevailing methodologies fall short in elucidating the causes of accidents and proposing preventive measures due to the paucity of training data specific to accident scenarios. In this work, we introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding by generating accident videos aligned with detailed natural language descriptions and reasoning, resulting in the contributed EMM-AU (Enhanced Multi-Modal Accident Video Understanding) dataset.

Autonomous Driving Scene Understanding +2

Megrez-Omni Technical Report

no code implementations19 Feb 2025 Boxun Li, Yadong Li, Zhiyuan Li, Congyi Liu, Weilin Liu, Guowei Niu, Zheyue Tan, Haiyang Xu, Zhuyu Yao, Tao Yuan, Dong Zhou, Yueqing Zhuang, Shengen Yan, Guohao Dai, Yu Wang

In this work, we present the Megrez models, comprising a language model (Megrez-3B-Instruct) and a multimodal model (Megrez-3B-Omni).

Language Modeling Language Modelling

Policy-to-Language: Train LLMs to Explain Decisions with Flow-Matching Generated Rewards

no code implementations18 Feb 2025 Xinyi Yang, Liang Zeng, Heng Dong, Chao Yu, Xiaoran Wu, Huazhong Yang, Yu Wang, Milind Tambe, Tonghan Wang

As humans increasingly share environments with diverse agents powered by RL, LLMs, and beyond, the ability to explain their policies in natural language will be vital for reliable coexistence.

DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation

1 code implementation17 Feb 2025 Zhihang Yuan, Siyuan Wang, Rui Xie, Hanling Zhang, Tongcheng Fang, Yuzhang Shang, Shengen Yan, Guohao Dai, Yu Wang

In this paper, we propose the Dynamic Latent Frame Rate VAE (DLFR-VAE), a training-free paradigm that can make use of adaptive temporal compression in latent space.

Video Generation

Mitigating Visual Knowledge Forgetting in MLLM Instruction-tuning via Modality-decoupled Gradient Descent

no code implementations17 Feb 2025 Junda Wu, Yuxin Xiong, Xintong Li, Yu Xia, Ruoyu Wang, Yu Wang, Tong Yu, Sungchul Kim, Ryan A. Rossi, Lina Yao, Jingbo Shang, Julian McAuley

By explicitly disentangling the optimization of visual understanding from task-specific alignment, MDGD preserves pre-trained visual knowledge while enabling efficient task adaptation.

Continual Learning parameter-efficient fine-tuning

Learning Strategic Language Agents in the Werewolf Game with Iterative Latent Space Policy Optimization

no code implementations7 Feb 2025 Zelai Xu, Wanjun Gu, Chao Yu, Yi Wu, Yu Wang

We propose Latent Space Policy Optimization (LSPO), an iterative framework that addresses these challenges by first mapping free-form text to a discrete latent space, where methods like CFR and RL can learn strategic policy more effectively.

counterfactual Decision Making +3

Intent Representation Learning with Large Language Model for Recommendation

1 code implementation5 Feb 2025 Yu Wang, Lei Sang, Yi Zhang, Yiwen Zhang

To tackle these challenges, we propose a model-agnostic framework, Intent Representation Learning with Large Language Model (IRLLRec), which leverages large language models (LLMs) to construct multimodal intents and enhance recommendations.

Language Modeling Language Modelling +3

VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play

no code implementations4 Feb 2025 Zelai Xu, Chao Yu, Ruize Zhang, Huining Yuan, Xiangmin Yi, Shilong Ji, Chuqi Wang, Wenhao Tang, Yu Wang

To bridge this gap, we present VolleyBots, a new MARL testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics.

Multi-agent Reinforcement Learning

M+: Extending MemoryLLM with Scalable Long-Term Memory

1 code implementation1 Feb 2025 Yu Wang, Dmitry Krotov, Yuanzhe Hu, Yifan Gao, Wangchunshu Zhou, Julian McAuley, Dan Gutfreund, Rogerio Feris, Zexue He

Equipping large language models (LLMs) with latent-space memory has attracted increasing attention as they can extend the context window of existing language models.

16k Long-Context Understanding +1

Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network

no code implementations1 Feb 2025 Jijia Liu, Feng Gao, Qingmin Liao, Chao Yu, Yu Wang

First, ARSQ decomposes the continuous action space into discrete spaces in a coarse-to-fine hierarchy, enhancing sample efficiency for fine-grained continuous control tasks.

continuous-control Continuous Control +3
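The coarse-to-fine decomposition can be pictured as repeated interval refinement over one action dimension, as in the sketch below; the bin count, depth, and selection rule are illustrative assumptions rather than ARSQ's actual design.

```python
import numpy as np

def coarse_to_fine_action(levels=3, bins=8, low=-1.0, high=1.0):
    """Each level splits the current interval into `bins` cells and picks
    one, so `levels` discrete picks address bins**levels fine actions."""
    for _ in range(levels):
        edges = np.linspace(low, high, bins + 1)
        k = bins // 2  # stand-in for the learned Q-based choice
        low, high = edges[k], edges[k + 1]
    return (low + high) / 2  # center of the final fine-grained cell

print(coarse_to_fine_action())  # 8**3 = 512 effective actions from 3 picks
```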

Reflections on "Can AI Understand Our Universe?"

no code implementations29 Jan 2025 Yu Wang

This article briefly discusses the philosophical and technical aspects of AI.

WanJuanSiLu: A High-Quality Open-Source Webtext Dataset for Low-Resource Languages

1 code implementation24 Jan 2025 JIA YU, Fei Yuan, Rui Min, Jing Yu, Pei Chu, Jiayang Li, Wei Li, Ruijie Zhang, Zhenxiang Li, Zhifei Ren, Dong Zheng, Wenjian Zhang, Yan Teng, Lingyu Meng, Zhenjiang Jin, Jiantao Qiu, Shasha Wang, Zhongying Tu, Dahua Lin, Yu Wang, Yu Qiao, Yanfeng Wang, Conghui He

This paper introduces the open-source dataset WanJuanSiLu, designed to provide high-quality training corpora for low-resource languages, thereby advancing the research and development of multilingual models.

Diversity

MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking

1 code implementation21 Jan 2025 Shuyang Jiang, Yusheng Liao, Zhe Chen, Ya Zhang, Yanfeng Wang, Yu Wang

In this work, we present MedS$^3$, a deployable, small-scale medical language model designed for long-chain reasoning in clinical tasks using a self-evolution paradigm.

Multiple-choice

PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit

no code implementations10 Jan 2025 Yuechen Yang, Yu Wang, Tianyuan Yao, Ruining Deng, Mengmeng Yin, Shilin Zhao, Haichun Yang, Yuankai Huo

These results highlight PySpatial's potential to handle large-scale WSI analysis with enhanced efficiency and accuracy, paving the way for broader applications in digital pathology.

Holistic Semantic Representation for Navigational Trajectory Generation

1 code implementation6 Jan 2025 Ji Cao, Tongya Zheng, Qinghong Guo, Yu Wang, Junshu Dai, Shunyu Liu, Jie Yang, Jie Song, Mingli Song

Trajectory generation has garnered significant attention from researchers in the field of spatio-temporal analysis, as it can generate substantial synthesized human mobility trajectories that enhance user privacy and alleviate data scarcity.

Few-Shot Learning Zero-Shot Learning

Towards Omni-RAG: Comprehensive Retrieval-Augmented Generation for Large Language Models in Medical Applications

no code implementations5 Jan 2025 Zhe Chen, Yusheng Liao, Shuyang Jiang, Pingjie Wang, YiQiu Guo, Yanfeng Wang, Yu Wang

Large language models (LLMs) hold promise for addressing healthcare challenges but often generate hallucinations due to limited integration of medical knowledge.

RAG Retrieval

Personalized Graph-Based Retrieval for Large Language Models

1 code implementation4 Jan 2025 Steven Au, Cameron J. Dimacali, Ojasmitha Pedirappagari, Namyong Park, Franck Dernoncourt, Yu Wang, Nikos Kanakaris, Hanieh Deilamsalehy, Ryan A. Rossi, Nesreen K. Ahmed

As large language models (LLMs) evolve, their ability to deliver personalized and context-aware responses offers transformative potential for improving user experiences.

Knowledge Graphs Retrieval +1

Retrieval-Augmented Generation with Graphs (GraphRAG)

no code implementations31 Dec 2024 Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Mahantesh Halappanavar, Ryan A. Rossi, Subhabrata Mukherjee, Xianfeng Tang, Qi He, Zhigang Hua, Bo Long, Tong Zhao, Neil Shah, Amin Javari, Yinglong Xia, Jiliang Tang

However, unlike conventional RAG, where the retriever, generator, and external data sources can be uniformly designed in the neural-embedding space, the uniqueness of graph-structured data, such as diverse-formatted and domain-specific relational knowledge, poses unique and significant challenges when designing GraphRAG for different domains.

RAG Retrieval +1

FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models

1 code implementation30 Dec 2024 Tianyu Fu, Tengxuan Liu, Qinghao Han, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Leveraging the unique properties of similarity over importance, we introduce FrameFusion, a novel approach that combines similarity-based merging with importance-based pruning for better token reduction in LVLMs.

Question Answering Token Reduction +1
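A minimal sketch of combining the two criteria, assuming per-token embeddings and an externally supplied importance score (FrameFusion's actual merging and pruning rules may differ):

```python
import torch

def reduce_tokens(tokens, importance, sim_thresh=0.9, keep_ratio=0.5):
    # 1) similarity-based merging: average highly similar adjacent tokens
    sims = torch.cosine_similarity(tokens[:-1], tokens[1:], dim=-1)
    keep = torch.ones(len(tokens), dtype=torch.bool)
    for i in range(len(sims)):
        if keep[i] and sims[i] > sim_thresh:
            tokens[i + 1] = (tokens[i] + tokens[i + 1]) / 2  # merge into successor
            keep[i] = False
    tokens, importance = tokens[keep], importance[keep]
    # 2) importance-based pruning of the surviving tokens
    k = max(1, int(keep_ratio * len(tokens)))
    idx = importance.topk(k).indices.sort().values  # keep temporal order
    return tokens[idx]
```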

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

1 code implementation27 Dec 2024 Shiyao Li, Yingchun Hu, Xuefei Ning, Xihui Liu, Ke Hong, Xiaotao Jia, Xiuhong Li, Yaqi Yan, Pei Ran, Guohao Dai, Shengen Yan, Huazhong Yang, Yu Wang

Therefore, treating tokens from different modalities equally, as in existing PTQ methods, may over-emphasize the insensitive modalities, leading to significant accuracy loss.

Quantization

Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching

1 code implementation22 Dec 2024 Enshu Liu, Xuefei Ning, Yu Wang, Zinan Lin

As the first work to demonstrate the possibility of one-step generation for image AR models, DD challenges the prevailing notion that AR models are inherently slow, and opens up new opportunities for efficient AR generation.

Text-to-Image Generation

Enhancing Contrastive Learning Inspired by the Philosophy of "The Blind Men and the Elephant"

1 code implementation21 Dec 2024 Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Yu Wang

Contrastive learning is a prevalent technique in self-supervised vision representation learning, typically generating positive pairs by applying two data augmentations to the same image.

Contrastive Learning Data Augmentation +2

E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling

no code implementations18 Dec 2024 Zhihang Yuan, Yuzhang Shang, Hanling Zhang, Tongcheng Fang, Rui Xie, Bingxin Xu, Yan Yan, Shengen Yan, Guohao Dai, Yu Wang

Our approach not only enhances computational efficiency but also aligns naturally with image generation principles by operating in continuous token space and following a hierarchical generation process from coarse to fine details.

Computational Efficiency Denoising +1

What Matters in Learning A Zero-Shot Sim-to-Real RL Policy for Quadrotor Control? A Comprehensive Study

no code implementations16 Dec 2024 Jiayu Chen, Chao Yu, Yuqing Xie, Feng Gao, Yinuo Chen, Shu'ang Yu, Wenhao Tang, Shilong Ji, Mo Mu, Yi Wu, Huazhong Yang, Yu Wang

The policy derived by SimpleFlight consistently excels across both smooth polynomial trajectories and challenging infeasible zigzag trajectories on small thrust-to-weight quadrotors.

Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal

no code implementations15 Dec 2024 Yuhao Wang, Zhiyuan Zhu, Heyang Liu, Yusheng Liao, Hongcheng Liu, Yanfeng Wang, Yu Wang

Multimodal large language models (MLLMs) excel at multimodal perception and understanding, yet their tendency to generate hallucinated or inaccurate responses undermines their trustworthiness.

GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration

no code implementations5 Dec 2024 Kaiyi Huang, Yukun Huang, Xuefei Ning, Zinan Lin, Yu Wang, Xihui Liu

To avoid hallucination of a single MLLM agent, we decompose this stage into four sequentially executed MLLM-based agents: verification agent, suggestion agent, correction agent, and output structuring agent.

Attribute Hallucination +2

ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics

1 code implementation4 Dec 2024 Junchao Zhu, Ruining Deng, Tianyuan Yao, Juming Xiong, Chongyu Qu, Junlin Guo, Siqi Lu, Mengmeng Yin, Yu Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

Expanding ST to three-dimensional (3D) volumes is challenging due to the prohibitive costs; a 2D ST acquisition already costs over 50 times more than whole slide imaging (WSI), and a full 3D volume with 10 sections can be an order of magnitude more expensive.

Anatomy Diagnostic +1

Jailbreak Large Vision-Language Models Through Multi-Modal Linkage

1 code implementation30 Nov 2024 Yu Wang, Xiaofei Zhou, Yichen Wang, Geyuan Zhang, Tianxing He

With the significant advancement of Large Vision-Language Models (VLMs), concerns about their potential misuse and abuse have grown rapidly.

LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization

no code implementations26 Nov 2024 Rui Xie, Tianchen Zhao, Zhihang Yuan, Rui Wan, Wenxi Gao, Zhenhua Zhu, Xuefei Ning, Yu Wang

Visual Autoregressive (VAR) has emerged as a promising approach in image generation, offering competitive potential and performance comparable to diffusion-based models.

Image Generation Quantization

Glo-In-One-v2: Holistic Identification of Glomerular Cells, Tissues, and Lesions in Human and Mouse Histopathology

1 code implementation25 Nov 2024 Lining Yu, Mengmeng Yin, Ruining Deng, Quan Liu, Tianyuan Yao, Can Cui, Junlin Guo, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

In this study, we upgrade the Glo-In-One toolkit to version 2 with fine-grained segmentation capabilities, curating 14 distinct labels for tissue regions, cells, and lesions across a dataset of 23,529 annotated glomeruli spanning human and mouse histopathology data.

Lesion Segmentation Segmentation +2

Continual SFT Matches Multimodal RLHF with Negative Supervision

no code implementations22 Nov 2024 Ke Zhu, Yu Wang, Yanpeng Sun, Qiang Chen, JiangJiang Liu, Gang Zhang, Jingdong Wang

Our nSFT disentangles this negative supervision in the RLHF paradigm, and continually aligns VLMs with a simple SFT loss.

Neural Internal Model Control: Learning a Robust Control Policy via Predictive Error Feedback

1 code implementation20 Nov 2024 Feng Gao, Chao Yu, Yu Wang, Yi Wu

In this paper, we propose a novel framework, Neural Internal Model Control, which integrates model-based control with RL-based control to enhance robustness.

Towards Accurate and Efficient Sub-8-Bit Integer Training

no code implementations17 Nov 2024 Wenjin Guo, Donglai Liu, Weiying Xie, Yunsong Li, Xuefei Ning, Zihan Meng, Shulin Zeng, Jie Lei, Zhenman Fang, Yu Wang

Our integer training framework includes two components: ShiftQuant to realize accurate gradient estimation, and L1 normalization to smoothen the loss landscape.

Quantization

WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking

no code implementations14 Nov 2024 Yunchao Liu, Ha Dong, Xin Wang, Rocco Moretti, Yu Wang, Zhaoqian Su, Jiawei Gu, Bobby Bodenheimer, Charles David Weaver, Jens Meiler, Tyler Derr

While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices.

Benchmarking Drug Discovery

Guide for Defense (G4D): Dynamic Guidance for Robust and Balanced Defense in Large Language Models

1 code implementation23 Oct 2024 He Cao, Weidi Luo, Yu Wang, Zijing Liu, Bing Feng, Yuan YAO, Yu Li

With the extensive deployment of Large Language Models (LLMs), ensuring their safety has become increasingly critical.

ReflecTool: Towards Reflection-Aware Tool-Augmented Clinical Agents

3 code implementations23 Oct 2024 Yusheng Liao, Shuyang Jiang, Yanfeng Wang, Yu Wang

Large Language Models (LLMs) have shown promising potential in the medical domain, assisting with tasks like clinical note generation and patient communication.

PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting

no code implementations23 Oct 2024 Yu Wang, Xiaobao Wei, Ming Lu, Guoliang Kang

In this paper, we propose a new method called PLGS that enables 3DGS to generate consistent panoptic segmentation masks from noisy 2D segmentation masks while maintaining superior efficiency compared to NeRF-based methods.

3DGS NeRF +1

Error estimates between SGD with momentum and underdamped Langevin diffusion

no code implementations22 Oct 2024 Arnaud Guillin, Yu Wang, Lihu Xu, Haoran Yang

Stochastic gradient descent with momentum is a popular variant of stochastic gradient descent, which has recently been reported to have a close relationship with the underdamped Langevin diffusion.

Large Language Models are In-context Preference Learners

no code implementations22 Oct 2024 Chao Yu, Qixin Tan, Hong Lu, Jiaxuan Gao, Xinting Yang, Yu Wang, Yi Wu, Eugene Vinitsky

Preference-based reinforcement learning is an effective way to handle tasks where rewards are hard to specify but can be exceedingly inefficient as preference learning is often tabula rasa.

In-Context Learning reinforcement-learning +1

Large Language Model-based Augmentation for Imbalanced Node Classification on Text-Attributed Graphs

no code implementations22 Oct 2024 Leyao Wang, Yu Wang, Bo Ni, Yuying Zhao, Tyler Derr

Node classification on graphs often suffers from class imbalance, leading to biased predictions and significant risks in real-world applications.

Classification Data Augmentation +5

Disentangling Likes and Dislikes in Personalized Generative Explainable Recommendation

no code implementations17 Oct 2024 Ryotaro Shimizu, Takashi Wada, Yu Wang, Johannes Kruse, Sean O'Brien, Sai HtaungKham, Linxin Song, Yuya Yoshikawa, Yuki Saito, Fugee Tsung, Masayuki Goto, Julian McAuley

Specifically, we construct the datasets by explicitly extracting users' positive and negative opinions from their post-purchase reviews using an LLM, and propose to evaluate systems based on whether the generated explanations 1) align well with the users' sentiments, and 2) accurately identify both positive and negative opinions of users on the target items.

Explainable Recommendation Text Generation

Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective

no code implementations11 Oct 2024 Bo Ni, Yu Wang, Lu Cheng, Erik Blasch, Tyler Derr

We design an uncertainty-aware multi-step reasoning framework that leverages conformal prediction to provide a theoretical guarantee on the prediction set.

Conformal Prediction Knowledge Graphs +4

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping

1 code implementation11 Oct 2024 Yue Yang, Shuibai Zhang, Wenqi Shao, Kaipeng Zhang, Yi Bin, Yu Wang, Ping Luo

Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across multimodal tasks such as visual perception and reasoning, leading to good performance on various multimodal evaluation benchmarks.

MME Question Answering +1

Reconstruction of Particle Flow Energy Distribution Using Deep Learning Algorithms

1 code implementation8 Oct 2024 Han Zhang, Shengxiang Lin, Xingyi Zhang, Yu Wang, Yangguang Zhang

In high-energy particle physics, extracting information from complex detector signals is crucial for energy reconstruction.

Deep Learning Image Reconstruction

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective

no code implementations6 Oct 2024 Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai

We compare the performance of the same optimization methods across different hardware platforms and the performance of different methods on the same hardware platform.

Language Modeling Language Modelling +3

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

1 code implementation2 Oct 2024 Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation.

Text-to-Image Generation

Self-Updatable Large Language Models with Parameter Integration

no code implementations1 Oct 2024 Yu Wang, Xinshuang Liu, Xiusi Chen, Sean O'Brien, Junda Wu, Julian McAuley

Despite significant advancements in large language models (LLMs), the rapid and frequent integration of small-scale experiences, such as interactions with surrounding objects, remains a substantial challenge.

Continual Learning Conversational Recommendation +3

Towards the Mitigation of Confirmation Bias in Semi-supervised Learning: a Debiased Training Perspective

no code implementations26 Sep 2024 Yu Wang, Yuxuan Yin, Peng Li

TaMatch employs a scaling ratio derived from both a prior target distribution and the model's learning status to estimate and correct bias at each training step.

Image Classification

Multi-Designated Detector Watermarking for Language Models

no code implementations26 Sep 2024 Zhengan Huang, Gongxian Zeng, Xin Mu, Yu Wang, Yue Yu

In this paper, we initiate the study of \emph{multi-designated detector watermarking (MDDW)} for large language models (LLMs).

Anisotropic Diffusion Probabilistic Model for Imbalanced Image Classification

no code implementations22 Sep 2024 Jingyu Kong, Yuan Guo, Yu Wang, Yuping Duan

We utilize the data distribution to control the diffusion speed of different class samples during the forward process, effectively improving the classification accuracy of the denoiser in the reverse process.

Classification Denoising +3

Towards LifeSpan Cognitive Systems

no code implementations20 Sep 2024 Yu Wang, Chi Han, Tongtong Wu, Xiaoxin He, Wangchunshu Zhou, Nafis Sadeq, Xiusi Chen, Zexue He, Wei Wang, Gholamreza Haffari, Heng Ji, Julian McAuley

In this paper we focus on the domain of Large Language Models (LLMs), where we identify two major challenges: (1) Abstraction and Experience Merging, and (2) Long-term Retention with Accurate Recall.

Continual Learning

Reward-Robust RLHF in LLMs

no code implementations18 Sep 2024 Yuzi Yan, Xingzhou Lou, Jialian Li, Yiping Zhang, Jian Xie, Chao Yu, Yu Wang, Dong Yan, Yuan Shen

As Large Language Models (LLMs) continue to progress toward more advanced forms of intelligence, Reinforcement Learning from Human Feedback (RLHF) is increasingly seen as a key pathway toward achieving Artificial General Intelligence (AGI).

CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios

1 code implementation16 Sep 2024 Luning Wang, Shiyao Li, Xuefei Ning, Zhihang Yuan, Shengen Yan, Guohao Dai, Yu Wang

Therefore, we introduce CSKV, a training-efficient Channel Shrinking technique for KV cache compression: (1) We first analyze the singular value distribution of the KV cache, revealing significant redundancy and compression potential along the channel dimension.
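The channel redundancy can be illustrated with a truncated SVD of a key-cache slice, as below; CSKV itself uses trained projections, so this is only a toy demonstration of why shrinking the channel dimension is viable.

```python
import torch

def shrink_channels(K: torch.Tensor, rank: int):
    """K: (tokens, channels) slice of the key cache. Store the low-rank
    coordinates plus a small basis instead of the full cache."""
    U, S, Vh = torch.linalg.svd(K, full_matrices=False)
    basis = Vh[:rank]            # (rank, channels), kept once
    K_small = K @ basis.T        # (tokens, rank), stored per token
    return K_small, basis        # reconstruct with K_small @ basis
```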

PIP: Detecting Adversarial Examples in Large Vision-Language Models via Attention Patterns of Irrelevant Probe Questions

1 code implementation8 Sep 2024 Yudong Zhang, Ruobing Xie, Jiansheng Chen, Xingwu Sun, Yu Wang

We propose an unconventional method named PIP, which utilizes the attention patterns of one randomly selected irrelevant probe question (e.g., "Is there a clock?")

Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

no code implementations5 Sep 2024 Yu Wang, Shiwan Zhao, Zhihu Wang, Heyuan Huang, Ming Fan, Yubo Zhang, Zhixing Wang, Haijun Wang, Ting Liu

The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs).

GSM8K

A Fashion Item Recommendation Model in Hyperbolic Space

no code implementations4 Sep 2024 Ryotaro Shimizu, Yu Wang, Masanari Kimura, Yuki Hirakawa, Takashi Wada, Yuki Saito, Julian McAuley

In this work, we propose a fashion item recommendation model that incorporates hyperbolic geometry into user and item representations.

model Multi-Task Learning
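For reference, a standard choice for such representations is the Poincaré-ball distance, sketched below (the paper's exact hyperbolic formulation may differ):

```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-5):
    """d(u, v) = arcosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))
    for points inside the unit ball."""
    sq = lambda x: x.pow(2).sum(-1)
    denom = (1 - sq(u)).clamp_min(eps) * (1 - sq(v)).clamp_min(eps)
    return torch.acosh(1 + 2 * sq(u - v) / denom)
```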

A Comprehensive Review of 3D Object Detection in Autonomous Driving: Technological Advances and Future Directions

1 code implementation28 Aug 2024 Yu Wang, Shaohua Wang, Yicheng Li, Mingchun Liu

By providing a holistic view of the current state and future developments in 3D object perception, we aim to offer a more comprehensive understanding of perception tasks for autonomous driving.

3D Object Detection Autonomous Driving +2

Revisiting the Phenomenon of Syntactic Complexity Convergence on German Dialogue Data

no code implementations22 Aug 2024 Yu Wang, Hendrik Buschmeier

We revisit the phenomenon of syntactic complexity convergence in conversational interaction, originally found for English dialogue, which has theoretical implications for dialogical concepts such as mutual understanding.

Dependency Parsing

Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology

1 code implementation17 Aug 2024 Junchao Zhu, Mengmeng Yin, Ruining Deng, Yitian Long, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

Cross-species homologous data, such as mouse kidney data, which exhibits high structural and feature similarity to human kidneys, has the potential to enhance model performance on human datasets.

Data Integration Domain Generalization +1

Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives

no code implementations13 Aug 2024 Zhihu Wang, Shiwan Zhao, Yu Wang, Heyuan Huang, Sitao Xie, Yubo Zhang, Jiaxin Shi, Zhixing Wang, Hongyan Li, Junchi Yan

This paper introduces the Re-TASK framework, a novel theoretical model that revisits LLM tasks from the perspectives of capability, skill, and knowledge, drawing on the principles of Bloom's Taxonomy and Knowledge Space Theory.

A Survey on Self-play Methods in Reinforcement Learning

no code implementations2 Aug 2024 Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Wenhao Tang, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang

Self-play, characterized by agents' interactions with copies or past versions of themselves, has recently gained prominence in reinforcement learning (RL).

Multi-agent Reinforcement Learning reinforcement-learning +3

Decoding Linguistic Representations of Human Brain

no code implementations30 Jul 2024 Yu Wang, Heyang Liu, Yuhao Wang, Chuan Xuan, Yixuan Hou, Sheng Feng, Hongcheng Liu, Yusheng Liao, Yanfeng Wang

Language, as an information medium created by advanced organisms, has always been a concern of neuroscience regarding how it is represented in the brain.

Brain Computer Interface

CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models

no code implementations29 Jul 2024 Junda Wu, Xintong Li, Tong Yu, Yu Wang, Xiang Chen, Jiuxiang Gu, Lina Yao, Jingbo Shang, Julian McAuley

Instruction tuning in multimodal large language models (MLLMs) aims to smoothly integrate a backbone LLM with a pre-trained feature encoder for downstream tasks.

GLAM: Glomeruli Segmentation for Human Pathological Lesions using Adapted Mouse Model

no code implementations25 Jul 2024 Lining Yu, Mengmeng Yin, Ruining Deng, Quan Liu, Tianyuan Yao, Can Cui, Yitian Long, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

To answer this question, we introduce GLAM, a deep learning study for fine-grained segmentation of human kidney lesions using a mouse model. It addresses mouse-to-human transfer learning by evaluating different learning strategies for segmenting human pathological lesions, including zero-shot transfer learning and hybrid learning that leverages mouse samples.

Diagnostic Lesion Segmentation +2

Intent-guided Heterogeneous Graph Contrastive Learning for Recommendation

1 code implementation24 Jul 2024 Lei Sang, Yu Wang, Yi Zhang, Yiwen Zhang, Xindong Wu

Contrastive Learning (CL)-based recommender systems have gained prominence in the context of Heterogeneous Graph (HG) due to their capacity to enhance the consistency of representations across different views.

Contrastive Learning Recommendation Systems

Reconstruct the Pruned Model without Any Retraining

no code implementations18 Jul 2024 Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang

Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost.

Common Sense Reasoning model

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

1 code implementation16 Jul 2024 Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu

Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) a deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge.

Language Modeling Language Modelling +3

ISWSST: Index-space-wave State Superposition Transformers for Multispectral Remotely Sensed Imagery Semantic Segmentation

no code implementations3 Jul 2024 Chang Li, Pengfei Zhang, Yu Wang

Currently the semantic segmentation task of multispectral remotely sensed imagery (MSRSI) faces the following problems: 1) usually, only a single-domain feature (i.e., space domain or frequency domain) is considered; 2) the downsampling operation in the encoder generally leads to accuracy loss in edge extraction; 3) the multichannel features of MSRSI are not fully considered; and 4) prior knowledge of remote sensing is not fully utilized.

Ensemble Learning Segmentation +1

LaMoD: Latent Motion Diffusion Model For Myocardial Strain Generation

1 code implementation2 Jul 2024 Jiarui Xing, Nivetha Jayakumar, Nian Wu, Yu Wang, Frederick H. Epstein, Miaomiao Zhang

More specifically, our method first employs an encoder from a pre-trained registration network that learns latent motion features (also considered as deformation-based shape features) from image sequences.

Image Registration

Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs

1 code implementation1 Jul 2024 Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

For example, we demonstrate that pruning up to 75% of experts in Mixtral $8\times7$B-Instruct results in a substantial reduction in parameters with minimal performance loss.

Mixture-of-Experts
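One simple way to realize such pruning is to rank experts by how often the router selects them on calibration data and keep only the most-used ones; the usage criterion below is an assumption for illustration, not necessarily the paper's method.

```python
import torch

def select_experts(router_logits: torch.Tensor, num_keep: int) -> torch.Tensor:
    """router_logits: (num_tokens, num_experts) gathered on calibration data.
    Returns the ids of the experts to retain."""
    top1 = router_logits.argmax(dim=-1)                 # each token's routed expert
    usage = torch.bincount(top1, minlength=router_logits.shape[-1])
    return usage.float().topk(num_keep).indices.sort().values
```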

HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

1 code implementation30 Jun 2024 Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy.

Anatomy Image Segmentation +2

Detecting Subtle Differences between Human and Model Languages Using Spectrum of Relative Likelihood

1 code implementation28 Jun 2024 Yang Xu, Yu Wang, Hao An, Zhichen Liu, Yongyuan Li

Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language.

Text Detection

CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network

no code implementations27 Jun 2024 Jia-Hau Bai, Chi-Ting Liu, Yu Wang, Fu-Chieh Chang, Pei-Yuan Wu

This study uses CAPM (Convex Adversarial Polytope for Maxpool-based CNN) to improve the verified bound for general purpose maxpool-based convolutional neural networks (CNNs) under bounded norm adversarial perturbations.

ADO-LLM: Analog Design Bayesian Optimization with In-Context Learning of Large Language Models

no code implementations26 Jun 2024 Yuxuan Yin, Yu Wang, Boxun Xu, Peng Li

Analog circuit design requires substantial human expertise and involvement, which is a significant roadblock to design productivity.

Bayesian Optimization In-Context Learning

MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation

3 code implementations25 Jun 2024 Yusheng Liao, Shuyang Jiang, Zhe Chen, Yanfeng Wang, Yu Wang

Based on this two-stage paradigm, we propose a Medical LLM through decoupling Clinical Alignment and Knowledge Aggregation (MedCare), which is designed to achieve state-of-the-art (SOTA) performance on over 20 medical tasks, as well as SOTA results on specific medical alignment tasks.

Diversity Natural Language Understanding

Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

1 code implementation23 Jun 2024 Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

We regard a voxel as a hard sample if it is in: (1) the background and has an intensity value close to the bronchus region; (2) the bronchus region and is of higher intensity than most voxels inside the bronchus; (3) the background region and at a short distance from the bronchus.

Segmentation

MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

1 code implementation21 Jun 2024 Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths.

Language Modeling Language Modelling +3
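The contrast with a uniform pattern can be seen in a toy example where each attention head receives its own sliding-window width (the widths below are made up for illustration):

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Causal band mask: token i attends to the previous `window` tokens."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j < window)

# Heterogeneous masks: one window per head instead of a single shared pattern.
masks = [sliding_window_mask(1024, w) for w in (64, 128, 256, 1024)]
```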

Privacy Preserved Blood Glucose Level Cross-Prediction: An Asynchronous Decentralized Federated Learning Approach

1 code implementation21 Jun 2024 Chengzhe Piao, Taiyu Zhu, Yu Wang, Stephanie E Baldeweg, Paul Taylor, Pantelis Georgiou, Jiahao Sun, Jun Wang, Kezhi Li

Newly diagnosed Type 1 Diabetes (T1D) patients often struggle to obtain effective Blood Glucose (BG) prediction models due to the lack of sufficient BG data from Continuous Glucose Monitoring (CGM), presenting a significant "cold start" problem in patient care.

Federated Learning Privacy Preserving

Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study

1 code implementation20 Jun 2024 Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang

We reveal some findings: (1) Teaching materials that make it easier for students to learn have clearer and more accurate logic when using in-context learning as the student's "learning" method; (2) Weak-to-strong generalization: LbT might help improve strong models by teaching weak models; (3) Diversity in students might help: teaching multiple students could be better than teaching one student or the teacher itself.

In-Context Learning Knowledge Distillation

Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models

1 code implementation17 Jun 2024 Sheng Feng, Heyang Liu, Yu Wang, Yanfeng Wang

In this paper, we introduce a groundbreaking end-to-end (E2E) framework for decoding invasive brain signals, marking a significant advancement in the field of speech neuroprosthesis.

Edge Classification on Graphs: New Directions in Topological Imbalance

1 code implementation17 Jun 2024 Xueqi Cheng, Yu Wang, Yunchao Liu, Yuying Zhao, Charu C. Aggarwal, Tyler Derr

Our empirical studies confirm that TE effectively measures local class distribution variance, and indicate that prioritizing edges with high TE values can help address the issue of topological imbalance.

Edge Classification Graph Classification +2

DiTFastAttn: Attention Compression for Diffusion Transformer Models

no code implementations12 Jun 2024 Zhihang Yuan, Hanling Zhang, Pu Lu, Xuefei Ning, Linfeng Zhang, Tianchen Zhao, Shengen Yan, Guohao Dai, Yu Wang

Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to the quadratic complexity of self-attention operators.

2k Image Generation +1

Large Generative Graph Models

no code implementations7 Jun 2024 Yu Wang, Ryan A. Rossi, Namyong Park, Huiyuan Chen, Nesreen K. Ahmed, Puja Trivedi, Franck Dernoncourt, Danai Koutra, Tyler Derr

To remedy this crucial gap, we propose a new class of graph generative model called Large Graph Generative Model (LGGM) that is trained on a large corpus of graphs (over 5000 graphs) from 13 different domains.

Language Modelling World Knowledge

ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

1 code implementation4 Jun 2024 Tianchen Zhao, Tongcheng Fang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

Diffusion transformers (DiTs) have exhibited remarkable performance in visual generation tasks, such as generating realistic images or videos based on textual instructions.

Quantization Video Generation

CityLight: A Universal Model for Coordinated Traffic Signal Control in City-scale Heterogeneous Intersections

no code implementations4 Jun 2024 Jinwei Zeng, Chao Yu, Xinyi Yang, Wenxuan Ao, Qianyue Hao, Jian Yuan, Yong Li, Yu Wang, Huazhong Yang

Our method, CityLight, features a universal representation module that not only aligns the state representations of intersections by reindexing their phases based on their semantics and designing heterogeneity-preserving observations, but also encodes the narrowed relative traffic relation types to project the neighborhood intersections onto a uniform relative traffic impact space.

Traffic Signal Control

Hybrid Fourier Score Distillation for Efficient One Image to 3D Object Generation

1 code implementation31 May 2024 Shuzhou Yang, Yu Wang, Haijie Li, Jiarui Meng, Yanmin Wu, Xiandong Meng, Jian Zhang

We note that there is a disparity between the generation priors of these two diffusion models, leading to their different appearance outputs.

3D Generation Image to 3D

Cross-Training with Multi-View Knowledge Fusion for Heterogenous Federated Learning

no code implementations30 May 2024 Zhuang Qi, Lei Meng, Weihao He, Ruohan Zhang, Yu Wang, Xin Qi, Xiangxu Meng

Federated learning benefits from cross-training strategies, which enable models to train on data from distinct sources to improve generalization capability.

Federated Learning Representation Learning

TAIA: Large Language Models are Out-of-Distribution Data Learners

1 code implementation30 May 2024 Shuyang Jiang, Yusheng Liao, Ya Zhang, Yanfeng Wang, Yu Wang

However, in certain specialized domains, such as healthcare or harmless content generation, it is nearly impossible to obtain a large volume of high-quality data that matches the downstream distribution.

Math

Augmenting Textual Generation via Topology Aware Retrieval

no code implementations27 May 2024 Yu Wang, Nedim Lipka, Ruiyi Zhang, Alexa Siu, Yuying Zhao, Bo Ni, Xin Wang, Ryan Rossi, Tyler Derr

This framework includes a retrieval module that selects texts based on their topological relationships and an aggregation module that integrates these texts into prompts to stimulate LLMs for text generation.

RAG Retrieval +1

Large Scale Knowledge Washing

1 code implementation26 May 2024 Yu Wang, Ruihan Wu, Zexue He, Xiusi Chen, Julian McAuley

To this end, we propose LAW (Large Scale Washing) to update the MLP layers in decoder-only large language models to perform knowledge washing, as inspired by model editing methods and based on the hypothesis that knowledge and reasoning are disentanglable.

Decoder Memorization +2
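
Since the snippet pins the update to the MLP layers of a decoder-only model, a minimal sketch of that restriction follows; the "gpt2" stand-in, the parameter-name filter, and the learning rate are assumptions, and the washing objective itself is omitted.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in decoder-only LM
for name, param in model.named_parameters():
    param.requires_grad = ".mlp." in name              # update MLP blocks only

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)
# ... optimize an unlearning ("washing") objective on the target facts here ...
```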

Visual-RolePlay: Universal Jailbreak Attack on MultiModal Large Language Models via Role-playing Image Character

no code implementations25 May 2024 Siyuan Ma, Weidi Luo, Yu Wang, Xiaogeng Liu

With the advent and widespread deployment of Multimodal Large Language Models (MLLMs), ensuring their safety has become increasingly critical.

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

1 code implementation23 May 2024 Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

In addition, to further improve training efficiency for high-resolution image generation with DiM, we investigate a "weak-to-strong" training strategy that pretrains DiM on low-resolution images ($256\times 256$) and then finetunes it on high-resolution images ($512 \times 512$).

Image Generation Mamba +1
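
The two-stage schedule is easy to picture in code. The sketch below uses a toy resolution-agnostic network and random tensors in place of the real model and data, so the step counts and batch contents are pure placeholders.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Toy stand-in for a diffusion backbone; works at any resolution."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)

    def loss(self, x):
        noise = torch.randn_like(x)
        return ((self.net(x + noise) - noise) ** 2).mean()  # predict the noise

def train(model, resolution, steps, lr=1e-4):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(steps):
        x = torch.randn(2, 3, resolution, resolution)  # stand-in image batch
        loss = model.loss(x)
        opt.zero_grad()
        loss.backward()
        opt.step()

model = TinyDenoiser()
train(model, resolution=256, steps=100)  # stage 1: low-resolution pretraining
train(model, resolution=512, steps=20)   # stage 2: high-resolution finetuning
```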

SubGDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning

1 code implementation9 May 2024 Jiying Zhang, Zijing Liu, Yu Wang, Yu Li

We propose a novel diffusion model, termed SubGDiff, that incorporates molecular subgraph information into the diffusion process.

Denoising Drug Discovery +2

Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition

no code implementations25 Apr 2024 Yu Wang, Sanping Zhou, Kun Xia, Le Wang

Semi-supervised action recognition aims to improve spatio-temporal reasoning ability with a few labeled data in conjunction with a large amount of unlabeled data.

Action Recognition Contrastive Learning

CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective

no code implementations22 Apr 2024 Wencheng Zhu, Xin Zhou, Pengfei Zhu, Yu Wang, Qinghua Hu

Note that constraints on intra-sample similarities and inter-sample dissimilarities can be efficiently and effectively reformulated into a contrastive learning framework with newly designed positive and negative pairs.

Contrastive Learning Image Classification +3
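
For readers unfamiliar with the reformulation, a generic sample-wise contrastive loss of this flavor is sketched below: the student feature of each sample is pulled toward the teacher feature of the same sample (positive pair) and pushed away from teacher features of other samples in the batch (negatives). This is a standard InfoNCE form, not necessarily the authors' exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive_kd_loss(student_feats, teacher_feats, temperature=0.1):
    s = F.normalize(student_feats, dim=1)   # (B, D) student features
    t = F.normalize(teacher_feats, dim=1)   # (B, D) teacher features
    logits = s @ t.T / temperature          # (B, B) pairwise similarities
    targets = torch.arange(s.size(0))       # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

loss = contrastive_kd_loss(torch.randn(8, 128), torch.randn(8, 128))
```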

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations22 Apr 2024 Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, ZiYi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.

Ranked #5 on MMR total on MRR-Benchmark (using extra training data)

Language Modeling Language Modelling +3

A competitive game optimization algorithm for Unmanned Aerial Vehicle path planning

no code implementations15 Apr 2024 Tai-shan Lou, Guang-sheng Guan, Zhe-peng Yue, Yu Wang, Ren-long Qi, Shi-hao Tong

To solve the Unmanned Aerial Vehicle (UAV) path planning problem, a meta-heuristic optimization algorithm called competitive game optimizer (CGO) is proposed.

Can AI Understand Our Universe? Test of Fine-Tuning GPT by Astrophysical Data

no code implementations14 Apr 2024 Yu Wang, Shu-Rui Zhang, Aidin Momtaz, Rahim Moradi, Fatemeh Rastegarnia, Narek Sahakyan, Soroush Shakeri, Liang Li

With the ever-growing volume of multidisciplinary data and the advancement of AI technology, we look forward to the emergence of a more fundamental and comprehensive understanding of our universe.

MING-MOE: Enhancing Medical Multi-Task Learning in Large Language Models with Sparse Mixture of Low-Rank Adapter Experts

2 code implementations13 Apr 2024 Yusheng Liao, Shuyang Jiang, Yu Wang, Yanfeng Wang

Large language models like ChatGPT have shown substantial progress in natural language understanding and generation, proving valuable across various disciplines, including the medical field.

Diversity Language Modeling +4

On the Uniqueness of Solution for the Bellman Equation of LTL Objectives

no code implementations7 Apr 2024 Zetong Xuan, Alper Kamil Bozkurt, Miroslav Pajic, Yu Wang

In a widely adopted surrogate reward approach, two discount factors are used to ensure that the expected return approximates the satisfaction probability of the LTL objective.
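
For intuition, one widely used construction of this kind (our notation; details vary across papers) rewards visits to accepting states of the product automaton with $R(s) = 1-\gamma_B$ under a state-dependent discount $\Gamma(s) = \gamma_B$, and assigns $R(s) = 0$ with discount $\Gamma(s) = \gamma$ everywhere else, where $\gamma_B \le \gamma < 1$; as both discount factors tend to 1, the expected return of a policy tends to its probability of satisfying the LTL objective.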

Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better

1 code implementation2 Apr 2024 Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

For example, LCSC achieves better performance using one function evaluation (NFE) than the base model with two NFEs on consistency distillation, and decreases the NFE of DM from 15 to 9 while maintaining the generation quality on CIFAR-10.
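
Mechanically, LCSC merges snapshots saved along one training trajectory by taking a weighted sum of their parameters. A minimal sketch is below; the two toy checkpoints and the coefficients are assumptions (the paper obtains the coefficients with a dedicated search procedure rather than fixing them by hand).

```python
import torch

def combine_checkpoints(state_dicts, coeffs):
    """Weighted sum of model state dicts saved during training."""
    return {key: sum(c * sd[key].float() for c, sd in zip(coeffs, state_dicts))
            for key in state_dicts[0]}

# Toy usage with two snapshots of a small layer; real coefficients come from
# a search over held-out generation quality, not from hand-picking.
net = torch.nn.Linear(4, 4)
ckpt_a = {k: v.clone() for k, v in net.state_dict().items()}
ckpt_b = {k: v + 0.01 for k, v in net.state_dict().items()}
net.load_state_dict(combine_checkpoints([ckpt_a, ckpt_b], [0.3, 0.7]))
```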

ParCo: Part-Coordinating Text-to-Motion Synthesis

1 code implementation27 Mar 2024 Qiran Zou, Shangyuan Yuan, Shian Du, Yu Wang, Chang Liu, Yi Xu, Jie Chen, Xiangyang Ji

However, these methods encounter challenges such as the lack of coordination between different part motions and difficulties for networks to understand part concepts.

Motion Synthesis

Leveraging Large Language Models for Fuzzy String Matching in Political Science

no code implementations27 Mar 2024 Yu Wang

Fuzzy string matching remains a key issue when political scientists combine data from different sources.

Simple Graph Condensation

1 code implementation22 Mar 2024 Zhenbang Xiao, Yu Wang, Shunyu Liu, Huiqiong Wang, Mingli Song, Tongya Zheng

The burdensome training costs on large-scale graphs have aroused significant interest in graph condensation, which involves tuning Graph Neural Networks (GNNs) on a small condensed graph for use on the large-scale original graph.

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

no code implementations21 Mar 2024 Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

Although multiple academic video datasets have been constructed and released, few of them support both multimodal content recognition and understanding tasks, which is partially due to the lack of high-quality human annotations.

Diversity Script Generation +3

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

1 code implementation14 Mar 2024 Yu Wang, Xiaogeng Liu, Yu Li, Muhao Chen, Chaowei Xiao

However, with the integration of additional modalities, MLLMs are exposed to new vulnerabilities, rendering them prone to structure-based jailbreak attacks, where semantic content (e.g., "harmful text") has been injected into the images to mislead MLLMs.

Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator

3 code implementations13 Mar 2024 Yusheng Liao, Yutong Meng, Yuhao Wang, Hongcheng Liu, Yanfeng Wang, Yu Wang

Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions, yet their application within the medical field remains insufficiently explored.

Position: Towards Implicit Prompt For Text-To-Image Models

no code implementations4 Mar 2024 Yue Yang, Yuqi Lin, Hong Liu, Wenqi Shao, Runjian Chen, Hailong Shang, Yu Wang, Yu Qiao, Kaipeng Zhang, Ping Luo

We call for increased attention to the potential and risks of implicit prompts in the T2I community and further investigation into the capabilities and impacts of implicit prompts, advocating for a balanced approach that harnesses their benefits while mitigating their risks.

Position

COLA: Cross-city Mobility Transformer for Human Trajectory Simulation

1 code implementation4 Mar 2024 Yu Wang, Tongya Zheng, Yuxuan Liang, Shunyu Liu, Mingli Song

To address these challenges, we have tailored a Cross-city mObiLity trAnsformer (COLA) with a dedicated model-agnostic transfer framework by effectively transferring cross-city knowledge for human trajectory simulation.

CoLA Transfer Learning

LLMs in Political Science: Heralding a New Era of Visual Analysis

no code implementations29 Feb 2024 Yu Wang

We find that Gemini is highly accurate in performing object detection, which is arguably the most common and fundamental task in image analysis for political scientists.

Caption Generation Face Identification +3

Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

1 code implementation29 Feb 2024 Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Zichao Yang, Eric P. Xing, Zhiting Hu

The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images.

Decoder Denoising

Leveraging Diverse Modeling Contexts with Collaborating Learning for Neural Machine Translation

no code implementations28 Feb 2024 Yusheng Liao, Yanfeng Wang, Yu Wang

Autoregressive (AR) and Non-autoregressive (NAR) models are two types of generative models for Neural Machine Translation (NMT).

Contrastive Learning Machine Translation +2

Evaluating Quantized Large Language Models

1 code implementation28 Feb 2024 Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang

Specifically, PTQ can effectively mitigate memory consumption and reduce computational overhead in LLMs.

Mamba Quantization
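
As a reminder of why PTQ saves memory, the toy example below quantizes a weight tensor to int8 with one symmetric per-tensor scale; production LLM PTQ is considerably more careful (per-channel scales, activation quantization, calibration data), so treat this only as an illustration.

```python
import torch

def quantize_int8(w):
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(4096, 4096)       # 4 bytes per weight in float32
q, s = quantize_int8(w)           # 1 byte per weight, plus one scalar scale
w_hat = dequantize(q, s)          # approximate weights used at compute time
print((w - w_hat).abs().max())    # worst-case rounding error
```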

Intelligent Director: An Automatic Framework for Dynamic Visual Composition using ChatGPT

no code implementations24 Feb 2024 Sixiao Zheng, Jingyang Huo, Yu Wang, Yanwei Fu

We propose an Intelligent Director framework, utilizing LENS to generate descriptions for images and video frames and combining ChatGPT to generate coherent captions while recommending appropriate music names.

Retrieval Style Transfer

Representation Learning for Frequent Subgraph Mining

no code implementations22 Feb 2024 Rex Ying, Tianyu Fu, Andrew Wang, Jiaxuan You, Yu Wang, Jure Leskovec

SPMiner combines graph neural networks, order embedding space, and an efficient search strategy to identify network subgraph patterns that appear most frequently in the target graph.

Representation Learning Subgraph Counting
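
The order-embedding component has a simple geometric reading: if pattern A is a subgraph of target B, A's embedding should lie coordinate-wise below B's, so any positive excess measures violation of the subgraph relation. The sketch below illustrates that check with random stand-in embeddings rather than the paper's GNN outputs.

```python
import torch

def order_violation(z_a, z_b):
    """Penalty that is zero iff z_a <= z_b coordinate-wise."""
    return torch.clamp(z_a - z_b, min=0).pow(2).sum()

z_small = torch.rand(64)          # stand-in embedding of a candidate pattern
z_big = torch.rand(64) + 1.0      # stand-in embedding of the target graph
print(order_violation(z_small, z_big))   # ~0: consistent with subgraph relation
```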

Can One Embedding Fit All? A Multi-Interest Learning Paradigm Towards Improving User Interest Diversity Fairness

no code implementations21 Feb 2024 Yuying Zhao, Minghua Xu, Huiyuan Chen, Yuzhong Chen, Yiwei Cai, Rashidul Islam, Yu Wang, Tyler Derr

Recommender systems (RSs) have gained widespread applications across various domains owing to their superior ability to capture users' interests.

All Diversity +2

LVCHAT: Facilitating Long Video Comprehension

1 code implementation19 Feb 2024 Yu Wang, Zeyuan Zhang, Julian McAuley, Zexue He

To address this issue, we propose Long Video Chat (LVChat), where Frame-Scalable Encoding (FSE) is introduced to dynamically adjust the number of embeddings in alignment with the duration of the video to ensure long videos are not overly compressed into a few embeddings.

Video Captioning
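
The core of FSE is a budget that grows with clip length instead of being fixed. A hypothetical version of such a rule is below; the chunk length, embeddings per chunk, and cap are all invented constants, not values from the paper.

```python
def num_video_embeddings(duration_s, per_chunk=16, chunk_s=10.0, cap=256):
    """Allocate embeddings in proportion to duration, up to a hard cap."""
    chunks = max(1, round(duration_s / chunk_s))
    return min(chunks * per_chunk, cap)

for d in (5, 60, 600):
    print(f"{d}s video -> {num_video_embeddings(d)} embeddings")
```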

Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

no code implementations19 Feb 2024 Xuelin Qian, Yu Wang, Simian Luo, Yinda Zhang, Ying Tai, Zhenyu Zhang, Chengjie Wang, Xiangyang Xue, Bo Zhao, Tiejun Huang, Yunsheng Wu, Yanwei Fu

In this paper, we extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously.

3D Generation 3D Shape Generation +1

Leveraging Opposite Gender Interaction Ratio as a Path towards Fairness in Online Dating Recommendations Based on User Sexual Orientation

no code implementations19 Feb 2024 Yuying Zhao, Yu Wang, Yi Zhang, Pamela Wisniewski, Charu Aggarwal, Tyler Derr

While recommender systems have been designed to improve the user experience in dating platforms by providing personalized recommendations, increasing concerns about fairness have encouraged the development of fairness-aware recommender systems from various perspectives (e.g., gender and race).

Fairness Recommendation Systems +1

M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation

no code implementations19 Feb 2024 Hongcheng Liu, Pingjie Wang, Yu Wang, Yanfeng Wang

Video-grounded dialogue generation (VDG) requires the system to generate a fluent and accurate answer based on multimodal knowledge.

counterfactual Dialogue Generation +1

Knowledge Graph-based Session Recommendation with Adaptive Propagation

no code implementations17 Feb 2024 Yu Wang, Amin Javari, Janani Balaji, Walid Shalaby, Tyler Derr, Xiquan Cui

Then, we adaptively aggregate items' neighbor information considering user intention within the learned session.

Recommendation Systems

MEMORYLLM: Towards Self-Updatable Large Language Models

1 code implementation7 Feb 2024 Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, Jingbo Shang, Julian McAuley

We aim to build models containing a considerable portion of self-updatable parameters, enabling the model to integrate new knowledge effectively and efficiently.

Model Editing
