Search Results for author: Bin Wang

Found 331 papers, 134 papers with code

Maximal Clique Based Non-Autoregressive Open Information Extraction

no code implementations EMNLP 2021 Bowen Yu, Yucheng Wang, Tingwen Liu, Hongsong Zhu, Limin Sun, Bin Wang

However, popular OpenIE systems usually output facts sequentially, predicting the next fact conditioned on the previously decoded ones, which enforces an unnecessary order on the facts and introduces error accumulation between autoregressive steps.

Open Information Extraction Sentence

Towards Robust Neural Machine Translation with Iterative Scheduled Data-Switch Training

1 code implementation COLING 2022 Zhongjian Miao, Xiang Li, Liyan Kang, Wen Zhang, Chulun Zhou, Yidong Chen, Bin Wang, Min Zhang, Jinsong Su

Most existing methods on robust neural machine translation (NMT) construct adversarial examples by injecting noise into authentic examples and indiscriminately exploit two types of examples.

Machine Translation NMT +2

C^3KG: A Chinese Commonsense Conversation Knowledge Graph

1 code implementation Findings (ACL) 2022 Dawei Li, Yanran Li, Jiayi Zhang, Ke Li, Chen Wei, Jianwei Cui, Bin Wang

Existing commonsense knowledge bases often organize tuples in an isolated manner, which is deficient for commonsense conversational models to plan the next steps.

BIT-Xiaomi’s System for AutoSimTrans 2022

no code implementations NAACL (AutoSimTrans) 2022 Mengge Liu, Xiang Li, Bao Chen, Yanzhi Tian, Tianwei Lan, Silin Li, Yuhang Guo, Jian Luan, Bin Wang

This system paper describes the BIT-Xiaomi simultaneous translation system for Autosimtrans 2022 simultaneous translation challenge.

Chunking Data Augmentation +1

A Novel Line Integral Transform for 2D Affine-Invariant Shape Retrieval

no code implementations ECCV 2020 Bin Wang, Yongsheng Gao

Conducting the trace transform once generates only a single feature, so multiple trace transforms with different functionals are needed to derive more features and make the descriptors informative.

Retrieval

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding

no code implementations 25 Nov 2024 Andong Deng, Zhongpai Gao, Anwesa Choudhuri, Benjamin Planche, Meng Zheng, Bin Wang, Terrence Chen, Chen Chen, Ziyan Wu

Temporal awareness is essential for video large language models (LLMs) to understand and reason about events within long videos, enabling applications like dense video captioning and temporal video grounding in a unified system.

Dense Video Captioning Transfer Learning +1

HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation

no code implementations 28 Oct 2024 Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu

Furthermore, we conduct a detailed analysis of rotary position encoding (RoPE, a prevalent relative positional encoding in LLMs), and find that the U-shape attention is caused by some learned components, which are also the key factor limiting RoPE's expressiveness and extrapolation. Inspired by these insights, we propose High-frequency rotary Position Encoding (HoPE).

Position

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

no code implementations 28 Oct 2024 Qintong Zhang, Victor Shea-Jay Huang, Bin Wang, Junyuan Zhang, Zhengren Wang, Hao Liang, Shawn Wang, Matthieu Lin, Conghui He, Wentao Zhang

Document parsing is essential for converting unstructured and semi-structured documents, such as contracts, academic papers, and invoices, into structured, machine-readable data.

Data Integration Knowledge Base Construction

Order-aware Interactive Segmentation

no code implementations 16 Oct 2024 Bin Wang, Anwesa Choudhuri, Meng Zheng, Zhongpai Gao, Benjamin Planche, Andong Deng, Qin Liu, Terrence Chen, Ulas Bagci, Ziyan Wu

However, current methods often fail to accurately separate target objects from the background, due to a limited understanding of order, the relative depth between objects in a scene.

Interactive Segmentation Segmentation

3-D Magnetotelluric Deep Learning Inversion Guided by Pseudo-Physical Information

no code implementations 12 Oct 2024 Peifan Jiang, Xuben Wang, Shuang Wang, Fei Deng, Kunpeng Wang, Bin Wang, Yuhan Yang, Islam Fadel

To efficiently achieve data-physical dual-driven MT deep learning inversion for large-scale 3-D MT data, we propose using DL forward modeling networks to compute this portion of the loss.

CALoR: Towards Comprehensive Model Inversion Defense

1 code implementation 8 Oct 2024 Hongyao Yu, Yixiang Qiu, Hao Fang, Bin Chen, Sijin Yu, Bin Wang, Shu-Tao Xia, Ke Xu

Model Inversion Attacks (MIAs) aim at recovering privacy-sensitive training data from the knowledge encoded in the released machine learning models.

Low-rank compression

PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning

no code implementations 25 Sep 2024 Qibin Wang, Xiaolin Hu, Weikai Xu, Wei Liu, Jian Luan, Bin Wang

Low-rank adaptation (LoRA) and its variants have recently gained much interest due to their ability to avoid excessive inference costs.

GSM8K Math

MobileVLM: A Vision-Language Model for Better Intra- and Inter-UI Understanding

1 code implementation 23 Sep 2024 Qinzhuo Wu, Weikai Xu, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Shuo Shang

These fine-tuned VLMs may still ignore the relationships between UI pages, neglect the roles of elements in page transitions and lack inter-UI understanding.

Language Modelling

Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data

1 code implementation 19 Sep 2024 Jiaming Zhou, Abbas Ghaddar, Ge Zhang, Liheng Ma, Yaochen Hu, Soumyasundar Pal, Mark Coates, Bin Wang, Yingxue Zhang, Jianye Hao

Despite recent advances in training and prompting strategies for Large Language Models (LLMs), these models continue to face challenges with complex logical reasoning tasks that involve long reasoning chains.

Logical Reasoning Spatial Reasoning

An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Large-Scale Recommender Systems

no code implementations 18 Sep 2024 Peng Liu, Jiawei Zhu, Cong Xu, Ming Zhao, Bin Wang

However, limited by their modeling pattern, all the current RL-MTF methods can only utilize user features as the state to generate actions for each user, but are unable to make use of item features and other valuable features, which leads to suboptimal results.

Multi-Task Learning Recommendation Systems +1

Mixture of Diverse Size Experts

no code implementations 18 Sep 2024 Manxi Sun, Wei Liu, Jian Luan, Pengzhi Gao, Bin Wang

The Sparsely-Activated Mixture-of-Experts (MoE) has gained increasing popularity for scaling up large language models (LLMs) without exploding computational costs.

MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders

no code implementations 10 Sep 2024 Wenyu Zhang, Shuo Sun, Bin Wang, Xunlong Zou, Zhuohan Liu, Yingxu He, Geyu Lin, Nancy F. Chen, Ai Ti Aw

The rapid advancements in large language models (LLMs) have significantly enhanced natural language processing capabilities, facilitating the development of AudioLLMs that process and understand speech and audio inputs alongside text.

UMOD: A Novel and Effective Urban Metro Origin-Destination Flow Prediction Method

no code implementations 8 Sep 2024 Peng Xie, Minbo Ma, Bin Wang, Junbo Zhang, Tianrui Li

Accurate prediction of metro Origin-Destination (OD) flow is essential for the development of intelligent transportation systems and effective urban traffic management.

Relation

CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation

1 code implementation 5 Sep 2024 Bin Wang, Fan Wu, Linke Ouyang, Zhuangcheng Gu, Rui Zhang, Renqiu Xia, Bo Zhang, Conghui He

Such a spatially-aware and character-matching method offers a more accurate and equitable evaluation compared with previous BLEU and Edit Distance metrics that rely solely on text-based character matching.

ToolACE: Winning the Points of LLM Function Calling

no code implementations 2 Sep 2024 Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, Qun Liu, Enhong Chen

Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability.

Segmentation-guided Layer-wise Image Vectorization with Gradient Fills

1 code implementation 28 Aug 2024 Hengyu Zhou, Hui Zhang, Bin Wang

The widespread use of vector graphics creates a significant demand for vectorization methods.

Segmentation Vector Graphics

IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities

1 code implementation 23 Aug 2024 Bin Wang, Chunyu Xie, Dawei Leng, Yuhui Yin

Building on the strategy of freezing the language model, we conduct thorough structural exploration and introduce the Inner-Adaptor Architecture (IAA).

Language Modelling Large Language Model +1

Understanding Literary Texts by LLMs: A Case Study of Ancient Chinese Poetry

1 code implementation 22 Aug 2024 Cheng Zhao, Bin Wang, Zhen Wang

Additionally, evaluating literary works is often complex and hard to fully quantify, which directly hinders the further development of AI creation.

TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning

1 code implementation 20 Aug 2024 Bin Wang, Wenqian Wang

Therefore, in this paper, we propose a memory-efficient Temporal Difference Side Network (TDS-CLIP) to balance knowledge transferring and temporal modeling, avoiding backpropagation in frozen parameter models.

Action Recognition parameter-efficient fine-tuning +2

CSI-Free Position Optimization for Movable Antenna Communication Systems: A Black-Box Optimization Approach

no code implementations 9 Aug 2024 Xianlong Zeng, Jun Fang, Bin Wang, Boyu Ning, Hongbin Li

Movable antenna (MA) is a new technology which leverages local movement of antennas to improve channel qualities and enhance the communication performance.

Position

In2Core: Leveraging Influence Functions for Coreset Selection in Instruction Finetuning of Large Language Models

no code implementations 7 Aug 2024 Ayrton San Joaquin, Bin Wang, Zhengyuan Liu, Nicholas Asher, Brian Lim, Philippe Muller, Nancy F. Chen

By applying our algorithm to instruction fine-tuning data of LLMs, we can achieve similar performance with just 50% of the training data.

Walk Wisely on Graph: Knowledge Graph Reasoning with Dual Agents via Efficient Guidance-Exploration

no code implementations 3 Aug 2024 Zijian Wang, Bin Wang, Haifeng Jing, Huayu Li, Hongbo Dou

The high-level agent walks on the simplified knowledge graph to provide stage-wise hints for the low-level agent walking on the original knowledge graph.

Hierarchical Reinforcement Learning Knowledge Graphs

Image Re-Identification: Where Self-supervision Meets Vision-Language Learning

1 code implementation 30 Jul 2024 Bin Wang, Yuying Liang, Lei Cai, Huakun Huang, Huanqiang Zeng

Recently, large-scale vision-language pre-trained models like CLIP have shown impressive performance in image re-identification (ReID).

A New Dataset and Framework for Real-World Blurred Images Super-Resolution

1 code implementation 20 Jul 2024 Rui Qin, Ming Sun, Chao Zhou, Bin Wang

However, we find that the efficacy of recent methods obviously diminishes when employed on image data with blur, while image data with intentional blur constitute a substantial proportion of general data.

Disentanglement Image Super-Resolution

Learning Structurally Stabilized Representations for Multi-modal Lossless DNA Storage

no code implementations 17 Jul 2024 Ben Cao, Tiantian He, Xue Li, Bin Wang, Xiaohu Wu, Qiang Zhang, Yew-Soon Ong

By incorporating these novel strategies, the proposed RSRL can learn highly durable, dense, and lossless representations for the subsequent storage tasks into DNA sequences.

Representation Learning

Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

1 code implementation 8 Jul 2024 Bowen Shen, Zheng Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang

However, as the coarse-grained structured pruning poses large damage to the highly interconnected model, achieving a high compression ratio for scaled-up LLMs remains a challenge.

Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents

1 code implementation 1 Jul 2024 Shihan Deng, Weikai Xu, Hongda Sun, Wei Liu, Tao Tan, Jianfeng Liu, Ang Li, Jian Luan, Bin Wang, Rui Yan, Shuo Shang

With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction.

Benchmarking

A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

1 code implementation 26 Jun 2024 Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

Recently, intelligent analysis of lung nodules with the assistance of computer-aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis.

Computed Tomography (CT) Lung Cancer Diagnosis

Gaze-directed Vision GNN for Mitigating Shortcut Learning in Medical Image

1 code implementation 20 Jun 2024 Shaoxuan Wu, Xiao Zhang, Bin Wang, Zhuo Jin, Hansheng Li, Jun Feng

In this paper, we propose a novel gaze-directed Vision GNN (called GD-ViG) to leverage the visual patterns of radiologists from gaze as expert knowledge, directing the network toward disease-relevant regions, and thereby mitigating shortcut learning.

Medical Image Analysis

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

1 code implementation 19 Jun 2024 Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang

Automated audio captioning (AAC) is an audio-to-text task to describe audio contents in natural language.

Ranked #2 on Audio captioning on Clotho (using extra training data)

Audio captioning Decoder

Bridging Language Gaps in Audio-Text Retrieval

1 code implementation 11 Jun 2024 Zhiyong Yan, Heinrich Dinkel, Yongqing Wang, Jizhong Liu, Junbo Zhang, Yujun Wang, Bin Wang

The predominant focus of existing research on English descriptions poses a limitation on the applicability of such models, given the abundance of non-English content in real-world data.

AudioCaps Text Retrieval

OpenDataLab: Empowering General Artificial Intelligence with Open Datasets

no code implementations 4 Jun 2024 Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin

The dispersion of data sources and diversity of data formats often lead to inefficiencies in data retrieval and processing, significantly impeding the progress of AI research and applications.

Diversity

DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data

1 code implementation 28 May 2024 Bin Wang, Linke Ouyang, Fan Wu, Wenchang Ning, Xiao Han, Zhiyuan Zhao, Jiahui Peng, Yiying Jiang, Dahua Lin, Conghui He

In the era of artificial intelligence, the diversity of data modalities and annotation formats often renders data unusable directly, requiring understanding and format conversion before it can be used by researchers or developers with different needs.

Diversity

CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification

1 code implementation 26 May 2024 Qijie Wang, Guandu Liu, Bin Wang

Our method achieves outstanding zero-shot classification results across 19 benchmark datasets, improving accuracy by 2.19% over the previous leading method.

Zero-Shot Learning

Enhancing User Interest based on Stream Clustering and Memory Networks in Large-Scale Recommender Systems

no code implementations 21 May 2024 Peng Liu, Nian Wang, Cong Xu, Ming Zhao, Bin Wang, Yi Ren

Recommender Systems (RSs) provide personalized recommendation service based on user interest, which are widely used in various platforms.

Recommendation Systems UIE

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

4 code implementations 7 May 2024 DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, JianZhong Guo, Jiaqi Ni, Jiashi Li, Jin Chen, Jingyang Yuan, Junjie Qiu, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruizhe Pan, Runxin Xu, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Size Zheng, T. Wang, Tian Pei, Tian Yuan, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Liu, Xin Xie, Xingkai Yu, Xinnan Song, Xinyi Zhou, Xinyu Yang, Xuan Lu, Xuecheng Su, Y. Wu, Y. K. Li, Y. X. Wei, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Zheng, Yichao Zhang, Yiliang Xiong, Yilong Zhao, Ying He, Ying Tang, Yishi Piao, Yixin Dong, Yixuan Tan, Yiyuan Liu, Yongji Wang, Yongqiang Guo, Yuchen Zhu, Yuduan Wang, Yuheng Zou, Yukun Zha, Yunxian Ma, Yuting Yan, Yuxiang You, Yuxuan Liu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhewen Hao, Zhihong Shao, Zhiniu Wen, Zhipeng Xu, Zhongyu Zhang, Zhuoshu Li, Zihan Wang, Zihui Gu, Zilin Li, Ziwei Xie

MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.

Language Modelling Reinforcement Learning (RL)

CRAFT: Extracting and Tuning Cultural Instructions from the Wild

2 code implementations 6 May 2024 Bin Wang, Geyu Lin, Zhengyuan Liu, Chengwei Wei, Nancy F. Chen

Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications.

PhyRecon: Physically Plausible Neural Scene Reconstruction

no code implementations 25 Apr 2024 Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Puhao Li, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

In this paper, we introduce PHYRECON, the first approach to leverage both differentiable rendering and differentiable physics simulation to learn implicit surface representations.

3D Reconstruction Multi-View 3D Reconstruction

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

1 code implementation 23 Apr 2024 Bin Wang, Zhuangcheng Gu, Guang Liang, Chao Xu, Bo Zhang, Botian Shi, Conghui He

To better utilize the UniMER dataset, the paper proposes a Universal Mathematical Expression Recognition Network (UniMERNet), tailored to the characteristics of formula recognition.

Decoder Diversity +1

An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems

no code implementations 19 Apr 2024 Peng Liu, Cong Xu, Ming Zhao, Jiawei Zhu, Bin Wang, Yi Ren

Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is used for MTF in the industry.

Efficient Exploration Multi-Task Learning +2

CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment

1 code implementation 18 Apr 2024 Geyu Lin, Bin Wang, Zhengyuan Liu, Nancy F. Chen

This performance discrepancy mainly stems from the imbalanced distribution of training data across languages during pre-training and instruction tuning stages.

WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion

1 code implementation 15 Apr 2024 Bin Wang, Fei Deng, Peifan Jiang, Shuang Wang, Xiao Han, Zhixuan Zhang

Low-dose computed tomography (LDCT) has become the technology of choice for diagnostic medical imaging, given its lower radiation dose compared to standard CT, despite increasing image noise and potentially affecting diagnostic accuracy.

Decoder Image Denoising +1

Resilience of Large Language Models for Noisy Instructions

no code implementations 15 Apr 2024 Bin Wang, Chengwei Wei, Zhengyuan Liu, Geyu Lin, Nancy F. Chen

In the rapidly advancing domain of natural language processing (NLP), large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks.

Automatic Speech Recognition Optical Character Recognition +3

InternLM2 Technical Report

3 code implementations 26 Mar 2024 Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, Jia Yu, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).

4k Long-Context Understanding

EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network

1 code implementation 20 Mar 2024 Bin Wang, Fei Deng, Peifan Jiang

Electroencephalogram (EEG) signals play a pivotal role in clinical medicine, brain research, and neurological disease studies.

Denoising EEG +1

Distribution-Aware Data Expansion with Diffusion Models

1 code implementation 11 Mar 2024 Haowei Zhu, Ling Yang, Jun-Hai Yong, Hongzhi Yin, Jiawei Jiang, Meng Xiao, Wentao Zhang, Bin Wang

In this paper, we propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.

Image Generation Informativeness

ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval

no code implementations 11 Mar 2024 Yuanhang Zheng, Peng Li, Wei Liu, Yang Liu, Jian Luan, Bin Wang

Specifically, our proposed ToolRerank includes Adaptive Truncation, which truncates the retrieval results related to seen and unseen tools at different positions, and Hierarchy-Aware Reranking, which makes retrieval results more concentrated for single-tool queries and more diverse for multi-tool queries.

Retrieval

Communication Efficient ConFederated Learning: An Event-Triggered SAGA Approach

no code implementations 28 Feb 2024 Bin Wang, Jun Fang, Hongbin Li, Yonina C. Eldar

Due to the potentially massive number of users involved, it is crucial to reduce the communication overhead of the CFL system.

Federated Learning

A Comprehensive Evaluation of Quantization Strategies for Large Language Models

1 code implementation 26 Feb 2024 Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong

Our experimental results indicate that LLMs with 4-bit quantization can retain performance comparable to their non-quantized counterparts, and perplexity can serve as a proxy metric for quantized LLMs on most benchmarks.

Language Modelling Quantization

DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation

no code implementations 26 Feb 2024 Shang Wu, Bin Wang

To address the above problems, a dual-residual spatial interaction network (DRSI-Net) for MPPE with high accuracy and low complexity is proposed herein.

Multi-Person Pose Estimation

Active Support of Inverters for Improving Short-Term Voltage Security in 100% IBRs-Penetrated Power Systems

no code implementations 2 Feb 2024 Yinhong Lin, Bin Wang, Qinglai Guo, Haotian Zhao, Hongbin Sun

Due to the energy crisis and environmental pollution, the installed capacity of inverter-based resources (IBRs) in power grids is rapidly increasing, and grid-following control (GFL) is the most prevalent at present.

In-Context Learning for Few-Shot Nested Named Entity Recognition

no code implementations 2 Feb 2024 Meishan Zhang, Bin Wang, Hao Fei, Min Zhang

In nested named entity recognition (NER), entities are nested within each other, thus requiring more data annotations to address.

Contrastive Learning In-Context Learning +7

Impact of Flexible and Bidirectional Charging in Medium- and Heavy-Duty Trucks on California's Decarbonization Pathway

no code implementations 18 Jan 2024 Osten Anderson, Wanshi Hong, Bin Wang, Nanpeng Yu

In particular, we examine the potential cost savings of electrical generation infrastructure by enabling flexible charging and bidirectional charging for these trucks.

Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts

no code implementations CVPR 2024 Fei Ni, Jianye Hao, Shiguang Wu, Longxin Kou, Jiashun Liu, Yan Zheng, Bin Wang, Yuzheng Zhuang

Inspired by the great success of diffusion models in image generation tasks, we propose a novel hierarchical framework named CoTDiffusion that incorporates a diffusion model as a high-level planner to convert general, multi-modal prompts into coherent visual subgoal plans, which further guide the low-level policy model before action execution.

Image Generation Instruction Following +2

Parrot Captions Teach CLIP to Spot Text

1 code implementation 21 Dec 2023 Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou

Despite CLIP being the foundation model in numerous vision-language applications, it suffers from a severe text spotting bias.

Representation Learning text similarity +1

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

2 code implementations CVPR 2024 Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu

Based on the observation, OPERA introduces a penalty term on the model logits during the beam-search decoding to mitigate the over-trust issue, along with a rollback strategy that retrospects the presence of summary tokens in the previously generated tokens and re-allocates the token selection if necessary.

Hallucination

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

1 code implementation 28 Nov 2023 Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He

Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from associated images.

Hallucination

DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy

1 code implementation 28 Oct 2023 Hongda Sun, Weikai Xu, Wei Liu, Jian Luan, Bin Wang, Shuo Shang, Ji-Rong Wen, Rui Yan

Recent advances in large language models (LLMs) have revolutionized the landscape of reasoning tasks.

Logical Reasoning

Blind Image Super-resolution with Rich Texture-Aware Codebooks

no code implementations 26 Oct 2023 Rui Qin, Ming Sun, Fangyuan Zhang, Xing Wen, Bin Wang

However, we find that a codebook based on HR reconstruction may not effectively capture the complex correlations between low-resolution (LR) and HR images.

Blind Super-Resolution Diversity +1

F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns

no code implementations 23 Oct 2023 Yaguan Qian, Chenyu Zhao, Zhaoquan Gu, Bin Wang, Shouling Ji, Wei Wang, Boyang Zhou, Pan Zhou

We propose a Feature-Focusing Adversarial Training (F$^2$AT), which differs from previous work in that it enforces the model to focus on the core features from natural patterns and reduce the impact of spurious features from perturbed patterns.

Adversarial Robustness Disentanglement +2

EMIT-Diff: Enhancing Medical Image Segmentation via Text-Guided Diffusion Model

no code implementations 19 Oct 2023 Zheyuan Zhang, Lanhong Yao, Bin Wang, Debesh Jha, Elif Keles, Alpay Medetalibeyoglu, Ulas Bagci

We leverage recent diffusion probabilistic models to generate realistic and diverse synthetic medical image data that preserve the essential characteristics of the original medical images by incorporating edge information of objects to guide the synthesis process.

Data Augmentation Image Generation +4

Instructive Dialogue Summarization with Query Aggregations

1 code implementation 17 Oct 2023 Bin Wang, Zhengyuan Liu, Nancy F. Chen

With the advancement of instruction-finetuned language models, we introduce instruction-tuning to dialogues to expand the capability set of dialogue summarization models.

Machine Reading Comprehension Text Summarization

Tree-Planner: Efficient Close-loop Task Planning with Large Language Models

no code implementations 12 Oct 2023 Mengkang Hu, Yao Mu, Xinmiao Yu, Mingyu Ding, Shiguang Wu, Wenqi Shao, Qiguang Chen, Bin Wang, Yu Qiao, Ping Luo

This paper studies close-loop task planning, which refers to the process of generating a sequence of skills (a plan) to accomplish a specific goal while adapting the plan based on real-time observations.

Decision Making

Knowledge Graph Embedding: An Overview

no code implementations 21 Sep 2023 Xiou Ge, Yun-Cheng Wang, Bin Wang, C. -C. Jay Kuo

We will also discuss an emerging approach for KG completion which leverages pre-trained language models (PLMs) and textual descriptions of entities and relations and offer insights into the integration of KGE embedding methods with PLMs for KG completion.

Knowledge Graph Embedding Link Prediction

Deep Mutual Learning across Task Towers for Effective Multi-Task Recommender Learning

no code implementations 19 Sep 2023 Yi Ren, Ying Du, Bin Wang, Shenzheng Zhang

Recommender systems usually leverage multi-task learning methods to simultaneously optimize several objectives because of the multi-faceted user behavior data.

Multi-Task Learning Recommendation Systems

What Makes Good Open-Vocabulary Detector: A Disassembling Perspective

no code implementations 1 Sep 2023 Jincheng Li, Chunyu Xie, Xiaoyu Wu, Bin Wang, Dawei Leng

A two-stage object detector includes a visual backbone, a region proposal network (RPN), and a region of interest (RoI) head.

Object object-detection +2

AsyncET: Asynchronous Learning for Knowledge Graph Entity Typing with Auxiliary Relations

no code implementations 30 Aug 2023 Yun-Cheng Wang, Xiou Ge, Bin Wang, C. -C. Jay Kuo

Previously, KG embedding (KGE) methods tried to solve the KGET task by introducing an auxiliary relation, 'hasType', to model the relationship between entities and their types.

Entity Typing Knowledge Graphs +2

MLLM-DataEngine: An Iterative Refinement Approach for MLLM

1 code implementation 25 Aug 2023 Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He

Despite the great advance of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes current MLLMs hard to further improve their capability under the guidance of evaluation results with a relatively low human cost.

Benchmarking

Deep Reinforcement Learning-driven Cross-Community Energy Interaction Optimal Scheduling

no code implementations 24 Aug 2023 Yang Li, Wenjie Ma, Fanjin Bu, Zhen Yang, Bin Wang, Meng Han

In order to coordinate energy interactions among various communities and energy conversions among multi-energy subsystems within the multi-community integrated energy system under uncertain conditions, and achieve overall optimization and scheduling of the comprehensive energy system, this paper proposes a comprehensive scheduling model that utilizes a multi-agent deep reinforcement learning algorithm to learn load characteristics of different communities and make decisions based on this knowledge.

Deep Reinforcement Learning reinforcement-learning +1

VIGC: Visual Instruction Generation and Correction

2 code implementations 24 Aug 2023 Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He

A practical solution to this problem would be to utilize the available multimodal large language models (MLLMs) to generate instruction data for vision-language tasks.

Hallucination Image Captioning +1

Few-Shot Physically-Aware Articulated Mesh Generation via Hierarchical Deformation

1 code implementation ICCV 2023 Xueyi Liu, Bin Wang, He Wang, Li Yi

By observing an articulated object dataset containing only a few examples, we wish to learn a model that can generate diverse meshes with high visual fidelity and physical validity.

Diversity Philosophy

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models

1 code implementation 21 Aug 2023 Conghui He, Zhenjiang Jin, Chao Xu, Jiantao Qiu, Bin Wang, Wei Li, Hang Yan, Jiaqi Wang, Dahua Lin

The rise in popularity of ChatGPT and GPT-4 has significantly accelerated the development of large models, leading to the creation of numerous impressive large language models (LLMs) and multimodal large language models (MLLMs).

Efficient Multi-View Inverse Rendering Using a Hybrid Differentiable Rendering Method

no code implementations 19 Aug 2023 Xiangyang Zhu, Yiling Pan, Bailin Deng, Bin Wang

In this paper, we introduce a novel hybrid differentiable rendering method to efficiently reconstruct the 3D geometry and reflectance of a scene from multi-view images captured by conventional hand-held cameras.

3D geometry Inverse Rendering

Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution

1 code implementation ICCV 2023 Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang

To enlarge RF with contained LUT sizes, we propose a novel Reconstructed Convolution (RC) module, which decouples channel-wise and spatial calculation.

Image Super-Resolution

Impact of UAVs Equipped with ADS-B on the Civil Aviation Monitoring System

no code implementations 4 Jul 2023 Yiyang Liao, Lei Zhang, Ziye Jia, Chao Dong, Yifan Zhang, Qihui Wu, Huiling Hu, Bin Wang

However, due to the limited frequency of ADS-B technique, UAVs equipped with ADS-B devices result in the loss of packets to both UAVs and civil aviation.

Blocking Position

CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?

no code implementations 29 Jun 2023 Tianwen Wei, Jian Luan, Wei Liu, Shuang Dong, Bin Wang

We present the Chinese Elementary School Math Word Problems (CMATH) dataset, comprising 1.7k elementary school-level math word problems with detailed annotations, sourced from actual Chinese workbooks and exams.

Language Modelling Math +1

Low-Confidence Samples Mining for Semi-supervised Object Detection

no code implementations 28 Jun 2023 Guandu Liu, Fangyuan Zhang, Tianxiang Pan, Bin Wang

Reliable pseudo-labels from unlabeled data play a key role in semi-supervised object detection (SSOD).

Object object-detection +2

ChiPFormer: Transferable Chip Placement via Offline Decision Transformer

no code implementations 26 Jun 2023 Yao Lai, Jinxin Liu, Zhentao Tang, Bin Wang, Jianye Hao, Ping Luo

To resolve these challenges, we cast the chip placement as an offline RL formulation and present ChiPFormer that enables learning a transferable placement policy from fixed offline data.

Offline RL Reinforcement Learning (RL)

Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization

no code implementations NeurIPS 2023 Jinxin Liu, Hongyin Zhang, Zifeng Zhuang, Yachen Kang, Donglin Wang, Bin Wang

Naturally, such a paradigm raises three core questions that are not fully answered by prior non-iterative offline RL counterparts like reward-conditioned policy: (q1) What information should we transfer from the inner-level to the outer-level?

Offline RL Test-time Adaptation

Short-Term Voltage Security Constrained UC to Prevent Trip Faults in High Wind Power Penetrated Power Systems

no code implementations 19 Jun 2023 Yinhong Lin, Bin Wang, Qinglai Guo, Haotian Zhao, Hongbin Sun

Due to the time delay in WTs' controllers, it is difficult for WTs alone to meet the reactive power demands in different stages of the transient process.

Scheduling

UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning

no code implementations 18 Jun 2023 Kang Zhao, Wei Liu, Jian Luan, Minglei Gao, Li Qian, Hanlin Teng, Bin Wang

In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC), which increases the connection between different stages by learning relevance representation.

Conversation Summarization Decoder +2

Universal Information Extraction with Meta-Pretrained Self-Retrieval

no code implementations 18 Jun 2023 Xin Cong, Bowen Yu, Mengcheng Fang, Tingwen Liu, Haiyang Yu, Zhongkai Hu, Fei Huang, Yongbin Li, Bin Wang

Inspired by the fact that a large amount of knowledge is stored in pretrained language models (PLMs) and can be retrieved explicitly, in this paper, we propose MetaRetriever to retrieve task-specific knowledge from PLMs to enhance universal IE.

Retrieval

Exploring Multi-Timestep Multi-Stage Diffusion Features for Hyperspectral Image Classification

1 code implementation 15 Jun 2023 Jingyi Zhou, Jiamu Sheng, Jiayuan Fan, Peng Ye, Tong He, Bin Wang, Tao Chen

To address this issue, we propose a novel diffusion-based feature learning framework that explores Multi-Timestep Multi-Stage Diffusion features for HSI classification for the first time, called MTMSD.

Classification Hyperspectral Image Classification

MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL

no code implementations 31 May 2023 Fei Ni, Jianye Hao, Yao Mu, Yifu Yuan, Yan Zheng, Bin Wang, Zhixuan Liang

Recently, diffusion model shines as a promising backbone for the sequence modeling paradigm in offline reinforcement learning (RL).

Reinforcement Learning (RL)