Search Results for author: Chao Zhang

Found 428 papers, 161 papers with code

AcTune: Uncertainty-Based Active Self-Training for Active Fine-Tuning of Pretrained Language Models

1 code implementation NAACL 2022 Yue Yu, Lingkai Kong, Jieyu Zhang, Rongzhi Zhang, Chao Zhang

We develop AcTune, a new framework that improves the label efficiency of active PLM fine-tuning by unleashing the power of unlabeled data via self-training.

Active Learning text-classification +1

Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning

1 code implementation ACL 2022 Rongzhi Zhang, Yue Yu, Pranav Shetty, Le Song, Chao Zhang

Weakly-supervised learning (WSL) has shown promising results in addressing label scarcity on many NLP tasks, but manually designing a comprehensive, high-quality labeling rule set is tedious and difficult.

Weakly-supervised Learning

Transferring SLU Models in Novel Domains

no code implementations ICLR 2019 Yaohua Tang, Kaixiang Mo, Qian Xu, Chao Zhang, Qiang Yang

When building models for novel natural language domains, a major challenge is the lack of data in the new domains, no matter whether the data is annotated or not.

Intent Recognition Meta-Learning +4

GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks

no code implementations 17 Apr 2025 Hao Xu, Xiangru Jian, Xinjian Zhao, Wei Pang, Chao Zhang, Suyuchen Wang, Qixin Zhang, Joao Monteiro, Qiuzhuang Sun, Tianshu Yu

In this paper, we presented GraphOmni, a comprehensive benchmark framework for systematically evaluating the graph reasoning capabilities of LLMs.

QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions

no code implementations 26 Mar 2025 Siyin Wang, Wenyi Yu, Xianzhao Chen, Xiaohai Tian, Jun Zhang, Lu Lu, Yu Tsao, Junichi Yamagishi, Yuxuan Wang, Chao Zhang

To bridge this gap, we introduce QualiSpeech, a comprehensive low-level speech quality assessment dataset encompassing 11 key aspects and detailed natural language comments that include reasoning and contextual insights.

IAP: Improving Continual Learning of Vision-Language Models via Instance-Aware Prompting

1 code implementation 26 Mar 2025 Hao Fu, Hanbin Zhao, Jiahua Dong, Chao Zhang, Hui Qian

Recent pre-trained vision-language models (PT-VLMs) often face a Multi-Domain Class-Incremental Learning (MCIL) scenario in practice, where several classes and domains of multi-modal tasks arrive incrementally.

class-incremental learning Class Incremental Learning +2

ACVUBench: Audio-Centric Video Understanding Benchmark

1 code implementation 25 Mar 2025 Yudong Yang, Jimin Zhuang, Guangzhi Sun, Changli Tang, Yixuan Li, Peihan Li, Yifan Jiang, Wei Li, Zejun Ma, Chao Zhang

Audio often serves as an auxiliary modality in video understanding tasks of audio-visual large language models (LLMs), merely assisting in the comprehension of visual information.

Video Understanding

Language Model Uncertainty Quantification with Attention Chain

1 code implementation 24 Mar 2025 Yinghao Li, Rushi Qiang, Lama Moukheiber, Chao Zhang

To address this, we propose UQAC, an efficient method that narrows the reasoning space to a tractable size for marginalization.

Computational Efficiency Language Modeling +4

Incomplete Multi-view Clustering via Diffusion Contrastive Generation

no code implementations 12 Mar 2025 Yuanyang Zhang, Yijie Lin, Weiqing Yan, Li Yao, Xinhang Wan, Guangyuan Li, Chao Zhang, Guanzhou Ke, Jie Xu

By performing contrastive learning on a limited set of paired multi-view samples, DCG can align the generated views with the real views, facilitating accurate recovery of views across arbitrary missing view scenarios.

Clustering Contrastive Learning +3

Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences

no code implementations 6 Mar 2025 Adnan Shahid, Adrian Kliks, Ahmed Al-Tahmeesschi, Ahmed Elbakary, Alexandros Nikou, Ali Maatouk, Ali Mokh, Amirreza Kazemi, Antonio De Domenico, Athanasios Karapantelakis, Bo Cheng, Bo Yang, Bohao Wang, Carlo Fischione, Chao Zhang, Chaouki Ben Issaid, Chau Yuen, Chenghui Peng, Chongwen Huang, Christina Chaccour, Christo Kurisummoottil Thomas, Dheeraj Sharma, Dimitris Kalogiros, Dusit Niyato, Eli de Poorter, Elissa Mhanna, Emilio Calvanese Strinati, Faouzi Bader, Fathi Abdeldayem, Fei Wang, Fenghao Zhu, Gianluca Fontanesi, Giovanni Geraci, Haibo Zhou, Hakimeh Purmehdi, Hamed Ahmadi, Hang Zou, Hongyang Du, Hoon Lee, Howard H. Yang, Iacopo Poli, Igor Carron, Ilias Chatzistefanidis, Inkyu Lee, Ioannis Pitsiorlas, Jaron Fontaine, Jiajun Wu, Jie Zeng, Jinan Li, Jinane Karam, Johny Gemayel, Juan Deng, Julien Frison, Kaibin Huang, Kehai Qiu, Keith Ball, Kezhi Wang, Kun Guo, Leandros Tassiulas, Lecorve Gwenole, Liexiang Yue, Lina Bariah, Louis Powell, Marcin Dryjanski, Maria Amparo Canaveras Galdon, Marios Kountouris, Maryam Hafeez, Maxime Elkael, Mehdi Bennis, Mehdi Boudjelli, Meiling Dai, Merouane Debbah, Michele Polese, Mohamad Assaad, Mohamed Benzaghta, Mohammad Al Refai, Moussab Djerrab, Mubeen Syed, Muhammad Amir, Na Yan, Najla Alkaabi, Nan Li, Nassim Sehad, Navid Nikaein, Omar Hashash, Pawel Sroka, Qianqian Yang, Qiyang Zhao, Rasoul Nikbakht Silab, Rex Ying, Roberto Morabito, Rongpeng Li, Ryad Madi, Salah Eddine El Ayoubi, Salvatore D'Oro, Samson Lasaulce, Serveh Shalmashi, Sige Liu, Sihem Cherrared, Swarna Bindu Chetty, Swastika Dutta, Syed A. R. Zaidi, Tianjiao Chen, Timothy Murphy, Tommaso Melodia, Tony Q. S. Quek, Vishnu Ram, Walid Saad, Wassim Hamidouche, Weilong Chen, Xiaoou Liu, Xiaoxue Yu, Xijun Wang, Xingyu Shang, Xinquan Wang, Xuelin Cao, Yang Su, Yanping Liang, Yansha Deng, Yifan Yang, Yingping Cui, Yu Sun, Yuxuan Chen, Yvan Pointurier, Zeinab Nehme, Zeinab Nezami, Zhaohui Yang, Zhaoyang Zhang, Zhe Liu, Zhenyu Yang, Zhu Han, Zhuang Zhou, Zihan Chen, Zirui Chen, Zitao Shuai

This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems.

Management

Low-Rank and Sparse Model Merging for Multi-Lingual Speech Recognition and Translation

no code implementations 24 Feb 2025 Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Traditional multi-task training approaches aim to address this by jointly optimizing multiple speech recognition and translation tasks across various languages.

Automatic Speech Recognition Diversity +3

Class-Dependent Perturbation Effects in Evaluating Time Series Attributions

1 code implementation 24 Feb 2025 Gregor Baer, Isel Grau, Chao Zhang, Pieter Van Gorp

As machine learning models become increasingly prevalent in time series applications, Explainable Artificial Intelligence (XAI) methods are essential for understanding their predictions.

Explainable artificial intelligence Explainable Artificial Intelligence (XAI) +2

A Unified Modeling Framework for Automated Penetration Testing

no code implementations 17 Feb 2025 Yunfei Wang, Shixuan Liu, Wenhao Wang, Changling Zhou, Chao Zhang, Jiandong Jin, Cheng Zhu

The integration of artificial intelligence into automated penetration testing (AutoPT) has highlighted the necessity of simulation modeling for the training of intelligent agents, due to its cost-efficiency and swift feedback capabilities.

video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

no code implementations 17 Feb 2025 Guangzhi Sun, Yudong Yang, Jimin Zhuang, Changli Tang, Yixuan Li, Wei Li, Zejun Ma, Chao Zhang

video-SALMONN-o1 achieves 3-8% accuracy improvements over the LLaVA-OneVision baseline across different video reasoning benchmarks.

Language Modeling Language Modelling +2

Streamlining the Collaborative Chain of Models into A Single Forward Pass in Generation-Based Tasks

1 code implementation 16 Feb 2025 Yuanjie Lyu, Chao Zhang, Yuhao Chen, Yong Chen, Tong Xu

In Retrieval-Augmented Generation (RAG) and agent-based frameworks, the "Chain of Models" approach is widely used, where multiple specialized models work sequentially on distinct sub-tasks.

RAG

Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training

no code implementations 10 Feb 2025 Yuchen Zhuang, Jingfeng Yang, Haoming Jiang, Xin Liu, Kewei Cheng, Sanket Lokegaonkar, Yifan Gao, Qing Ping, Tianyi Liu, Binxuan Huang, Zheng Li, Zhengyang Wang, Pei Chen, Ruijie Wang, Rongzhi Zhang, Nasser Zalmout, Priyanka Nigam, Bing Yin, Chao Zhang

Due to the scarcity of agent-oriented pre-training data, LLM-based autonomous agents typically rely on complex prompting or extensive fine-tuning, which often fails to introduce new capabilities while preserving strong generalizability.

MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation

1 code implementation 9 Feb 2025 Zhifei Yang, Keyang Lu, Chao Zhang, Jiaxing Qi, Hanqi Jiang, Ruifei Ma, Shenglin Yin, Yifan Xu, Mingzhe Xing, Zhen Xiao, Jieyi Long, Guangyao Zhai

Controllable 3D scene generation has extensive applications in virtual reality and interior design, where the generated scenes should exhibit high levels of realism and controllability in terms of geometry.

Scene Generation

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

no code implementations 27 Jan 2025 Chen Chen, Yuchen Hu, Siyin Wang, Helin Wang, Zhehuai Chen, Chao Zhang, Chao-Han Huck Yang, Eng Siong Chng

Recent advances have enabled large language models (LLMs) to incorporate auditory systems for handling various speech-related tasks.

Descriptive

CE-SDWV: Effective and Efficient Concept Erasure for Text-to-Image Diffusion Models via a Semantic-Driven Word Vocabulary

no code implementations 26 Jan 2025 Jiahang Tu, Qian Feng, Chufan Chen, Jiahua Dong, Hanbin Zhao, Chao Zhang, Hui Qian

Large-scale text-to-image (T2I) diffusion models have achieved remarkable generative performance across various concepts.

The 1st SpeechWellness Challenge: Detecting Suicidal Risk Among Adolescents

no code implementations 11 Jan 2025 Wen Wu, Ziyun Cui, Chang Lei, Yinan Duan, Diyang Qu, Ji Wu, BoWen Zhou, Runsen Chen, Chao Zhang

The 1st SpeechWellness Challenge (SW1) aims to advance methods for detecting suicidal risk in adolescents using speech analysis techniques.

Detecting Defective Wafers Via Modular Networks

no code implementations 6 Jan 2025 Yifeng Zhang, Bryan Baker, Shi Chen, Chao Zhang, Yu Huang, Qi Zhao, Sthitie Bom

The growing availability of sensors within semiconductor manufacturing processes makes it feasible to detect defective wafers with data-driven models.

Fault Detection

FedRSClip: Federated Learning for Remote Sensing Scene Classification Using Vision-Language Models

no code implementations 5 Jan 2025 Hui Lin, Chao Zhang, Danfeng Hong, Kexin Dong, Congcong Wen

In this paper, we propose FedRSCLIP, the first federated learning framework designed for remote sensing image classification based on a VLM, specifically CLIP.

Federated Learning Image Classification +4

Geometric Deep Learning for Realized Covariance Matrix Forecasting

1 code implementation 12 Dec 2024 Andrea Bucci, Michele Palma, Chao Zhang

Traditional methods employed in matrix volatility forecasting often overlook the inherent Riemannian manifold structure of symmetric positive definite matrices, treating them as elements of Euclidean space, which can lead to suboptimal predictive performance.

Deep Learning

LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

1 code implementation 12 Dec 2024 Chunyu Li, Chao Zhang, Weikai Xu, Jinghui Xie, Weiguo Feng, Bingyue Peng, Weiwei Xing

Since we did not change the overall training framework of SyncNet, our experience can also be applied to other lip sync and audio-driven portrait animation methods that utilize SyncNet.

Portrait Animation

Automated Dynamic Image Analysis for Particle Size and Shape Classification in Three Dimensions

no code implementations 6 Dec 2024 Sadegh Nadimi, Vasileios Angelidakis, Sadaf Maramizonouz, Chao Zhang

Current state-of-the-art instruments for dynamic image analysis are largely limited to two-dimensional imaging.

Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment

no code implementations 4 Dec 2024 Feng He, Chao Zhang, Zhixue Zhao

Given a "source" prompt (e.g., "rose") that elicits an implicit assumption (e.g., rose is red) and a "destination" prompt that specifies the desired attribute (e.g., "blue rose"), Embedit fine-tunes only the word token embedding (WTE) of the target object ("rose") to optimize the last hidden state of text encoder in Stable Diffusion, a SOTA text-to-image model.

Attribute Text-to-Image Generation

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation

no code implementations 27 Nov 2024 Wenyi Yu, Siyin Wang, Xiaoyu Yang, Xianzhao Chen, Xiaohai Tian, Jun Zhang, Guangzhi Sun, Lu Lu, Yuxuan Wang, Chao Zhang

Unlike traditional modularised conversational AI systems, which separate speech recognition, understanding, and text-to-speech generation into distinct components, multimodal LLMs operate as single end-to-end models.

Question Answering Speech Enhancement +3

Self-Generated Critiques Boost Reward Modeling for Language Models

no code implementations 25 Nov 2024 Yue Yu, Zhengxing Chen, Aston Zhang, Liang Tan, Chenguang Zhu, Richard Yuanzhe Pang, Yundi Qian, Xuewei Wang, Suchin Gururangan, Chao Zhang, Melanie Kambadur, Dhruv Mahajan, Rui Hou

Reward modeling is crucial for aligning large language models (LLMs) with human preferences, especially in reinforcement learning from human feedback (RLHF).

Adversarial Attacks Using Differentiable Rendering: A Survey

no code implementations 14 Nov 2024 Matthew Hull, Chao Zhang, Zsolt Kira, Duen Horng Chau

Differentiable rendering methods have emerged as a promising means for generating photo-realistic and physically plausible adversarial attacks by manipulating 3D objects and scenes that can deceive deep neural networks (DNNs).

Depth Estimation Image Classification +5

Fast Disentangled Slim Tensor Learning for Multi-view Clustering

1 code implementation 12 Nov 2024 Deng Xu, Chao Zhang, Zechao Li, Chunlin Chen, Huaxiong Li

To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.

Clustering Disentanglement

Matryoshka: Learning to Drive Black-Box LLMs with LLMs

no code implementations 28 Oct 2024 Changhao Li, Yuchen Zhuang, Rushi Qiang, Haotian Sun, Hanjun Dai, Chao Zhang, Bo Dai

To address this challenge, we introduce Matryoshka, a lightweight white-box LLM controller that guides a large-scale black-box LLM generator by decomposing complex tasks into a series of intermediate outputs.

In-Context Learning

BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models

no code implementations 9 Oct 2024 Fangyikang Wang, Hubery Yin, Yuejiang Dong, Huminhao Zhu, Chao Zhang, Hanbin Zhao, Hui Qian, Chen Li

In this paper, we introduce a generic formulation, Bidirectional Explicit Linear Multi-step (BELM) samplers, of the exact inversion samplers, which includes all previously proposed heuristic exact inversion samplers as special cases.

Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization

no code implementations 9 Oct 2024 Changli Tang, Yixuan Li, Yudong Yang, Jimin Zhuang, Guangzhi Sun, Wei Li, Zujun Ma, Chao Zhang

To address potential catastrophic forgetting of non-captioning abilities due to mrDPO, we propose rebirth tuning, which finetunes the pre-DPO LLM by using the captions generated by the mrDPO-trained model as supervised labels.

Audio captioning Large Language Model +4

Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer

no code implementations 7 Oct 2024 Siyuan Hou, Shansong Liu, Ruibin Yuan, Wei Xue, Ying Shan, Mangsuo Zhao, Chao Zhang

For more precise and fine-grained melody control, we introduce a novel top-$k$ constant-Q Transform representation as the melody prompt, reducing ambiguity compared to previous representations (e.g., chroma), particularly for music with multiple tracks or a wide range of pitch values.

Music Generation Music Style Transfer +2

LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy

no code implementations 4 Oct 2024 Rongzhi Zhang, Kuang Wang, Liyuan Liu, Shuohang Wang, Hao Cheng, Chao Zhang, Yelong Shen

Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages, which require extensive parameter tuning and are thus unsuitable for pre-trained LLMs; (2) KV cache compression at test time, primarily through token eviction policies, which often overlook inter-layer dependencies and can be task-specific.

Low-rank compression

SWIM: Short-Window CNN Integrated with Mamba for EEG-Based Auditory Spatial Attention Decoding

1 code implementation 30 Sep 2024 Ziyang Zhang, Andrew Thwaites, Alexandra Woolgar, Brian Moore, Chao Zhang

By jointly training SW$_\text{CNN}$ and Mamba, the proposed SWIM structure uses both short-term and long-term information and achieves an accuracy of 86.2%, which reduces the classification errors by a relative 31.0% compared to the previous state-of-the-art result.

Data Augmentation EEG +1

LW2G: Learning Whether to Grow for Prompt-based Continual Learning

1 code implementation 27 Sep 2024 Qian Feng, Dawei Zhou, Hanbin Zhao, Chao Zhang, Hui Qian

To promote cross-task knowledge facilitation and form an effective and efficient prompt sets pool, we propose a plug-in module in the former stage to Learn Whether to Grow (LW2G) based on the disparities between tasks.

Continual Learning Prompt Learning +1

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

1 code implementation 25 Sep 2024 Siyin Wang, Wenyi Yu, Yudong Yang, Changli Tang, Yixuan Li, Jimin Zhuang, Xianzhao Chen, Xiaohai Tian, Jun Zhang, Guangzhi Sun, Lu Lu, Yuxuan Wang, Chao Zhang

The results demonstrate that auditory LLMs achieve competitive performance compared to state-of-the-art task-specific small models in predicting MOS and SIM, while also delivering promising results in A/B testing and natural language descriptions.

Text to Speech

MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events

no code implementations 25 Sep 2024 Xiaoyu Yang, Qiujia Li, Chao Zhang, Phil Woodland

In this work, MT2KD, a novel two-stage multi-task learning framework, is proposed to build a general-purpose speech and audio encoder that jointly performs three fundamental tasks: automatic speech recognition (ASR), audio tagging (AT) and speaker verification (SV).

Audio Tagging Automatic Speech Recognition +5

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement

no code implementations 15 Sep 2024 Yudong Yang, Zhan Liu, Wenyi Yu, Guangzhi Sun, Qiuqiang Kong, Chao Zhang

Diffusion-based generative models have recently achieved remarkable results in speech and vocal enhancement due to their ability to model complex speech data distributions.

RNR: Teaching Large Language Models to Follow Roles and Rules

no code implementations 10 Sep 2024 Kuan Wang, Alexander Bukharin, Haoming Jiang, Qingyu Yin, Zhengyang Wang, Tuo Zhao, Jingbo Shang, Chao Zhang, Bing Yin, Xian Li, Jianshu Chen, Shiyang Li

However, existing models trained on open-source IFT datasets only have the ability to follow instructions from users, and often fail to follow complex role and rules specified by developers, a.k.a.

Instruction Following

TextToucher: Fine-Grained Text-to-Touch Generation

1 code implementation 9 Sep 2024 Jiahang Tu, Hao Fu, Fengyu Yang, Hanbin Zhao, Chao Zhang, Hui Qian

We model these granularities of information through text descriptions and propose a fine-grained Text-to-Touch generation method (TextToucher) to generate high-quality tactile samples.

Language Modelling Large Language Model +1

SA-MLP: A Low-Power Multiplication-Free Deep Network for 3D Point Cloud Classification in Resource-Constrained Environments

1 code implementation 3 Sep 2024 Qiang Zheng, Chao Zhang, Jian Sun

Point cloud classification plays a crucial role in the processing and analysis of data from 3D sensors such as LiDAR, which are commonly used in applications like autonomous vehicles, robotics, and environmental monitoring.

3D Point Cloud Classification Autonomous Vehicles +2

Efficient Point Cloud Classification via Offline Distillation Framework and Negative-Weight Self-Distillation Technique

no code implementations 3 Sep 2024 Qiang Zheng, Chao Zhang, Jian Sun

To address these challenges, we introduce an innovative offline recording strategy that avoids the simultaneous loading of both teacher and student models, thereby reducing hardware demands.

Data Augmentation Knowledge Distillation +2

PMT-MAE: Dual-Branch Self-Supervised Learning with Distillation for Efficient Point Cloud Classification

no code implementations 3 Sep 2024 Qiang Zheng, Chao Zhang, Jian Sun

This paper introduces PMT-MAE (Point MLP-Transformer Masked Autoencoder), a novel self-supervised learning framework for point cloud classification.

Point Cloud Classification Self-Supervised Learning +1

Tangram: Benchmark for Evaluating Geometric Element Recognition in Large Multimodal Models

no code implementations 25 Aug 2024 Chao Zhang, Jiamin Tang, Jing Xiao

Significant advancements in Large Multimodal Models (LMMs) have enabled them to tackle complex problems involving visual-mathematical reasoning.

Mathematical Reasoning

Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs

1 code implementation 17 Aug 2024 Anshuman Sinha, Camille Migozzi, Aubin Rey, Chao Zhang

In this paper, we propose to equip the multi-modal ALMs with temporal understanding without losing their inherent prior capabilities of audio-language tasks with a temporal instillation method TeminAL.

Audio Classification Contrastive Learning +2

ViC: Virtual Compiler Is All You Need For Assembly Code Search

1 code implementation 10 Aug 2024 Zeyu Gao, Hao Wang, Yuanda Wang, Chao Zhang

Assembly code search is vital for reducing the burden on reverse engineers, allowing them to quickly identify specific functions using natural language within vast binary programs.

All Code Search +3

PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture

no code implementations 10 Aug 2024 Qiang Zheng, Chao Zhang, Jian Sun

In recent years, point cloud analysis methods based on the Transformer architecture have made significant progress, particularly in the context of multimedia applications such as 3D modeling, virtual reality, and autonomous systems.

Strategic Federated Learning: Application to Smart Meter Data Clustering

no code implementations 5 Aug 2024 Hassan Mohamad, Chao Zhang, Samson Lasaulce, Vineeth S Varma, Mérouane Debbah, Mounir Ghogho

In this paper, we introduce a novel FL framework in which the FC uses an aggregate version of the MI to make decisions that affect the client's utility functions.

Clustering Federated Learning +1

Confidence Estimation for Automatic Detection of Depression and Alzheimer's Disease Based on Clinical Interviews

no code implementations 29 Jul 2024 Wen Wu, Chao Zhang, Philip C. Woodland

Speech-based automatic detection of Alzheimer's disease (AD) and depression has attracted increased attention.

Diagnostic

DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving

1 code implementation 22 Jul 2024 Jiahang Tu, Wei Ji, Hanbin Zhao, Chao Zhang, Roger Zimmermann, Hui Qian

Such datasets are expected to cover various driving scenarios with adverse weather, lighting conditions and diverse moving objects.

Autonomous Driving Diversity +2

Goal-Oriented State Information Compression for Linear Dynamical System Control

no code implementations 14 Jul 2024 Li Wang, Chao Zhang, Samson Lasaulce, Lina Bariah, Merouane Debbah

In this paper, we consider controlled linear dynamical systems in which the controller has only access to a compressed version of the system state.

Generative AI for RF Sensing in IoT systems

no code implementations 10 Jul 2024 Li Wang, Chao Zhang, Qiyang Zhao, Hang Zou, Samson Lasaulce, Giuseppe Valenzise, Zhuo He, Merouane Debbah

The development of wireless sensing technologies, using signals such as Wi-Fi, infrared, and RF to gather environmental data, has significantly advanced within Internet of Things (IoT) systems.

PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer

1 code implementation 4 Jul 2024 Qian Feng, Hanbin Zhao, Chao Zhang, Jiahua Dong, Henghui Ding, Yu-Gang Jiang, Hui Qian

Prompt-fixed methods only learn a single set of prompts on one of the incremental tasks and can not handle all the incremental tasks effectively.

Incremental Learning

Learning Graph Structures and Uncertainty for Accurate and Calibrated Time-series Forecasting

no code implementations 2 Jul 2024 Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodriguez, Chao Zhang, B Aditya Prakash

Recent works model the relations between time-series as graphs and have shown that propagating information over the relation graph can improve time series forecasting.

Time Series Time Series Forecasting

SOT Triggered Neural Clustering for Speaker Attributed ASR

no code implementations 2 Jul 2024 Xianrui Zheng, Guangzhi Sun, Chao Zhang, Philip C. Woodland

This is achieved by the use of ASR, trained using a serialised output training method, together with segment-level discriminative neural clustering (SDNC) to assign speaker labels.

Clustering

Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization

no code implementations 2 Jul 2024 Yuchen Hu, Chen Chen, Siyin Wang, Eng Siong Chng, Chao Zhang

By leveraging reverse inference as the standard to select exemplars used in RLHF from the speech samples generated by the TTS system itself, RIO steers the subsequent optimization towards a direction of enhancing the TTS robustness.

Inference Optimization Speech Synthesis +2

Large Language Models for Power Scheduling: A User-Centric Approach

1 code implementation 29 Jun 2024 Thomas Mongaillard, Samson Lasaulce, Othman Hicheur, Chao Zhang, Lina Bariah, Vineeth S. Varma, Hang Zou, Qiyang Zhao, Merouane Debbah

While traditional optimization and scheduling schemes are designed to meet fixed, predefined system requirements, future systems are moving toward user-driven approaches and personalized services, aiming to achieve high quality-of-experience (QoE) and flexibility.

Intent Recognition Prompt Engineering +1

EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees

1 code implementation 24 Jun 2024 Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang

Inference with modern Large Language Models (LLMs) is expensive and time-consuming, and speculative sampling has proven to be an effective solution.

Efficient Evolutionary Search Over Chemical Space with Large Language Models

1 code implementation 23 Jun 2024 Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang

Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable.

Drug Design Evolutionary Algorithms

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

1 code implementation 22 Jun 2024 Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Yuxuan Wang, Chao Zhang

To obtain the fine-grained temporal information required by speech understanding while remaining efficient for other video elements, this paper proposes a novel multi-resolution causal Q-Former (MRC Q-Former) structure to connect pre-trained audio-visual encoders and the backbone large language model.

Diversity Language Modeling +3

Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation

1 code implementation 21 Jun 2024 Yuanjie Lyu, Zihan Niu, Zheyong Xie, Chao Zhang, Tong Xu, Yang Wang, Enhong Chen

Despite the significant progress of large language models (LLMs) in various tasks, they often produce factual errors due to their limited internal knowledge.

Answer Generation RAG

LLMatDesign: Autonomous Materials Discovery with Large Language Models

no code implementations 19 Jun 2024 Shuyi Jia, Chao Zhang, Victor Fung

Discovering new materials can have significant scientific and technological implications but remains a challenging problem today due to the enormity of the chemical space.

From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

no code implementations 12 Jun 2024 Shiwei Wu, Chao Zhang, Joya Chen, Tong Xu, Likang Wu, Yao Hu, Enhong Chen

People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific relationships, e.g., wedding rings, roses, hugs, or holding hands.

Descriptive Visual Social Relationship Recognition

An Improved Empirical Fisher Approximation for Natural Gradient Descent

no code implementations 10 Jun 2024 Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland

Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models, which use approximate Fisher information matrices to pre-condition gradients during training.

parameter-efficient fine-tuning

Aligning Large Language Models with Representation Editing: A Control Perspective

1 code implementation 10 Jun 2024 Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

To achieve alignment for specific objectives, we introduce external control signals into the state space of this language dynamical system.

Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models

no code implementations 6 Jun 2024 Ziyun Cui, Chang Lei, Wen Wu, Yinan Duan, Diyang Qu, Ji Wu, Runsen Chen, Chao Zhang

The early detection of suicide risk is important since it enables the intervention to prevent potential suicide attempts.

HYDRA: Model Factorization Framework for Black-Box LLM Personalization

1 code implementation 5 Jun 2024 Yuchen Zhuang, Haotian Sun, Yue Yu, Rushi Qiang, Qifan Wang, Chao Zhang, Bo Dai

To address these challenges, we propose HYDRA, a model factorization framework that captures both user-specific behavior patterns from historical data and shared general knowledge among all users to deliver personalized generation.

General Knowledge

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

no code implementations 2 Jun 2024 Chen Chen, Yuchen Hu, Wen Wu, Helin Wang, Eng Siong Chng, Chao Zhang

In recent years, text-to-speech (TTS) technology has witnessed impressive advancements, particularly with large-scale training datasets, showcasing human-level speech quality and impressive zero-shot capabilities on unseen speakers.

Speech Synthesis Text to Speech +1

MSSC-BiMamba: Multimodal Sleep Stage Classification and Early Diagnosis of Sleep Disorders with Bidirectional Mamba

no code implementations 30 May 2024 Chao Zhang, Weirong Cui, Jingjing Guo

Our model, which can effectively handle diverse sleep conditions, is the first to apply BiMamba to sleep staging with multimodal PSG data, showing substantial gains in computational and memory efficiency over traditional Transformer-style models.

Diagnostic Mamba +2

NoteLLM-2: Multimodal Large Representation Models for Recommendation

1 code implementation27 May 2024 Chao Zhang, Haoxin Zhang, Shiwei Wu, Di wu, Tong Xu, Xiangyu Zhao, Yan Gao, Yao Hu, Enhong Chen

While leveraging existing Multimodal Large Language Models (MLLMs) for such tasks is promising, challenges arise due to their delayed release compared to corresponding LLMs and the inefficiency in representation tasks.

In-Context Learning

Incremental Pseudo-Labeling for Black-Box Unsupervised Domain Adaptation

no code implementations26 May 2024 Yawen Zou, Chunzhi Gu, Jun Yu, Shangce Gao, Chao Zhang

Black-box unsupervised domain adaptation (BBUDA) learns only from the source model's predictions on target data, without access to the source data or the source model itself, which alleviates concerns about data privacy and security.

Pseudo Label Unsupervised Domain Adaptation

Bayesian WeakS-to-Strong from Text Classification to Generation

no code implementations24 May 2024 Ziyun Cui, Ziyang Zhang, Wen Wu, Guangzhi Sun, Chao Zhang

Advances in large language models raise the question of how alignment techniques will adapt as models become increasingly complex and humans will only be able to supervise them weakly.

text-classification Text Classification +1

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

1 code implementation23 May 2024 Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Chengwei Qin, Pin-Yu Chen, Eng Siong Chng, Chao Zhang

We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models

no code implementations22 May 2024 Guangzhi Sun, Potsawee Manakul, Adian Liusie, Kunat Pipatanakul, Chao Zhang, Phil Woodland, Mark Gales

Multimodal foundation models are prone to hallucination, generating outputs that either contradict the input or are not grounded by factual information.

Benchmarking Hallucination +2

BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers

1 code implementation29 Apr 2024 ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Yanqiao Zhu, May D. Wang, Joyce C. Ho, Chao Zhang, Carl Yang

Developing effective biomedical retrieval models is important for excelling at knowledge-intensive biomedical tasks but still challenging due to the deficiency of sufficient publicly annotated biomedical data and computational resources.

Retrieval Unsupervised Pre-training

Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

no code implementations26 Apr 2024 Shun Maeda, Chunzhi Gu, Jun Yu, Shogo Tokai, Shangce Gao, Chao Zhang

We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples.

Anomaly Detection

Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities

no code implementations23 Apr 2024 Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

Large language models (LLMs) can adapt to new tasks through in-context learning (ICL) based on a few examples presented in dialogue history without any model parameter update.

In-Context Learning

Emerging Advancements in 6G NTN Radio Access Technologies: An Overview

no code implementations22 Apr 2024 Husnain Shahid, Carla Amatetti, Riccardo Campana, Sorya Tong, Dorin Panaitopol, Alessandro Vanelli Coralli, Abdelhamed Mohamed, Chao Zhang, Ebraam Khalifa, Eduardo Medeiros, Estefania Recayte, Fatemeh Ghasemifard, Ji Lianghai, Juan Bucheli, Karthik Anantha Swamy, Marius Caus, Mehmet Gurelli, Miguel A. Vazquez, Musbah Shaat, Nathan Borios, Per-Erik Eriksson, Sebastian Euler, Zheng Li, Xiaotian Fu

The efforts on the development, standardization and improvements to communication systems towards 5G Advanced and 6G are on track to provide benefits such as an unprecedented level of connectivity and performance, enabling a diverse range of vertical services.

Management

Cepstral Analysis Based Artifact Detection, Recognition and Removal for Prefrontal EEG

no code implementations12 Apr 2024 Siqi Han, Chao Zhang, Jiaxin Lei, Qingquan Han, Yuhui Du, Anhe Wang, Shuo Bai, Milin Zhang

The proposed method achieves an accuracy of 99.62% on the artifact detection task and an accuracy of 82.79% on the 6-category eye movement classification task.

Artifact Detection EEG +1

Empowering Image Recovery: A Multi-Attention Approach

no code implementations6 Apr 2024 Juan Wen, Yawei Li, Chao Zhang, Weiyan Hou, Radu Timofte, Luc van Gool

Integration of attention mechanisms across feature and positional dimensions further enhances the recovery of fine details.

Image Restoration

Semantic Map-based Generation of Navigation Instructions

1 code implementation28 Mar 2024 Chengzu Li, Chao Zhang, Simone Teufel, Rama Sanand Doddipatla, Svetlana Stoyanchev

In this paper, we propose a new approach to navigation instruction generation by framing the problem as an image captioning task using semantic maps as visual input.

Image Captioning

Leveraging Large Language Model to Generate a Novel Metaheuristic Algorithm with CRISPE Framework

1 code implementation25 Mar 2024 Rui Zhong, Yuefeng Xu, Chao Zhang, Jun Yu

In this paper, we borrow the large language model (LLM) ChatGPT-3.5 to automatically and quickly design a new metaheuristic algorithm (MA) with only a small amount of input.

Language Modeling Language Modelling +2

M$^3$AV: A Multimodal, Multigenre, and Multipurpose Audio-Visual Academic Lecture Dataset

no code implementations21 Mar 2024 Zhe Chen, Heyang Liu, Wenyi Yu, Guangzhi Sun, Hongcheng Liu, Ji Wu, Chao Zhang, Yu Wang, Yanfeng Wang

Although multiple academic video datasets have been constructed and released, few of them support both multimodal content recognition and understanding tasks, which is partially due to the lack of high-quality human annotations.

Diversity Script Generation +3

ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models

1 code implementation17 Mar 2024 Yuzhao Heng, Chunyuan Deng, Yitong Li, Yue Yu, Yinghao Li, Rongzhi Zhang, Chao Zhang

Although Large Language Models (LLMs) exhibit remarkable adaptability across domains, these models often fall short in structured knowledge extraction tasks such as named entity recognition (NER).

Attribute named-entity-recognition +2

Efficient Multiplayer Battle Game Optimizer for Adversarial Robust Neural Architecture Search

1 code implementation15 Mar 2024 Rui Zhong, Yuefeng Xu, Chao Zhang, Jun Yu

This paper introduces a novel metaheuristic algorithm, known as the efficient multiplayer battle game optimizer (EMBGO), specifically designed for addressing complex numerical optimization tasks.

Neural Architecture Search

DiaLoc: An Iterative Approach to Embodied Dialog Localization

no code implementations CVPR 2024 Chao Zhang, Mohan Li, Ignas Budvytis, Stephan Liwicki

However, most existing works in embodied dialog research focus on navigation and leave the localization task understudied.

NoteLLM: A Retrievable Large Language Model for Note Recommendation

no code implementations4 Mar 2024 Chao Zhang, Shiwei Wu, Haoxin Zhang, Tong Xu, Yan Gao, Yao Hu, Di wu, Enhong Chen

Indeed, learning to generate hashtags/categories can potentially enhance note embeddings, both of which compress key note information into limited content.

Contrastive Learning Language Modeling +2

APISR: Anime Production Inspired Real-World Anime Super-Resolution

1 code implementation CVPR 2024 Boyang Wang, Fengyu Yang, Xihang Yu, Chao Zhang, Hanbin Zhao

In addition, we identify two anime-specific challenges of distorted and faint hand-drawn lines and unwanted color artifacts.

Super-Resolution

Accelerating materials discovery for polymer solar cells: Data-driven insights enabled by natural language processing

1 code implementation29 Feb 2024 Pranav Shetty, Aishat Adeboye, Sonakshi Gupta, Chao Zhang, Rampi Ramprasad

We present a simulation of various active learning strategies for the discovery of polymer solar cell donor/acceptor pairs using data extracted from the literature spanning $\sim$20 years by a natural language processing pipeline.

Active Learning

Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints

no code implementations28 Feb 2024 Lingkai Kong, Yuanqi Du, Wenhao Mu, Kirill Neklyudov, Valentin De Bortoli, Dongxia Wu, Haorui Wang, Aaron Ferber, Yi-An Ma, Carla P. Gomes, Chao Zhang

To constrain the optimization process to the data manifold, we reformulate the original optimization problem as a sampling problem from the product of the Boltzmann distribution defined by the objective function and the data distribution learned by the diffusion model.

CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision

2 code implementations26 Feb 2024 Hao Wang, Zeyu Gao, Chao Zhang, Zihan Sha, Mingyang Sun, Yuchen Zhou, Wenyu Zhu, Wenju Sun, Han Qiu, Xi Xiao

At the core, our approach boosts transfer learning capabilities by effectively aligning binary code with its semantic explanations (in natural language), resulting in a model able to generate better embeddings for binary code.

Representation Learning Transfer Learning

ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling

no code implementations21 Feb 2024 Lingxi Zhang, Yue Yu, Kuan Wang, Chao Zhang

Retrieval-augmented generation enhances large language models (LLMs) by incorporating relevant information from external knowledge sources.

MMLU Retrieval +2

A Simple but Effective Approach to Improve Structured Language Model Output for Information Extraction

1 code implementation20 Feb 2024 Yinghao Li, Rampi Ramprasad, Chao Zhang

It breaks the generation into a two-step pipeline: initially, LLMs generate answers in natural language as intermediate responses.

Language Modeling Language Modelling +5

BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models

1 code implementation13 Feb 2024 Haotian Sun, Yuchen Zhuang, Wei Wei, Chao Zhang, Bo Dai

BBox-Adapter distinguishes target and source domain data by treating target data as positive and source data as negative.

Neural Sinkhorn Gradient Flow

no code implementations25 Jan 2024 Huminhao Zhu, Fangyikang Wang, Chao Zhang, Hanbin Zhao, Hui Qian

We utilize the velocity field matching training scheme in NSGF, which only requires samples from the source and target distribution to compute an empirical velocity field approximation.

LightSleepNet: Design of a Personalized Portable Sleep Staging System Based on Single-Channel EEG

no code implementations24 Jan 2024 Yiqiao Liao, Chao Zhang, Milin Zhang, Zhihua Wang, Xiang Xie

This paper proposed LightSleepNet - a light-weight, 1-d Convolutional Neural Network (CNN) based personalized architecture for real-time sleep staging, which can be implemented on various mobile platforms with limited hardware resources.

EEG Sleep Staging +1

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

1 code implementation19 Jan 2024 Yuchen Hu, Chen Chen, Chao-Han Huck Yang, Ruizhe Li, Chao Zhang, Pin-Yu Chen, EnSiong Chng

To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of source speech, which can promote the denoising process in GER.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis

no code implementations19 Jan 2024 Chao Zhang, YUREN MAO, Yijiang Fan, Yu Mi, Yunjun Gao, Lu Chen, Dongfang Lou, Jinshu Lin

Text-to-SQL, which provides a zero-code interface for operating relational databases, has gained much attention in financial analysis because financial professionals may not be well-skilled in SQL programming.

Financial Analysis Language Modelling +3

Misconfidence-based Demonstration Selection for LLM In-Context Learning

no code implementations12 Jan 2024 Shangqing Xu, Chao Zhang

In each step, it analyzes a pool of candidate examples and identifies the ones most likely to challenge the LLM's current understanding, measured by a new metric called misconfidence.

In-Context Learning

Multi-Channel Multi-Domain based Knowledge Distillation Algorithm for Sleep Staging with Single-Channel EEG

no code implementations7 Jan 2024 Chao Zhang, Yiqiao Liao, Siqi Han, Milin Zhang, Zhihua Wang, Xiang Xie

The proposed algorithm achieves a state-of-the-art single-channel sleep staging accuracy of 86.5%, with only 0.6% deterioration from the state-of-the-art multi-channel model.

EEG Knowledge Distillation +1

A Closed-loop Brain-Machine Interface SoC Featuring a 0.2$μ$J/class Multiplexer Based Neural Network

no code implementations7 Jan 2024 Chao Zhang, Yongxiang Guo, Dawid Sheng, Zhixiong Ma, Chao Sun, Yuwei Zhang, Wenxin Zhao, Fenyan Zhang, Tongfei Wang, Xing Sheng, Milin Zhang

This work presents the first fabricated electrophysiology-optogenetic closed-loop bidirectional brain-machine interface (CL-BBMI) system-on-chip (SoC) with electrical neural signal recording, on-chip sleep staging and optogenetic stimulation.

Sleep Staging

3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding

1 code implementation6 Jan 2024 Zeju Li, Chao Zhang, Xiaoyan Wang, Ruilong Ren, Yifan Xu, Ruifei Ma, Xiangde Liu

The remarkable potential of multi-modal large language models (MLLMs) in comprehending both vision and language information has been widely acknowledged.

Scene Understanding Visual Question Answering (VQA)

Towards Modeling Uncertainties of Self-explaining Neural Networks via Conformal Prediction

no code implementations3 Jan 2024 Wei Qian, Chenxu Zhao, Yangyi Li, Fenglong Ma, Chao Zhang, Mengdi Huai

To tackle the aforementioned challenges, in this paper, we design a novel uncertainty modeling framework for self-explaining networks, which not only demonstrates strong distribution-free uncertainty modeling performance for the generated explanations in the interpretation layer but also excels in producing efficient and effective prediction sets for the final predictions based on the informative high-level basis explanations.

Conformal Prediction Prediction +1

Multiplayer Battle Game-Inspired Optimizer for Complex Optimization Problems

no code implementations31 Dec 2023 Yuefeng Xu, Rui Zhong, Chao Zhang, Jun Yu

Various popular multiplayer battle royale games share a lot of common elements.

Diversity

Large Language Models for Generative Information Extraction: A Survey

1 code implementation29 Dec 2023 Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Yang Wang, Enhong Chen

Information extraction (IE) aims to extract structural knowledge from plain natural language texts.

Survey

GAD-PVI: A General Accelerated Dynamic-Weight Particle-Based Variational Inference Framework

no code implementations27 Dec 2023 Fangyikang Wang, Huminhao Zhu, Chao Zhang, Hanbin Zhao, Hui Qian

Particle-based Variational Inference (ParVI) methods approximate the target distribution by iteratively evolving finite weighted particle systems.

Position Variational Inference

A Joint Multi-Gradient Algorithm for Demosaicing Bayer Images

no code implementations International Conference on Communication, Image and Signal Processing (CCISP) 2023 Di wu, Zhihui Xin, Chao Zhang

Experiments show that the proposed algorithm recovers image edges and texture-complex regions better, with higher PSNR and SSIM values and better subjective visual perception, than traditional gradient algorithms such as BI, Cok, Hibbard, Laroche, and Hamilton; it also involves only add-subtract and shift operations, which makes it suitable for implementation on hardware platforms.

Demosaicking SSIM

Multilevel Saliency-Guided Self-Supervised Learning for Image Anomaly Detection

no code implementations30 Nov 2023 Jianjian Qin, Chunzhi Gu, Jun Yu, Chao Zhang

To fully exploit saliency guidance, on each map, we select a pixel pair from the cluster with the highest centroid saliency to form a patch pair.

Anomaly Detection Self-Supervised Learning

LanGWM: Language Grounded World Model

no code implementations29 Nov 2023 Rudra P. K. Poudel, Harit Pandya, Chao Zhang, Roberto Cipolla

Furthermore, our proposed technique of explicit language-grounded visual representation learning has the potential to improve models for human-robot interaction because our extracted visual features are language grounded.

Deep Reinforcement Learning model +4

How Far Have We Gone in Vulnerability Detection Using Large Language Models

1 code implementation21 Nov 2023 Zeyu Gao, Hao Wang, Yuchen Zhou, Wenyu Zhu, Chao Zhang

Given the significant successes of large language models (LLMs) in various tasks, there is growing anticipation of their efficacy in vulnerability detection.

Vulnerability Detection

Data Diversity Matters for Robust Instruction Tuning

no code implementations21 Nov 2023 Alexander Bukharin, Shiyang Li, Zhengyang Wang, Jingfeng Yang, Bing Yin, Xian Li, Chao Zhang, Tuo Zhao, Haoming Jiang

QDIT provides a simple method to simultaneously control dataset diversity and quality, allowing us to conduct an in-depth study on the effect of diversity and quality on instruction tuning performance.

Diversity Instruction Following

Speech-based Slot Filling using Large Language Models

no code implementations13 Nov 2023 Guangzhi Sun, Shutong Feng, Dongcheng Jiang, Chao Zhang, Milica Gašić, Philip C. Woodland

Recently, advancements in large language models (LLMs) have shown an unprecedented ability across various language tasks.

In-Context Learning slot-filling +1

Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning

no code implementations13 Nov 2023 Yue Yu, Jiaming Shen, Tianqi Liu, Zhen Qin, Jing Nathan Yan, Jialu Liu, Chao Zhang, Michael Bendersky

To fully unleash the power of explanations, we propose EASE, an Explanation-Aware Soft Ensemble framework to empower in-context learning with LLMs.

In-Context Learning Language Modeling +3

Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study

1 code implementation13 Nov 2023 Yinghao Li, Haorui Wang, Chao Zhang

Large Language Models (LLMs) have shown remarkable proficiency in language understanding and have been successfully applied to a variety of real-world tasks through task-specific fine-tuning or prompt engineering.

Logical Reasoning Prompt Engineering

PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature

1 code implementation13 Nov 2023 Jerry Junyang Cheung, Yuchen Zhuang, Yinghao Li, Pranav Shetty, Wantian Zhao, Sanjeev Grampurohit, Rampi Ramprasad, Chao Zhang

Scientific information extraction (SciIE), which aims to automatically extract information from scientific literature, is becoming more important than ever.

Relation Extraction

Image-Pointcloud Fusion based Anomaly Detection using PD-REAL Dataset

no code implementations7 Nov 2023 Jianjian Qin, Chunzhi Gu, Jun Yu, Chao Zhang

We present PD-REAL, a novel large-scale dataset for unsupervised anomaly detection (AD) in the 3D domain.

Unsupervised Anomaly Detection

Improving MIMO channel estimation via receive power feedback

no code implementations1 Nov 2023 Chao Zhang, Hang Zou, Samson Lasaulce, Lucas Saludjian

Estimating the channel state is known to be an important problem in wireless networks.

Orientation-Aware Leg Movement Learning for Action-Driven Human Motion Prediction

no code implementations23 Oct 2023 Chunzhi Gu, Chao Zhang, Shigeru Kuriyama

Specifically, we follow a two-stage forecasting strategy by first employing the motion diffusion model to generate the target motion with a specified future action, and then producing the in-betweening to smoothly connect the observation and prediction to eventually address motion prediction.

Human motion prediction motion prediction +1

ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search

no code implementations20 Oct 2023 Yuchen Zhuang, Xiang Chen, Tong Yu, Saayan Mitra, Victor Bursztyn, Ryan A. Rossi, Somdeb Sarkhel, Chao Zhang

It formulates the entire action space as a decision tree, where each node represents a possible API function call involved in a solution plan.

Decision Making valid

SALMONN: Towards Generic Hearing Abilities for Large Language Models

1 code implementation20 Oct 2023 Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical world, which refers to the perception and understanding of general auditory information consisting of at least three types of sounds: speech, audio events, and music.

Audio captioning Automatic Speech Recognition +10

When Rigidity Hurts: Soft Consistency Regularization for Probabilistic Hierarchical Time Series Forecasting

1 code implementation17 Oct 2023 Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodríguez, Chao Zhang, B. Aditya Prakash

We close both these gaps and propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models the forecast distribution of the entire hierarchy.

Time Series Time Series Forecasting

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

2 code implementations9 Oct 2023 Guangzhi Sun, Wenyi Yu, Changli Tang, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Audio-visual large language models (LLMs) have drawn significant attention, yet the fine-grained combination of both input streams is rather under-explored, which is challenging but necessary for LLMs to understand general video inputs.

Question Answering Video Question Answering

Conditional Diffusion Model for Target Speaker Extraction

no code implementations7 Oct 2023 Theodor Nguyen, Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C Woodland

For the reverse-time process, a parametrised score function is conditioned on a target speaker embedding to extract the target speaker from the mixture of sources.

model Target Speaker Extraction

Transferring speech-generic and depression-specific knowledge for Alzheimer's disease detection

no code implementations6 Oct 2023 Ziyun Cui, Wen Wu, Wei-Qiang Zhang, Ji Wu, Chao Zhang

Apart from the knowledge from speech-generic representations, this paper also proposes to simultaneously transfer the knowledge from a speech depression detection task based on the high comorbidity rates of depression and AD.

Alzheimer's Disease Detection Depression Detection +1

Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering

1 code implementation6 Oct 2023 Wei Lv, Chao Zhang, Huaxiong Li, Xiuyi Jia, Chunlin Chen

We further consider the graph noise of projected data caused by missing samples and use a tensor-decomposition based graph filter for robust clustering. JPLTD decomposes the original tensor into an intrinsic tensor and a sparse tensor.

Clustering Incomplete multi-view clustering +1

Multi-Dimension-Embedding-Aware Modality Fusion Transformer for Psychiatric Disorder Clasification

no code implementations4 Oct 2023 Guoxin Wang, Xuyang Cao, Shan An, Fengmei Fan, Chao Zhang, Jinsong Wang, Feng Yu, Zhiren Wang

In this work, we proposed a multi-dimension-embedding-aware modality fusion transformer (MFFormer) for schizophrenia and bipolar disorder classification using rs-fMRI and T1 weighted structural MRI (T1w sMRI).

Functional Connectivity Time Series

Adapting LLM Agents with Universal Feedback in Communication

no code implementations1 Oct 2023 Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, Yelong Shen

To optimize agent interactions for task-specific learning with our universal buffer and pipeline, we introduce diverse communication patterns tailored for both single-agent and multi-agent environments.

Decision Making GSM8K

It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation

1 code implementation30 Sep 2023 Wen Wu, Wenlin Chen, Chao Zhang, Philip C. Woodland

Human annotator simulation (HAS) serves as a cost-effective substitute for human evaluation such as data annotation and system assessment.

Density Estimation Meta-Learning

Subspace-Guided Feature Reconstruction for Unsupervised Anomaly Localization

no code implementations25 Sep 2023 Katsuya Hotta, Chao Zhang, Yoshihiro Hagihara, Takuya Akashi

In this paper, we propose a novel subspace-guided feature reconstruction framework to pursue adaptive feature approximation for anomaly localization.

Anomaly Localization

Connecting Speech Encoder and Large Language Model for ASR

no code implementations25 Sep 2023 Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang

Q-Former-based LLMs can generalise well to out-of-domain datasets, where 12% relative WER reductions over the Whisper baseline ASR model were achieved on the Eval2000 test set without using any in-domain training data from Switchboard.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Enhancing Quantised End-to-End ASR Models via Personalisation

1 code implementation17 Sep 2023 Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

Recent end-to-end automatic speech recognition (ASR) models have become increasingly larger, making them particularly challenging to be deployed on resource-constrained devices.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Multi-In and Multi-Out Dendritic Neuron Model and its Optimization

no code implementations14 Sep 2023 Yu Ding, Jun Yu, Chunzhi Gu, Shangce Gao, Chao Zhang

Recently, a novel mathematical ANN model, known as the dendritic neuron model (DNM), has been proposed to address nonlinear problems by more accurately reflecting the structure of real neurons.

Multi-class Classification

Can Whisper perform speech-based in-context learning?

no code implementations13 Sep 2023 Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang

Language-level adaptation experiments using Chinese dialects showed that when applying SICL to isolated word ASR, consistent and considerable relative WER reductions can be achieved using Whisper models of any size on two dialects, which is on average 32.3%.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

RAIN: Your Language Models Can Align Themselves without Finetuning

1 code implementation13 Sep 2023 Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang Zhang

We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting.

Adversarial Attack TruthfulQA

AGMDT: Virtual Staining of Renal Histology Images with Adjacency-Guided Multi-Domain Transfer

no code implementations12 Sep 2023 Tao Ma, Chao Zhang, Min Lu, Lin Luo

Renal pathology, as the gold standard of kidney disease diagnosis, requires doctors to analyze a series of tissue slices stained by H&E staining and special staining like Masson, PASM, and PAS, respectively.

Graph Matching Style Transfer +1

Cross-Utterance Conditioned VAE for Speech Generation

no code implementations8 Sep 2023 Yang Li, Cheng Yu, Guangzhi Sun, Weiqin Zu, Zheng Tian, Ying Wen, Wei Pan, Chao Zhang, Jun Wang, Yang Yang, Fanglei Sun

Experimental results on the LibriTTS datasets demonstrate that our proposed models significantly enhance speech synthesis and editing, producing more natural and expressive speech.

Speech Synthesis Text to Speech

PolyGET: Accelerating Polymer Simulations by Accurate and Generalizable Forcefield with Equivariant Transformer

no code implementations1 Sep 2023 Rui Feng, Huan Tran, Aubrey Toland, Binghong Chen, Qi Zhu, Rampi Ramprasad, Chao Zhang

Machine learning (ML) forcefields have been developed to achieve both the accuracy of ab initio methods and the efficiency of empirical force fields.

Situated Natural Language Explanations

no code implementations27 Aug 2023 Zining Zhu, Haoming Jiang, Jingfeng Yang, Sreyashi Nag, Chao Zhang, Jie Huang, Yifan Gao, Frank Rudzicz, Bing Yin

Situated NLE provides a perspective and facilitates further research on the generation and evaluation of explanations.

Prompt Engineering

kTrans: Knowledge-Aware Transformer for Binary Code Embedding

1 code implementation24 Aug 2023 Wenyu Zhu, Hao Wang, Yuchen Zhou, JiaMing Wang, Zihan Sha, Zeyu Gao, Chao Zhang

By feeding explicit knowledge as additional inputs to the Transformer, and fusing implicit knowledge with a novel pre-training task, kTrans provides a new perspective to incorporating domain knowledge into a Transformer framework.

Outlier Detection

Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations

1 code implementation14 Aug 2023 Wen Wu, Chao Zhang, Philip C. Woodland

Two metrics are proposed to evaluate AER performance with automatic segmentation based on time-weighted emotion and speaker classification errors.

Action Detection Activity Detection +4

One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training

1 code implementation ICCV 2023 Jianshuo Dong, Han Qiu, Yiming Li, Tianwei Zhang, Yuanjie Li, Zeqi Lai, Chao Zhang, Shu-Tao Xia

We propose a training-assisted bit flip attack, in which the adversary is involved in the training stage to build a high-risk model to release.

All

DF2: Distribution-Free Decision-Focused Learning

no code implementations11 Aug 2023 Lingkai Kong, Wenhao Mu, Jiaming Cui, Yuchen Zhuang, B. Aditya Prakash, Bo Dai, Chao Zhang

However, existing end-to-end DFL methods are hindered by three significant bottlenecks: model mismatch error, sample average approximation error, and gradient approximation error.

Revisiting DETR Pre-training for Object Detection

no code implementations2 Aug 2023 Yan Ma, Weicong Liang, Bohan Chen, Yiduo Hao, BoJian Hou, Xiangyu Yue, Chao Zhang, Yuhui Yuan

Motivated by the remarkable achievements of DETR-based approaches on COCO object detection and segmentation benchmarks, recent endeavors have been directed towards elevating their performance through self-supervised pre-training of Transformers while preserving a frozen backbone.

Image to text Object +2

Graph Neural Networks for Forecasting Multivariate Realized Volatility with Spillover Effects

no code implementations1 Aug 2023 Chao Zhang, Xingyue Pu, Mihai Cucuringu, Xiaowen Dong

We present a novel methodology for modeling and forecasting multivariate realized volatilities using customized graph neural networks to incorporate spillover effects across stocks.

Understanding Deep Neural Networks via Linear Separability of Hidden Layers

no code implementations26 Jul 2023 Chao Zhang, Xinyu Chen, Wensheng Li, Lixue Liu, Wei Wu, DaCheng Tao

In this paper, we measure the linear separability of hidden layer outputs to study the characteristics of deep neural networks.

Autoregressive Diffusion Model for Graph Generation

1 code implementation17 Jul 2023 Lingkai Kong, Jiaming Cui, Haotian Sun, Yuchen Zhuang, B. Aditya Prakash, Chao Zhang

However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space.

Denoising Graph Generation +1

C3: Zero-shot Text-to-SQL with ChatGPT

1 code implementation14 Jul 2023 XueMei Dong, Chao Zhang, Yuhang Ge, YUREN MAO, Yunjun Gao, Lu Chen, Jinshu Lin, Dongfang Lou

This paper proposes a ChatGPT-based zero-shot Text-to-SQL method, dubbed C3, which achieves 82.3% in terms of execution accuracy on the holdout test set of Spider and becomes the state-of-the-art zero-shot Text-to-SQL method on the Spider Challenge.

Text-To-SQL

Knowledge-Aware Audio-Grounded Generative Slot Filling for Limited Annotated Data

no code implementations4 Jul 2023 Guangzhi Sun, Chao Zhang, Ivan Vulić, Paweł Budzianowski, Philip C. Woodland

In this work, we propose a Knowledge-Aware Audio-Grounded generative slot-filling framework, termed KA2G, that focuses on few-shot and zero-shot slot filling for ToD with speech input.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Towards Optimal Randomized Strategies in Adversarial Example Game

no code implementations29 Jun 2023 Jiahao Xie, Chao Zhang, Weijie Liu, Wensong Bai, Hui Qian

The vulnerability of deep neural network models to adversarial example attacks is a practical challenge in many artificial intelligence applications.

A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference

1 code implementation26 Jun 2023 Chao Zhang, Shiwei Wu, Sirui Zhao, Tong Xu, Enhong Chen

In this paper, we present a solution for enhancing video alignment to improve multi-step inference.

Video Alignment

G-STO: Sequential Main Shopping Intention Detection via Graph-Regularized Stochastic Transformer

no code implementations25 Jun 2023 Yuchen Zhuang, Xin Shen, Yan Zhao, Chaosheng Dong, Ming Wang, Jin Li, Chao Zhang

The detection of the underlying shopping intentions of users based on their historical interactions is a crucial aspect for e-commerce platforms, such as Amazon, to enhance the convenience and efficiency of their customers' shopping experiences.

Sequential Recommendation

ToolQA: A Dataset for LLM Question Answering with External Tools

2 code implementations NeurIPS 2023 Yuchen Zhuang, Yue Yu, Kuan Wang, Haotian Sun, Chao Zhang

To address this issue, we introduce a new dataset called ToolQA, which is designed to faithfully evaluate LLMs' ability to use external tools for question answering.

Hallucination Question Answering

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

1 code implementation15 Jun 2023 Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen

Our models outperform other SSL models significantly on the LibriSpeech benchmark without the need for iterative re-clustering and re-training.

Automatic Speech Recognition Clustering +5

MUBen: Benchmarking the Uncertainty of Molecular Representation Models

2 code implementations14 Jun 2023 Yinghao Li, Lingkai Kong, Yuanqi Du, Yue Yu, Yuchen Zhuang, Wenhao Mu, Chao Zhang

While some studies have included UQ to improve molecular pre-trained models, the process of selecting suitable backbone and UQ methods for reliable molecular uncertainty estimation remains underexplored.

Benchmarking Drug Discovery +4

PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm

no code implementations11 Jun 2023 Wensong Bai, Chao Zhang, Yichao Fu, Peilin Zhao, Hui Qian, Bin Dai

As a result, PACER fully utilizes the modeling capability of the push-forward operator and is able to explore a broader class of the policy space, compared with limited policy classes used in existing distributional actor critic algorithms (i.e., Gaussians).

Continuous Control Distributional Reinforcement Learning +3

Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression

1 code implementation11 Jun 2023 Wen Wu, Chao Zhang, Philip C. Woodland

In automatic emotion recognition (AER), labels assigned by different human annotators to the same utterance are often inconsistent due to the inherent complexity of emotion and the subjectivity of perception.

Attribute Emotion Recognition +1

FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow

no code implementations8 Jun 2023 Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Yijin Li, Hongwei Qin, Jifeng Dai, Xiaogang Wang, Hongsheng Li

This paper introduces a novel transformer-based network architecture, FlowFormer, along with the Masked Cost Volume AutoEncoding (MCVA) for pretraining it to tackle the problem of optical flow estimation.

Decoder Optical Flow Estimation

Local Boosting for Weakly-Supervised Learning

no code implementations5 Jun 2023 Rongzhi Zhang, Yue Yu, Jiaming Shen, Xiquan Cui, Chao Zhang

In this work, we show that the standard implementation of the convex combination of base learners can hardly work due to the presence of noisy labels.

Weakly-supervised Learning

Can Contextual Biasing Remain Effective with Whisper and GPT-2?

1 code implementation2 Jun 2023 Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland

End-to-end automatic speech recognition (ASR) and large language models, such as Whisper and GPT-2, have recently been scaled to use vast amounts of training data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

DyGen: Learning from Noisy Labels via Dynamics-Enhanced Generative Modeling

1 code implementation30 May 2023 Yuchen Zhuang, Yue Yu, Lingkai Kong, Xiang Chen, Chao Zhang

Most existing methods for learning from noisy labels use static input features for denoising, but these methods are limited by the information they can provide on true label distributions and can result in biased or incorrect predictions.

Denoising

Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator

1 code implementation30 May 2023 Guangzhi Sun, Chao Zhang, Phil Woodland

The incorporation of biasing words obtained through contextual knowledge is of paramount importance in automatic speech recognition (ASR) applications.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Graph Reasoning for Question Answering with Triplet Retrieval

no code implementations30 May 2023 Shiyang Li, Yifan Gao, Haoming Jiang, Qingyu Yin, Zheng Li, Xifeng Yan, Chao Zhang, Bing Yin

State-of-the-art methods often utilize entities in questions to retrieve local subgraphs, which are then fed into a KG encoder, e.g., graph neural networks (GNNs), to model their local structures and integrated into language models for question answering.

Knowledge Graphs Question Answering +2

AdaPlanner: Adaptive Planning from Feedback with Language Models

1 code implementation NeurIPS 2023 Haotian Sun, Yuchen Zhuang, Lingkai Kong, Bo Dai, Chao Zhang

We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback.

Decision Making Hallucination +1

Extracting Shopping Interest-Related Product Types from the Web

no code implementations23 May 2023 Yinghao Li, Colin Lockard, Prashant Shiralkar, Chao Zhang

To establish such connections, we propose to extract PTs from the Web pages containing hand-crafted PT recommendations for SIs.

Node Classification

Self-supervised representations in speech-based depression detection

no code implementations20 May 2023 Wen Wu, Chao Zhang, Philip C. Woodland

This paper proposes handling training data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

CCGen: Explainable Complementary Concept Generation in E-Commerce

no code implementations19 May 2023 Jie Huang, Yifan Gao, Zheng Li, Jingfeng Yang, Yangqiu Song, Chao Zhang, Zining Zhu, Haoming Jiang, Kevin Chen-Chuan Chang, Bing Yin

We propose and study Complementary Concept Generation (CCGen): given a concept of interest, e.g., "Digital Cameras", generating a list of complementary concepts, e.g., 1) Camera Lenses 2) Batteries 3) Camera Cases 4) Memory Cards 5) Battery Chargers.
