Search Results for author: Yu Zhang

Found 517 papers, 159 papers with code

A Coarse-to-Fine Labeling Framework for Joint Word Segmentation, POS Tagging, and Constituent Parsing

1 code implementation CoNLL (EMNLP) 2021 Yang Hou, Houquan Zhou, Zhenghua Li, Yu Zhang, Min Zhang, Zhefeng Wang, Baoxing Huai, Nicholas Jing Yuan

In the coarse labeling stage, the joint model outputs a bracketed tree, in which each node corresponds to one of four labels (i.e., phrase, subphrase, word, subword).

Part-Of-Speech Tagging POS +2

Learning to See in the Dark with Events

no code implementations ECCV 2020 Song Zhang, Yu Zhang, Zhe Jiang, Dongqing Zou, Jimmy Ren, Bin Zhou

A detail-enhancing branch is proposed to reconstruct daylight-specific features from the domain-invariant representations in a residual manner, regularized by a ranking loss.

Representation Learning Unsupervised Domain Adaptation

DuReader_vis: A Chinese Dataset for Open-domain Document Visual Question Answering

1 code implementation Findings (ACL) 2022 Le Qi, Shangwen Lv, Hongyu Li, Jing Liu, Yu Zhang, Qiaoqiao She, Hua Wu, Haifeng Wang, Ting Liu

Open-domain question answering has been used in a wide range of applications, such as web search and enterprise search, which usually takes clean texts extracted from various formats of documents (e.g., web pages, PDFs, or Word documents) as the information source.

document understanding Open-Domain Question Answering +1

All Information is Valuable: Question Matching over Full Information Transmission Network

no code implementations Findings (NAACL) 2022 Le Qi, Yu Zhang, Qingyu Yin, Guidong Zheng, Wen Junjie, Jinlong Li, Ting Liu

In this process, there are two kinds of critical information that are commonly employed: the representation information of original questions and the interactive information between pairs of questions.

Learn to Cross-lingual Transfer with Meta Graph Learning Across Heterogeneous Languages

no code implementations EMNLP 2020 Zheng Li, Mukul Kumar, William Headden, Bing Yin, Ying WEI, Yu Zhang, Qiang Yang

The recent emergence of multilingual pre-trained language models (mPLMs) has enabled breakthroughs on various downstream cross-lingual transfer (CLT) tasks.

Cross-Lingual Transfer Graph Learning +1

Joint Goal Segmentation and Goal Success Prediction on Multi-Domain Conversations

no code implementations COLING 2022 Meiguo Wang, Benjamin Yao, Bin Guo, Xiaohu Liu, Yu Zhang, Tuan-Hung Pham, Chenlei Guo

To evaluate the performance of a multi-domain goal-oriented Dialogue System (DS), it is important to understand what the users’ goals are for the conversations and whether those goals are successfully achieved.

Dialogue Evaluation Multi-Task Learning +1

Scenario-Adaptive Fine-Grained Personalization Network: Tailoring User Behavior Representation to the Scenario Context

no code implementations 15 Apr 2024 Moyu Zhang, Yongxiang Tang, Jinxin Hu, Yu Zhang

To enhance the model's capacity to capture user interests from historical behavior sequences in each scenario, we develop a ranking framework named the Scenario-Adaptive Fine-Grained Personalization Network (SFPNet), which employs a fine-grained method for multi-scenario personalized recommendations.

Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs

1 code implementation 10 Apr 2024 Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Suhang Wang, Yu Meng, Jiawei Han

Then, we propose a simple and effective framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.

CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks

no code implementations 4 Apr 2024 Beibei Wang, Lu Zhang, Shuang Meng, Chenjie Wang, Jingjing Huang, Yao Li, Haojie Ren, Yuxuan Xiao, Yuru Peng, Jianmin Ji, Yu Zhang, Yanyong Zhang

Numerous roadside perception datasets have been introduced to propel advancements in autonomous driving and intelligent transportation systems research and development.

Autonomous Driving Instance Segmentation +1

Addressing Heterogeneity in Federated Load Forecasting with Personalization Layers

no code implementations 1 Apr 2024 Shourya Bose, Yu Zhang, Kibaek Kim

The advent of smart meters has enabled pervasive collection of energy consumption data for training short-term load forecasting models.

Federated Learning Load Forecasting +1

UAlign: Pushing the Limit of Template-free Retrosynthesis Prediction with Unsupervised SMILES Alignment

1 code implementation 25 Mar 2024 Kaipeng Zeng, Bo Yang, Xin Zhao, Yu Zhang, Fan Nie, Xiaokang Yang, Yaohui Jin, Yanyan Xu

Single-step retrosynthesis prediction, a crucial step in the planning process, has witnessed a surge in interest in recent years due to advancements in AI for science.

Graph-to-Sequence molecular representation +3

A Survey on Consumer IoT Traffic: Security and Privacy

1 code implementation 24 Mar 2024 Yan Jia, Yuxin Song, Zihou Liu, Qingyin Tan, Fangming Wang, Yu Zhang, Zheli Liu

From the security and privacy perspective, this survey seeks out the new characteristics in CIoT traffic analysis, the state-of-the-art progress in CIoT traffic analysis, and the challenges yet to be solved.

Task-Aware Low-Rank Adaptation of Segment Anything Model

no code implementations 16 Mar 2024 Xuehao Wang, Feiyang Ye, Yu Zhang

Furthermore, we introduce modified SAM (mSAM) for multi-task learning, where we remove the prompt encoder of SAM and use a task-specific no-mask embedding and mask decoder for each task.

Image Segmentation Multi-Task Learning +2

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

no code implementations 14 Mar 2024 Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-yan Yeung, James T. Kwok, Yu Zhang

Multimodal large language models (MLLMs) have shown impressive reasoning abilities; however, they are also more vulnerable to jailbreak attacks than their LLM predecessors.

Optical Character Recognition (OCR)

Conditional Score-Based Diffusion Model for Cortical Thickness Trajectory Prediction

no code implementations 11 Mar 2024 Qing Xiao, Siyeop Yoon, Hui Ren, Matthew Tivnan, Lichao Sun, Quanzheng Li, Tianming Liu, Yu Zhang, Xiang Li

Alzheimer's Disease (AD) is a neurodegenerative condition characterized by diverse progression rates among individuals, with changes in cortical thickness (CTh) closely linked to its progression.

Trajectory Prediction

SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting

1 code implementation 8 Mar 2024 Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, Zeyu Wang

We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device.

Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models

no code implementations 29 Feb 2024 Yu Zhang, long wen, Xiangtong Yao, Zhenshan Bing, Linghuan Kong, wei he, Alois Knoll

Subsequently, the hyperparameters of the Gaussian model are trained with a specially designed compound kernel, and the model's online inference capability and computational efficiency are strengthened by updating a single inducing point derived from new samples, in conjunction with the learned hyperparameters.

Computational Efficiency Gaussian Processes

Online Efficient Safety-Critical Control for Mobile Robots in Unknown Dynamic Multi-Obstacle Environments

no code implementations 26 Feb 2024 Yu Zhang, Guangyao Tian, long wen, Xiangtong Yao, Liding Zhang, Zhenshan Bing, wei he, Alois Knoll

This paper proposes a LiDAR-based goal-seeking and exploration framework, addressing the efficiency of online obstacle avoidance in unstructured environments populated with static and moving obstacles.

CFRet-DVQA: Coarse-to-Fine Retrieval and Efficient Tuning for Document Visual Question Answering

no code implementations 26 Feb 2024 Jinxu Zhang, Yongqi Yu, Yu Zhang

Document Visual Question Answering (DVQA) is a task that involves responding to queries based on the content of images.

Language Modelling Large Language Model +3

A Unified Taxonomy-Guided Instruction Tuning Framework for Entity Set Expansion and Taxonomy Expansion

1 code implementation 20 Feb 2024 Yanzhen Shen, Yu Zhang, Yunyi Zhang, Jiawei Han

Entity Set Expansion, Taxonomy Expansion, and Seed-Guided Taxonomy Construction are three representative tasks that can be used to automatically populate an existing taxonomy with new entities.

Language Modelling Large Language Model +1

SoLA: Solver-Layer Adaption of LLM for Better Logic Reasoning

no code implementations 19 Feb 2024 Yu Zhang, Hui-Ling Zhen, Zehua Pei, Yingzhao Lian, Lihao Yin, Mingxuan Yuan, Bei Yu

In this paper, we propose a novel solver-layer adaptation (SoLA) method, where we introduce a solver as a new layer of the LLM to differentially guide solutions towards satisfiability.

Logical Reasoning

KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion

1 code implementation 4 Feb 2024 Yanbin Wei, Qiushi Huang, James T. Kwok, Yu Zhang

Knowledge Graph Completion (KGC) is crucial for addressing knowledge graph incompleteness and supporting downstream applications.

In-Context Learning Language Modelling +1

Evaluating Large Language Models in Analysing Classroom Dialogue

no code implementations 4 Feb 2024 Yun Long, Haifeng Luo, Yu Zhang

Recognizing the knowledge-intensive and labor-intensive nature of traditional qualitative methods in educational research, this study investigates the potential of LLMs to streamline and enhance the analysis process.

Rendering Graphs for Graph Reasoning in Multimodal Large Language Models

no code implementations 3 Feb 2024 Yanbin Wei, Shuai Fu, Weisen Jiang, James T. Kwok, Yu Zhang

In this paper, we take the first step in incorporating visual information into graph reasoning tasks and propose a new benchmark GITQA, where each sample is a tuple (graph, image, textual description).

Common Sense Reasoning Knowledge Graph Completion

iMove: Exploring Bio-impedance Sensing for Fitness Activity Recognition

no code implementations 31 Jan 2024 Mengxi Liu, Vitor Fortes Rey, Yu Zhang, Lala Shakti Swarup Ray, Bo Zhou, Paul Lukowicz

While IMUs are currently the prominent fitness-tracking modality, through iMove we show that bio-impedance can help improve IMU-based fitness tracking through sensor fusion and contrastive learning. To evaluate our methods, we conducted an experiment in which ten subjects performed six upper-body fitness activities over five days, collecting synchronized data from bio-impedance across both wrists and from an IMU on the left wrist. The contrastive learning framework uses the two modalities to train a better IMU-only classification model, where bio-impedance is required only during training; with a single IMU as input, the average Macro F1 score improved by 3.22 percentage points, reaching 84.71% compared to 81.49% for the IMU baseline model.

Contrastive Learning Human Activity Recognition +1

Distribution-consistency Structural Causal Models

no code implementations 29 Jan 2024 Heyang Gong, Chaochao Lu, Yu Zhang

In the field of causal modeling, potential outcomes (PO) and structural causal models (SCMs) stand as the predominant frameworks.

counterfactual Counterfactual Reasoning +1

Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains

1 code implementation 23 Jan 2024 Yu Zhang, Yunyi Zhang, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, Jiawei Han

In this paper, we study the task of seed-guided fine-grained entity typing in science and engineering domains, which takes the name and a few seed entities for each entity type as the only supervision and aims to classify new entity mentions into both seen and unseen types (i.e., those without seed entities).

Entity Typing Natural Language Inference

HG3-NeRF: Hierarchical Geometric, Semantic, and Photometric Guided Neural Radiance Fields for Sparse View Inputs

no code implementations 22 Jan 2024 Zelin Gao, Weichen Dai, Yu Zhang

We propose Hierarchical Geometric Guidance (HGG) to incorporate a by-product of Structure from Motion (SfM), namely a sparse depth prior, into the scene representations.

Novel View Synthesis

SMUTF: Schema Matching Using Generative Tags and Hybrid Features

no code implementations 22 Jan 2024 Yu Zhang, Mei Di, Haozheng Luo, Chenwei Xu, Richard Tzong-Han Tsai

Recognizing the lack of extensive, publicly available datasets for SM, we have created and open-sourced the HDXSM dataset from the public humanitarian data.

Feature Engineering Humanitarian

A First-Order Multi-Gradient Algorithm for Multi-Objective Bi-Level Optimization

no code implementations 17 Jan 2024 Feiyang Ye, Baijiong Lin, Xiaofeng Cao, Yu Zhang, Ivor Tsang

In this paper, we study the Multi-Objective Bi-Level Optimization (MOBLO) problem, where the upper-level subproblem is a multi-objective optimization problem and the lower-level subproblem is a scalar optimization problem.

Multi-Task Learning
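
For reference, a MOBLO problem of the kind described above is commonly written as follows. The notation is an illustrative assumption, not taken from the paper: $x$ are upper-level variables, $y$ lower-level variables, $F_1, \ldots, F_m$ the upper-level objectives (minimized jointly in the Pareto sense), and $f$ the scalar lower-level objective, assumed here to have a unique minimizer.

```latex
\min_{x \in \mathcal{X}} \; \Big( F_1\big(x, y^*(x)\big), \ldots, F_m\big(x, y^*(x)\big) \Big)
\quad \text{s.t.} \quad y^*(x) = \arg\min_{y \in \mathcal{Y}} f(x, y)
```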

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

no code implementations 13 Jan 2024 Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources.

4k Position

Scaling Laws And Statistical Properties of The Transaction Flows And Holding Times of Bitcoin

no code implementations 9 Jan 2024 Didier Sornette, Yu Zhang

Defining age-dependent transaction flows as the fraction of bitcoins that are traded at a given time and that were born (last traded) at some specific earlier time, we document that the time-averaged transaction flow fraction has a power law dependence as a function of age, with an exponent close to $-1.5$, a value compatible with priority queuing theory.
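
A power-law exponent like the one reported here can be read off as the slope of a straight-line fit in log-log coordinates. The sketch below is a generic illustration, not the estimation procedure used in the paper: the synthetic data, the noise level, and the helper `fit_power_law_exponent` are all hypothetical.

```python
import math
import random


def fit_power_law_exponent(ages, fractions):
    """Estimate alpha in fraction ~ C * age**alpha by least squares on (log age, log fraction)."""
    xs = [math.log(a) for a in ages]
    ys = [math.log(f) for f in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var  # slope of the log-log regression line


# Hypothetical synthetic data: fraction proportional to age**-1.5 with small multiplicative noise.
random.seed(0)
ages = list(range(1, 201))
fractions = [0.3 * a ** -1.5 * math.exp(random.gauss(0.0, 0.05)) for a in ages]

alpha = fit_power_law_exponent(ages, fractions)
print(f"estimated exponent: {alpha:.2f}")
```

A log-log least-squares fit is the simplest option; with real, heavy-tailed data, more careful estimators and goodness-of-fit checks are usually advisable.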

Online Test-Time Adaptation of Spatial-Temporal Traffic Flow Forecasting

1 code implementation 8 Jan 2024 Pengxin Guo, Pengrong Jin, Ziyue Li, Lei Bai, Yu Zhang

To make the model trained on historical data better adapt to future data in a fully online manner, this paper conducts the first study of the online test-time adaptation techniques for spatial-temporal traffic flow forecasting problems.

Test-time Adaptation Traffic Prediction

VLLaVO: Mitigating Visual Gap through LLMs

1 code implementation 6 Jan 2024 Shuhao Chen, Yulong Zhang, Weisen Jiang, Jiangang Lu, Yu Zhang

Recent advances achieved by deep learning models rely on the independent and identically distributed assumption, hindering their applications in real-world scenarios with domain shifts.

Domain Generalization Language Modelling +2

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

no code implementations 19 Dec 2023 Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-yan Yeung, James T. Kwok, Yu Zhang

Instruction tuning of Large Vision-language Models (LVLMs) has revolutionized the development of versatile models with zero-shot generalization across a wide range of downstream vision-language tasks.

Instruction Following Zero-shot Generalization

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

no code implementations 17 Dec 2023 Yu Zhang, Rongjie Huang, RuiQi Li, Jinzheng He, Yan Xia, Feiyang Chen, Xinyu Duan, Baoxing Huai, Zhou Zhao

Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase.

Quantization Singing Voice Synthesis +1

Memory-Efficient Reversible Spiking Neural Networks

1 code implementation 13 Dec 2023 Hong Zhang, Yu Zhang

In this paper, we propose the reversible spiking neural network to reduce the memory cost of intermediate activations and membrane potentials during training.
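
The memory saving of reversible networks comes from being able to recompute a block's inputs from its outputs, so intermediate activations need not be stored for backpropagation. The sketch below shows the generic RevNet-style additive coupling on plain real-valued vectors; it is an assumption-laden illustration, not the paper's spiking formulation (which also has to handle membrane potentials), and `f` and `g` are hypothetical stand-ins for arbitrary sub-networks.

```python
from typing import Callable, List, Tuple

Vec = List[float]
Fn = Callable[[Vec], Vec]


def coupling_forward(x1: Vec, x2: Vec, f: Fn, g: Fn) -> Tuple[Vec, Vec]:
    """Additive coupling: y1 = x1 + f(x2), then y2 = x2 + g(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def coupling_inverse(y1: Vec, y2: Vec, f: Fn, g: Fn) -> Tuple[Vec, Vec]:
    """Reconstruct the inputs from the outputs; f and g themselves need not be invertible."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


def f(v: Vec) -> Vec:
    # Hypothetical sub-network: scale by 2.
    return [2.0 * t for t in v]


def g(v: Vec) -> Vec:
    # Hypothetical sub-network: elementwise square (not invertible on its own).
    return [t * t for t in v]


x1, x2 = [1.0, -0.5], [0.25, 3.0]
y1, y2 = coupling_forward(x1, x2, f, g)
print(y1, y2)                          # [1.5, 5.5] [2.5, 33.25]
print(coupling_inverse(y1, y2, f, g))  # ([1.0, -0.5], [0.25, 3.0])
```

During training, only the block outputs are kept; the inputs are recomputed with `coupling_inverse` when gradients are needed, trading extra computation for memory.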

A Unified Framework for Unsupervised Domain Adaptation based on Instance Weighting

no code implementations 8 Dec 2023 Jinjing Zhu, Feiyang Ye, Qiao Xiao, Pengxin Guo, Yu Zhang, Qiang Yang

Specifically, the proposed LIWUDA method constructs a weight network to assign weights to each instance based on its probability of belonging to common classes, and designs Weighted Optimal Transport (WOT) for domain alignment by leveraging instance weights.

Partial Domain Adaptation Universal Domain Adaptation +1

Lite-Mind: Towards Efficient and Robust Brain Representation Network

no code implementations 6 Dec 2023 Zixuan Gong, Qi Zhang, Guangyin Bao, Lei Zhu, Yu Zhang, Ke Liu, Liang Hu, Duoqian Miao

The limited data availability and the low signal-to-noise ratio of fMRI signals make fMRI-to-image retrieval a challenging task.

Brain Decoding Image Retrieval +2

Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors

1 code implementation 2 Dec 2023 Yu Zhang, Songpengcheng Xia, Lei Chu, Jiarui Yang, Qi Wu, Ling Pei

This paper introduces a novel human pose estimation approach using sparse inertial sensors, addressing the shortcomings of previous methods reliant on synthetic data.

Pose Estimation

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

no code implementations 27 Nov 2023 Chenglin Yang, Siyuan Qiao, Yuan Cao, Yu Zhang, Tao Zhu, Alan Yuille, Jiahui Yu

To tackle this problem, we redesign the scoring objective for the captioner to alleviate the distributional bias and focus on measuring the gain of information brought by the visual inputs.

Caption Generation Language Modelling +2

Privacy-Preserving Load Forecasting via Personalized Model Obfuscation

no code implementations 21 Nov 2023 Shourya Bose, Yu Zhang, Kibaek Kim

The widespread adoption of smart meters provides access to detailed and localized load consumption data, suitable for training building-level load forecasting models.

Federated Learning Load Forecasting +1

Spatio-Temporal Similarity Measure based Multi-Task Learning for Predicting Alzheimer's Disease Progression using MRI Data

no code implementations 6 Nov 2023 Xulong Wang, Yu Zhang, Menghui Zhou, Tong Liu, Jun Qi, Po Yang

The experimental results show that, compared with direct ROI-based learning, our proposed method is more effective in predicting disease progression.

Multi-Task Learning

Signal Processing Meets SGD: From Momentum to Filter

no code implementations 6 Nov 2023 Zhipeng Yao, Yu Zhang, Dazhou Li

To address this contradiction, we propose a novel optimization method that aims to accelerate the convergence rate of SGD without loss of generalization.
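
The signal-processing view in the title can be made concrete with classical heavy-ball momentum: its buffer is a first-order IIR (exponential) low-pass filter applied to the gradient sequence. The sketch below is a generic illustration of that connection, not the optimizer proposed in the paper; the helper names, the quadratic test objective, and the hyperparameters are hypothetical.

```python
def momentum_buffer(grads, beta):
    """Recursive heavy-ball buffer: v_t = beta * v_{t-1} + g_t, starting from v = 0."""
    v, out = 0.0, []
    for g in grads:
        v = beta * v + g
        out.append(v)
    return out


def geometric_filter(grads, beta):
    """The same signal as an explicit convolution with the kernel beta**k (an IIR low-pass filter)."""
    return [sum(beta ** (t - k) * grads[k] for k in range(t + 1)) for t in range(len(grads))]


def sgd_momentum(grad_fn, theta, lr=0.1, beta=0.9, steps=200):
    """SGD with heavy-ball momentum on a scalar parameter."""
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad_fn(theta)
        theta -= lr * v
    return theta


grads = [1.0, -2.0, 0.5, 3.0, -1.0]
print(momentum_buffer(grads, 0.9))
print(geometric_filter(grads, 0.9))
# Minimize f(theta) = theta**2 / 2, whose gradient is theta itself.
print(sgd_momentum(lambda th: th, theta=1.0))
```

Viewing momentum as a filter makes it natural to ask which other filters (with different frequency responses) might smooth the gradient sequence better, which is the kind of design space the title alludes to.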

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

no code implementations 2 Nov 2023 Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen

Instead, E3 TTS models the temporal structure of the waveform through the diffusion process.

"Why Should I Review This Paper?" Unifying Semantic, Topic, and Citation Factors for Paper-Reviewer Matching

no code implementations 23 Oct 2023 Yu Zhang, Yanzhen Shen, Xiusi Chen, Bowen Jin, Jiawei Han

As many academic conferences are overwhelmed by a rapidly increasing number of paper submissions, automatically finding appropriate reviewers for each submission becomes a more urgent need than ever.

Information Retrieval Language Modelling +1

Machine Learning Methods for Background Potential Estimation in 2DEGs

no code implementations 11 Oct 2023 Carlo da Cunha, Nobuyuki Aoki, David Ferry, Kevin Vora, Yu Zhang

In the realm of quantum-effect devices and materials, two-dimensional electron gases (2DEGs) stand as fundamental structures that promise transformative technologies.

Image-to-Image Translation

Non-autoregressive Text Editing with Copy-aware Latent Alignments

1 code implementation 11 Oct 2023 Yu Zhang, Yue Zhang, Leyang Cui, Guohong Fu

In this work, we propose a novel non-autoregressive text editing method to circumvent the above issues, by modeling the edit process with latent CTC alignments.

Management Sentence +1
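
Latent CTC alignments map a longer frame-level alignment to an output sequence by a fixed collapsing rule. The sketch below shows only the standard CTC collapse (merge runs of identical symbols, then delete blanks); the copy-aware extension described in the paper is not reproduced, and the blank token `"<b>"` and the example alignment are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = "<b>"  # hypothetical blank token


def ctc_collapse(alignment: Sequence[str], blank: str = BLANK) -> List[str]:
    """Standard CTC mapping: merge runs of identical symbols, then delete blanks."""
    merged = [symbol for symbol, _run in groupby(alignment)]
    return [symbol for symbol in merged if symbol != blank]


alignment = ["the", "the", BLANK, "cat", "sat", "sat", BLANK, "sat"]
print(ctc_collapse(alignment))  # ['the', 'cat', 'sat', 'sat']
```

The blank between the two runs of `"sat"` is what allows a genuinely repeated token to survive the merge step; many different alignments collapse to the same output, which is why the alignment can be treated as a latent variable.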

Ontology Enrichment for Effective Fine-grained Entity Typing

no code implementations 11 Oct 2023 Siru Ouyang, Jiaxin Huang, Pranav Pillai, Yunyi Zhang, Yu Zhang, Jiawei Han

In this study, we propose OnEFET, where we (1) enrich each node in the ontology structure with two types of extra information: instance information for training sample augmentation and topic information to relate types to contexts, and (2) develop a coarse-to-fine typing algorithm that exploits the enriched information by training an entailment model with contrasting topics and instance-based augmented training samples.

Entity Typing

Learning Multiplex Embeddings on Text-rich Networks with One Text Encoder

no code implementations 10 Oct 2023 Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Han Zhao, Jiawei Han

Mainstream text representation learning methods use pretrained language models (PLMs) to generate one embedding for each text unit, expecting that all types of relations between texts can be captured by these single-view embeddings.

Representation Learning

BYOM: Building Your Own Multi-Task Model For Free

no code implementations 3 Oct 2023 Weisen Jiang, Baijiong Lin, Han Shi, Yu Zhang, Zhenguo Li, James T. Kwok

Recently, various merging methods have been proposed to build a multi-task model from task-specific finetuned models without retraining.

SSPFusion: A Semantic Structure-Preserving Approach for Infrared and Visible Image Fusion

no code implementations 26 Sep 2023 Qiao Yang, Yu Zhang, Jian Zhang, Zijing Zhao, Shunli Zhang, Jinqiao Wang, Junzhe Chen

Most existing learning-based infrared and visible image fusion (IVIF) methods introduce a large amount of redundant information into the fused images, i.e., they yield edge-blurring effects or results that object detectors cannot recognize.

Infrared And Visible Image Fusion

IAIFNet: An Illumination-Aware Infrared and Visible Image Fusion Network

no code implementations 26 Sep 2023 Qiao Yang, Yu Zhang, Jian Zhang, Zijing Zhao, Shunli Zhang, Jinqiao Wang, Junzhe Chen

Infrared and visible image fusion (IVIF) is used to generate fusion images with comprehensive features of both images, which is beneficial for downstream vision tasks.

Infrared And Visible Image Fusion

Adversarial Attacks on Video Object Segmentation with Hard Region Discovery

no code implementations 25 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Jian Zhao, Xianghua Xu, Xiaoqin Zhang

In particular, the gradients from the segmentation model are exploited to discover the easily confused region, in which it is difficult to distinguish pixel-wise objects from the background in a frame.

Autonomous Driving Object +5

Domain-Guided Conditional Diffusion Model for Unsupervised Domain Adaptation

no code implementations 23 Sep 2023 Yulong Zhang, Shuhao Chen, Weisen Jiang, Yu Zhang, Jiangang Lu, James T. Kwok

However, the performance of existing UDA methods is constrained by the large domain shift and limited target domain data.

Unsupervised Domain Adaptation

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

1 code implementation 21 Sep 2023 Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu

Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%.

Ranked #53 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +4

Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation

no code implementations 21 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Huaxin Xiao, Binbin Lin, Xianghua Xu

Unsupervised Video Object Segmentation (VOS) aims at identifying the contours of primary foreground objects in videos without any prior knowledge.

Semantic Segmentation Unsupervised Video Object Segmentation +1

Fully Transformer-Equipped Architecture for End-to-End Referring Video Object Segmentation

no code implementations 21 Sep 2023 Ping Li, Yu Zhang, Li Yuan, Xianghua Xu

Referring Video Object Segmentation (RVOS) requires segmenting the object in a video that is referred to by a natural language query.

Object Referring Video Object Segmentation +4

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

no code implementations 14 Sep 2023 Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang

We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages.

Change Detection

StereoFlowGAN: Co-training for Stereo and Flow with Unsupervised Domain Adaptation

no code implementations 4 Sep 2023 Zhexiao Xiong, Feng Qiao, Yu Zhang, Nathan Jacobs

We introduce a novel training strategy for stereo matching and optical flow estimation that utilizes image-to-image translation between synthetic and real image domains.

Image-to-Image Translation Optical Flow Estimation +3

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

1 code implementation 3 Sep 2023 Yue Zhang, Yafu Li, Leyang Cui, Deng Cai, Lemao Liu, Tingchen Fu, Xinting Huang, Enbo Zhao, Yu Zhang, Yulong Chen, Longyue Wang, Anh Tuan Luu, Wei Bi, Freda Shi, Shuming Shi

While large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks, a significant concern revolves around their propensity to exhibit hallucinations: LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.

Hallucination World Knowledge

Occlusion-Aware Detection and Re-ID Calibrated Network for Multi-Object Tracking

no code implementations 30 Aug 2023 Yukun Su, Ruizhou Sun, Xin Shu, Yu Zhang, Qingyao Wu

Multi-Object Tracking (MOT) is a crucial computer vision task that aims to predict the bounding boxes and identities of objects simultaneously.

Multi-Object Tracking Object

Dual-Balancing for Multi-Task Learning

1 code implementation 23 Aug 2023 Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu, James T. Kwok

Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields.

Multi-Task Learning

High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark

1 code implementation 16 Aug 2023 Ben Chen, Xuechao Zou, Kai Li, Yu Zhang, Junliang Xing, Pin Tao

Lake extraction from remote sensing imagery is a complex challenge due to the varied lake shapes and data noise.

Forward-Backward Reasoning in Large Language Models for Mathematical Verification

no code implementations 15 Aug 2023 Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok

Instead of using forward or backward reasoning alone, we propose FOBAR to combine FOrward and BAckward Reasoning for verification.

Mathematical Reasoning

DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images

1 code implementation 8 Aug 2023 Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, Pin Tao

Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis.

Cloud Removal Image Generation

Decomposing and Coupling Saliency Map for Lesion Segmentation in Ultrasound Images

no code implementations 2 Aug 2023 Zhenyuan Ning, Yixiao Mao, Qianjin Feng, Shengzhou Zhong, Yu Zhang

The complex nature of ultrasound images, in which adjacent tissues (i.e., background) share similar intensity with, and may even contain richer texture patterns than, the lesion region (i.e., foreground), poses a unique challenge for accurate lesion segmentation.

Dimensionality Reduction Disentanglement +2

MATNilm: Multi-appliance-task Non-intrusive Load Monitoring with Limited Labeled Data

1 code implementation 27 Jul 2023 Jing Xiong, Tianqi Hong, Dongbo Zhao, Yu Zhang

Non-intrusive load monitoring (NILM) identifies the status and power consumption of various household appliances by disaggregating the total power usage signal of an entire house.

energy management Non-Intrusive Load Monitoring
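
Disaggregation of this kind usually rests on an additive model of the aggregate signal. The formulation below is a common textbook assumption stated for illustration, with our own notation rather than the paper's: $P(t)$ is the whole-house power, $s_i(t)$ the on/off status of appliance $i$, $p_i(t)$ its power draw when on, and $\varepsilon(t)$ unmetered loads and noise. NILM then estimates each $s_i(t)$ and $s_i(t)\,p_i(t)$ from $P(t)$ alone.

```latex
P(t) = \sum_{i=1}^{N} s_i(t)\, p_i(t) + \varepsilon(t), \qquad s_i(t) \in \{0, 1\}
```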

A Dual-mode Local Search Algorithm for Solving the Minimum Dominating Set Problem

no code implementations 25 Jul 2023 Enqiang Zhu, Yu Zhang, Shengzhi Wang, Darren Strash, Chanjuan Liu

Given a graph, the minimum dominating set (MinDS) problem is to identify a smallest set $D$ of vertices such that every vertex not in $D$ is adjacent to at least one vertex in $D$.
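
The definition above can be checked directly, and the classic greedy heuristic gives a valid (though not necessarily minimum) dominating set to compare against. This is a baseline sketch only, not the paper's dual-mode local search; the adjacency-dict graph representation and the function names are assumptions for illustration.

```python
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # vertex -> set of neighbours (undirected)


def is_dominating_set(graph: Graph, d: Iterable[int]) -> bool:
    """True if every vertex is in D or adjacent to at least one vertex in D."""
    d = set(d)
    return all(v in d or graph[v] & d for v in graph)


def greedy_dominating_set(graph: Graph) -> Set[int]:
    """Repeatedly pick the vertex that dominates the most not-yet-dominated vertices."""
    undominated = set(graph)
    d: Set[int] = set()
    while undominated:
        # A vertex dominates itself and its neighbours; ties go to the first vertex in dict order.
        best = max(graph, key=lambda v: len(({v} | graph[v]) & undominated))
        d.add(best)
        undominated -= {best} | graph[best]
    return d


# Path graph 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(greedy_dominating_set(path))   # {1, 3}
print(is_dominating_set(path, {0, 4}))  # False: vertex 2 is not dominated
```

Local-search methods for MinDS typically start from such a feasible solution and then try to remove or swap vertices while keeping `is_dominating_set` true.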

"It Felt Like Having a Second Mind": Investigating Human-AI Co-creativity in Prewriting with Large Language Models

no code implementations 20 Jul 2023 Qian Wan, Siying Hu, Yu Zhang, Piaohong Wang, Bo Wen, Zhicong Lu

This collaborative process champions the human in a dominant role, in addition to mixed and shifting levels of initiative that exist between humans and LLMs.

Real-World Evaluation of Full-Duplex Millimeter Wave Communication Systems

no code implementations 20 Jul 2023 Ian P. Roberts, Yu Zhang, Tawfik Osman, Ahmed Alkhateeb

Noteworthy strides continue to be made in the development of full-duplex millimeter wave (mmWave) communication systems, but most of this progress has been built on theoretical models and validated through simulation.

Weakly Supervised Multi-Label Classification of Full-Text Scientific Papers

1 code implementation 24 Jun 2023 Yu Zhang, Bowen Jin, Xiusi Chen, Yanzhen Shen, Yunyi Zhang, Yu Meng, Jiawei Han

Instead of relying on human-annotated training samples to build a classifier, weakly supervised scientific paper classification aims to classify papers only using category descriptions (e.g., category names, category-indicative keywords).

Multi-Label Classification

FlowFace++: Explicit Semantic Flow-supervised End-to-End Face Swapping

no code implementations 22 Jun 2023 Yu Zhang, Hao Zeng, Bowen Ma, Wei zhang, Zhimeng Zhang, Yu Ding, Tangjie Lv, Changjie Fan

The discriminator is shape-aware and relies on a semantic flow-guided operation to explicitly calculate the shape discrepancies between the target and source faces, thus optimizing the face swapping network to generate highly realistic results.

Face Swapping

Learning Variable Impedance Skills from Demonstrations with Passivity Guarantee

no code implementations 20 Jun 2023 Yu Zhang, Long Cheng, Xiuze Xia, Haoyu Zhang

The proposed approach involves the estimation of full stiffness matrices from human demonstrations, which are then combined with sensed forces and motion information to create a model using the non-parametric method.

PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer

no code implementations 13 Jun 2023 Xu Han, Bin Guo, Yoon Jung, Benjamin Yao, Yu Zhang, Xiaohu Liu, Chenlei Guo

Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency.

Response Generation Transfer Learning

Efficient Adapters for Giant Speech Models

no code implementations 13 Jun 2023 Nanxin Chen, Izhak Shafran, Yu Zhang, Chung-Cheng Chiu, Hagen Soltau, James Qin, Yonghui Wu

However, finetuning all parameters of the self-supervised learned model can be computationally expensive, and becomes infeasible as the size of the model and the number of downstream tasks scale.

A Graph Transformer-Driven Approach for Network Robustness Learning

no code implementations 12 Jun 2023 Yu Zhang, Jia Li, Jie Ding, Xiang Li

Learning and analysis of network robustness, including controllability robustness and connectivity robustness, are critical for protecting various networked systems against attacks.
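
Connectivity robustness is commonly quantified by removing nodes one at a time (for example, in an attack order) and tracking the fraction of nodes that remain in the largest connected component. The sketch below assumes that definition; the adjacency-dict graph format, the function names, and the star graph are hypothetical illustrations, not the paper's graph transformer.

```python
from collections import deque

def largest_component_size(adj, removed):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen, best = set(removed), 0  # removed nodes are treated as already visited
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def connectivity_curve(adj, attack_order):
    """Fraction of nodes in the largest component after each successive node removal."""
    n, removed, curve = len(adj), set(), []
    for node in attack_order:
        removed.add(node)
        curve.append(largest_component_size(adj, removed) / n)
    return curve

# Hypothetical 5-node star graph: removing the hub first disconnects every leaf
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
curve = connectivity_curve(star, attack_order=[0, 1, 2, 3, 4])
print(curve)  # [0.2, 0.2, 0.2, 0.2, 0.0]
```

A single robustness score can then be taken as, for example, the mean of the curve; a lower curve means the network fragments faster under that attack order.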

Gotta: Generative Few-shot Question Answering by Prompt-based Cloze Data Augmentation

1 code implementation 7 Jun 2023 Xiusi Chen, Yu Zhang, Jinliang Deng, Jyun-Yu Jiang, Wei Wang

Few-shot question answering (QA) aims at precisely discovering answers to a set of questions from context passages while only a few training samples are available.

Data Augmentation Question Answering

COPR: Consistency-Oriented Pre-Ranking for Online Advertising

no code implementations 6 Jun 2023 Zhishan Zhao, Jingyue Gao, Yu Zhang, Shuguang Han, Siyuan Lou, Xiang-Rong Sheng, Zhe Wang, Han Zhu, Yuning Jiang, Jian Xu, Bo Zheng

In this architecture, the pre-ranking model is expected to be a lightweight approximation of the ranking model, which handles more candidates with strict latency requirements.

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection

1 code implementation CVPR 2023 Yingjie Wang, Jiajun Deng, Yao Li, Jinshui Hu, Cong Liu, Yu Zhang, Jianmin Ji, Wanli Ouyang, Yanyong Zhang

LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints.

object-detection Object Detection

Effective Structured Prompting by Meta-Learning and Representative Verbalizer

1 code implementation 1 Jun 2023 Weisen Jiang, Yu Zhang, James T. Kwok

Combining meta-learning the prompt pool and RepVerb, we propose MetaPrompter for effective structured prompting.


How to Estimate Model Transferability of Pre-Trained Speech Models?

1 code implementation 1 Jun 2023 Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-Yi Lee, Tara N. Sainath

In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks.

Explanation Graph Generation via Generative Pre-training over Synthetic Graphs

1 code implementation 1 Jun 2023 Han Cui, Shangzhan Li, Yu Zhang, Qi Shi

The generation of explanation graphs is a significant task that aims to produce explanation graphs in response to user input, revealing the internal reasoning process.

Graph Generation Language Modelling

LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus

no code implementations 30 May 2023 Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

The constituent samples of LibriTTS-R are identical to those of LibriTTS, with only the sound quality improved.

Mixture-of-Expert Conformer for Streaming Multilingual ASR

no code implementations 25 May 2023 Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays

We evaluate the proposed model on a set of 12 languages and achieve an average 11.9% relative improvement in WER over the baseline.

Automatic Speech Recognition speech-recognition +1
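
The average gain reported above is a relative, not absolute, WER reduction. A minimal sketch of the arithmetic, using hypothetical WER values that are not taken from the paper:

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of new_wer over baseline_wer, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hypothetical example: a baseline WER of 10.0% reduced to 8.81%
print(round(relative_wer_improvement(10.0, 8.81), 1))  # 11.9
```

Under this definition, an 11.9% relative gain on a 10% baseline corresponds to only about 1.2 absolute WER points.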

PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training

1 code implementation 23 May 2023 Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han

Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts.

Pseudo Label Sentiment Analysis +3

Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding

no code implementations 23 May 2023 Yu Zhang, Hao Cheng, Zhihong Shen, Xiaodong Liu, Ye-Yi Wang, Jianfeng Gao

Scientific literature understanding tasks have gained significant attention due to their potential to accelerate scientific discovery.

Citation Prediction Contrastive Learning

Patton: Language Model Pretraining on Text-Rich Networks

no code implementations 20 May 2023 Bowen Jin, Wentao Zhang, Yu Zhang, Yu Meng, Xinyang Zhang, Qi Zhu, Jiawei Han

A real-world text corpus sometimes comprises not only text documents but also semantic links between them (e.g., academic papers in a bibliographic network are linked by citations and co-authorships).

Language Modelling Masked Language Modeling +1

Temporal Consistent Automatic Video Colorization via Semantic Correspondence

1 code implementation 13 May 2023 Yu Zhang, Siqi Chen, Mingdao Wang, Xianlin Zhang, Chuang Zhu, Yue Zhang, Xueming Li

Extensive experiments demonstrate that our method outperforms other methods in maintaining temporal consistency both qualitatively and quantitatively.

Colorization Image Colorization +1

A Self-Training Framework Based on Multi-Scale Attention Fusion for Weakly Supervised Semantic Segmentation

1 code implementation 10 May 2023 Guoqing Yang, Chuang Zhu, Yu Zhang

Weakly supervised semantic segmentation (WSSS) based on image-level labels is challenging since it is hard to obtain complete semantic regions.

Denoising Weakly supervised Semantic Segmentation +1

A Unifying Framework of Attention-based Neural Load Forecasting

1 code implementation 8 May 2023 Jing Xiong, Yu Zhang

In this paper, we propose a unifying deep learning framework for load forecasting, which includes time-varying feature weighting, hierarchical temporal attention, and feature-reinforced error correction.

Load Forecasting

Chain-of-Skills: A Configurable Model for Open-domain Question Answering

1 code implementation 4 May 2023 Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

Our approach outperforms recent self-supervised retrievers in zero-shot evaluations and achieves state-of-the-art fine-tuned retrieval performance on NQ, HotpotQA and OTT-QA.

Open-Domain Question Answering Retrieval +1

Transforming Visual Scene Graphs to Image Captions

1 code implementation 3 May 2023 Xu Yang, Jiawei Peng, Zihua Wang, Haiyang Xu, Qinghao Ye, Chenliang Li, Songfang Huang, Fei Huang, Zhangzikang Li, Yu Zhang

In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs.

Attribute Descriptive +1

An Adaptive Policy to Employ Sharpness-Aware Minimization

no code implementations 28 Apr 2023 Weisen Jiang, Hansi Yang, Yu Zhang, James Kwok

Sharpness-aware minimization (SAM), which searches for flat minima by min-max optimization, has been shown to be useful in improving model generalization.
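
For reference, the standard SAM objective is $\min_w \max_{\|\epsilon\| \le \rho} L(w + \epsilon)$, whose inner maximization is usually approximated to first order by $\hat\epsilon = \rho \nabla L(w) / \|\nabla L(w)\|$; the weights are then updated with the gradient taken at $w + \hat\epsilon$. The sketch below runs that plain SAM update on a hypothetical two-parameter quadratic loss. The loss, the learning rate `lr`, and the radius `rho` are assumptions, and the paper's adaptive policy for deciding when to employ SAM is not modeled.

```python
import math

def loss(w):
    # Hypothetical toy loss L(w) = 0.5 * (w0**2 + 10 * w1**2)
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)

def grad(w):
    # Analytic gradient of the toy loss
    return [w[0], 10.0 * w[1]]

def sam_step(w, lr=0.05, rho=0.05):
    """One plain SAM step: perturb toward the worst case, then descend from w."""
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # guard against a zero gradient
    eps = [rho * gi / norm for gi in g]                # first-order solution of the inner max
    g_adv = grad([wi + ei for wi, ei in zip(w, eps)])  # gradient at the perturbed point
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w)
print(loss([1.0, 1.0]), loss(w))
```

With a fixed `rho`, the iterates in this sketch settle in a small neighborhood of the minimum rather than converging exactly, because the perturbation keeps a constant radius.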

Understanding Shared Speech-Text Representations

no code implementations 27 Apr 2023 Gary Wang, Kyle Kastner, Ankur Bapna, Zhehuai Chen, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang

Recently, a number of approaches have been developed that train speech models by incorporating text into end-to-end models, with Maestro advancing state-of-the-art automatic speech recognition (ASR) and speech translation (ST) performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Detection of Pavement Cracks by Deep Learning Models of Transformer and UNet

no code implementations 25 Apr 2023 Yu Zhang, Lin Zhang

In this study, we investigated nine promising models and evaluated their performance in pavement surface crack detection in terms of model accuracy, computational complexity, and model stability.

Mastering Asymmetrical Multiplayer Game with Multi-Agent Asymmetric-Evolution Reinforcement Learning

no code implementations 20 Apr 2023 Chenglu Sun, Yichi Zhang, Yu Zhang, Ziling Lu, Jingbin Liu, Sijia Xu, Weidong Zhang

We propose asymmetric-evolution training (AET), a novel multi-agent reinforcement learning framework that can train multiple kinds of agents simultaneously in asymmetrical multiplayer (AMP) games.

Multi-agent Reinforcement Learning reinforcement-learning

Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the Heads

no code implementations 16 Apr 2023 Yu Zhang, Huaming Chen, Wei Bao, Zhongzheng Lai, Zao Zhang, Dong Yuan

Identifying and tracking all pedestrians in dense crowd scenes with computer vision approaches is a typical challenge in this field, also known as the Multiple Object Tracking (MOT) challenge.

Multiple Object Tracking object-detection +1

SPColor: Semantic Prior Guided Exemplar-based Image Colorization

1 code implementation 13 Apr 2023 Siqi Chen, Xueming Li, Xianlin Zhang, Mingdao Wang, Yu Zhang, Yue Zhang

Previous methods search for correspondences across the entire reference image, and this type of global matching is prone to mismatches.

Colorization Image Colorization +1

Simplifying Low-Light Image Enhancement Networks with Relative Loss Functions

1 code implementation 6 Apr 2023 Yu Zhang, Xiaoguang Di, Junde Wu, Rao Fu, Yong Li, Yue Wang, Yanwu Xu, Guohui YANG, Chunhui Wang

In this paper, to make the learning easier in low-light image enhancement, we introduce FLW-Net (Fast and LightWeight Network) and two relative loss functions.

Low-Light Image Enhancement

Safe Explicable Planning

no code implementations 4 Apr 2023 Akkamahadevi Hanni, Andrew Boateng, Yu Zhang

The goal of safe explicable planning (SEP) is to find behaviors that align with human expectations while adhering to the specified safety criterion.

Decision Making

Personalized Federated Learning with Local Attention

no code implementations 2 Apr 2023 Sicong Liang, Junchao Tian, Shujun Yang, Yu Zhang

The key challenge of federated learning (FL) is the heterogeneity of local data across clients, such as heterogeneous label distributions and feature shift, which can significantly degrade the performance of the learned models.

Image Classification object-detection +2

Exemplar-based Video Colorization with Long-term Spatiotemporal Dependency

no code implementations 27 Mar 2023 Siqi Chen, Xueming Li, Xianlin Zhang, Mingdao Wang, Yu Zhang, Jiatong Han, Yue Zhang

Exemplar-based video colorization is an essential technique for applications like old movie restoration.


$P^{3}O$: Transferring Visual Representations for Reinforcement Learning via Prompting

no code implementations 22 Mar 2023 Guoliang You, Xiaomeng Chu, Yifan Duan, Jie Peng, Jianmin Ji, Yu Zhang, Yanyong Zhang

In particular, we specify a prompt-transformer for representation conversion and propose a two-step training process to train the prompt-transformer for the target environment, while the rest of the DRL pipeline remains unchanged.


Diffusion-based Target Sampler for Unsupervised Domain Adaptation

no code implementations 17 Mar 2023 Yulong Zhang, Shuhao Chen, Yu Zhang, Jiangang Lu

The generated samples closely simulate the data distribution of the target domain and help existing UDA methods transfer more easily from the source domain to the target domain, thus improving transfer performance.

Unsupervised Domain Adaptation

Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

1 code implementation 3 Mar 2023 Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani

Experiments show that Miipher (i) is robust against various audio degradations and (ii) enables us to train a high-quality text-to-speech (TTS) model from restored speech samples collected from the Web.

Speech Denoising Speech Enhancement

PCR-CG: Point Cloud Registration via Deep Explicit Color and Geometry

1 code implementation 28 Feb 2023 Yu Zhang, Junle Yu, Xiaolin Huang, Wenhui Zhou, Ji Hou

Different from previous methods that only use geometry representation, our module is specifically designed to effectively correlate color into geometry for the point cloud registration task.

Point Cloud Registration

Edgeformers: Graph-Empowered Transformers for Representation Learning on Textual-Edge Networks

1 code implementation 21 Feb 2023 Bowen Jin, Yu Zhang, Yu Meng, Jiawei Han

Edges in many real-world social/information networks are associated with rich text information (e.g., user-user communications or user-product reviews).

Edge Classification Link Prediction +1

JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition

no code implementations 16 Feb 2023 Zhong Meng, Weiran Wang, Rohit Prabhavalkar, Tara N. Sainath, Tongzhou Chen, Ehsan Variani, Yu Zhang, Bo Li, Andrew Rosenberg, Bhuvana Ramabhadran

We propose JEIT, a joint end-to-end (E2E) model and internal language model (ILM) training method that injects large-scale unpaired text into the ILM during E2E training, which improves rare-word speech recognition.

Language Modelling speech-recognition +1

The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study

1 code implementation 7 Feb 2023 Yu Zhang, Bowen Jin, Qi Zhu, Yu Meng, Jiawei Han

Due to the exponential growth of scientific publications on the Web, there is a pressing need to tag each paper with fine-grained topics so that researchers can track their fields of interest rather than drowning in the whole literature.

Language Modelling Multi Label Text Classification +3

TrajMatch: Towards Automatic Spatio-temporal Calibration for Roadside LiDARs through Trajectory Matching

no code implementations 4 Feb 2023 Haojie Ren, Sha Zhang, Sugang Li, Yao Li, Xinchen Li, Jianmin Ji, Yu Zhang, Yanyong Zhang

In this paper, we propose TrajMatch, the first system that can automatically calibrate roadside LiDARs in both time and space.

Physics-guided Residual Learning for Probabilistic Power Flow Analysis

no code implementations 28 Jan 2023 Kejun Chen, Yu Zhang

Probabilistic power flow (PPF) analysis is critical to power system operation and planning.

From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

no code implementations 19 Jan 2023 Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman

In this work, we propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition, which can re-purpose well-trained English automatic speech recognition (ASR) models to recognize other languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Super-Resolution Harmonic Retrieval of Non-Circular Signals

no code implementations 17 Jan 2023 Yu Zhang, Yue Wang, Zhi Tian, Geert Leus, Gong Zhang

This paper proposes a super-resolution harmonic retrieval method for uncorrelated strictly non-circular signals, whose covariance and pseudo-covariance present Toeplitz and Hankel structures, respectively.

Retrieval Super-Resolution
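
To see where the Toeplitz and Hankel structures come from, assume a noise-free, uniformly sampled model $x_n = \sum_k e^{j\phi_k} b_k e^{j\omega_k n}$ with uncorrelated real-valued $b_k$ of power $p_k$ (a strictly non-circular source). Then $E[x_m x_n^*] = \sum_k p_k e^{j\omega_k (m-n)}$ depends only on $m-n$ (Toeplitz), while $E[x_m x_n] = \sum_k p_k e^{j2\phi_k} e^{j\omega_k (m+n)}$ depends only on $m+n$ (Hankel). The sketch below builds both model matrices for hypothetical frequencies, powers, and phases and checks their structure; it illustrates the signal model only, not the paper's retrieval method.

```python
import cmath

def model_matrices(freqs, powers, phases, size):
    """Noise-free covariance R and pseudo-covariance C of the assumed signal model."""
    R = [[sum(p * cmath.exp(1j * w * (m - n)) for w, p in zip(freqs, powers))
          for n in range(size)] for m in range(size)]
    C = [[sum(p * cmath.exp(2j * phi) * cmath.exp(1j * w * (m + n))
              for w, p, phi in zip(freqs, powers, phases))
          for n in range(size)] for m in range(size)]
    return R, C

def is_toeplitz(A, tol=1e-12):
    # Constant along each diagonal: A[m][n] == A[m+1][n+1]
    return all(abs(A[m][n] - A[m + 1][n + 1]) < tol
               for m in range(len(A) - 1) for n in range(len(A) - 1))

def is_hankel(A, tol=1e-12):
    # Constant along each anti-diagonal: A[m][n+1] == A[m+1][n]
    return all(abs(A[m][n + 1] - A[m + 1][n]) < tol
               for m in range(len(A) - 1) for n in range(len(A) - 1))

# Hypothetical two-source example with five samples
R, C = model_matrices(freqs=[0.4, 1.1], powers=[1.0, 0.5], phases=[0.3, -0.7], size=5)
print(is_toeplitz(R), is_hankel(C))  # True True
```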

OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection

no code implementations 13 Jan 2023 Xiaomeng Chu, Jiajun Deng, Yuan Zhao, Jianmin Ji, Yu Zhang, Houqiang Li, Yanyong Zhang

To this end, we propose OA-BEV, a network that can be plugged into the BEV-based 3D object detection framework to bring out the objects by incorporating object-aware pseudo-3D features and depth features.

3D Object Detection Object +1

Learning Trajectory-Word Alignments for Video-Language Tasks

no code implementations ICCV 2023 Xu Yang, Zhangzikang Li, Haiyang Xu, Hanwang Zhang, Qinghao Ye, Chenliang Li, Ming Yan, Yu Zhang, Fei Huang, Songfang Huang

To amend this, we propose a novel TW-BERT to learn Trajectory-Word alignment by a newly designed trajectory-to-word (T2W) attention for solving video-language tasks.

Question Answering Retrieval +4

Range-Nullspace Video Frame Interpolation With Focalized Motion Estimation

no code implementations CVPR 2023 ZHIYANG YU, Yu Zhang, Dongqing Zou, Xijun Chen, Jimmy S. Ren, Shunqing Ren

Continuous-time video frame interpolation is a fundamental technique in computer vision for its flexibility in synthesizing motion trajectories and novel video frames at arbitrary intermediate time steps.

Video Frame Interpolation

PEAL: Prior-Embedded Explicit Attention Learning for Low-Overlap Point Cloud Registration

1 code implementation CVPR 2023 Junle Yu, Luwei Ren, Yu Zhang, Wenhui Zhou, Lili Lin, Guojun Dai

Recently, incorporating the Transformer into point cloud feature representation has achieved great success; such methods usually adopt a self-attention module to learn intra-point-cloud features first, and then use a cross-attention module to exchange features between the input point clouds.

Point Cloud Registration

Adaptive Positional Encoding for Bundle-Adjusting Neural Radiance Fields

no code implementations ICCV 2023 Zelin Gao, Weichen Dai, Yu Zhang

Neural Radiance Fields have shown great potential to synthesize novel views with only a few discrete image observations of the world.

E2NeRF: Event Enhanced Neural Radiance Fields from Blurry Images

1 code implementation ICCV 2023 Yunshan Qi, Lin Zhu, Yu Zhang, Jia Li

To solve this problem, we propose a novel Event-Enhanced NeRF (E2NeRF) that combines data from a bio-inspired event camera and a standard RGB camera.

Deblurring Image Deblurring +2

Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models

no code implementations 19 Dec 2022 Yong Cheng, Yu Zhang, Melvin Johnson, Wolfgang Macherey, Ankur Bapna

We present Mu$^{2}$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Dynamic Sparse Network for Time Series Classification: Learning What to "see"

1 code implementation 19 Dec 2022 Qiao Xiao, Boqian Wu, Yu Zhang, Shiwei Liu, Mykola Pechenizkiy, Elena Mocanu, Decebal Constantin Mocanu

The receptive field (RF), which determines the region of a time series to be "seen" and used, is critical to improving performance in time series classification (TSC).

Time Series Time Series Analysis +1
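
For a fixed stack of 1D convolutions, the receptive field follows standard arithmetic: each layer adds (kernel size - 1) x dilation, scaled by the product of the strides of the layers before it. The sketch below computes that quantity for hypothetical layer configurations; it illustrates the RF concept only and does not model the paper's dynamic sparse network.

```python
def receptive_field(layers):
    """Receptive field (in time steps) of stacked 1D convolutions.

    Each layer is a hypothetical (kernel_size, dilation, stride) tuple.
    """
    rf, jump = 1, 1  # jump: input-step distance between adjacent outputs of the current layer
    for kernel_size, dilation, stride in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf

# Three kernel-3 layers with dilations 1, 2, 4 and stride 1
print(receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)]))  # 15
```

Doubling the dilation per layer, as above, grows the RF exponentially with depth, whereas a fixed kernel without dilation grows it only linearly.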

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts

1 code implementation 12 Dec 2022 Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, Jiawei Han

Instead of mining coherent topics from a given text corpus in a completely unsupervised manner, seed-guided topic discovery methods leverage user-provided seed words to extract distinctive and coherent topics so that the mined topics can better cater to the user's interest.

Language Modelling Word Embeddings

Unsupervised Deep Learning for AC Optimal Power Flow via Lagrangian Duality

no code implementations 7 Dec 2022 Kejun Chen, Shourya Bose, Yu Zhang

Non-convex AC optimal power flow (AC-OPF) is a fundamental optimization problem in power system analysis.

Entity Set Co-Expansion in StackOverflow

no code implementations 5 Dec 2022 Yu Zhang, Yunyi Zhang, Yucheng Jiang, Martin Michalski, Yu Deng, Lucian Popa, ChengXiang Zhai, Jiawei Han

Given a few seed entities of a certain type (e.g., Software or Programming Language), entity set expansion aims to discover an extensive set of entities that share the same type as the seeds.

graph construction Management

Feature Aggregation and Propagation Network for Camouflaged Object Detection

1 code implementation 2 Dec 2022 Tao Zhou, Yi Zhou, Chen Gong, Jian Yang, Yu Zhang

In this paper, we propose a novel Feature Aggregation and Propagation Network (FAP-Net) for camouflaged object detection.

Object object-detection +1

TSGP: Two-Stage Generative Prompting for Unsupervised Commonsense Question Answering

no code implementations 24 Nov 2022 Yueqing Sun, Yu Zhang, Le Qi, Qi Shi

In this paper, we aim to address the above limitation by leveraging the implicit knowledge stored in PrLMs and propose a two-stage prompt-based unsupervised commonsense question answering framework (TSGP).

Answer Generation Question Answering +1

Leveraging per Image-Token Consistency for Vision-Language Pre-training

no code implementations CVPR 2023 Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang, Mingxuan Wang

(2) Under-utilization of the unmasked tokens: CMLM primarily focuses on the masked token but it cannot simultaneously leverage other tokens to learn vision-language associations.

Language Modelling Masked Language Modeling +1

Disentangling Task Relations for Few-shot Text Classification via Self-Supervised Hierarchical Task Clustering

no code implementations 16 Nov 2022 Juan Zha, Zheng Li, Ying WEI, Yu Zhang

However, most prior works assume that all the tasks are sampled from a single data source, which cannot adapt to real-world scenarios where tasks are heterogeneous and lie in different distributions.

Clustering Few-Shot Text Classification +1

TLP: A Deep Learning-based Cost Model for Tensor Program Tuning

1 code implementation 7 Nov 2022 Yi Zhai, Yu Zhang, Shuo Liu, Xiaomeng Chu, Jie Peng, Jianmin Ji, Yanyong Zhang

Instead of extracting features from the tensor program itself, TLP extracts features from the schedule primitives.

Multi-Task Learning

Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning

1 code implementation 6 Nov 2022 Yu Meng, Martin Michalski, Jiaxin Huang, Yu Zhang, Tarek Abdelzaher, Jiawei Han

In this work, we study few-shot learning with PLMs from a different perspective: We first tune an autoregressive PLM on the few-shot samples and then use it as a generator to synthesize a large amount of novel training samples which augment the original training set.

Few-Shot Learning

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition

no code implementations 2 Nov 2022 Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Tara N. Sainath, Sabato Marco Siniscalchi, Chin-Hui Lee

We propose a quantum kernel learning (QKL) framework to address the inherent data sparsity issues often encountered in training large-scale acoustic models in low-resource scenarios.

Spoken Command Recognition

Max Markov Chain

no code implementations 2 Nov 2022 Yu Zhang, Mitchell Bucklew

In this paper, we introduce Max Markov Chain (MMC), a novel representation for a useful subset of High-order Markov Chains (HMCs) with sparse correlations among the states.

MedSegDiff: Medical Image Segmentation with Diffusion Probabilistic Model

2 code implementations 1 Nov 2022 Junde Wu, Rao Fu, Huihui Fang, Yu Zhang, Yehui Yang, Haoyi Xiong, Huiying Liu, Yanwu Xu

Inspired by the success of diffusion probabilistic models (DPMs), we propose MedSegDiff, the first DPM-based model for general medical image segmentation tasks.

Anomaly Detection Brain Tumor Segmentation +8

Modular Hybrid Autoregressive Transducer

no code implementations 31 Oct 2022 Zhong Meng, Tongzhou Chen, Rohit Prabhavalkar, Yu Zhang, Gary Wang, Kartik Audhkhasi, Jesse Emond, Trevor Strohman, Bhuvana Ramabhadran, W. Ronny Huang, Ehsan Variani, Yinghui Huang, Pedro J. Moreno

In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a shared acoustic encoder.

Language Modelling speech-recognition +1
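
In the hybrid autoregressive transducer (HAT) family, the blank probability is typically modeled as a sigmoid of a blank score, and the label probabilities are a softmax over label scores scaled by 1 - P(blank), so the two parts together form one distribution. The sketch below shows that combination for one output step, assuming (as an illustration, not the paper's code) that a separate label decoder supplies `label_logits` and a separate blank decoder supplies `blank_logit`; the values are hypothetical.

```python
import math

def hat_output(label_logits, blank_logit):
    """Combine separate label and blank scores into one distribution, HAT-style.

    Returns (p_blank, p_labels) with p_blank = sigmoid(blank_logit) and
    p_labels[k] = (1 - p_blank) * softmax(label_logits)[k].
    """
    p_blank = 1.0 / (1.0 + math.exp(-blank_logit))
    m = max(label_logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in label_logits]
    total = sum(exps)
    p_labels = [(1.0 - p_blank) * e / total for e in exps]
    return p_blank, p_labels

# Hypothetical scores for a three-label vocabulary
p_blank, p_labels = hat_output([2.0, 0.5, -1.0], blank_logit=0.0)
print(p_blank, p_blank + sum(p_labels))
```

Keeping the label softmax separate from the blank decision is what lets the label part be read as an internal language model score, which is one motivation for structurally separating the two decoders.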

Accelerating RNN-T Training and Inference Using CTC guidance

no code implementations 29 Oct 2022 Yongqiang Wang, Zhehuai Chen, Chengjian Zheng, Yu Zhang, Wei Han, Parisa Haghani

We propose a novel method to accelerate the training and inference of the recurrent neural network transducer (RNN-T) using guidance from a co-trained connectionist temporal classification (CTC) model.

Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation

no code implementations 28 Oct 2022 Nobuyuki Morioka, Heiga Zen, Nanxin Chen, Yu Zhang, Yifan Ding

Adapting a neural text-to-speech (TTS) model to a target speaker typically involves fine-tuning most if not all of the parameters of a pretrained multi-speaker backbone model.

Personalized Dialogue Generation with Persona-Adaptive Attention

1 code implementation 27 Oct 2022 Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang

Persona-based dialogue systems aim to generate consistent responses based on historical context and predefined persona.

Dialogue Generation