Search Results for author: Kai Hu

Found 35 papers, 11 papers with code

Mutagenesis screen to map the functions of parameters of Large Language Models

no code implementations • 21 Aug 2024 • Yue Hu, Kai Hu, Patrick X. Zhao, Javed Khan, Chengming Xu

Large Language Models (LLMs) have significantly advanced artificial intelligence, excelling in numerous tasks.

Descriptive

Empowering Graph Invariance Learning with Deep Spurious Infomax

1 code implementation • 13 Jul 2024 • Tianjun Yao, Yongqiang Chen, Zhenhao Chen, Kai Hu, Zhiqiang Shen, Kun Zhang

To bridge this gap, we introduce a novel graph invariance learning paradigm, which induces a robust and general inductive bias.

Inductive Bias

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

no code implementations • 7 Jul 2024 • Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan

Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis.

Language Modelling, Large Language Model (+6 more)

Slight Corruption in Pre-training Data Makes Better Diffusion Models

no code implementations • 30 May 2024 • Hao Chen, Yujin Han, Diganta Misra, Xiang Li, Kai Hu, Difan Zou, Masashi Sugiyama, Jindong Wang, Bhiksha Raj

Diffusion models benefit significantly from extensive pre-training on large-scale datasets, including web-crawled data that pairs data with conditions, such as image-text and image-class pairs.

RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection

1 code implementation • 30 May 2024 • Fangyi Chen, Han Zhang, Zhantao Yang, Hao Chen, Kai Hu, Marios Savvides

Open-vocabulary object detection (OVD) requires solid modeling of the region-semantic relationship, which could be learned from massive region-text pairs.

Ranked #11 on Open Vocabulary Object Detection on LVIS v1.0 (using extra training data)

Image Captioning, Image Inpainting (+4 more)

DLAFormer: An End-to-End Transformer For Document Layout Analysis

no code implementations • 20 May 2024 • Jiawei Wang, Kai Hu, Qiang Huo

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, supporting applications such as information retrieval, document summarization, and knowledge extraction.

Document Layout Analysis, Document Summarization (+3 more)

Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

no code implementations • 15 May 2024 • Kai Hu, Weichen Yu, Tianjun Yao, Xiang Li, Wenhe Liu, Lijun Yu, Yining Li, Kai Chen, Zhiqiang Shen, Matt Fredrikson

Our approach relaxes the discrete jailbreak optimization into a continuous optimization and progressively increases the sparsity of the optimizing vectors.

LLM Jailbreak

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

1 code implementation • 22 Jan 2024 • Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo

Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.

Document Layout Analysis, Document Summarization (+4 more)

UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

no code implementations • 17 Jan 2024 • Kai Hu, Jiawei Wang, WeiHong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo

This unified approach allows for the definition of various relation types and effectively tackles hierarchical relationships in form-like documents.

Decoder, Key-value Pair Extraction (+2 more)

Dynamic Relation Transformer for Contextual Text Block Detection

no code implementations • 17 Jan 2024 • Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo

Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks within the complexity of natural scenes.

Decoder, Graph Generation (+2 more)

Is Certifying ℓp Robustness Still Worthwhile?

no code implementations • 13 Oct 2023 • Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson

There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research?

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

2 code implementations • 7 Oct 2023 • Zhihao Du, JiaMing Wang, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

Previous mainstream audio-and-text LLMs use discrete audio tokens to represent both input and output audio; however, they suffer from performance degradation on tasks such as automatic speech recognition, speech-to-text translation, and speech enhancement over models using continuous speech features.

Audio captioning, Automatic Speech Recognition (+13 more)

A Recipe for Improved Certifiable Robustness

1 code implementation • 4 Oct 2023 • Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson

A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training.

Data Augmentation

Completing Visual Objects via Bridging Generation and Segmentation

no code implementations • 1 Oct 2023 • Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu

This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.

Image Generation, Object (+1 more)

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

1 code implementation • 14 Sep 2023 • Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.

Automatic Speech Recognition, speech-recognition (+4 more)

A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images

no code implementations • 17 Apr 2023 • Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, WeiHong Lin, Lei Sun, Qiang Huo

In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images.

Decoder, Key-value Pair Extraction (+1 more)

Unlocking Deterministic Robustness Certification on ImageNet

2 code implementations • NeurIPS 2023 • Kai Hu, Andy Zou, Zifan Wang, Klas Leino, Matt Fredrikson

We show that fast ways of bounding the Lipschitz constant for conventional ResNets are loose, and show how to address this by designing a new residual block, leading to the Linear ResNet (LiResNet) architecture.
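The snippet does not spell out the LiResNet block, so the sketch below is only an illustration under a stated assumption: it treats a residual block as the linear map x -> x + Wx with a made-up weight matrix W, and compares the usual compositional bound 1 + ||W||_2 (identity branch plus residual branch) with the exact Lipschitz constant ||I + W||_2 of that same map. It is not the authors' code or architecture.

```python
import numpy as np

def spectral_norm(mat):
    """Largest singular value: the L2 Lipschitz constant of x -> mat @ x."""
    return float(np.linalg.norm(mat, 2))

rng = np.random.default_rng(1)
d = 64
# Hypothetical residual weight (assumption): roughly -0.5 * I plus small noise.
w = -0.5 * np.eye(d) + 0.01 * rng.standard_normal((d, d))

# Fast compositional bound for x -> x + W x: Lip(identity) + Lip(W) = 1 + ||W||_2.
naive_bound = 1.0 + spectral_norm(w)
# Because the whole block is linear, its exact Lipschitz constant is ||I + W||_2.
exact = spectral_norm(np.eye(d) + w)

print(f"naive bound: {naive_bound:.3f}, exact: {exact:.3f}")
```

Here the residual branch partly cancels the identity, so the triangle-inequality bound overshoots the true constant by roughly a factor of three. A looser bound shrinks any certified radius derived from it, which is why tightness matters for certification.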

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

2 code implementations • CVPR 2023 • Fangyi Chen, Han Zhang, Kai Hu, Yu-Kai Huang, Chenchen Zhu, Marios Savvides

This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage.

Attribute, Object (+2 more)

Contextual Expressive Text-to-Speech

no code implementations • 26 Nov 2022 • Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou

To accomplish this task, we construct a synthetic dataset and develop an effective framework.

Speech Synthesis, Text to Speech

The VolcTrans System for WMT22 Multilingual Machine Translation Task

no code implementations • 20 Oct 2022 • Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, Mingxuan Wang

This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation.

Machine Translation, Translation

Composite FORCE learning of chaotic echo state networks for time-series prediction

no code implementations • 6 Jul 2022 • Yansong Li, Kai Hu, Kohei Nakajima, Yongping Pan

An echo state network (ESN), a kind of recurrent neural network, consists of a fixed reservoir whose neurons are connected randomly and recurrently, and it produces the desired output by training only the output connection weights.

Time Series, Time Series Prediction
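To make the ESN description concrete, here is a minimal NumPy sketch in which the input and reservoir weights are drawn at random and frozen, and only a linear readout is fitted. Assumptions: the readout is fitted offline by ridge regression, and the reservoir size, spectral radius of 0.9, washout length, and toy sine-wave task are arbitrary illustrative choices. This is a plain ESN baseline, not the paper's composite FORCE learning method; FORCE-style methods update the readout online instead.

```python
import numpy as np

def make_reservoir(n_inputs, n_reservoir, spectral_radius=0.9, seed=0):
    """Random input and recurrent weights; these stay fixed and are never trained."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale so the largest |eigenvalue| equals spectral_radius (echo state heuristic).
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w

def run_reservoir(w_in, w, inputs):
    """Drive the reservoir with an input sequence and record its state at each step."""
    x = np.zeros(w.shape[0])
    states = []
    for u in inputs:
        x = np.tanh(w_in @ u + w @ x)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, ridge=1e-6):
    """Fit only the output weights, here by ridge regression."""
    n = states.shape[1]
    return np.linalg.solve(states.T @ states + ridge * np.eye(n), states.T @ targets)

# Toy task: one-step-ahead prediction of a sine wave.
t = 0.1 * np.arange(600)
signal = np.sin(t)
inputs = signal[:-1].reshape(-1, 1)
targets = signal[1:]

w_in, w = make_reservoir(1, 100)
states = run_reservoir(w_in, w, inputs)
washout = 100  # discard initial transient states
w_out = train_readout(states[washout:], targets[washout:])
pred = states[washout:] @ w_out
mse = np.mean((pred - targets[washout:]) ** 2)
print(f"readout MSE: {mse:.2e}")
```

Replacing `train_readout` with an online recursive-least-squares update is the usual route from this batch baseline toward FORCE-style training.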

Enhancing Quality of Pose-varied Face Restoration with Local Weak Feature Sensing and GAN Prior

no code implementations • 28 May 2022 • Kai Hu, Yu Liu, Renhe Liu, Wei Lu, Gang Yu, Bin Fu

In the asymmetric codec, we adopt a mixed multi-path residual block (MMRB) to gradually extract weak texture features of input images, which better preserves the original facial features and avoids hallucinating excessive detail.

Blind Face Restoration, Super-Resolution

ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents

no code implementations • 25 May 2021 • WeiHong Lin, Qifang Gao, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo

In this paper, we propose a new multi-modal backbone network by concatenating a BERTgrid to an intermediate layer of a CNN model, where the input of CNN is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation, named ViBERTgrid.

Image Segmentation, Key Information Extraction (+4 more)

Text to Image Generation with Semantic-Spatial Aware GAN

1 code implementation • CVPR 2022 • Kai Hu, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn

Text-to-image synthesis (T2I) aims to generate photo-realistic images which are semantically consistent with the text descriptions.

Sentence, Sentence Embedding (+2 more)

Contrast and Order Representations for Video Self-Supervised Learning

no code implementations • ICCV 2021 • Kai Hu, Jie Shao, YuAn Liu, Bhiksha Raj, Marios Savvides, Zhiqiang Shen

To address this, we present a contrast-and-order representation (CORP) framework for learning self-supervised video representations that can automatically capture both the appearance information within each frame and temporal information across different frames.

Action Recognition, Self-Supervised Action Recognition Linear (+1 more)

Is normalization indispensable for training deep neural network?

1 code implementation • NeurIPS 2020 • Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, Bhiksha Raj

In this paper, we study what would happen when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation.

General Classification, Image Classification (+5 more)

A Neural Architecture Search based Framework for Liquid State Machine Design

no code implementations • 7 Apr 2020 • Shuo Tian, Lianhua Qu, Kai Hu, Nan Li, Lei Wang, Weixia Xu

By exploring the design space of network architectures and parameters, recent works have demonstrated great potential for improving the accuracy of liquid state machine (LSM) models while keeping complexity low.

Neural Architecture Search

RotationOut as a Regularization Method for Neural Network

no code implementations • 18 Nov 2019 • Kai Hu, Barnabas Poczos

We further use a noise analysis method to interpret the difference between RotationOut and Dropout in co-adaptation reduction.

Higher-order Network for Action Recognition

no code implementations • 19 Nov 2018 • Kai Hu, Bhiksha Raj

Capturing spatiotemporal dynamics is an essential topic in video recognition.

Action Recognition, General Classification (+2 more)

Neural CRF transducers for sequence labeling

no code implementations • 4 Nov 2018 • Kai Hu, Zhijian Ou, Min Hu, Junlan Feng

Conditional random fields (CRFs) have been shown to be one of the most successful approaches to sequence labeling.

Chunking, NER (+2 more)

MSDNN: Multi-Scale Deep Neural Network for Salient Object Detection

no code implementations • 12 Jan 2018 • Fen Xiao, Wenzheng Deng, Liangchan Peng, Chunhong Cao, Kai Hu, Xieping Gao

Salient object detection is a fundamental problem and has received a great deal of attention in computer vision.

Object, object-detection (+2 more)
