The Pyramid of Captions

No code implementations · 1 May 2024 · Delong Chen, Samuel Cahyawijaya, Etsuko Ishii, Ho Shu Chan, Yejin Bang, Pascale Fung

We propose a novel Pyramid of Captions (PoCa) method, which constructs caption pyramids by generating localized captions for zoomed-in image patches and integrating them with global caption information using large language models.

Hallucination, Image Captioning

High-Dimension Human Value Representation in Large Language Models

1 code implementation · 11 Apr 2024 · Samuel Cahyawijaya, Delong Chen, Yejin Bang, Leila Khalatbari, Bryan Wilie, Ziwei Ji, Etsuko Ishii, Pascale Fung

There is an urgent need to understand the scope and nature of the human values injected into large language models before their release.

Language Modelling

Measuring Political Bias in Large Language Models: What Is Said and How It Is Said

No code implementations · 27 Mar 2024 · Yejin Bang, Delong Chen, Nayeon Lee, Pascale Fung

We propose to measure political bias in LLMs by analyzing both the content and the style of the text they generate on political issues.

Subobject-level Image Tokenization

1 code implementation · 22 Feb 2024 · Delong Chen, Samuel Cahyawijaya, Jianfeng Liu, Baoyuan Wang, Pascale Fung

Transformer-based vision models typically tokenize images into fixed-size square patches as input units, which lacks the adaptability to image content and overlooks the inherent pixel grouping structure.

Attribute, Language Modelling

Few-shot Adaptation of Multi-modal Foundation Models: A Survey

No code implementations · 3 Jan 2024 · Fan Liu, Tianshu Zhang, Wenwen Dai, Wenwen Cai, Xiaocong Zhou, Delong Chen

In this survey, we introduce and analyze research advancements in few-shot adaptation methods for multi-modal models, summarize commonly used datasets and experimental setups, and compare the results of different methods.

Domain Generalization, Model Selection

Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model

1 code implementation · 20 Sep 2023 · Xinyu Zhou, Delong Chen, Yudong Chen

This paper explores the potential of constructing an AI spoken dialogue system that "thinks how to respond" and "thinks how to speak" simultaneously, which aligns more closely with the human speech production process than the current cascade pipeline of independent chatbot and Text-to-Speech (TTS) modules.

Chatbot, Language Modelling

Visual Instruction Tuning with Polite Flamingo

2 code implementations · 3 Jul 2023 · Delong Chen, Jianfeng Liu, Wenliang Dai, Baoyuan Wang

A side effect of fine-tuning on raw annotations negatively impacts the model's ability to format responses appropriately (for instance, its "politeness"): because raw annotations are overly succinct and unformatted, the resulting responses are less preferred by humans.

RemoteCLIP: A Vision Language Foundation Model for Remote Sensing

1 code implementation · 19 Jun 2023 · Fan Liu, Delong Chen, Zhangqingyun Guan, Xiaocong Zhou, Jiale Zhu, Qiaolin Ye, Liyong Fu, Jun Zhou

However, existing remote sensing foundation models primarily learn low-level features and require annotated data for fine-tuning.

Ranked #3 on Cross-Modal Retrieval on RSITMD (using extra training data)

Classification, Cross-Modal Retrieval

Taming Diffusion Models for Music-driven Conducting Motion Generation

1 code implementation · 15 Jun 2023 · Zhuoran Zhao, Jinbin Bai, Delong Chen, Debang Wang, Yubo Pan

Generating the motion of orchestral conductors from a given piece of symphony music is a challenging task since it requires a model to learn semantic music features and capture the underlying distribution of real conducting motion.


Few-shot Classification via Ensemble Learning with Multi-Order Statistics

No code implementations · 30 Apr 2023 · Sai Yang, Fan Liu, Delong Chen, Jun Zhou

We prove theoretically that leveraging ensemble learning on the base classes can correspondingly reduce the true error in the novel classes.

Classification, Diversity

ProtoCLIP: Prototypical Contrastive Language Image Pretraining

1 code implementation · 22 Jun 2022 · Delong Chen, Zhao Wu, Fan Liu, Zaiquan Yang, Huaxi Huang, Ying Tan, Erjin Zhou

This paper introduces Prototypical Contrastive Language Image Pretraining (ProtoCLIP) to enhance representation grouping by improving its efficiency and increasing its robustness against the modality gap.

Zero-Shot Learning

A Simple Baseline for Adversarial Domain Adaptation-based Unsupervised Flood Forecasting

1 code implementation · 16 Jun 2022 · Delong Chen, Ruizhi Zhou, Yanling Pan, Fan Liu

Training of the proposed FloodDAN includes two stages: in the first stage, we train a rainfall encoder and a prediction head on large-scale source-domain data to learn general, transferable hydrological knowledge; in the second stage, we transfer the knowledge in the pretrained encoder to the rainfall encoder of the target domain through adversarial domain alignment.

Unsupervised Domain Adaptation

Self-Supervised Music-Motion Synchronization Learning for Music-Driven Conducting Motion Generation

1 code implementation · Journal of Computer Science and Technology 2022 · Fan Liu, Delong Chen, Ruizhi Zhou, Sai Yang, Feng Xu

We propose a novel Music Motion Synchronized Generative Adversarial Network (M2S-GAN), which generates motions according to automatically learned music representations.

Generative Adversarial Network

Survey of Hallucination in Natural Language Generation

No code implementations · 8 Feb 2022 · Ziwei Ji, Nayeon Lee, Rita Frieske, Tiezheng Yu, Dan Su, Yan Xu, Etsuko Ishii, Yejin Bang, Delong Chen, Wenliang Dai, Ho Shu Chan, Andrea Madotto, Pascale Fung

Recent advances in natural language generation (NLG) models have produced more fluent and coherent text, benefiting downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation.

Abstractive Text Summarization, Data-to-Text Generation

VirtualConductor: Music-driven Conducting Video Generation System

2 code implementations · 28 Jul 2021 · Delong Chen, Fan Liu, Zewen Li, Feng Xu

In this demo, we present VirtualConductor, a system that can generate a conducting video from any given piece of music and a single image of the user.

Pose Transfer, Video Generation

Significant Wave Height Prediction based on Wavelet Graph Neural Network

1 code implementation · 20 Jul 2021 · Delong Chen, Fan Liu, Zheqi Zhang, Xiaomin Lu, Zewen Li

Several parallel graph neural networks are trained separately on wavelet-decomposed data, and the reconstruction of each model's prediction forms the final significant wave height (SWH) prediction.

BIG-bench Machine Learning, Graph Neural Network

A Review of Automated Diagnosis of COVID-19 Based on Scanning Images

No code implementations · 9 Jun 2020 · Delong Chen, Shunhui Ji, Fan Liu, Zewen Li, Xinyu Zhou

The COVID-19 pandemic has caused millions of infections, leading to great social and economic losses worldwide.

Computed Tomography (CT), Domain Adaptation

Deep Learning Based Single Sample Per Person Face Recognition: A Survey

No code implementations · 9 Jun 2020 · Fan Liu, Delong Chen, Fei Wang, Zewen Li, Feng Xu

Face recognition when only a single training sample is available per person is referred to as single sample face recognition, and it poses significant challenges to the effective training of deep models.

Domain Adaptation, Face Recognition

