Search Results for author: Fuzhao Xue

Found 23 papers, 11 papers with code

Boosting LLM via Learning from Data Iteratively and Selectively

1 code implementation · 23 Dec 2024 · Qi Jia, Siyu Ren, Ziheng Qin, Fuzhao Xue, Jinjie Ni, Yang You

On the other hand, the diversity score is defined over the samples' responses, taking their informativeness into account.

Diversity · Informativeness

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

no code implementations · 17 Oct 2024 · Jinjie Ni, YiFan Song, Deepanway Ghosal, Bo Li, David Junhao Zhang, Xiang Yue, Fuzhao Xue, Zian Zheng, Kaichen Zhang, Mahir Shah, Kabir Jain, Yang You, Michael Shieh

Perceiving and generating diverse modalities are crucial for AI models to effectively learn from and engage with real-world signals, necessitating reliable evaluations for their development.

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

1 code implementation · 19 Aug 2024 · Yukang Chen, Fuzhao Xue, Dacheng Li, Qinghao Hu, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han

We introduce the long-context Multi-Modal Sequence Parallelism (MM-SP) system that efficiently parallelizes long video training and inference, enabling 2M context length training on 256 GPUs without any gradient checkpointing.

Video Captioning · Video Question Answering +1
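
For intuition, the sketch below shows the sequence-dimension sharding that long-context systems of this kind build on: one long token sequence split into contiguous per-rank chunks. It is a minimal single-process illustration with made-up names and sizes, not the MM-SP implementation itself.

```python
# Minimal sketch of sequence-dimension sharding (illustrative only).
import numpy as np

def shard_sequence(tokens: np.ndarray, world_size: int) -> list[np.ndarray]:
    """Split one long token sequence into contiguous per-rank chunks."""
    assert len(tokens) % world_size == 0, "pad the sequence first"
    return np.split(tokens, world_size)

# Toy scale; in the paper's setting the sequence would be ~2M tokens on 256 GPUs.
tokens = np.arange(1024)
chunks = shard_sequence(tokens, world_size=8)
# Each simulated rank now holds 1/8 of the sequence. Attention across chunks
# then requires exchanging keys/values between ranks, which is exactly the
# communication such a system must make efficient.
print([c.shape for c in chunks])
```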

MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

no code implementations · 3 Jun 2024 · Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You

Our benchmarks' advantages lie in (1) a 0.96 model ranking correlation with Chatbot Arena, arising from the highly impartial query distribution and grading mechanism, (2) fast, cheap, and reproducible execution (6% of the time and cost of MMLU), and (3) dynamic evaluation enabled by the rapid and stable data update pipeline.

Chatbot · MMLU
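
For readers unfamiliar with the headline metric, the snippet below illustrates what a model ranking correlation measures: a rank correlation between two benchmarks' orderings of the same models. Using Spearman's rho is my assumption, and the scores are invented for illustration, not MixEval's data.

```python
# Rank correlation between two benchmarks' model orderings (toy numbers).
from scipy.stats import spearmanr

mixeval_scores = [81.2, 75.4, 70.1, 66.3, 60.8]  # hypothetical per-model scores
arena_elo = [1251, 1189, 1162, 1120, 1085]       # the same models on Chatbot Arena
rho, _ = spearmanr(mixeval_scores, arena_elo)
print(f"model ranking correlation: {rho:.2f}")   # 1.00 on this toy data; MixEval reports 0.96
```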

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

1 code implementation · 29 Jan 2024 · Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You

To help the open-source community have a better understanding of Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to over 1T tokens.

Decoder · Mixture-of-Experts

Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline

1 code implementation · NeurIPS 2023 · Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You

By leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches.

Quantization · Scheduling
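
A minimal sketch of the scheduling idea, assuming some response-length predictor is available (the stub below just reuses prompt length, which is not the paper's perception module): sort queries by predicted response length, then cut the sorted list into micro-batches so each batch pads only to its own, similar, maximum.

```python
# Length-aware micro-batching sketch; `predict_len` is a hypothetical stand-in
# for the paper's LLM-based response length perception.
from typing import Callable

def schedule(queries: list[str],
             predict_len: Callable[[str], int],
             batch_size: int) -> list[list[str]]:
    """Group queries with similar predicted response lengths into micro-batches."""
    ordered = sorted(queries, key=predict_len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# Illustrative stub: pretend longer prompts yield longer answers.
batches = schedule(["hi", "summarize this long report ...", "ok?"],
                   predict_len=len, batch_size=2)
print(batches)  # short-answer queries batched together, long ones together
```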

Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention

1 code implementation · Tiny Papers @ ICLR 2023 · Xiao Liu, Jian Zhang, Heng Zhang, Fuzhao Xue, Yang You

We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification.

Dialogue Act Classification · Dialogue Understanding +2

Adaptive Computation with Elastic Input Sequence

1 code implementation · 30 Jan 2023 · Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You

However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty.

Inductive Bias

Large-Scale Deep Learning Optimizations: A Comprehensive Survey

no code implementations · 1 Nov 2021 · Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You

Deep learning has achieved promising results on a wide spectrum of AI applications.

Deep Learning · Survey

Cross-token Modeling with Conditional Computation

no code implementations · 5 Sep 2021 · Yuxuan Lou, Fuzhao Xue, Zangwei Zheng, Yang You

Mixture-of-Experts (MoE), a conditional computation architecture, has achieved promising performance by scaling the local module (i.e., the feed-forward network) of the transformer.

Computational Efficiency · Image Classification +1
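
As a refresher on the mechanism the snippet names, here is a minimal top-1 Mixture-of-Experts feed-forward layer in NumPy. It is an illustrative sketch only; practical MoE layers add load-balancing losses, expert capacity limits, and top-2 routing variants.

```python
# Top-1 MoE feed-forward: each token is routed to exactly one expert FFN,
# so compute grows with the number of tokens, not the number of experts.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, n_tokens = 16, 64, 4, 8

W_gate = rng.normal(size=(d_model, n_experts))          # router
experts = [(rng.normal(size=(d_model, d_ff)),           # per-expert FFN weights
            rng.normal(size=(d_ff, d_model))) for _ in range(n_experts)]

x = rng.normal(size=(n_tokens, d_model))
logits = x @ W_gate
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # softmax gate
choice = probs.argmax(-1)                                       # top-1 expert per token

y = np.zeros_like(x)
for e, (W1, W2) in enumerate(experts):
    mask = choice == e
    # Only the tokens routed here pass through this expert, scaled by the gate.
    y[mask] = probs[mask, e:e + 1] * (np.maximum(x[mask] @ W1, 0.0) @ W2)
print(y.shape)
```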

Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization

no code implementations · 10 Aug 2021 · Andrew Koh, Fuzhao Xue, Eng Siong Chng

In this paper, we examine the use of Transfer Learning using Pretrained Audio Neural Networks (PANNs), and propose an architecture that is able to better leverage the acoustic features provided by PANNs for the Automated Audio Captioning Task.

Audio Captioning · Decoder +1

Go Wider Instead of Deeper

1 code implementation · 25 Jul 2021 · Fuzhao Xue, Ziji Shi, Futao Wei, Yuxuan Lou, Yong Liu, Yang You

To achieve better performance with fewer trainable parameters, recent methods propose going shallower, sharing parameters or compressing the model along the depth dimension.

Image Classification · Mixture-of-Experts
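
The parameter-sharing recipe the snippet alludes to can be sketched in a few lines: one block's weights reused at every depth, with normalization parameters kept independent per layer. This is a hedged illustration of the general idea, not the paper's exact model.

```python
# Cross-layer parameter sharing: the FFN weights are shared across depth,
# so the parameter count no longer grows with the number of layers.
import numpy as np

rng = np.random.default_rng(0)
d, depth, n_tokens = 16, 6, 4

W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))  # shared FFN
gains = rng.normal(size=(depth, d))  # per-layer norm parameters stay separate

def layer_norm(x, g):
    mu, sigma = x.mean(-1, keepdims=True), x.std(-1, keepdims=True)
    return g * (x - mu) / (sigma + 1e-5)

x = rng.normal(size=(n_tokens, d))
for layer in range(depth):
    h = layer_norm(x, gains[layer])       # independent norm per layer
    x = x + np.maximum(h @ W1, 0.0) @ W2  # same FFN weights every layer
print(x.shape)
```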

Sequence Parallelism: Long Sequence Training from System Perspective

no code implementations · 26 May 2021 · Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You

That is, with sparse attention, our sequence parallelism enables us to train transformers on infinitely long sequences.
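
The sketch below is a single-process simulation of ring-style sequence parallelism as I read the approach (names and shapes are hypothetical): each rank keeps the queries for its own sequence chunk while key/value chunks circulate around a ring, so after one full rotation every query has attended to every key.

```python
# Ring-style sequence parallelism, simulated in one process. A real system
# overlaps this communication with compute and can combine it with sparse
# attention; this is only a shape-level illustration.
import numpy as np

rng = np.random.default_rng(0)
world, chunk, d = 4, 8, 16  # 4 simulated ranks, 8 tokens per rank
Q = [rng.normal(size=(chunk, d)) for _ in range(world)]
K = [rng.normal(size=(chunk, d)) for _ in range(world)]
V = [rng.normal(size=(chunk, d)) for _ in range(world)]

scores = [np.zeros((chunk, world * chunk)) for _ in range(world)]
for step in range(world):            # after `world` ring steps, all KV chunks seen
    for rank in range(world):
        src = (rank + step) % world  # the KV chunk arriving at this rank now
        scores[rank][:, src * chunk:(src + 1) * chunk] = Q[rank] @ K[src].T / np.sqrt(d)

out = []
for rank in range(world):
    w = np.exp(scores[rank] - scores[rank].max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)    # softmax over the full, distributed key axis
    out.append(sum(w[:, s * chunk:(s + 1) * chunk] @ V[s] for s in range(world)))
print(out[0].shape)                  # each rank ends with outputs for its own queries
```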

Recent Advances in Deep Learning Based Dialogue Systems: A Systematic Survey

no code implementations · 10 May 2021 · Jinjie Ni, Tom Young, Vlad Pandelea, Fuzhao Xue, Erik Cambria

To the best of our knowledge, this survey is currently the most comprehensive and up-to-date one for deep learning based dialogue systems, extensively covering the popular techniques.

Information Retrieval · Question Answering

GDPNet: Refining Latent Multi-View Graph for Relation Extraction

1 code implementation · 12 Dec 2020 · Fuzhao Xue, Aixin Sun, Hao Zhang, Eng Siong Chng

Recent advances on the RE task come from BERT-based sequence modeling and graph-based modeling of relationships among the tokens in the sequence.

Ranked #4 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)

Dialog Relation Extraction · Dynamic Time Warping +3

Deep Graph Random Process for Relational-Thinking-Based Speech Recognition

no code implementations · ICML 2020 · Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang

Lying at the core of human intelligence, relational thinking is characterized by initially relying on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, consequently becoming a recognizable concept or object through coupling and transformation of these percepts.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1
