Search Results for author: Ge Zhang

Found 125 papers, 68 papers with code

Aligning Generative Language Models with Human Values

no code implementations • Findings (NAACL) 2022 • Ruibo Liu, Ge Zhang, Xinyu Feng, Soroush Vosoughi

Although current large-scale generative language models (LMs) can show impressive insights about factual knowledge, they do not exhibit similar success with respect to human value judgements (e.g., whether or not the generations of an LM are moral).

Text Generation Transfer Learning

AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection

no code implementations • 12 May 2025 • Kai Hua, Steven Wu, Ge Zhang, Ke Shen

Recently, there has been growing interest in collecting reasoning-intensive pretraining data to improve LLMs' complex reasoning ability.

GSM8K HumanEval +2

FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models

1 code implementation • 5 May 2025 • Zhouliang Yu, Ruotian Peng, Keyi Ding, Yizhe Li, Zhongyuan Peng, Minghao Liu, Yifan Zhang, Zheng Yuan, Huajian Xin, Wenhao Huang, Yandong Wen, Ge Zhang, Weiyang Liu

Formal mathematical reasoning remains a critical challenge for artificial intelligence, hindered by limitations of existing benchmarks in scope and scale.

Benchmarking Mathematical Reasoning +1

IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs

1 code implementation • 21 Apr 2025 • David Ma, Yuanxing Zhang, Jincheng Ren, Jarvis Guo, Yifan Yao, Zhenlin Wei, Zhenzhu Yang, Zhongyuan Peng, Boyu Feng, Jun Ma, Xiao Gu, Zhoufutu Wen, King Zhu, Yancheng He, Meng Cao, Shiwen Ni, Jiaheng Liu, Wenhao Huang, Ge Zhang, Xiaojie Jin

Existing evaluation frameworks for Multimodal Large Language Models (MLLMs) primarily focus on image reasoning or general video understanding tasks, largely overlooking the significant role of image context in video comprehension.

Video Understanding

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

1 code implementation • 15 Apr 2025 • Jiazhan Feng, Shijue Huang, Xingwei Qu, Ge Zhang, Yujia Qin, Baoquan Zhong, Chengquan Jiang, Jinxin Chi, Wanjun Zhong

While reasoning models (e.g., DeepSeek R1) trained with reinforcement learning (RL) excel in textual reasoning, they struggle in scenarios requiring structured problem-solving, such as geometric reasoning, concise computation, or complex equation solving, areas where computational tools like code interpreters (CI) demonstrate distinct advantages.

Math Mathematical Reasoning +3
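
The ReTool setting described above is easiest to picture as an interleaved text-and-code rollout. Below is a minimal, hypothetical sketch of such a loop: `model_generate`, the `<code>` tag convention, and `run_sandboxed` are illustrative assumptions, not ReTool's actual interfaces, and the "sandbox" here is a deliberately naive placeholder.

```python
# Hypothetical sketch of a code-interpreter (CI) rollout loop in the spirit of
# tool-integrated reasoning. model_generate, the <code> tag convention, and
# run_sandboxed are illustrative assumptions, not ReTool's actual interfaces.
import re

def run_sandboxed(code: str) -> str:
    """Placeholder executor: run model-written code, return a result string."""
    scope: dict = {}
    try:
        exec(code, scope)                  # a real system must sandbox this!
        return str(scope.get("result", "(no result)"))
    except Exception as e:
        return f"Error: {e}"

def rollout(model_generate, prompt: str, max_turns: int = 4) -> str:
    """Alternate generation with code execution until a final answer appears."""
    context = prompt
    for _ in range(max_turns):
        step = model_generate(context)     # placeholder LLM call
        context += step
        match = re.search(r"<code>(.*?)</code>", step, re.S)
        if match is None:                  # no tool call -> final textual answer
            return context
        feedback = run_sandboxed(match.group(1))
        context += f"\n<interpreter>{feedback}</interpreter>\n"
    return context

# Toy usage: a scripted "model" that computes 17 * 23 with the interpreter.
steps = iter(["<code>result = 17 * 23</code>", "The answer is 391."])
print(rollout(lambda ctx: next(steps), "What is 17 * 23?\n"))
```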

Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning

no code implementations • 10 Apr 2025 • ByteDance Seed, Jiaze Chen, Tiantian Fan, Xin Liu, Lingjun Liu, Zhiqi Lin, Mingxuan Wang, Chengyi Wang, Xiangpeng Wei, Wenyuan Xu, Yufeng Yuan, Yu Yue, Lin Yan, Qiying Yu, Xiaochen Zuo, Chi Zhang, Ruofei Zhu, Zhecheng An, Zhihao Bai, Yu Bao, Xingyan Bin, Jiangjie Chen, Feng Chen, Hongmin Chen, Riwei Chen, Liangqiang Chen, Zixin Chen, Jinsong Chen, Siyan Chen, Kaiyuan Chen, Zhi Chen, Jin Chen, Jiecao Chen, Jinxin Chi, Weinan Dai, Ning Dai, Jiahui Dai, Shihan Dou, Yantao Du, Zhengyin Du, Jianhui Duan, Chen Dun, Ting-Han Fan, Jiazhan Feng, Junda Feng, Ziyuan Feng, Yuwei Fu, Wenqi Fu, Hanjie Fu, Hao Ge, Hongyi Guo, Mingji Han, Li Han, Wenhao Hao, Xintong Hao, Qianyu He, Jerry He, Feng He, Wen Heng, Zehua Hong, Qi Hou, Liang Hu, Shengding Hu, Nan Hu, Kai Hua, Qi Huang, Ziyue Huang, Hongzhi Huang, Zihao Huang, Ting Huang, Wenhao Huang, Wei Jia, Bin Jia, Xiaoying Jia, Yuhua Jiang, Haobin Jiang, Ziheng Jiang, Kaihua Jiang, Chengquan Jiang, Jianpeng Jiao, Xiaoran Jin, Xing Jin, Xunhao Lai, Xiang Li, Liyi Li, Hongkai Li, Zheng Li, Shengxian Wan, Ya Wang, Yunshui Li, Chenggang Li, Niuniu Li, Siyu Li, Xi Li, Xiao Li, Aoyan Li, Yuntao Li, Nianning Liang, Xinnian Liang, Haibin Lin, Weijian Lin, Ye Lin, Zhicheng Liu, Guanlin Liu, Chenxiao Liu, Yan Liu, Gaohong Liu, Juncai Liu, Chundian Liu, Deyi Liu, Kaibo Liu, Siyao Liu, Qi Liu, Yongfei Liu, Kang Liu, Gan Liu, Boyi Liu, Rui Long, Weiqiang Lou, Chenwei Lou, Xiang Luo, Yao Luo, Caiping Lv, Heyang Lv, Bole Ma, Qianli Ma, Hongzhi Ma, Yiyuan Ma, Jin Ma, Wenchang Ma, Tingting Ma, Chen Mao, Qiyang Min, Zhe Nan, Guanghan Ning, Jinxiang Ou, Haojie Pan, Renming Pang, Yanghua Peng, Tao Peng, Lihua Qian, Mu Qiao, Meng Qu, Cheng Ren, Hongbin Ren, Yong Shan, Wei Shen, Ke Shen, Kai Shen, Guangming Sheng, Jinlong Shi, Wenlei Shi, Guang Shi, Shuai Shuai Cao, Yuxin Song, Zuquan Song, Jing Su, Yifan Sun, Tao Sun, Zewei Sun, Borui Wan, Xiaohui Wang, Xi Wang, Shuguang Wang, Jun Wang, Qinlong Wang, Chenyuan Wang, Shuai Wang, Zihan Wang, Changbao Wang, Jiaqiang Wang, Shihang Wang, Xuwu Wang, Zaiyuan Wang, Yuxuan Wang, Wenqi Wang, Taiqing Wang, Chengzhi Wei, Houmin Wei, Ziyun Wei, Shufa Wei, Zheng Wu, Yonghui Wu, Yangjun Wu, Bohong Wu, Shuang Wu, Jingqiao Wu, Ning Wu, Shuangzhi Wu, Jianmin Wu, Chenguang Xi, Fan Xia, Yuqiao Xian, Liang Xiang, Boren Xiang, Bowen Xiao, Zhen Xiao, Xia Xiao, Yongsheng Xiao, Chao Xin, Shulin Xin, Yuwen Xiong, Jingjing Xu, Ziwen Xu, Chenyin Xu, Jiayi Xu, Yifan Xu, Wei Xu, Yufei Xu, Shikun Xu, Shipeng Yan, Shen Yan, Qingping Yang, Xi Yang, Tianhao Yang, Yuehang Yang, Yuan Yang, Ximing Yang, Zeyu Yang, Guang Yang, Yifan Yang, Xuesong Yao, Bairen Yi, Fan Yin, Jianian Yin, Ziqiang Ying, Xiangyu Yu, Hongli Yu, Song Yu, Menghan Yu, Huan Yu, Siyu Yuan, Jun Yuan, Yutao Zeng, Tianyang Zhan, Zheng Zhang, Yun Zhang, Mofan Zhang, Wang Zhang, Ru Zhang, Zhi Zhang, Tianqi Zhang, Xinyi Zhang, Zhexi Zhang, Sijun Zhang, Wenqiang Zhang, Xiangxiang Zhang, Yongtao Zhang, Yuyu Zhang, Ge Zhang, He Zhang, Yue Zhang, Renjie Zheng, Ningxin Zheng, Zhuolin Zheng, Yaowei Zheng, Chen Zheng, Xiaoyun Zhi, Wanjun Zhong, Cheng Zhong, Zheng Zhong, Baoquan Zhong, Xun Zhou, Na Zhou, Huan Zhou, Hang Zhu, Defa Zhu, Wenjia Zhu, Lei Zuo

We introduce Seed1.5-Thinking, a model capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks.

Mixture-of-Experts reinforcement-learning +1

Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models

no code implementations • 24 Mar 2025 • Meng Cao, Pengfei Hu, Yingyao Wang, Jihao Gu, Haoran Tang, Haoze Zhao, Jiahua Dong, Wangbo Yu, Ge Zhang, Ian Reid, Xiaodan Liang

Recent advancements in Large Video Language Models (LVLMs) have highlighted their potential for multi-modal understanding, yet evaluating their factual grounding in video contexts remains a critical unsolved challenge.

High-entropy Advantage in Neural Networks' Generalizability

no code implementations • 17 Mar 2025 • Entao Yang, Xiaotian Zhang, Yue Shang, Ge Zhang

By employing molecular simulation algorithms, we compute entropy landscapes as functions of both training loss and test accuracy (or test loss), on networks with up to 1 million parameters, across four distinct machine learning tasks: arithmetic questions, real-world tabular data, image recognition, and language modeling.

Language Modeling Language Modelling

FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Synthesis

no code implementations • 17 Mar 2025 • Luxi Chen, Zihan Zhou, Min Zhao, Yikai Wang, Ge Zhang, Wenhao Huang, Hao Sun, Ji-Rong Wen, Chongxuan Li

Generating flexible-view 3D scenes, including 360° rotation and zooming, from single images is challenging due to a lack of 3D data.

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

no code implementations • 14 Mar 2025 • Weiming Ren, Wentao Ma, Huan Yang, Cong Wei, Ge Zhang, Wenhu Chen

State-of-the-art transformer-based large multimodal models (LMMs) struggle to handle hour-long video inputs due to the quadratic complexity of the causal self-attention operations, leading to high computational costs during training and inference.

Mamba Token Reduction +1
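
The quadratic-versus-linear contrast in the Vamba abstract can be made concrete with a back-of-the-envelope FLOP count. All constants below (frames per second, tokens per frame, model width, state size) are assumptions for illustration only, not Vamba's measured numbers.

```python
# Rough cost comparison: causal self-attention scales quadratically with the
# token count, while a Mamba-style linear scan scales linearly. All constants
# are illustrative assumptions, not measurements from the Vamba paper.
def attention_flops(seq_len: int, dim: int) -> float:
    # QK^T plus attention-weighted V: ~2 * L^2 * d multiply-adds per layer.
    return 2.0 * seq_len**2 * dim

def linear_scan_flops(seq_len: int, dim: int, state: int = 16) -> float:
    # A selective-scan recurrence costs roughly L * d * state per layer.
    return seq_len * dim * state

for minutes in (1, 10, 60):
    tokens = minutes * 60 * 2 * 196   # assume 2 frames/s, 196 tokens per frame
    ratio = attention_flops(tokens, 4096) / linear_scan_flops(tokens, 4096)
    print(f"{minutes:3d} min ~ {tokens:>9,} tokens: attention/linear ~ {ratio:,.0f}x")
```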

Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

1 code implementation • 26 Feb 2025 • Yancheng He, Shilong Li, Jiaheng Liu, Weixun Wang, Xingyuan Bu, Ge Zhang, Zhongyuan Peng, Zhaoxiang Zhang, Zhicheng Zheng, Wenbo Su, Bo Zheng

In this paper, to understand the qualities of these long CoTs and to measure the critique abilities of existing LLMs on them, we introduce DeltaBench, which includes long CoTs generated by different o1-like models (e.g., QwQ, DeepSeek-R1) on different reasoning tasks (e.g., math, code, general reasoning), and use it to measure the ability to detect errors in long CoT reasoning.

Math

Audio-FLAN: A Preliminary Release

1 code implementation • 23 Feb 2025 • Liumeng Xue, Ziya Zhou, Jiahao Pan, Zixuan Li, Shuai Fan, Yinghao Ma, Sitong Cheng, Dongchao Yang, Haohan Guo, Yujia Xiao, Xinsheng Wang, Zixuan Shen, Chuanbo Zhu, Xinshen Zhang, Tianchi Liu, Ruibin Yuan, Zeyue Tian, Haohe Liu, Emmanouil Benetos, Ge Zhang, Yike Guo, Wei Xue

Audio-FLAN lays the foundation for unified audio-language models that can seamlessly handle both understanding (e.g., transcription, comprehension) and generation (e.g., speech, music, sound) tasks across a wide range of audio domains in a zero-shot manner.

Zero-Shot Learning

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

no code implementations • 20 Feb 2025 • M-A-P Team, Xinrun Du, Yifan Yao, Kaijing Ma, Bingli Wang, Tianyu Zheng, King Zhu, Minghao Liu, Yiming Liang, Xiaolong Jin, Zhenlin Wei, Chujie Zheng, Kaixin Deng, Shawn Gavin, Shian Jia, Sichao Jiang, Yiyan Liao, Rui Li, Qinrui Li, Sirun Li, Yizhi Li, Yunwen Li, David Ma, Yuansheng Ni, Haoran Que, Qiyao Wang, Zhoufutu Wen, Siwei Wu, Tyshawn Hsing, Ming Xu, Zhenzhu Yang, Zekun Moore Wang, Junting Zhou, Yuelin Bai, Xingyuan Bu, Chenglin Cai, Liang Chen, Yifan Chen, Chengtuo Cheng, Tianhao Cheng, Keyi Ding, Siming Huang, Yun Huang, Yaoru Li, Yizhe Li, Zhaoqun Li, Tianhao Liang, Chengdong Lin, Hongquan Lin, Yinghao Ma, Tianyang Pang, Zhongyuan Peng, Zifan Peng, Qige Qi, Shi Qiu, Xingwei Qu, Shanghaoran Quan, Yizhou Tan, Zili Wang, Chenqing Wang, Hao Wang, Yiya Wang, YuBo Wang, Jiajun Xu, Kexin Yang, Ruibin Yuan, Yuanhao Yue, Tianyang Zhan, Chun Zhang, Jinyang Zhang, Xiyue Zhang, Xingjian Zhang, Yue Zhang, Yongchi Zhao, Xiangyu Zheng, Chenghua Zhong, Yang Gao, Zhoujun Li, Dayiheng Liu, Qian Liu, Tianyu Liu, Shiwen Ni, Junran Peng, Yujia Qin, Wenbo Su, Guoyin Wang, Shi Wang, Jian Yang, Min Yang, Meng Cao, Xiang Yue, Zhaoxiang Zhang, Wangchunshu Zhou, Jiaheng Liu, Qunshu Lin, Wenhao Huang, Ge Zhang

To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines.

Collaborative Filtering

SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

no code implementations • 18 Feb 2025 • Xianfu Cheng, Wei zhang, Shiwei Zhang, Jian Yang, Xiangyuan Guan, Xianjie Wu, Xiang Li, Ge Zhang, Jiaheng Liu, Yuying Mai, Yutao Zeng, Zhoufutu Wen, Ke Jin, Baorui Wang, Weixiao Zhou, Yunhong Lu, Tongliang Li, Wenhao Huang, Zhoujun Li

The increasing application of multi-modal large language models (MLLMs) across various sectors has spotlighted the essence of their output reliability and accuracy, particularly their ability to produce content grounded in factual information (e.g., common and domain-specific knowledge).

Image Comprehension Question Answering +2

CryptoX : Compositional Reasoning Evaluation of Large Language Models

1 code implementation • 8 Feb 2025 • Jiajun Shi, Chaoren Wei, Liqun Yang, Zekun Moore Wang, Chenghao Yang, Ge Zhang, Stephen Huang, Tao Peng, Jian Yang, Zhoufutu Wen

In this paper, we introduce CryptoX, an evaluation framework that, for the first time, combines existing benchmarks with cryptography to quantify the compositional reasoning capacity of LLMs.

Generating Symbolic World Models via Test-time Scaling of Large Language Models

1 code implementation • 7 Feb 2025 • Zhouliang Yu, Yuhuan Yuan, Tim Z. Xiao, Fuxiang Frank Xia, Jie Fu, Ge Zhang, Ge Lin, Weiyang Liu

Solving complex planning problems requires Large Language Models (LLMs) to explicitly model the state transition to avoid rule violations, comply with constraints, and ensure optimality, a task hindered by the inherent ambiguity of natural language.

Aligning Instruction Tuning with Pre-training

no code implementations • 16 Jan 2025 • Yiming Liang, Tianyu Zheng, Xinrun Du, Ge Zhang, Jiaheng Liu, Xingwei Qu, Wenqiang Zu, Xingrun Xing, Chujie Zheng, Lei Ma, Wenhu Chen, Guoyin Wang, Zhaoxiang Zhang, Wenhao Huang, Xiang Yue, Jiajun Zhang

Instruction tuning enhances large language models (LLMs) to follow human instructions across diverse tasks, relying on high-quality datasets to guide behavior.

Diversity

KARPA: A Training-free Method of Adapting Knowledge Graph as References for Large Language Model's Reasoning Path Aggregation

no code implementations • 30 Dec 2024 • Siyuan Fang, Kaijing Ma, Tianyu Zheng, Xinrun Du, Ningxuan Lu, Ge Zhang, Qingkun Tang

Leveraging knowledge graphs (KGs) as external knowledge sources has emerged as a viable solution, but existing methods for LLM-based knowledge graph question answering (KGQA) are often limited by step-by-step decision-making on KGs, restricting the global planning and reasoning capabilities of LLMs, or they require fine-tuning or pre-training on specific KGs.

Decision Making Graph Question Answering +2

Path-of-Thoughts: Extracting and Following Paths for Robust Relational Reasoning with Large Language Models

no code implementations • 23 Dec 2024 • Ge Zhang, Mohammad Ali Alomrani, Hongjian Gu, Jiaming Zhou, Yaochen Hu, Bin Wang, Qun Liu, Mark Coates, Yingxue Zhang, Jianye Hao

Large language models (LLMs) possess vast semantic knowledge but often struggle with complex reasoning tasks, particularly in relational reasoning problems such as kinship or spatial reasoning.

Relational Reasoning Spatial Reasoning

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

1 code implementation • 16 Dec 2024 • Liang Chen, Zekun Wang, Shuhuai Ren, Lei LI, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang

As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the NTP framework, transforming multimodal information into tokens and predicting the next one given the context.

Language Modeling Language Modelling +2
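
The survey's framing, that any modality can be reduced to discrete tokens and trained with next-token prediction, is illustrated by the toy sketch below. A bigram counter stands in for a transformer, and the image codebook tokens are hypothetical; only the unified prediction rule is the point.

```python
# Toy illustration of next-token prediction (NTP) over a unified multimodal
# token stream: image content appears as discrete codebook tokens and is
# predicted exactly like text. A bigram counter stands in for a transformer.
from collections import Counter, defaultdict

stream = ["<text>", "a", "cat", "<image>", "img_17", "img_03", "img_88", "<eos>"]

bigrams: defaultdict = defaultdict(Counter)
for prev, nxt in zip(stream, stream[1:]):
    bigrams[prev][nxt] += 1

def predict_next(token: str) -> str:
    """Greedy next-token prediction from bigram statistics."""
    counts = bigrams[token]
    return counts.most_common(1)[0][0] if counts else "<eos>"

# Image tokens are handled by the very same prediction rule as text tokens.
print(predict_next("<image>"))   # -> img_17
print(predict_next("a"))         # -> cat
```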

PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos

1 code implementation • 2 Dec 2024 • Meng Cao, Haoran Tang, Haoze Zhao, Hangyu Guo, Jiaheng Liu, Ge Zhang, Ruyang Liu, Qiang Sun, Ian Reid, Xiaodan Liang

In this paper, we propose PhysGame as a pioneering benchmark to evaluate physical commonsense violations in gameplay videos.

Question Answering Video Understanding

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

no code implementations • 7 Nov 2024 • Siming Huang, Tianhao Cheng, J. K. Liu, Jiaran Hao, Liuyihan Song, Yang Xu, J. Yang, Jiaheng Liu, Chenchen Zhang, Linzheng Chai, Ruifeng Yuan, Zhaoxiang Zhang, Jie Fu, Qian Liu, Ge Zhang, Zili Wang, Yuan Qi, Yinghui Xu, Wei Chu

To address the gap, we introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.

Code Generation

MdEval: Massively Multilingual Code Debugging

no code implementations • 4 Nov 2024 • Shukai Liu, Linzheng Chai, Jian Yang, Jiajun Shi, He Zhu, Liran Wang, Ke Jin, Wei zhang, Hualei Zhu, Shuyue Guo, Tao Sun, Jiaheng Liu, Yunlong Duan, Yu Hao, Liqun Yang, Guanglin Niu, Ge Zhang, Zhoujun Li

Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet.

Program Repair

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

no code implementations • 28 Oct 2024 • Jiaheng Liu, Ken Deng, Congnan Liu, Jian Yang, Shukai Liu, He Zhu, Peng Zhao, Linzheng Chai, Yanan Wu, Ke Jin, Ge Zhang, Zekun Wang, Guoan Zhang, Bangyu Xiang, Wenbo Su, Bo Zheng

Besides, existing benchmarks usually report overall average scores across different languages, ignoring the fine-grained abilities in different completion scenarios.

Code Completion

Sparse Decomposition of Graph Neural Networks

no code implementations • 25 Oct 2024 • Yaochen Hu, Mai Zeng, Ge Zhang, Pavel Rumiantsev, Liheng Ma, Yingxue Zhang, Mark Coates

Graph Neural Networks (GNN) exhibit superior performance in graph representation learning, but their inference cost can be high, due to an aggregation operation that can require a memory fetch for a very large number of nodes.

Graph Neural Network Graph Representation Learning +2
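
To see where the memory-fetch cost mentioned in the abstract comes from, the sketch below implements plain mean aggregation for one GNN layer in pure Python; every neighbor contributes one feature fetch. This shows the baseline cost that a sparse decomposition would target, not the paper's method itself.

```python
# Plain neighbor-mean aggregation for one GNN layer: computing each node's
# output touches every neighbor's feature vector, which is the memory traffic
# the abstract refers to. This is the baseline, not the paper's decomposition.
import random

random.seed(0)
num_nodes, dim, degree = 1_000, 8, 50
features = [[random.random() for _ in range(dim)] for _ in range(num_nodes)]
neighbors = {v: random.sample(range(num_nodes), k=degree) for v in range(num_nodes)}

def aggregate(v: int) -> list:
    """Mean of neighbor features: one fetch per neighbor."""
    acc = [0.0] * dim
    for u in neighbors[v]:                 # 'degree' feature fetches for node v
        for d in range(dim):
            acc[d] += features[u][d]
    return [x / degree for x in acc]

layer_output = [aggregate(v) for v in range(num_nodes)]
print(f"{num_nodes * degree:,} neighbor-feature fetches for a single layer")
```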

Can MLLMs Understand the Deep Implication Behind Chinese Images?

1 code implementation • 17 Oct 2024 • Chenhao Zhang, Xi Feng, Yuelin Bai, Xinrun Du, Jinchang Hou, Kaixin Deng, Guangzeng Han, Qinrui Li, Bingli Wang, Jiaheng Liu, Xingwei Qu, Yifei Zhang, Qixuan Zhao, Yiming Liang, Ziqiang Liu, Feiteng Fang, Min Yang, Wenhao Huang, Chenghua Lin, Ge Zhang, Shiwen Ni

To fill the gap, we introduce the **C**hinese **I**mage **I**mplication understanding **Bench**mark, **CII-Bench**, which aims to assess the higher-order perception and understanding capabilities of MLLMs for Chinese images.

A Comparative Study on Reasoning Patterns of OpenAI's o1 Model

1 code implementation • 17 Oct 2024 • Siwei Wu, Zhongyuan Peng, Xinrun Du, Tuney Zheng, Minghao Liu, Jialong Wu, Jiachen Ma, Yizhi Li, Jian Yang, Wangchunshu Zhou, Qunshu Lin, Junbo Zhao, Zhaoxiang Zhang, Wenhao Huang, Ge Zhang, Chenghua Lin, J. H. Liu

In our work, to investigate the reasoning patterns of o1, we compare o1 with existing Test-time Compute methods (BoN, Step-wise BoN, Agent Workflow, and Self-Refine) by using OpenAI's GPT-4o as a backbone on general reasoning benchmarks in three domains (i.e., math, coding, commonsense reasoning).

Math
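
Of the test-time compute baselines named in the abstract, Best-of-N (BoN) is the simplest; a minimal sketch follows. `generate` and `score` are random placeholders for an LLM sampler and a verifier or reward model, not the paper's actual pipeline.

```python
# Minimal Best-of-N (BoN) sampling: draw N candidate answers and keep the one
# the scorer likes best. generate() and score() are random placeholders for an
# LLM sampler and a verifier/reward model.
import random

random.seed(0)

def generate(prompt: str) -> str:
    return f"candidate-{random.randint(0, 99)}"   # placeholder LLM sample

def score(prompt: str, answer: str) -> float:
    return random.random()                        # placeholder reward model

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))

print(best_of_n("Prove that the sum of two even numbers is even."))
```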

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

1 code implementation • 15 Oct 2024 • Pei Wang, Yanan Wu, Zekun Wang, Jiaheng Liu, Xiaoshuai Song, Zhongyuan Peng, Ken Deng, Chenchen Zhang, Jiakai Wang, Junran Peng, Ge Zhang, Hangyu Guo, Zhaoxiang Zhang, Wenbo Su, Bo Zheng

Besides, all evaluation metrics of our MTU-Bench are based on the prediction results and the ground truth without using any GPT or human evaluation metrics.

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

2 code implementations • 10 Oct 2024 • Bofei Gao, Feifan Song, Zhe Yang, Zefan Cai, Yibo Miao, Qingxiu Dong, Lei LI, Chenghao Ma, Liang Chen, Runxin Xu, Zhengyang Tang, Benyou Wang, Daoguang Zan, Shanghaoran Quan, Ge Zhang, Lei Sha, Yichang Zhang, Xuancheng Ren, Tianyu Liu, Baobao Chang

However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on the MATH dataset), indicating their inadequacy for truly challenging these models.

GSM8K Math +1

ING-VP: MLLMs cannot Play Easy Vision-based Games Yet

1 code implementation • 9 Oct 2024 • Haoran Zhang, Hangyu Guo, Shuyue Guo, Meng Cao, Wenhao Huang, Jiaheng Liu, Ge Zhang

To bridge this gap, we present ING-VP, the first INteractive Game-based Vision Planning benchmark, specifically designed to evaluate the spatial imagination and multi-step reasoning abilities of MLLMs.

Spatial Reasoning

General Preference Modeling with Preference Representations for Aligning Language Models

1 code implementation • 3 Oct 2024 • Yifan Zhang, Ge Zhang, Yue Wu, Kangping Xu, Quanquan Gu

In this paper, we introduce preference representation learning, an approach that embeds responses into a latent space to capture intricate preference structures efficiently, achieving linear query complexity.

Language Modelling Representation Learning
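
The "linear query complexity" claim is easy to see: each of N responses is embedded once, and any pairwise preference is then scored from the cached embeddings. The skew-symmetric scoring rule below is a simplified stand-in consistent with that idea, assumed for illustration rather than copied from the paper's implementation.

```python
# Why preference representations give linear query complexity: N embedding
# calls replace N*(N-1)/2 pairwise comparisons. The random skew-symmetric
# scorer below is an illustrative assumption, not the paper's trained model.
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 4
embeddings = rng.normal(size=(N, d))      # one model call per response

A = rng.normal(size=(d, d))
S = A - A.T                               # skew-symmetric: S^T = -S

def pref_score(i: int, j: int) -> float:
    """Positive score means response i is preferred over response j."""
    return float(embeddings[i] @ S @ embeddings[j])

assert abs(pref_score(0, 1) + pref_score(1, 0)) < 1e-9   # antisymmetry
print(f"{N} embedding calls instead of {N * (N - 1) // 2} pairwise comparisons")
```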

MIO: A Foundation Model on Multimodal Tokens

1 code implementation • 26 Sep 2024 • Zekun Wang, King Zhu, Chunpu Xu, Wangchunshu Zhou, Jiaheng Liu, Yibo Zhang, Jiashuo Wang, Ning Shi, Siyu Li, Yizhi Li, Haoran Que, Zhaoxiang Zhang, Yuanxing Zhang, Ge Zhang, Ke Xu, Jie Fu, Wenhao Huang

In this paper, we introduce MIO, a novel foundation model built on multimodal tokens, capable of understanding and generating speech, text, images, and videos in an end-to-end, autoregressive manner.

Text Generation

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

1 code implementation • 24 Sep 2024 • Haoran Que, Feiyu Duan, Liqun He, Yutao Mou, Wangchunshu Zhou, Jiaheng Liu, Wenge Rong, Zekun Moore Wang, Jian Yang, Ge Zhang, Junran Peng, Zhaoxiang Zhang, Songyang Zhang, Kai Chen

Therefore, we introduce the Hierarchical Long Text Generation Benchmark (HelloBench), a comprehensive, in-the-wild, and open-ended benchmark to evaluate LLMs' performance in generating long text.

Long-Context Understanding Text Generation

OmniBench: Towards The Future of Universal Omni-Language Models

1 code implementation • 23 Sep 2024 • Yizhi Li, Ge Zhang, Yinghao Ma, Ruibin Yuan, Kang Zhu, Hangyu Guo, Yiming Liang, Jiaheng Liu, Zekun Wang, Jian Yang, Siwei Wu, Xingwei Qu, Jinjie Shi, Xinyue Zhang, Zhenzhu Yang, Xiangzhou Wang, Zhaoxiang Zhang, Zachary Liu, Emmanouil Benetos, Wenhao Huang, Chenghua Lin

Recent advancements in multimodal large language models (MLLMs) have focused on integrating multiple modalities, yet their ability to simultaneously process and reason across different inputs remains underexplored.

Instruction Following

Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data

1 code implementation • 19 Sep 2024 • Jiaming Zhou, Abbas Ghaddar, Ge Zhang, Liheng Ma, Yaochen Hu, Soumyasundar Pal, Mark Coates, Bin Wang, Yingxue Zhang, Jianye Hao

Despite recent advances in training and prompting strategies for Large Language Models (LLMs), these models continue to face challenges with complex logical reasoning tasks that involve long reasoning chains.

Logical Reasoning Spatial Reasoning

LIME: Less Is More for MLLM Evaluation

2 code implementations • 10 Sep 2024 • King Zhu, Qianbo Zang, Shian Jia, Siwei Wu, Feiteng Fang, Yizhi Li, Shawn Gavin, Tuney Zheng, Jiawei Guo, Bo Li, HaoNing Wu, Xingwei Qu, Jian Yang, Zachary Liu, Xiang Yue, J. H. Liu, Chenghua Lin, Min Yang, Shiwen Ni, Wenhao Huang, Ge Zhang

However, many of these benchmarks include overly simple or uninformative samples, complicating the effective distinction of different MLLMs' performance.

Image Captioning Question Answering +1

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

2 code implementations • 4 Sep 2024 • Xiang Yue, Tianyu Zheng, Yuansheng Ni, YuBo Wang, Kai Zhang, Shengbang Tong, Yuxuan Sun, Botao Yu, Ge Zhang, Huan Sun, Yu Su, Wenhu Chen, Graham Neubig

This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark.

Optical Character Recognition (OCR)

FuzzCoder: Byte-level Fuzzing Test via Large Language Model

1 code implementation • 3 Sep 2024 • Liqun Yang, Jian Yang, Chaoren Wei, Guanglin Niu, Ge Zhang, Yunli Wang, Linzheng Chai, Wanxu Xia, Hongcheng Guo, Shun Zhang, Jiaheng Liu, Yuwei Yin, Junran Peng, Jiaxin Ma, Liang Sun, Zhoujun Li

In this work, we propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks to guide future fuzzing explorations.

Language Modeling Language Modelling +2
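
At a high level, FuzzCoder replaces random byte mutations with model-proposed ones. The loop below sketches that idea with a random placeholder in the model's position and a contrived toy target; none of it is FuzzCoder's actual interface.

```python
# Sketch of byte-level fuzzing where a model proposes (position, value)
# mutations. propose_mutation() is a random placeholder standing in for the
# fine-tuned model, and toy_target() is a contrived bug, not a real program.
import random

random.seed(0)

def propose_mutation(data: bytes) -> tuple:
    """Placeholder for a learned mutator: choose where and what to write."""
    return random.randrange(len(data)), random.randrange(256)

def toy_target(buf: bytes) -> None:
    if buf and buf[0] == 0x7F:            # contrived crash condition
        raise RuntimeError("crash")

def fuzz(seed: bytes, trials: int = 100_000) -> int:
    crashes = 0
    for _ in range(trials):
        pos, val = propose_mutation(seed)
        mutated = seed[:pos] + bytes([val]) + seed[pos + 1:]
        try:
            toy_target(mutated)
        except RuntimeError:
            crashes += 1                  # a learned mutator would train on these
    return crashes

seed = b"\x00" * 8
print(f"crashing inputs found: {fuzz(seed)}")
```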

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

no code implementations • 17 Aug 2024 • Xianjie Wu, Jian Yang, Linzheng Chai, Ge Zhang, Jiaheng Liu, Xinrun Du, Di Liang, Daixin Shu, Xianfu Cheng, Tianzhen Sun, Guanglin Niu, Tongliang Li, Zhoujun Li

Recent advancements in Large Language Models (LLMs) have markedly enhanced the interpretation and processing of tabular data, introducing previously unimaginable capabilities.

Question Answering

I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

1 code implementation • 15 Aug 2024 • Yiming Liang, Ge Zhang, Xingwei Qu, Tianyu Zheng, Jiawei Guo, Xinrun Du, Zhenzhu Yang, Jiaheng Liu, Chenghua Lin, Lei Ma, Wenhao Huang, Jiajun Zhang

Large Language Models (LLMs) have achieved significant advancements; however, the common learning paradigm treats LLMs as passive information repositories, neglecting their potential for active learning and alignment.

Active Learning Code Generation

Overview of the NLPCC 2024 Shared Task on Chinese Metaphor Generation

no code implementations • 8 Aug 2024 • Xingwei Qu, Ge Zhang, Siwei Wu, Yizhi Li, Chenghua Lin

The goal of this shared task is to generate Chinese metaphors using machine learning techniques and to effectively identify the basic components of metaphorical sentences.

Sentence

DDK: Distilling Domain Knowledge for Efficient Large Language Models

no code implementations • 23 Jul 2024 • Jiaheng Liu, Chenchen Zhang, Jinyang Guo, Yuanxing Zhang, Haoran Que, Ken Deng, Zhiqi Bai, Jie Liu, Ge Zhang, Jiakai Wang, Yanan Wu, Congnan Liu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands.

Knowledge Distillation

LongIns: A Challenging Long-context Instruction-based Exam for LLMs

no code implementations • 25 Jun 2024 • Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang

To address these issues, we propose the LongIns benchmark dataset, a challenging long-context instruction-based exam for LLMs, which is built based on the existing instruction datasets.

16k 4k

GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

1 code implementation • 21 Jun 2024 • Leyan Wang, Yonggang Jin, Tianhao Shen, Tianyu Zheng, Xinrun Du, Chenchen Zhang, Wenhao Huang, Jiaheng Liu, Shi Wang, Ge Zhang, Liuyu Xiang, Zhaofeng He

As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical.

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

no code implementations • 20 Jun 2024 • Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, YuBo Wang, Kang Zhu, Zekun Wang, Tiezhen Wang, Wenhao Huang, Jie Fu, Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen

Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks.

MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

no code implementations • 19 Jun 2024 • Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin

Machine Translation (MT) has developed rapidly since the release of Large Language Models and current MT evaluation is performed through comparison with reference human translations or by predicting quality scores from human-labeled data.

Machine Translation Translation

VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text

1 code implementation • 10 Jun 2024 • Tianyu Zhang, Suyuchen Wang, Lu Li, Ge Zhang, Perouz Taslakian, Sai Rajeswar, Jie Fu, Bang Liu, Yoshua Bengio

This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images.

Language Modeling Language Modelling +4

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

no code implementations • 3 Jun 2024 • Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng

To address the limitations of existing methods, inspired by the Scaling Law for performance prediction, we propose to investigate the Scaling Law of the Domain-specific Continual Pre-Training (D-CPT Law) to decide the optimal mixture ratio with acceptable training costs for LLMs of different sizes.

Math
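
The D-CPT idea, fit a parametric law on a few cheap pilot runs and read off the best mixture ratio, can be sketched as simple curve fitting. The U-shaped functional form and the synthetic data points below are invented for illustration and are not the paper's fitted law.

```python
# Curve-fitting sketch of choosing a domain-data mixture ratio from a few
# pilot runs. The U-shaped functional form and the synthetic losses are
# assumptions for illustration, not the D-CPT Law's actual parameterization.
import numpy as np
from scipy.optimize import curve_fit

ratios = np.array([0.1, 0.2, 0.4, 0.6, 0.8])          # pilot mixture ratios r
val_loss = np.array([2.31, 2.18, 2.07, 2.09, 2.21])   # synthetic validation losses

def loss_model(r, a, b, r0):
    # Too little domain data hurts domain loss; too much hurts general ability.
    return a + b * (r - r0) ** 2

(a, b, r0), _ = curve_fit(loss_model, ratios, val_loss, p0=[2.0, 1.0, 0.5])
grid = np.linspace(0.05, 0.95, 91)
best = grid[np.argmin(loss_model(grid, a, b, r0))]
print(f"predicted optimal mixture ratio ~ {best:.2f}")
```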

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

2 code implementations • 3 Jun 2024 • YuBo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen

In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains.

MMLU Multi-task Language Understanding

MAmmoTH2: Scaling Instructions from the Web

no code implementations • 6 May 2024 • Xiang Yue, Tuney Zheng, Ge Zhang, Wenhu Chen

Notably, MAmmoTH2-7B's (Mistral) performance increases from 11% to 36.7% on MATH and from 36% to 68.4% on GSM8K without training on any in-domain data.

Chatbot GSM8K +1

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

1 code implementation • 28 Apr 2024 • Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

Music composition represents the creative side of humanity, and is itself a complex task that requires the ability to understand and generate information with long dependency and harmony constraints.

In-Context Learning Music Generation

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

no code implementations • 5 Apr 2024 • Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Wenhu Chen, Ge Zhang

In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs.

Language Modeling Language Modelling +1

Long-context LLMs Struggle with Long In-context Learning

2 code implementations • 2 Apr 2024 • Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen

We introduce a benchmark (LongICLBench) for long in-context learning in extreme-label classification using six datasets with 28 to 174 classes and input lengths from 2K to 50K tokens.

2k In-Context Learning +1

Yi: Open Foundation Models by 01.AI

1 code implementation • 7 Mar 2024 • 01.AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Guoyin Wang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yanpeng Li, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, Zonghong Dai

The Yi model family is based on 6B and 34B pretrained language models, which we then extend to chat models, 200K long-context models, depth-upscaled models, and vision-language models.

 Ranked #1 on Chatbot on AlpacaEval (using extra training data)

Attribute Chatbot +4

TEGEE: Task dEfinition Guided Expert Ensembling for Generalizable and Few-shot Learning

no code implementations • 7 Mar 2024 • Xingwei Qu, Yiming Liang, Yucheng Wang, Tianyu Zheng, Tommy Yue, Xingyuan Bu, Lei Ma, Stephen W. Huang, Jiajun Zhang, Yinan Shi, Chenghua Lin, Jie Fu, Ge Zhang

Our framework employs a dual 3B model approach, with each model assigned a distinct role: one focuses on task definition extraction, while the other handles learning from demonstrations.

Continual Learning Definition Extraction +3

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

no code implementations • 26 Feb 2024 • Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen

Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model family, ranging from 7B to 34B parameters.

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

1 code implementation • 22 Feb 2024 • Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue

However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter.

Code Generation HumanEval +1

MORE-3S:Multimodal-based Offline Reinforcement Learning with Shared Semantic Spaces

1 code implementation • 20 Feb 2024 • Tianyu Zheng, Ge Zhang, Xingwei Qu, Ming Kuang, Stephen W. Huang, Zhaofeng He

Drawing upon the intuition that aligning different modalities to the same semantic embedding space would allow models to understand states and actions more easily, we propose a new perspective on the offline reinforcement learning (RL) challenge.

Decision Making Offline RL +4
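
The abstract's core intuition, mapping states and actions into one semantic embedding space, can be sketched in a few lines. The random projection matrices below are placeholders for trained encoders; only the shared-space comparison is the point.

```python
# Sketch of the shared-semantic-space intuition: project state and action
# features into one embedding space and compare them there. The random
# projections are placeholders for the paper's trained encoders.
import numpy as np

rng = np.random.default_rng(1)
d_state, d_action, d_shared = 32, 8, 16

W_state = rng.normal(size=(d_state, d_shared))    # placeholder state encoder
W_action = rng.normal(size=(d_action, d_shared))  # placeholder action encoder

def to_shared(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    z = x @ W
    return z / np.linalg.norm(z)

state = rng.normal(size=d_state)
action = rng.normal(size=d_action)
alignment = float(to_shared(state, W_state) @ to_shared(action, W_action))
print(f"state-action alignment (cosine similarity): {alignment:+.3f}")
```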

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models

no code implementations • 20 Feb 2024 • Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W. Huang, Chenghua Lin, Jie Fu

The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following.

Instruction Following

CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation

2 code implementations • 20 Feb 2024 • Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu

These models are able to generate creative and fluent metaphor sentences more frequently when induced by selected samples from our dataset, demonstrating the value of our corpus for Chinese metaphor research.

AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

1 code implementation • 19 Feb 2024 • Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yugang Jiang, Xipeng Qiu

We introduce AnyGPT, an any-to-any multimodal language model that utilizes discrete representations for the unified processing of various modalities, including speech, text, images, and music.

Language Modeling Language Modelling +1

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

1 code implementation • 6 Feb 2024 • Weiming Ren, Huan Yang, Ge Zhang, Cong Wei, Xinrun Du, Wenhao Huang, Wenhu Chen

To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation.

Image to Video Generation

Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction

1 code implementation • 6 Feb 2024 • Yonggang Jin, Ge Zhang, Hao Zhao, Tianyu Zheng, Jarvi Guo, Liuyu Xiang, Shawn Yue, Stephen W. Huang, Zhaofeng He, Jie Fu

Drawing inspiration from the success of multimodal instruction tuning in visual tasks, we treat the visual-based RL task as a long-horizon vision task and construct a set of multimodal game instructions to incorporate instruction tuning into a decision transformer.

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

1 code implementation • 24 Jan 2024 • Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin

We further annotate the image-text pairs with two-level subset-subcategory hierarchy annotations to facilitate a more comprehensive evaluation of the baselines.

Benchmarking Image Captioning +3

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

1 code implementation • 22 Jan 2024 • Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, Haoran Zhang, Xingwei Qu, Junjie Wang, Ruibin Yuan, Yizhi Li, Zekun Wang, Yudong Liu, Yu-Hsuan Tsai, Fengji Zhang, Chenghua Lin, Wenhao Huang, Jie Fu

We introduce CMMMU, a new Chinese Massive Multi-discipline Multimodal Understanding benchmark designed to evaluate LMMs on tasks demanding college-level subject knowledge and deliberate reasoning in a Chinese context.

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

no code implementations • 13 Jan 2024 • Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources.

4k Position

Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation

1 code implementation • 12 Jan 2024 • Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Xinrun Du, Qi Jia, Chenghua Lin, Wenhao Huang, Jie Fu, Ge Zhang

In this paper, we introduce Kun, a novel approach for creating high-quality instruction-tuning datasets for large language models (LLMs) without relying on manual annotations.

Instruction Following Translation

Improving Depth Completion via Depth Feature Upsampling

no code implementations • CVPR 2024 • YuFei Wang, Ge Zhang, Shaoqian Wang, Bo Li, Qi Liu, Le Hui, Yuchao Dai

In this paper we visualize the internal feature maps to analyze how the network densifies the input sparse depth.

Decoder Depth Completion +1

Align on the Fly: Adapting Chatbot Behavior to Established Norms

1 code implementation • 26 Dec 2023 • Chunpu Xu, Steffi Chern, Ethan Chern, Ge Zhang, Zekun Wang, Ruibo Liu, Jing Li, Jie Fu, PengFei Liu

In this paper, we aim to align large language models with the ever-changing, complex, and diverse human values (e.g., social norms) across time and locations.

Chatbot

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

no code implementations • 28 Nov 2023 • Cong Wei, Yang Chen, Haonan Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen

Existing information retrieval (IR) models often assume a homogeneous format, limiting their applicability to diverse user needs, such as searching for images with text descriptions, searching for a news article with a headline image, or finding a similar photo with a query image.

Benchmarking Information Retrieval +2

Massive Editing for Large Language Models via Meta Learning

1 code implementation • 8 Nov 2023 • Chenmien Tan, Ge Zhang, Jie Fu

While large language models (LLMs) have enabled learning knowledge from the pre-training corpora, the acquired knowledge may be fundamentally incorrect or become outdated over time, which necessitates rectifying the knowledge of the language model (LM) after training.

Fact Checking Language Modeling +4

LRRU: Long-short Range Recurrent Updating Networks for Depth Completion

no code implementations • ICCV 2023 • YuFei Wang, Bo Li, Ge Zhang, Qi Liu, Tao Gao, Yuchao Dai

Existing deep learning-based depth completion methods generally employ massive stacked layers to predict the dense depth map from sparse input data.

Depth Completion

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks

1 code implementation • 1 Oct 2023 • Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen

To quantitatively assess our metric, we evaluate its correlation with human ratings on 5 held-in datasets and 2 held-out datasets, and show that TIGERScore achieves the open-source SoTA correlation with human ratings across these datasets, almost approaching the GPT-4 evaluator.

All Text Generation
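
"Correlation with human ratings" is typically reported as a rank correlation between metric scores and human judgments. The sketch below shows that recipe on made-up numbers; it illustrates the general evaluation protocol, not TIGERScore's reported figures.

```python
# How agreement between an automatic metric and human judgments is usually
# quantified: rank correlation. The scores below are made up; only the
# evaluation recipe is illustrated.
from scipy.stats import spearmanr

human_ratings = [4.5, 2.0, 3.5, 1.0, 5.0, 3.0]
metric_scores = [0.82, 0.35, 0.60, 0.20, 0.90, 0.58]

rho, p_value = spearmanr(human_ratings, metric_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```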

AutoAgents: A Framework for Automatic Agent Generation

1 code implementation • 29 Sep 2023 • Guangyao Chen, Siwei Dong, Yu Shu, Ge Zhang, Jaward Sesay, Börje F. Karlsson, Jie Fu, Yemin Shi

Therefore, we introduce AutoAgents, an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks.

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

1 code implementation • 15 Sep 2023 • Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos

Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains not well-explored.

Caption Generation Language Modelling +1

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

1 code implementation • 11 Sep 2023 • Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.

Math Mathematical Reasoning

Fragment and Integrate Network (FIN): A Novel Spatial-Temporal Modeling Based on Long Sequential Behavior for Online Food Ordering Click-Through Rate Prediction

no code implementations • 30 Aug 2023 • Jun Li, Jingjian Wang, Hongwei Wang, Xing Deng, Jielong Chen, Bing Cao, Zekun Wang, Guanjie Xu, Ge Zhang, Feng Shi, Hualei Liu

(ii) Integrate Network (IN) builds a new integrated sequence by utilizing spatial-temporal interaction on MSS and captures the comprehensive spatial-temporal representation by modeling the integrated sequence with a complex attention mechanism.

Click-Through Rate Prediction Recommendation Systems

SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark

1 code implementation • 12 Jul 2023 • Jun Niu, Xiaoyan Zhu, Moxuan Zeng, Ge Zhang, Qingyang Zhao, Chunhui Huang, Yangming Zhang, Suyu An, Yangzhong Wang, Xinghui Yue, Zhipeng He, Weihao Guo, Kuo Shen, Peng Liu, Yulong Shen, Xiaohong Jiang, Jianfeng Ma, Yuqing Zhang

We have identified three principles for the proposed "comparing different MI attacks" methodology, and we have designed and implemented the MIBench benchmark with 84 evaluation scenarios for each dataset.

On the Effectiveness of Speech Self-supervised Learning for Music

no code implementations • 11 Jul 2023 • Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu

Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech.

Information Retrieval Music Information Retrieval +2

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

1 code implementation • 29 Jun 2023 • Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wei Xue, Yike Guo

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.

Automatic Lyrics Transcription Language Modeling +4

TPDM: Selectively Removing Positional Information for Zero-shot Translation via Token-Level Position Disentangle Module

no code implementations • 31 May 2023 • Xingran Chen, Ge Zhang, Jie Fu

Due to the capability of Multilingual Neural Machine Translation (MNMT) for zero-shot translation, many works have been carried out to fully exploit its potential in this setting.

Position Translation

Training Socially Aligned Language Models on Simulated Social Interactions

1 code implementation • 26 May 2023 • Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi

Social alignment in AI systems aims to ensure that these models behave according to established societal values.

Interactive Natural Language Processing

no code implementations • 22 May 2023 • Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, Shaochun Hao, Guangzheng Xiong, Yizhi Li, Mong Yuan Sim, Xiuying Chen, Qingqing Zhu, Zhenzhu Yang, Adam Nik, Qi Liu, Chenghua Lin, Shi Wang, Ruibo Liu, Wenhu Chen, Ke Xu, Dayiheng Liu, Yike Guo, Jie Fu

Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence.

Decision Making

RSC-VAE: Recoding Semantic Consistency Based VAE for One-Class Novelty Detection

no code implementations • 7 May 2023 • Ge Zhang, Wangzhe Du

In this paper, we further exploit the latent space of the Variational Auto-encoder (VAE), a typical reconstruction-based model, and innovatively divide it into three regions: Normal/Anomalous/Unknown-semantic-region.

Novelty Detection

Chinese Open Instruction Generalist: A Preliminary Release

2 code implementations • 17 Apr 2023 • Ge Zhang, Yemin Shi, Ruibo Liu, Ruibin Yuan, Yizhi Li, Siwei Dong, Yu Shu, Zhaoqun Li, Zekun Wang, Chenghua Lin, Wenhao Huang, Jie Fu

Instruction tuning is widely recognized as a key technique for building generalist language models, which has attracted the attention of researchers and the public with the release of InstructGPT (Ouyang et al., 2022) and ChatGPT (https://chat.openai.com/).

CORGI-PM: A Chinese Corpus For Gender Bias Probing and Mitigation

1 code implementation • 1 Jan 2023 • Ge Zhang, Yizhi Li, Yaoyao Wu, Linyuan Zhang, Chenghua Lin, Jiayi Geng, Shi Wang, Jie Fu

As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpus, especially for languages with insufficient resources such as Chinese.

Sentence

HERB: Measuring Hierarchical Regional Bias in Pre-trained Language Models

1 code implementation • 5 Nov 2022 • Yizhi Li, Ge Zhang, Bohao Yang, Chenghua Lin, Shi Wang, Anton Ragni, Jie Fu

In addition to verifying the existence of regional bias in LMs, we find that the biases on regional groups can be strongly influenced by the geographical clustering of the groups.

Fairness

1Cademy @ Causal News Corpus 2022: Leveraging Self-Training in Causality Classification of Socio-Political Event Data

1 code implementation • 4 Nov 2022 • Adam Nik, Ge Zhang, Xingran Chen, Mingyu Li, Jie Fu

This paper details our participation in the Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE) workshop @ EMNLP 2022, where we take part in Subtask 1 of Shared Task 3.

1Cademy @ Causal News Corpus 2022: Enhance Causal Span Detection via Beam-Search-based Position Selector

1 code implementation • 31 Oct 2022 • Xingran Chen, Ge Zhang, Adam Nik, Mingyu Li, Jie Fu

In this paper, we present our approach and empirical observations for Cause-Effect Signal Span Detection -- Subtask 2 of Shared Task 3 (Tan et al., 2022) at CASE 2022.

Data Augmentation Language Modeling +4

Correlation between entropy and generalizability in a neural network

no code implementations • 5 Jul 2022 • Ge Zhang

Although neural networks can solve very complex machine-learning problems, the theoretical reason for their generalizability is still not fully understood.

Graph-level Neural Networks: Current Progress and Future Directions

no code implementations • 31 May 2022 • Ge Zhang, Jia Wu, Jian Yang, Shan Xue, Wenbin Hu, Chuan Zhou, Hao Peng, Quan Z. Sheng, Charu Aggarwal

To frame this survey, we propose a systematic taxonomy covering GLNNs upon deep neural networks, graph neural networks, and graph pooling.

Survey

Denoised Internal Models: a Brain-Inspired Autoencoder against Adversarial Attacks

no code implementations • 21 Nov 2021 • Kaiyuan Liu, Xingyu Li, Yurui Lai, Ge Zhang, Hang Su, Jiachen Wang, Chunxu Guo, Jisong Guan, Yi Zhou

Despite its great success, deep learning suffers severely from a lack of robustness; that is, deep neural networks are very vulnerable to adversarial attacks, even the simplest ones.

Spectral Variability Augmented Sparse Unmixing of Hyperspectral Images

no code implementations • 19 Oct 2021 • Ge Zhang, Shaohui Mei, Mingyang Ma, Yan Feng, Qian Du

Spectral unmixing (SU) expresses the mixed pixels in hyperspectral images as the product of endmembers and abundances, and has been widely used in hyperspectral imagery analysis.

Spectral Reconstruction

StrobeNet: Category-Level Multiview Reconstruction of Articulated Objects

no code implementations • 17 May 2021 • Ge Zhang, Or Litany, Srinath Sridhar, Leonidas Guibas

We present StrobeNet, a method for category-level 3D reconstruction of articulating objects from one or more unposed RGB images.

3D Reconstruction Object

Tilting the playing field: Dynamical loss functions for machine learning

1 code implementation • 7 Feb 2021 • Miguel Ruiz-Garcia, Ge Zhang, Samuel S. Schoenholz, Andrea J. Liu

In underparameterized networks, such dynamical loss functions can lead to successful training for networks that fail to find a deep minimum of the standard cross-entropy loss.

BIG-bench Machine Learning

Diverse Melody Generation from Chinese Lyrics via Mutual Information Maximization

no code implementations • 7 Dec 2020 • Ruibin Yuan, Ge Zhang, Anqiao Yang, Xinyue Zhang

In this paper, we propose to adapt the method of mutual information maximization into the task of Chinese lyrics conditioned melody generation to improve the generation quality and diversity.

Diversity
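
Mutual information maximization is usually implemented through a tractable lower bound such as InfoNCE; the sketch below shows that standard surrogate on synthetic lyric/melody embeddings. It illustrates the general technique the abstract names, not the paper's exact objective for lyrics-conditioned melody generation.

```python
# Standard InfoNCE surrogate for mutual information maximization, shown on
# synthetic lyric/melody embeddings. Minimizing this loss raises a lower bound
# on MI; it is a generic illustration, not the paper's exact objective.
import numpy as np

def info_nce(z_a: np.ndarray, z_b: np.ndarray, tau: float = 0.1) -> float:
    """-log softmax of matched pairs against in-batch negatives."""
    logits = (z_a @ z_b.T) / tau                     # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # matched pairs on diagonal

rng = np.random.default_rng(2)
B, d = 16, 32
z_lyrics = rng.normal(size=(B, d))
z_melody = z_lyrics + 0.1 * rng.normal(size=(B, d))  # correlated pairs
for z in (z_lyrics, z_melody):
    z /= np.linalg.norm(z, axis=1, keepdims=True)
print(f"InfoNCE loss (lower = higher MI estimate): {info_nce(z_lyrics, z_melody):.3f}")
```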

CORAL: COde RepresentAtion Learning with Weakly-Supervised Transformers for Analyzing Data Analysis

no code implementations • 28 Aug 2020 • Ge Zhang, Mike A. Merrill, Yang Liu, Jeffrey Heer, Tim Althoff

Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process, identifying analytical best practices, and providing insights to the builders of scientific toolkits.

Descriptive Representation Learning

Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues

no code implementations • 17 Jun 2020 • Jianrong Wang, Ge Zhang, Zhen-Yu Wu, XueWei Li, Li Liu

Compared with static views, abundant dynamic properties between video frames are beneficial to refined depth estimation, especially for dynamic objects.

Monocular Depth Estimation
