Search Results for author: Yang Fan

Found 25 papers, 10 papers with code

mixSeq: A Simple Data Augmentation Method for Neural Machine Translation

No code implementations · ACL (IWSLT) 2021 · Xueqing Wu, Yingce Xia, Jinhua Zhu, Lijun Wu, Shufang Xie, Yang Fan, Tao Qin

Data augmentation, which manipulates the inputs (e.g., adding random noise or masking specific parts) to enlarge the dataset, has been widely adopted in machine learning.

Data Augmentation, Diversity, +2 more

WorldPM: Scaling Human Preference Modeling

No code implementations · 15 May 2025 · Binghai Wang, Runji Lin, Keming Lu, Le Yu, Zhenru Zhang, Fei Huang, Chujie Zheng, Kai Dang, Yang Fan, Xingzhang Ren, An Yang, Binyuan Hui, Dayiheng Liu, Tao Gui, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Bowen Yu, Jingren Zhou, Junyang Lin

Motivated by scaling laws in language modeling that demonstrate how test loss scales as a power law with model and dataset sizes, we find that similar laws exist in preference modeling.

CrossFormer: Cross-Segment Semantic Fusion for Document Segmentation

No code implementations · 31 Mar 2025 · Tongke Ni, Yang Fan, Junru Zhou, XiangPing Wu, Qingcai Chen

Text semantic segmentation involves partitioning a document into multiple paragraphs with continuous semantics based on the subject matter, contextual information, and document structure.

RAG, Segmentation, +1 more

Qwen2.5-Omni Technical Report

1 code implementation · 26 Mar 2025 · Jin Xu, Zhifang Guo, Jinzheng He, Hangrui Hu, Ting He, Shuai Bai, Keqin Chen, Jialin Wang, Yang Fan, Kai Dang, Bin Zhang, Xiong Wang, Yunfei Chu, Junyang Lin

In this framework, the Thinker functions as a large language model responsible for text generation, while the Talker is a dual-track autoregressive model that directly uses the Thinker's hidden representations to produce audio tokens as output.

Ranked #3 on Zero-Shot Video Question Answer on EgoSchema (fullset) (using extra training data)

Automatic Speech Recognition (ASR), GSM8K, +5 more

AdEval: Alignment-based Dynamic Evaluation to Mitigate Data Contamination in Large Language Models

No code implementations · 23 Jan 2025 · Yang Fan

AdEval extracts key knowledge points and main ideas to align dynamically generated questions with static data's core concepts.

Fairness

Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment

1 code implementation · 28 May 2024 · Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, Chang Zhou

Effectively aligning Large Language Models (LLMs) with human-centric values while preventing the degradation of abilities acquired through Pre-training and Supervised Fine-tuning (SFT) poses a central challenge in Reinforcement Learning from Human Feedback (RLHF).

Structure-Unified M-Tree Coding Solver for Math Word Problem

1 code implementation · 22 Oct 2022 · Bin Wang, Jiangzhou Ju, Yang Fan, Xinyu Dai, ShuJian Huang, Jiajun Chen

As one of the challenging NLP tasks, designing math word problem (MWP) solvers has attracted increasing research attention over the past few years.

Math

Discovering Drug-Target Interaction Knowledge from Biomedical Literature

No code implementations · 27 Sep 2021 · Yutai Hou, Yingce Xia, Lijun Wu, Shufang Xie, Yang Fan, Jinhua Zhu, Wanxiang Che, Tao Qin, Tie-Yan Liu

We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.

Contextual Domain Classification with Temporal Representations

No code implementations · NAACL 2021 · Tzu-Hsiang Lin, Yipeng Shi, Chentao Ye, Yang Fan, Weitong Ruan, Emre Barut, Wael Hamza, Chengwei Su

In commercial dialogue systems, the Spoken Language Understanding (SLU) component tends to cover numerous domains, so context is needed to help resolve ambiguities.

Classification, Domain Classification, +1 more

CN-HIT-IT.NLP at SemEval-2020 Task 4: Enhanced Language Representation with Multiple Knowledge Triples

No code implementations · SEMEVAL 2020 · Yice Zhang, Jiaxuan Lin, Yang Fan, Peng Jin, Yuanchao Liu, Bingquan Liu

For this task, external knowledge, such as a knowledge graph, can clearly help the model understand the commonsense in natural language statements.

Knowledge Graphs

Learning to Reweight with Deep Interactions

No code implementations · 9 Jul 2020 · Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

Recently, the concept of teaching has been introduced into machine learning, in which a teacher model is used to guide the training of a student model (which will be used in real tasks) through data selection, loss function design, etc.

Image Classification, Machine Translation, +1 more

Multi-branch Attentive Transformer

1 code implementation · 18 Jun 2020 · Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

While the multi-branch architecture is one of the key ingredients in the success of computer vision models, it has not been well investigated in natural language processing, especially for sequence learning tasks.

Code Generation, Machine Translation, +2 more

Learning to Teach with Dynamic Loss Functions

No code implementations · NeurIPS 2018 · Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Jian-Huang Lai, Tie-Yan Liu

Different from typical learning settings in which the loss function of a machine learning model is predefined and fixed, in our framework, the loss function of a machine learning model (we call it student) is defined by another machine learning model (we call it teacher).

BIG-bench Machine Learning, Image Classification, +1 more

Learning to Teach

No code implementations · ICLR 2018 · Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

Teaching plays a vital role in our society by spreading human knowledge and educating the next generation.

BIG-bench Machine Learning, Image Classification, +1 more