Search Results for author: Aref Jafari

Found 9 papers, 3 papers with code

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

1 code implementation31 Jul 2023 Ehsan Kamalloo, Aref Jafari, Xinyu Zhang, Nandan Thakur, Jimmy Lin

In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations.

Information Retrieval Informativeness +1
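
A minimal sketch of loading HAGRID for experimentation. It assumes the dataset is distributed on the Hugging Face Hub under an id such as "miracl/hagrid" and that a "train" split exists; the field layout hinted at in the comments is an assumption, not confirmed by the abstract.

```python
# Sketch: load HAGRID and peek at a few examples.
# The dataset id "miracl/hagrid" and split name are assumptions; adjust if they differ.
from datasets import load_dataset

hagrid = load_dataset("miracl/hagrid", split="train")

for example in hagrid.select(range(3)):
    # Each example is expected to pair a query with retrieved quotes and
    # attributed answers that cite those quotes (field names assumed).
    print(example.keys())
```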

Improved knowledge distillation by utilizing backward pass knowledge in neural networks

no code implementations27 Jan 2023 Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

Augmenting the training set by adding this auxiliary data improves the performance of KD significantly and leads to a closer match between the student and the teacher.

Knowledge Distillation Model Compression
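
A hedged sketch of the general idea behind backward-pass auxiliary data: perturb a training input in the direction that most increases the teacher-student discrepancy, then add the perturbed sample to the KD training set. The step size, sign-gradient update, and MSE discrepancy below are illustrative choices, not the paper's exact procedure.

```python
# Sketch: generate an auxiliary sample from backward-pass knowledge by moving
# an input toward the region where student and teacher disagree most.
import torch
import torch.nn.functional as F

def make_auxiliary_sample(x, teacher, student, step_size=0.05):
    x_aux = x.clone().detach().requires_grad_(True)
    discrepancy = F.mse_loss(student(x_aux), teacher(x_aux))
    discrepancy.backward()                      # gradient of discrepancy w.r.t. the input
    with torch.no_grad():
        x_aux = x_aux + step_size * x_aux.grad.sign()  # step toward disagreement
    return x_aux.detach()
```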

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization

no code implementations12 Dec 2022 Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi

Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher).

Knowledge Distillation Natural Language Understanding
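
A generic continuation-style KD loss, sketched under assumptions: training starts from a smoothed, easier objective (temperature-softened teacher matching) and is gradually morphed into the final task objective as a continuation parameter moves from 0 to 1. The linear schedule and the specific mixing are illustrative, not the paper's exact formulation.

```python
# Sketch: continuation schedule that interpolates from a soft KD objective
# to the hard task loss over the course of training.
import torch.nn.functional as F

def continuation_kd_loss(student_logits, teacher_logits, labels, step, total_steps, T=4.0):
    t = min(step / total_steps, 1.0)            # continuation parameter: 0 -> 1
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return (1.0 - t) * kd + t * ce              # smooth objective -> task objective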

Do we need Label Regularization to Fine-tune Pre-trained Language Models?

no code implementations25 May 2022 Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi

Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model.

Knowledge Distillation Model Compression
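
The paper's question, put in code terms: when fine-tuning a pre-trained language model, is a label regularizer (label smoothing, or a teacher's soft predictions as in KD) needed on top of plain cross-entropy? The sketch below only contrasts plain cross-entropy with label smoothing via the standard PyTorch option; the smoothing value is illustrative.

```python
# Sketch: plain fine-tuning loss vs. a simple teacher-free label regularizer.
import torch.nn.functional as F

def finetune_loss(student_logits, labels, label_smoothing=0.0):
    # label_smoothing = 0.0 -> plain cross-entropy
    # label_smoothing > 0.0 -> label-smoothing regularization
    return F.cross_entropy(student_logits, labels, label_smoothing=label_smoothing)
```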

Annealing Knowledge Distillation

1 code implementation EACL 2021 Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model.

Image Classification Knowledge Distillation +1
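
A hedged sketch of the two-phase annealing idea: in the first phase the student regresses to the teacher's logits scaled by an annealing factor that grows toward 1 over the epochs, so the target sharpens gradually; in the second phase the student is fine-tuned on the true labels. The linear schedule and MSE matching below are assumptions for illustration.

```python
# Sketch: annealed teacher-matching phase followed by standard fine-tuning.
import torch.nn.functional as F

def annealed_matching_loss(student_logits, teacher_logits, epoch, max_epochs):
    anneal = (epoch + 1) / max_epochs           # assumed schedule, grows toward 1
    return F.mse_loss(student_logits, anneal * teacher_logits)

def finetune_loss(student_logits, labels):
    return F.cross_entropy(student_logits, labels)
```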

Segmentation Approach for Coreference Resolution Task

no code implementations30 Jun 2020 Aref Jafari, Ali Ghodsi

This has been accomplished by defining an embedding method for the position of all members of a coreference cluster in a document and resolving all of them for a given mention.

Coreference Resolution Position +1
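
One way to read the abstract's idea in code: encode the positions of all members of a coreference cluster as a token-level segmentation mask over the document, so resolving a given mention amounts to predicting that mask. The span encoding below is a hypothetical illustration, not the authors' exact embedding method.

```python
# Sketch: represent a coreference cluster as a binary segmentation mask
# over document token positions.
import torch

def cluster_segmentation_mask(doc_len, cluster_spans):
    """cluster_spans: list of (start, end) token indices, end exclusive."""
    mask = torch.zeros(doc_len, dtype=torch.long)
    for start, end in cluster_spans:
        mask[start:end] = 1                     # mark every token of every cluster member
    return mask

# Example: a 12-token document with a cluster covering tokens [0, 2) and [7, 9).
print(cluster_segmentation_mask(12, [(0, 2), (7, 9)]))
```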
