no code implementations • NAACL 2022 • Jingyi You, Dongyuan Li, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura
Previous studies on the timeline summarization (TLS) task ignored the interaction of information between sentences and dates and adopted pre-defined, unlearnable representations for them.
1 code implementation • RANLP 2021 • Thodsaporn Chay-intr, Hidetaka Kamigaito, Manabu Okumura
These models estimate word boundaries from a character sequence.
Ranked #2 on Thai Word Segmentation on BEST-2010
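To illustrate what boundary estimation from a character sequence amounts to, here is a minimal sketch assuming a standard BIES character-tagging scheme (not the paper's actual lattice-based model):

```python
def decode_segments(chars, tags):
    """Convert per-character BIES tags into words.
    B=begin, I=inside, E=end, S=single-character word."""
    words, buf = [], []
    for ch, tag in zip(chars, tags):
        buf.append(ch)
        if tag in ("E", "S"):  # word boundary after this character
            words.append("".join(buf))
            buf = []
    if buf:  # flush any trailing partial word
        words.append("".join(buf))
    return words

# Toy example with Latin characters standing in for Thai script:
print(decode_segments(list("thaiwords"),
                      ["B", "I", "I", "E", "B", "I", "I", "I", "E"]))
# -> ['thai', 'words']
```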
no code implementations • RANLP 2021 • Yukun Feng, Chenlong Hu, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
Character-aware neural language models can capture the relationship between words by exploiting character-level information and are particularly effective for languages with rich morphology.
no code implementations • RANLP 2021 • Jingun Kwon, Naoki Kobayashi, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
The results demonstrate that the position of emojis in texts is a good clue to boost the performance of emoji label prediction.
no code implementations • RANLP 2021 • Jingyi You, Chenlong Hu, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
Neural sequence-to-sequence (Seq2Seq) models and BERT have achieved substantial improvements in abstractive document summarization (ADS) without and with pre-training, respectively.
no code implementations • RANLP 2021 • Ying Zhang, Hidetaka Kamigaito, Tatsuya Aoki, Hiroya Takamura, Manabu Okumura
Encoder-decoder models have been commonly used for many tasks such as machine translation and response generation.
no code implementations • EMNLP 2021 • Jingun Kwon, Naoki Kobayashi, Hidetaka Kamigaito, Manabu Okumura
Sentence extractive summarization shortens a document by selecting sentences for a summary while preserving its important contents.
Ranked #5 on Extractive Text Summarization on CNN / Daily Mail
1 code implementation • ECCV 2020 • Soichiro Fujita, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
This paper proposes a new evaluation framework, Story Oriented Dense video cAptioning evaluation framework (SODA), for measuring the performance of video story description systems.
no code implementations • EMNLP 2021 • Ying Zhang, Hidetaka Kamigaito, Manabu Okumura
Discourse segmentation and sentence-level discourse parsing play important roles for various NLP tasks to consider textual coherence.
no code implementations • 18 Jun 2025 • Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
To address this, we propose Ordered CommonGen, a benchmark designed to evaluate the compositional generalization and instruction-following abilities of LLMs.
no code implementations • 29 May 2025 • Hidetaka Kamigaito, Ying Zhang, Jingun Kwon, Katsuhiko Hayashi, Manabu Okumura, Taro Watanabe
Their task-solving performance improves as parameter size increases, as shown in recent studies on parameter scaling laws.
no code implementations • 13 May 2025 • Kazuki Hayashi, Hidetaka Kamigaito, Shinya Kouda, Taro Watanabe
Retrieval-Augmented Generation (RAG) has emerged as a way to complement the in-context knowledge of Large Language Models (LLMs) by integrating external documents.
no code implementations • 28 Mar 2025 • Yuto Nishida, Makoto Morishita, Hiroyuki Deguchi, Hidetaka Kamigaito, Taro Watanabe
The $k$-nearest-neighbor language model ($k$NN-LM), one of the retrieval-augmented language models, improves the perplexity of a given text by directly accessing, during inference, a large datastore built from any text data.
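For context, a minimal sketch of the standard $k$NN-LM interpolation this line of work builds on; the interpolation weight `lam` and the distance kernel are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def knn_lm_prob(p_lm, neighbors, vocab_size, lam=0.25, temp=1.0):
    """Interpolate the base LM distribution with a kNN distribution.
    neighbors: list of (target_token_id, distance) retrieved from the datastore.
    lam: interpolation weight (a tunable hyperparameter)."""
    # Turn distances into a distribution over the retrieved targets.
    dists = np.array([d for _, d in neighbors], dtype=float)
    weights = np.exp(-dists / temp)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for (tok, _), w in zip(neighbors, weights):
        p_knn[tok] += w  # neighbors sharing a target token accumulate mass
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy example: vocab of 5, base LM distribution, two retrieved neighbors.
p_lm = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
print(knn_lm_prob(p_lm, neighbors=[(2, 0.5), (4, 1.5)], vocab_size=5))
```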
1 code implementation • 12 Mar 2025 • Juseon-Do, Jaesung Hwang, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura
This study investigates retrieval-augmented summarization by specifically examining the impact of exemplar summary lengths under length constraints, not covered by previous work.
1 code implementation • 7 Feb 2025 • Soichiro Murakami, Peinan Zhang, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
This study aims to explore the linguistic features of ad texts that influence human preferences.
1 code implementation • 29 Jan 2025 • Haruki Sakajo, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
In this study, we created video and image datasets from the existing real-time MRI dataset and investigated whether LMs can understand vowel articulation based on tongue positions using vision-based information.
no code implementations • 5 Jan 2025 • Takashi Harada, Takehiro Motomitsu, Katsuhiko Hayashi, Yusuke Sakai, Hidetaka Kamigaito
In recent years, there has been a notable increase in research on machine learning models for music retrieval and generation systems that are capable of taking natural language sentences as inputs.
1 code implementation • 29 Dec 2024 • Shintaro Ozaki, Yuta Kato, Siyuan Feng, Masayo Tomita, Kazuki Hayashi, Wataru Hashimoto, Ryoma Obara, Masafumi Oyamada, Katsuhiko Hayashi, Hidetaka Kamigaito, Taro Watanabe
Retrieval Augmented Generation (RAG) complements the knowledge of Large Language Models (LLMs) by leveraging external information to enhance response accuracy for queries.
no code implementations • 26 Dec 2024 • Siyuan Feng, Teruya Yoshinaga, Katsuhiko Hayashi, Koki Washio, Hidetaka Kamigaito
Today, manga has gained worldwide popularity.
no code implementations • 24 Dec 2024 • Yusuke Ide, Joshua Tanner, Adam Nohejl, Jacob Hoffman, Justin Vasselli, Hidetaka Kamigaito, Taro Watanabe
MWEs in CoAM are tagged with MWE types, such as Noun and Verb, to enable fine-grained error analysis.
no code implementations • 19 Oct 2024 • Hidetaka Kamigaito, Hiroyuki Deguchi, Yusuke Sakai, Katsuhiko Hayashi, Taro Watanabe
We also introduce a new MBR approach, Metric-augmented MBR (MAMBR), which increases diversity by adjusting the behavior of utility functions without altering the pseudo-references.
no code implementations • 17 Oct 2024 • Shintaro Ozaki, Kazuki Hayashi, Miyu Oba, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
To address this, we propose BQA, a body language question answering dataset, to validate whether models can correctly interpret emotions from short body language video clips annotated with 26 emotion labels.
1 code implementation • 8 Sep 2024 • Zhe Cao, Zhi Qu, Hidetaka Kamigaito, Taro Watanabe
Furthermore, we propose architecture learning techniques and introduce a gradual pruning schedule during fine-tuning to exhaustively explore the optimal setting and the minimal intrinsic subspaces for each language, resulting in a lightweight yet effective fine-tuning procedure.
no code implementations • 3 Sep 2024 • Shintaro Ozaki, Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
As the performance of Large-scale Vision Language Models (LVLMs) improves, they become increasingly capable of responding in multiple languages, and the demand for explanations generated by LVLMs is expected to grow.
no code implementations • 22 Aug 2024 • Yusuke Sakai, Adam Nohejl, Jiangnan Hang, Hidetaka Kamigaito, Taro Watanabe
In this study, we provide English and Japanese cross-lingual datasets for evaluating the NLU performance of LLMs, which include multiple instruction templates for fair evaluation of each task, along with regular expressions to constrain the output format.
no code implementations • 19 Aug 2024 • Yusuke Ide, Yuto Nishida, Miyu Oba, Yusuke Sakai, Justin Vasselli, Hidetaka Kamigaito, Taro Watanabe
The grammatical knowledge of language models (LMs) is often measured using a benchmark of linguistic minimal pairs, where LMs are presented with a pair of acceptable and unacceptable sentences and required to judge which is acceptable.
1 code implementation • 8 Aug 2024 • Hiroyuki Deguchi, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
We published our mbrs as an MIT-licensed open-source project, and the code is available on GitHub.
no code implementations • 3 Aug 2024 • Hiroshi Takato, Hiroshi Tsutsui, Komei Soda, Hidetaka Kamigaito
Identifying risky driving behavior in real-world situations is essential for the safety of both drivers and pedestrians.
1 code implementation • 8 Jul 2024 • Ken Nishida, Kojiro Machi, Kazuma Onishi, Katsuhiko Hayashi, Hidetaka Kamigaito
The extreme multi-label classification (XMC) task involves learning a classifier that can predict from a large label set the most relevant subset of labels for a data instance.
Extreme Multi-Label Classification • Multi-Label Classification • +1
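A minimal sketch of the XMC prediction step, assuming a simple linear one-vs-all scorer purely for illustration (the paper's actual classifier is not assumed here):

```python
import numpy as np

def predict_top_k(x, label_embeddings, k=5):
    """Score every label against the instance and keep the k most relevant."""
    scores = label_embeddings @ x          # one score per label
    top = np.argsort(-scores)[:k]          # indices of the k best labels
    return top, scores[top]

rng = np.random.default_rng(0)
labels = rng.normal(size=(10_000, 64))     # a large label set
instance = rng.normal(size=64)
print(predict_top_k(instance, labels, k=5))
```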
1 code implementation • 5 Jul 2024 • Xincan Feng, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
This paper provides theoretical interpretations of the smoothing methods for the NS loss in KGE and induces a new NS loss, Triplet Adaptive Negative Sampling (TANS), that can cover the characteristics of the conventional smoothing methods.
1 code implementation • 2 Jul 2024 • Wataru Hashimoto, Hidetaka Kamigaito, Taro Watanabe
This work investigates the impact of data augmentation on confidence calibration and uncertainty estimation in Named Entity Recognition (NER) tasks.
no code implementations • 2 Jul 2024 • Wataru Hashimoto, Hidetaka Kamigaito, Taro Watanabe
Trustworthy prediction in Deep Neural Networks (DNNs), including Pre-trained Language Models (PLMs), is important for safety-critical applications in the real world.
no code implementations • 27 Jun 2024 • Ryo Tsujimoto, Hiroki Ouchi, Hidetaka Kamigaito, Taro Watanabe
Explaining temporal changes between satellite images taken at different times is important for urban planning and environmental monitoring.
1 code implementation • 18 Jun 2024 • Zhiyu Guo, Hidetaka Kamigaito, Taro Watanabe
Scaling the context size of large language models (LLMs) enables them to perform various new tasks, e.g., book summarization.
no code implementations • 17 Jun 2024 • Boxuan Lyu, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura
Maximum a posteriori decoding, a commonly used method for neural machine translation (NMT), aims to maximize the estimated posterior probability.
no code implementations • 16 Jun 2024 • Juseon-Do, Jingun Kwon, Hidetaka Kamigaito, Manabu Okumura
For this purpose, we created new evaluation datasets by transforming traditional sentence compression datasets into an instruction format.
no code implementations • 6 Jun 2024 • Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe
The constructed dataset is a benchmark for the cross-lingual language-transfer capabilities of multilingual LMs; experimental results showed high language-transfer capabilities for questions that LMs could easily solve, but lower transfer capabilities for questions requiring deep knowledge or commonsense.
no code implementations • 3 May 2024 • Zhiyu Guo, Hidetaka Kamigaito, Taro Watanabe
The rapid advancement in Large Language Models (LLMs) has markedly enhanced the capabilities of language understanding and generation.
1 code implementation • 30 Apr 2024 • Huy Hien Vu, Hidetaka Kamigaito, Taro Watanabe
Despite significant improvements in enhancing the quality of translation, context-aware machine translation (MT) models underperform in many cases.
no code implementations • 18 Apr 2024 • Yusuke Sakai, Mana Makinae, Hidetaka Kamigaito, Taro Watanabe
In Simultaneous Machine Translation (SiMT) systems, training with a simultaneous interpretation (SI) corpus is an effective method for achieving high-quality yet low-latency systems.
no code implementations • 31 Mar 2024 • Jesse Atuhurra, Hidetaka Kamigaito
NLP systems are on par with, or in some cases better than, humans at accomplishing specific tasks.
no code implementations • 29 Mar 2024 • Jesse Atuhurra, Iqra Ali, Tatsuya Hiraoka, Hidetaka Kamigaito, Tomoya Iwakura, Taro Watanabe
Our contribution is four-fold: 1) we introduced nine vision-and-language (VL) tasks (including object recognition, image-text matching, and more) and constructed multilingual visual-text datasets in four languages: English, Japanese, Swahili, and Urdu, by using templates containing questions and prompting GPT4-V to generate the answers and the rationales; 2) introduced a new VL task named unrelatedness; 3) introduced rationales to enable human understanding of the VLM reasoning process; and 4) employed human evaluation to measure the suitability of the proposed datasets for VL tasks.
no code implementations • 26 Mar 2024 • Jesse Atuhurra, Hiroyuki Shindo, Hidetaka Kamigaito, Taro Watanabe
Tokenization is one such technique because it allows words to be split into characters or subwords, creating word embeddings that best represent the structure of the language.
1 code implementation • 25 Mar 2024 • Huayang Li, Deng Cai, Zhi Qu, Qu Cui, Hidetaka Kamigaito, Lemao Liu, Taro Watanabe
In our work, we propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval, which aims to augment cross-lingual applications by addressing polysemy using context information.
no code implementations • 13 Mar 2024 • Jesse Atuhurra, Seiveright Cargill Dujohn, Hidetaka Kamigaito, Hiroyuki Shindo, Taro Watanabe
Natural language processing (NLP) practitioners are leveraging large language models (LLM) to create structured datasets from semi-structured and unstructured data sources such as patents, papers, and theses, without having domain-specific knowledge.
1 code implementation • 8 Mar 2024 • Aru Maekawa, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura
Recently, decoder-only pre-trained large language models (LLMs), with several tens of billion parameters, have significantly impacted a wide range of natural language processing (NLP) tasks.
Ranked #1 on Discourse Parsing on RST-DT
no code implementations • 29 Feb 2024 • Kazuki Hayashi, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
To address this issue, we propose a new task: the artwork explanation generation task, along with its evaluation dataset and metric for quantitatively assessing the understanding and utilization of knowledge about artworks.
1 code implementation • 22 Feb 2024 • Seiji Gobara, Hidetaka Kamigaito, Taro Watanabe
Experimental results on the Stack-Overflow dataset and the TSCC dataset, including multi-turn conversations, show that LLMs can implicitly handle text difficulty between user input and its generated response.
no code implementations • 19 Feb 2024 • Kazuki Hayashi, Kazuma Onishi, Toma Suzuki, Yusuke Ide, Seiji Gobara, Shigeki Saito, Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
We validate it using a dataset of images from 15 categories, each with five critic review texts and annotated rankings in both English and Japanese, totaling over 2,000 data instances.
1 code implementation • 17 Feb 2024 • Hiroyuki Deguchi, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe, Hideki Tanaka, Masao Utiyama
Minimum Bayes risk (MBR) decoding achieved state-of-the-art translation performance by using COMET, a neural metric that has a high correlation with human evaluation.
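A minimal sketch of sampling-based MBR decoding, with a toy lexical F1 standing in for a neural utility such as COMET; this does not reflect the paper's method or any library API:

```python
def mbr_decode(candidates, utility):
    """Pick the candidate with the highest expected utility, using the
    candidate pool itself as pseudo-references (sampling-based MBR)."""
    best, best_score = None, float("-inf")
    for hyp in candidates:
        # Expected utility of `hyp` against all pseudo-references.
        score = sum(utility(hyp, ref) for ref in candidates) / len(candidates)
        if score > best_score:
            best, best_score = hyp, score
    return best

# Toy utility: token-level F1 stands in for a neural metric such as COMET.
def f1(hyp, ref):
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    overlap = len(h & r)
    p, rec = overlap / len(h), overlap / len(r)
    return 2 * p * rec / (p + rec) if p + rec else 0.0

cands = ["the cat sat", "a cat sat down", "the cat sat down"]
print(mbr_decode(cands, f1))  # the candidate closest to all others wins
```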
no code implementations • 14 Feb 2024 • Yuto Nishida, Makoto Morishita, Hidetaka Kamigaito, Taro Watanabe
Generating multiple translation candidates would enable users to choose the one that satisfies their needs.
no code implementations • 15 Nov 2023 • Yusuke Sakai, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
Knowledge Graph Completion (KGC) is a task that infers unseen relationships between entities in a KG.
1 code implementation • 17 Sep 2023 • Xincan Feng, Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
Subsampling is effective in Knowledge Graph Embedding (KGE) for reducing overfitting caused by the sparsity in Knowledge Graph (KG) datasets.
1 code implementation • 3 Jun 2023 • Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe
This task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity with a caption and a table containing related knowledge of the entity.
2 code implementations • Journal of Natural Language Processing 2023 • Thodsaporn Chay-intr, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura
Our model employs the lattice structure to handle segmentation alternatives and utilizes graph neural networks along with an attention mechanism to attentively extract multi-granularity representation from the lattice for complementing character representations.
Ranked #1 on Chinese Word Segmentation on CTB6 (using extra training data)
1 code implementation • 22 May 2023 • Ying Zhang, Hidetaka Kamigaito, Manabu Okumura
Pre-trained seq2seq models have achieved state-of-the-art results in the grammatical error correction task.
1 code implementation • 15 Oct 2022 • Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
To promote and further develop RST-style discourse parsing models, we need a strong baseline that can be regarded as a reference for reporting reliable experimental results.
Ranked #1 on Discourse Parsing on Instructional-DT (Instr-DT)
no code implementations • 13 Sep 2022 • Hidetaka Kamigaito, Katsuhiko Hayashi
In this article, we explain the recent advance of subsampling methods in knowledge graph embedding (KGE) starting from the original one used in word2vec.
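For reference, the original word2vec subsampling that the article takes as its starting point, in its commonly used simplified form (a sketch, not the article's proposed variants):

```python
import math

def keep_probability(word_freq, total_count, t=1e-5):
    """word2vec-style subsampling: frequent words are kept with lower
    probability so that rare words are not drowned out."""
    f = word_freq / total_count          # relative frequency of the word
    return min(1.0, math.sqrt(t / f))    # common simplified form

# The more frequent the word, the more aggressively it is subsampled.
for word, freq in [("the", 5_000_000), ("knowledge", 20_000), ("KGE", 150)]:
    print(word, round(keep_probability(freq, 100_000_000), 4))
```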
1 code implementation • NAACL 2022 • Toshiki Kawamoto, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura
A repetition is a response that repeats words in the previous speaker's utterance in a dialogue.
1 code implementation • 21 Jun 2022 • Hidetaka Kamigaito, Katsuhiko Hayashi
To solve this problem, we theoretically analyzed the NS loss to assist hyperparameter tuning and to understand the better use of the NS loss in KGE learning.
no code implementations • NAACL (ACL) 2022 • Soichiro Murakami, Peinan Zhang, Sho Hoshino, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
Writing an ad text that attracts people and persuades them to click or act is essential for the success of search engine advertising.
no code implementations • 29 Sep 2021 • Hidetaka Kamigaito, Katsuhiko Hayashi
On the other hand, properties of the NS loss function that are considered important for learning, such as the relationship between the noise distribution and the number of negative samples, have not been investigated theoretically.
1 code implementation • ACL 2021 • Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Kotaro Funakoshi, Manabu Okumura, Hiroya Takamura
In summary, our contributions are (1) a new dataset for numerical table-to-text generation using pairs of a table and a paragraph of a table description with richer inference from scientific papers, and (2) a table-to-text generation framework enriched with numerical reasoning.
1 code implementation • ACL 2021 • Hidetaka Kamigaito, Katsuhiko Hayashi
In knowledge graph embedding, the theoretical relationship between the softmax cross-entropy and negative sampling loss functions has not been investigated.
Ranked #15 on Link Prediction on FB15k-237
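A minimal sketch contrasting the two loss functions whose relationship the paper investigates, with toy scores for a single triple (the scoring function and the negatives are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def softmax_ce_loss(scores, true_idx):
    """Softmax cross-entropy over all candidate entities."""
    return F.cross_entropy(scores.unsqueeze(0), torch.tensor([true_idx]))

def negative_sampling_loss(pos_score, neg_scores):
    """NS loss with k sampled negatives (self-adversarial weighting omitted)."""
    pos_term = F.logsigmoid(pos_score)
    neg_term = F.logsigmoid(-neg_scores).mean()
    return -(pos_term + neg_term)

# Toy scores for one triple: one true tail entity vs. sampled negatives.
scores = torch.tensor([2.0, 0.5, -1.0, 0.1])
print(softmax_ce_loss(scores, true_idx=0))
print(negative_sampling_loss(scores[0], scores[1:]))
```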
no code implementations • NAACL 2021 • Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
We then pre-train a neural RST parser with the obtained silver data and fine-tune it on the RST-DT.
Ranked #1 on Discourse Parsing on RST-DT (RST-Parseval (Relation) metric, using extra training data)
no code implementations • NAACL 2021 • Hidetaka Kamigaito, Peinan Zhang, Hiroya Takamura, Manabu Okumura
Although there are many studies on neural language generation (NLG), few have been put into practice in the real world, especially in the advertising domain.
no code implementations • EACL 2021 • Hidetaka Kamigaito, Jingun Kwon, Young-In Song, Manabu Okumura
We therefore propose a method for extracting interesting relationships between persons from natural language texts by focusing on their surprisingness.
no code implementations • EACL 2021 • Chenlong Hu, Yukun Feng, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
This work presents multi-modal deep SVDD (mSVDD) for one-class text classification.
1 code implementation • EACL 2021 • Soichiro Murakami, Sora Tanaka, Masatsugu Hangyo, Hidetaka Kamigaito, Kotaro Funakoshi, Hiroya Takamura, Manabu Okumura
The task of generating weather-forecast comments from meteorological simulations has the following requirements: (i) the changes in numerical values for various physical quantities need to be considered, (ii) the weather comments should be dependent on delivery time and area information, and (iii) the comments should provide useful information for users.
no code implementations • EACL 2021 • Lya Hulliyyatus Suadaa, Hidetaka Kamigaito, Manabu Okumura, Hiroya Takamura
Numerical tables are widely used to present experimental results in scientific papers.
no code implementations • COLING 2020 • Jingun Kwon, Hidetaka Kamigaito, Young-In Song, Manabu Okumura
Recently, automatic trivia fact extraction has attracted much research interest.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Yukun Feng, Chenlong Hu, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
We propose a simple and effective method for incorporating word clusters into the Continuous Bag-of-Words (CBOW) model.
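One illustrative way to inject cluster information into a CBOW-style context vector; the paper's exact incorporation scheme is not assumed here:

```python
import numpy as np

def cbow_context_vector(context_ids, cluster_ids, word_emb, cluster_emb):
    """Average word embeddings and, additionally, the embeddings of the
    clusters those words belong to (one illustrative way to inject clusters)."""
    word_part = word_emb[context_ids].mean(axis=0)
    cluster_part = cluster_emb[cluster_ids].mean(axis=0)
    return (word_part + cluster_part) / 2.0

rng = np.random.default_rng(0)
word_emb = rng.normal(size=(1000, 50))     # vocabulary embeddings
cluster_emb = rng.normal(size=(20, 50))    # one embedding per word cluster
ctx = [3, 17, 42, 9]                       # context word ids
clusters = [1, 4, 1, 7]                    # cluster id of each context word
print(cbow_context_vector(ctx, clusters, word_emb, cluster_emb).shape)
```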
no code implementations • COLING 2020 • Shogo Fujita, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
We tackle the task of automatically generating a function name from source code.
no code implementations • COLING 2020 • Riku Kawamura, Tatsuya Aoki, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
We propose neural models that can normalize text by considering the similarities of word strings and sounds.
1 code implementation • 3 Apr 2020 • Naoki Kobayashi, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
To obtain better discourse dependency trees, we need to improve the accuracy of RST trees at the upper parts of the structures.
Ranked #3 on Discourse Parsing on RST-DT (RST-Parseval (Span) metric)
1 code implementation • 4 Feb 2020 • Hidetaka Kamigaito, Manabu Okumura
Sentence compression is the task of compressing a long sentence into a short one by deleting redundant words.
Ranked #1 on Sentence Compression on Google Dataset
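A minimal sketch of deletion-based compression as binary token labeling; in the paper the labels come from a learned model, whereas here they are hand-set purely for illustration:

```python
def compress(tokens, keep_labels):
    """Deletion-based compression: a binary label per token (1 = keep)."""
    return " ".join(t for t, keep in zip(tokens, keep_labels) if keep)

sentence = "the cabinet on wednesday formally approved the new budget plan".split()
labels   = [0, 1, 0, 0, 0, 1, 1, 1, 1, 1]
print(compress(sentence, labels))
# -> "cabinet approved the new budget plan"
```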
no code implementations • IJCNLP 2019 • Naoki Kobayashi, Tsutomu Hirao, Kengo Nakamura, Hidetaka Kamigaito, Manabu Okumura, Masaaki Nagata
The first one builds the optimal tree in terms of a dissimilarity score function that is defined for splitting a text span into smaller ones.
no code implementations • CONLL 2019 • Yukun Feng, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
Our injection method can also be used together with previous methods.
no code implementations • WS 2019 • Takumi Ohtani, Hidetaka Kamigaito, Masaaki Nagata, Manabu Okumura
We present neural machine translation models for translating a sentence in a text by using a graph-based encoder that can explicitly consider coreference relations provided within the text.
no code implementations • RANLP 2019 • Tatsuya Ishigaki, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
To incorporate discourse tree structure information into neural network-based summarizers, we propose a discourse-aware neural extractive summarizer that can explicitly take into account the discourse dependency tree structure of the source document.
no code implementations • EMNLP 2018 • Tsutomu Hirao, Hidetaka Kamigaito, Masaaki Nagata
This paper tackles automation of the pyramid method, a reliable manual evaluation framework.
1 code implementation • ACL 2018 • Jun Suzuki, Sho Takase, Hidetaka Kamigaito, Makoto Morishita, Masaaki Nagata
This paper investigates the construction of a strong baseline based on general purpose sequence-to-sequence models for constituency parsing.
Ranked #18 on Constituency Parsing on Penn Treebank
no code implementations • NAACL 2018 • Hidetaka Kamigaito, Katsuhiko Hayashi, Tsutomu Hirao, Masaaki Nagata
To solve this problem, we propose a higher-order syntactic attention network (HiSAN) that can handle higher-order dependency features as an attention distribution on LSTM hidden states.
Ranked #3 on Sentence Compression on Google Dataset
no code implementations • IJCNLP 2017 • Hidetaka Kamigaito, Katsuhiko Hayashi, Tsutomu Hirao, Hiroya Takamura, Manabu Okumura, Masaaki Nagata
The sequence-to-sequence (Seq2Seq) model has been successfully applied to machine translation (MT).