Search Results for author: Timothy Baldwin

Found 166 papers, 44 papers with code

Improved Topic Representations of Medical Documents to Assist COVID-19 Literature Exploration

no code implementations EMNLP (NLP-COVID19) 2020 Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, Simon Šuster

Efficient discovery and exploration of biomedical literature has grown in importance in the context of the COVID-19 pandemic, and topic-based methods such as latent Dirichlet allocation (LDA) are a useful tool for this purpose.

Topic Models

Popularity Prediction of Online Petitions using a Multimodal Deep Regression Model

no code implementations ALTA 2020 Kotaro Kitayama, Shivashankar Subramanian, Timothy Baldwin

Online petitions offer a mechanism for people to initiate a request for change and gather support from others to demonstrate support for the cause.

Information Extraction from Legal Documents: A Study in the Context of Common Law Court Judgements

no code implementations ALTA 2020 Meladel Mistica, Geordie Z. Zhang, Hui Chia, Kabir Manandhar Shrestha, Rohit Kumar Gupta, Saket Khandelwal, Jeannie Paterson, Timothy Baldwin, Daniel Beck

‘Common Law’ judicial systems follow the doctrine of precedent, which means the legal principles articulated in court judgements are binding in subsequent cases in lower courts.

Text Classification

The Company They Keep: Extracting Japanese Neologisms Using Language Patterns

no code implementations GWC 2018 James Breen, Timothy Baldwin, Francis Bond

We identified a set of suitable patterns, then tested them with two large collections of text drawn from the WWW and Twitter.

‘Just What do You Think You’re Doing, Dave?’ A Checklist for Responsible Data Use in NLP

no code implementations Findings (EMNLP) 2021 Anna Rogers, Timothy Baldwin, Kobi Leins

A key part of the NLP ethics movement is responsible use of data, but exactly what that means or how it can be best achieved remain unclear.

KFCNet: Knowledge Filtering and Contrastive Learning for Generative Commonsense Reasoning

no code implementations Findings (EMNLP) 2021 Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan

Pre-trained language models have led to substantial gains over a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks with high-quality requirements on the output, such as commonsense generation and ad keyword generation.

Contrastive Learning Text Generation

Automatic Resolution of Domain Name Disputes

1 code implementation EMNLP (NLLP) 2021 Wayan Oger Vihikan, Meladel Mistica, Inbar Levy, Andrew Christie, Timothy Baldwin

We introduce the new task of domain name dispute resolution (DNDR), that predicts the outcome of a process for resolving disputes about legal entitlement to a domain name.

Semi-automatic Triage of Requests for Free Legal Assistance

no code implementations EMNLP (NLLP) 2021 Meladel Mistica, Jey Han Lau, Brayden Merrifield, Kate Fazio, Timothy Baldwin

Free legal assistance is critically under-resourced, and many of those who seek legal help have their needs unmet.

Fairness

Learning from Unlabelled Data for Clinical Semantic Textual Similarity

no code implementations EMNLP (ClinicalNLP) 2020 Yuxia Wang, Karin Verspoor, Timothy Baldwin

Domain pretraining followed by task fine-tuning has become the standard paradigm for NLP tasks, but requires in-domain labelled data for task fine-tuning.

Semantic Textual Similarity

A Simple yet Effective Method for Sentence Ordering

no code implementations SIGDIAL (ACL) 2021 Aili Shen, Timothy Baldwin

Sentence ordering is the task of arranging a given bag of sentences so as to maximise the coherence of the overall text.

Sentence Ordering

Contrastive Learning for Fair Representations

no code implementations 22 Sep 2021 Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, Lea Frermann

Trained classification models can unintentionally lead to biased representations and predictions, which can reinforce societal preconceptions and stereotypes.

Contrastive Learning

Fairness-aware Class Imbalanced Learning

no code implementations EMNLP 2021 Shivashankar Subramanian, Afshin Rahimi, Timothy Baldwin, Trevor Cohn, Lea Frermann

Class imbalance is a common challenge in many NLP tasks, and has clear connections to bias, in that bias in training data often leads to higher accuracy for majority groups at the expense of minority groups.

Fairness Long-tail Learning

Balancing out Bias: Achieving Fairness Through Training Reweighting

no code implementations 16 Sep 2021 Xudong Han, Timothy Baldwin, Trevor Cohn

Bias in natural language processing arises primarily from models learning characteristics of the author such as gender and race when modelling tasks such as sentiment and syntactic parsing.

Fairness

KFCNet: Knowledge Filtering and Contrastive Learning Network for Generative Commonsense Reasoning

no code implementations 14 Sep 2021 Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan

Pre-trained language models have led to substantial gains over a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks with high-quality requirements on the output, such as commonsense generation and ad keyword generation.

Contrastive Learning Text Generation

IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effective Domain-Specific Vocabulary Initialization

1 code implementation EMNLP 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We present IndoBERTweet, the first large-scale pretrained model for Indonesian Twitter that is trained by extending a monolingually-trained Indonesian BERT model with additive domain-specific vocabulary.

Language Modelling

Evaluating the Efficacy of Summarization Evaluation across Languages

1 code implementation Findings (ACL) 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We take a summarization corpus for eight different languages, and manually annotate generated summaries for focus (precision) and coverage (recall).

Automatic Classification of Neutralization Techniques in the Narrative of Climate Change Scepticism

no code implementations NAACL 2021 Shraey Bhatia, Jey Han Lau, Timothy Baldwin

Neutralisation techniques, e.g. denial of responsibility and denial of victim, are used in the narrative of climate change scepticism to justify lack of action or to promote an alternative view.

Discourse Probing of Pretrained Language Models

1 code implementation NAACL 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

Existing work on probing of pretrained language models (LMs) has predominantly focused on sentence-level syntactic tasks.

On the (In)Effectiveness of Images for Text Classification

no code implementations EACL 2021 Chunpeng Ma, Aili Shen, Hiyori Yoshikawa, Tomoya Iwakura, Daniel Beck, Timothy Baldwin

Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether images improve NLP tasks or not.

Text Classification

Evaluating Document Coherence Modelling

no code implementations 18 Mar 2021 Aili Shen, Meladel Mistica, Bahar Salehi, Hang Li, Timothy Baldwin, Jianzhong Qi

While pretrained language models ("LM") have driven impressive gains over morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear.

Intrusion Detection

Top-down Discourse Parsing via Sequence Labelling

1 code implementation EACL 2021 Fajri Koto, Jey Han Lau, Timothy Baldwin

We introduce a top-down approach to discourse parsing that is conceptually simpler than its predecessors (Kobayashi et al., 2020; Zhang et al., 2020).

Discourse Parsing

Diverse Adversaries for Mitigating Bias in Training

1 code implementation EACL 2021 Xudong Han, Timothy Baldwin, Trevor Cohn

Adversarial learning can learn fairer and less biased models of language than standard methods.

FFCI: A Framework for Interpretable Automatic Evaluation of Summarization

2 code implementations 27 Nov 2020 Fajri Koto, Timothy Baldwin, Jey Han Lau

In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that comprises four elements: faithfulness (degree of factual consistency with the source), focus (precision of summary content relative to the reference), coverage (recall of summary content relative to the reference), and inter-sentential coherence (document fluency between adjacent sentences).

Question Answering

IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP

no code implementations COLING 2020 Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin

Although the Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, it is under-represented in NLP research.

Language Modelling

Target Word Masking for Location Metonymy Resolution

1 code implementation COLING 2020 Haonan Li, Maria Vasardani, Martin Tomko, Timothy Baldwin

Existing metonymy resolution approaches rely on features extracted from external resources like dictionaries and hand-crafted lexical resources.

COVID-SEE: Scientific Evidence Explorer for COVID-19 Related Research

no code implementations 18 Aug 2020 Karin Verspoor, Simon Šuster, Yulia Otmakhova, Shevon Mendis, Zenan Zhai, Biaoyan Fang, Jey Han Lau, Timothy Baldwin, Antonio Jimeno Yepes, David Martinez

We present COVID-SEE, a system for medical literature discovery based on the concept of information exploration, which builds on several distinct text analysis and natural language processing methods to structure and organise information in publications, and augments search by providing a visual overview supporting exploration of a collection to identify key articles of interest.

Give Me Convenience and Give Her Death: Who Should Decide What Uses of NLP are Appropriate, and on What Basis?

no code implementations ACL 2020 Kobi Leins, Jey Han Lau, Timothy Baldwin

We focus in particular on the role of data statements in ethically assessing research, but also discuss the topic of dual use, and examine the outcomes of similar debates in other scientific disciplines.

WikiUMLS: Aligning UMLS to Wikipedia via Cross-lingual Neural Ranking

1 code implementation COLING 2020 Afshin Rahimi, Timothy Baldwin, Karin Verspoor

We present our work on aligning the Unified Medical Language System (UMLS) to Wikipedia, to facilitate manual alignment of the two resources.

Improved Document Modelling with a Neural Discourse Parser

1 code implementation ALTA 2019 Fajri Koto, Jey Han Lau, Timothy Baldwin

We empirically investigate the benefit of the proposed approach on two different tasks: abstractive summarization and popularity prediction of online petitions.

Abstractive Text Summarization Text Generation

Modelling Uncertainty in Collaborative Document Quality Assessment

no code implementations WS 2019 Aili Shen, Daniel Beck, Bahar Salehi, Jianzhong Qi, Timothy Baldwin

In the context of document quality assessment, previous work has mainly focused on predicting the quality of a document relative to a putative gold standard, without paying attention to the subjectivity of this task.

Decision Making Gaussian Processes

Deep Ordinal Regression for Pledge Specificity Prediction

1 code implementation IJCNLP 2019 Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin

Many pledges are made in the course of an election campaign, forming important corpora for political analysis of campaign strategy and governmental accountability.

Evaluating the Utility of Document Embedding Vector Difference for Relation Learning

no code implementations 18 Jul 2019 Jingyuan Zhang, Timothy Baldwin

Recent work has demonstrated that vector offsets obtained by subtracting pretrained word embedding vectors can be used to predict lexical relations with surprising accuracy.

Document Embedding

How Well Do Embedding Models Capture Non-compositionality? A View from Multiword Expressions

no code implementations WS 2019 Navnita Nandakumar, Timothy Baldwin, Bahar Salehi

In this paper, we apply various embedding methods on multiword expressions to study how well they capture the nuances of non-compositional data.

Target Based Speech Act Classification in Political Campaign Text

1 code implementation SEMEVAL 2019 Shivashankar Subramanian, Trevor Cohn, Timothy Baldwin

We study pragmatics in political campaign text, through analysis of speech acts and the target of each utterance.

General Classification

Does an LSTM forget more than a CNN? An empirical study of catastrophic forgetting in NLP

no code implementations ALTA 2019 Gaurav Arora, Afshin Rahimi, Timothy Baldwin

Catastrophic forgetting, whereby a model trained on one task is fine-tuned on a second and in doing so suffers a "catastrophic" drop in performance over the first task, is a hurdle in the development of better transfer learning techniques.

Continual Learning Curriculum Learning +1

A Joint Model for Multimodal Document Quality Assessment

no code implementations 4 Jan 2019 Aili Shen, Bahar Salehi, Timothy Baldwin, Jianzhong Qi

The quality of a document is affected by various factors, including grammaticality, readability, stylistics, and expertise depth, making the task of document quality assessment a complex one.

Towards Efficient Machine Translation Evaluation by Modelling Annotators

no code implementations ALTA 2018 Nitika Mathur, Timothy Baldwin, Trevor Cohn

In this paper we show that the quality control mechanism is overly conservative, which increases the time and expense of the evaluation.

Machine Translation Translation

A Comparative Study of Embedding Models in Predicting the Compositionality of Multiword Expressions

no code implementations ALTA 2018 Navnita Nandakumar, Bahar Salehi, Timothy Baldwin

In this paper, we perform a comparative evaluation of off-the-shelf embedding models over the task of compositionality prediction of multiword expressions ("MWEs").

Information Retrieval Word Embeddings

Twitter Geolocation using Knowledge-Based Methods

no code implementations WS 2018 Taro Miyazaki, Afshin Rahimi, Trevor Cohn, Timothy Baldwin

Automatic geolocation of microblog posts from their text content is particularly difficult because many location-indicative terms are rare terms, notably entity names such as locations, people or local organisations.

Entity Linking Graph Embedding +1

Topic Intrusion for Automatic Topic Model Evaluation

no code implementations EMNLP 2018 Shraey Bhatia, Jey Han Lau, Timothy Baldwin

Topic coherence is increasingly being used to evaluate topic models and filter topics for end-user applications.

Information Retrieval Topic Models

Encoding Sentiment Information into Word Vectors for Sentiment Analysis

no code implementations COLING 2018 Zhe Ye, Fang Li, Timothy Baldwin

General-purpose pre-trained word embeddings have become a mainstay of natural language processing, and more recently, methods have been proposed to encode external knowledge into word embeddings to benefit specific downstream tasks.

Learning Word Embeddings Sentiment Analysis

Language and the Shifting Sands of Domain, Space and Time (Invited Talk)

no code implementations COLING 2018 Timothy Baldwin

In this talk, I will first present recent work on domain debiasing in the context of language identification, then discuss a new line of work on language variety analysis in the form of dialect map generation.

Language Identification

Content-based Popularity Prediction of Online Petitions Using a Deep Regression Model

1 code implementation ACL 2018 Shivashankar Subramanian, Timothy Baldwin, Trevor Cohn

Online petitions are a cost-effective way for citizens to collectively engage with policy-makers in a democracy.

Narrative Modeling with Memory Chains and Semantic Supervision

1 code implementation ACL 2018 Fei Liu, Trevor Cohn, Timothy Baldwin

Story comprehension requires a deep semantic understanding of the narrative, making it a challenging task.

Cloze Test

Towards Robust and Privacy-preserving Text Representations

3 code implementations ACL 2018 Yitong Li, Timothy Baldwin, Trevor Cohn

Written text often provides sufficient clues to identify the author, their gender, age, and other important attributes.

What's in a Domain? Learning Domain-Robust Text Representations using Adversarial Training

1 code implementation NAACL 2018 Yitong Li, Timothy Baldwin, Trevor Cohn

Most real-world language problems require learning from heterogeneous corpora, raising the problem of learning robust models which generalise well to both similar (in domain) and dissimilar (out of domain) instances to those seen in training.

Domain Adaptation Language Identification +1

Recurrent Entity Networks with Delayed Memory Update for Targeted Aspect-based Sentiment Analysis

1 code implementation NAACL 2018 Fei Liu, Trevor Cohn, Timothy Baldwin

While neural networks have been shown to achieve impressive results for sentence-level sentiment analysis, targeted aspect-based sentiment analysis (TABSA), the extraction of fine-grained opinion polarity w.r.t.

Aspect-Based Sentiment Analysis Reading Comprehension

Automatic Language Identification in Texts: A Survey

1 code implementation 22 Apr 2018 Tommi Jauhiainen, Marco Lui, Marcos Zampieri, Timothy Baldwin, Krister Lindén

Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in.

Language Identification

Capturing Long-range Contextual Dependencies with Memory-enhanced Conditional Random Fields

1 code implementation IJCNLP 2017 Fei Liu, Timothy Baldwin, Trevor Cohn

Despite successful applications across a broad range of NLP tasks, conditional random fields ("CRFs"), in particular the linear-chain variant, are only able to model local features.

Sub-character Neural Language Modelling in Japanese

no code implementations WS 2017 Viet Nguyen, Julian Brooke, Timothy Baldwin

In East Asian languages such as Japanese and Chinese, the semantics of a character are (somewhat) reflected in its sub-character elements.

Language Modelling

Further Investigation into Reference Bias in Monolingual Evaluation of Machine Translation

1 code implementation EMNLP 2017 Qingsong Ma, Yvette Graham, Timothy Baldwin, Qun Liu

Monolingual evaluation of Machine Translation (MT) aims to simplify human assessment by requiring assessors to compare the meaning of the MT output with a reference translation, opening up the task to a much larger pool of genuinely qualified evaluators.

Machine Translation Translation

BIBI System Description: Building with CNNs and Breaking with Deep Reinforcement Learning

no code implementations WS 2017 Yitong Li, Trevor Cohn, Timothy Baldwin

This paper describes our submission to the sentiment analysis sub-task of "Build It, Break It: The Language Edition (BIBI)", on both the builder and breaker sides.

Q-Learning Sentiment Analysis +1

Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks

1 code implementation EMNLP 2017 Afshin Rahimi, Timothy Baldwin, Trevor Cohn

We propose a method for embedding two-dimensional locations in a continuous vector space using a neural network-based model incorporating mixtures of Gaussian distributions, presenting two model variants for text-based geolocation and lexical dialectology.

Topically Driven Neural Language Model

1 code implementation ACL 2017 Jey Han Lau, Timothy Baldwin, Trevor Cohn

Language models are typically applied at the sentence level, without access to the broader document context.

Language Modelling

A Neural Model for User Geolocation and Lexical Dialectology

no code implementations ACL 2017 Afshin Rahimi, Trevor Cohn, Timothy Baldwin

We propose a simple yet effective text-based user geolocation model based on a neural network with one hidden layer, which achieves state-of-the-art performance over three Twitter benchmark geolocation datasets, in addition to producing word and phrase embeddings in the hidden layer that we show to be useful for detecting dialectal terms.

Robust Training under Linguistic Adversity

1 code implementation EACL 2017 Yitong Li, Trevor Cohn, Timothy Baldwin

Deep neural networks have achieved remarkable results across many language processing tasks; however, they have been shown to be susceptible to overfitting and highly sensitive to noise, including adversarial attacks.

Sentiment Analysis Speech Recognition +1

Semi-Automated Resolution of Inconsistency for a Harmonized Multiword Expression and Dependency Parse Annotation

1 code implementation WS 2017 King Chan, Julian Brooke, Timothy Baldwin

This paper presents a methodology for identifying and resolving various kinds of inconsistency in the context of merging dependency and multiword expression (MWE) annotations, to generate a dependency treebank with comprehensive MWE annotations.

Multimodal Topic Labelling

no code implementations EACL 2017 Ionut Sorodoc, Jey Han Lau, Nikolaos Aletras, Timothy Baldwin

Automatic topic labelling is the task of generating a succinct label that summarises the theme or subject of a topic, with the intention of reducing the cognitive load of end-users when interpreting these topics.

Topic Models

Improving Evaluation of Document-level Machine Translation Quality Estimation

no code implementations EACL 2017 Yvette Graham, Qingsong Ma, Timothy Baldwin, Qun Liu, Carla Parra, Carolina Scarton

Meaningful conclusions about the relative performance of NLP systems are only possible if the gold standard employed in a given evaluation is both valid and reliable.

Document Level Machine Translation Machine Translation +1

Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice

no code implementations TACL 2017 Julian Brooke, Jan Šnajder, Timothy Baldwin

We present a new model for acquiring comprehensive multiword lexicons from large corpora based on competition among n-gram candidates.

Is all that Glitters in Machine Translation Quality Estimation really Gold?

no code implementations COLING 2016 Yvette Graham, Timothy Baldwin, Meghan Dowling, Maria Eskevich, Teresa Lynn, Lamia Tounsi

Human-targeted metrics provide a compromise between human evaluation of machine translation, where high inter-annotator agreement is difficult to achieve, and fully automatic metrics, such as BLEU or TER, that lack the validity of human assessment.

Machine Translation Translation

Multiword Expressions at the Grammar-Lexicon Interface

no code implementations WS 2016 Timothy Baldwin

In this talk, I will outline a range of challenges presented by multiword expressions in terms of (lexicalist) precision grammar engineering, and different strategies for accommodating those challenges, in an attempt to strike the right balance in terms of generalisation and over- and under-generation.

Determining the Multiword Expression Inventory of a Surprise Language

no code implementations COLING 2016 Bahar Salehi, Paul Cook, Timothy Baldwin

Much previous research on multiword expressions (MWEs) has focused on the token- and type-level tasks of MWE identification and extraction, respectively.

Machine Translation

Named Entity Recognition for Novel Types by Transfer Learning

no code implementations EMNLP 2016 Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Timothy Baldwin

In named entity recognition, we often don't have a large in-domain training corpus or a knowledge base with adequate coverage to train a model directly.

Named Entity Recognition Transfer Learning

Learning Robust Representations of Text

1 code implementation EMNLP 2016 Yitong Li, Trevor Cohn, Timothy Baldwin

Deep neural networks have achieved remarkable results across many language processing tasks; however, these methods are highly sensitive to noise and adversarial attacks.

An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation

4 code implementations WS 2016 Jey Han Lau, Timothy Baldwin

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings.

Document Embedding Word Embeddings

Evaluating a Topic Modelling Approach to Measuring Corpus Similarity

no code implementations LREC 2016 Richard Fothergill, Paul Cook, Timothy Baldwin

Web corpora are often constructed automatically, and their contents are therefore often not well understood.

From Incremental Meaning to Semantic Unit (phrase by phrase)

1 code implementation 17 Apr 2016 Andreas Scherbakov, Ekaterina Vylomova, Fei Liu, Timothy Baldwin

This paper describes an experimental approach to Detection of Minimal Semantic Units and their Meaning (DiMSUM), explored within the framework of SemEval 2016 Task 10.

Word Embeddings

Take and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation Learning

1 code implementation ACL 2016 Ekaterina Vylomova, Laura Rimell, Trevor Cohn, Timothy Baldwin

Recent work on word embeddings has shown that simple vector subtraction over pre-trained embeddings is surprisingly effective at capturing different lexical relations, despite lacking explicit supervision.

Word Embeddings

Twitter User Geolocation Using a Unified Text and Network Prediction Model

no code implementations IJCNLP 2015 Afshin Rahimi, Trevor Cohn, Timothy Baldwin

We propose a label propagation approach to geolocation prediction based on Modified Adsorption, with two enhancements: (1) the removal of "celebrity" nodes to increase location homophily and boost tractability, and (2) the incorporation of text-based geolocation priors for test users.

Exploiting Text and Network Context for Geolocation of Social Media Users

no code implementations HLT 2015 Afshin Rahimi, Duy Vu, Trevor Cohn, Timothy Baldwin

Research on automatically geolocating social media users has conventionally been based on the text content of posts from a given user or the social network of the user, with very little crossover between the two, and no benchmarking of the two approaches over comparable datasets.

Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representation on Sequence Labelling Tasks

no code implementations 21 Apr 2015 Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider, Timothy Baldwin

Word embeddings -- distributed word representations that can be learned from unlabelled data -- have been shown to have high utility in many natural language processing applications.

Chunking NER +3

Automatic Detection and Language Identification of Multilingual Documents

no code implementations TACL 2014 Marco Lui, Jey Han Lau, Timothy Baldwin

Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document.

Language Identification Machine Translation

Cross-domain Feature Selection for Language Identification

no code implementations IJCNLP 2011 Marco Lui, Timothy Baldwin

We show that transductive (cross-domain) learning is an important consideration in building a general-purpose language identification system, and develop a feature selection method that generalizes across domains.

Feature Selection Language Identification +1
