LanguageNet (English) is a collection of sentence-level paraphrases gathered from Twitter by linking tweets through shared URLs. It is the largest such corpus to date, with 51,524 human-annotated sentence pairs: 42,200 for training and 9,324 for testing. The collection method can add about 30,000 new sentential paraphrases per month at roughly 70% precision. One year of data is now available: 2,869,657 candidate pairs.
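
The URL-linking idea can be illustrated with a minimal sketch. It assumes each tweet is a `(text, url)` tuple and treats any two distinct tweets sharing a URL as a candidate pair. The function name `candidate_pairs` and the sample tweets are hypothetical and are not taken from the corpus's actual pipeline, which likely applies further filtering before annotation.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(tweets):
    """Group tweets by shared URL and return all unordered within-group pairs.

    `tweets` is an iterable of (text, url) tuples (an assumed input format).
    """
    by_url = defaultdict(list)
    for text, url in tweets:
        # Skip verbatim duplicates of a tweet already seen under the same URL.
        if text not in by_url[url]:
            by_url[url].append(text)

    pairs = []
    for texts in by_url.values():
        # Every pair of distinct tweets linking to the same URL is a candidate.
        pairs.extend(combinations(texts, 2))
    return pairs


# Hypothetical sample data for illustration only.
tweets = [
    ("Storm closes schools across the city", "http://example.com/a"),
    ("City schools shut as storm hits", "http://example.com/a"),
    ("New phone released today", "http://example.com/b"),
]
pairs = candidate_pairs(tweets)
```

Candidates produced this way are only potential paraphrases; the roughly 70% precision quoted above suggests that a sizable share of URL-linked pairs would still need human or model-based filtering.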