MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions

Item Type Conference or Workshop Item (Paper)
Abstract We compiled a new sentence splitting corpus composed of 203K pairs of aligned complex source and simplified target sentences. In contrast to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset in which each input sentence is broken down into a set of minimal propositions, i.e. a sequence of sound, self-contained utterances, each presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. The corpus is useful for developing sentence splitting approaches that learn to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences with a simpler and more regular structure. Such sentences are easier for downstream applications to process, which facilitates those applications and improves their performance.
Authors Niklaus, Christina; Freitas, André & Handschuh, Siegfried
Language English
Subjects computer science
HSG Classification contribution to scientific community
Date 2019
Place of Publication Tokyo, Japan
Event Title 12th International Conference on Natural Language Generation
Event Location Tokyo, Japan
Event Dates 29 October - 1 November 2019
Depositing User Dr. Christina Niklaus
Date Deposited 13 Nov 2019 16:17
Last Modified 20 Jul 2022 17:40
URI: https://www.alexandria.unisg.ch/publications/258308

Download

67_Paper.pdf (110kB)

Citation

Niklaus, Christina; Freitas, André & Handschuh, Siegfried: MinWikiSplit: A Sentence Splitting Corpus with Minimal Propositions. 2019. - 12th International Conference on Natural Language Generation. - Tokyo, Japan.

https://www.alexandria.unisg.ch/id/eprint/258308