Item Type |
Conference or Workshop Item (Paper) |
Abstract |
We compiled a new sentence splitting corpus composed of 203K pairs of aligned complex source and simplified target sentences. Contrary to previously proposed text simplification corpora, which contain only a small number of split examples, we present a dataset where each input sentence is broken down into a set of minimal propositions, i.e., a sequence of sound, self-contained utterances, each presenting a minimal semantic unit that cannot be further decomposed into meaningful propositions. This corpus is useful for developing sentence splitting approaches that learn how to transform sentences with a complex linguistic structure into a fine-grained representation of short sentences with a simpler and more regular structure. Such sentences are easier for downstream applications to process, which in turn improves their performance. |
Authors |
Niklaus, Christina; Freitas, André & Handschuh, Siegfried |
Language |
English |
Subjects |
computer science |
HSG Classification |
contribution to scientific community |
Date |
2019 |
Place of Publication |
Tokyo, Japan |
Event Title |
12th International Conference on Natural Language Generation |
Event Location |
Tokyo, Japan |
Event Dates |
29 October - 1 November 2019 |
Depositing User |
Dr. Christina Niklaus |
Date Deposited |
13 Nov 2019 16:17 |
Last Modified |
20 Jul 2022 17:40 |
URI: |
https://www.alexandria.unisg.ch/publications/258308 |