A Canonical Context-Preserving Representation for Open IE: Extracting Semantically Typed Relational Tuples from Complex Sentences
Journal
Knowledge-Based Systems
Type
journal article
Date Issued
2023-05-23
Author(s)
Abstract
Modern systems that deal with inference in texts need automated methods to extract meaning representations (MRs) from texts at scale. Open Information Extraction (IE) is a prominent way of extracting all potential relations from a given text in a comprehensive manner. Previous work in this area has mainly focused on the extraction of isolated relational tuples. By ignoring the cohesive nature of texts, in which important contextual information is spread across clauses or sentences, state-of-the-art Open IE approaches are prone to generating a loose arrangement of tuples that lack the expressiveness needed to infer the true meaning of complex assertions.
To overcome this limitation, we present a method that allows existing Open IE systems to enrich their output with additional meta information. By leveraging the semantic hierarchy of minimal propositions generated by the discourse-aware Text Simplification (TS) approach presented in Niklaus et al. (2019), we propose a mechanism to extract semantically typed relational tuples from complex source sentences. Based on this novel type of output, we introduce a lightweight semantic representation for Open IE in the form of normalized and context-preserving relational tuples. It extends the shallow semantic representation used by state-of-the-art approaches, namely predicate-argument structures, by capturing intra-sentential rhetorical structures and hierarchical relationships between the relational tuples. In that way, the semantic context of the extracted tuples is preserved, resulting in more informative and coherent predicate-argument structures that are easier to interpret.
In addition, a comparative analysis shows that the semantic hierarchy of minimal propositions benefits Open IE approaches in a second dimension: the canonical structure of the simplified sentences is easier to process and analyze, and thus facilitates the extraction of relational tuples, resulting in improved precision (up to 32%) and recall (up to 30%) of the extracted relations on a large benchmark corpus.
Language
English
HSG Classification
contribution to scientific community
HSG Profile Area
None
Refereed
Yes
Publisher
Elsevier
Number
268
Pages
30
Subject(s)
Division(s)
Eprints ID
269519