Identifying Facts in Text with Fact Extractor: NEC Laboratories Europe


Group research

Human-Centric AI

Topic tags

Annotation platform Multilingual systems Qualitative evaluation Datasets Information extraction Explainable AI Knowledge graphs


At a Glance

Method

Fact Extractor

Research Field

Machine learning, natural language processing

Focus

Free up domain experts' time and highlight previously unknown relationships between facts extracted from text

Use Cases

Drug development, carbon emission reports, materials informatics

Conferences

AAAI 2021, ACL 2022, EMNLP 2023

Related Methods

AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark (Friedrich et al.), ACL 2022

BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation (Gashteovski et al.), ACL 2022

milIE: Modular & Iterative Multilingual Open Information Extraction (Kotnis et al.), ACL 2022

Fact-Linking: Linking Surface Facts to Large-Scale Knowledge Graphs (Radevski et al.), EMNLP 2023

Gradient Rollback: Explaining Neural Matrix Factorization with Gradient Rollback (Lawrence et al.), AAAI 2021

Challenge

Information is being generated faster than humans can process it. This impedes our ability to acquire knowledge, which is slowing down scientific discovery and the advancement of technology. How can humans absorb information at the pace it is being created?

Hypothesis

A scientific framework for information extraction can enable computers to read text and arrange its content as structured knowledge. This framework can then be applied to achieve cross-document understanding, which can, for example, accelerate scientific discovery.

Methodology

To enable cross-document understanding, three main goals must be achieved:

Goal 1: Extract facts from text
Goal 2: Connect facts from different sources in a meaningful way
Goal 3: Generate new insights from the connected facts

Figure 1: Framework for cross-document understanding

Extracting facts from text

NEC Laboratories Europe has completed the first goal toward cross-document understanding (Goal 1) by developing NEC Fact Extractor, which extracts facts from text in any language.

For each sentence, Fact Extractor extracts facts in the form of triples. Each triple consists of two entities (often the subject and object), a relation (often a verb phrase) and, optionally, an argument (often a location or time). These facts are then used to identify and connect related facts from other sources (Goal 2).
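The triple format described above can be pictured as a small data structure. The following is a minimal illustration, not Fact Extractor's actual output schema: the class `Fact`, its field names, the `source` field and both helper functions are hypothetical, and connecting facts by exact (case-insensitive) entity strings is only a crude stand-in for Goal 2.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Fact:
    """Hypothetical container for one extracted fact (not Fact Extractor's real schema)."""
    subject: str
    relation: str
    obj: str
    args: Tuple[str, ...] = ()  # optional arguments, e.g. time or location phrases
    source: str = ""            # hypothetical: id of the document the fact came from


def index_by_entity(facts: List[Fact]) -> Dict[str, List[Fact]]:
    """Map each lower-cased subject/object string to the facts that mention it."""
    index: Dict[str, List[Fact]] = defaultdict(list)
    for fact in facts:
        for entity in {fact.subject.lower(), fact.obj.lower()}:
            index[entity].append(fact)
    return index


def cross_document_entities(facts: List[Fact]) -> Set[str]:
    """Entities that occur in facts coming from more than one source document."""
    return {
        entity
        for entity, group in index_by_entity(facts).items()
        if len({fact.source for fact in group}) > 1
    }


# Toy facts with placeholder names; these are not real findings.
facts = [
    Fact("compound X", "inhibits", "protein Y", source="paper-1"),
    Fact("protein Y", "is associated with", "disease Z", source="paper-2"),
    Fact("writings of Ptolemy", "provide reference to", "settlement in Dublin",
         args=("in 140 CE",), source="paper-3"),
]
print(cross_document_entities(facts))  # {'protein y'}
```

In this toy run, the shared entity "protein y" connects a fact from paper-1 with one from paper-2, hinting at a link between compound X and disease Z that neither document states on its own.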

Explicit extractions (all slots of the triple are extracted from the sentence):

Triple slots:

Subject (S): writings of Ptolemy (often the subject/agent)

Predicate (P): provide reference to (can be verb-mediated or noun-mediated)

Object (O): settlement in Dublin (often the object, but can be a clause)

Arguments (Args): in 140 CE (generally temporal or location phrases)

Figure 2: Example of Fact Extractor extracting a fact from a sentence
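The explicit-extraction condition stated above (every slot is taken from the sentence) can be checked mechanically. This is a minimal sketch assuming simple word tokenization; the full sentence behind Figure 2 is not given in the text, so the sentence below is a hypothetical reconstruction for illustration only, and `is_explicit` is not part of any published API.

```python
import re
from typing import Dict, List


def tokens(text: str) -> List[str]:
    """Lower-cased word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def occurs_in(span: List[str], sentence: List[str]) -> bool:
    """True if `span` appears as a contiguous token sequence in `sentence`."""
    n = len(span)
    return any(sentence[i:i + n] == span for i in range(len(sentence) - n + 1))


def is_explicit(sentence: str, slots: Dict[str, str]) -> bool:
    """A fact counts as explicit if every non-empty slot is a token span of the sentence."""
    sent = tokens(sentence)
    return all(occurs_in(tokens(value), sent) for value in slots.values() if value)


# Hypothetical sentence; Figure 2's source sentence is not reproduced in the text.
sentence = "The writings of Ptolemy provide reference to a settlement in Dublin in 140 CE."
fact = {"S": "writings of Ptolemy", "P": "provide reference to",
        "O": "settlement in Dublin", "Args": "in 140 CE"}
implicit = {"S": "writings of Ptolemy", "P": "mention", "O": "Dublin", "Args": ""}

print(is_explicit(sentence, fact))      # True
print(is_explicit(sentence, implicit))  # False: "mention" is not in the sentence
```

Matching whole-token sequences rather than raw substrings avoids false hits such as the slot "in" matching inside "Dublin".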

NEC Laboratories Europe recently published the paper Fact-Linking: Linking Surface Facts to Large-Scale Knowledge Graphs (Gorjan Radevski et al.), which describes how extracted facts can be linked in a meaningful way and how new text-based insights can be generated from them.
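Fact linking maps the surface strings of a fact onto knowledge-graph identifiers. The sketch below is a deliberately simple, dictionary-based stand-in and is not the method of Radevski et al.: the alias tables, the `KG:` identifiers and the `link_slot` / `link_fact` helpers are all hypothetical. A slot with no match stays unlinked (`None`), because a knowledge graph need not contain every entity or relation mentioned in text.

```python
from typing import Dict, Optional

# Hypothetical alias tables: normalized surface form -> knowledge-graph identifier.
ENTITY_ALIASES: Dict[str, str] = {
    "ptolemy": "KG:Ptolemy",
    "claudius ptolemy": "KG:Ptolemy",
    "dublin": "KG:Dublin",
}
RELATION_ALIASES: Dict[str, str] = {
    "provide reference to": "KG:mentions",
    "mentions": "KG:mentions",
}


def normalize(text: str) -> str:
    """Lower-case and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def link_slot(surface: str, aliases: Dict[str, str]) -> Optional[str]:
    """Exact alias match first; otherwise the longest alias found as a whole-word span."""
    text = normalize(surface)
    if text in aliases:
        return aliases[text]
    padded = f" {text} "
    matches = [alias for alias in aliases if f" {alias} " in padded]
    if not matches:
        return None  # unlinkable: no counterpart in this (toy) graph
    return aliases[max(matches, key=len)]


def link_fact(subject: str, relation: str, obj: str) -> Dict[str, Optional[str]]:
    """Link each slot of a surface fact independently."""
    return {
        "subject": link_slot(subject, ENTITY_ALIASES),
        "relation": link_slot(relation, RELATION_ALIASES),
        "object": link_slot(obj, ENTITY_ALIASES),
    }


print(link_fact("writings of Ptolemy", "provide reference to", "settlement in Dublin"))
```

Linking "writings of Ptolemy" to the entity for Ptolemy by its last word is a lossy simplification; it marks exactly the kind of case where a learned linker has to do better than string matching.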

Once a knowledge graph is created, we use Gradient Rollback to derive new, explainable insights (Goal 3). You can learn more about this method in the paper Explaining Neural Matrix Factorization with Gradient Rollback (Carolin Lawrence et al.).
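The core idea of Gradient Rollback is to record, during SGD training, how much each training triple changed the parameters it touched, and later to subtract ("roll back") those changes to estimate that triple's influence on a prediction. The sketch below illustrates that idea under simplifying assumptions that are ours, not the paper's: a DistMult-style scoring function, a positive-only logistic loss without negative sampling, batch size 1, pure-Python vectors and toy triples. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict

DIM = 4  # embedding size (toy value)


def score(ent, rel, triple):
    """DistMult-style trilinear score; a stand-in model for this sketch."""
    s, p, o = triple
    return sum(ent[s][i] * rel[p][i] * ent[o][i] for i in range(DIM))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(triples, epochs=100, lr=0.1, seed=0):
    """SGD with batch size 1; gamma[z] accumulates every update caused by triple z."""
    rng = random.Random(seed)
    entities = sorted({x for s, _, o in triples for x in (s, o)})
    relations = sorted({p for _, p, _ in triples})
    ent = {n: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for n in entities}
    rel = {n: [rng.uniform(-0.5, 0.5) for _ in range(DIM)] for n in relations}
    gamma = {z: defaultdict(lambda: [0.0] * DIM) for z in triples}
    for _ in range(epochs):
        for z in triples:
            s, p, o = z
            g = sigmoid(score(ent, rel, z)) - 1.0  # d/dscore of -log sigmoid(score)
            # All gradients are computed before any parameter is changed.
            grads = [
                (("e", s), [g * rel[p][i] * ent[o][i] for i in range(DIM)]),
                (("r", p), [g * ent[s][i] * ent[o][i] for i in range(DIM)]),
                (("e", o), [g * ent[s][i] * rel[p][i] for i in range(DIM)]),
            ]
            for (table, name), grad in grads:
                vec = ent[name] if table == "e" else rel[name]
                for i in range(DIM):
                    step = -lr * grad[i]
                    vec[i] += step
                    gamma[z][(table, name)][i] += step
    return ent, rel, gamma


def influence(ent, rel, gamma, z, x):
    """Drop in sigmoid(score(x)) when triple z's accumulated updates are rolled back."""
    ent_rb = {k: list(v) for k, v in ent.items()}
    rel_rb = {k: list(v) for k, v in rel.items()}
    for (table, name), delta in gamma[z].items():
        target = ent_rb if table == "e" else rel_rb
        target[name] = [w - d for w, d in zip(target[name], delta)]
    return sigmoid(score(ent, rel, x)) - sigmoid(score(ent_rb, rel_rb, x))


# Toy training triples and a prediction to explain.
triples = [
    ("ptolemy", "wrote", "geographia"),
    ("geographia", "mentions", "dublin"),
    ("aspirin", "treats", "headache"),
]
ent, rel, gamma = train(triples)
x = ("ptolemy", "mentions", "dublin")
for z in triples:
    print(z, round(influence(ent, rel, gamma, z, x), 4))
```

Because a triple only updates the embeddings it mentions, ("aspirin", "treats", "headache") shares no parameters with the prediction and its influence is exactly zero, while the other two triples typically receive nonzero scores that can be read as an explanation of the prediction.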
