NEC Laboratories Europe

Human-Centric AI
Publications

Zhivar Sourati, Darshan Deshpande, Filip Ilievski, Kiril Gashteovski, Sascha Saralajew: “Robust Text Classification: Analyzing Prototype-Based Networks”, Findings of EMNLP 2024

Paper Details

Abstract:

Downstream applications often require text classification models to be accurate and robust. While the accuracy of the state-of-the-art Language Models (LMs) approximates human performance, they often exhibit a drop in performance on noisy data found in the real world. This lack of robustness can be concerning, as even small perturbations in the text, irrelevant to the target task, can cause classifiers to incorrectly change their predictions. A potential solution can be the family of Prototype-Based Networks (PBNs) that classifies examples based on their similarity to prototypical examples of a class (prototypes) and has been shown to be robust to noise for computer vision tasks. In this paper, we study whether the robustness properties of PBNs transfer to text classification tasks under both targeted and static adversarial attack settings. Our results show that PBNs, as a mere architectural variation of vanilla LMs, offer more robustness compared to vanilla LMs under both targeted and static settings. We showcase how PBNs’ interpretability can help us to understand PBNs’ robustness properties. Finally, our ablation studies reveal the sensitivity of PBNs’ robustness to how strictly clustering is done in the training phase, as tighter clustering results in less robust PBNs.
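To make the mechanism concrete, below is a minimal PyTorch sketch of a prototype-based classification head as described in the abstract: an LM encoder produces an embedding, and each class score is the similarity (negative squared distance) to the closest learnable prototype of that class. The pooling, prototype count, and distance function are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a prototype-based classification head (illustrative,
# not the paper's exact architecture). An LM encoder maps text to a vector;
# the head classifies by negative squared distance to learnable prototypes.
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    def __init__(self, embed_dim: int, num_classes: int, protos_per_class: int = 4):
        super().__init__()
        # One set of learnable prototype vectors per class.
        self.prototypes = nn.Parameter(
            torch.randn(num_classes, protos_per_class, embed_dim)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, embed_dim) pooled sentence embeddings from the LM encoder.
        # Squared Euclidean distance to every prototype: (batch, classes, protos).
        d = ((h[:, None, None, :] - self.prototypes[None]) ** 2).sum(-1)
        # Class score = similarity to the closest prototype of that class.
        return -d.min(dim=-1).values  # (batch, num_classes) logits

head = PrototypeHead(embed_dim=768, num_classes=3)
logits = head(torch.randn(2, 768))  # e.g., two pooled [CLS] embeddings
print(logits.shape)  # torch.Size([2, 3])
```

How tightly training pulls examples toward their prototypes is exactly the clustering strictness the ablation studies point to.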

Accepted at: Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) 2024

In collaboration with: USC Information Sciences Institute, USC Thomas Lord Department of Computer Science, Vrije Universiteit Amsterdam, CAIR Ss. Cyril and Methodius University in Skopje

Paper link: https://arxiv.org/pdf/2311.06647v2

Wiem Ben Rim, Ammar Shaker, Zhao Xu, Kiril Gashteovski, Bhushan Kotnis, Carolin Lawrence, Jürgen Quittek, Sascha Saralajew: "A Human-Centric Assessment of the Usefulness of Attribution Methods in Computer Vision", European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2024

Paper Details

Abstract:

Explainable AI has emerged to assure the validity of predictions by explaining produced decisions; nonetheless, it arrived with its own challenges of objectively evaluating explanations or assessing the usefulness of these explanations to human end users. Human-centric evaluation methods, such as simulatability, that are based on human subject studies have been shown to produce contradictory findings on the usefulness of explanations. It is not entirely clear if these contradictory results are caused by uncontrolled confounders: external factors that influence the measured quantity. To enable a reliable, human-centric, and trustworthy evaluation, we propose a generic assessment framework that allows researchers to evaluate XAI results, such as attribution-based explanations, in an experimental setting with a reduced set of confounders. Applied to multiple XAI techniques at the same time, the framework returns a usefulness ranking of the XAI models and also compares them with a human baseline. To show the framework’s utility, we describe an experimental setting that measures the usefulness of attribution methods in supporting human decision-making. In a large-scale subject study, we examine this conjecture and find that our framework enables researchers to investigate the usefulness of explanations rigorously.

Accepted at: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD) 2024

 

Zhao Xu, Wiem Ben Rim, Kiril Gashteovski, Timo Sztyler, Carolin Lawrence: "A Human-Centric Evaluation Platform for Explainable Knowledge Graph Completion", System Demonstrations of European Chapter of the Association for Computational Linguistics (EACL) 2024

Paper Details

Abstract:

Explanations for AI are expected to help human users understand AI-driven predictions. Evaluating plausibility, the helpfulness of the explanations, is therefore essential for developing eXplainable AI (XAI) that can really aid human users. Here we propose a human-centric evaluation platform to measure plausibility of explanations in the context of eXplainable Knowledge Graph Completion (XKGC). The target audience of the platform are researchers and practitioners who want to 1) investigate real needs and interests of their target users in XKGC, 2) evaluate the plausibility of the XKGC methods. We showcase these two use cases in an experimental setting to illustrate what results can be achieved with our system.

Presented at: System Demonstrations of European Chapter of the Association for Computational Linguistics (EACL) 2024

Paper link: https://aclanthology.org/2024.eacl-demo.3.pdf

Carolin Lawrence, Roberto Bifulco, Kiril Gashteovski, Chia-Chien Hung, Wiem Ben Rim, Ammar Shaker, Masafumi Oyamada, Kunihiko Sadamasa, Masafumi Enomoto, Kunihiro Takeoka: "Towards Safer Large Language Models (LLMs)", NEC Technical Journal, Vol. 17 No. 2, Special Issue on Revolutionizing Business Practices with Generative AI — Advancing the Societal Adoption of AI with the Support of Generative AI Technologies, June 2024

Paper Details

Abstract:

Large Language Models (LLMs) are revolutionizing our world. They have impressive textual capabilities that will fundamentally change how human users can interact with intelligent systems. Nonetheless, they also still have a series of limitations that are important to keep in mind when working with LLMs. We explore how these limitations can be addressed from two different angles. First, we look at options that are currently already available, which include (1) assessing the risk of a use case, (2) prompting an LLM to deliver explanations and (3) encasing LLMs in a human-centred system design. Second, we look at technologies that we are currently developing, which will be able to (1) more accurately assess the quality of an LLM for a high-risk domain, (2) explain the generated LLM output by linking to the input and (3) fact-check the generated LLM output against external trustworthy sources.

Published in: NEC Technical Journal, Vol. 17 No. 2, Special Issue on Revolutionizing Business Practices with Generative AI — Advancing the Societal Adoption of AI with the Support of Generative AI Technologies, June 2024

Luca Gioacchini, Giuseppe Siracusano, Davide Sanvito, Kiril Gashteovski, David Friede, Roberto Bifulco, Carolin Lawrence: “AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents”, the North American Chapter of the Association for Computational Linguistics (NAACL) 2024

Paper Details

Abstract:

The advances made by Large Language Models (LLMs) have led to the pursuit of LLM agents that can solve intricate, multi-step reasoning tasks. As with any research pursuit, benchmarking and evaluation are key cornerstones of efficient and reliable progress. However, existing benchmarks are often narrow and simply compute overall task success. To face these issues, we propose AgentQuest – a framework where (i) both benchmarks and metrics are modular and easily extensible through well-documented and easy-to-use APIs; (ii) we offer two new evaluation metrics that can reliably track LLM agent progress while solving a task. We exemplify the utility of the metrics on two use cases wherein we identify common failure points and refine the agent architecture to obtain a significant performance increase. Together with the research community, we hope to extend AgentQuest further and therefore we make it available at https://github.com/nec-research/agentquest.
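As a flavor of what milestone-style progress tracking can look like, here is a hypothetical, simplified sketch of a progress rate plus a repetition count over an agent trajectory. The function names and the milestone encoding are illustrative assumptions, not AgentQuest's actual API (see the linked repository for that).

```python
# Hypothetical sketch of milestone-based agent metrics in the spirit of the
# paper (names and data format are illustrative, not the repo's API).
# Progress = fraction of task milestones the agent has reached so far;
# repetitions = steps that did not unlock any new milestone.

def progress_rate(reached: set[str], milestones: list[str]) -> float:
    """Fraction of the task's milestones reached by the agent."""
    return len(reached & set(milestones)) / len(milestones)

def evaluate_trajectory(steps: list[set[str]], milestones: list[str]) -> dict:
    reached: set[str] = set()
    repetitions = 0
    for new in steps:  # each step reports the milestones it satisfies
        before = len(reached)
        reached |= new & set(milestones)
        if len(reached) == before:
            repetitions += 1  # step made no measurable progress
    return {"progress": progress_rate(reached, milestones),
            "repetitions": repetitions}

# Example: a 4-milestone task where the agent stalls once.
print(evaluate_trajectory(
    steps=[{"parse_input"}, set(), {"call_tool", "read_result"}],
    milestones=["parse_input", "call_tool", "read_result", "answer"],
))  # {'progress': 0.75, 'repetitions': 1}
```

Such per-step metrics expose where an agent stalls, which is what enables the failure-point analysis the abstract describes.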

Presented at: The North American Chapter of the Association for Computational Linguistics (NAACL) 2024

In collaboration with: Politecnico di Torino, Ss. Cyril and Methodius University

Paper link: https://arxiv.org/abs/2404.06411

Zhao Xu, Antonio di Mauro, Wiem Ben Rim, Timo Sztyler, Carolin Lawrence: “Generating and Evaluating Plausible Explanations for Knowledge Graph Completion”, Association for Computational Linguistics (ACL) 2024

Paper Details

Abstract:

Explanations for AI should aid human users, yet this ultimate goal remains under-explored. This paper aims to bridge this gap by investigating the specific explanatory needs of human users in the context of Knowledge Graph Completion (KGC) systems. In contrast to the prevailing approaches that primarily focus on mathematical theories, we recognize the potential limitations of explanations that may end up being overly complex or nonsensical for users. Through in-depth user interviews, we gain valuable insights into the types of KGC explanations users seek. Building upon these insights, we introduce GradPath, a novel path-based explanation method designed to meet human-centric explainability constraints and enhance plausibility. Additionally, GradPath harnesses the gradients of the trained KGC model to maintain a certain level of faithfulness. We verify the effectiveness of GradPath through well-designed human-centric evaluations. The results confirm that our method provides explanations that users consider more plausible than previous ones.

Accepted at: Association for Computational Linguistics (ACL) 2024

Julius Voigt, Sascha Saralajew, Marika Kaden, Katrin Sophie Bohnsack, Lynn Reuss, Thomas Villmann: "Biologically-informed shallow classification learning integrating pathway knowledge", the 17th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC) 2024 

Paper Details

Abstract:

We propose a biologically-informed shallow neural network as an alternative to the common knowledge-integrating deep neural network architecture used in biomedical classification learning. In particular, we focus on the Generalized Matrix Learning Vector Quantization (GMLVQ) model as a robust and interpretable shallow neural classifier based on class-dependent prototype learning and accompanying matrix adaptation for suitable data mapping. To incorporate the biological knowledge, we adjust the matrix structure in GMLVQ according to the pathway knowledge for the given problem. During model training, both the mapping matrix and the class prototypes are optimized. Since GMLVQ is fully interpretable by design, the interpretation of the model is straightforward, taking explicit account of pathway knowledge. Furthermore, the robustness of the model is guaranteed by the implicit separation margin optimization realized by means of stochastic gradient descent learning. We demonstrate the performance and the interpretability of the shallow network by reconsideration of a cancer research dataset, which was already investigated using a biologically-informed deep neural network.
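A toy sketch of the GMLVQ classification rule with a structured mapping matrix follows; one plausible reading of "adjusting the matrix structure according to pathway knowledge" is a binary feature-to-pathway mask on the matrix, which is what this example assumes. All names and dimensions are illustrative.

```python
# Illustrative GMLVQ-style classifier with a pathway-masked mapping matrix
# Omega (the mask encoding is an assumption for this sketch).
# Distance: d(x, w) = ||Omega (x - w)||^2, nearest prototype wins.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_pathways, n_classes = 6, 2, 2

# Pathway knowledge as a binary mask: entry (p, f) = 1 iff feature f
# (e.g., a gene) belongs to pathway p.
mask = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)
omega = rng.normal(size=(n_pathways, n_features)) * mask  # trainable, but masked
prototypes = rng.normal(size=(n_classes, n_features))     # one prototype per class

def classify(x: np.ndarray) -> int:
    # Map difference vectors through Omega, pick the nearest prototype's class.
    d = ((omega @ (x - prototypes).T) ** 2).sum(axis=0)
    return int(np.argmin(d))

print(classify(rng.normal(size=n_features)))  # predicted class index
```

Because each row of omega mixes only the features of one pathway, the learned weights read directly as pathway relevances, which is the interpretability the abstract emphasizes.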

Accepted at: The 17th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC) 2024

In collaboration with: Saxon Institute for Computational Intelligence and Machine Learning, University of Applied Sciences Mittweida; Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen

Vijay Viswanathan, Kiril Gashteovski, Carolin Lawrence, Tongshuang Wu, Graham Neubig: "Large Language Models Enable Few-Shot Clustering", Transactions of the Association for Computational Linguistics (TACL) 2024

Paper Details

Abstract:

Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm to match the user’s intent. Existing approaches to semi-supervised clustering require a significant amount of feedback from an expert to improve the clusters. In this paper, we ask whether a large language model can amplify an expert’s guidance to enable query-efficient, few-shot semi-supervised text clustering. We show that LLMs are surprisingly effective at improving clustering. We explore three stages where LLMs can be incorporated into clustering: before clustering (improving input features), during clustering (by providing constraints to the clusterer), and after clustering (using LLMs post-correction). We find incorporating LLMs in the first two stages can routinely provide significant improvements in cluster quality, and that LLMs enable a user to make trade-offs between cost and accuracy to produce desired clusters. We release our code and LLM prompts for the public to use.
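A sketch of the first stage (improving input features before clustering) is shown below: an LLM enriches each document with keyphrases, and the enriched texts are clustered as usual. `ask_llm` is a hypothetical stand-in for any chat-completion client; the paper's actual prompts and pipeline live in the authors' released code.

```python
# Sketch of the "before clustering" stage: LLM-generated keyphrases expand
# each document's input features prior to standard clustering.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def ask_llm(prompt: str) -> str:
    # Replace with a real LLM call; returning "" keeps the sketch runnable.
    return ""

def enrich(doc: str) -> str:
    keyphrases = ask_llm(f"List keyphrases capturing the topic of: {doc}")
    return f"{doc} {keyphrases}".strip()  # expanded input features

def cluster(docs: list[str], k: int) -> list[int]:
    X = TfidfVectorizer().fit_transform(enrich(d) for d in docs)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X).tolist()

print(cluster(["cats purr", "cats meow", "dogs bark", "dogs growl"], k=2))
```

The cost/accuracy trade-off in the abstract corresponds to how many documents are sent through the (paid) LLM call versus clustered on raw features.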

Accepted in: Transactions of the Association for Computational Linguistics (TACL) 2024

In collaboration with: Carnegie Mellon University, Inspired Cognition

Youmi Ma, Bhushan Kotnis, Carolin Lawrence, Goran Glavaš, Naoaki Okazaki: "Improving Cross-Lingual Transfer for Open Information Extraction with Linguistic Feature Projection", The 3rd Multilingual Representation Learning (MRL) Workshop [Co-located with EMNLP] 2023

Paper Details

Abstract:

Open Information Extraction (OpenIE) structures information from natural language text in the form of (subject, predicate, object) triples. Supervised OpenIE is, in principle, only possible for English, for which plenty of labeled data exists. Recent research efforts tackled multilingual OpenIE by means of zero-shot transfer from English, with massively multilingual language models as vehicles of transfer. Given that OpenIE is a highly syntactic task, such transfer tends to fail for languages that are syntactically more complex and distant from English. In this work, we propose two Linguistic Feature Projection strategies to alleviate the situation, having observed the failure of transferring from English to German, Arabic, and Japanese. The strategies, namely (i) reordering of words in source-language utterances to match the target language word order and (ii) code-switching, lead to training data that contains features of both the source (English) and target language. Experiments render both strategies effective and mutually complementary on German, Arabic, and Japanese. Additionally, we propose a third strategy tailored for English-Japanese transfer by (iii) inserting Japanese case markers into English utterances, which leads to further performance gains.
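A toy sketch of strategy (ii), code-switching, follows: source-language tokens are replaced with target-language translations from a bilingual lexicon, so the training data mixes features of both languages. The lexicon and the replacement rate are assumptions for illustration, not the paper's exact procedure.

```python
# Illustrative code-switching augmentation: probabilistically swap tokens
# using a bilingual lexicon (lexicon and rate are assumptions).
import random

def code_switch(tokens: list[str], lexicon: dict[str, str], rate: float = 0.3,
                seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    return [lexicon[t] if t in lexicon and rng.random() < rate else t
            for t in tokens]

lexicon = {"house": "Haus", "big": "groß"}  # toy English-to-German lexicon
print(code_switch("the big house is old".split(), lexicon, rate=1.0))
# ['the', 'groß', 'Haus', 'is', 'old']
```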

Presented at: The 3rd Multilingual Representation Learning (MRL) Workshop [Co-located with EMNLP] 2023

In collaboration with: Tokyo Institute of Technology; Coresystems AG; CAIDAS, University of Würzburg

Chia-Chien Hung, Wiem Ben Rim, Lindsay Frost, Lars Bruckner, Carolin Lawrence: "Walking a Tightrope – Evaluating Large Language Models in High-Risk Domains", GenBench Workshop [The first workshop on (benchmarking) generalisation in NLP] (EMNLP Workshop) 2023

Paper Details

Abstract:

High-risk domains pose unique challenges that require language models to provide accurate and safe responses. Despite the great success of large language models (LLMs), such as ChatGPT and its variants, their performance in high-risk domains remains unclear. Our study delves into an in-depth analysis of the performance of instruction-tuned LLMs, focusing on factual accuracy and safety adherence. To comprehensively assess the capabilities of LLMs, we conduct experiments on six NLP datasets including question answering and summarization tasks within two high-risk domains: legal and medical. Further qualitative analysis highlights the existing limitations inherent in current LLMs when evaluated in high-risk domains. This underscores the essential nature of not only improving LLM capabilities but also prioritizing the refinement of domain-specific metrics, and embracing a more human-centric approach to enhance safety and factual reliability. Our findings advance the field toward the concerns of properly evaluating LLMs in high-risk domains, aiming to steer the adaptability of LLMs in fulfilling societal obligations and aligning with forthcoming regulations, such as the EU AI Act.

Presented at: GenBench Workshop [The first workshop on (benchmarking) generalisation in NLP] (EMNLP Workshop) 2023

In collaboration with: NEC Europe

Gorjan Radevski, Kiril Gashteovski, Chia-Chien Hung, Carolin Lawrence, Goran Glavaš: “Linking Surface Facts to Large-Scale Knowledge Graphs”, EMNLP 2023 

Paper Details

Abstract:

Open Information Extraction (OIE) methods extract facts from natural language text in the form of (“subject”; “relation”; “object”) triples. These facts are, however, merely surface forms, the ambiguity of which impedes their downstream usage (e.g., “Michael Jordan” may refer to either the former basketball player or the university professor). Knowledge Graphs (KGs), on the other hand, contain facts in a canonical (i.e., unambiguous) form, but their coverage is limited by a static schema (i.e., a fixed set of entities and predicates). To bridge this gap, we need the best of both worlds: (i) high coverage of free-text OIEs, and (ii) semantic precision (i.e., monosemy) of KGs. In order to achieve this goal, we propose a new benchmark with novel evaluation protocols that can, for example, measure fact linking performance on a granular triple slot level, while also measuring if a system has the ability to recognize that a surface form has no match in the existing KG. Our extensive evaluation of several baselines shows that detection of out-of-KG entities and predicates is more difficult than accurate linking to existing ones, thus calling for more research efforts on this difficult task.

Accepted at: Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023

In collaboration with: Julius-Maximilians-Universität Würzburg, KU Leuven, Ss. Cyril and Methodius University in Skopje

Full paper download: Linking_Surface_Facts_to_Large-Scale_Knowledge_Graphs_pre-print.pdf

Chia-Chien Hung, Lukas Lange, Jannik Strötgen: “TADA: Efficient Task-Agnostic Domain Adaptation for Transformers”, ACL 2023 

Paper Details

Abstract:

Intermediate training of pre-trained transformer-based language models on domain-specific data leads to substantial gains for downstream tasks. To increase efficiency and prevent catastrophic forgetting alleviated from full domain-adaptive pre-training, approaches such as adapters have been developed. However, these require additional parameters for each layer, and are criticized for their limited expressiveness. In this work, we introduce TADA, a novel task-agnostic domain adaptation method which is modular, parameter-efficient, and thus, data-efficient. Within TADA, we retrain the embeddings to learn domain-aware input representations and tokenizers for the transformer encoder, while freezing all other parameters of the model. Then, task-specific fine-tuning is performed. We further conduct experiments with meta-embeddings and newly introduced meta-tokenizers, resulting in one model per task in multi-domain use cases. Our broad evaluation in 4 downstream tasks for 14 domains across single and multi-domain setups and high and low-resource scenarios reveals that TADA is an effective and efficient alternative to full domain-adaptive pre-training and adapters for domain adaptation, while not introducing additional parameters or complex training steps.
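The core trick, retraining only the input embeddings while freezing every other encoder parameter, is easy to express with standard Hugging Face and PyTorch calls, as in the minimal sketch below. The model name is just an example, and the paper's full method (domain-specific tokenizers, meta-embeddings, meta-tokenizers) is not shown.

```python
# Minimal sketch of embedding-only domain adaptation: freeze the encoder,
# leave only the input embedding matrix trainable, then run standard
# masked-language-model training on domain text.
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

for p in model.parameters():
    p.requires_grad = False                               # freeze everything
model.get_input_embeddings().weight.requires_grad = True  # ...except embeddings

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")  # only the embedding matrix
```

Since only the embedding matrix is updated, the adapted model adds no new parameters at inference time, which is the efficiency argument the abstract makes against adapters.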

Accepted at: Findings of the Association for Computational Linguistics (ACL) 2023

In collaboration with: University of Mannheim; Bosch Center for Artificial Intelligence; Karlsruhe University of Applied Sciences

Full paper download: TADA_Efficient_Task-Agnostic_Domain_Adaptation_for_Transformers_pre-print.pdf

Zhao Xu, Carolin Lawrence, Ammar Shaker, Raman Siarheyeu: “Uncertainty Propagation in Node Classification”, International Conference on Data Mining (ICDM) 2022

Paper Details

Abstract:
Quantifying predictive uncertainty of neural networks has recently attracted increasing attention. In this work, we focus on measuring uncertainty of graph neural networks (GNNs) for the task of node classification. Most existing GNNs model message passing among nodes. The messages are often deterministic. Questions naturally arise: Does there exist uncertainty in the messages? How could we propagate such uncertainty over a graph together with messages? To address these issues, we propose a Bayesian uncertainty propagation (BUP) method, which embeds GNNs in a Bayesian modeling framework, and models predictive uncertainty of node classification with Bayesian confidence of predictive probability and uncertainty of messages. Our method proposes a novel uncertainty propagation mechanism inspired by Gaussian models. Moreover, we present an uncertainty-oriented loss for node classification that allows the GNNs to clearly integrate predictive uncertainty in the learning procedure. Consequently, the training examples with large predictive uncertainty will be penalized. We demonstrate the BUP with respect to prediction reliability and out-of-distribution (OOD) predictions. The learned uncertainty is also analyzed in depth. The relations between uncertainty and graph topology, as well as predictive uncertainty in the OOD cases are investigated with extensive experiments. The empirical results with popular benchmark datasets demonstrate the superior performance of the proposed method.
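A toy sketch of the general idea, carrying a mean and a variance per node and propagating both over the graph, is given below. This is a deliberate simplification of the paper's Gaussian-inspired mechanism, not its exact equations.

```python
# Toy uncertainty propagation: one step averages neighbour means, and the
# variance of the averaged (assumed independent) messages shrinks with degree.
import torch

def propagate(mu: torch.Tensor, var: torch.Tensor, adj: torch.Tensor):
    # mu, var: (nodes, dim); adj: (nodes, nodes) 0/1 adjacency with self-loops.
    deg = adj.sum(dim=1, keepdim=True)
    norm = adj / deg                    # row-normalised averaging weights
    mu_out = norm @ mu                  # means combine linearly
    var_out = (norm ** 2) @ var         # Var(sum w_i X_i) = sum w_i^2 Var(X_i)
    return mu_out, var_out

adj = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
mu, var = torch.randn(3, 4), torch.ones(3, 4)
mu1, var1 = propagate(mu, var, adj)
print(var1[1])  # the better-connected node ends up with lower message variance
```

Even this toy version shows the abstract's point that uncertainty interacts with graph topology: node degree directly shapes the propagated variance.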

Presented at: International Conference on Data Mining (ICDM) 2022

 

Ammar Shaker, Carolin Lawrence: “Multi-Source Survival Domain Adaptation”, 37th AAAI Conference on Artificial Intelligence 2023

Paper Details

Abstract:
Survival analysis is the branch of statistics that studies the relation between the characteristics of living entities and their respective survival times, taking into account the partial information held by censored cases. A good analysis can, for example, determine whether one medical treatment for a group of patients is better than another. With the rise of machine learning, survival analysis can be modeled as learning a function that maps studied patients to their survival times. To succeed with that, there are three crucial issues to be tackled. First, some patient data is censored: we do not know the true survival times for all patients. Second, data is scarce, which led past research to treat different illness types as domains in a multi-task setup. Third, there is the need for adaptation to new or extremely rare illness types, where little or no labels are available. In contrast to previous multi-task setups, we want to investigate how to efficiently adapt to a new survival target domain from multiple survival source domains. For this, we introduce a new survival metric and the corresponding discrepancy measure between survival distributions. These allow us to define domain adaptation for survival analysis while incorporating censored data, which would otherwise have to be dropped. Our experiments on two cancer data sets reveal a superb performance on target domains, a better treatment recommendation, and a weight matrix with a plausible explanation.

Presented at: AAAI Conference on Artificial Intelligence (AAAI-23)

Full paper download: Multi-Source_Survival_Domain_Adaptation_pre-print.pdf

Sascha Saralajew, Ammar Shaker, Zhao Xu, Kiril Gashteovski, Bhushan Kotnis, Wiem Ben Rim, Jürgen Quittek, Carolin Lawrence: “A Human-Centric Assessment Framework for AI”, International Conference on Machine Learning (ICML) Workshop on Human-Machine Collaboration and Teaming 2022

Paper Details

Abstract:
With the rise of AI systems in real-world applications comes the need for reliable and trustworthy AI. An important aspect for this are explainable AI systems. However, there is no agreed standard on how explainable AI systems should be assessed. Inspired by the Turing test, we introduce a human-centric assessment framework where a leading domain expert accepts or rejects the solutions of an AI system and another domain expert. By comparing the acceptance rates of provided solutions, we can assess how the AI system performs in comparison to the domain expert, and in turn whether or not the AI system’s explanations (if provided) are human understandable. This setup—comparable to the Turing test—can serve as a framework for a wide range of human-centric AI system assessments. We demonstrate this by presenting two instantiations: (1) an assessment that measures the classification accuracy of a system with the option to incorporate label uncertainties; (2) an assessment where the usefulness of provided explanations is determined in a human-centric manner.

Presented at: ICML 2022 Workshop on Human-Machine Collaboration and Teaming

Full paper download: A_Human-Centric_Assessment_Framework_for_AI_arxiv.pdf

Cheng Wang, Carolin Lawrence, Mathias Niepert: “State-Regularized Recurrent Neural Networks to Extract Automata and Explain Predictions”, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Paper Details

Abstract:
Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, they are often treated as black-box models and as such it is difficult to understand what exactly they learn as well as how they arrive at a particular prediction. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) non-regular languages such as balanced parentheses and palindromes where external memory is required; and (3) real-world sequence learning tasks for sentiment analysis, visual object recognition and text categorisation. We show that state-regularization (a) simplifies the extraction of finite state automata that display an RNN’s state transition dynamic; (b) forces RNNs to operate more like automata with external memory and less like finite state machines, which potentially leads to a more structural memory; (c) leads to better interpretability and explainability of RNNs by leveraging the probabilistic finite state transition mechanism over time steps.
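The sketch below illustrates the state-transition idea: after each recurrent update, the hidden state is softly snapped to a convex combination of k learnable centroid states, so the network behaves more like a finite-state automaton. The cell type, state count, and temperature are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a state-regularized recurrent cell: hidden states are pulled
# toward a small set of learnable centroid states after every update.
import torch
import torch.nn as nn

class StateRegularizedCell(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, k_states: int = 8,
                 temperature: float = 0.5):
        super().__init__()
        self.cell = nn.GRUCell(input_dim, hidden_dim)
        self.states = nn.Parameter(torch.randn(k_states, hidden_dim))
        self.temperature = temperature

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        u = self.cell(x, h)                    # ordinary recurrent update
        logits = u @ self.states.T / self.temperature
        alpha = torch.softmax(logits, dim=-1)  # distribution over k states
        return alpha @ self.states             # convex mix of centroid states

cell = StateRegularizedCell(input_dim=16, hidden_dim=32)
h = torch.zeros(4, 32)
h = cell(torch.randn(4, 16), h)  # one time step for a batch of 4
```

Reading off argmax(alpha) per time step yields the discrete state sequence from which a finite state automaton can be extracted.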

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Haris Widjaja, Kiril Gashteovski, Wiem Ben Rim, Pengfei Liu, Christopher Malon, Daniel Ruffinelli, Carolin Lawrence, Graham Neubig: “KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models”, EMNLP 2022

Paper Details

Abstract
Knowledge Graphs (KGs) store information in the form of (head, predicate, tail)-triples. To augment KGs with new knowledge, researchers proposed models for KG Completion (KGC) tasks such as link prediction; i.e., answering (h; p; ?) or (?; p; t) queries. Such models are usually evaluated with averaged metrics on a held-out test set. While useful for tracking progress, averaged single-score metrics cannot reveal what exactly a model has learned—or failed to learn. To address this issue, we propose KGxBoard: an interactive framework for performing fine-grained evaluation on meaningful subsets of the data, each of which tests individual and interpretable capabilities of a KGC model. In our experiments, we highlight the findings that we discovered with the use of KGxBoard, which would have been impossible to detect with standard averaged single-score metrics.

Presented at: Conference on Empirical Methods in Natural Language Processing (EMNLP) 2022

 

Full paper download: KGxBoard_Explainable_and_Interactive_Leaderboard_for_Evaluation_Knowledge_Graph_Completion_Models.pdf

Niklas Friedrich, Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Mathias Niepert, Goran Glavaš, “AnnIE: An Annotation Platform for Constructing Complete Open Information Extraction Benchmark”, Annual Meeting of the Association for Computational Linguistics (ACL) 2022

Paper Details

Abstract:
Open Information Extraction (OIE) is the task of extracting facts from sentences in the form of relations and their corresponding arguments in schema-free manner. Intrinsic performance of OIE systems is difficult to measure due to the incompleteness of existing OIE benchmarks: ground truth extractions do not group all acceptable surface realizations of the same fact that can be extracted from a sentence. To measure performance of OIE systems more realistically, it is necessary to manually annotate complete facts (i.e., clusters of all acceptable surface realizations of the same fact) from input sentences.

We propose AnnIE: an interactive annotation platform that facilitates such challenging annotation tasks and supports creation of complete fact-oriented OIE evaluation benchmarks. AnnIE is modular and flexible in order to support different use case scenarios (i.e., benchmarks covering different types of facts) and different languages. We use AnnIE to build two complete OIE benchmarks: one with verb-mediated facts and another with facts encompassing named entities. We evaluate several OIE systems on our complete benchmarks created with AnnIE. We publicly release AnnIE under a non-restrictive license.

Conference: Annual Meeting of the Association for Computational Linguistics (ACL) 2022

Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Mathias Niepert, Goran Glavaš (University of Mannheim), "BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation", Annual Meeting of the Association for Computational Linguistics (ACL) 2022

Paper Details

Abstract:
Intrinsic evaluations of OIE systems are carried out either manually—with human evaluators judging the correctness of extractions—or automatically, on standardized benchmarks. The latter, while much more cost-effective, is less reliable, primarily because of the incompleteness of the existing OIE benchmarks: the ground truth extractions do not include all acceptable variants of the same fact, leading to unreliable assessment of the models’ performance. Moreover, the existing OIE benchmarks are available for English only. In this work, we introduce BenchIE: a benchmark and evaluation framework for comprehensive evaluation of OIE systems for English, Chinese, and German. In contrast to existing OIE benchmarks, BenchIE is fact-based, i.e., it takes into account informational equivalence of extractions: our gold standard consists of fact synsets, clusters in which we exhaustively list all acceptable surface forms of the same fact. Moreover, having in mind common downstream applications for OIE, we make BenchIE multi-faceted; i.e., we create benchmark variants that focus on different facets of OIE evaluation, e.g., compactness or minimality of extractions. We benchmark several state-of-the-art OIE systems using BenchIE and demonstrate that these systems are significantly less effective than indicated by existing OIE benchmarks. We make BenchIE (data and evaluation code) publicly available.
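The fact-synset idea reduces to a simple membership check at scoring time, as the sketch below shows: an extraction counts as correct only if it matches one of the acceptable surface forms exhaustively listed for some gold fact. The data structures are illustrative, not BenchIE's actual file format.

```python
# Sketch of fact-based OIE scoring: a fact synset is the set of all acceptable
# surface realizations of one fact; an extraction is correct iff it appears
# in some gold synset (illustrative data structures).
Triple = tuple[str, str, str]

def triple_correct(extraction: Triple, fact_synsets: list[set[Triple]]) -> bool:
    return any(extraction in synset for synset in fact_synsets)

gold = [  # one fact, several acceptable realizations
    {("Michael", "was born in", "Chicago"),
     ("Michael", "was born", "in Chicago")},
]
print(triple_correct(("Michael", "was born in", "Chicago"), gold))  # True
print(triple_correct(("Michael", "born", "Chicago"), gold))         # False
```

Because the synsets are exhaustive, a miss really is an error rather than an unlisted paraphrase, which is why scores drop relative to older benchmarks.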

Conference: Annual Meeting of the Association for Computational Linguistics (ACL) 2022

Bhushan Kotnis, Kiril Gashteovski, Daniel Oñoro-Rubio, Vanesa Rodriguez-Tembras, Ammar Shaker, Makoto Takamoto, Mathias Niepert, Carolin Lawrence, "MILIE: Modular & Iterative Multilingual Open Information Extraction", Annual Meeting of the Association for Computational Linguistics (ACL) 2022

Paper Details

Abstract:
Open Information Extraction (OpenIE) is the task of extracting (subject, predicate, object) triples from natural language sentences. Current OpenIE systems extract all triple slots independently. In contrast, we explore the hypothesis that it may be beneficial to extract triple slots iteratively: first extract easy slots, followed by the difficult ones by conditioning on the easy slots, and therefore achieve a better overall extraction.

Based on this hypothesis, we propose a neural OpenIE system, MILIE, that operates in an iterative fashion. Due to the iterative nature, the system is also modular—it is possible to seamlessly integrate rule-based extraction systems with a neural end-to-end system, thereby allowing rule-based systems to supply extraction slots which MILIE can leverage for extracting the remaining slots. We confirm our hypothesis empirically: MILIE outperforms SOTA systems on multiple languages ranging from Chinese to Arabic. Additionally, we are the first to provide an OpenIE test dataset for Arabic and Galician.
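The iterative conditioning loop can be sketched as follows; `extract_slot` is a toy rule-based stand-in for the trained neural model, and the slot order is an assumption for the sake of the example.

```python
# Sketch of iterative slot extraction: fill the easiest slot first, then
# condition each remaining slot on what is already known (illustrative).
def extract_slot(sentence: str, slot: str, known: dict[str, str]) -> str | None:
    # Toy stand-in for the trained extractor; ignores `known` for simplicity.
    words = sentence.split()
    return {"predicate": words[1], "subject": words[0],
            "object": words[-1]}.get(slot)

def extract_triple(sentence: str,
                   order=("predicate", "subject", "object")):
    known: dict[str, str] = {}
    for slot in order:                               # easy slots first,
        value = extract_slot(sentence, slot, known)  # conditioned on the rest
        if value is None:
            return None                              # abstain if a slot fails
        known[slot] = value
    return (known["subject"], known["predicate"], known["object"])

print(extract_triple("Berlin is the capital of Germany"))
# ('Berlin', 'is', 'Germany')
```

The modularity claim follows directly: any stage of the loop can be pre-filled by a rule-based system, and the neural model only completes the remaining slots.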

Presented at: Annual Meeting of the Association for Computational Linguistics (ACL) 2022

In collaboration with: Heidelberg University - Center for Ibero-American Studies, University of Stuttgart

Full paper download: MILIE_-_Modular___Iterative_Multilingual_Open_Information_Extraction.pdf

Ammar Shaker, Shujian Yu, Daniel Oñoro-Rubio: “Learning to Transfer with von Neumann Conditional Divergence”, AAAI 2022

Paper Details

Abstract:
The similarity of feature representations plays a pivotal role in the success of problems related to domain adaptation. Feature similarity includes both the invariance of marginal distributions and the closeness of conditional distributions given the desired response y (e.g., class labels). Unfortunately, traditional methods always learn such features without fully taking into consideration the information in y, which in turn may lead to a mismatch of the conditional distributions or the mix-up of discriminative structures underlying data distributions. In this work, we introduce the recently proposed von Neumann conditional divergence to improve the transferability across multiple domains. We show that this new divergence is differentiable and eligible to easily quantify the functional dependence between features and y. Given multiple source tasks, we integrate this divergence to capture discriminative information in y and design novel learning objectives assuming those source tasks are observed either simultaneously or sequentially. In both scenarios, we obtain favorable performance against state-of-the-art methods in terms of smaller generalization error on new tasks and less catastrophic forgetting on source tasks (in the sequential setup).
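The conditional divergence used here builds on the von Neumann matrix divergence; the sketch below computes that base quantity for symmetric positive (semi)definite matrices, D(A || B) = tr(A log A − A log B − A + B), purely for illustration.

```python
# The von Neumann matrix divergence underlying the method (illustrative):
# D(A || B) = tr(A log A - A log B - A + B) for symmetric PSD matrices.
import numpy as np
from scipy.linalg import logm

def von_neumann_divergence(A: np.ndarray, B: np.ndarray) -> float:
    # logm may return tiny imaginary noise for SPD inputs; keep the real part.
    return float(np.trace(A @ logm(A) - A @ logm(B) - A + B).real)

A = np.array([[2.0, 0.3], [0.3, 1.0]])
B = np.eye(2)
print(von_neumann_divergence(A, B))  # positive: A differs from B
print(von_neumann_divergence(A, A))  # ~0: a matrix has zero divergence to itself
```

Differentiability of this quantity in A and B is what lets the paper use it directly inside gradient-based learning objectives.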

Conference: Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)
 

Wiem Ben Rim, Carolin Lawrence, Kiril Gashteovski, Mathias Niepert, Naoaki Okazaki: “Behavioral Testing of Knowledge Graph Embedding Models for Link Prediction”, Conference on Automated Knowledge Base Construction (AKBC) 2021

Paper Details

Abstract:
Knowledge graph embedding (KGE) models are often used to encode knowledge graphs in order to predict new links inside the graph. The accuracy of these methods is typically evaluated by computing an averaged accuracy metric on a held-out test set. This approach, however, does not allow the identification of where the models might systematically fail or succeed. To address this challenge, we propose a new evaluation framework that builds on the idea of (black-box) behavioral testing, a software engineering principle that enables users to detect system failures before deployment. With behavioral tests, we can specifically target and evaluate the behavior of KGE models on specific capabilities deemed important in the context of a particular use case. To this end, we leverage existing knowledge graph schemas to design behavioral tests for the link prediction task. With an extensive set of experiments, we perform and analyze these tests for several KGE models. Crucially, we for example find that a model ranked second to last on the original test set actually performs best when tested for a specific capability. Such insights allow users to better choose which KGE model might be most suitable for a particular task. The framework is extendable to additional behavioral tests and we hope to inspire fellow researchers to join us in collaboratively growing this framework. The framework is available at https://github.com/nec-research/KGEval.
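One concrete behavioral test a user might write on top of such a framework is sketched below: for a relation the schema declares symmetric (e.g., "spouseOf"), a KGE model should score (t, r, h) close to (h, r, t). The `score` callable is a stand-in for any trained KGE scoring function; nothing here is the actual KGEval API.

```python
# Sketch of a schema-driven behavioral test for link prediction:
# a symmetric relation should be scored consistently in both directions.
from typing import Callable

Triple = tuple[str, str, str]

def symmetry_pass_rate(score: Callable[[Triple], float],
                       triples: list[Triple], margin: float = 0.1) -> float:
    """Fraction of symmetric pairs the model scores consistently."""
    ok = sum(abs(score((h, r, t)) - score((t, r, h))) <= margin
             for h, r, t in triples)
    return ok / len(triples)

toy_scores = {("anna", "spouseOf", "ben"): 0.90,
              ("ben", "spouseOf", "anna"): 0.85}
print(symmetry_pass_rate(lambda tr: toy_scores.get(tr, 0.0),
                         [("anna", "spouseOf", "ben")]))  # 1.0
```

A suite of such capability-specific pass rates is what surfaces the kind of model-ranking reversals the abstract reports.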

Conference: Conference on Automated Knowledge Base Construction (AKBC)

In collaboration with: Tokyo Institute of Technology

Giuseppe Serra, Zhao Xu, Mathias Niepert, Carolin Lawrence, Peter Tiňo, Xin Yao: "Interpreting Node Embedding with Text-labeled Graphs", IEEE International Joint Conference on Neural Networks (IJCNN) 2021

Paper Details

Abstract
Graph neural networks have recently received increasing attention. These methods often map nodes into latent spaces and learn vector representations of the nodes for a variety of downstream tasks. To gain trust and to promote collaboration between AIs and humans, it would be better if those representations were interpretable for humans. However, most explainable AIs focus on a supervised learning setting and aim to answer the following question: ”Why does the model predict y for an input x?”. For an unsupervised learning setting such as node embedding, interpretation can be more complicated since the embedding vectors are usually not understandable for humans. On the other hand, nodes and edges in a graph are often associated with texts in many real-world applications. A question naturally arises: could we integrate the human-understandable textual data into graph learning to facilitate interpretable node embedding? In this paper we present interpretable graph neural networks (iGNN), a model to learn textual explanations for node representations modeling the extra information contained in the associated textual data. To validate the performance of the proposed method, we investigate the learned interpretability of the embedding vectors and use functional interpretability to measure it. Experimental results on multiple text-labeled graphs show the effectiveness of the iGNN model on learning textual explanations of node embedding while performing well in downstream tasks.


Full paper download: Interpreting_Node_Embedding_with_Text-labeled_Graph_090621.pdf

Ammar Shaker, Francesco Alesiani, Shujian Yu, Wenzhe Yin: “Bilevel Continual Learning,” IJCNN 2021

Paper Details

Abstract
Continual learning (CL) studies the problem of learning a sequence of tasks, one at a time, such that the learning of each new task does not lead to the deterioration in performance on the previously seen ones. This paper presents Bilevel Continual Learning (BiCL), a general framework for continual learning that fuses bilevel optimization and recent advances in meta-learning for deep neural networks. BiCL is able to train both deep discriminative models and deep generative models under the conservative setting of online continual learning. Experimental results show that BiCL provides competitive performance in terms of accuracy for the current task while reducing the effect of catastrophic forgetting.

Carolin Lawrence, Timo Sztyler, Mathias Niepert: “Explaining Neural Matrix Factorization with Gradient Rollback”, AAAI 2021

Paper Details

Abstract

Explaining the predictions of neural black-box models is an important problem, especially when such models are used in applications where user trust is crucial. Estimating the influence of training examples on a learned neural model's behavior allows us to identify training examples most responsible for a given prediction and, therefore, to faithfully explain the output of a black-box model. The most generally applicable existing method is based on influence functions, which scale poorly for larger sample sizes and models.

We propose gradient rollback, a general approach for influence estimation, applicable to neural models where each parameter update step during gradient descent touches a smaller number of parameters, even if the overall number of parameters is large. Neural matrix factorization models trained with gradient descent are part of this model class. These models are popular and have found a wide range of applications in industry. Especially knowledge graph embedding methods, which belong to this class, are used extensively. We show that gradient rollback is highly efficient at both training and test time. Moreover, we show theoretically that the difference between gradient rollback's influence approximation and the true influence on a model's behavior is smaller than known bounds on the stability of stochastic gradient descent. This establishes that gradient rollback is robustly estimating example influence. We also conduct experiments which show that gradient rollback provides faithful explanations for knowledge base completion and recommender datasets.
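A conceptual toy version of the rollback idea is sketched below: record the parameter updates each training example causes, then estimate that example's influence on a prediction by scoring with those updates subtracted. The quadratic toy loss and plain-Python bookkeeping are illustrative; the paper's contribution is making this efficient and provably accurate for large sparse-update models.

```python
# Toy gradient rollback: log each example's parameter updates during SGD,
# then "roll back" an example to estimate its influence on a score.
import numpy as np

def train(examples, lr=0.1, dim=4, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=dim)
    log = {}                              # example index -> update it caused
    for i, x in enumerate(examples):
        grad = theta - x                  # toy loss: 0.5 * ||theta - x||^2
        update = -lr * grad
        theta = theta + update
        log[i] = update                   # record this example's contribution
    return theta, log

def influence(theta, log, i, score):
    # Influence of example i on `score`: change when its updates are removed.
    return score(theta) - score(theta - log[i])

examples = [np.ones(4), -np.ones(4)]
theta, log = train(examples)
print(influence(theta, log, 1, score=lambda t: float(t.sum())))
```

Ranking training triples by this influence value is what yields the faithful "which training facts caused this prediction" explanations described above.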

Presented at: 35th Conference on Artificial Intelligence (AAAI-21)

Full paper download: 16632-Article_Text-20126-1-2-20210518.pdf

Bhushan Kotnis, Carolin Lawrence and Mathias Niepert: “Answering Complex Queries in Knowledge Graphs with Bidirectional Sequence Encoders”, AAAI 2021

Paper Details

Abstract
Representation learning for knowledge graphs (KGs) has focused on the problem of answering simple link prediction queries. In this work, we address the more ambitious challenge of predicting the answers of conjunctive queries with multiple missing entities. We propose Bidirectional Query Embedding (BIQE), a method that embeds conjunctive queries with models based on bi-directional attention mechanisms. Contrary to prior work, bidirectional self-attention can capture interactions among all the elements of a query graph. We introduce two new challenging datasets for studying conjunctive query inference and conduct experiments on several benchmark datasets that demonstrate BIQE significantly outperforms state-of-the-art baselines.

Presented at: 35th Conference on Artificial Intelligence (AAAI-21)

Shujian Yu, Ammar Shaker, Francesco Alesiani, Jose Principe: “Measuring the Discrepancy between Conditional Distributions: Methods, Properties and Applications”, IJCAI 2020

Paper Details

Abstract
We propose a simple yet powerful test statistic to quantify the discrepancy between two conditional distributions. The new statistic avoids the explicit estimation of the underlying distributions in high dimensional space and it operates on the cone of symmetric positive semidefinite (SPS) matrix using the Bregman matrix divergence. Moreover, it inherits the merits of the correntropy function to explicitly incorporate high-order statistics in the data. We present the properties of our new statistic and illustrate its connections to prior art. We finally show the applications of our new statistic on three different machine learning problems, namely the multi-task learning over graphs, the concept drift detection, and the information-theoretic feature selection, to demonstrate its utility and advantage. Code of our statistic is available at https://bit.ly/BregmanCorrentropy.

Presented at: International Joint Conference on Artificial Intelligence – Pacific Rim International Conference on Artificial Intelligence, 2020

Full paper download: Measuring the Discrepancy between Conditional Distributions: Methods, Properties and Applications (pdf)

Francesco Alesiani, Shujian Yu, Ammar Shaker: “Towards Interpretable Multi Task Learning”, ECML PKDD 2020

Paper Details

Abstract
Interpretable Multi-Task Learning can be expressed as learning a sparse graph of the task relationship based on the prediction performance of the learned models. Since many natural phenomena exhibit sparse structures, enforcing sparsity on learned models reveals the underlying task relationship. Moreover, different sparsification degrees from a fully connected graph uncover various types of structures, like cliques, trees, lines, clusters or fully disconnected graphs. In this paper, we propose a bilevel formulation of multi-task learning that induces sparse graphs, thus, revealing the underlying task relationships, and an efficient method for its computation. We show empirically how the induced sparse graph improves the interpretability of the learned models and their relationship on synthetic and real data, without sacrificing generalization performance. Code at https://bit.ly/GraphGuidedMTL.

Ammar Shaker, Shujian Yu, Xiao He, Christoph Gärtner: “Online Meta-Forest for Regression Data Streams”, IJCNN 2020 and WCCI 2020

Paper Details

Abstract
Stream learning is essential when there is limited memory, time and computational power. However, existing streaming methods are mostly designed for classification with only a few exceptions for regression problems. Although being fast, the performance of these online regression methods is inadequate due to their dependence on merely linear models. Besides, only a few stream methods are based on meta-learning that aims at facilitating the dynamic choice of the right model. Nevertheless, these approaches are restricted to recommend learners on a window and not on the instance level. In this paper, we present a novel approach, named Online Meta-Forest, that incrementally induces an ensemble of meta-learners that selects the best set of predictors for each test example. Each meta-learner has the ability to find a non-linear mapping of the input space to the set of induced models. We conduct a series of experiments demonstrating that Online Meta-Forest outperforms related methods on 16 out of 25 evaluated benchmark and domain datasets in transportation.

Carolin Lawrence, Bhushan Kotnis, Mathias Niepert: “Attending to Future Tokens for Bidirectional Sequence Generation”, EMNLP 2019

Paper Details

Abstract

Neural sequence generation is typically performed token-by-token and left-to-right. Whenever a token is generated, only previously produced tokens are taken into consideration. In contrast, for problems such as sequence classification, bidirectional attention, which takes both past and future tokens into consideration, has been shown to perform much better. We propose to make the sequence generation process bidirectional by employing special placeholder tokens. Treated as a node in a fully connected graph, a placeholder token can take past and future tokens into consideration when generating the actual output token. We verify the effectiveness of our approach experimentally on two conversational tasks where the proposed bidirectional model outperforms competitive baselines by a large margin.
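A toy illustration of the placeholder mechanism follows: the target side is pre-filled with placeholder tokens so that every output position exists from the start and attention can look at "future" positions. The token names and input layout are assumptions for illustration, not the paper's exact setup.

```python
# Toy illustration of placeholder-based bidirectional generation input:
# pre-filling the target side lets attention treat every output position
# as a node in a fully connected graph (past *and* future context).
PLACEHOLDER = "<p>"

def make_bidirectional_input(source_tokens: list[str], max_len: int) -> list[str]:
    # Each placeholder will later be replaced by an actual output token,
    # attending to all other positions while doing so.
    return source_tokens + ["<sep>"] + [PLACEHOLDER] * max_len

print(make_bidirectional_input(["how", "are", "you", "?"], max_len=3))
# ['how', 'are', 'you', '?', '<sep>', '<p>', '<p>', '<p>']
```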

Presented at: Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) 2019
