Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Mathias Niepert, Goran Glavaš (University of Mannheim), "BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation", Annual Meeting of the Association for Computational Linguistics (ACL) 2022
Intrinsic evaluations of OIE systems are carried out either manually—with human evaluators judging the correctness of extractions—or automatically, on standardized benchmarks. The latter, while much more cost-effective, is less reliable, primarily because of the incompleteness of the existing OIE benchmarks: the ground truth extractions do not include all acceptable variants of the same fact, leading to unreliable assessment of the models’ performance. Moreover, the existing OIE benchmarks are available for English only. In this work, we introduce BenchIE: a benchmark and evaluation framework for comprehensive evaluation of OIE systems for English, Chinese, and German. In contrast to existing OIE benchmarks, BenchIE is fact-based, i.e., it takes into account informational equivalence of extractions: our gold standard consists of fact synsets, clusters in which we exhaustively list all acceptable surface forms of the same fact. Moreover, having in mind common downstream applications for OIE, we make BenchIE multi-faceted; i.e., we create benchmark variants that focus on different facets of OIE evaluation, e.g., compactness or minimality of extractions. We benchmark several state-of-the-art OIE systems using BenchIE and demonstrate that these systems are significantly less effective than indicated by existing OIE benchmarks. We make BenchIE (data and evaluation code) publicly available.
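The fact-synset idea in the abstract above can be illustrated with a small scoring sketch. This is not the BenchIE evaluation code. It assumes an extraction counts as correct when it exactly matches any surface form in any synset, and that a fact counts as recalled when at least one of its surface forms was extracted. The function name `score_against_synsets` and the example triples are hypothetical.

```python
Triple = tuple[str, str, str]


def score_against_synsets(extractions: list[Triple],
                          synsets: list[set[Triple]]) -> dict[str, float]:
    """Score system extractions against gold fact synsets (illustrative only)."""
    system = set(extractions)          # ignore duplicate extractions
    gold = set().union(*synsets)       # every acceptable surface form of every fact
    # Precision side: an extraction is correct if it appears in some synset.
    matched = sum(1 for triple in system if triple in gold)
    # Recall side: a fact is covered if any of its surface forms was extracted.
    covered = sum(1 for synset in synsets if synset & system)
    precision = matched / len(system) if system else 0.0
    recall = covered / len(synsets) if synsets else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Two facts, each listed with two acceptable surface forms (made-up data).
synsets = [
    {("Sen. Mitchell", "is from", "Maine"), ("Mitchell", "is from", "Maine")},
    {("Sen. Mitchell", "is", "a lawyer"), ("Mitchell", "is", "a lawyer")},
]
extractions = [
    ("Mitchell", "is from", "Maine"),
    ("Sen. Mitchell", "is from", "Maine"),   # same fact, different surface form
    ("Mitchell", "is", "Maine"),             # not an acceptable form of any fact
]
scores = score_against_synsets(extractions, synsets)
print(scores)
```

In this toy case two of the three extractions match a gold surface form, but both belong to the same fact, so only one of the two facts is covered. Counting facts rather than surface forms is what keeps a system from being rewarded twice for paraphrasing one fact.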
Bhushan Kotnis, Kiril Gashteovski, Daniel Oñoro-Rubio, Vanesa Rodriguez-Tembras, Ammar Shaker, Makoto Takamoto, Mathias Niepert, Carolin Lawrence, "MILIE: Modular & Iterative Multilingual Open Information Extraction", Annual Meeting of the Association for Computational Linguistics (ACL) 2022
Open Information Extraction (OpenIE) is the task of extracting (subject, predicate, object) triples from natural language sentences. Current OpenIE systems extract all triple slots independently. In contrast, we explore the hypothesis that it may be beneficial to extract triple slots iteratively: first extract easy slots, followed by the difficult ones by conditioning on the easy slots, and therefore achieve a better overall extraction.
Based on this hypothesis, we propose a neural OpenIE system, MILIE, that operates in an iterative fashion. Due to the iterative nature, the system is also modular: it is possible to seamlessly integrate rule-based extraction systems with a neural end-to-end system, thereby allowing rule-based systems to supply extraction slots which MILIE can leverage for extracting the remaining slots. We confirm our hypothesis empirically: MILIE outperforms SOTA systems on multiple languages ranging from Chinese to Arabic. Additionally, we are the first to provide an OpenIE test dataset for Arabic and Galician.
Ammar Shaker, Shujian Yu, Daniel Oñoro-Rubio: “Learning to Transfer with von Neumann Conditional Divergence”, AAAI-22 (Accepted)
The similarity of feature representations plays a pivotal role in the success of problems related to domain adaptation. Feature similarity includes both the invariance of marginal distributions and the closeness of conditional distributions given the desired response y (e.g., class labels). Unfortunately, traditional methods always learn such features without fully taking into consideration the information in y, which in turn may lead to a mismatch of the conditional distributions or the mix-up of discriminative structures underlying data distributions. In this work, we introduce the recently proposed von Neumann conditional divergence to improve the transferability across multiple domains. We show that this new divergence is differentiable and can easily quantify the functional dependence between features and y. Given multiple source tasks, we integrate this divergence to capture discriminative information in y and design novel learning objectives assuming those source tasks are observed either simultaneously or sequentially. In both scenarios, we obtain favorable performance against state-of-the-art methods in terms of smaller generalization error on new tasks and less catastrophic forgetting on source tasks (in the sequential setup).
D. Friede, M. Niepert: “Efficient Learning of Discrete-Continuous Computation Graphs”, Conference on Neural Information Processing Systems (NeurIPS) 2021
Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is that of integrating discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph’s execution paths. We analyze the behavior of more complex stochastic computation graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which one cannot train with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.
Conference: Conference on Neural Information Processing Systems (NeurIPS) 2021
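The first strategy named in the abstract above, a larger scale parameter for the Gumbel noise, can be sketched with a plain Gumbel-softmax relaxation. This is a minimal standard-library illustration, not the authors' implementation. The function names, the temperature value and the scale values are assumptions chosen for the example.

```python
import math
import random


def sample_gumbel(rng: random.Random, scale: float = 1.0) -> float:
    """Draw one sample from Gumbel(0, scale) by inverse transform sampling."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)   # keep log(-log(u)) finite
    return -scale * math.log(-math.log(u))


def gumbel_softmax(logits, temperature, noise_scale, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    perturbed = [(l + sample_gumbel(rng, noise_scale)) / temperature for l in logits]
    top = max(perturbed)                              # subtract max for stability
    exps = [math.exp(p - top) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
logits = [1.0, 2.0, 3.0]
for noise_scale in (1.0, 3.0):
    # Count how often the relaxed sample's argmax differs from the logits' argmax.
    flips = 0
    for _ in range(2000):
        probs = gumbel_softmax(logits, temperature=0.5, noise_scale=noise_scale, rng=rng)
        flips += probs.index(max(probs)) != 2
    print(f"noise scale {noise_scale}: argmax differs in {flips} of 2000 samples")
```

A larger noise scale makes the sampled component stray from the current mode more often, which is one intuitive reading of why it might help training escape poor local minima. The paper itself should be consulted for the actual analysis and schedule.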
M. Niepert, Pasquale Minervini, Luca Franceschi: “Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions”, Conference on Neural Information Processing Systems (NeurIPS) 2021
Combining discrete probability distributions and combinatorial optimization problems with neural network components has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable as it only requires the ability to compute the most probable states and does not rely on smooth relaxations. The framework encompasses several approaches such as perturbation-based implicit differentiation and recent methods to differentiate through black-box combinatorial solvers. We introduce a novel class of noise distributions for approximating marginals via perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers. Experiments on several datasets suggest that I-MLE is competitive with and often outperforms existing approaches which rely on problem-specific relaxations.
Conference: Conference on Neural Information Processing Systems (NeurIPS)
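Perturb-and-MAP, which the abstract above uses to approximate marginals, can be sketched for the simplest case of a single categorical variable. The sketch assumes standard Gumbel(0, 1) noise, for which the argmax of the perturbed parameters is an exact sample from the softmax distribution (the Gumbel-max trick). It does not implement the noise distributions introduced in the paper or the I-MLE gradient itself. All function names are hypothetical.

```python
import math
import random


def map_state(theta):
    """Most probable state of a categorical distribution, as a one-hot vector."""
    best = max(range(len(theta)), key=lambda i: theta[i])
    return [1.0 if i == best else 0.0 for i in range(len(theta))]


def perturb_and_map_marginals(theta, num_samples, rng):
    """Average MAP states of Gumbel-perturbed parameters to estimate marginals."""
    totals = [0.0] * len(theta)
    for _ in range(num_samples):
        perturbed = []
        for t in theta:
            u = max(rng.random(), 1e-12)              # avoid log(0)
            perturbed.append(t - math.log(-math.log(u)))
        for i, z in enumerate(map_state(perturbed)):
            totals[i] += z
    return [s / num_samples for s in totals]


# Parameters chosen so that softmax(theta) is (0.7, 0.2, 0.1).
theta = [math.log(0.7), math.log(0.2), math.log(0.1)]
estimate = perturb_and_map_marginals(theta, num_samples=20000, rng=random.Random(0))
print([round(p, 3) for p in estimate])
```

Only a MAP routine is needed here, which mirrors the abstract's point that the framework requires nothing beyond the ability to compute the most probable states.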
Wiem Ben Rim, Carolin Lawrence, Kiril Gashteovski, Mathias Niepert, Naoaki Okazaki: “Behavioral Testing of Knowledge Graph Embedding Models for Link Prediction”, Conference on Automated Knowledge Base Construction (AKBC) (accepted)
Knowledge graph embedding (KGE) models are often used to encode knowledge graphs in order to predict new links inside the graph. The accuracy of these methods is typically evaluated by computing an averaged accuracy metric on a held-out test set. This approach, however, does not allow the identification of where the models might systematically fail or succeed. To address this challenge, we propose a new evaluation framework that builds on the idea of (black-box) behavioral testing, a software engineering principle that enables users to detect system failures before deployment. With behavioral tests, we can specifically target and evaluate the behavior of KGE models on specific capabilities deemed important in the context of a particular use case. To this end, we leverage existing knowledge graph schemas to design behavioral tests for the link prediction task. With an extensive set of experiments, we perform and analyze these tests for several KGE models. Crucially, we find, for example, that a model ranked second to last on the original test set actually performs best when tested for a specific capability. Such insights allow users to better choose which KGE model might be most suitable for a particular task. The framework is extendable to additional behavioral tests and we hope to inspire fellow researchers to join us in collaboratively growing this framework. The framework is available at https://github.com/nec-research/KGEval.
A research collaboration between NEC Laboratories Europe and Tokyo Institute of Technology
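The contrast between one averaged metric and capability-specific behavioral tests can be shown with a toy sketch. It does not use the KGEval framework or its API. The capability names, the lookup-table "model" and the test cases are all made up for illustration.

```python
from collections import defaultdict


def hits_at_1_by_capability(test_cases, predict_tail):
    """Report the fraction of correct top-1 tail predictions per capability tag."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for capability, head, relation, tail in test_cases:
        totals[capability] += 1
        if predict_tail(head, relation) == tail:
            hits[capability] += 1
    return {capability: hits[capability] / totals[capability] for capability in totals}


# Hypothetical stand-in for a trained KGE model: a lookup of top-1 predictions.
toy_predictions = {
    ("berlin", "capital_of"): "germany",
    ("paris", "capital_of"): "france",
    ("alice", "married_to"): "bob",
    ("bob", "married_to"): "carol",      # violates the symmetry of married_to
}


def toy_model(head, relation):
    return toy_predictions.get((head, relation))


test_cases = [
    ("capital_lookup", "berlin", "capital_of", "germany"),
    ("capital_lookup", "paris", "capital_of", "france"),
    ("symmetry", "alice", "married_to", "bob"),
    ("symmetry", "bob", "married_to", "alice"),
]
report = hits_at_1_by_capability(test_cases, toy_model)
overall = sum(toy_model(h, r) == t for _, h, r, t in test_cases) / len(test_cases)
print("overall:", overall, "per capability:", report)
```

The overall score of 0.75 hides that every failure falls on the symmetry capability. Surfacing that kind of systematic weakness is what a per-capability report is for.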
Francesco Alesiani, Shujian Yu, Xi Yu: “Gated Information Bottleneck for Generalization in Sequential Environments”, IEEE International Conference on Data Mining (ICDM) 2021 (accepted)
Deep neural networks suffer from poor generalization to unseen environments when the underlying data distribution is different from that in the training set. By learning minimum sufficient representations from training data, the information bottleneck (IB) approach has demonstrated its effectiveness in improving generalization in different AI applications. In this work, we propose a new neural network-based IB approach, termed gated information bottleneck (GIB), that dynamically drops spurious features and progressively selects the most relevant ones across different environments by a trainable soft mask (on raw features). GIB enjoys a simple and tractable objective, without any variational approximation or distributional assumption. We empirically demonstrate the superiority of GIB over other popular neural network-based IB approaches in adversarial robustness and out-of-distribution (OOD) detection. Meanwhile, we also establish the connection between IB theory and invariant causal representation learning, and observe that GIB demonstrates appealing performance when different environments are observed sequentially, a more practical scenario where invariant risk minimization (IRM) fails.
Giuseppe Serra, Zhao Xu, Mathias Niepert, Carolin Lawrence, Peter Tiňo, Xin Yao: "Interpreting Node Embedding with Text-labeled Graphs", IEEE International Joint Conference on Neural Networks (IJCNN) 2021 (accepted)
Graph neural networks have recently received increasing attention. These methods often map nodes into latent spaces and learn vector representations of the nodes for a variety of downstream tasks. To gain trust and to promote collaboration between AIs and humans, it would be better if those representations were interpretable for humans. However, most explainable AI methods focus on a supervised learning setting and aim to answer the following question: “Why does the model predict y for an input x?”. For an unsupervised learning setting such as node embedding, interpretation can be more complicated since the embedding vectors are usually not understandable for humans. On the other hand, nodes and edges in a graph are often associated with texts in many real-world applications. A question naturally arises: could we integrate the human-understandable textual data into graph learning to facilitate interpretable node embedding? In this paper we present interpretable graph neural networks (iGNN), a model to learn textual explanations for node representations modeling the extra information contained in the associated textual data. To validate the performance of the proposed method, we investigate the learned interpretability of the embedding vectors and use functional interpretability to measure it. Experimental results on multiple text-labeled graphs show the effectiveness of the iGNN model on learning textual explanations of node embedding while performing well in downstream tasks.
Index Terms—Node embedding, interpretability, text mining
Full paper download: Interpreting_Node_Embedding_with_Text-labeled_Graph_090621.pdf
Ammar Shaker, Francesco Alesiani, Shujian Yu, Wenzhe Yin: “Bilevel Continual Learning”, IJCNN 21 (accepted)
Continual learning (CL) studies the problem of learning a sequence of tasks, one at a time, such that the learning of each new task does not lead to a deterioration in performance on the previously seen ones. This paper presents Bilevel Continual Learning (BiCL), a general framework for continual learning that fuses bilevel optimization and recent advances in meta-learning for deep neural networks. BiCL is able to train both deep discriminative and deep generative models under the conservative setting of online continual learning. Experimental results show that BiCL provides competitive performance in terms of accuracy for the current task while reducing the effect of catastrophic forgetting.
Shujian Yu, Francesco Alesiani, Xi Yu, Robert Jenssen, Jose C. Príncipe: “Measuring Dependence with Matrix-based Entropy Functional,” AAAI 2021
Measuring the dependence of data plays a central role in statistics and machine learning. In this work, we summarize and generalize the main idea of existing information-theoretic dependence measures into a higher-level perspective via Shearer’s inequality. Based on our generalization, we then propose two measures, namely the matrix-based normalized total correlation Tα* and the matrix-based normalized dual total correlation Dα*, to quantify the dependence of multiple variables in arbitrary dimensional space, without explicit estimation of the underlying data distributions. We show that our measures are differentiable and statistically more powerful than prevalent ones. We also show the impact of our measures in four different machine learning problems, namely gene regulatory network inference, robust machine learning under covariate shift and non-Gaussian noise, subspace outlier detection, and understanding the learning dynamics of convolutional neural networks (CNNs), to demonstrate their utility, advantages, and implications for those problems. Code for our dependence measures is available at: https://bit.ly/AAAI-dependence.
Full author details: Shujian Yu, NEC Laboratories Europe; Francesco Alesiani, NEC Laboratories Europe; Xi Yu, University of Florida; Robert Jenssen, UiT - The Arctic University of Norway; Jose C. Príncipe, University of Florida
Presented at: 35th Conference on Artificial Intelligence (AAAI-21)
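For orientation, the two quantities that the measures in the abstract above build on have standard Shannon-entropy definitions, written here for variables X_1, ..., X_L. The matrix-based, normalized versions Tα* and Dα* proposed in the paper are not reproduced: the abstract does not state their normalization, so the block below covers only the classical definitions.

```latex
% Total correlation: sum of marginal entropies minus the joint entropy.
T(X_1, \dots, X_L) = \sum_{i=1}^{L} H(X_i) - H(X_1, \dots, X_L)

% Dual total correlation: joint entropy minus the entropy of each variable
% conditioned on all the others.
D(X_1, \dots, X_L) = H(X_1, \dots, X_L)
  - \sum_{i=1}^{L} H\left(X_i \mid X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_L\right)
```

Both quantities are non-negative and vanish when the variables are mutually independent, which is what makes them usable as dependence measures.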
Carolin Lawrence, Timo Sztyler and Mathias Niepert: “Explaining Neural Matrix Factorization with Gradient Rollback”, AAAI 2021
Explaining the predictions of neural black-box models is an important problem, especially when such models are used in applications where user trust is crucial. Estimating the influence of training examples on a learned neural model's behavior allows us to identify training examples most responsible for a given prediction and, therefore, to faithfully explain the output of a black-box model. The most generally applicable existing method is based on influence functions, which scale poorly for larger sample sizes and models.
We propose gradient rollback, a general approach for influence estimation, applicable to neural models where each parameter update step during gradient descent touches a smaller number of parameters, even if the overall number of parameters is large. Neural matrix factorization models trained with gradient descent are part of this model class. These models are popular and have found a wide range of applications in industry. Especially knowledge graph embedding methods, which belong to this class, are used extensively. We show that gradient rollback is highly efficient at both training and test time. Moreover, we show theoretically that the difference between gradient rollback's influence approximation and the true influence on a model's behavior is smaller than known bounds on the stability of stochastic gradient descent. This establishes that gradient rollback is robustly estimating example influence. We also conduct experiments which show that gradient rollback provides faithful explanations for knowledge base completion and recommender datasets.
Presented at: 35th Conference on Artificial Intelligence (AAAI-21)
Full paper download: 16632-Article_Text-20126-1-2-20210518.pdf
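The bookkeeping idea behind gradient rollback, as described in the abstract above, can be sketched on a tiny matrix factorization model: log the parameter updates each training example causes, then undo them to estimate that example's influence on a prediction. This is a simplified illustration under assumed settings (squared loss, plain SGD, made-up data and hyperparameters), not the authors' implementation. All names are hypothetical.

```python
import random
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_with_update_log(examples, dim=2, lr=0.1, epochs=200, seed=0):
    """SGD on squared error, logging the total update each example applies per row."""
    rng = random.Random(seed)
    params = {}
    for user, item, _ in examples:
        for key in (("user", user), ("item", item)):
            if key not in params:
                params[key] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    # update_log[i][key] is the summed update that example i applied to row `key`.
    update_log = [defaultdict(lambda: [0.0] * dim) for _ in examples]
    for _ in range(epochs):
        for idx, (user, item, rating) in enumerate(examples):
            u, v = params[("user", user)], params[("item", item)]
            err = dot(u, v) - rating
            steps = {("user", user): [-lr * err * x for x in v],
                     ("item", item): [-lr * err * x for x in u]}
            for key, step in steps.items():   # each step touches only two rows
                params[key] = [p + s for p, s in zip(params[key], step)]
                update_log[idx][key] = [l + s for l, s in zip(update_log[idx][key], step)]
    return params, update_log


def influence(params, update_log, example_index, user, item):
    """Score change when the logged updates of one training example are undone."""
    rolled_back = dict(params)
    for key, total_step in update_log[example_index].items():
        rolled_back[key] = [p - s for p, s in zip(params[key], total_step)]
    before = dot(params[("user", user)], params[("item", item)])
    after = dot(rolled_back[("user", user)], rolled_back[("item", item)])
    return before - after


examples = [("ann", "film_a", 1.0), ("bob", "film_b", 1.0), ("ann", "film_b", 0.0)]
params, update_log = train_with_update_log(examples)
for i, example in enumerate(examples):
    print(example, "influence on (ann, film_a):",
          influence(params, update_log, i, "ann", "film_a"))
```

Because each update touches only the two rows of the example being processed, the log stays small, and an example that shares no row with the query, such as (bob, film_b) here, has exactly zero estimated influence.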
Bhushan Kotnis, Carolin Lawrence and Mathias Niepert: “Answering Complex Queries in Knowledge Graphs with Bidirectional Sequence Encoders”, AAAI 2021
Representation learning for knowledge graphs (KGs) has focused on the problem of answering simple link prediction queries. In this work, we address the more ambitious challenge of predicting the answers of conjunctive queries with multiple missing entities. We propose Bidirectional Query Embedding (BIQE), a method that embeds conjunctive queries with models based on bi-directional attention mechanisms. Contrary to prior work, bidirectional self-attention can capture interactions among all the elements of a query graph. We introduce two new challenging data sets for studying conjunctive query inference and conduct experiments on several benchmark datasets that demonstrate that BIQE significantly outperforms state-of-the-art baselines.
Presented at: 35th Conference on Artificial Intelligence (AAAI-21)
Shujian Yu, Ammar Shaker, Francesco Alesiani and Jose Principe: “Measuring the Discrepancy between Conditional Distributions: Methods, Properties and Applications”, IJCAI 2020
We propose a simple yet powerful test statistic to quantify the discrepancy between two conditional distributions. The new statistic avoids the explicit estimation of the underlying distributions in high dimensional space and it operates on the cone of symmetric positive semidefinite (SPS) matrices using the Bregman matrix divergence. Moreover, it inherits the merits of the correntropy function to explicitly incorporate high-order statistics in the data. We present the properties of our new statistic and illustrate its connections to prior art. We finally show the applications of our new statistic on three different machine learning problems, namely the multi-task learning over graphs, the concept drift detection, and the information-theoretic feature selection, to demonstrate its utility and advantage. Code of our statistic is available at https://bit.ly/BregmanCorrentropy.
Presented at: International Joint Conference on Artificial Intelligence – Pacific Rim International Conference on Artificial Intelligence, 2020
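As background for the abstract above, the general Bregman matrix divergence and two common instances can be written out as follows. These are the textbook definitions for a strictly convex, differentiable function φ on symmetric matrices. Which instance the paper uses, and how it is applied to conditional distributions, is not stated in the abstract and is not assumed here.

```latex
% General Bregman matrix divergence generated by \varphi.
D_{\varphi}(A \,\|\, B) = \varphi(A) - \varphi(B)
  - \operatorname{tr}\!\left( \nabla\varphi(B)^{\top} (A - B) \right)

% von Neumann divergence: \varphi(A) = \operatorname{tr}(A \log A - A).
D_{vN}(A \,\|\, B) = \operatorname{tr}\!\left( A \log A - A \log B - A + B \right)

% LogDet divergence for n x n matrices: \varphi(A) = -\log\det A.
D_{\ell d}(A \,\|\, B) = \operatorname{tr}\!\left( A B^{-1} \right)
  - \log\det\!\left( A B^{-1} \right) - n
```

Both instances follow from the general form by substituting the gradient of φ (log B and −B⁻¹ respectively). The matrix logarithm and inverse require strictly positive definite arguments, so in practice a small regularization of semidefinite matrices is a common assumption.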
Francesco Alesiani, Shujian Yu, Ammar Shaker: “Towards Interpretable Multi Task Learning”, ECML PKDD 2020
Interpretable Multi-Task Learning can be expressed as learning a sparse graph of the task relationship based on the prediction performance of the learned models. Since many natural phenomena exhibit sparse structures, enforcing sparsity on learned models reveals the underlying task relationship. Moreover, different sparsification degrees from a fully connected graph uncover various types of structures, like cliques, trees, lines, clusters or fully disconnected graphs. In this paper, we propose a bilevel formulation of multi-task learning that induces sparse graphs, thus revealing the underlying task relationships, and an efficient method for its computation. We show empirically how the induced sparse graph improves the interpretability of the learned models and their relationship on synthetic and real data, without sacrificing generalization performance. Code at https://bit.ly/GraphGuidedMTL
Ammar Shaker, Shujian Yu, Xiao He, Christoph Gärtner: “Online Meta-Forest for Regression Data Streams”, IJCNN 2020 and WCCI 2020
Stream learning is essential when there is limited memory, time and computational power. However, existing streaming methods are mostly designed for classification with only a few exceptions for regression problems. Although fast, these online regression methods perform inadequately due to their dependence on merely linear models. Besides, only a few stream methods are based on meta-learning that aims at facilitating the dynamic choice of the right model. Nevertheless, these approaches are restricted to recommending learners on a window and not on the instance level. In this paper, we present a novel approach, named Online Meta-Forest, that incrementally induces an ensemble of meta-learners that selects the best set of predictors for each test example. Each meta-learner has the ability to find a non-linear mapping of the input space to the set of induced models. We conduct a series of experiments demonstrating that Online Meta-Forest outperforms related methods on 16 out of 25 evaluated benchmark and domain datasets in transportation.
Index Terms: Learning from Data Streams, Adaptive Learning, Meta-Learning, Regression Streams, Data Streams, Online Bagging, Ensemble Learning
Shujian Yu, Ammar Shaker, Francesco Alesiani, Jose C. Principe: “Measuring the Discrepancy between two Conditional Distributions: Methods, Properties and Applications”, IJCAI 2020
We propose a simple yet powerful test statistic to quantify the discrepancy between two conditional distributions. The new statistic avoids the explicit estimation of the underlying distributions in high-dimensional space and it operates on the cone of symmetric positive semidefinite (SPS) matrices using the Bregman matrix divergence. Moreover, it inherits the merits of the correntropy function to explicitly incorporate high-order statistics in the data. We present the properties of our new statistic and illustrate its connections to prior art. We finally show the applications of our new statistic on three different machine learning problems, namely the multi-task learning over graphs, the concept drift detection, and the information-theoretic feature selection, to demonstrate its utility and advantage. Code of our statistic is available at bit.ly/BregmanCorrentropy.
A. García-Durán, R. Gonzalez, D. Oñoro-Rubio, M. Niepert, H. Li: "TransRev: Modeling Reviews as Translations from Users to Items", 42nd European Conference on Information Retrieval (ECIR 2020), April 2020
C. Lawrence, B. Kotnis, M. Niepert: “Attending to Future Tokens for Bidirectional Sequence Generation”, EMNLP 2019
Neural sequence generation is typically performed token-by-token and left-to-right. Whenever a token is generated, only previously produced tokens are taken into consideration. In contrast, for problems such as sequence classification, bidirectional attention, which takes both past and future tokens into consideration, has been shown to perform much better. We propose to make the sequence generation process bidirectional by employing special placeholder tokens. Treated as a node in a fully connected graph, a placeholder token can take past and future tokens into consideration when generating the actual output token. We verify the effectiveness of our approach experimentally on two conversational tasks where the proposed bidirectional model outperforms competitive baselines by a large margin.
Presented at: Conference on Empirical Methods in Natural Language Processing 2019 and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
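The difference between left-to-right generation and the placeholder-based bidirectional setup described in the abstract above comes down to which positions may attend to which. The sketch below only builds the two attention masks for a short sequence. It is not the authors' model, and the placeholder symbol `<P>` and the function names are assumptions.

```python
def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i only)."""
    return [[j <= i for j in range(length)] for i in range(length)]


def full_mask(length):
    """Fully connected attention: every position may attend to every position."""
    return [[True] * length for _ in range(length)]


def with_placeholders(prefix, num_placeholders, placeholder="<P>"):
    """Append placeholder tokens that stand in for outputs not yet generated."""
    return list(prefix) + [placeholder] * num_placeholders


sequence = with_placeholders(["how", "are", "you", "?"], num_placeholders=2)
first_placeholder = sequence.index("<P>")
last_position = len(sequence) - 1

left_to_right = causal_mask(len(sequence))
bidirectional = full_mask(len(sequence))

# Under the causal mask the first placeholder cannot see the later placeholder;
# under the fully connected mask it can.
print("causal:", left_to_right[first_placeholder][last_position])
print("full:  ", bidirectional[first_placeholder][last_position])
```

With the fully connected mask, the representation at a placeholder position can depend on tokens to its right, including other placeholders. This is the sense in which generation becomes bidirectional.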
K. Akimoto, T. Hiraoka, K. Sadamasa and M. Niepert: “Cross-Sentence N-ary Relation Extraction using Lower-Arity Universal Schemas”, EMNLP 2019
Luca Franceschi, Xiao He, Mathias Niepert, Massimiliano Pontil, “Graph structure learning for GCNs”, ICLR, July 2019
C. Wang, M. Niepert: “State-Regularized Recurrent Neural Networks”, ICML 2019 (Thirty-sixth International Conference on Machine Learning), May 2019
L. Franceschi, X. He, M. Niepert, M. Pontil: “Learning Discrete Structures for Graph Neural Networks”, ICML 2019 (Thirty-sixth International Conference on Machine Learning), 2019
C. Wang, M. Niepert, H. Li, "RecSys-DAN: Discriminative Adversarial Networks for Cross-Domain Recommender Systems" in IEEE Transactions on Neural Networks and Learning Systems. March 2019
A. G. Duran, D. Rubio, M. Niepert, Y. Liu, H. Li, D. Rosenblum, "MMKG: Multi-Modal Knowledge Graphs" in ESWC 2019, the 16th Extended Semantic Web Conference. March 2019
D. Rubio, A. G. Duran, M. Niepert, R. Gonzales, R. Lopez-Sastre, "Answering Visual-Relational Queries in Web-Extracted Knowledge Graphs" in AKBC 2019, Automated Knowledge Base Construction Conference. March 2019
B. Kotnis, A. G. Duran, "Learning Numerical Attributes in Knowledge Bases" in AKBC 2019, Automated Knowledge Base Construction Conference. March 2019