
Massive-scale Enhancement of Knowledge Databases on Graphcore Processing Units

At NeurIPS 2022, Graphcore won the Knowledge Graph track of the Open Graph Benchmark Large-Scale Challenge (OGB-LSC), a competition that aims to push the boundaries of graph representation learning. Graphcore's winning submission describes a distinctive approach to training Knowledge Graph Embedding (KGE) models.

The winning approach trained an ensemble of 85 KGE models, combining five different scoring functions and two different loss functions. Models were selected for the ensemble based on validation performance; DistMult and ComplEx models, which are known to perform particularly well in ensembles, make up a significant portion.
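To make the scoring-function distinction concrete, here is a minimal sketch of two of the scoring functions mentioned above, DistMult and ComplEx, on toy NumPy embeddings. The vector dimensions and values are illustrative, not taken from the submission.

```python
import numpy as np

def distmult_score(head, relation, tail):
    """DistMult: trilinear product <h, r, t>, i.e. a diagonal relation matrix."""
    return float(np.sum(head * relation * tail))

def complex_score(head, relation, tail):
    """ComplEx: Re(<h, r, conj(t)>) over complex-valued embeddings."""
    return float(np.real(np.sum(head * relation * np.conj(tail))))

# Toy 8-dimensional embeddings for a single (head, relation, tail) triple.
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 8))
print(distmult_score(h, r, t))
```

A higher score indicates that the model considers the triple more plausible; ComplEx generalises DistMult by allowing complex-valued embeddings, which lets it model asymmetric relations.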

The best individual model achieved a Mean Reciprocal Rank (MRR) of 0.243. The ensemble performed markedly better, achieving a validation MRR of 0.2922 and an MRR of 0.2562 on the test-challenge dataset. The ensemble was trained using the BESS (Balanced Entity Sampling and Sharing) distributed processing scheme, which balances communication and compute across multiple workers.
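For readers unfamiliar with the metric, MRR is simply the mean of the reciprocal of the rank assigned to the correct entity for each test query, computable in a few lines:

```python
import numpy as np

def mean_reciprocal_rank(ranks):
    """MRR: mean of 1/rank of the true entity over all queries (1 = best rank)."""
    ranks = np.asarray(ranks, dtype=float)
    return float(np.mean(1.0 / ranks))

# Three queries where the correct entity ranked 1st, 4th, and 10th:
print(mean_reciprocal_rank([1, 4, 10]))  # (1 + 0.25 + 0.1) / 3 = 0.45
```

An MRR of 0.2562 thus corresponds, roughly, to the correct entity appearing around rank 4 on average.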

BESS randomly and uniformly partitions the set of entity embeddings across available workers and samples batches uniformly from the partitioned triples during training. This approach guarantees that only tail embeddings have to be exchanged across workers during training, making it an efficient method for large-scale training. The BESS approach was trained on a Graphcore Bow Pod16, benefiting from collective communications running over fast IPU-IPU links.
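The partitioning idea behind BESS can be illustrated with a small sketch. This is an assumption-laden toy version, not Graphcore's IPU implementation: entities are assigned to workers uniformly at random, and each triple is placed on the worker that owns its head entity, so only tail embeddings need to cross worker boundaries.

```python
import numpy as np

def partition_entities(num_entities, num_workers, seed=0):
    """Randomly and uniformly assign each entity to a worker (BESS-style sketch)."""
    rng = np.random.default_rng(seed)
    return rng.integers(num_workers, size=num_entities)

def shard_triples(triples, owner):
    """Place each (head, relation, tail) triple on the worker owning its head,
    so during training only tail embeddings must be fetched from other workers."""
    shards = {}
    for h, r, t in triples:
        shards.setdefault(int(owner[h]), []).append((h, r, t))
    return shards

owner = partition_entities(num_entities=10, num_workers=2)
shards = shard_triples([(0, 0, 5), (1, 1, 7), (2, 0, 3)], owner)
print(shards)
```

In the real scheme, each worker samples its training batches uniformly from its local shard, which is what keeps both compute and communication balanced.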

The competition used the WikiKG90Mv2 dataset, a large-scale Knowledge Graph based on Wikidata with over 90 million nodes and 600 million triples. To reduce the generalisation gap, the training data was sampled with a bias so that the resulting distribution of relation types is proportional to the cube root of the relation-type counts in the training dataset.
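One way to implement such a bias, shown here as a sketch of the general idea rather than the submission's exact procedure, is to weight each triple so that the sampled relation-type distribution becomes proportional to the cube root of the original counts:

```python
import numpy as np
from collections import Counter

def cube_root_sampling_weights(relations):
    """Per-triple sampling weights making the sampled relation-type distribution
    proportional to count(r)**(1/3).

    Each triple with relation r gets weight count(r)**(1/3) / count(r),
    i.e. count(r)**(-2/3), so the r-type total mass is ~ count(r)**(1/3)."""
    counts = Counter(relations)
    w = np.array([counts[r] ** (-2.0 / 3.0) for r in relations])
    return w / w.sum()

# Relation 0 occurs 8x more often than relation 1.
relations = [0] * 8 + [1]
w = cube_root_sampling_weights(relations)
# Expected mass on relation 0: 8**(1/3) / (8**(1/3) + 1) = 2/3
print(w[:8].sum())
```

These weights can then be passed to a weighted sampler (e.g. `numpy.random.Generator.choice` with `p=w`) when drawing training batches.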

The predictions of multiple models were combined using a power-rank ensembling strategy. The validation and test datasets in the competition had a distribution of relation types that was proportional to the cube root of the original relation counts.
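A power-rank combination can be sketched as follows. This is an illustrative version with a hypothetical exponent `p`, not necessarily the submission's exact formula: each candidate entity receives a score summing rank**(-p) across the models, and candidates are re-ranked by that combined score.

```python
import numpy as np

def power_rank_ensemble(ranks_per_model, p=0.5):
    """Combine per-model rankings of the same candidate set.

    ranks_per_model: array of shape (num_models, num_candidates), where
    entry [m, c] is the rank model m gave candidate c (1 = best).
    Returns candidate indices ordered best-first by sum_m rank**(-p)."""
    ranks = np.asarray(ranks_per_model, dtype=float)
    scores = np.sum(ranks ** (-p), axis=0)
    return np.argsort(-scores)

# Two models ranking three candidate tail entities:
print(power_rank_ensemble([[1, 2, 3],
                           [2, 3, 1]]))
```

Working in rank space rather than raw score space avoids having to calibrate scores across models trained with different scoring and loss functions.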

The methods presented in the paper are not a magic bullet, but they aim to help the community in creating fast and accurate KGE models and accelerate their adoption in real-world applications. The paper references various works on KGE, including TransE, DistMult, ComplEx, TransH, RotatE, and others.

KGE models have several important applications across various domains. They help in link prediction, question answering, recommendation services, semantic search and information retrieval, personalized academic retrieval, biomedical and clinical applications, cybersecurity, and fake news detection.

In link prediction, KGEs help predict missing links or relationships in a knowledge graph, which is essential for completing incomplete data and discovering new connections. They support advanced question answering systems by embedding KG facts into vector spaces that allow semantic understanding beyond keyword matching. In recommendation systems, KGEs capture complex product or item relationships and user preferences beyond simple categorical data.

Embeddings enhance search engines and enterprise search by understanding query intent and content semantics, improving result relevance even with varied terminology. In personalized academic retrieval, KGEs can be used to generate user profiles and model user interests in academic search, improving document recommendation and retrieval by matching user embeddings with document embeddings.

In biomedical and clinical applications, KGEs represent and analyze complex biomedical data, such as interactions among proteins, diseases, pathways, mutations, and chemicals, facilitating insights in systems biology, drug discovery, and precision medicine. In cybersecurity, they assist in modeling and predicting cyber-attacks by capturing complex knowledge of cyber threats within knowledge graphs.

In fake news detection, knowledge graphs and their embeddings help detect misinformation by representing and analyzing relationships between entities in social media content. Recent advancements improve scalability and enable embedding of large knowledge graphs, enhancing applications that require global structural understanding in huge datasets.

In summary, KGE models are widely applied in knowledge graph completion, advanced search, recommendation systems, personalized retrieval, biomedical research, cybersecurity, and misinformation detection, by transforming structured knowledge into continuous vector spaces that machines can efficiently learn from and exploit for downstream tasks. The winning approach at the OGB-LSC Knowledge Graph Competition demonstrates the advantage of a diverse ensemble of well-tuned models for large-scale Knowledge Graph completion.
