The JVMG project collects data from multiple sources and converts it into the RDF format. One of the core characteristics of this format is that all entities and attributes are represented as URIs, while the values of these attributes are either URIs (thus linking two entities using a property) or literal values.
The SPARQL language can then be used to formulate search queries on RDF stored in a database, but this requires the user to be familiar with both the query language and the structure of the RDF data.
As all entities and properties are identified by URIs, one way to explore RDF data is having a web server that serves the domain that the data URIs are residing in and shows all information that can be associated with a given URI.
This functionality is one of the main ideas of linked data: a linked data frontend can serve “raw” RDF data to programs that try to resolve a URI, while human users who resolve the same URI in a browser get a human-readable HTML view of all the data associated with it.
Such a frontend also allows for simple exploration and navigation of a dataset, as all URIs in the human-readable view can be made into clickable links.
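The choice between “raw” RDF and an HTML view is usually made by content negotiation on the HTTP Accept header. The following is a minimal sketch of that decision, not the code of any particular frontend; the helper names and the list of supported RDF media types are assumptions made for illustration.

```python
# Hypothetical helpers: choose a response format from an HTTP Accept header.
# The supported RDF media types listed here are an assumption.
RDF_TYPES = ("text/turtle", "application/rdf+xml", "application/ld+json")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q-value."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_representation(accept_header):
    """Return the media type to serve; fall back to HTML (browser default)."""
    for media_type, q in parse_accept(accept_header or ""):
        if q <= 0:
            continue
        if media_type in RDF_TYPES:
            return media_type
        if media_type in HTML_TYPES:
            return "text/html"
    return "text/html"
```

A program that sends `Accept: text/turtle` would receive Turtle, while a typical browser header that starts with `text/html` would receive the HTML view.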
Most RDF triple stores, especially commercial solutions, come with a simple web frontend that provides the exploration capabilities described above. But these often come with a limited set of configuration options.
As an open-source alternative, there is the “Pubby” frontend, which was developed in the DM2E side-project of the Europeana initiative. See Baierer, K., Dröge, E., Eckert, K., Goldfarb, D., Iwanowa, J., Morbidoni, C., & Ritze, D. (2017). DM2E: A linked data source of digitised manuscripts for the digital humanities. Semantic Web, 8(5), 733-745 for details.
The software is quite versatile and includes functionality such as
- configurable SPARQL-query and SPARQL-endpoint
- content negotiation
- external label lookup
- preferred language for labels
- preloading labels for faster response
But there are several trade-offs: it is developed as a Java web application and needs the corresponding infrastructure to run as a server. There is also a significant slowdown when an entity page has many incoming or outgoing links, as the labels for these links are resolved with individual lookups.
Our own frontend
As the JVMG knowledge graph has entity pages with as many as 90,000 labelled links, e.g. the “character” entity page http://mediagraph.link/vndb/ont/Character, and we desired more control over the appearance and further functionality of the web frontend, we developed our own solution.
We chose Python and the Django web application framework (https://www.djangoproject.com/) for the implementation. This allowed for rapid prototyping.
In order to have fast label lookups, our approach creates a single SPARQL query that retrieves all relevant data for a given URI together with the corresponding labels. This minimizes the number of connections we have to open to the database (as every one of them costs time) and allows the database to use its internal structures (e.g. indexes and caches) to speed up query processing.
The following image shows a simplified version of our SPARQL query. For a given URI (Target) it gathers all triples where the URI is the subject or the object. Additionally, it also gathers the labels of each part of a triple.
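In text form, a query of this shape could look like the sketch below. It is an illustration under assumptions, not the project's actual query: the function name is hypothetical, and `rdfs:label` is assumed to be the label property. Because the three `OPTIONAL` clauses are independent, resources with several labels yield several result rows that the caller would have to merge.

```python
# Characters that must not appear inside a SPARQL IRI reference (<...>).
FORBIDDEN_IN_IRI = set('<>"{}|^`\\ \n\t')


def build_entity_query(target_uri):
    """Build one SPARQL query for all triples around target_uri plus labels.

    Hypothetical helper; assumes labels are stored with rdfs:label.
    """
    if any(ch in FORBIDDEN_IN_IRI for ch in target_uri):
        raise ValueError(f"not a safe IRI: {target_uri!r}")
    target = f"<{target_uri}>"
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?p ?o ?sLabel ?pLabel ?oLabel WHERE {{
  {{ {target} ?p ?o . BIND({target} AS ?s) }}   # target as subject
  UNION
  {{ ?s ?p {target} . BIND({target} AS ?o) }}   # target as object
  OPTIONAL {{ ?s rdfs:label ?sLabel }}
  OPTIONAL {{ ?p rdfs:label ?pLabel }}
  OPTIONAL {{ ?o rdfs:label ?oLabel }}
}}
"""
```

The resulting string can be sent to the triple store in a single request, so the labels arrive together with the triples instead of being looked up one by one.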
Python code then generates a basic HTML view that can be further styled using CSS. It is a rather small codebase that – besides the Django framework – mainly relies on the SPARQLWrapper and RDFlib modules. The response times for entity pages with many links are orders of magnitude faster than the prior “Pubby” installation, so this was a promising start.
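As a rough illustration of how query results can be turned into such a basic HTML view, the sketch below groups triples by predicate and renders URIs as clickable links. It uses only the standard library; the function names and the input shape (plain tuples and a label dictionary) are assumptions for the example and do not reflect the actual Django views or templates.

```python
from collections import defaultdict
from html import escape


def link(value, labels):
    """Render a URI as a labelled link; render anything else as escaped text."""
    text = escape(labels.get(value, value))
    if value.startswith(("http://", "https://")):
        return f'<a href="{escape(value, quote=True)}">{text}</a>'
    return text


def render_entity_page(target, triples, labels):
    """Return a bare HTML fragment for target.

    triples: iterable of (subject, predicate, object) strings
    labels:  dict mapping URIs to human-readable labels
    """
    outgoing = defaultdict(list)  # predicate -> objects (target is subject)
    incoming = defaultdict(list)  # predicate -> subjects (target is object)
    for s, p, o in triples:
        if s == target:
            outgoing[p].append(o)
        elif o == target:
            incoming[p].append(s)

    parts = [f"<h1>{link(target, labels)}</h1>"]
    for title, groups in (("Outgoing", outgoing), ("Incoming", incoming)):
        parts.append(f"<h2>{title}</h2><dl>")
        for predicate, values in groups.items():
            parts.append(f"<dt>{link(predicate, labels)}</dt>")
            parts.extend(f"<dd>{link(v, labels)}</dd>" for v in values)
        parts.append("</dl>")
    return "\n".join(parts)
```

The output carries no styling of its own, which leaves the appearance entirely to CSS.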
As the code base is very compact and easy to understand, extended functionality can easily be added. Correspondingly, the HTML view has evolved quite a bit and now includes
- multiple CSS variants, including a “dark mode”
- provenance information on every statement to see which data source is responsible
- settings to limit the labels to one or more languages
- interactively expandable attribute sections that limit the number of values by default to keep the view compact
- filters to hide information contained in specific subsets of the data (called graphs in the triple store)
The framework allows for the development of plug-in-like expansions that add new functionality to the server. Following the research done in the tiny use cases, some expansions were developed that showcase the possibilities:
- co-occurrence counter
This expansion uses the URI currently shown in the web frontend as a starting point and traverses the graph to collect all entities that are linked to it, together with all their literal and URI attribute values. These values are then tallied, resulting in a co-occurrence statistic for the starting entity. This works regardless of the kind of starting point: starting from a tag describing a character trait, the list will show other traits and how many characters share a given combination. Starting from a person associated with a work, the list will show other persons who have worked with the starting person and the number of works they collaborated on.
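The tallying idea can be sketched on an in-memory list of triples as shown below. This is a simplified illustration, not the expansion's actual code, which works against the triple store; the function name and the toy identifiers in the usage example are hypothetical.

```python
from collections import Counter


def co_occurrences(start, triples):
    """Count attribute values of all entities directly linked to start.

    triples: iterable of hashable (subject, predicate, object) tuples.
    Returns a Counter keyed by (predicate, value).
    """
    triples = set(triples)  # drop duplicates so each statement counts once
    # Entities linked to the starting point in either direction.
    neighbours = {s for s, _, o in triples if o == start}
    neighbours |= {o for s, _, o in triples if s == start}

    counts = Counter()
    for s, p, o in triples:
        # Tally every value of a neighbour except the link back to start.
        if s in neighbours and o != start:
            counts[(p, o)] += 1
    return counts
```

Starting from a trait tag, the neighbours are the characters carrying that tag, and each count is the number of characters that combine the starting trait with another value:

```python
data = [
    ("char:1", "trait", "tag:shy"), ("char:1", "trait", "tag:glasses"),
    ("char:2", "trait", "tag:shy"), ("char:2", "trait", "tag:glasses"),
    ("char:3", "trait", "tag:shy"), ("char:3", "trait", "tag:tall"),
]
stats = co_occurrences("tag:shy", data)
```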
A simple search interface has been added that uses an Elasticsearch index. One can search for label literal values or parts of URI strings.
One aspect that needs further work is performance. Very large requests like the aforementioned character entity page no longer crash or time out, but response times are still too long. The current bottleneck seems to be the database and we are looking into the configuration options of the Fuseki triple store we currently use.
Also on our to-do list is a proper open source release. Documentation is still very limited and the code has changed a lot in a short time. The prototype frontend runs very stably and has already been adopted in another research project, so we feel comfortable releasing the code to the public soon.