Presenting at the Ontologies for Narrative and Fiction workshop

In the first week of July we were in Groningen to join Federico Pianzola and the GOLEM team at the University of Groningen for a hybrid workshop on Ontologies for Narrative and Fiction (in cooperation with the Center for Language and Cognition, the Center for Digital Humanities, and the Jantina Tammes School of Digital Society, Technology and AI). This workshop was a very special event, bringing together an amazing spectrum of expertise and experience with modelling themes, genres, narratives and characters in fiction using an array of approaches – BFO, CIDOC-CRM/FRBRoo/LRMoo, the Wikibase data model, Schema.org, custom bottom-up models and more – as well as with exploring the potential interoperability of these models. All the presentation slides are now available on the workshop’s page.

Day one started out with Janna Hastings’ presentation “How can fiction change the world? Towards an ontology of literary characters and their interactions”, which set the tone and raised the bar for the whole workshop. The way the presentation both surveyed and interconnected the spectrum of questions and problems around working with fictional entities from the perspective of the Basic Formal Ontology framework with ontological – in the philosophical sense of the word this time – questions of existence and the work of the phenomenological thinkers Roman Ingarden and Alfred Schütz made it feel almost like an opening keynote address for the event. On the more abstract level, Janna Hastings offered the tentative conclusion that “reading fiction harnesses not our imagination as it is typically understood, but our facility for social cognition”. On the practical level, she proposed extending the BFO framework with an as_if_about relation – “a special relation, unique to fiction, for appearing to represent” – that would be analogous to the already existing is_about relation.
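For readers who think in triples, here is a minimal sketch of what the proposed relation could look like alongside IAO’s existing is_about relation (IAO_0000136); the as_if_about IRI and all the example entities are hypothetical stand-ins for illustration, not part of any published ontology.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

OBO = Namespace("http://purl.obolibrary.org/obo/")
EX = Namespace("https://example.org/fiction/")  # hypothetical namespace

g = Graph()
g.bind("obo", OBO)
g.bind("ex", EX)

is_about = OBO.IAO_0000136    # the existing 'is about' relation from IAO
as_if_about = EX.as_if_about  # hypothetical IRI for the proposed relation

# Declare the proposed relation as an object property analogous to is_about.
g.add((as_if_about, RDF.type, OWL.ObjectProperty))
g.add((as_if_about, RDFS.label, Literal("as_if_about")))
g.add((as_if_about, RDFS.comment, Literal(
    "A special relation, unique to fiction, for appearing to represent.")))

# A non-fictional text is about its real subject...
g.add((EX.BiographyOfDoyle, is_about, EX.ArthurConanDoyle))

# ...whereas a fictional text merely appears to represent a detective,
# without there being a real-world referent for is_about to point at.
g.add((EX.AStudyInScarlet, as_if_about, EX.SherlockHolmes))

print(g.serialize(format="turtle"))
```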

Next up, Paul Sheridan, the contagiously enthusiastic driving force behind the Literary Theme Ontology Project (LTO or just Theme Ontology), outlined what the future of modelling themes could look like in his presentation “Toward creating a collection of literary themes for use in supervised learning applications”. Development of the LTO was started in 2010 by Paul Sheridan and Mikael Onsjö, and the project has not only accrued a number of key contributors over the years, but now boasts around 4000 stories – mostly films and TV shows – annotated with circa 3000 themes. The themes cover a wide range of perspectives, from classical literary theory to feminist, Marxist and post-colonial criticism and beyond, and they are arranged in a BFO-compliant ontology. All the data in the project is made available under an MIT License, and there is even an R package to make exploring the database easy. The next steps in the project’s development will focus on enabling the automated detection of themes in stories using machine learning techniques with the help of gold or silver standard training data; a minimal illustration of this idea is sketched below.
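To make the supervised learning angle concrete, here is a minimal sketch of theme detection framed as multi-label text classification with scikit-learn. The stories and theme labels are invented for the example, and this is emphatically not the LTO project’s actual pipeline, only an illustration of the general shape such a system could take.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical gold-standard data: story summaries with annotated themes.
stories = [
    "A scientist creates a creature that turns against its maker.",
    "Two lovers from feuding families meet in secret.",
    "A crew explores a derelict ship drifting between the stars.",
]
themes = [
    ["the dangers of science", "revenge"],
    ["forbidden love"],
    ["space exploration", "the unknown"],
]

# Turn the theme lists into a binary label matrix for multi-label learning.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(themes)

# One binary classifier per theme over TF-IDF features.
clf = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(stories, y)

# Predict themes for an unseen summary.
pred = clf.predict(["An android questions whether its maker should be obeyed."])
print(mlb.inverse_transform(pred))
```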

In the third presentation of the day, “Towards an ontology for literary history: issues of complexity and scale when constructing the MiMoTextBase”, Christof Schöch and Maria Hinzmann introduced the potential and limitations that come from building on the Wikibase infrastructure and data model in their large-scale ongoing project Mining and Modeling Text (MiMoText) at Trier University. One of the advantages of using the Wikibase infrastructure for the MiMoTextBase knowledge graph is that querying the SPARQL endpoint also enables the creation of corresponding visualizations of the results. By aligning their data model with Wikibase, the MiMoTextBase is seamlessly interoperable with other parts of the LOD cloud, enabling, for example, federated queries (a minimal query sketch follows below). The ontology is structured into eleven modules and, importantly, does not model facts but rather only statements – using linked open data allows even contradictory statements to coexist within the knowledge graph. The knowledge graph relies on reification to trace the provenance of these statements. The problem of distinguishing between fictional and real-life entities raised in Janna Hastings’ talk was also addressed, with Christof Schöch and Maria Hinzmann demonstrating how “narrative location” can be used to both distinguish and interconnect locations in novels from/with geographical information in Wikidata.
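As an illustration of how accessible a Wikibase SPARQL endpoint is from code, here is a minimal Python sketch using SPARQLWrapper. The endpoint URL is our assumption about where the MiMoTextBase query service lives, and the query deliberately sticks to generic RDFS labels rather than project-specific property IDs; consult the MiMoTextBase documentation for the authoritative endpoint and vocabulary.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed endpoint URL; check the MiMoTextBase query service to confirm.
sparql = SPARQLWrapper("https://query.mimotext.uni-trier.de/sparql")
sparql.setReturnFormat(JSON)

# List a handful of items with their English labels.
sparql.setQuery("""
    SELECT ?item ?itemLabel WHERE {
        ?item rdfs:label ?itemLabel .
        FILTER(LANG(?itemLabel) = "en")
    }
    LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["item"]["value"], row["itemLabel"]["value"])
```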

The elaborate data models discussed in relation to the MiMoTextBase were followed up by a deep dive into the depths of the CIDOC-CRM/FRBRoo/LRMoo data models and their many intricacies in Ingo Börner’s tour de force presentation “Modeling Drama Corpora in CIDOC-CRM” on the Drama Corpora Project (DraCor). DraCor is a constantly expanding collection of corpora of drama texts in a range of different languages, the three largest currently being French, German and Russian. All the plays’ texts are encoded according to the guidelines of the Text Encoding Initiative (TEI). Furthermore, DraCor aims to be a prototype for the Programmable Corpus – “corpora that expose an open, transparently documented and (at least partly) research-driven API to make texts machine-actionable” – envisioned by the Computational Literary Studies Infrastructure (CLS INFRA); a small example of using this API follows below. DraCor is “LOD-friendly” as a result of also providing information on the plays in RDF format – and thus interconnectible with other LOD data through Wikidata. Aligning the DraCor ontology, which was created using rapid prototyping, with the CIDOC-CRM/FRBRoo/LRMoo data models proved to be a herculean task, with Ingo Börner – quoting Aaron Swartz – having to descend into the “Salt Mines of the Semantic Web”. Beyond a thorough analysis of the potential ways to describe works and characters in CIDOC-CRM/FRBRoo/LRMoo-compliant ways, Ingo Börner also proposed a solution for the ontological modelling of characters in DraCor, where they have to be treated both as characters in a text and as nodes in a network. This solution can potentially be extended to serve as the basis for modelling characters in fanfiction in the GOLEM project and beyond – in part drawing on Janna Hastings’ work on the fictional continuant (see for example Hastings and Schulz 2019).
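The “programmable corpus” idea is easiest to appreciate by trying the API; here is a minimal sketch in Python. The base URL, paths and response fields reflect our reading of the public DraCor API and should be checked against the current API documentation before relying on them.

```python
import requests

# Assumed base URL of the public DraCor API.
BASE = "https://dracor.org/api/v1"

# List the available corpora (e.g. French, German, Russian drama).
corpora = requests.get(f"{BASE}/corpora", timeout=30).json()
for corpus in corpora:
    print(corpus["name"], corpus["title"])

# Fetch one corpus, including metadata on its plays.
ger = requests.get(f"{BASE}/corpora/ger", timeout=30).json()
print(len(ger["plays"]), "plays in the German corpus")
```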

The afternoon session started off with Valentina Pasqual offering us a breath of fresh air from all the large-scale projects and industrial-strength ontology building with a very focused and tangible case study in “ODI and BACODI: a study on Destini incrociati by Italo Calvino with Semantic Web Technologies” (coauthored with Enrica Bruno and Francesca Tomasi). Il castello dei destini incrociati (The Castle of Crossed Destinies) is “Calvino’s most rigorous work in combinatorial literature”, which utilizes the cards of the Pierpont Morgan Bergamo version of the Visconti-Sforza Tarot on one level to create the structure of the novel, and on another as an actual means of communication between the characters in the novel itself. The cards and the contents of the novel (characters, objects, events, places, etc.) were meticulously modeled – see the excellent documentation here – to enable a computational narratological analysis of the work and its constituent parts. The ODI and BACODI project website allows us to browse the contents of the book according to the stories (which are also presented with network visualizations), the cards and the assigned meanings; all project data is also available on GitHub.

In the following presentation, “The World Literature Knowledge Graph: a resource for studying the underrepresentation of non-Western people”, Marco Stranisci took us on a tour of a very special project, namely the World Literature Knowledge Graph, which aims not only to build a knowledge graph, but also to identify and rectify important omissions and biases in central resources like Wikidata and Wikipedia in relation to the representation of non-Western writers and writers belonging to ethnic minorities. The project consists of six main blocks. 1) Modelling writers and their works building on existing ontologies, such as DOLCE, FRBR, PROV-O and Wikidata. 2) Gathering data from various resources and consolidating it into the World Literature Knowledge Graph. 3) A visualization system that enables browsing the knowledge graph, through which 4) users can provide feedback on further resources that should be integrated to make the coverage more complete. 5) An approach coupling biographical event detection with entity detection, developed and employed to further populate the knowledge graph with information on underrepresented authors; incidentally, this work has just won an Outstanding Paper Award at the Association for Computational Linguistics 2023 conference. 6) As a final step, a fair recommendation system will also be implemented.

Although the potential of using various machine learning and further AI technologies to enhance projects had been touched on in a number of talks up to this point, Inès Blin‘s presentation “Leveraging structured representations for narratives” flipped the question completely around: how can we enhance our models and knowledge graphs to better capture narratives, in order to help build better and more human-centric AI systems? Her work at Sony Computer Science Laboratories – Paris, in cooperation with the Learning & Reasoning group at the Vrije Universiteit Amsterdam, focuses on how to automate the building of narratives from knowledge graphs. In previous work, utilising data from Wikidata and Wikipedia and drawing on the Simple Event Model, Inès Blin built a prototype knowledge graph representation of the French Revolution. This prototype was then used as a testbed to refine graph traversal strategies for automatically extracting events and combining them into narratives (a toy version of this idea is sketched below). In order to allow for the application of this approach at scale, her current work explores the automatic aggregation of narrative elements from various knowledge graphs (e.g. Wikidata, Wikipedia and DBpedia). For this, a large number of conceptual frameworks and ontologies for describing narratives were examined and then combined. The new ontology developed by Inès Blin uses DUL as its base and is further enriched with elements from FARO, F, PROV and NIF. One of the main challenges for automatically extracting narratives from the aggregated data is the correct identification of the most appropriate frame for each event.
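As a toy illustration of the underlying idea – and emphatically not Inès Blin’s actual method – one can imagine narrative extraction as traversing a knowledge graph from a seed entity, collecting event nodes and ordering them chronologically. Everything in the sketch below (the graph, node names and dates) is invented for the example.

```python
from collections import deque

# Toy knowledge graph: node -> (type, date, neighbours). All hypothetical.
KG = {
    "FrenchRevolution": ("topic", None, ["Estates1789", "Bastille1789"]),
    "Estates1789": ("event", "1789-05-05", ["Bastille1789"]),
    "Bastille1789": ("event", "1789-07-14", ["Monarchy1792"]),
    "Monarchy1792": ("event", "1792-09-21", []),
}

def extract_narrative(seed: str) -> list[str]:
    """Breadth-first traversal from a seed node, collecting events by date."""
    seen, events, queue = {seed}, [], deque([seed])
    while queue:
        node = queue.popleft()
        node_type, date, neighbours = KG[node]
        if node_type == "event":
            events.append((date, node))
        for n in neighbours:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    # Chronological order stands in for a real narrative-ordering strategy.
    return [name for _, name in sorted(events)]

print(extract_narrative("FrenchRevolution"))
```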

The last presentation of the day was by our colleagues Magnus Pfeffer and Zoltan Kacsuk, “Creating a unified ontology for the Japanese Visual Media Graph”, in which they discussed the challenges posed by working with pre-existing heterogeneous databases and the pragmatic approach of modelling the data bottom-up with a view towards the actual use cases of researchers working with the data. The presentation slides can be found below.

On the second day, as the scaffolding fell away from the great edifice of the GOLEM, we got a first glimpse of the master plan for which all of day one had just been laying the groundwork. In their presentation “First steps of the GOLEM ontology for narrative and fiction”, Federico Pianzola and Xiaoyan Yang unveiled the ontological circuit boards that will help animate the GOLEM. The two central questions of the GOLEM project are: “how do cultural traits spread, become successful, disappear”; and “which narrative strategies have a stronger impact on readers?” The project studies large-scale fanfiction communities in a number of different languages and their corpora of works to answer these questions. The GOLEM project therefore needs to create an ontology that can accommodate the metadata from the various fanfiction and online reading platforms being examined and appropriately describe concepts for fiction and narrative, characters and their traits, and textual and reader response features. The ontology building is done via a bottom-up approach taking the fanfiction databases as a starting point, while also trying to integrate a number of perspectives from CIDOC-CRM, CRMsoc, CLS INFRA, LRMoo (FRBRoo), NIF and Schema.org. So far the GOLEM team have decided on Schema.org as the best way of interconnecting the various parts of the GOLEM ontology (e.g. metadata on metrics, content, publication information, characters and so on), which can then be fleshed out with more detailed ontologies borrowing from, for example, CIDOC-CRM. As an example of this latter undertaking, Federico Pianzola and Xiaoyan Yang showed us how fictional characters, in both their original appearance and their fanfiction versions, could be modeled in CIDOC-CRM, with the “Character Abstraction” (E28 Conceptual Object) connecting the “Original Character” and “Fanfic Character” that are realized on the “Character Concept” (E89 Propositional Object) and “Character Expression” (E73 Information Object) levels; a rough sketch of this structure follows below.
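Here is a rough Turtle-producing sketch of that character model as we understood it from the slides. The CIDOC-CRM class IRIs are the standard ones, but the instance names and the linking properties (ex:connects, ex:isRealisedAs, ex:isExpressedAs) are hypothetical stand-ins for modelling decisions that are still being worked out in the GOLEM project.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")
EX = Namespace("https://example.org/golem/")  # hypothetical namespace

g = Graph()
g.bind("crm", CRM)
g.bind("ex", EX)

# The abstraction ties the canonical and fan-created versions together.
g.add((EX.HarryPotter_Abstraction, RDF.type, CRM.E28_Conceptual_Object))
g.add((EX.HarryPotter_Abstraction, EX.connects, EX.HarryPotter_Original))
g.add((EX.HarryPotter_Abstraction, EX.connects, EX.HarryPotter_Fanfic))

# Each version is realized on a conceptual and an expression level.
for character in (EX.HarryPotter_Original, EX.HarryPotter_Fanfic):
    concept = URIRef(str(character) + "_Concept")
    expression = URIRef(str(character) + "_Expression")
    g.add((concept, RDF.type, CRM.E89_Propositional_Object))
    g.add((expression, RDF.type, CRM.E73_Information_Object))
    g.add((character, EX.isRealisedAs, concept))
    g.add((concept, EX.isExpressedAs, expression))

print(g.serialize(format="turtle"))
```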

Luca Scotti‘s “Alignment and harmonisation: Mapping ontologies for narrative and fiction”, touching on many of the ontologies discussed, was the perfect final presentation for the workshop: following the crescendo of the roar and thunder of the GOLEM master plan, we were treated to a coda of hope looking into the promise that the future of ontology alignment holds for all of us working on modelling works of fiction and their contents. After introducing the basics of ontology mapping and the three main types of relationships that can be established between ontologies (harmonisation, alignment and extension), Luca Scotti outlined the two ontology sets that needed to be aligned: 1) the LRMoo model (as an extension of CIDOC-CRM) with its Work-Expression-Manifestation-Item hierarchy, and 2) the BFO extended by the Literary Theme Ontology and the Literary Character Ontology. Drawing on prior work on the possible alignment of BFO and CIDOC-CRM, he outlined a proposal for interconnecting all the relevant ontology elements, which has at its center the triad of BFO: Generically Dependent Continuant, CRM: Conceptual Object and Information Artifact Ontology: Information Content Entity, with the latter two connected by an owl:sameAs relationship and both placed hierarchically under the BFO: Generically Dependent Continuant (a compact sketch of this triad follows below).
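For completeness, here is a compact sketch of that central triad in RDF. The BFO and IAO IRIs are the standard OBO ones; the owl:sameAs link between the two class-level elements is as reported in the talk (a deliberately strong commitment in OWL terms), and we render “hierarchically under” as rdfs:subClassOf, which is our assumption.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDFS

OBO = Namespace("http://purl.obolibrary.org/obo/")
CRM = Namespace("http://www.cidoc-crm.org/cidoc-crm/")

g = Graph()
g.bind("obo", OBO)
g.bind("crm", CRM)

gdc = OBO.BFO_0000031  # BFO: generically dependent continuant
ice = OBO.IAO_0000030  # IAO: information content entity
co = CRM.E28_Conceptual_Object  # CRM: conceptual object

# The two class-level elements are identified with each other...
g.add((co, OWL.sameAs, ice))
# ...and both sit hierarchically under the BFO GDC.
g.add((co, RDFS.subClassOf, gdc))
g.add((ice, RDFS.subClassOf, gdc))

print(g.serialize(format="turtle"))
```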

Following every presentation, as well as in closing after all the presentations, we had the most hands-on and workshop-like discussion sessions we have ever witnessed. Ideas were shared, ontologies doctored, duct tape applied to hold pipelines together, and all in all lots of inspiration and potential solutions were going around for all parties concerned. For us at the JVMG project, one of the exciting aspects of the discussion was the way the GOLEM: Character Abstraction aligns with the JVMG: (Meta-)Character, and the GOLEM “Original Character” and “Fanfic Character” line up with the JVMG: Realized Character.

The best way to sum up the experience of participating in the Ontologies for Narrative and Fiction workshop is by adapting a tweet from Federico Pianzola cited on the opening slides of Ingo Börner’s presentation: two days of the ONF workshop saved us who knows how much work! It is very rare that one leaves a workshop not just with new ideas and a sense of reinvigoration, but also with a long list of actual answers to very pressing issues in the field.

In this spirit, we would like to thank Federico Pianzola and the GOLEM project for organizing this workshop, as well as all the participants for their invaluable contributions, and we very much look forward to continuing to work together on the array of different questions and points of connection that came up at the workshop.

Pfeffer_Kacsuk_Groningen_2023_slides