Tiny Use Case 2: Can we test one of the points from Hiroki Azuma’s “Otaku: Japan’s Database Animals” with the JVMG database? Part 4: Questions of validity and the theoretical implications of our results

It has been quite a journey getting to this fourth part in our series on Tiny Use Case 2. We started out by introducing Hiroki Azuma’s discourse-defining work, Otaku: Japan’s Database Animals, and picking out a claim that would be worth examining on the JVMG database. Next we introduced the two datasets we were employing for our analysis, The Visual Novel Database (VNDB) and Anime Characters Database (ACDB), and examined some key descriptive statistics. Finally, in part three we employed the toolkit of regression analysis to see whether our two hypotheses are confirmed or contradicted by the data at our disposal. Our hypotheses were:

Tiny Use Case 2: Can we test one of the points from Hiroki Azuma’s “Otaku: Japan’s Database Animals” with the JVMG database? Part 3: Regression analysis

In the first part of this series we introduced Hiroki Azuma’s seminal book Otaku: Japan’s Database Animals, and identified the point we are testing on the JVMG database (“many of the otaku characters created in recent years are connected to many characters across individual works” (p. 49)). In part two we discussed the two datasets we are working with, The Visual Novel Database (VNDB) and Anime Characters Database (ACDB), and the operationalization of our concepts on these datasets. Furthermore, we examined some key descriptive statistics, and based on what we saw, we reformulated our initial two hypotheses to be the following:

Tiny Use Case 2: Can we test one of the points from Hiroki Azuma’s “Otaku: Japan’s Database Animals” with the JVMG database? Part 2: Descriptive statistics

In the first part of this series we introduced Hiroki Azuma’s seminal book Otaku: Japan’s Database Animals, and identified a point to try and test on the JVMG database, namely that “many of the otaku characters created in recent years are connected to many characters across individual works” (p. 49). This led to the formulation of the following two hypotheses.

Tiny Use Case 2: Can we test one of the points from Hiroki Azuma’s “Otaku: Japan’s Database Animals” with the JVMG database? Part 1: Formulating a hypothesis

Hiroki Azuma’s Dōbutsu ka suru posutomodan: otaku kara mita nihon shakai (Animalizing postmodern: Japanese society as seen from otaku), published by Kōdansha in 2001, has been one of the most influential treatises on not only Japanese otaku (the word roughly translates to avid fans of anime, manga, games, etc., similar in meaning to geek in the English language domain), but also on the production and consumption paradigm defining Japanese anime, manga, light novels and games in late modernity. The book’s impact on the discourse around otaku and the domains just enumerated is truly international, thanks in part to the English translation, which was published in 2009 as Otaku: Japan’s database animals (introduction and translation by Jonathan E. Abel & Shion Kono, University of Minnesota Press; all quotes in the following are from this English edition).

Tiny Use Case 1 Part III: Eyes Wide Open – Tareme and Tsurime as predictors of character demeanor

Part I of this blogpost left us with the question of whether there is a specificity to visual novel game characters. Part II concluded with an invitation to compare two specific design elements, tareme and tsurime, in light of the player’s position during the gaming experience and the data available on the VNDB repository. In this third and final part we will summarize where the analysis of data pertaining to tareme and tsurime leads us.

First, we need to remind ourselves that the exchange of gazes between the player and the character is one of the defining elements of a visual novel game’s experience. A visual novel game is played in a first-person perspective: the prose is written in the first person and character sprites are generally depicted as looking at the player.

This digression has been necessary to highlight the importance that the first person and the gaze have in generating the experience, and in turn re-highlight the potential importance of eyes in the construction of the characters. This brings us once more to tareme and tsurime and what kind of demeanor they communicate. According to their description on VNDB.org, tareme suggests a gentler and caring demeanor, as opposed to tsurime, which suggests a demeanor that is more distant and generally non-friendly.

Tiny Use Case 1 Part II: Data from vndb.org and the Player’s position

At the end of part I of this blogpost, we were asking ourselves whether we could use data from The Visual Novel Database to further our investigation into visual novel game characters. First, let us look at the numbers of vndb.org: the site catalogues over 91240 visual novel characters via a system of 2140 traits. These characters come from a grand total of over 27951 distinct visual novel game titles. The Visual Novel Database’s trait system is a rich apparatus with which fans can catalogue characters in visual novels on the basis of specific categories. There are trait trees pertaining to a character’s hair, a character’s eyes, their body shape, their clothes, personal items, personality, their role in the game’s narrative, what they do and what is done to them, with separate trees for sexual activity.

To employ such a system in a meaningful manner, we first decided to consider the position of the player as they play through a visual novel game. During the course of the game, the player is (usually) first introduced to each of the game’s characters, and then presented with the first of many choices to steer the gameplay experience towards one character or another. Intimacy is gradually built through discovery of a character’s personal narrative, which articulates conventional design elements known to fans and producers into the game’s specific narrative context. By knowing the character more and more, the player can make decisions that are more in accord with a specific character.

Tiny Use Case 1 Part I: Investigating Japanese Visual Novel Characters

The first Tiny Use Case undertaken within the JVMG was about Japanese visual novel games and their characters. Japanese visual novel games are prose-heavy interactive experiences whose main goal is to win the affections of one or more characters. Visual novel games feature situations and interactions typical of Japanese anime and manga, which the player navigates by choosing which path to take through the narrative at specific points. These points are presented as choices between multiple options, each of which will steer the player towards one character or another, or even towards a failure state. The player progresses towards a character’s affection through a series of narrative events, until physical intimacy is reached. When intimacy gets physical, it is usually represented in pornographic fashion, with situations typical of pornographic manga and anime.

Japanese visual novel games present an interesting research object in the form of character design elements. Within visual novel works, we can observe usage of moe, shōjo manga and BL (Boys Love) aesthetics as an integral part of the gamic experience. In particular, character eyes and gazes are central to the depiction of intimacy between characters. Do the eyes of visual novel characters code some patterned ways of relating to them? How can we employ the JVMG data gathering efforts to garner insights into the characters of visual novel games, especially regarding character eyes and gaze? Can we test this against knowledge from both the researchers and scholars in the field?

Working with the Tiny Use Case workflow methodology in the JVMG project

Following the success of our project launching workshop in July 2019, the work on processing community databases started in earnest (you can read about the technical details of the process in relation to ontology creation and data transformation). By November 2019, we were ready to start examining the data and our infrastructure through the lens of exploratory research.

We decided to adopt the Tiny Use Case workflow methodology to have a number of short-term research projects that would be substantial enough to generate meaningful and interesting results in their own right, but would be compact enough to provide an ongoing stream of feedback on issues with the database, the project infrastructure, and researcher needs. Since each Tiny Use Case is only 3-4 months long, it provides us with an excellent tool for assessing our progress and for uncovering newer issues, as each TUC has a different focus and somewhat different requirements.

Data quality and ground truth

After taking a six-month break due to an internship, I restarted my work as a student assistant for the Japanese Visual Media Graph project in April 2020.

Currently, my main occupation is in the field of data quality control. After getting lots of data from different fan communities, the quality of said data needs to be checked against other sources to make sure there aren’t any errors adopted into the project’s database. To get started with this task, it was decided to first check several small data samples from different providers to enable an easier determination of the duration, effort, and expectable problems and results that a wider data quality control would entail. 

I received the first two data samples from two different fan sites, both containing twenty entries of anime with several properties for me to check; those properties were, for example, the Japanese and English titles of the anime, the producing studio’s name in Japanese and English, the release date of the first episode, and the overall episode count or the completeness of a series. The properties in the samples depended on the usage of properties by the fan communities the data came from, and my task was to check if the entries were correct or if they contained errors. To prove something, I had to find a source of ground truth for it, which would occasionally prove to be something of a challenge. Of course, the anime on DVD would actually be the best source of ground truth, but since the resources for this simply didn’t exist, I relied on other sources. A valid source of ground truth would, for example, be the opening or ending sequence of an anime, preferably found on YouTube or a legal streaming source like Netflix or Crunchyroll. An image of the DVD case of an anime would also be a usable source of ground truth.

I worked with the data in an Excel sheet, marking the correctness of the respective properties accordingly and adding screenshots and links to my sources of ground truth.

Excel screenshot of the first data sample

I encountered a noticeable difference in finding ground truth for the different properties. Finding proof for the Japanese or English anime titles almost never posed a problem; they could usually be found in the opening sequences or on DVD cases. The year of first release could also usually be seen in the opening or ending sequences. The exact date, however, was sometimes quite difficult to prove. While I tried to use the Japanese Amazon Prime at first, it proved to be not reliable enough. Most of the time I could only return to the Media Arts Database to find proof for an exact date of release.

The names of the producing studios could usually be found easily inside the opening and ending sequence of the respective anime; however, I sometimes encountered the problem that the studio didn’t write its own name in katakana, as it was given in the fan-based data. While I could usually validate that it was indeed the correct studio, the spelling couldn’t be found in any official source of ground truth. I always marked those occurrences as “correct but strange” and left them open for further decisions.

Example of the above-mentioned studio problem

A complicated property was the completeness of an anime. Whether or not this is a usable or provable property remains to be discussed.

After having checked those first two data samples, I can state that there was a lot of correct data, but also errors of different types. How to deal with them is also currently a point of discussion. The next samples from new sources will surely bring even more new experiences and insights.
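The checking procedure described in this post can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet workflow actually used: the field names, the entry identifiers and the rows are hypothetical placeholders, and only the label “correct but strange” is taken from the text; the other two verdict labels are assumed.

```python
from collections import Counter
from dataclasses import dataclass

# "correct but strange" mirrors the label used in the post;
# "correct" and "incorrect" are assumed labels for illustration.
VERDICTS = {"correct", "incorrect", "correct but strange"}


@dataclass
class Check:
    entry: str    # hypothetical identifier of a sample entry
    prop: str     # property that was checked, e.g. "title_ja"
    verdict: str  # one of VERDICTS
    source: str   # where the ground truth was found

    def __post_init__(self):
        # Reject typos in the verdict column early.
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")


def tally(checks):
    """Count verdicts separately for each checked property."""
    counts = {}
    for check in checks:
        counts.setdefault(check.prop, Counter())[check.verdict] += 1
    return counts


# Placeholder rows, not real results from the data samples.
checks = [
    Check("entry-01", "title_ja", "correct", "opening sequence"),
    Check("entry-01", "studio_ja", "correct but strange", "ending sequence"),
    Check("entry-02", "title_ja", "correct", "DVD case image"),
    Check("entry-02", "first_release", "incorrect", "Media Arts Database"),
]

summary = tally(checks)
for prop, counter in summary.items():
    print(prop, dict(counter))
```

Keeping the verdict per property rather than per entry makes it easy to see which properties are hard to verify, which is the kind of difference the post reports between titles, release dates and studio names.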

Turning Fan-Created Data into Linked Data II: Data Transformation

In a previous post, we discussed the creation of a Linked Data ontology that can be used to describe existing fan-created data that the JVMG is working with. For the ontology to work correctly, the data itself must also be converted into a Linked Data format, and so in this post we’ll be discussing the transformation of data, as it’s received from providers, into RDF.

To summarize, our workflow involves using Python and the RDFLib library inside a set of Jupyter notebooks to transform and export the data from all of the data provider partners. Data ingestion is also sometimes done using Python and Jupyter notebooks, but here we’ll just focus on the data transformation.
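To give a rough idea of what such a transformation step involves, here is a minimal sketch that turns one provider record into N-Triples lines. It deliberately uses only the Python standard library so that it runs without dependencies; it is not the project's RDFLib code. With RDFLib, the same triples would instead be added to a `Graph` and written out with its `serialize` method. The namespace, the record fields and the `episodeCount` property are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
BASE = "http://example.org/jvmg/"  # hypothetical namespace, not the project's


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, lang=None, datatype=None):
    """Format a value as an N-Triples literal with basic escaping."""
    text = str(value)
    # Escape the backslash first so later escapes are not doubled.
    for raw, escaped in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, escaped)
    if lang:
        return f'"{text}"@{lang}'
    if datatype:
        return f'"{text}"^^{iri(datatype)}'
    return f'"{text}"'


def record_to_ntriples(record):
    """Map one provider record (a dict) to a list of N-Triples lines."""
    subject = iri(f"{BASE}anime/{record['id']}")
    triples = [
        (subject, iri(RDF_TYPE), iri(BASE + "Anime")),
        (subject, iri(RDFS_LABEL), literal(record["title_en"], lang="en")),
        (subject, iri(RDFS_LABEL), literal(record["title_ja"], lang="ja")),
        (subject, iri(BASE + "episodeCount"),
         literal(record["episodes"], datatype=XSD_INTEGER)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


# Placeholder record, not taken from any provider's data.
record = {"id": "42", "title_en": "Example Title",
          "title_ja": "例のタイトル", "episodes": 12}
triples = record_to_ntriples(record)
print("\n".join(triples))
```

Language-tagged literals keep the Japanese and English titles on the same property, which is one common way to model multilingual labels in RDF; whether the JVMG ontology does this is not stated here.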
