Tiny Use Case 2: Can we test one of the points from Hiroki Azuma’s “Otaku: Japan’s Database Animals” with the JVMG database? Part 4: Questions of validity and the theoretical implications of our results

It has been quite a journey getting to this fourth part in our series on Tiny Use Case 2. We started out by introducing Hiroki Azuma’s discourse-defining work, Otaku: Japan’s Database Animals, and picking out a claim worth examining on the JVMG database. Next, we introduced the two datasets we are employing for our analysis (The Visual Novel Database (VNDB) and Anime Characters Database (ACDB)) and examined some key descriptive statistics. Finally, in part three we employed the toolkit of regression analysis to see whether our two hypotheses were confirmed or contradicted by the data at our disposal. Our hypotheses were:

    1. The portion of new characters with shared traits should increase over time.
    2. The portion of shared traits among new characters should increase over time.

We found that our first hypothesis was not substantiated by our regression analyses. And we found no adequate regression model for testing our second hypothesis.

Now, in this fourth part of the series we will first assess our approach and the validity of our results. Then we will consider what implications our findings could have for Azuma’s original argument and the wider theoretical discourse on anime and manga.

Continue reading

Tiny Use Case 2: Can we test one of the points from Hiroki Azuma’s “Otaku: Japan’s Database Animals” with the JVMG database? Part 3: Regression analysis

In the first part of this series we introduced Hiroki Azuma’s seminal book Otaku: Japan’s Database Animals and identified the point we are testing on the JVMG database: “many of the otaku characters created in recent years are connected to many characters across individual works” (p. 49). In part two we discussed the two datasets we are working with (The Visual Novel Database (VNDB) and Anime Characters Database (ACDB)) and the operationalization of our concepts on these datasets. Furthermore, we examined some key descriptive statistics and, based on what we saw, reformulated our initial two hypotheses as follows:

    1. The portion of new characters with shared traits should increase over time.
    2. The portion of shared traits among new characters should increase over time.

In this third part of the series we will apply the toolkit of regression analysis to better understand the relationships in our data that we saw in part two, and hopefully get closer to testing our hypotheses. Regression analysis revolves around estimating the relationship between the dependent variable (whose observed changes in value we would like to explain) and the independent variables (also called explanatory variables, since we aim to use them to explain the changes in the dependent variable’s values). If there are multiple possible relationships between the independent variables and the dependent variable, for example due to the large number of possible explanatory variables we could include in our model, regression analysis involves comparing the different candidate models and selecting the best-performing one.
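To make this more concrete, here is a minimal sketch in Python of the kind of model comparison described above. It uses statsmodels (an assumption made for illustration) and entirely made-up yearly values for the share of new characters with shared traits, not the actual VNDB or ACDB figures, and compares a linear time trend with a model that adds a quadratic term.

```python
# Minimal sketch of the model comparison idea, not the analysis run on the
# JVMG data: the yearly shares of new characters with shared traits below
# are invented purely for illustration.
import numpy as np
import statsmodels.api as sm

years = np.arange(2000, 2010)
share = np.array([0.41, 0.44, 0.43, 0.47, 0.46, 0.50, 0.49, 0.52, 0.55, 0.54])

# Candidate 1: a simple linear trend over time.
X1 = sm.add_constant(years)
model1 = sm.OLS(share, X1).fit()

# Candidate 2: adds a quadratic term as an additional explanatory variable.
X2 = sm.add_constant(np.column_stack([years, years ** 2]))
model2 = sm.OLS(share, X2).fit()

# Compare the candidate models and keep the better-performing one.
for name, model in [("linear", model1), ("quadratic", model2)]:
    print(f"{name}: adj. R^2 = {model.rsquared_adj:.3f}, AIC = {model.aic:.1f}")
```

Whatever the actual candidate models are, the comparison logic stays the same: fit each candidate and prefer the one with the better fit statistics.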

Continue reading

Tiny Use Case 2: Can we test one of the points from Hiroki Azuma’s “Otaku: Japan’s Database Animals” with the JVMG database? Part 2: Descriptive statistics

In the first part of this series we introduced Hiroki Azuma’s seminal book Otaku: Japan’s Database Animals, and identified a point to try to test on the JVMG database, namely that “many of the otaku characters created in recent years are connected to many characters across individual works” (p. 49). This led to the formulation of the following two hypotheses:

    1. The number of new characters with shared traits should increase over time.
    2. The number of shared traits among new characters should increase over time.

How do we go about actually testing these hypotheses on the available data? Well, we need to be able to somehow assign appearance dates to each character, otherwise we won’t be able to look at changes over time, and we also have to define what “characters with shared traits” means in the context of our data. So let’s take a look at the data we have to work with.

For this TUC we decided to use the data from The Visual Novel Database (VNDB) and Anime Characters Database (ACDB), as both databases have a significant number of characters and a relatively large and detailed set of traits describing them. There are, however, important differences between the two datasets. VNDB focuses only on visual novels, whereas ACDB collects data on a wide range of characters from various media (although predominantly visual novels and anime). Furthermore, VNDB has a very rich and rigorously structured ontology of traits that is nevertheless open to extension by users; however, it lacks a core set of featured traits that could be expected to be available for all characters. In contrast, ACDB features a hybrid system for describing characters, which on the one hand supports a closed ontology of eight flagship traits that are part of each character’s fact sheet, and on the other hand allows free-form tagging of characters with user-created labels.
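As a purely hypothetical illustration of how such trait data can be used to operationalize our first hypothesis, the following Python sketch builds a handful of toy character records (the field names, trait labels, and years are invented, not taken from VNDB or ACDB) and computes the share of characters that have at least one trait in common with another character.

```python
# Toy illustration only: the trait labels, years, and characters below are
# invented and are not taken from VNDB or ACDB.
from collections import Counter

characters = [
    {"year": 2003, "traits": {"hair:black", "role:protagonist", "tag:kuudere"}},
    {"year": 2003, "traits": {"hair:silver", "role:rival"}},
    {"year": 2004, "traits": {"hair:black", "role:protagonist", "tag:tsundere"}},
    {"year": 2004, "traits": {"hair:blonde", "tag:tsundere"}},
]

# A trait counts as "shared" if it occurs on more than one character.
trait_counts = Counter(trait for c in characters for trait in c["traits"])
shared_traits = {trait for trait, count in trait_counts.items() if count > 1}

# Portion of characters that have at least one shared trait.
with_shared = [c for c in characters if c["traits"] & shared_traits]
print(f"Portion of characters with shared traits: {len(with_shared) / len(characters):.2f}")
```

Grouping the same computation by appearance year would then give the kind of time series our hypotheses are about.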

Continue reading

Tiny Use Case 2: Can we test one of the points from Hiroki Azuma’s “Otaku: Japan’s Database Animals” with the JVMG database? Part 1: Formulating a hypothesis

Hiroki Azuma’s Dōbutsu ka suru posutomodan: otaku kara mita nihon shakai (Animalizing postmodern: Japanese society as seen from otaku), published by Kōdansha in 2001, has been one of the most influential treatises not only on Japanese otaku (a word that roughly translates to avid fans of anime, manga, games, etc., similar in meaning to geek in the English-language domain), but also on the production and consumption paradigm defining Japanese anime, manga, light novels and games in late modernity. The book’s impact on the discourse around otaku and the domains just enumerated is truly international, thanks in part to the English translation, published in 2009 as Otaku: Japan’s Database Animals (introduction and translation by Jonathan E. Abel & Shion Kono, University of Minnesota Press; all quotes in the following are from this English edition).

With almost twenty years since the original publication in Japanese and more than ten years since the release of the English translation, the concepts and frameworks outlined by Azuma have become almost taken-for-granted cornerstones of this scholarly discourse. However, even though Azuma’s line of argument contains a number of potentially testable statements, to the best of our knowledge these have so far not been subjected to any large-scale empirical test, as they are most often only invoked in relation to various case studies. The aim of this Tiny Use Case was to identify a testable major point from Azuma’s seminal work and to test it with the help of the database assembled within the framework of the JVMG project.

Continue reading

Working with the Tiny Use Case workflow methodology in the JVMG project

Following the success of our project launch workshop in July 2019, work on processing the community databases started in earnest (you can read about the technical details of the process in relation to ontology creation and data transformation). By November 2019, we were ready to start examining the data and our infrastructure through the lens of exploratory research.

We decided to adopt the Tiny Use Case workflow methodology to have a number of short-term research projects that would be substantial enough to generate meaningful and interesting results in their own right, but compact enough to provide an ongoing stream of feedback on issues with the database, the project infrastructure, and researcher needs. Since each Tiny Use Case is only 3-4 months long, it provides us with an excellent tool for assessing our progress and for uncovering new issues, as each TUC has a different focus and somewhat different requirements.

Continue reading

Data quality and ground truth

After taking a six-month break due to an internship, I restarted my work as a student assistant for the Japanese Visual Media Graph project in April 2020.

Currently, my main occupation is data quality control. After receiving lots of data from different fan communities, the quality of that data needs to be checked against other sources to make sure no errors are adopted into the project’s database. To get started with this task, it was decided to first check several small data samples from different providers, in order to get a better sense of the duration, the effort, and the kinds of problems and results that a wider data quality check would entail.

I received the first two data samples from two different fan sites, both containing twenty anime entries with several properties for me to check. Those properties were, for example, the Japanese and English titles of the anime, the producing studio’s name in Japanese and English, the release date of the first episode, the overall episode count, and the completeness of a series. The properties in the samples depended on which properties were used by the fan communities the data came from, and my task was to check whether the entries were correct or contained errors. To verify a property, I had to find a source of ground truth for it, which occasionally proved to be a challenge. Of course, the anime on DVD would be the best source of ground truth, but since those resources were not available, I relied on other sources. A valid source of ground truth would, for example, be the opening or ending sequence of an anime, preferably found on YouTube or a legal streaming source like Netflix or Crunchyroll. An image of the DVD case of an anime would also be a usable source of ground truth.

I worked with the data in an Excel sheet, marking the correctness of the respective properties and adding screenshots and links to my sources of ground truth.

Excel screenshot of the first data sample

I encountered a noticeable difference in how easy it was to find ground truth for the different properties. Finding proof for the Japanese or English anime titles almost never posed a problem; they could usually be found in the opening sequences or on DVD cases. The year of first release could also usually be seen in the opening or ending sequences. The exact date, however, was sometimes quite difficult to prove. I tried to use the Japanese Amazon Prime at first, but it proved not to be reliable enough. Most of the time I had to fall back on the Media Arts Database to find proof of an exact release date.

The names of the producing studios could usually be found easily in the opening and ending sequences of the respective anime; however, I sometimes encountered the problem that the studio did not write its own name in katakana the way it was given in the fan-provided data. While I could usually validate that it was indeed the correct studio, that spelling couldn’t be found in any official source of ground truth. I always marked those occurrences as “correct but strange” and left them open for further decisions.

Example of the studio problem mentioned above

A complicated property was the completeness of an anime. Whether or not this is a usable or provable property remains to be discussed.

After having checked those first two data samples, I can state that there was a lot of correct data, but also errors of different types. How to deal with them is also currently a point of discussion. The next samples from new sources will surely bring even more new experiences and insights.

Turning Fan-Created Data into Linked Data II: Data Transformation

In a previous post, we discussed the creation of a Linked Data ontology that can be used to describe existing fan-created data that the JVMG is working with. For the ontology to work correctly, the data itself must also be converted into a Linked Data format, and so in this post we’ll be discussing the transformation of data, as it’s received from providers, into RDF.

To summarize, our workflow involves using Python and the RDFLib library inside a set of Jupyter notebooks to transform and export the data from all of the data provider partners. Data ingestion is also sometimes done using Python and Jupyter notebooks, but here we’ll just focus on the data transformation.
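As a rough sketch of what such a transformation step can look like, the snippet below uses RDFLib to turn a single, invented record into RDF triples. The namespace, property names, and record contents are placeholders for illustration; the actual notebooks depend on the ontology created for each individual provider.

```python
# A minimal, illustrative transformation: the namespace, property names and
# the record itself are invented, not the project's actual notebook code.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

JVMG = Namespace("http://example.org/jvmg/")

# A stand-in for one record as it might arrive from a provider (e.g. as JSON).
record = {"id": "12345", "name": "Example Character", "hair_color": "black"}

g = Graph()
g.bind("jvmg", JVMG)

subject = JVMG[f"character/{record['id']}"]
g.add((subject, RDF.type, JVMG.Character))
g.add((subject, JVMG.name, Literal(record["name"])))
g.add((subject, JVMG.hairColor, Literal(record["hair_color"])))

# Serialize as Turtle so the generated triples can be inspected directly.
print(g.serialize(format="turtle"))
```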

Continue reading “Turning Fan-Created Data into Linked Data II: Data Transformation”

Turning Fan-Created Data into Linked Data I: Ontology Creation

One of the primary functions of the JVMG project is to enable researchers to work with existing data in ways that are not readily enabled by the data providers themselves. One way in which we are attempting to facilitate this flexible data work is through the use of Linked Data. As we are working with a diverse set of data providers, the ways in which they create, store, and serve data are similarly diverse. Some of these providers are MediaWiki pages, with data being available as JSON through the use of an API, while others are closer to searchable databases, with data existing as SQL and being offered in large data dumps. 

What remains constant across these data providers is our general data workflow: the data must be accessed in some way, analyzed so that a suitable ontology can be created to represent it, transformed into a Linked Data format (in our case RDF), and finally made available so that researchers can work with it. To give readers an idea of what this workflow looks like and how the data we work with is altered to meet the needs of researchers, we’ll be going over a couple of these steps in separate blog posts. Here, we’ll talk about the creation of the ontology based on how data providers describe their own data, and in a follow-up post, we’ll talk about some technical aspects of data transformation.
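To give a flavor of what the output of the ontology creation step can look like, here is a small RDFLib sketch that declares one class and one datatype property using OWL and RDFS terms. The names are invented for illustration and are not taken from any actual JVMG data provider or from the ontology discussed in the post itself.

```python
# Illustrative sketch only: the class and property names below are invented
# and do not come from any actual JVMG data provider.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

JVMG = Namespace("http://example.org/jvmg/ontology/")

g = Graph()
g.bind("jvmg", JVMG)
g.bind("owl", OWL)

# A class for one of the entity types a provider describes...
g.add((JVMG.Character, RDF.type, OWL.Class))
g.add((JVMG.Character, RDFS.label, Literal("Character", lang="en")))

# ...and a datatype property mirroring one of the fields the provider uses.
g.add((JVMG.hairColor, RDF.type, OWL.DatatypeProperty))
g.add((JVMG.hairColor, RDFS.label, Literal("hair color", lang="en")))
g.add((JVMG.hairColor, RDFS.domain, JVMG.Character))
g.add((JVMG.hairColor, RDFS.range, RDFS.Literal))

print(g.serialize(format="turtle"))
```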

Continue reading “Turning Fan-Created Data into Linked Data I: Ontology Creation”

What is a Tiny Use Case?

The term Tiny Use Case, or TUC for short, was coined by the diggr (Databased Infrastructure for Global Games Culture Research) research project team. A detailed description of this workflow methodology can be found in their paper With small steps to the big picture: A method and tool negotiation workflow (Freybe, Rämisch and Hoffmann 2019).

Taking inspiration from agile software development principles, the diggr team created the Tiny Use Case workflow to handle the needs of a complex research project that required meshing expertise from very different disciplinary backgrounds and involved a high level of uncertainty regarding the types of challenges that would emerge in the course of the project. By working through a series of Tiny Use Cases, each three to four months long, the team was able to leverage the same cycle of continuous incremental innovation and assessment that is one of the main strengths of agile approaches.

Continue reading

Presence at upcoming conferences and workshops

UPDATE 2020-03-14: Due to the situation regarding COVID-19, both of our upcoming conference appearances have been postponed. The Mechademia Conference will take place next year, and the Building Bridges Symposium will be held on an as-yet-undecided future date.

We will be introducing our first results at the following upcoming conferences and workshops. If you are interested in talking with a team member about our project, please feel free to contact us.