The process of designing characters for a visual novel game relies on shared conventions for drawing character clothes, hairstyles and accessories, for articulating character demeanor (through visual and other cues), and more. In some cases, certain character types are conventionally depicted with certain visually recognizable traits. For example, a character’s hair could be drawn so that it sports a strand of hair which moves according to the character’s mood. This is called an ‘ahoge’ (idiot hair), and signifies a correspondingly whimsical personality. Another character might treat their love interest coldly while secretly harboring affection for them, struggling with the contradiction. This is a ‘tsundere’ demeanor, which does not necessarily have a corresponding outward visual trait to signify it. Ahoge and tsundere are two of hundreds of templates for character design, which combine to shape a character’s identity.
Is the usage of these templates subject to recurring practices across the production of visual novel games? Can we investigate them via fan-curated data? We can think of the usage of templates in character design as a group of traits that occur together. Characters sporting the same templates can in turn be seen as a pattern in character creation practices. We can also think of ensembles of templates recurring in the same fashion across characters as potential archetypes for character creation practices. The vastness of fan-curated databases offers us an avenue for answering our questions.
On The Visual Novel Database (VNDB), which has kindly offered its dataset to the JVMG project, characters are described through a system of character traits. Traits detail various aspects of a character such as hair style, hair color, eye color, profession, what they do within the game’s plot and more. Traits are not present in an isolated fashion: character entries tend to feature multiple traits, allowing for the identification of recurring patterns of trait co-occurrences. VNDB’s data model is thus particularly suited to further our investigation into recurring practices of character design in the field of visual novel games.
The VNDB character trait dataset can be represented as a graph: a set of objects (traits) related to each other by how often they occur together (co-occurrence) on characters. The network visualization software Gephi can visualize our dataset as a network diagram, turning character traits into nodes and trait co-occurrence into the edges that connect nodes to other nodes.
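To make the graph representation concrete, here is a minimal sketch of how trait co-occurrence could be counted before handing the result to a tool such as Gephi. The character names and trait sets are hypothetical toy data, not actual VNDB records, and the function name `cooccurrence_edges` is our own. It is an illustration under those assumptions, not the pipeline used for this post.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: character -> set of traits (not real VNDB entries).
characters = {
    "char_a": {"Short Hair", "Brown Hair", "Necktie"},
    "char_b": {"Short Hair", "Brown Hair", "Shirt"},
    "char_c": {"Long Hair", "Blond Hair", "Blue Eyes"},
    "char_d": {"Long Hair", "Blue Eyes", "Shirt"},
}


def cooccurrence_edges(characters):
    """Count, for every unordered pair of traits, how many characters carry both."""
    weights = Counter()
    for traits in characters.values():
        # Sorting makes sure each pair is always stored in the same order.
        for a, b in combinations(sorted(traits), 2):
            weights[(a, b)] += 1
    return weights


edges = cooccurrence_edges(characters)

# Each entry is one weighted edge: two trait nodes and their co-occurrence count.
for (a, b), weight in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{a} -- {b}: {weight}")
```

Each key of `edges` is an edge between two trait nodes, and its value is the edge weight, that is, the number of characters on which both traits appear.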
Beyond visualization, Gephi provides us with an array of data analysis tools which allow us to further our investigation into visual novel games. A graph can be measured in several ways, all of them productive towards analyzing the structure of our dataset. One measure that proved particularly useful to us is modularity. Modularity measures how well a graph divides into communities of nodes, on the basis of connection strength between nodes: nodes within a community are more strongly connected to each other than to the rest of the graph. Gephi’s modularity algorithm uses this measure to derive communities from our graph. Modularity allowed us to structure the connections existing between groups of traits in our dataset as subnetworks. By looking at the type of nodes forming each community, we were able to gather insights on practices of character design as catalogued on The Visual Novel Database.
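As a rough illustration of what a modularity score expresses, the sketch below computes the standard (Newman) modularity value Q for a partition chosen by hand. It does not reproduce Gephi’s community search, which looks for a partition with a high score on its own. The toy graph, the community labels and the function name `modularity` are all assumptions made for the example.

```python
def modularity(edges, community):
    """Newman's modularity Q for a weighted, undirected graph.

    edges: dict mapping (node_a, node_b) -> weight, each pair listed once.
    community: dict mapping node -> community label.
    """
    total = sum(edges.values())  # m: total edge weight in the graph
    strength = {}                # weighted degree of each node
    inside = {}                  # edge weight falling inside each community
    for (a, b), w in edges.items():
        strength[a] = strength.get(a, 0) + w
        strength[b] = strength.get(b, 0) + w
        if community[a] == community[b]:
            inside[community[a]] = inside.get(community[a], 0) + w
    degree_sum = {}              # summed node strength per community
    for node, k in strength.items():
        label = community[node]
        degree_sum[label] = degree_sum.get(label, 0) + k
    # Q = sum over communities of: inside_c / m - (degree_sum_c / 2m)^2
    return sum(
        inside.get(label, 0) / total - (d / (2 * total)) ** 2
        for label, d in degree_sum.items()
    )


# Hypothetical toy graph: two tight triangles of traits joined by a single edge.
toy_edges = {
    ("Short Hair", "Brown Hair"): 1,
    ("Brown Hair", "Necktie"): 1,
    ("Short Hair", "Necktie"): 1,
    ("Long Hair", "Blond Hair"): 1,
    ("Blond Hair", "Blue Eyes"): 1,
    ("Long Hair", "Blue Eyes"): 1,
    ("Necktie", "Long Hair"): 1,
}
split = {
    "Short Hair": "A", "Brown Hair": "A", "Necktie": "A",
    "Long Hair": "B", "Blond Hair": "B", "Blue Eyes": "B",
}
single = {node: "A" for node in split}

q_split = modularity(toy_edges, split)
q_single = modularity(toy_edges, single)
print(q_split, q_single)
```

On this toy graph, splitting the two triangles into separate communities scores 5/14 (about 0.357), while lumping every trait into one community scores 0. A higher score indicates a partition whose communities are more internally connected than chance alone would suggest.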
Character design practices in visual novel games possess an apparent tendency towards aesthetic self-sameness. If practices of character design were actually divergent from the established system of design templates, trait distribution in our dataset would tend towards randomness. A random distribution would be reflected in the modularity of our dataset. On the other hand, a non-random distribution would also be reflected in the communities derived from our dataset. Note that random trait distribution would not necessarily imply that characters are designed in random fashion. It would rather mean that each visual novel game’s development process is a distinctly unique effort. In that case, actual recourse to the common baseline of character templates would prove to be quite limited.
Another group of measures which provides useful insights into our dataset is centrality. Centrality refers to the measure of a node’s importance within a network on the basis of a given attribute. It can be used to measure which nodes act as intermediate points between two other nodes (betweenness centrality), which nodes possess the highest number of connections (degree centrality), which nodes possess the fewest degrees of separation from all other nodes (closeness centrality), and which nodes are most connected to other highly connected nodes (eigenvector centrality or prestige score). For the present analysis, eigenvector centrality allowed us to measure which traits provide the most insights on each community.
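The difference between counting connections and weighing them by the importance of the neighbors can be sketched in a few lines. The example below computes degree centrality and an eigenvector centrality by power iteration on a small hypothetical trait graph. The scaling (highest score equal to 1) and the iteration settings are assumptions chosen for readability and may differ from what Gephi does internally.

```python
def adjacency(edges):
    """Turn an undirected edge dict {(a, b): weight} into neighbor maps."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def degree_centrality(adj):
    """Number of distinct neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def eigenvector_centrality(adj, iterations=100, tol=1e-9):
    """Power iteration: a node scores high when its neighbors score high."""
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # Keeping the old score in the sum (x + A x) avoids oscillation
        # on some graphs without changing the final ranking.
        new = {
            n: score[n] + sum(w * score[m] for m, w in adj[n].items())
            for n in adj
        }
        top = max(new.values())
        new = {n: value / top for n, value in new.items()}
        if max(abs(new[n] - score[n]) for n in adj) < tol:
            return new
        score = new
    return score


# Hypothetical toy graph of trait co-occurrences (all weights set to 1).
toy_edges = {
    ("Short Hair", "Brown Hair"): 1,
    ("Short Hair", "Shirt"): 1,
    ("Short Hair", "Necktie"): 1,
    ("Brown Hair", "Shirt"): 1,
    ("Necktie", "Red Eyes"): 1,
}
adj = adjacency(toy_edges)
degree = degree_centrality(adj)
eigen = eigenvector_centrality(adj)

for trait in sorted(eigen, key=eigen.get, reverse=True):
    print(f"{trait}: degree={degree[trait]}, eigenvector={eigen[trait]:.3f}")
```

In this toy graph ‘Brown Hair’, ‘Shirt’ and ‘Necktie’ all have two connections, yet ‘Necktie’ receives a lower eigenvector score because one of its two neighbors (‘Red Eyes’) is itself poorly connected.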
By measuring which nodes are the most connected to other highly connected nodes, we can gain a very specific insight into our dataset: which groups of nodes constitute the cores of our networks. In other words, we can further map the internal connections of our dataset to find whether there are groups of traits which could be considered archetypical in character creation or recognized as such during data compilation. Should these traits also share other commonalities, such as descriptive themes or trait trees, it would further suggest the presence of shared character archetypes, or the recognition of such ensembles by VNDB users.
On the other hand, a network ‘core’ of traits lacking commonality would elicit different considerations, and would ultimately suggest against the presence of archetypes in visual novel game character creation. We therefore decided to run Gephi’s modularity algorithm to identify the major subnetworks in our character traits data, followed by the eigenvector centrality algorithm on each of the subnetworks derived in this way. Running the algorithms at their default settings derived a total of three subnetworks for our dataset. We also decided to check each subnetwork’s top ten traits by eigenvector centrality. The results were as follows:
|Community||10 most representative traits (highest eigenvector centrality in the node community)||Percentage within the dataset|
|1||‘Short Hair’, ‘Brown Hair’, ‘Young Adult’, ‘Black Hair’, ‘Amber Eyes’, ‘Parted to Side’, ‘Brown Eyes’, ‘Shirt’, ‘Red Eyes’, ‘Necktie’.||47.57%|
|2||‘Explicit Trait 1’, ‘Explicit Trait 2’, ‘Explicit Trait 3’, ‘Explicit Trait 4’, ‘Explicit Trait 5’, ‘Explicit Trait 6’, ‘Explicit Trait 7’, ‘Explicit Trait 8’, ‘Explicit Trait 9’, ‘Explicit Trait 10’.||27.3%|
|3||‘Pale Skin’, ‘Slim Body’, ‘Teen (Apparent age)’, ‘Blue Eyes’, ‘Long Hair’, ‘Average Height’, ‘Waist Length+ (Hair)’, ‘Straight Hair’, ‘Blond Hair’, ‘Sidehair (Hair Tail)’.||25.13%|
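A table of this shape can be assembled from two node-level results: a community label per trait and a centrality score per trait. The sketch below shows one way to do so. The labels and scores are hypothetical placeholders rather than values from our dataset, the function name `summarize` is our own, and we assume here that a community’s percentage is its share of all trait nodes.

```python
def summarize(community, score, top_n=10):
    """Return {community: (top traits by score, share of all nodes in %)}."""
    members = {}
    for node, label in community.items():
        members.setdefault(label, []).append(node)
    total = len(community)
    summary = {}
    for label, nodes in members.items():
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(nodes, key=lambda n: (-score[n], n))
        share = round(100 * len(nodes) / total, 2)
        summary[label] = (ranked[:top_n], share)
    return summary


# Hypothetical community labels and centrality scores (not our real results).
community = {
    "Short Hair": 1, "Brown Hair": 1, "Shirt": 1,
    "Long Hair": 3, "Blue Eyes": 3,
}
score = {
    "Short Hair": 1.0, "Brown Hair": 0.8, "Shirt": 0.5,
    "Long Hair": 1.0, "Blue Eyes": 0.7,
}

table = summarize(community, score, top_n=2)
for label, (traits, share) in sorted(table.items()):
    print(f"|{label}||{', '.join(traits)}||{share}%|")
```

In the analysis described above, the scores would come from running eigenvector centrality within each community’s subnetwork rather than from a hand-written dictionary.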
The results strongly suggest a non-random distribution of traits in the VNDB dataset. The emerging node communities differ significantly in terms of size, their share in the overall dataset and even the themes of the nodes that compose the subnetworks. This is most evident in node community number two, which clusters traits pertaining to the description of a character’s sexual activity and pornographic depiction in a visual novel game. If overall distribution followed a random pattern, we would not see this clustering pattern but rather a distribution of sexual traits across the subnetworks we have derived.
Regarding eigenvector centrality, our observation of the three node communities reveals a prevalence of nondescript traits in the first and the third community. On the other hand, the second community, which groups traits describing character sexual activity and pornographic depictions, has an ensemble of mainstream pornographic depictions as its ten nodes with the highest eigenvector centrality. Within the first community, we can find two traits describing a character’s hairstyle (‘Short Hair’, ‘Parted to Side’), two traits describing hair color (‘Brown Hair’, ‘Black Hair’), one describing an age group (‘Young Adult (Apparent Age)’), three traits describing eye color (‘Amber Eyes’, ‘Brown Eyes’, ‘Red Eyes’) and two traits describing pieces of clothing (‘Shirt’, ‘Necktie’). Within the third community, our other non-sexual traits community, we can find a trait describing a character’s skin color (‘Pale Skin’), two describing body type (‘Slim Body’, ‘Average Height’), one describing an age group (‘Teen (Apparent age)’), one describing character eye color (‘Blue Eyes’), four describing various hair styles (‘Long Hair’, ‘Waist Length+ (Hair)’, ‘Straight Hair’, ‘Sidehair (Hair Tail)’) and one describing hair color (‘Blond Hair’).
Employing eigenvector centrality, in this case, ended up being not as productive as we had anticipated, especially in light of the highest-ranking traits in communities one and three. We could not derive traits or ensembles thereof that could clearly suggest the presence of shared character archetypes in visual novel design practice. We could only surmise that hair styles and body types are important in character design, and that traits describing sexual activity mainly recur with each other in their own community.
On the other hand, community two becomes more interesting when juxtaposed with the gendered specificities of visual novel games. Pornography in the field of visual novel games tends to follow a gendered distribution. Visual novel games intended for a male audience generally feature explicit depictions of sexual intercourse between the (male) player character and the game’s (female) characters. Visual novel games intended for a female audience present different nuances.
Visual novel games intended for female audiences which feature heterosexual relationships will generally not depict sexual intercourse between the (female) player avatar and the (male) characters. Visual novel games directed at female audiences which feature homosexual relationships will instead tend to be much more explicit in showing intercourse between the (male) player avatar and other (also male) characters.
Taking community two as being representative of all visual novel games might skew our data by not taking its gendered nuances into account. Usage and distribution of character pornography vary widely on the basis of the game’s intended audience. We therefore decided to stratify our dataset to reflect these nuances in our approach by looking for markers of a character’s intended audience. We then repeated our modularity/eigenvector centrality approach on a stratified dataset capable of accounting for characters’ intended audiences. We will detail our stratified approach in the second part of this blog post series.