On counting the uncountable

For many game scholars, archival material means either old game platforms, old games, or occasionally, old game ephemera such as magazines or fan sites. For many digital archivists, archival material means technical data and metadata about software. What about when the two collide? Do game logs—not even replays or recordings but the barely readable logs—have archival value?

I ask this because for the past two days I’ve been working on a small Python program to extract information from the League of Legends logs on my computer. I started it because it bothered me that I had no way to know how many total games I’ve played with a given champion. There are several good web tools out there for grabbing gameplay data like champions played, win rate, items built, etc. These tools allow players to look back at their games and to correlate performance with choices of champion, item, team, or role. For example, most of these tools include a “most played champions” visualization with win and loss statistics (see Figure 1 below).

Figure 1: Screenshot from OP.GG’s landing page for myself. Search performed on 19 December 2014.

In May 2014, Riot Games introduced their new Match History site (link goes to my own match history) for recording much of this information, and it’s extremely comprehensive. The stored history only goes back to the introduction of this tool, however, and prior history has been lost. In addition, Riot has not decided how long this information will live on their servers. According to a blog post introducing the new Match History site, Riot planned to store games for a year (FAQ, “How long will games be saved?”). This match history, like many of the other web tools, is not meant to be a time capsule of gameplay. Rather, it serves the purposes of current players by visualizing their games and presenting statistics conducive to improvement.

Unfortunately, most of these tools were created to help players with their ranked gameplay. Thus, the gameplay data collected have to do mainly with ranked games, and secondarily with other modes such as normal (Summoner’s Rift). While this is great for most players, it normalizes a form of play to the exclusion of others. I do not play ranked, and rarely play normal (Summoner’s Rift) games. The vast majority of my games are ARAM (All Random All Mid, a “fun” game mode on Howling Abyss) and bot games (Co-op vs. AI, not PvP). We can get into the reasons for this another time, but suffice it to say that when I use these tools to examine my own performance, I get misleading or even just plain wrong results (see Figure 2).

Figure 2: The same module on Elophant, another web tool for performance stats. Note the lack of champions due to lack of data. Search performed on 19 December 2014.

So how could I fix this problem?

How can I count the games that don’t count?

Part of the answer lay in a web tool that I have used with great success, Logs of Lag (“a League of Legends netlog analyzer”). This tool allows you to drag and drop a single network log file into your browser, and it outputs a graph visualizing your ping and packet loss during the game. It also evaluates your average ping and packet loss and identifies whether your connection is good or bad. It’s clean, simple to use, well-designed (it even tells you exactly how to find these network logs), and useful. Logs of Lag pays homage to another, now defunct tool called LoL Parse (Internet Archive link due to the whole defunct thing). Now, in the interest of full disclosure, I should say that I have used LoL Parse exactly once, many months ago when it was announced, and I loved it. For the first time, here was a performance analyzer that took my performance seriously. It was fascinating to see how many times I played certain champions, and whether there were any patterns that I could unlock. I understood why people gravitated to web tools like the aforementioned OP.GG and Lolking. I wish I had saved the results of the analysis.

I didn’t articulate these feelings at the time, but the fact that I’ve thought about this tool multiple times in the intervening months should say a lot. Perhaps it was only through writing my program, digging through this material, and taking inspiration from extant tools that I was able to understand why it meant something to me. Clearly, it did. I wouldn’t have spent a day coding (and I’m not the most experienced coder, especially in Python…) if it weren’t important. So then the question becomes: why is it meaningful to count the games that don’t count?

Part of that answer lies in a term that I discovered due to a proposed workshop at the iConference next March: “trace ethnography.” The organizers of this workshop describe it as analyzing “trace data” as evidence of engagement with information systems. In my case, it would mean analyzing these highly technical game logs as evidence of my engagement with League of Legends as a player. Hopefully, I’ll know much more about “trace ethnography” in March. I think it has a lot to add to current methods for new media studies, although I personally have a big question. How is trace ethnography any different from archival research? If the researchers who embrace and develop this method believe in a fundamental difference between “automatic” traces generated by information systems, and “hand-drawn” (lol manuscript) traces created in the context of an institution, then I don’t think I can get behind it.

First of all, canonical archives are in many ways equivalent to the data archives containing this trace data. According to foundational archival theorists like Schellenberg or Jenkinson, archival material is necessarily evidence of the functioning of a process. Archival material consists of the log files of governments, schools, corporations, and other organizations. Second of all, the creation of trace data is never removed from social and historical context. For example, my folder containing these log files is massive. My friends, who play League on Windows (I play on Mac), have much smaller log files. This isn’t due to some automatic thing where the Mac client just logs more information than the Windows client, absent any human involvement. Instead, we must understand that the Mac client is newer and less well developed, that the Mac client is in fact still in beta, that the Mac client crashes more often as a result, and that the developers introduced more error checking into the logs because they were aware of the nature of the software. In every case, the reason for the file size difference is that people decided to record more information about a system in order to better understand it. Trace data are never neutral.

That aside, I’m super excited for this workshop (which I suppose means I should apply for real and include some of this). I think the concept of “trace ethnography” could benefit from conversation with archival theory and practice, and I would like to see that conversation happen and maybe even be part of it. Moreover, my work with this stupid little parser forced me to crystallize my own understanding of what trace ethnography could be.

So far, the parser’s functionality is limited to identifying the names of champions played by the summoner whose computer the logs are on. The logs, however, contain a crapton of information beyond this. They list the summoner names (equivalent to usernames) of all players on the team. They list the champions used by said summoners. They identify the skin used, and the team. All that, in one line! (See code tag below for an example of this line from one of my many log files.)

Spawning champion (Ahri) with skinID 0 on team 200 for clientID 9 and summonername (morbidflight) (is HUMAN PLAYER)
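As a sketch of how a line like this might be parsed (the regular expression and field names below are my own illustration, not the code of my actual parser), a few lines of Python suffice:

```python
import re

# Illustrative pattern for the "Spawning champion" line shown above.
# The group names (champion, skin, team, etc.) are my own labels, not Riot's.
SPAWN_RE = re.compile(
    r"Spawning champion \((?P<champion>[^)]+)\) "
    r"with skinID (?P<skin>\d+) "
    r"on team (?P<team>\d+) "
    r"for clientID (?P<client>\d+) "
    r"and summonername \((?P<summoner>[^)]+)\) "
    r"\(is (?P<kind>[A-Z ]+)\)"
)

def parse_spawn_line(line):
    """Return a dict of fields from a spawn line, or None if it doesn't match."""
    match = SPAWN_RE.search(line)
    return match.groupdict() if match else None

example = ("Spawning champion (Ahri) with skinID 0 on team 200 "
           "for clientID 9 and summonername (morbidflight) (is HUMAN PLAYER)")
print(parse_spawn_line(example)["champion"])  # Ahri
```

One line of log text yields champion, skin, team, client ID, summoner name, and whether the player is human or a bot, which is exactly why these files are so dense with evidence.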

The log files also contain information about in-game events like player character deaths. Finally, the log files contain information about events that are invisible to the player, like errors in loading certain assets.

Figure 3: Image of part of the game logs. In this game, the client apparently failed to load an asset for Yasuo’s wind wall skill.

If we attempt to take these log files seriously as evidence of play, we have to acknowledge that these log files can represent the course of a game in a way that runs orthogonal to most players’ experience. We don’t see the matrix, but we play in it. The log files, however, record traces of events that act as engagement points for the player. A log file might record a series of disconnect-reconnect cycles, and the player instantly remembers how frustrating that experience was, even if they don’t recall the particular event. Log files present information intended for use by developers, but usable by archivists and scholars of games. As far as I know, no one working in game preservation has argued for storing log files instead of game recordings or game platforms, although I’ll continue with my provocative trend and say that if I can have a 3 GB folder with over 1,400 logs in it corresponding to 1,351 games played, that’s more useful than massive video files for each game. After all, digital games are interactions with code. If the log files store some of the subjective experience of that interaction, or even store anchors to that subjective experience, then might that not be enough?

By the way, according to my own parser, I’ve played 87 games as Dr. Mundo on this computer (which I’ve had since around February/March). I think it’s safe to say I like the champion. What was surprising, however, was that I had 82 games played as Nami. I hadn’t realized that I played the two nearly the same number of times (again, only within the past nine months). My parser was able to tell me something I didn’t expect, and something that I could never see from using the web tools. I’d count that as a success.
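The counting pass itself could be sketched as follows. This is a minimal illustration, not my actual parser: the log directory path is a placeholder (the real location depends on your OS and install), and the pattern assumes the “Spawning champion” line format shown earlier.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical log location -- substitute wherever your client writes its logs.
LOG_DIR = Path("League of Legends/Logs")
MY_SUMMONER = "morbidflight"  # the local player's summoner name

# Capture the champion and summoner name from each spawn line.
spawn_re = re.compile(r"Spawning champion \(([^)]+)\).*summonername \(([^)]+)\)")

counts = Counter()
for log_file in LOG_DIR.glob("*.txt"):
    for line in log_file.read_text(errors="ignore").splitlines():
        match = spawn_re.search(line)
        # Count only the champions that *I* played, not my teammates'.
        if match and match.group(2) == MY_SUMMONER:
            counts[match.group(1)] += 1

for champion, games in counts.most_common(10):
    print(f"{champion}: {games}")
```

Filtering on the summoner name is the important design choice: every spawn line in a log names someone on the team, so without that check you would be counting your teammates’ champions alongside your own.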

Epistemology and research methods

I almost have to post this, because I went on a rant in IRC about the breakdown of research methods across disciplines leading to reinventing the wheel. I am just going to paste the logs because oh god, why not. I changed two usernames because I’m weird and care about privacy on the internet, sometimes. I also fixed one typo, though it was in no way a meaningful typo.

12:26:59 AM <redacted1>: morbidflight: coding?
12:27:11 AM morbidflight: qualitative data, not programming
12:27:12 AM morbidflight: like
12:27:23 AM <redacted2>: night
12:27:24 AM morbidflight: creating a theoretical framework within which to analyze said messy qual data
12:27:27 AM morbidflight: night <redacted2>
12:27:38 AM morbidflight: but doing it from a bunch of reading and categorizing
12:27:46 AM morbidflight: it’s honestly a pretty similar task to classification in general
12:28:21 AM morbidflight: although, and we’ve talked about this before, qualitative researchers don’t tend to talk to the kinds of people who are trained in classification and so that task gets seen as bitch work while coding and qualitative analysis in general is seen as this high-level thing
12:28:40 AM morbidflight: anyway i have a grudge against that because any ontological construct is high-level work and should be recognized as such
12:28:59 AM morbidflight: and understanding the similarities in said work can help ethnographers et al. learn to manage their task in a different way
12:29:00 AM morbidflight: etc
12:29:31 AM morbidflight: yet another example of the segmentation of academic work leading to breakdowns in potential communication and collaboration
12:29:48 AM morbidflight: i mean imagine if you had a hardcore taxonomist on every anthropological team that worked with qual data
12:29:51 AM morbidflight: that’d be pretty swanky
12:30:05 AM morbidflight: i mean you’d have to argue with them about the fundamental principles of organizing knowledge but hey
12:30:14 AM morbidflight: epistemology amirite
12:30:40 AM morbidflight: itt: i care too much about research methods

A long-standing issue of mine is that I see a lot of great theoretical work being done in libraries, archives, museums, and other cultural heritage institutions, and by the people who study them. This work often engages with larger topical debates, such as the entire field of digital humanities (I mean seriously, who other than an information professional are you going to talk to about creating an accessible web-based database of digitized texts?), and yet these larger debates treat this work as “infrastructure” or “the help.” NOT TO MENTION the often gendered breakdown of this labor. I didn’t use the term “bitch work” lightly, above.

On that note, have a look at this article from a few weeks ago that I tweeted on June 6. Infrastructure is what makes it all possible.