Telemetry, Logging, Debugging: Background Necessities of Development

Imagine you're helping test Beth's game.  Suddenly, as you jump onto a ledge in the new lava level, things begin to lag, enemies stop attacking, and the game crashes with an "Unhandled Exception" popup.  When you ask Beth, she doesn't know what happened and confirms that the game doesn't save any information in cases like this.  Valuable details about the bug are lost, leaving Beth to try to reproduce the issue and step through the code with breakpoints!

After years of both making programs and being on the receiving end of someone else's development, I've identified a number of things that every program (including Beth's) should have to help designers, testers, and the developers themselves when the product is in use.  Let's see how they could help in Beth's situation:

Telemetry

Performance

- Crashes: hold enough memory in reserve to send a final notification when the program crashes.  The telemetry event must include the type of crash (OOM, unhandled exception, etc.), the callstack, the build version, and any other applicable details.  This way, no application failure goes unnoticed, and the turnaround time for getting a fix in goes down.

- FPS, memory/CPU usage, and network details should be included in periodic telemetry check-ins.  Comparing what happens from one check-in to the next can show trends in where performance needs to be improved, such as in the lava level.
An example from the lava level could have been:
1-2-16 17:18:19 (date-time), 123abc (session-id), 0.1.2.3-Test (version), unhandled_exception (type), foo at bar (callstack if applicable), Lava_Level_1 (level), 12.3 (FPS), 456 MB (memory), Windows (platform)
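
To make that concrete, here is a minimal Python sketch of how a crash event like the one above could be assembled.  The names build_crash_event and send_event, the field names, and the hard-coded session and version values are all my own placeholders, not part of any real telemetry library, and the "send" step just appends to a list where a real game would make a network call.

```python
import json
import sys
import traceback
from datetime import datetime, timezone

# Hypothetical values; a real game would read these from its own state.
SESSION_ID = "123abc"
BUILD_VERSION = "0.1.2.3-Test"


def build_crash_event(exc_type, exc_value, exc_tb, level, fps, memory_mb):
    """Pack the crash details into a dict that can be serialized and sent."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "session_id": SESSION_ID,
        "version": BUILD_VERSION,
        "type": "unhandled_exception",
        "exception": exc_type.__name__,
        "message": str(exc_value),
        # format_exception returns a list of strings, one per callstack chunk.
        "callstack": traceback.format_exception(exc_type, exc_value, exc_tb),
        "level": level,
        "fps": fps,
        "memory_mb": memory_mb,
        "platform": sys.platform,
    }


def send_event(event, sink):
    """Stand-in for the network call: store the event as one JSON line."""
    sink.append(json.dumps(event))
```

In a real program, something like this would be called from a top-level handler (in Python, sys.excepthook is one place to hang it) so that the event goes out before the process dies.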

Input

- Can the program accept user interaction?  If so, it should attempt to record every user action (a must for simple GUIs).  For more complex products such as Beth's game, which can receive dozens of inputs in a single second, the following is more suitable:

Interaction

- Pickups, weapon fire, level exits, cinematic skipping, the works: in this case, more data is better.  Whenever the user can act on content in the program, that action should be recorded for later analysis.

- Things to include: the date and time, a session ID, XYZ position, level, what's currently equipped, checkpoint, health, what's being interacted with, etc.

- If there are processes or actors (e.g. NPCs) beyond what comes from the user, record those, too!  Who is the AI targeting?  How is the manager handling pickup clean-up?  What is the streaming state of the cinematics?  These are all questions Beth should want answers to.
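
As a rough illustration, interaction events could be modeled along these lines in Python.  InteractionEvent and EventRecorder are hypothetical names of mine, the fields simply mirror the list above, and the recorder keeps everything in memory where a real game would batch events and ship them off.

```python
import time
from dataclasses import asdict, dataclass, field


@dataclass
class InteractionEvent:
    session_id: str
    action: str      # e.g. "pickup", "fire_weapon", "level_exit"
    target: str      # what is being interacted with
    level: str
    position: tuple  # (x, y, z)
    equipped: str
    checkpoint: str
    health: int
    actor: str = "player"  # use an NPC name for non-user actors
    timestamp: float = field(default_factory=time.time)


class EventRecorder:
    """Collects events in memory for later analysis."""

    def __init__(self):
        self.events = []

    def record(self, event):
        # Store plain dicts so the events are easy to serialize later.
        self.events.append(asdict(event))

    def by_actor(self, actor):
        """Return every recorded event produced by the given actor."""
        return [e for e in self.events if e["actor"] == actor]
```

Using one event shape for both the player and the NPCs keeps the later analysis simple: filtering by actor answers "what was the AI doing when the player picked that up?" without joining two different logs.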

Logs

In General

- Pick a spot for playtime and error logs that is going to be present on every system (a user's Documents folder, phone/console scratch folders, etc.).  In a development environment, the logs should be very verbose, allowing anyone to deduce what both the player and the system were doing at any given point.
In a release build, the user doesn't (and shouldn't) need to see all the details of what is going on, so regular playtime logs can be toned down or turned off.  Error logs do, however, need to be available to the end user in case developer follow-up is required.
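
Here is one way that split could look using Python's standard logging module.  The configure_logging function, the "game" logger name, and the playtime.log / error.log file names are assumptions of mine for the sake of the sketch; the idea is just that the playtime log is verbose in development and quiet in release, while the error log is always written.

```python
import logging
from pathlib import Path


def configure_logging(log_dir, is_release):
    """Set up a verbose playtime log (dev only) and an always-on error log."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("game")
    logger.setLevel(logging.DEBUG)

    # Drop handlers left over from an earlier call so lines are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # Playtime log: everything in development, warnings and up in release.
    playtime = logging.FileHandler(log_dir / "playtime.log", encoding="utf-8")
    playtime.setLevel(logging.WARNING if is_release else logging.DEBUG)
    playtime.setFormatter(fmt)

    # Error log: always kept, so the user can pass it along to the developer.
    errors = logging.FileHandler(log_dir / "error.log", encoding="utf-8")
    errors.setLevel(logging.ERROR)
    errors.setFormatter(fmt)

    logger.addHandler(playtime)
    logger.addHandler(errors)
    return logger
```

The log_dir argument is where the "present on every system" choice comes in; the game would pass in the Documents folder or the platform's scratch folder.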

Debugging

Hooks

- Exposing debug hooks that affect the nature and status of the program is necessary if there's any high-level black-box testing to be done.  Instead of starting from scratch, a testing user can set up virtual funds, profile details, and, in games, weapons, health, and altered AI states quickly and on demand.  Instead of starting from level 1 in Beth's game, the player can go straight to the lava level and re-equip the gear they had before the crash!

- Hooks should be disabled for release builds of the product to eliminate an avenue of unwanted manipulation when it comes to data and performance.
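
A small sketch of the idea in Python, with the caveat that DebugHooks, its methods, and the game-state dictionary are all made up for illustration.  Hooks register themselves by name, and a build with hooks disabled never registers them, so there is nothing left to call.

```python
class DebugHooks:
    """Registry of named debug commands that can be switched off per build."""

    def __init__(self, enabled):
        # Assumption: the build system passes False for release builds.
        self.enabled = enabled
        self._hooks = {}

    def register(self, name):
        """Decorator that exposes a function as a debug hook."""
        def decorator(func):
            if self.enabled:
                self._hooks[name] = func
            return func
        return decorator

    def run(self, name, *args):
        if name not in self._hooks:
            raise PermissionError(f"debug hook '{name}' is not available in this build")
        return self._hooks[name](*args)


def make_game(hooks_enabled):
    """Build a toy game state plus its hooks (both hypothetical)."""
    state = {"level": "Level_1", "equipped": [], "health": 100}
    hooks = DebugHooks(enabled=hooks_enabled)

    @hooks.register("goto_level")
    def goto_level(level):
        state["level"] = level

    @hooks.register("equip")
    def equip(item):
        state["equipped"].append(item)

    return state, hooks
```

With hooks enabled, a tester can call hooks.run("goto_level", "Lava_Level_1") followed by hooks.run("equip", "blaster") and be back at the crash site in seconds.  With hooks disabled, the same calls raise an error instead of changing anything.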

Error Messaging

- No matter the version, any time a program hits an error or crash, it should display a useful error message that explains the problem and any additional steps the user should take.

And what makes a message useful?
  • A lack of technical jargon.
  • An explanation that could be given to Beth's grandmother.
  • What can be done by the user in the future (e.g. enter only letters and spaces into a name field).
  • Contact info or a link to submit an error report (if the issue is serious enough).
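
To show how those points might translate into code, here is a hedged Python sketch.  InvalidNameError, validate_name, user_message, and the REPORT_URL placeholder are all hypothetical; the point is only that the technical detail stays in the exception (and the logs) while the user sees plain language and a next step.

```python
# Placeholder address; a real product would use its own report page.
REPORT_URL = "https://example.com/report"


class InvalidNameError(ValueError):
    """Raised when a name field contains something other than letters and spaces."""


def validate_name(name):
    if not name or not all(ch.isalpha() or ch == " " for ch in name):
        # Technical detail for the logs, not for the user.
        raise InvalidNameError(f"name failed validation: {name!r}")
    return name


def user_message(exc):
    """Translate an exception into a jargon-free message for the user."""
    if isinstance(exc, InvalidNameError):
        # Minor issue: say what went wrong and what to do next time.
        return "That name didn't work. Please use only letters and spaces."
    # Serious issue: apologize and point to the report link.
    return (
        "Sorry, something went wrong and the game needs to close. "
        "Please tell us what happened at " + REPORT_URL + "."
    )
```

Only the serious case includes the report link, which keeps small input mistakes from flooding the developer's inbox.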

Before Beth starts looking into why the game crashed on the lava level, she may want to spend time implementing behind-the-scenes systems to make future issues easier to resolve.

Including telemetry, session logs, and embedded debugging in any game or commercial app will ease a lot of pain when a user comes back with a crash or other heinous issue that needs correction.  For myself, this is a best practice for anything new I make, and I highly encourage you to do the same!

Welcome!

Thank you for visiting!

This blog is taking over from GamesOfTaste.blogspot.com.  The best content from there along with fresh posts are on their way shortly.

I look forward to sharing with you insights and thoughts on games and how to make them better.  Feel free to check out Games Of Taste and to contact me at jmchattin (at) gmail (dot) com.

Take care,
Jimmy