Topic: How I used code generation to auto generate the localization system for TERROR S
mandarin
« on: January 17, 2020, 06:55:11 AM »

TERROR SQUID is a game where you create your own bullet hell. It’s made in Unity, and will be released on Nintendo Switch and Steam in 2020. The game has been localized into 26 languages. Our localization agency put all of the localized strings in a Google Sheet. The sheet was set up with one column per language, and one row per localized string.

Design goals
The design goals for the system are:

  • Must be possible to switch language at runtime, without having to restart the game.
  • Localized strings must support variables. E.g.: “You scored x points”.
  • We should be able to get new content from the Google Sheet into the game with as little effort as possible.
  • The system should be as performant as possible. It needs to load as fast as possible, and switch language as fast as possible.
  • The system should produce as little garbage in memory as possible.

I have made 5-6 localization systems for other games, and they all had mostly the same requirements. Some of them had to support localization of assets, such as textures, animations and GameObjects. For TERROR SQUID we only had to support localization of strings. While it is possible to make a generic system for handling both strings and assets, I’ve found that generalized systems aren’t very good at anything. They do whatever they’re set to do, but often at the cost of performance and/or API design. If I were to add support for localized assets to TERROR SQUID’s localization system, I would make a sister system that exists side-by-side with the string system, in order to get the best of both worlds.

Before I started planning, I thought about what worked and what didn’t work with the other localization systems I made, and came to a few conclusions for what to do with this one.

  • I didn’t want to put all the strings in assets, like French strings in french.txt, and English in english.txt. It takes time to load an asset, and it has to be deserialized, which takes even longer, and causes a lot of garbage. The old asset has to be unloaded and any remains cleaned up. That’s not ideal at runtime.
  • If I had put strings in an asset, I would get a new problem: variables in strings. The asset can’t contain any logic, so I would have to load a string, parse it, replace any tags with values, and lastly join the pieces. That’s expensive to do at runtime. When supporting variables in strings, you can’t escape the fact that you need to join strings before showing them, but the cost can be reduced.
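
To make the cost of that second point concrete, here is a rough sketch of what runtime tag replacement tends to look like. It is not code from TERROR SQUID; the `RuntimeTemplate` class and the plain `{tag}` syntax are assumptions made up for the illustration.

Code:
```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class RuntimeTemplate {
    // Replaces every {tag} in a loaded string with the matching value.
    public static string Format(string template, IReadOnlyDictionary<string, object> values) {
        var sb = new StringBuilder(template.Length);
        int i = 0;
        while (i < template.Length) {
            int open = template.IndexOf('{', i);
            int close = open < 0 ? -1 : template.IndexOf('}', open + 1);
            if (close < 0) {
                // No complete tag left: copy the remainder as-is.
                sb.Append(template, i, template.Length - i);
                break;
            }
            sb.Append(template, i, open - i);
            // Allocates a temporary string for every tag found.
            string key = template.Substring(open + 1, close - open - 1);
            // Value types stored as object are boxed; unknown keys throw.
            sb.Append(values[key]);
            i = close + 1;
        }
        // Allocates the final string.
        return sb.ToString();
    }
}
```

Every call scans the string, allocates a key per tag, and needs a dictionary of boxed values from the caller, all of which is work that has to happen while the game is running.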

The system
I looked at what it takes to support variables in strings. I figured that if the localized asset could embed logic, I would write each string like a function, like this:

Code:
string GetScore(int score) => $"You scored {score} points!";

That way I would separate the API from the contents. It wouldn’t matter what language the system returned, I could still call `GetScore(123)` and get the correct localized string. The calling code would never need to know what the current language is, or any other specific details about the localization system or the state it’s in. All it needs to know is which method to call.

Merging text with logic like this could be done by code generation. That’s easy. All you have to do is write the correct code to a text file, and let Unity do the compiling. But will that solve all our problems? Let’s review our design goals.

Fast loading time
When the strings are part of the source code, they get compiled together with the code and bundled in one of the DLL files. Unity handles the loading of DLLs, and makes sure all code is ready when our game code starts. Convenient!

Switch language at runtime
When strings are grouped in separate classes, one for each language, loading a new language is a matter of instantiating the correct class. Easy and fast!

Produce as little garbage as possible
The language instance doesn’t need any parsing on load, so all it does is allocate a bit of RAM to make room for the class instance. Each string returned allocates a bit of RAM. Switching language means throwing away the previous language instance and replacing it with a new one. The garbage collector will do a bit of work when cleaning up the old instance. So, compared to loading an asset of serialized strings, this will most likely produce less garbage. Sounds like a win!

Localized strings must support variables
Check!

Update content with little effort
Generating the code can be automated. Deciding what to generate code for is done via parsing the Google Sheet. Getting the Google Sheet can be solved by using third party software. Someone must have made a Unity integration for Google Docs. So, all of this can be automated via editor scripting. It’s just a matter of writing the code and getting it right. Easy!
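
As an illustration of the parsing step, here is a minimal sketch that turns a sheet export into per-language entries. It assumes the sheet has already been downloaded as tab-separated text, with the row names in the first column and the language names in the first row. `SheetParser` is a made-up name, and cells containing tabs or line breaks are not handled.

Code:
```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser, for illustration only.
public static class SheetParser {
    // Returns, per language, the list of (row name, localized text) pairs.
    public static Dictionary<string, List<KeyValuePair<string, string>>> Parse(string tsv) {
        var result = new Dictionary<string, List<KeyValuePair<string, string>>>();
        string[] lines = tsv.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        if (lines.Length == 0) return result;

        // Header row: "Key<TAB>English<TAB>Norwegian..."
        string[] header = lines[0].Split('\t');
        for (int col = 1; col < header.Length; col++) {
            result[header[col]] = new List<KeyValuePair<string, string>>();
        }

        for (int row = 1; row < lines.Length; row++) {
            string[] cells = lines[row].Split('\t');
            string key = cells[0];
            for (int col = 1; col < header.Length; col++) {
                // Missing trailing cells are treated as empty strings.
                string text = col < cells.Length ? cells[col] : string.Empty;
                result[header[col]].Add(new KeyValuePair<string, string>(key, text));
            }
        }
        return result;
    }
}
```

Each language's list can then be handed to the code generator, one file per language.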

Code generation does indeed solve all our problems. Let’s get started!

Code generation
So, to make this work, I had to create one .cs file per language, with all the localized strings wrapped in functions whose names reflect the contents of the string. Here’s a method from the English file:

Code:
public string GetVO_Awesome() {
  return $"Awesome";
}

The method needs to be generated using only the information returned from the Google Sheet. For this method, I know that it comes from a column called “English”, and a row called “VO_Awesome”. The cell contains only the word “Awesome”. With this information, I created a few simple rules:

  • Use the name of the column to dictate the name of the file. In this case it’s named “EntriesEnglish.cs”.
  • Use the name of the row to dictate the name of the method. Always prefix the name with “Get”, so that the method is named “GetVO_Awesome”.
  • Methods have to be marked as `public string` in order to be publicly available and return a string.
  • Use string interpolation ($) for all strings.
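
A minimal sketch of how these rules could be applied to a cell without variables. `SimpleEntryGenerator` is a hypothetical name, and the escaping of quotes and backslashes is an assumption added so that the generated literal stays valid.

Code:
```csharp
// Hypothetical generator, for illustration only.
public static class SimpleEntryGenerator {
    // Column name "English" -> "EntriesEnglish.cs".
    public static string FileNameFor(string columnName) => $"Entries{columnName}.cs";

    // Row name "VO_Awesome" + cell "Awesome" -> source of GetVO_Awesome().
    public static string MethodFor(string rowName, string cellText) {
        // Escape characters that would break a C# string literal.
        string escaped = cellText.Replace("\\", "\\\\").Replace("\"", "\\\"");
        // "{{" and "}}" emit literal braces in the generated source.
        return $"public string Get{rowName}() {{\n  return $\"{escaped}\";\n}}";
    }
}
```

Calling `MethodFor("VO_Awesome", "Awesome")` produces the same three lines as the English method shown above.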

This is easy for strings without variables. When they contain variables, I have to dig a little deeper.

Variables in strings
I needed a way to identify variables in an unambiguous way. In TERROR SQUID we don’t use many special characters. We might use an exclamation mark, or a question mark, so it was safe to decide on using curly brackets for variables, like this:

Code:
You scored {score} points!

In order to pass a variable via method parameters, it has to be of the correct type. So, I needed a way to describe the variables. The types of variables we needed to support were integers, floats and strings.

Variables ended up being defined like this:
Code:
You scored {score:i} points!
for integers.
Code:
Your name is {name:s}
for strings.
Code:
You are {height:f} meters tall
for floats.

By postfixing the variable with an i, f or s, I have everything I need to generate the code.
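
For illustration, a small sketch of how such a token could be turned into a typed parameter. `VariableToken` is a made-up name, and mapping `f` to C# `float` (rather than `double`) is an assumption.

Code:
```csharp
using System;

// Hypothetical helper, for illustration only.
public static class VariableToken {
    // Splits "score:i" into its name and the C# type implied by the postfix.
    public static (string Name, string CSharpType) Parse(string token) {
        int colon = token.LastIndexOf(':');
        // The postfix must be exactly one character after the colon.
        if (colon <= 0 || colon != token.Length - 2) {
            throw new FormatException($"Expected 'name:i', 'name:f' or 'name:s', got '{token}'.");
        }
        string name = token.Substring(0, colon);
        switch (token[colon + 1]) {
            case 'i': return (name, "int");
            case 'f': return (name, "float");
            case 's': return (name, "string");
            default: throw new FormatException($"Unknown type postfix in '{token}'.");
        }
    }
}
```

Failing loudly on a malformed token is useful here, because the error shows up while generating code in the editor instead of in the running game.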

Generating the code
For each string parsed, I have to figure out if it contains variables or not. For that I use a simple regex to look for matches of the variable pattern, which looks like this:

Code:
private const string patternVariable = @"{(.*?)}";

The generator is split into three phases. The first phase looks for variables. The second phase uses the information about the variables to generate the method signature. The third and last phase generates the method body. Together, these phases let me generate all the necessary classes and methods, both with and without parameters.
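
The three phases could be sketched roughly like this. It is not the actual generator: `MethodGenerator` is a hypothetical name, and stripping the `:i`/`:f`/`:s` postfix in the body is an assumption on my part (if it were left in, C# string interpolation would read it as a format specifier).

Code:
```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical generator, for illustration only.
public static class MethodGenerator {
    private const string patternVariable = @"{(.*?)}";

    private static readonly Dictionary<string, string> typeNames = new Dictionary<string, string> {
        { "i", "int" }, { "f", "float" }, { "s", "string" },
    };

    public static string Generate(string rowName, string cellText) {
        // Phase 1: find the variables, e.g. "score:i" -> "int score".
        // A token without a known postfix throws here.
        var parameters = new List<string>();
        foreach (Match match in Regex.Matches(cellText, patternVariable)) {
            string[] parts = match.Groups[1].Value.Split(':');
            string parameter = $"{typeNames[parts[1]]} {parts[0]}";
            // A variable used twice in one string becomes a single parameter.
            if (!parameters.Contains(parameter)) parameters.Add(parameter);
        }

        // Phase 2: build the method signature.
        string parameterList = string.Join(", ", parameters);
        string signature = $"public string Get{rowName}({parameterList})";

        // Phase 3: build the body. "{score:i}" becomes "{score}".
        string body = Regex.Replace(cellText, patternVariable,
            m => "{" + m.Groups[1].Value.Split(':')[0] + "}");
        body = body.Replace("\\", "\\\\").Replace("\"", "\\\"");

        return $"{signature} {{\n  return $\"{body}\";\n}}";
    }
}
```

With this sketch, `Generate("Score", "You scored {score:i} points!")` yields a `GetScore(int score)` method, and a string without variables yields a parameterless method.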

Runtime code
To make sure any calling code can access all strings, regardless of language, I created an interface for the entries classes. I had all the information I needed to generate the interface. It looks like this:

Code:
namespace TerrorSquid.Localization {
    public interface ILangEntries {
        string GetMainMenu_Play();
        string GetStats_SurvivalFormat(int day, int hour, int minute, int second);
    }
}

Implementations of the interface get generated like this:

Code:
namespace TerrorSquid.Localization {
    public class EntriesEnglish : ILangEntries {
        public string GetMainMenu_Play() {
            return $"Play";
        }
        public string GetStats_SurvivalFormat(int day, int hour, int minute, int second) {
            return $"{day}d {hour}h {minute}m {second}s";
        }
    }
}

To be able to switch between languages in a type safe way, I also generate an enum with all the languages the game supports.

Code:
namespace TerrorSquid.Localization {
    public enum Lang {
        English = 0,
        Norwegian = 1,
        French = 2,
        German = 3,
    }
}

For displaying the possible languages to toggle between in the UI, I wanted to avoid parsing the enum to string at runtime. That’s why I generated a class with the pre-parsed strings.

Code:
namespace TerrorSquid.Localization {
    public static class LangHelper {
        public const int NUM_LANGUAGES = 26;
        public static readonly string[] LANG_LABELS = {
            "English",
            "Norwegian",
            "French",
            "German",
        };
    }
}
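
A helper class like this can be emitted from the same list of column names that drives the enum. Below is a minimal sketch, with `LangHelperGenerator` as a made-up name. It assumes the labels are simply the column names, written in the same order as the enum values so that `(int)lang` can be used as an index.

Code:
```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical generator, for illustration only.
public static class LangHelperGenerator {
    // Writes the source of LangHelper for the given language names.
    public static string Generate(IReadOnlyList<string> languages) {
        var sb = new StringBuilder();
        sb.Append("namespace TerrorSquid.Localization {\n");
        sb.Append("    public static class LangHelper {\n");
        // The count is baked in as a constant instead of computed at runtime.
        sb.Append($"        public const int NUM_LANGUAGES = {languages.Count};\n");
        sb.Append("        public static readonly string[] LANG_LABELS = {\n");
        foreach (string language in languages) {
            sb.Append($"            \"{language}\",\n");
        }
        sb.Append("        };\n");
        sb.Append("    }\n");
        sb.Append("}\n");
        return sb.ToString();
    }
}
```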

When using an enum to switch between languages, I also needed a way to know which class to instantiate based on the value of the enum. That could be generated too, like this:

Code:
namespace TerrorSquid.Localization {
    public static class LangEntries {
        public static ILangEntries Create(Lang lang) {
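            switch (lang) {
                case Lang.English: return new EntriesEnglish();
                case Lang.Norwegian: return new EntriesNorwegian();
                case Lang.French: return new EntriesFrench();
                case Lang.German: return new EntriesGerman();
                // Fallback so that all code paths return or throw; without it the method does not compile.
                default: throw new System.ArgumentOutOfRangeException(nameof(lang));
            }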
        }
    }
}

The only thing I didn’t generate was the entry point that binds all of the generated classes together.

Code:
namespace TerrorSquid.Localization {

    public delegate void LanguageLoaded(Lang fromLang, Lang toLang);
   
    public static class Loc {

        public static ILangEntries entries;

        public static Lang                 CurrentLanguage  { get; private set; }
        public static int                  NumLanguages     => LangHelper.NUM_LANGUAGES;
        public static event LanguageLoaded OnLanguageLoaded = (f, t) => {};

        public static void SetLanguage(Lang lang) {
            Lang fromLang = CurrentLanguage;
            CurrentLanguage = lang;
            entries = LangEntries.Create(CurrentLanguage);
            OnLanguageLoaded.Invoke(fromLang, lang);
        }

        public static string GetLangLabel(Lang lang) => LangHelper.LANG_LABELS[(int)lang];
    }
}
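
To show how calling code might react to a language switch, here is a self-contained sketch. It uses cut-down stand-ins for the generated types (two languages, one entry, and a placeholder Norwegian string invented for the example). `PlayButtonLabel` is a hypothetical UI element, not something from the game.

Code:
```csharp
using System;

// Cut-down stand-ins for the generated code, for illustration only.
public enum Lang { English = 0, Norwegian = 1 }
public interface ILangEntries { string GetMainMenu_Play(); }
public class EntriesEnglish : ILangEntries { public string GetMainMenu_Play() => "Play"; }
public class EntriesNorwegian : ILangEntries { public string GetMainMenu_Play() => "Spill"; }

public delegate void LanguageLoaded(Lang fromLang, Lang toLang);

public static class Loc {
    public static ILangEntries entries;
    public static Lang CurrentLanguage { get; private set; }
    public static event LanguageLoaded OnLanguageLoaded = (f, t) => {};

    public static void SetLanguage(Lang lang) {
        Lang fromLang = CurrentLanguage;
        switch (lang) {
            case Lang.English: entries = new EntriesEnglish(); break;
            case Lang.Norwegian: entries = new EntriesNorwegian(); break;
            default: throw new ArgumentOutOfRangeException(nameof(lang));
        }
        CurrentLanguage = lang;
        OnLanguageLoaded.Invoke(fromLang, lang);
    }
}

// Hypothetical UI element that keeps its text in sync with the language.
public class PlayButtonLabel : IDisposable {
    public string Text { get; private set; }

    public PlayButtonLabel() {
        Loc.OnLanguageLoaded += Refresh;
        // Pick up the current language if one is already loaded.
        if (Loc.entries != null) Text = Loc.entries.GetMainMenu_Play();
    }

    private void Refresh(Lang fromLang, Lang toLang) => Text = Loc.entries.GetMainMenu_Play();

    // Unsubscribe so the static event does not keep the label alive.
    public void Dispose() => Loc.OnLanguageLoaded -= Refresh;
}
```

Because the event is static, anything that subscribes should also unsubscribe when it goes away, otherwise it keeps receiving callbacks and cannot be collected.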

Of all the runtime code written for the localization system, only 23 lines are written and maintained by hand. The other 20,173 lines are auto generated from a Google Sheet.

So far, this is the best localization system I’ve ever written. At least it solves our challenges in possibly the best way it could.

Hope this is of some inspiration.


Discord: https://discord.gg/9wsFA4Z
Website: https://terrorsquid.ink/
Reddit: https://www.reddit.com/r/TerrorSquid/

TheCams
« Reply #1 on: January 28, 2020, 04:20:19 AM »

Nice write-up!
I like the code generation part, but it sounds like having to recompile the whole game can make iterations on the localization slower?
Any plans on adding a debug feature to reload strings at runtime? (maybe a new class inheriting from ILangEntries)
mandarin
« Reply #2 on: February 14, 2020, 05:38:55 AM »

Hi!

I haven't spent any time measuring differences in compile time, so I can only guess that it depends on how many languages your game supports and how many entries each language has. In our game, we have 185 entries across 26 languages. That's 4810 methods. It takes only a few seconds to get all data, create all files and recompile. If you use an engine like Unity, you can isolate all localization files in an assembly definition. I'm sure a lot of other engines have a way of isolating code in separate libraries/assemblies.

No, I haven't thought about reloading strings at runtime. So far, I haven't had the need to implement it.

TheCams
« Reply #3 on: February 18, 2020, 10:12:49 AM »

Ah I didn't know about reloading assemblies in Unity. Coming from a C++ background, I thought it meant you had to restart the whole game. Nice!
Edgar2436
« Reply #4 on: January 07, 2021, 01:27:19 AM »

Nice step-by-step walkthrough!
I wonder, though, why create a function for each string when you can have one unified function that gets the localized value for a given key from a dictionary object that's loaded when launching the game? From a practical standpoint, you'd always need a key and a language identifier; just pass those to your gettext function and hook it up with a dictionary containing entries for all locales. This will save you the trouble of generating all the code.