Author Topic: Localisation library with a linguist's touch~  (Read 713 times)
oahda
« on: July 02, 2016, 11:32:30 AM »

Yoooo~

Been working on a little project for the past two weekends... Perhaps you might be interested.

As a linguist-programmer, I thought I'd try to think up a system that can truly cover the needs of different languages when localising software. Thus this system is intended to be a lot more feature-complete than most I've ever found out there, and able to cover different numeral systems, inflections of translated words and names and so on.

Work in progress. Not done. But I can definitely show it off at this point in case anybody is interested. Have a peek at the detailed and organised info on Bitbucket. And the code, if you want. So please read there before you ask any questions!

Only C++11 ATM. C# port is likely. Beyond that I don't know.

https://bitbucket.org/avaskoog/karhucsvlocalisation/overview

Some localisation data: [image not preserved]

Some output: [image not preserved]

« Last Edit: July 02, 2016, 11:45:41 AM by Prinsessa »

voidSkipper
« Reply #1 on: July 03, 2016, 06:07:31 PM »

As a linguist-translator-programmer I can tell you that this sort of thing would've saved me a lot of time and my clients a lot of money on some past projects.

I had one project where the English comma was used as a delimiter, and essentially had to modify the text renderer and batch-process all of the script engine directives to use a different delimiter in order to produce my English patch.

Game developers often think "we'll finish the game, then sort out the localization later", but they would have faster releases and less monetary overhead if localization was considered from the beginning.
oahda
« Reply #2 on: July 04, 2016, 12:01:17 AM »

Yeah, part of my motivation is that the localisation system in the game my current company is making basically only works properly for languages with more or less the same grammar as the Scandinavian languages and English, the only supported locales ATM. It has a token replacement system and that's it. Doesn't even handle numeral forms.

Completely agree on your last point.

Nice to see a fellow linguist-programmer, for that matter!

Next consideration: possibility of adding rules to convert " to proper quotation marks instead of manually writing the correct quotation mark symbols in every label?
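
A rough sketch of what such a rule could look like, not something the library does today: the function name convertQuotes and the idea of passing the opening and closing marks per locale are assumptions made for illustration. It simply alternates between the two marks and does not try to handle nested quotations.

```cpp
#include <string>

// Replaces straight double quotes with alternating opening and closing marks.
// `open` and `close` are UTF-8 strings supplied per locale (hypothetical rule data).
std::string convertQuotes(const std::string &text,
                          const std::string &open,
                          const std::string &close)
{
    std::string out;
    bool inside = false; // true while between an opening and a closing quote
    for (const char c : text)
    {
        if (c == '"')
        {
            out += inside ? close : open;
            inside = !inside;
        }
        else
        {
            out += c;
        }
    }
    return out;
}
```

An English locale could pass “ and ” while a Finnish locale could pass ” for both, so the label text itself would only ever contain the plain " character.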

BorisTheBrave
« Reply #3 on: July 04, 2016, 11:40:18 AM »

Looks cool.

Reading your docs, I think you need some clarification on how _form works. It's not clear at all to me how you got from "FINLAND" to "Suomi". You seem to be implying there is some recursive label expansion going on, but that's surely a separate feature from inflected forms.

Also, you should mention what class _form constructs. We're likely to want variables, not literals!
oahda
« Reply #4 on: July 04, 2016, 11:56:34 PM »

Quote from: BorisTheBrave on July 04, 2016, 11:40:18 AM
Looks cool.

Thanks!

Quote from: BorisTheBrave on July 04, 2016, 11:40:18 AM
Reading your docs, I think you need some clarification on how _form works. It's not clear at all to me how you got from "FINLAND" to "Suomi". You seem to be implying there is some recursive label expansion going on, but that's surely a separate feature from inflected forms.

Yeah, it's always difficult to figure out how to word things. Good feedback. I might have to add more pictures of spreadsheets to show the correspondences.

Basically: FINLAND is the identifier of another label defined in a translation spreadsheet. The English locale's value is Finland and the Finnish locale's value is %[Suomi, Suomessa], which defines two distinct inflected forms of the noun in Finnish. The translator can then call on a specific form in a token by its index: %{n;0} (or just %{n}, since index 0 is assumed by default) yields the first form (Suomi if the FINLAND label is used) and %{n;1} yields the second form (Suomessa if the FINLAND label is used).
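
To make the %[...] notation concrete, here is a minimal sketch of how such a value might be split into forms and indexed. The names parseForms and selectForm are made up for this post and are not the library's API, and falling back to the first form when the index is out of range is an assumption (it lets a one-form English value answer a request for form 1).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a value such as "%[Suomi, Suomessa]" into its inflected forms.
// A value without the %[...] wrapper is treated as having a single form.
std::vector<std::string> parseForms(const std::string &value)
{
    if (value.size() < 3 || value.compare(0, 2, "%[") != 0 || value.back() != ']')
        return {value};

    std::vector<std::string> forms;
    std::string current;
    for (std::size_t i = 2; i + 1 < value.size(); ++i)
    {
        if (value[i] == ',')
        {
            forms.push_back(current); // a comma ends the current form
            current.clear();
        }
        else if (!(current.empty() && value[i] == ' '))
        {
            current += value[i]; // skip spaces that directly follow a comma
        }
    }
    forms.push_back(current);
    return forms;
}

// Returns the form at `index`, or the first form if `index` is out of range.
// Precondition: `forms` is not empty (parseForms never returns an empty vector).
std::string selectForm(const std::vector<std::string> &forms, std::size_t index)
{
    return forms[index < forms.size() ? index : 0];
}
```

With this, selectForm(parseForms("%[Suomi, Suomessa]"), 1) picks Suomessa, while the English value Finland has only one form and is returned for any index.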



So what _form does is tell the localiser that, when getting a label, it should replace a token with the requested inflected form of another label (i.e. replace the tokens in LABEL with one of the forms in FINLAND):

Code:
localiser.get("LABEL", "FINLAND"_form);



Code:
This is Finland. This is in Finland.
Tämä on Suomi. Tämä on Suomessa.

As you can see, only one argument is passed, because both tokens are index 0 (as defined by %{0}), only requesting different forms.
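
For anyone who wants to see the replacement step spelled out, here is a sketch of token expansion under stated assumptions. The function expand and the alias Forms are hypothetical names, the forms are assumed to be already parsed, and the label templates in the usage note are guessed from the output above rather than copied from the spreadsheet.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// All inflected forms of one argument, e.g. {"Suomi", "Suomessa"}.
using Forms = std::vector<std::string>;

// Replaces each %{n} or %{n;i} token in `text` with form i (default 0) of
// argument n. Assumes well-formed numeric tokens; std::stoul throws otherwise.
std::string expand(const std::string &text, const std::vector<Forms> &args)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        const std::size_t open = text.find("%{", pos);
        const std::size_t close =
            (open == std::string::npos) ? open : text.find('}', open);
        if (close == std::string::npos)
        {
            out.append(text, pos, std::string::npos); // no complete token left
            break;
        }
        out.append(text, pos, open - pos); // literal text before the token

        const std::string token = text.substr(open + 2, close - open - 2);
        const std::size_t semi = token.find(';');
        const std::size_t arg = std::stoul(token.substr(0, semi));
        const std::size_t form =
            (semi == std::string::npos) ? 0 : std::stoul(token.substr(semi + 1));

        if (arg < args.size() && !args[arg].empty())
        {
            // Assumption: an out-of-range form index falls back to form 0.
            const Forms &forms = args[arg];
            out += forms[form < forms.size() ? form : 0];
        }
        else
        {
            out.append(text, open, close - open + 1); // unknown argument: keep token
        }
        pos = close + 1;
    }
    return out;
}
```

Given a single argument with the forms Suomi and Suomessa, a template like "Tämä on %{0}. Tämä on %{0;1}." uses argument 0 twice and only the requested form differs, which is why one argument is enough.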

This might be useful if your software needs a list of countries that are dynamically determined by the program somewhere and they need to be used in different grammatical contexts. So you could keep defining labels for more countries and your program would pick the right one before requesting the label:

[image not preserved]

Also added Icelandic here to display a different strategy, adding the preposition to the inflected form, since Icelandic requires different prepositions depending on the country.

Quote from: BorisTheBrave on July 04, 2016, 11:40:18 AM
Also, you should mention what class _form constructs. We're likely to want variables, not literals!

Ain't that what auto is for? But sure, it's a karhu::localisation::Inflection. But I guess you meant you want to construct variables from other (string) variables, which I suppose the suffix can't do. For example in my above example of dynamic country names. That's a good point. Maybe I should add a little utility function to construct them as well. The class name is a bit too long. I guess I could change it to Form instead.

EDIT:
Changed the name of Inflection to Form and updated the info page a bit.
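
Roughly what a literal suffix plus a run-time helper could look like side by side. This is only a sketch: the Form struct here is a stand-in that stores a label name, the namespace sketch and the helper form() are hypothetical, and the real karhu::localisation::Form may be shaped differently.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace sketch
{
    // Hypothetical stand-in for karhu::localisation::Form; the real class may differ.
    struct Form
    {
        std::string label; // identifier of the label whose forms are requested
    };

    // Helper for labels that are only known at run time (e.g. a chosen country).
    inline Form form(std::string label)
    {
        return Form{std::move(label)};
    }

    // Literal suffix for labels known at compile time, mirroring "FINLAND"_form.
    inline Form operator""_form(const char *text, std::size_t length)
    {
        return Form{std::string(text, length)};
    }
}
```

After a using namespace sketch; both "FINLAND"_form and form(countryLabel) produce the same kind of object, so the second covers the dynamic country case where the label sits in a std::string variable.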
« Last Edit: July 05, 2016, 08:43:07 PM by Prinsessa »

BorisTheBrave
« Reply #5 on: July 05, 2016, 01:46:46 PM »

It's much clearer where you have a picture of CSV with just LABEL and FINLAND as above, thanks.

If I've understood correctly, your example is wrong, though. You are passing two parameters into localizer.get (plus LABEL), but the LABEL's localization only expects a single parameter, which it renders twice with different inflections. Perhaps you mean %{1;1}?
oahda
« Reply #6 on: July 05, 2016, 08:40:21 PM »

Quote from: BorisTheBrave on July 05, 2016, 01:46:46 PM
It's much clearer where you have a picture of CSV with just LABEL and FINLAND as above, thanks.

If I've understood correctly, your example is wrong, though. You are passing two parameters into localizer.get (plus LABEL), but the LABEL's localization only expects a single parameter, which it renders twice with different inflections. Perhaps you mean %{1;1}?

Oops. Good catch! Will update. Thanks. c:

I think the point I was trying to make, but failed, because I forgot I was trying to make it, was that you can actually just pass the argument once and get different forms if the argument has the same index, so %{0;1} would be fine and passing "FINLAND"_form again would be superfluous.

Code:
localiser.get("LABEL", "FINLAND"_form);

So that should be enough and token 0 will occur twice while requesting different forms.