Schrompf
« Reply #2 on: May 18, 2017, 02:29:00 AM » |
That's a deep cave you're looking at.
First: negative numbers are not negative by nature; negativity is just an interpretation applied when using them. Some operations, printing in particular, do differ. But most operations, such as addition or multiplication, do exactly the same thing for signed and unsigned types.
So just cast those numbers to an unsigned type and you can use them as array indices.
Second: the array to look into. This is where the dragons live. You have a number, and you want to know which glyph to put on screen. But there are waaayyy more characters than there are values in a char type. And over the years, many people have found multiple solutions to this.
"Unicode" is the table of all characters that mankind has come up with. It's basically a huge table which defines an index and a graphics for each and every letter, digit or thingy you can think of. It currently has 136690 entries.
"Encoding" is the method by which you store the charactes as bytes in memory. There's a lot of different encodings, mainly because back in the days they tried to get away with single bytes per character. But because you know that bytes (as the char type in C/C++) can only differentiate 256 values, but the number of characters is >100k, most chars are simply not present in those encodings. Old Windows versions, for example, had a "table" of characters suitable only for certain world regions, and were simply unable to hold chars from other regions. Western Europe was ISO 8859-1, a table containing umlauts such as äöüß, but it was literally impossible to store Chinese characters in there. So a Windows version for China, for example, used a different encoding. As did Eastern Europe. There were a lot of those tables. If you read a value of 184, for example, it meant Ö in one table but ò in another or whatever.
So it was obvious that if you want to solve this once and for all, you'll have to use more than one byte per character. Nowadays, the default for everyone except a few Microsoft programs is UTF8. UTF8 is an interesting idea because for plain ASCII text it's compatible byte for byte. But once you leave the ASCII range and hit values >127 (the "negative" numbers you mentioned), the first byte denotes how many additional bytes after it belong to the same character, and some bit fiddling gives you the actual character index.
If you're going to write your font renderer, a) use UTF8 everywhere and b) convert strings from and to the current Windows API using one of the dozens of conversion functions. Because Microsoft decided to use UTF16 instead, and that's yet another method of storing characters in bytes.