This week I've been mostly focused on various aspects of my data building process. But I'll start with the work I did during Indie Dev Hour:
Indie Dev Hour - More Glowing Eyes
This was once again a very productive hour. Something about having a time limit in which to work really focuses me!
I continued what I started last week, adding the glow layer to the robots' sprites to make their eyes glow. This is a manual process of making copies of each image and deleting all of the non-glowing parts of the image. I was able to get through nearly all of the sprites, except for a couple of variations of one of the enemies, and I stuck them all in a more-dimly-lit-than-usual room to show it off:
How To Play Screen - Adding Videos
My main focus this week was to add some more How To Play videos to the game. I previously wrote a post detailing the process of compressing the videos by splitting them up into small tiles. Since then I've also implemented a first pass of the How To Play screen itself, which I don't seem to have shown off yet. Here's a video of the screen, including the new videos I've added this week:
Build Tool optimisation
I didn't plan on working on my data building process this week; however, these videos were really starting to stretch my build tool to its limits. Let's go over the work that the tool has to do:
1) Load the 788 individual cels that comprise the six videos. Each cel is 128x80 in size.
2) Decompose them into 8x8 square pieces - each 128x80 cel yields 160 pieces, so that's 788 * 160 = 126,080 pieces in total.
3) Do a first pass to combine neighbouring duplicate cels together. For example, if the first 6 frames of a video have no movement in them - just Leilani standing still - it'll compress them into a single frame. This process leaves 617 frames made of 98,720 pieces.
4) Detect duplicate pieces throughout all videos. All 98,720 pieces are compared with each other to check if they contain the same pixels, and one is marked as a duplicate of the other. This leaves 8,365 non-duplicate pieces.
5) Pack the 8,365 pieces into a 1024x512 image.
6) Save the image.
7) Save the sprite XML data that specifies which pieces are used for each cel.
Some of these steps were horrendously slow! (Times are in minutes:seconds.)
3) 0:26
4) 2:34
5) 1:03
7) 1:00
I was able to speed them up with some code optimisations. It essentially all came down to the existing code being written without much thought about speed. It's tempting to call it 'lazy' code - but I think that encourages a bad attitude. It's perfectly fine to write slow code that does its job, and to only think about improving it later if you need to! It's only the large amount of data being processed here that made the optimisations worthwhile.
The optimisations for each step are as follows:
3) Combining neighbouring duplicate cels - 0:26
In this step, the source textures for two neighbouring cels are compared to see if they're identical. This is done by going through each pixel of a texture and combining the values of the pixels into a single hash value. The hash values of both images are compared - if they're identical then we know* that the textures are identical, and the cels can be combined.
(*Side note: If you don't know what a hash value is, it's a small value that represents a much larger and more complex set of data. In this case, the R, G, B and A values of each of the 10,240 pixels in the image are combined together into a single 32-bit integer. This is really useful for quickly comparing large data sets for equality - you can calculate the hash value of many data sets just once, and then simply check if the hash values of two of the data sets are equal. However, when I say "we know that the textures are identical", this isn't strictly true, as it's possible for two different textures to produce the same hash value. This is unlikely enough that I don't bother to handle it.)
My code for generating the hash values of images was very slow. At the time I wrote it, the only function I had for calculating a hash value took a string as its input. To calculate the hash value of the texture, I put all of the RGBA values of each pixel into a long string, and then calculated the hash value of the string. This process of building a string was slow and unnecessary. If the RGBA values of each pixel need 12 characters, the whole texture needs 122,880 characters or 120 KB. That's a long string.
Instead I added functions for calculating a combined hash value from the individual integer R, G, B and A values with no strings involved. This reduced step 3 from 0:26 to less than 1 second.
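As a rough sketch of what that looks like (the function names and the mixing recipe here are my assumptions - this uses the common boost-style combiner constant, not necessarily the tool's actual hash function):

```cpp
#include <cstdint>

// Fold one integer into an existing hash (boost-style combiner).
uint32_t HashCombine(uint32_t seed, uint32_t value)
{
    return seed ^ (value + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Hash a texture's pixels directly - no intermediate string needed.
// 'pixels' is RGBA8 data, 4 bytes per pixel.
uint32_t HashTexture(const uint8_t* pixels, int pixelCount)
{
    uint32_t hash = 0;
    for (int i = 0; i < pixelCount; ++i)
    {
        hash = HashCombine(hash, pixels[i * 4 + 0]); // R
        hash = HashCombine(hash, pixels[i * 4 + 1]); // G
        hash = HashCombine(hash, pixels[i * 4 + 2]); // B
        hash = HashCombine(hash, pixels[i * 4 + 3]); // A
    }
    return hash;
}
```

The key point is that each pixel's channels feed the hash directly as integers, so there's no 120 KB string to build and throw away per texture.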
4) Detect duplicate pieces - 2:34
Firstly, this step was already sped up slightly by the previous optimisation, as the individual pieces of the images are also compared using hash values generated by the same code.
The main culprit of the slowdown was some bad usage of containers. The 98,720 pieces are stored in a contiguous array (std::vector in C++). Contiguous means they are stored in a single uninterrupted block of memory. This is actually a good thing, as it makes iterating through all of the pieces in order quicker: when accessing one entry in the array, the CPU will tend to cache what's nearby in memory and can then access the next entry very quickly. However, when I found a duplicate piece, I was immediately removing it from the array. If the 10th piece is a duplicate, and is removed, then the 98,710 pieces that follow it all have to be moved back one space in the array to maintain its contiguous nature. Since about 90,000 of the pieces ended up being removed, one by one, you can see that this causes a lot of unnecessary work moving everything around in memory.
The solution was, instead of removing each duplicate piece from the array immediately, just mark it as a duplicate by setting a boolean flag on it. Then, after finding all duplicates, I do a single pass of moving all of the non-duplicate pieces down to the front of the array, and then trimming off all of the unneeded pieces from the end of the array. This reduced step 4 from 2:34 to around 6 seconds.
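The mark-then-compact pattern can be sketched like this (the real tool's bookkeeping is more involved - each duplicate records which piece it duplicates - so this only shows the container-usage fix, using the standard erase-remove idiom for the single compaction pass):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Piece
{
    uint32_t hash;
    bool isDuplicate;
};

// Mark duplicates first, then compact the vector in one pass, instead
// of erasing elements one at a time (each erase shifts the whole tail
// of the array back by one slot).
void RemoveDuplicates(std::vector<Piece>& pieces)
{
    // Marking pass (a simplified O(n^2) comparison for the sketch).
    for (size_t i = 0; i < pieces.size(); ++i)
        if (!pieces[i].isDuplicate)
            for (size_t j = i + 1; j < pieces.size(); ++j)
                if (pieces[j].hash == pieces[i].hash)
                    pieces[j].isDuplicate = true;

    // Single compaction pass: move survivors to the front, trim the end.
    pieces.erase(
        std::remove_if(pieces.begin(), pieces.end(),
                       [](const Piece& p) { return p.isDuplicate; }),
        pieces.end());
}
```

std::remove_if shuffles the survivors forward exactly once each, so the total work is linear in the array size rather than quadratic.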
5) Packing pieces - 1:03
The packing process is a generic algorithm that takes a given set of rectangles, and the size of the space in which they all need to fit*, and decides the location of every rectangle. So it's not specifically designed for packing images, but that's what I'm using it for here.
(*Side note: I determine the target size of the texture by adding up the area of all of the pieces. In this case the code decides that all the pieces should fit into a 1024x512 area. In the event that they didn't fit - due to awkwardly shaped pieces that meant there was wasted space - it would try again with a 1024x1024 texture, but usually the pieces fit on the first try.)
During the packing process, there's a function CollidesWithPacked that's called very often. A rectangle is passed into the function, and it tests if this rectangle overlaps with any of the already-packed rectangles. This function was very inefficient - it simply looped over every packed rectangle and tested for the overlap. It's easy to see how the time adds up here. By the time we're packing the 8,001st rectangle, for every position that we consider packing it into, we have to check against the 8,000 rectangles that were already packed.
A simple optimisation here was to be smarter about how CollidesWithPacked searches for possible collisions. The whole packing space is now divided into evenly sized buckets - both across and down the space. After we finish packing a rectangle, we add it into all of the buckets it overlaps with. Then, when testing for collisions, we also check which buckets the input rectangle overlaps with. We know that only those packed rectangles that are in the overlapped buckets could possibly collide with the input rectangle.
This is a bit like a quad tree, but I don't have any quad tree code so just kept it simple! This reduced step 5 from 1:03 to less than 1 second.
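A minimal sketch of the bucket idea (the class layout, bucket size, and names are all my own - the real CollidesWithPacked presumably lives inside the packer):

```cpp
#include <algorithm>
#include <vector>

struct Rect { int x, y, w, h; };

bool Overlaps(const Rect& a, const Rect& b)
{
    return a.x < b.x + b.w && b.x < a.x + a.w &&
           a.y < b.y + b.h && b.y < a.y + a.h;
}

// A uniform grid of buckets over the packing space. Each bucket holds
// the indices of packed rects that touch it, so a collision query only
// inspects rects in the buckets the input rect overlaps.
class BucketGrid
{
public:
    BucketGrid(int width, int height, int bucketSize)
        : m_bucketSize(bucketSize),
          m_cols((width + bucketSize - 1) / bucketSize),
          m_rows((height + bucketSize - 1) / bucketSize),
          m_buckets(m_cols * m_rows) {}

    void AddPacked(const Rect& r)
    {
        int index = (int)m_packed.size();
        m_packed.push_back(r);
        int c0, c1, r0, r1;
        BucketRange(r, c0, c1, r0, r1);
        for (int by = r0; by <= r1; ++by)
            for (int bx = c0; bx <= c1; ++bx)
                m_buckets[by * m_cols + bx].push_back(index);
    }

    bool CollidesWithPacked(const Rect& r) const
    {
        int c0, c1, r0, r1;
        BucketRange(r, c0, c1, r0, r1);
        for (int by = r0; by <= r1; ++by)
            for (int bx = c0; bx <= c1; ++bx)
                for (int i : m_buckets[by * m_cols + bx])
                    if (Overlaps(m_packed[i], r))
                        return true;
        return false;
    }

private:
    // Which range of bucket columns/rows does this rect touch?
    void BucketRange(const Rect& r, int& c0, int& c1, int& r0, int& r1) const
    {
        c0 = std::max(0, r.x / m_bucketSize);
        c1 = std::min(m_cols - 1, (r.x + r.w - 1) / m_bucketSize);
        r0 = std::max(0, r.y / m_bucketSize);
        r1 = std::min(m_rows - 1, (r.y + r.h - 1) / m_bucketSize);
    }

    int m_bucketSize, m_cols, m_rows;
    std::vector<Rect> m_packed;
    std::vector<std::vector<int>> m_buckets;
};
```

A rect spanning a bucket boundary lands in several buckets, so a query might test it more than once - but since we return on the first collision found, that duplication is harmless.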
7) Saving out sprite XML data - 1:00
Finally, yet another case of inefficiently searching for information. While saving out the XML data for a packed sprite, a function called FindSpritePiece is called very often to look up the data of a packed sprite piece so that it can be written to the XML. You can probably predict where this is going - FindSpritePiece was doing a straightforward loop through every known sprite piece until it found the relevant one. The removal of duplicates didn't help here - it's looping through all 126,080 pieces.
The solution was to build a look-up table. In C++ I used a std::map for this, but I think it might be called a Dictionary in some languages. I generated a hash value for each piece based on the information that's used to look it up. Each piece is then stored in the look-up table using the hash value as the key. When FindSpritePiece is called later, a hash value is generated from the provided information, which can then be used to near-instantly look up the piece stored under that same hash value in the look-up table. This sped up step 7 from 1:00 to less than 1 second.
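In sketch form it might look like this (what actually goes into the key is my assumption - here I've guessed a cel index plus the piece's offset within the cel - and the struct and names are hypothetical):

```cpp
#include <cstdint>
#include <map>

struct SpritePiece
{
    int sheetX, sheetY; // position of the piece in the packed sheet
};

// Build a key from the data used to look a piece up. Packing the
// fields into one integer doubles as the "hash value" map key.
uint64_t MakePieceKey(int celIndex, int offsetX, int offsetY)
{
    return ((uint64_t)(uint32_t)celIndex << 32) |
           ((uint64_t)(uint16_t)offsetX << 16) |
           (uint64_t)(uint16_t)offsetY;
}

std::map<uint64_t, SpritePiece> g_pieceLookup;

// O(log n) look-up instead of a linear scan over 126,080 pieces.
const SpritePiece* FindSpritePiece(int celIndex, int offsetX, int offsetY)
{
    auto it = g_pieceLookup.find(MakePieceKey(celIndex, offsetX, offsetY));
    return it != g_pieceLookup.end() ? &it->second : nullptr;
}
```

(std::map is a balanced tree, so look-ups are O(log n); std::unordered_map would give the same effect with average O(1) look-ups - either way, far faster than scanning every piece.)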
Phew!
That was a lot of information - hopefully it provides some insight into how easy it is to really destroy the speed of a piece of code if you're not careful (or, put another way, how a little care and attention can really speed up a piece of code later!). These optimisations don't directly affect the final game, but they will save me from waiting around for long periods whenever I have to build the data. The problem would only have gotten worse as I continued to add more How To Play videos. As it currently stands, thanks to the optimisations the data for the entire game can be built in around 1:09 - very satisfying.
It'd be rude to write all this and not show the result - here are the How To Play videos in packed form.
Sprite XML Data condensation
I'm not done with this devlog yet though! I made another optimisation to the build process, but not one of speed.
This is the pre-build-process XML data that defines one of the How To Play videos - "Jump".
<Strip name="Jump"
filename="Jump%04i.png"
filePerCel="true"
cels="113"
length60="2"
/>
The total file that defines all six videos is around 1 KB in size.
After the build process, this XML data is much less simple. Each cel of each video (788 in total) has to be individually specified, as well as the source and position of each of the 160 pieces within each cel. Let's look at the "Jump" video again:
<CompositeStrip name="Jump" spriteSheet="PackedVideo.spritesheet" initialBackwardsCel="112">
<Cel length60="16">
<Piece sourceSprite="0" offsetx="-60" offsety="-36" />
<Piece sourceSprite="0" offsetx="-52" offsety="-36" />
<Piece sourceSprite="0" offsetx="-44" offsety="-36" />
<Piece sourceSprite="0" offsetx="-36" offsety="-36" />
<Piece sourceSprite="0" offsetx="-28" offsety="-36" />
... and 145 other pieces
</Cel>
... and 112 other cels
</CompositeStrip>
The file is now 99,971 lines long, a hefty 6.33 MB. I didn't do any tests to see if this was particularly slow to load - but nevertheless I wasn't happy with leaving the data at that size.
I created a new data format to use for cels with many pieces. The overall file format is still XML - but data is simply condensed into a single XML tag rather than being defined by separate XML tags.
<CompositeStrip name="Jump" spriteSheet="PackedVideo.spritesheet" initialBackwardsCel="112">
<Cel length60="16">
<CondensedPieceData pieceCount="160">
p0x-60y-36p0x-52y-36p0x-44y-36p0x-36y-36p0x-28y-36p0x-20y-36p0x-12y-36p0x-4y-36p0x4y-36p0x12y-36p0x20y-36p0x28y-36p0x36y-36p0x44y-36p0x52y-36p0x60y-36p0x-60y-28p0x-52y-28p0x-44y-28p0x-36y-28p0x-28y-28p0x-20y-28p0x-12y-28p0x-4y-28p0x4y-28p0x12y-28p0x20y-28p0x28y-28p0x36y-28p0x44y-28p0x52y-28p0x60y-28p0x-60y-20p0x-52y-20p0x-44y-20p0x-36y-20p0x-28y-20p0x-20y-20p0x-12y-20p0x-4y-20p0x4y-20p0x12y-20p1x20y-18p1x28y-18p1x36y-18p1x44y-18p1x52y-18p1x60y-18p0x-60y-12p0x-52y-12p0x-44y-12p0x-36y-12p0x-28y-12p0x-20y-12p0x-12y-12p0x-4y-12p0x4y-12p3x14y-12p2x20y-12p2x28y-12p2x36y-12p2x44y-12p2x52y-12p2x60y-12p0x-60y-4p0x-52y-4p0x-44y-4p0x-36y-4p0x-28y-4p0x-20y-4p0x-12y-4p0x-4y-4p0x4y-4p3x14y-4p2x20y-4p2x28y-4p2x36y-4p2x44y-4p2x52y-4p2x60y-4p4x-58y6p5x-52y5p6x-44y6p0x-36y4p0x-28y4p0x-20y4p0x-12y4p0x-4y4p0x4y4p3x14y4p2x20y4p2x28y4p2x36y4p2x44y4p2x52y4p2x60y4p7x-58y11p8x-52y12p9x-45y12p0x-36y12p0x-28y12p0x-20y12p0x-12y12p0x-4y12p0x4y12p3x14y12p2x20y12p2x28y12p2x36y12p2x44y12p2x52y12p2x60y12p0x-60y20p10x-52y20p11x-44y20p0x-36y20p0x-28y20p0x-20y20p0x-12y20p0x-4y20p0x4y20p3x14y20p2x20y20p2x28y20p2x36y20p2x44y20p2x52y20p2x60y20p1x-60y30p12x-52y28p13x-44y28p1x-36y30p1x-28y30p1x-20y30p1x-12y30p1x-4y30p1x4y30p14x12y28p2x20y28p2x28y28p2x36y28p2x44y28p2x52y28p2x60y28p2x-60y36p2x-52y36p2x-44y36p2x-36y36p2x-28y36p2x-20y36p2x-12y36p2x-4y36p2x4y36p2x12y36p2x20y36p2x28y36p2x36y36p2x44y36p2x52y36p2x60y36
</CondensedPieceData>
</Cel>
... and 112 other cels
</CompositeStrip>
While still a lot of data, the file size was reduced to 944 KB which seems more reasonable.
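Reading the condensed run back is simple too. Here's a sketch of a parser for the "p<sprite>x<offsetx>y<offsety>" runs shown above (the format matches the example; the parsing code itself and the PieceRef struct are my own, not the tool's actual implementation):

```cpp
#include <cstdlib>
#include <string>
#include <vector>

struct PieceRef { int sourceSprite, offsetX, offsetY; };

// Parse a condensed "p0x-60y-36p0x-52y-36..." run into piece records.
std::vector<PieceRef> ParseCondensedPieceData(const std::string& data)
{
    std::vector<PieceRef> pieces;
    const char* s = data.c_str();
    while (*s == 'p')
    {
        PieceRef p;
        char* end;
        p.sourceSprite = (int)std::strtol(s + 1, &end, 10);   // after 'p'
        p.offsetX      = (int)std::strtol(end + 1, &end, 10); // after 'x'
        p.offsetY      = (int)std::strtol(end + 1, &end, 10); // after 'y'
        pieces.push_back(p);
        s = end; // now points at the next 'p', or the end of the run
    }
    return pieces;
}
```

Since each field is delimited by the single letters 'p', 'x' and 'y', strtol can walk the whole run without any splitting or copying.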
Duplicate Sprite detection bug
Will this devlog never end? We're nearly there. This final thing brings us full circle, back to the glowing eye sprites I mentioned at the start. While working on all of this sprite packing stuff, I looked at the packed sprites for the laser enemy and noticed that some of the sprites that should have been recognised as duplicates weren't.
Specifically, it's the sprites that I had manually separated out into different layers - the glowing parts, and the hands - where identical sprites weren't always being recognised as duplicates.
As mentioned earlier in this post, duplicate sprites are detected by generating a hash value from the RGBA values of each pixel of a sprite, and comparing it with the hash value of the other sprite. I realised that the problem was invisible RGB data lurking in pixels that have an Alpha value of 0, left over from when the sprites were last edited. It's really easy to overlook this, as to the human eye the fully transparent pixels look identical. A simple fix - treating any pixel with an Alpha value of 0 as completely black - worked!
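The fix amounts to a normalisation pass before hashing - something like this (the function name is mine; whether the tool rewrites the pixels or just substitutes black while hashing is an assumption):

```cpp
#include <cstdint>

// Normalise fully transparent pixels to black before hashing, so that
// leftover invisible RGB data can't make two visually identical
// sprites hash differently. 'pixels' is RGBA8 data, 4 bytes per
// pixel, modified in place.
void NormaliseTransparentPixels(uint8_t* pixels, int pixelCount)
{
    for (int i = 0; i < pixelCount; ++i)
    {
        uint8_t* p = pixels + i * 4;
        if (p[3] == 0)               // alpha == 0: pixel is invisible
            p[0] = p[1] = p[2] = 0;  // so zero out R, G and B
    }
}
```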
Thanks for reading!