DevLog Update 03 – Optimization
You wouldn't think a simple game like Prune would need any optimization at all. I mean, it's basically one tree and some circles. But there are three things you have to realize: 1) I'm a terrible programmer, 2) trees tend to have a lot of moving parts, 3) see #1. I've become quite skilled at hacking together barely functional, terribly inefficient code in the name of "prototyping" and then moving on.
When I first started prototyping things, I didn't really see any slowdowns on my PC, so performance wasn't even on my mind. But when I tried the game on my iPad 2 for the first time, it brought the device to its knees. So I started reading up a bit on mobile performance and found I was basically doing everything wrong.
Put Your Draws Down
First order of business was to get my draw calls down. Turns out, keeping draw calls low is *super important* for mobile. For whatever reason, my branches were initially made out of quads instead of proper 2D sprites (in Unity). As soon as I would override the material color (e.g., infecting a branch red), the draw call count would go crazy. Instead of a pleasant number like <50, it would spike up to 500 or so to match the number of tree branches.
Luckily, this was an easy fix. After messing around with trying to batch the quads and use a shared material, I found simply switching to sprites took my draw calls down to the single digits where they should be for a game with a single silhouetted shrub.
iTween, uTween, weAllTween
Since I had used iTween on other projects, I naturally began using iTween to animate all the tree branches growing. Again, this was pretty much fine on PC, but as soon as I tested on mobile I found that iTween'ing 500+ objects in parallel was A Bad Idea due to the considerable overhead. I quickly found a much faster alternative in LeanTween.
I also ended up just writing my own simple little loops for lerping stuff like branch color and branch size since I wanted it to run as fast as possible and allow for Prune-specific stuff.
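A minimal sketch of what a hand-rolled lerp loop like this might look like, written in Python rather than Unity C# so it runs standalone. The names `BranchGrowth` and `update_all` are hypothetical and are not Prune's actual code; the point is only that a plain loop over small state objects avoids the per-object overhead of a general-purpose tween library.

```python
from dataclasses import dataclass
from typing import List


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation from a to b, with t clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return a + (b - a) * t


@dataclass
class BranchGrowth:
    """Hypothetical per-branch animation state (an assumption, not Prune's data)."""
    start_length: float
    end_length: float
    duration: float      # seconds the growth should take
    elapsed: float = 0.0
    length: float = 0.0

    def __post_init__(self) -> None:
        self.length = self.start_length

    def update(self, dt: float) -> bool:
        """Advance by dt seconds; return True while the branch is still animating."""
        self.elapsed = min(self.elapsed + dt, self.duration)
        # A zero or negative duration snaps straight to the end value.
        t = 1.0 if self.duration <= 0 else self.elapsed / self.duration
        self.length = lerp(self.start_length, self.end_length, t)
        return self.elapsed < self.duration


def update_all(active: List[BranchGrowth], dt: float) -> List[BranchGrowth]:
    """One plain loop per frame; finished branches drop out of the active list."""
    return [branch for branch in active if branch.update(dt)]
```

Calling `active = update_all(active, dt)` once per frame keeps the list shrinking as branches finish, so fully grown branches cost nothing.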
You Get A Collider, YOU Get A Collider...
For my initial prototype, I made the (hasty) decision that each branch would get a box collider (set to trigger). My branches were rectangles, so they deserve box colliders, right? Of course, this was overkill, but I didn't know it at the time. It made things relatively easy to get in quickly since I could just add a bunch of OnTriggerEnter type calls.
After a while I realized that all I really needed was a line segment to define a branch, so I switched over to Unity's Edge Collider 2D. This, combined with realizing I needed to add Rigidbody components (oops), ended up improving performance quite a bit.
So Many Branches
But all was not well in framerate land. I was still having major framerate drops on my iPad. I knew the number of branches trying to update at once was a problem. Early on, I went from a tree depth of 9 to a depth of 8, which basically cut my branch numbers in half (~500 instead of ~1000). This was an easy change because that final layer of teeny tiny branches was pretty much invisible anyway:
But I wasn't willing to go down any further in tree depth. I still somehow needed to update fewer branches every frame. My hunch was to implement some sort of round-robin system, and after talking it over with Aaron San Filippo, I decided to try it. Dividing the branches into 3 "pools" and then having them take turns updating helped immensely.
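The round-robin idea can be sketched roughly as follows, in Python so it runs on its own. `RoundRobinUpdater` and its method names are made up for illustration; the only detail taken from the post is the pool count of 3.

```python
from typing import Callable, List, Sequence


class RoundRobinUpdater:
    """Splits items into pools and updates exactly one pool per frame."""

    def __init__(self, items: Sequence, pool_count: int = 3) -> None:
        if pool_count < 1:
            raise ValueError("pool_count must be at least 1")
        # Deal items out like cards so pool sizes differ by at most one.
        self.pools: List[list] = [list(items[i::pool_count]) for i in range(pool_count)]
        self.turn = 0

    def tick(self, update: Callable[[object], None]) -> int:
        """Update the pool whose turn it is; return how many items were updated."""
        pool = self.pools[self.turn]
        for item in pool:
            update(item)
        self.turn = (self.turn + 1) % len(self.pools)
        return len(pool)
```

With 3 pools, each branch is touched only every third frame, so per-frame work drops to roughly a third. One consequence worth keeping in mind: anything time-based inside `update` has to account for the longer gap between calls, for example by using the time elapsed since that pool last ran instead of a single frame's delta.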
Throwing Out Collision
Things were feeling pretty good now. But I'd still get the occasional slowdown, especially in certain "worst-case" scenarios. I wanted the framerate to be ultra buttery smooth and it was not always ultra buttery smooth.
Relatively early on, I switched from a collision-based "swipe" to a manual swipe based on line segment intersection when pruning. This did wonders for the precision and reliability of pruning:
In the back of my mind, I knew that I probably needed to take this approach with all my branch "collisions." I knew the best solution would be to just get rid of branch collision altogether in favor of writing custom "collision" code for my situation. Well, I finally got around to doing this last week and I can say that I'm now pretty darn close to ultra buttery smooth!
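For reference, a segment-versus-segment test of the kind a swipe check or custom "collision" code could be built on might look like the sketch below. This is the standard orientation (cross product) test, not necessarily the approach used in Prune, and `find_cut_branches` along with its `(start, end)` branch representation is a hypothetical helper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); its sign says which side of line oa b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def within_box(p: Point, q: Point, r: Point) -> bool:
    """True if r lies inside the bounding box of segment pq (used for collinear cases)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 touches or crosses segment q1-q2."""
    d1 = cross(q1, q2, p1)
    d2 = cross(q1, q2, p2)
    d3 = cross(p1, p2, q1)
    d4 = cross(p1, p2, q2)
    # Proper crossing: each segment's endpoints straddle the other segment's line.
    if ((d1 > 0 > d2) or (d1 < 0 < d2)) and ((d3 > 0 > d4) or (d3 < 0 < d4)):
        return True
    # Touching or collinear cases: an endpoint lies on the other segment.
    if d1 == 0 and within_box(q1, q2, p1):
        return True
    if d2 == 0 and within_box(q1, q2, p2):
        return True
    if d3 == 0 and within_box(p1, p2, q1):
        return True
    if d4 == 0 and within_box(p1, p2, q2):
        return True
    return False


def find_cut_branches(swipe: Segment, branches: List[Segment]) -> List[int]:
    """Indices of the branches (as start/end point pairs) that the swipe crosses."""
    return [i for i, (a, b) in enumerate(branches)
            if segments_intersect(swipe[0], swipe[1], a, b)]
```

Because the swipe is tested as a whole segment from the previous touch position to the current one, a fast swipe cannot "tunnel" past a thin branch between frames, which is one plausible reason this style of check feels more reliable than trigger colliders.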
Future
I'm trying to avoid the whole "premature optimization is the root of all evil" thing, so I won't be doing these things if I don't have to, but two further areas I could optimize are:
- Budgeting/throttling - Place an upper limit on how many things can be updated in a single frame. Queue for later anything that exceeds this limit.
- Space partitioning - Divide the playspace into cells so that when doing my custom collision checks I'm only checking branches in the neighborhood.
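Both ideas in the list above are small enough to sketch. The Python below is an illustration under assumptions, not anything from Prune: `UpdateBudget` caps how many queued jobs run per frame, and `SpatialGrid` is a uniform grid that registers each segment in every cell its bounding box overlaps. All class names, method names, and the cell size are hypothetical.

```python
import math
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterator, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


class UpdateBudget:
    """Runs at most max_per_frame queued jobs each frame; the rest wait their turn."""

    def __init__(self, max_per_frame: int) -> None:
        self.max_per_frame = max_per_frame
        self.queue: Deque[Callable[[], None]] = deque()

    def request(self, job: Callable[[], None]) -> None:
        self.queue.append(job)

    def run_frame(self) -> int:
        """Run jobs in FIFO order up to the budget; return how many ran."""
        ran = 0
        while self.queue and ran < self.max_per_frame:
            self.queue.popleft()()
            ran += 1
        return ran


class SpatialGrid:
    """Uniform grid mapping each cell to the ids of segments that may overlap it."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Cell, Set[str]] = defaultdict(set)

    def _cells_for(self, a: Point, b: Point) -> Iterator[Cell]:
        # Every cell overlapped by the bounding box of segment a-b.
        x0 = math.floor(min(a[0], b[0]) / self.cell_size)
        x1 = math.floor(max(a[0], b[0]) / self.cell_size)
        y0 = math.floor(min(a[1], b[1]) / self.cell_size)
        y1 = math.floor(max(a[1], b[1]) / self.cell_size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def insert(self, item_id: str, a: Point, b: Point) -> None:
        for cell in self._cells_for(a, b):
            self.cells[cell].add(item_id)

    def query(self, a: Point, b: Point) -> Set[str]:
        """Ids sharing at least one cell with segment a-b (candidates only)."""
        found: Set[str] = set()
        for cell in self._cells_for(a, b):
            found |= self.cells.get(cell, set())
        return found
```

`query` only narrows the candidates; an exact test (such as a segment intersection check) still has to run on whatever it returns. The grid would also need rebuilding or updating whenever branches move, which is part of why this only pays off once the brute-force check is actually the bottleneck.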
So there it is. I'm probably missing some really obvious optimizations, but I do feel like I've learned a lot in the process. Plus, it's even kind of fun to squeeze out another few milliseconds (did I just say that?).