Yeah, on second thought, I won't be able to avoid manual annotation, because the tiles that need to be visible or invisible depend on the scenario. Right now I just do what you said and organize tiles by hand into groups that fade in or out simultaneously.
Here are a couple of references, FWIW.
http://justindjohnson.com/softdev/isometric-occlusion/
This guy uses collision areas to determine occlusion. AFAIK his underlying data isn't 3D; it's just like a space shooter with funny-shaped buildings.
http://simianzombie.com/posts/2018/01/29/isometric-demo
This guy has a much more rigorous approach and actually uses a Z-buffer to determine occlusion of moving objects pixel by pixel. He even considers rendering the scene by raycasting into a voxel volume, which sounds rad.
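For anyone who hasn't seen a Z-buffer before, the per-pixel test boils down to roughly this. It's a rough sketch of the general technique, not his actual code: the function name `blit_with_depth`, the list-of-lists buffers, and the "smaller depth means nearer" convention are all my own assumptions.

```python
import math


def blit_with_depth(color_buf, depth_buf, sprite, sprite_depth, ox, oy):
    """Draw a sprite at (ox, oy), keeping only pixels nearer than what is already there.

    Assumed convention: a smaller depth value is closer to the camera,
    and a sprite pixel of None is transparent.
    """
    height = len(depth_buf)
    width = len(depth_buf[0])
    for row, (colors, depths) in enumerate(zip(sprite, sprite_depth)):
        for col, (color, depth) in enumerate(zip(colors, depths)):
            if color is None:
                continue  # transparent pixel, nothing to draw
            x, y = ox + col, oy + row
            if not (0 <= x < width and 0 <= y < height):
                continue  # clipped off-screen
            if depth < depth_buf[y][x]:
                # This pixel is nearer than the stored one, so it wins.
                depth_buf[y][x] = depth
                color_buf[y][x] = color


# A 3x1 "screen": a wall at depth 5, then a moving object partly behind it.
color_buf = [[None, None, None]]
depth_buf = [[math.inf, math.inf, math.inf]]
blit_with_depth(color_buf, depth_buf, [["wall", "wall"]], [[5, 5]], 0, 0)
blit_with_depth(color_buf, depth_buf, [["hero", "hero"]], [[7, 3]], 1, 0)
```

The nice part is that draw order stops mattering: the hero pixel at depth 7 loses to the wall no matter which one is drawn first.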
I am keeping it slow & simple: breaking my objects into unit cubes and rendering back-to-front and bottom-to-top.
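Roughly, the draw order looks like the sketch below. The `Cube` type and `draw_order` name are just for illustration, and I'm assuming the camera looks down from the +x/+y/+z side, so a larger x + y means nearer to the viewer and a larger z means higher up.

```python
from typing import NamedTuple


class Cube(NamedTuple):
    x: int  # grid column
    y: int  # grid row
    z: int  # height level


def draw_order(cubes):
    """Sort unit cubes back-to-front, then bottom-to-top (painter's algorithm).

    Assumes larger x + y is nearer the camera and larger z is higher.
    Any cube that is >= another in x, y, and z gets drawn after it.
    """
    return sorted(cubes, key=lambda c: (c.x + c.y, c.z))


cubes = [Cube(1, 1, 0), Cube(0, 0, 1), Cube(0, 0, 0), Cube(1, 0, 0)]
for cube in draw_order(cubes):
    print(cube)  # draw each cube's sprite here instead of printing
```

Since every cube is the same size, a plain sort like this is enough; the ordering only gets tricky once objects span more than one cell, which is why I break them up.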