I don't store scale, since I never use it.
As for translation, I need it for now because I store each bone in its "parent space" rather than in "skeleton bind-pose space". So at every computed frame I multiply each computed bone by its inverse bind-pose matrix. I know this may be inefficient, but that's how it is for now, and it makes it easier for me to bind an animation to the skeleton later in the engine rather than at import time. If that changes one day, not storing translation would be a plus!
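To make the per-frame step above concrete, here is a minimal sketch of turning parent-space bone transforms into skinning matrices. It assumes column-major 4x4 matrices with the column-vector convention (p' = M * p), bones sorted so that every parent comes before its children, and hypothetical names (`Bone`, `computeSkinningMatrices`, `translation`); none of this comes from the original engine.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major 4x4 matrix: element (row, col) lives at index col * 4 + row.
using Mat4 = std::array<float, 16>;

Mat4 identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[12] = x; m[13] = y; m[14] = z;  // column 3 holds the translation
    return m;
}

// r = a * b
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a[k * 4 + row] * b[col * 4 + k];
            r[col * 4 + row] = s;
        }
    return r;
}

struct Bone {
    int parent;            // -1 for the root; parents must precede children
    Mat4 inverseBindPose;  // inverse of the bone's model-space bind pose
};

// local[i] is bone i's animated transform in its parent's space for one frame.
// Returns one skinning matrix per bone: modelSpace * inverseBindPose.
std::vector<Mat4> computeSkinningMatrices(const std::vector<Bone>& bones,
                                          const std::vector<Mat4>& local) {
    assert(bones.size() == local.size());
    std::vector<Mat4> model(bones.size());
    std::vector<Mat4> skin(bones.size());
    for (std::size_t i = 0; i < bones.size(); ++i) {
        // Concatenate down the hierarchy: parent model space * local.
        model[i] = bones[i].parent < 0
                       ? local[i]
                       : mul(model[static_cast<std::size_t>(bones[i].parent)], local[i]);
        skin[i] = mul(model[i], bones[i].inverseBindPose);
    }
    return skin;
}
```

A quick sanity check with this layout: if a frame reproduces the bind pose exactly, every skinning matrix comes out as the identity.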
As for orientation, I've seen quaternion compression schemes that don't store "w" and recompute it as "w = sqrt(1 - (x*x + y*y + z*z))", but yes, I could also lower the precision... Do you mean using some software "half float" implementation? Or fixed point?
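The "drop w" idea works because q and -q encode the same rotation: flip the sign so that w >= 0, store only x, y, z, and recover w from the unit-length constraint. A minimal sketch, with hypothetical type and function names and assuming the input quaternion is normalized:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Quat { float x, y, z, w; };
struct PackedQuat { float x, y, z; };  // w is not stored

PackedQuat dropW(const Quat& q) {
    // Expects a unit quaternion.
    assert(std::fabs(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w - 1.0f) < 1e-3f);
    // q and -q are the same rotation; negate so the dropped w is non-negative.
    float s = q.w < 0.0f ? -1.0f : 1.0f;
    return {s * q.x, s * q.y, s * q.z};
}

Quat restoreW(const PackedQuat& p) {
    // x^2 + y^2 + z^2 + w^2 = 1  =>  w = sqrt(1 - (x^2 + y^2 + z^2)).
    // The clamp guards against small negative values caused by rounding.
    float xyz = p.x * p.x + p.y * p.y + p.z * p.z;
    return {p.x, p.y, p.z, std::sqrt(std::max(0.0f, 1.0f - xyz))};
}
```

One known drawback: when w is close to 0, small errors in x, y, z produce a comparatively large error in the recovered w, so this combines less well with aggressive quantization than schemes that drop the largest component.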
For quaternions in 32 bits, there's a fairly good, albeit brief, description of the same algorithm I use in the second-to-last paragraph here (I think it originally appeared in one of the Game Programming Gems books, but I can't find the reference):
http://bitsquid.blogspot.co.uk/2009/11/bitsquid-low-level-animation-system.html
For compressing quaternions to 64 bits, I wouldn't bother doing anything clever. Just multiply each component by 32767 and store the result in a short. There are plenty of cleverer ways to do it, but this is fast to implement and to compute, and is probably accurate enough for the vast majority of cases.
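A minimal sketch of that 64-bit scheme: each component in [-1, 1] is scaled by 32767 and stored in a signed 16-bit integer. The names are hypothetical, and two details are my own additions rather than part of the suggestion above: rounding to nearest instead of truncating, and clamping so that a quaternion slightly over unit length cannot overflow the short.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Quat { float x, y, z, w; };
struct Quat64 { std::int16_t x, y, z, w; };  // 4 x 16 bits = 64 bits

std::int16_t packComponent(float v) {
    v = std::clamp(v, -1.0f, 1.0f);  // guard against slightly denormalized input
    return static_cast<std::int16_t>(std::lround(v * 32767.0f));
}

float unpackComponent(std::int16_t s) {
    return static_cast<float>(s) / 32767.0f;
}

Quat64 pack64(const Quat& q) {
    return {packComponent(q.x), packComponent(q.y),
            packComponent(q.z), packComponent(q.w)};
}

Quat unpack64(const Quat64& p) {
    Quat q{unpackComponent(p.x), unpackComponent(p.y),
           unpackComponent(p.z), unpackComponent(p.w)};
    // Optional: renormalize to remove the small length error from quantization.
    float len = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    return {q.x / len, q.y / len, q.z / len, q.w / len};
}
```

The worst-case error per component is about 1/65534, which is typically far below what is visible on a skinned mesh.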
It might be wise to make compression a per-animation option, though, in case some animations need to be extra accurate for some reason.
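For the 32-bit case linked above, a widely used approach is the "smallest three" encoding. Drop the component with the largest magnitude and store its index in 2 bits. The remaining three components are then bounded by 1/sqrt(2) in magnitude, so each fits in 10 bits. I believe this is what the bitsquid post describes, but the exact bit layout below is my assumption and the names are hypothetical; check the post before relying on it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Quat { float x, y, z, w; };

constexpr float kRange = 0.70710678f;   // 1/sqrt(2): bound on the three smallest components
constexpr std::uint32_t kMax10 = 1023;  // largest 10-bit value

std::uint32_t quantize10(float v) {
    float t = std::clamp((v / kRange + 1.0f) * 0.5f, 0.0f, 1.0f);  // [-kRange, kRange] -> [0, 1]
    return static_cast<std::uint32_t>(std::lround(t * kMax10));
}

float dequantize10(std::uint32_t u) {
    return (static_cast<float>(u) / kMax10 * 2.0f - 1.0f) * kRange;
}

// Layout: bits 30-31 = index of the dropped component, then 3 x 10 bits.
std::uint32_t packSmallestThree(const Quat& q) {
    float c[4] = {q.x, q.y, q.z, q.w};
    int largest = 0;
    for (int i = 1; i < 4; ++i)
        if (std::fabs(c[i]) > std::fabs(c[largest])) largest = i;
    // q and -q are the same rotation: flip so the dropped component is positive.
    float sign = c[largest] < 0.0f ? -1.0f : 1.0f;
    std::uint32_t bits = static_cast<std::uint32_t>(largest) << 30;
    int shift = 20;
    for (int i = 0; i < 4; ++i) {
        if (i == largest) continue;
        bits |= quantize10(sign * c[i]) << shift;
        shift -= 10;
    }
    return bits;
}

Quat unpackSmallestThree(std::uint32_t bits) {
    int largest = static_cast<int>(bits >> 30);
    float c[4];
    float sumSq = 0.0f;
    int shift = 20;
    for (int i = 0; i < 4; ++i) {
        if (i == largest) continue;
        c[i] = dequantize10((bits >> shift) & kMax10);
        sumSq += c[i] * c[i];
        shift -= 10;
    }
    // Rebuild the dropped (positive) component from the unit-length constraint.
    c[largest] = std::sqrt(std::max(0.0f, 1.0f - sumSq));
    return {c[0], c[1], c[2], c[3]};
}
```

With a per-animation option, an exporter could pick this 32-bit encoding for most clips and fall back to the 64-bit one for clips that need more accuracy; how that option is named and stored is up to the engine.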