Don't worry, PhysX doesn't require a PPU card to run, and it's the best physics engine I've used; only Havok can compare with it.
A PPU card simply makes it faster, because without one all the physics is calculated by the CPU. You can optimize it a bit using multithreaded programming on dual/quad-core processors, but in most cases it's more than fast enough.
And... it may seem strange, but it's also faster than Box2D. The only advantage of Box2D is on the cross-platform side, which I'm considering too... I'll see.
Also, Ageia is now part of NVIDIA, and they're working on new drivers to run physics on the GPU, so in the near future PhysX will also be hardware accelerated by NVIDIA graphics cards.
About the render-to-texture stuff: yes, it will be a lot faster with a pixel shader. You need to put a quad in front of your camera, then, instead of rendering to the backbuffer, render the scene to a texture. There are plenty of tutorials about that technique for OpenGL or DirectX; just search for something like "render-to-texture opengl" or "off-screen rendering opengl" (a few links I just found:
RTT with OpenGL, RTT with DirectX).
After that you will have a texture of the scene. Then you can choose between two ways to do a filter: use a pixel shader (called a fragment shader in OpenGL), rendering your quad with that texture on it through the shader, like I did; or lock your texture and change the pixel values manually (the texture is a 2D array of RGB values). The second way is slower because you'll be using the CPU, but it's also compatible with integrated chips and old graphics cards... and you can make it a lot faster using sin/cos LUTs anyway.
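Just to give an idea of the CPU route, here is a minimal C++ sketch, not the actual filter from my project. It assumes you have already locked the texture and copied it into a row-major array of 8-bit RGB pixels; the names Rgb, buildSinLut and waveFilter and the horizontal wave effect itself are made up for illustration. The sine table is built once, so the per-row work is just a table lookup instead of a sin() call.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// One RGB pixel, 8 bits per channel (assumed layout of the locked texture).
struct Rgb { std::uint8_t r, g, b; };

// Number of samples in the sine table, covering one full period.
constexpr int kLutSize = 256;

// Build the sine lookup table once, e.g. at startup.
std::vector<float> buildSinLut() {
    std::vector<float> lut(kLutSize);
    const float twoPi = 6.28318530718f;
    for (int i = 0; i < kLutSize; ++i)
        lut[i] = std::sin(twoPi * i / kLutSize);
    return lut;
}

// Hypothetical wave filter: every row is shifted sideways by
// amplitude * sin(row phase), wrapping around at the image borders.
// 'wavelength' is the number of rows in one full sine period.
std::vector<Rgb> waveFilter(const std::vector<Rgb>& src, int width, int height,
                            const std::vector<float>& lut,
                            float amplitude, int wavelength) {
    assert(width > 0 && height > 0 && wavelength > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    assert(lut.size() == static_cast<std::size_t>(kLutSize));

    std::vector<Rgb> dst(src.size());
    for (int y = 0; y < height; ++y) {
        // Map the row to a table index instead of calling sin() per row.
        const int lutIndex = (y * kLutSize / wavelength) % kLutSize;
        const int shift = static_cast<int>(std::lround(amplitude * lut[lutIndex]));
        for (int x = 0; x < width; ++x) {
            // Source column, wrapped into [0, width) even for negative shifts.
            const int sx = ((x + shift) % width + width) % width;
            dst[y * width + x] = src[y * width + sx];
        }
    }
    return dst;
}
```

You would call buildSinLut() once, run waveFilter() on the locked pixels every frame, then copy the result back into the texture and unlock it. Animating the effect is just a matter of adding a per-frame offset to the table index.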
If you're interested, I can post some HLSL examples of a 2D pixel shader (maybe in another thread!), and, if I have the time, also how to do the same thing on the CPU.