Topic: Benchmarking Stage3D
Sam
« Reply #20 on: March 14, 2012, 08:49:19 AM »

Just as your tests so far are revealing, I think that Stage3D isn't particularly well suited as a replacement for the straight drawing of bitmaps to screen.

For hardware acceleration to shine, as much processing as possible needs to be shifted to the graphics hardware. Handing it a few thousand textured quads and telling it where to draw them keeps a great deal of the work on the CPU. I don't know how ND2D works internally, but as you're setting the position of sprites in code I assume that's either directly altering the vertex data or altering some vertex constant (I'd guess it's the former as it's just moving quads around).

I made a little particle animation system a couple of days ago:

Which you can try on your own machine.


That's 100,000 billboarded sprites being animated and rendered at 60fps. I find that Flash runs out of memory for my embarrassingly bloated vertex buffers before I can generate enough particles to get the frame rate to drop.

The key is that they're being animated entirely on the graphics hardware. Each frame all the CPU needs to do is increase a float that represents elapsed time and upload a bunch of vertex and index buffers to the graphics card. The waterfall bounce animation is generated in realtime by a vertex shader from a few key variables that are stored in the vertex buffer.

This shifting of work away from the CPU can be seen in the "Frame time" display, which is simply the difference between getTimer() at the start and end of the update loop. You can see the difference if you run the same test in software mode (click the "Use Hardware" button at the start screen), where the frame time quite quickly reaches the point that it limits the frame rate.
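A minimal sketch of the kind of update loop described above, not the actual demo code. The class name, the `drawParticles` callback and the choice of vertex constant register 4 are all assumptions made for illustration; the Context3D is assumed to be created and fully set up elsewhere.

```actionscript
package
{
    import flash.display.Sprite;
    import flash.display3D.Context3D;
    import flash.display3D.Context3DProgramType;
    import flash.events.Event;
    import flash.utils.getTimer;

    public class GpuAnimatedLoop extends Sprite
    {
        // Hypothetical: assigned by the caller once the Context3D exists,
        // with the program, vertex buffers and textures already set on it.
        public var context:Context3D;

        // Hypothetical callback that issues the drawTriangles() calls.
        public var drawParticles:Function;

        // CPU time spent in the last update, in milliseconds.
        public var frameTimeMs:int = 0;

        // Reused every frame so the loop does not allocate.
        private var timeConstant:Vector.<Number> = new <Number>[0, 0, 0, 0];

        public function GpuAnimatedLoop()
        {
            addEventListener(Event.ENTER_FRAME, onEnterFrame);
        }

        private function onEnterFrame(e:Event):void
        {
            if (!context || drawParticles == null) return;

            var start:int = getTimer();

            // The only animation state the CPU touches: elapsed seconds,
            // written to one vertex constant (register 4 is an arbitrary choice).
            timeConstant[0] = start / 1000;
            context.setProgramConstantsFromVector(Context3DProgramType.VERTEX, 4, timeConstant);

            context.clear(0, 0, 0, 1);
            drawParticles();
            context.present();

            // Same measurement as the "Frame time" display: end minus start.
            frameTimeMs = getTimer() - start;
        }
    }
}
```

Note that this measures CPU time only. The GPU works asynchronously, so a low frame time here does not by itself prove the GPU is keeping up.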


But as these particles are being animated entirely on the graphics card there's no simple way to have the rest of your game access their position or to have them react dynamically to the rest of the game. So these would be fine for a purely decorative effect, but difficult to use for drawing thousands of bullets in a shmup (although possible if you don't mind representing the same object twice, and correctly manage keeping them in sync).

Basically I think that for Stage3D to be able to flex its muscles it has to be freed from the traditional Flash (and most 2d games) approach of having every item in the world at a set position which you update and then pass to the renderer for drawing. The CPU is most certainly the weak link and as much work as possible needs to be taken away from it.
st33d
Guest
« Reply #21 on: March 15, 2012, 05:40:03 AM »

Well that demo runs perfectly fine on my machine, so it seems that the issue is more how you manage your graphics and what you intend to upload.

Starling I think does a terrible job, but it's very easy to learn. ND2D is impenetrable to beginners and is not as optimised as it could be (from my perspective), however it's faster and it's open enough to hack more speed into it.

On the subject of spotting the users who are going to be left behind I found this article:

http://www.mcfunkypants.com/2011/flash11-stage3d-tutorial-handling-init-errors/

Not having a great time with just getting Stage3D to run on its own. I've never built vertex shaders before so I've hit a wall and unpicking ND2D's framework is an unfolding nightmare of complexity.

st33d
Guest
« Reply #22 on: March 15, 2012, 08:18:08 AM »

Trying to piece together rendering something. Found this gem in the comments on

http://www.adobe.com/devnet/flashplayer/articles/how-stage3d-works.html

Code:
Hi ! I try run Stage3D test. It not works ! :((
 
My test:
 
package
{
  import flash.display.Sprite;
  import flash.display.Stage3D;
   
  public class Test extends Sprite
  {
  public function Test()
  {
    graphics.beginFill(0);
    graphics.drawRect(0,0,100,200);
    graphics.endFill();
     
    var st3:Object=stage.stage3Ds[0];
  }
  }
}

Oh god I wish it were that simple...
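For contrast, a hedged sketch of roughly the minimum needed just to obtain a Context3D and clear the screen, including the error listener the init-errors article linked earlier is concerned with. The class name is hypothetical, and checking `driverInfo` for the word "software" is a common heuristic rather than an official flag.

```actionscript
package
{
    import flash.display.Sprite;
    import flash.display.Stage3D;
    import flash.display3D.Context3D;
    import flash.events.ErrorEvent;
    import flash.events.Event;

    public class Stage3DInit extends Sprite
    {
        private var stage3D:Stage3D;
        private var context:Context3D;

        public function Stage3DInit()
        {
            // The stage reference is only valid once we are on the display list.
            if (stage) init();
            else addEventListener(Event.ADDED_TO_STAGE, init);
        }

        private function init(e:Event = null):void
        {
            removeEventListener(Event.ADDED_TO_STAGE, init);

            stage3D = stage.stage3Ds[0];
            // The context arrives asynchronously; it cannot be read straight away.
            stage3D.addEventListener(Event.CONTEXT3D_CREATE, onContextCreated);
            // Dispatched when no context can be created (e.g. wrong wmode in the embed).
            stage3D.addEventListener(ErrorEvent.ERROR, onContextError);
            stage3D.requestContext3D();
        }

        private function onContextCreated(e:Event):void
        {
            context = stage3D.context3D;
            context.configureBackBuffer(stage.stageWidth, stage.stageHeight, 0, false);

            // Heuristic: the software fallback reports "Software" in driverInfo.
            var software:Boolean = context.driverInfo.toLowerCase().indexOf("software") != -1;
            trace("Stage3D ready:", context.driverInfo, software ? "(software fallback)" : "");

            // Nothing is drawn yet: just clear to grey and show the frame.
            context.clear(0.2, 0.2, 0.2, 1);
            context.present();
        }

        private function onContextError(e:ErrorEvent):void
        {
            trace("Stage3D unavailable:", e.text);
        }
    }
}
```

Drawing anything beyond a cleared screen still needs a program, vertex and index buffers, which is where the real work starts.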
Sam
« Reply #23 on: March 15, 2012, 09:01:43 AM »

If it's helpful to you, I have an example bare bones "hello triangle" project. The comments outweigh the actual code as it's fairly thorough. Hopefully it is useful for seeing what needs to happen to get Stage3D to draw something, free from the trappings of any framework/engine.

I've also written a couple of articles about AGAL and Stage3D's registers and stuff; you can just filter my site for the Stage3D category.
st33d
Guest
« Reply #24 on: March 15, 2012, 09:41:18 AM »

Thanks man.

It has occurred to me that in the same way I use the BitmapData class as a fast pipeline for processing tile-based levels, I might be able to leverage shaders to do something similar. So it's definitely worth learning.

Going to try and get some iOS results tomorrow on the current benchmarks to give my brain a break.
Triplefox
« Reply #25 on: March 15, 2012, 12:36:09 PM »

You should benchmark with larger bitmaps as well. My intuitive experience is that copyPixels gets considerably slower as you draw larger sprites and use a larger stage resolution, and so my expectation is that the GPU frameworks will do better.
Moczan
Guest
« Reply #26 on: March 15, 2012, 04:41:41 PM »

Quote from: Triplefox
"You should benchmark with larger bitmaps as well. My intuitive experience is that copyPixels gets considerably slower as you draw larger sprites and use a larger stage resolution, and so my expectation is that the GPU frameworks will do better."

copyPixels is just an operation on memory. We shouldn't benchmark it in objects displayed but in pixels copied ;) It's great as long as you only do translation. Rotations and alpha can quickly slow it down; on the other hand they are almost free on the GPU, and that's the real improvement I'm looking for.
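A rough sketch of what benchmarking "in pixels copied" could look like, assuming a hypothetical `BlitBenchmark` helper; the destination pattern is arbitrary and only there to spread the blits across the canvas.

```actionscript
package
{
    import flash.display.BitmapData;
    import flash.geom.Point;
    import flash.geom.Rectangle;
    import flash.utils.getTimer;

    public class BlitBenchmark
    {
        /**
         * Blits `count` copies of `sprite` onto `canvas` and returns the
         * cost in milliseconds per million pixels copied, so results for
         * different sprite sizes can be compared directly.
         */
        public static function msPerMegapixel(canvas:BitmapData, sprite:BitmapData, count:int):Number
        {
            var rect:Rectangle = sprite.rect;
            var dest:Point = new Point();
            var maxX:int = Math.max(1, canvas.width - sprite.width);
            var maxY:int = Math.max(1, canvas.height - sprite.height);

            var start:int = getTimer();
            canvas.lock();
            for (var i:int = 0; i < count; i++)
            {
                // Arbitrary deterministic scatter of destinations.
                dest.x = (i * 37) % maxX;
                dest.y = (i * 53) % maxY;
                canvas.copyPixels(sprite, rect, dest);
            }
            canvas.unlock();
            var elapsed:int = getTimer() - start;

            // Normalise by the work done rather than by the number of objects.
            var pixelsCopied:Number = Number(count) * sprite.width * sprite.height;
            return elapsed / (pixelsCopied / 1000000);
        }
    }
}
```

Running it with the same `count` for, say, 16x16 and 128x128 sprites shows whether the cost scales with pixels rather than with sprite count. getTimer() has millisecond resolution, so `count` needs to be large enough for the total to be well above 1 ms.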
Fallsburg
« Reply #27 on: March 16, 2012, 05:01:12 AM »

Yeah, leveraging the shaders is a big deal.  For example: I have a 1024x1024 bitmap (texture) and I want to blur it by 32 pixels.  In standard AS3, it takes my computer 28ms, so I would be straining to go at 30 fps, but with a decent shader, I can do it all day long at 60fps while having another thousand sprites dancing around. 
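For anyone wanting to reproduce the software side of that comparison, a small timing sketch. The blur quality setting and the noise fill are assumptions (the post does not say which were used), and the resulting number will differ from machine to machine.

```actionscript
package
{
    import flash.display.BitmapData;
    import flash.filters.BlurFilter;
    import flash.geom.Point;
    import flash.utils.getTimer;

    public class BlurTiming
    {
        /** Returns the milliseconds taken by one 32 pixel CPU blur of a 1024x1024 bitmap. */
        public static function timeSoftwareBlur():int
        {
            var source:BitmapData = new BitmapData(1024, 1024, true, 0xFF000000);
            source.noise(1); // fill with noise so the blur has real content to work on

            var result:BitmapData = new BitmapData(1024, 1024, true, 0);
            // Quality 1 is an assumption; higher quality repeats the blur and costs more.
            var blur:BlurFilter = new BlurFilter(32, 32, 1);

            var start:int = getTimer();
            result.applyFilter(source, source.rect, new Point(0, 0), blur);
            return getTimer() - start;
        }
    }
}
```

At 30 fps the whole frame budget is about 33 ms, which is why a blur in the high twenties of milliseconds leaves almost nothing for the rest of the game.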

I also have a tilemap shader that takes in a color coded bitmapdata and scales it up.  I can render a 131072px x 131072px level (2048 tiles x 2048 tiles, with 64 pixel tiles) in the same amount of time it takes me to render anything else.

My inclination is that Stage3D is pretty powerful, but going about it in a naive way like Starling (and nd2d, to a lesser degree) do only gets marginal results.  That being said, most games are only going to need a couple thousand sprites displayed at most, and you get the rotation, scaling, and alpha for free, so I don't know why one wouldn't use Stage3D if they were just starting a game.
raigan
« Reply #28 on: March 16, 2012, 03:42:35 PM »

Quote from: Fallsburg
"I also have a tilemap shader that takes in a color coded bitmapdata and scales it up.  I can render a 131072px x 131072px level (2048 tiles x 2048 tiles, with 64 pixel tiles) in the same amount of time it takes me to render anything else."

Do you mean that each pixel in the color-coded bitmap represents a tile? That's pretty awesome! How does it work?

Fallsburg
« Reply #29 on: March 16, 2012, 04:50:15 PM »

Yup, that's exactly what I mean.

The main portion is a fragment shader that takes in two textures and a couple of parameters.  Each pixel in the first texture (the tilemap) is some combination of red (x coord) and green (y coord) that corresponds to the UV coordinates in the second texture (the texture atlas, of sorts).  The parameters are the number of tiles per row/column in the tilemap (right now it only takes square maps) and the number of tiles per row/column in the texture atlas (again, it only takes square texture atlases).

From there, you scale up the first texture, do a sample on it (to get the coordinate of the tile it points to), then you scale down its UV coordinates by the number of tiles per row/col of the tilemap (to get the proper UV coord for the tile itself), you add the sample and the new UV coord and then sample the second texture with that coordinate. 
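The steps above can be followed on the CPU with a small reference function. This is only an illustrative sketch of the arithmetic for a single pixel, not the shader itself; the class and function names are hypothetical, and it assumes the usual normalisation of 8-bit channels (0..255) to 0..1 when a texture is sampled.

```actionscript
package
{
    import flash.geom.Point;

    public class TileLookupReference
    {
        /**
         * CPU version of the per-pixel lookup.
         *
         * @param u             horizontal UV on the scaled-up quad, in [0, 1)
         * @param v             vertical UV on the scaled-up quad, in [0, 1)
         * @param red           red channel (0..255) of the tilemap pixel under (u, v)
         * @param green         green channel (0..255) of that same pixel
         * @param tilesPerRow   tiles per row/column of the tilemap
         * @param spritesPerRow tiles per row/column of the texture atlas
         * @return              the UV at which to sample the texture atlas
         */
        public static function atlasUV(u:Number, v:Number, red:uint, green:uint,
                                       tilesPerRow:Number, spritesPerRow:Number):Point
        {
            // 1. The tilemap sample gives the corner of the wanted tile in the atlas.
            var cornerU:Number = red / 255;
            var cornerV:Number = green / 255;

            // 2. Multiply the quad UV by the tile count and keep the fractional
            //    part: that is the position inside the current tile, in [0, 1).
            var scaledU:Number = u * tilesPerRow;
            var scaledV:Number = v * tilesPerRow;
            var insideU:Number = scaledU - Math.floor(scaledU);
            var insideV:Number = scaledV - Math.floor(scaledV);

            // 3. Shrink that position to the size of one atlas cell and
            // 4. add it to the tile's corner.
            return new Point(cornerU + insideU / spritesPerRow,
                             cornerV + insideV / spritesPerRow);
        }
    }
}
```

For a 2x2 atlas, a tilemap pixel with red = 128 and green = 0 selects the top-right tile, and (u, v) only decides where inside that tile the sample lands.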

It's a little hard to explain, but it's actually pretty damn easy to use.  If anyone wants my shader and the associated code, I can set them up. 

raigan
« Reply #30 on: March 16, 2012, 05:29:08 PM »

Quote from: Fallsburg
"From there, you scale up the first texture, do a sample on it (to get the coordinate of the tile it points to), then you scale down its UV coordinates by the number of tiles per row/col of the tilemap (to get the proper UV coord for the tile itself), you add the sample and the new UV coord and then sample the second texture with that coordinate."

I think I follow you; if I understand correctly, you're drawing a fullscreen quad (or similar) to evaluate this shader once for each pixel onscreen, right? Pretty cool idea to save lots of memory/bandwidth!
Fallsburg
« Reply #31 on: March 16, 2012, 05:56:16 PM »

Quote from: raigan
"I think I follow you; if I understand correctly, you're drawing a fullscreen quad (or similar) to evaluate this shader once for each pixel onscreen, right? Pretty cool idea to save lots of memory/bandwidth!"

Yeah, pretty much.  Honestly, I'm not quite sure how it works (my graphics pipeline knowledge is pretty poor), but it definitely saves on texture memory/draw calls/number of vertices/number of objects to keep track of.
Franklin's Ghost
« Reply #32 on: March 17, 2012, 06:39:40 AM »

Quote from: Fallsburg
"It's a little hard to explain, but it's actually pretty damn easy to use.  If anyone wants my shader and the associated code, I can set them up."

Hey Fallsburg, I think I partially understand, but it would be awesome if I could get your shader and associated code. I've always been better with code to look through to understand things :) Thinking of starting my next game soon and just want to approach it from the best position possible.
Fallsburg
« Reply #33 on: March 17, 2012, 07:06:04 AM »

Code:

package circlecat.materials
{
    import de.nulldesign.nd2d.geom.Face;
    import de.nulldesign.nd2d.materials.shader.ShaderCache;
    import de.nulldesign.nd2d.materials.Sprite2DMaterial;
    import de.nulldesign.nd2d.materials.texture.Texture2D;
    import flash.display3D.Context3D;
    import flash.display3D.Context3DProgramType;

    /**
     * Tilemap material: each pixel of the main texture (the tilemap)
     * selects one tile from a square sprite sheet.
     * @author Adam Summerville
     */
    public class CCTileMaterial extends Sprite2DMaterial
    {
        protected const FRAGSHADER:String =
                "tex ft0, v0, fs0 <2d,miplinear,nearest,clamp>\n" + // sample the tilemap: red/green = corner of the tile in the sheet
                "mul ft1, v0, fc2.y\n" +  // scale the quad UV by the number of tiles per row
                "frc ft1, ft1\n" +        // fractional part = position inside the current tile
                "mul ft1, ft1, fc2.x\n" + // shrink it to the size of one tile in the sheet
                "add ft1, ft0, ft1\n" +   // offset by the tile's corner
                "tex ft2, ft1, fs1 <2d,mipnone,nearest,clamp>\n" + // sample the sprite sheet at the computed UV
                "m44 oc, ft2, fc3\n";     // apply the colour transform matrix

        protected var sheetTexture:Texture2D;
        protected var spritesPerRow:Number;
        protected var pixelsPerRow:Number;

        /**
         * @param tex           The sprite sheet to use (no spaces between tiles, must be square)
         * @param SpritesPerRow The number of sprites per row in the sheet
         * @param PixelsPerRow  The number of tiles per row in the tilemap (one tilemap pixel per tile)
         */
        public function CCTileMaterial(tex:Texture2D, SpritesPerRow:Number, PixelsPerRow:Number)
        {
            spritesPerRow = SpritesPerRow;
            pixelsPerRow = PixelsPerRow;
            sheetTexture = tex;
            super();
        }

        /**
         * Encodes a tile index as the tilemap colour the shader expects:
         * red holds the tile's column and green its row, each as a fraction
         * of the sheet scaled into an 8-bit channel.
         */
        public static function getTileColor(tileIndex:Number, tilesPerRow:Number):uint
        {
            var tileColor:uint = 0xff000000;
            var tileRed:uint = 256 * ((tileIndex % tilesPerRow) / tilesPerRow);
            var tileGreen:uint = 256 * (Math.floor(tileIndex / tilesPerRow) / tilesPerRow);
            tileColor += tileRed * 256 * 256 + tileGreen * 256;
            return tileColor;
        }

        override protected function prepareForRender(context:Context3D):void
        {
            super.prepareForRender(context);
            context.setTextureAt(1, sheetTexture.getTexture(context));

            var constantsVector:Vector.<Number> = new Vector.<Number>();
            constantsVector.push(1 / spritesPerRow); // fc2.x: 1 / number of sprites per row in the sheet (needs to be square)
            constantsVector.push(pixelsPerRow);      // fc2.y: number of tiles per row/column (needs to be square)
            constantsVector.push(0);
            constantsVector.push(1);

            context.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 2, constantsVector);
            context.setProgramConstantsFromVector(Context3DProgramType.FRAGMENT, 3, colorTransformMatrix);
        }

        override protected function clearAfterRender(context:Context3D):void
        {
            context.setTextureAt(1, null);
            super.clearAfterRender(context);
        }

        override public function dispose():void
        {
            super.dispose();

            if (sheetTexture)
            {
                sheetTexture.dispose();
                sheetTexture = null;
            }
        }

        override public function handleDeviceLoss():void
        {
            super.handleDeviceLoss();
            sheetTexture = null;
        }

        override protected function initProgram(context:Context3D):void
        {
            if (!shaderData)
            {
                shaderData = ShaderCache.getInstance().getShader(context, this, VERTEX_SHADER, FRAGSHADER, 4, texture.textureOptions);
            }
        }
    }
}

So, there's the shader and the code.  I'm using nd2d, so this is set up as an nd2d material, but given that all of the rendering happens in here it should be roughly transferrable to just a Stage3d environment.

The main thing not shown here is the setting up of the vertex buffer, but that's just a standard quad.
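For readers working without nd2d, here is a hedged sketch of what a "standard quad" can look like in raw Stage3D. The vertex layout (x, y, u, v), the unit size and the class name are assumptions for illustration; nd2d builds its own buffers internally and its layout may differ.

```actionscript
package
{
    import flash.display3D.Context3D;
    import flash.display3D.IndexBuffer3D;
    import flash.display3D.VertexBuffer3D;

    public class QuadBuffers
    {
        public var vertices:VertexBuffer3D;
        public var indices:IndexBuffer3D;

        public function QuadBuffers(context:Context3D)
        {
            // Four corners of a unit quad centred on the origin.
            // Assumed layout per vertex: x, y, u, v.
            var vertexData:Vector.<Number> = new <Number>[
                -0.5, -0.5, 0, 0,
                 0.5, -0.5, 1, 0,
                 0.5,  0.5, 1, 1,
                -0.5,  0.5, 0, 1
            ];

            // Two triangles sharing the diagonal from vertex 0 to vertex 2.
            var indexData:Vector.<uint> = new <uint>[0, 1, 2, 0, 2, 3];

            vertices = context.createVertexBuffer(4, 4); // 4 vertices, 4 floats each
            vertices.uploadFromVector(vertexData, 0, 4);

            indices = context.createIndexBuffer(6);
            indices.uploadFromVector(indexData, 0, 6);
        }
    }
}
```

With this layout the position would be bound with `setVertexBufferAt(0, vertices, 0, Context3DVertexBufferFormat.FLOAT_2)` and the UV with `setVertexBufferAt(1, vertices, 2, Context3DVertexBufferFormat.FLOAT_2)`, followed by `drawTriangles(indices)`. Whether v = 0 ends up at the top or bottom of the screen depends on the projection used in the vertex shader.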

Couple of things:
1) You have to scale up the quad by at least a factor of the size of your tiles (so if your tiles are 32x32, you should scale the quad by at least 32x32), otherwise you won't show the entire tile.  If you want to scale things more, I would suggest whole multiples of your tile size (e.g. a 32x32 tile should be scaled up by 32x32 or 64x64) as the nearest neighbor sample works best under those conditions.
2) Because it needs to do a nearest neighbor sample, it is probably best to not rotate the sprite.  I haven't done it yet, but I'm betting it isn't pretty.
3) The only other weirdness of this is that the camera has to be on a whole pixel, i.e. no fractional part.  I still don't understand quite why this is, but if it is at a fractional location (e.g. 1.1,1.2 instead of 1.0,1.0) there will be an outer ring of 1 pixel around the tiles that is set to some unknown value.  This is weird, but honestly sub-pixel fidelity on the camera is typically not necessary.
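Points 1 and 3 come down to two small calculations, sketched here with hypothetical helper names. This assumes the tilemap texture has one pixel per tile, so the quad's on-screen size is the tile count times the tile size (times a whole multiple if you scale further).

```actionscript
package
{
    public class TileQuadHelpers
    {
        /**
         * On-screen size of the tilemap quad along one axis, in pixels.
         * `multiple` is the extra whole-number zoom factor (1 = tiles at native size).
         */
        public static function quadSizeInPixels(tilesPerRow:int, tileSize:int, multiple:int = 1):int
        {
            return tilesPerRow * tileSize * Math.max(1, multiple);
        }

        /** Snaps a camera coordinate to a whole pixel, as point 3 requires. */
        public static function snapToPixel(value:Number):Number
        {
            return Math.round(value);
        }
    }
}
```

For the level size mentioned earlier in the thread, `quadSizeInPixels(2048, 64)` gives 2048 x 64 = 131072 pixels per side. Applying `snapToPixel` to the camera's x and y just before rendering keeps smooth movement in the game logic while avoiding fractional camera positions.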
st33d
Guest
« Reply #34 on: March 22, 2012, 03:01:25 AM »

iOS tests took quite a while. I compiled a bunch of tests without overlaying the latest AIR SDK. I had to follow these instructions to get it to work:

http://blog.michaeljbowen.com/?p=127

Then I had to redo all the tests. I also double checked that I was in OpenGL rendering mode for the Starling and ND2D tests.

I tested using an iPod 3G, which is our office's "worst case scenario" device. I tested GPU and CPU rendering compiles for each test. Mostly because Adobe is claiming that there is a big speed benefit to CPU - which I found to be true in only one edge case.







They're a bit like the desktop results to be brutally honest.

Starling seems to be able to sustain a greater load over time, whereas ND2D can manage more before its initial choke.

CPU rendering works better only for static MovieClips it appears. Pretty much the same optimisation that is going on in the desktop Flash Player - which I think is good, but could mislead some people doing benchmarks into thinking it's better. Across the board, it isn't.

Most surprising is the blitting, everyone is saying it's bad but my figures say otherwise.

I'll be trying to set up a repo on github now for all of the code for these tests so people can scrutinise them. Don't take anyone's benchmark results at face value. Especially not mine.

I think there's still some raw Stage3D tests to be done. I might take a break though and get some more game programming done at work.
st33d
Guest
« Reply #35 on: March 22, 2012, 06:14:08 AM »

Here's the repository for all of the code from my tests:

https://github.com/nitrome/stage3dbenchmarking

There are also the spreadsheets uploaded and the ipa files.

If you'd like to contribute stats or improve on my benchmark code please do. The tests should be simple, transparent and practical for use in games.

I'm going to take a break before I tackle shaders.