Topic: Shader Performance and distance calc
Sigma
« on: December 14, 2014, 12:05:29 PM »

Hi,
I have a shader that performs a constant 30 distance calculations per pixel in my fragment shader. It's fine on Adreno GPUs and modern iPhones, but on the iPhone 4S the performance drops as low as 10 fps. Is there a way to avoid the distance formula inside the shader and still achieve the same result with minor errors? I mean, is there any replacement for the distance formula that has only minor errors?

All help is highly appreciated :)

ThemsAllTook
« Reply #1 on: December 14, 2014, 12:23:52 PM »

I'm not sure you'll be able to get better performance with it than the GPU's sqrt(), but this is the method I know: http://en.wikipedia.org/wiki/Newton%27s_method

Iterate more times for better precision, or fewer times for better performance.

Fallsburg
« Reply #2 on: December 14, 2014, 01:09:18 PM »

You can converge faster than Newton's with higher order derivatives as well.

BorisTheBrave
« Reply #3 on: December 14, 2014, 03:22:41 PM »

I'm not an expert on shader performance, but generally speaking you can get some improvement by reworking the code so that the sqrt never has to be computed. Recall that length(p) = sqrt(dot(p,p)). So we can do some tricks:

Instead of
  if(length(p)>r)
do
  if(dot(p,p)>r*r)

Instead of
  texture(my_1d_tex, length(p))
do
  texture(my_1d_tex, dot(p,p))

where my_1d_tex is distorted appropriately, i.e. indexed by squared distance instead of distance (note that without care you may get some precision issues)

Instead of:
  pow(length(p), k)
do
  pow(dot(p,p), k / 2.0)

Hope one of those identities is applicable to you.
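
The first identity can be sanity-checked on the CPU. This is a small C++ sketch, with `Vec2`, `dot` and `outsideRadius` as illustrative stand-ins for the GLSL built-ins, and it assumes r is non-negative (otherwise squaring r changes the result).

```cpp
struct Vec2 { float x, y; };

float dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }

// True when p lies outside the circle of radius r (r >= 0) centred on the
// origin. Squared lengths are compared, so no sqrt is needed.
bool outsideRadius(Vec2 p, float r) { return dot(p, p) > r * r; }
```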

Gtoknu
« Reply #4 on: December 14, 2014, 04:12:12 PM »

Boris pretty much nailed it. But here are some more tips to deal with the sqrts:


Move every calculation you can out of the shaders. If something is known beforehand, don't recalculate it.
For example, if you send a uniform but only ever use its sqrt, send the sqrt directly instead of the raw value!
The same goes for varyings (though I'm not sure the interpolation will come out right, since the sqrt of an interpolated value is not the interpolated sqrt; you may have to test).
If you post your entire shader, we may be able to offer more than just guesses.

Sigma
« Reply #5 on: December 14, 2014, 09:07:21 PM »

Thanks for the replies. I will try all the solutions and update the thread in the evening.

Columbo
« Reply #6 on: December 14, 2014, 10:35:44 PM »

Can you do the calculation at a lower precision? Use mediump instead of highp perhaps?

A 4S has a pretty reasonable GPU, and a distance calculation is not that expensive. Are you sure it's the distance calculation that's killing the frame rate? Could you post the shader? Maybe there's something else going on, like a dependent texture read.

surt
« Reply #7 on: December 14, 2014, 11:18:13 PM »

Probably not minor enough an error but there is the octagonal distance approximation.
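
For reference, a C++ sketch of one common form of the octagonal approximation. The coefficient choice (1 and sqrt(2) - 1) is an assumption; other variants trade a small underestimate for a lower peak error. This form is exact along the axes and diagonals and overestimates by at most about 8% in between.

```cpp
#include <algorithm>
#include <cmath>

// Approximates sqrt(dx*dx + dy*dy) as max + (sqrt(2) - 1) * min, where max
// and min are the larger and smaller of |dx| and |dy|. Never underestimates.
float octagonalDistance(float dx, float dy) {
    float ax = std::fabs(dx);
    float ay = std::fabs(dy);
    float hi = std::max(ax, ay);
    float lo = std::min(ax, ay);
    return hi + 0.41421356f * lo;
}
```

In a shader the same expression maps onto abs, max and min, which avoids the sqrt entirely.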

Sigma
« Reply #8 on: December 15, 2014, 12:44:34 AM »

Quote from: Columbo on December 14, 2014, 10:35:44 PM
Can you do the calculation at a lower precision? Use mediump instead of highp perhaps?

A 4S has a pretty reasonable GPU, and a distance calculation is not that expensive, are you sure it's the distance calculation that's killing the frame rate? Could you post the shader? Maybe there's something else going on like a dependent texture read.

I can post it once I get home in the evening. What I noticed is that commenting out the distance calculation boosts performance from 10 fps to 30 fps. Also, I'm already using medium precision, not high precision.

Sigma
« Reply #9 on: December 15, 2014, 12:46:39 AM »

Quote from: surt on December 14, 2014, 11:18:13 PM
Probably not minor enough an error but there is the octagonal distance approximation.

It's interesting; I will give it a try and update :)

Sigma
« Reply #10 on: December 15, 2014, 10:25:53 AM »

Quote from: Gtoknu on December 14, 2014, 04:12:12 PM
Boris pretty much nailed it. But here are some more tips to deal with the sqrts:

Move every calculation you can off the shaders. If something is known before-hand, don't recalculate it.
For example, you send a uniform, but just use its sqrt: Instead of sending that value, send its sqrt directly!
The same is valid for varyings. (though, i'm not sure the interpolation will get right. You may have to test)
If you post your entire shader, we may be able to help a little more than just guesses.

I tried all the solutions, but nothing meets expectations.

mediump vec2 pixelPos = v_texCoord * u_resolution;

lightDist = distance(pixelPos, lights[0].position);
lightDist = lightDistDivider/lightDist;
v_fragmentIntensity = v_fragmentIntensity + (lights[0].color * (lightDist * lights[0].intensity));


This is the pixel shader code. Can you see any bottlenecks in it? The same piece of code is repeated nearly 30 times, because putting it in a for loop eats the fps, as branching is not handled efficiently on most mobile devices. I also tried using varying variables, but that did not help; maybe I'm handling them wrong.

Suppose I'm passing the distance calc for each object in from outside the shader. How can I handle that using varying variables?

What I tried:

1) For each object, I passed the distance of the object from each corner of the device and stored it in the vertex shader in arrays like this: distFromBottomLeft[30], distFromBottomRight[30], .....TopLeft[30], ......TopRight[30]

2) Based on the vertex position, I assign the varying lightDist[30] (also declared in the pixel shader) as follows:
      1) if vertex_position.x == 0.0 then left
          if vertex_position.y > 0.0 then top_left
          assign lightDist[index] in pixel shader = distFromTopLeft[index]

I know the above way of using varying variables might be wrong. Can you guys help me with how to use varying variables?


All help is highly appreciated :)

Columbo
« Reply #11 on: December 15, 2014, 11:38:20 AM »

lightDistDivider, lights[0].color and lights[0].intensity all get multiplied together, so you could just ditch lightDistDivider and lights[0].intensity and fold them into the lights[0].color constant.
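
A rough sketch of that folding done on the CPU before the uniforms are uploaded. The `Color3` type and the `foldLightConstants` helper are hypothetical names for illustration; they are not part of cocos2d-x or the posted code.

```cpp
struct Color3 { float r, g, b; };

// Pre-multiply a light's colour by its intensity and the shared divider once
// on the CPU, so the shader only has to compute colour / distance.
Color3 foldLightConstants(Color3 color, float intensity, float distDivider) {
    float k = intensity * distDivider;
    return Color3{color.r * k, color.g * k, color.b * k};
}
```

The shader line then shrinks to something like adding lights[0].color / lightDist, saving two multiplies per light per pixel.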

But I think probably trying to have 30 per-pixel lights is pushing it a bit. If you have 30 lights in your scene then the usual approach with forward rendering is to select a maximum of about 4 of the closest lights for each object/chunk of the environment and render just those. Having different permutations of the shader for different numbers of lights is another valuable approach (e.g. Have a 1 light, 2 light, 4 light and 8 light version of your shader, and pick the lowest light count you can get away with).

FWIW, I also believe that squared falloff is more technically correct than linear falloff (not that lighting models have to be physically correct). So you could see whether you can make the lighting look good with something like:

vec2 vToLight = lights[0].position - pixelPos;
lightDist = lightDistDivider/dot(vToLight, vToLight);

If that solves your performance problem, then that might be OK for you.

This is quite a nice article on optimising shaders: http://aras-p.info/blog/2011/02/01/ios-shader-tricks-or-its-2001-all-over-again/


BorisTheBrave
« Reply #12 on: December 15, 2014, 01:34:55 PM »

Ok, now you've explained what exactly is going on, I've got some more specific ideas. These are all things I know are actually done professionally, but I have no idea which are appropriate for mobile!


1) Vertex lit lighting
Using varyings is a good idea, though I'm not sure you've quite got it. You compute them once per vertex, and the GPU then linearly interpolates them across the pixels in between. You'd need to use a grid of vertices for your geometry, though, where more points is slower but more accurate.

The shaders would look something like this.

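Code:
// vertex shader
...
varying vec4 light_color;
void main()
{
...
// Sum every light's contribution once per vertex.
light_color = vec4(0.0);
for(...)
  light_color += lights[i].color * lights[i].intensity / distance(gl_Pos, light_pos[i]);
}

// fragment shader
...
varying vec4 light_color;
void main()
{
...
// light_color arrives already summed and interpolated, so no loop is needed here.
v_fragmentIntensity = light_color;
}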

Note that because varying variables are linearly interpolated, all the scaling and summing you did in the fragment shader can be lifted to the vertex shader, which vastly cuts down on how much data is passed.

2) Squared falloff
Like Columbo says, just replace distance with dot, and you'll get something much faster. It'll look different, but who knows, it might look better?

3) Stop using 30 lights
Another Columbo suggestion.
Too many lights is a common issue. The trick is to realize that you don't need all 30 to light each pixel. Typically only the 4 nearest lights contribute the bulk of the color, particularly if you are careful when designing the scene. You can do some trickery in software to decide what the 4 nearest lights are, then render just those.

Bonus points if you fade the lights in and out for continuity reasons.

Unity has a short summary of this idea: ref.
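
One possible version of that "trickery in software", sketched in C++ under the assumption that lights are plain 2D positions and that the selection is done per object or per screen chunk on the CPU. The `Light` struct and `nearestLights` function are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Light { float x, y; };

// Return the indices of the maxLights lights nearest to (px, py), nearest
// first. Squared distances are enough for ordering, so no sqrt is taken.
std::vector<std::size_t> nearestLights(const std::vector<Light>& lights,
                                       float px, float py,
                                       std::size_t maxLights) {
    std::vector<std::size_t> order(lights.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;

    auto distSq = [&](std::size_t i) {
        float dx = lights[i].x - px;
        float dy = lights[i].y - py;
        return dx * dx + dy * dy;
    };

    std::size_t keep = std::min(maxLights, order.size());
    std::partial_sort(order.begin(), order.begin() + keep, order.end(),
                      [&](std::size_t a, std::size_t b) {
                          return distSq(a) < distSq(b);
                      });
    order.resize(keep);
    return order;
}
```

The returned indices would then be used to fill a small uniform array (4 entries, say) for that object's draw call.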


4) Light probes
Are your lights static? If so, use light probes. This technique supports infinitely many lights, at decent fidelity.

5) Performance check
Are you *sure* you've diagnosed where the speed issue is? Even for mobile, computing 30 distance operations per pixel sounds pretty trivial. Try editing random chunks of your shader to see what exactly is required to make it go slowly. Consider using a 1D texture lookup: they are way slower than sqrt operations, but when it comes to doing 30 of the things, the parallelization wins out.

BorisTheBrave
« Reply #13 on: December 15, 2014, 01:37:18 PM »

That link of Columbo's has a super obvious and easy fix you should try before anything else:

Code:
precision lowp float;

Sigma
« Reply #14 on: December 15, 2014, 07:48:50 PM »

Quote from: BorisTheBrave on December 15, 2014, 01:37:18 PM
That link of colombo's has a super obvious and easy fix you should try before anything else:

Code:
precision lowp float;

I tried that already; no improvement.

I tried this as well:

vec3 vToLight = lights[0].position - pixelPos;
lightDist = lightDistDivider/dot(vToLight, vToLight);

The effect is not the same.

I will try the varying approach you posted, and I will also try reducing the number of lights.
I had never heard of light probes; I will try those too.

Thank you all. I will update the thread with the new results.


Sigma
« Reply #15 on: December 16, 2014, 08:08:05 AM »

Hi all, thanks for all the support and answers. I finally increased the frame rate by using fewer lights. REALLY, HATS OFF FOR THE SUPPORT.

But there is one thing I didn't understand:

1) Even after changing the distance calc to the dot version, the fps stayed the same, and the effect did not improve and was not the same (way less than what squared distance can offer).
Is it because of cocos2d-x or the way the iPhone 4S is? No idea yet.

If anyone has any knowledge or a hypothesis, please share; I would like to know more about shaders and GPUs. It's really a pain to use shaders on devices with so many different specs and resolutions. I dunno how I'm gonna handle this on Android devices.

Thank you all :)

BorisTheBrave
« Reply #16 on: December 16, 2014, 03:05:34 PM »

Can you post your complete shader? It sounds like the assumption that the distance calc was the slow part was wrong.

Sigma
« Reply #17 on: December 16, 2014, 07:32:56 PM »

I will post the complete shader with the C++ code once I get home in the evening.

Sigma
« Reply #18 on: December 22, 2014, 07:22:19 AM »

Quote from: BorisTheBrave on December 16, 2014, 03:05:34 PM
Can you post your complete shader. Sounds like assuming the distance calc was the slow part was wrong.

Sorry for the late reply; I was out of town. Here is the complete code:

--------------------------
VertexShader
--------------------------

attribute vec4 a_position;
attribute vec2 a_texCoord;
varying mediump vec2 v_texCoord;

void main()
{
    gl_Position = CC_PMatrix * a_position;
    
    v_texCoord = a_texCoord;
}

--------------------------
PixelShader
--------------------------

precision mediump float;

struct Light
{
    mediump vec2 position;
    mediump vec3 color;
    int isActive;
};

uniform Light lights[15];

varying mediump vec2 v_texCoord;
uniform mediump vec2 u_resolution;

void main()
{
    mediump vec4 fragColor = texture2D(CC_Texture0, v_texCoord);
    mediump vec3 v_fragmentIntensity = vec3(0.0, 0.0, 0.0);
    mediump float lightDist;
    mediump float lightDistDivider = 2.0;
    mediump vec2 pixelPos = v_texCoord * u_resolution;
    
    if (lights[0].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[0].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[0].color * lightDist);
    }
    if (lights[1].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[1].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[1].color * lightDist);
    }
    if (lights[2].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[2].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[2].color * lightDist);
    }
    if (lights[3].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[3].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[3].color * lightDist);
    }
    if (lights[4].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[4].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[4].color * lightDist);
    }
    if (lights[5].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[5].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[5].color * lightDist);
    }
    if (lights[6].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[6].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[6].color * lightDist);
    }
    if (lights[7].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[7].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[7].color * lightDist);
    }
    if (lights[8].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[8].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[8].color * lightDist);
    }
    if (lights[9].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[9].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[9].color * lightDist);
    }
    if (lights[10].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[10].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[10].color * lightDist);
    }
    if (lights[11].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[11].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[11].color * lightDist);
    }
    if (lights[12].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[12].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[12].color * lightDist);
    }
    if (lights[13].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[13].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[13].color * lightDist);
    }
    if (lights[14].isActive == 1)
    {
        lightDist = distance(pixelPos, lights[14].position);
        lightDist = lightDistDivider/lightDist;
        v_fragmentIntensity = v_fragmentIntensity + (lights[14].color * lightDist);
    }
    
    gl_FragColor = vec4(v_fragmentIntensity * fragColor.rgb, 1.0);
}



Do you have any clue?

bdsowers
« Reply #19 on: December 22, 2014, 08:33:59 AM »

Your branching there is problematic. Even when branching using a uniform variable as the condition, some devices don't like this.

Unfortunately, the instinctive way to work around this might be to start accessing the lights[] uniform array with dynamically calculated indices, which is even worse.
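
One hedged way to drop the isActive branches without resorting to dynamic indexing is to keep the shader fully unrolled and make unused slots harmless on the CPU side. The C++ sketch below assumes the 15-slot uniform array from the posted shader; `packLights` and the off-screen padding position are illustrative choices, not cocos2d-x API.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Light {
    float x, y;     // position in pixels
    float r, g, b;  // colour
};

constexpr std::size_t kShaderLights = 15;  // matches `uniform Light lights[15]`

// Build the fixed-size array to upload as uniforms. Active lights are packed
// first; unused slots get a black colour so they add nothing, and an
// off-screen position (arbitrary assumption) so the distance is never zero.
std::array<Light, kShaderLights> packLights(const std::vector<Light>& active) {
    std::array<Light, kShaderLights> packed;
    packed.fill(Light{-100.0f, -100.0f, 0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < active.size() && i < kShaderLights; ++i) {
        packed[i] = active[i];
    }
    return packed;
}
```

The shader still pays for all 15 distance calculations, so this only removes the branching cost; cutting the light count remains the bigger win.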

Definitely always use lowp instead of mediump if you can get away with it. It doesn't always help, but when it does it can help a LOT. I've seen frame rate bumps of 20+ fps from that alone.

Ultimately though, that number of lights is killer. The older mobile devices have fill-rate issues. If you try running this on an iPad 1 it'll hurt your feelings.