GLSL: How do you dynamically index into an array of structs in a UBO or SSBO?

In GLSL, how do you dynamically index into an array of structs in a UBO or SSBO without the compiler optimizing the code into something that returns one static value?

Can you provide an example of what you are trying to do? It's been a while since I've worked with raw GLSL.

```glsl
#version 430

struct BodyPart {
    vec3 radiusA;
    vec3 radiusB;
    int jointA;
    int jointB;
    int shape;
    float curve;
};

layout(std430, binding = 1) buffer uJoints {
    vec4 joints[64];
};

layout(std430, binding = 2) buffer uBodyParts {
    BodyPart parts[64];
    int partCount;
    int padp[3];
};

// Assumed helper definitions (not shown in the original post):
struct HitInfo {
    float distance;
    int partID;
};

// Signed distance from p to a sphere with center c and radius r.
float sdfSphere(vec3 p, vec3 c, float r) {
    return length(p - c) - r;
}

// ----------------- SCENE SDF -----------------
HitInfo sdfScene(vec3 p) {
    HitInfo hit;
    hit.distance = 1e6;
    hit.partID = -1;

    for (int i = 0; i < partCount; i++) {
        float d = sdfSphere(p, joints[parts[i].jointA].xyz, parts[i].radiusA.x); // this is the problem line
        if (d < hit.distance) {
            hit.distance = d;
            hit.partID = i;
        }
        if (d < 0.0) break;
    }
    return hit;
}
```

This is in an OpenGL compute shader. Note that this is not the whole shader (the whole shader is an SDF ray marcher), but these are the relevant parts, so you can see what I'm trying to do.

`sdfSphere` is an SDF function taking a point, a sphere center, and a sphere radius.
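When debugging something like this, a CPU reference of the same scene function can help: feed it the same hard-coded joints and parts you upload, and it tells you which part should win at a given point. The sketch below assumes `sdfSphere` is the usual `length(p - c) - r`; `Vec3`, `Part`, `HitInfo` and this C++ `sdfScene` are hypothetical names, not the poster's code.

```cpp
#include <cmath>
#include <vector>

// Hypothetical plain 3-component vector for the CPU reference.
struct Vec3 { float x, y, z; };

// Signed distance from p to a sphere with center c and radius r
// (negative inside), assumed to match the shader's sdfSphere.
float sdfSphere(Vec3 p, Vec3 c, float r) {
    float dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz) - r;
}

struct HitInfo { float distance; int partID; };

// Hypothetical reduced part: only the fields the loop uses.
struct Part { int jointA; float radiusA; };

// CPU version of the shader loop, including the dependent lookup
// joints[parts[i].jointA], so its result can be compared with the render.
HitInfo sdfScene(Vec3 p, const std::vector<Vec3>& joints,
                 const std::vector<Part>& parts) {
    HitInfo hit{1e6f, -1};
    for (int i = 0; i < static_cast<int>(parts.size()); ++i) {
        float d = sdfSphere(p, joints[parts[i].jointA], parts[i].radiusA);
        if (d < hit.distance) {
            hit.distance = d;
            hit.partID = i;
        }
        if (d < 0.0f) break;
    }
    return hit;
}
```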

So the problem is that the int returned by `parts[i].jointA` is a dynamic, non-constant value, whereas `i` is, by comparison, a compile-time-known value. The compiler seems to assume `parts[i].jointA` is constant when it unrolls the loop, so I only get one joint (one position) for all the spheres, and they all render on top of each other. If I change the problem line to `float d = sdfSphere(p, joints[i].xyz, i);` and loop from `i = 0` to `i < jointCount`, I can clearly render the joint positions as spheres in different locations, with their size showing which joint is which. The point is that the compiler seems to unroll the for loops, and if the value used to index into an array in a UBO or SSBO is dynamic, it seems to treat it as constant and return only one value instead of all the changing values.

P.S. Even forgoing the loop and doing this in its place still clearly does not reference the correct joint; it is clearly rendering the position of `joints[0]` in the array:

    BodyPart part = parts[2];
    hit.distance = sdfSphere(p, joints[part.jointA].xyz, part.radiusA.x);
    hit.partID = 2;

Take what I am going to say with a grain of salt, as it's been a while since I have played with GLSL and I only recently started getting back into it myself. I think you might be looking at a memory structure issue, not one where the compiler is mangling your code.

Have you been able to confirm that `jointA` is set to the expected value in the buffer for all the `BodyPart`s, and that the buffer is formatted correctly? I know personally that when I hit issues like that, 9 times out of 10 it's been due to my not setting up the buffer data correctly.
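One way to check the "formatted correctly" part is to mirror the GLSL struct in C++ with every padding slot spelled out and let the compiler verify the offsets. This is a sketch assuming the std430 rules for the original `vec3` version of `BodyPart`: a `vec3` has a 16-byte base alignment but is only 12 bytes long, so a following scalar can sit in the 4 bytes right after it. `BodyPartStd430` and the `pad` members are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side mirror of the GLSL BodyPart (vec3 version) under std430.
struct BodyPartStd430 {
    float   radiusA[3]; // offset 0
    float   pad0;       // offset 12: radiusB must start on a 16-byte boundary
    float   radiusB[3]; // offset 16
    int32_t jointA;     // offset 28: packs right after the 12-byte vec3, not at 32
    int32_t jointB;     // offset 32
    int32_t shape;      // offset 36
    float   curve;      // offset 40
    float   pad1;       // offset 44: struct size rounds up to its 16-byte alignment
};

// If any of these fail, the host data will not line up with what the shader reads.
static_assert(offsetof(BodyPartStd430, radiusB) == 16, "radiusB offset");
static_assert(offsetof(BodyPartStd430, jointA) == 28, "jointA offset");
static_assert(offsetof(BodyPartStd430, curve) == 40, "curve offset");
static_assert(sizeof(BodyPartStd430) == 48, "array stride of parts[]");
```

The same idea works for whatever struct the upload code actually uses: if its offsets differ from the ones the shader's layout rules produce, the GPU reads the wrong bytes.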

Yes, I can confirm both; that was the point of mentioning:

    for (int i = 0; i < jointCount; i++) {
        float d = sdfSphere(p, joints[i].xyz, i+1);
        if (d < hit.distance) {
            hit.distance = d;
            hit.partID = i;
        }
        if(d<0) break;
    }

This results in the spheres all being at different locations, proving that `joints` is correctly formatted and the upload is working. I have also done:

    for (int i = 0; i < partCount; i++) {
        float d = sdfSphere(p, vec3(10*i,0,0), parts[i].jointA+1);
        if (d < hit.distance) {
            hit.distance = d;
            hit.partID = i;
        }
        if(d<0) break;
    }

This also works, and by drawing spheres with all different radii it proves that the int `jointA` field is being uploaded correctly and in the correct position. Note that in both examples the radii or positions are where I expect them to be based on the uploaded data; for now I have hard-coded the data so I know what to expect. I also tried this with a std140 layout as a UBO instead of the std430 SSBO, with the same results.

So I have shown that the data is there and has the correct values. Furthermore, I can access it with a single load whose index is a compile-time constant, like `i` in these loops. But when a load from the SSBO or UBO uses an index that was itself loaded from a different UBO or SSBO, the compiler somehow seems to optimize out the first load, the one whose value indexes into the second.

I have thought of a workaround: in my C++, instead of uploading a separate UBO of joints, I could upload the full vec4 joint position directly in the part SSBO. That would solve the issue, and it is probably what I will end up doing. But from a learning standpoint, if I am somehow misusing the language, or there is a trick to force the compiler not to optimize out the load, I am all ears to learning how to do this without the workaround.
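The workaround could look roughly like this on the C++ side: resolve each part's joint index on the CPU and copy the position into the part record, so the shader never does a dependent lookup into `joints[]`. This is a sketch under assumptions: `Vec4`, `BodyPartSrc`, `BodyPartGpu` and `packParts` are hypothetical names, and the shader would need a matching std430 struct such as `vec4 jointPosA; vec4 radiusA; int jointA; int pad[3];`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical 16-byte vector, laid out like a GLSL vec4 in std430.
struct alignas(16) Vec4 { float x, y, z, w; };

// Hypothetical CPU-side part that still refers to its joint by index.
struct BodyPartSrc {
    int32_t jointA;
    float   radiusA;
};

// Hypothetical GPU record: the joint position is copied in directly.
struct BodyPartGpu {
    Vec4    jointPosA; // offset 0
    Vec4    radiusA;   // offset 16 (only .x used here)
    int32_t jointA;    // offset 32, kept for debugging
    int32_t pad[3];    // offset 36, pads the record to 48 bytes
};
static_assert(sizeof(BodyPartGpu) == 48, "stride must match the shader struct");

// Resolves each part's joint index on the CPU before upload.
std::vector<BodyPartGpu> packParts(const std::vector<Vec4>& joints,
                                   const std::vector<BodyPartSrc>& parts) {
    std::vector<BodyPartGpu> out;
    out.reserve(parts.size());
    for (const BodyPartSrc& p : parts) {
        if (p.jointA < 0 || static_cast<std::size_t>(p.jointA) >= joints.size())
            throw std::out_of_range("jointA out of range");
        BodyPartGpu g{};
        g.jointPosA = joints[static_cast<std::size_t>(p.jointA)];
        g.radiusA = Vec4{p.radiusA, 0.0f, 0.0f, 0.0f};
        g.jointA = p.jointA;
        out.push_back(g);
    }
    return out;
}
```

The trade-off is duplicated joint data per part, in exchange for a single, independent load per part in the shader.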

Side point: I'm aware of others having to do funny tricks to stop the compiler from optimizing out values, like here:
ShaderToy - Selfie Girl - by: IQ

```glsl
// Computes the normal of the girl's surface (the gradient
// of the SDF). The implementation is weird because of the
// technicalities of the WebGL API that forces us to do
// some trick to prevent code unrolling. More info here:
//
// https://iquilezles.org/articles/normalsSDF
//
vec3 calcNormal( in vec3 pos, in float time )
{
    const float eps = 0.001;
#if 0
    vec2 e = vec2(1.0,-1.0)*0.5773;
    return normalize( e.xyy*map( pos + e.xyy*eps,time,kk ).x + 
					  e.yyx*map( pos + e.yyx*eps,time,kk ).x + 
					  e.yxy*map( pos + e.yxy*eps,time,kk ).x + 
					  e.xxx*map( pos + e.xxx*eps,time,kk ).x );
#else
    vec4 n = vec4(0.0);
    for( int i=ZERO; i<4; i++ )
    {
        vec4 s = vec4(pos, 0.0);
        float kk; vec3 kk2;
        s[i] += eps;
        n[i] = mapD(s.xyz, time).x;
      //if( n.x+n.y+n.z+n.w>100.0 ) break;
    }
    return normalize(n.xyz-n.w);
#endif
}
```

where `ZERO` is defined as:

```glsl
#define ZERO min(iFrame,0)
```

The point is that `mapD` was being inlined 4 times with the same values passed, instead of being called 4 times with separate values.

I did try this kind of trick, but it does not solve my issue. A key difference is that his value `s` is not loaded from a UBO or SSBO but generated in the loop, so it is not derived from any UBO or SSBO values.

Interesting. My thing is, I am not sure why the compiler would be unrolling your loop in the first place. In the example above it makes sense for the compiler to do that, since the start value of `i` and the number of iterations are already known. Also, the linked article seems to be talking about how loop unrolling can eat into the instruction budget available to your shader, which is legit, since unrolling and inlining do bloat the binary, but I am not sure that is your case.

Since in your example the limit comes from `partCount`, the compiler can't know how many iterations there will be, so it can't fully unroll that loop, unless the OpenGL compiler can look into a pre-linked buffer and use that. Maybe it is inlining the `sdfSphere` call, but I am not sure how that could cause your issue.

If I come up with anything that I think could be causing it, I will let you know, but at least I can help rubber-duck it a bit, lol.

Just one more bit of food for thought for anyone looking at this: the following is also broken and has the same issue of the sphere not rendering in the correct position; in fact, it renders at the position of `joints[0]`:

```glsl
// ----------------- SCENE SDF -----------------
HitInfo sdfScene(vec3 p) {
    HitInfo hit;
    hit.distance = 1e6;
    hit.partID = -1;

    hit.distance = sdfSphere(p, joints[parts[2].jointA].xyz, parts[2].radiusA.x);
    hit.partID = 2;

    return hit;
}
```

Note the removed loop: this is a direct call that renders a single, specific part with a compile-time-constant index; only the joint array index is unknown at compile time.

Just to confirm what you are saying here: the example above is getting invalid data at array index 2?

OK, what I'm saying is that it's rendering the sphere at position (0,0,0) (index 0 in the UBO of vec4s), not at (0,-10,0) (the vec4 at index 3, the 4th joint in the array). Again, referring back to:

```glsl
for (int i = 0; i < partCount; i++) {
    float d = sdfSphere(p, vec3(10*i,0,0), parts[i].jointA+1);
    if (d < hit.distance) {
        hit.distance = d;
        hit.partID = i;
    }
    if(d<0) break;
}
```

If I do this, I can confirm that `parts[2].jointA` has a stored value of 3, because it renders a sphere at (20,0,0) with a radius of 4: the 3rd sphere from left to right, since `parts[2]` is the 3rd part in the `parts` array.

Also note that if I hard-code `joints[0]` to another value, say (-20,-20,0), the sphere in the no-loop example from my last comment moves to that position. So it's not reading garbage data; it's specifically reading `joints[0]`, despite `parts[2].jointA` being 3 and `joints[3]` being (0,-10,0).

So when you did this and hard-coded index 2 in `parts`, what happened?

It renders a sphere at (0,0,0) with radius 10.

To be clear, `parts[2].radiusA.x` is supposed to be 10, so that load is correct, but `joints[parts[2].jointA].xyz` should be equivalent to `joints[3].xyz`, which holds (0,-10,0) for that position.

Not trying to repeat myself too much, but as in one of my tests above:

    for (int i = 0; i < jointCount; i++) {
        float d = sdfSphere(p, joints[i].xyz, i+1);
        if (d < hit.distance) {
            hit.distance = d;
            hit.partID = i;
        }
        if(d<0) break;
    }

I can clearly show that `joints[3]` is (0,-10,0) and `joints[0]` is (0,0,0).

I get that, but the issue seems to be with `parts`, not `joints`. I think the `parts` struct is malformed in the buffer: either it is not being set correctly, or the data is being shifted around, maybe due to different primitive sizes in memory.

I can't say for sure, since I don't know how you are setting the values in the buffer, but the fact that you still don't get the expected result when you hard-code the index is a big red flag that the data is not where it is supposed to be. Maybe the int is a different size, and that is shifting the data around in the struct. That could explain the results you saw in your tests, since they might eventually have hit the correct data.

If I were you, I would dump the buffer into VS Code and look through it with a hex editor, paying special attention to the `BodyPart` struct to make sure the memory layout looks right and there is no unexpected padding or larger-than-expected data type (`int32_t` vs `int64_t`, for example) throwing off what the GPU expects.
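If you would rather not round-trip through an editor, a small hex dump of the bytes does the same job. A minimal sketch; `hexDump` is a hypothetical helper, and the bytes could be the host array right before upload or a copy read back with `glGetBufferSubData` after binding the SSBO. On a little-endian host, an `int32_t` of 3 shows up as `03 00 00 00`.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats raw bytes as "offset: b0 b1 ... b15" lines, 16 bytes per row,
// so the bytes can be compared against the offsets the shader expects.
std::string hexDump(const void* data, std::size_t size) {
    const auto* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    char buf[32];
    for (std::size_t row = 0; row < size; row += 16) {
        std::snprintf(buf, sizeof buf, "%04zx:", row);
        out += buf;
        for (std::size_t i = row; i < row + 16 && i < size; ++i) {
            std::snprintf(buf, sizeof buf, " %02x", static_cast<unsigned>(bytes[i]));
            out += buf;
        }
        out += '\n';
    }
    return out;
}
```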

Based on what you have said so far, my bet is that the int being written to the buffer is the wrong size for what the GPU expects, which would explain why changing to a vec4 seemed to fix it.

I am working off a fair number of assumptions here, so I could be off base, but that is where I would put my money given what you have told me.

Again, there seem to be some misunderstandings here. I'm not sure all of these are misunderstood, but I'll go through any that I think there might be:

First, when I say I hard-code the values in `joints` or `parts`, I mean hard-coding the data structures in C++ to static values that are then uploaded to the GPU, instead of the normal, finished product, where they will be determined at runtime based on the scene in C++.

Next is compile-time-constant indexing in the shader. If I hard-code `joints[3].xyz`, I get the expected (0,-10,0) from the vec4 (0,-10,0,0) in my `joints` SSBO (again, (0,-10,0,0) is the "hard-coded" value I put in C++; it's literally baked into the compiled C++ binary). If I hard-code `parts[2].radiusA.x` in the shader, I get the expected 10 from `radiusA` = (10,0,0) in my `parts` SSBO, and I can use either in a line to render a sphere, proving that `parts` and `joints` are not malformed and the structure is correct.

The next part is where it starts getting confusing. If you do a for loop, indexing with the loop variable does retrieve the expected values, showing that the SSBO as a whole is valid and intact, since I can read each element correctly, all in the same pass. Back to these examples:

    for (int i = 0; i < jointCount; i++) {
        float d = sdfSphere(p, joints[i].xyz, i+1);
        if (d < hit.distance) {
            hit.distance = d;
            hit.partID = i;
        }
        if(d<0) break;
    }
    for (int i = 0; i < partCount; i++) {
        float d = sdfSphere(p, vec3(10*i,0,0), parts[i].jointA+1);
        if (d < hit.distance) {
            hit.distance = d;
            hit.partID = i;
        }
        if(d<0) break;
    }

The problem is when you do something like `joints[parts[i].jointA].xyz`, where you must first resolve `parts[i].jointA` before using that value in `joints[value].xyz`. This is where things explode: somehow this compiles, and `joints[parts[i].jointA].xyz` seems to turn into `joints[0].xyz`.

Oh no, I get all that, but what I am saying is that I think your second test case, where you use `parts[i].jointA+1`, is giving a false positive, unless the add is doing something.

Generally speaking, if you hard-code an index and don't get the value you expect back, there is an issue with the memory. If I understand correctly, you are expecting `joints[parts[2].jointA].xyz` to be `joints[3].xyz`, meaning `jointA` should hold 3 in the struct. Doing some rough math, I assume the `BodyPart` struct is ~40 bytes (assuming 4-byte ints and floats), so the value you want (`parts[2].jointA`) should be around byte 104, i.e. 2 × 40 + 24 (I am making a lot of assumptions about how GLSL wants the buffer laid out, so take this more as an example).
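The rough math above assumes tight packing. Applying the std430 rules instead (each member goes at the next multiple of its base alignment, and the struct size rounds up to its largest member alignment, which becomes the array stride) gives different numbers. A sketch that only handles scalars and vectors, with hypothetical names (`Member`, `Layout`, `std430Layout`):

```cpp
#include <cstddef>
#include <vector>

// One member as std430 sees it: base alignment and size, in bytes.
struct Member { std::size_t align; std::size_t size; };

struct Layout { std::vector<std::size_t> offsets; std::size_t stride; };

std::size_t alignUp(std::size_t x, std::size_t a) { return (x + a - 1) / a * a; }

// Places each member at the next offset satisfying its alignment, then rounds
// the total size up to the largest member alignment (the array stride).
// Sketch only: nested arrays and structs are not handled.
Layout std430Layout(const std::vector<Member>& members) {
    Layout l{{}, 0};
    std::size_t offset = 0, maxAlign = 1;
    for (const Member& m : members) {
        offset = alignUp(offset, m.align);
        l.offsets.push_back(offset);
        offset += m.size;
        if (m.align > maxAlign) maxAlign = m.align;
    }
    l.stride = alignUp(offset, maxAlign);
    return l;
}

// GLSL BodyPart (vec3 version): vec3 = {16, 12}, int/float = {4, 4}.
const std::vector<Member> kBodyPartVec3 = {
    {16, 12}, {16, 12}, {4, 4}, {4, 4}, {4, 4}, {4, 4}};
```

For the `vec3` version this gives offsets 0, 16, 28, 32, 36, 40 and a 48-byte stride, so `parts[2].jointA` would sit at byte 2 × 48 + 28 = 124 rather than 104. For this particular struct, std140 produces the same offsets.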

But when you access byte 104, you get 0 (or what you think is 0) instead of 3. A big issue with C++, though, is object padding, since the C++ standard leaves it implementation-defined. If I assume you have a struct in your C++ code that mimics the GLSL one (not saying you are doing that, but I have seen it), the C++ `BodyPart` may actually be 42 bytes instead of 40, which would push the value of `parts[2].jointA` to byte 108, which is firmly in `jointB`.
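For what it's worth, with this particular struct a member-for-member C++ mirror is unlikely to get any padding from mainstream compilers, because every member is 4-byte aligned. The mismatch risk runs the other way: std430 pads the GPU copy. A sketch with a hypothetical `BodyPartNaive`:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical "naive" C++ mirror of the vec3 BodyPart, written member-for-member.
// All members are 4-byte aligned, so typical compilers insert no padding:
// 40 bytes total, with jointA at offset 24.
struct BodyPartNaive {
    float   radiusA[3];
    float   radiusB[3];
    int32_t jointA;
    int32_t jointB;
    int32_t shape;
    float   curve;
};
```

Against the std430 expectations for the `vec3` declaration (48-byte stride, `jointA` at 28), this 40-byte host layout drifts by 8 bytes per element, so later parts read increasingly wrong fields.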

Now, you could be correct, and maybe the GLSL compiler is inlining `parts[2].jointA` in a weird way during compilation. If we assume that is the cause, I would suggest pulling it out and assigning it to an index variable, like `int jointAIndex = parts[2].jointA;`, and seeing if that works, since the compiler shouldn't inline that. But if that fails, there is a good chance your buffer is the issue.

Since this is a compute shader, it may also help to echo some of those raw values back out in the output, just to confirm.

I am not saying the compiler isn't doing something weird, but your situation really looks like a buffer issue, which is much more common. I would imagine that if GLSL compilers were incorrectly optimizing `joints[parts[2].jointA].xyz`, there would be a lot more talk about it, so I think it's a red herring.


OK, ignore the flaws, it works! I will eventually Frankenstein a human SDF.

Nice! What seemed to have been the issue, or did you just work around it?

It was a memory layout issue, but in my opinion the fault lies with the compiler failing to use the correct stride in all cases. See the OpenGL GLSL spec on layouts: vec3 and vec4 both require a 4N base alignment. But it seems that when `BodyPart` was defined as:

```glsl
struct BodyPart {
    vec3 radiusA;
    vec3 radiusB;
    int jointA;
    int jointB;
    int shape;
    float curve;
};
```

it did not always respect the 4N alignment. For example, it did when indexing into the SSBO in a for loop with a compile-time-known `i`, but not with the dynamic indexing. I changed the shader to use vec4s, which matches the C++ data upload explicitly no matter what; the upload follows the OpenGL spec, in that the vec3s on the C++ side were stored in 16 bytes of memory, not 12. Anyhow, this is the new struct definition in the shader:

```glsl
struct BodyPart {
    vec4 radiusA;
    vec4 radiusB;
    int jointA;
    int jointB;
    int shape;
    float curve;
};
```

I'd also add, for the record, that after testing this I went back and tested vec3s again, but with the 4 padding bytes after each 12 bytes of floats removed from the upload, and that did not work at all, since it definitely broke the spec's 4N alignment requirement.
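One plausible reading of the symptoms, assuming the host record stored each `vec3` in 16 bytes with the unused fourth float left at zero: with the `vec3` declaration, std430 puts `jointA` at offset 28, which in that host record is the unused fourth float of `radiusB`. That would turn `joints[parts[i].jointA]` into `joints[0]`, while `radiusA.x` at offset 0 still reads correctly, since both layouts have a 48-byte stride. A sketch with hypothetical names that reads the same bytes both ways:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host record with each vec3 stored in 16 bytes (x, y, z, unused w),
// i.e. the same layout as the vec4 version of BodyPart under std430.
struct BodyPartHost {
    float   radiusA[4]; // offset 0
    float   radiusB[4]; // offset 16
    int32_t jointA;     // offset 32
    int32_t jointB;     // offset 36
    int32_t shape;      // offset 40
    float   curve;      // offset 44
};
static_assert(sizeof(BodyPartHost) == 48, "same stride as the GLSL struct");

// Reads parts[i].jointA from raw bytes the way a shader would, given the
// array stride and member offset implied by its struct declaration.
int32_t readJointA(const std::vector<unsigned char>& bytes, std::size_t i,
                   std::size_t stride, std::size_t jointAOffset) {
    int32_t v;
    std::memcpy(&v, bytes.data() + i * stride + jointAOffset, sizeof v);
    return v;
}
```

Reading element 2 at offset 32 (the `vec4` declaration) returns the stored 3, while reading it at offset 28 (the `vec3` declaration) returns the zeroed padding.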

P.S. Compilers are the bane of my coding existence. Don't even get me started on C++'s Schrödinger's coroutine frames (my new nickname for the phenomenon): whether they are dead or alive, nobody knows.