Ok, currently working with c++, with nested loops im looking for the fastest way to continue an outer loop from inside a nested loop. The loops in question are extremely hot, so the difference between setting a boolean to true from false is actually noticeable, and measurable across the time to complete all the iterations(were assuming the boolean is already allocated before entering any of the loops).
So for example consider:
for(int i=0;i<100;i++){
for(int j=0;j<100;j++){
for(int k=0;k<100;k++){
for(int l=0;l<100;l++){
for(int m=0;m<100;m++){
for(int n=0;n<100;n++){
//logic doing stuff with i,j,k,l,m,and n all together its from here we realize we need to continue the for loop for say i or k for instance.
}
}
}
}
}
}
The question is what is the fastest way from inside all the loops, to continue the i loop for instance?
the fastest i have found is to use goto with labels to jump straight to the continue. which would look something like this:
for(int i=0;i<100;i++){
for(int j=0;j<100;j++){
for(int k=0;k<100;k++){
for(int l=0;l<100;l++){
for(int m=0;m<100;m++){
for(int n=0;n<100;n++){
//logic doing stuff with i,j,k,l,m,and n all together its from here we realize we need to continue the for loop for say i or k for instance.
if(some_condition){
goto i_loop_continue;
} else if (other_condition){
goto j_loop_continue;
}...
} if(condition_to_continue_n) {
continue;
}
}
if(false)
m_loop_continue:
continue;
}
if(false)
l_loop_continue:
continue;
}
if(false)
k_loop_continue:
continue;
}
if(false)
j_loop_continue:
continue;
}
if(false)
i_loop_continue:
continue;
}
How would you approach this problem, what is the clean and fastest way to solve this problem. So far this goto label approach has proved to be 20% faster than using boolean and ifs. Would you consider the goto label approach?
The short answer to your question is use the âbreakâ statement. That will immediately get out of whatever loop youâre in, and return you to the next higher loop or the function if youâre in the outermost loop.
loop(j) {
if(true) {
break;
}
}
But before you do that, I have a couple pieces of advice for a new coder:
First of all, you usually want to use nested loops as little as possible, just as a general rule of thumb. Sometime you do want to use them, as itâs the cleanest, easiest way of doing it, but take a minute to decide if youâre sure about that.
Second, if you do use nested loops (or any kind of nested statements), which like I said, can be reasonable if you have a reason, you pretty much never want to go more than 2 or 3 deep. And even then, you want the counter on those inner loops to be as small as possible. The hit to performance gets worse exponentially with each loop deeper you go, literally. The way youâve got this written, with 6 loops running from 0 to 99 each, that innermost loop is running a trillion times, 100^6 = 1e12.
Third, I donât know what youâre doing with âlogic doing stuff with i,j,k,l,m,and nâ, but be careful to only read those values or copy them into a new variable if you have to manipulate them, and do not assign or set those iterators (unless you have a really, really good reason, and youâre super sure thereâs no better way to do what you want).
Fourth, just for your own sanity, I strongly recommend using actual, real, meaningful and easily readable variable names. Yes, even for a loop iterator, but especially if youâre using that many of them that close together in the same context. Using real names will make it harder to screw up the logic, and easier to debug it when you do.
Fifth, only ever break out of a loop if youâre sure the loop is for-real done, and never, ever jump into a loop, or into a function.
Finally, never ever use a âgotoâ unless you absolutely have too, and you are absolutely certain youâre thereâs no better way. Theyâre just way to easy too fuck up, thereâs pretty much never a reason to use them in a high level language like C++, and are pretty much always the first thing to look at when something does goes wrong.
So yeah, I donât know what problem youâre specifically trying to solve here, but if you show us your actual code, someone here can almost certainly help you find a better, faster, cleaner way to do what youâre trying to do with out so many loops, so many nested statements, and without using 'gotoâs.
Iâm neither a novice or expert programmer, somewhere in the middle closer to the novice side. Iâm aware there are some way more knowledgeable skilled programmers than I.
The answer to what im doing is writing an ai to my simulation game, heavy on the simulation(may be more simulation than game). The example above is the simplest way to demonstrate the problem. Whats actually happening is there are about 20 possible actions, that can happen on a given turn period, with 6 periods to a turn. So generating a given possible turn is a simple combination permutation problem. You can have repeat actions(to an extent). So i,j,k,l,m,n make up the 6 chosen actions for the 6 periods. I cant know if i is valid till choice until I know what the rest are, hence having to break from the loop for n,m,j,l,k, and j to get back to i to be able to continue that loop. Assuming we have found valid possible combination we score it and if its greater than anything we have found so far its the current choice for the turn.
Yes and no, assuming i is invalid and im not using goto, break is correct for all loops but i, however for i we do want to continue looking at other actions for that time period so continue is the correct not break, in this example.
Without pasting 1500 lines of code not sure your going to see how it all ties together. A lot of effort to minimize what happens in these 6 loops has been done, there are ZERO allocations happening inside the loops. Almost any and all math and logic that can been pre done outside the loop has. So once were in the loops its pretty much as straight forward as possible.
So when not using goto and labels i ended up with this type of structure:
bool break_i;
bool cont_i;
bool break_j;
bool cont_j;
bool break_k;
bool cont_k;
bool break_l;
bool cont_l;
bool break_m;
bool cont_m;
for(int i=0;i<100;i++){
break_i = false;
cont_i = false;
for(int j=0;j<100;j++){
break_j = false;
cont_j = false;
for(int k=0;k<100;k++){
break_k = false;
cont_k = false;
for(int l=0;l<100;l++){
break_l = false;
cont_l = false;
for(int m=0;m<100;m++){
break_m = false;
cont_m = false;
for(int n=0;n<100;n++){
//logic doing stuff with i,j,k,l,m,and n all together its from here we realize we need to continue the for loop for say i or k for instance.
if(some_condition){
break_m = true;
break_l = true;
break_k = true;
break_j = true;
cont_i = true;
break;
} else if (other_condition){
break_m = true;
break_l = true;
break_k = true;
cont_j = true;
break;
}...
} if(condition_to_continue_n) {
continue;
}
}
if(break_m)
break;
if(cont_n)
continue;
}
if(break_l)
break;
if(cont_l)
continue;
}
if(break_k)
break;
if(cont_k)
continue;
}
if(break_j)
break;
if(cont_j)
continue;
}
if(break_i)
break;
if(cont_i)
continue;
}
Note how many times were setting resetting booleans just to be able to break or continue for each loop, n does not need the booleans because we obviously can just call break or continue while in it. As you can see if j is invlaidating the combinations we just set the boolenas differently before breaking from the n loop to break untill were at the end of the j loop then we continue that loop. When i say goto is 20% faster its because it does not require the 10 booleans being set and reset all the time, and also checking them in the 10 ifs at the end of each iteration of each loop. It just cleanly moves us to the continue with one clean simple call.
With all this said i did not arrive a goto first and actualy worte out many other solutions first, today i just on a whim decided to check this idea out, about using the taboo goto, and when profiling the section of code with these loops found it to be 20% faster. It does not break anything and serves the same purpose.
The reason iâm asking is because iâm hoping to find that someone has a novel idea that does not involve my boolean flags to break and continue the correct loops, that is not as risky as goto and labels.
See my new post above but, the problem to be solved is this:
I think the XY problem is two ways though in some cases(IE the technical support just wants to answer y to avoid x). One could make an analogy to a doctor who is asked to help a patient with a broke bone. If the doctor then asks the patient, how they broke it. With a response of it was from a fall when playing sports. If the doctor then says the solution(to how the broke it, and avoid breaking it again) is to not play sports, the doctor is looking to solve a root problem the patient is not looking to solve, or necessarily should be solved(ie the doctor does not want to have to patch the patients bone agian). Point being sometimes and i think this applies to my question rather than assuming there is some root problem to be solved, or needs to be solved, just focus on the question asked.
Well that is a big assumption to make, from âi noticed this method of breaking and continuing loops is farily fast, do you you have a recomened method of doing this/faster method of doing this?â
Ok so again, following your assumption that I cannot possibly be asking the right question, that the loops are the problem, do you know a faster way to generate combinations than for loops?
I mean im all ears to any suggestion however at the end of the day short of random number generation in a loop, which could have no smart building built into it, im not sure with out loops of some nature you can repeatedly generate all the combinations permutations to score. My Predict an Answer tm says we will eventually arrive at if were going for speed i should just pre-curate by hand a bunch of different combinations to fake/shortcut actually scoring all the other answers.
Admittedly the above would be faster to just score say 50 pre-thought out combinations, and i actually started there, but just didnât like the resulting AI. So were back to actually doing the work, And i do think what i have produces more better combinations for the npc given its stats.
Again your working under the assumption i have not already employed these and really am not looking for the question i ask instead of the question you seem to be trying to answer.
Understand, even if you do help me find a different solution to how to generate combinations, none of that still answers my question about breaking and continuing nestted loops, which is the original question that, im actually still quite curious about and want to hear answers to.
Without further addu now that i have gotten the rest out of that way the answers you have been waiting for.
To understand scoring, and if a certain combination permeation is valid or not probably just need to start from the top if were going down the rabbit hole.
So a turn is 6 periods as i have mentioned that you pick an action for. Have not mention yet there are different locations you can do each of thees actions at for each of these 6 periods. So in regards to scoring were actually scoring an action at a specific location, with respect to the npc were operating ons stats. Another choice not mentioned yet that the npc can pick form 4 travel methods when traveling between locations.
So npcs have for the moment 6 different general goals. Think money, food, happiness, ⌠The game has configurable maps of goal weights to actions, in various different ways, like there is a map of goal weights of actions during each time period, there is a map for different actions at different locations, âŚ
Going forward when i say map assume a single map or maps with in a map for simplicity of explanations sake sake.
Step 1 to deciding an npcs turn is calculating a value for each of these goals for an npc.
Step 2 is calcuate a map of scores given an npcs stats of traveling from one location to any other location, with respect to each possible travel method.
Step 3 is go through said map and pick the max score for traveling form each a to b of the 4 different travel methods, im going to refer to this map as travel_max we will come back to it later.
Step 4 is generating a map with keys that are our actions to lists of locations that are valid. We are doing this because each of the locations has sub-locations(gym,school,building x,building y,âŚ) that the npc may or may not have explored yet which is actualy the final destination the action is done at(think of it as you exercise at the gym in xyz city). So this map is saying the npc knows of a valid place to do x action at y locations. any action with no valid locations is never added to the map.
Step 5 is Generating a map of actions to there highest scoring travel locations from any location. For reference later lets call it action_location_max. basically were mashing travel max and the last map together, So basicy answering the question for a given last location, for our actions posible locations, what location has the max score. and store that location and score with respect to the action and last location.
step 6 is generating a map of time periods to actions to scores with respect to all of our goal weights for that time period, were not actual scoring every action rather only the ones that are keys in action_location_max. scores can be in general in a range from -1 to 1. Lets call this actinon_score
step 7 is generating a map of time periods to lists of possible actions for that given time period. So were looking at the scores of each action in step 6s map for a given period and taking any action with a score above 0. Why above zero? Well the next step is to use these lists in our for loop to generate possible turns, lets throw out any action that by score is harmful to our npc for a given time period, to lessen the number combinations and permutations we have to look at. Lets call this map time_period_possible
Now were finally to our for loops so we have 6 loops each loop looping through one of our 6 time periods lists in time_period_possible to assemble our final possible turn combination. While scoring and map generations above has thrown out most(in general it gets rid of about 60-70% of possible combinations, by filtering out actions that donât make sense at all) of our no good combinations, i still need to weed out things like picking for instance exercise for all 6 time periods, or eat for all 6 time periods if that is one of the generated possibility. These are the type of conditions that may need us to continue to roll forward any given time periods respective loop to the next action in the list. Of the combinations left our goal is to early exit the loop for anything that does not make sense, before we get to scoring. In general were basically counting the number of each actions in our list of 6 picked by the for loops and deciding weather or not to keep it base on the counts and other npc stats.
Scoring is as simple as looking up the location or value in either action_location_max, or action_score. Given the npc starts before period 1 at a location already its a matter of map lookups using all know values at this point. This take almost no time to do, both action_location_max and action_score are very small and totally fit in cache, and from comparing total time to complete turn generation ai profiling this scoring vs not, i can tell the calls to time it add as much or more time as the scoring itself. Note here were saving the location and travle method out of the maps. if we like the score.
There is a lot of details to all of this im leaving out, to keep this this short. Understand me fussing over using goto vs ifs and booleans is me shaving micro seconds off iterations, that add up. On average the for loops generate about 750,000 possible combinations to be nayed, or yayed and scored. this whole process takes about 10ms. Considering the combination permutation math on 20 actions in 6 different locations using 4 different possible travel methods is past trillions, were doing pretty good figuring out a turn only looking at under 1 million of them.
All in all i am actually happy with how all this works and with how fast it works. In my original post and question im asking more from a academic perspective of what would be the best solution to continuing a specific for loop in a nest of for loops.
If using a goto works and doesnât break anything, then sure, go with that. Having said that, in years as a professional software developer, Iâve never encountered a situation where goto was an appropriate approach, let alone the best or only, and I can count on one hand the number of times a break was needed.
Iâll also say that it can be difficult to determine these things even when Iâm staring at a full commit history for a PR. It may be that youâve found the most efficient way to handle your specific needs, at least in this nested loop approach, but without seeing the code itself, I donât think anybody will be able to give you very many useful suggestions other than echoing the generally accepted best practices that have been listed here.
Goto is quick, easy, and extremely dirty. Itâs the code equivalent of dousing your project in radioactive waste. So much garbage is left behind when you donât properly exit out of or end things by just ejecting out of them with something appropriate to structured programming, like the break statement.
Your example where you track the need to break with booleans and then cascade out of the for block the same way you procedurally cascaded in looks messier from a code perspective, but from an execution perspective it is much better, because it âtidies upâ after itself, with no risk of leaving garbage in memory behind it.
Heh, programmers love to hate goto. Honestly, using it within a function, over only a few lines, is not a problem. Especially since you clearly know how to phrase it without goto, and are confident that the logic is correct.
Have you tried looping in a different order? Something something prefetching cache sizes.
Are these variables basically continuous, as opposed to categorical? If so, consider that youâre searching a 6-dimensional space for the highest/hottest point, first do a low-resolution pass that iterates by 4s instead of 1s (hitting about 1/4000 of the points) to find the most promising sub-area, then searching exhaustively near there?
Sorry if Iâve asked this before in another thread, but have you tried running this on a GPU? This kind of nested loop seems like it would benefit.
This is not quite correct, the goto statement does correct transfer control from scope to scope in c++. Assuming your following good programming practices with RAII, in this example the goto just cuts to the chase and leaves no garbage behind.
The fact of the matter is itâs only if your not doing RAII would you run into trouble with goto, but then you would also with break or any similar mechanic that changes scope without a way to get back to it(once you break out of a loop that temporary scope is lost forever).
No no it does not, only if im following RAII can you make this assumption same as with goto. Determing wich method leaves behind more garbage is totally dependent on if i am properly seting up and tearing down objects.
Because goto in c++ is limited to jumping to other labels inside the function your calling goto in, in regards to memory management its totally question of sub scopes, the for loops in this case.
Consider:
for(int i=0;i<100;i++){
int* a = new int;
*a += i;
if(a>10){
break;
}
delete a;
}
Ok i know there is so much wrong with this, however it proves a point. Break here does not save you from sending the memory pointed to by int* a into the ether, and goto has the exact same problem. Neither is better to the other for leaving the scope of the for loop, and neither provides any safety the other does not, if your not paying attention to how you use it. The point is break totally is just as much a scope ejection as goto is, continue has the same problem also. Neither have any form of check to see if when leaving a scope you wonât leak memory.
You and I, are once again on the same wavelength, unfortunately each loops variable is not continuous. The combination however is(i,j,k,l,m,n). By continuing outer loop im skipping the iterations of the inner loops for that action in the outer loop. No point searching that space if i know it wonât contain wining combination.
Well i at one point looked into it, the hard part is moving all the data into video memory in an organized fashion, There are no complex types, in gpu land, so all the maps and player variables i would have to know ahead of time their exact sizes and configuratin, to know how far to iterate or look, when writing the compute shader. This is ignoring the fact that i have never written a compute shader in opengl directx or vulcan.
Ah. Yeah, if youâve got a lot of structures more complex and flexible than arrays and structs the GPU way would be pain. Worse changing those things to try new gameplay would become much more difficult. Letâs be thankful the CPU is so flexible and forgiving in comparison!
Other thoughts:
Iâm assuming youâre basically free to reorder the loop variables here, and have tried getting a speedup from that.
Could you say more about when iteration would skip, how far, and how thatâs detected?
Should we take the 0-100 bounds of iteration as accurate, or are some dimensions of size, say, <20 or >1000?
Are there any pairs or trios of these dimensions that are naturally related?
Are there dimensions that can be segmented into a few parts (A, B, C) such that some optimization becomes possible on the parts?
I remember from a previous thread weâve already been over various things like running the same code less often or substituting a simplified version for many agents.
At the risk of casually suggesting something extreme: if you really, really want to squeeze the hardware for all it has, have you considered writing the hot loop in assembly?
Ok were getting off topic of for loops and the fastest way to break out of an outer loop when ther is nested loops.
if your referring to an example its just that an example, technically what were doing is a c++ for each loop with a vector of numbers:
so there is a map of 6 vectors of numbers each vector representing the actions available to be picked from for that period.
see:
So basically, if in the game your only allowed to take an action 1 time out of the 6 periods if we make a combination with two or more of that action we need to continue the loop generating the second instance of it. this is done by simply counting the number of actions in any given combinations and checking if we exceeded the limit if there is one.
They are all related, as to segmenting most anything that can be solved by segmentation is done before we get to the loops.
Yes, and quickly discarded. That would assume I can understand and write assembly better than the c++ compiler, something i defiantly canât do(yet(**hopes and dreams)).