-
-
Save pervognsen/a002735fd626a481222e5204ac15b11e to your computer and use it in GitHub Desktop.
// For N = 11, a call to perms(p, N, 0) only takes 0.36 seconds. | |
// Why would a debugger seemingly hang forever when you try to | |
// step over the outermost recursive call to perms in the loop? | |
// Hint: How is "step over" ("next" in gdb/lldb) implemented? | |
u64 perms(u64 p[N], int n, u64 s) { | |
if (n == 1) { | |
s += signature(p); | |
} else { | |
for (int i = 0; i < n; i++) { | |
swap(p[0], p[i]); | |
s = perms(p + 1, n - 1, s); | |
swap(p[0], p[i]); | |
} | |
} | |
return s; | |
} |
I wrote "seemingly hang forever" but it's "just" intractably slow. For N = 12 there are about a hundred million recursive returns to the return address but only one of them corresponds to the outer call you wanted to step over. There are different ways you can implement the return filtering in a debugger but the most common technique I've seen is to put a breakpoint on the return address after the call. When the breakpoint triggers, the debugger will check the stack pointer against the value that was captured before entering the step-over call. If the pointers don't match, it means the breakpoint triggered for a nested return instead of the outer return. Thus the debugger asks the program to resume execution with the breakpoint still in place. If the stack pointer does match, it removes the breakpoint and hands control to the debugger user interface.
Oh, I see. That leads me to believe that "smart" singlestepping should have no problem with this. Lemme run remedy and check.
upd: nah, it's just as bad if not worse. I've got a long way to go with my understanding it seems..
This gets really annoying sometimes when debugging remote targets.
Maybe if compilers modified the return address after returning from a function, e.g.
call some_function
push 0
add rsp,8
then debuggers could set a watch breakpoint to detect function exit?
You can generate a little return shim whose address you push on the stack before starting execution at the step-over call target's address. The shim just does int3 but has unwind info, a proper prologue and whatever else is needed so that generic platform-compliant stack trace logic will be able to walk past it while your debugger's stack tracer can hide it from the user. Another side benefit of this approach (which I haven't seen any debugger do) is that it will handle longjmp/non-local exits more sanely when you longjmp past a step-over stack pointer mark.
Ok, so I got a little further with my understanding of this (implemented the "effectively hangs" version myself), and I have a further question if that's ok.
Regarding Wons comments on "implementing CFA and strategically placing traps": is it worth it at all? My reasoning is as follows: you want to do a single step, which effectively means executing all instructions on a single line of source code. But how many instructions could that be, if we skip over call
and rep whatevs
? I mean yes, you could put all your code on one line, but in a real situation, how many instructions would a single source line map to? A few dozen?
That's is my argument for "single stepping until the heat death of the universe".
What are your thoughts on this?
I've never bothered doing CFA but the strongest case for something like that is implementing efficient "step over" for inlined calls. I haven't thought it through in detail, though.
There are at least two ways I know of [how "next" is implemented], but neither should hang on this.
The first method (used by lldb IIRC) involves actually interperting the machine code yourself (in lldb they convert it to LLVM IR and run that) to get the list of breakpoint locations. I always assumed that the point of this approach is to analyze the IR somehow, but that is as far as my understanding goes.
The second method (used by RemedyBG) is just to do singlesteps until the line number changes. But George explicitly mentioned that he skips over
call
,rep xxx
etc (skipping over those is safe in terms of determining the line number as far as I understand).