For those of us who were raised on Z80 CPUs, it is abundantly clear that we are living in "times of plenty"; most developers no longer have to think about things like "fitting into memory" or "having enough stack space". Moore's Law provided orders-of-magnitude improvements in both CPU speeds and memory sizes; so big, in fact, that resource-wasting virtual machines dominate our everyday software, and we don't even feel it.
There are always exceptions, however: platforms where you still need to be careful about memory and CPU usage. Especially when the device operates in a critical environment (e.g. in orbit), you need to be very careful.
For the last 5 years I've been developing code for the European Space Agency, in the context of projects like TASTE; the target platform is an embedded Leon processor, an open-source SPARC derivative specifically designed for operating in space. You really don't want to run out of memory there - "blue screens of death" are too expensive when the nearest ground station operator... is some 36,000 km below :‑)
And what about the stack? Stack usage grows and shrinks as functions call other functions and return... A "stack overflow" means a crashed program. If you use recursive functions, it's impossible to predict how much stack space you'll actually need at runtime; and even without recursive functions, when spawning the application you need to know the total stack the code will use and reserve it up-front - being absolutely sure that you won't run out of stack.
So... how do you enforce the restrictions? How do you cope with the stack requirements?
Space companies use custom-made tools and procedures that address these challenges - but just for the fun of it, we can do some hacking of our own :‑)
In our case, we will use objdump's ability to disassemble the ELF binary of our program:
bash$ cat > hello.c
main()
{
    int i=314159;
    printf("Hello, 10K times pi is %d\n", i);
}
(Ctrl-D)

bash$ gcc hello.c

bash$ objdump -d ./a.out
...
Disassembly of section .text:
...
08048384 <main>:
 8048384:   55                      push   %ebp
 8048385:   89 e5                   mov    %esp,%ebp
 804838a:   83 ec 20                sub    $0x20,%esp
 804838d:   c7 44 24 1c 2f cb 04    movl   $0x4cb2f,0x1c(%esp)
 8048394:   00
 8048395:   8b 44 24 1c             mov    0x1c(%esp),%eax
 8048399:   89 44 24 04             mov    %eax,0x4(%esp)
 804839d:   c7 04 24 70 84 04 08    movl   $0x8048470,(%esp)
 80483a4:   e8 0f ff ff ff          call   80482b8 <printf@plt>
 80483a9:   c9                      leave
 80483aa:   c3                      ret
 80483ab:   90                      nop

So, what do we see here?
Apparently, there's a lot of information we can gather by simple text-processing of this output. To begin with, each function's own stack usage is right there, in the opcode that lowers the stack pointer in the function's prologue:

ADDRESS     OPCODES     sub    $0x20,%esp

In fact, GCC is sometimes more creative:
ADDRESS     OPCODES     add    $0xFFFFFF84,%esp

...so the stack pointer is decreased via an 'add' instruction, which adds a negative value. Update: Fabien Sanglard pointed me to the reason behind this; it's a size (opcode-wise) optimization.
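Negative immediates like that are easy to handle: any 32-bit value above 0x7FFFFFFF is really a negative number in two's complement. Here's a minimal standalone sketch of the conversion (the same arithmetic appears in the parsing loop further down):

# Minimal sketch: treat "add $0xFFFFFF84,%esp" as a stack decrease.
# 32-bit immediates above 0x7FFFFFFF are negative in two's complement.
value = int("0xFFFFFF84", 16)
if value > 2147483647:            # i.e. the immediate is really negative
    value = 4294967296 - value    # 2^32 - value
print value                       # prints 124, i.e. 124 bytes of stack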
Armed with these observations, we can start collecting the data with a few simple regular expressions:

...
import os
import re
import sys
...

functionNamePattern = re.compile(r'^(\S+) <([a-zA-Z0-9_]+?)>:')
callPattern = re.compile(r'^.*call\s+\S+\s+<([a-zA-Z0-9_]+)>')
stackUsagePattern = re.compile(r'^.*(?:add|sub)\s+\$(0x\S+),%esp')
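As a quick sanity check, here's how these patterns behave on hand-typed lines in the objdump format shown earlier (a sketch reusing the three patterns compiled above; note that a symbol like <printf@plt> would not match callPattern, since '@' is not in its character class):

# Sketch: exercise the three patterns on hand-typed objdump-style lines.
print functionNamePattern.match("08048384 <main>:").group(2)
# -> main
print callPattern.match(" 80483a4: e8 0f ff ff ff    call   80482b8 <foo>").group(1)
# -> foo
print stackUsagePattern.match(" 804838a: 83 ec 20    sub    $0x20,%esp").group(1)
# -> 0x20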
Detecting where each function starts is then easy:

...
match = functionNamePattern.match(line)
if match:
    # print "Detected function:", match.group(2)
    functionName = match.group(2)
...
Putting it all together:

...
callGraph = {}
stackUsagePerFunction = {}
functionName = ""
foundStackUsage = False
for line in os.popen("objdump -d \"" + sys.argv[1] + "\"").readlines():
    # Search for function start
    match = functionNamePattern.match(line)
    if match:
        functionName = match.group(2)
        callGraph[functionName] = set()  # this will store called func names
        foundStackUsage = False
    # Search for function calls
    call = callPattern.match(line)
    if functionName != "" and call:
        calledFunction = call.group(1)
        callGraph[functionName].add(calledFunction)
    # Search for the first stack-lowering opcode
    if not foundStackUsage and functionName != "":
        stackMatch = stackUsagePattern.match(line)
        if stackMatch:
            # make sure we don't re-update the stack usage for this function
            foundStackUsage = True
            # e.g.  sub $0x46,%esp
            value = int(stackMatch.group(1), 16)
            # unfortunately, GCC may also write:
            #       add $0xFFFFFF86,%esp
            # i.e. add a negative value - so convert from two's complement:
            if value > 2147483647:
                value = 4294967296 - value
            stackUsagePerFunction[functionName] = value
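At this point we have, for every function, its own (local) stack usage and the set of functions it calls. To get a feeling for the gathered data, a quick sketch like the following can dump the raw, per-function usage sorted by size:

# Sketch: dump the raw, per-function stack usage we just collected
# (before accumulating over call chains).
for name in sorted(stackUsagePerFunction, key=stackUsagePerFunction.get):
    print "%6d: %s" % (stackUsagePerFunction[name], name)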
But before we accumulate stack usage over the call chains, we have to make sure that there are no "cycles" in the call graph - that is, no recursion whatsoever. Here's a standalone example of an algorithm that detects cycles:
#!/usr/bin/env python
import sys

callGraph = {
    'a': ['b', 'c'],  # a calls b and c
    'b': ['d'],       # b calls d
    'c': ['e'],       # etc
    'd': [],
    'e': ['a']        # cycle
    # 'e': []         # no cycle
}

def SearchForCycles(path):
    print path
    node = path[-1]               # Take the last step in the chain so far
    neighbours = callGraph[node]  # fetch its "neighbours"
    for neighbour in neighbours:
        if neighbour in path:     # see if we've met one previously
            print "Detected cycle", path, "->", neighbour
            sys.exit(1)
        # add neighbour to the path, check the new, longer path
        SearchForCycles(path + [neighbour])

for node in callGraph.keys():
    SearchForCycles([node])
bash$ ./detectCycles.py
['a']
['a', 'b']
['a', 'b', 'd']
['a', 'c']
['a', 'c', 'e']
Detected cycle ['a', 'c', 'e'] -> a

Once we've made sure that there are no cycles in the call graph, we can now safely accumulate stack usage per call chain: for each function, we report the maximum of the accumulated stack costs over all call chains that start from it.
# Calculate the total stack usage of each function, taking into account who it calls
def findStackUsage(foo, stackUsagePerFunction, callGraph, cache={}):
    if foo in cache:  # memoization
        return cache[foo]
    if foo not in stackUsagePerFunction:
        return 0
    if foo not in callGraph or not callGraph[foo]:
        cache[foo] = stackUsagePerFunction[foo]
        return stackUsagePerFunction[foo]
    # the largest of the possible call chains
    totalStackUsage = max(
        stackUsagePerFunction[foo] + findStackUsage(x, stackUsagePerFunction, callGraph, cache)
        for x in callGraph[foo])
    cache[foo] = totalStackUsage
    return totalStackUsage
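To illustrate how the accumulation behaves, here's a tiny sketch with a made-up call graph and made-up per-function numbers, reusing findStackUsage from above:

# Sketch: exercise findStackUsage on a tiny, hypothetical call graph.
callGraph = {'a': set(['b', 'c']), 'b': set(['d']), 'c': set(), 'd': set()}
stackUsagePerFunction = {'a': 100, 'b': 200, 'c': 50, 'd': 40}
cache = {}
results = sorted(
    (findStackUsage(f, stackUsagePerFunction, callGraph, cache), f)
    for f in stackUsagePerFunction)
for total, f in results:
    print "%6d: %s" % (total, f)
# 'a' reports 100 + 200 + 40 = 340: the a->b->d chain wins over a->c (150)

Running this sketch on my renderer project gives the real numbers: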
bash$ stackUsage.py renderer-2.x/src/renderer | c++filt | tail -15
  788: Scene::renderAmbient(Camera const&, Screen&)
  872: Scene::renderRaytracer(Camera&, Screen&, bool)
 1012: Scene::renderGouraud(Camera const&, Screen&)
 1040: Scene::renderPhongAndShadowed(Camera const&, Screen&)
 1048: lib3ds_chunk_dump_info
 1088: Scene::renderPhong(Camera const&, Screen&)
 1108: lib3ds_node_read
 1124: texture_map_read
 1184: lib3ds_material_read
 1340: mdata_read
 1416: lib3ds_file_read
 1460: lib3ds_file_load
 1652: lib3ds_mesh_calculate_normals
 4080: Scene::load(char const*)
 4096: main

...so I can see exactly how much stack space is used by each function. More importantly, I can see whether cycles (recursion) exist:
bash$ stackUsage.py renderer-2.x/src/renderer | c++filt
...
Detected cycle: ['lib3ds_file_insert_node', 'lib3ds_file_insert_node']
Detected cycle: ['CountDepth(BVHNode*, int, int&)', 'CountDepth(BVHNode*, int, int&)']
Detected cycle: ['Scene::PopulateCacheFriendlyBVH(Triangle*, BVHNode*, unsigned int&, unsigned int&)', 'Scene::PopulateCacheFriendlyBVH(Triangle*, BVHNode*, unsigned int&, unsigned int&)']
Detected cycle: ['CountBoxes(BVHNode*)', 'CountBoxes(BVHNode*)']
Detected cycle: ['CountTriangles(BVHNode*)', 'CountTriangles(BVHNode*)']
...

There - the script detects recursive functions. It also detects "deeper" cycles; here's cycle detection in a SPARC/Leon Ada binary:
bash$ stackUsage.py /path/to/leonBinaryMadeWithAda
...
Detected cycle: ['__gnat_raise_nodefer_with_msg', '__gnat_last_chance_handler', '__gnat_rcheck_19', '__gnat_raise_program_error', '__gnat_raise_nodefer_with_msg']
...

Mission accomplished :‑)
[1] In fact, the latest versions of GCC support the -fstack-usage option, which emits the stack usage of each function (individually) while compiling the code. Unfortunately, compilers for embedded systems usually take a while to... catch up with such features. Parsing the stack "decrease" at each function's prologue is, for now, a good-enough workaround. Note that for SPARC/Leon compilers, the corresponding stack-reserving pattern is...

ADDRESS     OPCODES     save   %sp, -AMOUNT, %sp

...so the final version of the script uses separate regexps for each architecture.
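For reference, a regexp for that SPARC pattern could look like the following - a sketch on my part, not necessarily the exact expression used in the final script:

# Sketch: a SPARC/Leon counterpart of the x86 stack pattern - hypothetical,
# matching lines like "save  %sp, -104, %sp" in the objdump output.
import re
sparcStackUsagePattern = re.compile(r'^.*save\s+%sp,\s*-(\d+),\s*%sp')
m = sparcStackUsagePattern.match(" 40001234:  9d e3 bf 98    save  %sp, -104, %sp")
print int(m.group(1))   # prints 104, i.e. 104 bytes of stack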
Note that the calculated stack usage per function assumes a single thread of execution; if our binary spawns N threads, then each thread needs its own stack, and the total amount to reserve scales correspondingly.
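In that case, one could compute a budget per thread, starting from each thread's entry function - a hypothetical sketch, with made-up entry point names, reusing the data and findStackUsage from above:

# Sketch: per-thread stack budgets, computed from each thread's entry point.
# The entry point names are hypothetical - they depend on the actual binary.
threadEntryPoints = ['main', 'telemetry_thread', 'sensor_thread']
for entry in threadEntryPoints:
    print entry, findStackUsage(entry, stackUsagePerFunction, callGraph)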
Finally, if you compile with debug info enabled (-g), you will get much better output - many symbols are not visible without it.