Optimising Tail Recursion and Performance

Ryan Welch / November 14, 2015

Recursion is a very common paradigm in programming where you invoke a functon call on itself. Tail recursion is a special kind of recursion where the function call to itself is at the end of the function. It is significant in that it can always be redesigned as a loop which has much better performance and removes the overhead of calling the function repeatedly and growing the stack.

Compiler optimisations

Modern compilers are incredibly complex and good at their jobs, some modern compilers will optimise tail recursion for us, meaning that we can enjoy the benefits of writing recursively whilst everything is taken care for us under the hood.

To prove that tail recursion is optimised we will set up a simple experiment.Below is a simple C program that recursively calls the function count() and adds 1 to our total num each time until we reach 500000 where we return.

count.c1#include <stdio.h>
2
3int count(int num) {
4    if(num == 500000) {
5        printf("We did it!\n");
6        return 0;
7    }
8
9    num = num + 1;
10
11    return count(num);
12}
13
14int main() {
15    count(0);
16    return 0;
17}
18


The program should print out We did it once we reach 500000. So let's compile it with gcc, and then test it.

1$gcc recursion.c -o recursion 2$ ./recursion
3Segmentation fault (core dumped)
4


That's not what we expected, we get the very descriptive error message Segmentation fault instead.

Taking a closer look and debugging our program in gdb shows the function count() called thousands of times, this is not what we expected. We expected the compiler to optimise our recursive function into a loop so we should not be seeing the count() function being called more than once.

It turns out after reading the documentation that the compiler doesn't optimise our code by default, so let's try again with optimisations turned on. The flag -O3 tells gcc to use maximum optimisation and apply all of them to our code. A list of optimisations and what they do is here.

1$gcc recursion.c -O3 -o recursion 2$ ./recursion
3We did it!
4


Success! Our program runs perfectly without any core dumps.

Analysing the optimisations

We're not done yet though! Let's take a look at how the compiler actually optimises these calls. I'm dumping the assembly code generated by gcc by using the -S flag which tells gcc to stop just before it compiles the assembly code.

I've stripped away some of the assembly code and only left the relevant bits to make it easier to read.

1$gcc recursion.c -S -o recursion 2  recursion_unoptimised.asm1count: 2.LFB0: 3 .cfi_startproc 4 pushq %rbp 5 .cfi_def_cfa_offset 16 6 .cfi_offset 6, -16 7 movq %rsp, %rbp 8 .cfi_def_cfa_register 6 9 subq$16, %rsp
10    movl    %edi, -4(%rbp)
11    cmpl    $500000, -4(%rbp) 12 jne .L2 13 movl$.LC0, %edi
14    call    puts
15    movl    $0, %eax 16 jmp .L3 17.L2: 18 addl$1, -4(%rbp)
19    movl    -4(%rbp), %eax
20    movl    %eax, %edi
21    call    count
22.L3:
23    leave
24    .cfi_def_cfa 7, 8
25    ret
26    .cfi_endproc
27.LFE0:
28    .size   count, .-count
29    .globl  main
30    .type   main, @function
31


As you can see in this unoptimised version, we have a call to count from within the count function.

The cmpl instruction computes a conditional (whether the current count is equal to 500000) and sets a flag. The instruction jne is a conditional branch that jumps to the location L2 if the flag is false, L2 then calls count.

The jmp call is an unconditional branch and terminates the program once the flag is true. This is clearly unoptimised and will add a new level to the stack each time count is called eventually leading to a stack overflow.

1$gcc recursion.c -S -O3 -o recursion 2  recursion_optimised.asm1count: 2.LFB24: 3 .cfi_startproc 4 subq$8, %rsp
5    .cfi_def_cfa_offset 16
6    movl    $.LC0, %edi 7 call puts 8 xorl %eax, %eax 9 addq$8, %rsp
10    .cfi_def_cfa_offset 8
11    ret
12    .cfi_endproc
13.LFE24:
14    .size   count, .-count
15    .section    .text.startup,"ax",@progbits
16    .p2align 4,,15
17    .globl  main
18    .type   main, @function
19


The compiler does a much better job with the optimisations on, there are no branches as in the previous code and there are no calls to the count function from within itself. So it has successfully optimised the code to exclude tail recursion.

Closing thoughts

Whilst gcc does a good job at optimising it does not apply optimisations by default. Languages such as Java do not optimise tail recursion at all. Therefore where recusrion is critical I strongly recommend you to not rely on the compilers optimisations. Having said that, there's plenty of cases where using recursion makes your life much easier.