16 July, 2011

effin around with ibonacci






First off, thanks to Alex Gorbatchev for his awesome SyntaxHighlighter. I've used it here with his great instructions for embedding. I took his instructions here using the "pre" method and just embedded the scripts within the script tags of this post instead of referencing them. If I use this again, I'll figure out how to upload the scripts I need to blogger, so I can just reference them. Anyway... this is really high quality stuff, nice work Alex.

So, I was playing around with, or did I say effin around with ibonacci? (Yes, putting the "f" in fibonacci.) I wanted to try out a few variations of Fibonacci algos to do some speed comparisons. I was knew that recursion would take a bit of a hit, but I was really surprised that it would be this bad.

recurFib(45) took 6813 milliseconds to calculate 1134903170
iterFib(45) took 0 milliseconds to calculate 1134903170
iterNoArrFib(45) took 0 milliseconds to calculate 1134903170

I love the elegance of recursive code.

 public static int recurFib(int n) {
  if (n == 0) {
   return n;
  }
  if (n <= 2) {
   return 1;
  }
  return recurFib(n - 1) + recurFib(n - 2);
 }

As opposed to the faster-performing iterative code.

public static int iterNoArrFib(int n) {
  int f = 0;
  //special case #1
  if (n == 0) {
   return f;
  }
  
  int x = 1;
  int y = 1;
  //special case #2
  if (n <=2) {
   return x;
  }
  
  for (int i = 3; i <= n; i++ ) {
   f = x + y;
   //set up x and y for next loop
   y = x;
   x = f;
   
  }
  return f;
 }

So, I tried a couple of things. First, I tried to neaten up the iterative version with an array instead of shuffling through the variables:

public static int iterFib(int n) {
  int[] f = new int[n+1];
  f[0] = 0;
  f[1] = 1;
  if (n >= 2) {
   f[2] = 1;
  }
  for (int i = 3; i <= n; i++ ) {
   f[i] = f[i - 1] + f[i - 2];
   
  }
  return f[n];
 }

It was no faster or slower (by my crude benchmark), but a little cleaner. Then, I tried to speed up our recursive version with a little caching. I was pretty sure that the recursive version recalculates all of the children for each node of the tree, so, I added an array to cache values, checking to see if it was already calculated before doing a recursive call:

private static int[] recurCache ;
 
 public static int recurCacheFib(int n, boolean first) {
  
  recurCache = (recurCache == null || first)? new int[n+1] : recurCache ;
  
  if (n == 0) {
   recurCache[n] = 0;
   return n;
  }
  if (n <= 2) {
   recurCache[n] = 1;
   return 1;
  }
  
  int f = recurCache[n];
  if (f == 0) {
   f = recurCacheFib(n - 1,false) + recurCacheFib(n - 2,false);
   recurCache[n] = f;
  }
  
  return f;
 }

Well, low and behold, it was just as fast as the iterative version:

recurFib(45) took 6813 milliseconds to calculate 1134903170
recurCacheFib(45) took 0 milliseconds to calculate 1134903170
recurCacheFib(45) took 0 milliseconds to calculate 1134903170 a second time
iterFib(45) took 0 milliseconds to calculate 1134903170
iterNoArrFib(45) took 0 milliseconds to calculate 1134903170

I ran it twice just to see if I would take much of a hit recreating the array for the cache. Apparently not.

However, when I compare the cached recursion code to the array-based version of iterative code, I think the array-based version is slightly more elegant. This is mostly due to the addition of the external array and not just lines of code. What do you think? The entire code is below.

package org.bikeonastick.examples;

import java.util.ArrayList;
import java.util.List;

public class Fibonacci {
 
 public static void main(String[] args) {
  int toFib = Integer.parseInt(args[0]);
  long start = System.currentTimeMillis();
  int fib = Fibonacci.recurFib(toFib);
  long finish = System.currentTimeMillis();
  System.out.println("recurFib(" + toFib +") took " + (finish - start) + " milliseconds to calculate " + fib);
  
  start = System.currentTimeMillis();
  fib = Fibonacci.recurCacheFib(toFib,true);
  finish = System.currentTimeMillis();
  System.out.println("recurCacheFib(" + toFib +") took " + (finish - start) + " milliseconds to calculate " + fib);

  start = System.currentTimeMillis();
  fib = Fibonacci.recurCacheFib(toFib,true);
  finish = System.currentTimeMillis();
  System.out.println("recurCacheFib(" + toFib +") took " + (finish - start) + " milliseconds to calculate " + fib + " a second time");

  
  start = System.currentTimeMillis();
  fib = Fibonacci.iterFib(toFib);
  finish = System.currentTimeMillis();
  System.out.println("iterFib(" + toFib +") took " + (finish - start) + " milliseconds to calculate " + fib);
  
  start = System.currentTimeMillis();
  fib = Fibonacci.iterNoArrFib(toFib);
  finish = System.currentTimeMillis();
  System.out.println("iterNoArrFib(" + toFib +") took " + (finish - start) + " milliseconds to calculate " + fib);
  
 }
 
 public static int recurFib(int n) {
  if (n == 0) {
   return n;
  }
  if (n <= 2) {
   return 1;
  }
  return recurFib(n - 1) + recurFib(n - 2);
 }
 
 private static int[] recurCache ;
 
 public static int recurCacheFib(int n, boolean first) {
  
  recurCache = (recurCache == null || first)? new int[n+1] : recurCache ;
  
  if (n == 0) {
   recurCache[n] = 0;
   return n;
  }
  if (n <= 2) {
   recurCache[n] = 1;
   return 1;
  }
  
  int f = recurCache[n];
  if (f == 0) {
   f = recurCacheFib(n - 1,false) + recurCacheFib(n - 2,false);
   recurCache[n] = f;
  }
  
  return f;
 }

 
 public static int iterFib(int n) {
  int[] f = new int[n+1];
  f[0] = 0;
  f[1] = 1;
  if (n >= 2) {
   f[2] = 1;
  }
  for (int i = 3; i <= n; i++ ) {
   f[i] = f[i - 1] + f[i - 2];
   
  }
  return f[n];
 }
 
 public static int iterNoArrFib(int n) {
  int f = 0;
  //special case #1
  if (n == 0) {
   return f;
  }
  
  int x = 1;
  int y = 1;
  //special case #2
  if (n <=2) {
   return x;
  }
  
  for (int i = 3; i <= n; i++ ) {
   f = x + y;
   //set up x and y for next loop
   y = x;
   x = f;
   
  }
  return f;
 }

}

3 comments:

Flekkzo said...

Ah, nerding out with optimizations :) First of all, Java is a pretty horrible language, as computer languages go, and its optimizations are quite funky. Also it seems like standard java lacks tail call recursion, which should make quite a difference. And you need more iterations to make sure no odd caches etc skews your results :)

I ran each test 10000 times in a for loop (and skipped the first test completely, apparently method calling in Java is silly stupid bad) for Fibonacci 5000 and found this:


recurCacheFib(5000) took 7432 milliseconds to calculate -1846256875
recurCacheFib(5000) took 7345 milliseconds to calculate -1846256875 a second time
iterFib(5000) took 1653 milliseconds to calculate -1846256875
iterNoArrFib(5000) took 412 milliseconds to calculate -1846256875

Remember, using more memory is bad, as that will trash your CPUs caches and, well, not good :)

Turns out C isn't too awesome either, fib 45 took around 14 seconds with gcc, 4 with -fast or -O3.

OK, had to check, seems like Java does indeed do some tail call optimizations. I'll leave that one up to the reader, but it is WAY faster than the basic version :)

Two style things. I vastly prefer to be real clear when I code, and I can't fathom why the curse that is "one return" has been brought upon us. Just as I like "fail fast", I do like "return early". As soon as we know we already know the answer, return. Oh, and return the constant, not the variable, if we do know:)

Now I want to go optimize something, hmm. Not good.

Unknown said...

good one! yes, it seems that the lightest footprint (iterative with no backing array) is the best performer on the lengthier runs.

style... well, i've gone back and forth on the one return vs. return early over the years. i have, for the most part, settled on return early, but as you can see from the stylistic inconsistencies in my code, my pendulum must be somewhere in the middle, now.

my battle here was between choosing recursion for its terse elegance and iteration for its clunky looking performance. there are times, as a programmer, when you write code to be maintained and times when you write code to perform. while i do appreciate beautiful code, if the first thing people do who maintain it is to rewrite it to perform, you've failed, because that means it doesn't do what it needed to do.

thanks for jumping into the nerd out.

Flekkzo said...

The problem with only one return is that I have to read through the whole method and figure things out so I can follow the code. Not good.

And yes, optimizations should be the exception. A smart thing is to keep both code paths, helps with testing (i.e. both should do the same thing).

Calculating the Fibonacci sequence isn't really a performance issue that one would run into that often. Most of the time it pays off to write simple code and hope the compiler can smooth things out.