Sieve of Eratosthenes in C#

In my personal quest to find out if the programming language really matters, I present to you my first piece of code.  This is my reference implementation of Sieve of Eratosthenes in my preferred language, C#.  For this and all future language implementations, I'll put the code at the top (without comments) and then narrate my experience below.  Now, to the code!

IList<int> findPrimes(int max) {
    var vals = new List<int>((int)(max/(Math.Log(max)-1.08366)));
    var maxSquareRoot = Math.Sqrt(max);
    var eliminated = new System.Collections.BitArray(max + 1);                        

    vals.Add(2);

    for (int i = 3; i <= max; i+=2) {
	if (!eliminated[i]) {
	    if (i < maxSquareRoot) {
		for (int j = i * i; j <= max; j+=2*i)
		    eliminated[j] = true;
	    }
	    vals.Add(i);
	}
    }
    return vals;
}

I started by following the wikipedia definition and then optimized from there.

Algorithm Optimizations
I cut my work in half by treating the special case of '2'. We know that 2 is prime and all even numbers thereafter are not. So, we'll add two immediately and then start looping at 3 only checking odd numbers from there forward.

After we've found a prime, we only need to eliminate numbers from it's square and forward. Let's say we want to find all prime numbers up to 100 and we've just identified 7 as a prime. Per the algorithm, I'll need to eliminate 2*7, 3*7 ,4*7, 5*7, 6*7, 7*7 ,8*7 ,9*7, 10*7 ,11*7, 12*7 ,13*7 and 14*7. None of the even multiples matter (even times an odd is always even) and none of the multiples up to the square of the prime matter since we've already done those multiples in previous loops. So really we only have to eliminate 7*7, 9*7, 11*7 and 13*7. That's a 9 fewer iterations and those savings become more fruitful the deeper you go!

The last optimization is the square root calculation and check. We know from above that we only need to start eliminating beginning at the square of the current prime. Therefore it also makes sense that we can stop even trying once we get past the to square root of the max. This saves a bunch more iterations.

Language Optimizations
Originally I had started by returning an IEnumerable<int>. I wasn't using the list you see above and instead I was using yield return i. I really like that syntax, but once I got to the VB.net version (Coming Soon!), I didn't have a direct translation for the yield keyword. I took the lazy route in the VB version and just stuffed it all into a list and returned that. To my surprise it was faster! I went back and changed the C# version above and it performed better. I'm not sure why, but I'm going with it.

What do you think that you get when do a sizeof(bool) in C#? I was surprised to find out that my trusty booleans actually take up a whole byte instead of a single bit. I speculate that there is a performance benefit that all of your types fit into a byte level offset in memory. I was thrilled to find out that we have a BitArray class that is useful for situations above when you need to store a lot of booleans and you need them to only take up a bit in memory. I'm not sure it helped anything, but I feel better knowing I'm using the least amount of memory possible. :)

Conclusion
Despite the fact that I know C# really well, I'm very thrilled that I was able to learn a few things about the language. Also, I'm really happy with the performance of this reference implementation. On my machine (2.66 GHz Core2 Duo and 2 GB of RAM) I can find all of the primes under 1,000,000 in 19ms. I think I've squeezed all I can out of this version. Please let me know if you see something I missed or did wrong and I'll make adjustments.

EDIT: I just added one more optimization that's worth noting. Instead of constructing my list with an empty constructor, I can save a several milliseconds off the larger sets by specifying a start size of the internal array structure behind the list. If I set this size at or slightly above the end count of prime numbers, then I avoid a lot of costly array copying as the array bounds keep getting hit. It turns out that there is quite a bit of math involved in accurately predicting the number of primes underneath a given number. I chose to cheat and just use Legendre's constant with the Prime Number Theorem which is close enough for my purposes. I can now calculate all primes under 1,000,000 in 10ms on my machine. Neat!

  • Mike

    Nice work, Josh, but my version is consistently faster. For example, to calculate all primes to 1,000,000,000 your version takes about 36 seconds, mine around 31. Owned!

    public static List Sieve2(int target)
    {
    var isPrime = new BitArray(target, true);
    var primesList = new List { 2 };
    var squareRoot = Math.Sqrt(target);

    for (int candidate = 3; candidate < target; candidate +=2)
    {
    if (!isPrime[candidate]) continue;

    if (candidate < squareRoot)
    {
    for (int multiple = candidate*candidate; multiple < target; multiple += 2*candidate)
    {
    isPrime[multiple] = false;
    }
    }
    primesList.Add(candidate);
    }
    return primesList;
    }

  • SMM

    @Mike @ August 27th, 2010:

    You may want to test more thoroughly before you stick your foot in your mouth. Your algorithm was slightly faster with smaller values of ‘target’, but as that number grew, your ‘superior’ version was slower. This was tested on multiple boxes by a few people.

    How do you like ‘dem apples?

  • Atomic Paradox

    Hi Josh,

    Nice bit of coding! :) Taking up the challenge I added another optimisation that you might want to play around with. On my PC it went about 20% faster caching all primes <= 100M.

    Another area I looked at was the maxSquareRoot assignment which I believe compiles to a double. Any later comparisons will take that lil' bit longer to process than if you use an int.

    Here is the revised code:

    static public IList CachePrimesE( int max )
    {
    List vals = new List( (int)(max / (Math.Log( max ) – 1.08366)) );
    int maxSquareRoot = (int)Math.Sqrt( max );
    BitArray eliminated = new BitArray( max + 2 );

    // Add the first prime number
    vals.Add( 2 );
    vals.Add( 3 );
    vals.Add( 5 );

    // All primes greater than 3 can be written in the form 6k +/- 1. If we make i = 6k-1 then we process i and i + 2!
    for ( int i = 5; i <= max; i += 4 )
    {
    // Process i
    if ( !eliminated[i] )
    {
    if ( i < maxSquareRoot )
    {
    for ( int j = i * i; j <= max; j += 2 * i )
    eliminated[j] = true;
    }
    vals.Add( (uint)i );
    }

    // Process i + 2
    i += 2;
    if ( !eliminated[i] )
    {
    if ( i < maxSquareRoot )
    {
    for ( int j = i * i; j <= max; j += 2 * i )
    eliminated[j] = true;
    }
    vals.Add( (uint)i );
    }

    }

    return vals;
    }

  • Atomic Paradox

    I’m back! I had another look to see if I could optimise it any more. The speed is not markedly quicker but I think the optimisations will improve as max increases. The approach I took was as follows:

    I split the outer for loop into two while loops. The first while loop deals with all values of i < = maxSquareRoot. This way you can remove the test before the inner for loop of your original code.

    The second while loop then just harvests the flagged primes remaining in the sieve.

    In addition I calculated i2 = 2 x i outside the for loop rather than recalculate every time we go around the inner for loops.

    ///
    /// Use the Sieve of Eratosthenes to compile and return a list of prime numbers up to max
    /// Inspired by Josh Bush
    /// http://digitalbush.com/2010/02/26/sieve-of-eratosthenes-in-csharp/comment-page-1/#comment-3261
    ///
    /// Maximum prime number in the list.
    /// List of prime numbers in the range [2..max].
    static public IList CachePrimesE( int max )
    {
    IList vals = new List( (int)(max / (Math.Log( max ) – 1.08366)) ); // Create the list to hold the primes
    int maxSquareRoot = (int)Math.Sqrt( max );
    BitArray eliminated = new BitArray( max + 2 ); // Create the bit array for our sieve

    // Add the first two prime numbers
    vals.Add( 2 );
    vals.Add( 3 );

    // All primes greater than 3 can be written in the form 6k +/- 1.
    // If we make i = 6k-1 then we process i and i + 2 through each iteration of the while loops!
    int i = 5;
    while(i <= maxSquareRoot)
    {
    // Process i
    if ( !eliminated[i] )
    {
    for ( int j = i * i, i2 = i + i; j <= max; j += i2 )
    eliminated[j] = true;
    vals.Add( (uint)i ); // Add i to the prime number list
    }

    // Process i + 2
    i += 2;
    if ( !eliminated[i] )
    {
    for ( int j = i * i, i2 = i + i; j = SQR(max) so simply harvest the remaining primes in the sieve
    while(i <= max)
    {
    if ( !eliminated[i] )
    vals.Add( (uint)i ); // Add i to the prime number list
    i += 2;
    if ( !eliminated[i] )
    vals.Add( (uint)i ); // Add i to the prime number list
    i += 4;
    }
    Console.WriteLine( vals.Count );
    return vals;
    }

  • Alain Pech

    Hi,
    with the silly case max=9, the result is not OK; Result is 2, 3, 5, 7 and 9.
    Replace if (i < maxSquareRoot) { by f (i <= maxSquareRoot) {