Sieve of Eratosthenes in C#

In my personal quest to find out if the programming language really matters, I present to you my first piece of code.  This is my reference implementation of Sieve of Eratosthenes in my preferred language, C#.  For this and all future language implementations, I'll put the code at the top (without comments) and then narrate my experience below.  Now, to the code!

IList<int> findPrimes(int max) {
    var vals = new List<int>((int)(max/(Math.Log(max)-1.08366)));
    var maxSquareRoot = Math.Sqrt(max);
    var eliminated = new System.Collections.BitArray(max + 1);                        

    vals.Add(2);

    for (int i = 3; i <= max; i+=2) {
	if (!eliminated[i]) {
	    if (i < maxSquareRoot) {
		for (int j = i * i; j <= max; j+=2*i)
		    eliminated[j] = true;
	    }
	    vals.Add(i);
	}
    }
    return vals;
}

I started by following the wikipedia definition and then optimized from there.

Algorithm Optimizations
I cut my work in half by treating the special case of '2'. We know that 2 is prime and all even numbers thereafter are not. So, we'll add two immediately and then start looping at 3 only checking odd numbers from there forward.

After we've found a prime, we only need to eliminate numbers from it's square and forward. Let's say we want to find all prime numbers up to 100 and we've just identified 7 as a prime. Per the algorithm, I'll need to eliminate 2*7, 3*7 ,4*7, 5*7, 6*7, 7*7 ,8*7 ,9*7, 10*7 ,11*7, 12*7 ,13*7 and 14*7. None of the even multiples matter (even times an odd is always even) and none of the multiples up to the square of the prime matter since we've already done those multiples in previous loops. So really we only have to eliminate 7*7, 9*7, 11*7 and 13*7. That's a 9 fewer iterations and those savings become more fruitful the deeper you go!

The last optimization is the square root calculation and check. We know from above that we only need to start eliminating beginning at the square of the current prime. Therefore it also makes sense that we can stop even trying once we get past the to square root of the max. This saves a bunch more iterations.

Language Optimizations
Originally I had started by returning an IEnumerable<int>. I wasn't using the list you see above and instead I was using yield return i. I really like that syntax, but once I got to the VB.net version (Coming Soon!), I didn't have a direct translation for the yield keyword. I took the lazy route in the VB version and just stuffed it all into a list and returned that. To my surprise it was faster! I went back and changed the C# version above and it performed better. I'm not sure why, but I'm going with it.

What do you think that you get when do a sizeof(bool) in C#? I was surprised to find out that my trusty booleans actually take up a whole byte instead of a single bit. I speculate that there is a performance benefit that all of your types fit into a byte level offset in memory. I was thrilled to find out that we have a BitArray class that is useful for situations above when you need to store a lot of booleans and you need them to only take up a bit in memory. I'm not sure it helped anything, but I feel better knowing I'm using the least amount of memory possible. :)

Conclusion
Despite the fact that I know C# really well, I'm very thrilled that I was able to learn a few things about the language. Also, I'm really happy with the performance of this reference implementation. On my machine (2.66 GHz Core2 Duo and 2 GB of RAM) I can find all of the primes under 1,000,000 in 19ms. I think I've squeezed all I can out of this version. Please let me know if you see something I missed or did wrong and I'll make adjustments.

EDIT: I just added one more optimization that's worth noting. Instead of constructing my list with an empty constructor, I can save a several milliseconds off the larger sets by specifying a start size of the internal array structure behind the list. If I set this size at or slightly above the end count of prime numbers, then I avoid a lot of costly array copying as the array bounds keep getting hit. It turns out that there is quite a bit of math involved in accurately predicting the number of primes underneath a given number. I chose to cheat and just use Legendre's constant with the Prime Number Theorem which is close enough for my purposes. I can now calculate all primes under 1,000,000 in 10ms on my machine. Neat!

Does the Language Matter?

For the past 5 years I've been head down in C# and I really enjoy the language.  While I consider myself a C# developer primarily, there are plenty of other languages I use on a day to day basis.

  • C# - Server side code for ASP.NET websites, internal windows forms applications, and utility applications which load and export data.
  • SQL - Database querying for CRUD operations (not so much anymore, NHibernate rocks my world) and for ad-hoc reporting.
  • PL/SQL - Triggers and a few user defined functions in the database.
  • Javascript - Client side code to make things move and to retrieve data asynchronously.
  • HTML - Web page markup.
  • CSS - Web page styling and positioning.
  • Regular Expressions - Pattern matching in strings.
  • XSLT - Transforming XML documents into other outputs.
  • XPath - Querying XML documents.
  • Batch Files/Scripts - Command line programming to automate tasks, move around the file system, and do stuff to files.

There may be a few more that I'm not thinking about.  The point is that I'm not really "just a C# developer." I'm sure if you sit down and think about it, you use quite a few languages yourself. So, while I'm most comfortable in C#, I feel confident that I could bang out code in just about any language.  I'm only more comfortable with C# because it's what I've been using the most frequently as of late.

This is a personal challenge to myself.  I want to explore a few languages and see what I can learn.  The .NET (C#, VB.NET, F#) languages should be easy just because they all share the same base libraries.  Beyond that, who knows what I'll discover.

From DataReader to Objects

It's been a while since I posted some actual code on my blog.  I'm working on a reporting project at work and ran into a case where I just wanted to execute some arbitrary SQL against an Oracle database and get some plain objects back.  Those objects will be passed into a crystal report and turned into something suitable for digestion by the managament of our company. Nothing fancy here.  

First I'll show you my extension methods and then follow with some explanation:

static Regex underscore = new Regex(@"(^|_)(.)");
static string convertName(string s) {
	return underscore.Replace(s.ToLower(), m => m.Groups[0].ToString().ToUpper().Replace("_",""));
}

static T ToObject<T>(this IDataRecord r) where T:new() {
	T obj = new T();
	for (int i = 0; i < r.FieldCount; i++) {
		var p=typeof(T).GetProperty(convertName(r.GetName(i)));
		if (p != null) {
			if (p.PropertyType == r[i].GetType())
				p.SetValue(obj, r[i], null);
			else {
				var c = TypeDescriptor.GetConverter(r[i]);
				if (c.CanConvertTo(p.PropertyType))
					p.SetValue(obj, c.ConvertTo(r[i],p.PropertyType), null);
			}
		}
	}
	return obj;
}

public static IEnumerable<T> GetObjects<T>(this IDbCommand c) where T : new() {
	using (IDataReader r = c.ExecuteReader()) {
		while (r.Read()) {
			yield return r.ToObject<T>();
		}
	}
}

The meat of the work happens in the extension method for IDataRecord called ToObject.  It's a generic method that tries to map columns to properties.  At our company, the naming convention for column names is underscore delimited identifiers since oracle knows nothing about case unless you use quotes. For my c# objects, I much prefer Pascal Case.  The convertName method gets called to perform this conversion for me.  Finally, the last interesting bit is the TypeDescriptor usage.  When there is a difference in the type coming from the DB and the type in the object for a particular field, I'm using TypeDescriptor to find a converter if possible. This is useful if, for example, the type coming from the DB is a decimal, but my class needs an int.  

The last method, GetObjects is just a wrapper around the ToObject method.  It fires up a datareader and calls ToObject for every row.

Finally, I'd like to show how this is being used. Assume the following silly reporting class.

public class EmployeePerformance{
	public string FirstName {get; set;}
	public string LastName {get; set;}
	public decimal HoursWorked {get; set;}
	public int WidgetsCreated {get; set;}
}

All we have to do now is create a DbCommand object and call our GetObjects extension method.  I'll give an example as if I were running some report just so you have some context.

var rpt=new EmployeePerformanceReport();

using (var cx = new OracleConnection(MyConnectionString)) {
	cx.Open();
	using (var cmd = cx.CreateCommand()) {
		cmd.CommandText = @"
			SELECT
				first_name,
				last_name,
				hours_worked,
				widgets_created
			FROM
				employee_performance
			WHERE
				work_day between :startDate and :endDate
		";
		cmd.Parameters.Add("startDate",new DateTime(2008,1,1));
		cmd.Parameters.Add("endDate",new DateTime(2008,12,31));
		rpt.SetDataSource(cmd.GetObjects<EmployeePerformance>());
	}
}

//Do something with your report

It's just that simple.  I'm sure there could be some caching/compilation of the mapping after the first object to speed things up a bit. Speed hasn't been an issue so far, so I didn't persue that any further. I just wanted to illustrate this technique that lets me avoid using a DataSet which I've come to loathe in most situations. Please let me know what you think.

LINQ Hype

So, I was growing a little weary of all of the hype surrounding LINQ.  I mean, could it really be that great?  Well, I've had a chance to use it in a real world situation and I'm still neutral on it.  For me, the best part is the other language features that even make LINQ possible.  I'm talking extension methods, lambda expressions, and anonymous types. 

I think I'd much rather type:

var specialPeople = myPeopleCollection.Where(p=>p.LastName=="Bush");

than this:

var specialPeople = from p in myPeopleCollection where p.lastName == "Bush" select p; 

I just really dig the extension methods that System.Linq brings.

Extension Methods meet the Database

A co-worker and I had a conversation about extension methods today. Somehow this triggered a thought that I needed to see what could be done about the database monotony. Here's the result.

/*
 * DbExtensions.cs
 * Author: Josh Bush (digitalbush.com)
 */

using System;
using System.Data;
using System.Collections.Generic;

namespace Bush.Data{
    public static class DbExtensions{        

        private delegate T DbAction(IDbCommand cmd);

        ///

        /// Helper method that does the connection and paramter setup.
        /// 

        private static T ExecuteDbAction(IDbConnection conn, string commandText, IDbDataParameter[] parameters,DbAction dba){
            if (conn.State != ConnectionState.Open)
                conn.Open();
            using (IDbCommand cmd = conn.CreateCommand()){
                cmd.CommandText = commandText;
                cmd.Connection = conn;
                if (parameters != null){
                    foreach (IDbDataParameter p in parameters)
                        cmd.Parameters.Add(p);
                }
                return dba(cmd);
            }
        }

        ///

        /// Extension method to execute a query and return the resulting records
        /// 

        public static IEnumerable Query(this IDbConnection conn, string commandText, params IDbDataParameter[] parameters){
            return ExecuteDbAction>(
                conn,
                commandText,
                parameters,
                queryHelper  //ugh, can't use yield inside of an anonymous method
            );
        }
        public static IEnumerable Query(this IDbConnection conn, string commandText){
            return conn.Query(commandText, null);
        }
        private static IEnumerable queryHelper(IDbCommand cmd){
            using (IDataReader r = cmd.ExecuteReader()){
                while (r.Read())
                    yield return r;
            }
        }

        ///

        /// Extension method to execute scalar query.
        /// 

        public static object QueryValue(this IDbConnection conn, string commandText, params IDbDataParameter[] parameters){
            return ExecuteDbAction(
             conn,
             commandText,
             parameters,
             cmd=>cmd.ExecuteScalar()
         );
        }
        public static object QueryValue(this IDbConnection conn, string commandText){
            return conn.QueryValue(commandText, null);
        }

        ///

        /// Extension method to execute a non-query.
        /// 

        public static int Execute(this IDbConnection conn, string commandText, params IDbDataParameter[] parameters){
            return ExecuteDbAction(
                conn,
                commandText,
                parameters,
                cmd => cmd.ExecuteNonQuery()
            );
        }
        public static int Execute(this IDbConnection conn, string commandText){
            return conn.Execute(commandText, null);
        }
    }
}

I haven't had a chance to really run this code through it's paces, but I did the following test. I used the MySql Connector, but the methods above should work for any database provider as long as it implements the IDbConnection interface.

using (MySqlConnection mConn = new MySqlConnection(ConnectionString)){
    foreach(IDataRecord r in mConn.Query("select post_title from posts"))
        Console.WriteLine(r["post_title"].ToString());
}

Compare that to the craptacular way of doing it without the extension methods.

using (MySqlConnection mConn = new MySqlConnection(ConnectionString)){
    mConn.Open();
    using (MySqlCommand mCmd = new MySqlCommand("select post_title from posts", mConn)){
        using (MySqlDataReader r = mCmd.ExecuteReader()){
            while (r.Read())
                Console.WriteLine(r["post_title"].ToString());
        }
    }
}

I like the improved look. I haven't had a chance to test this with transactions or much of anything else for that matter, so just consider this a proof of concept at the moment.

You can download the extension methods to see for yourself: DbExtensions.cs

Next Page »