LINQ and Extension methods

Have you ever wished that a base class had a particular method? What about interfaces? Wouldn’t it be great to define a method on an interface along with its implementation? Any class that then implemented the interface would get this implementation for free.

In the past this was achieved with static utility classes. Unfortunately this leads to cluttering your code with the names of these utility classes and dilute the expressiveness of your code. Let’s say we have a utility class the gets the words and word count from a string. Don’t worry too much about the implementation, just the general structure.

public static class StringUtilities
{
   private static readonly Regex wordsRegex = new Regex(@"\w+");
   public static IEnumerable<string> GetWords(string source)
   {
      return from word in wordsRegex.Matches(source).Cast<Match>()
             select word.Value;
   }

   public static int WordCount(string source)
   {
      return GetWords(source).Count();
   }
}

To use this in our code we would have to do something like this:

var sentence = "The quick brown fox jumps over the lazy dog";
// Display each of the words
foreach (var word in StringUtilities.GetWords(sentence))
{
   Console.WriteLine(word);
}
// Display the word count
Console.Write("Total Words: ")
Console.WriteLine(StringUtilities.WordCount(sentence));

Look at all that clutter. The truth in this context is that we are really performing an action on the sentence. Wouldn’t it be better if we could just call sentence.GetWords()or sentence.WordCount()instead? It would certainly be more readable. Extension methods make this all possible. Here’s our updated StringUtilities class that creates the extension methods:

public static class StringUtilities
{
   private static readonly Regex wordsRegex = new Regex(@"\w+");
   public static IEnumerable<string> GetWords(this string source)
   {
      return from word in wordsRegex.Matches(source).Cast<Match>()
             select word.Value;
   }

   public static int WordCount(this string source)
   {
      return GetWords(source).Count();
   }
}

We’ve added this before the variable type. The rest of the code has been left untouched. So now we can use the extension methods like so:

var sentence = "The quick brown fox jumps over the lazy dog";
// Display each of the words
foreach (var word in sentence.GetWords())
{
   Console.WriteLine(word);
}
// Display the word count
Console.WriteLine("Total Words: {0}", sentence.WordCount());

Doesn’t that read better? We have been able to push the implementation details (the name of the static utility class) out of our code.

How to enable an extension method

In order to use an extension method it must be part of the local namespace or imported with a using statement. Once that’s done you can call extension methods just as you would any normal method.

What does this have to do with LINQ?

LINQ is all about extension methods. When you import the System.Linq namespace it comes with a whole bundle of extension methods. Most of them act on IEnumerable<T> and can be used to write your LINQ queries in method syntax. Let’s look at this query:

from item in items
where item.Price < 1
select item.Name

This query finds the items that are under one dollar and returns their names. We can write this query in method syntax like so:

items.Where(item => item.Price < 1).Select(item => item.Name)

It’s not quite as readable (although that is a matter of opinion), but it gives a good indication of what is going on (and further demonstrates why select is at the end). These methods also take advantage of Lambda expressions (which I’ll discuss in a future post).

There are other useful extension functions that work with queries. Some of the ones you’ll use most often are:

ToList() executes the query and returns the results in a list. You will probably use this method a lot. I’ll cover this method an its consequences in more depth in a future post on deferred execution.
Count() executes the query and returns the number of results. When used with LINQ to SQL it will execute SQL code to get the database server to return the count.
Any() returns true if there are any results in the query. Use this instead of Count() > 0 to abstract out the implementation detail.
First() returns the first result from the query. This is particularly useful when you have a query that will only return one result (such as looking up an entry based on its primary key). This method will throw an exception (InvalidOperationException) if the query yields no results.
FirstOrDefault() returns the first result from the query, much like First(). If there are no results it will return the default for the type (e.g. 0 for an int, null for reference types).

Fortunately you aren’t limited to using these extension methods on LINQ queries. They are designed to work on any class that implements IEnumerable<T>. This means you can use them directly on a lot of the classes already in the .NET base class library.

What about old non-generic `IEnumerable`?

There are a lot of classes in the .NET framework that don’t implement IEnumerable<T> but instead implement the non-generic interface IEnumerable. A perfect example is MatchCollection used by Regular expressions. When we enumerate over a MatchCollection we are given the base object which we then need to cast to a Match object. Until we do this cast we can’t access any of the properties of Match. Fortunately there are a couple of LINQ extension methods designed to help out when dealing with IEnumerable.

Cast<T>() returns a strongly typed IEnumerable<T> object. Each object is cast to the type T. If an object can’t be cast an exception is thrown (InvalidCastException). In the case of a MatchCollection I am confident that every object is a Match object and an exception won’t be thrown.
OfType<T>() also returns a strongly typed IEnumerable<T> object. It goes further than Cast<T>() by only including objects of that type in the enumeration. In other words it filters out any class that isn’t of the desired type (without throwing exceptions). This is the method to use when you are unsure of what the type will be or if you are dealing with an enumeration that contains different typed objects.

If you want to see OfType<T>() in action, copy and paste the following example into LINQPad. (You’ll need to select C# Statement(s) from the language drop down).

var items = new object[]{"a string", 22, Math.PI};  
items.OfType<string>().Dump("OfType<string>");
items.OfType<int>().Dump("OfType<int>");
items.OfType<double>().Dump("OfType<double>");

LINQPad has its own extension method Dump() which is used to output results to the LINQPad window. You’ll see that each individual dump returns a strongly typed IEnumerable<T> object. In this example items actually implemented IEnumerable<object>. Fortunately these methods don’t discriminate and happily work their magic on any IEnumerable<T> as well.

Still more to come

There is still plenty of more that I will post about LINQ. In my next post I’ll look at deferred execution, what it means and how you can take advantage of it.

i-think Twenty-Two

Now with more coherency.

How to enable an extension method

What does this have to do with LINQ?

What about old non-generic `IEnumerable`?

Still more to come

Comments

How to enable an extension method

What does this have to do with LINQ?

What about old non-generic IEnumerable?

Still more to come

Comments

What about old non-generic `IEnumerable`?