One of the stumbling blocks on the road to understanding LINQ is deferred
execution. The key to getting past this is being able to identify that a query
is a definition of what you want, rather than the results themselves.
Here’s an example of how this works:
var itemsInStock = from item in warehouse.Items
                   where item.Quantity > 0
                   select item;

// Display how many items are in stock
Console.WriteLine("Items in stock: {0}", itemsInStock.Count());

// Add a new item to the warehouse
warehouse.Items.Add(new Item("A new item", 50));

// Display how many items are in stock
Console.WriteLine("Items in stock: {0}", itemsInStock.Count());
The second time itemsInStock.Count() is called it returns the updated count
that includes our new item. Instead of executing the query when it is defined,
execution is deferred until a result is needed: when iterating over the
collection with a foreach loop, when using ToList() to store the results in a
List<T>, or when calling one of the many LINQ extension methods that force an
actual result (such as Count() in this example). This has the added benefit of
allowing a query to be extended like so:
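As a minimal sketch of what extending a query might look like: the stock list of name/quantity tuples is a hypothetical stand-in for warehouse.Items, and lowStock is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for warehouse.Items.
var stock = new List<(string Name, int Quantity)>
{
    ("Widget", 3),
    ("Gadget", 0),
    ("Gizmo", 12),
};

// Nothing is executed yet: this only defines the query.
var itemsInStock = from item in stock
                   where item.Quantity > 0
                   select item;

// Build a second query on top of the first. Still nothing has run.
var lowStock = from item in itemsInStock
               where item.Quantity < 5
               select item;

// Both filters are applied only now, when Count() asks for a result.
Console.WriteLine("Low stock items: {0}", lowStock.Count());
```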
This query can now be used to return items that are in stock, but have fewer
than 5 available units.
Quite often you’ll want to work with a snapshot of the results from a query.
Maybe you are writing a method that returns a particular set of items. In this
scenario it may be better to return a list rather than the query itself. By
returning a list, the calling code is able to iterate over the result multiple
times without the result changing. For example you might implement your method
like this:
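A minimal sketch of the idea, using local functions in place of a class to keep it short. The names ItemsInStockQuery and GetItemsInStock, and the tuple list standing in for the warehouse, are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the warehouse contents.
var stock = new List<(string Name, int Quantity)> { ("Widget", 3), ("Gadget", 0) };

// Deferred query, kept for internal use: re-evaluated on every enumeration.
IEnumerable<(string Name, int Quantity)> ItemsInStockQuery() =>
    stock.Where(item => item.Quantity > 0);

// What callers get: ToList() runs the query once and copies the results.
List<(string Name, int Quantity)> GetItemsInStock() =>
    ItemsInStockQuery().ToList();

var snapshot = GetItemsInStock();
stock.Add(("Gizmo", 12));

Console.WriteLine(snapshot.Count);              // the snapshot is unchanged
Console.WriteLine(ItemsInStockQuery().Count()); // the live query sees the new item
```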
Calling code is able to get the information it needs and internally you still
have direct access to the query.
Another important thing to remember is that because a query is executed every
time you iterate it with a foreach loop, you should use ToList() if you are
repeatedly iterating the query and don’t need the results to be recalculated
each time.
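One way to see the cost of repeated iteration is to count how often the filter runs. This is only a sketch: the predicateCalls counter and the sample array are made up for the demonstration.

```csharp
using System;
using System.Linq;

var numbers = new[] { 1, 2, 3, 4 };
var predicateCalls = 0;

// The lambda counts how often the filter actually runs.
var evens = numbers.Where(n => { predicateCalls++; return n % 2 == 0; });

foreach (var n in evens) { }
foreach (var n in evens) { }
var callsAfterTwoLoops = predicateCalls;   // the filter ran for every element, twice

var cached = evens.ToList();               // one more pass to build the list
foreach (var n in cached) { }
foreach (var n in cached) { }
var callsAfterCaching = predicateCalls;    // iterating the list does not re-run the filter

Console.WriteLine("{0} {1}", callsAfterTwoLoops, callsAfterCaching); // 8 12
```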
When I received an unsatisfactory reply to my first complaint to Telstra
I started work on a new scathing reply hoping to get the
response I wanted from my first complaint. Unfortunately Telstra’s reply was
almost identical.
Hi,
I have attached my original complaint letter in case it has been misplaced.
Thank you for your reply. When I first read it I laughed. Despite the amount
of practice you (I’m using the term to collectively refer to Telstra (whom you
represent), so please do not take this as a personal attack) must have had
dealing with complaints, you don’t seem particularly good at it.
I refer to the first line of my email. That’s right, the first line: “Please
redirect this complaint to the necessary area”. Nowhere in my email did I
suggest that it would be even a remotely acceptable response to provide
telephone numbers to which I could presumably read my letter. In fact, one of
these phone numbers was the subject of my fifth complaint (see original letter
attached).
Ignoring the first line of my letter is much like “showing a red rag to a
bull” or “poking the bear”. More likely these terms are referred to internally
as “servicing the customer”.
I chose to send my complaint in writing for the primary reason that it could
be forwarded to the appropriate people without losing anything in the
translation. This is highly preferable to calling up, telling the whole story
only to be passed on to another operator to start all over again.
Perhaps you don’t appreciate the wonders of the written word and its impact
on history. Before cavemen started drawing images on the cave wall the only
way to pass knowledge was through speech. Once written words were formed there
was a means to pass on information without requiring the original author to be
present. Furthermore the communication was able to be passed on exactly as the
author intended.
This system was still held back by the amount of time it took to reproduce a
written document. Fortunately the invention of the printing press made rapid
duplication of a written document feasible leading eventually to increased
literacy in the general populace. Several years later, computers were created
that could copy information perfectly at high rates. This is where we are
today.
I expected that it would be a simple case of locating an email address for
the necessary departments and forwarding the email to them. Clearly this must
be a new technology that hasn’t yet filtered down to Telstra from the world of
academia.
Consequently I hope that Telstra is more familiar with the postal service.
Please provide me with the postal address details of each of the relevant
heads of department that my letter should be addressed to. Also, please
provide the postal details for the head of Bigpond and head of Telstra who I
will also send a copy of my letter to. I will be adding an additional covering
letter detailing my dissatisfaction with your complaints handling process.
If you cannot provide me with this information I would like a full
explanation of why this is the case. “We do not provide this information” is
not an explanation, nor is “call this number”.
Thank you for your time and I look forward to your prompt reply,
Have you ever wished that a base class had a particular method? What about
interfaces? Wouldn’t it be great to define a method on an interface along with
its implementation? Any class that then implemented the interface would get
this implementation for free.
In the past this was achieved with static utility classes. Unfortunately this
clutters your code with the names of these utility classes and dilutes its
expressiveness. Let’s say we have a utility class that gets the words and word
count from a string. Don’t worry too much about the implementation, just the
general structure.
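A utility class along these lines might look like the following. This is a sketch rather than the original listing: only the names StringUtilities, GetWords and WordCount come from the post, and the whitespace-splitting implementation is an assumption.

```csharp
using System;
using System.Collections.Generic;

// A plain static utility class: callers must name it on every call.
public static class StringUtilities
{
    // Assumed definition of a word boundary: any run of whitespace.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    public static IEnumerable<string> GetWords(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static int WordCount(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```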
To use this in our code we would have to do something like this:
var sentence = "The quick brown fox jumps over the lazy dog";

// Display each of the words
foreach (var word in StringUtilities.GetWords(sentence))
{
    Console.WriteLine(word);
}

// Display the word count
Console.Write("Total Words: ");
Console.WriteLine(StringUtilities.WordCount(sentence));
Look at all that clutter. The truth in this context is that we are really
performing an action on the sentence. Wouldn’t it be better if we could just
call sentence.GetWords() or sentence.WordCount() instead? It would
certainly be more readable. Extension methods make this all possible. Here’s
our updated StringUtilities class that creates the extension methods:
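As a sketch of what the extension method version could look like (the whitespace-splitting bodies are an assumption; only the class and method names come from the post):

```csharp
using System;
using System.Collections.Generic;

// Extension methods must live in a non-nested, non-generic static class.
public static class StringUtilities
{
    // Assumed definition of a word boundary: any run of whitespace.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    // "this string text" makes GetWords callable as text.GetWords().
    public static IEnumerable<string> GetWords(this string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // "this string text" makes WordCount callable as text.WordCount().
    public static int WordCount(this string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```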
We’ve added the this keyword before the type of the first parameter. The rest of the code has been
left untouched. So now we can use the extension methods like so:
var sentence = "The quick brown fox jumps over the lazy dog";

// Display each of the words
foreach (var word in sentence.GetWords())
{
    Console.WriteLine(word);
}

// Display the word count
Console.WriteLine("Total Words: {0}", sentence.WordCount());
Doesn’t that read better? We have been able to push the implementation details
(the name of the static utility class) out of our code.
How to enable an extension method
To use an extension method, the static class that defines it must be in the
current namespace or in a namespace imported with a using directive. Once
that’s done you can call extension methods just as you would any normal method.
What does this have to do with LINQ?
LINQ is all about extension methods. When you import the System.Linq namespace
it comes with a whole bundle of extension methods. Most of them act on
IEnumerable<T> and can be used to write your LINQ queries in method syntax.
Let’s look at this query:
from item in items
where item.Price < 1
select item.Name
This query finds the items that are under one dollar and returns their names.
We can write this query in method syntax like so:
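A sketch of the method-syntax form, with a hypothetical items list of name/price tuples supplied so the snippet runs on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the items collection.
var items = new List<(string Name, decimal Price)>
{
    ("Pencil", 0.50m),
    ("Notebook", 3.25m),
    ("Eraser", 0.75m),
};

// Method syntax: each clause becomes an extension method call taking a lambda.
var cheapNames = items
    .Where(item => item.Price < 1)
    .Select(item => item.Name);

Console.WriteLine(string.Join(", ", cheapNames)); // Pencil, Eraser
```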
It’s not quite as readable (although that is a matter of opinion), but it
gives a good indication of what is going on (and further demonstrates why
select is at the end). These methods also take advantage of Lambda expressions
(which I’ll discuss in a future post).
There are other useful extension methods that work with queries. Some of the
ones you’ll use most often are:
ToList() executes the query and returns the results in a list. You will probably use this method a lot. I’ll cover this method and its consequences in more depth in a future post on deferred execution.
Count() executes the query and returns the number of results. When used with LINQ to SQL it will execute SQL code to get the database server to return the count.
Any() returns true if there are any results in the query. Use this instead of Count() > 0 to abstract out the implementation detail.
First() returns the first result from the query. This is particularly useful when you have a query that will only return one result (such as looking up an entry based on its primary key). This method will throw an exception (InvalidOperationException) if the query yields no results.
FirstOrDefault() returns the first result from the query, much like First(). If there are no results it will return the default for the type (e.g. 0 for an int, null for reference types).
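The methods above can be tried side by side. This is a small sketch over a made-up array of prices; the variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var prices = new[] { 0.5m, 3.25m, 0.75m };
var cheap = prices.Where(p => p < 1);        // deferred query
var free = prices.Where(p => p == 0);        // a query with no results

List<decimal> cheapList = cheap.ToList();    // executes and stores the results
int cheapCount = cheap.Count();              // executes and counts
bool anyCheap = cheap.Any();                 // true as soon as one result is found
decimal firstCheap = cheap.First();          // first result
decimal firstFree = free.FirstOrDefault();   // no results, so default(decimal)

try
{
    free.First();                            // no results: throws
}
catch (InvalidOperationException)
{
    Console.WriteLine("First() threw because the query was empty");
}
```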
Fortunately you aren’t limited to using these extension methods on LINQ
queries. They are designed to work on any class that implements
IEnumerable<T>. This means you can use them directly on a lot of the classes
already in the .NET base class library.
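For instance, arrays, strings and dictionaries can all be queried directly. A short sketch with made-up data:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An array, a string and a dictionary all implement IEnumerable<T>.
int[] scores = { 72, 95, 88 };
string word = "deferred";
var stockLevels = new Dictionary<string, int> { ["Widget"] = 3, ["Gadget"] = 0 };

int best = scores.Max();
int letterE = word.Count(c => c == 'e');
bool anySoldOut = stockLevels.Any(pair => pair.Value == 0);

Console.WriteLine("{0} {1} {2}", best, letterE, anySoldOut); // 95 3 True
```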
What about old non-generic IEnumerable?
There are a lot of classes in the .NET framework that don’t implement
IEnumerable<T> but instead implement the non-generic interface
IEnumerable. A perfect example is MatchCollection, used by regular
expressions. When we enumerate over a MatchCollection we are given the base
object which we then need to cast to a Match object. Until we do this cast
we can’t access any of the properties of Match. Fortunately there are a
couple of LINQ extension methods designed to help out when dealing with
IEnumerable.
Cast<T>() returns a strongly typed IEnumerable<T> object. Each object is cast to the type T. If an object can’t be cast an exception is thrown (InvalidCastException). In the case of a MatchCollection I am confident that every object is a Match object and an exception won’t be thrown.
OfType<T>() also returns a strongly typed IEnumerable<T> object. It goes further than Cast<T>() by only including objects of that type in the enumeration. In other words it filters out any object that isn’t of the desired type (without throwing exceptions). This is the method to use when you are unsure of what the type will be or if you are dealing with an enumeration that contains different typed objects.
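A sketch of Cast<T>() applied to a MatchCollection, using a made-up input string and pattern:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

MatchCollection matches = Regex.Matches("a1 b22 c333", @"\d+");

// Cast<Match>() yields an IEnumerable<Match>, so the Match properties
// are available inside the lambda without a manual cast.
var digitRuns = matches.Cast<Match>().Select(m => m.Value).ToList();

Console.WriteLine(string.Join(", ", digitRuns)); // 1, 22, 333
```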
If you want to see OfType<T>() in action, copy and paste the following
example into LINQPad. (You’ll need to select C#
Statement(s) from the language drop down).
var items = new object[]{"a string", 22, Math.PI};
items.OfType<string>().Dump("OfType<string>");
items.OfType<int>().Dump("OfType<int>");
items.OfType<double>().Dump("OfType<double>");
LINQPad has its own extension method Dump() which is used to output results
to the LINQPad window. You’ll see that each individual dump returns a strongly
typed IEnumerable<T> object. In this example items actually implemented
IEnumerable<object>. Fortunately these methods don’t discriminate and
happily work their magic on any IEnumerable<T> as well.
Still more to come
There is still plenty more that I will post about LINQ. In my next post
I’ll look at deferred execution, what it means and how you can take advantage
of it.
This is an interesting question that I’ve been pondering over recently. My
initial opinion was that if you have some sort of complex logic you should
create unit tests to verify the code is sound. I still feel that complex logic
in any code warrants unit testing, but I am beginning to wonder whether the
need for unit tests may indicate a larger problem: your automation may be too
complex.
But the complexity gets even more complex than that. Large suites of
integrated tests are quite capable of getting complicated without your help.
More abstraction and code reuse certainly make things easier, but as you add
more code paths you increase the likelihood of creating new bugs. Perhaps
subtle bugs like using > when you should use >=. Maybe a regular expression
isn’t quite right.
Some of these bugs may result in a “false fail” when the test is executed. At
this point you need to begin failure analysis, tracing the issue, which could be
either in your tests or in the application under test. Constant failures due
to bugs in the test suite erode confidence in the tests, and I have seen
failure analysis put off because it is believed to be an issue with the tests.
If you can’t trust your tests, who can you trust?
If you can’t be confident that a failure in a test represents a failure in the
product under test your tests lose value. But what if your test passes when it
should fail? In this case your test has no value as it has failed its primary
purpose, to accurately confirm the application conforms to the test.
A passing test stays off the radar. Its functionality is assumed to work so
may be overlooked during manual testing. Eventually the problem may be found,
but possibly further down the line than is desirable such as during UAT or
worse yet, in production.
By creating unit tests around our more complicated code we can improve our
confidence in our own tests. It also sets a good example for the developers
working on the application. We can’t expect them to write unit tests for their
code if we don’t do the same.
We don’t need to have a unit test around every test (where would it stop?) but
we certainly should be looking to at least verify our more complex bits of
code.
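As a rough sketch of what this could look like, here is a boundary check around a hypothetical test-suite helper. RespondedInTime and Expect are invented for the example; a real suite would use its unit test framework instead of the hand-rolled Expect.

```csharp
using System;

// Hypothetical helper from a test suite: did the page respond within the limit?
// The boundary (elapsed == limit) is meant to count as a pass.
bool RespondedInTime(int elapsedMs, int limitMs) => elapsedMs <= limitMs;

// A tiny hand-rolled check that throws when an expectation is not met.
void Expect(bool condition, string description)
{
    if (!condition) throw new Exception("Unit test failed: " + description);
}

// Testing either side of the boundary catches a "<" typed where "<=" was meant.
Expect(RespondedInTime(499, 500), "just under the limit passes");
Expect(RespondedInTime(500, 500), "exactly on the limit passes");
Expect(!RespondedInTime(501, 500), "just over the limit fails");
Console.WriteLine("All helper checks passed");
```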
I really like LINQ. It’s one of my favourite .NET features. When I first heard
about it I was doing most of my programming in Visual Basic 6 (or worse,
Visual Basic for Applications). Working now with C# and the .NET Framework has
blessed me with full O-O, strong types, an excellent base class library,
Visual Studio 2008 (and IntelliSense), Generics (I love generics) and LINQ.
So what is LINQ and why is it so important to add to your arsenal of .NET
skills?
LINQ is so many things
At its core LINQ is exactly what its acronym suggests: Language Integrated
Query. But what does this actually mean? Is it just some marketing hype
designed to confuse the masses and look good on your resume? Probably. But the
value of expressing a query concisely in the language of your choice becomes
more apparent with each LINQ query you write. (Yes, I know LINQ Query would
stand for Language Integrated Query query. Just go with it, it reads better.)
Importantly a LINQ query separates defining what you are looking for from how
to find it. This means that a LINQ query could potentially be executed across
multiple CPU cores and in the case of LINQ to SQL can be turned into an
efficient SQL query so the hard work can be done by your database server.
But when are you going to actually use LINQ? Chances are good that you already
have some code that could benefit from a bit of LINQ.
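As a sketch of the kind of code in question, here is a foreach loop that collects the items under one dollar, followed by the same thing as a LINQ query. The items list and its name/price tuples are hypothetical sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var items = new List<(string Name, decimal Price)>
{
    ("Pencil", 0.50m),
    ("Notebook", 3.25m),
    ("Eraser", 0.75m),
};

// The old way: spell out how to find the items.
var cheapItemsLoop = new List<(string Name, decimal Price)>();
foreach (var item in items)
{
    if (item.Price < 1)
    {
        cheapItemsLoop.Add(item);
    }
}

// The LINQ way: say what you want.
var cheapItems = (from item in items
                  where item.Price < 1
                  select item).ToList();

Console.WriteLine(cheapItems.SequenceEqual(cheapItemsLoop)); // True
```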
We can ignore the variable declaration for now (and the call to ToList()) so
let’s break it down to just the core LINQ query.
from item in items
where item.Price < 1
select item
The LINQ query describes exactly what you want and nothing more. When we used
the foreach construct we were resigned to the fact that we had to look at
each and every item. We are also doing all this in a single thread. In fact,
we spend more time describing how we want to find the items than saying
what it is that we want. By describing what we want using LINQ we don’t
bother with the implementation details resulting in cleaner code and
improved flexibility for how the query should be implemented. In the case
of a database query, the ideal implementation would be to generate a SQL
query, execute the SQL query against the database and return the results. LINQ
to SQL does just that with essentially the same code (I’ll be discussing LINQ
to SQL in depth in another post).
From Where Select vs. Select From Where
If you are already familiar with SQL you may be a little confused by the
syntax of a LINQ query. Indeed this is a major stumbling block most people
encounter when they start to use LINQ. In SQL we have the ‘select’ statement
upfront but in LINQ we save it for the end. Why? The primary reason for this
choice was to enable great IntelliSense support in Visual Studio.
I’d like to argue that the syntax in LINQ actually makes more sense. Rather
than starting with what we want to end up with, we start with the subject of
our query. The reason this seems so foreign is that SQL has made us so used to
the opposite order. When you write code you typically say where you want to look before
you say what you want to do when you’ve found it. In natural language it is
like saying “From the store find a computer with 2GiB RAM and get me the price
of the computer”. In SQL speak that would be “Get the price of the computer in
the store where that computer has 2GiB RAM”. You tell me which form you’d be
more likely to use.
The magic of type inference
Another convenient way to remember that from comes first is to think of the
old foreach implementation. You’ll find that they have a lot in common. The
biggest difference is that in the foreach loop we have to explicitly specify
a type. In the example above I’ve used var to let the compiler infer the
type. In the LINQ query the type is inferred automatically unless you specify
it explicitly.
Type inference is used throughout most LINQ usage to simplify code and to
improve maintainability. Queries return an object that implements the
IEnumerable<T> interface. More advanced queries can return objects that
implement a more complex interface (which is also an IEnumerable<T>). By
using var to let the compiler infer the type of object returned by the query
it saves the programmer from having to explicitly work out what type of object
is returned. The full significance of this will become apparent in future
posts.
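A small sketch of what the compiler infers, using a made-up list of names. IOrderedEnumerable<T> is one example of a more specific interface that a query can return.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "pear", "fig", "apple" };

// Inferred as IEnumerable<string>.
var shortNames = names.Where(n => n.Length < 5);

// Inferred as IOrderedEnumerable<string>, which is also an IEnumerable<string>
// and additionally allows ThenBy().
var ordered = names.OrderBy(n => n.Length).ThenBy(n => n);

Console.WriteLine(string.Join(", ", ordered)); // fig, pear, apple
```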
How to get started
The best way to get started working with LINQ is to read up about it on
MSDN. Then download the great tool
LINQPad. LINQPad has some great sample LINQ queries
and lets you play with LINQ outside of Visual Studio. It’s great for writing
short snippets of code and is an ideal sandbox to try out bits of code.
LINQPad is free, but Auto Completion is a paid feature (but well worth it). It
also lets you run LINQ to SQL queries on a SQL database (and now SQL Compact
Edition).
Once you have started familiarising yourself with LINQ you should start using
it in your projects. There are two key requirements to using LINQ:
Your project must target version 3.5 or later of the .NET Framework.
You must include using System.Linq; to reference the LINQ namespace in all code files where you want to use LINQ.
There’s plenty of stuff to talk about with LINQ. In my next few posts I’ll
cover Extension methods, Lambda expressions, LINQ to Objects and much more.