I recently read tiredblogger's "IQueryable Methods on ActiveReports ControlCollections" post. First of all, I can see how adding a "find control by name" method to our control collection would certainly be useful, and we should look at adding it in an efficient way that won't impose overhead when it isn't needed. In the meantime, I do have some suggestions that might help in this scenario.
Most importantly, you're not stuck catching exceptions. Letting exceptions be thrown is extremely slow, and in the first example tiredblogger noted, this is probably the cause of "destroying performance". If there are hundreds of "indicators" in his example, anywhere from several to several hundred exceptions may be thrown. However, I want to get right into "Linq" and using it as a solution in this case...
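tiredblogger's exception-based code isn't reproduced here, so the following is only a hypothetical contrast between a lookup that relies on catching KeyNotFoundException from the Dictionary indexer and one that uses TryGetValue. The class and method names are made up for illustration, and string stands in for the control type.

```csharp
using System.Collections.Generic;

// Hypothetical helpers contrasting exception-driven lookup with TryGetValue.
public static class LookupStyles
{
    // Exception-driven: the indexer throws KeyNotFoundException for a missing key,
    // so every miss pays for creating, throwing and catching an exception.
    public static string FindWithCatch(Dictionary<string, string> controls, string name)
    {
        try
        {
            return controls[name];
        }
        catch (KeyNotFoundException)
        {
            return null;
        }
    }

    // Exception-free: TryGetValue reports a miss through its bool return value.
    public static string FindWithTryGetValue(Dictionary<string, string> controls, string name)
    {
        string value;
        return controls.TryGetValue(name, out value) ? value : null;
    }
}
```

Both return the same results; the difference is that the first one turns every "not found" case into an exception, which is exactly the expensive path when hundreds of lookups miss.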
In cases such as this, I don't think Linq is appropriate. First, it helps to define exactly what "Linq" is. There are lots of idioms associated with Linq that I don't think are Linq at all, namely Extension Methods and Lambda Expressions. Linq relies heavily on both, but they are not really Linq. Don't get me wrong, they are extremely useful (arguably more useful than Linq itself), just not really Linq. At its essence, Linq is the language-integrated query facility plus IQueryable/IQueryable<T>, whose queries are executed through IQueryProvider and its various implementations.
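To make that distinction concrete, here is a small sketch (LinqAnatomy and its methods are hypothetical names). Query syntax is compiled into calls to the Where/Select extension methods with lambda arguments; when the source is an IQueryable<T>, those lambdas are captured as expression trees, and the query is carried by an Expression plus the IQueryProvider that will execute it.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LinqAnatomy
{
    // Query syntax: the compiler rewrites this into
    // ids.Where(id => id % 2 == 0).Select(id => "i" + id),
    // i.e. plain calls to Enumerable extension methods taking delegates.
    public static List<string> EvenHeaders(IEnumerable<int> ids)
    {
        var query = from id in ids
                    where id % 2 == 0
                    select "i" + id;
        return query.ToList();
    }

    // The same query over IQueryable<int>: the lambdas become expression trees,
    // and the result is an Expression plus the IQueryProvider that executes it.
    public static IQueryable<string> EvenHeadersQueryable(IEnumerable<int> ids)
    {
        return from id in ids.AsQueryable()
               where id % 2 == 0
               select "i" + id;
    }
}
```

Both produce "i0", "i2" for the input 0..3; only the second goes through IQueryProvider.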
It is worth understanding what these extension methods are really doing in order to take full advantage of them. Their concise syntax can mislead us into thinking that their "query-like" nature somehow makes them performant in scenarios where they are not. In this example, the whole interaction with Linq comes down to IQueryable<Label> (where Label happens to be an ActiveReports Label) and the use of SingleOrDefault.
My first thought was that SingleOrDefault keeps no state between calls during that loop, so every call has to search through the controls list; and because SingleOrDefault must also confirm that there is no second match, a successful call still scans the rest of the list. Another option is to fill a Dictionary<string, ARControl> with the controls, keyed by name, before entering the loop. With hundreds of indicators, the Dictionary lookup should be much faster than repeatedly searching the control collection with SingleOrDefault. Essentially, this comes down to replacing the code that wraps the List<T> in an IQueryable<T> with code that converts the List<T> into a Dictionary<string, ARControl>.
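A minimal sketch of building that lookup once, assuming each control has a unique name; ControlLookup and ByName are hypothetical names, and Enumerable.ToDictionary is just one way to do it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ControlLookup
{
    // One pass over the list builds the index; each later lookup is O(1) on average,
    // instead of the full scan SingleOrDefault performs on every call.
    public static Dictionary<string, T> ByName<T>(IEnumerable<T> controls, Func<T, string> nameOf)
    {
        // ToDictionary throws ArgumentException if two controls share a name.
        return controls.ToDictionary(nameOf);
    }
}
```

Usage would be something like `var lookup = ControlLookup.ByName(_controlsSimpleList, c => c.Name);`. The design choice worth noting: ToDictionary fails loudly on duplicate names, whereas assigning through the indexer in a loop silently keeps the last control with that name. Which is preferable depends on whether a duplicate name is a bug in the report.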
To satisfy my own curiosity, I did some quick testing, and it turns out the difference between IQueryable<T> and Dictionary<TKey, TValue> matters more than I first realized. My tests remove ActiveReports from the scenario, since my focus here is on Linq and performance.
The first test replicates the conditions from tiredblogger's post. First, I created some really simple sample data:
private void CreateSampleData()
{
    for (var i = 0; i < indicatorCount; i++)
    {
        _indicators.Add(new Indicator(i));
    }
    // Only every other indicator gets a matching control, so half the lookups miss.
    for (var i = 0; i < indicatorCount; i = i + 2)
    {
        _controlsSimpleList.Add(new ARControl(i));
    }
    _controlsQueryable = _controlsSimpleList.AsQueryable();
}
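The post doesn't show the Indicator and ARControl types. The following are hypothetical stand-ins inferred only from how the samples use them: an Indicator has an Id plus English and Spanish text, and a control's Name is assumed to be "i" followed by the id, since that is the key the tests look up. The text values are placeholders.

```csharp
// Hypothetical stand-in for the sample Indicator type; the real definition is not shown.
public class Indicator
{
    public Indicator(int id)
    {
        Id = id;
        EnglishText = "Indicator " + id;   // placeholder text
        SpanishText = "Indicador " + id;   // placeholder text
    }

    public int Id { get; private set; }
    public string EnglishText { get; private set; }
    public string SpanishText { get; private set; }
}

// Hypothetical stand-in for a report control. Assumed: Name is "i" + id,
// matching the string.Format("i{0}", indicator.Id) key used by the tests.
public class ARControl
{
    public ARControl(int id)
    {
        Name = string.Format("i{0}", id);
    }

    public string Name { get; private set; }
    public string Text { get; set; }
}
```

The tests also rely on fields that aren't shown (_indicators, _controlsSimpleList, _controlsQueryable, indicatorCount, IsSpanish); presumably lists, an IQueryable and a count of 400, but that is an assumption.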
Then comes the test simulating what tiredblogger described, which uses the IQueryable<Label> as the data source for SingleOrDefault:
private void WithLinqQueryable()
{
    var foundCount = 0;
    foreach (var indicator in _indicators)
    {
        var indicatorHeader = string.Format("i{0}", indicator.Id);
        // _controlsQueryable is an IQueryable<T>, so this binds to System.Linq.Queryable.SingleOrDefault.
        var indicatorControl = _controlsQueryable.SingleOrDefault(x => x.Name == indicatorHeader);
        if (indicatorControl != null)
        {
            foundCount++;
            indicatorControl.Text = IsSpanish
                ? indicator.SpanishText
                : indicator.EnglishText;
        }
    }
    // Every other indicator has a control, so exactly half should be found.
    if (foundCount != indicatorCount / 2)
        throw new InvalidOperationException("invalid foundCount");
}
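The post mentions running each test in a 10-iteration loop but doesn't show the timing code. A minimal Stopwatch-based harness might look like this; TestTimer and TimeIterations are hypothetical names, and the warm-up call is my own addition, so numbers from this sketch are not directly comparable to the ones quoted here.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness; not the code used for the measurements in this post.
public static class TestTimer
{
    // Runs the test the given number of times and returns the total elapsed milliseconds.
    public static long TimeIterations(Action test, int iterations)
    {
        if (test == null) throw new ArgumentNullException("test");

        test(); // warm-up run so JIT compilation is not counted in the measurement

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            test();
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }
}
```

From inside the test class, a call like `TestTimer.TimeIterations(WithLinqQueryable, 10)` would time ten runs of the first test.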
I ran a test that uses 400 "indicators", with half of the SingleOrDefault calls not finding a corresponding Label (i.e. returning null). Running the above test in a 10-iteration loop takes 17046 milliseconds.
Next, I realized that IQueryable<T> is not necessary, since SingleOrDefault is available as an extension method for both IQueryable<T> (via System.Linq.Queryable.SingleOrDefault) and IEnumerable<T> (via System.Linq.Enumerable.SingleOrDefault). The prior example uses the System.Linq.Queryable.SingleOrDefault implementation. The next example uses the implementation from System.Linq.Enumerable, since its source is _controlsSimpleList, which is merely a List<T> (i.e. an IEnumerable<T>):
private void WithLinqExtensionMethods()
{
    var foundCount = 0;
    foreach (var indicator in _indicators)
    {
        var indicatorHeader = string.Format("i{0}", indicator.Id);
        // _controlsSimpleList is a List<T>, so this binds to System.Linq.Enumerable.SingleOrDefault.
        var indicatorControl = _controlsSimpleList.SingleOrDefault(x => x.Name == indicatorHeader);
        if (indicatorControl != null)
        {
            foundCount++;
            indicatorControl.Text = IsSpanish
                ? indicator.SpanishText
                : indicator.EnglishText;
        }
    }
    if (foundCount != indicatorCount / 2)
        throw new InvalidOperationException("invalid foundCount");
}
Under the same conditions this one runs in a shocking 46 milliseconds! MUCH faster. Next I thought I'd compare the result to not using any Linq-related technologies at all, just a boring old Dictionary:
private void NoLinqTest()
{
    // Build the name -> control index once, before the loop.
    var controlLookup = new Dictionary<string, Label>();
    _controlsSimpleList.ForEach(x => controlLookup[x.Name] = x);
    var foundCount = 0;
    foreach (var indicator in _indicators)
    {
        var indicatorHeader = string.Format("i{0}", indicator.Id);
        Label indicatorControl;
        // TryGetValue reports a miss through its return value: no exception, no linear scan.
        if (controlLookup.TryGetValue(indicatorHeader, out indicatorControl))
        {
            foundCount++;
            indicatorControl.Text = IsSpanish
                ? indicator.SpanishText
                : indicator.EnglishText;
        }
    }
    if (foundCount != indicatorCount / 2)
        throw new InvalidOperationException("invalid foundCount");
}
This one runs under the same conditions in only 15 milliseconds.
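Why is the Queryable version so much slower than the Enumerable one? A plausible explanation, offered with some hedging: Queryable.SingleOrDefault takes an Expression<Func<T, bool>>, so every call builds a new expression tree, and as I understand it the EnumerableQuery provider behind AsQueryable() then rewrites and compiles that tree into a delegate each time it executes. Enumerable.SingleOrDefault takes a Func<T, bool>, which is already compiled code. The sketch below (PredicateForms and its methods are hypothetical names, with strings standing in for controls) shows the two predicate forms and one way to stay on the Enumerable path when all you have is an IQueryable<T>.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class PredicateForms
{
    // What Queryable.SingleOrDefault receives: a data structure describing the lambda,
    // which a provider must interpret or compile before it can run.
    public static Expression<Func<string, bool>> NameIsExpression(string header)
    {
        return name => name == header;
    }

    // What Enumerable.SingleOrDefault receives: an ordinary compiled delegate.
    public static Func<string, bool> NameIsDelegate(string header)
    {
        return name => name == header;
    }

    // AsEnumerable() changes only the static type, so SingleOrDefault binds to
    // System.Linq.Enumerable even though the source is an IQueryable<string>.
    public static string FindViaEnumerable(IQueryable<string> names, string header)
    {
        return names.AsEnumerable().SingleOrDefault(name => name == header);
    }
}
```

Of course, even the Enumerable path still scans the list on every call; it just avoids the expression-tree overhead, which is why the Dictionary remains the fastest option.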
As the saying goes, "When the only tool you have is a hammer, everything looks like a nail." Linq is another nice tool, but it's not the only tool we have :)
For those of you that want to play around with it, you can download the code here.
5 comments:
Scott-
These are excellent examples and I appreciate your response to my post.
While I disagree with your brief synopsis of the facets of the LINQ technology base, I do agree that some of the most popular features have revolved around things that make our lives easier--such as extension methods. Unfortunately, these technologies are just clumped into the general "LINQ" umbrella since they relate to iterating through collections of objects with a specific query in mind. :)
That aside, your evidence is quite interesting. The fact that a generic Dictionary object is faster than an IQueryable isn't really surprising. What is surprising is the size of the difference. The millisecond timings for your IQueryable tests seem quite a bit longer than they were in my tests.
I'll run tests tomorrow at the office and post back.
On a side note, wouldn't it be quicker to provide these implementations (or at least something that can be iterated through) on the ControlsCollection? :)
Of course we should add the functionality to ControlsCollection; it was also the first point in my post :) However, I don't think it will be any faster than using the Dictionary. In fact, I expect that internally we would implement a "control name indexer" on the collection using a Dictionary exactly as I've shown in the post.
On the other hand, I was thinking maybe it would be best to just data-bind your labels to the indicator objects' text? If the indicators "know" if they're showing Spanish then they could automatically change one property (e.g. "LocalizedText") that would have the correct value. This would avoid all the looping in the first place. In fact, it would avoid the code altogether!
BTW: Out of curiosity, what is your brief synopsis of Linq? :)
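A minimal sketch of the "LocalizedText" idea: an indicator that picks its own text, so a label bound to that one property needs no name-based lookup at all. LocalizedIndicator is a hypothetical name, and how the Label would actually be bound to it in ActiveReports is not shown here.

```csharp
// Hypothetical indicator exposing a single LocalizedText property for data binding.
public class LocalizedIndicator
{
    public LocalizedIndicator(string englishText, string spanishText)
    {
        EnglishText = englishText;
        SpanishText = spanishText;
    }

    public string EnglishText { get; private set; }
    public string SpanishText { get; private set; }

    // Whether this indicator should render in Spanish; how it gets set is left open.
    public bool IsSpanish { get; set; }

    // The one property a Label would bind to; it chooses the right text itself,
    // replacing the lookup-and-assign loop in the tests above.
    public string LocalizedText
    {
        get { return IsSpanish ? SpanishText : EnglishText; }
    }
}
```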
>>If the indicators "know" if they're showing Spanish then they could automatically change one property (e.g. "LocalizedText") that would have the correct value.
That's a great idea.
>>BTW: Out of curiosity, what is your brief synopsis of Linq? :)
Maybe I'm looking at it at too simplistic a level, but to me, LINQ "is" the extension methods and the lambda expressions--anything that allows us to take a set of data (whether a DataSet, object[], or whatever) and query or parse that information down. To me, it has absolutely nothing to do with any sort of provider model--in fact, a database isn't required--and everything to do with the abilities that IQueryable and IQueryProvider provide.
Are these 100% tuned for performance? Hah! No, probably not, but they do provide a simplistic way to parse out data without a lot of overhead and provide an example--from there, tweaking can occur based on preference and environment.
So, on to the tests. Here's what I came up with. In total, there are 480 Label controls on the first page of the report, so that will serve as our test group.
There are also hundreds of Line and Shape controls on the report (much to my dismay), so I still limited the Dictionary (and IQueryable) to just Labels using the OfType extension method:
detail.Controls.OfType<Label>().ForEach(x => _controls[x.Name] = x);
At this point, _controls is a Dictionary<string, Label>.
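One detail about the line above: the framework defines ForEach on List<T>, not on IEnumerable<T>, so calling it directly on the result of OfType<Label>() assumes a custom extension method is in scope (or a ToList() call in between). A hypothetical helper of that kind might look like this; EnumerableExtensions is an illustrative name, not a framework type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: makes ForEach available on any IEnumerable<T>,
// e.g. on the result of OfType<Label>().
public static class EnumerableExtensions
{
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (action == null) throw new ArgumentNullException("action");

        foreach (var item in source)
        {
            action(item);
        }
    }
}
```

A plain foreach loop over detail.Controls.OfType<Label>() would do the same job without any helper.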
Comparing this to populating the controls into an IQueryable was moot; both came back basically instant.
After replacing SingleOrDefault with TryGetValue and removing the null checks (TryGetValue takes care of that), the results are:
IQueryable (average of 500 full report runs): 4.92 sec
Dictionary<string, Label> (average of 500 full report runs): 4.74 sec
At this point, I'm not sure how much of that is attributable to database latency (though I'm probably the only one hitting the database and am connected to it via a quiet 1 Gbit network--it's still early).
Also, I want to point out that these are all run under the VS2008 local web server--performance may vary a bit if pushed out to a proper IIS box.
I do appreciate the feedback and am going to spend some time dinking with this today--it's nice to see other, better ways to do things.
As with most MSFT technologies, "your mileage may vary" should be printed in big, bold letters. Perhaps the title and overall feel of my post should have focused on placing the controls into SOME sort of enumerable container and working through that when needed, and not so much on "LINQ" and the BFH. :)
Thanks for providing more information from your side, it's an interesting conversation.