Code Highlighting

Monday, October 22, 2012

SSMS: OutOfMemoryException executing a large SQL batch

So you're executing some beast of an Sql batch using the Management Studio, for instance the result of "Generate Scripts" on another database. Then all of a sudden you get a OutOfMemoryException. Even when just keeping the entire script in memory, poor SSMS is hanging on by its fingernails. How could you expect it to also execute that file, and give you the results?

Thankfully there is a command-line Sql server client Microsoft thoughtfully provides together with Sql server. If you're not able to import a large sql file using SSMS, navigate to the Binn folder of your Sql server folder (C:\Program Files\Microsoft SQL Server\100\Tools\Binn here) in Command Prompt, and type the following:

osql -S databaseserver -U username -P password -d DemoDatabase -i c:\demo.sql

That executes the file C:\demo.sql on databaseserver in the context of DemoDatabase using the login data provided. This will scroll a bunch of query results in your command prompt window. If you'd rather examine these results in detail later, the -o parameter writes this info to an output file:

osql -S databaseserver -U username -P password -d DemoDatabase -i c:\demo.sql -o c:\output.txt

That should work, even when SSMS chokes on the sheer size of your query file.

Monday, October 15, 2012

text-transform: uppercase subtleties

So last week I received an interesting question from a customer. He complained that in Chrome on Mac, the category 'Soßen und Dips' ("Sauces and dips") on the german (obviously) version was rendered as 'SOSSEN UND DIPS' (wrong), rather than 'SOßEN UND DIPS' (right). Notice that ß and SS are semantically identical - a latin1 collated database will consider those strings to be identical.
On Internet Explorer, the ß was displayed fine.
It didn't take long to figure out that the menu item had text-transform: uppercase applied, and that this caused the transform of ß to SS. What's more, Safari and Firefox also displayed SS instead of ß. My Google search led me to page https://bugzilla.mozilla.org/show_bug.cgi?id=354451 . This page indicated that transforming ß to SS was deliberate, and not a bug at all. Der Spiegel suggests that ß should, in capitals, always become SS.
So now I only need to convince the customer that their browser can spell their language better than they can.

Menno




As an interesting aside: a basic test case to check browser behavior sees Internet Explorer 9 also rendering SS, not ß. Changing the document mode down to IE7 shows ß again. The actual online page is in IE9 mode, and does not have a X-UA-Compatibility meta tag. The question then becomes, why does it show ß? No idea yet.

Thursday, October 4, 2012

In which I learn about multithreading performance

... or an exercise in optimizing IndexOf on List<T> for multicore.

So I had a great plan.
.NET 4 introduced AsParallel() into LINQ (well, 'PLINQ' , but try using that word in conversation without giggling uncontrollably). So how about re-implementing some of these methods in a multithreaded way for .NET 3.5?
To get my feet wet, I decided to start off simple: I re-implement IndexOf on List<T>. Should be easy:

  • Perfect for splitting up, each thread just takes a range of the list
  • No need for critical sections
So I wrote an extension method IndexOfParallel<T>:
  • Creates as many threads as there are logical cores,
  • pass them a list range, and a status object
  • each thread checks its range of the list, sets the status object to found and returns
  • main thread calls .Join() on each created thread
Done! Let's check how much better it performs compared to the regular IndexOf!

Looking for a thousand random ints in a list of a million ints:
  • Regular version: 8565064 ticks
  • Multithreaded version: 542255931 ticks
Whoops! Only sixty times slower!

Perhaps a million ints is too small to have the proper effect. Let's try ten million:
  • Regular version: 9131999 ticks
  • Multithreaded version: 149267210 ticks
Well, that's only sixteen times slower now. Progress!
I check the source for IndexOf on List<T>, which uses Array.IndexOf on its internal array. Turns out it uses a native method for basic types. Clearly I can't improve on that. Perhaps I should compare against a List<string> (100,000 items):
  • Regular version: 7044903 ticks
  • Multithreaded version: 93284309 ticks
Thirteen times slower. This is starting to piss me off. Why is it slow? What is slow?
Perhaps I shouldn't be trying to reinvent the wheel. What happens if I just use the overload of IndexOf on List<T> that takes a range instead of writing my own loop? Clearly I'll be losing my early exit, but with enough cores maybe it evens out:
  • Regular version: 7563221 ticks
  • Multithreaded version: 140339244 ticks
Nope! Twenty times slower! If I fire this version off against a List<int> of a million, I gave up waiting for it to finish at all. Extremely slow. I suspect the native method lock the List's internal array in memory, and that may be marked as a critical section.
Whatever the reason, let's scrap this, and go back to my own loop. What if we unroll the loop, say, four times? If that speeds it up significantly, we can deduce the loop implementation is slowing everything down:
  • Regular version: 7122743 ticks
  • Multithreaded version: 89569552 ticks
A small gain: twelve times slower. Clearly the loop is fine. So the thread creating is probably to blame. Let's not create our own threads, and use the ThreadPool:
  • Regular version: 7034966ticks
  • Multithreaded version: 5879994 ticks
Whaaa! Success! It's not much, but I finally beat the built-in version. Let's see if we can improve it a bit more. How about turn the status class into a struct, and avoid the getter for the properties by turning them into public fields? Obviously we need to ref the method parameter:
  • Regular version: 6979197 ticks
  • Multithreaded version: 4006500 ticks
Awesome! If we increase the number of strings in the list to a million, the effect increases too:
  • Regular version: 76802790 ticks
  • Multithreaded version: 36655462 ticks
Twice as fast! I'm sure this can still be improved significantly. I still need to figure out at what list size it makes sense to switch to multithreaded. If you're working on a List<T> where T's implementation of Equals is slow, it should do better. There are a number of lessons learned already though:
  • This method's only worth it in a few situations. Mostly just not.
  • Only create new Threads if you will hold onto them for a long time. Creating new threads takes long. 
  • Prefer using the ThreadPool.
  • Measure Measure Measure!
It'll be interesting to see how we manage reimplementing .Where(). The Predicate delegate could be pretty expensive. IEnumerable<T> is forward only. Will exporting to List<T> and splitting up be faster? Excitement!

So there we are.We beat the built-in IndexOf.

Menno


Here's, for now, the final result:

using System;
using System.Collections.Generic;
using System.Threading;

namespace Tabeoka.Extensions
{
    public static class ExtensionMethods
    {
        public static int IndexOfParallel<T>(this List<T> source, T item)
        {
            int threadCount = GetOptimalThreadCount(source.Count);

            if (threadCount == 1)
                return source.IndexOf(item);

            SearchStatus status = new SearchStatus()
            {
                Found = false,
                FoundIndex = -1
            };

            // Looks like the ThreadPool always hangs onto at least 
            // # of cores threads, if left unset otherwise
            using (ManualResetEvent resetEvent = new ManualResetEvent(false))
            {
                int threadsFinished = 0;
                for (int i = 0; i < threadCount; i++)
                {
                    int fromIndex = (source.Count * i) / threadCount;
                    int toIndex = (source.Count * (i + 1)) / threadCount;

                    ThreadPool.QueueUserWorkItem(new WaitCallback(delegate(object t)
                    {
                        SearchListRange(source, item, fromIndex, toIndex, ref status);
                        if (Interlocked.Increment(ref threadsFinished) == threadCount)
                            resetEvent.Set();

                    }));
                }

                resetEvent.WaitOne();
            }

            return status.FoundIndex;
        }

        private static int GetOptimalThreadCount(int listCount)
        {
            // needs more sophisticated logic
            return Math.Min(listCount, Environment.ProcessorCount);
        }

        private static void SearchListRange<T>(List<T> source, T item, int fromIndex, int toIndex, ref SearchStatus status)
        {
            int i;

            for (i = fromIndex; i < toIndex; i += 4)
            {
                if (source[i].Equals(item))
                {
                    status.FoundIndex = i;
                    status.Found = true;
                    return;
                }

                if (source[i + 1].Equals(item))
                {
                    status.FoundIndex = i + 1;
                    status.Found = true;
                    return;
                }

                if (source[i + 2].Equals(item))
                {
                    status.FoundIndex = i + 2;
                    status.Found = true;
                    return;
                }

                if (source[i + 3].Equals(item))
                {
                    status.FoundIndex = i + 3;
                    status.Found = true;
                    return;
                }

                // if some other thread found it; quit searching!
                if (status.Found)
                    return;
            }

            // finishing up loop
            for (i = i - 3; i < toIndex; i++)
            {
                if (source[i].Equals(item))
                {
                    status.FoundIndex = i;
                    status.Found = true;
                    return;
                }
            }
        }
    }

    internal struct SearchStatus
    {
        public bool Found;
        public int FoundIndex;
    }
}