Code Highlighting

Thursday, July 3, 2014

What if events in C# were, you know, cooler?

In the beginning events in C# were awesome. Some of us came from Java, and used to have to implement interfaces named IButtonClickListener on some inner classes whatever nonsense just to react to some button being clicked, but not in C#. No, in C# you just did the following:

    public partial class Form1 : Form
    {
        Button newButton;

        public Form1()
        {
            InitializeComponent();

            newButton = new Button()
            {
                Text = "Click me!",
                Left = 20,
                Top = 20
            };

            newButton.Click += new EventHandler(HandleClick);

            this.Controls.Add(newButton);
        }

        private void HandleClick(object sender, EventArgs e)
        {
            MessageBox.Show("Hi there!");

            newButton.Click -= HandleClick;
        }
    }

No obnoxious interface nonsense! Delegates! The future is now!

But over 10 years later we have gotten spoiled with anonymous delegates and lambdas, and all of a sudden the C# event model does not seem so cool anymore. Because when you hook up an event handler like this:

newButton.Click += (sender, e) => MessageBox.Show("Hi there!");

do you imagine you can remove it like this:

// fugetaboutit!
newButton.Click -= (sender, e) => MessageBox.Show("Hi there!");

No way! You would have to save the handler to a variable to have a reference handy to be able to remove
the handler later:
EventHandler handler = (sender, e) => MessageBox.Show("Hi there!");

Worse! Microsoft added lots of nice generic delegate types Action<T>, Action<T1, T2>, Action<T1, T2, T3> etc that we can't use because we're stuck with the EventHandler delegate types for Events. You would need a wrapper around the Invoke function to make that work.

Would you like to check out what is currently hooked up to your event? Tough!

Clear the whole thing and start over? Get outta here!

But despair no more! I have written a half-assed solution that I think might have been marginally better if it had worked properly!

Presenting: the new and improved Event:


    public class Event<TEventArgs,TEventHandler>
        where TEventArgs : EventArgs
        where TEventHandler : class
    {
        // the main store for handlers
        private Dictionary<int, Action<object, TEventArgs>> handlers =
            new Dictionary<int, Action<object, TEventArgs>>();

        // a dictionary that maps the original handlers to the generated wrappers
        private Dictionary<object, int> operatorAddedHandlers =
            new Dictionary<object, int>();

        public Event()
        {
        }

        static Event()
        {
            // check if my TEventHandler type parameter is a delegate type
            // where TEventHandler : Delegate is not supported
            if (!typeof(TEventHandler).IsSubclassOf(typeof(Delegate)))
            {
                throw new InvalidOperationException(typeof(TEventHandler).Name + " in generic type parameter TEventHandler is not a delegate type.");
            }
        }

        /// <summary>
        /// Maps events of the same name on the container object to the Event field that hides it
        /// </summary>
        /// <param name="eventGenerator">the object that contains the Event field (typically this)</param>
        public void CaptureBaseEvents(object eventGenerator)
        {
            var fieldNames = new List<String>();
            var classType = eventGenerator.GetType();
            var fields = classType.GetFields();
            
            // check which field the current instance has been assigned to, store the name
            foreach (var field in fields)
            {
                var val = field.GetValue(eventGenerator);
                if (Object.Equals(val, this))
                {
                    fieldNames.Add(field.Name);
                }
            }

            // get a MethodInfo for this class's Invoke method
            var invokeHandler =
                typeof(Event<TEventArgs, TEventHandler>).GetMethod("Invoke");

            // check if any base events exist that have the same name
            // if so, hook up our Invoke method
            foreach (var fieldName in fieldNames)
            {
                var classEvent = classType.GetEvent(fieldName);

                if (classEvent != null)
                {
                    var tDelegate = classEvent.EventHandlerType;
                    var d = Delegate.CreateDelegate(tDelegate, this, invokeHandler);

                    var addHandler = classEvent.GetAddMethod();
                    Object[] addHandlerArgs = { d };
                    addHandler.Invoke(eventGenerator, addHandlerArgs);
                }
            }
        }

        /// <summary>
        /// Invokes the event handlers
        /// </summary>
        /// <param name="sender">The object that triggered the event</param>
        /// <param name="e">The event parameters</param>
        public void Invoke(object sender, TEventArgs e)
        {
            // just in case a handler modifies the handlers
            var handlerDelegates = handlers.Select(hs => hs.Value).ToList();

            foreach (var handler in handlerDelegates)
            {
                handler.Invoke(sender, e);
            }
        }

        /// <summary>
        /// Removes all handlers
        /// </summary>
        public void ClearHandlers()
        {
            handlers.Clear();
        }

        /// <summary>
        /// Gets a list of all handlers and their removal token
        /// </summary>
        /// <returns></returns>
        public IEnumerable<EventHandlerSet<TEventArgs>> GetAllHandlers()
        {
            // store in list in case someone loops through to find a handler to remove
            // that seems kind of likely
            var handlerSets = handlers.Select(hs => new EventHandlerSet<TEventArgs>()
            {
                Handler = hs.Value,
                RemovalToken = hs.Key
            }).ToList();

            return handlerSets;
        }

        /// <summary>
        /// Adds an event handler
        /// </summary>
        /// <param name="newHandler">The new handler to add</param>
        /// <returns>The removal token for the newly added handler</returns>
        public int AddHandler(Action<object, TEventArgs> newHandler)
        {
            // we don't hook up the same handler twice,
            // just return its current removal token
            if (handlers.Any(ks => ks.Value.Equals(newHandler)))
            {
                return handlers.Where(ks => ks.Value.Equals(newHandler)).Single().Key;
            }

            // use a new guid cut off at int32 length for key generator
            var newGuid = Guid.NewGuid();
            int newKey = BitConverter.ToInt32(newGuid.ToByteArray(), 0);

            // it's a lot more likely at 32 bits
            while (handlers.ContainsKey(newKey))
            {
                newGuid = Guid.NewGuid();
                newKey = BitConverter.ToInt32(newGuid.ToByteArray(), 0);
            }

            handlers.Add(newKey, newHandler);

            return newKey;
        }

        /// <summary>
        /// Adds a new handler to the Event field
        /// </summary>
        /// <param name="thisEvent">The event field</param>
        /// <param name="newHandler">The handler to add</param>
        /// <returns></returns>
        public static Event<TEventArgs, TEventHandler> operator +(Event<TEventArgs, TEventHandler> thisEvent, TEventHandler newHandler)
        {
            var handlerDelegate = newHandler as Delegate;

            if (handlerDelegate == null)
                throw new InvalidOperationException("Handler is not a delegate type!");

            // if this handler has already been added, do nothing
            if(thisEvent.operatorAddedHandlers.ContainsKey(newHandler)){
                return thisEvent;
            }

            var removalToken = thisEvent.AddHandler(delegate(object sender, TEventArgs e)
            {
                handlerDelegate.DynamicInvoke(new object[] { sender, e });
            });

            thisEvent.operatorAddedHandlers.Add(newHandler, removalToken);

            return thisEvent;
        }

        /// <summary>
        /// Removes the specified handler
        /// </summary>
        /// <param name="handler">The handler to remove</param>
        public void RemoveHandler(Action<object, TEventArgs> handler)
        {
            if (handlers.Any(ks => ks.Value.Equals(handler)))
            {
                var removalToken = handlers.Where(ks => ks.Value.Equals(handler)).Single().Key;

                handlers.Remove(removalToken);
            }
        }

        /// <summary>
        /// Removes the specified handler
        /// </summary>
        /// <param name="removalToken">The handler's removal token</param>
        public void RemoveHandler(int removalToken)
        {
            if (handlers.ContainsKey(removalToken))
                handlers.Remove(removalToken);
        }

        /// <summary>
        /// Removes the specified handler
        /// </summary>
        /// <param name="thisEvent">The event field</param>
        /// <param name="handler">The handler to remove</param>
        /// <returns></returns>
        public static Event<TEventArgs, TEventHandler> operator -(Event<TEventArgs, TEventHandler> thisEvent, TEventHandler handler)
        {
            if (!thisEvent.operatorAddedHandlers.ContainsKey(handler))
                return thisEvent;

            var removalToken = thisEvent.operatorAddedHandlers[handler];

            thisEvent.RemoveHandler(removalToken);

            return thisEvent;
        }
    }

    public class EventHandlerSet<TEventArgs>
    {
        internal EventHandlerSet() { }
        public int RemovalToken { get; set; }
        public Action<object, TEventArgs> Handler { get; set; }
    }

I've made it so that it's mostly a drop-in replacement for pre-existing events. Obviously you need to inherit from the class implementing the events. So let's take that boring old button we had, and jazz up the click event:


    public class NewAndImprovedButton : System.Windows.Forms.Button
    {
        public NewAndImprovedButton()
            : base()
        {
            Click.CaptureBaseEvents(this);
        }

        public new Menno.Event.Event<EventArgs, EventHandler> Click = new Menno.Event.Event<EventArgs, EventHandler>();
    }

Turns out you can hide an event with a field. Who knew? Also, the CaptureBaseEvents in the constructor only works because fields are initialized before the constructor runs. Otherwise I'd have had to do something odd and confusing. More so.

If we take the first piece of code and replace Button by NewAndImprovedButton, like so:
    public partial class Form1 : Form
    {
        NewAndImprovedButton newButton;

        public Form1()
        {
            InitializeComponent();

            newButton = new NewAndImprovedButton()
            {
                Text = "Click me!",
                Left = 20,
                Top = 20
            };

            newButton.Click += new EventHandler(HandleClick);

            this.Controls.Add(newButton);
        }

        private void HandleClick(object sender, EventArgs e)
        {
            MessageBox.Show("Hi there!");

            newButton.Click -= HandleClick;
        }
    }

Everything keeps working exactly the way it did. I implemented the + and - operators to mirror the current Event syntax, and the CaptureBaseEvents method hooks up the event in the base class that the field hides to the Invoke method. Magic!
But we did not make this to keep working the same way. Here's what's new:

  • an AddHandler method that takes an Action<object, TEventArgs> and returns an int that functions as a token.
  • (obviously) a RemoveHandler that an int token.
  • for good measure: a RemoveHandler that takes an Action<object, TEventArgs>
  • a ClearHandlers method
  • a GetAllHandlers method that returns all handler delegates and their tokens, even the ones added through reflection.
Of course there are some rough edges too:
  • Field cannot be readonly if you use the operators. Anything could happen.
  • Operators generate wrappers through reflection. Performance is bound to be questionable.
  • Probably hundreds of corner cases I haven't considered.
Still, now you can do this:

            int removalToken = 0;
            removalToken = newButton.Click.AddHandler((sender, e) =>
            {
                MessageBox.Show("Hi there!");
                newButton.Click.RemoveHandler(removalToken);
            });

For some folks, that's worth a few rough edges.

Sunday, August 18, 2013

What if performance was more important than accuracy?

In 2000 Interplay released a game called "Messiah". It wasn't a particularly good game, and the only reason I mention it is it had this automatic system for scaling detail levels. Rather than letting people set detail level manually, it would detect when it could add some more polygons, and when detail levels needed to be dialled down. That way the game could fairly consistently hit the target frames per second.

That got me wondering: what if you find yourself programming something for which being fast is more important than being exactly right?
This is not an entirely unfamiliar trade-off: everytime you use a double instead of a decimal you decide that fast calculations are more important than exact decimal representation of your numbers. And the UDP protocol foregoes transmission verification for speed.

In the past we would write asynchonous function calls somewhat like this:
(Now with async/await, things are different again. Well, they look different.)

// Start reading from that slow file system
var fileStream = new System.IO.FileStream(@"C:\blablah", System.IO.FileMode.Open, System.IO.FileAccess.Read);
var asyncResult = fileStream.BeginRead(blahblah ...);

// Do lots of stuff in between to maximize your efficiency

// And now you just need your data, so for the rest of the 
// time you're synchronously waiting for it to arrive
fileStream.EndRead(asyncResult);

That's fine and dandy if you're just reading data. But what if you're calculating something on another thread, and you've already found your result, but the caller hasn't asked for it yet? Maybe you could spend your time improving your result rather than twiddling your thumbs.
So we need a method that iteratively improves on its previous results, and returns each so that we have something to return. Obviously I chose pi (using Microsoft.SolverFoundation.Common):
(using BBP formula)

static IEnumerable<Rational> GetPi()
{
    Rational bigPi = new Rational();
    Rational bigMinusOne = -1;
    Rational bigOne = 1;
    Rational bigTwo = 2;
    Rational bigFour = 4;
    Rational bigFive = 5;
    Rational bigSix = 6;
    Rational bigEight = 8;
    Rational bigSixteen = 16;
    for (int i = 0; i < 10000; i++)
    {
        Rational bigI = i;
        Rational powI;
        Rational.Power(16, i, out powI);
        var bigS = (bigOne / powI) * (bigFour / ((bigEight * bigI) + bigOne) - (bigTwo / ((bigEight * bigI) + bigFour)) - (bigOne / ((bigEight * bigI) + bigFive)) - (bigOne / ((bigEight * bigI) + bigSix)));
        bigPi += bigS;
        // We don't want to return anything less than 7 hexadecimal digits
        if (i > 7)
            yield return bigPi;
    }
}

Sooo, up to ten thousand hexadecimal digits (hexits?) of pi generated as a decimal fraction. Now to write a wrapper class that runs this in a loop in another thread:

public sealed class IncreasinglyAccurate<T>
{
    private Func<IEnumerable<T>> _generator;
    private Thread _generatorThread;
    private bool _hasValue = false;
    private T _currentValue;
    public IncreasinglyAccurate(Func<IEnumerable<T>> generator)
    {
        if (generator == null)
            throw new ArgumentNullException("generator");
        _generator = generator;
        // returns as soon as the generator thread is started
        GetValues();
    }
    private void GetValues()
    {
        // Spinning up a new thread is wasteful, but I can't set priority
        // on work items added to the threadpool
        _generatorThread = new Thread(delegate()
        {
            foreach(T value in _generator()){
                _currentValue = value;
                _hasValue = true;
            }
        });
        _generatorThread.Priority = ThreadPriority.BelowNormal;
        _generatorThread.Start();
    }
    public T GetValue()
    {
        // wait for a value
        while (!_hasValue)
        {
            Thread.Sleep(1);
        }
        // delegates will need to expect a ThreadAbortException
        _generatorThread.Abort();
        _generatorThread.Join();
        return _currentValue;
    }
}
/// <summary>
/// Little wrapper to get generic type inference
/// </summary>
public static class IncreasingAccuracy
{
    public static IncreasinglyAccurate<S> FromDelegate<S>(Func<IEnumerable<S>> generator)
    {
        return new IncreasinglyAccurate<S>(generator);
    }
}

That looks about right. Now we should be able to use this as follows:

var piGenerator = IncreasingAccuracy.FromDelegate(GetPi);
// Do some work!
System.Threading.Thread.Sleep(50);
Console.Write(piGenerator.GetValue().ToString());

On my slow old computer that shows a very big number divided by another big number (probably a power of 16!). Increasing the sleep time increases the size of the number, so that looks right.

I'll grant you that pi digits are not that useful; I'm not sure when you would ever need more than, oh, 10 digits max. But I can think of some situations where it's useful to have a sufficient answer fast, but to keep thinking about a better one: AI, voice recognition, anything that will not be 100% perfect in a hurry.

Thursday, June 13, 2013

Something silly

I've been spending some of my free time working on something silly. People who know me expect me to bring up the OMGWTF2 contest now. That's a valid point, so let me start over.

I've been spending some of my free time working on two silly somethings. Here's a link to one of those: http://demo.tabeoka.be/Contact%20%20Tabeoka.htm . Click that link and fill out the form and send it up.
Go ahead, nothing bad will happen. I'll wait right here.






Did you click that link? No? You need to have clicked that link to make sense of the rest of this post. No really! Go back and click.






Right. So now that that is done, I have a second link for you: click right here.
Now there are three possibilities:
1) you're pretty gullible, and thought that was amazing,
2) you thought that was still pretty neat, or
3) that link did nothing because you're using some browser I haven't tested, which are many.

For those of you who chose option 3 and missed all the excitement: that second link should have replayed exactly what you did on that page the first time you were there. I had the idea after going through some Google Analytics graphs and wondering: "But why do they click that particular link?". You could potentially use this type of script to do usability testing on the cheap.
The code is for the most part pretty straightforward, but there are a few things which I thought were interesting to point out:

It generates unique selectors for elements. If you click something, it needs to log which element. When you're just dealing with javascript, you would keep a reference to the DOM element, but we're logging to json. Generating a jquery selector enables getting an easy reference to the clicked element. For now I'm generating a selector by going through parent elements ( td>tr ) and logging the index (:nth-child) until I hit an element with an id (#content) or the document root (html>body>). Amazingly I had to write this myself; I couldn't find a pre-written script on Google.

In the interest of keeping bandwidth use and file size down, I log  frame-by-frame. Every 40 ms (for a frame rate of 25) the data collector comes by, compiles and documentEvents object and adds it to the array. which brings me to the next point: this script is completely modular. Currently it logs input fields, mouse location, scroll position and clicks. If you want to add another thing to log though, say actions in a Flash object, you just write another set of methods to log and replay those, and point the main logger file to it. Remember that every additional logger adds extra bytes to the data that needs to be sent to the server.

Everybody knows that the onunload (or onbeforeunload) can not be trusted. That's why I ping the latest frames to the server every second. You could set it to 2 seconds or 10 seconds or whatever as well. At the server end I have a .NET handler that appends the extra json to the end of a file. The first request generates a new file and sends back the id for subsequent requests (currently a Guid). This would be trivial to implement in another server language.

Current problems include:
In Chrome onchange on an input field is only fired after the field loses focus. IE fires it on keypress. That means that in IE you see the text field value being typed up, but in Chrome it just appears fully written. Chrome is better for bandwidth, but IE is nicer to see replayed. I haven't made up my mind yet which is preferable.

Replaying the logged event mutates the initial page state, but moving back the timeline does not revert to previous page state. I could clear text fields and such, but simulated clicks that fired event handlers might be impossible to undo. Like that cookie notice.

Lastly, this is just some demo script code, don't use it for anything serious. I haven't figured out yet if I'm going to actually release this code as open source at e.g. GitHub. Don't copy/paste and run with it. I'm sure it's illegal or something.

Tuesday, May 7, 2013

Short: belgian rijksregisternummer validation in javascript

It's been really busy, so no time for blog posts. In the meantime, here is a short function to validate id card numbers (rijksregisternummer) in Belgium, according to the official guidelines. I could not find a proper implementation of this in javascript.

        function IsRRNoValid(n) {
            // RR numbers need to be 11 chars long
            if (n.length != 11)
                return false;

            var checkDigit = n.substr(n.length - 2, 2);
            var modFunction = function(nr) { return 97 - (nr % 97); };
            var nrToCheck = parseInt(n.substr(0, 9));
            
            // first check without 2
            if (modFunction(nrToCheck) == checkDigit)
                return true;

            // then check with 2 appended for y2k+ births
            nrToCheck = parseInt('2' + n.substr(0, 9));

            return (modFunction(nrToCheck) == checkDigit);
        }

Easy as pie. You can get the gender and date of birth out of there as well, but I don't need that info for my purpose. A slightly more advanced validation would also check to see if the first 6 digits represent a valid date according to the yyMMdd format (determining the right century using the current validation).

Monday, March 4, 2013

Good news! People are smart! Kind of!

For customer HAROL we wrote a little memory game. There are 16 cards (8 pairs), and you get 3 attempts to try and find a match. If you do find a match you might just win an iPad! If you don't - well, you still get a discount coupon. The game was played on iPads set up at Batibouw. There is no online link for the game, or I'd link you there *.

Before we sent the game off to Batibouw, I sat down to calculate the likelihood of finding a match within your three attempts: it shouldn't be too hard or Harol would never get rid of its iPads. For calculations, I would assume a 100% perfect memory. The most interesting part here is how close people would get to that 100% perfect memory score. Are people smart? Kind of? Or are they really dumb?

I haven' t had a lot of formal statistics training, so I've largely relied on The Big Three: "simple common sense", "blatant disregard for other people's opinions" and "shameless guessing". Readers with a background in statistics: feel free to set me on the right path in the comments. I'll feel free to blatantly disregard you.
Anyway, here's what I figured:

Three turns, six cards. Rather than calculate the 'winning' percentage, I'll calculate the 'losing' percentage, and deduct from 100 %. Losing is the one route you take through the game where you don't find a match, winning is any of 3 distinct events - maybe even 5.
  • Card 1: nothing to do here
  • Card 2: Lucky guess chance to win. Chances to lose: 14/15
  • Card 3: May be a match for one of the 2 previous ones. 12/14 it ain't though.
  • Card 4: And if it ain't, there's still a lucky guess chance: 12/13
  • Card 5: A match for any of the previous ones is starting to be likely, but not quite: 8/12
  • Card 6: Last chance to hit a match: 10/11
So where does that put us? Far as I can see, we just multiply all of those fractions, and come up with the losing percentage: 0,4475524475524476. Say 45%. You're actually 55% likely to win if you play it smart. (If you just guess blindly, your winning chances are obviously (1-pow(14/15, 3)) = 18.6 % = not looking good).

So how did people do?




**** drumroll ****




49%
Only 6% away from a perfect memory.

Good news! People are smart! Kind of!



* For those of you who care: it was an MVC app, some nice ajax, css transitions that I could afford to rely on, as it only needed to work on iPad. Who knows maybe it will be recycled for another promotion that will run online.

Thursday, January 17, 2013

Pixels are dead.

And Apple killed them.

Well, they're not really dead: I'm sure that in the foreseeable future images will still be rendered onto a pixel-based (or semi-pixel based) screen of sorts. In css though, it is time to say goodbye: you shouldn't use pixels anymore. But don't worry: you don't anyway. Don't fool yourself into thinking css pixels necessarily translate 1:1 to actual pixels on an actual screen.


Allow me to demonstrate


Css3 defines the newfangled concept of a resolution unit. It can be set to the traditional dpi, dpcm (if you're a metric-nazi) or dppx (dots per pixel. That's right, dots per pixel). And just like that css formalizes what tablets and smartphones have been doing for a long time: the css pixel is now an abstraction.
Actually, according to the css standard, pixels have been defined as 1/96th of an inch , for a while now. For convenience sake though (and who can blame them?), browsers have been equating the css pixel with the actual pixel for all this time. If that meant redefining the length of the inch, so be it. It's not like Windows actually knows the size of an inch on your screen anyway. How could they, and still support every no-name brand Korean monitor on the market? By necessity, Apple can not do any better.


But pixels are pixels!


If you are the lucky owner of an iPhone 5, you have a 1136 x 640 screen in a 4" size. That leads to 1136 / (sqrt(pow(4, 2) / (pow(16, 2) + pow(9, 2)) * 16) = 325.846 ppi. So if I defined a font size of, say, 13px, in my css that would mean letters would be just over 1mm tall on the iPhone. For my American readers: that's very small ( about 4/100ths of an inch). Reading would be a wee bit hard on the eyes.
So mobile Safari doesn't do that. Instead it creates a virtual viewport that is even wider by default (980px), but also allows you to pick the size of your virtual viewport using meta tags.  It will also adjust the font size to not be ridiculously tiny. Android uses much the same meta tags - both use WebKit after all.


It's all taken care of! Thanks Apple! Let's have a beer and forget about it.


All this stuff makes it easy to create a site that will look decent on iPhone and iPad. Add a couple of media queries to make your website more responsive and call it a day. Your html is sized to your viewport, and then rendered to the high resolution screen, so text remains nice and crisp.
One item that remains though are the images. If you set your img size to 300 by 200, you probably also uploaded a image file with dimensions 300 by 200. But remember: the pixel is a lie. If you set your viewport low enough though, there are enough (physical) pixels there to render an image with a much higher resolution.    Css 4 is working hard to take care of this.


What am I supposed to do?


I have no idea. In a perfect world, I think we should combine physical measurements with percentages, and use media queries based on physical size. No platform properly supports the physical units though; a simple test confirmed that. Tablets could support this: they know the size of their screens perfectly. If they did render and inch to an inch though, a point would have to be a inch/72, and a pixel inch/96, and everything would be broken. Chicken and egg. It'd be nice if there was an opt-in meta tag or something similar for real physical measurements. Because pixels are dead.

Friday, December 28, 2012

SELECTing the comedy horror genre

Whenever we get a new intern, I like to poke and prod him*, to see what he's made of (bones and gooey bits, usually). To test SQL proficiency, I use the following problem. I like it because it's something that comes up occasionally in projects, and is surprisingly complex, though it seems like it shouldn't be:

Suppose you've got a movie database. You've got a table with movies T_MOVIES:

CREATE TABLE [dbo].[T_MOVIES](
 [movie_key] [int] NOT NULL,
 [movie_name] [nvarchar](500) NOT NULL,
 CONSTRAINT [PK_T_MOVIES] PRIMARY KEY CLUSTERED 
(
 [movie_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

and there's also a table for movie genres T_GENRES:

CREATE TABLE [dbo].[T_GENRES](
 [genre_key] [int] NOT NULL,
 [genre_name] [nvarchar](500) NOT NULL,
 CONSTRAINT [PK_T_GENRES] PRIMARY KEY CLUSTERED 
(
 [genre_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

To link both tables there is a third table T_MOVIE_GENRES:

CREATE TABLE [dbo].[T_MOVIE_GENRES](
 [movie_key] [int] NOT NULL,
 [genre_key] [int] NOT NULL,
 CONSTRAINT [PK_T_MOVIE_GENRES] PRIMARY KEY CLUSTERED 
(
 [movie_key] ASC,
 [genre_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

Assume the proper foreign key constraints have been applied, and you have a pretty sensible lay-out. To select all movies within the horror genre you only have to join T_MOVIES to T_MOVIE_GENRES and filter down your genre key in the WHERE clause to whatever the horror genre is, suppose it's 5:

SELECT
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
FROM
 T_MOVIES INNER JOIN
 T_MOVIE_GENRES ON
 T_MOVIES.movie_key = T_MOVIE_GENRES.movie_key
WHERE
 T_MOVIE_GENRES.genre_key = 5

The question now is: What if I want to filter by two genres? How do I find movies that are not only horror, but also comedy? What query will yield "Shaun of the Dead"?
More generally, how do we find those records in a table that have more than one matching record in another table, where those matching records have a field set to a set of specific values?

Obviously WHERE T_MOVIE_GENRES.genre_key IN (1,5) is going to yield movies that belong to either genre.
At this point usually the intern scratches his chin, gives it a bit of thought, and comes up with this:

SELECT
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
FROM
 T_MOVIES INNER JOIN
 (
 SELECT  T_MOVIE_GENRES.movie_key
 FROM T_MOVIE_GENRES
 WHERE T_MOVIE_GENRES.genre_key = 5
 ) AS Q_HORROR
 ON T_MOVIES.movie_key = Q_HORROR.movie_key
  INNER JOIN
 (
 SELECT  T_MOVIE_GENRES.movie_key
 FROM T_MOVIE_GENRES
 WHERE T_MOVIE_GENRES.genre_key = 1
 ) AS Q_COMEDY
 ON T_MOVIES.movie_key = Q_COMEDY.movie_key

(Or some other solution involving a subquery)
Okay, fine, you found "Shaun of the Dead". But what if I'm actually in the mood for Zombieland, a comedy / horror / road movie? In fact, I want a stored procedure that can take an arbitrary number of genres, and filter by it. You can go the dynamic sql route, and build a string with an arbitrary number of subqueries. It'd be ugly and hard to maintain, but it would work.
The following shows what I think is the best solution though:

SELECT
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
FROM
 T_MOVIES INNER JOIN
 T_MOVIE_GENRES
 ON T_MOVIES.movie_key = T_MOVIE_GENRES.movie_key
WHERE
 T_MOVIE_GENRES.genre_key in (1,5)
GROUP BY
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
HAVING
 COUNT(*) = 2

To refactor this into forementioned stored procedure you need to add a table-typed variable, and use a little bit of dynamic sql - because SQL does not have anything like arrays. Something like this:

CREATE PROCEDURE GetMoviesByGenres
 @genre_keys nvarchar(2000)
AS
BEGIN
 DECLARE @genre_keys_table table(genre_key int)
 DECLARE @genre_count int
 
 INSERT INTO @genre_keys_table
   EXEC(N'SELECT DISTINCT genre_key FROM T_GENRES WHERE genre_key IN (' + @genre_keys + N')')
 
 SELECT @genre_count = COUNT(*) FROM @genre_keys_table
 
 SELECT
  T_MOVIES.movie_key,
  T_MOVIES.movie_name
 FROM
  T_MOVIES INNER JOIN
  T_MOVIE_GENRES
  ON T_MOVIES.movie_key = T_MOVIE_GENRES.movie_key
 WHERE
  T_MOVIE_GENRES.genre_key IN (
   SELECT genre_key FROM @genre_keys_table
  )
 GROUP BY
  T_MOVIES.movie_key,
  T_MOVIES.movie_name
 HAVING
  COUNT(*) = @genre_count
END

If it wasn't clear from the code; @genre_keys takes a comma-delimited list of keys. That leaves an obvious injection vulnerability, so I would advise to have your calling code take an array of int, and not a string.
So that pretty much concludes that. If any of you ever end up as an intern at Tabeoka, you can now impress me with your mad SQL skillz.

Menno




* "him" because we've only ever gotten male interns. I'm prety sure I could get in trouble for poking and prodding a girl intern anyway. Both with the law, and my wife.