Code Highlighting

Sunday, August 18, 2013

What if performance were more important than accuracy?

In 2000 Interplay released a game called "Messiah". It wasn't a particularly good game, and the only reason I mention it is that it had an automatic system for scaling detail levels. Rather than letting people set the detail level manually, it would detect when it could add some more polygons, and when detail levels needed to be dialled down. That way the game could fairly consistently hit its target frames per second.

That got me wondering: what if you find yourself programming something for which being fast is more important than being exactly right?
This is not an entirely unfamiliar trade-off: every time you use a double instead of a decimal, you decide that fast calculations are more important than an exact decimal representation of your numbers. And the UDP protocol forgoes transmission verification for speed.

In the past we would write asynchronous function calls somewhat like this:
(Now with async/await, things are different again. Well, they look different.)

// Start reading from that slow file system
var fileStream = new System.IO.FileStream(@"C:\blablah", System.IO.FileMode.Open, System.IO.FileAccess.Read);
var asyncResult = fileStream.BeginRead(blahblah ...);

// Do lots of stuff in between to maximize your efficiency

// And now you just need your data, so for the rest of the 
// time you're synchronously waiting for it to arrive
fileStream.EndRead(asyncResult);

That's fine and dandy if you're just reading data. But what if you're calculating something on another thread, and you've already found your result, but the caller hasn't asked for it yet? Maybe you could spend your time improving your result rather than twiddling your thumbs.
So we need a method that iteratively improves on its previous result, yielding each intermediate value so that there is always something to return. Obviously I chose pi (using Microsoft.SolverFoundation.Common):
(using BBP formula)

static IEnumerable<Rational> GetPi()
{
    Rational bigPi = new Rational();
    Rational bigMinusOne = -1;
    Rational bigOne = 1;
    Rational bigTwo = 2;
    Rational bigFour = 4;
    Rational bigFive = 5;
    Rational bigSix = 6;
    Rational bigEight = 8;
    Rational bigSixteen = 16;
    for (int i = 0; i < 10000; i++)
    {
        Rational bigI = i;
        Rational powI;
        Rational.Power(16, i, out powI);
        var bigS = (bigOne / powI) * (bigFour / ((bigEight * bigI) + bigOne) - (bigTwo / ((bigEight * bigI) + bigFour)) - (bigOne / ((bigEight * bigI) + bigFive)) - (bigOne / ((bigEight * bigI) + bigSix)));
        bigPi += bigS;
        // We don't want to return anything less than 7 hexadecimal digits
        if (i > 7)
            yield return bigPi;
    }
}

Sooo, up to ten thousand hexadecimal digits (hexits?) of pi, generated as a rational fraction. Now to write a wrapper class that runs this in a loop on another thread:

public sealed class IncreasinglyAccurate<T>
{
    private Func<IEnumerable<T>> _generator;
    private Thread _generatorThread;
    private volatile bool _hasValue = false; // volatile: written on the generator thread, polled in GetValue
    private T _currentValue;
    public IncreasinglyAccurate(Func<IEnumerable<T>> generator)
    {
        if (generator == null)
            throw new ArgumentNullException("generator");
        _generator = generator;
        // returns as soon as the generator thread is started
        GetValues();
    }
    private void GetValues()
    {
        // Spinning up a new thread is wasteful, but I can't set priority
        // on work items added to the threadpool
        _generatorThread = new Thread(delegate()
        {
            foreach(T value in _generator()){
                _currentValue = value;
                _hasValue = true;
            }
        });
        _generatorThread.Priority = ThreadPriority.BelowNormal;
        _generatorThread.Start();
    }
    public T GetValue()
    {
        // wait for a value
        while (!_hasValue)
        {
            Thread.Sleep(1);
        }
        // delegates will need to expect a ThreadAbortException
        _generatorThread.Abort();
        _generatorThread.Join();
        return _currentValue;
    }
}
/// <summary>
/// Little wrapper to get generic type inference
/// </summary>
public static class IncreasingAccuracy
{
    public static IncreasinglyAccurate<S> FromDelegate<S>(Func<IEnumerable<S>> generator)
    {
        return new IncreasinglyAccurate<S>(generator);
    }
}

That looks about right. Now we should be able to use this as follows:

var piGenerator = IncreasingAccuracy.FromDelegate(GetPi);
// Do some work!
System.Threading.Thread.Sleep(50);
Console.Write(piGenerator.GetValue().ToString());

On my slow old computer that shows a very big number divided by another big number (probably a power of 16!). Increasing the sleep time increases the size of the number, so that looks right.

I'll grant you that pi digits are not that useful; I'm not sure when you would ever need more than, oh, 10 digits max. But I can think of some situations where it's useful to have a sufficient answer fast, but to keep thinking about a better one: AI, voice recognition, anything that will not be 100% perfect in a hurry.

Thursday, June 13, 2013

Something silly

I've been spending some of my free time working on something silly. People who know me expect me to bring up the OMGWTF2 contest now. That's a valid point, so let me start over.

I've been spending some of my free time working on two silly somethings. Here's a link to one of those: http://demo.tabeoka.be/Contact%20%20Tabeoka.htm . Click that link and fill out the form and send it up.
Go ahead, nothing bad will happen. I'll wait right here.






Did you click that link? No? You need to have clicked that link to make sense of the rest of this post. No really! Go back and click.






Right. So now that that is done, I have a second link for you: click right here.
Now there are three possibilities:
1) you're pretty gullible, and thought that was amazing,
2) you thought that was still pretty neat, or
3) that link did nothing because you're using some browser I haven't tested, of which there are many.

For those of you who chose option 3 and missed all the excitement: that second link should have replayed exactly what you did on that page the first time you were there. I had the idea after going through some Google Analytics graphs and wondering: "But why do they click that particular link?". You could potentially use this type of script to do usability testing on the cheap.
The code is for the most part pretty straightforward, but there are a few things which I thought were interesting to point out:

It generates unique selectors for elements. If you click something, it needs to log which element. When you're just dealing with javascript you would keep a reference to the DOM element, but we're logging to json. Generating a jquery selector enables getting an easy reference back to the clicked element later. For now I'm generating a selector by going through parent elements ( tr>td ) and logging the index (:nth-child) until I hit an element with an id (#content) or the document root (html>body>). Amazingly I had to write this myself; I couldn't find a pre-written script through Google.
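The selector generation might look something like this minimal sketch (the function name getUniqueSelector and the exact details are illustrative, not the actual code):

```javascript
// Walk up from the clicked element, recording an :nth-child index per
// ancestor, until we reach an element with an id (assumed unique) or
// run out of parents. Returns a jquery-style selector string.
function getUniqueSelector(element) {
    var path = [];
    while (element && element.nodeType === 1) {
        if (element.id) {
            // an id anchors the selector; no need to go higher
            path.unshift('#' + element.id);
            break;
        }
        // :nth-child is 1-based: count preceding element siblings
        var index = 1, sibling = element;
        while ((sibling = sibling.previousElementSibling)) {
            index++;
        }
        path.unshift(element.tagName.toLowerCase() + ':nth-child(' + index + ')');
        element = element.parentElement;
    }
    return path.join('>');
}
```

On replay, handing the resulting string to jquery (or querySelector) gets you back to the same element, as long as the page structure hasn't changed in between.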

In the interest of keeping bandwidth use and file size down, I log frame-by-frame. Every 40 ms (for a frame rate of 25) the data collector comes by, compiles a documentEvents object and adds it to the array. Which brings me to the next point: this script is completely modular. Currently it logs input fields, mouse location, scroll position and clicks. If you want to add another thing to log, though, say actions in a Flash object, you just write another set of methods to log and replay those, and point the main logger file to them. Remember that every additional logger adds extra bytes to the data that needs to be sent to the server.
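The modular frame loop could be sketched like this (the names registerLogger, captureFrame and frames are made up for illustration; the real script differs):

```javascript
var frames = [];   // one documentEvents object per captured frame
var loggers = [];  // pluggable modules: mouse, scroll, inputs, clicks, ...

// Any module that can report its state per frame registers itself here.
function registerLogger(name, collect) {
    loggers.push({ name: name, collect: collect });
}

// Called every 40 ms (25 fps): ask each module for its data and
// push one frame object onto the array.
function captureFrame(timestamp) {
    var documentEvents = { t: timestamp };
    for (var i = 0; i < loggers.length; i++) {
        documentEvents[loggers[i].name] = loggers[i].collect();
    }
    frames.push(documentEvents);
}

// In the page this would be driven by something like:
// setInterval(function () { captureFrame(Date.now()); }, 40);
```

Each extra logger is just one more registerLogger call, which is what keeps the whole thing modular.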

Everybody knows that the onunload (or onbeforeunload) event cannot be trusted. That's why I ping the latest frames to the server every second. You could set it to 2 seconds or 10 seconds or whatever as well. At the server end I have a .NET handler that appends the extra json to the end of a file. The first request generates a new file and sends back the id for subsequent requests (currently a Guid). This would be trivial to implement in another server language.

Current problems include:
In Chrome onchange on an input field is only fired after the field loses focus. IE fires it on keypress. That means that in IE you see the text field value being typed up, but in Chrome it just appears fully written. Chrome is better for bandwidth, but IE is nicer to see replayed. I haven't made up my mind yet which is preferable.

Replaying the logged event mutates the initial page state, but moving back the timeline does not revert to previous page state. I could clear text fields and such, but simulated clicks that fired event handlers might be impossible to undo. Like that cookie notice.

Lastly, this is just some demo script code, don't use it for anything serious. I haven't figured out yet if I'm going to actually release this code as open source at e.g. GitHub. Don't copy/paste and run with it. I'm sure it's illegal or something.

Tuesday, May 7, 2013

Short: Belgian rijksregisternummer validation in javascript

It's been really busy, so no time for blog posts. In the meantime, here is a short function to validate id card numbers (rijksregisternummer) in Belgium, according to the official guidelines. I could not find a proper implementation of this in javascript.

        function IsRRNoValid(n) {
            // RR numbers need to be 11 chars long
            if (n.length != 11)
                return false;

            // the last two digits are the check digits
            var checkDigit = parseInt(n.substr(n.length - 2, 2), 10);
            var modFunction = function(nr) { return 97 - (nr % 97); };
            // always pass a radix: a leading zero could otherwise
            // trigger octal parsing in older browsers
            var nrToCheck = parseInt(n.substr(0, 9), 10);

            // first check without 2
            if (modFunction(nrToCheck) == checkDigit)
                return true;

            // then check with 2 prepended for y2k+ births
            nrToCheck = parseInt('2' + n.substr(0, 9), 10);

            return (modFunction(nrToCheck) == checkDigit);
        }

Easy as pie. You can get the gender and date of birth out of there as well, but I don't need that info for my purpose. A slightly more advanced validation would also check whether the first 6 digits represent a valid date in the yyMMdd format (determining the right century via the checksum, as above).
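That extended check could look something like this sketch (getRRBirthDate is a hypothetical helper, not part of the validator above; it returns the birth date, or null if either the checksum or the date is invalid):

```javascript
// Extract the birth date from a rijksregisternummer, picking the
// century by re-running the mod-97 checksum with and without a
// leading '2', then round-tripping through Date to reject
// impossible dates like February 30th.
function getRRBirthDate(n) {
    if (n.length != 11)
        return null;

    var checkDigit = parseInt(n.substr(9, 2), 10);
    var mod97 = function (nr) { return 97 - (nr % 97); };

    var century;
    if (mod97(parseInt(n.substr(0, 9), 10)) == checkDigit)
        century = 1900;
    else if (mod97(parseInt('2' + n.substr(0, 9), 10)) == checkDigit)
        century = 2000;
    else
        return null; // checksum invalid either way

    var year = century + parseInt(n.substr(0, 2), 10);
    var month = parseInt(n.substr(2, 2), 10);
    var day = parseInt(n.substr(4, 2), 10);

    // Date months are zero-based; if the components don't survive the
    // round trip, the yyMMdd part wasn't a real date
    var date = new Date(year, month - 1, day);
    if (date.getFullYear() != year || date.getMonth() != month - 1 || date.getDate() != day)
        return null;
    return date;
}
```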

Monday, March 4, 2013

Good news! People are smart! Kind of!

For customer HAROL we wrote a little memory game. There are 16 cards (8 pairs), and you get 3 attempts to try and find a match. If you do find a match you might just win an iPad! If you don't - well, you still get a discount coupon. The game was played on iPads set up at Batibouw. There is no online link for the game, or I'd link you there *.

Before we sent the game off to Batibouw, I sat down to calculate the likelihood of finding a match within your three attempts: it shouldn't be too hard or Harol would never get rid of its iPads. For calculations, I would assume a 100% perfect memory. The most interesting part here is how close people would get to that 100% perfect memory score. Are people smart? Kind of? Or are they really dumb?

I haven't had a lot of formal statistics training, so I've largely relied on The Big Three: "simple common sense", "blatant disregard for other people's opinions" and "shameless guessing". Readers with a background in statistics: feel free to set me on the right path in the comments. I'll feel free to blatantly disregard you.
Anyway, here's what I figured:

Three turns, six cards. Rather than calculate the 'winning' percentage, I'll calculate the 'losing' percentage and subtract it from 100%. Losing is the one route you can take through the game where you never find a match; winning is any of several distinct events (a match on turn one, two or three, by memory or by lucky guess).
  • Card 1: nothing to do here
  • Card 2: Lucky guess chance to win. Chances to lose: 14/15
  • Card 3: May be a match for one of the 2 previous ones. 12/14 it ain't though.
  • Card 4: And if it ain't, there's still a lucky guess chance: 12/13
  • Card 5: A match for any of the previous ones is starting to be likely, but not quite: 8/12
  • Card 6: Last chance to hit a match: 10/11
So where does that put us? As far as I can see, we just multiply all of those fractions and come up with the losing probability: 0.4475524475524476. Say 45%. You're actually 55% likely to win if you play it smart. (If you just guess blindly, your winning chance is obviously 1 - pow(14/15, 3) = 18.6%. Not looking good.)
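For the sceptics, the arithmetic above is easy to check; the five fractions are the per-card chances of NOT completing a pair with perfect memory:

```javascript
// Per-card losing chances from the list above: card 2 through card 6.
var loseChances = [14 / 15, 12 / 14, 12 / 13, 8 / 12, 10 / 11];

// Losing means dodging a match on every card, so multiply them all.
var pLose = loseChances.reduce(function (acc, p) { return acc * p; }, 1);
var pWin = 1 - pLose;                      // perfect-memory player

// A blind guesser gets three independent 1-in-15 shots.
var pWinBlind = 1 - Math.pow(14 / 15, 3);
```

Running this gives pLose ≈ 0.4476 (exactly 64/143), pWin ≈ 0.5524 and pWinBlind ≈ 0.186, matching the numbers in the text.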

So how did people do?




**** drumroll ****




49%
Only 6% away from a perfect memory.

Good news! People are smart! Kind of!



* For those of you who care: it was an MVC app, some nice ajax, css transitions that I could afford to rely on, as it only needed to work on iPad. Who knows maybe it will be recycled for another promotion that will run online.

Thursday, January 17, 2013

Pixels are dead.

And Apple killed them.

Well, they're not really dead: I'm sure that in the foreseeable future images will still be rendered onto a pixel-based (or semi-pixel based) screen of sorts. In css though, it is time to say goodbye: you shouldn't use pixels anymore. But don't worry: you don't anyway. Don't fool yourself into thinking css pixels necessarily translate 1:1 to actual pixels on an actual screen.


Allow me to demonstrate


Css3 defines the newfangled concept of a resolution unit. It can be set to the traditional dpi, dpcm (if you're a metric-nazi) or dppx (dots per pixel. That's right, dots per pixel). And just like that css formalizes what tablets and smartphones have been doing for a long time: the css pixel is now an abstraction.
Actually, according to the css standard, pixels have been defined as 1/96th of an inch for a while now. For convenience's sake though (and who can blame them?), browsers have been equating the css pixel with the actual pixel all this time. If that meant redefining the length of the inch, so be it. It's not like Windows actually knows the size of an inch on your screen anyway. How could it, and still support every no-name brand Korean monitor on the market? By necessity, Apple can not do any better.


But pixels are pixels!


If you are the lucky owner of an iPhone 5, you have a 1136 x 640 screen in a 4" size. That leads to 1136 / (sqrt(pow(4, 2) / (pow(16, 2) + pow(9, 2))) * 16) = 325.846 ppi. So if I defined a font size of, say, 13px in my css, letters would be just over 1mm tall on the iPhone. For my American readers: that's very small (about 4/100ths of an inch). Reading would be a wee bit hard on the eyes.
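Spelled out, that density calculation looks like this (pixelsPerInch is just an illustrative helper): for a 16:9 screen, the width in inches is diagonal × 16 / sqrt(16² + 9²), and ppi is horizontal pixels divided by that.

```javascript
// Pixel density for a screen given its horizontal resolution,
// diagonal size in inches and aspect ratio.
function pixelsPerInch(pxWidth, diagonalInches, aspectW, aspectH) {
    var widthInches = diagonalInches * aspectW /
        Math.sqrt(aspectW * aspectW + aspectH * aspectH);
    return pxWidth / widthInches;
}

var ppi = pixelsPerInch(1136, 4, 16, 9); // iPhone 5: about 325.8
var glyphMm = 13 / ppi * 25.4;           // a 13px glyph: about 1.01 mm
```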
So mobile Safari doesn't do that. Instead it creates a virtual viewport that is even wider by default (980px), but also allows you to pick the size of your virtual viewport using meta tags.  It will also adjust the font size to not be ridiculously tiny. Android uses much the same meta tags - both use WebKit after all.


It's all taken care of! Thanks Apple! Let's have a beer and forget about it.


All this stuff makes it easy to create a site that will look decent on iPhone and iPad. Add a couple of media queries to make your website more responsive and call it a day. Your html is sized to your viewport, and then rendered to the high resolution screen, so text remains nice and crisp.
One item that remains though is images. If you set your img size to 300 by 200, you probably also uploaded an image file with dimensions 300 by 200. But remember: the pixel is a lie. If you set your viewport low enough, there are enough (physical) pixels there to render an image with a much higher resolution. Css 4 is working hard to take care of this.


What am I supposed to do?


I have no idea. In a perfect world, I think we should combine physical measurements with percentages, and use media queries based on physical size. No platform properly supports the physical units though; a simple test confirmed that. Tablets could support this: they know the size of their screens perfectly. If they did render an inch to an inch though, a point would have to be an inch/72, and a pixel an inch/96, and everything would be broken. Chicken and egg. It'd be nice if there was an opt-in meta tag or something similar for real physical measurements. Because pixels are dead.

Friday, December 28, 2012

SELECTing the comedy horror genre

Whenever we get a new intern, I like to poke and prod him*, to see what he's made of (bones and gooey bits, usually). To test SQL proficiency, I use the following problem. I like it because it's something that comes up occasionally in projects, and is surprisingly complex, though it seems like it shouldn't be:

Suppose you've got a movie database. You've got a table with movies T_MOVIES:

CREATE TABLE [dbo].[T_MOVIES](
 [movie_key] [int] NOT NULL,
 [movie_name] [nvarchar](500) NOT NULL,
 CONSTRAINT [PK_T_MOVIES] PRIMARY KEY CLUSTERED 
(
 [movie_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

and there's also a table for movie genres T_GENRES:

CREATE TABLE [dbo].[T_GENRES](
 [genre_key] [int] NOT NULL,
 [genre_name] [nvarchar](500) NOT NULL,
 CONSTRAINT [PK_T_GENRES] PRIMARY KEY CLUSTERED 
(
 [genre_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

To link both tables there is a third table T_MOVIE_GENRES:

CREATE TABLE [dbo].[T_MOVIE_GENRES](
 [movie_key] [int] NOT NULL,
 [genre_key] [int] NOT NULL,
 CONSTRAINT [PK_T_MOVIE_GENRES] PRIMARY KEY CLUSTERED 
(
 [movie_key] ASC,
 [genre_key] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

Assume the proper foreign key constraints have been applied, and you have a pretty sensible layout. To select all movies within the horror genre, you only have to join T_MOVIES to T_MOVIE_GENRES and filter down to your genre key in the WHERE clause, whatever the horror genre is; suppose it's 5:

SELECT
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
FROM
 T_MOVIES INNER JOIN
 T_MOVIE_GENRES ON
 T_MOVIES.movie_key = T_MOVIE_GENRES.movie_key
WHERE
 T_MOVIE_GENRES.genre_key = 5

The question now is: What if I want to filter by two genres? How do I find movies that are not only horror, but also comedy? What query will yield "Shaun of the Dead"?
More generally, how do we find those records in a table that have more than one matching record in another table, where those matching records have a field set to a set of specific values?

Obviously WHERE T_MOVIE_GENRES.genre_key IN (1,5) is going to yield movies that belong to either genre.
At this point usually the intern scratches his chin, gives it a bit of thought, and comes up with this:

SELECT
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
FROM
 T_MOVIES INNER JOIN
 (
 SELECT  T_MOVIE_GENRES.movie_key
 FROM T_MOVIE_GENRES
 WHERE T_MOVIE_GENRES.genre_key = 5
 ) AS Q_HORROR
 ON T_MOVIES.movie_key = Q_HORROR.movie_key
  INNER JOIN
 (
 SELECT  T_MOVIE_GENRES.movie_key
 FROM T_MOVIE_GENRES
 WHERE T_MOVIE_GENRES.genre_key = 1
 ) AS Q_COMEDY
 ON T_MOVIES.movie_key = Q_COMEDY.movie_key

(Or some other solution involving a subquery)
Okay, fine, you found "Shaun of the Dead". But what if I'm actually in the mood for Zombieland, a comedy / horror / road movie? In fact, I want a stored procedure that can take an arbitrary number of genres, and filter by it. You can go the dynamic sql route, and build a string with an arbitrary number of subqueries. It'd be ugly and hard to maintain, but it would work.
The following shows what I think is the best solution though:

SELECT
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
FROM
 T_MOVIES INNER JOIN
 T_MOVIE_GENRES
 ON T_MOVIES.movie_key = T_MOVIE_GENRES.movie_key
WHERE
 T_MOVIE_GENRES.genre_key in (1,5)
GROUP BY
 T_MOVIES.movie_key,
 T_MOVIES.movie_name
HAVING
 COUNT(*) = 2

To refactor this into the aforementioned stored procedure you need to add a table-typed variable, and use a little bit of dynamic sql - because SQL does not have anything like arrays. Something like this:

CREATE PROCEDURE GetMoviesByGenres
 @genre_keys nvarchar(2000)
AS
BEGIN
 DECLARE @genre_keys_table table(genre_key int)
 DECLARE @genre_count int
 
 INSERT INTO @genre_keys_table
   EXEC(N'SELECT DISTINCT genre_key FROM T_GENRES WHERE genre_key IN (' + @genre_keys + N')')
 
 SELECT @genre_count = COUNT(*) FROM @genre_keys_table
 
 SELECT
  T_MOVIES.movie_key,
  T_MOVIES.movie_name
 FROM
  T_MOVIES INNER JOIN
  T_MOVIE_GENRES
  ON T_MOVIES.movie_key = T_MOVIE_GENRES.movie_key
 WHERE
  T_MOVIE_GENRES.genre_key IN (
   SELECT genre_key FROM @genre_keys_table
  )
 GROUP BY
  T_MOVIES.movie_key,
  T_MOVIES.movie_name
 HAVING
  COUNT(*) = @genre_count
END

If it wasn't clear from the code: @genre_keys takes a comma-delimited list of keys. That leaves an obvious injection vulnerability, so I would advise having your calling code take an array of ints, and not a string.
So that pretty much concludes that. If any of you ever end up as an intern at Tabeoka, you can now impress me with your mad SQL skillz.

Menno




* "him" because we've only ever gotten male interns. I'm pretty sure I could get in trouble for poking and prodding a girl intern anyway. Both with the law, and my wife.

Monday, December 17, 2012

Html and Svg: handling events to and fro

I've been working on a website where I embed a couple of svgs in an html page. Actually, I embed the same svg in three places on one page. The svg in question has some script-driven animation, so I needed to use an object tag rather than a simple img.
The first challenge was this: I needed to trigger the svg animation for all svgs one after the other, which meant I had to call into the svg's script:

        function bounceBall(ball) {
            var svgView = getSvgView(ball);
            
            if(svgView)
                svgView.startBounce();
        }

        function getSvgView(ball) {
            var svgDoc;
            try {
                if (ball.getSVGDocument)
                    svgDoc = ball.getSVGDocument();
                else if (ball.contentDocument)
                    svgDoc = ball.contentDocument;

                if (svgDoc) {
                    return svgDoc.defaultView;
                }
            } catch (e) { }
            return null;
        }

When I initially call this function, the SVG dom might not be loaded yet. That's why I include the try/catch, and return null if an error occurs. Elsewhere in the code I set a setTimeout to retry 200 milliseconds later. The onload event is not quite reliable enough.

Next up was the click handler. If you simply add an onclick to the object tag, nothing happens when you click the svg. That makes perfect sense: the onclick is registered and handled within the svg, and never makes it to the html dom. I needed to have a click on the image open a little div in html though. From svg I could call the html javascript functions using top.someFunctionName(). I had the same svg image three times though, and it needed to do something different each time. Here's what I came up with:

Html:

<object type="image/svg+xml" data="/Content/Images/ball.svg" class="ball" style="left: 130px; top: 160px;" onclick="showPopup('homepopup2');"></object>

"But wait!" you say, "You just told me that doesn't work!". And it doesn't, but it would be pretty convenient if it did:

Html dom javascript:

        function setClickEvent(ball) {
            if (ball.onclick) {
                var svgwin = getSvgView(ball);
                if (svgwin) {
                    svgwin.eventHandler = ball.onclick;
                } else {
                    // if the svg view is not available,
                    // try again in 200 ms.
                    window.setTimeout(function() {
                        setClickEvent(ball);
                    }, 200);
                }
            }
        }

There you go. On load I simply funnel the onclick handler into the svg dom. The svg implementation is trivial:

    var eventHandler = null;

    function handleClick(){
      if(eventHandler)
        eventHandler();
    }


<circle cx="15" cy="15" r="5" id="ball" onclick="handleClick();" />

If you need this sort of thing more often - or for more events - you could work out a neat wrapper with  registerEventHandler(eventName, eventHandler) and triggerEvent(eventName) methods to reuse (and a html-side script that automatically hooks it up). I don't currently foresee a need for it myself though.
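Such a wrapper could be as small as this sketch (using the registerEventHandler/triggerEvent names suggested above; the implementation is illustrative, under the assumption that handlers take no arguments, like the onclick funnel above):

```javascript
// Map of event name -> list of handlers, so several handlers can be
// hooked up to the same svg event.
var eventHandlers = {};

function registerEventHandler(eventName, eventHandler) {
    if (!eventHandlers[eventName])
        eventHandlers[eventName] = [];
    eventHandlers[eventName].push(eventHandler);
}

// Called from the svg markup, e.g. onclick="triggerEvent('click');"
// Unknown event names are simply a no-op.
function triggerEvent(eventName) {
    var handlers = eventHandlers[eventName] || [];
    for (var i = 0; i < handlers.length; i++)
        handlers[i]();
}
```

The html side would then assign into the svg window the same way as before: svgwin.registerEventHandler('click', ball.onclick).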

Menno