Code Highlighting

Sunday, August 18, 2013

What if performance were more important than accuracy?

In 2000 Interplay released a game called "Messiah". It wasn't a particularly good game, and the only reason I mention it is that it had an automatic system for scaling detail levels. Rather than letting people set the detail level manually, it would detect when it could add some more polygons, and when detail levels needed to be dialled down. That way the game could fairly consistently hit its target frame rate.

That got me wondering: what if you find yourself programming something for which being fast is more important than being exactly right?
This is not an entirely unfamiliar trade-off: every time you use a double instead of a decimal, you decide that fast calculations are more important than an exact decimal representation of your numbers. And the UDP protocol forgoes delivery guarantees for speed.

In the past we would write asynchronous function calls somewhat like this.
(With async/await, things are different again. Well, they look different.)

// Start reading from that slow file system
var fileStream = new System.IO.FileStream(@"C:\blablah", System.IO.FileMode.Open, System.IO.FileAccess.Read);
var buffer = new byte[4096]; // illustrative buffer for the read
var asyncResult = fileStream.BeginRead(buffer, 0, buffer.Length, null, null);

// Do lots of stuff in between to maximize your efficiency

// And now you just need your data, so for the rest of the
// time you're synchronously waiting for it to arrive
var bytesRead = fileStream.EndRead(asyncResult);

That's fine and dandy if you're just reading data. But what if you're calculating something on another thread, and you've already found your result, but the caller hasn't asked for it yet? Maybe you could spend your time improving your result rather than twiddling your thumbs.
So we need a method that iteratively improves on its previous result, and yields each intermediate result so that there is always something to return. Obviously I chose pi, computed with the Bailey-Borwein-Plouffe (BBP) formula: pi = the sum over k of (1/16^k) * (4/(8k+1) - 2/(8k+4) - 1/(8k+5) - 1/(8k+6)). Using Rational from Microsoft.SolverFoundation.Common:

static IEnumerable<Rational> GetPi()
{
    Rational bigPi = 0;
    Rational bigOne = 1;
    Rational bigTwo = 2;
    Rational bigFour = 4;
    Rational bigFive = 5;
    Rational bigSix = 6;
    Rational bigEight = 8;
    for (int i = 0; i < 10000; i++)
    {
        Rational bigI = i;
        Rational powI;
        Rational.Power(16, i, out powI); // 16^i
        // One term of the BBP series
        var bigS = (bigOne / powI) *
            ((bigFour / ((bigEight * bigI) + bigOne))
            - (bigTwo / ((bigEight * bigI) + bigFour))
            - (bigOne / ((bigEight * bigI) + bigFive))
            - (bigOne / ((bigEight * bigI) + bigSix)));
        bigPi += bigS;
        // Don't yield anything until we have roughly 8 hexadecimal digits;
        // each term adds about one hex digit of accuracy
        if (i > 7)
            yield return bigPi;
    }
}

Sooo, up to ten thousand hexadecimal digits (hexits?) of pi, generated as an exact fraction. Now to write a wrapper class that runs this enumeration on another thread:

public sealed class IncreasinglyAccurate<T>
{
    private Func<IEnumerable<T>> _generator;
    private Thread _generatorThread;
    // volatile so the write on the generator thread is seen by the reader
    private volatile bool _hasValue = false;
    private T _currentValue;

    public IncreasinglyAccurate(Func<IEnumerable<T>> generator)
    {
        if (generator == null)
            throw new ArgumentNullException("generator");
        _generator = generator;
        // returns as soon as the generator thread is started
        GetValues();
    }

    private void GetValues()
    {
        // Spinning up a new thread is wasteful, but I can't set priority
        // on work items added to the threadpool
        _generatorThread = new Thread(delegate()
        {
            foreach (T value in _generator())
            {
                // publish the improved result, then flag that we have one
                _currentValue = value;
                _hasValue = true;
            }
        });
        _generatorThread.Priority = ThreadPriority.BelowNormal;
        _generatorThread.Start();
    }

    public T GetValue()
    {
        // wait for a first value
        while (!_hasValue)
        {
            Thread.Sleep(1);
        }
        // delegates will need to expect a ThreadAbortException
        _generatorThread.Abort();
        _generatorThread.Join();
        return _currentValue;
    }
}
/// <summary>
/// Little wrapper to get generic type inference
/// </summary>
public static class IncreasingAccuracy
{
    public static IncreasinglyAccurate<S> FromDelegate<S>(Func<IEnumerable<S>> generator)
    {
        return new IncreasinglyAccurate<S>(generator);
    }
}

That looks about right. Now we should be able to use this as follows:

var piGenerator = IncreasingAccuracy.FromDelegate(GetPi);
// Do some work!
System.Threading.Thread.Sleep(50);
Console.Write(piGenerator.GetValue().ToString());

On my slow old computer that prints a very big number divided by another big number (with, presumably, a big power-of-16 factor in the denominator). Increasing the sleep time increases the size of the numbers, so that looks right.

I'll grant you that pi digits are not that useful; I'm not sure when you would ever need more than, oh, 10 digits max. But I can think of situations where it's useful to have a sufficient answer fast while you keep working on a better one: AI, voice recognition, anything that won't be 100% perfect in a hurry.

Thursday, June 13, 2013

Something silly

I've been spending some of my free time working on something silly. People who know me expect me to bring up the OMGWTF2 contest now. That's a valid point, so let me start over.

I've been spending some of my free time working on two silly somethings. Here's a link to one of them: http://demo.tabeoka.be/Contact%20%20Tabeoka.htm . Click that link, fill out the form, and send it off.
Go ahead, nothing bad will happen. I'll wait right here.






Did you click that link? No? You need to have clicked that link to make sense of the rest of this post. No really! Go back and click.






Right. So now that that is done, I have a second link for you: click right here.
Now there are three possibilities:
1) you're pretty gullible, and thought that was amazing,
2) you thought that was still pretty neat, or
3) that link did nothing because you're using one of the many browsers I haven't tested.

For those of you who chose option 3 and missed all the excitement: that second link should have replayed exactly what you did on that page the first time you were there. I got the idea after going through some Google Analytics graphs and wondering: "But why do they click that particular link?". You could potentially use this kind of script to do usability testing on the cheap.
The code is for the most part pretty straightforward, but there are a few things I thought were worth pointing out:

It generates unique selectors for elements. If you click something, the script needs to log which element you clicked. In plain JavaScript you would just keep a reference to the DOM element, but we're logging to JSON. Generating a jQuery selector makes it easy to find the element again at replay time. For now I generate a selector by walking up through the parent elements (logging each tag plus its :nth-child index, e.g. tr > td:nth-child(3)) until I hit an element with an id (#content) or the document root (html > body). Amazingly I had to write this myself; I couldn't find a pre-written script on Google.
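
In case you're curious, the core of it looks roughly like this (a minimal sketch of the approach, not the production script):

function getSelector(element) {
    // Walk up the DOM, recording each element's position among its
    // siblings, until we hit an id (unique by definition) or the root.
    var parts = [];
    while (element && element.nodeType === 1) {
        if (element.id) {
            parts.unshift('#' + element.id);
            break;
        }
        var index = 1; // :nth-child is 1-based
        var sibling = element;
        while ((sibling = sibling.previousElementSibling) !== null) {
            index++;
        }
        parts.unshift(element.tagName.toLowerCase() + ':nth-child(' + index + ')');
        element = element.parentElement;
    }
    return parts.join(' > ');
}

At replay time, $(getSelector(clickedElement)) hands you the element right back.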

In the interest of keeping bandwidth use and file size down, I log frame by frame. Every 40 ms (for a frame rate of 25) the data collector comes by, compiles a documentEvents object and adds it to the array. Which brings me to the next point: this script is completely modular. Currently it logs input fields, mouse location, scroll position and clicks. If you want to log another thing, say actions in a Flash object, you just write another set of methods to log and replay those, and point the main logger file to it. Remember that every additional logger adds extra bytes to the data that needs to be sent to the server.
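
The collector loop itself is tiny. Something along these lines (the module names are made up for illustration; the real script wires things up differently):

var loggers = [mouseLogger, scrollLogger, inputLogger, clickLogger]; // hypothetical modules
var frames = [];

setInterval(function () {
    // Each module contributes its own fields to the frame
    var documentEvents = { time: Date.now() };
    for (var i = 0; i < loggers.length; i++) {
        loggers[i].collect(documentEvents);
    }
    frames.push(documentEvents);
}, 40); // 40 ms = 25 frames per second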

Everybody knows that onunload (or onbeforeunload) cannot be trusted. That's why I ping the latest frames to the server every second; you could just as well set it to 2 seconds or 10 seconds. At the server end I have a .NET handler that appends the extra JSON to the end of a file. The first request generates a new file and sends back an id for subsequent requests (currently a Guid). This would be trivial to implement in another server language.
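
Continuing the sketch above, the client side of that ping could look like this (the replaylog.ashx endpoint and the id handshake are illustrative, not the actual implementation):

var logId = null;

setInterval(function () {
    if (frames.length === 0)
        return;
    var batch = frames.splice(0, frames.length); // take everything logged so far
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/replaylog.ashx' + (logId ? '?id=' + logId : ''), true);
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200 && !logId) {
            logId = xhr.responseText; // first response hands back the Guid
        }
    };
    xhr.send(JSON.stringify(batch));
}, 1000);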

Current problems include:
In Chrome, onchange on an input field is only fired after the field loses focus; IE fires it on every keypress. That means that in IE you see the text field value being typed out, but in Chrome it just appears fully written. Chrome is better for bandwidth, but IE is nicer to watch replayed. I haven't made up my mind yet which is preferable.
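
One way to paper over the difference would be to watch keyup as well as change, and only record an entry when the value actually changed. A sketch (not what the demo script currently does; addEventListener also assumes IE9+):

function watchField(field, onChange) {
    var lastValue = field.value;
    function check() {
        if (field.value !== lastValue) {
            lastValue = field.value;
            onChange(field.value);
        }
    }
    field.addEventListener('keyup', check, false);
    field.addEventListener('change', check, false); // catches changes that don't involve the keyboard
}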

Replaying the logged events mutates the initial page state, but moving the timeline backwards does not revert to the previous page state. I could clear text fields and such, but simulated clicks that fired event handlers might be impossible to undo. Like that cookie notice.

Lastly, this is just some demo script code; don't use it for anything serious. I haven't decided yet whether I'm going to release this code as open source on e.g. GitHub, so don't copy/paste it and run with it. I'm sure that's illegal or something.

Tuesday, May 7, 2013

Short: Belgian rijksregisternummer validation in JavaScript

It's been really busy, so no time for blog posts. In the meantime, here is a short function to validate Belgian national ID numbers (rijksregisternummer) according to the official guidelines. I could not find a proper implementation of this in JavaScript.

        function IsRRNoValid(n) {
            // RR numbers are exactly 11 digits
            if (!/^\d{11}$/.test(n))
                return false;

            var checkDigit = parseInt(n.substr(9, 2), 10);
            var modFunction = function (nr) { return 97 - (nr % 97); };

            // for births before 2000, the checksum is computed
            // over the first 9 digits as-is
            if (modFunction(parseInt(n.substr(0, 9), 10)) === checkDigit)
                return true;

            // for births in 2000 and later, a '2' is prepended first
            return modFunction(parseInt('2' + n.substr(0, 9), 10)) === checkDigit;
        }

Easy as pie. You can get the gender and date of birth out of there as well, but I don't need that info for my purpose. A slightly more advanced validation would also check whether the first 6 digits represent a valid date in the yyMMdd format (determining the right century from which of the two checksum variants matched).
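
That stricter check could look something like this (a sketch under the assumption that a valid number always encodes a real birth date; I haven't verified all the edge cases in the official spec, such as numbers issued when the birth date is unknown):

        function IsRRNoValidStrict(n) {
            if (!/^\d{11}$/.test(n))
                return false;

            var checkDigit = parseInt(n.substr(9, 2), 10);
            var mod = function (nr) { return 97 - (nr % 97); };

            // which checksum variant matches tells us the century
            var century;
            if (mod(parseInt(n.substr(0, 9), 10)) === checkDigit)
                century = 1900;
            else if (mod(parseInt('2' + n.substr(0, 9), 10)) === checkDigit)
                century = 2000;
            else
                return false;

            var year = century + parseInt(n.substr(0, 2), 10);
            var month = parseInt(n.substr(2, 2), 10);
            var day = parseInt(n.substr(4, 2), 10);

            // Date rolls invalid values over (e.g. Feb 30 becomes Mar 2),
            // so round-trip the values and compare
            var date = new Date(year, month - 1, day);
            return date.getFullYear() === year &&
                   date.getMonth() === month - 1 &&
                   date.getDate() === day;
        }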

Monday, March 4, 2013

Good news! People are smart! Kind of!

For customer HAROL we wrote a little memory game. There are 16 cards (8 pairs), and you get 3 attempts to try and find a match. If you do find a match you might just win an iPad! If you don't - well, you still get a discount coupon. The game was played on iPads set up at Batibouw. There is no online link for the game, or I'd link you there *.

Before we sent the game off to Batibouw, I sat down to calculate the likelihood of finding a match within your three attempts: it shouldn't be too hard, or Harol would never get rid of its iPads. For the calculations I assumed a 100% perfect memory. The most interesting part is how close people would get to that perfect-memory score. Are people smart? Kind of? Or are they really dumb?

I haven't had a lot of formal statistics training, so I've largely relied on The Big Three: "simple common sense", "blatant disregard for other people's opinions" and "shameless guessing". Readers with a background in statistics: feel free to set me on the right path in the comments. I'll feel free to blatantly disregard you.
Anyway, here's what I figured:

Three turns, six cards. Rather than calculate the winning percentage directly, I'll calculate the losing percentage and subtract it from 100%. Losing is the one route through the game where you never find a match; winning is any of 3 distinct events - maybe even 5.
  • Card 1: nothing to do here
  • Card 2: Lucky guess chance to win. Chances to lose: 14/15
  • Card 3: May be a match for one of the 2 previous ones. 12/14 it ain't though.
  • Card 4: And if it ain't, there's still a lucky guess chance: 12/13
  • Card 5: A match for any of the previous ones is starting to be likely, but not quite: 8/12
  • Card 6: Last chance to hit a match: 10/11
So where does that put us? As far as I can see, we just multiply all of those fractions, and come up with the losing percentage: (14/15) × (12/14) × (12/13) × (8/12) × (10/11) = 64/143 = 0,4475524475524476. Say 45%. You're actually 55% likely to win if you play it smart. (If you just guess blindly, your winning chances are obviously 1 - pow(14/15, 3) = 18.7% = not looking good.)
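
If you want to double-check my arithmetic, here it is in a browser console (nothing from the game, just the fractions above):

var lose = (14 / 15) * (12 / 14) * (12 / 13) * (8 / 12) * (10 / 11);
console.log(lose);                     // 0.44755... = 64/143, the losing chance
console.log(1 - lose);                 // 0.55244... winning chance with perfect memory
console.log(1 - Math.pow(14 / 15, 3)); // 0.18696... winning chance guessing blindly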

So how did people do?




**** drumroll ****




49%
Only 6% away from a perfect memory.

Good news! People are smart! Kind of!



* For those of you who care: it was an MVC app with some nice Ajax and CSS transitions that I could afford to rely on, as it only needed to work on the iPad. Who knows, maybe it will be recycled for another promotion that will run online.

Thursday, January 17, 2013

Pixels are dead.

And Apple killed them.

Well, they're not really dead: I'm sure that in the foreseeable future images will still be rendered onto a pixel-based (or semi-pixel-based) screen of sorts. In CSS though, it is time to say goodbye: you shouldn't use pixels anymore. But don't worry: you don't anyway. Don't fool yourself into thinking CSS pixels necessarily translate 1:1 to actual pixels on an actual screen.


Allow me to demonstrate


CSS3 defines the newfangled concept of a resolution unit. It can be set to the traditional dpi, dpcm (if you're a metric-nazi) or dppx (dots per pixel. That's right, dots per pixel). And just like that, CSS formalizes what tablets and smartphones have been doing for a long time: the CSS pixel is now an abstraction.
Actually, according to the CSS standard, pixels have been defined as 1/96th of an inch for a while now. For convenience's sake though (and who can blame them?), browsers have been equating the CSS pixel with the actual pixel all this time. If that meant redefining the length of the inch, so be it. It's not like Windows actually knows the size of an inch on your screen anyway. How could it, and still support every no-name brand Korean monitor on the market? By necessity, Apple can not do any better.


But pixels are pixels!


If you are the lucky owner of an iPhone 5, you have a 1136 x 640 screen in a 4" size. That leads to 1136 / (sqrt(pow(4, 2) / (pow(16, 2) + pow(9, 2))) * 16) = 325.846 ppi. So if I defined a font size of, say, 13px in my CSS, letters would be just over 1 mm tall on the iPhone. For my American readers: that's very small (about 4/100ths of an inch). Reading would be a wee bit hard on the eyes.
So mobile Safari doesn't do that. Instead it creates a virtual viewport that is even wider by default (980px), but also lets you pick the size of your virtual viewport using meta tags (e.g. <meta name="viewport" content="width=device-width">). It will also adjust the font size so it isn't ridiculously tiny. Android uses much the same meta tags - both use WebKit after all.
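
You can see the abstraction at work from any browser console (these are real, widely supported properties):

console.log(window.devicePixelRatio); // 2 on a Retina iPhone: 2 device pixels per CSS pixel
console.log(screen.width);            // reported in CSS pixels, not physical ones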


It's all taken care of! Thanks Apple! Let's have a beer and forget about it.


All this stuff makes it easy to create a site that will look decent on iPhone and iPad. Add a couple of media queries to make your website more responsive and call it a day. Your HTML is sized to your viewport, and then rendered to the high resolution screen, so text remains nice and crisp.
One item remains though: images. If you set your img size to 300 by 200, you probably also uploaded an image file with dimensions 300 by 200. But remember: the pixel is a lie. If you set your viewport low enough, there are enough (physical) pixels there to render an image with a much higher resolution. CSS 4 is working hard to take care of this (the image-set() notation, for instance).
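
Until that lands, a common stopgap is swapping in a bigger image by hand for high-density screens. A sketch (the @2x file naming and the element id are illustrative conventions, not a standard):

if (window.devicePixelRatio >= 2) {
    var img = document.getElementById('photo'); // hypothetical 300x200 image
    img.src = img.src.replace(/\.jpg$/, '@2x.jpg'); // a 600x400 version of the same image
    // leave the displayed size at 300x200; only the pixel density changes
}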


What am I supposed to do?


I have no idea. In a perfect world, I think we would combine physical measurements with percentages, and use media queries based on physical size. No platform properly supports the physical units though; a simple test confirmed that. Tablets could support this: they know the size of their screens perfectly. But if they did render an inch to an inch, a point would have to be an inch/72, and a pixel an inch/96, and everything would be broken. Chicken and egg. It'd be nice if there were an opt-in meta tag or something similar for real physical measurements. Because pixels are dead.
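
A test along these lines shows it (not necessarily the exact test I ran, but it makes the point): ask the browser for one physical inch and measure what you get back.

var probe = document.createElement('div');
probe.style.width = '1in';
document.body.appendChild(probe);
console.log(probe.offsetWidth); // 96 everywhere I tried: '1in' is just 96 CSS pixels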