Determine text language

It is good practice to include <html lang="en"> or <html lang="da"> on your website. I had a lot of old content in Danish and English that I wanted to publish, so I needed to determine the language of each text automatically. Here is my approach:

  1. Find the frequency of each letter in the text.
  2. Measure the distance between the letter frequencies of the text and the typical letter frequencies of Danish and of English. Choose the language with the shortest distance.
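
The two steps above could be sketched roughly as follows in Python. This is not the original code, only an illustration under a few assumptions: the distance is taken to be Euclidean (the post does not say which measure was used), the names `letter_frequencies`, `distance`, `guess_language`, `SAMPLES` and `PROFILES` are made up for this example, and the reference profiles are built from two tiny placeholder sentences rather than from real frequency tables. In practice you would build the profiles from a much larger body of text in each language.

```python
import math
from collections import Counter

# Letters that are counted; everything else (digits, punctuation, accents) is ignored.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"


def letter_frequencies(text):
    """Step 1: relative frequency of each letter of ALPHABET in text."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        # No countable letters at all: return an all-zero profile.
        return {ch: 0.0 for ch in ALPHABET}
    return {ch: counts[ch] / total for ch in ALPHABET}


def distance(freq_a, freq_b):
    """Euclidean distance between two profiles (an assumed choice of measure)."""
    return math.sqrt(sum((freq_a[ch] - freq_b[ch]) ** 2 for ch in ALPHABET))


def guess_language(text, profiles):
    """Step 2: return the language whose profile is closest to the text."""
    freq = letter_frequencies(text)
    return min(profiles, key=lambda lang: distance(freq, profiles[lang]))


# Placeholder reference samples (hypothetical, far too short for real use).
SAMPLES = {
    "da": "Det er en god idé at angive sproget på sin hjemmeside, så både "
          "søgemaskiner og skærmlæsere ved, hvilket sprog teksten er skrevet på.",
    "en": "It is a good idea to declare the language of your website so that "
          "search engines and screen readers know which language the text is in.",
}
PROFILES = {lang: letter_frequencies(sample) for lang, sample in SAMPLES.items()}
```

Calling `guess_language(some_text, PROFILES)` then returns `"da"` or `"en"`, which can be dropped straight into the `lang` attribute. Short texts give noisy frequencies, so the guess is only as reliable as the amount of text on both sides.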

A better solution would be to look at digrams (pairs of adjacent letters) and so on, but this solution was just a few lines of code, and turned out to be good-enough™.