Determine text language
It is good practice to include <html lang="en"> or <html lang="da"> on your website. I had a lot of old content in Danish and English that I wanted to publish, so I needed to determine the language of each text automatically. Here is my approach:
- find the frequency of each letter in the text
- measure the distance between the letter frequencies of the text and the reference letter frequencies for Danish and English, then choose the language with the shortest distance.
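A minimal sketch of these two steps in Python. The post does not say which distance was used or where the reference frequencies came from, so both are assumptions here. The distance is Euclidean, and each language's reference profile is built from sample text known to be in that language. The names letter_frequencies, distance and detect_language, and the sample sentences, are hypothetical.

```python
import math
from collections import Counter

# Letters that are counted; Danish adds æ, ø and å to the 26 Latin letters.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"


def letter_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each letter in ALPHABET, ignoring case and non-letters."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return {ch: 0.0 for ch in ALPHABET}
    return {ch: counts[ch] / total for ch in ALPHABET}


def distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance between two letter-frequency profiles (an assumed choice)."""
    return math.sqrt(sum((p[ch] - q[ch]) ** 2 for ch in ALPHABET))


def detect_language(text: str, profiles: dict[str, dict[str, float]]) -> str:
    """Return the key of the reference profile closest to the text's letter frequencies."""
    freqs = letter_frequencies(text)
    return min(profiles, key=lambda lang: distance(freqs, profiles[lang]))


if __name__ == "__main__":
    # Hypothetical reference samples; real profiles should come from a larger
    # body of text known to be Danish or English.
    profiles = {
        "en": letter_frequencies("Sample English text known to be written in English."),
        "da": letter_frequencies("Eksempel på dansk tekst, der vides at være skrevet på dansk."),
    }
    print(detect_language("Some new text whose language is unknown.", profiles))
```

The returned key ("en" or "da") is what would go into the lang attribute of the html element. Short texts give noisy frequencies, so the reference samples should be much longer than the one-line placeholders used here.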
A better solution would be to look at digrams (pairs of adjacent letters), trigrams, etc., but this solution was just a few lines of code and turned out to be good-enough™.
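The digram variant mentioned above could be sketched the same way, counting pairs of adjacent letters inside words instead of single letters. This is not the approach the post actually used. As before, the Euclidean distance and the function names (digram_frequencies, distance, detect_language) are assumptions for illustration, and the reference profiles would be built from sample text in each language.

```python
import math
import re
from collections import Counter


def digram_frequencies(text: str) -> dict[str, float]:
    """Relative frequency of each pair of adjacent letters inside words."""
    counts = Counter()
    # Words are runs of Latin letters plus the Danish æ, ø and å.
    for word in re.findall(r"[a-zæøå]+", text.lower()):
        counts.update(a + b for a, b in zip(word, word[1:]))
    total = sum(counts.values())
    return {dg: n / total for dg, n in counts.items()} if total else {}


def distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Euclidean distance; a digram missing from one profile counts as frequency 0."""
    keys = p.keys() | q.keys()
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def detect_language(text: str, profiles: dict[str, dict[str, float]]) -> str:
    """Return the key of the reference profile closest to the text's digram frequencies."""
    freqs = digram_frequencies(text)
    return min(profiles, key=lambda lang: distance(freqs, profiles[lang]))
```

Digrams capture patterns that single letters miss (for example, "th" is common in English and rare in Danish), at the cost of needing more reference text to get stable frequencies.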