It is good practice to include
<html lang="en"> or
<html lang="da"> on your website. I had a lot of old content in Danish and English that I wanted to publish, so I needed to determine the language of each text automatically. Here is my approach:
- find the relative frequency of each letter in the text
- measure the distance between the letter frequencies of the text and the typical letter frequencies of Danish and of English. Choose the language with the shortest distance.
A better solution would be to look at digrams (pairs of adjacent letters) and so on, but this solution was just a few lines of code and turned out to be good-enough™.
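The two steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the original code: the function names, the choice of Euclidean distance, and the tiny sample sentences used to build the reference profiles are all assumptions made for the example. For real use, the profiles should come from much larger Danish and English texts.

```python
import math
from collections import Counter


def letter_frequencies(text):
    """Return the relative frequency of each letter in text (case-insensitive)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}


def distance(freq_a, freq_b):
    """Euclidean distance between two frequency tables.

    Assumption: the distance measure is not specified in the text above;
    Euclidean distance is used here. Letters missing from a table count as 0.
    """
    keys = set(freq_a) | set(freq_b)
    return math.sqrt(
        sum((freq_a.get(k, 0.0) - freq_b.get(k, 0.0)) ** 2 for k in keys)
    )


def detect_language(text, profiles):
    """Return the key of the profile whose letter frequencies are closest to text."""
    freq = letter_frequencies(text)
    return min(profiles, key=lambda lang: distance(freq, profiles[lang]))


# Hypothetical sample sentences, only here to make the sketch self-contained.
SAMPLES = {
    "da": "Jeg har en masse gamle tekster på dansk og engelsk, som jeg gerne "
          "vil udgive på min hjemmeside, og derfor måtte jeg finde sproget "
          "automatisk.",
    "en": "I have a lot of old texts in Danish and English that I would like "
          "to publish on my website, so I had to determine the language "
          "automatically.",
}
PROFILES = {lang: letter_frequencies(sample) for lang, sample in SAMPLES.items()}
```

Calling `detect_language(some_text, PROFILES)` returns `"da"` or `"en"`, which can then be written into the `lang` attribute of the `<html>` tag. Short texts give noisy letter counts, so the result is more reliable for longer pieces of content.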