Language Detection script change requests
Donatzsky
Status: New User - Welcome
Joined: 21 Feb 2008
Posts: 2
I'm sorry to say it, but your language detection script has a number of problems (all the PHP scripts for this purpose that I have found, except the PEAR package, suffer from these or other problems):

  1. It ignores quality values (e.g. q=0.2). That's not the worst offense in the context of an Accept-Language header, but they're easy enough to implement.
  2. It doesn't distinguish between "en-gb" and "en-us", as it always reduces them to "en".
  3. Given this Accept-Language header: "da, en-us;q=0.8, en;q=0.5, fr;q=0.3" (straight from my browser) and this redirect list: "fr, en, default", it selects "default", even though English and French appear in both lists, because it only tries to match the first language, "da".

Here are the relevant sections of RFC 2616:
www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4 (Accept-Language)
www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.9 (Quality Values)
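To make points 1 and 2 concrete, here is a minimal sketch of parsing the header the way those RFC sections describe: every item keeps its full tag, a missing q counts as 1, q=0 means "not acceptable" and is dropped, and the result is ordered best first. The function name parse_accept_language() is hypothetical (it is not from the script under discussion), and the tie-breaking comment assumes PHP 8.0 or later, where sorting is stable.

```php
<?php
// Hypothetical helper, not part of the script discussed in this thread:
// parse an Accept-Language header into language tag => quality value,
// ordered best first.
function parse_accept_language(string $header): array
{
    $prefs = [];
    foreach (explode(',', strtolower($header)) as $item) {
        // Split "en-us;q=0.8" into the tag and its parameters; trim()
        // removes the spaces that usually follow each comma.
        $params = array_map('trim', explode(';', $item));
        $tag = array_shift($params);
        if ($tag === '') {
            continue; // empty item, e.g. from a trailing comma
        }
        $q = 1.0; // no q parameter means q=1
        foreach ($params as $param) {
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q <= 0.0) {
            continue; // q=0 means "not acceptable"
        }
        // Keep the full tag so "en-gb" and "en-us" stay distinct;
        // if a tag is listed twice, keep its higher q.
        if (!isset($prefs[$tag]) || $q > $prefs[$tag]) {
            $prefs[$tag] = $q;
        }
    }
    // Highest q first. Sorting is stable since PHP 8.0, so tags with
    // equal q keep the order in which the browser sent them.
    arsort($prefs);
    return $prefs;
}

print_r(parse_accept_language('da, en-us;q=0.8, en;q=0.5, fr;q=0.3'));
```

For the header above, this keeps all four entries in the order da, en-us, en, fr, so a later matching step can fall through to English or French instead of giving up after "da".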
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4129
Location: East Coast, West Coast? I know it's one of them.
I took a look; the code did have a small bug related to spaces in the language string.
techAdmin
I pasted in the value you gave above to test it, and here's the output:

:: Code ::
Array ( [0] => Array ( [0] => da [1] => da [2] => Danish [3] => Danish ) [1] => Array ( [0] => en-us [1] => e ) [2] => Array ( [0] => en [1] => e ) [3] => Array ( [0] => fr [1] => f ) )


Then I ran a test page:

:: Quote ::
System Language

Primary Language id: da
Primary Language:
Danish


As you can see, there is a bug.

I haven't looked at that script for a while, though, so it's nice to see that it still works, more or less.

I'll take a look at it and see if I can get it fixed.
Donatzsky
Point 1 is fairly minor, although the script should at the very least handle "q=0" (not acceptable) properly.

:: Quote ::
Sorry, I was wrong, your number 2 is not valid.

Whoops, my bad. It's been a while since I tested it, and I missed that when skimming the code today. Not that it matters much, since $feature='data' doesn't do anything very useful. It would be better to fix the 'header' part and then match its output against the language list.

:: Quote ::
I agree there are fringe cases that aren't supported. Why? Because they are fringe, and affect only a relatively tiny number of users compared to the larger user base as a whole.

Point 3 is hardly a fringe case; it's a serious design flaw that is easily solved with a few calls to array_intersect() and array_diff().
Don't you agree that it's a problem if your site offers, say, a French version, and my browser asks for it but gets the English version? Sure, you can (and should) have a manual switch, but what's the point of having a script do it automatically if it doesn't work?
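A rough sketch of the array_intersect() approach for point 3, assuming the visitor's tags have already been ordered best first and lowercased. The function name pick_language() and the $default parameter are hypothetical. This is a simplified matcher, not the full RFC 2616 matching rules: strictly, a range such as "en" matches a tag such as "en-us", but not the other way round, so step 2 below is a lenient extra.

```php
<?php
// Hypothetical helper: choose the best language the site offers.
// $preferred: visitor's tags, best first (e.g. ['da', 'en-us', 'en', 'fr']).
// $available: languages the site has pages for (e.g. ['fr', 'en']).
function pick_language(array $preferred, array $available, string $default): string
{
    // 1. Exact matches. array_intersect() keeps the order of its first
    //    argument, so the first element is the visitor's top choice.
    $exact = array_intersect($preferred, $available);
    if ($exact) {
        return reset($exact);
    }
    // 2. Lenient fallback on primary subtags, so "en-us" can use "en".
    $primary = array_map(fn ($tag) => explode('-', $tag)[0], $preferred);
    $loose = array_intersect($primary, $available);
    if ($loose) {
        return reset($loose);
    }
    // 3. Nothing the visitor asked for is offered.
    return $default;
}

echo pick_language(['da', 'en-us', 'en', 'fr'], ['fr', 'en'], 'en'), "\n";
```

With the header and redirect list from the first post, the exact pass finds both "en" and "fr", and returns "en" because it ranks higher. One design choice to be aware of: an exact match at any preference level beats a primary-subtag fallback, even one with a higher q.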

:: Quote ::
Anyway, look the code over and fix it, then post the code, if you feel this issue is so massively serious that it merits your time.

Fixing it would essentially be the same as a rewrite, so why bother?

Actually, I had already started on my own implementation before I found yours, but figured there ought to be some finished scripts out there. Well, there are a few, but as I said, they all (except for the PEAR package, which however does complete content negotiation) suck in various ways, and are obviously written by complete novices or by people who really shouldn't be allowed to program.
My implementation: www.anaerob.dk/stuff/GetPreferredLanguage.php.txt
It still needs some polish/testing, and I'm sure the code could use some optimising/refactoring, but the general functionality is in place and seems to work as expected.
techAdmin
Sorry, I updated my posts.

The only problem with the script was that it failed to remove whitespace from the initial string, an oversight that was easily patched.

The test version now works as expected.

As for trying to guess what the user actually wants on a multilingual site with language redirects, I don't think you'll find much use for that.

Certainly very few major sites choose this course of action.

I wouldn't presume to do that. I might offer the first language option as the default, but that's about all I'd do.

You spent a lot of extra effort, though. All that was happening here was a failure to remove whitespace from the strings; after fixing that, all you needed to do was add another array item to hold the q value, if you wanted it, which I see no reason to do.
techAdmin
The actual bug was trivial to fix, though it did take a while to figure out:

All that was required was to replace ' ' with '' (an empty string) before splitting the header:

:: Code ::
// check to see if the Accept-Language header is set
if ( isset( $_SERVER["HTTP_ACCEPT_LANGUAGE"] ) )
{
   $languages = strtolower( $_SERVER["HTTP_ACCEPT_LANGUAGE"] );
   // $languages = ' fr-ch;q=0.3, da, en-us;q=0.8, en;q=0.5, fr;q=0.3'; // test value
   // remove spaces first; otherwise every item after a comma keeps a leading space
   $languages = str_replace( ' ', '', $languages );
   $languages = explode( ",", $languages );
   // $languages = explode( ",", $test ); // this is for testing purposes only
   // ... the rest of the script processes $languages (not shown here)
}

This fix is in 0.3.5 now.
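A minimal demo of the bug and the fix, using the header from the first post. It shows why the earlier print_r output had " e" and " f" as primary languages: explode() on "," keeps the space after each comma, so a two-character substr() grabs the space. The trim() variant is an alternative to str_replace(), not what 0.3.5 uses.

```php
<?php
$header = 'da, en-us;q=0.8, en;q=0.5, fr;q=0.3';

// Without removing whitespace, items after the first start with a space.
$raw = explode(',', $header);
var_dump(substr($raw[1], 0, 2));   // string(2) " e"

// Trimming each item removes only the surrounding whitespace.
$clean = array_map('trim', explode(',', $header));
var_dump(substr($clean[1], 0, 2)); // string(2) "en"
```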

As for the q values, they are also fairly trivial to add, though pretty pointless. In case someone wants them, find these lines and add a test for whether a q value is present:

:: Code ::
$temp_array[0] = substr( $language_list, 0, strcspn( $language_list, ';' ) );//full language
         $temp_array[1] = substr( $language_list, 0, 2 );// cut out primary language

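Right after them, use something like: $parts = explode( ';q=', $language_list ); $temp_array[2] = isset( $parts[1] ) ? (float) $parts[1] : 1.0; (a language without a q value defaults to 1)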

or something similar, and test it if you want the extra data. But again, using q values to choose pages is probably just going to annoy your users.

Anyway, thanks for the bug report. Even though the report wasn't entirely accurate, there was in fact a bug, and it did cause problems with multiple languages, so it's best to have it working right.