Search-engine-friendly URLs

Johannes Beus
Speaking, search-engine-friendly URLs were in demand well before Stefan Karzaunikat's introduction to the subject, and to mod_rewrite, in the current c't. Since many projects do not use ready-made solutions of the kind that ship with blog systems, everybody ends up tinkering with this, with varying levels of skill. I want to offer some tips on possible pitfalls and a few small code snippets that work for us.

Speaking URLs are often built so that the original ID, which is needed to identify the record in the database, remains in the URL, and the title or name of the record is added to it. Take the URL of this article as an example: from a technical point of view, a URL like www.sistrx.com/news/670 would have been enough, but here the headline and the ending “.html” were added. The corresponding PHP function looks similar to the following:

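function filename($str) {
    // Transliteration table: each entry in $from maps to the entry at the same position in $to.
    static $from = array('ä',  'á', 'à', 'â', 'å', 'ö',  'ó', 'ò', 'ô', 'õ', 'ü',  'ú', 'ù', 'û', 'é', 'è', 'ê', 'í', 'ì', 'î', 'ß',  'ç', 'Ç', 'ñ', 'ý');
    static $to   = array('ae', 'a', 'a', 'a', 'a', 'oe', 'o', 'o', 'o', 'o', 'ue', 'u', 'u', 'u', 'e', 'e', 'e', 'i', 'i', 'i', 'ss', 'c', 'c', 'n', 'y');

    // Lower-case, transliterate, turn everything except a-z, 0-9 and "." into spaces,
    // then collapse runs of whitespace and trim both ends.
    $str = trim(preg_replace("#\s+#", ' ', preg_replace("#[^a-z0-9\.]#", ' ', str_replace($from, $to, strtolower($str)))));
    // Use the dash as the word separator.
    $str = str_replace(' ', '-', $str);

    return $str;
}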


The string is first converted to lower case and the usual special characters are replaced (ä -> ae); then spaces and all remaining special characters are replaced with a dash as separator. To put a link into the PHP code of a page, you just call the function like this:

echo '<a href="/news/'.$id.'-'.filename($titel).'.html">'.htmlentities($titel).'</a>';
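One thing to watch with the function above: strtolower() works on single bytes, so with UTF-8 input an uppercase umlaut such as Ä may, depending on PHP version and locale, not be lower-cased and then gets stripped instead of transliterated. The following is only a sketch of one way around that, not the version we run. The name filename_utf8() is made up for illustration, and it assumes UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical UTF-8-aware variant of filename(); assumes the mbstring extension.
function filename_utf8(string $str): string
{
    // Same transliterations as in filename(), written as a lookup map.
    static $map = [
        'ä' => 'ae', 'á' => 'a', 'à' => 'a', 'â' => 'a', 'å' => 'a',
        'ö' => 'oe', 'ó' => 'o', 'ò' => 'o', 'ô' => 'o', 'õ' => 'o',
        'ü' => 'ue', 'ú' => 'u', 'ù' => 'u', 'û' => 'u',
        'é' => 'e', 'è' => 'e', 'ê' => 'e',
        'í' => 'i', 'ì' => 'i', 'î' => 'i',
        'ß' => 'ss', 'ç' => 'c', 'ñ' => 'n', 'ý' => 'y',
    ];

    $str = mb_strtolower($str, 'UTF-8');              // also lower-cases Ä, Ö, Ü, Ç
    $str = strtr($str, $map);                         // transliterate (ä -> ae)
    $str = preg_replace('#[^a-z0-9.]+#', ' ', $str);  // everything else becomes one space
    return str_replace(' ', '-', trim($str));         // dash as separator
}

// Example values, chosen for illustration only.
$id    = 670;
$titel = 'Ärger mit Umlauten & Größe';

echo '<a href="/news/' . $id . '-' . filename_utf8($titel) . '.html">'
    . htmlspecialchars($titel, ENT_QUOTES, 'UTF-8') . '</a>' . "\n";
```

With the sample title this produces the file name aerger-mit-umlauten-groesse. Whether you need such a variant depends on the character encoding your pages and database use.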

Now that you have handsome URLs, all you have to do is make sure the scripts understand them as well. This is usually where mod_rewrite comes into play: a module for the widely used Apache web server that rewrites URLs internally by means of predefined expressions. A common mistake here is that only the ID is passed on and checked. Some time ago Cyb already drew attention to this problem, which, whether exploited benevolently or maliciously, can produce a massive duplicate content problem. For the example URL of this article you would need a mod_rewrite directive like this:

RewriteEngine on
RewriteRule ^news/([0-9]+)\-([a-z0-9\-\.]+)\.html$ beitrag.php?id=$1&title=$2
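To see what the two bracketed groups of this rule hand over to the script, the same pattern can be tried out in PHP. This is just a test harness, not part of the setup itself; the function name match_news_url() is made up, and it assumes the rule is matched against the path without a leading slash, as is the case in a per-directory .htaccess file.

```php
<?php
// PHP stand-in for the RewriteRule pattern (hypothetical helper for experimenting).
function match_news_url(string $path): ?array
{
    $pattern = '#^news/([0-9]+)\-([a-z0-9\-\.]+)\.html$#';
    if (!preg_match($pattern, $path, $m)) {
        return null;                              // the rule would not apply
    }
    return ['id' => $m[1], 'title' => $m[2]];     // $1 and $2 in the rule
}

$hit = match_news_url('news/670-searchenginefriendly-urls.html');
echo 'id=' . $hit['id'] . ' title=' . $hit['title'] . "\n";

// A URL without the title part does not match this rule at all.
echo match_news_url('news/670.html') === null ? "no match\n" : "match\n";
```

So beitrag.php receives both the ID and the title part of the requested file name, which is what makes the check described next possible.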

The script beitrag.php now checks not only that the passed ID actually exists and can be displayed, but also that the passed title, meaning the file name actually requested, is identical to the one the function filename() would have generated. If this is not the case, because someone linked incorrectly or the title has been changed, the script should send a 301 redirect to the correct address and terminate at that point.
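The check could look roughly like the following. This is a sketch under assumptions, not our actual beitrag.php: resolve_request() is a made-up name, the $articles array stands in for the database, the small $slugify closure stands in for filename() so the example runs on its own, and answering an unknown ID with a 404 is my assumption.

```php
<?php
/**
 * Decide how beitrag.php should answer a rewritten request (hypothetical helper).
 * Returns [HTTP status code, redirect target or null].
 */
function resolve_request(array $query, array $articles, callable $slugify): array
{
    $id = (int) ($query['id'] ?? 0);
    if (!isset($articles[$id])) {
        return [404, null];                   // unknown ID (assumed handling)
    }

    $expected = $slugify($articles[$id]);     // what filename() would have generated
    if (($query['title'] ?? '') !== $expected) {
        // Wrong or outdated title: point to the one correct address.
        return [301, '/news/' . $id . '-' . $expected . '.html'];
    }

    return [200, null];                       // URL is correct, show the article
}

// Stand-ins so the sketch is runnable; replace with filename() and a real lookup.
$slugify = function (string $title): string {
    return str_replace(' ', '-', trim(preg_replace('#[^a-z0-9.]+#', ' ', strtolower($title))));
};
$articles = [670 => 'Searchenginefriendly URLs'];

// Simulates a request for /news/670-wrong-title.html after rewriting.
[$status, $target] = resolve_request(['id' => '670', 'title' => 'wrong-title'], $articles, $slugify);

if ($status === 301) {
    // In the real script you would send the redirect and stop:
    // header('Location: ' . $target, true, 301); exit;
    echo "301 -> $target\n";
}
```

Keeping the decision in a function that only returns the status makes it easy to test without a web server; the header() call and the exit then live in one place in the script.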
Johannes Beus - on Wed (09/05/2007) at 09:00 AM
