Rewriting URLs with mod_rewrite

Rewriting URLs from difficult-to-read addresses to simple, clean addresses can make your website much more user-friendly and SEO-friendly, but it can be difficult to accomplish without the proper tools. Enter mod_rewrite! Mod_rewrite is a module of the Apache web server that rewrites the requested URL on the server, before Apache decides which content to return to your browser. If your server isn’t running Apache, there are also extensions available for IIS, such as the URL Rewriter.

What is the benefit of this? Well, how many times have you come across a link like this while browsing?

http://www.example.com/some-dir/index.php?page=animals&animal=dogs&action=videos

First, this is not very descriptive and is hard for users to remember. Second, it isn’t going to get you very good results in Google or any other search engine. Search engines much prefer URLs that look something like this:

http://www.example.com/animals/dogs/videos

Coding in mod_rewrite

So how does mod_rewrite turn the nasty-looking first link into the super nice-looking, SEO-friendly second link? It all starts with the .htaccess file (yes, it starts with a dot “.”). I’ve always been better at learning-by-doing or learning-by-seeing, so let’s just jump into the code. First, I’ve created a file in the root of my site called .htaccess. In it are some simple lines of code like this:


    <IfModule mod_rewrite.c>
        RewriteEngine On
        # send any request that is not an existing directory or file to the root index.php file
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteRule ^(.*)$ /index.php?page=$1 [QSA,L]
    </IfModule>

This starts off with a conditional statement (<IfModule mod_rewrite.c>) that checks whether Apache has the mod_rewrite module loaded at all, to avoid errors. Inside the ‘if’ statement is the actual code. The first line, “RewriteEngine On”, simply says we’re activating mod_rewrite. This is a quick and easy way to turn URL rewriting on or off without having to delete or comment out a bunch of lines of code. The second line starts with a hash mark. This is how you write comments. The next two lines start with “RewriteCond”. These are conditions for the last line, which is the actual rule. The condition lines aren’t even evaluated unless the RewriteRule pattern matches. The RewriteRule is read like this: “if the path of the URL matches my regular expression, then check the RewriteCond lines”. The “path” here is the part of the URL between the domain and the query string, which is /some/directory/file.php in http://www.example.com/some/directory/file.php?querystring. In an .htaccess file the path is matched relative to the directory that contains the file, without the leading slash. If you are unfamiliar with regular expressions, you can familiarize yourself with them using regex cheat sheets online, such as the one from addbytes.com.

In the regular expression above, ^(.*)$ says “match 0 or more characters”, which any URL would match, so in this case the RewriteRule pattern always matches and Apache always goes on to check the RewriteCond lines. Since there is no other RewriteRule between the two RewriteCond lines and this rule, they both apply to it. The first condition makes sure the requested file name (through %{REQUEST_FILENAME}) is not a directory (the exclamation mark means *not* and -d means directory). The second RewriteCond is similar to the first, except that “-f” makes sure the request is not an existing file. Conditions are combined with AND by default, so both must hold before the rule is applied. The reason for these checks is that we don’t want to rewrite every file on the server. If a directory or file “/animals/dogs/videos” actually exists on the server, we want it served as it is. Remember too that our pattern matches “any path”, which includes /index.php, the very file we rewrite to. Without the -f check, the rewritten request could be sent through the rule again and again, and nothing would be accessible.

Once both conditions are met, the rewrite engine goes back to the RewriteRule and looks at the substitution that comes after the regular expression. The regular expression and the substitution are separated by white space. So the new URL that we want to go to is /index.php?page=$1. This is a PHP file that exists in the root of the site. The PHP page has a parameter called “page” whose value is $1. The dollar sign and number mean that whatever matched the first set of parentheses in the pattern is put in this spot. In this case, that is the entire path. So in our example above, when someone goes to example.com/animals/dogs/videos, the request is internally rewritten to example.com/index.php?page=animals/dogs/videos (in an .htaccess file the leading slash is not part of the matched path). This is not a redirect: you won’t see any of this in the browser, but that’s what your web server sees in the background. So in your index.php file you can do something like:

    <?php
    // "page" holds the path captured by the RewriteRule
    if (isset($_GET["page"])) {
        $page = $_GET["page"];
        // display code based on this variable
    }

The above is obviously very simplified, but you get the idea. The last part of the RewriteRule is this: [QSA,L]. This is where you specify flags that control how the rule is applied. QSA (“query string append”) means that any query string parameters on the original request are appended to the new URL, and L means that this is the last rule for this pass; any rules after it are skipped when this rule matches.
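If you would rather have Apache split the clean path into the separate parameters from the long URL at the top of this article, a rule along these lines might work. This is only a sketch: it assumes the .htaccess file sits in the site root, that index.php lives in /some-dir/, and that every clean URL has exactly three segments.

```apache
RewriteEngine On
# Assumption: three path segments map to page, animal and action.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# /animals/dogs/videos -> /some-dir/index.php?page=animals&animal=dogs&action=videos
RewriteRule ^([^/]+)/([^/]+)/([^/]+)/?$ /some-dir/index.php?page=$1&animal=$2&action=$3 [QSA,L]
```

The trade-off is that the URL structure is now fixed in the rule, whereas the catch-all version leaves the parsing to your PHP code.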

A mod_rewrite cheat sheet is also worth keeping handy; thanks to addbytes.com for the list.

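What about other, more complex redirects? Suppose we want to redirect old URLs to a new domain or website. Let’s look at RewriteMaps.

A RewriteMap is a named lookup, here backed by an external text file, that maps certain keys to certain values. Note that RewriteMap has to be declared in the main server or virtual host configuration; it is not allowed in an .htaccess file. It might look like this: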


    RewriteEngine On
    RewriteMap sections_var txt:/usr/local/apache/conf/sections.txt
    RewriteCond %{QUERY_STRING} section=([0-9]+)&item=(\w+)
    RewriteRule ^/store/(.*) http://www.example.com/store/${sections_var:%1|no-section}/%2/ [QSA,R=301,L]

We also have a sections.txt file that looks like this:


1    Electronics
2    Music
3    MP3-Players
4    Televisions
5    Laptops
6    Netbooks
7    Desktops
8    Books
9    Software
10   Accessories

So for this example, we want our old, out-of-date URLs, which people may have saved to their favorites or Google may have indexed, to lead to our new links.

We reference this text file through the RewriteMap named sections_var. We identify where the file is located by writing out the full path to the text file. In our RewriteRule we check whether the incoming path starts with /store/ followed by zero or more characters. The RewriteCond checks whether the query string contains section=(one or more digits) followed by &item=(one or more word characters, meaning letters, digits or underscores). An example of this kind of URL is: site.com/store/index.php?section=10&item=cases.

Once we return to the RewriteRule, the first variable we come to is ${sections_var:%1|no-section}. Let’s look at the first part: ${}. This is the syntax for a map lookup, with the details inside the braces. Inside the braces are two pieces of information separated by the pipe character. If the lookup to the left of the pipe finds nothing, the default to the right of the pipe is used. The sections_var:%1 takes the value of the first parenthesized group from the RewriteCond and looks it up as a key in the sections.txt map. Look at the sections.txt map above: if, for example, the number 7 is in the URL, the value “Desktops” is returned to the RewriteRule. Again, if no value is found, the default “no-section” is used. The %2 is the second parenthesized group from the RewriteCond. Notice that a percent sign is used here, as opposed to the dollar sign we used in the first example. When you “backreference” the pattern of the RewriteRule you use the dollar sign. When you backreference the RewriteCond you use the percent sign.

The last thing we have in this example is R=301. This tells the browser and all the little Google bots out there that this is a permanent redirect, so they should update any bookmarks or indexes they hold.
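One thing to watch with this rule is that the old query string (section=10&item=cases) is carried over to the redirect target. A possible variant, assuming Apache 2.4 or later where the QSD flag is available, discards it so the new URL comes out clean. Treat this as a sketch rather than a drop-in replacement:

```apache
# Goes in the server or virtual host config, since RewriteMap is not allowed in .htaccess.
RewriteEngine On
RewriteMap sections_var txt:/usr/local/apache/conf/sections.txt
RewriteCond %{QUERY_STRING} section=([0-9]+)&item=(\w+)
# QSD (Apache 2.4+) drops the original query string from the redirect target.
RewriteRule ^/store/ http://www.example.com/store/${sections_var:%1|no-section}/%2/ [QSD,R=301,L]
```

Browsers tend to cache 301 responses, so it can be easier to test with R=302 first and switch to R=301 once the targets look right.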

Tips

If you want different sets of rules for different directories in your site, you can put .htaccess files in lower directories as well. The farther down the directory tree a file is, the more it takes precedence. For mod_rewrite this is stricter than it sounds: if you have an .htaccess file with rewrite rules in /site/animals/, then for requests under that directory its rules replace the rewrite rules of an .htaccess file in /site/ or the root directory. The parent’s rules only run as well if you add RewriteOptions Inherit to the lower file.
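As a rough sketch, a lower-level file that still wants its parent’s rules could look like the following. The file location, pattern and target are made up for illustration.

```apache
# Hypothetical /site/animals/.htaccess
RewriteEngine On
# Without this line, rewrite rules from parent .htaccess files are not applied here.
RewriteOptions Inherit
# A rule specific to this directory (pattern and target are illustrative).
RewriteRule ^videos/?$ /index.php?page=animals/videos [QSA,L]
```

Inherited rules are applied after the rules in the lower file, and they are evaluated as if they had been written in that file, so it is worth testing that the parent’s patterns still match what you expect.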

Another issue you may run into is that when you redirect to a new domain, the browser’s address bar shows the new address. This may not be the desired effect. In that case, you can use the P (proxy) flag in the brackets at the end of your RewriteRule. It requires mod_proxy and mod_proxy_http to be enabled:


    RewriteEngine On
    # any incoming URL on http://oldsite.com gets proxied to http://www.newsite.com
    # (written for an .htaccess file, where $1 has no leading slash)
    RewriteRule ^(.*)$ http://www.newsite.com/$1 [QSA,P]

This will take any incoming request to oldsite.com and fetch the content from newsite.com behind the scenes, but you’ll still see “oldsite.com” in your browser’s address bar. I have found this useful for things like serving my crossdomain.xml file from another server when my application server is down and I have web services that need the crossdomain file.
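For the crossdomain.xml case, you probably don’t want to proxy the whole site. A narrower rule might look like this sketch, which assumes the file is available at the root of www.newsite.com and that the rule lives in the .htaccess file at the root of the old site:

```apache
RewriteEngine On
# Proxy only crossdomain.xml; every other request is left alone.
# Requires mod_proxy and mod_proxy_http to be loaded.
RewriteRule ^crossdomain\.xml$ http://www.newsite.com/crossdomain.xml [P]
```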

Optimizing Your Rewrite Code

It is important to remember that every request made to your server passes through your rewrite rules. If you load a single page, it and all of its images, CSS files, JavaScript files and other assets go through the rules separately. This can lead to slow load times if you have a lot of rules. Mod_rewrite is a powerful tool if you spend the time to optimize it properly.
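One way to keep static assets from walking through every rule is a pass-through rule at the very top of the file. The extension list below is only an illustration, so adjust it to whatever your site actually serves:

```apache
RewriteEngine On
# "-" means "leave the URL unchanged"; L stops rule processing for this request.
RewriteRule \.(?:css|js|png|jpe?g|gif|ico)$ - [L]
# ... the rest of your rules follow here ...
```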

Remember to utilize maps when possible. Maps won’t be queried if a rule’s conditions aren’t met.

Familiarize yourself with regular expressions so you can create fewer rules. Complex rules are harder to write, but the rewrite engine can go through them quickly. Just remember, the fewer rules the better. Maps don’t get consulted as often as rules, but try to keep them short anyway. A list of 10, 20 or 50 entries shouldn’t be too much, but a plain-text map with 500 entries could slow lookups down.
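If a map does grow large, one option is to convert the text file to a DBM hash file, which is looked up by key instead of being scanned. The sketch below assumes the httxt2dbm utility that ships with Apache is available on your system; the sections.map file name is hypothetical, and depending on the DBM type the file on disk may get an extra extension.

```apache
# Built once from the text map with:
#   httxt2dbm -i sections.txt -o sections.map
# Then declared in the server or virtual host config in place of the txt: map.
RewriteMap sections_var dbm:/usr/local/apache/conf/sections.map
```

The rules that use ${sections_var:...} stay the same; only the map declaration changes.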

Lastly, try to rely on your server-side code, as in the first example, where possible. If you can send large chunks of the URL to your PHP file and parse the data with your PHP code (or whatever language you’re using), you need fewer rewrite rules, which is often faster and easier to maintain than a long list of rules.
