Rewritten WordPress Pretty Permalinks under Sun/Open Web Server 7
aka, why bother? but seriously folks…
WordPress is blogging software. This blog is powered by WordPress, in fact. It uses PHP and a database to construct web pages that then get served out. One of the “features” it has is that it can use “pretty” links, or define permalinks such that the links will be the same for all time. I won’t go into it in too much detail (see here) but what this means is, let’s say you write an article about your cat called “My cat is the super duper bestest stop making fun of him!” By default the link to this individual post will look something like “http://mycatrules.com/?p=148” or something else unintelligible. But if you use the permalinks feature you can make it more sane and give the prospective reader a better idea of what the post is about. Without rewriting, you can usually get as far as something like “http://mycatrules.com/index.php/my-cat-is-the-super-duper-bestest-stop-making-fun-of-him” which is okay, but that index.php leaves a lot to be desired. What if you move your blog to something that isn’t running on PHP? Plus it’s ugly.
So you need to change the URL to get rid of the index.php, but that’s basically impossible to do within PHP (without including an index.php in a bunch of just-in-time created directories or something, which would be a solution worse than the problem). Fortunately web servers can help by rewriting the URL internally. If you were using Apache, which is how 99.99999% of the WordPress setups in the world are doing this, you just need a .htaccess file that invokes the magic of mod_rewrite. That .htaccess would look something like this, depending on just how pretty you wanted to make your links:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
What this does is basically hand your request for /archives/blah to index.php behind the scenes, where the /archives/blah part is really an argument for index.php to process. (Strictly speaking the rule rewrites to plain /index.php, and WordPress recovers /archives/blah from the original request URI, but the effect is the same as requesting /index.php/archives/blah.) This only happens if /archives/blah doesn’t actually exist in the docroot as either a file or a directory. That way things like the actual html/images/css used by your theme, or the wp-admin interface, are served up as-is, but things that exist only in the weird world inside of the code for WordPress, such as the concept of archives or tags, things for which there is no physical representation in the filesystem, are passed to index.php.
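To make the two RewriteCond tests concrete, here is a rough Python model of the decision they encode. It is only a sketch: the function name `apache_style_target` and the temporary docroot are made up for illustration, and real mod_rewrite does a lot more than this.

```python
import os
import tempfile


def apache_style_target(docroot: str, request_path: str) -> str:
    """Model of the !-f / !-d checks: real files and directories are served
    as-is, everything else is handed to /index.php."""
    candidate = os.path.join(docroot, request_path.lstrip("/"))
    if os.path.isfile(candidate) or os.path.isdir(candidate):
        return request_path  # exists on disk, leave the request alone
    return "/index.php"      # only exists inside WordPress, rewrite it


if __name__ == "__main__":
    # Hypothetical docroot with one real file and one real directory.
    with tempfile.TemporaryDirectory() as docroot:
        os.makedirs(os.path.join(docroot, "wp-admin"))
        open(os.path.join(docroot, "style.css"), "w").close()
        for path in ("/style.css", "/wp-admin", "/archives/blah"):
            print(path, "->", apache_style_target(docroot, path))
```

The point is just that the filesystem check comes first and the rewrite is the fallback.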
If, however, you are running Sun Web Server 7 (or now Open Web Server?), it is slightly different. The same idea applies but mod_rewrite doesn’t exist. Instead there are “NameTrans” “SAFs” to handle rewriting/redirecting of URLs based on criteria. A full explanation of how this all works is beyond the scope of this post. I’m not here to flame about which is better; mod_rewrite is certainly more popular, but from what I’ve read the Sun web server can do it all and more and faster. What I’m trying to do here is the simplest possible rewrite though, and they are both more than up to the task. Plus if I were doing it with Apache, this wouldn’t be much of a post since the work is already done and the howto can be found in 8 million places.
So what we want is, simply enough, the “rewrite” function. So firstly let’s try the simple base case and see what happens. In our virtual server’s obj.conf, define a rule that will map requests for /* to /index.php/* behind the scenes:
<If $uri =~ "^/(.*)">
NameTrans fn="rewrite" path="/index.php/$1"
</If>
Or in English “hey, if the request from the client matches the regular expression ^/(.*) (and all will), stick an /index.php in front of whatever (.*) matches (which is everything after the first slash).” The uri and path variables are as defined here. And this works! Well, sort of. Basically everything gets translated into /index.php/blah. This prevents the theme from loading, since that is composed of physical files in wp-content/themes/blah, and those requests get translated into /index.php/wp-content/themes/blah, which is gibberish. But the home page will load, as will any permalink you set up, without /index.php in front of it! So that’s some progress, and more or less what was expected.
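The same behaviour can be reproduced outside the web server. Below is a small Python sketch of that first rule; the function name `naive_rewrite` and the sample URIs are mine, and it only mimics the regex substitution, not the server itself.

```python
import re

# Same pattern as in the <If> above: everything after the leading slash.
PATTERN = re.compile(r"^/(.*)")


def naive_rewrite(uri: str) -> str:
    """Prefix every matching URI with /index.php, with no filesystem checks."""
    match = PATTERN.match(uri)
    if match is None:
        return uri
    return "/index.php/" + match.group(1)


if __name__ == "__main__":
    # Hypothetical requests: a permalink and a theme stylesheet.
    for uri in ("/2009/05/my-cat/", "/wp-content/themes/blah/style.css"):
        print(uri, "->", naive_rewrite(uri))
```

Both requests get the prefix, which is fine for the permalink and exactly the problem for the stylesheet.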
So we need to fix the same problem that is handled by the RewriteCond lines in mod_rewrite from above. The way to go about this is to modify the If in obj.conf to exclude files that actually do exist in the docroot. And it even seems like there’s a variable to help us determine that, namely $ppath. This is described as the physical path to the file in the request. When combined with the -f and -d tests in the If syntax, we should be able to achieve the same thing as mod_rewrite. So, second try:
<If $uri =~ "^/(.*)" and not -f $ppath and not -d $ppath>
NameTrans fn="rewrite" path="/index.php/$1"
</If>
Except this doesn’t work either. In fact it behaves exactly like our more naive example from above. Without being an expert on debugging the values of variables in the web server, I kind of banged my head against the wall for a while on this one. Eventually I just tried to manually put the full path to the request in like so:
<If $uri =~ "^/(.*)" and not -f "/var/SUNWwbsvr7/holcasaur.us/docs/$uri" and not -d "/var/SUNWwbsvr7/holcasaur.us/docs/$uri">
NameTrans fn="rewrite" path="/index.php/$1"
</If>
And it worked! So $ppath must not be working as advertised. Except it is. From the variable reference:
$ppath:
Requested path (either URI, partial path, or file system path depending on stage). The predefined variable path is the value of path in rq->vars. If path isn’t set in rq->vars (for example, if NameTrans hasn’t completed), path gets the value of ppath in rq->vars.
Notice that “depending on the stage” part. Turns out, this is all happening in the NameTrans stage, whose whole purpose is to determine the actual physical path to the request (see the stages of request processing). So we can’t use the physical path until it’s been determined, and that’s the last thing that happens so we can’t rewrite afterwards.
As glad as I am that the above is working, having absolute paths like that is ugly, kludgey, and will break if the actual docroot ever gets changed. So surely we should be able to substitute some variable for the docroot. I swear to you I have searched everywhere, and I am usually pretty good at searching, and this is just not possible. There’s no $root or $docroot that can be used in our If. So the only thing I can think of is to define one manually in server.xml. You can actually do this through the admin interface so, assuming one were changing the actual docroot, as long as you were careful and also changed the variable definition of $docroot, you could leave obj.conf alone (always a good practice to leave as much configuration as possible alone).
Basically in server.xml we have:
<virtual-server>
<name>holcasaur.us</name>
...
<document-root>/var/SUNWwbsvr7/holcasaur.us/docs</document-root>
<variable>
<name>docroot</name>
<value>/var/SUNWwbsvr7/holcasaur.us/docs</value>
</variable>
This defines a variable, $docroot, that matches the document root as defined for the virtual server. Then in obj.conf we can use it like so:
<If $uri =~ "^/(.*)" and not -f "$docroot$uri" and not -d "$docroot$uri">
NameTrans fn="rewrite" path="/index.php/$1"
</If>
And voila, everything works as expected!
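For anyone who wants to poke at the logic without a web server handy, here is a Python sketch of what that final rule is meant to do. The function name `rewrite_with_docroot` and the throwaway docroot are assumptions for illustration; the string concatenation mirrors "$docroot$uri" from the obj.conf above.

```python
import os
import re
import tempfile


def rewrite_with_docroot(docroot: str, uri: str) -> str:
    """Rewrite to /index.php/... only when docroot + uri is not on disk."""
    match = re.match(r"^/(.*)", uri)
    physical = docroot + uri  # same join as "$docroot$uri"
    if match and not os.path.isfile(physical) and not os.path.isdir(physical):
        return "/index.php/" + match.group(1)
    return uri


if __name__ == "__main__":
    # Hypothetical docroot containing a single theme file.
    with tempfile.TemporaryDirectory() as docroot:
        theme_dir = os.path.join(docroot, "wp-content", "themes", "blah")
        os.makedirs(theme_dir)
        open(os.path.join(theme_dir, "style.css"), "w").close()
        for uri in ("/wp-content/themes/blah/style.css", "/2009/05/my-cat/"):
            print(uri, "->", rewrite_with_docroot(docroot, uri))
```

The theme file is left alone because it exists under the docroot, while the permalink has nothing on disk and gets the /index.php prefix.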
I am almost positive there has to be a better way to do this. Maybe something along the lines of using PathCheck fn=restart once $ppath gets defined properly (I tried this and got myself into a rewrite loop, but it seems promising), or using ObjectType to define some custom logic, or going ahead with our rewrite and then rewriting again later if ppath doesn’t actually exist, or something. But I am not an expert on how it’s all supposed to work together, yet. Please let me know if you know, though…
Your mod_rewrite rules are so:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
These basically say “if the requested URI doesn’t exist as a file or as a directory then do the rewrite.”
You can do this in the obj.conf with something like:
# “Clean URLs”
AuthTrans fn=”restart” uri=”/index.php/$1″
At least I think that would work. I use a variation of this in order to make Drupal work with WS7 and have “Clean URLs:”
# Drupal “Clean URLs”
$url =~ ‘//[^/]+/([^?]*)(\?(.*)|())$’>
AuthTrans fn=”restart” uri=”/index.php?q=$1&$3″
Note that I check to make certain the request is not already a restart (“not $internal”) in order to avoid recursion. Also note that my regex match is comparing $url, not $uri.
In my obj.conf these precede any NameTrans statements.
See http://docs.sun.com/app/docs/doc/819-2630/gbyvr?a=view for more information.
Err. The WordPress comment system seems to have stripped out much of the markup needed for that comment to make sense. Sorry about that.
Let’s try to recreate them:
# "Clean URLs"
<If not $internal
and $urlhost =~ '(?i)(jmccabe\.org|tullycreek\.com)$'
and not -U $path
and $url =~ '/(.*)$'>
AuthTrans fn="restart" uri="/index.php/$1"
</If>
# Drupal "Clean URLs"
<If not $internal
and not -U $path
and $url =~ '//[^/]+/([^?]*)(\?(.*)|())$'>
AuthTrans fn="restart" uri="/index.php?q=$1&$3"
</If>
Let’s see if that worked…
@jmccabe
Ah, that ‘not $internal’ looks useful. I think in this case (see my follow up post) it’s not strictly necessary since it always redirects to /index.php, which should exist, but it can’t hurt to be careful.
I couldn’t quite figure out -U. Is it like -e but with an access check (eg it exists and is readable) as well?
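To see why a guard like `not $internal` matters, here is a toy Python model of a restart-style rewrite. Everything in it is an assumption made for illustration: the function name `handle`, the idea that a restarted request is run through the same rule again, and the `max_restarts` cutoff. It also leaves out the file tests (-U, -f, -d) on purpose, so it shows the worst case.

```python
import re


def handle(uri: str, use_guard: bool, max_restarts: int = 5):
    """Return (final_uri, restarts); raise RuntimeError on a runaway loop."""
    internal = False  # stands in for $internal
    restarts = 0
    while True:
        if use_guard and internal:
            return uri, restarts  # already restarted once, stop here
        match = re.match(r"^/(.*)$", uri)
        if match is None:
            return uri, restarts
        if restarts >= max_restarts:
            raise RuntimeError("restart loop: " + uri)
        uri = "/index.php/" + match.group(1)  # like fn="restart" uri=...
        internal = True
        restarts += 1


if __name__ == "__main__":
    print(handle("/archives/blah", use_guard=True))
    try:
        handle("/archives/blah", use_guard=False)
    except RuntimeError as err:
        print("without the guard:", err)
```

With the guard the request is rewritten exactly once; without it, and without any file test to stop it, the rule keeps matching its own output.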
@holcomb
Wow. The cold medicine was doing terrible things to my writing yesterday. Sorry about that.
There’s a few things to address:
1) Something is broken in your rewrite rules. For example, http://holcasaur.us/2009/05/wordpress-permalink-follow-up/ throws a “No input file specified” error. This means that a URI got tossed to the PHP engine that isn’t resolved to a real PHP file.
2) I understand that -e checks to see if a file or directory exists (but nothing more), whereas -U also makes sure that the UID the Web Server is running as is actually allowed to access the file (this is handy if we’re checking to see if the request is for a file that we should be serving).
@jmccabe
in all probability 1) is because I’m still messing with it :) I really need a staging/dev box or something.
switched to 2) and added the not $internal as well.
@jmccabe
argh you’re right.
back to the drawing board. btw I’m not sure about -U. the files in question are definitely readable but not owned by the webserver user. Switching to it caused things to not work.
@jmccabe
this was a problem with php/wordpress actually (or maybe with the sun web server fcgi handler). setting cgi.fix_pathinfo=0 in php.ini fixed it.
strange that some of the pretty urls worked and others didn’t.
how many followups can I do…
@holcomb
That’s a b00g in PHP (the cgi.fix_pathinfo thinger). http://bugs.php.net/bug.php?id=47042 Sun submitted a patch to fix the issue in PHP 5.2.9, but the PHP team didn’t like it. See also http://wikis.sun.com/display/WebServerdocs/Release+Notes#ReleaseNotes-Core (b00g 6785490).
It’s a really annoying bug where PHP makes assumptions about the HTTP engine it’s working with and does dumb things based on those assumptions.
Regarding -e vs -U … well, I run with the rule of thumb of “if the right switch doesn’t work, move to the wrong switch that does.” Chasing down why the right switch fails often proves to be far too big a pain to be worthwhile. If you really, REALLY feel like chasing it on a test server, use strace to look at the commands thrown at the file system. If that’s not enough, dig into the Open Web Server source. :)
jaja!