I’ve been hacking on and around the innards of Oddmuse, so I wanted to pause and record some notes, as it is fairly undocumented.
How Does It Work?
Roughly speaking, the program flow is like this.
Initialize and Parse Action
- Initialize a bunch of global variables with straight declarations, then go to DoWikiRequest, which is the main program.
- Init()
- create directories, initialize some link globals
- load modules. It uses glob, so I think they are loaded in your system's alphanumeric order.
- load config
- Note this comes after modules, so you can configure variables in modules in your config file.
- InitRequest(). This creates $q, which is a standard CGI.pm object.
- Notice this means your modules will NOT have access to $q in their global variable declarations, but will have it when the actual subs run.
- use a $q method to get the cookie $CookieName and put it in %NewCookie
- InitVariables() clears out globals in case you’re using mod_perl.
- DoSurgeProtection()
- It does some simple user permission checks.
- Note for the record there are UserIsEditor(), UserIsAdmin(), UserIsBanned(), UserCanEdit(), as well as UserIsEditorOrError() and UserIsAdminOrError().
- Now we're in DoBrowseRequest().
- This figures out the action from the CGI params and decides if this is a Search, Post, Download, or a Page to Resolve.
- Note the use of GetId(), which figures out the current page name. Quite often $q->param('id') doesn't exist, so this needs to do some thinking. For instance, pages are typically called as script/pagename or script?pagename, and no pagename at all defaults the id to $HomePage (see the sketch after this list).
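To make that concrete, here is a rough sketch of the kind of fallback logic GetId has to do. This is an illustration of the URL forms above, not Oddmuse's actual code; ResolveId is a made-up name, and the real GetId handles more cases.

    # Sketch only: resolve the page id from the request.
    use CGI;
    my $q = CGI->new;
    my $HomePage = 'HomePage';

    sub ResolveId {
      my $id = $q->param('id');                            # explicit ?id=PageName
      ($id) = ($q->path_info() =~ m{^/+(.+)}) unless $id;  # script/PageName
      ($id) = $q->keywords() unless $id;                   # script?PageName (bare query string)
      return $id || $HomePage;                             # nothing at all: default to the home page
    }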
Search and Replace
- DoSearch covers Indexing, Replace and Search. DoIndex and Replace handle the obvious things.
- Most of the search part of DoSearch is spent setting up the search results page and its title.
- SearchTitleAndBody is where the actual searching happens.
- You pass it a \&PrintSearchResult routine that reports found pages, plus @args to pass along to &PrintSearchResult.
- The built-in search is quite brute force. We loop over every page returned by AllPagesList() (which itself just looks in your pageidx file). For each page, we run SearchString($regexp, $data), which splits $regexp into space-delimited regexps and runs each one as a Perl match against $data. This is an AND search: success is reported only if they all match. Searches are /$regexp/i, that is, case-insensitive.
- OPTIMIZATION idea (see the sketch after this list).
- Currently, for each page we apply each piece of $regexp. That means each of the R regexp pieces gets recompiled for each of the P pages: R*P recompilations. If we use some of the ideas from Schwartz (namely $compiled = qr/$regexppiece/, then $data =~ /$compiled/) and rework the loop, we should get a nice savings. The loop could be reworked so that we loop over each piece of the regexp: compile the pattern at the start of the loop, then loop over all the pages, tracking success in a hash somewhere and abandoning the check for a page once its result is known. ($string also gets scanned for null entries once for each page!) We can sort results at the end if necessary.
- OR we can keep the existing page loop if we compile each regexp piece at the start of SearchTitleAndBody and, instead of passing $string, pass @compiled_searches.
- I actually tried this on a site with about 1880 pages, using queries with 4 parts (natural) and about 15 parts (contrived). The unoptimized searches never took more than 0.5 sec. Optimized searches were slightly faster, but only by about 5-10%. Not worth the trouble.
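Here is a minimal sketch of that precompiled variant, keeping SearchString's space-delimited AND semantics and case-insensitive matching. The sub name and the hash-of-pages interface are made up for illustration; the real SearchTitleAndBody reads pages itself and reports via the passed-in routine.

    # Sketch only: compile each space-delimited piece once, then test every
    # page against the compiled patterns.
    sub SearchPagesCompiled {
      my ($string, $pages) = @_;                 # $pages: hashref of page id => page text
      my @compiled = map { qr/$_/i } split ' ', $string;   # compile once, not once per page
      my @found;
    PAGE: for my $id (sort keys %$pages) {
        for my $re (@compiled) {
          next PAGE unless $pages->{$id} =~ $re; # AND semantics: every piece must match
        }
        push @found, $id;                        # all pieces matched
      }
      return @found;
    }

    my %pages = (HomePage => 'Welcome to the wiki', SandBox => 'Welcome, play here');
    my @hits  = SearchPagesCompiled('welcome wiki', \%pages);   # ('HomePage')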
How Formatting Works
- It all happens in ApplyRules, which more or less starts with the raw markup in $text. After we branch off for pages with uploaded file content, we do a big infinite while(1) loop.
- In this master loop, a number of short routines take turns trying to match the beginning of $text (now in local $_).
- After a number of hardcoded routines take their turn (and these should probably be refactored out as Rules, like LinkRules, etc.), then the modules can have their routines run as MyRules.
- Once there is a match, the loop begins again but with routines trying to match the beginning of the unmatched part of $text. This is implemented through the required \G match starting each regexp.
- Note each search is of the form m/\G regexp/cg. This is important, because module formatting rules need to have this form as well (see the sketch after this list), or they will mess up processing of other rules. The special parts of the search are:
- The \G matches where the last successful search ended. This means that once a Rule matches some markup, no other Rule can match it. So if you are competing with existing markup rules, you need to set $RuleOrder{\&YourSub} to some number: very positive to make sure it fires late, or very negative to make sure it fires early.
- The c and g flags mean that if we *don't* match, pos() stays where this search began (instead of being reset to the start of the string), so the next rule can try from the same spot.
- In principle one might worry there won’t be a match and large hunks of the markup won’t get processed and output. In reality, the last searches are for strings of plaintext words separated by non-linebreak whitespace.
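To make the rule contract concrete, here is a minimal sketch of a module rule. The @MyRules and %RuleOrder names come from the code discussed above; the --strike-- markup and the StrikeRule name are made up, and the convention that a rule returns its HTML replacement on a match (and undef otherwise) is my reading of ApplyRules, not something documented here.

    # Sketch of a module formatting rule.
    push(@MyRules, \&StrikeRule);
    $RuleOrder{\&StrikeRule} = 100;    # positive: fire after competing built-in rules

    sub StrikeRule {
      if (m/\G--(.+?)--/cg) {          # \G plus /cg, as required
        return '<del>' . $1 . '</del>';
      }
      return;                          # no match: /c leaves pos() alone for the next rule
    }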
Localization
- T, Ts, Tss are all routines one should call on output text strings to allow future modules to translate them.
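For example, the typical calls look like this. The placeholder conventions (a single %s for Ts, numbered %1, %2, ... for Tss) are my reading of the code, so treat them as assumptions.

    my ($id, $n, $s) = ('SandBox', 3, 'wiki');
    print T('Search:');                               # plain string, translated if possible
    print Ts('Editing %s', $id);                      # one %s substitution
    print Tss('Found %1 pages matching %2', $n, $s);  # numbered placeholders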
Cookies
- Using $q methods, the cookie $CookieName gets put in %NewCookie during Init().
- SetParam and GetParam are the official ways to access the site cookie values. SetParam just updates %NewCookie. The values of the cookie get flushed to the header when GetHttpHeader is called. Remember to set $InvisibleCookieParameters{$key} to avoid lots of annoying status change messages, and %CookieParameters if you want the key/value to persist in the cookie.
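A sketch of how a module might use these. The 'theme' parameter is made up, GetParam's second argument as a default value is how I read the code, and I am assuming %CookieParameters maps a key to its default value to mark it as persisted.

    # Sketch: a made-up 'theme' parameter stored in the site cookie.
    $CookieParameters{theme} = '';          # assumption: key => default value, persisted in the cookie
    $InvisibleCookieParameters{theme} = 1;  # suppress the "status change" message

    # later, in some action or rule:
    my $theme = GetParam('theme', 'default');  # current value, falling back to 'default'
    SetParam('theme', 'dark');                 # updates %NewCookie; flushed when GetHttpHeader runs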