I’ve been hacking on and around the innards of Oddmuse, so I wanted to pause and record some notes, as it is fairly undocumented.
How Does It Work?
Roughly speaking, the program flow is like this.
Initialize and Parse Action
- Initialize a bunch of global variables with straight declarations, then go to DoWikiRequest, which is the main program.
- Init()
- create directories, initialize some link globals
- load modules. It uses glob, so I think they are loaded in your system's alphanumeric order.
- load config
- Note this comes after modules, so you can configure variables in modules in your config file.
- InitRequest(). This creates $q, which is a standard CGI.pm object.
- Notice this means your modules will NOT have access to $q in their global variable declarations, but will have it when the actual subs run.
- use a $q method to get the cookie $CookieName and put it in %NewCookie
- InitVariables() clears out globals in case you’re using mod_perl.
- DoSurgeProtection()
- It does some simple user permission checks.
- Note for the record there are UserIsEditor(), UserIsAdmin(), UserIsBanned(), UserCanEdit(), as well as UserIsEditorOrError() and UserIsAdminOrError().
- Now we're in DoBrowseRequest().
- This figures out the action from the CGI params and decides if this is a Search, Post, Download, or a Page to Resolve.
- Note the use of GetId(), which figures out the current page name. Quite often $q->param('id') doesn't exist, so this needs to do some thinking. For instance, pages are typically called as script/pagename or script?pagename, and no pagename at all defaults the id to $HomePage (see the sketch after this list).
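To make that concrete, here is a rough sketch of the kind of fallback logic GetId has to do. This is an illustration of the URL forms above, not Oddmuse's actual code; ResolveId is a made-up name, and the real GetId handles more cases.

    # Sketch only: resolve the page id from the request.
    use CGI;
    my $q = CGI->new;
    my $HomePage = 'HomePage';

    sub ResolveId {
      my $id = $q->param('id');                            # explicit ?id=PageName
      ($id) = ($q->path_info() =~ m{^/+(.+)}) unless $id;  # script/PageName
      ($id) = $q->keywords() unless $id;                   # script?PageName (bare query string)
      return $id || $HomePage;                             # nothing at all: default to the home page
    }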
Search and Replace
- DoSearch covers Indexing, Replace and Search. DoIndex and Replace handle the obvious things.
- Most of the search part of DoSearch is spent setting up the search results page and its title.
- SearchTitleAndBody is where the actual searching happens.
- You pass it a \&PrintSearchResult routine that reports found pages, plus @args to pass along to &PrintSearchResult.
- The built-in search is quite brute force. We loop over every page returned by AllPagesList() (which itself just looks in your pageidx file). For each page, we run SearchString($regexp, $data), which splits $regexp into space-delimited regexps and runs each one as a Perl match against $data. This is an AND search: success is reported only if they all match. Searches are /$regexp/i, that is, case-insensitive.
- OPTIMIZATION idea (see the sketch after this list).
- Currently, for each page we apply each piece of $regexp. That means each of the R regexp pieces gets recompiled for each of the P pages: R*P recompilations. If we use some of the ideas from Schwartz (namely $compiled = qr/$regexppiece/, then $data =~ /$compiled/) and rework the loop, we should get a nice savings. The loop could be reworked so that we loop over each piece of the regexp: compile the pattern at the start of the loop, then loop over all the pages, tracking success in a hash somewhere and abandoning the check for a page once its result is known. ($string also gets scanned for null entries once for each page!) We can sort results at the end if necessary.
- OR we can keep the existing page loop if we compile each regexp piece at the start of SearchTitleAndBody and, instead of passing $string, pass @compiled_searches.
- I actually tried this on a site with about 1880 pages, using queries with 4 parts (natural) and about 15 parts (contrived). The unoptimized searches never took more than 0.5 sec. Optimized searches were slightly faster, but only by about 5-10%. Not worth the trouble.
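Here is a minimal sketch of that precompiled variant, keeping SearchString's space-delimited AND semantics and case-insensitive matching. The sub name and the hash-of-pages interface are made up for illustration; the real SearchTitleAndBody reads pages itself and reports via the passed-in routine.

    # Sketch only: compile each space-delimited piece once, then test every
    # page against the compiled patterns.
    sub SearchPagesCompiled {
      my ($string, $pages) = @_;                 # $pages: hashref of page id => page text
      my @compiled = map { qr/$_/i } split ' ', $string;   # compile once, not once per page
      my @found;
    PAGE: for my $id (sort keys %$pages) {
        for my $re (@compiled) {
          next PAGE unless $pages->{$id} =~ $re; # AND semantics: every piece must match
        }
        push @found, $id;                        # all pieces matched
      }
      return @found;
    }

    my %pages = (HomePage => 'Welcome to the wiki', SandBox => 'Welcome, play here');
    my @hits  = SearchPagesCompiled('welcome wiki', \%pages);   # ('HomePage')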
How Formatting Works
- It all happens in ApplyRules, which more or less starts with the raw markup in $text. After we branch off for pages with uploaded file content, we do a big infinite while(1) loop.
- In this master loop, a number of short routines take turns trying to match the beginning of $text (now in local $_).
- After a number of hardcoded routines take their turn (and these should probably be refactored out as Rules, like LinkRules, etc.), then the modules can have their routines run as MyRules.
- Once there is a match, the loop begins again but with routines trying to match the beginning of the unmatched part of $text. This is implemented through the required \G match starting each regexp.
- Note each search is of the form m/\G regexp/cg. This is important, because module formatting rules need to have this form as well (see the sketch after this list), or they will mess up processing of other rules. The special parts of the search are:
- The \G matches where the last successful search ended. This means that once a Rule matches some markup, no other Rule can match it. So if you are competing with existing markup rules, you need to set $RuleOrder{\&YourSub} to some number: very positive to make sure it fires late, or very negative to make sure it fires early.
- The c and g flags mean that if we *don't* match, pos() stays where this search began (instead of being reset to the start of the string), so the next rule can try from the same spot.
- In principle one might worry there won’t be a match and large hunks of the markup won’t get processed and output. In reality, the last searches are for strings of plaintext words separated by non-linebreak whitespace.
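To make the rule contract concrete, here is a minimal sketch of a module rule. The @MyRules and %RuleOrder names come from the code discussed above; the --strike-- markup and the StrikeRule name are made up, and the convention that a rule returns its HTML replacement on a match (and undef otherwise) is my reading of ApplyRules, not something documented here.

    # Sketch of a module formatting rule.
    push(@MyRules, \&StrikeRule);
    $RuleOrder{\&StrikeRule} = 100;    # positive: fire after competing built-in rules

    sub StrikeRule {
      if (m/\G--(.+?)--/cg) {          # \G plus /cg, as required
        return '<del>' . $1 . '</del>';
      }
      return;                          # no match: /c leaves pos() alone for the next rule
    }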
Localization
- T, Ts, Tss are all routines one should call on output text strings to allow future modules to translate them.
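For example, the typical calls look like this. The placeholder conventions (a single %s for Ts, numbered %1, %2, ... for Tss) are my reading of the code, so treat them as assumptions.

    my ($id, $n, $s) = ('SandBox', 3, 'wiki');
    print T('Search:');                               # plain string, translated if possible
    print Ts('Editing %s', $id);                      # one %s substitution
    print Tss('Found %1 pages matching %2', $n, $s);  # numbered placeholders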
Cookies
- Using $q methods, the cookie $CookieName gets put in %NewCookie during Init().
- SetParam and GetParam are the official ways to access the site cookie values. SetParam just updates %NewCookie. The values of the cookie get flushed to the header when GetHttpHeader is called. Remember to set $InvisibleCookieParameters{$key} to avoid lots of annoying status change messages, and %CookieParameters if you want the key/value to persist in the cookie.
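A sketch of how a module might use these. The 'theme' parameter is made up, GetParam's second argument as a default value is how I read the code, and I am assuming %CookieParameters maps a key to its default value to mark it as persisted.

    # Sketch: a made-up 'theme' parameter stored in the site cookie.
    $CookieParameters{theme} = '';          # assumption: key => default value, persisted in the cookie
    $InvisibleCookieParameters{theme} = 1;  # suppress the "status change" message

    # later, in some action or rule:
    my $theme = GetParam('theme', 'default');  # current value, falling back to 'default'
    SetParam('theme', 'dark');                 # updates %NewCookie; flushed when GetHttpHeader runs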