October 20 Saturday, 10:17AM. Another good thing about a personal RSS reader (I use FreshRSS) is that, a week after reading something interesting, it's really easy to find the article again based on any half-remembered word or phrase. I find Twitter to just be such an unstoppable river, that without having liked a post it's #

October 11 Thursday, 4:21PM. I reckon some people in East Fremantle don't want people to know that their cul-de-sacs are actually pedestrian thoroughfares. I'm adding 'em to OpenStreetMap. #

October 11 Thursday, 4:13PM. It's so hard to remember the brilliant mind-clearing simplicity of just walking, with no goal, for at least an hour, and possibly with a pub at the halfway point. Even sore tendons succumb to the joy. #

October 10 Wednesday, 2:51PM. I've been out taking photos of Mills and Wares Park, because WikiShootMe told me there was no photo yet of it on Wikidata. #

October 10 Wednesday, 1:26PM. I've updated my PageCleanUp.js script on English Wikisource to support some extra bits and bobs, at the suggestion of @pigsonthewing #

October 8 Monday, 8:58AM. When creating a header template for MediaWiki pages, it can be good to set the display title to de-emphasize the standard page title so that it can be reused lower down within the header. #

September 26 Wednesday, 1:22PM. Interesting presentation about the Auckland University Tramping Club and archives and how sometimes professional intervention isn't the best way to go.

Belinda Battley #

September 26 Wednesday, 8:01AM. Heading to the first day of the Australian Society of Archivists conference. #

September 23 Sunday, 9:34AM. Public websites can be archived on the Internet Archive (either as crawled HTML pages or dumped XML etc.), but what is one to do about private sites? Currently I'm backing up locally (of course) but it's hard to know how to make the backups usable by anyone. #

September 17 Monday, 6:42AM. I'm attempting to remove ParserFunctions completely, and switch all templates to use Lua modules instead. #

My coffee mug

Hello world, and welcome to my corner of the web! This is where I write words about what I'm working on, and post photographs of things I've seen.

I'm a Software Engineer at the Wikimedia Foundation, and so of course my personal website is a wiki (running on MediaWiki). In my spare time I volunteer with WikiClubWest to work on Wikimedia projects, mostly around my family's genealogy and local Western Australian history (especially to do with Fremantle). I try to keep up with issues on all the things I maintain.

I also try to find time to work in my workshop on various woodworking projects. Recently, that's been focused on getting my new workshop's doors built and installed.

Travel features in my life, not because I really hugely want to go elsewhere but because I just do — and also because then I can do some more interesting mapping on OpenStreetMap.

I'm currently reading , and Dad and Dave (Steele Rudd, 1899), and Where Angels Fear to Tread (E. M. Forster, 1905).

To contact me, you can email me or find me on the Freenode IRC network (as 'samwilson'). If you want to leave a comment on this site (by creating an account), you need to know the secret code Tuart (it's not very secret, but seems to be confusing enough for most spammers).



Data modeling


I know I've said it before, but I really do find that data modeling within MediaWiki with Cargo is great fun, and that it happens at the quick speed of normal wiki editing. It's a lot nicer (for the bulk of cases) than attempting to build a bespoke application. (Of course, I should preface all of this with some caveats about what can be done, but blah blah that can all be taken as read.)

It's better to come at things from the other end, of say a small set of data in a spreadsheet. This can in many cases be easily ported into the MediaWiki idea by creating a single wiki page for each row of data, with each page containing a call to a single template. This template is where almost everything else is done.
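As a rough illustration of the row-per-page idea, here is a minimal Python sketch that turns spreadsheet rows (as CSV) into one block of wikitext per row, each being a single template call. The template name "Book", the column names and the sample row are made up for the example, and actually saving the pages to the wiki is left out.

```python
import csv
import io


def row_to_wikitext(template, row):
    """Render one spreadsheet row as a single template call."""
    lines = ["{{" + template]
    for name, value in row.items():
        lines.append("| " + name + " = " + value)
    lines.append("}}")
    return "\n".join(lines)


def rows_to_pages(csv_text, template, title_column):
    """Return a mapping of page title to wikitext, one page per data row."""
    pages = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The title column names the page; every other column becomes a parameter.
        title = row.pop(title_column)
        pages[title] = row_to_wikitext(template, row)
    return pages


# Hypothetical data: one header line and one row.
sample = "title,author,publication_date\nExample Book,John Smith,1923\n"
pages = rows_to_pages(sample, "Book", "title")
print(pages["Example Book"])
```

Everything else (the Cargo table declaration, display formatting, and so on) would then live in the template itself rather than in the generated pages.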

One weird thing about it is that it's sort of closer to an EAV (entity-attribute-value) schema than a normal table, because each record can (but really shouldn't) have different attributes. This is a good thing from the point of view of tracking changes to the data over time, because changes to attribute names as well as their values are tracked in the page history. Of course, it also means that one has to do more scripting for some types of data modification, but for the most part that's not very hard.

Today the thing I'm enjoying about it is that it's perfectly easy to set up a new template and table and things for any quite small dataset. That means that data is given the structure it needs, rather than being munged into some more generic schema. (I guess this will also probably come back to bite me one day too, because I'll have separate things that don't fit together! But oh well.)

The other weird bit about Cargo is that it almost does away with the need to have categories in MediaWiki. I'm not sure if that's a good thing or not. So far I'm loving the extra flexibility I get with 'keywords' that are usually tied to mainspace pages.

Fremantle civic centre demolition


2018-10-09 Fremantle civic centre demolition jaws.JPG

I went to see the demolition of the civic building in Kings Square today. It's a bit sad to see it all coming down. Interesting to see the side of the town hall exposed, but really mostly just melancholy. Change is always a bit like that.

(These photos are also on Commons, in commons:Category:October 2018 in Fremantle.)

Providing Services from a Symfony bundle


I'm trying to add a service in a redistributable Symfony 4 bundle. The docs say it is just a matter of loading the service configuration in the bundle's Extension class. For example, for the GoatBundle:

// Assumes use statements for ContainerBuilder, YamlFileLoader, and FileLocator.
class GoatExtension extends \Symfony\Component\DependencyInjection\Extension\Extension {
    public function load(array $configs, ContainerBuilder $container) {
        $configDir = dirname(__DIR__).'/Resources/config';
        $loader = new YamlFileLoader($container, new FileLocator($configDir));
        $loader->load('services.yml');
    }
}

Where GoatBundle/Resources/config/services.yml looks like this:

Name\Of\InjectedClass:
  factory: [ '\Factory\For\InjectedClass', serviceFactory ]
  arguments:
    - '@service_container'

But this results in:

There is no extension able to load the configuration for "Name\Of\InjectedClass" (in GoatBundle/Resources/config/services.yml). Looked for namespace "Name\Of\InjectedClass", found none

I went around and around in circles until I realised that I was simply missing the top-level services key in services.yml! It needs to be like this:

services:
  Name\Of\InjectedClass:
    factory: [ '\Factory\For\InjectedClass', serviceFactory ]
    arguments:
      - '@service_container'

I'm only writing it all down here because it was only by this rubber ducking that I saw the problem.

Writing every day


I used to try to write every morning about what I was going to work on in the day. Sometimes I'd publish it as a blog post, but mostly I'd just stick it away in my private journal; it was the process of writing it that mattered, not at all the fact that I could then read back over it. In fact, I think I pretty rarely read over anything a second time, except perhaps when searching for some bit of documentation that I had an inkling that I'd written down somewhere. Some people seem to blog about topics; I just treat it like a rambling whatever-space to put anything at all.

I tried the other day to switch to a Markdown-and-Pandoc based blog, hosted as static files on Netlify, but I then immediately wanted to search something... and couldn't. Of course, there are ways of doing static-site search (build a static index, and query it from the client) but I'm not very interested and so am back here in MediaWiki.
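For what it's worth, the "build a static index" half of that can be quite small. The following is a minimal Python sketch, not anything a particular static-site tool provides: it builds an inverted index (word to list of pages) that could be written out as JSON next to the static files. The page names and text are invented, and the search function here only stands in for what a bit of client-side script would do with the downloaded index.

```python
import json
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> sorted list of page names."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], []))
    for word in words[1:]:
        results &= set(index.get(word, []))
    return sorted(results)


# Hypothetical pages; a real build step would read the generated HTML or Markdown.
posts = {
    "writing-every-day.html": "Writing every morning in a private journal",
    "stornoway-chart.html": "A nautical chart found in an antique shop",
}
index = build_index(posts)
index_json = json.dumps(index)  # this string is what would be published as a static file
print(search(json.loads(index_json), "nautical chart"))
```

This only does exact whole-word matching, which is a long way short of what MediaWiki's search gives for free.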

MediaWiki with two database servers


I've been trying to replicate locally a bug with MediaWiki's GlobalPreferences extension. The bug is about the increased number of database reads that happen when the extension is loaded, and the increase happens not on the database table that stores the global preferences (as might be expected) but rather on the 'local' tables. However, locally I've had all of these running on the same database server, which makes it hard to watch the standard monitoring tools to see differences; so, I set things up on two database servers locally.

Firstly, this was a matter of starting a new MySQL server in a Docker container (accessible on local port 3305, and with its data in a local directory so I could destroy and recreate the container as required):

docker run -it -e MYSQL_ROOT_PASSWORD=pwd123 -p3305:3306 -v$PWD/mysqldata:/var/lib/mysql mysql

(Note that because we're keeping local data, root's password is only set on the first set-up, and so the MYSQL_ROOT_PASSWORD can be left off future invocations of this command.)

Then it's a matter of setting up MediaWiki to use the two servers:

$wgLBFactoryConf = [
	'class' => 'LBFactory_Multi',
	'sectionsByDB' => [
		// Map of database names to section names.
		'mediawiki_wiki1' => 's1',
		'wikimeta' => 's2',
	],
	'sectionLoads' => [
		// Map of sections to server-name/load pairs.
		'DEFAULT' => [ 'localdb'  => 0 ],
		's1' => [ 'localdb'  => 0 ],
		's2' => [ 'metadb' => 0 ],
	],
	'hostsByName' => [
		// Map of server-names to IP addresses (and, in this case, ports).
		'localdb' => '',
		'metadb' => '',
	],
	'serverTemplate' => [
		'dbname'        => $wgDBname,
		'user'          => $wgDBuser,
		'password'      => $wgDBpassword,
		'type'          => 'mysql',
		'flags'         => DBO_DEFAULT,
		'max lag'       => 30,
	],
];
$wgGlobalPreferencesDB = 'wikimeta';


New MediaWiki extension: AutoCategoriseUploads. It "automatically adds categories to new file uploads based on keyword metadata found in the file. The following metadata types are supported: XMP (many file types, including JPG, PNG, PDF, etc.); ITCP (JPG); ID3 (MP3)".

Unfortunately there's no code yet in the repository, so there's nothing to test. Sounds interesting though.

Self-hosted websites are doomed to die


I keep wanting to be able to recommend the 'best' way for people (who don't like command lines) to get research stuff online. Is it Flickr, Zenodo, Internet Archive, Wikimedia, and Github? Or is it a shared hosting account on Dreamhost, running MediaWiki, WordPress, and Piwigo? I'd rather the latter! Is it really that hard to set up your own website? (I don't think so, but I probably can't see what I can't see.)

Anyway, even if running your own website, one should still be putting stuff on Wikimedia projects. And even if not using it for everything, Flickr is a good place for photos (in Australia) because you can add them to the Australia in Pictures group and they'll turn up in searches on Trove. The Internet Archive, even if not a primary and cited place for research materials, is a great place to upload wikis' public page dumps. So it really seems that the remaining trouble with self-hosting websites is that they're fragile and subject to complete loss if you abandon them (i.e. stop paying the bills).

My current mitigation to my own sites' reliance on me is to create annual dumps in multiple formats, including uploading public stuff to IA, and printing some things, and burning all to Blu-ray discs that get stored in polypropylene sleeves in the dark in places I can forget to throw them out. (Of course, I deal in tiny amounts of data, and no video.)

What was it Robert Graves said in I, Claudius about the best way to ensure the survival of a document being to just leave it sitting on one's desk and not try at all to do anything special, because it's all perfectly random anyway as to what persists, and we cannot influence the universe in any meaningful way?

Wikisource books for binding


I have been experimenting with turning Wikisource works into LaTeX-formatted bindable PDFs. My initial idea was to produce quarto or octavo layout sheets (i.e. 8 or 16 book pages to a sheet of paper that's printed on both sides and has the pages laid out in such a way that when the sheet is folded the pages are in the correct order) but now I'm thinking of just using a print-on-demand service (hopefully Pediapress, because they seem pretty brilliant).
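To give a feel for what such a layout involves, here is a small Python sketch of the simplest case only: folio imposition, where each sheet is folded once (4 pages per sheet) and the sheets are nested inside each other. This is deliberately not the quarto or octavo layout mentioned above, which also needs some pages rotated; the function name and return format are just choices made for this example.

```python
def folio_imposition(page_count):
    """Work out which pages share each side of each sheet in a folio booklet.

    Sheets are listed outermost first. Each side is a (left, right) pair of
    page numbers as seen when looking at that side of the unfolded sheet.
    """
    if page_count <= 0 or page_count % 4 != 0:
        raise ValueError("page count must be a positive multiple of 4")
    sheets = []
    for k in range(page_count // 4):
        # Front of sheet k: a late page on the left, an early page on the right.
        front = (page_count - 2 * k, 2 * k + 1)
        # Back of sheet k: the next early page on the left, the previous late page on the right.
        back = (2 * k + 2, page_count - 2 * k - 1)
        sheets.append({"front": front, "back": back})
    return sheets


for number, sheet in enumerate(folio_imposition(8), start=1):
    print(number, sheet["front"], sheet["back"])
```

For an 8-page booklet this puts pages 8 and 1 on the front of the outer sheet and 2 and 7 on its back, with 6 and 3, then 4 and 5, on the inner sheet.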

Basically, my tool downloads all of a work's pages and subpages (in the main namespace only; it doesn't care about the method of construction of the work) and saves the HTML for these, in order, to a html/ directory. Then (here's the crux of the thing) it uses Pandoc to create a set of matching TeX files in an adjacent latex/ directory.
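The conversion step could look something like the following Python sketch. It only builds the Pandoc command for each file (using Pandoc's --from, --to and --output options) and does not run anything; the html/ and latex/ directory names come from the description above, while the function names and the example file name are assumptions for illustration.

```python
from pathlib import PurePosixPath


def latex_path(html_file, latex_dir="latex"):
    """Return the .tex path in latex_dir that matches one saved HTML file."""
    return PurePosixPath(latex_dir) / (PurePosixPath(html_file).stem + ".tex")


def pandoc_command(html_file, latex_dir="latex"):
    """Build the Pandoc command that converts one HTML file to LaTeX."""
    return [
        "pandoc",
        "--from=html",
        "--to=latex",
        "--output=" + str(latex_path(html_file, latex_dir)),
        str(html_file),
    ]


# Hypothetical file name; the real ones depend on how the pages are saved.
print(" ".join(pandoc_command("html/001_Chapter_I.html")))
```

Each command list could then be handed to subprocess.run(command, check=True), one per file in html/.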

So far, so obvious. But the trouble with this approach of wanting to create a separate source format for a work is that there are changes that one wants to make to the work (either formatting or structural) that can't be made upstream on Wikisource — but we also want to be able to bring down updates at any time from Wikisource. That is to say, this is creating a fork of the work in a different format, but it's a fork that needs to be able to be kept up to date.

My current solution to this is to save the HTML and LaTeX files in a Git repository (one per work, e.g.) and have two branches: one containing the raw un-edited HTML and LaTeX, on which the download operation can be re-run at any time; and the other being based off this, being a place to make any edits required, and which can have the first merged into it whenever that's updated. This will sometimes result in merge conflicts, but for the most part (because the upstream changes are generally small typo fixes and the like) will happen without error.
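The two-branch routine might be sketched like this in Python. The branch names ("upstream" for the raw download and "bindable" for the edited version) and the commit message are invented for the example, and the function only returns the Git commands rather than running them.

```python
def refresh_commands(raw_branch="upstream", edit_branch="bindable"):
    """List the Git commands for pulling fresh Wikisource text into the edited branch."""
    return [
        # Switch to the branch that holds only the raw, un-edited download.
        ["git", "checkout", raw_branch],
        # (The download and Pandoc conversion would be re-run at this point.)
        ["git", "add", "--all"],
        ["git", "commit", "--message", "Update from Wikisource"],
        # Switch to the branch with local edits and bring the update across.
        ["git", "checkout", edit_branch],
        ["git", "merge", raw_branch],
    ]


for command in refresh_commands():
    print(" ".join(command))
```

Note that git commit exits with an error when nothing has changed upstream, so a real script would need to allow for that, and the final merge is where any conflicts would show up.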

Now I just want to automate all this a little bit more, so a new project can be created (with GitHub repo and all) with a single (albeit slow!) command.

The output ends up something like Commons:File:The Nether World by George Gissing.pdf.

2018 Firefox survey


I just filled in the 2nd Annual Firefox Census, which is just a survey about weird stuff and Firefox and Mozilla and stuff (literally: like "To what Hogwarts house do you belong?").

I don't know why I feel like it's sort of okay to share data with a company like Mozilla, but I guess I do. Although, come to think of it I'll also sign up to loyalty programmes at grog shops too... I guess I don't really care about my personal information that much!

It's more the bloody annoying algorithms of YouTube and Twitter that annoy me. Give me chronologies and folksonomies any day! (Sort of. There is, of course, more to it, but that's always the case.)

Template frontmatter

Golden Gate Club, San Francisco

A few years ago the static-site blogging tool Jekyll popularised the idea of text files containing 'front matter', which is usually Yaml-formatted metadata put between some delimiters at the top of a file. This works pretty well in MediaWiki as well, with a slightly different format (i.e. templates).

Yaml:

type: book
author: John Smith
publication_date: 1923

Wikitext:

{{book
| author = John Smith
| publication_date = 1923
}}

HTML:

<div itemscope itemtype="http://schema.org/Book">
  A book by <span itemprop="author">John Smith</span>,
  published in <span itemprop="date">1923</span>.
</div>

I find it useful to think of wiki pages as representing instances of some sort of 'entity', and the template at the top is what defines this.

With the addition of Cargo, all this metadata becomes queryable from elsewhere in the wiki.
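The correspondence between the two formats is mechanical enough to sketch in a few lines of Python. This is only an illustration under some assumptions: it handles flat "key: value" lines rather than real Yaml, it assumes "---" delimiters, and it uses the type field as the template name.

```python
def front_matter_to_template(text, delimiter="---"):
    """Convert simple 'key: value' front matter into a wikitext template call."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != delimiter:
        raise ValueError("no front matter found")
    fields = {}
    for line in lines[1:]:
        if line.strip() == delimiter:
            break  # end of the front matter; the rest is the page body
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    # Assumption: the 'type' field names the template; the rest become parameters.
    template = fields.pop("type", "unknown")
    parts = ["{{" + template]
    parts += ["| " + key + " = " + value for key, value in fields.items()]
    parts.append("}}")
    return "\n".join(parts)


example = """---
type: book
author: John Smith
publication_date: 1923
---
Body text of the page.
"""
print(front_matter_to_template(example))
```

The HTML output is then the template's job, so it stays in one place on the wiki instead of being generated per page.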

Waiting for a flight

Perth airport

Sunday morning, Perth airport. In a few hours (by the clock) I'll be in San Francisco. It'll take me a bit longer than that to get there, but that's okay—it's a nice day. The main question is whether one should add spaces next to an em dash. Or whether it's easy to use altogether too many em dashes! Tricky questions. Plenty of time to figure them out. And to find the asciicircum key (why don't they call it what we all call it?).

In the UK, "phonebox numbers reached their peak in 1992, when there were 92,000 of them"[1] and now they're getting rid of lots and will be left with only twenty thousand or so. I guess we all have other phones now (even if some of us don't use them ever and are terrible at answering when phoned).

Someone said this morning that "MediaWiki seems to be much more popular outside the foundation then it is inside it."[2] I hope that's true! I mean, it's not good that it's not popular inside the Foundation (we use Google docs sometimes for things! shock horror), but I like to think that it is popular outside. I like MediaWiki.

The coffee here really isn't as good as elsewhere. Yesterday, I had a terrific long black at the place on the corner of King and Wellington streets. Terrible seating, but lovely coffee, and a good window to peer out of. Important things. From here, a metre away, I can see black polyester-clad bums of two pilots, dragging their nice little square rolly suitcases.

Anyway, I'll stop blathering and go find a better spot to while away the time.

Centenary Building


Archaeological excavations (by a company called ARCHAE-AUS) around the Freo town hall have been carrying on this week, with evidence of a farrier's and a blacksmith's found at the corner of Newman Court (was Street) and William Street, where the Centenary Building stood for thirty years or so from 1929.

The sign in the middle photo reads:

Exposing 1890–1913 blacksmiths', farriers', coachbuilders', & wheelwrights'.

Mr Tyler is in the photograph taken in 1896 in the front of his workshop (photo taken from other side of William St).

I'm not sure what happened in 1913.

Around the corner they've started work to uncover the foundations of the first St John's church. This one they know more about, as it was excavated in the 1980s (although the word on the street is that not enough photos were taken, nor other detailed records made). It'll be exciting to see this, after knowing about the church's outline in the paving stones my whole life.

What's the future of hosting MediaWiki?


What is to be the future of running one's own MediaWiki? Shall there be a dozen different services required (database, cache, search, parser, ...) all running with different technologies and different systems of upgrading and support? Or will we head back to the "old days" (in which things like WordPress still exist) where it's basically just a single PHP application, perhaps now with its own dependency manager (i.e. Composer), and nothing much else? Are people with shared hosting accounts going to still be able to get it running? Will they be able to get it running more easily than they can today? (Certainly, they're not often currently getting it running with Visual Editor, for example.)

I'd like to think that MediaWiki will become easier to install. Maybe that means going in the direction of Discourse, and only supporting deployment via Docker, in order to hide the complexities of all the required services. But that's got a whole lot of confusions of its own, that I think are perhaps too much. Is the future of self-hosting really going to be VPSs, or even "serverlessness"? I guess it could be. The security conundrums with shared hosts are bad, certainly... but perhaps not as bad as poorly-managed whole servers? At least Dreamhost and their ilk monitor for suspicious-looking stuff; Digital Ocean couldn't care less until you're such a spam farm that you're interfering with other things.

Imagine if MediaWiki (with all the good bits as well) were so easy to install that people could turn to it for any collaborative editing website! I guess I'm probably just showing my age though, and am harking back to 2002 when it seemed desirable that people would control their own bit of the web. Still, I do think MediaWiki does multiple-people-editing-multiple-pages-quickly rather well, and is still easier to use (once installed) than some combination of "Markdown files on GitHub and photos loaded from Instagram and embeds from Twitter" or "put it all on WordPress.com" (or God forbid "we don't need anything now we've got Slack").

MediaWiki, through its structure and editing philosophy, really does encapsulate something great about the open web: we've got pages, they contain whatever you want, there are links between them, all changes are tracked, and beyond that lies an infinite field of human creativity and ingenuity. No algorithms to coerce your behaviour, nothing hidden and nothing prohibited. It still makes sense, and I think there's still a future for this sort of thing.

Open Source Hack Afternoon

Glendalough Railway Station

I went along today to my first open source hack afternoon, a regular language/platform agnostic hack group that's now meeting at the Artifactory.

It was a hot day today, with dark orange skies from the fires up near Mundaring, and when I got to the Artifactory there was a bit of a delay in getting inside and so we sheltered in half a metre of shade against a hot wall for a little while.

We had a pretty good room with a portable air conditioner that made it just about a bearable temperature (and provided white noise, in case that's useful). Stephen brought a projector, so we could share things more easily.

I'm looking forward to next month—and maybe more people will come! Maybe it'll be nicer weather.


Loops and deadlines


It's a loopy sort of day. I mean there are loops in it. Which is pretty common, but sometimes gets annoying. It's like having a flowchart in your head, but someone forgot to put an end state in it and instead it links back on itself. It only does this after some sufficiently large number of steps though, so it's sort of hard to see that it's a loop at first, because you're expecting it not to be and aren't keeping count properly.

There is no deadline so every second is one: on anxiety, perfectionism, and Wikimedia projects by Léna:

There is no deadline on Wikimedia projects, and we have so, so much to do. Maybe I should grab my camera, go outside, and take decent pictures of my city. Or I should work with my local LGBT community and do an edit-a-thon. Or work at a national level, contact centers and convince them to share some of their archives. Or I should work on my tools backlog. Gosh, I haven’t seen the admin discussions on French Wikipedia for months, I should give a helpful hand. And I have so many books, I just should take some time and write articles. Or I should look at the latest mass-upload on Commons and put pictures on Wikipedia. Or I should enrich the French Wiktionnary about Gaumais, the language of my ancestors, to my knowledge I am the only one on Wikimedia projects who knows about Gaumais. What is the most important ? Where do I have the most value ? Oh, I guess I should work on that friendly space policy at Wikimedia France, I have lot of experience on what works and what doesn’t on other spaces, I would be so useful there.

Not jumping for the shiny new things

It is very hard to not want to change things when a better idea comes along. The problem mostly seems to be that "better" is so easy to feel when you first see a new thing, and it's so hard to remember that the current way of doing things was once that same "better".

Bit by bit I'm learning to be more conservative in jumping to the next cool new technology, but I still feel the pull. A simple tweet about a new way of working with Markdown; a Hacker News thread about why hosting your own websites is dead; or the feeling of giving in to the corporate world of the web (Twitter & Flickr, mainly) will bring some cool relaxation. But really, ignoring the impulse (unless it comes again and again) is not that hard, and makes for an easier ride in the end.

Also finished listening to How to Stop Time (eps 8–10).

Beginning 2018


It's the fourth day of 2018, and nothing seems much different from 2017 yet. I'm still too slow at cutting bits of wood to the right size, and too indecisive about how much of my inaccuracies I want to leave showing in a finished piece, and so my wine glass shelves have progressed slowly. On the code front, I'm still trying to build a working system to extract my and my friends' photos from Flickr — the delays mostly around the best ways to keep them in MediaWiki (i.e. templates, using Cargo or not, etc.). One good thing that's happened is that GlobalPreferences' required changes to MediaWiki core have been merged, and so work can begin in earnest on the extension itself. This isn't yet that exciting, as no new functionality is being introduced, but we're one step closer to local overrides (and maybe one day even deploying the blasted thing).

My experiment in wiki-based genealogy research (over on ArchivesWiki) is progressing, and some issues around generating GraphViz graphs of human relationships are being resolved. One thing that looks like it could be useful is an idea to introduce a {{SHORTDESC:Lorem ipsum}} magic word. I reckon this could be the thing that is displayed in parentheses after a person's name in family trees (it'd beat the current system of attempting to glean a useful year-only date range out of non-standardized and variable-granularity date formats).

1895 Stornoway chart


Approaches to Stornoway titleblock.png

Yesterday I found a 1:18300 scale nautical chart of the Approaches to Stornoway in an antique shop in North Fremantle. It's not large, but looks nice, and reminds me both of how nice printed maps can be (on good, thick, paper) and also of Stornoway (where I spent a few days last year). I'm making a wooden clamping bar to hang it with.

Approaches to Stornoway.png Approaches to Stornoway harbour detail.png

Then I made a bolted-stick contraption, and hung it on the wall:

2018-01-06 Stornoway chart on wall.jpg

Monday MediaWiki


Monday morning, hot and humid, and the rain's been falling all night (nearly 5 mm!). It's one of those lovely days when you can look out to the ocean and stand on the limestone and feel this place.

I'm reading through the position statements that have been accepted for the Wikimedia Developer Summit in January. It's great to read other people's ideas in this form. I think there's not really enough of that, in MediaWiki development: it's hard to get an idea of other people's 'big picture' thoughts of what the future should hold.

PhpFlickr 4.1.0

I've just tagged version 4.1.0 of my new fork of the PhpFlickr package. It introduces OAuth support, and hopefully improves the documentation of the user authentication process. This release deprecates some old behaviour, but I hope it doesn’t break anything. Bug reports are welcome!

There are some parts that are still not converted to the new request flow, but I’ll get to them next.


Retrieved from ‘https://wiki.samwilson.id.au/index.php?title=Welcome&oldid=1827