December 9 Sunday, 3:31PM. I've fixed the PhpFlickr upload bug. Now just need to clean up the patch. But first I'm going to go and explore the rest of this library. #

December 9 Sunday, 1:42PM. Is there any way to make WordPress store the timezone of a post? And then make it display the correct local time for each post (but still sort them in chronological order)?

I'm not sure there is, but here in MediaWiki with Cargo it's pretty easy. Maybe a bit clunky though! I've been working on my [ #
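
The idea of keeping each post's own timezone while still sorting everything chronologically can be sketched as below. This is only an illustration in Python, not how WordPress or Cargo store things: the `Post` class, the fixed UTC offsets, and the two sample posts are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Post:
    title: str
    # Timezone-aware: the wall-clock time where it was written, plus its UTC offset.
    written_at: datetime


def chronological(posts):
    """Sort by absolute instant, regardless of the offset each post carries."""
    return sorted(posts, key=lambda p: p.written_at.astimezone(timezone.utc))


def local_label(post):
    """Format the time as it appeared on the author's clock when writing."""
    return post.written_at.strftime("%Y-%m-%d %H:%M %z")


# Hypothetical fixed offsets; a real system would store a named zone instead.
PERTH = timezone(timedelta(hours=8))
SAN_DIEGO = timezone(timedelta(hours=-8))

posts = [
    Post("Rainy dawn", datetime(2018, 12, 6, 7, 1, tzinfo=SAN_DIEGO)),
    Post("Late evening", datetime(2018, 12, 6, 22, 0, tzinfo=PERTH)),
]

for post in chronological(posts):
    print(local_label(post), post.title)
```

The point of the design is that the stored value is a single unambiguous instant plus an offset, so sorting and display never have to disagree.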

December 7 Friday, 8:27AM. I'm finally fixing an upload bug in PhpFlickr, and in doing so I went looking for an answer to a thing, only to find that I had exactly this same question 2 years ago and the answer (that I wrote) is still the same. Oops. Hard to not go in circles sometimes. #

December 6 Thursday, 4:37PM. This is a test post. #

December 6 Thursday, 7:01AM. It's a rainy dawn in San Diego. Grey skies and a bit chilly. Happy to be in this warm room (and with a better coffee than the urn provides). #

November 9 Friday, 9:00AM. I'm sitting in Perth airport hacking on an event form for the WMAU website. #

November 7 Wednesday, 5:44PM. It's a splendid feeling, turning off one's computer at the end of the day. I recommend it. #

November 6 Tuesday, 2:16PM. Nice to see someone else using MediaWiki for their personal website: https://www.grantswebsite.net/ #

November 5 Monday, 11:10AM. The genealogy demo wiki has been found by the spammers! :-(

I'm fixing it up (and upgrading). #

November 4 Sunday, 4:48PM. I found a bug and maybe fixed it: phabricator:T208670#

My coffee mug

Hello world, and welcome to my corner of the web! This is where I write words about what I'm working on, and post photographs of things I've seen.

I'm a Software Engineer at the Wikimedia Foundation, and so of course my personal website is a wiki (running on MediaWiki). In my spare time I volunteer with WikiClubWest to work on Wikimedia projects, mostly around my family's genealogy and local Western Australian history (especially to do with Fremantle). I try to keep up with issues on all the things I maintain.

I also try to find time to work in my workshop on various woodworking projects. Recently, that's been focused on getting my new workshop's doors built and installed.

Travel features in my life, not because I really hugely want to go elsewhere but because I just do — and also because then I can do some more interesting mapping on OpenStreetMap.

I'm currently reading , and Fathers of Men (E. W. Hornung, 1912).

To contact me, you can email me or find me on the Freenode IRC network (as 'samwilson'). If you want to leave a comment on this site (by creating an account), you need to know the secret code Tuart (it's not very secret, but seems to be confusing enough for most spammers).



In a café in Berkeley


Is the internet a good place to start typing random thoughts? It feels like it’s probably not, because of all the “taken out of context”, “recalled in future years and laughed at”, and “what’s the point no one will see it” responses. But it also feels like random beginnings and unplanned words are the only things that will ever lead to more coherent and useful words, and that putting them out in the great wash of the online world is slightly better than hiding them away in a notebook in my own bottom drawer. I do write lots and lots of words that only I will ever see, and they’re usually pretty unpolished. I don’t think that what I put on this blog or Twitter or anywhere else is particularly good, but I do at least attempt to finish sentences and thoughts, and fix typos. Maybe that’s all I mean: that uploading ideas makes the brain follow through and express them, and in doing that there’s a surprising amount of satisfaction.

Exporting chat transcripts from Signal


I've started using Signal for some things, and it seems there is a way to export chat histories from the desktop client:

  1. Open the developer tools sidebar (ctrl-shift-i)
  2. Navigate through the 'Elements' view to the div.conversation-stack element
  3. Right click on that div and 'Copy outerHTML'
  4. Paste this HTML into a new file and save it somewhere

Now that file can be viewed with a web browser.

This isn't a particularly great export of course, but at least it's something.
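
If a plain-text transcript is wanted rather than raw HTML, the saved file can be boiled down with nothing but Python's standard library. The sketch below makes no assumptions about Signal's markup beyond it being HTML; the `TextExtractor` class, `html_to_text` function and the sample fragment are invented for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML fragment, one chunk per text node."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text nodes of the given HTML, joined by newlines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# A made-up fragment standing in for the copied div.conversation-stack element.
sample = (
    '<div class="conversation-stack"><div><span>Hello</span> '
    "<span>there</span></div><style>p{}</style></div>"
)
print(html_to_text(sample))
```

For a real export, read the saved file and pass its contents to `html_to_text`; sender names and timestamps will come out as ordinary lines, in whatever order they appear in the markup.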

Bentley Library

Bentley Library

Interesting discussion this morning at Bentley Library about how the Wikimedia movement can fit in to the library's programmes. Thoroughly inspiring! Am looking forward to more.

Bently Library 'Hub' sign on wall.jpg

Old code


It's really quite relaxing working on existing systems compared to building new ones. There are a bunch of old decisions that have already been made about the basic things, and maybe they're annoying decisions and might be done differently if done today, but at least they're done and one needn't figure it all out again.

Data modeling


I know I've said it before, but I really do find data modeling within MediaWiki with Cargo is great fun, and happens with the quick speed of normal wiki editing. It's a lot nicer (for the bulk of cases) than attempting to build a bespoke application. (Of course, I should preface all of this with some caveats about what can be done, but blah blah that can all be taken as read).

It's better to come at things from the other end: say, from a small set of data in a spreadsheet. This can in many cases be easily ported into MediaWiki by creating a single wiki page for each row of data, with each page containing a call to a single template. This template is where almost everything else is done.
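
As a rough sketch of that row-per-page idea, the following Python turns CSV text into one chunk of wikitext per row, each being a single template call. The function names, the `book` template and the column names are assumptions for the example; it does not talk to a wiki, and values containing a pipe character would need escaping.

```python
import csv
import io


def row_to_wikitext(template, row):
    """Render one spreadsheet row as a call to a single wiki template."""
    lines = ["{{" + template]
    for name, value in row.items():
        lines.append(f"| {name} = {value}")
    lines.append("}}")
    return "\n".join(lines)


def pages_from_csv(csv_text, template, title_column):
    """Map each row to a page: {page title: page wikitext}."""
    pages = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = row.pop(title_column)  # the title becomes the page name
        pages[title] = row_to_wikitext(template, row)
    return pages


# Illustrative spreadsheet with a single row.
SAMPLE = "title,author,publication_date\nFathers of Men,E. W. Hornung,1912\n"

for title, wikitext in pages_from_csv(SAMPLE, "book", "title").items():
    print(title)
    print(wikitext)
```

Each resulting page holds nothing but the template call, which is what leaves the template free to do the display and the Cargo storage.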

One weird thing about it is that it's sort of closer to an EAV (entity-attribute-value) schema than a normal table, because each record can (but really shouldn't) have different attributes. This is a good thing from a point of view of tracking changes over time to the data, because changes to attribute names as well as their values are tracked in the page history. Of course, it also means that one has to do more scripting for some types of data modification, but for the most part that's not very hard.

Today the thing I'm enjoying about it is that it's perfectly easy to set up a new template and table and things for any quite small dataset. That means that data is given the structure it needs, rather than being munged into some more generic schema. (I guess this will also probably come back to bite me one day too, because I'll have separate things that don't fit together! But oh well.)

The other weird bit about Cargo is that it almost does away with the need to have categories in MediaWiki. I'm not sure if that's a good thing or not. So far I'm loving the extra flexibility I get with 'keywords' that are usually tied to mainspace pages.

Fremantle civic centre demolition


2018-10-09 Fremantle civic centre demolition jaws.JPG

I went to see the demolition of the civic building in Kings Square today. It's a bit sad to see it all coming down. Interesting to see the side of the town hall exposed, but really mostly just melancholy. Change is always a bit like that.

(These photos are also on Commons, in commons:Category:October 2018 in Fremantle.)

Providing Services from a Symfony bundle


I'm trying to add a service in a redistributable Symfony 4 bundle. The docs say it is just a matter of loading the service configuration in the bundle's Extension class. For example, for the GoatBundle:

class GoatExtension extends \Symfony\Component\DependencyInjection\Extension\Extension {
    public function load(array $configs, ContainerBuilder $container) {
        // Look for config files in the bundle's Resources/config directory.
        $configDir = dirname(__DIR__).'/Resources/config';
        $loader = new YamlFileLoader($container, new FileLocator($configDir));
        $loader->load('services.yml');
    }
}

Where GoatBundle/Resources/config/services.yml looks like this:

Name\Of\InjectedClass:
  factory: [ '\Factory\For\InjectedClass', serviceFactory ]
  arguments:
    - '@service_container'

But this results in:

There is no extension able to load the configuration for "Name\Of\InjectedClass" (in GoatBundle/Resources/config/services.yml). Looked for namespace "Name\Of\InjectedClass", found none

I went around and around in circles until I realised that I was simply missing the top-level services key in services.yml! It needs to be like this:

services:
  Name\Of\InjectedClass:
    factory: [ '\Factory\For\InjectedClass', serviceFactory ]
    arguments:
      - '@service_container'

I'm only writing it all down here because it was only by this rubber ducking that I saw the problem.

Writing every day


I used to try to write every morning about what I was going to work on in the day. Sometimes I'd publish it as a blog post, but mostly just stick it away in my private journal — it was the process of writing it that mattered, not at all the fact that I could then read back over it. In fact, I think I pretty rarely read over anything a second time, except perhaps when searching for some bit of documentation that I had an inkling that I'd written down somewhere. Some people seem to blog about topics, I just treat it like a rambling whatever-space to put anything at all.

I tried the other day to switch to a Markdown-and-Pandoc based blog, hosted as static files on Netlify, but I then immediately wanted to search something... and couldn't. Of course, there are ways of doing static-site search (build a static index, and query it from the client) but I'm not very interested and so am back here in MediaWiki.
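
For what it's worth, the "build a static index, and query it from the client" approach is small in principle. Here is a minimal sketch in Python of an inverted index and an all-words-must-match query; the function names and the sample pages are invented, and a real static site would serialise the index to JSON at build time and run the query side in the browser.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Turn {title: text} into an inverted index {word: sorted list of titles}."""
    index = defaultdict(set)
    for title, text in pages.items():
        for word in tokenize(text):
            index[word].add(title)
    return {word: sorted(titles) for word, titles in index.items()}


def search(index, query):
    """Return the titles of pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], []))
    for word in words[1:]:
        results &= set(index.get(word, []))
    return sorted(results)


# Made-up page texts, just to exercise the functions.
pages = {
    "Old code": "Working on existing systems",
    "Data modeling": "Data modeling within MediaWiki with Cargo",
}
index = build_index(pages)
print(search(index, "cargo"))
```

The index is plain dictionaries and lists, so `json.dumps(index)` is enough to ship it alongside the static files.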

MediaWiki with two database servers


I've been trying to replicate locally a bug with MediaWiki's GlobalPreferences extension. The bug is about the increased number of database reads that happen when the extension is loaded, and the increase happens not on the database table that stores the global preferences (as might be expected) but rather on the 'local' tables. However, locally I've had all of these running on the same database server, which makes it hard to watch the standard monitoring tools to see differences; so, I set things up on two database servers locally.

Firstly, this was a matter of starting a new MySQL server in a Docker container (accessible on local port 3305, and with its data in a local directory so I could destroy and recreate the container as required):

docker run -it -e MYSQL_ROOT_PASSWORD=pwd123 -p3305:3306 -v$PWD/mysqldata:/var/lib/mysql mysql

(Note that because we're keeping local data, root's password is only set on the first set-up, and so the MYSQL_ROOT_PASSWORD can be left off future invocations of this command.)

Then it's a matter of setting up MediaWiki to use the two servers:

$wgLBFactoryConf = [
	'class' => 'LBFactory_Multi',
	'sectionsByDB' => [
		// Map of database names to section names.
		'mediawiki_wiki1' => 's1',
		'wikimeta' => 's2',
	],
	'sectionLoads' => [
		// Map of sections to server-name/load pairs.
		'DEFAULT' => [ 'localdb'  => 0 ],
		's1' => [ 'localdb'  => 0 ],
		's2' => [ 'metadb' => 0 ],
	],
	'hostsByName' => [
		// Map of server-names to IP addresses (and, in this case, ports).
		'localdb' => '',
		'metadb' => '',
	],
	'serverTemplate' => [
		'dbname'        => $wgDBname,
		'user'          => $wgDBuser,
		'password'      => $wgDBpassword,
		'type'          => 'mysql',
		'flags'         => DBO_DEFAULT,
		'max lag'       => 30,
	],
];
$wgGlobalPreferencesDB = 'wikimeta';
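
To make the chain of lookups in that configuration a bit more concrete, here is a rough Python model of it: database name to section, section to server name, server name to address. It is only a reading of how the three maps relate, not MediaWiki's actual code. The `resolve` function is hypothetical, and the addresses are placeholders rather than real values.

```python
# A Python mirror of the three maps; the addresses are placeholders only.
CONF = {
    "sectionsByDB": {"mediawiki_wiki1": "s1", "wikimeta": "s2"},
    "sectionLoads": {
        "DEFAULT": {"localdb": 0},
        "s1": {"localdb": 0},
        "s2": {"metadb": 0},
    },
    "hostsByName": {
        "localdb": "<address of first server>",
        "metadb": "<address of second server>",
    },
}


def resolve(db_name, conf=CONF):
    """Return (section, server name, address) for a database name."""
    # Databases that are not listed fall back to the DEFAULT section.
    section = conf["sectionsByDB"].get(db_name, "DEFAULT")
    # Take the first server listed for the section.
    server = next(iter(conf["sectionLoads"][section]))
    return section, server, conf["hostsByName"][server]


print(resolve("wikimeta"))
print(resolve("some_other_wiki"))
```

With this layout the global preferences database resolves to the second server while everything else stays on the first, which is the separation needed to watch the two sets of reads independently.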



New MediaWiki extension: AutoCategoriseUploads. It "automatically adds categories to new file uploads based on keyword metadata found in the file. The following metadata types are supported: XMP (many file types, including JPG, PNG, PDF, etc.); ITCP (JPG); ID3 (MP3)".

Unfortunately there's no code yet in the repository, so there's nothing to test. Sounds interesting though.

Self-hosted websites are doomed to die


I keep wanting to be able to recommend the 'best' way for people (who don't like command lines) to get research stuff online. Is it Flickr, Zenodo, Internet Archive, Wikimedia, and Github? Or is it a shared hosting account on Dreamhost, running MediaWiki, WordPress, and Piwigo? I'd rather the latter! Is it really that hard to set up your own website? (I don't think so, but I probably can't see what I can't see.)

Anyway, even if running your own website, one should still be putting stuff on Wikimedia projects. And even if not using it for everything, Flickr is a good place for photos (in Australia) because you can add them to the Australia in Pictures group and they'll turn up in searches on Trove. The Internet Archive, even if not a primary and cited place for research materials, is a great place to upload wikis' public page dumps. So it really seems that the remaining trouble with self-hosting websites is that they're fragile and subject to complete loss if you abandon them (i.e. stop paying the bills).

My current mitigation to my own sites' reliance on me is to create annual dumps in multiple formats, including uploading public stuff to IA, and printing some things, and burning all to Blu-ray discs that get stored in polypropylene sleeves in the dark in places I can forget to throw them out. (Of course, I deal in tiny amounts of data, and no video.)

What was it Robert Graves said in I, Claudius about the best way to ensure the survival of a document being to just leave it sitting on one's desk and not try at all to do anything special, because it's all perfectly random anyway as to what persists, and we cannot influence the universe in any meaningful way?

Wikisource books for binding


I have been experimenting with turning Wikisource works into LaTeX-formatted bindable PDFs. My initial idea was to produce quarto or octavo layout sheets (i.e. 8 or 16 book pages to a sheet of paper that's printed on both sides and has the pages laid out in such a way that when the sheet is folded the pages are in the correct order) but now I'm thinking of just using a print-on-demand service (hopefully PediaPress, because they seem pretty brilliant).

Basically, my tool downloads all of a work's pages and subpages (in the main namespace only; it doesn't care about the method of construction of the work) and saves the HTML for these, in order, to a html/ directory. Then (here's the crux of the thing) it uses Pandoc to create a set of matching TeX files in an adjacent latex/ directory.
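
The conversion step might look something like the following. This is a sketch only, not the tool's actual code: it assumes Pandoc is installed and on the PATH, that the downloaded files are named so that sorting them gives reading order, and the function names are made up.

```python
import subprocess
from pathlib import Path


def pandoc_command(html_file, latex_dir):
    """Build the Pandoc invocation that converts one HTML file to LaTeX."""
    out = Path(latex_dir) / (Path(html_file).stem + ".tex")
    return [
        "pandoc",
        "--from", "html",
        "--to", "latex",
        "--output", str(out),
        str(html_file),
    ]


def convert_all(html_dir="html", latex_dir="latex", run=subprocess.run):
    """Convert every .html file in html_dir to a matching .tex in latex_dir."""
    Path(latex_dir).mkdir(parents=True, exist_ok=True)
    commands = [
        pandoc_command(html_file, latex_dir)
        for html_file in sorted(Path(html_dir).glob("*.html"))
    ]
    for command in commands:
        run(command, check=True)  # raises if Pandoc exits with an error
    return commands
```

Calling `convert_all()` from the directory that contains html/ would then fill the adjacent latex/ directory with one TeX file per page.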

So far, so obvious. But the trouble with this approach of wanting to create a separate source format for a work is that there are changes that one wants to make to the work (either formatting or structural) that can't be made upstream on Wikisource — but we also want to be able to bring down updates at any time from Wikisource. That is to say, this is creating a fork of the work in a different format, but it's a fork that needs to be able to be kept up to date.

My current solution to this is to save the HTML and LaTeX files in a Git repository (one per work, e.g.) and have two branches: one containing the raw un-edited HTML and LaTeX, on which the download operation can be re-run at any time; and the other being based off this, being a place to make any edits required, and which can have the first merged into it whenever that's updated. This will sometimes result in merge conflicts, but for the most part (because the upstream changes are generally small typo fixes and the like) will happen without error.

Now I just want to automate all this a little bit more, so a new project can be created (with GitHub repo and all) with a single (albeit slow!) command.

The output ends up something like Commons:File:The Nether World by George Gissing.pdf.

2018 Firefox survey


I just filled in the 2nd Annual Firefox Census, which is just a survey about weird stuff and Firefox and Mozilla and stuff (literally: like "To what Hogwarts house do you belong?").

I don't know why I feel like it's sort of okay to share data with a company like Mozilla, but I guess I do. Although, come to think of it I'll also sign up to loyalty programmes at grog shops too... I guess I don't really care about my personal information that much!

It's more the bloody annoying algorithms of YouTube and Twitter that annoy me. Give me chronologies and folksonomies any day! (Sort of. There is, of course, more to it, but that's always the case.)

Template frontmatter

Golden Gate Club, San Francisco

A few years ago the static-site blogging tool Jekyll popularised the idea of text files containing 'front matter', which is usually YAML-formatted metadata put between some delimiters at the top of a file. This works pretty well in MediaWiki as well, with a slightly different format (i.e. templates).

YAML:

---
type: book
author: John Smith
publication_date: 1923
---

Wikitext:

{{book
| author = John Smith
| publication_date = 1923
}}

HTML:

<div itemscope itemtype="http://schema.org/Book">
  A book by <span itemprop="author">John Smith</span>,
  published in <span itemprop="date">1923</span>.
</div>
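
For comparison, pulling Jekyll-style front matter off the top of a text file takes only a few lines. The sketch below is a deliberately naive Python version: it handles only flat `key: value` lines rather than real YAML, and the function name is an assumption.

```python
def split_front_matter(text, delimiter="---"):
    """Split text into (metadata dict, body).

    Handles only flat `key: value` lines, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != delimiter:
        return {}, text  # no front matter at all
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == delimiter),
        None,
    )
    if end is None:
        return {}, text  # no closing delimiter: treat everything as body
    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata, "\n".join(lines[end + 1:])


doc = "---\ntype: book\nauthor: John Smith\npublication_date: 1923\n---\nA book.\n"
metadata, body = split_front_matter(doc)
print(metadata)
print(body)
```

All values come back as strings; anything smarter (dates, lists, nesting) is where a proper YAML parser would take over.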

I find it useful to think of wiki pages as representing instances of some sort of 'entity', and the template at the top is what defines this.

With the addition of Cargo, all this metadata becomes queryable from elsewhere in the wiki.

Waiting for a flight

Perth airport

Sunday morning, Perth airport. In a few hours (by the clock) I'll be in San Francisco. It'll take me a bit longer than that to get there, but that's okay—it's a nice day. The main question is whether one should add spaces next to an em dash. Or whether it's easy to use altogether too many em dashes! Tricky questions. Plenty of time to figure them out. And to find the asciicircum key (why don't they call it what we all call it?).

In the UK, "phonebox numbers reached their peak in 1992, when there were 92,000 of them"[1] and now they're getting rid of lots and will be left with only twenty thousand or so. I guess we all have other phones now (even if some of us don't use them ever and are terrible at answering when phoned).

Someone said this morning that "MediaWiki seems to be much more popular outside the foundation then it is inside it."[2] I hope that's true! I mean, it's not good that it's not popular inside the Foundation (we use Google docs sometimes for things! shock horror), but I like to think that it is popular outside. I like MediaWiki.

The coffee here really isn't as good as elsewhere. Yesterday, I had a terrific long black at the place on the corner of King and Wellington streets. Terrible seating, but lovely coffee, and a good window to peer out of. Important things. From here, a metre away, I can see the black polyester-clad bums of two pilots, dragging their nice little square rolly suitcases.

Anyway, I'll stop blathering and go find a better spot to while away the time.

Centenary Building


Archaeological excavations (by a company called ARCHAE-AUS) around the Freo town hall have been carrying on this week, with evidence of a farrier's and a blacksmith's found at the corner of Newman Court (was Street) and William Street, where the Centenary Building stood for thirty years or so from 1929.

The sign in the middle photo reads:

Exposing 1890–1913 blacksmiths', farriers', coachbuilders', & wheelwrights'.

Mr Tyler is in the photograph taken in 1896 in the front of his workshop (photo taken from other side of William St).

I'm not sure what happened in 1913.

Around the corner they've started work to uncover the foundations of the first St John's church. This one they know more about, as it was excavated in the 1980s (although the word on the street is that not enough photos were taken, nor other detailed records made). It'll be exciting to see this, after knowing about the church's outline in the paving stones my whole life.

What's the future of hosting MediaWiki?


What is to be the future of running one's own MediaWiki? Shall there be a dozen different services required (database, cache, search, parser, ...) all running with different technologies and different systems of upgrading and support? Or will we head back to the "old days" (in which things like WordPress still exist) where it's basically just a single PHP application, perhaps now with its own dependency manager (i.e. Composer), and nothing much else? Are people with shared hosting accounts going to still be able to get it running? Will they be able to get it running more easily than they can today? (Certainly, they're not often currently getting it running with VisualEditor, for example.)

I'd like to think that MediaWiki will become easier to install. Maybe that means going in the direction of Discourse, and only supporting deployment via Docker, in order to hide the complexities of all the required services. But that's got a whole lot of confusions of its own, that I think are perhaps too much. Is the future of self-hosting really going to be VPSs, or even "serverlessness"? I guess it could be. The security conundrums with shared hosts are bad, certainly... but perhaps not as bad as poorly-managed whole servers? At least Dreamhost and their ilk monitor for suspicious-looking stuff; Digital Ocean couldn't care less until you're such a spam farm that you're interfering with other things.

Imagine if MediaWiki (with all the good bits as well) were so easy to install that people could turn to it for any collaborative editing website! I guess I'm probably just showing my age though, and am harking back to 2002 when it seemed desirable that people would control their own bit of the web. Still, I do think MediaWiki does multiple-people-editing-multiple-pages-quickly rather well, and is still easier to use (once installed) than some combination of "Markdown files on GitHub and photos loaded from Instagram and embeds from Twitter" or "put it all on WordPress.com" (or God forbid "we don't need anything now we've got Slack").

MediaWiki, through its structure and editing philosophy, really does encapsulate something great about the open web: we've got pages, they contain whatever you want, there are links between them, all changes are tracked, and beyond that lies an infinite field of human creativity and ingenuity. No algorithms to coerce your behaviour, nothing hidden and nothing prohibited. It still makes sense, and I think there's still a future for this sort of thing.

Open Source Hack Afternoon

Glendalough Railway Station

I went along today to my first open source hack afternoon, a regular language/platform agnostic hack group that's now meeting at the Artifactory.

It was a hot day today, with dark orange skies from the fires up near Mundaring, and when I got to the Artifactory there was a bit of a delay in getting inside and so we sheltered in half a metre of shade against a hot wall for a little while.

We had a pretty good room with a portable air conditioner that made it just about a bearable temperature (and provided white noise, in case that's useful). Stephen brought a projector, so we could share things more easily.

I'm looking forward to next month—and maybe more people will come! Maybe it'll be nicer weather.


Loops and deadlines


It's a loopy sort of day. I mean there are loops in it. Which is pretty common, but sometimes gets annoying. It's like having a flowchart in your head, but someone forgot to put an end state in it and instead it links back on itself. It only does this after some sufficiently large number of steps though, so it's sort of hard to see that it's a loop at first, because you're expecting it not to be and aren't keeping count properly.

There is no deadline so every second is one: on anxiety, perfectionism, and Wikimedia projects by Léna:

There is no deadline on Wikimedia projects, and we have so, so much to do. Maybe I should grab my camera, go outside, and take decent pictures of my city. Or I should work with my local LGBT community and do an edit-a-thon. Or work at a national level, contact centers and convince them to share some of their archives. Or I should work on my tools backlog. Gosh, I haven’t seen the admin discussions on French Wikipedia for months, I should give a helpful hand. And I have so many books, I just should take some time and write articles. Or I should look at the latest mass-upload on Commons and put pictures on Wikipedia. Or I should enrich the French Wiktionnary about Gaumais, the language of my ancestors, to my knowledge I am the only one on Wikimedia projects who knows about Gaumais. What is the most important ? Where do I have the most value ? Oh, I guess I should work on that friendly space policy at Wikimedia France, I have lot of experience on what works and what doesn’t on other spaces, I would be so useful there.

Not jumping for the shiny new things


It is very hard to not want to change things when a better idea comes along. The problem mostly seems to be that "better" is so easy to feel when you first see a new thing, and it's so hard to remember that the current way of doing things was once that same "better".

Bit by bit I'm learning to be more conservative in jumping to the next cool new technology, but I still feel the pull. A simple tweet about a new way of working with Markdown; a Hacker News thread about why hosting your own websites is dead; or the feeling of giving in to the corporate world of the web (Twitter & Flickr, mainly) will bring some cool relaxation. But really, ignoring the impulse (unless it comes again and again) is not that hard, and makes for an easier ride in the end.

Also finished listening to How to Stop Time (eps 8–10).


Retrieved from ‘https://wiki.samwilson.id.au/index.php?title=Welcome&oldid=1836