Parsing Sentences for Easy Reading

For the past year, I’ve had the pleasure of working on and off for a Toronto-based consultancy that focuses on accessibility. In addition to doing training, accessibility audits, and development, they provide video captioning and description services, which is where I fit in.

Unlike those mysterious beings that close caption live television, I caption videos – generally university lectures. It’s interesting work, and was a lovely part-time job while I was finishing up my Masters. I’ve kept up with it because I enjoy it… and extra pocket money is always enjoyable. What I did not realize when I started, however, was that it would also have a lasting (positive) impact on my work as an IA/tech writer.

You know how sometimes, you’re reading, and a line break hits in the wrong spot, and it’s weird? Bad sentence parsing. I usually only notice if it’s two or three lines of text, like a PowerPoint title, or subtitles on a video. Poor sentence parsing makes it harder for readers to read smoothly: in captioning in particular, you want people to be able to read quickly and glean all the information they’re reading without becoming confused by a subject-object split. As a person whose focus is on expressing concepts clearly, this has been something of a revelation. While I’m not a designer, I necessarily do some design in my work, particularly since I’ve started freelancing. Parsing is something that I increasingly think about, and I think it’s making my visual work easier to understand.

Take, for example, the post-it note that I have had stuck to my monitor, reminding me to write this post:

Post it note reads Parsing Sentences for Easy Reading

That’s pretty solid. I’m happy with that parsing. I might easily have written this, though:

Parsing Sentences for
Easy Reading

Hopefully you see that this is a less smooth reading experience. Let’s take another example, this time borrowed from a fantastic Girl Geeks Toronto event that I went to last night (recap post to follow). The title of the event was “Designing for Digital: Processes and Planning for Powerful Solutions.” Here are two ways we could write that on the title slide of our presentation deck:

Designing for
Digital: Processes
and Planning
for Powerful Solutions
 
Designing for Digital:
Processes and Planning
for Powerful Solutions

I think the second one is easier to parse, and easier to read. With presentation software and design software, where you create text boxes and write in them, it’s easy to rely on the automatic text wrapping. I have started to fight that urge, and to think about how best to manage my line breaks. I’m not advocating for manually line breaking everything, but for titles or short, punchy sentences, it’s something to think about. If reading it aloud sounds wonky, try moving the line break. Your readers will thank you, even if they don’t realize it.
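For anyone who wants a first pass before adjusting by hand, here is a minimal sketch in Python of one possible heuristic: wrap greedily, but never let a line end on a small connecting word. The function name `break_title`, the `WEAK_ENDINGS` word list, and the width values are all my own assumptions for illustration, not an established rule, and reading the result aloud is still the real test.

```python
# Words that read awkwardly when left dangling at the end of a line.
# This list is illustrative, not exhaustive.
WEAK_ENDINGS = {"a", "an", "the", "and", "or", "but",
                "for", "of", "to", "in", "on", "with"}


def break_title(title, max_width):
    """Wrap a short title, avoiding line breaks right after a weak word."""
    lines = []
    current = []
    for word in title.split():
        candidate = current + [word]
        if current and len(" ".join(candidate)) > max_width:
            # The line is full. Before closing it, carry any trailing
            # weak words over so they start the next line instead.
            carried = []
            while len(current) > 1 and current[-1].lower() in WEAK_ENDINGS:
                carried.insert(0, current.pop())
            lines.append(" ".join(current))
            current = carried + [word]
        else:
            current = candidate
    if current:
        lines.append(" ".join(current))
    return lines


for line in break_title("Parsing Sentences for Easy Reading", 21):
    print(line)
```

At a width of 21 characters, plain greedy wrapping would leave "for" stranded at the end of the first line. This sketch moves it down to start the second line. A carried word can push the next line slightly past the width, which seems an acceptable trade for a title.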

Edited to add:

Another excellent example care of Paul, from a Zipcar ad in the New York subway:

Better Writing is One Bundle/Package/Plugin Away

Two months ago, I downloaded and installed a writing tools bundle for TextMate 2, one of my favorite text editors. “English Highlight” as it is so innocuously named, does three awesome things:

  1. It highlights weasel words (few, very, fairly, quite, etc.)
  2. It highlights passive sentences (or, should I say, passive sentences are highlighted)
  3. It highlights duplicate words (not not that you’d ever do that).

Christopher Alfeld’s “English Highlight” is an adaptation of Matt Might’s shell scripts. Matt Might is an Assistant Professor at the University of Utah. He noticed that his students tended to ‘abuse’ the passive voice, use weasel words, and repeat words, so he wrote some bash scripts to identify these and integrated them into their LaTeX build. This has spawned a variety of plugins for common text editors. I’ve compiled a list of plugins at the bottom of this post.
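To give a feel for how simple these checks can be, here is a rough Python sketch of the duplicate-word check. It is not Matt Might’s script or the TextMate bundle’s code. The function name `find_duplicates` and the regular expression are my own assumptions about one reasonable way to do it.

```python
import re

# A word, some whitespace (possibly a line break), then the same word again.
DUPLICATE = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_duplicates(text):
    """Return (offset, word) pairs for each immediately repeated word."""
    return [(m.start(), m.group(1)) for m in DUPLICATE.finditer(text)]


print(find_duplicates("not not that you'd ever do that"))
```

Because the pattern allows any whitespace between the two words, it also catches repeats that straddle a line break, which are the ones I tend to miss when proofreading.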

TextMate 2 with English Highlight screenshot

Screenshot from my TextMate: purple highlighting indicates weasels, passives, or repeated words.

I am not going to lie: it was demoralizing when I first opened up a file and saw tons of purple. Apparently nine years of post-secondary education (6 of which were in the sciences) bred a deep love of the passive voice. Similarly, two years of graduate school, where the answer to every question is ‘it depends’, may have left me generous with my ‘various’, ‘numerous’, and ‘few’s. Highlighting my shortcomings in purple makes it easy for me to identify areas that need work, and to quickly make my writing stronger and clearer.

Weasel Words

The thing about weasel words is that they rarely add to a sentence: they either make your sentence vague or unnecessarily wordy, neither of which is a positive. Admittedly, sometimes you want to say that something is ‘quite’ something. That’s cool! You’re allowed! You might not realize how often you say ‘quite’ or ‘very’, though, and if it’s not helping, it’s hindering.

I went looking for a wishy-washy sentence that I’d recently written, but couldn’t find one: it seems my highlighter has done the trick! I’m afraid to open any of my old research papers, so I’m borrowing an example from Matt Might:

Bad: False positives were surprisingly low.
Better: To our surprise, false positives were low.
Good: To our surprise, false positives were low (3%).

I know I have a tendency to overuse ‘various’, ‘numerous’, and ‘fairly’. Highlighting those words draws my eye back to the sentence and makes me think about ways I can improve it. Often it’s as easy as deleting the word.
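A weasel-word check is little more than a word list and a search. The Python sketch below uses only the words mentioned in this post, plus "surprisingly" from the example above. The list, the name `find_weasels`, and the matching approach are assumptions for illustration, and the real plugins ship with their own lists.

```python
import re

# Illustrative list only: the words named in this post.
WEASELS = ["few", "very", "fairly", "quite",
           "various", "numerous", "surprisingly"]

WEASEL_PATTERN = re.compile(r"\b(" + "|".join(WEASELS) + r")\b", re.IGNORECASE)


def find_weasels(text):
    """Return the weasel words found in text, lowercased, in order."""
    return [m.group(0).lower() for m in WEASEL_PATTERN.finditer(text)]


print(find_weasels("False positives were surprisingly low."))
print(find_weasels("To our surprise, false positives were low (3%)."))
```

The first sentence gets flagged and the second does not, which matches the Bad and Good examples above.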

Passive vs. Active Voice

The passive voice thing is less straightforward than the weasel words. The passive voice has historically held a hallowed position in the sciences, where the prevailing opinion seems to be that science should mysteriously emerge completely independent of the scientists who do it. For this reason, students have to write “10mg of magnesium were massed” in their lab reports, rather than “I massed 10mg of magnesium.” This may have been a contributing factor in my changing majors from chemistry to environment, where I was occasionally allowed to write as though I existed.

During the aforementioned environment undergrad, I attended a somewhat rebellious lecture by Linda Cooper. Linda Cooper is a lecturer at McGill who studies science communication and teaches classes on science writing. She argued that using “direct, active-voiced sentences” makes sentences stronger and easier to read, and that we should all stop blathering on endlessly in the passive voice and instead choose to use the active when appropriate. It’s easy to see that she’s right when you compare passive-voiced to active-voiced sentences:

Original: If MMS is being run with DB Profiling enabled, further permissions are required.
Revised: If MMS is running with DB Profiling enabled, the user requires additional permissions.

While both sentences point to the same concept, that running MMS with DB profiling means you’re going to have to do something with permissions, the first sentence is far more vague. What sort of ‘further permissions’ are we talking about? Permissions for MMS? Permissions for you-the-user? Some sort of network permissions? Who knows! The second sentence gets to the point: the user requires additional permissions. In either case, the next paragraphs describe what those permissions are, but the revised sentence guides the reader more quickly to the correct answer.

There are certainly times when passive sentences are appropriate: for instance, I haven’t managed to rewrite “MongoDB is designed specifically with commodity hardware in mind…” as an active-voiced sentence, and I doubt I will. Expunging all passives from the record isn’t the goal here: the goal is to write as clearly as possible, and to be more aware of the choices you make when writing.
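Passive detection is the fuzziest of the three checks. A common heuristic, sketched below in Python, is to look for a form of "to be" followed by something that looks like a past participle. The short `IRREGULARS` list and the name `find_passives` are my assumptions, and this sketch will produce both false positives and misses. That is fine for a highlighter whose job is only to make you look twice.

```python
import re

BE_FORMS = r"(?:am|are|were|being|is|been|was|be)"

# A handful of irregular participles for illustration; real tools use longer lists.
IRREGULARS = ["run", "written", "done", "made", "given", "taken", "known", "seen"]

PASSIVE = re.compile(
    r"\b" + BE_FORMS + r"\s+(\w+ed|" + "|".join(IRREGULARS) + r")\b",
    re.IGNORECASE,
)


def find_passives(text):
    """Return phrases that look like passive constructions."""
    return [m.group(0) for m in PASSIVE.finditer(text)]


original = "If MMS is being run with DB Profiling enabled, further permissions are required."
revised = "If MMS is running with DB Profiling enabled, the user requires additional permissions."
print(find_passives(original))
print(find_passives(revised))
```

Run against the two sentences above, the sketch flags "being run" and "are required" in the original and nothing in the revision. It would also flag "is designed", which is a passive I have decided to keep.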

Resources

As mentioned above, Matt Might’s scripts have been adapted for a number of text editors. I particularly like the name of the emacs / vim mode. If you’re doing any sort of writing – technical or not – I highly recommend installing one of these extensions and trying it out. It makes a huge difference.

Classification, MongoDB Operators, and Kitchen Chairs

ROCM: Representation, Organisation, Classification, and Meaning-Making was a required class when I did my Master of Information. At the time, I complained incessantly about how useless it was, how I was never going to use any of it in real life… Well, I was wrong.

A cat lying upon a Roomba

Despite being sat upon in the kitchen, this is probably not a kitchen chair. Credit: barbostick

Yesterday morning, I spent about four hours thinking about ROCM as I reorganized MongoDB’s operators page. One of our professors had a tendency to ask “What makes a kitchen chair a kitchen chair? Is any chair in a kitchen a kitchen chair? Is anything you sit on? If you sat on your dog while he’s in the kitchen, is he a kitchen chair? If you carry a chair that’s usually in your kitchen into the living room, is it still a ‘kitchen chair’?”, etc. The “kitchen chair” metaphor became something of a meme among my cohort, one which inevitably gets brought up whenever we’re together.  The crux of the kitchen chair debate is the question of what makes stuff fit into the categories it fits into. This is really the issue that anyone involved in classification or organization has to grapple with, hopefully with the knowledge that no solution is perfect, but some are better than others.

This week, the question was “What makes these MongoDB operators fit together?”. Some groupings were obvious: the ‘Logical’ operators (‘or’, ‘and’, ‘not’, and ‘nor’) go together nicely. Same with the operators related to geospatial queries, and those specific to arrays. My problem was what’s left: the operators we’ve now classified as ‘Comparison’, ‘Element’, or ‘Evaluation’.  In a sense, the ‘Element’ category is ridiculous: by their nature, queries involve elements… for that matter, all queries involve comparison so the ‘Comparison’ category is a bit fraught as well. But what’s an IA-minded technical writer to do? Either give up and put them all in one long, awful list (which is unideal), or categorize as best you can, knowing that it won’t be perfect (also unideal). Clearly, I went with the latter.
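To make the shape of the grouping concrete, here is a small Python sketch of the categories as a lookup table. Only the ‘Logical’ group comes directly from this post. The operators listed under the other categories are my recollection of where common MongoDB query operators sit, so treat the exact membership as an assumption and check the operators page. Geospatial operators are left out for brevity, and `category_of` is a made-up helper.

```python
# Category -> operators. Membership outside "Logical" is an assumption
# for illustration and may not match the published page exactly.
OPERATOR_CATEGORIES = {
    "Comparison": ["$gt", "$gte", "$lt", "$lte", "$ne", "$in", "$nin"],
    "Logical": ["$or", "$and", "$not", "$nor"],
    "Element": ["$exists", "$type"],
    "Evaluation": ["$mod", "$regex", "$where"],
    "Array": ["$all", "$elemMatch", "$size"],
}

# Reverse index: each operator lives in exactly one category, which is
# precisely the constraint that makes the classification imperfect.
CATEGORY_BY_OPERATOR = {
    op: category
    for category, ops in OPERATOR_CATEGORIES.items()
    for op in ops
}


def category_of(operator):
    """Return the category an operator is filed under, or None if unlisted."""
    return CATEGORY_BY_OPERATOR.get(operator)


print(category_of("$nor"))
print(category_of("$exists"))
```

Forcing every operator into a single slot is what a table of contents does, and it is why some placements feel arbitrary: `$in` is a comparison, but it would not look out of place next to the array operators either.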

I think the categories we settled on will be meaningful for our readers, while also being factually accurate, which is really the goal. The fact that there’s some weirdness is unfortunate, but sadly unavoidable.

The inevitable failure of classification systems is a central theme of Geoffrey Bowker and Susan Leigh Star’s Sorting Things Out: Classification and Its Consequences. Bowker and Star’s examples really highlight the power of classification: they discuss tuberculosis patients, and the ways that wishy-washy diagnoses ruined people’s lives; talk about the incredibly inconsistent and damaging categorization of people under Apartheid in South Africa; and discuss nursing interventions classification and its impact on both nurses’ and patients’ lives. Sorting Things Out is one of those texts that I cited in (nearly) every paper I wrote during my MI, because it always applied. At its core, the book is about highlighting the ways that classification is – or becomes – invisible, and how people grow to accept categories as natural. As people who care about the ways that we structure information, recognizing the theory behind classification and its implications can make us better at determining an ideal structure for our use case, and at recognizing the ways that it might fail.

Sorting Things Out is on sale at Amazon right now, for under $20. I paid about $50 when I bought my copy, so now’s a good time to bone up. It’s a great read that has instilled in me the need to regard existing classification systems with a critical eye rather than accepting them at face value, and to be far more thoughtful when creating my own.

Learning Emacs

For the past few weeks, on top of writing delightful documentation and exploring New York, I have been executing a Herculean undertaking: learning Emacs.

10gen’s lead tech writer is an avid Emacs user: he uses Emacs for his email, his chat client, and, of course, for editing the docs. We write MongoDB’s docs in reStructuredText using Docutils, and it’s (mostly) easy to write. It helps to have a text editor that will do reStructuredText syntax highlighting, though, and those seem to be rare. My beloved Coda 2 does not appear to do it; Sublime Text’s highlighting is inconsistent, and its tabs a pain in the ass. TextMate is great, but even it sometimes fills me with rage with auto-indent behaviours that I can’t figure out / turn off.

So my colleague suggested I try out Emacs. It was lateish on a Tuesday afternoon, and I said “why not!” and promptly downloaded the GUI version of Emacs. He kindly gave me his Emacs configuration file so that I would have all his handy reStructuredText-facilitating extensions, and gave me a quick crash course. I suspect part of his enthusiasm was the prospect of being able to use Emacs on my computer when I needed his help with something, but a good deed is a good deed regardless. After a 45-minute pep talk, I was left feeling not unlike I did when my parents bought me my first iMac back in… 1998: “this is very cool, now how the heck do I turn it off?”

Aside: for those of you who do not recall, OS 9 hid the Restart / Sleep / Shutdown menu items in the “Special” menu. The first night I had my iMac, I ended up unplugging the computer in a fit of “oh god what have I done?” and having a cry because of all the money that had been spent on a computer that I had no idea how to use. I bought David Pogue’s “iMac for Dummies,” and life improved immeasurably.

On Wednesday, I remembered that Sacha Chua was an Emacs user, and vaguely recalled her posting a Beginner’s Guide to Emacs. I printed it out, propped it up on my desk, and have been adding to it ever since. Much like the purchase of Pogue’s book, this was an excellent life choice. There are a few things that bug me about Emacs: memorizing keystrokes isn’t something that I’ve ever been very good at, and I don’t like that (given my current configuration) I can’t highlight with shift and the arrow keys. As a learning experience, though, it’s been really pleasant, especially now that I’m starting to remember how to do things (I can now consistently write files, copy & paste, and move to the beginnings and ends of lines!).

It sounds goofy, but I’ve always enjoyed the idea of being a person who is good at command line-y text editors. Whenever I see someone mysteriously navigating without a mouse, typing furiously as windows pop up and disappear, I think it’s neat. Admitting this is both amusing and embarrassing: I’m forcing myself to learn Emacs because I think it looks cool.

Despite my somewhat goofy reasoning, Emacs is kind of fun! At a minimum, it’s made me realize that the way I’ve always interacted with files is not the only way, and it has made my Emacs-loving colleague’s workflow much less mysterious to me. I also really like that I can create a buffer that will execute a command for me: I like being able to generate the HTML version of whatever I’m working on and being able to click on links to the file if there’s an error.

If you’re just starting out and have an intense desire to look cool (er, learn Emacs), check out Sacha’s guide. I also suggest finding someone who is patient and kind, and willing to help you learn.