Have you ever noticed that, even if you delete text from a Microsoft Word document, the file size doesn’t decrease? If you are writing a book in MS Word, you’ll notice that the .doc file grows ever larger as you add and delete paragraphs and move snippets around. Even if you delete a whole chapter, the file doesn’t necessarily shrink.

If you copy-pasted a document which you’d been working on for a while into a new, empty Word .doc (can I call that the null document?), maybe you noticed that the new file — even though it contained exactly the same text — was much smaller than the older file.

  • That’s because, mathematically, .doc files are semigroups

Typewritten letters are also semigroups: the elements are built from “append a character from {letters, punctuation, whitespace}”, and the single binary operation is “do one edit, then the next”. With Word files, the edits also include delete words, delete ¶, and replace ¶ with the word you just typed.

Whereas the state of a .txt file is the current body of text, the state of a Word .doc is the entire history of the document. (That’s why programmers use git|svn to remember the history of their directories: the IDE doesn’t do it for them.)
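Here is a minimal sketch of that difference in Python. It is a toy model, not how Word actually stores anything: the names `Append`, `DeleteLast`, `replay` and `stored_chars` are made up for illustration. The "history" is just a list of edits, and gluing two histories together with `+` is the associative operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Append:
    """Hypothetical edit: type some text at the end of the body."""
    text: str

    def apply(self, body: str) -> str:
        return body + self.text


@dataclass(frozen=True)
class DeleteLast:
    """Hypothetical edit: remove the last `count` characters of the body."""
    count: int

    def apply(self, body: str) -> str:
        # body[:-0] would be empty, so treat a zero-length delete as a no-op.
        return body[:-self.count] if self.count else body


def replay(history):
    """The .txt view: run every edit in order and keep only the final body."""
    body = ""
    for edit in history:
        body = edit.apply(body)
    return body


def stored_chars(history):
    """Rough size of the .doc view: every character ever typed is still held."""
    return sum(len(edit.text) for edit in history if isinstance(edit, Append))


regret = "A paragraph I will regret."
history = [
    Append("Chapter 1. "),
    Append(regret),
    DeleteLast(len(regret)),
    Append("The end."),
]

body = replay(history)

# Copy-pasting into a fresh document keeps the body and forgets the history.
fresh = [Append(body)]

print(body)
print(stored_chars(history), "characters held by the old history")
print(stored_chars(fresh), "characters held by the fresh copy")
```

Both histories replay to the same body, but the old one still carries the paragraph that was deleted, which is the toy version of the bloated file.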

You can see this in the metadata. Open a .doc file in a plain-text editor (e.g., Notepad) and you’ll see snippets of things you thought you deleted. Hey, why are those still there?!
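If squinting at Notepad is too painful, a few lines of Python can pull the readable runs out of the raw bytes, roughly what the Unix `strings` tool does. This is a sketch under assumptions: it only looks for plain ASCII and for ASCII stored as UTF-16LE, the function names `printable_runs` and `scan_file` are made up for this post, and whether any deleted text turns up depends on your Word version and save settings.

```python
import re


def printable_runs(data: bytes, min_run: int = 6) -> list:
    """Return readable text runs of at least `min_run` characters found in `data`."""
    # Runs of printable ASCII bytes.
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_run)
    # Runs of printable ASCII stored two bytes per character (UTF-16LE).
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_run)

    runs = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    runs += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return runs


def scan_file(path: str, min_run: int = 6) -> list:
    """Read a file as raw bytes and return its readable text runs."""
    with open(path, "rb") as handle:
        return printable_runs(handle.read(), min_run)


# In-memory demo: binary junk wrapped around two readable snippets.
sample = (
    b"\x00\x01"
    + b"deleted secret"
    + b"\xff\xfe\x00"
    + "still here".encode("utf-16-le")
    + b"\x02"
)
found = printable_runs(sample)
print(found)
```

To try it on a real file, call something like `scan_file("draft.doc")`, where `draft.doc` stands in for whatever document you want to inspect.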

Microsoft keeps them there because you — and most Word users — want to be able to hit Ctrl+Z or Edit > Undo. If the pages of highlighted text you just replaced with an “a” were really gone, there could be no undo operation. And then you would screw yourself thousands of times with mistaken keystrokes and on sad days when you think most of your book is trash.

As a consequence of giving people what they want, Microsoft has also given some people what they don’t want. The US government has written instructions on how to really, truly delete classified information in .doc files. (One wonders what was accidentally disclosed before those instructions were written.) And Merck was blasted after they submitted an article about Vioxx from which they had deleted a paragraph about the concomitant risk of heart attack. Apparently someone at the New England Journal of Medicine knows how to press Ctrl+Z.

Note to CMOs: Never send an unethical MBA to do an unethical hacker’s job. \insert{LaTeX quip}.

Personally, I never send resumes or curriculum vitæ in Word, nor do I send reports to clients in Word. I always generate a PDF. When I’m writing something, I want it to be malleable. But when I give it to you, I want it to be locked down, uneditable, unable to be reverse-engineered. I make sure what you’re viewing is exactly how I want it to look, saying just and only what I want it to say.

Posted by isomorphismes