The Essential Exchange Formats for Writers

Plus How to Rescue Old Files & Preserve Your Writing

In our previous post, we explored a variety of writing and creativity tools. All of them are capable of importing and exporting information in a variety of formats. But there are so many formats that confusion can set in easily.

Frugal Guidance 2 to the rescue! Here are the main file formats that writers need to know about and the various different names that they go by. We also have some tips for rescuing old, old word processing files and a few ideas for saving your work for posterity.

Plain Text Document

Most apps call this simply a text file. It uses the .txt extension. There is no formatting. No graphics. Just text, which means you can import it into any writing app on any system (PC, Mac, Linux, iOS, Android, and anything else you can find out there, including legacy systems). If you need to open your file on your phone, your iPad, or your office computer, this will work. WordPad (see our previous post) also lets you save a file in a text format for old DOS editors.

Text files are much more popular for writing and editing on the Mac and Linux than on Windows systems, but they work on all platforms.

RTF

RTF, or Rich Text Format, is an older but flexible Microsoft format that dates back to 1987. It can be used to exchange formatted text and many graphics between most word processors. It’s a good Swiss-Army-Knife-type tool for exchanging files between applications when you don’t want to lose all your formatting as you would with a text file. It uses the .rtf extension. You can also use this format if you want to open your work in a Microsoft Word version before Office 2007.

Microsoft is no longer developing this format and some features of Office 2007 and later will not be preserved in RTF format.

Office Open XML (OOXML)

If you want to open your work in any version of Microsoft Word since 2007, this is the format to use, since it includes the Microsoft Word (and Excel, and PowerPoint) format specification. Most competing word processors (including Google Docs and LibreOffice / Apache OpenOffice Writer) can open OOXML files, too. For writers, Office Open XML is often called Word 2007-2013 or Word 2007-20xx format, or simply referred to by its file extension: DOCX.

The name is confusing because Office Open XML is NOT OpenOffice XML. Let me repeat:

Office Open XML ≠ OpenOfficeXML

There have been at least 3 versions of OOXML since the turn of the century, so there are a few minor compatibility issues even between recent versions of Word. Also, OOXML files cannot be opened by Word versions before Office 2007 without a translator (available from Microsoft).

ODT – OpenDocument Text (OpenOffice Text)

If you want to edit your work in LibreOffice, Apache OpenOffice, Calligra, KOffice, AbiWord or any open source tool using the OpenOffice or .ODT file format, use this option. (ODT stands for OpenDocument Text – which is used by the Writer app in any OpenOffice-style program.) ODT is the open source equivalent to Word’s DOCX file format and is an international standard.

There is no official format named OpenOfficeXML, but there is (or, at least, used to be) OpenOffice.org XML. As we saw before, this should not be confused with Office Open XML, the Microsoft format.

Microsoft Word can import ODT files but usually puts up one or two scary-looking warning messages when you try. Other programs that can read ODT files include Google Docs, Zoho Writer, and older OpenOffice-based suites such as IBM’s Lotus Symphony.

Note that LibreOffice and Apache OpenOffice can import many different formats, including older and newer Microsoft Word formats, so it’s not usually necessary to translate a DOC or DOCX file into ODT before opening it in Writer (LibreOffice or Apache OpenOffice). But if you intend to use the open source programs from the start, go ahead and save your work in the ODT (OpenDocument Text) format.

You can easily save a file in ODT, DOC, or DOCX (OOXML) format in LibreOffice. (Apache OpenOffice does not currently save in the DOCX format.)
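Because the two names are so easy to mix up, it can help to look inside a file rather than trust its label. Both DOCX and ODT files are ZIP archives containing XML, so a short script can tell them apart. The following is only a rough sketch: the function name identify_office_file is made up for this example, and it checks just two well-known markers (the "mimetype" entry in ODT and the word/document.xml entry in DOCX), not the full specifications.

```python
import zipfile


def identify_office_file(source):
    """Return 'docx', 'odt', or 'unknown' for a file path or file-like object.

    Hypothetical helper for illustration; it only checks two markers.
    """
    if not zipfile.is_zipfile(source):
        # Not a ZIP container: could be an old binary DOC, RTF, or plain text.
        return "unknown"
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        # OpenDocument files announce their type in a small "mimetype" entry.
        if "mimetype" in names:
            mimetype = archive.read("mimetype").decode("ascii", "replace").strip()
            if mimetype == "application/vnd.oasis.opendocument.text":
                return "odt"
        # Word's OOXML files keep the main text in word/document.xml.
        if "word/document.xml" in names:
            return "docx"
    return "unknown"
```

Calling identify_office_file("chapter1.docx") on a real Word file should return "docx", even if someone has renamed the file with the wrong extension.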

Unicode Text

Unicode text is a plain text format that uses an expanded character set covering non-Latin alphabets and both left-to-right and right-to-left scripts, including Arabic, Hebrew, Russian, Korean, Japanese and Chinese. You may see references to UTF-8 and UTF-16. These are two encodings of the same character set, and both can represent every Unicode character; UTF-8 is the more common choice for files and the web. The current Unicode standard recognizes a massive 120,000 characters in 129 scripts. (To my knowledge, no single font covers the entire set.)

If you are writing in a non-Latin language or you need to open your work in a non-Latin word processor, you can use this format.

If you are saving your file for a very old Linux or Macintosh system, it may be better not to use Unicode, since it was not widely supported until the late 1990s or early 2000s.

Not all installed fonts support all the Unicode characters, although every modern operating system should have at least a few fonts that will.
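For the curious, here is a small Python sketch of what choosing between UTF-8 and UTF-16 means in practice. It shows that the same text survives a round trip through either encoding and that only the file size differs. The helper names encode_sizes and round_trips and the sample sentence are invented for this illustration.

```python
def encode_sizes(text):
    """Return how many bytes the text needs in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" leaves out the 2-byte byte order mark, so the
        # count reflects the characters only.
        "utf-16": len(text.encode("utf-16-le")),
    }


def round_trips(text, encoding):
    """True if encoding and then decoding gives back the original text."""
    return text.encode(encoding).decode(encoding) == text


sample = "Café, Привет, 日本語"
for name in ("utf-8", "utf-16"):
    print(name, "round trip OK:", round_trips(sample, name))
print(encode_sizes(sample))
```

Mostly-English text tends to be smaller in UTF-8, while text made up largely of East Asian characters can be smaller in UTF-16. Either way, the characters themselves are preserved.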

Other formats

The world is littered with hard drives, digital tapes, CDs, DVDs, optical drives and floppy disks with both new and older file formats. Fortunately, there are programs that can open many of these formats, even if you don’t have the original software. Here are some of the formats you may run across.

Word 97-2003 and earlier

This is the older DOC format used by Microsoft Word from 1997 until the OOXML format debuted in 2007. You should be able to open these files in any modern version of Word and most other word processors.

PDF

This is a format developed by Adobe that preserves all the font information, graphics, and page layout of your document. Generally, it’s not editable (or, more accurately, it’s difficult to edit). Some PDFs are password protected, too.

If you need to recover text and graphics from a PDF file, there are several tools available on the web that might help, depending on the security features enabled in the PDF. You can also try LibreOffice, which can open some PDF files in its Draw module. The text may be broken up into a separate block for each line, which might be awkward to work with, but that is better than no recovery. The result can be edited, after a fashion, and printed or exported.

It should be noted that LibreOffice can embed a Writer document into a PDF which allows other LibreOffice users to edit the file in Writer and re-save as a PDF. This is not common, but it exists. (No, Apache OpenOffice will not do this.)

HTML, XHTML, HTML5 & CSS

These are all HTML (HyperText Markup Language) formats for web display. They are normally meant to be read by a web browser, but can be edited in any text editor. There are two ways to view HTML: as rendered by your browser or blog editor, and as HTML code, which you can open in any text editor (see our previous post for ideas). Modern HTML marks up the text but relies on CSS (Cascading Style Sheets) for the actual formatting of the text on-screen.

TeX

TeX is an open source typesetting system, often used in academia for math and other technical documents.

WordPerfect

WordPerfect is still around and some programs might save in the WPD, WP, and WP7 formats. These are all proprietary WordPerfect formats and are not generally used to exchange data between programs.

Since we talked about ancient Word formats earlier, note that at one time WordPerfect had its own DOC file extension for its proprietary WordPerfect files. So if you find files ending with “.doc” on an old WordPerfect floppy from the 80s, that might be why. You won’t be able to open them in Microsoft Word.

TextMaker

TextMaker is a popular European word processor included in SoftMaker Office, using the TMD extension. TextMaker should be able to import and export to many of the above formats. If you need to open TextMaker files, you could download SoftMaker FreeOffice for Windows or Linux, at no cost.

Apple Pages

Pages is the word processor in Apple’s iWork suite. Its files cannot be opened in most Windows or Linux applications outside of Google Docs and LibreOffice. It’s not a good format for exchanging your work between software programs.

OPML

OPML (which stands for Outline Processor Markup Language) is an exchange format for outlines and other hierarchical text.

Mind mapping software

Mind mapping software has a wide variety of file formats, including ITMZ (iThoughts), MM (Freemind), MMAP (MindManager), XMIND (Xmind), MVD and MVDX (MindView) and many more. Freemind was the original mindmap software, so its MM format serves as a common exchange format. (Freemind is open source, so anybody can download it at no cost.) Most mind map software will also export to the OPML format (which preserves the hierarchical relationships) or plain text (which doesn’t).
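To see why OPML keeps the hierarchy and plain text loses it, here is a minimal Python sketch. It writes the same small outline both ways. The nested-list layout used for the outline and the function names outline_to_opml and outline_to_text are assumptions made for this example; the OPML element names (opml, head, title, body, outline, and the text attribute) follow the published format.

```python
import xml.etree.ElementTree as ET


def outline_to_opml(title, items):
    """Build an OPML string from a list of (text, children) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")

    def add(parent, entries):
        for text, children in entries:
            # Each entry becomes an <outline> nested inside its parent.
            node = ET.SubElement(parent, "outline", text=text)
            add(node, children)

    add(body, items)
    return ET.tostring(opml, encoding="unicode")


def outline_to_text(items):
    """Flatten the same outline to plain lines; the nesting is lost."""
    lines = []

    def walk(entries):
        for text, children in entries:
            lines.append(text)
            walk(children)

    walk(items)
    return "\n".join(lines)


novel = [
    ("Part One", [("Chapter 1", []), ("Chapter 2", [])]),
    ("Part Two", []),
]
print(outline_to_opml("My Novel", novel))
print(outline_to_text(novel))
```

In the OPML output, the chapters sit inside the Part One element, so another program can rebuild the tree. In the text output, every item is simply a line of its own.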

ePub, MOBI, iBook, AZW

These are all ebook formats, not word processing formats. They are destination formats for readers, not exchange formats, and word processors need translation tools to export a file into an ebook. A few tools, such as Pandoc, Sigil and Calibre, are good for working with ebook files.

Opening very old word processing files

Sometimes a word processing file is so old that updating it to a new version of the same program is not possible. Often you can import it into a new program, but sometimes the best you can do is extract the text to use in a modern word processor. Here are a few ideas.

Using Microsoft Word

Word usually has no problem opening files created as early as 1997. Word existed, of course, before 1997 in Windows, DOS and Mac versions and there have been several different file standards over the years. If you have some of these old files (perhaps lying around on some floppies or an ancient hard drive) you might have problems opening them, even in Word.

Here are some possibilities:

  1. Start your modern Word first, then use the Open command. Set the file format to All Documents, navigate to the old file, and click Open. Even if Word cannot recognize the file as an old Word file, it might be able to open it with formatting intact.
  2. If that doesn’t work, there’s a little-known trick to open old Word files and mystery documents. Again, use the Open command. In the file format drop-down box, select Recover Text from Any File. Navigate to the mystery file and click open. Word will even navigate you to the recovered text.
  3. Instead of using Word, open the file with a text editor, such as the free Notepad++ (or the less capable Notepad in Windows Accessories). You might have to wade through a bunch of gibberish, but you may be able to locate the real text to save it. At least one user online reported this was better for saving text with accents, umlauts and the like.
  4. Find an old computer with Office 2004, which still opened the older formats. Most of us don’t have lots of old computers lying around, but you can check garage sales, or haunt e-recycling events to grab one before they are hauled off. (Check to make sure the hard drive is intact and that Word is still installed.) You might keep one just for the floppy drive.
  5. Try a different program, such as LibreOffice (or Apache OpenOffice) or even Google Docs.
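If wading through gibberish in a text editor (option 3) gets tedious, a few lines of Python can do the sifting for you. This is only a sketch of the idea, not a real file converter. The function names are invented for this example, and it assumes the old file stored its text as Latin-1 (Western European) characters, which may not hold for DOS-era files that used other code pages.

```python
import re


def extract_readable_text(data, min_length=4):
    """Return runs of printable characters found in raw bytes.

    Assumption: the text is Latin-1, so accented letters such as
    e-acute or u-umlaut are kept. Other code pages will come out garbled.
    """
    # Latin-1 maps every byte to exactly one character, so decoding never fails.
    text = data.decode("latin-1")
    # Keep tabs, ordinary ASCII, and accented Latin-1 characters.
    pattern = r"[\t\x20-\x7e\xa0-\xff]{%d,}" % min_length
    return re.findall(pattern, text)


def extract_from_file(path, min_length=4):
    """Read a mystery file as raw bytes and pull out the readable runs."""
    with open(path, "rb") as handle:
        return extract_readable_text(handle.read(), min_length)
```

Raising min_length filters out more of the short junk fragments, at the risk of dropping very short lines of real text.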

Going the other direction – from a new file to an old computer: If you use an old version of Word (Office XP, Office 2000, or Office 2003), Microsoft offers a Compatibility Pack for newer Word, Excel and PowerPoint files.

If you own no version of Microsoft Word, you can download Microsoft’s Word Viewer which allows you to view 21st century versions of Word, RTF, Text, WordPerfect 5.x and 6.x files, Microsoft Works 6.0 and 7.0 files, and XML files. Note this list does not include the older versions of Word described above, but it might be worth a try. Word Viewer will let you copy the text and edit it in another word processor, but you can’t edit within the Viewer.

Using LibreOffice

From the beginning, OpenOffice was used to open and edit files made by a variety of different editors. LibreOffice, the most modern version of OpenOffice, has been adding to the number of legacy files it can open, read, save and print. Even if you don’t use LibreOffice to write, you can keep a copy available just to translate files if you need to. It’s free, after all.

LibreOffice opens most modern and legacy Word files, including:

  • Word 2007-2013 (OOXML) files and templates
  • Word 97-2003 (DOC) files and templates
  • Word 2003 XML
  • Word 6.0 / 95 files and templates (1993-1995)
  • WinWord (Word for Windows) versions 1, 2 and 5 (there was no version 3 or 4)
  • Rich Text Formats (RTF)
  • Older versions of Word for Macintosh, too.

Also:

  • Microsoft Works word processing files, spreadsheets and possibly databases
  • Older Microsoft Publisher files may be opened and viewed in LibreOffice Draw. So may some PDF files.

Another trick, similar to Microsoft’s “Recover Text from any File,” is to use the open command, select Text Documents from the file format drop-down menu, and try to open the file from there.

WordPerfect Files

WordPerfect is still an active program, but you might not need to purchase it just to open old files. WPD files can be opened in LibreOffice and in SoftMaker Office and SoftMaker FreeOffice. Microsoft Word can open WordPerfect 5.0 and 6.0 files. For earlier formats, see our tricks above. If these solutions don’t work, you can read the exhaustive How to Import WordPerfect Files into Microsoft Word on the Columbia University website.

AppleWorks

If you have old AppleWorks files, iWork may now be the only way to open them on a modern Mac without using a text editor to try to extract the text.

Graphics and Artwork

Man does not live by words alone. There are even more old graphics formats than there are writing tools, ranging back to MacPaint and MacDraw and further.

One of the best ways to open legacy digital art is to use the free IrfanView, developed by Irfan Skiljan, which can open and translate almost any digital artwork known to mankind.

Saving Your Work for Posterity

As you can see, most commercial writing programs change formats over time and file compatibility becomes a problem. This is a serious problem for writers (and their descendants) who want to archive their writing. There are several solutions, but none of them is perfect. Storage media is a separate problem. Here are some possible ways to save your writing for posterity.

Constant Vigilance

Whenever a new file format is created (Word, OpenOffice or other), create a system for translating your files into the new format. This also requires updating your writing software from time to time. Depending on the quantity of writing you do, this could be very time consuming and some writers will prefer to spend the time writing new material rather than preserving old.
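One way to make such a system less painful is to let LibreOffice do the translating in bulk from the command line. The Python sketch below builds and runs that command for every matching file in a folder. Treat it as a starting point under a few assumptions: LibreOffice must be installed, the program name soffice must be on your system path (the name and location vary by installation), and the folder names and function names are placeholders invented for this example.

```python
import subprocess
from pathlib import Path


def build_convert_command(source, target_format="odt", outdir="converted"):
    """Return the LibreOffice command that converts one file.

    Assumes the LibreOffice executable is available as "soffice".
    """
    return [
        "soffice",
        "--headless",            # run without opening any windows
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert_folder(folder, pattern="*.doc", target_format="odt", outdir="converted"):
    """Convert every file in the folder that matches the pattern."""
    for source in sorted(Path(folder).glob(pattern)):
        subprocess.run(
            build_convert_command(source, target_format, outdir),
            check=True,  # stop with an error if LibreOffice reports a failure
        )
```

For example, convert_folder("old_writing") would write ODT copies of every DOC file in that folder into a folder called converted, leaving the originals untouched. Spot-check a few of the results before trusting the whole batch.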

Whenever you buy a new computer, transfer all your old writings over to it (or to a separate hard drive or two). Keep backups, too.

Text only solutions

Of all the file formats, text-only formats are the most likely to survive, and new programs should be able to import them. Both ASCII text and Unicode text should last longer than proprietary formats.

Open Source Solutions

The open source community is trying to address the problems of longevity and file compatibility in various ways.

One way is by creating international standards, such as ODF (OpenDocument Format), which includes the word processing standard, ODT. This file format will probably change in the future, but it is likely the new versions will be backwards-compatible. The file format is also usable on Windows, Macintosh and Linux computers, which increases compatibility, too. There are currently efforts to make ODT formats usable on iOS and Android systems, too.

The ODF format is attractive to governments and businesses that need to archive documents for decades (or even centuries). This has been one of the arguments for several European governments changing over from Microsoft Office to various OpenOffice programs.

Also, the creators of LibreOffice have expanded the ability to import older proprietary formats, including older Macintosh outlining files, various graphics formats, Microsoft Works (Windows) files, older Microsoft Publisher files and more. The ability to import, view and print these documents also makes LibreOffice attractive to those concerned with longevity.

Microsoft has tried to develop its own OOXML format into an international standard. The fact that they have changed that standard at least three times in the past ten years or so has not helped their effort, though.

HTML

HTML formats change fairly rapidly, but they are all based on a text file standard, thus preserving the text content with instructions on formatting and style. Graphics and other media are stored separately, and imported via links.
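Because HTML is just text with markup, the words can still be recovered even if browsers someday stop understanding the tags. Here is a minimal sketch using the HTML parser in Python's standard library. The class and function names are invented for this example, and it makes no attempt to skip scripts or style blocks, so treat it as an illustration rather than a finished tool.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the readable text of an HTML string, joined by spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


print(html_to_text("<h1>Title</h1><p>Some <em>marked</em> text.</p>"))
```

The formatting instructions are thrown away, but the writing itself comes through.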

Markdown

Some advocates of using Markdown note that it is a simple, readable all-text format. If you want to preserve some elementary text formatting, Markdown might be a possibility, if you know how to interpret it (which is fairly simple). Markdown has been around for about 11 years, but it’s still a bit of a niche product for bloggers and programmers. (See our Frugal Guidance 2 Markdown tutorials if you want to learn more.)

PDF

The entire purpose of the PDF format, as created by Adobe, was to preserve formatting, graphics, and text within a single file. Although the PDF format is pretty ubiquitous these days, it is difficult to know how long people will find it useful or if it will be replaced by something better.

Non-acidic Paper

There is a non-digital way to preserve web pages, books, text and most file formats called printed paper. Non-acidic paper is available for archival purposes, and has been known, in extreme cases, to last well over a thousand years in the proper, dry environment. Print may be the most viable preservation format, used in combination with digital file formats.

This is contingent, of course, on the human race not destroying itself (or being destroyed by natural causes) over the next thousand years. Perhaps keeping a working library on the moon in non-atmospheric conditions might improve the longevity of printed paper, but I’ve heard of no efforts to implement such a scheme. (It’s just one more reason I believe that moon settlement should be a higher priority than Martian colonization, but that’s another argument for another day.)

I hope this discussion of file formats for writers has been useful. If anything is unclear, incomplete, or (horrors) incorrect, please let me know in the comments.

Credits:

Title photo, Office Things, used courtesy of Viktor Hanacek and picjumbo.

Frugal Guidance 2 - http://andybrandt531.com