Sunday 14 October 2012

Creating an information map from multiple Word documents

When I'm asked to plan information for a new customer I usually start with a content audit. The quickest and easiest way to understand a suite of MS Word documents (and give yourself a place to keep notes) is by creating a spreadsheet of the documents and tables of contents (TOC).

For a while I've been creating these spreadsheets by laboriously copying the TOC from each Word document (in the latest set, that's 35 documents). This time, though, I asked the STC single-sourcing special interest group (SIG) for help automating the process, and they gave me a web link that suggests using RD fields to collate all the TOCs into one.

I've written down my process here. It's a bit rough and ready but hopefully it helps. Obviously you'll need to adapt it to your situation but it gives some tips in terms of creating CSV files and using regular expressions for search and replace. If it helps you then please leave a brief comment :)  Thanks.


  1. Follow the instructions here to build an RD TOC (thanks Virginia).
  2. In MS Word, switch off page numbering and hyperlinks in the TOC so that it is headings only.
  3. Copy and paste the TOC into Notepad++.
    My TOCs have heading numbering, which I can use to create the commas for a comma-separated values (CSV) file. Spreadsheet programs can open CSV files as spreadsheets.
  4. To prevent genuine commas breaking the structure, in Notepad++, search and replace commas "," with a unique string like c0mmr4.
  5. Press Ctrl+H for search and replace and select the Regular expression search mode.
  6. To extend the heading numbering style with an end period, replace tabs with a period:
       Replace "\t" with "." (no inverted commas).
  7. To create the correct number of empty cells for each indent level, replace each number followed by a period with a comma:
       Replace "[0-9]+\." with ","
  8. Save the file.
  9. Grab the file and folder name; right-click the Notepad++ tab and select Copy complete path.
  10. In MS Excel, select File > Open and paste in the file path and name.
  11. Search and replace "c0mmr4" with ",".
  12. Click OK.
That's it. You should have an outlined TOC for your suite of documents. Now you just need to read them :)
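
If you'd rather script steps 4 to 11 than click through them, the same replacements can be sketched in Python. This is only a rough sketch and not part of the process above: the function names toc_line_to_cells and toc_to_csv are made up for illustration, and it assumes each TOC line is a heading number, then a tab, then the heading text. Instead of swapping the placeholder back in Excel, it restores the commas cell by cell and lets Python's csv module quote them.

```python
import csv
import io
import re

# Stand-in for genuine commas, the same idea as the unique string in step 4.
PLACEHOLDER = "c0mmr4"


def toc_line_to_cells(line):
    """Turn one TOC line such as '1.2<TAB>Scope' into a list of cells."""
    protected = line.replace(",", PLACEHOLDER)    # step 4: hide genuine commas
    dotted = protected.replace("\t", ".")         # step 6: tab becomes an end period
    indented = re.sub(r"[0-9]+\.", ",", dotted)   # step 7: each "number." becomes a comma
    # Split into cells, then put the genuine commas back (step 11).
    return [cell.replace(PLACEHOLDER, ",") for cell in indented.split(",")]


def toc_to_csv(toc_text):
    """Convert a whole pasted TOC into CSV text, skipping blank lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in toc_text.splitlines():
        if line.strip():
            writer.writerow(toc_line_to_cells(line))
    return out.getvalue()


if __name__ == "__main__":
    sample = "1\tIntroduction\n1.1\tScope, aims\n"
    print(toc_to_csv(sample), end="")
```

Like the regular expression in step 7, this turns any number followed by a period into a comma, including one inside a heading (for example "2.0"), so check any headings that contain version numbers.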
