Fixing a document broken by LibreOffice when saving as DOCX.

The moral of the story: back up the document before making changes, and save to ODT first, then to DOCX.
The fun bit is then recovering the file.

Copy the docx file, rename the copy as a zip file, and extract it into a temporary directory.
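
As a rough sketch, the copy, rename and extract steps can be done from a terminal. The function name extract_docx and the names broken.docx and work are made up for illustration, and this assumes the unzip tool is installed.

```shell
# Hypothetical helper: copy a .docx to a .zip and extract it into a working directory.
# Usage: extract_docx broken.docx work
extract_docx() {
    src=$1
    dest=$2
    # Work on a copy so the original file is never touched.
    cp -- "$src" "${src%.docx}.zip" &&
    mkdir -p -- "$dest" &&
    # -q: quiet, -o: overwrite without prompting, -d: target directory
    unzip -q -o "${src%.docx}.zip" -d "$dest"
}
```

After running it, the main document text should be in work/word/document.xml.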

Check that the XML is well formed using:

cat word/document.xml | tidy -xml

Open word/document.xml in a simple text editor such as notepadqq, go to the column reported in the error, and remove the offending tag (or add the missing one, depending on what the error is).
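
The XML in a docx is often written as one or two very long lines, so finding the column by eye can be painful. A small sketch that prints the text around a reported line and column is below. The function name show_context is made up, and it assumes the reported column counts characters the same way awk does, which may not hold for non-ASCII text.

```shell
# Hypothetical helper: print the text surrounding a reported line/column.
# Usage: show_context FILE LINE COLUMN [WIDTH]
show_context() {
    awk -v line="$2" -v col="$3" -v width="${4:-80}" '
        NR == line {
            # Start WIDTH characters before the column, but not before column 1.
            start = col - width
            if (start < 1) start = 1
            # Print up to WIDTH characters on either side of the column.
            print substr($0, start, col - start + width)
            exit
        }' "$1"
}
```

For example, show_context word/document.xml 2 15340 prints roughly 80 characters on either side of column 15340 on line 2 (the numbers here are placeholders for whatever the error message reports).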

Rerun:

cat word/document.xml | tidy -xml

Remove (or add) the next tag in the same way and keep going until there are no more errors.
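
Since each pass only needs the next error, it can help to hide the reformatted XML and show just the first few messages. This is a sketch using tidy's -errors and -quiet options. The function name first_errors is made up, and the exact wording of the messages depends on the tidy version installed.

```shell
# Hypothetical helper: show only the first few problems tidy reports.
# Usage: first_errors word/document.xml [COUNT]
first_errors() {
    # -errors: report errors and warnings only, without printing the tidied XML
    # -quiet:  suppress nonessential output
    # tidy writes its messages to stderr, so merge that into the pipe for head.
    tidy -xml -errors -quiet "$1" 2>&1 | head -n "${2:-5}"
}
```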

Create a new zip file. Make sure you recreate the same structure as the original docx/zip file.
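
An easy way to get the structure wrong is to zip the temporary directory itself, which puts everything one level too deep. A minimal sketch that zips from inside the directory instead is below. The function name repack_docx and the names work and fixed.zip are made up, and it assumes the zip tool is installed.

```shell
# Hypothetical helper: zip the contents of DIR so its files sit at the archive root.
# Usage: repack_docx work fixed.zip
repack_docx() {
    dir=$1
    # Make the output path absolute, because we change directory below.
    case $2 in
        /*) out=$2 ;;
        *)  out=$PWD/$2 ;;
    esac
    # Remove any old archive first; zip would otherwise update it in place.
    rm -f -- "$out"
    # -r: recurse, -X: leave out extra file attributes, -q: quiet
    # Zipping "." from inside DIR also picks up hidden files such as _rels/.rels.
    (cd -- "$dir" && zip -q -r -X "$out" .)
}
```

To compare the result with the original, list both with unzip -Z1 fixed.zip | sort and unzip -Z1 on the original docx, and check that the file names match.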

Rename the zip file as a docx file, open it in LibreOffice, and save as ODT and DOCX.

Check the structure... fingers crossed all is well again.

The second moral is do not enable change tracking when using LibreOffice.

The third moral is save backups regularly.
