Bob Hoffmann's Web Tips  
 


 
 
 

Cleaning up Microsoft Word and Excel code

Update: Newer versions of Microsoft Office, such as 2003, allow you to save as "Web Page, Filtered." Save your file in this format before running "Import Word HTML."


Transforming your Microsoft Word or Excel document to HTML format can be as simple as using the "Save as" function under your File menu. The disappointment comes when you examine your source code and find MS-Bloat, a curious condition that can double or even quadruple your file size. This will increase download time while making it more difficult for you to format your document.

Do not despair! There are Dreamweaver treatment plans for MS-Bloat.

Microsoft Word
There are two ways to deal with Word-Bloat. The first, presuming you have Dreamweaver 4.0 or higher, is to simply copy your text from Word and paste it into Dreamweaver. Formatting is eliminated, but your carriage returns stay intact. Each carriage return is converted to a line break (<BR>) rather than a paragraph (<P>).

If you do a lot of this work and choose to convert the breaks to paragraphs, you will want to see the breaks in your design view. To do so, set your interface to show invisible elements (View, Visual Aids, Invisible Elements). Additionally, check under Edit, Preferences, Invisible Elements, to make sure "Line Breaks" is selected. Each break then appears in your design view as a small Line Break symbol. [image: Dreamweaver Line Break symbol]

Simply select the symbol and hit Enter to convert from <BR> to <P>.
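
If you have many pasted files to fix, the same <BR>-to-<P> conversion can be sketched outside Dreamweaver. The short Python example below is only an illustration, not a Dreamweaver feature; the function name breaks_to_paragraphs is made up for this sketch, and it assumes the fragment contains plain text separated by break tags and nothing else.

```python
import re


def breaks_to_paragraphs(fragment: str) -> str:
    """Turn <BR>-separated text into one <p> element per line of text.

    Hypothetical helper for illustration; assumes the fragment holds
    only text and <br> tags (no nested block elements).
    """
    # Split on <br>, <BR>, <br/> or <br />.
    pieces = re.split(r"<br\s*/?>", fragment, flags=re.IGNORECASE)
    # Drop empty pieces so doubled breaks do not create empty paragraphs.
    paragraphs = [piece.strip() for piece in pieces if piece.strip()]
    return "\n".join(f"<p>{text}</p>" for text in paragraphs)


if __name__ == "__main__":
    print(breaks_to_paragraphs("First line<BR>Second line<br />"))
```

Doubled breaks are collapsed on purpose, since pasted Word text often uses two breaks where one paragraph boundary was intended.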

The second way to convert a Word document to clean HTML is to export the document from Word to HTML (Save as, Save as type Web Page/*.htm). Then, from a Dreamweaver document window (sorry, it doesn't work from the Site Files window), select File, Import, Import Word HTML. Select your file and click "OK." Dreamweaver will tell you what it has done to clean the code. Don't forget to save your file. The imported file will still have some yucky MS-Bloat, but it will be much cleaner than it was before.
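
To give a feel for what this kind of cleanup involves, here is a rough Python sketch that strips a few common pieces of Word markup. It is not what Dreamweaver actually runs. The patterns are assumptions about typical Word output (conditional comments, namespaced tags such as <o:p>, "Mso" class names, and inline styles), and the function name strip_word_markup is invented for the example.

```python
import re


def strip_word_markup(source: str) -> str:
    """Remove some typical Word-generated markup from an HTML string.

    Illustrative only: regular expressions are not a full HTML parser,
    and the patterns below are assumptions about common Word output.
    """
    # Remove conditional comments such as <!--[if gte mso 9]> ... <![endif]-->.
    cleaned = re.sub(
        r"<!--\[if.*?<!\[endif\]-->", "", source,
        flags=re.DOTALL | re.IGNORECASE,
    )
    # Remove namespaced Office tags such as <o:p> and </o:p>, keeping their text.
    cleaned = re.sub(r"</?\w+:[^>]*>", "", cleaned)
    # Remove class attributes whose value starts with "Mso" (quoted or not).
    cleaned = re.sub(r'\s+class="?Mso\w*"?', "", cleaned, flags=re.IGNORECASE)
    # Remove double-quoted inline style attributes entirely.
    cleaned = re.sub(r'\s+style="[^"]*"', "", cleaned, flags=re.IGNORECASE)
    return cleaned


if __name__ == "__main__":
    sample = '<p class=MsoNormal style="margin:0in">Hello<o:p></o:p></p>'
    print(strip_word_markup(sample))
```

Treat this as a starting point for your own search-and-replace patterns; real Word exports contain more variations than these four rules cover.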

Excel
As bad as Word's HTML export is, Excel's is much worse. Excel produces bloated code on a monumental scale. You will wonder how it ever decided to insert all of those classes, styles, table cell heights and widths, etc. Even with Dreamweaver's advanced Search & Replace functions, it can take an hour to de-bloat your code.

There is a faster and better way. Simply export your Excel spreadsheet as a tab-delimited text file (File, Save As, Save as type: Text (Tab delimited)). Then, in your Dreamweaver document window, choose File, Import, Import Tabular Data. Browse to your file, select any table features you want from the dialog box, and click OK. Voilà! Your table is imported cleanly into Dreamweaver. You can also use this technique for files delimited with commas, semicolons, and other characters.
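
The idea behind the tab-delimited route is easy to see in a few lines of code: a delimited file carries only the cell values, so the resulting table has nothing to clean up. The Python sketch below is an illustration of that idea, not Dreamweaver's importer; delimited_to_table is a hypothetical name, and the output is a bare table with no header row or attributes.

```python
import csv
import html
import io


def delimited_to_table(text: str, delimiter: str = "\t") -> str:
    """Build a minimal HTML table from delimited text.

    Hypothetical helper for illustration; every row becomes a <tr>
    and every field a <td>, with no attributes at all.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        # Escape &, < and > so cell text cannot break the markup.
        cells = "".join(f"<td>{html.escape(value)}</td>" for value in record)
        rows.append(f"  <tr>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(delimited_to_table("Name\tQty\nApples\t3\n"))
```

Pass a different delimiter argument (for example "," or ";") to handle comma- or semicolon-delimited files the same way.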

 

Microsoft Filter for Office 2000:
Microsoft also has an HTML Filter that removes all the Office-specific tags. See their Info & Download Page.

It works OK when exporting from Word. Just open the document you want to filter; then, on the File menu, point to Export To and click Compact HTML. You should still use Dreamweaver's "Import Word HTML" function on the file.

To work on an Excel file, you'll need to export with Excel, and then open the HTML file with the filter running independently. Go into "Options" to eliminate CSS and Styles. It still leaves table element width and height attributes, but the file is much cleaner than before.
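
If the leftover width and height attributes bother you, they can be removed with a pattern-based pass. The Python sketch below is one hedged way to do it and is not part of the Microsoft filter or Dreamweaver; strip_table_sizes is a made-up name, and the list of table tags it touches is an assumption about what Excel typically writes.

```python
import re

# Matches a width or height attribute with a quoted or unquoted value.
SIZE_ATTRIBUTE = re.compile(
    r'\s+(?:width|height)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)',
    re.IGNORECASE,
)

# Matches the opening tag of the table elements we want to clean.
TABLE_TAG = re.compile(r"<(?:table|tr|td|th|col)\b[^>]*>", re.IGNORECASE)


def strip_table_sizes(source: str) -> str:
    """Remove width/height attributes from table-related opening tags.

    Hypothetical helper for illustration; other tags (such as <img>)
    are left untouched.
    """
    def clean_tag(match: re.Match) -> str:
        return SIZE_ATTRIBUTE.sub("", match.group(0))

    return TABLE_TAG.sub(clean_tag, source)


if __name__ == "__main__":
    print(strip_table_sizes('<td width=64 height="17">42</td>'))
```

Limiting the pass to table tags keeps image dimensions intact, which you usually want to preserve.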

For an even cleaner and faster option, export Excel to a tab-delimited file (see the Excel section above).

 


 
                         
 
 
Refer questions or comments to Bob Hoffmann, 509-335-7744.
CAHNRS Information Department, 401 Hulbert Hall, Washington State University, Pullman, Washington, 99164-6244.