Category Archives: Document Management

How To : Convert Word Documents to PDF using SharePoint Server 2010 and Word Services and Stop Wasting Money on MS Field Engineers

SharePoint’s Word Automation Services is extremely powerful when it comes to converting document types and keeping the formatting.
Combine this with the flexibility of the templates that OpenXML use and you have a very powerful combination capapble of ANYTHING.
This is part of a Document Management System I developed for Nedbank that a Microsoft Field Engineer said was Impossible in SharePoint and asked R300 000 just for a few hours of his time for “analysis

SharePoint 2010Word Automation Services available with SharePoint Server 2010 supports converting Word documents to other formats. This includes PDF. This article describes using a document library list item event receiver to call Word Automation Services to convert Word documents to PDF when they are added to the list. The event receiver checks whether the list item added is a Word document. If so, it creates a conversion job to create a PDF version of the Word document and pushes the conversion job to the Word Automation Services conversion job queue.

This article describes the following steps to show how to call the Word Automation Services to convert a document:

  1. Creating a SharePoint 2010 list definition application solution in Visual Studio 2010.
  2. Adding a reference to the Microsoft.Office.Word.Server assembly.
  3. Adding an event receiver.
  4. Adding the sample code to the solution.

Creating a SharePoint 2010 List Definition Application in Visual Studio 2010

This article uses a SharePoint 2010 list definition application for the sample code.

To create a SharePoint 2010 list definition application in Visual Studio 2010

  1. Start Microsoft Visual Studio 2010 as an administrator.
  2. From the File Menu, point to the Project menu and then click New.
  3. In the New Project dialog box select the Visual C# SharePoint 2010 template type in the Project Templates pane.
  4. Select List Definition in the Templates pane.
  5. Name the project and solution ConvertWordToPDF.
    Figure 1. Creating the Solution

    Creating the solution

  6. To create the solution, click OK.
  7. Select a site to use for debugging and deployment.
  8. Select the site to use for debugging and the trust level for the SharePoint solution.
    Note
    Make sure to select the trust level Deploy as a farm solution. If you deploy as a sandboxed solution, it does not work because the solution uses the Microsoft.Office.Word.Server assembly. This assembly does not allow for calls from partially trusted callers.
    Figure 2. Selecting the trust level

    Creating the solution

  9. To finish creating the solution, click Finish.

Adding a Reference to the Microsoft Office Word Server Assembly

To use Word Automation Services, you must add a reference to the Microsoft.Office.Word.Server to the solution.

To add a reference to the Microsoft Office Word Server Assembly

  1. In Visual Studio, from the Project menu, select Add Reference.
  2. Locate the assembly. By using the Browse tab, locate the assembly. The Microsoft.Office.Word.Server assembly is located in the SharePoint 2010 ISAPI folder. This is usually located at C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\ISAPI. After the assembly is located, click OK to add the reference.
    Figure 3. Adding the Reference

    Adding the reference

Adding an Event Receiver

This article uses an event receiver that uses the Microsoft.Office.Word.Server assembly to create document conversion jobs and add them to the Word Automation Services conversion job queue.

To add an event receiver

  1. In Visual Studio, on the Project menu, click Add New Item.
  2. In the Add New Item dialog box, in the Project Templates pane, click the Visual C# SharePoint 2010 template.
  3. In the Templates pane, click Event Receiver.
  4. Name the event receiver ConvertWordToPDFEventReceiver and then click Add.
    Figure 4. Adding an Event Receiver

    Adding an event receiver

  5. The event receiver converts Word Documents after they are added to the List. Select the An item was added item from the list of events that can be handled.
    Figure 5. Choosing Event Receiver Settings

    Choosing even receiver settings

  6. Click Finish to add the event receiver to the project.

Adding the Sample Code to the Solution

Replace the contents of the ConvertWordToPDFEventReceiver.cs source file with the following code.C

 
using System;
using System.Security.Permissions;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Security;
using Microsoft.SharePoint.Utilities;
using Microsoft.SharePoint.Workflow;

using Microsoft.Office.Word.Server.Conversions;

namespace ConvertWordToPDF.ConvertWordToPDFEventReceiver
{
  /// <summary>
  /// List Item Events
  /// </summary>
  public class ConvertWordToPDFEventReceiver : SPItemEventReceiver
  {
    /// <summary>
    /// An item was added.
    /// </summary>
    public override void ItemAdded(SPItemEventProperties properties)
    {
      base.ItemAdded(properties);

      // Verify the document added is a Word document
      // before starting the conversion.
      if (properties.ListItem.Name.Contains(".docx") 
        || properties.ListItem.Name.Contains(".doc"))
      {
        //Variables used by the sample code.
        ConversionJobSettings jobSettings;
        ConversionJob pdfConversion;
        string wordFile;
        string pdfFile;

        // Initialize the conversion settings.
        jobSettings = new ConversionJobSettings();
        jobSettings.OutputFormat = SaveFormat.PDF;

        // Create the conversion job using the settings.
        pdfConversion = 
          new ConversionJob("Word Automation Services", jobSettings);

        // Set the credentials to use when running the conversion job.
        pdfConversion.UserToken = properties.Web.CurrentUser.UserToken;

        // Set the file names to use for the source Word document
        // and the destination PDF document.
        wordFile = properties.WebUrl + "/" + properties.ListItem.Url;
        if (properties.ListItem.Name.Contains(".docx"))
        {
          pdfFile = wordFile.Replace(".docx", ".pdf");
        }
        else
        {
          pdfFile = wordFile.Replace(".doc", ".pdf");
        }

        // Add the file conversion to the conversion job.
        pdfConversion.AddFile(wordFile, pdfFile);

        // Add the conversion job to the Word Automation Services 
        // conversion job queue. The conversion does not occur
        // immediately but is processed during the next run of
        // the document conversion job.
        pdfConversion.Start();

      }
    }
  }
}

Examples of the kinds of operations supported by Word Automation Services are as follows:

  • Converting between document formats (e.g. DOC to DOCX)
  • Converting to fixed formats (e.g. PDF or XPS)
  • Updating fields
  • Importing “alternate format chunks”

This article contains sample code that shows how to create a SharePoint list event handler that can create Word Automation Services conversion jobs in response to Word documents being added to the list. This section uses code examples taken from the complete, working sample code provided earlier in this article to describe the approach taken by this article.

The ItemAdded(SPItemEventProperties) event handler in the list event handler first verifies that the item added to the document library list is a Word document by checking the name of the document for the .doc or .docx file name extension.

// Verify the document added is a Word document
// before starting the conversion.
if (properties.ListItem.Name.Contains(".docx") 
    || properties.ListItem.Name.Contains(".doc"))
{

If the item is a Word document then the code creates and initializes ConversionJobSettings and ConversionJob objects to convert the document to the PDF format.

C#
 
//Variables used by the sample code.
ConversionJobSettings jobSettings;
ConversionJob pdfConversion;
string wordFile;
string pdfFile;

// Initialize the conversion settings.
jobSettings = new ConversionJobSettings();
jobSettings.OutputFormat = SaveFormat.PDF;

// Create the conversion job using the settings.
pdfConversion = 
  new ConversionJob("Word Automation Services", jobSettings);

// Set the credentials to use when running the conversion job.
pdfConversion.UserToken = properties.Web.CurrentUser.UserToken;

The Word document to be converted and the name of the PDF document to be created are added to the ConversionJob.

 
// Set the file names to use for the source Word document
// and the destination PDF document.
wordFile = properties.WebUrl + "/" + properties.ListItem.Url;
if (properties.ListItem.Name.Contains(".docx"))
{
  pdfFile = wordFile.Replace(".docx", ".pdf");
}
else
{
  pdfFile = wordFile.Replace(".doc", ".pdf");
}

// Add the file conversion to the Conversion Job.
pdfConversion.AddFile(wordFile, pdfFile);

Finally the ConversionJob is added to the Word Automation Services conversion job queue.

 
// Add the conversion job to the Word Automation Services 
// conversion job queue. The conversion does not occur
// immediately but is processed during the next run of
// the document conversion job.
pdfConversion.Start();

Creating Your Own Document Management System With SharePoint and Dynamix

Image

With the R2 release of Dynamics AX 2012, a new feature was quietly snuck into the product that allows you to store document attachments from Dynamics AX within SharePoint rather than within an archive location, or within the database. This opens up a whole slew of possibilities when it comes to document management within SharePoint.

In this example we will show how you can create a document management structure within SharePoint that you can use in conjunction with the Dynamics AX attachments feature, and also we will show a few tweaks that you can make that may make managing your documents within SharePoint just a little easier.

Creating a new Document Management Site

The first step that we are going to work through is the creation of a new Document Management site where we will put all of our Dynamics AX document attachments. We are just creating a site to separate out the documents from other items that you may already have stored within SharePoint.

How to do it…

To create a new Document Management Site in SharePoint, follow these steps:

  1. Open up the SharePoint Workspace that you want to use to house your Document Management site and from the Site Actions menu, select the New Site option.
  2. When the Site Templates are displayed, select the Blank Site template. Give your site a name, and also a sub site name (probably the same as your site name). When you are done, click on the Create button for start the site creation process.

How it works…

When the site is created, you should be taken to a new blank site which you will be able to use as a document repository.

Creating Document Libraries for the Business Areas

The next step in the process is to create document libraries to store all of your documents away in. You could create one big library, or a number of smaller ones, broken out into groups based on business area or function. In this example we will do the latter because it will give us more flexibility with the indexing of the documents, and also make it easier to find particular documents.

How to do it…

To create document libraries for the business areas, follow these steps:

  1. From within your new Document Management site, select the New Document Library option from the Site Actions menu.
  2. When the Document Library Creation dialog box appears, give your library a Name, Description, and also set the Document Template to None. In this example we are creating a library for all of the AP Documents.
    When you are done, click on the Create button to start the document library creation process.

How it works…

After it finishes you should have a new library for you to use. You can repeat the process for all of the other business areas that you want to manage documents for – in our example we just used the standard business areas from the Dynamics AX navigation menu.

Creating Dynamics AX Document Types that Link to the Document Libraries

Once you have created your document libraries, you can connect them to Dynamics AX with the new SharePoint option so that the users are able to attach documents from the client and then store them within SharePoint for everyone to access.

How to do it…

To create a file attachment type that links to SharePoint, follow these steps:

  1. From the Organization administration area page, select the Document types menu item from the Document management folder of the Setup group.
  2. When the Document types maintenance form is displayed, click on the New button in the menu bar to create a new entry.
  3. We will start by creating a link for all of our generic Accounts Payable documents by giving our new Document type a Name and Description. In the Group select File from the dropdown options and select SharePoint for the Location option.
  4. Now return back to your document libraries within SharePoint and copy the URL for the document library.
  5. Paste the URL into the Archive
    directory field.

    Note: Remove all of the extra parameters though so that you are just referencing the base folder location.
    Also, if you click on the folder browser to the right of the Archive directory field you can test the link to SharePoint.

How it works…

Now, if you attach a document, then you will see the option for your new document type.

It will allow you to attach any file that you have on your desktop.

And rather than showing you the thumbnail image, it will show you a reference link to your SharePoint document library.

After attaching the document, if you look within SharePoint, you will see your document is saved away for you.

You can repeat this process for all of your other document libraries that were created in the previous step.

Adding Columns to your Document Libraries for Better Indexing

One of the reasons why you want to start using SharePoint is so that you can take advantage of the indexing functionality to code and classify your documents. Now that you have people storing the documents away, it’s time to add some indexes to you document libraries.

How to do it…

To create new indexes for your document libraries, follow these steps:

  1. Open up your document libraries within SharePoint and select the Library ribbon bar. Then click on the Create Column button within the Manage Views group.
  2. When the Create Column form opens, set the Column Name to be the field that you want to index, select the type of the column, and also set the columns Description.
  3. Note: Sometimes it’s a good idea not to have spaces in the column name, later on when we add filters, it becomes a litter easier to manage this way.
  4. After you have finished defining the column, click on the OK button to add the column to your library.
  5. When you return back to your document library, there will be a new column on the form.
  6. Repeat the process for all of the columns that you want to use as index fields for the library.

    Note: All of the columns do not have to be used during the indexing process, so it’s OK to have variations of columns, like InvoiceNum, CreditNoteNum, etc.

How it works…

To edit the columns, select the options menu for the document, and choose the Edit Properties option.

This will allow you to update the fields that Dynamics AX did not populate initially.

Now when you look at the document within SharePoint, you will see the additional metadata that is associated with the document.

Embedding Document Libraries into Dynamics AX Forms

Now that we are able to index documents a little more effectively within SharePoint, we can go the extra step, and link SharePoint to our forms within Dynamics AX so that we are able to access them without even leaving the application. Doing this just requires a little bit of coding, but is well worth the effort.

Getting Started…

You can manipulate the information that is displayed by SharePoint, and also how it is displayed through the URL that you use.

If you filter any of the views, then you will notice that it uses two qualifiers – FilterFieldX
and FilterValueX to restrict the viewed records.

Also, if you add a IsDlg=1 qualifier, then all of the navigation areas are hidden giving you a clean list of filtered documents.

This is the perfect type of view to embed within Dynamics AX.

The other half of this step is to choose a form to add your document libraries to. In this case we will update the Vendors form.

How to do it…

To embed your SharePoint document libraries within Dynamics AX forms, follow these steps:

  1. Start the process by opening up AOT, and create a new project for this tweak.
  2. From the Forms area in the AOT, drag over the form that you want to add the documents to – in this case it’s the VendTable form.
  3. Expand out the form within the project, and navigate to the group that you want to add your document library view into.
  4. Right-mouse-click on the parent tab, and select the TabPane option from the New Control sub-menu.
  5. Reorder your tabs (ALT+UpArrow/DownArrow) so that they are in the sequence that you want and then give your new Tab Control a Name and Caption in the Properties section.
  6. Right-mouse-click on the new tab that you added for the document library and select the ActiveX control from the New Control sub menu.
  7. When the list of ActiveX controls are displayed, select the Microsoft Web Browser control, and click the OK button to add it.
  8. Rename your ActiveX control, and set the Width to Column width
    and Height to Column height.
  9. Now we need to have Dynamics AX update the URL that is navigated to when the form is opened. To do this, right-mouse-click on the parent Methods group for the form, and select the activate method from the Override methods submenu.
  10. Update the activate method by building the URL that will define the specific document index that you are wanting to show. You are able to now add conditional filters that pick up the record values, and filter based on the current record – in this case the vendor number.
  11. Once you have finished the update, save the project.

How it works…

Now when you open up the Vendor form, there will be a Documents tab that shows all of the documents that are associated with the current record.

If you select a record that does not have attached documents, then you will not see anything there.

How cool is that.

Creating Custom Views for the Document Libraries

Now that all of the heavy lifting has been done, you can now start tweaking the SharePoint libraries and the way that the information is displayed. Based on the form that you are in you may want to show only particular information. You can do that by creating new custom views.

How to do it…

To create a custom view for your document libraries, follow these steps:

  1. Open up your document libraries within SharePoint and select the Library ribbon bar. Then click on the Library Settings button within the Settings group.
  2. When the document library settings maintenance form is displayed, scroll down to the bottom of the page, and there will be a section for Views that will show you all of the different ways that the form could be displayed. In this case there is only one, but we can fix that by clicking on the Create View link.
  3. Select the Standard View option from the format templates that are displayed.
  4. Assign your new view a Name and select the fields that you want to be displayed in the view.
  5. Once you have made the changes, click on the OK button to save your new view.

How it works…

When you return to the document library you will be able to see the new format of the view.

You can then change the view within the URL of your project to make it the default view for the form.

Now when you see all of the documents within your Dynamics AX form you will see just the information that you need.

Using the SharePoint Designer to Edit Document Libraries

Although you can do everything that we have shown so far within SharePoint, you can also take advantage of the SharePoint Designer application to update your SharePoint document libraries. You don’t even have to search for the install kit, because it is embedded within your SharePoint site, just waiting to be downloaded and installed.

How to do it…

To access the SharePoint Designer to manipulate your SharePoint site, follow these steps:

  1. To use the SharePoint Designer to update your SharePoint site, just select the Edit in SharePoint Designer option from the Site Actions menu.

    Note: If you don’t have SharePoint Designer installed then it will ask you to install it, and download the kit directly from your SharePoint installation.

How it works…

When SharePoint Designer opens up, it will be connected to your current SharePoint site, showing you all of the libraries, etc. that you have been creating.

If you select the Lists and Libraries option from the navigation pad, you will be able to see all of the document libraries that you created in the previous steps.

Drilling into the libraries you will be able to also see all of the views etc. that you configured within SharePoint.

Creating New Content Types to Manage Document Types

When we set up our document libraries we deliberately created them so that all of the documents for a particular area are within the same library. This allows us to save multiple types of documents away within the library like Invoices, Credit Notes, Vendor Certificates etc. The way that we can identify the type of document is through the creation of Content Types.

How to do it…

To create custom Content Types to help make classification easier, follow these steps:

  1. Open up SharePoint Designer (although you can also do this within SharePoint itself) and select the Content Types from the navigation menu and click on the Content type menu button within the New group
    of the Content Types ribbon bar.
  2. When the Content Type creation form is displayed, give your Content Type a Name, and Description, select a parent content type, and also a group that you want to show the Content Type in.

    Note: For the first content type that you create, you may want to create a new Content Type Group so that it isn’t intermingled with all of the other content types.

  3. When you have finished creating your Content Type click on the OK button to add it to SharePoint.
  4. When you return to your SharePoint Designer workspace you will be able to see your new Content Type.
  5. Repeat the process for all of the other types of documents that you want to file away within SharePoint.
  6. Now we need to enable Content Types within our document libraries, and then assign them. To do this, open up your Document Library within SharePoint Designer, and within the Settings group, check the Allow management of content types check box.
  7. Then click on the Add button to the right of the Content Types group to open up the Content Type Picker. Find the new Content Types that you just created, and click on the OK button to assign them to your Document Library.
  8. Now the Content Type will show up as a valid option for the document library.
  9. Repeat the process for all of the other content types that you created.

How it works…

Now when you edit the properties for your documents, there will be a new indexing option for your documents that allows you to define the type of document that you are looking at.

Specifying Document Columns by Content Types to Simplify Indexing

There is an additional benefit that you get from using Content Types within SharePoint, which is the ability to specify what columns are applicable to different Content Types at the time of indexing. For example, you probably don’t want to specify a Invoice Number when indexing a Vendor Insurance Certificate, but would definitely would want to when indexing an Invoice and even a Credit Note.

How to do it…

To modify your Content Types within your Document Libraries to only require certain columns to be indexed, follow these steps:

  1. From within your SharePoint Document Library (or from within SharePoint Designer) click on the Library Settings button within the Settings group of the Library ribbon bar.
  2. Within the Library Settings you will be able to see all of your Content Types that have been assigned. Select any of them to edit their options.
  3. When you first create the Content Types then they will have no columns assigned to them. Click on the Add from existing site or list columns link to assign the valid columns to your Content Type.
  4. When the Add columns to Content Type form is displayed, you will be able to see all of the available columns within the Document Library.
  5. Just select the ones that you want to use for the indexing, and then click the Add button. Once you have selected all the ones that you want to use, click on the OK button to save your changes.

How it works…

Now you will have indexing by Content Type.

Showing the Content Type in the Document View

Now that we are classifying documents by Content Type we might as well show it in the views so that we are able to differentiate different document types.

How to do it…

To add the Content Type field to our Document View, follow these steps:

  1. From within your SharePoint Document Library (or from within SharePoint Designer) click on the Modify View button within the Manage Views group of the Library ribbon bar.
  2. Now that the Content Type is enabled on our Document Library, it will show up on the list of valid columns. To add it to our view, just check the Display checkbox, and possibly change the order of the field so that it is first in the table.
  3. When you’re done, click on the OK button to update the view.

How it works…

Now the Content Type is shown in the document library view.

And also will show up when we browse to the documents within Dynamics AX.

How neat is that.

Grouping Records in the Document View by Key Columns

One final tweak that we will show within SharePoint is the ability to group columns within our document library views so that common information is shown together. These groupings can be different by view, and just make it a little easier to find information if we don’t’ initially filter the data.

How to do it…

To group records within your document library view, follow these steps:

  1. From within your SharePoint Document Library (or from within SharePoint Designer) click on the Modify View button within the Manage Views group of the Library ribbon bar.
  2. Scroll down your view definition until you see the Group By configuration options. Here you will be able to change the Group By fields.

How it works…

Now when you look at your documents, they will be classified by key fields.

Summary

In this walkthrough we have shown how you can:

  • Create a simple document management repository within SharePoint
  • Link the document attachments function within Dynamics AX to SharePoint to make the acquisition of the documents easier
  • Index your documents more effectively by defining custom columns
  • Embed SharePoint back into Dynamics AX and also
  • Tweak your views within SharePoint to make it easier to find and view documents

This is really just a starting point, and once you have mastered the basics, you can start investigating:

  • Assigning workflows to documents for approvals and updates
  • Enabling version control for your documents
  • Acquire documents into SharePoint through scanning technologies
  • Link the index column fields to Dynamics AX for validation of key information
  • And much more.

SharePoint is a great document management tool, and can usually handle all of your document indexing needs. Especially now that it is connected with Dynamics AX natively.

How to add a Link to a Document external to SharePoint

Image

You can add links to external file shares or/and file server documents to your document library very easily. Why would you want to do this? Primarily so all your MetaData to all your documents are searchable in the same place.First a Farm Administrator will need to modify a core file on the front end server.  Then you must create a custom Content Type. If you use the built in content type you will not be able to link to a Folder directly.
Edit the NewLink.aspx page to allow the Document Library to accept a File:// entry.

  1. Go to the Front End Web Server \12\template\layouts directory.
  2. Open the file NewLink.aspx using NotePad. If I have to tell you to take a backup of this file first then you have no business editing this file (really).
  3. Go to the end of the script section near top of page and add:

    function HasValidUrlPrefix_UNC(url)
    {
    var urlLower=url.toLowerCase();
    if (-1==urlLower.search(”^http://”) &&
    -1==urlLower.search(”^https://”) && -1==urlLower.search(”^file://”))
    return false;
    return true;
    }

  • Use Edit Find to search for HasValidURLPrefix and replace it with HasValidURLPrefix_UNC (you should find it two times).
  • File – Save.
  • Open command prompt and enter IISreset /noforce.

Important: To link to Folders correctly you must create your own content type exactly as below and not use the built in URL or Link to Document at all.

Create custom Content Type

  1. Go to your Site Collection logged in as a Site Collection Administrator.
  2. Site actions – Site Settings – Modify All Site Settings.
  3. Content Types
  4. Create
  5. Name = URL or UNC
  6. Description = Use this content type to add a Link column that allows you to put a hyperlink or UNC path to external files, pages or folders. Format is File://\\ServerName\Folder , or http://
  7. Parent Content Type,
    1. Select parent content type from = Document Content Types
    2. Parent Content Type = Link to a Document
  8. Put this site content type into = Existing Group:  Company Custom
    1. image
  9. OK
  10. At the Site Content Type: URL or UNC page click on the URL hyperlink column and change it to Optional so that multiple documents being uploaded will not remain checked out.
  11. OK
    1. image

Add Custom Content Type to Document Library

  1. Go to a Document Library
  2. Settings – Library Settings
  3. Advanced Settings
  4. Allow Management Content Types = Yes
  5. OK
  6. Content Types – Add from existing site content types
  7. Select site content types from: Company Custom
  8. URL or UNC – Add – OK
  9. Click on URL or UNC hyperlink
  10. Click on Add from existing site
  11. Add all your Available Columns – OK
  12. Column Order – change the order to be consistant with the Document content type orders.
  13. Click on your Document Library breadcrumb to test.
  14. View – Modify your view to add the new URL or UNC column to your view next to your Name column.

Create Link to Document

  1. Go to the Document Library
  2. New – URL or UNC
  3. Document Name: This must equal the exact file or folder name less the extension.
    1. Example: My Resume 
    2. Example: Folder2
    3. Example: Doc1
  4. Document URL: This must be the UNC path to the folder or file.
    1. Example: http://LindaChapman.BlogSpot.com/Folder1/Folder2/My Resume.doc
    2. Example: http://LindaChapman.BlogSpot.com/Folder1/Folder2
    3. Example: File://\\ServerName\FolderName\FolderName2\Doc1.doc

You might see other blogs that say you can’t connect to a folder and must create a shortcut first. They are wrong. You can by the method above.

The biggest mistakes I see are:

  1. People click on the NAME field instead of the URL field. They are not the same. You MUST click on the URL field to access the Folder properly.
  2. People use the built in Link to Document content type thinking it is the same or will save them a step. It is not the same.
  3. People type the document extension in the Name field. You can not type the extension in the name field. It will see it is a UNC path and ignore the .aspx extension.
  4. People enter their slashes the wrong direction for UNC paths.

How To : A library to create .mht files (available at request)

There are a number of ways to do this, including hosting Word or Excel on the Web Server and dealing with COM Interop issues, or purchasing third – party MIME encoding libraries, some of which sell for $250.00 or more. But, there is no native .NET solution. So, being the curious soul that I am, I decided to investigate a bit and see what I could come up with. Internet Explorer offers a File / Save As option to save a web page as “Web Archive, single file (*.mht)”.

Image

What this does is create an RFC – compliant Multipart MIME Message. Resources such as images are serialized to their Base64 inline encoding representations and each resource is demarcated with the standard multipart MIME header – breaks. Internet Explorer, Word, Excel and most newsreader programs all understand this format. The format, if saved with the file extension “.eml”, will come up as a web page inside Outlook Express; if saved with “.mht”, it will come up in Internet Explorer when the file is double-clicked out of Windows Explorer, and — what many do not know — if saved with a “*.doc” extension, it will load in MS Word, each with all the images intact, and in the case of the EML and MHT formats, with all of the hyperlinks fully-functioning. The primary advantage of the format is, of course, that all the resources can be consolidated into a single file,. making distribution and archiving much easier — including database storage in an NVarchar or NText type field.

 

System.Web.Mail, which .NET provides as a convenient wrapper around the CDO for Windows COM library, offers only a subset of the functionality exposed by the CDO library, and multipart MIME encoding is not a part of that functionality. However, through the wonders of COM Interop, we can create our own COM reference to CDO in the Visual Studio IDE, allowing it to generate a Runtime Callable Wrapper, and help ourselves to the entire rich set of functionality of CDO as we see fit.

 

One method in the CDO library that immediately came to my notice was the CreateMHTMLBody method. That’s MHTMLBody, meaning “Multipurpose Internet Mail Extension HTML (MHTML) Body”. Well!– when I saw that, my eyes lit up like the LED’s on a 32 – way Unisys box! This is a method on the CDO Message class; the method accepts a URI to the requested resource, along with some enumerations, and creates a MultiPart MIME – encoded email message out of the requested URI responses — including images, css and script — in one fell swoop.

 

“Ah”, you say, “How convenient”! Yes, and not only that, but we also get a free “multipart COM Interop Baggage” reference to the ADODB.Stream object – and by simply calling the GetStream method on the Message Class, and then using the Stream’s SaveToFile method, we can grab any resource including images, javascript, css and everything else (except video) and save it to a single MHT Web Archive file just as if we chose the “Save As” option out of Internet Explorer.

 

If we choose not to save the file, but instead want to get back the stream contents, no problem. We just call Stream.ReadText(Stream.Size) and it returns a string containing the entire MHT encoded content. At that point we can do whatever we want with it – set a content – header and Response .Write the content to the browser, for instance — or whatever.

 

For example, when we get back our “MHT” string, we can write the following code:

Response.ContentType=”application/msword”;
Response.AddHeader( “Content-Disposition”, “attachment;filename=NAME.doc”);
Response.Write(myDataString);

 

— and the browser will dutifully offer to save the file as a Word Document. It will still be Multipart MIME encoded, but the .doc extension on the filename allows Word to load it, and Word is smart enough to be able to parse and render the file very nicely. “Ah”, you are saying, “this is nice, and so is the price!”. Yup!

And, if you are serving this MIME-encoded file from out of your database, for example, and you would like it to be able to be displayed in the browser, just change the “NAME.doc” to “NAME.MHT”, and don’t set a content-type header. Internet Explorer will prompt the user to either save or open the file. If they choose “open”, it will be saved to the IE Temporary files and open up in the browser just as if they had loaded it from their local file system.

 

So, to answer a couple of questions that came up recently, yes — you can use this method to MHTML – encode any web page – even one that is dynamically generated as with a report — provided it has a URL, and save the MIME-encoded content as a string in either an NVarchar or NText column in your database. You can then bring this string back out and send it to the browser, images,css, javascript and all.

Now here is the code for a small, very basic “Converter” class I’ve written to take advantage of the two scenarios specified above. Bear in mind, there is much more available in CDO, but I leave this wondrous trail of ecstatic discovery to your whims of fancy:

using System;
using System.Web;
using CDO;
using ADODB;
using System.Text;
namespace PAB.Web.Utils
{
 public class MIMEConverter
 {
  //private ctor as our methods are all static here
  private MIMEConverter()
  {
   
  }   
  public static bool SaveWebPageToMHTFile( string url, string filePath)
  {
   bool result=false;
   CDO.Message  msg = new CDO.MessageClass(); 
   ADODB.Stream  stm=null ;
   try
   {
    msg.MimeFormatted =true;   
    msg.CreateMHTMLBody(url,CDO.CdoMHTMLFlags.cdoSuppressNone, "" ,"" );
stm = msg.GetStream(); stm.SaveToFile(filePath,ADODB.SaveOptionsEnum.adSaveCreateOverWrite); msg=null; stm.Close(); result=true; } catch {throw;} finally { //cleanup here } return result; } public static string ConvertWebPageToMHTString( string url ) { string data = String.Empty; CDO.Message msg = new CDO.MessageClass(); ADODB.Stream stm=null; try { msg.MimeFormatted =true; msg.CreateMHTMLBody(url,CDO.CdoMHTMLFlags.cdoSuppressNone,
"", "" );
stm = msg.GetStream(); data= stm.ReadText(stm.Size); } catch { throw; } finally { //cleanup here } return data; } } }

 

NOTE: When using this type of COM Interop from an ASP.NET web page, it is important to remember that you must set the AspCompat=”true” directive in the Page declaration or you will be very disappointed at the results! This forces the ASP.NET page to run in STA threading model which permits “classic ASP” style COM calls. There is, of course, a significant performance penalty incurred, but realistically, this type of operation would only be performed upon user request and not on every page request.

<

p align=”left”>The downloadable zip file below contains the entire class library and a web solution that will exercise both methods when you fill in a valid URI with protocol, and a valid file path and filename for saving on the server. Unzip this to a folder that you have named “ConvertToMHT” and then mark the folder as an IIS Application so that your request such as “http://localhost/ConvertToMHT/WebForm1.aspx&#8221; will function correctly. You can then load the Solution file and it should work “out of the box”. And, don’t forget – if you have an ASP.NET web application that wants to write a file to the file system on the server, it must be running under an identity that has been granted this permission.

SharePoint Samurai