How To: A library to create .mht files (available on request)

There are a number of ways to create .mht files from .NET, including hosting Word or Excel on the web server and dealing with COM Interop issues, or purchasing third-party MIME encoding libraries, some of which sell for $250.00 or more. But there is no native .NET solution. So, being the curious soul that I am, I decided to investigate a bit and see what I could come up with. Internet Explorer offers a File / Save As option to save a web page as “Web Archive, single file (*.mht)”.

What this does is create an RFC-compliant multipart MIME message. Resources such as images are serialized inline in Base64 encoding, and each resource is set off by the standard multipart MIME boundary lines and headers. Internet Explorer, Word, Excel and most newsreader programs all understand this format. Saved with the file extension “.eml”, the file comes up as a web page inside Outlook Express; saved with “.mht”, it opens in Internet Explorer when double-clicked in Windows Explorer; and, what many do not know, saved with a “.doc” extension, it loads in MS Word. In each case all the images are intact, and in the case of the EML and MHT formats all of the hyperlinks are fully functioning. The primary advantage of the format is, of course, that all the resources are consolidated into a single file, which makes distribution and archiving much easier, including database storage in an NVarchar or NText column.
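
To make the format concrete, here is a minimal sketch that assembles a tiny multipart/related document by hand. It is not what Internet Explorer or CDO emits verbatim: real archives carry more headers and usually encode text parts as quoted-printable rather than Base64. The class name MhtSketch, the boundary string and the example.invalid URL are all made up for illustration.

```csharp
using System;
using System.Text;

// Illustrative only: MhtSketch, the boundary and the page URL are hypothetical.
public static class MhtSketch
{
    // Separator between parts; any string that cannot occur in the bodies works.
    private const string Boundary = "----=_Sketch_Boundary_0001";

    public static string Build(string html, byte[] imageBytes, string imageUrl)
    {
        StringBuilder sb = new StringBuilder();

        // Top-level headers announce a multipart/related message and its boundary.
        sb.Append("MIME-Version: 1.0\r\n");
        sb.Append("Content-Type: multipart/related; type=\"text/html\"; boundary=\""
                  + Boundary + "\"\r\n\r\n");

        // First part: the HTML page (Base64 here only to keep the sketch short).
        AppendPart(sb, "text/html; charset=\"utf-8\"",
                   "http://example.invalid/page.html", Encoding.UTF8.GetBytes(html));

        // Second part: a resource the page refers to, assumed here to be a GIF.
        AppendPart(sb, "image/gif", imageUrl, imageBytes);

        // The closing boundary carries two extra trailing dashes.
        sb.Append("--" + Boundary + "--\r\n");
        return sb.ToString();
    }

    private static void AppendPart(StringBuilder sb, string contentType,
                                   string location, byte[] body)
    {
        sb.Append("--" + Boundary + "\r\n");
        sb.Append("Content-Type: " + contentType + "\r\n");
        sb.Append("Content-Transfer-Encoding: base64\r\n");
        sb.Append("Content-Location: " + location + "\r\n\r\n");
        // InsertLineBreaks wraps the Base64 text at 76 characters per line.
        sb.Append(Convert.ToBase64String(body, Base64FormattingOptions.InsertLineBreaks));
        sb.Append("\r\n\r\n");
    }
}
```

Each part names its original URL in Content-Location, which is how a reader of the archive can match an image reference in the HTML part to the inline copy of the image.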

System.Web.Mail, which .NET provides as a convenient wrapper around the CDO for Windows COM library, offers only a subset of the functionality exposed by the CDO library, and multipart MIME encoding is not a part of that functionality. However, through the wonders of COM Interop, we can create our own COM reference to CDO in the Visual Studio IDE, allowing it to generate a Runtime Callable Wrapper, and help ourselves to the entire rich set of functionality of CDO as we see fit.

One method in the CDO library that immediately caught my attention was the CreateMHTMLBody method. That’s MHTMLBody, meaning “Multipurpose Internet Mail Extensions HTML (MHTML) Body”. Well! When I saw that, my eyes lit up like the LEDs on a 32-way Unisys box! This is a method on the CDO Message class; it accepts a URI for the requested resource, along with a flags enumeration and an optional user name and password, and creates a multipart MIME-encoded email message out of the responses for that URI (including images, CSS and script) in one fell swoop.

“Ah”, you say, “how convenient!” Yes, and not only that: as part of the COM Interop baggage we also get a free reference to the ADODB.Stream object. By simply calling the GetStream method on the Message class and then using the Stream’s SaveToFile method, we can grab any resource, including images, JavaScript, CSS and everything else (except video), and save it to a single MHT Web Archive file, just as if we had chosen the “Save As” option in Internet Explorer.

If we choose not to save the file but instead want the stream contents back, no problem. We just call Stream.ReadText(Stream.Size) and it returns a string containing the entire MHT-encoded content. At that point we can do whatever we want with it: set a Content-Type header and Response.Write the content to the browser, for instance.

For example, when we get back our “MHT” string, we can write the following code:

Response.ContentType = "application/msword";
Response.AddHeader("Content-Disposition", "attachment;filename=NAME.doc");
Response.Write(myDataString);

With that, the browser will dutifully offer to save the file as a Word document. It will still be multipart MIME encoded, but the .doc extension on the filename allows Word to load it, and Word is smart enough to parse and render the file very nicely. “Ah”, you are saying, “this is nice, and so is the price!” Yup!

And if you are serving this MIME-encoded file out of your database, for example, and you would like it to be displayed in the browser, just change “NAME.doc” to “NAME.mht” and don’t set a content-type header. Internet Explorer will prompt the user to either save or open the file. If they choose “Open”, it will be saved to the IE temporary files and opened in the browser just as if they had loaded it from their local file system.
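
The two download variants differ only in the headers that accompany the MHT string. As a rough sketch of that decision, the hypothetical helper below (DownloadHeaderSketch and HeadersFor are names invented here, not part of any library) returns the headers for a given file name: a Content-Disposition in both cases, plus the Word content type only when the name ends in .doc.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: this helper is hypothetical and simply mirrors the
// two cases described in the text.
public static class DownloadHeaderSketch
{
    public static Dictionary<string, string> HeadersFor(string fileName)
    {
        Dictionary<string, string> headers = new Dictionary<string, string>();

        // Both variants ask the browser to treat the response as a download.
        headers["Content-Disposition"] = "attachment;filename=" + fileName;

        // Only the Word variant sets a content type; for .mht none is added here.
        if (fileName.EndsWith(".doc", StringComparison.OrdinalIgnoreCase))
        {
            headers["Content-Type"] = "application/msword";
        }
        return headers;
    }
}
```

In a page, the Content-Type entry (when present) would be assigned to Response.ContentType and the Content-Disposition entry passed to Response.AddHeader, followed by Response.Write of the MHT string.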

So, to answer a couple of questions that came up recently: yes, you can use this method to MHTML-encode any web page, even one that is dynamically generated, as with a report, provided it has a URL, and save the MIME-encoded content as a string in either an NVarchar or NText column in your database. You can then bring this string back out and send it to the browser, images, CSS, JavaScript and all.

Now here is the code for a small, very basic “Converter” class I’ve written to take advantage of the two scenarios specified above. Bear in mind, there is much more available in CDO, but I leave this wondrous trail of ecstatic discovery to your whims of fancy:

using System;
using System.Web;
using CDO;
using ADODB;
using System.Text;

namespace PAB.Web.Utils
{
    public class MIMEConverter
    {
        // Private constructor, as all methods here are static
        private MIMEConverter()
        {
        }

        // Fetches the page at url and saves it, with its resources, as a single .mht file
        public static bool SaveWebPageToMHTFile(string url, string filePath)
        {
            bool result = false;
            CDO.Message msg = new CDO.MessageClass();
            ADODB.Stream stm = null;
            try
            {
                msg.MimeFormatted = true;
                msg.CreateMHTMLBody(url, CDO.CdoMHTMLFlags.cdoSuppressNone, "", "");
                stm = msg.GetStream();
                stm.SaveToFile(filePath, ADODB.SaveOptionsEnum.adSaveCreateOverWrite);
                result = true;
            }
            finally
            {
                // Close the stream even if the conversion throws
                if (stm != null) stm.Close();
                msg = null;
            }
            return result;
        }

        // Fetches the page at url and returns the MHT-encoded content as a string
        public static string ConvertWebPageToMHTString(string url)
        {
            string data = String.Empty;
            CDO.Message msg = new CDO.MessageClass();
            ADODB.Stream stm = null;
            try
            {
                msg.MimeFormatted = true;
                msg.CreateMHTMLBody(url, CDO.CdoMHTMLFlags.cdoSuppressNone, "", "");
                stm = msg.GetStream();
                data = stm.ReadText(stm.Size);
            }
            finally
            {
                // Close the stream even if the conversion throws
                if (stm != null) stm.Close();
                msg = null;
            }
            return data;
        }
    }
}

NOTE: When using this type of COM Interop from an ASP.NET web page, remember to set AspCompat="true" in the @ Page directive, or you will be very disappointed with the results! This forces the ASP.NET page to run in the STA threading model, which permits “classic ASP” style COM calls. There is, of course, a significant performance penalty, but realistically this type of operation would only be performed on user request and not on every page request.

The downloadable zip file below contains the entire class library and a web solution that will exercise both methods when you fill in a valid URI with protocol, and a valid file path and filename for saving on the server. Unzip this to a folder named “ConvertToMHT” and then mark the folder as an IIS Application so that a request such as “http://localhost/ConvertToMHT/WebForm1.aspx” will function correctly. You can then load the Solution file and it should work “out of the box”. And don’t forget: if you have an ASP.NET web application that wants to write a file to the file system on the server, it must be running under an identity that has been granted this permission.