Category Archives: OpenXML

How To : Convert Word Documents to PDF using SharePoint Server 2010 and Word Services and Stop Wasting Money on MS Field Engineers

SharePoint’s Word Automation Services is extremely powerful when it comes to converting document types and keeping the formatting.
Combine this with the flexibility of the templates that OpenXML use and you have a very powerful combination capapble of ANYTHING.
This is part of a Document Management System I developed for Nedbank that a Microsoft Field Engineer said was Impossible in SharePoint and asked R300 000 just for a few hours of his time for “analysis

SharePoint 2010Word Automation Services available with SharePoint Server 2010 supports converting Word documents to other formats. This includes PDF. This article describes using a document library list item event receiver to call Word Automation Services to convert Word documents to PDF when they are added to the list. The event receiver checks whether the list item added is a Word document. If so, it creates a conversion job to create a PDF version of the Word document and pushes the conversion job to the Word Automation Services conversion job queue.

This article describes the following steps to show how to call the Word Automation Services to convert a document:

  1. Creating a SharePoint 2010 list definition application solution in Visual Studio 2010.
  2. Adding a reference to the Microsoft.Office.Word.Server assembly.
  3. Adding an event receiver.
  4. Adding the sample code to the solution.

Creating a SharePoint 2010 List Definition Application in Visual Studio 2010

This article uses a SharePoint 2010 list definition application for the sample code.

To create a SharePoint 2010 list definition application in Visual Studio 2010

  1. Start Microsoft Visual Studio 2010 as an administrator.
  2. From the File Menu, point to the Project menu and then click New.
  3. In the New Project dialog box select the Visual C# SharePoint 2010 template type in the Project Templates pane.
  4. Select List Definition in the Templates pane.
  5. Name the project and solution ConvertWordToPDF.
    Figure 1. Creating the Solution

    Creating the solution

  6. To create the solution, click OK.
  7. Select a site to use for debugging and deployment.
  8. Select the site to use for debugging and the trust level for the SharePoint solution.
    Note
    Make sure to select the trust level Deploy as a farm solution. If you deploy as a sandboxed solution, it does not work because the solution uses the Microsoft.Office.Word.Server assembly. This assembly does not allow for calls from partially trusted callers.
    Figure 2. Selecting the trust level

    Creating the solution

  9. To finish creating the solution, click Finish.

Adding a Reference to the Microsoft Office Word Server Assembly

To use Word Automation Services, you must add a reference to the Microsoft.Office.Word.Server to the solution.

To add a reference to the Microsoft Office Word Server Assembly

  1. In Visual Studio, from the Project menu, select Add Reference.
  2. Locate the assembly. By using the Browse tab, locate the assembly. The Microsoft.Office.Word.Server assembly is located in the SharePoint 2010 ISAPI folder. This is usually located at C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\ISAPI. After the assembly is located, click OK to add the reference.
    Figure 3. Adding the Reference

    Adding the reference

Adding an Event Receiver

This article uses an event receiver that uses the Microsoft.Office.Word.Server assembly to create document conversion jobs and add them to the Word Automation Services conversion job queue.

To add an event receiver

  1. In Visual Studio, on the Project menu, click Add New Item.
  2. In the Add New Item dialog box, in the Project Templates pane, click the Visual C# SharePoint 2010 template.
  3. In the Templates pane, click Event Receiver.
  4. Name the event receiver ConvertWordToPDFEventReceiver and then click Add.
    Figure 4. Adding an Event Receiver

    Adding an event receiver

  5. The event receiver converts Word Documents after they are added to the List. Select the An item was added item from the list of events that can be handled.
    Figure 5. Choosing Event Receiver Settings

    Choosing even receiver settings

  6. Click Finish to add the event receiver to the project.

Adding the Sample Code to the Solution

Replace the contents of the ConvertWordToPDFEventReceiver.cs source file with the following code.C

 
using System;
using System.Security.Permissions;
using Microsoft.SharePoint;
using Microsoft.SharePoint.Security;
using Microsoft.SharePoint.Utilities;
using Microsoft.SharePoint.Workflow;

using Microsoft.Office.Word.Server.Conversions;

namespace ConvertWordToPDF.ConvertWordToPDFEventReceiver
{
  /// <summary>
  /// List Item Events
  /// </summary>
  public class ConvertWordToPDFEventReceiver : SPItemEventReceiver
  {
    /// <summary>
    /// An item was added.
    /// </summary>
    public override void ItemAdded(SPItemEventProperties properties)
    {
      base.ItemAdded(properties);

      // Verify the document added is a Word document
      // before starting the conversion.
      if (properties.ListItem.Name.Contains(".docx") 
        || properties.ListItem.Name.Contains(".doc"))
      {
        //Variables used by the sample code.
        ConversionJobSettings jobSettings;
        ConversionJob pdfConversion;
        string wordFile;
        string pdfFile;

        // Initialize the conversion settings.
        jobSettings = new ConversionJobSettings();
        jobSettings.OutputFormat = SaveFormat.PDF;

        // Create the conversion job using the settings.
        pdfConversion = 
          new ConversionJob("Word Automation Services", jobSettings);

        // Set the credentials to use when running the conversion job.
        pdfConversion.UserToken = properties.Web.CurrentUser.UserToken;

        // Set the file names to use for the source Word document
        // and the destination PDF document.
        wordFile = properties.WebUrl + "/" + properties.ListItem.Url;
        if (properties.ListItem.Name.Contains(".docx"))
        {
          pdfFile = wordFile.Replace(".docx", ".pdf");
        }
        else
        {
          pdfFile = wordFile.Replace(".doc", ".pdf");
        }

        // Add the file conversion to the conversion job.
        pdfConversion.AddFile(wordFile, pdfFile);

        // Add the conversion job to the Word Automation Services 
        // conversion job queue. The conversion does not occur
        // immediately but is processed during the next run of
        // the document conversion job.
        pdfConversion.Start();

      }
    }
  }
}

Examples of the kinds of operations supported by Word Automation Services are as follows:

  • Converting between document formats (e.g. DOC to DOCX)
  • Converting to fixed formats (e.g. PDF or XPS)
  • Updating fields
  • Importing “alternate format chunks”

This article contains sample code that shows how to create a SharePoint list event handler that can create Word Automation Services conversion jobs in response to Word documents being added to the list. This section uses code examples taken from the complete, working sample code provided earlier in this article to describe the approach taken by this article.

The ItemAdded(SPItemEventProperties) event handler in the list event handler first verifies that the item added to the document library list is a Word document by checking the name of the document for the .doc or .docx file name extension.

// Verify the document added is a Word document
// before starting the conversion.
if (properties.ListItem.Name.Contains(".docx") 
    || properties.ListItem.Name.Contains(".doc"))
{

If the item is a Word document then the code creates and initializes ConversionJobSettings and ConversionJob objects to convert the document to the PDF format.

C#
 
//Variables used by the sample code.
ConversionJobSettings jobSettings;
ConversionJob pdfConversion;
string wordFile;
string pdfFile;

// Initialize the conversion settings.
jobSettings = new ConversionJobSettings();
jobSettings.OutputFormat = SaveFormat.PDF;

// Create the conversion job using the settings.
pdfConversion = 
  new ConversionJob("Word Automation Services", jobSettings);

// Set the credentials to use when running the conversion job.
pdfConversion.UserToken = properties.Web.CurrentUser.UserToken;

The Word document to be converted and the name of the PDF document to be created are added to the ConversionJob.

 
// Set the file names to use for the source Word document
// and the destination PDF document.
wordFile = properties.WebUrl + "/" + properties.ListItem.Url;
if (properties.ListItem.Name.Contains(".docx"))
{
  pdfFile = wordFile.Replace(".docx", ".pdf");
}
else
{
  pdfFile = wordFile.Replace(".doc", ".pdf");
}

// Add the file conversion to the Conversion Job.
pdfConversion.AddFile(wordFile, pdfFile);

Finally the ConversionJob is added to the Word Automation Services conversion job queue.

 
// Add the conversion job to the Word Automation Services 
// conversion job queue. The conversion does not occur
// immediately but is processed during the next run of
// the document conversion job.
pdfConversion.Start();

How to add a Link to a Document external to SharePoint

Image

You can add links to external file shares or/and file server documents to your document library very easily. Why would you want to do this? Primarily so all your MetaData to all your documents are searchable in the same place.First a Farm Administrator will need to modify a core file on the front end server.  Then you must create a custom Content Type. If you use the built in content type you will not be able to link to a Folder directly.
Edit the NewLink.aspx page to allow the Document Library to accept a File:// entry.

  1. Go to the Front End Web Server \12\template\layouts directory.
  2. Open the file NewLink.aspx using NotePad. If I have to tell you to take a backup of this file first then you have no business editing this file (really).
  3. Go to the end of the script section near top of page and add:

    function HasValidUrlPrefix_UNC(url)
    {
    var urlLower=url.toLowerCase();
    if (-1==urlLower.search(”^http://”) &&
    -1==urlLower.search(”^https://”) && -1==urlLower.search(”^file://”))
    return false;
    return true;
    }

  • Use Edit Find to search for HasValidURLPrefix and replace it with HasValidURLPrefix_UNC (you should find it two times).
  • File – Save.
  • Open command prompt and enter IISreset /noforce.

Important: To link to Folders correctly you must create your own content type exactly as below and not use the built in URL or Link to Document at all.

Create custom Content Type

  1. Go to your Site Collection logged in as a Site Collection Administrator.
  2. Site actions – Site Settings – Modify All Site Settings.
  3. Content Types
  4. Create
  5. Name = URL or UNC
  6. Description = Use this content type to add a Link column that allows you to put a hyperlink or UNC path to external files, pages or folders. Format is File://\\ServerName\Folder , or http://
  7. Parent Content Type,
    1. Select parent content type from = Document Content Types
    2. Parent Content Type = Link to a Document
  8. Put this site content type into = Existing Group:  Company Custom
    1. image
  9. OK
  10. At the Site Content Type: URL or UNC page click on the URL hyperlink column and change it to Optional so that multiple documents being uploaded will not remain checked out.
  11. OK
    1. image

Add Custom Content Type to Document Library

  1. Go to a Document Library
  2. Settings – Library Settings
  3. Advanced Settings
  4. Allow Management Content Types = Yes
  5. OK
  6. Content Types – Add from existing site content types
  7. Select site content types from: Company Custom
  8. URL or UNC – Add – OK
  9. Click on URL or UNC hyperlink
  10. Click on Add from existing site
  11. Add all your Available Columns – OK
  12. Column Order – change the order to be consistant with the Document content type orders.
  13. Click on your Document Library breadcrumb to test.
  14. View – Modify your view to add the new URL or UNC column to your view next to your Name column.

Create Link to Document

  1. Go to the Document Library
  2. New – URL or UNC
  3. Document Name: This must equal the exact file or folder name less the extension.
    1. Example: My Resume 
    2. Example: Folder2
    3. Example: Doc1
  4. Document URL: This must be the UNC path to the folder or file.
    1. Example: http://LindaChapman.BlogSpot.com/Folder1/Folder2/My Resume.doc
    2. Example: http://LindaChapman.BlogSpot.com/Folder1/Folder2
    3. Example: File://\\ServerName\FolderName\FolderName2\Doc1.doc

You might see other blogs that say you can’t connect to a folder and must create a shortcut first. They are wrong. You can by the method above.

The biggest mistakes I see are:

  1. People click on the NAME field instead of the URL field. They are not the same. You MUST click on the URL field to access the Folder properly.
  2. People use the built in Link to Document content type thinking it is the same or will save them a step. It is not the same.
  3. People type the document extension in the Name field. You can not type the extension in the name field. It will see it is a UNC path and ignore the .aspx extension.
  4. People enter their slashes the wrong direction for UNC paths.

Using Word Automation Services and OpenXML to Change Document Formats

 
There are some tasks that are difficult when using the Welcome to the Open XML SDK 2.0 for Microsoft Office, such as repagination, conversion to other document formats such as PDF, or updating of the table of contents, fields, and other dynamic content in documents. Word Automation Services is a new feature of SharePoint 2010 that can help in these scenarios. It is a shared service that provides unattended, server-side conversion of documents into other formats, and some other important pieces of functionality. It was designed from the outset to work on servers and can process many documents in a reliable and predictable manner.Image

 

Using Word Automation Services, you can convert from Open XML WordprocessingML to other document formats. For example, you may want to convert many documents to the PDF format and spool them to a printer or send them by e-mail to your customers. Or, you can convert from other document formats (such as HTML or Word 97-2003 binary documents) to Open XML word-processing documents.

In addition to the document conversion facilities, there are other important areas of functionality that Word Automation Services provides, such as updating field codes in documents, and converting altChunk content to paragraphs with the normal style applied. These tasks can be difficult to perform using the Open XML SDK 2.0. However, it is easy to use Word Automation Services to do them. In the past, you used Word Automation Services to perform tasks like these for client applications. However, this approach is problematic. The Word client is an application that is best suited for authoring documents interactively, and was not designed for high-volume processing on a server. When performing these tasks, perhaps Word will display a dialog box reporting an error, and if the Word client is being automated on a server, there is no user to respond to the dialog box, and the process can come to an untimely stop. The issues associated with automation of Word are documented in the Knowledge Base article Considerations for Server-side Automation of Office.

This scenario describes how you can use Word Automation Services to automate processing documents on a server.

  • An expert creates some Word template documents that follow specific conventions. She might use content controls to give structure to the template documents. This provides a good user experience and a reliable programmatic approach for determining the locations in the template document where data should be replaced in the document generation process. These template documents are typically stored in a SharePoint document library.

  • A program runs on the server to merge the template documents together with data, generating a set of Open XML WordprocessingML (DOCX) documents. This program is best written by using the Welcome to the Open XML SDK 2.0 for Microsoft Office, which is designed specifically for generating documents on a server. These documents are placed in a SharePoint document library.

  • After generating the set of documents, they might be automatically printed. Or, they might be sent by e-mail to a set of users, either as WordprocessingML documents, or perhaps as PDF, XPS, or MHTML documents after converting them from WordprocessingML to the desired format.

  • As part of the conversion, you can instruct Word Automation Services to update fields, such as the table of contents.

Using the Welcome to the Open XML SDK 2.0 for Microsoft Office together with Word Automation Services enables you to create rich, end-to-end solutions that perform well and do not require automation of the Word client application.

One of the key advantages of Word Automation Services is that it can scale out to your needs. Unlike the Word client application, you can configure it to use multiple processors. Further, you can configure it to load balance across multiple servers if your needs require that.

Another key advantage is that Word Automation Services has perfect fidelity with the Word client applications. Document layout, including pagination, is identical regardless of whether the document is processed on the server or client.

Supported Source Document Formats

The supported source document formats for documents are as follows.

  1. Open XML File Format documents (.docx, .docm, .dotx, .dotm)

  2. Word 97-2003 documents (.doc, .dot)

  3. Rich Text Format files (.rtf)

  4. Single File Web Pages (.mht, .mhtml)

  5. Word 2003 XML Documents (.xml)

  6. Word XML Document (.xml)

Supported Destination Document Formats

The supported destination document formats includes all of the supported source document formats, and the following.

  1. Portable Document Format (.pdf)

  2. Open XML Paper Specification (.xps)

Other Capabilities of Word Automation Services

In addition to the ability to load and save documents in various formats, Word Automation Services includes other capabilities.

You can cause Word Automation Services to update the table of contents, the table of authorities, and index fields. This is important when generating documents. After generating a document, if the document has a table of contents, it is an especially difficult task to determine document pagination so that the table of contents is updated correctly. Word Automation Services handles this for you easily.

Open XML word-processing documents can contain various field types, which enables you to add dynamic content into a document. You can use Word Automation Services to cause all fields to be recalculated. For example, you can include a field type that inserts the current date into a document. When fields are updated, the associated content is also updated, so that the document displays the current date at the location of the field.

One of the powerful ways that you can use content controls is to bind them to XML elements in a custom XML part. See the article, Building Document Generation Systems from Templates with Word 2010 and Word 2007 for an explanation of bound content controls, and links to several resources to help you get started. You can replace the contents of bound content controls by replacing the XML in the custom XML part. You do not have to alter the main document part. The main document part contains cached values for all bound content controls, and if you replace the XML in the custom XML part, the cached values in the main document part are not updated. This is not a problem if you expect users to view these generated documents only by using the Word client. However, if you want to process the WordprocessingML markup more, you must update the cached values in the main document part. Word Automation Services can do this.

Alternate format content (as represented by the altChunk element) is a great way to import HTML content into a WordprocessingML document. The article, Building Document Generation Systems from Templates with Word 2010 and Word 2007 discusses alternate format content, its uses, and provides links to help you get started. However, until you open and save a document that contains altChunk elements, the document contains HTML, and not ordinary WordprocessingML markup such as paragraphs, runs, and text elements. You can use Word Automation Services to import the HTML (or other forms of alternative content) and convert them to WordprocessingML markup that contains familiar WordprocessingML paragraphs that have styles.

You can also convert to and from formats that were used by previous versions of Word. If you are building an enterprise-class application that is used by thousands of users, you may have some users who are using Word 2007 or Word 2003 to edit Open XML documents. You can convert Open XML documents so that they contain only the markup and features that are used by either Word 2007 or Word 2003.

Limitations of Word Automation Services

Word Automation Services does not include capabilities for printing documents. However, it is straightforward to convert WordprocessingML documents to PDF or XPS and spool them to a printer.

A question that sometimes occurs is whether you can use Word Automation Services without purchasing and installing SharePoint Server 2010. Word Automation Services takes advantage of facilities of SharePoint 2010, and is a feature of it. You must purchase and install SharePoint Server 2010 to use it. Word Automation Services is in the standard edition and enterprise edition.

By default, Word Automation Services is a service that installs and runs with a stand-alone SharePoint Server 2010 installation. If you are using SharePoint 2010 in a server farm, you must explicitly enable Word Automation Services.

To use it, you use its programming interface to start a conversion job. For each conversion job, you specify which files, folders, or document libraries you want the conversion job to process. Based on your configuration, when you start a conversion job, it begins a specified number of conversion processes on each server. You can specify the frequency with which it starts conversion jobs, and you can specify the number of conversions to start for each conversion process. In addition, you can specify the maximum percentage of memory that Word Automation Services can use.

The configuration settings enable you to configure Word Automation Services so that it does not consume too many resources on SharePoint servers that are part of your important infrastructure. The settings that you must use are dictated by how you want to use the SharePoint Server. If it is only used for document conversions, you want to configure the settings so that the conversion service can consume most of your processor time. If you are using the conversion service for low priority background conversions, you want to configure accordingly.

 
 

In addition to writing code that starts conversion processes, you can also write code to monitor the progress of conversions. This lets you inform users or post alert results when very large conversion jobs are completed.

Word Automation Services lets you configure four additional aspects of conversions.

  1. You can limit the number of file formats that it supports.

  2. You can specify the number of documents converted by a conversion process before it is restarted. This is valuable because invalid documents can cause Word Automation Services to consume too much memory. All memory is reclaimed when the process is restarted.

  3. You can specify the number of times that Word Automation Services attempts to convert a document. By default, this is set to two so that if Word Automation Services fails in its attempts to convert a document, it attempts to convert it only one more time (in that conversion job).

  4. You can specify the length of elapsed time before conversion processes are monitored. This is valuable because Word Automation Services monitors conversions to make sure that conversions have not stalled.

Unless you have installed a server farm, by default, Word Automation Services is installed and started in SharePoint Server 2010. However, as a developer, you want to alter its configuration so that you have a better development experience. By default, it starts conversion processes at 15 minute intervals. If you are testing code that uses it, you can benefit from setting the interval to one minute. In addition, there are scenarios where you may want Word Automation Services to use as much resources as possible. Those scenarios may also benefit from setting the interval to one minute.

To adjust the conversion process interval to one minute

  1. Start SharePoint 2010 Central Administration.

  2. On the home page of SharePoint 2010 Central Administration, Click Manage Service Applications.

  3. In the Service Applications administration page, service applications are sorted alphabetically. Scroll to the bottom of the page, and then click Word Automation Services . If you are installing a server farm and have installed Word Automation Services manually, whatever you entered for the name of the service is what you see on this page.

  4. In the Word Automation Services administration page, configure the conversion throughput field to the desired frequency for starting conversion jobs.

  5. As a developer, you may also want to set the number of conversion processes, and to adjust the number of conversions per worker process. If you adjust the frequency with which conversion processes start without adjusting the other two values, and you attempt to convert many documents, you make the conversion process much less efficient. The best value for these numbers should take into consideration the power of your computer that is running SharePoint Server 2010.

  6. Scroll to the bottom of the page and then click OK.

Because Word Automation Services is a service of SharePoint Server 2010, you can only use it in an application that runs directly on a SharePoint Server. You must build the application as a farm solution. You cannot use Word Automation Services from a sandboxed solution.

A convenient way to use Word Automation Services is to write a web service that you can use from client applications.

However, the easiest way to show how to write code that uses Word Automation Services is to build a console application. You must build and run the console application on the SharePoint Server, not on a client computer. The code to start and monitor conversion jobs is identical to the code that you would write for a Web Part, a workflow, or an event handler. Showing how to use Word Automation Services from a console application enables us to discuss the API without adding the complexities of a Web Part, an event handler, or a workflow.

Important note Important

Note that the following sample applications call Sleep(Int32) so that the examples query for status every five seconds. This would not be the best approach when you write code that you intend to deploy on production servers. Instead, you want to write a workflow with delay activity.

To build the application

  1. Start Microsoft Visual Studio 2010.

  2. On the File menu, point to New, and then click Project.

  3. In the New Project dialog box, in the Recent Template pane, expand Visual C#, and then click Windows.

  4. To the right side of the Recent Template pane, click Console Application.

  5. By default, Visual Studio creates a project that targets .NET Framework 4. However, you must target .NET Framework 3.5. From the list at the upper part of the File Open dialog box, select .NET Framework 3.5.

  6. In the Name box, type the name that you want to use for your project, such as FirstWordAutomationServicesApplication.

  7. In the Location box, type the location where you want to place the project.

    Figure 1. Creating a solution in the New Project dialog box

    Creating solution in the New Project box

  8. Click OK to create the solution.

  9. By default, Visual Studio 2010 creates projects that target x86 CPUs, but to build SharePoint Server applications, you must target any CPU.

  10. If you are building a Microsoft Visual C# application, in Solution Explorer window, right-click the project, and then click Properties.

  11. In the project properties window, click Build.

  12. Point to the Platform Target list, and select Any CPU.

    Figure 2. Target Any CPU when building a C# console application

    Changing target to any CPU

  13. If you are building a Microsoft Visual Basic .NET Framework application, in the project properties window, click Compile.

    Figure 3. Compile options for a Visual Basic application

    Compile options for Visual Basic applications

  14. Click Advanced Compile Options.

    Figure 4. Advanced Compiler Settings dialog box

    Advanced Compiler Settings dialog box

  15. Point to the Platform Target list, and then click Any CPU.

  16. To add a reference to the Microsoft.Office.Word.Server assembly, on the Project menu, click Add Reference to open the Add Reference dialog box.

  17. Select the .NET tab, and add the component named Microsoft Office 2010 component.

    Figure 5. Adding a reference to Microsoft Office 2010 component

    Add reference to Microsoft Office 2010 component

  18. Next, add a reference to the Microsoft.SharePoint assembly.

    Figure 6. Adding a reference to Microsoft SharePoint

    Adding reference to Microsoft SharePoint

The following examples provide the complete C# and Visual Basic listings for the simplest Word Automation Services application.

 
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SharePoint;
using Microsoft.Office.Word.Server.Conversions;

class Program
{
    static void Main(string[] args)
    {
        string siteUrl = "http://localhost";
        // If you manually installed Word automation services, then replace the name
        // in the following line with the name that you assigned to the service when
        // you installed it.
        string wordAutomationServiceName = "Word Automation Services";
        using (SPSite spSite = new SPSite(siteUrl))
        {
            ConversionJob job = new ConversionJob(wordAutomationServiceName);
            job.UserToken = spSite.UserToken;
            job.Settings.UpdateFields = true;
            job.Settings.OutputFormat = SaveFormat.PDF;
            job.AddFile(siteUrl + "/Shared%20Documents/Test.docx",
                siteUrl + "/Shared%20Documents/Test.pdf");
            job.Start();
        }
    }
}
NoteNote

Replace the URL assigned to siteUrl with the URL to the SharePoint site.

To build and run the example

  1. Add a Word document named Test.docx to the Shared Documents folder in the SharePoint site.

  2. Build and run the example.

  3. After waiting one minute for the conversion process to run, navigate to the Shared Documents folder in the SharePoint site, and refresh the page. The document library now contains a new PDF document, Test.pdf.

In many scenarios, you want to monitor the status of conversions, to inform the user when the conversion process is complete, or to process the converted documents in additional ways. You can use the ConversionJobStatus class to query Word Automation Services about the status of a conversion job. You pass the name of the WordServiceApplicationProxy class as a string (by default, “Word Automation Services”), and the conversion job identifier, which you can get from the ConversionJob object. You can also pass a GUID that specifies a tenant partition. However, if the SharePoint Server farm is not configured for multiple tenants, you can pass null (Nothing in Visual Basic) as the argument for this parameter.

After you instantiate a ConversionJobStatus object, you can access several properties that indicate the status of the conversion job. The following are the three most interesting properties.

ConversionJobStatus Properties

Property

Return Value

Count

Number of documents currently in the conversion job.

Succeeded

Number of documents successfully converted.

Failed

The number of documents that failed conversion.

Whereas the first example specified a single document to convert, the following example converts all documents in a specified document library. You have the option of creating all converted documents in a different document library than the source library, but for simplicity, the following example specifies the same document library for both the input and output document libraries. In addition, the following example specifies that the conversion job should overwrite the output document if it already exists.

 

Console.WriteLine("Starting conversion job");
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.PDF;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPList listToConvert = spSite.RootWeb.Lists["Shared Documents"];
job.AddLibrary(listToConvert, listToConvert);
job.Start();
Console.WriteLine("Conversion job started");
ConversionJobStatus status = new ConversionJobStatus(wordAutomationServiceName,
    job.JobId, null);
Console.WriteLine("Number of documents in conversion job: {0}", status.Count);
while (true)
{
    Thread.Sleep(5000);
    status = new ConversionJobStatus(wordAutomationServiceName, job.JobId,
        null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        break;
    }
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}",
        status.Succeeded, status.Failed);
}

To run this example, add some WordprocessingML documents in the Shared Documents library. When you run this example, you see output similar to this code snippet,

 
Starting conversion job
Conversion job started
Number of documents in conversion job: 4
In progress, Successful: 0, Failed: 0
In progress, Successful: 0, Failed: 0
Completed, Successful: 4, Failed: 0

You may want to determine which documents failed conversion, perhaps to inform the user, or take remedial action such as removing the invalid document from the input document library. You can call the GetItems() method, which returns a collection of ConversionItemInfo() objects. When you call the GetItems() method, you pass a parameter that specifies whether you want to retrieve a collection of failed conversions or successful conversions. The following example shows how to do this.

 

Console.WriteLine("Starting conversion job");
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.PDF;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPList listToConvert = spSite.RootWeb.Lists["Shared Documents"];
job.AddLibrary(listToConvert, listToConvert);
job.Start();
Console.WriteLine("Conversion job started");
ConversionJobStatus status = new ConversionJobStatus(wordAutomationServiceName,
    job.JobId, null);
Console.WriteLine("Number of documents in conversion job: {0}", status.Count);
while (true)
{
    Thread.Sleep(5000);
    status = new ConversionJobStatus(wordAutomationServiceName, job.JobId, null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        ReadOnlyCollection<ConversionItemInfo> failedItems =
            status.GetItems(ItemTypes.Failed);
        foreach (var failedItem in failedItems)
            Console.WriteLine("Failed item: Name:{0}", failedItem.InputFile);
        break;
    }
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}", status.Succeeded,
        status.Failed);
}

To run this example, create an invalid document and upload it to the document library. An easy way to create an invalid document is to rename the WordprocessingML document, appending .zip to the file name. Then delete the main document part (known as document.xml), which is in the Word folder of the package. Rename the document, removing the .zip extension so that it contains the normal .docx extension.

When you run this example, it produces output similar to the following.

 
 
Starting conversion job
Conversion job started
Number of documents in conversion job: 5
In progress, Successful: 0, Failed: 0
In progress, Successful: 0, Failed: 0
In progress, Successful: 4, Failed: 0
In progress, Successful: 4, Failed: 0
In progress, Successful: 4, Failed: 0
Completed, Successful: 4, Failed: 1
Failed item: Name:http://intranet.contoso.com/Shared%20Documents/IntentionallyInvalidDocument.docx

Another approach to monitoring a conversion process is to use event handlers on a SharePoint list to determine when a converted document is added to the output document library.

In some situations, you may want to delete the source documents after conversion. The following example shows how to do this. 

Console.WriteLine("Starting conversion job");
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.PDF;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPFolder folderToConvert = spSite.RootWeb.GetFolder("Shared Documents");
job.AddFolder(folderToConvert, folderToConvert, false);
job.Start();
Console.WriteLine("Conversion job started");
ConversionJobStatus status = new ConversionJobStatus(wordAutomationServiceName,
    job.JobId, null);
Console.WriteLine("Number of documents in conversion job: {0}", status.Count);
while (true)
{
    Thread.Sleep(5000);
    status = new ConversionJobStatus(wordAutomationServiceName, job.JobId, null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        Console.WriteLine("Deleting only items that successfully converted");
        ReadOnlyCollection<ConversionItemInfo> convertedItems =
            status.GetItems(ItemTypes.Succeeded);
        foreach (var convertedItem in convertedItems)
        {
            Console.WriteLine("Deleting item: Name:{0}", convertedItem.InputFile);
            folderToConvert.Files.Delete(convertedItem.InputFile);
        }
        break;
    }
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}",
        status.Succeeded, status.Failed);
}
 
 
Console.WriteLine("Starting conversion job")
Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
job.UserToken = spSite.UserToken
job.Settings.UpdateFields = True
job.Settings.OutputFormat = SaveFormat.PDF
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite
Dim folderToConvert As SPFolder = spSite.RootWeb.GetFolder("Shared Documents")
job.AddFolder(folderToConvert, folderToConvert, False)
job.Start()
Console.WriteLine("Conversion job started")
Dim status As ConversionJobStatus = _
    New ConversionJobStatus(wordAutomationServiceName, job.JobId, Nothing)
Console.WriteLine("Number of documents in conversion job: {0}", status.Count)
While True
    Thread.Sleep(5000)
    status = New ConversionJobStatus(wordAutomationServiceName, job.JobId, _
                                     Nothing)
    If status.Count = status.Succeeded + status.Failed Then
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}", _
                          status.Succeeded, status.Failed)
        Console.WriteLine("Deleting only items that successfully converted")
        Dim convertedItems As ReadOnlyCollection(Of ConversionItemInfo) = _
            status.GetItems(ItemTypes.Succeeded)
        For Each convertedItem In convertedItems
            Console.WriteLine("Deleting item: Name:{0}", convertedItem.InputFile)
            folderToConvert.Files.Delete(convertedItem.InputFile)
        Next
        Exit While
    End If
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}",
                      status.Succeeded, status.Failed)
End While

The power of using Word Automation Services becomes clear when you use it in combination with the Welcome to the Open XML SDK 2.0 for Microsoft Office. You can programmatically modify a document in a document library by using the Welcome to the Open XML SDK 2.0 for Microsoft Office, and then use Word Automation Services to perform one of the difficult tasks by using the Open XML SDK. A common need is to programmatically generate a document, and then generate or update the table of contents of the document. Consider the following document, which contains a table of contents.

Figure 7. Document with a table of contents

Document with table of contents

Let’s assume you want to modify this document, adding content that should be included in the table of contents. This next example takes the following steps.

  1. Opens the site and retrieves the Test.docx document by using a Collaborative Application Markup Language (CAML) query.

  2. Opens the document by using the Open XML SDK 2.0, and adds a new paragraph styled as Heading 1 at the beginning of the document.

  3. Starts a conversion job, converting Test.docx to TestWithNewToc.docx. It waits for the conversion to complete, and reports whether it was converted successfully.

 
Console.WriteLine("Querying for Test.docx");
SPList list = spSite.RootWeb.Lists["Shared Documents"];
SPQuery query = new SPQuery();
query.ViewFields = @"<FieldRef Name='FileLeafRef' />";
query.Query =
  @"<Where>
      <Eq>
        <FieldRef Name='FileLeafRef' />
        <Value Type='Text'>Test.docx</Value>
      </Eq>
    </Where>";
SPListItemCollection collection = list.GetItems(query);
if (collection.Count != 1)
{
    Console.WriteLine("Test.docx not found");
    Environment.Exit(0);
}
Console.WriteLine("Opening");
SPFile file = collection[0].File;
byte[] byteArray = file.OpenBinary();
using (MemoryStream memStr = new MemoryStream())
{
    memStr.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Open(memStr, true))
    {
        Document document = wordDoc.MainDocumentPart.Document;
        Paragraph firstParagraph = document.Body.Elements<Paragraph>()
            .FirstOrDefault();
        if (firstParagraph != null)
        {
            Paragraph newParagraph = new Paragraph(
                new ParagraphProperties(
                    new ParagraphStyleId() { Val = "Heading1" }),
                new Run(
                    new Text("About the Author")));
            Paragraph aboutAuthorParagraph = new Paragraph(
                new Run(
                    new Text("Eric White")));
            firstParagraph.Parent.InsertBefore(newParagraph, firstParagraph);
            firstParagraph.Parent.InsertBefore(aboutAuthorParagraph,
                firstParagraph);
        }
    }
    Console.WriteLine("Saving");
    string linkFileName = file.Item["LinkFilename"] as string;
    file.ParentFolder.Files.Add(linkFileName, memStr, true);
}
Console.WriteLine("Starting conversion job");
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.Document;
job.AddFile(siteUrl + "/Shared%20Documents/Test.docx",
    siteUrl + "/Shared%20Documents/TestWithNewToc.docx");
job.Start();
Console.WriteLine("After starting conversion job");
while (true)
{
    Thread.Sleep(5000);
    Console.WriteLine("Polling...");
    ConversionJobStatus status = new ConversionJobStatus(
        wordAutomationServiceName, job.JobId, null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        break;
    }
}

After running this example with a document similar to the one used earlier in this section, a new document is produced, as shown in Figure 8.

Figure 8. Document with updated table of contents

Document with updated table of contents

The Open XML SDK 2.0 is a powerful tool for building server-side document generation and document processing systems. However, there are aspects of document manipulation that are difficult, such a document conversions, and updating of fields, table of contents, and more. Word Automation Services fills this gap with a high-performance solution that can scale out to your requirements. Using the Open XML SDK 2.0 in combination with Word Automation Services enables many scenarios that are difficult when using only the Open XML SDK 2.0.

A FREE “Export to Excel” C# class, using OpenXML

It’s amazing that even now, in 2014, there are so many developers still asking for help on how to write C# and VB.NET code to export their data to Excel.

Even worse, a lot of them will stumble on articles suggesting that they should write their data to a comma-separated file, but to give the file a .xls extension.

So today, I’m going to walkthrough how to use my C# “Export to Excel” class which you can add to your C# WinForms / WPF / ASP.NET application, using one line of code.

Depending on whether your data is stored in a DataSet, DataTable, or List<>, you simply need to call one of these three functions, and tell them what (Excel) file name you want to write to.

public static bool CreateExcelDocument(List list, string ExcelFilename
public static bool CreateExcelDocument(DataTable dt, string ExcelFilename
public static bool CreateExcelDocument(DataSet ds, string ExcelFilename)

Here’s a simple example.

We create a DataSet, fill it with some data (just add your own CreateSampleData function), then call the CreateExcelFile.CreateExcelDocument function, passing it the DataSet and a filename:

// Step 1: Create a DataSet, and put some sample data in it
DataSet ds = CreateSampleData();

// Step 2: Create the Excel .xlsx file
try
{
CreateExcelFile.CreateExcelDocument(ds, “C:\\Sample.xlsx”);
}
catch (Exception ex)
{
MessageBox.Show(“Couldn’t create Excel file.\r\nException: ” + ex.Message);
return;
}

And that’s all you have to do. The CreateExcelDocument function will create a “real” Excel file for you.

For example, if you had a created a DataSet containing three DataTables called:

Drivers
Vehicles,
Vehicle Owners,

..then here’s what your Excel file would look like. The class would create one worksheet per DataTable, and each worksheet would contain the data from that DataTable.

The full source code for this C# class is freely downloadable here
Adding the library to your application

The C# code above shows how easy it is to call the CreateExcelFile class.

DataSet ds = CreateSampleData();
CreateExcelFile.CreateExcelDocument(ds, “C:\\Sample.xlsx”);

However, to use this library, you’ll need to add two files from the free Microsoft OpenXML SDK:

DocumentFormat.OpenXml.dll: From the free Microsoft Open XML SDK library

WindowsBase.dll: From the Microsoft .NET Framework library

Add these two DLLs to your project’s References section, and remember to set them to “Copy Local”

Then, just download the CreateExcelFile.cs file (via the “Browse code” link in CodeProject’s left-hand panel), and add it to your application.

And that’s it.

Regardless of if your data is stored in a List<>, DataTable, or DataSet, you can export it to a “real” Office 2-10 Excel .xlsx file using that one line of code.

And because it is created using the OpenXML library, you can run this code on machines which don’t have Excel installed.
ASP.NET

For ASP.NET developers, I’ve added three extra functions.

public static bool CreateExcelDocument(List list, string filename, System.Web.HttpResponse Response)

public static bool CreateExcelDocument(DataTable dt, string filename, System.Web.HttpResponse Response)

public static bool CreateExcelDocument(DataSet ds, string filename, System.Web.HttpResponse Response)

Rather than creating the Excel file in a temporary directory, then having to load in the file and output it to the webpage’s HttpResponse, you can get the library to write directly to the HttpResponse.

However, by default, this functionality is disabled (to prevent build issues for the non ASP.NET developers). To enable these three functions, you need to make two changes.

First, uncomment the top line of the CreateExcelFile.cs code, so that it now reads:

#define INCLUDE_WEB_FUNCTIONS

Now, you need to add a Reference to the System.Web .NET library:

Adding a reference

Once you’ve done these two steps, the CreateExcelFile library is ready to go.

For example, in this example, my ASP.NET C# code has a list of Employee records, stored in a List. I add an “Export to Excel” button to my webpage, and when the user clicks on it, I just need one simple call to the CreateExcelFile class.

// In this example, I have a defined a List of my Employee objects.
class Employee;

List listOfEmployees = new List();

// The following ASP.Net code gets run when I click on my “Export to Excel” button.
protected void btnExportToExcel_Click(object sender, EventArgs e)
{
// It doesn’t get much easier than this…
CreateExcelFile.CreateExcelDocument(listOfEmployees, “Employees.xlsx”, Response);
}

And that’s it. A real Excel file, in one line of code.
Going forward

You’ll notice that this library is excellent for one job – writing plain, boring data to an Excel file. I haven’t attempted to add any classes to add formatting, colors, pivot tables or anything else.

However, this class is an excellent way to get started (without paying for third-party software to create the Excel file for you), and if you want to take this further, you’ll soon find that Googling will easily find you extra source code to add on top of this.

For example, if you wanted to add a background color to some of the cells in the Excel file, simply Google “open XML background color” and you’ll have many articles showing you how to do this.

The reason I wrote this article is that I found that it was very hard to find a free, easy to use C# library which actually created the OpenXML Excel file in the first place.

Download File

SharePoint 2013 – Creating a Word document with OOXML

This solution is based on the SharePoint-hosted app template provided by Visual Studio 2012. The solution enumerates through each document library in the host website, and adds the library to a drop-down list.

2008040211105590dad[1]

 

When the user selects a library and clicks a tile, the app creates a sample Word 2013 document by using OOXML in the selected library.

Prerequisites

This sample requires the following:

  • Visual Studio 2012
  • Office Developer Tools for Visual Studio 2012
  • Either of the following:
    • SharePoint Server 2013 configured to host apps, and with a Developer Site collection already created; or,
    • Access to an Office 365 Developer Site configured to host apps.

Key components of the sample

The sample app contains the following:

  • The Default.aspx webpage, which is used to enumerate through each document library in the host website, and render tiles for each MP4 video in the app.
  • The Point8020Metro.css style sheet (in the CSS folder) which contains some simple styles for rendering tiles.
  • The AppManifest.xml file, which has been edited to specify that the app requests Full Control permissions for the hosting web.
  • References to the DocumentFormat.OpenXml assembly provided by the OpenXML SDK 2.5.

All other files are automatically provided by the Visual Studio project template for apps for SharePoint, and they have not been modified in the development of this sample.

Configure the sample

Follow these steps to configure the sample.

  1. Open the SP_Autohosted_OOXML_cs.sln file using Visual Studio 2012.
  2. In the Properties window, add the full URL to your SharePoint Server 2013 Developer Site collection or Office 365 Developer Site to the Site URL property.

No other configuration is required.

Build the sample

To build the sample, press CTRL+SHIFT+B.

Run and test the sample

To run and test the sample, do the following:

  1. Press F5 to run the app.
  2. Sign in to your SharePoint Server 2013 Developer Site collection or Office 365 Developer Site if you are prompted to do so by the browser.
  3. Trust the app when you are prompted to do so.

The following images illustrate the app. In Figure 1 the app has been trusted and libraries added to the drop-down list.

Figure 1. View of the app with drop-down list

Figure 1

In Figure 2, the user has clicked the orange tile. The document is created and the red tile provides a link to the appropriate library (Figure 3), which the user reaches by clicking on the red tile.

Figure 2. Open XML document creator

Figure 2

Figure 3. Document library

Figure 3

Troubleshooting

Ensure that you have SharePoint Server 2013 configured to host apps (with a Developer Site Collection already created), or that you have signed up for an Office 365 Developer Site configured to host apps.

Change log

First release: January 30, 2013.

Related content

Using OpenXML to build Office 365 Apps (OOXML)

If you’re building apps for Office to run in Word, you might already know that the JavaScript API for Office (Office.js) offers several formats for reading and writing document content. These are called coercion types, and they include plain text, tables, HTML, and Office Open XML (OOXML).

So what are your options when you need to add rich content to a document, such as images, formatted tables, charts, or even just formatted text?

You can use HTML for inserting some types of rich content, such as pictures. Depending on your scenario, there can be drawbacks to HTML coercion, such as limitations in the formatting and positioning options available to your content.

Because Office Open XML is the language in which Word documents (such as .docx and .dotx) are written, you can insert virtually any type of content that a user can add to a Word document, with virtually any type of formatting the user can apply. Determining the Office Open XML markup you need to get it done is easier than you might think.

Note Note

Office Open XML is also the language behind PowerPoint and Excel (and, as of Office 2013, Visio) documents. However, currently, you can coerce content as Office Open XML only in apps for Office created for Word. For more information about Office Open XML, including the complete language reference documentation, see Additional resources.

To begin, take a look at some of the content types you can insert using OOXML coercion.

Download the code sample Loading and Writing Office Open XML, which contains the Office Open XML markup and Office.js code required for inserting any of the following examples into Word.

Note Note

Throughout this article, the terms content types and rich content refer to the types of rich content you can insert into a Word document.

Figure 1. Text with direct formatting.

Text with direct formatting applied.

You can use direct formatting to specify exactly what the text will look like regardless of existing formatting in the user’s document.

Figure 2. Text formatted using a style.

Text formatted with paragraph style.

You can use a style to automatically coordinate the look of text you insert with the user’s document.

Figure 3. A simple image.

Image of a logo.

You can use the same method for inserting any Office-supported image format.

Figure 4. An image formatted using picture styles and effects.

Formatted image in Word 2013.

Adding high quality formatting and effects to your images requires much less markup than you might expect.

Figure 5. A content control.

Text within a bound content control.

You can use content controls with your app to add content at a specified (bound) location rather than at the selection.

Figure 6. A text box with WordArt formatting.

Text formatted with WordArt text effects.

Text effects are available in Word for text inside a text box (as shown here) or for regular body text.

Figure 7. A shape.

An Office 2013 drawing shape in Word 2013.

You can insert built-in or custom drawing shapes, with or without text and formatting effects.

Figure 8. A table with direct formatting.

A formatted table in Word 2013.

You can include text formatting, borders, shading, cell sizing, or any table formatting you need.

Figure 9. A table formatted using a table style.

A formatted table in Word 2013.

You can use built-in or custom table styles just as easily as using a paragraph style for text.

Figure 10. A SmartArt diagram.

A dynamic SmartArt diagram in Word 2013.

Office 2013 offers a wide array of SmartArt diagram layouts (and you can use Office Open XML to create your own).

Figure 11. A chart.

A chart in Word 2013.

You can insert Excel charts as live charts in Word documents, which also means you can use them in your app for Word.

As you can see by the preceding examples, you can use OOXML coercion to insert essentially any type of content that a user can insert into their own document.

There are two simple ways to get the Office Open XML markup you need. Either add your rich content to an otherwise blank Word 2013 document and then save the file in Word XML Document format or use a test app with the getSelectedDataAsync method to grab the markup. Both approaches provide essentially the same result.

Note Note

An Office Open XML document is actually a compressed package of files that represent the document contents. Saving the file in the Word XML Document format gives you the entire Office Open XML package flattened into one XML file, which is also what you get when using getSelectedDataAsync to retrieve the Office Open XML markup.

If you save the file to an XML format from Word, note that there are two options under the Save as Type list in the Save As dialog box for .xml format files. Be sure to choose Word XML Document and not the Word 2003 option.

Download the code sample named Get, Set, and Edit Office Open XML, which you can use as a tool to retrieve and test your markup.

So is that all there is to it? Well, not quite. Yes, for many scenarios, you could use the full, flattened Office Open XML result you see with either of the preceding methods and it would work. The good news is that you probably don’t need most of that markup.

If you’re one of the many app developers seeing Office Open XML markup for the first time, trying to make sense of the massive amount of markup you get for the simplest piece of content might seem overwhelming, but it doesn’t have to be.

In this topic, we’ll use some common scenarios we’ve been hearing from the apps for Office developer community to show you techniques for simplifying Office Open XML for use in your app. We’ll explore the markup for some types of content shown earlier along with the information you need for minimizing the Office Open XML payload. We’ll also look at the code you need for inserting rich content into a document at the active selection and how to use Office Open XML with the bindings object to add or replace content at specified locations.

When you use getSelectedDataAsync to retrieve the Office Open XML for a selection of content (or when you save the document in Word XML Document format), what you’re getting is not just the markup that describes your selected content; it’s an entire document with many options and settings that you almost certainly don’t need. In fact, if you use that method from a document that contains a task pane app, the markup you get even includes your task pane.

Even a simple Word document package includes parts for document properties, styles, theme (formatting settings), web settings, fonts, and then some—in addition to parts for the actual content.

For example, say that you want to insert just a paragraph of text with direct formatting, as shown earlier in Figure 1. When you grab the Office Open XML for the formatted text using using getSelectedDataAsync, you see a large amount of markup. That markup includes a package element that represents an entire document, which contains several parts (commonly referred to as document parts or, in the Office Open XML, as package parts), as you see listed in Figure 13. Each part represents a separate file within the package.

Tip Tip

You can edit Office Open XML markup in a text editor like Notepad. If you open it in Visual Studio 2012, you can use Edit >Advanced > Format Document (Ctrl+K, Ctrl+D) to format the package for easier editing. Then you can collapse or expand document parts or sections of them, as shown in Figure 12, to more easily review and edit the content of the Office Open XML package. Each document part begins with a pkg:part tag.

Figure 12. Collapse and expand package parts for easier editing in Visual Studio 2012.

Office Open XML code snippet for a package part.

Figure 13. The parts included in a basic Word Office Open XML document package.

Office Open XML code snippet for a package part.

With all that markup, you might be surprised to discover that the only elements you actually need to insert the formatted text example are pieces of the .rels part and the document.xml part.

Note Note

The two lines of markup above the package tag (the XML declarations for version and Office program ID) are assumed when you use the OOXML coercion type, so you don’t need to include them. Keep them if you want to open your edited markup as a Word document to test it.

Several of the other types of content shown at the start of this topic require additional parts as well (beyond those shown in Figure 13), and we’ll address those later in this topic. Meanwhile, since you’ll see most of the parts shown in Figure 13 in the markup for any Word document package, here’s a quick summary of what each of these parts is for and when you need it:

  • Inside the package tag, the first part is the .rels file, which defines relationships between the top-level parts of the package (these are typically the document properties, thumbnail (if any), and main document body). Some of the content in this part is always required in your markup because you need to define the relationship of the main document part (where your content resides) to the document package.

  • The document.xml.rels part defines relationships for additional parts required by the document.xml (main body) part, if any.

Important note Important

The .rels files in your package (such as the top-level .rels, document.xml.rels, and others you may see for specific types of content) are an extremely important tool that you can use as a guide for helping you quickly edit down your Office Open XML package. To learn more about how to do this, see Creating your own markup: best practices later in this topic.

  • The document.xml part is the content in the main body of the document. You need elements of this part, of course, since that’s where your content appears. But, you don’t need everything you see in this part. We’ll look at that in more detail later.

  • Many parts are automatically ignored by the Set methods when inserting content into a document using OOXML coercion, so you might as well remove them. These include the theme1.xml file (the document’s formatting theme), the document properties parts (core, app, and thumbnail), and setting files (including settings, webSettings, and fontTable).

  • In the Figure 1 example, text formatting is directly applied (that is, each font and paragraph formatting setting applied individually). But, if you use a style (such as if you want your text to automatically take on the formatting of the Heading 1 style in the destination document) as shown earlier in Figure 2, then you would need part of the styles.xml part as well as a relationship definition for it. For more information, see the topic section Adding objects that use additional Office Open XML parts.

Let’s take a look at the minimal Office Open XML markup required for the formatted text example shown in Figure 1 and the JavaScript required for inserting it at the active selection in the document.

Simplified Office Open XML markup

We’ve edited the Office Open XML example shown here, as described in the preceding section, to leave just required document parts and only required elements within each of those parts. We’ll walk through how to edit the markup yourself (and explain a bit more about the pieces that remain here) in the next section of the topic.

Copy
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part pkg:name="/_rels/.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="512">
    <pkg:xmlData>
      <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
        <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
      </Relationships>
    </pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
    <pkg:xmlData>
      <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" >
        <w:body>
          <w:p>
            <w:pPr>
              <w:spacing w:before="360" w:after="0" w:line="480" w:lineRule="auto"/>
              <w:rPr>
                <w:color w:val="70AD47" w:themeColor="accent6"/>
                <w:sz w:val="28"/>
              </w:rPr>
            </w:pPr>
            <w:r>
              <w:rPr>
                <w:color w:val="70AD47" w:themeColor="accent6"/>
                <w:sz w:val="28"/>
              </w:rPr>
              <w:t>This text has formatting directly applied to achieve its font size, color, line spacing, and paragraph spacing.</w:t>
            </w:r>
          </w:p>
        </w:body>
      </w:document>
    </pkg:xmlData>
  </pkg:part>
</pkg:package>
NoteNote

If you add the markup shown here to an XML file along with the XML declaration tags for version and mso-application at the top of the file (shown in Figure 13), you can open it in Word as a Word document. Or, without those tags, you can still open it using File> Open in Word. You’ll see Compatibility Mode on the title bar in Word 2013, because you removed the settings that tell Word this is a 2013 document. Since you’re adding this markup to an existing Word 2013 document, that won’t affect your content at all.

JavaScript for using setSelectedDataAsync

Once you save the preceding Office Open XML as an XML file that’s accessible from your solution, you can use the following function to set the formatted text content in the document using OOXML coercion.

In this function, notice that all but the last line are used to get your saved markup for use in the setSelectedDataAsync method call at the end of the function. setSelectedDataASync requires only that you specify the content to be inserted and the coercion type.

Note Note

Replace yourXMLfilename with the name and path of the XML file as you’ve saved it in your solution. If you’re not sure where to include XML files in your solution or how to reference them in your code, see the Loading and Writing Office Open XML code sample for examples of that and a working example of the markup and JavaScript shown here.

Copy
function writeContent() {
    var myOOXMLRequest = new XMLHttpRequest();
    var myXML;
    myOOXMLRequest.open('GET', ‘yourXMLfilename’, false);
    myOOXMLRequest.send();
    if (myOOXMLRequest.status === 200) {
        myXML = myOOXMLRequest.responseText;
    }
    Office.context.document.setSelectedDataAsync(myXML, { coercionType: 'ooxml' });
}

Let’s take a closer look at the markup you need to insert the preceding formatted text example.

For this example, start by simply deleting all document parts from the package other than .rels and document.xml. Then, we’ll edit those two required parts to simplify things further.

Important note Important

Use the .rels parts as a map to quickly gauge what’s included in the package and determine what parts you can delete completely (that is, any parts not related to or referenced by your content). Remember that every document part must have a relationship defined in the package and those relationships appear in the .rels files. So you should see all of them listed in either .rels, document.xml.rels, or a content-specific .rels file.

The following markup shows the required .rels part before editing. Since we’re deleting the app and core document property parts, and the thumbnail part, we need to delete those relationships from .rels as well. Notice that this will leave only the relationship (with the relationship ID “rID1” in the following example) for document.xml.

Copy
  <pkg:part pkg:name="/_rels/.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="512">
    <pkg:xmlData>
      <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
        <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
        <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/thumbnail" Target="docProps/thumbnail.emf"/>
        <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
        <Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
      </Relationships>
    </pkg:xmlData>
  </pkg:part>
Important noteImportant

Remove the relationships (that is, the <Relationship…> tag) for any parts that you completely remove from the package. Including a part without a corresponding relationship, or excluding a part and leaving its relationship in the package, will result in an error.

The following markup shows the document.xml part—which includes our sample formatted text content—before editing.

Copy
<pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
    <pkg:xmlData>
      <w:document mc:Ignorable="w14 w15 wp14" xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
        <w:body>
          <w:p>
            <w:pPr>
              <w:spacing w:before="360" w:after="0" w:line="480" w:lineRule="auto"/>
              <w:rPr>
                <w:color w:val="70AD47" w:themeColor="accent6"/>
                <w:sz w:val="28"/>
              </w:rPr>
            </w:pPr>
            <w:r>
              <w:rPr>
                <w:color w:val="70AD47" w:themeColor="accent6"/>
                <w:sz w:val="28"/>
              </w:rPr>
              <w:t>This text has formatting directly applied to achieve its font size, color, line spacing, and paragraph spacing.</w:t>
            </w:r>
            <w:bookmarkStart w:id="0" w:name="_GoBack"/>
            <w:bookmarkEnd w:id="0"/>
          </w:p>
          <w:p/>
          <w:sectPr>
            <w:pgSz w:w="12240" w:h="15840"/>
            <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
            <w:cols w:space="720"/>
          </w:sectPr>
        </w:body>
      </w:document>
    </pkg:xmlData>
  </pkg:part>

Since document.xml is the primary document part where you place your content, let’s take a quick walk through that part. (Figure 14, which follows this list, provides a visual reference to show how some of the core content and formatting tags explained here relate to what you see in a Word document.)

  • The opening w:document tag includes several namespace (xmlns) listings. Many of those namespaces refer to specific types of content and you only need them if they’re relevant to your content.

    Notice that the prefix for the tags throughout a document part refers back to the namespaces. In this example, the only prefix used in the tags throughout the document.xml part is w:, so the only namespace that we need to leave in the opening w:document tag is xmlns:w.

TipTip

If you’re editing your markup in Visual Studio 2012, after you delete namespaces in any part, look through all tags of that part. If you’ve removed a namespace that’s required for your markup, you’ll see a red squiggly underline on the relevant prefix for affected tags. Also note that, if you remove the xmlns:mc namespace, you must also remove the mc:Ignorable attribute that precedes the namespace listings.

  • Inside the opening body tag, you see a paragraph tag (w:p), which includes our sample content for this example.

  • The w:pPr tag includes properties for directly-applied paragraph formatting, such as space before or after the paragraph, paragraph alignment, or indents. (Direct formatting refers to attributes that you apply individually to content rather than as part of a style.) This tag also includes direct font formatting that’s applied to the entire paragraph, in a nested w:rPr (run properties) tag, which contains the font color and size set in our sample.

NoteNote

You might notice that font sizes and some other formatting settings in Word Office Open XML markup look like they’re double the actual size. That’s because paragraph and line spacing, as well some section formatting properties shown in the preceding markup, are specified in twips (one-twentieth of a point).

Depending on the types of content you work with in Office Open XML, you may see several additional units of measure, including English Metric Units (914,400 EMUs to an inch), which are used for some Office Art (drawingML) values and 100,000 times actual value, which is used in both drawingML and PowerPoint markup. PowerPoint also expresses some values as 100 times actual and Excel commonly uses actual values.

  • Within a paragraph, any content with like properties is included in a run (w:r), such as is the case with the sample text. Each time there’s a change in formatting or content type, a new run starts. (That is, if just one word in the sample text was bold, it would be separated into its own run.) In this example, the content includes just the one text run.

    Notice that, because the formatting included in this sample is font formatting (that is, formatting that can be applied to as little as one character), it also appears in the properties for the individual run.

  • Also notice the tags for the hidden “_GoBack” bookmark (w:bookmarkStart and w:bookmarkEnd), which appear in Word 2013 documents by default. You can always delete the start and end tags for the GoBack bookmark from your markup.

  • The last piece of the document body is the w:sectPr tag, or section properties. This tag includes settings such as margins and page orientation. The content you insert using setSelectedDataAsync will take on the active section properties in the destination document by default. So, unless your content includes a section break (in which case you’ll see more than one w:sectPr tag), you can delete this tag.

Figure 14. How common tags in document.xml relate to the content and layout of a Word document.

Office Open XML elements in a Word document.

TipTip

In markup you create, you might see another attribute in several tags that includes the characters w:rsid, which you don’t see in the examples used in this topic. These are revision identifiers. They’re used in Word for the Combine Documents feature and they’re on by default. You’ll never need them in markup you’re inserting with your app and turning them off makes for much cleaner markup. You can easily remove existing RSID tags or disable the feature (as described in the following procedure) so that they’re not added to your markup for new content.

Be aware that if you use the co-authoring capabilities in Word (such as the ability to simultaneously edit documents with others), you should enable the feature again when finished generating the markup for your app.

To turn off RSID attributes in Word for documents you create going forward, do the following:

  1. In Word 2013, choose File and then choose Options.

  2. In the Word Options dialog box, choose Trust Center and then choose Trust Center Settings.

  3. In the Trust Center dialog box, choose Privacy Options and then disable the setting Store Random Number to Improve Combine Accuracy.

To remove RSID tags from an existing document, try the following shortcut with the document open in Word:

  1. With your insertion point in the main body of the document, press Ctrl+Home to go to the top of the document.

  2. On the keyboard, press Spacebar, Delete, Spacebar. Then, save the document.

After removing the majority of the markup from this package, we’re left with the minimal markup that needs to be inserted for the sample, as shown in the preceding section.

Several types of rich content require only the .rels and document.xml components shown in the preceding example, including content controls, Office drawing shapes and text boxes, and tables (unless a style is applied to the table). In fact, you can reuse the same edited package parts and swap out just the <body> content in document.xml for the markup of your content.

To check out the Office Open XML markup for the examples of each of these content types shown earlier in Figures 5 through 8, explore the Loading and Writing Office Open XML code sample referenced in the Overview section.

Before we move on, let’s take a look at differences to note for a couple of these content types and how to swap out the pieces you need.

Understanding drawingML markup (Office graphics) in Word: What are fallbacks?

If the markup for your shape or text box looks far more complex than you would expect, there is a reason for it. With the release of Office 2007, we saw the introduction of the Office Open XML Formats as well as the introduction of a new Office graphics engine that PowerPoint and Excel fully adopted. In the 2007 release, Word only incorporated part of that graphics engine—adopting the updated Excel charting engine, SmartArt graphics, and advanced picture tools. For shapes and text boxes, Word 2007 continued to use legacy drawing objects (VML). It was in the 2010 release that Word took the additional steps with the graphics engine to incorporate updated shapes and drawing tools.

So, to support shapes and text boxes in Office Open XML Format Word documents when opened in Word 2007, shapes (including text boxes) require fallback VML markup.

Typically, as you see for the shape and text box examples included in the Loading and Writing Office Open XML code sample, the fallback markup can be removed. Word 2013 automatically adds missing fallback markup to shapes when a document is saved. However, if you prefer to keep the fallback markup to ensure that you’re supporting all user scenarios, there’s no harm in retaining it.

Note also that, if you have grouped drawing objects included in your content, you’ll see additional (and apparently repetitive) markup, but this must be retained. Portions of the markup for drawing shapes are duplicated when the object is included in a group.

Important note Important

When working with text boxes and drawing shapes, be sure to check namespaces carefully before removing them from document.xml. (Or, if you’re reusing markup from another object type, be sure to add back any required namespaces you might have previously removed from document.xml.) A substantial portion of the namespaces included by default in document.xml are there for drawing object requirements.

Note about graphic positioning

In the code samples Loading and Writing Office Open XMLand Get, Set, and Edit Office Open XML, the text box and shape are setup using different types of text wrapping and positioning settings. (Also be aware that the image examples in those code samples are setup using in line with text formatting, which positions a graphic object on the text baseline.)

The shape in those code samples is positioned relative to the right and bottom page margins. Relative positioning lets you more easily coordinate with a user’s unknown document setup because it will adjust to the user’s margins and run less risk of looking awkward because of paper size, orientation, or margin settings. To retain relative positioning settings when you insert a graphic object, you must retain the paragraph mark (w:p) in which the positioning (known in Word as an anchor) is stored. If you insert the content into an existing paragraph mark rather than including your own, you may be able to retain the same initial visual, but many types of relative references that enable the positioning to automatically adjust to the user’s layout may be lost.

Working with content controls

Content controls are an important feature in Word 2013 that can greatly enhance the power of your app for Word in multiple ways, including giving you the ability to insert content at designated places in the document rather than only at the selection.

In Word, find content controls on the Developer tab of the ribbon, as shown here in Figure 15.

Figure 15. The Controls group on the Developer tab in Word.

Content Controls group on the Word 2013 ribbon.

Types of content controls in Word include rich text, plain text, picture, building block gallery, check box, dropdown list, combo box, date picker, and repeating section.

  • Use the Properties command, shown in Figure 15, to edit the title of the control and to set preferences such as hiding the control container.

  • Enable Design Mode to edit placeholder content in the control.

If your app works with a Word template, you can include controls in that template to enhance the behavior of the content. You can also use XML data binding in a Word document to bind content controls to data, such as document properties, for easy form completion or similar tasks. (Find controls that are already bound to built-in document properties in Word on the Insert tab, under Quick Parts.)

When you use content controls with your app, you can also greatly expand the options for what your app can do using a different type of binding. You can bind to a content control from within the app and then write content to the binding rather than to the active selection.

Note Note

Don’t confuse XML data binding in Word with the ability to bind to a control via your app. These are completely separate features. However, you can include named content controls in the content you insert via your app using OOXML coercion and then use code in the app to bind to those controls.

Also be aware that both XML data binding and Office.js can interact with custom XML parts in your app—so it is possible to integrate these powerful tools. To learn about working with custom XML parts in the Office JavaScript API, see the Additional resources section of this topic.

Working with bindings in your Word app is covered in the next section of the topic. First, let’s take a look at an example of the Office Open XML required for inserting a rich text content control that you can bind to using your app.

Important note Important

Rich text controls are the only type of content control you can use to bind to a content control from within your app.

Copy
<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
  <pkg:part pkg:name="/_rels/.rels" pkg:contentType="application/vnd.openxmlformats-package.relationships+xml" pkg:padding="512">
    <pkg:xmlData>
      <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
        <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
      </Relationships>
    </pkg:xmlData>
  </pkg:part>
  <pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
    <pkg:xmlData>
      <w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" >
        <w:body>
          <w:p/>
          <w:sdt>
              <w:sdtPr>
                <w:alias w:val="MyContentControlTitle"/>
                <w:id w:val="1382295294"/>
                <w15:appearance w15:val="hidden"/>
                <w:showingPlcHdr/>
              </w:sdtPr>
              <w:sdtContent>
                <w:p>
                  <w:r>
                  <w:t>[This text is inside a content control that has its container hidden. You can bind to a content control to add or interact with content at a specified location in the document.]</w:t>
                </w:r>
                </w:p>
              </w:sdtContent>
            </w:sdt>
          </w:body>
      </w:document>
    </pkg:xmlData>
  </pkg:part>
 </pkg:package>

As already mentioned, content controls—like formatted text—don’t require additional document parts, so only edited versions of the .rels and document.xml parts are included here.

The w:sdt tag that you see within the document.xml body represents the content control. If you generate the Office Open XML markup for a content control, you’ll see that several attributes have been removed from this example, including the tag and document part properties. Only essential (and a couple of best practice) elements have been retained, including the following:

  • The alias is the title property from the Content Control Properties dialog box in Word. This is a required property (representing the name of the item) if you plan to bind to the control from within your app.

  • The unique id is a required property. If you bind to the control from within your app, the ID is the property the binding uses in the document to identify the applicable named content control.

  • The appearance attribute is used to hide the control container, for a cleaner look. This is a new feature in Word 2013, as you see by the use of the w15 namespace. Because this property is used, the w15 namespace is retained at the start of the document.xml part.

  • The showingPlcHdr attribute is an optional setting that sets the default content you include inside the control (text in this example) as placeholder content. So, if the user clicks or taps in the control area, the entire content is selected rather than behaving like editable content in which the user can make changes.

  • Although the empty paragraph mark (<w:p/>) that precedes the sdt tag is not required for adding a content control (and will add vertical space above the control in the Word document), it ensures that the control is placed in its own paragraph. This may be important, depending upon the type and formatting of content that will be added in the control.

  • If you intend to bind to the control, the default content for the control (what’s inside the sdtContent tag) must include at least one complete paragraph (as in this example), in order for your binding to accept multi-paragraph rich content.

NoteNote

The document part attribute that was removed from this sample w:sdt tag may appear in a content control to reference a separate part in the package where placeholder content information can be stored (parts located in a glossary directory in the Office Open XML package). Although document part is the term used for XML parts (that is, files) within an Office Open XML package, the term document parts as used in the sdt property refers to the same term in Word that is used to describe some content types including building blocks and document property quick parts (for example, built-in XML data-bound controls). If you see parts under a glossary directory in your Office Open XML package, you may need to retain them if the content you’re inserting includes these features. For a typical content control that you intend to use to bind to from your app, they’re not required. Just remember that, if you do delete the glossary parts from the package, you must also remove the document part attribute from the w:sdt tag.

The next section will discuss how to create and use bindings in your Word app.

We’ve already looked at how to insert content at the active selection in a Word document. If you bind to a named content control that’s in the document, you can insert any of the same content types into that control.

So when might you want to use this approach?

  • When you need to add or replace content at specified locations in a template, such as to populate portions of the document from a database

  • When you want the option to replace content that you’re inserting at the active selection, such as to provide design element options to the user

  • When you want the user to add data in the document that you can access for use with your app, such as to populate fields in the task pane based upon information the user adds in the document

Download the code sample Add and Populate a Binding in Word , which provides a working example of how to insert and bind to a content control, and how to populate the binding.

Add and bind to a named content control

As you examine the JavaScript that follows, consider these requirements:

  • As previously mentioned, you must use a rich text content control in order to bind to the control from your Word app.

  • The content control must have a name (this is the Title field in the Content Control Properties dialog box, which corresponds to the Alias tag in the Office Open XML markup). This is how the code identifies where to place the binding.

  • You can have several named controls and bind to them as needed. Use a unique content control name, unique content control ID, and a unique binding ID.

Copy
function addAndBindControl() {
        Office.context.document.bindings.addFromNamedItemAsync("MyContentControlTitle", "text", { id: 'myBinding' }, function (result) {
            if (result.status == "failed") {
                if (result.error.message == "The named item does not exist.")
                    var myOOXMLRequest = new XMLHttpRequest();
                    var myXML;
                    myOOXMLRequest.open('GET', '../../Snippets_BindAndPopulate/ContentControl.xml', false);
                    myOOXMLRequest.send();
                    if (myOOXMLRequest.status === 200) {
                        myXML = myOOXMLRequest.responseText;
                    }
                    Office.context.document.setSelectedDataAsync(myXML, { coercionType: 'ooxml' }, function (result) {
                        Office.context.document.bindings.addFromNamedItemAsync("MyContentControlTitle", "text", { id: 'myBinding' });
                    });
            }
            });
        }

The code shown here takes the following steps:

  • Attempts to bind to the named content control, using addFromNamedItemAsync.

    Take this step first if there is a possible scenario for your app where the named control could already exist in the document when the code executes. For example, you’ll want to do this if the app was inserted into and saved with a template that’s been designed to work with the app, where the control was placed in advance. You also need to do this if you need to bind to a control that was placed earlier by the app.

  • The callback in the first call to the addFromNamedItemAsync method checks the status of the result to see if the binding failed because the named item doesn’t exist in the document (that is, the content control named MyContentControlTitle in this example). If so, the code adds the control at the active selection point (using setSelectedDataAsync) and then binds to it.

NoteNote

As mentioned earlier and shown in the preceding code, the name of the content control is used to determine where to create the binding. However, in the Office Open XML markup, the code adds the binding to the document using both the name and the ID attribute of the content control.

After code execution, if you examine the markup of the document in which your app created bindings, you’ll see two parts to each binding. In the markup for the content control where a binding was added (in document.xml), you’ll see the attribute <w15:webExtensionLinked/>.

In the document part named webExtensions1.xml, you’ll see a list of the bindings you’ve created. Each is identified using the binding ID and the ID attribute of the applicable control, such as the following—where the appref attribute is the content control ID:<we:binding id=”myBinding” type=”text” appref=”1382295294″/>.

Important noteImportant

You must add the binding at the time you intend to act upon it. Don’t include the markup for the binding in the Office Open XML for inserting the content control because the process of inserting that markup will strip the binding.

Populate a binding

The code for writing content to a binding is similar to that for writing content to a selection.

Copy
function populateBinding(filename) {
        var myOOXMLRequest = new XMLHttpRequest();
        var myXML;
        myOOXMLRequest.open('GET', filename, false);
            myOOXMLRequest.send();
            if (myOOXMLRequest.status === 200) {
                myXML = myOOXMLRequest.responseText;
            }
            Office.select("bindings#myBinding").setDataAsync(myXML, { coercionType: 'ooxml' });
        }

As with setSelectedDataAsync, you specify the content to be inserted and the coercion type. The only additional requirement for writing to a binding is to identify the binding by ID. Notice how the binding ID used in this code (bindings#myBinding) corresponds to the binding ID established (myBinding) when the binding was created in the previous function.

NoteNote

The preceding code is all you need whether you are initially populating or replacing the content in a binding. When you insert a new piece of content at a bound location, the existing content in that binding is automatically replaced. Check out an example of this in the previously-referenced code sample Add and Populate a Binding in Word, which provides two separate content samples that you can use interchangeably to populate the same binding.

Many types of content require additional document parts in the Office Open XML package, meaning that they either reference information in another part or the content itself is stored in one or more additional parts and referenced in document.xml.

For example, consider the following:

  • Content that uses styles for formatting (such as the styled text shown earlier in Figure 2 or the styled table shown in Figure 9) requires the styles.xml part.

  • Images (such as those shown in Figures 3 and 4) include the binary image data in one (and sometimes two) additional parts.

  • SmartArt diagrams (such as the one shown in Figure 10) require multiple additional parts to describe the layout and content.

  • Charts (such as the one shown in Figure 11) require multiple additional parts, including their own relationship (.rels) part.

You can see edited examples of the markup for all of these content types in the previously-referenced code sample Loading and Writing Office Open XML. You can insert all of these content types using the same JavaScript code shown earlier (and provided in the referenced code samples) for inserting content at the active selection and writing content to a specified location using bindings.

Before you explore the samples, let’s take a look at few tips for working with each of these content types.

Important note Important

Remember, if you are retaining any additional parts referenced in document.xml, you will need to retain document.xml.rels and the relationship definitions for the applicable parts you’re keeping, such as styles.xml or an image file.

Working with styles

The same approach to editing the markup that we looked at for the preceding example with directly-formatted text applies when using paragraph styles or table styles to format your content. However, the markup for working with paragraph styles is considerably simpler, so that is the example described here.

Editing the markup for content using paragraph styles

The following markup represents the body content for the styled text example shown in Figure 2.

Copy
<w:body>
  <w:p>
    <w:pPr>
      <w:pStyle w:val="Heading1"/>
    </w:pPr>
    <w:r>
      <w:t>This text is formatted using the Heading 1 paragraph style.</w:t>
    </w:r>
  </w:p>
</w:body>
NoteNote

As you see, the markup for formatted text in document.xml is considerably simpler when you use a style, because the style contains all of the paragraph and font formatting that you otherwise need to reference individually. However, as explained earlier, you might want to use styles or direct formatting for different purposes: use direct formatting to specify the appearance of your text regardless of the formatting in the user’s document; use a paragraph style (particularly a built-in paragraph style name, such as Heading 1 shown here) to have the text formatting automatically coordinate with the user’s document.

Use of a style is a good example of how important it is to read and understand the markup for the content you’re inserting, because it’s not explicit that another document part is referenced here. If you include the style definition in this markup and don’t include the styles.xml part, the style information in document.xml will be ignored regardless of whether or not that style is in use in the user’s document.

However, if you take a look at the styles.xml part, you’ll see that only a small portion of this long piece of markup is required when editing markup for use in your app:

  • The styles.xml part includes several namespaces by default. If you are only retaining the required style information for your content, in most cases you only need to keep the xmlns:w namespace.

  • The w:docDefaults tag content that falls at the top of the styles part will be ignored when your markup is inserted via the app and can be removed.

  • The largest piece of markup in a styles.xml part is for the w:latentStyles tag that appears after docDefaults, which provides information (such as appearance attributes for the Styles pane and Styles gallery) for every available style. This information is also ignored when inserting content via your app and so it can be removed.

  • Following the latent styles information, you see a definition for each style in use in the document from which you’re markup was generated. This includes some default styles that are in use when you create a new document and may not be relevant to your content. You can delete the definitions for any styles that aren’t used by your content.

NoteNote

Each built-in heading style has an associated Char style that is a character style version of the same heading format. Unless you’ve applied the heading style as a character style, you can remove it. If the style is used as a character style, it appears in document.xml in a run properties tag (w:rPr) rather than a paragraph properties (w:pPr) tag. This should only be the case if you’ve applied the style to just part of a paragraph, but it can occur inadvertently if the style was incorrectly applied.

  • If you’re using a built-in style for your content, you don’t have to include a full definition. You only must include the style name, style ID, and at least one formatting attribute in order for the coerced OOXML to apply the style to your content upon insertion.

    However, it’s a best practice to include a complete style definition (even if it’s the default for built-in styles). If a style is already in use in the destination document, your content will take on the resident definition for the style, regardless of what you include in styles.xml. If the style isn’t yet in use in the destination document, your content will use the style definition you provide in the markup.

So, for example, the only content we needed to retain from the styles.xml part for the sample text shown in Figure 2, which is formatted using Heading 1 style, is the following.

NoteNote

A complete Word 2013 definition for the Heading 1 style has been retained in this example.

Copy
<pkg:part pkg:name="/word/styles.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml">
  <pkg:xmlData>
    <w:styles xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" >
      <w:style w:type="paragraph" w:styleId="Heading1">
        <w:name w:val="heading 1"/>
        <w:basedOn w:val="Normal"/>
        <w:next w:val="Normal"/>
        <w:link w:val="Heading1Char"/>
        <w:uiPriority w:val="9"/>
        <w:qFormat/>
        <w:pPr>
          <w:keepNext/>
          <w:keepLines/>
          <w:spacing w:before="240" w:after="0" w:line="259" w:lineRule="auto"/>
          <w:outlineLvl w:val="0"/>
        </w:pPr>
        <w:rPr>
          <w:rFonts w:asciiTheme="majorHAnsi" w:eastAsiaTheme="majorEastAsia" w:hAnsiTheme="majorHAnsi" w:cstheme="majorBidi"/>
          <w:color w:val="2E74B5" w:themeColor="accent1" w:themeShade="BF"/>
          <w:sz w:val="32"/>
          <w:szCs w:val="32"/>
        </w:rPr>
      </w:style>
    </w:styles>
  </pkg:xmlData>
</pkg:part>

Editing the markup for content using table styles

When your content uses a table style, you need the same relative part of styles.xml as described for working with paragraph styles. That is, you only need to retain the information for the style you’re using in your content—and you must include the name, ID, and at least one formatting attribute—but are better off including a complete style definition to address all potential user scenarios.

However, when you look at the markup both for your table in document.xml and for your table style definition in styles.xml, you see enormously more markup than when working with paragraph styles.

  • In document.xml, formatting is applied by cell even if it’s included in a style. Using a table style won’t reduce the volume of markup. The benefit of using table styles for the content is for easy updating and easily coordinating the look of multiple tables.

  • In styles.xml, you’ll see a substantial amount of markup for a single table style as well, because table styles include several types of possible formatting attributes for each of several table areas, such as the entire table, heading rows, odd and even banded rows and columns (separately), the first column, etc.

Working with images

The markup for an image includes a reference to at least one part that includes the binary data to describe your image. For a complex image, this can be hundreds of pages of markup and you can’t edit it. Since you don’t ever have to touch the binary part(s), you can simply collapse it if you’re using a structured editor such as Visual Studio 2012, so that you can still easily review and edit the rest of the package.

If you check out the example markup for the simple image shown earlier in Figure 3, available in the previously-referenced code sample Loading and Writing Office Open XML, you’ll see that the markup for the image in document.xml includes size and position information as well as a relationship reference to the part that contains the binary image data. That reference is included in the <a:blip> tag, as follows:

Copy
<a:blip r:embed="rId4" cstate="print">

Be aware that, because a relationship reference is explicitly used (r:embed=”rID4″) and that related part is required in order to render the image, if you don’t include the binary data in your Office Open XML package, you will get an error. This is different from styles.xml, explained previously, which won’t throw an error if omitted since the relationship is not explicitly referenced and the relationship is to a part that provides attributes to the content (formatting) rather than being part of the content itself.

NoteNote

When you review the markup, notice the additional namespaces used in the a:blip tag. You’ll see in document.xml that the xlmns:a namespace (the main drawingML namespace) is dynamically placed at the beginning of the use of drawingML references rather than at the top of the document.xml part. However, the relationships namespace (r) must be retained where it appears at the start of document.xml. Check your picture markup for additional namespace requirements. Remember that you don’t have to memorize which types of content require what namespaces—you can easily tell by reviewing the prefixes of the tags throughout document.xml.

Understanding additional image parts and formatting

When you use some Office picture formatting effects on your image—such as for the image shown in Figure 4, which uses adjusted brightness and contrast settings (in addition to picture styling)—a second binary data part for an HD format copy of the image data may be required. This additional HD format is required for formatting considered a layering effect, and the reference to it appears in document.xml similar to the following:

Copy
<a14:imgLayer r:embed="rId5">

See the required markup for the formatted image shown in Figure 4 (which uses layering effects among others) in the Loading and Writing Office Open XML code sample.

Working with SmartArt diagrams

A SmartArt diagram has four associated parts, but only two are always required. You can examine an example of SmartArt markup in the Loading and Writing Office Open XML code sample. First, take a look at a brief description of each of the parts and why they are or are not required:

Note Note

If your content includes more than one diagram, they will be numbered consecutively, replacing the 1 in the file names listed here.

  • layout1.xml: This part is required. It includes the markup definition for the layout appearance and functionality.

  • data1.xml: This part is required. It includes the data in use in your instance of the diagram.

  • drawing1.xml: This part is not always required but if you apply custom formatting to elements in your instance of a diagram—such as directly formatting individual shapes—you might need to retain it.

  • colors1.xml: This part is not required. It includes color style information, but the colors of your diagram will coordinate by default with the colors of the active formatting theme in the destination document, based on the SmartArt color style you apply from the SmartArt Tools design tab in Word before saving out your Office Open XML markup.

  • quickStyles1.xml: This part is not required. Similar to the colors part, you can remove this as your diagram will take on the definition of the applied SmartArt style that’s available in the destination document (that is, it will automatically coordinate with the formatting theme in the destination document).

Tip Tip

The SmartArt layout1.xml file is a good example of places you may be able to further trim your markup but might not be worth the extra time to do so (because it removes such a small amount of markup relative to the entire package). If you would like to get rid of every last line you can of markup, you can delete the <dgm:sampData…> tag and its contents. This sample data defines how the thumbnail preview for the diagram will appear in the SmartArt styles galleries. However, if it’s omitted, default sample data is used.

Be aware that the markup for a SmartArt diagram in document.xml contains relationship ID references to the layout, data, colors, and quick styles parts. You can delete the references in document.xml to the colors and styles parts when you delete those parts and their relationship definitions (and it’s certainly a best practice to do so, since you’re deleting those relationships), but you won’t get an error if you leave them, since they aren’t required for your diagram to be inserted into a document. Find these references in document.xml in the dgm:relIds tag. Regardless of whether or not you take this step, retain the relationship ID references for the required layout and data parts.

Working with charts

Similar to SmartArt diagrams, charts contain several additional parts. However, the setup for charts is a bit different from SmartArt, in that a chart has its own relationship file. Following is a description of required and removable document parts for a chart:

Note Note

As with SmartArt diagrams, if your content includes more than one chart, they will be numbered consecutively, replacing the 1 in the file names listed here.

  • In document.xml.rels, you’ll see a reference to the required part that contains the data that describes the chart (chart1.xml).

  • You also see a separate relationship file for each chart in your Office Open XML package, such as chart1.xml.rels.

    There are three files referenced in chart1.xml.rels, but only one is required. These include the binary Excel workbook data (required) and the color and style parts (colors1.xml and styles1.xml) that you can remove.

Charts that you can create and edit natively in Word 2013 are Excel 2013 charts, and their data is maintained on an Excel worksheet that’s embedded as binary data in your Office Open XML package. Like the binary data parts for images, this Excel binary data is required, but there’s nothing to edit in this part. So you can just collapse the part in the editor to avoid having to manually scroll through it all to examine the rest of your Office Open XML package.

However, similar to SmartArt, you can delete the colors and styles parts. If you’ve used the chart styles and color styles available in to format your chart, the chart will take on the applicable formatting automatically when it is inserted into the destination document.

See the edited markup for the example chart shown in Figure 11 in the Loading and Writing Office Open XML code sample.

You’ve already seen how to identify and edit the content in your markup. If the task still seems difficult when you take a look at the massive Open XML package generated for your document, following is a quick summary of recommended steps to help you edit that package down quickly:

Note Note

Remember that you can use all .rels parts in the package as a map to quickly check for document parts that you can remove.

  1. Open the flattened XML file in Visual Studio 2012 and press Ctrl+K, Ctrl+D to format the file. Then use the collapse/expand buttons on the left to collapse the parts you know you need to remove. You might also want to collapse long parts you need, but know you won’t need to edit (such as the base64 binary data for an image file), making the markup faster and easier to visually scan.

  2. There are several parts of the document package that you can almost always remove when you are preparing Open XML markup for use in your app. You might want to start by removing these (and their associated relationship definitions), which will greatly reduce the package right away. These include the theme1, fontTable, settings, webSettings, thumbnail, both the core and app properties files, and any taskpane or webExtension parts.

  3. Remove any parts that don’t relate to your content, such as footnotes, headers, or footers that you don’t require. Again, remember to also delete their associated relationships.

  4. Review the document.xml.rels part to see if any files referenced in that part are required for your content, such as an image file, the styles part, or SmartArt diagram parts. Delete the relationships for any parts your content doesn’t require and confirm that you have also deleted the associated part. If your content doesn’t require any of the document parts referenced in document.xml.rels, you can delete that file also.

  5. If your content has an additional .rels part (such as chart#.xml.rels), review it to see if there are other parts referenced there that you can remove (such as quick styles for charts) and delete both the relationship from that file as well as the associated part.

  6. Edit document.xml to remove namespaces not referenced in the part, section properties if your content doesn’t include a section break, and any markup that’s not related to the content that you want to insert. If inserting shapes or text boxes, you might also want to remove extensive fallback markup.

  7. Edit any additional required parts where you know that you can remove substantial markup without affecting your content, such as the styles part.

After you’ve taken the preceding seven steps, you’ve likely cut between about 90 and 100 percent of the markup you can remove, depending on your content. In most cases, this is likely to be as far as you want to trim.

Regardless of whether you leave it here or choose to delve further into your content to find every last line of markup you can cut, remember that you can use the previously-referenced code sample Get, Set, and Edit Office Open XML as a scratch pad to quickly and easily test your edited markup.

Tip Tip

If you update an OOXML snippet in an existing solution while developing, clear temporary Internet files before you run the solution again to update the Open XML used by your code. Markup that’s included in your solution in XML files is cached on your computer.

You can, of course, clear temporary Internet files from your default web browser. To access Internet options and delete these settings from inside Visual Studio 2012, on the Debug menu, choose Options and Settings. Then, under Environment, choose Web Browser and then choose Internet Explorer Options.

In this topic, you’ve seen several examples of what you can do with Open XML in your apps for . We’ve looked at a wide range of rich content type examples that you can insert into documents by using the OOXML coercion type, together with the JavaScript methods for inserting that content at the selection or to a specified (bound) location.

So, what else do you need to know if you’re creating your app both for stand-alone use (that is, inserted from the Store or a proprietary server location) and for use in a pre-created template that’s designed to work with your app? The answer might be that you already know all you need.

The markup for a given content type and methods for inserting it are the same whether your app is designed to stand-alone or work with a template. If you are using templates designed to work with your app, just be sure that your JavaScript includes callbacks that account for scenarios where referenced content might already exist in the document (such as demonstrated in the binding example shown in the section Add and bind to a named content control).

When using templates with your app—whether the app will be resident in the template at the time that the user created the document or the app will be inserting a template—you might also want to incorporate other elements of the API to help you create a more robust, interactive experience. For example, you may want to include identifying data in a customXML part that you can use to determine the template type in order to provide template-specific options to the user. To learn more about how to work with customXML in your apps, see the additional resources that follow.

For more information, see the following resources: