Create Microsoft Office documents with XML

This blog post will demonstrate how Microsoft Office documents can be created as XML files that can be downloaded, with the use of LINQ to XML and knowing how the Office xml schemas are defined. A simple Excel document will be created as an example.

In a recent project there was a request to be able to save data presented in a web page as a downloadable Excel file. A simple solution that would avoid the use of additional assemblies is to create an xml file following the open xml standards defined for Microsoft Office Documents. With this approach any kind of office document that supports open XML can be created in a very easy way with the use of LINQ to XML. 

To have a better understanding and find in detail what can be achieved with open xml you can check the Microsoft Office 2003 Edition XML Schema References. You also need to understand how XML is structured and have a basic understanding of LINQ so that you can find elements inside elements in the xml structure.

The basic XML structure shown below is what will be created.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Excel.Sheet"?>
	<OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office" />
	<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel" />
	<ss:Worksheet ss:Name="Sheet 1">
				<ss:Data ss:Type="String">Sample value in A1</ss:Data>
				<ss:Data ss:Type="String">Sample value in B1</ss:Data>
				<ss:Data ss:Type="String">Sample value in C1</ss:Data>

And here is how it will be shown when opened in Excel.
Excel File

The code will be presented in a way that can be distributed in more internal functions so as to achieve better handling if more options are required. In the beginning of the method we define the namespaces for the XML plus we create the first part of the XML which is more or less standard code.

protected void SaveToOpenXmlDocument()
	//Define schemas for the XML
	XNamespace x = "urn:schemas-microsoft-com:office:excel";
	XNamespace ss = "urn:schemas-microsoft-com:office:spreadsheet";
	XNamespace x2 = "";
	XNamespace c = "urn:schemas-microsoft-com:office:component:spreadsheet";
	XNamespace xsi = "";
	XNamespace o = "urn:schemas-microsoft-com:office:office";
	XNamespace html = "";
	//Beginning of XDocument
	XDocument doc = new XDocument( new XDeclaration("1.0", "UTF-8", "yes"),
	new XProcessingInstruction("mso-application", "progid=\"Excel.Sheet\""),
	new XElement(ss + "Workbook",
	//Define the Workbook node with all the required attributes
	new XAttribute("xmlns", ss.NamespaceName),
	new XAttribute(XNamespace.Xmlns + "xsi", xsi.NamespaceName),
	new XAttribute(XNamespace.Xmlns + "x", x.NamespaceName),
	new XAttribute(XNamespace.Xmlns + "x2", x2.NamespaceName),
	new XAttribute(XNamespace.Xmlns + "ss", ss.NamespaceName),
	new XAttribute(XNamespace.Xmlns + "o", o.NamespaceName),
	new XAttribute(XNamespace.Xmlns + "html", html.NamespaceName),
	new XAttribute(XNamespace.Xmlns + "c", c.NamespaceName)));

Below two optional nodes are added that can be totally omitted. DocumentSettings can have as child nodes information like title, subject, author,company etc and ExcelWorkBook contains properties for the created workbook as child nodes. Another parent node, Styles, can be added here that can handle styles that are applied inside the document.

XElement DocumentProperties = new XElement(o + "DocumentProperties",  new XAttribute("xmlns", o.NamespaceName));
doc.Elements(ss + "Workbook").Last().Add(DocumentProperties);

XElement ExcelWorkBook = new XElement(x + "ExcelWorkbook", new XAttribute("xmlns", x.NamespaceName));doc.Elements(ss + "Workbook").Last().Add(ExcelWorkBook);


Now we can start writing the main code that includes worksheet and table. Here we can name our worksheet and can define more that one in the same level of node with a loop.

XElement WorkSheet = new XElement("Worksheet", new XAttribute(ss + "Name", "Sheet 1"), new XElement("Table"));
doc.Element(ss + "Workbook").Element(x + "ExcelWorkbook").AddAfterSelf(WorkSheet);

Finally a table can have rows and rows can have cells.By adding these final details the document is almost ready.

XElement Row = new XElement("Row",
 new XElement("Cell", new XElement("Data", new XAttribute(ss + "Type", "String"), "Sample value in A1")),
 new XElement("Cell", new XElement("Data", new XAttribute(ss + "Type", "String"), "Sample value in B1")),
 new XElement("Cell", new XElement("Data", new XAttribute(ss + "Type", "String"), "Sample value in C1")) );

While the document is now ready a loop check is needed to eliminate any xlmns="" elements inside the document. This can be done easily like so.

foreach (XElement xel in doc.Root.DescendantsAndSelf())
 if (xel.Name.Namespace == string.Empty)
 xel.Name = ss + xel.Name.LocalName;

Finally we have the option either to save the file or download it. Here we will show how it can be downloaded. Mind that since we are using Response the first line containing the XML declaration is omitted and thus the document is unreadable. However with the following trick this can be fixed. 

Response.AddHeader("Content-disposition", "attachment; filename=MyExcelDoc.xml")
Response.ContentType = "application/";
Response.Write(string.Concat(doc.Declaration.ToString(), "\r\n", doc.ToString()))

To sum up this is a very neat way to create an excel document and while this example is very simple the xml schemas provide information to achieve detailed and rich excel documents. This can also be a guide to create other types of office documents as it presents a standard method and requires little deviation in the way of main information is presented.