EXI for .Net

EXI is a binary representation for xml data, and is a standard defined by the W3C. The name stands for “Efficient XML Interchange”, and apparently is the (semi?)new hot thing in making xml more compact and efficient.

Common approaches to compressing xml typically take the route of creating some non-standard method of compacting the data to remove or reduce the use of all those huge tags. For the uninitiated (not you!), here is a quick example:

<SomeReallyBigTag>
<NestedTagInside>
1.234
</NestedTagInside>
</SomeReallyBigTag>

So all those opening, closing, and nested tags really just wrap a single small value, 1.234. Repeat this thousands or millions of times, and it’s easy to see how xml files end up way larger than they need to be, often by an order of magnitude.
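To put a rough number on that overhead, here is a small Python sketch that measures the example element (written without whitespace) and divides the markup bytes by the payload bytes. The helper name `markup_overhead` is my own, for illustration only.

```python
# Rough illustration of markup overhead for the example element above.
value = "1.234"
element = f"<SomeReallyBigTag><NestedTagInside>{value}</NestedTagInside></SomeReallyBigTag>"

def markup_overhead(element: str, value: str) -> float:
    """Return how many bytes of markup surround each byte of actual data."""
    total = len(element.encode("utf-8"))
    payload = len(value.encode("utf-8"))
    return (total - payload) / payload

print(len(element))                      # total bytes for one element
print(markup_overhead(element, value))   # markup bytes per data byte
```

For this single element that works out to 77 bytes in total, or roughly 14 bytes of markup for every byte of data, which is where the "order of magnitude" comes from.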

Many web-based systems nowadays just gzip their xml data. This works, but it costs a relatively large amount of CPU time. Other approaches try to shrink or reduce the use of all the tags in xml, but end up with some new proprietary format… and getting rid of proprietary formats was the reason we went to xml in the first place!
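For comparison, the gzip route looks roughly like this in Python, using the standard library `gzip` module. The repeated record and the count of 10,000 are arbitrary choices; markup this repetitive compresses extremely well, so real-world ratios will usually be lower. This only illustrates the gzip baseline, not EXI.

```python
import gzip

# Build a repetitive xml document: many copies of the same small element.
# The record and the 10,000 repeat count are arbitrary, for illustration only.
record = "<SomeReallyBigTag><NestedTagInside>1.234</NestedTagInside></SomeReallyBigTag>"
xml = "<Root>" + record * 10_000 + "</Root>"
raw = xml.encode("utf-8")

# Compress with gzip (DEFLATE) at the module's default level.
compressed = gzip.compress(raw)

# Round-trip to confirm the compression is lossless.
assert gzip.decompress(compressed) == raw

print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, "
      f"ratio: {len(raw) / len(compressed):.1f}x")
```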

In 2011, the W3C released the final standard for EXI, so there is now an open, highly optimized binary representation of XML data (see the spec here: http://www.w3.org/TR/exi/#encodingDatatypes ). Since it is a published, open standard, it doesn’t suffer from being proprietary. It is also fully compatible with the xml standard, so any xml file can be converted to EXI and back. But best of all, it can often meet or surpass the compression gzip has offered in the past, without requiring a CPU-intensive compression routine.

My own reason for being excited about this standard is that I’d previously looked into Protocol Buffers, a relatively open binary serialization standard, as a candidate for efficiently storing application data files. These can (and do) work, but it would be much nicer if the files also mapped directly to an xml equivalent, with all the associated support for schemas, nested structures, validation, and interoperability.

So, my questions:

- .Net: where is the implementation? I’m looking forward to trying this on some xml-based projects, but so far I have only seen one commercial EXI implementation for .Net. There is a Java one out there.

- HTML/SGML: can it be applied to these as well?

- Support and best practices for web services?

- Can EXI be used over AJAX calls?

Update: a skeletal project for a .Net implementation has been added here: http://exi.codeplex.com/

Update 2: a .Net and Java EXI project is on SourceForge here: https://sourceforge.net/projects/openexi/


Include ETag inside html – idea

Browser caching is a deep topic, but I had an idea today. 

One tradeoff in browser-side caching is whether to control the browser cache with an expiration header or an etag header.

Expiration headers tell the browser “keep this file until this date/time, and then you can retrieve it again”.
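As a minimal sketch of how a cache might apply an Expires header, assuming an HTTP-date value in GMT (modern responses often use `Cache-Control: max-age` instead, which takes precedence when both are present). The function name `is_fresh` is hypothetical; the date parsing uses the standard library `email.utils` helpers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def is_fresh(expires_header: str, now: datetime) -> bool:
    """Return True while a cached copy may be reused without asking the server.

    Assumes `expires_header` is an HTTP-date in GMT and `now` is timezone-aware.
    """
    expires = parsedate_to_datetime(expires_header)
    return now < expires

# Example: a response that says "keep this until the start of 2030".
expires_header = format_datetime(datetime(2030, 1, 1, tzinfo=timezone.utc), usegmt=True)

print(expires_header)
print(is_fresh(expires_header, datetime(2029, 6, 1, tzinfo=timezone.utc)))
print(is_fresh(expires_header, datetime(2030, 6, 1, tzinfo=timezone.utc)))
```

Until that date passes, the browser never contacts the server for the file, which is exactly why a change made on the server can go unnoticed.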

Etags send a custom hash or number to the browser and basically say “check with the server if the file is still this version, send the new one back if not”. 

The etag header lets changed files show up in the browser almost immediately, since the browser still sends a request every time to check whether the file has changed.
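Here is a minimal Python sketch of the server side of that check (a conditional GET with `If-None-Match`). It assumes, purely for illustration, that the server derives the etag from a SHA-256 hash of the file contents; servers are free to use any opaque value, such as a version number. The `Response` class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""

def make_etag(body: bytes) -> str:
    # Hypothetical scheme: first 16 hex digits of a SHA-256 of the content, quoted.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so ignore any W/ prefix.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag

def handle_get(body: bytes, if_none_match: Optional[str]) -> Response:
    """Answer a GET, returning 304 Not Modified when the client's etag still matches."""
    etag = make_etag(body)
    if if_none_match is not None:
        # Simplified parsing: assumes the etags themselves contain no commas.
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(etag) in candidates:
            # The browser's cached copy is current, so no body is sent.
            return Response(304, {"ETag": etag})
    return Response(200, {"ETag": etag}, body)
```

Even the 304 case costs a full request/response round trip per file, which is the overhead this post is trying to get rid of.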

Expiration headers don’t have this fast turnaround- if you set the date way in the future, you won’t have a way to tell the browser “hey, this file changed, please re-download it!”. 

So my idea is this: why not combine the benefits of the two, and include the etag value for linked files within the tags in the html? 

For instance, when I retrieve index.html and it includes a logo.png file, the tag could look like this:

<img src="/logo.png" etag="5E451FFA498" />

The browser could then compare this etag against the version already in its local cache, and would not need to send a request to the server for each file just to re-check its etag.
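To make the idea concrete, here is a rough Python sketch of what the browser-side check might look like. Note that `etag` is not a standard HTML attribute; this assumes a hypothetical browser that understands it, and models the local cache as a plain dictionary from URL to the etag of the stored copy. Only `<img>` tags are handled, and the names `EtagCollector` and `resources_to_fetch` are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

class EtagCollector(HTMLParser):
    """Collects (src, etag) pairs from <img> tags using the proposed etag attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: List[Tuple[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            if a.get("src"):
                self.resources.append((a["src"], a.get("etag")))

def resources_to_fetch(html: str, cache: Dict[str, str]) -> List[str]:
    """Return the URLs that still need a request; the rest come straight from cache."""
    parser = EtagCollector()
    parser.feed(html)
    parser.close()
    to_fetch = []
    for src, etag in parser.resources:
        if etag is None:
            # No hint in the page: fall back to normal HTTP caching (modeled as a request).
            to_fetch.append(src)
        elif cache.get(src) != etag:
            # Not cached, or cached under an older etag: download the new version.
            to_fetch.append(src)
    return to_fetch
```

One consequence of this design is that the html page itself would need a short cache lifetime (or its own etag check), since it becomes the one place holding version information for every file it links to.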