Managing Data with Windows Azure Table Storage and PowerPivot

Windows Azure Table Storage is a no/low-sql, low cost storage platform on Windows Azure.  It allows you to store entities in a table like structure.  Entities can be composed of name value fields that map to columns in the table.  Entities must have a row key that unique identifies the record and a partition key that identifies how the table can be logically segmented.  Searching in a single partition is much faster than across multiple partitions so choosing a partition that corresponds to a key searching field will improve performance.

The key advantage of Windows Table Storage compared to traditional SQL is cost – storing data in Windows Azure Table Storage is $0.074 per GB in Azure vs. $10.54 for the first GB of storage in SQL Storage on Azure.  For those quasi-big data scenarios where you have millions of rows of relatively simple data, storing them in Windows Table Storage is an attractive option.

One of the key new features of the Power BI add-in for Excel is its ability to import Windows Azure Table Storage data so you could store a bunch of data in the cloud and download it to Excel Power Pivot for analysis.

As an example, I imported about 466,000 records of public data from the City of Toronto.  The total storage is less than 1 gigabyte so to store this on Azure permanently will cost me less than 10 cents a month!  Downloading the entire table to Excel costs another 10 cents in transaction costs. 

Getting Started

The first thing you need is a Windows Azure account and to create a storage account.

image

Manually Moving Data into Azure Table Storage

There is a really nice utility on Codeplex for managing objects in Azure Table Storage called the Azure Table Explorer.  You can download it here.

Using the Azure Storage Explorer, you can create your own tables and add, update and delete entities.  You can also search them as well.

Bulk Importing Data Into Azure Table Storage

There is no really good tool available for bulk uploading data into Azure Table Storage.  From that perspective, it currently is a developer’s tool and to move data in and out of it requires development effort.

In researching, I was able to pull together a few articles and build my own importer.  More detail on this solution will be provided in a future blog post.

Viewing the Data in Excel Power Pivot

Using the new Power Query tool for Excel, you can now pull down data from Windows Azure Table Storage.  

image

One note in editing the query to download the data: by default, the content of the entity (e.g. all the fields besides the row key, partition key, and timestamp) are not downloaded.  If you edit the query in Power Query before downloading, it will look something like this:

image

In the content column, there is an expansion button on the right that if you click on it you can select the columns in the entity:

image

When you do this it will now download addition fields from your entity into PowerPivot.

image

You can now download the entire dataset from Windows Azure Table Storage.  In this case, refreshing the entire list of 466,089 rows takes me about 12 minutes to download.  I also noticed that my original 25 megabyte CSV file is now 350 megabytes to download from Windows Azure thanks to the verbose OData format being used to encode it as it is transmitted.

A faster approach to downloading the entire list is to apply a filter against the partition key – for example, I used a filter of “M1M” for the partition key.  There are 4,750 rows and just downloading these to Excel only takes about 5-10 seconds!

Once I have downloaded the list, I have a local cached copy that I can use in PowerView or PowerPivot to do analysis, graphing or dashboard development.