Using PowerPivot and PowerView: 311 Requests

The City of Toronto has a 311 service for anyone requiring information or service.  As part of their open data initiative, the City of Toronto has published an Excel file containing a chunk of their service requests.  The file has 466,000 records, one for each call made to the 311 service between June 2012 and July 2013.

The following shows how PowerPivot and PowerView can be used to visualize the data.

Loading the Data into PowerPivot

One of the easiest ways to move data into PowerPivot is to copy and paste from Excel.

image

Copy the table in Excel, go to PowerPivot, and hit Paste; a dialog comes up prompting you to create a new PowerPivot table with your pasted data.

image

 

image

Provide a table name and your PowerPivot table is now automatically created.

Adding Something to Count

We want to be able to count the number of requests, but we’re missing a field that represents the 1 request for each row.  I just added a calculated field called “Number of Requests” and set it equal to 1 for each row.  Now when we aggregate based on the other columns, we’ll use this as our core value to sum in our roll-ups.

image

Cleaning Up the Postal Code Data

In this dataset, most of the service locations are identified by the first three characters of the postal code, but in some cases they contain a full street address.  To simplify, I created a calculated column that wiped out the street addresses and set them to BLANK().

The formula I used to create the calculated column was:

=If(len([SERVICE REQUEST LOCATION])<4, [SERVICE REQUEST LOCATION], Blank())
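For readers more comfortable outside DAX, the same keep-or-blank rule can be sketched in Python (the sample values here are invented, not from the dataset):

```python
# Keep values that look like a 3-character postal prefix (length < 4,
# matching the DAX LEN() test); blank out anything longer, such as a
# street address. None stands in for BLANK().
def clean_location(value):
    return value if len(value) < 4 else None

locations = ["M5V", "M4C", "100 Queen St W"]
print([clean_location(v) for v in locations])  # ['M5V', 'M4C', None]
```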

image

Cleaning Up the Categories

There are 384 unique Service Request Types in the dataset, which is far too many to be useful for visualization or reporting.  I created a table in Excel that mapped each of these categories to a broader roll-up category.  I then copied and pasted this map back into PowerPivot as a separate Service Request Type Group table.

image

Instead of seeing every micro service request type, we can report using the broader rollup group.

Creating a Hierarchy

I want to be able to drill down from Postal Code to Service Request Type Group.  First, I created a relationship between my two tables.  This relates the Service Request Type to the parent Service Request Type Roll-up Group.

image

The Service Request Type Roll-up Group field and the Postal Code field are in two different tables, and PowerPivot doesn’t support creating hierarchies across multiple tables.  The solution is to create a lookup field in the 311 Service Requests table as a calculated column using the Related() function.  PowerPivot will even help you with the syntax by finding all possible fields and tables for you!
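Conceptually, Related() is just a key lookup into the other table. A rough Python analogue, with made-up type and group names (not the real 384 categories):

```python
# Map each service request type to its roll-up group, the way the
# relationship plus the Related() calculated column does.
type_to_group = {
    "Graffiti on a Fence": "Graffiti",
    "Pothole on Road": "Road Maintenance",
}

requests = [
    {"type": "Graffiti on a Fence", "postal": "M5V"},
    {"type": "Pothole on Road", "postal": "M4C"},
]

# Add the looked-up group as a new "column" on each row.
for row in requests:
    row["group"] = type_to_group.get(row["type"])

print([r["group"] for r in requests])  # ['Graffiti', 'Road Maintenance']
```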

image

Now we have our field and we can create a hierarchy within the same table.  You create hierarchies in the Diagram View.

image

image

Visualizing the Data through PowerView

Now that our data is prepared, we can visualize it in PowerView.  As expected, PowerView provides us with the list of fields from our PowerPivot model.

image

We can use the hierarchy we created earlier to enable drill down from Postal Code to type of request either through a Matrix or Bar Chart.  I like the Tiles option as a way to scroll through each postal code.

image

Visualizing the Data through PowerMap

The postal codes can also be mapped using Microsoft’s new Power Map (you will need the Power Map Addin as part of the Power BI Preview).  Here is my data mapped by postal code as a bar chart with the stacks divided by request type.

image

Read More

Converting Text to Dates in PowerPivot

One of the most valuable dimensions to report on in any business intelligence context is date/time.  In this example, I have a data feed I found of the City of Toronto’s parking ticket activity, and I would like to convert some of the raw data to a real date/time column.

I have loaded the entire data feed into a SQL Server database and imported it into PowerPivot (See my previous blog post on how to import text files).  There are about 14 million rows in my model.

My PowerPivot model looks like this:

image

The raw data has a “date_of_infraction” field which is a text field containing the date of infraction and a “time_of_infraction” field which is a text field containing the time of the infraction.  What we really want is a combined date/time field with a properly structured date time.  If we have this column, it will be extremely valuable in the future for grouping, sorting, filtering, etc.

PowerPivot provides the ability to create a calculated column, so let’s do some basic text parsing to create the appropriate date and date/time columns.

Calculating the Infraction Date

We can create an Infraction Date column based on our original “date_of_infraction” field.  Right click and select “Insert Column” to create a calculated column.

image

You can rename the column by clicking on the header of the column and replacing the “CalculatedColumn1” with a more user friendly name.  I used “Infraction Date”.

Next, we need to change the field type to a date/time field.  This is done by clicking on the column and picking from the Data Type in the ribbon.  Strangely, Date isn’t available by default – you have to change it to Text first and then to Date.

image

You can also now change the format of the date to a simple date format, since in this column we won’t have any time values.

image

Now we will use a DAX formula to calculate the date from the original “date_of_infraction” field.  DAX is the language for creating calculated values in Power Pivot and is very similar to Excel formula syntax.

The formula to extract this data looks like this:

=Date(mid([date_of_infraction], 1, 4), mid([date_of_infraction], 5, 2), mid([date_of_infraction], 7, 2))

The Date() function takes 3 parameters – a year, a month, and a day.  The Mid() function extracts a sub-string from a field.  In this case, the year comes from the first 4 characters, the month from the 2 characters starting at the 5th position, and the day from the 2 characters starting at the 7th position.

Once the formula is provided, PowerPivot calculates every row in the column and caches the result. 

For my model containing about 14 million rows, this took only about 10-15 seconds!
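If you want to sanity-check the parsing logic outside PowerPivot, the same slicing is easy to express in Python (the sample value is invented):

```python
from datetime import date

# Mirror the DAX Date(Mid(...), Mid(...), Mid(...)) formula: a yyyymmdd
# text value sliced into year, month and day.
def parse_infraction_date(text):
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))

print(parse_infraction_date("20120615"))  # 2012-06-15
```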

Calculating the Infraction Date/Time

Let’s now create a fully qualified date/time based on a combination of our newly created Infraction Date column and parsing the “time_of_infraction”.  Again, we’ll create a new column and set it to Date but with a long format containing both date and time.

image

Now for the formula.  Let’s first calculate a time value based on our text column.  Here is my first attempt:

=Time(mid([time_of_infraction], 1, 2), mid([time_of_infraction], 3, 2), 0)

However, this generates an error for two reasons: 1) we have some legitimate times that are 3 characters not 4 (e.g. 437 = 4:37am) and 2) we have some bad data where the time is only 1 character.

image

So our revised formula will check the length of the field to properly convert the 3 and 4 character times and to create a time value of 0 for any value less than three characters.

The formula looks like this:

=IF(LEN([time_of_infraction]) < 3, TIME(0, 0, 0),
    IF(LEN([time_of_infraction]) = 4,
        TIME(MID([time_of_infraction], 1, 2), MID([time_of_infraction], 3, 2), 0),
        TIME(MID([time_of_infraction], 1, 1), MID([time_of_infraction], 2, 2), 0)))

We have used essentially the same parsing ideas but with a bit of checking on the length of the field value.
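The same branching, sketched in Python for anyone who wants to check the edge cases (sample values invented):

```python
from datetime import time

# Same branches as the DAX formula: fewer than 3 characters -> midnight
# (bad data), 4 characters -> HHMM, 3 characters -> HMM.
def parse_infraction_time(text):
    if len(text) < 3:
        return time(0, 0)
    if len(text) == 4:
        return time(int(text[0:2]), int(text[2:4]))
    return time(int(text[0:1]), int(text[1:3]))

print(parse_infraction_time("437"))   # 04:37:00
print(parse_infraction_time("1145"))  # 11:45:00
print(parse_infraction_time("7"))     # 00:00:00
```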

So based on this formula, we now have some calculated time values:

image

However, the date isn’t right – it’s set by default to 12/30/1899.  We can add our calculated Infraction Date to the time value in the Date Time column; adding the date and time together creates a properly formed date/time:

=[Infraction Date] +
    IF(LEN([time_of_infraction]) < 3, TIME(0, 0, 0),
        IF(LEN([time_of_infraction]) = 4,
            TIME(MID([time_of_infraction], 1, 2), MID([time_of_infraction], 3, 2), 0),
            TIME(MID([time_of_infraction], 1, 1), MID([time_of_infraction], 2, 2), 0)))

When we do this, PowerPivot returns the appropriate date/time! 

image

Our date/time column is now available to be used in any Excel PowerView reports or Pivot Tables we want to design in the future! 

Read More

Comparing Different Methods for Importing Text Files into PowerPivot

One of the common challenges in managing large files is importing text files into Excel.  I found some CSV files on the City of Toronto’s web site that contain parking ticket data for the past several years.

Each file is a CSV with the same column layout, and each contains millions of rows.  In total, there are 5 files (2008, 2009, 2010, 2011, and 2012) holding approximately 14 million records.

The goal is to have this data available to PowerPivot so we can do some analysis on it.  The following are the options I tried for importing the data and the results of my tests.

As you can see in the different options, there are some key weaknesses in the Power Query and PowerPivot tools that make SQL Server still a much better choice for importing large volumes of data.  The error reporting in SSIS is also generally much better and easier to diagnose.

Hopefully, this will be improved in the final version of Power Query when it is released in 2014.

The times reported below for “Time to Run” are the times it took to execute each task using my Dell M4700 laptop.  I’m running the latest Office 2013 components with the latest PowerPivot and Power Query addins.

Option #1: Import into Excel Directly – Total FAIL!

Time to Run: ~15 seconds to import one file out of five

The most obvious option is to simply import the file into Excel.  Unfortunately, this fails on even one file, because each file has more than the ~1m rows allowed in a worksheet.

When you just open the file directly, Excel truncates the rows at line 1048576.  If you run the import through Power Query, you will get an error if you try to import the file into the worksheet (see the Load Settings to specify Load to Worksheet or Load to Data Model).

image

Option #2: Run Power Query Import – Partial Fail!

Time to Run: ~40 seconds to import one file out of five

When I ran the Power Query import, it failed on four out of five files with the following error:

image

Let’s keep in mind that this is a Power User designed tool – this is not good!  There is no indication why this failed – the error doesn’t even tell me the row number or column number that it couldn’t process.

When the import worked, the run was reasonably fast – about 40 seconds to import 2.8m records.

Option #3: Run PowerPivot Import – Partial Fail!

Time to Run: ~40 seconds to import one file out of five

Using the PowerPivot import function, importing the same 2011 file that passed in the Power Query import took about the same time.

Loading the 2008 file failed immediately.

Loading the 2009 file took HOURS to import!  In addition, when I checked the number of rows against the same import in SQL Server, there were 2,582,383 rows in PowerPivot and 2,774,553 rows in SQL Server!  What happened to my missing 192,170 rows?!

I’m really not sure why these files are so completely different – they both have the same column layout, the same basic file format and they are about the same file size.   I’m also concerned about the lack of error reporting here – how would I know that this text file import missed so many records if I hadn’t compared against multiple import methods?
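One way to catch this kind of silent row loss is to count the source rows yourself before trusting any import tool. A minimal sketch (the file name in the comment is hypothetical):

```python
import csv

def count_csv_rows(path, has_header=True):
    """Count data rows in a CSV so the total can be compared against
    whatever row count the import tool reports."""
    with open(path, newline="") as f:
        total = sum(1 for _ in csv.reader(f))
    return total - 1 if has_header else total

# e.g. compare count_csv_rows("parking_tickets_2009.csv") against the
# row count shown in PowerPivot or SQL Server.
```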

Option #4: Bulk Load into Access 2013 – Partial Fail!

Time to Run: ~4 minutes to import 3 out of 5 files

I tried importing the data into a new Access 2013 database. 

image

The import worked for 2008, 2009 and 2010 but when I tried 2011 and 2012, I received this error message:

image

My Access database has about 8.4m rows in it for the years 2008, 2009, and 2010.

When I then pointed Power Query at the Access database, it took 30-40 seconds just to create the data preview window, and loading the 8.4m rows took about 6 minutes.

Option #5: Bulk Load into SQL Server – PASS!

Time to Run: ~3.5 minutes to load all files into SQL Server 2012, ~1.5 minutes to load into PowerPivot using the default import from database

I created an SSIS package to load the files into SQL Server.

image

The package imported without any errors.  The SSIS package has the best debugging of all the options as well in that it will tell you the exact line and column where there is an error.  The other advantage with SSIS is you can map your data explicitly to strongly typed columns in the SQL table.

Importing using PowerPivot’s import from database option worked well and loaded the records quickly. 

Read More

Power Query Now Provides Support for Azure Table Storage

Storing data in SQL Server either locally or on Azure is expensive, especially for large datasets.  Microsoft provides a few noSQL alternatives – Table Storage, Blob Storage and HDFS based (e.g. Hadoop) storage.

The economics of Table Storage are incredibly attractive compared to SQL storage, especially for large data sets.  For example, say you had a 10 gigabyte database (not that large): hosting it on Azure SQL would cost $48.46 / month, while hosting the same data in Azure Table Storage would cost $0.75 / month!  While not appropriate for complex relational databases, it is ideal for large volumes of records that simply need to be stored.
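The comparison works out as simple arithmetic (using the 2013 prices quoted above; current rates will differ):

```python
# Monthly cost of a 10 GB dataset at the prices quoted in the post.
sql_monthly = 48.46     # Azure SQL, 10 GB / month
table_monthly = 0.75    # Azure Table Storage, 10 GB / month

print(round(sql_monthly / table_monthly, 1))  # 64.6 - roughly 65x cheaper
```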

In the original Power Pivot addin for Excel, you could access OData based data sources.  Technically, Azure Table Storage is OData, but it requires a shared key authentication mechanism that the Power Pivot addin didn’t support.  Workarounds have been published, but they required proxy solutions to supply the appropriate key.

Power Query now solves this problem by supporting Windows Azure Table Storage and Blob Storage natively!

image

Read More

Content Search Web Part is now available in Office 365

When Office 365 was first released, one of the key missing items was the Content Search Web Part – it was only available for on-premise implementations.  Building on the Content Query Web Part from SharePoint 2010 (which is still available in SharePoint 2013), it provides key features for aggregating content across sites and site collections, including the ability to roll up lists of documents based on pre-designed queries.

As of October 2013, the Content Search Web Part is now available in the following subscription plans: Office 365 Enterprise E3 and E4, Office 365 Education A3 and A4, Office 365 Government G3 and G4, and Office 365 Enterprise E3 for Nonprofits.

Why is the Content Search Web Part so powerful and useful?  A couple key reasons:

  • Aggregation of content across multiple site collections: the CSWP can provide lists of items that span multiple site collections.
  • Easy filtering of content: the CSWP can filter results to just documents, list items, particular content types, matching metadata, etc. to provide a very specific list of items.
  • Use of search display templates: the CSWP uses search display templates to display items, which is a huge improvement over the XSLT hand-coding required in SharePoint 2010.

An Example: News and Announcements

I have two site collections: SearchDemo1 and SearchDemo2.  In each site collection, I created a News and Announcements list. 

image

image

Imagine that you have multiple departments, each with their own site collection.  However, you also have a main corporate intranet where you want to display all the news aggregated from various departments in your organization.  The CSWP is how to easily accomplish this task.

Adding the Content Search Web Part

Adding the Content Search Web Part to any page is easy – just edit the page and add the web part!

image

By default, the CSWP displays a list of recent search items.  We can now configure the web part to display just news items and reformat the presentation of items to be more like a list than an image gallery.

image

Configuring the Content Search Web Part

To configure the CSWP, you can edit the page and edit the web part properties either through the ribbon at the top of the page or in the context menu of the web part.

image

image

image

Let’s configure the part to only display news items, to show 2 results and to change the display to be more like a news list and remove the picture on the left.

Changing the Query

Clicking on “Change query” allows you to specify the items returned in your list.  In this case, we want all news items across all the site collections.

image

  • By selecting “Items matching a content type (System)” under “Select a query” we can specify that we only want to display items of type “Announcements”. 
  • By selecting “Don’t restrict results by app” under “Restrict by app” we will return results across all site collections.

In the search results preview, you can see that only news and announcement items are now returned and the results are from both our site collections.

Changing the Items to Show

Changing the “Number of items to show” will increase or decrease the number of items on the page.  Let’s change this number to 2.

image

Changing the Display

Our default display template isn’t that helpful because news and announcements don’t have images to display.  Let’s change the display template to be more news oriented.  Under item, let’s change this to be “Two lines”.

Note: display templates are easily customizable and you can create new templates using some basic HTML skills. 

image

We can also change the title of the web part under Appearance.  Let’s change it to  “News and Announcements”.

image

The Final Result

Here is the final result of our configuration.  We now have a list of news items aggregated across all site collections!

image

This is a simple example, but with these configuration options and some custom display templates, you can surface results from all your site collections on home pages, corporate communications intranets, etc. with little to no custom development – just straightforward web part configuration.

Read More

Cloud Pricing is a Race to the Bottom – How You Can Benefit…

Traditionally, hosting and managed services have been sold on fixed price contracts, with steep discounts for organizations that committed for multiple years.  This is how most managed services providers have made their profits – on the back end, once they have optimized the service model (e.g. you aren’t calling them every day with bugs, change requests and outage issues).  In addition, if you are running in a virtual environment, the hosting provider can upgrade to cheaper, more powerful hardware while you are still effectively paying for the expensive hardware you originally bought at the point of purchase.

In a cloud based model, that pricing model is constantly changing.  You pay only for what you use and for every new CPU or GB that you need, you’ll pay the latest prices unless you have committed to a multi-year pricing agreement. 

Right now, the direction for cloud pricing in the industry is clearly in one direction – DOWN. 

Here are some recent examples:

Have a look at the difference between Azure pricing at the beginning of 2012 and today:

image

Why is this happening?  There are a couple key reasons:

  • Massive competition between the two dominant cloud rivals – Amazon and Microsoft – and up and coming cloud players (Google, Rackspace, Oracle, IBM, etc.)
  • Optimization of the hardware – e.g. the cost of storage, CPUs, RAM, etc. continues to decrease (as it has for 30 years now)
  • Optimization of the data centers – lower cost of power, increased density, increased scalability, etc.

Will this continue?  Based on industry trends over the past twenty years, I think the easy answer is yes.  The cloud industry is still in its infancy; as it expands, as technology becomes cheaper, and as data centers grow larger, prices will drop significantly.

How can you benefit from a market where prices are continually dropping?

  • Only buy what you need today: buying extra storage space, CPUs, etc., especially in a dynamically scaled environment, will cost more today than buying the same capacity 6-12 months from now. 
  • Be careful about multi-year commitments: those 5-10% reductions in price for signing multi-year commitments might sound attractive, but if the market is driving prices down by 20-30% then you’ll end up losing. 
  • Avoid vendor lock-in: there will be multiple vendors competing for your business in the years to come.  If Amazon or Microsoft or Google offered you a 20-30% discount to switch, could your team easily migrate all your VMs, databases, etc. to another option? 
  • Separate services and hosting: service costs won’t be as elastic as hosting costs.  Bundling your managed services and hosting may feel attractive up front, but it can lock in your hosting costs (the part whose price drops fastest) at a higher rate.  Even if you have a single managed services provider covering both, ensure that you can de-bundle the two if hosting prices drop significantly in the coming years.
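The multi-year commitment point is easy to see with a toy calculation: lock in a 10% discount while the market price falls 25% a year (both rates are invented for illustration, not vendor quotes):

```python
list_price = 100.0
committed = [list_price * 0.90] * 3                          # fixed discounted price, 3 years
market = [list_price * (0.75 ** year) for year in range(3)]  # falling market price

print(market)                                       # [100.0, 75.0, 56.25]
print(round(sum(committed), 2), round(sum(market), 2))  # 270.0 231.25: the commitment loses
```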

The strategy is a simple one – use cloud services as dynamically as possible, and don’t let fixed price contracts, commitments, and multi-year pricing offers deter you from the long term attraction of a cloud based model.

Read More

Getting Started with Power BI

Microsoft Power BI for Office 365 is the upcoming release of Microsoft’s new BI toolset.  Built on the existing Excel, SharePoint, Power View and Power Pivot tools, it provides new features that enable more interesting data visualization, improve governance of models, and integrate a new data marketplace that can be used to combine your internal data with external data sources.

Power BI for Office 365

You can preview Power BI today – simply go to the Microsoft Power BI web site and register for a preview account.  NOTE: this site is still in Beta…in the near future these features will simply be available in your existing Office 365 account.

You will also need Excel 2013, the Power Query Addin and the Power Map Addin.

Creating Your First Power BI Enabled Team Site

The SharePoint preview environment that is created has a single Team Site and a Public Site.  You can enable the existing Team Site or create a new site specifically for Power BI.  Let’s create a brand new site and enable the Power BI functionality.

image

Click on “new site” and give your site a name.  I called mine “PowerBIDemo”.

Once the site is created, we can add in the Power BI features by going to Site Contents and clicking on “Power BI”.

image

You can add some sample data provided by Microsoft or you can use your own data.  Let’s add the samples; while they load, you are shown a video introducing the exciting world of Power BI!

Once the samples are loaded, you will have two key things added to your site: 1) a document library containing Excel files with Power Pivot models and 2) the Power BI App which is used for visualization of your data. 

There is also a “My Power BI” feature now available that provides you a view of your “favorite reports”.  However, this feature appears to be a mock-up at the moment.

image

Trying Out Power BI

Click on the “Power BI” navigation on your enabled team site to try out Power BI for the first time!  The Power BI app provides featured reports, questions and documents that you can click on to access the visualization of the data provided in the samples.  Clicking on any of them will bring you to the Excel web client that displays the interactive reports provided by Power View.  Enjoy!

image

Read More

Office 365 Changes Maximum File Upload Limit from 250 Megabytes to 2 Gigabytes

Microsoft has been increasing its storage and file limits for most of its cloud based offerings.  This is a value-added service that users receive automatically as part of their Office 365 subscription.  The changes were announced in late summer this year, but we have customers who weren’t aware of the increase.

Key changes include:

  • Increased file upload limit from 250 megabytes to 2 gigabytes
  • Increased Site Collection limits from 3,000 to 10,000
  • Increased Recycle Bin duration from 30 days to 90 days
  • Increased SkyDrive storage from 7 gigabytes to 25 gigabytes

The file upload limit was a barrier for some of our clients in particular as they are dealing with large files.  For example, we have oil and gas and engineering clients who are distributing engineering drawings that are hundreds of megabytes and the 250 megabyte limit was a barrier that has now been lifted.

Read More

Microsoft is now Cloud First

Microsoft has moved its focus to the cloud and is actively promoting its cloud based services over its on premise software. 

Microsoft’s “Cloud First” approach means that if you are running Dynamics CRM online, SharePoint 2013 as part of Office 365 or Azure services, you will be the first to get updated software in order to promote deployment to the cloud.

In our business, we see many customers adopting Office 365 and Dynamics CRM through cloud services as part of a broader IT strategy to reduce complexity, require less specialized in-house expertise, and put technology more easily in the hands of the business.  This is especially true for our customers with distributed businesses, as the cloud offers a turnkey solution for those with many offices.

A good example is Microsoft’s Power BI offering, which is currently in Beta but expected to be released sometime next year.  The beta version is ONLY available through Office 365, and the release will be made available first to Office 365 customers.  It isn’t clear yet whether Power BI will ever be fully available for on premise customers.

So what does that mean for customers who have a specific need for an on premise solution?  This isn’t clear yet but it likely means at a minimum that organizations with on premise installations will wait longer for releases and that some features may not be available at all as Microsoft promotes cloud based models.

Read More

Creating a Web Site on Azure

In supporting the Microsoft cloud vision, I decided to create this blog using Microsoft’s Azure platform.  It was surprisingly easy.  Here are the key steps I went through.

Create an Azure Account

Creating an Azure account is easy – simply register here.  You can register with a Microsoft account (e.g. your outlook.com or Hotmail.com account) or you can create an account based on your organizational account.  For this web site I just used my personal Hotmail account.

Create a Subscription

You will need a subscription with a credit card attached to it.  Your credit card is charged as you use Azure services and you can monitor your charges through the Azure portal. The most basic subscription is a “Pay as you Go” subscription that charges you based on your usage every month.

There are some ways to get free Azure subscriptions.  Microsoft provides you a trial subscription to get you started that you can activate.  If you are an MSDN subscriber, you can also register for a limited subscription (it’s about $160 in usage benefits per month).  Your organization may also have subscription benefits as part of your Enterprise Agreement so check with your corporate IT department as many organizations have the benefit but aren’t fully utilizing it.

Accessing the Azure Portal

Microsoft provides a portal for accessing all your Azure services in one spot – it’s quite user friendly (better than Amazon’s in my opinion) and makes it easy to spin up services.

Creating the Web Site

In the Azure Portal, click on Web Sites and then the New button in the bottom left-hand corner.

image

The menu provides three options for creating a new web site:

  • Quick Create: creates a new instance with nothing in it. 
  • Custom Create: creates a new instance with a basic file system and a database (either SQL Server or MySQL)
  • From Gallery: creates a web site based on a preconfigured template

Choosing a Web Site Template

Let’s choose one of the existing templates.  This blog was created using the WordPress template, which creates a fully functional WordPress site complete with a MySQL database.

There are lots of options here – mainly open source CMS platforms, including PHP/MySQL based platforms such as Drupal and WordPress as well as .NET based platforms such as Orchard and Umbraco.  If you are looking for a commercial option, Kentico CMS is an enterprise grade CMS that you can spin up on Azure and then add a commercial license key once the instance is running.

image

In my case, I chose WordPress because it’s a great blogging platform and I’m familiar with it as a site administrator.

Provide a URL

The next step is to provide a URL, create or provide an existing database, choose a data center location and attach your new web site to a subscription.

image

Once you have provided the appropriate configuration information, your instance will be spun up within the Azure cloud.

Accessing Your New Web Site

Within the Azure portal, you will now have a web site created that you can access.  Clicking on the web site will bring you to your dashboard that will show you all the ways you can access your new site including:

  • SiteURL: access the main web site directly.
  • FTP: access the file system via FTP.
  • FTPS: access the file system via FTPS.

You can also access the underlying database as well through the portal.

Using WebMatrix to Access your Web Site

Microsoft also provides a desktop application called WebMatrix that provides you the ability to update all the files on your web site.  It provides a view into your web site file system, a file transfer utility and a code editor all within a single desktop tool.

Scaling Out Your Website

By default, your web site is created in “Free” mode which means that you have limited quotas and you cannot attach a custom domain name to it.  In order to move your web site to production quality, you can switch it to either “Shared” or “Standard”.

In Azure, “Shared” means that you are running on a small slice of a VM alongside other shared web sites.  “Standard” means that you have your own dedicated core for managing all your web sites.  Shared is charged per web site; Standard is charged per core.

To change your web site from Free to either Shared or Standard, go to your web site’s dashboard, click on Scale, and switch to either option.

Registering a Custom Domain Name

Registering a custom domain name requires you to create a CNAME record pointing to Azure, authorizing Microsoft to serve as your domain name’s target.  This is configured through your domain name provider.  In my case, I registered the domain with GoDaddy, so within their portal I created a CNAME record pointing back to Azure.

Once this is registered, you can go into the Azure Portal and add the domain name to your created web site.

image

It’s Really That Easy…

Creating a new web site is really just that easy.   For the cost of a few dollars a month (more on the economics of Azure later) you can have a dynamically scaled web site that is fully managed by Microsoft.   The power of the cloud means you can also shut down your web site, destroy it, scale it or add additional web sites with a few clicks and no contracts!

Read More