How Much Data Can PowerPivot Really Manage? How about 122 Million Records?!

Excel 2013 is an amazing tool in terms of scalability and its ability to handle large data sets.  Combined with Power Pivot, Power View and Power Query, it is now Microsoft’s leading tool for business intelligence.

What is amazing is how much data Excel 2013 can process in memory – there is no need for cubes, data warehouses or centralized servers if you can just load large data files straight into Excel.

When we hear about “Big Data”, how big is big?  Just using Excel 2013 and some raw CSV files, I can load quite a lot of data. 

14 Million Records

Earlier I wrote a blog post where I loaded approximately fourteen million records from the City of Toronto, representing every parking ticket issued in the past several years.  With this dataset loaded into Power Pivot, I can use Power View to chart the volume of parking tickets by day of the week, by year or by month.

[Figure: Power View charts of parking ticket volume by day of week, year and month]

Loading 14 million records into Excel 2013 takes about 3 minutes for Power Pivot to fully refresh from the original data source.  Once the data is loaded, Power View refreshes quickly enough to be highly usable.

122 Million Records!

In looking for large data sets, I found the publicly available data on every airline delay in the United States from 1998 to 2008.  This data is available from the Bureau of Transportation Statistics.  I downloaded all of the CSV files and dropped them into a single directory.  There are approximately 122 million records and over 11 gigabytes of data across the entire set of files.

This dataset has historically been considered “Big Data” and has been used both for testing general statistical analysis techniques and, more specifically, for optimizing the prediction of delays based on factors such as weather and flight origin.

As an experiment, I tried loading the entire dataset into Excel 2013.  To load multiple files into Power Query, you can choose a folder location and pick up all the files in that folder.

If you then click on the expand arrows to the right of the Content column, you can select the actual contents of the files to be imported instead of just the file metadata.

You can then choose Apply & Close and let the import begin!

[Figure: Power Query folder import combining the CSV files]

Here are some interesting performance stats while my machine was chugging away:

[Figures: Task Manager graphs showing memory, disk and CPU activity during the import]

As illustrated by the graphs, performance depends on two key things: 1) available RAM and 2) available disk I/O.  The CPU is barely taxed.  This makes intuitive sense – essentially, this is a very large load job from disk into RAM.

The result is that it worked – I was able to import 122 million records into Excel.  Here is my Power Pivot table with 122 million records!  From beginning to end, it took approximately 1 hour to load the entire data set into memory.

Is the Data Usable?

I was able to load the data, but is it usable?  While not exhaustive, here are a few quick tests I ran to see whether I could actually manipulate the data in some meaningful way and whether Excel could do some basic reporting on the dataset.

Loading and Saving the Excel File

After loading all this data, could I actually save the Excel file, and could I reload it from disk?

Saving the file to disk was quite fast – it took about 2 minutes.  The file itself is about 2.2 gigabytes, down from more than 11 gigabytes of source CSVs – Power Pivot’s in-memory engine (xVelocity, formerly VertiPaq) stores data in a compressed columnar format, which explains the roughly 5x reduction.

Loading the file was very fast – it took seconds.  However, when I tried again later, after my computer had gone to sleep and I restarted Excel, it was significantly slower – about 2-3 minutes.

Linking Data to Other Tables

The original dataset also supplied three CSV files with data for Airports, Carriers, and Planes.  I loaded each of these into my model.  In addition, I added a Date table from the Azure Marketplace.  I linked these to the main table as relationships in my Power Pivot model.

Most of these were quite quick to create – within a minute.

Creating a Calculated Column

The Airline Delays dataset provides columns for Year, Month, DayOfMonth, and DayOfWeek.  So I created a column to convert the month number into the month name (e.g. 2 = February), and another column to extract the hour from the Departure Time column.

Running these basic calculations was pretty fast – about a minute.

Creating a Power View

I tried to create a Power View sheet with the Delay Hour field and the DepDelay field to show how departure delay minutes relate to the time of day.  This was VERY slow – it took minutes to refresh the data every time I made a change.

This is where a pre-aggregated cube in SQL Server Analysis Services would make things much faster – aggregating everything dynamically is slow when dealing with volumes this large.

But it works!

Conclusion: Pushing the Limits of an In-Memory Database

Fourteen million records was usable – I can manipulate the data, save the spreadsheet, load it into a Power View, and it all works quite well.  It’s not fast, but it’s usable.  I could imagine doing analysis against this data pretty easily; aggregating against dates to filter the data by year, month or day is straightforward.

At 122 million records, Power Pivot shows its limits.  Since everything is loaded into memory dynamically with no pre-indexing, processing slows down once my laptop uses up its available RAM.  Working with Power View also becomes challenging, because every change to the configuration of a chart or view takes minutes to recalculate.

  • Nag, have a look at saasabi.com – we provide an Azure-based MOLAP engine that would greatly assist you. It can be used with Power BI or Power View as the front end; alternatively, the likes of Tableau, Spotfire or Qlik Sense can also be used against our engine. But Power BI and Saasabi are a strong, and cost-effective, combination.

  • PaulThorpe

    Three different scopes:
    Power Pivot is a (tabular) database that lives in Excel.
    Power View and Power BI are basically the same tools, except Power View lives in Excel and Power BI is a web-deployed solution in SharePoint. Power BI can use Power Pivot data sets saved to SharePoint, which are technically converted to SSAS solutions by the server.

  • Winston Lui

    Hi Paul: I just read your article. Currently we are using Excel 2010. We have a 188 MB Excel sales file that constantly crashes and gives messages such as “cannot complete this task with available resources”, despite repeated reboots. The file links to our ERP system (via MS Query) to download sales numbers, which are processed through a pivot table. Do you have any suggestions? I was thinking of upgrading to Excel 2016 and then running all our data through Power Pivot. I am still a bit of a newb in terms of Power Pivot, but do you think this will solve our processing speed and our ability to handle large volumes of data without crashing? Or do you recommend going a different route?

    • Daniel Večeřa

      You can definitely expect much better performance with Power Pivot! It may still lag a bit, but it won’t crash. However, be aware that the performance still won’t match a specialised analytics solution built for data storage and processing. If you have the budget for that, google some SaaS analytics solutions.

  • Daniel Devolder

    If you want to use a pivot table for analysis when you have more than one million rows, you need to use ODBC. If you have CSV files and don’t want to use an external database application, like Access or SQL Server, you can have a look at my solution:

    https://trucsdiversblog.wordpress.com/2016/06/16/how-to-add-equations-numbers-when-using-microsoft-word/