Comparing Different Methods for Importing Text Files into PowerPivot

One of the simple challenges in managing large files is importing text files into Excel.  I found some CSV files from the City of Toronto’s web site that have parking ticket data for the past several years.

Each file has multiple millions of rows in it.  Each file is a CSV file with the same column layout.  In total, there are 5 files (2008, 2009, 2010, 2011, and 2012) and there are approximately 14 million records.

The goal is to have this data available to PowerPivot so we can do some analysis on it.  The following are a few options that I tried to import the data and the results based on my tests. 

As you can see in the different options, there are some key weaknesses in the Power Query and PowerPivot tools that make SQL Server still a much better choice for importing large volumes of data.  The error reporting in SSIS is also generally much better and easier to diagnose.

Hopefully, this will be improved in the final version of Power Query when it is released in 2014.

The times reported below for “Time to Run” was the time it took to execute the task using my Dell M 4700 Laptop.  I’m running the latest Office 2013 components with the latest PowerPivot and Power Query addons.

Option #1: Import into Excel Directly – Total FAIL!

Time to Run: ~15 seconds to import one file out of five

The most obvious option is to simply import the file into Excel.  Unfortunately, that will fail on even one file because there are more than the 1m records allowed in a spreadsheet.

When you just open the file directly, Excel truncates the rows at line 1048576.  If you run the import through Power Query, you will get an error if you try to import the file into the worksheet (see the Load Settings to specify Load to Worksheet or Load to Data Model).

image

Option #2: Run Power Query Import – Partial Fail!

Time to Run: ~40 seconds to import one file out of five

When I ran the Power Query import, it failed on four out of five files with the following error:

image

Let’s keep in mind that this is a Power User designed tool – this is not good!  There is no indication why this failed – the error doesn’t even tell me the row number or column number that it couldn’t process.

When the files worked, the run was reasonably fast – about 40 seconds to import 2.8m records. 

Option #3: Run PowerPivot Import – Partial Fail!

Time to Run: ~40 seconds to import one file out of five

Using the PowerPivot import function, importing the same 2011 file that passed in the Power Query import took about the same time.

Loading the 2008 file failed immediately.

Loading the 2009 file took HOURS to import!  In addition, when I check the number of rows against the same import in SQL Server, there are 2,582,383 rows in PowerPivot and 2,774,553 rows in SQL Server!  What happened to my missing 192,170 rows?! 

I’m really not sure why these files are so completely different – they both have the same column layout, the same basic file format and they are about the same file size.   I’m also concerned about the lack of error reporting here – how would I know that this text file import missed so many records if I hadn’t compared against multiple import methods?

Option #3: Bulk Load into Access 2013 – Partial Fail!

Time to Run: ~4 minutes to import 3 out of 5 files,

I tried importing the data into a new Access 2013 database. 

image

The import worked for 2008, 2009 and 2010 but when I tried 2011 and 2012, I received this error message:

image

My Access database has about 8.4m rows in it for the years 2008, 2009, and 2010.

When I imported the data into Power Query, it took a very long time to create the data preview window – about 30-40 seconds.  Loading the 8.4m rows using Power Query took quite a long time to bring in the Access database – about 6 minutes. 

Option #2: Bulk Load into SQL Server – PASS!

Cost to Run: ~ 3.5 minutes to load all files into SQL Server 2012, ~1.5 minutes to load into PowerPivot using default import from database,

I created an SSIS package to drop in the files into SQL Server.

image

The package imported without any errors.  The SSIS package has the best debugging of all the options as well in that it will tell you the exact line and column where there is an error.  The other advantage with SSIS is you can map your data explicitly to strongly typed columns in the SQL table.

Importing using PowerPivot’s import from database option worked well and loaded the records quickly.