Power BI Increases File Size Limit from 250 MB to 1 GB

Microsoft has announced that they are increasing the file size limit for files to 1 GB from 250 MB.  This limit impacts any data source which is not direct query enabled.  It also impacts any Power BI Desktop file that is published to the Power BI service.

image

We know many of you are doing analysis that involves large amounts of data and have large data models that you want to put into Power BI to share with others. After hearing your requests and seeing what you want to accomplish, we increased the file size limit for both Excel workbooks and Power BI Desktop files to 1 GB.

When you publish a PBIX file from Desktop, publish an Excel file to Power BI service, or upload an Excel or PBIX file through Get Data in the service, you won’t get a file size limit message unless the file is greater than 1 GB. Additionally, the data will continue to refresh as long as the data model size does not grow beyond 1 GB. After that, refreshing the dataset will give an error message letting you know your dataset is too large, and it will fail to refresh until you filter the data model back down under 1 GB.

Read More

Using Power BI to Visualize WordPress Blog Posts

Microsofttrends.com runs on WordPress, and I have been working with Power BI to show some analytics on my posting behavior.  Fetching the data can be done using a JSON plugin (I use the JSON API plugin but there is also now a new WordPress REST API plugin) which will provide all your blog posts as a JSON data source.  

Accessing WordPress Blog Posts

The first step is accessing the JSON data – in the latest version of Power BI desktop there is a dedicated JSON data source that you can provide the URL for your JSON feed and it will fetch the data into PowerQuery for further manipulation.  Here is what the results of my query look like in Power BI Desktop.

image

As a table, this just provides the count of the posts but not the posts themselves – not very interesting.  However, if I click on the List of posts then I can drill down into the JSON and fetch the list of posts themselves.

Each post is represented as a record.  You can convert this now to a table by clicking on the “To Table” function and then clicking on the expand icon to the right of column 1.  When you do this you are prompted to select the fields that you want to select as part of your table.

image

If you click Close and Apply this will load all your blog posts into a table. 

Adding Date Filtering and Drill Down

Each post as a posting date.  We can create a calculated table that represents all the possible dates and add in year, month, and day columns that are nicely formatted to support drill down by year, month or day.

Start by creating a calculated table by clicking on “New Table” with the following formula:

DateTable = CALENDAR (MINX(‘Blog Posts’, [date]), NOW())

This will create a table of all the possible dates from the earliest blog post to today.

Now we need to add a bunch of calculated columns to represent year, month, day, day of the week, etc.  Here are some formulas that you can use – each line is a calculated column that you add to the table.

DateTable = CALENDAR (MINX(‘Blog Posts’, [date]), NOW())

Day = Format([Date],”DDDD”)

DayofMonth = DAY([Date])

DayofWeek = WEEKDAY([Date],2)

ISOWeekofYear = IF ((([OrdinalDate] – [DayofWeek] + 10)/7) >= 53, 1, TRUNC(([OrdinalDate] – [DayofWeek] + 10)/7))

LongDay = Format(DateTable[Date], “dddd”)

Month = FORMAT([Date], “MMM”) & ” ” & [Year]

MonthDate = Date(DateTable[Year], DateTable[MonthofYear], 1)

MonthofYear = MONTH([Date])

OrdinalDate = DATEDIFF([Year]&”,1,1″,[Date],DAY)+1

QuarterofYear = ROUNDUP (MONTH([Date])/3,0)

Year = YEAR([Date])

In addition, you will need to change the sort column for columns such as LongDay to the integer or date representations so that your columns sort in the appropriate order and not in alphabetical order.

Once you have your date table set up, you can create a relationship between the date column in the blog posts table and the DateValue column.  Depending on the format of your original date column, you may need a calculated column that reflects the identical format so that the ids match.  In my case, I created a calculated column called “DateOnly” that has the same formula as the DateValue column and linked these together through the relationship management dialog.

image

Once you have this set up, you can now create date driven visualizations such as the number of posts per month or per day of the week!

image

image

Adding Metadata Filtering

One of the key features of a blog are tags and categories that act as metadata for filtering.  The JSON provides these within its structure so can we extract the data and use it for creating some dashboards by category or tag?

If we go back to our original Power Query, we can create a new table that contains a denormalized set of keys that provide the link from the blog post ID to the tag values (the same approach will work for categories, authors, etc.).   When we get the list of posts, we do the same as above and expand the records but in this case we only need two columns – the blog post ID and the tags.

image

Then, we need to expand the list of tags so that we can fetch the tag ID and the title of the tag for each record.

image

Our new table should look something like this:

image

Once we load up this as a table, we can now create some interesting dashboards based on the tag frequency.  For example, here is the frequency of posts per tag:

image

Here is the frequency of tag use by month as a visualization.

image

Here is a comparison of the use of the tag “Azure” with the use of the tag “Power BI” over time.

image

Once you have the data there are lots of possible ways to explore and analyze the data!

Read More

Latest Version of Power BI Desktop Provides a Number of Enhancements

Microsoft has just launched the latest version of Power BI Desktop, the tool for designing reports and dashboards for the Power BI service.  The latest version can be downloaded here.

The list of new features and enhancements is too long to provide in full here – see this blog article for the complete list.  Here are a few of the highlights.

View Data Behind Visual and Export to CSV

You can now interact with any visual in Power BI and go into “focus mode” which maximizes the visual.  Within this mode, you can then view a split screen that shows the underlying data in a table format and export the table to CSV.

Plot Maps with Longitude and Latitude

You can now plot maps using raw Long and Lat data instead of a location text field.

Improved Support for KPIs

There are a number of new features related to KPIs including:

  • KPI consumption from SQL SSAS multidimensional models
  • Support for display folders when using SSAS exploration mode
  • New KPI Visual

The new KPI visual provides the ability to see a trend line over time as well as a current indicator.

7.5

Support for User Defined Hierarchies

One of the key missing features in Power BI that existed in Power Pivot in Excel, you can down create your own drill down hierarchies.  Similar to Power Pivot, you can create a hierarchy and the add fields from the model to the hierarchy to support drill up/down within your models.

9b

Live Query Support for Oracle and Teradata

Similar to existing support for live query of SQL databases, Power BI will now support live query (e.g. not cached in the report) of Oracle and Teradata data sources.

10

Read More

New Power BI Desktop Features Just Released

Microsoft has just released its December update of Power BI updates to both Power BI Desktop and the Power BI Service.  Here are some of the latest features included in the release.  These are just some of the features – see the link above for the complete list.

New Data Entry Feature

One of the key challenges in any BI solution is seems to be one piece of data that isn’t in a system – it needs to be data entered by people.

Power BI Desktop now supports the ability to copy and paste data into a table or to dynamically create a table on the fly and data enter the data into it. 

Changing Decimal Places and Text Sizes in Visuals

In Power BI Desktop, you can now format the number of decimal places and text size in visuals.

Improved Layout of Visuals and Format Painter

One of the key missing features in Power BI Desktop was basic alignment of visuals on the dashboard.  With this release you can align, distribute and size dashboards

You can also now format multiple dashboards using the Format Painter as well.

New Stacked Area Chart

R Visuals Integration

Power BI Desktop will now support R visuals within dashboards (in preview).  Using R scripts, you can use R’s plotting capabilities to create visualizations and integrate them into the dashboard.

Support for Live SSAS Multidimensional Hierarchies

Power BI Desktop will now support live exploration of SSAS multidimensional model hierarchies and use these within their reports for drill down.

Read More

Power BI Desktop Now Supports Importing Excel and SSAS KPIs

One of the key missing features in Power BI is support for target or key performance indicators.  Microsoft just announced and provided in preview support for consumption of existing KPIs from Excel.  You cannot create KPIs directly in Power BI Desktop just yet.

Here are the key steps to enable this new feature.

Step #1: Download the latest version of Power BI Desktop

You’ll need the latest version of Power BI Desktop to leverage the new feature.  You can download it here.

Step #2: Enable Preview Features in Power BI Desktop

Under Options –> Preview Features, the Use KPIs from Power Pivot or SSAS Tabular.

image

Step #3: Create an Excel Power Pivot Model with a KPI

Using Power Pivot, create a KPI.  In my sample, I create a KPI for average life expectancy for the US to compare with the actual ALE for each county. 

image

Step #4: Import Excel Workbook Contents

In order to pull in your KPI, you have to import Excel Workbook Contents.  Note: if you try to go through the usual “Get Data” option it doesn’t pull in KPIs, just tables and columns. 

image

Step #5: Display Your KPI

Once you have your KPI loaded, you can display it in a Table, Matrix, Card, or Multi-Row Card. 

image

Read More

Power BI Desktop Now Supports Live Query to Azure SQL and SQL Server

Power BI Desktop now supports Live Query to two new data sources – Azure SQL and SQL Server on premise.  Previously, the only supported Live Query data source was SSAS on premise.

This is an important improvement for a key reason I noted in a previous post – there is 250 MB on Power BI Desktop files when you publish them to the Power BI service.  If the data source isn’t “live”, Power BI Desktop fetches the data and saves it within its file format.  If you run a query that requires more than 250 MB to be cached then you cannot publish the Power BI Desktop file.

In order to leverage this new features, you will need the latest Power BI Desktop application installed.   In addition, you have to turn on the feature under Options –> Preview Features.

image

Now when you query these data sources, you’ll see the following dialogue box:

image

If you select DirectQuery, the query executes against the live database instead of caching the data locally. 

The key impact is that you have a direct connection to the database so you can do queries against large datasets and still publish the Power BI Desktop file to the Power BI Service because data is no longer cached locally. 

Read More

Power BI Has a 250 MB File Size Limit

If you try to use Power BI Desktop to create a dashboard, Power BI imposes a hard limit of 250 MB as the maximum file size.  In my scenario, I had several 500 MB csv files that I used to create a dashboard.  Power BI Desktop works fine but when you try and publish to Power BI you are stopped because the Power BI Desktop file is more than 250 MB.

image

I tried moving the files to Azure Blob Storage but that won’t work either because when you run the query to load up the files, it downloads the content back into your local Power BI Desktop file.

The only way around the limit seems to be to use a data source that queries in real time (“Live with direct query”) instead of being cached through Power BI Desktop.  The options currently are Azure SQL, Analysis Services Tabular databases on premise, Microsoft Stream Analytics, SQL Azure Data Warehouse or Spark on HDInsight.  However, the ONLY option for a live connection in Power BI Desktop is SSAS and most recently in preview SQL Server and Azure SQL.

Read More

Power BI Calculated Columns vs. Calculated Measures: An Example

In Power BI Desktop, we have the ability to create calculated “Columns” and “Measures”.  Both work by creating new pseudo columns based on a DAX calculation.  What is the difference between the two?  The answer lies in when the calculation takes place and how it impacts aggregation of data. 

Let’s look at a specific example.

I downloaded a dataset from the Community Health Status Indicators that provides demographic data and health indicators by county in the United States.  One of the demographics available is the population broken down by age in 4 categories:

  • Under 19
  • 19 to 64
  • 64 to 85
  • 85 and over

Each of these are provided as a % value for the county.  In addition there is a total population size for each county provided.  What we want is a summary pie chart that looks something like this:

image

We want to ensure that the % for each age group aggregates properly as we roll up from county to state to country.  Here are the steps required that show the difference between a calculated column vs. a calculated measure.  The key difference between them is:

  • A Calculated Column runs the calculation on each individual row.  The calculation happens BEFORE aggregating.
  • A Calculated Measure runs the calculation on a collection of rows.  Calculations happen AFTER aggregating.

Let’s see the difference in action.

Step #1: Using a Calculated Column for the Population per Age Group

Our dataset only provides % of population for each age group.  If you try to roll this up to the state or country, you cannot aggregate the % successfully.  If you try SUM as your roll up option it will simply add all the % as if they were numbers – definitely not want we want.  If you use AVERAGE to roll up this is a bit better but it doesn’t take into account that some counties are more populous than others, making your average very distorted (for example, the average in Alaska is given equal weight as California). 

In order to calculate a true average, we need to translate the county % values into absolute population values so that when we roll them up it takes into account the size of the county. 

Because we want the calculation to work on each individual row, we can use a Calculated Column.  In this case, we need one for each age group like this:

Age 19 Under Population = [Age_19_Under] / [Total % Age Population] * DEMOGRAPHICS[Population_Size]

Age 19 to 64 Population = [Age_19_64] / [Total % Age Population] * DEMOGRAPHICS[Population_Size]

Age 65 to 84 Population = DEMOGRAPHICS[Age_65_84] / [Total % Age Population] * [Population_Size]

Age 85 and Over Population = [Age_85_and_Over] / [Total % Age Population] * [Population_Size]

These calculated columns represent the absolute number of people for each age bracket per county.

The reason we need the [Total % Age Population] (which is another calculated column) is because in this dataset, the original % values don’t add up to 100%.  By dividing the % by the total % we effectively normalize all the % values so they are always out of 100%.

Here are the results for California (keep in mind this is 2010 data):

image

Step #2: Use a Calculated Measure to Roll Up to State and Country

Now that we have raw numbers for each county, we can use a Calculated Measure to calculated a true % based on the aggregated populates for each county.  For each age group, the calculated measure looks like this:

<19 = Sum(DEMOGRAPHICS[Age 19 Under Population]) / Sum(DEMOGRAPHICS[Population_Size])

19 to 64 = Sum(DEMOGRAPHICS[Age 19 to 64 Population]) / Sum(DEMOGRAPHICS[Population_Size])

64 to 85 = Sum(DEMOGRAPHICS[Age 65 to 84 Population]) / Sum(DEMOGRAPHICS[Population_Size])

> 85 = Sum(DEMOGRAPHICS[Age 85 and Over Population]) / Sum(DEMOGRAPHICS[Population_Size])

As you can see, the key difference with a Calculated Measure is the aggregation happens first – the sum function takes the rows that fit the current filtering criteria, adds them together and then uses the total to calculate the % of the total population.

Read More