Azure Data Warehouse Now Available in Preview

Azure Data Warehouse is a new Microsoft service for staging large volumes of data for analysis purposes.  The Azure Data Warehouse service is now available in preview.

Azure Data Warehouse is a 100% PAAS based service that complements the existing Azure SQL Database.  Azure DW differs from Azure SQL in several ways:

  • Azure SQL is a traditional SQL database.  Azure Data Warehouse provides both relational and non-relational data processing models. 
  • Azure SQL is optimized for OLTP workloads, whereas Azure Data Warehouse is optimized for ETL and analytics workloads.
  • Azure Data Warehouse is designed to scale elastically to massive sizes (e.g. petabytes), whereas Azure SQL maxes out at 1 TB.
  • Azure Data Warehouse leverages PolyBase, Microsoft’s technology for integrating queries across SQL Server and Hadoop into a single model.

From a pricing perspective, there are some differences as well:

  • Azure SQL is sold based on performance units and so is Azure Data Warehouse.  However, these units are not the same.  It’s not clear which is faster or slower at the moment so it’s difficult to compare pricing from this perspective.
  • Azure SQL is priced on a monthly basis and cannot be stopped as a cloud service without deleting the database.  Azure Data Warehouse runs on demand and you can start and stop it.  Similar to an Azure VM, when stopped you only pay for storage costs and not processing costs.
  • With Azure SQL, you don’t pay for storage costs – they are bundled into the price.  With Azure Data Warehouse, you pay for storage costs for the data being housed in the warehouse.


SQL Server 2016 Preview Coming This Summer

Microsoft has announced the latest version of SQL Server on premise.  SQL 2016 will arrive in preview this summer with some key new features.

Always Encrypted

Always encrypted means just that – the data is encrypted at rest and in transit.  Encryption is transparent to the application so no changes are required at the application level. 

Stretching From On Premise to Azure

In SQL 2016, you’ll be able to stretch your data from on premise to the cloud based on business rules.  Much as StorSimple does for storage, SQL 2016 will allow you to extend your data to the cloud for rows that are historical, less frequently accessed, etc.  Enabling the Always Encrypted feature means that your data is encrypted at all times.

In Memory Analytics

In memory OLTP has been available in SQL 2014 – in 2016 this is now extended into operational analytics.  This will improve integration of OLTP and OLAP scenarios, all running in memory to speed up performance.

R Processing in the Database

Microsoft has a technology called PolyBase which provides integration between Hadoop and SQL on premise.  Microsoft also just bought a company called Revolution Analytics which provides an open source distribution of R.  In the next version of SQL, these platforms will be integrated into SQL Server directly, providing better tools for NoSQL workloads.

Native JSON Support

Microsoft is promising “native JSON support”, which sounds interesting given their investment in Azure DocumentDB and Azure Search, which are both JSON based.


New Azure SQL Data Warehouse Service Will Bring Analytics as an Elastic, Fully Managed Service

Microsoft announced this week a brand new service – the Azure SQL Data Warehouse.  Azure SQL has provided PAAS based SQL services for a few years now, but if you wanted to run SQL Server Analysis Services, build cubes and do analytics, you had to roll out your own SQL Server on raw infrastructure as a service.

Azure SQL is quite a good service in terms of performance, scalability and pricing.  You can create a database for as little as $10 / month and scale it up to 500 GB and dedicated performance. 

The new SQL Data Warehouse Service will be built on top of the same Azure SQL elastic technology to enable a data warehouse that can scale up or down as needed. 

The new platform also promises integration with Microsoft’s other big data and business intelligence services such as HDInsight, Power BI, Azure Machine Learning, etc.

The preview will be available later this year…


Top 10 2015 Predictions for Microsoft Azure

Like Office 365, Microsoft Azure has undergone a lot of change in the past year.  Throughout 2014, it seemed like there was a new service being released in either preview or production every few weeks.  Looking forward, here is a list of predictions for Azure in 2015.

10. Continued Focus on Security and Encryption

One of Microsoft’s key differentiators as an enterprise cloud provider is its focus on security.  Microsoft already has many of the key certifications, public commitments to enterprise grade security, etc. and in 2015 expect the investments to continue. 

9. Microsoft Shrinks the Market Share Gap with Amazon

Amazon is still the dominant cloud supplier and will continue to be in 2015.  However, in 2014, Microsoft’s growth rate outpaced Amazon’s and I think we’ll see the same in 2015 as Microsoft closes the gap in market share.


Also watch for data center expansion in 2015 as a way to take market share from Amazon – Microsoft has now launched in China and Australia, for example, and this geographic expansion will continue as we move into 2015.

8. New Versions of Visual Studio, ASP.NET, .NET Framework

While not technically an Azure feature, the new releases of Visual Studio, ASP.NET and the .NET framework will further integrate with, support and provide new ways of building solutions for Microsoft Azure.  The new Azure SDK provides better diagnostics, improved deployment tools for Azure, support for Blob storage and improved HDInsight support.  The new version of Visual Studio supports Azure connected services, enterprise SSO, code analysis for Azure, and integrated publishing to Azure.


7. Services in Preview Launch as Production Services

There were many new services launched in 2014 for Microsoft Azure in preview including:

  • Improvements to Azure SQL
  • Better storage and shared file systems
  • A number of big data services (Batch, Machine Learning, Data Factory, Storm, Stream Analytics, etc.)
  • Live media streaming
  • NoSQL document based database
  • Azure Search
  • Site recovery through replication

In particular the big data services are a major strategic investment for Microsoft and a key differentiator in the fight with other cloud service providers for market share.  Expect these to be promoted to enterprise class production ready services and a LOT of marketing and promotion around them in 2015.

6. Microsoft Struggles with the Cannibalization of SQL Server

Microsoft’s number one product is SQL Server with a massive $6 billion in revenue for Microsoft.  The SQL Server business grew by 11% in 2014, but Azure growth was more than 100%. 

Microsoft has a fundamental problem with SQL Server as it pivots towards the cloud:

  • SQL Server running in the cloud through IAAS is quite expensive, especially compared to NOSQL alternatives.  I can run a basic Windows VM for as little as $14 / month, but installing SQL Standard drives that price up to $315 / month or $1,1777 / month for SQL Enterprise.  A real enterprise class SQL cluster running through Azure on IAAS would cost thousands of dollars per month when you account for high availability and clustering requirements.
  • Microsoft has Azure SQL, which is a PAAS based offering effectively competing with traditional SQL Server running either on premise or in the cloud.  Azure SQL is significantly cheaper than running a full SQL Server license especially for smaller databases. 
  • Microsoft has multiple NOSQL alternatives including Hadoop, Table Storage, and DocumentDB.  Each of these services can replace a traditional SQL database in certain scenarios.

In the same way that Office 365 has cannibalized on premise implementations of SharePoint, Exchange and Office, Azure will start to cannibalize all those SQL Server databases running on premise.  It will happen slowly because of the nature of migrating any kind of database but over the next 3-5 years expect the shift to become more visible. 

Microsoft’s struggle in 2015 will be how to position its traditional SQL Server business (in particular all of those customers being sold on upgrades to SQL 2014) vs. customers who might start to look at moving to NOSQL alternatives as they move to the cloud.

5. Microsoft Continues to Pivot on Open Source, Linux and Partnerships

Microsoft will continue its pivot on embracing Linux, open source, and other non-Microsoft partnerships such as Salesforce, IBM, Oracle, Dropbox, etc.  Microsoft has been busy in 2014 open sourcing a number of its core platforms including ASP.NET and big chunks of the .NET framework.  It has also moved to a cross-platform model for the .NET framework, and has announced partnerships with Salesforce and Dropbox to support integration between Office 365 and Azure.

Expect this to continue and expand in 2015 as Microsoft moves from a proprietary software company to an open cloud services company.  Microsoft has recognized that their future is hosting EVERYTHING, not just Microsoft designed and engineered products and this will continue to expand in 2015.

4. Price Wars

Microsoft and all the other major cloud providers are in a massive price war which is driving prices down.  If you had purchased basic cloud storage in 2012, you would have paid $0.14 per GB.  In 2013, that price was $0.07 per GB.  At the end of 2014, it’s as low as $0.03 per GB.   Similarly, an A3 VM would have cost $0.48 per hour in 2012 and is now running at $0.32 per hour.

Expect the price drops to continue in 2015 as Microsoft competes with Amazon and others for market share and the economies of hardware, storage, etc. continue to improve over time. 
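To put the storage and VM figures above in perspective, the percentage drops can be worked out directly.  This is a small sketch that only does arithmetic on the prices quoted in this post; the helper name `percent_drop` and the dictionary layout are my own, chosen for illustration.

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Return the percentage decrease from old_price to new_price."""
    return (old_price - new_price) / old_price * 100.0


# Prices quoted in the post (USD per GB of basic cloud storage, by year).
STORAGE_PRICE_PER_GB = {2012: 0.14, 2013: 0.07, 2014: 0.03}

# Prices quoted in the post (USD per hour for an A3 VM).
A3_PRICE_2012 = 0.48
A3_PRICE_NOW = 0.32

if __name__ == "__main__":
    years = sorted(STORAGE_PRICE_PER_GB)
    # Year-over-year storage price drops.
    for earlier, later in zip(years, years[1:]):
        drop = percent_drop(STORAGE_PRICE_PER_GB[earlier], STORAGE_PRICE_PER_GB[later])
        print(f"Storage {earlier} -> {later}: {drop:.1f}% cheaper")
    # Overall storage drop and the A3 VM drop.
    total = percent_drop(STORAGE_PRICE_PER_GB[years[0]], STORAGE_PRICE_PER_GB[years[-1]])
    print(f"Storage {years[0]} -> {years[-1]}: {total:.1f}% cheaper")
    print(f"A3 VM: {percent_drop(A3_PRICE_2012, A3_PRICE_NOW):.1f}% cheaper")
```

On these numbers, storage has fallen much faster than compute, which is worth keeping in mind when estimating where your own bill is likely to shrink.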

3. Hadoop and Big Data Go Mainstream

Big data has been a buzzword in the industry for the past several years, but it has been more hype in many cases than practice.  Hadoop became the key platform for big data in 2014 with Microsoft embracing it as its core platform.  Hadoop providers have received massive investments and their revenue is expected to grow in 2015 by 60% year over year.

However, there have also been significant barriers to adoption and CIOs have been slow to commit to big data platforms.  2014 was a year of many proofs of concept, investigations and hype demos but also lots of concerns, trepidation and adoption challenges for big data technologies.

In the same way that cloud saw massive growth in 2014 as it went from hype to mainstream, I see 2015 as being a pivotal year in the big data story as CIOs start to move from the hype, research and proof of concept stage to mainstream use of these technologies.

In addition, as Microsoft’s new big data services such as Batch, Storm, Stream Analytics, Machine Learning, etc. come online, they will start to reduce the complexity of engineering big data solutions.  This should move the market forward to embrace the big data promise with less need for data scientists and PhD computer scientists to figure it out.

2. The Resurgence of PAAS

Microsoft originally championed the cloud as PAAS and quickly had to backtrack into the IAAS business as Amazon took the market.  However, running virtual machines in the cloud in the same way as on premise is not economical in the long run because you still end up owning the costs for maintenance, patching, upgrades, etc.  Microsoft has gone head to head with Amazon and other IAAS vendors and can now compete well in basic VM hosting – however, this is ultimately a race to the bottom as the prices continue to drop. 

The real differentiator for Microsoft in the long run is PAAS – it has the development tools, the APIs, the stacks and the developer community to make running your own virtual machines seem as antiquated as running them on premise.  Some key changes to Azure that have happened in 2014 will make PAAS an increasingly compelling option in 2015:

  • Azure Web Sites as a low cost, high scale option for running public facing web sites is a very attractive option over running your own servers.  Scalability of the Azure Web Site offering is already really good.
  • Azure SQL continues to drop in price while increasing in features and performance.  The new version of Azure SQL (currently in preview) brings almost complete compatibility as well as improved performance to the existing service. 
  • Microsoft’s new big data offerings are all PAAS services.  They provide some control over the number of VMs and instances and the level of performance, so you can scale as needed without managing the underlying infrastructure.  For example, the new Azure Batch provides access to pools of VMs on demand with no need to maintain them – the service manages them as a generic pool including provisioning and deprovisioning.

For most customers, PAAS will provide a more economical, easier to maintain and more scalable service than building your own virtual infrastructure using IAAS.  As additional services and finer grained control over PAAS services (dedicated performance units, scalable tiers, etc.) become available, the case for IAAS based services is being undermined by easier to use PAAS services.

1. Performance Challenges and Opportunities

One of the key challenges and opportunities we have seen with Azure is performance and scalability.  For example, running SQL Server on IAAS has been a challenge because of poor I/O performance.  We have also seen really good scalability from Azure Web Sites, especially with the Shared Tier. 

The bottom line is that a cloud based VM isn’t the equivalent of a VM running on premise, for lots of different reasons.  The I/O performance tends to be poorer, the network latency is harder to control, and Microsoft sells performance on Azure SQL as a set of “Dedicated Performance Units” which don’t map well to traditional servers.  Before launching any cloud service, performance testing is a must to ensure that your particular scenario will scale as expected and perform economically.  In some cases, scalability is dirt cheap (for example, scaling up an Azure Web Site or handling more load on an Azure SQL database), whereas in other cases it can be quite costly (for example, scaling up VMs running SQL Server in IAAS).


Microsoft has been introducing new service tiers that start to address some of these challenges.

Expect more of these types of improvements as we move into 2015 and customers start to leverage Azure for performance demanding workloads.


Azure Web Sites Performance Analysis – Hosting Plans Compared

With the many options for scaling up an Azure Web Site, I wanted to understand the different options and the type of performance I could expect from each.  I created a simple performance test using the Bakery site, which is a basic ASP.NET web site template that has a sample store.  It features a catalogue with a few items from which you can place an order.

The default out of the box implementation uses SQL Compact as its database, which has the database running through the file system.  Using only a single user hitting the page, the average page load time is about 300-400 ms.  I also tested migrating the database to a proper SQL Azure database and the average load time dropped to 30-50 ms, an almost 10x improvement in performance. 

In order to test multiple users hitting the site, I created a performance test that hit the site using the open source tool JMeter.  The test was a basic test to see how fast the pages could be retrieved under load.   I tested different loads using 1, 2, 4, 8, 12, 20, 50, 100, 200 and 500 concurrent users.  Tests were run from within an Azure VM so the latency was very low.
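My actual tests were run with JMeter, but the shape of the test is easy to sketch in code.  The following is a minimal illustration and not the JMeter test plan itself: it runs a caller-supplied `fetch` function from a number of concurrent threads and reports the average response time at each load level.  The names `run_level`, `run_sweep` and `fetch`, and the default of 5 requests per simulated user, are assumptions made for this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, Iterable, List

# Concurrency levels used in the post.
USER_LEVELS = [1, 2, 4, 8, 12, 20, 50, 100, 200, 500]


def timed_call(fetch: Callable[[], object]) -> float:
    """Run one request and return its duration in milliseconds."""
    start = time.perf_counter()
    fetch()
    return (time.perf_counter() - start) * 1000.0


def simulate_user(fetch: Callable[[], object], requests_per_user: int) -> List[float]:
    """One simulated user issuing requests back to back."""
    return [timed_call(fetch) for _ in range(requests_per_user)]


def run_level(fetch: Callable[[], object], users: int, requests_per_user: int = 5) -> float:
    """Run `users` simulated users concurrently; return the mean response time in ms."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(simulate_user, fetch, requests_per_user) for _ in range(users)]
        samples = [ms for future in futures for ms in future.result()]
    return mean(samples)


def run_sweep(
    fetch: Callable[[], object],
    levels: Iterable[int] = USER_LEVELS,
    requests_per_user: int = 5,
) -> Dict[int, float]:
    """Run every load level in turn and map user count to mean response time in ms."""
    return {users: run_level(fetch, users, requests_per_user) for users in levels}
```

To point this at a real site you could pass something like `lambda: urllib.request.urlopen(url).read()` as `fetch`, where `url` is your own test page.  A dedicated tool such as JMeter gives you ramp-up control, percentiles and error reporting that this sketch leaves out.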

Here are the results using a number of different concurrent threads to test the scalability of each option.  The summary of the results can be found here in this slideshare presentation.

 

Free

Free worked quite well running up to 20 concurrent users.  However, at 50 threads, the response time doubled and within a couple of minutes the Data Out quota was reached and the entire site was disabled!

Free is really only useful for testing – with only 165 MB per day in the Data Out quota, you’re going to run out even with the most basic web site under any reasonable load.
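To get a feel for how quickly the 165 MB Data Out quota disappears, here is a rough back-of-the-envelope sketch.  The average page size and the request rate are assumptions you would need to measure for your own site, and I am treating 1 MB as 1024 KB; the function names are mine.

```python
DATA_OUT_QUOTA_MB = 165  # daily Data Out quota for the Free plan, as quoted in the post


def page_views_per_day(page_kb: float, quota_mb: float = DATA_OUT_QUOTA_MB) -> int:
    """Whole page views that fit in the daily quota (assumes 1 MB = 1024 KB)."""
    return int(quota_mb * 1024 // page_kb)


def minutes_until_quota(
    page_kb: float, requests_per_minute: float, quota_mb: float = DATA_OUT_QUOTA_MB
) -> float:
    """Minutes of sustained traffic before the daily quota is used up."""
    return quota_mb * 1024 / (page_kb * requests_per_minute)


if __name__ == "__main__":
    # Hypothetical numbers: a 100 KB page served 1,000 times per minute.
    print(page_views_per_day(page_kb=100))
    print(round(minutes_until_quota(page_kb=100, requests_per_minute=1000), 2))
```

With the hypothetical numbers above, the quota lasts well under two minutes of sustained load, which lines up with how quickly my test site was disabled.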

Shared

Using the SQL Compact edition, Shared still scaled up really well.  Even at 50 concurrent users, the average response time was only 400 ms.  However, at 100 threads we started to see a slowdown, with response time increasing to an average of 1290 ms.  At 200 threads, the slowdown was even more pronounced at 1755 ms.

If you look at the Azure dashboard, you can see as the tests run there were 8369 requests in one minute and it barely broke a sweat!


Running using Azure SQL improved scalability even further.   It could easily handle 100 concurrent users without any noticeable change in performance. 

However, one of the key limitations for Shared is the quota limits that are imposed on your site.  While Shared can handle spikes in traffic quite nicely, you need to be careful about exceeding the quotas. 

Running a simple ASP.NET page using the SQL Compact database, I was able to exceed my CPU quota in less than 5 minutes running at 50 concurrent users and then my site was disabled.  Running the more optimized SQL Azure database, I could do the same at 200 concurrent users.

The key quota limit is CPU time – it resets every 5 minutes and limits your site to “2.5 CPU minutes”.  Essentially, if you have too much traffic in a 5 minute period your site is disabled. 
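The CPU quota can be turned into a rough capacity estimate.  This is only a sketch: the CPU time consumed per request is an assumption you would have to measure for your own pages, and I am assuming the quota is a simple budget of 2.5 CPU minutes per 5 minute window as described above.  The function names are mine.

```python
QUOTA_CPU_SECONDS = 2.5 * 60  # 2.5 CPU minutes per window, as quoted in the post
WINDOW_SECONDS = 5 * 60       # the quota resets every 5 minutes


def max_sustainable_rpm(cpu_ms_per_request: float) -> float:
    """Highest steady requests-per-minute rate that stays within the CPU quota."""
    cpu_seconds_per_request = cpu_ms_per_request / 1000
    window_minutes = WINDOW_SECONDS / 60
    return QUOTA_CPU_SECONDS / cpu_seconds_per_request / window_minutes


def seconds_until_disabled(requests_per_minute: float, cpu_ms_per_request: float):
    """Seconds of steady traffic before the quota runs out, or None if it never does."""
    cpu_seconds_per_second = requests_per_minute / 60 * cpu_ms_per_request / 1000
    if cpu_seconds_per_second * WINDOW_SECONDS <= QUOTA_CPU_SECONDS:
        return None  # the whole window fits inside the budget
    return QUOTA_CPU_SECONDS / cpu_seconds_per_second


if __name__ == "__main__":
    # Hypothetical cost of 10 ms of CPU per request.
    print(max_sustainable_rpm(cpu_ms_per_request=10))
    print(seconds_until_disabled(requests_per_minute=6000, cpu_ms_per_request=10))
```

The practical takeaway is the same as in the tests: the cheaper each page is to render, the more traffic Shared can absorb before the site is disabled.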


Shared is still the best of all the options tested for handling short spikes in traffic, as long as your site doesn’t exceed your usage quota.

Basic Small

Basic has 3 configurations – Small, Medium and Large.  We tested each of these configurations under load.

Basic Small performed significantly slower than Shared in all scenarios.  If you look at the graph from the Azure monitor, you can see the difference in requests being handled from the time the site was in Shared vs. the time it was in Basic Small.


In Shared, we peaked at 8300+ requests per minute even running under SQL Compact, while in Basic Small the site was barely managing 250 requests per minute.  By the time we loaded up to 50 users, the Basic Small site started generating errors from the SQL Compact database because it couldn’t handle the requests fast enough.

Running using a SQL Azure database, Basic Small works better but still suffers from scalability issues.  At 20 concurrent users, the average load time was 93 ms compared to an unloaded 30 ms.  At 50, 100 and 200 concurrent users, the performance got progressively worse.

Basic Medium

Basic Medium fares much better than Basic Small – it consistently delivers reasonably good results even at higher loads.  In the Azure monitor graph, Basic Medium peaks at significantly higher requests per minute than Basic Small under all test scenarios.


As a result, Basic Medium’s average load time is stable at around 400-450 ms when running the SQL Compact database up to 20 concurrent users, and then starts to increase slowly from there.  When running SQL Azure, we could scale to 50 concurrent users without a significant decrease in performance.

Basic Large

Basic Large is the first instance that we tested that had faster average load times than Shared and Free.  It also scales much better than Basic Small or Medium, staying at an average of 300 ms even with 20 concurrent users running under our SQL Compact database.  Running under SQL Azure, Basic Large performed well up to 20 concurrent users and then performance slowly degraded as we increased to 50, 100, 200 and 500 concurrent users.

Standard Small

Switching to Standard gives you the same options as Basic in terms of VM size, but with the additional ability to scale out the number of instances from 3 to 10 and the ability to autoscale when your instance becomes bogged down.

The performance of Standard Small is a little bit better than Basic Small.  In my test with Basic Small, we started seeing errors with 50 concurrent users.  With Standard Small, the performance is very slow but manages to get through the test.  However, at 100 concurrent users, Standard Small also fails by generating errors.

Standard Medium and Standard Large

Standard Medium performed similarly to Basic Medium.  Standard Large is rock solid with the best scalability of all the options.  At 200 concurrent users, the instance is still consistently delivering.  Performance is about the same as Basic Large.

Shared X 3

Using Shared, you can increase the number of instances up to 10.  What happens if you scale out to 3 instances – do you get 3x the performance?

Running multiple instances meant that only SQL Azure was supported – SQL Compact won’t work in this scenario at all because it is local on a file system.  Running Shared on 3 instances, the speed to load a page was stable at 30 ms, even when running with 200 concurrent users!

At 500 concurrent users, I was able to again exceed my quota after a couple of minutes of running at that speed.  However, in that time I was able to generate almost 50,000 page views running on three Shared instances.

Basic Medium X 3 and Standard Medium x 3

Basic Medium or Standard Medium running on 3 instances can handle a LOT of traffic.  Both were rock solid running my test up to 50 concurrent users with an average load time of 33 ms.  With 100 and 200 concurrent users, there was degraded performance but it was reasonable – about 50-60 ms.  Even with 500 concurrent users, performance was still a respectable 124 ms. 

Standard Medium running on 3 instances performed at about the same rate as Basic Medium. 

Running 3 Basic Mediums is only slightly more expensive than a single Standard Large, but the performance is significantly better.  At 500 concurrent users, the cluster of 3 Basic Mediums was serving pages at 124 ms while at the same load, a Standard Large was taking 620 ms to serve a page.
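The gap between the two configurations is easier to appreciate as a ratio.  This sketch only uses the response times reported above; the helper name `slowdown` is my own.

```python
def slowdown(baseline_ms: float, loaded_ms: float) -> float:
    """How many times slower loaded_ms is compared with baseline_ms."""
    return loaded_ms / baseline_ms


# Average load times from the tests above, in milliseconds.
BASIC_MEDIUM_X3_MS = {50: 33, 500: 124}  # keyed by concurrent users
STANDARD_LARGE_MS_AT_500_USERS = 620

if __name__ == "__main__":
    # Standard Large vs. 3 Basic Mediums at the same 500 user load.
    ratio = slowdown(BASIC_MEDIUM_X3_MS[500], STANDARD_LARGE_MS_AT_500_USERS)
    print(f"Standard Large is {ratio:.1f}x slower than 3 Basic Mediums at 500 users")
    # How much 3 Basic Mediums slow down when the load goes from 50 to 500 users.
    growth = slowdown(BASIC_MEDIUM_X3_MS[50], BASIC_MEDIUM_X3_MS[500])
    print(f"3 Basic Mediums are {growth:.2f}x slower at 500 users than at 50")
```

In other words, a tenfold increase in users costs the 3 instance cluster less than a fourfold increase in response time, and it still ends up several times faster than the single larger instance.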

Standard Large X 3

Standard Large running on 3 instances is massive.  It ran easily with 500 concurrent users with barely any decrease in performance!


As you can see by the graph, we peaked at almost 40K page views per minute! 

Key Conclusions

In analyzing the performance of all the various hosting plans, here are my key conclusions:

  • Underlying baseline performance makes a big difference in scalability.  Optimizing your page rendering time can allow you to reduce hosting costs by allowing you to run under smaller or fewer instances.
  • The best performing and most economical hosting plan is Shared.  However, with imposed quotas, you have to be careful not to exceed your limits or your site is disabled.
  • Azure SQL runs very fast and scales well.  Even with 500 concurrent users, Azure SQL was never a noticeable bottleneck.
  • Scaling out shared instances also scales out quota limits – 3 Shared instances @ $30 / month might be a better bet than 1 Basic Small at $60 / month.
  • Basic / Standard Small are poor scalability choices – the potential cost savings compared with Medium is eroded quickly by degradation in performance under load.
  • Scaling out (i.e. adding multiple instances) is generally more reliable, higher performing and cheaper than scaling up.  For example, running 3 Basic Mediums provided superior performance to 1 Standard Large at a comparable cost.
  • Autoscale is only available with Standard and only works horizontally – i.e. you cannot automatically scale up from a Small to a Medium to a Large.

Azure Web Sites as a platform is an incredible option, especially for high volume web sites.  It scales well and the various options for hosting plans mean you can pay as little as $10 / month for your web site.  As you need more capacity, changing the configuration can be done at any time and you pay only for the capacity you are using.
