Latest Microsoft Big Data Offerings Start to Enable a Comprehensive Big Data Cloud Platform

The release of several new big data services shows Azure evolving from a virtual machine-centric IaaS offering into a PaaS platform for big data.

The following is a summary of what each of these services provides, with a link to more information.

Data Input/Output Services

  • Event Hubs: Internet of Things style ingestion of millions of events coming from hundreds of thousands of concurrent clients.
  • Web Sites: Use ASP.NET, PHP or Java technologies to deliver a public-facing web site.  Built-in support for staging, backup/recovery, high availability and auto-scaling.
  • Notification Hubs: Broadcast push notifications to millions of mobile devices.
  • API Management: Publish APIs to be consumed by internal teams, partners and developers at scale.  Includes provisioning, usage plans, throttling and alerts.
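
To make the throttling idea behind usage plans concrete, here is a minimal sketch of a fixed-window rate limiter. It is not the API Management policy engine or any Azure API; the class name `FixedWindowThrottle`, its parameters and the caller name `partner-a` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


class FixedWindowThrottle:
    """Allow at most `limit` calls per caller in each `window_seconds` window.

    Hypothetical illustration only; not an Azure API.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (caller, window index) -> number of calls accepted in that window
        self._counts = defaultdict(int)

    def allow(self, caller, timestamp):
        """Return True if the call fits within the caller's quota for its window."""
        window = int(timestamp // self.window_seconds)
        key = (caller, window)
        if self._counts[key] >= self.limit:
            return False  # quota for this window is used up: throttle the call
        self._counts[key] += 1
        return True


# Two calls per minute: the third call in the first minute is rejected,
# and the quota resets once the next window starts.
throttle = FixedWindowThrottle(limit=2, window_seconds=60)
decisions = [throttle.allow("partner-a", t) for t in (0, 10, 20, 61)]
print(decisions)
```

A managed service applies the same kind of per-caller quota for you, which is the point of buying it as a platform service rather than building it.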

Data Processing Services

  • HDInsight: Hadoop running in the cloud as a service.  The latest version of Hadoop is supported, and HDInsight adds integration with Excel as a BI platform.
  • Stream Analytics: Real-time processing of incoming analytics data (web logs, events, sensor readings, etc.).
  • Machine Learning: Leverage enterprise-grade analytics algorithms using a design tool.  Combine predictive analytics algorithms into machine learning workflows.
  • Search: General-purpose search engine for indexing text-based documents.
  • Data Factory: Cloud-based ETL engine for data processing across multiple data services, including SQL Server, Azure SQL, Azure Blob and Azure Table.  Use multiple languages, including C#, Hive and Pig, for data processing tasks.
  • Batch: Massively parallel processing for batch jobs.  Harness large pools of CPUs on demand.
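
As a rough illustration of what real-time processing of events involves, the sketch below counts events per fixed, non-overlapping time window (often called a tumbling window). It is plain Python, not Stream Analytics syntax; the function name `tumbling_window_counts` and the sample events are assumptions made up for this example.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, event type) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, event_type) pairs.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for timestamp, event_type in events:
        # Every timestamp falls into exactly one window, identified by its start.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return dict(counts)


# Made-up sample data: five events spread over two 10-second windows.
events = [(1, "click"), (4, "click"), (9, "error"), (12, "click"), (19, "error")]
result = tumbling_window_counts(events, window_seconds=10)
print(result)
```

A streaming service would run this kind of aggregation continuously over incoming data and at far larger scale, rather than over a list held in memory.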

Data Storage Services  

  • Table Storage: Simple name/value storage that offers high performance at very low cost.
  • DocumentDB: NoSQL document database offering.  Store data as self-describing JSON structures.
  • Blob Storage: Storage for unstructured documents, files, etc.  Blob Storage also sits underneath other services, storing virtual machine volumes, HDInsight volumes, etc.
  • Azure SQL: Traditional SQL Server running as a service.
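
The difference between name/value storage and self-describing JSON documents is easier to see side by side. The sketch below uses only Python's standard `json` module and makes no calls to any Azure service. `PartitionKey` and `RowKey` are the keys Table Storage uses to address an entity; every other field name and value here, and the helper `is_flat`, is made up for illustration.

```python
import json

# Table-style entity: a flat set of name/value pairs addressed by two keys.
table_entity = {
    "PartitionKey": "device-42",
    "RowKey": "2014-11-01T12:00:00Z",
    "temperature": 21.5,
    "status": "ok",
}

# Document-style record: nested objects and arrays describe their own structure.
document = {
    "id": "device-42",
    "location": {"city": "Seattle", "floor": 3},
    "readings": [
        {"at": "2014-11-01T12:00:00Z", "temperature": 21.5},
        {"at": "2014-11-01T12:05:00Z", "temperature": 21.7},
    ],
}


def is_flat(record):
    """True when every value is a scalar, i.e. the record is plain name/value pairs."""
    return all(not isinstance(value, (dict, list)) for value in record.values())


# The document survives a JSON round trip with its nesting intact.
serialized = json.dumps(document)
restored = json.loads(serialized)

print(is_flat(table_entity), is_flat(document))
```

Flat entities suit cheap, high-volume lookups by key, while nested documents keep related data together without a fixed schema.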

The number of available services is growing: some have only just been announced, while others have been production-ready for a while.  Each of these services is PaaS-based, i.e. you buy an automatically scaling service that is managed entirely by Microsoft and pay for it on a per-use basis.  Imagine the number of virtual machines you would need to manage, patch and keep highly available to replicate these services, which you can now scale up in minutes using the Azure portal!