In the big data world, the latest buzzword is the “data lake”.
In the industry, the concept of a data lake is relatively new. It is an enterprise-wide repository where every type of data is collected in a single place before any formal requirements or schema are defined. This lets organizations keep all data without discrimination, regardless of its size, structure, or how fast it is ingested. They can then use Hadoop or advanced analytics to find patterns in the data. A data lake can also serve as a lower-cost staging area for data preparation before curated data is moved into a data warehouse.
Essentially, the concept is a massive (petabyte-scale) repository of raw files flowing in from various sources such as web analytics, the Internet of Things, and so on.
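The "store raw now, apply schema later" idea can be sketched in plain Python. In this toy example, files from two hypothetical sources land in the lake untouched, and a schema is imposed only when an analysis reads them. The directory layout, file names, and reader functions are all illustrative assumptions, not any particular product's API.

```python
import csv
import json
import os
import tempfile

# A toy "data lake": raw files are dumped as-is, with no upfront schema.
# (The directory and file names here are purely illustrative.)
lake = tempfile.mkdtemp(prefix="data_lake_")

# Ingest: raw data from two hypothetical sources is written without
# transformation or schema validation.
with open(os.path.join(lake, "web_clicks.json"), "w") as f:
    json.dump([{"user": "a", "page": "/home"},
               {"user": "b", "page": "/buy"}], f)
with open(os.path.join(lake, "sensor_readings.csv"), "w") as f:
    f.write("device,temp_c\nd1,21.5\nd2,19.0\n")

# Schema-on-read: structure is imposed only when an analysis needs it,
# and each analysis can interpret the same raw files differently.
def read_clicks(path):
    with open(path) as f:
        return json.load(f)

def read_sensors(path):
    with open(path) as f:
        return [{"device": row["device"], "temp_c": float(row["temp_c"])}
                for row in csv.DictReader(f)]

clicks = read_clicks(os.path.join(lake, "web_clicks.json"))
sensors = read_sensors(os.path.join(lake, "sensor_readings.csv"))
print(len(clicks), "click events;", sensors[0]["temp_c"], "°C from", sensors[0]["device"])
```

The point of the sketch is the ordering: ingestion happens before any schema exists, and typed structure appears only in the read functions, which is what distinguishes a lake from a schema-on-write warehouse.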
Microsoft announced a new Azure Data Lake service that builds on its existing Hadoop investments to provide cloud-based storage and analytics at massive scale.
The service is in private preview, and interested developers can sign up here.