January 28, 2016 Mark Ginnebaugh

Choosing a Data Lake Storage System

Data Lake Storage System: Which One Should You Use?

When you create a Data Lake, one of the most often overlooked questions is “What storage technology should back the lake?” Most companies simply go with whatever tech stack they already know or are being sold. In reality, the Data Lake storage system should be chosen by asking the same questions you would ask when building out any other piece of the system:

1. Does the system cover all requirements and SLAs that are currently known?
2. Can the system be easily expanded if more functionality (or space) is needed?
3. Is the system in line with budgetary and engineering talent constraints?

Once these questions have been reviewed and answered, the selection of a storage technology can begin.

There are five widely accepted storage systems in use for Data Lakes. Each has both pros and cons as the basis for a lake.

Data Storage System Pros and Cons

Type of System	Pro	Con
Hadoop Based System	Easily expandable and cheaper storage	Slower data retrieval times
Non-Hadoop Based Storage + Hadoop / non-Hadoop Compute, e.g. S3 + Hive / Spark	Decouples storage and compute, optimized for cloud platforms	More difficult to implement on-prem
Massively Parallel Processing System (MPP), e.g. HP Vertica or IBM Netezza	Fast record retrieval and ease of setup	High cost
NoSQL System (Cassandra, HBase)	Easily expandable and fast	Tech community less familiar with NoSQL systems
SQL Database (SQL Server, Oracle, MySQL)	Well-defined technology	Cannot handle large amounts of data without high cost
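
As a rough illustration of how the table can feed a first-pass screening, here is a minimal Python sketch. It assumes you record each row as a small record and rule out options whose listed con matches a constraint you cannot accept. The `StorageOption` class, the `shortlist` function, and the keyword matching are all hypothetical helpers invented for this example; they are not part of any product or of the DesignMind pattern, and a real evaluation would weigh requirements, SLAs, budget, and engineering talent in far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    """One row of the pros and cons table (hypothetical helper type)."""
    name: str
    pro: str
    con: str


# Rows paraphrased from the pros and cons table above.
OPTIONS = [
    StorageOption("Hadoop-based system",
                  "Easily expandable and cheaper storage",
                  "Slower data retrieval times"),
    StorageOption("Non-Hadoop storage + Hadoop / non-Hadoop compute",
                  "Decouples storage and compute, optimized for cloud platforms",
                  "More difficult to implement on-prem"),
    StorageOption("MPP system",
                  "Fast record retrieval and ease of setup",
                  "High cost"),
    StorageOption("NoSQL system",
                  "Easily expandable and fast",
                  "Tech community less familiar with NoSQL systems"),
    StorageOption("SQL database",
                  "Well-defined technology",
                  "Cannot handle large amounts of data without high cost"),
]


def shortlist(options, deal_breakers):
    """Keep options whose con mentions none of the deal-breaker keywords.

    This is a crude, case-insensitive substring match, meant only as a
    starting point for discussion, not as a decision procedure.
    """
    lowered = [keyword.lower() for keyword in deal_breakers]
    return [
        option for option in options
        if not any(keyword in option.con.lower() for keyword in lowered)
    ]


if __name__ == "__main__":
    # Example: the budget is tight, so anything flagged for cost is dropped.
    for option in shortlist(OPTIONS, ["cost"]):
        print(f"{option.name}: {option.pro} (con: {option.con})")
```

With `["cost"]` as the only deal breaker, the MPP and SQL database rows are filtered out and the other three remain. The point of keeping the table as data is that the same list can be re-screened as requirements change.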

At DesignMind, we have developed a proprietary pattern that not only ingests large amounts of data, but also:

Makes data available to users at all levels of the system
Allows data to be accessed in multiple formats
Simplifies schema evolution management

Read more in our white paper, “Data Lake Storage Systems That Work”. Questions? Contact us and we’ll get back to you promptly.

