By now, there’s no need to point out how much data is generated daily by just about every business there is. But despite this obvious reality, most businesses don’t really use that data beyond its initial purpose. By that, I mean your email is only useful ‘in the instant’. Once you’ve completed the task, it’s forgotten. Today’s sales figures might be rolled into a weekly or monthly report, but after that, they’re history (except maybe again at year-end). The logs and reports generated by monitoring equipment often do nothing unless they throw up an anomaly.
Isn’t it about time you did …something… with all that data? After all, ‘data is the new oil’, and contained in it could be the seeds of innovation or, dare I say it, revolution.
The emergence of the concept of the data lake, coupled with a low-cost way of creating one thanks to Microsoft Azure, answers that question.
First, what is a data lake?
It is a storage repository which holds ‘raw’ data in its native format. It is literally a reservoir into which everything is fed just as it is (email, sales figures, production data, log files, click-streams, sensor data, whatever) and held there until it is processed. It differs from the more familiar concept of a data warehouse in that it imposes no structure on the data going in, just like water in a lake, whereas in the data warehouse, structures like tables, rows and columns order information before it is stored.
(Importantly, however, within the data lake there will also be structured data which has come from more traditional data sources, like SQL databases.)
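To make the ‘raw now, structure later’ idea a little more concrete, here is a minimal sketch in Python. It is not Azure code and makes no claims about how Azure Data Lake works internally: a temporary local folder stands in for the lake, and every file name, sample record and function name (ingest, read_sales_total, count_error_events, build_demo_lake) is made up for illustration. The only point is that files go in exactly as they arrive, and structure is applied when somebody reads them.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def ingest(lake: Path, name: str, raw: bytes) -> Path:
    """Store the bytes exactly as received: no parsing, no schema."""
    target = lake / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(raw)
    return target


def read_sales_total(lake: Path) -> float:
    """Apply structure at read time: parse the CSV and sum one column."""
    text = (lake / "sales/daily.csv").read_text(encoding="utf-8")
    rows = csv.DictReader(io.StringIO(text))
    return sum(float(row["amount"]) for row in rows)


def count_error_events(lake: Path) -> int:
    """A different reader, a different structure: one JSON object per line."""
    text = (lake / "logs/sensor.jsonl").read_text(encoding="utf-8")
    events = [json.loads(line) for line in text.splitlines() if line.strip()]
    return sum(1 for event in events if event.get("level") == "error")


def build_demo_lake() -> Path:
    """Create a throwaway folder standing in for the lake (hypothetical data)."""
    lake = Path(tempfile.mkdtemp(prefix="demo_lake_"))
    ingest(lake, "sales/daily.csv", b"item,amount\nwidget,19.50\ngadget,5.25\n")
    ingest(
        lake,
        "logs/sensor.jsonl",
        b'{"level": "info", "temp": 21}\n{"level": "error", "temp": 95}\n',
    )
    ingest(lake, "email/msg-001.txt", b"Subject: Q1 forecast\n\nNumbers attached.\n")
    return lake


if __name__ == "__main__":
    demo = build_demo_lake()
    print("Sales total:", read_sales_total(demo))
    print("Error events:", count_error_events(demo))
```

Notice that the email file is never parsed by anything here. It simply sits in the lake until someone has a question to ask of it, which is exactly the situation described above.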
Secondly, what’s the use of that?
That brings us back to the first paragraph. Data is valuable, but most companies aren’t realising the value from it. The data lake is the first step to doing something with company-wide data, by gathering it in one place where it can be used.
But, I hear you say, what’s the point of simply smashing it all together in the lake? Good question.
The point is that the data lake is the foundation. It sets the scene for analysis which can then deliver valuable insights to guide business decisions (in due course).
When you sign up to give it a try with Microsoft Azure, you also get multiple tools to start doing something with the data. You don’t just get the repository, you get the stuff which does the data sorting and analysis too.
At this point, it’s probably worth pointing out that ‘doing something’ is likely to be an experimental process. Working with your solution provider, you can start establishing a baseline. You can explore what’s in the data and see what processing gets you, and how.
These are the first steps you’re taking on the path towards becoming a data-driven business.
There should also be a warning: data lakes can become so-called ‘data swamps’ (hard to swim in a real swamp, and hard to get anything useful out of a data swamp) where data goes to die. Some have even called neglected data lakes ‘data graveyards’, because they become like an attic or a basement into which old junk is cast, never to be seen or used again.
So even though Microsoft Azure’s on-demand data lake makes it cheap and easy to get going, it is a very good idea to have some sense of purpose. Again, look for guidance and insight from your solution provider: a good partner should know your business and have some idea of a) whether a data lake is a good idea for your business, b) what can be achieved by a data lake and c) how to get started and make a concerted drive towards creating value.
Keen to know more? We’d love to discuss Microsoft’s Azure Data Lake with a view to seeing what this exciting new technology can do for your business. Get in touch!