It’s time to bring the little Elephant (Hadoop) into your Data Ecosystems

Posted by Anoop Abraham, Senior Business Intelligence Consultant on 21 October 2014


Over the years business intelligence has grown in leaps and bounds. In the early 2000s we built data warehouses using a structured and rigid approach, with well-defined data modelling and source-to-target mappings. We invested heavily in the infrastructure to hold those massive data warehouses. That was the era when database appliances started to dominate the data warehousing arena. A fortune was spent on setting up data centres, database appliances and other hardware. Most of that spending went on infrastructure and on the heavy lifting of data from various source systems. Those were the days for professionals who were good at taming database appliances, UNIX systems and networking, and consulting organisations were keen to hunt for that species. At one point we wondered whether data warehousing was going to be a monopoly, meant only for the big boys like Oracle, Teradata, IBM and SAP. Data warehouses in those days were built for big, established companies and were really costly, and the majority of companies were sceptical about the ROI. Those were the days of analytical and MIS reporting.

But as time passed things started to change dramatically. More and more companies joined in as they realised that data warehousing is not something that should happen in the backroom, and they began to see the hidden value inside their data. As the need arose, innovation followed and more and more players entered the market, which paved the way for an explosion of ideas and new ways to tame data. Different methodologies became available to ease data warehouse deployments.

Now we are on the verge of a massive paradigm shift in the BI arena. Business intelligence in the cloud has started to gain momentum. Gone are the days of maintaining data centres, big database appliances and other hardware. Analytics-as-a-service is going to be the future. But think about this: even though we have made dramatic advances in the BI space, as an organisation that adopts BI, and as data stewards, we must still try to answer the following questions:

  • Is the BI team spending much of its effort on the heavy lifting of data from various source systems?
  • Are there multiple repositories to stage the source data?
  • Is the ETL support team overwhelmed with support calls, with the prime reason being data load failure?
  • Is the ETL team spending too much time fixing the data load issues?
  • Is the business complaining about missing data?
  • Is the data load consuming much of your ETL window?
  • Did your 1 TB table load fail after several hours of loading, forcing you to start all over again?
  • Are you paying a premium on data storage and backup solutions?

Hadoop is here to help. Many organisations are worried about how to welcome this little elephant into their data ecosystems. With careful planning and well-thought-out data governance, it can be done. The road ahead is an explosion of data flowing in from all sorts of sources, and the current relational data warehouses will not be flexible enough to handle those complex data structures. Sooner or later organisations will have to cater for those varied data sources to stay competitive in the market. Plan now to adopt the little elephant and begin your journey towards Big Data adoption.

There is a common myth that Hadoop is going to replace the current relational data warehouses. The reality is that Hadoop is going to be a companion to them: an add-on component in the data ecosystem.

 

[Image: Hadoop (picture taken from the web)]

Hadoop is a powerful technology, but it is just one component of the big data technology landscape. It is a very cost-effective technology for staging large amounts of raw data, both structured and unstructured, which can then be refined and prepared for analytics. Hadoop can also help you avoid costly upgrades of existing proprietary databases and data warehouse appliances when their capacity is being consumed too quickly by raw, unused data and by extract-load-transform processing.

By incorporating Hadoop expertise into the data management team, organisations can better manage their ever-growing and varied data sources and also reduce their spending on data storage and management. So what are you waiting for? Let’s bring in the cute little elephant now and reap the benefits in the long run.

I hope to continue this topic in part 2, covering the steps to take in adopting Hadoop into data ecosystems. Stay tuned for more.