Every day one hears more about how Big Data ecosystem technologies are driving both incremental innovation and disruption across industry verticals – be it in exciting new cross-industry areas like the Internet of Things (IoT) or in relatively staid areas like Banking, Manufacturing and Healthcare.
Big Data platforms, powered by open source Hadoop, can economically store large volumes of structured, unstructured or semi-structured data and process it at scale, thus enabling predictive and actionable intelligence.
Corporate IT organizations in the financial industry have been tackling data challenges at scale for many years now.
Traditional sources of data in banking include:
- Customer Account data, e.g. names, demographics, linked accounts, etc.,
- Transaction Data, which captures the low-level details of every transaction (e.g. debits, credits, transfers, credit card usage),
- Wire & Payment Data,
- Trade & Position Data,
- General Ledger Data and data from other systems supporting core banking functions.
Shortly after these “systems of record” became established, enterprise data warehouse (EDW) based architectures began to proliferate, with the intention of mining the trove of real-world data that Banks possess to provide Business Intelligence (BI) capabilities across a range of use cases – Risk Reporting, Customer Behavior, Trade Lifecycle, Compliance Reporting, etc. Added to all of this, data architecture groups are responsible for maintaining an ever-growing hodgepodge of business systems for customer metrics, ad-hoc analysis and massive-scale log processing across a variety of business functions. All of the above data types have to be extensively processed before they are ready for analytic reasoning.
There is also a proliferation of data providers who now offer financial data as a product. These offerings range from Market Data (e.g. Bloomberg, Thomson Reuters) to Corporate Data to Macroeconomic Data to Credit Risk Data (e.g. from credit bureaus). Providers in this business typically construct models (e.g. credit risk models) on top of these sources and sell the models as well as the raw data to interested parties. Thus, architectures need to adapt in an agile manner to be able to scale, ingest and process these feeds so the business can react to rapidly changing conditions.
Thus, the Bank IT world was a world of silos until the Hadoop-led disruption arrived.
Where pre-Hadoop systems fall short
The key challenges with current architectures in ingesting and processing the above kinds of data:
- A high degree of data is duplicated from system to system, leading to inconsistencies at both the summary and transaction levels. Because different groups perform different risk reporting functions (e.g. Credit and Market Risk), the feeds, the ingestion pipelines and the calculators end up being duplicated as well.
- Traditional Banking algorithms cannot scale with this explosion of data, nor with the heterogeneity inherent in reporting across areas such as Risk management. For example, certain kinds of Credit Risk calculations need access to around 200 days of historical data to estimate the probability of a counterparty defaulting and to derive a statistical measure of that risk. Such calculations are highly computationally intensive; a minimal sketch of this kind of historical computation follows this list.
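To make the scale problem concrete, here is a minimal sketch of what such a historical computation might look like on a Hadoop cluster, using PySpark (an assumption on my part; any Hadoop-ecosystem engine would do). The paths, column names and the 200-day window are hypothetical illustrations, not a production risk model.

```python
# Hypothetical sketch: a 200-day trailing risk statistic per counterparty,
# computed over the full history stored cheaply in HDFS.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("historical-credit-risk").getOrCreate()

# Assumed layout: daily exposure records landed in HDFS as Parquet.
exposures = spark.read.parquet("hdfs:///data/risk/daily_exposures")

# 200-day trailing window per counterparty, ordered by business date.
w = (Window.partitionBy("counterparty_id")
           .orderBy("business_date")
           .rowsBetween(-199, 0))

stats = (exposures
         .withColumn("avg_exposure_200d", F.avg("exposure").over(w))
         .withColumn("vol_exposure_200d", F.stddev("exposure").over(w)))

# Persist the derived statistics for downstream risk reporting.
stats.write.mode("overwrite").parquet("hdfs:///data/risk/exposure_stats")
```

The point is not the particular statistic but that the full history stays queryable in place, rather than being re-extracted and duplicated by each risk group.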
Circa 2015, open source software offerings have matured immensely, with compelling functionality in terms of data processing, deployment scalability, much lower cost and support for enterprise data governance. Hadoop, which is now really a platform ecosystem of 30+ projects rather than a standalone technology, has been reimagined twice and now forms the backbone of any enterprise-grade, innovative data management project.
I hold that the catalyst for this disruption is Predictive Analytics, which provides both real-time and deeper insight across a myriad of scenarios:
- Predicting customer behavior in real time,
- Creating models of customer personas (micro and macro) to track their journeys across a Bank’s financial product offerings,
- Defining 360-degree views of a customer so as to market to them as one entity,
- Fraud detection (see the real-time scoring sketch after this list),
- Risk Data Aggregation (e.g. Volcker Rule),
- Compliance, etc.
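To illustrate the real-time side of this list, here is a minimal sketch of streaming fraud scoring with Spark Structured Streaming. The Kafka topic, broker address, field names and the simple amount-threshold rule are all hypothetical stand-ins for a real trained model.

```python
# Hypothetical sketch: flag suspicious card transactions as they arrive.
# (Running this requires the spark-sql-kafka connector on the classpath.)
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("realtime-fraud-sketch").getOrCreate()

schema = StructType([
    StructField("card_id", StringType()),
    StructField("merchant", StringType()),
    StructField("amount", DoubleType()),
])

# Assumed source: a Kafka topic carrying card transactions as JSON.
txns = (spark.readStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker:9092")
             .option("subscribe", "card-transactions")
             .load()
             .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
             .select("t.*"))

# Toy rule standing in for a trained fraud model: flag large transactions.
flagged = txns.filter(F.col("amount") > 10000.0)

# Emit alerts for the fraud operations team; console is a stand-in sink.
query = (flagged.writeStream
                .format("console")
                .outputMode("append")
                .start())
query.awaitTermination()
```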
The net result is that Hadoop is no longer an unknown term in the world of high finance.
Banks, insurance companies and securities firms that have begun to store and process huge amounts of data in Apache Hadoop have better insight into both their risks and opportunities.
So what capabilities does Hadoop add over existing RDBMS-based technology?
The answer is that with Hadoop, a vast amount of information can be stored at a much lower price point. Thus, Banks can not only generate insights using the traditional ad-hoc querying model but also build statistical models and leverage data mining techniques (classification, clustering, regression analysis, neural networks, etc.) to perform highly robust predictive modeling. Such models encompass the behavioral and real-time paradigms in addition to the traditional historical mode. A minimal clustering sketch follows.
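For instance, clustering can derive the customer personas mentioned earlier. Below is a minimal sketch using Spark MLlib’s KMeans; the input path, feature columns and cluster count are hypothetical assumptions for illustration only.

```python
# Hypothetical sketch: segmenting customers into personas with KMeans.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("customer-personas-sketch").getOrCreate()

# Assumed layout: per-customer behavioral features derived upstream.
customers = spark.read.parquet("hdfs:///data/marts/customer_features")

# Assemble the assumed feature columns into a single vector.
assembler = VectorAssembler(
    inputCols=["avg_balance", "txn_count_90d", "products_held"],
    outputCol="features")
features = assembler.transform(customers)

# Fit a small number of clusters; k=5 is an arbitrary illustrative choice.
model = KMeans(k=5, seed=42, featuresCol="features").fit(features)

# Attach the persona label back to each customer for downstream marketing.
personas = model.transform(features).withColumnRenamed("prediction", "persona")
personas.select("customer_id", "persona").show(10)
```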
However, the story of Big Data adoption in your average Bank is not all that revolutionary – it typically follows an evolutionary cycle in which a rigorous engineering approach is applied to gain small business wins before scaling up to more transformative projects.
Now, from a technology perspective, Hadoop helps IT in five major ways:
- enables more agile business and data development projects
- enables exploratory data analysis to be performed on full datasets or on samples within those datasets (a minimal sketch follows this list)
- reduces time to market for business capabilities
- helps onboard raw historical data at very short notice and at very low cost
- helps retain data for months and years at a much lower cost per TB than tape drives and other archival solutions
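As an example of the exploratory point above, the sketch below runs the same ad-hoc analysis over the full raw history and over a small sample of it for faster iteration; the path and column names are hypothetical.

```python
# Hypothetical sketch: exploratory analysis over raw history in HDFS.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("eda-sketch").getOrCreate()

# Assumed layout: years of raw transactions, kept cheaply as Parquet.
txns = spark.read.parquet("hdfs:///data/raw/transactions")

# Full-dataset aggregate: monthly volume and total by transaction type.
(txns.groupBy(F.date_format("txn_date", "yyyy-MM").alias("month"), "txn_type")
     .agg(F.count("*").alias("volume"), F.sum("amount").alias("total"))
     .orderBy("month")
     .show(24))

# The same exploration on a 1% sample, for quick interactive iteration.
sample = txns.sample(fraction=0.01, seed=7)
sample.describe("amount").show()
```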
To sum up, why should Banks look at Hadoop and Big Data?
- Realize enormous business value in areas as diverse as the defensive (Risk, Fraud and Compliance – RFC), competitive parity (e.g. a Single View of the Customer) and the offensive (Digital Transformation across their Retail Banking business)
- Drastically reduce time to market for new business projects
- Hugely improve the quality of, and access to, information and real-time analytics for customers, analysts and other stakeholders
- Greatly reduce CapEx and OpEx spend on data management projects (Big Data augments, and can even supplant, legacy investments in MPP systems, Data Warehouses, RDBMSs, etc.)
- Become an Employer of Choice for talent through vocal thought leadership in areas as diverse as Hadoop, Data Science and Machine Learning