“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay!” Sherlock Holmes – Conan Doyle’s “The Adventure of the Copper Beeches”
The first post in this three-part series described the key ways in which innovative applications of data science are changing a somewhat insular and clubby banking & financial services industry. This disruption spans both business models and organizational culture. This second post examines key & concrete use cases enabled by a ‘Data Driven’ approach in the industry. The next & final post will examine foundational Data Science tasks & techniques commonly employed to get value from data.
Big Data platforms, powered by Open Source Hadoop, can not only economically store large volumes of structured, unstructured or semi-structured data but also help process it at scale. The result is a steady supply of continuous, predictive and actionable intelligence. With the advent of Hadoop and Big Data ecosystem technologies, Bank IT (across a spectrum of business services) is now able to ingest, onboard & analyze massive quantities of data at a much lower price point.
One can thus not only generate insights using a traditional ad-hoc querying (or descriptive intelligence) model but also build advanced statistical models on the data. These advanced techniques leverage data mining tasks (like classification, clustering, regression analysis, neural networks etc) to perform robust predictive modeling. Owing to Hadoop’s natural ability to work with any kind of data, this can encompass the streaming and real-time paradigms in addition to the traditional historical (or batch) mode.
Further, Big Data helps banks capture and commingle diverse datasets that improve their analytics, while improved visualization tools aid in the exploration & monetization of that data.
Now, let’s break the above summary down into specifics.
Data in Banking –
For many years now, corporate IT organizations in the financial industry have been tackling data challenges caused by strict, silo-based approaches that inhibit data agility.
Consider some of the traditional (or INTERNAL) sources of data in banking –
- Customer Account data e.g. names, demographics, linked accounts etc
- Core Banking Data
- Transaction Data which captures the low-level details of every transaction (e.g. debit, credit, transfer, credit card usage etc)
- Wire & Payment Data
- Trade & Position Data
- General Ledger Data e.g. AP (accounts payable), AR (accounts receivable), cash management & purchasing information etc.
- Data from other systems supporting banking reporting functions.
To put this in wider perspective, almost all of the above traditional data is human-generated. However, with the advent of smart sensors and enhancements in telemetry-based devices like ATMs and POS terminals, machines are beginning to generate even more data. Every time a banking customer clicks a button on their financial provider’s website, makes a purchase using a credit card or calls her bank, a digital trail is created. Mobile apps drive an ever-growing number of interactions due to the interconnected nature of services such as banking, retail, airlines and hotels. The result is a large volume of data and metadata that is MACHINE & APP generated.
In addition to the above internal & external sources, commercially available 3rd party datasets, ranging from crop yields to car purchases to customer preference data (segmented by age or affluence categories) and social media feedback on financial & retail product usage, are now widely available for purchase. As financial services firms sign up partnerships in Retail, Government and Manufacturing, these data volumes will only grow in size & velocity. The key point is that an ever-growing number of customer-facing interfaces now allow firms to collect data in ways that were never possible before.
Where can Predictive Analytics help –
Let us now examine some of the main use cases, as depicted in the picture below –
Illustration – Data Science led disruption in Banking
Defensive Use Cases Across the Banking Spectrum (RFC) – Risk, Fraud & Security
Internal Risk & Compliance departments are increasingly turning to Data Science techniques to create & run models on aggregated risk data. Multiple types of models and algorithms are used to find patterns of fraud and anomalies in the data and to predict customer behavior. Examples include Bayesian filters, Clustering, Regression Analysis, Neural Networks etc. Data Scientists & Business Analysts can choose from MapReduce, Spark (via Java, Python, R), Storm and SAS, to name a few, to create these models. Fraud model development, testing and deployment on fresh & historical data become far more straightforward on Hadoop.
- Risk Data Aggregation and Measurement – Measure and project different kinds of banking risks (Market Risk, Credit Risk, Loan Default and Operational Risk). In Capital Markets, Data Science is applied to predicting a range of market and credit risk metrics. In Consumer Banking sectors like mortgage banking, credit cards & other financial products, it is heavily leveraged to classify products & customers into different risk categories, and to predict risk scores and risk portfolio trends across thousands of variables.
- Fraud Detection – Detect and predict institutional fraud for a range of use cases – Anti Money Laundering Compliance (AML), Know Your Customer (KYC), watchlist screening, tax evasion, Linked Entity Analysis etc. In the area of individual-level fraud (credit card fraud & mortgage fraud), predictive models constantly analyze customer spending patterns, location & travel details, employment details and social networks to detect in real time whether customer accounts are being compromised.
- Cyber Security – Analyze clickstreams, network packet capture data, weblogs, image data, telemetry data to predict security compromises & to provide advanced security analytics.
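To make the individual-level fraud idea concrete, here is a minimal Python sketch that compares a new card transaction against a customer’s historical spending profile. It assumes a hypothetical `is_suspicious` helper, a plain list of past amounts, a set of previously seen countries and an illustrative z-score threshold of 3. Production fraud models are trained on labelled data and use far more features than amount and location.

```python
from statistics import mean, stdev

def is_suspicious(history, amount, country, known_countries, z_threshold=3.0):
    """Return the reasons (if any) a transaction deviates from the customer's profile.

    history: past transaction amounts for this customer (hypothetical input).
    known_countries: countries this customer has transacted in before.
    z_threshold: illustrative cut-off; a real model would be tuned on labelled data.
    """
    reasons = []
    if len(history) >= 2:  # stdev needs at least two observations
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(amount - mu) / sigma > z_threshold:
            reasons.append("amount")
    if country not in known_countries:
        reasons.append("location")
    return reasons

history = [42.0, 55.5, 38.0, 61.2, 47.9]
print(is_suspicious(history, 49.0, "US", {"US"}))    # []
print(is_suspicious(history, 2500.0, "BR", {"US"}))  # ['amount', 'location']
```

Returning the reasons rather than a bare flag reflects how such alerts are typically routed to analysts, who need to know why a transaction was held.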
Capital Markets, Consumer Banking, Payment Systems & Wealth Management
A) Capital Markets
- Algorithmic Trading – Data Science augments trading infrastructures in several ways. It helps re-tool existing trading infrastructures so that they are more integrated, loosely coupled and efficient, by making it easier to plug in complex, quantitative algorithmic trading strategies across a range of asset classes like equities, forex, ETFs and commodities. It also helps with trade execution once Hadoop incorporates newer & faster sources of data (social media, sensor data, clickstream data) and not just the conventional sources (market data, position data, M&A data, transaction data etc). Examples include retrofitting existing trade systems to accommodate a range of mobile clients who have a vested interest in deriving analytics, and marrying tick data with market structure information to understand why certain securities dip or spike at certain points (e.g. institutional selling or equity-linked trades with derivatives).
- Trade Analytics – Trade strategy development is now a complex process in which heterogeneous data (market data, existing positions, corporate actions, social & sentiment data) is blended together to obtain insights into possible market movements, trader yield & profitability across multiple trading desks.
- Market & Trade Surveillance – An intelligent surveillance system needs to store trade data, reference data, order data and market data, as well as all of the relevant communications from disparate internal and external systems, and then correlate them appropriately. The system needs to support multiple levels of detection capability: a) configurable business rules that describe a known fraud pattern, and b) dynamic capabilities based on machine learning models (typically thought of as being more predictive) that detect complex patterns pertaining to insider trading and other market integrity compromises. Such a system also needs to parallelize model execution at scale to meet demanding latency requirements.
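The two detection levels described for surveillance can be sketched as follows. This is a toy built on stated assumptions: the order fields, the `rule_hit` thresholds and the `model_score` weights are all hypothetical, and the logistic function only stands in for a trained model. A thread pool stands in for a distributed engine. In CPython, threads will not speed up CPU-bound scoring, so a real deployment would parallelize across a cluster (e.g. with Spark) instead.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical order records: (trader, symbol, notional, minutes_before_news_event)
orders = [
    ("t1", "ACME", 5_000_000, 12),
    ("t2", "ACME", 20_000, 300),
    ("t3", "BOLT", 900_000, 20),
]

def rule_hit(order, max_minutes=30, min_notional=1_000_000):
    """Level (a): a configured business rule, i.e. a large trade shortly before news."""
    _, _, notional, minutes = order
    return notional >= min_notional and minutes <= max_minutes

def model_score(order):
    """Level (b): stand-in for a trained ML model (logistic function, made-up weights)."""
    _, _, notional, minutes = order
    z = 1.5 * math.log10(notional) - 0.02 * minutes - 8.0
    return 1 / (1 + math.exp(-z))

def evaluate(order, score_threshold=0.5):
    """Run both detection levels on one order."""
    return order[0], rule_hit(order), model_score(order) >= score_threshold

# Score orders concurrently; a cluster engine would play this role at scale.
with ThreadPoolExecutor(max_workers=4) as pool:
    for trader, by_rule, by_model in pool.map(evaluate, orders):
        if by_rule or by_model:
            print(trader, "flagged", {"rule": by_rule, "model": by_model})
```

Here `t3` is caught only by the model. This illustrates why level (b) is layered on top of level (a): rules catch known patterns, while a model can flag trades that sit just outside a hard-coded threshold.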
B) Consumer Banking & Wealth Management
Data Science has proven itself in several consumer banking applications, ranging from a single view of the customer to mapping the customer journey across multiple financial products & channels. Techniques like pattern analysis (detecting new patterns within and across datasets), marketing analysis (across channels) and recommendation analysis (across groups of products) are becoming fairly common. One can see a clear trend among early adopter consumer banking & private banking institutions toward an “Analytics first” approach to creating new business applications.
- Customer 360 & Segmentation –
Currently most Retail and Consumer Banks lack a comprehensive view of their customers. Each department has only a limited view of the customer, so offers and interactions across multiple channels are typically inconsistent. This also limits collaboration within the bank when servicing customer needs. Leveraging the ingestion and predictive capabilities of a Hadoop based platform, Banks can build a full picture of the customer across all touch points and provide a user experience that rivals that of Facebook, Twitter or Google.
- Some of the more granular business use cases that span the spectrum in Consumer Banking include –
- Improve profitability per retail or cards customer across the lifecycle by targeting at both the micro level and the macro level (customer populations). This is done by combining rich, diverse datasets (existing transaction data, interaction data, social media feeds, online visits, cross-channel data etc) and by understanding customer preferences across similar segments
- Detect customer dissatisfaction by analyzing transaction and call center data
- Cross sell and upsell opportunities across different products
- Help improve the product creation & pricing process
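At its core, a Customer 360 view merges departmental records on a shared customer key. The sketch below assumes hypothetical extracts (`cards`, `mortgages`, `call_center`) that already share a `customer_id`. In practice, matching customers across silos usually requires an entity-resolution step first.

```python
from collections import defaultdict

# Hypothetical departmental extracts, each keyed by a shared customer_id.
cards = [{"customer_id": "C1", "card_spend_90d": 4200.0}]
mortgages = [{"customer_id": "C1", "mortgage_balance": 310000.0},
             {"customer_id": "C2", "mortgage_balance": 185000.0}]
call_center = [{"customer_id": "C2", "complaints_90d": 3}]

def build_customer_360(*sources):
    """Merge per-department records into one profile per customer.

    Later sources overwrite earlier ones if they share a field name.
    """
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            profiles[record["customer_id"]].update(record)
    return dict(profiles)

views = build_customer_360(cards, mortgages, call_center)
print(views["C2"])
# {'customer_id': 'C2', 'mortgage_balance': 185000.0, 'complaints_90d': 3}
```

With the merged profile, a customer such as `C2` (a mortgage holder with recent complaints) becomes visible as a single entity. That is the starting point for the dissatisfaction and cross-sell use cases listed above.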
C) Payment Networks
The real-time data processing capabilities of Hadoop allow it to process data in a continual, bursty, streaming or micro-batching fashion. Once payment data is ingested, it must be processed within a very short time window (hundreds of milliseconds), which is typically termed near real time (NRT). When combined with predictive capabilities such as behavioral modeling & transaction profiling, Data Science can provide significant operational, time & cost savings across the following areas –
- Obtaining a single view of customer across multiple modes of payments
- Detecting payment fraud by using behavior modeling
- Understanding which payment modes are used more by which customers
- Supporting real-time analytics
- Tracking, modeling & understanding customer loyalty
- Social network and entity link analysis
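Behavioral modeling in a near-real-time payment pipeline can be sketched as micro-batches. Each payment is scored against a running per-customer profile, and the profile is then updated with that payment. The sketch assumes hypothetical names (`Profile`, `process_micro_batch`), a 200 ms per-batch budget and a z-score threshold of 3. It uses Welford’s online algorithm so the profile never needs the full payment history. A production system would run this logic inside a streaming engine such as Spark Streaming or Storm rather than in a plain Python loop.

```python
import time
from dataclasses import dataclass

@dataclass
class Profile:
    """Running per-customer spending profile (Welford's online mean/variance)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Sample standard deviation; undefined for fewer than two observations.
        return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0

def process_micro_batch(batch, profiles, z_threshold=3.0, budget_ms=200):
    """Score each payment against its customer's profile, then fold it in.

    Returns the alerts and whether the batch finished within the latency budget.
    """
    start = time.perf_counter()
    alerts = []
    for customer_id, amount in batch:
        p = profiles.setdefault(customer_id, Profile())
        sd = p.std()
        # Require a minimal history before trusting the profile.
        if p.n >= 5 and sd > 0 and abs(amount - p.mean) / sd > z_threshold:
            alerts.append((customer_id, amount))
        p.update(amount)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return alerts, elapsed_ms <= budget_ms

profiles = {}
stream = [("C1", a) for a in (40, 52, 47, 45, 50, 49)] + [("C1", 900)]
for i in range(0, len(stream), 3):  # micro-batches of 3 payments
    alerts, on_time = process_micro_batch(stream[i:i + 3], profiles)
    print(alerts, on_time)
```

Updating the profile after scoring, rather than before, means an anomalous payment is judged against the customer’s prior behavior and not against a baseline it has already skewed.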
The road ahead –
How can leaders in the Banking industry leverage a predictive analytics based approach across each segment of the industry?
I posit that this will take place in four ways –
- Using data to create digital platforms that better engage customers, partners and employees
- Capturing & analyzing any and all data streams from both conventional and newer sources to compile a 360-degree view of the retail customer, institutional client, payment or fraud etc. This is critical to being able to market to the customer as one entity and to assess risk across that entity as well as across populations of entities
- Creating data products by breaking down data silos and other internal organizational barriers
- Using data driven insights to support a culture of continuous innovation and experimentation
The next & final post will examine specific Data Science techniques covering key algorithms and other computational approaches. We will also cover business & strategy recommendations for industry CXOs embarking on Data Science projects.