
Cybersecurity – the Killer App for Big Data (3/4)

by vamsi_cz5cgo

“Most people are starting to realize that there are only two different types of companies in the world: those that have been breached and know it and those that have been breached and don’t know it. Therefore, prevention is not sufficient and you’re going to have to invest in detection because you’re going to want to know what system has been breached as fast as humanly possible so that you can contain and remediate.” – Kevin Mitnick in “The World’s Most Famous Hacker”

The first two posts in this series on Cybersecurity focused on the strategic issues around information security and the IT response from the datacenter. This third post focuses on the new innovations being ushered in by Big Data techniques and by players in the open source space. The final post of the series will focus on the business steps that corporate boards and Executive & IT leadership need to adopt, from a governance & strategy standpoint, to protect & insulate their businesses from the constant firehose of cyber attacks.

The Corporate State of Mind with Cybersecurity

Beyond the sheer complexity of massively coordinated attacks, the Internet is driving a need for financial institutions & merchants to provide new channels of customer interaction. Companies constantly collect large amounts of consumer data across multiple touch points to paint a single picture of a customer’s journey, and they use that picture to guide each customer interaction with enriched business context. While this opens up new markets across the globe for businesses in these verticals, there is a concomitant increase in vulnerability, both in the speed of attacks and in the resources attackers can bring to bear. Hackers only need to penetrate defenses once to compromise both customer data and intellectual property.

Organizations under threat of cyber attacks broadly take a defensive approach to cybersecurity. What do I mean by that? They largely invest in technology-oriented solutions across a range of functional areas, including intrusion detection systems (IDS), firewalls, data protection products and Identity & Access Management (IAM) solutions, many of which were covered in the first blog in this series (http://www.vamsitalkstech.com/?p=1265). While all of these investments are essential and offer tremendous value in their respective security silos, it is worth noting that hacker rings and other cyber threats are constantly evolving, both in their technology and in the sophistication of their fraud patterns. Thanks to Cloud Computing and easy access to vast amounts of compute & storage, the technological sophistication of these bad actors is only growing. They are also increasingly well funded, in some cases by rogue governments across the globe. In addition, the cyberattacker of 2016 leverages the Dark Web for tools ranging from the latest malware to network intrusion kits that can bypass the strongest corporate firewall.

In addition, whole new kinds of cyber attacks are emerging in industry verticals like financial services. For instance, as banks continue to innovate to meet consumer demand in areas ranging from ATMs to modern point of sale (PoS) terminals to Internet Banking, they face newer and more sophisticated threats. These include Distributed Denial of Service (DDoS) attacks, Corporate Account Takeover (CATO) attacks and ATM cash-outs, as discussed in the first blog in this series. The common theme across these attacks is the exponentially growing volume of network traffic and business records that must now be handled, produced by a range of actors across the industry: consumers, IoT-enabled devices, and telemetry devices like ATMs and PoS terminals. The data deluge across industries is only too well known thanks to the media. Digitization of consumer interactions, mobile technology & the Internet of Things (IoT) are all driving consumer demand for enterprise applications that are highly responsive yet do not compromise the privacy and security of sensitive data.

Enter the SOC 

To provide an integrated approach across the above security platforms & toolsets, enterprises have begun investing in SOC (Security Operations Center) platforms. The SOC is a formalized capability designed to handle any and all security incidents across millions of endpoints. The goal is to provide corporate-wide data collection, data aggregation, threat detection, advanced analytics and workflow capabilities, all from a single point of management. SOC systems thus perform an essential function, as they deal with massive data streams constantly generated by many different systems, devices & business applications, ranging from intrusion detection systems and firewalls to antivirus tools, as discussed above. All of this data is then pulled into security information and event management (SIEM) tools, which filter, aggregate and correlate it and then provide reporting from a security alert standpoint. The typical workflow is to capture the expected (signature) behavior of endpoint systems & applications in static models expressed as business rules, and then flag any out-of-band behavior. A security analyst then determines whether an alert represents a specific threat or is just harmless noise. For example, a credit card usage event from a known bad IP address, or erroneous application behavior that could signify a malware compromise, are the kinds of things SOC systems are tailored to detect.
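To make the rule-based workflow concrete, here is a minimal sketch in Python of the kind of static correlation a SIEM applies. The event fields, the watchlist address and the thresholds are all hypothetical; real SIEM products express rules in their own configuration languages rather than in code like this.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical event shape; real SIEM schemas vary by vendor.
    source_ip: str
    event_type: str  # e.g. "card_auth" or "login_failed"
    timestamp: int   # seconds since the epoch

# Static rule configuration (illustrative values only).
KNOWN_BAD_IPS = {"203.0.113.7"}  # reserved documentation address used as a stand-in
FAILED_LOGIN_THRESHOLD = 5       # failures from one source within the window
WINDOW_SECONDS = 60

def correlate(events):
    """Apply two static rules; return a list of (rule_name, source_ip) alerts."""
    alerts = []
    failures = defaultdict(deque)  # source_ip -> timestamps of recent failures
    for e in sorted(events, key=lambda ev: ev.timestamp):
        # Rule 1: a card authorization from a watchlisted address.
        if e.event_type == "card_auth" and e.source_ip in KNOWN_BAD_IPS:
            alerts.append(("card_auth_from_bad_ip", e.source_ip))
        # Rule 2: too many failed logins from one source in a sliding window.
        elif e.event_type == "login_failed":
            window = failures[e.source_ip]
            window.append(e.timestamp)
            while window[0] <= e.timestamp - WINDOW_SECONDS:
                window.popleft()  # evict failures older than the window
            # Fire once, at the moment the count reaches the threshold.
            if len(window) == FAILED_LOGIN_THRESHOLD:
                alerts.append(("brute_force_suspected", e.source_ip))
    return alerts
```

For example, one card authorization from 203.0.113.7 followed by five failed logins from 198.51.100.2 within a few seconds produces one alert for each rule. Rules like these are cheap and explainable, but they only catch what someone thought to write down in advance.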

SOC systems have proved highly effective across a range of use cases and, more importantly, offer a unified place to aggregate security-related data and to perform analytics on it. The effectiveness of this compared to older approaches cannot be overstated.

The Malicious Insider Threat 

One of the biggest limitations of the classical signature-based approach to detecting cyber threats is that it cannot tackle the growing threat from insiders. As we have seen from news headlines, insiders are, more often than not, behind a variety of data breaches. Their actions range from pure neglect or error (e.g. failing to patch sensitive systems or update virus definitions, or clicking on phishing emails) to malicious actions driven by motivations ranging from data theft to a desire to hurt the organization over some grievance. CISOs (Chief Information Security Officers) must therefore take an active approach to mitigating insider threats, just as they must for many external threats. SOC systems are particularly unsuited to detecting insider threats, so CISOs are being forced to adopt data-oriented tools and techniques that glean patterns in how insiders use IT systems and reveal whether any of that usage is harmful.
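One data-oriented technique of this kind is to baseline each insider against their own history. The sketch below is an illustration under stated assumptions, not a feature of any particular product: it keeps a per-user running mean and standard deviation (Welford's online algorithm) of a hypothetical daily metric such as megabytes downloaded, and flags values more than three standard deviations above that user's norm. The class names, the threshold and the minimum history length are all assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance updated one value at a time."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self) -> float:
        # Sample standard deviation; 0.0 until there are two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

class InsiderBaseline:
    """Flags a user's activity that deviates sharply from that user's own history."""

    def __init__(self, z_threshold: float = 3.0, min_history: int = 7):
        self.z_threshold = z_threshold  # hypothetical cut-off
        self.min_history = min_history  # observations needed before judging
        self.stats: dict[str, RunningStats] = {}

    def observe(self, user: str, value: float) -> bool:
        """Return True if value is suspicious for this user, then learn from it."""
        s = self.stats.setdefault(user, RunningStats())
        suspicious = False
        if s.n >= self.min_history and s.stddev() > 0:
            z = (value - s.mean) / s.stddev()
            suspicious = z > self.z_threshold
        # This sketch folds every value into the baseline, including flagged
        # ones; a real deployment would likely exclude confirmed incidents.
        s.update(value)
        return suspicious
```

After a week of downloads around 100 MB a day, a 104 MB day passes quietly while a 900 MB day is flagged. The point of the design is that "normal" is learned per user rather than written as a global rule.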

Other limitations of the SOC approach also need to be catalogued:

  • A high rate of false positives, some of which may nonetheless signify an actual compromise
  • The amount of time the SOC analyst spends on triage
  • A reliance on matching existing bad-behavior signature patterns, which does not protect against new (or zero-day) exploits
  • An inability to address threats to business applications that originate from partners
  • A lack of learning capabilities, even though attack patterns and threats evolve constantly

In the face of such challenges, the security architecture of the future needs to be re-examined. I propose this can be achieved in four strategic ways from a technical perspective.

  1. Leverage real time analytics as the foundation of any security strategy. This is only possible with data analytics that deliver real time analysis at extremely low latencies. It requires the ability to constantly ingest and analyze data from network devices, malware sources, and identity and authentication systems, and to use machine learning and data science for threat classification, as opposed to strict rules-based approaches, when analyzing relationships between data.
  2. Natively integrate these analytics into applications in a way that promotes automatic learning of threat patterns.
  3. Promote ways to enable business processes to learn from these incidents.
  4. Promote an open source ecosystem so that every enterprise that adopts these platforms can automatically learn from & enhance their analytics, joining forces against the cyberattacker communities.
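Strategies 2 and 3 amount to closing the loop between analyst triage and the detection model. As one possible illustration (the post does not prescribe a specific algorithm), the sketch below uses online logistic regression trained by stochastic gradient descent: each alert's features are scored, and each analyst verdict (1 for a real threat, 0 for noise) nudges the weights. The feature names mentioned below, such as `off_hours` and `new_device`, are hypothetical.

```python
import math

class OnlineThreatScorer:
    """Logistic regression trained one labelled alert at a time via SGD.

    Features are a dict of name -> float; labels come from analyst triage
    (1 = confirmed threat, 0 = harmless noise).
    """

    def __init__(self, learning_rate: float = 0.1):
        self.lr = learning_rate
        self.weights: dict[str, float] = {}
        self.bias = 0.0

    def score(self, features: dict[str, float]) -> float:
        """Return the estimated probability that the alert is a real threat."""
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in features.items())
        z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def learn(self, features: dict[str, float], label: int) -> None:
        """Apply one SGD step on log loss using an analyst's verdict."""
        error = self.score(features) - label  # gradient of log loss w.r.t. z
        self.bias -= self.lr * error
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v
```

In use, each analyst verdict becomes one call to `learn`, so scores for similar future alerts shift without anyone rewriting a rule. Unseen feature names simply start with a weight of zero, which lets new telemetry be added incrementally.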

Enter Big Data –

So what can Big Data and the Hadoop ecosystem bring to this complex world of security analytics as applied to the above strategies? The answer is “All of the above and much more.” By leveraging a Big Data approach to supplement existing investments, cyber defense can move into attack mode as well.

As depicted in the illustration below, Big Data provides a data platform that can ingest massive amounts of internal & external data and, on top of it, provide machine learning, text mining & ontology modeling for advanced cyber detection, prediction and prevention. According to IDC, the big data and analytics market was forecast to reach $125 billion worldwide in 2015 [3]. Clearly, an increasing number of cyber security platforms will leverage big data storage and analytics going forward. Cybersecurity solutions such as network security, malware detection and endpoint security are already beginning to feed data into Big Data analytic platforms.


                           Illustration – Big Data Analytics (Adapted & Redrawn from IBM)

Big Data can provide Cybersecurity capabilities in four key areas –

  1. The ability to ingest application data: As players in key verticals expand the definition of Cybersecurity to encompass the insider threat, call data records (CDR), chat messages, business process data, social media activity, emails etc. all become rich sources for threat detection that must be ingested and processed for consumption by SOC consoles.
  2. The ability to capture, store & process high volumes of any kind of security data & security telemetry at scale: Security data (e.g. threat intelligence, geolocation, watchlist data, clickstreams etc.) is constantly produced in every enterprise, and all of it can be pushed to a Hadoop HDFS-backed data lake.
  3. Universal processing of the data (transformation, enrichment, forensic analysis): Such processing combines, but is not limited to, business rules, machine learning and text mining, providing a way to model security threats as well as to perform detection & deterrence processing.
  4. Long term information storage: In verticals like financial services, information security is expanding to include not just classic security data but also AML (Anti Money Laundering) & Credit Card Fraud data, both of which are highly application driven.
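Areas 2 and 3 above (capture at scale, then transformation and enrichment) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the space-delimited firewall log format, the lookup tables (which use reserved documentation IP ranges and made-up country codes), and the choice of newline-delimited JSON as the record format landed in the data lake.

```python
import ipaddress
import json

# Hypothetical enrichment tables; in practice these would come from a GeoIP
# database, an asset inventory or a threat-intelligence feed.
GEO_BY_NETWORK = {
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("203.0.113.0/24"): "RO",
}
WATCHLIST = {"203.0.113.7"}

def parse(raw: str) -> dict:
    """Transform a hypothetical line '<epoch> <src_ip> <dst_ip> <dst_port> <action>'."""
    ts, src, dst, port, action = raw.split()
    return {"timestamp": int(ts), "src_ip": src, "dst_ip": dst,
            "dst_port": int(port), "action": action}

def enrich(record: dict) -> dict:
    """Add geolocation and watchlist context to a parsed record."""
    src = ipaddress.ip_address(record["src_ip"])
    # First matching network wins; None when no entry covers the address.
    record["src_country"] = next(
        (cc for net, cc in GEO_BY_NETWORK.items() if src in net), None)
    record["on_watchlist"] = record["src_ip"] in WATCHLIST
    return record

def to_data_lake_line(raw: str) -> str:
    """Parse, enrich and serialize one event as a single JSON line."""
    return json.dumps(enrich(parse(raw)), sort_keys=True)
```

Keeping parse, enrich and serialize as separate steps mirrors the idea that new enrichment sources can be bolted on without touching ingestion.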

With all of the above in mind, I would like to introduce the leading open source cybersecurity project built on Hadoop technology – Apache Metron.

Apache Metron:

Apache Metron was originally created by James Sirota at Cisco Systems [4]. Sirota is now Chief Data Scientist at Hortonworks, and his team has been driving increased capabilities into Metron from both a feature and a community collaboration standpoint. Metron has been open sourced and has entered the Apache Incubator at the Apache Software Foundation. Expect to see increased maturity, feature richness and stability around the project as the vibrant open source community increasingly leverages Metron across multiple cybersecurity initiatives.

At a minimum, when combined with a data lake, Metron integrates a variety of open source big data technologies (e.g. Apache Spark, Storm, Flume, HDFS etc.) to offer a centralized tool for security monitoring and analysis. It provides log aggregation, full packet capture indexing and storage, advanced behavioral analytics and data enrichment, while applying the most current threat-intelligence information to security telemetry within a single platform, as depicted in the illustration below.


                           Illustration: Apache Metron – Key Capabilities (source – Hortonworks)


While a deep dive into Metron is a topic for a follow-up post, as the diagram above indicates, the Metron framework provides four key capabilities [3]:

    1. Security Data Lake / Vault – It provides a cost-effective way to store enriched telemetry data for long periods of time. This data lake provides the corpus of data required for the feature engineering that powers discovery analytics, as well as a mechanism to search and query it for operational analytics.
    2. Pluggable Framework – It provides not only a rich set of parsers for common security data sources (PCAP, NetFlow, Bro, Snort, FireEye, Sourcefire) but also a pluggable framework to add custom parsers for new data sources, add new enrichment services that give the raw streaming data more context, plug in extensions for threat intel feeds, and customize the security dashboards.
    3. Security Application – Metron provides standard SIEM-like capabilities (alerting, a threat intel framework, agents to ingest data sources) but also packet replay utilities, an evidence store and hunting services commonly used by SOC analysts.
    4. Threat Intelligence Platform – Metron will provide advanced defense techniques consisting of a class of anomaly detection and machine learning algorithms that can be applied in real time as events stream in.
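The pluggable-parser idea in item 2 is essentially a registry keyed by source type. Metron's own parsers are written in Java against its own interfaces; the Python below is only a language-neutral illustration of the plug-in pattern, and every name in it (`PARSERS`, `parser`, `parse_message`, and both log formats) is hypothetical.

```python
import re
from typing import Callable

# Registry mapping a source type to the function that parses its raw messages.
PARSERS: dict[str, Callable[[str], dict]] = {}

def parser(source_type: str):
    """Decorator that plugs a parser into the registry under a source type."""
    def register(fn: Callable[[str], dict]) -> Callable[[str], dict]:
        PARSERS[source_type] = fn
        return fn
    return register

@parser("csv_flow")
def parse_csv_flow(raw: str) -> dict:
    # Hypothetical flattened flow record: "src,dst,bytes".
    src, dst, nbytes = raw.split(",")
    return {"src_ip": src, "dst_ip": dst, "bytes": int(nbytes)}

@parser("custom_app_log")
def parse_custom_app_log(raw: str) -> dict:
    # Hypothetical in-house format: "key=value key=value ...".
    return dict(re.findall(r"(\w+)=(\S+)", raw))

def parse_message(source_type: str, raw: str) -> dict:
    """Dispatch to the registered parser and tag the record with its source."""
    try:
        fn = PARSERS[source_type]
    except KeyError:
        raise ValueError(f"no parser registered for {source_type!r}") from None
    record = fn(raw)
    record["source_type"] = source_type  # lets downstream enrichment route on it
    return record
```

Adding support for a new data source then means writing one decorated function; the dispatch code and everything downstream of it stay unchanged.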

Conclusion

We have covered a lot of ground in this post to reiterate the fact that big data is a natural fit for powerful security analytics. The Hadoop ecosystem & projects like Metron combine to provide a scalable platform for security analytics that can effectively enable rapid detection and rapid response for advanced security threats. It is heartening that an industry leader like Hortonworks is not only recognizing the grave business threat that Cybersecurity presents but is also driving an open source ecosystem around such needs.

The final post of the series will focus on the business recommendations that Corporate boards, Executive (CISOs, CXOs), Business & IT leadership need to adopt from both a governance & strategy standpoint to protect & insulate their businesses from the constant firehose of cyber attacks.

References –

  1. SANS SOC Reference – https://www.sans.org/reading-room/whitepapers/analyst/building-world-class-security-operations-center-roadmap-35907
  2. Hortonworks blog by James Sirota  – http://hortonworks.com/blog/leveraging-big-data-for-security-analytics/
  3. Metron Explained – https://community.hortonworks.com/articles/26050/apache-metron-explained.html
