Going through HOOPs pt.1
Time flies! It's now been two months since I started at HOOP, and it's striking that whenever I meet customers, they tell me how difficult it is to get hold of their data. Our primary mission at HOOP is to help organisations build and promote a data culture that enables resilient Security Operations.
With that in mind, I wanted to write about this effort in a three-part blog series introducing our vision for modern, data-centric CyberOps. Today I'm delighted to share Part 1: the data engineering problem.
By now I feel like most professionals in our sector understand the commodity that is data. It's sort of everywhere! The internet and everything modern and digital is built on the power of data - especially metadata. "Meta" comes from the Greek word "μετά", meaning "after" or "beyond", and denotes that metadata is what surrounds the "core" data or information. Typically, metadata is what proves most useful in an investigation - as security folks will always tell you.
However, modern security teams still struggle to exploit the power of data, because our own industry makes it hard and convoluted. We lack standardisation and proper integration through open frameworks - great efforts have been made in recent years and more will come, I'm sure - and although we are not here to discuss the reasons behind that, we should state plainly that it's not right!
I think this leads us to three key questions. What can we do about it? What is particularly hard? And can we actually do something that puts us on a trajectory to success?
Well, the short answer is: of course. But as always, I should caveat that by saying, "it depends!"
The problem starts at the source: logging. Logging is great, but when you try to correlate multiple sources you quickly find that logging formats, although they might feel similar to us, are severely different to a machine that only understands zeroes and ones. That is why normalisation is key. Normalisation is when fields from different sources are translated - or mapped - to a common name based on the type of information they carry: say, src_ip and sourceip both become sourceaddress. As an example, OCSF, the Open Cybersecurity Schema Framework, is an effort from Splunk and AWS (two of our key Partners) to create a universal security event format; other examples are CIM and CEF, for those who are familiar. The paper(link) is a fantastic read and I encourage everyone to read it and, most importantly, share it! Now everyone is speaking the same language. Job done, right?
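To make the idea concrete, here is a minimal sketch of that kind of field mapping in Python. The source field names and the target names are illustrative only, not an official OCSF mapping:

```python
# Minimal sketch of field normalisation: the source field names and the
# target schema below are illustrative, not an official OCSF mapping.

FIELD_MAP = {
    "src_ip": "sourceaddress",
    "sourceip": "sourceaddress",
    "dst_ip": "destinationaddress",
    "destip": "destinationaddress",
    "user_name": "user",
    "username": "user",
}

def normalise(event: dict) -> dict:
    """Rename known fields to the common schema; keep unknown fields as-is."""
    return {FIELD_MAP.get(key, key): value for key, value in event.items()}

# Two events from different sources end up speaking the same language.
firewall_event = {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "action": "allow"}
proxy_event = {"sourceip": "10.0.0.5", "username": "alice", "action": "GET"}

print(normalise(firewall_event))  # {'sourceaddress': '10.0.0.5', 'destinationaddress': '203.0.113.7', 'action': 'allow'}
print(normalise(proxy_event))     # {'sourceaddress': '10.0.0.5', 'user': 'alice', 'action': 'GET'}
```

In practice this mapping lives in your SIEM or ingestion layer rather than in ad-hoc scripts, but the principle is the same: agree on one name per type of information.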
Well... it's not easy to do, it takes time, and that means it often ends up happening late or, sometimes, never. So teams, despite having the data, are not properly utilising it. A data strategy and some help can turn this around and make the life of the security team so much better.
But also, many customers are asking us: what sources should I log? Where should I start? That has prompted us to write another blog post, where we will look at the latest investigation reports to map out the "hotter" data sources based on MITRE ATT&CK techniques seen in the wild. So stay tuned for that!
The next problem on the list is understanding your data source. That means understanding the type of data, the fields the source has, and how we can use them. Splunk's motto for a long time was "know your data", and for good reason! We need to spend time with our data to understand how one source is relevant to a specific use case, or how it can be correlated with another source: how can we use Threat Intelligence, for example, to help us track early signs of compromised endpoints that could lead to a ransomware attack many weeks down the line? If we don't understand our data, we can't know what to look out for. But in order to understand our data we need to get to it, and that means shipping it to our data platform.
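As a toy illustration of that Threat Intelligence example, here is a hedged sketch that flags endpoints which have talked to a known-bad address. The indicator list, field names and events are made up for the example and assume events normalised as in the earlier snippet:

```python
# Minimal sketch of a threat-intelligence correlation. The indicators and
# events below are fabricated examples, not real intelligence.

THREAT_INTEL_IPS = {"203.0.113.7", "198.51.100.23"}  # known-bad IPs from a feed

def flag_suspect_endpoints(events: list[dict]) -> set[str]:
    """Return internal endpoints that talked to a known-bad destination."""
    suspects = set()
    for event in events:
        if event.get("destinationaddress") in THREAT_INTEL_IPS:
            suspects.add(event.get("sourceaddress", "unknown"))
    return suspects

events = [
    {"sourceaddress": "10.0.0.5", "destinationaddress": "203.0.113.7"},
    {"sourceaddress": "10.0.0.9", "destinationaddress": "192.0.2.10"},
]
print(flag_suspect_endpoints(events))  # {'10.0.0.5'} - an early sign worth investigating
```

The correlation itself is trivial; the hard part is knowing which fields to join on, which is exactly why "know your data" matters.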
Data shipping is how you collect logs from a source and forward them to your destination, be that your data platform or data store. Unfortunately, data shipping is overlooked most of the time. The, as some folks call it, "getting-data-in" problem is often neglected because we just want to have our data in one place and start digging; in other words, we can argue that data shipping is a meta problem of data analysis - but a crucial one nonetheless! In most environments we see a plethora of data shippers, from vendor-specific ones like Splunk forwarders to open-source ones such as Fluentd or Elastic Beats. That adds to the overall complexity of data ingestion, and it is why data broker solutions have seen a rise. A data broker can help you create data pipelines that standardise ingestion per data source and give you better control over ingestion per use case, e.g. security data. It can also let you parse, filter and transform your data in order to get from lower-fidelity events to higher-fidelity data.
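To show what such a pipeline stage might look like, here is a minimal sketch that parses, filters and transforms JSON log lines before they are shipped. The log format, the notion of "noise" and the tagging are assumptions made for the example, not the behaviour of any specific broker product:

```python
# Minimal sketch of a data-broker-style pipeline stage:
# parse raw lines, filter out low-value noise, transform what remains.

import json

NOISY_ACTIONS = {"heartbeat", "keepalive"}  # low-fidelity events we drop early

def parse(raw_line: str) -> dict | None:
    """Parse a JSON log line; return None if it is malformed."""
    try:
        return json.loads(raw_line)
    except json.JSONDecodeError:
        return None

def pipeline(raw_lines: list[str]) -> list[dict]:
    """Parse -> filter -> transform, yielding higher-fidelity events."""
    events = []
    for line in raw_lines:
        event = parse(line)
        if event is None or event.get("action") in NOISY_ACTIONS:
            continue  # filter: drop malformed and noisy events before shipping
        event["pipeline"] = "security"  # transform: tag the event for its use case
        events.append(event)
    return events

raw = [
    '{"action": "heartbeat", "host": "web01"}',
    '{"action": "login_failed", "host": "web01", "user": "alice"}',
    "not json at all",
]
print(pipeline(raw))  # only the login_failed event survives, tagged for security
```

Commercial and open-source brokers give you the same parse/filter/transform building blocks as managed pipelines, so you pay to ship and store only the data that earns its keep.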
So, because this blog turned out to be a bit longer than I initially thought, let's sum everything up:
- The key is: log, log, log!
- Understand what you log and why.
- Decide what sources to log - typically based on your industry and your use cases.
- Start thinking about data pipelines to minimise the time and effort it takes to get your data in.
At HOOP Cyber we want to make this an easier journey for you. We are here to help you understand your data and assist you in creating a data culture and data strategy that take your security operations to the next level. If you have any questions, please reach out to us; we are eager to talk to you!