Sunday, November 28, 2010

Stuff

Timelines
Corey Harrell recently posted to his blog about using Calc (the spreadsheet program that ships with OpenOffice) to perform timeline analysis. Corey's post is revealing and thorough; he covers regular expressions for searching and a good number of the other analysis steps you'd normally do in Excel, and it's clear from the write-up that he's actually been doing this kind of analysis. Great job, Corey.

However, there are two things that came up in the post that might be good points for discussion. First, Corey says that his exploration originated from an anonymous question asking if Calc could be used in place of Excel. I applaud Corey's efforts, but the question demonstrates how some analysts will continue to ask these kinds of questions rather than doing their own research and posting the results. This is one of those things that could just as easily have been stated as "Calc can also be used...", rather than posed as an anonymous question.

The other issue is the concern Corey expressed with respect to the spreadsheet program's ability to handle massive numbers of rows (100K or more). This is a legitimate concern (particularly with pre-Office 2007 versions of Excel, which topped out at 65,536 rows), but it also demonstrates how timelines, which are meant to some degree to be a data reduction technique, can actually become very cumbersome and even slow the analyst down, simply through the sheer volume of data. Yes, extracting data and generating a timeline produces less data than a full image (i.e., a directory listing of a 160GB hard drive is much less than 160GB...), but so much data can be added to a timeline that doing so could easily overwhelm an analyst.

Timelines and Data Reduction
One solution to this that I highly recommend is an educated, knowledgeable approach to timeline development and analysis, which is something Chris points to in his Sniper Forensics presentation. Rather than throwing everything (even the kitchen sink, if it has a time stamp) into your timeline and sorting it out from there, why not start with the goals of your examination...what you're trying to show...and work from there? Your goals will tell you what you need to look at, and what you need to demonstrate.

After all, one of the benefits and goals of timelines is...and should be...data reduction. While a timeline compiled from an abundance of data sources is indeed a reduction of data volume, is it really a reduction of data to be analyzed? In many cases, it may not be.

Consider this...let's say that while driving your car, you hear an odd noise, maybe a squeal of some kind, coming from the front left tire every time you brake. What do you do? Do you disassemble the entire car in hopes of finding something bad, or something that might be responsible for the noise? Or do you do some testing to confirm that the noise is really coming from where you think it is, and under what specific conditions?

No, I'm not saying that we throw everything out and start over. What I'm suggesting is that, rather than throwing everything into a timeline, we assess the relative value of the data before adding it. One excellent example of this is a SQL injection analysis I did...I started with the file system metadata, and added just the web server log entries that contained the pertinent SQL injection statements. There was no need to add Event Log data, nor all of the LastWrite times from all of the Registry keys in the hive files on the system. There was simply no value in doing so; in fact, it would have complicated the analysis through sheer volume. Is this an extreme example? No, I don't think that it is.
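To make that concrete, here's a rough sketch of that kind of pre-filtering in Python. This is just an illustration, not what I actually ran; the injection markers, the log format, and the file and system names are all placeholder assumptions:

# Rough sketch: keep only the web server log entries containing SQL
# injection markers, and emit them as five-field timeline lines
# (time|source|system|user|description). Patterns and names are placeholders.
import re
import time

SQLI = re.compile(r"(xp_cmdshell|union\s+select|declare\s+@|cast\()", re.I)

def filter_weblog(path, system="WEBSRV01"):
    with open(path) as f:
        for line in f:
            if not SQLI.search(line):
                continue
            # Apache-style timestamp: [28/Nov/2010:14:07:22 -0500]
            m = re.search(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})", line)
            if not m:
                continue
            t = time.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
            print("%d|WEBLOG|%s|-|%s" % (time.mktime(t), system, line.strip()))

filter_weblog("access.log")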

By reducing the data that we need to analyze, even if some of that reduction happens through the analysis itself (determining what we can remove from the timeline as "noise", etc.), we get to a point where relational analysis...that is, analyzing different systems in relation to each other...gives us a better view of what may have occurred (and when) in multi-system incidents/breaches. Remember, the five-field timeline format that I've recommended includes a field for the system name, as well as one for the user name. Using these fields, you can observe the use of domain credentials across multiple systems, for example.
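As a quick illustration (a minimal sketch, assuming a pipe-delimited five-field timeline file...the file name here is made up), a few lines of Python will show you which systems each user account appears on:

# Minimal sketch: walk a time|source|system|user|description timeline
# and list the systems each user account shows up on. "events.txt" is
# a placeholder file name.
from collections import defaultdict

systems_by_user = defaultdict(set)

with open("events.txt") as f:
    for line in f:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue
        t, source, system, user, desc = fields
        if user and user != "-":
            systems_by_user[user].add(system)

# A domain account showing up on several systems may be worth a closer look
for user in sorted(systems_by_user):
    if len(systems_by_user[user]) > 1:
        print("%s : %s" % (user, ", ".join(sorted(systems_by_user[user]))))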

Goals
I know that many folks are going to say, "What if you don't know what you're looking for?", and the answer to that is, "If you don't know what you're looking for, why are you doing analysis?" Seriously. If you don't know what the goals of your analysis are, why are you doing it?

Sometimes folks will say, "my goal is to find all the bad stuff". Well...what constitutes "bad"? What if you find nmap and Metasploit installed on a system? Is it bad? What if the user is a network engineer tasked with vulnerability scanning and assessment? Then, is the find still "bad"?

From the questions I receive, I often get the sense that there's difficulty in defining goals. Does anyone really want to "find all malware"? Really? So you want to know about every BHO and bit of spyware? On some home user systems, imagine how long it would take to locate, document, and categorize all of the malware. Usually what I have found is that "find all malware" really means "find the malware that may have been associated with a particular incident or event". Getting the person you're working with to describe that event can help you narrow down the goals of your analysis. After all, they're talking to you for a reason, right? If they could do the analysis themselves, there would be no reason to talk to you. Developing a concise set of goals allows you to define and set expectations, as well as deliver something tangible and useful.

Timeline as a Tool
Timeline analysis is an extremely useful and valuable tool...but like any other tool, it's just a tool. The actual analysis is up to the analyst. There may be times when it simply doesn't make sense to create a timeline...if that's the case, then don't bother. However, if it does make sense to develop a timeline, then do so intelligently.

Volatility Updates
Gleeda recently tweeted about documentation for Volatility being available. If you do any memory forensics, this is an excellent resource that walks you through getting all set up and running with the full capabilities of Volatility.

11 comments:

Ken Pryor said...

Regarding the line limit in Calc, if you use the fork of OpenOffice called Go-OO, its version of Calc allows up to a million lines. I found that out after unsuccessfully trying to view MFTRipper output in the regular OO Calc.
KP

Ben said...

I think there are some valid cases where you don't know what you're looking for. Murders, for example, often seem to present computer exhibits with a remit to dig up lifestyle, contacts, movements over the last month etc - a fishing expedition, in other words. Sometimes this is based on the need to tick a box saying 'computers sent for analysis' and sometimes there's intel that the victim was a heavy computer user.

Either way, fishing trips do legitimately pay off now and then, and throwing a month's worth of data into a timeline can be a huge but very fruitful task in gaining an overview of what's going on.

Anonymous said...

Great post keydet, very interesting. A couple of points:

- While it can/does generate huge quantities of review material, a timeline IS a great data reduction tool in itself.
- You've mentioned a couple of times that you don't use EnCase, but it is a great timeline reviewing tool. It has extremely powerful filters/conditions/sorting AND you can see the actual artefact while reviewing the timeline - very useful when you spot linked artefacts that you'd miss in just a pure timeline. Many analysts are more comfortable using it than command line tools!

Regards,
James

H. Carvey said...

@Ben,

I can see that, sure. However, in such cases, is it of interest to you that Java was updated, or that Restore Points were deleted?

Remember, it's all about your goals. If the goals of your exam don't call for a timeline to be created, or if they do call for a comprehensive one, then there you go. What I'm suggesting is that there may not be a need to create a timeline for every exam, and that an "everything-and-the-kitchen-sink" timeline may not always be necessary, either.

H. Carvey said...

@Ben,

I've been marinating on your comments (Thanks for commenting, BTW) for a few minutes, and it occurs to me that in such instances as you've described, the goal of the exam may actually be to produce a comprehensive "Super" timeline. If that *is* the goal, for the reasons you've listed, then so be it.

That being said, I still don't see the utility in dumping all of the key LastWrite times from all hive files. I can, however, see where embedded metadata times in documents (PDF, doc/docx, etc.) would be extremely useful.

So, that makes perfect sense...*if* it's your goal.

Adam said...

I've never taken a forensics class so take it for what it's worth... If network analysts can make sense of millions of packets, I would think someone could do the same with supertimelines. It depends on the tools they have, which aren't many because analyzing supertimelines is fairly new.

Maybe a Calc extension specifically for timeline analysis would be useful, where you could start with as little as you want, say a filter for the usual file system timeline. Then, when you find an event of interest like a suspicious prefetch file, you could click on the date/time of that record and type something like this into a persistent filter window:

Type:REG,EVT +-15s Type:index.dat -1m

That would add the registry and event log entries from 15 seconds before and after the time of interest, as well as index.dat information from 1 minute before. Then you could highlight any new events of interest, right-click and say "Keep", and go back to the filter window to delete the filter while keeping those events.

Whether something like BPFs for timelines would be useful or not, or if it's even possible to create an extension like that in Calc/Excel, I'm not sure... I'm just throwing it out there because the problem may not just be with the analyst's techniques (nano vs super timelines), but with the tools, or lack thereof.
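Just to illustrate the idea, here's a rough sketch in Python rather than Calc, assuming a pipe-delimited time|source|system|user|description timeline with epoch times (the file name, anchor time, and source names are made up):

# Rough sketch of the time-window filter described above. Everything
# here (file name, anchor time, source names) is a made-up example.
def window(path, anchor, sources, before=15, after=15):
    hits = []
    with open(path) as f:
        for line in f:
            fields = line.rstrip("\n").split("|", 4)
            if len(fields) != 5 or not fields[0].isdigit():
                continue
            if fields[1] in sources and anchor - before <= int(fields[0]) <= anchor + after:
                hits.append(fields)
    return hits

# e.g. "Type:REG,EVT +-15s" around a suspicious prefetch file at epoch 1290953242
for hit in window("timeline.txt", 1290953242, ("REG", "EVT")):
    print("|".join(hit))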

Adam

Rob Lee said...

Agree that timelines produce an inordinate amount of data, which is why they are challenging to analyze.

Filtering is essential. grep -v seems to be a very easy solution. Alternatively, I usually recommend narrowing by time window, to within an hour or two of the activity you think occurred.

I don't think anyone can claim there are any clear correct or incorrect methods, just different ways to approach it.

Typically, each analyst will use it in their own way, one that suits their analysis style. In other words, does it really matter how each analyst decides to tie their shoes?

One thought on unknown data though, via a case study: Analysts at a very large breach found a specific unknown registry key added to the system immediately after the spear phishing attack took place. When researched, the unknown key ended up being a unique marker that showed up on every system the intruder had compromised across the enterprise. It became a critical IOC (indicator of compromise) and is still used today. In this case, without full timeline analysis, the key likely would have been missed.

H. Carvey said...

Rob,

I agree wholeheartedly that there is no one way to do this sort of thing; for example, I take a minimalist approach, while others take a kitchen sink approach.

In other words, does it really matter how each analyst decides to tie their shoes?

Nope, not at all. I think what it really comes down to is, can the results be reproduced/replicated?

Good case study. Can you share the key?

Thanks for your comments!

Ben said...

A couple of months ago I'd have agreed that having a too-big timeline defeats the object and is a waste of time. With the job I was referring to above, though, it's been invaluable (although I've ditched probably 75% of the data after reviewing it).

Interestingly, I was actually asked for the first time to do a 'complete timeline' by the enquiry team, which I was against doing on first hearing the request - a classic case of 'you know what you want, but I know what you need'. It's turned out though that due to some major weirdnesses at very crucial times, it's been very useful indeed to see what was going on in excruciating detail.

As ever, I want it all - I'd like a timeline that covers everything at first view, then lets me zoom in and out at will.

I didn't use Log2Timeline for it, sadly. The job was a big one and such a rush that I didn't have the chance to learn a new toolset (sorting the Perl modules on an offline Windows machine alone was a painful experience). EnCase's timeline view is OK - not great, not terrible - and a CSV of MAC times combined with internet history, event log dumps and other juicy bits has been very instructive.

Keep up the good work anyway - I bought your Windows Forensics book the other day and it's fantastic. Essential reading for anyone in the business.

Dave Hull said...

Data reduction is essential to resolving cases quickly. There are cases where doing timeline analysis isn't necessary at all, and it shouldn't be done.

In the cases where timeline analysis is necessary, my preference is to gather as much data as I can, put it into Excel 2007 (or later), where handling a large data set is not an issue, and then filter to reduce the data set.

Working this way allows me to alter the filters in order to look at the data a different way as the investigation proceeds. Say you've got a time frame for a compromise; you can filter the data to examine events around that time. You may gather file name or Registry data that can then be used to filter the data in a different way (i.e., by file name or Registry key rather than by time frame). Doing so may give you more information about the compromise, especially in instances where time stamp manipulation has been used.
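The same idea works outside of Excel, too; here's a quick sketch in Python (the field layout, times, and names below are assumed/made up for illustration):

# Sketch: first filter by time frame, then pivot on an artifact name
# found in that window (assumes time|source|system|user|description
# lines with epoch times; all values below are made up).
def load(path):
    with open(path) as f:
        return [l.rstrip("\n").split("|", 4) for l in f if l.count("|") >= 4]

events = load("timeline.txt")

# First pass: events within the suspected compromise window
in_window = [e for e in events if e[0].isdigit() and 1290950000 <= int(e[0]) <= 1290960000]

# Second pass: drop the time constraint and pivot on a name that surfaced
# above - useful where time stamp manipulation has been used
pivot = [e for e in events if "evil.exe" in e[4].lower()]

for e in pivot:
    print("|".join(e))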

Again, completely agree with you that you've got to have a goal in mind when you start the investigation. That goal will dictate the tools and techniques that are employed during the investigation.

Tom Harper said...

re: dealing with the kitchen sink and filtering/line limitations in Calc and Excel - I have been having pretty good success using MySQL/phpMyAdmin. MySQL will import most CSV files like the ones created by the super timeline generation process. The XAMPP package makes MySQL VERY easy to leverage. The line limitation for MySQL is equal to the number of lines you can fit into a 1TB database.

Additionally, if you want to get a little more complex in your analysis, you could import the timelines from the individual artifacts (filesystem, prefetch, registry, etc.) as separate tables in the database, then create relationships between them using the time/date/"whatever" field as the basis of the relationship. Querying this relational database might sometimes produce results not considered previously, depending on how the relationships are established.
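Just to illustrate the join (sqlite3 here only to keep the sketch self-contained; the table/column names, times, and paths are all made up, and essentially the same query works in MySQL):

# Sketch of the relational idea: separate artifact tables joined on the
# time field. All names and values below are made-up examples.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE filesystem (ts INTEGER, path TEXT);
CREATE TABLE registry   (ts INTEGER, keypath TEXT);
INSERT INTO filesystem VALUES (1290953242, 'C:/Windows/Prefetch/EVIL.EXE-12345678.pf');
INSERT INTO registry   VALUES (1290953250, 'HKLM/Software/Microsoft/Windows/CurrentVersion/Run');
""")

# Registry key LastWrite times within 30 seconds of a filesystem event
rows = con.execute("""
    SELECT f.ts, f.path, r.ts, r.keypath
    FROM filesystem f
    JOIN registry r ON r.ts BETWEEN f.ts - 30 AND f.ts + 30
""").fetchall()

for row in rows:
    print(row)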

The phpMyAdmin piece of the XAMPP package will allow a novice user to construct queries and create database relationships in a GUI (*sighs, shakes head*) where they previously needed a fair amount of MySQL syntax knowledge for the command line. There is also a robust interactive dynamic filtering capacity built in to the GUI table browser.

XAMPP is free and installation packages are available for Linux, Jobs-ware (O$ X), and Gates-ware (Window$).

All in all, I have found MySQL to be a good set of shears to trim down the sheer volume of data I deal with sometimes in supertimelines.

Semper Gumby!

Tom H.

p.s. Harlan, I thought USAF had velcro on their shoes...does Rob know what you meant? :)