Monday, January 28, 2013

Are You Being Served, pt II

This article isn't directed toward digital analysts; rather, it's for the folks who hire or contract with analysts or firms, and who are the recipients (or customers) of the technical work those digital forensics analysts perform.  My goal here is simply to express some thoughts on how customers might determine whether the results of the work they contracted for are meeting their needs.

Previously in this blog, I asked the question, Are you being served?  If you've asked yourself that question, you may be wondering...how would I know?  Selecting a DFIR analyst (either an individual or a firm) is really no different from evaluating and hiring any other provider of services, such as a plumber or auto mechanic.  The difference is that plumbers and mechanics fix something for you, and you can evaluate their services based on whether the problem is fixed, and for how long.  For customers of digital analysis services, determining whether you're getting what you paid for is a bit more difficult.

In exploring the subject of finding a digital forensics expert, I ran across this article at the Law.com web site.  The article addresses a number of aspects of digital analysis services that lawyers should consider when looking for a digital forensics expert.  For example, the article suggests that when asked to identify methods of data exfiltration, analysts should include USB devices.  This is good to know, but more importantly, does the analyst identify all such devices, or only the thumb drives?  How do you know?  Does the analyst make an attempt to determine the use of counter-forensics techniques, where a user might delete certain artifacts in an attempt to hide the fact that they connected a specific device to the system?  What details can the analyst provide with respect to the device being connected to the system, and how a user may have interacted with that device?  Regardless of the data exfiltration method used (USB device, web mail, Bluetooth, etc.), how does the analyst address data movement, in particular?

Beyond those items addressed in the article, some other things to consider include (but are not limited to):

Does the analyst explore historical data, such as Volume Shadow Copies (VSCs), when and where it is appropriate to do so?  If not, why?  If the methodology used by the analyst fails to find any VSCs, what does the analyst state as the reason for this finding?

What about other artifacts?  When the analyst provides a finding, do they have additional artifacts to support their findings, or are their findings based on that one artifact?  If artifacts (such as Prefetch files) are not examined or missing, what reason does the analyst provide?

If you're interested in the existence of malware on a system, what does the analyst do to address this issue?  Do they run AV against the mounted image?  What else do they do?  If malware is found, do they determine the initial infection vector?  Do they determine if the malware ever actually executed?

When you look at the report provided, does the information in it answer your questions and address your concerns, or are there gaps?  Does the analyst connect the dots in the report, or do they skip over many of the dots, and fill in the gaps using speculation?

One question that you might consider asking is what tools the analyst uses, but I would suggest that it's more important to know how the tools are used.  For example, having access to one of the commercial analysis suites can be a good thing, particularly if the analyst states that they will use it on your case to perform a keyword search.  But does it make sense to do so?  Did they work with you to develop a list of keywords to use in the search?  I've heard of examinations that were delayed for some time while the data was being preprocessed and indexed in preparation for a keyword search, yet none of the analysts could state why the keyword search was necessary or of value to the case itself.

There is often much more to digital analysis than simply finding one or two artifacts in order to "solve the case".  Systems today are sufficiently complex that multiple artifacts are needed to establish the context of a single finding, such as a tool failing to find VSCs within an image of a Windows 7 system.  Digital analysis is very often used as the basis for making critical business decisions or addressing legal questions, so the question remains...are you being served?  Are you getting the data that you need, in a timely manner, and in a manner that you can understand and use?

Resources
Law.com - How to Find a Digital Forensics Expert


Interested in Windows DFIR training?  Windows Forensic Analysis, 11-12 Mar; Timeline Analysis, 9-10 Apr.  See the Pricing and Calendar page.  Send email here to register.

Why "BinMode"?

You may be wondering why I've started posting articles to my blog with titles that start with "BinMode" and "There Are Four Lights".

The "BinMode" posts are dedicated to deeply technical posts; the name comes from the fact that sometimes I'll write a Perl script that requires me to open a file using binmode(), so that I can parse the file on a binary level. These are generally posts that go beyond the tools, which tend to provide a layer of abstraction between the data and analyst.  I feel that it's important for analysts to understand what data is available to them, so that they can make better decisions as to which tool to use to extract and process that data.

An example of this is the recent work I've done parsing the Java deployment cache index (*.idx) files.  Beyond opening these files in a hex editor, one resource that assisted me in parsing the files is this source code page: CacheEntry.java.  Another resource that became available later in the process is the format specification that Mark Woan documented. What these resources show is that within the binary data, there is potentially some extremely valuable information.  This information might be most useful during a root cause analysis investigation, perhaps to determine the initial infection vector of malware, or how a compromise occurred.

The "Four Lights" articles are partly a nod to the inner geek (and Star Trek fan) in all of us, but they're also to address something that may be lesser known, or perhaps seen as a misconception within the digital forensic analysis community.  The title alludes to an episode of ST:TNG, during which his captors attempted to get the greatest starship captain...EVER...to say that there were only three lights, when, in fact, there were four.

If there is a particular topic that you'd like me to expand upon, or if there's something that you'd like to see addressed, feel free to leave a comment here, or to send me an email.


Interested in Windows DF training?  Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar.  Be sure to check the WindowsIR Training Page for updates.

Monday, January 21, 2013

BinMode: Parsing Java *.idx files, pt. deux

My last post addressed parsing Java *.idx files, and since I released that post, a couple of resources related to the post have been updated.  In particular, Joachim Metz has updated the ForensicsWiki page he started to include more information about the format of the *.idx files, with some information specific to what is thought to be the header of the files.

Also, Corey Harrell was kind enough to share the *.idx file from this blog post with me (click here to see the graphic of what the file "looks like" in Corey's post), and I ran it through the parser to see what I could find:

File: d:\test\781da39f-6b6c0267.idx

Times from header:
------------------------------
time_0: Sun Sep 12 15:15:32 2010 UTC
time_2: Sun Sep 12 22:38:40 2010 UTC

URL: http://xhaito.com/work/builds/exp_files/rox.jar
IP: 91.213.217.31

Server Response:
------------------------------
HTTP/1.1 200 OK
content-length: 14226
last-modified: Sun, 12 Sep 2010 15:15:32 GMT
content-type: text/plain
date: Sun, 12 Sep 2010 22:38:35 GMT
server: Apache/2
deploy-request-content-type: application/x-java-archive

Ah, pretty interesting stuff.  Again, the "Times from header" section consists, at this moment, of data from those offsets within the header that Joachim has identified as possibly being time stamps.  In the code, I have it display only those times that are not zero.  What we don't have at the moment is information about the structure of the header, so we cannot yet identify to what the time stamps refer.

However, this code can be used to parse *.idx files and help determine to what the times refer.  For example, in the output above we see that "time_0" is equivalent to the "last modified" field in the server response, and that "time_2" is a few seconds after the "date" field in the server response.  Incorporating this information into a timeline might be useful while research continues to identify what the time stamps represent.  What is very useful is that *.idx files are associated with a specific user profile; for testing purposes, an analyst should be able to incorporate browser history and *.idx info into a timeline, and perhaps be able to "see" to what the time stamps refer.  If the analyst were to control the entire test environment, to include the web server, even more information could be developed.
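As a sketch of what that might look like, here's how the parsed times could be emitted as TLN events (the five-field, pipe-delimited timeline format I've discussed previously); the "JAVA_IDX" source tag and the descriptions are placeholders of my own, and $time_0, $time_2, $url, $server and $user are assumed to have been populated by the parser:

# Minimal sketch: emit the parsed *.idx times as TLN events
my %events = ($time_0 => "last-modified: ".$url,
              $time_2 => "response date: ".$url);
foreach my $t (sort {$a <=> $b} keys %events) {
    print $t."|JAVA_IDX|".$server."|".$user."|".$events{$t}."\n";
}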

Speaking of timelines, Sploited commented on my previous post regarding developing timeline analysis pivot points from other resources, and mentioned a script for parsing IE history files (urlcache.pl).  I would suggest that incorporating a user's web history, as well as running searches against the Malware Domain List, might be extremely helpful in identifying initial infection vectors and entry points.
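A minimal sketch of such a check, assuming you've saved a copy of the Malware Domain List as a plain text file of domains, one per line; the "mdl.txt" file name and the @urls array are placeholders:

# Minimal sketch: flag URLs whose host appears in a local copy of the
# Malware Domain List
my %mdl;
open(my $list, "<", "mdl.txt") || die "Could not open mdl.txt: $!\n";
while (<$list>) { chomp; $mdl{lc($_)} = 1; }
close($list);

foreach my $url (@urls) {
    if ($url =~ m!^https?://([^/]+)!i) {
        print "Possible MDL hit: ".$url."\n" if (exists $mdl{lc($1)});
    }
}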


Interested in Windows DF training?  Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar.  Be sure to check the WindowsIR Training Page for updates.

Saturday, January 19, 2013

BinMode: Parsing Java *.idx files

One of the Windows artifacts that I talk about in my training courses is application log files, and I tend to gloss over this topic somewhat, simply because there are so many different kinds of log files produced by applications.  Some applications, in particular AV, will write their logs to the Application Event Log, as well as to a text file.  I find this to be very useful because the Application Event Log will "roll over" as it gathers more events, while the text logs will most often continue to be written to by the application, and so reach further back in time.  I talk about these logs in general because it's important for analysts to be aware of them, but I don't spend a great deal of time discussing them because we could be there all week talking about them.

With the recent (Jan, 2013) issues regarding a Java 0-day vulnerability, my interest in artifacts of compromise was piqued yet again when I found that someone had released some Python code for parsing Java deployment cache *.idx files.  I located the *.idx files on my own system, opened a couple of them up in a hex editor and began conducting pattern analysis to see if I could identify a repeatable structure.  I found enough information to create a pretty decent parser for the *.idx files to which I have access.

Okay, so the big question is...so what?  Who cares?  Well, Corey Harrell had an excellent post on his blog regarding Finding (the) Initial Infection Vector, which I think is something that folks don't do often enough.  Using timeline analysis, Corey identified artifacts that required closer examination; using the right tools and techniques, this information can also be included directly in the timeline (see the Sploited blog post listed in the Resources section below) to provide more context to the timeline activity.

The testing I've been able to do with the code I wrote has been somewhat limited, as I haven't had a system that might be infected come across my desk in a bit, and I don't have access to an *.idx file like what Corey illustrated in his blog post (notice that it includes "pragma" and "cache control" statements).  However, what I really like about the code is that I have access to the data itself, and I can modify the code to meet my analysis needs, much the way I did with the Prefetch file analysis code that I wrote.  For example, I can perform frequency analysis of IP addresses or URLs, server types, etc.  I can perform searches for various specific data elements, or simply run the output of the tool through the find command, just to see if something specific exists.  Or, I can have the code output information in TLN format for inclusion in a timeline.
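As a sketch of what I mean by frequency analysis, the following assumes a parse_idx() routine (a placeholder for whatever parser you're using) that returns the server IP address for a given file; the directory path is also a placeholder:

# Minimal sketch: frequency analysis of server IP addresses across a
# set of *.idx files
my %freq;
foreach my $file (glob("d:\\cases\\idx\\*.idx")) {
    my $ip = parse_idx($file);
    $freq{$ip}++ if ($ip);
}
# print the IP addresses, most frequently seen first
foreach my $ip (sort {$freq{$b} <=> $freq{$a}} keys %freq) {
    printf "%-16s %d\n", $ip, $freq{$ip};
}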

Regardless of what I do with the code itself, I now have automatic access to the data, and I have references included in the script itself; as such, the headers of the script serve as documentation, as well as a reminder of what's being examined, and why.  This bridges the gap between having something I need to check listed in a spreadsheet, and actually checking or analyzing those artifacts.

Resources
ForensicsWiki Page: Java
Sploited blog post: Java Forensics Using TLN Timelines
jIIr: Almost Cooked Up Some Java, Finding Initial Infection Vector


Interested in Windows DF training?  Check it out: Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar.

Saturday, January 12, 2013

There Are Four Lights: The Analysis Matrix

I've talked a lot in this blog about employing event categories when developing, and in particular when analyzing, timelines, and the fact is that we can use these categories for much more than just adding analysis functionality to our timelines.  In fact, using artifact and event categories can greatly enhance our overall analysis capabilities.  This is something that Corey Harrell and I have spent a great deal of time discussing.

For one, if we categorize events, we can raise our level of awareness of the context of the data that we're analyzing.  Having categories for various artifacts can help us increase our relative level of confidence in the data that we're analyzing, because instead of looking at just one artifact, we're going to be looking at various similar, related artifacts together.

Another benefit of artifact categories is that they help us remember what various artifacts relate to...for example, I developed an event mapping file for Windows Event Log records, so that as a tool parses through the available information, it can assign a category to various event records.  This way, you no longer have to search Google or look up on a separate sheet of paper what an event refers to...you have "Login" or "Failed Login Attempt" right there next to the event description.  This is particularly useful because, as of Vista, Microsoft began employing a new Windows Event Log model, which means that there are a LOT more event logs than just the three main ones we're used to seeing.  Sometimes you'll see one event in the System or Security Event Log that has corresponding events in other event logs, or there will be one event all by itself...knowing what these events refer to, and having a category listed for each, is extremely valuable, and I've found it to help me a great deal with my analysis.
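To give a sense of what such a mapping looks like in code, here's a minimal sketch; the source/ID pairs and category strings are just a few illustrative entries, not my complete mapping file:

# Minimal sketch: map event source/ID pairs to categories
my %eventmap = ("Security/4624"                => "Login",
                "Security/4625"                => "Failed Login Attempt",
                "Service Control Manager/7045" => "Service Installed");

# $source and $id are assumed to come from the parsed event record
my $key = $source."/".$id;
my $cat = (exists $eventmap{$key}) ? $eventmap{$key} : "";
print "[".$cat."] ".$key."\n";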

One way to make use of event categories is to employ an analysis matrix.  What is an "analysis matrix"?  Well, what happens many times is that analysts will get some general (read: "vague") analysis goals, and perhaps not really know where to start. By categorizing the various artifacts on a Windows system, we can create an analysis matrix that provides us with a means for at least beginning our analysis.

An analysis matrix might appear as follows:

                      Malware Detection   Data Exfil   Illicit Images   IP Theft
Malware                       X                              X
Program Execution             X               X              X
File Access                                   X              X              X
Storage Access                                X              X              X
Network Access                X

Again, this is simply a notional matrix, and is meant solely as an example.  However, it's also a valid matrix, and something that I've used.  Consider "data exfiltration"...the various categories we use to describe a "data exfiltration" case may often depend upon what you learn from a "customer" or other source.  For example, I did not put an "X" in the "Network Access" row under "Data Exfil", as I have had cases where access to USB devices was specified by the customer...they felt confident that, with how their infrastructure was designed, this was not an option that they wanted me to pursue.  However, you may want to add it...I have also conducted examinations in which part of what I was asked to determine was network access, such as a user taking their work laptop home and connecting to other wireless networks.
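One way to put the matrix to work is to encode it; here's a minimal sketch that simply mirrors the example matrix above:

# Minimal sketch: encode the example matrix and look up which artifact
# categories to pursue for a given exam type
my %matrix = ("Malware Detection" => ["Malware", "Program Execution", "Network Access"],
              "Data Exfil"        => ["Program Execution", "File Access", "Storage Access"],
              "Illicit Images"    => ["Malware", "Program Execution", "File Access", "Storage Access"],
              "IP Theft"          => ["File Access", "Storage Access"]);

my $exam = "Data Exfil";
print "Categories to examine for ".$exam.":\n";
print "  ".$_."\n" foreach (@{$matrix{$exam}});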

The analysis matrix is not intended to be the "be-all-end-all" of analysis, nor is it intended to be written in stone.  Rather, it's intended to be something of a living document, something that provides analysts with a means for identifying what they (intend to) do, as well as serving as a foundation on which further analysis can be built.  By using an analysis matrix, we have case documentation available to us immediately.  An analysis matrix can also provide us with pivot points for our timeline analysis; rather than combing through thousands of records in a timeline, we not only have a means of going after the information that may be most important to our examination, but we also avoid those annoying rabbit holes that we sometimes find ourselves going down.

Finally, consider this...trying to keep track of all of the possible artifacts on a Windows system can be a daunting task.  However, it becomes much easier if we compartmentalize the various artifacts into categories, breaking the task down into smaller, easier-to-manage pieces.  Rather than getting swept up in the issues surrounding a new artifact (Jump Lists are new as of Windows 7, for example...), we can simply place that artifact in the appropriate category, and incorporate it directly into our analysis.

I've talked before in the blog about how to categorize various artifacts...in fact, in this post, I talked about the different ways that Windows shortcut files can be categorized.  We can look at access to USB devices as storage access, and include sub-categories for various other artifacts.


Interested in Windows DFIR training?  Check it out...Timeline Analysis, 4-5 Feb; Windows Forensic Analysis, 11-12 Mar.

Tuesday, January 08, 2013

Training

For those readers who may not be aware, I teach a couple of training courses through my employer, at our facility in Reston, VA.  We're also available to deliver those courses at your location, if requested.  As such, I thought it might be helpful to provide some information about the courses, so in this post, I'll talk about the courses we offer, some we're looking to offer, and what you can expect to get out of the courses.

Windows Forensic Analysis
Day 1 starts with a course introduction, and then we get right into discussing some core analysis concepts, which will be addressed again and again throughout the training.  From there, we begin exploring and discussing some of the various data sources and artifacts available on Windows 7 systems.  Knowing that XP is still out there, we don't ignore that version of Windows; we simply focus primarily on Windows 7.  Artifacts specific to other systems are discussed as they come up.

Throughout the course, we also discuss the various artifact categories, and how to create and use an analysis matrix to focus and document your analysis. We discuss what data is available, how to get it, how to correlate that data with other available data, and how to get previous versions of that data by accessing Volume Shadow Copies.  All of this is accompanied by hands-on demonstrations of tools and techniques; many of the tools used are only available to those attending the training.

Day 2 starts with a quick review of the previous day's materials and answering any questions attendees may have; if there's any material that needs to be completed from the first day, we finish up with that, and then move into the hands-on exercises.  Depending upon the attendees' familiarity with the tools and techniques used, these exercises may be guided, or they may be completed by attendees, in teams or individually.

Do you want to know what secrets lie hidden within Windows shortcut files and Jump Lists?  Want to know more about "shellbags"?  How about other artifacts?  This course will tell...no, show...you.  Not only that, we'll show you how to use this information to greater effect, in a more timely and efficient manner, in order to extend your analysis.

Each attendee receives a copy of Windows Forensic Analysis Toolkit 3/e.

Timeline Analysis 
Day 1 - Much like the Windows Forensic Analysis course, we start the first day with some core analysis concepts specific to timeline analysis, and then we jump right into exploring and discussing various data sources and artifacts as they relate to creating and analyzing timelines.  We discuss the various artifact and event categories, and how this information can be used to get more out of your timeline analysis.

Day 2 starts off with completing any material from the first day, answering any questions the attendees may have, and then kicking off into a series of scenarios where questions are answered based on findings from a timeline; we not only go over how to create a timeline, but also how to go about analyzing that timeline and finding the answers to the questions.

If you can't remember all of the commands that we go over in the course, don't worry...you can write down notes on the provided copies of the slides, or you can turn to the provided cheat sheet for hints and reminders.  Many of the tools used in this course are only available to those attending the course.

Each attendee receives a copy of Windows Forensic Analysis Toolkit 3/e.

Registry Analysis
This 1-day course is based on the material in my book, Windows Registry Forensics. As such, we spend some time in this course discussing not only the structure of the Registry, but also the value of performing Registry analysis.  There is a good deal of information in the Registry that can significantly impact your analysis, and the goal of this course is to allow you to go beyond assumption to determining explicitly why you're seeing what you're seeing. 

As you would guess, we spend some time discussing various tools, and some attention is given to RegRipper.  For those interested, attendees will receive plugins that are not available through the public distribution.  We also spend some time discussing the RegRipper components and structure, how it's used, and how to get the most out of it.

One of the take-aways we provide with this course is a graphic illustrating various components of USB device analysis, showing artifacts that aren't addressed anywhere else.

Each attendee receives a copy of Windows Registry Forensics.

Why Should I Attend?
That's always a great question; it's one I ask myself, as well, whenever I have an option to attend training.

Each attendee is provided with the tools for the course, including tools that are only available to those who attend: tools for parsing various data structures, as well as RegRipper plugins that you can't get any place else.  Several publicly available tools are discussed in the courses but, due to licenses, are not provided with the course materials.  In such cases, the materials provide links to the tools.

I continually update the course materials.  I sit down with the materials immediately following a course and look at my notes and any questions asked by attendees, and I pay particular attention to the course evaluation forms.  When something new pops up in the media, I like to be sure to include it in the course for discussion.  Updates come from other areas, as well...most notably, from my own analysis work and what I learn while performing it.  New techniques and findings are continually incorporated directly into the training materials.

As the Windows operating systems have gotten more complex, it's proven to be difficult for a lot of analysts to maintain current knowledge of the various artifacts, as well as analysis tools and techniques.  These courses will not only provide you with the information, but also provide you with an opportunity to use those tools and employ those techniques, developing an understanding of each so that you can incorporate them into your analysis processes.

What Do I Need To Know Before Attending?
For the currently available courses, we ask that you arrive with a laptop with Windows 7 installed (can be a VM), a familiarity with operating at the command prompt, and a desire to learn.  Bring your questions.  While sample data is provided with the course materials, feel free to bring your own data, if you like.

The courses are developed so that you should NOT book all of them as a single 5-day training event.  The reason is that a great deal of information is provided in the Windows Forensic Analysis course, and if you've never done timeline analysis before (and in some cases, even if you have), you do not want to immediately step off into the Timeline Analysis course. It is best to take the Windows Forensic Analysis (and perhaps the Registry Analysis) course(s), return to your shop, and develop your familiarity with the data sources before taking the Timeline Analysis course.

If you've ever seen or heard me present, you know that I am less about lecturing and more about interacting.  If you're interested in engaging and interacting with others to better understand data sources and artifacts, as well as how they can be used to further your analysis, then sign up for one of our courses.

Upcoming Course(s)
Malware Detection - By request, I'm working on a course that addresses malware detection within an acquired image.  I've taught courses similar to this before, and I think that in a lot of ways, it's an eye-opener for a lot of folks, even those who deal with malware regularly.  This is NOT a malware analysis course...the purpose of this course is to help analysts understand how to locate malware within an acquired image.  This is one of those analysis skills that traverses a number of cases, from breaches to data theft, even to claims of the "Trojan Defense".

Others - TBD.


Our website includes information regarding the schedule of courses, as well as the cost for each course.  Check back regularly, as the schedule may change.  Also, if you're interested in having us come to you to provide the training, let us know.

Saturday, January 05, 2013

There Are Four Lights: USB-Accessible Storage

There's been a good deal of discussion and documentation regarding discovering USB devices that had been connected to a Windows system, as this seems to be very important to a number of examiners.  In 2005, Cory Altheide and I published some initial information, and in the years since then, that body of information has continued to grow.  For example, Rob Lee has published valuable checklists via the SANS Forensics Blog, and Jacky Fox recently published her dissertation, which includes some interesting and valuable information regarding interpreting the information that is available in the Registry regarding user access to USB devices.  Ms. Fox determined that when a USB device is connected to a system and mounted as a volume, that volume GUID is added to the MountPoints2 key for all logged-in users, not just the user logged in at the console.

Further, Mark Woan recently updated information collected by his USBDeviceForensics tool, to include querying some additional keys/values.

Regarding the additional keys/values that Mark's tool queries: Windows 7 and 8 systems have additional values beneath the device keys in the System hive, specifically a "Properties" key with a number of GUID subkeys.  This blog post provides some very good information that facilitates further searches, which leads us to information regarding a time stamp value that pertains to the InstallDate, as well as one that pertains to the FirstInstallDate.

So what?  Well, let's take a look at the MS definition for the FirstInstallDate:

Windows sets the value of DEVPKEY_Device_FirstInstallDate with the time stamp that specifies when the device instance was first installed in the system.

Pretty cool, eh?  This is what MS says about the InstallDate time stamp:

This time stamp value changes for each successive update of the device driver. For example, this time stamp reports the date and time when the device driver was last updated through Windows Update.

Ah, interesting. So it would appear that, based on the MS definitions for these values, we now have the information about when the device was first connected to the system available right there in the Registry.  I'm not saying that we don't have to go anywhere else...rather, I'm suggesting that we have corroborating data that we can use to provide an increased relative confidence (a phrase that you usually see in my posts regarding timelines) in the data that we're analyzing.

Something that hasn't been addressed is that most of the publicly-available processes currently being used are not as complete as they could be.  Wait...what?  Well, this is where specificity of language within the DFIR community comes into play...it turns out that the processes are actually very good, as long as all we're interested in is specifically USB thumb drives or external drives.  However, there are devices that can be connected to Windows systems via USB and accessed as storage devices (digital cameras, iStuff, smartphone handsets) that do not necessarily become apparent to analysts using the commonly-accepted tools, processes and checklists.  We can find these devices by looking beneath other Registry keys, as well as in other locations beyond the Registry, and by correlating information between them.  This is particularly useful when counter-forensics techniques have been used (however unintentionally), as not everything may be completely gone, and we may be able to find some remnant (LNK file, shellbags, deleted Registry keys/values, Windows Event Log, etc.) that will point us to the use of such devices.

One of the pitfalls of interpreting Registry data, as Ms. Fox pointed out in her dissertation, is that we often don't have current, up-to-date databases of all devices that could be connected to a Windows system; we might see vendor ID (VID) and product ID (PID) values within key names beneath the Enum\USB key, but not know what they translate to...I've found Motorola devices, for instance, that required a good deal of searching in order to determine which smartphone handset was pointed to by the PID value.  As such, no process is going to be 100%, push-a-button complete, but the point is that we will know that the data is there, we will know where to get it, and we will know how to use it.

Full analysis of USB-accessible storage media can be extremely important to a number of exams, such as illicit image and IP theft cases.  Many examiners used to think that sneaking a thumb drive into an infrastructure was a threat...and it still is; these devices get smaller and smaller every day, while their capacity increases.  But we need to start thinking about other USB-accessible storage, such as smartphones and iDevices, not because they're easily hidden, but because they're so ubiquitous that we tend to not focus on them...we take them for granted.

A Mapping Technique
The EMDMgmt subkey (within the Software Registry hive) names include the serial number for the mounted volume (VSN), which is also included in the MS-SHLLINK structure, itself found in Windows shortcut/LNK files, as well as in Windows 7 and 8 Jump Lists.  By correlating the VSNs from multiple sources, I was able to illustrate access to external storage devices in a manner that overcomes the shortcoming identified by Ms. Fox.  What I've done is use code to parse through the LNK structures (LNK files in the Recent folder, for example, and the LNK streams within the Jump Lists) to list the VSNs, looking for the one (or two, or however many...) that point to the device identified in the EMDMgmt subkey name.
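Here's a minimal sketch of the correlation step; it assumes (based on what I've seen) that EMDMgmt subkey names end with an underscore followed by the VSN in decimal, and that @emd_subkeys and %lnk_vsns (hex VSN => source file) have already been populated by your Registry and LNK/Jump List parsers:

# Minimal sketch: convert the decimal VSN at the end of an EMDMgmt
# subkey name to the hex form recorded in LNK files, and check for a
# match
foreach my $name (@emd_subkeys) {
    if ($name =~ m/_(\d+)$/) {
        my $vsn = sprintf "%08X", $1;    # decimal -> 8-digit hex
        if (exists $lnk_vsns{$vsn}) {
            print $name."\n  -> VSN ".$vsn." also seen in ".$lnk_vsns{$vsn}."\n";
        }
    }
}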

Tuesday, January 01, 2013

BinMode

I've recently been working on a script to parse the NTFS $UsnJrnl:$J file, also known as the USN Change Journal.  Rather than blogging about the technical aspects of what this file is, or why a forensic analyst would want to parse it, I thought that this would be a great opportunity to instead talk about programming and parsing binary structures.

There are several things I like about being able to program, as an aspect of my DFIR work:
- It very often allows me to achieve something that I cannot achieve through the use of commercially available tools.  Sometimes it allows me to "get there" faster, other times, it's the only way to "get there".
- I tend to break my work down into distinct, compartmentalized tasks, which lends itself well to programming (and vice versa).
- It gives me a challenge.  I can focus my effort and concentration on solving a problem, one that I will likely see again and for which I will already have an automated solution the next time I see it.
- It allows me to see the data in its raw form, not filtered through an application written by a developer.  This allows me to see data within the various structures (based on structure definitions from MS and others), and possibly find new ways to use that data.

One of the benefits of programming is that I have all of this code available, not just as complete applications but also stuff I've written to help me perform analysis.  Stuff like translating time values (FILETIME objects, DOSDate time stamps, etc.), as well as a printData() function that takes binary data of an arbitrary length and translates it into a hex editor-style view, which makes it easy to print out sections of data and work with them directly.  Being able to reuse this code (even if "code reuse" is simply a matter of copy-paste) means that I can achieve a pretty extensive depth of analysis in fairly short order, reducing the time it takes for me to collect, parse, and analyze data at a more comprehensive level than before.  If I'm parsing some data, and use the printData() function to display the binary data in hex at the console, I may very well recognize a 64-bit time stamp at a regular offset, and then be able to add that to my parsing routine.  That's kind of how I went about writing the shellbags.pl plugin for RegRipper.
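For reference, here's a minimal sketch of the sort of hex-editor-style display routine I'm describing; sixteen bytes per line, with the offset, hex values, and printable ASCII:

# Minimal sketch: display binary data in a hex editor-style view
sub printData {
    my $data = shift;
    my $len  = length($data);
    for (my $i = 0; $i < $len; $i += 16) {
        my $chunk = substr($data, $i, 16);
        my $hex   = join(' ', map {sprintf "%02x", ord($_)} split(//, $chunk));
        (my $ascii = $chunk) =~ tr/\x20-\x7e/./c;   # non-printables -> "."
        printf "0x%08x  %-47s  %s\n", $i, $hex, $ascii;
    }
}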

I've also recently been looking at IE index.dat files in a hex editor, and writing my own parser based on the MSIE Cache File Format put together by Joachim Metz.  So far, my initial parser works very well against the index.dat file in the TIF folder, as well as the one associated with the cookies.  But what's really fascinating about this is what I'm seeing...each record has two FILETIME objects and up to three DOSDate (aka, FATTime) time stamps, in addition to other metadata.  For any given entry, all of these fields may not be populated, but the fact is that I can view them...and verify them with a hex editor, if necessary.
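Both time stamp formats are simple to decode once you've pulled the raw bytes; here's a minimal sketch of the two conversions (FILETIME to Unix epoch, and DOSDate/FATTime to a readable string):

# Minimal sketch: convert a FILETIME (passed as two 32-bit halves) to
# Unix epoch time; FILETIME counts 100ns intervals since 1 Jan 1601
sub getTime {
    my ($lo, $hi) = @_;
    return int(($hi * 4294967296 + $lo) / 10000000 - 11644473600);
}

# Minimal sketch: convert DOSDate/FATTime date and time words to a
# readable string; these values are local time, 2-second resolution
sub parseDOSDate {
    my ($date, $time) = @_;
    my $day   =  $date        & 0x1f;
    my $month = ($date >> 5)  & 0x0f;
    my $year  = (($date >> 9) & 0x7f) + 1980;
    my $sec   = ($time        & 0x1f) * 2;
    my $min   = ($time >> 5)  & 0x3f;
    my $hour  = ($time >> 11) & 0x1f;
    return sprintf "%04d-%02d-%02d %02d:%02d:%02d", $year, $month, $day, $hour, $min, $sec;
}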

As a side note regarding the index.dat parser, I've found it very useful so far.  I can run the code at the command line, and pipe the output through one or more "find" commands in order to locate or view specific entries.  For example, the following command line gets the "Location : " fields for me, and then looks for specific entries; in this case, "apple":

C:\tools>parseie.pl index.dat | find "Location :" | find "apple" /i

Using the above command line, I'm able to narrow down the access to specific things, such as purchase of items via the Apple Store, etc.

I've also been working on a $UsnJrnl (actually, the $UsnJrnl:$J ADS file) parser, which itself has been fascinating.  This work was partially based on something I've felt I needed to do for a while now, and talking to Corey Harrell about some of his recent findings has renewed my interest in this effort, particularly as it applies to malware detection.
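The fixed portion of a version 2 change journal record (USN_RECORD_V2) is documented by MS, so a minimal parsing sketch looks something like the following; error handling, and the handling of the sparse runs you'll find in a real $J file, are omitted:

# Minimal sketch: parse the fixed 60-byte portion of a USN_RECORD_V2
# ($rec holds one complete record) and extract the name that follows
my ($reclen, $maj, $minv) = unpack("Vvv", substr($rec, 0, 8));
my ($ts_lo, $ts_hi)       = unpack("VV", substr($rec, 32, 8));   # FILETIME
my $reason                = unpack("V", substr($rec, 40, 4));    # reason flags
my ($namelen, $nameofs)   = unpack("vv", substr($rec, 56, 4));
my $t = int(($ts_hi * 4294967296 + $ts_lo) / 10000000 - 11644473600);
my $name = substr($rec, $nameofs, $namelen);
$name =~ s/\x00//g;        # crude UTF-16LE to ASCII conversion
printf "%s  %s  reason: 0x%08x\n", scalar gmtime($t), $name, $reason;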

Understanding binary structures can be very helpful.  For example, consider the target.lnk file illustrated in this write-up of the Gauss malware.   If you parse the information manually, using the MS specification...which should not be hard because there are only 0xC3 bytes visible...you'll see that the FILETIME time stamps for the target file are nonsense (Cheeky4n6Monkey got that, as well).  As you parse the shell item ID list, based on the MS specification, you'll see that the first item is a System folder that points to "My Computer", and the second item is a Device entry whose GUID is "{21ec2020-3aea-1069-a2dd-08002b30309d}".  When I looked this GUID up online, I found some interesting references to protecting or locking folders, such as this one at LIUtilities, and this one at GovernmentSecurity.org.  I found this list of shell folder IDs, which might also be useful.
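As an aside, GUIDs like the one above are easy to extract from raw data once you know the byte layout; a minimal sketch:

# Minimal sketch: format 16 bytes of raw data as a GUID string; the
# first three fields are little-endian, the final eight bytes are not
sub parseGUID {
    my $data = shift;
    my ($d1, $d2, $d3) = unpack("Vvv", substr($data, 0, 8));
    my @d4 = unpack("C8", substr($data, 8, 8));
    return sprintf "{%08x-%04x-%04x-%02x%02x-%02x%02x%02x%02x%02x%02x}",
                   $d1, $d2, $d3, @d4;
}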

The final shell item, located at offset 0x84, is type 0x06, which isn't something that I've seen before.  But there's nothing in the write-up that explains in detail how this LNK file might be used by the malware for persistence or propagation, so this was just an interesting exercise for me, as well as for Cheeky4n6Monkey, who also worked on parsing the target.lnk file manually.  So, why even bother?  Well, like I said, it's extremely beneficial to understand the format of various binary structures, but there's another reason.  Have you read these posts over on the CyanLab blog? No?  You should.  I've seen shortcut/LNK files with no LinkInfo block, only the shell item ID list, that point to devices; as such, being able to parse and understand these...or even just recognize them...can be very beneficial if you're at all interested in determining which USB storage devices had been connected to a system.  So far, most of these devices that I have seen have been digital cameras and smart phone handsets.

Everything
Okay, right about now, you're probably thinking, "so what?"  Who cares, right?  Well, this should be a very interesting, if not outright important, issue for DFIR analysts...many of whom want to see everything when it comes to analysis.  So the question then becomes...are you seeing everything?  When you run your tool of choice, is it getting everything?

Folks like Chris Pogue talk a lot about analysis techniques like "sniper forensics", which is an extremely valuable means for performing data collection and analysis.  However, let's take another look at the above question, from the perspective of sniper forensics...do you have the data you need?  If you don't know what's there, how do you know?

If you don't know that Windows shortcut files include a shell item ID list, and what that data means, then how can you evaluate the use of a tool that parses LNK files?  I'm using shell item ID lists as an example, simply because they're so very pervasive on Windows 7 systems...they're in shortcut files, Jump Lists, Registry value data.  They're in a LOT of Registry value data.  But the concept applies to other aspects of analysis, such as browser analysis.  When you're performing browser analysis in order to determine user activity, are you just checking the history and cookies, or are you including Registry settings ("TypedURLs" key values for IE 5-9, and "TypedURLsTimes" key values on Windows 8), bookmarks, and session restore files?  When performing USB device analysis on Windows systems, are you looking for all devices, or are you using checklists that only cover thumb drives and external hard drives?

I know that my previous paragraph covers a couple of different levels of granularity, but the point remains the same...are you getting everything that you need or want to perform your analysis?  Does the tool you're using get all system and/or user activity, or does it get some of it?

Can we ever know it all?
One of the aspects of the DFIR community is that, for the most part, most of us seem to work in isolation.  We work our cases and exams, and don't really bother too much with asking someone else, someone we know and trust, "hey, did I look at everything I could have here?" or "did I look at everything I needed to in order to address my analysis goals in a comprehensive manner?"  For a variety of reasons, we don't tend to seek out peer review, even after cases are over and done.

But you know something...we can't know it all.  No one of us is as smart or experienced as several or all of us working together.  This can be close collaboration, face-to-face, or online collaboration through blogs, or sites such as the ForensicsWiki, which makes a great repository, if it's used.

Choices
Finally, a word about choices in programming languages to use.  Some folks have a preference.  I've been using Perl for a long time, since 1999.  I learned BASIC in the '80s, as well as some Pascal, and then in the mid-'90s, I picked up some Java as part of my graduate studies.  I know some folks prefer Python, and that's fine.  Some folks within the community would like to believe that there are sharp divides between these two camps, that some who use one language detest the other, as well as those who use it.  Nothing could be further from the truth.  In fact, I would suggest that this attempt to create drama where there is none is simply a means of masking the fact that some analysts and examiners simply don't understand the technical aspects of the work that's actually being done.

Resources
Forensics from the Sausage Factory - USN Change Journal
Security BrainDump - Post regarding the USN Change Journal
OpenFoundry - Free tools; page includes link to a Python script for parsing the $UsnJrnl:$J file