I wrote the last post on this blog six years ago. My idea for reviving this space is to publish some flash posts: short write-ups of problems I run into in my daily work, that I managed to solve, and that turned out to be poorly documented.
This time I had to deal with some offsets found with signsrch, a really handy tool developed by Luigi Auriemma. A few words about the tool: in a nutshell, signsrch is a tool for searching signatures inside files. It is extremely useful in reverse engineering for figuring out, or at least getting an initial idea of, which encryption or compression algorithm is used by a proprietary protocol or file format (as you can read on Luigi’s website). My goal was to find these offsets in IDA. For a few samples I can do it manually using IDA’s search option, but this does not scale, so I needed to do it in a script. These offsets are file offsets: the distance from the beginning of the file to each point of interest. The addresses in IDA, on the other hand, are the addresses we expect to see in memory, which means they are virtual addresses.
This is a common problem: translating a file offset into a virtual address. If you google a little you can find some Stack Overflow questions and the arithmetic expression that gives the desired address. The process is clear and straightforward if you are familiar with the PE file format. In my specific case, I needed to integrate this simple logic into a Python script using pefile (a common library for dissecting PE files), and I wasted some time because I was not familiar with some parts of the library. I was able to translate the offset to the right virtual address manually, but my first attempt with pefile failed.
This is how the translation works:
VA = Offset - Section_RawOffset + Section_VirtualAddress + ImageBase
Needless to say, we first need to identify the right section. This means we need the file-offset boundaries of each section: the range starts at RawOffset and ends just before RawOffset + SizeOfRawData. Once we have these ranges, we check which section contains the offset.
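As a rough sketch of these two steps in plain Python (no pefile yet): the `Section` tuple, the helper names, and every number in the toy section table below are assumptions made up for illustration, not values taken from the sample discussed in this post.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    """Only the section-header fields the translation needs."""
    name: str
    raw_offset: int       # RawOffset: where the section's data starts in the file
    raw_size: int         # SizeOfRawData: how many bytes it occupies in the file
    virtual_address: int  # VirtualAddress: RVA of the section once loaded


def find_section(sections: Sequence[Section], offset: int) -> Optional[Section]:
    """Return the section whose raw range contains the file offset, if any."""
    for s in sections:
        if s.raw_offset <= offset < s.raw_offset + s.raw_size:
            return s
    return None


def offset_to_va(sections: Sequence[Section], image_base: int, offset: int) -> int:
    """Offset - Section_RawOffset + Section_VirtualAddress + ImageBase."""
    s = find_section(sections, offset)
    if s is None:
        raise ValueError(f"offset {offset:#x} is not inside any section")
    return offset - s.raw_offset + s.virtual_address + image_base


# Hypothetical layout, NOT read from a real binary.
SECTIONS = [
    Section(".text", 0x400, 0x200, 0x1000),
    Section(".rdata", 0x600, 0x200, 0x2000),
]
IMAGE_BASE = 0x400000

# 0x650 falls in .rdata: 0x650 - 0x600 + 0x2000 + 0x400000
print(hex(offset_to_va(SECTIONS, IMAGE_BASE, 0x650)))  # 0x402050
```

Offsets that fall in the headers, or past the last section, do not belong to any section, which is why the sketch raises an error instead of guessing.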
My main issue was the RawOffset field and how to get it from pefile. From a quick look in ipython we can see:
It is clear that the library uses a different name, which is not a big problem. My first idea was to use get_file_offset, but it didn’t work: as far as I can tell, this method returns the file offset of the section header structure itself, not of the section’s data. I was sure the library exposed the RawOffset field, and it had to be listed in the figure above, so one of my attempts was PointerToRawData (or anything with Raw in the name), and it was the right choice. At this point, we can write the formula with pefile:
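In text form, a minimal sketch of that formula could look like the following. It relies only on the pefile attributes `sections`, `PointerToRawData`, `SizeOfRawData`, `VirtualAddress` and `OPTIONAL_HEADER.ImageBase`; the function names and the lazy import are assumptions for illustration, not the exact code from the session.

```python
def offset_to_va(pe, offset):
    """Translate a file offset to a virtual address via the section headers.

    `pe` is expected to be a pefile.PE instance. Returns None when the
    offset does not fall inside the raw data of any section.
    """
    for s in pe.sections:
        start = s.PointerToRawData          # the "RawOffset" of the formula
        if start <= offset < start + s.SizeOfRawData:
            rva = offset - start + s.VirtualAddress
            return rva + pe.OPTIONAL_HEADER.ImageBase
    return None


def translate_file(path, offset):
    """Hypothetical helper: load a PE file from disk and translate one offset."""
    import pefile  # third-party package: pip install pefile

    pe = pefile.PE(path)
    return offset_to_va(pe, offset)
```

For the sample used in this post, the call would be along the lines of `hex(translate_file(path_to_sample, 0x75294))`.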
Now we can translate the offset. Let’s see how everything works, from the signsrch offset to IDA through pefile:
As we can see, the file offset 0x75294 translates to the virtual address 0x476894, which matches in IDA and points to IsDebuggerPresent as expected.
However, pefile is a comprehensive library, so it should provide another way. Another possible approach is to use s.contains_offset and s.get_rva_from_offset, and then get the virtual address (VA) from the RVA by adding the ImageBase:
As you can see from the image above, we listed all the sections and printed some information, such as each section’s name and the steps to get the virtual address. First we invoked contains_offset, which is True only for the “.rdata” section; then we called get_rva_from_offset and added the ImageBase. As expected, we got 0x476894.
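For reference, the same steps could be sketched in text form as follows. `contains_offset`, `get_rva_from_offset`, `Name` and `OPTIONAL_HEADER.ImageBase` are the pefile names used above; the function names are hypothetical, and the code is an approximation of the session rather than a copy of it.

```python
def offset_to_va_via_sections(pe, offset):
    """Let pefile's section helpers do the range check and the RVA math.

    `pe` is expected to be a pefile.PE instance. Returns None when no
    section reports that it contains the offset.
    """
    for s in pe.sections:
        if s.contains_offset(offset):
            rva = s.get_rva_from_offset(offset)
            return rva + pe.OPTIONAL_HEADER.ImageBase
    return None


def print_section_report(pe, offset):
    """Print, for each section, whether it holds the offset and the resulting VA."""
    for s in pe.sections:
        # Section names are NUL-padded bytes in pefile.
        name = s.Name.rstrip(b"\x00").decode(errors="replace")
        if s.contains_offset(offset):
            va = s.get_rva_from_offset(offset) + pe.OPTIONAL_HEADER.ImageBase
            print(f"{name}: contains {offset:#x}, VA = {va:#x}")
        else:
            print(f"{name}: does not contain {offset:#x}")
```

Compared with the PointerToRawData version, this one leaves the boundary check to the library, so the script does not have to repeat the range arithmetic.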
This is what I mean by flash posts. This post covered signsrch, pefile, and how to translate file offsets to virtual addresses in two different ways.