Avast Retargetable Decompiler IDA Plugin

In early December 2017, Avast open-sourced its machine-code decompiler for platform-independent analysis of executable code. The decompiler is named Retargetable Decompiler (AKA RetDec).

RetDec started life in 2011 as a joint project between AVG Technologies, which was acquired by Avast in 2016, and Brno University of Technology (BUT) in the Czech Republic. Jakub Křoustek, lead at Avast Threat Labs, was the “founder” of RetDec. Peter Matula, also of Avast, was the main developer of the RetDec decompiler. Since 2011, more than 20 BSc/MSc/PhD students from BUT have been involved in the project. RetDec makes extensive use of a number of interesting technologies, including Capstone, Yara, LLVM and LLVM IR (Intermediate Representation).

RetDec represents over 7 years of development work. The tool is generic in that it can transform platform-specific code, such as IA32/PE and ARM/ELF binaries, into a higher-level representation, currently either C source code or a Python-like language. The name was chosen because the tool is not limited to a single CPU architecture, operating system or executable file format. This blog post does not go into the internals of the RetDec decompiler. Here is a list of publications on the subject if you are interested in learning more about the technology.

So, what is decompilation? In general terms, decompilation is the reconstruction of a computer program in a high-level language from a computer program in a low-level language. What is a decompiler? I like Wikipedia’s definition, i.e.

The term decompiler is most commonly applied to a program which translates executable programs (the output from a compiler) into source code in a (relatively) high level language which, when compiled, will produce an executable whose behavior is the same as the original executable program.

There are many different types of decompilers. RetDec is a machine-code decompiler. It only supports the decompilation of native processor code, i.e. machine code. It cannot decompile intermediate bytecode such as that used by .NET, Java or Python. Machine-code decompilers are unable to perfectly reconstruct the original source code because a lot of information, including variable and function names, comments, macros and more, is lost during the compilation process. Moreover, malware authors often use sophisticated obfuscation, anti-disassembly and anti-decompilation tricks to make decompilation as difficult as possible.
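To make the point about lost information concrete, here is a minimal sketch. All names in it are hypothetical and are not taken from the demo programs below. The two functions differ only in naming and in the use of a macro, which are exactly the things that do not survive compilation, so nothing in their behavior tells a decompiler which version the programmer wrote.

```c
#include <assert.h>

#define MAX_ATTEMPTS 3  /* replaced by the literal 3 during preprocessing */

/* As a programmer might write it: descriptive names and a macro. */
int attempts_remaining(int attempts_used)
{
    return MAX_ATTEMPTS - attempts_used;
}

/* As a decompiler might present the same logic: an address-style
 * function name (hypothetical), a numbered parameter, a bare constant. */
int sub_401100(int a1)
{
    return 3 - a1;
}

/* Confirms that the two versions behave identically for a few inputs. */
void check_same_behavior(void)
{
    for (int used = 0; used <= MAX_ATTEMPTS; ++used) {
        assert(attempts_remaining(used) == sub_401100(used));
    }
}
```

Calling `check_same_behavior()` from a test harness passes silently. The names and the macro exist only in the source file.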

In this post, I compare the C source code generated by the RetDec IDA Plugin with that generated by the IDA Hex-Rays decompiler plugin using a number of simple executables. The results were interesting.

Currently, the RetDec decompiler plugin only decompiles 32-bit binaries and does not yet support the new IDA 7.0 plugin architecture. I used the default configuration for the plugin, i.e. remote API decompilation. I also used the excellent Visual Studio 15 build tools (link may change!) to compile and link three simple 32-bit demo executables, and IDA Pro 6.95 to compare the output from the RetDec and Hex-Rays decompiler plugins.

On to the examples…

Example 1

This is the standard Hello World program with a small twist. I use puts instead of printf to simplify the resulting machine code, so there is no need to deal with issues such as varargs.
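For readers unfamiliar with varargs, here is a minimal sketch of a variadic function. The name `sum_ints` is hypothetical and is not part of the demo programs. Nothing in the function's prototype records how many arguments a particular call passes. The callee trusts a convention, here an explicit count, much as printf trusts its format string. That is one reason why calls to such functions are harder for a decompiler to reconstruct.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper for illustration: adds up `count` int arguments.
 * The function cannot see how many arguments the caller really passed;
 * it relies entirely on `count` being correct. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    assert(count >= 0);

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);

    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` adds up the three trailing arguments. A decompiler has to work out the number and types of those arguments at each call site, whereas a call to puts always takes exactly one pointer.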

//
// demo1.c
//

#include <stdio.h>

int 
main(void)
{
   puts("Hello, World!");

   return 0;
}

Here is how the source code was compiled and tested:

Here is the code block for the main subroutine in IDA Pro:

Here is the relevant decompiler output from Hex-Rays:

Here is the relevant decompiler output from RetDec:

Both decompilers produced reasonable C source code, with Hex-Rays producing C code closer to the original source code.

Example 2

Our next example is a classic first reverse engineering practice lab program. Again, I avoided functions such as printf that require the use of varargs.

//
// demo2.c
//

#include <stdio.h>
#include <string.h>

#define MAXSIZE_PASSWORD 100
#define PASSWORD "myGOODpassword\n"

int 
main(void) {
   int count=0;
   char buff[MAXSIZE_PASSWORD];

   while (count < 3) {
      fputs( "Enter password: ", stdout);
      fgets( buff, MAXSIZE_PASSWORD, stdin);
      if (!strcmp( buff, PASSWORD)) {
         fputs("SUCCESS: Correct password entered\n", stdout);
         return 0;
      }
      fputs("ERROR: Invalid passport entered. Try again.\n", stdout);
      count++;
   }

   fputs("Sorry. No more attempts allowed\n", stdout);
   return 1;
}

Again, here is how the source code was compiled and tested:

Here are the relevant code blocks in IDA Pro:

Here is the relevant decompiler output from Hex-Rays:

int __cdecl main(int argc, const char **argv, const char **envp)
{
  FILE *v3; // eax@3
  FILE *v4; // eax@3
  FILE *v5; // eax@4
  FILE *v7; // eax@5
  FILE *v8; // eax@6
  signed int i; // [sp+0h] [bp-6Ch]@1
  char v10; // [sp+4h] [bp-68h]@3

  for ( i = 0; i < 3; ++i )
  {
    v3 = (FILE *)__acrt_iob_func(1);
    fputs("Enter password: ", v3);
    v4 = (FILE *)__acrt_iob_func(0);
    fgets(&v10, 0x64, v4);
    if ( !strcmp(&v10, "myGOODpassword\n") )
    {
      v5 = (FILE *)__acrt_iob_func(1);
      fputs("SUCCESS: Correct password entered\n", v5);
      return 0;
    }
    v7 = (FILE *)__acrt_iob_func(1);
    fputs("ERROR: Invalid passport entered. Try again.\n", v7);
  }
  v8 = (FILE *)__acrt_iob_func(1);
  fputs("Sorry. No more attempts allowed\n", v8);
  return 1;
}

Here is the source code emitted by RetDec:

//
// This file was generated by the Retargetable Decompiler
// Website: https://retdec.com
// Copyright (c) 2017 Retargetable Decompiler 
//

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// ------------------------ Structures ------------------------

struct _IO_FILE {
    int32_t e0;
};

// ------------------- Function Prototypes --------------------

int32_t _main(int32_t argc, char ** argv, char ** envp);

// --------------------- Global Variables ---------------------

char g1[17] = "Enter password: ";
char g2[45] = "ERROR: Invalid passport entered. Try again.\n";
char g3[16] = "myGOODpassword\n";
char g4[33] = "Sorry. No more attempts allowed\n";
char g5[35] = "SUCCESS: Correct password entered\n";

// ------------------------ Functions -------------------------

// Address range: 0x401000 - 0x4010d3
int32_t _main(int32_t argc, char ** argv, char ** envp) {
    int32_t v1 = 0; // bp-112
    int32_t * stream = __acrt_iob_func(1); // 0x4010ab20
    // branch -> 0x401021
    int32_t * stream4; // 0x4010ab
    while (true) {
        // 0x401021
        fputs(g1, (struct _IO_FILE *)stream);
        int32_t * stream2 = __acrt_iob_func(0); // 0x40103b
        int32_t str;
        fgets((char *)&str, 100, (struct _IO_FILE *)stream2);
        int32_t strcmp_rc = strcmp((char *)&str, g3); // 0x40105b
        int32_t * stream3 = __acrt_iob_func(1); // 0x401069
        if (strcmp_rc == 0) {
            // 0x401067
            fputs(g5, (struct _IO_FILE *)stream3);
            // branch -> 0x4010c6
            // 0x4010c6
            return ___security_check_cookie_4();
        }
        // 0x401083
        fputs(g2, (struct _IO_FILE *)stream3);
        int32_t v2 = v1 + 1; // 0x40109e
        v1 = v2;
        stream4 = __acrt_iob_func(1);
        if (v2 > 2) {
            // break -> 0x4010a9
            break;
        }
        stream = stream4;
        // continue -> 0x401021
    }
    // 0x4010a9
    fputs(g4, (struct _IO_FILE *)stream4);
    // branch -> 0x4010c6
    // 0x4010c6
    return ___security_check_cookie_4();
}

// --------------- Statically Linked Functions ----------------

// _ACRTIMP_ALT FILE * __cdecl __acrt_iob_func(unsigned);
// char * fgets(char * restrict s, int n, FILE * restrict stream);
// int fputs(const char * restrict s, FILE * restrict stream);
// int strcmp(const char * s1, const char * s2);

// --------------------- Meta-Information ---------------------

// Detected compiler/packer: msvc (vs 2012) (17)
// Detected functions: 1
// Decompiler release: v2.2.1 (2016-09-07)
// Decompilation date: 2017-12-30 12:57:30


I included the full output from RetDec to show you how RetDec emits its output. As you can see, it is considerably more verbose than the C code output by Hex-Rays. Observe how things quickly get complicated under the hood once file streams and FILE objects are used. You may not agree with me, but personally I think that the code emitted by Hex-Rays is easier to understand in this example.
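Although the two outputs look quite different, the loop structures they recover are equivalent. The following sketch is my own simplification, not decompiler output. It strips out the I/O and keeps only the control flow: the counted for loop shown by Hex-Rays and the while (true) loop with a bottom test shown by RetDec. The parameter `correct_attempt` is a hypothetical stand-in for user input, and the 0/1 return values are taken from demo2.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Loop shape recovered by Hex-Rays: a counted for loop.
 * correct_attempt: 1-based attempt on which the right password is
 * entered, or 0 if it is never entered (hypothetical test input). */
int status_for_shape(int correct_attempt, int *prompts)
{
    *prompts = 0;
    for (int i = 0; i < 3; ++i) {
        ++*prompts;                   /* stands in for "Enter password: " */
        if (i + 1 == correct_attempt) {
            return 0;                 /* success path */
        }
    }
    return 1;                         /* attempts exhausted */
}

/* Loop shape emitted by RetDec: while (true) with the exit test at the bottom. */
int status_while_shape(int correct_attempt, int *prompts)
{
    int v1 = 0;
    *prompts = 0;
    while (true) {
        ++*prompts;
        if (v1 + 1 == correct_attempt) {
            return 0;
        }
        int v2 = v1 + 1;
        v1 = v2;
        if (v2 > 2) {
            break;
        }
    }
    return 1;
}

/* Checks that both shapes give the same status and prompt count. */
void check_loops_agree(void)
{
    for (int attempt = 0; attempt <= 4; ++attempt) {
        int p1, p2;
        int s1 = status_for_shape(attempt, &p1);
        int s2 = status_while_shape(attempt, &p2);
        assert(s1 == s2);
        assert(p1 == p2);
    }
}
```

Calling `check_loops_agree()` passes for every scenario, which suggests the difference between the two decompilers here is one of presentation rather than of recovered logic.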

Example 3

As shown below, this simple example contains a single user-defined function, twice, and a call to printf.

//
// demo3.c
//

#include <stdio.h>

int 
twice( int number) {
   int tmp;
   
   tmp = number + number;

   return(tmp);
}

int 
main(void) {
   int tmp, num = 2;

   tmp = twice(num);

   printf("Result: %d\n", tmp);
}

Here is the code block for the main subroutine in IDA Pro:

and the code block for the twice subroutine:

Here is the relevant decompiler output from Hex-Rays:

int32_t sub_401000(int32_t a1) {
    int32_t result = 2 * a1; // 0x401007
    g2 = result;
    int32_t v1;
    g3 = v1;
    return result;
}

int32_t _main(int32_t argc, char ** argv, char ** envp) {
    int32_t v1 = sub_401000(2); // 0x401031
    g5 = v1;
    sub_4010A0((int32_t)"Result: %d\n", (char)v1);
    return 0;
}


I am not sure why Hex-Rays outputs those superfluous lines of code in both functions!

Here is the relevant decompiler output from RetDec:

int32_t sub_401000(int32_t a1) {
    // 0x401000
    return 2 * a1;
}

int32_t _main(int32_t argc, char ** argv, char ** envp) {
    char v1 = sub_401000(2);
    sub_4010A0((int32_t)"Result: %d\n", v1);
    return 0;
}


Good clean source code which is easy to understand!
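One detail is worth checking in decompiled code like this. The value passed to the printf-like sub_4010A0 is typed as char, although twice returns an int in the original source. The sketch below is my own illustration, not decompiler output, and the helper names are hypothetical. It shows why that matters. I use unsigned char rather than plain char so that the narrowing is well defined in C; plain char may be signed, in which case the out-of-range conversion is implementation-defined.

```c
#include <assert.h>
#include <limits.h>

/* Same arithmetic as twice() in demo3.c. */
int twice(int number)
{
    /* Guard against signed overflow, which is undefined behavior in C. */
    assert(number >= INT_MIN / 2 && number <= INT_MAX / 2);
    return number + number;
}

/* Keeps the result as an int, as the original source does. */
int result_as_int(int number)
{
    return twice(number);
}

/* Narrows the result through a character type, similar to `char v1`
 * in the decompiled output (unsigned char is an assumption made here
 * to keep the conversion well defined). */
int result_via_uchar(int number)
{
    unsigned char v1 = (unsigned char)twice(number);
    return v1;
}
```

For the demo's input of 2, both helpers return 4, so the narrowing is invisible. For an input of 150, `result_as_int` returns 300, whereas `result_via_uchar` returns 300 reduced modulo `UCHAR_MAX + 1`, which is 44 on platforms with 8-bit chars. Recompiling decompiled code without reviewing recovered types can therefore change behavior.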

As you can see from the above three examples, both RetDec and Hex-Rays make a reasonable attempt at decompiling each of the executables. However, it should be quite clear to you by now that neither decompiler produces high-fidelity C source code, i.e. code that matches the original source code used to build the executable. Frankly, I am surprised, given the maturity of the Hex-Rays decompiler, that Hex-Rays output is not consistently cleaner than RetDec's output. From now on I plan to use a combination of Hex-Rays and RetDec when decompiling machine code. Furthermore, I am also tempted to add the Snowman decompiler IDA plugin to the mix, as each decompiler can provide different insights into the machine code under investigation.

By the way, RetDec, including the RetDec IDA plugin, is genuinely open source and is released under the MIT license. It is nice to see that Avast did not use a GPL license, which thankfully more and more companies and individuals are avoiding. All the source code for RetDec is available on GitHub.

In this post, I have only used the RetDec IDA plugin. I would be remiss, however, if I did not also point out that RetDec has a powerful REST API which can be used to provide direct access to the internals of the RetDec decompiler without using IDA Pro. You should try the API sometime!

Best Wishes for 2018.
