Criticism of Decompilation and Theory
It is said that decompiling a program is like turning beefburgers back into a cow. This is a bad analogy, as we will see. When a program is compiled, it is not mashed. The source code represents a basic algorithm, and the compiler merely transfers that algorithm from the source code to the binary code. So, the analogy is better put as being a stripping-down of an information-rich algorithm to a lean algorithm (or the source code as a cow, and a thinner cow as the binary code).
There has also been an assignment of the 'halting problem' to decompilers. As we will see, this does not apply.
The first kind of halting problem, as explained on Wikipedia, is this (written as VB):
Public Function Halt(a As String, b As String) As Boolean
Halt = True
End Function
Public Function Trouble(s as String) as Boolean
If Halt(s,s) = False then
Trouble = True
Else
Loop While True
End If
End Function
This VB version of the halting problem decompiles thus:
Public Sub unk_1ad0(ByVal Arg8 As Variant)
If Call Form1.unk_1a60(Arg8,Arg8) = False Then
unk_1ad0 = True
Else
Loop While True
End If
End Sub
As we can see, we don't need to know if the program halts or not, because all we are interested in is what it does. Or, to put it more formally, the semantics (the result of what the program intends to do) are separated from the syntax (what the program actually does). We are interested only in the syntax.
Finally, another criticism is that it is impossible to tell the difference between code and data when dealing with a big block of assembly code that has no apparent structure. This is like giving us a paragraph written in Russian and expecting us to translate it with no knowledge. In the real world, once we know what processor's instruction set the program was compiled or assembled to, and the entry point, then we can start to disassemble it recursively. Really, the halting problem only applies to self-modifying code, but that is not a problem with C, C++, VB or Java. The compiler forces the user to work in a certain way, and it is these programming standardisations that allow for decompilation. Further, operating system's API's make things even easier by restricting the palette of system functions that are available to use. Standardisation makes decompilation possible.
The best way to tell the difference between code and data is to run the program in an emulator which can make a note of which bytes were opcodes and which were operands, let the user select a few program options as the program is running, and then once most of the program has been followed through execution, scan the valid code and follow the remaining branches to the end.
Information propagates like a snowball. Once you figure out one small piece of data, it leads onto more and more, which then leads onto even more.
For more information on the cow analogy read this Wiki page




