UGTS Document #26 - Last Modified: 8/29/2015 3:23 PM
Executable Code and the Windows Operating System

Program code is ultimately a sequence of machine language instructions written for a particular processor architecture (Itanium/x64, 16/32/64 bit), which are executed by a processor within the context of a process. But there are several ways that this machine language code can be created:
  • NATIVE (UNMANAGED): The machine language code is stored verbatim in a binary file.

  • MANAGED: The program code is stored in an intermediate form which is compiled at runtime (Just In Time) by other program code (a JITter) into the machine language that the computer will use. The results of the JIT step can be cached so that it only needs to be done once, or the step can be re-done every time the code is loaded. Java and .NET are examples of managed code. A virtual machine with a different processor architecture is an extreme example of managed code, where an entire native OS is dynamically recompiled to run as native code on a different processor type.

  • INTERPRETED: The program code is stored as a script which is interpreted at runtime, line by line, by other program code. The interpreted code is dynamically run each time it is executed and nothing is cached, making this the slowest type of execution. VBScript, PowerShell, and batch (.BAT) files are examples of interpreted code. Although you could run a virtual machine as interpreted code rather than managed code, in practice this is rarely done except during development of virtualization code, because it is so much slower than compiling the code once and caching the results.
Also of interest is the format in which the code is stored - as binary or text. Interpreted code is usually stored in text format, but it could be stored in binary. Managed code is usually stored in binary, but it could be stored as text. Native code is always stored in binary.

Processor Architecture

Native program code must be tied to a particular processor architecture (Itanium/x64) and bit-depth (16/32/64) because it must be run directly by the processor, with no changes allowed at runtime. Managed and interpreted code need not be tied to a particular processor architecture because they can be dynamically compiled at runtime to match whatever architecture is needed. In practice, however, managed and interpreted code sometimes depends on native code in ways that assume a particular architecture, meaning that if you try to run it on a processor architecture it was not explicitly designed for, it might crash.
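
As a quick illustration of this, a script (here Python, chosen only for brevity) can report the bit-depth of the process it is running in versus the architecture of the underlying machine. This is a minimal sketch, not part of any particular product; the PROCESSOR_ARCHITEW6432 check is Windows-specific and is only set when a 32-bit process runs under WOW64 on a 64-bit OS.

    import os
    import platform
    import struct

    # Bit-depth of the current process: the size of a pointer, in bits.
    process_bits = struct.calcsize("P") * 8

    # Architecture reported for the machine (e.g. 'AMD64', 'x86', 'ARM64').
    machine_arch = platform.machine()

    # Windows only: set when a 32-bit process runs under WOW64 on a 64-bit OS.
    wow64_arch = os.environ.get("PROCESSOR_ARCHITEW6432")

    print(f"process: {process_bits}-bit, machine: {machine_arch}")
    if wow64_arch:
        print(f"32-bit process on a {wow64_arch} OS (WOW64)")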

In Windows, a process must be entirely 32 or 64 bit. It was possible to have 16 and 32 bit code loaded into a single process, because 16-bit memory addressing was really pseudo-32-bit: the full address was simply broken up into a 16-bit selector and a 16-bit offset. This is not possible with 32 and 64 bit code, because 32 bit addresses really are only 32 bits and 64 bit addresses really are 64 bits. Any thunking layer between the two bit depths would have to work much like marshalling does between managed and unmanaged code, copying memory structures from 64-bit space to 32-bit space and vice versa, and a process would need a separate 32-bit subsystem and kernel interface to make it all work. Although this might have been possible in principle, Microsoft chose simply not to support it. If you have 32 and 64 bit code that needs to talk, the 32-bit code should be loaded into a separate 32-bit process, and that process can then communicate with any 64-bit processes it needs to using the usual mechanisms: launching by command line, out-of-process COM, .NET remoting, sockets, etc.
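
To make that last point concrete, the sketch below (Python, with a hypothetical 32-bit helper named helper32.exe and a made-up JSON-over-stdin protocol) shows the command-line style of communication: the 64-bit side never loads 32-bit code into its own process, it just launches the 32-bit EXE and exchanges data with it.

    import json
    import subprocess

    # Hypothetical 32-bit helper EXE that reads one JSON request from stdin
    # and writes one JSON reply to stdout; the path and protocol here are
    # illustrative only.
    HELPER32 = r"C:\Tools\helper32.exe"

    def call_helper32(request):
        """Run the 32-bit helper as a separate process and return its reply."""
        proc = subprocess.run(
            [HELPER32],
            input=json.dumps(request).encode("utf-8"),
            stdout=subprocess.PIPE,
            check=True,
        )
        return json.loads(proc.stdout.decode("utf-8"))

    reply = call_helper32({"action": "read-legacy-data", "path": r"C:\old\data.mdb"})
    print(reply)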

Static vs Dynamic Linking

All programs have an entry point - some part of the code that gets loaded and run first. From there, the program can in turn load other parts of itself as needed. If two pieces of code are stored in the same file, they are said to be statically linked; if they are stored in separate files, they are dynamically linked. A DLL is a Dynamic Link Library - binary program code (either managed or native) which is stored separately from the main code file (usually an EXE file). Dynamically linked code has the advantage that it can be more easily updated or replaced, and it does not have to be loaded at all if it is not needed at runtime. In contrast, statically linked code cannot be updated or replaced without replacing the entire file in which it is stored, and it is always loaded along with whatever it has been statically linked into.
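
As a small illustration of dynamic linking at runtime, the Python sketch below uses ctypes (which wraps the Windows LoadLibrary/GetProcAddress mechanism) to load user32.dll into the current process and call one of its exported functions by name; nothing about user32.dll is bound into the program until the moment the DLL is loaded.

    import ctypes

    # Load user32.dll into this process at runtime and resolve the exported
    # MessageBoxW function from its export table (Windows only).
    user32 = ctypes.WinDLL("user32")
    user32.MessageBoxW.argtypes = [ctypes.c_void_p, ctypes.c_wchar_p,
                                   ctypes.c_wchar_p, ctypes.c_uint]
    user32.MessageBoxW.restype = ctypes.c_int

    user32.MessageBoxW(None, "user32.dll was loaded dynamically", "Demo", 0)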

It is also easier to search for and replace program code in DLLs, because DLLs are typically given descriptive filenames, have known last-modified dates and file sizes, and contain version information within the file.

On the other hand, statically linked code has the advantage that it is hardcoded into the main file, and therefore can't be misplaced or omitted accidentally. There are rarely dependency problems with statically linked code.

In Windows, DLLs are shared in the sense that there can be only one copy of a native DLL loaded at any time from a given absolute path. If a process has loaded a DLL file, Windows locks the file to prevent it from being modified by any other process, so that any other process which loads it gets exactly the same copy of the DLL.
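
The lock is easy to observe. The sketch below (Python with ctypes, Windows only) copies a small system DLL to a temporary folder, loads the copy, and shows that the file cannot be deleted or overwritten until the module is freed.

    import ctypes
    import ctypes.wintypes
    import os
    import shutil
    import tempfile

    # Copy a small system DLL somewhere writable, then load the copy.
    src = os.path.join(os.environ["SystemRoot"], "System32", "version.dll")
    tmp = os.path.join(tempfile.mkdtemp(), "version_copy.dll")
    shutil.copyfile(src, tmp)
    dll = ctypes.WinDLL(tmp)              # the file is now mapped and locked

    try:
        os.remove(tmp)                    # fails while the DLL is loaded
    except PermissionError as e:
        print("locked while loaded:", e)

    kernel32 = ctypes.WinDLL("kernel32")
    kernel32.FreeLibrary.argtypes = [ctypes.wintypes.HMODULE]
    kernel32.FreeLibrary(dll._handle)     # unload the module

    os.remove(tmp)                        # now succeeds
    print("deleted after FreeLibrary")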

With 16-bit DLLs the sharing is even more extreme: there can be only one global copy of a particular 16-bit DLL (identified by filename) loaded in memory. If two 16-bit processes use two identically named DLL files with different contents and run at the same time, the first one to run will work, and the second will crash, because Windows hands it the incompatible version of the DLL that is already loaded.

Portable Executable Files: DLLs vs EXEs

DLL and EXE files are both PE (Portable Executable) formatted files. The primary difference between the two is the IMAGE_FILE_DLL flag (0x2000) in the Characteristics field of the file header: if it is set, the file is a DLL. A DLL normally exposes its code through a function export table, and its entry point (DllMain), if present, is only called to notify the DLL when it is loaded and unloaded; an EXE has an entry point where the program starts running and usually has no export table. Both also carry a subsystem value in the optional header (at offset 0x44). For an EXE this will usually be either 2 = IMAGE_SUBSYSTEM_WINDOWS_GUI for a GUI Windows application or 3 = IMAGE_SUBSYSTEM_WINDOWS_CUI for a Windows console program. DLLs typically carry a subsystem value of 2, the same as a GUI Windows application, but note that EXEs and DLLs with this value are not required to have a GUI, and many DLLs do not.
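
These fields are easy to inspect by hand. The sketch below (Python, standard library only, using the offsets described above) reads a file's PE headers and reports whether the IMAGE_FILE_DLL flag is set and which subsystem value the file carries; the two paths at the bottom are just examples.

    import struct

    IMAGE_FILE_DLL = 0x2000

    def describe_pe(path):
        """Return ('DLL' or 'EXE', subsystem value) for a PE file."""
        with open(path, "rb") as f:
            data = f.read(4096)                           # headers fit easily

        if data[:2] != b"MZ":
            raise ValueError("not an MZ/PE file")
        pe_off = struct.unpack_from("<I", data, 0x3C)[0]  # e_lfanew
        if data[pe_off:pe_off + 4] != b"PE\0\0":
            raise ValueError("missing PE signature")

        # Characteristics is the last 2 bytes of the 20-byte file header,
        # which follows the 4-byte "PE\0\0" signature.
        characteristics = struct.unpack_from("<H", data, pe_off + 4 + 18)[0]

        # Subsystem is at offset 0x44 of the optional header, which starts
        # immediately after the file header (same offset for PE32 and PE32+).
        subsystem = struct.unpack_from("<H", data, pe_off + 4 + 20 + 0x44)[0]

        kind = "DLL" if characteristics & IMAGE_FILE_DLL else "EXE"
        return kind, subsystem

    print(describe_pe(r"C:\Windows\System32\kernel32.dll"))
    print(describe_pe(r"C:\Windows\notepad.exe"))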

File types that use the PE format include EXE, DLL, CPL, OCX, SYS, SCR, and DRV.

A good free utility (not public domain, but GPL) to view PE file headers is the UCWare Anywhere PE Viewer.

One thing to note about native EXE files is that since they generally lack an export table, they can't be loaded the way a DLL is loaded in order to use the code and functions inside them. With no export table, a loader would have to somehow disassemble the machine code to guess where the functions were located and what their calling conventions were. With .NET EXE files it is a different story, since .NET assemblies always contain full metadata about all of their types and functions. In .NET you can simply call Assembly.LoadFrom to load an EXE into another process as if it were a DLL, find the classes in it, and instantiate them. For example, the Microsoft utility installutil.exe (which ships with the .NET runtime) appears to do exactly this: when you pass installutil.exe a file (such as an EXE), it loads the file as an assembly rather than executing it, uses reflection to find all the public classes which have the RunInstaller attribute defined, and instantiates those classes to run the installation procedures you've defined in them.
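
A rough sketch of this reflection-based loading is shown below, written in Python via the third-party pythonnet package purely to keep all of the examples in one language (C# or PowerShell would be more natural). The EXE path and class name are hypothetical, and the RunInstaller filtering that installutil.exe performs is only described in the comments, not implemented.

    import clr                                    # pythonnet's CLR bridge
    from System import Activator
    from System.Reflection import Assembly

    # Load the EXE as an assembly instead of executing it (hypothetical path).
    asm = Assembly.LoadFrom(r"C:\Apps\MyService.exe")

    # The metadata describes every public type; installutil.exe walks a list
    # like this, keeping only classes marked with the RunInstaller attribute.
    for t in asm.GetExportedTypes():
        print(t.FullName)

    # Any public class with a parameterless constructor can be instantiated
    # directly, as if the EXE were a class library (hypothetical class name).
    worker_type = asm.GetType("MyService.Worker")
    worker = Activator.CreateInstance(worker_type)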