API's that Suck

January 4, 2010

Learning IL – The Echo Program

Filed under: CIL — Grauenwolf @ 4:27 pm

The requirements for our echo program are pretty simple.

  1. Take any number of command line arguments.
  2. Print each argument on separate lines.
  3. Return the number of command line arguments.

In this lesson you will learn how to define parameters, work with arrays, work with loops, and return values from functions. This lesson assumes that you already understand the hello world code.

Version 1

Version 1 will deal with requirements 1 and 3. This will be enough to prove that the command line parameters are actually making it all the way to the program and the return value is making it back to the shell. In Windows you can use the command “ECHO %ERRORLEVEL%” immediately after running the program to see the return value. For this to work you need to run the program itself from the command line as well.

  1. .assembly extern mscorlib {}
  2. .assembly EchoProg {}
  3. .method static int32  Main(string[] args)
  4. {
  5.     .entrypoint
  6.     ldarg.0
  7.     ldlen
  8.     ret
  9. }

Line 1 is the core assembly reference. Again, I’m not asking for a specific version.

Line 2 is our program’s name. I’m calling it “EchoProg” because “echo” is reserved by the Windows command shell.

Line 3 is our entry point. Let’s compare it to the hello world version.

  1. .method static void Main()

As you can see, void has been replaced by the return type. In addition, we now have one argument in a C#-like style, with the type preceding the argument name. Note that we don’t actually use the argument name anywhere; it is just there for reference.

Line 6 is the “load argument N onto the stack” command. In this case we are loading argument 0, the “args” parameter, which is a reference to an array of strings.

Line 7 is the “replace the array on the stack with its length” command, also known as load length.

As I mentioned last time, the return command in line 8 takes the last value on the stack and places it on the calling stack, then exits the current function.
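One detail worth a sketch: per the CLI spec, ldlen pushes a native unsigned integer rather than an int32, which is why C# compilers follow it with a conv.i4. The variant below is my own illustration, not a required change to version 1. It makes the conversion explicit before returning.

```il
.assembly extern mscorlib {}
.assembly EchoProg {}
.method static int32 Main(string[] args)
{
    .entrypoint
    ldarg.0     // push the reference to the args array
    ldlen       // replace it with the array length (native unsigned int)
    conv.i4     // convert the length to a 4-byte integer
    ret         // return it as the process exit code
}
```

Run with the same four arguments as the batch file below, this should still report 4 through ERRORLEVEL.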

To make experimenting with this code a little easier, I wrote this little batch file. If you run it using version 1, you should get the number 4.

  1. ilasm EchoProg.il
  2. PEVerify EchoProg.exe
  3. EchoProg.exe "la la la" dee di doh

Version 2

Before we jump into loops, let’s just try printing the first command line argument. The code for this is pretty straightforward.

  1. .assembly extern mscorlib {}
  2. .assembly EchoProg {}
  3. .method static int32  Main(string[] args)
  4. {
  5.     .entrypoint
  7.     //printline code
  8.     ldarg.0
  9.     ldc.i4 0
  10.     ldelem.ref
  11.     call       void [mscorlib]System.Console::WriteLine(string)
  13.     //return code
  14.     ldarg.0
  15.     ldlen
  16.     ret
  17. }


Line 7 is a comment. I believe you can use // at the end of any line, as I often see it in compiler-generated IL.

In line 8 we are loading the array onto the stack just as before.

In line 9 we see the “load a constant” command. The “.i4” suffix means that the constant is a 4-byte integer, a.k.a. a System.Int32. This is followed by the value we want to load, the number 0. This gives us the following stack, top first:

0

pointer to args

[not yours]

Line 10 is the “load element from array” command. The “.ref” suffix indicates that the array elements are object references rather than numeric values. After this command is executed, both the array and the constant will be removed from the stack and a reference to the first string will be in their place. This is shown in the spec under the “stack transition” heading.
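To make the stack transitions easier to follow, here is a sketch of the same program with the stack contents written as comments after each instruction. The comments are my own annotation, and I have used the short form ldc.i4.0 in place of “ldc.i4 0”; both load the same constant.

```il
.assembly extern mscorlib {}
.assembly EchoProg {}
.method static int32 Main(string[] args)
{
    .entrypoint
    // print the first argument
    ldarg.0         // stack: args
    ldc.i4.0        // stack: args, 0
    ldelem.ref      // stack: args[0]
    call void [mscorlib]System.Console::WriteLine(string)
                    // stack: (empty), WriteLine consumed the string
    // return the argument count
    ldarg.0         // stack: args
    ldlen           // stack: length
    conv.i4         // stack: length as int32
    ret
}
```

Note that this version assumes at least one argument. Run with none, ldelem.ref throws an IndexOutOfRangeException.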



Version 3

OK, now things get tricky. We are going to have to start keeping track of an array index. And since there are no loops in CIL, you have to fake it using if statements and gotos. Before we run off and start coding, let’s write some pseudo code showing what we are trying to accomplish.

  1. Variables: index, arrayLength
  2. arrayLength = args.Length
  3. index = 0
  4. If index = arrayLength Then goto done
  5. Print args[index]
  6. index = index+1
  7. goto if
  8. done

Reminds me of BASICA back from the days of DOS, except you get labels instead of line numbers. Anyways, on to the code.

  1. .assembly extern mscorlib {}
  2. .assembly EchoProg {}
  3. .method static int32  Main(string[] args)
  4. {
  5.     .entrypoint
  6.     //Variables: index, arrayLength    
  7.     .locals init (
  8.         [0] int32 index,
  9.         [1] int32 arrayLength)
  11.     //arrayLength = args.Length
  12.     ldarg.0
  13.     ldlen
  14.     stloc 1
  16.     //index = 0
  17.     ldc.i4 0
  18.     stloc 0
  20.     //If index = arrayLength Then goto done
  21. if: ldloc 0
  22.     ldloc 1
  23.     beq done
  25.     //Print args[index]
  26.     ldarg.0
  27.     ldloc 0
  28.     ldelem.ref
  29.     call       void [mscorlib]System.Console::WriteLine(string)
  31.     //index = index+1
  32.     ldloc 0
  33.     ldc.i4 1
  34.     add.ovf
  35.     stloc 0
  37.     br if
  39. done:
  41.     //return code
  42.     ldarg.0
  43.     ldlen
  44.     ret
  45. }

On lines 6 thru 9 we see our variables being defined. It starts with the “.locals” directive. This is followed by “init”, which sets all the local variables to 0/null. According to the spec, all verifiable code must use the init keyword.

Each well-formed local variable consists of three values. First is the slot index, next is the variable type, and finally there is the variable name. Of these, only the type is actually required. You could write “.locals init (int32, int32)”, but that is going to be a royal pain to understand later on.

Lines 11 thru 14 load the array length into a local variable. The only thing new here is the “store local” command, which takes a slot number as a parameter.
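As a small aside, ilasm also accepts the declared names in place of slot numbers for ldarg, ldloc, and stloc. The sketch below is my own illustration with a single local. It stores the array length and returns it, using names throughout.

```il
.assembly extern mscorlib {}
.assembly EchoProg {}
.method static int32 Main(string[] args)
{
    .entrypoint
    .locals init (int32 arrayLength)

    ldarg args          // same as ldarg.0
    ldlen
    conv.i4             // ldlen yields a native unsigned int
    stloc arrayLength   // same as stloc 0
    ldloc arrayLength   // same as ldloc 0
    ret
}
```

The names are resolved by the assembler, so the resulting IL still refers to slots by number.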

Lines 16 thru 18 load the value 0 into the index variable. Strictly speaking this isn’t necessary because we are using init, but I’ll leave it here for reference.

Lines 20 to 23 represent our if statement. It starts with the label “if:” so we can jump to it later. We could have used just about anything for our label; it doesn’t really matter once the code is assembled.

Then we use the “load local” command twice, once for each variable.

Finally we use the “branch if equal” command on line 23. If the top two items on the stack are equal, we go to the line labeled “done”.

Lines 25 thru 29 are our code to actually print the parameter. The only difference between this and the one in version 2 is that we are loading the index using ldloc instead of a constant value.

Lines 31 to 35 will increment our index variable. It does this by loading both the index and the constant 1 onto the stack. Then it calls the “add with overflow check” command. Finally it stores the result value back into the index variable.

Line 37 takes us back to the top of the loop using the “branch” command. It is important that this line is skipped if our if statement back on line 20 is true.
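For comparison, here is a more compact sketch of the same loop. It is my own variant, not a replacement for the listing above. It uses the short-form opcodes (ldloc.0, stloc.1, ldc.i4.1, beq.s, br.s), relies on init to zero the index, uses a plain add instead of add.ovf, and returns the stored length rather than calling ldlen again. The label name loopTop is arbitrary.

```il
.assembly extern mscorlib {}
.assembly EchoProg {}
.method static int32 Main(string[] args)
{
    .entrypoint
    .locals init (int32 index, int32 arrayLength)

    // arrayLength = args.Length
    ldarg.0
    ldlen
    conv.i4
    stloc.1

loopTop:
    // if index = arrayLength then goto done
    ldloc.0
    ldloc.1
    beq.s done

    // print args[index]
    ldarg.0
    ldloc.0
    ldelem.ref
    call void [mscorlib]System.Console::WriteLine(string)

    // index = index + 1
    ldloc.0
    ldc.i4.1
    add
    stloc.0
    br.s loopTop

done:
    // return arrayLength
    ldloc.1
    ret
}
```

The short forms only shrink the encoded instructions; the behavior is the same as the long forms, apart from dropping the overflow check on the increment.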


I was surprised how much code it took to write this. Sure, I knew I would have to write my own loops, but even basic stuff like incrementing local variables was much harder than I expected. I was expecting to find an “increment local” command, but it looks like we can’t do anything with values until they are on the stack.


1 Comment »

  1. Hi

    You can use xacc.ide for syntax highlighting/editing/copy to HTML for MSIL.



    Comment by leppie — January 5, 2010 @ 12:33 am
