API's that Suck

January 4, 2010

Learning IL – Hello World in CIL

Filed under: CIL — Grauenwolf @ 6:38 am

If you are going to write your own programming language, you need to spend a lot of time learning the target. In my case, the target is .NET’s Common Intermediate Language, also known as IL or CIL. As per tradition, here is my hello world app written in CIL.

  1. .assembly extern mscorlib {}
  2. .assembly Test1 {}
  3. .method static void Main()
  4. {
  5.     .entrypoint
  6.     ldstr      "Hello World"
  7.     call       void [mscorlib]System.Console::WriteLine(string)
  8.     ret
  9. }

Line 1 says that I need mscorlib, but I don’t care which version. If it were important, I would include some extra information, such as the version number, between the braces.
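
As a rough sketch of what that extra information can look like, this is the general shape ILDASM emits for a pinned reference. The version and public key token below are those of the .NET 2.0 mscorlib and are an assumption about the runtime being targeted:

```cil
// Reference a specific mscorlib rather than "any version".
.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)   // token of the Microsoft-signed mscorlib
    .ver 2:0:0:0                                  // assumed target: the 2.0 runtime
}
```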

Line 2 is the name of my assembly.  I think I’m supposed to put assembly-level attributes in the brackets, but I’m not certain.
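
For what it is worth, ILDASM output does place assembly-level declarations between those braces. A minimal sketch, where the version number is made up purely for illustration:

```cil
// Assembly-level declarations live inside the braces.
.assembly Test1
{
    .ver 1:0:0:0    // hypothetical version number for this assembly
}
```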

Line 3 starts my first function, or in IL parlance a “.method”. I bet you were expecting to see some sort of class or module; I sure as heck was. But no, that is not required.  The ECMA documentation clearly says, “Methods can be defined at the global level (outside of any type)”.  Well now, if that’s the case, why not just go ahead and try it out?

Warning: Languages like VB and C# cannot call free-floating functions.

OK, so what else do we know about line 3? Well, it is an entry point function, so it returns either an integer or nothing. I chose nothing, hence the “void” keyword. As for “static”, it seems kinda silly. It turns out the assembler will infer that it is a static method from the context, but since we don’t want warnings we will go ahead and put it in there.
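
For comparison, here is a sketch of the other option, an entry point that returns an integer. The assembly name Test2 and the exit code 0 are arbitrary choices for the example:

```cil
.assembly extern mscorlib {}
.assembly Test2 {}                  // hypothetical assembly name
.method static int32 Main()
{
    .entrypoint
    ldstr      "Hello World"
    call       void [mscorlib]System.Console::WriteLine(string)
    ldc.i4.0                        // push the int32 constant 0
    ret                             // return it as the process exit code
}
```

The only differences from the void version are the int32 return type and the extra ldc.i4.0, which leaves exactly one value on the stack for ret to hand back.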

Line 5 says that this is the entry point for the executable. An executable must have one and only one entry point; the assembler won’t even try to sort it out if you break that rule.

Line 6 is translated as “load a string and push it onto the stack”. What follows is a QSTRING. From the spec:

QSTRING is a string surrounded by double quote (″) marks. Within the quoted string the character “\” can be used as an escape character, with “\t” representing a tab character, “\n” representing a newline character, and “\” followed by three octal digits representing a byte with that value. The “+” operator can be used to concatenate string literals. This way, a long string can be broken across multiple lines by using “+” and a new string on each line. An alternative is to use “\” as the last character in a line, in which case, that character and the line break following it are not entered into the generated string. Any white space characters (space, line-feed, carriage-return, and tab) between the “\” and the first non-white space character on the next line are ignored. [Note: To include a double quote character in a QSTRING, use an octal escape sequence. end note]

You don’t have to use a QSTRING. You could instead use a SQSTRING. I don’t know why you would though, because literally the only difference is that it is wrapped in single quotes instead of double quotes.
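
The QSTRING rules quoted above can be sketched as follows. This follows the spec text rather than anything verified against a particular ILASM build, and the assembly name QStrings is made up for the example:

```cil
.assembly extern mscorlib {}
.assembly QStrings {}               // hypothetical assembly name
.method static void Main()
{
    .entrypoint
    ldstr      "Hello\tWorld"                 // \t is a tab character
    call       void [mscorlib]System.Console::WriteLine(string)
    ldstr      "Hello " + "World"             // "+" concatenates two literals
    call       void [mscorlib]System.Console::WriteLine(string)
    ldstr      "She said \042Hello\042"       // octal 042 is the double quote character
    call       void [mscorlib]System.Console::WriteLine(string)
    ret
}
```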

Line 7 is a method call. (Not a function call; this is CIL and everything is a method.) Note that we have to be very explicit about what assembly the method lives in, what type wraps it, and which overload we desire.
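
To illustrate how the signature picks the overload, here is a sketch that calls the int32 overload of WriteLine instead of the string one. The assembly name Overloads and the value 42 are arbitrary:

```cil
.assembly extern mscorlib {}
.assembly Overloads {}              // hypothetical assembly name
.method static void Main()
{
    .entrypoint
    ldc.i4.s   42                   // push an int32 instead of a string
    // [assembly]Namespace.Type::Method(parameter types) selects WriteLine(int32)
    call       void [mscorlib]System.Console::WriteLine(int32)
    ret
}
```

The parameter list in the call has to match what is actually on the stack; pushing an int32 and then naming the string overload would not verify.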

Let’s take a moment to talk about the stack. When you want to call a method, you need to push the arguments onto the stack. The call pops those arguments, and when the method returns, if it has a non-void return type, the result will be pushed onto the stack. This means at run time your stacks are going to look something like this:

Before calling WriteLine      After calling WriteLine
------------------------      -----------------------
Pointer to “Hello World”      (nothing)
[not yours]                   [not yours]

If you stray into the “not yours” part of the stack, say by calling WriteLine twice in a row without pushing a second argument, then bad things will happen. The assembler doesn’t double-check the stack, so it won’t save you. There is a program called PEVerify that will warn you about potential stack underflows. If you don’t use it, you will find your error when your program crashes hard with an InvalidProgramException.
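
A sketch of the mistake and one way to avoid it. The underflowing call is left commented out so the listing stays valid; the assembly name TwoLines and the second string are invented for the example:

```cil
.assembly extern mscorlib {}
.assembly TwoLines {}               // hypothetical assembly name
.method static void Main()
{
    .entrypoint
    ldstr      "Hello World"
    call       void [mscorlib]System.Console::WriteLine(string)   // pops the string; our stack is now empty
    // Calling again right here would pop an argument that was never pushed:
    // call    void [mscorlib]System.Console::WriteLine(string)
    ldstr      "Hello again"        // push a fresh argument first
    call       void [mscorlib]System.Console::WriteLine(string)
    ret
}
```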

Line 8 is the return statement. If the function has a return value, it is copied from the top of the current stack to the top of the calling function’s stack. Since we aren’t returning anything, the stacks are not affected.
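
To see ret carrying a value, here is a sketch with a second global method that returns a string. GetGreeting and the assembly name Test3 are names invented for this example:

```cil
.assembly extern mscorlib {}
.assembly Test3 {}                  // hypothetical assembly name
.method static string GetGreeting() // hypothetical helper, defined at the global level
{
    ldstr      "Hello World"        // the return value sits on top of this method's stack
    ret                             // ...and lands on top of the caller's stack
}
.method static void Main()
{
    .entrypoint
    call       string GetGreeting() // afterwards the stack holds the returned string
    call       void [mscorlib]System.Console::WriteLine(string)
    ret                             // nothing left on the stack, nothing returned
}
```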

8 Comments »

  1. Nice introduction. Just wondering, do you know anything about making CIL code debuggable, such as adding source file paths and line numbers? I haven’t managed to find anything about that.

    Comment by LaurieCheers — January 4, 2010 @ 7:37 am

    • Sorry, I haven’t gotten that far yet. ILASM has an option to generate PDB files, but I don’t know if that will work. If you find anything please let me know, I’m going to need that in the near future.

      Comment by grauenwolf — January 4, 2010 @ 7:52 am

      • Yes, if you add /debug, and no other line directives are present, you can step debug IL (well sort of).

        Comment by leppie — January 4, 2010 @ 9:59 pm

    • There’s some information at http://www.codeproject.com/KB/dotnet/Debug_Framework_Classes.aspx?msg=913992 about making CIL debuggable via a convoluted process of disassembly and reassembly. However, I think what you meant was source language debugging rather than CIL level debugging. I too am writing a compiler for the CLR (Owl Basic) and would also like to know how to embed source debugging information while Reflection.Emitting the CIL.

      Comment by Rob Smallshire — January 9, 2010 @ 8:58 am

  2. Just FYI, VB.NET allows you to group code into “modules” as well as classes. It is not necessary to have classes in a VB.NET program. You can indeed call a “free floating function” in VB.

    I’ve never disassembled a simple VB program with one main function in a module, so I don’t know how close it looks to your hand-built CIL. I assume it looks closer than the equivalent program in C#.

    Comment by Steve Goldman — January 4, 2010 @ 7:48 am

    • A VB module is still a “.class” in IL. The only difference between a C# “static class” and a VB Module is a special attribute that the VB compiler adds. With a free floating function, there is no module, class, or any other sort of wrapper.

      Comment by grauenwolf — January 4, 2010 @ 7:57 am

  3. […] with loops, and return values from functions. This lesson assumes that you already understand the hello world […]

    Pingback by Learning IL – The Echo Program « API's that Suck — January 4, 2010 @ 4:27 pm

  4. ILASM will implicitly place the ‘global’ method in a ‘global’ class called ”. AFAIK, every assembly includes that class.

    Also note, all methods (members actually, as this is good for fields too) must be static.

    Comment by leppie — January 4, 2010 @ 10:01 pm

