API's that Suck

December 27, 2010

The Story of UltraBase: Chapter 2

Filed under: UltraBase — Grauenwolf @ 8:29 pm

Last time we introduced Fred and his UltraBase. The UltraBase wasn’t a production application, but rather a sophisticated code generator that would be used to create the middle tier that would be shared by all hundred or so applications used by SmallCo. Unfortunately it had some limitations that couldn’t be overcome.

One of the biggest stumbling blocks for the middle tier/website team was optional parameters on stored procedures. SmallCo often wrote procedures in which a given parameter may be used by some applications and ignored by others. The convention was quite inconsistent; sometimes the optional parameter would have a default value, other times a null would have to be passed in.

The website was the first application to start using the middle tier. Since it had limited functionality compared to the internally facing applications, it more than others ignored a lot of parameters on the stored procedures. When they saw this, Fred’s team became very concerned. Since they weren’t sure they could properly test web service wrappers around the stored procs, they decided that all of those parameters the website didn’t use wouldn’t be available via the middle tier either.

Unfortunately the UltraBase didn’t support skipping parameters, nor did it support defaulting them to null. So after much arguing, and a near revolt by the database team, it was decided that Fred’s team would create roughly 200 stored procedures that simply wrapped the real stored procedures and hid the optional parameters.

I still wonder what they plan to do when asked to wrap the other 7,400 stored procedures we currently use.

December 23, 2010

The Story of UltraBase: Chapter 1

Filed under: UltraBase — Grauenwolf @ 8:27 pm

Like many financial companies, SmallCo had a large stable of applications. The most important of these was a very large Visual Basic 6 application that most of the sales and trading teams used. There were also dozens of small .NET applications that did back-end processing and a website used by a handful of customers that didn’t want to talk to a salesman. Since SmallCo had a tiny IT department and almost no testing budget, it relied on being able to quickly change applications even during the middle of the business day. A key part of this strategy was putting most of the business logic into stored procedures.

This plan worked surprisingly well for a long time, but eventually SmallCo started to evolve into LargeCo. LargeCo’s business model couldn’t tolerate the frequent break-downs and emergency fixes that were part of day to day life at SmallCo. In addition, having all of the business logic in the database was causing them to purchase larger and larger databases until they reached the point where upgrades simply were not feasible. Something had to change.

Fred was hired to make this change. His mandate was to create a middle tier accessible by all applications. All stored procedure calls would go through this middle tier, and over time the logic would be pulled out of the procs and moved into the middle tier itself. The process would start with the website, which was reasonably manageable and was scheduled to be replaced eventually anyways.

To make the process easier, Fred and his team created the UltraBase. The UltraBase was sold to management as just a code generator. Developers would record all of the stored procedure names, parameters, and result sets in the UltraBase’s configuration database and out would pop the middle tier. They could then start the process of moving the business logic out of the database.

After six months of development, the process of calling a stored procedure became this:

[Image: clip_image002]

Note that every assembly below the website is 100% code-generated. Even the configuration database is code-generated from another database. Absolutely no hand-written code is allowed in the middle tier. If something cannot be expressed using the UltraBase’s configuration tables it has to be moved up into the website or down into the stored procedures.

August 19, 2010

What is a weakly typed language and what does that imply?

Filed under: Language Design — Grauenwolf @ 12:18 pm

I have a simple test for weakness in a type system.

Are buffer overruns possible?

The first reason we invented type systems was to gain the ability to prove that a given piece of memory has structurally valid data. Without array bound checking you lose that ability. You cannot definitively say anything about a portion of a program without examining the whole program.

Consider this fragment where aString and anArray are local variables:

    aString = "Tom";
    anArray[10] = 81;

What is the value of aString?

In a strongly typed language, aString has the semantic value of “Tom”. It doesn’t matter if aString is an array of char, a pointer to an array of char, or a reference to a String object; the semantic meaning is well known.

In a weakly typed language you can’t tell me anything about aString unless you know how both it and anArray are laid out in memory.  It could be “Tom”, “Qom”, “TQm”, “ToQ”, or even “TomQasdasdajshd akjasghkjd asgkudhasgdoiaughd asjbvhd”.

In weakly typed languages like C, C++, and TurboPascal this leads to vulnerabilities like buffer overrun attacks as well as hard-to-understand errors. I included the last one because I’ve seen students using TurboPascal have a similar problem to the one I showed above. They literally spent an hour staring at code trying to figure out why their variable had the wrong value before one of them decided to just “pad” the array.
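The effect can be simulated in a memory-safe language by modeling memory as one flat byte buffer. This is only a sketch: the layout (a 10-byte array immediately followed by the string) and the names `memory`, `AN_ARRAY`, and `A_STRING` are assumptions made for illustration, not something any real compiler guarantees.

```python
# Model "memory" as one flat buffer with no boundaries between variables.
memory = bytearray(16)

AN_ARRAY = 0    # hypothetical layout: anArray occupies bytes 0..9
A_STRING = 10   # aString starts right after it, at bytes 10..12


def store_string(offset, text):
    """Copy the ASCII bytes of text into memory starting at offset."""
    data = text.encode("ascii")
    memory[offset:offset + len(data)] = data


def load_string(offset, length):
    """Read length bytes starting at offset and decode them as ASCII."""
    return memory[offset:offset + length].decode("ascii")


store_string(A_STRING, "Tom")   # aString = "Tom";
memory[AN_ARRAY + 10] = 81      # anArray[10] = 81;  one element past the end

result = load_string(A_STRING, 3)
print(result)
```

Under this assumed layout the write lands on the first byte of the string, so it no longer reads “Tom”. Move `A_STRING` to a different offset and the answer changes, which is exactly why nothing can be said about aString without knowing the layout.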

Implicit Casting

It is often said that languages with implicit casting are weakly typed. I would argue that languages that allow objects to implicitly cast are strongly typed by necessity. Consider this:

    object anObject = "17"

    int aNumber = anObject

In order to implicitly cast anObject into aNumber, the object that anObject refers to must know its own type at runtime. By extension, if the object knows its own type then the runtime will ensure that the memory under that object is only mutated in a way that is consistent with its type.

That isn’t to say that implicit casting requires strong typing. You can still implicitly cast if your variables know what type the object is supposed to be.

    string aString = "17"

    int aNumber = aString

Dynamic Typing, Dynamic Binding, Late Binding, and Duck Typing

For our purposes these terms all mean the same thing, which is that you compile with a method name and don’t bind to the actual method until runtime. This is used in a wide variety of languages and technologies including Smalltalk, Objective-C, OLE/COM, Visual Basic, and C# 4+. It is different than “dynamic dispatching”, which is usually implemented by binding to a slot in a v-table with a well-known layout. In C++ this is done at compile time while Java and .NET defer it until the JIT compiler is run against the class.

Since the compiler has no idea what object will be in any given variable, this again requires the object to know its own type. (Note I said “type”, not “class”. In some dynamic languages each object is its own unique type.) And as before, an object that knows its own type works with the runtime to prevent random garbage from overwriting its memory.

August 12, 2010

All, and I mean literally all, version control systems are an embarrassment to our industry.

Filed under: Uncategorized — Grauenwolf @ 10:39 am

To start with, all systems that physically tie branches to directory layouts are just plain stupid. Code should be organized by the structure of the project. When you have to contort it to the needs of the VCS then something is wrong.

The stream-based systems like AccuRev and ClearCase are closer to what VCS should be. But last time I checked AccuRev didn’t even have any support for build management. What’s the point of having the ability to easily move features from one stream to another if you can’t then build the damn thing? As for ClearCase, it is so buggy and temperamental you need a team just to babysit it.

The file system should be playing an active role too. Without any action by the developer, the file system and VCS should be tracking every revision to every file. And those revisions should remain available to me until I “checkpoint” the file and commit them to my private branch.

Check-ins should be based on features, not just change-sets. When I commit a set of files it should prompt me to strongly associate it with an issue or project number. Not just a soft-link like TFS does, I want a solid association where I can move the whole feature from one branch/stream to another branch/stream.

When pulling down code, I should be able to say “Get this branch/stream plus features X, Y, and Z”. I should then be able to say “Remove feature Y and recompile”. For this to work we also need to be able to indicate what the real dependencies are between features. Simply guessing that A requires B but not C because A and B have overlapping files is a good start, but we need the option to mark C as required as well.

Blind text comparisons aren’t enough; version control systems need to deeply understand project and source files. When I add or remove a feature X from my local workspace, it should know how to update my project file accordingly. And it shouldn’t get confused just because project X and project Y both added the same file, but by chance put them in a different order in the project file.

Unit tests should never run on my machine. There should be a separate machine watching the changes on my machine and rerunning the applicable unit tests every time I change a file. When I notice something go red, the real-time versioning I mentioned in the third paragraph can be used to show me exactly what I did to break the test.

June 7, 2010

Web Apps: Because you don’t have time to piss off users one at a time.

Filed under: Uncategorized — Grauenwolf @ 10:49 pm

image

March 29, 2010

Security, Superstitions, and Stackoverflow

Filed under: Uncategorized — Grauenwolf @ 2:37 pm

Today I needed to translate a Stackoverflow post. Most translators including Google and Bing use frames, which Stackoverflow doesn’t like. So I wrote to them to inform them of the problem.

This was their response:

Hello,

That’s present to prevent malicious framing, see:

http://www.codinghorror.com/blog/2009/06/we-done-been-framed.html

http://stackoverflow.com/questions/958997/frame-buster-buster-buster-code-needed

 

The first link has this conclusion,

Yes, Digg frames ethically, so your frame-busting of the DiggBar will appear to work. But if the framing site is evil, good luck. When faced with a determined, skilled adversary that wants to frame your content, all bets are off. I don’t think it’s possible to escape. So consider this a wakeup call: you should build clickjacking countermeasures as if your website could be framed at any time.

The second link includes step-by-step instructions to counter “frame-busting” code.

So basically their stance is that frame-busting code is worthless, but they are going to use it anyways. Am I missing something or have they completely lost their minds?

January 4, 2010

Learning IL – The Echo Program

Filed under: CIL — Grauenwolf @ 4:27 pm

The requirements for our echo program are pretty simple.

  1. Take any number of command line arguments.
  2. Print each argument on a separate line.
  3. Return the number of command line arguments.

In this lesson you will learn how to define parameters, work with arrays, work with loops, and return values from functions. This lesson assumes that you already understand the hello world code.

Version 1

Version 1 will deal with requirements 1 and 3. This will be enough to prove that the command line parameters are actually making it all the way to the program and the return value is making it back to the shell. In Windows you can use the command “ECHO %ERRORLEVEL%” immediately after running the program to see the return value. For this to work you need to run the program itself from the command line as well.

  1. .assembly extern mscorlib {}
  2. .assembly EchoProg {}
  3. .method static int32  Main(string[] args)
  4. {
  5.     .entrypoint
  6.     ldarg.0
  7.     ldlen
  8.     ret
  9. }

Line 1 is the core assembly reference. Again, I’m not asking for a specific version.

Line 2 is our program’s name. I’m calling it “EchoProg” because “echo” is reserved by the Windows command shell.

Line 3 is our entry point. Let’s compare it to the hello world version.

  1. .method static void Main()

As you can see, void has been replaced by the return type. In addition, we now have one argument in a C#-like style with the type preceding the argument name. Note that we don’t actually use the argument name anywhere; it is just there for reference.

Line 6 is the “load argument N onto the stack” command. In this case we are loading the “args” parameter, which is a pointer to an array of string.

Line 7 is the “replace the array on the stack with its length” command, also known as load length.

As I mentioned last time, the return command in line 8 takes the last value on the stack and places it on the calling stack, then exits the current function.

To make experimenting with this code a little easier, I wrote this little batch file. If you run it using version 1, you should get the number 4.

  1. ilasm EchoProg.il
  2. PEVerify EchoProg.exe
  3. EchoProg.exe "la la la" dee di doh
  4. ECHO %ERRORLEVEL%

Version 2

Before we jump into loops, let’s just try printing the first command line argument. The code for this is pretty straightforward.

  1. .assembly extern mscorlib {}
  2. .assembly EchoProg {}
  3. .method static int32  Main(string[] args)
  4. {
  5.     .entrypoint
  6.  
  7.     //printline code
  8.     ldarg.0
  9.     ldc.i4 0
  10.     ldelem.ref
  11.     call       void [mscorlib]System.Console::WriteLine(string)
  12.     
  13.     //return code
  14.     ldarg.0
  15.     ldlen
  16.     ret
  17. }

 

Line 7 is a comment. I believe you can use // at the end of any line, as I often see it in compiler-generated IL.

In line 8 we are loading the array onto the stack just as before.

In line 9 we see the “load a constant” command. The “.i4” suffix means that the constant is a 4-byte integer, a.k.a. a System.Int32. This is followed by the number 0, which we want. This gives us the following stack:

0

pointer to args

[not yours]

Line 10 is the “load element from array” command. The “.ref” suffix indicates that the array’s elements are object references rather than numeric values. After this command is executed both the array and the constant will be removed from the stack and a pointer to the first string will be in its place. This is shown in the spec under the “stack transition” heading.

 

[Image: the “stack transition” for ldelem.ref as shown in the spec]

Version 3

Ok, now things get tricky. We are going to have to start keeping track of an array index. And since there are no loops in CIL, you have to fake it using if statements and gotos. Before we run off and start coding, let’s write some pseudo code showing what we are trying to accomplish.

  1. Variables: index, arrayLength
  2. arrayLength = args.Length
  3. index = 0
  4. If index = arrayLength Then goto done
  5. Print args[index]
  6. index = index+1
  7. goto if
  8. done

Reminds me of BASICA back from the days of DOS, except you get labels instead of line numbers. Anyways, on to the code.

  1. .assembly extern mscorlib {}
  2. .assembly EchoProg {}
  3. .method static int32  Main(string[] args)
  4. {
  5.     .entrypoint
  6.     //Variables: index, arrayLength    
  7.     .locals init (
  8.         [0] int32 index,
  9.         [1] int32 arrayLength)
  10.  
  11.     //arrayLength = args.Length
  12.     ldarg.0
  13.     ldlen
  14.     stloc 1
  15.     
  16.     //index = 0
  17.     ldc.i4 0
  18.     stloc 0
  19.  
  20.     //If index = arrayLength Then goto done
  21. if: ldloc 0
  22.     ldloc 1
  23.     beq done
  24.  
  25.     //Print args[index]
  26.     ldarg.0
  27.     ldloc 0
  28.     ldelem.ref
  29.     call       void [mscorlib]System.Console::WriteLine(string)
  30.     
  31.     //index = index+1
  32.     ldloc 0
  33.     ldc.i4 1
  34.     add.ovf
  35.     stloc 0
  36.  
  37.     br if
  38.  
  39. done:
  40.  
  41.     //return code
  42.     ldarg.0
  43.     ldlen
  44.     ret
  45. }

On lines 6 thru 9 we see our variables being defined. It starts with the “.locals” directive. This is followed by “init”, which sets all the local variables to 0/null. According to the spec, all verifiable code must use the init keyword.

Each well formed local variable consists of three values. First is the slot index, next is the variable type, and finally there is the variable name. Of these only the type is actually required. You could write, “.locals init (int32, int32)”, but that is going to be a royal pain to understand later on.

Lines 11 thru 14 load the array length into a local variable. The only thing new here is the “store local” command, which takes a slot number as a parameter.

Lines 16 thru 18 load the value 0 into the index variable. Strictly speaking this isn’t necessary because we are using init, but I’ll leave it here for reference.

Lines 20 to 23 represent our if statement. It starts with the label “if:” so we can jump to it later. We could have used just about anything for our label, it doesn’t really matter once the code is assembled.

Then we use the “load local” command twice, once for each variable.

Finally we use the “branch if equal” command on line 23. If the top-two items on the stack are equal, we go to line labeled “done”.

Lines 25 thru 29 are our code to actually print the parameter. The only difference between this and the one in version 2 is that we are loading the index using ldloc instead of a constant value.

Lines 31 to 35 will increment our index variable. It does this by loading both the index and the constant 1 onto the stack. Then it calls the “add with overflow check” command. Finally it stores the result value back into the index variable.

Line 37 takes us back to the top of the loop using the “branch” command. It is important that this line is skipped if our if statement back on line 20 is true.
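To watch the stack and the two local slots change as the loop runs, the listing can be replayed on a toy interpreter. This is a rough sketch, not how the CLR works: the `run` function, the tuple encoding of instructions, and the `writeline` pseudo-instruction standing in for the call to Console::WriteLine are all made up for illustration, and Python integers never overflow, so `add.ovf` is treated as a plain add.

```python
def run(program, args):
    """Interpret a tiny, hypothetical subset of CIL on an evaluation stack."""
    # Map each label name to the position of its ("label", name) entry.
    labels = {op[1]: i for i, op in enumerate(program) if op[0] == "label"}
    stack, local_vars, output = [], [0, 0], []   # two slots, as with ".locals init"
    pc = 0
    while True:
        op, *operand = program[pc]
        pc += 1
        if op == "label":
            continue
        elif op == "ldarg.0":
            stack.append(args)
        elif op == "ldlen":
            stack.append(len(stack.pop()))
        elif op == "ldc.i4":
            stack.append(operand[0])
        elif op == "ldloc":
            stack.append(local_vars[operand[0]])
        elif op == "stloc":
            local_vars[operand[0]] = stack.pop()
        elif op == "ldelem.ref":
            index = stack.pop()
            array = stack.pop()
            stack.append(array[index])
        elif op == "add.ovf":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)   # no overflow check needed in Python
        elif op == "beq":
            right = stack.pop()
            left = stack.pop()
            if left == right:
                pc = labels[operand[0]]
        elif op == "br":
            pc = labels[operand[0]]
        elif op == "writeline":
            output.append(stack.pop())
        elif op == "ret":
            return stack.pop(), output


# The version 3 listing, transcribed instruction by instruction.
ECHO = [
    ("ldarg.0",), ("ldlen",), ("stloc", 1),            # arrayLength = args.Length
    ("ldc.i4", 0), ("stloc", 0),                       # index = 0
    ("label", "if"), ("ldloc", 0), ("ldloc", 1), ("beq", "done"),
    ("ldarg.0",), ("ldloc", 0), ("ldelem.ref",), ("writeline",),
    ("ldloc", 0), ("ldc.i4", 1), ("add.ovf",), ("stloc", 0),
    ("br", "if"),
    ("label", "done"), ("ldarg.0",), ("ldlen",), ("ret",),
]

exit_code, printed = run(ECHO, ["la la la", "dee", "di", "doh"])
print(exit_code, printed)
```

Every instruction either pushes onto the stack or pops from it; nothing operates on a local slot directly, which is why incrementing the index takes four instructions.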

Conclusions

I was surprised how much code it took to write this. Sure I knew I would have to write my own loops, but even basic stuff like incrementing local variables was much harder than I expected. I was expecting to find an “increment local” command, but it looks like we can’t do anything with values until they are on the stack.

Learning IL – Hello World in CIL

Filed under: CIL — Grauenwolf @ 6:38 am

If you are going to be writing your own programming language you need to spend a lot of time learning the target. In my case, the target is .NET’s Common Intermediate Language, also known as IL or CIL.  As per tradition, here is my hello world app written in CIL.

  1. .assembly extern mscorlib {}
  2. .assembly Test1 {}
  3. .method static void Main()
  4. {
  5.     .entrypoint
  6.     ldstr      "Hello World"
  7.     call       void [mscorlib]System.Console::WriteLine(string)
  8.     ret
  9. }

Line 1 says that I need mscorlib, but I don’t care which version. If it was important, then I would include some extra information in the brackets.

Line 2 is the name of my assembly.  I think I’m supposed to put assembly-level attributes in the brackets, but I’m not certain.

Line 3 starts my first function, or in IL parlance a “.method”. I bet you were expecting to see some sort of class or module, I sure as heck was. But no, that is not required.  The ECMA documentation clearly says, “Methods can be defined at the global level (outside of any type)”.  Well now, if that’s the case why not just go ahead and try it out.

Warning: Languages like VB and C# cannot call free-floating functions.

Ok, so what else do we know about line 3? Well it is an entry point function, so either it returns an integer or nothing. I chose nothing, hence the “void” keyword. As for “static”, it seems kinda silly. Turns out the assembler will infer that it is a static method from the context, but since we don’t want warnings we will go ahead and put that in there.

Line 5 says that this is the entry point for the executable. For executables you must have one and only one entry point, the assembler won’t even try to sort it out if you break that rule.

Line 6 is translated as “load a string and push it onto the stack”. What follows is a QSTRING. From the spec:

QSTRING is a string surrounded by double quote (″) marks. Within the quoted string the character “\” can be used as an escape character, with “\t” representing a tab character, “\n” representing a newline character, and “\” followed by three octal digits representing a byte with that value. The “+” operator can be used to concatenate string literals. This way, a long string can be broken across multiple lines by using “+” and a new string on each line. An alternative is to use “\” as the last character in a line, in which case, that character and the line break following it are not entered into the generated string. Any white space characters (space, line-feed, carriage-return, and tab) between the “\” and the first non-white space character on the next line are ignored. [Note: To include a double quote character in a QSTRING, use an octal escape sequence. end note]

You don’t have to use a QSTRING. You could instead use a SQSTRING. I don’t know why you would though, because literally the only difference is that it is wrapped in single quotes instead of double quotes.

Line 7 is a method call. (Not a function call; this is CIL and everything is a method.) Note that we have to be very explicit about what assembly the method lives in, what type wraps it, and which overload we desire.

Let’s take a moment to talk about the stack. When you want to call a method you need to push the arguments onto the stack. When the method returns, if it has a non-void return type, the result will be pushed onto the stack. This means at run time your stacks are going to look something like this:

Before calling WriteLine      After calling WriteLine
Pointer to “Hello World”      (empty)
[not yours]                   [not yours]

If you stray into the “not yours” part of the stack, say by calling WriteLine twice in a row, then bad things will happen. The assembler doesn’t care about double-checking the stack, so that won’t save you. There is a program called PEVerify that will warn you about potential stack underflows. If you don’t use it, then you will find your error when your program crashes hard with an InvalidProgramException.
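The kind of bookkeeping needed to spot an underflow can be sketched by counting pushes and pops. This is only an illustration of the idea: the `STACK_EFFECT` table and `check_stack` function are hypothetical, the check handles straight-line code only, and a real verifier such as PEVerify also tracks types and branches.

```python
# (pops, pushes) for the few instructions used in hello world.
# A hypothetical table written for this sketch, not taken from any tool.
STACK_EFFECT = {
    "ldstr": (0, 1),                    # pushes a string reference
    "call WriteLine(string)": (1, 0),   # pops its argument, returns void
    "ret": (0, 0),                      # void method, nothing to return
}


def check_stack(instructions):
    """Return the index of the first instruction that would underflow, or None."""
    depth = 0
    for i, name in enumerate(instructions):
        pops, pushes = STACK_EFFECT[name]
        if depth < pops:
            return i   # this instruction needs more values than we own
        depth += pushes - pops
    return None


good = ["ldstr", "call WriteLine(string)", "ret"]
bad = ["ldstr", "call WriteLine(string)", "call WriteLine(string)", "ret"]

print(check_stack(good))
print(check_stack(bad))
```

The second call to WriteLine in `bad` is the one that reaches into the “not yours” part of the stack, so that is the instruction the check flags.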

Line 8 is the return statement. If the function has a return value, it is copied from the top of the current stack to the top of the calling function’s stack. Since we aren’t returning anything, the stacks are not affected.

January 2, 2010

Foundry – Types of Variables and Variable Types

Filed under: Foundry — Grauenwolf @ 9:30 pm

It seems I’ve gotten a bit ahead of myself. Why talk about passing variables to functions before you figure out how to define a variable?

One of the things that really sucks about .NET is that variables are nullable and mutable by default. While I have never seen any convincing evidence for getting rid of either, there are plenty of reasons to limit them as much as possible. So whatever is decided needs to tackle both aspects.

Between VB and C#, I much prefer VB’s syntax. It is easier to parse, with its leading keyword, and looks the same whether or not you are using type inference. But it still has flaws such as the totally unnecessary “As” keyword. Using “dim” versus “const” sounds good, so let’s start there.

Define

Define variables are variables that can only be set once and not changed from there on out. Foundry doesn’t expect programmers to distinguish between statically compiled, CLR-style constants and variables that are simply assign-once; that’s what compilers are for. Likewise, type is inferred from the expression.

<Define-Statement> := “Define” <Identifier> “=” <Expression>

Declare

Declare variables are variables whose value can be changed. They take a type, which can be made nullable using the ‘?’ operator. An initial value is required if the type is left off or the variable is not nullable.

<Declare-Statement> := “Declare” <Identifier> [<Type> [ “?” ]]  “=” <Expression>
<Declare-Statement> := “Declare” <Identifier> <Type> “?” [ “=” <Expression> ]

At some point there should be some escape analysis to make it possible to leave off the expression for non-nullable variables, but it will have to wait.

String Literals

There is another post on this one.

Date/Time Literals

Unless someone else has a reason otherwise, I’m going to adopt VB’s style of date literals for both dates and times. T-SQL just uses the string literal syntax, but that means you don’t get type inference.

<Time-Literal> := #hh:mm:ss.fffff#
<Date-Literal> := #yyyy-MM-dd#
<DateTime-Literal> := #yyyy-MM-ddThh:mm:ss.fffff#
<DateTimeOffset-Literal> := #yyyy-MM-ddThh:mm:ss.fffffZ# | #yyyy-MM-ddThh:mm:ss.fffff [+/-] hh:mm#
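Because the four literal forms differ only in shape, a lexer could tell them apart before any type inference happens. The sketch below is one possible reading of the grammar, with several assumptions: the fractional seconds are required but may be one to five digits, the spaces around the offset sign are optional, and no range checking is done on the digits. The names `PATTERNS` and `classify_literal` are invented for the example.

```python
import re

# Building blocks; digit counts follow the grammar above, ranges are not checked.
TIME = r"\d{2}:\d{2}:\d{2}\.\d{1,5}"
DATE = r"\d{4}-\d{2}-\d{2}"
OFFSET = r"(?:Z| ?[+-] ?\d{2}:\d{2})"

PATTERNS = [
    ("Time", re.compile(rf"#{TIME}#")),
    ("Date", re.compile(rf"#{DATE}#")),
    ("DateTime", re.compile(rf"#{DATE}T{TIME}#")),
    ("DateTimeOffset", re.compile(rf"#{DATE}T{TIME}{OFFSET}#")),
]


def classify_literal(text):
    """Return the name of the literal type that text matches, or None."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return name
    return None


print(classify_literal("#2010-01-02#"))
print(classify_literal("#2010-01-02T21:30:00.00000Z#"))
```

A plain string literal such as "2010-01-02" never matches because it lacks the # delimiters, which is what lets the compiler infer a date type rather than a string.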

Numeric Literals

If the number contains a decimal point, then it is treated as Decimal. Integers default to Int32. If they don’t fit, they will be automatically treated as an Int64. If they don’t fit within that either, a Decimal is used.
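The fallback chain is easy to get subtly wrong at the boundaries, so here is a small sketch of it. The function name `infer_numeric_type` is made up, the literal is assumed to arrive as text with an optional leading sign, and the sketch does not check whether a value also fits within the range of a .NET Decimal.

```python
from decimal import Decimal

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def infer_numeric_type(literal):
    """Return the type name a numeric literal would get under the rules above."""
    if "." in literal:
        Decimal(literal)   # raises if the text is not a valid number
        return "Decimal"
    value = int(literal)
    if INT32_MIN <= value <= INT32_MAX:
        return "Int32"
    if INT64_MIN <= value <= INT64_MAX:
        return "Int64"
    return "Decimal"       # too large for any integer type considered here


for text in ["42", "3.14", "2147483648", "9223372036854775808"]:
    print(text, infer_numeric_type(text))
```

Note that 2147483648 is one past the largest Int32 and 9223372036854775808 is one past the largest Int64, so they are the smallest positive literals that get promoted.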

Foundry: What kind of string literals should be used by default?

Filed under: Foundry — Grauenwolf @ 9:21 pm

There are a lot of options for string literals, but which to choose as the default? I’m inclined to choose verbatim strings because I’m primarily a VB and T-SQL programmer, but the other forms do have merits.

Verbatim Strings

The easiest is the one used by CSV, VB, and SQL, wherein there are no escape sequences except for quotes. C# uses this as a secondary form, accessed using @" instead of the usual ".

Escaped Strings

Escaped strings are what languages like Java and C# use by default. They use the backslash to start an escape sequence. This makes strings that actually use backslashes, like file paths, annoying.

Interpolated Strings

String interpolation is where variables and expressions can be inlined right into the string literal. Under the covers this would be implemented using the String.Format function call. This could be combined with either of the two above formats.
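As a rough sketch of what “implemented using String.Format” could mean, a compiler might split the literal into a numbered template plus a list of expressions to evaluate. The curly-brace syntax for embedded expressions is an assumption, since no syntax has been chosen here, and `lower_interpolated` is a made-up name. The sketch also ignores how a literal brace would be escaped.

```python
import re


def lower_interpolated(literal):
    """Split a literal into a String.Format-style template and its expressions."""
    expressions = []

    def replace(match):
        # Record the expression text and substitute its positional index.
        expressions.append(match.group(1).strip())
        return "{" + str(len(expressions) - 1) + "}"

    template = re.sub(r"\{([^{}]+)\}", replace, literal)
    return template, expressions


template, expressions = lower_interpolated(
    "Hello {name}, you have {count + 1} messages"
)
print(template)
print(expressions)
```

The compiler would then emit a call equivalent to String.Format(template, name, count + 1), so interpolation needs no runtime support beyond what already exists.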

Why not all of the above?

C# already offers two of the three by tacking an extra symbol to the front of a string to indicate it should be handled differently. But which to choose as the default?
