API's that Suck

August 19, 2010

What is a weakly typed language and what does that imply?

Filed under: Language Design - Grauenwolf @ 12:18 pm

I have a simple test for weakness in a type system.

Are buffer overruns possible?

The first reason we invented type systems was to gain the ability to prove that a given piece of memory has structurally valid data. Without array bounds checking you lose that ability. You cannot definitively say anything about a portion of a program without examining the whole program.

Consider this fragment where aString and anArray are local variables:

    aString = "Tom";
    anArray[10] = 81;

What is the value of aString?

In a strongly typed language, aString has the semantic value of “Tom”. It doesn’t matter if aString is an array of char, a pointer to an array of char, or a reference to a String object; the semantic meaning is well known.

In a weakly typed language you can’t tell me anything about aString unless you know how both it and anArray are laid out in memory. It could be “Tom”, “Qom”, “TQm”, “ToQ”, or even “TomQasdasdajshd akjasghkjd asgkudhasgdoiaughd asjbvhd”.
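To make the contrast concrete, here is a minimal sketch in Python, a language that checks array bounds at runtime. The names a_string, an_array, and store are hypothetical stand-ins for the fragment above, and the array length of 4 is an assumption chosen so that index 10 falls outside it.

```python
# Hypothetical names mirroring the fragment above.
a_string = "Tom"
an_array = [0] * 4  # assumed length: 4 elements, so index 10 is out of bounds


def store(array, index, value):
    """Try to write array[index]; report whether the write was allowed."""
    try:
        array[index] = value
        return True
    except IndexError:
        # The runtime rejects the write instead of touching neighboring memory.
        return False


write_succeeded = store(an_array, 10, 81)
```

The out-of-range write is refused, so a_string can still be reasoned about locally: it is “Tom” no matter how the two variables happen to sit in memory.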

In weakly typed languages like C, C++, and TurboPascal this leads to vulnerabilities like buffer overrun attacks as well as hard-to-understand errors. I included TurboPascal because I’ve seen students using it run into a problem similar to the one I showed above. They literally spent an hour staring at code trying to figure out why their variable had the wrong value before one of them decided to just “pad” the array.

Implicit Casting

It is often said that languages with implicit casting are weakly typed. I would argue that languages that allow objects to implicitly cast are strongly typed by necessity. Consider this:

    object anObject = "17"

    int aNumber = anObject

In order to implicitly cast anObject into aNumber, the object that anObject refers to must know its own type at runtime. By extension, if the object knows its own type then the runtime will ensure that the memory under that object is only mutated in a way that is consistent with its type.
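A rough sketch of what such a conversion has to do, written in Python. Python does not perform this cast implicitly, so implicit_to_int is a hypothetical helper standing in for the work a runtime would do behind the scenes; the point is only that the conversion is chosen by asking the object for its type.

```python
def implicit_to_int(an_object):
    """Hypothetical stand-in for an implicit object-to-int conversion."""
    runtime_type = type(an_object)  # the object reports its own type
    if runtime_type is int:
        return an_object
    if runtime_type is str:
        return int(an_object)  # parse the text form of the number
    raise TypeError(f"cannot convert {runtime_type.__name__} to int")


an_object = "17"
a_number = implicit_to_int(an_object)
```

If the object could not report its type, there would be no way to pick the string-parsing branch over the pass-through branch.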

That isn’t to say that implicit casting requires strong typing. You can still implicitly cast if your variables know what type the object is supposed to be.

    string aString = "17"

    int aNumber = aString

Dynamic Typing, Dynamic Binding, Late Binding, and Duck Typing

For our purposes these terms all mean the same thing: you compile with a method name and don’t bind to the actual method until runtime. This is used in a wide variety of languages and technologies, including Smalltalk, Objective-C, OLE/COM, Visual Basic, and C# 4+. It is different from “dynamic dispatching”, which is usually implemented by binding to a slot in a v-table with a well-known layout. In C++ this is done at compile time, while Java and .NET defer it until the JIT compiler is run against the class.

Since the compiler has no idea what object will be in any given variable, this again requires the object to know its own type. (Note I said “type”, not “class”. In some dynamic languages each object is its own unique type.) And as before, an object that knows its own type works with the runtime to prevent random garbage from overwriting its memory.
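As an illustration of binding by name at runtime, here is a small Python sketch. The classes Duck and Robot and the helper call_by_name are made up for this example; getattr is the standard built-in that looks up an attribute by its name.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def call_by_name(target, method_name):
    """Look up method_name on target at runtime, then invoke it."""
    method = getattr(target, method_name)  # binding happens here, at runtime
    return method()


# The two classes share no base class; only the method name matters.
results = [call_by_name(obj, "speak") for obj in (Duck(), Robot())]
```

The caller carries nothing but the string "speak". Each object resolves that name against its own type, which is only possible because the object knows what that type is.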

