API's that Suck

August 19, 2010

What is a weakly typed language and what does that imply?

Filed under: Language Design — Grauenwolf @ 12:18 pm

I have a simple test for weakness in a type system.

Are buffer overruns possible?

The first reason we invented type systems was to gain the ability to prove that a given piece of memory holds structurally valid data. Without array bounds checking you lose that ability: you cannot definitively say anything about one portion of a program without examining the whole program.

Consider this fragment where aString and anArray are local variables:

    aString = "Tom";
    anArray[10] = 81;

What is the value of aString?

In a strongly typed language, aString has the semantic value of “Tom”. It doesn’t matter whether aString is an array of char, a pointer to an array of char, or a reference to a String object; the semantic meaning is well known.

In a weakly typed language you can’t tell me anything about aString unless you know how both it and anArray are laid out in memory. It could be “Tom”, “Qom”, “TQm”, “ToQ”, or even “TomQasdasdajshd akjasghkjd asgkudhasgdoiaughd asjbvhd”.

In weakly typed languages like C, C++, and Turbo Pascal this leads to vulnerabilities such as buffer overrun attacks, as well as hard-to-understand errors. I included that last example because I’ve seen students using Turbo Pascal hit a similar problem to the one I showed above. They literally spent an hour staring at code, trying to figure out why their variable had the wrong value, before one of them decided to just “pad” the array.
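For contrast, here is the same two-line fragment in a strongly typed runtime. This is a minimal Java sketch (Java chosen for illustration; the class and variable names are mine, not the author’s): the out-of-bounds write is rejected at runtime, so aString provably keeps its semantic value.

```java
public class BoundsCheckDemo {
    public static void main(String[] args) {
        String aString = "Tom";
        int[] anArray = new int[5];   // only indices 0..4 are valid
        try {
            anArray[10] = 81;         // out of bounds: checked at runtime
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: index 10 rejected");
        }
        // No adjacent memory was overwritten, so aString is untouched.
        System.out.println(aString);  // prints "Tom"
    }
}
```

In C or Turbo Pascal the same write would silently scribble over whatever happens to sit next to anArray in memory, which is exactly the failure mode described above.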

Implicit Casting

It is often said that languages with implicit casting are weakly typed. I would argue that languages that allow objects to be implicitly cast are strongly typed by necessity. Consider this:

    object anObject = "17";

    int aNumber = anObject;

In order to implicitly cast anObject into aNumber, the object that anObject refers to must know its own type at runtime. By extension, if the object knows its own type then the runtime can ensure that the memory under that object is only mutated in ways consistent with its type.
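A minimal Java sketch of that runtime check (in Java the cast must be written explicitly, but the point stands: the check is performed against the object’s own type, not the variable’s):

```java
public class RuntimeTypeDemo {
    public static void main(String[] args) {
        Object anObject = "17";  // the object itself records that it is a String
        try {
            // The cast is validated at runtime against the object's real type.
            Integer aNumber = (Integer) anObject;
            System.out.println(aNumber);
        } catch (ClassCastException e) {
            System.out.println("cast rejected: object is a "
                + anObject.getClass().getSimpleName());
        }
    }
}
```

The cast fails precisely because the object knows it is a String, not an Integer; that self-knowledge is what makes the type system strong.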

That isn’t to say that implicit casting requires strong typing. You can still implicitly cast if your variables know what type the object is supposed to be.

    string aString = "17";

    int aNumber = aString;
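In Java terms this is a conversion driven by the variable’s declared type: because the compiler already knows aString is a String, it can emit a parse without consulting the object at runtime. A minimal sketch (names are illustrative):

```java
public class DeclaredTypeDemo {
    public static void main(String[] args) {
        String aString = "17";
        // The compiler knows aString's static type, so the conversion
        // can be chosen at compile time; no runtime type tag is needed.
        int aNumber = Integer.parseInt(aString);
        System.out.println(aNumber + 1);  // prints 18
    }
}
```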

Dynamic Typing, Dynamic Binding, Late Binding, and Duck Typing

For our purposes these terms all mean the same thing: you compile against a method name and don’t bind to the actual method until runtime. This is used in a wide variety of languages and technologies, including Smalltalk, Objective-C, OLE/COM, Visual Basic, and C# 4+. It is different from “dynamic dispatch”, which is usually implemented by binding to a slot in a v-table with a well-known layout. In C++ that binding is done at compile time, while Java and .NET defer it until the JIT compiler is run against the class.

Since the compiler has no idea what object will be in any given variable, this again requires the object to know its own type. (Note I said “type”, not “class”. In some dynamic languages each object is its own unique type.) And as before, an object that knows its own type works with the runtime to prevent random garbage from overwriting its memory.
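The same name-to-method lookup can be sketched in Java via reflection (an illustrative stand-in for languages where every call works this way): the method is found by name at runtime, and the lookup only succeeds because the object carries its own type.

```java
import java.lang.reflect.Method;

public class LateBindingDemo {
    public static void main(String[] args) throws Exception {
        Object target = "hello";  // the variable could hold any object
        // Resolve the method by name at runtime, not compile time.
        Method m = target.getClass().getMethod("length");
        Object result = m.invoke(target);
        System.out.println(result);  // prints 5
    }
}
```

If target held an object with no length method, getMethod would throw at runtime; that failure mode is the price of deferring binding.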


August 12, 2010

All, and I mean literally all, version control systems are an embarrassment to our industry.

Filed under: Uncategorized — Grauenwolf @ 10:39 am

To start with, all systems that physically tie branches to directory layouts are just plain stupid. Code should be organized by the structure of the project. When you have to contort it to the needs of the VCS, something is wrong.

The stream-based systems like AccuRev and ClearCase are closer to what a VCS should be. But last time I checked, AccuRev didn’t have any support for build management. What’s the point of being able to easily move features from one stream to another if you can’t then build the damn thing? As for ClearCase, it is so buggy and temperamental that you need a team just to babysit it.

The file system should be playing an active role too. Without any action by the developer, the file system and VCS should be tracking every revision to every file. And those revisions should remain available to me until I “checkpoint” the file and commit them to my private branch.

Check-ins should be based on features, not just change-sets. When I commit a set of files it should prompt me to strongly associate it with an issue or project number. Not just a soft link like TFS has; I want a solid association where I can move the whole feature from one branch/stream to another branch/stream.

When pulling down code, I should be able to say “Get this branch/stream plus features X, Y, and Z”. I should then be able to say “Remove feature Y and recompile”. For this to work we also need to be able to indicate what the real dependencies are between features. Simply guessing that A requires B but not C because A and B have overlapping files is a good start, but we need the option to mark C as required as well.

Blind text comparisons aren’t enough; version control systems need to deeply understand project and source files. When I add or remove a feature X from my local workspace, the VCS should know how to update my project file accordingly. And it shouldn’t get confused just because project X and project Y both added the same file but, by chance, put it in a different order in the project file.

Unit tests should never run on my machine. There should be a separate machine watching the changes on my machine and rerunning the applicable unit tests every time I change a file. When I notice something go red, the real-time versioning I mentioned in the third paragraph can be used to show me exactly what I did to break the test.
