Time and time again I see examples of immutable collections. These are invariable a waste of time because immutable collections are a solved problem. People often say things such as:
A pure function over immutable stacks can be safely called from multiple threads with no need for locking.
Ok, that’s true. But why use a so-called “immutable stack”? Consider these two declarations:
int Foo( ReadOnlyStack<Record> records)
int Foo( IEnumerable<Record> records)
As a developer using someone else’s API, what would you prefer? Would you like to explicitly convert your collection into a ReadOnlyStack? Or would you want the API just to make it’s own stack based on what you pass?
I know some people are thinking “but I need it to be immutable so I don’t have to take a lock”. Fine. If you really can’t trust the developer then go ahead and demand the standard read-only collection. In .NET this would be:
int Foo( ReadOnlyCollection<Record> records)
You still don’t have to make your users muck about with some sort of “ReadOnlyStack”, which is just an implementation detail anyways.
The next argument I see is the claim that this will somehow make for better performance because you don’t have to make a copy. That is clearly a violation of the premature optimization rule.
Advantages of using a shared linked-list
- Initial copies are essentially free.
- Stacks can grow forever without expensive copy operations
Disadvantages of using a shared linked-list
- Initial size is larger than array-based stacks
- As they are used, they quickly diverge and stop sharing nodes
- They are less likely to be sequential in memory, especially as they age. (Can you say cache miss?)
Clearly there are going to be times where pre-allocating an array based stack is going to give you better performance than a stack that is scattered across memory. I can’t say array-based stacks are always going to be better, but why take the risk? If you just request an IEnumerable at the beginning, you can use whatever type of stack your performance testing dictates. You can even special-case the ReadOnlyStack if you feel it is necessary.
So what is the problem?
Complex, immutable data types are the interesting problem. Currently in platforms like Java and .NET, there is no easy way to make a class that has both mutable and immutable variants.
More often than not developers want mutable objects when they are constructing it. Data may be coming from multiple sources or even be aggregated as it comes in. Only once the class is finalized do we want
Currently we can do this by using interfaces, which are hard to change, and tons of boilerplate code.
Const in .NET
One option would be to add the concept of a Const modifier to C# and Visual Basic. The rules would be simple, if you specify const than you get a compiler error when accessing non-pure methods and properties.
I don’t really like this one for two reasons.
- Cheating is too easy. Anyone can simply cast away the const either directly or by
- There is no guarantee that someone won’t hold onto the mutable reference.
Still, there is something nice about the syntax.
Boxing for Purity?
In .NET every value type actually has two classes. There is the class named for the value type such as Integer or Point. Then there is the unnamed class that is used when the value is “boxed” and placed on the stack. Developers only have to deal with the first class, the runtime takes care of the second.
What if we had the same concept for purity in classes? Say your Customer class had an implied method called MakeImmutable. This would copy it into another class called “ImmutableCustomer” that was identical except:
- The class definition is created by the compiler or runtime.
- All methods/properties not marked as “Pure” are removed.
- All child objects are mapped to their Immutable variants.
- All Customer classes can be implicitly cast to a ImmutableCustomer.
- A mutable copy can be created by calling MakeMutable. This won’t return the original or allow modifying the read-only variant.
This cannot be done in the language itself, the runtime needs to support it in order to work out the inheritance/casting rules.
Performance Lessons from StringBuilder
The StringBuilder class has an interesting property. It’s underlying string buffer is mutable, but only until you call ToString. Once you do that, any further changes will cause a new string buffer is created from the now immutable String.
The same thing should be done for these auto-generated classes. When you call MakeImmutable, it sets a flag. The copy of the underlying data isn’t actually made until a future call to mutating method/property is made.
This will be tricky when dealing with child classes. There will be times when some classes have been cloned but other are still simply marked.
What happens when you implicitly cast objects to their ImmutableVersion?
I don’t know.
Calling MakeImmutable is out of the question. In a mutable context, that call would be way too expensive.
Perhaps we just do nothing. If the caller is passing a mutable object to a method, the caller has to take responsibility for any rules.
What happens when you assign a mutable object to an Immutable reference without calling MakeImmutable?
Again, I don’t know.
One could argue the caller should be responsible for discarding the mutable references or suffer the consequences.
Alternately the receiver could call MakeImplicit itself and thus be safe from future changes. But this may break the caller’s expectations that the objects are the same.
I’m gong with compiler warnings for now. The caller must either explicitly cast the object, keeping it mutable, or call MakeImmutable which creates a copy.