API's that Suck

May 28, 2009

Challenging SICP

Filed under: Uncategorized — Grauenwolf @ 6:30 am

Consider this logic problem in the realm of “nondeterministic computing”.

Baker, Cooper, Fletcher, Miller, and Smith live on different floors of an apartment house that contains only five floors. Baker does not live on the top floor. Cooper does not live on the bottom floor. Fletcher does not live on either the top or the bottom floor. Miller lives on a higher floor than does Cooper. Smith does not live on a floor adjacent to Fletcher’s. Fletcher does not live on a floor adjacent to Cooper’s. Where does everyone live?

This is the SICP solution to that problem:

 (define (multiple-dwelling)
  (let ((baker (amb 1 2 3 4 5))
        (cooper (amb 1 2 3 4 5))
        (fletcher (amb 1 2 3 4 5))
        (miller (amb 1 2 3 4 5))
        (smith (amb 1 2 3 4 5)))
    (require
     (distinct? (list baker cooper fletcher miller smith)))
    (require (not (= baker 5)))
    (require (not (= cooper 1)))
    (require (not (= fletcher 5)))
    (require (not (= fletcher 1)))
    (require (> miller cooper))
    (require (not (= (abs (- smith fletcher)) 1)))
    (require (not (= (abs (- fletcher cooper)) 1)))
    (list (list 'baker baker)
          (list 'cooper cooper)
          (list 'fletcher fletcher)
          (list 'miller miller)
          (list 'smith smith))))

SICP then goes on to explain how you need a “Metacircular Evaluator”, which is essentially a LISP parser that runs inside another LISP program. The author writes,

The driver loop for the amb evaluator is complex, due to the mechanism that permits the user to try again in evaluating an expression. The driver uses a procedure called internal-loop, which takes as argument a procedure try-again. The intent is that calling try-again should go on to the next untried alternative in the nondeterministic evaluation. Internal-loop either calls try-again in response to the user typing try-again at the driver loop, or else starts a new evaluation by calling ambeval.

Here are some of the components that go into the ambiguous evaluator.

(define (analyze-self-evaluating exp)
  (lambda (env succeed fail)
    (succeed exp fail)))

(define (analyze-quoted exp)
  (let ((qval (text-of-quotation exp)))
    (lambda (env succeed fail)
      (succeed qval fail))))

(define (analyze-variable exp)
  (lambda (env succeed fail)
    (succeed (lookup-variable-value exp env)
             fail)))

(define (analyze-lambda exp)
  (let ((vars (lambda-parameters exp))
        (bproc (analyze-sequence (lambda-body exp))))
    (lambda (env succeed fail)
      (succeed (make-procedure vars bproc env)
               fail))))

(define (analyze-if exp)
  (let ((pproc (analyze (if-predicate exp)))
        (cproc (analyze (if-consequent exp)))
        (aproc (analyze (if-alternative exp))))
    (lambda (env succeed fail)
      (pproc env
;; success continuation for evaluating the predicate
;; to obtain pred-value
             (lambda (pred-value fail2)
               (if (true? pred-value)
                   (cproc env succeed fail2)
                   (aproc env succeed fail2)))
;; failure continuation for evaluating the predicate
             fail))))

(define (analyze-sequence exps)
  (define (sequentially a b)
    (lambda (env succeed fail)
      (a env
;; success continuation for calling a
         (lambda (a-value fail2)
           (b env succeed fail2))
;; failure continuation for calling a
         fail)))
  (define (loop first-proc rest-procs)
    (if (null? rest-procs)
        first-proc
        (loop (sequentially first-proc (car rest-procs))
              (cdr rest-procs))))
  (let ((procs (map analyze exps)))
    (if (null? procs)
        (error "Empty sequence -- ANALYZE"))
    (loop (car procs) (cdr procs))))

(define (analyze-definition exp)
  (let ((var (definition-variable exp))
        (vproc (analyze (definition-value exp))))
    (lambda (env succeed fail)
      (vproc env                        
             (lambda (val fail2)
               (define-variable! var val env)
               (succeed 'ok fail2))
             fail))))

(define (analyze-assignment exp)
  (let ((var (assignment-variable exp))
        (vproc (analyze (assignment-value exp))))
    (lambda (env succeed fail)
      (vproc env
             (lambda (val fail2)        ; *1*
               (let ((old-value
                      (lookup-variable-value var env))) 
                 (set-variable-value! var val env)
                 (succeed 'ok
                          (lambda ()    ; *2*
                            (set-variable-value! var
                                                 old-value
                                                 env)
                            (fail2)))))
             fail))))

(define (analyze-application exp)
  (let ((fproc (analyze (operator exp)))
        (aprocs (map analyze (operands exp))))
    (lambda (env succeed fail)
      (fproc env
             (lambda (proc fail2)
               (get-args aprocs
                         env
                         (lambda (args fail3)
                           (execute-application
                            proc args succeed fail3))
                         fail2))
             fail))))

(define (get-args aprocs env succeed fail)
  (if (null? aprocs)
      (succeed '() fail)
      ((car aprocs) env
;; success continuation for this aproc
                    (lambda (arg fail2)
                      (get-args (cdr aprocs)
                                env
;; success continuation for recursive
;; call to get-args
                                (lambda (args fail3)
                                  (succeed (cons arg args)
                                           fail3))
                                fail2))
                    fail)))

 

(define (execute-application proc args succeed fail)
  (cond ((primitive-procedure? proc)
         (succeed (apply-primitive-procedure proc args)
                  fail))
        ((compound-procedure? proc)
         ((procedure-body proc)
          (extend-environment (procedure-parameters proc)
                              args
                              (procedure-environment proc))
          succeed
          fail))
        (else
         (error
          "Unknown procedure type -- EXECUTE-APPLICATION"
          proc))))

(define (analyze-amb exp)
  (let ((cprocs (map analyze (amb-choices exp))))
    (lambda (env succeed fail)
      (define (try-next choices)
        (if (null? choices)
            (fail)
            ((car choices) env
                           succeed
                           (lambda ()
                             (try-next (cdr choices))))))
      (try-next cprocs))))

(define input-prompt ";;; Amb-Eval input:")
(define output-prompt ";;; Amb-Eval value:")
(define (driver-loop)
  (define (internal-loop try-again)
    (prompt-for-input input-prompt)
    (let ((input (read)))
      (if (eq? input 'try-again)
          (try-again)
          (begin
            (newline)
            (display ";;; Starting a new problem ")
            (ambeval input
                     the-global-environment
;; ambeval success
                     (lambda (val next-alternative)
                       (announce-output output-prompt)
                       (user-print val)
                       (internal-loop next-alternative))
;; ambeval failure
                     (lambda ()
                       (announce-output
                        ";;; There are no more values of")
                       (user-print input)
                       (driver-loop)))))))
  (internal-loop
   (lambda ()
     (newline)
     (display ";;; There is no current problem")
     (driver-loop))))

This is the C# implementation in all its glory. As you can see, there are two simple functions to handle the initial cross join, one function to serve as a filter, and one more to check for distinct values. All the power of a mighty ambiguous evaluator in a tiny little package.

static void Main(string[] args)
{
    const int b = 0;
    const int c = 1;
    const int f = 2;
    const int m = 3;
    const int s = 4;
    
    int[] baker = { 1, 2, 3, 4, 5 };
    int[] cooper = { 1, 2, 3, 4, 5 };
    int[] fletcher = { 1, 2, 3, 4, 5 };
    int[] miller = { 1, 2, 3, 4, 5 };
    int[] smith = { 1, 2, 3, 4, 5 };

    var source = baker.CrossJoin(cooper)
.CrossJoin(fletcher)
.CrossJoin(miller)
.CrossJoin(smith);
    var result = source.Require(x => x[b] != 5)
                      .Require(x => x[c] != 1)
                      .Require(x => x[f] != 5)
                      .Require(x => x[f] != 1)
                      .Require(x => x[m] > x[c])
                      .Require(x => Math.Abs(x[s] - x[f]) > 1)
                      .Require(x => Math.Abs(x[f] - x[c]) > 1)
                      .Require(x => IsDistinct(x));
    
    foreach (var x in result)
    {
        Console.WriteLine("{ Baker:" + x[b] + " Cooper:" + x[c] + 
" Fletcher" + x[f] + " Miller:" + x[m]
+ " Smith" + x[s] + "}"); } } public static bool IsDistinct(params int[] source) { HashSet<int> temp = new HashSet<int>(source); return temp.Count == source.Count(); } static IEnumerable<T> Require<T>(this IEnumerable<T> source, Predicate<T> predicate) { foreach (T item in source) { if (predicate(item)) yield return item; } } static IEnumerable<T[]> CrossJoin<T>(this IEnumerable<T> source1, IEnumerable<T> source2) { foreach (T itemA in source1) { foreach (T itemB in source2) { yield return new T[] { itemA, itemB }; } } } static IEnumerable<T[]> CrossJoin<T>(this IEnumerable<T[]> source1, IEnumerable<T> source2) { foreach (T[] itemA in source1) { foreach (T itemB in source2) { var temp = itemA.ToList(); temp.Add(itemB); yield return temp.ToArray(); } } }
And for completeness, here is the LINQ solution in VB.
Dim baker = L(1, 2, 3, 4, 5)
Dim cooper = L(1, 2, 3, 4, 5)
Dim fletcher = L(1, 2, 3, 4, 5)
Dim miller = L(1, 2, 3, 4, 5)
Dim smith = L(1, 2, 3, 4, 5)

Dim sample = From b In baker _
    From c In cooper _
    From f In fletcher _
    From m In miller _
    From s In smith _
        Select b, c, f, m, s

Dim query = From row In sample _
Where IsDistinct(row.b, row.c, row.f, row.m, row.s) _ Where row.b <> 5 _ Where row.c <> 1 _ Where row.f <> 5 _ Where row.f <> 1 _ Where row.m > row.c _ Where Math.Abs(row.s - row.f) <> 1 _ Where Math.Abs(row.f - row.c) <> 1 For Each row In query Console.WriteLine() Console.WriteLine("Baker : " & row.b) Console.WriteLine("Cooper : " & row.c) Console.WriteLine("Fletcher : " & row.f) Console.WriteLine("Miller : " & row.m) Console.WriteLine("Smith : " & row.s) Next

May 26, 2009

Immutable Collections Are Easy, What We Need are Immutable Data Structures

Filed under: Uncategorized — Grauenwolf @ 10:40 pm

Time and time again I see examples of immutable collections. These are invariable a waste of time because immutable collections are a solved problem. People often say things such as:

A pure function over immutable stacks can be safely called from multiple threads with no need for locking.

API Design

Ok, that’s true. But why use a so-called “immutable stack”? Consider these two declarations:

int Foo( ReadOnlyStack<Record> records)
int Foo( IEnumerable<Record> records)

As a developer using someone else’s API, what would you prefer? Would you like to explicitly convert your collection into a ReadOnlyStack? Or would you want the API just to make it’s own stack based on what you pass?

I know some people are thinking “but I need it to be immutable so I don’t have to take a lock”. Fine. If you really can’t trust the developer then go ahead and demand the standard read-only collection. In .NET this would be:

int Foo( ReadOnlyCollection<Record> records)

You still don’t have to make your users muck about with some sort of “ReadOnlyStack”, which is just an implementation detail anyways.

Performance

The next argument I see is the claim that this will somehow make for better performance because you don’t have to make a copy. That is clearly a violation of the premature optimization rule.

Advantages of using a shared linked-list

  1. Initial copies are essentially free.
  2. Stacks can grow forever without expensive copy operations

Disadvantages of using a shared linked-list

  1. Initial size is larger than array-based stacks
  2. As they are used, they quickly diverge and stop sharing nodes
  3. They are less likely to be sequential in memory, especially as they age. (Can you say cache miss?)

Clearly there are going to be times where pre-allocating an array based stack is going to give you better performance than a stack that is scattered across memory. I can’t say array-based stacks are always going to be better, but why take the risk? If you just request an IEnumerable at the beginning, you can use whatever type of stack your performance testing dictates. You can even special-case the ReadOnlyStack if you feel it is necessary.

So what is the problem?

Complex, immutable data types are the interesting problem. Currently in platforms like Java and .NET, there is no easy way to make a class that has both mutable and immutable variants. 

More often than not developers want mutable objects when they are constructing it. Data may be coming from multiple sources or even be aggregated as it comes in. Only once the class is finalized do we want

Currently we can do this by using interfaces, which are hard to change, and tons of boilerplate code.

Const in .NET

One option would be to add the concept of a Const modifier to C# and Visual Basic. The rules would be simple, if you specify const than you get a compiler error when accessing non-pure methods and properties.

I don’t really like this one for two reasons.

  1. Cheating is too easy. Anyone can simply cast away the const either directly or by
  2. There is no guarantee that someone won’t hold onto the mutable reference.

Still, there is something nice about the syntax.

Boxing for Purity?

In .NET every value type actually has two classes. There is the class named for the value type such as Integer or Point. Then there is the unnamed class that is used when the value is “boxed” and placed on the stack. Developers only have to deal with the first class, the runtime takes care of the second.

What if we had the same concept for purity in classes? Say your Customer class had an implied method called MakeImmutable. This would copy it into another class called “ImmutableCustomer” that was identical except:

  1. The class definition is created by the compiler or runtime.
  2. All methods/properties not marked as “Pure” are removed.
  3. All child objects are mapped to their Immutable variants.
  4. All Customer classes can be implicitly cast to a ImmutableCustomer.
  5. A mutable copy can be created by calling MakeMutable. This won’t return the original or allow modifying the read-only variant.

This cannot be done in the language itself, the runtime needs to support it in order to work out the inheritance/casting rules.

Performance Lessons from StringBuilder

The StringBuilder class has an interesting property. It’s underlying string buffer is mutable, but only until you call ToString. Once you do that, any further changes will cause a new string buffer is created from the now immutable String.

The same thing should be done for these auto-generated classes. When you call MakeImmutable, it sets a flag. The copy of the underlying data isn’t actually made until a future call to mutating method/property is made.

This will be tricky when dealing with child classes. There will be times when some classes have been cloned but other are still simply marked.

What happens when you implicitly cast objects to their ImmutableVersion?

I don’t know.

Calling MakeImmutable is out of the question. In a mutable context, that call would be way too expensive.

Perhaps we just do nothing. If the caller is passing a mutable object to a method, the caller has to take responsibility for any rules.

What happens when you assign a mutable object to an Immutable reference without calling MakeImmutable?

Again, I don’t know.

One could argue the caller should be responsible for discarding the mutable references or suffer the consequences.

Alternately the receiver could call MakeImplicit itself and thus be safe from future changes. But this may break the caller’s expectations that the objects are the same.

I’m gong with compiler warnings for now. The caller must either explicitly cast the object, keeping it mutable, or call MakeImmutable which creates a copy.

May 21, 2009

An API, not matter how cool you think it is, is not a DSL

Filed under: Uncategorized — Grauenwolf @ 9:58 pm

A DSL or Domain Specific Language, is a way to represent some domain-specific concept. For example, I’m looking for a DSL that can be used to describe date equations such as “the 2nd business day after the last business day of the month”. This information will find itself in various configuration files and database tables.

Unfortunately, all the recommendations I’ve been getting are for so-called “Internal DSLs”. The problem with that is they are not “domain-specific”, they are “language-specific”. You can’t use a Ruby’s Chronic from .NET or T-SQL without building a full Ruby parser. Nor can you use Java’s JODA Time or Perl’s Date::Manip. There are all API’s, not languages.

That isn’t to say there isn’t a DSL hidden in each of these. They all have some sort of meta-language describing how to format strings that you pass to their Parse methods . Unfortunately they are mostly undocumented. Chronic says it “Parses a string containing a natural language date or time”, but you are left to guessing what exactly that means from the examples.

Therefore I am moved to propose this definition for a DSL:

A Domain Specific Language is documentation on to how to parse a set of text or symbols in a consistent and repeatable fashion.

Let us turn back to JODA Time. Can you determine how to parse “Days d = Days.daysBetween(startDate, endDate);” from the JODA docs? Of course not, you also have to look up the language spec for Java. Java is the real language here, not JODA Time.

Turning to Date::Manip, we see that it has a parse complex date patterns such as “2nd day of every month [in 1997]".

0:0:0:1*2,4,6:0:0 every day at at 2:00, 4:00, and 6:00
0:0:0:2*12-13:0,30:0 every other day at 12:00, 12:30, 13:00, and 13:30
0:1:0*-1:0:0:0 the last day of every month *1990-1995:
12:0:1:0:0:0 Dec 1 in 1990 through 1995

It then goes on to explain exactly how to parse it (I will refer you to the documentation for details, start with the ParseRecur section for the actual DSL descriptions). I don’t know anything about Perl, but I don’t think I’ll have any problem porting this to .NET or T-SQL. It is a little unfriendly, more like an AST than an end-user language. But then again, so are Regular Expressions and they are clearly useful.

May 13, 2009

Open Standards is not a platform

Filed under: Uncategorized — Grauenwolf @ 7:19 pm

Today I wrote about the New York Times dropping WPF and Silverlight in favor of Adobe’s AIR. This was the first response:

Sigh. It’s not like AIR is a better idea… Open Standards, people.

That’s not even an option, let alone the right one. “Open Standards” isn’t the name of a platform. It has no runtime, no compiler, no libraries, no features, no anything. Open Standards is a concept, a description you apply to something.

What’s worse is that Silverlight is supposed to be an open standard. Much of its core has already been ratified by ECMA and Microsoft is very interested in helping third-parties like Novel build their own versions.

Adobe AIR, on the other hand, is completely proprietary. No one else makes AIR runtimes and Adobe intends to keep it that way. But that doesn’t matter to the New York Times. All they care about is which one actually works.

Create a free website or blog at WordPress.com.