Getting close to the conceptual metal

Posted on 27 January, 2020

A few years ago I watched the Structure and interpretation of computer programs video lectures, kindly posted publicly by MIT OpenCourseWare. There's so much interesting material in them but one part really struck me, in 5B: Computational Objects Gerald Jay Sussman defines a pair "in terms of nothing but air, hot air".

(define (cons x y)
  (λ (m) (m x y)))

(define (car x)
  (x (λ (a b) a)))

(define (cdr x)
  (x (λ (a b) b)))

Previous lectures had established pairs as a sort of fundamental thing. After all, pairs could be used to make lists and in Lisp once you've got pairs and lists you've got a lot. Sussman, however, was revealing that there was a level below that, there was an even more fundamental thing: functions.

Later, through learning Haskell, I became familiar with the concept of algebraic data types. It was astonishing to me that you could make so much out of so little. You seemingly only need types that are "pair-like" (products) and types that are "either-like" (sums).

data Product a b
  = P a b

data Sum a b
  = A a
  | B b

newtype Maybe a
  = Maybe (Sum () a)

newtype List a
  = List (Maybe (Product a (List a)))

newtype NonEmpty a
  = NonEmpty (Product a (List a))

When Sussman chalked up that pair made of nothing but "hot air" he attributed the encoding to Alonzo Church. Appreciating that many other types can be made from pair-like things and either-like things, I wondered if there existed a Church encoding for Either. After all, with Either and Pair it sure seemed like I'd be able to construct anything else I needed. I found Church encodings for lots of other interesting things, like booleans and natural numbers, but nothing quite like Either.

Eventually I'd happen across Scott encoding, named for Dana Scott to whom it is attributed. It's super interesting.

Scott encoding

With Scott encoding we have a consistent method for representing any algebraic datatype in the untyped lambda calculus. It goes something like this: say we have a datatype D, with N constructors, (d₁ … d_N), such that constructor d_i has arity A_i. The Scott encoding for constructor d_i of D would be:

d_i := λx₁ … x_{A_i}. λc₁ … c_N. c_i x₁ … x_{A_i}

If you've never seen this sort of notation before don't worry we'll be moving to mostly Haskell soon enough. Until then we could have a couple of swings at stating this in words.

Each constructor, d₁ through d_N, is responsible for accepting the arguments needed to call its chosen continuation, any one of c₁ through c_N.
There is one continuation per constructor (c_i is the continuation for d_i), and each continuation needs to be passed all the arguments its constructor was passed (d_i was passed x₁ … x_{A_i} and so c_i will passed them as well).

In this way values are held in the closure of each constructor. These internal values are used when the consumer of a value of type D passes in a continuation which will accept them as arguments.

To develop an intuition for how this actually works we can try converting some well-known Haskell data types to this encoding.

Pair

To Scott encode any one constructor for a given datatype we need to know how many other constructors it has, and how many arguments each of them accept. Haskell's pair type looks like syntax, however by asking GHCi for some information about it we can see that its much like any other user-defined datatype.

$ ghci
> :info (,)
data (,) a b = (,) a b  -- Defined in `GHC.Tuple'

It has one constructor which happens to share the name of its type.

> :type (,)
(,) :: a -> b -> (a, b)

For our purposes we'll rename the (,) type to Pair and the (,) constructor to pair. As Pair's only constructor, pair need only accept one continuation c. The arguments that pair accepts, such that it can call c with them, are the resulting Pair's first and second elements.

pair := λx₁, x₂. λc. c x₁ x₂

But what is Pair? How might we write the type of a function that accepts a Pair? First we might rewrite pair in Haskell, initially as a fairly direct translation.

pair = \x1 x2 -> \c -> c x1 x2

Then, without the first explicit lambda.

pair x1 x2 = \c -> c x1 x2

That can be loaded into GHCi or into a Repl.it Haskell session in order to find its inferred type.

$ ghci
GHCi, version 8.6.5: http://www.haskell.org/ghc/  :? for help
Loaded GHCi configuration from /home/brad/.ghci
> pair x1 x2 = \c -> c x1 x2
> :type pair
pair :: t1 -> t2 -> (t1 -> t2 -> t3) -> t3

You might be wondering why these lambdas are arranged the way they are. Why not use any of the following, equivalent, arrangements.

pair = \x1 -> \x2 -> \c -> c x1 x2
pair = \x1 -> \x2 c -> c x1 x2
pair = \x1 x2 c -> c x1 x2

The answer is that I'm sneakily trying to reveal what the Pair component of pair's type is, and, as it ends up, what the datatype component of any Scott encoded constructor's type happens to be.

type Pair t1 t2 t3
  = (t1 -> t2 -> t3) -> t3

The equivalent Haskell type, (,) a b has one less type variable. It's interesting to think about what it would mean for Pair to ditch t3 somehow.

type Pair t1 t2
  = (t1 -> t2 -> _) -> _

One way of looking at this is that it's up to the continuation (c) what it does with x₁ and x₂, the constructor (pair) doesn't get a say. This means that whatever pair returns must work for anything its consumer may want to produce. In Haskell types we might say that Pair must work for all possible types its provided continuation could return.

type Pair t1 t2
  = forall t3. (t1 -> t2 -> t3) -> t3

pair :: Pair a b
pair a b = \c -> c a b

Is this equivalent to (,)? How might we tell? One way might be to write a function which converts (,)s into Pairs and one which converts Pairs into (,)s. Then, given those two functions, we should be able to observe that their composition doesn't do anything. Or put another way: for these two functions:

from :: (a, b) -> Pair a b
from = _

to :: Pair a b -> (a, b)
to = _

Their composition in either direction is equivalent to an identity function.

from ∘ to = id_Pair
to ∘ from = id_(,)

If these two functions can be written, and they have this property, it means that Pair and (,) are isomorphic.

So then, is it possible to write from?

from :: (a, b) -> Pair a b
from (a, b) = _

Which is sort of asking: given an a and a b, is it possible to construct a Pair a b?

from :: (a, b) -> Pair a b
from (a, b) = pair a b

How about to?

to :: Pair a b -> (a, b)
to p = _

Recall that p is a function waiting to be passed a continuation. That continuation will be passed both an a and a b.

to :: Pair a b -> (a, b)
to p = p (\a b -> _)

So now, given an a and a b can we construct an (a, b)?

to :: Pair a b -> (a, b)
to p = p (\a b -> (a, b))

With that done: is to ∘ from an identity function? It does seem so.

> (to . from) (1, 2)
(1,2)
> (to . from) (True, "Nifty")
(True,"Nifty")

Without a Show or Eq instance for (->) it's hard to observe the same for from ∘ to. We can at least observe what to ∘ from ∘ to does.

> (to . from . to) (pair 1 2)
(1,2)
> (to . from . to) (pair True "Very nifty")
(True,"Very nifty")

Well, I'm convinced. Are you? For further tangible evidence you could try re-implementing everything in Data.Tuple in terms of Pair. It's a pretty fun exercise and helped me to become more comfortable with these lambda-only pairs.

Either

In Haskell Either is defined like this:

data Either a b
  = Left a
  | Right b

For the purposes of Scott encoding we can say that Either has two constructors which each accept one argument. This means that whatever each constructor returns will accept two continuations which themselves each accept one argument. Each constructor will accept an argument to pass to their chosen continuation.

The first constructor left, then, will accept one argument to pass to the first continuation.

left := λx. λc₁, c₂. c₁ x

And the second will accept one to pass to the second continuation.

right := λx. λc₁, c₂. c₂ x

Each continuation can accept an argument of a different type (say c₁ takes an α and c₂ takes a β) but both should return something of the same type (say γ). The reason for this is how values of the Either type are used, this may become evident later. Using a polymorphic lambda calculus rather than an untyped lambda calculus we can write out how those types line up.

left := Λα, β, γ. λx ^α. λc₁ ^{α → γ}, c₂ ^{β → γ}. c₁ x
right := Λα, β, γ. λx ^β. λc₁ ^{α → γ}, c₂ ^{β → γ}. c₂ x

For me at least, it's a little easier to see in Haskell.

left :: a -> (a -> c) -> (b -> c) -> c
left x = \c1 c2 -> c1 x

right :: b -> (a -> c) -> (b -> c) -> c
right x = \c1 c2 -> c2 x

As we did with Pair, we might ask: where's Either in all of that? And, as with Pair, I have sneakily positioned certain lambdas to try and point it out. However, in this case another method might be to put left and right side by side and view their shared return type.

left  :: a -> ((a -> c) -> (b -> c) -> c)
right :: b -> ((a -> c) -> (b -> c) -> c)

Which might yield the following.

type Either a b c
  = (a -> c) -> (b -> c) -> c

As with Pair we can make sure that the choice of c is left solely up to the consumer of an Either.

type Either a b
  = forall c. (a -> c) -> (b -> c) -> c

left :: a -> Either a b
left x = \c1 c2 -> c1 x

right :: b -> Either a b
right x = \c1 c2 -> c2 x

Can we write to and from for converting between our Either and Haskell's?

from :: Prelude.Either a b -> Either a b
from = _

to :: Either a b -> Prelude.Either a b
to = _

For from we can be given a Left containing an a or a Right containing a b.

from :: Prelude.Either a b -> Either a b
from e =
  case e of
    Left a -> _
    Right b -> _

So we have two questions to answer. Given an a can I construct an Either a b?

from :: Prelude.Either a b -> Either a b
from e =
  case e of
    Left a -> left a
    Right b -> _

Given a b can I construct an Either a b?

from :: Prelude.Either a b -> Either a b
from e =
  case e of
    Left a -> left a
    Right b -> right b

For to it would be nice if we could use a case expression as we did with from and in a sense we can.

to :: Either a b -> Prelude.Either a b
to e =
  e
    (\a -> _)
    (\b -> _)

This was less clear when looking at Pair and (,) but Scott-encoded datatypes are almost ready-made case expressions. Each continuation is like a constructor-only pattern-match.

As with to we again have two questions to answer, and they're very similar. First, given an a can I construct a Prelude.Either a b?

to :: Either a b -> Prelude.Either a b
to e =
  e
    (\a -> Left a)
    (\b -> _)

Next, given a b can I construct a Prelude.Either a b?

to :: Either a b -> Prelude.Either a b
to e =
  e
    (\a -> Left a)
    (\b -> Right b)

Are Prelude.Either and Either equivalent?

> to (from (Left "Hrm?"))
Left "Hrm?"
> to (from (to (left "Ahhhhh")))
Left "Ahhhhh"
> to (from (Right 2))
Right 2
> to (from (to (right 14)))
Right 14

Looks likely. Re-implementing everything in Data.Either would sure be convincing.

From hot air to solid ground

This is all well and good but I'm still left wondering what could actually be built with this. Knowing, intellectually, that Haskell's Either and the lambdas-only Either are equivalent isn't as satisfying as using Either and Pair to write real(ish) programs.

We could use Either and Pair to write a little REPL which evaluates Lisp-like expressions.

$ runhaskell HotAir.hs
Enter an expression to evaluate. E.G.
(+ (* (+ 12 7) 100) 58)
> (+ 1 2)
3
>

Yeah, let's do that.

To make things less onerous this little language could be limited to only adding and multiplying natural numbers. No define, no λ. Nothing other than positive integer literals (0, 1, 2 …), addition (+) and multiplication (*).

> (define (sqr x) (* x x))
Error: Unexpected char: 'd'
>

Before we begin: this is going to move quite fast. Don't worry if you miss some of the details of the implementation. It's more important to be generally aware of what the program does and to keep in mind that we're building it as a bit of a stress test for our lambda-only datatypes. If you'd like to learn more about how to build programs in Haskell I'd reccomend all the material produced by Typeclasses and the Data61 Functional Programming Course.

Now, the first thing we should do is redefine Pair and Either using newtype, rather than type (GHC is pretty great, but sometimes we need to cut it some slack).

{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# OPTIONS_GHC -Wall -Werror #-}

module HotAir where

newtype Pair a b = Pair
  { runPair ::
      forall c.
      (a -> b -> c) ->
      c
  }

pair :: a -> b -> Pair a b
pair x1 x2 = Pair \c -> c x1 x2

newtype Either a b = Either
  { runEither ::
      forall c.
      (a -> c) ->
      (b -> c) ->
      c
  }

left :: a -> Either a b
left x = Either \c1 _ -> c1 x

right :: b -> Either a b
right x = Either \_ c2 -> c2 x

Next, we might define a monadic parser.

import Data.String (String)

newtype Parser a = Parser
  { parse ::
      String ->
      Either String (Pair a String)
  }

A satisfy function for creating a Parser which compares the next character using given Char -> Bool predicate would be handy.

{-# LANGUAGE LambdaCase #-}

import Data.Bool (Bool)
import Data.Char (Char)
import Data.List ((++))

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser \case
  [] -> left "Unexpected EOF"
  (c : cs) ->
    if p c
      then right (pair c cs)
      else left ("Unexpected char: " ++ show c)

We can use it to define a char function which matches a specific Char.

import Data.Eq ((==))

char :: Char -> Parser Char
char = satisfy . (==)

Functor, Applicative and Alternative instances for Parser will make composing and manipulating them nicer.

import Control.Applicative
  ( Alternative (empty, (<|>)),
    Applicative (pure, (<*>)),
  )
import Data.Function (const)
import Data.Functor (Functor (fmap))

instance Functor Parser where
  fmap :: (a -> b) -> Parser a -> Parser b
  fmap f pa =
    Parser \str ->
      runEither (parse pa str)
        left
        (\p -> right (runPair p (pair . f)))

instance Applicative Parser where
  pure :: a -> Parser a
  pure a = Parser (right . pair a)

  (<*>) :: Parser (a -> b) -> Parser a -> Parser b
  pf <*> pa =
    Parser \str ->
      runEither (parse f str)
        left
        (\p -> runPair p (\f -> parse (fmap f pa)))

instance Alternative Parser where
  empty :: Parser a
  empty = Parser (const (left "Empty"))

  (<|>) :: Parser a -> Parser a -> Parser a
  Parser l <|> Parser r =
    Parser \str ->
      runEither (l str)
        (const (r str))
        right

A convenient way to ignore unused input will be useful, too.

fst :: Pair a b -> a
fst = (`runPair` const)

execParser :: Parser a -> String -> Either String a
execParser p str =
  runEither (parse p str)
    left
    (right . fst)

That should be enough to write an eval function for our little language.

import Control.Applicative (some, (*>), (<*))
import Data.Char (isDigit)
import Data.Functor ((<$), (<$>))
import GHC.Num ((*), (+))
import Numeric.Natural (Natural)

eval :: String -> Either String Natural
eval = execParser expression
  where
    expression :: Parser Natural
    expression = natural <|> application
    natural :: Parser Natural
    natural = read <$> some (satisfy isDigit)
    application :: Parser Natural
    application =
      char '(' *> operator <* space
        <*> expression <* space
        <*> expression <* char ')'
    operator :: Parser (Natural -> Natural -> Natural)
    operator = (+) <$ char '+' <|> (*) <$ char '*'
    space :: Parser Char
    space = char ' '

Finally, a main function that lets a user input expressions and view the result of evaluating them.

module HotAir (main) where

import Control.Monad (forever)
import System.IO
  ( IO,
    getLine,
    print,
    putStr,
    putStrLn,
  )

main :: IO ()
main = do
  putStrLn "Enter an expression to evaluate. E.G."
  putStrLn "(+ (* (+ 12 7) 100) 58)"
  forever $ do
    putStr "> "
    line <- getLine
    runEither (eval line)
      (putStrLn . ("Error: " ++))
      print

And here it is, a little REPL made of mostly hot air.

{-# LANGUAGE BlockArguments #-}
{-# LANGUAGE InstanceSigs #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# OPTIONS_GHC -Wall -Werror #-}

module HotAir (main) where

import Control.Applicative
  ( Alternative (empty, (<|>)),
    Applicative (pure, (<*>)),
    some,
    (*>),
    (<*),
  )
import Control.Monad (forever)
import Data.Bool (Bool)
import Data.Char (Char, isDigit)
import Data.Eq ((==))
import Data.Function (const, ($), (.))
import Data.Functor (Functor (fmap), (<$), (<$>))
import Data.List ((++))
import Data.String (String)
import GHC.Num ((*), (+))
import Numeric.Natural (Natural)
import System.IO
  ( IO,
    getLine,
    print,
    putStr,
    putStrLn,
  )
import Text.Read (read)
import Text.Show (Show (show))

newtype Pair a b = Pair
  { runPair ::
      forall c.
      (a -> b -> c) ->
      c
  }

pair :: a -> b -> Pair a b
pair x1 x2 = Pair \c -> c x1 x2

fst :: Pair a b -> a
fst = (`runPair` const)

newtype Either a b = Either
  { runEither ::
      forall c.
      (a -> c) ->
      (b -> c) ->
      c
  }

left :: a -> Either a b
left x = Either \c1 _ -> c1 x

right :: b -> Either a b
right x = Either \_ c2 -> c2 x

newtype Parser a = Parser
  { parse ::
      String ->
      Either String (Pair a String)
  }

satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser \case
  [] -> left "Unexpected EOF"
  (c : cs) ->
    if p c
      then right (pair c cs)
      else left ("Unexpected char: " ++ show c)

char :: Char -> Parser Char
char = satisfy . (==)

instance Functor Parser where
  fmap :: (a -> b) -> Parser a -> Parser b
  fmap f pa =
    Parser \str ->
      runEither (parse pa str)
        left
        (\p -> right (runPair p (pair . f)))

instance Applicative Parser where
  pure :: a -> Parser a
  pure a = Parser (right . pair a)

  (<*>) :: Parser (a -> b) -> Parser a -> Parser b
  pf <*> pa =
    Parser \str ->
      runEither (parse pf str)
        left
        (\p -> runPair p (\f -> parse (fmap f pa)))

instance Alternative Parser where
  empty :: Parser a
  empty = Parser (const (left "Empty"))

  (<|>) :: Parser a -> Parser a -> Parser a
  Parser l <|> Parser r =
    Parser \str ->
      runEither (l str)
        (const (r str))
        right

execParser :: Parser a -> String -> Either String a
execParser p str =
  runEither (parse p str)
    left
    (right . fst)

eval :: String -> Either String Natural
eval = execParser expression
  where
    expression :: Parser Natural
    expression = natural <|> application
    natural :: Parser Natural
    natural = read <$> some (satisfy isDigit)
    application :: Parser Natural
    application =
      char '(' *> operator <* space
        <*> expression <* space
        <*> expression <* char ')'
    operator :: Parser (Natural -> Natural -> Natural)
    operator = (+) <$ char '+' <|> (*) <$ char '*'
    space :: Parser Char
    space = char ' '

main :: IO ()
main = do
  putStrLn "Enter an expression to evaluate. E.G."
  putStrLn "(+ (* (+ 12 7) 100) 58)"
  forever $ do
    putStr "> "
    line <- getLine
    runEither (eval line)
      (putStrLn . ("Error: " ++))
      print

You can save that code into a file and execute it with runhaskell if you want to see it in action.

$ runhaskell HotAir.hs
Enter an expression to evaluate. E.G.
(+ (* (+ 12 7) 100) 58)
> (+ (* (+ 12 7) 100) 58)
1958
>

That was quite a whirlwind.

I know I really shouldn't be surprised that I could write my program with (more or less) only lambdas. Lambda calculus is "a universal model of computation" after all. I also shouldn't be surprised that translating lambda calculus terms into Haskell is so natural as Haskell based on a lambda calculus. But in this post I really wanted to focus on the wonder I feel when being able to use abstract notions like lambda calculus to write runnable programs. My hope is that by writing this I might help someone find something they find wonderful, and encourage them to explore it.

One possible avenue, which I might go into in a future post, is this: the above program still makes use of some pre-defined Haskell datatypes like Bool, Natural, [a] and Char. Does it have to? Could we build those entirely out of lambdas too? How far can we go?

Huge thanks to Brad Bow for his clarifying suggestions.