Towards a folk computer

This text was originally published on the Folk Computer wiki. It is reproduced here for archival purposes. This is a technical write-up oriented towards those setting up their own Folk system. Nonetheless, the gifs aim to be evocative for a non-technical audience.


Folk Computer is a research & art project, led by Omar Rizwan and Andrés Cuervo, centered around designing new physical computing interfaces. The aims are, among others:

One of the main hypotheses behind the system is that interaction metaphors are downstream of hardware and operating systems. For this reason, it is a deeply technical project that defines a whole new stack, from drivers for peripherals all the way up to interface primitives. Folk can be described as an operating system for real-time peripheral coordination, through reactive programming.

By walking through the development process of one of these room-scale interfaces, I hope to show the grain of the system and suggest other possible interfaces. Ideally, you'd set up your own system and follow along. Here's a preview:

Writing programs in Folk

Hello folk!

Folk is an operating system, meaning it manages the execution of computer programs. It uses a combination of projector and camera to detect the programs and display their outputs. You can see my setup in the Folk directory. Here's a first program:

The program is identified by its AprilTag, a visual fiducial marker, similar to a QR code, designed for robotics. By placing the program on the table, the AprilTag enters the field of view of the camera and is detected. The OS then figures out what code it must execute. When I cover the tag, it is no longer detected and the program stops executing. Here's the code for a sample program I've printed out:

Wish $this is labelled "hello folk!"

Programs are written in the Folk programming language, a domain-specific language implemented on top of TCL. The main primitives in the language are clauses which we can Wish for. Above, I've wished for the program to have a label and to be outlined.

Fulfilling wishes

All statements get stored in a reactive database. We can react to a statement using a When. If the clauses match, then some code gets executed in response. For example, we can write a second program which affects the first.

When /someone/ wishes /p/ is labelled "hello folk!" {
    Wish $p is outlined green
}

Notice how, as soon as I drop the second program on the table, the first gets outlined green. The variable p is bound to the identifier of the first program, and it gets outlined. The same happens when I put a copy of the first program on the table. Another p is bound, which also gets outlined green. Note that there's very little syntax, and clause-matching supports arbitrary infix parameters. This leads to very readable code, even approaching English. The choice of TCL, a string-based programming language, is the central reason such a design is possible.
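To make the binding of /p/ concrete, here is a minimal Python sketch of clause matching. It is an illustration only, not Folk's implementation: the function name match, the representation of statements as lists of words, and the program identifiers are all assumptions made for this example.

```python
def match(pattern, statement):
    """Return variable bindings if `statement` matches `pattern`, else None.

    Words wrapped in slashes (e.g. /p/) are variables; every other word
    must match literally. Both arguments are lists of words.
    """
    if len(pattern) != len(statement):
        return None
    bindings = {}
    for want, got in zip(pattern, statement):
        if len(want) > 2 and want.startswith("/") and want.endswith("/"):
            name = want[1:-1]
            # A variable that appears twice must bind to the same value.
            if bindings.setdefault(name, got) != got:
                return None
        elif want != got:
            return None
    return bindings


pattern = ["/someone/", "wishes", "/p/", "is", "labelled", "hello folk!"]
statements = [
    ["program-1", "wishes", "program-1", "is", "labelled", "hello folk!"],
    ["program-3", "wishes", "program-3", "is", "labelled", "hello folk!"],
    ["program-2", "wishes", "program-1", "is", "outlined", "green"],
]

# Each matching statement yields its own binding of p, just as each copy
# of the first program on the table gets its own green outline.
matches = [b for s in statements if (b := match(pattern, s)) is not None]
outlined = [b["p"] for b in matches]
```

Because variables can sit anywhere in the clause, the pattern reads like the sentence it matches, which is the property the paragraph above attributes to a string-based language.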

Somewhere else in our system there is a program which draws text on the table, and another that draws colored lines. They are implemented just as our second program above, using the When primitive to execute some code.

When /someone/ wishes /program/ is outlined /color/ {
  # Find the region which corresponds to the program.
  # For each edge in the region, draw a colored line.
}

But where are these programs, if not on the table? The system has some default "virtual programs", always running despite not being physically activated. On each frame of execution, the OS sees all statements that are live in the database, figures out a dependency tree of Whens, and then executes the code. This style of programming is far from traditional imperative programming. Here we simply define behavior we'd like to see happen, closer to a language like Prolog.
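One way to picture a frame of execution is as re-deriving every statement from the base facts (which tags are on the table) until nothing new appears. The Python sketch below illustrates that idea under loud assumptions: the names run_frame, outline_labelled and draw_outlines are hypothetical, statements are simplified to 3-tuples, and the real system's scheduling of Whens is certainly more refined than this loop.

```python
def run_frame(base_statements, rules):
    """Derive all statements for one frame, starting from the base facts.

    `rules` is a list of functions; each takes the current set of
    statements and returns the set of statements it wants to add.
    """
    statements = set(base_statements)
    while True:
        derived = set()
        for rule in rules:
            derived |= set(rule(statements))
        if derived <= statements:
            # Nothing new was derived: the frame has settled.
            return statements
        statements |= derived


def outline_labelled(statements):
    # Plays the role of the second printed program.
    return {(p, "is outlined", "green")
            for (p, verb, _) in statements if verb == "is labelled"}


def draw_outlines(statements):
    # Plays the role of a "virtual program" that draws outlines.
    return {("display", "draws outline", f"{p}:{color}")
            for (p, verb, color) in statements if verb == "is outlined"}


on_table = {("program-1", "is labelled", "hello folk!")}
frame = run_frame(on_table, [draw_outlines, outline_labelled])
```

The order in which the rules are listed does not matter, and when the tag is covered (an empty set of base facts) every derived statement disappears with it.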

Interacting with peripherals

Live from the video feed

How do we fill in the pseudocode above that draws lines and text? We need to write some drivers between the operating system and the projector. Once a piece of hardware gets integrated into the system through a driver, we can easily write scripts which run interactively. The same process holds for webcams, RFID tag readers, or receipt printers.

Another reason Folk is implemented on top of TCL is its excellent Unix interoperability and its ability to embed sublanguages such as C inline. Redis and SQLite have adopted it for similar reasons. In particular, this lets us read from /dev/video0 to get pixels from the camera and use the Vulkan graphics API to draw to a buffer that is then displayed by the projector. For now, let's just stick to images to see what we can build. Since there's a camera pointed down at the table to detect AprilTags, we can ask the system for the image of the whole table and display it:

When the camera frame is /image/ {
  Wish $this displays image $image
}

Image cartography

With a few lines of code, we can display the camera image on the table. It would be nice to get just a small slice of the image, corresponding to the region of the table where a program is. This is a good opportunity to show how we can define new clauses in Folk. Let's write a When statement:

# Match any program "p" that has a region
When the camera frame is /f/ & /p/ has region /r/ {

    # Convert region in projector coordinates to camera
    lassign [regionToBbox $r] minX minY maxX maxY
    lassign [projectorToCamera [list $minX $minY]] px0 py0
    lassign [projectorToCamera [list $maxX $maxY]] px1 py1

    # Clamp to image bounds
    set x [expr {int(max(min($px0, $px1), 0))}]
    set y [expr {int(max(min($py0, $py1), 0))}]
    # Limit the size so the slice ends at or before the image edge
    set w [expr {int(min(abs($px1 - $px0), [image width $f] - $x))}]
    set h [expr {int(min(abs($py1 - $py0), [image height $f] - $y))}]

    # Extract and claim the image for the page
    set subimage [image subimage $f $x $y $w $h]
    Claim $p has camera slice $subimage
}
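The clamping step is easy to get subtly wrong, so here it is in isolation as a Python sketch. The function clamp_box is a hypothetical helper written for this illustration and is not part of Folk; it assumes pixel coordinates with the origin at the top-left of the image.

```python
def clamp_box(x0, y0, x1, y1, width, height):
    """Clamp a box given by two opposite corners to a width x height image.

    Returns (x, y, w, h), where (x, y) is the top-left corner. The result
    always satisfies 0 <= x, x + w <= width, 0 <= y and y + h <= height.
    """
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    # Clamp each edge separately, then derive the size from the edges.
    left = min(max(int(left), 0), width)
    right = min(max(int(right), 0), width)
    top = min(max(int(top), 0), height)
    bottom = min(max(int(bottom), 0), height)
    return left, top, right - left, bottom - top


# A box hanging off the left and bottom edges of a 100x100 image.
box = clamp_box(-10, 20, 50, 500, 100, 100)
```

Clamping the edges rather than the width and height means a box that starts off-image shrinks to its visible part instead of keeping its original size.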

In the last line of code, I've introduced Claim. It lets us add statements to the reactive database explicitly. We can then write a second program that helps us draw the region:

# Display a camera slice
When /someone/ wishes /p/ displays camera slice /slice/ & /p/ has region /r/ {
  set center [region centroid $r]
  Wish to draw an image with center $center image $slice radians 0 scale 1
}

Combining both together, we have a nice and simple API for getting the slice of a program. We can place all three programs on the table, and see it work!

When $this has camera slice /slice/ {
  Wish $this displays camera slice $slice
}

The two helper programs began as printed programs, but eventually will become stable "virtual programs" that run in the background. The fact that some programs are always running, waiting to be matched, makes it much more natural to bootstrap off of previous code you've written. You slowly build your own personal language that lets you express the things you want to make.

Inline device drivers

Now, I skipped some important technical details. How exactly does one write one of these drivers? For the curious, I've excerpted below a bit of the code that gets executed in order to draw images using Vulkan.

When /someone/ wishes to draw an image with /...options/ {
    #...
    set dims [list [image_t width $im] [image_t height $im]]
    set args [list $gim $dims $center $radians $scale]
    Wish the GPU draws pipeline "image" with arguments $args
}

Gpu::pipeline "image" {...} {
    vec2 a = pos + rotate(scale*-imageSize/2, -radians);
    vec2 b = pos + rotate(scale*vec2(imageSize.x, -imageSize.y)/2, -radians);
    vec2 c = pos + rotate(scale*imageSize/2, -radians);
    vec2 d = pos + rotate(scale*vec2(-imageSize.x, imageSize.y)/2, -radians);
    vec2 vertices[4] = vec2[4](a, b, d, c);
    return vertices[gl_VertexIndex];
} {...} {
    vec2 a = pos + rotate(scale*-imageSize/2, -radians);
    vec2 b = pos + rotate(scale*vec2(imageSize.x, -imageSize.y)/2, -radians);
    vec2 c = pos + rotate(scale*imageSize/2, -radians);
    vec2 d = pos + rotate(scale*vec2(-imageSize.x, imageSize.y)/2, -radians);
    vec2 p = gl_FragCoord.xy;
    vec2 uv = invBilinear(p, a, b, c, d);
    if (max( abs(uv.x-0.5), abs(uv.y-0.5)) < 0.5) {
        return texture(image, uv);
    }
    return vec4(0.0, 0.0, 0.0, 0.0);
}

For the client of the image driver, the API is straightforwardly "Folk-ish": Wish to draw an image with.... Notably, however, we can just write the GLSL shaders in-line. This is thanks to the Gpu::pipeline macro which compiles the shader on the fly into the full Vulkan pipeline, which can then be executed from TCL. Here, again, C-interop and metaprogramming are central. There's a lot of really wonderful systems programming going on here, but that's a discussion for another time. Let's keep going with camera slices.

Regions as objects

Camera slices, when displayed where they're defined, have fun recursive properties. But sometimes this can get annoying. Instead, we can use some region utilities to define a "read" region and a "write" region. This could be used, for example, to preview a "tableshot" or to define a stop-motion animation system.

When $this has region /r/ {
    Claim $this' has region [region move $r down 110%]
}
When $this' has camera slice /slice/ {
    Wish $this displays camera slice $slice
}

Now for that stop-motion animation, we can extend the code above with a loop to generate N regions, each with its own camera slice, which we display in sequence, based on the clock time. The code is a little more involved, but is a good example of using regions, as well as the reactive properties of the language.

set N_FRAMES 5
set FPS 15

When $this has region /r/ {
    set display [region scale [region move $r down 75%] 45%]
    Claim $this has display $display
}
 
When $this has display /d/ {
    for {set i 1} {$i <= $N_FRAMES} {incr i} {
 
        # Define N capture regions
        Claim frame-$this-$i has region [region move $d right ${i}00%]
        Wish frame-$this-$i is outlined red

        # Cycle between displaying each frame based on clock
        When the clock time is /t/ & frame-$this-$i has camera slice /slice/ {
            if {round($t * $FPS) % $N_FRAMES == ($i - 1)} {
 
                Wish frame-$this-$i is outlined green
                set c [region centroid $d]
                Wish to draw an image with center $c image $slice radians 3.14159 scale 1
            }
        }
    }
}
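The frame-cycling condition in the inner When is plain modular arithmetic on the clock. The Python sketch below isolates it; active_frame is a hypothetical name, and the sketch assumes the clock time t is a non-negative number of seconds. It rounds half up explicitly, because Python's built-in round uses banker's rounding while Tcl's round() rounds halves away from zero.

```python
import math


def active_frame(t, fps, n_frames):
    """Return the 1-based index of the frame to display at clock time t."""
    # Equivalent to Tcl's round() for non-negative inputs.
    tick = math.floor(t * fps + 0.5)
    return tick % n_frames + 1


# With 5 frames at 15 FPS, successive ticks walk through frames 1..5
# and then wrap around to frame 1 again.
sequence = [active_frame(k / 15, 15, 5) for k in range(7)]
```

Since every capture region evaluates the same expression against the same clock, exactly one of them satisfies the condition on any given tick.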

Let's recap: The camera is managed entirely by the OS. Each program is given a location by another thread in the OS running AprilTag detection. The programs and threads communicate with each other by emitting statements. Similarly, we can write programs that react to those statements without much boilerplate. This is the "reactive programming" part of Folk.

Further, using a reactive database of statements allows for "peripheral coordination". It need not be a camera and projector. RFID tags and a radio system could be used to detect the location of programs. As long as they emit simple statements into the reactive database, they can be easily integrated into user programs.

Bootstrapping interfaces

Origami interfaces

Our stop-motion system above already takes advantage of the physicality of paper. We can just draw on it directly, instead of digital sketching. We can then augment that paper with a computational layer of sequencing through the computer, striking a thoughtful balance between the computer and nostalgia. But paper is physical in more ways than as a drawing surface. It can be moved and folded.

With some cardboard, tape, and a push-pin, we can create a dial that changes the animation frame-rate dynamically. The program uses its angle relative to the camera to claim a global FPS across the system. Slightly modifying the animation code above such that it matches the claimed FPS, we get:

When $this has region /r/ {
  set angle [region angle $r]
  set fps [expr { round(30 * abs($angle / 3.1415)) }]
  Wish $this is labelled $fps
  Claim the fps is $fps
}
# Changes in the original program are just as simple as...
- set FPS 15
+ When $this has display /d/ & the fps is /FPS/ {

With inspiration from the tradition of "paper engineering", some further interface ideas emerge. A simple, and very clever, way to start is a paper button. We use a physical mechanism to impede detection until a user action is made:

Taking tableshots

So far we've relied on real-time streams of images from the camera. It would be powerful to take a snapshot at a given instant of time, a kind of picture of what's on the table, or "tableshot" to extend the screenshot metaphor. We can use the paper button design to trigger the picture.

When $this has neighbor /n/ {
  Claim $n is taking a tableshot
}
When $this has region /r/ {
  Wish $this' is outlined white
  Claim $this' has region [region move $r down 100%]
}
 
When $this' is taking a tableshot & \
     $this' has camera slice /slice/ & \
     the clock time is /t/ \
{
  # A file path where we'll persist the camera slice as jpeg.
  # We cast the timestamp to an integer so it's the same throughout
  # a second of clock time.
  set t [int $t]
  set fp "/tmp/$this-$t.jpg"
 
  # We haven't taken the picture for this second.
  When /nobody/ claims $this' has tableshot $fp {
    # Save the image to a jpeg, written to the path
    image saveAsJpeg $slice $fp
    # Persist this claim so it continues to be true the next frame
    Commit $this' { Claim $this' has tableshot $fp }
  }
}

# Match the persisted claim to display the image
When $this' has tableshot /ts/ {
  Wish $this displays image $ts
}

The first program, our camera shutter, looks for any neighbors (intersect.folk), and claims they're taking a tableshot. The second program, our film, waits for someone to claim it's taking a tableshot, at which point it saves the camera slice as a jpeg (images) and displays it.

That second program introduces another mechanism in the folk language, Commit, which allows us to keep persistent state across time. Saving images as a jpeg is a costly operation, so we don't want to do this 60 times a second. Instead, we use a file path that stays the same throughout each second of clock time, and keep the claim alive until it's overwritten. This is powerful, but tricky code, so I've tried to annotate it well above.
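The once-per-second behaviour can be modelled outside of Folk. In the Python sketch below, a dictionary stands in for the statements kept alive by Commit, and an injected save function stands in for the jpeg encoder. All names (make_film, on_frame, committed) are hypothetical, and the sketch leaves out the "is taking a tableshot" condition for brevity.

```python
def make_film(save):
    """Build a per-frame handler that saves at most once per clock second."""
    committed = {}  # program -> last committed path (stands in for Commit)

    def on_frame(program, t, image):
        # Truncating t keeps the path constant throughout each second.
        path = f"/tmp/{program}-{int(t)}.jpg"
        if committed.get(program) != path:
            # Nobody has claimed this path yet: save and persist the claim.
            save(image, path)
            committed[program] = path
        return committed[program]

    return on_frame


saved = []
film = make_film(lambda image, path: saved.append(path))
for t in (10.0, 10.3, 10.9, 11.1):
    film("film-1", t, image="pixels")
```

Four frames arrive, but only two saves happen, one for each distinct second.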

Paper computer mouse

To better the tableshot ergonomics, it would be nice if the shutter gave us a preview of which neighbors it was going to activate, as well as having a pointer which allows for more precision.

We can extend our physical button to preview which region it is acting upon using a whisker (points-at.folk). A second program, activated with a press, then claims that any selected region should take a tableshot. Combined, the two form a physical computer mouse.

Claim $this is a previewer
Wish $this has neighbors

When $this points right with length 0.2 at /p/ {
  Wish $p is outlined red
  Claim $p is selected
}
When /p/ is a previewer & /p/ has neighbor $this & /p/ claims /t/ is selected {
  Claim $t is taking a tableshot
}

Finally, we can add one more predicate to the film program to clear the film by resetting any committed tableshots. This can then be activated by a very simple button with an "emergency flap". (Note: the complete program is in the appendix).

When /someone/ wishes to clear tableshots & /p/ has tableshot /ts/ {
    Commit $p {}
}
# Emergency button!
Wish to clear tableshots

Gestural selections

Now that we have a computer mouse, it would be nice to be able to select a variable-sized region to take a picture. What came to mind was the gesture we use to act out framing a picture. To do this, I'm going to write a "corner" program which I can tape onto a print-out of my hand.

When tag $this has corners /co/ & tag $this has center /ce/ size /_/ {

    set corner [lmap p $co {::cameraToProjector $p}]
    set du [sub [lindex $corner 1] [lindex $corner 0]]
    set dv [sub [lindex $corner 2] [lindex $corner 1]]

    set center [::cameraToProjector $ce]
    set center [vec2 add $center $du]
    set center [vec2 add $center $dv]

    Claim $this has corner $center $du $dv
}

We can preview the corners with some drawing primitives.

When /p/ has corner /c/ /du/ /dv/ {
    set a [vec2 add $c $du]
    set b [vec2 add $c $dv]
    
    Wish to draw a circle with center $c radius 4 thickness 1 color white filled true
    Wish to draw a circle with center $a radius 4 thickness 1 color red filled true
    Wish to draw a circle with center $b radius 4 thickness 1 color green filled true
    Wish to draw a stroke with points [list $c $a] width 2 color red
    Wish to draw a stroke with points [list $c $b] width 2 color green
}

Now, with a bit of linear algebra, we can write another program which finds pairs of corners and creates a virtual region inside of them. Our deconstructed computer mouse now has a selection tool. Unlike a traditional computer mouse, it is independent of the cursor position and can serve as a stable reference for other interfaces.

When /p/ has corner /c1/ /dx/ /dy/ & /q/ has corner /c2/ /du/ /dv/  {
    if {$p <= $q} return
    set i1 [ray-isect $c1 $dx $c2 $dv]
    set i2 [ray-isect $c1 $dy $c2 $du]
 
    set points [list $c1 $c2 $i1 $i2]
    set minX [min {*}[lmap p $points {lindex $p 0}]]
    set maxX [max {*}[lmap p $points {lindex $p 0}]]
    set minY [min {*}[lmap p $points {lindex $p 1}]]
    set maxY [max {*}[lmap p $points {lindex $p 1}]]

    regionFromBox $this' $minX $minY $maxX $maxY

    Wish $this' is outlined white
    lmap p $points {Wish to draw a circle with center $p radius 4 thickness 1 color white filled true}
}

proc ray-isect {p u q v} {
    # Ray-ray intersection
    # r1(t) = p + t * u
    # r2(s) = q + s * v

    # Solve system of equations by hand
    lassign $p px py
    lassign $q qx qy
    lassign $u ux uy
    lassign $v vx vy
    set s [expr {  ($py - $qy + ($uy / $ux) * ($qx - $px)) /  ($vy - ($uy / $ux) * $vx)  }]
    
    # Return r2(s) = q + s * v
    vec2 add $q [vec2 scale $v $s]
}
 
proc regionFromBox {program x1 y1 x2 y2} {
    set vertices [list [list $x2 $y1] [list $x1 $y1] [list $x1 $y2] [list $x2 $y2]]
    set edges [list [list 0 1] [list 1 2] [list 2 3] [list 3 0]]
    Claim $program has region [region create $vertices $edges]
}
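The hand-solved expression in ray-isect divides by the x component of u, so it fails when that direction is vertical. A cross-product formulation avoids this. The Python sketch below is an alternative written for illustration, not the code the system runs; like the original, it actually intersects the two infinite lines, since s is allowed to be negative.

```python
def cross(a, b):
    """2D cross product (z component of the 3D cross product)."""
    return a[0] * b[1] - a[1] * b[0]


def ray_isect(p, u, q, v):
    """Intersect the lines p + t*u and q + s*v; return None if parallel."""
    denom = cross(u, v)
    if abs(denom) < 1e-12:
        return None
    # Crossing both sides of p + t*u = q + s*v with u eliminates t:
    #   s = cross(q - p, u) / cross(u, v)
    d = (q[0] - p[0], q[1] - p[1])
    s = cross(d, u) / denom
    return (q[0] + s * v[0], q[1] + s * v[1])


# A horizontal line through the origin meets a vertical line at x = 2.
hit = ray_isect((0, 0), (1, 0), (2, -3), (0, 1))
```

Multiplying the numerator and denominator of the original expression by the x component of u gives exactly this formula, so the two agree wherever the original is defined.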

Capturing regions

Now that we can define arbitrary regions, it would be nice to be able to take a tableshot of any region, instead of just the special film region we defined above. We can slightly modify the film program for this purpose, swapping in a generic /p/ for $this.

When /p/ is taking a tableshot & \
     /p/ has camera slice /slice/ & \
     the clock time is /t/ \
{
  set t [int $t]
  set fp "/tmp/$p-$t.jpg"
 
  When /nobody/ claims $p has tableshot $fp {
    image saveAsJpeg $slice $fp
    Commit $fp { Claim $p has tableshot $fp }
  }
}

When /someone/ wishes to clear tableshots & /p/ has tableshot /fp/ {
    Commit $fp {}
}

Instead of displaying each tableshot where it's captured, we can display them all together in a kind of file listing.

When the collected matches for [list /p/ has tableshot /ts/] are /matches/ {
  set images [lmap m $matches {dict get $m ts}]
 
  When $this has region /r/ {
    lassign [regionToBbox $r] x minY maxX y
 
    foreach fp $images {
      set im [image load $fp]
      set h [image height $im]
      set w [image width $im]
      set c [list [+ $x [/ $w 2]] [+ $y [/ $h 2]]]
      Wish to draw an image with center $c image $im radians 3.1415 scale 1
      set y [+ $y $h]
    }
  }
}

Here we introduce the last folk language primitive, collected matches, which lets us aggregate across all matches for a clause, instead of matching each one individually.
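The listing program does two things: it gathers one value from every match, then lays the images out in a column. The Python sketch below mirrors both steps. The shape of matches as a list of dictionaries follows the dict get call in the listing, while the function name stack_images, the paths and the sizes are assumptions made for the example.

```python
def stack_images(x, y, sizes):
    """Return the center of each (w, h) image stacked in a column from (x, y)."""
    centers = []
    for w, h in sizes:
        centers.append((x + w / 2, y + h / 2))
        y += h  # the next image starts where this one ends
    return centers


# One dictionary of bindings per match, as collected matches provides.
matches = [
    {"p": "film-1", "ts": "/tmp/film-1-10.jpg"},
    {"p": "film-2", "ts": "/tmp/film-2-12.jpg"},
]
paths = [m["ts"] for m in matches]

# Hypothetical image sizes, in the same order as the paths.
centers = stack_images(10, 100, [(40, 20), (60, 30)])
```

Working from the whole list at once is what makes the running y offset possible. Matching each tableshot individually would give no shared place to accumulate it.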

Infinite plasticity

We've built an interactive folk interface to take pictures on a tabletop using physical programs and paper interfaces, instead of mouse and screen. This folk interface is just one of many supported by the Folk operating system, a means to coordinate peripherals (including, but not limited to, camera and projector) in a reactive manner.

In folk interfaces, I argued that instead of thinking of software as a mechanism to automate existing processes, it was better understood as a language for defining systems or materials with a combinatorial set of affordances. The potential for reifying metaphors into interfaces was until now restricted to the screen. Folk Computer expands the field of computational objects into the material world.

While Folk currently retains a playful aesthetic of paper collages from a child's art class, it should not be confused with mere post-modern pastiche. The primary function of the system is not the sign. It avoids the temptation of sublimating a desire for direct experience with the mere provisioning of information. Nor does Folk offer a totalizing system which caters to a user's "needs", falling prey to a modernist teleology of "ease".

Instead, Folk is prefigurative politics in the form of technological infrastructure. It not only imagines, but manifests a world where the infinite plasticity of computation lightens the finitude of the material world.

Cristóbal Sciutto, December, 2023