Scheme, Lisp, and Parse Trees
When you write 3 + 4 * 5 in mathematics, everyone agrees it means 3 + (4 * 5) = 23, not (3 + 4) * 5 = 35. Operator precedence rules—memorized through years of schooling—determine this. But what if there were a notation that never needed precedence rules? What if the structure of the expression itself made evaluation order crystal clear?
Enter Lisp and its dialect Scheme: languages where parentheses aren't decorative—they're structural. Where (+ 3 (* 4 5)) is unambiguous. Where the written form directly reflects the parse tree that represents it.
Understanding Scheme's prefix notation and its parse trees reveals something profound: the notation you choose shapes how you think about computation.
The Lisp Family: A Quick Map
If you’ve heard of Emacs Lisp (Elisp) or Common Lisp, you’re looking at the same family tree. Lisp isn’t a single language, but a lineage of dialects that share the same "code is data" DNA.
- Common Lisp: The "industrial" version—massive, feature-rich, and standardized.
- Emacs Lisp (Elisp): The specialized dialect used to build and extend the Emacs text editor. (Yes, the editor is mostly written in its own language.)
- Scheme: The "academic" version—minimalist, elegant, and the focus of this article.
We use Scheme because its syntax is so stripped-down that you can’t help but see the underlying structure of the computation.
What is Scheme?
In most programming languages, you write:
This is infix notation—the operator (+) sits between its operands.
In Scheme, you write:
This is prefix notation—the operator comes first, followed by its operands.
Comparison:
| Notation | Addition | Multiplication | Nested |
|---|---|---|---|
| Infix | 3 + 4 |
4 * 5 |
3 + 4 * 5 |
| Prefix (Scheme) | (+ 3 4) |
(* 4 5) |
(+ 3 (* 4 5)) |
Why Prefix Notation?
Advantages:
- No precedence rules needed: Parentheses make structure explicit
- Uniform syntax: Everything is
(function arg1 arg2 ...) - Easy to parse: No need for precedence-climbing or shunting-yard algorithms
- Variable number of arguments:
(+ 1 2 3 4 5)works naturally
Disadvantages:
- Unfamiliar: Humans learn infix in school
- Lots of parentheses: Can be harder to read for complex expressions
S-Expressions
The term S-expression (symbolic expression) sounds complex, but it describes a very simple, recursive structure.
Think of it like a system of nested boxes:
- Atoms: These are the contents. A number (
42) or a symbol (+) is an atom. It cannot be split further. - Lists: These are the boxes. A list is wrapped in parentheses
( ... ).
The Golden Rule: A list can contain atoms, or it can contain other lists.
| Type | Definition | Examples |
|---|---|---|
| Atom | A fundamental unit (number, symbol, string) | 42, x, +, "hello" |
| List | A container holding zero or more S-expressions | (1 2 3), (a (b c) d), () |
Because lists can contain lists, you naturally build a tree structure.
Visualizing S-Expressions
Consider the S-expression: (* (+ 1 2) 3)
It is a List containing three things:
- Atom:
* - List:
(+ 1 2)(which itself contains three atoms) - Atom:
3
If we draw this hierarchy, it is identical to a tree:
graph TD
Root[List: * ... 3] --> Atom1["Atom: *"]
Root --> InnerList[List: + 1 2]
Root --> Atom2["Atom: 3"]
InnerList --> Atom3["Atom: +"]
InnerList --> Atom4["Atom: 1"]
InnerList --> Atom5["Atom: 2"]
style Root fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style InnerList fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
Why "Symbolic"?
In most languages (like C or Java), variable names like x or calculate disappear when you compile the code—they turn into memory addresses.
In an S-expression, symbols (words like x, foo, +) are distinct data types, just like numbers. The language can manipulate words and names as easily as it manipulates math. This is why it's called a Symbolic Expression.
Under the Hood: How does the CPU handle symbols?
In languages like C, a variable name like radius is completely erased during compilation and replaced with a memory address (e.g., 0x7ffe...). The CPU only sees the address.
In Lisp/Scheme, symbols persist during execution because the language uses an Environment: a giant lookup table in memory.
- The Environment: When you call
(square 7), the language looks up the symbolsquarein its table to find the code it needs to run. - Pointers: To keep this fast, the language "interns" symbols. Every instance of
xin your code points to the exact same "Symbol Object" in memory. The CPU doesn't compare the string "x"; it just compares the memory pointers to see if they match. - Code as Data: Because symbols aren't erased, a Lisp program can read its own source code, move symbols around, and even generate new functions while it is running.
Examples of valid S-expressions:
42 ; Atom (number)
hello ; Atom (symbol)
(1 2 3) ; List of three atoms
(+ 1 2) ; List (function call)
(+ (* 2 3) 4) ; List containing another list (nested)
() ; The Empty List (also called "nil")
Lists as Code
In Scheme, code is lists. A function call is a list where:
- The first element is the function (operator)
- Remaining elements are arguments (operands)
(+ 1 2) ; Call function + with arguments 1 and 2
(* 3 4) ; Call function * with arguments 3 and 4
(- 10 5) ; Call function - with arguments 10 and 5
This homoiconicity (code as data) makes Lisp/Scheme uniquely powerful for metaprogramming—programs that manipulate programs.
Common Misconception: Code vs. Data
When we say "Code is Data," we don't mean hardcoding (mixing magic numbers or strings into your logic), which is generally bad practice.
We mean that the structure of the program is exposed as a data structure you can edit.
- In Python/Java: Code is text. To change it, you have to cut and paste strings.
- In Lisp: Code is a List. You can use standard list functions (like
maporreverse) to rewrite your own program while it is loading. This allows you to add new features to the language (like loops or classes) without waiting for the compiler developers to add them.
Special Forms: Breaking the Rules
There is one catch. Normally, Scheme evaluates all arguments before passing them to a function. But some things, like defining variables, wouldn't work if we did that.
If define were a normal function, Scheme would try to find the value of x before running the line. But x doesn't exist yet!
To handle this, Scheme has Special Forms (like define, if, and quote) that have their own custom evaluation rules. They don't evaluate all their arguments.
Reading Scheme Expressions
Let's build intuition by reading Scheme expressions aloud.
Read as: "Add 5 and 6"
Evaluates to: 11
Read as: "Multiply 7 and 8"
Evaluates to: 56
Read as: "Add the result of (add 5 and 6) to the result of (multiply 7 and 8)"
Step-by-step evaluation:
- Evaluate inner expressions first:
(+ 5 6)→11(* 7 8)→56- Substitute results:
(+ 11 56) - Evaluate outer expression:
67
Key Insight
The structure of nested parentheses directly mirrors the evaluation order. Innermost expressions evaluate first, working outward.
Parse Trees for Scheme
A parse tree (or abstract syntax tree, AST) represents the grammatical structure of an expression as a tree.
For Scheme, the correspondence between written form and parse tree is particularly direct—parentheses literally define tree structure.
Expression: (+ 5 6)
Structure:
- Root: The operator +
- Children: The operands 5 and 6
graph TD
Plus["+"] --> Five["5"]
Plus --> Six["6"]
style Plus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Five fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Six fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
Expression: (+ (+ 5 6) (* 7 8))
Structure:
- Root: Outer + operator
- Left subtree: (+ 5 6)
- Right subtree: (* 7 8)
graph TD
RootPlus["+"] --> LeftPlus["+"]
RootPlus --> RightTimes["*"]
LeftPlus --> Five["5"]
LeftPlus --> Six["6"]
RightTimes --> Seven["7"]
RightTimes --> Eight["8"]
style RootPlus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style LeftPlus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style RightTimes fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Five fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Six fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Seven fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Eight fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
Evaluation order (post-order traversal):
- Evaluate left subtree:
5 + 6 = 11 - Evaluate right subtree:
7 * 8 = 56 - Evaluate root:
11 + 56 = 67
How to draw a parse tree from Scheme:
- The outermost parentheses define the root node (the operator)
- Each argument becomes a child
- If an argument is itself a list (nested expression), recurse
Example: (* (+ 3 4) (- 10 5) 2)
graph TD
Times["*"] --> Plus["+"]
Times --> Minus["-"]
Times --> Two["2"]
Plus --> Three["3"]
Plus --> Four["4"]
Minus --> Ten["10"]
Minus --> Five["5"]
style Times fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Plus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Minus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Two fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Three fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Four fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Ten fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Five fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
Evaluation:
(+ 3 4)→7(- 10 5)→5(* 7 5 2)→70
Evaluation Order: Post-Order Traversal
Scheme expressions evaluate using post-order traversal:
- Evaluate left subtree
- Evaluate right subtree
- Apply operator at root
This ensures arguments are computed before the function that uses them—a fundamental principle in most programming languages.
Example: (+ (+ 5 6) (* 7 8))
Tree:
Post-order traversal:
- Visit left subtree (
+): - Visit
5(leaf) → value is5 - Visit
6(leaf) → value is6 - Apply
+→5 + 6 = 11 - Visit right subtree (
*): - Visit
7(leaf) → value is7 - Visit
8(leaf) → value is8 - Apply
*→7 * 8 = 56 - Visit root (
+): - Apply
+to results →11 + 56 = 67
Result: 67
The tree structure guarantees correct evaluation order—no precedence rules needed.
The Ambiguity Problem
Why did Lisp choose such a strange notation? To eliminate ambiguity.
Infix notation (standard math) is inherently ambiguous. Consider:
3 + 4 * 5
Does this mean (3 + 4) * 5 or 3 + (4 * 5)?
To solve this, we rely on Precedence Rules (PEMDAS). You have to memorize that * beats +. These are "invisible parentheses" that the parser adds for you. Scheme removes the need for invisible rules by forcing you to make the structure explicit.
Expression: 3 + 4 * 5
The Problem:
Who owns the 4? Does + claim it, or does * claim it? The notation doesn't say.
The Fix: We apply an arbitrary rule: "Multiplication before Addition." We need a lookup table of rules to read the code.
The Hidden Tree: The computer implicitly builds this tree based on the rulebook:
graph TD
T1["3"] --- T2["+"] --- T3["4"] --- T4["*"] --- T5["5"]
subgraph Rules ["The Rulebook (PEMDAS)"]
R1["Rule: * beats +"]
end
Plus["+"] --> Three["3"]
Plus --> Times["*"]
Times --> Four["4"]
Times --> Five["5"]
T3 --> Rules
Rules -.-> Plus
style Plus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Times fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Rules fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style R1 fill:#48bb78,stroke:#cbd5e0,stroke-width:2px,color:#fff
Expression: (+ 3 (* 4 5))
The Solution:
There is no question about who owns the 4. It is physically inside the (* ...) list. The structure is undeniable.
The Explicit Tree: The notation maps 1:1 to the tree.
graph TD
Plus["+"] --> Three["3"]
Plus --> Times["*"]
Times --> Four["4"]
Times --> Five["5"]
style Plus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Times fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
Why Parsers Love Prefix
The difference in complexity is staggering when you try to write a program to read these expressions.
Parsing Infix (3 + 4 * 5) requires:
- Tokenizing: Breaking string into parts.
- Precedence Logic: Knowing
*>+. - Associativity Logic: Knowing
1 - 2 - 3is(1 - 2) - 3. - Complex Algorithms: Shunting-yard (Dijkstra) or Operator-precedence parsing.
Parsing Prefix ((+ 3 (* 4 5))) requires:
- Tokenizing.
- Recursion: "See a paren? Start a new node."
There is no Step 3 or 4. The parser is trivial:
| Simple Scheme Parser | |
|---|---|
- Recursively parse a list of tokens
- If we see an opening paren, we're starting a new expression
- First token after '(' is the operator
- Collect all operands until we hit the closing paren
- Each operand might itself be an expression, so recurse
- Consume the closing paren
- Base case: if it's not a paren, it's a number (or symbol)
No precedence table needed. Structure is explicit.
Historical Context
Lisp was created by John McCarthy in 1958 at MIT as part of his research on artificial intelligence. McCarthy wanted a language that:
- Could manipulate symbolic expressions (not just numbers)
- Treated code as data (enabling metaprogramming)
- Supported recursive functions naturally
S-expressions were McCarthy's solution—simple, uniform, and powerful.
Scheme emerged from MIT in 1975 as Guy Steele and Gerald Sussman explored λ-calculus and programming language semantics. They wanted a "purer" Lisp:
- Lexical scoping (instead of dynamic)
- First-class continuations
- Tail-call optimization
- Minimalist design
Scheme's simplicity made it ideal for teaching and research, culminating in the influential textbook Structure and Interpretation of Computer Programs (SICP) by Abelson and Sussman (1984).
Lisp's ideas permeate modern programming:
- Garbage collection: Pioneered by Lisp (McCarthy, 1960)
- First-class functions: Lisp made them mainstream
- Closures: Scheme formalized them
- Dynamic typing: Common in scripting languages
- Homoiconicity: Inspired Clojure, Julia's metaprogramming
Even languages that don't look like Lisp (JavaScript, Python, Ruby) adopted its core ideas.
Scheme in Action
Scheme arithmetic supports variable arguments naturally.
Nested Arithmetic:
Definitions use define. Notice how defining a function looks just like calling it, but with a body.
Defining Functions:
Scheme uses if for conditional logic. It returns a value (like the ternary operator ? : in C-style languages) rather than executing statements.
Recursion is the primary way to loop in Scheme.
Recursion and the Call Stack
Every recursive call relies on the function call stack—a fundamental data structure that manages function execution.
When (factorial 5) runs, the stack grows with each recursive call:
factorial(5)
→ factorial(4)
→ factorial(3)
→ factorial(2)
→ factorial(1) ← base case reached, starts returning
Each call waits on the stack until its nested call completes, then pops off and returns its result. This LIFO behavior is what makes recursion possible.
Why Study Scheme?
You might reasonably ask: why learn Scheme when modern languages like Python, JavaScript, or Rust dominate industry?
Scheme strips away syntax complexity, letting you focus on computational concepts:
- Recursion without boilerplate
- Higher-order functions without ceremony
- Closures and scope made explicit
Scheme's direct mapping from text to parse tree demystifies how languages work. Once you see how prefix notation avoids precedence, you understand why parsing infix is hard.
Scheme is functional-first, teaching patterns now common in JavaScript, Python, Haskell, and Rust:
map,filter,reduce- Immutability
- Function composition
Understanding Lisp/Scheme helps you appreciate language evolution. Many "modern" features (lambdas, closures, garbage collection) were invented in Lisp decades ago.
Scheme's minimalism makes it perfect for studying:
- Programming language semantics
- Interpreters and compilers
- Type theory and λ-calculus
MIT's SICP remains a gold standard in CS education precisely because Scheme gets out of the way, letting ideas shine through.
Practice Problems
Practice Problem 1: Reading Scheme
What does this Scheme expression evaluate to?
Show the evaluation steps.
Solution
Expression: (+ (* 3 4) (- 10 2) 5)
Step 1: Evaluate subexpressions
(* 3 4)→12(- 10 2)→8
Step 2: Substitute results
(+ 12 8 5)
Step 3: Evaluate outer expression
12 + 8 + 5 = 25
Answer: 25
Practice Problem 2: Convert Infix to Prefix
Convert this infix expression to Scheme prefix notation:
Then draw the parse tree.
Solution
Infix: (5 + 6) * (7 - 2)
Prefix (Scheme): (* (+ 5 6) (- 7 2))
Parse Tree:
graph TD
Times["*"] --> Plus["+"]
Times --> Minus["-"]
Plus --> Five["5"]
Plus --> Six["6"]
Minus --> Seven["7"]
Minus --> Two["2"]
style Times fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Plus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Minus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Five fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Six fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Seven fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Two fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
Evaluation:
(+ 5 6)→11(- 7 2)→5(* 11 5)→55
Answer: 55
Practice Problem 3: Draw Parse Tree
Draw the parse tree for this Scheme expression:
Then trace the post-order evaluation.
Solution
Parse Tree:
graph TD
RootPlus["+"] --> LeftPlus["+"]
RootPlus --> RightTimes["*"]
LeftPlus --> Five["5"]
LeftPlus --> Six["6"]
RightTimes --> Seven["7"]
RightTimes --> Eight["8"]
style RootPlus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style LeftPlus fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style RightTimes fill:#4a5568,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Five fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Six fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Seven fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
style Eight fill:#2d3748,stroke:#cbd5e0,stroke-width:2px,color:#fff
Post-Order Evaluation:
- Visit left subtree (
+): - Visit
5→5 - Visit
6→6 - Apply
+→5 + 6 = 11 - Visit right subtree (
*): - Visit
7→7 - Visit
8→8 - Apply
*→7 * 8 = 56 - Visit root (
+): - Apply
+→11 + 56 = 67
Answer: 67
Practice Problem 4: Complex Nesting
Evaluate this Scheme expression step-by-step:
Solution
Expression: (* (+ 1 2) (- 10 (/ 8 2)) (+ 3 3))
Step 1: Evaluate innermost subexpressions
(/ 8 2)→4
Step 2: Substitute and evaluate next level
(+ 1 2)→3(- 10 4)→6(+ 3 3)→6
Step 3: Evaluate outer expression
(* 3 6 6)→108
Answer: 108
Key Takeaways
| Concept | Meaning |
|---|---|
| Prefix Notation | Operator comes before operands: (+ 1 2) |
| S-Expression | Atom or list of S-expressions |
| Parse Tree | Tree representation of expression structure |
| Post-Order Traversal | Evaluate children before parent (bottom-up) |
| Homoiconicity | Code is data; programs are lists |
| No Precedence Rules | Parentheses make structure explicit |
| Scheme/Lisp | Languages using prefix notation and S-expressions |
Why Scheme Matters
Scheme and prefix notation reveal:
- How notation shapes thought: Explicit structure changes how you reason about programs
- How parsers work: Prefix notation is trivial to parse; infix requires precedence
- How trees encode meaning: Parse trees are the "compiled" form of expressions
- How code can be data: S-expressions let programs manipulate programs
- How functional programming works: Scheme pioneered ideas now common in modern languages
You don't need to write Scheme professionally to benefit from understanding it. The insights it provides—about parsing, recursion, abstraction, and the relationship between notation and computation—are universal.
Further Reading
- Abelson & Sussman, Structure and Interpretation of Computer Programs — The definitive Scheme textbook
- David Evans, Introduction to Computing — Chapter 2 covers RTNs and parsing
- Recursive Transition Networks — How grammars define parse trees
- Computational Thinking — Abstraction and decomposition in practice
Scheme strips programming down to its essence: functions, data, and recursion. What remains is computation in its purest form—no syntax tricks, no precedence tables, just structure made explicit through parentheses. It's not the notation most programmers use daily, but understanding it changes how you think about all the notations you do use.