This work is licensed under a Creative Commons Attribution 3.0 Unported License (including images & stylesheets). The source is available on Github.

Intro

This cookbook covers some common tasks with core Clojure data structures. It assumes that the reader is familiar with foundational read and write operations (get, conj, assoc, update, etc). For more coverage of getting started with Clojure data structures, some recommended resources are:

Vectors

Intro

Vectors are probably the most commonly used data structure for sequential data that isn't code. The random access capability and the fact that vectors aren't treated as function calls make them often a better choice than lists.

Constructing Vectors

mapv and filterv are eager versions of map and filter that return vectors

user=> (mapv second [[1 1] [2 4] [3 9]])
[1 4 9]

user=> (filterv even? (range 10))
[0 2 4 6 8]

Accessing Elements

Vectors implement the stack protocol, meaning peek and pop return the last element and the vector without the last element, respectively.

user=> (peek [7 8 9])
9
user=> (pop [7 8 9])
[7 8]

(Non-)emptiness

Sometimes it's desirable to treat an empty vector as a falsey value, especially in conditionals. The not-empty function will return its argument if it's not empty, otherwise it will return nil.

user=> (defn notate-range 
         "takes a vector of [start end] (both optional) and returns a range notation"
         [coords]
         (if-not (empty? coords)
           (let [[start end] coords]
             (str start ".." end))
           "empty range"))
#'user/notate-range
user=> (notate-range [1 5])
"1..5"
user=> (notate-range [7])
"7.."
user=> (notate-range [])
"empty range"

; we can also write this as
user=> (defn notate-range 
         "takes a vector of [start end] (both optional) and returns a range notation"
         [coords]
         (if-let [[start end] (not-empty coords)]
           (str start ".." end)
           "empty range"))

Here, not-empty is used to make the conditional false (the empty vector is truthy, and not-empty makes it nil).

Linear-time Operations

Operations that need to search through elements one-by-one or operate on a large number of elements are less efficient, but still valuable in some contexts.

The replace function has a special case for vectors that returns a vector. It takes a map where the keys are the values to replace and the values are the replacements.

; change :begin to :start and :finish to :end
user=> (replace {:begin :start, :finish :end} 
                [:begin :middle :finish :begin :finish])
[:start :middle :end :start :end]

"Deleting" an item from anywhere but the last position is not a 'fast' operation, because conceptually all the indexes after the deleted item need to be decremented by one. We can use into to combine two sub-vectors, but the trade-off is that this operation adds the items from the second sub-vector one at a time, so this function is O(n) in the size of the second sub-vector.

user=> (defn delete-at [v pos]
         (into (subvec v 0 pos) (subvec v (inc pos))))
#'user/delete-at
user=> (delete-at [:a :b :c] 1)
[:a :c]

Inserting an item at an arbitrary position has the same issue as deleting - it potentially changes a lot of indexes. An implementation is also similar to delete-at, with the same caveat about efficiency.

user=> (defn insert-at [v pos value]
         (into (conj (subvec v 0 pos) value) (subvec v pos)))
#'user/insert-at
user=> (insert-at [:a :b :c] 2 "interloper!")
[:a :b "interloper!" :c]

Searching

Checking for the existence of a value (as opposed to an index) requires examining the elements one-by-one. A point of confusion that sometimes arises is the contains? function, which searches an associative collection for the existence of a key/index, and is very seldom desirable for vectors.

; `contains?` looks for an index!!!
user=> (contains? [:a :b :c] :b)
false
user=> (contains? [:a :b :c] 2)
true ; because the indexes are 0, 1, and 2

A commonly-used way to check for an occurrence of a certain value in a collection is the some function with a set as a predicate. some stops at the first truthy return, so there's no 'wasted' computation.

user=> (some #{:b} [:a :b :c])
:b

Note that searching for false or nil in this way won't work. In those cases use the nil? or false? functions.

Java interop can be used to get the first or last index of a specific value.

user=> (.indexOf [:a :b :c :b :a] :b)
1
user=> (.lastIndexOf [:a :b :c :b :a] :b)
3
; indexOf returns -1 when the value isn't found
user=> (.indexOf [:a :b] :c)
-1

keep-indexed can be used to find all the indexes of a value (or indexes that match some predicate). Note that this is actually a function that works on (and returns) a sequence.

user=> (defn indexes-of [search-value coll]
         (keep-indexed (fn [i v] (when (= search-value v) i)) coll))
#'user/indexes-of
user=> (indexes-of 15 [5 10 15 30 15 5])
(2 4)

Maps

Intro

Maps can be hashed or sorted (array maps are also available, but are mostly used for maps with < 10 entries). Sorted maps aren't as fast to lookup by key, but the sorting by key makes them useful for iteration in some circumstances.

Building Maps

zipmap associates corresponding entries from two seqs into a map. This is used here to assign people to teams.

; assign participants to teams
user=> (let [participants ["Mike" "Tina" "Alice" "Fred"]
             team-nums    (cycle [1 2])]
         (zipmap participants team-nums))
{"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2}

group-by could be used to build the team rosters, but the result still has the team numbers in the roster.

user=> (group-by val {"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2})
{1 [["Mike" 1] ["Alice" 1]], 2 [["Tina" 2] ["Fred" 2]]}

update-vals (added in Clojure 1.11) could be tacked on to clean up the output

user=> (-> (group-by val {"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2})
           (update-vals #(mapv first %)))
{1 ["Mike" "Alice"], 2 ["Tina" "Fred"]}

reduce-kv allows more control over the values, meaning it's possible to go directly to "clean" output (and maybe even different data structures).

user=> (reduce-kv
        (fn [acc k v] (update acc v (fnil conj #{}) k))
        {} {"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2})
{1 #{"Alice" "Mike"}, 2 #{"Tina" "Fred"}}

frequencies takes a collection and returns a map of elements in that collection to how many times it appears. This is used here for a rudimentary word counter.

user=> (-> "the cat in the hat fell in the vat"
           (str/split #" ")
           frequencies)
{"the" 3, "cat" 1, "in" 2, "hat" 1, "fell" 1, "vat" 1}

clojure.set has a map-invert function that will swap the keys and values. The "unique key" facet of maps means that duplicate values in the input map will get one of the keys.

user=> (require '[clojure.set :as set])
user=> (let [squares {1 1, 2 4, 3 9, 4 16}
             sqrts (set/map-invert squares)]
        (sqrts 9))
3

Accessing Entries

While select-keys can be used to create a submap, it is sometimes desirable to pull some keys into a sequence. In this example, juxt is used to make a vector of the :x and :y values of maps for passing to another function.

user=> (defn manhattan-distance [[x1 y1] [x2 y2]]
        (let [distance (comp abs -)]
         (+ (distance x1 x2) (distance y1 y2))))
#'user/manhattan-distance
user=> (let [point1 {:x 2 :y 0}
             point2 {:x 0 :y 2}
             xy-coords (juxt :x :y)]
        (manhattan-distance (xy-coords point1)
                            (xy-coords point2)))
4 

Lists

Intro

Lists are primarily used for code in Clojure. While there aren't a ton of functions aimed specifically at lists, lists do implement the sequence interface, so all sequence functions work on lists without any conversion.

Accessing Elements

Like vectors, lists implement the stack protocol, but unlike vectors, the 'top' of the stack is the front of the list. So, 'first' on a list returns the same element as peek (the top of the stack), but on a vector 'first' returns the bottom of the stack (this can be useful for queueing in LIFO vs FIFO order, for example).

user=> (peek '(1 2 3))
1
user=> (pop '(1 2 3))
(2 3)

Sets

Intro

In addition to the distinctness of sets, they're fast to check membership, so if values are being collected for the purposes of checking whether they've been seen or not, a set is often a good choice.

Constructing sets

While keys on a map returns a sequence, the Java interop call to keySet can be used to get a set of keys on the map. This set is technically an anonymous instance of Java's AbstractSet.

user=> (.keySet {:a 1 :b 2})
#{:a :b}

Sets as predicates

Because sets can be used as functions, they can be used as predicates with various higher-order functions.

; are there any fives in the sequence?
user=> (some #{5} [1 2 3 5 8 13])
5

; how many `a`s and `c`s are in the DNA sequence?
user=> (count (filter #{\a \c} "agctgcgcatagcgt"))
7

;which word(s) can be typed using only the top row of a qwerty keyboard?
user=> (let [top-row (set "qwertyuiop")
             candidates ["poet" "computer" "typewriter" "desk"]]
        (filter #(every? top-row %) candidates))
("poet" "typewriter")

Relations

In addition to the more primitive set functions, the clojure.set namespace contains the fundamental relational algebra (the underpinnings of SQL) operations. Sets of maps can be treated as relations, providing the ability to do joins, projections, etc on in-memory data structures.

user=> (require '[clojure.set :as set])
user=> (let [owners #{{:name "Jane" :pet "Fido"}
                      {:name "Tim" :pet "Scaly"}}
             pets #{{:name "Fido" :species "dog"}
                    {:name "Scaly" :species "snake"}}]
        (-> (set/rename pets {:name :pet-name}) ; rename the :name key to disambiguate
            (set/join owners {:pet-name :pet}) ; join with owners (on pet-name = pet)
            (set/project [:name :species]))) ; project (select) only the owner's name and their pet's species
#{{:name "Jane", :species "dog"} 
  {:name "Tim", :species "snake"}}

Sequences

Intro

Lots of clojure.core functions operate on sequences. In general, a function that operates on sequences will return a sequence, even when the coll (a common name for a collection argument) is something like a vector or map.

Exercise: search for clojure.core functions with a coll argument

While not perfectly accurate or comprehensive, a possible starting point for exploring sequence functions is to search for functions in the clojure.core namespace that have an argument named coll.

; since the arglist is nested, helper function to check for a `coll` arg
user=> (defn has-coll-arg? 
  "Arglists are stored as a list of arg vectors, e.g. ([x] [x y]).
  Returns the first `coll` in an arg vector, else nil."
  [arglist]
  (some ; some over the arg-vecs 
    (fn [arg-vec] (some ; some over the members of each arg-vec 
                    #{'coll} arg-vec)) arglist))
user=> (->> (ns-publics (the-ns 'clojure.core))
        vals
        (map meta)
        (filter (comp has-coll-arg? :arglists))
        (map :name)
        sort)
(->Eduction
 assoc!
 associative?
 ...)

This example, in addition to illustrating some of the runtime introspection capabilities of Clojure, illustrates a few sequence functions. some is used on both lists and vectors (and uses the 'set as predicate' idiom from above to search through the vectors). Depending on exactly which version of Clojure is in use, this code returns a sequence of around 80 functions.

Exercise: Convoluted Encoding

Here's a contrived encoding algorithm to demonstrate a few sequence functions. The decoding has the following rules:

  • the encoded string will be a sequence/string of numbers
  • for every three digits, multiply the first two and add the third until a strictly descending three digits is reached (or until all digits are exhausted)
  • each resulting number corresponds to its ordinal position in the alphabet (e.g. 1 = a, 2 = b, etc)

So, for example, "45011429135594375" becomes...

  • 450 - 4 * 5 + 0 = 20 = t
  • 114 - 1 * 1 + 4 = 4 = e
  • 291 - 2 * 9 + 1 = 19 = s
  • 355 - 3 * 5 + 5 = 20 = t
  • 943 - these digits are strictly descending, so from it onwards is thrown out, and the decoded text is 'test'
; digit char -> int, e.g. \9 -> 9
user=> (defn int-char->int [c] (- (int c) (int \0)))

; {1 \a, 2 \b, ... 26 \z}
user=> (def number->letter
        (let [numbers (range 1 27)
              letters (map char (iterate inc (int \a)))]
         (zipmap numbers letters)))

user=> (defn decode [s]
        (->> s
         (map int-char->int) ; string of digits -> seq of ints
         (partition 3) ; split into triples
         (take-while (complement #(apply > %))) ; stop if there's a descending triple
         (map (fn [[a b c]] (+ c (* a b)))) ; do the decoding math
         (map number->letter) ; get the character for each number
         (apply str))) ; put the chars into a string

user=> (decode "45011429135594375")
"test"

An exercise for the reader is to use for to generate the 3 digit sequences that 'compute' to (<= 1 n 26) and aren't in descending order, as part of the encoding procedure.

Contributors