Data Structures
This work is licensed under a Creative Commons Attribution 3.0 Unported License (including images & stylesheets). The source is available on Github.
Intro
This cookbook covers some common tasks with core Clojure data structures. It assumes that the reader is familiar with foundational read and write operations (get, conj, assoc, update, etc). For more coverage of getting started with Clojure data structures, some recommended resources are:
- Clojure.org's reference on collections
- Intro to Clojure: Data Structures
- Intro to Clojure: Functions for Creating Data Structures - also, the section right after this one, which covers manipulating them
Vectors
Intro
Vectors are probably the most commonly used data structure for sequential data that isn't code. The random access capability and the fact that vectors aren't treated as function calls make them often a better choice than lists.
Constructing Vectors
mapv
and filterv
are eager versions of map
and filter
that return vectors
user=> (mapv second [[1 1] [2 4] [3 9]])
[1 4 9]
user=> (filterv even? (range 10))
[0 2 4 6 8]
Accessing Elements
Vectors implement the stack protocol, meaning peek
and pop
return the last element and the vector without the last element, respectively.
user=> (peek [7 8 9])
9
user=> (pop [7 8 9])
[7 8]
(Non-)emptiness
Sometimes it's desirable to treat an empty vector as a falsey value, especially in conditionals. The not-empty
function will return its argument if it's not empty, otherwise it will return nil.
user=> (defn notate-range
"takes a vector of [start end] (both optional) and returns a range notation"
[coords]
(if-not (empty? coords)
(let [[start end] coords]
(str start ".." end))
"empty range"))
#'user/notate-range
user=> (notate-range [1 5])
"1..5"
user=> (notate-range [7])
"7.."
user=> (notate-range [])
"empty range"
; we can also write this as
user=> (defn notate-range
"takes a vector of [start end] (both optional) and returns a range notation"
[coords]
(if-let [[start end] (not-empty coords)]
(str start ".." end)
"empty range"))
Here, not-empty
is used to make the conditional false (the empty vector is truthy, and not-empty
makes it nil
).
Linear-time Operations
Operations that need to search through elements one-by-one or operate on a large number of elements are less efficient, but still valuable in some contexts.
The replace
function has a special case for vectors that returns a vector. It takes a map where the keys are the values to replace and the values are the replacements.
; change :begin to :start and :finish to :end
user=> (replace {:begin :start, :finish :end}
[:begin :middle :finish :begin :finish])
[:start :middle :end :start :end]
"Deleting" an item from anywhere but the last position is not a 'fast' operation, because conceptually all the indexes after the deleted item need to be decremented by one. We can use into
to combine two sub-vectors, but the trade-off is that this operation adds the items from the second sub-vector one at a time, so this function is O(n) in the size of the second sub-vector.
user=> (defn delete-at [v pos]
(into (subvec v 0 pos) (subvec v (inc pos))))
#'user/delete-at
user=> (delete-at [:a :b :c] 1)
[:a :c]
Inserting an item at an arbitrary position has the same issue as deleting - it potentially changes a lot of indexes. An implementation is also similar to delete-at
, with the same caveat about efficiency.
user=> (defn insert-at [v pos value]
(into (conj (subvec v 0 pos) value) (subvec v pos)))
#'user/insert-at
user=> (insert-at [:a :b :c] 2 "interloper!")
[:a :b "interloper!" :c]
Searching
Checking for the existence of a value (as opposed to an index) requires examining the elements one-by-one. A point of confusion that sometimes arises is the contains?
function, which searches an associative collection for the existence of a key/index, and is very seldom desirable for vectors.
; `contains?` looks for an index!!!
user=> (contains? [:a :b :c] :b)
false
user=> (contains? [:a :b :c] 2)
true ; because the indexes are 0, 1, and 2
A commonly-used way to check for an occurrence of a certain value in a collection is the some
function with a set as a predicate. some
stops at the first truthy return, so there's no 'wasted' computation.
user=> (some #{:b} [:a :b :c])
:b
Note that searching for false
or nil
in this way won't work. In those cases use the nil?
or false?
functions.
Java interop can be used to get the first or last index of a specific value.
user=> (.indexOf [:a :b :c :b :a] :b)
1
user=> (.lastIndexOf [:a :b :c :b :a] :b)
3
; indexOf returns -1 when the value isn't found
user=> (.indexOf [:a :b] :c)
-1
keep-indexed
can be used to find all the indexes of a value (or indexes that match some predicate). Note that this is actually a function that works on (and returns) a sequence.
user=> (defn indexes-of [search-value coll]
(keep-indexed (fn [i v] (when (= search-value v) i)) coll))
#'user/indexes-of
user=> (indexes-of 15 [5 10 15 30 15 5])
(2 4)
Maps
Intro
Maps can be hashed or sorted (array maps are also available, but are mostly used for maps with < 10 entries). Sorted maps aren't as fast to lookup by key, but the sorting by key makes them useful for iteration in some circumstances.
Building Maps
zipmap
associates corresponding entries from two seqs into a map. This is used here to assign people to teams.
; assign participants to teams
user=> (let [participants ["Mike" "Tina" "Alice" "Fred"]
team-nums (cycle [1 2])]
(zipmap participants team-nums))
{"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2}
group-by
could be used to build the team rosters, but the result still has the team numbers in the roster.
user=> (group-by val {"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2})
{1 [["Mike" 1] ["Alice" 1]], 2 [["Tina" 2] ["Fred" 2]]}
update-vals
(added in Clojure 1.11) could be tacked on to clean up the output
user=> (-> (group-by val {"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2})
(update-vals #(mapv first %)))
{1 ["Mike" "Alice"], 2 ["Tina" "Fred"]}
reduce-kv
allows more control over the values, meaning it's possible to go directly to "clean" output (and maybe even different data structures).
user=> (reduce-kv
(fn [acc k v] (update acc v (fnil conj #{}) k))
{} {"Mike" 1, "Tina" 2, "Alice" 1, "Fred" 2})
{1 #{"Alice" "Mike"}, 2 #{"Tina" "Fred"}}
frequencies
takes a collection and returns a map of elements in that collection to how many times it appears. This is used here for a rudimentary word counter.
user=> (-> "the cat in the hat fell in the vat"
(str/split #" ")
frequencies)
{"the" 3, "cat" 1, "in" 2, "hat" 1, "fell" 1, "vat" 1}
clojure.set
has a map-invert
function that will swap the keys and values. The "unique key" facet of maps means that duplicate values in the input map will get one of the keys.
user=> (require '[clojure.set :as set])
user=> (let [squares {1 1, 2 4, 3 9, 4 16}
sqrts (set/map-invert squares)]
(sqrts 9))
3
Accessing Entries
While select-keys
can be used to create a submap, it is sometimes desirable to pull some keys into a sequence. In this example, juxt
is used to make a vector of the :x
and :y
values of maps for passing to another function.
user=> (defn manhattan-distance [[x1 y1] [x2 y2]]
(let [distance (comp abs -)]
(+ (distance x1 x2) (distance y1 y2))))
#'user/manhattan-distance
user=> (let [point1 {:x 2 :y 0}
point2 {:x 0 :y 2}
xy-coords (juxt :x :y)]
(manhattan-distance (xy-coords point1)
(xy-coords point2)))
4
Lists
Intro
Lists are primarily used for code in Clojure. While there aren't a ton of functions aimed specifically at lists, lists do implement the sequence interface, so all sequence functions work on lists without any conversion.
Accessing Elements
Like vectors, lists implement the stack protocol, but unlike vectors, the 'top' of the stack is the front of the list. So, 'first' on a list returns the same element as peek (the top of the stack), but on a vector 'first' returns the bottom of the stack (this can be useful for queueing in LIFO vs FIFO order, for example).
user=> (peek '(1 2 3))
1
user=> (pop '(1 2 3))
(2 3)
Sets
Intro
In addition to the distinctness of sets, they're fast to check membership, so if values are being collected for the purposes of checking whether they've been seen or not, a set is often a good choice.
Constructing sets
While keys
on a map returns a sequence, the Java interop call to keySet
can be used to get a set of keys on the map. This set is technically an anonymous instance of Java's AbstractSet
.
user=> (.keySet {:a 1 :b 2})
#{:a :b}
Sets as predicates
Because sets can be used as functions, they can be used as predicates with various higher-order functions.
; are there any fives in the sequence?
user=> (some #{5} [1 2 3 5 8 13])
5
; how many `a`s and `c`s are in the DNA sequence?
user=> (count (filter #{\a \c} "agctgcgcatagcgt"))
7
;which word(s) can be typed using only the top row of a qwerty keyboard?
user=> (let [top-row (set "qwertyuiop")
candidates ["poet" "computer" "typewriter" "desk"]]
(filter #(every? top-row %) candidates))
("poet" "typewriter")
Relations
In addition to the more primitive set functions, the clojure.set
namespace contains the fundamental relational algebra (the underpinnings of SQL) operations. Sets of maps can be treated as relations, providing the ability to do joins, projections, etc on in-memory data structures.
user=> (require '[clojure.set :as set])
user=> (let [owners #{{:name "Jane" :pet "Fido"}
{:name "Tim" :pet "Scaly"}}
pets #{{:name "Fido" :species "dog"}
{:name "Scaly" :species "snake"}}]
(-> (set/rename pets {:name :pet-name}) ; rename the :name key to disambiguate
(set/join owners {:pet-name :pet}) ; join with owners (on pet-name = pet)
(set/project [:name :species]))) ; project (select) only the owner's name and their pet's species
#{{:name "Jane", :species "dog"}
{:name "Tim", :species "snake"}}
Sequences
Intro
Lots of clojure.core
functions operate on sequences. In general, a function that operates on sequences will return a sequence, even when the coll
(a common name for a collection argument) is something like a vector or map.
Exercise: search for clojure.core
functions with a coll
argument
While not perfectly accurate or comprehensive, a possible starting point for exploring sequence functions is to search for functions in the clojure.core
namespace that have an argument named coll
.
; since the arglist is nested, helper function to check for a `coll` arg
user=> (defn has-coll-arg?
"Arglists are stored as a list of arg vectors, e.g. ([x] [x y]).
Returns the first `coll` in an arg vector, else nil."
[arglist]
(some ; some over the arg-vecs
(fn [arg-vec] (some ; some over the members of each arg-vec
#{'coll} arg-vec)) arglist))
user=> (->> (ns-publics (the-ns 'clojure.core))
vals
(map meta)
(filter (comp has-coll-arg? :arglists))
(map :name)
sort)
(->Eduction
assoc!
associative?
...)
This example, in addition to illustrating some of the runtime introspection capabilities of Clojure, illustrates a few sequence functions. some
is used on both lists and vectors (and uses the 'set as predicate' idiom from above to search through the vectors). Depending on exactly which version of Clojure is in use, this code returns a sequence of around 80 functions.
Exercise: Convoluted Encoding
Here's a contrived encoding algorithm to demonstrate a few sequence functions. The decoding has the following rules:
- the encoded string will be a sequence/string of numbers
- for every three digits, multiply the first two and add the third until a strictly descending three digits is reached (or until all digits are exhausted)
- each resulting number corresponds to its ordinal position in the alphabet (e.g. 1 = a, 2 = b, etc)
So, for example, "45011429135594375" becomes...
- 450 - 4 * 5 + 0 = 20 = t
- 114 - 1 * 1 + 4 = 4 = e
- 291 - 2 * 9 + 1 = 19 = s
- 355 - 3 * 5 + 5 = 20 = t
- 943 - these digits are strictly descending, so from it onwards is thrown out, and the decoded text is 'test'
; digit char -> int, e.g. \9 -> 9
user=> (defn int-char->int [c] (- (int c) (int \0)))
; {1 \a, 2 \b, ... 26 \z}
user=> (def number->letter
(let [numbers (range 1 27)
letters (map char (iterate inc (int \a)))]
(zipmap numbers letters)))
user=> (defn decode [s]
(->> s
(map int-char->int) ; string of digits -> seq of ints
(partition 3) ; split into triples
(take-while (complement #(apply > %))) ; stop if there's a descending triple
(map (fn [[a b c]] (+ c (* a b)))) ; do the decoding math
(map number->letter) ; get the character for each number
(apply str))) ; put the chars into a string
user=> (decode "45011429135594375")
"test"
An exercise for the reader is to use for
to generate the 3 digit sequences that 'compute' to (<= 1 n 26)
and aren't in descending order, as part of the encoding procedure.
Contributors
- @bobisgeek - original author