Friday, March 8, 2019

Clojure quick performance tip: fastest(?) vector concatenation

I don't like CONCAT or MAPCAT; they create intermediate lazy seqs and have some potential problems if you're not careful. I use this instead:
(defn catvec
  ([s] (catvec [] s))

  ([init s]
   (persistent! (reduce #(reduce conj! %1 %2)
                        (transient init)
                        s))))


> (catvec [[1 2] nil [3 4 5] [6] [] [7 8]])
[1 2 3 4 5 6 7 8]


> (catvec [0] [[1 2] nil [3 4 5] [6] [] [7 8]])
[0 1 2 3 4 5 6 7 8]

..for me this yields about a 50% performance improvement compared to the perhaps more typical (vec (mapcat ..)) pattern. The key here is the use of a single transient for the entire process and no creation or use of intermediate values.

NOTE: The simplest case i.e. concatenating two vectors together is of course just   (into [0 1] [2 3]) => [0 1 2 3]  ..which uses transients internally.

I was testing rrb-vector for a while, but it seems to have some serious bugs..

No comments:

Post a Comment