
Friday, March 8, 2019

Clojure quick performance tip: fastest(?) vector concatenation

I don't like CONCAT or MAPCAT; they create intermediate lazy seqs and have some potential problems if you're not careful. I use this instead:
(defn catvec
  ([s] (catvec [] s))

  ([init s]
   (persistent! (reduce #(reduce conj! %1 %2)
                        (transient init)
                        s))))


> (catvec [[1 2] nil [3 4 5] [6] [] [7 8]])
[1 2 3 4 5 6 7 8]


> (catvec [0] [[1 2] nil [3 4 5] [6] [] [7 8]])
[0 1 2 3 4 5 6 7 8]

..for me this yields about a 50% performance improvement over the perhaps more typical (vec (mapcat ..)) pattern. The key is that a single transient is used for the entire operation, with no intermediate lazy seqs or collections created along the way.
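If you want to check the numbers on your own data, a rough comparison using the criterium library (an assumption here; any benchmarking approach will do) could look like this:

(require '[criterium.core :as crit])

;; Arbitrary test data: 1000 small vectors.
(def vs (vec (repeatedly 1000 #(vec (range 10)))))

;; The typical lazy approach: concatenate, then realize into a vector.
(crit/quick-bench (vec (mapcat identity vs)))

;; Single transient for the whole thing -- no intermediate seqs.
(crit/quick-bench (catvec vs))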

NOTE: The simplest case, i.e. concatenating just two vectors, is of course (into [0 1] [2 3]) => [0 1 2 3] ..which uses transients internally.

I was testing rrb-vector for a while, but it seems to have some serious bugs..

Tuesday, January 15, 2019

Clojure for fast processing of streams of data via LAZY-SEQ and SEQUE

UPDATE: After some back and forth I think clojure.core.async, with its buffers at the source, computation/transformation and sink stages, is a better fit for what I'm doing!

LAZY-SEQ and SEQUE are useful in many contexts where you need to speed up processing of streams of data (SEQ) from e.g. another computation, a filesystem or a web or database server -- anything really.

The key idea is that SEQUE will continuously fetch the next few elements from the stream in advance and in the background (thread) -- while the code that consumes or handles the data keeps on working on the current or already fetched element(s) in the foreground.

A quick video to demonstrate the effect:


SEQUE uses a LinkedBlockingQueue behind the scenes, but you can pass it anything that implements the BlockingQueue interface as needed.

Clojure makes this simple and fun. All of this might be pretty basic and trivial for many, but a small "trick" is needed to set it up correctly -- like this:
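A minimal sketch of the idea, assuming a producer fn fetch-next that returns the next element or nil when the stream is exhausted (the fn names and the buffer size of 10 are just placeholders):

(defn background-stream [fetch-next]
  ;; Wrapping the producer in LAZY-SEQ means nothing is fetched at
  ;; construction time; SEQUE's background thread then keeps a buffer
  ;; of up to 10 elements filled ahead of the consumer.
  (seque 10 ((fn step []
               (lazy-seq
                 (when-some [x (fetch-next)]
                   (cons x (step))))))))

;; The consumer just treats the result as an ordinary seq:
;; (doseq [x (background-stream slow-fetch!)]
;;   (handle! x))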

Thursday, January 10, 2019

Big data: from compressed text (e.g. CSV) to compressed binary format -- or why Nippy (Clojure) and java.io.DataOutputStream are awesome

Say you have massive amounts of historical market data in a common, gzip'ed CSV format or similar, and you have these data types which represent instances of the data in your system:

(defrecord OFlow ;; Order flow; true trade and volume data!
    [^double trade ;; Positive = buy, negative = sell.
     ^double price ;; Average fill price.
     ^Keyword tick-direction ;; :plus | :zero-plus | :minus | :zero-minus
     ^long timestamp ;; We assume this is the ts for when the order executed in full.

     ^IMarketEvent memeta])

(defrecord MEMeta
    [^Keyword exchange-id
     ^String symbol
     ^long local-timestamp])


A good way to store and access this would be to use a binary format and a modern, fast compression algorithm. The key issue is fast decompression, and LZ4HC is the best here as far as I'm aware -- apparently reaching the limits of what's possible with regard to RAM speed. To do this we'll use https://github.com/ptaoussanis/nippy which exposes the DataOutputStream class nicely and enables us to express a simple binary protocol for reading and writing our data types, like this:

(nippy/extend-freeze OFlow :QA/OFlow [^OFlow oflow output]
                     (.writeDouble output (.trade oflow))
                     (.writeDouble output (.price oflow))
                     ;; NOTE: hyphens in record field names are munged to underscores in the
                     ;; generated class, so direct field access needs the munged names.
                     (.writeByte output (case (.tick_direction oflow)
                                          :plus 0, :zero-plus 1, :minus 2, :zero-minus 3))
                     (.writeLong output (.timestamp oflow))
                     ;; MEMeta
                     (.writeUTF output (name (.exchange_id ^MEMeta (.memeta oflow))))
                     (.writeUTF output (.symbol ^MEMeta (.memeta oflow)))
                     (.writeLong output (.local_timestamp ^MEMeta (.memeta oflow))))

(nippy/extend-thaw :QA/OFlow [input]
                   (->OFlow (.readDouble input)
                            (.readDouble input)
                            (case (.readByte input)
                              0 :plus, 1 :zero-plus, 2 :minus, 3 :zero-minus)
                            (.readLong input)
                            (->MEMeta (keyword (.readUTF input))
                                      (.readUTF input)
                                      (.readLong input))))


..to write out the binary data to a file, you'd do something like (oflow-vector is a vector containing OFlow instances):

(nippy/freeze-to-file "data.dat" oflow-vector                                   
                      {:compressor nippy/lz4hc-compressor, :encryptor nil, :no-header? true})


..and to read it back in to get a vector of OFlow instances as the result you'd do something like this:

(nippy/thaw-from-file "data.dat"
                      {:compressor nippy/lz4hc-compressor, :encryptor nil, :no-header? true})


...it's so simple and the result is very, very good in terms of speed and space savings [I'll add some numbers here later]. Of course you'd still want to use something like PostgreSQL for indexed views or access to the data, but this is very nice for fast access to massive amounts of sequential, high-resolution data. I've split things up so that each file contains 1 day's worth of data; this way it's possible to quickly request ranges of the data at any location without doing long, linear searches. 👍
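For example, with one file per day, loading a range of days back into a single vector could look something like this (the file-naming scheme and fn name here are just assumptions):

(defn thaw-days
  "Loads and concatenates the OFlow vectors for a seq of day strings."
  [days]
  (reduce (fn [acc day]
            (into acc (nippy/thaw-from-file (str "oflow-" day ".dat")
                                            {:compressor nippy/lz4hc-compressor
                                             :encryptor nil
                                             :no-header? true})))
          []
          days))

;; (thaw-days ["2018-07-30" "2018-07-31"])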

Sunday, December 23, 2018

Clojure quick tip: efficiently find last element in vector satisfying some condition

Trivial, but I keep forgetting about rseq:

user> (first (filter #(even? (do (println "x:" %) %)) 
                     (rseq (vec (range 1 100)))))
x: 99
x: 98
98
user> 

VEC is very nice; if you hand it a Java array it will alias it rather than copy it. Anyway, doing (last (filter ..)) would traverse the entire vector, which isn't very efficient.
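Wrapped up as a small helper fn this becomes:

(defn last-where
  "Returns the last element of vector v satisfying pred, or nil."
  [pred v]
  (first (filter pred (rseq v))))

;; (last-where even? (vec (range 1 100))) => 98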

PS: Merry XMAS. :)

Monday, November 12, 2018

Notes on database stuff: PostgreSQL indexing performance

It might be a very good idea to take a look at e.g. https://www.timescale.com/

Total number of rows (quote data only): 749 640 222

Querying for a single 1-hour "candle" (sort of..) without an index: 5 minutes. In other words: completely useless.

..doing the same with an index: around 200ms, i.e. roughly 1500x faster. UPDATE: I fiddled around and ended up with these numbers: https://gist.github.com/lnostdal/9e7e710290cd7448cd81b29fc744fbc7 ..this should be faster than my backtester at the moment; at least for non-trivial stuff.

The query I used to test this:

SELECT MAX(bid_price) AS max_bid_price, MIN(bid_price) AS min_bid_price, MAX(ask_price) AS max_ask_price, MIN(ask_price) AS min_ask_price 
FROM bitmex_quote
WHERE timestamp >= '2018-07-30 00:00:00' AND timestamp < '2018-07-30 01:00:00' AND symbol = 'XBTUSD';

I'm sure there are ways of organizing and indexing this kind of data which are better and faster than what I've done, but this is an OK start for now.

Building a partial index takes about 30 minutes for each `symbol`. I'm sure all of this would be much faster on proper hardware with a finalized configuration; the point is the relative speedup.

Here's how I built the index:

CREATE INDEX idx_timestamp_xbtusd
    ON public.bitmex_quote
USING btree
    ("timestamp" ASC NULLS LAST)
    TABLESPACE pg_default
WHERE symbol = 'XBTUSD';

I create one index like this for each `symbol`. These are partial indexes, which I think make sense for this use case.

https://www.postgresql.org/docs/11/rules-materializedviews.html is probably a nice way to build very fast viewports of the most common timeframes: 1min, 3min, 5min, 15min, 30min, 1h, 2h and so on.

Friday, November 2, 2018

Notes on Clojure: memory leaks, performance and more

Work in progress; I'll add more to this later.

Memory leaks

  1. https://www.eclipse.org/mat/ is great and works with Clojure.
  2. When debugging, setting -Dclojure.compiler.disable-locals-clearing=true can be nice, but make sure you set this back to false (the default) because it can (will?) create memory leaks in non-trivial code.
  3. In some cases you should use eduction instead of sequence when dealing with transducers and huge amounts of data.
  4. Not related to Clojure, but Emacs has had some memory leak problems lately: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=26952 so make sure you run a recent Emacs. This hasn't bitten me often, though.
Spent the last 2 days debugging memory. It turns out 90% of the problem was a wrong JVM flag (the locals-clearing flag, #2 above) -- and the remaining 10% was an issue with clojure.core.async which I'll try to document later.

Performance

  1. -Dclojure.compiler.direct-linking=true leads to massive performance improvements in some cases. You'll need to change how you do interactive development after setting this flag, though. I also find return value type hints work better when this is true -- which can help with debugging.
  2. Make sure you set *warn-on-reflection* to true and *unchecked-math* to :warn-on-boxed. Dealing with these warnings correctly will probably give you the biggest performance win of anything mentioned here.
  3. It is very important not to use Fns like FIRST on ArrayLists, ArrayDeques and similar, as it will actually generate a temporary object via an internal call to SEQ on every invocation(!). This might be obvious, but it's easy to forget because a simple call to FIRST seems like something that "should" be cheap. Stick with the JVM-specific methods and interfaces for these classes: get, peekFirst, peekLast etc. (see the sketch after this list).
  4. Also, Vector implements IFn which means you can do (my-vector 0) instead of (first my-vector) for a 4-5x win.
  5. Always use PEEK instead of LAST on vectors when you can; PEEK is constant time while LAST traverses the whole vector.
  6. This looks useful: https://github.com/clojure-goes-fast
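
A quick REPL-style illustration of tips #2 through #5 above:

;; Catch reflection and boxed-math problems at compile time (tip #2):
(set! *warn-on-reflection* true)
(set! *unchecked-math* :warn-on-boxed)

;; Tip #3: prefer the class' own methods for java.util collections;
;; FIRST goes via SEQ and allocates a temporary object on every call.
(let [ad (java.util.ArrayDeque. [1 2 3])]
  (first ad)      ;; works, but allocates a seq wrapper each time
  (.peekFirst ad) ;; direct method call -- no allocation
  (.peekLast ad))

;; Tips #4 and #5: vectors are Fns, and PEEK instead of LAST.
(let [v [1 2 3]]
  [(v 0) (peek v)]) ;; => [1 3]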
 ..I'll add more tips about performance in future posts, here: https://quantoga.blogspot.com/search/label/performance

Thursday, June 7, 2018