As a quick note to self:
(jdbc/with-db-connection [conn -db-conn-]
(let [t (time.coerce/from-string "2018-02-01")]
(time
(dotimes [i 50]
(let [ts-start (time/plus t (time/hours i))
ts-end (time/plus t (time/hours (inc i)))]
(time (println {:data (jdbc/query conn ["SELECT
first_value(bid_price) OVER w AS first_bid_price,
first_value(ask_price) OVER w AS first_ask_price,
MAX(bid_price) OVER w AS max_bid_price,
MAX(ask_price) OVER w AS max_ask_price,
MIN(bid_price) OVER w AS min_bid_price,
MIN(ask_price) OVER w AS min_ask_price,
last_value(bid_price) OVER w AS last_bid_price,
last_value(ask_price) OVER w AS last_ask_price
FROM bitmex_quote
WHERE timestamp >= ? AND timestamp < ? AND symbol = 'XBTUSD'
WINDOW w AS (ORDER BY timestamp ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY timestamp ASC
LIMIT 1;"
(time.coerce/to-sql-time ts-start) (time.coerce/to-sql-time ts-end)])
:timestamp (str ts-end)})))))))
...for this to perform reasonably you'll need to do indexing.
Tuesday, November 13, 2018
Monday, November 12, 2018
Notes on database stuff: PostgreSQL indexing performance
It might be a very good idea to take a look at e.g. https://www.timescale.com/
Total number of rows (quote data only): 749 640 222
Querying for a single 1 hour "candle" (sort of..) without index: 5 minutes. In other words; completely useless.
..doing the same with index: around 200ms or 1500% (15x) faster. UPDATE: I fiddled around and ended up with these numbers: https://gist.github.com/lnostdal/9e7e710290cd7448cd81b29fc744fbc7 ..this should be faster than my backtester at the moment; at least for non-trivial stuff.
The query I used to test this:
SELECT MAX(bid_price) AS max_bid_price, MIN(bid_price) AS min_bid_price, MAX(ask_price) AS max_ask_price, MIN(ask_price) AS min_ask_price
FROM bitmex_quote
WHERE timestamp >= '2018-07-30 00:00:00' AND timestamp < '2018-07-30 01:00:00' AND symbol = 'XBTUSD';
I'm sure there are ways of organizing and indexing this kind of data which are better and faster than what I've done, but this is an OK start for now.
Building a partial index takes about 30 minutes for each `symbol`. I'm sure all of this would be much faster on proper hardware and finalized configuration of everything; the point are the nice % speedups.
Here's how I built the index:
CREATE INDEX idx_timestamp_xbtusd
ON public.bitmex_quote
USING btree
("timestamp" ASC NULLS LAST)
TABLESPACE pg_default
WHERE symbol = 'XBTUSD';
I create one index like this for each `symbol`. These are partial indexes which I think makes sense for this.
https://www.postgresql.org/docs/11/rules-materializedviews.html is probably a nice way to build very fast viewports of the most common timeframes; 1min, 3min, 5min, 15min, 30min, 1h, 2h and so on.
Friday, November 2, 2018
Notes on Clojure: memory leaks, performance and more
work in progress; i'll add more to this later
Memory leaks
- https://www.eclipse.org/mat/ is great and works with Clojure.
- When debugging, setting -Dclojure.compiler.disable-locals-clearing=true can be nice, but make sure you set this back to false (the default) because it can (will?) create memory leaks in non-trivial code.
- In some cases you should use eduction instead of sequence when dealing with transducers and huge amounts of data.
- Not related to Clojure, but Emacs actually has some leak problems lately: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=26952 so make sure you run a recent Emacs. This is not a problem that bit me often though.
Performance
- -Dclojure.compiler.direct-linking=true leads to massive performance improvements in some cases. You'll need to change how you do interactive development after setting this flag tho. I also find return value type hints work better when this is true -- which can help with debugging.
- Make sure you set *warn-on-reflection* to true and *unchecked-math* to :warn-on-boxed. Dealing with this correctly will probably give you the biggest performance of everything else mentioned here.
- It is very important to not use Fns like FIRST on ArrayLists, ArrayDeques and similar. As it will actually generate a temporary object via an internal call to SEQ on every invocation(!). This might be obvious, but it's easy to forget because a simple call to FIRST seems like something that "should" be cheap to do. Stick with the JVM specific methods and interfaces for these classes; get, peekFirst, peekLast etc..
- Also, Vector implements IFn which means you can do (my-vector 0) instead of (first my-vector) for a 4-5x win.
- Always use PEEK instead of LAST when you can.
- This looks useful: https://github.com/clojure-goes-fast
Subscribe to:
Posts (Atom)