Monday, November 12, 2018

Notes on database stuff: PostgreSQL indexing performance

It might be a very good idea to take a look at e.g. https://www.timescale.com/

Total number of rows (quote data only): 749 640 222

Querying for a single 1-hour "candle" (sort of..) without an index: 5 minutes. In other words: completely useless.

..doing the same with an index: around 200ms, i.e. roughly 1500x faster.  UPDATE: I fiddled around and ended up with these numbers: https://gist.github.com/lnostdal/9e7e710290cd7448cd81b29fc744fbc7 ..this should be faster than my backtester at the moment, at least for non-trivial stuff.

The query I used to test this:

SELECT MAX(bid_price) AS max_bid_price, MIN(bid_price) AS min_bid_price, MAX(ask_price) AS max_ask_price, MIN(ask_price) AS min_ask_price 
FROM bitmex_quote
WHERE timestamp >= '2018-07-30 00:00:00' AND timestamp < '2018-07-30 01:00:00' AND symbol = 'XBTUSD';
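
To check that the planner actually picks the index rather than a sequential scan, the same query can be run through EXPLAIN (ANALYZE, BUFFERS), which prints the chosen plan together with actual timings and buffer usage:

-- Same query as above, wrapped in EXPLAIN to show plan + real run time:
EXPLAIN (ANALYZE, BUFFERS)
SELECT MAX(bid_price) AS max_bid_price, MIN(bid_price) AS min_bid_price,
       MAX(ask_price) AS max_ask_price, MIN(ask_price) AS min_ask_price
FROM bitmex_quote
WHERE timestamp >= '2018-07-30 00:00:00' AND timestamp < '2018-07-30 01:00:00'
  AND symbol = 'XBTUSD';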

I'm sure there are ways of organizing and indexing this kind of data which are better and faster than what I've done, but this is an OK start for now.

Building a partial index takes about 30 minutes for each `symbol`. I'm sure all of this would be much faster on proper hardware with everything properly configured; the point is the relative speedup.

Here's how I built the index:

CREATE INDEX idx_timestamp_xbtusd
    ON public.bitmex_quote
USING btree
    ("timestamp" ASC NULLS LAST)
    TABLESPACE pg_default
WHERE symbol = 'XBTUSD';

I create one index like this for each `symbol`. These are partial indexes, which I think make sense for this kind of data.
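
An alternative I haven't measured would be one composite index with `symbol` as the leading column instead of a partial index per symbol; just a sketch (the index name is made up here):

-- Single index covering all symbols; the planner can use it for
-- queries that filter on symbol plus a timestamp range:
CREATE INDEX idx_symbol_timestamp
    ON public.bitmex_quote
USING btree
    (symbol, "timestamp" ASC NULLS LAST)
    TABLESPACE pg_default;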

Materialized views (https://www.postgresql.org/docs/11/rules-materializedviews.html) are probably a nice way to build very fast pre-aggregated views of the most common timeframes: 1min, 3min, 5min, 15min, 30min, 1h, 2h and so on.
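
For example, a minimal sketch of a 1min view (the view name and column choices are just assumptions; coarser timeframes would need a different bucketing expression than date_trunc('minute', ...)):

-- Pre-aggregated 1-minute min/max of bid/ask per symbol:
CREATE MATERIALIZED VIEW bitmex_quote_1min AS
SELECT symbol,
       date_trunc('minute', "timestamp") AS bucket,
       MAX(bid_price) AS max_bid_price, MIN(bid_price) AS min_bid_price,
       MAX(ask_price) AS max_ask_price, MIN(ask_price) AS min_ask_price
FROM bitmex_quote
GROUP BY symbol, date_trunc('minute', "timestamp");

-- Has to be refreshed after loading new quote data:
REFRESH MATERIALIZED VIEW bitmex_quote_1min;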
