Sunday, September 13, 2015

Query optimization in Orientdb

If you have used Orientdb with quite large dataset you may have experienced a huge performance decrease in query execution when the number of records increases. You may also seen tips from orientdb saying the number of record reads is above 50000 or some number and use some index to reduce the number of record reads.

The reason for this type of behavior is it has some limitations when choosing for effective indexes. And when the number of record reads it has to perform the performance reduces drastically. Let me explain the scenario with an example.

Consider a class A, with more than 100000 records and if it has N attributes as attr1, attr2, .... attrN. Also assume that attr1 has cardinality of 1, attr2 has cardinality of 25000, attr3 has cardinality of 50000 likewise. We also have indexes for each and every attributes seperately. (For the time being lets not consider about composite indexes)

So if we execute a query like this;

select from A where attr1 = 'aaa' and attr2 = 'bbb' and attr3 = 'ccc' ...

If you use explain command from the orientdb studio you may see orientdb is applying only the index on attr1. This is because attr1 is the first in the where condition of the select query. Since it is an attribute with very low cardinality the number of document reads will be very high. Therefore the index is almost useless in this kind of situations.

So if you want to optimize the performance of this query try to get the attribute with very high cardinality first where the number of document reads will be reduced.

Furthermore if you want to enforce the use of composite indexes use parenthesis like this.  

select from A where (attr1 = 'aaa' and attr2 = 'bbb') and attr3 = 'ccc' ...

Hope this will be helpful to some one who is struggling with orientdb query optimization.

Note: Above explanation is done for the orientdb version 2.1.1.