Query in BangDB
Filter for scan and data retrieval in BangDB
To scan data in BangDB, we can use a primary key based scan, a secondary key based scan, a text key (reverse index) based scan, or all of these together. This makes data scan robust and flexible. To help users define these queries, BangDB provides the DataQuery type. Using this type is optional; we can also write the query directly in JSON form.
Scan always returns a result set, which is an iterable list of key and value pairs that also supports certain operations. This list is defined by the type ResultSet.
Scan may also return NULL if an error is encountered, so the user must handle NULL. Because a table or stream may contain a large amount of data, scan may not return all of it at once; it returns the data in parts as the user keeps calling it.
The user may also set limits and certain other conditions for filtering. These affect how data is retrieved and how much data is retrieved, in terms of either the number of rows or the size of the data. These settings are defined by ScanFilter.
// ScanFilter is defined as
public ScanOperator skeyOp;
public ScanOperator ekeyOp;
public ScanLimitBy limitBy;
public int limit;
public int skipCount;
public int onlyKey;
public int reserved;
// ScanOperator is defined as below
GT, // greater than
GTE, // greater than equal to
LT, // less than
LTE, // less than equal to
EQ, // equal to
NE; // not equal to
The ScanOperator is always applied to primary keys only and not to secondary keys. For secondary keys, we use DataQuery, which is defined below.
ScanLimitBy is used to limit the amount of data retrieved in a single call:
LIMIT_RESULT_SIZE - limit by size; it takes an integer, in MB
LIMIT_RESULT_ROW - limit by the number of rows
onlyKey is 0 if we wish to retrieve both key and value, else 1 for only the key.
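To make the fields above concrete, here is a minimal sketch in Java. The types below are stand-ins written for this illustration only; they mirror the field names listed above but are not the BangDB classes. The `inRange` helper and the default values are assumptions: the sketch reads `skeyOp` as the bound on the start key (GT or GTE) and `ekeyOp` as the bound on the end key (LT or LTE).

```java
// Stand-in types mirroring the fields described above; NOT the BangDB classes.
enum ScanOperator { GT, GTE, LT, LTE, EQ, NE }

enum ScanLimitBy { LIMIT_RESULT_SIZE, LIMIT_RESULT_ROW }

class ScanFilterSketch {
    // Default values here are chosen for this sketch only.
    ScanOperator skeyOp = ScanOperator.GTE;
    ScanOperator ekeyOp = ScanOperator.LTE;
    ScanLimitBy limitBy = ScanLimitBy.LIMIT_RESULT_ROW;
    int limit = 100;     // rows, or MB when limitBy is LIMIT_RESULT_SIZE
    int skipCount = 0;
    int onlyKey = 0;     // 0 = key and value, 1 = key only

    // Assumed reading: skeyOp bounds the start key, ekeyOp bounds the end key.
    boolean inRange(long pk, long skey, long ekey) {
        boolean lower = (skeyOp == ScanOperator.GT) ? pk > skey : pk >= skey;
        boolean upper = (ekeyOp == ScanOperator.LT) ? pk < ekey : pk <= ekey;
        return lower && upper;
    }
}

class ScanFilterDemo {
    public static void main(String[] args) {
        ScanFilterSketch sf = new ScanFilterSketch();
        sf.skeyOp = ScanOperator.GT;                  // exclude the start key
        sf.ekeyOp = ScanOperator.LTE;                 // include the end key
        sf.limitBy = ScanLimitBy.LIMIT_RESULT_ROW;
        sf.limit = 50;                                // at most 50 rows per call
        sf.onlyKey = 1;                               // keys only

        System.out.println(sf.inRange(10, 10, 20));   // start key excluded by GT
        System.out.println(sf.inRange(20, 10, 20));   // end key included by LTE
    }
}
```

With a real ScanFilter, the same fields would be set on the object before it is passed to scan.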
Once we call scan, it may return partial data, so we need to keep calling it until all the data has been retrieved. A typical way to call the scan function is as follows:
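ScanFilter sf = new ScanFilter();
ResultSet rs = null;
while (true) {
    // pass the previous result set back in to continue the scan
    rs = tbl.scan(rs, pk1, pk2, sf);
    if (rs == null) break; // no more data, or an error
    while (rs.hasNext()) {
        // use the current row, e.g. rs.getNextkey() and rs.getNextVal()
        rs.moveNext();
    }
}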
This allows the user to retrieve all the data. If the user wishes to stop before data retrieval is complete, the user must clear the result set by calling:
rs.clear();
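The calling pattern can be tried out without a database. The sketch below is a hypothetical in-memory stand-in (`PagedScanSketch`, `Batch`, and `collectKeys` are names invented for this illustration, not BangDB APIs). It only imitates the behaviour described above: each call returns a limited batch, the previous batch is passed back in to continue, and null signals that nothing is left.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in; it only imitates "scan returns data in batches".
class PagedScanSketch {
    // One batch of rows returned by a single scan call.
    static final class Batch {
        final List<Map.Entry<String, String>> rows;
        Batch(List<Map.Entry<String, String>> rows) { this.rows = rows; }
        String lastKey() { return rows.get(rows.size() - 1).getKey(); }
    }

    private final TreeMap<String, String> data = new TreeMap<>();
    private final int rowsPerCall;

    PagedScanSketch(int rowsPerCall) { this.rowsPerCall = rowsPerCall; }

    void put(String key, String val) { data.put(key, val); }

    // Returns the next batch within [skey, ekey], resuming after prev; null when nothing is left.
    Batch scan(Batch prev, String skey, String ekey) {
        Map<String, String> range = (prev == null)
                ? data.subMap(skey, true, ekey, true)
                : data.subMap(prev.lastKey(), false, ekey, true);
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : range.entrySet()) {
            if (rows.size() == rowsPerCall) break;
            rows.add(e);
        }
        return rows.isEmpty() ? null : new Batch(rows);
    }

    // The calling pattern: keep passing the previous batch back until scan returns null.
    static List<String> collectKeys(PagedScanSketch tbl, String skey, String ekey) {
        List<String> keys = new ArrayList<>();
        Batch rs = null;
        while ((rs = tbl.scan(rs, skey, ekey)) != null) {
            for (Map.Entry<String, String> row : rs.rows) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        PagedScanSketch tbl = new PagedScanSketch(2); // two rows per call
        for (String k : new String[] {"k1", "k2", "k3", "k4", "k5"}) {
            tbl.put(k, "v-" + k);
        }
        System.out.println(collectKeys(tbl, "k1", "k5")); // prints [k1, k2, k3, k4, k5]
    }
}
```

Here five rows are retrieved over three calls, and a fourth call returns null and ends the loop.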
Scan API
Let's look at the typical scan API in BangDB. It has the following signatures:
For non-JSON data, i.e. text or opaque data. Exposed by BangDBTable.
For NORMAL_TABLE:
public ResultSet scan( ResultSet prev_rs,
String pk_skey,
String pk_ekey,
ScanFilter sf,
Transaction txn
)
The same is supported for long and byte[] pk_skey and pk_ekey as well.
For document or JSON data scan:
For WIDE_TABLE:
public ResultSet scanDoc( ResultSet prev_rs,
String pk_skey,
String pk_ekey,
String idx_filter_json,
ScanFilter sf
)
The same is supported for long and byte[] pk_skey and pk_ekey as well. This one has one extra argument, idx_filter_json, which is used for querying with keys other than primary keys.
For NORMAL_TABLE, i.e. for scan(), it is straightforward, as we can only use primary keys there. When we wish to scan the entire table, we may pass null for pk_skey and pk_ekey for non-long types. For long, we may use 0 and LONG_MAX_VAL.
For WIDE_TABLE, or non-primary key based scans, see the detailed discussion below.
Non primary key based scan
Apart from primary keys, we can use secondary and text (reverse) keys to query data. Creating indexes on these secondary keys boosts performance, but an index is not required for querying data with these non-primary keys. However, it is highly recommended to create secondary and reverse indexes strategically for high performance and efficient queries.
Now, let's see what these secondary keys are. Consider the following sample event or document:
{
"name":"sachin",
"org":"bangdb",
"address":{
"home":{
"city":"bangalore",
"state":"ka",
"pin":560034
},
"office":{
"city":"bangalore",
"state":"ka",
"pin":560095
}
},
"fav-qoute":"The happiness of your life depends on the quality of your thoughts"
}
As you can see, there are multiple ways to query this document. A few examples:
query1 = using "name", e.g. where "name" = "sachin"
query2 = using "address.home.city" = "bangalore"
query3 = using match text, like "quality, thought"
// [ Note: we use the reverse index here and search with a list of tokens ]
// and so on...
Further, we may wish to organize a key in a composite manner to suit particular use cases. In this doc, the primary key can be long, string, or composite, and we can query using the primary key in interesting ways. While long, opaque, and string primary keys are straightforward, the composite key is useful in many scenarios. Let's say we wish to have a composite primary key with the following arrangement:
city:name
//or
city:org:name
//etc.
This gives us considerable flexibility in how we query:
query4 = find all docs where city could be any city but name is "sachin";
// here we may use
*:sachin
// or name has "sac" as initial characters
*:sac$% [ $% means match everything before these chars but after ':' ]
// or any city and any name as long as org is "bangdb"
*:bangdb:* [ using city:org:name as key arrangement ]
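The composite key patterns above can be illustrated with a small matcher. This is only a sketch of the idea, not how BangDB evaluates such keys. It assumes that `*` matches any one segment, that a segment ending in `$%` matches by prefix (following the "initial characters" description above), and that anything else must match exactly. The class and method names are hypothetical.

```java
// Illustrative matcher for composite keys such as "city:name" or "city:org:name".
// Assumed semantics for this sketch only; BangDB's own evaluation may differ.
class CompositeKeyPatternSketch {

    static boolean matches(String pattern, String key) {
        String[] p = pattern.split(":", -1);
        String[] k = key.split(":", -1);
        if (p.length != k.length) {
            return false; // pattern and key must have the same arrangement
        }
        for (int i = 0; i < p.length; i++) {
            if (p[i].equals("*")) {
                continue; // wildcard: any value in this segment
            }
            if (p[i].endsWith("$%")) {
                // prefix match: "sac$%" matches a segment starting with "sac"
                String prefix = p[i].substring(0, p[i].length() - 2);
                if (!k[i].startsWith(prefix)) {
                    return false;
                }
            } else if (!p[i].equals(k[i])) {
                return false; // exact match required
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("*:sachin", "bangalore:sachin"));          // true
        System.out.println(matches("*:sac$%", "bangalore:sachin"));           // true
        System.out.println(matches("*:bangdb:*", "bangalore:bangdb:sachin")); // true
        System.out.println(matches("*:bangdb:*", "bangalore:acme:sachin"));   // false
    }
}
```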
query5 = find all the docs where home.city is equal to office.city
home.city = $office.city
This allows users to scan using data present in the doc itself (helpful in streams). See the next section for example code.
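To show what comparing a field against another field of the same document means, here is a minimal sketch over nested maps. It is not BangDB code: `resolve` and `fieldEquals` are hypothetical helpers, and the sketch assumes that a right-hand value starting with `$` is a dotted path into the same document. Full paths such as `address.home.city` are used here, since that is where the fields live in the sample document.

```java
import java.util.Map;

// Illustration only: evaluates "field = value" or "field = $otherField" on a nested document.
class FieldRefSketch {

    // Walks a dotted path such as "address.home.city"; returns null if any part is missing.
    static Object resolve(Map<String, Object> doc, String path) {
        Object cur = doc;
        for (String part : path.split("\\.")) {
            if (!(cur instanceof Map)) {
                return null;
            }
            cur = ((Map<?, ?>) cur).get(part);
        }
        return cur;
    }

    // Assumed reading of "$office.city": a leading '$' refers to a field of the same document.
    static boolean fieldEquals(Map<String, Object> doc, String leftPath, String right) {
        Object left = resolve(doc, leftPath);
        Object rhs = right.startsWith("$") ? resolve(doc, right.substring(1)) : right;
        return left != null && left.equals(rhs);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of(
                "name", "sachin",
                "address", Map.of(
                        "home", Map.of("city", "bangalore", "state", "ka"),
                        "office", Map.of("city", "bangalore", "state", "ka")));

        // literal comparison, as in query2
        System.out.println(fieldEquals(doc, "address.home.city", "bangalore"));            // true
        // field-to-field comparison, as in query5
        System.out.println(fieldEquals(doc, "address.home.city", "$address.office.city")); // true
    }
}
```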
DataQuery Type API for client
Create DataQuery object
DataQuery();
To add a query when filter value is string
void addQuery(
const char *filterKey,
ScanOperator comp_op,
const char *filterVal,
JoinOperator jOp = JO_AND
);
To add a query when filter value is long
void addQuery(const char *filterKey, ScanOperator comp_op, long filterVal, JoinOperator jOp = JO_AND);
To add a query when filter value is double
void addQuery(const char *filterKey, ScanOperator comp_op, double filterVal, JoinOperator jOp = JO_AND);
To add a query when the filter value is a list of words
void addQuery(const char *matchWordList, JoinOperator wordJoin, JoinOperator queryJoin, const char *field);
To set the query type
void setQueryType(int type);
To get the query
const char *getQuery();
The user should delete the returned data using delete[]
To print the query
void printQuery();
To delete DataQuery object
virtual ~DataQuery();
Please see Table API for details on Scan API