Table (Embedded)
There are very few APIs here to deal with, and for simplicity they are named so that their purpose is easy to understand.
Here is the convention:
NORMAL_TABLE and PRIMITIVE_TABLE
- Key/val operations
- val is opaque data, text, a fixed native type, etc.
- index is not supported
- put(), get(), del(), scan() are the ops
WIDE_TABLE
- Document data
- index is supported
- put_doc(), scan_doc(), get(), del() are the ops
- put_text() & scan_text() are used when we wish to store text with reverse indexing for the entire text. The text is a normal sentence and not necessarily json (mostly not json; for json use put_doc() and scan_doc())
LARGE_TABLE
- Large data, files etc.
- put_file(), put_large_data(), get_file(), get_large_data(), and a few more APIs
- index can't be created
- primary key has to be COMPOSITE type
Please see more on this at bangdb common.
Following are the API details:
int closeTable(CloseType tblCloseType = DEFAULT, char *newname = NULL, bool force_close = false);
This closes the table and returns 0 for success and -1 for error. CloseType is as defined above. The table maintains an open reference count, so it will not close while there are open references to it; it closes when the open reference count reaches 0. If we wish to override this behaviour, we must pass force_close = true.
int addIndex(const char *idxName, TableEnv *tenv);
This is the generic API for adding an index to a table. The TableEnv describes the index type etc. It returns 0 for success and -1 for error.
// always creates the index with the following properties:
BTREE,
INMEM_PERSIST,
QUASI_LEXICOGRAPH,
SORT_ASCENDING,
log = off
int addIndex_str(const char *idxName, int idx_size, bool allowDuplicates);
This is a special helper function for index creation. When we wish to add an index for a string type, we use this method. idx_size is the size in bytes allowed for the index keys, and allowDuplicates sets whether duplicate values are allowed. It returns -1 for error and 0 for success.
int addIndex_num(const char *idxName, bool allowDuplicates);
This is a special helper function for index creation. When we wish to add an index for a number type, we use this method. allowDuplicates sets whether duplicate values are allowed. It returns -1 for error and 0 for success.
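For illustration, here is a minimal sketch of creating both kinds of index with the helper functions. It assumes tbl is an already-opened BangDBTable* for a wide table; the index names are hypothetical.
// index on a string field; keys up to 32 bytes, duplicates allowed
if (tbl->addIndex_str("name", 32, true) < 0)
    printf("failed to create index on name\n");
// index on a numeric field, duplicates allowed
if (tbl->addIndex_num("age", true) < 0)
    printf("failed to create index on age\n");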
int dropIndex(const char *idxName);
This drops the index and clears the related data from the file system and the database. It returns -1 for error and 0 for success.
bool hasIndex(const char *idxName);
This returns true if an index with the given name exists for the table, else false.
TableEnv *getTableEnv();
This returns a copy of the TableEnv reference for the table, and the user must delete it after use. For error it returns NULL.
int dumpData();
This dumps the data for the table, forcing all of the table's data to be written to the filesystem. It returns -1 for error and 0 for success.
const char *getName();
This returns the name of the table; the user should delete the returned string. It returns NULL for error.
const char *getTableDir();
This returns the full table path on the file system, else NULL for error. The user should delete the returned string.
IndexType getIndexType();
This returns the index type for the table. The index type is for the primary key of the table. Here are the options for the index type:
HASH, // Not Supported
EXTHASH, // Supported, for hash keys
BTREE, // Supported, for sorted order
HEAP, // Deprecated
INVALID_INDEX_TYPE, // Invalid type
const char *getStats(bool verbose = true);
This returns a json string with the table stats; verbose dictates the level of detail in the response. For errors, it returns NULL.
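As a quick sketch, the metadata getters can be combined as below. It assumes tbl is an open BangDBTable*; the docs only say the returned strings should be deleted by the user, so the exact deallocation shown here is an assumption.
const char *name = tbl->getName();
const char *dir = tbl->getTableDir();
const char *stats = tbl->getStats(true); // verbose json stats
if (name && dir && stats)
    printf("table %s at %s -> %s\n", name, dir, stats);
delete[] name;  // caller owns the returned strings
delete[] dir;
delete[] stats;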
// for files - supported only for LARGE_TABLE
long putFile(FDT *key, const char *file_path, InsertOptions iop);
This is only supported for the Large Table type (see BangDBTable_type). We can upload a small or very large file using this API. The key is typically a file id (string only) and file_path is the actual location of the file on the server. As of now it takes a local file path, but a future version may take a network path or URL as well. InsertOptions defines how we wish to put the file or data; the options are:
INSERT_UNIQUE, // if non-existing then insert else return
UPDATE_EXISTING, // if existing then update else return
INSERT_UPDATE, // insert if non-existing else update
DELETE_EXISTING, // delete if existing
UPDATE_EXISTING_INPLACE, // only for inplace update
INSERT_UPDATE_INPLACE, // only for inplace update
The last two options should be used with caution; we will discuss them in more detail later. This returns 0 for success and -1 for error.
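A minimal sketch of a file upload follows, assuming tbl is an open LARGE_TABLE and that FDT can be constructed from a buffer and its length (the file id and path are illustrative):
const char *fid = "report_2023.pdf";
FDT fkey((char*)fid, (int)strlen(fid)); // key: the file id
if (tbl->putFile(&fkey, "/tmp/report_2023.pdf", INSERT_UNIQUE) < 0)
    printf("putFile failed\n");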
long getFile(FDT *key, const char *fname, const char *fpath);
This is only supported for the Large Table type (see BangDBTable_type). We can get a file from the server identified by the key, name it fname, and store it at fpath on the local system. This returns 0 for success and -1 for error.
long putLargeData(FDT *key, FDT *val, InsertOptions iop);
This is only supported for the Large Table type (see BangDBTable_type). We can use this API to put large binary data (not a file) identified by a key (string only). iop describes the insert options as explained above. It returns 0 for success and -1 for error.
long getLargeData(FDT *key, char **buf, long *len);
This is only supported for the Large Table type (see BangDBTable_type). We can use this API to get large data from the table identified by the key. The data is stored in buf and the length of the data in the len variable. For success it returns 0, else -1 for error.
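Here is a sketch of a put/get round trip for large binary data, again assuming an open LARGE_TABLE and the FDT buffer/length constructor:
const char *blob = "...large binary payload...";
FDT key((char*)"blob1", 5);
FDT val((char*)blob, (int)strlen(blob));
if (tbl->putLargeData(&key, &val, INSERT_UNIQUE) < 0)
    printf("putLargeData failed\n");
char *buf = NULL;
long len = 0;
if (tbl->getLargeData(&key, &buf, &len) == 0) {
    printf("read %ld bytes\n", len);
    delete[] buf; // caller owns the buffer; exact deallocation is an assumption
}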
char *listLargeData_keys(char *skey = NULL, int list_size_mb = MAX_ResultSet_SIZE);
This returns a list of large-data keys, starting from skey (NULL means from the beginning). list_size_mb restricts the size of the list; the default is MAX_ResultSet_SIZE. The return value is a json string which contains the last key; that key should be passed as skey in subsequent calls for recursive listing.
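A sketch of paging through the keys (the json layout of the returned list is not shown here, so extracting the continuation key is left as a comment):
char *list = tbl->listLargeData_keys(); // first page, default size limit
if (list) {
    printf("%s\n", list);
    // parse the json, pick out the last key it reports, and pass it as
    // skey in the next call to continue: tbl->listLargeData_keys(last_key);
}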
int countSliceLargeData(FDT *key);
This is only supported for the Large Table type (see BangDBTable_type). BangDB stores large data in chunks or slices, and this API helps us count the slices for the given data (file or binary data) identified by the key. It returns the number of slices for success, else -1 for error.
long countLargeData();
This is only supported for the Large Table type (see BangDBTable_type). It returns the count of large data items, else -1 for error.
int delLargeData(FDT *key);
This is only supported for the Large Table type (see BangDBTable_type). It deletes the large data identified by the key and returns 0 for success and -1 for error.
// for opaque data
long put(FDT *key, FDT *val, InsertOptions flag = INSERT_UNIQUE, Transaction *txn = NULL);
This is used for the Normal Table type (see BangDBTable_type). It puts key and val into the table. If this put operation is within a transaction boundary, pass the transaction reference as well. It returns 0 for success and -1 for error.
long put(FDT *key, DATA_VAR *val, InsertOptions flag = INSERT_UNIQUE, Transaction *txn = NULL);
This is used for the Normal Table type (see BangDBTable_type). It puts key and val into the table; if the operation is within a transaction boundary, pass the transaction reference as well. It's very similar to the previous put, except that it takes a DATA_VAR for val, where the user can define a few other things and also use a pre-allocated buffer for val. This is useful when we want to avoid too many allocations and deallocations on the heap. It returns 0 for success and -1 for error.
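A minimal put/get sketch for a normal table follows; it assumes an open BangDBTable* tbl, the FDT buffer/length constructor, and FDT members named data/length (assumptions, check the header):
FDT key((char*)"user:1", 6);
FDT val((char*)"alice", 5);
if (tbl->put(&key, &val, INSERT_UNIQUE) < 0)
    printf("put failed\n");
FDT *out = NULL;
if (tbl->get(&key, &out) == 0 && out) {
    printf("got %.*s\n", (int)out->length, (char*)out->data); // member names assumed
    delete out; // caller owns the returned value
}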
ResultSet * scan(
ResultSet * prev_rs,
FDT * pk_skey, FDT * pk_ekey,
scan_filter * sf = NULL,
DATA_VAR * dv = NULL,
Transaction * txn = NULL
);
This is used for the Normal Table type (see BangDBTable_type). It scans the data between the two primary keys pk_skey and pk_ekey; either or both of them may be NULL. The scan_filter describes how to scan. Please note that the prev_rs argument should be NULL for the first call, and on each subsequent call it should be the ResultSet returned by the previous call; this ensures the recursive scan works without any issues (see the sketch after the scan_filter notes below).
Here is the definition of scan_filter:
scan_operator skey_op; // default GTE
scan_operator ekey_op; // default LTE
scan_limit_by limitby; // default LIMIT_RESULT_SIZE
short only_key = 0; // if we wish to retrieve only key and no value
short reserved = 0; // see notes below;
int limit; // default 2MB (MAX_ResultSet_SIZE) for LIMIT_RESULT_SETSIZE
int skip_count; // this is set by the db during scan, don't touch
void *arg; // any extra arg, interpreted by the callee
Reserved
The reserved field has different meanings for different values:
- 0 - default value, don't do anything [ no interpretation ]
- 1 to 9 - select the key for the secondary index at this position in the array [ in the order in which the indexes were defined ]; note this value starts from 1, while in code it starts from 0 (i-1)
- 10 - select only the first one from the secondary index [ among duplicates ] for EQ
- 11 - select only the last one from the secondary index [ among duplicates ] for EQ
- 12 - interpret arg as a secidx_pkinfo object pointer
- 13 - use a linear scan, not the secondary indexes
- 14 - use partial matches as well; useful for scan_text / reverse index scan, when we would like to select the partial ones too
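As referenced above, here is a sketch of a recursive full-range scan. NULL for both primary keys scans everything; the returned ResultSet is passed back in so the next call resumes where the previous page ended. The iterator names (hasNext, moveNext, moreDataToCome) follow common BangDB samples and should be verified against the header.
scan_filter sf; // defaults: GTE/LTE, limit by result-set size
ResultSet *rs = NULL;
while (true) {
    rs = tbl->scan(rs, NULL, NULL, &sf);
    if (!rs)
        break;
    while (rs->hasNext()) {
        // process rs->getNextKey() / rs->getNextVal() here
        rs->moveNext();
    }
    if (!rs->moreDataToCome())
        break; // all pages consumed; release rs as per the ResultSet API
}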
// for text data, supported for only WIDE_TABLE
// reverse indexes the data (str)
// FDT *key, if null then timestamp
long putText(const char *str, int len, FDT *k = NULL, InsertOptions flag = INSERT_UNIQUE);
This API is for the wide table only. It's used to put text which will be fully reverse indexed. The user may provide a key, else a timestamp will be used. It returns 0 for success and -1 for error.
ResultSet *scanText(const char *wlist[], int nfilters, bool intersect = false);
This is for searching using a list of keys/tokens/words. wlist is the list of tokens to search with, nfilters is the number of tokens in the list, and the intersect boolean selects whether the tokens are combined with OR (false) or AND (true). It returns a ResultSet for success or NULL for error.
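A short sketch combining the two text APIs on an open wide table (the sentence and tokens are illustrative):
const char *line = "sachin scored a century in chennai";
if (tbl->putText(line, (int)strlen(line)) < 0) // key defaults to timestamp
    printf("putText failed\n");
const char *words[] = { "sachin", "chennai" };
ResultSet *rs = tbl->scanText(words, 2, true); // true => AND of the tokens
if (rs) {
    // iterate the ResultSet as with any other scan
}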
long putDoc(const char *doc, FDT *pk = NULL, const char *rev_idx_fields_json = NULL, InsertOptions flag = INSERT_UNIQUE);
This API is for the wide table only; it puts the json document pointed to by doc. pk is the primary key, if any, and rev_idx_fields_json describes the set of fields that should be reverse indexed, for example:
rev_idx_fields_json = {"_rev_idx_all":0, "_rev_idx_key_list":["name", "city"]}
Secondary indexes are defined using the addIndex API as described previously; put_doc updates all indexes accordingly. If pk is NULL then BangDB uses a timestamp as the key, and if rev_idx_fields_json is NULL then it does no reverse indexing. The default InsertOptions is INSERT_UNIQUE. Upon success it returns 0, else -1.
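A sketch of a document insert with selective reverse indexing, assuming an open wide table (document and field names are illustrative):
const char *doc = "{\"name\":\"sachin\", \"city\":\"paris\", \"age\":42}";
const char *rev_idx = "{\"_rev_idx_all\":0, \"_rev_idx_key_list\":[\"name\", \"city\"]}";
if (tbl->putDoc(doc, NULL /* pk = timestamp */, rev_idx, INSERT_UNIQUE) < 0)
    printf("putDoc failed\n");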
ResultSet * scanDoc(
ResultSet * prev_rs, FDT * pk_skey = NULL,
FDT * pk_ekey = NULL,
const char * idx_filter_json = NULL,
scan_filter * sf = NULL
);
This is used for the wide table only, for scanning the table with a query. The query can combine the primary index, secondary indexes and reverse indexes (through idx_filter_json). idx_filter_json can be written directly as a json query, or the dataQuery type can be used to build the query in a simple manner. Here is how a query looks:
{"query":[{"key":"city.name","cmp_op":4,"val":"paris"},
{"joinop":0},{"match_words":"sachin, rahul","joinop":1,"field":"name.first"}]}
This query combines the secondary index "city.name" and the reverse index "name.first". joinop = 0 means AND; therefore, fetch all the documents where the name of the city is paris and the first name contains sachin or rahul.
Or
{"query":[{"key":"name","cmp_op":4,"val":"sachin"},
{"joinop":0},{"key":"age","cmp_op":0,"val":40}]}
Here the query is: find all documents where name is "sachin" and age is greater than 40. Both name and age are secondary indexes; we don't use a reverse index here. Further, we can query for the following:
{"query":[{"key":"price","cmp_op":3,"val":"$quote"}], "qtype":2}
Here the query says: find all documents where price is less than the quote in the same document, etc. Please see the query section for a detailed discussion on this. Upon success it returns a ResultSet, else NULL for error.
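A sketch of running the first query above through scanDoc, with NULL primary keys so only the filter constrains the result; paging works exactly as with scan():
const char *q =
    "{\"query\":[{\"key\":\"city.name\",\"cmp_op\":4,\"val\":\"paris\"},"
    "{\"joinop\":0},{\"match_words\":\"sachin, rahul\",\"joinop\":1,\"field\":\"name.first\"}]}";
ResultSet *rs = NULL;
rs = tbl->scanDoc(rs, NULL, NULL, q);
if (rs) {
    // iterate; if more data remains, pass rs back into scanDoc()
}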
int get(FDT *key, FDT **val, Transaction *txn = NULL);
This can be used for any table except the large table. Given a key, it returns the value in the val attribute. This returns 0 for success and -1 for error.
int get(FDT *key, DATA_VAR *val, Transaction *txn = NULL);
This can be used for any table except the large table. Given a key, it returns the value in the val attribute. Please note that val is a DATA_VAR here, which can be used to avoid creating (and later deleting) too many objects on the heap. This returns 0 for success and -1 for error.
long del(FDT *key, Transaction *txn = NULL);
This can be used for all table types. It deletes the data identified by the key. It returns 0 for success, else -1 for error.
long count(FDT *pk_skey, FDT *pk_ekey, const char *idx_filter_json = NULL, scan_filter *sf = NULL);
We can count the number of documents or rows using this method with the supplied query filter. It can take the primary index, secondary indexes and reverse index all together or as needed. It returns the count if successful, else -1 for error.
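For example, a sketch that counts everything matching a secondary-index filter (NULL primary keys consider the whole table):
const char *filter = "{\"query\":[{\"key\":\"age\",\"cmp_op\":0,\"val\":40}]}";
long n = tbl->count(NULL, NULL, filter);
if (n < 0)
    printf("count failed\n");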
long expCount(FDT *skey, FDT *ekey);
This API returns the expected count between two keys. Please note this is not an exact count but a rough estimate. If there are a large number of keys in the table and we wish to know a rough count, this function can be very efficient and fast, with very little overhead. It returns the count if successful, else -1 for error.
long count();
This is a convenience overload of the previous count() API; it counts all the documents or rows in the table. It works for normal and wide tables.
void printStats();
This API prints the stats of the table.
void setAutoCommit(bool flag);
This is used if we wish to enable auto commit for a single operation.
bool isSameAs(BangDBTable *tbl);
This returns true if this table is the same as the given table, else false.
BangDBTable_type getTableType();
This returns the type of the table (see BangDBTable_type).