BangDB ML Helper offers several APIs that simplify ML-related activities. The type covers model training, prediction, model versioning, and deployment, as well as managing large files and binary objects related to ML. Check out the real-world examples to learn more, or try them out on BangDB.

C++
Java

To create MLHelper object

BangDBMLHelper(train_pred_brs_info *tpbinfo, const char *conf_path = NULL, bool isssl = true)

To create a bucket to store all intermediate training and testing files

int createBucket(const char *bucket_info)

bucket_info is the name for the bucket to be created. It returns -1 for error.
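The Java section of this page describes bucket_info as a JSON document with access_key, secret_key and bucket_name. Assuming the C++ call accepts the same shape (this page does not confirm that for C++), a minimal sketch of assembling the string could look like the following. make_bucket_info is a hypothetical helper, not part of BangDB, and it does no JSON escaping.

```cpp
#include <string>

// Hypothetical helper (not a BangDB API): builds a bucket_info JSON string
// with the three fields listed for createBucket in the Java section.
// Assumption: the values contain no characters that need JSON escaping.
std::string make_bucket_info(const std::string &access_key,
                             const std::string &secret_key,
                             const std::string &bucket_name) {
    return "{\"access_key\":\"" + access_key +
           "\",\"secret_key\":\"" + secret_key +
           "\",\"bucket_name\":\"" + bucket_name + "\"}";
}
```

The c_str() of the result could then be passed wherever a bucket_info argument is expected.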

To create a bucket or to change the name of the bucket

void setBucket(const char *bucket_info)

To upload the files required to train or predict

long uploadFile(const char *key, const char *fpath, InsertOptions iop)

The key is the id of the file, and fpath takes the path to the file, including the file name.

It returns -1 for error.
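Many calls on this page signal failure with -1. One way to avoid silently ignoring that value is to funnel every return code through a small check. The sketch below assumes only the documented "-1 for error" convention; require_ok is a hypothetical helper, not a BangDB function.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: turns the documented "-1 for error" convention into an
// exception and passes every other value through unchanged.
long require_ok(long rc, const char *what) {
    if (rc == -1) {
        throw std::runtime_error(std::string(what) + " failed (returned -1)");
    }
    return rc;
}
```

A call such as uploadFile could be wrapped as require_ok(helper.uploadFile(...), "uploadFile"), where helper is whatever MLHelper object the application holds.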

To train a model, call the trainModel API. This API returns immediately and, if successful, schedules training of the model. The user should then call getModelStatus() periodically until it returns the end status.

int trainModel(const char *req)

It takes a training request and returns status of the training request. It returns -1 for error.
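The training request format is shown with the Java trainModel further down this page; its attr_list entry is an array of name/position pairs. Assuming the C++ call takes the same JSON, the array could be generated from a list of attribute names as sketched below. make_attr_list is a hypothetical helper, and numbering positions from 1 in list order is an assumption taken from the sample request.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds the "attr_list" array of a training request,
// assigning 1-based positions in the order the names are given.
// Assumption: attribute names need no JSON escaping.
std::string make_attr_list(const std::vector<std::string> &names) {
    std::string out = "[";
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i > 0) out += ",";
        out += "{\"name\":\"" + names[i] + "\",\"position\":" +
               std::to_string(i + 1) + "}";
    }
    out += "]";
    return out;
}
```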

To get status of the model when training request is fired

char *getModelStatus(const char *req)

The req input parameter looks like the following:

req = {"schema-name":, "model_name": }

And the return value looks like the following:

{"schema-name":, "model_name":, "train_start_ts":, "train_end_ts":, "train_state":}
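To act on the response, the caller has to read train_state out of the returned string. A real JSON library is the safer choice; the sketch below is only a naive scan for a flat object, and json_field is a hypothetical helper. The page does not list the possible train_state values, so any value used with it is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the raw value of a top-level field from a flat
// JSON object, or an empty string if the key is absent. This is a naive scan,
// not a JSON parser: it ignores nesting and escapes, and it can be fooled by
// a key name that also appears inside a string value.
std::string json_field(const std::string &json, const std::string &key) {
    const std::string needle = "\"" + key + "\"";
    std::size_t pos = json.find(needle);
    if (pos == std::string::npos) return "";
    pos = json.find(':', pos + needle.size());
    if (pos == std::string::npos) return "";
    ++pos;
    while (pos < json.size() && json[pos] == ' ') ++pos;  // skip spaces
    if (pos < json.size() && json[pos] == '"') {
        // Quoted value: return the text between the quotes.
        const std::size_t end = json.find('"', pos + 1);
        if (end == std::string::npos) return "";
        return json.substr(pos + 1, end - pos - 1);
    }
    // Unquoted value: read up to the next comma or closing brace.
    std::size_t end = json.find_first_of(",}", pos);
    if (end == std::string::npos) end = json.size();
    while (end > pos && json[end - 1] == ' ') --end;  // trim trailing spaces
    return json.substr(pos, end - pos);
}
```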

The above is true for ML related model status. For IE (Information Extraction) related model status use following:

It returns NULL for error, or a response with errcode set to -1; on success the response carries the errcode for success. The user should free the returned memory using delete[].
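Because trainModel only schedules the work, the caller has to poll getModelStatus until an end status arrives. The sketch below shows one possible shape for that loop. It does not call BangDB directly: fetch_status stands in for a wrapper around getModelStatus, and is_final is supplied by the caller because this page does not list the train_state values. All names here are hypothetical.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>

struct PollResult {
    std::string last_response;  // last status string that was fetched
    bool reached_final;         // true if is_final() accepted that response
};

// Hypothetical polling helper: fetches the status up to max_attempts times,
// sleeping for `delay` between attempts, and stops early on an end status.
PollResult poll_status(const std::function<std::string()> &fetch_status,
                       const std::function<bool(const std::string &)> &is_final,
                       int max_attempts, std::chrono::milliseconds delay) {
    PollResult result{std::string(), false};
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        result.last_response = fetch_status();
        if (is_final(result.last_response)) {
            result.reached_final = true;
            break;
        }
        // Do not sleep after the final attempt.
        if (attempt + 1 < max_attempts) std::this_thread::sleep_for(delay);
    }
    return result;
}
```

Bounding the number of attempts keeps a stuck training job from blocking the caller forever; delTrainRequest is documented on this page for cleaning up such requests.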

To delete the model

int delModel(const char *req)

This deletes the model identified by the req parameter: req = {"schema_name":,"model_name":}. It returns -1 for error.

To delete training request

int delTrainRequest(const char *req)

This deletes the training request. It is helpful when training gets stuck for some reason and the status was not updated properly. It returns -1 for error.

To predict for a particular data or event

char *predict(const char *req)

Here is what req looks like:

{schema-name, attr_type: NUM, data_type:event, re_format:N, model_name: model_name, data:"1 1:1.2 2:3.2 3:1.1"}
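The data value in the sample reads as a leading number followed by index:value pairs, which resembles the sparse text layout used by LIBSVM. Assuming that reading is right, and that indices are 1-based and consecutive as in the sample, a dense feature vector could be rendered as sketched below. make_event_data is a hypothetical helper.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: renders a dense feature vector as
// "<lead> 1:<v1> 2:<v2> ...", mirroring the sample data string.
// Values use the default stream formatting (up to 6 significant digits).
std::string make_event_data(int lead, const std::vector<double> &values) {
    std::ostringstream out;
    out << lead;
    for (std::size_t i = 0; i < values.size(); ++i) {
        out << ' ' << (i + 1) << ':' << values[i];
    }
    return out.str();
}
```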

It returns NULL for error, or a response with errcode set to -1; otherwise the response carries the errcode for success. The user should free the returned memory using delete[].
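Since the returned char * must be released with delete[], one option is to hand it to a std::unique_ptr<char[]> straight away so that it is freed on every path. The sketch below shows the idea. take_result is a hypothetical helper, and fake_api_result exists only to make the sketch runnable without BangDB.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Takes ownership of a buffer allocated with new[] and copies it into a
// std::string. The buffer is released with delete[] when `owner` goes out of
// scope, including when the copy throws. A NULL input (the documented error
// value) yields an empty string.
std::string take_result(char *raw) {
    std::unique_ptr<char[]> owner(raw);
    return owner ? std::string(owner.get()) : std::string();
}

// Stand-in for an API call such as predict(); NOT a BangDB function.
// It returns a new[]-allocated copy of `text`.
char *fake_api_result(const char *text) {
    char *buf = new char[std::strlen(text) + 1];
    std::strcpy(buf, text);
    return buf;
}
```

Note that this sketch cannot tell a NULL result from an empty response; a caller that needs the difference should test the pointer before passing it in.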

To get the training requests for all models of a particular schema

ResultSet *getTrainingRequests(const char *schema)

It returns NULL for error.

To get training request for a particular model

char *getRequest(const char *req)

req = {"schema_name": ,"model_name": }

It returns NULL for error, or a response with errcode set to -1; otherwise the response carries the errcode for success. The user should free the returned memory using delete[].

To set the status for a particular training request

int setModelStatus(const char *status)

status = {"schema_name": ,"model_name": ,"status": }

It returns -1 for error.

To get prediction status

char *getModelPredStatus(const char *req)

req = {"schema-name":, "model_name": }

It returns NULL for error, or a response with errcode set to -1; otherwise the response carries the errcode for success. The user should free the returned memory using delete[].

To delete prediction request

int delPredRequest(const char *req)

req = {"schema-name":, "model_name":, "file_name":}

It returns -1 for error.

To upload any ML related file

long uploadFile(const char *bucket_info, const char *key, const char *fpath, InsertOptions iop)

Key is the id for the file and fpath takes the path to the file including the file name.

To download a file from a given bucket

long downloadFile(const char *bucket_info, const char *key, const char *fname, const char *fpath)

It returns -1 for error.

To get a binary object from the given bucket

long getObject(const char *bucket_info, const char *key, const char **data, long *datlen)

It gets the object (binary or otherwise) stored under the given key in the given bucket. It fills data with the object and sets datlen to the length (size) of the object. It returns -1 for error.
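Binary objects may contain zero bytes, so the length reported through datlen, not a terminating NUL, decides how much to read. The sketch below copies the result into a std::vector<char>. fetch_object is a hypothetical wrapper that accepts any callable with the same parameter shape as getObject, so it can be tried without BangDB. This page does not say who owns the buffer behind data afterwards, so the sketch only copies the bytes and never frees them.

```cpp
#include <vector>

// Hypothetical wrapper around any callable shaped like getObject().
// Copies the returned bytes into `out` and reports failure as false.
// Assumption: ownership of *data is unspecified on this page, so the buffer
// is not freed here.
template <typename GetObjectFn>
bool fetch_object(GetObjectFn get_object, const char *bucket_info,
                  const char *key, std::vector<char> &out) {
    const char *data = nullptr;
    long datlen = 0;
    if (get_object(bucket_info, key, &data, &datlen) == -1) return false;
    if (data == nullptr || datlen < 0) return false;
    out.assign(data, data + datlen);  // length-based copy keeps embedded NULs
    return true;
}
```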

To delete a file from a bucket

int delFile(const char *bucket_info, const char *key)

It returns -1 for error.

To delete a bucket

int delBucket(const char *bucket_info)

It returns -1 for error.

To count the number of buckets

long countBuckets()

It returns -1 for error or count for success.

To get the number of slices for the given file

int countSlices(const char *bucket_info, const char *key)

Since BRS (bangdb resource server) stores large files and objects in chunks, we can count how many slices there are for the given file (key) by calling this function. It returns -1 for error or the count for success.

To count the objects in a given bucket

long countObjects(const char *bucket_info)

It returns -1 for error.

To get details of all the objects in a given bucket

char *countObjectsDetails(const char *bucket_info)

It returns NULL for error else the details. User should free memory using delete[].

To count the number of models for a schema

long countModels(const char *schema)

It returns -1 for error else count.

To get the list of objects in a given bucket

char *listObjects(const char *bucket_info, const char *key = NULL, int list_size_mb = 0)

This returns a json string with the list of objects in a given bucket for a given key or for all keys. It returns NULL for error else the object list. User should free the memory of returned data using delete[].

To get list of buckets present

char *listBuckets(const char *user_info)

This returns the list of all buckets for the user given by user_info, which looks like the following:

{"access_key":"akey", "secret_key":"skey"}

It returns NULL for error else the object list. User should free the memory of returned data using delete[].

To get data from stream to train model

long uploadStreamDataForTrain(const char *req)

It returns -1 for error.

To close the BangDB MLHelper

void closeBangDBMLHelper()

To delete MLHelper object

virtual ~BangDBMLHelper()

Java

To get instance of the MLHelper

public static synchronized BangDBMLHelper getInstance(String[] train_pred_brs_info)

train_pred_brs_info contains the PORT and IP for each of the following, in this order: brs, pred, train. The overall length of the array should be 6.

To get detail of the object as string

public String toString()

Returns the detail of the MLHelper object as string.

To create a bucket to store all intermediate training and testing files

public int createBucket(String bucket_info)

All intermediate files, models or training/ testing related files are stored within BRS (bangdb resource server) in some bucket.

This creates a bucket as defined by the bucket_info, which looks like the following:

{access_key:, secret_key:, bucket_name:}

It returns -1 for error.

To create a new bucket if it doesn't exist, otherwise update the bucket name to this name

public void setBucket(String bucket_info)

It returns -1 for error.

To upload training or prediction files

public long uploadFile(String key, String path, InsertOptions flag)

Key is the id for the file and path takes the path to the file, including the file name.

To upload file in the given bucket

public long uploadFile(String bucketInfo, String key, String path, InsertOptions flag)

It returns -1 for error else 0 or more than 0.

To train a model, call the trainModel API. This API returns immediately and, if successful, schedules training of the model. The user should then call getModelStatus() periodically until it returns the end status.

public int trainModel(String req)

It takes a training request and returns the status of the training request. The training request looks like the following:

{
   "schema-name":"id",
   "algo_type":"SVM",
   "algo_param":{
      "svm_type":1,
      "kernel":2,
      "degree":3,
      "gamma":0.2,
      "cost":1.1,
      "cache_size":50,
      "probability":0,
      "termination_criteria":0.001,
      "nu":0.5,
      "coef0":0.1
   },
   "attr_list":[
      {
         "name":"a1",
         "position":1
      },
      {
         "name":"a2",
         "position":2
      }
   ],
   "training_details":{
      "training_source":"infile",
      "training_source_type":"FILE",
      "file_size_mb":110,
      "train_speed":1
   },
   "scale":"Y/N",
   "tune_param":"Y/N",
   "attr_type":"NUM/STR",
   "re_format":"JSON",
   "custom_format":{
      "name":"ts_rollup",
      "fields":{
         "ts":"ts",
         "quantity":"qty",
         "entityid":"eid"
      },
      "aggr_type":2,
      "gran":1
   },
   "model_name":"my_model1",
   "udf":{
      "name":"udf_name",
      "udf_logic":1,
      "bucket_name":"udf_bucket"
   }
}

To get data from a stream to train a model

public long uploadStreamDataForTrain(String req)

It returns -1 for error.

To get the status of the model when training request is fired

public String getModelStatus(String req)

The req input parameter looks like the following:

req = {"schema-name":, "model_name": }

And the return value looks like the following:

{"schema-name":, "model_name":, "train_start_ts":, "train_end_ts":, "train_state":}

The train_state field tells the status of the model. The values for train_state are as follows:

The above is true for ML related model status. For IE (Information Extraction) related model status use following:

To set the status of a model

public int setModelStatus(String req)

This sets the status for a particular train request. The req is as follows:

req = {"schema-name":, "model_name":, "status": }

Upon success it returns 0 else -1 for error.

To delete a model

public int delModel(String req)

This is used to delete the model by passing req parameter

req = {"schema_name": ,"model_name": }

To delete the training request

public int delTrainRequest(String req)

This deletes the training request. It is helpful when training gets stuck for some reason and the status was not updated properly. Here is what req looks like:

req = {"schema-name":, "model_name": }

To predict for a particular data or event

public String predict(String req)

The req json looks like:

{schema-name, attr_type: NUM, data_type:event, re_arrange:N, re_format:N, model_name: model_name, data:"1 1:1.2 2:3.2 3:1.1"}

To predict for files only

public int predict_async(String req)

To get status of prediction request

public String getModelPredStatus(String req)

Given a request get the prediction status. The req is as follows:

req = {"schema-name":, "model_name": }

It returns NULL for error, or a response with errcode set to -1; otherwise the response carries the errcode for success.

To delete prediction request

public int delPredRequest(String req)

Deletes the request. The input param req is as follows:

req = {"schema-name":, "model_name":, "file_name":}

It returns 0 for success and -1 for error.

To get the list of all training requests

public ResultSet getTrainRequests(String req)

This returns all the training requests made so far for a schema. The prev_rs should be NULL for the first call and for subsequent calls, just pass the previous rs. Upon success it returns 0 else -1 for error.

To get training request from the ml housekeeping

public String getRequestDetail(String req)

It returns response with status or NULL for error or if req not found.

To get the buckets list for a user

public String listBuckets(String req)

This returns the list of all buckets for the user given by req, which looks like the following:

{"access_key":"akey", "secret_key":"skey"}

It may return NULL as well in case of error.

To get list of all buckets

public String listAllBuckets(String req)

To get the list of objects in a given bucket

public String listObjects(String req, String skey, int listSizeMB)

This returns a json string with the list of objects in a given bucket for a given key, or for all keys when skey is NULL. It may return NULL for error as well. listSizeMB defines the max size of the list; by default it returns 2MB of data or less.

To count number of models for a given schema

public long getModelCount(String req)

This counts the models for a given schema else returns -1 for error.

For admin settings

public int reinitMDM(String req)

To check if the BRS is local or it's a distributed system

public boolean isBRSLocal()

Returns true if BRS is local; useful for distributed mode or server.

To download a file from BRS

public long downloadFile(String bucketInfo, String key, String fname, String fpath)

The key is the name/id of the file to be downloaded, bucketInfo describes the bucket from which the file is downloaded, and fpath is the location on the local system where the file is saved under the name fname. It returns 0 for success else -1 for error.

To get object from a particular bucket

public byte[] getObject(String bucketInfo, String key)

It gets the object (binary or otherwise) stored under the given key in the given bucket. It returns 0 for success else -1 for error.

To get the number of buckets

public long countBuckets()

This returns the number of buckets else -1 for error.

To get the count of objects in a particular bucket

public long countObjects(String bucket_info)

This counts the number of objects in the given bucket else returns -1 for error.

To get details of all the objects in a particular bucket

public String countObjectsDetails(String bucket_info)

This gives the details of all the objects in the given bucket (bucket_info) else returns NULL for error.

To count the number of slices for a given key

public int countSlices(String bucket_info, String key)

Because BRS stores large files and objects in chunks, we can count how many slices there are for the given file (key) by calling this function. It returns the count of slices else -1 for error.

To delete a file from a particular bucket

public int delFile(String bucket_info, String key)

It returns 0 for success else -1 for error.

To delete or drop a bucket from BRS

public int delBucket(String bucket_info)

To close the BangDBMLHelper

public synchronized void closeMLHelper()

It returns 0 for success else -1 for error.