Conventions

A. Any file name must have model_name and account_id associated with it. For ex; model key would be following.

model_name__account_id [ joined by two underscores "__" ]

B. Training or prediction uses local dir /tmp/BRS_DATA for dealing with temporary files. We must create this folder on the server else training or prediction will fail

C. Training request returns immediately. This allows training to be non-blocking for client server case. However, training is blocking for embedded case since it should force user to wait.

D. Since the training is async and returns immediately therefore we must need mechanism to get the state of the training any given time. State of the training requests are maintained and could be queried by calling get_status api (as described above). Following are the states that ML infra would maintain within BangDB, these are self explanatory

  enum ML_BANGDB_TRAINING_STATE
    {
        ML_BANGDB_TRAINING_STATE_INVALID_INPUT = 10,
        ML_BANGDB_TRAINING_STATE_NOT_PRSENT,
        ML_BANGDB_TRAINING_STATE_ERROR_PARSE,
        ML_BANGDB_TRAINING_STATE_ERROR_FORMAT,
        ML_BANGDB_TRAINING_STATE_ERROR_BRS,
        ML_BANGDB_TRAINING_STATE_ERROR_TUNE,
        ML_BANGDB_TRAINING_STATE_ERROR_TRAIN,
        ML_BANGDB_TRAINING_STATE_LIMBO,
        ML_BANGDB_TRAINING_STATE_BRS_GET_PENDING,
        ML_BANGDB_TRAINING_STATE_BRS_GET_DONE,
        ML_BANGDB_TRAINING_STATE_REFORMAT_DONE,
        ML_BANGDB_TRAINING_STATE_SCALE_TUNING_DONE,
        ML_BANGDB_TRAINING_STATE_BRS_MODEL_UPLOAD_PENDING,
        ML_BANGDB_TRAINING_STATE_TRAINING_DONE,
        ML_BANGDB_TRAINING_STATE_DEPRICATED,
     };

E. User should use the helper class (bangdb_ml_helper) to deal with all sorts of things including uploading, downloading files etc. User should not use BRS for uploading or downloading files, BangDB model manager is aware of this and it takes care of all BRS interactions. Therefore it's very simple to just use 7-9 APIs given in the bangdb_ml_helper class and not worry about what to use etc.

F. Training request format for SVM

API - char *train_model(char *param_list);

  param_list = 
        {
          "account_id:id,
          "algo_type": "SVM",
          "algo_param": {"svm_type": 1, "kernel": 2, "degree": 3, "gamma": 0.2, "cost": 1.1, "cache_size": 50, 
          "probability": 0, "termination_criteria": 0.001, "nu": 0.5, "coef0": 0.1},
          "attr_list":[{"name":"a1", "position":1}, {"name":"a2", "position":2} ... ],
          "training_details":{"training_source": infile, "training_source_type": FILE, "file_size_mb": 110},
          "scale":Y/N,
          "tune_param": Y/N,
          "attr_type" : NUM/STR,
          "re_format":JSON,
          "model_name" :"my_model1"
        }

G. Prediction request for SVM

   API - char *predict(char *str, void *arg = NULL);

        str =

        ex1: {account_id, attr_type: NUM, data_type:event, re_arrange:N, re_format:N, model_name: model_name, 
              data:"1 1:1.2 2:3.2 3:1.1"}
        ex2; {account_id, attr_type: NUM, data_typee:FILE, re_arrange:N, re_format:N, model_name: model_name, 
              data:inputfile}
        ex3; {account_id, attr_type: NUM, data_type:event, re_arrange:N, re_format:JSON, model_name: model_name, 
              data:{k1:v1, k2:v2, k3:v3}}
              etc...