Stream Manager - Before we describe streams in detail, how to define them and how to process their data, let's look at the APIs required for most of the work, starting with the concept of the BangDB Stream Manager.
The Stream Manager works with streams, where each stream is defined as a set of attributes that together form a stream event, along with the operations that can be performed on these events. Let's take a very simple case where a few attributes arrive from different streams, for example temperature and pressure data streams in an IoT scenario. We can then define the streams as follows:
{
  "schema": "myschema",
  "streams": [
    {
      "name": "temp_stream",
      "type": 1,
      "swsz": 81600,
      "inpt": [],
      "attr": [
        {
          "name": "temp",
          "type": 11,
          "stat": 3
        },
        {
          "name": "point",
          "type": 9
        },
        {
          "name": "sensor_name",
          "type": 5,
          "kysz": 18,
          "sidx": 1,
          "stat": 2
        }
      ]
    },
    {
      "name": "pressure_stream",
      "type": 1,
      "inpt": [],
      "attr": [
        {
          "name": "pressure",
          "type": 11,
          "stat": 3
        },
        {
          "name": "point",
          "type": 9
        },
        {
          "name": "sensor_name",
          "type": 5,
          "kysz": 18,
          "sidx": 1,
          "stat": 2
        }
      ]
    }
  ]
}
This is the basic structure for defining streams. Let's discuss it in more detail.
Schema
Any real use case involves more than a single stream, and we also need to define the operations to be performed on these streams. Hence we need a wrapper to hold all the streams and the operations on their data.
We use "schema" as that container for the streams and the operations on them. It also allows us to isolate different schemas within the system. You can think of a schema as a way to segregate different solutions, apps, or users within the system, in effect a namespace that keeps the different structures from interfering with each other.
Stream
A stream is a collection of attributes for a particular data source, for example temperature sensor readings, payment transaction events, telecom CDR data, pizza order-delivery data, etc. A stream is defined, in simple terms, by the following fields:
"name" : stream name, give a name of the stream. DB does name mangling using schema name, hence user doesn't have to bother about it as long as the stream name is unique within schema.
"type" : type of the stream. Even though we ingest data in a stream, due to various processing of data, we would end up creating many other streams as well. There are following types of streams here, denoted by number.
type = 1
means a normal, direct or raw stream, the one into which we ingest data. This stream gets its data from outside, through agents or any other means; the application simply sends events to this stream (see the sample event after this list of types).
type = 2
means a filter stream, which receives data based on a filter defined on the normal (or another) stream. Once data is ingested into the raw stream, we may filter it based on some condition and send the matching events to this filter stream.
type = 3
means a joined stream, which receives data produced by joining two or more streams. Any two streams can be joined on some condition and the output sent to this joined stream.
type = 4
means an entity stream, which collects data from various streams as long-term profile data for an entity, kept for a long period of time (or forever). More on this later.
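To make the raw stream (type 1) concrete, the event below is a minimal sketch of what an application might send to the temp_stream defined above. The field values are purely illustrative, and the point attribute is assumed here to carry an epoch-seconds timestamp (an assumption, not something the schema mandates):
{
  "temp": 27.8,
  "point": 1689659000,
  "sensor_name": "sensor_12"
}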
"swsz" : Size of sliding window. Each of these streams could also reside in a sliding window. This is not tumbling window but a continuous window, which is more appropriate for stream analytics. Tumbling window is restriction and BangDB doesn't deal with it. The swsz is number of seconds for the sliding window.
We can have as low as 1 sec to as large as many years. However, it's important to set this properly and not go below a day or hour. For use cases where we wish to analyse in 1 sec or 5 sec or 60 sec or min or hour, we can do that using the processing definition which we will discuss later. This swsz size is for the raw stream data and how long would user like to persist it.
Once it slides, data could be archived or simply discarded or sent to other integrated system. We will have more discussion on this later. Default value for the size is 86400 i.e. one day.
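For instance, to keep roughly a week of raw data, the window would be 7 × 86400 = 604800 seconds. A minimal sketch of the stream header (with inpt and attr omitted here for brevity) might look like this:
{
  "name": "temp_stream",
  "type": 1,
  "swsz": 604800
}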
"attr" : list of attributes for the event. It also takes other info which dictates how to treat, store the attribute and what aggregations should be done in continuous basis. Following are important metadata that could be associated with the attribute.
"name" : name of the attribute, this will be in the stream of data identifying the attribute.
"type" : type of the attribute. There are following types supported as of now:
5 = string
9 = long
11 = double
"sidx" : Whether we should have secondary index for the attribute or not. 1 for yes and 0(default) for no. This can only be done for string(5) and long(9) types of attribute. This control to the user is good since secondary index will improve query performance but add to resources.
Hence, if query is not going to happen on this then we should ingore it. Note that db in any way if necessary will override and create indexes as required for different scenarios.
"kysz" : key size, this is only for type = 5, string type. It specifies the max size of the key. It's only important when “sidx” is enabled, i.e. secondary index is enabled as it will be used for indexing, hence need to have upper boundary for the size.
"stat" : This will enable the statistics / aggregation for the attribute. There are following options as defined by enum bangdb_stat_type
:
1: count 2: unique count, uses hyperloglog. Should be used for string(5) type attribute 3: running stats. For ex; count, min, max, avg, stddev, covar, skewness, ex_kurtosis 4: two-val stats. For ex; mean_x, mean_y, std_x, std_y, covr etc. As of now it's not enabled in this version. It's coming soon 5: top-k 99: invalid.
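Putting the attribute-level options together, the two attributes below (taken from the schema at the top of this section) show typical choices: a numeric attribute with running stats (stat 3), and a string attribute that is secondary-indexed with a bounded key size and counted uniquely via hyperloglog (stat 2):
[
  {
    "name": "temp",
    "type": 11,
    "stat": 3
  },
  {
    "name": "sensor_name",
    "type": 5,
    "kysz": 18,
    "sidx": 1,
    "stat": 2
  }
]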
Note: for attributes we can leave every setting at its default, but two pieces of information are mandatory, namely "name" and "type". By defining such a schema we are ready to start ingesting data and can perform the processing described so far. But this processing is quite limited, and we would like to do much more.
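For illustration, a minimal schema that relies on defaults everywhere could look like the sketch below; only "name" and "type" are set for each attribute. The stream and attribute names here are hypothetical, and whether fields such as inpt can be omitted entirely depends on the defaults, so treat this as a sketch rather than a verified definition:
{
  "schema": "myschema",
  "streams": [
    {
      "name": "humidity_stream",
      "type": 1,
      "inpt": [],
      "attr": [
        {
          "name": "humidity",
          "type": 11
        },
        {
          "name": "sensor_name",
          "type": 5
        }
      ]
    }
  ]
}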