Geo Spatial Index query for table or stream

BangDB allows geo-spatial index to be created on table or on streams. Once done then user can query the data using the index. It also allows user to augment the query with other constraints as required. For simplicity, it uses the similar query structure as it uses for most of the other types of queries Therefore we can query for something like:

  1. Find all entities within a distance of X meters from a point p (lat, lon)
  2. Find all entities within a box defined by bot-left and top-left rectangle
  3. Find all restaurants named “Starbucks coffee” within 1000 meters from my current location
  4. Find all book stores near coffee restaurant within 2000 meters from my current location etc..

To deal with geospatial index, BangDB takes the (lat,long), converts it into a geohash (string type) and stores the geohash in sorted manner using B+tree. Therefore it aligns well with the normal indexing model internally. We have two different ways to deal with geoindex, explicit way for a table and implicit way for stream.

Geo Spatial Index for table

Let's say we have a table which stores all the shops in a given city. We assign a unique (lat,long) to every shop. Now, in order to be able to run the query based on geoindex, we need to create the index. To do so, we can use TableEnv type for the same. We just need to set up following:

TableEnv tenv;
tenv.setKeyType(COMPOSITE_KEY);
tenv.setSortMethod(LEXICOGRAPH);
tenv.setAllowDuplicate(true);
// if we wish to have duplicate index
// now create the index on the table tbl.addIndex("geohash", &tenv);
// note geohash is just the field name which will contain the gephash of the lat,long
// that's it

Geo Spatial Index for stream

Here it's even simple. Following are the steps:

Let's say in a given stream we have "lat" and "lon" as attributes which contains latitude and longitude for a given location. Data/event is flowing into the db. Now we must create the geohash index and add it to the events in the incoming stream. To do so, we will use "catr".

{
    "name":"geohash",
    "type":5,
    "sidx":1,
    "opnm":"GEOHASH",
    "iatr":[
       "lat",
       "lon"
    ]
 }

Now to query the data we can use following JSON structure to set up the Geo Spatial filter.

{"location":{"lat":12.8282,"lon":77.5454},"index":"geohash","distance":1000}

Or

{"location":{"lat":12.8282,"lon":77.5454},"index":"geohash","distance":1000,"neighbors":1} 
 // uses neighbors or
{"box":{"left_bot":{"lat":12.8282,"lon":77.5454},"right_top":{"lat":12.9282,"lon":77.9454}},"index":"geohash"}

This is as simple as that. Now to add further filter, we can use the dataQuery to add filter and then use "addGeoQuery" function to add the above json structure Same scan apis will be used to do the query. Same way we can also query stream, using scan API.

Query from CLI

It's rather simple for query from the cli; look at few examples below.

// query a stream 
  select * from schema1.stream1 where shopname = "starbucks coffee" and area = "MG Road" and geofilter {"location"{"lat":12.2345,"lon":77.2123},"index":"geohash","distance":1000}
//or to simply count, we can use count(*) 
  select * from schema1.stream1 ...
// to query a table, we use exactly same format except we use table name instead of schema.stream
  select * from mytable where shopname = "starbucks coffee" and area = "MG Road" and geofilter {"location"{"lat":12.2345,"lon":77.2123},"index":"geohash","distance":1000}