SIMILARITY

One of the major reasons for having a graph structure in place is to be able to find context, as well as entities and their relationships. With such an arrangement in place, we should be able to find similarities between entities based on their relationships. And if we know how close or far apart a given pair of entities is, we can take actions toward specific goals.

For example, if we can find the degree of similarity between two persons, we could offer products to one or the other with a higher likelihood of conversion. Recommendations, ad serving, deals and discounts, and personalization are some of the many use cases this concept enables, provided we can compute it efficiently.

Let's compute the similarity of persons based on their buying patterns in this example:

CREATE GRAPH g3

USE GRAPH g3

CREATE (person:dan)-[BUYS {"amount": 1.2}]->(product:cookies)

CREATE (person:dan)-[BUYS {"amount": 3.2}]->(product:milk)

CREATE (person:dan)-[BUYS {"amount": 2.2}]->(product:chocolate)

CREATE (person:annie)-[BUYS {"amount": 1.2}]->(product:cucumber)

CREATE (person:annie)-[BUYS {"amount": 3.2}]->(product:milk)

CREATE (person:annie)-[BUYS {"amount": 3.2}]->(product:tomatoes)

CREATE (person:matt)-[BUYS {"amount": 3}]->(product:tomatoes)

CREATE (person:matt)-[BUYS {"amount": 2}]->(product:kale)

CREATE (person:matt)-[BUYS {"amount": 1}]->(product:cucumber)

CREATE (person:jeff)-[BUYS {"amount": 3}]->(product:cookies)

CREATE (person:jeff)-[BUYS {"amount": 2}]->(product:milk)

CREATE (person:brie)-[BUYS {"amount": 1}]->(product:tomatoes)

CREATE (person:brie)-[BUYS {"amount": 2}]->(product:milk)

CREATE (person:brie)-[BUYS {"amount": 2}]->(product:kale)

CREATE (person:brie)-[BUYS {"amount": 3}]->(product:cucumber)

CREATE (person:brie)-[BUYS {"amount": 0.3}]->(product:celery)

CREATE (person:elsa)-[BUYS {"amount": 3}]->(product:chocolate)

CREATE (person:elsa)-[BUYS {"amount": 3}]->(product:milk)

CREATE (person:john)-[BUYS {"amount": 5}]->(product:kale)

CREATE (person:john)-[BUYS {"amount": 2}]->(product:peanut)

CREATE (person:steve)-[BUYS {"amount": 7}]->(product:orange)

CREATE (person:steve)-[BUYS {"amount": 3}]->(product:mango)
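
To make the idea of a buying pattern concrete, the same BUYS relationships can be pictured outside the database as one vector of amounts per person. The Python sketch below is only an illustration: the purchases dictionary and the to_vector and shared_products helpers are hypothetical names, not part of BangDB, and only three of the persons above are included for brevity.

```python
# Illustrative only: a plain-Python view of some of the BUYS edges created above.
purchases = {
    "dan":  {"cookies": 1.2, "milk": 3.2, "chocolate": 2.2},
    "jeff": {"cookies": 3.0, "milk": 2.0},
    "matt": {"tomatoes": 3.0, "kale": 2.0, "cucumber": 1.0},
}

# Fixed product order so that every person's vector has the same layout.
products = sorted({p for bought in purchases.values() for p in bought})


def to_vector(person):
    """Return the person's amounts in `products` order (0.0 if not bought)."""
    bought = purchases[person]
    return [bought.get(product, 0.0) for product in products]


def shared_products(a, b):
    """Return the products that both persons bought, sorted by name."""
    return sorted(set(purchases[a]) & set(purchases[b]))


print(products)
print(to_vector("dan"))
print(shared_products("dan", "jeff"))
print(shared_products("dan", "matt"))
```

Persons whose vectors have weight on the same products (dan and jeff here) have overlapping buying patterns, while dan and matt share no products at all in this subset.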

Now let's run the query that computes the similarity.

In the query below, we return p.name AS person. Note that the node label is also "person": the alias given to the name must be the same as the label, so the clause "p.name AS person" is required here.

S{SIMILARITY}=>(@p person:*)-[@b BUYS]->(@c product:*); RETURN p.name AS person, b.amount AS amount, c.name AS product

{
   "errcode" : 0,
   "msg" : [
      "Successfully computed the similarity and updated the relations (use '_SIMILAR_' relation to retrieve the number)"
   ]
}

The above query used the graph data to implicitly train a KMEANS model, which finds several centroids. We can set the number of centroids to train by providing it alongside the SIMILARITY key. For example, to train 10 centroids we may use the following query.

S{SIMILARITY, 10}=>(@p person:*)-[@b BUYS]->(@c product:*); RETURN p.name AS person, b.amount AS amount, c.name AS product
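
For readers unfamiliar with centroids, here is a minimal sketch of k-means clustering in plain Python. It is not BangDB's implementation, and it does not show how BangDB derives similarity scores from the trained model; the kmeans function, its parameters, and the toy vectors are all assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return centroids, assignment


# Toy purchase vectors laid out as [cookies, milk, tomatoes, kale].
points = [
    [1.2, 3.2, 0.0, 0.0],  # cookies-and-milk buyer
    [0.0, 0.0, 3.0, 2.0],  # vegetable buyer
    [3.0, 2.0, 0.0, 0.0],  # cookies-and-milk buyer
    [0.0, 0.0, 1.0, 2.0],  # vegetable buyer
]
centroids, assignment = kmeans(points, k=2)
print(assignment)  # cluster index for each point
print(centroids)   # one centroid per cluster
```

The number passed as k plays the same role as the number given next to the SIMILARITY key: it fixes how many cluster centers the model tries to fit.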

The query also updates the graph by adding a relationship named _SIMILAR_ (the default) or the name provided in the query. We can provide the name of the relationship as follows.

S{SIMILARITY(any_rel_name), 10}

Now we can query the result using the relationship name that we provided, or the default (_SIMILAR_) if we didn't provide one. In this case we didn't provide a name, so we use _SIMILAR_:

S=>(@p person:*)-[@b _SIMILAR_]->(@c person:*); RETURN p.name AS person1, c.name AS person2, b.similarity AS similarity

+-------+----------+-------+
|person2|similarity|person1|
+-------+----------+-------+
|brie   |0.921595  | annie |
+-------+----------+-------+
|annie  |0.913827  | jeff  |
+-------+----------+-------+
|brie   |0.893442  | jeff  |
+-------+----------+-------+
|annie  |0.905641  | dan   |
+-------+----------+-------+
|brie   |0.905918  | dan   |
+-------+----------+-------+
|jeff   |0.987327  | dan   |
+-------+----------+-------+
|matt   |0.923687  | dan   |
+-------+----------+-------+
|elsa   |0.935471  | dan   |
+-------+----------+-------+
|annie  |0.991040  | matt  |
+-------+----------+-------+
|brie   |0.935316  | matt  |
+-------+----------+-------+
|jeff   |0.928872  | matt  |
+-------+----------+-------+
|annie  |0.957425  | elsa  |
+-------+----------+-------+
|brie   |0.872248  | elsa  |
+-------+----------+-------+
|jeff   |0.942420  | elsa  |
+-------+----------+-------+
|matt   |0.961484  | elsa  |
+-------+----------+-------+

Or, to view only the results where the similarity score is greater than 0.95:

S=>(@p person:*)-[@b _SIMILAR_]->(@c person:*); RETURN p.name AS person1, c.name AS person2, b.similarity AS similarity WHERE similarity > 0.95

+-------+----------+-------+
|person2|similarity|person1|
+-------+----------+-------+
|jeff   |0.987327  | dan   |
+-------+----------+-------+
|annie  |0.991040  | matt  |
+-------+----------+-------+
|annie  |0.957425  | elsa  |
+-------+----------+-------+
|matt   |0.961484  | elsa  |
+-------+----------+-------+
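
The same filter can also be applied on the client side once the rows have been fetched. A minimal Python sketch, assuming the rows are already loaded into a list of (person1, person2, similarity) tuples; the rows variable and the above helper are hypothetical, and the values are copied from the unfiltered result table earlier in this section.

```python
# Rows copied from the unfiltered _SIMILAR_ result: (person1, person2, similarity).
rows = [
    ("annie", "brie", 0.921595),
    ("jeff", "annie", 0.913827),
    ("jeff", "brie", 0.893442),
    ("dan", "annie", 0.905641),
    ("dan", "brie", 0.905918),
    ("dan", "jeff", 0.987327),
    ("dan", "matt", 0.923687),
    ("dan", "elsa", 0.935471),
    ("matt", "annie", 0.991040),
    ("matt", "brie", 0.935316),
    ("matt", "jeff", 0.928872),
    ("elsa", "annie", 0.957425),
    ("elsa", "brie", 0.872248),
    ("elsa", "jeff", 0.942420),
    ("elsa", "matt", 0.961484),
]


def above(rows, threshold):
    """Keep only the rows whose similarity is strictly greater than threshold."""
    return [row for row in rows if row[2] > threshold]


for person1, person2, similarity in above(rows, 0.95):
    print(f"{person1:6} {person2:6} {similarity:.6f}")
```

Filtering in the query is usually preferable because less data leaves the database; the client-side version is mainly useful when several thresholds are tried against one fetched result.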

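Running the same pattern with the SIMILARITY_TEST key returns the computed pairs directly in the response:

S{SIMILARITY_TEST}=>(@p person:*)-[@b BUYS]->(@c product:*); RETURN p.name AS person, b.amount AS amount, c.name AS product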

{
   "pairs":[
      {
         "similarity":0.93547131724717,
         "A":"dan",
         "B":"elsa"
      },
      {
         "A":"dan",
         "B":"matt",
         "similarity":0.923687462697141
      },
      {
         "B":"jeff",
         "A":"dan",
         "similarity":0.98732715961013
      },
      {
         "similarity":0.905641110794971,
         "A":"dan",
         "B":"annie"
      },
      {
         "similarity":0.905918283036807,
         "A":"dan",
         "B":"brie"
      },
      {
         "similarity":0.961483504509432,
         "A":"elsa",
         "B":"matt"
      },
      {
         "A":"elsa",
         "B":"jeff",
         "similarity":0.942420408659908
      },
      {
         "similarity":0.957425185618457,
         "B":"annie",
         "A":"elsa"
      },
      {
         "B":"brie",
         "A":"elsa",
         "similarity":0.872247612472379
      },
      {
         "similarity":0.928872132753044,
         "A":"matt",
         "B":"jeff"
      },
      {
         "similarity":0.991039526450287,
         "A":"matt",
         "B":"annie"
      },
      {
         "B":"brie",
         "A":"matt",
         "similarity":0.935316150022884
      },
      {
         "A":"jeff",
         "B":"annie",
         "similarity":0.913827151108844
      },
      {
         "similarity":0.89344184143706,
         "B":"brie",
         "A":"jeff"
      },
      {
         "similarity":0.921594885722905,
         "A":"annie",
         "B":"brie"
      }
   ],
   "label":"person"
}
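
A response in this shape is straightforward to post-process in application code, for instance to pick the closest match for each person before making a recommendation. The Python sketch below works under stated assumptions: the response string is a trimmed copy of the output above (five of the fifteen pairs), the best_matches helper is a hypothetical name, and similarity is treated as symmetric because each pair appears only once in the output.

```python
import json

# Trimmed copy of the SIMILARITY_TEST response shown above.
response = """
{
  "pairs": [
    {"A": "dan",   "B": "jeff",  "similarity": 0.98732715961013},
    {"A": "dan",   "B": "elsa",  "similarity": 0.93547131724717},
    {"A": "elsa",  "B": "matt",  "similarity": 0.961483504509432},
    {"A": "matt",  "B": "annie", "similarity": 0.991039526450287},
    {"A": "annie", "B": "brie",  "similarity": 0.921594885722905}
  ],
  "label": "person"
}
"""


def best_matches(payload):
    """Map each person to (most similar other person, score)."""
    best = {}
    for pair in payload["pairs"]:
        a, b, score = pair["A"], pair["B"], pair["similarity"]
        # Assumption: the score applies in both directions of the pair.
        for person, other in ((a, b), (b, a)):
            if person not in best or score > best[person][1]:
                best[person] = (other, score)
    return best


matches = best_matches(json.loads(response))
for person in sorted(matches):
    other, score = matches[person]
    print(f"{person}: {other} ({score:.3f})")
```

With the full list of pairs the result may differ from this trimmed example, since pairs left out here could outrank the ones kept.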

This concludes the short introduction to Graph in BangDB. For a detailed discussion and more examples, please see https://bangdb.com/wp-content/uploads/2022/10/Graph-and-Cypher-BangDB-2.0.pdf.