Graph and Cypher
Graph and Cypher support in BangDB is powerful and lets users handle modern, complex use cases. BangDB natively integrates Graph with Stream, which makes it possible to ingest streaming data and keep growing the graph as data arrives. With native AI integration, data science becomes a natural element of the graph. With simple Cypher queries, users can do much more, in real time, across several use cases.
Data in a BangDB graph table is defined as triples. A triple contains a subject, an object and the relationship (predicate) between them. All data is stored as triples within the DB. BangDB arranges and maintains this storage so that a variety of queries can be written and run efficiently.
The query structure is very similar to “Cypher”: BangDB uses Cypher-like queries to process the data. The basic structures look like the following:
Query | Description |
---|---|
CREATE()-[]->() | for creating node or triple |
S=>()-[]->() | for querying data |
<op USING attr1 SORT_DESC attr2 LIMIT n> query1 ++ query2 | operation on disjoint sets of queries |
The "()" denotes subject or object and "[]" denotes relation (predicate) with "->" defining the direction. The arrangement is always "Subject Predicate Object".
The node has a label associated with it. Every node is written as "label:name".
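As a rough illustration of the triple model described above, here is a minimal Python sketch. It is not BangDB code: the `Node` and `Triple` classes, the sample names and the `KNOWS` relation are all hypothetical, and the printed string only mirrors the "Subject Predicate Object" shape, not BangDB's exact query syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A node written as "label:name" (hypothetical helper, not a BangDB type)."""
    label: str
    name: str

    @classmethod
    def parse(cls, text: str) -> "Node":
        # Split on the first ':' so that the label and name are both required.
        label, sep, name = text.partition(":")
        if not sep or not label or not name:
            raise ValueError(f"expected 'label:name', got {text!r}")
        return cls(label, name)

    def __str__(self) -> str:
        return f"{self.label}:{self.name}"


@dataclass(frozen=True)
class Triple:
    """Subject, predicate, object, always in that order."""
    subject: Node
    predicate: str
    obj: Node

    def __str__(self) -> str:
        return f"({self.subject})-[{self.predicate}]->({self.obj})"


# Illustrative data only.
t = Triple(Node.parse("person:alice"), "KNOWS", Node.parse("person:bob"))
print(t)
```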
The following keywords are available in queries, grouped by purpose.
Node, entity creation
Query | Description |
---|---|
CREATE | to create a single node, or triple |
Running query and selecting data
Query | Description |
---|---|
S=> | namespace for the unit of query |
RETURN | selecting attributes for any query |
WHERE | conditions for the query |
AS | selecting columns/attributes with alias |
DATAQUERY | for filtering within node and relations for properties |
SORT_DESC | for sorting in descending order |
SORT_ASC | for sorting in ascending order |
LIMIT | for limiting number of selections |
Statistics
Query | Description |
---|---|
COUNT | count a particular entity using COUNT(attribute) or just number of rows using COUNT(*) |
UCOUNT | unique counting using probabilistic method (using hyperloglog) |
UCOUNT_ABS | unique counting in absolute manner |
AVG | average of any attribute |
MIN | min value |
MAX | max value |
STD | standard deviation |
SUM | sum |
EXKURT | ex-kurtosis |
SKEW | skewness |
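For readers unfamiliar with the less common statistics in the table, the Python sketch below shows the textbook population definitions of standard deviation, skewness and excess kurtosis. This is an assumption for illustration: the source does not state which estimator (population or sample) BangDB uses for STD, SKEW or EXKURT, and the function names here are hypothetical.

```python
from math import sqrt


def central_moment(values, k):
    """k-th central moment of the values (population form, divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def std(values):
    """Population standard deviation: square root of the 2nd central moment."""
    return sqrt(central_moment(values, 2))


def skew(values):
    """Skewness: 3rd central moment scaled by the variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def exkurt(values):
    """Excess kurtosis: kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


sample = [1, 2, 3, 4, 5]  # illustrative data only
print(std(sample), skew(sample), exkurt(sample))
```

A symmetric sample such as this one has zero skewness, and a negative excess kurtosis indicates lighter tails than a normal distribution.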
Functional properties
Query | Description |
---|---|
SYMM | symmetric relations |
ASYMM | asymmetric relations |
DISTINCT | To get results for distinct key |
UNIQUE | unique pair wise [ sub and obj pair, for their ids ] |
UNIQUE_IN_CONTEXT | unique for given sub and obj, based on selection |
UNIQUE_SELECT | unique for all selected attributes, taken together |
Graph algos
Query | Description |
---|---|
ALL_PATH | all paths between any two given nodes |
SHORT_PATH | shortest path between any two given nodes |
Set operations
Query | Description |
---|---|
ADD | adding two or more sets (UNION) |
SUBTRACT | difference of two sets (DIFFERENCE) |
DIFFERENCE | difference of sets |
JOIN or CROSS | join product of two sets (INTERSECT) |
LEFT_JOIN | left set and values of joined part of second set |
RIGHT_JOIN | right set and values of joined part of first set |
APPEND | append two sets row wise |
PIPE | for piping (or sending) the first list to the second query |
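The Python sketch below illustrates the intent of a few of these set operations on two small result lists. It follows only the parenthetical hints in the table (UNION, DIFFERENCE, INTERSECT, row-wise append); the function names and data are hypothetical, and how BangDB matches rows or handles duplicates is an assumption here.

```python
def add(a, b):
    """ADD: union of both result sets, keeping first-seen order, no duplicates."""
    return list(dict.fromkeys(a + b))


def subtract(a, b):
    """SUBTRACT: rows of the first set that do not appear in the second."""
    exclude = set(b)
    return [row for row in a if row not in exclude]


def join(a, b):
    """JOIN: rows common to both sets (the table describes this as INTERSECT)."""
    keep = set(b)
    return [row for row in a if row in keep]


def append(a, b):
    """APPEND: rows of the second set placed after the first, duplicates kept."""
    return a + b


# Illustrative results of two hypothetical queries.
q1 = ["person:alice", "person:bob"]
q2 = ["person:bob", "person:carol"]

print(add(q1, q2))
print(subtract(q1, q2))
print(join(q1, q2))
print(append(q1, q2))
```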
Data Science
Query | Description |
---|---|
SIMILARITY | compute similarities among set of nodes based on various data |
CLUSTER | to find natural clusters |
CENTRALITY | finding the node centrality |
COMMUNITY_DETECTION | for detecting several communities within graph |
GROUPS | finding several groups given properties |
ML_ALGO | this brings entire ML algorithms to the Graph, model name is supplied as well |
Deep Learning* | DNN, RNN, ResNet. Embeddable within graph |
Information Extraction* | Ontologies or triple generation through IE |
Data is processed from left to right. Several triples can be chained to form a query, for example:
S1=>()-[]->()-[]->() …
In the example above, the first triple produces an intermediate set of results, which becomes the input for the next triple, and so on. Evaluation therefore proceeds from left to right using intermediate results: the subject of each subsequent triple in the chain is the intermediate result (the object) of the previous triple.
In some cases we want to keep the subject of the first triple as the subject of the subsequent triple. For that we can use the structure below. This is in contrast with the chained query, where the object of the first triple becomes the subject of the second one, and so on:
S2=>[S1=>()-[]->()]-[]->() …
We will see the examples for these in subsequent sections.
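To make the difference between the two forms concrete, here is a small Python sketch over an in-memory list of triples. It only models the evaluation order described above, and it is not how BangDB executes queries; the data, the relation names and the helper functions `step`, `chain` and `keep_subject` are all hypothetical.

```python
# Triples as (subject, predicate, object) tuples; illustrative data only.
TRIPLES = [
    ("person:alice", "FOLLOWS", "person:bob"),
    ("person:alice", "FOLLOWS", "person:carol"),
    ("person:bob", "LIKES", "movie:heat"),
    ("person:carol", "LIKES", "movie:alien"),
    ("person:alice", "LIKES", "movie:up"),
]


def step(subjects, predicate, triples=TRIPLES):
    """Return the triples with the given predicate whose subject is in `subjects`."""
    return [t for t in triples if t[0] in subjects and t[1] == predicate]


def chain(start, predicates):
    """Chained form: the objects of each step become the subjects of the next."""
    subjects = {start}
    matched = []
    for predicate in predicates:
        matched = step(subjects, predicate)
        subjects = {obj for _, _, obj in matched}
    return matched


def keep_subject(start, first_predicate, second_predicate):
    """Nested form: the subjects of the first triple are reused for the second."""
    first = step({start}, first_predicate)
    subjects = {subj for subj, _, _ in first}
    return step(subjects, second_predicate)


# Chained: what do the people alice follows like?
print(chain("person:alice", ["FOLLOWS", "LIKES"]))
# Nested: alice follows someone; what does alice herself like?
print(keep_subject("person:alice", "FOLLOWS", "LIKES"))
```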
Merge nodes (deep merge)
The MERGE function merges two nodes deeply. This means that all relationships (incoming and outgoing), the properties of those relationships, and the properties of the nodes themselves are merged.
All relationships pointing towards or out of the secondary node are moved to the primary node, and the secondary node is then deleted.
The primary node is the node into which the secondary node is merged. The arrow ('<-' or '->') points towards the primary node.
To merge node (label:id2) into (label:id1)
MERGE (label:id1)<-[*]-(label:id2)
To merge node (label:id1) into (label:id2)
MERGE (label:id1)-[*]->(label:id2)
'*' can be replaced with one of the following:
KEEP_PRIM_PROP = same as '*', means keep the property of primary node in case of duplicate key
KEEP_SEC_PROP = keep secondary node property in case of clash
KEEP_BOTH = keep duplicate properties (both primary and secondary)
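The Python sketch below models the three property options and the re-pointing of relationships. It is an illustration under stated assumptions, not BangDB's implementation: the function names are hypothetical, storing both values as a two-element list for KEEP_BOTH is an assumption, and so is the handling of duplicate relationships after the move.

```python
def merge_properties(primary, secondary, mode="KEEP_PRIM_PROP"):
    """Combine two property dicts; `mode` decides what happens on a key clash."""
    if mode == "KEEP_PRIM_PROP":
        return {**secondary, **primary}  # primary value wins on a clash
    if mode == "KEEP_SEC_PROP":
        return {**primary, **secondary}  # secondary value wins on a clash
    if mode == "KEEP_BOTH":
        merged = dict(primary)
        for key, value in secondary.items():
            # Assumption: a clash is stored as [primary value, secondary value].
            merged[key] = [merged[key], value] if key in merged else value
        return merged
    raise ValueError(f"unknown mode: {mode}")


def move_relationships(triples, primary, secondary):
    """Re-point every triple that touches `secondary` so it touches `primary`."""
    moved = []
    for subj, pred, obj in triples:
        triple = (
            primary if subj == secondary else subj,
            pred,
            primary if obj == secondary else obj,
        )
        if triple not in moved:  # assumption: identical relationships collapse
            moved.append(triple)
    return moved


# Illustrative data only.
prim = {"name": "A", "city": "Pune"}
sec = {"name": "B", "age": 30}
print(merge_properties(prim, sec))
print(merge_properties(prim, sec, "KEEP_SEC_PROP"))
print(merge_properties(prim, sec, "KEEP_BOTH"))

edges = [("label:id2", "KNOWS", "label:id3"), ("label:id4", "KNOWS", "label:id2")]
print(move_relationships(edges, primary="label:id1", secondary="label:id2"))
```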
Delete node
The DELETE function deletes a node deeply. This means that all of its relationships (incoming and outgoing), the properties of those relationships, and the properties of the node itself are deleted.
To delete node (label:id)
DELETE (label:id)
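As a conceptual sketch of what a deep delete implies, the Python snippet below removes a node's properties together with every relationship that enters or leaves it. The data layout and the `delete_node` helper are hypothetical and serve only to illustrate the description above.

```python
def delete_node(triples, properties, node):
    """Return copies of the graph data with `node` and all its edges removed."""
    # Drop every triple where the node is the subject or the object.
    remaining = [t for t in triples if node not in (t[0], t[2])]
    # Drop the node's own properties.
    remaining_props = {k: v for k, v in properties.items() if k != node}
    return remaining, remaining_props


# Illustrative data only.
graph = [
    ("person:alice", "KNOWS", "person:bob"),
    ("person:bob", "KNOWS", "person:carol"),
    ("person:alice", "KNOWS", "person:carol"),
]
props = {"person:alice": {"age": 31}, "person:bob": {"age": 28}}

graph_after, props_after = delete_node(graph, props, "person:bob")
print(graph_after)
print(props_after)
```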
We will use the BangDB CLI to perform these exercises. But before we go there, let's see how BangDB Cypher differs from the original Cypher.
Check out the sample use cases here to learn a bit more about Graph and Cypher in BangDB.
Check out the graph document here.