Introduction
The document contains the details of Graph implementation for Tvam within BangDB. The following details are covered in the document
- The Entities and their properties
- The relationship between entities and their properties
- Efficiency of this Graph model
- Flexibility and how to edit/update the graph model for future changes
- Cluster and Similarities, and the properties used for creating them
- Query templates and examples
- Query benchmarking for ~300K data in the schema [ using REST QUERY]
The document provides a detail for each entity and relation with several properties that are attached to these. Sample view of the part of the graph is also given everywhere. In the end the overall graph is also printed.
The graph structure needs to be flexible and extensible in nature with logical hierarchy also preserved as far as possible. And above this, it is also very important that the graph scales well and performance for both read and write is high. Further, we must also ensure that data preserves the natural groups, cliques, segments etc. Finally, we must also ensure that the Graph uses nomenclatures which is natural to the domain and aligns with RDF models.
To ensure that, these flexibilities are there in the structure, the document also lists several queries that might be performed in production and to enable the use cases. The queries are benchmarked for REST call (which is most conservative and would kind of provide the baseline). The results of the queries are also depicted pictorially for clarity. Further, these queries can be extended or more can be added as needed. As of now it lists over 40 such query templates with over 100 possible variations for different permutations and combinations for different contexts. This is to ensure that there is enough flexibility in the structure to query in many ways.
Note: The Graph, and related details (including queries) are subject to the receipt and test on actual data. These may change once we get the data, but the current structure should take care of the change easily
Entities, relationships, and their properties
- 1. Person
- CustRefID (Customer ID and TvamCustRefID): Node
Nullable | No |
---|---|
Properties | LangType, Gender, DOB, AgeAtRegistration, RegistrationDay, RegistrationMonth, RegistrationYear, CreatedDate |
Node Modifiability | No |
Property Modifiability | Yes |
Property Extensibility | Yes |
MemberID (FamilyMemberID and FamilyMemID): Node
Nullable | No |
---|---|
Properties | Gender, DOB |
Node Modifiability | No |
Property Modifiability | Yes |
Property Extensibility | Yes |
HAS_FAMILY_MEMBER: Relation between CustRefID and MemberID
Properties | RelationType |
---|---|
Modifiability | No |
Extensibility | Yes |
data:image/s3,"s3://crabby-images/e621e/e621e7245af838fd6cd551de7187202c6c462785" alt="image Analysis"
IS_IN_STATE: Relation between CustRefID to State
Properties | RelationType |
---|---|
Modifiability | No |
Extensibility | Yes |
data:image/s3,"s3://crabby-images/e621e/e621e7245af838fd6cd551de7187202c6c462785" alt="image Analysis"
IS_IN_STATE: Relation between CustRefID to State
Properties | District |
---|---|
Modifiability | No |
Extensibility | Yes |
HAS_PINCODE: Relation between CustRefID to Pincode
Properties | |
---|---|
Modifiability | No |
Extensibility | Yes |
data:image/s3,"s3://crabby-images/28725/2872567fe30e0fe689cc83a84295f7ab8ef105b0" alt="image Analysis"
Few important queries [ detail queries are at the end of the document]
- a. Find all members of the family for a given person
- b. Find all customers in a given cluster/area (city/taluk/geo-loc)/similar customers
- c. Find relative (fathers/mothers…) of all the registered customer in a given cluster/area (city/taluk/geo-loc)/similar customers
- d. Find all 1st/2nd degree connections for a given customer
- e. Add further filters in above queries (age range, gender, education etc.)
2. Address
- AddRefID: Node
There is a node for every AddRefID because a customer can have more than one address (such as HOME, OFFICE) and for every address we will have latitude and longitude associated with it.
- HAS_ADDRESS: Relation between CustRefID (Person) and AddRefID (Address Node)
Nullable | No |
---|---|
Properties | AddressType |
Modifiability | No |
Extensibility | Yes |
data:image/s3,"s3://crabby-images/6ed94/6ed94197786861e9774adac52e91247c0eed2be5" alt="image Analysis"
Few important queries [ detail queries are at the end of the document]
- f. Find all addresses for a given customer. Or find an office address for a customer.
- g. Find all customers for a given office address?
- h. Find all customers whose office and home addresses are same
- i. Find all customers whose office addresses are missing
- j. Find all customer living in a given radius for a given pair of Lat, Lon
3. Locality
- Locality (Ex: Village name): Node
Nullable | Yes |
---|---|
Properties | |
Node Modifiability | No |
Property Modifiability | Yes |
Property Extensibility | Yes |
HAS_LOCALITY: Relation between CustRefID and Localitys
Properties | |
---|---|
Modifiability | Yes |
Extensibility | Yes |
data:image/s3,"s3://crabby-images/933e0/933e0fe4eaa1c6493443dd0fbaa7d93d7a8e2605" alt="image Analysis"
Few important queries [ detail queries are at the end of the document]
- k. Find all customers for a given Locality
- l. Find all customers for whom Locality is not given
4. Pincode
- PinCode: Node
Nullable | Yes |
---|---|
Properties | Taluk |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
HAS_PINCODE: Relation from AddRefID and Locality to Pincode
Properties | |
---|---|
Modifiability | Yes |
Extensibility | Yes |
The “taluk” is used as a property of node Pincode and not as individual node due to following reasons
- One taluk can have more than one Pincode
- Not every customer provides taluk.
- The taluk is a local administration unit consisting of a few villages or a few wards (taluk’s definition differs from state to state). So, using it as a property is better than as a node.
data:image/s3,"s3://crabby-images/7b9a0/7b9a0bcc49227b7f7c2eb4f0055682fda05ace28" alt="image Analysis"
How to handle null pincode?
Pincode node will be created, and this will be linked with state. The node will have Taluk as property. Since the pincode was null, therefore we will use “Lat, Lon” to figure out the pincode. If “Lat, Lon” is also null, then we can use “taluk” info for the same. If “taluk” is also null, then we may use the central pin of the city/district. Since pincode is editable/updateable node, therefore this can be updated whenever we are able to get it or compute it.
Few important queries [ detail queries are at the end of the document]
- m. Find all customers for a given pincode
- n. Find all customers within a X radius of given pincode
- o. Find all customers who has purchased a product (or a given product) for a given pincode or within radius of X KM for a pincode
- p. Find similar people (based on any criteria) for a neighborhood of a given pincode
5. State
- • State: Node
Nullable | No |
---|---|
Properties | Country |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
- IS_IN_STATE: Relation from Pincode to State [ Inversion]
Properties | District, City |
---|---|
Modifiability | Yes |
Extensibility | Yes |
• IS_IN_STATE: Relation from Pincode to State [ Inversion]
Properties | District, City |
---|---|
Modifiability | Yes |
Extensibility | Yes |
• IS_IN_STATE: Relation from CustRefID to State
Properties | District |
---|---|
Modifiability | Yes |
Extensibility | Yes |
data:image/s3,"s3://crabby-images/2817e/2817ec819a95b7d7d72466377c4029e268455b95" alt="image Analysis"
Few important queries [ detail queries are at the end of the document]
- q. Find all active pin codes for a given state
- r. Find all customers for a given state (who has or hasn’t purchased any product)
- s. Find all similar profile people (based on any criteria) who are in a given state
6. Profile
- • CustProfile: Node
Nullable | No |
---|---|
Properties | Family Structure, Education, Profession, Marital Status, Children, SchoolGoingKids, NoOfSeniorCitizen, Dependent, Income, Business Category, Updated Date |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
CustProfile as a separate node to represent a customer’s general information which we get from surveys (Question Response), and we know that not every customer takes surveys so it’s better to have a node that represents a customer profile.
- PurchaseProfile: Node
Nullable | No |
---|---|
Properties | InsuranceCount, TotalInsuranceSpend, PersonalLoans, LoanAmount, UPIPayments, UPICount, UPIAmount |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
Properties | |
---|---|
Modifiability | Yes |
Extensibility | Yes |
HAS_PURCHASE_PROFILE: Relation between CustRefID node and PruchaseProfile node
data:image/s3,"s3://crabby-images/28f9d/28f9da02dba0d0fcf7af58f4eb99207e9a171e84" alt="image Analysis"
Few important queries [ detail queries are at the end of the document]
- t. Find all people similar to given profile (uses profile-based similarity)
- u. Find all people who has their profile similar to a given person (same as above except it takes PersonID)
- v. Add various filters (edu, prof, etc.)
7. DoctorConsultation
- • TvamRefNo: Node
Nullable | No |
---|---|
Properties | AgeAtEC, ConsultationDate, PaymentStatus, TvamConsultationId, ConsultationFee, ECregistrationDay, ECregistrationMonth, ECregistrationYear, Status, CreatedDate |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
TvamRefNo is used as node instead of ConsultationId because there can be cases where ConsultationId is not present.
- CONSULTS: Relation between CustRefID and TvamRefID
Properties | |
---|---|
Modifiability | Yes |
Extensibility | Yes |
data:image/s3,"s3://crabby-images/bdbfc/bdbfcdca3d37942e57c4cbba142e841123325926" alt="image Analysis"
Few important queries [ detail queries are at the end of the document]
- w. Find all persons who have consulted doctors (and num of times for each)
- x. Find all persons who have consulted doctors (and num of times for each) in a given geoloc (city, taluk or even pincode)
- y. Find all persons who have booked the consultations but not yet consulted
- z. Find all persons who have consulted Doctors but not yet purchased insurance
- aa. Find persons similar to a given person who has consulted doctor but not purchased insurance yet
- bb. Find different policies for insurances and their providers where people have consulted the Doctors most (top 3 or 5)
- cc. Find all aging fathers (different relatives) of persons similar to a given person in an area who has taken insurance
8. Doctor
- DoctorID: Node
Nullable | No |
---|---|
Properties | DoctorName |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
- • ADVICES: Relation between DoctorID and TvamRefNo
Properties | |
---|---|
Modifiability | Yes |
Extensibility | Yes |
data:image/s3,"s3://crabby-images/7d6bc/7d6bc42e76b186651ed6e24b11b7a8ab2fcd2367" alt="image Analysis"
9. PolicyVendor
- VendorID: Node
Nullable | No |
---|---|
Properties | VendorCode |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
PolicyID: Node
Nullable | No |
---|---|
Properties | CreatedDate, Insurance Type, Cover Type, Sum Insured, Premium Amount, MaxMembers, Policy Tenure Type, Policy Tenure Values, Policy Region, MinMembers, Allowed Relationships |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
• PROVIDED_BY: Relation between PolicyID and VendorID
Properties | Modifiability | Extensibility |
---|---|---|
Property 1 | Yes | Yes |
Property 2 | Yes | Yes |
Property 3 | Yes | Yes |
data:image/s3,"s3://crabby-images/023f0/023f0701fc9460b1b5b05c8e6137e4004a3fb489" alt="image Analysis"
11. Insurance • InsuranceI
- • InsuranceID: Node (for all Insurances)
Nullable | No |
---|---|
Properties | AgeAtInsurance, Policy Type, Policy Status, Policy Creation Month, Policy Creation Year, Policy Maturity Month, Policy Maturity Year |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
There is a node for every insurance purchased. This node is connected to the customer and the policy.
- COVERS: Relation between InsuranceID and MemberID
Properties | Modifiability | Extensibility |
Yes | Yes | Yes |
- IS_FOR_POLICY: Relation between InsuranceID and PolicyID
Properties | Modifiability | Extensibility |
---|---|---|
Properties | Yes | Yes |
• IS_IN_STATE: Relation between InsuranceID and State
Properties | Modifiability | Extensibility |
District | Yes | Yes |
HAS_PINCODE: Relation between InsuranceId to Pincode
Properties | District |
Modifiability | Yes |
Extensibility | Yes |
data:image/s3,"s3://crabby-images/27471/27471f89dece27461605419f4d1f9a1183f75bae" alt="image Analysis"
12. Loan
- LoanApplicationId: Node (For all Types of Loans)
Nullable | No |
---|---|
Properties | AgeAtLoanRegistration, DateOfBirth, Gender, Marital Status, Education Qualification, Purpose of Loan, Related Loan Details, Employment Details, Loan Registration Day, Month, and Year |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
We have 2 types of loans
- Personal and,
- Business Loan
There is a node for every loan application type. These nodes are updated when Loan application status changes.
Since the Tvam app customer and actual loan customer may be different therefore, we have Gender, Marital Status, and DOB as node properties.
- APPLIES_FOR: Relation between CustRefID and LoanApplicationID
Properties | Modifiability | Extensibility |
Yes | Yes | Yes |
- IS_IN_STATE: LoanApplicationId to State
Properties | District |
Modifiability | Yes |
Extensibility | Yes |
- HAS_ADDRESS: LoanApplicationId to AddressId
Properties | Modifiability | Extensibility |
Yes | Yes | Yes |
data:image/s3,"s3://crabby-images/54749/54749bc70139b7800e64af2eb86488153fcd9f57" alt="image Analysis"
Few important queries [ detail queries are at the end of the document]
- Find all persons who have taken loans in a given area (pincode, geo-loc, city, taluk, state)
- Find all persons who similar to persons who have taken loan in a given area
- ff. Find all ageing fathers (or any relative) of persons similar to a given person in an area who has taken loan (may add filters like loan-amount > X or duration< Y years etc.)
13. FinancialInstitute
- • BankName: Node
Nullable | No |
---|---|
Properties | |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
FROM_INSTITUTION: Relation between LoanApplicationId to BankName
APPLIES_FOR: Relation between CustRefID and LoanApplicationId
Properties | Modifiability | Extensibility |
Yes | Yes | Yes |
data:image/s3,"s3://crabby-images/91c25/91c256a6cc880df9c9f6d7be225c4b47a2f65b59" alt="image Analysis"
Few important queries [ detail queries are at the end of the document]
- gg. Find all customers who have applied for a loan (for any INSTITUTION or a given one) from a given geo-loc/area/Locality/pincode/geo-distance (further filter for customer profile like age, loan amount etc.)
- hh. Top K banks who have offered loans to customers in a given area (further ass filter like, loan amount more than X etc.)
14. Transaction
- TransactionId: Node (For MPR, UPI transfers)
Nullable | No |
---|---|
Properties | Amount, Payment Type, Transaction Type, PaymentTypeCategory, PayeePaymentAddress, MerchantCatCode, TransactionHr, TransactionDay, TransactionMonth, TransactionYear, TransactionDate |
Node Modifiability | Yes |
Property Modifiability | Yes |
Property Extensibility | Yes |
Note: Properties differ a little in the case of MPR (How?)
- PAYS_FOR: Relation between CustRefID and TransactionId
- PAYS_TO: Relation between TransactionId and Operator
Properties | Modifiability | Extensibility |
Yes | Yes | Yes |
data:image/s3,"s3://crabby-images/4e823/4e823eb9517eb39e44414852d4d569c5b147b0dc" alt="image Analysis"
Graph Diagram
data:image/s3,"s3://crabby-images/3c171/3c171982aae0b7bcc1b4227e4119aefeb19dba7b" alt="image Analysis"