IE for BangDB [Information Extraction and Ontology]

Information Extraction - Goal

Given a log file or log line, a product description URL, or any related document, create an entity ontology graph using domain knowledge and store it for future retrieval and search.

Input – a URL or any other document in any format (PDF, DOC, email, etc.)
Output – a JSON string containing the ontology graph

Also perform sentiment analysis on the given text.
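
To make the output contract concrete, here is a minimal sketch of what a JSON ontology graph could look like. The schema (the "source", "nodes", and "edges" keys), the helper name build_ontology_graph, the example URL, and the sample entities are all assumptions made for illustration; the slides do not define the actual format.

```python
import json

def build_ontology_graph(source, entities, relations):
    """Assemble a JSON-serializable ontology graph (hypothetical schema).

    entities:  list of (name, entity_class) pairs
    relations: list of (subject, predicate, object) triples over entity names
    """
    # Each entity becomes a node with a numeric id.
    nodes = [{"id": i, "name": name, "class": cls}
             for i, (name, cls) in enumerate(entities)]
    # Map entity names to node ids so edges can reference nodes.
    index = {node["name"]: node["id"] for node in nodes}
    edges = [{"from": index[s], "to": index[o], "label": p}
             for s, p, o in relations]
    return {"source": source, "nodes": nodes, "edges": edges}

# Sample data is invented purely for illustration.
graph = build_ontology_graph(
    source="http://example.com/product/123",
    entities=[("Acme Phone X", "Product"), ("Acme", "Manufacturer")],
    relations=[("Acme Phone X", "madeBy", "Acme")],
)
ontology_json = json.dumps(graph, indent=2)
print(ontology_json)
```

Keeping nodes and edges in separate lists, with edges referring to nodes by id, is one common way to serialize a graph; a nested or triple-based layout would serve equally well.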

Information Extraction - High Level Items

Semantic entity extraction from unstructured or semi-structured texts
Linking of entities for annotation & ontology evolution
Train the knowledge base – domain-specific [entities, classes, etc.]
For generic text, leverage publicly available knowledge bases [Freebase, DBpedia, etc.]
Dual algorithm model for better ontology results [CRF and SSVM]
Training and ontology creation infrastructure – should be available as a service, hosted on-premises or in the cloud
Various miscellaneous items – discussed in detail in the following slides
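
The first two items above (entity extraction and entity linking) can be sketched in a few lines. This sketch deliberately swaps in a much simpler technique than the CRF and SSVM models named above: a dictionary lookup against a toy knowledge base, with entities linked when they co-occur in the same sentence. The KNOWLEDGE_BASE contents, the function names, the "coOccursWith" label, and the sample log text are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical domain knowledge base: lowercase surface form -> entity class.
KNOWLEDGE_BASE = {
    "nginx": "Service",
    "timeout": "ErrorType",
    "db01": "Host",
}

def extract_entities(sentence):
    """Return (token, class) pairs for tokens found in the knowledge base, in order of first appearance."""
    found = []
    for token in re.findall(r"[A-Za-z0-9]+", sentence):
        cls = KNOWLEDGE_BASE.get(token.lower())
        if cls and (token, cls) not in found:
            found.append((token, cls))
    return found

def link_entities(text):
    """Link every pair of entities that co-occur in the same sentence."""
    links = []
    for sentence in re.split(r"[.!?]\s*", text):
        entities = extract_entities(sentence)
        for (a, _), (b, _) in combinations(entities, 2):
            links.append((a, "coOccursWith", b))
    return links

# Invented sample log text.
text = "nginx reported a timeout on db01. Retry succeeded."
print(extract_entities(text))
print(link_entities(text))
```

A trained sequence model would replace extract_entities, since a dictionary lookup cannot recognize entities it has never seen; the linking step and the downstream graph construction could stay the same.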

Information Extraction - Design