IE for BangDB [Information Extraction and Ontology]
Information Extraction - Goal
Given a log file or log line, a product description URL, or any related document, create an entity ontology graph using domain knowledge and store it for future retrieval and search.
- Input – URL or any other doc in any format (pdf, doc, mail, etc.)
- Output – JSON string containing ontology graph
Also perform sentiment analysis on the given text.
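As a rough illustration of the output described above, the sketch below serializes a small graph to a JSON string. The node/edge layout, the field names (`id`, `class`, `source`, `relation`, `target`), the helper `build_ontology_graph`, and the sample entities are all assumptions made for illustration; the slides do not specify the actual schema.

```python
import json

def build_ontology_graph(entities, relations):
    """Assemble a node/edge dictionary from extracted entities and relations.

    The layout used here is a hypothetical schema, not one defined by the source.
    """
    nodes = [{"id": name, "class": cls} for name, cls in entities]
    edges = [{"source": s, "relation": r, "target": t} for s, r, t in relations]
    return {"nodes": nodes, "edges": edges}

# Hypothetical entities and relations, as if extracted from a single log line.
entities = [("web-01", "Server"), ("disk full", "Error")]
relations = [("web-01", "reported", "disk full")]

graph = build_ontology_graph(entities, relations)
graph_json = json.dumps(graph, indent=2)  # the JSON string handed back to the caller
print(graph_json)
```

A flat list of nodes and edges is only one possible encoding; a nested, class-first layout would serve equally well if retrieval is mostly by entity class.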
Information Extraction - High Level Items
- Semantic entity extraction from unstructured or semi-structured texts
- Linking of entities for annotation & ontology evolution
- Train the knowledge base – domain specific [entities, classes etc.]
- For generic text, leverage publicly available knowledge bases [Freebase, DBpedia etc.]
- Dual algorithm model for better ontology results [CRF and SSVM]
- Training and ontology creation infrastructure – should be available as a service, hosted on-premises or in the cloud
- Various miscellaneous items – discussed in detail in the following slides
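The first two items above (entity extraction and entity linking) can be pictured with a toy sketch. It uses plain dictionary lookup against a small, hypothetical domain knowledge base and links neighbouring entities with a generic edge. This is a deliberately simple stand-in, not the CRF/SSVM models named in the list, and `KNOWLEDGE_BASE`, `extract_entities`, `link_entities`, and the `co_occurs_with` relation are all assumed names.

```python
import re

# Hypothetical domain knowledge base: surface form -> entity class.
KNOWLEDGE_BASE = {
    "web-01": "Server",
    "disk full": "Error",
    "nginx": "Service",
}

def extract_entities(text, kb=KNOWLEDGE_BASE):
    """Return (surface form, class, start offset) for every KB term found in text."""
    found = []
    lowered = text.lower()
    for term, cls in kb.items():
        for match in re.finditer(re.escape(term), lowered):
            found.append((term, cls, match.start()))
    # Order by position so that later linking follows the text order.
    return sorted(found, key=lambda item: item[2])

def link_entities(entities):
    """Link each entity to the next one in the text with a generic co-occurrence edge."""
    return [(a[0], "co_occurs_with", b[0]) for a, b in zip(entities, entities[1:])]

line = "ERROR disk full on web-01 while restarting nginx"
entities = extract_entities(line)
links = link_entities(entities)
print(entities)
print(links)
```

A trained sequence model would replace `extract_entities`, and a real linker would assign typed relations rather than a single co-occurrence edge; the sketch only shows how the two stages hand data to each other.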
Information Extraction - Design
