Information Extraction - Goal

Given a log file or log line, a product description URL, or any related document, build an entity ontology graph using domain knowledge and store it for later retrieval and search.

  • Input – a URL or a document in any format (PDF, DOC, email, etc.)
  • Output – a JSON string containing the ontology graph
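The slides do not define the JSON schema for the ontology graph. The sketch below assumes one: entities are nodes carrying an ontology class, and relations are typed edges between entity ids. The class names, the field names, the `manufacturedBy` predicate and the example URL are all hypothetical and for illustration only.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Entity:
    id: str       # hypothetical stable identifier, e.g. "e1"
    label: str    # surface text of the entity as found in the input
    type: str     # ontology class, e.g. "Product" (assumed naming)


@dataclass
class Relation:
    source: str     # id of the subject entity
    predicate: str  # relation name, e.g. "manufacturedBy" (assumed naming)
    target: str     # id of the object entity


@dataclass
class OntologyGraph:
    source_uri: str  # URL or document the graph was extracted from
    entities: list[Entity] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities.append(entity)

    def add_relation(self, relation: Relation) -> None:
        # Reject edges that point at entities not yet in the graph.
        known = {e.id for e in self.entities}
        if relation.source not in known or relation.target not in known:
            raise ValueError(f"unknown entity in relation: {relation}")
        self.relations.append(relation)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses and lists to dicts.
        return json.dumps(asdict(self), sort_keys=True)


graph = OntologyGraph(source_uri="https://example.com/products/x100")
graph.add_entity(Entity("e1", "X100 Router", "Product"))
graph.add_entity(Entity("e2", "Acme Corp", "Organization"))
graph.add_relation(Relation("e1", "manufacturedBy", "e2"))
print(graph.to_json())
```

Keeping edges as id references rather than nested objects keeps the JSON flat, so the same string can be indexed for retrieval and later reloaded into a graph store.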

Also perform sentiment analysis on the given text.
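The slides do not say how sentiment is computed. As a minimal illustration of the input/output contract only, here is a lexicon-based scorer with simple negation handling. The lexicon, its weights and the function names are hypothetical, and a production system would more likely use a trained classifier.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight.
LEXICON = {
    "good": 1.0, "great": 2.0, "fast": 1.0, "reliable": 1.5,
    "bad": -1.0, "slow": -1.0, "error": -1.5, "failed": -2.0,
}
NEGATIONS = {"not", "no", "never"}


def sentiment_score(text: str) -> float:
    """Sum lexicon weights, flipping the sign of a word directly after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        score += weight
    return score


def sentiment_label(text: str) -> str:
    """Map the numeric score to positive / negative / neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("The router is fast and reliable"))  # positive
print(sentiment_label("Upload failed with error 503"))     # negative
print(sentiment_label("The service is not slow"))          # positive
```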

Information Extraction - High Level Items

  • Semantic entity extraction from unstructured or semi-structured text
  • Entity linking for annotation and ontology evolution
  • Train a domain-specific knowledge base [entities, classes, etc.]
  • For generic text, leverage publicly available knowledge bases [Freebase, DBpedia, etc.]
  • Dual-algorithm model for better ontology results [Conditional Random Fields (CRF) and Structured SVM (SSVM)]
  • Training and ontology-creation infrastructure – available as a service, hosted on-premises or in the cloud
  • Miscellaneous items – discussed in detail in the following slides
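The slides do not say how the CRF and SSVM models are combined. What the two share is their inference step. When either model is used as a linear-chain sequence labeller for entity extraction, it scores a tag sequence as a sum of per-token (emission) scores and tag-to-tag (transition) scores, and it picks the best sequence with Viterbi decoding. They differ in training: a CRF maximises conditional log-likelihood, while an SSVM minimises a max-margin (structured hinge) loss.

The sketch below shows only that shared decoding step. It assumes BIO tags, a single hypothetical entity class `PROD`, and hand-set weights and a toy feature function in place of learned parameters.

```python
LABELS = ["O", "B-PROD", "I-PROD"]

# Hypothetical hand-set weights standing in for learned CRF/SSVM parameters.
# TRANSITION[(prev, cur)] scores moving from tag prev to tag cur (default 0).
TRANSITION = {
    ("O", "I-PROD"): -10.0,     # in BIO tagging, I- should not follow O
    ("B-PROD", "I-PROD"): 2.0,
    ("I-PROD", "I-PROD"): 1.0,
}
START = {"I-PROD": -10.0}       # a sequence should not start inside an entity


def emission(token: str, label: str) -> float:
    """Toy feature function: capitalised or alphanumeric tokens look like product names."""
    looks_like_name = token[:1].isupper() or any(c.isdigit() for c in token)
    if label == "O":
        return 0.0 if looks_like_name else 1.0
    return 1.5 if looks_like_name else -1.0


def viterbi(tokens: list[str]) -> list[str]:
    """Return the highest-scoring tag sequence under emission + transition scores."""
    # best[i][y]: best total score of any tagging of tokens[:i+1] ending in tag y
    best = [{y: START.get(y, 0.0) + emission(tokens[0], y) for y in LABELS}]
    back: list[dict[str, str]] = []  # back[i-1][y]: best previous tag for tag y at i
    for tok in tokens[1:]:
        prev = best[-1]
        scores, ptrs = {}, {}
        for y in LABELS:
            p = max(LABELS, key=lambda yp: prev[yp] + TRANSITION.get((yp, y), 0.0))
            scores[y] = prev[p] + TRANSITION.get((p, y), 0.0) + emission(tok, y)
            ptrs[y] = p
        best.append(scores)
        back.append(ptrs)
    # Trace back from the best final tag.
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for ptrs in reversed(back):
        y = ptrs[y]
        path.append(y)
    return path[::-1]


def entity_spans(tokens: list[str], labels: list[str]) -> list[str]:
    """Group B-/I- tagged tokens into entity mention strings."""
    spans, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif lab.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = ["install", "Acme", "X100", "firmware"]
labels = viterbi(tokens)
print(labels)                        # ['O', 'B-PROD', 'I-PROD', 'O']
print(entity_spans(tokens, labels))  # ['Acme X100']
```

The extracted mentions would then go to the entity-linking step, for example a lookup against the domain knowledge base or a public one such as DBpedia.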

Information Extraction - Design

[Figure: Deep learning diagram]