LaVA™

Statistical tool for extracting concepts and patterns through natural language processing

LaVA is Aptima's platform for text analytics and data mining, with a focus on unsupervised topic extraction. Topics are statistical relationships between documents and terms, and they can represent the conceptual meaning, or "gist," of free-text documents in any language. LaVA provides a complete pipeline for analyzing text data. It starts from text stored in a database and supports a variety of preprocessing procedures for that text, including statistical multiword term extraction. LaVA then trains a latent variable model that produces topics relating the terms and documents. The trained models are accessible via web services, which can be used to compute document similarity, search documents, and build time series of topic activity.

In addition, once LaVA has turned the words into numbers, many other analyses are possible. For example, Aptima has put LaVA "inside" other code to analyze:

  • the temporal patterns in newspaper articles to forecast the likelihood of civil unrest
  • the documents that employees read (e.g., scientific journal articles), in order to suggest who should collaborate with whom
  • the flow of ideas (electronic memes) between newspapers and blogs in Pakistan
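
As a rough illustration of the pipeline described above (text to term counts, then a latent variable model, then document similarity from topic mixtures), here is a minimal Python sketch. It uses probabilistic latent semantic analysis (PLSA) fit by EM purely as a stand-in, since this page does not say which model LaVA trains. The function names, the toy corpus, and the document-seeded initialization are all illustrative assumptions and are not part of LaVA or its web services.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def train_plsa(docs, n_topics, n_iter=50, eps=1e-3):
    """Fit PLSA by EM; assumes every document is non-empty.

    Returns (vocab, p_w_z, p_z_d), where p_w_z[k][w] is P(word | topic k)
    and p_z_d[d][k] is P(topic k | document d).
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted({w for c in counts for w in c})

    # Assumption: deterministic init that seeds topic k from an evenly
    # spaced document, chosen so this toy example is reproducible.
    # Random initialization with restarts is the more usual choice.
    step = max(1, len(docs) // n_topics)
    p_w_z = []
    for k in range(n_topics):
        seed = counts[min(k * step, len(docs) - 1)]
        raw = {w: seed.get(w, 0) + eps for w in vocab}
        total = sum(raw.values())
        p_w_z.append({w: v / total for w, v in raw.items()})
    p_z_d = [[1.0 / n_topics] * n_topics for _ in docs]

    for _ in range(n_iter):
        new_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in docs]
        for d, c in enumerate(counts):
            for w, n in c.items():
                # E-step: share of each topic in the (document, word) pair.
                joint = [p_w_z[k][w] * p_z_d[d][k] for k in range(n_topics)]
                norm = sum(joint)
                for k in range(n_topics):
                    r = n * joint[k] / norm
                    new_w_z[k][w] += r
                    new_z_d[d][k] += r
        # M-step: renormalize the expected counts into probabilities.
        for k in range(n_topics):
            total = sum(new_w_z[k].values())
            p_w_z[k] = {w: v / total for w, v in new_w_z[k].items()}
        for d in range(len(docs)):
            total = sum(new_z_d[d])
            p_z_d[d] = [v / total for v in new_z_d[d]]
    return vocab, p_w_z, p_z_d


def cosine(u, v):
    """Cosine similarity between two topic-mixture vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: two documents about unrest, two about biology.
docs = [
    "protest crowd strike protest march",
    "strike march crowd union protest",
    "enzyme protein cell enzyme assay",
    "protein assay cell genome enzyme",
]
vocab, p_w_z, p_z_d = train_plsa(docs, n_topics=2)

# Documents are compared through their topic mixtures, not raw words.
same_theme = cosine(p_z_d[0], p_z_d[1])
cross_theme = cosine(p_z_d[0], p_z_d[2])
print(f"same theme: {same_theme:.3f}, cross theme: {cross_theme:.3f}")
```

In this sketch each document is reduced to a short vector of topic weights, which is the sense in which the words are "turned into numbers." Averaging those vectors over documents grouped by date would give one simple form of topic-activity time series, again as an assumption about how such a series might be built and not a description of LaVA's own method.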

LaVA has been developed by Aptima under a number of SBIR projects and is subject to SBIR data rights.