Project Modules
This project has declared the following modules:
Name | Description |
---|---|
TextProcMain | The TextProc main and entity classes module. |
TextProcStep | The TextProc processing step API. |
TextProcLogging | The TextProc logging facilities. |
TextProcPersistence | The TextProc persistence access layer, using Java Persistence API. |
AbstractTppTextProcStep | This module provides a skeletal implementation of a TextProc processing step in the form of an abstract class, to keep code DRY and consistent between steps. |
TppTokenizationTextProcStep | Processing step for TextProc that tokenizes its input documents via Text Processing Python. |
TppStopwordFilteringTextProcStep | Processing step for TextProc that removes stopwords from the input documents, via Text Processing Python. |
TppLemmatizationTextProcStep | Processing step for TextProc that lemmatizes the space-separated tokens of its input documents. |
CoreNLPTokenizationTextProcStep | Processing step for TextProc that tokenizes its input documents via Stanford CoreNLP. |
CoreNLPLemmatizationTextProcStep | Processing step for TextProc that lemmatizes each token of its input documents via Stanford CoreNLP. |
CoreNLPEntityExtractionTextProcStep | Processing step for TextProc that extracts new named entities from documents, starting from seed sets of entities and using bootstrapped pattern-based learning. |
CoreNLPKnowledgeBasePopulationTextProcStep | Processing step for TextProc that populates a knowledge base stored in Apache Jena's TDB2 format, using the NER, OpenIE and sentiment annotation facilities included with CoreNLP. |
MentionFilteringTextProcStep | Processing step for TextProc that removes Reddit mentions from the input documents. |
EmptyFilteringTextProcStep | Processing step for TextProc that skips input documents that are empty of meaning, so they are not copied as processed documents. |
LuceneIndexTextProcStep | Processing step for TextProc that builds a Lucene index for the input documents. |
Apache Lucene (uber JAR) | Uber JAR packaging of Apache Lucene for use by TextProc, an automated text processing tool that efficiently and flexibly applies NLP to input documents in a relational database. |
EJML (uber JAR) | Uber JAR packaging of EJML (Efficient Java Matrix Library) for use by TextProc. |