Package es.uvigo.esei.sing.textproc.step
Class AbstractProcessingStep
java.lang.Object
es.uvigo.esei.sing.textproc.step.AbstractProcessingStep
- All Implemented Interfaces:
es.uvigo.esei.sing.textproc.step.internal.ProcessingStepInterface
- Direct Known Subclasses:
AbstractTppProcessingStep
public abstract class AbstractProcessingStep extends java.lang.Object implements es.uvigo.esei.sing.textproc.step.internal.ProcessingStepInterface
Contains parameter validation logic common to processing steps, reducing the effort needed to implement the ProcessingStepInterface interface and ensuring all processing steps behave in a consistent manner.
- Author:
- Alejandro González García
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class
AbstractProcessingStep.NullProcessingConsumer<T>
A processing consumer that does nothing.
static interface
AbstractProcessingStep.ProcessingConsumer<T>
A consumer of data to be processed, which can throw a checked ProcessingException.
-
Field Summary
Fields
Modifier and Type | Field | Description
protected static java.lang.String
BATCH_SIZE_STEP_PARAMETER_NAME
protected static java.lang.String
DATA_ACCESS_EXCEPTION_MESSAGE
protected static java.lang.String
DEFAULT_BATCH_SIZE_STEP_PARAMETER
The default batch size.
protected static java.lang.String
DEFAULT_PAGE_SIZE_STEP_PARAMETER
The default page size.
protected static java.lang.String
DEFAULT_PRIMARY_KEY_COLUMN_PROCESSING_STEP_PARAMETER
protected static java.lang.String
DEFAULT_TEXT_COLUMN_PROCESSING_STEP_PARAMETER
protected static java.lang.String
DEFAULT_TITLE_COLUMN_PROCESSING_STEP_PARAMETER
protected java.util.List<java.util.function.Supplier<java.lang.Long>>
numberOfUnprocessedEntitiesProviders
Suppliers that count how many unprocessed entities of a type there are.
protected static java.lang.String
PAGE_SIZE_STEP_PARAMETER_NAME
protected static java.lang.String
PRIMARY_KEY_COLUMN_PROCESSING_STEP_PARAMETER_NAME
protected static java.lang.String
TEXT_COLUMN_PROCESSING_STEP_PARAMETER_NAME
protected static java.lang.String
TEXT_DOCUMENT_TABLE_NAME_PROCESSING_STEP_PARAMETER_NAME
protected static java.lang.String
TEXT_DOCUMENT_WITH_TITLE_TABLE_NAME_PROCESSING_STEP_PARAMETER_NAME
protected static java.lang.String
TITLE_COLUMN_PROCESSING_STEP_PARAMETER_NAME
protected java.util.List<java.lang.String[]>
unprocessedDocumentsAttributes
The non-primary-key attribute names of all unprocessed document types, in the same order as unprocessedDocumentTypesNames.
protected java.util.List<java.util.function.Supplier<? extends javax.persistence.Query>>
unprocessedDocumentsQuerySuppliers
The native query suppliers for all the known types of unprocessed documents, in the same order as unprocessedDocumentTypesNames.
protected java.util.List<java.lang.String>
unprocessedDocumentTypesNames
The names of all the unprocessed document types.
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected
AbstractProcessingStep(@NonNull java.util.Map<java.lang.String,java.util.function.Predicate<java.lang.String>> validationPredicates, @NonNull java.util.Set<java.lang.String> requiredParameters)
Constructs a new abstract processing step, with the given parameter validation predicates and required parameters.
-
Method Summary
Modifier and Type | Method | Description
protected java.lang.String
buildUnprocessedDocumentSelectStatement()
Constructs the SELECT SQL statement for retrieving unprocessed text documents without title from a database, using native queries.
protected java.lang.String
buildUnprocessedDocumentWithTitleSelectStatement()
Constructs the SELECT SQL statement for retrieving unprocessed text documents with title from a database, using native queries.
protected <T extends ProcessedDocument> void
deleteAllProcessedDocumentsOfType(@NonNull java.lang.Class<T> documentType)
Deletes all the processed documents of a given type from the database.
void
execute(@NonNull java.util.Map<java.lang.String,java.lang.String> parameters)
Executes the processing step implemented by this object, with the given parameters.
protected void
forEachDocumentInNativeQuery(@NonNull java.util.function.Supplier<? extends javax.persistence.Query> querySupplier, @NonNull java.lang.String taskName, long numberOfDocuments, @NonNull AbstractProcessingStep.ProcessingConsumer<java.util.List<java.lang.String[]>> action, java.lang.Runnable pageEndAction)
Executes the given action for each batch of documents retrieved by a native JPA query.
protected java.util.Map<java.lang.String,java.lang.String>
getParameters()
Returns the parameters provided by the user for this step.
protected long
getUnprocessedDocuments()
Returns the number of unprocessed text documents without title in the database.
protected long
getUnprocessedDocumentsWithTitle()
Returns the number of unprocessed text documents with title in the database.
protected abstract void
run()
Executes the processing step implemented by this object.
protected void
saveProcessedDocument(@NonNull java.lang.Class<? extends ProcessedDocument> documentType, int primaryKey, @NonNull java.util.Map<java.lang.String,java.lang.String> processedAttributes)
Stores a processed document in the database, from its processed attributes.
-
Field Details
-
PAGE_SIZE_STEP_PARAMETER_NAME
-
BATCH_SIZE_STEP_PARAMETER_NAME
-
TEXT_DOCUMENT_WITH_TITLE_TABLE_NAME_PROCESSING_STEP_PARAMETER_NAME
protected static final java.lang.String TEXT_DOCUMENT_WITH_TITLE_TABLE_NAME_PROCESSING_STEP_PARAMETER_NAME
-
TEXT_DOCUMENT_TABLE_NAME_PROCESSING_STEP_PARAMETER_NAME
-
PRIMARY_KEY_COLUMN_PROCESSING_STEP_PARAMETER_NAME
-
TEXT_COLUMN_PROCESSING_STEP_PARAMETER_NAME
-
TITLE_COLUMN_PROCESSING_STEP_PARAMETER_NAME
-
DEFAULT_PAGE_SIZE_STEP_PARAMETER
The default page size. Increase it for better performance until memory usage, database commit performance or transaction commit frequency become an issue. Ideally, the page size should be a multiple of the batch size.
- See Also:
- Constant Field Values
-
DEFAULT_BATCH_SIZE_STEP_PARAMETER
The default batch size. The documents in a page are divided into batches of this many documents, as far as possible. The documents in a batch are processed together, in the same thread.
- See Also:
- Constant Field Values
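The relationship between page size and batch size can be illustrated with a minimal sketch. It is not the library's implementation: the class and method names (BatchSplitSketch, splitIntoBatches) are hypothetical, and it only assumes that a page is cut into consecutive batches. When the page size is not a multiple of the batch size, the last batch is smaller than the others, which is why a multiple is the ideal choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class BatchSplitSketch {
    /** Splits a page into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> splitIntoBatches(List<T> page, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < page.size(); start += batchSize) {
            int end = Math.min(start + batchSize, page.size());
            // Copy the slice so that every batch is an independent, unmodifiable list
            batches.add(List.copyOf(page.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> page = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            page.add(i);
        }
        // A page of 10 documents with a batch size of 4 yields batches of 4, 4 and 2 documents
        for (List<Integer> batch : splitIntoBatches(page, 4)) {
            System.out.println(batch.size() + " documents: " + batch);
        }
    }
}
```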
-
DEFAULT_PRIMARY_KEY_COLUMN_PROCESSING_STEP_PARAMETER
- See Also:
- Constant Field Values
-
DEFAULT_TEXT_COLUMN_PROCESSING_STEP_PARAMETER
- See Also:
- Constant Field Values
-
DEFAULT_TITLE_COLUMN_PROCESSING_STEP_PARAMETER
- See Also:
- Constant Field Values
-
DATA_ACCESS_EXCEPTION_MESSAGE
- See Also:
- Constant Field Values
-
unprocessedDocumentTypesNames
The names of all the unprocessed document types.
-
unprocessedDocumentsQuerySuppliers
protected final java.util.List<java.util.function.Supplier<? extends javax.persistence.Query>> unprocessedDocumentsQuerySuppliers
The native query suppliers for all the known types of unprocessed documents, in the same order as unprocessedDocumentTypesNames.
-
unprocessedDocumentsAttributes
The non-primary-key attribute names of all unprocessed document types, in the same order as unprocessedDocumentTypesNames.
-
numberOfUnprocessedEntitiesProviders
protected final java.util.List<java.util.function.Supplier<java.lang.Long>> numberOfUnprocessedEntitiesProviders
Suppliers that count how many unprocessed entities of a type there are. The list is in the same order as unprocessedDocumentTypesNames.
-
-
Constructor Details
-
AbstractProcessingStep
protected AbstractProcessingStep(@NonNull java.util.Map<java.lang.String,java.util.function.Predicate<java.lang.String>> validationPredicates, @NonNull java.util.Set<java.lang.String> requiredParameters)
Constructs a new abstract processing step, with the given parameter validation predicates and required parameters. Common validation parameters will be added automatically.
- Parameters:
validationPredicates - The validation predicates to use to validate the parameters, including optional ones.
requiredParameters - A set of parameter names whose presence is required.
- Throws:
java.lang.IllegalArgumentException - If any parameter is null.
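How a map of validation predicates and a set of required parameters can work together is sketched below. This is an illustration under stated assumptions, not the class's actual validation code: ParameterValidationSketch, validate and isPositiveInteger are hypothetical names, the parameter names "pageSize" and "language" are invented for the example, and accepting parameters that have no predicate is a choice made only for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in that mimics the two constructor arguments.
final class ParameterValidationSketch {
    /** Returns a list of problems found; an empty list means the parameters are valid. */
    static List<String> validate(
        Map<String, Predicate<String>> validationPredicates,
        Set<String> requiredParameters,
        Map<String, String> parameters
    ) {
        List<String> problems = new ArrayList<>();
        // Every required parameter must be present
        for (String required : requiredParameters) {
            if (!parameters.containsKey(required)) {
                problems.add("Missing required parameter: " + required);
            }
        }
        // Every supplied parameter with a known predicate must satisfy it
        for (Map.Entry<String, String> entry : parameters.entrySet()) {
            Predicate<String> predicate = validationPredicates.get(entry.getKey());
            // Assumption of this sketch: parameters without a predicate are accepted as-is
            if (predicate != null && !predicate.test(entry.getValue())) {
                problems.add("Invalid value for parameter: " + entry.getKey());
            }
        }
        return problems;
    }

    /** Example predicate: accepts only strings that parse as integers greater than zero. */
    static boolean isPositiveInteger(String value) {
        try {
            return value != null && Integer.parseInt(value) > 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Predicate<String>> predicates = new HashMap<>();
        predicates.put("pageSize", ParameterValidationSketch::isPositiveInteger);

        Map<String, String> parameters = new HashMap<>();
        parameters.put("pageSize", "abc");

        // "language" is required but absent, and "pageSize" is not a positive integer
        System.out.println(validate(predicates, Set.of("language"), parameters));
    }
}
```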
-
-
Method Details
-
execute
public final void execute(@NonNull java.util.Map<java.lang.String,java.lang.String> parameters) throws ProcessingException
Executes the processing step implemented by this object, with the given parameters. No guarantees are made about whether a JPA transaction is already open when this method is invoked.
- Specified by:
execute in interface es.uvigo.esei.sing.textproc.step.internal.ProcessingStepInterface
- Parameters:
parameters - An unmodifiable, non-null map of non-null keys which contains all parameters specified by the user. The values may be null or not be appropriate for this step.
- Throws:
ProcessingException - If an exception occurs during execution (including if parameters is null).
-
getParameters
Returns the parameters provided by the user for this step.
- Returns:
- An unmodifiable, non-null map with the step parameters, where the keys are the parameter names.
-
forEachDocumentInNativeQuery
protected final void forEachDocumentInNativeQuery(@NonNull java.util.function.Supplier<? extends javax.persistence.Query> querySupplier, @NonNull java.lang.String taskName, long numberOfDocuments, @NonNull AbstractProcessingStep.ProcessingConsumer<java.util.List<java.lang.String[]>> action, java.lang.Runnable pageEndAction) throws ProcessingException
Executes the given action for each batch of documents retrieved by a native JPA query. For the purposes of this method, native JPA queries are to be used when the queried table doesn't have a configured mapping entity. Therefore, it returns the values of the queried columns as strings, without performing any relational to object mapping beyond that.
To maximize performance, the processing action may be executed in any thread, so its implementation must be thread-safe if it accesses shared state.
This method assumes a transaction is already active.
- Parameters:
querySupplier - A query supplier, which must return an appropriate, non-null native query object when invoked. This allows recreating the query object when needed.
taskName - The name of the task that will be performed with the documents. It will be shown to the user.
numberOfDocuments - The total number of documents that will be processed by the action. It must be zero or greater.
action - The action to execute for every batch of documents. A batch contains at least one document. The list supplied to the consumer is not modifiable.
pageEndAction - The action to execute after a document page is processed, if processing is successful. It may be null, in which case nothing is done. In any case, whether processing is successful or not, any database transactions made by calling methods of this class are committed or rolled back before invoking this action.
- Throws:
ProcessingException - If any parameter is invalid, or an exception occurred during the processing.
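The thread-safety requirement on the action can be made concrete with a self-contained sketch. Everything in it is an assumption for illustration: BatchConsumer is a hypothetical stand-in for AbstractProcessingStep.ProcessingConsumer, forEachBatch imitates batches being handed to arbitrary threads with a plain thread pool, and each String[] is assumed to hold a primary key at index 0 and the document text at index 1. The point is only that state shared between invocations (here, a running total) should live in a thread-safe type such as LongAdder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

final class ThreadSafeActionSketch {
    /** Hypothetical stand-in for a consumer that may throw a checked exception. */
    @FunctionalInterface
    interface BatchConsumer<T> {
        void accept(T batch) throws Exception;
    }

    /** Runs the action once per batch on a thread pool and waits for all batches. */
    static void forEachBatch(List<List<String[]>> batches, BatchConsumer<List<String[]>> action) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<String[]> batch : batches) {
                // Returning null makes the lambda a Callable, which may throw checked exceptions
                pending.add(pool.submit(() -> {
                    action.accept(batch);
                    return null;
                }));
            }
            for (Future<?> future : pending) {
                future.get(); // Failures of the action surface here as ExecutionException
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A batch action failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Counts the characters of all document texts using a thread-safe accumulator. */
    static long countCharacters(List<List<String[]>> batches) {
        LongAdder totalCharacters = new LongAdder();
        forEachBatch(batches, batch -> {
            for (String[] document : batch) {
                // Assumed layout: index 0 is the primary key, index 1 is the text
                totalCharacters.add(document[1].length());
            }
        });
        return totalCharacters.sum();
    }

    public static void main(String[] args) {
        List<List<String[]>> batches = List.of(
            List.of(new String[] {"1", "first text"}, new String[] {"2", "second text"}),
            List.of(new String[] {"3", "third"}, new String[] {"4", "fourth"})
        );
        System.out.println("Total characters: " + countCharacters(batches));
    }
}
```

A plain long field incremented from the action would be subject to lost updates under the same conditions, which is the kind of bug the thread-safety note warns about.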
-
saveProcessedDocument
protected final void saveProcessedDocument(@NonNull java.lang.Class<? extends ProcessedDocument> documentType, int primaryKey, @NonNull java.util.Map<java.lang.String,java.lang.String> processedAttributes) throws ProcessingException
Stores a processed document in the database, from its processed attributes. This method starts and commits or rolls back a JPA transaction, if no transaction is already active.
- Parameters:
documentType - The type of document that is being processed, and will be stored. It will be instantiated via reflection, so the module containing the type definition must open its package for deep reflection to the module containing this code.
primaryKey - The primary key of the processed document.
processedAttributes - The processed attributes of the document. Their names (keys) must match the attributes of the concrete document type.
- Throws:
ProcessingException - If some error occurs during the operation.
java.lang.IllegalArgumentException - If any parameter is null.
javax.persistence.PersistenceException - If some data access error occurs.
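Why the attribute names must match and why the package must be opened for deep reflection can be seen in a rough sketch of reflective instantiation. This is not the method's real implementation, which is not shown in this documentation: ReflectiveDocumentSketch, TokenizedDocument and the field names "id" and "text" are all hypothetical, and storing the primary key in a field called "id" is an assumption of this example only. In a modular build, the module declaring the document type would typically need an opens directive for its package in module-info.java for the setAccessible calls to succeed.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

final class ReflectiveDocumentSketch {
    /** Hypothetical processed document type; not part of the documented API. */
    static final class TokenizedDocument {
        private int id;
        private String text;

        int getId() {
            return id;
        }

        String getText() {
            return text;
        }
    }

    /** Creates a document reflectively and fills its fields from the attribute map. */
    static <T> T instantiate(Class<T> documentType, int primaryKey, Map<String, String> processedAttributes) {
        try {
            Constructor<T> constructor = documentType.getDeclaredConstructor();
            constructor.setAccessible(true); // Deep reflection: needs an open package across modules
            T document = constructor.newInstance();
            // Assumption of this sketch: the primary key lives in a field named "id"
            setField(document, "id", primaryKey);
            for (Map.Entry<String, String> attribute : processedAttributes.entrySet()) {
                // Fails with NoSuchFieldException if a key does not match a field name
                setField(document, attribute.getKey(), attribute.getValue());
            }
            return document;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not build " + documentType.getName(), e);
        }
    }

    private static void setField(Object target, String name, Object value) throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(name);
        field.setAccessible(true);
        field.set(target, value);
    }

    public static void main(String[] args) {
        TokenizedDocument document = instantiate(TokenizedDocument.class, 7, Map.of("text", "hello world"));
        System.out.println(document.getId() + ": " + document.getText());
    }
}
```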
-
buildUnprocessedDocumentWithTitleSelectStatement
Constructs the SELECT SQL statement for retrieving unprocessed text documents with title from a database, using native queries.
- Returns:
- The described statement.
-
buildUnprocessedDocumentSelectStatement
Constructs the SELECT SQL statement for retrieving unprocessed text documents without title from a database, using native queries.
- Returns:
- The described statement.
-
getUnprocessedDocumentsWithTitle
Returns the number of unprocessed text documents with title in the database. This method assumes a transaction is already active.
- Returns:
- The described number.
- Throws:
javax.persistence.PersistenceException
- If some error occurs while executing SQL statements in the database.
-
getUnprocessedDocuments
Returns the number of unprocessed text documents without title in the database. This method assumes a transaction is already active.
- Returns:
- The described number.
- Throws:
javax.persistence.PersistenceException
- If some error occurs while executing SQL statements in the database.
-
deleteAllProcessedDocumentsOfType
protected final <T extends ProcessedDocument> void deleteAllProcessedDocumentsOfType(@NonNull java.lang.Class<T> documentType)
Deletes all the processed documents of a given type from the database.
- Type Parameters:
T - The type of documents to delete.
- Parameters:
documentType - The type of documents to delete.
- Throws:
java.lang.IllegalArgumentException - If documentType is null.
javax.persistence.PersistenceException - If some error occurs while executing SQL statements in the database.
-
run
Executes the processing step implemented by this object. The processing step parameters are already validated and available through getParameters(). This method is invoked in the context of a JPA transaction that is started and committed or rolled back automatically.
- Throws:
ProcessingException
- If an exception occurs during execution.
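The contract between execute and run follows the template method pattern, which the sketch below imitates with stand-in types. It is only an approximation of the documented behaviour: Step and UppercaseStep are hypothetical, the transaction is replaced by a list of log entries, validation is omitted, and the parameter name "text" is invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class TemplateMethodSketch {
    /** Hypothetical stand-in for the documented base class. */
    abstract static class Step {
        private Map<String, String> parameters = Collections.emptyMap();
        /** Records what a real implementation would do with a JPA transaction. */
        final List<String> transactionLog = new ArrayList<>();

        /** Fixed skeleton: store parameters, open a transaction, delegate to run(). */
        final void execute(Map<String, String> userParameters) {
            // Defensive unmodifiable copy; a HashMap is used because values may be null
            this.parameters = Collections.unmodifiableMap(new HashMap<>(userParameters));
            transactionLog.add("begin");
            try {
                run();
                transactionLog.add("commit");
            } catch (RuntimeException e) {
                transactionLog.add("rollback");
                throw e;
            }
        }

        protected final Map<String, String> getParameters() {
            return parameters;
        }

        /** The only method a concrete step has to implement. */
        protected abstract void run();
    }

    /** Hypothetical concrete step that upper-cases the "text" parameter. */
    static final class UppercaseStep extends Step {
        String result;

        @Override
        protected void run() {
            result = getParameters().get("text").toUpperCase(Locale.ROOT);
        }
    }

    public static void main(String[] args) {
        UppercaseStep step = new UppercaseStep();
        step.execute(Map.of("text", "abc"));
        System.out.println(step.result + " " + step.transactionLog);
    }
}
```

Keeping execute final and run abstract is what lets the base class guarantee the same validation and transaction handling for every subclass.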
-