In version 21.4, the chemaxon.naming.DocumentExtractor class was removed.
This guide helps you migrate to its alternatives, MolImporter and DocumentToStructure.
Instead of calling the constructors of DocumentExtractor or the readPDF method, call the process method of DocumentToStructure to receive a MolImporter instance:
String text = ...; // Input text.
String params = ...; // D2S options. Optional parameter.
try (MolImporter importer = DocumentToStructure.process(text, params)) {
    // ...
}
Alternatively, for file input, use the constructor of MolImporter:
File file = ...;
String format = ...; // D2S format and options. Optional parameter.
String encoding = ...; // Character encoding. Optional parameter.
try (MolImporter importer = new MolImporter(file, format, encoding)) {
    // ...
}
When using the constructor of MolImporter, the format must be specified as d2s, or as d2s: followed by the required format options. If the format is omitted entirely, it is automatically detected based on the type of the file.
The constructors of DocumentExtractor that received a URL or URLConnection parameter have no counterpart on MolImporter or DocumentToStructure. In these cases, the input must be converted to one of the applicable input types.
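One possible way to convert a URL into an applicable input type is to copy its content into a temporary file first. The sketch below uses only the Java standard library. The UrlToFile class and its download method are hypothetical names, not part of the ChemAxon API, and keeping the file suffix is an assumption made in case format detection relies on the file name.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper (not part of the ChemAxon API) that stores the content of a URL in a local file. */
final class UrlToFile {

    private UrlToFile() {
    }

    /**
     * Copies the content behind the URL into a new temporary file.
     * The suffix (for example ".pdf") is kept in the file name; whether
     * format detection uses it is an assumption.
     */
    static File download(URL url, String suffix) throws IOException {
        Path target = Files.createTempFile("d2s-input-", suffix);
        try (InputStream in = url.openStream()) {
            // The temporary file already exists, so it must be overwritten.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target.toFile();
    }
}
```

The returned File can then be passed to the MolImporter constructor in place of the URL that DocumentExtractor used to accept.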
Instead of the configuration methods of DocumentExtractor, MolImporter has format options that can be passed at creation time, separated by commas.
setCasNumberLookup(boolean value) → +cas or -cas
acceptElements(boolean on) → +elements or -elements
acceptIons(boolean on) → +ions or -ions
acceptGroups(boolean on) → +groups or -groups
acceptGenericNames(boolean on) → +vernacular or -vernacular
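The mapping above can be wrapped in a small builder that turns boolean settings into a format string. This is a minimal sketch, assuming the options are appended after d2s: and separated by commas as described in this guide. The D2sFormat class is a hypothetical helper, not part of the ChemAxon API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper (not part of the ChemAxon API) that assembles a D2S format string. */
final class D2sFormat {

    // Keeps the options in the order in which they were first set.
    private final Map<String, Boolean> options = new LinkedHashMap<>();

    /** Records an on/off option such as "cas" or "ions"; the last call for a name wins. */
    D2sFormat set(String name, boolean enabled) {
        options.put(name, enabled);
        return this;
    }

    /** Returns "d2s" when no option is set, otherwise "d2s:" followed by comma-separated options. */
    String build() {
        if (options.isEmpty()) {
            return "d2s";
        }
        StringJoiner joiner = new StringJoiner(",", "d2s:", "");
        for (Map.Entry<String, Boolean> entry : options.entrySet()) {
            joiner.add((entry.getValue() ? "+" : "-") + entry.getKey());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Replaces setCasNumberLookup(true) and acceptIons(false).
        String format = new D2sFormat().set("cas", true).set("ions", false).build();
        System.out.println(format); // prints "d2s:+cas,-ions"
    }
}
```

The resulting string is meant to be used as the format argument of the MolImporter constructor.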
The processPlainText() and processHTML() methods of DocumentExtractor have no direct counterpart on MolImporter, as the results of MolImporter can be read immediately, and the content type is automatically detected.
The ProgressListener support of DocumentExtractor has been removed; MolImporter offers no alternative.
To collect the results in a list, similarly to getHits():
try (MolImporter importer = new MolImporter(file)) {
    List<Molecule> molecules = importer.getMolStream()
            .collect(Collectors.toList());
    // ...
}
The returned Molecules are the same objects that were previously stored in the structure field of the returned Hits. The information stored in the other fields of Hits is stored as properties in the Molecules:
hit.text → (String) mol.getPropertyObject(DocumentToStructure.SOURCE_TEXT)
hit.position → (Integer) mol.getPropertyObject(DocumentToStructure.CHARACTER)
hit.getPageNumber() → (Integer) mol.getPropertyObject(DocumentToStructure.PAGE)
hit.getAllPositions() → no alternative
hit.getPositionsString() → no alternative
Note that all properties can be null if the information is not provided for the current input type.
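Because each of these properties can be null, code that used to read Hit fields directly may benefit from a null-aware wrapper. The sketch below is one possible shape for it. HitInfo is a hypothetical class, not part of the ChemAxon API; the property lookup is passed in as a function so that the example compiles without the ChemAxon library, and the keys are plain parameters rather than the real constants.

```java
import java.util.function.Function;

/** Hypothetical value class (not part of the ChemAxon API) mirroring the fields of the old Hit. */
final class HitInfo {

    final String text;      // source text of the hit, may be null
    final Integer position; // character position, may be null
    final Integer page;     // page number, may be null

    HitInfo(String text, Integer position, Integer page) {
        this.text = text;
        this.position = position;
        this.page = page;
    }

    /**
     * Reads the three properties through the given lookup function.
     * With a real Molecule, the lookup would be mol::getPropertyObject and the keys
     * would be the DocumentToStructure constants listed above (assumed usage).
     */
    static HitInfo from(Function<String, Object> lookup,
                        String textKey, String positionKey, String pageKey) {
        return new HitInfo(
                (String) lookup.apply(textKey),
                (Integer) lookup.apply(positionKey),
                (Integer) lookup.apply(pageKey));
    }

    /** Formats the hit for logging, substituting "?" for missing values. */
    String describe() {
        return (text == null ? "?" : text)
                + " @" + (position == null ? "?" : position.toString())
                + " p" + (page == null ? "?" : page.toString());
    }
}
```

Passing the lookup as a function keeps the null handling in one place and lets the wrapper be exercised with a plain Map in unit tests.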
The main method of DocumentExtractor has no direct alternative, but its results can be reproduced with MolImporter and DocumentToStructure.