Trainer Engine offers 3 interfaces:
The generic workflows are summarized in this page: Trainer workflows
To start model training or prediction, use the "New run" button.
Target name is used to tag or classify models. Target name can be typed in or selected from a pre-defined list of values. Target name is also used during automatic name generation.
Configurations are defined in JSON format, but comment-enriched HJSON is also accepted. It is recommended to check the example configuration files first. A detailed description is available on the Trainer configuration page.
Calculated descriptors field describes the standardization and feature generation.
Training parameter fields configures
If a split ratio is specified (split < 1), the input set is randomly split into training and test sets. The generated subsets are available in "Past uploads". Training is done on the training set, and the test set is subsequently predicted with the trained model. Accuracy statistics are automatically calculated on both the training and test sets.
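The random split described above can be pictured with a minimal Python sketch. It is an illustration only, not the Trainer Engine implementation: the function name, the seed parameter, and the reading of the split ratio as the fraction of records that goes to the training set are all assumptions.

```python
import random


def split_dataset(records, split, seed=None):
    """Randomly split records into (training, test) lists.

    Assumption: `split` is the fraction of records assigned to the
    training set; a value of 1 or more disables splitting.
    """
    if split >= 1:
        return list(records), []
    rng = random.Random(seed)      # seeded for reproducible illustration
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * split)
    return shuffled[:cut], shuffled[cut:]


train, test = split_dataset(range(10), split=0.8, seed=42)
print(len(train), len(test))  # 8 2
```

Every record ends up in exactly one of the two subsets, which mirrors the training and test files that appear in "Past uploads".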
Auto-generated name includes target name, time stamp, algorithm and model type.
In a prediction run, an existing model is executed on the input set. The Observed data field is optional in this case. If observed data is present in the input and selected, the corresponding accuracy statistics are calculated automatically.
Job life cycle
The Runs page provides a browser over the previous runs.
Example: filtering Runs table for "prediction" runs.
An underlined run name links to the corresponding details page.
The run detail page summarizes all the data related to a job.
Cases selected on the Runs page are available on the Analyze page.
The Analyze page is a flexible and configurable view that supports visualization, comparison and assessment of model details and accuracy measures.
Both data point level and run level data can be configured as axes. Data points are the individual molecules; run level data is associated with the run (e.g. hyperparameters, dataset size, accuracy statistics).
Zoom, pan and selection are available on scatterplots.
Selection on the scatter plots activates filtering.
Clicking on a symbol on a data point level scatter plot activates the Highlighted structure display.
Trainer REST SWAGGER documentation is available at /swagger/swagger.html
SDF files are sent using a multipart POST request. These resources can be used to train a model or run a prediction.
Docs: /swagger/swagger.html#/rawfilesResource/upload_1
POST: /rest/rawfiles
The response contains information on the uploaded file:
{
  "id": {runid},
  "size": "",
  "fileName": "",
  "uploadTimestampInMs": "",
  "scrutinizeResult": {
    "recordCount": "",
    "fieldNames": ""
  }
}
To start training or test prediction, the raw input id is required.
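As a hedged illustration of the upload step, the sketch below builds the multipart POST request with the Python standard library only. The base URL, the function name and the form part name ("file") are assumptions; the actual part name should be checked on the Swagger page.

```python
import uuid
import urllib.request


def build_upload_request(base_url, file_name, content, field_name="file"):
    """Build a multipart/form-data POST request for /rest/rawfiles.

    Assumption: the server reads the SDF from a part called `field_name`.
    """
    boundary = uuid.uuid4().hex
    disposition = (
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{file_name}"'
    )
    body = b"\r\n".join([
        b"--" + boundary.encode(),
        disposition.encode(),
        b"Content-Type: application/octet-stream",
        b"",                                   # blank line ends the part headers
        content,
        b"--" + boundary.encode() + b"--",     # closing boundary
        b"",
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + "/rest/rawfiles",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Usage (hypothetical host; performs a network call):
# import json
# with open("molecules.sdf", "rb") as sdf:
#     request = build_upload_request("http://localhost:8080", "molecules.sdf", sdf.read())
# with urllib.request.urlopen(request) as response:
#     raw_input_id = json.load(response)["id"]
```

The "id" field of the response is the value to keep, since the later training and prediction requests refer to the uploaded file by it.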
To train a new model, an uploaded input file (raw input id, observed data field), a descriptor configuration and a trainer configuration are required.
Docs: /swagger/swagger.html#/executeResource/runTraining
POST: /rest/execute/training
Request body for classification with error prediction:
{
  "name": "string",
  "type": "TRAINING",
  "dataset": {
    "observedData": "string",
    "rawInputId": "string"
  },
  "splitValue": 0,
  "model": {
    "target": "string",
    "descriptors": {},
    "config": {
      "trainer": {
        "method": "CLASSIFICATION",
        "algorithm": "RANDOM_FOREST",
        "trainerWrapper": "CONFORMAL_PREDICTION",
        "params": {}
      }
    }
  }
}
The response is the run id. Details on configuring the "descriptors" and "trainer" parameters are available on the Trainer configuration page.
The runs endpoint returns the main information about a run, for example:
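A training request with the body shown above could be assembled as in the following sketch. The helper name, the base URL and the example values are hypothetical; the "descriptors" and "trainer" dictionaries are passed through unchanged and must follow the Trainer configuration documentation.

```python
import json
import urllib.request


def build_training_request(base_url, name, raw_input_id, observed_data,
                           target, descriptors, trainer, split_value):
    """Build the POST /rest/execute/training request (nothing is sent here)."""
    payload = {
        "name": name,
        "type": "TRAINING",
        "dataset": {"observedData": observed_data, "rawInputId": raw_input_id},
        "splitValue": split_value,
        "model": {
            "target": target,
            "descriptors": descriptors,
            "config": {"trainer": trainer},
        },
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/rest/execute/training",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Usage (hypothetical host and values; performs a network call):
# trainer = {"method": "CLASSIFICATION", "algorithm": "RANDOM_FOREST",
#            "trainerWrapper": "CONFORMAL_PREDICTION", "params": {}}
# request = build_training_request("http://localhost:8080", "demo run",
#                                  raw_input_id, "Activity", "hERG", {}, trainer, 0.8)
# with urllib.request.urlopen(request) as response:
#     run_id = response.read().decode("utf-8")
```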
Docs: /swagger/swagger.html#/runResource/getRun_2
GET: /rest/runs/{runid}
Response
{
  "name": "string",
  "type": "TRAINING",
  "dataset": {
    "observedData": "string",
    "rawInputId": "string",
    "name": "string",
    "recordCount": 0
  },
  "splitValue": 0,
  "model": {
    "target": "string",
    "descriptors": {},
    "config": {
      "trainer": {
        "method": "CLASSIFICATION",
        "algorithm": "RANDOM_FOREST",
        "trainerWrapper": "CONFORMAL_PREDICTION",
        "params": {}
      }
    },
    "relatedTraining": {
      "id": 0,
      "deleted": true,
      "name": "string"
    },
    "production": true,
    "descriptorCount": 0
  },
  "id": "",
  "executionDetails": {
    "status": "WAITING",
    "startTime": 0,
    "endTime": 0
  },
  "statisticalParameters": {},
  "deleted": true,
  "versioning": {
    "outdated": true,
    "rerunnable": true,
    "compatibilityVersion": "string",
    "upgradeRun": {
      "id": 0,
      "deleted": true,
      "name": "string"
    }
  }
}
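Because the response carries the job state in executionDetails.status, a client can poll this endpoint until the run leaves the waiting state. The sketch below is one possible way to do that. Only the "WAITING" status appears in this page; any other non-final status names are unknown here and would have to be added to the `pending` argument. The function names, base URL, interval and timeout are assumptions.

```python
import json
import time
import urllib.request


def fetch_run(base_url, run_id):
    """GET /rest/runs/{runid} and return the decoded JSON document."""
    url = f"{base_url.rstrip('/')}/rest/runs/{run_id}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_run(fetch, run_id, pending=("WAITING",), interval=5.0,
                 timeout=600.0, sleep=time.sleep):
    """Call fetch(run_id) repeatedly until the status is not in `pending`."""
    deadline = time.monotonic() + timeout
    while True:
        run = fetch(run_id)
        status = run["executionDetails"]["status"]
        if status not in pending:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} is still {status} after {timeout} s")
        sleep(interval)


# Usage (hypothetical host; performs network calls):
# run = wait_for_run(lambda rid: fetch_run("http://localhost:8080", rid), run_id)
# print(run["executionDetails"]["status"], run["statisticalParameters"])
```

Passing the fetch function as an argument keeps the polling loop independent of the HTTP client, so it can be exercised without a running server.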
3 workflows are supported for prediction.
Steps:
Docs: /swagger/swagger.html#/executeResource/runPrediction
POST: /rest/execute/prediction
Request body:
{
  "name": "string",
  "trainingRunId": 0,
  "rawInputId": "string",
  "observedData": "string"
}
If observed data is provided, statistics are calculated automatically.
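The optional observed data field can be handled as in this small sketch, which builds the request body for /rest/execute/prediction. The helper name and example values are hypothetical, and leaving out the "observedData" key when no observed data is available is an assumption about how the service treats a missing field.

```python
import json


def prediction_payload(name, training_run_id, raw_input_id, observed_data=None):
    """Build the prediction request body; observed data is optional."""
    payload = {
        "name": name,
        "trainingRunId": training_run_id,
        "rawInputId": raw_input_id,
    }
    if observed_data is not None:
        # Only with observed data can accuracy statistics be calculated.
        payload["observedData"] = observed_data
    return payload


print(json.dumps(prediction_payload("demo prediction", 12, "raw-1"), indent=2))
```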
Predicted file:
Docs: /swagger/swagger.html#/predictedResource/getRun_1
GET: /rest/predicted/export/{runid}
Training models in success state can be flagged as in production. These models are available on dedicated endpoints (/rest/execute/prediction/).
Docs: /swagger/swagger.html#/executeResource/setToProduction
PUT: /rest/execute/production
Production models can be executed in one request. This method is recommended for predicting values for new molecules.
Docs: /swagger/swagger.html#/executeResource/getPrediction
POST: /rest/execute/prediction/molset
Request body:
{
  "modelId": 1,
  "structures": [
    "SMILES"
  ],
  "resultFormat": "string"
}
The response contains the prediction, configurable applicability domain and conformal prediction results.
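For the single-request workflow, the body can be put together from a list of structure strings, as in the sketch below. The helper name is hypothetical, and the accepted "resultFormat" values are not listed in this page, so the value is passed through as given by the caller.

```python
import json


def molset_payload(model_id, structures, result_format):
    """Build the body for POST /rest/execute/prediction/molset."""
    cleaned = [s.strip() for s in structures if s and s.strip()]
    if not cleaned:
        raise ValueError("at least one structure is required")
    return {
        "modelId": model_id,
        "structures": cleaned,
        # Assumption: the caller knows a valid format name from the Swagger page.
        "resultFormat": result_format,
    }


# "string" is the placeholder from the request body above, not a real format name.
body = molset_payload(1, ["CCO", " c1ccccc1 ", ""], "string")
print(json.dumps(body))
```

Blank entries are dropped and surrounding whitespace is trimmed before sending, so an empty line in an input list does not reach the service as a structure.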
Docs: /swagger/swagger.html#/executeResource/predictForFile
POST: /rest/execute/prediction/file/start/{runid}
Multipart SDF file upload; runid is the run id of the trained model.
Status of the running job:
Docs: /swagger/swagger.html#/executeResource/predictFileStatus
POST: /rest/execute/prediction/file/status/{runid}
Predicted file:
Docs: /swagger/swagger.html#/predictedResource/getRun_1
GET: /rest/predicted/export/{runid}
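The three calls of the file-based workflow (start, status, export) can be kept together in a small lookup table, sketched below. The HTTP methods and paths are taken from this page; the helper name and base URL are hypothetical. Which run id the export call expects (that of the trained model or of the resulting prediction run) is not stated here, so the id is supplied per call.

```python
FILE_PREDICTION_STEPS = {
    "start": ("POST", "/rest/execute/prediction/file/start/{runid}"),
    "status": ("POST", "/rest/execute/prediction/file/status/{runid}"),
    "export": ("GET", "/rest/predicted/export/{runid}"),
}


def endpoint(base_url, step, run_id):
    """Return (HTTP method, full URL) for one step of the workflow."""
    method, template = FILE_PREDICTION_STEPS[step]
    return method, base_url.rstrip("/") + template.format(runid=run_id)


for step in ("start", "status", "export"):
    print(step, *endpoint("http://localhost:8080", step, 42))
```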
Playground general documentation is available here: https://disco.chemaxon.com/calculators/playground/
The Integration tab of the Playground About dialogue shows the state of the Trainer Engine connection.
If the Trainer Engine is connected, production-flagged models are listed in Playground Calculators, marked with a cloud icon.
The Bulk prediction icon opens a dialogue to select a Trainer Engine model and upload the SDF file.
The results are also available under this icon.
Bulk predictions started from Playground are marked as "external runs" and are not listed in the Runs tab of the Trainer Engine GUI. The progress, status and corresponding log file are available with the run id:
/trainer/#/run/{run id reference}