Label | Explanation | Data Type |
Input Table | A feature class or table containing a text field with the input text for the model and a label field containing the target class labels. | Feature Layer; Table View |
Text Field | A text field in the input feature class or table that contains the text that will be classified by the model. | Field |
Label Field | A text field in the input feature class or table that contains the target class labels for training the model. In the case of multilabel text classification, specify more than one text field. | Field |
Output Model | The output folder location that will store the trained model. | Folder |
Pretrained Model File (Optional) | A pretrained model that will be used to fine-tune the new model. The input can be an Esri model definition file (.emd) or a deep learning package file (.dlpk). A pretrained model with similar classes can be fine-tuned to fit the new model. The pretrained model must have been trained with the same model type and backbone model that will be used to train the new model. | File |
Max Epochs (Optional) | The maximum number of epochs for which the model will be trained. A maximum epoch value of 1 means the dataset will be passed forward and backward through the neural network one time. The default value is 5. | Long |
Model Backbone (Optional) | Specifies the preconfigured neural network that will serve as the encoder for the model and extract feature representations of the input text in the form of fixed length vectors. These vectors are then passed as input to the classification head of the model. | String |
Batch Size (Optional) | The number of training samples that will be processed at one time. The default value is 2. Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size. | Double |
Model Arguments (Optional) | Additional arguments that will be used to initialize the model, such as seq_len, the maximum sequence length of the training data that will be considered when training the model. See the keyword arguments in the TextClassifier documentation for the list of supported model arguments. | Value Table |
Learning Rate (Optional) | The step size indicating how much the model weights will be adjusted during the training process. If no value is specified, an optimal learning rate will be determined automatically. | Double |
Validation Percentage (Optional) | The percentage of training samples that will be used for validating the model. The default value is 10. | Double |
Stop when model stops improving (Optional) | Specifies whether model training will stop when the model is no longer improving or continue until the Max Epochs parameter value is reached. | Boolean |
Make model backbone trainable (Optional) | Specifies whether the backbone layers in the pretrained model will be frozen so that the weights and biases remain as originally designed. | Boolean |
Remove HTML Tags (Optional) | Specifies whether HTML tags will be removed from the input text. | Boolean |
Remove URLs (Optional) | Specifies whether URLs will be removed from the input text. | Boolean |
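To make the interaction between Max Epochs, Batch Size, and Validation Percentage concrete, the following sketch works through the arithmetic for a hypothetical table of 1,000 rows using the default values listed above. The function name `training_plan` and the rounding of the validation split are assumptions for illustration; the tool's internal split may round differently, and training can end sooner when it is set to stop once the model stops improving.

```python
import math

def training_plan(n_rows, validation_percentage=10, batch_size=2, max_epochs=5):
    """Estimate split sizes and batch counts (illustrative arithmetic only)."""
    # Assumption: the validation share is rounded to the nearest whole row.
    n_validation = round(n_rows * validation_percentage / 100)
    n_training = n_rows - n_validation
    # One epoch passes every training row through the network once,
    # so it takes ceil(n_training / batch_size) batches.
    batches_per_epoch = math.ceil(n_training / batch_size)
    return {
        "validation_rows": n_validation,
        "training_rows": n_training,
        "batches_per_epoch": batches_per_epoch,
        # Upper bound only: early stopping may end training sooner.
        "max_total_batches": batches_per_epoch * max_epochs,
    }

plan = training_plan(1000)
print(plan)
```

A larger batch size lowers the number of batches per epoch but raises the memory needed for each batch, which is the trade-off described for the Batch Size parameter.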
Summary
Trains a single-label or multilabel text classification model to assign a predefined category or label to unstructured text.
Usage
This tool requires that deep learning frameworks be installed. To set up your machine to use deep learning frameworks in ArcGIS Pro, see Install deep learning frameworks for ArcGIS.
This tool can also be used to fine-tune an existing trained model.
To run this tool using a GPU, set the Processor Type environment to GPU. If you have more than one GPU, specify the GPU ID environment instead.
The input to the tool is a table or a feature class containing training data, with a text field containing the input text and a label field containing the target class labels.
For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
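In Python, the GPU note above corresponds to geoprocessing environment settings. The following is a minimal sketch, assuming the settings are exposed as `arcpy.env.processorType` and `arcpy.env.gpuId`; the helper names `gpu_settings` and `apply_settings` are hypothetical, and the `arcpy` import is guarded so the sketch can also be run outside an ArcGIS Pro Python environment.

```python
try:
    import arcpy  # only available in an ArcGIS Pro Python environment
except ImportError:
    arcpy = None

def gpu_settings(gpu_id=None):
    """Return environment settings for GPU training, following the usage note."""
    if gpu_id is None:
        # Single GPU: set the Processor Type environment to GPU.
        return {"processorType": "GPU"}
    # More than one GPU: specify the GPU ID environment instead.
    return {"gpuId": gpu_id}

def apply_settings(settings):
    """Apply settings to arcpy.env; return False when arcpy is unavailable."""
    if arcpy is None:
        return False
    for name, value in settings.items():
        setattr(arcpy.env, name, value)
    return True

settings = gpu_settings()
applied = apply_settings(settings)
```

Set the environment before calling the tool so that the setting is in effect when training starts.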
Parameters
arcpy.geoai.TrainTextClassificationModel(in_table, text_field, label_field, out_model, {pretrained_model_file}, {max_epochs}, {model_backbone}, {batch_size}, {model_arguments}, {learning_rate}, {validation_percentage}, {stop_training}, {make_trainable}, {remove_html_tags}, {remove_urls})
Name | Explanation | Data Type |
in_table | A feature class or table containing a text field with the input text for the model and a label field containing the target class labels. | Feature Layer; Table View |
text_field | A text field in the input feature class or table that contains the text that will be classified by the model. | Field |
label_field [label_field,...] | A text field in the input feature class or table that contains the target class labels for training the model. In the case of multilabel text classification, specify more than one text field. | Field |
out_model | The output folder location that will store the trained model. | Folder |
pretrained_model_file (Optional) | A pretrained model that will be used to fine-tune the new model. The input can be an Esri model definition file (.emd) or a deep learning package file (.dlpk). A pretrained model with similar classes can be fine-tuned to fit the new model. The pretrained model must have been trained with the same model type and backbone model that will be used to train the new model. | File |
max_epochs (Optional) | The maximum number of epochs for which the model will be trained. A maximum epoch value of 1 means the dataset will be passed forward and backward through the neural network one time. The default value is 5. | Long |
model_backbone (Optional) | Specifies the preconfigured neural network that will serve as the encoder for the model and extract feature representations of the input text in the form of fixed length vectors. These vectors are then passed as input to the classification head of the model. | String |
batch_size (Optional) | The number of training samples that will be processed at one time. The default value is 2. Increasing the batch size can improve tool performance; however, as the batch size increases, more memory is used. If an out of memory error occurs, use a smaller batch size. | Double |
model_arguments [model_arguments,...] (Optional) | Additional arguments that will be used to initialize the model, such as seq_len, the maximum sequence length of the training data that will be considered when training the model. See the keyword arguments in the TextClassifier documentation for the list of supported model arguments. | Value Table |
learning_rate (Optional) | The step size indicating how much the model weights will be adjusted during the training process. If no value is specified, an optimal learning rate will be determined automatically. | Double |
validation_percentage (Optional) | The percentage of training samples that will be used for validating the model. The default value is 10. | Double |
stop_training (Optional) | Specifies whether model training will stop when the model is no longer improving or continue until the max_epochs parameter value is reached. | Boolean |
make_trainable (Optional) | Specifies whether the backbone layers in the pretrained model will be frozen so that the weights and biases remain as originally designed. | Boolean |
remove_html_tags (Optional) | Specifies whether HTML tags will be removed from the input text. | Boolean |
remove_urls (Optional) | Specifies whether URLs will be removed from the input text. | Boolean |
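The remove_html_tags and remove_urls options describe cleaning that is applied to the input text. As an illustration of what this kind of cleaning can look like, the sketch below uses simple regular expressions from the Python standard library. It is not the tool's implementation; the function name `clean_text` and both patterns are assumptions chosen for brevity, and they will not handle every malformed tag or URL.

```python
import re

# Assumed patterns for illustration: any <...> span, and http(s) URLs.
TAG_PATTERN = re.compile(r"<[^>]+>")
URL_PATTERN = re.compile(r"https?://\S+")

def clean_text(text, remove_html_tags=True, remove_urls=True):
    """Strip HTML tags and URLs, then collapse repeated whitespace."""
    if remove_html_tags:
        # Tags are replaced first so a URL directly followed by a tag is isolated.
        text = TAG_PATTERN.sub(" ", text)
    if remove_urls:
        text = URL_PATTERN.sub(" ", text)
    return " ".join(text.split())

example = clean_text("<p>Visit https://example.com for details</p>")
print(example)
```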
Code sample
The following Python window script demonstrates how to use the TrainTextClassificationModel function.
# Name: TrainTextClassification.py
# Description: Train a text classifier model to classify text in different classes.
#
# Requirements: ArcGIS Pro Advanced license
# Import system modules
import arcpy
arcpy.env.workspace = "C:/textanalysisexamples/data"
# Set local variables
in_table = "training_data_textclassifier.csv"
out_folder = "C:\\textclassifier"
# Run Train Text Classification Model
# Positional order is in_table, text_field, label_field, out_model
arcpy.geoai.TrainTextClassificationModel(in_table, "Address", "Country", out_folder,
                                         max_epochs=2, batch_size=16)
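The sample above reads training_data_textclassifier.csv with an Address text field and a Country label field. As a hedged sketch of how such a table could be prepared, the following writes a small CSV file with the Python standard library. The helper `write_training_csv`, the temporary output location, and the placeholder rows are assumptions for illustration only and do not represent real training data.

```python
import csv
import os
import tempfile

def write_training_csv(path, rows, text_field="Address", label_field="Country"):
    """Write (text, label) pairs to a CSV file with one header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([text_field, label_field])
        for text, label in rows:
            writer.writerow([text, label])
    return len(rows)

# Placeholder rows: one text value and one class label per record.
sample_rows = [
    ("10 Example Street, Sampletown", "CountryA"),
    ("22 Placeholder Road, Testville", "CountryB"),
    ("5 Demo Avenue, Sampletown", "CountryA"),
]

csv_path = os.path.join(tempfile.gettempdir(), "training_data_textclassifier.csv")
count = write_training_csv(csv_path, sample_rows)
print(f"Wrote {count} rows to {csv_path}")
```

The csv module quotes values that contain commas, so each address stays in a single text field when the table is read back.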
Environments
Licensing information
- Basic: No
- Standard: No
- Advanced: Yes