Label | Explanation | Data Type |
Input Training Features | The input feature class that will be used to train the model. | Feature Layer; Table View |
Output Model | The output trained model that will be saved as a deep learning package (.dlpk) file. | File |
Variable to Predict | A field from the Input Training Features parameter value that contains the values that will be used to train the model. This field contains known (training) values of the variable that will be used to predict at unknown locations. | Field |
Treat Variable as Categorical (Optional) | Specifies whether the Variable to Predict parameter value will be treated as a categorical variable. | Boolean |
Explanatory Training Variables (Optional) | A list of fields representing the explanatory variables that will help predict the value or category of the Variable to Predict parameter value. Check the accompanying check box for any variables that represent classes or categories (such as land cover, presence, or absence). | Value Table |
Explanatory Training Distance Features (Optional) | The features whose distances from the input training features will be estimated automatically and added as additional explanatory variables. Distances will be calculated from each of the input explanatory training distance features to the nearest input training features. Point and polygon features are supported. If the input explanatory training distance features are polygons, the distance attributes will be calculated as the distance between the closest segments of the pair of features. | Feature Layer |
Explanatory Training Rasters (Optional) | The rasters whose cell values will be extracted and used as explanatory variables for the model. Each layer forms one explanatory variable. For each feature in the input training features, the value of the raster cell will be extracted at that exact location. Bilinear raster resampling will be used when extracting the raster value for continuous rasters. Nearest neighbor assignment will be used when extracting a raster value from categorical rasters. If the Input Training Features parameter value has polygons, and you have specified this parameter, one raster value for each polygon will be used in the model. Each polygon is assigned the average value for continuous rasters and the majority for categorical rasters. Check the Categorical column check box for any raster that represents classes or categories such as land cover, presence, or absence. | Value Table |
Total Time Limit (Minutes) (Optional) | The maximum time, in minutes, allowed for AutoML model training. The default is 60 (1 hour). | Double |
AutoML Mode (Optional) | Specifies the goal of AutoML and how intensive the AutoML search will be. | String |
Algorithms (Optional) | Specifies the algorithms that will be used during the training. By default, all the algorithms will be used. | Multivalue |
Validation Percentage (Optional) | The percentage of input data that will be used for validation. The default value is 10. | Long |
Output Report (Optional) | The output report that will be generated as an .html file. If the path provided is not empty, the report will be created in a new folder under the provided path. The report will contain details of the various models as well as details of the hyperparameters that were used during the evaluation and the performance of each model. Hyperparameters are parameters that control the training process. They are not updated during training and include model architecture, learning rate, number of epochs, and so on. | File |
Output Importance Table (Optional) | An output table containing information about the importance of each explanatory variable (fields, distance features, and rasters) used in the model. | Table |
Output Feature Class (Optional) | The feature layer containing the values predicted by the best performing model for the training features. It can be used to verify model performance by visually comparing the predicted values with the ground truth. | Feature Class |
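The Explanatory Training Rasters description above mentions bilinear resampling for continuous rasters and nearest neighbor assignment for categorical rasters. The following is a minimal pure-Python sketch of those two sampling rules on a tiny grid. The function names, the convention that integer (row, column) coordinates fall on cell centers, and the sample values are all assumptions made for illustration; this is not the tool's implementation.

```python
import math

def sample_nearest(grid, row, col):
    """Return the value of the cell whose center is closest to (row, col)."""
    r = min(max(int(math.floor(row + 0.5)), 0), len(grid) - 1)
    c = min(max(int(math.floor(col + 0.5)), 0), len(grid[0]) - 1)
    return grid[r][c]

def sample_bilinear(grid, row, col):
    """Distance-weighted average of the four cell centers around (row, col)."""
    max_r, max_c = len(grid) - 1, len(grid[0]) - 1
    # Clamp the location to the extent spanned by the cell centers.
    row = min(max(row, 0.0), max_r)
    col = min(max(col, 0.0), max_c)
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1, c1 = min(r0 + 1, max_r), min(c0 + 1, max_c)
    fr, fc = row - r0, col - c0  # fractional offsets inside the 2x2 window
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr

# Hypothetical rasters: continuous elevation and categorical land cover codes.
elevation = [[10.0, 20.0], [30.0, 40.0]]
land_cover = [[1, 2], [3, 4]]

continuous_value = sample_bilinear(elevation, 0.5, 0.5)   # blends all four cells
categorical_value = sample_nearest(land_cover, 0.2, 0.9)  # keeps a valid class code
```

Nearest neighbor assignment is the natural choice for categories because averaging class codes would produce values that correspond to no real class.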
Summary
Trains a machine learning model by building training pipelines and automating much of the training process. This includes exploratory data analysis, feature selection, feature engineering, model selection, hyperparameter tuning, and model training. Its outputs include performance metrics of the best model on the training data, as well as the trained deep learning model package (.dlpk) that can be used as input for the Predict Using AutoML tool to predict on a new dataset.
Usage
You must install the proper deep learning framework for Python in ArcGIS Pro.
The time it takes for the tool to produce the trained model depends on the following:
- The amount of data provided during training
- The AutoML Mode parameter value
By default, the timer for all modes is set at 60 minutes. Regardless of the amount of data used in training, the Basic option will not take the entire 60 minutes to find the optimum model. The fit process will complete as soon as the optimum model is identified.
The Advanced option will take more time due to the additional tasks of feature engineering, feature selection, and hyperparameter tuning. In addition to the new features obtained by combining multiple features from the input, the tool creates spatial features with names from zone3_id through zone7_id. These new features will be extracted from the location information in the input data and will be used to train better models. For more information about the new spatial features, see How AutoML Works.
If the training dataset is large, not all combinations of the models may be evaluated within 60 minutes. In such cases, the best performing model determined within 60 minutes will be considered the optimum model. You can then either use this model or rerun the tool with a higher Total Time Limit (Minutes) parameter value.
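The time-limit behavior described above (keep the best model found before the limit expires) can be illustrated with a small, generic sketch. This is not how the tool is implemented internally. The function name, the fake clock, the 25-minute training time, and the validation scores are all made up for illustration, and the sketch assumes that a model started before the deadline is allowed to finish.

```python
import time

def search_within_time_limit(candidates, evaluate, time_limit_minutes=60,
                             clock=time.monotonic):
    """Evaluate candidates in order until none remain or the limit passes.

    Returns (best_candidate, best_score, evaluated_count); lower scores win.
    """
    deadline = clock() + time_limit_minutes * 60
    best_candidate, best_score, evaluated = None, float("inf"), 0
    for candidate in candidates:
        if clock() >= deadline:
            break  # out of time: keep the best model found so far
        score = evaluate(candidate)
        evaluated += 1
        if score < best_score:
            best_candidate, best_score = candidate, score
    return best_candidate, best_score, evaluated

class FakeClock:
    """Stand-in clock (in seconds) so the example runs instantly."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now

fake_clock = FakeClock()

# Made-up validation RMSE per algorithm, for illustration only.
validation_rmse = {"linear": 4.0, "decision_tree": 3.0,
                   "lightgbm": 2.0, "xgboost": 1.0}

def evaluate(name):
    fake_clock.now += 25 * 60  # pretend each model takes 25 minutes to train
    return validation_rmse[name]

best, score, count = search_within_time_limit(
    validation_rmse, evaluate, time_limit_minutes=60, clock=fake_clock)
```

In this run only three of the four candidates are evaluated before the 60-minute limit, so the reported best model is not the one that would have won with more time. That is the situation in which rerunning with a higher time limit can help.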
An ArcGIS Spatial Analyst extension license is required to use rasters as explanatory variables.
The Output Report parameter value is a file in HTML format that provides a way to review the information in the working directory.
The first page in the output report includes links to each of the models evaluated and shows their performance on a validation dataset along with the time it took to train them. Based on the evaluation metric, the report shows the best performing model that was chosen.
RMSE is the default evaluation metric for regression problems, while Logloss is the default metric for classification problems. The following metrics are available in the output report:
- Classification—AUC, Logloss, F1, Accuracy, Average precision
- Regression—MSE, RMSE, MAE, R2, MAPE, Spearman coefficient, Pearson coefficient
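For readers unfamiliar with the two default metrics, the sketch below computes RMSE and Logloss from their standard textbook definitions, using only the Python standard library. The function names and sample values are illustrative assumptions. The report's own numbers come from the tool, not from this code, and the log loss shown is the binary form.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: mean negative log-likelihood of the true labels.

    y_true holds 0/1 labels; p_pred holds predicted probabilities of class 1.
    """
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical house prices (regression) and presence/absence labels (classification).
price_error = rmse([300.0, 500.0], [280.0, 500.0])
presence_loss = log_loss([1, 0], [0.9, 0.2])
```

Lower values are better for both metrics. RMSE is expressed in the units of the predicted variable, while Logloss penalizes confident wrong probabilities most heavily.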
When you click a model combination, details about the training for that model combination are displayed including the learning curves, variable importance curves, hyperparameters used, and so on.
Potential use cases for the tool include training an annual solar energy generation model based on weather factors, training a crop prediction model using related variables, and training a house value prediction model.
For information about requirements for running this tool and issues you may encounter, see Deep Learning frequently asked questions.
Parameters
arcpy.geoai.TrainUsingAutoML(in_features, out_model, variable_predict, {treat_variable_as_categorical}, {explanatory_variables}, {distance_features}, {explanatory_rasters}, {total_time_limit}, {autoML_mode}, {algorithms}, {validation_percent}, {out_report}, {out_importance}, {out_features})
Name | Explanation | Data Type |
in_features | The input feature class that will be used to train the model. | Feature Layer; Table View |
out_model | The output trained model that will be saved as a deep learning package (.dlpk) file. | File |
variable_predict | A field from the in_features parameter that contains the values that will be used to train the model. This field contains known (training) values of the variable that will be used to predict at unknown locations. | Field |
treat_variable_as_categorical (Optional) | Specifies whether the variable_predict parameter value will be treated as a categorical variable. | Boolean |
explanatory_variables [explanatory_variables,...] (Optional) | A list of fields representing the explanatory variables that will help predict the value or category of the variable_predict parameter value. Pass the True value ('name_of_variable',True) for any variables that represent classes or categories (such as land cover, presence, or absence). | Value Table |
distance_features [distance_features,...] (Optional) | The features whose distances from the input training features will be estimated automatically and added as additional explanatory variables. Distances will be calculated from each of the input explanatory training distance features to the nearest input training features. Point and polygon features are supported. If the input explanatory training distance features are polygons, the distance attributes will be calculated as the distance between the closest segments of the pair of features. | Feature Layer |
explanatory_rasters [explanatory_rasters,...] (Optional) | The rasters whose cell values will be extracted and used as explanatory variables for the model. Each layer forms one explanatory variable. For each feature in the input training features, the value of the raster cell will be extracted at that exact location. Bilinear raster resampling will be used when extracting the raster value for continuous rasters. Nearest neighbor assignment will be used when extracting a raster value from categorical rasters. If the in_features parameter value has polygons, and you have specified this parameter, one raster value for each polygon will be used in the model. Each polygon is assigned the average value for continuous rasters and the majority for categorical rasters. Pass a true value using "<name_of_raster> true" for any raster that represents classes or categories such as land cover, presence, or absence. | Value Table |
total_time_limit (Optional) | The maximum time, in minutes, allowed for AutoML model training. The default is 60 (1 hour). | Double |
autoML_mode (Optional) | Specifies the goal of AutoML and how intensive the AutoML search will be. | String |
algorithms [algorithms,...] (Optional) | Specifies the algorithms that will be used during the training. By default, all the algorithms will be used. | Multivalue |
validation_percent (Optional) | The percentage of input data that will be used for validation. The default value is 10. | Long |
out_report (Optional) | The output report that will be generated as an .html file. If the path provided is not empty, the report will be created in a new folder under the provided path. The report will contain details of the various models as well as details of the hyperparameters that were used during the evaluation and the performance of each model. Hyperparameters are parameters that control the training process. They are not updated during training and include model architecture, learning rate, number of epochs, and so on. | File |
out_importance (Optional) | An output table containing information about the importance of each explanatory variable (fields, distance features, and rasters) used in the model. | Table |
out_features (Optional) | The feature layer containing the values predicted by the best performing model for the training features. It can be used to verify model performance by visually comparing the predicted values with the ground truth. | Feature Class |
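The value table parameters above can be passed as a semicolon-delimited string. The helper below is a small sketch for building such a string from (name, is_categorical) pairs. It follows the "name #" pattern used in this page's code sample for variables with no categorical flag and the "<name_of_raster> true" pattern described for categorical rasters. The helper name and the zoning field are hypothetical, and applying the same string form to explanatory_variables is an assumption, so check the result against your own data.

```python
def build_value_table(entries):
    """Build a semicolon-delimited value table string.

    entries is an iterable of (name, is_categorical) pairs. Categorical
    entries get the flag "true"; all others get "#" (use the default).
    Names containing spaces or semicolons are not handled by this sketch.
    """
    parts = []
    for name, is_categorical in entries:
        flag = "true" if is_categorical else "#"
        parts.append(f"{name} {flag}")
    return ";".join(parts)

# "zoning" is a hypothetical categorical field added for illustration.
explanatory_variables = build_value_table([
    ("bathrooms", False),
    ("bedrooms", False),
    ("zoning", True),
])
```

The resulting string could then be supplied as the explanatory_variables argument, keeping the categorical flags in one readable list rather than in a hand-typed string.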
Code sample
This example shows how to use the TrainUsingAutoML function.
# Name: TrainUsingAutoML.py
# Description: Train a machine learning model on feature or tabular data with
#              automatic hyperparameter selection.

# Import system modules
import arcpy
import os

# Set local variables
datapath = "path_to_data"
out_path = "path_to_trained_model"
in_feature = os.path.join(datapath, "train_data.gdb", "name_of_data")
out_model = os.path.join(out_path, "model.dlpk")

# Run Train Using AutoML
arcpy.geoai.TrainUsingAutoML(in_feature, out_model, "price", None,
                             "bathrooms #;bedrooms #;square_fee #", None, None,
                             60, "BASIC")
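Because the function takes many optional positional arguments, it can be easier to read a call that names each parameter. The following sketch assembles keyword arguments using the parameter names from the syntax line above and performs a few sanity checks first. The helper build_train_kwargs and its checks are hypothetical conveniences, not part of arcpy, and the 90-minute limit is only an example value.

```python
import os

def build_train_kwargs(in_features, out_model, variable_predict,
                       explanatory_variables=None, total_time_limit=60,
                       autoML_mode="BASIC", validation_percent=10,
                       out_report=None):
    """Validate a few inputs and return keyword arguments for TrainUsingAutoML."""
    if not out_model.lower().endswith(".dlpk"):
        raise ValueError("out_model should end with .dlpk")
    if total_time_limit <= 0:
        raise ValueError("total_time_limit should be a positive number of minutes")
    if not 0 < validation_percent < 100:
        raise ValueError("validation_percent should be between 0 and 100")
    kwargs = {
        "in_features": in_features,
        "out_model": out_model,
        "variable_predict": variable_predict,
        "explanatory_variables": explanatory_variables,
        "total_time_limit": total_time_limit,
        "autoML_mode": autoML_mode,
        "validation_percent": validation_percent,
        "out_report": out_report,
    }
    # Drop unset optional values so the tool's own defaults apply.
    return {key: value for key, value in kwargs.items() if value is not None}

# Same placeholder paths and fields as the code sample, with a longer time limit.
train_kwargs = build_train_kwargs(
    in_features=os.path.join("path_to_data", "train_data.gdb", "name_of_data"),
    out_model=os.path.join("path_to_trained_model", "model.dlpk"),
    variable_predict="price",
    explanatory_variables="bathrooms #;bedrooms #;square_fee #",
    total_time_limit=90,
)
```

In an ArcGIS Pro Python environment, the dictionary could then be passed as arcpy.geoai.TrainUsingAutoML(**train_kwargs). That call is left out of the block so the sketch runs without arcpy installed.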
Environments
Licensing information
- Basic: No
- Standard: No
- Advanced: Yes
Related topics
- An overview of the Feature and Tabular Analysis toolset
- Find a geoprocessing tool
- How LightGBM algorithm works
- How Linear regression algorithm works
- How XGBoost algorithm works
- How Decision tree classification and regression algorithm works
- How Extra trees classification and regression algorithm works
- How Random forest classification and regression algorithm works