Refresh Big Data Connection (GeoAnalytics Desktop)

Summary

Refreshes an existing big data connection (BDC) and registers any new datasets that have been added to the source location.

Usage

  • This tool requires a BDC. To create a BDC, use the Create Big Data Connection tool.

  • Use this tool to add one or more new datasets to an existing big data connection. Additionally, the tool will reregister any datasets that have been removed using the Remove Dataset From Big Data Connection tool. The following are examples of when to use this tool:

    • You copied a folder of data to your existing BDC source folder and want it represented as a dataset in your BDC.
    • You used the Remove Dataset From Big Data Connection tool and you want to add the removed datasets back to the BDC.

  • This tool does not refresh existing dataset properties that have been edited using the Update Big Data Connection Dataset Properties tool. All modified properties will be maintained. The following scenarios include the recommended workflows:

  • The tool messages will include the following information on the datasets discovered and their status:

    • Skipped—All existing datasets are skipped during refresh and remain as is.
    • Succeeded—New datasets that have been discovered and added to the BDC.
    • Failed—Datasets that were not successfully added to the BDC.

    You may run into one of two issues when discovering datasets in your BDC:

    • Datasets that you expected are missing. In this case, verify that the source folder path you specified is correct, that the folder contains the dataset subfolders, and that the data is a supported type.
    • One or more datasets fail to register. If datasets fail to register, you may note some of the following:

      Issue: The dataset is not in the expected format.
      Solution: Open the file to see if it looks as expected. If the data is structured incorrectly, update and try again.
      Example: A .csv file has a few lines and a summary of the data, and then only empty lines.

      Issue: The schemas of datasets in a folder do not match.
      Solution: All files in a dataset folder must have the same schema. Open the files to compare the schemas. Resolve any mismatched schemas and try to register the dataset again.
      Example: You have one .csv file with 10 fields, and another with 8.

      Issue: The file types of a dataset in a folder do not match.
      Solution: All files in a dataset folder must have the same extension (file type). Check the file types of the data source location and remove or relocate any misplaced files.
      Example: A shapefile dataset is in the same folder as a parquet file.

      Issue: You have an unrecognized field format.
      Solution: This is unlikely but may occur if ORC and parquet use an unexpected format. Ensure that you use valid field formats.
      Example: You have a parquet file with an unknown field format.

    Learn more about why datasets fail to add to a BDC file
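
    The schema and file type issues listed above can be checked before running the tool. The following is a minimal sketch, not part of the tool itself, that uses only the Python standard library. The function name check_dataset_folder is hypothetical. It assumes that each dataset folder holds files directly (no nested folders), that .csv files have a header row, and that the common shapefile sidecar extensions listed in the code belong to a single .shp dataset.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumption: these common shapefile sidecar files count as part of a .shp dataset.
SHAPEFILE_SIDECARS = {".shx", ".dbf", ".prj", ".cpg"}


def _file_type(path):
    """Return the lowercase extension, folding shapefile sidecars into .shp."""
    suffix = path.suffix.lower()
    return ".shp" if suffix in SHAPEFILE_SIDECARS else suffix


def check_dataset_folder(folder):
    """Return a list of human-readable problems found in one dataset folder."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    problems = []

    # All files in a dataset folder must share one file type.
    file_types = Counter(_file_type(p) for p in files)
    if len(file_types) > 1:
        problems.append(f"mixed file types: {sorted(file_types)}")

    # All .csv files must share one schema; compare their header rows.
    headers = {}
    for path in files:
        if path.suffix.lower() == ".csv":
            with path.open(newline="", encoding="utf-8-sig") as handle:
                headers[path.name] = tuple(next(csv.reader(handle), ()))
    if len(set(headers.values())) > 1:
        counts = {name: len(cols) for name, cols in headers.items()}
        problems.append(f"mismatched CSV schemas (field counts): {counts}")

    return problems
```

    An empty list means that neither of the two checks found a problem. It does not guarantee that the dataset will register, because the tool may reject data for other reasons.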

  • Once you refresh a BDC, use the Describe Dataset tool to verify that the updated dataset looks as expected.

  • The Refresh Big Data Connection tool identifies new datasets. The following tools can also be used to modify a BDC:

  • This geoprocessing tool is powered by Spark. See Big data connections to learn more about big data connections and how to use them.

Syntax

arcpy.gapro.RefreshBDC(bdc_file, {visible_geometry}, {visible_time})
Parameter: bdc_file
Explanation: The BDC file to refresh.
Data Type: File

Parameter: visible_geometry (Optional)
Explanation: Specifies whether the fields used to identify the geometry will be included (visible) as fields for analysis when the BDC file is used in other geoprocessing tools. When geometry fields are not visible, geometry is still applied to the dataset. The geometry visibility setting can be modified in the BDC.

  • GEOMETRY_VISIBLE: Geometry fields will be included as fields for analysis. This is the default.
  • GEOMETRY_NOT_VISIBLE: Geometry fields will not be included as fields for analysis.

Data Type: Boolean

Parameter: visible_time (Optional)
Explanation: Specifies whether the fields used to indicate the time will be included (visible) as fields for analysis when the BDC file is used in other geoprocessing tools. When time fields are not visible, time is still applied to the dataset. The time visibility setting can be modified in the BDC.

  • TIME_VISIBLE: Time fields will be included as fields for analysis. This is the default.
  • TIME_NOT_VISIBLE: Time fields will not be included as fields for analysis.

Data Type: Boolean

Derived Output

Name: updated_bdc
Explanation: The input .bdc file with updated datasets.
Data Type: File

Code sample

RefreshBDC (stand-alone script)

The following Python script demonstrates how to use the RefreshBDC function.

# Name: RefreshBDC.py
# Description: Refreshes a big data connection to automatically discover datasets that 
#              have been added.
#
# Requirements: ArcGIS Pro Advanced License

# Import system modules
import arcpy

# Set local variables
bdcFile = r"c:\Projects\MyProjectFolder\my_BigDataConnection.bdc"

# Execute Refresh Big Data Connection
arcpy.gapro.RefreshBDC(bdcFile)
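
RefreshBDC with optional parameters (illustrative sketch)

The script above relies on the default visibility settings. As a possible variation, the following sketch wraps the call in a small helper that maps Python booleans to the keyword values listed in the Syntax section. The helper name refresh_bdc and its show_geometry, show_time, and gapro arguments are hypothetical and not part of arcpy. The gapro argument exists only so that the helper can be exercised without ArcGIS Pro installed. When it is omitted, arcpy.gapro is used.

```python
def refresh_bdc(bdc_file, show_geometry=True, show_time=True, gapro=None):
    """Refresh a BDC, mapping booleans to the documented keyword values."""
    if gapro is None:
        # Imported lazily so that the helper can be defined without ArcGIS Pro.
        import arcpy
        gapro = arcpy.gapro

    # Keyword strings are taken from the Syntax section of this page.
    visible_geometry = "GEOMETRY_VISIBLE" if show_geometry else "GEOMETRY_NOT_VISIBLE"
    visible_time = "TIME_VISIBLE" if show_time else "TIME_NOT_VISIBLE"

    # Positional order follows the documented signature:
    # RefreshBDC(bdc_file, {visible_geometry}, {visible_time})
    return gapro.RefreshBDC(bdc_file, visible_geometry, visible_time)
```

With ArcGIS Pro available, a call such as refresh_bdc(bdcFile, show_geometry=False) would refresh the connection while hiding the geometry fields from analysis and keeping the time fields visible.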

Environments

This tool does not use any geoprocessing environments.

Licensing information

  • Basic: No
  • Standard: No
  • Advanced: Yes
