ConfiguredAssetGCSDataConnector
- class great_expectations.datasource.data_connector.ConfiguredAssetGCSDataConnector(name: str, datasource_name: str, bucket_or_name: str, assets: dict, execution_engine: Optional[great_expectations.execution_engine.execution_engine.ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, max_results: Optional[int] = None, gcs_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)
Extension of ConfiguredAssetFilePathDataConnector used to connect to GCS.
A ConfiguredAssetGCSDataConnector requires an explicit specification of each DataAsset you want to connect to. This allows more fine-tuning but requires more setup. To stay consistent with Google’s official SDK, parameter names such as “bucket_or_name” and “max_results” follow the SDK’s terminology. These keys are converted from YAML to Python and passed directly to the GCS connection object, so they must match the names the SDK expects.
- This DataConnector supports the following methods of authentication:
Standard gcloud auth / GOOGLE_APPLICATION_CREDENTIALS environment variable workflow
Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_file
Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_info
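The three workflows above can be sketched as gcs_options values. This is a minimal sketch, assuming the connector looks for a "filename" key or an "info" key in gcs_options to build service-account credentials and otherwise falls back to the environment. Those key names, the path, the dictionary contents, and the helper auth_workflow are illustrative assumptions not confirmed by this page.

```python
# Hypothetical gcs_options values, one per authentication workflow.
# The "filename" and "info" key names are assumptions; verify them for your version.

# 1. gcloud auth / GOOGLE_APPLICATION_CREDENTIALS: no explicit credentials supplied.
gcs_options_env = {}

# 2. Key file, assumed to be handed to Credentials.from_service_account_file.
gcs_options_file = {"filename": "/path/to/service_account.json"}  # placeholder path

# 3. Parsed key contents, assumed to be handed to Credentials.from_service_account_info.
gcs_options_info = {
    "info": {
        "type": "service_account",
        "project_id": "my-project",  # placeholder value
    }
}


def auth_workflow(gcs_options: dict) -> str:
    """Report which workflow a gcs_options dict would select under these assumptions."""
    if "filename" in gcs_options:
        return "from_service_account_file"
    if "info" in gcs_options:
        return "from_service_account_info"
    return "environment"
```

Under these assumptions, an empty gcs_options leaves credential discovery to the standard Google environment lookup.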
- Parameters:
name (str) – required name for DataConnector
datasource_name (str) – required name for datasource
bucket_or_name (str) – bucket name for Google Cloud Storage
assets (dict) – dict of asset configuration (required for ConfiguredAssetDataConnector)
execution_engine (ExecutionEngine) – optional reference to ExecutionEngine
default_regex (dict) – optional regex configuration for filtering data_references
sorters (list) – optional list of sorters for sorting data_references
prefix (str) – GCS prefix
delimiter (str) – GCS delimiter
max_results (int) – max blob filepaths to return
gcs_options (dict) – wrapper object for optional GCS **kwargs
batch_spec_passthrough (dict) – dictionary with keys that will be added directly to batch_spec
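To illustrate how these parameters fit together, here is a minimal sketch of a connector configuration written as a Python dict, with a small helper that mimics how a default_regex could filter blob names into data_references. The bucket name, prefix, asset name, the "class_name", "pattern", and "group_names" keys, and the helper match_data_references are assumptions for illustration. The helper uses full-string matching from the standard re module and is not the library's own matching code.

```python
import re

# Hypothetical configuration; all values are placeholders.
connector_config = {
    "class_name": "ConfiguredAssetGCSDataConnector",
    "bucket_or_name": "my-bucket",
    "prefix": "data/taxi/",
    "max_results": 1000,
    "default_regex": {
        "pattern": r"data/taxi/yellow_tripdata_(\d{4})-(\d{2})\.csv",
        "group_names": ["year", "month"],
    },
    "assets": {"yellow_tripdata": {}},
}

# Example blob names, as a GCS listing under the prefix might return them.
blob_names = [
    "data/taxi/yellow_tripdata_2019-01.csv",
    "data/taxi/yellow_tripdata_2019-02.csv",
    "data/taxi/README.md",
]


def match_data_references(names, regex_config):
    """Keep names that fully match the pattern and label their capture groups."""
    pattern = re.compile(regex_config["pattern"])
    matched = {}
    for name in names:
        m = pattern.fullmatch(name)
        if m:
            matched[name] = dict(zip(regex_config["group_names"], m.groups()))
    return matched


references = match_data_references(blob_names, connector_config["default_regex"])
```

In this sketch the two CSV blobs are kept with their year and month groups, and README.md is dropped because it does not match the pattern.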
- get_available_data_asset_names() → List[str]
Return the list of asset names known by this DataConnector.
- Returns:
A list of available names