Version: 0.17.19

SparkDatasource

class great_expectations.datasource.fluent.SparkDatasource(*, type: Literal['spark'] = 'spark', name: str, id: Optional[uuid.UUID] = None, assets: List[great_expectations.datasource.fluent.spark_datasource.DataFrameAsset] = [], spark_config: Optional[Dict[pydantic.v1.types.StrictStr, Union[pydantic.v1.types.StrictStr, pydantic.v1.types.StrictInt, pydantic.v1.types.StrictFloat, pydantic.v1.types.StrictBool]]] = None, force_reuse_spark_context: bool = True, persist: bool = True)
add_dataframe_asset(name: str, dataframe: Optional[_SparkDataFrameT] = None, batch_metadata: Optional[BatchMetadata] = None) -> DataFrameAsset

Adds a DataFrame data asset to this SparkDatasource object.

Parameters:
  • name – The name of the DataFrame asset. This can be any arbitrary string.

  • dataframe – The Spark DataFrame containing the data for this DataFrame data asset.

    Deprecated since version 0.16.15: The "dataframe" argument is no longer part of the "SparkDatasource.add_dataframe_asset()" method call; instead, "dataframe" is a required argument to the "DataFrameAsset.build_batch_request()" method.

  • batch_metadata – An arbitrary user-defined dictionary with string keys that is inherited by every batch created from the asset.

Returns:

The DataFrameAsset that has been added to this datasource.
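
Example:

The deprecation note above implies a two-step pattern: register the asset by name, then supply the Spark DataFrame when building the batch request. The following is a minimal sketch of that pattern, not a definitive implementation. The helper `build_spark_batch_request`, the `main` function, and the default datasource and asset names are hypothetical and chosen for illustration; it also assumes the datasource is created with `context.sources.add_spark()`, which this page does not describe.

```python
from typing import Any, Dict, Optional


def build_spark_batch_request(
    context: Any,
    dataframe: Any,
    datasource_name: str = "my_spark_datasource",  # hypothetical name
    asset_name: str = "my_dataframe_asset",  # hypothetical name
    batch_metadata: Optional[Dict[str, Any]] = None,
) -> Any:
    """Register a DataFrame asset and return a batch request for `dataframe`."""
    # Assumption: the SparkDatasource is created through the fluent
    # `context.sources.add_spark()` entry point.
    datasource = context.sources.add_spark(name=datasource_name)
    # Register the asset by name only; the deprecated `dataframe`
    # argument is deliberately not passed here.
    asset = datasource.add_dataframe_asset(
        name=asset_name, batch_metadata=batch_metadata
    )
    # The Spark DataFrame is supplied when the batch request is built.
    return asset.build_batch_request(dataframe=dataframe)


def main() -> None:
    # Imported locally so the helper above can be defined without
    # great_expectations or pyspark installed.
    import great_expectations as gx
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    context = gx.get_context()
    batch_request = build_spark_batch_request(
        context, df, batch_metadata={"source": "demo"}
    )
    print(batch_request)
```

Calling `main()` requires an environment with both `great_expectations` and `pyspark` installed. Keeping the DataFrame out of `add_dataframe_asset()` means the asset definition stays independent of any particular in-memory DataFrame.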