Version: 0.17.19

DataConnector

class great_expectations.datasource.DataConnector(name: str, datasource_name: str, execution_engine: ExecutionEngine, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)

The base class for all Data Connectors.

Data Connectors produce identifying information, called Batch Specs, that Execution Engines use to retrieve individual batches of data. They add flexibility in how data is obtained, for example through time-based partitioning, downsampling, or other techniques appropriate for the Datasource.

For example, a DataConnector could produce a SQL query that logically represents “rows in the Events table with a timestamp on February 7, 2012,” which an SqlAlchemy Datasource could use to materialize a SqlAlchemy Dataset corresponding to that Batch of data and ready for validation.
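To make the division of labor concrete, here is a minimal sketch in plain Python with the standard-library `sqlite3` module. It is not GX code: `build_daily_batch_spec` plays the hypothetical "data connector" role by describing one day of the Events table as a dict, and `materialize_batch` plays the hypothetical "execution engine" role by running that description. The dict layout (`query`, `params`) is an assumption for illustration, not the GX Batch Spec schema.

```python
import sqlite3
from datetime import date, timedelta


def build_daily_batch_spec(table: str, ts_column: str, day: date) -> dict:
    """Hypothetical 'data connector' step: describe one day of rows as a Batch Spec.

    The dict layout is illustrative only, not GX's BatchSpec schema.
    """
    start = day.isoformat()
    end = (day + timedelta(days=1)).isoformat()
    return {
        # Table/column names are trusted inputs here; the date bounds are bound parameters.
        "query": f"SELECT * FROM {table} WHERE {ts_column} >= ? AND {ts_column} < ?",
        "params": [start, end],
    }


def materialize_batch(conn: sqlite3.Connection, batch_spec: dict) -> list:
    """Hypothetical 'execution engine' step: use the Batch Spec to fetch the rows."""
    return conn.execute(batch_spec["query"], batch_spec["params"]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, event_timestamp TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2012-02-06 23:59:00"),
        (2, "2012-02-07 08:30:00"),
        (3, "2012-02-07 17:45:00"),
        (4, "2012-02-08 00:00:00"),
    ],
)

spec = build_daily_batch_spec("events", "event_timestamp", date(2012, 2, 7))
rows = materialize_batch(conn, spec)
print(sorted(row[0] for row in rows))  # [2, 3]
```

The point of the split is that the spec is just data: it can be stored alongside the Batch as metadata and handed to any engine that understands it, while the connector never touches the rows itself.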

A Batch is a sample from a data asset, sliced according to a particular rule, for example an hourly slice of the Events table or "most recent Users records." It is the primary unit of validation in the Great Expectations Data Context. Batches include metadata that identifies how they were constructed: the same Batch Spec assembled by the Data Connector. While not every Datasource can re-fetch a specific batch of data, GX can store snapshots of batches or store metadata from an external data version control system.

Parameters:
  • name – The name of the Data Connector.

  • datasource_name – The name of this Data Connector’s Datasource.

  • execution_engine – The Execution Engine object used by this Data Connector to read the data.

  • batch_spec_passthrough – Dictionary with keys that will be added directly to the batch spec.

  • id – The unique identifier for this Data Connector used when running in cloud mode.
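The `batch_spec_passthrough` description says its keys are added directly to the Batch Spec. A minimal sketch of that idea, assuming a plain shallow merge in which passthrough keys override same-named keys, is below. `apply_batch_spec_passthrough` is a hypothetical helper, not part of the GX API, and GX's actual merge rules may differ. The example keys `reader_method` and `reader_options` are ones commonly passed for file-based Pandas or Spark batches.

```python
from typing import Optional


def apply_batch_spec_passthrough(
    batch_spec: dict, batch_spec_passthrough: Optional[dict] = None
) -> dict:
    """Hypothetical helper: return a new Batch Spec with passthrough keys added.

    Assumes a shallow merge where passthrough keys win on conflict; this is an
    illustration, not GX's implementation.
    """
    merged = dict(batch_spec)  # copy so the caller's spec is not mutated
    merged.update(batch_spec_passthrough or {})
    return merged


base = {"path": "data/events_2012-02-07.csv"}
passthrough = {"reader_method": "read_csv", "reader_options": {"sep": ","}}
result = apply_batch_spec_passthrough(base, passthrough)
print(result)
```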

get_available_data_asset_names() → List[str]

Return the list of asset names known by this data connector.

Returns:

A list of available data asset names.
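As an illustration of what a connector might return here, the sketch below infers asset names from file names with a regular expression. `ToyFilenameDataConnector` is a hypothetical stand-in, not a GX class: it only loosely mirrors the idea of deriving asset names from file names, and the `data_asset_name` group name and file layout are assumptions. Because one asset can span many files (one Batch per file), the names are deduplicated before being returned.

```python
import re
from typing import Iterable, List


class ToyFilenameDataConnector:
    """Hypothetical stand-in, not a GX class: infers asset names from file names.

    Each file that fully matches `pattern` contributes the text captured by the
    named group `data_asset_name`, e.g. "events_2012-02-07.csv" -> "events".
    """

    def __init__(self, name: str, file_names: Iterable[str], pattern: str) -> None:
        self.name = name
        self._file_names = list(file_names)
        self._regex = re.compile(pattern)

    def get_available_data_asset_names(self) -> List[str]:
        names = set()
        for file_name in self._file_names:
            match = self._regex.fullmatch(file_name)
            if match:  # files that do not match the pattern are ignored
                names.add(match.group("data_asset_name"))
        return sorted(names)


connector = ToyFilenameDataConnector(
    name="toy_connector",
    file_names=[
        "events_2012-02-07.csv",
        "events_2012-02-08.csv",
        "users_2012-02-07.csv",
        "README.md",
    ],
    pattern=r"(?P<data_asset_name>[a-z]+)_\d{4}-\d{2}-\d{2}\.csv",
)
print(connector.get_available_data_asset_names())  # ['events', 'users']
```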