Version: 0.15.50

How to configure a RuntimeDataConnector

This guide demonstrates how to configure a RuntimeDataConnector and only applies to the V3 (Batch Request) API. A RuntimeDataConnector allows you to specify a Batch (a selection of records from a Data Asset) using a Runtime Batch Request (provided to a Datasource in order to create a Batch), which is used to create a Validator. A Validator (used to run an Expectation Suite against data) is the key object used to create Expectations (verifiable assertions about data) and to Validate datasets (that is, to apply an Expectation Suite to a Batch).

Prerequisites: This how-to guide assumes you have:

Completed the Getting Started Tutorial
A working installation of Great Expectations
Understood the basics of Datasources in the V3 (Batch Request) API
Learned how to configure a Data Context using test_yaml_config

A RuntimeDataConnector is a special kind of Data Connector that enables you to use a RuntimeBatchRequest to provide a Batch's data directly at runtime. The RuntimeBatchRequest can wrap an in-memory dataframe, a filepath, or a SQL query, and must include batch identifiers that uniquely identify the data (e.g. a run_id from an Airflow DAG run). The batch identifiers that must be passed in at runtime are specified in the RuntimeDataConnector's configuration.
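To make the relationship between configured and runtime batch identifiers concrete, here is a minimal sketch in plain Python. It is not Great Expectations' own validation code: the helper name `check_batch_identifiers` is hypothetical, the identifier value is made up, and the sketch assumes the rule is that every key supplied at runtime must be one of the configured identifier names.

```python
def check_batch_identifiers(configured, supplied):
    """Return the supplied identifiers if every key is configured; raise otherwise.

    Hypothetical helper for illustration; not part of Great Expectations.
    """
    unknown = sorted(set(supplied) - set(configured))
    if unknown:
        raise ValueError(f"Unconfigured batch identifiers: {unknown}")
    return dict(supplied)


# Names declared under `batch_identifiers` in the connector configuration.
configured = ["default_identifier_name"]

# Value chosen at runtime, e.g. an Airflow run_id (this value is made up).
ok = check_batch_identifiers(
    configured, {"default_identifier_name": "run_2023_01_01"}
)

# A key that was never configured is rejected.
try:
    check_batch_identifiers(configured, {"some_other_name": "run_2023_01_01"})
    rejected = False
except ValueError:
    rejected = True
```

The point of the sketch is only that the configuration fixes the identifier names, while the values are chosen each time a batch is requested.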

Steps

1. Instantiate your project's DataContext

Import these necessary packages and modules:

from ruamel import yaml

import great_expectations as gx
from great_expectations.core.batch import RuntimeBatchRequest

context = gx.get_context()

2. Set up a Datasource

All of the examples below assume you’re testing configuration using something like:

YAML:

datasource_yaml = """
name: taxi_datasource
class_name: Datasource
execution_engine:
  class_name: PandasExecutionEngine
data_connectors:
  <DATACONNECTOR NAME GOES HERE>:
    <DATACONNECTOR CONFIGURATION GOES HERE>
"""
context.test_yaml_config(yaml_config=datasource_yaml)

Python:

datasource_config = {
    "name": "taxi_datasource",
    "class_name": "Datasource",
    "module_name": "great_expectations.datasource",
    "execution_engine": {
        "module_name": "great_expectations.execution_engine",
        "class_name": "PandasExecutionEngine",
    },
    "data_connectors": {
        "<DATACONNECTOR NAME GOES HERE>": {
          "<DATACONNECTOR CONFIGURATION GOES HERE>"
        },
    },
}
context.test_yaml_config(yaml.dump(datasource_config))

If you’re not familiar with the test_yaml_config method, please check out: How to configure Data Context components using test_yaml_config

3. Add a RuntimeDataConnector to a Datasource configuration

This basic configuration can be used in multiple ways depending on how the RuntimeBatchRequest is configured:

YAML:

datasource_yaml = r"""
name: taxi_datasource
class_name: Datasource
module_name: great_expectations.datasource
execution_engine:
  module_name: great_expectations.execution_engine
  class_name: PandasExecutionEngine
data_connectors:
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name
"""

Python:

datasource_config = {
    "name": "taxi_datasource",
    "class_name": "Datasource",
    "module_name": "great_expectations.datasource",
    "execution_engine": {
        "module_name": "great_expectations.execution_engine",
        "class_name": "PandasExecutionEngine",
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
    },
}

Once the RuntimeDataConnector is configured, you can add your Datasource (which provides a standard API for accessing and interacting with data from a wide variety of source systems) using:

context.add_datasource(**datasource_config)
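When several Datasources share this shape, the dictionary can be produced by a small function. The sketch below is a convenience under stated assumptions, not an API provided by Great Expectations: `build_runtime_datasource_config` and its parameters are hypothetical, and the output simply mirrors the Python configuration shown in this step.

```python
def build_runtime_datasource_config(
    datasource_name,
    connector_name="default_runtime_data_connector_name",
    batch_identifiers=("default_identifier_name",),
):
    """Build a Datasource config dict with one RuntimeDataConnector.

    Hypothetical helper for illustration; not part of Great Expectations.
    """
    return {
        "name": datasource_name,
        "class_name": "Datasource",
        "module_name": "great_expectations.datasource",
        "execution_engine": {
            "module_name": "great_expectations.execution_engine",
            "class_name": "PandasExecutionEngine",
        },
        "data_connectors": {
            connector_name: {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": list(batch_identifiers),
            },
        },
    }


datasource_config = build_runtime_datasource_config("taxi_datasource")
# The result could then be registered as in this step:
# context.add_datasource(**datasource_config)
```

Keeping the connector name and identifier names as parameters makes it harder for the configuration and the later RuntimeBatchRequest to drift apart.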

Example 1: RuntimeDataConnector for access to file-system data:

At runtime, you would get a Validator from the Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components) by first defining a RuntimeBatchRequest with the path to your data defined in runtime_parameters:

batch_request = RuntimeBatchRequest(
    datasource_name="taxi_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="YOUR_MEANINGFUL_NAME",  # This can be anything that identifies this data_asset for you
    runtime_parameters={"path": "PATH_TO_YOUR_DATA_HERE"},  # Add your path here.
    batch_identifiers={"default_identifier_name": "YOUR_MEANINGFUL_IDENTIFIER"},
)

Next, you would pass that request into context.get_validator:

validator = context.get_validator(
    batch_request=batch_request,
    create_expectation_suite_with_name="MY_EXPECTATION_SUITE_NAME",
)
print(validator.head())

Example 2: RuntimeDataConnector that uses an in-memory DataFrame

At runtime, you would get a Validator from the Data Context by first defining a RuntimeBatchRequest with the DataFrame passed into batch_data in runtime_parameters:

import pandas as pd

path = "PATH_TO_YOUR_DATA_HERE"

df = pd.read_csv(path)

batch_request = RuntimeBatchRequest(
    datasource_name="taxi_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="YOUR_MEANINGFUL_NAME",  # This can be anything that identifies this data_asset for you
    runtime_parameters={"batch_data": df},  # Pass your DataFrame here.
    batch_identifiers={"default_identifier_name": "YOUR_MEANINGFUL_IDENTIFIER"},
)

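Next, you would pass that request into context.get_validator. This call uses expectation_suite_name rather than create_expectation_suite_with_name, so the Expectation Suite must already exist (for example, the one created in Example 1):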

validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name="MY_EXPECTATION_SUITE_NAME",
)
print(validator.head())
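The two examples differ only in the key used inside runtime_parameters. The sketch below captures that in a hypothetical helper, `make_runtime_parameters`, written in plain Python. It assumes that exactly one of `path`, `batch_data`, or `query` should be supplied, matching the three kinds of input described at the top of this guide; the `query` key name is an assumption here, since this guide only demonstrates `path` and `batch_data`.

```python
def make_runtime_parameters(path=None, batch_data=None, query=None):
    """Return a runtime_parameters dict holding exactly one data source.

    Hypothetical helper for illustration; not part of Great Expectations.
    """
    candidates = {"path": path, "batch_data": batch_data, "query": query}
    provided = {key: value for key, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "Supply exactly one of path, batch_data, or query; "
            f"got {sorted(provided)}"
        )
    return provided


# Example 1 style: point at a file on disk (placeholder path).
file_params = make_runtime_parameters(path="PATH_TO_YOUR_DATA_HERE")

# Supplying two sources at once is rejected.
try:
    make_runtime_parameters(path="a.csv", query="SELECT 1")
    rejected = False
except ValueError:
    rejected = True
```

The returned dictionary is what would be passed as `runtime_parameters` when constructing a RuntimeBatchRequest as in the examples above.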

Additional Notes

To view the full script used in this page, see it on GitHub:

how_to_configure_a_runtimedataconnector.py
