Version: 0.17.19

Batch Request

A Batch Request specifies a Batch of data (a Batch is a selection of records from a Data Asset). It can be created by using the build_batch_request method found on a Data Asset (a collection of records within a Datasource, which is usually named based on the underlying data system and sliced to correspond to a desired specification).

A Batch Request contains all the details needed to query the appropriate underlying data. The relationship between a Batch Request and the data returned as a Batch is guaranteed: the same Batch Request always identifies the same Batch or Batches. If multiple Batches fit the criteria of the user-provided options argument to the build_batch_request method on a Data Asset, the Batch Request will return all of the matching Batches.
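To make the matching behavior concrete, here is a minimal pure-Python sketch of how partial options can select several Batches. It does not call Great Expectations: the file names and the helper matching_batches are hypothetical, and the regular expression only mirrors the batching_regex style used for file-based Data Assets. Treat it as an illustration of the idea under those assumptions, not as the library's implementation.

```python
import re

# Hypothetical file names standing in for the contents of a data directory.
FILE_NAMES = [
    "yellow_tripdata_sample_2019-01.csv",
    "yellow_tripdata_sample_2019-02.csv",
    "yellow_tripdata_sample_2020-01.csv",
    "notes.txt",
]

# The named groups play the role of the allowed option keys.
BATCHING_REGEX = re.compile(
    r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
)


def matching_batches(options):
    """Return (file_name, metadata) pairs whose named groups agree with options.

    Option keys that are absent or set to None act as wildcards (an assumption
    of this sketch).
    """
    unknown = set(options) - set(BATCHING_REGEX.groupindex)
    if unknown:
        raise ValueError(f"Unknown option keys: {sorted(unknown)}")
    results = []
    for name in FILE_NAMES:
        match = BATCHING_REGEX.fullmatch(name)
        if match is None:
            continue  # the file does not belong to this asset
        metadata = match.groupdict()
        if all(v is None or metadata[k] == v for k, v in options.items()):
            results.append((name, metadata))
    return results


# Only "year" is given, so every month of 2019 matches.
several = matching_batches({"year": "2019"})
# Both keys are given, so at most one file matches.
single = matching_batches({"year": "2019", "month": "02"})
print([name for name, _ in several])
print([name for name, _ in single])
```

Leaving an option out widens the selection, while supplying every key narrows it to a single Batch.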

If you are using an interactive session, you can inspect the allowed keys for the options argument for a Data Asset by printing the batch_request_options attribute.

Relationship to other objects

A Batch Request is always used when Great Expectations builds a Batch. Any time you interact with something that requires a Batch of data (such as a Profiler, which generates Metrics and candidate Expectations from data; a Checkpoint, the primary means for validating data in a production deployment of Great Expectations; or a Validator, which is used to run an Expectation Suite against data) you will use a Batch Request to create the Batch that is used.

Use cases

If you are using a Custom Profiler or the interactive method of creating Expectations, you will need to provide a Batch of data for the Profiler to analyze or your manually defined Expectations to test against. For both of these processes, you will therefore need a Batch Request to get the Batch.

When Validating data with a Checkpoint (Validating is the act of applying an Expectation Suite to a Batch), you will need to provide one or more Batch Requests and one or more Expectation Suites (collections of verifiable assertions about data). You can do this at runtime, or by defining Batch Request and Expectation Suite pairs in advance, in the Checkpoint's configuration.

For more information on setting up Batch Request/Expectation Suite pairs in a Checkpoint configuration, see How to add validations data or suites to a Checkpoint.
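As a rough sketch of what such pairs look like, the snippet below builds a Checkpoint configuration as plain Python data. The names my_checkpoint, my_suite, and other_suite are hypothetical. The dictionary form of the Batch Request (datasource_name, data_asset_name, options) is an assumption about how a Batch Request is represented in configuration, so check it against your installed version before relying on it.

```python
# Dictionary form of a Batch Request (assumed shape). In practice you would
# usually pass the object returned by asset.build_batch_request instead.
batch_request = {
    "datasource_name": "my_pandas_datasource",
    "data_asset_name": "csv_asset",
    "options": {"year": "2019", "month": "02"},
}

# Each entry in "validations" pairs one Batch Request with one Expectation Suite.
checkpoint_config = {
    "name": "my_checkpoint",  # hypothetical Checkpoint name
    "validations": [
        {"batch_request": batch_request, "expectation_suite_name": "my_suite"},
        {"batch_request": batch_request, "expectation_suite_name": "other_suite"},
    ],
}

# List the pairs that the Checkpoint would validate.
for validation in checkpoint_config["validations"]:
    request = validation["batch_request"]
    print(request["data_asset_name"], "->", validation["expectation_suite_name"])
```

Here the same Batch Request is paired with two different Expectation Suites, and a different Batch Request could be used in each entry just as well.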

Guaranteed relationships

The relationship between a Batch and the Batch Request that generated it is guaranteed. A Batch Request includes all the information necessary to identify a specific Batch or Batches.

Batches are always built using a Batch Request. When the Batch is built, metadata is attached to the Batch object and is available via the Batch's metadata attribute. This metadata contains all the option values necessary to recreate the Batch Request that corresponds to the Batch.

Access

You will rarely need to access an existing Batch Request. Instead, you will usually build a new Batch Request from a Data Asset. A Batch Request can also be saved to a configuration file when you save an object that required a Batch Request for setup, such as a Checkpoint. Once you receive a Batch back, it is unlikely you will need to refer to the Batch Request that generated it. Indeed, if the Batch Request was part of a configuration, Great Expectations will simply initialize a new copy rather than load an existing one when the Batch Request is needed.

Create

You can create a Batch Request from a Data Asset by calling build_batch_request. Here is an example of configuring a Pandas Filesystem Asset and creating a Batch Request:

import great_expectations as gx

context = gx.get_context()

# data_directory is the full path to a directory containing csv files
datasource = context.sources.add_pandas_filesystem(
    name="my_pandas_datasource", base_directory=data_directory
)

# The batching_regex should match file names in the data_directory
asset = datasource.add_csv_asset(
    name="csv_asset",
    batching_regex=r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv",
    order_by=["year", "month"],
)

batch_request = asset.build_batch_request(options={"year": "2019", "month": "02"})

The options you pass in to specify a Batch will vary depending on how the specific Data Asset was configured. To look at the allowed keys for the options dictionary, you can do the following:

options = asset.batch_request_options
print(options)