Batch Request
A Batch Request specifies a Batch (a selection of records from a Data Asset) of data. It can be created by using the build_batch_request method found on a Data Asset (a collection of records within a Datasource, usually named after the underlying data system and sliced to correspond to a desired specification).
A Batch Request contains all the necessary details to query the appropriate underlying data. The relationship between a Batch Request and the data returned as a Batch is guaranteed. If a Batch Request identifies multiple Batches that fit the criteria of the user-provided options argument to the build_batch_request method on a Data Asset, the Batch Request will return all of the matching Batches.
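To make the matching behavior concrete, here is a minimal pure-Python sketch. It is not how Great Expectations implements Batch Requests. It only assumes that each Batch can be described by a set of option values (the hypothetical year and month keys below) and that an option left out of the request, or set to None, matches any value.

```python
from typing import Optional

# Hypothetical option values describing three available Batches.
available_batches = [
    {"year": "2019", "month": "01"},
    {"year": "2019", "month": "02"},
    {"year": "2020", "month": "01"},
]


def matching_batches(options: dict[str, Optional[str]]) -> list[dict[str, str]]:
    """Return every Batch description consistent with the given options.

    Assumption: an omitted or None option acts as a wildcard.
    """
    return [
        batch
        for batch in available_batches
        if all(
            value is None or batch.get(key) == value
            for key, value in options.items()
        )
    ]


# Fully specified options identify a single Batch.
print(matching_batches({"year": "2019", "month": "02"}))
# Partially specified options identify every Batch from 2019.
print(matching_batches({"year": "2019"}))
```

The more options you specify, the narrower the selection; an empty options dictionary in this sketch selects every available Batch.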
If you are using an interactive session, you can inspect the allowed keys for the options argument for a Data Asset by printing the batch_request_options attribute.
Relationship to other objects
A Batch Request is always used when Great Expectations builds a Batch. Any time you interact with something that requires a Batch of data, you will use a Batch Request to create the Batch that is used. Examples include a Profiler (generates Metrics and candidate Expectations from data), a Checkpoint (the primary means for validating data in a production deployment of Great Expectations), and a Validator (used to run an Expectation Suite against data).
Use cases
If you are using a Custom Profiler or the interactive method of creating Expectations, you will need to provide a Batch of data for the Profiler to analyze or your manually defined Expectations to test against. For both of these processes, you will therefore need a Batch Request to get the Batch.
When Validating data with a Checkpoint (Validating is the act of applying an Expectation Suite to a Batch), you will need to provide one or more Batch Requests and one or more Expectation Suites (collections of verifiable assertions about data). You can do this at runtime, or by defining Batch Request and Expectation Suite pairs in advance, in the Checkpoint's configuration.
For more information on setting up Batch Request/Expectation Suite pairs in a Checkpoint configuration, see How to add validations data or suites to a Checkpoint.
Guaranteed relationships
The relationship between a Batch and the Batch Request that generated it is guaranteed. A Batch Request includes all the information necessary to identify a specific Batch or Batches.
Batches are always built using a Batch Request. When the Batch is built, metadata is attached to the Batch object and is available via the Batch metadata attribute. This metadata contains all the option values necessary to recreate the Batch Request that corresponds to the Batch.
Access
You will rarely need to access an existing Batch Request. Instead, you will usually build a new Batch Request from a Data Asset. A Batch Request can also be saved to a configuration file when you save an object that required a Batch Request for setup, such as a Checkpoint. Once you receive a Batch back, it is unlikely you will need to refer to the Batch Request that generated it. Indeed, if the Batch Request was part of a configuration, Great Expectations will simply initialize a new copy rather than load an existing one when the Batch Request is needed.
Create
You can create a Batch Request from a Data Asset by calling build_batch_request. Here is an example of configuring a Pandas Filesystem Asset and creating a Batch Request:
import great_expectations as gx

context = gx.get_context()

# data_directory must already be defined as the full path to a directory containing csv files
datasource = context.sources.add_pandas_filesystem(
    name="my_pandas_datasource", base_directory=data_directory
)
# The batching_regex should match file names in the data_directory.
# The named groups (year, month) become the keys of the options dictionary.
asset = datasource.add_csv_asset(
    name="csv_asset",
    batching_regex=r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv",
    order_by=["year", "month"],
)

# Request the Batch for February 2019
batch_request = asset.build_batch_request(options={"year": "2019", "month": "02"})
The options you pass in to specify a Batch will vary depending on how the specific Data Asset was configured. To look at the keys for the options dictionary, you can do the following:
options = asset.batch_request_options
print(options)
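For a filesystem asset like the one above, the option keys are expected to line up with the named groups in the batching_regex. The sketch below uses only Python's re module to show which names that pattern defines and what values a sample file name would yield. The file name is a made-up example, and Great Expectations may expose additional keys beyond the named groups, so treat the printed batch_request_options attribute as authoritative.

```python
import re

# Same pattern as the asset above, with the dot before "csv" escaped.
batching_regex = re.compile(
    r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
)

# Named groups defined by the pattern, in order of appearance.
group_names = list(batching_regex.groupindex)
print(group_names)

# Hypothetical file name, used only for illustration.
match = batching_regex.fullmatch("yellow_tripdata_sample_2019-02.csv")
sample_options = match.groupdict() if match else {}
print(sample_options)
```

A dictionary shaped like sample_options is what the options argument of build_batch_request expects for this asset, under the assumption stated above.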