Validate multiple Batches from a Batch Request with a single Checkpoint
By default, a Checkpoint only validates the last Batch included in a Batch Request. This guide shows how to use a Python loop and the Checkpoint validations parameter to validate multiple Batches identified by a single Batch Request.
Prerequisites
Create a Batch Request with multiple Batches
The following Python code creates a Batch Request that includes every available Batch in a Data Asset named asset:
batch_request = asset.build_batch_request()
A Batch Request can only retrieve multiple Batches from a Data Asset that has been configured to include more than the default single Batch.
For a Filesystem Data Source, the batching_regex argument determines which files are included in a Data Asset. Each file that matches the batching_regex becomes a single Batch.
Data Assets in a SQL Data Source include a single Batch by default. You can use splitters to split the single Batch into multiple Batches.
For more information on partitioning a Data Asset into Batches, see Manage Data Assets.
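As a rough illustration of how a batching_regex selects files, the following standard-library sketch applies a pattern to a few file names. The pattern and file names are hypothetical, and re.match only approximates how the library matches paths. Each matching name stands for one Batch.

```python
import re

# Hypothetical pattern and file names, used only for illustration.
batching_regex = r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
file_names = [
    "yellow_tripdata_sample_2019-01.csv",
    "yellow_tripdata_sample_2019-02.csv",
    "README.md",
]

# Map each matching file name to the values captured by the named groups.
# A file that does not match the pattern would not produce a Batch.
matched = {}
for name in file_names:
    match = re.match(batching_regex, name)
    if match:
        matched[name] = match.groupdict()

for name, groups in matched.items():
    print(name, groups)
```

In this sketch two of the three names match, so a Data Asset configured with a comparable pattern would be expected to contain two Batches.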
Get a list of Batches from the Batch Request
Use the same Data Asset that your Batch Request was built from to retrieve a list of Batches with the following code:
batch_list = asset.get_batch_list_from_batch_request(batch_request)
Convert the list of Batches into a list of Batch Requests
A Checkpoint validates Batch Requests, but it only validates the last Batch found in each Batch Request. To validate every Batch, convert the list of Batches into a list of Batch Requests, each of which returns a single corresponding Batch.
batch_request_list = [batch.batch_request for batch in batch_list]
Build a validations list
The validations parameter of the Checkpoint class takes a list of dictionaries. Each dictionary pairs one Batch Request with the Expectation Suite it should be validated against. The following code creates a valid validations list and associates each Batch Request with an Expectation Suite named example_suite:
validations = [
    {"batch_request": batch.batch_request, "expectation_suite_name": "example_suite"}
    for batch in batch_list
]
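The shape of the validations list can be checked without a live Data Context. The sketch below wraps the list comprehension in a small helper and exercises it with stand-in objects. FakeBatch and build_validations are hypothetical names introduced for illustration; they are not part of the library, and a real Batch exposes a Batch Request object rather than a plain dictionary.

```python
from dataclasses import dataclass


@dataclass
class FakeBatch:
    """Hypothetical stand-in for a Batch; only batch_request is modeled."""
    batch_request: dict


def build_validations(batch_list, suite_name):
    """Pair each Batch's Batch Request with one Expectation Suite name."""
    return [
        {"batch_request": batch.batch_request, "expectation_suite_name": suite_name}
        for batch in batch_list
    ]


# Two stand-in Batches with made-up Batch Request contents.
batch_list = [
    FakeBatch(batch_request={"options": {"month": "01"}}),
    FakeBatch(batch_request={"options": {"month": "02"}}),
]
validations = build_validations(batch_list, "example_suite")
print(len(validations))
```

Keeping the pairing in one helper makes it easy to confirm that the list contains one entry per Batch before passing it to a Checkpoint.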
Run Checkpoint
The validations list, containing the pairings of Batch Requests and Expectation Suites, can now be passed to a single Checkpoint instance, which validates each Batch Request against its corresponding Expectation Suite. This effectively validates each Batch included in the original multiple-Batch Batch Request.
checkpoint = context.add_or_update_checkpoint(
    name="my_taxi_validator_checkpoint", validations=validations
)
checkpoint_result = checkpoint.run()
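Because one Checkpoint run now covers several Batches, it can help to count how many individual validations failed. The sketch below assumes the result object exposes a list_validation_results() method whose items carry a success attribute. That is an assumption about the installed version and should be checked against its API reference. count_failed_validations is a hypothetical helper, demonstrated here with a stub rather than a real Checkpoint result.

```python
from types import SimpleNamespace


def count_failed_validations(checkpoint_result):
    """Count per-Batch validation results whose success flag is False.

    Assumes checkpoint_result.list_validation_results() returns objects
    with a boolean `success` attribute (an assumption, not verified here).
    """
    results = checkpoint_result.list_validation_results()
    return sum(1 for result in results if not result.success)


# Stub standing in for a real Checkpoint result: one pass, one failure.
stub_result = SimpleNamespace(
    list_validation_results=lambda: [
        SimpleNamespace(success=True),
        SimpleNamespace(success=False),
    ]
)
failed = count_failed_validations(stub_result)
print(failed)
```

If the assumption holds for your version, passing checkpoint_result to the helper would report how many of the validated Batches did not meet the Expectation Suite.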
Review the Validation Results
After the validations run, use the following code to build and view the Validation Results as Data Docs.
context.build_data_docs()
context.open_data_docs()