Version: 0.16.16

How to validate data by running a Checkpoint

This guide will help you Validate your data (that is, apply an Expectation Suite to a Batch) by running a Checkpoint, the primary means for validating data in a production deployment of Great Expectations.

The best way to Validate data with Great Expectations is to use a Checkpoint. A Checkpoint identifies which Expectation Suites (collections of verifiable assertions about data) to run against which Data Asset (a collection of records within a Datasource) and Batch (a selection of records from a Data Asset, described by a Batch Request), and which Actions to take based on the results of those tests. An Action is a Python class with a run method that takes a Validation Result and does something with it.

Succinctly: Checkpoints are used to test your data and take action based on the results.
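
As a rough illustration of those three pieces, the sketch below lays out a Checkpoint configuration as a plain Python dictionary. Each entry under validations pairs a Batch Request with an Expectation Suite name, and action_list names what to do with the results. The datasource, asset, and suite names are borrowed from the example later in this guide. The key names and the two action class names follow the shape commonly seen in Great Expectations Checkpoint configurations, but treat them as assumptions to verify against your installed version. The summarize helper is hypothetical and exists only to make the structure visible.

```python
# Illustrative only: a Checkpoint configuration written as a plain dict.
# Key names and action class names are assumptions to check against your version.
checkpoint_config = {
    "name": "my_checkpoint",
    "validations": [
        {
            # which Batch of which Data Asset to validate
            "batch_request": {
                "datasource_name": "taxi_source",
                "data_asset_name": "yellow_tripdata",
            },
            # which Expectation Suite to run against it
            "expectation_suite_name": "yellow_tripdata_suite",
        }
    ],
    # what to do with the Validation Results
    "action_list": [
        {
            "name": "store_validation_result",
            "action": {"class_name": "StoreValidationResultAction"},
        },
        {
            "name": "update_data_docs",
            "action": {"class_name": "UpdateDataDocsAction"},
        },
    ],
}


def summarize(config):
    """Hypothetical helper: list suite/asset pairings, then the actions."""
    lines = []
    for validation in config["validations"]:
        request = validation["batch_request"]
        lines.append(
            f'{validation["expectation_suite_name"]} -> '
            f'{request["datasource_name"]}/{request["data_asset_name"]}'
        )
    for action in config["action_list"]:
        lines.append(f'action: {action["action"]["class_name"]}')
    return lines
```

Calling summarize(checkpoint_config) returns one line per suite and asset pairing followed by one line per action, which mirrors the "test your data, then take action" description above.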

Prerequisites

Completion of the Quickstart guide.
A working installation of Great Expectations.
A configured Data Context.
A configured Expectation Suite.
A configured Checkpoint.

You can run the Checkpoint from the CLI (Command Line Interface) in a Terminal shell or using Python.

If you have already created and saved a Checkpoint, the following code snippet will retrieve it from your Data Context and run it:

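import sys

import great_expectations as gx

# load the Data Context that holds the saved Checkpoint
context = gx.get_context()

# run the saved Checkpoint against the named Data Asset
result = context.run_checkpoint(
    checkpoint_name="my_checkpoint",
    batch_request={
        "datasource_name": "taxi_source",
        "data_asset_name": "yellow_tripdata",
    },
    run_name=None,
)

# exit with a non-zero status if any validation failed
if not result["success"]:
    print("Validation failed!")
    sys.exit(1)

print("Validation succeeded!")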
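
If the script runs inside a scheduler or CI job, it can help to isolate the success check in small functions that can be tested without a live Data Context. The following is a minimal sketch, assuming only that the result object supports result["success"] as in the snippet above. The names exit_code_for and status_message are hypothetical helpers, not part of Great Expectations.

```python
def exit_code_for(result) -> int:
    """Map a checkpoint result to a process exit code (0 = passed, 1 = failed).

    Assumes `result` supports key access to "success"; a plain dict works
    as a stand-in when testing.
    """
    return 0 if result["success"] else 1


def status_message(result) -> str:
    """Return the message matching the result's success flag."""
    return "Validation succeeded!" if result["success"] else "Validation failed!"


# Intended use at the end of a script (left as a comment so that importing
# this module never terminates the interpreter):
#
#     print(status_message(result))
#     sys.exit(exit_code_for(result))
```

Keeping the decision logic separate from sys.exit makes it straightforward to unit test both outcomes with a dictionary in place of a real result.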

If you do not have a Checkpoint, the prerequisite guides mentioned above will take you through the necessary steps. Alternatively, the concise example below shows how to connect to data, create an Expectation Suite using a Validator, and create a Checkpoint, saving everything to the Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components) along the way.

# setup
import sys
import great_expectations as gx

context = gx.get_context()

# starting from scratch, we add a datasource and asset
# (data_directory is assumed to be defined already: the folder holding the CSV files)
datasource = context.sources.add_pandas_filesystem(
    name="taxi_source", base_directory=data_directory
)

asset = datasource.add_csv_asset(
    "yellow_tripdata",
    batching_regex=r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv",
    order_by=["-year", "month"],
)

# use a validator to create an expectation suite
validator = context.get_validator(
    datasource_name="taxi_source", data_asset_name="yellow_tripdata"
)
validator.expect_column_values_to_not_be_null("pickup_datetime")
context.add_expectation_suite("yellow_tripdata_suite")

# create a checkpoint
checkpoint = gx.checkpoint.SimpleCheckpoint(
    name="my_checkpoint",
    data_context=context,
    expectation_suite_name="yellow_tripdata_suite",
)

# add (save) the checkpoint to the data context
context.add_checkpoint(checkpoint=checkpoint)
cp = context.get_checkpoint(name="my_checkpoint")
assert cp.name == "my_checkpoint"
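
To make the batching_regex and order_by arguments in the example above more concrete, here is a standalone sketch that uses only the standard library. It shows how the named groups year and month can be pulled out of file names and how an ordering of "-year", "month" could be read as year descending, then month ascending. This mimics the idea rather than reproducing Great Expectations internals: the function batch_keys and the sample file names are hypothetical, and the reading of the "-" prefix as "descending" is an assumption.

```python
import re

# Same pattern as in the example above, with the dot before "csv" escaped.
BATCHING_REGEX = re.compile(
    r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
)


def batch_keys(filenames):
    """Return the year/month groups of matching files, ordered by
    year descending and then month ascending (assumed meaning of
    order_by=["-year", "month"])."""
    matched = []
    for name in filenames:
        match = BATCHING_REGEX.fullmatch(name)
        if match:  # files that do not match the pattern are ignored
            matched.append(match.groupdict())
    # Python's sort is stable, so sort by the secondary key first,
    # then by the primary key.
    matched.sort(key=lambda keys: keys["month"])
    matched.sort(key=lambda keys: keys["year"], reverse=True)
    return matched


# Hypothetical directory listing used for illustration.
sample_files = [
    "yellow_tripdata_sample_2019-02.csv",
    "yellow_tripdata_sample_2020-01.csv",
    "notes.txt",
    "yellow_tripdata_sample_2019-01.csv",
]
ordered = batch_keys(sample_files)
```

With the sample listing, ordered starts with the 2020 file, followed by the two 2019 files in month order, and notes.txt is skipped because it does not match the pattern.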

If you have already created and saved a Checkpoint, then you can run the Checkpoint using the CLI.

great_expectations checkpoint run my_checkpoint

Additional notes

This command returns POSIX status codes and prints messages as follows:

+--------------------------------+-----------------+-----------------------+
| **Situation**                  | **Return code** | **Message**           |
+--------------------------------+-----------------+-----------------------+
| all validations passed         | 0               | Validation succeeded! |
+--------------------------------+-----------------+-----------------------+
| one or more validations failed | 1               | Validation failed!    |
+--------------------------------+-----------------+-----------------------+
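
Because the outcome is reported through the return code, another program can act on it. The sketch below is one hedged way to do that from Python with the standard subprocess module. It assumes the great_expectations executable is on the PATH and that the command is run from the project directory. The helpers build_command and checkpoint_passed are hypothetical names, and the run parameter exists only so the logic can be exercised without invoking the real CLI.

```python
import subprocess
from typing import Callable, List


def build_command(checkpoint_name: str) -> List[str]:
    """Build the CLI invocation shown in this guide as an argument list."""
    return ["great_expectations", "checkpoint", "run", checkpoint_name]


def checkpoint_passed(
    checkpoint_name: str, run: Callable = subprocess.run
) -> bool:
    """Run the Checkpoint via the CLI and report whether it passed.

    Return code 0 means all validations passed; any other code is
    treated as a failure.
    """
    # check=False: inspect the return code ourselves instead of raising
    completed = run(build_command(checkpoint_name), check=False)
    return completed.returncode == 0
```

A caller could then write `if not checkpoint_passed("my_checkpoint"): ...` to halt a pipeline or send a notification when validation fails.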
