Data Validation workflow
Great Expectations recommends using Checkpoints to validate data. Checkpoints validate data, save Validation Results, run any Actions you have specified, and finally create Data Docs with their results. A Checkpoint can be reused to Validate data in the future, and you can create and configure additional Checkpoints for different business requirements.
After you've created your Checkpoint, configured it, and specified the Actions you want it to take based on the Validation Results, all you'll need to do in the future is run the Checkpoint.
Prerequisites
- Completion of the Quickstart guide
Create a Checkpoint
See How to create a new Checkpoint.
Configure your Checkpoint
When you configure your Checkpoint, you can add validation data or require that validation data be provided at run time. You can also add Expectation Suites, and you can add Actions that the Checkpoint executes when it finishes Validating data. To learn more about Checkpoint configuration, see Checkpoints and Actions.
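As a rough sketch of what a configured Checkpoint contains, the dict below mirrors the YAML-style configuration used by Checkpoints in 0.15.x-era releases of Great Expectations: a `validations` list of Batch Request and Expectation Suite pairs, followed by an `action_list`. The datasource, data connector, asset, and suite names are hypothetical placeholders, and key names may differ in your version, so treat this as an illustration rather than a drop-in config.

```python
# Sketch of a Checkpoint configuration as a Python dict, mirroring the
# 0.15.x-era YAML layout. All datasource/asset/suite names are hypothetical.
checkpoint_config = {
    "name": "my_checkpoint",
    "config_version": 1.0,
    "class_name": "Checkpoint",
    "run_name_template": "%Y%m%d-%H%M%S-my-run-name-template",
    # Each entry pairs one Batch Request with one Expectation Suite.
    "validations": [
        {
            "batch_request": {
                "datasource_name": "my_datasource",
                "data_connector_name": "default_inferred_data_connector_name",
                "data_asset_name": "orders.csv",
            },
            "expectation_suite_name": "orders_suite",
        },
    ],
    # Actions run in list order after the Checkpoint finishes Validating data.
    "action_list": [
        {
            "name": "store_validation_result",
            "action": {"class_name": "StoreValidationResultAction"},
        },
        {
            "name": "update_data_docs",
            "action": {"class_name": "UpdateDataDocsAction"},
        },
    ],
}
```

In releases of that era, a dict like this could be passed as keyword arguments to `context.add_checkpoint(...)`; check the API reference for your version before relying on that. Leaving `validations` out of the stored configuration and supplying it when the Checkpoint runs corresponds to requiring validation data at run time.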
Checkpoints, Batch Requests, and Expectation Suites
Batch Requests specify the data that a Checkpoint Validates. You can add validation data to your Checkpoint by assigning it Batch Requests, or you can specify that a Batch Request must be provided at run time.
Expectation Suites contain the Expectations that the Checkpoint runs against the validation data specified in its Batch Requests. Checkpoints are assigned Expectation Suites and Batch Requests in pairs, and when the Checkpoint is run it Validates each of its Expectation Suites against the data provided by its paired Batch Request.
For more information on adding Batch Requests and Expectation Suites to a Checkpoint, see How to add validations data or suites to a Checkpoint.
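To make the pairing concrete, here is a plain-Python model of it, not Great Expectations code. It assumes hypothetical stand-ins: a "batch" is a list of row dicts, an "Expectation" is a named predicate over a batch, and a "suite" is a list of such Expectations. The point it illustrates is that each suite is evaluated only against the batch it is paired with.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-ins for illustration only.
Row = Dict[str, Any]
Expectation = Tuple[str, Callable[[List[Row]], bool]]
Suite = List[Expectation]


def validate_pairs(pairs: List[Tuple[List[Row], Suite]]) -> List[Dict[str, bool]]:
    """Validate each suite only against the batch it is paired with."""
    results = []
    for batch, suite in pairs:
        # One result dict per pair: Expectation name -> passed or failed.
        results.append({name: check(batch) for name, check in suite})
    return results


orders = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": -5.0}]
users = [{"id": 1, "email": "a@example.com"}]

orders_suite: Suite = [
    ("amount_non_negative", lambda b: all(r["amount"] >= 0 for r in b)),
]
users_suite: Suite = [
    ("email_not_null", lambda b: all(r["email"] is not None for r in b)),
]

results = validate_pairs([(orders, orders_suite), (users, users_suite)])
# results == [{"amount_non_negative": False}, {"email_not_null": True}]
```

Because the orders suite never sees the users batch (and vice versa), adding a new pair extends what the Checkpoint checks without changing how existing pairs are evaluated.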
Checkpoints and Actions
Actions are optional and are executed after a Checkpoint validates data. Some of the more common Actions include updating Data Docs, sending emails, posting Slack notifications, or sending custom notifications. You can also create custom Actions to perform business-specific tasks after a Checkpoint Validates data. For more information about Actions, see Configure Actions.
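An Action is a Python class with a `run` method that takes a Validation Result and does something with it. The sketch below models only that idea in plain Python: `NotifyOnFailureAction` and its `send` callback are hypothetical, and the input is assumed to be a dict shaped like a stored Validation Result (`success`, `meta.expectation_suite_name`, and `statistics` counts). Real custom Actions instead subclass the Action base class that ships with Great Expectations, whose method signature varies between versions.

```python
from typing import Any, Callable, Dict


class NotifyOnFailureAction:
    """Hypothetical Action that sends a message when a validation fails.

    Not a Great Expectations class: real custom Actions subclass the
    library's own Action base class instead.
    """

    def __init__(self, send: Callable[[str], None]) -> None:
        # `send` is any callable that delivers a message (email, chat, log).
        self.send = send

    def run(self, validation_result: Dict[str, Any]) -> None:
        # Do nothing when every Expectation in the suite passed.
        if validation_result["success"]:
            return
        suite = validation_result["meta"]["expectation_suite_name"]
        stats = validation_result["statistics"]
        self.send(
            f"{suite}: {stats['unsuccessful_expectations']} of "
            f"{stats['evaluated_expectations']} Expectations failed"
        )


# Example: collect messages in a list instead of sending them anywhere.
sent = []
action = NotifyOnFailureAction(send=sent.append)
action.run({
    "success": False,
    "meta": {"expectation_suite_name": "orders_suite"},
    "statistics": {"evaluated_expectations": 3, "unsuccessful_expectations": 1},
})
# sent == ["orders_suite: 1 of 3 Expectations failed"]
```

Injecting `send` keeps the notification channel separate from the decision of when to notify, so the same class could post to a chat webhook in production and append to a list in a test.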
Run your Checkpoint
See How to validate data by running a Checkpoint.
Validation Results and Data Docs
When a Checkpoint finishes Validation, its Validation Results are automatically compiled as Data Docs. You can find these results in the Validation Results tab of your Data Docs, and clicking an individual Validation Result in the Data Docs displays a detailed list of all the Expectations that ran, as well as which Expectations passed or failed.
For more information, see the Data Docs documentation.
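The passed/failed view in Data Docs is rendered from the Validation Result itself, so the same split can be computed directly from a stored result. The sketch below assumes the result is already loaded as a dict shaped like the JSON Great Expectations stores (a top-level `success`, and a `results` list whose entries carry `success` and `expectation_config.expectation_type`). Those field names follow 0.15.x-era results and may differ in other versions; the sample values are illustrative, not real output.

```python
from typing import Any, Dict, List, Tuple


def summarize(validation_result: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Split the Expectations in a Validation Result into passed and failed lists."""
    passed: List[str] = []
    failed: List[str] = []
    for r in validation_result["results"]:
        expectation_type = r["expectation_config"]["expectation_type"]
        (passed if r["success"] else failed).append(expectation_type)
    return passed, failed


# Illustrative sample shaped like a stored Validation Result.
sample_result = {
    "success": False,
    "results": [
        {"success": True,
         "expectation_config": {"expectation_type": "expect_column_values_to_not_be_null"}},
        {"success": False,
         "expectation_config": {"expectation_type": "expect_column_values_to_be_between"}},
    ],
}

passed, failed = summarize(sample_result)
# passed == ["expect_column_values_to_not_be_null"]
# failed == ["expect_column_values_to_be_between"]
```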
Checkpoint reuse
After your Checkpoint is created and you have used it to validate data, you can reuse it. If you want your Checkpoint to run on a schedule, see How to deploy a scheduled Checkpoint with cron. If your pipeline architecture supports it, you can also run your Checkpoints from Python scripts. Regardless of how you run your Checkpoint, Actions let you customize what is done with the generated Validation Results.
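When a Checkpoint runs from a script, whether under cron or as a pipeline step, it is often useful to turn its outcome into a process exit code so the caller can detect failures. The sketch below assumes a hypothetical `run_checkpoint` callable that returns `True` when every validation succeeded. In 0.15.x to 0.17.x-style code, that callable might wrap something like `context.run_checkpoint(checkpoint_name="my_checkpoint").success`, with `context` obtained from `great_expectations.get_context()`, but verify the method names against your installed version.

```python
from typing import Callable


def main(run_checkpoint: Callable[[], bool]) -> int:
    """Run a Checkpoint and translate its outcome into a process exit code.

    `run_checkpoint` is a hypothetical callable that runs the Checkpoint and
    returns True when every validation succeeded.
    """
    success = run_checkpoint()
    print("Checkpoint passed" if success else "Checkpoint failed")
    # Exit code 0 signals success; any non-zero code signals failure.
    return 0 if success else 1
```

A script's entry point would then call `sys.exit(main(...))` with the real Checkpoint call in place of the hypothetical callable. Keeping the Great Expectations call outside `main` lets the exit-code logic be tested without a Data Context.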