Validate Data: Overview
- We recommend that you first complete Step 4: Validate data of the Getting Started tutorial.
When you complete this step for the first time, you will have created and run a Checkpoint (the primary means for validating data in a production deployment of Great Expectations). This Checkpoint can then be reused to Validate data (that is, to apply an Expectation Suite to a Batch) in the future, and you can also create and configure additional Checkpoints to cover different use cases, should you have them.
The Validate Data process
The recommended workflow for validating data is through the use of Checkpoints. Checkpoints handle the rest of the Validation process for you: they Validate data, save Validation Results (generated when data is Validated against an Expectation or Expectation Suite), run any Actions you have specified (an Action is a Python class with a run method that takes a Validation Result and does something with it), and finally create Data Docs (human-readable documentation generated from Great Expectations metadata detailing Expectations, Validation Results, etc.) with their results.
Because Checkpoints are reusable, they make validating data a simple process. Once you have created your Checkpoint, configured it to your specifications, and specified any Actions you want it to take based on the Validation Results, all you need to do in the future is tell the Checkpoint to run.
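The sequence described above (Validate, save the result, run Actions) can be sketched with a toy model in plain Python. This is an illustration only, not the Great Expectations implementation: `ToyCheckpoint`, `no_null_ids`, `has_rows`, and `notify` are hypothetical names invented for this sketch, and the Data Docs step is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy stand-ins, NOT the Great Expectations classes.
Row = Dict[str, object]
Expectation = Callable[[List[Row]], bool]  # returns True when the batch passes
Action = Callable[[dict], None]            # receives one validation result


@dataclass
class ToyCheckpoint:
    name: str
    suite: Dict[str, Expectation]  # expectation name -> check function
    actions: List[Action] = field(default_factory=list)
    stored_results: List[dict] = field(default_factory=list)

    def run(self, batch: List[Row]) -> dict:
        # 1. Validate: apply every expectation in the suite to the batch.
        outcomes = {name: check(batch) for name, check in self.suite.items()}
        result = {
            "checkpoint": self.name,
            "success": all(outcomes.values()),
            "outcomes": outcomes,
        }
        # 2. Save the validation result.
        self.stored_results.append(result)
        # 3. Run every configured action with that result.
        for action in self.actions:
            action(result)
        return result


def no_null_ids(batch: List[Row]) -> bool:
    return all(row.get("id") is not None for row in batch)


def has_rows(batch: List[Row]) -> bool:
    return len(batch) > 0


notifications: List[str] = []


def notify(result: dict) -> None:
    # A hypothetical action: record a one-line status message.
    status = "passed" if result["success"] else "failed"
    notifications.append(f"{result['checkpoint']}: {status}")


checkpoint = ToyCheckpoint(
    name="toy_checkpoint",
    suite={"no_null_ids": no_null_ids, "has_rows": has_rows},
    actions=[notify],
)

# The same checkpoint object is reused for two different batches.
good = checkpoint.run([{"id": 1}, {"id": 2}])
bad = checkpoint.run([{"id": 1}, {"id": None}])
```

The point of the sketch is the reuse: once the suite and actions are attached, each later run only needs a batch of data.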
Creating a Checkpoint
Checkpoints are simple to create. While advanced users could write their configuration from scratch, we recommend using the CLI (Command Line Interface). It will launch a Jupyter Notebook set up with boilerplate code to create your Checkpoint. All you will need to do is configure it! For detailed instructions, please see our guide on how to create a new Checkpoint.
Configuring your Checkpoint
There are three important things you can do when configuring your Checkpoint. You can add additional validation data, or set the Checkpoint so that validation data must be specified at run time. You can add additional Expectation Suites (collections of verifiable assertions about data). And you can add Actions, which the Checkpoint will execute when it finishes Validating data. For a more detailed overview of Checkpoint configuration, please see our documentation on Checkpoints and Actions.
Checkpoints, Batch Requests, and Expectation Suites
Batch Requests (which are provided to a Datasource in order to create a Batch) are used to specify the data that a Checkpoint will Validate. You can add additional validation data to your Checkpoint by assigning it Batch Requests, or set up the Checkpoint so that it requires a Batch Request to be specified at run time.
Expectation Suites contain the Expectations (verifiable assertions about data) that the Checkpoint will run against the validation data specified in its Batch Requests. Checkpoints are assigned Expectation Suites and Batch Requests in pairs, and when the Checkpoint is run it will Validate each of its Expectation Suites against the data provided by its paired Batch Request.
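As a rough illustration of that pairing, the sketch below builds a Checkpoint configuration as a plain Python dictionary with two Batch Request and Expectation Suite pairs. The key names (`validations`, `batch_request`, `expectation_suite_name`, and so on) reflect one common shape of Checkpoint configuration and should be checked against the version you have installed. Every datasource, connector, asset, and suite name here is hypothetical, as are the helpers `batch_request` and `validation_pairs`.

```python
from typing import List, Tuple


def batch_request(data_asset_name: str) -> dict:
    # Hypothetical datasource and data connector names.
    return {
        "datasource_name": "my_datasource",
        "data_connector_name": "my_data_connector",
        "data_asset_name": data_asset_name,
    }


checkpoint_config = {
    "name": "my_checkpoint",  # hypothetical Checkpoint name
    "config_version": 1.0,
    "class_name": "Checkpoint",
    # Each entry pairs one Batch Request with one Expectation Suite.
    "validations": [
        {
            "batch_request": batch_request("users"),
            "expectation_suite_name": "users.warning",
        },
        {
            "batch_request": batch_request("orders"),
            "expectation_suite_name": "orders.warning",
        },
    ],
}


def validation_pairs(config: dict) -> List[Tuple[str, str]]:
    """List (data asset, suite) pairs so the pairing is easy to review."""
    return [
        (v["batch_request"]["data_asset_name"], v["expectation_suite_name"])
        for v in config["validations"]
    ]


pairs = validation_pairs(checkpoint_config)
```

Listing the pairs before running makes it easy to confirm that each suite is matched with the data it was written for.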
For more detailed instructions on how to add Batch Requests and Expectation Suites to a Checkpoint, please see our guide on how to add validations data or suites to a Checkpoint.
Checkpoints and Actions
Actions are executed after a Checkpoint Validates data. They are optional: you do not need to include any in your Checkpoint if you have no use for them. However, they are highly customizable and can be made to do anything you can program in Python, giving you fine-grained control over what happens after a Checkpoint Validates data.
With that said, there are some Actions that are more common than others. Updating Data Docs, sending emails, posting Slack notifications, or sending other custom notifications are all common use cases for Actions. We provide detailed examples of how to set up these Actions in our how-to guides for validation Actions.
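A minimal sketch of how such Actions might be listed in a Checkpoint configuration follows, again as a plain Python structure. The action class names are ones Great Expectations has shipped, but the exact keys each Action accepts can differ between releases, so treat this as an assumption to verify against the how-to guides. The entry names, the webhook placeholder, and the helper `action_class_names` are hypothetical.

```python
from typing import List

action_list = [
    {
        "name": "store_validation_result",
        "action": {"class_name": "StoreValidationResultAction"},
    },
    {
        "name": "update_data_docs",
        "action": {"class_name": "UpdateDataDocsAction"},
    },
    {
        "name": "send_slack_notification",
        "action": {
            "class_name": "SlackNotificationAction",
            # Placeholder: substitute your own webhook or config variable.
            "slack_webhook": "${validation_notification_slack_webhook}",
            # Only notify when Validation fails.
            "notify_on": "failure",
            "renderer": {
                "module_name": "great_expectations.render.renderer.slack_renderer",
                "class_name": "SlackRenderer",
            },
        },
    },
]


def action_class_names(actions: List[dict]) -> List[str]:
    """Return the Action classes in the order they are listed."""
    return [entry["action"]["class_name"] for entry in actions]


ordered_actions = action_class_names(action_list)
```

In this sketch the result is stored before the Data Docs update and the notification, so the later Actions have a saved result to point to.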
Running your Checkpoint
Running your Checkpoint once it is fully set up is very straightforward. You can do this either from the CLI or with a Python script, and both of these methods are covered in depth in our guide on how to validate data by running a Checkpoint.
Validation Results and Data Docs
When a Checkpoint finishes Validation, its Validation Results are automatically compiled as Data Docs. You can find these results in the Validation Results tab of your Data Docs. Clicking into an individual Validation Result brings up a detailed list of all the Expectations that ran, showing which of them (if any) passed and which (if any) failed.
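The same passed/failed breakdown can be approximated in code. The sketch below assumes a Validation Result in dictionary form with a `results` list whose items carry a `success` flag and an `expectation_config` holding the `expectation_type`. The sample data is hand-written for illustration rather than produced by a real run, and `split_expectations` is a hypothetical helper.

```python
from typing import List, Tuple


def split_expectations(validation_result: dict) -> Tuple[List[str], List[str]]:
    """Return (passed, failed) expectation types from one validation result."""
    passed: List[str] = []
    failed: List[str] = []
    for item in validation_result["results"]:
        name = item["expectation_config"]["expectation_type"]
        (passed if item["success"] else failed).append(name)
    return passed, failed


# Hand-written sample; the shape is an assumption, not captured output.
sample_result = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_to_exist",
                "kwargs": {"column": "id"},
            },
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": "id"},
            },
        },
    ],
}

passed, failed = split_expectations(sample_result)
```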
For more information, see our documentation for Data Docs.
Wrapping up
Once your Checkpoint is created and you have used it to validate data, you can continue to reuse it. It will be easy for you to manually run it through the CLI or a Python script. And if you want your Checkpoint to run on a schedule, there are a few ways to do that as well.
We provide a guide for how to deploy a scheduled Checkpoint with cron, and if your pipeline architecture supports Python scripts you will be able to run your Checkpoints from there. Even better: regardless of how you choose to run your Checkpoint in the future, Actions will let you customize what is done with the Validation Results it generates.
Congratulations! At this point in your Great Expectations journey you have established the ability to reliably, and repeatedly, Validate your source data systems with ease.