Expectation Suite
The Great Expectations CLI is no longer the preferred method for implementing and configuring Great Expectations. This topic will be updated soon to reflect this change. For more information, see A fond farewell to the CLI.
An Expectation Suite is a collection of verifiable assertions about data.
Expectation Suites combine multiple Expectations (verifiable assertions about data) into an overall description of data. For example, you can group all the Expectations about a given table in a given database into an Expectation Suite and name it my_database.my_table. Expectation Suite names are customizable; the only constraint is that each name must be unique within a given project.
Relationship to other objects
Expectation Suites are stored in an Expectation Store, a connector to store and retrieve information about collections of verifiable assertions about data. They are generated interactively using a Validator (used to run an Expectation Suite against data) or automatically using Profilers (which generate Metrics and candidate Expectations from data). They are used by Checkpoints (the primary means for validating data in a production deployment of Great Expectations) to Validate data, that is, to apply an Expectation Suite to a Batch.
Use cases
The lifecycle of an Expectation Suite starts with creating it. Then it goes through an iterative loop of Review and Edit as the team's understanding of the data described by the suite evolves.
Expectation Suites are largely managed automatically in the workflows for creating Expectations. When the Expectations are created, an Expectation Suite is created to contain them. In the Profiling workflow, this Expectation Suite will contain all the Expectations generated by the Profiler. In the interactive workflow, an Expectation Suite will be configured to include Expectations as they are defined, but it will not be saved to an Expectation Store until you explicitly save it.
For more information on these processes, please see:
- Our overview on the process of Creating Expectations
- Our guide on how to create and edit Expectations with instant feedback from a sample Batch of data
Expectation Suites are used during the Validation of data. In this step, you will need to provide one or more Expectation Suites to a Checkpoint. This can either be done by configuring the Checkpoint to use a preset list of one or more Expectation Suites, or by configuring the Checkpoint to accept a list of one or more Expectation Suites at runtime.
Reusability
Expectation Suites are primarily used by Checkpoints, which can accept a list of one or more Expectation Suite and Batch Request pairs. Because they are stored independently of the Checkpoints that use them, the same Expectation Suite can be included in the list for multiple Checkpoints, provided the Expectation Suite contains a list of Expectations that describe the data that Checkpoint will Validate. You can even use the same Expectation Suite multiple times within the same Checkpoint by pairing it with different Batch Requests.
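As a rough illustration of the pairing described above, the sketch below models each Checkpoint as a plain Python dictionary holding a list of Batch Request and Expectation Suite name pairs. The checkpoint names, the data asset names, and the helper checkpoints_using_suite are hypothetical, and the dictionaries are a simplified model rather than a working Checkpoint configuration.

```python
# Hypothetical model: each Checkpoint is a name plus a list of
# (Batch Request, Expectation Suite name) pairs. None of these
# names come from a real project.
nightly_checkpoint = {
    "name": "nightly_checkpoint",
    "validations": [
        {
            "batch_request": {"data_asset_name": "my_table_2023"},
            "expectation_suite_name": "my_database.my_table",
        },
        {
            # Same suite, different Batch Request, in the same Checkpoint.
            "batch_request": {"data_asset_name": "my_table_2024"},
            "expectation_suite_name": "my_database.my_table",
        },
    ],
}

release_checkpoint = {
    "name": "release_checkpoint",
    "validations": [
        {
            "batch_request": {"data_asset_name": "my_table_staging"},
            "expectation_suite_name": "my_database.my_table",
        },
    ],
}


def checkpoints_using_suite(checkpoints, suite_name):
    """Return the names of the Checkpoints that reference suite_name."""
    return [
        checkpoint["name"]
        for checkpoint in checkpoints
        if any(
            validation["expectation_suite_name"] == suite_name
            for validation in checkpoint["validations"]
        )
    ]


users = checkpoints_using_suite(
    [nightly_checkpoint, release_checkpoint], "my_database.my_table"
)
print(users)
```

Because the suite is referenced only by name, neither Checkpoint owns it, and editing the stored suite would affect every pair that refers to it.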
CRUD operations
A Great Expectations Expectation Suite enables you to perform Create, Read, Update, and Delete (CRUD) operations on the Suite's Expectations without needing to re-run them.
Each of the Expectation Suite methods that support a Create, Read, Update, or Delete (CRUD) operation relies on two main parameters: expectation_configuration and match_type.
- expectation_configuration - an ExpectationConfiguration object that is used to determine whether and where this Expectation already exists within the Suite. It can be a complete or a partial ExpectationConfiguration.
- match_type - a string with the value of domain, success, or runtime, which determines the criteria used for matching:
  - domain checks whether two Expectation Configurations apply to the same data. It results in the loosest match, and can use the least complete ExpectationConfiguration object. For example, for a column map Expectation, a domain match_type will check that the expectation_type matches, and that the column and any row_conditions that affect which rows are evaluated by the Expectation match.
  - success criteria are more exacting - in addition to the domain kwargs, these include those kwargs used when evaluating the success of an Expectation, like mostly, max, or value_set.
  - runtime criteria are the most specific - in addition to domain_kwargs and success_kwargs, these include kwargs used for runtime configuration. Currently, these include result_format, include_config, and catch_exceptions.
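The three match levels can be pictured with a minimal sketch in plain Python. It is not the library's implementation: the function configurations_match and the three key groupings are assumptions made for illustration, using only the kwarg names mentioned in the list above for a column map Expectation, and configurations are modeled as dictionaries rather than ExpectationConfiguration objects.

```python
# Simplified, hypothetical model of match_type. The key groupings below are
# assumptions for a column map Expectation, not the library's internals.
DOMAIN_KEYS = ("column", "row_condition")
SUCCESS_KEYS = ("mostly", "max", "value_set")
RUNTIME_KEYS = ("result_format", "include_config", "catch_exceptions")

# Each match_type compares its own keys plus those of every looser level.
KEYS_BY_MATCH_TYPE = {
    "domain": DOMAIN_KEYS,
    "success": DOMAIN_KEYS + SUCCESS_KEYS,
    "runtime": DOMAIN_KEYS + SUCCESS_KEYS + RUNTIME_KEYS,
}


def configurations_match(first, second, match_type="domain"):
    """Return True if two configurations are equivalent under match_type."""
    if match_type not in KEYS_BY_MATCH_TYPE:
        raise ValueError(f"unknown match_type: {match_type!r}")
    if first["expectation_type"] != second["expectation_type"]:
        return False
    keys = KEYS_BY_MATCH_TYPE[match_type]
    return all(
        first["kwargs"].get(key) == second["kwargs"].get(key) for key in keys
    )


stored = {
    "expectation_type": "expect_column_values_to_be_in_set",
    "kwargs": {
        "column": "status",
        "value_set": ["open", "closed"],
        "mostly": 0.95,
    },
}
# A partial configuration: only the domain is specified.
partial = {
    "expectation_type": "expect_column_values_to_be_in_set",
    "kwargs": {"column": "status"},
}

print(configurations_match(stored, partial, "domain"))   # True
print(configurations_match(stored, partial, "success"))  # False
```

In this model the partial configuration is enough to locate the stored Expectation with a domain match, while the stricter levels also require the success and runtime kwargs to agree.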
Access
You will rarely need to directly access an Expectation Suite. If you do need to edit one, the simplest way is through the CLI. To do so, run the command:
great_expectations suite edit NAME_OF_YOUR_SUITE_HERE
This will open a Jupyter Notebook where each Expectation in the Expectation Suite is loaded as an individual cell. You can edit, remove, and add Expectations in this list. Running the cells will create the Expectations in a new Expectation Suite, which you can then save over the old Expectation Suite or save under a new name. However, neither the Expectation Suite nor any changes to it are stored until you explicitly save them.
In almost all other circumstances you will simply pass the name of any relevant Expectation Suites to an object such as a Checkpoint that will manage accessing and using it for you.
Save Expectation Suites
Each Expectation Suite is saved in an Expectation Store, as a JSON file in the great_expectations/expectations subdirectory of the Data Context. Best practice is to check these files into version control each time they are updated, just as you would source files. This discipline allows data quality to be an integral part of versioned pipeline releases.
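To show why these files fit well in version control, the sketch below writes a small suite to disk as JSON using only the Python standard library and reads it back. It does not use the Expectation Store itself. The key names are an assumption about the on-disk layout and may differ between Great Expectations versions, and the suite contents, the temporary directory, and the filename are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical suite contents; the key names are an assumption about the
# on-disk layout and may differ between Great Expectations versions.
suite = {
    "expectation_suite_name": "my_database.my_table",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": "id"},
            "meta": {},
        }
    ],
    "meta": {},
}

with tempfile.TemporaryDirectory() as root:
    # Stand-in for the great_expectations/expectations subdirectory.
    store_dir = Path(root) / "great_expectations" / "expectations"
    store_dir.mkdir(parents=True)
    # Hypothetical filename; a real Expectation Store chooses its own path.
    suite_path = store_dir / "my_database.my_table.json"
    # Sorted keys and fixed indentation keep the text stable between saves,
    # so version-control diffs show only real changes.
    suite_path.write_text(json.dumps(suite, indent=2, sort_keys=True))
    reloaded = json.loads(suite_path.read_text())

print(reloaded["expectation_suite_name"])
```

Since the file is plain text, a change to a single Expectation shows up as a small, reviewable diff alongside the pipeline code it describes.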
You can save an Expectation Suite by using a Validator's save_expectation_suite() method. This method will be included in the last cell of any Jupyter Notebook launched from the CLI for the purpose of creating or editing Expectations.