How to configure and use a MetricStore
Metric storage is an experimental feature.
A `MetricStore` is a Store (a connector to store and retrieve information about metadata in Great Expectations) that stores Metrics computed during Validation. In addition to the Metric name and Metric kwargs, a `MetricStore` tracks the `run_id` of the Validation and the name of the Expectation Suite (a collection of verifiable assertions about data).
Saving Metrics (computed attributes of data, such as the mean of a column) during Validation (the act of applying an Expectation Suite to a Batch) lets you construct a new data series based on observed dataset characteristics computed by Great Expectations. Such a data series can serve as the source for a dashboard or for overall data quality metrics.
Prerequisites
- A Great Expectations instance
- Completion of the Quickstart
- A configured Data Context
1. Add a MetricStore
To define a `MetricStore`, add a Metric Store configuration to the `stores` section of your `great_expectations.yml`. The configuration must include the following keys:
- `class_name` - Enter `MetricStore`. This key determines which class is instantiated to create the Store. Other fields are passed through to the `StoreBackend` class on instantiation. The only backend Store tested for use with a `MetricStore` is the `DatabaseStoreBackend` with Postgres.
- `store_backend` - Defines how your Metrics are persisted.
To use a SQL database such as Postgres, add the following fields and values:
- `class_name` - Enter `DatabaseStoreBackend`.
- `credentials` - Point to the credentials defined in your `config_variables.yml`, or define them inline.
The following is an example of how the `MetricStore` configuration appears in `great_expectations.yml`:

```yaml
stores:
  # ...
  metric_store:  # You can choose any name as the key for your metric store
    class_name: MetricStore
    store_backend:
      class_name: DatabaseStoreBackend
      credentials: ${my_store_credentials}
      # alternatively, define credentials inline:
      # credentials:
      #   username: my_username
      #   password: my_password
      #   port: 1234
      #   host: xxxx
      #   database: my_database
      #   driver: postgresql
```
The next time your Data Context is loaded, it will connect to the database and initialize a table to store metrics if one has not already been created.
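If you point `credentials` at `config_variables.yml` rather than defining them inline, the `${my_store_credentials}` reference resolves to an entry in that file. The following is a minimal sketch with placeholder values, using the same field names as the inline example above:

```yaml
# config_variables.yml -- placeholder values; substitute your own
my_store_credentials:
  username: my_username
  password: my_password
  port: 5432
  host: localhost
  database: my_database
  driver: postgresql
```

Keeping credentials in `config_variables.yml` lets you exclude them from version control while `great_expectations.yml` remains shareable.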
2. Configure a Validation Action
When a `MetricStore` is available, add a `StoreMetricsAction` Validation Action (a Python class with a `run` method that takes a Validation Result and does something with it) to your Checkpoint (the primary means for validating data in a production deployment of Great Expectations) to save Metrics during Validation. The Validation Action must include the following fields:
- `class_name` - Enter `StoreMetricsAction`. Determines which class is instantiated to execute the Action.
- `target_store_name` - Enter the key for the `MetricStore` you added to your `great_expectations.yml`. In the previous example, the `metric_store` key defines which Store backend to use when persisting the Metrics.
- `requested_metrics` - Identify the Expectation Suites and Metrics you want to store.
Add the following entry to `great_expectations.yml` to generate Validation Result statistics (a Validation Result is generated when data is Validated against an Expectation or Expectation Suite):

```yaml
expectation_suite_name:
  - statistics.<statistic name>
```
Add the following entry to `great_expectations.yml` to generate values from the `result` field of a specific Expectation (a verifiable assertion about data):

```yaml
expectation_suite_name:
  - column:
      <column name>:
        - <expectation name>.result.<value name>
```
To indicate that any Expectation Suite can be used to generate values, use the wildcard `"*"`.
If you use an Expectation Suite name as a key, Metrics are only added to the `MetricStore` when that Expectation Suite runs. When you use the wildcard `"*"`, Metrics are added to the `MetricStore` for each Expectation Suite that runs in the Checkpoint.
The following example YAML configuration adds `StoreMetricsAction` to a Checkpoint for the `taxi_data` dataset:

```yaml
action_list:
  # ...
  - name: store_metrics
    action:
      class_name: StoreMetricsAction
      target_store_name: metric_store  # This should match the name of the store configured above
      requested_metrics:
        public.taxi_data.warning:  # match a particular expectation suite
          - column:
              passenger_count:
                - expect_column_values_to_not_be_null.result.element_count
                - expect_column_values_to_not_be_null.result.partial_unexpected_list
          - statistics.successful_expectations
        "*":  # wildcard to match any expectation suite
          - statistics.evaluated_expectations
          - statistics.success_percent
          - statistics.unsuccessful_expectations
```
3. Test your MetricStore and StoreMetricsAction
Run the following code to execute your Checkpoint and test `StoreMetricsAction`:

```python
import great_expectations as gx

context = gx.get_context()
checkpoint_name = "your checkpoint name here"
context.run_checkpoint(checkpoint_name=checkpoint_name)
```