Metrics
Metrics are values derived from one or more Batches that can be used to evaluate Expectations or to summarize the result of Validation. A Metric could be a statistic, such as the minimum value of a column, or a more complex object, such as a histogram.
Metrics are the core tool used to validate data. When an Expectation is evaluated, Great Expectations collects all the Metrics requested by the Expectation and provides them to the Expectation's validation logic. An Expectation can also expose Metrics, such as the observed value of a useful statistic, via its Expectation Validation Result, where Data Docs or other Expectations can use them.
Put simply, a Metric answers a question about your data posed by an Expectation.
Metrics are produced using ExecutionEngine-specific logic that is defined in a MetricProvider. When a MetricProvider class is first encountered, Great Expectations registers the metric and any methods that it defines as able to produce Metrics.
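For instance, here is a minimal sketch of a MetricProvider subclass, assuming the 0.13-style API; the class name ColumnCustomMax, the metric name column.custom_max, and the exact import paths are illustrative and may differ between versions:

from great_expectations.execution_engine import PandasExecutionEngine
from great_expectations.expectations.metrics import (
    ColumnMetricProvider,
    column_aggregate_value,
)


class ColumnCustomMax(ColumnMetricProvider):
    # Registering this class makes "column.custom_max" available to
    # any Expectation that requests it.
    metric_name = "column.custom_max"

    @column_aggregate_value(engine=PandasExecutionEngine)
    def _pandas(cls, column, **kwargs):
        # pandas-specific logic: the ExecutionEngine supplies the
        # domain column as a pandas Series.
        return column.max()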
Metrics naming conventions
Metrics can have any name. However, for the "core" Great Expectations Metrics, we use the following conventions:
- For aggregate metrics, such as the mean value of a column, we describe the domain and the name of the statistic, such as column.mean or column.max.
- For map metrics, which produce values for individual records or rows, we define the domain using the prefix "column_values" and use several consistent suffixes to provide related metrics. For example, for the Metric that defines whether specific column values fall into an expected set, several related metrics are defined:
  - column_values.in_set.unexpected_count provides the total number of unexpected values in the domain.
  - column_values.in_set.unexpected_values provides a sample of unexpected values; "result_format" is one of its value_keys and determines how many values should be returned.
  - column_values.in_set.unexpected_rows provides full rows for which the value in the domain column was unexpected.
  - column_values.in_set.unexpected_value_counts provides a count of how many times each unexpected value occurred.
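These related metrics surface in the result field of an Expectation Validation Result, with the amount of detail controlled by result_format. A small illustration, assuming a pandas-backed Validator named df with a color column:

res = df.expect_column_values_to_be_in_set(
    "color",
    ["red", "green"],
    result_format="COMPLETE",
)
# The result payload is populated from the column_values.in_set.*
# metrics described above, for example:
res.result["unexpected_count"]
res.result["partial_unexpected_list"]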
Additionally, to facilitate optimized computation of Metrics, we use Metric Partials, which define the partially-parameterized functions that are necessary to build a desired Metric.
- For aggregate metrics, we often use an ExecutionEngine-specific function with the suffix .aggregate_fn, such as column.max.aggregate_fn.
- For map metrics, to compute column_values.in_set.unexpected_count, we rely on a condition called column_values.in_set.condition (see the sketch below).
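For example, the condition underlying column_values.in_set could be declared roughly as follows; this is a simplified sketch in the style of the 0.13 API, not the library's exact implementation:

from great_expectations.execution_engine import PandasExecutionEngine
from great_expectations.expectations.metrics import (
    ColumnMapMetricProvider,
    column_condition_partial,
)


class ColumnValuesInSet(ColumnMapMetricProvider):
    # Suffixed metrics such as column_values.in_set.unexpected_count
    # are derived from this condition automatically.
    condition_metric_name = "column_values.in_set"
    condition_value_keys = ("value_set",)

    @column_condition_partial(engine=PandasExecutionEngine)
    def _pandas(cls, column, value_set, **kwargs):
        # True for rows whose value falls in the expected set.
        return column.isin(value_set)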
Types of MetricProvider Functions
This diagram shows the relationship between different types of MetricProvider functions.
Accessing Metrics
Expectation Validation Results and Expectation Suite Validation Results can expose metrics that are defined by specific Expectations that have been validated, called "Expectation Defined Metrics." To access those values, we address the metric as a dot-delimited string that identifies the value, such as expect_column_values_to_be_unique.success or expect_column_values_to_be_between.result.unexpected_percent. These metrics may be stored in a MetricsStore.
A metric_kwargs_id is a string representation of the Metric Kwargs that can be used as a database key. For simple cases, it could be easily readable, such as column=Age, but when there are multiple keys and values or complex values, it will most likely be an md5 hash of the key/value pairs. It can also be None in the case that there are no kwargs required to identify the metric.
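For intuition only, here is a hypothetical sketch of how such a key could be derived; this is not Great Expectations' actual implementation, and the function name is made up for illustration:

import hashlib
import json


def make_metric_kwargs_id(kwargs):
    # Hypothetical: no kwargs -> None; a single simple kwarg -> a
    # readable "key=value"; otherwise an md5 hash of the sorted
    # key/value pairs.
    if not kwargs:
        return None
    if len(kwargs) == 1:
        (key, value), = kwargs.items()
        if isinstance(value, (str, int, float)):
            return f"{key}={value}"
    serialized = json.dumps(kwargs, sort_keys=True, default=str)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


make_metric_kwargs_id({"column": "Age"})  # "column=Age"
make_metric_kwargs_id({})                 # None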
The following example demonstrates how one of these Expectation Defined Metrics can be accessed:

# df is a Validator (or, in older versions, a PandasDataset)
# wrapping the data to validate.
res = df.expect_column_values_to_be_in_set(
    "color",
    ["red", "green"]
)
# Address the metric by its dot-delimited name, together with the
# Metric Kwargs (here, the column) needed to identify the value.
res.get_metric(
    "expect_column_values_to_be_in_set.result.missing_count",
    column="color"
)
See the How to configure a MetricsStore guide for more information.