ExecutionEngine
- class great_expectations.execution_engine.ExecutionEngine(name: Optional[str] = None, caching: bool = True, batch_spec_defaults: Optional[dict] = None, batch_data_dict: Optional[dict] = None, validator: Optional[Validator] = None)#
ExecutionEngine defines interfaces and provides common methods for loading Batch of data and compute metrics.
ExecutionEngine is the parent class of every backend-specific computational class, tasked with loading Batch of data and computing metrics. Each subclass (corresponding to Pandas, Spark, SQLAlchemy, and any other computational components) utilizes generally-applicable methods defined herein, while implementing required backend-specific interfaces. ExecutionEngine base class also performs the important task of translating cloud storage resource URLs to format and protocol compatible with given computational mechanism (e.g, Pandas, Spark). Then ExecutionEngine subclasses access the referenced data according to their paritcular compatible protocol and return Batch of data.
In order to obtain Batch of data, ExecutionEngine (through implementation of key interface methods by subclasses) gets data records and also provides access references so that different aspects of data can be loaded at once. ExecutionEngine uses BatchManager for Batch caching in order to reduce load on computational backends.
Crucially, ExecutionEngine serves as focal point for resolving (i.e., computing) metrics. Wherever opportunities arize to bundle multiple metric computations (e.g., SQLAlchemy, Spark), ExecutionEngine utilizes subclasses in order to provide specific functionality (bundling of computation is available only for “deferred execution” computational systems, such as SQLAlchemy and Spark; it is not available for Pandas, because Pandas computations are immediate).
Finally, ExecutionEngine defines interfaces for Batch data sampling and splitting Batch of data along defined axes.
Constructor builds an ExecutionEngine, using provided configuration options (instatiation is done by child classes).
- Parameters
name – (str) name of this ExecutionEngine
caching – (Boolean) if True (default), then resolved (computed) metrics are added to local in-memory cache.
batch_spec_defaults – dictionary of BatchSpec overrides (useful for amending configuration at runtime).
batch_data_dict – dictionary of Batch objects with corresponding IDs as keys supplied at initialization time
validator – Validator object (optional) – not utilized in V3 and later versions
- get_compute_domain(domain_kwargs: dict, domain_type: Union[str, great_expectations.core.metric_domain_types.MetricDomainTypes], accessor_keys: Optional[Iterable[str]] = None) Tuple[Any, dict, dict] #
get_compute_domain() is an interface method, which computes the optimal domain_kwargs for computing metrics based on the given domain_kwargs and specific engine semantics.
- Parameters
domain_kwargs (dict) – a dictionary consisting of the Domain kwargs specifying which data to obtain
domain_type (str or MetricDomainTypes) – an Enum value indicating which metric Domain the user would like to be using, or a corresponding string value representing it. String types include “column”, “column_pair”, “table”, and “other”. Enum types include capitalized versions of these from the class MetricDomainTypes.
accessor_keys (str iterable) – keys that are part of the compute Domain but should be ignored when describing the Domain and simply transferred with their associated values into accessor_domain_kwargs.
- Returns
data corresponding to the compute domain;
a modified copy of domain_kwargs describing the Domain of the data returned in (1);
a dictionary describing the access instructions for data elements included in the compute domain (e.g. specific column name).
In general, the union of the compute_domain_kwargs and accessor_domain_kwargs will be the same as the domain_kwargs provided to this method.
- Return type
A tuple consisting of three elements
- get_domain_records(domain_kwargs: dict) Any #
get_domain_records() is an interface method, which computes the full-access data (dataframe or selectable) for computing metrics based on the given domain_kwargs and specific engine semantics.
- Parameters
domain_kwargs (dict) –
- Returns
data corresponding to the compute domain