Data Assistant
A Data Assistant is a pre-configured utility that simplifies the creation of ExpectationsA verifiable assertion about data.. A Data Assistant can help you determine a starting point when working with a large, new, or complex dataset by asking questions and then building a list of relevant MetricsA computed attribute of data such as the mean of a column. from the answers to those questions. Branching question paths based on your responses ensure that additional, relevant Metrics are not missed. The result is a comprehensive collection of Metrics that can be saved, reviewed as graphical plots, or used by the Data Assistant to generate a set of proposed Expectations.
Data Assistants allow you to introspect multiple BatchesA selection of records from a Data Asset. and create an Expectation SuiteA collection of verifiable assertions about data. from the aggregated Metrics of those Batches. They provide convenient, visual representations of the generated Expectations to assist with identifying outliers in the corresponding parameters. They can be accessed from your Data ContextThe primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components., and provide a good starting point for building Expectations or performing initial data exploration.
Relationships to other objects
A Data Assistant implements a pre-configured ProfilerGenerates Metrics and candidate Expectations from data. in order to gather Metrics and propose an Expectation Suite based on the introspection of the Batch or Batches contained in a provided Batch RequestProvided to a Datasource in order to create a Batch..
Use cases
Data Assistants are an ideal starting point for creating Expectations. When you're working with unfamiliar data, a Data Assistant can provide an overview by introspecting the data and generating a series of relevant Expectations using estimated parameters for you to review. When you use the "flag_outliers"
value for the estimation
parameter, your generated Expectations have parameters that disregard values that the Data Assistant identifies as outliers. To create a graphical representation of the generated Expectations, use the Data Assistant's plot_metrics()
method. Reviewing the Data Assistant's results can help you identify outliers in your data.
When you're working with familiar, good data, a Data Assistant can use the "exact"
value for the estimation
parameter to provide comprehensive Expectations that reflect the values found in the provided data.
Profiling
To provide a representative analysis of the provided data, a Data Assistant can automatically process multiple Batches from a single Batch Request.
Multi-Batch introspection
Data Assistants leverage the ability to process multiple Batches from a single Batch Request to provide a representative analysis of the provided data. With previous Profilers you would only be able to introspect a single Batch at a time. This meant that the Expectation Suite generated would only reflect a single Batch. If you had many Batches of data that you wanted to build inter-related Expectations for, you would have needed to run each Batch individually and then manually compare and update the Expectation parameters that were generated. With a Data Assistant, that process is automated. You can provide a Data Assistant multiple Batches and get back Expectations that have parameters based on, for instance, the mean or median value of a column on a per-Batch basis.
Visual plots for Metrics
When working in a Jupyter Notebook you can use the plot_metrics()
method of a Data Assistant's result object to generate a visual representation of your Expectations, the values that were assigned to their parameters, and the Metrics that informed those values. This assists in exploratory data analysis and fine-tuning your Expectations, while providing complete transparency into the information used by the Data Assistant to build your Expectations.
Data Assistants can be accessed from your Data Context. To select a Data Assistant in a Jupyter Notebook, enter context.assistants.
and use code completion. All Data Assistants have a run(...)
method that takes in a Batch Request and numerous optional parameters, and then loads the results into an Expectation Suite for future use.
To access the Onboarding Data Assistant, use context.assistants.onboarding
.
Configure
Data Assistants are pre-configured. You provide the Batch Request, and some optional parameters in the Data Assistant's run(...)
method.
Related documentation
To learn more about working with the Onboarding Data Assistant, see How to create an Expectation Suite with the Onboarding Data Assistant.