Instantiate a Data Context
A Data Context is the primary entry point for a Great Expectations (GX) deployment, with configurations and methods for all supporting components. It contains the configurations for Expectations, Metadata Stores, Data Docs, Checkpoints, and everything else related to working with GX. Use the information provided here to instantiate a Data Context so that you can continue working with previously defined GX configurations.
- Existing Filesystem
- Filesystem with Python
- Specific Filesystem
- Ephemeral
Existing Filesystem
Instantiate an existing Filesystem Data Context so that you can continue working with previously defined GX configurations.
Prerequisites
- A Great Expectations instance. See Install Great Expectations with source data system dependencies.
Import GX
Run the following command to import the GX module:
import great_expectations as gx
Run the get_context(...) method
To quickly acquire a Data Context, use the get_context(...) method without any defined parameters:
context = gx.get_context()
This is a convenience method that initializes, instantiates, and returns a Data Context. When called without parameters, get_context() returns a Cloud Data Context, a Filesystem Data Context, or an Ephemeral Data Context, depending on what type of Data Context has previously been initialized with your GX install.
If you have GX Cloud configured on your system, get_context() instantiates and returns a Cloud Data Context. Otherwise, get_context() instantiates and returns the last accessed Filesystem Data Context. If a previously initialized Filesystem Data Context cannot be found, get_context() initializes, instantiates, and returns a temporary in-memory Ephemeral Data Context.
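The fallback order described above can be summarized as a small decision function. This is only a sketch of the rule as stated on this page, written in plain Python. The names ContextKind and expected_context_kind are hypothetical and are not part of the GX library, and the function does not inspect your system in any way.

```python
from enum import Enum


class ContextKind(Enum):
    """Hypothetical labels for the three kinds of Data Context named above."""

    CLOUD = "Cloud Data Context"
    FILESYSTEM = "Filesystem Data Context"
    EPHEMERAL = "Ephemeral Data Context"


def expected_context_kind(
    cloud_configured: bool, filesystem_context_found: bool
) -> ContextKind:
    """Mirror the documented fallback order (illustration only, not a GX API)."""
    if cloud_configured:
        # GX Cloud configuration takes priority over everything else.
        return ContextKind.CLOUD
    if filesystem_context_found:
        # Otherwise a previously initialized Filesystem Data Context is used.
        return ContextKind.FILESYSTEM
    # Nothing was found, so a temporary in-memory context is the fallback.
    return ContextKind.EPHEMERAL


print(expected_context_kind(False, True).value)   # Filesystem Data Context
print(expected_context_kind(False, False).value)  # Ephemeral Data Context
```

To see which kind of Data Context you actually received from get_context(), printing type(context).__name__ is a quick check.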
An Ephemeral Data Context is an in-memory Data Context that is not intended to persist beyond the current Python session. However, if you decide that you would like to save its contents for future use you can do so by converting it to a Filesystem Data Context:
context = context.convert_to_file_context()
This method initializes a Filesystem Data Context in the current working directory of the Python process that contains the Ephemeral Data Context. For a more detailed explanation of this method, see the guide on how to convert an Ephemeral Data Context to a Filesystem Data Context.
Verify Data Context content
We can ensure that the Data Context was instantiated correctly by printing its contents.
print(context)
This will output the full configuration of the Data Context in the format of a Python dictionary.
Filesystem with Python
A Data Context is required in almost all Python scripts utilizing GX. Use Python code to initialize, instantiate, and verify the contents of a Filesystem Data Context.
Prerequisites
- A Great Expectations instance. See Install Great Expectations with source data system dependencies.
Import GX
Run the following command to import the GX module:
import great_expectations as gx
Determine the folder to initialize the Data Context in
Run the following command to specify the empty folder in which your Filesystem Data Context will be initialized:
path_to_empty_folder = "/my_gx_project/"
Create a context
You provide the path for your empty folder to the GX library's FileDataContext.create(...) method as the project_root_dir parameter. Because you are providing a path to an empty folder, FileDataContext.create(...) initializes a Filesystem Data Context in that location.
For convenience, the FileDataContext.create(...) method instantiates and returns the newly initialized Data Context, which you can keep in a Python variable.
from great_expectations.data_context import FileDataContext
context = FileDataContext.create(project_root_dir=path_to_empty_folder)
If the project_root_dir provided to the FileDataContext.create(...) method points to a folder that does not already have a Data Context present, the FileDataContext.create(...) method initializes a Filesystem Data Context in that location even if other files and folders are present. This allows you to initialize a Filesystem Data Context in a folder that contains your source data or other project-related contents.
If a Data Context already exists in project_root_dir, the FileDataContext.create(...) method will not re-initialize it. Instead, FileDataContext.create(...) instantiates and returns the existing Data Context.
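If you want to know in advance which of the two behaviors described above applies to a folder, a check along the following lines may help. It is a sketch under a stated assumption: that an existing Data Context can be recognized by a great_expectations/great_expectations.yml file under the project root, which is the layout shown elsewhere on this page. GX's own detection logic may differ, and has_data_context and expected_create_behavior are hypothetical helper names, not GX functions.

```python
import tempfile
from pathlib import Path

# Assumption: a Data Context is marked by this folder and config file,
# following the directory layout shown on this page.
CONTEXT_DIR_NAME = "great_expectations"
CONFIG_FILE_NAME = "great_expectations.yml"


def has_data_context(project_root_dir) -> bool:
    """Return True if the assumed config file exists under the project root."""
    config_path = Path(project_root_dir) / CONTEXT_DIR_NAME / CONFIG_FILE_NAME
    return config_path.is_file()


def expected_create_behavior(project_root_dir) -> str:
    """Describe what the text above says create(...) does for this folder."""
    if has_data_context(project_root_dir):
        return "reuse existing"
    return "initialize new"


with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp)

    # A folder that only holds other project contents (here, a data folder).
    (project_root / "data").mkdir()
    print(expected_create_behavior(project_root))  # initialize new

    # Simulate a previously initialized Data Context in the same folder.
    context_root = project_root / CONTEXT_DIR_NAME
    context_root.mkdir()
    (context_root / CONFIG_FILE_NAME).touch()
    print(expected_create_behavior(project_root))  # reuse existing
```

The check only reads the filesystem. It never creates or modifies a Data Context itself.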
Verify the Data Context content
We can ensure that the Data Context was instantiated correctly by printing its contents.
print(context)
This will output the full configuration of the Data Context in the format of a Python dictionary.
Specific Filesystem
If you're using GX for multiple projects, you might want to use a different Data Context for each project. Instantiate a specific Filesystem Data Context so that you can switch between sets of previously defined GX configurations.
Prerequisites
- A Great Expectations instance. See Install Great Expectations with source data system dependencies.
- A previously initialized Filesystem Data Context.
Import GX
Run the following command to import the GX module:
import great_expectations as gx
Specify a folder containing a previously initialized Filesystem Data Context
Each Filesystem Data Context has a root folder in which it was initialized. This root folder identifies the specific Filesystem Data Context to instantiate.
path_to_project_root = "./my_project/"
Run the get_context(...) method
You provide the path for your project root folder to the GX library's get_context(...) method as the project_root_dir parameter. Because you are providing a path to a folder that contains a previously initialized Filesystem Data Context, the get_context(...) method instantiates and returns the Data Context at that location.
context = gx.get_context(project_root_dir=path_to_project_root)
Note that there is a subtle distinction between the project_root_dir and context_root_dir arguments accepted by get_context(...).
Your context root is the directory that contains all your GX config while your project root refers to your actual working directory (and therefore contains the context root).
# The overall directory is your project root
data/
great_expectations/  # The GX folder with your config is your context root
    great_expectations.yml
    ...
...
Both are functionally equivalent for purposes of working with a file-backed project.
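The relationship between the two directories can be expressed with pathlib. The sketch below assumes the context root folder is named great_expectations, as in the layout above. The helper names context_root_for and project_root_for are hypothetical and exist only for illustration.

```python
from pathlib import Path

# Assumption: the context root is a folder with this name inside the project root.
GX_DIR_NAME = "great_expectations"


def context_root_for(project_root) -> Path:
    """The context root sits directly inside the project root."""
    return Path(project_root) / GX_DIR_NAME


def project_root_for(context_root) -> Path:
    """The project root is the parent folder of the context root."""
    return Path(context_root).parent


project_root = Path("./my_project")
context_root = context_root_for(project_root)

print(context_root.as_posix())                  # my_project/great_expectations
print(project_root_for(context_root).as_posix())  # my_project
```

Under that assumption, passing project_root as project_root_dir or passing context_root as context_root_dir to get_context(...) refers to the same file-backed project.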
If the root directory provided to the get_context(...) method points to a folder that does not already have a Data Context, the get_context(...) method initializes a new Filesystem Data Context in that location, then instantiates and returns the newly initialized Data Context.
Verify the Data Context content
We can ensure that the Data Context was instantiated correctly by printing its contents.
print(context)
This will output the full configuration of the Data Context in the format of a Python dictionary.
Ephemeral
An Ephemeral Data Context is a temporary, in-memory Data Context. It is ideal for data exploration and initial analysis when you do not want to save anything to an existing project, or when you need to work in a hosted environment such as an EMR Spark Cluster.
An Ephemeral Data Context does not persist beyond the current Python session. To keep the contents of your Ephemeral Data Context for future use, see How to convert an Ephemeral Data Context to a Filesystem Data Context.
Prerequisites
- A Great Expectations instance. See Setup: Overview.
Import classes
To create your Data Context, you'll create a configuration that uses in-memory Metadata Stores.
Run the following command to import the DataContextConfig and the InMemoryStoreBackendDefaults classes:
from great_expectations.data_context.types.base import (
    DataContextConfig,
    InMemoryStoreBackendDefaults,
)
Run the following command to import the EphemeralDataContext class:
from great_expectations.data_context import EphemeralDataContext
Create the Data Context configuration
Run the following command to create a Data Context configuration that uses in-memory Metadata Stores. It passes an instance of the InMemoryStoreBackendDefaults class as the store_backend_defaults parameter when initializing an instance of the DataContextConfig class:
project_config = DataContextConfig(
    store_backend_defaults=InMemoryStoreBackendDefaults()
)
Instantiate an Ephemeral Data Context
Run the following command to initialize the EphemeralDataContext class while passing in the DataContextConfig instance you created as the value of the project_config parameter:
context = EphemeralDataContext(project_config=project_config)
An Ephemeral Data Context is an in-memory Data Context that is not intended to persist beyond the current Python session. However, if you decide that you would like to save its contents for future use you can do so by converting it to a Filesystem Data Context:
context = context.convert_to_file_context()
This method initializes a Filesystem Data Context in the current working directory of the Python process that contains the Ephemeral Data Context. For a more detailed explanation of this method, see the guide on how to convert an Ephemeral Data Context to a Filesystem Data Context.
Connect GX to source data systems
Now that you have an Ephemeral Data Context you can connect GX to your data. See the following topics:
Next steps
To customize a Data Context configuration for Metadata Stores and Data Docs, see:
- Configure Expectation Stores
- Configure Validation Result Stores
- How to configure and use a Metric Store
- How to host and share Data Docs on a filesystem
To connect GX to source data: