Install Great Expectations with source data system dependencies
You can install Great Expectations (GX) locally, or in hosted environments such as Databricks, Amazon EMR, or Google Cloud Composer. Installing GX locally lets you test features and functionality to determine if it's suitable for your use case.
Windows support for the open source Python version of GX is currently unavailable. If you’re using GX in a Windows environment, you might experience errors or performance issues.
- Local
- Hosted
- Amazon S3
- Microsoft Azure Blob Storage
- Google Cloud Storage
- SQL databases
Local
Install Great Expectations (GX) locally.
Prerequisites
- An installation of Python, version 3.8 to 3.11. To download and install Python, see Python downloads.
Great Expectations is developed and tested on macOS and Linux Ubuntu. The installation on Windows may differ from the following procedure. If you have questions or encounter issues, post your comments on the Great Expectations Slack channel.
If you're using a Mac M1, see Installing Great Expectations on a Mac M1.
Check Python version
Run the following code to check what version of Python is currently installed:
python --version
Great Expectations supports Python versions 3.8 to 3.11. If a Python 3 version number is not returned, run the following code:
python3 --version
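The same check can be made from inside Python. The following is a minimal sketch, assuming only the supported range stated above (3.8 to 3.11); the function name `is_supported_python` is illustrative and not part of GX.

```python
import sys

# Supported range as stated in this guide: Python 3.8 through 3.11.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 11)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version falls in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if is_supported_python() else "outside the supported range"
    print(f"Python {current} is {status} for this guide")
```

Comparing `(major, minor)` tuples avoids the string-comparison pitfall where "3.10" sorts before "3.8".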
Choose installation method
Our recommended best practice is to use a virtual environment for your Great Expectations installation. Both standard Python 3 and Anaconda support the creation of virtual environments. Once the virtual environment is created, you can then use either pip or conda to install project requirements, depending on how the virtual environment was created.
- pip
- conda
After you have confirmed that Python 3 is installed locally, you can create a virtual environment with venv before installing your packages with pip. The following examples use venv for virtual environments because it is included with Python 3. You can use alternate tools such as virtualenv and pyenv to install GX in virtual environments.
Run one of the following code blocks to create your virtual environment:
python -m venv my_venv
or
python3 -m venv my_venv
A new directory named my_venv is created in your current working directory. This directory contains your virtual environment.
Run the following code to activate the virtual environment:
source my_venv/bin/activate
To change the name of your virtual environment, replace my_venv in the example code.
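To confirm that activation worked, you can ask the interpreter itself. This is a minimal sketch that applies to venv-style environments: inside one, `sys.prefix` differs from `sys.base_prefix`. The function name `in_virtual_env` is illustrative.

```python
import sys


def in_virtual_env(prefix=None, base_prefix=None):
    """Return True when the interpreter is running from a venv-style environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_env())
```

This check does not detect conda environments, which are managed differently.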
After you've activated your virtual environment, you should ensure that you have the latest version of pip installed. Pip is a tool that is used to easily install Python packages.
Run the following code to ensure that you have the latest version of pip installed:
python -m ensurepip --upgrade
or
python3 -m ensurepip --upgrade
Anaconda is a package and environment management system that supports Python. If you choose to install with Anaconda, first confirm that Anaconda is installed by running:
conda --version
If no version number is printed, download and install Anaconda from the Anaconda website.
Once Anaconda is installed, you can create and activate a new virtual environment by running:
conda create --name YOUR_ENVIRONMENT_NAME
conda activate YOUR_ENVIRONMENT_NAME
Replace YOUR_ENVIRONMENT_NAME with the name you wish to use for your environment.
Install GX
Once you have your virtual environment created and activated, you will be able to use either pip or Anaconda to install Great Expectations.
- pip
- conda
Run one of the following code blocks to use pip to install Great Expectations:
python -m pip install great_expectations
or
python3 -m pip install great_expectations
You can use Anaconda to install Great Expectations by running:
conda install -c conda-forge great-expectations
Confirm GX installation
Run the following code to confirm the GX installation is working:
great_expectations --version
Version information similar to the following is returned:
great_expectations, version 0.17.19
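If the command-line check is not available, the installed version can also be read from package metadata. This is a minimal sketch using the standard library's `importlib.metadata`; it assumes the distribution is registered under the name `great_expectations`, and the helper name `installed_version` is illustrative.

```python
from importlib import metadata


def installed_version(distribution="great_expectations"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("great_expectations is not installed in this environment")
    else:
        print(f"great_expectations, version {version}")
```

Run it with the interpreter from your activated virtual environment so that it inspects the same environment you installed into.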
Hosted
Great Expectations can be deployed in environments such as Databricks, Amazon EMR, or Google Cloud Composer. These environments do not always have a file system that allows a Great Expectations installation. To install Great Expectations in a hosted environment, see one of the following guides:
Amazon S3
Create your GX Python environment, install Great Expectations locally, and then configure the necessary dependencies to access data stored on Amazon S3.
Prerequisites
- An installation of Python 3.8 to 3.11. To download and install Python, see Python downloads.
- The ability to install Python modules with pip
- The AWS CLI. See Installing or updating the latest version of the AWS CLI.
- AWS credentials. See Configuring the AWS CLI.
Ensure your AWS CLI version is the most recent
You can verify that the AWS CLI has been installed by running the command:
aws --version
If this command does not return version information for the AWS CLI, you may need to install the AWS CLI or troubleshoot your current installation. For detailed guidance, see Amazon's documentation on how to install the AWS CLI.
Ensure your AWS credentials are correctly configured
You can verify that your AWS credentials are configured correctly by running the command:
aws sts get-caller-identity
If your credentials are configured correctly, this command returns your UserId, Account, and Arn. If it returns an error instead, see Configuring the AWS CLI.
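A credential check can also be scripted. The sketch below assumes the AWS CLI is on your PATH and uses its `aws sts get-caller-identity` command, which reports the UserId, Account, and Arn of the credentials in use. The function names are illustrative and not part of GX.

```python
import json
import subprocess


def parse_caller_identity(json_text):
    """Extract the fields returned by `aws sts get-caller-identity`."""
    data = json.loads(json_text)
    expected = ("UserId", "Account", "Arn")
    missing = [key for key in expected if key not in data]
    if missing:
        raise ValueError(f"unexpected response, missing: {missing}")
    return {key: data[key] for key in expected}


def check_aws_credentials():
    """Run the AWS CLI and return the caller identity, or None on failure."""
    try:
        result = subprocess.run(
            ["aws", "sts", "get-caller-identity", "--output", "json"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # CLI not installed, or credentials missing or invalid
    return parse_caller_identity(result.stdout)


if __name__ == "__main__":
    identity = check_aws_credentials()
    print(identity if identity else "AWS credentials could not be verified")
```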
Check your Python version
You can check your version of Python by running:
python --version
GX currently supports Python versions 3.8 to 3.11.
python or python3
Depending on your installation and configuration of Python 3, you may find that executing Python commands from the terminal by calling python doesn't work as desired. If a command using python does not work, try using python3.
Instead of:
python --version
Try:
python3 --version
If this produces the desired result, simply replace python with python3 in our example terminal commands.
If this does not work, you may need to look into your Python 3 installation or configuration.
Create a Python virtual environment
As a best practice, we recommend using a virtual environment to partition your GX installation from any other Python projects that may exist on the same system. This ensures that there will not be dependency conflicts between the GX installation and other Python projects.
Once we have confirmed that Python 3 is installed locally, we can create a virtual environment with venv.
venv?
We have chosen to use venv for virtual environments in this guide because it is included with Python 3. You are not limited to using venv, and can just as easily install Great Expectations into virtual environments with tools such as virtualenv or pyenv.
We will create our virtual environment by running:
python -m venv my_venv
This command will create a new directory called my_venv. Our virtual environment will be located in this directory.
To activate the virtual environment, run:
source my_venv/bin/activate
my_venv?
You can name your virtual environment anything you like. Simply replace my_venv in the examples above with the name that you would like to use.
Install GX with optional dependencies for S3
To install Great Expectations with the optional dependencies needed to work with AWS S3 we execute the following pip command from the terminal:
python -m pip install 'great_expectations[s3]'
This will install Great Expectations and the boto3 package. GX uses boto3 to access S3.
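To confirm that both packages landed in the active environment, you can look them up without importing them. This is a minimal sketch using the standard library; the helper name `module_available` is illustrative.

```python
from importlib import util


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return util.find_spec(name) is not None


if __name__ == "__main__":
    for module in ("great_expectations", "boto3"):
        state = "found" if module_available(module) else "missing"
        print(f"{module}: {state}")
```

If either module is reported as missing, check that the virtual environment is activated before rerunning the pip command.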
Verify that GX has been installed correctly
You can verify that GX installed successfully with the CLI command:
great_expectations --version
If GX was installed successfully, output similar to the following is returned:
great_expectations, version 0.17.19
Next steps
Now that you have installed GX with the necessary dependencies for working with S3, you are ready to initialize your Data Context, the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components. The Data Context contains your configurations for GX components and provides access to GX's Python API.
To create a Data Context, see Instantiate a Data Context.
Microsoft Azure Blob Storage
Create your GX Python environment, install Great Expectations locally, and then configure the necessary dependencies to access data stored on Microsoft Azure Blob Storage.
Prerequisites
- An installation of Python 3.8 to 3.11. To download and install Python, see Python downloads.
- The ability to install Python modules with pip
- An Azure Storage account. A connection string is required to complete the setup.
Check your Python version
You can check your version of Python by running:
python --version
GX currently supports Python versions 3.8 to 3.11.
python or python3
Depending on your installation and configuration of Python 3, you may find that executing Python commands from the terminal by calling python doesn't work as desired. If a command using python does not work, try using python3.
Instead of:
python --version
Try:
python3 --version
If this produces the desired result, simply replace python with python3 in our example terminal commands.
If this does not work, you may need to look into your Python 3 installation or configuration.
Create a Python virtual environment
As a best practice, we recommend using a virtual environment to partition your GX installation from any other Python projects that may exist on the same system. This ensures that there will not be dependency conflicts between the GX installation and other Python projects.
Once we have confirmed that Python 3 is installed locally, we can create a virtual environment with venv.
venv?
We have chosen to use venv for virtual environments in this guide because it is included with Python 3. You are not limited to using venv, and can just as easily install Great Expectations into virtual environments with tools such as virtualenv or pyenv.
We will create our virtual environment by running:
python -m venv my_venv
This command will create a new directory called my_venv. Our virtual environment will be located in this directory.
To activate the virtual environment, run:
source my_venv/bin/activate
my_venv?
You can name your virtual environment anything you like. Simply replace my_venv in the examples above with the name that you would like to use.
Install GX with optional dependencies for Azure Blob Storage
To install Great Expectations with the optional dependencies needed to work with Azure Blob Storage we execute the following pip command from the terminal:
python -m pip install 'great_expectations[azure]'
Verify that GX has been installed correctly
You can verify that GX installed successfully with the CLI command:
great_expectations --version
If GX was installed successfully, output similar to the following is returned:
great_expectations, version 0.17.19
Configure the config_variables.yml file with your Azure Storage credentials
We recommend that Azure Storage credentials be stored in the config_variables.yml file, which is located in the uncommitted/ folder by default, and is not part of source control. The following lines add Azure Storage credentials under the key AZURE_STORAGE_CONNECTION_STRING. For additional options for configuring the config_variables.yml file or environment variables, see the GX guide on how to set up credentials.
AZURE_STORAGE_CONNECTION_STRING: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
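Before pasting a connection string into the file, it can help to check that it is well formed. The following is a minimal sketch, assuming the semicolon-separated `Key=Value` layout shown above; the function names are illustrative and not part of GX or the Azure SDK.

```python
REQUIRED_KEYS = ("AccountName", "AccountKey")


def parse_connection_string(connection_string):
    """Split an Azure Storage connection string into a dict of its settings."""
    settings = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: account keys are base64 and often end in "=".
        key, separator, value = segment.partition("=")
        if not separator:
            raise ValueError(f"malformed segment: {segment!r}")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(connection_string):
    """Return the required keys that are absent or empty."""
    settings = parse_connection_string(connection_string)
    return [key for key in REQUIRED_KEYS if not settings.get(key)]
```

Splitting each segment on only the first `=` matters because the account key usually ends in `==` padding, which a naive split would discard.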
Next steps
To configure your Data Context to use Azure Blob Storage, see:
GCS
Create your GX Python environment, install Great Expectations locally, and then configure the necessary dependencies to access data stored on Google Cloud Storage (GCS).
Prerequisites
- An installation of Python 3.8 to 3.11. To download and install Python, see Python downloads.
- The ability to install Python modules with pip
- A Google Cloud Platform (GCP) service account with permissions to access GCP resources including Storage Objects.
Ensure your GCP credentials are correctly configured
The Google Cloud Platform documentation describes how to verify authentication for the Google Cloud API, which includes:
- Creating a Google Cloud Platform (GCP) service account.
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- Verifying authentication by running a simple Google Cloud Storage client library script.
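A quick local check of the second step can be scripted with the standard library. This is a minimal sketch that only confirms the GOOGLE_APPLICATION_CREDENTIALS variable is set and points at an existing file; it does not contact Google Cloud, and the function name is illustrative.

```python
import os
from pathlib import Path


def gcp_credentials_file(env=None):
    """Return the credentials file path if the variable is set and the file exists."""
    env = os.environ if env is None else env
    raw_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw_path:
        return None  # variable is unset or empty
    path = Path(raw_path).expanduser()
    return path if path.is_file() else None


if __name__ == "__main__":
    found = gcp_credentials_file()
    print(f"credentials file: {found}" if found else "credentials file not found")
```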
Check your Python version
You can check your version of Python by running:
python --version
GX currently supports Python versions 3.8 to 3.11.
python or python3
Depending on your installation and configuration of Python 3, you may find that executing Python commands from the terminal by calling python doesn't work as desired. If a command using python does not work, try using python3.
Instead of:
python --version
Try:
python3 --version
If this produces the desired result, simply replace python with python3 in our example terminal commands.
If this does not work, you may need to look into your Python 3 installation or configuration.
Create a Python virtual environment
As a best practice, we recommend using a virtual environment to partition your GX installation from any other Python projects that may exist on the same system. This ensures that there will not be dependency conflicts between the GX installation and other Python projects.
Once we have confirmed that Python 3 is installed locally, we can create a virtual environment with venv.
venv?
We have chosen to use venv for virtual environments in this guide because it is included with Python 3. You are not limited to using venv, and can just as easily install Great Expectations into virtual environments with tools such as virtualenv or pyenv.
We will create our virtual environment by running:
python -m venv my_venv
This command will create a new directory called my_venv. Our virtual environment will be located in this directory.
To activate the virtual environment, run:
source my_venv/bin/activate
my_venv?
You can name your virtual environment anything you like. Simply replace my_venv in the examples above with the name that you would like to use.
Install optional dependencies
To install Great Expectations with the optional dependencies needed to work with GCP we execute the following pip command from the terminal:
python -m pip install 'great_expectations[gcp]'
This will install Great Expectations and additional packages for interacting with Google Cloud Storage.
Verify that GX has been installed correctly
You can verify that GX installed successfully with the CLI command:
great_expectations --version
If GX was installed successfully, output similar to the following is returned:
great_expectations, version 0.17.19
Next steps
Now that you have installed GX with the necessary dependencies for working with GCS, you are ready to initialize your Data Context, the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components. The Data Context contains your configurations for GX components and provides access to GX's Python API.
To create a Data Context, see Instantiate a Data Context.
SQL databases
Create your GX Python environment, install Great Expectations locally, and then configure the necessary dependencies to access data stored on SQL databases.
Prerequisites
- An installation of Python 3.8 to 3.11. To download and install Python, see Python downloads.
- The ability to install Python modules with pip
Check your Python version
You can check your version of Python by running:
python --version
GX currently supports Python versions 3.8 to 3.11.
python or python3
Depending on your installation and configuration of Python 3, you may find that executing Python commands from the terminal by calling python doesn't work as desired. If a command using python does not work, try using python3.
Instead of:
python --version
Try:
python3 --version
If this produces the desired result, simply replace python with python3 in our example terminal commands.
If this does not work, you may need to look into your Python 3 installation or configuration.
Create a Python virtual environment
As a best practice, we recommend using a virtual environment to partition your GX installation from any other Python projects that may exist on the same system. This ensures that there will not be dependency conflicts between the GX installation and other Python projects.
Once we have confirmed that Python 3 is installed locally, we can create a virtual environment with venv.
venv?
We have chosen to use venv for virtual environments in this guide because it is included with Python 3. You are not limited to using venv, and can just as easily install Great Expectations into virtual environments with tools such as virtualenv or pyenv.
We will create our virtual environment by running:
python -m venv my_venv
This command will create a new directory called my_venv. Our virtual environment will be located in this directory.
To activate the virtual environment, run:
source my_venv/bin/activate
my_venv?
You can name your virtual environment anything you like. Simply replace my_venv in the examples above with the name that you would like to use.
Install GX with optional dependencies for SQL databases
To install Great Expectations with the optional dependencies needed to work with SQL databases we execute the following pip command from the terminal:
pip install 'great_expectations[sqlalchemy]'
The above pip instruction installs GX with basic SQL support through SQLAlchemy. However, certain SQL dialects require additional dependencies. Depending on the SQL database type you will be working with, you may wish to use one of the following installation commands instead:
- AWS Athena:
pip install 'great_expectations[athena]'
- BigQuery:
pip install 'great_expectations[bigquery]'
- MSSQL:
pip install 'great_expectations[mssql]'
- PostgreSQL:
pip install 'great_expectations[postgresql]'
- Redshift:
pip install 'great_expectations[redshift]'
- Snowflake:
pip install 'great_expectations[snowflake]'
- Trino:
pip install 'great_expectations[trino]'
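When an install script has to pick the right extra for a given database, the choice above can be expressed as a small lookup. This is a minimal sketch that only builds the pip argument list; the extras are the ones listed in this section, and the function name `pip_install_args` is illustrative.

```python
# Dialect-specific extras named in this section.
DIALECT_EXTRAS = (
    "athena",
    "bigquery",
    "mssql",
    "postgresql",
    "redshift",
    "snowflake",
    "trino",
)


def pip_install_args(dialect=None, python="python"):
    """Build the pip argument list for installing GX with the matching extra.

    With no dialect, fall back to the basic 'sqlalchemy' extra.
    """
    if dialect is None:
        extra = "sqlalchemy"
    else:
        extra = dialect.lower()
        if extra not in DIALECT_EXTRAS:
            raise ValueError(f"no extra listed in this guide for {dialect!r}")
    return [python, "-m", "pip", "install", f"great_expectations[{extra}]"]


if __name__ == "__main__":
    print(" ".join(pip_install_args("postgresql")))
```

Passing the result to `subprocess.run` as a list avoids the shell quoting that the square brackets otherwise require.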
Verify that GX has been installed correctly
You can verify that GX installed successfully with the CLI command:
great_expectations --version
If GX was installed successfully, output similar to the following is returned:
great_expectations, version 0.17.19
Set up credentials
Different SQL dialects have different requirements for connection strings and methods of configuring credentials. By default, GX allows you to define credentials as environment variables or as values in your Data Context. See Instantiate a Data Context.
There may also be third party utilities for setting up credentials of a given SQL database type. For more information on setting up credentials for a given source database, please reference the official documentation for that SQL dialect as well as our guide on how to set up credentials.
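As an illustration of keeping credentials in environment variables, the sketch below assembles a SQLAlchemy-style PostgreSQL connection string. The variable names (MY_DB_USER and so on) are hypothetical, the `postgresql+psycopg2` driver prefix is an assumption about your setup, and other dialects use different URL formats.

```python
import os
from urllib.parse import quote_plus


def postgres_url_from_env(env=None):
    """Assemble a PostgreSQL connection string from environment variables.

    The MY_DB_* variable names are placeholders chosen for this example.
    """
    env = os.environ if env is None else env
    # Percent-encode the user and password so characters such as "@" or "/"
    # do not break the URL structure.
    user = quote_plus(env["MY_DB_USER"])
    password = quote_plus(env["MY_DB_PASSWORD"])
    host = env.get("MY_DB_HOST", "localhost")
    port = env.get("MY_DB_PORT", "5432")
    database = env["MY_DB_NAME"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"
```

Building the string at runtime keeps the password out of files that might be committed to source control.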
Next steps
Now that you have installed GX with the necessary dependencies for working with SQL databases, you are ready to initialize your Data Context, the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components. The Data Context contains your configurations for GX components and provides access to GX's Python API.
To create a Data Context, see Instantiate a Data Context.