Initialize a Data Context
- You need a Python environment where you can install Great Expectations and other dependencies, e.g. a virtual environment.
Set up your machine for the tutorial
For this tutorial, we will use a simplified version of the NYC taxi ride data.
Clone the ge_tutorials repository to download the data and directories with the final versions of the tutorial, which you can use for reference:
git clone https://github.com/superconductive/ge_tutorials
cd ge_tutorials
The repository you cloned contains several directories with final versions of our tutorials. The final version for this tutorial is located in the getting_started_tutorial_final_v3_api/ folder. You can use it as a reference or to explore a complete deployment of Great Expectations, but you do not need it to follow this tutorial.
Install Great Expectations and dependencies
Great Expectations requires Python 3 and can be installed using pip. If you haven’t already, install Great Expectations by running:
pip install great_expectations
You can confirm that the installation worked by running:
great_expectations --version
This should return something like:
great_expectations, version 0.13.43
For detailed installation instructions, see How to install Great Expectations locally.
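If you prefer to check the installation from Python rather than the CLI, the following is a minimal sketch. It assumes Python 3.8 or later (for importlib.metadata) and that the package is registered under the distribution name great_expectations; the helper name installed_version is hypothetical and not part of Great Expectations.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "great_expectations") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import Great Expectations itself.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("great_expectations is not installed in this environment")
    else:
        print(f"great_expectations, version {found}")
```

Run it inside the same environment (for example, the same virtual environment) in which you ran pip install, otherwise it may report a different installation or none at all.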
Create a Data Context
In Great Expectations, your Data Context manages your project configuration, so let’s go and create a Data Context for our tutorial project!
When you installed Great Expectations, you also installed the Great Expectations command line interface (CLI). It provides helpful utilities for deploying and configuring Data Contexts, plus a few other convenience methods.
To initialize your Great Expectations deployment for the project, run this command in the terminal from the ge_tutorials/ directory:
great_expectations init
You should see this:
Using v3 (Batch Request) API
  ___              _     ___                  _        _   _
 / __|_ _ ___ __ _| |_  | __|_ ___ __  ___ __| |_ __ _| |_(_)___ _ _  ___
| (_ | '_/ -_) _` |  _| | _|\ \ / '_ \/ -_) _|  _/ _` |  _| / _ \ ' \(_-<
 \___|_| \___\__,_|\__| |___/_\_\ .__/\___\__|\__\__,_|\__|_\___/_||_/__/
                                |_|
             ~ Always know what to expect from your data ~
Let's create a new Data Context to hold your project configuration.
Great Expectations will create a new directory with the following structure:
great_expectations
    |-- great_expectations.yml
    |-- expectations
    |-- checkpoints
    |-- plugins
    |-- .gitignore
    |-- uncommitted
        |-- config_variables.yml
        |-- data_docs
        |-- validations
OK to proceed? [Y/n]: <press Enter>
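After you confirm the prompt, you may want to check that the directory was actually created as listed. The following is a rough sketch using only the standard library. It assumes the layout matches the listing printed by the CLI, with config_variables.yml, data_docs, and validations nested under uncommitted/; other versions of Great Expectations may create a different layout. The names EXPECTED_ENTRIES and missing_entries are hypothetical helpers, not part of the Great Expectations API.

```python
from pathlib import Path
from typing import List

# Entries taken from the directory listing printed by `great_expectations init`.
# Assumption: the last three are nested under uncommitted/.
EXPECTED_ENTRIES = [
    "great_expectations.yml",
    "expectations",
    "checkpoints",
    "plugins",
    ".gitignore",
    "uncommitted",
    "uncommitted/config_variables.yml",
    "uncommitted/data_docs",
    "uncommitted/validations",
]


def missing_entries(project_root: str = ".") -> List[str]:
    """Return the expected entries that do not exist under
    <project_root>/great_expectations (empty list if all are present)."""
    context_dir = Path(project_root) / "great_expectations"
    return [entry for entry in EXPECTED_ENTRIES if not (context_dir / entry).exists()]


if __name__ == "__main__":
    # Run from the directory where you ran `great_expectations init`.
    missing = missing_entries(".")
    if missing:
        print(f"Missing entries: {missing}")
    else:
        print("All expected entries found")
```

An empty result only shows that the files and directories exist; it does not validate the contents of great_expectations.yml.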