Contribution and Testing
Running tests
You can run all unit tests by running pytest in the great_expectations directory root. By default, the tests will be run against pandas and sqlite, with the ability to test additional backends like postgresql, spark, and mssql using pytest flags. To run tests against a specific backend like PostgreSQL, you can run pytest --postgresql.
Currently, the supported pytest flags for general testing are as follows:
- --spark: Execute tests against the Spark backend.
- --postgresql: Execute tests against PostgreSQL.
- --mysql: Execute tests against MySQL.
- --mssql: Execute tests against Microsoft SQL Server.
- --bigquery: Execute tests against Google BigQuery (requires additional setup).
- --aws: Execute tests against AWS resources like S3, Redshift, and Athena (requires additional setup).
In addition, if you would like to skip all local backend tests (with the exception of the pandas backend), you can run pytest --no-sqlalchemy.
Note: as of early 2020, the tests generate many warnings. Most of these are generated by dependencies (pandas, sqlalchemy, etc.). You can suppress them with pytest’s --disable-pytest-warnings flag: pytest --no-sqlalchemy --disable-pytest-warnings.
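In practice, the backend flags can be combined in a single run, assuming the corresponding backends are configured in your local environment. The following is a sketch rather than an officially documented invocation:
# Run the default pandas and sqlite tests plus PostgreSQL and Spark,
# suppressing dependency warnings.
pytest --postgresql --spark --disable-pytest-warnings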
BigQuery tests
In order to run BigQuery tests, you first need to go through the following steps:
- Select or create a Cloud Platform project.
- Set up authentication (one common approach is sketched after this list).
- In your project, create a BigQuery dataset (e.g. named test_ci) and set the dataset default table expiration to .1 days.
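One common way to handle the authentication step is with the gcloud CLI and Application Default Credentials. The commands below are a sketch of that approach, using the same project placeholder as the example that follows; your organization may instead require a service account key:
# Create Application Default Credentials and point gcloud at your project.
gcloud auth application-default login
gcloud config set project <YOUR_GOOGLE_CLOUD_PROJECT>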
After setting up authentication, you can run the tests against your project using the environment variables GE_TEST_BIGQUERY_PROJECT and GE_TEST_BIGQUERY_DATASET, e.g.
GE_TEST_BIGQUERY_PROJECT=<YOUR_GOOGLE_CLOUD_PROJECT> \
GE_TEST_BIGQUERY_DATASET=test_ci \
pytest tests/test_definitions/test_expectations_cfe.py --bigquery
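Equivalently, you can export the variables once per shell session rather than prefixing each command; a minimal sketch using the same variables and test file:
# Export the BigQuery test variables, then run the BigQuery-backed Expectation tests.
export GE_TEST_BIGQUERY_PROJECT=<YOUR_GOOGLE_CLOUD_PROJECT>
export GE_TEST_BIGQUERY_DATASET=test_ci
pytest tests/test_definitions/test_expectations_cfe.py --bigquery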
Writing unit and integration tests
Production code in Great Expectations must be thoroughly tested. In general, we insist on unit tests for all branches of every method, including likely error states. Most new feature contributions should include several unit tests. Contributions that modify or extend existing features should include a test of the new behavior.
Experimental code in Great Expectations need only be tested lightly. We are moving to a convention where experimental features are clearly labeled in documentation and the code itself. However, this convention is not uniformly applied today.
Most of Great Expectations’ integration testing is in the CLI, which naturally exercises most of the core code paths. Because integration tests require a lot of developer time to maintain, most contributions should not include new integration tests, unless they change the CLI itself.
Note: we do not currently test Great Expectations against all types of SQL database. CI test coverage for SQL is limited to PostgreSQL, SQLite, MSSQL, and BigQuery. We have observed some bugs because of unsupported features or differences in SQL dialects, and we are actively working to improve dialect-specific support and testing.
Unit tests for Expectations
One of Great Expectations’ important promises is that the same Expectation will produce the same result across all supported execution environments: pandas, sqlalchemy, and Spark.
To accomplish this, Great Expectations encapsulates unit tests for Expectations as JSON files. These files are used as fixtures and executed using a specialized test runner that executes tests against all execution environments.
Test fixture files are structured as follows:
{
"expectation_type" : "expect_column_max_to_be_between",
"datasets" : [{
"data" : {...},
"schemas" : {...},
"tests" : [...]
}]
}
Each item under datasets includes three entries: data, schemas, and tests.
data
…defines a dataframe of sample data to apply Expectations against. The dataframe is defined as a dictionary of lists, with keys containing column names and values containing lists of data entries. All lists within a dataset must have the same length.
"data" : {
"w" : [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
"x" : [2, 3, 4, 5, 6, 7, 8, 9, null, null],
"y" : [1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
"z" : ["a", "b", "c", "d", "e", null, null, null, null, null],
"zz" : ["1/1/2016", "1/2/2016", "2/2/2016", "2/2/2016", "3/1/2016", "2/1/2017", null, null, null, null],
"a" : [null, 0, null, null, 1, null, null, 2, null, null],
},
schemas
…define the types to be used when instantiating tests against different execution environments, including different SQL dialects. Each schema is defined as a dictionary with column names and types as key-value pairs. If the schema isn’t specified for a given execution environment, Great Expectations will introspect values and attempt to guess the schema.
"schemas": {
"sqlite": {
"w" : "INTEGER",
"x" : "INTEGER",
"y" : "INTEGER",
"z" : "VARCHAR",
"zz" : "DATETIME",
"a" : "INTEGER",
},
"postgresql": {
"w" : "INTEGER",
"x" : "INTEGER",
"y" : "INTEGER",
"z" : "TEXT",
"zz" : "TIMESTAMP",
"a" : "INTEGER",
}
},
tests
…define the tests to be executed against the dataframe. Each item in tests must have title, exact_match_out, in, and out. The test runner will execute the named Expectation once for each item, with the values in in supplied as kwargs.
The test passes if the values in the expectation Validation Result correspond with the values in out. If exact_match_out is true, then every field in the Expectation output must have a corresponding, matching field in out. If it is false, then only the fields specified in out need to match. For most use cases, false is a better fit, because it allows narrower targeting of the relevant output.
suppress_test_for is an optional parameter that disables the test for a specific list of backends.
See an example below. For other examples, see the existing test fixture files under tests/test_definitions/.
"tests" : [{
"title": "Basic negative test case",
"exact_match_out" : false,
"in": {
"column": "w",
"result_format": "BASIC",
"min_value": null,
"max_value": 4
},
"out": {
"success": false,
"observed_value": 5
},
"suppress_test_for": ["sqlite"]
},
...
]
The test fixture files are stored in subdirectories of tests/test_definitions/ corresponding to the class of Expectation:
- column_map_expectations
- column_aggregate_expectations
- column_pair_map_expectations
- column_distributional_expectations
- multicolumn_map_expectations
- other_expectations
By convention, the name of the file is the name of the Expectation, with a .json suffix. Creating a new JSON file will automatically add the new Expectation tests to the test suite.
Note: If you are implementing a new Expectation, but don’t plan to immediately implement it for all execution environments, you should add the new test to the appropriate list(s) in the candidate_test_is_on_temporary_notimplemented_list method within tests/test_utils.py. Often, we see Expectations developed first for pandas, then later extended to SqlAlchemy and Spark.
You can run just the Expectation tests with pytest tests/test_definitions/test_expectations.py.
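Because these fixture-driven tests run through the normal pytest entry point, the backend flags and pytest’s -k filter can be applied to them as well. The following is a sketch, assuming PostgreSQL is available locally and that the generated test IDs include the Expectation name (as with the expect_column_max_to_be_between fixture above):
# Run only the JSON-fixture Expectation tests against PostgreSQL,
# narrowed to one Expectation by name with a -k expression.
pytest tests/test_definitions/test_expectations.py --postgresql \
    -k "expect_column_max_to_be_between"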
Performance testing
Configuring Data Before Running Performance Tests
The performance tests use BigQuery.
Before running a performance test, set up data with tests/performance/setup_bigquery_tables_for_performance_test.sh.
For example:
GE_TEST_BIGQUERY_PEFORMANCE_DATASET=<YOUR_GCP_PROJECT> tests/performance/setup_bigquery_tables_for_performance_test.sh
For more information on getting started with BigQuery, please refer to the above section on BigQuery tests.
Running the Performance Tests
Run the performance tests with pytest, e.g.
pytest tests/performance/test_bigquery_benchmarks.py \
--bigquery --performance-tests \
-k 'test_taxi_trips_benchmark[1-True-V3]' \
--benchmark-json=tests/performance/results/`date "+%H%M"`_${USER}.json \
-rP -vv
Some benchmarks take a long time to complete. In this example, only the relatively fast test_taxi_trips_benchmark[1-True-V3] benchmark is run, and the output should include timing results like the following:
--------------------------------------------------- benchmark: 1 tests ------------------------------------------------------
Name (time in s) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
-----------------------------------------------------------------------------------------------------------------------------
test_taxi_trips_benchmark[1-True-V3] 5.0488 5.0488 5.0488 0.0000 5.0488 0.0000 0;0 0.1981 1 1
-----------------------------------------------------------------------------------------------------------------------------
The result is saved for comparisons as described below.
Comparing Performance Results
Compare results in the tests/performance/results directory with py.test-benchmark compare, e.g.
$ py.test-benchmark compare --group-by name tests/performance/results/initial_baseline.json tests/performance/results/*${USER}.json
---------------------------------------------------------------------------- benchmark 'test_taxi_trips_benchmark[1-True-V3]': 2 tests ---------------------------------------------------------------------------
Name (time in s) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_taxi_trips_benchmark[1-True-V3] (initial_base) 5.0488 (1.0) 5.0488 (1.0) 5.0488 (1.0) 0.0000 (1.0) 5.0488 (1.0) 0.0000 (1.0) 0;0 0.1981 (1.0) 1 1
test_taxi_trips_benchmark[1-True-V3] (2114_work) 6.4675 (1.28) 6.4675 (1.28) 6.4675 (1.28) 0.0000 (1.0) 6.4675 (1.28) 0.0000 (1.0) 0;0 0.1546 (0.78) 1 1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Please refer to the pytest-benchmark documentation for more information.
Checking in new benchmark results
When creating a pull request that is intended to improve performance, please include benchmark results in the pull request that demonstrate the improvement. Please use the script run_benchmark_multiple_times.sh to run the benchmark multiple times, naming the tests with the first argument provided to that script. For example, the tests/performance/results/minimal_multithreading_*.json files were created with the following command:
$ tests/performance/run_benchmark_multiple_times.sh minimal_multithreading
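Once the script finishes, the resulting files can be compared against a baseline in the same way as a single run; a sketch reusing the initial_baseline.json file and the compare invocation shown earlier:
# Compare the baseline against every result produced by run_benchmark_multiple_times.sh.
py.test-benchmark compare --group-by name \
    tests/performance/results/initial_baseline.json \
    tests/performance/results/minimal_multithreading_*.json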
Manual testing
We do manual testing (e.g. against various databases and backends) before major releases and in response to specific bugs and issues.