Data tests (Coming soon)
Coming soon: Data tests will be available in the next release.
Data tests are assertions that you make about models and resources in a project. These tests validate the correctness of the transformed data, ensuring that standards for integrity and quality are met before the data is delivered to downstream analytics. A data test is a SQL query, optionally templated with Jinja, that selects the rows violating the assertion.
Data tests are one of the dbt artifacts that Recurve integrates into its data validation capabilities.
There are two types of data tests:
Generic test: a test designed to be reusable across multiple models, columns, or tables. It targets specific data but can be applied broadly to check for common conditions or constraints.
Singular test: a custom SQL-based test where you can write a specific SQL query to validate data. It is typically used for more specific or complex checks that are unique to a particular model or situation.
Recurve provides a wide collection of templates for built-in generic tests, as well as the flexibility to write custom queries to cover more advanced scenarios. Built-in generic tests provide basic but essential validations, such as NULL checks or referential integrity, while custom generic tests and singular tests can handle complex business logic.
As with any quality control practice, we recommend starting with the most basic testing scenarios before moving to complex cases. You can first apply built-in generic tests to validate the core assumptions about your dataset structure, then add custom tests to reflect specific business rules.
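As a minimal sketch of the idea that a data test is a query selecting failing rows, the Python snippet below runs two such queries with the standard-library sqlite3 module against an in-memory table. The table `orders`, its columns, and the helper names `not_null_test` and `run_test` are hypothetical; they only illustrate the difference between a reusable generic test and a one-off singular test and are not Recurve or dbt APIs.

```python
import sqlite3

# Hypothetical sample data: one row has a NULL timestamp, one has a negative amount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 20.0), (2, None, 35.5), (3, "2024-01-02", -5.0)],
)

def not_null_test(table: str, column: str) -> str:
    """Generic test: a reusable query template parameterized by table and column."""
    return f"SELECT * FROM {table} WHERE {column} IS NULL"

# Singular test: a one-off query encoding a rule specific to this table.
negative_amount_test = "SELECT * FROM orders WHERE amount < 0"

def run_test(query: str) -> list:
    """Return the failing rows; an empty list means no row violated the assertion."""
    return conn.execute(query).fetchall()

null_failures = run_test(not_null_test("orders", "created_at"))
amount_failures = run_test(negative_amount_test)
print(null_failures)    # [(2, None, 35.5)]
print(amount_failures)  # [(3, '2024-01-02', -5.0)]
```

The same `not_null_test` template can be pointed at any table and column, whereas the singular query only makes sense for this particular table.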
Test categories
Data tests are associated with categories that reflect relevant business metrics:
Freshness
Check the timestamp to see if the data is up-to-date and made available for the intended use cases.
A daily sales report should have data of the most recent transactions.
Raw data is expected to be ingested daily before 6 pm.
Volume
Check that the expected number of rows is processed and falls within a specified range.
At least 1000 records should be updated every day.
Completeness
Check if there are missing values in datasets.
Timestamp column should not contain NULL values.
Name column should not contain empty strings.
Validity
Check if values in a column fall within a specified list or adhere to a defined format.
String value should be within the list of enums.
Numeric value should be within the expected data range.
Uniqueness
Check for undesired duplicates of data within a dataset.
In the order table, no two records should have the same combination of order_id and product_id values.
Consistency
Check if aggregated data aligns correctly with the expected results.
The total sales amount in monthly_sales is aggregated from daily transactions in the daily_sales table. A consistency check should ensure that the monthly total sales match the sum of daily sales.
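The uniqueness and consistency examples above could be expressed as failing-row queries along the following lines. This is only a sketch run through Python's sqlite3 module: the table is named `orders` here to avoid the reserved word `order`, and the column names `sale_date`, `amount`, `sale_month`, and `total_amount` are assumptions, since the examples above do not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, product_id INTEGER);
INSERT INTO orders VALUES (1, 10), (1, 11), (2, 10), (1, 10);

CREATE TABLE daily_sales (sale_date TEXT, amount REAL);
INSERT INTO daily_sales VALUES ('2024-01-01', 100), ('2024-01-02', 50), ('2024-02-01', 70);

CREATE TABLE monthly_sales (sale_month TEXT, total_amount REAL);
INSERT INTO monthly_sales VALUES ('2024-01', 150), ('2024-02', 80);
""")

# Uniqueness: return every (order_id, product_id) combination that appears more than once.
uniqueness_failures = conn.execute("""
    SELECT order_id, product_id, COUNT(*) AS occurrences
    FROM orders
    GROUP BY order_id, product_id
    HAVING COUNT(*) > 1
""").fetchall()

# Consistency: return months whose stored total differs from the sum of daily sales.
# Months with no daily rows at all are not reported by this inner join.
consistency_failures = conn.execute("""
    SELECT m.sale_month, m.total_amount, d.daily_total
    FROM monthly_sales AS m
    JOIN (
        SELECT substr(sale_date, 1, 7) AS sale_month, SUM(amount) AS daily_total
        FROM daily_sales
        GROUP BY substr(sale_date, 1, 7)
    ) AS d ON d.sale_month = m.sale_month
    WHERE m.total_amount <> d.daily_total
""").fetchall()

print(uniqueness_failures)   # [(1, 10, 2)]
print(consistency_failures)  # [('2024-02', 80.0, 70.0)]
```

In both queries the result set is the list of violations, so an empty result means the check found nothing to report.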
Test severity
A data test applied to an asset returns a list of failing rows as its result. Each test result has a severity level indicating its significance: Passed, Warning, or Error.
In the Severity level section of a test, you can set thresholds for these levels based on the number of failures.
For example, a test that checks for data completeness (columns with NULL values) can have the following thresholds:
Error if the NULL % is greater than 10%.
Warning if the NULL % is between 0% and 10%.
Passed otherwise.
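The threshold logic in this example can be sketched as a small Python function. The function name `classify_severity` and its parameters are hypothetical, and treating exactly 10% as a Warning rather than an Error is an assumption about how the boundary is handled.

```python
def classify_severity(null_count: int, total_rows: int,
                      warn_above_pct: float = 0.0,
                      error_above_pct: float = 10.0) -> str:
    """Map a NULL percentage to a severity level using two thresholds."""
    # Guard against empty tables: no rows means no failures to report.
    null_pct = 100.0 * null_count / total_rows if total_rows else 0.0
    if null_pct > error_above_pct:
        return "Error"
    if null_pct > warn_above_pct:
        return "Warning"  # assumption: the upper boundary (10%) is still a Warning
    return "Passed"

print(classify_severity(0, 200))   # Passed  (0%)
print(classify_severity(10, 200))  # Warning (5%)
print(classify_severity(30, 200))  # Error   (15%)
```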
Create a test from a template
Test creation is centralized in the Test case template section, which displays all available test artifacts, including:
Templates: templates for built-in generic tests, organized by their scope (column-level or table-level) and by test category.
Custom generic tests: generic tests defined in the project Library.
Custom SQL: singular tests for one-off use cases.
Currently, Recurve supports adding tests to a model.
Follow these steps to create a test from a template:
Open your model in the editor.
Switch to the Test cases tab and click + Add new.
The template list shows all available test templates, including those for built-in tests and custom tests. Select the desired template.
Configure the test case.
The configuration may differ between templates. For example, in the template for Row Values Freshness Check, you need to specify the following:
Name: the displayed name in the test case list.
Description: the description to communicate your intentions.
Track data from column: specify the column to apply the test condition.
Condition (optional): limit the scope of the test to only certain rows based on specific criteria, ensuring that the comparison is meaningful within a defined context.
Severity: set the thresholds for the severity levels. See: Test severity.
Click Add.
The new test will be displayed in the Test cases tab of the model.
Test execution
Data tests assigned to a model are executed when you:
Run and preview the model using the console.
Run a pipeline that involves the model. See: Job.