3. Create data models

This section introduces the core concepts of data modeling. You'll learn how to create a simple data transformation use case with Recurve.

Assets in Recurve

Assets are the building blocks of your data workflow in Recurve. You can think of assets as the essential components you'll create and manage throughout your data transformation process. Each type of asset serves a specific purpose, working together to turn your raw data into valuable insights.

In data modeling, you mainly work with these types of assets:

Sources: references to your raw data, allowing you to connect and document your data origins.
Models: queries that process this data, applying transformations to create structured, analytics-ready datasets.
Jinja macros and variables: Jinja templating artefacts that add programming logic to SQL, allowing you to write more dynamic and maintainable transformations.

Data modeling process

The data modeling process in Recurve follows a logical flow that helps you build and validate your transformation workflow.

Define your sources: Start by defining sources that represent your raw data tables. Sources help you describe and document the origins of your data.
Create data models: With your sources defined, create data models to transform your raw data into useful analytics datasets. Each model represents a specific transformation step that builds on previous steps.
Configure materialization: Specify how the results of your models will be materialized in your data warehouse.
Add data tests: Add data tests to your models to verify that they work correctly. Tests associated with a model run automatically every time the model is built successfully.

Your development cycle should include these steps to ensure that transformations work properly before applying them to production data.

Prerequisites

The following walkthrough uses the jaffle_shop dataset (a fictional e-commerce store) provided by the dbt Community. You can follow the guide in this repository to generate the data and load it into your target database: jaffle-shop-generator.

Walkthrough

Let's go through the data modeling process of Recurve with hands-on steps.

To begin, from the Data development dashboard, open the project that you've created. By default, Recurve navigates you to the Design section, where all transformation activities happen.

Define your sources

Follow these steps:

In the Models tab, click on the + icon and select Add source.
In the opened modal:
1. Select the connection type.
2. Select the target connection. This is the project connection that you set up in 2. Create a project.
Click Next. Recurve then displays all the tables available from the target connection.

Select the desired raw tables or models.
Here we select all tables organized in the jaffle_shop schema.

jaffle_shop tables

The jaffle_shop schema includes these tables:

Customers (who place Orders)
Orders (from those Customers)
Products (the food and beverages the Orders contain)
Items (of those Products)
Supplies (needed for making those Products)
Stores (where the Orders are placed and fulfilled)

Click Add source.

The selected tables will then be added to the Sources folder and grouped by the schema name.

Create data models

To demonstrate how modular data models build on one another, we'll create three models: two staging models and one model that consolidates their results.

stg_customers: this staging model standardizes customer data by selecting and renaming the relevant fields (customer_id and customer_name) from the raw customers table.

It uses the Jinja {{ source() }} function to reference the raw tables defined in the previous section.

select 
    id as customer_id,
    name as customer_name

from {{ source('jaffle_shop', 'raw_customers') }}
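Conceptually, the templating step replaces each {{ source() }} call with a reference to the underlying raw table before the query runs. The sketch below mimics that substitution with a regular expression. The helper name render_sources and the schema.table output format are assumptions made for illustration; this page does not describe how Recurve resolves names internally.

```python
import re

# Matches {{ source('schema', 'table') }} with single or double quotes.
SOURCE_CALL = re.compile(
    r"\{\{\s*source\(\s*['\"](\w+)['\"]\s*,\s*['\"](\w+)['\"]\s*\)\s*\}\}"
)


def render_sources(sql: str) -> str:
    """Replace each source() call with a schema-qualified name (assumed format)."""
    return SOURCE_CALL.sub(lambda m: f"{m.group(1)}.{m.group(2)}", sql)


model_sql = """
select
    id as customer_id,
    name as customer_name
from {{ source('jaffle_shop', 'raw_customers') }}
"""

rendered = render_sources(model_sql)
print(rendered)
```

Because the model refers to the source by name rather than by a hard-coded table path, the same query keeps working if the source definition later points somewhere else.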

stg_orders: similar to stg_customers, this staging model standardizes order data by selecting and renaming the relevant fields (order_id, customer_id, and order_date) from the raw orders table.

select 
    id as order_id,
    customer as customer_id,
    ordered_at as order_date

from {{ source("jaffle_shop", "raw_orders") }}

The third model consolidates customer data with order history.

Here we use the {{ ref() }} function to reference the two staging models.

-- Reference the staged data (stg_customers and stg_orders) to gather customer information.

with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

-- Aggregate the order data to calculate metrics per customer

customer_orders as (

    select
        customer_id,

        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

-- Join aggregated data with the customer information to produce details about each customer and their order history

final as (

    select
        customers.customer_id,
        customers.customer_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final
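To see what the aggregation and the left join produce, the following sketch runs an equivalent query against a tiny in-memory SQLite database. The sample rows are invented for illustration, the staging models are stood in for by plain tables, and the join is written with an explicit on clause instead of using; none of this reflects how Recurve executes the model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table stg_customers (customer_id integer, customer_name text);
    create table stg_orders (order_id integer, customer_id integer, order_date text);

    insert into stg_customers values (1, 'Ada'), (2, 'Grace');
    insert into stg_orders values
        (10, 1, '2024-01-05'),
        (11, 1, '2024-03-20');
""")

rows = conn.execute("""
    with customer_orders as (
        select
            customer_id,
            min(order_date) as first_order_date,
            max(order_date) as most_recent_order_date,
            count(order_id) as number_of_orders
        from stg_orders
        group by customer_id
    )
    select
        c.customer_id,
        c.customer_name,
        o.first_order_date,
        o.most_recent_order_date,
        coalesce(o.number_of_orders, 0) as number_of_orders
    from stg_customers as c
    left join customer_orders as o on o.customer_id = c.customer_id
    order by c.customer_id
""").fetchall()

for row in rows:
    print(row)
```

The customer without orders still appears in the result: the left join keeps the row, the date columns are null, and coalesce turns the missing count into 0.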

Follow these steps to create each model:

In the Models tab, click on the + icon and select New SQL model.
Provide the model name and click Create.
The new model is then placed in the Models folder.
Open the model in the editor and paste in the query.
Click Save to confirm the changes.
Click Preview to view the query output in the Result tab.
Perform the steps above to create the other two models.

Now that we've created two staging models that standardize raw data, and one model that aggregates and consolidates the results, we can view them in Data lineage to better understand how they relate.

Open a model and toggle on the Lineage view option. This will display a DAG (directed acyclic graph) showing the relationship of assets, from raw data to the final downstream model.

Recurve derives this lineage from the source() and ref() calls in your models, which let it track dependencies between assets automatically.
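As a rough illustration of how a dependency graph can be derived from these calls, the sketch below scans model SQL for ref() and source() with regular expressions and sorts the result topologically. It is not Recurve's implementation; the dependencies helper, the shortened queries, and the model name customers for the consolidated model are hypothetical.

```python
import re
from graphlib import TopologicalSorter

REF = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")
SOURCE = re.compile(
    r"\{\{\s*source\(\s*['\"](\w+)['\"]\s*,\s*['\"](\w+)['\"]\s*\)\s*\}\}"
)


def dependencies(sql: str) -> set[str]:
    """Collect the upstream assets that a model's SQL refers to."""
    refs = set(REF.findall(sql))
    sources = {f"{schema}.{table}" for schema, table in SOURCE.findall(sql)}
    return refs | sources


models = {
    "stg_customers": "select id as customer_id from {{ source('jaffle_shop', 'raw_customers') }}",
    "stg_orders": 'select id as order_id from {{ source("jaffle_shop", "raw_orders") }}',
    # "customers" is a hypothetical name for the consolidated model.
    "customers": "select * from {{ ref('stg_customers') }} join {{ ref('stg_orders') }} using (customer_id)",
}

# Map each model to the set of assets it depends on, then order them so that
# every asset comes before the models built on top of it.
graph = {name: dependencies(sql) for name, sql in models.items()}
build_order = list(TopologicalSorter(graph).static_order())
print(build_order)
```

The resulting order places raw tables first, staging models next, and the consolidated model last, which is the same left-to-right reading as the DAG in the Lineage view.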

Configure materialization

You can configure how each model is materialized in your warehouse.

By default, all models have the table materialization.

Follow these steps:

Open a model in the editor.
Click on .
In the Materialization field, select a materialization option.

Continuing with our three example models, we can materialize the staging models as views so that they always reflect the latest source data and minimize storage costs. The consolidated model, on the other hand, can be materialized as a table because it is the final model and is queried more frequently.
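The difference between the two options can be seen in a small SQLite sketch. A view stores only the query, so it picks up new source rows on every read, while a table stores the result as of build time. The object names stg_orders_view and stg_orders_table and the sample rows are made up for this example, and the statements Recurve issues in your warehouse may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table raw_orders (id integer, customer integer, ordered_at text);
    insert into raw_orders values (10, 1, '2024-01-05');

    -- View materialization: only the query is stored.
    create view stg_orders_view as
        select id as order_id, customer as customer_id, ordered_at as order_date
        from raw_orders;

    -- Table materialization: the query result is stored at build time.
    create table stg_orders_table as
        select id as order_id, customer as customer_id, ordered_at as order_date
        from raw_orders;

    -- A new raw row arrives after both objects were built.
    insert into raw_orders values (11, 2, '2024-02-01');
""")

view_count = conn.execute("select count(*) from stg_orders_view").fetchone()[0]
table_count = conn.execute("select count(*) from stg_orders_table").fetchone()[0]
print(view_count, table_count)
```

The view returns both rows, while the table still holds only the row that existed when it was built and stays that way until the model is rebuilt.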

Add data tests

Coming soon: Data tests will be available in the next release.

Data tests are simply SQL queries that return failing records, based on the condition that you set. These tests validate the correctness of the transformed data and ensure results from the model meet predefined standards.

Recurve provides a list of built-in tests that you can quickly add to your models.

To add a test to a model, follow these steps:

Open a model in the editor.
Switch to the Test cases tab.
Click +Add new.
Select a template and specify the values.
For example, for the stg_orders model, we can add the Empty Value test to verify that the order_date column contains no null values.
Click Add.

The new test is added to the model's test case list and is executed every time the model runs in the console or as part of a pipeline.
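The idea of a test as a query that returns failing records can be sketched as follows. The SQL here is an assumption about what a null check on order_date might look like, run against invented sample rows in SQLite; the actual query behind the Empty Value template is not shown in this guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table stg_orders (order_id integer, customer_id integer, order_date text);
    insert into stg_orders values
        (10, 1, '2024-01-05'),
        (11, 2, null);
""")

# The test query selects the rows that violate the rule; zero rows means pass.
failing_rows = conn.execute(
    "select order_id from stg_orders where order_date is null"
).fetchall()

test_passed = len(failing_rows) == 0
print(failing_rows, test_passed)
```

Here the order with a missing date is returned as a failing record, so the test fails and points directly at the offending row.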

