Advent of data

2022
6 min read

Terraforming Snowflake: a brief look

A guide to terraforming Snowflake in order to manage your database and schema assets.

As data has become a centerpiece of every company, Snowflake has quickly risen to become one of the top contenders in the enterprise-grade cloud data warehouse space. As with every new tool, this brings new challenges in managing Snowflake resources efficiently.

Snowflake prides itself on near-zero management: you can start experimenting with it almost instantly after gaining access. Provisioning of data warehouses, schemas, roles, and users is fairly effortless, and it also offers SCIM integration (e.g. it can integrate with your AAD).

However, managing Snowflake also has its fair share of problems. In enterprise settings, security, access management, collaboration, and audit logs need to be taken into account, and soon you will find yourself writing queries and digging through query history in order to build a good picture of what is happening inside your account. With multiple environments, the complexity only increases.

So how can we manage our Snowflake resources in a similarly efficient way, without compromising on ease of use? Well, cloud engineers gave us an answer a long time ago, as they faced very similar problems with cloud infrastructure.

Infrastructure as Code

In a nutshell, Infrastructure as Code (IaC) is a process where you define your infrastructure through machine-readable configuration files and provision it automatically. The tool of choice for many has become Terraform, as it is fairly simple to use and offers a wide variety of providers. And if you guessed that Snowflake is one of them, you guessed right. A few key Terraform concepts:

  • Providers: essentially plugins that let Terraform interact with cloud platforms, Snowflake, or other APIs. doc
  • Resources: Terraform defines every piece of infrastructure as a resource. In our case, the database, schemas, users, and roles are all separate resources. doc
  • State file: Terraform retains a state file of all your provisioned resources, which is essentially your configuration mapped to the existing infrastructure (the database). doc
  • plan and apply: using the state file, the plan command compares your configuration to the existing infrastructure and produces a plan of the changes that need to be performed on your existing resources in order to match the configuration. Once you are satisfied with the changes that will be made, you can execute them with the apply command. doc
  • Modules: containers for multiple resources used together, which allow us to reuse our configuration multiple times. doc

Snowflake and Terraform

So let’s take a look at a simple Terraform example for provisioning a Snowflake database.

terraform {
  required_providers {
    snowflake = {
      source  = "Snowflake-Labs/snowflake"
      version = "0.53"
    }
  }
}

provider "snowflake" {
  role  = "SYSADMIN"
  alias = "SYSADMIN"
}

resource "snowflake_database" "db" {
  provider = snowflake.SYSADMIN
  name     = "DB"
}
main.tf

Seems easy, right? This approach can be used to provision all kinds of Snowflake resources, and if you are new to Terraform I highly recommend taking a look at this tutorial written by Snowflake, which will introduce you to all the basic concepts.

As easy as it seems, it does bring a set of challenges of its own:

  • Code repetition: as the number of database objects (resources) grows, you will find yourself repeating the same chunks of code.
  • Permission model: with permissions granted to roles, and roles granted to different users, it is hard to establish a clear permission model without querying the database afterwards.
  • Multiple environments: introducing multiple environments into the picture magnifies these issues (for a given role you might want to grant different permissions depending on the environment).

Terraform offers a way to mitigate the code repetition: modules. Depending on your data strategy, you might want read and write roles provisioned on certain schemas as soon as they are created, and this can be encapsulated inside a module. Similarly, you can use variables to pass values dynamically to the configuration files during planning and execution. Another helpful option is to use for_each on repetitive resources and provision them using a single resource block.
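As a minimal sketch of the variables-plus-for_each idea (the variable name and schema list here are made up for illustration), several near-identical schema resources collapse into one block:

```hcl
# Hypothetical example: schema names supplied as a variable
# and expanded with for_each into a single resource block.
variable "schema_names" {
  type    = set(string)
  default = ["RAW", "STAGING", "MARTS"]
}

resource "snowflake_schema" "managed" {
  for_each = var.schema_names
  database = "DB"
  name     = each.value
}
```

Changing the set in one place then adds or removes schemas on the next apply.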

However, we still face the biggest challenge of them all: readability. By splitting files into modules and using for_each, the configuration files become even harder to read. The best way to solve this is an external file that provides clear visibility into our database objects, where we can define schemas, roles, users, and even databases. Somebody said YAML?

Snowflake + Terraform + YAML

YAML (originally "Yet Another Markup Language", now a recursive acronym for "YAML Ain't Markup Language") is a data serialization language which is often used for writing configuration files (or at least that's how Red Hat defines it). If you are not familiar with YAML, you can find its specification here, but for our purposes a very basic understanding will be sufficient.

Let's take a look at what this hypothetical configuration could look like:

database: &DB1
  name: TEST

landing-schemas: &LS
  - name: &E1 ERP1
  - name: &E2 ERP2

domain-schemas: &DS
  - name: &D1 DOM1
  - name: &D2 DOM2

domain-schema-suffix:
  - DATA
  - TEST

access_roles:
  - name: &R1 SNOW_ERP
    read_acces:
      - *E1
    write_access:
      - *E2
  - name: &R2 SNOW_ERP2
    read_acces: ["DOM1_DATA", "DOM2_DATA"]

user-roles-mapping:
  - name: &A1 SNOW_AAD_GROUP1
    roles:
      - *R1
      - *R2
parameters.yaml

For the purpose of this article I kept it in one file, but you might want to split it according to your needs. We defined our database objects (database, schemas, roles, and users) in a clearly readable form. The special characters & and * denote an anchor and an alias node respectively. For the sake of simplicity, you can think of them as an address and a pointer.

With our configuration defined in this way, we can leverage another feature of Terraform called locals (doc). A local value assigns a name to an expression, so the expression can be used multiple times without repeating it. Terraform also allows us to parse this YAML configuration (doc).

locals {
  # decode the YAML file
  yaml_file = yamldecode(file("parameters.yaml"))

  db = local.yaml_file["database"]

  # decide on the database name based on the env variable passed to Terraform
  db_name = var.env == "PROD" ? local.db.name : "${local.db.name}_${var.env}"

  # parse landing schemas
  db_landing_schemas = flatten([
    for schema in local.yaml_file["landing-schemas"] : {
      database = local.db_name
      name     = schema.name
    }
  ])
}
locals.tf
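The locals above reference var.env, which is not declared in any of the snippets; assuming this setup, it would need a declaration along these lines (the default value is illustrative):

```hcl
# Hypothetical input variable consumed by the locals above;
# pass it explicitly, e.g. `terraform plan -var env=PROD`.
variable "env" {
  type    = string
  default = "DEV"
}
```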

This can later be used as input for for_each, and we can provision all the landing schemas in one resource block.

resource "snowflake_schema" "lanidng_schemas" {
  for_each   = { for entry in local.db_landing_schemas : "${entry.database}.${entry.name}" => entry }
  provider   = snowflake.SYSADMIN
  database   = each.value.database
  name       = each.value.name
  is_managed = false
  depends_on = [
    snowflake_database.db
  ]
}
part of main.tf

However, what if we want to create the roles defined in our YAML file and grant read and write access at the same time? Well, we will use modules. Modules in Terraform are defined as folders which contain a main.tf and, optionally, variables.tf and output.tf files. For this example we will disregard output.tf, as we don't need to return any values since all the mapping is defined in the YAML file. Let's start by extending the locals file to parse the roles and their access rights.

db_access_roles = [
  for role in local.yaml_file["access_roles"] : {
    database     = local.db_name
    role_name    = role.name
    write_access = toset(lookup(role, "write_access", []))
    read_access  = toset(lookup(role, "read_acces", []))
  }
]
part of locals.tf
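To make the parsing concrete: for the sample parameters.yaml above, and assuming var.env is "DEV" so db_name resolves to "TEST_DEV", this local would evaluate to roughly the following (the *E1 and *E2 aliases resolve to ERP1 and ERP2 during yamldecode):

```hcl
# Approximate value of local.db_access_roles for the sample YAML,
# assuming var.env = "DEV":
[
  {
    database     = "TEST_DEV"
    role_name    = "SNOW_ERP"
    write_access = toset(["ERP2"])
    read_access  = toset(["ERP1"])
  },
  {
    database     = "TEST_DEV"
    role_name    = "SNOW_ERP2"
    write_access = toset([])
    read_access  = toset(["DOM1_DATA", "DOM2_DATA"])
  },
]
```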

Now we need to define the input variables for our module, creatively named roleschemagrant.

variable "db" {
  type = string
}

variable "write_schema" {
  type = set(string)
}

variable "read_schema" {
  type = set(string)
}

variable "role" {
  type = string
}
modules/roleschemagrant/variables.tf

These values will be our inputs. Now let's define the module's main file and our role assignment.

terraform {
  required_providers {
    snowflake = {
      source  = "Snowflake-Labs/snowflake"
      version = "0.53"
      configuration_aliases = [snowflake.role]
    }
  }
}

resource "snowflake_role" "dw_role" {
  provider = snowflake.role
  name     = var.role
}

resource "snowflake_schema_grant" "grant_write" {
  provider      = snowflake.role
  for_each      = var.write_schema
  database_name = var.db
  schema_name   = each.value
  privilege     = "CREATE TABLE"
  roles         = [snowflake_role.dw_role.name]
}

resource "snowflake_schema_grant" "grant_read" {
  provider      = snowflake.role
  for_each      = var.read_schema
  database_name = var.db
  schema_name   = each.value
  privilege     = "USAGE"
  roles         = [snowflake_role.dw_role.name]
}
modules/roleschemagrant/main.tf

We are using for_each because a schema grant in Snowflake assigns a privilege on one schema to multiple roles, while in our YAML file we assign multiple schemas to one role. This could be turned around, and in a production setting you might want to define privileges per schema instead of per role; for the purpose of this example, however, this will suffice.

To stitch it all together we will call this module from our main.tf file.

module "provision_roles" {
  source = "./modules/roleschemagrant/"
  providers = {
    snowflake.role = snowflake.security_admin
  }
  for_each     = { for entry in local.db_access_roles : entry.role_name => entry }
  db           = each.value.database
  read_schema  = each.value.read_access
  write_schema = each.value.write_access
  role         = each.value.role_name
  depends_on = [
    snowflake_schema.domain_schemas,
    snowflake_schema.lanidng_schemas
  ]
}
part of main.tf
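Note that the module call passes a snowflake.security_admin provider alias, which never appears in the earlier provider block (that one only aliased SYSADMIN). Assuming this setup, the root main.tf would also need something like the following, since grants are typically managed by the SECURITYADMIN role:

```hcl
# Hypothetical provider alias for role and grant management,
# matching the snowflake.security_admin reference in the module call.
provider "snowflake" {
  role  = "SECURITYADMIN"
  alias = "security_admin"
}
```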

You can apply a similar strategy to every single Snowflake object. And that's all, folks! You can find a working example here.

Conclusion and considerations

The purpose of this article was to showcase how you can create a clear and readable definition of your Snowflake objects and use Terraform to deploy it. There are many other tools, and even other Terraform Snowflake providers for that matter, that may fit your use case better.

As is the norm in the data world, there is no one size that fits all. This is one example of how to provision Snowflake objects with Terraform, not an ultimate guide; your strategy might be different. You can still leverage the concepts described in this post, but I highly recommend tweaking them to fit your situation.