👋 We're Knock. We provide a set of simple APIs and a dashboard that developers use to introduce transactional notifications into their products, without having to build and maintain a notification system in-house.

—

Last month we shipped our data warehouse connector. It lets our customers bring the normalized, cross-channel notification engagement data we produce at Knock into their data warehouse to query alongside the rest of their product data.

In this post, we’ll cover why we shipped our data warehouse connector and our decision to use a vendor instead of building in-house.

Why we built our data warehouse connector

Our customers use Knock to power cross-channel notification experiences. A typical customer might send email, in-app, push, and Slack notifications.

When we send messages across those channels, we keep track of both delivery status (did the message reach the user?) and engagement status (did the user interact with it?).

Part of the value we provide to customers is in normalizing this cross-channel data and presenting it back to them in a way that’s easier to understand. Instead of our customers needing to dig into the docs of each of their downstream notification channels to understand what qualifies as a “read” message, we map those statuses back to a standard set of channel-agnostic definitions.
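
As a purely illustrative sketch (the channel-specific statuses and the normalized names below are hypothetical, not Knock’s actual mapping), that normalization boils down to a channel-agnostic vocabulary like this:

```python
# Hypothetical sketch of cross-channel status normalization. The
# channel-specific statuses and normalized names below are illustrative,
# not Knock's actual mapping.

NORMALIZED_STATUS = {
    # (channel, provider-specific status) -> channel-agnostic status
    ("email", "open"):     "read",
    ("email", "click"):    "interacted",
    ("in_app", "seen"):    "seen",
    ("in_app", "clicked"): "interacted",
    ("push", "opened"):    "read",
    ("slack", "ok"):       "delivered",
}

def normalize(channel: str, provider_status: str) -> str:
    """Map a provider-specific status onto one channel-agnostic vocabulary."""
    return NORMALIZED_STATUS.get((channel, provider_status), "unknown")
```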

While this engagement and delivery data is available in the Knock dashboard, our specialty is notification infrastructure, not data visualization. We started receiving requests from customers for custom analyses of their notification data that they couldn’t accomplish in our dashboard.

Manual data exports got the job done in the early days, but we knew that approach wouldn’t scale for long. Then we realized we could make this self-service by bringing the data to our customers.

There are two big benefits of bringing Knock notification data to our customers’ data warehouses.

  1. Our customers can evaluate notification data alongside the rest of their product data. Bringing Knock notification data into the warehouse lets our customers join our data with the rest of their model for a more holistic view of user engagement. It’s not enough to know that a user engaged with a notification; you want to know whether they completed the action associated with it. (A sketch of this kind of query follows this list.)
  2. Our customers can use their existing data visualization tooling. A data warehouse connector enables our customers to analyze our data with the tooling they already use, without us having to build advanced data visualization into our product.
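
To make the first benefit concrete, here’s the kind of query the connector unlocks once Knock data sits next to product data. This is purely illustrative: the table names, column names, and event names are hypothetical, not Knock’s actual schema.

```python
# Illustrative only: table, column, and event names are hypothetical.
# The query joins notification engagement data with a product events
# table to measure whether notified users completed the underlying action.
CONVERSION_QUERY = """
SELECT
    m.workflow_key,
    COUNT(*)          AS notifications_sent,
    COUNT(e.user_id)  AS actions_completed
FROM knock.messages AS m
LEFT JOIN product.events AS e
    ON  e.user_id = m.recipient_id
    AND e.event_name = 'comment_replied'
    AND e.occurred_at BETWEEN m.sent_at AND m.sent_at + INTERVAL '1 day'
GROUP BY m.workflow_key;
-- A production query would also de-duplicate multiple events per message.
"""
```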

What does a data warehouse connector need?

We started our research by looking into what it would take to build this in-house. We identified the following system components that would be needed to push our data into our customers’ data warehouses.

  • Data warehouse connection configuration. A configuration that allows customers to provide their data warehouse credentials and connection details, including the ability to proxy connections over an SSH tunnel for customers who don’t expose their data warehouse to the public internet. We didn’t want to build this ourselves for every data warehouse we’d encounter along the way: Snowflake, Redshift, BigQuery, and so on.
  • Data transformation layer. We knew we’d use our own data warehouse as the source for whatever connector we built or bought. We use dbt to build our models and planned to reuse those models for the connector. An added benefit: since no proprietary customer data or PII makes it into our data warehouse, the connector would be privacy-first by default.
  • Sync mechanism. The beating heart of any data warehouse integration is the system that syncs records from the source to the destination. The key challenge is tracking new, updated, and deleted records efficiently across a large number of rows. The sync must be reliable, must scale across many customers, and must work with each supported destination database. (A minimal sketch follows this list.)
  • Logging and observability. As builders of a dev tool, we treat logging and observability as first-class citizens in our product, and we wanted the same level of insight into our data warehouse syncs as we have into our notifications.
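
To make the sync challenge concrete, here’s a minimal sketch of the cursor-based incremental sync at the core of a build-it-yourself connector. It assumes every synced table has a monotonically increasing updated_at column; the source, destination, and all names here are hypothetical, not an actual implementation.

```python
# Minimal sketch of cursor-based incremental sync. Assumes every synced
# table has a monotonically increasing `updated_at` column; `source`,
# `destination`, and all names here are hypothetical.
from datetime import datetime

def sync_table(source, destination, table: str, cursor: datetime,
               batch_size: int = 10_000) -> datetime:
    """Copy rows changed since `cursor` from source to destination.

    Returns the new cursor to persist for the next run.
    """
    while True:
        # Table names can't be bound as query parameters, hence the f-string.
        rows = source.fetch(
            f"SELECT * FROM {table} WHERE updated_at > %s "
            f"ORDER BY updated_at LIMIT %s",
            (cursor, batch_size),
        )
        if not rows:
            return cursor
        # Upsert keyed on the primary key so retried batches stay idempotent.
        destination.upsert(table, rows, key="id")
        cursor = rows[-1]["updated_at"]
        # A production version must also handle ties on updated_at and
        # deletions (tombstone rows or periodic full reconciliation).
```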

Evaluating our options

We ultimately decided that this wasn’t something we wanted to build in-house: the discrete components alone are complex enough to warrant a dedicated team to maintain them. Additionally, we have a deep belief in outsourcing anything that isn’t part of our core product. WorkOS for SSO, Algolia for search, and LaunchDarkly for feature flagging are good examples.

We looked for options to support this and found two main paths available to us.

Customer-run code using a data integration platform

The first option we considered was to use our existing REST API and provide code that our customers could deploy themselves to execute the sync. Our customers would rely on a data integration platform such as Fivetran or Airbyte as the sync engine, and we would provide the glue code, deployable as a serverless function in their cloud of choice.
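
For a sense of scope, that glue code would look roughly like the sketch below: a scheduled serverless function that pages through our messages API and forwards each page to the integration platform. The pagination parameters, response fields, and helper functions here are simplified assumptions, not a drop-in implementation.

```python
# Rough sketch of customer-deployed glue code. Pagination parameters and
# response fields are assumptions about the API shape, and the platform
# sink is stubbed out.
import json
import os
import requests

KNOCK_API = "https://api.knock.app/v1/messages"
CURSOR_FILE = "/tmp/knock_cursor.json"  # stand-in for durable state

def load_cursor():
    if os.path.exists(CURSOR_FILE):
        return json.load(open(CURSOR_FILE)).get("after")
    return None

def save_cursor(after):
    json.dump({"after": after}, open(CURSOR_FILE, "w"))

def emit_to_platform(entries):
    # Placeholder: a real deployment would forward these records to the
    # data integration platform's ingestion endpoint.
    print(f"emitting {len(entries)} records")

def handler(event, context):
    """Entry point for a scheduled serverless function."""
    headers = {"Authorization": f"Bearer {os.environ['KNOCK_API_KEY']}"}
    after = load_cursor()
    while True:
        params = {"page_size": 50, **({"after": after} if after else {})}
        resp = requests.get(KNOCK_API, headers=headers, params=params)
        resp.raise_for_status()
        page = resp.json()
        emit_to_platform(page.get("entries", []))
        after = (page.get("page_info") or {}).get("after")
        save_cursor(after)
        if not after:
            break
```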

The benefit is that we’d repurpose our existing API as the sync source while shifting the burden of syncing onto a third party, so there would be much less to build. The trade-off is a worse experience for our customers, who would need to deploy and operate custom code and pay for a third-party data integration platform to power the sync.

The other drawback of this approach is that the data we expose over the messages API isn't particularly well suited for powering analytics about notification usage and engagement.

Prequel: data warehouse connections as a service

Then we found Prequel. Prequel manages data warehouse connectors and data synchronization, and exists to solve the very problem we set out above: giving our customers a way to bring their Knock data into their data warehouse.

There were a few things that stood out to us about Prequel’s offering:

  • We could use our data warehouse as a source. This meant we could quickly build models to surface to our customers in their own data warehouses while keeping the privacy-first benefits discussed above. It also keeps read volume off our primary application database.
  • No reliance on a customer-run data integration platform. Prequel connects directly to our customers’ data warehouses, so we don’t have to rely on our customers powering the integration through a third-party platform we have no visibility into.
  • No custom code to manage. With Prequel, we control the sync process and don’t need to ship code that our customers self-host to sync data. The result is a much more seamless experience: our customers set up a connection, and data just starts syncing. ✨

There are, however, always trade-offs in using a third party to power pieces of our product. Fortunately for us, Prequel’s heavy focus on security and reliability gave us the confidence to partner with them.

Building the Knock data connector using Prequel

Prequel made it straightforward to connect Knock data to their system (and on top of that, they were extremely responsive to our questions and quick to help troubleshoot). At a high level, we configured our data as a Prequel source, Prequel provided us with a test destination, and we tested the data flow before opening it up to our own customers.

Here are the steps we took:

  1. First, we set up the tables we wanted to make available to the data warehouse and configured the columns appropriately for consumption (see the Prequel docs on data model configuration). A rough sketch of the kind of table involved follows this list.
  2. We set up this data in Amazon Redshift, which we then added as a “source” in Prequel.
  3. We then used Prequel’s GitHub app to manage our config files, so changes to our tables sync to Prequel automatically.
  4. Once the source was configured, Prequel supplied us with a sandbox Snowflake database that we used to test that our source was properly set up and data was flowing to the destination.
  5. Finally, we used Prequel’s magic link feature to let our first data warehouse customer securely connect their own database. The initial sync of tens of millions of rows completed in less than 20 minutes, and transfers run on a rolling basis after that.
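
To give a sense of what step 1 involved, here is roughly the shape of a normalized table we might expose as a source. Everything below is illustrative; it is not Knock’s actual schema, and Prequel’s real configuration format is defined in their docs.

```python
# Illustrative only: not Knock's actual schema or Prequel's config format.
# A normalized, per-message table exposed from Redshift as a sync source.
MESSAGES_TABLE_DDL = """
CREATE TABLE analytics.messages (
    message_id        VARCHAR(64) NOT NULL,  -- unique id per message
    account_id        VARCHAR(64) NOT NULL,  -- tenant column: scopes each customer's sync
    workflow_key      VARCHAR(255),          -- which notification workflow sent it
    channel_type      VARCHAR(32),           -- email | in_app | push | chat
    delivery_status   VARCHAR(32),           -- normalized: queued | sent | delivered | undelivered
    engagement_status VARCHAR(32),           -- normalized: seen | read | interacted | archived
    inserted_at       TIMESTAMP   NOT NULL,
    updated_at        TIMESTAMP   NOT NULL   -- cursor column for incremental syncs
);
"""
```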

Going forward, there’s virtually no management on our part, and very little overhead when a new customer requests access to the data warehouse. Prequel is a supportive partner with a powerful product.

Conclusion

Overall we’re glad we decided to ship a data warehouse connector and that we used Prequel to build it. It’s enabled us to become a part of how our customers understand their customer lifecycle and engagement, without us needing to invest in building an entire analytics visualization suite into our product.

At Knock we think a lot about build vs. buy decisions, especially for notification systems. If you have any learnings on how you think about building vs. buying, let us know.