Updated Mar 18 2021

Data Software-as-a-Service: The Case for a Hybrid Deployment Architecture

Lior Gavish

CTO and Co-founder, Monte Carlo. Programming wizard and lover of cats.

Kevin Stumpf

Kevin Stumpf is the co-founder and CTO of Tecton.ai, a leading enterprise feature store for machine learning. Previously, he served on Uber's Machine Learning team where he led development on the Michelangelo Platform.

As founders of companies that build solutions designed to help teams deliver on the promise of data, we knew we wanted to build great products that are easy to deploy and manage for our customers.

We also knew that since we would be integrating with our customers’ data stacks, we would need to offer the highest level of security and compliance. The question was: how are we going to build them? SaaS? On-prem? Something else?

To meet these goals, we chose a hybrid deployment architecture, a new approach that marries on-prem security with SaaS convenience. Here’s why.

Companies ingest and store more data than ever before, and they face increased scrutiny over how that data is accessed and used. As a result, layering any new third-party solution onto your existing data infrastructure comes with its fair share of security concerns.

In fact, in the age of GDPR, CCPA, HIPAA and many other important acronyms, managing complex data systems and keeping sensitive data secure are two distinct problems that don’t have a one-size-fits-all solution.

For data leaders, choosing which problem to address can feel like an exercise in picking your poison.

The traditional on-prem deployment model

The on-prem deployment architecture, by which third-party software runs in the customer’s environment, offers heightened security at the expense of speed and operational efficiency. Image courtesy of authors.

On-premises (on-prem) refers to the traditional deployment model in which the software runs in the customer’s environment, often inside a dedicated VPC. In particular, all data that the service stores or processes remains in the customer’s cloud.

Benefits

While the vendor writes the code, the customer maintains full control and ownership of the data.

On-prem vendors provide customers with a binary and a license key (in the case of licensed software), and the customers manage the deployment. It is the traditional deployment model of choice for countless software products. Examples include MemSQL and Splunk’s early on-prem offerings, which rely on the customer’s engineering and IT teams to handle the deployment. Compared to SaaS solutions, on-prem platforms often provide greater customization and configuration based on the customer’s needs.

For the customer, an obvious benefit of choosing a vendor that leverages an on-prem architecture is rooted in perceived security and compliance. By keeping the data in the customer’s environment, an on-prem architecture exposes no connectivity to external parties. Moreover, the vendor cannot access any sensitive information since all of the data and software is housed in the customer’s cloud.

Challenges

An on-prem deployment model requires that the customer shoulder most of the operational overhead. The customer must troubleshoot disaster recovery situations, such as application outages and data downtime, which can be time-intensive and lead to a subpar experience.

A second limitation of the on-prem model is lack of speed-to-deployment, both of the baseline software and any future product updates. Since the software lives in the customer’s environment, upgrades can be a tedious process that requires extensive permissions and additional resources.

The SaaS model

Software-as-a-Service (SaaS) solutions offer off-the-shelf software hosted in the vendor’s cloud that can be provisioned and used instantaneously by customers. In this model, the software is run and managed by the vendor, with the customer data stored in the vendor’s cloud. The model was pioneered by Salesforce; notable, more recent examples from the data world include Snowflake, Segment, and Chartio.

Benefits

The SaaS model makes it easy for vendors to make updates, roll out new features, and address common pain points at scale, as opposed to pushing changes to individual customers’ environments. For many, this creates a more delightful user experience, often at a lower cost. This also extends to maintenance on the software, which is outsourced to the vendor who knows the software best.

Challenges

When you throw data into the mix, the SaaS model becomes a bit more complicated, particularly as it relates to compliance requirements and data lock-in.

While any self-respecting SaaS provider will encrypt your data at rest, it is still locked away in the vendor’s environment. As a result, many customers are unwilling (or for regulatory reasons, unable) to hand off the management and storage of data entirely.

Even if a customer is comfortable signing away the responsibility for securely storing the data outside of their environment, they still have to accept the fact that data is now fully locked-in and under the vendor’s control.

So, what does it take to get the compliance and flexibility benefits of an on-prem solution with the ease-of-deployment and convenience of a SaaS vendor?

We believe that there’s a better way forward for modern data products: a hybrid architecture.

The hybrid deployment model

Over the past decade, we’ve seen a growing number of software engineering and DevOps teams across industries leverage hybrid cloud architectures to manage infrastructure-as-a-service applications, including New Relic and Atlassian. More recently, many data software vendors have made a similar design decision.

The hybrid architecture model leverages best practices from software engineering and DevOps architectures to combine the security of on-prem with the ease of a SaaS deployment. Image courtesy of authors.

To marry the best of the SaaS world and on-prem world for the modern data stack, buyers of data software should consider solutions that incorporate a hybrid architecture. This approach is composed of two parts: (1) a control plane managed by the vendor, and (2) a data plane in the customer’s environment.

The control plane

The control plane typically hosts the majority of the software’s business logic and handles non-sensitive metadata. It communicates with the data plane and delegates sensitive operations (such as processing, storing, or deleting data) to it. The control plane also provides web and API interfaces, and monitors the health of the data plane. The control plane runs entirely in the vendor’s environment and typically follows a multi-tenant architecture, though some vendors offer a single-tenant control plane (often for a price premium) that runs in a customer-dedicated, completely isolated VPC.
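To make the delegation and health-monitoring roles more concrete, here is a minimal sketch of the vendor-side bookkeeping. It is an illustration under our own assumptions, not how any particular vendor implements it: the `ControlPlane` class, its method names, and the heartbeat timeout are all hypothetical, and a real control plane would persist this state and expose it through an API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical vendor-side state: metadata only, never customer records."""

    heartbeat_timeout_s: float = 60.0
    # Instructions waiting to be picked up, keyed by data plane ID.
    pending: dict[str, list[dict]] = field(default_factory=dict)
    # Timestamp (in seconds) of the last heartbeat from each data plane.
    last_seen: dict[str, float] = field(default_factory=dict)

    def delegate(self, plane_id: str, operation: str, target: str) -> None:
        """Queue a sensitive operation for the data plane instead of running it here."""
        self.pending.setdefault(plane_id, []).append(
            {"operation": operation, "target": target}
        )

    def record_heartbeat(self, plane_id: str, now: float) -> None:
        """Remember when a data plane last reported in."""
        self.last_seen[plane_id] = now

    def unhealthy_planes(self, now: float) -> list[str]:
        """Return data planes whose last heartbeat is older than the timeout."""
        return sorted(
            plane_id
            for plane_id, seen in self.last_seen.items()
            if now - seen > self.heartbeat_timeout_s
        )


control_plane = ControlPlane(heartbeat_timeout_s=60.0)
control_plane.delegate("customer-a", "delete", "table:orders")
control_plane.record_heartbeat("customer-a", now=0.0)
control_plane.record_heartbeat("customer-b", now=100.0)

# At t=120s, customer-a has been silent for 120s and is flagged as stale.
stale = control_plane.unhealthy_planes(now=120.0)
```

Note that the control plane only ever holds the name of the operation and its target, not the data the operation touches.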

The data plane

The data plane typically processes and stores all of the customer’s sensitive data. It must be able to receive instructions from the control plane, and pass back metadata about its operations and health. Technically, the interface between the control and data plane is often implemented by a thin agent that runs in the customer’s environment. Some vendors are even able to skip an agent altogether and fully leverage cross-cloud account IAM roles.
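As a rough illustration of the thin agent described above, the sketch below executes an instruction against local data and hands back only metadata about the result. Everything here is an assumption made for the example: the `Instruction` and `DataPlaneAgent` names, the `count` and `delete` operations, and the in-memory dictionary standing in for the customer's warehouse.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """Hypothetical message sent from the control plane to the agent."""

    operation: str  # "count" or "delete" in this sketch
    table: str


class DataPlaneAgent:
    """Runs in the customer's environment, next to the data."""

    def __init__(self, tables: dict[str, list[dict]]):
        # An in-memory stand-in for the customer's warehouse or lake.
        self._tables = tables

    def handle(self, instruction: Instruction) -> dict:
        """Execute locally and report metadata only, never the rows themselves."""
        rows = self._tables.get(instruction.table)
        if rows is None:
            return {"table": instruction.table, "status": "not_found"}
        if instruction.operation == "count":
            return {"table": instruction.table, "status": "ok", "row_count": len(rows)}
        if instruction.operation == "delete":
            deleted = len(rows)
            rows.clear()
            return {"table": instruction.table, "status": "ok", "rows_deleted": deleted}
        return {"table": instruction.table, "status": "unsupported_operation"}


agent = DataPlaneAgent(
    {
        "orders": [
            {"id": 1, "email": "a@example.com"},
            {"id": 2, "email": "b@example.com"},
        ]
    }
)
count_report = agent.handle(Instruction("count", "orders"))
delete_report = agent.handle(Instruction("delete", "orders"))
after_report = agent.handle(Instruction("count", "orders"))
```

The reports that flow back to the control plane contain counts and statuses, which is what keeps the sensitive records inside the customer's environment.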

At its essence, separating the customer’s data from the managed software gives customers the agility of a SaaS product with the compliance and data ownership of an on-prem solution, all while keeping customer data in the customer’s cloud environment at all times.

Quicker onboarding and time to value

The hybrid architecture enables customers to deploy software quickly and, often, with very little manual overhead.

In turn, this speedy onboarding allows customers to see near-immediate impact from the product and, in the short term, near-immediate value from their data or ML models.

As part of this deployment model, vendors with hybrid solutions typically provide on-call support as a key feature of their product — almost like an embedded SRE team for their customers.

Manage complex infrastructure and sensitive customer data independently

One of the biggest benefits of the hybrid model is that it frees customers from needing to configure or maintain the vendor’s complex infrastructure, while giving customers complete control over their data.

One way to do this is to give the vendor access to the “data plane” through an agent or cross-account roles, extracting information such as metadata, query logs and aggregated statistics. Unlike many SaaS products, no individual records or PII are ever taken out of the customer’s data warehouses, lakes, or BI tools and stored on the vendor’s cloud.
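One way to picture the difference between sharing metadata and sharing records is a function that reduces a table to aggregate statistics before anything leaves the customer's environment. The sketch below is illustrative only: `summarize_table`, its field names, and the choice of statistics (row count, column names, null rates, freshness) are our assumptions, not a description of a specific product.

```python
from datetime import datetime


def summarize_table(name, rows, timestamp_field):
    """Reduce a table to aggregate metadata; no individual field values are returned
    apart from the most recent update timestamp."""
    columns = sorted({column for row in rows for column in row})
    # Fraction of rows in which each column is missing or null.
    null_rate = {
        column: sum(row.get(column) is None for row in rows) / len(rows)
        for column in columns
    }
    timestamps = [
        row[timestamp_field] for row in rows if row.get(timestamp_field) is not None
    ]
    return {
        "table": name,
        "row_count": len(rows),
        "columns": columns,
        "null_rate": null_rate,
        "last_updated": max(timestamps).isoformat() if timestamps else None,
    }


rows = [
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2021, 3, 1)},
    {"id": 2, "email": None, "updated_at": datetime(2021, 3, 2)},
]
summary = summarize_table("users", rows, "updated_at")
```

The resulting summary tells the vendor that half of the `email` values are null and when the table was last updated, without containing a single email address.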

The hybrid approach also facilitates the addition of “knobs” that control the extent of account permissions for the vendor (i.e., the more permissive, the less management on the customer’s side, and vice versa). This gives customers greater agency over data access and security, which is critical for industries such as FinTech and healthcare where sensitive data abounds and the margin of acceptable error is low to non-existent.
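A permission "knob" of this kind can be sketched as an ordered access level that the customer sets and the data plane enforces. The level names, the operations, and the `is_allowed` helper below are hypothetical and exist only to illustrate the idea; in practice the same effect is usually achieved through the cloud provider's own role and policy mechanisms.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical knob settings, from least to most permissive."""

    METADATA_ONLY = 1
    READ_QUERY_LOGS = 2
    MANAGE_RESOURCES = 3


# Minimum level the customer must grant before the vendor may run each operation.
REQUIRED_LEVEL = {
    "read_schema": AccessLevel.METADATA_ONLY,
    "read_query_logs": AccessLevel.READ_QUERY_LOGS,
    "restart_agent": AccessLevel.MANAGE_RESOURCES,
}


def is_allowed(operation: str, granted: AccessLevel) -> bool:
    """Allow an operation only if the granted level covers it."""
    required = REQUIRED_LEVEL.get(operation)
    # Operations that are not explicitly listed are denied by default.
    return required is not None and granted >= required


can_read_schema = is_allowed("read_schema", AccessLevel.METADATA_ONLY)
can_restart = is_allowed("restart_agent", AccessLevel.READ_QUERY_LOGS)
can_read_rows = is_allowed("read_rows", AccessLevel.MANAGE_RESOURCES)
```

Turning the knob up to `MANAGE_RESOURCES` lets the vendor take on more of the maintenance work, while turning it down to `METADATA_ONLY` keeps more of that work, and more control, with the customer.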

Having the vendor manage the service’s compute resources also ensures that any issues with the product can be quickly resolved by the vendor without burdening the customer. A data agent supports this by ensuring that the vendor’s infrastructure can be easily maintained, debugged, and updated without effort or resources from the customer.

Pro-tip for vendors reading this: we recommend getting your SOC 2 certification early. You’ll thank us later. Many enterprises, specifically in industries subject to GDPR, HIPAA, and SOX, will require this before they even consider working with you.

Fast and continuous software upgrades

The hybrid deployment model keeps stable cloud primitives such as S3, EMR, and DynamoDB in the customer’s cloud and all of the ever-improving, fast-moving pieces of infrastructure (i.e., the product itself) in a managed cloud. As a result, customers can integrate new solutions into their data stack more easily, because the cost of trying out and getting started with a hybrid-model vendor is significantly lower than it is for complex on-prem software. Hosting the service in the vendor’s environment also makes rolling out updates for all customers much easier and more seamless than if it were hosted in each customer’s private cloud.

Hosting the service in the vendor’s environment also means that customers get access to new features as soon as they are released, so that innovation and product development do not happen in silos. Customers no longer have to keep track of different software versions or sit through slow upgrade cycles; they can rest assured that they are always using the latest and greatest release, fully automatically.

Hybrid models also give customers flexibility over how they use the product. For instance, they can deploy it across their entire stack or only in a few select data environments, and they can easily add or remove instances of the service as necessary.

Charting the path forward for flexible, secure data stacks

While we did not know each other when we founded Monte Carlo and Tecton, we ended up choosing a similar architecture for our products. This hybrid model ended up being critical in our ability to support data and ML organizations while also getting a seal of approval from security teams.

By leveraging a hybrid SaaS/on-prem architecture, solutions providers can build data products that are easy to deploy, require little to no operational overhead on the part of the customer, facilitate full data ownership, and, perhaps most significantly, ensure the utmost data security and compliance.

At the end of the day, why shouldn’t you have it both ways?

This article was cross-posted on Tecton’s blog.

Interested in learning more? Reach out to Lior Gavish, Co-founder and CTO of Monte Carlo, or Kevin Stumpf, Co-founder and CTO of Tecton.
