Updated Jun 24 2024

The Ultimate Guide to Domain Integrity in Databases

Lindsay MacDonald

Lindsay is a Content Marketing Manager at Monte Carlo.

Bad data can mislead your business, causing more harm than having no data at all.

The first step in avoiding bad data is ensuring domain integrity. Read on to learn why domain integrity is important, how to successfully implement domain integrity, and best practices for automation.

What is Domain Integrity?

Domain integrity is a set of rules a database enforces to ensure that the values in each column are valid for that column. For example, an INT column in a SQL database accepts only whole numbers, so an attempt to insert a text string is rejected instead of corrupting the column.

Domain integrity is important because it prevents this type of invalid data entry and thereby maintains a reliable database. It is a key part of data integrity alongside entity and referential integrity.

Here is a table to better explain the difference between domain integrity, entity integrity, and referential integrity:

Type	Definition	Purpose	Enforcement
Domain integrity	Ensures that all values in a column fall within a defined set of permissible values.	To ensure that the data in each column is accurate and consistent.	Enforced through data types, constraints (CHECK), and rules (NOT NULL).
Entity integrity	Ensures that each table has a primary key and that the primary key values are unique and not null.	To ensure that each row in a table can be uniquely identified.	Enforced through PRIMARY KEY constraints, ensuring that no duplicate or null values exist in primary key columns.
Referential integrity	Ensures that a foreign key value always points to an existing, valid row in another table.	To maintain logical relationships between tables, preventing orphaned records.	Enforced through FOREIGN KEY constraints, ensuring that foreign key values match primary key values in related tables.

Choosing the Right Data Type

Implementing domain integrity is a standard part of creating any database table or data pipeline. When creating a database table, each column needs a data type.

The tricky part is picking the types that best fit the data. Below are common data types and their use cases:

Data type	Use case
INT	Used for integer values without decimals, suitable for counting, identifiers, and whole numbers.
DECIMAL	Used for exact numerical values with fixed precision and scale, suitable for financial calculations and quantities where exact precision is needed.
VARCHAR(n)	Used for variable-length character strings, suitable for text fields where the length can vary like names, emails, or descriptions.
TEXT	Used for large variable-length character strings, suitable for long text fields like comments, articles, or product descriptions.
DATE	Used for date values, suitable for storing dates without time components like birthdays, anniversaries, or deadlines.
TIME	Used for time values, suitable for storing times without date components like office hours or appointment times.
DATETIME	Used for date and time values, suitable for storing precise moments in time like timestamps for events or logs.
BOOLEAN	Used for true/false values, suitable for binary conditions like status flags or feature toggles.
BLOB	Used for binary large objects, suitable for storing binary data like images, audio, or video files.
JSON	Used for storing JSON formatted data, suitable for flexible, semi-structured data like API responses or configuration settings.
UUID	Used for universally unique identifiers, suitable for generating unique keys across tables and databases.
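To illustrate why DECIMAL is the usual choice for financial values, here is a minimal sketch in Python. It uses the standard-library decimal module as a stand-in for what a DECIMAL column stores, compared with binary floating point; the variable names are illustrative only.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 exactly, so sums drift.
float_total = 0.10 + 0.20

# Decimal keeps exact base-10 digits, similar to a DECIMAL column.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True
```

The same reasoning applies inside the database: a floating-point column can store a price that is slightly off, while a DECIMAL column with a fixed scale stores the digits exactly.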

Domain Integrity Constraints

But there is more to domain integrity than just choosing the right type. Constraints can also be added to each field to make sure that it matches certain criteria. Here are some examples of common domain integrity constraints:

Constraint type	Code snippet	Explanation
NOT NULL	`CREATE TABLE Employees ( name VARCHAR(50) NOT NULL );`	Ensures that the name column cannot have NULL values, enforcing that every employee must have a name.
CHECK	`CREATE TABLE Employees ( age INT CHECK (age >= 18 AND age <= 65) );`	Ensures that the age column values must be between 18 and 65, enforcing that employees fall within this age range.
DEFAULT	`CREATE TABLE Employees ( hire_date DATE DEFAULT CURRENT_DATE );`	Ensures that the hire_date column will default to the current date if no value is provided during insertion.
UNIQUE	`CREATE TABLE Employees ( email VARCHAR(100) UNIQUE );`	Ensures that the email column values must be unique across all rows, preventing duplicate email addresses.

How to Implement Domain Integrity

Setting up domain constraints is a standard part of creating any table in SQL. For example, here is a simple table in MySQL, where each column is a string of characters:

CREATE TABLE Employees (
    id VARCHAR(50),
    name VARCHAR(50),
    age VARCHAR(50),
    email VARCHAR(100),
    hire_date VARCHAR(50)
);

Any data engineer will quickly see from the column names that VARCHAR is the wrong type for several of these fields. The CREATE TABLE statement is valid SQL, but it enforces almost no domain integrity: nothing stops a value like 'abc' from being stored as an age, or 'tomorrow' as a hire date. Here is a more standard CREATE TABLE statement:

CREATE TABLE Employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT,
    email VARCHAR(100),
    hire_date DATE
);

But we can do even better with more specific constraints:

CREATE TABLE Employees (
    id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    age INT CHECK (age >= 18 AND age <= 65),
    email VARCHAR(100) UNIQUE,
    -- MySQL 8.0.13+ requires parentheses around an expression default
    hire_date DATE DEFAULT (CURRENT_DATE)
);

With these constraints in place, the database itself validates that every value in the table is one we would expect:

Every employee has a name.
Employees are of a working age.
Each employee has their own unique email.
All hire dates are valid, and have a default just in case.
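The four guarantees above can be checked with a small sketch. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, so the error text differs from what MySQL prints. The helper try_insert and the sample rows are hypothetical, and id is declared as INTEGER PRIMARY KEY so that SQLite assigns it automatically.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; MySQL or SQL Server would
# report different error messages for the same violations.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        id INTEGER PRIMARY KEY,
        name VARCHAR(50) NOT NULL,
        age INT CHECK (age >= 18 AND age <= 65),
        email VARCHAR(100) UNIQUE,
        hire_date DATE DEFAULT CURRENT_DATE
    )
""")

def try_insert(name, age, email):
    """Insert one row; return None on success or the constraint error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Employees (name, age, email) VALUES (?, ?, ?)",
                (name, age, email),
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = {
    "valid row": try_insert("Ada", 36, "ada@example.com"),
    "missing name": try_insert(None, 40, "nobody@example.com"),
    "age out of range": try_insert("Tim", 12, "tim@example.com"),
    "duplicate email": try_insert("Ada II", 30, "ada@example.com"),
}
for label, error in results.items():
    status = "accepted" if error is None else f"rejected: {error}"
    print(f"{label}: {status}")

# hire_date was not supplied for the valid row, so the DEFAULT filled it in.
hire_date = conn.execute("SELECT hire_date FROM Employees").fetchone()[0]
print("default hire_date:", hire_date)
```

Only the first row is stored. The other three are rejected by the NOT NULL, CHECK, and UNIQUE constraints respectively, before they can reach the table.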

Handling Exceptions and Errors in Domain Integrity

When enforcing domain integrity, it’s important to handle exceptions and errors that may arise. For example, attempting to insert a value that violates a CHECK constraint will result in an error message like this one in MySQL:

ERROR 3819 (HY000): Check constraint 'Employees_chk_1' is violated.
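One common way to handle such errors in application code is to catch the violation per row and set the offending row aside instead of aborting the whole load. The sketch below is one possible approach, not a prescribed one. It again assumes Python's sqlite3 module as a stand-in for MySQL, and the incoming batch and the quarantined list are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    " name VARCHAR(50) NOT NULL,"
    " age INT CHECK (age >= 18 AND age <= 65))"
)

# Hypothetical incoming batch: two valid rows and two invalid ones.
incoming = [("Ada", 36), ("Tim", 12), (None, 40), ("Grace", 45)]
loaded, quarantined = [], []

for row in incoming:
    try:
        with conn:  # one transaction per row: commit or roll back
            conn.execute("INSERT INTO Employees (name, age) VALUES (?, ?)", row)
        loaded.append(row)
    except sqlite3.IntegrityError as exc:
        # Keep the row and the reason so it can be reviewed or corrected later.
        quarantined.append((row, str(exc)))

print("loaded:", loaded)
for row, reason in quarantined:
    print("quarantined:", row, "->", reason)
```

Committing row by row keeps the example simple. For large loads, batching the valid rows into fewer transactions would usually be faster.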

Constraints are not a one-time setup: they need to be monitored and updated as data requirements evolve. This comes with some challenges, such as:

Identifying and handling legacy data that violates domain rules.
Performance overhead of constraint checks on large datasets.
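For the first challenge, a reasonable starting point is to audit the existing rows before adding a constraint, by querying for the negation of each rule. The following is a minimal sketch under stated assumptions: Python's sqlite3 module stands in for the production database, and the legacy table contents and the audit_rules mapping are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A legacy table created without any constraints (hypothetical sample data).
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, age INT)")
conn.executemany(
    "INSERT INTO Employees (name, age) VALUES (?, ?)",
    [("Ada", 36), ("Tim", 12), (None, 40), ("Grace", 70)],
)

# Each predicate is the negation of a constraint we would like to add.
audit_rules = {
    "name NOT NULL": "name IS NULL",
    "age CHECK 18-65": "age < 18 OR age > 65",
}

violations = {}
for rule, predicate in audit_rules.items():
    rows = conn.execute(
        f"SELECT id, name, age FROM Employees WHERE {predicate} ORDER BY id"
    ).fetchall()
    violations[rule] = rows
    print(f"{rule}: {len(rows)} violating row(s) {rows}")
```

Rows that show up in the audit can then be corrected or archived, after which the constraint can be added without failing on existing data.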

Automate Monitoring of Domain Integrity with Monte Carlo

Monitoring domain integrity manually can be labor-intensive and error-prone. Automating this process with a data observability platform like Monte Carlo can help.

Monte Carlo provides automated monitoring of domain integrity, ensuring that data remains accurate and consistent without manual intervention. With a data observability solution like Monte Carlo, data teams get:

Automated data quality checks and monitoring
Early detection of data anomalies and inconsistencies
Improved reliability of data-driven applications

Explore Monte Carlo’s features and request a demo to see how automated domain integrity monitoring can enhance your data reliability efforts.

Frequently Asked Questions

What is domain integrity with an example?

Domain integrity ensures that all values in a database column fall within a defined set of permissible values. For example, if you have a column for ages in a table, you can enforce domain integrity by specifying that the age column only accepts integer values between 18 and 65. This prevents invalid data from being entered into the column.

How do I ensure domain integrity in SQL Server?

You ensure domain integrity in SQL Server by defining appropriate data types for columns and using constraints such as NOT NULL, CHECK, UNIQUE, and DEFAULT. These constraints ensure that the data entered into the columns meets specific criteria, such as non-null values, specific ranges, uniqueness, and default values.

Why is domain integrity important?

Domain integrity is important because it ensures the accuracy and consistency of data within a database. By enforcing rules on the types of data that can be entered into each column, it prevents the entry of invalid data, which can lead to data corruption, errors in data processing, and inaccurate reports. Maintaining domain integrity helps ensure that the data remains reliable and useful for decision-making and analysis.

What is an example of a domain integrity constraint?

An example of a domain integrity constraint is a rule that ensures data in a column meets specific criteria, such as a column that only accepts non-null values, values within a certain range, or unique values. For instance, setting a rule that a column for ages only accepts values between 18 and 65 helps maintain valid data entries.

What is the difference between entity integrity and domain integrity?

Entity integrity ensures that each row in a table can be uniquely identified, typically through the use of primary keys that are unique and not null. Domain integrity, on the other hand, ensures that all values in a column fall within a defined set of permissible values, using data types and constraints. While entity integrity focuses on the uniqueness of rows, domain integrity focuses on the validity and consistency of individual column values.

What are domain integrity constraints?

Domain integrity constraints are rules applied to database columns to ensure that the data entered into those columns is valid and consistent. Examples of domain integrity constraints include: NOT NULL (ensuring a column cannot have null values), CHECK (ensuring values meet specific criteria), UNIQUE (ensuring all values in a column are unique), and DEFAULT (providing a default value if none is specified). These constraints help maintain the quality and reliability of the data within a database.
