The Ultimate Guide to Domain Integrity in Databases

Lindsay MacDonald

Lindsay is a Content Marketing Manager at Monte Carlo.

Bad data can mislead your business, causing more harm than having no data at all.

The first step in avoiding bad data is ensuring domain integrity. Read on to learn why domain integrity is important, how to successfully implement domain integrity, and best practices for automation.

What is Domain Integrity?

Domain integrity is a set of rules a database enforces so that every value in a column is valid for that column. For example, if I define an INT column in a SQL database, it will only accept whole numbers. This restriction stops me from corrupting the column later by trying to add text data.

Domain integrity is important because it prevents this type of invalid data entry and thereby maintains a reliable database. It is a key part of data integrity alongside entity and referential integrity.

Here is a table to better explain the difference between domain integrity, entity integrity, and referential integrity:

| Type | Definition | Purpose | Enforcement |
| --- | --- | --- | --- |
| Domain integrity | Ensures that all values in a column fall within a defined set of permissible values. | To ensure that the data in each column is accurate and consistent. | Enforced through data types, constraints (CHECK), and rules (NOT NULL). |
| Entity integrity | Ensures that each table has a primary key and that the primary key values are unique and not null. | To ensure that each row in a table can be uniquely identified. | Enforced through PRIMARY KEY constraints, ensuring that no duplicate or null values exist in primary key columns. |
| Referential integrity | Ensures that a foreign key value always points to an existing, valid row in another table. | To maintain logical relationships between tables, preventing orphaned records. | Enforced through FOREIGN KEY constraints, ensuring that foreign key values match primary key values in related tables. |
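
To make the three rows of this table concrete, here is a minimal sketch that triggers one violation of each kind. It uses Python's built-in sqlite3 module as a stand-in for a production database, and the departments and employees tables are hypothetical examples, not part of any real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY
    );
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,                            -- entity integrity
        age INTEGER CHECK (age BETWEEN 18 AND 65),         -- domain integrity
        department_id INTEGER REFERENCES departments(id)   -- referential integrity
    );
    INSERT INTO departments (id) VALUES (1);
    INSERT INTO employees (id, age, department_id) VALUES (1, 30, 1);
""")


def violation(sql):
    """Run one statement and return the error message if it is rejected, else None."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


results = {
    "domain": violation("INSERT INTO employees VALUES (2, 17, 1)"),        # age out of range
    "entity": violation("INSERT INTO employees VALUES (1, 40, 1)"),        # duplicate primary key
    "referential": violation("INSERT INTO employees VALUES (3, 40, 99)"),  # no such department
}
for kind, message in results.items():
    print(f"{kind}: {message}")
```

Each insert is rejected for a different reason, and the one valid row is the only row left in the table. The exact wording of the error messages depends on the database engine and version.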

Choosing the Right Data Type

Implementing domain integrity is a standard part of creating any sort of database table or data pipeline. When creating a database table, each column of data needs to have a type.

The tricky detail is making sure to pick the right types to best fit our data. Below are common data types and their use cases:

| Data type | Use case |
| --- | --- |
| INT | Used for integer values without decimals, suitable for counting, identifiers, and whole numbers. |
| DECIMAL | Used for exact numerical values with fixed precision and scale, suitable for financial calculations and quantities where exact precision is needed. |
| VARCHAR(n) | Used for variable-length character strings, suitable for text fields where the length can vary like names, emails, or descriptions. |
| TEXT | Used for large variable-length character strings, suitable for long text fields like comments, articles, or product descriptions. |
| DATE | Used for date values, suitable for storing dates without time components like birthdays, anniversaries, or deadlines. |
| TIME | Used for time values, suitable for storing times without date components like office hours or appointment times. |
| DATETIME | Used for date and time values, suitable for storing precise moments in time like timestamps for events or logs. |
| BOOLEAN | Used for true/false values, suitable for binary conditions like status flags or feature toggles. |
| BLOB | Used for binary large objects, suitable for storing binary data like images, audio, or video files. |
| JSON | Used for storing JSON formatted data, suitable for flexible, semi-structured data like API responses or configuration settings. |
| UUID | Used for universally unique identifiers, suitable for generating unique keys across tables and databases. |
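
The DECIMAL row is worth a quick illustration. The sketch below is not database code: it uses Python's float and Decimal types as an analogy for what can happen when money is stored in a floating-point column instead of a DECIMAL column. The amounts are arbitrary examples.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly,
# so their sum is not exactly 0.30.
float_total = 0.10 + 0.20

# Decimal keeps exact base-10 digits, much like a DECIMAL(10, 2) column.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True
```

A tiny rounding error like this is harmless in a sensor reading but can add up across thousands of financial transactions, which is why the type choice is part of domain integrity.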

Domain Integrity Constraints

But there is more to domain integrity than just choosing the right type. Constraints can also be added to each field to make sure that it matches certain criteria. Here are some examples of common domain integrity constraints:

| Constraint type | Code snippet | Explanation |
| --- | --- | --- |
| NOT NULL | CREATE TABLE Employees ( name VARCHAR(50) NOT NULL ); | Ensures that the name column cannot have NULL values, enforcing that every employee must have a name. |
| CHECK | CREATE TABLE Employees ( age INT CHECK (age >= 18 AND age <= 65) ); | Ensures that the age column values must be between 18 and 65, enforcing that employees fall within this age range. |
| DEFAULT | CREATE TABLE Employees ( hire_date DATE DEFAULT CURRENT_DATE ); | Ensures that the hire_date column will default to the current date if no value is provided during insertion. |
| UNIQUE | CREATE TABLE Employees ( email VARCHAR(100) UNIQUE ); | Ensures that the email column values must be unique across all rows, preventing duplicate email addresses. |

How to Implement Domain Integrity

Setting up domain constraints is a standard part of creating any table in SQL. For example, here is a simple table in MySQL, where each column is a string of characters:

CREATE TABLE Employees (
    id VARCHAR(50),
    name VARCHAR(50),
    age VARCHAR(50),
    email VARCHAR(100),
    hire_date VARCHAR(50)
);

But any data engineer will quickly see from the column names that VARCHAR is the wrong type for several of these fields. While the CREATE TABLE statement is valid SQL, it enforces almost no domain integrity: nothing stops age from holding a word or hire_date from holding an arbitrary string. Here is a more standard CREATE TABLE statement:

CREATE TABLE Employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT,
    email VARCHAR(100),
    hire_date DATE
);
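
One practical symptom of the all-VARCHAR version is easy to reproduce. The sketch below, which uses Python's built-in sqlite3 module as a stand-in for MySQL and two hypothetical single-column tables, stores the same ids once as text and once as integers and then sorts them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ids_as_text (id VARCHAR(50))")
conn.execute("CREATE TABLE ids_as_int (id INT)")

for value in (2, 10, 33):
    conn.execute("INSERT INTO ids_as_text VALUES (?)", (str(value),))
    conn.execute("INSERT INTO ids_as_int VALUES (?)", (value,))

# Text is compared character by character, so '10' sorts before '2'.
text_order = [row[0] for row in conn.execute("SELECT id FROM ids_as_text ORDER BY id")]
# Integers are compared numerically.
int_order = [row[0] for row in conn.execute("SELECT id FROM ids_as_int ORDER BY id")]

print(text_order)  # ['10', '2', '33']
print(int_order)   # [2, 10, 33]
```

The same mismatch affects range filters and MIN/MAX, so the wrong type can produce wrong answers even when every stored value looks like a number.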

But we can do even better with more specific constraints:

CREATE TABLE Employees (
    id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    age INT CHECK (age >= 18 AND age <= 65),
    email VARCHAR(100) UNIQUE,
    hire_date DATE DEFAULT (CURRENT_DATE)  -- MySQL 8.0.13+ needs parentheses around an expression default
);

On MySQL 8.0.16 or later, where CHECK constraints are actually enforced, this gives our table domain integrity by validating that every value is one we would expect:

  • Every employee has a name.
  • Employees are of a working age.
  • Each employee has their own unique email.
  • All hire dates are valid, and have a default just in case.
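
The four guarantees above can be checked with a short experiment. This is a sketch under stated assumptions: it runs the same column definitions on SQLite through Python's built-in sqlite3 module (not on MySQL, so the default is written without parentheses), and the sample employees are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employees (
        id INT PRIMARY KEY,
        name VARCHAR(50) NOT NULL,
        age INT CHECK (age >= 18 AND age <= 65),
        email VARCHAR(100) UNIQUE,
        hire_date DATE DEFAULT CURRENT_DATE
    )
""")

candidates = [
    (1, "Ada", 36, "ada@example.com"),      # valid
    (2, None, 41, "noname@example.com"),    # violates NOT NULL on name
    (3, "Grace", 17, "grace@example.com"),  # violates CHECK on age
    (4, "Alan", 29, "ada@example.com"),     # violates UNIQUE on email
]

rejected = {}
for row in candidates:
    try:
        conn.execute(
            "INSERT INTO Employees (id, name, age, email) VALUES (?, ?, ?, ?)", row
        )
    except sqlite3.IntegrityError as exc:
        rejected[row[0]] = str(exc)  # remember why this id was refused

stored = conn.execute("SELECT id, name, hire_date FROM Employees").fetchall()
print(stored)    # only the valid row, with hire_date filled in by the default
print(rejected)  # one error message per rejected id
```

Only the first row is stored, and its hire_date is populated even though the insert never supplied one.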

Handling Exceptions and Errors in Domain Integrity

When enforcing domain integrity, it’s important to handle exceptions and errors that may arise. For example, attempting to insert a value that violates a CHECK constraint will result in an error message like this one in MySQL:

ERROR 3819 (HY000): Check constraint 'Employees_chk_1' is violated.

Applications that write to the table should catch these errors and handle them deliberately. Constraints also need continuous monitoring and updating as data requirements evolve, which comes with some challenges, such as:

  • Identifying and handling legacy data that violates domain rules.
  • Performance overhead of constraint checks on large datasets.
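
For the legacy-data challenge, one common approach is to audit existing rows against a rule before adding the constraint. The following is a minimal sketch of that idea, assuming a hypothetical unconstrained Employees table with invented sample rows and using Python's built-in sqlite3 module. The rules dictionary is an illustrative helper, not a feature of any database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A legacy table created before any domain constraints were defined.
conn.execute("CREATE TABLE Employees (id INT PRIMARY KEY, name VARCHAR(50), age INT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "Ada", 36), (2, "Grace", 17), (3, None, 70), (4, "Alan", 29)],
)

# Each planned constraint, written as a predicate that matches INVALID rows.
# These are fixed, trusted strings, so building the query with an f-string is safe here.
rules = {
    "name is missing": "name IS NULL",
    "age outside 18-65": "NOT (age >= 18 AND age <= 65)",
}

violations = {
    label: [
        row[0]
        for row in conn.execute(f"SELECT id FROM Employees WHERE {predicate} ORDER BY id")
    ]
    for label, predicate in rules.items()
}
print(violations)  # {'name is missing': [3], 'age outside 18-65': [2, 3]}
```

Rows flagged this way can be corrected or quarantined first, so that adding the NOT NULL or CHECK constraint afterwards does not fail on existing data.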

Automate Monitoring of Domain Integrity with Monte Carlo

Monitoring domain integrity manually can be labor-intensive and error-prone. Automating this process with a data observability platform like Monte Carlo can help.

Monte Carlo provides automated monitoring of domain integrity, ensuring that data remains accurate and consistent without manual intervention. With a data observability solution like Monte Carlo, data teams get:

  • Automated data quality checks and monitoring
  • Early detection of data anomalies and inconsistencies
  • Improved reliability of data-driven applications

Explore Monte Carlo’s features and request a demo to see how automated domain integrity monitoring can enhance your data reliability efforts.

Frequently Asked Questions

What is domain integrity with an example?

Domain integrity ensures that all values in a database column fall within a defined set of permissible values. For example, if you have a column for ages in a table, you can enforce domain integrity by specifying that the age column only accepts integer values between 18 and 65. This prevents invalid data from being entered into the column.

How do I ensure domain integrity in SQL Server?

You ensure domain integrity in SQL Server by defining appropriate data types for columns and using constraints such as NOT NULL, CHECK, UNIQUE, and DEFAULT. These constraints ensure that the data entered into the columns meets specific criteria, such as non-null values, specific ranges, uniqueness, and default values.

Why is domain integrity important?

Domain integrity is important because it ensures the accuracy and consistency of data within a database. By enforcing rules on the types of data that can be entered into each column, it prevents the entry of invalid data, which can lead to data corruption, errors in data processing, and inaccurate reports. Maintaining domain integrity helps ensure that the data remains reliable and useful for decision-making and analysis.

What is an example of a domain integrity constraint?

An example of a domain integrity constraint is a rule that ensures data in a column meets specific criteria, such as a column that only accepts non-null values, values within a certain range, or unique values. For instance, setting a rule that a column for ages only accepts values between 18 and 65 helps maintain valid data entries.

What is the difference between entity integrity and domain integrity?

Entity integrity ensures that each row in a table can be uniquely identified, typically through the use of primary keys that are unique and not null. Domain integrity, on the other hand, ensures that all values in a column fall within a defined set of permissible values, using data types and constraints. While entity integrity focuses on the uniqueness of rows, domain integrity focuses on the validity and consistency of individual column values.

What are domain integrity constraints?

Domain integrity constraints are rules applied to database columns to ensure that the data entered into those columns is valid and consistent. Examples of domain integrity constraints include: NOT NULL (ensuring a column cannot have null values), CHECK (ensuring values meet specific criteria), UNIQUE (ensuring all values in a column are unique), and DEFAULT (providing a default value if none is specified). These constraints help maintain the quality and reliability of the data within a database.