Entity Integrity 101: Purpose, Requirements, & Examples

Lindsay MacDonald

Lindsay is a Content Marketing Manager at Monte Carlo.

When designing a database, there is always one question I start with: How do I ensure entity integrity?

The term may sound unfamiliar, but the concept crosses the mind of any engineer when they are writing their database schema. Entity integrity is a key signal that a database is properly designed.

What is Entity Integrity?

Entity integrity is the design principle that every record (row) in a database table is a unique, independently identifiable unit. For example, in a customers table, each row should represent exactly one customer. If the same customer appears in several rows, the data is corrupted: counts, totals, and joins against that table all become unreliable.

Entity integrity keeps a database logically and consistently organized. It is implemented mainly through primary keys, which improve both the accuracy and the performance of the database.

The Role of Primary Keys

Primary keys are the unique identifiers that enforce entity integrity for each record in a table. Any column (or combination of columns) whose values are unique and never NULL can serve as the primary key, but it is most often an ID or other identifying number. For example, in the customers table below, customerID is the best choice for a primary key:

CREATE TABLE customers (
  customerID INTEGER PRIMARY KEY,
  firstName TEXT,
  lastName TEXT,
  email TEXT
);

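By declaring customerID as the primary key, we let the database engine reject any insert that would give two rows the same customerID. Note that this guarantees unique IDs, not unique people: the same customer entered twice under two different IDs would still be accepted, which is why the UNIQUE constraint discussed later is also useful.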

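In PostgreSQL, for example, inserting a second row with an existing customerID fails with an error like this (PostgreSQL folds unquoted identifiers to lowercase, so the column appears as customerid):

ERROR: duplicate key value violates unique constraint "customers_pkey"
DETAIL: Key (customerid)=(1) already exists.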
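To see the constraint fire without setting up a server, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. SQLite is only an assumed stand-in for your own engine, so its error text differs from the PostgreSQL message above, and the customer rows are made up for illustration.

```python
import sqlite3


def try_duplicate_insert() -> str:
    """Insert two rows with the same customerID and return the error SQLite raises."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            """CREATE TABLE customers (
                   customerID INTEGER PRIMARY KEY,
                   firstName TEXT,
                   lastName TEXT,
                   email TEXT
               )"""
        )
        conn.execute(
            "INSERT INTO customers VALUES (1, 'Ada', 'Lovelace', 'ada@example.com')"
        )
        # Same primary key value as the row above, so the engine must reject it.
        conn.execute(
            "INSERT INTO customers VALUES (1, 'Grace', 'Hopper', 'grace@example.com')"
        )
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()
    return "no error"


print(try_duplicate_insert())
```

The second insert raises sqlite3.IntegrityError, so the function returns SQLite's "UNIQUE constraint failed" message instead of "no error".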

How Entity Integrity Speeds Up Your Database

Guaranteeing unique customer records with primary keys also makes it easier to link tables.

For example, if we add an orders table to track customer purchases, each order can reference exactly one customer through customerID. Without entity integrity, the same customerID could match several customer rows, and a single order would be linked to all of them.

CREATE TABLE orders (
  orderID INTEGER PRIMARY KEY,
  customerID INTEGER, -- matches exactly one customer, since customerID is the primary key of customers
  orderDate DATE,
  orderAmount DECIMAL(10, 2),
  FOREIGN KEY (customerID) REFERENCES customers(customerID)
);
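A small sketch of the two tables working together, again assuming sqlite3 as a stand-in engine and using invented rows. SQLite only enforces FOREIGN KEY clauses after `PRAGMA foreign_keys = ON`; with it enabled, an order that points at a nonexistent customer is rejected, and every order joins to exactly one customer row.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(
        """
        CREATE TABLE customers (
            customerID INTEGER PRIMARY KEY,
            firstName TEXT,
            lastName TEXT,
            email TEXT
        );
        CREATE TABLE orders (
            orderID INTEGER PRIMARY KEY,
            customerID INTEGER,
            orderDate DATE,
            orderAmount DECIMAL(10, 2),
            FOREIGN KEY (customerID) REFERENCES customers(customerID)
        );
        INSERT INTO customers VALUES (1, 'Ada', 'Lovelace', 'ada@example.com');
        INSERT INTO customers VALUES (2, 'Grace', 'Hopper', 'grace@example.com');
        INSERT INTO orders VALUES (10, 1, '2024-01-05', 19.99);
        INSERT INTO orders VALUES (11, 1, '2024-02-11', 5.00);
        INSERT INTO orders VALUES (12, 2, '2024-02-12', 42.50);
        """
    )
    return conn


conn = build_db()

# Each order matches exactly one customer, so the join yields one row per order.
rows = conn.execute(
    """SELECT o.orderID, c.firstName
       FROM orders AS o
       JOIN customers AS c ON c.customerID = o.customerID
       ORDER BY o.orderID"""
).fetchall()
print(rows)

# An order for a customer that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, '2024-03-01', 1.00)")
    fk_error = None
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)
print(fk_error)
```

The foreign key itself is a referential integrity rule; it works reliably here because the column it points to is a primary key.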

Primary keys make the JOIN statements linking these two tables not only more organized but also faster. Most relational databases (PostgreSQL, MySQL, and SQL Server among them) automatically create an index on the primary key, so lookups on customerID can use that index instead of scanning every row of the table.
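One way to see the primary key index at work is to ask the query planner how it would run a lookup. The sketch below uses SQLite's EXPLAIN QUERY PLAN as an assumed stand-in (PostgreSQL and MySQL have their own EXPLAIN output). A filter on customerID should be answered with an index SEARCH, while a filter on the unindexed firstName column needs a full SCAN. The exact plan wording varies between SQLite versions, so only those keywords are worth relying on.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query-plan description for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
           customerID INTEGER PRIMARY KEY,
           firstName TEXT,
           lastName TEXT,
           email TEXT
       )"""
)

# Lookup by primary key: SQLite can jump straight to the row.
pk_plan = plan(conn, "SELECT * FROM customers WHERE customerID = 1")
# Lookup by an ordinary column: no index exists, so every row must be read.
name_plan = plan(conn, "SELECT * FROM customers WHERE firstName = 'Ada'")
print(pk_plan)
print(name_plan)
```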

Use UNIQUE to Guarantee More Entity Integrity

In addition to primary keys, the UNIQUE constraint can enforce uniqueness on other columns in the table. Continuing the customers example, if we want to ensure each customer has a unique email, we can add the UNIQUE constraint like so:

CREATE TABLE customers (
  customerID INTEGER PRIMARY KEY,
  firstName TEXT,
  lastName TEXT,
  email TEXT UNIQUE
);
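A minimal sketch of the UNIQUE email rule in action, assuming sqlite3 as the engine and made-up rows. It also shows one difference from a primary key: a UNIQUE column still accepts NULL, and SQLite (like PostgreSQL and MySQL by default) allows several rows to hold NULL. Add NOT NULL as well if every customer must have an email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE customers (
           customerID INTEGER PRIMARY KEY,
           firstName TEXT,
           lastName TEXT,
           email TEXT UNIQUE
       )"""
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'Lovelace', 'ada@example.com')")

# A fresh customerID does not help: the email is already taken, so UNIQUE rejects the row.
try:
    conn.execute("INSERT INTO customers VALUES (2, 'Ada', 'L.', 'ada@example.com')")
    email_error = None
except sqlite3.IntegrityError as exc:
    email_error = str(exc)
print(email_error)

# UNIQUE without NOT NULL lets several rows leave the email empty.
conn.execute("INSERT INTO customers VALUES (3, 'Grace', 'Hopper', NULL)")
conn.execute("INSERT INTO customers VALUES (4, 'Alan', 'Turing', NULL)")
null_emails = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE email IS NULL"
).fetchone()[0]
print(null_emails)
```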

Challenges in Maintaining Entity Integrity

By designing our databases with primary keys and UNIQUE constraints from the beginning, we let the database itself enforce entity integrity from then on. This prevents the data corruption that duplicated IDs would cause.

Problems with entity integrity usually arise only when merging multiple legacy data sources, or ingesting unstructured data, that can't make the same integrity guarantees.

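Hopefully, the only issue you face is conflicting primary keys. In that case, the keys in one table can be shifted and appended to the other table in as few as three steps (MySQL syntax, with table1, table2, id, and name as placeholders). Any other tables that reference the shifted IDs must be updated by the same offset.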

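-- Step 1: Find the maximum ID in table1 (0 if table1 is empty)
SELECT COALESCE(MAX(id), 0) INTO @max_id FROM table1;

-- Step 2: Shift the IDs in table2 past that maximum to avoid conflicts.
-- MySQL checks uniqueness row by row, so update the highest IDs first
-- to avoid a transient duplicate-key error inside table2.
UPDATE table2 SET id = id + @max_id ORDER BY id DESC;

-- Step 3: Merge the tables
INSERT INTO table1 (id, name)
SELECT id, name FROM table2;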
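The offset idea behind these steps can be tried end to end. This sketch assumes two SQLite tables named table1 and table2 with the placeholder columns id and name, filled with made-up rows, and assumes table2's IDs are positive. As a variation on the three steps, it applies the offset inside the INSERT ... SELECT instead of updating table2 in place, which leaves the source table untouched and avoids transient duplicate-key errors during an UPDATE.

```python
import sqlite3


def merge_with_offset(conn: sqlite3.Connection) -> int:
    """Append table2 to table1, shifting table2's ids past table1's current maximum.

    Assumes table2's ids are positive. Returns the offset that was applied.
    """
    # Step 1: highest id in table1; COALESCE covers an empty table1 (MAX returns NULL).
    (max_id,) = conn.execute("SELECT COALESCE(MAX(id), 0) FROM table1").fetchone()
    # Steps 2 and 3 combined: shift ids while copying, so table2 is never modified.
    conn.execute(
        "INSERT INTO table1 (id, name) SELECT id + ?, name FROM table2",
        (max_id,),
    )
    conn.commit()
    return max_id


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table1 VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO table2 VALUES (1, 'delta'), (2, 'epsilon');
    """
)
offset = merge_with_offset(conn)
merged = conn.execute("SELECT id, name FROM table1 ORDER BY id").fetchall()
print(offset, merged)
```

As with the SQL version, anything that referenced table2's old IDs needs the same offset applied.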

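Conflicts on UNIQUE columns are harder to reconcile after the fact. If both sources hold the same email under different IDs, someone has to decide whether those rows describe one customer or two, which usually requires manual intervention to resolve every record.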
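Before merging, it helps to list the rows that collide, or nearly collide, on the UNIQUE column so a person can review them. A minimal sketch, assuming two hypothetical source tables, crm_customers and legacy_customers, that share an email column (table names and rows are invented for illustration). Comparing trimmed, lower-cased emails catches both exact matches, which would break a UNIQUE(email) constraint on merge, and near-matches that differ only in case but probably describe the same person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE crm_customers (customerID INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE legacy_customers (customerID INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO crm_customers VALUES (1, 'Ada Lovelace', 'ada@example.com');
    INSERT INTO crm_customers VALUES (2, 'Grace Hopper', 'grace@example.com');
    INSERT INTO legacy_customers VALUES (1, 'A. Lovelace', 'ADA@example.com');
    INSERT INTO legacy_customers VALUES (2, 'Grace M. Hopper', 'grace@example.com');
    INSERT INTO legacy_customers VALUES (3, 'Alan Turing', 'alan@example.com');
    """
)

# Pair up rows whose normalized emails match; each pair needs a human decision.
conflicts = conn.execute(
    """SELECT c.customerID, c.name, l.customerID, l.name, c.email
       FROM crm_customers AS c
       JOIN legacy_customers AS l
         ON LOWER(TRIM(c.email)) = LOWER(TRIM(l.email))
       ORDER BY c.customerID"""
).fetchall()

for crm_id, crm_name, legacy_id, legacy_name, email in conflicts:
    print(f"review: crm #{crm_id} {crm_name!r} vs legacy #{legacy_id} {legacy_name!r} ({email})")
```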

Worse still, if a large amount of unstructured or historical data needs to be ingested, you will need a much broader, tailored solution.

Fortunately, there are data observability platforms today that can help manage your company’s data from creation to analysis. Monte Carlo can automate checks for entity integrity, ensuring that any issues are detected right away, and guiding you to a quick solution.

Monte Carlo’s Approach to Entity Integrity

Monte Carlo is a data observability platform that provides automated monitoring of all data integrity, including entity integrity.

This makes what is usually a laborious and error-prone process easy, with features such as: 

  • Automated data quality metrics and alerts. 
  • Early detection of data anomalies and inconsistencies.
  • Root cause analysis for rapid troubleshooting.


Frequently Asked Questions

How is entity integrity enforced?

Entity integrity is enforced by declaring a primary key (whose values must be unique and non-NULL) and, where needed, UNIQUE constraints on other columns, so that every record is unique and duplicate records cannot be inserted.

How do you test entity integrity?

You test entity integrity by attempting to insert duplicate primary keys or values in columns with UNIQUE constraints. The database should return an error, indicating that the constraints are working correctly.
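Constraint errors only test what the schema already enforces. For data loaded without constraints, such as a staging copy of a legacy source, a common complementary check is to query directly for duplicate and NULL keys. A minimal sketch, assuming sqlite3 and a hypothetical unconstrained staging table named customers_staging with invented rows:

```python
import sqlite3


def entity_integrity_report(conn: sqlite3.Connection, table: str, key: str) -> dict:
    """Count duplicate and NULL values of `key` in `table`.

    `table` and `key` are interpolated into the SQL text, so pass only trusted identifiers.
    """
    # Key values that appear on more than one row, with how often they appear.
    duplicates = conn.execute(
        f"""SELECT {key}, COUNT(*) FROM {table}
            WHERE {key} IS NOT NULL
            GROUP BY {key}
            HAVING COUNT(*) > 1
            ORDER BY {key}"""
    ).fetchall()
    # Rows that cannot be identified at all because the key is missing.
    (null_keys,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key} IS NULL"
    ).fetchone()
    return {"duplicates": duplicates, "null_keys": null_keys}


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers_staging (customerID INTEGER, email TEXT);  -- no constraints
    INSERT INTO customers_staging VALUES (1, 'ada@example.com');
    INSERT INTO customers_staging VALUES (1, 'ada@example.com');
    INSERT INTO customers_staging VALUES (2, 'grace@example.com');
    INSERT INTO customers_staging VALUES (NULL, 'alan@example.com');
    """
)
report = entity_integrity_report(conn, "customers_staging", "customerID")
print(report)
```

An empty duplicates list and a null_keys count of zero mean the key column would pass a primary key constraint.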

What is the difference between referential integrity and entity integrity?

Entity integrity ensures that each record in a table is unique, typically through the use of primary keys, while referential integrity ensures that relationships between tables remain consistent, typically through the use of foreign keys.

What is an entity integrity constraint?

An entity integrity constraint is the rule that every row in a table must be uniquely identifiable: primary key values must be unique and may never be NULL. It is enforced with primary keys, often supplemented by UNIQUE constraints.

What is the difference between key constraint and entity integrity constraint?

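A key constraint requires that the values of a key (the primary key or another candidate key, typically declared with UNIQUE) be distinct across all rows of a table. The entity integrity constraint adds a further rule for the primary key specifically: its values may never be NULL, so every row can be identified. Relationships between tables are handled by foreign keys, which fall under referential integrity.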

What are entity integrity constraints used for?

Entity integrity constraints are used to maintain the accuracy and consistency of data within a database by ensuring that each record is unique and that no duplicate records exist.