SQL EXCEPT Clause: Finding the Difference Between Two Result Sets

Introduction

The SQL EXCEPT clause is used to find the difference between two result sets. It returns the rows that are present in the first query’s result set but not in the second query’s result set. This is analogous to the set difference operation in relational algebra. It’s a useful tool for comparing datasets and identifying discrepancies.

Understanding the EXCEPT Clause

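EXCEPT compares the two result sets row by row, across every selected column, and returns the distinct rows of the first set that have no match in the second. Both queries must return the same number of columns, with compatible data types in corresponding positions. The final result set contains no duplicate rows.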

Syntax

The basic syntax for using EXCEPT is:

SELECT column_name(s)
FROM tableA
EXCEPT
SELECT column_name(s)
FROM tableB;
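As a minimal sketch of this syntax in action, the snippet below runs an EXCEPT query against an in-memory SQLite database through Python's standard sqlite3 module. The columns id and label and the sample rows are assumptions made up for illustration; only the table names come from the syntax above.

```python
import sqlite3

# In-memory database; the columns and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER, label TEXT);
    CREATE TABLE tableB (id INTEGER, label TEXT);
    INSERT INTO tableA VALUES (1, 'x'), (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO tableB VALUES (3, 'z'), (2, 'other');
""")

# Whole rows are compared: (2, 'y') survives because (2, 'other') differs
# in the label column, and the duplicate (1, 'x') collapses to one row.
rows = conn.execute("""
    SELECT id, label FROM tableA
    EXCEPT
    SELECT id, label FROM tableB
    ORDER BY id
""").fetchall()

print(rows)
conn.close()
```

The ORDER BY is there only to make the output order predictable; EXCEPT itself does not guarantee any row order.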

SQL EXCEPT Example

To illustrate the EXCEPT clause, suppose we have two tables, Employees and Contractors, and we want to find the employees whose names do not appear in the Contractors table.

Employees Table:

EmployeeID | Name    | Department
1          | Alice   | HR
2          | Bob     | IT
3          | Charlie | Finance
4          | David   | IT
5          | Eve     | Marketing

Contractors Table:

ContractorID | Name  | Project
1            | Bob   | ProjectX
2            | Frank | ProjectY
3            | Grace | ProjectZ
4            | Eve   | ProjectA

To find employees who are not contractors, use the following SQL query:

SELECT Name
FROM Employees
EXCEPT
SELECT Name
FROM Contractors;

Output:

Name
Alice
Charlie
David
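To try this example without setting up a database server, the following sketch loads the two sample tables into an in-memory SQLite database using Python's standard sqlite3 module and runs the same query. The column types are an assumption, since the tables above do not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recreate the sample tables; the column types are assumed for illustration.
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Department TEXT);
    CREATE TABLE Contractors (ContractorID INTEGER, Name TEXT, Project TEXT);
    INSERT INTO Employees VALUES
        (1, 'Alice', 'HR'), (2, 'Bob', 'IT'), (3, 'Charlie', 'Finance'),
        (4, 'David', 'IT'), (5, 'Eve', 'Marketing');
    INSERT INTO Contractors VALUES
        (1, 'Bob', 'ProjectX'), (2, 'Frank', 'ProjectY'),
        (3, 'Grace', 'ProjectZ'), (4, 'Eve', 'ProjectA');
""")

# Names present in Employees but absent from Contractors.
names = [row[0] for row in conn.execute("""
    SELECT Name FROM Employees
    EXCEPT
    SELECT Name FROM Contractors
    ORDER BY Name
""")]

print(names)
conn.close()
```

Bob and Eve are excluded because their names also appear in Contractors.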

Retaining Duplicates with EXCEPT ALL

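By default, EXCEPT removes duplicate rows from the result set. To keep them, use EXCEPT ALL, which subtracts row counts instead: a row that appears m times in the first result set and n times in the second appears max(m - n, 0) times in the output. Support varies between databases: PostgreSQL and MySQL 8.0.31 and later accept EXCEPT ALL, for example, while SQL Server and SQLite do not.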

Example with Duplicates:
Assume the Employees table contains a duplicate row for Bob:

Employees Table (with duplicates):

EmployeeID | Name    | Department
1          | Alice   | HR
2          | Bob     | IT
2          | Bob     | IT
3          | Charlie | Finance
4          | David   | IT
5          | Eve     | Marketing

To retain duplicates in the result, use:

SELECT Name
FROM Employees
EXCEPT ALL
SELECT Name
FROM Contractors;

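Output:

Name
Alice
Bob
Charlie
David

Bob is listed twice in Employees but only once in Contractors, so EXCEPT ALL cancels one occurrence and one Bob remains. Eve appears once in each table and is removed entirely. Plain EXCEPT would drop Bob altogether, because it only checks whether a name exists in the second result set.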
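With EXCEPT ALL, each row from the second query cancels at most one matching row from the first. On engines that lack EXCEPT ALL, a common workaround is to number the repeated rows with ROW_NUMBER() and apply plain EXCEPT to the numbered rows. The sketch below tries this on SQLite through Python's sqlite3 module; it assumes SQLite 3.25 or later for window-function support, and the alias occurrence is a name chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample data with the duplicate Bob row; column types are assumed.
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT, Department TEXT);
    CREATE TABLE Contractors (ContractorID INTEGER, Name TEXT, Project TEXT);
    INSERT INTO Employees VALUES
        (1, 'Alice', 'HR'), (2, 'Bob', 'IT'), (2, 'Bob', 'IT'),
        (3, 'Charlie', 'Finance'), (4, 'David', 'IT'), (5, 'Eve', 'Marketing');
    INSERT INTO Contractors VALUES
        (1, 'Bob', 'ProjectX'), (2, 'Frank', 'ProjectY'),
        (3, 'Grace', 'ProjectZ'), (4, 'Eve', 'ProjectA');
""")

# Numbering each repeated name makes the copies distinct: (Bob, 1), (Bob, 2).
# Plain EXCEPT then removes only the occurrences that Contractors also has.
query = """
    SELECT Name FROM (
        SELECT Name,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS occurrence
        FROM Employees
        EXCEPT
        SELECT Name,
               ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Name) AS occurrence
        FROM Contractors
    )
    ORDER BY Name
"""
names = [row[0] for row in conn.execute(query)]

print(names)
conn.close()
```

Only the second Bob row survives the subtraction, so Bob appears once in the result, the same as a native EXCEPT ALL would return for this data.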

Difference Between EXCEPT and NOT IN

Both EXCEPT and NOT IN can be used to find discrepancies between two sets of data, but they have key differences:

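  • Duplicates: EXCEPT removes duplicates from the result set. NOT IN only filters the rows of the outer query, so any duplicates among the rows that pass the filter are kept.
  • NULL handling: EXCEPT treats two NULLs as matching, whereas a NOT IN filter returns no rows at all if its subquery yields a NULL.
  • Support: NOT IN is available in virtually every SQL database. EXCEPT is less universal: MySQL supports it only from version 8.0.31, and Oracle has traditionally spelled the operator MINUS.
  • Performance: Neither is consistently faster. The outcome depends on the database engine, the available indexes, and the size of the datasets, so compare execution plans on your own data.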
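The difference in duplicate handling is easy to see side by side. The sketch below uses a hypothetical variation of the sample data in which Alice is listed twice and Bob is the only contractor, then runs both forms against an in-memory SQLite database via Python's sqlite3 module. The tables are reduced to a single Name column for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: Alice is duplicated, Bob is the only contractor.
conn.executescript("""
    CREATE TABLE Employees (Name TEXT);
    CREATE TABLE Contractors (Name TEXT);
    INSERT INTO Employees VALUES ('Alice'), ('Alice'), ('Bob'), ('Charlie');
    INSERT INTO Contractors VALUES ('Bob');
""")

# NOT IN filters rows one at a time, so both Alice rows pass through.
not_in_names = [row[0] for row in conn.execute("""
    SELECT Name FROM Employees
    WHERE Name NOT IN (SELECT Name FROM Contractors)
    ORDER BY Name
""")]

# EXCEPT returns a set, so Alice appears only once.
except_names = [row[0] for row in conn.execute("""
    SELECT Name FROM Employees
    EXCEPT
    SELECT Name FROM Contractors
    ORDER BY Name
""")]

print(not_in_names)
print(except_names)
conn.close()
```

Both queries exclude Bob; they differ only in whether the repeated Alice row is preserved.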

Conclusion

The EXCEPT clause is a powerful tool for finding differences between two result sets: it returns the rows that are unique to the first set. EXCEPT removes duplicates automatically; when duplicates must be retained, use EXCEPT ALL where it is available, or a NOT IN filter. Check the support and performance characteristics of your SQL database before choosing between these options.