- James O'Doherty
- Jan 19, 2023
- 2 min read
Updated: Feb 16, 2023

The theoretical foundations of relational databases are based on relational algebra. Relational algebra is also the basis for SQL which was the first language to use Codd’s relational model.
SQL uses set operators. Set operations can be illustrated using Venn diagrams.
Let’s look at different SQL constructs using two very simple sets. I like travel so we’ll use two sets – flights (pink) and destinations (blue). The flights dataset contains all the flights leaving from Heathrow Airport in London and the destinations dataset contains all the possible destination airports in the world.
[Heathrow has 1300 flights per day and handles 80.1 million passengers per year. It’s the busiest two runway airport in the world.]
Before we examine actual SQL constructs, let’s first look at different types of sets. A set is a collection of elements. In database terms, it’s a collection of rows.

There are two more kinds of datasets. Disjoint sets are sets that have no elements in common.
Disjoint Sets
In our travel database, we also have a dataset of ferry crossings from Calais. Logically, ferry crossings and flights have nothing in common.

​SELECT * FROM flights | ​SELECT * FROM ferry crossings |
Where and Having
SQL has two clauses that are used for filtering – WHERE and HAVING. WHERE is use to filter rows based on some criteria and HAVING filters a dataset after aggregation. Both clauses return a subset.
SELECT * FROM flights WHERE airline = 'British Airways' | ​SELECT * COUNT(*) AS number_of_flights FROM flightsGROUP BYairline HAVING COUNT(*)>5 |
Subset
A subset is a contained within another set. In our Destinations dataset, we get a subset by selecting all destinations in Australia.

​​SELECT * FROM destinations WHERE country = ‘Australia’ |
Empty set
There is a special set in set theory – the empty set. Now because our Flights dataset only includes airlines flying from Heathrow, it easy to get an empty set.

SELECT+ FROM flights WHERE airline = 'Easyjet' |
Now that we’ve looked at the special datasets, we turn to the set operators – intersection, union and cartesian products.
Intersection
The intersection of two sets results in a subset of common elements. SQL uses joins to link two or more sets. The INNER JOIN returns the intersection of two sets. It returns all rows that match the value of the common fields specified on the join.

​SELECT * FROM flights F INNER JOIN destinations D ON F.destination_airport = D.arrival_airport WHERE D.country = ‘Germany’ |
The INTERSECT operator return distinct row in both datasets.
​SELECT destination FROM Flights INTERSECT SELECT destination FROM Destinations; |
Outer Join
An outer join combines the selected columns from the two sets for every combination of rows that satisfy the join predicate and adds the rows that are not having a match for the specified join side. There are two types of OUTER JOIN – LEFT AND RIGHT.


SELECT * FROM flights F LEFT OUTER JOIN destinations D ON F.destinations_airport = D.arrival_airport | ​SELECT * FROM flights F RIGHT OUTER JOIN destinations D ON F.destinations_airport = D.arrival_airport |
The EXEPT operator is similar to an OUTER JOIN. It returns distinct rows from the query left of the except operator hat are not found on the right query.
Union
Union combines two datasets into a single set. You can control whether the result sets has duplicate rows with UNION ALL or only distinct rows with UNION.

The basic rule for union is that the number and order of the columns must be the same in both sets.
Cartesian Product
The CROSS JOIN returns a cartesian product of two sets. The first row of set A is combined with each row in set B. The number of rows in the resulting set is the number of rows in set A x number f rows in set B.

​SELECT airline, destination FROM flights CROSS JOIN SELECT airline, pilot_name FROM pilot WHERE airline=’BA’ |