Friday, January 24, 2014

The SQL NULL paradox

Level: 1 where 1 is noob and 5 is totally awesome
System: MS SQL Server or PostgreSQL

The Paradox 


One of the best things about computing, it is so logical. We can always trust in 1 + 1 = 2, 1 < 2 or 1 = 1. So now I'm going to throw 4 queries and their results, as an example:

SELECT CASE WHEN 42 > 42 THEN 'True' ELSE 'False' END; -- Result: False
SELECT CASE WHEN 42 < 42 THEN 'True' ELSE 'False' END; -- Result: False
SELECT CASE WHEN 42 != 42 THEN 'True' ELSE 'False' END;-- Result: False
SELECT CASE WHEN 42 = 42 THEN 'True' ELSE 'False' END; -- Result: True

Hopefully no surprise there. Now I'm going to throw 4 queries more, and their result:

SELECT CASE WHEN NULL > NULL THEN 'True' ELSE 'False' END; -- Result: False
SELECT CASE WHEN NULL < NULL THEN 'True' ELSE 'False' END; -- Result: False
SELECT CASE WHEN NULL != NULL THEN 'True' ELSE 'False' END;-- Result: False
SELECT CASE WHEN NULL = NULL THEN 'True' ELSE 'False' END; -- Result: False

The three first lines, seems logical, but wait a minute:-). How is this possible? If NULL != NULL then NULL = NULL  must be true, else it is illogical. Or is it?

The Explanation


Actually it is not paradox. The reason, some might expect NULL = NULL to be true, is because they misunderstand NULL. They mistake NULL for being a value. If NULL was a value, then NULL = NULL would be true. So if it is not a value, what is it then?

The ANSI SQL-92 Specification says about NULL:
"null value (null): A special value, or mark, that is used to indicate the absence of any data value."   
I can understand the confusion, because have just written, it is not a value. But you have to notice it says "A special value". Absence of data value, is the right way to see NULL,  unknown or missing.

To be able to handle NULL values defined in the SQL-92 specs. Databases servers such as SQL Server and PostgreSQL uses trinary logic and not binary logic. It means, logic operations can have 3 out comes: True, False or NULL.

Because NULL is considered to be unknown, any comparison to NULL, even NULL vs. NULL will have False as result, because it is unknown what we compares. While using any arithmetic operator with NULL, will have NULL as result. The latter makes good sense. Lets say you add something to unknown, then result is unknown.

So in the end, it's all logical. For more read Handling Null Values for SQL Server, or 9.2. Comparison Operators for PostgreSQL. Even better read them both:-)

No comments:

Post a Comment