Suspicious types

There is nothing suspicious in their appearance. Moreover, they even seem to you well and for a long time familiar. But that's only until you check them out. This is where they will show their insidious essence, working in a completely different way than you expected. And sometimes they throw out things that make their hair stand on end - for example, they lose the secret data entrusted to them. When you confront them, they claim they don't know each other, even though they work hard in the shadows under the same umbrella. It's time to finally bring them to clean water. Let's take a look at these suspicious types.

Data typing in PostgreSQL, for all its logic, really sometimes presents very strange surprises. In this article, we will try to clarify some of their quirks, understand the reason for their strange behavior and understand how not to run into problems in everyday practice. To tell the truth, I compiled this article, including as a kind of reference book for myself, a reference book that could be easily referred to in controversial cases. Therefore, it will be updated as new surprises from suspicious types are discovered. So, let's go, O tireless trackers of databases!

Dossier number one. real/double precision/numeric/money

It would seem that numeric types are the least problematic in terms of surprises in behavior. But no matter how. Therefore, we will start with them. So…

Forgot how to count

SELECT 0.1::real = 0.1

?column?
boolean
---------
f

What's the matter? That PostgreSQL casts the untyped constant 0.1 to the double precision type and tries to compare it with 0.1 of the real type. And these are completely different values! The essence of the representation of real numbers in machine memory. Since 0.1 cannot be represented as a finite binary fraction (it would be 0.0(0011) in binary), numbers with different digits will differ, hence the result that they are not equal. Generally speaking, this is a topic for a separate article, I will not write more here.

Where is the error from?

SELECT double precision(1)

ERROR:  syntax error at or near "("
LINE 1: SELECT double precision(1)
                               ^
********** Ошибка **********
ERROR: syntax error at or near "("
SQL-состояниС: 42601
Π‘ΠΈΠΌΠ²ΠΎΠ»: 24

Many people know that PostgreSQL allows a functional notation of type casting. That is, you can write not only 1::int, but also int(1), which will be equivalent. But not for types whose name consists of several words! Therefore, if you want to convert a numeric value to double precision in a functional form, use an alias of this type float8, that is, SELECT float8(1).

What is greater than infinity?

SELECT 'Infinity'::double precision < 'NaN'::double precision

?column?
boolean
---------
t

Wow, how it is! It turns out there is something greater than infinity, and that's NaN! At the same time, the PostgreSQL documentation honestly looks at us and claims that NaN is obviously greater than any other number, and, therefore, infinity. The opposite is also true for -NaN. Hello math lovers! But we must remember that all this operates in the context of real numbers.

Eye rounding

SELECT round('2.5'::double precision)
     , round('2.5'::numeric)

      round      |  round
double precision | numeric
-----------------+---------
2                | 3

Another unexpected hello from the base. Again, remember that double precision and numeric types have different roundings. For numeric, it is normal when 0,5 is rounded up, and for double precision, 0,5 is rounded towards the nearest even integer.

Money is something special

SELECT '10'::money::float8

ERROR:  cannot cast type money to double precision
LINE 1: SELECT '10'::money::float8
                          ^
********** Ошибка **********
ERROR: cannot cast type money to double precision
SQL-состояниС: 42846
Π‘ΠΈΠΌΠ²ΠΎΠ»: 19

According to PostgreSQL, money is not a real number. According to some individuals, too. But we need to remember that casting the money type is possible only to the numeric type, just as only the numeric type can be cast to the money type. But with him you can already play as your heart desires. But it will not be the same money.

Smallint and Sequence Generation

SELECT *
  FROM generate_series(1::smallint, 5::smallint, 1::smallint)

ERROR:  function generate_series(smallint, smallint, smallint) is not unique
LINE 2:   FROM generate_series(1::smallint, 5::smallint, 1::smallint...
               ^
HINT:  Could not choose a best candidate function. You might need to add explicit type casts.
********** Ошибка **********
ERROR: function generate_series(smallint, smallint, smallint) is not unique
SQL-состояниС: 42725
Подсказка: Could not choose a best candidate function. You might need to add explicit type casts.
Π‘ΠΈΠΌΠ²ΠΎΠ»: 18

Does not like PostgreSQL to trifle. What are such sequences based on smallint? int, no less! Therefore, when trying to execute the above query, the base tries to cast smallint to some other integer type, and sees that there can be several such casts. Which cast to choose? She cannot decide this, and therefore falls with an error.

Dossier number two. "char"/char/varchar/text

A number of oddities are also present in character types. Let's get to know them too.

What are these tricks?

SELECT 'ΠŸΠ•Π’Π―'::"char"
     , 'ΠŸΠ•Π’Π―'::"char"::bytea
     , 'ΠŸΠ•Π’Π―'::char
     , 'ΠŸΠ•Π’Π―'::char::bytea

 char  | bytea |    bpchar    | bytea
"char" | bytea | character(1) | bytea
-------+-------+--------------+--------
 ╨     | xd0  | П            | xd09f

What is this "char" type, what is this clown? We don't need such people... Because it pretends to be an ordinary char, even though it's in quotation marks. And it differs from a regular char, which is without quotes, in that it outputs only the first byte of the string representation, while a normal char outputs the first character. In our case, the first character is the letter P, which occupies 2 bytes in the unicode representation, as evidenced by the conversion of the result to the bytea type. And the "char" type takes only the first byte of this unicode representation. Then why is this type needed? The PostgreSQL documentation says that this is a special type used for special needs. So we probably don't need it. But look him in the eye and make no mistake when you meet him with his special demeanor.

Extra spaces. Out of sight, out of mind

SELECT 'abc   '::char(6)::bytea
     , 'abc   '::char(6)::varchar(6)::bytea
     , 'abc   '::varchar(6)::bytea

     bytea     |   bytea  |     bytea
     bytea     |   bytea  |     bytea
---------------+----------+----------------
x616263202020 | x616263 | x616263202020

Take a look at the example provided. I specifically brought all the results to the bytea type, so that you can clearly see what lies there. Where are the trailing spaces after the cast to varchar(6)? The documentation succinctly states: "When casting a character value to another character type, padding spaces are discarded." This dislike must be remembered. And note that if a quoted string constant is immediately cast to varchar(6), trailing spaces are preserved. Such are miracles.

Dossier number three. json/jsonb

JSON is a separate structure that lives its own life. Therefore, its entities and PostgreSQL entities are slightly different. Here are examples.

Johnson and Johnson. feel the difference

SELECT 'null'::jsonb IS NULL

?column?
boolean
---------
f

The thing is that JSON has its own null entity, which is not analogous to NULL in PostgreSQL. At the same time, the JSON object itself could very well be NULL, so SELECT null::jsonb IS NULL (note the lack of single quotes) will return true this time.

One letter changes everything

SELECT '{"1": [1, 2, 3], "2": [4, 5, 6], "1": [7, 8, 9]}'::json

                     json
                     json
------------------------------------------------
{"1": [1, 2, 3], "2": [4, 5, 6], "1": [7, 8, 9]}

---

SELECT '{"1": [1, 2, 3], "2": [4, 5, 6], "1": [7, 8, 9]}'::jsonb

             jsonb
             jsonb
--------------------------------
{"1": [7, 8, 9], "2": [4, 5, 6]}

The thing is that json and jsonb are completely different structures. In json, the object is stored as is, and in jsonb it is already stored as a parsed indexed structure. That is why in the second case the value of the object by key 1 was changed from [1, 2, 3] to [7, 8, 9], which came to the structure at the very end with the same key.

Do not drink water from the face

SELECT '{"reading": 1.230e-5}'::jsonb
     , '{"reading": 1.230e-5}'::json

          jsonb         |         json
          jsonb         |         json
------------------------+----------------------
{"reading": 0.00001230} | {"reading": 1.230e-5}

PostgreSQL in the implementation of JSONB changes the formatting of real numbers, bringing them to the classical form. This does not happen for the JSON type. Weird a little, but its right.

Dossier number four. date/time/timestamp

There are some oddities with date/time types too. Let's look at them. I’ll make a reservation right away that some of the behavioral features become clear if you understand the essence of working with time zones well. But this is also a topic for a separate article.

My yours don't understand

SELECT '08-Jan-99'::date

ERROR:  date/time field value out of range: "08-Jan-99"
LINE 1: SELECT '08-Jan-99'::date
               ^
HINT:  Perhaps you need a different "datestyle" setting.
********** Ошибка **********
ERROR: date/time field value out of range: "08-Jan-99"
SQL-состояниС: 22008
Подсказка: Perhaps you need a different "datestyle" setting.
Π‘ΠΈΠΌΠ²ΠΎΠ»: 8

It would seem that there is something incomprehensible? But still, the base does not understand what we put in the first place here - a year or a day? And decides it's January 99, 2008, which blows her mind. Generally speaking, in the case of transferring dates in text format, you need to carefully check how correctly the database recognized them (in particular, parse the datestyle parameter with the SHOW datestyle command), since ambiguities in this matter can be very expensive.

Where did you come from?

SELECT '04:05 Europe/Moscow'::time

ERROR:  invalid input syntax for type time: "04:05 Europe/Moscow"
LINE 1: SELECT '04:05 Europe/Moscow'::time
               ^
********** Ошибка **********
ERROR: invalid input syntax for type time: "04:05 Europe/Moscow"
SQL-состояниС: 22007
Π‘ΠΈΠΌΠ²ΠΎΠ»: 8

Why can't the base understand the explicitly specified time? Because the time zone does not have an abbreviation, but a full name, which makes sense only in the context of a date, since it takes into account the history of time zone changes, and it does not work without a date. And the very wording of the time string raises questions - what did the programmer really mean? Therefore, everything is logical here, if you understand.

What's wrong with him?

Imagine the situation. You have a field in the table with the timestamptz type. You want to index it. But you understand that building an index on this field is not always justified due to its high selectivity (almost all values ​​of this type will be unique). So you decide to reduce the selectivity of the index by casting this type to a date. And you get a surprise:

CREATE INDEX "iIdent-DateLastUpdate"
  ON public."Ident" USING btree
  (("DTLastUpdate"::date));

ERROR:  functions in index expression must be marked IMMUTABLE
********** Ошибка **********
ERROR: functions in index expression must be marked IMMUTABLE
SQL-состояниС: 42P17

What's the matter? The fact that to cast the timestamptz type to the date type, the value of the TimeZone system parameter is used, which makes the type conversion function dependent on the custom parameter, i.e. volatile. Such functions are not allowed in the index. In this case, you must explicitly indicate in which time zone the type conversion is performed.

When now is not even now at all

We are used to the fact that now() returns the current date / time, taking into account the time zone. But look at the following queries:

START TRANSACTION;
SELECT now();

            now
  timestamp with time zone
-----------------------------
2019-11-26 13:13:04.271419+03

...

SELECT now();

            now
  timestamp with time zone
-----------------------------
2019-11-26 13:13:04.271419+03

...

SELECT now();

            now
  timestamp with time zone
-----------------------------
2019-11-26 13:13:04.271419+03

COMMIT;

The date/time is returned the same no matter how much time has passed since the previous request! What's the matter? The fact that now() is not the current time, but the start time of the current transaction. Therefore, within the framework of the transaction, it does not change. Any query that is launched outside of a transaction is implicitly wrapped in a transaction, which is why we don't notice that the time given by a simple SELECT now(); actually not the current one... If you want to get an honest current time, you need to use the clock_timestamp() function.

Dossier number five. bit

Strange a little bit

SELECT '111'::bit(4)

 bit
bit(4)
------
1110

On which side should the bits be added in case of type extension? Seems to be on the left. But only the base has a different opinion on this matter. Be careful: if the number of digits does not match, you will get something completely different from what you wanted when casting the type. This applies to both adding bits to the right and cutting bits. Right too...

Dossier number six. Arrays

Not even NULL fired

SELECT ARRAY[1, 2] || NULL

?column?
integer[]
---------
{1,2}

As normal people raised in SQL, we expect the result of this expression to be NULL. But it was not there. An array is returned. Why? Because in this case, the base casts NULL to an integer array and implicitly calls the array_cat function. But it still remains unclear why this "massive cat" does not reset the array to zero. This behavior also needs to be remembered.

Summarize. There are enough oddities. Most of them, of course, are not so critical as to speak of flagrantly inadequate behavior. And others are explained by the convenience of use or the frequency of their applicability in certain situations. But at the same time, there are many surprises. Therefore, you need to know about them. If you find something else strange or unusual in the behavior of any types, write in the comments, I will gladly supplement the dossiers available on them.

Source: habr.com

Add a comment