Criticism of the protocol and organizational approaches of Telegram. Part 1, technical: experience of writing a client from scratch - TL, MT

Recently, posts have begun to appear on Habr more and more often about how good Telegram is, how brilliant and experienced the Durov brothers are at building network systems, and so on. At the same time, very few people have really dug into the technical internals - at most they use the fairly simple (and very different from MTProto) JSON-based Bot API, and usually just take on faith all the praise and PR that swirls around the messenger. Almost a year and a half ago, my colleague at NPO Echelon, Vasily (unfortunately, his Habr account was deleted along with his draft), started writing his own Telegram client from scratch in Perl, and later the author of these lines joined in. Why Perl, some will immediately ask? Because such projects already exist in other languages. In fact, that is not the point; it could have been any other language with no ready-made library, so that the author has to go all the way from scratch. Moreover, cryptography is one of those things where you trust, but verify. With a security-focused product, you cannot simply rely on a vendor's ready-made library and blindly believe it (but that is a topic for the second part). At the moment, the library works quite well at the "middle" level (it allows making any API requests).

However, there will not be much cryptography or mathematics in this series of posts. There will be plenty of other technical details and architectural crutches instead (useful also for those who will not write from scratch but will use a library in any language). So, the main goal was to try to implement a client from scratch according to the official documentation. That is, let us suppose that the source code of the official clients is closed (in the second part we will cover in more detail how true that actually is), but that, as in the old days, there is something like an RFC standard: is it possible to write a client from the specification alone, "without peeking" into the source code, be it official (Telegram Desktop, mobile) or unofficial (Telethon)?


Documentation ... is it there? Is it true?..

Fragments of notes for this article began to accumulate last summer. All that time, the documentation on the official site https://core.telegram.org was as of Layer 23, i.e. stuck somewhere in 2014 (remember, back then there weren't even channels yet?). In theory, of course, this should have made it possible to implement a client with the functionality of 2014. Yet even in that state the documentation was, firstly, incomplete, and secondly, contradicted itself in places. A little over a month ago, in September 2019, it was accidentally discovered that the site had a large documentation update, for a completely fresh Layer 105, with a note that everything now had to be re-read. Indeed, many articles were revised, but many remained unchanged. Therefore, when reading the criticism of the documentation below, keep in mind that some of it is no longer relevant, but some of it still very much is. After all, 5 years in the modern world is not just a long time, it is a very long time. Since then (especially if you do not count the geochats that were discarded and resurrected in the meantime), the number of API methods in the scheme has grown from a hundred to more than two hundred and fifty!

Where do you start as a young writer?

It does not matter whether you write from scratch or use ready-made libraries like Telethon for Python or Madeline for PHP - in any case, you will first need to register your application and obtain the parameters api_id and api_hash (those who have worked with the VKontakte API understand immediately), by which the server will identify the application. This is required for legal reasons, but we will talk more about why library authors cannot publish theirs in the second part. You may be satisfied with the test values, although they are very limited - the catch is that you can now register only one application per phone number, so do not rush in headlong.

Now, from a technical point of view, the interesting thing is that after registration we should have received notifications from Telegram about updates to the documentation, the protocol, and so on. That is, one could assume that the site with the docs had simply been abandoned, and that work continued directly with those who had started making clients, because it is easier. But no - nothing of the sort was observed, no information ever arrived.

And if you write from scratch, then actually using the obtained parameters is still a long way off. Although https://core.telegram.org/ talks about them first, in Getting Started, in fact you first have to implement the MTProto protocol - but if you believed the layout according to the OSI model at the end of the general protocol description page, then quite in vain.

In fact, both before MTProto and after it, at several levels at once (as foreign networking folks working in OS kernels say, layer violation), one big, painful and terrible topic will get in the way...

Binary serialization: TL (Type Language) and its scheme, and layers, and many other scary words

This topic, in fact, is the key to Telegram's problems. And there will be many terrible words if you try to delve into it.

So, the scheme. If this word reminds you of, say, JSON Schema, you thought right. The goal is the same: a language for describing a possible set of transmitted data. And that is where the similarity ends. If, from the MTProto protocol page or from the source tree of the official client, we try to open some scheme, we will see something like:

int ? = Int;
long ? = Long;
double ? = Double;
string ? = String;

vector#1cb5c415 {t:Type} # [ t ] = Vector t;

rpc_error#2144ca19 error_code:int error_message:string = RpcError;

rpc_answer_unknown#5e2ad36e = RpcDropAnswer;
rpc_answer_dropped_running#cd78e586 = RpcDropAnswer;
rpc_answer_dropped#a43ad8b7 msg_id:long seq_no:int bytes:int = RpcDropAnswer;

msg_container#73f1f8dc messages:vector<%Message> = MessageContainer;

---functions---

set_client_DH_params#f5045f1f nonce:int128 server_nonce:int128 encrypted_data:bytes = Set_client_DH_params_answer;

ping#7abe77ec ping_id:long = Pong;
ping_delay_disconnect#f3427b8c ping_id:long disconnect_delay:int = Pong;

invokeAfterMsg#cb9f372d msg_id:long query:!X = X;
invokeAfterMsgs#3dc4b4f0 msg_ids:Vector<long> query:!X = X;

account.updateProfile#78515775 flags:# first_name:flags.0?string last_name:flags.1?string about:flags.2?string = User;
account.sendChangePhoneCode#8e57deb flags:# allow_flashcall:flags.0?true phone_number:string current_number:flags.0?Bool = auth.SentCode;

A person who sees this for the first time will intuitively recognize only part of what is written: these are apparently structures (although where is the name, on the left or on the right?), there are fields in them, after which a type follows a colon... probably. Here, in angle brackets, there are probably templates as in C++ (in fact, not quite). And what do all the other symbols mean - question marks, exclamation marks, percent signs, hash signs (which obviously mean different things in different places), sometimes present and sometimes not, hexadecimal numbers - and most importantly, how do you get from this a correct byte stream (one that will not be rejected by the server)? You have to read the documentation (yes, there are links to a JSON version of the schema nearby, but that does not make it any clearer).

We open the Binary Data Serialization page and plunge into a magical world of mushrooms and discrete mathematics, something like a 4th-year calculus course. Alphabet, type, value, combinator, functional combinator, normal form, composite type, polymorphic type... and that is just the first page! Next awaits you TL Language, which, although it already contains an example of a trivial request and response, gives no answer at all for more typical cases, which means you will have to wade through a retelling of mathematics translated from Russian into English across eight more nested pages!

Readers familiar with functional languages and automatic type inference will, of course, have recognized in this description language something much more familiar, even from the example alone, and might say that in principle this is not bad. The objections to that are:

  • yes, the goal sounds good, but alas, it is not achieved
  • education in Russian universities varies even across IT specialties - not everyone has taken the corresponding course
  • finally, as we shall see, in practice it is not required, since only a limited subset of even the TL that was described is actually used

As LeoNerd put it on the #perl channel of the FreeNode IRC network while trying to implement a gateway from Telegram to Matrix (the quote is translated inexactly, from memory):

It feels like someone who was introduced to type theory for the first time, got excited and started trying to play with it, not really caring if it was necessary in practice.

Judge for yourself: if the need for bare types (int, long, etc.) as something elementary raises no questions - in the end they must be implemented manually - then take, for example, the attempt to derive a vector from them. That is, in fact, an array, if you call the resulting things by their proper names.

But first,

Brief description of a subset of the TL syntax for those who don't… read the official documentation

constructor = Type;
myVec ids:Vector<long> = Type;

fixed#abcdef34 id:int = Type2;

fixedVec set:Vector<Type2> = FixedVec;

constructorOne#crc32 field1:int = PolymorType;
constructorTwo#2crc32 field_a:long field_b:Type3 field_c:int = PolymorType;
constructorThree#deadcrc bit_flags_of_what_really_present:# optional_field4:bit_flags_of_what_really_present.1?Type = PolymorType;

an_id#12abcd34 id:int = Type3;
a_null#6789cdef = Type3;

A definition always starts with a constructor, after which, optionally (in practice, always), comes, after the # symbol, the CRC32 of the normalized description string of the given type. Next comes the description of the fields, if any - the type may be empty. It all ends with an equals sign and the name of the type to which the given constructor - that is, in fact, subtype - belongs. The type to the right of the equals sign is polymorphic - that is, several concrete types can correspond to it.

If the definition occurs after the line ---functions---, the syntax remains the same, but the meaning differs: the constructor becomes the name of an RPC function, the fields become its parameters (that is, it remains exactly the same structure as described below, only with that meaning), and the "polymorphic type" becomes the type of the returned result. True, it still remains polymorphic - just defined in the ---types--- section, with this constructor not being taken into account. Overloading of called functions by their arguments, i.e. several functions with the same name but different signatures, as in C++, is for some reason not provided in TL.

Why "constructor" and "polymorphic" if this is not OOP? Well, it may in fact be easier for someone to think about it in OOP terms - a polymorphic type as an abstract class, and constructors as its direct descendant classes, moreover final in the terminology of a number of languages. In reality, of course, there is no real similarity here with overloaded constructor methods in OO programming languages. Since these are just data structures, there are no methods here (although the description of functions and methods further below is quite capable of creating confusion in the head about whether there are, but that is another matter) - you can think of a constructor as the value from which the type being constructed is built when reading a stream of bytes.

How does this happen? The deserializer, which always reads 4 bytes, sees the value 0xcrc32 and understands that next comes field1 of type int, i.e. it reads exactly 4 bytes - and with that, the enclosing field of type PolymorType has been read. It sees 0x2crc32 and understands that two fields follow: first a long, so it reads 8 bytes. And then a complex type again, which is deserialized in the same way. For example, Type3 might be declared in the schema with just two constructors, so next either 0x12abcd34 must appear, after which another 4 bytes of int must be read, or 0x6789cdef, after which there will be nothing. Anything else means an exception must be thrown. In any case, after that we return to reading the 4 bytes of the int field field_c in constructorTwo, and with that we finish reading our PolymorType.
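To make this walkthrough concrete, here is a minimal deserializer sketch in Python for the toy Type3 from the sample schema above (the constructor ids 0x12abcd34 and 0x6789cdef are the made-up values from the example, not real scheme ids):

```python
import struct

AN_ID = 0x12abcd34    # an_id#12abcd34 id:int = Type3
A_NULL = 0x6789cdef   # a_null#6789cdef = Type3

def read_type3(buf, pos):
    """Read one Type3 value; return (parsed value, new position)."""
    (ctor,) = struct.unpack_from('<I', buf, pos)     # every id is 4 LE bytes
    pos += 4
    if ctor == AN_ID:
        (val,) = struct.unpack_from('<i', buf, pos)  # one int field follows
        return {'_': 'an_id', 'id': val}, pos + 4
    if ctor == A_NULL:
        return {'_': 'a_null'}, pos                  # no fields at all
    raise ValueError('unknown Type3 constructor %#x' % ctor)

# a stream containing an_id with id = 7
stream = struct.pack('<Ii', AN_ID, 7)
value, pos = read_type3(stream, 0)  # -> ({'_': 'an_id', 'id': 7}, 8)
```

Anything other than the two known ids raises an exception, exactly as described above.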

Finally, if 0xdeadcrc was encountered for constructorThree, things get more complicated. Our first field bit_flags_of_what_really_present has type # - in fact, this is just an alias for the type nat, meaning "natural number". That is, in effect, unsigned int - the only case, by the way, where unsigned numbers occur in real schemes. Next comes a construction with a question mark, meaning a conditional field: it will be present on the wire only if the corresponding bit is set in the field it references (roughly like a ternary operator). So, suppose that bit was set; then a field of type Type must be read, which in our example has 2 constructors. One is empty (consists only of the identifier), the other has a field ids of type ids:Vector<long>.
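A hedged sketch of how such a conditional field is read in practice (the field names mirror the toy constructorThree above; for simplicity the optional field is treated as a plain int rather than a full two-constructor Type):

```python
import struct

def read_constructor_three_body(buf, pos):
    """Read constructorThree's fields after its id has already been consumed."""
    (flags,) = struct.unpack_from('<I', buf, pos)  # the '#' (nat) field
    pos += 4
    result = {'flags': flags}
    if flags & (1 << 1):  # optional_field4:flags.1?... - bit 1 gates the field
        (val,) = struct.unpack_from('<i', buf, pos)
        result['optional_field4'] = val
        pos += 4
    # if the bit is clear, nothing at all is on the wire for that field
    return result, pos
```

With bit 1 set, 8 bytes are consumed; with it clear, only the 4-byte flags word.
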

You might think that this is all fine, like templates in C++ or generics in Java. But no. Almost. This is the only case of angle brackets in real schemes, and it is used ONLY for Vector. In the byte stream, this will be 4 bytes of CRC32 for the Vector type itself (always the same), then 4 bytes for the number of array elements, and then the elements themselves.
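On the wire it looks like this (a sketch; 0x1cb5c415 is the Vector constructor id from the scheme, and the elements here are bare ints):

```python
import struct

VECTOR_ID = 0x1cb5c415  # vector#1cb5c415 {t:Type} # [ t ] = Vector t

def write_vector_of_int(items):
    out = struct.pack('<I', VECTOR_ID)    # 4 bytes: the Vector constructor id
    out += struct.pack('<i', len(items))  # 4 bytes: number of elements
    for n in items:
        out += struct.pack('<i', n)       # then the bare elements themselves
    return out

data = write_vector_of_int([1, 2, 3])
# 4 (id) + 4 (count) + 3 * 4 (elements) = 20 bytes
```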

Add to this the fact that serialization always happens in 4-byte words, all types being multiples of it - the built-in types bytes and string are also described, with manual serialization of the length and this alignment to 4 - well, that seems to sound normal and even relatively efficient? Although TL is claimed to be an efficient binary serialization - to hell with it, even with the expansion of everything, including boolean values and single-character strings, to 4 bytes, JSON is still much fatter, right? Look, even unneeded fields can be skipped via bit flags, everything is just fine, and even extensible for the future - you can add new optional fields to a constructor later, right?..
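For example, the documented string/bytes encoding (one length byte for short payloads, a 0xfe marker plus 3 length bytes for long ones, then padding to a 4-byte boundary) can be sketched like this:

```python
def write_tl_string(data):
    """Serialize bytes the TL way: short form for < 254 bytes, long form after."""
    if len(data) < 254:
        out = bytes([len(data)]) + data                         # 1 length byte + payload
    else:
        out = b'\xfe' + len(data).to_bytes(3, 'little') + data  # 0xfe + 3-byte length
    out += b'\x00' * (-len(out) % 4)                            # pad to a multiple of 4
    return out

# a single-character string still occupies a whole 4-byte word:
assert len(write_tl_string(b'a')) == 4
```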

But no - not if you read not my brief description but the full documentation, and think about the implementation. Firstly, the constructor's CRC32 is calculated over the normalized text description string of the schema line (with extra whitespace removed, etc.) - so if a new field is added, the type's description string changes, and hence its CRC32 and, consequently, the serialization. And what would an old client do if it received a field with new flags set, not knowing what to do with them next?..
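The effect is easy to demonstrate (the constructor line here is made up; the point is only that any edit to the description string shifts the CRC32, and with it the on-wire id):

```python
import zlib

old = "example_ctor field1:int = ExampleType"
new = "example_ctor flags:# field1:int field2:flags.0?string = ExampleType"

# the id is the CRC32 of the normalized description line, so adding a field
# (even an optional one behind a flag bit) produces a different id:
assert zlib.crc32(old.encode()) != zlib.crc32(new.encode())
```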

Secondly, let us recall CRC32, which is used here essentially as a hash function to uniquely determine what type is being (de)serialized. Here we run into the problem of collisions - and no, the probability is not 1 in 2^32, but much greater. Who remembers that CRC32 is designed to detect (and correct) errors in a communication channel, and improves exactly those properties at the expense of others? For example, it does not care about a permutation of bytes: if you compute CRC32 of two strings, where the second has its first 4 bytes swapped with the next 4 bytes, it will be the same. When the input is text strings from the Latin alphabet (plus a little punctuation), and the names are not particularly random, the probability of such a permutation is greatly increased.

By the way, who has checked whether it is really CRC32? In one of the early source trees (before Valtman) there was a hash function that multiplied each character by the number 239, so beloved by these people, ha-ha!

Finally, okay, we accept that constructors with fields of type Vector<int> and Vector<PolymorType> will have different CRC32s. But what about the representation on the wire? In terms of the theory, does it become part of the type? Say we pass an array of ten thousand numbers; with Vector<int> everything is clear: the length and then 40000 bytes. But if it is a Vector<Type2>, which consists of a single int field that is the only one in the type - do we need to repeat 0xabcdef34 ten thousand times and then 4 bytes of int each, or is the language able to DEDUCE it for us from the fixedVec constructor and again transfer only 40000 bytes instead of 80000?
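The back-of-the-envelope arithmetic behind that question:

```python
N = 10_000

bare = N * 4          # Vector<int>: each element is just its 4 payload bytes
boxed = N * (4 + 4)   # Vector<Type2>: each element drags its 4-byte constructor
                      # id (0xabcdef34) in front of its 4-byte int

assert (bare, boxed) == (40_000, 80_000)
```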

This is by no means an idle theoretical question - imagine you get a list of group users, each of whom has an id, a first name and a last name: the difference in the amount of data transferred over a mobile connection can be significant. And it is precisely the efficiency of Telegram's serialization that is advertised to us.

So…

Vector, which could not be deduced

If you try to wade through the pages describing combinators and the like, you will see that a vector (and even a matrix) is formally being derived via tuples over several sheets of text. But in the end the final step is skipped, and a definition of a vector is simply given - one not yet tied to a type. What is the matter here? In programming languages, especially functional ones, it is quite typical to describe structures recursively - a compiler with lazy evaluation will understand and do everything itself. But in a data serialization language EFFICIENCY is needed: it is enough to simply describe a list, i.e. a structure of two elements - the first a data element, the second the same structure again or an empty spot for the tail (a cons pair in Lisp). But that would obviously require spending an additional 4 bytes (the CRC32, in TL's case) per element to describe its type. An array of fixed size is also easy to describe, but with an array of previously unknown length we break off.

Therefore, since TL does not allow deriving a vector, it had to be added on the side. Ultimately the documentation says:

Serialization always uses the same constructor “vector” (const 0x1cb5c415 = crc32(“vector t:Type # [ t ] = Vector t”) that is not dependent on the specific value of the variable of type t.

The value of the optional parameter t is not involved in the serialization since it is derived from the result type (always known prior to deserialization).

Take a closer look: vector {t:Type} # [ t ] = Vector t - nowhere does the definition itself say that the first number must equal the length of the vector! And it does not follow from anywhere. It is a given that you must keep in mind and implement by hand. Elsewhere, the documentation even honestly mentions that the type is fake:

The Vector t polymorphic pseudotype is a “type” whose value is a sequence of values ​​of any type t, either boxed or bare.

… but does not dwell on it. When you, tired of wading through expanses of mathematics (perhaps even familiar to you from a university course), decide to give up and look at how to actually work with it in practice, the impression left in your head is: Serious Mathematics lies at the basis here, obviously by Cool People (two mathematicians, ACM prize winners), and not by just anyone. The goal - to show off - has been achieved.

By the way, about that number. Recall that # is a synonym for nat, a natural number:

There are type expressions (typeexpr) and numeric expressions (nat-expr). However, they are defined the same way.

type-expr ::= expr
nat-expr ::= expr

but in the grammar they are described identically, i.e. this difference, again, must be remembered and carried into the implementation by hand.

Well, yes, template types (vector<int>, vector<User>) share a common identifier (#1cb5c415), i.e. if you know that a call is declared as

users.getUsers#d91a548 id:Vector<InputUser> = Vector<User>;

then you are expecting not just a vector, but specifically a vector of users. More precisely, you should expect one - in real code each element, if not a bare type, will have a constructor, and a proper implementation would have to check: were we indeed sent exactly that type in every element of this vector? And what if it was some kind of PHP, where an array can hold different types in different elements?

At this point you start to wonder: is such a TL needed at all? Maybe Telegram could have used a normal, human serializer - the same protobuf, which already existed back then? That was the theory; now let us look at the practice.

Existing TL implementations in code

TL was born in the bowels of VKontakte, even before the well-known events with the sale of Durov's stake and (surely) even before the development of Telegram. And in the open-sourced first implementation you can find plenty of amusing crutches. The language itself, by the way, was implemented there more fully than it is now in Telegram. For example, hashes are used in that scheme in earnest (meaning a built-in pseudotype (like vector) with deviant behavior). Or:

Templates are not used now. Instead, the same universal constructors (for example, vector {t:Type} [t] = Vector t) are used.

but let us consider it for the sake of a complete picture, in order to trace, so to speak, the evolution of the Giant of Thought.

#define ZHUKOV_BYTES_HACK

#ifdef ZHUKOV_BYTES_HACK

/* dirty hack for Zhukov request */

Or this beautiful one:

    static const char *reserved_words_polymorhic[] = {
      "alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta", NULL
    };

This fragment is about templates, like:

intHash {alpha:Type} vector<coupleInt<alpha>> = IntHash<alpha>;

This is the definition of the hashmap template type, as a vector of int - Type pairs. In C++ it would look something like this:

    template <typename T> class IntHash {
      vector<pair<int, T>> _map;
    };

so here alpha is a keyword! Except that where C++ lets you write T, here you must write alpha, beta... And no more than 8 parameters - the fantasy ran out at theta. So it seems that once upon a time in St. Petersburg there were dialogues roughly like these:

-- We need to add templates to TL
-- Ugh... Well, let the parameters be called alpha, beta,... What other letters are there... Oh, theta!
-- Grammar? Eh, we'll write it later

-- Look at the syntax I came up with for templates and the vector!
-- Are you nuts, how are we going to parse that?
-- Relax, it's the only one in the scheme, just hardcode it and we're done

But that was about the first published implementation of TL "in general". Let us move on to the implementations in the actual Telegram clients.

Over to Vasily:

Vasily, [09.10.18 17:07] What burns most of all is that they piled up a bunch of abstractions, then stopped caring about them, and covered the code generator with crutches
As a result, first pilot.jpg from the docs
Then jekichan.webp from the code

Of course, from people familiar with algorithms and mathematics we might expect that they have read Aho and Ullman and are familiar with the tools that have been the de facto industry standard for writing DSL compilers for decades, right?..

The author of telegram-cli is Vitaly Valtman, as can be gathered from the occurrence of the TLO format outside its (cli's) bounds - a member of the team; the library for parsing TL has now been split out separately. What impression does its TL parser make?..

16.12 04:18 Vasily: in my opinion, someone has not mastered lex + yacc
16.12 04:18 Vasily: otherwise I can't explain it
16.12 04:18 Vasily: well, or they were paid for the number of lines in VK
16.12 04:19 Vasily: 3k+ lines of some <censored> instead of a parser

Maybe that is an exception? Let us see how the OFFICIAL client, Telegram Desktop, does it:

    nametype = re.match(r'([a-zA-Z\.0-9_]+)(#[0-9a-f]+)?([^=]*)=\s*([a-zA-Z\.<>0-9_]+);', line);
    if (not nametype):
      if (not re.match(r'vector#1cb5c415 \{t:Type\} # \[ t \] = Vector t;', line)):
         print('Bad line found: ' + line);

1100+ lines of Python, a couple of regular expressions + special cases for the vector type, which, of course, is declared in the scheme as the TL syntax requires, but is matched against this hardcoded special case instead... The question is, why bother with all this miracle-monster of layers if no one is going to parse it according to the documentation anyway?!

By the way... Remember we talked about the CRC32 check? So, the Telegram Desktop code generator contains a list of exceptions for those types whose computed CRC32 does not match the one indicated in the scheme!

Vasily, [18.12 22:49] and here it is worth thinking about whether such a TL is needed at all
if I wanted to mess with alternative implementations, I would start inserting line breaks, half the parsers would break on multi-line definitions
tdesktop's, incidentally, too

Remember the point about one-liners, we will return to it a little later.

Okay, telegram-cli is unofficial and Telegram Desktop is official, but what about the others? Who knows?.. In the Android client's code there was no schema parser at all (which raises questions about open source, but that is for the second part), but there were several other amusing pieces of code - more about them in a subsection below.

What other questions does serialization raise in practice? For example, they screwed up, of course, with bit fields and conditional fields:

Vasily: flags.0?true
means the field is present and equals true if the flag is set

Vasily: flags.1?int
means the field is present and needs to be deserialized

Vasily: Ass, don't burn, what are you doing!
Vasily: Somewhere in the doc there is a mention that true is a bare type of zero length, but assembling anything from their docs is unrealistic
Vasily: There is no such thing in the open implementations either, but there are plenty of crutches and props

How about Telethon? Looking ahead to the topic of MTProto, an example: there are such pieces in the documentation, but the % sign is only described as "corresponding to the given bare type", i.e. in the examples below there is either an error or something undocumented:

Vasily, [22.06.18 18:38] In one place:

msg_container#73f1f8dc messages:vector message = MessageContainer;

In a different:

msg_container#73f1f8dc messages:vector<%Message> = MessageContainer;

And those are two big differences; in real life, some kind of bare vector arrives

I have not seen bare vector definitions and have not come across them

Parsing in Telethon is written by hand

In their schema the definition of msg_container is commented out

Again, the question about % remains. It is not described.

Vadim Goncharov, [22.06.18 19:22] and in tdesktop?

Vasily, [22.06.18 19:23] But their regex-based TL parser probably won't eat it either

// parsed manually

TL is a beautiful abstraction; no one implements it in full

And there is no % in their version of the scheme

But here the documentation contradicts itself, so who knows

It was found in the grammar; they could simply have forgotten to describe the semantics

Well, you have seen the doc on TL - you cannot make sense of it without half a liter

"Well, suppose," another reader will say, "you criticize everything, so show how it should be done."

Vasily replies: "for a parser, I personally like things like

    args: /* empty */ { $$ = NULL; }
        | args arg { $$ = g_list_append( $1, $2 ); }
        ;

    arg: LC_ID ':' type-term { $$ = tl_arg_new( $1, $3 ); }
            | LC_ID ':' condition '?' type-term { $$ = tl_arg_new_cond( $1, $5, $3 ); free($3); }
            | UC_ID ':' type-term { $$ = tl_arg_new( $1, $3 ); }
            | type-term { $$ = tl_arg_new( "", $1 ); }
            | '[' LC_ID ']' { $$ = tl_arg_new_mult( "", tl_type_new( $2, TYPE_MOD_NONE ) ); }
            ;

somehow more than

struct tree *parse_args4 (void) {
  PARSE_INIT (type_args4);
  struct parse so = save_parse ();
  PARSE_TRY (parse_optional_arg_def);
  if (S) {
    tree_add_child (T, S);
  } else {
    load_parse (so);
  }
  if (LEX_CHAR ('!')) {
    PARSE_ADD (type_exclam);
    EXPECT ("!");
  }
  PARSE_TRY_PES (parse_type_term);
  PARSE_OK;
}

or

        # Regex to match the whole line
        match = re.match(r'''
            ^                  # We want to match from the beginning to the end
            ([\w\.]+)          # The .tl object can contain alpha_name or namespace.alpha_name
            (?:
                \#             # After the name, comes the ID of the object
                ([0-9a-f]+)    # The constructor ID is in hexadecimal form
            )?                 # If no constructor ID was given, CRC32 the 'tl' to determine it

            (?:\s              # After that, we want to match its arguments (name:type)
                {?             # For handling the start of the '{X:Type}' case
                \w+            # The argument name will always be an alpha-only name
                :              # Then comes the separator between name:type
                [\w\d<>#.?!]+  # The type is slightly more complex, since it's alphanumeric and it can
                               # also have Vector<type>, flags:# and flags.0?default, plus :!X as type
                }?             # For handling the end of the '{X:Type}' case
            )*                 # Match 0 or more arguments
            \s                 # Leave a space between the arguments and the equal
            =
            \s                 # Leave another space between the equal and the result
            ([\w\d<>#.?]+)     # The result can again be as complex as any argument type
            ;$                 # Finally, the line should always end with ;
            ''', tl, re.IGNORECASE | re.VERBOSE)

this is the ENTIRE lexer:

    ---functions---         return FUNCTIONS;
    ---types---             return TYPES;
    [a-z][a-zA-Z0-9_]*      yylval.string = strdup(yytext); return LC_ID;
    [A-Z][a-zA-Z0-9_]*      yylval.string = strdup(yytext); return UC_ID;
    [0-9]+                  yylval.number = atoi(yytext); return NUM;
    #[0-9a-fA-F]{1,8}       yylval.number = strtol(yytext+1, NULL, 16); return ID_HASH;

    \n                      /* skip new line */
    [ \t]+                  /* skip spaces */
    \/\/.*$                 /* skip comments */
    \/\*.*\*\/              /* skip comments */
    .                       return (int)yytext[0];

i.e. "simpler" is putting it mildly."

All in all, in the end, a parser and code generator for the actually used subset of TL fit into about 100 lines of grammar and ~300 lines of generator (counting all the prints of the generated code), including type goodies such as type information for introspection in each class. Each polymorphic type turns into an empty abstract base class, and the constructors inherit from it and have methods for serialization and deserialization.
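For illustration only (this is not the author's actual Perl implementation): even a naive parser for the simple one-line subset sketched earlier fits in a couple dozen lines of Python. The regex is a rough approximation that ignores templates, conditional types and multi-line definitions:

```python
import re

# one declaration line: name[#id] field:type ... = ResultType;
LINE = re.compile(
    r'^(?P<name>[\w.]+)'                     # constructor name, maybe namespaced
    r'(?:#(?P<id>[0-9a-f]+))?'               # optional explicit constructor id
    r'(?P<args>(?:\s+[\w.]+:[\w.<>#?!]+)*)'  # zero or more name:type pairs
    r'\s*=\s*(?P<result>[\w.<>]+);$')

def parse_line(line):
    m = LINE.match(line.strip())
    if not m:
        raise ValueError('bad TL line: ' + line)
    return {
        'name': m.group('name'),
        'id': int(m.group('id'), 16) if m.group('id') else None,
        'args': [tuple(a.split(':', 1)) for a in m.group('args').split()],
        'result': m.group('result'),
    }

decl = parse_line('rpc_error#2144ca19 error_code:int error_message:string = RpcError;')
# -> name 'rpc_error', id 0x2144ca19, two name:type args, result 'RpcError'
```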

Lack of types in type language

Strong typing is good, right? No, this is not a holy war (although I prefer dynamic languages), but a postulate within TL. Based on it, the language should provide all sorts of checks for us. Well, okay, maybe not the language itself but the implementation - but then it should at least describe them. And what capabilities would we want?

First of all, constraints. Here we see in the documentation for uploading files:

The file's binary content is then split into parts. All parts must have the same size ( part_size ) and the following conditions must be met:

  • part_size % 1024 = 0 (divisible by 1KB)
  • 524288 % part_size = 0 (512KB must be evenly divisible by part_size)

The last part does not have to satisfy these conditions, provided its size is less than part_size.

Each part should have a sequence number, file_part, with a value ranging from 0 to 2,999.

After the file has been partitioned you need to choose a method for saving it on the server. use upload.saveBigFilePart in case the full size of the file is more than 10 MB and upload.saveFilePart for smaller files.
[…] one of the following data input errors may be returned:

  • FILE_PARTS_INVALID - Invalid number of parts. The value is not between 1..3000

Are any of these present in the schema? Are they somehow expressible by means of TL? No. But excuse me, even old-fashioned Turbo Pascal was able to describe types given by ranges. And it could do one more thing, now better known as enum — a type consisting of an enumeration of a fixed (small) number of values. In languages like C these are numeric. Note that so far we have only been talking about numbers. But there are also arrays, strings... for example, it would be nice to describe in the schema that a given string may contain only a phone number, right?

None of this is in TL. But it is, for example, in JSON Schema. And if someone objects that the divisibility of 512 KB still has to be checked in code, then making sure that the client simply could not send a number outside the range 1..3000 (so that the corresponding error could never arise at all) would have been possible, right?..
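For the sake of argument, the constraints above that TL cannot express fit in a few lines of ordinary client code. A minimal sketch in Python (the function name is mine, not from any real library), encoding the documented upload rules:

```python
def check_upload_params(file_size: int, part_size: int) -> int:
    """Validate the documented file-upload constraints that the TL
    schema cannot express; returns the number of parts."""
    if part_size % 1024 != 0:
        raise ValueError("part_size must be divisible by 1 KB")
    if 524288 % part_size != 0:
        raise ValueError("512 KB must be evenly divisible by part_size")
    parts = (file_size + part_size - 1) // part_size  # ceiling division
    if not 1 <= parts <= 3000:
        raise ValueError("FILE_PARTS_INVALID: parts not in 1..3000")
    return parts

print(check_upload_params(10 * 1024 * 1024, 131072))  # 80 parts
```

Ten lines, and the FILE_PARTS_INVALID error becomes impossible to trigger by construction — exactly the kind of thing a schema language could have generated.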

By the way, about errors and return values. Even those who have worked with TL can overlook it — it did not immediately dawn on us that every function in TL can actually return not only its declared return type but also an error. And this is not deducible from TL itself. Of course, in practice it is understandable and not really needed (although, in fact, RPC can be done in different ways; we will return to this) — but what about the purity of concepts, the heavenly world of the Mathematics of Abstract Types?.. If you claimed the title, live up to it.

And finally, what about readability? Well, ideally I would like descriptions right in the schema (JSON Schema, again, has them), but if that is too much to ask, then what about the practical side — for instance, trivially viewing diffs during updates? See for yourself on real examples:

channelFull#76af5481 flags:# can_view_participants:flags.3?true can_set_username:flags.6?true can_set_stickers:flags.7?true hidden_prehistory:flags.10?true id:int about:string participants_count:flags.0?int admins_count:flags.1?int kicked_count:flags.2?int banned_count:flags.2?int read_inbox_max_id:int read_outbox_max_id:int unread_count:int chat_photo:Photo notify_settings:PeerNotifySettings exported_invite:ExportedChatInvite bot_info:Vector<BotInfo> migrated_from_chat_id:flags.4?int migrated_from_max_id:flags.4?int pinned_msg_id:flags.5?int stickerset:flags.8?StickerSet available_min_id:flags.9?int = ChatFull;
+channelFull#1c87a71a flags:# can_view_participants:flags.3?true can_set_username:flags.6?true can_set_stickers:flags.7?true hidden_prehistory:flags.10?true can_view_stats:flags.12?true id:int about:string participants_count:flags.0?int admins_count:flags.1?int kicked_count:flags.2?int banned_count:flags.2?int online_count:flags.13?int read_inbox_max_id:int read_outbox_max_id:int unread_count:int chat_photo:Photo notify_settings:PeerNotifySettings exported_invite:ExportedChatInvite bot_info:Vector<BotInfo> migrated_from_chat_id:flags.4?int migrated_from_max_id:flags.4?int pinned_msg_id:flags.5?int stickerset:flags.8?StickerSet available_min_id:flags.9?int = ChatFull;

or

message#44f9b43d flags:# out:flags.1?true mentioned:flags.4?true media_unread:flags.5?true silent:flags.13?true post:flags.14?true id:int from_id:flags.8?int to_id:Peer fwd_from:flags.2?MessageFwdHeader via_bot_id:flags.11?int reply_to_msg_id:flags.3?int date:int message:string media:flags.9?MessageMedia reply_markup:flags.6?ReplyMarkup entities:flags.7?Vector<MessageEntity> views:flags.10?int edit_date:flags.15?int post_author:flags.16?string grouped_id:flags.17?long = Message;
+message#44f9b43d flags:# out:flags.1?true mentioned:flags.4?true media_unread:flags.5?true silent:flags.13?true post:flags.14?true from_scheduled:flags.18?true id:int from_id:flags.8?int to_id:Peer fwd_from:flags.2?MessageFwdHeader via_bot_id:flags.11?int reply_to_msg_id:flags.3?int date:int message:string media:flags.9?MessageMedia reply_markup:flags.6?ReplyMarkup entities:flags.7?Vector<MessageEntity> views:flags.10?int edit_date:flags.15?int post_author:flags.16?string grouped_id:flags.17?long = Message;

Some may like it, but GitHub, for example, refuses to highlight changes inside such long lines. A game of "spot 10 differences": what the brain sees immediately is that the beginnings and ends are identical in both examples, and you have to tediously read somewhere in the middle... In my opinion, this is not just a theoretical problem — it purely visually looks dirty and sloppy.
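Since the tooling will not do it for us, a trivial field-by-field comparison of two TL declarations shows the change immediately. A sketch (my own helper, not part of any schema toolchain; the shortened `message` lines below are abridged from the real ones):

```python
def tl_fields(decl: str) -> list:
    """Split a TL constructor declaration into 'name:type' tokens."""
    body = decl.split("=")[0]   # drop the result type
    return body.split()[1:]     # drop 'name#id', keep the fields

def tl_diff(old: str, new: str):
    a, b = set(tl_fields(old)), set(tl_fields(new))
    return sorted(b - a), sorted(a - b)  # (added, removed)

old = "message#44f9b43d flags:# out:flags.1?true id:int date:int = Message;"
new = ("message#44f9b43d flags:# out:flags.1?true "
       "from_scheduled:flags.18?true id:int date:int = Message;")
added, removed = tl_diff(old, new)
print(added)    # ['from_scheduled:flags.18?true']
print(removed)  # []
```

Splitting on whitespace works here because TL field tokens like `Vector<BotInfo>` contain no spaces.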

By the way, about the purity of theory. Why are bit fields needed? Don't they seem to smell bad from the point of view of type theory? An explanation can be seen in earlier versions of the schema. At first, yes, it was so, a new type was created for each sneeze. These rudiments are still there in this form, for example:

storage.fileUnknown#aa963b05 = storage.FileType;
storage.filePartial#40bc6f52 = storage.FileType;
storage.fileJpeg#7efe0e = storage.FileType;
storage.fileGif#cae1aadf = storage.FileType;
storage.filePng#a4f63c0 = storage.FileType;
storage.filePdf#ae1e508d = storage.FileType;
storage.fileMp3#528a0677 = storage.FileType;
storage.fileMov#4b09ebbc = storage.FileType;
storage.fileMp4#b3cea0e4 = storage.FileType;
storage.fileWebp#1081464c = storage.FileType;

But now imagine: if you have 5 optional fields in a structure, you need 32 types for all the possible variants. A combinatorial explosion. So the crystal purity of TL theory once again shattered against the cast-iron backside of the harsh reality of serialization.
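For reference, this is roughly how the `flags:#` field works on the wire: one 32-bit bitmask says which optional fields follow, instead of a separate constructor per combination. A simplified sketch, using bits 5 and 13 as in the channelFull example above (the layout is abridged — a real constructor interleaves mandatory fields as well):

```python
import struct

def serialize_with_flags(pinned_msg_id=None, online_count=None) -> bytes:
    """Sketch: two optional int fields at flag bits 5 and 13."""
    flags = 0
    body = b""
    if pinned_msg_id is not None:
        flags |= 1 << 5                      # pinned_msg_id:flags.5?int
        body += struct.pack("<i", pinned_msg_id)
    if online_count is not None:
        flags |= 1 << 13                     # online_count:flags.13?int
        body += struct.pack("<i", online_count)
    return struct.pack("<I", flags) + body   # the flags int goes first

# one 'flags' int replaces 2**n constructors for n optional fields
assert len(serialize_with_flags()) == 4
assert len(serialize_with_flags(pinned_msg_id=7, online_count=3)) == 12
```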

In addition, in places these guys violate their own typing themselves. For example, in MTProto (next chapter) the response can be compressed with gzip — all sensible, except that the layering and the schema are violated: it is not the RpcResult itself that gets compressed, but its contents. Well, why do this?.. We had to wedge in a crutch so that decompression would work anywhere.

Or another example: we once discovered a bug — we sent InputPeerUser instead of InputUser. Or vice versa. But it worked! That is, the server did not care about the type. How can that be? The answer may be suggested by code fragments from telegram-cli:

  if (tgl_get_peer_type (E->id) != TGL_PEER_CHANNEL || (C && (C->flags & TGLCHF_MEGAGROUP))) {
    out_int (CODE_messages_get_history);
    out_peer_id (TLS, E->id);
  } else {    
    out_int (CODE_channels_get_important_history);

    out_int (CODE_input_channel);
    out_int (tgl_get_peer_id (E->id));
    out_long (E->id.access_hash);
  }
  out_int (E->max_id);
  out_int (E->offset);
  out_int (E->limit);
  out_int (0);
  out_int (0);

In other words, the serialization here is done MANUALLY, not by generated code! Maybe the server is implemented in a similar way?.. In principle, this will work if done once, but how do you support it later through updates? Wasn't that what the schema was for? And here we move on to the next question.

Versioning. Layers

Why schema versions are called layers can only be guessed from the history of the published schemas. Apparently, at first the authors thought that basic things could be done on an unchanging schema, and only where necessary indicate, for specific requests, that they are made according to a different version. In principle even a good idea — the new would, as it were, "mix in", layer upon the old. But let's see how it was done. True, it was not possible to look from the very beginning — funnily enough, a schema for the base layer simply does not exist. Layers started from 2. The documentation tells us about a special TL feature:

If a client supports Layer 2, then the following constructor must be used:

invokeWithLayer2#289dd1f6 {X:Type} query:!X = X;

In practice, this means that before every API call, an int with the value 0x289dd1f6 must be added before the method number.
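On the wire this wrapping is trivial: the 32-bit constructor number of invokeWithLayer2 is prepended, little-endian like every TL int. A sketch:

```python
import struct

INVOKE_WITH_LAYER2 = 0x289dd1f6  # from the schema line above

def wrap_layer2(serialized_query: bytes) -> bytes:
    """Prefix an already-serialized TL query with the invokeWithLayer2 id."""
    return struct.pack("<I", INVOKE_WITH_LAYER2) + serialized_query

wrapped = wrap_layer2(b"\x11\x22\x33\x44")       # some serialized call
assert wrapped[:4] == bytes.fromhex("f6d19d28")  # 0x289dd1f6, little-endian
```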

Sounds OK. But what happened next? Then came

invokeWithLayer3#b7475268 query:!X = X;

So what is next? As it is easy to guess

invokeWithLayer4#dea0d430 query:!X = X;

Funny? No, it's too early to laugh: consider that every request from another layer needs to be wrapped in such a special type — if all your requests are of different layers, how else would you tell them apart? And adding just 4 bytes in front is a pretty efficient method. So,

invokeWithLayer5#417a57ae query:!X = X;

But it was obvious that after a while this would turn into sheer bacchanalia. And the solution came:

Update: Starting with Layer 9, helper methods invokeWithLayerN can be used together with initConnection

Hooray! After 9 versions, we finally came to what was done in the Internet protocols back in the 80s - version negotiation once at the beginning of the connection!

So what is next?..

invokeWithLayer10#39620c41 query:!X = X;
...
invokeWithLayer18#1c900537 query:!X = X;

And now you can laugh. Only after another 9 layers was a universal constructor with a version number finally added, one that needs to be called only once at the beginning of the connection — and the meaning of layers seems to have disappeared; now it's just a conditional version, like everywhere else. Problem solved.

Right?..

Vasily, [16.07.18 14:01] On Friday I thought:
The teleserver sends events without a request. Requests need to be wrapped in InvokeWithLayer. But the server does not wrap updates — there is no structure for wrapping responses and updates.

I.e. the client cannot specify the layer in which it wants to receive updates

Vadim Goncharov, [16.07.18 14:02] But isn't InvokeWithLayer a crutch in principle?

Vasily, [16.07.18 14:02] It is the only way

Vadim Goncharov, [16.07.18 14:02] which in essence should mean negotiating the layer at the start of the session

By the way, it follows from this that a client downgrade is not provided

Updates, i.e. the Updates type in the schema, are what the server sends to the client not in response to an API request but on its own, when an event occurs. This is a complex topic that will be discussed in another post; for now it is important to know that the server accumulates Updates even while the client is offline.

Thus, once we refuse to wrap each packet to indicate its version, the following possible problems logically arise:

  • the server sends updates to the client before the client has said which version it supports
  • what should be done after upgrading the client?
  • who guarantees that the server's opinion about the layer number will not change along the way?

Do you think this is purely theoretical reasoning, and in practice it cannot happen, because the server is written correctly (at least, it is tested well)? Ha! No such luck!

This is exactly what we ran into in August. On August 14, messages flashed that something was being updated on the Telegram servers ... and then in the logs:

2019-08-15 09:28:35.880640 MSK warn  main: ANON:87: unknown object type: 0x80d182d1 at TL/Object.pm line 213.
2019-08-15 09:28:35.751899 MSK warn  main: ANON:87: unknown object type: 0xb5223b0f at TL/Object.pm line 213.

and then a few megabytes of stack traces (well, at least logging got fixed along the way). After all, if something in your TL stream is not recognized — the format is binary, identified by signatures — then EVERYTHING that follows in the stream becomes undecodable. What do you do in such a situation?

Well, the first thing that comes to anyone's mind is to disconnect and try again. That did not help. We googled the CRC32s — they turned out to be objects from schema 73, although we were working on schema 82. Looking carefully at the logs: there were identifiers from two different schemas!
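The googling works because TL constructor numbers are not arbitrary: they are defined as the CRC32 of the declaration text (with the `#id` and the trailing semicolon stripped), so any unknown id can be searched for across schema versions. A sketch:

```python
import zlib

def tl_constructor_id(decl: str) -> int:
    """CRC32 of a TL declaration with '#id' and ';' stripped —
    this is how constructor numbers are computed."""
    return zlib.crc32(decl.encode()) & 0xFFFFFFFF

# the well-known Bool constructors from the schema:
assert tl_constructor_id("boolFalse = Bool") == 0xbc799737
assert tl_constructor_id("boolTrue = Bool") == 0x997275b5
```

The same trick lets a code generator verify that the `#id` in a downloaded schema actually matches its declaration.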

Maybe the problem is purely in our unofficial client? No: we ran Telegram Desktop 1.2.17 (the version shipped with a number of Linux distributions), and it wrote to the exception log: MTP Unexpected type id #b5223b0f read in MTPMessageMedia…


Google showed that a similar problem had already happened to one of the unofficial clients, but then the version numbers and, accordingly, the assumptions were different ...

So what to do? Vasily and I split up: he tried to update the schema to 91, I decided to wait a few days and try 73. Both methods worked, but since they are empirical, there is no understanding of how many versions up or down you need to jump, nor how long you have to wait.

Later I managed to reproduce the situation: start the client, turn it off, recompile the schema to another layer, restart, catch the problem again, return to the previous one — and oops, no amount of schema switching and client restarting helps for several minutes. You will receive a mix of data structures from different layers.

Explanation? As you can guess from various indirect symptoms, the server consists of many processes of different types on different machines. Most likely, the server responsible for "buffering" put into its queue whatever the upstream ones gave it, and they gave it in the schema that was current at the time of generation. And until this queue had "gone stale", nothing could be done about it.

Unless... but wouldn't that be a terrible crutch?!.. No, before thinking about crazy ideas, let's look at the code of the official clients. In the Android version we find no TL parser at all, but we do find a hefty file (GitHub refuses to syntax-highlight it) with (de)serialization. Here are some code snippets:

public static class TL_message_layer68 extends TL_message {
    public static int constructor = 0xc09be45f;
//...
// a bunch more of the same
//...
    public static class TL_message_layer47 extends TL_message {
        public static int constructor = 0xc992e15c;
        public static Message TLdeserialize(AbstractSerializedData stream, int constructor, boolean exception) {
            Message result = null;
            switch (constructor) {
                case 0x1d86f70e:
                    result = new TL_messageService_old2();
                    break;
                case 0xa7ab1991:
                    result = new TL_message_old3();
                    break;
                case 0xc3060325:
                    result = new TL_message_old4();
                    break;
                case 0x555555fa:
                    result = new TL_message_secret();
                    break;
                case 0x555555f9:
                    result = new TL_message_secret_layer72();
                    break;
                case 0x90dddc11:
                    result = new TL_message_layer72();
                    break;
                case 0xc09be45f:
                    result = new TL_message_layer68();
                    break;
                case 0xc992e15c:
                    result = new TL_message_layer47();
                    break;
                case 0x5ba66c13:
                    result = new TL_message_old7();
                    break;
                case 0xc06b9607:
                    result = new TL_messageService_layer48();
                    break;
                case 0x83e5de54:
                    result = new TL_messageEmpty();
                    break;
                case 0x2bebfa86:
                    result = new TL_message_old6();
                    break;
                case 0x44f9b43d:
                    result = new TL_message_layer104();
                    break;
                case 0x1c9b1027:
                    result = new TL_message_layer104_2();
                    break;
                case 0xa367e716:
                    result = new TL_messageForwarded_old2(); //custom
                    break;
                case 0x5f46804:
                    result = new TL_messageForwarded_old(); //custom
                    break;
                case 0x567699b3:
                    result = new TL_message_old2(); //custom
                    break;
                case 0x9f8d60bb:
                    result = new TL_messageService_old(); //custom
                    break;
                case 0x22eb6aba:
                    result = new TL_message_old(); //custom
                    break;
                case 0x555555F8:
                    result = new TL_message_secret_old(); //custom
                    break;
                case 0x9789dac4:
                    result = new TL_message_layer104_3();
                    break;

or

    boolean fixCaption = !TextUtils.isEmpty(message) &&
    (media instanceof TLRPC.TL_messageMediaPhoto_old ||
     media instanceof TLRPC.TL_messageMediaPhoto_layer68 ||
     media instanceof TLRPC.TL_messageMediaPhoto_layer74 ||
     media instanceof TLRPC.TL_messageMediaDocument_old ||
     media instanceof TLRPC.TL_messageMediaDocument_layer68 ||
     media instanceof TLRPC.TL_messageMediaDocument_layer74)
    && message.startsWith("-1");

Hmm... that looks crazy. But it's probably generated code, so it's okay?.. At least it supports all versions! True, it is unclear why everything is mixed into one heap — secret chats and all sorts of _old7 don't look much like machine generation... However, what blew my mind most of all was

TL_message_layer104
TL_message_layer104_2
TL_message_layer104_3

Guys, can't you even make up your minds within a single layer?! Okay, say "two" was released with a bug — these things happen — but THREE?.. Stepping on the same rake again right away? What kind of pornography is this, excuse me?..

By the way, a similar thing happens in the Telegram Desktop sources — several commits in a row to the schema do not change its layer number, but fix something. In conditions where there is no official source of truth for the schema, where else can you get it, if not from the official client sources? And having taken it from there, you cannot be sure the schema is entirely correct until you have tested all the methods.

How can this even be tested? I hope that fans of unit, functional and other tests will share in the comments.

Okay, let's look at another piece of code:

public static class TL_folders_deleteFolder extends TLObject {
    public static int constructor = 0x1c295881;

    public int folder_id;

    public TLObject deserializeResponse(AbstractSerializedData stream, int constructor, boolean exception) {
        return Updates.TLdeserialize(stream, constructor, exception);
    }

    public void serializeToStream(AbstractSerializedData stream) {
        stream.writeInt32(constructor);
        stream.writeInt32(folder_id);
    }
}

//manually created

//RichText start
public static abstract class RichText extends TLObject {
    public String url;
    public long webpage_id;
    public String email;
    public ArrayList<RichText> texts = new ArrayList<>();
    public RichText parentRichText;

    public static RichText TLdeserialize(AbstractSerializedData stream, int constructor, boolean exception) {
        RichText result = null;
        switch (constructor) {
            case 0x1ccb966a:
                result = new TL_textPhone();
                break;
            case 0xc7fb5e01:
                result = new TL_textSuperscript();
                break;

The "manually created" comment here suggests that only part of this file is written by hand (can you imagine the maintenance nightmare?), and the rest is machine-generated. However, this raises another question — that the sources are available not in full (a la blobs under the GPL in the Linux kernel) — but that is already a topic for the second part.

But enough. Let's move on to the protocol over which all this serialization travels.

MT Proto

So, we open the general description and the detailed description of the protocol, and the first thing we stumble over is terminology. In abundance. In general, this seems to be a trademark of Telegram — calling the same thing differently in different places, or different things by one word, or vice versa (for example, if you see a "sticker pack" in the high-level API, it is not what you thought).

For example, "message" and "session" mean something different here than in the usual Telegram client interface. Well, with the message everything is clear: it could be interpreted in OOP terms, or simply called a "packet" — this is the low, transport level; these are not the messages of the interface, and there are many service ones among them. But the session... first things first.

Transport level

The first thing is transport. We will be told about 5 options:

  • TCP
  • Websocket
  • Websocket over HTTPS
  • HTTP
  • HTTPS

Vasily, [15.06.18 15:04] And there is also UDP transport, but it is not documented

And TCP in three variants

The first is similar to UDP over TCP, each packet includes a sequence number and a CRC
Why is reading the Telegram docs so painful?

Well, by now TCP comes in 4 variants already:

  • Abridged
  • Intermediate
  • padded intermediate
  • Full

Ok, Padded intermediate is for MTProxy; it was added later because of well-known events. But why two more variants (three in total) when one would have sufficed? All four essentially differ only in how they frame the length and payload of MTProto proper, which we discuss next:

  • in Abridged the length is 1 or 4 bytes (but never 0xef), then the body
  • in Intermediate it is 4 bytes of length, then the body; on first connect the client must send 0xeeeeeeee to indicate Intermediate
  • in Full, the most elaborate from a networker's point of view: length, sequence number (and NOT the one inside MTProto itself), body, CRC32. Yes, all this on top of TCP, which already gives us reliable transport as a sequential byte stream — no sequence numbers are needed there, let alone checksums. I will be objected that TCP has only a 16-bit checksum, so data corruption happens. True, but we are running a cryptographic protocol with hashes longer than 16 bytes: all these errors — and more — will be caught by a SHA mismatch at a higher level. CRC32 on top of that serves NO purpose.

Let's compare Abridged, which can get away with one byte of length, with Intermediate, justified as "In case 4-byte data alignment is needed", which is pretty much nonsense. What, are Telegram programmers believed to be so inept that they cannot read data from a socket into an aligned buffer? You have to do that anyway, because a read can return any number of bytes (and there are also proxy servers, for example...). Or, on the other hand, why bother with Abridged at all if we still have hefty paddings of 16 bytes on top of MTProto — to save 3 bytes once in a while?
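The difference between the framings is easiest to see in code. A client-side sketch per the documented formats (the one-time markers — 0xef for Abridged, 0xeeeeeeee for Intermediate — are omitted; in Full, the CRC32 covers the length and sequence-number fields plus the body):

```python
import struct, zlib

def frame_abridged(body: bytes) -> bytes:
    n = len(body) // 4                      # length counted in 4-byte words
    if n < 127:
        return bytes([n]) + body            # a single byte of length
    return b"\x7f" + n.to_bytes(3, "little") + body  # 4 bytes total

def frame_intermediate(body: bytes) -> bytes:
    return struct.pack("<i", len(body)) + body       # plain 4-byte length

def frame_full(body: bytes, seq_no: int) -> bytes:
    length = len(body) + 12                 # + length, seq and crc fields
    head = struct.pack("<ii", length, seq_no) + body
    return head + struct.pack("<I", zlib.crc32(head))

payload = b"\x00" * 16
assert len(frame_abridged(payload)) == 17      # 1 byte of overhead
assert len(frame_intermediate(payload)) == 20  # 4 bytes of overhead
assert len(frame_full(payload, 0)) == 28       # 12 bytes of overhead
```

The three functions differ only in overhead — which is the author's point: one framing would have done.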

One gets the impression that Nikolai Durov is very fond of inventing bicycles, including network protocols, without real practical need.

Other transport options, incl. Web and MTProxy, we will not consider now — maybe in another post, if there is demand. About that same MTProxy, let us only recall that soon after its release in 2018, providers quickly learned to block exactly it — a tool intended for bypassing blocks — by packet size! And also that the MTProxy server, written (again by Waltman) in C, was needlessly tied to Linux specifics, although this was not required at all (Phil Kulin will confirm), and that a similar server in either Go or Node.js fits in under a hundred lines.

But we will draw conclusions about the technical literacy of these people at the end of the section, after considering other issues. For now, let's move on to OSI layer 5, the session layer — onto which they mapped the MTProto session.

Keys, messages, sessions, Diffie-Hellman

And they placed it there not entirely correctly... A session here is not the session visible in the interface under Active sessions. But first things first.


So, we have received a string of bytes of known length from the transport layer. It is either an encrypted message or plaintext — if we are still at the key-agreement stage and are actually performing it. Which of the many concepts called "key" are we talking about? Let us clarify this issue the way we once did for the Telegram team itself (I apologize for translating my own documentation from English with a tired brain at 4 a.m.; it was easier to leave some phrases as they were):

There are two entities called Session — one in the UI of official clients under "current sessions", where each session corresponds to a whole device/OS.
The second — the MTProto session, which carries the message sequence number (in the low-level sense) and which may span different TCP connections. Several MTProto sessions can be set up at the same time, for example, to speed up file downloads.

Somewhere between these two sessions lies the concept of authorization. In the degenerate case one can say that a UI session is the same thing as an authorization, but alas, it's complicated. Let's look:

  • The user on a new device first generates an auth_key and binds it to the account, for example by SMS — hence authorization
  • This happens inside the first MTProto session, which has a session_id of its own.
  • At this step, the combination of authorization and session_id could be called an instance — this word occurs in the documentation and code of some clients
  • Then the client can open several MTProto sessions under the same auth_key — to the same DC.
  • Then one day the client needs to request a file from another DC — and for this DC a new auth_key will be generated!
  • To tell the system that this is not a new user registering but the same authorization (UI session), the client uses the API calls auth.exportAuthorization in the home DC and auth.importAuthorization in the new DC.
  • Again, there may be several MTProto sessions open (each with its own session_id) to this new DC, under its auth_key.
  • Finally, the client may want Perfect Forward Secrecy. Each auth_key is a permanent key — one per DC — and the client can call auth.bindTempAuthKey to use a temporary auth_key — and again, only one temp_auth_key per DC, shared by all MTProto sessions to that DC.

Note that the salt (and future salts) is also one per auth_key, i.e. shared among all MTProto sessions to the same DC.

What does "spanning different TCP connections" mean? It means that this is something like an authorization cookie on a website — it persists (survives) across many TCP connections to the server, but will one day go stale. Unlike HTTP, though, messages inside an MTProto session are sequentially numbered and acknowledged; if they entered the tunnel and the connection was broken, then after a new connection is established the server will kindly resend everything in this session that it did not deliver in the previous TCP connection.

However, the information above was distilled over many months of investigation. And meanwhile, we are implementing our client from scratch, aren't we? — so let's go back to the beginning.

So, we generate the auth_key using Telegram's version of Diffie-Hellman. Let's try to understand the documentation...

Vasily, [19.06.18 20:05] data_with_hash := SHA1(data) + data + (any random bytes); such that the length is equal to 255 bytes;
encrypted_data := RSA(data_with_hash, server_public_key); a 255-byte long number (big endian) is raised to the requisite power over the requisite modulus, and the result is stored as a 256-byte number.

They've got some kind of dope DH

It doesn't look like the DH of a healthy person
There are no two public keys in their DH

Well, in the end we figured it out, but an aftertaste remained: the client performs a proof of work, demonstrating that it was able to factorize a number — a kind of DoS protection. And the RSA key is used only once, in one direction, essentially to encrypt new_nonce. But while this seemingly simple operation will eventually succeed, what do you have to face along the way?
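The proof of work itself: the server sends pq, a product of two primes, and the client must return the factors. Any textbook method suffices; a sketch with Pollard's rho (the sample number below is my own toy semiprime, not a real server challenge):

```python
import math, random

def factor_pq(pq: int) -> tuple:
    """Factor the server's pq challenge (a product of two primes)
    with Pollard's rho; real clients do essentially the same."""
    if pq % 2 == 0:
        return (2, pq // 2)
    while True:
        x = y = random.randrange(2, pq)
        c = random.randrange(1, pq)
        d = 1
        while d == 1:
            x = (x * x + c) % pq           # tortoise: one step
            y = (y * y + c) % pq           # hare: two steps
            y = (y * y + c) % pq
            d = math.gcd(abs(x - y), pq)
        if d != pq:                        # cycle found a proper factor
            return tuple(sorted((d, pq // d)))

p, q = factor_pq(999983 * 1000003)         # two well-known primes near 1e6
assert (p, q) == (999983, 1000003)
```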

Vasily, [20.06.18 00:26] I haven't gotten to the appid request yet

I sent a request to DH

Also, the transport docs say that the server can answer with 4 bytes of an error code. And that's it

Well, it told me -404, so what?

Here I am telling it: "catch this stuff, encrypted with the server key with such-and-such fingerprint, I want DH", and it stupidly responds with 404

What would you think of such a server response? What to do? There is no one to ask (but more on that in the second part).

So much for doing everything by the docs

As if I had nothing better to do than dreaming of converting numbers back and forth

Two 32-bit numbers. I packed them like everything else

But no, it is these two that must go into the string first, as big-endian

Vadim Goncharov, [20.06.18 15:49] and the 404 is because of this?

Vasily, [20.06.18 15:49] YES!

Vadim Goncharov, [20.06.18 15:50] so I don't understand what it could "not find"

Vasily, [20.06.18 15:50] well

It did not find such a decomposition into prime divisors %)

They couldn't even manage proper error reporting

Vasily, [20.06.18 20:18] Oh, there's also MD5. That's three different hashes already

The key fingerprint is computed as follows:

digest = md5(key + iv)
fingerprint = substr(digest, 0, 4) XOR substr(digest, 4, 4)

SHA1 and SHA2
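The quoted MD5 fingerprint, for what it's worth, in code (the key and iv below are dummy placeholder bytes, not real key material):

```python
import hashlib

def key_fingerprint(key: bytes, iv: bytes) -> bytes:
    """fingerprint = substr(md5(key+iv), 0, 4) XOR substr(md5(key+iv), 4, 4)"""
    digest = hashlib.md5(key + iv).digest()
    return bytes(a ^ b for a, b in zip(digest[:4], digest[4:8]))

fp = key_fingerprint(b"\x01" * 32, b"\x02" * 32)
assert len(fp) == 4
```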

So, we have obtained an auth_key of 2048 bits via Diffie-Hellman. What's next? Next we find out that the lower 1024 bits of this key are not used in any way... but let's not dwell on that for now. At this step we have a shared secret with the server — an analogue of a TLS session has been established, a very costly procedure. But the server still knows nothing about who we are! There is no authorization yet. That is, if you were thinking in terms of "login-password", as in the old ICQ, or at least "login-key", as in SSH (for example, on some GitLab/GitHub) — no. For now we are anonymous. And what if the server answers us "this phone number is served by another DC"? Or even "your phone number is banned"? The best we can do is save the key in the hope that it will still be useful and not gone stale by then.

By the way, we "obtained" it with reservations. For example, do we trust the server? What if it is fake? Cryptographic checks are needed:

Vasily, [21.06.18 17:53] They offer mobile clients to check a 2048-bit number for primality %)

But it is completely unclear what for

Vasily, [21.06.18 18:02] The docs do not say what to do if it turns out not to be prime

They do not say. Let's see what the official Android client does in this case. Here is what (and yes, the whole file there is interesting) — as they say, I'll just leave this here:

static const char *goodPrime = "c71caeb9c6b1c9048e6c522f70f13f73980d40238e3e21c14934d037563d930f48198a0aa7c14058229493d22530f4dbfa336f6e0ac925139543aed44cce7c3720fd51f69458705ac68cd4fe6b6b13abdc9746512969328454f18faf8c595f642477fe96bb2a941d5bcd1d4ac8cc49880708fa9b378e3c4f3a9060bee67cf9a4a4a695811051907e162753b56b0f6b410dba74d8a84b2a14b3144e0ef1284754fd17ed950d5965b4b9dd46582db1178d169c6bc465b0d6ff9ca3928fef5b9ae4e418fc15e83ebea0f87fa9ff5eed70050ded2849f47bf959d956850ce929851f0d8115f635b105ee2e4e15d04b2454bf6f4fadf034b10403119cd8e3b92fccc5b";
if (!strcasecmp(prime, goodPrime)) {

No, of course, some primality checks of a freshly received number are present there in places, but personally I no longer have sufficient knowledge of mathematics to judge them.

Okay, we got the master key. Now to log in, i.e. to send requests, we must perform further encryption, this time with AES.

The message key is defined as the 128 middle bits of the SHA256 of the message body (including session, message ID, etc.), including the padding bytes, prepended by 32 bytes taken from the authorization key.

Vasily, [22.06.18 14:08] The middle bits, damn it

We got the auth_key. Done. What happens with it next... is unclear from the docs. Well, feel free to study the open source code.

Note that MTProto 2.0 requires from 12 to 1024 bytes of padding, still subject to the condition that the resulting message length be divisible by 16 bytes.

So how much padding to put in?

And yes, here, too, 404 in case of an error
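Putting the two quoted fragments together — a sketch of the MTProto 2.0 msg_key computation and the padding choice (x = 0 is the client-to-server direction per the docs; the auth_key below is a dummy, and the real scheme goes on to derive the AES key/IV from further SHA256 invocations not shown here):

```python
import hashlib, os

def pad_len(n: int) -> int:
    """Smallest padding in 12..1024 making the total divisible by 16."""
    return 12 + (-(n + 12)) % 16

def msg_key(auth_key: bytes, plaintext: bytes) -> bytes:
    """Middle 128 bits of SHA256(auth_key fragment + padded plaintext)."""
    padded = plaintext + os.urandom(pad_len(len(plaintext)))
    x = 0  # client -> server
    digest = hashlib.sha256(auth_key[88 + x:88 + x + 32] + padded).digest()
    return digest[8:24]   # "the middle bits, damn it"

assert pad_len(4) == 12                              # 4 + 12 = 16
assert len(msg_key(bytes(256), b"hello")) == 16
```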

Anyone who carefully studied the diagram and the documentation text will have noticed that there is no MAC there, and that AES is used in an IGE mode not used anywhere else. They do write about this in their FAQ... Here, the message key itself effectively doubles as a SHA hash of the decrypted data, used to check integrity - and in case of a mismatch, the documentation for some reason recommends silently ignoring such messages (but what about security - what if someone is attacking us?).

I am not a cryptographer; maybe from a theoretical point of view there is nothing wrong with this mode in this case. But I can definitely name a practical problem, using Telegram Desktop as an example. It encrypts its local cache (all those D877F783D5D3EF8C files) in the same way as messages in MTProto (only version 1.0 in this case), i.e. first the message key, then the data itself (with the main big 256-byte auth_key kept somewhere aside, without which msg_key is useless). The problem becomes noticeable on large files. Namely, you need to keep two copies of the data - encrypted and decrypted. And what if there are megabytes, or streaming video, for example?.. Classic schemes with a MAC after the ciphertext allow you to read it as a stream and transfer it immediately. With MTProto, you have to first encrypt or decrypt the entire message, and only then transfer it to the network or to disk. That is why, in the latest versions of Telegram Desktop, the cache in user_data already uses a different format - AES in CTR mode.

Vasily, [21.06.18 01:27] Oh, I found out what IGE is: IGE was the first attempt at an "authenticating encryption mode," originally for Kerberos. It was a failed attempt (it does not provide integrity protection), and had to be removed. That was the beginning of a 20 year quest for an authenticating encryption mode that works, which recently culminated in modes like OCB and GCM.

And now the arguments from the Telegram side:

The team behind Telegram, led by Nikolai Durov, consists of six ACM champions, half of them Ph.Ds in math. It took them about two years to roll out the current version of MTProto.

What's funny - two years for the lower level alone

Or they could have just taken TLS

Okay, suppose we've dealt with encryption and the other nuances. Can we finally send TL-serialized requests and deserialize the responses? So what should be sent, and how? Say, here's the method initConnection - maybe that's it?

Vasily, [25.06.18 18:46] Initializes connection and save information on the user's device and application.

It accepts app_id, device_model, system_version, app_version and lang_code.

And some query

Documentation as always. Feel free to study the open source

If everything was more or less clear with invokeWithLayer, then what is this one? It turns out that, suppose we - the client - already had something to ask the server about, a request we wanted to send:

Vasily, [25.06.18 19:13] Judging by the code, the first call is wrapped in this garbage, and the garbage itself is wrapped in invokewithlayer

Why couldn't initConnection be a separate call, instead of a wrapper? Yes, as it turned out, it must be done every time at the start of each session, not once, as with the master key. But! It cannot be called by an unauthorized user! Here we have reached the stage where this documentation page becomes applicable - and it tells us that...

Only a small portion of the API methods are available to unauthorized users:

  • auth.sendCode
  • auth.resendCode
  • account.getPassword
  • auth.checkPassword
  • auth.checkPhone
  • auth.signUp
  • auth.signIn
  • auth.importAuthorization
  • help.getConfig
  • help.getNearestDc
  • help.getAppUpdate
  • help.getCdnConfig
  • langpack.getLangPack
  • langpack.getStrings
  • langpack.getDifference
  • langpack.getLanguages
  • langpack.getLanguage

The very first of them, auth.sendCode, is that treasured first request in which we send api_id and api_hash, and after which we receive an SMS with a code. And if we hit the wrong DC (phone numbers of a given country are served by another one, for example), we receive an error containing the number of the desired DC. To find out which IP address corresponds to a DC number, help.getConfig helps us. Once there were only 5 entries, but after the well-known events of 2018, the number has increased significantly.

Now recall that at this stage we are talking to the server anonymously. Isn't it too expensive to go through all this just to get an IP address? Why not perform this, and other such operations, in the unencrypted part of MTProto? I hear the objection: "how can you make sure it isn't the RKN responding with fake addresses?" To that we recall that official clients have RSA keys embedded, i.e. the server could simply sign this information. In fact, this is already done for the censorship-bypass information that clients receive through other channels (logically, it cannot be done inside MTProto itself, since you still need to know where to connect in the first place).

OK. At this stage of client authorization, we are not yet authorized and have not registered our application. We just want to see for now what the server responds to the methods available to an unauthorized user. And here…

Vasily, [10.07.18 14:45] https://core.telegram.org/method/help.getConfig

config#7dae33e0 [...] = Config;
help.getConfig#c4f9186b = Config;

https://core.telegram.org/api/datacenter

config#232d5905 [...] = Config;
help.getConfig#c4f9186b = Config;

The scheme has the first one; the second one is what actually arrives

And the tdesktop scheme has a third value

Yes, since then, of course, the documentation has been updated. Although soon it may become irrelevant again. And how should a novice developer know? Maybe if you register your application, they will inform you? Vasily did this, but alas, nothing was sent to him (again, we'll talk about this in the second part).

...Have you noticed that we have already somehow moved on to the API, i.e. to the next level, and skipped something in the MTProto topic? No surprise:

Vasily, [28.06.18 02:04] Mm, they share some of the algorithms with e2e

MTProto defines the encryption algorithms and keys for both domains, as well as a bit of wrapper structure

But they are constantly mixing different stack levels, so it's not always clear where mtproto ended and the next level began.

How are they mixed? Well, take the same temporary key for PFS, for example (which, by the way, Telegram Desktop can't do). It is performed by an API request, auth.bindTempAuthKey, i.e. from the top level. But at the same time it interferes with encryption at the lower level - after it, for example, you have to do initConnection again, etc., i.e. it is not just a normal request. What's also special is that you can have only ONE temporary key per DC, although the auth_key_id field in each message would allow changing the key as often as every message, and the server has the right to "forget" the temporary key at any moment - what to do in that case, the documentation does not say... well, why couldn't you have several keys, as with the set of future salts?..

There are a few other things worth noting in the MTProto theme.

Messages about messages: msg_id, msg_seqno, acknowledgments, pings in the wrong direction and other idiosyncrasies

Why do you need to know about them? Because they "leak" one level up, and you need to be aware of them when working with the API. Suppose we are not interested in msg_key; the lower level has decrypted everything for us. But inside the decrypted data we have the following fields (also the data length, to know where the padding starts, but that's unimportant):

  • salt - int64
  • session_id - int64
  • message_id - int64
  • seq_no - int32

Recall that there is one salt for the entire DC. Why know about it? Not only because there is a get_future_salts request that tells you which intervals will be valid, but also because if your salt has "gone rotten", the message (request) will simply be lost. The server will, of course, report the new salt by issuing new_session_created - but the message sent with the old one will have to be re-sent somehow, for example. And this question affects the application architecture.

The server is allowed to drop sessions entirely and respond this way for many reasons. Actually, what is an MTProto session from the client side? Two numbers: session_id and the seq_no of messages within that session. And the underlying TCP connection, of course. Say our client still can't do many things, got disconnected, reconnected. If this happened quickly, the old session continues in the new TCP connection - increment seq_no further. If it took long, the server could have deleted it, because on its side it is also a queue, as we found out.

What should be seq_no? Oh, that's a tricky question. Try to honestly understand what was meant:

Content related message

A message requiring an explicit acknowledgment. These include all the user and many service messages, virtually all with the exception of containers and acknowledgments.

Message Sequence Number (msg_seqno)

A 32-bit number equal to twice the number of “content-related” messages (those requiring acknowledgment, and in particular those that are not containers) created by the sender prior to this message and subsequently incremented by one if the current message is a content related message. A container is always generated after its entire contents; therefore, its sequence number is greater than or equal to the sequence numbers of the messages contained in it.

What kind of circus is this with incrementing by 1, and then by another 2?.. I suspect the original intent was "the low bit for ACK, the rest is a counter", but the result is not quite that - in particular, it turns out that several acknowledgments can be sent with the same seq_no! How? Well, for example, the server sends us something, and keeps sending, while we stay silent and only reply with service messages acknowledging receipt of its messages. In that case, our outgoing acknowledgments will all have the same outgoing number. If you are familiar with TCP and thought "that sounds crazy, but maybe not too wild - after all, in TCP, seq_no doesn't change either, and the acknowledgment refers to the other side's seq_no" - then I hasten to upset you. Acknowledgments in MTProto refer NOT to seq_no, as in TCP, but to msg_id!
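In code, the rule (as far as it can honestly be understood) boils down to a counter of content-related messages. A Python sketch (names are mine, not from any real client) that reproduces the duplicate-seq_no effect described above:

```python
class SeqNoGen:
    """seq_no = 2 * (number of content-related messages created so far),
    plus 1 if the current message is itself content-related."""

    def __init__(self):
        self.content_count = 0

    def next(self, content_related: bool) -> int:
        if content_related:
            seq_no = 2 * self.content_count + 1
            self.content_count += 1
        else:
            # acks, containers: counter does not advance
            seq_no = 2 * self.content_count
        return seq_no
```

Two acknowledgments sent in a row, with no content-related messages in between, really do get the same seq_no.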

What is this msg_id, the most important of these fields? A unique message identifier, as the name implies. It is defined as a 64-bit number whose lowest bits again carry server-or-not-server magic, and the rest is a Unix timestamp, including the fractional part, shifted 32 bits to the left. I.e. essentially a timestamp (and messages with a time differing too much will be rejected by the server). So it turns out to be an identifier that is global for the client. Meanwhile - remember session_id - we are guaranteed: "Under no circumstances can a message meant for one session be sent into a different session." That is, there are already three levels - session, sequence number within the session, message id. Why such overcomplication is a great mystery.
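A sketch of generating a client msg_id under these rules - a hedged Python illustration, not the article's library. The "magic" in the low bits, per the documentation: client msg_ids must be divisible by 4, while server responses are congruent to 1 modulo 4 and other server messages to 3 modulo 4:

```python
import time

def client_msg_id(now=None) -> int:
    # High 32 bits: unix time; low 32 bits: the fractional part of
    # the second, with the two lowest bits zeroed so the result is
    # divisible by 4 (the client's residue class modulo 4).
    t = time.time() if now is None else now
    return (int(t) << 32) | (int((t % 1) * (1 << 32)) & ~3)
```

In a real client the time would additionally be corrected by a delta derived from server responses, as discussed later.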

So, what is msg_id needed for?..

RPC: requests, responses, errors. Confirmations.

As you may have noticed, there is no special type or function "make an RPC request" anywhere in the schema, although there are responses. After all, we have content-related messages! That is, any message can be a request! Or not be one. After all, each one has a msg_id. And here are the responses:

rpc_result#f35c6d01 req_msg_id:long result:Object = RpcResult;

This is where it is indicated which message this is a response to. Therefore, at the top level of the API, you will have to remember what number your request had - I don't think it's necessary to explain that the work is asynchronous and there can be several requests in flight at once, whose answers may return in any order. In principle, from this, and from error messages like "no workers", the architecture behind it can be traced: the server maintaining the TCP connection with you is a front-end balancer; it forwards requests to backends and collects them back by msg_id. Everything here seems clear, logical and fine.

Yes?.. But if you think about it - the RPC response itself also has a msg_id field! Do we need to yell at the server "you are not acknowledging my answer!"? And yes, what was that about acknowledgments? The page on service messages about messages tells us there is

msgs_ack#62d6b459 msg_ids:Vector<long> = MsgsAck;

and each side must send it. But not always! If you receive an RpcResult, it itself serves as an acknowledgment. That is, the server can respond to your request with a MsgsAck - like, "I received it" - or with the RpcResult right away. Or with both.

And yes, you will also have to acknowledge the response! An acknowledgment of the answer. Otherwise the server will consider it undelivered and throw it at you again - even after a reconnect. But here, of course, the question of timeouts arises. Let's look at them a little later.

In the meantime, let's consider possible errors in query execution.

rpc_error#2144ca19 error_code:int error_message:string = RpcError;

Oh, someone will exclaim, here's a more humane format - there's a string! Not so fast. There is a list of errors, but it is certainly not complete. From it we learn that the codes are something like HTTP errors (well, of course, the semantics of the responses are not respected; in places they are distributed across codes at random), and the string looks like CAPITAL_LETTERS_AND_NUMBERS. For example, PHONE_NUMBER_OCCUPIED or FILE_PART_X_MISSING. That is, you will still have to parse this string. For example, FLOOD_WAIT_3600 means you have to wait an hour, and PHONE_MIGRATE_5 means that a phone number with this prefix must be registered in the 5th DC. We have a typed language, right? But no, a structured argument isn't needed, regular expressions will do, eh.
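So, typed schema or not, every client ends up writing something like this regex-based parser for the numeric argument glued onto the error name (a Python sketch; the helper name is mine):

```python
import re

def parse_rpc_error(error_message: str):
    """Split a trailing numeric argument off the error name,
    e.g. 'FLOOD_WAIT_3600' -> ('FLOOD_WAIT', 3600)."""
    m = re.fullmatch(r"([A-Z_]+?)_(\d+)", error_message)
    if m:
        return m.group(1), int(m.group(2))
    # No numeric suffix: the whole string is the error name.
    return error_message, None
```

E.g. FLOOD_WAIT_3600 then means "wait 3600 seconds" and PHONE_MIGRATE_5 means "reconnect to DC 5".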

Again, this is not on the service messages page; as is already customary with this project, the information can be found on another documentation page. Or raise suspicions. First, note the violation of typing/layering - RpcError can be nested inside RpcResult. Why not outside? What didn't we account for?.. Accordingly, where is the guarantee that an RpcError may not arrive outside an RpcResult - directly, or nested inside another type? After all, it has no req_msg_id of its own...

But let's continue with service messages. The client may decide that the server has been thinking too long, and make this wonderful request:

rpc_drop_answer#58e4a740 req_msg_id:long = RpcDropAnswer;

There are three possible responses to it, again intersecting with the acknowledgment mechanism; figuring out what they should be (and what the list of types not requiring acknowledgment is in general) is left to the reader as homework (note: the information in the Telegram Desktop sources is incomplete).

Madness: statuses of messages about messages

In general, many places in TL, MTProto and Telegram overall leave a feeling of sheer obstinacy, but out of politeness, tact and other soft skills we politely kept silent about it, and censored the obscenities in the dialogues. However, the greater part of the page about messages about messages causes shock even in me, someone who has worked with network protocols for a long time and has seen bicycles of varying degrees of crookedness.

It starts harmlessly, with confirmations. Next, we are told about

bad_msg_notification#a7eff811 bad_msg_id:long bad_msg_seqno:int error_code:int = BadMsgNotification;
bad_server_salt#edab447b bad_msg_id:long bad_msg_seqno:int error_code:int new_server_salt:long = BadMsgNotification;

Well, everyone who starts working with MTProto will face these; in the "fixed - recompiled - launched" cycle, getting numbering errors or a salt that went rotten during the edits is routine. However, there are two points here:

  1. It follows that the original message is lost. We need to fence some queues, we will consider this later.
  2. What are those weird error numbers? 16, 17, 18, 19, 20, 32, 33, 34, 35, 48, 64... where are the rest of the numbers, Tommy?

The documentation states:

The intention is that error_code values are grouped (error_code >> 4): for example, the codes 0x40 - 0x4f correspond to errors in container decomposition.

but, firstly, the shift is in the wrong direction, and secondly, never mind that - where are the rest of the codes? In the author's head?.. However, these are trifles.

The madness begins with message status messages and message copies:

  • Request for Message Status Information
    If either party has not received information on the status of its outgoing messages for a while, it may explicitly request it from the other party:
    msgs_state_req#da69fb52 msg_ids:Vector<long> = MsgsStateReq;
  • Informational Message regarding Status of Messages
    msgs_state_info#04deb57d req_msg_id:long info:string = MsgsStateInfo;
    Here, info is a string that contains exactly one byte of message status for each message from the incoming msg_ids list:

    • 1 = nothing is known about the message (msg_id too low, the other party may have forgotten it)
    • 2 = message not received (msg_id falls within the range of stored identifiers; however, the other party has certainly not received a message like that)
    • 3 = message not received (msg_id too high; however, the other party has certainly not received it yet)
    • 4 = message received (note that this response is also at the same time a receipt acknowledgment)
    • +8 = message already acknowledged
    • +16 = message not requiring acknowledgment
    • +32 = RPC query contained in message being processed or processing already complete
    • +64 = content-related response to message already generated
    • +128 = other party knows for a fact that message is already received
      This response does not require an acknowledgment. It is an acknowledgment of the relevant msgs_state_req, in and of itself.
      Note that if it turns out suddenly that the other party does not have a message that looks like it has been sent to it, the message can simply be re-sent. Even if the other party should receive two copies of the message at the same time, the duplicate will be ignored. (If too much time has passed, and the original msg_id is no longer valid, the message is to be wrapped in msg_copy).
  • Voluntary Communication of Status of Messages
    Either party may voluntarily inform the other party of the status of the messages transmitted by the other party.
    msgs_all_info#8cc0d131 msg_ids:Vector<long> info:string = MsgsAllInfo
  • Extended Voluntary Communication of Status of One Message
    ...
    msg_detailed_info#276d3ec6 msg_id:long answer_msg_id:long bytes:int status:int = MsgDetailedInfo;
    msg_new_detailed_info#809db6df answer_msg_id:long bytes:int status:int = MsgDetailedInfo;
  • Explicit Request to Re-Send Messages
    msg_resend_req#7d861a08 msg_ids:Vector<long> = MsgResendReq;
    The remote party immediately responds by re-sending the requested messages […]
  • Explicit Request to Re-Send Answers
    msg_resend_ans_req#8610baeb msg_ids:Vector<long> = MsgResendReq;
    The remote party immediately responds by re-sending answers to the requested messages […]
  • Message Copies
    In some situations, an old message with a msg_id that is no longer valid needs to be re-sent. Then, it is wrapped in a copy container:
    msg_copy#e06046b2 orig_message:Message = MessageCopy;
    Once received, the message is processed as if the wrapper were not there. However, if it is known for certain that the message orig_message.msg_id was received, then the new message is not processed (while at the same time, it and orig_message.msg_id are acknowledged). The value of orig_message.msg_id must be lower than the container's msg_id.
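The status byte from msgs_state_info quoted above - an enum in the low bits plus flags in the high bits - can be unpacked like this (a Python sketch; the names are mine, not from any client):

```python
STATUS_BASE = {
    1: "nothing is known (msg_id too low)",
    2: "not received (msg_id within stored range)",
    3: "not received yet (msg_id too high)",
    4: "received",
}

STATUS_FLAGS = {
    8: "already acknowledged",
    16: "not requiring acknowledgment",
    32: "RPC query being processed or already complete",
    64: "content-related response already generated",
    128: "other party knows the message was received",
}

def decode_msg_status(byte: int):
    # Low bits carry one of the base statuses 1..4, high bits are flags.
    base = STATUS_BASE.get(byte & 7, "unknown")
    flags = [name for bit, name in STATUS_FLAGS.items() if byte & bit]
    return base, flags
```

Which only underlines the point below: this is a byte-packed enum-plus-flags squeezed into a string, because TL itself offers no better tool for it.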

Let us even keep silent about the fact that in msgs_state_info the ears of the unfinished TL stick out again (a vector of bytes was needed, with an enum in the two lower bits and flags in the higher bits). The point is something else. Does anyone understand why all of this is actually necessary in practice, in a real client?.. With difficulty one can imagine some benefit for a person doing debugging in interactive mode - asking the server what and how. But here the requests are described in both directions.

It follows from this that each side must not only encrypt and send messages, but also store data about them, and about the responses to them, for an unknown length of time. The documentation describes neither the timings nor the practical applicability of these features in any way. What is most astonishing is that they actually are used in the code of official clients! Apparently, they were told something that did not make it into the public documentation. Understanding why from the code is no longer as simple as in the case of TL - this is not a (comparatively) isolated logical part, but a piece tied into the application architecture, i.e. it would require considerably more time spent on the application code.

Pings and timings. Queues.

From all this, if you recall the guesses about the server architecture (distribution of requests across backends), a rather dreary thing follows: despite all the delivery guarantees in TCP (either the data is delivered, or you are informed of the break - and data is delivered up to the moment of the problem), and despite the acknowledgments in MTProto itself - there are no guarantees. The server can easily lose or discard your message, and nothing can be done about it except building crutches of various kinds.

And first of all - message queues. Well, one thing was obvious from the very start: an unacknowledged message must be stored and re-sent. But after what time? Who knows. Perhaps those addled service messages somehow solve this with crutches; say, in Telegram Desktop there are about 4 queues corresponding to them (maybe more; as already mentioned, finding out would require digging into its code and architecture much more seriously; at the same time, we know it cannot be taken as a model, since a number of types from the MTProto scheme are not used in it).

Why does this happen? Probably, the server programmers could not ensure reliability within the cluster, or even buffering on the front balancer, and shifted this problem onto the client. Out of desperation, Vasily tried to implement an alternative, with just two queues, using algorithms from TCP: measuring the RTT to the server and adjusting the "window" size (in messages) depending on the number of unacknowledged requests. That is, a rough heuristic for estimating server load - how many of our requests it can chew on simultaneously without losing any.

Well, that is, you understand, right? If you have to implement TCP again on top of a protocol that works over TCP, this indicates a very poorly designed protocol.

Oh yes, why is more than one queue needed, and what does this mean at all for someone working with the high-level API? Look: you make a request, serialize it, but it often cannot be sent immediately. Why? Because it will be assigned a msg_id, which is a timestamp, and the assignment is best postponed as late as possible - in case the server rejects it due to a time mismatch between us and it (of course, we can build a crutch that shifts our time towards the server's by adding a delta computed from server responses - official clients do this - but it is crude and inaccurate because of buffering). So when you make a request via a local function call from the library, the message passes through the following stages:

  1. It sits in one queue, waiting for encryption.
  2. A msg_id is assigned, and the message moves to another queue - one that allows re-sending; it is written to the socket.
  3. a) The server replied with MsgsAck - the message was delivered; we remove it from the "other queue".
    b) Or, on the contrary, it didn't like something and replied with badmsg - we re-send from the "other queue".
    c) Nothing is known; the message must be re-sent from the other queue - but exactly when is unknown.
  4. The server finally replied with RpcResult - the actual response (or error) - not just delivered, but processed.
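The stages above can be sketched as a small state machine (a Python illustration; the state and event names are mine, not from any real client):

```python
from enum import Enum, auto

class State(Enum):
    PENDING = auto()    # serialized, waiting for msg_id and encryption
    SENT = auto()       # msg_id assigned, written to socket, unacknowledged
    ACKED = auto()      # MsgsAck received: delivered, but not yet processed
    ANSWERED = auto()   # RpcResult received: processed

def on_event(state: State, event: str) -> State:
    transitions = {
        (State.PENDING, "sent"): State.SENT,
        (State.SENT, "msgs_ack"): State.ACKED,
        (State.SENT, "bad_msg"): State.PENDING,      # fix msg_id/salt, resend
        (State.SENT, "timeout"): State.PENDING,      # resend... but after how long?
        (State.SENT, "rpc_result"): State.ANSWERED,  # the ack may be skipped
        (State.ACKED, "rpc_result"): State.ANSWERED,
    }
    return transitions.get((state, event), state)
```

Note how the SENT state has four possible exits - this is precisely why a single send queue is not enough.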

Attention: using containers could partially solve the problem. That's when a bunch of messages is packed into one, and the server responds with one acknowledgment for all of them at once, with one msg_id. But it will also reject this pack as a whole if something went wrong.

And at this point, non-technical considerations come into play. From experience, we have seen many crutches, and in addition, now we will see more examples of bad advice and architecture - in such conditions, is it worth trusting and making such decisions? The question is rhetorical (of course not).

What are we talking about? If on the topic of the addled "messages about messages" one can still speculate with objections like "you're just stupid, you didn't understand our brilliant idea!" (so write the documentation first, as normal people do, with rationale and packet-exchange examples - then we'll talk), then timings/timeouts are a purely practical and specific matter; everything here has long been known. And what does the documentation tell us about timeouts?

A server usually acknowledges the receipt of a message from a client (normally, an RPC query) using an RPC response. If a response is a long time coming, a server may first send a receipt acknowledgment, and somewhat later, the RPC response itself.

A client normally acknowledges the receipt of a message from a server (usually, an RPC response) by adding an acknowledgment to the next RPC query if it is not transmitted too late (if it is generated, say, 60-120 seconds following the receipt of a message from the server). However, if for a long period of time there is no reason to send messages to the server or if there is a large number of unacknowledged messages from the server (say, over 16), the client transmits a stand-alone acknowledgment.

... I translate: we ourselves do not know how much and how it is necessary, well, let's estimate that let it be like this.

And about pings:

Ping Messages (PING/PONG)

ping#7abe77ec ping_id:long = Pong;

A response is usually returned to the same connection:

pong#347773c5 msg_id:long ping_id:long = Pong;

These messages do not require acknowledgments. A pong is transmitted only in response to a ping while a ping can be initiated by either side.

Deferred Connection Closure + PING

ping_delay_disconnect#f3427b8c ping_id:long disconnect_delay:int = Pong;

Works like ping. In addition, after this is received, the server starts a timer which will close the current connection disconnect_delay seconds later unless it receives a new message of the same type which automatically resets all previous timers. If the client sends these pings once every 60 seconds, for example, it may set disconnect_delay equal to 75 seconds.

Are you out of your mind?! In 60 seconds a train will enter a station, drop off and pick up passengers, and lose connectivity in a tunnel again. In 120 seconds, while you're dawdling, it will arrive at another one, and the connection will almost certainly break. Well, it's clear where this comes from - "heard a ringing, but doesn't know where it is": there is Nagle's algorithm and the TCP_NODELAY option, intended for interactive work. But, excuse me, its default delay is 200 milliseconds. If you really want to depict something similar and save on a possible couple of packets, well, postpone sending - maybe for 5 seconds, or whatever the "User is typing..." timeout equals nowadays. But not longer.
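For comparison, disabling Nagle's algorithm for interactive work is a one-line socket option - nothing like a 60-75 second application-level timer (a Python sketch; the article's library is in Perl, where the equivalent setsockopt call exists as well):

```python
import socket

# Disable Nagle's algorithm on an interactive connection, so small
# packets are sent immediately instead of being buffered up to ~200 ms.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```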

And finally, pings. That is, checking that the TCP connection is alive. It's funny, but about 10 years ago I wrote a critical text about the messenger of our faculty's dorm - there the authors also pinged the server from the client, and not the other way around. But third-year students are one thing, and an international company is another, right?..

First, a small educational digression. A TCP connection, in the absence of packet exchange, can live for weeks. This is both good and bad, depending on the purpose. It's good if you had an SSH session open to a server, got up from the computer, power-cycled the router, returned to your seat - the session through that server did not break (you didn't type anything, no packets were sent); convenient. It's bad if the server has thousands of clients, each taking up resources (hello, Postgres!), and a client's host may have rebooted long ago - but we won't find out about it.

Chat/IM systems fall into the second case for an additional reason - online statuses. If a user "falls off", their interlocutors must be informed. Otherwise you get the mistake that the creators of Jabber made (and spent 20 years fixing) - the user disconnected, but messages keep being written to him on the assumption he is online (messages which were also completely lost in the few minutes before the break was discovered). No, the TCP_KEEPALIVE option, which many people who do not understand how TCP timers work stick everywhere (setting wild values like tens of seconds), will not help here - you need to make sure that not only the OS kernel of the user's machine is alive, but that the application itself functions normally and is able to answer (you think it can't hang? Telegram Desktop on Ubuntu 18.04 has hung on me more than once).

That is why the server should ping the client, and not the other way around - if the client does it, then when the connection breaks the ping simply isn't delivered, and the goal is not achieved.

And what do we see in Telegram? Everything is exactly the opposite! Well, formally, of course, both sides can ping each other. In practice, clients use the ping_delay_disconnect crutch, which arms a timer on the server. Well, excuse me, it is not the client's business to decide how long it wants to live there without a ping. The server, knowing its own load, knows better. But of course, if resources are no object, they are their own worst enemies, and the crutch will do...

How should it have been designed?

I believe that the above facts quite clearly indicate the not very high competence of the Telegram / VKontakte team in the field of the transport (and lower) level of computer networks and their low qualification in relevant matters.

Why did it turn out so complicated, and how might Telegram's architects object? By pointing out that they tried to make a session that survives TCP connection breaks - what wasn't delivered now will be delivered later. They apparently also tried to make a UDP transport, but ran into difficulties and abandoned it (which is why the documentation on it is empty - there was nothing to brag about). But due to a lack of understanding of how networks in general and TCP in particular work - where you can rely on it and where you need to do things yourself (and how) - combined with the attempt to kill two birds with one stone alongside the cryptography, this is the cadaver that came out.

How should it have been done? Given that msg_id is a timestamp, cryptographically necessary to prevent replay attacks, it is a mistake to hang the unique-identifier function on it as well. Therefore, without radically changing the current architecture (the Updates stream is a high-level API topic for another part of this series of posts), one would have had to:

  1. Make the server holding the TCP connection to the client take responsibility: if it read something from the socket, kindly acknowledge it, process it, or return an error - no losses. Then the acknowledgment is not a vector of ids but simply "the last received seq_no" - just a number, as in TCP (two numbers: your own seq and the acknowledged one). We are always within a session, aren't we?
  2. Make the timestamp for preventing replay attacks a separate field, a la nonce. It is checked, but affects nothing else. Even uint32 is enough: if our salt changes at least every half a day, we can allocate 16 bits to the low bits of the integer part of the current time and the rest to the fractional part of the second (as now).
  3. Remove msg_id altogether: for distinguishing requests on the backends there is, firstly, the client id and, secondly, the session id - concatenate them. A single seq_no then suffices as the request identifier.

A completely random value could also serve as an identifier - not the best option, but this is in fact already done in the high-level API when sending a message, by the way. Better still would be to rework the architecture from relative to absolute addressing altogether, but that is a topic for another part, not this post.
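
For illustration, the three points above can be sketched as a frame header in Python. This is a sketch under the stated assumptions only - the field layout and all names are hypothetical, not any real MTProto structure:

```python
import struct
import time

def make_nonce(now=None):
    # Anti-replay nonce per point 2: a uint32 holding the low 16 bits of
    # the integer part of the current time (enough if the salt rotates at
    # least every half day) plus 16 bits of the fractional part of a second.
    now = time.time() if now is None else now
    secs = int(now) & 0xFFFF
    frac = int((now - int(now)) * 65536) & 0xFFFF
    return (secs << 16) | frac

def pack_header(seq_no, ack_seq, nonce):
    # Per points 1 and 3: our own seq_no doubles as the request identifier,
    # and the acknowledgment is a single cumulative "last received seq_no",
    # as in TCP - no vector of msg_ids at all.
    return struct.pack('<III', seq_no, ack_seq, nonce)

hdr = pack_header(seq_no=42, ack_seq=41, nonce=make_nonce())
```

Acknowledgment then degenerates into comparing two integers, and retransmission into "resend everything above ack_seq", exactly as TCP has done for decades.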

API?

Ta-daam! So, having fought our way down a path full of pain and crutches, we can finally send any request to the server and receive any response to it, as well as receive updates from the server (sent not in reply to a request but on the server's own initiative - like PUSH, if that makes it clearer for anyone).

Attention, here comes the only Perl example in the article! (For those unfamiliar with the syntax: the first argument of bless is the object's data structure, the second is its class.)

2019.10.24 12:00:51 $1 = {
'cb' => 'TeleUpd::__ANON__',
'out' => bless( {
'filter' => bless( {}, 'Telegram::ChannelMessagesFilterEmpty' ),
'channel' => bless( {
'access_hash' => '-6698103710539760874',
'channel_id' => '1380524958'
}, 'Telegram::InputPeerChannel' ),
'pts' => '158503',
'flags' => 0,
'limit' => 0
}, 'Telegram::Updates::GetChannelDifference' ),
'req_id' => '6751291954012037292'
};
2019.10.24 12:00:51 $1 = {
'in' => bless( {
'req_msg_id' => '6751291954012037292',
'result' => bless( {
'pts' => 158508,
'flags' => 3,
'final' => 1,
'new_messages' => [],
'users' => [],
'chats' => [
bless( {
'title' => 'Хулиномика',
'username' => 'hoolinomics',
'flags' => 8288,
'id' => 1380524958,
'access_hash' => '-6698103710539760874',
'broadcast' => 1,
'version' => 0,
'photo' => bless( {
'photo_small' => bless( {
'volume_id' => 246933270,
'file_reference' => '
'secret' => '1854156056801727328',
'local_id' => 228648,
'dc_id' => 2
}, 'Telegram::FileLocation' ),
'photo_big' => bless( {
'dc_id' => 2,
'local_id' => 228650,
'file_reference' => '
'secret' => '1275570353387113110',
'volume_id' => 246933270
}, 'Telegram::FileLocation' )
}, 'Telegram::ChatPhoto' ),
'date' => 1531221081
}, 'Telegram::Channel' )
],
'timeout' => 300,
'other_updates' => [
bless( {
'pts_count' => 0,
'message' => bless( {
'post' => 1,
'id' => 852,
'flags' => 50368,
'views' => 8013,
'entities' => [
bless( {
'length' => 20,
'offset' => 0
}, 'Telegram::MessageEntityBold' ),
bless( {
'length' => 18,
'offset' => 480,
'url' => 'https://alexeymarkov.livejournal.com/[url_вырезан].html'
}, 'Telegram::MessageEntityTextUrl' )
],
'reply_markup' => bless( {
'rows' => [
bless( {
'buttons' => [
bless( {
'text' => '???? 165',
'data' => 'send_reaction_0'
}, 'Telegram::KeyboardButtonCallback' ),
bless( {
'data' => 'send_reaction_1',
'text' => '???? 9'
}, 'Telegram::KeyboardButtonCallback' )
]
}, 'Telegram::KeyboardButtonRow' )
]
}, 'Telegram::ReplyInlineMarkup' ),
'message' => 'А вот и новая книга! 
// [message text cut so as not to violate Habr's rules on advertising]
напечатаю.',
'to_id' => bless( {
'channel_id' => 1380524958
}, 'Telegram::PeerChannel' ),
'date' => 1571724559,
'edit_date' => 1571907562
}, 'Telegram::Message' ),
'pts' => 158508
}, 'Telegram::UpdateEditChannelMessage' ),
bless( {
'pts' => 158508,
'message' => bless( {
'edit_date' => 1571907589,
'to_id' => bless( {
'channel_id' => 1380524958
}, 'Telegram::PeerChannel' ),
'date' => 1571807301,
'message' => 'Почему Вы считаете Facebook плохой компанией? Можете прокомментировать? По-моему, это шикарная компания. Без долгов, с хорошей прибылью, а если решат дивы платить, то и еще могут нехило подорожать.
Для меня ответ совершенно очевиден: потому что Facebook делает ужасный по качеству продукт. Да, у него монопольное положение и да, им пользуется огромное количество людей. Но мир не стоит на месте. Когда-то владельцам Нокии было смешно от первого Айфона. Они думали, что лучше Нокии ничего быть не может и она навсегда останется самым удобным, красивым и твёрдым телефоном - и доля рынка это красноречиво демонстрировала. Теперь им не смешно.
Конечно, рептилоиды сопротивляются напору молодых гениев: так Цукербергом был пожран Whatsapp, потом Instagram. Но всё им не пожрать, Паша Дуров не продаётся!
Так будет и с Фейсбуком. Нельзя всё время делать говно. Кто-то когда-то сделает хороший продукт, куда всё и уйдут.
#соцсети #facebook #акции #рептилоиды',
'reply_markup' => bless( {
'rows' => [
bless( {
'buttons' => [
bless( {
'data' => 'send_reaction_0',
'text' => '???? 452'
}, 'Telegram::KeyboardButtonCallback' ),
bless( {
'text' => '???? 21',
'data' => 'send_reaction_1'
}, 'Telegram::KeyboardButtonCallback' )
]
}, 'Telegram::KeyboardButtonRow' )
]
}, 'Telegram::ReplyInlineMarkup' ),
'entities' => [
bless( {
'length' => 199,
'offset' => 0
}, 'Telegram::MessageEntityBold' ),
bless( {
'length' => 8,
'offset' => 919
}, 'Telegram::MessageEntityHashtag' ),
bless( {
'offset' => 928,
'length' => 9
}, 'Telegram::MessageEntityHashtag' ),
bless( {
'length' => 6,
'offset' => 938
}, 'Telegram::MessageEntityHashtag' ),
bless( {
'length' => 11,
'offset' => 945
}, 'Telegram::MessageEntityHashtag' )
],
'views' => 6964,
'flags' => 50368,
'id' => 854,
'post' => 1
}, 'Telegram::Message' ),
'pts_count' => 0
}, 'Telegram::UpdateEditChannelMessage' ),
bless( {
'message' => bless( {
'reply_markup' => bless( {
'rows' => [
bless( {
'buttons' => [
bless( {
'data' => 'send_reaction_0',
'text' => '???? 213'
}, 'Telegram::KeyboardButtonCallback' ),
bless( {
'data' => 'send_reaction_1',
'text' => '???? 8'
}, 'Telegram::KeyboardButtonCallback' )
]
}, 'Telegram::KeyboardButtonRow' )
]
}, 'Telegram::ReplyInlineMarkup' ),
'views' => 2940,
'entities' => [
bless( {
'length' => 609,
'offset' => 348
}, 'Telegram::MessageEntityItalic' )
],
'flags' => 50368,
'post' => 1,
'id' => 857,
'edit_date' => 1571907636,
'date' => 1571902479,
'to_id' => bless( {
'channel_id' => 1380524958
}, 'Telegram::PeerChannel' ),
'message' => 'Пост про 1С вызвал бурную полемику. Человек 10 (видимо, 1с-программистов) единодушно написали:
// [message text cut so as not to violate Habr's rules on advertising]
Я бы добавил, что блестящая у 1С дистрибуция, а маркетинг... ну, такое.'
}, 'Telegram::Message' ),
'pts_count' => 0,
'pts' => 158508
}, 'Telegram::UpdateEditChannelMessage' ),
bless( {
'pts' => 158508,
'pts_count' => 0,
'message' => bless( {
'message' => 'Здравствуйте, расскажите, пожалуйста, чем вредит экономике 1С?
// [message text cut so as not to violate Habr's rules on advertising]
#софт #it #экономика',
'edit_date' => 1571907650,
'date' => 1571893707,
'to_id' => bless( {
'channel_id' => 1380524958
}, 'Telegram::PeerChannel' ),
'flags' => 50368,
'post' => 1,
'id' => 856,
'reply_markup' => bless( {
'rows' => [
bless( {
'buttons' => [
bless( {
'data' => 'send_reaction_0',
'text' => '???? 360'
}, 'Telegram::KeyboardButtonCallback' ),
bless( {
'data' => 'send_reaction_1',
'text' => '???? 32'
}, 'Telegram::KeyboardButtonCallback' )
]
}, 'Telegram::KeyboardButtonRow' )
]
}, 'Telegram::ReplyInlineMarkup' ),
'views' => 4416,
'entities' => [
bless( {
'offset' => 0,
'length' => 64
}, 'Telegram::MessageEntityBold' ),
bless( {
'offset' => 1551,
'length' => 5
}, 'Telegram::MessageEntityHashtag' ),
bless( {
'length' => 3,
'offset' => 1557
}, 'Telegram::MessageEntityHashtag' ),
bless( {
'offset' => 1561,
'length' => 10
}, 'Telegram::MessageEntityHashtag' )
]
}, 'Telegram::Message' )
}, 'Telegram::UpdateEditChannelMessage' )
]
}, 'Telegram::Updates::ChannelDifference' )
}, 'MTProto::RpcResult' )
};
2019.10.24 12:00:51 $1 = {
'in' => bless( {
'update' => bless( {
'user_id' => 2507460,
'status' => bless( {
'was_online' => 1571907651
}, 'Telegram::UserStatusOffline' )
}, 'Telegram::UpdateUserStatus' ),
'date' => 1571907650
}, 'Telegram::UpdateShort' )
};
2019.10.24 12:05:46 $1 = {
'in' => bless( {
'chats' => [],
'date' => 1571907946,
'seq' => 0,
'updates' => [
bless( {
'max_id' => 141719,
'channel_id' => 1295963795
}, 'Telegram::UpdateReadChannelInbox' )
],
'users' => []
}, 'Telegram::Updates' )
};
2019.10.24 13:01:23 $1 = {
'in' => bless( {
'server_salt' => '4914425622822907323',
'unique_id' => '5297282355827493819',
'first_msg_id' => '6751307555044380692'
}, 'MTProto::NewSessionCreated' )
};
2019.10.24 13:24:21 $1 = {
'in' => bless( {
'chats' => [
bless( {
'username' => 'freebsd_ru',
'version' => 0,
'flags' => 5440,
'title' => 'freebsd_ru',
'min' => 1,
'photo' => bless( {
'photo_small' => bless( {
'local_id' => 328733,
'volume_id' => 235140688,
'dc_id' => 2,
'file_reference' => '
'secret' => '4426006807282303416'
}, 'Telegram::FileLocation' ),
'photo_big' => bless( {
'dc_id' => 2,
'file_reference' => '
'volume_id' => 235140688,
'local_id' => 328735,
'secret' => '71251192991540083'
}, 'Telegram::FileLocation' )
}, 'Telegram::ChatPhoto' ),
'date' => 1461248502,
'id' => 1038300508,
'democracy' => 1,
'megagroup' => 1
}, 'Telegram::Channel' )
],
'users' => [
bless( {
'last_name' => 'Panov',
'flags' => 1048646,
'min' => 1,
'id' => 82234609,
'status' => bless( {}, 'Telegram::UserStatusRecently' ),
'first_name' => 'Dima'
}, 'Telegram::User' )
],
'seq' => 0,
'date' => 1571912647,
'updates' => [
bless( {
'pts' => 137596,
'message' => bless( {
'flags' => 256,
'message' => 'Создать джейл с именем покороче ??',
'to_id' => bless( {
'channel_id' => 1038300508
}, 'Telegram::PeerChannel' ),
'id' => 119634,
'date' => 1571912647,
'from_id' => 82234609
}, 'Telegram::Message' ),
'pts_count' => 1
}, 'Telegram::UpdateNewChannelMessage' )
]
}, 'Telegram::Updates' )
};

Yes, it is deliberately not hidden under a spoiler - if you skipped it, go back and read it!

Oh, wait... what does this look like? Something very familiar... could it be the data structure of a typical Web API in JSON, except that classes have been attached to the objects?..

So that's how it turns out... What is this, comrades?.. So much effort - only to stop for a rest where Web programmers are merely starting?.. Wouldn't plain JSON over HTTPS have been simpler?! And what did we get in exchange? Were these efforts worth it?

Let's evaluate what TL+MTProto has given us and what the alternatives could be. Well, HTTP with its request-response model is a poor fit, but what about at least something on top of TLS?

Compact serialization. Looking at this data structure, so similar to JSON, one recalls that JSON has binary variants. Let's set MsgPack aside as insufficiently extensible, but there is, for example, CBOR - a standard, by the way, described in RFC 7049. It is notable for defining tags as an extension mechanism, and among those already registered are:

  • 25 + 256 - replacing duplicate strings with a reference to a string number, a cheap compression method of sorts
  • 26 - a serialized Perl object with class name and constructor arguments
  • 27 - a serialized language-independent object with type name and constructor arguments

Well, I tried serializing the same data in TL and in CBOR with string and object packing enabled. From roughly a megabyte onwards, the results began to favor CBOR:

cborlen=1039673 tl_len=1095092
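
The size comparison is easy to reproduce in miniature. The sketch below is in Python with a deliberately minimal hand-rolled CBOR encoder covering only unsigned ints, text strings, arrays and maps (in real code an off-the-shelf CBOR library would be used; the sample structure is merely shaped like the dump above):

```python
import json
import struct

def cbor_encode(obj):
    # Minimal CBOR (RFC 7049) encoder: major type + length header,
    # then the payload; supports unsigned ints, str, list, dict only.
    def head(major, n):
        if n < 24:
            return bytes([major << 5 | n])
        if n < 256:
            return bytes([major << 5 | 24, n])
        if n < 65536:
            return bytes([major << 5 | 25]) + struct.pack('>H', n)
        return bytes([major << 5 | 26]) + struct.pack('>I', n)
    if isinstance(obj, int):
        return head(0, obj)                      # major 0: unsigned int
    if isinstance(obj, str):
        b = obj.encode('utf-8')
        return head(3, len(b)) + b               # major 3: text string
    if isinstance(obj, list):
        return head(4, len(obj)) + b''.join(map(cbor_encode, obj))
    if isinstance(obj, dict):
        return head(5, len(obj)) + b''.join(
            cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
    raise TypeError(obj)

# A structure shaped like the Updates dump above (values are made up).
msg = {'pts': 158508, 'flags': 3, 'final': 1,
       'chats': [{'id': 1380524958, 'title': 'channel'}]}
print(len(cbor_encode(msg)),
      len(json.dumps(msg, separators=(',', ':')).encode()))
```

Even without the stringref tags (25 + 256), the binary encoding already beats compact JSON on this sample; with string packing enabled the gap grows on repetitive data like the dump above.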

So, the conclusion: there are substantially simpler formats, not subject to the stream desynchronization or unknown-identifier problems, with comparable efficiency.

Fast connection establishment. This means zero RTT after a reconnect (when the key has already been generated once) - applicable from the very first MTProto message, but with some reservations: you hit the same salt, the session has not gone stale, and so on. What does TLS offer instead? A related quote:

When PFS is used in TLS, TLS session tickets (RFC 5077) can be used to resume the encrypted session without renegotiating keys and without storing key information on the server. When opening the first connection and generating keys, the server encrypts the connection state and sends it to the client (as a session ticket). Accordingly, when the connection is resumed, the client sends the session ticket, containing among other things the session key, back to the server. The ticket itself is encrypted with a temporary key (session ticket key) stored on the server, which must be distributed to all frontend servers that terminate SSL in clustered deployments.[10] Thus, introducing session tickets can violate PFS if the temporary server keys are compromised, for example if they are stored for a long time (OpenSSL, nginx and Apache by default keep them for the entire lifetime of the process; popular sites use a key for several hours, up to days).

Here the RTT is not zero: at least ClientHello and ServerHello must be exchanged, after which, together with Finished, the client can already send data. But remember that we do not have the Web, with its flurry of freshly opened connections, but a messenger, whose connection is usually a single and more or less long-lived one, with relatively short requests compared to Web pages - everything is multiplexed inside it. So this is quite acceptable, unless we hit a really bad stretch of subway.

Did I forget anything else? Write in the comments.

To be continued!

In the second part of this series of posts, we will consider organizational rather than technical issues - approaches, ideology, interface, attitude towards users, etc. - based, however, on the technical information presented here.

The third part will continue the analysis of the technical side and the development experience. You will learn, in particular:

  • continuation of the pandemonium with the variety of TL-types
  • unknown things about channels and supergroups
  • how dialogs are worse than a roster
  • about absolute vs relative message addressing
  • what is the difference between photo and image
  • how emoji interfere with italicized text

and other crutches! Stay tuned!

Source: habr.com
