XML is almost always misused

XML was invented in 1996. It had barely appeared before its possible applications began to be misunderstood, and for the purposes people tried to adapt it to, it was not the best choice.

It's no exaggeration to say that the vast majority of the XML schemas I've seen have been inappropriate uses or misuses of XML. Moreover, these uses showed a fundamental misunderstanding of what XML is in the first place.

XML is a markup language. It is not a data format. Most XML schemas explicitly ignore this distinction and confuse XML with a data format, which ultimately means that choosing XML was itself a mistake, since a data format was what was really needed.

Without getting too specific, XML is best suited for annotating blocks of text with structure and metadata. Unless your primary concern is working with a block of text, XML is unlikely to be justified.

From this point of view, there is an easy way to check how well an XML schema is designed. Take an example document in the proposed schema and remove all tags and attributes from it. If what's left doesn't make sense (or if only an empty string is left), then either your schema isn't designed properly, or you simply shouldn't have used XML.
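The check described above can be sketched in a few lines of Python. This is only an illustration using the standard library's xml.etree.ElementTree; the helper name strip_markup and both sample documents are made up for this sketch.

```python
import xml.etree.ElementTree as ET


def strip_markup(xml_text: str) -> str:
    """Return what is left of a document once tags and attributes are removed."""
    root = ET.fromstring(xml_text)
    # itertext() yields only the character data, in document order.
    return "".join(root.itertext()).strip()


# Hypothetical sample documents, invented for this illustration.
marked_up_text = "<p>XML is a <em>markup</em> language.</p>"
attribute_only = '<config><option name="debug" value="true" /></config>'

print(repr(strip_markup(marked_up_text)))  # readable text survives
print(repr(strip_markup(attribute_only)))  # nothing survives
```

The first document still reads as a sentence after stripping; the second collapses to an empty string, which is exactly the warning sign.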

Below are some of the most common examples of incorrectly constructed schemas.

<root>
  <item name="name" value="John" />
  <item name="city" value="London" />
</root>

Here we see an unreasonable and strange (albeit very common) attempt to express a simple key-value dictionary in XML. If all tags and attributes are removed, an empty string remains. In essence, this document is, absurd as it may sound, a semantic annotation of an empty string.

<root name="John" city="London" />

To make matters worse, here we don't just have a semantic annotation of an empty string as an extravagant way of expressing a dictionary: this time the "dictionary" is encoded directly as attributes on the root element. This makes the set of attribute names on the element undefined and dynamic. Moreover, it shows that all the author really wanted was a simple key-value syntax, but instead made the bizarre decision to apply XML, forcing the use of a single empty element merely as a prefix for attribute syntax. I come across such schemas very often.

<root>
  <item key="name">John</item>
  <item key="city">London</item>
</root>

This is somewhat better, but now the keys are metadata for some reason, while the values are not. A very strange take on dictionaries. If you remove all tags and attributes, half of the information is lost.

A proper dictionary expression in XML would look something like this:

<root>
  <item>
    <key>Name</key>
    <value>John</value>
  </item>
  <item>
    <key>City</key>
    <value>London</value>
  </item>
</root>

But if people have made the strange decision to use XML as a data format and then use it to represent a dictionary, they should understand that what they are doing is inappropriate and inconvenient. Designers often choose XML for their applications by mistake. Even more often, they make matters worse by using XML in one of the forms described above, ignoring the fact that XML is simply not suited to this.

Worst XML schema?

By the way, the prize for the worst XML schema I've ever seen goes to the automatic provisioning configuration file format for Polycom IP phones. Provisioning these phones requires them to download XML files over TFTP, which... Well, here is an excerpt from one such file:

<softkey
        softkey.feature.directories="0"
        softkey.feature.buddies="0"
        softkey.feature.forward="0"
        softkey.feature.meetnow="0"
        softkey.feature.redial="1"
        softkey.feature.search="1"

        softkey.1.enable="1"
        softkey.1.use.idle="1"
        softkey.1.label="Foo"
        softkey.1.insert="1"
        softkey.1.action="..."

        softkey.2.enable="1"
        softkey.2.use.idle="1"
        softkey.2.label="Bar"
        softkey.2.insert="2"
        softkey.2.action="..." />

This is not someone's bad joke. And this is not my invention:

  • Elements are used simply as a prefix for attaching attributes, which themselves have hierarchical names.
  • If you want to assign values to several instances of a record of a certain type, you must do so using attribute names that contain indexes.
  • In addition, attributes starting with softkey. must be placed on <softkey/> elements, attributes starting with feature. must be placed on <feature/> elements, and so on, even though this looks completely redundant and seemingly meaningless.
  • And finally, if you were hoping that the first component of the attribute name always matches the element name: nothing of the sort! For example, up. attributes must be attached to <userpreferences/>. The assignment of attribute names to elements is almost completely arbitrary.
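To see how much structure is hidden inside those attribute names, here is a minimal Python sketch that unfolds the dotted names into nested dictionaries. It is only an illustration: the excerpt is shortened, the helper name unfold is made up, and nothing here reflects how Polycom's own software reads these files.

```python
import xml.etree.ElementTree as ET

# A shortened version of the excerpt above.
EXCERPT = """
<softkey
        softkey.feature.redial="1"
        softkey.1.label="Foo"
        softkey.1.insert="1"
        softkey.2.label="Bar"
        softkey.2.insert="2" />
"""


def unfold(attributes: dict) -> dict:
    """Turn dotted attribute names into nested dictionaries (hypothetical helper)."""
    tree: dict = {}
    for dotted_name, value in attributes.items():
        *parents, leaf = dotted_name.split(".")
        node = tree
        for part in parents:
            # Walk down, creating intermediate levels as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


element = ET.fromstring(EXCERPT)
structure = unfold(element.attrib)
print(structure["softkey"]["1"]["label"])
```

The hierarchy and the record indexes only become visible after the attribute names have been parsed a second time, by hand, which is work the markup was supposed to do.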

Documents or data

From time to time someone does something truly strange and tries to compare XML and JSON, thereby showing that they understand neither. XML is a document markup language. JSON, on the other hand, is a structured data format, so comparing them is like comparing apples with oranges.

To understand the difference between documents and data, think of XML as analogous to a machine-readable document. Although it is intended to be machine readable, it refers metaphorically to documents, and from that point of view it is in fact comparable to PDF documents, which are most often not machine readable.

For example, in XML the order of the elements matters, whereas in JSON the order of key-value pairs within objects is meaningless and undefined. If you want an unordered dictionary of key-value pairs, the actual order in which the entries appear in the file doesn't matter. But you can form many different documents from this data, because a document has a definite order. Metaphorically, this is analogous to a document on paper, although it has no physical dimensions, unlike a printout or a PDF file.

My example of a proper XML dictionary representation carries an order of the elements in the dictionary, unlike the JSON representation. This order cannot be ignored: such linearity is inherent in the document model and in the XML format. One might decide to ignore the order when interpreting this XML document, but there is no point arguing about that, since the issue is beyond the scope of a discussion of the format itself. What's more, if you make the document viewable in a browser by attaching a cascading style sheet to it, you'll see that the elements of the dictionary appear in one particular order and no other.
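A small Python sketch can make the difference concrete. It assumes the standard json and xml.etree.ElementTree modules; the helper name keys_in_document_order and the lowercase JSON keys are chosen only for this illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same two pairs written in a different order.
json_a = json.loads('{"name": "John", "city": "London"}')
json_b = json.loads('{"city": "London", "name": "John"}')

xml_a = ET.fromstring(
    "<root><item><key>Name</key><value>John</value></item>"
    "<item><key>City</key><value>London</value></item></root>"
)
xml_b = ET.fromstring(
    "<root><item><key>City</key><value>London</value></item>"
    "<item><key>Name</key><value>John</value></item></root>"
)


def keys_in_document_order(root):
    """List the keys in the order in which the document presents them."""
    return [item.findtext("key") for item in root.iter("item")]


# As data, the two JSON objects are equal regardless of key order.
print(json_a == json_b)
# As documents, the two XML trees present their items in different orders.
print(keys_in_document_order(xml_a))
print(keys_in_document_order(xml_b))
```

Whether a consumer chooses to care about the XML order is a separate question; the point is only that the format itself always carries one.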

In other words, a dictionary (a piece of structured data) can be converted into n different possible documents (XML, PDF, paper, etc.), where n is the number of possible orderings of the elements in the dictionary, and that is before taking other possible variables into account.

However, it also follows that if you only want to transfer data, using a machine-readable document for this is not efficient. It brings in a model that is superfluous in this case and only gets in the way. In addition, a program has to be written to extract the original data. It hardly makes sense to use XML for something that will never be formatted as a document at some point (say, with CSS or XSLT, or both), since that is the main (if not the only) reason to stick to the document model.
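For the element-based dictionary shown earlier, such an extraction program might look roughly like the Python sketch below. The function name extract_dictionary and the error handling are assumptions made for this illustration, not part of any schema.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """
<root>
  <item>
    <key>Name</key>
    <value>John</value>
  </item>
  <item>
    <key>City</key>
    <value>London</value>
  </item>
</root>
"""


def extract_dictionary(xml_text: str) -> dict:
    """Recover the key-value pairs from the document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    result = {}
    for item in root.findall("item"):
        key = item.findtext("key")
        value = item.findtext("value")
        # The reader has to enforce by hand what the schema only implies.
        if key is None or value is None:
            raise ValueError("every <item> needs a <key> and a <value>")
        result[key] = value
    return result


data = extract_dictionary(DOCUMENT)
print(data)
```

Even in this tiny case the reader must know which elements hold keys, which hold values, and what to do when one is missing.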

Moreover, since XML has no concept of numbers (or Boolean values, or other data types), all numbers represented in this format are treated as just more text. To extract data, you must know the schema and how it relates to the data being expressed. You also need to know when, based on context, a particular piece of text represents a number and should be converted to one, and so on.
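The following Python sketch illustrates the point with the standard json and xml.etree.ElementTree modules. The element and key names (server, port, enabled) are invented for the example, and the conversion rules at the end are exactly the kind of outside knowledge the reader has to supply.

```python
import json
import xml.etree.ElementTree as ET

from_json = json.loads('{"port": 8080, "enabled": true}')
from_xml = ET.fromstring(
    "<server><port>8080</port><enabled>true</enabled></server>"
)

# JSON hands back typed values directly.
print(type(from_json["port"]).__name__, type(from_json["enabled"]).__name__)

# XML hands back text; nothing in the document says "8080" is a number.
port_text = from_xml.findtext("port")
enabled_text = from_xml.findtext("enabled")
print(type(port_text).__name__, type(enabled_text).__name__)

# The conversion comes from knowledge of the schema, not from the format.
port = int(port_text)
enabled = enabled_text == "true"
```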

Thus, extracting data from XML documents is not so different from recognizing scanned documents that contain, for example, many pages of tables of numerical data. Yes, it is possible in principle, but it is far from the best way and makes sense only as a last resort, when there are absolutely no other options. A reasonable solution is simply to find a digital copy of the original data that is not embedded in a document model, where the data is bound up with one specific textual representation.

That being said, it doesn't surprise me at all that XML is popular in business. The reason is precisely that the document format (on paper) is clear and familiar to businesses, and they want to keep using a familiar and understandable model. For the same reason, businesses all too often use PDF documents instead of more machine-readable formats: they are still tied to the notion of a printed page with a specific physical size. This applies even to documents that are unlikely ever to be printed (for example, an 8000-page registry documentation PDF file). From this point of view, the use of XML in business is essentially a manifestation of skeuomorphism. People understand the metaphorical idea of a printed page of limited size, and they understand how to build business processes around printed documents. Seen that way, machine-readable documents with no physical size limits, that is, XML documents, represent an innovation while remaining a familiar and comfortable counterpart to a document. That does not stop them from being an incorrect and overly skeuomorphic way of representing data.

To date, the only XML schemas I know of that I can genuinely call a valid application of the format are XHTML and DocBook.

Source: habr.com
