How to stop doing the same thing

Do you enjoy repeating routine operations over and over? I don't. Yet every time I worked with the Rostelecom data warehouse in a SQL client, I had to write out all the joins between tables by hand — even though in 90% of cases the columns and join conditions were the same from query to query. It would seem that any SQL client has autocompletion, but it often fails for data warehouses: to improve load performance, they rarely define unique constraints or foreign keys, and without those the client has no way to learn how the entities relate to each other and what it could suggest to you.


After going through denial, anger, bargaining, and depression, and getting closer to acceptance, I decided: why not try to implement autocompletion myself, the way it should be? I use the DBeaver client; it is written in Java and has an open-source Community edition. A simple plan took shape:

  1. Find the classes responsible for autocompletion in the source code
  2. Reorient them to work with external metadata and pull join information from there
  3. ??
  4. PROFIT

I quickly dealt with the first point: I found a bug report about autocompletion in the bug tracker and, in the related commit, discovered the SQLCompletionAnalyzer class. I looked at the code — exactly what I needed. All that remained was to rewrite it so that everything worked. I waited for a free evening and began thinking over the implementation. I decided to keep the table-relation rules (the metadata) in JSON. I had no practical experience with that format, and this task looked like a chance to fix the omission.

To work with JSON I chose the json-simple library from Google. This is where the surprises began. It turns out that DBeaver, as a true Eclipse-platform application, is built with the OSGi framework. For experienced developers this makes dependency management convenient, but for me it was more like dark magic I was clearly not ready for: as usual, I added the imports of the classes I needed from json-simple to the header of the class I was editing and declared the dependency in pom.xml, after which the project flatly refused to build and crashed with errors.

In the end I managed to fix the build errors: I declared the library not in pom.xml but in MANIFEST.MF, as OSGi requires, specifying it via Import-Package. Not the prettiest solution, but it works. Then came the next surprise. If you develop in IntelliJ IDEA, you can't just start debugging an Eclipse-platform project: an inexperienced developer is supposed to suffer no less than an analyst without query completion. The DBeaver developers themselves came to the rescue, describing in the wiki all the hoops you have to jump through. The most annoying part: even after all that gymnastics, the project refused to start under the debugger with the JSON library wired in via Import-Package (although it still built successfully into the finished product).
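For reference, the Import-Package approach described above would look roughly like this in META-INF/MANIFEST.MF (a sketch; the package names are json-simple's standard ones, but the exact manifest contents in the project may differ):

```
Import-Package: org.json.simple,
 org.json.simple.parser
```

With Import-Package, the OSGi runtime resolves the packages from whatever bundle exports them, instead of the plugin declaring a hard dependency on a specific bundle.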

By then I had come to feel how inconvenient JSON was for my task: the metadata was meant to be edited by hand, and XML suits that better. A second argument in favor of XML was that the JDK already ships all the necessary classes, which let me stop fighting the external library. With great pleasure I migrated all the metadata from JSON to XML and set about editing the autocompletion logic.

Metadata example

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tableRelations>
    <tableRelation>
        <leftTable>dim_account</leftTable>
        <rightTable>dim_partner</rightTable>
        <joinColumnPair leftColumn="partner_key" rightColumn="partner_key"/>
        <joinColumnPair leftColumn="src_id" rightColumn="src_id"/>
    </tableRelation>
    <tableRelation>
        <leftTable>dim_account</leftTable>
        <rightTable>dim_branch</rightTable>
        <joinColumnPair leftColumn="src_id" rightColumn="src_id"/>
        <joinColumnPair leftColumn="branch_key" rightColumn="branch_key"/>
    </tableRelation>
</tableRelations>
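Metadata in this shape can be read with nothing but the JDK's built-in DOM parser. Below is a hypothetical sketch of such a loader (class and method names are my own, not DBeaver's), which collects each table pair together with its join columns:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.*;

public class TableRelationsLoader {

    // Maps "leftTable|rightTable" -> list of "leftColumn=rightColumn" pairs.
    static Map<String, List<String>> load(String xml) throws Exception {
        Map<String, List<String>> relations = new LinkedHashMap<>();
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList rels = doc.getElementsByTagName("tableRelation");
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            String left = rel.getElementsByTagName("leftTable").item(0).getTextContent();
            String right = rel.getElementsByTagName("rightTable").item(0).getTextContent();
            List<String> pairs = new ArrayList<>();
            NodeList cols = rel.getElementsByTagName("joinColumnPair");
            for (int j = 0; j < cols.getLength(); j++) {
                Element pair = (Element) cols.item(j);
                pairs.add(pair.getAttribute("leftColumn") + "="
                        + pair.getAttribute("rightColumn"));
            }
            relations.put(left + "|" + right, pairs);
        }
        return relations;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tableRelations><tableRelation>"
                + "<leftTable>dim_account</leftTable>"
                + "<rightTable>dim_partner</rightTable>"
                + "<joinColumnPair leftColumn=\"partner_key\" rightColumn=\"partner_key\"/>"
                + "</tableRelation></tableRelations>";
        System.out.println(load(xml));
    }
}
```

Since these classes live in the JDK itself, no external dependency (and hence no OSGi manifest work) is needed for the parsing step.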

In the end I made changes to the SQLUtils and SQLCompletionAnalyzer classes. The idea is this: if the program fails to find suitable autocompletion suggestions with its basic logic, it looks for possible joins in the external XML file. The file stores pairs of tables together with the columns on which those tables should be joined. Filters on the technical validity dates of a record (eff_dttm and exp_dttm) and on the logical-deletion flag (deleted_ind) are added by default.
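The suggestion step itself can be illustrated like this. The sketch below is my own reconstruction of the idea, not the actual DBeaver code: given a known relation between two tables, it assembles the JOIN text the analyzer could offer, including the default filters on validity dates and the deletion flag mentioned above:

```java
import java.util.*;

public class JoinSuggester {

    // columnPairs holds {leftColumn, rightColumn} pairs from the metadata file.
    static String suggestJoin(String leftAlias, String rightTable, String rightAlias,
                              List<String[]> columnPairs) {
        StringJoiner on = new StringJoiner(" AND ");
        for (String[] pair : columnPairs) {
            on.add(leftAlias + "." + pair[0] + " = " + rightAlias + "." + pair[1]);
        }
        StringBuilder sb = new StringBuilder("JOIN " + rightTable + " " + rightAlias + " ON ");
        sb.append(on);
        // Default filters: record validity window and logical-deletion flag.
        sb.append(" AND current_date BETWEEN ").append(rightAlias).append(".eff_dttm AND ")
          .append(rightAlias).append(".exp_dttm");
        sb.append(" AND ").append(rightAlias).append(".deleted_ind = 0");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[]{"partner_key", "partner_key"},
                new String[]{"src_id", "src_id"});
        System.out.println(suggestJoin("a", "dim_partner", "p", pairs));
        // JOIN dim_partner p ON a.partner_key = p.partner_key AND a.src_id = p.src_id
        //   AND current_date BETWEEN p.eff_dttm AND p.exp_dttm AND p.deleted_ind = 0
    }
}
```

The exact syntax of the date filter (and whether deleted_ind is numeric or boolean) depends on the warehouse's conventions; treat those parts as placeholders.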

Once the code changes were in, a question arose: who would fill the file with metadata? The warehouse has many entities, and writing out all the relations myself would be too expensive. I decided to hand the task to my fellow analysts. I put the metadata file in SVN, from where it is checked out into the program's local directory. The principle is simple: a new entity appears in the warehouse? One analyst adds its possible joins to the file and commits the change; the rest update and enjoy working autocompletion — community, knowledge sharing and all that. I held a workshop for my colleagues on using the tool and wrote an article in Confluence — now the company has one more convenient instrument.

Working on this feature taught me not to be afraid of digging into open-source projects: as a rule they have a clear architecture, and even basic knowledge of the language is enough for experiments. And with a certain amount of persistence, you can even get rid of hated routine operations, freeing up time for new experiments.

Source: habr.com
