What is the brain of a student learning about the computer world capable of?

Good day.

Having finished yet another Bash script, I realized that everything should have been done completely differently, and yet it all worked. I want to show you the indecencies and crutches I wrote to solve the problem while not yet having a wagonload of knowledge. In other words, a caricature of programming.

Task


I needed something that would:

  • Display the set of rhymes for a word, excluding squares
  • Intersect the rhyme sets of two words

What for? Just because, that's all.
For those who don't know, a square rhyme (colloquially, a square) is a pair of words whose last two letters are spelled the same, which is often the only thing that makes them rhyme. For example, in Russian: розы / морозы (roses / frosts); шина / машина (tire / car). Squares are not particularly approved of in modern versification because they are so primitive.

Solution


It seemed to me that the easiest solution was to write a Bash script that uses an existing rhyme generator, HOST, which selects rhymes primarily by sound rather than by spelling. Why HOST? Because if I give the real name of the site, people will say it is advertising. Why not just keep using the site itself? Firstly, despite its advantage of selecting rhymes by sound, it still often returns squares. Secondly, you still have to use your own brain, spend time switching between tabs, and spend energy memorizing repeated words in the lists to find a rhyme for two words.

Getting strong rhymes

What do I know? I know about the wget utility, which downloads the page at a given URL. So we make the request and get the HTML page in a file named after the word we want a rhyme for. For example, let's search for the word "здесь" ("here"):

wget https://HOST/rifma/здесь

But I only need a list of words, so how do I get rid of everything else? We look and see that the list of words is marked up, strange as it may seem, as a list, and the words are inside <li> tags. Well, we have the wonderful sed utility, so let's write:

grep '<li>' "$word" | sed -e "s%<li>%%" -e "s%</li>%%" -e "s/ //g" -e "/^$/d" > "$word.tmp" && mv "$word.tmp" "$word"

First, from the file named after the word, we select the lines that contain the <li> tag: this gives a bunch of empty tags and lines with words. We remove the tag itself and its closing tag; percent signs are used as the delimiter instead of slashes because the closing tag already contains a slash, which confuses sed a bit. Percent signs work fine. Then we remove all spaces and delete empty lines. Voila, a ready-made list of words.
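The same filter can be sketched as a small function that reads the page from standard input, which avoids reading and writing the same file in one pipeline. The name extract_words is made up for this illustration, and the sketch assumes each word sits on its own line inside <li>...</li>, as described above.

```bash
# Hypothetical helper: read an HTML page on stdin, print one word per line.
# Assumes every word is on a single line of the form <li>word</li>.
extract_words() {
    grep '<li>' |
        sed -e 's%<li>%%g' -e 's%</li>%%g' -e 's/ //g' -e '/^$/d'
}

# Example with inline sample markup instead of a downloaded page:
printf '<ul>\n<li>foo</li>\n<li> bar </li>\n<li></li>\n</ul>\n' | extract_words
```

With a real page it would probably be used as `wget -qO- "https://HOST/rifma/$word" | extract_words > "$word"`.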

To remove the words that rhyme only by their last letters, we take the last two letters of the original word and clean up the list:

squad=${word: -2}
sed -e "/.$squad\$/d" "$word" > "$word.tmp" && mv "$word.tmp" "$word"
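As a sketch of the same idea on sample data, here is a filter that takes the original word as an argument and the candidate list on standard input. The name drop_squares is hypothetical, the example uses Latin words only for readability, and for Cyrillic input it assumes a UTF-8 locale so that "two letters" really means two characters.

```bash
# Hypothetical helper: drop_squares WORD < list
# Prints the lines that do NOT end with the last two letters of WORD.
drop_squares() {
    local suffix line
    # Last two characters of the word (the "square" ending).
    suffix=$(printf '%s\n' "$1" | sed 's/.*\(..\)$/\1/')
    while IFS= read -r line; do
        case $line in
            *"$suffix") ;;                 # square rhyme: skip it
            *) printf '%s\n' "$line" ;;    # anything else: keep it
        esac
    done
}

# "station" and "motion" share the ending "on" with "nation" and are dropped.
printf 'station\nmotion\nocean\n' | drop_squares nation
```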

We look, we try, everything works... wait, where is the list for the word "play"? And for the word "go"? The file is empty! That is because these words are verbs, and we know what happens to those who rhyme verbs with verbs. Verbal rhyme is even worse than square rhyme, because verbs are the most numerous words in Russian and they all share the same endings, which is why none of them survived the ending check.

But let's not rush. For each word there are not only rhymes but also assonances, which sometimes sound much better than a rhyme; that is why they are called assonances (French assonance, from Latin assono, "I sound in tune").

Getting assonances

This is where the fun begins: assonances do not live at a separate URL. They appear on the same page when a script runs, sends an HTTP request and receives a response. How do you tell wget to press a button? You can't. Sad.

Noticing that the URL in the address bar still changes somehow, I copied what appeared there after switching to assonances and pasted it into a new browser tab: the strong rhymes opened. Not what I wanted.

In fact, I thought, it should make no difference to the server whether a script sends the request or a person types it by hand. Right? Who knows, let's check.

Where to send? What to send? An HTTP request to the server's IP, with something like GET ..., then something like HTTP/1.1 ... We need to see what the browser sends and where. Install Wireshark and look at the traffic:

0040 37 5d a3 84 27 e7 fb 13 6d 93 ed cd 56 04 9d 82 7]£.'çû.m.íÍV...
0050 32 7c fb 67 46 71 dd 36 4d 42 3d f3 62 1b e0 ad 2|ûgFqÝ6MB=ób.à.
0060 ef 87 be 05 6a f9 e1 01 41 fc 25 5b c0 77 d3 94 ï.¾.jùá.Aü%[ÀwÓ.

Um… what? Oh yes, we have HTTPS. What to do? Arrange a MITM attack on myself? Ideal: the victim herself will help us.

In the end, having thought to dig into the browser itself, I did find both the request and its destination. Let's go:

Dialog with terminal

telnet IP PORT
Trying IP...
Connected to IP.
Escape character is '^]'.
GET /rifma/%D0%BC%D0%B0%D1%82%D1%8C?mode=block&type=asn HTTP/1.1
Host: HOST
Accept-Language: en-US,en;q=0.5
X-Requested-With: XMLHttpRequest
Connection: close

HTTP/1.1 400 Bad Request
Server: nginx/1.8.0
Date: Sun, 03 Nov 2019 20:06:59 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 270
Connection: close

<html>
<head><title>400 The plain HTTP request was sent to HTTPS port</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<center>The plain HTTP request was sent to HTTPS port</center>
<hr><center>nginx/1.8.0</center>
</body>
</html>
Connection closed by foreign host.

Heh. What else did I expect, sending a bare HTTP request to an HTTPS port? Do I have to encrypt it myself now? All that fuss with RSA keys, and then with SHA256. But there is OpenSSL for such things. Well, we already know what to do; just first remove the Referer and Cookie fields, which I don't think will matter much:

Dialog with terminal

openssl s_client -connect IP:PORT
{Assorted keys and certificates}
GET /rifma/%D0%B7%D0%B4%D0%B5%D1%81%D1%8C?mode=block&type=asn HTTP/1.1
Host: HOST
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0
Accept: text/javascript,text/html,application/xml,text/xml,*/*
Accept-Language: ru-RU,ru;q=0.8,en-US;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate, br
X-Requested-With: XMLHttpRequest
Connection: keep-alive

HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
Status: 200 OK
Date: Sun, 03 Nov 2019 20:34:33 GMT
Set-Cookie: COOKIE
X-Powered-By: Phusion Passenger 5.0.16
Server: nginx/1.8.0 + Phusion Passenger 5.0.16
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: no-cache
Strict-Transport-Security: max-age=31536000
Content-Security-Policy: block-all-mixed-content
Content-Encoding: gzip

What is this, is the server swearing at me? Well, at least it answered with 200 OK, so the cookies and the referrer don't affect anything. The body is gzip-compressed, and copying it out only yields stray ASCII characters. Right, I can just remove the Accept-Encoding line. Everything is fine: we get an HTML document, now with assonances. But two questions remain: how do I launch OpenSSL and feed it data from a script? And how do I read the output if, after receiving the response, we stay inside the OpenSSL "shell", so to speak? I could think of something for the second, but for the first...

How good it is that there is Habr, where I read about the expect utility, which automates interaction with programs that expect human input. Even more attractive is the autoexpect command, which generates an expect script from your actions. Well, we launch it, do everything by hand, and here is the finished script. Only it is huge, because OpenSSL prints certificates and keys, and expect waits for all of that output. Do we need it? No. We tear out the entire first prompt, leaving only the last line break, '\r'. We also remove the User-Agent and Accept fields from our request; they don't affect anything. So, let's run it. The script ran, but where is the coveted HTML document? expect ate it. To make it spit the document out, you need to put:

set results $expect_out(buffer)

before the end of the script: this way the output of the command run by expect is saved and displayed. The result is something like this:

expect script

#!/usr/bin/expect -f

set timeout -1
spawn openssl s_client -connect IP:PORT
match_max 100000
expect -exact "
---\r
"
send -- "GET /rifma/%d0%b7%d0%b4%d0%b5%d1%81%d1%8c?mode=block&type=asn HTTP/1.1\rHost: HOST\rAccept-Language: en-US,en;q=0.5\rX-Requested-With: XMLHttpRequest\rConnection: close"
expect -exact "GET /rifma/%d0%b7%d0%b4%d0%b5%d1%81%d1%8c?mode=block&type=asn HTTP/1.1\r
Host: HOST\r
Accept-Language: en-US,en;q=0.5\r
X-Requested-With: XMLHttpRequest\r
Connection: close"
send -- "\r"
set results $expect_out(buffer)
expect -exact "\r
"
send -- "\r"
expect eof
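For comparison, the same request can probably be sent without expect by piping it into openssl s_client. The sketch below is not what the script in this article does: build_request is a made-up helper name, and the usage line relies on the -quiet option of s_client, which suppresses the certificate output and keeps the connection open until the server closes it. I have not tried it against the site.

```bash
# Hypothetical helper: build_request PATH HOST
# Prints a minimal HTTP/1.1 request with CRLF line endings and a final
# empty line, using the same header fields as the expect script.
build_request() {
    printf 'GET %s HTTP/1.1\r\n' "$1"
    printf 'Host: %s\r\n' "$2"
    printf 'X-Requested-With: XMLHttpRequest\r\n'
    printf 'Connection: close\r\n\r\n'
}

# Usage sketch (HOST and PORT are the same placeholders as in the article,
# $encoded is the percent-encoded word):
# build_request "/rifma/$encoded?mode=block&type=asn" HOST |
#     openssl s_client -quiet -connect HOST:PORT 2>/dev/null
```

The response would still start with the HTTP headers, but the <li> filter from earlier ignores every line without a list tag, so that should not get in the way.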

But that's not all! As you can see, in all the examples the request URL was static, yet it is the URL that determines which word the assonances are shown for. So it turns out that we would forever be searching for the word "%d0%b7%d0%b4%d0%b5%d1%81%d1%8c" in ASCII, or "здесь" ("here") in UTF-8. What to do? Of course, just generate a new script every time, friends! Only not with autoexpect anymore, but by printing it out with echo, because nothing in the new script changes except the word. And long live a new problem: how do we sensibly convert a word from Cyrillic into URL format? There seems to be nothing ready-made for the terminal. No matter, can we do it ourselves? We can:

See what I can!

function furl {
    # Percent-encode every Cyrillic letter of $word (UTF-8 bytes) into $furl
    furl=$(echo "$word" | sed 's:А:%d0%90:g;s:Б:%d0%91:g;s:В:%d0%92:g;s:Г:%d0%93:g;s:Д:%d0%94:g;s:Е:%d0%95:g;s:Ж:%d0%96:g;s:З:%d0%97:g;s:И:%d0%98:g;s:Й:%d0%99:g;s:К:%d0%9a:g;s:Л:%d0%9b:g;s:М:%d0%9c:g;s:Н:%d0%9d:g;s:О:%d0%9e:g;s:П:%d0%9f:g;s:Р:%d0%a0:g;s:С:%d0%a1:g;s:Т:%d0%a2:g;s:У:%d0%a3:g;s:Ф:%d0%a4:g;s:Х:%d0%a5:g;s:Ц:%d0%a6:g;s:Ч:%d0%a7:g;s:Ш:%d0%a8:g;s:Щ:%d0%a9:g;s:Ъ:%d0%aa:g;s:Ы:%d0%ab:g;s:Ь:%d0%ac:g;s:Э:%d0%ad:g;s:Ю:%d0%ae:g;s:Я:%d0%af:g;s:а:%d0%b0:g;s:б:%d0%b1:g;s:в:%d0%b2:g;s:г:%d0%b3:g;s:д:%d0%b4:g;s:е:%d0%b5:g;s:ж:%d0%b6:g;s:з:%d0%b7:g;s:и:%d0%b8:g;s:й:%d0%b9:g;s:к:%d0%ba:g;s:л:%d0%bb:g;s:м:%d0%bc:g;s:н:%d0%bd:g;s:о:%d0%be:g;s:п:%d0%bf:g;s:р:%d1%80:g;s:с:%d1%81:g;s:т:%d1%82:g;s:у:%d1%83:g;s:ф:%d1%84:g;s:х:%d1%85:g;s:ц:%d1%86:g;s:ч:%d1%87:g;s:ш:%d1%88:g;s:щ:%d1%89:g;s:ъ:%d1%8a:g;s:ы:%d1%8b:g;s:ь:%d1%8c:g;s:э:%d1%8d:g;s:ю:%d1%8e:g;s:я:%d1%8f:g;s:ё:%d1%91:g;s:Ё:%d0%81:g')
}
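A shorter way to get the same result, sketched here as an alternative rather than what the script above does, is to dump the bytes of the word in hex with od and put a percent sign in front of each byte. The function name urlencode is made up for this illustration. Note that this sketch encodes every byte, including plain ASCII letters, which is still a valid percent-encoding.

```bash
# Hypothetical helper: urlencode STRING
# Percent-encodes every byte of STRING (lowercase hex, like the table above).
urlencode() {
    local hex
    # od prints the bytes as two-digit hex values; tr glues them together.
    hex=$(printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n')
    # Put a '%' in front of every pair of hex digits.
    printf '%s\n' "$hex" | sed 's/../%&/g'
}

# In a UTF-8 terminal this should give the string used in the requests above:
urlencode здесь
```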

In total, we have a script that converts a word into ASCII text and generates another script, which requests the page with assonances from the server via OpenSSL. We then redirect the output of that second script to a file and, in the old fashion, pass it through the "filters" that strip the extras and the squares, and append the result to the file.

Intersection of sets. Outcome

Actually, this is the part that causes the least trouble. We run the procedures above for two words, then compare every word in one list with every word in the other, and print each match. Now we have a script that takes two words as input and prints a list of words that rhyme with both of them, even taking assonances into account, and all this without manually switching between four tabs and memorizing words "by eye": everything is collected, accounted for and discarded automatically. Wonderful.
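The every-word-with-every-word comparison can also be left to grep. This is a minimal sketch, assuming the two lists are already in files with one word per line; the name intersect is made up for the example.

```bash
# Hypothetical helper: intersect FILE1 FILE2
# Prints every word that appears in both files, once, in sorted order.
#   -F  treat the patterns as fixed strings, not regular expressions
#   -x  match whole lines only
#   -f  read the patterns (the words of FILE1) from a file
intersect() {
    grep -F -x -f "$1" "$2" | sort -u
}

# Usage sketch: intersect first_word_list second_word_list
```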

The purpose of this publication was to show that if a person needs something, they will do it anyway. Very inefficiently, crookedly, horribly, but it will work.

Source: habr.com
