New programming language Mash

For several years I have been trying my hand at developing my own programming language. I wanted to create what I consider the simplest, most fully featured and most convenient language possible.

In this article I want to highlight the main stages of my work and, to begin with, describe the concept of the language and its first implementation, which I am currently working on.

Let me say in advance that I wrote the entire project in Free Pascal, because programs written in it can be built for a huge number of platforms, and the compiler itself produces well-optimized binaries (I build all the components of the project with the -O2 flag).

Language runtime

First of all, it’s worth talking about the virtual machine that I had to write to run future applications in my language. I decided to implement a stack architecture, probably because it was the easiest way. I couldn’t find a single decent article in Russian on how to do this, so after studying the English-language material I sat down to design and write my own reinvented wheel. Below I present my “advanced” ideas and developments in this area.

Stack implementation

Obviously, the stack is at the heart of the VM. In my implementation it works in blocks: essentially it is a plain array of pointers plus a variable holding the index of the top of the stack.
On initialization, an array of 256 elements is created. If more pointers are pushed onto the stack, it grows by another 256 elements. Correspondingly, as elements are removed from the stack, its size is reduced.
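The block-wise growth described above can be sketched roughly as follows. This is a Python model, not the Free Pascal code from the VM; the class name `BlockStack` and the exact shrink rule (release a block once a whole spare block is unused) are my assumptions, since the text only says that the size is adjusted.

```python
BLOCK_SIZE = 256  # growth step taken from the text above


class BlockStack:
    """Array-backed stack that grows and shrinks in fixed-size blocks."""

    def __init__(self):
        self.cells = [None] * BLOCK_SIZE  # preallocated slots
        self.top = 0                      # index of the next free slot

    def push(self, value):
        if self.top == len(self.cells):
            # Out of room: extend the array by one more block.
            self.cells.extend([None] * BLOCK_SIZE)
        self.cells[self.top] = value
        self.top += 1

    def pop(self):
        if self.top == 0:
            raise IndexError("pop from empty stack")
        self.top -= 1
        value = self.cells[self.top]
        self.cells[self.top] = None
        # Assumed policy: drop a block once a whole spare block is unused,
        # which avoids resizing on every push/pop at a block boundary.
        if len(self.cells) - self.top >= 2 * BLOCK_SIZE:
            del self.cells[-BLOCK_SIZE:]
        return value

    def capacity(self):
        return len(self.cells)
```

Pushing the 257th element makes the capacity jump from 256 to 512; popping everything brings it back to 256.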

The VM uses several stacks:

  1. Main stack.
  2. A stack for storing return points.
  3. Garbage collector stack.
  4. Try/catch/finally block handler stack.

Constants and Variables

This part is simple. Constants are handled in a separate small piece of code and are available to applications through static addresses. Variables are an array of pointers of a certain size whose cells are accessed by index, i.e. by static address. Variables can be pushed onto the top of the stack or read from it. Because variables essentially store pointers to values in VM memory, working with implicit pointers dominates the language.
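To make the "cells hold pointers" idea concrete, here is a small Python model. The names `Box`, `push_const`, `push_var` and `pop_var` are hypothetical and are not the VM's real instruction set; the assumption that loading a constant creates a fresh value is also mine.

```python
class Box:
    """A value stored in VM memory; variable cells hold references to boxes."""

    def __init__(self, value):
        self.value = value


CONSTANTS = (10, "hello")    # constant pool, addressed by static index
variables = [None] * 4       # variable cells, addressed by static index
stack = []


def push_const(addr):
    # Assumption: loading a constant creates a fresh box holding its value.
    stack.append(Box(CONSTANTS[addr]))


def push_var(addr):
    stack.append(variables[addr])    # pushes the reference, not a copy


def pop_var(addr):
    variables[addr] = stack.pop()    # the cell now refers to that box


push_const(0)
pop_var(0)                   # variable 0 -> box holding 10
push_var(0)
pop_var(1)                   # variable 1 -> the very same box
variables[1].value = 11      # write through the shared reference
print(variables[0].value)    # prints 11
```

Both cells end up referring to one value in memory, which is why a change made through one variable is visible through the other.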

Garbage collector

In my VM it is semi-automatic, i.e. the developer decides when to call the garbage collector. It does not rely on a conventional reference counter, as in Python or Perl. Instead it is implemented through a marker system: when a variable is about to be assigned a temporary value, a pointer to that value is added to the garbage collector's stack. Later, the collector quickly walks the already prepared list of pointers.
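A minimal sketch of such a marker scheme, assuming a toy heap keyed by integer addresses. The `Heap` class and its method names are illustrative and not part of Mash.

```python
class Heap:
    """Toy heap with a marker-style, manually triggered collector."""

    def __init__(self):
        self.live = {}        # address -> value
        self.next_addr = 0
        self.gc_stack = []    # addresses of values marked as temporary

    def alloc(self, value, temporary=False):
        addr = self.next_addr
        self.next_addr += 1
        self.live[addr] = value
        if temporary:
            # Marked at allocation time, so no heap scan is needed later.
            self.gc_stack.append(addr)
        return addr

    def collect(self):
        """Free everything on the collector stack; return how many were freed."""
        freed = 0
        while self.gc_stack:
            addr = self.gc_stack.pop()
            if addr in self.live:
                del self.live[addr]
                freed += 1
        return freed


heap = Heap()
keep = heap.alloc(10)                  # ordinary value, never marked
heap.alloc(30, temporary=True)         # temporaries go onto the GC stack
heap.alloc(40, temporary=True)
freed = heap.collect()                 # explicit call, as in the text
```

The collector only touches the addresses it was told about, so the cost of a collection is proportional to the number of marked temporaries rather than to the size of the heap.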

Handling try/catch/finally blocks

As in any modern language, exception handling is an important component. The VM core is wrapped in a try..catch block, so after catching an exception it can push some information about it onto the stack and resume executing code. In application code you can define try/catch/finally blocks by specifying the entry points for catch (the exception handler) and finally/end (the end of the block).
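The mechanism can be illustrated with a tiny interpreter loop. The opcodes below (`try`, `end_try`, `jump` and so on) are invented for this sketch and do not correspond to real Mash bytecode; the point is only the handler stack and the wrapped core.

```python
def run(program):
    """Run a list of (opcode, argument) pairs; return the printed lines."""
    stack, handlers, output = [], [], []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        try:  # the core is wrapped, so a failing instruction cannot kill the VM
            if op == "push":
                stack.append(arg)
            elif op == "div":
                b, a = stack.pop(), stack.pop()
                stack.append(a / b)
            elif op == "print":
                output.append(str(stack.pop()))
            elif op == "try":
                handlers.append(arg)   # arg = address of the catch block
            elif op == "end_try":
                handlers.pop()         # block left without an error
            elif op == "jump":
                pc = arg
        except Exception as exc:
            if not handlers:
                raise                  # no handler registered: propagate
            stack.append(type(exc).__name__)  # error info goes onto the stack
            pc = handlers.pop()        # resume at the catch entry point
    return output


program = [
    ("try", 6),           # 0: register the handler at address 6
    ("push", 10),         # 1
    ("push", 0),          # 2
    ("div", None),        # 3: raises ZeroDivisionError
    ("end_try", None),    # 4: skipped because of the error
    ("jump", 7),          # 5: skipped because of the error
    ("print", None),      # 6: catch block prints the error info
    ("push", "Finally"),  # 7: finally block
    ("print", None),      # 8
]
result = run(program)     # ["ZeroDivisionError", "Finally"]
```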

Multithreading

It is supported at the VM level and is simple and convenient to use. It works without an interrupt system, so code running in several threads should execute correspondingly several times faster.

External libraries for the VM

There is no way to do without this. The VM supports imports, similar to how they are implemented in other languages. You can write part of the code in Mash and part in languages that compile to native code, then link them together.

Translator from the high-level Mash language to VM bytecode

Intermediate language

To quickly write a translator from a complex language into VM code, I first developed an intermediate language. The result was an assembler-like horror that is not worth examining here. I will only say that at this level the translator processes most constants and variables and calculates their static addresses and the addresses of entry points.

Translator architecture

I didn't choose the best architecture for the implementation. The translator does not build a syntax tree, as other translators do. Instead it looks at the beginning of a construct. That is, if the piece of code being parsed looks like “while <condition>:”, it is obviously a while loop and is processed as one. Think of it as an elaborate switch-case.
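In spirit, this kind of dispatch on the start of a construct looks like the following sketch. The function `classify` and the labels it returns are hypothetical; the real translator is written in Free Pascal and handles far more cases.

```python
def classify(line):
    """Decide which construct a source line opens by looking at its first word."""
    text = line.strip()
    words = text.split()
    head = words[0] if words else ""
    opens_block = text.endswith(":")
    if head in ("while", "for", "until") and opens_block:
        return "loop:" + head
    if head == "if" and opens_block:
        return "condition"
    if head in ("proc", "func") and opens_block:
        return "method:" + head
    if text == "end":
        return "block_end"
    return "statement"


kinds = [classify(s) for s in (
    "while a < 10:",
    "func summ(a, b):",
    "  println(a)",
    "end",
)]
# kinds == ["loop:while", "method:func", "statement", "block_end"]
```

Adding a new construct means adding one more branch, which matches the ease of modification described here, while the lack of a tree is what limits later analysis.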

Because of this architectural decision, the translator is not very fast. However, it has become much easier to modify. I added the constructs I needed faster than my coffee could cool down. Full OOP support was implemented in less than a week.

Code optimization

Here, of course, things could have been done better (and will be, but later, once I get around to it). So far the optimizer can only strip unused code, constants and imports from the build. In addition, multiple constants with the same value are replaced by a single one. That's all.
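Merging equal constants can be sketched like this. It is an illustration under my own assumptions (a flat list of constants and an index remapping table), not the optimizer's actual code.

```python
def merge_constants(constants):
    """Return (unique_values, remap) where remap[old_index] = new_index."""
    unique = []
    index_of = {}   # (type, value) -> index in `unique`
    remap = []
    for value in constants:
        key = (type(value), value)   # keeps 1 and 1.0 as separate constants
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(value)
        remap.append(index_of[key])
    return unique, remap


unique, remap = merge_constants([10, "a", 10, 1.0, 1, "a"])
# unique == [10, "a", 1.0, 1]
# remap  == [0, 1, 0, 2, 3, 1]
```

Every reference to an old constant index is then rewritten through `remap`, so the bytecode keeps working while the constant section gets smaller.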

Mash language

Basic concept of the language

The main idea was to develop the most functional and simple language possible. I think the result does this job very well.

Code blocks, procedures and functions

All constructs in the language open with a colon (:) and close with the end operator.

Procedures and functions are declared as proc and func, respectively. The arguments are listed in parentheses. Everything is as in most other languages.

The return operator returns a value from a function; the break operator exits the procedure/function (if it is used outside of loops).

Example code:

...

func summ(a, b):
  return a + b
end

proc main():
  println(summ(inputln(), inputln()))
end

Supported constructs

  • Loops: for..end, while..end, until..end
  • Conditions: if..[else..]end, switch..[case..end..][else..]end
  • Methods: proc <name>():... end, func <name>():... end
  • Label & goto: <name>:, jump <name>
  • Enumerations (enum) and constant arrays.

Variables

The translator can detect them automatically, or the developer can write var before defining them.

Code examples:

a ?= 10
b ?= a + 20

var a = 10, b = a + 20

Global and local variables are supported.

OOP

Well, we’ve come to the tastiest topic. Mash supports all the object-oriented programming paradigms, i.e. classes, inheritance, polymorphism (including dynamic), and automatic dynamic reflection and full introspection.

Without further ado, it’s better to just give code examples.

A simple class and working with it:

uses <bf>
uses <crt>

class MyClass:
  var a, b
  proc Create, Free
  func Summ
end

proc MyClass::Create(a, b):
  $a = new(a)
  $b = new(b)
end

proc MyClass::Free():
  Free($a, $b)
  $rem()
end

func MyClass::Summ():
  return $a + $b
end

proc main():
  x ?= new MyClass(10, 20)
  println(x->Summ())
  x->Free()
end

Will output: 30.

Inheritance and polymorphism:

uses <bf>
uses <crt>

class MyClass:
  var a, b
  proc Create, Free
  func Summ
end

proc MyClass::Create(a, b):
  $a = new(a)
  $b = new(b)
end

proc MyClass::Free():
  Free($a, $b)
  $rem()
end

func MyClass::Summ():
  return $a + $b
end

class MyNewClass(MyClass):
  func Summ
end

func MyNewClass::Summ():
  return ($a + $b) * 2
end

proc main():
  x ?= new MyNewClass(10, 20)
  println(x->Summ())
  x->Free()
end

Will output: 60.

What about dynamic polymorphism? Yes, that is reflection:

uses <bf>
uses <crt>

class MyClass:
  var a, b
  proc Create, Free
  func Summ
end

proc MyClass::Create(a, b):
  $a = new(a)
  $b = new(b)
end

proc MyClass::Free():
  Free($a, $b)
  $rem()
end

func MyClass::Summ():
  return $a + $b
end

class MyNewClass(MyClass):
  func Summ
end

func MyNewClass::Summ():
  return ($a + $b) * 2
end

proc main():
  x ?= new MyClass(10, 20)
  x->Summ ?= MyNewClass::Summ
  println(x->Summ())
  x->Free()
end

Will output: 60.

Now let's take a moment to look at introspection for simple values and classes:

uses <bf>
uses <crt>

class MyClass:
  var a, b
end

proc main():
  x ?= new MyClass
  println(BoolToStr(x->type == MyClass))
  x->rem()
  println(BoolToStr(typeof(3.14) == typeReal))
end

Will output: true, true.

About assignment operators and explicit pointers

The ?= operator is used to assign a variable a pointer to a value in memory.
The = operator changes a value in memory using a pointer from a variable.
And now a little about explicit pointers. I added them to the language simply so that they are there.
@<variable> — take an explicit pointer to a variable.
?<variable> — get a variable by pointer.
@= — assign a value to a variable by an explicit pointer to it.

Example code:

uses <bf>
uses <crt>

proc main():
  var a = 10, b
  b ?= @a
  PrintLn(b)
  b ?= ?b
  PrintLn(b)
  b++
  PrintLn(a)
  InputLn()
end

Will output: some number, 10, 11.

Try..[catch..][finally..]end

Example code:

uses <bf>
uses <crt>

proc main():
  println("Start")
  try:
    println("Trying to do something...")
    a ?= 10 / 0
  catch:
    println(getError())
  finally:
    println("Finally")
  end
  println("End")
  inputln()
end

Plans for the future

I keep eyeing GraalVM and Truffle. My runtime has no JIT compiler, so in terms of performance it is currently competitive only with Python. I hope to implement JIT compilation on top of GraalVM or LLVM.

Repository

You can play with what has been built so far and follow the project yourself.

Site
Repository on GitHub

Thank you for reading to the end, if you did.

Source: habr.com
