For several years I have been trying my hand at developing my own programming language. My goal was to create what I consider the simplest, most fully functional, and most convenient language possible.
In this article I want to walk through the main stages of that work, starting with the concept of the language and its first implementation, which I am still working on.
I should say up front that I wrote the entire project in Free Pascal, because programs written in it can be built for a huge number of platforms, and the compiler itself produces well-optimized binaries (I build all components of the project with the -O2 flag).
Language runtime
First of all, it is worth talking about the virtual machine that I had to write to run future applications in my language. I decided on a stack architecture, probably because it was the easiest way. I did not find a single decent article in Russian on how to do this, so after studying the English-language material I sat down to design and write my own reinvented wheel. Below I present my “advanced” ideas and developments in this area.
Stack implementation
Obviously, the stack is at the heart of the VM. In my implementation it works in blocks: essentially it is a plain array of pointers plus a variable that holds the index of the top of the stack.
When the stack is initialized, an array of 256 elements is allocated. If more pointers are pushed than fit, the array grows by another 256 elements. Likewise, as elements are removed from the stack, the array shrinks again.
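The block-wise growth described above can be sketched roughly as follows. This is an illustration in Python, not the VM's Free Pascal code: the name BlockStack and the exact shrink rule (release a block once a whole spare block is free) are assumptions, and only the block size of 256 comes from the text.

```python
BLOCK = 256  # block size taken from the text


class BlockStack:
    """Stack of references whose backing array grows and shrinks in blocks."""

    def __init__(self):
        self.cells = [None] * BLOCK  # preallocated storage
        self.top = 0                 # number of items currently on the stack

    def push(self, ref):
        if self.top == len(self.cells):
            # Out of room: extend the array by one more block.
            self.cells.extend([None] * BLOCK)
        self.cells[self.top] = ref
        self.top += 1

    def pop(self):
        if self.top == 0:
            raise IndexError("pop from empty stack")
        self.top -= 1
        ref = self.cells[self.top]
        self.cells[self.top] = None
        # Assumed policy: give a block back once a whole spare block is free.
        if len(self.cells) - self.top >= 2 * BLOCK:
            del self.cells[-BLOCK:]
        return ref

    def capacity(self):
        return len(self.cells)
```

Keeping one spare block before shrinking avoids reallocating on every push/pop pair that happens to sit on a block boundary.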
The VM uses several stacks:
- Main stack.
- A stack for storing return points.
- Garbage collector stack.
- Try/catch/finally block handler stack.
Constants and Variables
This part is simple. Constants are processed in a separate small piece of code and are available to applications via static addresses. Variables are a fixed-size array of pointers whose cells are accessed by index, i.e. by static address. Variables can be pushed onto the top of the stack or read from it. Because variables essentially store pointers to values in VM memory, working with implicit pointers dominates the language.
Garbage collector
In my VM it is semi-automatic, i.e. the developer decides when to call the garbage collector. It does not rely on the usual reference counting, as in Python, Perl, Ruby, Lua, etc. Instead it is implemented through a system of markers: when a variable is about to be assigned a temporary value, a pointer to that value is added to the garbage collector's stack. Later, the collector quickly runs through this already prepared list of pointers.
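A minimal sketch of the marker idea, in Python rather than the VM's Free Pascal. The names Value, MarkerGC, new_temp and collect are hypothetical, and the alive flag merely stands in for freeing memory.

```python
class Value:
    """A value in VM memory; `alive` stands in for allocated memory."""

    def __init__(self, payload):
        self.payload = payload
        self.alive = True


class MarkerGC:
    """Collector that only visits values that were marked as temporary."""

    def __init__(self):
        self.marked = []  # the collector's own stack of temporary values

    def new_temp(self, payload):
        value = Value(payload)
        self.marked.append(value)  # marked at the moment it is created
        return value

    def collect(self):
        """Called explicitly by the program; returns how many values were freed."""
        freed = 0
        while self.marked:
            value = self.marked.pop()
            value.alive = False  # stand-in for releasing the memory
            freed += 1
        return freed
```

The collector never scans all of memory: its cost is proportional to the number of marked temporaries, which is why a run over the prepared list can be fast.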
Handling try/catch/finally blocks
As in any modern language, exception handling is an important component. The VM core is wrapped in a try..catch block, so after catching an exception it can resume code execution, pushing some information about the exception onto the stack. In application code you can define try/catch/finally blocks by specifying the entry points for catch (the exception handler) and finally/end (the end of the block).
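One way to picture this mechanism is a tiny interpreter loop with a handler stack. The sketch below is in Python and uses an invented instruction set (try, endtry, push, div, print); none of these opcodes are taken from the actual VM.

```python
def run(program):
    """Run a list of (opcode, argument) pairs and return the printed lines."""
    out, stack, handlers = [], [], []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        try:  # the interpreter core itself is wrapped in try/except
            if op == "try":       # arg = (catch address, finally address)
                handlers.append(arg)
            elif op == "endtry":  # protected code finished without an error
                _, finally_pc = handlers.pop()
                pc = finally_pc
            elif op == "push":
                stack.append(arg)
            elif op == "div":
                b, a = stack.pop(), stack.pop()
                stack.append(a / b)
            elif op == "print":   # print the literal, or the stack top if None
                out.append(str(stack.pop()) if arg is None else arg)
        except Exception as exc:
            if not handlers:
                raise             # no handler installed: propagate
            catch_pc, _ = handlers.pop()
            stack.append(str(exc))  # information about the exception
            pc = catch_pc           # resume execution in the catch code
    return out


program = [
    ("try", (6, 7)),          # 0: catch code starts at 6, finally at 7
    ("push", 10),             # 1
    ("push", 0),              # 2
    ("div", None),            # 3: raises, control jumps to 6
    ("print", "not reached"), # 4
    ("endtry", None),         # 5
    ("print", None),          # 6: catch, prints the error text
    ("print", "Finally"),     # 7: finally
    ("print", "End"),         # 8
]
```

Here the catch code simply falls through into the finally code, and the error-free path jumps to it through endtry.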
Multithreading
It is supported at the VM level and is simple and convenient to use. It works without an interrupt system, so code spread across several threads should run correspondingly several times faster.
External libraries for the VM
There is no way to do without these. The VM supports imports, much as other languages do. You can write part of the code in Mash and part in native languages, then link them together.
Translator from the high-level Mash language to VM bytecode
Intermediate language
To write a translator from a complex language to VM code quickly, I first developed an intermediate language. The result was a dreadful assembler-like thing that is not worth examining here. I will only say that at this level the translator processes most constants and variables and calculates their static addresses and the addresses of entry points.
Translator architecture
I did not choose the best architecture for the implementation. The translator does not build a syntax tree, as other translators do. Instead it looks at the beginning of each construct: if the piece of code being parsed looks like “while <condition>:”, then it is obviously a while loop and is processed as one. It works like one big switch-case.
Because of this architectural decision, the translator turned out to be not very fast. However, it became much easier to modify. I added new constructs faster than my coffee could cool down, and full OOP support was implemented in less than a week.
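The "look at how the construct begins" approach can be illustrated with a small classifier. This is a Python sketch under assumptions: the function classify and the keyword set are invented for illustration, with the keywords taken from the constructs listed later in the article.

```python
# Keywords that open or continue a block, taken from the constructs in this article.
KEYWORDS = {
    "while", "for", "until", "if", "switch", "case", "else",
    "try", "catch", "finally", "proc", "func", "class",
}


def classify(line):
    """Decide what a source line is by looking only at how it starts."""
    text = line.strip()
    if text == "end":
        return "end"
    if text.endswith(":"):
        # First word without a trailing colon: "try:" -> "try".
        head = text.split(None, 1)[0].rstrip(":")
        return head if head in KEYWORDS else "label"
    return "statement"
```

For example, classify("while x < 10:") returns "while" and classify("println(x)") returns "statement". Supporting a new construct means adding one more keyword and one more branch in the code that consumes these results, which matches the ease of modification described above.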
Code optimization
This, of course, could have been done better (and will be, later, once I get around to it). So far the optimizer only knows how to strip unused code, constants, and imports from the build. It also replaces several constants with the same value by a single one. That is all.
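The two constant optimizations mentioned here (dropping unused constants and merging equal ones) might look roughly like this. It is a Python sketch with hypothetical names (optimize_constants, pool, address_of); how the real optimizer represents constants is not described in the article.

```python
def optimize_constants(constants, used_names):
    """Drop unused constants and merge constants that share a value.

    constants:  dict mapping constant name -> value
    used_names: set of constant names actually referenced by the code
    Returns (pool, address_of): the kept values and each name's index in pool.
    """
    pool, address_of, seen = [], {}, {}
    for name, value in constants.items():
        if name not in used_names:
            continue  # unused constant is cut from the build
        key = (type(value), value)  # keep 1 and 1.0 as different constants
        if key not in seen:
            seen[key] = len(pool)
            pool.append(value)
        address_of[name] = seen[key]  # duplicates share one address
    return pool, address_of
```

With constants {"a": 10, "b": "hi", "c": 10, "d": 3.14} and only a, b and c in use, the pool holds two values and a and c end up at the same address.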
Mash language
Basic concept of language
The main idea was to develop the most functional and simple language possible. I think the result handles that task admirably.
Code blocks, procedures and functions
Every construct in the language opens with a colon (:) and is closed by the end operator.
Procedures and functions are declared with proc and func, respectively. Arguments are listed in parentheses. Everything is as in most other languages.
The return operator returns a value from a function; the break operator exits a procedure/function (when used outside of loops).
Example code:
...
func summ(a, b):
    return a + b
end

proc main():
    println(summ(inputln(), inputln()))
end
Supported constructs
- Loops: for..end, while..end, until..end
- Conditions: if..[else..]end, switch..[case..end..][else..]end
- Methods: proc <name>():... end, func <name>():... end
- Label & goto: <name>:, jump <name>
- Enum enumerations and constant arrays.
Variables
Variables are either detected automatically by the translator or declared explicitly by writing var before their definition.
Code examples:
a ?= 10
b ?= a + 20
var a = 10, b = a + 20
Global and local variables are supported.
OOP
Well, we have come to the tastiest topic. Mash supports the full object-oriented programming toolkit, i.e. classes, inheritance, polymorphism (including dynamic), and full automatic dynamic reflection and introspection.
Without further ado, it is better to just show code examples.
A simple class and working with it:
uses <bf>
uses <crt>

class MyClass:
    var a, b
    proc Create, Free
    func Summ
end

proc MyClass::Create(a, b):
    $a = new(a)
    $b = new(b)
end

proc MyClass::Free():
    Free($a, $b)
    $rem()
end

func MyClass::Summ():
    return $a + $b
end

proc main():
    x ?= new MyClass(10, 20)
    println(x->Summ())
    x->Free()
end
Will output: 30.
Inheritance and polymorphism:
uses <bf>
uses <crt>

class MyClass:
    var a, b
    proc Create, Free
    func Summ
end

proc MyClass::Create(a, b):
    $a = new(a)
    $b = new(b)
end

proc MyClass::Free():
    Free($a, $b)
    $rem()
end

func MyClass::Summ():
    return $a + $b
end

class MyNewClass(MyClass):
    func Summ
end

func MyNewClass::Summ():
    return ($a + $b) * 2
end

proc main():
    x ?= new MyNewClass(10, 20)
    println(x->Summ())
    x->Free()
end
Will output: 60.
What about dynamic polymorphism? Yes, that is reflection:
uses <bf>
uses <crt>

class MyClass:
    var a, b
    proc Create, Free
    func Summ
end

proc MyClass::Create(a, b):
    $a = new(a)
    $b = new(b)
end

proc MyClass::Free():
    Free($a, $b)
    $rem()
end

func MyClass::Summ():
    return $a + $b
end

class MyNewClass(MyClass):
    func Summ
end

func MyNewClass::Summ():
    return ($a + $b) * 2
end

proc main():
    x ?= new MyClass(10, 20)
    x->Summ ?= MyNewClass::Summ
    println(x->Summ())
    x->Free()
end
Will output: 60.
Now let's take a moment to look at introspection for simple values and classes:
uses <bf>
uses <crt>

class MyClass:
    var a, b
end

proc main():
    x ?= new MyClass
    println(BoolToStr(x->type == MyClass))
    x->rem()
    println(BoolToStr(typeof(3.14) == typeReal))
end
Will output: true, true.
About assignment operators and explicit pointers
The ?= operator is used to assign a variable a pointer to a value in memory.
The = operator changes a value in memory using a pointer from a variable.
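The difference between the two operators can be modeled with a small Python analogy. This is an assumption-laden illustration, not Mash semantics in full: Box is a hypothetical stand-in for a value in VM memory, and each Python line is only a rough counterpart of the Mash statement in its comment.

```python
class Box:
    """A value living in VM memory; variables hold references to boxes."""

    def __init__(self, value):
        self.value = value


a = Box(10)   # roughly: var a = 10
b = a         # roughly: b ?= a  (b now points at the same value as a)
b.value = 11  # roughly: b = 11  (the value in memory changes in place)

# Both names see the change, because they share one value in memory.
shared = (a.value, b.value)
```

In this model ?= rebinds a name to another box, while = writes through the name into the box it already refers to.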
And now a little about explicit pointers. I added them to the language simply so that it has them.
@<variable> — take an explicit pointer to a variable.
?<variable> — get a variable by pointer.
@= — assign a value to a variable by an explicit pointer to it.
Example code:
uses <bf>
uses <crt>

proc main():
    var a = 10, b
    b ?= @a
    PrintLn(b)
    b ?= ?b
    PrintLn(b)
    b++
    PrintLn(a)
    InputLn()
end
Will output: some number, 10, 11.
Try..[catch..][finally..]end
Example code:
uses <bf>
uses <crt>

proc main():
    println("Start")
    try:
        println("Trying to do something...")
        a ?= 10 / 0
    catch:
        println(getError())
    finally:
        println("Finally")
    end
    println("End")
    inputln()
end
Plans for the future
I keep looking closely at GraalVM & Truffle. My runtime has no JIT compiler, so in terms of performance it is currently competitive only with Python. I hope to be able to implement JIT compilation based on GraalVM or LLVM.
Repository
You can experiment with the project and follow its development yourself.
Thank you for reading to the end, if you did.
Source: habr.com