handmade.network » Forums » Work-in-Progress » HoC language and compiler v0.1
dfrunza
Dumitru Frunza
22 posts

Apprentice
github.com/dfrunza/hoc

#12990 HoC language and compiler v0.1
7 months, 3 weeks ago

:) fixed - HoC 0.1d

Programming is hard, let's go shopping!
dfrunza
Dumitru Frunza
22 posts

Apprentice
github.com/dfrunza/hoc

#14708 HoC language and compiler v0.1
3 weeks, 3 days ago

Hi there, fellow Handmade enthusiasts!

Today is the official closing day of HoC, a project I have been working on for 1 year. It's been a long year of hard work, sweat and blood, and time has come when I feel that mission has been accomplished.

The compiler in the current state generates x86 32-bit assembly code, which then gets assembled by the Microsoft Assembler (MASM) into the final executable. The language is blocked-structured with a largely C-like syntax and some Pascal sprinkling, and it supports 'char', 'int', 'float', string and multi-dimensional array types; it's got the usual control-flow statements 'if-else', 'while' and 'do-while'; procedures and recursive invocations; Windows API functions can be invoked as well.
One rather nice feature is that the procedures do not have to be forward-declared and that types are automatically inferred.
What the language lacks most prominently is 'struct' and 'union' types.

The entire project is 11,797 lines and it's the largest I've ever written. The counting of lines is done by a little program written in HoC called 'sloc' which is about 200 lines long.
Here's the GitHub link to the project: HoC source code

The goal was not to have a feature-complete product, but rather to understand as deeply as possible, given my mental capacities, of the anatomy of a compiler, how it works and what it takes to build one from scratch. As such, the program is not polished at all and is ridden with bugs.

The idea was to study and apply the techniques from the classic book Compilers: Principles, Techniques, and Tools (1st Edition). I would painstakingly read a chapter, sometimes several times over, try and understand as best as possible everything that was said, then go about implementing the ideas covered in it. That was the workflow pattern for all but two chapters: 'LR Parsers' and 'Code Optimizations' which I've skipped.

I think I can now count myself among the people who can legitimately claim they have read the "Dragon Book" (80% of it).

So I'd like to share my thoughts and do a short review of the book.

It's a difficult book for a beginner, and in hindsight, it was foolish to read it as an introduction to compilers; as a 2nd or 3rd book - sure - but definetly not as 1st.

The authors generally do a pretty good job at explaining the algorithms and how things work, in a coherent story, although curiously structured. The theory is medium-to-difficult, and there's a lot of practical ideas in-between and there are very good examples that illustrate the points. Occasionally they prove theorems, but those are short and involve simple concepts.

The best chapters, in my opinion, are the 'Type Checking', 'Run-Time Environments' and 'Intermediate Code Generation'. These are relatively easy to grasp and implement, and things make sense once you've made it that far. The type checking chapter introduces the technique of 'type unification' which is awesome because it's simple yet powerful.

The bad chapters are the first ones: 'Lexical Analysis', 'Syntax Analysis' and 'Syntax-Directed Translation'. I've mentioned the 'weird structuring' of the book, and it's because it starts with these specific chapters first. They are THICK on theory, and reading thru them makes you wonder: um, OK, but how does this apply to a compiler, again?!

Eventually, it downed to me later that the core technique they were trying to push is the 'syntax-directed translation', and for that to work, they had to introduce automatic 'lexical analyzers' and automatic 'parser generators' and had to stick this body of theory at the start of the book.

Fortunately, a trademark of the authors is that they give several alternative solutions to the same problem, so they do show manual techniques like recursive-descent, so it's possible to skip ahead, once at least one solution has been understood.

Another good aspect is that at the end of each chapter there's one or two pages of references to literature, and with commentaries, which helps in finding extra info on the subjects. At the end of the book there's pages and pages of more references to papers, although some of those are outdated.


And so it ends the HoC saga!
Thanks for reading!



Programming is hard, let's go shopping!