Dumitru Frunza
Apprentice github.com/dfrunza/hoc
HoC language and compiler v0.1
Hey guys,

A while ago I started working on a compiler, and now the code is ready for its first release:
https://github.com/dfrunza/hoc/archive/0.1.zip

HoC is a simple C clone and the main goal is to learn about the anatomy of a compiler.

v0.1 implements the minimum set of features needed to write basic yet functional programs.
Implemented in this release:
  • built-in types - int, float, bool and char
  • arrays and pointer types
  • procedures
  • flow control statements - if, else, while
  • evaluation of expressions
  • arithmetic and boolean operators
  • explicit and implicit type conversions
  • a simple "standard" library, which includes routines for:
    • printing (to stdout) of strings, ints and floats
    • math functions like abs(), min(), pow()
    • sorting of arrays of integers using quicksort and insertion sort algorithms

A short tutorial
Variable declarations
var float f = 1.23;
var int i = 66;
var [20]char buf; /* array of 20 chars allocated on the stack */
var char* str = "Hello World!";
var char* heap_str = new(char, 20); /* array of 20 chars allocated on the heap */
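Since HoC is described as a simple C clone, the declarations above map roughly onto the C below. This is a sketch for comparison, not code from the HoC sources: it assumes `new(char, 20)` behaves like `malloc(20)`, and the helper name `make_greeting_copy` is made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rough C equivalents of the HoC declarations above.
   Treating new(char, 20) as malloc(20) is an assumption about HoC. */
char* make_greeting_copy(void)
{
  float f = 1.23f;                  /* var float f = 1.23; */
  int i = 66;                       /* var int i = 66; */
  char buf[20];                     /* var [20]char buf;  (stack) */
  const char* str = "Hello World!"; /* var char* str = "Hello World!"; */
  char* heap_str = malloc(20);      /* new(char, 20)  (heap, assumed) */

  (void)f; /* f and i only mirror the declarations */
  (void)i;
  if (heap_str == NULL)
    return NULL;

  strcpy(buf, str);      /* 12 chars plus the terminator fit in 20 */
  strcpy(heap_str, buf); /* buf is gone once this function returns... */
  return heap_str;       /* ...but heap_str survives; the caller must free() it */
}
```

The stack array disappears when the function returns, which is why only the heap copy can be handed back to the caller.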

Procedures
proc int add(int a, int b)
{
  return a + b;
}

Importation of code from another file
include "str.hoc";

proc void hello_world()
{
  print_str("Hello World!");
}

A complete program
include "str.hoc";

/*
   For all integers from 1 to 'n' print:
     fizz if it's divisible by 3
     buzz if it's divisible by 5
     fizzbuzz if it's divisible by 15
     else print the integer itself
*/
proc void fizzbuzz(int n)
{
  if(n > 0)
  {
    var int i = 1;
    while(i <= n)
    {
      if((i % 15) == 0)
        print_str("fizzbuzz");
      else if((i % 3) == 0)
        print_str("fizz");
      else if((i % 5) == 0)
        print_str("buzz");
      else
        print_int(i);

      print_str("\n");
      i = i + 1;
    }
  }
}

proc int main()
{
  fizzbuzz(100);
  return 0;
}
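For comparison, here is a C version of the same program. It follows the specification in the comment (fizz for multiples of 3, buzz for multiples of 5) and is only an illustration, not part of the HoC release; `fizzbuzz_text` is a made-up helper that keeps the decision logic separate from the printing.

```c
#include <stdio.h>

/* Returns the text for integer i; plain numbers are formatted into num_buf. */
const char* fizzbuzz_text(int i, char* num_buf, size_t num_buf_size)
{
  if (i % 15 == 0)
    return "fizzbuzz";
  if (i % 3 == 0)
    return "fizz";
  if (i % 5 == 0)
    return "buzz";
  snprintf(num_buf, num_buf_size, "%d", i);
  return num_buf;
}

/* Prints one line per integer in 1..n. */
void fizzbuzz(int n)
{
  char num_buf[16]; /* enough for any int in decimal */
  for (int i = 1; i <= n; i++)
    puts(fizzbuzz_text(i, num_buf, sizeof num_buf)); /* puts appends '\n' */
}
```

Calling `fizzbuzz(100)` from `main` mirrors the HoC program.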

Compile a HoC source file
cmd$ hocc.exe my_program.hoc

If the source file has no errors, the output executable `my_program.exe` is created; otherwise, the first error is reported.

Included in the release package is a `test.hoc` file which can be compiled and run.
Simon Anciaux
I tried to compile test.hoc with the source files in a different directory from the compiler and got an error. If I put the sources in the same directory, it compiles.
> bin\hocc.exe hoc\test.hoc
w:\hoc\hocc.c(298) : could not read file `vm.exe`

I compiled a very small program and the compiler didn't warn about using uninitialized variables. Are there default values?
The float printing could be better: printing 1.0 instead of 0.1E1.

Good luck!
Dumitru Frunza
HoC v0.1b

could not read file `vm.exe`
Fixed. The problem was that the compiler needs the `vm.exe` file, which it expected to find in the working directory. That file is an implementation of a stack machine, and the compiler generates code for it: the output executable `test.exe` is in fact a copy of `vm.exe` plus the generated code.
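To make the idea concrete, here is a toy stack machine in C. HoC's actual VM instruction set isn't described in the thread, so the opcodes (`OP_PUSH`, `OP_ADD`, ...) and `run_vm` are hypothetical; the sketch only shows the general shape of such an interpreter: a loop that fetches an instruction and manipulates an operand stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; not HoC's real instruction set. */
typedef enum { OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_HALT } Opcode;

typedef struct
{
  Opcode op;
  int arg; /* operand, used only by OP_PUSH */
} Instr;

#define STACK_MAX 64

/* Executes instructions until OP_HALT and returns the value on top of the stack. */
int run_vm(const Instr* code)
{
  int stack[STACK_MAX];
  int sp = 0; /* number of values currently on the stack */
  for (size_t pc = 0;; pc++)
  {
    const Instr* in = &code[pc];
    switch (in->op)
    {
      case OP_PUSH:
        assert(sp < STACK_MAX);
        stack[sp++] = in->arg;
        break;
      case OP_ADD:
      case OP_SUB:
      case OP_MUL:
      {
        assert(sp >= 2);
        int b = stack[--sp]; /* right operand was pushed last */
        int a = stack[--sp];
        stack[sp++] = (in->op == OP_ADD) ? a + b
                    : (in->op == OP_SUB) ? a - b
                    : a * b;
        break;
      }
      case OP_HALT:
        assert(sp >= 1);
        return stack[sp - 1];
    }
  }
}
```

For example, the program `PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT` evaluates (2 + 3) * 4 and returns 20. A compiler targeting such a machine turns expressions into exactly this kind of postfix instruction sequence.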

The float printing could be better: printing 1.0 instead of 0.1E1
Done.

..the compiler didn't warn about using uninitialized variables. Are there default values?
In this version the compiler does not do data flow analysis at all.
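For illustration only (HoC v0.1 does none of this), the simplest form of such an analysis walks the statements in order, tracks which variables are definitely assigned, and flags any read of a variable outside that set. The `Stmt` encoding below is hypothetical: variables are numbered 0 to 31 so a set of them fits in one `uint32_t`, and only straight-line code is handled. At the join after an if/else, the two branches' sets would be intersected, as `join_assigned` shows.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical straight-line IR: each statement assigns variable `dst`
   (-1 if it assigns nothing) and reads the variables whose bits are set
   in `reads`. Variables are numbered 0..31. */
typedef struct
{
  int dst;
  uint32_t reads;
} Stmt;

/* Returns the index of the first statement that reads a variable that is
   not yet definitely assigned, or -1 if every read follows a write. */
int first_uninit_read(const Stmt* stmts, size_t count, uint32_t initially_assigned)
{
  uint32_t assigned = initially_assigned; /* e.g. the proc's parameters */
  for (size_t i = 0; i < count; i++)
  {
    if (stmts[i].reads & ~assigned)
      return (int)i;
    if (stmts[i].dst >= 0)
      assigned |= (uint32_t)1 << stmts[i].dst;
  }
  return -1;
}

/* After an if/else, a variable is definitely assigned only if both
   branches assigned it, so the two sets are intersected. */
uint32_t join_assigned(uint32_t then_assigned, uint32_t else_assigned)
{
  return then_assigned & else_assigned;
}
```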

Thanks for suggestions and for taking the time to look into it!
Mārtiņš Možeiko
What kind of magic is this?
typedef struct AstBlock
{
  AstNode;

  List node_list;

  int block_id;
  int nesting_depth;
  AstBlock* encl_block;
  List local_decls;
  List nonlocal_occurs;

  List access_links;
  int links_size;
  int locals_size;
}
AstBlock;

...

AstBlock*
new_block(SourceLocation* src_loc)
{
  AstBlock* node = mem_push_struct(arena, AstBlock);
  node->kind = AstNodeKind_Block;
  ...

node->kind? AstBlock doesn't have a member named "kind". How does this compile at all?

And what is "AstNode;" in the "AstBlock" structure? Just a declaration without a member name, so it uses space in memory? This is the first time I've seen this kind of syntax used.
pragmatic_hero
Welcome to C (not even the plus plus kind).

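Anonymous structs: they're similar to anonymous unions (which have valid use cases).
C11 allows anonymous structs declared inline; the typedef-name form used here needs compiler extensions (MSVC, or GCC with extension flags).

I believe AstNode is a typedef-ed anonymous struct.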
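For reference, this is the form that ISO C11 itself standardizes: an anonymous struct declared inline, with neither a tag nor a typedef name, whose members are accessed as if they belonged to the enclosing struct. The names `Block`, `kind`, `line` and `block_kind` are made up for illustration.

```c
/* Standard C11 anonymous struct (compile with -std=c11 or later). */
typedef struct
{
  struct /* no tag and no member name */
  {
    int kind;
    int line;
  };
  int block_id;
} Block;

int block_kind(const Block* b)
{
  return b->kind; /* accessed directly, no intermediate member name */
}
```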
Mārtiņš Možeiko
Wait a sec, what I've always called an anonymous struct is this:
typedef struct Foo
{
   struct      // this is anon struct
   {
     int x, y;
   };
} Foo;


Are you saying that the code above is equivalent to the following code:
typedef struct v2
{
  int x, y;
} v2;

typedef struct Foo
{
   v2;
} Foo;

That's interesting, I didn't know that.

It seems this works fine with MSVC, but I cannot make this work with gcc: https://godbolt.org/g/XEuCPi

Edit: OK, adding "-fplan9-extensions" makes it work: https://godbolt.org/g/ARHy3M
It's documented here: https://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/Unnamed-Fields.html
Mārtiņš Možeiko
Here's more magical code that compiles for reasons I don't understand...

typedef struct
{
  AstNode* ast;
}
AstNodeRef;

typedef struct
{
  AstNode;

  AstNodeRef type_expr;
}
AstPointer;

void DEBUG_print_ast_node(String* str, int indent_level, AstNode* node, char* tag);

...
{
    AstPointer* ptr = (AstPointer*)node;
    DEBUG_print_ast_node(str, indent_level, ptr->type_expr, "type_expr");
}


How the hell can type_expr be passed to an "AstNode*" argument??
Mārtiņš Možeiko
This is completely bananas...
typedef struct a { int x; } a;
typedef struct a2 { int x; } a2[2];

struct b { a; };
struct b2 { a2; };

#include <stdio.h>
int main()
{
  printf("%zu\n", sizeof(struct b));
  printf("%zu\n", sizeof(struct b2));
}

This prints out 4 and 0. wtf...
Dumitru Frunza
DEBUG_print_ast_node(str, indent_level, ptr->type_expr, "type_expr");

This one doesn't compile :)
Martins, this code snippet is from the dev branch, which is in a non-compilable state.

That being said, this construct:
typedef struct AstBlock
{
  AstNode;

  List node_list;
  ...
}
AstBlock;

AstBlock*
new_block(SourceLocation* src_loc)
{
  AstBlock* node = mem_push_struct(arena, AstBlock);
  node->kind = AstNodeKind_Block;
  ...
}

is compilable with MSVC, and it's used on purpose: AstNode fields can be accessed through an AstBlock variable as if those fields were defined directly in the AstBlock structure.

Here's the more familiar way to do the same thing:

typedef struct AstBlock
{
  AstNode node;

  List node_list;
  ...
}
AstBlock;

AstBlock*
new_block(SourceLocation* src_loc)
{
  AstBlock* node = mem_push_struct(arena, AstBlock);
  node->node.kind = AstNodeKind_Block;
  ...
}

But now the fields of AstNode are "scoped" by the `node` field of AstBlock.

Hope this clarifies the situation a bit...
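A portable middle ground between the two versions keeps the named `node` member but relies on a standard C guarantee: a pointer to a struct, suitably converted, points to its first member, and vice versa (C11 6.7.2.1p15). Generic code can then take an `AstNode*`, and a node whose `kind` says it's a block can be converted back to `AstBlock*`. The minimal `AstNode` and the helpers `node_kind`, `as_block` and `init_block` below are made up for illustration; HoC's real structs have more fields.

```c
#include <stddef.h>

typedef enum { AstNodeKind_Block, AstNodeKind_Pointer } AstNodeKind;

/* Hypothetical minimal AstNode. */
typedef struct
{
  AstNodeKind kind;
} AstNode;

typedef struct
{
  AstNode node; /* must be the FIRST member for the conversions below */
  int block_id;
} AstBlock;

/* Generic code works on AstNode* only. */
AstNodeKind node_kind(const AstNode* node)
{
  return node->kind;
}

/* Downcast: valid because an AstBlock starts at the address of its first member. */
AstBlock* as_block(AstNode* node)
{
  return node->kind == AstNodeKind_Block ? (AstBlock*)node : NULL;
}

void init_block(AstBlock* block, int block_id)
{
  block->node.kind = AstNodeKind_Block;
  block->block_id = block_id;
}
```

Compared with the extension-based version, the price is writing `block->node.kind` instead of `block->kind`; in exchange, the code compiles on any conforming C compiler without extra flags.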
Dumitru Frunza
It doesn't compile with MSVC for me.
'a2 [2]' : no members defined using this type

This does:
typedef struct a { int x; } a;
typedef struct a2 { int x; } a2[2];

struct b { a; };
struct b2 { struct a2; };

#include <stdio.h>
int main()
{
  printf("%d\n", (int)sizeof(struct b));
  printf("%d\n", (int)sizeof(struct b2));
}

And the output:
4
4
Mārtiņš Možeiko
Oh, I didn't realize that master would not compile. OK, that clears things up a bit.

I was using gcc for that last code fragment. But what syntax would you use to access the "x" member of the second array element in the b2 structure?
Dumitru Frunza
..with what syntax do you access "x" variable from b2 structure in second array element?
I don't know..
Mārtiņš Možeiko
:) That's cool. I'm just sitting here wondering what it could be used for...

Anyway, here is a bunch of one-line programs that make hocc crash or assert:
proc void fibo(int,n) {}
proc void fibo() { if(true) ; {} }
proc void fibo(int n) { var int*f_n = n; }
proc void fibo() { if(true) { { } } }
proc void fibo() { { fibo(); } }
proc int gcd_(int a, int0b);
proc int*gcd_(int a) { return a; }
proc int gcd_(int a, int b) { &   a = b; return a; }
proc int gcd_(int a) { a : a; }
proc int gcd(int );
proc void swap(int* data) { data[0](= 0; }
Dumitru Frunza
HoC 0.1c

That's quite a bunch of bugs :)

The three lines below have the same root cause: a bug in the parsing of formal proc arguments.
proc void fibo(int,n) {}
proc int gcd_(int a, int0b);
proc int gcd(int );
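A formal argument list is a natural fit for a small recursive-descent parser. The grammar below is a hypothetical simplification, not HoC's actual grammar: any identifier is accepted as a type name, `*` marks pointer levels, and every parameter must be a type followed by a name. That last rule is what rejects all three repros.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical grammar for illustration:
     args  := '(' [ param { ',' param } ] ')'
     param := IDENT { '*' } IDENT        (type, pointer levels, name) */
typedef struct
{
  const char* p; /* current position in the source text */
} Cursor;

static void skip_spaces(Cursor* c)
{
  while (isspace((unsigned char)*c->p))
    c->p++;
}

static bool accept_char(Cursor* c, char ch)
{
  skip_spaces(c);
  if (*c->p != ch)
    return false;
  c->p++;
  return true;
}

static bool accept_ident(Cursor* c)
{
  skip_spaces(c);
  if (!isalpha((unsigned char)*c->p) && *c->p != '_')
    return false;
  while (isalnum((unsigned char)*c->p) || *c->p == '_')
    c->p++;
  return true;
}

static bool parse_param(Cursor* c)
{
  if (!accept_ident(c)) /* type name */
    return false;
  while (accept_char(c, '*'))
    ; /* consume any number of pointer levels */
  return accept_ident(c); /* the parameter name is mandatory */
}

bool parse_formal_args(const char* text)
{
  Cursor c = { text };
  if (!accept_char(&c, '('))
    return false;
  if (accept_char(&c, ')')) /* empty list: "()" */
    return true;
  do
  {
    if (!parse_param(&c))
      return false;
  } while (accept_char(&c, ','));
  return accept_char(&c, ')');
}
```

Note that `int0b` lexes as a single identifier, so in `(int a, int0b)` the second parameter has a type but no name and is rejected.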

Here, anonymous blocks were not handled, but thankfully the feature was easy to add.
proc void fibo() { if(true) ; {} }
proc void fibo() { if(true) { { } } }
proc void fibo() { { fibo(); } }
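Supporting anonymous blocks mostly means letting a block appear wherever a statement can. The toy recursive-descent checker below uses a hypothetical grammar (not HoC's parser) that accepts only empty statements and nested blocks; it returns the deepest nesting level it saw, or -1 on a syntax error.

```c
#include <ctype.h>

/* Hypothetical toy grammar:
     block     := '{' { statement } '}'
     statement := ';' | block
   Because a statement may itself be a block, "{ { } }" nests. */
static const char* skip_ws(const char* p)
{
  while (isspace((unsigned char)*p))
    p++;
  return p;
}

/* Parses one block starting at *pp; returns its max depth or -1. */
static int parse_block(const char** pp, int depth)
{
  const char* p = skip_ws(*pp);
  if (*p != '{')
    return -1;
  p++;
  int max_depth = depth;
  for (;;)
  {
    p = skip_ws(p);
    if (*p == '}') /* end of this block */
    {
      *pp = p + 1;
      return max_depth;
    }
    if (*p == ';') /* empty statement */
    {
      p++;
      continue;
    }
    if (*p == '{') /* anonymous nested block used as a statement */
    {
      int d = parse_block(&p, depth + 1);
      if (d < 0)
        return -1;
      if (d > max_depth)
        max_depth = d;
      continue;
    }
    return -1; /* anything else, including end of input, is an error here */
  }
}

int block_depth(const char* text)
{
  int d = parse_block(&text, 1);
  if (d < 0)
    return -1;
  return *skip_ws(text) == '\0' ? d : -1; /* reject trailing junk */
}
```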

Sketchy support for labels and gotos; disabled for now.
proc int gcd_(int a) { a : a; }

Bug in the parsing of procedure calls; procs can now only be called through their identifier.
1
proc void swap(int* data) { data[0](= 0; }

Two bugs related to type checking... Oh boy, type checking is giving me a hard time; the code is so messy. Straightening it up is the plan for the next release, priority no. 1.
Anyway, the bugs are fixed, sort of...
proc int*gcd_(int a) { return a; }
proc int gcd_(int a, int b) { &   a = b; return a; }
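Both repros come down to checks a type checker typically makes. The sketch below is not HoC's code and uses made-up `ExprKind` and `Type` definitions: `&a = b` should be rejected because `&a` is not an lvalue (it doesn't designate a storage location), and `return a;` in a proc returning `int*` should be rejected because the expression has type `int`. HoC supports implicit conversions; for simplicity this sketch requires an exact type match instead.

```c
#include <stdbool.h>

/* Hypothetical expression kinds and types for illustration. */
typedef enum { Expr_Var, Expr_IntLit, Expr_AddrOf, Expr_Deref, Expr_Index } ExprKind;

typedef struct
{
  int base;          /* hypothetical base-type id, e.g. 0 = int */
  int pointer_depth; /* 0 = int, 1 = int*, 2 = int**, ... */
} Type;

/* Only expressions that designate a storage location may be assigned to. */
bool is_lvalue(ExprKind kind)
{
  return kind == Expr_Var || kind == Expr_Deref || kind == Expr_Index;
}

bool types_equal(Type a, Type b)
{
  return a.base == b.base && a.pointer_depth == b.pointer_depth;
}

/* `return e;` is accepted only if e's type matches the declared return type
   (no implicit conversions in this sketch). */
bool check_return(Type ret, Type expr)
{
  return types_equal(ret, expr);
}
```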

Thanks Martins!
Mārtiņš Možeiko
Very nice!

Here are a few more I found:
proc void abs(int a) { a =--a; }
proc void fpdec(int* significand) { significand[ ] = 0; }
proc void print_str(char* str) { putc(*str)+str = str + 1; }