HoC language and compiler v0.1

Hey guys,

A while ago I started working on a compiler, and the code is now ready for its first release:
https://github.com/dfrunza/hoc/archive/0.1.zip

HoC is a simple C clone and the main goal is to learn about the anatomy of a compiler.

v0.1 implements the minimum set of features needed to write basic yet functional programs.
This release implements:
  • built-in types - int, float, bool and char
  • arrays and pointer types
  • procedures
  • flow control statements - if, else, while
  • evaluation of expressions
  • arithmetic and boolean operators
  • explicit and implicit type conversions
  • a simple "standard" library, which includes routines for :
    • printing (to stdout) of strings, ints and floats
    • math functions like abs(), min(), pow()
    • sorting of arrays of integers using quicksort and insertion sort algorithms

A short tutorial
Variable declarations
var float f = 1.23;
var int i = 66;
var [20]char buf; /* array of 20 chars allocated on the stack */
var char* str = "Hello World!";
var char* heap_buf = new(char, 20); /* array of 20 chars allocated on the heap */

Procedures
proc int add(int a, int b)
{
  return a + b;
}

Importation of code from another file
include "str.hoc";

proc void hello_world()
{
  print_str("Hello World!");
}

A complete program
include "str.hoc";

/*
   For all integers from 1 to 'n' print:
     fizz if it's divisible by 3
     buzz if it's divisible by 5
     fizzbuzz if it's divisible by 15
     else print the integer itself
*/
proc void fizzbuzz(int n)
{
  if(n > 0)
  {
    var int i = 1;
    while(i <= n)
    {
      if((i % 15) == 0)
        print_str("fizzbuzz");
      else if((i % 3) == 0)
        print_str("fizz");
      else if((i % 5) == 0)
        print_str("buzz");
      else
        print_int(i);

      print_str("\n");
      i = i + 1;
    }
  }
}

proc int main()
{
  fizzbuzz(100);
  return 0;
}

Compile a HoC source file
cmd$ hocc.exe my_program.hoc

If the source file has no errors, the executable `my_program.exe` is created; otherwise the first error is reported.

Included in the release package is a `test.hoc` file which can be compiled and run.

Edited by Dumitru Frunza on
I tried to compile test.hoc with the source files in a different directory than the compiler and got an error. If I put the sources in the same directory it compiles.
> bin\hocc.exe hoc\test.hoc
w:\hoc\hocc.c(298) : could not read file `vm.exe`

I compiled a very small program and the compiler didn't warn about using uninitialized variables. Are there default values?
The float printing could be better: printing 1.0 instead of 0.1E1.

Good luck !

Edited by Simon Anciaux on Reason: Typo
HoC v0.1b

could not read file `vm.exe`
Fixed. The problem was that the compiler needs the `vm.exe` file, which it expected to find in the working directory. That file is an implementation of a stack machine, and the compiler generates code for it: the output executable `test.exe` is in fact a copy of `vm.exe` plus the code.
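
A rough sketch of that packaging scheme in C. The footer layout, the `HOCX` magic and the function names are assumptions made up for illustration, not what hocc actually does: copy the VM image, append the code, then append a small footer so that the VM can locate the code when it starts.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical footer written at the very end of the output file. */
typedef struct
{
  char magic[4];      /* "HOCX", invented for this sketch */
  uint32_t code_size; /* number of code bytes that precede the footer */
}
Footer;

/* Copies all bytes of the file `src` to the open stream `out`. Returns 0 on success. */
static int copy_file(FILE* out, const char* src)
{
  FILE* in = fopen(src, "rb");
  if(!in) return -1;
  char buf[4096];
  size_t n;
  while((n = fread(buf, 1, sizeof(buf), in)) > 0)
  {
    if(fwrite(buf, 1, n, out) != n) { fclose(in); return -1; }
  }
  fclose(in);
  return 0;
}

/* Output file = copy of the VM image + code + footer. Returns 0 on success. */
int package_program(const char* vm_path, const char* out_path,
                    const void* code, uint32_t code_size)
{
  FILE* out = fopen(out_path, "wb");
  if(!out) return -1;
  Footer footer;
  memcpy(footer.magic, "HOCX", 4);
  footer.code_size = code_size;
  int ok = copy_file(out, vm_path) == 0
        && fwrite(code, 1, code_size, out) == code_size
        && fwrite(&footer, sizeof(footer), 1, out) == 1;
  return (fclose(out) == 0 && ok) ? 0 : -1;
}

/* What the VM side could do at startup: read the footer, then the code
   stored right before it. Returns the number of code bytes, or -1. */
long load_code(const char* exe_path, void* dest, uint32_t dest_cap)
{
  FILE* f = fopen(exe_path, "rb");
  if(!f) return -1;
  Footer footer;
  long result = -1;
  if(fseek(f, -(long)sizeof(footer), SEEK_END) == 0
     && fread(&footer, sizeof(footer), 1, f) == 1
     && memcmp(footer.magic, "HOCX", 4) == 0
     && footer.code_size <= dest_cap
     && fseek(f, -(long)(sizeof(footer) + footer.code_size), SEEK_END) == 0
     && fread(dest, 1, footer.code_size, f) == footer.code_size)
  {
    result = (long)footer.code_size;
  }
  fclose(f);
  return result;
}
```

Putting the footer at the end means the VM image itself never has to be patched; it only needs to open its own file and read backwards from the end.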

The float printing could be better: printing 1.0 instead of 0.1E1
Done.

..the compiler didn't warn about using uninitialized variables. Are there default values ?
In this version the compiler does not do data flow analysis at all.

Thanks for the suggestions and for taking the time to look into it!
What kind of magic is this?
typedef struct AstBlock
{
  AstNode;

  List node_list;

  int block_id;
  int nesting_depth;
  AstBlock* encl_block;
  List local_decls;
  List nonlocal_occurs;

  List access_links;
  int links_size;
  int locals_size;
}
AstBlock;

...

AstBlock*
new_block(SourceLocation* src_loc)
{
  AstBlock* node = mem_push_struct(arena, AstBlock);
  node->kind = AstNodeKind_Block;
  ...

node->kind? AstBlock doesn't have a member named "kind". How does this compile at all?

And what is "AstNode;" in the "AstBlock" structure? Just a declaration without a member name, so does it take up space in memory? This is the first time I've seen this kind of syntax.

Edited by Mārtiņš Možeiko on
Welcome to C (not even the plus plus kind).

Anonymous structs; they're similar to anonymous unions (which have valid use cases).
C11 and some GNU extensions allow this.

I believe AstNode is a typedef-ed anonymous struct.

Edited by pragmatic_hero on
Wait a sec, I always thought "anonymous struct" meant this:
typedef struct Foo
{
   struct      // this is anon struct
   {
     int x, y;
   };
} Foo;


Are you saying that the code above is equivalent to the following code:
typedef struct v2
{
  int x, y;
} v2;

typedef struct Foo
{
   v2;
} Foo;

That's interesting, I didn't know that.

It seems this works fine with MSVC, but I cannot make this work with gcc: https://godbolt.org/g/XEuCPi

Edit: OK, adding "-fplan9-extensions" makes it work: https://godbolt.org/g/ARHy3M
It's documented here: https://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/Unnamed-Fields.html
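
For comparison, a small sketch of the form that standard C11 accepts without any extension flag: the embedded struct is defined inline and has no tag. The `z` field and the `foo_sum` helper are made up for the example.

```c
/* Standard C11: an untagged struct defined inline is an anonymous member,
   and its fields are accessed as if they belonged to the enclosing struct. */
typedef struct Foo
{
  struct
  {
    int x, y;
  };
  int z;
}
Foo;

/* Sums all three fields; x and y come from the anonymous member. */
int foo_sum(const Foo* foo)
{
  return foo->x + foo->y + foo->z;
}
```

Going by the GCC page linked above, it is only the `v2;` form, where the embedded type is referred to by a typedef name or a tag, that needs `-fms-extensions` or `-fplan9-extensions`.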

Edited by Mārtiņš Možeiko on
There is more magical code that compiles for some reason I don't understand...

21
typedef struct
{
  AstNode* ast;
}
AstNodeRef;

typedef struct
{
  AstNode;

  AstNodeRef type_expr;
}
AstPointer;

void DEBUG_print_ast_node(String* str, int indent_level, AstNode* node, char* tag);

...
{
    AstPointer* ptr = (AstPointer*)node;
    DEBUG_print_ast_node(str, indent_level, ptr->type_expr, "type_expr");
}


How the hell can type_expr be passed as an "AstNode*" argument??

Edited by Mārtiņš Možeiko on
This is completely bananas...
typedef struct a { int x; } a;
typedef struct a2 { int x; } a2[2];

struct b { a; };
struct b2 { a2; };

#include <stdio.h>
int main()
{
  printf("%zu\n", sizeof(struct b));
  printf("%zu\n", sizeof(struct b2));
}

This prints out 4 and 0. wtf...

Edited by Mārtiņš Možeiko on
DEBUG_print_ast_node(str, indent_level, ptr->type_expr, "type_expr");

This one doesn't compile :)
Martins, this code snippet is from the dev branch, which is in a non-compilable state.

That being said, the construct :
typedef struct AstBlock
{
  AstNode;

  List node_list;
  ...
}
AstBlock;

AstBlock*
new_block(SourceLocation* src_loc)
{
  AstBlock* node = mem_push_struct(arena, AstBlock);
  node->kind = AstNodeKind_Block;
  ...
}

is compilable with MSVC, and it's used on purpose: AstNode fields can be accessed from an AstBlock variable as if those fields were defined directly in the AstBlock structure.

Here's the familiar way that does the same thing :

typedef struct AstBlock
{
  AstNode node;

  List node_list;
  ...
}
AstBlock;

AstBlock*
new_block(SourceLocation* src_loc)
{
  AstBlock* node = mem_push_struct(arena, AstBlock);
  node->node.kind = AstNodeKind_Block;
  ...
}

But now the fields of AstNode are "scoped" by the `node` field of AstBlock.

Hope this clarifies the situation a bit...
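
For what it's worth, here is a small self-contained sketch of the second, portable layout. The trimmed-down `AstNode` and `AstBlock` definitions, the `node_kind_name` helper and the use of `calloc` instead of the arena are stand-ins invented for this example, not the real HoC code. Because `node` is the first member, a pointer to an `AstBlock` can be converted to a pointer to its `AstNode`, which is what lets generic code work on any kind of node.

```c
#include <stdlib.h>

typedef enum { AstNodeKind_Block, AstNodeKind_Pointer } AstNodeKind;

/* Common header shared by every node type. */
typedef struct AstNode
{
  AstNodeKind kind;
}
AstNode;

typedef struct AstBlock
{
  AstNode node;       /* must stay the first member for the cast to be valid */
  int nesting_depth;
}
AstBlock;

/* Generic code only sees the common header. */
const char* node_kind_name(const AstNode* node)
{
  switch(node->kind)
  {
    case AstNodeKind_Block:   return "block";
    case AstNodeKind_Pointer: return "pointer";
  }
  return "unknown";
}

/* Allocates a zeroed block node; the caller releases it with free(). */
AstBlock* new_block(int nesting_depth)
{
  AstBlock* block = calloc(1, sizeof(*block));
  if(block)
  {
    block->node.kind = AstNodeKind_Block;
    block->nesting_depth = nesting_depth;
  }
  return block;
}
```

A block can then be handed to generic code either as `node_kind_name(&block->node)` or as `node_kind_name((AstNode*)block)`; both refer to the same address.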

Edited by Dumitru Frunza on Reason: typo
It doesn't compile with MSVC for me.
'a2 [2]' : no members defined using this type

This does :
typedef struct a { int x; } a;
typedef struct a2 { int x; } a2[2];

struct b { a; };
struct b2 { struct a2; };

#include <stdio.h>
int main()
{
  printf("%d\n", (int)sizeof(struct b));
  printf("%d\n", (int)sizeof(struct b2));
}

And the output :
4
4
Oh, I didn't think that master would not compile. Ok, that clears things up a bit.

I was using gcc for that last code fragment. But what syntax do you use to access the "x" variable of the second array element in the b2 structure?

Edited by Mārtiņš Možeiko on
..with what syntax do you access "x" variable from b2 structure in second array element?
I don't know..
:) That's cool. I'm just sitting and thinking what it could be used for...

Anyway, here are a bunch of one-line programs that make hocc crash/assert:
proc void fibo(int,n) {}
proc void fibo() { if(true) ; {} }
proc void fibo(int n) { var int*f_n = n; }
proc void fibo() { if(true) { { } } }
proc void fibo() { { fibo(); } }
proc int gcd_(int a, int0b);
proc int*gcd_(int a) { return a; }
proc int gcd_(int a, int b) { &   a = b; return a; }
proc int gcd_(int a) { a : a; }
proc int gcd(int );
proc void swap(int* data) { data[0](= 0; }
HoC 0.1c

That's quite a bunch of bugs :)

The 3 lines below have the same root cause: a bug in the parsing of formal proc arguments.
proc void fibo(int,n) {}
proc int gcd_(int a, int0b);
proc int gcd(int );
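
To illustrate what a stricter check on formal arguments could look like, here is a minimal sketch in C. It is not the actual hocc parser: the function names are invented, it works directly on characters instead of tokens, and it only knows the four built-in type names from the v0.1 feature list. Each argument has to be a type name, optional `*`s, and then a name.

```c
#include <ctype.h>
#include <string.h>

/* The four built-in type names from the v0.1 feature list. */
static int is_type_name(const char* s, size_t len)
{
  static const char* names[] = { "int", "float", "bool", "char" };
  for(size_t i = 0; i < sizeof(names)/sizeof(names[0]); i++)
  {
    if(strlen(names[i]) == len && strncmp(names[i], s, len) == 0)
      return 1;
  }
  return 0;
}

static const char* skip_spaces(const char* s)
{
  while(isspace((unsigned char)*s)) s++;
  return s;
}

/* Reads an identifier at *s. Returns its length (0 if there is none)
   and advances *s past it. */
static size_t read_ident(const char** s)
{
  const char* start = *s;
  if(isalpha((unsigned char)**s) || **s == '_')
  {
    while(isalnum((unsigned char)**s) || **s == '_') (*s)++;
  }
  return (size_t)(*s - start);
}

/* Parses "(" [ arg { "," arg } ] ")" where arg = type { "*" } name.
   Returns the number of arguments, or -1 on a syntax error. */
int parse_formal_args(const char* s)
{
  int count = 0;
  s = skip_spaces(s);
  if(*s++ != '(') return -1;
  s = skip_spaces(s);
  if(*s == ')') return 0;
  for(;;)
  {
    const char* type = s;
    size_t len = read_ident(&s);
    if(!is_type_name(type, len)) return -1;  /* rejects `int0b` */
    s = skip_spaces(s);
    while(*s == '*') s = skip_spaces(s + 1);
    const char* name = s;
    len = read_ident(&s);
    if(len == 0 || is_type_name(name, len)) return -1;  /* rejects `int,n` and `int )` */
    count++;
    s = skip_spaces(s);
    if(*s == ')') return count;
    if(*s != ',') return -1;
    s = skip_spaces(s + 1);
  }
}
```

With this, `parse_formal_args("(int a, int b)")` returns 2, while the argument lists of the three crashing inputs, `(int,n)`, `(int a, int0b)` and `(int )`, all return -1.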

Here anonymous blocks were not handled but thankfully the feature was easy to add.
proc void fibo() { if(true) ; {} }
proc void fibo() { if(true) { { } } }
proc void fibo() { { fibo(); } }

Sketchy support for labels and gotos; disabled for now.
proc int gcd_(int a) { a : a; }

Bug in the parsing of procedure calls; enforcing the calling of procs by ID only.
proc void swap(int* data) { data[0](= 0; }

Two bugs related to type checking... Oh boy, type checking is giving me a hard time, the code is so messy. The plan for the next release is to straighten it up, priority nr. 1
Anyway, the bugs are fixed, sort of...
proc int*gcd_(int a) { return a; }
proc int gcd_(int a, int b) { &   a = b; return a; }

Thanks Martins!
Very nice!

Here are a few more I found:
proc void abs(int a) { a =--a; }
proc void fpdec(int* significand) { significand[ ] = 0; }
proc void print_str(char* str) { putc(*str)+str = str + 1; }