2025-06-04

1. ulang

Last summer, I wanted to build a bigger project in Java to have a nice portfolio piece under my belt. I had just finished my TOML parser, which was a lovely experience of a straight week of coding. Back then, I had no idea how programming language compilers were made; the problem space was just mystical to me, and I really don't like that in software. Generally, the only black boxes I like are the ones you use to cook salmon. So I made my own interpreted language in Java, called ulang.

The syntax is somewhat similar to Odin/Go:

println(sum_f32([ 1, 2, 3, 4, 5 ]))
func sum_f32(array: ...) f32 {
    result: f32
    for i: num = 0; i < len(array); i = i + 1 {
        result = result + array[i]
    }
    return result
}

It lacks fancier features like type inference, foreach, the += operator, or compound types (structs/classes)... I mean, I guess you could use arrays for types, since they are heterogeneous here...

balls = [ [ 15, 200, 200, 1, -1, "red" ] ]

# struct Ball {
ball_radii = 0 # technically, a circle has an infinite amount of radii, 
ball_pos_x = 1 # | it's just that they're all the same
ball_pos_y = 2
ball_vel_x = 3
ball_vel_y = 4
ball_color = 5
# }

It does, however, have a foreign function interface, many of the normal math operators, type aliasing, arrays, and $insert statements (that work similarly to C).

2. Type System Fun

The language also has static typing, which, at first, was incredibly simple, with only number, bool, string, char and a list (dynamic array) type. I also had structs and hash maps planned but never got around to implementing them. Unfortunately, all of that simplicity evaporated once I started interacting with other Java code (bindings/foreign functions). So now I have varargs as a type, and any. And also typedef + foreign to declare foreign types... So, by now, I just have this beautiful block of:

# Java primitives
typedef u8  = foreign "byte" 
typedef i8  = foreign "char"
typedef i16 = foreign "short"
typedef i32 = foreign "int"
typedef i64 = foreign "long"
typedef f32 = foreign "float"
typedef f64 = foreign "double"
typedef u8_arr  = foreign "[B"  # array of bytes
typedef i8_arr  = foreign "[C"  # array of oh it's alphabetical
typedef b8_arr  = foreign "[Z"  # array of nevermind
typedef i16_arr = foreign "[S"  # array of shorts!
typedef i32_arr = foreign "[I"  # array of integers!
typedef i64_arr = foreign "[J"  # array of jiants?
typedef f32_arr = foreign "[F"  # array of floats!
typedef f64_arr = foreign "[D"  # array of doubles!

# typedef f32_arr_arr_arr = foreign "[[[F"  # 3D array

Java's reflection was actually a massive obstacle for my interpreter. Firstly, primitive types cannot be used as generic type arguments (that's why you write List<Float> and Map<String, Integer>). And when you try to pass a primitive as, for example, an Object, it simply "decays" (is autoboxed) to its wrapper type. So, in the later stages of the project, type validation sometimes just kind of failed...

Take this function, for example:

public static Object try_cast_some(Object any, Class needed) {
    if(needed.equals(byte.class))    return ((Number) any).byteValue();
    if(needed.equals(short.class))   return ((Number) any).shortValue();
    if(needed.equals(int.class))     return ((Number) any).intValue();
    if(needed.equals(long.class))    return ((Number) any).longValue();
    if(needed.equals(float.class))   return ((Number) any).floatValue();
    if(needed.equals(double.class))  return ((Number) any).doubleValue();
    if(needed.equals(Byte.class))    return Byte.   valueOf(((Number) any).byteValue());
    if(needed.equals(Short.class))   return Short.  valueOf(((Number) any).shortValue());
    if(needed.equals(Integer.class)) return Integer.valueOf(((Number) any).intValue());
    ...

I wrote this function because I thought it would convert Integer to int. And it does, except that, because the return type is Object, the int immediately decays right back to Integer... It does, however, cast between different number types, which is nice when all my literals are Doubles.
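To make the decay visible, here is a minimal sketch. The class BoxingDemo and the method twice are made up for illustration and are not part of ulang; only the standard java.lang.reflect API is used.

```java
import java.lang.reflect.Method;

class BoxingDemo {
    // Returning a primitive through an Object return type autoboxes it.
    static Object toInt(Object any) {
        return ((Number) any).intValue(); // the int becomes an Integer right here
    }

    // A hypothetical "foreign function" that takes a primitive int.
    public static int twice(int x) {
        return x * 2;
    }

    public static void main(String[] args) throws Exception {
        Object converted = toInt(3.0); // the literal starts out as a Double
        System.out.println(converted.getClass() == Integer.class);     // true
        System.out.println(int.class.equals(Integer.class));           // false
        System.out.println(int.class.isAssignableFrom(Integer.class)); // false

        // Reflection looks the method up by the primitive class...
        Method m = BoxingDemo.class.getMethod("twice", int.class);
        // ...but invoke() takes Objects: it unwraps the Integer argument
        // and boxes the int result again.
        Object result = m.invoke(null, converted);
        System.out.println(result + " " + result.getClass().getSimpleName()); // 6 Integer
    }
}
```

So a value never stays primitive while it travels as an Object, yet int.class and Integer.class remain two distinct, mutually non-assignable Class objects, which is where naive type checks trip.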

And even once I realized that this decaying was happening, I still had to deal with primitives and wrappers being mixed in certain places, so I have this beautiful little piece of code...

private boolean is_of_type(Class a, Class b, String b_typename) {
    return a.isAssignableFrom(b) || b.isAssignableFrom(a)
        || b_typename.equals(SyntaxDefinitions.TYPE_ANY)
        || (is_number(a) && is_number(b))
        || a.getSimpleName().equalsIgnoreCase(b.getSimpleName());
}
private boolean is_number(Class c) {
    return Number.class.isAssignableFrom(c) || c.isPrimitive();
}

as well as this (TypeValidation):

private boolean types_match(String a, String b) {
    if(a.equals(SyntaxDefinitions.TYPE_ANY) || b.equals(SyntaxDefinitions.TYPE_ANY)) return true;
    if(a.startsWith("[] ") || b.startsWith("[] "))
        if(a.endsWith("any") || b.endsWith("any")) return true;
    if(a.equals(SyntaxDefinitions.TYPE_NUMBER) || a.equals(SyntaxDefinitions.TYPE_BOOLEAN) || a.equals(SyntaxDefinitions.TYPE_CHAR))
        if(SyntaxDefinitions.types.get(b).isPrimitive()) return true;
    if(b.equals(SyntaxDefinitions.TYPE_NUMBER) || b.equals(SyntaxDefinitions.TYPE_BOOLEAN) || b.equals(SyntaxDefinitions.TYPE_CHAR))
        if(SyntaxDefinitions.types.get(a).isPrimitive()) return true;
    return a.equals(b);
}

Nowadays, I would avoid this problem by not using Java for an interpreter, but back then it took me a whole lot of time to grok what was happening and fix it.
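For what it's worth, one way to tame the primitive/wrapper mixing while staying in Java is to normalize every Class to its wrapper before comparing. This is only a sketch of that idea, not what ulang does; TypeNormalizer, wrap and matches are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class TypeNormalizer {
    // Primitive classes mapped to their wrapper classes.
    private static final Map<Class<?>, Class<?>> WRAPPERS = new HashMap<>();
    static {
        WRAPPERS.put(boolean.class, Boolean.class);
        WRAPPERS.put(byte.class, Byte.class);
        WRAPPERS.put(char.class, Character.class);
        WRAPPERS.put(short.class, Short.class);
        WRAPPERS.put(int.class, Integer.class);
        WRAPPERS.put(long.class, Long.class);
        WRAPPERS.put(float.class, Float.class);
        WRAPPERS.put(double.class, Double.class);
    }

    // Returns the wrapper for a primitive class, or the class itself otherwise.
    static Class<?> wrap(Class<?> c) {
        return WRAPPERS.getOrDefault(c, c);
    }

    // True if a value of class `actual` fits where `expected` is declared,
    // treating each primitive and its wrapper as the same type.
    static boolean matches(Class<?> expected, Class<?> actual) {
        return wrap(expected).isAssignableFrom(wrap(actual));
    }

    public static void main(String[] args) {
        System.out.println(matches(int.class, Integer.class));  // true
        System.out.println(matches(Number.class, float.class)); // true
        System.out.println(matches(int.class, Double.class));   // false
    }
}
```

Note that this check is stricter than is_of_type above, which treats any two number types as compatible; a numeric conversion step would still be needed for Double literals.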

2.1. Operator Precedence

Operator precedence (or "order of operations") is relatively easy to overlook in math. We learn early that multiplication goes before addition and that raising something to a power goes before multiplication. And because I learned it so early, I never actually questioned why this happens.

Now then, why does operator precedence exist? Umm... I dunno ¯\_(ツ)_/¯... I guess someone really wanted to write 3 + 4 * x instead of 4 * x + 3... And as for what gets precedence over what? Well... there is the nice and logical chain of: parentheses → ... → exponentiation → multiplication → addition → ..., but, I guess, the log in log x * 4 has higher precedence than the integral in ∫xdx? And what about set operations? What about logical operations? What about bitwise operators in C!? Oh, so window_options & RESIZABLE != 0 is window_options & (RESIZABLE != 0). Okay...

So, I either use the superior (reverse) Polish notation that no one knows, or I structure my parser around this archaic, mostly illogical, but widely accepted set of rules... I chose the second option.

One reason for my choice was that, a little while ago, I had watched Jonathan Blow talk about his parser, and he was teaching how to solve this exact problem. Now I know that the algorithm is darn similar to a much earlier one by Vaughan Pratt, but I don't really care who discovered it, and I would not be surprised if there was a rediscovery here. Anyways, this algorithm is actually really quite beautiful. You parse binary operations with increasing precedence recursively and those with decreasing (or equal) precedence iteratively, since parsing recursively naturally results in a right-leaning tree and parsing iteratively in a left-leaning one... I do not work with trees much, but that is still the only place where I have seen that sort of strategy.
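Here is a minimal sketch of that recursive/iterative split. It is not ulang's actual parser: PrecedenceParser, its precedence table and the whitespace-only tokenizer are assumptions made for the example, and it only handles left-associative +, -, * and / without parentheses. Instead of a tree, it builds a fully parenthesized string so the shape is easy to see.

```java
import java.util.ArrayList;
import java.util.List;

class PrecedenceParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    PrecedenceParser(String source) {
        // Toy tokenizer: tokens must be separated by whitespace.
        for (String t : source.trim().split("\\s+")) tokens.add(t);
    }

    // Hypothetical precedence table; 0 means "not a binary operator".
    private static int precedence(String op) {
        switch (op) {
            case "+": case "-": return 1;
            case "*": case "/": return 2;
            default: return 0;
        }
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    // Iterative part: operators of equal or lower precedence are folded
    // into the left operand in a loop, which yields a left-leaning tree.
    String parseExpression(int minPrec) {
        String left = tokens.get(pos++); // leaf: a number or an identifier
        while (true) {
            int before = pos;
            left = parseIncreasingPrecedence(left, minPrec);
            if (pos == before) return left; // nothing consumed, so we are done
        }
    }

    // Recursive part: an operator that binds tighter than minPrec takes
    // everything that binds even tighter as its right operand.
    private String parseIncreasingPrecedence(String left, int minPrec) {
        String op = peek();
        int prec = precedence(op);
        if (prec <= minPrec) return left;
        pos++; // consume the operator
        String right = parseExpression(prec);
        return "(" + left + " " + op + " " + right + ")";
    }

    static String parse(String source) {
        return new PrecedenceParser(source).parseExpression(0);
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3")); // (1 + (2 * 3))
        System.out.println(parse("1 - 2 - 3")); // ((1 - 2) - 3)
        System.out.println(parse("2 * 3 + 4")); // ((2 * 3) + 4)
    }
}
```

The `prec <= minPrec` comparison is what makes equal precedence go to the loop; a right-associative operator such as exponentiation would need that check relaxed for its own precedence level.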

2.2. Other Deceptively Simple Problems

When I was first starting on the interpreter, it was very hard for me to wrap my head around how to evaluate functions. But the answer is "just do it." A function is made up of its declaration (arguments, return type, and so on...) and its body. The body in my interpreter is just an array of statements, basically an array of lines of code, plus a scope. The hard part for me was actually making sure that a function would not need anything else. I eventually had to just go with it and hope that I would not need to rewrite much.
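As a rough illustration of "declaration plus body plus scope", here is a small sketch. None of these names come from ulang: FunctionSketch, Statement and Function are hypothetical, and the convention that a statement returns null unless it is a return statement is an assumption made to keep the example short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FunctionSketch {
    // A statement runs against a scope; a non-null result means "return this".
    interface Statement {
        Object execute(Map<String, Object> scope);
    }

    // Declaration (name, parameter names) plus body (a list of statements).
    static class Function {
        final String name;
        final List<String> parameters;
        final List<Statement> body;

        Function(String name, List<String> parameters, List<Statement> body) {
            this.name = name;
            this.parameters = parameters;
            this.body = body;
        }

        Object call(List<?> arguments) {
            Map<String, Object> scope = new HashMap<>(); // fresh scope per call
            for (int i = 0; i < parameters.size(); i++) {
                scope.put(parameters.get(i), arguments.get(i));
            }
            for (Statement statement : body) {
                Object returned = statement.execute(scope);
                if (returned != null) return returned; // a return statement fired
            }
            return null; // fell off the end of the body
        }
    }

    // Builds and calls the equivalent of:
    //   func add(a, b) { result = a + b; return result }
    static Object demo() {
        Statement assign = scope -> {
            double a = (Double) scope.get("a");
            double b = (Double) scope.get("b");
            scope.put("result", a + b);
            return null;
        };
        Statement ret = scope -> scope.get("result");
        Function add = new Function("add", Arrays.asList("a", "b"), Arrays.asList(assign, ret));
        return add.call(Arrays.asList(1.0, 2.0));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3.0
    }
}
```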

Something that I did make more difficult for myself than it needed to be was types vs values. I tried to detect types in my lexical analyzer, so they are, weirdly, not hoisted: I need to declare them beforehand, and the parser doesn't really care about them. Basically, the lexer is simply not the place to differentiate between what is a type and what is a value. Just leave them as an "identifier." In my opinion, you should actually determine what is a type and what is not in a stage after the parser. I have seen this called the "semantic analysis" stage in the Zig compiler. And that is what I did in my later language. But this is all language dependent.

Otherwise, for variables, I just store a stack of scopes (Stack<...> scopes). For actual binary operators (like adding 2 numbers, and so on...) I simply cast the 2 values to Doubles. If statements are just if statements in Java: ... if((Boolean) eval(statement.condition)) interpret(statement.body); .... Variable arguments just slightly worsen the possible compiler error specificity. There are various smaller problems that were hard for me the first time around, but for the sake of brevity, I won't mention them.
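A stack of scopes could look roughly like the sketch below. The element type is an assumption (one name-to-value map per scope), and the Scopes class with its push, pop, declare and lookup methods is hypothetical rather than ulang's real code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

class Scopes {
    // Assumed layout: one name-to-value map per scope, innermost on top.
    private final Stack<Map<String, Object>> scopes = new Stack<>();

    void push() { scopes.push(new HashMap<>()); } // entering a block or function
    void pop()  { scopes.pop(); }                 // leaving it again

    // Declares a variable in the innermost scope.
    void declare(String name, Object value) {
        scopes.peek().put(name, value);
    }

    // Searches from the innermost scope outward, so inner names shadow outer ones.
    Object lookup(String name) {
        for (int i = scopes.size() - 1; i >= 0; i--) {
            Map<String, Object> scope = scopes.get(i);
            if (scope.containsKey(name)) return scope.get(name);
        }
        throw new IllegalStateException("undeclared variable: " + name);
    }

    public static void main(String[] args) {
        Scopes s = new Scopes();
        s.push();
        s.declare("x", 1.0);
        s.push();
        s.declare("x", 5.0);               // shadows the outer x
        System.out.println(s.lookup("x")); // 5.0
        s.pop();
        System.out.println(s.lookup("x")); // 1.0
    }
}
```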

3. Final Words

All in all, the project ended up at around 3000 lines of code. I have a small standard library, a small (second degree) wrapper over OpenGL and GLFW, and a program that draws a spinning sphere of points. I further fed my addiction to writing parsers. I also learned a lot about Java's reflection, for the first time in my life actually saw a reason to update Java from 1.8 (huge mistake), and became far more syntax agnostic, to the point where the difference between Rust's and Java's syntaxes is that Rust's takes longer to learn... (But do not get me wrong. I still write #define STRUCT(NAME, ...) typedef struct { __VA_ARGS__ N##NAME;} whenever I can).