Introduction

Some years ago a booklet was published, called Build your own text editor: a guide to creating a minimal text editor from scratch in the C programming language, based on the code of the kilo editor. It's a fun exercise to learn some C programming, because writing a toy text editor is fun.

In my attempt at learning the Zig programming language, I thought that rewriting that editor in Zig would be a good exercise as well. Since learning resources for Zig aren't overly abundant, I also thought it would be a good idea to write a guide on how to do it, step by step, following the example of the booklet I mentioned before.

I want to make it clear that I'm neither an expert Zig programmer (this was my first Zig program) nor an expert programmer in general (I'm self-taught and I just dabble with programming, so to speak), so don't expect great technical insights or elaborate programming techniques. That's not the purpose of this document anyway. Moreover, I never claim that the way I solve a particular problem in this program is the best way to solve it, nor that it's the most idiomatic one for the Zig programming language. Like its predecessor, the C programming language, Zig is rather free-form in the sense that it doesn't enforce a particular programming style or paradigm. Still, Zig also has its idioms and best practices, and I try to follow them in general, but sometimes I will also show different ways to approach the same problem.

As a matter of fact, in this guide I don't strive to find the optimal solutions, from the point of view of performance optimizations and memory usage for example, but generally the simplest ones that I consider still acceptable. It is a minimal text editor, after all.

Also remember that the program we're creating is just a toy, an exercise to learn something more about a programming language, and not a tool that can have any serious use.

Compared to the original C version, here we will not respect the 1024 lines of code limit (from which the name kilo stems) and we will not be limited to a single file, since pursuing (or even worse achieving) such conciseness would preclude us from using many useful features of the Zig programming language, such as importable modules and instantiable types. Having everything in a single file might make sense for small libraries, but it's not what we're doing here.

I do sometimes use collapsible notes:

Note

Heya!

Speaking of the knowledge required to understand this booklet, this is not a programming guide but rather an exercise, so I will expect that you have at least some familiarity with systems programming languages like C. In particular, I will assume that you already know what pointers are and how to use them, along with anything else that can be considered basic programming knowledge.

I will also expect that you know the basics of the Zig programming language, so if you haven't already, I suggest that you go through the exercises from the ziglings project before attempting this one. Other learning resources that I found useful are (in no particular order):

and the most important of all, always up to date:

Setup

This program was written in a Linux environment, for the Linux environment. Therefore if you use Windows you should install WSL2 and a Linux distribution (I tested it on Ubuntu Preview). I don't think it can work on macOS, but you're free to try, and anyway it would not be too difficult to make it work there in the future.

Install zig

First thing, you need Zig itself. If you don't have it, or if you have a different version installed, you can download it here. Currently this document uses the 0.15.1 release, so it's the one that you should download.

Decompress the archive somewhere, for example in ~/.local/zig:

tar xf <archive> --directory="$HOME/.local"
mv "$HOME/.local/<name-of-extracted-directory>" "$HOME/.local/zig"

Then add this directory to your path by adding this to your .bashrc

export PATH=$PATH:~/.local/zig

Now start a new terminal instance and see if zig is in your path:

zig version

And it should print

0.15.1

Finally, choose a directory for your project and initialize it:

mkdir kilo-zig
cd kilo-zig
zig init

Set up an editor

You will also need a text editor, before you can use your own. Which you shouldn't do anyway, so you need an editor.

I recommend you don't use any advanced tooling, like zls. I think that while you're still learning it's better to do without it: I find it's enough to rely on what the compiler tells you, then find and fix the mistakes by yourself.

ctags

Instead, if you use an editor that supports tags, I think it's a good idea to use them, to navigate faster between functions, types and other parts of our project. To use tags you must have universal-ctags installed, for example in Debian/Ubuntu you install it with:

sudo apt install universal-ctags

But ctags doesn't natively support Zig, so you should create a file at ~/.config/ctags/zig.ctags with this content:

--langdef=zig
--map-zig=.zig

--kinddef-zig=f,function,functions
--kinddef-zig=m,method,methods
--kinddef-zig=t,type,types
--kinddef-zig=v,field,fields

# functions
--regex-zig=/^(export +)?(pub +)?(inline +)?(extern .+ )?fn +([a-zA-Z0-9_]+)/\5/f/{exclusive}

# structs, union, enum
--regex-zig=/^(export +)?(pub +)?[\t ]*const +([a-zA-Z0-9_]+) = (struct|enum|union)/\3/t/{exclusive}{scope=push}
--regex-zig=/^}///{exclusive}{scope=pop}{placeholder}

# methods
--regex-zig=/^[\t ]+(pub +)?(inline +)?fn +([a-zA-Z0-9_]+)/\3/m/{exclusive}{scope=ref}

# public constants/variables
--regex-zig=/^(export +)?pub +(const|var) +([a-zA-Z0-9_]+)(:.*)? = .*/\3/v/{exclusive}

The Zig build system

From now on, I'll assume the directory of the project is located in ~/kilo-zig.

cd ~/kilo-zig

After having initialized the project with zig init inside that directory, a bunch of files will have been created. We don't need the src/root.zig file, because that is only useful if we are creating a library, and we're not, so we delete it:

rm src/root.zig

We'll also have to edit the build.zig file, which is the Zig equivalent of a Makefile. I will not go into details about how the Zig build system works, because I barely know it myself. What matters now is that the default build file is unsuitable for building our project. If we open it, we'll see that it does several things:

  • it defines build options
  • it defines a module (mod that points at src/root.zig)
  • it defines a main executable (exe that points at src/main.zig)
  • it adds steps for tests for both main executable and module

We'll have to remove all the steps that would build a module. So you remove:

  • the mod variable
  • the .imports field in the .addExecutable() argument
  • other lines with mod: mod_tests, run_mod_tests and so on

You'll also change the executable's .name field to kilo.

This is the final build.zig with most comments removed:

build.zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    // Standard target options allow the person running `zig build` to choose
    // what target to build for.
    const target = b.standardTargetOptions(.{});
    // Standard optimization options allow the person running `zig build` to select
    // between Debug, ReleaseSafe, ReleaseFast, and ReleaseSmall.
    const optimize = b.standardOptimizeOption(.{});

    // Here we define an executable. An executable needs to have a root module
    // which needs to expose a `main` function.
    const exe = b.addExecutable(.{
        .name = "kilo",
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });

    // By default the install prefix is `zig-out/` but can be overridden by
    // passing `--prefix` or `-p`.
    b.installArtifact(exe);

    // This creates a top level step. Top level steps have a name and can be
    // invoked by name when running `zig build` (e.g. `zig build run`).
    // This will evaluate the `run` step rather than the default step.
    const run_step = b.step("run", "Run the app");

    // This creates a RunArtifact step in the build graph.
    const run_cmd = b.addRunArtifact(exe);
    run_step.dependOn(&run_cmd.step);

    // By making the run step depend on the default step, it will be run from the
    // installation directory rather than directly from within the cache directory.
    run_cmd.step.dependOn(b.getInstallStep());

    // This allows the user to pass arguments to the application in the build
    // command itself, like this: `zig build run -- arg1 arg2 etc`
    if (b.args) |args| {
        run_cmd.addArgs(args);
    }

    // Creates an executable that will run `test` blocks from the executable's
    // root module.
    const exe_tests = b.addTest(.{
        .root_module = exe.root_module,
    });

    // A run step that will run the test executable.
    const run_exe_tests = b.addRunArtifact(exe_tests);

    // A top level step for running all tests.
    const test_step = b.step("test", "Run tests");
    test_step.dependOn(&run_exe_tests.step);
}

The main.zig file

Every respectable program has an entry point, to let users actually execute it and do something with it. Our program is no exception.

Our entry point is located in src/main.zig, as we defined it in the build.zig script. The file doesn't have to be named this way, but it must contain a main() function.

Note

I like to have big banners to separate sections of the source code, you don't have to follow my habits of course, feel free to remove them if you don't like them.

zig init created a src/main.zig, which we'll have to replace entirely with this:

main.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Main function
//
///////////////////////////////////////////////////////////////////////////////

pub fn main() !void {
    var da = std.heap.DebugAllocator(.{}){};
    defer _ = da.deinit();

    const allocator = switch (builtin.mode) {
        .Debug => da.allocator(),
        else => std.heap.smp_allocator,
    };
    _ = allocator;
}

///////////////////////////////////////////////////////////////////////////////
//
//                              Constants, variables
//
///////////////////////////////////////////////////////////////////////////////

const std = @import("std");
const builtin = @import("builtin");

We keep the constants at the bottom of the file, so they don't get too much in the way. Right now there are only two, but there's often a whole lot of them.

What we're doing for now is defining the allocator we'll be using. Zig refuses to compile code that declares a local variable or constant without using it, so for now we have:

_ = allocator;

after we define the constant.

What this code means, at any rate, is that we use the debug allocator in Debug mode, and a much faster allocator in proper release modes.
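
To see the allocator in action, here is a minimal sketch that duplicates a string with it. The copyText helper is hypothetical and not part of the editor; it only illustrates that the same code works with either allocator, because both are used through the std.mem.Allocator interface.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper (not part of the editor): returns a heap-allocated
/// copy of `text`. The caller owns the copy and must free it.
fn copyText(allocator: std.mem.Allocator, text: []const u8) ![]u8 {
    return allocator.dupe(u8, text);
}

pub fn main() !void {
    var da = std.heap.DebugAllocator(.{}){};
    defer _ = da.deinit();

    // Same selection as in the guide: debug allocator in Debug mode,
    // the faster smp_allocator in release modes.
    const allocator = switch (builtin.mode) {
        .Debug => da.allocator(),
        else => std.heap.smp_allocator,
    };

    const copy = try copyText(allocator, "kilo");
    defer allocator.free(copy);
    std.debug.print("{s}\n", .{copy});
}
```

If you remove the `defer allocator.free(copy);` line and build in Debug mode, the debug allocator should report the leaked allocation when `da.deinit()` runs.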

The builtin.mode defaults to .Debug, so if we simply run

zig build

it will build the program in debug mode. To use the faster allocator we'll need to pass an argument, for example:

zig build -Doptimize=ReleaseSmall # optimize for small binary size
zig build -Doptimize=ReleaseFast # optimize for performance
zig build -Doptimize=ReleaseSafe # optimize for safety

But we'll mostly build in debug mode, because if something goes wrong and the program panics, we'll get the most useful information about what has caused the panic, such as an array access with an index out of bounds (it happened often to me while writing the program).

Panic handler

Speaking of panic, we want to add our own panic handler. Normally, if the program panics, it will crash and invoke the default panic handler, which prints a stack trace about the error. We'll need more than that, so we change the panic handler to our own:

main.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Panic handler
//
///////////////////////////////////////////////////////////////////////////////

pub const panic = std.debug.FullPanic(crashed);

fn crashed(msg: []const u8, trace: ?usize) noreturn {
    std.debug.defaultPanic(msg, trace);
}

Since we don't need it for anything yet, what it does is simply to call the default panic handler, passing the same arguments it receives.

You may have noticed that strange return type: noreturn. It doesn't mean that the function returns nothing, as void would; it means that the function doesn't return at all. This is so because when this function is called, our program has already crashed, and it couldn't return any value anyway. You shouldn't worry about it because it's the first and last time we'll see it in our program.

What's panic anyway?

When the program encounters an error at runtime, depending on the kind of error, two things may happen:

  • the program crashes (best case)
  • the program keeps running, but its state is corrupted (worst case)

In the second case really nasty things can happen, so we want to avoid such bugs at all costs. In safe build modes (Debug and ReleaseSafe), events that would otherwise cause undefined behavior cause a panic instead. The program terminates and you get a meaningful stack trace of what has caused the error.

Terminal configuration

When we write text in an editor, the character is immediately read and handled by the program. This is not what happens normally in a terminal, because the default way a terminal handles keypresses is the so-called canonical mode: in this mode, keys are sent to the program only after the user presses the Enter key.

Let's first write a function that can read bytes from the user's keypresses:

main.zig
// Read from stdin into `buf`, return the number of read characters
fn readChar(buf: []u8) !usize {
    const stdin = std.posix.STDIN_FILENO;
    return try std.posix.read(stdin, buf);
}

This reads from stdin into buf, up to buf.len bytes at a time, and returns the number of bytes that have been read. Since we'll pass a one-byte buffer, that means one character per call. buf should be a slice, because std.posix.read accepts a slice as parameter.

In general, you'll find out that working with slices will prevent a lot of headaches, because the Zig type system is very strict, but most functions of the standard libraries that work with arrays are designed to take a slice as parameter. You still keep the ownership of the underlying array, of course.

Remember that to pass a slice of an array to a function we use one of the following notations:

    &array      // create the slice by taking the address of an array
    array[0..]  // a slice with all elements of an array
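
As a small illustration of both notations, here is a sketch with a hypothetical fill() helper (it is not part of the editor) that takes a slice, like most standard library functions do:

```zig
const std = @import("std");

/// Hypothetical helper: overwrites every byte of `buf` with `byte` and
/// returns how many bytes were written.
fn fill(buf: []u8, byte: u8) usize {
    @memset(buf, byte);
    return buf.len;
}

pub fn main() void {
    var array: [4]u8 = undefined;

    // `&array` is a pointer to the whole array, which coerces to a slice.
    _ = fill(&array, 'a');

    // `array[2..]` is a slice of the last two elements only.
    _ = fill(array[2..], 'b');

    std.debug.print("{s}\n", .{&array});
}
```

In both calls the array itself stays where it was declared; the function only receives a view into it.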

Let's call it from main() by adding these lines:

    };
    _ = allocator;
    var buf: [1]u8 = undefined;
    while (try readChar(&buf) == 1 and buf[0] != 'q') {}

If you build and run the program in a terminal, you'll see that even if you press q the loop doesn't stop, you need to press Enter, and if you press any key after q, you'll find those characters in your command line prompt.

So you'll understand the need to change how the terminal sends what it reads to our program, and this is what raw mode is for.

For this purpose, we'll create a new module for our program, we'll call it linux, and it will handle all interactions with the operating system, such as reading characters.

The linux module

Create a file src/linux.zig and paste the following content:

linux.zig
//! Module that handles interactions with the operating system.

///////////////////////////////////////////////////////////////////////////////
//
//                              Raw mode
//
///////////////////////////////////////////////////////////////////////////////

/// Enable terminal raw mode, return previous configuration.
pub fn enableRawMode() !linux.termios {
    const orig_termios = try posix.tcgetattr(STDIN_FILENO);

    // stuff here

    return orig_termios;
}

/// Disable terminal raw mode by restoring the saved configuration.
pub fn disableRawMode(termios: linux.termios) void {
    posix.tcsetattr(STDIN_FILENO, .FLUSH, termios) catch @panic("Disabling raw mode failed!");
}

///////////////////////////////////////////////////////////////////////////////
//
//                              Constants, variables
//
///////////////////////////////////////////////////////////////////////////////

const std = @import("std");
const linux = std.os.linux;
const posix = std.posix;

const STDOUT_FILENO = posix.STDOUT_FILENO;
const STDIN_FILENO = posix.STDIN_FILENO;

For now, we have two functions:

  • enableRawMode should change the terminal configuration, switching away from canonical mode, then return the original configuration
  • disableRawMode should restore the original configuration

We have to fill the enableRawMode function, since right now it's not doing anything.

Enabling raw mode

Note

The original booklet I mentioned in the introduction goes into great detail in explaining what all the flags mean. I have no intention to do that, if you are curious about them you can consult the original.

First we make a copy of the original configuration, so that we can modify it.

    // make a copy
    var termios = orig_termios;

We then set a number of flags in this copy. We disable echoing of the characters we type:

    termios.lflag.ECHO = false; // don't echo input characters

We disable canonical mode, so that the terminal doesn't wait for Enter to be pressed when reading characters:

    termios.lflag.ICANON = false; // read input byte-by-byte instead of line-by-line

We disable some key combinations that usually have a special behavior in terminals, so that they are available for us to use in our program:

    termios.lflag.ISIG = false; // disable Ctrl-C and Ctrl-Z signals
    termios.iflag.IXON = false; // disable Ctrl-S and Ctrl-Q flow control
    termios.lflag.IEXTEN = false; // disable Ctrl-V
    termios.iflag.ICRNL = false; // stop Ctrl-M being read as Ctrl-J

For reference:

key      default behavior
Ctrl-C   sends a SIGINT signal that causes the program to terminate
Ctrl-Z   sends a SIGTSTP signal which suspends the program (you can then resume it with fg in the terminal command line)
Ctrl-S   produces the XOFF control character, halts data transmission
Ctrl-Q   produces the XON control character, resumes data transmission
Ctrl-V   the next character will be inserted literally
Ctrl-M   read as ASCII 10 (Ctrl-J) instead of 13 (Enter)

Let's disable output processing, to prevent the terminal from issuing a carriage return (\r) in addition to each new line (\n) that is printed:

    termios.oflag.OPOST = false; // disable output processing

You can see that the termios flags are grouped into structs whose names tell you what they control: iflag (input), oflag (output), lflag (local) and cflag (control).

Let's disable more flags, which are even more obscure than the previous ones and that I won't even try to explain (sorry):

    termios.iflag.BRKINT = false; // break conditions no longer cause a SIGINT signal
    termios.iflag.INPCK = false; // disable parity checking (obsolete?)
    termios.iflag.ISTRIP = false; // disable stripping of 8th bit
    termios.cflag.CSIZE = .CS8; // set character size to 8 bits

From the original booklet

This step probably won’t have any observable effect for you, because these flags are either already turned off, or they don’t really apply to modern terminal emulators. But at one time or another, switching them off was considered (by someone) to be part of enabling “raw mode”, so we carry on the tradition (of whoever that someone was) in our program.

As far as I can tell:

  • When BRKINT is turned on, a break condition will cause a SIGINT signal to be sent to the program, like pressing Ctrl-C.
  • INPCK enables parity checking, which doesn’t seem to apply to modern terminal emulators.
  • ISTRIP causes the 8th bit of each input byte to be stripped, meaning it will set it to 0. This is probably already turned off.
  • CS8 is not a flag, it is a bit mask with multiple bits, which we set using the bitwise-OR (|) operator unlike all the flags we are turning off. It sets the character size (CS) to 8 bits per byte. On my system, it’s already set that way.

A timeout for read()

Finally, we want to set a timeout for read(), so that our editor will be able to tell an Esc keypress from an escape sequence. In fact, the terminal escape sequences that encode many keys all begin with an Esc (that's why they are called escape sequences), and we want to be able to handle them accordingly.

Here we use some constants that are defined in std.os.linux. Since they are in an enum, we'll have to use the builtin function @intFromEnum() so that we can use them for array indexing (which expects a usize).

    // Set read timeouts
    termios.cc[@intFromEnum(linux.V.MIN)] = 0; // Return immediately when any bytes are available
    termios.cc[@intFromEnum(linux.V.TIME)] = 1; // Wait up to 0.1 seconds for input

Important

This took me hours to figure out. The original kilo editor uses constants that come from the libc termios.h header. Initially I simply used the values from the C version, thinking they would also apply to the Zig version. They didn't work, that is, there was no read timeout. I first asked the AI, and it didn't help. I then looked for other Zig implementations of this same editor on the internet, but all of them repeated this mistake, until I found one implementation that did the right thing, that is, to use the constants that are provided by the Zig standard library (which is what the snippet above does).

The lesson was: don't try to reinvent a system-defined constant, use the system-defined constant, even if it means that you must look for it in the standard library.

We're done, we can apply the new terminal configuration and return the original one:

    // update config
    try posix.tcsetattr(STDIN_FILENO, .FLUSH, termios);
    return orig_termios;
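
To see all the flag changes of this chapter in one place, here is a minimal sketch that applies them to a copy of a given configuration. The rawFrom() helper is hypothetical: the guide's enableRawMode() does this work inline and then calls tcsetattr(). Keeping it as a pure function just makes it possible to inspect the result without touching a real terminal. Like the rest of the program, it assumes Linux.

```zig
const std = @import("std");
const linux = std.os.linux;

/// Hypothetical helper (not the guide's enableRawMode): returns a copy of
/// `orig` with the raw-mode settings from this chapter applied.
fn rawFrom(orig: linux.termios) linux.termios {
    var termios = orig;

    // local flags
    termios.lflag.ECHO = false; // don't echo input characters
    termios.lflag.ICANON = false; // read byte-by-byte, not line-by-line
    termios.lflag.ISIG = false; // no Ctrl-C / Ctrl-Z signals
    termios.lflag.IEXTEN = false; // no Ctrl-V

    // input flags
    termios.iflag.IXON = false; // no Ctrl-S / Ctrl-Q flow control
    termios.iflag.ICRNL = false; // don't translate Ctrl-M into Ctrl-J
    termios.iflag.BRKINT = false;
    termios.iflag.INPCK = false;
    termios.iflag.ISTRIP = false;

    // output and control flags
    termios.oflag.OPOST = false; // no output processing
    termios.cflag.CSIZE = .CS8; // 8 bits per character

    // read() timeout: return as soon as bytes are available,
    // or after 0.1 seconds without input
    termios.cc[@intFromEnum(linux.V.MIN)] = 0;
    termios.cc[@intFromEnum(linux.V.TIME)] = 1;

    return termios;
}
```

Because the argument is passed by value, the caller's original configuration is left untouched, which is what lets us restore it later.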

Back to main.zig

We left our main function in this state:

main.zig
pub fn main() !void {
    var da = std.heap.DebugAllocator(.{}){};
    defer _ = da.deinit();

    const allocator = switch (builtin.mode) {
        .Debug => da.allocator(),
        else => std.heap.smp_allocator,
    };
    _ = allocator;

    var buf: [1]u8 = undefined;
    while (try readChar(&buf) == 1 and buf[0] != 'q') {}
}

Now we want to enable raw mode, right? And it's the first thing that our main function will do. Add these lines at the top of it:

pub fn main() !void {
    orig_termios = try linux.enableRawMode();
    defer linux.disableRawMode(orig_termios);

The defer statement is important because we want to restore the original configuration when the program exits. We also want to update the bottom section with our new variables. Add this at the bottom of the file:

const linux = @import("linux.zig");

var orig_termios: std.os.linux.termios = undefined;

Reminder

When variables and constants are placed at the root level of a file, that is, outside any functions, they behave like static identifiers in C, only visible to the code of the current file, unless they have the pub qualifier, meaning they can be accessed from files that import the current one.

Moreover, if the module is meant to be instantiated (it has fields defined at the root level), these variables and constants are, again, static, not part of the instances: all instances will share the same value, which is quite obvious for constants, less so for variables.
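
A minimal sketch of that difference, using a hypothetical Counter type that has nothing to do with the editor: total is a container-level variable shared by every instance, while own is a field and each instance has its own copy.

```zig
const std = @import("std");

/// Hypothetical type, only here to contrast shared variables with fields.
const Counter = struct {
    // Container-level variable: one copy, shared by all instances,
    // much like a static variable in C.
    var total: usize = 0;

    // Field: every instance gets its own.
    own: usize = 0,

    fn bump(self: *Counter) void {
        self.own += 1;
        total += 1;
    }
};

pub fn main() void {
    var a: Counter = .{};
    var b: Counter = .{};
    a.bump();
    b.bump();
    b.bump();
    std.debug.print("a.own={d} b.own={d} total={d}\n", .{ a.own, b.own, Counter.total });
}
```

The same holds when the struct is a whole file: a var at the root level of the file plays the role of total here.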

Why is it important to define orig_termios at the root level? Because we want to handle another case: our program crashes, and we don't want to leave the terminal in an unusable state if that happens. We'll have to update our crash handler as well:

/// Our panic handler disables terminal raw mode and calls the default panic
/// handler.
fn crashed(msg: []const u8, trace: ?usize) noreturn {
    linux.disableRawMode(orig_termios);
    std.debug.defaultPanic(msg, trace);
}

As you can see, this function also needs to access the original terminal configuration, and there's no way to pass it as an argument: it must read it from a variable.

Now, if you try to build and run the project, something strange happens: the program terminates immediately.

Can you guess why?

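Because of the timeout to read() in enableRawMode(). With the timeout in place, read() returns 0 as soon as 0.1 seconds pass without any input, so the loop condition (readChar(&buf) == 1) fails right away. If you comment out the two lines where the timeout is set, you can recompile, run, and see that the program keeps reading characters until you press q, and only then it terminates.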

Getting the window size

Note

Before we proceed, delete the last 2 lines in the main function (the ones that read from the input) and the readChar() function as well. We won't need them anymore.

We went past raw mode, which was possibly annoying. Unfortunately we must take care of the low level code before we can proceed to code the actual editor. And there's still a good bit to come.

Before we can draw anything on the screen, we must know its size, the number of rows and columns.

There are two ways to do this; the second one is attempted only if the first one fails.

The first method involves calling the Linux ioctl function to request the window size from the operating system.

The fallback method involves moving the cursor to the bottom-right corner of the window and asking the terminal for its position.

The ioctl method

We'll first create two new modules:

  • types.zig: hub for all the custom types of our editor
  • ansi.zig: handles ANSI escape sequences

In src/types.zig we'll write this:

types.zig
//! Collection of types used by the editor.

///////////////////////////////////////////////////////////////////////////////
//
//                              Editor types
//
///////////////////////////////////////////////////////////////////////////////

/// Dimensions of the terminal screen where the editor runs.
pub const Screen = struct {
    rows: usize = 0,
    cols: usize = 0,
};

Important

Zig supports default initializers in structs, but with some catch... more on this later.
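
As a quick illustration of what default initializers give us, here is a minimal sketch. The Screen type is repeated so that the example is self-contained, and the withCols() helper is hypothetical.

```zig
const std = @import("std");

const Screen = struct {
    rows: usize = 0,
    cols: usize = 0,
};

/// Hypothetical helper: sets only `cols`; `rows` falls back to its
/// default value.
fn withCols(cols: usize) Screen {
    return Screen{ .cols = cols };
}

pub fn main() void {
    const empty: Screen = .{}; // every field takes its default
    const wide = withCols(120);
    std.debug.print("{d}x{d} {d}x{d}\n", .{ empty.rows, empty.cols, wide.rows, wide.cols });
}
```

Without the defaults, both initializations would be compile errors, because every field would have to be listed explicitly.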

In src/ansi.zig we'll write this:

ansi.zig
//! Module that handles ansi terminal sequences.

///////////////////////////////////////////////////////////////////////////////
//
//                              Functions
//
///////////////////////////////////////////////////////////////////////////////

/// Get the window size.
pub fn getWindowSize() !t.Screen {
    // code to come...
}

///////////////////////////////////////////////////////////////////////////////
//
//                              Constants, variables
//
///////////////////////////////////////////////////////////////////////////////

const std = @import("std");
const linux = @import("linux.zig");
const t = @import("types.zig");

We should fill the getWindowSize() function.

ansi.zig: getWindowSize()
    var screen: t.Screen = undefined;
    var wsz: std.posix.winsize = undefined;

    if (linux.winsize(&wsz) == -1 or wsz.col == 0) {
        // fallback method will be here
    } else {
        screen = t.Screen{
            .rows = wsz.row,
            .cols = wsz.col,
        };
    }
    return screen;

Much like in the original C code, we use ioctl() to request the window size of the terminal, and this will be stored in the wsz struct which we pass by reference.

In C, the ioctl() function returns -1 on failure. Like the original, we also treat a column value of 0 in the passed wsz struct as a failure.

Note that in the second part of the condition (wsz.col == 0) wsz would already have a value because it's assumed that the ioctl() call was successful, since it didn't return -1.

The winsize() function

We'll also have to update our src/linux.zig module to add the winsize() function that is called in getWindowSize():

linux.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Functions
//
///////////////////////////////////////////////////////////////////////////////

/// Read the window size into the `wsz` struct.
pub fn winsize(wsz: *posix.winsize) usize {
    return linux.ioctl(STDOUT_FILENO, linux.T.IOCGWINSZ, @intFromPtr(wsz));
}

To know why std.os.linux.ioctl is invoked like that, we should look for it in the Zig standard library:

std/os/linux.zig
pub fn ioctl(fd: fd_t, request: u32, arg: usize) usize {
    return syscall3(.ioctl, @as(usize, @bitCast(@as(isize, fd))), request, arg);
}

The function doesn't have any documentation, so we just invoke it the same way the original C code does, where the call was:

C
    if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &wsz) == -1 || wsz.ws_col == 0)

The TIOCGWINSZ constant is replaced by linux.T.IOCGWINSZ, found in the std.os.linux module of the Zig standard library.

The other difference is the third argument, which is a usize in Zig, so we must convert the pointer to an integer:

@intFromPtr(wsz)
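
To make the pointer-to-integer conversion less mysterious, here is a minimal sketch of the round trip. The storeAt() function is a hypothetical stand-in for what happens on the other side of the call: it receives a plain integer address and turns it back into a pointer with @ptrFromInt().

```zig
const std = @import("std");

/// Hypothetical stand-in for the receiving side of ioctl(): it gets an
/// integer address and writes a value through it.
fn storeAt(addr: usize, value: u16) void {
    const ptr: *u16 = @ptrFromInt(addr);
    ptr.* = value;
}

pub fn main() void {
    var cols: u16 = 0;
    // The address travels as a usize, exactly like the third ioctl argument.
    storeAt(@intFromPtr(&cols), 120);
    std.debug.print("{d}\n", .{cols});
}
```

The integer carries no type information, so it is up to both sides to agree on what the address points to. In our case that is a winsize struct.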

Reminder

Remember to mark functions with the pub qualifier when they are called by other modules.

Note

I put this function in the linux module because I preferred to keep all the low level interactions with the operating system in it.

The cursor position method

In case of failure, we'll have to resort to a second method.

We replace the commented line in getWindowSize() with:

    if (linux.winsize(&wsz) == -1 or wsz.col == 0) {
        screen = try getCursorPosition();
Our getCursorPosition() function also goes just below getWindowSize():

ansi.zig
/// Get the cursor position, to determine the window size.
pub fn getCursorPosition() !t.Screen {
    // code to come...
}

What should we do in there? The idea is to push the cursor as far right and as far down as it can go, so that it ends up in the bottom-right corner of the screen, and read the current row and column from there.

For both things, we need to issue escape sequences to the terminal.

  • to move the cursor to the corner, we'll issue two sequences in a row, one to move it right across the columns and one to move it down across the rows.

  • to read the cursor position, we'll issue a sequence, and read the response of the terminal in a []u8 buffer

ANSI escape sequences

We'll define the following constants in ansi.zig:

ansi.zig
/// Control Sequence Introducer: ESC key, followed by '[' character
pub const CSI = "\x1b[";

/// The ESC character
pub const ESC = '\x1b';

// Moves the cursor right and down by a very large number of cells. The
// terminal stops it at the edges, so it ends up in the bottom-right corner.
pub const WinMaximize = CSI ++ "999C" ++ CSI ++ "999B";

// Reports the cursor position (CPR) by transmitting ESC[n;mR, where n is the
// row and m is the column
pub const ReadCursorPos = CSI ++ "6n";
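
The ++ operator concatenates the string literals at compile time, so each constant is a single ready-made byte sequence. A minimal sketch that checks what they expand to (the constants are repeated here so that the example is self-contained):

```zig
const std = @import("std");

const CSI = "\x1b[";
const WinMaximize = CSI ++ "999C" ++ CSI ++ "999B";
const ReadCursorPos = CSI ++ "6n";

pub fn main() void {
    // Both are plain strings by now: no concatenation happens at runtime.
    std.debug.print("{d} {d}\n", .{ WinMaximize.len, ReadCursorPos.len });

    // They can be joined again, still at compile time.
    const both = WinMaximize ++ ReadCursorPos;
    std.debug.print("{d}\n", .{both.len});
}
```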

linux.write()

How exactly do we send these sequences? We're back into linux.zig.

linux.zig
// Write bytes to stdout, return error if the requested amount of bytes
// couldn't be written.
pub fn write(buf: []const u8) !void {
    if (try posix.write(STDOUT_FILENO, buf) != buf.len) {
        return error.WriteIncomplete;
    }
}
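
For comparison, a version that retries instead of giving up could look like the following sketch. The writeAll() function is hypothetical and is not what the editor uses. It takes the file descriptor as a parameter, which the guide's write() doesn't do, so that it can be tried on something other than the terminal.

```zig
const std = @import("std");
const posix = std.posix;

/// Hypothetical variant of write(): keeps calling posix.write() until every
/// byte of `buf` has been written to `fd`.
fn writeAll(fd: posix.fd_t, buf: []const u8) !void {
    var written: usize = 0;
    while (written < buf.len) {
        const n = try posix.write(fd, buf[written..]);
        // No progress at all: give up rather than loop forever.
        if (n == 0) return error.WriteIncomplete;
        written += n;
    }
}

pub fn main() !void {
    try writeAll(posix.STDOUT_FILENO, "hello\n");
}
```

Each iteration passes only the part of the buffer that is still pending, the same slicing trick we use elsewhere to avoid overwriting data.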

WriteIncomplete in this case is an error I just made up. It's probably not a very good way to handle incomplete writes, in the sense that we should probably retry. In my defense, I can say that the original C editor did this:

C
  if (write(STDOUT_FILENO, "\x1b[6n", 4) != 4) return -1;

which means that it gave up all the same. Hey... I think we're trying hard enough already. At least for our humble editor, that is.

Back to getCursorPosition()

Now it's hopefully clear what we'll do:

  1. issue sequences to move the cursor to the bottom-right corner and to report its position
  2. read the response in a []u8 buffer
  3. parse the result, to extract the screen size

ansi.zig: getCursorPosition()
    var buf: [32]u8 = undefined;

    try linux.write(WinMaximize ++ ReadCursorPos);

    var nread = try linux.readChars(&buf);

What's that readChars() over there?

This is actually the function that we'll use to read all input from stdin, so it's worth taking care of it right now. It's not too different from the readChar() function we wrote in main.zig and that we carelessly deleted when we didn't need it anymore.

linux.readChars()

linux.zig
/// Keep reading from stdin until we get a valid character, ignoring
/// .WouldBlock errors.
pub fn readChars(buf: []u8) !usize {
    while (true) {
        const n = posix.read(STDIN_FILENO, buf) catch |err| switch (err) {
            error.WouldBlock => continue,
            else => return err,
        };
        if (n >= 1) return n;
    }
}

Let's compare it with the previous readChar() function which was:

// Read from stdin into `buf`, return the number of read characters.
fn readChar(buf: []u8) !usize {
    return try posix.read(STDIN_FILENO, buf);
}

The main difference is that now we are in raw mode, and there is a read() timeout in place, so we must handle the error which happens when the timeout kicks in. This error is .WouldBlock, and we must ignore it, that is, we must keep reading until we read something, or a different error is returned by posix.read().

If posix.read() finally returns a positive number because it read something, we return it. If it didn't read anything, it's probably because we didn't type anything, and the loop continues.

Back to getCursorPosition()

So now we got the response from the terminal, and we read it inside our []u8 buffer.

ansi.zig: getCursorPosition()
    var nread = try linux.readChars(&buf);
    if (nread < 5) return error.CursorError;

For a response to be valid, it should follow this format:

ESC [ rows ; cols R

for example, 0x1b[50;120R. This sequence has a minimum of 5 characters, plus the final R. I think on some occasions I couldn't read the R character immediately, but maybe I've been doing something wrong? Anyway this is what we do:

ansi.zig: getCursorPosition()
    // we should ignore the final R character
    if (buf[nread - 1] == 'R') {
        nread -= 1;
    }
    // not there yet? we will ignore it, but it should be there
    else if (try linux.readChars(buf[nread..]) != 1 or buf[nread] != 'R') {
        return error.CursorError;
    }

That is, we keep reading until we get this R character, if it's not yet in our buffer. Since we don't want to overwrite our previous response, we pass a slice that starts at nread, which is the number of characters that have been read until now. When R is finally read, buf[nread] should hold it.

If the first two characters aren't ESC [, we error out:

ansi.zig: getCursorPosition()
    if (buf[0] != ESC or buf[1] != '[') return error.CursorError;

Finally we must parse the number of rows and columns. The original C code used sscanf() for this purpose, but we won't use libc in this project. We parse it by hand.

ansi.zig: getCursorPosition()
    var screen = t.Screen{};
    var semicolon: bool = false;
    var digits: u8 = 0;

    // no sscanf, format to read is "row;col"
    // read it right to left, so we can read number of digits
    // stop before the CSI, so at index 2
    var i = nread;
    while (i > 2) {
        i -= 1;
        if (buf[i] == ';') {
            semicolon = true;
            digits = 0;
        }
        else if (semicolon) {
            screen.rows += (buf[i] - '0') * try std.math.powi(usize, 10, digits);
            digits += 1;
        } else {
            screen.cols += (buf[i] - '0') * try std.math.powi(usize, 10, digits);
            digits += 1;
        }
    }
    if (screen.cols == 0 or screen.rows == 0) {
        return error.CursorError;
    }
    return screen;

If you did programming exercises before, this method of parsing integers should be familiar. The Zig standard library has a function for this purpose (std.fmt.parseInt), but in this case it wouldn't have spared us much trouble. There's a semicolon between the numbers, and we would have needed to track the start and end position of both numbers.
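For comparison, here is a rough sketch of how the same parsing might look with std.fmt.parseInt. It assumes that we pass it only the part of the response between the CSI and the final R (for example buf[2..nread]). The Screen struct is a stand-in for t.Screen, and parseRowsCols is a hypothetical name, not a function of the editor.

```zig
const std = @import("std");

/// Stand-in for t.Screen, so that the example is self-contained.
const Screen = struct {
    rows: usize = 0,
    cols: usize = 0,
};

/// Parse "rows;cols", that is the response without the CSI and the final R.
fn parseRowsCols(body: []const u8) !Screen {
    // the semicolon separates the two numbers
    const semi = std.mem.indexOfScalar(u8, body, ';') orelse return error.CursorError;
    return .{
        .rows = try std.fmt.parseInt(usize, body[0..semi], 10),
        .cols = try std.fmt.parseInt(usize, body[semi + 1 ..], 10),
    };
}

test "parse rows and columns" {
    const screen = try parseRowsCols("50;120");
    try std.testing.expectEqual(@as(usize, 50), screen.rows);
    try std.testing.expectEqual(@as(usize, 120), screen.cols);
}
```

We still have to find the semicolon ourselves, so it isn't much shorter. A small side benefit is that parseInt returns an error when it meets something it can't parse, such as a letter.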

First test

Important

This test is special because it needs an interactive terminal. It will not be executed with:

zig build test

but with:

zig test src/term_tests.zig

It will be the only test of this kind. Unfortunately, it's also the first one.

We want to test if our functions work. Specifically, we'll test the getWindowSize() and getCursorPosition(), which also test setting raw mode and readChars() along the way.

We'll add a couple of constants at the bottom of ansi.zig:

ansi.zig
const builtin = @import("builtin");

// CSI sequence to clear the screen.
pub const ClearScreen = CSI ++ "2J" ++ CSI ++ "H";

We'll create a new file named src/term_tests.zig, with this content:

term_tests.zig
//! Additional tests that need an interactive terminal, not testable with:
//!
//!     zig build test
//!
//! Must be tested with:
//!
//!     zig test src/term_tests.zig

test "getWindowSize" {
    const orig_termios = try linux.enableRawMode();
    defer linux.disableRawMode(orig_termios);

    const s1 = try ansi.getWindowSize();
    try std.testing.expect(s1.rows > 0 and s1.cols > 0);
    const s2 = try ansi.getCursorPosition();
    try linux.write(ansi.ClearScreen);
    try std.testing.expect(s1.rows == s2.rows and s1.cols == s2.cols);
}

const std = @import("std");
const linux = @import("linux.zig");
const ansi = @import("ansi.zig");

We'll clear the screen after having called the second method, because that function call has the side-effect of moving the cursor to the bottom-right corner of the terminal, which messes up the output of the test result.

To ensure that our getWindowSize() works and doesn't fall back, we must add a check in that function:

ansi.zig: getWindowSize()
    if (linux.winsize(&wsz) == -1 or wsz.col == 0) {
        if (builtin.is_test) return error.getWindowSizeFailed;

In a test build, this causes the function to return an error if the ioctl method fails, instead of silently falling back. The test then gets the window size with the fallback method, and ensures the resulting sizes are the same.

Digression: the comptime keyword

You probably know of comptime in Zig. Here we have an application of the concept: since builtin.is_test is known at compile time, the whole branch in getWindowSize() can be resolved at compile time. In a normal (non-test) build the condition is false, so the corresponding code is removed and never executed at runtime.

This has the same effect as an #ifdef block in C for conditional compilation, but the syntax looks much less intrusive. You can even force any expression to be evaluated at compile time by using the comptime keyword before the expression, but here it's not needed, because the builtin.is_test variable is guaranteed to be compile-time known.

While using the comptime keyword, sometimes the compiler complains that using the keyword is redundant, because the expression is always compile-time known, other times it doesn't complain, as in the case above, even if I'm pretty sure that all builtin variables are compile-time known. We saw another example in the main() function, where the allocator was chosen by testing the builtin.mode variable.

To my understanding, also from reading several posts made by the original creator of Zig (Andrew Kelley), most of the time it's not necessary to use the keyword, the compiler is smart enough to evaluate at compile time what it can, even if you don't specify it expressly. But sometimes the compiler says:

error: unable to resolve comptime value

In these cases the comptime keyword might fix the issue.

Bottom line: don't be compulsive in filling your code with comptime, it's not necessary.
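To see this in action, here is a small sketch of my own (bufferLen is a made-up function, it's not part of the editor). It picks a length depending on builtin.is_test without ever writing the comptime keyword, and then uses that length in a place where only a compile-time known value is accepted.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Return the length of a buffer whose size depends on the kind of build.
fn bufferLen() usize {
    // No `comptime` keyword here: the condition only involves
    // `builtin.is_test`, so the compiler resolves the `if` by itself.
    const len = if (builtin.is_test) 8 else 32;

    // The `**` array repetition operator needs a compile-time known count,
    // so this line would not compile if `len` were a runtime value.
    const buf = [_]u8{0} ** len;
    return buf.len;
}

test "the branch is resolved at compile time" {
    // when running under `zig test`, builtin.is_test is true
    try std.testing.expectEqual(@as(usize, 8), bufferLen());
}
```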

The editor types

Now it's time for the first steps towards the creation of our editor.

The original C code of kilo is single-file, with a global variable E that holds the Editor struct, and all functionalities are implemented there. Initially I wrote this program pretty much in the same way, and it worked, as a demonstration that you can write code in Zig that uses global variables, just as in old-fashioned C programs.

In Zig you can also (and probably should) use instantiable types, whose functions can be called with method syntax: when the first parameter of a function is of the type itself, either passed by reference or by value, you can write instance.function() and omit that argument. You should know this already, so I won't elaborate.

It may be useful to recall that a Zig source file is essentially a struct, that is, you can think of the content of a file as wrapped in

struct {
    // the file content
}

which means that we can define at the root level of a file the members of our type, then treat the whole file as an instantiable type. That's what we'll do with the main types of our editor, which will be:

  • Editor: for the editor functionalities
  • Buffer: for the file contents
  • Row: each row of the buffer
  • View: tracks cursor position and offsets of the editing window

To keep the code simple, we'll code most functionalities in the Editor type, while the others will be lightweight structs that never modify the state of the editor.

The types module holds all our types

Even though each of these types will have its own importable module, all other modules will access them through the src/types.zig module, that serves as a centralized hub for all our types.

We can do this because the program is small, but probably it wouldn't be a wise thing to do in a large program. Still, also the Zig standard library often makes types defined in submodules accessible from the root module. An example is std.ArrayList.

We'll do it right away, open the types module and add this below the Screen definition:

types.zig
pub const Editor = @import("Editor.zig");
pub const Buffer = @import("Buffer.zig");
pub const Row = @import("Row.zig");
pub const View = @import("View.zig");

The files don't exist yet, but we'll create them soon. We'll build up the types little by little, adding more stuff only when we need it.

We also create a section for other miscellaneous types:

types.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Other types
//
///////////////////////////////////////////////////////////////////////////////

/// A dynamic string.
pub const Chars = std.ArrayList(u8);

And the usual Constants section:

types.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Constants, variables
//
///////////////////////////////////////////////////////////////////////////////

const std = @import("std");

The Editor type

Create src/Editor.zig and let's start adding the struct members. This is an instantiable module, that's why the filename starts with a capital letter. It's not enforced, but it's the Zig convention for types to be capitalized.

At the top, we add comments that start with //! (two slashes followed by an exclamation mark): it's the module description. Such special comments may be used for documentation generation.

Editor.zig
//! Type that manages most of the editor functionalities.
//! It draws the main window, the statusline and the message area, and controls
//! the event loop.

At the bottom, we put our usual section with constants:

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Constants, variables
//
///////////////////////////////////////////////////////////////////////////////

const Editor = @This();

const std = @import("std");

const t = @import("types.zig");

@This() is a builtin function that returns the type of the struct. It is capitalized like all functions that return types. This constant means that in this file Editor refers to the same type we're defining. Others prefer to name such constants Self. I prefer more descriptive names.

Fields

Back at the top, below the module description, we start adding the type members:

Editor.zig
/// Allocator used by the editor instance
alc: std.mem.Allocator,

We'll use a single allocator for now, the Editor will pass its own to the types that will require it. We call the field simply alc, because it will be passed so often as argument, that I prefer to keep the name short.

Editor.zig
/// The size of the terminal window where the editor runs
screen: t.Screen,

/// Text buffer the user is currently editing
buffer: t.Buffer,

/// Tracks cursor position and part of the buffer that fits the screen
view: t.View,

/// Becomes true when the main loop should stop, causing the editor to quit
should_quit: bool,

We haven't created the Buffer or the View type yet. should_quit is the variable that we'll use to control the main event loop. When this variable becomes true, the loop is interrupted and the program quits.

Initialization

Now we'll create functions to initialize/deinitialize the editor:

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Init/deinit
//
///////////////////////////////////////////////////////////////////////////////

/// Return the initialized editor instance.
pub fn init(allocator: std.mem.Allocator, screen: t.Screen) !Editor {
    return .{
        .alc = allocator,
        .screen = .{
            .rows = screen.rows - 2, // make room for statusline/message area
            .cols = screen.cols,
        },
        .buffer = try t.Buffer.init(allocator),
        .view = .{},
        .should_quit = false,
    };
}

This is a simple init() function that returns a new instance of the Editor. It's not a method because its first argument is not of type Editor. It is invoked in this way:

    var editor = try Editor.init(allocator, screen);

The deinit() function, on the other hand, is a proper method, because it is used to deinitialize an instance.

Editor.zig
/// Deinitialize the editor.
pub fn deinit(e: *Editor) void {
    e.buffer.deinit();
}

Accordingly, it is invoked like this:

    editor.deinit();

Everything that has used an allocator should be deinitialized here. If you forget to deinitialize/deallocate something while still using the DebugAllocator, you'll be told when exiting the program that your program has leaked memory, along with the corresponding stack trace.

We'll also add a method called startUp(). This function will handle the event loop, and is also called from main().

Editor.zig
/// Start up the editor: open the path in args if valid, start the event loop.
pub fn startUp(e: *Editor, path: ?[]const u8) !void {
    if (path) |name| {
        _ = name;
        // we open the file
    }
    else {
        // we generate the welcome message
    }

    while (e.should_quit == false) {
        // refresh the screen
        // process keypresses
    }
}

It's only a stub, but you can see what it should do.

Before continuing the Editor type, we must define the other ones.

The Buffer type

Description at the top:

Buffer.zig
//! A Buffer holds the representation of a file, divided in rows.
//! If modified, it is marked as dirty until saved.

Let's add the constants: as usual, they'll stay at the bottom.

Also here we set a constant to @This(), so that we can refer to our type inside its own definition.

Buffer.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Constants, variables
//
///////////////////////////////////////////////////////////////////////////////

const Buffer = @This();

const std = @import("std");
const t = @import("types.zig");

/// Initial allocation size for Buffer.rows
const initial_rows_capacity = 40;

Fields

Also in this case, as you can see, no default initializers.

Some members are optional, meaning that they can be null, and null will be their initial value when the Buffer is initialized.

Buffer.zig
alc: std.mem.Allocator,

// Modified state
dirty: bool,

// Buffer rows
rows: std.ArrayList(t.Row),

// Path of the file
filename: ?[]u8,

// Name of the syntax
syntax: ?[]const u8,

Initialization

All in all, this type is quite simple. It doesn't handle single row initialization, because rows are created and inserted by the Editor, but it will deinitialize them. Possibly I'm making a questionable choice here: maybe I should let the Buffer initialize the single rows, since it's here that they're freed at last, especially if we intend to give a Buffer its own different allocator (an arena allocator probably would fit it best). But it's a small detail, because the Editor can access the Buffer allocator just fine, since there are no private fields in Zig.

Buffer.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Init/deinit
//
///////////////////////////////////////////////////////////////////////////////

pub fn init(allocator: std.mem.Allocator) !Buffer {
    return Buffer{
        .alc = allocator,
        .rows = try .initCapacity(allocator, initial_rows_capacity),
        .dirty = false,
        .filename = null,
        .syntax = null,
    };
}

pub fn deinit(buf: *Buffer) void {
    t.freeOptional(buf.alc, buf.filename);
    t.freeOptional(buf.alc, buf.syntax);
    for (buf.rows.items) |*row| {
        row.deinit(buf.alc);
    }
    buf.rows.deinit(buf.alc);
}

There is one new function, freeOptional(), which we haven't defined yet. It's a simple helper, but it doesn't hurt to have some helper functions. I put it in the types module, right above the bottom section:

types.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Functions
//
///////////////////////////////////////////////////////////////////////////////

/// Free an optional slice if not null.
pub fn freeOptional(allocator: std.mem.Allocator, sl: anytype) void {
    if (sl) |slice| {
        allocator.free(slice);
    }
}

Note

I put this function here only because the types module is accessed by most other modules, so it's easily accessible. But since it doesn't return a Type, I think it's slightly misplaced.
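Here is a small self-contained check of how the helper behaves, as a sketch. The function is copied from above; the test and the "notes.txt" name are only for illustration. It uses std.testing.allocator, which reports leaks, so the test would fail if the slice weren't freed.

```zig
const std = @import("std");

/// Free an optional slice if not null (same helper as in types.zig).
fn freeOptional(allocator: std.mem.Allocator, sl: anytype) void {
    if (sl) |slice| {
        allocator.free(slice);
    }
}

test "freeOptional with null and with an allocated slice" {
    const alc = std.testing.allocator;

    // null: nothing to free, nothing happens
    var filename: ?[]u8 = null;
    freeOptional(alc, filename);

    // non-null: the slice is freed, so the testing allocator sees no leak
    filename = try alc.dupe(u8, "notes.txt");
    freeOptional(alc, filename);
}
```

Since the parameter is anytype, the same function accepts both ?[]u8 and ?[]const u8, which is why Buffer.deinit() can use it for filename and syntax alike.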

The Row type

Description at the top:

Row.zig
//! A Row contains 3 arrays, one for the actual characters, one for how it is
//! rendered on the screen, and one with the highlight of each element of the
//! rendered array.

Constants:

Row.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Constants, variables
//
///////////////////////////////////////////////////////////////////////////////

const Row = @This();

const std = @import("std");

const t = @import("types.zig");

const initial_row_size = 80;

Also for this type, we keep it simple: no operations are performed by it. We will add more things to this type as soon as we need them, this is only a partial implementation.

Row.zig
/// The ArrayList with the actual row characters
chars: t.Chars,

/// Array with the visual representation of the row
render: []u8,

///////////////////////////////////////////////////////////////////////////////
//
//                              Init/deinit
//
///////////////////////////////////////////////////////////////////////////////

pub fn init(allocator: std.mem.Allocator) !Row {
    return Row{
        .chars = try .initCapacity(allocator, initial_row_size),
        .render = &.{},
    };
}

pub fn deinit(row: *Row, allocator: std.mem.Allocator) void {
    row.chars.deinit(allocator);
    allocator.free(row.render);
}

Some explanations:

  • our chars field is a dynamic string, it contains the actual characters of the row, it expands or shrinks as characters are typed/deleted. We set an initial capacity, to reduce the need for later allocations.

  • the render field is a simple array of u8. This is probably not optimal, but we'll see later if we can improve the implementation. The point is that this array doesn't need to grow dynamically, when it is updated its new size can be precalculated, so at most it would need a single reallocation, which may result in no new allocation at all. For now we keep it simple.

  • as usual, the init() function returns a new instance, the deinit() method frees the memory.

We also add some methods that will help us keep the code concise:

Row.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Methods
//
///////////////////////////////////////////////////////////////////////////////

/// Length of the real row.
pub fn clen(row: *Row) usize {
    return row.chars.items.len;
}

/// Length of the rendered row.
pub fn rlen(row: *Row) usize {
    return row.render.len;
}

Zero-length initialization

In this line

.render = &.{},

you might wonder why that notation: &.{}. It's a zero-length slice. The official documentation says:

A zero-length initialization can always be used to create an empty slice,
even if the slice is mutable. This is because the pointed-to data is zero
bits long, so its immutability is irrelevant.

It's different from initializing a slice to undefined, because here the slice has a known length, which is 0. So you can loop it safely, provided that you check its length and don't access any index, since it's empty.

for (&.{}) |c| {} // ok

I think it's always preferable to initialize a slice this way, rather than with undefined.
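A quick test makes the point more concrete. This is only a sketch of mine, not code from the editor: it checks that an empty slice has length 0, that looping over it never runs the body, and that passing it to free() is harmless (this matters for us, because Row.deinit() frees row.render even when it was never allocated).

```zig
const std = @import("std");

test "zero-length slice" {
    // a mutable slice, initialized with the zero-length notation
    const empty: []u8 = &.{};
    try std.testing.expectEqual(@as(usize, 0), empty.len);

    // looping over it is safe: the body never runs
    var count: usize = 0;
    for (empty) |_| count += 1;
    try std.testing.expectEqual(@as(usize, 0), count);

    // freeing it is safe too: free() does nothing for a zero-length slice
    std.testing.allocator.free(empty);
}
```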

The View type

This type only contains fields. This is the full type, nothing more will be added. It tracks the cursor position and the portion of the buffer that is shown in the main window.

View.zig
//! A View of the current buffer is what we can see of it, and where the
//! cursor lies in it. It's basically the editor window where the file is
//! shown.

/// cursor column
cx: usize = 0,

/// cursor line
cy: usize = 0,

/// column in the rendered row
rx: usize = 0,

/// wanted column when moving vertically across shorter lines
cwant: usize = 0,

/// the top visible line, increases as we scroll down
rowoff: usize = 0,

/// the leftmost visible column
coloff: usize = 0,

Digression: default initializers

When defining the Screen type, I wrote that Zig supports default initializers for structs, with a catch. The catch is that they may be the source of illegal behaviors, as stated by the official language reference.

In the example from that link, it's not too clear at first sight what the problem in that struct is, so it's worth pointing it out.

Given this struct:

const Threshold = struct {
    minimum: f32 = 0.25,
    maximum: f32 = 0.75,

    fn validate(t: Threshold, value: f32) void {
        _ = value; // unused in this reduced example
        assert(t.maximum >= t.minimum);
    }
};

If we create a variable like this:

var a = Threshold{ .maximum = 0.2 };

we created a variable where the maximum is smaller than the minimum, and the validate() function would panic at runtime. So if in your code you rely on the assumption that maximum is always greater than minimum, you could fall into some illegal behavior.

For this reason in this program I avoid default initializers for complex types that have methods, which may access those values. I only use them for simple types without methods, because it's hard to give up the convenience of being able to write:

var a = SomeType{};

For more complex types I use an init() function that returns the instance, as it's customary to have such functions, and set default values there.
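As an illustration of that approach, this is how the Threshold example might look without default initializers. It's a sketch under my own assumptions: the init() function and the InvalidThreshold error are invented for this example, they don't come from the language reference.

```zig
const std = @import("std");

const Threshold = struct {
    minimum: f32,
    maximum: f32,

    /// The only intended way to create an instance: the invariant is
    /// checked once, here, instead of being assumed later.
    fn init(minimum: f32, maximum: f32) !Threshold {
        if (maximum < minimum) return error.InvalidThreshold;
        return .{ .minimum = minimum, .maximum = maximum };
    }
};

test "init rejects an inverted range" {
    const ok = try Threshold.init(0.25, 0.75);
    try std.testing.expect(ok.maximum >= ok.minimum);

    try std.testing.expectError(error.InvalidThreshold, Threshold.init(0.25, 0.2));
}
```

Without default values, Threshold{ .maximum = 0.2 } doesn't compile anymore, because the minimum field is missing. Nothing stops you from writing both fields by hand, though, so init() is a convention more than a guarantee.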

undefined as default value

undefined is generally used for local variables whose lifetime is limited and obvious, as obvious is the place where they acquire a meaningful value. The compiler will not warn you if you use a variable set to undefined. Instead, it will refuse to compile if you don't initialize a member. Therefore you should have a really good reason to set an undefined default value inside structs.

Anyway, why use undefined at all? For example, sometimes you need a variable declared beforehand in an upper scope. In this case, setting it to a value that is meant to be overwritten would cause confusion: why am I setting it to that value? The intent of undefined is clear instead: this variable must acquire a meaningful value later on.

Sze from the Ziggit forum says:

You are telling the compiler that you want the value to be undefined. And there aren’t enough safety checks yet so that all ways to use such an undefined value would be caught reliably. So for now you have to be careful and it is better to only use undefined, when you are making sure that you are setting it to a valid value before you actually use it. In cases where some field sometimes needs to be set to undefined, it is better to avoid using a field default value for that and instead pass undefined for that field value explicitly during initialization/setup.

Initialize the editor

Let's open main.zig and initialize the editor. Remove the _ = allocator; line from the main() function and add this in its place:

    var e = try t.Editor.init(allocator, try ansi.getWindowSize());
    defer e.deinit();

    var args = std.process.args();
    _ = args.next(); // ignore first arg

    try e.startUp(args.next()); // possible file to open

If you remember, the Editor.init() function had this signature:

pub fn init(allocator: std.mem.Allocator, screen: t.Screen) !Editor

which means that, besides an allocator, it wants to know the size of the screen, which is what getWindowSize() fetches.

If the ansi and types modules aren't being imported, add them to the constants.

main.zig
const t = @import("types.zig");
const ansi = @import("ansi.zig");

Next we process the command line arguments: we skip the first one, since it's the name of our executable, then we start up the editor passing the second argument, which could be null.

Keypress processing

Before starting to draw anything, let's handle keypresses, because it's easier, shorter, and as a bonus we'll have a way to quit the editor if we build it (remember that our raw mode disables Ctrl-C).

By the way, did you follow my advice to install ctags? Because from now on, we'll move very often from function to function, file to file, and having to spend half a minute to find something completely kills the fun, believe me.

This is our event loop in Editor.startUp():

    while (e.should_quit == false) {
        // refresh the screen
        // process keypresses
    }

We replace the second placeholder comment with a call to a new method:

    while (e.should_quit == false) {
        // refresh the screen
        try e.processKeypress();
    }

Let's create the function:

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Keys processing
//
///////////////////////////////////////////////////////////////////////////////

/// Process a keypress: will wait indefinitely for readKey, which loops until
/// a key is actually pressed.
fn processKeypress(e: *Editor) !void {
    const k = try ansi.readKey();

    const static = struct {
        var q: u8 = 3;
    };

    switch (k) {
        .ctrl_q => {
            if (static.q > 1) {
                static.q -= 1;
                return;
            }
            try ansi.clearScreen();
            e.should_quit = true;
        },
        else => {},
    }

    // reset quit counter for any keypress that isn't Ctrl-Q
    static.q = 3;
}

This function calls ansi.readKey() (which we haven't written yet), then handles the keypress. The only keypress that we handle for now is Ctrl-Q, and we want to press it 3 times in a row before quitting.

It needs to import ansi.zig:

Editor.zig
const ansi = @import("ansi.zig");

Static variables in Zig

See that static struct? Zig doesn't have the concept of static variables that are local to a function, like in C. But you can achieve the same effect by declaring a constant struct inside the function, and define variables (not fields!) inside of it. You don't need to call it static, of course, it can have any name.

And that .ctrl_q? It's an enum field, of an enum that we didn't write yet.
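Going back to the static struct for a moment, here it is in isolation. This is a minimal sketch, and nextId() is a made-up function that isn't part of the editor: the variable inside the struct keeps its value between calls, exactly like the quit counter in processKeypress().

```zig
const std = @import("std");

/// Return 1 on the first call, 2 on the second, and so on.
fn nextId() u32 {
    // the struct is only a namespace: `counter` is a container-level
    // variable, not a field, so it keeps its value between calls
    const static = struct {
        var counter: u32 = 0;
    };
    static.counter += 1;
    return static.counter;
}

test "the counter survives between calls" {
    const first = nextId();
    const second = nextId();
    try std.testing.expectEqual(first + 1, second);
}
```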

Things we're missing

We can't compile yet. We must add:

  • Key enum
  • ansi.readKey()
  • ansi.clearScreen()

The Key enum

Let's start with the enum, it will be placed in the 'other types' section of the types module:

types.zig
/// ASCII codes of the keys, as they are read from stdin.
pub const Key = enum(u8) {
    ctrl_b = 2,
    ctrl_c = 3,
    ctrl_d = 4,
    ctrl_f = 6,
    ctrl_g = 7,
    ctrl_h = 8,
    tab = 9,
    ctrl_j = 10,
    ctrl_k = 11,
    ctrl_l = 12,
    enter = 13,
    ctrl_q = 17,
    ctrl_s = 19,
    ctrl_t = 20,
    ctrl_u = 21,
    ctrl_z = 26,
    esc = 27,
    backspace = 127,
    left = 128,
    right = 129,
    up = 130,
    down = 131,
    del = 132,
    home = 133,
    end = 134,
    page_up = 135,
    page_down = 136,
    _,
};

This is a non-exhaustive enum: it has an underscore as last element.

Generally, enums are a strongly namespaced type: you can't convert an integer to an enum value if the enum doesn't have a member with that value. Non-exhaustive enums are more permissive: they are like a set of all integers of a certain type, some of which have been given a name.

This means that we will be able to cast an integer to an enum member (with @enumFromInt), even if the enum doesn't have a member for that integer.

Why do we want this? Because we aren't going to give a name to all possible keys:

  • readKey() will read u8 characters through readChars()
  • readKey() will return a Key, so it must be able to turn any u8 character into a Key enum member

But this character may be a letter, a digit, or anything that doesn't have a field in that enum. We want full interoperability with all possible u8 values.
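The following sketch shows the round trip on a reduced version of the enum. Only two members are kept to make it short, and describe() is a helper I invented for the example. Note the _ prong in the switch, which catches all the values that don't have a name.

```zig
const std = @import("std");

/// Reduced version of the Key enum, for illustration only.
const Key = enum(u8) {
    enter = 13,
    esc = 27,
    _,
};

/// Hypothetical helper: tell named keys apart from unnamed ones.
fn describe(k: Key) []const u8 {
    return switch (k) {
        .enter => "enter",
        .esc => "esc",
        _ => "unnamed", // any other u8 value
    };
}

test "any u8 can become a Key" {
    // 13 has a name, so we get that member
    const a: Key = @enumFromInt(13);
    try std.testing.expectEqual(Key.enter, a);

    // 'x' has no name, but the conversion is still valid
    const b: Key = @enumFromInt('x');
    try std.testing.expectEqualStrings("unnamed", describe(b));

    // and we can get the original character back
    try std.testing.expectEqual(@as(u8, 'x'), @intFromEnum(b));
}
```

If the enum were exhaustive (without the final underscore), converting 'x' would be illegal behavior, caught as a panic in safe builds.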

Reading keys

The ansi module needs a new constant, from the Zig standard library:

ansi.zig
const asc = std.ascii;

Let's write ansi.readKey().

ansi.zig
/// Read a character from stdin. Wait until at least one character is
/// available.
pub fn readKey() !t.Key {
    // code to come...
}

We'll use a [4]u8 buffer to store the keys that will be read. We'll feed this to the same readChars() that we've used before.

ansi.zig: readKey()
    // we read a sequence of characters in a buffer
    var seq: [4]u8 = undefined;
    const nread = try linux.readChars(&seq);

    // if the first character is ESC, it could be part of an escape sequence
    // in this case, nread will be > 2, that means that more than two
    // characters have been read into the buffer, and it's an escape sequence
    // for sure, if we can't recognize this sequence we return ESC anyway

If you remember, that function has a loop that ignores .WouldBlock errors, and it's guaranteed to read at least one byte from stdin before returning. If the keypress is a special key which uses CSI escape sequences, there will be more characters. We read up to 4 characters, then we decide what to do with them.

You can verify that the sequences are correct by opening a terminal, pressing Ctrl-V and then the key. For example:

keys           sequence   character-by-character
Ctrl-V Left    ^[[D       ESC [ D
Ctrl-V Del     ^[[3~      ESC [ 3 ~

We use @enumFromInt to cast a character in the sequence to a Key enum member, which might not be defined, but it won't be a problem since our enum is non-exhaustive.

ansi.zig: readKey()
    const k: t.Key = @enumFromInt(seq[0]);

Note that this function doesn't guarantee that we interpret all possible escape sequences: if a sequence isn't recognized, ESC is returned.

We also handle the case that more than one character has been read, but it's not an escape sequence (nread > 1). It's possibly a multi-byte character and we don't handle those, so we return ESC.

If instead it's a single character, it is returned as-is.

ansi.zig: readKey()
    if (k == .esc and nread > 2) {
        if (seq[1] == '[') {
            if (nread > 3 and asc.isDigit(seq[2])) {
                if (seq[3] == '~') {
                    switch (seq[2]) {
                        '1' => return .home,
                        '3' => return .del,
                        '4' => return .end,
                        '5' => return .page_up,
                        '6' => return .page_down,
                        '7' => return .home,
                        '8' => return .end,
                        else => {},
                    }
                }
            }
            switch (seq[2]) {
                'A' => return .up,
                'B' => return .down,
                'C' => return .right,
                'D' => return .left,
                'H' => return .home,
                'F' => return .end,
                else => {},
            }
        }
        else if (seq[1] == 'O') {
            switch (seq[2]) {
                'H' => return .home,
                'F' => return .end,
                else => {},
            }
        }
        return .esc;
    }
    else if (nread > 1) {
        return .esc;
    }
    return k;

clearScreen()

We also add a clearScreen() function:

ansi.zig
/// Clear the screen.
pub fn clearScreen() !void {
    try linux.write(ClearScreen);
}

At this point, if we compile and run we should get an empty prompt. If we then press Ctrl-Q three times in a row, the program should clear the screen and quit.

Reading and Writing

Before we can draw anything, we must be able to open a file, read all of its lines and store them in our Buffer.

In main(), the first command line argument is passed to the Editor.startUp() function. If it is non-null, the file will be opened if it exists.

To handle read/write operations, we'll use the Io.Reader and Io.Writer interfaces. They have methods to process incoming/outgoing data and can do buffered reading and writing. They are interfaces, meaning that they operate in the same way regardless of what they are attached to. So whether you read from stdin or from a file, you'll have access to the same ways of processing data.

They have been only recently added to the Zig standard library and are a vast subject, so I will only mention that they exist, and that we'll be using them for some tasks.

For now we can only read a file, because we don't have the means to fill our Buffer rows yet.

Opening a file

In order, we're going to:

  • update our buffer filename, to match the path of the file we're going to open

  • try to open the file itself and read its lines

  • if that fails, we start editing an empty file with the given name

Let's update our Editor.startUp():

Editor.zig: startUp()
-    if (path) |name| {
-        _ = name;
-        // we open the file
-    }
+    if (path) |name| {
+        try e.openFile(name);
+    }

Just below startUp(), we inaugurate a new section for file operations, and we add an openFile() function:

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              File operations
//
///////////////////////////////////////////////////////////////////////////////

/// Open a file with `path`.
fn openFile(e: *Editor, path: []const u8) !void {
    // code to come...
}

Naming the buffer

We update the buffer name from the path argument:

Editor.zig: openFile()
    var B = &e.buffer;

    // store the filename into the buffer
    B.filename = try e.updateString(B.filename, path);

To update the filename, we write a helper function (I put the Helpers section at the bottom, above the Constants section):

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Helpers
//
///////////////////////////////////////////////////////////////////////////////

/// Update the string, freeing the old one and allocating from `path`.
fn updateString(e: *Editor, old: ?[]u8, path: []const u8) ![]u8 {
    t.freeOptional(e.alc, old);
    return try e.alc.dupe(u8, path);
}

For now we can't rename a buffer, so the old filename will always be null, which is fine because we made our Buffer.filename an optional type.
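The free-then-duplicate idea can be tried in isolation. The following is a minimal sketch under some assumptions: replaceString() is a hypothetical variant of updateString() that doesn't need an Editor, and it duplicates the new string before freeing the old one, so that a failed allocation leaves the old value untouched.

```zig
const std = @import("std");

/// Hypothetical variant of updateString(): duplicate `new` first, then free
/// the old string, so a failed allocation leaves the old value untouched.
fn replaceString(alc: std.mem.Allocator, old: ?[]u8, new: []const u8) ![]u8 {
    const copy = try alc.dupe(u8, new);
    if (old) |s| alc.free(s);
    return copy;
}

test "replace an optional string" {
    const alc = std.testing.allocator; // reports leaks when the test ends
    var filename: ?[]u8 = null;
    defer if (filename) |s| alc.free(s);

    filename = try replaceString(alc, filename, "a.txt"); // old is null
    filename = try replaceString(alc, filename, "b.txt"); // old is freed
    try std.testing.expectEqualStrings("b.txt", filename.?);
}
```

Because the test uses std.testing.allocator, forgetting to free the old string would make the test fail with a leak report.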

Open the file

After having stored the new filename into the Buffer, we try to open the file. std.fs.cwd().openFile() is how we open files, and it works on both relative and absolute paths, so we don't have to worry about that.

Editor.zig: openFile()
    // read lines if the file could be opened
    const file = std.fs.cwd().openFile(path, .{ .mode = .read_only });
    if (file) |f| {
        defer f.close();
        try e.readLines(f);
    }

openFile() expects an OpenMode enum value, which is one of the following:

std.fs.File
pub const OpenMode = enum {
    read_only,
    write_only,
    read_write,
};

We're opening to read, so our .mode is .read_only.

The function openFile() returns an error union, so we use captures on our if statement: the first branch captures the file, the else branch captures the error. If the file doesn't exist (error.FileNotFound) we don't want to quit; instead we assume we're editing a new file. If the file exists, we read its lines, without forgetting to close() the file handle.

Editor.zig: openFile()
    else |err| switch (err) {
        error.FileNotFound => {}, // new unsaved file
        else => return err,
    }
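The same if/else capture pattern can be tested on its own. This is a minimal sketch that assumes the same std.fs API the guide uses; canOpen() and the file name in the test are hypothetical and exist only for the demonstration.

```zig
const std = @import("std");

/// Return true if `path` can be opened for reading, false if it doesn't
/// exist. Any other error is passed on to the caller.
fn canOpen(path: []const u8) !bool {
    const file = std.fs.cwd().openFile(path, .{ .mode = .read_only });
    if (file) |f| {
        f.close();
        return true;
    } else |err| switch (err) {
        error.FileNotFound => return false, // not an error for our purposes
        else => return err,
    }
}

test "a missing file is not an error" {
    // Assumption: no file with this name exists in the working directory.
    try std.testing.expect(!(try canOpen("no-such-file-for-this-sketch.txt")));
}
```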

Io.Reader

std.fs.File can give us a reader that exposes the Io.Reader interface, so we'll use that to read its lines. A simple pattern would be like the following:

Editor.zig
/// Read all lines from file.
fn readLines(e: *Editor, file: std.fs.File) !void {
    _ = e;
    var buf: [1024]u8 = undefined;
    var reader = file.reader(&buf);

    while (reader.interface.takeDelimiterExclusive('\n')) |line| {
        // we print the line to stderr, to see if it works
        std.debug.print("{s}\n", .{line});
    }
    else |err| if (err != error.EndOfStream) return err;
}

file is the file that has already been opened and is ready to be read. We create a buffer on the stack, then we initialize its reader. Io.Reader actually lives in reader.interface, so Io.Reader methods will be called on the interface.

We stop at error.EndOfStream, which means our file has been fully read. Any other error is returned to the caller.

Now, this implementation is simple, but it has a problem: the buffer is on the stack, and has fixed size. Which means that we can't read lines longer than its size. If a file has lines that are longer than that, it will error out. We'll fix this later.

Anyway, let's test this. Create a file named kilo at the root of the project:

~/kilo-zig/kilo
#!/bin/sh

~/kilo-zig/zig-out/bin/kilo "$@" 2>err.txt

Then

chmod u+x kilo

It will run the program and write stderr output to err.txt. Compile and run with an argument: the lines of the file should be written into err.txt:

./kilo src/main.zig

Remember that we still have to press Ctrl-Q three times to quit.

Digression: assignments in Zig

In Zig, it's really important that you pay attention to details, which you may find confusing or even frustrating if your experience is mainly with OOP languages. It has to do with the fact that in Zig, as in C, all assignments imply a copy by value.

This is especially important when assigning struct values. Most OOP languages, when assigning objects, take a reference to them. But in Zig structs are not references, they are values, and they are copied when assigned.

The case of the Io.Reader interface

The interface is a nested struct in the reader. To work, it uses a builtin function called @fieldParentPtr() that derives the address of its parent from the address of the field, so that the interface knows the address of the struct that contains it. But if instead of writing:

var reader = file.reader(&buf);
while (reader.interface.takeDelimiterExclusive('\n')) |line|

you write:

var reader = file.reader(&buf).interface;
while (reader.takeDelimiterExclusive('\n')) |line|

then you make a copy of that interface, which is orphaned: it can't compute a valid address of its parent because it doesn't have one, and it is essentially broken.

There's also the problem that file.reader(&buf), which is the legitimate parent, in the second form doesn't have a stable address, because it's not assigned to any variable, meaning that in the second expression it's temporary memory that becomes immediately invalid at the end of the assignment. So even if interface wasn't a copy and could still get its address, it would be invalid memory anyway.

The program will panic at runtime (in safe builds!), and the error reported can be hard to understand. Unfortunately Zig documentation is still immature, so right now you'll have to find out the hard way how these things work.
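The effect can be reproduced without any I/O. The following is a minimal sketch, not the real standard library code: Parent and Interface are hypothetical types that only mimic the way a reader embeds its interface field, and parent() plays the role of the internal @fieldParentPtr() call.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Hypothetical types that only mimic the shape of a reader and its interface.
const Interface = struct {
    calls: usize = 0,

    /// Recover the parent from a pointer to its `interface` field.
    fn parent(i: *Interface) *Parent {
        return @alignCast(@fieldParentPtr("interface", i));
    }
};

const Parent = struct {
    id: u32,
    interface: Interface = .{},
};

test "assignment copies, so a copied field has no parent" {
    var p = Parent{ .id = 42 };

    // A pointer into `p` leads back to `p`.
    try expect(Interface.parent(&p.interface) == &p);
    try expect(Interface.parent(&p.interface).id == 42);

    // Assignment copies the field by value: `copy` lives somewhere else.
    var copy = p.interface;
    copy.calls += 1;

    // The original is unchanged, and `&copy` is not an address inside `p`.
    // Calling parent() on `&copy` would compute a pointer to memory that
    // isn't a Parent at all, which is why this sketch doesn't do it.
    try expect(p.interface.calls == 0);
    try expect(&copy != &p.interface);
}
```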

These kinds of issues can be frustrating if you're used to OOP languages, which are generally designed to perform complex operations under the hood, hiding the details of the implementation from the user, for the sake of ease of use.

In OOP languages when you assign something, often you aren't copying by value, but you are taking a reference to an object. In Zig you are expected to understand what assignments do (they always copy by value), and what you are really assigning.

Another example: many OOP languages have private fields, which can't be accessed outside of a certain scope. Zig has nothing like that, and everything is in plain sight, but it expects that you know what you're doing. As the creator of Zig said:

it all comes down to simplicity. Other languages hide complex details from
you; Zig keeps things simpler but in exchange requires you to understand
those details.

That said, there's probably room for improvement, and possibly there will be ways, in the future, to at least prevent accidental mistakes.

Filling rows

Now that we can read a file line by line, we must store these lines in our Buffer rows.

We'll modify readLines() so that it will insert the row.

Editor.zig: readLines()
-    _ = e;
     var buf: [1024]u8 = undefined;
     var reader = file.reader(&buf);
-    while (reader.interface.takeDelimiterExclusive('\n')) |line| {
-        // we print the line to stderr, to see if it works
-        std.debug.print("{s}\n", .{line});
+    while (reader.interface.takeDelimiterExclusive('\n')) |line| {
+        try e.insertRow(e.buffer.rows.items.len, line);

which means that each row is appended at the end of Buffer.rows.

Watch out for the reading buffer

We'll also fix one problem with the way we're currently reading the file. We're using a fixed buffer placed on the stack, and that's ok, because our file.reader needs a buffer. But the way this reader works is that the buffer is filled with the line being read, then a row is inserted with the content of this buffer.

If the line is longer than the buffer, the program will quit with an error:

error: StreamTooLong

I don't know if there's a way to salvage the line that has just been read and be able to handle the error in the else branch. My first guess is no.

We could allocate a very large buffer and use that:

    const buf = try e.alc.alloc(u8, 60 * 1024 * 1024);
    defer e.alc.free(buf);
    var reader = file.reader(buf);

But this approach has multiple problems:

  • it's very slow, because allocating such a large buffer is expensive
  • we could get an OutOfMemory error
  • it doesn't solve the problem that you might still have files with lines longer than that

Using an allocating Reader

So we use another solution (suggested on Ziggit forum):

Editor.zig: readLines()
-    while (reader.interface.takeDelimiterExclusive('\n')) |line| {
-        try e.insertRow(e.buffer.rows.items.len, line);
-    }
-    else |err| if (err != error.EndOfStream) return err;
+    var line_writer = std.Io.Writer.Allocating.init(e.alc);
+    defer line_writer.deinit();
+
+    while (reader.interface.streamDelimiter(&line_writer.writer, '\n')) |_| {
+        try e.insertRow(e.buffer.rows.items.len, line_writer.written());
+        line_writer.clearRetainingCapacity();
+        reader.interface.toss(1); // skip the newline
+    }
+    else |err| if (err != error.EndOfStream) return err;

With this approach the reader doesn't store the line it is reading in a line slice: it feeds it to an allocating Writer, which stores the line in itself, allocating as much memory as needed.

It uses another method of the Reader interface:

  • instead of takeDelimiterExclusive, which doesn't take a Writer as argument, it will use streamDelimiter, which does

  • it must toss the last character, because streamDelimiter doesn't skip it, like takeDelimiterExclusive would do

Way too complex?

You can see that this is quite complex. I needed the help of experienced Zig users just to read the lines of the file. But this is a temporary problem, because the Reader and Writer interfaces are very new, and they still lack convenience, which has already been promised and will come soon in the next Zig versions.

Inserting a row

If you remember, our Row type had two arrays:

Row.zig
/// The ArrayList with the actual row characters
chars: t.Chars,

/// Array with the visual representation of the row
render: []u8,

where Chars is actually a std.ArrayList(u8), which we'll be using a lot.

In our insertRow() function, what we'll do is:

  • initialize a new Row
  • copy the line into row.chars
  • insert the row in Buffer.rows

Finally we'll update the row, and set the dirty flag.

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Row operations
//
///////////////////////////////////////////////////////////////////////////////

/// Insert a row at index `ix` with content `line`, then update it.
fn insertRow(e: *Editor, ix: usize, line: []const u8) !void {
    const B = &e.buffer;

    var row = try t.Row.init(B.alc);
    try row.chars.appendSlice(B.alc, line);

    try B.rows.insert(B.alc, ix, row);

    try e.updateRow(ix);
    B.dirty = true;
}

We set the dirty flag because the same function will be used while modifying the buffer, but for now we're just reading the file. This flag will be reset in openFile().

Add this at the bottom of openFile():

Editor.zig: openFile()
     else |err| switch (err) {
         error.FileNotFound => {}, // new unsaved file
         else => return err,
     }
+    B.dirty = false;
 }

Updating a row

Updating the row means that we must update the render field from the chars field. That is, we must generate what will be actually rendered on screen.

At this point, the only way they will differ is the possible presence of tab characters in our chars ArrayList: each tab is rendered as one or more spaces, up to the next tab stop.

Let's say we want to make the tabstop width an option, so that it can be configured. We create a src/option.zig file and paste the following:

option.zig
//! Editor options. For now they are hard-coded and cannot be modified from
//! inside the editor, neither are read from a configuration file.

/// Number of spaces a tab character accounts for
pub var tabstop: u8 = 8;

As the description says, they're hard coded, but we'll still use a module, so that we can test different options ourselves if we want.

We'll also have to import it in the Constants section:

Editor.zig
const opt = @import("option.zig");

rowAt() and currentRow()

We'll write other helper functions that we'll use a lot:

Editor.zig
/// Get the row pointer at index `ix`.
fn rowAt(e: *Editor, ix: usize) *t.Row {
    return &e.buffer.rows.items[ix];
}

/// Get the row pointer at cursor position.
fn currentRow(e: *Editor) *t.Row {
    return &e.buffer.rows.items[e.view.cy];
}

Because frankly, taking that pointer every time becomes annoying after a while.

We shouldn't worry about the cost of the extra function calls: in optimized builds the compiler will usually inline small functions like these. Zig has no macros, so writing small functions is actually the Zig way to write macros.

updateRow()

The purpose of this function is to update the rendered row, which is what we see on screen.

Editor.zig
/// Update row.render, that is the visual representation of the row.
/// Performs a syntax update at the end.
fn updateRow(e: *Editor, ix: usize) !void {
    // code to come...
}

Allocator.realloc()

Editor.zig: updateRow()
    const row = e.rowAt(ix);

    // get the length of the rendered row and reallocate
    const rlen = // ??? total size of the rendered row ???
    row.render = try e.alc.realloc(row.render, rlen);

As explained before, I chose to make row.render a plain slice because we can compute its size before any reallocation happens. Most of the time a call to realloc() should not result in a new allocation, because it roughly does the following:

  • if the previous size is 0 (first time the row is updated) and new size is bigger, there is an allocation
  • if the new size is smaller (characters are deleted), it is resized
  • if the new size is slightly bigger (such as when inserting a single character while typing), most of the time it will extend the array without reallocating
  • it would only allocate when the size is bigger and it's not possible to extend the array

An ArrayList would bring some benefits, but also increase total memory usage. For now we'll keep it simple, but we'll keep it in mind.
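To get a feeling for how realloc() is used on a slice that starts empty, here is a minimal sketch. setRender() is a hypothetical, simplified stand-in for updateRow() that skips tab expansion; whether the memory is actually moved depends on the allocator, so the sketch only checks the resulting content.

```zig
const std = @import("std");

/// Resize `render` to fit `text` and copy `text` into it.
/// Hypothetical, simplified stand-in for updateRow() without tab expansion.
fn setRender(alc: std.mem.Allocator, render: []u8, text: []const u8) ![]u8 {
    const out = try alc.realloc(render, text.len);
    @memcpy(out, text);
    return out;
}

test "realloc works on an empty slice, then grows and shrinks it" {
    const alc = std.testing.allocator;

    // Start empty, like row.render before the first update.
    var render: []u8 = &[_]u8{};
    defer alc.free(render);

    render = try setRender(alc, render, "hello"); // first allocation
    render = try setRender(alc, render, "hello world"); // grows
    try std.testing.expectEqualStrings("hello world", render);

    render = try setRender(alc, render, "hi"); // shrinks
    try std.testing.expectEqualStrings("hi", render);
}
```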

Looping characters of the real row

Editor.zig: updateRow()
    var idx: usize = 0;
    var i: usize = 0;

    while (i < row.chars.items.len) : (i += 1) {
        if (row.chars.items[i] == '\t') {
            row.render[idx] = ' ';
            idx += 1;
            while (idx % opt.tabstop != 0) : (idx += 1) {
                row.render[idx] = ' ';
            }
        }
        else {
            row.render[idx] = row.chars.items[i];
            idx += 1;
        }
    }

What the loop does, is that it inserts in row.render the same character when it's not a tab, otherwise it will convert it to spaces, making some considerations in the process:

  • inside the loop, idx is the current column in the rendered row
  • we want a minimum of one space, so we add it, and increase idx
  • we want to see if there are more spaces to add, and this is true if (idx % tabstop != 0)

For example, assuming tabstop = 8, at the start of a line, where idx is 0, a Tab would insert 8 spaces.

But a Tab typed in the middle of a row won't necessarily add tabstop spaces, because the starting column in the rendered row may not be a multiple of tabstop. If we insert a tab at idx = 12, we have a space insertion, which makes idx = 13, then 3 more spaces (at columns 13, 14 and 15), because 13 % 8 = 5 and 8 - 5 = 3. The next character lands on column 16.
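The two cases can be checked with a small standalone version of the loop. This is only a sketch: expandTabs() is a hypothetical function that takes the tabstop as a parameter and writes into a buffer provided by the caller, which is assumed to be large enough.

```zig
const std = @import("std");

/// Expand tabs in `chars` into `out`, returning the rendered part of `out`.
/// Same loop as in updateRow(), but `out` must already be large enough.
fn expandTabs(out: []u8, chars: []const u8, tabstop: usize) []u8 {
    var idx: usize = 0;
    for (chars) |c| {
        if (c == '\t') {
            // at least one space, then more until the next tab stop
            out[idx] = ' ';
            idx += 1;
            while (idx % tabstop != 0) : (idx += 1) {
                out[idx] = ' ';
            }
        } else {
            out[idx] = c;
            idx += 1;
        }
    }
    return out[0..idx];
}

test "tabs are expanded up to the next tab stop" {
    var buf: [64]u8 = undefined;

    // Tab at column 0: 8 spaces.
    try std.testing.expectEqualStrings(
        " " ** 8 ++ "x",
        expandTabs(&buf, "\tx", 8),
    );

    // Tab at column 12: 1 + 3 spaces, the next character is at column 16.
    try std.testing.expectEqualStrings(
        "abcdefghijkl" ++ " " ** 4 ++ "x",
        expandTabs(&buf, "abcdefghijkl\tx", 8),
    );
}
```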

Computing beforehand the size of the rendered row

    // get the length of the rendered row and reallocate
    const rlen = // ??? total size of the rendered row ???
    row.render = try e.alc.realloc(row.render, rlen);

We didn't assign anything to rlen. How do we know how long our rendered row will be? We'll have to do something similar to what we do inside the loop in updateRow(), but we just increase idx and return the final value. Since our program will often have to convert a real column index to an index in the rendered row, we write a function that does that.

We call the function cxToRx() and the call becomes:

-    const rlen = // ??? total size of the rendered row ???
+    const rlen = row.cxToRx(row.chars.items.len);

That is, we calculate the index in the rendered row for the last column of the real row.

We put this function in Row.zig, because it is in agreement with how we wanted to design our types: they shouldn't change the state of the Editor, but they can return their own state. Here Row will not modify itself, so it's ok.

Row.zig: methods section
/// Calculate the position of a real column in the rendered row.
pub fn cxToRx(row: *Row, cx: usize) usize {
    var rx: usize = 0;
    for (0..cx) |i| {
        if (row.chars.items[i] == '\t') {
            rx += (opt.tabstop - 1) - (rx % opt.tabstop);
        }
        rx += 1;
    }
    return rx;
}

The loop is a bit different here, because instead of two nested loops we have only one. That's because we don't need to modify the row in any way, so we can calculate the needed spaces in a single operation. Which is quite a bit more difficult to understand, to be honest. Feel free to recreate an example loop step by step as we did above.
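Here is one such step-by-step example, as a sketch. This standalone cxToRx() is a hypothetical variant that takes the characters and the tabstop as parameters instead of reading them from a Row and from option.zig. For the row "ab\tc" with tabstop = 8 the values of rx are traced in the comments.

```zig
const std = @import("std");

/// Standalone variant of Row.cxToRx(): position of column `cx` in the
/// rendered version of `chars`.
fn cxToRx(chars: []const u8, cx: usize, tabstop: usize) usize {
    var rx: usize = 0;
    for (chars[0..cx]) |c| {
        if (c == '\t') {
            // spaces needed to reach the column just before the next stop
            rx += (tabstop - 1) - (rx % tabstop);
        }
        rx += 1;
    }
    return rx;
}

test "trace of cxToRx on a row with a tab" {
    const row = "ab\tc";
    // 'a':  rx = 0 -> 1
    // 'b':  rx = 1 -> 2
    try std.testing.expectEqual(@as(usize, 2), cxToRx(row, 2, 8));
    // '\t': rx += 7 - (2 % 8) = 5 -> 7, then += 1 -> 8
    try std.testing.expectEqual(@as(usize, 8), cxToRx(row, 3, 8));
    // 'c':  rx = 8 -> 9, which is also the length of the rendered row
    try std.testing.expectEqual(@as(usize, 9), cxToRx(row, row.len, 8));
}
```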

Also this function needs to import option.zig, so do that.

Note

We don't handle multi-byte characters, and we don't have virtual text of any kind. In a real editor this function would be more complex.

Write a test

Let's write a test to see if what we wrote is working.

Run tests from main.zig

To run this test with

zig build test

we must add a section to our main.zig module:

main.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Tests
//
///////////////////////////////////////////////////////////////////////////////

comptime {
    if (builtin.is_test) {
        _ = @import("Editor.zig");
    }
}

In fact, our build.zig is set up in a way that running zig build test executes the tests that are in main.zig. When tests are executed from a module, all tests placed in imported modules are executed too.

We don't want any test in main.zig, but with this comptime block we import the modules we want to test, if builtin.is_test is true, and this happens only when we're running tests.

Importing these modules will cause their tests to be executed, which is what we want.

Add the test in Editor

You should add some constants in Editor:

Editor.zig
const linux = @import("linux.zig");

const mem = std.mem;
const expect = std.testing.expect;

Then we add a test to src/Editor.zig. Add the test section just above the Constants section.

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Tests
//
///////////////////////////////////////////////////////////////////////////////

test "insert rows" {
    var da = std.heap.DebugAllocator(.{}){};
    defer _ = da.deinit();

    var e = try t.Editor.init(da.allocator(), .{ .rows = 50, .cols = 180 });
    defer e.deinit(); // registered first, so it also runs if openFile() fails
    try e.openFile("src/main.zig");

    const row = e.rowAt(6).chars.items;
    try expect(mem.eql(u8, "pub fn main() !void {", row));
}

It's a simple test that verifies that the file has been read, and that the content of one row (the seventh, at index 6) actually matches the one in the file.

I initialize the editor with a 'fake' screen, because this isn't an interactive terminal. Also, we avoid the event loop by reading the file directly with openFile(), otherwise processKeypress() would hang the test.

If we modify main.zig again, this test could fail, of course. I will not, but maybe you will.

The screen surface

Now we're finally ready to start drawing on the screen.

We'll use an ArrayList to hold all characters that will be printed on every screen refresh.

We'll add a new field to our Editor type:

Editor.zig
/// String that is printed on the terminal at every screen redraw
surface: t.Chars,

It is initialized in the init() function:

Editor.zig: init()
-    return .{
-        .alc = allocator,
+    // multiply * 10, because each cell could contain escape sequences
+    const surface_capacity = screen.rows * screen.cols * 10;
+    return .{
+        .alc = allocator,
+        .surface = try t.Chars.initCapacity(allocator, surface_capacity),

We give our surface an initial capacity, so that it will probably never reallocate. We make enough room for escape sequences: potentially, almost every cell of the screen could contain an escape sequence.

surface must be deinitialized in deinit(), or it will leak:

Editor.zig: deinit()
/// Deinitialize the editor.
pub fn deinit(e: *Editor) void {
    e.buffer.deinit();
    e.surface.deinit(e.alc);

Note that we must pass the allocator as argument when deinitializing an ArrayList.

Appending to the surface

Every time we want to append to the surface, we'd need either:

try e.surface.appendSlice(e.alc, slice);

or

try e.surface.append(e.alc, character);

Let's create a helper function, because we'll append to the surface in lots of places, and we want our code to be more concise and readable.

Editor.zig
/// Append either a slice or a character to the editor surface.
fn toSurface(e: *Editor, value: anytype) !void {
    switch (@typeInfo(@TypeOf(value))) {
        .pointer => try e.surface.appendSlice(e.alc, value),
        else => try e.surface.append(e.alc, value),
    }
}

With this function we just need to do:

try e.toSurface(slice_or_character);

@TypeOf

Builtin function @TypeOf() returns a type, which can only be evaluated at compile time, hence our helper doesn't have any runtime cost, because the operation to perform is decided at compile time. As proof of this, you will get a compile error if you pass something wrong to this function.
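The compile-time dispatch can be tried without an allocator. This is a minimal sketch: Surface is a hypothetical fixed-size stand-in for the editor surface, and add() uses the same switch on @typeInfo as toSurface(). It does no bounds checking, so writing past the buffer would panic in safe builds.

```zig
const std = @import("std");

/// Hypothetical fixed-size stand-in for the editor surface.
const Surface = struct {
    buf: [64]u8 = undefined,
    len: usize = 0,

    /// Append either a slice (or string literal) or a single character.
    fn add(s: *Surface, value: anytype) void {
        switch (@typeInfo(@TypeOf(value))) {
            // slices and pointers to arrays (string literals) end up here
            .pointer => {
                @memcpy(s.buf[s.len..][0..value.len], value);
                s.len += value.len;
            },
            // u8 values and character literals end up here
            else => {
                s.buf[s.len] = value;
                s.len += 1;
            },
        }
    }

    /// The part of the buffer that has been written so far.
    fn written(s: *const Surface) []const u8 {
        return s.buf[0..s.len];
    }
};

test "one function for slices and characters" {
    var s = Surface{};
    s.add("ab"); // string literal: pointer to an array
    s.add('~'); // character literal
    const tail: []const u8 = "cd";
    s.add(tail); // slice
    try std.testing.expectEqualStrings("ab~cd", s.written());
}
```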

More escape sequences

We also add a bunch of constants to the bottom of ansi.zig. These are escape sequences that we'll use while drawing, at one point or another, so let's just add them all now:

ansi.zig
/// Background color (SGR 40 selects a black background; 49 would select the terminal default)
pub const BgDefault = CSI ++ "40m";

/// Foreground color
pub const FgDefault = CSI ++ "39m";

/// Hide the terminal cursor
pub const HideCursor = CSI ++ "?25l";

/// Show the terminal cursor
pub const ShowCursor = CSI ++ "?25h";

/// Move cursor to position 1,1
pub const CursorTopLeft = CSI ++ "H";

/// Start reversing colors
pub const ReverseColors = CSI ++ "7m";

/// Reset colors to terminal default
pub const ResetColors = CSI ++ "m";

/// Clear the content of the line
pub const ClearLine = CSI ++ "K";

/// Color used for error messages
pub const ErrorColor = CSI ++ "91m";

Refresh the screen

In startUp(), replace the commented placeholder in the event loop:

Editor.zig: startUp()
     while (e.should_quit == false) {
-        // refresh the screen
+        try e.refreshScreen();

We'll do the drawing with this function, that goes in a new section, which I put above the Helpers section:

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Screen update
//
///////////////////////////////////////////////////////////////////////////////

/// Full refresh of the screen.
fn refreshScreen(e: *Editor) !void {
    // code to come...
}

We'll have to explain what goes on.

  • we clear our ArrayList, which will eventually contain the characters that must be printed

  • we set the background color, hide the terminal cursor so that it doesn't get in the way, and move the cursor to the top left position

  • we draw the rows, later we'll also draw the statusline and the message area

Editor.zig: refreshScreen()
    e.surface.clearRetainingCapacity();

    try e.toSurface(ansi.BgDefault);
    try e.toSurface(ansi.HideCursor);
    try e.toSurface(ansi.CursorTopLeft);

    try e.drawRows();
    // try e.drawStatusline();
    // try e.drawMessageBar();

  • we move the cursor to its current position and we show it again

  • we print the whole thing with a write() call

Editor.zig: refreshScreen()
    const V = &e.view;

    // move cursor to its current position (could have been moved with keys)
    var buf: [32]u8 = undefined;
    const row = V.cy - V.rowoff + 1;
    const col = V.rx - V.coloff + 1;
    try e.toSurface(try ansi.moveCursorTo(&buf, row, col));
    try e.toSurface(ansi.ShowCursor);

    try linux.write(e.surface.items);

moveCursorTo()

To move the cursor we'll need a new function in ansi.zig:

ansi.zig
/// Return the escape sequence to move the cursor to a position.
pub fn moveCursorTo(buf: []u8, row: usize, col: usize) ![]const u8 {
    return std.fmt.bufPrint(buf, CSI ++ "{};{}H", .{ row, col });
}

It takes a slice buf, formats the escape sequence into it, and returns the part of buf that was written. Printing that sequence moves the cursor to the given position.
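The following sketch shows what the function returns, and what happens when the buffer is too small. It is self-contained, so it defines its own CSI constant, which is assumed to match the one in ansi.zig, and it uses the explicit {d} format specifier.

```zig
const std = @import("std");

// Assumption: same value as the CSI constant defined in ansi.zig.
const CSI = "\x1b[";

/// Return the escape sequence to move the cursor to a position.
fn moveCursorTo(buf: []u8, row: usize, col: usize) ![]const u8 {
    return std.fmt.bufPrint(buf, CSI ++ "{d};{d}H", .{ row, col });
}

test "moveCursorTo formats into the caller's buffer" {
    var buf: [32]u8 = undefined;
    const seq = try moveCursorTo(&buf, 3, 12);
    try std.testing.expectEqualStrings("\x1b[3;12H", seq);

    // The sequence above needs 7 bytes, so a 4-byte buffer is too small.
    var tiny: [4]u8 = undefined;
    try std.testing.expectError(error.NoSpaceLeft, moveCursorTo(&tiny, 3, 12));
}
```

The returned slice points into buf, so it is valid only as long as buf is. That's why refreshScreen() appends it to the surface right away.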

Drawing the rows

This function will be expanded later, but for now all it needs to do is to draw the rows without any highlight.

Editor.zig
/// Append rows to be drawn to the surface. Handles escape sequences for syntax
/// highlighting.
fn drawRows(e: *Editor) !void {
    // code to come...
}

We can print a number of rows which is equal to the height of our main window, which is e.screen.rows. We use a for loop with a range, but to the index y we must add e.view.rowoff, which is the current row offset. This will be greater than 0 if we scroll down our window and the first row went off-screen.

Editor.zig: drawRows()
    const V = &e.view;
    const rows = e.buffer.rows.items;

    for (0 .. e.screen.rows) |y| {
        const ix: usize = y + V.rowoff;

Since we draw by screen rows, and not by Buffer rows, ix may be equal to or greater than the number of Buffer rows, which means we are past the end of the file. In this case we draw a ~ to point that out.

Editor.zig: drawRows()
        // past buffer content
        if (ix >= rows.len) {
            try e.toSurface('~');
        }

Otherwise, we are within the file content, but it doesn't mean that there is something to print in all cases:

Editor.zig: drawRows()
        // within buffer content
        else {
            // length of the rendered line
            const rowlen = rows[ix].render.len;

            // actual length that should be drawn because visible
            var len = if (V.coloff > rowlen) 0 else rowlen - V.coloff;

For example, if we scrolled the window to the right, the leftmost columns would go off-screen, and e.view.coloff would become positive. If the line is shorter than that, nothing will be printed, because it's completely off-screen.

We also limit len to the number of screen columns:

Editor.zig: drawRows()
            len = @min(len, e.screen.cols);

If len > 0 there's something to print: the slice of the rendered line that starts at coloff and is len characters long.

We append this slice to the surface ArrayList.

Editor.zig: drawRows()
            // draw the visible part of the row
            if (len > 0) {
                try e.toSurface(rows[ix].render[V.coloff .. V.coloff + len]);
            }
        }

We end the line after that:

Editor.zig: drawRows()
        try e.toSurface(ansi.ClearLine);
        try e.toSurface("\r\n"); // end the line
    }

Again: V.coloff is 0 unless a part of the row went off-screen on the left side.
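The computation of the visible part of a row can be isolated in a small function and tested. This is a sketch: visiblePart() is a hypothetical helper that doesn't exist in the editor, where the same logic is written inline in drawRows().

```zig
const std = @import("std");

/// Return the part of `render` that is visible in a window that is `cols`
/// columns wide and scrolled `coloff` columns to the right.
fn visiblePart(render: []const u8, coloff: usize, cols: usize) []const u8 {
    // the row is completely off-screen on the left side
    if (coloff >= render.len) return "";
    const len = @min(render.len - coloff, cols);
    return render[coloff .. coloff + len];
}

test "visible part of a rendered row" {
    const row = "hello world";
    // no scrolling, narrow screen: the row is cut on the right
    try std.testing.expectEqualStrings("hello", visiblePart(row, 0, 5));
    // scrolled by 6 columns: the row is cut on the left
    try std.testing.expectEqualStrings("world", visiblePart(row, 6, 80));
    // scrolled past the end of the row: nothing to draw
    try std.testing.expectEqualStrings("", visiblePart(row, 20, 80));
}
```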

At this point, if you compile and run:

./kilo kilo

you should already be able to see the file on the screen! That's big progress. You can't move the cursor yet, but you can still quit the editor by pressing Ctrl-Q three times.

Note

You will notice that the last 2 lines of the screen don't have the ~ character: that's because in init() we subtracted 2 from the real screen height, to make room for statusline and message area.

The statusline

Uncomment the line in refreshScreen() where we draw the statusline.

Editor.zig: refreshScreen()
     try e.drawRows();
-    // try e.drawStatusline();
+    try e.drawStatusline();

The drawStatusline() function

I put this below drawRows():

Editor.zig
/// Append the statusline to the surface.
fn drawStatusline(e: *Editor) !void {
    const V = &e.view;
    // code to come...
}

We want the color of the statusline to be the inverse of the normal text color, with dark text over bright background. We want two sections, so we declare two buffers.

Editor.zig: drawStatusline()
    try e.toSurface(ansi.ReverseColors);

    var lbuf: [200]u8 = undefined;
    var rbuf: [80]u8 = undefined;

  • on the left side we want to display the filename, or [No Name] for a newly created file, and the modified state of the buffer

  • on the right side, the filetype (or no ft) and the current cursor position

Editor.zig: drawStatusline()
    // left side of the statusline
    var ls = std.fmt.bufPrint(&lbuf, "{s} - {} lines{s}", .{
        e.buffer.filename orelse "[No Name]",
        e.buffer.rows.items.len,
        if (e.buffer.dirty) " [modified]" else "",
    }) catch "";

    // right side of the statusline (leading space to guarantee separation)
    var rs = std.fmt.bufPrint(&rbuf, " | {s} | col {}, ln {}/{} ", .{
        e.buffer.syntax orelse "no ft",
        V.cx + 1,
        V.cy + 1,
        e.buffer.rows.items.len,
    }) catch "";

We'll use std.fmt.bufPrint to format the two sides of the statusline, then we'll fill with spaces the room between them, to cover the whole e.screen.cols dimension, which would be the width of the screen.

Note that we use the orelse operator to provide fallbacks for our optional variables (e.buffer.filename and e.buffer.syntax).

Since we use fixed buffers on the stack for bufPrint, there's the risk of having filenames that are so long that they won't fit. In that case we just print nothing. We do the same for the right side.

We'll prioritize the left side, in case there isn't enough room for both.

We'll have to ensure we reset colors and insert a new line at the end. We could use a defer statement for this purpose, but try isn't allowed inside defer statements, so we would have to ignore the errors and hope for the best. Instead we'll create a small helper function so that errors can still be handled. In Zig the goto statement doesn't exist, so we must get used to these kinds of alternatives.

Editor.zig: drawStatusline()
    var room_left = e.screen.cols;

    // prioritize left side
    if (ls.len > room_left) {
        ls = ls[0 .. room_left];
    }
    room_left -= ls.len;

    try e.toSurface(ls);

    if (room_left == 0) {
        try e.finalizeStatusline();
        return;
    }

Labeled blocks as goto alternative

Another alternative to goto is a labeled block, for example:

do: {
    std.debug.print("do block\n", .{});

    var i: usize = 0;
    while (i < 10) : (i += 1) {
        if (i == 5) {
            break :do;
        }
    }
    std.debug.print("no break\n", .{});
}
std.debug.print("exit\n", .{});

prints:

do block
exit

This increases the indentation level of the whole block, though, so I prefer other solutions, when possible.

To make sure we only append if there's enough room, we track the available room in the room_left variable, which is initially equal to e.screen.cols, and we reduce it as we determine the size of the left and right sides.

Append the right side and we're done:

Editor.zig: drawStatusline()
    // add right side and spaces if there is room left for them
    if (rs.len > room_left) {
        rs = rs[0 .. room_left];
    }
    room_left -= rs.len;

    try e.surface.appendNTimes(e.alc, ' ', room_left);
    try e.toSurface(rs);
    try e.finalizeStatusline();
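The room_left bookkeeping can be tested outside the editor. The following is a sketch under some assumptions: layoutStatusline() is a hypothetical function, it writes into a fixed array instead of the surface, and the length of that array plays the role of e.screen.cols.

```zig
const std = @import("std");

/// Hypothetical helper: lay out `left` and `right` in `out`.
/// The left side has priority, the gap is filled with spaces.
fn layoutStatusline(out: []u8, left: []const u8, right: []const u8) void {
    var room_left = out.len;

    // prioritize left side
    const ls = left[0..@min(left.len, room_left)];
    room_left -= ls.len;

    // the right side gets whatever is left
    const rs = right[0..@min(right.len, room_left)];
    room_left -= rs.len;

    @memcpy(out[0..ls.len], ls);
    @memset(out[ls.len .. ls.len + room_left], ' ');
    @memcpy(out[ls.len + room_left ..], rs);
}

pub fn main() void {
    var line: [24]u8 = undefined;
    // the right side doesn't fit completely, so it gets truncated
    layoutStatusline(&line, "kilo.zig - 10 lines", " | zig ");
    std.debug.print("[{s}]\n", .{&line});
}
```

Since every subtraction is preceded by a @min() against room_left, the counter can never go below zero.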

finalizeStatusline()

This is the helper function to finalize the statusline, and still be able to handle errors.

Editor.zig
/// Reset colors and append new line after statusline
fn finalizeStatusline(e: *Editor) !void {
    try e.toSurface(ansi.ResetColors);
    try e.toSurface("\r\n");
}

Compile and run, and enjoy your statusline!

The message area

For the message area we'll need more Editor fields:

Editor.zig
/// String to be printed in the message area (can be a prompt)
status_msg: t.Chars,

/// Controls the visibility of the status message
status_msg_time: i64,

Also add these constants:

Editor.zig
const time = std.time.timestamp;
const time_ms = std.time.milliTimestamp;

const initial_msg_size = 80;

Add to init():

Editor.zig: init()
        .status_msg = try t.Chars.initCapacity(allocator, initial_msg_size),
        .status_msg_time = 0,

and to deinit() (always deinitialize ArrayLists or they will leak):

Editor.zig: deinit()
    e.status_msg.deinit(e.alc);

Uncomment the line in refreshScreen() where we draw the message area.

Editor.zig: refreshScreen()
    try e.drawRows();
    try e.drawStatusline();
    try e.drawMessageBar();

The drawMessageBar() function

I put this below finalizeStatusline():

Editor.zig
/// Append the message bar to the surface.
fn drawMessageBar(e: *Editor) !void {
    try e.toSurface(ansi.ClearLine);

    var msglen = e.status_msg.items.len;
    if (msglen > e.screen.cols) {
        msglen = e.screen.cols;
    }
    if (msglen > 0 and time() - e.status_msg_time < 5) {
        try e.toSurface(e.status_msg.items[0 .. msglen]);
    }
}

As you can see, it's pretty simple. We clear the line, then if there's a message to be printed, we append it to the surface.

We also have some sort of timer: it's not a real timer, in the sense that there's no async timer running independently from the main thread. Remember that the screen is redrawn in the event loop, whose iterations are controlled by the processKeypress() function, since it's that function that halts the loop while waiting for new keys pressed by the user. So what this "timer" does is check, at every redraw, whether less than 5 seconds have passed since the message was set (status_msg_time): if so, the message is appended to the surface, otherwise nothing is appended and the message won't be printed.

status_msg_time will be updated by the function that sets a status message, but we don't have a way to set a status message yet.
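The visibility check itself is easy to isolate. In this sketch messageVisible() and msg_timeout are hypothetical names, and the current time is passed as an argument instead of being read from the clock, so that the result is predictable.

```zig
const std = @import("std");

/// How long a status message stays visible, in seconds.
const msg_timeout: i64 = 5;

/// Hypothetical helper: true if a message set at `msg_time` should still
/// be drawn at time `now` (both in seconds).
fn messageVisible(now: i64, msg_time: i64, msg_len: usize) bool {
    return msg_len > 0 and now - msg_time < msg_timeout;
}

pub fn main() void {
    const set_at: i64 = 1000;

    // redraw 2 seconds after the message was set
    std.debug.print("{}\n", .{messageVisible(set_at + 2, set_at, 12)});
    // redraw 5 seconds after the message was set
    std.debug.print("{}\n", .{messageVisible(set_at + 5, set_at, 12)});
    // there is no message at all
    std.debug.print("{}\n", .{messageVisible(set_at + 2, set_at, 0)});
}
```

Nothing happens between redraws: an expired message stays on screen until the next keypress causes a new redraw.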

The welcome message

The original kilo editor would print a welcome message when the program is started without arguments, which results in a new empty, unnamed buffer being created.

We also want it because it's cool and reminds us (or at least me) of vim.

New fields in Editor:

Editor.zig
/// String to be displayed when the editor is started without loading a file
welcome_msg: t.Chars,

/// Becomes false after the first screen redraw
just_started: bool,

Initialize in init().

Editor.zig: init()
        .welcome_msg = try t.Chars.initCapacity(allocator, 0),
        .just_started = true,

Deinitialize in deinit():

Editor.zig: deinit()
    e.welcome_msg.deinit(e.alc);

When, how, and where do we want the welcome message to appear?

  • we generate it when the argument for startUp() is null, which means there's no file to open

  • we want to generate it dynamically because the message should be centered on screen, and we can assess that only at runtime

  • we render the message in drawRows()

A module for messages

When generating the message, we must fetch the base string from somewhere. It will be the same for other text constants and messages that we'll use in the editor in the future. So we create a message module and we import it in Editor:

const message = @import("message.zig");

This module for now will look like this:

message.zig
//! Module that holds various strings for the message area, either status or
//! error messages, or prompts.

const std = @import("std");
const opt = @import("option.zig");

const status_messages = .{
    .{ "welcome", "Kilo editor -- version " ++ opt.version_str },
};

pub const status = std.StaticStringMap([]const u8).initComptime(status_messages);

We also create a version_str in our option module, so that it contains the current version number, as a string:

option.zig
pub const version_str = "0.1";

The StaticStringMap is created at compile time (see how it's initialized), and will be accessed in Editor with message.status.get(), that returns an optional value which is null if the key couldn't be found.

Keys of StaticStringMap will always be strings, but values can be of any type. In our case they are also strings ([]const u8).
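Here is a small standalone program to get familiar with the lookup. The map name and its entries are made up for this example, they aren't part of the editor.

```zig
const std = @import("std");

// Hypothetical map, built at compile time like message.status
const greetings = std.StaticStringMap([]const u8).initComptime(.{
    .{ "welcome", "Kilo editor" },
    .{ "bye", "See you" },
});

pub fn main() void {
    // get() returns an optional, so we need a fallback (or .?)
    const found = greetings.get("welcome") orelse "(missing)";
    const missing = greetings.get("nope") orelse "(missing)";

    std.debug.print("{s}\n{s}\n", .{ found, missing });
}
```

Using .? instead of orelse is fine when the key is known to exist, but the program will panic in safe builds if it doesn't.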

Generate the message

We had a commented placeholder in startUp(), so we must replace it with the actual function call.

Editor.zig: startUp()
    else {
        try e.generateWelcome();
    }

The function to generate the message is:

Editor.zig
/// Generate the welcome message.
fn generateWelcome(e: *Editor) !void {
    // code to come...
}

The line with the welcome message starts with a ~, because we're in an empty buffer.

The length of the message must be limited to the screen columns - 1, because of the ~ which we just appended.

Editor.zig: generateWelcome()
    try e.welcome_msg.append(e.alc, '~');

    var msg = message.status.get("welcome").?;
    if (msg.len >= e.screen.cols) {
        msg = msg[0 .. e.screen.cols - 1];
    }

The padding will be inserted before the message.

Editor.zig: generateWelcome()
    const padding: usize = (e.screen.cols - msg.len) / 2;

    try e.welcome_msg.appendNTimes(e.alc, ' ', padding);
    try e.welcome_msg.appendSlice(e.alc, msg);
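To check the arithmetic, here is a sketch that only computes the padding. welcomePadding() is a hypothetical helper, and the saturating subtraction (-|) is my addition, so that a screen with zero columns doesn't cause trouble.

```zig
const std = @import("std");

/// Hypothetical helper: number of spaces to insert between the leading '~'
/// and a message of `msg_len` bytes, on a screen with `cols` columns.
fn welcomePadding(cols: usize, msg_len: usize) usize {
    // the '~' takes one column, so the message can use at most cols - 1
    const len = @min(msg_len, cols -| 1);
    return (cols - len) / 2;
}

pub fn main() void {
    const msg = "Kilo editor -- version 0.1";

    std.debug.print("80 columns: {d}\n", .{welcomePadding(80, msg.len)});
    std.debug.print("10 columns: {d}\n", .{welcomePadding(10, msg.len)});
}
```

With 80 columns the 26 characters of the message are preceded by 27 spaces. With 10 columns the message is truncated to 9 characters and there's no room for padding.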

Render the message

In drawRows(), all we have to do is replace the if branch for when the row is past the end of the buffer, with this:

Editor.zig: drawRows()
        // past buffer content
        if (ix >= rows.len) {
            if (e.just_started
                and e.buffer.filename == null
                and e.buffer.rows.items.len == 0
                and y == e.screen.rows / 3) {
                try e.toSurface(e.welcome_msg.items);
            }
            else {
                try e.toSurface('~');
            }

We append it to the surface if the buffer is empty, doesn't even have a name, and current row is at about 1/3 of the height of the screen.

Remember to set just_started to false at the bottom of refreshScreen(), if you didn't already.

Editor.zig: refreshScreen()
    e.just_started = false;
    try linux.write(e.surface.items);

Setting just_started to false ensures that our welcome message won't be printed again.

Compile and run with

./kilo

to see an empty buffer and the welcome message. You can try to run again with a narrower terminal window, to verify that the message and the statusline are displayed correctly.

A text viewer

Right now we're able to open a file and display it, but since we can't move the cursor, we're stuck in the top-left corner of the screen.

Our processKeypress() must detect more keys, and we must bind these keys to actions to perform.

We change our function to this:

Editor.zig
/// Process a keypress: will wait indefinitely for readKey, which loops until
/// a key is actually pressed.
fn processKeypress(e: *Editor) !void {
    const k = try ansi.readKey();

    const static = struct {
        var q: u8 = opt.quit_times;
    };

    const B = &e.buffer;

    switch (k) {
        .ctrl_q => {
            if (B.dirty and static.q > 0) {
                static.q -= 1;
                return;
            }
            try ansi.clearScreen();
            e.should_quit = true;
        },
        else => {},
    }

    // reset quit counter for any keypress that isn't Ctrl-Q
    static.q = opt.quit_times;
}

opt.quit_times

First thing, we want to remove that magic number and bind static.q to an option, so in option.zig we'll add:

option.zig
pub const quit_times = 3;

and we replace 3 with opt.quit_times. Also, we only want Ctrl-Q to be repeated if the buffer has been modified.

Next, we'll handle more keypresses.

Before we deal with movements, we must complete our Row type.

The rxToCx() method

This does the opposite of the cxToRx() method, that is, it finds the real column for an index of the rendered row. It must still iterate the real row, not the rendered one, because from the latter we just couldn't know what was a tab and what a real space character. Therefore we iterate the real row like in cxToRx(), we track both the rendered column and the current index in the real row, and when the resulting rendered column is greater than the requested column we return the current index in the real row.

Row.zig
/// Calculate the position of a rendered column in the real row.
pub fn rxToCx(row: *Row, rx: usize) usize {
    var cur_rx: usize = 0;
    var cx: usize = 0;
    while (cx < row.chars.items.len) : (cx += 1) {
        if (row.chars.items[cx] == '\t') {
            cur_rx += (opt.tabstop - 1) - (cur_rx % opt.tabstop);
        }
        cur_rx += 1;

        if (cur_rx > rx) {
            return cx;
        }
    }
    return cx;
}
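To see how the two conversions relate, we can try them on a plain slice. This is a sketch with some assumptions: the functions work on a slice instead of a Row, tabstop is fixed to 8, and cxToRx() is rewritten here in the style of kilo, so it may differ in details from the one in Row.zig.

```zig
const std = @import("std");

const tabstop = 8; // stands in for opt.tabstop

/// Real column -> rendered column (kilo style, assumed here).
fn cxToRx(chars: []const u8, cx: usize) usize {
    var rx: usize = 0;
    for (chars[0..cx]) |c| {
        if (c == '\t') {
            rx += (tabstop - 1) - (rx % tabstop);
        }
        rx += 1;
    }
    return rx;
}

/// Rendered column -> real column, same logic as Row.rxToCx().
fn rxToCx(chars: []const u8, rx: usize) usize {
    var cur_rx: usize = 0;
    var cx: usize = 0;
    while (cx < chars.len) : (cx += 1) {
        if (chars[cx] == '\t') {
            cur_rx += (tabstop - 1) - (cur_rx % tabstop);
        }
        cur_rx += 1;
        if (cur_rx > rx) return cx;
    }
    return cx;
}

pub fn main() void {
    const line = "\tab";
    var cx: usize = 0;
    while (cx <= line.len) : (cx += 1) {
        const rx = cxToRx(line, cx);
        std.debug.print("cx {d} -> rx {d} -> cx {d}\n", .{ cx, rx, rxToCx(line, rx) });
    }
}
```

Any rendered column that falls inside the tab (0 to 7) maps back to the real column 0, because that's where the tab character is.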

More keypress handling

Inside the switch that handles keypresses, we add a variable and more prongs:

Editor.zig: processKeypress()
    const B = &e.buffer;
    const V = &e.view;
Editor.zig: processKeypress()
        .ctrl_d, .ctrl_u, .page_up, .page_down => {
            // by how many rows we'll jump
            const leap = e.screen.rows - 1;

            // place the cursor at the top of the window, then jump
            if (k == .ctrl_u or k == .page_up) {
                V.cy = V.rowoff;
                V.cy -= @min(V.cy, leap);
            }
            // place the cursor at the bottom of the window, then jump
            else {
                V.cy = V.rowoff + e.screen.rows - 1;
                V.cy = @min(V.cy + leap, B.rows.items.len);
            }
        },

        .home => {
            V.cx = 0;
        },

        .end => {
            // last row doesn't have characters!
            if (V.cy < B.rows.items.len) {
                V.cx = B.rows.items[V.cy].clen();
            }
        },

        .left, .right => {
            e.moveCursorWithKey(k);
        },

        .up, .down => {
            e.moveCursorWithKey(k);
        },

I added comments so that what happens should be self-explanatory.

One of my favorite Zig features is how you can omit the enum type when using its values, since the type of those values is already known from the context. It makes the code very expressive and avoids redundancy, without resorting to macros or untyped constants. It also makes it easier to write this kind of guide.
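A tiny example of these enum literals, outside of the editor. The Key enum here is a reduced, hypothetical version of our t.Key.

```zig
const std = @import("std");

const Key = enum { left, right, up, down };

fn isHorizontal(k: Key) bool {
    // no need to write Key.left: the type of the operand is already known
    return switch (k) {
        .left, .right => true,
        .up, .down => false,
    };
}

pub fn main() void {
    const k: Key = .left; // the annotation gives the literal its type
    std.debug.print("{}\n", .{isHorizontal(k)});
    // the parameter type does the same for an argument
    std.debug.print("{}\n", .{isHorizontal(.up)});
}
```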

We see a new function, moveCursorWithKey(), which we'll cover next.

Move with keys

This function will let us move the cursor with arrow keys. Also in this case the code is self-explanatory.

With keys Left and Right we can also change row, if we are respectively in the first or last column of the row.

Editor.zig
/// Update the cursor position after a key has been pressed.
fn moveCursorWithKey(e: *Editor, key: t.Key) void {
    const V = &e.view;
    const numrows = e.buffer.rows.items.len;

    switch (key) {
        .left => {
            if (V.cx != 0) { // not the first column
                V.cx -= 1;
            }
            else if (V.cy > 0) { // move back to the previous row
                V.cy -= 1;
                V.cx = e.currentRow().clen();
            }
        },
        .right => {
            if (V.cy < numrows) {
                if (V.cx < e.currentRow().clen()) { // not the last column
                    V.cx += 1;
                }
                else { // move to the next row
                    V.cy += 1;
                    V.cx = 0;
                }
            }
        },
        .up => {
            if (V.cy != 0) {
                V.cy -= 1;
            }
        },
        .down => {
            if (V.cy < numrows) {
                V.cy += 1;
            }
        },
        else => {},
    }
}

Handling the wanted column

When we move vertically, the cursor keeps its current column. That's pretty obvious. But when it moves to a shorter line, the cursor is pushed back to the end of that line, and if we don't keep track of the previous value, it will keep moving along that shorter column. Instead, we want it to go back to the column from where we started, as soon as the lines are long enough. That is the wanted column, and in our View type it is the cwant field.

This variable should be:

  • restored when moving vertically, either with arrow keys or by page

  • set to the current column when moving left or right, or to the beginning of the line (Home key), or after typing/deleting something

  • set to a special value when using the End key, a value that means: always stick to the end of the line when moving vertically

The special value we use is std.math.maxInt(usize), which we store in a constant:

Editor.zig
const maxUsize = std.math.maxInt(usize);

The Cwant enum

These different behaviors are listed in an enum, which will go in our types module:

types.zig
/// Controls handling of the wanted column.
pub const Cwant = enum(u8) {
    /// To set cwant to a new value
    set,
    /// To restore current cwant, or to the last column if too big
    restore,
    /// To set cwant to maxUsize, which means 'always the last column'
    maxcol,
};

The doCwant() function

Unlike the original kilo editor, here the cwant field will track the rendered column, not the real one, which makes more sense in an editor.

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              View operations
//
///////////////////////////////////////////////////////////////////////////////

/// Handle wanted column. `want` can be:
/// .set: set e.view.cwant to a new value
/// .maxcol: set to maxUsize, which means 'always the last column'
/// .restore: set current column to cwant, or to the last column if too big
fn doCwant(e: *Editor, want: t.Cwant) void {
    const V = &e.view;
    const numrows = e.buffer.rows.items.len;

    switch (want) {
        // code to come...
    }
}

So when we set cwant, we assign it to the current column of the rendered row.

If want is .maxcol, we set cwant to our special value.

Editor.zig: doCwant()
        .set => {
            V.cwant = if (V.cy < numrows) e.currentRow().cxToRx(V.cx) else 0;
        },
        .maxcol => {
            V.cwant = maxUsize;
        },

When we restore it, since cwant is an index in the rendered row, we use rxToCx() to find out the real column, to which cx must be set.

When we restore cwant, we'll check if we can actually restore it. If the length of the current row is shorter, the cursor will be moved to the last column.

If the value of cwant is our special value, the cursor will always be placed in the last column, even if the starting line was shorter than the following ones.

Editor.zig: doCwant()
        .restore => {
            if (V.cy == numrows) { // past end of file
                V.cx = 0;
            }
            else if (V.cwant == maxUsize) { // wants end of line
                V.cx = e.currentRow().clen();
            }
            else {
                const row = e.currentRow();
                const rowlen = row.clen();
                if (rowlen == 0) {
                    V.cx = 0;
                }
                else {
                    // cwant is an index of the rendered column, must convert
                    V.cx = row.rxToCx(V.cwant);
                    if (V.cx > rowlen) {
                        V.cx = rowlen;
                    }
                }
            }
        },

Note

Here the else prong isn't needed, since we handle all members of the enum.
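The effect of restoring cwant can be simulated with a few numbers. This sketch is simplified: restoreColumn() is a hypothetical function, and it assumes that rows have no tabs, so that the real column and the rendered column are the same.

```zig
const std = @import("std");

const max_usize = std.math.maxInt(usize);

/// Hypothetical helper: the column where the cursor ends up in a row of
/// length `row_len`, given the wanted column.
fn restoreColumn(cwant: usize, row_len: usize) usize {
    if (cwant == max_usize) return row_len; // wants end of line
    return @min(cwant, row_len);
}

pub fn main() void {
    // lengths of the rows we go through while moving down
    const lens = [_]usize{ 10, 3, 0, 12 };

    for (lens) |len| {
        std.debug.print("row of length {d}: cwant 7 -> {d}, after End -> {d}\n", .{
            len,
            restoreColumn(7, len),
            restoreColumn(max_usize, len),
        });
    }
}
```

With cwant set to 7, the cursor goes to columns 7, 3, 0 and then 7 again: the short rows don't make it forget the column it wants.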

Calls to doCwant()

Where should the wanted column be handled? Right in the processKeypress() function. You'll have to add calls to doCwant() as follows:

Editor.zig: somewhere in processKeypress()
            // after handling <ctrl-d>, <ctrl-u>, <page-up>, <page-down>
            e.doCwant(.restore);

            // after handling <up>, <down>
            e.doCwant(.restore);

            // after handling <left>, <right> and <home>
            e.doCwant(.set);

            // after handling <end>
            e.doCwant(.maxcol);

Scroll the view

There's one more thing to write, before all this begins to actually work. Until now, movement keys would set the row (View.cy) and the real column (View.cx). But in our refreshScreen() function, the escape sequence that actually moves the cursor to the new position will need View.rx, that is the column in the rendered row.

This value will be set in another function, scroll(), which will be invoked at the top of the refreshScreen() function. So place the call now:

/// Full refresh of the screen.
fn refreshScreen(e: *Editor) !void {
    e.scroll();

We must define another option:

option.zig
/// Minimal number of screen lines to keep above and below the cursor
pub var scroll_off: u8 = 2;

The actual scroll() function has 3 purposes:

  • adapt the view to respect the scroll_off option
  • set the visual column (column in the rendered row)
  • set View.rowoff and View.coloff, which control the visible part of the buffer relative to the first row and the first column
Editor.zig
/// Scroll the view, respecting scroll_off.
fn scroll(e: *Editor) void {
    const V = &e.view;
    const numrows = e.buffer.rows.items.len;

    // handle scroll_off here...

    // update rendered column here...

    // update rowoff and coloff here...
}

the scroll_off option

This is how the Vim documentation describes it:

Minimal number of screen lines to keep above and below the cursor. This will make some context visible around where you are working.

    //////////////////////////////////////////
    //          scrolloff option
    //////////////////////////////////////////

    if (opt.scroll_off > 0 and numrows > e.screen.rows) {
        while (V.rowoff + e.screen.rows < numrows
               and V.cy + opt.scroll_off >= e.screen.rows + V.rowoff)
        {
            V.rowoff += 1;
        }
        while (V.rowoff > 0 and V.rowoff + opt.scroll_off > V.cy) {
            V.rowoff -= 1;
        }
    }
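To understand what the two loops do, we can run them on plain numbers. In this sketch applyScrollOff() is a hypothetical function that takes the values as arguments and returns the new rowoff, but the loops are the same as above.

```zig
const std = @import("std");

/// Hypothetical helper: return the new row offset, so that `scroll_off`
/// rows stay visible above and below the cursor row `cy`, when possible.
fn applyScrollOff(
    old_rowoff: usize,
    cy: usize,
    screen_rows: usize,
    numrows: usize,
    scroll_off: usize,
) usize {
    var rowoff = old_rowoff;

    if (scroll_off > 0 and numrows > screen_rows) {
        // cursor too close to the bottom: scroll down
        while (rowoff + screen_rows < numrows and
            cy + scroll_off >= screen_rows + rowoff)
        {
            rowoff += 1;
        }
        // cursor too close to the top: scroll up
        while (rowoff > 0 and rowoff + scroll_off > cy) {
            rowoff -= 1;
        }
    }
    return rowoff;
}

pub fn main() void {
    // 100 rows in the buffer, 20 rows on screen, scroll_off = 2
    std.debug.print("{d}\n", .{applyScrollOff(0, 18, 20, 100, 2)});
    std.debug.print("{d}\n", .{applyScrollOff(10, 5, 20, 100, 2)});
    std.debug.print("{d}\n", .{applyScrollOff(3, 0, 20, 100, 2)});
}
```

In the first call the cursor is on row 18 and the window shows rows 0 to 19, so only one row is visible below the cursor: the offset becomes 1. In the second call the cursor is above the window, and the offset becomes 3, which leaves two rows above the cursor.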

The rendered column

    //////////////////////////////////////////
    //          update rendered column
    //////////////////////////////////////////

    V.rx = 0;

    if (V.cy < numrows) {
        V.rx = e.currentRow().cxToRx(V.cx);
    }

We just use the cxToRx() function, for all lines except the one past the end of the buffer: that line is completely empty, it doesn't even have a \n character, and there's no row for it, so we can't index it in any way (the program would panic).

rowoff, coloff

rowoff is the topmost visible row, coloff is the leftmost visible column. While the latter is rarely positive, the former will be positive whenever we can't see the first line of the file.

When the function is called, cy (the cursor row) can have a new value, but rowoff still has the old value, so it must be updated. Same for coloff, which must follow rx.

    //////////////////////////////////////////
    //      update rowoff and coloff
    //////////////////////////////////////////

    // cursor has moved above the visible window
    if (V.cy < V.rowoff) {
        V.rowoff = V.cy;
    }
    // cursor has moved below the visible window
    if (V.cy >= V.rowoff + e.screen.rows) {
        V.rowoff = V.cy - e.screen.rows + 1;
    }
    // cursor has moved beyond the left edge of the window
    if (V.rx < V.coloff) {
        V.coloff = V.rx;
    }
    // cursor has moved beyond the right edge of the window
    if (V.rx >= V.coloff + e.screen.cols) {
        V.coloff = V.rx - e.screen.cols + 1;
    }

Casting numbers

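When calculating a value, and we are handling unsigned integer types (like in this case), we should avoid subtractions, unless we are absolutely sure that the left operand is greater than or equal to the right operand. Otherwise the result would be negative, which an unsigned type can't represent: the operation overflows, and in safe build modes the program panics.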

Casts in Zig tend to be quite verbose, since the Zig philosophy is to make everything as explicit as possible, and the verbosity is also an element of the concept of friction that Zig has adopted: to make safe things easy, and unsafe things uncomfortable, even if not impossible, so that one becomes inclined to take the safer route to the solution of a problem.

In this program, we don't do any casting, but we don't have to deal with floating point numbers either.

To avoid casting unsigned integers, sometimes it's enough to move the subtracted operand to the other side of the comparison, making it become a positive operand. It's what we're doing here, even though it can make the operation less intuitive.
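As an example, take the check for the cursor being below the window. Written as cy - rowoff >= screen_rows it would overflow whenever the cursor is above the window. The sketch below uses a hypothetical belowWindow() function for the rearranged check, and std.math.sub() to show that the subtraction would fail, without making the program panic.

```zig
const std = @import("std");

/// True when the cursor row is below the visible window.
/// Written with an addition, so the unsigned values can't underflow.
fn belowWindow(cy: usize, rowoff: usize, screen_rows: usize) bool {
    return cy >= rowoff + screen_rows;
}

pub fn main() void {
    const cy: usize = 3;
    const rowoff: usize = 10;

    // cy - rowoff doesn't fit in a usize here; std.math.sub reports this
    // as an error, while the plain operator would panic in safe builds
    if (std.math.sub(usize, cy, rowoff)) |diff| {
        std.debug.print("cy - rowoff = {d}\n", .{diff});
    } else |err| {
        std.debug.print("cy - rowoff fails: {s}\n", .{@errorName(err)});
    }

    // the rearranged comparison has no such problem
    std.debug.print("below window: {}\n", .{belowWindow(cy, rowoff, 20)});
}
```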

This thread on Ziggit forum is an interesting read about castings.

Compile and run!

Our text viewer is complete. You should be able to open any file and navigate it with ease.

A text editor

Now we want to turn our text viewer into a proper editor. I guess it's the natural progression for this kind of thing. Not to mention that our guide is called "Build a text editor", not "Build a text viewer". Let's not forget that.

Let's start by handling more keypresses in the processKeypress() function.

We add new switch prongs for Backspace, Del and Enter:

Editor.zig: processKeypress()
        .backspace, .ctrl_h, .del => {
            if (k == .del) {
                e.moveCursorWithKey(.right);
            }
            try e.deleteChar();
            e.doCwant(.set);
        },

        .enter => try e.insertNewLine(),

We also change our else branch to handle characters to be inserted. We only handle Tab and printable characters, for now.

Editor.zig: processKeypress()
        else => {
            const c = @intFromEnum(k);
            if (k == .tab or asc.isPrint(c)) {
                try e.insertChar(c);
                e.doCwant(.set);
            }
        },

There is a new constant to set:

Editor.zig
const asc = std.ascii;

And new functions to implement:

  • insertChar will insert a character at the cursor position
  • deleteChar will delete the character on the left of the cursor
  • insertNewLine will start editing a new line after the current one

Insert characters

Before inserting a character, we check if we are in a new row, that is, past the last row of the buffer. If so, we insert the row in the buffer. After that, we can just insert the character and move forward. We already wrote our insertRow() function, so there's nothing to add (for now).

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              In-row operations
//
///////////////////////////////////////////////////////////////////////////////

/// Insert a character at current cursor position. Handle textwidth.
fn insertChar(e: *Editor, c: u8) !void {
    const V = &e.view;

    // last row, insert a new row before inserting the character
    if (V.cy == e.buffer.rows.items.len) {
        try e.insertRow(e.buffer.rows.items.len, "");
    }

    // insert the character and move the cursor forward
    try e.rowInsertChar(V.cy, V.cx, c);
    V.cx += 1;
}

rowInsertChar()

This will perform the actual character insertion in the row.chars ArrayList, update the rendered row, and set the modified flag.

Editor.zig
/// Insert character `c` in the row with index `ix`, at column `at`.
fn rowInsertChar(e: *Editor, ix: usize, at: usize, c: u8) !void {
    try e.rowAt(ix).chars.insert(e.buffer.alc, at, c);
    try e.updateRow(ix);
    e.buffer.dirty = true;
}

Deleting a character

By deleting a character, we mean deleting the character to the left of our cursor, what the Backspace key normally does.

Editor.zig
/// Delete a character before cursor position (backspace).
fn deleteChar(e: *Editor) !void {
    const V = &e.view;
    const B = &e.buffer;

    // code to come...
}

We'll want to handle different cases:

Cursor is past the end of file: move to the end of the previous line. We don't return, because we might still have a character to delete.

    // past the end of the file
    if (V.cy == B.rows.items.len) {
        e.moveCursorWithKey(.left);
    }

Cursor at the start of the file: nothing to do.

    // start of file
    if (V.cx == 0 and V.cy == 0) {
        return;
    }

Cursor after the first column: delete the character at the column before the current one.

    // delete character in current line
    if (V.cx > 0) {
        try e.rowDelChar(V.cy, V.cx - 1);
        V.cx -= 1;
    }

Cursor is at the start of a line which isn't the first one: we'll append the current line to the previous one, then delete the current row. The cursor will then be moved to the row above, at a column that is the length of the previous row before the lines were joined.

    // join with previous line
    else {
        V.cx = B.rows.items[V.cy - 1].clen();
        try e.rowInsertString(V.cy - 1, V.cx, e.currentRow().chars.items);
        e.deleteRow(V.cy);
        V.cy -= 1;
    }

rowDelChar()

For the actual character deletion we write rowDelChar(), which closely resembles rowInsertChar():

Editor.zig
/// Delete a character in the row with index `ix`, at column `at`.
fn rowDelChar(e: *Editor, ix: usize, at: usize) !void {
    _ = e.rowAt(ix).chars.orderedRemove(at);
    try e.updateRow(ix);
    e.buffer.dirty = true;
}

rowInsertString()

In case we want to join lines, we'll need two new functions.

Editor.zig
/// Insert a string at position `at`, in the row at index `ix`.
fn rowInsertString(e: *Editor, ix: usize, at: usize, chars: []const u8) !void {
    try e.rowAt(ix).chars.insertSlice(e.buffer.alc, at, chars);
    try e.updateRow(ix);
    e.buffer.dirty = true;
}

This is very similar to rowInsertChar(), but inserts a slice instead of inserting a character. Here we're just appending at the end of the row, since we're passing an at argument that is equal to the length of the row.

deleteRow()

The last function we need for now is the one that deletes a row from the Buffer. I put this function below insertRow().

As mentioned when we talked about the Buffer type, we're sometimes deinitializing individual rows in the Editor methods, which isn't ideal, but I don't think that creating a method in Buffer just for this is that much better. We can access the Buffer allocator just fine, but we must remember that a Row uses the Buffer allocator, not the Editor one. It just so happens that right now both Editor and Buffer use the same allocator, but things might change in the future.

Editor.zig
/// Delete a row and deinitialize it.
fn deleteRow(e: *Editor, ix: usize) void {
    var row = e.buffer.rows.orderedRemove(ix);
    row.deinit(e.buffer.alc);
    e.buffer.dirty = true;
}

The string module

Before we proceed, let's add a new module called string.zig. It will be quite simple, just a few helpers for string operations.

It will contain a single function for now. Don't forget to import in Editor.

string.zig
//! Module with functions handling strings.

///////////////////////////////////////////////////////////////////////////////
//
//                              Functions
//
///////////////////////////////////////////////////////////////////////////////

/// Return the number of leading whitespace characters
pub fn leadingWhitespaces(src: []const u8) usize {
    var i: usize = 0;
    while (i < src.len and asc.isWhitespace(src[i])) : (i += 1) {}
    return i;
}

///////////////////////////////////////////////////////////////////////////////
//
//                              Constants, variables
//
///////////////////////////////////////////////////////////////////////////////

const std = @import("std");
const asc = std.ascii;
const mem = std.mem;
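A quick way to try the function on its own. This is a standalone copy, where the parameter is a []const u8 so that string literals can be passed to it.

```zig
const std = @import("std");
const asc = std.ascii;

/// Return the number of leading whitespace characters
fn leadingWhitespaces(src: []const u8) usize {
    var i: usize = 0;
    while (i < src.len and asc.isWhitespace(src[i])) : (i += 1) {}
    return i;
}

pub fn main() void {
    const samples = [_][]const u8{ "    indented", "\t\tmixed", "none", "   " };

    for (samples) |s| {
        std.debug.print("{d}\n", .{leadingWhitespaces(s)});
    }
}
```

Note that a string made only of whitespace returns its whole length, thanks to the i < src.len check that stops the loop.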

Insert a new line

We insert a new line when we press Enter. Nothing simpler, right? This operation is a bit more complex than it seems, especially if we want to copy indentation, which is optional, but it's so useful that we don't want to miss it.

Let's ignore indentation for now, and write the basic function.

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Insert lines
//
///////////////////////////////////////////////////////////////////////////////

/// Insert a new line at cursor position. Will carry to the next line
/// everything that is after the cursor.
fn insertNewLine(e: *Editor) !void {
    const V = &e.view;

    // make sure the beginning of the line is visible
    V.coloff = 0;

    // code to come...

    // row operations have been concluded, update rows
    try e.updateRow(V.cy - 1);
    try e.updateRow(V.cy);

    // set cursor position at the start of the new line
    V.cx = 0;
    V.cwant = 0;
}

We want to handle at least these cases:

  • are we at the beginning of the line (cx = 0)? We insert an empty line above the current line, then increase the row number
Editor.zig: insertNewLine()
    // at first column, just insert an empty line above the cursor
    if (V.cx == 0) {
        try e.insertRow(V.cy, "");
        V.cy += 1;
        return;
    }
  • is there any whitespace that follows the cursor? Then we want to remove it when carrying over the text that follows
Editor.zig: insertNewLine()
    // leading whitespace removed from characters after cursor
    var skipw: usize = 0;

    var oldrow = e.currentRow().chars.items;

    // any whitespace before the text that is going into the new row
    if (V.cx < oldrow.len) {
        skipw = str.leadingWhitespaces(oldrow[V.cx..]);
    }

We already know that we are in the middle of a line, so we must carry everything that comes after the cursor to the new line.

After the row has been inserted, we proceed to the new row and shrink the row above. We perform this operation last, because we needed those characters to be able to append them. Cut and paste is actually a copy then delete operation in our case.

Editor.zig: insertNewLine()
    // will insert a row with the characters to the right of the cursor
    // skipping whitespace after the cursor
    try e.insertRow(V.cy + 1, oldrow[V.cx + skipw ..]);

    // proceed to the new row
    V.cy += 1;

    // delete from the row above the content that we moved to the next row
    e.rowAt(V.cy - 1).chars.shrinkAndFree(e.buffer.alc, V.cx);
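The split can be shown with slices alone, without touching any Row. In this sketch Split and splitLine() are hypothetical names: head is what remains in the old row, tail is what goes into the new row.

```zig
const std = @import("std");

/// Return the number of leading whitespace characters
fn leadingWhitespaces(src: []const u8) usize {
    var i: usize = 0;
    while (i < src.len and std.ascii.isWhitespace(src[i])) : (i += 1) {}
    return i;
}

/// Hypothetical type for the two halves of a split line.
const Split = struct { head: []const u8, tail: []const u8 };

/// Hypothetical helper: split `line` at column `cx`, skipping the
/// whitespace that follows the cursor.
fn splitLine(line: []const u8, cx: usize) Split {
    var skipw: usize = 0;
    if (cx < line.len) {
        skipw = leadingWhitespaces(line[cx..]);
    }
    return .{ .head = line[0..cx], .tail = line[cx + skipw ..] };
}

pub fn main() void {
    // cursor right after "foo"
    const s = splitLine("foo   bar", 3);
    std.debug.print("[{s}] [{s}]\n", .{ s.head, s.tail });
}
```

Both halves are slices of the original line. That's also why in the editor the old row can be shrunk only after the new row has copied its part.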

Note

We are using the shrinkAndFree method, which is not optimal, because in many cases we would like to retain the ArrayList capacity. At least partially.

We could use instead the method shrinkRetainingCapacity, which does what it says. But this could lead to excessive memory usage, because rows would always keep the biggest capacity they had at any time, always growing, never shrinking.

A better approach might be to do a shrinkAndFree while keeping some extra room, followed by a resize to set the correct length.

The same concepts would apply to row.render, if it was made an ArrayList.

These are all optimizations that can wait, anyway. For now, we keep it simple.

You might want to compile and run at this point, to check that everything is working. You should be able to insert characters, delete them, and insert new lines.

Autoindent

We also want an option for autoindent.

Let's add the option:

option.zig
/// Copy indent from current line when starting a new line
pub var autoindent = true;

Autoindent brings additional concerns:

  • we should copy the indent from the line above

  • are we inserting the line while in the middle of the indent? Then we want to shorten the indent and remove the part of it that lies after the cursor

Add the ind variable: it is the number of whitespace characters that we must copy from the line above.

    // leading whitespace removed from characters after cursor
    var skipw: usize = 0;
    // extra characters for indent
    var ind: usize = 0;

What if we hit Enter in the middle of the indentation? We want to reduce it to the current column.

    // any whitespace before the text that is going into the new row
    if (V.cx < oldrow.len) {
        skipw = str.leadingWhitespaces(oldrow[V.cx..]);
    }
    if (opt.autoindent) {
        ind = str.leadingWhitespaces(oldrow);

        // reduce indent if current column is within it
        if (V.cx < ind) {
            ind = V.cx;
        }
    }
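The clamping of the indent can be checked on its own. The sketch below is an illustration under assumptions: leadingWhitespaces() stands in for str.leadingWhitespaces() (assumed to count leading spaces and tabs), and indentFor() is a hypothetical helper that doesn't exist in the editor.

```zig
const std = @import("std");

/// Hypothetical stand-in for str.leadingWhitespaces().
fn leadingWhitespaces(s: []const u8) usize {
    var n: usize = 0;
    while (n < s.len and (s[n] == ' ' or s[n] == '\t')) : (n += 1) {}
    return n;
}

/// Number of indent characters to copy into the new row: the indent of the
/// current row, reduced to the cursor column if the cursor is inside it.
fn indentFor(row: []const u8, cx: usize) usize {
    return @min(leadingWhitespaces(row), cx);
}

pub fn main() void {
    // cursor after the indent: the whole indent (4) is copied
    std.debug.print("{d}\n", .{indentFor("    foo", 6)});
    // cursor inside the indent: only 2 characters are copied
    std.debug.print("{d}\n", .{indentFor("    foo", 2)});
}
```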

After we proceed to the new row, we must copy over the indent from the line above. Before copying, we reassign the pointer, because a row insertion in Buffer.rows has happened, which could have caused the invalidation of all row pointers...

    // proceed to the new row
    V.cy += 1;
    if (ind > 0) {
        // reassign pointer, invalidated by row insertion
        oldrow = e.rowAt(V.cy - 1).chars.items;

        // in new row, shift the old content forward, to make room for indent
        const newrow = try e.currentRow().chars.addManyAt(e.alc, 0, ind);

        // Copy the indent from the previous row.
        for (0..ind) |i| {
            newrow[i] = oldrow[i];
        }
    }

Finally, we must replace the last two lines, which set the cursor at the start of the new line (V.cx = 0 and V.cwant = 0), so that the cursor lands right after the indent:

    // set cursor position right after the indent in the new line
    V.cx = ind;
    V.cwant = ind;

Compile and try it!

Handling text wrapping

There's one last thing that we should handle, one little thing that will make our editor much more usable.

After we type a certain number of characters in the line, we want our text to be automatically wrapped onto a new line, so that the line doesn't become too long.

We call this option textwidth and we add it to our option module.

option.zig
/// Wrap text over a new line, when current line becomes longer than this value
pub var textwidth = struct {
    enabled: bool = true,
    len: u8 = 79,
} {};

Thinking more about it, it's not always desirable, especially when writing code. Our implementation will be particularly stubborn and absolutely refuse to let us write differently. In the future we might introduce ways to change option values with key combinations, and allow different options for different filetypes. For now, this is it, and we must accept it.

We need a new string module function:

string.zig
/// Return true if `c` is a word character.
pub fn isWord(c: u8) bool {
    return switch (c) {
        '0'...'9', 'a'...'z', 'A'...'Z', '_' => true,
        else => false,
    };
}

Handling of text wrapping happens in insertChar(), right after inserting the character.

Editor.zig: insertChar()
    // insert the character and move the cursor forward
    try e.rowInsertChar(V.cy, V.cx, c);
    V.cx += 1;
    //////////////////////////////////////////
    //              textwidth
    //////////////////////////////////////////

    const row = e.currentRow();
    const rx = row.cxToRx(V.cx);

    if (opt.textwidth.enabled and rx > opt.textwidth.len and str.isWord(c)) {

The logic can be split in two phases.

Phase 1

  • we must find the start of the current word, crawling back along the current row
  • if this word is preceded by a space character, we push back the cursor again, because we want to remove a single space while wrapping text, but not more than one
  • if this word is preceded by another kind of separator, we don't remove it, we just wrap the word
Editor.zig: insertChar()
        // will be 1 if a space before the wrapped word must be removed
        var skipw: usize = 0;

        // find the start of the current word
        var start: usize = rx - 1;

        while (start > 0) {
            if (!str.isWord(row.render[start - 1])) {
                // we want to remove a space before the wrapped word, but not
                // other kinds of separators (not even a tab, just in case)
                if (row.render[start - 1] == ' ') {
                    skipw = 1;
                }
                break;
            }
            start -= 1;
        }

Phase 2

We crawled back in the row, and we found where this word began. If the column is 0, the row is a single very long sequence of word characters, so we can't wrap anything.

If instead we can wrap it, we proceed as follows:

  • we set the cursor before the word, and also before the space character that precedes it (if there is one)
  • we insert a new line: the same things that would happen when pressing Enter would happen now, the extra space would be deleted and the word would be carried to the new line
  • we move forward the cursor to the end of the word we wrapped
Editor.zig: insertChar()
        // only wrap if the word doesn't start at the beginning
        if (start > 0) {
            const wlen = rx - start;

            // move the cursor to the start of the word, also skipping a space
            V.cx = row.rxToCx(start - skipw);

            // new line insertion will carry over the word and delete the space
            try e.insertNewLine();

            // move forward the cursor to the end of the word
            V.cx += wlen;
        }
    }
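The first phase is easy to test outside of the editor. What follows is a minimal sketch that works on a plain slice instead of row.render: WrapPoint and wrapPoint() are hypothetical names, and the line is assumed to end with the word character that has just been typed.

```zig
const std = @import("std");

/// Same word-character test as str.isWord().
fn isWord(c: u8) bool {
    return switch (c) {
        '0'...'9', 'a'...'z', 'A'...'Z', '_' => true,
        else => false,
    };
}

/// Where to break the line (name made up for this sketch).
const WrapPoint = struct {
    /// index where the last word starts
    start: usize,
    /// 1 if a single space before the word must be dropped, 0 otherwise
    skipw: usize,
};

/// Find the start of the word that ends at the end of `line`.
/// Returns null if the word starts at column 0: nothing can be wrapped.
fn wrapPoint(line: []const u8) ?WrapPoint {
    if (line.len == 0) return null;
    var start: usize = line.len - 1;
    var skipw: usize = 0;
    while (start > 0) : (start -= 1) {
        const prev = line[start - 1];
        if (!isWord(prev)) {
            // drop a space before the word, keep any other separator
            if (prev == ' ') skipw = 1;
            break;
        }
    }
    if (start == 0) return null;
    return WrapPoint{ .start = start, .skipw = skipw };
}

pub fn main() void {
    const line = "hello world";
    if (wrapPoint(line)) |wp| {
        std.debug.print("keep \"{s}\", wrap \"{s}\"\n", .{
            line[0 .. wp.start - wp.skipw],
            line[wp.start..],
        });
    }
}
```

In the editor the second phase doesn't slice anything: it moves the cursor to start - skipw and lets insertNewLine() do the rest.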

This completes the editor chapter. We still can't save our edits, but before getting there we need to expand the capabilities of our message area, so that it can actually print something.

Interacting with the user

At various points of our program, we'll want to interact with the users, either by notifying them of something, or by requesting something.

For example, we want to print a "help" sort of message when the editor starts, we must prompt for a filename when trying to save an unnamed buffer, or for a word when using the searching functionality.

We have already added the status_msg field in Editor, so we must add a function that prints it.

We'll have two ways to print, either normal messages (or prompts) using regular highlight, or error messages, which we'll print in a bright red color.

statusMessage()

This function clears the previous content and replaces it with a new one, which we format on the fly with the ArrayList(u8) method print(). Note that this method only works if the base type of the array is u8.

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Message area
//
///////////////////////////////////////////////////////////////////////////////

/// Set a status message, using regular highlight.
pub fn statusMessage(e: *Editor, comptime format: []const u8, args: anytype) !void {
    assert(format.len > 0);
    e.status_msg.clearRetainingCapacity();
    try e.status_msg.print(e.alc, format, args);
    e.status_msg_time = time();
}

print() uses std.Io.Writer. We'll see this interface again when we want to save a file.

We never pass an empty format, so we assert() that the format is not empty. You have to define an assert constant (do it yourself).

Finally we update status_msg_time, so that the message will be actually printed, then cleared after a while.

Note

This function doesn't really print anything on screen: the actual printing will be done in drawMessageBar(), which we already wrote.

Compile and run to see your "help" message printed in the message area when you start up the editor.

errorMessage()

This function is similar, but it will color the message in bright red, since it's supposed to be an error. Note that we can use the ++ string concatenation operator, since all values are comptime-known.

Editor.zig
/// Print an error message, using error highlight.
pub fn errorMessage(e: *Editor, comptime format: []const u8, args: anytype) !void {
    assert(format.len > 0);
    e.status_msg.clearRetainingCapacity();
    const fmt = ansi.ErrorColor ++ format ++ ansi.ResetColors;
    try e.status_msg.print(e.alc, fmt, args);
    e.status_msg_time = time();
}
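The comptime concatenation can be tried without the editor. This is a sketch under assumptions: error_color and reset_colors are hypothetical values (the real constants live in ansi.zig), and I format into a fixed buffer with std.fmt.bufPrint() instead of the ArrayList, just to keep the example small.

```zig
const std = @import("std");

// Hypothetical values, for this example only.
const error_color = "\x1b[91m";
const reset_colors = "\x1b[m";

/// Format an error message into `buf`, wrapped in color escape sequences.
fn formatError(buf: []u8, comptime format: []const u8, args: anytype) ![]u8 {
    // all three operands are comptime-known, so `++` is allowed here
    const fmt = error_color ++ format ++ reset_colors;
    return std.fmt.bufPrint(buf, fmt, args);
}

pub fn main() !void {
    var buf: [64]u8 = undefined;
    const msg = try formatError(&buf, "Can't save: {s}", .{"AccessDenied"});
    std.debug.print("{s}\n", .{msg});
}
```

If format was a runtime value instead of a comptime parameter, the concatenation wouldn't compile.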

The 'help' message

Let's take care of the "help" message.

Editor.zig: startUp()
pub fn startUp(e: *Editor, path: ?[]const u8) !void {
    try e.statusMessage(message.status.get("help").?, .{});

help should be a key in our message string map, but we don't have it yet, so add it to status_messages:

message.zig: status_messages
    .{ "help", "HELP: Ctrl-S = save | Ctrl-Q = quit | Ctrl-F = find" },

The 'unsaved' message

Let's add a message that warns us when we press Ctrl-Q and there are unsaved changes:

message.zig: status_messages
    .{ "unsaved", "WARNING!!! File has unsaved changes. Press Ctrl-Q {d} more times to quit." },

We print this message in processKeypress:

Editor.zig: processKeypress()
        .ctrl_q => {
            if (B.dirty and static.q > 0) {
                try e.statusMessage(message.status.get("unsaved").?, .{static.q});

Now, if we have unsaved changes, we'll get this warning, telling us how many times we must press Ctrl-Q to quit.

Needed constants:

Editor.zig
const assert = std.debug.assert;

I/O: writing

To save files we'll use the Io.Writer interface. I'm not going to explain in detail what can be done with it: it has only recently been introduced into the Zig standard library, it's a vast subject, and I'm not familiar with it. So I'll stick to the minimum of information needed to make our use case work.

Let's handle first the case where the filename is known, and we just want to save the current file.

We add another key-value pair to our status_messages string map:

message.zig: status_messages
    .{ "bufwrite", "\"{s}\" {d} lines, {d} bytes written" },

This way we can print a message when the save is successful.

ioerr() and the error messages StringMap

Whenever a write operation fails, we'll handle the error in a helper function, ioerr():

Editor.zig
/// Handle an error of type IoError by printing an error message, without
/// quitting the editor.
fn ioerr(e: *Editor, err: t.IoError) !void {
    try e.errorMessage(message.errors.get("ioerr").?, .{@errorName(err)});
    return;
}

As you can see, this function doesn't terminate the process just because we couldn't save the file for some reason. Instead, it prints an error in the message area, with the name of the error.

IoError

The ioerr function accepts an argument of type IoError. This is an error set, obtained by merging several error sets of the standard library, that we'll define in types:

types.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Error sets
//
///////////////////////////////////////////////////////////////////////////////

/// Error set for both read and write operations.
pub const IoError = std.fs.File.OpenError
                 || std.fs.File.WriteError
                 || std.Io.Reader.Error
                 || std.Io.Writer.Error;

It includes errors for both reading and writing: to write a file we must first be able to open it, and that can fail too.
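If merging error sets is new to you, here is a small sketch of how the || operator and @errorName work. The two error sets are made up for the example, they only stand in for the ones from the standard library.

```zig
const std = @import("std");

// Two small, made-up error sets standing in for the std ones.
const OpenError = error{ FileNotFound, AccessDenied };
const WriteError = error{ NoSpaceLeft, AccessDenied };

/// `||` merges error sets; the duplicate (AccessDenied) appears only once.
const IoError = OpenError || WriteError;

/// Accepts any error from either set and returns its name.
fn describe(err: IoError) []const u8 {
    return @errorName(err);
}

pub fn main() void {
    std.debug.print("{s}\n", .{describe(error.NoSpaceLeft)});
}
```

A value of one of the smaller sets coerces to the merged set, which is why a single helper can handle errors coming from different operations.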

Error messages

We keep all these error messages we'll be using in message.zig, in another StringMap that we'll call errors:

message.zig
const error_messages = .{
    .{ "ioerr", "Can't save! I/O error: {s}" },
};

pub const errors = std.StaticStringMap([]const u8).initComptime(error_messages);

Saving a file

Editor.zig
/// Try to save the current file, prompt for a file name if currently not set.
/// Currently saving the file fails if directory doesn't exist, and there is no
/// tilde expansion.
fn saveFile(e: *Editor) !void {
    var B = &e.buffer;

    if (B.filename == null) {
        // will prompt for a filename
        return;
    }

    // code to come...
}

Before saving, we want to determine in advance how many bytes we'll write to disk, so that we can print it in a message.

Since e.buffer.filename is optional, once we are certain that it can't be null, we can safely access its non-null value with the .? notation.
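The byte count is simple arithmetic: the length of every row, plus one \n character per row. A minimal sketch on plain slices, where fileSize() is a hypothetical helper and rows are just strings instead of Row values:

```zig
const std = @import("std");

/// Bytes needed to save `rows`: every row, plus one '\n' for each row.
fn fileSize(rows: []const []const u8) usize {
    var fsize: usize = rows.len; // room for the newlines
    for (rows) |row| {
        fsize += row.len;
    }
    return fsize;
}

pub fn main() void {
    const rows = [_][]const u8{ "ab", "", "cde" };
    std.debug.print("{d} bytes\n", .{fileSize(&rows)});
}
```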

Editor.zig: saveFile()
    // determine number of bytes to write, make room for \n characters
    var fsize: usize = B.rows.items.len;
    for (B.rows.items) |row| {
        fsize += row.chars.items.len;
    }

    const file = std.fs.cwd().createFile(B.filename.?, .{ .truncate = true });
    if (file) |f| {
        // write lines to file
    }
    else |err|{
        e.alc.free(B.filename.?);
        B.filename = null;
        return e.ioerr(err);
    }

We will try to open the file in writing mode, truncating it and replacing all bytes. Here the key std function is std.fs.cwd().createFile().

In this block we write the lines:

Editor.zig: saveFile()
    if (file) |f| {
        // write lines to file
        var buf: [1024]u8 = undefined;
        var writer = f.writer(&buf);
        defer f.close();
        // for each line, write the bytes, then the \n character
        for (B.rows.items) |row| {
            writer.interface.writeAll(row.chars.items) catch |err| return e.ioerr(err);
            writer.interface.writeByte('\n') catch |err| return e.ioerr(err);
        }
        // write what's left in the buffer
        writer.interface.flush() catch |err| return e.ioerr(err);
        try e.statusMessage(message.status.get("bufwrite").?, .{
            B.filename.?, B.rows.items.len, fsize
        });
        B.dirty = false;
        return;

Before writing, we need a buffered writer. The size doesn't matter too much I think, but too small would be close to unbuffered.

To actually write the file, we use the Io.Writer interface, which is accessed at writer.interface.

After we wrote all bytes, we have to flush the writer. This is what happens:

1. we provide a small buffer, that lives on the stack
2. this buffer is filled by the writer with characters that have to be written
3. when the buffer is full, the writer actually writes the data, then empties the buffer and repeats
4. when there's nothing more to write, there can be something left in the buffer, because the writer only writes the buffer when it's full
5. so we flush the buffer: the writer empties it and writes what's left
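To see why the final flush is needed, we can build a toy buffered writer. To be clear, this is not std.Io.Writer and it doesn't claim to work like it internally: ToyWriter is a made-up type with a 4-byte buffer, and an in-memory array (sink) plays the part of the file on disk.

```zig
const std = @import("std");

/// Toy buffered writer, for illustration only.
const ToyWriter = struct {
    /// small staging buffer
    buf: [4]u8 = undefined,
    /// bytes currently waiting in the buffer
    len: usize = 0,
    /// stands in for the file on disk
    sink: [64]u8 = undefined,
    /// bytes that actually reached the sink
    written: usize = 0,

    fn writeByte(w: *ToyWriter, c: u8) void {
        // buffer full: write it out before accepting more
        if (w.len == w.buf.len) w.flush();
        w.buf[w.len] = c;
        w.len += 1;
    }

    fn writeAll(w: *ToyWriter, bytes: []const u8) void {
        for (bytes) |c| {
            w.writeByte(c);
        }
    }

    /// Write what's left in the buffer, then empty it.
    fn flush(w: *ToyWriter) void {
        @memcpy(w.sink[w.written..][0..w.len], w.buf[0..w.len]);
        w.written += w.len;
        w.len = 0;
    }
};

pub fn main() void {
    var w: ToyWriter = .{};
    w.writeAll("hello\n");
    std.debug.print("before flush: {d} bytes written\n", .{w.written});
    w.flush();
    std.debug.print("after flush: {d} bytes written\n", .{w.written});
}
```

Without the last flush(), the bytes still sitting in the buffer would never reach the sink.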

When we're done we print a message that says the name of the written file, how many lines and bytes have been written to disk.

If for some reason the file can't be created, the buffer filename is freed and set to null.

Important

The same remarks that have been made for the Io.Reader interface are valid here: you can't make a copy of the interface, by assigning it directly:

const interface = f.writer(&buf).interface; // WRONG

Prompts

We're back to the point where we need to interact with the user, in this case to obtain a filename for a buffer that doesn't have one, so that we can save it.

The prompt function

We'll put this function in the Message Area section, right above the statusMessage and errorMessage functions.

For now this is a simplified version. We'll have to expand it later, when we want this prompt to accept a callback as argument, so that the callback can be invoked at each user input. Right now we don't need it, so we keep it at its simplest.

Editor.zig
/// Start a prompt in the message area, return the user input.
/// Prompt is terminated with either .esc or .enter keys.
/// Prompt is also terminated by .backspace if there is no character left in
/// the input.
fn promptForInput(e: *Editor, prompt: []const u8) !t.Chars {
    var al = try t.Chars.initCapacity(e.alc, 80);

    while (true) {
        // read keys
    }
    e.clearStatusMessage();
    return al;
}

This function returns an ArrayList, which is allocated inside the function itself. It's not a pointer to an existing ArrayList, it's a new one. The caller must remember to deinitialize this ArrayList with a defer statement.

Note that in this case, returning a pointer to the ArrayList created in promptForInput() would mean returning a dangling pointer, so we should either:

  • return a copy (doing this)
  • pass a pointer to an existing ArrayList as argument

To be more explicit, we could pass the allocator to promptForInput(), but I'm not doing it here.

The loop

The loop reads typed characters into the ArrayList. Input is terminated with Esc or Enter, and also with Backspace if the prompt is empty. If you wondered whether we can move the cursor inside the prompt, the answer is no. But we can press Backspace to delete characters.

Editor.zig: promptForInput() loop
        try e.statusMessage("{s}{s}", .{ prompt, al.items });
        try e.refreshScreen();

        const k = try ansi.readKey();
        const c = @intFromEnum(k);

        switch (k) {
            .ctrl_h, .backspace => {
                if (al.items.len == 0) {
                    break;
                }
                _ = al.pop();
            },

            .esc, .enter => break,

            else => if (k == .tab or asc.isPrint(c)) {
                try al.append(e.alc, c);
            },
        }
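The behaviour of the loop can be simulated by feeding it a fixed sequence of keys instead of reading them from the terminal. This is only a sketch: the Key type here is a made-up tagged union (not the editor's key enum), runPrompt() is a hypothetical name, and the input goes into a fixed buffer rather than an ArrayList.

```zig
const std = @import("std");

/// Made-up key type for this sketch.
const Key = union(enum) { char: u8, backspace, enter, esc };

/// Apply `keys` to an initially empty prompt, return the resulting input.
fn runPrompt(keys: []const Key, out: []u8) []u8 {
    var len: usize = 0;
    for (keys) |k| {
        switch (k) {
            .backspace => {
                // backspace on an empty prompt terminates it
                if (len == 0) break;
                len -= 1;
            },
            .enter, .esc => break,
            .char => |c| {
                if ((c == '\t' or std.ascii.isPrint(c)) and len < out.len) {
                    out[len] = c;
                    len += 1;
                }
            },
        }
    }
    return out[0..len];
}

pub fn main() void {
    const keys = [_]Key{ .{ .char = 'a' }, .{ .char = 'b' }, .backspace, .{ .char = 'c' }, .enter };
    var buf: [80]u8 = undefined;
    std.debug.print("input: {s}\n", .{runPrompt(&keys, &buf)});
}
```

Note that break inside a switch prong leaves the enclosing loop, exactly as it does in promptForInput().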

When all is done, we clear the message area with this function, which we'll put in the Helpers section:

Editor.zig
/// Clear the message area. Can't fail because it won't reallocate.
fn clearStatusMessage(e: *Editor) void {
    e.status_msg.clearRetainingCapacity();
}

Prompting for a filename

In saveFile() we had a placeholder for this case (a comment followed by a bare return). We replace those two lines with:

Editor.zig: saveFile()
        var al = try e.promptForInput(message.prompt.get("fname").?);
        defer al.deinit(e.alc);

        if (al.items.len > 0) {
            B.filename = try e.updateString(B.filename, al.items);
        }
        else {
            try e.statusMessage("Save aborted", .{});
            return;
        }

We need a new StringMap in our message module:

message.zig
const prompt_messages = .{
    .{ "fname", "Enter filename, or ESC to cancel: " },
};

pub const prompt = std.StaticStringMap([]const u8).initComptime(prompt_messages);

Binding Ctrl-S to save the file

We don't yet have a way to save, because we didn't bind a key. We add a new branch to the processKeypress function:

Editor.zig: processKeypress()
        .ctrl_s => try e.saveFile(),

And that's about it. If you compile and run with:

./kilo some_new_file

you should be able to edit the file, give it a name and save it.

Highlight

We have two features left to implement: searching and syntax highlighting. Both of them require the ability to apply a different highlight to our text, so we'll do that.

We'll do everything in the types module, but first we must define the color codes that we'll be using. In ansi define these namespaced constants:

ansi.zig
/// Codes for 16-colors terminal escape sequences (foreground)
pub const FgColor = struct {
    pub const default: u8 = 39;
    pub const black: u8 = 30;
    pub const red: u8 = 31;
    pub const green: u8 = 32;
    pub const yellow: u8 = 33;
    pub const blue: u8 = 34;
    pub const magenta: u8 = 35;
    pub const cyan: u8 = 36;
    pub const white: u8 = 37;
    pub const black_bright: u8 = 90;
    pub const red_bright: u8 = 91;
    pub const green_bright: u8 = 92;
    pub const yellow_bright: u8 = 93;
    pub const blue_bright: u8 = 94;
    pub const magenta_bright: u8 = 95;
    pub const cyan_bright: u8 = 96;
    pub const white_bright: u8 = 97;
};

/// Codes for 16-colors terminal escape sequences (background)
pub const BgColor = struct {
    pub const default: u8 = 49;
    pub const black: u8 = 40;
    pub const red: u8 = 41;
    pub const green: u8 = 42;
    pub const yellow: u8 = 43;
    pub const blue: u8 = 44;
    pub const magenta: u8 = 45;
    pub const cyan: u8 = 46;
    pub const white: u8 = 47;
    pub const black_bright: u8 = 100;
    pub const red_bright: u8 = 101;
    pub const green_bright: u8 = 102;
    pub const yellow_bright: u8 = 103;
    pub const blue_bright: u8 = 104;
    pub const magenta_bright: u8 = 105;
    pub const cyan_bright: u8 = 106;
    pub const white_bright: u8 = 107;
};

Highlight enum

We need to define the Highlight enum, which goes in types. We start with a few values and will expand it later:

types.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Highlight
//
///////////////////////////////////////////////////////////////////////////////

/// All available highlight types.
pub const Highlight = enum(u8) {
    /// The normal highlight
    normal = 0,

    /// Incremental search highlight
    incsearch,

    /// Highlight for error messages
    err,
};

An array for highlight

Our Row type must have an additional array, which will have the same length as the render array, and which will contain the Highlight for each element of the render array:

Row.zig
/// Array with the highlight of the rendered row
hl: []t.Highlight,

We'll initialize this array in Row.init():

        .hl = &.{},

deinitialize it in Row.deinit():

    allocator.free(row.hl);

and will fill it in a new function:

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Syntax highlighting
//
///////////////////////////////////////////////////////////////////////////////

/// Update highlight for a row.
fn updateHighlight(e: *Editor, ix: usize) !void {
    const row = e.rowAt(ix);

    // reset the row highlight to normal
    row.hl = try e.alc.realloc(row.hl, row.render.len);
    @memset(row.hl, .normal);
}

Later we'll do syntax highlighting here. This function is called at the end of updateRow(), because every time the rendered row is updated, its highlight must be too.

Editor.zig: updateRow()
    try e.updateHighlight(ix);
}
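The realloc plus @memset pattern works the same on any slice, including an empty one that has never been allocated. Here is a minimal sketch, where resetHighlight() is a hypothetical free function (the editor does this on row.hl directly) and the enum is a local copy of Highlight:

```zig
const std = @import("std");

/// Local copy of the Highlight enum, for this sketch.
const Highlight = enum(u8) { normal = 0, incsearch, err };

/// Resize `hl` to `len` elements and reset them all to .normal.
/// The returned slice replaces `hl`, which must not be used afterwards.
fn resetHighlight(alc: std.mem.Allocator, hl: []Highlight, len: usize) ![]Highlight {
    const new = try alc.realloc(hl, len);
    @memset(new, .normal);
    return new;
}

pub fn main() !void {
    const alc = std.heap.page_allocator;
    // starts empty, like Row.hl after Row.init()
    var hl: []Highlight = &.{};
    defer alc.free(hl);
    hl = try resetHighlight(alc, hl, 8);
    std.debug.print("{d} cells reset\n", .{hl.len});
}
```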

Highlight groups

Highlight groups have properties, which we define in a new type.

types.zig
/// Attributes of a highlight group.
pub const HlGroup = struct {
    /// Foreground CSI color code
    fg: u8,

    /// Background CSI color code
    bg: u8,

    reverse: bool,
    bold: bool,
    italic: bool,
    underline: bool,
};

An array of highlight groups

We create the array with the highlight groups in a new module hlgroups.zig, since an array isn't a type.

We also add right away a helper that gets the array index when initializing it.

hlgroups.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Highlight groups
//
///////////////////////////////////////////////////////////////////////////////

// here goes the hlGroups array

// Get the enum value as integer, so that it can be used as array index.
fn int(ef: t.Highlight) usize {
    return @intFromEnum(ef);
}

///////////////////////////////////////////////////////////////////////////////
//
//                              Constants, variables
//
///////////////////////////////////////////////////////////////////////////////

const std = @import("std");
const t = @import("types.zig");

const ansi = @import("ansi.zig");
const CSI = ansi.CSI;
const FgColor = ansi.FgColor;
const BgColor = ansi.BgColor;

Here things become really interesting, so pay attention.

We must define an array of highlight groups. There are no designated array initializers in Zig (the [index] = value syntax of C), so we use a labeled block to make up for them. At the same time, you'll see that these blocks let us do some wondrous things.

This block must return an array of HlGroup, with a size that is the number of the fields of the Highlight enum. We don't want to guess how many highlight types we have, so we get the exact number of them. We can do so with:

@typeInfo(EnumType).@"enum".fields.len

@" notation for identifiers

From the official documentation:

Variable identifiers are never allowed to shadow identifiers from an outer
scope. Identifiers must start with an alphabetic character or underscore
and may be followed by any number of alphanumeric characters or
underscores. They must not overlap with any keywords.

If an identifier wouldn't be valid according to these rules, we can use the @" notation. In our case we write @"enum" because enum is a keyword.

hlgroups.zig
// Number of members in the Highlight enum
const n_hl = @typeInfo(t.Highlight).@"enum".fields.len;

/// Array with highlight groups.
pub const hlGroups: [n_hl]t.HlGroup = arr: {
    // Initialize the hlGroups array at compile time. A []HlGroup array is
    // first declared undefined, then it is filled with all highlight groups.
    var hlg: [n_hl]t.HlGroup = undefined;
    hlg[int(.normal)] = .{
        .fg = FgColor.default,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.incsearch)] = .{
        .fg = FgColor.green,
        .bg = BgColor.default,
        .reverse = true,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.err)] = .{
        .fg = FgColor.red_bright,
        .bg = BgColor.default,
        .reverse = false,
        .bold = true,
        .italic = false,
        .underline = false,
    };
    break :arr hlg;
};
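The same labeled block technique can be seen at a smaller scale. The sketch below uses a made-up Color enum and stores a single u8 per member, instead of a whole HlGroup. It assumes a Zig version where the @typeInfo field is spelled @"enum", as in the code above.

```zig
const std = @import("std");

/// Made-up enum, for this sketch only.
const Color = enum(u8) { normal = 0, warning, err };

// Number of members in the enum, so the array can't have the wrong size.
const n_colors = @typeInfo(Color).@"enum".fields.len;

/// Foreground code for each Color. The block is evaluated at compile time,
/// because it initializes a top-level constant.
const fg_codes: [n_colors]u8 = blk: {
    var codes: [n_colors]u8 = undefined;
    codes[@intFromEnum(Color.normal)] = 39;
    codes[@intFromEnum(Color.warning)] = 33;
    codes[@intFromEnum(Color.err)] = 91;
    break :blk codes;
};

fn fgCode(c: Color) u8 {
    return fg_codes[@intFromEnum(c)];
}

pub fn main() void {
    std.debug.print("{d}\n", .{fgCode(.err)});
}
```

Each assignment names the enum member it refers to, so the order of the assignments doesn't matter, which is what designated initializers would give us in C.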

An array of highlight attributes

We also define an array with the attributes (the generated CSI sequences) for all highlight groups. This one is also created with a labeled block.

In this last block there's a loop: from the previously defined highlight groups, it will generate the CSI escape sequence (the attribute) of the group itself. This sequence is what we will actually use in the program to apply the highlight.

hlgroups.zig
/// Array with highlight attributes.
pub const hlAttrs: [n_hl][]const u8 = arr: {
    // generate the attribute for each of the highlight groups
    // bold/italic/etc: either set them, or reset them to avoid their
    // propagation from previous groups
    var hla: [n_hl][]const u8 = undefined;
    for (hlGroups, 0..) |hlg, i| {
        hla[i] = CSI ++ std.fmt.comptimePrint("{s}{s}{s}{s}{};{}m", .{
            if (hlg.bold) "1;" else "22;",
            if (hlg.italic) "3;" else "23;",
            if (hlg.underline) "4;" else "24;",
            if (hlg.reverse) "7;" else "27;",
            hlg.fg,
            hlg.bg,
        });
    }
    break :arr hla;
};

Maybe you didn't realize yet why it's so awesome: everything here is done at compile time! There won't be any trace of this in the binary executable, except the resulting hlAttrs array. The block doesn't use the comptime keyword: if you use it, the compiler will tell you

error: redundant comptime keyword in already comptime scope

This is proof that the comptime keyword is unnecessary most of the time.

Note

The hlGroups array isn't used at runtime. Still, defining it is useful because we can change more easily the highlight groups. The compiler keeps out of the executable what isn't used at runtime anyway.

How we access the attribute

We'll create a method in the HlGroup type that returns the attribute for that highlight type:

types.zig: HlGroup
    underline: bool,
    /// Get the attribute of a HlGroup from the hlAttrs array.
    pub fn attr(color: Highlight) []const u8 {
        return hlAttrs[@intFromEnum(color)];
    }

And import the array:

types.zig
const hlAttrs = @import("hlgroups.zig").hlAttrs;

CSI escape sequences

The attribute of each highlight group is a string: the escape sequence that is fed to the terminal to get the highlight we want. The format is:

ESC[{bold};{italic};{underline};{reverse};{fg-color};{bg-color}m

For example, if a group wants bold text, it will start with

\x1b[1;

If it doesn't want it, it will reset the bold attribute with

\x1b[22;

Otherwise it would inherit the value of the group that preceded it, whatever it was.
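To make the format concrete, this sketch builds the sequence for a single group, using the same values as the err group defined earlier. The HlGroup struct is a local copy with default values added for brevity, and err_group, err_attr and reset are names made up for the example (reset uses the standard SGR code 0).

```zig
const std = @import("std");

const CSI = "\x1b[";

/// Local copy of HlGroup, with defaults added for this sketch.
const HlGroup = struct {
    fg: u8,
    bg: u8,
    reverse: bool = false,
    bold: bool = false,
    italic: bool = false,
    underline: bool = false,
};

const err_group: HlGroup = .{ .fg = 91, .bg = 49, .bold = true };

/// Escape sequence for err_group, generated at compile time.
const err_attr: []const u8 = CSI ++ std.fmt.comptimePrint("{s}{s}{s}{s}{d};{d}m", .{
    if (err_group.bold) "1;" else "22;",
    if (err_group.italic) "3;" else "23;",
    if (err_group.underline) "4;" else "24;",
    if (err_group.reverse) "7;" else "27;",
    err_group.fg,
    err_group.bg,
});

/// SGR 0 resets all attributes.
const reset = CSI ++ "0m";

pub fn main() void {
    // on a terminal that supports these sequences, "error" is bold and red
    std.debug.print("{s}error{s}\n", .{ err_attr, reset });
}
```

For this group the resulting string is ESC[1;23;24;27;91;49m: bold is set, the other three attributes are explicitly reset, then come the foreground and background codes.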

More comptime

We used a hard-coded ErrorColor when printing errors in the message area. It's time to change that in errorMessage(), replacing the line that builds fmt from ansi.ErrorColor with:

    const fmt = comptime t.HlGroup.attr(.err) ++ format ++ ansi.ResetColors;

You should now delete the ErrorColor constant from ansi.

Note the comptime keyword here. Without it, the compiler would say:

error: unable to resolve comptime value
note: slice being concatenated must be comptime-known

With the comptime keyword, you force the compiler to at least try to get that value at compile time. In this case, it succeeds. Also note that comptime can precede any expression, to force it being evaluated at compile time: function calls, assignments, etc.

Again: you generally don't need the comptime keyword. But if the compiler complains with that sort of error, and you think it should be able to get the value, it's worth a try.

Applying the highlight

We now have all we need to apply the highlight. This should be done where rows are drawn, in drawRows(). There, until now, we were simply drawing the rendered row as-is, with a single toSurface() call guarded by if (len > 0). That call must be replaced.

We get the portion of the line that starts at coloff, together with its highlight, and we iterate it for len characters, so that we only iterate the part of the line that can fit the screen:

drawRows() outer loop
            // part of the line after coloff, and its highlight
            const rline = if (len > 0) rows[ix].render[V.coloff..] else &.{};
            const hl = if (len > 0) rows[ix].hl[V.coloff..] else &.{};

Inside the inner loop we check the character highlight. If it's different from the current one, we apply the highlight attribute, which will remain enabled until a different highlight is found in the row.hl array:

            var current_color = t.Highlight.normal;

            // loop characters of the rendered row
            for (rline[0..len], 0..) |c, i| {
                if (hl[i] != current_color) {
                    const color = hl[i];
                    current_color = color;
                    try e.toSurface(t.HlGroup.attr(color));
                }

We draw the character. At the end of the line we restore default highlight, otherwise the last highlight would carry over beyond the end of the line, and onto the next line:

                try e.toSurface(c);
            }
            // end of the line, reset highlight
            try e.toSurface(ansi.ResetColors);
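The point of tracking current_color is that an escape sequence is emitted only when the highlight changes, not for every character. A small sketch that just counts those changes, where attrSwitches() is a hypothetical helper and the enum is a local copy of Highlight:

```zig
const std = @import("std");

/// Local copy of the Highlight enum, for this sketch.
const Highlight = enum(u8) { normal = 0, incsearch, err };

/// Count how many times an attribute would be emitted for a row,
/// starting from .normal as the drawing loop does.
fn attrSwitches(hl: []const Highlight) usize {
    var current: Highlight = .normal;
    var n: usize = 0;
    for (hl) |h| {
        if (h != current) {
            current = h;
            n += 1;
        }
    }
    return n;
}

pub fn main() void {
    const hl = [_]Highlight{ .normal, .normal, .err, .err, .err, .normal };
    std.debug.print("{d} attribute changes for {d} cells\n", .{ attrSwitches(&hl), hl.len });
}
```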

Safe to iterate zero-length slices?

We can safely iterate a zero-length slice with a for loop. For example, the loop body below never runs, so the else branch is taken and the snippet prints "nothing":


    const line: []const u8 = &.{};
    for (line) |c| {
        std.debug.print("{}\n", .{c});
        break;
    } else {
        std.debug.print("nothing\n", .{});
    }

With a while loop we would have to check the index against the slice length ourselves before accessing any element.

Highlight for non-printable characters

As a first proof-of-concept for our highlight, we want non-printable characters to be printed with a reversed highlight (black on white), for example we'll turn Ctrl-A into A with reversed colors. If the character is not a Ctrl character, it will be printed as ? with reversed colors.

It won't work for some characters like Tab or Backspace, but for now it will do.

This kind of highlight will work with all filetypes, so we aren't talking about syntax highlighting yet.
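The mapping from a control character to the letter that represents it is plain ASCII arithmetic: Ctrl-A is byte 1, and '@' + 1 is 'A'. A minimal sketch of it, where glyph() is a hypothetical name:

```zig
const std = @import("std");

/// Printable stand-in for a non-printable byte: control characters 0...26
/// map to '@'...'Z', anything else to '?'.
fn glyph(c: u8) u8 {
    return switch (c) {
        0...26 => '@' + c,
        else => '?',
    };
}

pub fn main() void {
    // byte 1 is Ctrl-A, byte 17 is Ctrl-Q
    std.debug.print("{c} {c} {c}\n", .{ glyph(1), glyph(17), glyph(127) });
}
```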

We'll need a way to insert non-printable characters, so we define a key (Ctrl-K) which will let us insert characters verbatim, even those that we couldn't type otherwise. For example, Ctrl-Q would normally quit the editor instead of being inserted, but while inserting characters verbatim we'll be able to type it.

Process verbatim keypresses

In processKeypress(), we add a variable verbatim in the static struct:

Editor.zig: processKeypress()
    const static = struct {
        var q: u8 = opt.quit_times;
        var verbatim: bool = false;

Just below the static struct definition, before processing keypresses, we check whether the variable was set. If so, we reset the variable, insert the character and return. There is a set of characters that we don't insert, because we cannot handle them at this point: they would just break our text.

Editor.zig: processKeypress()
    if (static.verbatim) {
        static.verbatim = false;
        switch (k) {
            // these cause trouble, don't insert them
            .enter,
            .ctrl_h,
            .backspace,
            .ctrl_j,
            .ctrl_k,
            .ctrl_l,
            .ctrl_u,
            .ctrl_z,
            => {
                try e.errorMessage(message.errors.get("nonprint").?, .{ k });
                return;
            },
            else => try e.insertChar(@intFromEnum(k)),
        }
        return;
    }

We'll make Ctrl-K set this variable to true:

Editor.zig: processKeypress() switch
        .ctrl_k => static.verbatim = true,
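
The "next key is taken literally" logic can be looked at in isolation, as a tiny state machine. What follows is only a sketch under my own assumptions: Input, handleKey() and the fixed-size buffer are hypothetical stand-ins that are not part of the editor, keys are raw bytes instead of the Key enum (0x0b is Ctrl-K, 0x11 is Ctrl-Q), and the list of rejected characters is left out.

```zig
const std = @import("std");

/// Hypothetical stand-in for the editor state: a fixed buffer and two flags.
const Input = struct {
    buf: [16]u8 = undefined,
    len: usize = 0,
    verbatim: bool = false,
    quit: bool = false,

    /// Handle one key (a raw byte here, not the editor's Key enum).
    fn handleKey(self: *Input, k: u8) void {
        if (self.verbatim) {
            // the previous key was Ctrl-K: insert this one as-is
            self.verbatim = false;
            self.insert(k);
            return;
        }
        switch (k) {
            0x0b => self.verbatim = true, // Ctrl-K
            0x11 => self.quit = true, // Ctrl-Q
            else => self.insert(k),
        }
    }

    /// Append a byte, silently ignoring it when the buffer is full.
    fn insert(self: *Input, k: u8) void {
        if (self.len < self.buf.len) {
            self.buf[self.len] = k;
            self.len += 1;
        }
    }
};

test "Ctrl-K makes the next key literal" {
    var input: Input = .{};
    for ([_]u8{ 'a', 0x0b, 0x11, 'b' }) |k| input.handleKey(k);
    // Ctrl-Q was inserted instead of quitting
    try std.testing.expect(!input.quit);
    try std.testing.expectEqualSlices(u8, "a\x11b", input.buf[0..input.len]);
}
```

The flag is reset before the key is handled, so a single Ctrl-K only affects the one key that follows it.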

For the error, we need the nonprint error message:

message.zig: error_messages
    .{ "nonprint", "Can't insert character: {any}" },

Highlight the verbatim characters

This highlight group is filetype-independent, so we just handle it in the drawRows() inner loop:

Editor.zig: drawRows() inner loop
-                if (hl[i] != current_color) {
+                if (c != '\t' and !asc.isPrint(c)) {
+                    // for example, turn Ctrl-A into 'A' with reversed colors
+                    current_color = t.Highlight.nonprint;
+                    try e.toSurface(t.HlGroup.attr(.nonprint));
+                    try e.toSurface(switch (c) {
+                        0...26 => '@' + c,
+                        else => '?',
+                    });
+                }
+                else if (hl[i] != current_color) {
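
The character mapping used in that switch can be tried on its own. This is a minimal sketch: displayChar() is a hypothetical helper that doesn't exist in the editor, it only reproduces the `'@' + c` trick with std.ascii.isPrint.

```zig
const std = @import("std");

/// Hypothetical helper: the byte to draw in place of `c`.
/// Printable characters and Tab are returned unchanged.
fn displayChar(c: u8) u8 {
    if (c == '\t' or std.ascii.isPrint(c)) return c;
    return switch (c) {
        // 0 becomes '@', 1 (Ctrl-A) becomes 'A', ... 26 (Ctrl-Z) becomes 'Z'
        0...26 => '@' + c,
        else => '?',
    };
}

test "control characters map to letters" {
    try std.testing.expectEqual(@as(u8, 'A'), displayChar(0x01));
    try std.testing.expectEqual(@as(u8, 'Z'), displayChar(0x1a));
    try std.testing.expectEqual(@as(u8, '?'), displayChar(0x1b)); // Esc
    try std.testing.expectEqual(@as(u8, 'x'), displayChar('x'));
}
```

It works because in ASCII the control characters 1 to 26 sit exactly 64 positions (the value of '@') below the uppercase letters.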

We also need to add nonprint to the Highlight enum:

types.zig: Highlight enum
    /// Highlight for non-printable characters
    nonprint,

Define the highlight group

Now, if you try to compile, the compiler will say something like:

src/types.zig|162 col 20| error: use of undefined value here causes illegal behavior
||             if (hlg.bold) "1;" else "22;",

That's because we didn't define the highlight group in hlGroups, but the hlAttrs initializer tries to access it. This means that our system works as intended: we can't forget to define a group without the compiler telling us.

So we add the highlight group in the hlGroups labeled block:

types.zig
    hlg[int(.nonprint)] = .{
        .fg = FgColor.white,
        .bg = BgColor.default,
        .reverse = true,
        .bold = false,
        .italic = false,
        .underline = false,
    };

Now it should compile and the following should work:

  • try inserting a non-printable character with Ctrl-K followed by Ctrl-A

  • now try pressing Ctrl-K twice: we decided not to insert certain characters and to print an error message instead, which should have the .err highlight.

Searching

Now that we can prompt the user for input, and we can apply highlight to the text, we could give our editor the capability to search for words in the file.

To be able to do this, we'll need several changes. We defined the incsearch highlight, so we don't need to do that.

Instead, we must change how promptForInput() works. Until now, it only prompted a string from the user and returned it, without doing anything in between.

Now, instead, every time the user types a character we want to search for the pattern typed so far and, if it is found, give it a highlight on the screen.

The prompt callback

To achieve this, we will need our promptForInput() function to accept a callback function as parameter, and call it repeatedly inside its body.

We define the callback types as follows:

types.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Callbacks
//
///////////////////////////////////////////////////////////////////////////////

/// The prompt callback function type
pub const PromptCb = fn (*Editor, PromptCbArgs) EditorError!void;

/// Arguments for the prompt callback
pub const PromptCbArgs = struct {
    /// Current input entered by user
    input: *Chars,

    /// Last typed key
    key: Key,

    /// Saved view, in case it needs to be restored
    saved: View,

    /// Becomes true in the last callback invocation
    final: bool = false,
};

Note how easy and clear it is in Zig to define typedefs (as they are named in C), as we do for PromptCb.

Then we change the promptForInput() signature to:

-/// Start a prompt in the message area, return the user input.
-/// Prompt is terminated with either .esc or .enter keys.
-/// Prompt is also terminated by .backspace if there is no character left in
-/// the input.
-fn promptForInput(e: *Editor, prompt: []const u8) !t.Chars {
+/// Start a prompt in the message area, return the user input.
+/// At each keypress, the prompt callback is invoked, with a final invocation
+/// after the prompt has been terminated with either .esc or .enter keys.
+/// Prompt is also terminated by .backspace if there is no character left in
+/// the input.
+fn promptForInput(e: *Editor, prompt: []const u8, saved: t.View, cb: ?t.PromptCb) !t.Chars {
+    _ = cb;
+    _ = saved;

We'll have to fix the previous invocation:

-        var al = try e.promptForInput(message.prompt.get("fname").?);
+        var al = try e.promptForInput(message.prompt.get("fname").?, .{}, null);

EditorError set

If you try to compile now, the compiler will tell you that EditorError doesn't exist. If you try to remove it from the PromptCb return type, the compiler will tell you:

error: function type cannot have an inferred error set

So we need an explicit error set for our callback. We don't know how many kinds of errors could cause a PromptCb to fail. The callback we'll be using for the searching function will be of type

error{OutOfMemory}

So we could just write that. But PromptCb is a 'generic' callback, which could do just about anything, and we'd need to add more errors to that set.

Instead, we create our EditorError set, and if we'll need to handle more errors, we'll add them to this set.

Just add it above our previous IoError set:

types.zig
/// Error set for functions requiring explicit error handling.
pub const EditorError = error{
    OutOfMemory,
};

Updated promptForInput()

Remove those assignments at the top:

Editor.zig: promptForInput()
-    _ = cb;
-    _ = saved;

Now our prompt function needs to invoke this PromptCb callback.

Before the loop starts, we want to define some variables:

Editor.zig: promptForInput()
    var k: t.Key = undefined;
    var c: u8 = undefined;
    var cb_args: t.PromptCbArgs = undefined;
    while (true) {

which we'll assign inside the loop:

     while (true) {
         try e.statusMessage("{s}{s}", .{ prompt, al.items });
         try e.refreshScreen();
-        const k = try ansi.readKey();
-        const c = @intFromEnum(k);
+        k = try ansi.readKey();
+        c = @intFromEnum(k);
+        cb_args = .{ .input = &al, .key = k, .saved = saved };

Before the loop ends, we run the callback, if not null:

        if (cb) |callback| try callback(e, cb_args);
    }
    e.clearStatusMessage();

After the loop, we call it one last time before returning the input:

-    e.clearStatusMessage();
-    return al;
+    e.clearStatusMessage();
+    cb_args.final = true;
+    if (cb) |callback| try callback(e, cb_args);
+    return al;
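
To see the whole mechanism at a glance (one invocation per key, plus a final one), here is a reduced sketch that doesn't need a terminal. Everything in it is hypothetical: DemoArgs, DemoCb, Log, record() and prompt() are stand-ins of mine, keys come from a slice instead of being read, and Enter is assumed to end the prompt without a regular invocation. I also use a function pointer type (`*const fn`) so that the callback is an ordinary runtime value; that's a choice for this example, PromptCb in the guide is declared differently.

```zig
const std = @import("std");

const DemoError = error{OutOfMemory};

/// Hypothetical, reduced version of the callback arguments.
const DemoArgs = struct {
    key: u8,
    final: bool = false,
};

/// Records what the callback has seen.
const Log = struct {
    calls: usize = 0,
    finals: usize = 0,
};

/// A function pointer type, so that the callback can be passed at runtime.
const DemoCb = *const fn (seen: *Log, args: DemoArgs) DemoError!void;

fn record(seen: *Log, args: DemoArgs) DemoError!void {
    seen.calls += 1;
    if (args.final) seen.finals += 1;
}

/// Feed `keys` as if they were typed; Enter ('\r') ends the prompt.
fn prompt(seen: *Log, keys: []const u8, cb: ?DemoCb) DemoError!void {
    var args: DemoArgs = .{ .key = 0 };
    for (keys) |k| {
        args = .{ .key = k };
        if (k == '\r') break;
        if (cb) |callback| try callback(seen, args);
    }
    // one last invocation, so that the callback can clean up
    args.final = true;
    if (cb) |callback| try callback(seen, args);
}

test "callback runs for each key, then once more" {
    var seen: Log = .{};
    try prompt(&seen, "ab\r", &record);
    try std.testing.expectEqual(@as(usize, 3), seen.calls);
    try std.testing.expectEqual(@as(usize, 1), seen.finals);
}
```

Note that the final invocation still carries the last key, which is what lets a callback tell apart a prompt ended by Enter from one ended by Esc.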

Doing the search

Our prompt accepts a callback now, so we're ready to implement the search functionality.

We bind a new key:

Editor.zig: processKeypress()
        .ctrl_f => try e.find(),

Then we define our function:

Editor.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Find
//
///////////////////////////////////////////////////////////////////////////////

/// Start the search prompt.
fn find(e: *Editor) !void {
    const saved = e.view;
    var query = try e.promptForInput("/", saved, findCallback);
    query.deinit(e.alc);
}

In this function, we make a copy of the current View, so that we can restore the cursor position in the case that the search is interrupted.

We get our query, then deinitialize it. It's clear we're missing some piece of the puzzle...

Which brings us to the findCallback() function, which is passed to the prompt.

The find callback: preparations

We'll have to break the code for the find callback in pieces somehow.

We also need some additional preparations.

Pos type

We need a type that represents a position in the buffer.

types.zig
/// A position in the buffer.
pub const Pos = struct {
    lnr: usize = 0,
    col: usize = 0,
};

wrapscan option

We need an option for the searching behavior: should the search continue when the end of file is reached, by repeating the search from the start of the file? This also works while searching backwards:

option.zig
/// Searches wrap around the end of the file
pub var wrapscan = true;

Constants

We need two new constants:

Editor.zig
const lastIndexOf = mem.lastIndexOf;
const indexOf = mem.indexOf;

The findCallback() function

This one is big. We'll start with a stub, filled with placeholders. We reset the highlight and we handle clean up at the start of the function, so that later it can return at any point.

Editor.zig
/// Called by promptForInput() for every valid inserted character.
/// The saved view is restored when the current query isn't found, or when
/// backspace clears the query, so that the search starts from the original
/// position.
fn findCallback(e: *Editor, ca: t.PromptCbArgs) t.EditorError!void {
    // 1. variables
    // 2. restore line highlight
    // 3. clean up
    // 4. query is empty so no need to search, but restore position
    // 5. handle backspace
    // 6. find the starting line and the column offset for the search
    // 7. start the search
}

Variables

As we did before, we have a static struct which will save the current state of the search.

    const static = struct {
        var found: bool = false;
        var view: t.View = .{};
        var pos: t.Pos = .{};
        var oldhl: []t.Highlight = &.{};
    };

We also define some constants:

    const empty = ca.input.items.len == 0;
    const numrows = e.buffer.rows.items.len;

Restore line highlight before incsearch highlight

Before a new search attempt, we restore the underlying line highlight, so that if the search fails, the search highlight has been cleared already.

    // restore line highlight before incsearch highlight, or clean up
    if (static.oldhl.len > 0) {
        @memcpy(e.rowAt(static.pos.lnr).hl, static.oldhl);
    }

Clean up

The clean up must also be handled early. This block runs during the last invocation of the callback, that is done for this exact purpose.

In this step we free the search highlight, reset our static variables and restore the view if necessary.

    // clean up
    if (ca.final) {
        e.alc.free(static.oldhl);
        static.oldhl = &.{};
        if (empty or ca.key == .esc) {
            e.view = ca.saved;
        }
        if (!static.found and ca.key == .enter) {
            try e.statusMessage("No match found", .{});
        }
        static.found = false;
        return;
    }

Empty query

This happens after we press Backspace and the query is now empty. We don't cancel the search yet, but we restore the original view. Search will be canceled if we press Backspace again. We also reset static.found because it was true if that character we just deleted was a match.

    // Query is empty so no need to search, but restore position
    if (empty) {
        static.found = false;
        e.view = ca.saved;
        return;
    }

Handle Backspace

This happens when we press Backspace, but the query is not empty. In this case we restore our static view, which is set later on. Note that if the current query can't be found, this would be the same as the original view, but what matters is that we must restore it, whatever it is.

    // when pressing backspace we restore the previously saved view
    // cursor might move or not, depending on whether there is a match at
    // cursor position
    if (ca.key == .backspace or ca.key == .ctrl_h) {
        e.view = static.view;
    }

We define some constants, to make the function flow more understandable.

    //////////////////////////////////////////
    //   Find the starting position
    //////////////////////////////////////////

    const V = &e.view;

    const prev = ca.key == .ctrl_t;
    const next = ca.key == .ctrl_g;

    // current cursor position
    var pos = t.Pos{ .lnr = V.cy, .col = V.cx };

    const eof = V.cy == numrows;
    const last_char_in_row = !eof and V.rx == e.currentRow().render.len;
    const last_row = V.cy == numrows - 1;

    // must move the cursor forward before searching when we don't want to
    // match at cursor position
    const step_fwd = next or empty or !static.found;

Warning

If we skipped the !eof check when defining last_char_in_row, we would cause a panic when starting a search at the end of the file. This happens because e.currentRow() tries to get a pointer to a line that doesn't exist. Watch out for these things!

We are determining where the search must start, and that's either at cursor position, or just after that (one character to the right). That is, we must decide whether to accept a match at cursor position or not.

We want to step forward:

  • if we press Ctrl-G, looking for the next match

  • if we are at the starting position, because either:

    • we just started a search
    • query is empty
    • a match hasn't been found

In any of these cases:

    if (step_fwd) {
        if (eof or (last_row and last_char_in_row)) {
            if (!opt.wrapscan) { // restart from the beginning of the file?
                return;
            }
        }
        else if (last_char_in_row) { // start searching from next line
            pos.lnr = V.cy + 1;
        }
        else { // start searching after current column
            pos.col = V.cx + 1;
            pos.lnr = V.cy;
        }
    }
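
The same decision can be written as a small pure function, which is easier to reason about. This is just a sketch with assumptions of mine: startPos() is hypothetical, lines are described only by their lengths, and the cursor column is compared with the line length (the editor compares the rendered column instead). Also, unlike the code above, which leaves the position untouched and lets the search function do the wrapping, the sketch returns the top of the file directly when it wraps.

```zig
const std = @import("std");

const Pos = struct { lnr: usize = 0, col: usize = 0 };

/// Hypothetical helper: where a "find next" search should start, given the
/// cursor and the length of each line. Returns null when there is nothing
/// after the cursor and wrapping is disabled.
fn startPos(cur: Pos, line_lens: []const usize, wrapscan: bool) ?Pos {
    const numrows = line_lens.len;
    const eof = cur.lnr == numrows;
    // `!eof` comes first, so that we never index past the last line
    const last_char = !eof and cur.col == line_lens[cur.lnr];
    const last_row = cur.lnr + 1 == numrows;

    if (eof or (last_row and last_char)) {
        // nothing after the cursor: either give up or wrap to the top
        return if (wrapscan) Pos{} else null;
    }
    if (last_char) return Pos{ .lnr = cur.lnr + 1, .col = 0 };
    return Pos{ .lnr = cur.lnr, .col = cur.col + 1 };
}

test "start position for find-next" {
    const lens = [_]usize{ 3, 0, 5 };
    // in the middle of a line: one column to the right
    try std.testing.expectEqual(Pos{ .lnr = 0, .col = 2 }, startPos(.{ .lnr = 0, .col = 1 }, &lens, true).?);
    // at the end of a line: start of the next line
    try std.testing.expectEqual(Pos{ .lnr = 1, .col = 0 }, startPos(.{ .lnr = 0, .col = 3 }, &lens, true).?);
    // at the end of the last line: wrap, or stop
    try std.testing.expectEqual(Pos{}, startPos(.{ .lnr = 2, .col = 5 }, &lens, true).?);
    try std.testing.expect(startPos(.{ .lnr = 2, .col = 5 }, &lens, false) == null);
}
```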

Our match is an optional slice of the chars.items array of the Row where the match was found. We try to find it with the appropriate functions, which we'll define later.

    //////////////////////////////////////////
    //          Start the search
    //////////////////////////////////////////

    var match: ?[]const u8 = null;

    if (!prev) {
        match = e.findForward(ca.input.items, &pos);
    }
    else {
        match = e.findBackward(ca.input.items, &pos);
    }

    static.found = match != null;

If a match is found, we update the cursor position and the static variables.

Since match is a slice of the original array, we can find the column with pointer arithmetic, by subtracting the address of the first character of the chars.items array from the address of the first character of our match.

    const row = e.rowAt(pos.lnr);

    if (match) |m| {
        V.cy = pos.lnr;
        V.cx = &m[0] - &row.chars.items[0];

        static.view = e.view;
        static.pos = .{ .lnr = pos.lnr, .col = V.cx };

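[Figure: the match (m) shown as a sub-slice of the row (row.chars.items); &row.chars.items[0] points to the start of the row, &m[0] to the start of the match.]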
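
The same idea can be spelled out with integers, converting both addresses with @intFromPtr before subtracting them. This is only an equivalent sketch, not the code used in the editor, and columnOf() is a hypothetical helper:

```zig
const std = @import("std");

/// Column of `match` inside `row`; `match` must be a sub-slice of `row`.
fn columnOf(row: []const u8, match: []const u8) usize {
    return @intFromPtr(match.ptr) - @intFromPtr(row.ptr);
}

test "column from addresses" {
    const row: []const u8 = "const answer = 42;";
    const idx = std.mem.indexOf(u8, row, "answer").?;
    const match = row[idx .. idx + "answer".len];
    // the match starts 6 bytes after the start of the row
    try std.testing.expectEqual(@as(usize, 6), columnOf(row, match));
}
```

This only makes sense when the match really is a slice of the same row: subtracting addresses of unrelated memory gives a meaningless number, or an overflow.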

Note

Since we pass &pos to the functions, we could set the column there, but this works anyway (it's actually less trouble). Initially I wasn't using Pos, but I'm keeping it to show an example of pointer arithmetic in Zig. Feel free to refactor it if it suits you better.

Before setting the new highlight, we store a copy in static.oldhl. It will be restored at the top of the callback, every time the callback is invoked.

Note that we are matching against row.chars.items (the real row), but the highlight must match the characters in the rendered row, so we must convert our match position first, with cxToRx.

        // first make a copy of current highlight, to be restored later
        static.oldhl = try e.alc.realloc(static.oldhl, row.render.len);
        @memcpy(static.oldhl, row.hl);

        // apply search highlight
        const start = row.cxToRx(V.cx);
        const end = row.cxToRx(V.cx + m.len);
        @memset(row.hl[start .. end], t.Highlight.incsearch);
    }
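
To make the conversion concrete, here is a small sketch that highlights the second "\ta" of a row containing tabs. The cxToRx() below is a hypothetical free function that mimics what I assume the Row method does (tabs expanded to the next multiple of tabstop), and Hl is a two-member stand-in for the Highlight enum.

```zig
const std = @import("std");

const Hl = enum { normal, incsearch };

/// Hypothetical stand-in for Row.cxToRx(): rendered column of `cx`,
/// with tabs expanded to the next multiple of `tabstop`.
fn cxToRx(chars: []const u8, cx: usize, tabstop: usize) usize {
    var rx: usize = 0;
    for (chars[0..cx]) |c| {
        if (c == '\t') rx += (tabstop - 1) - (rx % tabstop);
        rx += 1;
    }
    return rx;
}

test "highlight a match in a row with tabs" {
    const chars = "\tadd\tadd";
    var hl = [_]Hl{.normal} ** 19; // rendered length with tabstop 8

    const cx = 4; // index of the second tab in the real row
    const start = cxToRx(chars, cx, 8);
    const end = cxToRx(chars, cx + 2, 8);
    @memset(hl[start..end], .incsearch);

    // two real characters cover six rendered columns
    try std.testing.expectEqual(@as(usize, 11), start);
    try std.testing.expectEqual(@as(usize, 17), end);
    try std.testing.expectEqual(Hl.incsearch, hl[16]);
    try std.testing.expectEqual(Hl.normal, hl[17]);
}
```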

If a match wasn't found, we restore the initial view (before we started searching).

We must also handle the case that wrapscan is disabled, a match isn't found in the current searching direction, but there was possibly a match before, so we just remain there, and set the highlight at current position. We need to set it because the original has been restored at the top.

Also here we do the same conversion, but we use the saved position.

    else if (next or prev) {
        // the next match wasn't found in the searching direction
        // we still set the highlight for the current match, since the original
        // highlight has been restored at the top of the function
        // this can definitely happen with !wrapscan
        const start = row.cxToRx(static.pos.col);
        const end = row.cxToRx(static.pos.col + ca.input.items.len);
        @memset(row.hl[start .. end], t.Highlight.incsearch);
    }
    else {
        // a match wasn't found because the input couldn't be found
        // restore the original view (from before the start of the search)
        e.view = ca.saved;
    }

Search forwards

When searching forwards for a match, we start searching at the given position, in the current row. We use the std.mem.indexOf function, that finds the relative position of a slice in another slice, or returns null if the slice isn't contained in the other slice.

The following steps are performed in order, stopping as soon as a match is found:

  • search a slice of the current row [col..]
  • reset search column to 0
  • search the following lines
  • end of file, no wrapscan? return null
  • restart from the beginning of the file
  • if you reach the initial line, only search [..col]

If a match is found, pos.lnr is updated, because the callback will need the line where it was found.

Editor.zig
/// Start a search forwards.
fn findForward(e: *Editor, query: []const u8, pos: *t.Pos) ?[]const u8 {
    var col = pos.col;
    var i = pos.lnr;

    while (i < e.buffer.rows.items.len) : (i += 1) {
        const rowchars = e.rowAt(i).chars.items;

        if (indexOf(u8, rowchars[col..], query)) |m| {
            pos.lnr = i;
            return rowchars[(col + m)..(col + m + query.len)];
        }

        col = 0; // reset search column
    }

    if (!opt.wrapscan) {
        return null;
    }

    // wrapscan enabled, search from start of the file to current row
    i = 0;
    while (i <= pos.lnr) : (i += 1) {
        const rowchars = e.rowAt(i).chars.items;

        if (indexOf(u8, rowchars, query)) |m| {
            pos.lnr = i;
            return rowchars[m .. m + query.len];
        }
    }
    return null;
}
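
If you want to play with the algorithm outside of the editor, this is a self-contained variant. It's a sketch under my own assumptions: searchForward() is hypothetical, a slice of lines replaces the editor buffer, wrapscan is a parameter instead of a global option, and the result is a position rather than a slice of the row. The wrap-around loop also checks the number of lines, so that a starting line past the end of the file can't be indexed.

```zig
const std = @import("std");

const Pos = struct { lnr: usize, col: usize };

/// Hypothetical, self-contained variant of the forward search.
fn searchForward(lines: []const []const u8, query: []const u8, from: Pos, wrapscan: bool) ?Pos {
    var col = from.col;
    var i = from.lnr;
    // from the starting position to the end of the file
    while (i < lines.len) : (i += 1) {
        if (std.mem.indexOf(u8, lines[i][col..], query)) |m| {
            // m is relative to the searched sub-slice, so add `col` back
            return Pos{ .lnr = i, .col = col + m };
        }
        col = 0; // following lines are searched from their start
    }
    if (!wrapscan) return null;

    // wrap around: from the top down to the starting line (included)
    i = 0;
    while (i < lines.len and i <= from.lnr) : (i += 1) {
        if (std.mem.indexOf(u8, lines[i], query)) |m| {
            return Pos{ .lnr = i, .col = m };
        }
    }
    return null;
}

test "forward search wraps around" {
    const lines = [_][]const u8{ "\tabb", "\tacc", "\tadd\tadd" };
    // start just after the first "\ta": the next one is in the second line
    try std.testing.expectEqual(Pos{ .lnr = 1, .col = 0 }, searchForward(&lines, "\ta", .{ .lnr = 0, .col = 1 }, true).?);
    // start after the last match: wraps to the first line
    try std.testing.expectEqual(Pos{ .lnr = 0, .col = 0 }, searchForward(&lines, "\ta", .{ .lnr = 2, .col = 5 }, true).?);
    // same start, without wrapscan: no match
    try std.testing.expect(searchForward(&lines, "\ta", .{ .lnr = 2, .col = 5 }, false) == null);
}
```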

Search backward

The process is very similar, but in reverse. This time we use the std.mem.lastIndexOf function, that finds the position of the last occurrence of a slice in another slice, or returns null if the slice isn't contained in the other slice. To only find matches before a certain column, we pass it a sub-slice of the row that ends at that column.

The following steps are performed in order, stopping as soon as a match is found:

  • search a slice of the current row [0..col]
  • search the previous lines
  • start of file, no wrapscan? return null
  • restart from the end of the file
  • if you reach the initial line, only search [col..]

If a match is found, pos.lnr is updated, because the callback will need the line where it was found.

Editor.zig
/// Start a search backwards.
fn findBackward(e: *Editor, query: []const u8, pos: *t.Pos) ?[]const u8 {
    // first line, search up to col
    const row = e.rowAt(pos.lnr);
    const col = pos.col;
    var rowchars = row.chars.items;
    var i: usize = undefined;

    if (lastIndexOf(u8, rowchars[0..col], query)) |m| {
        return rowchars[m .. m + query.len];
    }
    else if (pos.lnr > 0) {
        // previous lines, search full line
        i = pos.lnr - 1;
        while (true) : (i -= 1) {
            rowchars = e.rowAt(i).chars.items;

            if (lastIndexOf(u8, rowchars, query)) |m| {
                pos.lnr = i;
                return rowchars[m .. m + query.len];
            }
            if (i == 0) break;
        }
    }

    if (!opt.wrapscan) {
        return null;
    }

    i = e.buffer.rows.items.len - 1;
    while (i > pos.lnr) : (i -= 1) {
        rowchars = e.rowAt(i).chars.items;

        if (lastIndexOf(u8, rowchars, query)) |m| {
            pos.lnr = i;
            return rowchars[m .. m + query.len];
        }
    }

    // check again the starting line, this time in the part after the offset
    rowchars = e.rowAt(pos.lnr).chars.items;

    if (lastIndexOf(u8, rowchars[col..], query)) |m| {
        // m is the index in the substring starting from `col`, therefore we
        // must add `col` to get the real index in the row
        return rowchars[m + col .. m + col + query.len];
    }
    return null;
}
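
The point made in that last comment deserves a tiny experiment: an index returned for a sub-slice is relative to the sub-slice, not to the whole row. A minimal example, with a made-up row:

```zig
const std = @import("std");

test "index in a sub-slice is relative to the sub-slice" {
    const row = "abc abc abc";
    const col = 6;

    // last match that lies entirely before `col`
    try std.testing.expectEqual(@as(?usize, 0), std.mem.lastIndexOf(u8, row[0..col], "abc"));

    // last match after `col`: the result must be shifted by `col`
    const m = std.mem.lastIndexOf(u8, row[col..], "abc").?;
    try std.testing.expectEqual(@as(usize, 2), m);
    try std.testing.expectEqualStrings("abc", row[m + col .. m + col + 3]);
}
```

Also note that the match starting at index 4 is found by neither search, because it straddles `col`.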

Write a test

By now you should be able to compile, run and test the feature yourself.

Anyway, the searching feature is way more complex than anything we did before, and it's worth writing a test for it.

I don't know how to simulate keystrokes, so I'm just calling the callback repeatedly.

I initialize the editor with a 'fake' screen, because this isn't an interactive terminal.

Remember that we can do array multiplication (**) and concatenation (++), but only with operands that are known at compile time.
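
A quick example of the two operators, using plain bytes instead of highlights:

```zig
const std = @import("std");

test "array repetition and concatenation" {
    const n = [1]u8{0};
    const s = [1]u8{1};

    // both operands are known at compile time, and so is the result
    const hl = s ** 3 ++ n ** 2;

    try std.testing.expectEqual(@as(usize, 5), hl.len);
    try std.testing.expectEqualSlices(u8, &.{ 1, 1, 1, 0, 0 }, &hl);
}
```

The result is an array whose length is part of its type, which is why in the test we pass pointers to these arrays where slices are expected.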

I won't explain what the test does, hopefully you'll be able to understand it.

Editor.zig: Tests section
test "find" {
    var da = std.heap.DebugAllocator(.{}){};
    defer _ = da.deinit();

    var e = try t.Editor.init(da.allocator(), .{ .rows = 50, .cols = 180 });
    defer e.deinit();

    opt.wrapscan = true;
    opt.tabstop = 8;

    // our test buffer
    try e.insertRow(e.buffer.rows.items.len, "\tabb");
    try e.insertRow(e.buffer.rows.items.len, "\tacc");
    try e.insertRow(e.buffer.rows.items.len, "\tadd\tadd");

    const n = [1]t.Highlight{ .normal };
    const s = [1]t.Highlight{ .incsearch };

    // Row.hl has the same number of elements as the rendered row, and here we
    // have tabs

    // first 2 lines: normal highlight
    const norm1 = n ** 11;
    // third line: normal highlight
    const norm2 = n ** 19;
    // \t + 1 letter in lines 1-2
    const hl = s ** 9 ++ n ** 2;
    // \t + 2 letters in lines 1-2
    const hl2 = s ** 10 ++ n ** 1;
    // \t + 2 letters in line 3, first match
    const hl3 = s ** 10 ++ n ** 9;
    // \t + 1 letter in line 3, first match
    const hl4 = s ** 9 ++ n ** 10;
    // \t + 1 letter in line 3, second match
    const hl5 = n ** 11 ++ s ** 6 ++ n ** 2;

    var al = try t.Chars.initCapacity(e.alc, 80);
    defer al.deinit(e.alc);

    // our prompt is "\ta", it should be found in line 2, because we skip the
    // match at cursor position
    try al.appendSlice(e.alc, "\ta");
    var ca: t.PromptCbArgs = .{ .input = &al, .key = @enumFromInt('a'), .saved = e.view };
    try e.findCallback(ca);

    try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &hl));
    try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &norm2));

    // now it's "\tac", extending the current match
    try al.append(e.alc, 'c');
    ca = .{ .input = &al, .key = @enumFromInt('c'), .saved = e.view };
    try e.findCallback(ca);

    try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &hl2));
    try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &norm2));

    // now it's "\ta", resizing the current match
    _ = al.pop();
    ca = .{ .input = &al, .key = .backspace, .saved = e.view };
    try e.findCallback(ca);
    try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &hl));
    try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &norm2));

    // now it's "\tad", found in line 3
    try al.append(e.alc, 'd');
    ca = .{ .input = &al, .key = @enumFromInt('d'), .saved = e.view };
    try e.findCallback(ca);
    try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl3));

    // now it's "\ta", resizes the current match
    _ = al.pop();
    ca = .{ .input = &al, .key = .backspace, .saved = e.view };
    try e.findCallback(ca);
    try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl4));

    // find next: finds another "\ta" in the same row
    ca = .{ .input = &al, .key = .ctrl_g, .saved = e.view };
    try e.findCallback(ca);
    try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl5));

    // find next again: finds "\ta" in the first line
    ca = .{ .input = &al, .key = .ctrl_g, .saved = e.view };
    try e.findCallback(ca);
    try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &hl));
    try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &norm2));

    // find prev: goes back to last line (2nd match)
    ca = .{ .input = &al, .key = .ctrl_t, .saved = e.view };
    try e.findCallback(ca);
    try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl5));

    opt.wrapscan = false;

    // find next should fail (stays the same)
    ca = .{ .input = &al, .key = .ctrl_g, .saved = e.view };
    try e.findCallback(ca);
    try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
    try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl5));

    // clean up
    ca.final = true;
    try e.findCallback(ca);
}

Syntax highlighting

The last feature to implement is syntax highlighting.

New fields

Add a new field in the Buffer type:

Buffer.zig
// Pointer to the syntax definition
syndef: ?*const t.Syntax,

And one in the Row type:

Row.zig
/// True when the row has a multiline comment continuing into next line
ml_comment: bool,

This one becomes true when a line contains the leader that opens the multi-line comment, and stays true in all following rows, until the end of the block is found: in that row it becomes false again.

Reminder

Initialize both in their respective init(), to null and false respectively. Add imports where necessary.
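
To give an idea of how such a flag can travel from one row to the next, here is a rough sketch. It is not the highlighter of this guide: endsInComment() is a hypothetical helper, it assumes non-empty leaders, and it ignores strings and single-line comments, which a real highlighter must take into account.

```zig
const std = @import("std");

/// Hypothetical helper: given whether the previous row ended inside a
/// multiline comment, tell whether this row ends inside one too.
/// Both leaders are assumed to be non-empty.
fn endsInComment(line: []const u8, prev_open: bool, leader_open: []const u8, leader_close: []const u8) bool {
    var in_comment = prev_open;
    var i: usize = 0;
    while (i < line.len) {
        // look for the leader that would change the current state
        const leader = if (in_comment) leader_close else leader_open;
        const at = std.mem.indexOfPos(u8, line, i, leader) orelse break;
        in_comment = !in_comment;
        i = at + leader.len;
    }
    return in_comment;
}

test "the flag carries over to the following rows" {
    const rows = [_][]const u8{ "int a; /* start", "still inside", "end */ int b;", "/* a */ /* b" };
    const expected = [_]bool{ true, true, false, true };
    var is_open = false;
    for (rows, expected) |row, want| {
        is_open = endsInComment(row, is_open, "/*", "*/");
        try std.testing.expectEqual(want, is_open);
    }
}
```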

Fill the rest of Highlight enum

This is the full Highlight enum, with all needed highlight names:

types.zig
/// All available highlight types.
pub const Highlight = enum(u8) {
    /// The normal highlight
    normal = 0,

    /// Line comments highlight
    comment,

    /// Multiline comments highlight
    mlcomment,

    /// Numbers highlight
    number,

    /// String highlight
    string,

    /// Highlight for keywords of type 'keyword'
    keyword,

    /// Highlight for keywords of type 'types'
    types,

    /// Highlight for keywords of type 'builtin'
    builtin,

    /// Highlight for keywords of type 'constant'
    constant,

    /// Highlight for keywords of type 'preproc'
    preproc,

    /// Highlight for uppercase words
    uppercase,

    /// Highlight for escape sequences in strings
    escape,

    /// Incremental search highlight
    incsearch,

    /// Highlight for non-printable characters
    nonprint,

    /// Highlight for error messages
    err,
};

Fill the rest of hlGroups array

This is the full initializer of the hlGroups array, replace the previous one with it.

hlgroups.zig
/// Array with highlight groups.
pub const hlGroups: [n_hl]t.HlGroup = arr: {
    // Initialize the hlGroups array at compile time. A []HlGroup array is
    // first declared undefined, then it is filled with all highlight groups.
    var hlg: [n_hl]t.HlGroup = undefined;
    hlg[int(.normal)] = .{
        .fg = FgColor.default,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.comment)] = .{
        .fg = FgColor.black_bright,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.mlcomment)] = .{
        .fg = FgColor.blue_bright,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.number)] = .{
        .fg = FgColor.white_bright,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.string)] = .{
        .fg = FgColor.green,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.keyword)] = .{
        .fg = FgColor.cyan,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.types)] = .{
        .fg = FgColor.cyan_bright,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.builtin)] = .{
        .fg = FgColor.magenta,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.constant)] = .{
        .fg = FgColor.yellow,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.preproc)] = .{
        .fg = FgColor.red_bright,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.uppercase)] = .{
        .fg = FgColor.yellow_bright,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.escape)] = .{
        .fg = FgColor.red,
        .bg = BgColor.default,
        .reverse = false,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.incsearch)] = .{
        .fg = FgColor.green,
        .bg = BgColor.default,
        .reverse = true,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.nonprint)] = .{
        .fg = FgColor.white,
        .bg = BgColor.default,
        .reverse = true,
        .bold = false,
        .italic = false,
        .underline = false,
    };
    hlg[int(.err)] = .{
        .fg = FgColor.red_bright,
        .bg = BgColor.default,
        .reverse = false,
        .bold = true,
        .italic = false,
        .underline = false,
    };
    break :arr hlg;
};
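
The pattern used here (a labeled block that fills an array indexed by enum values, evaluated at compile time) is easier to see in a reduced form. In this sketch Color and fg_codes are hypothetical, and the table only holds the SGR foreground codes instead of full highlight groups:

```zig
const std = @import("std");

const Color = enum(u8) { normal = 0, comment, err };

/// Number of fields in the enum, that is the size of the table.
const n_colors = std.meta.fields(Color).len;

/// Hypothetical table: one SGR foreground code for each enum field.
const fg_codes: [n_colors]u8 = arr: {
    // declared undefined, then filled one element at a time
    var codes: [n_colors]u8 = undefined;
    codes[@intFromEnum(Color.normal)] = 39; // default foreground
    codes[@intFromEnum(Color.comment)] = 90; // bright black
    codes[@intFromEnum(Color.err)] = 91; // bright red
    break :arr codes;
};

test "table indexed by enum" {
    try std.testing.expectEqual(@as(u8, 90), fg_codes[@intFromEnum(Color.comment)]);
}
```

Since the initializer of a top-level constant is evaluated at compile time, the finished table ends up in the program as constant data.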

Syntax types

You can either define them in the types module (which I do) or in a different file, which will be imported by the types module, so that they're accessible also from there.

The Syntax type

This type defines many properties of a syntax: extensions used for filetype detection, comment leaders, keywords and syntax-specific editor options.

types.zig
///////////////////////////////////////////////////////////////////////////////
//
//                              Syntax types
//
///////////////////////////////////////////////////////////////////////////////

pub const Syntax = struct {
    /// Name of filetype
    ft_name: []const u8,

    /// Array of extensions for filetype detection
    ft_ext: []const []const u8,

    /// Array of names for filetype detection, to be matched against the tail
    /// of any path, so for example ".git/config" will match against any git
    /// configuration file in any directory.
    ft_fntails: []const []const u8,

    /// Leaders for single-line comments
    lcmt: []const []const u8,

    /// Array with multiline comment leaders
    /// [0] is start of block
    /// [1] is leader for lines between start and end
    /// [2] is end of block
    mlcmt: ?[3][]const u8,

    /// Array of words with 'Keywords' highlight
    keywords: []const []const u8,

    /// Array of words with 'Types' highlight
    types: []const []const u8,

    /// Array of words with 'Builtin' highlight
    builtin: []const []const u8,

    /// Array of words with 'Constant' highlight
    constant: []const []const u8,

    /// Array of words with 'Preproc' highlight
    preproc: []const []const u8,

    /// Bit field with supported syntax groups
    flags: SyntaxFlags,
};

The syntax flags

This type is important because it controls the kinds of highlight that a syntax supports, that is, what the syntax highlighter will actually highlight when parsing the buffer.

types.zig
pub const SyntaxFlags = packed struct {
    /// Should highlight integer and floating point numbers
    numbers: bool = false,

    /// Should highlight 0x[0-9a-fA-F]+ numbers
    hex: bool = false,

    /// Should highlight 0b[01]+ numbers
    bin: bool = false,

    /// Should highlight 0o[0-7]+ numbers
    octal: bool = false,

    /// Supports underscores in numeric literals
    uscn: bool = false,

    /// Should highlight strings
    strings: bool = false,

    /// Supports double-quoted strings
    dquotes: bool = false,

    /// Supports single-quoted strings
    squotes: bool = false,

    /// Highlight backticks as strings
    backticks: bool = false,

    /// Single-quotes are used for char literals instead
    chars: bool = false,

    /// Should highlight uppercase words
    uppercase: bool = false,
};
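
The Syntax type describes its flags field as a bit field. As a minimal sketch of what that means, assuming a hypothetical three-flag MiniFlags (not the real SyntaxFlags): in a packed struct each bool takes one bit, and the first field is the least significant one.

```zig
const std = @import("std");

// Hypothetical, reduced version of SyntaxFlags, for illustration only.
const MiniFlags = packed struct {
    numbers: bool = false,
    hex: bool = false,
    strings: bool = false,
};

/// Return the raw bits of the flags: one bit per field, first field lowest.
fn flagBits(flags: MiniFlags) u3 {
    return @bitCast(flags);
}
```

Since every field has a default value, a syntax definition only needs to list the flags it wants to enable, which is what the definitions in the next section do.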

Syntax definitions

I won't explain much of what goes on here: it's a list of syntax definitions, with their flags, keywords and so on. Create a module named syndefs.zig and copy-paste them.

You might note that all arrays have a & in front of them. That's because the Syntax type's fields are slices, for example []const []const u8. The only exception is mlcmt, because it has a fixed size, besides being optional.
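
A small sketch of the difference, using a hypothetical cut-down MiniSyntax type (only two fields, not the real Syntax): the slice field takes a pointer to an anonymous array literal, while the fixed-size array is stored by value and needs no &.

```zig
const std = @import("std");

// Hypothetical, reduced version of the Syntax type, for illustration only.
const MiniSyntax = struct {
    /// Slice: points to an array stored elsewhere, so it needs `&`
    ft_ext: []const []const u8,

    /// Fixed-size optional array: stored by value, no `&`
    mlcmt: ?[3][]const u8,
};

const mini = MiniSyntax{
    .ft_ext = &.{ "c", "h" },
    .mlcmt = .{ "/*", " *", "*/" },
};

/// Number of extensions registered for the hypothetical `mini` syntax.
fn extensionCount() usize {
    return mini.ft_ext.len;
}
```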

syndefs.zig
//! Module that contains all syntax definitions.

pub const Syntaxes = [_]t.Syntax{

    //////////////////////////////////////////////////////////////////////////
    //// zig
    .{
        .ft_name = "zig",
        .ft_ext = &.{
            "zig",
            "zon",
        },
        .ft_fntails = &.{},
        .lcmt = &.{"//"},
        .mlcmt = null,
        .keywords = &.{
            "addrspace", "align",   "allowzero",      "and",
            "anyframe",  "anytype", "catch",          "const",
            "else",      "enum",    "error",          "fn",
            "for",       "if",      "opaque",         "or",
            "orelse",    "packed",  "struct",         "switch",
            "try",       "union",   "usingnamespace", "var",
            "volatile",  "while",
        },
        .types = &.{
            "i8",          "u8",           "i16",            "u16",
            "i32",         "u32",          "i64",            "u64",
            "i128",        "u128",         "isize",          "usize",
            "c_char",      "c_short",      "c_ushort",       "c_int",
            "c_uint",      "c_long",       "c_ulong",        "c_longlong",
            "c_ulonglong", "c_longdouble", "f16",            "f32",
            "f64",         "f80",          "f128",           "bool",
            "anyopaque",   "void",         "noreturn",       "type",
            "anyerror",    "comptime_int", "comptime_float",
        },
        .builtin = &.{
            "export",   "extern",      "noinline",    "nosuspend",
            "inline",   "suspend",     "async",       "await",
            "defer",    "errdefer",    "unreachable", "comptime",
            "continue", "return",      "resume",      "threadlocal",
            "callconv", "linksection", "asm",         "noalias",
            "test",     "pub",         "break",
        },
        .constant = &.{
            "undefined", "true", "false", "null",
        },
        .preproc = &.{
            "@addrSpaceCast",    "@addWithOverflow",    "@alignCast",
            "@alignOf",          "@as",                 "@atomicLoad",
            "@atomicRmw",        "@atomicStore",        "@bitCast",
            "@bitOffsetOf",      "@bitSizeOf",          "@branchHint",
            "@breakpoint",       "@mulAdd",             "@byteSwap",
            "@bitReverse",       "@offsetOf",           "@call",
            "@cDefine",          "@cImport",            "@cInclude",
            "@clz",              "@cmpxchgStrong",      "@cmpxchgWeak",
            "@compileError",     "@compileLog",         "@constCast",
            "@ctz",              "@cUndef",             "@cVaArg",
            "@cVaCopy",          "@cVaEnd",             "@cVaStart",
            "@divExact",         "@divFloor",           "@divTrunc",
            "@embedFile",        "@enumFromInt",        "@errorFromInt",
            "@errorName",        "@errorReturnTrace",   "@errorCast",
            "@export",           "@extern",             "@field",
            "@fieldParentPtr",   "@FieldType",          "@floatCast",
            "@floatFromInt",     "@frameAddress",       "@hasDecl",
            "@hasField",         "@import",             "@inComptime",
            "@intCast",          "@intFromBool",        "@intFromEnum",
            "@intFromError",     "@intFromFloat",       "@intFromPtr",
            "@max",              "@memcpy",             "@memset",
            "@min",              "@wasmMemorySize",     "@wasmMemoryGrow",
            "@mod",              "@mulWithOverflow",    "@panic",
            "@popCount",         "@prefetch",           "@ptrCast",
            "@ptrFromInt",       "@rem",                "@returnAddress",
            "@select",           "@setEvalBranchQuota", "@setFloatMode",
            "@setRuntimeSafety", "@shlExact",           "@shlWithOverflow",
            "@shrExact",         "@shuffle",            "@sizeOf",
            "@splat",            "@reduce",             "@src",
            "@sqrt",             "@sin",                "@cos",
            "@tan",              "@exp",                "@exp2",
            "@log",              "@log2",               "@log10",
            "@abs",              "@floor",              "@ceil",
            "@trunc",            "@round",              "@subWithOverflow",
            "@tagName",          "@This",               "@trap",
            "@truncate",         "@Type",               "@typeInfo",
            "@typeName",         "@TypeOf",             "@unionInit",
            "@Vector",           "@volatileCast",       "@workGroupId",
            "@workGroupSize",    "@workItemId",
        },
        .flags = .{
            .numbers = true,
            .strings = true,
            .dquotes = true,
            .chars = true,
            .uppercase = true,
            .hex = true,
            .bin = true,
            .octal = true,
            .uscn = true,
        },
    },

    //////////////////////////////////////////////////////////////////////////
    //// c
    .{
        .ft_name = "c",
        .ft_ext = &.{
            "c", "h",
        },
        .ft_fntails = &.{},
        .lcmt = &.{"//"},
        .mlcmt = .{ "/*", " *", "*/" },
        .keywords = &.{
            "auto",           "case",          "const",
            "default",        "do",            "else",
            "enum",           "extern",        "for",
            "goto",           "if",            "inline",
            "register",       "restrict",      "sizeof",
            "static",         "struct",        "switch",
            "typedef",        "union",         "volatile",
            "while",          "_Alignas",      "_Alignof",
            "_Atomic",        "_Generic",      "_Noreturn",
            "_Static_assert", "_Thread_local",
        },
        .types = &.{
            "void",     "char",       "short",    "int",       "long",
            "float",    "double",     "signed",   "unsigned",  "_Bool",
            "_Complex", "_Imaginary", "size_t",   "ptrdiff_t", "wchar_t",
            "int8_t",   "int16_t",    "int32_t",  "int64_t",   "uint8_t",
            "uint16_t", "uint32_t",   "uint64_t", "intptr_t",  "uintptr_t",
        },
        .builtin = &.{
            "continue", "return", "break",
        },
        .constant = &.{
            "NULL", "EOF", "true", "false", "TRUE", "FALSE",
        },
        .preproc = &.{
            "#include",     "#define",  "#undef",           "#ifdef",
            "#ifndef",      "#if",      "#endif",           "#else",
            "#elif",        "#line",    "#error",           "#pragma",
            "#warning",     "__FILE__", "__LINE__",         "__DATE__",
            "__TIME__",     "__STDC__", "__STDC_VERSION__", "__func__",
            "__FUNCTION__",
        },
        .flags = .{
            .numbers = true,
            .strings = true,
            .dquotes = true,
            .chars = true,
            .uppercase = true,
            .hex = true,
        },
    },

    //////////////////////////////////////////////////////////////////////////
    //// c++
    .{
        .ft_name = "cpp",
        .ft_ext = &.{
            "cpp", "cc", "cxx", "c++", "hpp", "hh", "hxx", "h++",
        },
        .ft_fntails = &.{},
        .lcmt = &.{"//"},
        .mlcmt = .{ "/*", " *", "*/" },
        .keywords = &.{
            "alignas",          "alignof",     "and",          "and_eq",
            "asm",              "auto",        "bitand",       "bitor",
            "case",             "catch",       "class",        "compl",
            "const",            "consteval",   "constexpr",    "constinit",
            "const_cast",       "co_await",    "co_return",    "co_yield",
            "decltype",         "default",     "delete",       "do",
            "dynamic_cast",     "else",        "enum",         "explicit",
            "export",           "extern",      "for",          "friend",
            "goto",             "if",          "inline",       "mutable",
            "namespace",        "new",         "noexcept",     "not",
            "not_eq",           "operator",    "or",           "or_eq",
            "private",          "protected",   "public",       "register",
            "reinterpret_cast", "requires",    "sizeof",       "static",
            "static_assert",    "static_cast", "struct",       "switch",
            "template",         "this",        "thread_local", "throw",
            "try",              "typedef",     "typeid",       "typename",
            "union",            "using",       "virtual",      "volatile",
            "while",            "xor",         "xor_eq",
        },
        .types = &.{
            "void",     "char",      "char8_t",  "char16_t",
            "char32_t", "wchar_t",   "short",    "int",
            "long",     "float",     "double",   "signed",
            "unsigned", "bool",      "size_t",   "ptrdiff_t",
            "int8_t",   "int16_t",   "int32_t",  "int64_t",
            "uint8_t",  "uint16_t",  "uint32_t", "uint64_t",
            "intptr_t", "uintptr_t",
        },
        .builtin = &.{
            "std",      "override", "final", "concept",
            "continue", "return",   "break",
        },
        .constant = &.{
            "nullptr", "true", "false",
        },
        .preproc = &.{
            "#include",            "#define",     "#undef",   "#ifdef",
            "#ifndef",             "#if",         "#endif",   "#else",
            "#elif",               "#line",       "#error",   "#pragma",
            "#warning",            "__cplusplus", "__FILE__", "__LINE__",
            "__DATE__",            "__TIME__",    "__func__", "__FUNCTION__",
            "__PRETTY_FUNCTION__",
        },
        .flags = .{
            .numbers = true,
            .strings = true,
            .dquotes = true,
            .chars = true,
            .uppercase = true,
            .hex = true,
            .bin = true,
        },
    },

    //////////////////////////////////////////////////////////////////////////
    //// python
    .{
        .ft_name = "python",
        .ft_ext = &.{
            "py", "pyw", "pyx", "pxd", "pxi",
        },
        .ft_fntails = &.{},
        .lcmt = &.{"#"},
        .mlcmt = .{ "\"\"\"", "", "\"\"\"" },
        .keywords = &.{
            "and",    "as",      "assert",   "class",
            "def",    "del",     "elif",     "else",
            "except", "finally", "for",      "from",
            "global", "if",      "import",   "in",
            "is",     "lambda",  "nonlocal", "not",
            "or",     "pass",    "raise",    "try",
            "while",  "with",    "yield",    "async",
            "await",
        },
        .types = &.{
            "int",       "float",     "complex", "str",   "bytes",
            "bytearray", "bool",      "list",    "tuple", "dict",
            "set",       "frozenset", "object",  "type",
        },
        .builtin = &.{
            "__debug__", "self",   "cls",
            "continue",  "return", "break",
        },
        .constant = &.{
            "True", "False", "None", "NotImplemented", "Ellipsis",
        },
        .preproc = &.{
            "abs",         "all",          "any",        "ascii",
            "bin",         "bool",         "callable",   "chr",
            "classmethod", "compile",      "delattr",    "dir",
            "divmod",      "enumerate",    "eval",       "exec",
            "filter",      "format",       "getattr",    "globals",
            "hasattr",     "hash",         "help",       "hex",
            "id",          "input",        "isinstance", "issubclass",
            "iter",        "len",          "locals",     "map",
            "max",         "memoryview",   "min",        "next",
            "oct",         "open",         "ord",        "pow",
            "print",       "property",     "range",      "repr",
            "reversed",    "round",        "setattr",    "slice",
            "sorted",      "staticmethod", "sum",        "super",
            "vars",        "zip",          "__import__",
        },
        .flags = .{
            .numbers = true,
            .strings = true,
            .dquotes = true,
            .squotes = true,
            .uppercase = true,
            .hex = true,
            .bin = true,
            .octal = true,
            .uscn = true,
        },
    },

    //////////////////////////////////////////////////////////////////////////
    //// lua
    .{
        .ft_name = "lua",
        .ft_ext = &.{
            "lua",
        },
        .ft_fntails = &.{},
        .lcmt = &.{"--"},
        .mlcmt = .{ "--[[", "", "]]" },
        .keywords = &.{
            "and",   "do",       "else",   "elseif", "end",
            "for",   "function", "if",     "in",     "local",
            "not",   "or",       "repeat", "then",   "until",
            "while",
        },
        .types = &.{
            "boolean",  "number", "string", "userdata",
            "function", "thread", "table",
        },
        .builtin = &.{
            "_G",   "_VERSION", "self",
            "goto", "return",   "break",
        },
        .constant = &.{
            "true", "false", "nil",
        },
        .preproc = &.{
            "assert",   "collectgarbage", "dofile",   "error",
            "getfenv",  "getmetatable",   "ipairs",   "load",
            "loadfile", "loadstring",     "next",     "pairs",
            "pcall",    "print",          "rawequal", "rawget",
            "rawlen",   "rawset",         "require",  "select",
            "setfenv",  "setmetatable",   "tonumber", "tostring",
            "type",     "unpack",         "xpcall",   "coroutine",
            "debug",    "io",             "math",     "os",
            "package",  "string",         "table",
        },
        .flags = .{
            .numbers = true,
            .strings = true,
            .dquotes = true,
            .squotes = true,
            .uppercase = true,
            .hex = true,
        },
    },

    //////////////////////////////////////////////////////////////////////////
    //// javascript
    .{
        .ft_name = "javascript",
        .ft_ext = &.{
            "js", "jsx", "mjs", "cjs",
        },
        .ft_fntails = &.{},
        .lcmt = &.{"//"},
        .mlcmt = .{ "/*", " *", "*/" },
        .keywords = &.{
            "case",    "catch",   "class",      "const",    "debugger",
            "default", "delete",  "do",         "else",     "export",
            "extends", "finally", "for",        "function", "if",
            "import",  "in",      "instanceof", "let",      "new",
            "super",   "switch",  "this",       "throw",    "try",
            "typeof",  "var",     "void",       "while",    "with",
            "yield",   "async",   "await",      "static",   "get",
            "set",
        },
        .types = &.{
            "boolean", "number",  "bigint",  "string",   "symbol",
            "object",  "Array",   "Object",  "Function", "String",
            "Number",  "Boolean", "Date",    "RegExp",   "Error",
            "Map",     "Set",     "Promise", "Symbol",   "BigInt",
        },
        .builtin = &.{
            "globalThis", "console", "window",   "document",
            "global",     "process", "continue", "return",
            "break",
        },
        .constant = &.{
            "true", "false", "null", "undefined", "NaN", "Infinity",
        },
        .preproc = &.{
            "parseInt",      "parseFloat",         "isNaN",
            "isFinite",      "encodeURI",          "encodeURIComponent",
            "decodeURI",     "decodeURIComponent", "eval",
            "setTimeout",    "setInterval",        "clearTimeout",
            "clearInterval", "JSON",               "Math",
            "console",       "alert",              "confirm",
            "prompt",        "require",            "module",
            "exports",       "__dirname",          "__filename",
        },
        .flags = .{
            .numbers = true,
            .strings = true,
            .dquotes = true,
            .squotes = true,
            .uppercase = true,
            .hex = true,
            .bin = true,
            .octal = true,
        },
    },

    //////////////////////////////////////////////////////////////////////////
    //// bash
    .{
        .ft_name = "bash",
        .ft_ext = &.{
            "sh", "bash", "zsh", "fish",
        },
        .ft_fntails = &.{ ".bashrc", ".bash_profile", ".zshrc", ".profile" },
        .lcmt = &.{"#"},
        .mlcmt = null,
        .keywords = &.{
            "if",     "then",     "else",     "elif",    "fi",
            "case",   "esac",     "for",      "while",   "until",
            "do",     "done",     "function", "select",  "time",
            "in",     "break",    "continue", "return",  "exit",
            "local",  "readonly", "declare",  "typeset", "export",
            "unset",  "shift",    "set",      "unalias", "alias",
            "source", "eval",     "exec",     "trap",    "wait",
            "jobs",   "bg",       "fg",       "disown",  "suspend",
            "kill",   "killall",  "nohup",    "logout",
        },
        .types = &.{
            "array", "string", "integer", "associative",
        },
        .builtin = &.{
            "echo",      "printf", "read",    "test",  "cd",
            "pwd",       "pushd",  "popd",    "dirs",  "history",
            "fc",        "hash",   "type",    "which", "command",
            "builtin",   "enable", "help",    "bind",  "complete",
            "compgen",   "caller", "getopts", "let",   "mapfile",
            "readarray", "ulimit", "umask",   "shopt", "times",
        },
        .constant = &.{
            "true", "false",
        },
        .preproc = &.{
            "$0",            "$1",           "$2",              "$3",
            "$4",            "$5",           "$6",              "$7",
            "$8",            "$9",           "$@",              "$*",
            "$#",            "$",            "$!",              "$?",
            "$-",            "$_",           "$HOME",           "$PATH",
            "$PWD",          "$OLDPWD",      "$USER",           "$UID",
            "$SHELL",        "$TERM",        "$LANG",           "$LC_ALL",
            "$TMPDIR",       "$IFS",         "$PS1",            "$PS2",
            "$PS3",          "$PS4",         "$PROMPT_COMMAND", "$BASH",
            "$BASH_VERSION", "$BASHPID",     "$BASH_SUBSHELL",  "$LINENO",
            "$FUNCNAME",     "$BASH_SOURCE", "$BASH_LINENO",    "$SECONDS",
            "$RANDOM",       "$REPLY",       "$OPTARG",         "$OPTIND",
            "$HOSTNAME",     "$HOSTTYPE",    "$MACHTYPE",       "$OSTYPE",
            "$PIPESTATUS",   "$SHELLOPTS",   "$BASHOPTS",
        },
        .flags = .{
            .numbers = true,
            .strings = true,
            .dquotes = true,
            .squotes = true,
            .backticks = true,
            .uppercase = true,
        },
    },

    //////////////////////////////////////////////////////////////////////////
    //// gitconfig
    .{
        .ft_name = "gitconfig",
        .ft_ext = &.{},
        .ft_fntails = &.{ ".gitconfig", ".git/config" },
        .lcmt = &.{ "#", ";" },
        .mlcmt = null,
        .keywords = &.{
            "auto",     "always", "never", "local", "global", "system",
            "worktree",
        },
        .types = &.{
            "core",        "user",       "remote",        "branch",
            "merge",       "push",       "pull",          "fetch",
            "alias",       "color",      "diff",          "log",
            "status",      "commit",     "tag",           "rebase",
            "rerere",      "submodule",  "credential",    "http",
            "https",       "url",        "init",          "clone",
            "gc",          "fsck",       "pack",          "receive",
            "transfer",    "uploadpack", "uploadarchive", "advice",
            "apply",       "blame",      "browser",       "clean",
            "column",      "format",     "grep",          "gui",
            "help",        "i18n",       "imap",          "instaweb",
            "interactive", "mailinfo",   "mailmap",       "man",
            "notes",       "pager",      "pretty",        "protocol",
            "sendemail",   "sequence",   "showbranch",    "web",
        },
        .builtin = &.{
            "HEAD",   "FETCH_HEAD", "ORIG_HEAD", "MERGE_HEAD",
            "master", "main",       "origin",    "upstream",
            "refs",   "heads",      "tags",      "remotes",
        },
        .constant = &.{
            "true", "false", "yes", "no", "on", "off",
        },
        .preproc = &.{
            "name",                           "email",
            "editor",                         "pager",
            "excludesfile",                   "attributesfile",
            "hooksPath",                      "templatedir",
            "gitProxy",                       "sshCommand",
            "askpass",                        "autocrlf",
            "safecrlf",                       "filemode",
            "ignorecase",                     "precomposeUnicode",
            "hideDotFiles",                   "symlinks",
            "bare",                           "worktree",
            "logAllRefUpdates",               "repositoryformatversion",
            "sharedrepository",               "denyCurrentBranch",
            "denyNonFastforwards",            "fsckObjects",
            "transferFsckObjects",            "receivefsckObjects",
            "allowTipSHA1InWant",             "allowReachableSHA1InWant",
            "allowAnySHA1InWant",             "advertiseRefs",
            "allowUnadvertisedObjectRequest", "keepAlive",
            "maxStartups",                    "timeout",
            "uploadpack",                     "uploadarchive",
        },
        .flags = .{
            .numbers = true,
            .strings = true,
            .dquotes = true,
            .squotes = true,
        },
    },
};

const t = @import("types.zig");

New string functions

We'll complete our string module with a new set of functions that we'll use for syntax highlighting.

str.eql

We could just call mem.eql(u8, ...) everywhere: this is only a shorthand. I don't know if it's good practice, but we'll call it many times and the meaning is immediately obvious, so it's ok for me.

string.zig
/// Return `true` if slices have the same content.
pub fn eql(a: []const u8, b: []const u8) bool {
    return mem.eql(u8, a, b);
}

str.isTail

This is used for filetype detection.

string.zig
/// Return `true` if the tail of haystack is exactly `needle`.
pub fn isTail(haystack: []const u8, needle: []const u8) bool {
    const idx = mem.lastIndexOfLinear(u8, haystack, needle);
    return idx != null and idx.? + needle.len == haystack.len;
}
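
As an alternative sketch, the same check can probably be written with std.mem.endsWith from the standard library. The name isTailAlt is hypothetical and only used here for comparison; it is not part of the editor.

```zig
const std = @import("std");
const mem = std.mem;

/// Return `true` if `haystack` ends with `needle`.
/// Hypothetical alternative to `isTail`, based on `std.mem.endsWith`.
pub fn isTailAlt(haystack: []const u8, needle: []const u8) bool {
    return mem.endsWith(u8, haystack, needle);
}
```

With a path like "/home/me/project/.git/config" and the needle ".git/config" this returns true, while "config.bak" does not end with "config" and so it returns false.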

str.getExtension

Also used for filetype detection.

string.zig
/// Get the extension of a filename (without the dot), or null if it has none.
pub fn getExtension(path: []const u8) ?[]const u8 {
    const ix = mem.lastIndexOfScalar(u8, path, '.');
    if (ix == null or ix == path.len - 1) {
        return null;
    }
    return path[ix.? + 1 ..];
}
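
To make the edge cases explicit, here is a self-contained sketch with the same logic under a hypothetical name, extensionOf, so that it can be tried in isolation. Note that a dotfile such as ".bashrc" yields "bashrc" as its extension, and a name ending with a dot has no extension.

```zig
const std = @import("std");
const mem = std.mem;

/// Hypothetical stand-alone copy of the extension logic, for experimenting.
fn extensionOf(path: []const u8) ?[]const u8 {
    // no dot at all: no extension
    const ix = mem.lastIndexOfScalar(u8, path, '.') orelse return null;
    // the dot is the last character: no extension either
    if (ix == path.len - 1) return null;
    return path[ix + 1 ..];
}
```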

str.isSeparator

This one is very similar to str.isWord; it's actually its opposite. I only make it a separate function so that whitespace can be checked before other characters: whitespace is the most common way to separate words, so it should be prioritized when deciding whether something is a separator.

But I'm not sure it really makes a difference. If it doesn't, this function should be removed and !str.isWord() should be used instead.

string.zig
/// Return true if character is a separator (not a word character).
pub fn isSeparator(c: u8) bool {
    if (c == ' ' or c == '\t') return true;
    return switch (c) {
        '0'...'9', 'a'...'z', 'A'...'Z', '_' => false,
        else => true,
    };
}

Digression: inline keyword

inline with functions

The inline calling convention forces a function to be inlined at all call sites.
If the function cannot be inlined, it is a compile-time error.

This is what the creator of the Zig language wrote:

Quote

It’s best to let the compiler decide when to inline a function, except for these scenarios:

  • You want to change how many stack frames are in the call stack, for debugging purposes
  • You want the comptime-ness of the arguments to propagate to the return value of the function
  • Performance measurements demand it. Don’t guess!

Otherwise you actually end up restricting what the compiler is allowed to do when you use inline which can harm binary size, compilation speed, and even runtime performance.

So basically he's recommending not to use it unless you have a good and measurable reason to do so.

Other uses of inline

From the official language reference:

Other uses of inline (inline for, inline while, inline switch prongs) are very different, because they usually cause loops to be unrolled and evaluated at compile time. I've never used them, since I never felt the need for them, so I can't tell you more.
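
For reference only, a minimal sketch of inline for. The totalSize function is hypothetical and has nothing to do with the editor: it loops over a compile-time list of types, which an ordinary for loop could not do at runtime, because values of type type only exist at compile time.

```zig
const std = @import("std");

/// Sum the byte sizes of a comptime-known list of types.
/// The loop is unrolled at compile time, one iteration per type.
fn totalSize(comptime types: []const type) usize {
    var sum: usize = 0;
    inline for (types) |T| {
        sum += @sizeOf(T);
    }
    return sum;
}
```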

Filetype detection

We'll write a function named selectSyntax to detect and set the buffer syntax. This function will be invoked in two places:

  • in openFile():
Editor.zig: openFile()
    B.filename = try e.updateString(B.filename, path);
    B.syntax = try e.selectSyntax();
  • in saveFile(), so that we can set a syntax for newly created files, after we give them a name:
Editor.zig: saveFile()
    B.syntax = try e.selectSyntax();
    // determine number of bytes to write, make room for \n characters
    var fsize: usize = B.rows.items.len;

The selectSyntax() function

I put this in the "Syntax highlighting" section.

We start by freeing the old syntax, then we try to assign it again. For now unnamed buffers can't set a syntax, but it will be selected when the buffer is named and saved.

Editor.zig
/// Return the syntax name for the current file, or null.
fn selectSyntax(e: *Editor) !?[]const u8 {
    var B = &e.buffer;

    // free the old syntax, if any
    t.freeOptional(e.alc, B.syntax);
    B.syntax = null;

    // we might allow setting a syntax even without a filename, actually...
    // but for now it's not possible
    if (B.filename == null) {
        return null;
    }

    // code to come...
}

We get the extension of the filename, then we loop over all syntax definitions and see if any of them matches that extension.

For each syntax, if none of its extensions match, we match its names against the tail of the filename.

Editor.zig: selectSyntax()
    const fileExt = str.getExtension(B.filename.?);

    for (&syndefs.Syntaxes) |*syntax| {
        if (fileExt) |extension| {
            for (syntax.ft_ext) |ext| {
                if (str.eql(ext, extension)) {
                    B.syndef = syntax;
                    return try e.alc.dupe(u8, syntax.ft_name);
                }
            }
        }
        for (syntax.ft_fntails) |name| {
            if (str.isTail(B.filename.?, name)) {
                B.syndef = syntax;
                return try e.alc.dupe(u8, syntax.ft_name);
            }
        }
    }
    return null;
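
The same matching order can be tried outside the editor with a small sketch. Everything here is an assumption made for illustration: MiniSyntax, mini_syntaxes and detectFiletype are hypothetical names, the table holds only two entries, and no memory is allocated because the name is returned directly instead of being duplicated.

```zig
const std = @import("std");
const mem = std.mem;

// Hypothetical, reduced syntax definition: only the fields used for detection.
const MiniSyntax = struct {
    ft_name: []const u8,
    ft_ext: []const []const u8,
    ft_fntails: []const []const u8,
};

const mini_syntaxes = [_]MiniSyntax{
    .{ .ft_name = "zig", .ft_ext = &.{ "zig", "zon" }, .ft_fntails = &.{} },
    .{ .ft_name = "gitconfig", .ft_ext = &.{}, .ft_fntails = &.{ ".gitconfig", ".git/config" } },
};

/// Return the filetype name for `filename`, or null if nothing matches.
fn detectFiletype(filename: []const u8) ?[]const u8 {
    // extension: the text after the last '.', if there is one
    const file_ext: ?[]const u8 = if (mem.lastIndexOfScalar(u8, filename, '.')) |ix|
        filename[ix + 1 ..]
    else
        null;

    for (mini_syntaxes) |syntax| {
        // first try the extensions of this syntax...
        if (file_ext) |extension| {
            for (syntax.ft_ext) |candidate| {
                if (mem.eql(u8, candidate, extension)) return syntax.ft_name;
            }
        }
        // ...then the filename tails of the same syntax
        for (syntax.ft_fntails) |tail| {
            if (mem.endsWith(u8, filename, tail)) return syntax.ft_name;
        }
    }
    return null;
}
```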

Needed constants:

Editor.zig
const syndefs = @import("syndefs.zig");

Doing the highlight

First, let's add a new option, which controls globally if syntax highlighting should be done or not:

option.zig
/// Enable syntax highlighting
pub var syntax = true;

In updateHighlight, we'll return early if the buffer has no filetype, or this option is disabled.

Editor.zig: updateHighlight()
    // reset the row highlight to normal
    row.hl = try e.alc.realloc(row.hl, row.render.len);
    @memset(row.hl, .normal);
    if (e.buffer.syntax == null or opt.syntax == false) {
        return;
    }

We do the highlight of the whole rendered row. This is certainly not ideal, because certain files have very long lines, and only a part of each line is actually visible. At the same time, if we restrict parsing to only what we can see, we will certainly have bad highlight in all those cases where the highlight of a character depends on what precedes it, or even follows it.

We could try to do it anyway and add some safety margin, both on the left and the right side of the rendered part of the line, so that parsing starts before coloff and ends after coloff + screen.cols, but it wouldn't be perfect (think of very long line comments).
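
Just to give an idea of the margin approach, here is a sketch that is not used in the editor. Window, parseWindow and the margin parameter are hypothetical; the saturating operators -| and +| keep the bounds from underflowing or overflowing.

```zig
const std = @import("std");

/// Half-open range [start, end) of columns to parse.
const Window = struct { start: usize, end: usize };

/// Hypothetical helper: the visible part of a row, widened by `margin`
/// columns on both sides and clamped to the row length.
fn parseWindow(rowlen: usize, coloff: usize, cols: usize, margin: usize) Window {
    return .{
        .start = @min(rowlen, coloff -| margin),
        .end = @min(rowlen, coloff +| cols +| margin),
    };
}
```

A line comment that starts before the window would still be missed, which is the kind of imperfection mentioned above.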

We could make it optional, to have a fast highlight mode, but we can't change options inside the editor.

Doing it properly would need some serious changes, but we'll pass this time. I said this is a toy editor for several reasons, and this is only one of them.

Top-level symbols

Before we start the loop that iterates over all characters of the rendered row, we define some constants and variables.

The most important one is prev_sep: it controls when we can start to parse something new. If this variable isn't set correctly where it needs to be, highlighting will likely be broken.

in_string, which tells us if we're in a string or not, is checked early since inside strings we should ignore everything else, except escaped characters (for which we have an escaped variable).

Similarly for in_mlcomment: also in this case we don't parse anything until we find the sequence that closes the comment.

    //////////////////////////////////////////
    //          Top-level symbols
    //////////////////////////////////////////

    // length of the rendered row
    const rowlen = row.render.len;

    // syntax definition
    const s = e.buffer.syndef.?;

    // line comment leader
    const lc = s.lcmt;

    // multiline comment leaders
    const mlc = s.mlcmt;

    // syntax flags
    const flags = s.flags;

    // character is preceded by a separator
    var prev_sep = true;

    // character is preceded by a backslash
    var escaped = false;

    // character is inside a string or char literal
    var in_string = false;
    var in_char = false;
    var delimiter: u8 = 0;

    // line is in a multiline comment
    var in_mlcomment = ix > 0 and e.buffer.rows.items[ix - 1].ml_comment;

    // all keywords in the syntax definition, subdivided by kinds
    // each kind has its own specific highlight
    const all_syn_keywords = [_]struct {
        kind: []const []const u8, // array with keywords of some kind
        hl: t.Highlight,
    }{
        .{ .kind = s.keywords, .hl = t.Highlight.keyword },
        .{ .kind = s.types,    .hl = t.Highlight.types },
        .{ .kind = s.builtin,  .hl = t.Highlight.builtin },
        .{ .kind = s.constant, .hl = t.Highlight.constant },
        .{ .kind = s.preproc,  .hl = t.Highlight.preproc },
    };

The top-level loop

We'll have multiple nested loops, so we will use labels to break to an outer loop. The top-level loop has the toplevel label.

We'll use labels for all loops, and all break and continue statements. This way it should be clearer from which loop we're breaking.

At the bottom of the top-level loop we'll increase the character index i and set the critical prev_sep variable.

The first thing we do is to skip whitespace, which is also a valid separator.

    var i: usize = 0;
    toplevel: while (i < rowlen) {
        if (asc.isWhitespace(row.render[i])) { // skip whitespaces
            prev_sep = true;
            i += 1;
            continue :toplevel;
        }

        // rest of parsing goes here...

        prev_sep = str.isSeparator(row.render[i]);
        i += 1;
    }
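
To show how prev_sep behaves in this loop, here is a small self-contained sketch. The isSeparator function below is only a hypothetical stand-in for str.isSeparator (the real one may use a different set of characters), and countTokenStarts is a name I made up: it counts the positions where a new token could start.

```zig
const std = @import("std");

/// Hypothetical stand-in for str.isSeparator: whitespace and some
/// common punctuation. The real function may differ.
fn isSeparator(c: u8) bool {
    if (std.ascii.isWhitespace(c)) return true;
    return switch (c) {
        ',', '.', '(', ')', '+', '-', '/', '*', '=', '~', '%', '<', '>', '[', ']', ';' => true,
        else => false,
    };
}

/// Counts the positions where a new token could start, with the same
/// prev_sep bookkeeping as the top-level loop.
fn countTokenStarts(line: []const u8) usize {
    var count: usize = 0;
    var prev_sep = true;
    var i: usize = 0;
    toplevel: while (i < line.len) {
        if (std.ascii.isWhitespace(line[i])) { // skip whitespace
            prev_sep = true;
            i += 1;
            continue :toplevel;
        }
        // a token can only start right after a separator
        if (prev_sep and !isSeparator(line[i])) count += 1;

        prev_sep = isSeparator(line[i]);
        i += 1;
    }
    return count;
}

test "tokens start after separators" {
    try std.testing.expect(countTokenStarts("foo(bar, 12)") == 3);
}
```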

Multi-line comments

Note

Remember that we had, when defining constants and variables:

// line is in a multiline comment
var in_mlcomment = ix > 0 and e.buffer.rows.items[ix - 1].ml_comment;

we have ML comments...

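Our mlcmt field is an optional field, so we must check if it's null. If not null, it's an array of three strings: start marker, middle marker and end marker. In the code below we index it as mc[0] (start) and mc[2] (end), and we use the len of each marker.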

        // ML comments
        if (mlc != null and mlc.?.len > 0 and !in_string) {
            const mc = mlc.?;

we are in a ML comment...

... this we can know because in_mlcomment is true if the previous row's ml_comment field is true.

In this case we paint the character as ML comment, and keep looking for the end marker.

            if (in_mlcomment) {
                const len = mc[2].len;
                row.hl[i] = t.Highlight.mlcomment;

we do find the end marker...

... then in_mlcomment becomes false. After the marker, normal parsing resumes in this row.

Note

We don't break out of the top-level loop, we continue it, because unlike line comments, multi-line ones can end in the same line where they started.

                if (i + len <= rowlen and str.eql(row.render[i .. i + len], mc[2])) { // END
                    @memset(row.hl[i .. i + len], t.Highlight.mlcomment);
                    i += len;
                    in_mlcomment = false;
                    prev_sep = true;
                    continue :toplevel;
                }

we don't find the end marker...

... then in_mlcomment keeps being true also for this line. We keep painting everything as ML comment.

                else {
                    i += 1;
                    continue :toplevel;
                }
            }

we aren't in a ML comment yet...

... and we find the start marker. in_mlcomment becomes true. From then onwards, characters are painted as ML comment.

            else {
                const len = mc[0].len;

                if (i + len <= rowlen and str.eql(row.render[i .. i + len], mc[0])) { // START
                    @memset(row.hl[i .. i + len], t.Highlight.mlcomment);
                    i += len;
                    in_mlcomment = true;
                    continue :toplevel;
                }
            }
        }

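If the comment is closed before the end of this row, the following row will start with in_mlcomment set to false. If it is still open at the end of the row, the following row will start with in_mlcomment set to true.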
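
The branches above, stripped of the highlighting, boil down to a small state machine. Here is a minimal sketch of it, under some assumptions: commentOpenAtEnd is a made-up name, the markers are passed in as plain slices, and strings are ignored to keep it short.

```zig
const std = @import("std");

/// Returns whether a multi-line comment is still open at the end of `line`.
/// `open` is the state inherited from the previous row.
/// Simplification: strings are not taken into account here.
fn commentOpenAtEnd(line: []const u8, start: []const u8, end: []const u8, open: bool) bool {
    var in_comment = open;
    var i: usize = 0;
    while (i < line.len) {
        // inside a comment we only look for the end marker,
        // outside of it only for the start marker
        const marker = if (in_comment) end else start;
        if (marker.len > 0 and std.mem.startsWith(u8, line[i..], marker)) {
            in_comment = !in_comment;
            i += marker.len; // skip the whole marker
        } else {
            i += 1;
        }
    }
    return in_comment;
}

test "comment state at end of row" {
    try std.testing.expect(commentOpenAtEnd("int a; /* start", "/*", "*/", false));
    try std.testing.expect(!commentOpenAtEnd("end */ int b;", "/*", "*/", true));
}
```

The returned value is what the next row would inherit as its initial in_mlcomment.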

A change in comment state triggers a chain update

Normally we only update the row that has changed, but for multi-line patterns we must update the following rows too, otherwise their highlight would not reflect the change.

We must keep updating following rows, until the value of in_mlcomment matches the value of row.ml_comment: only in this case we know that the row wasn't affected by the multi-line pattern. Only then we can stop the chain of row updates.

This is done at the very bottom of the updateHighlight function. Add it now, so that you can have a clearer picture.

Editor.zig: bottom of updateHighlight()
    // If a multiline comment state has changed (either a comment started, or
    // a previous one has been closed) we must update the following row, which
    // will in turn update others, until all rows affected by the comment are
    // updated.
    const mlc_state_changed = row.ml_comment != in_mlcomment;
    row.ml_comment = in_mlcomment;
    if (mlc_state_changed and ix + 1 < e.buffer.rows.items.len) {
        try e.updateHighlight(ix + 1);
    }

If you still didn't get it, imagine 10 rows, no ML comments. Their row.ml_comment is false.

1. if ML comment starts at line 2, in_mlcomment becomes true
2. in_mlcomment is different from row.ml_comment and it triggers the chain update
3. following row has in_mlcomment set to true, because it's equal to row.ml_comment of previous row
4. it's different from its own row.ml_comment, chain update continues
5. all following lines become commented this way, all their row.ml_comment becomes true
6. now you insert the end marker at line 4
7. you trigger another chain update, which reverses the state of the lines that follow
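
If you want to see these steps in action, here is a small simulation that doesn't depend on the editor. Everything in it is a simplified assumption: Row has only the fields we need, openAtEnd only knows the `/*` and `*/` markers, and update returns the number of rows that were re-evaluated instead of highlighting anything.

```zig
const std = @import("std");

const Row = struct {
    text: []const u8,
    ml_comment: bool = false, // comment still open at the end of this row
};

/// Simplified scan: only looks for "/*" and "*/", ignores strings.
fn openAtEnd(text: []const u8, open: bool) bool {
    var in_comment = open;
    var i: usize = 0;
    while (i < text.len) {
        const marker: []const u8 = if (in_comment) "*/" else "/*";
        if (std.mem.startsWith(u8, text[i..], marker)) {
            in_comment = !in_comment;
            i += marker.len;
        } else {
            i += 1;
        }
    }
    return in_comment;
}

/// Re-evaluates row `ix`; if its state changed, the next row is updated too.
/// Returns how many rows have been re-evaluated.
fn update(rows: []Row, ix: usize) usize {
    const inherited = ix > 0 and rows[ix - 1].ml_comment;
    const now = openAtEnd(rows[ix].text, inherited);
    const changed = rows[ix].ml_comment != now;
    rows[ix].ml_comment = now;
    if (changed and ix + 1 < rows.len) return 1 + update(rows, ix + 1);
    return 1;
}

test "chain update" {
    var rows = [_]Row{
        .{ .text = "a" }, .{ .text = "/* b" }, .{ .text = "c" },
        .{ .text = "d" }, .{ .text = "e" },
    };
    // a comment starts at the second row: it and all following rows are updated
    try std.testing.expect(update(&rows, 1) == 4);
    try std.testing.expect(rows[4].ml_comment);

    // the comment is closed at the fourth row: the chain reverses the rest
    rows[3].text = "d */";
    try std.testing.expect(update(&rows, 3) == 2);
    try std.testing.expect(!rows[4].ml_comment and rows[2].ml_comment);
}
```

The chain stops as soon as a row's state doesn't change, or when the last row is reached.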

Note

This chain update is probably inefficient, since after the rows that follow are updated, they will be updated again when it's their turn in drawRows() to be updated. We could use a Buffer field to track how many lines could skip the update, because they've been updated this way. We're not doing it, though.

Line comments

For line comments we just check we aren't in a string or in a multiline comment, and we look for the comment leader. If found, the rest of the line is a comment, no need to continue parsing this line.

Editor.zig: inside updateHighlight() top-level loop
        // single-line comment
        if (lc.len > 0 and !in_string and !in_mlcomment) {
            for (lc) |ldr| {
                if (i + ldr.len <= rowlen and str.eql(row.render[i .. i + ldr.len], ldr)) {
                    @memset(row.hl[i..], t.Highlight.comment);
                    break :toplevel;
                }
            }
        }
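
The same check can be tried in isolation with a sketch like the following. lineCommentStart is a made-up helper, and it doesn't know about strings or multi-line comments, so a leader inside a string would fool it.

```zig
const std = @import("std");

/// Returns the column where a line comment starts, or null if there is none.
/// Simplification: strings and multi-line comments are ignored.
fn lineCommentStart(line: []const u8, leaders: []const []const u8) ?usize {
    var i: usize = 0;
    while (i < line.len) : (i += 1) {
        for (leaders) |ldr| {
            if (ldr.len > 0 and std.mem.startsWith(u8, line[i..], ldr)) return i;
        }
    }
    return null;
}

test "line comment leader" {
    const leaders: []const []const u8 = &.{ "//", "#" };
    try std.testing.expect(lineCommentStart("x = 1 // note", leaders).? == 6);
    try std.testing.expect(lineCommentStart("x = 1", leaders) == null);
}
```

Everything from the returned column to the end of the line would get the comment highlight.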

Strings

Highlighting of strings is controlled by Syntax.flags.strings, but that's not enough. Syntaxes can support double quoted strings, single quoted strings, backticks as strings or char literals, or more often a combination of them.

in_string and in_char differ because the highlight is different (string vs number). Moreover different delimiters must be handled independently: if a double quote is found and a string starts, a single quote after that is still part of the string. Same is true for double quotes after single quotes.

Whatever the delimiter and the string type, an escaped character is an escaped character, and it gets the .escape highlight, together with the escaping backslash.

If the start of a string or a char literal is found, delimiter is set to the character, and the appropriate highlight is set until delimiter is found again.

Multi-line strings aren't supported.

Editor.zig: inside updateHighlight() top-level loop
        if (flags.strings) {
            if (in_string or in_char) {
                if (escaped or row.render[i] == '\\') {
                    escaped = !escaped;
                    row.hl[i] = t.Highlight.escape;
                }
                else {
                    row.hl[i] = if (in_char) t.Highlight.number else t.Highlight.string;
                    if (row.render[i] == delimiter) {
                        in_string = false;
                        in_char = false;
                    }
                }
                i += 1;
                continue :toplevel;
            }
            else if (flags.dquotes and row.render[i] == '"'
                     or flags.squotes and row.render[i] == '\''
                     or flags.backticks and row.render[i] == '`') {
                in_string = true;
                delimiter = row.render[i];
                row.hl[i] = t.Highlight.string;
                i += 1;
                continue :toplevel;
            }
            else if (flags.chars and row.render[i] == '\'') {
                in_char = true;
                delimiter = row.render[i];
                row.hl[i] = t.Highlight.number;
                i += 1;
                continue :toplevel;
            }
        }
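
Here is a reduced version of the same logic that can be run on its own. It's only a sketch under simplifying assumptions: a single delimiter (the double quote), no char literals, and a made-up Hl enum in place of t.Highlight.

```zig
const std = @import("std");

const Hl = enum { normal, string, escape };

/// Paints double-quoted strings of `line` into `hl`.
/// `hl` must have the same length as `line`.
fn paintStrings(line: []const u8, hl: []Hl) void {
    std.debug.assert(hl.len == line.len);
    @memset(hl, .normal);

    var in_string = false;
    var escaped = false;
    for (line, 0..) |c, i| {
        if (in_string) {
            if (escaped or c == '\\') {
                // the backslash and the character after it
                escaped = !escaped;
                hl[i] = .escape;
            } else {
                hl[i] = .string;
                if (c == '"') in_string = false; // closing delimiter
            }
        } else if (c == '"') {
            in_string = true; // opening delimiter
            hl[i] = .string;
        }
    }
}

test "escaped quote doesn't close the string" {
    const line = "a \"b\\\"c\" d"; // a "b\"c" d
    var hl: [line.len]Hl = undefined;
    paintStrings(line, &hl);
    try std.testing.expect(hl[4] == .escape and hl[5] == .escape);
    try std.testing.expect(hl[7] == .string and hl[8] == .normal);
}
```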

Numbers

Parsing numbers depends on syntax flags: different filetypes support different number formats. The formats we support are:

type      format         flag
integers  N              numbers
floats    N.N([eE]N)?    numbers
hex       0[xX]N         hex
octal     0[oO]N         octal
binary    0[bB]N         bin

Integers and floats are always parsed if flags.numbers is true.

First we check if it's some special number notation (hex, octal, binary). If true, we set the appropriate boolean variable and advance the index by 2 characters.

Editor.zig: inside updateHighlight() top-level loop
        // numbers
        if (flags.numbers and prev_sep) {
            var prev_digit = false;
            var is_float = false;
            var has_exp = false;
            var is_hex = false;
            var is_bin = false;
            var is_octal = false;
            var NaN = false;

            const begin = i;

            // hex, binary, octal notations
            if (i + 1 < rowlen) {
                if (row.render[i] == '0') {
                    switch (row.render[i + 1]) {
                        'x', 'X' => if (flags.hex) {
                            is_hex = true;
                            i += 2;
                        },
                        'b', 'B' => if (flags.bin) {
                            is_bin = true;
                            i += 2;
                        },
                        'o', 'O' => if (flags.octal) {
                            is_octal = true;
                            i += 2;
                        },
                        else => {},
                    }
                }
            }
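
The prefix detection alone can be sketched as a tiny function. Radix and radixPrefix are made-up names, and unlike the code above this sketch doesn't look at the syntax flags: it accepts all three notations.

```zig
const std = @import("std");

const Radix = enum { decimal, hex, bin, octal };

/// Detects a 0x / 0b / 0o prefix at the start of `s`.
/// Simplification: syntax flags are not checked here.
fn radixPrefix(s: []const u8) Radix {
    // we need at least two characters, and the first must be a zero
    if (s.len < 2 or s[0] != '0') return .decimal;
    return switch (s[1]) {
        'x', 'X' => .hex,
        'b', 'B' => .bin,
        'o', 'O' => .octal,
        else => .decimal,
    };
}

test "radix prefixes" {
    try std.testing.expect(radixPrefix("0x1F") == .hex);
    try std.testing.expect(radixPrefix("0b101") == .bin);
    try std.testing.expect(radixPrefix("017") == .decimal);
}
```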

Then we parse the actual number. What counts as a digit depends on the type that has been detected. If it's not a special notation, we only accept digits and a dot.

The variable prev_digit is true if the previous character was a valid digit for the type. This variable must be true at the end of the parsing, or this simply isn't a number.

If flags.uscn is true, we also accept underscores as separator. They are part of the number, but they aren't digits themselves, so if they aren't followed by a digit, it won't be a number.

For the dot, it's similar: it must be followed by a digit, otherwise it's a simple separator. Not only that, but there can be only one dot in the number. Finding a dot the first time sets is_float, finding it a second time means it's not a number.

Same goes for e/E (exponent): it must be followed by digits, and may not appear more than once. Note that the code only accepts an exponent after a dot has been found (is_float must be true), which matches the N.N([eE]N)? format. In a hex number, though, e/E are digits, not exponents.

Editor.zig: flags.numbers
            // accept consecutive digits, or a dot followed by a number
            digits: while (true) : (i += 1) {
                if (i == rowlen) break :digits;

                switch (row.render[i]) {
                    '0'...'1' => prev_digit = true,

                    // invalid for binary numbers
                    '2'...'7' => {
                        if (!is_bin) {
                            prev_digit = true;
                        }
                        else {
                            prev_digit = false;
                            break :digits;
                        }
                    },

                    // invalid for binary and octal numbers
                    '8'...'9' => {
                        if (!is_bin and !is_octal) {
                            prev_digit = true;
                        }
                        else {
                            prev_digit = false;
                            break :digits;
                        }
                    },

                    // underscores as delimiters in numeric literals
                    '_' => {
                        if (prev_digit and flags.uscn) {
                            prev_digit = false;
                        }
                        else {
                            break :digits;
                        }
                    },

                    // could be an exponent, or a hex digit
                    'e', 'E' => {
                        if (is_float and !has_exp) {
                            has_exp = true;
                            prev_digit = false;
                        }
                        else if (is_hex) {
                            prev_digit = true;
                        }
                        else {
                            break :digits;
                        }
                    },

                    // hex digits
                    'a'...'d', 'f', 'A'...'D', 'F' => {
                        if (is_hex) prev_digit = true else break :digits;
                    },

                    // floating point
                    '.' => {
                        prev_sep = true;
                        prev_digit = false;
                        if (!is_float and !is_hex and !is_bin) {
                            is_float = true;
                        }
                        else {
                            break :digits;
                        }
                    },

                    else => break :digits,
                }
            }

After the loop ends, because a character has been found that is not valid for the type of number, we check if it's actually a number:

  • last character must be a valid digit
  • it must be followed by either a separator or end of line

We must also set the very important prev_sep variable, which controls whether the following characters may be parsed as new tokens, or as part of the previous one, or not at all. In this case, since we only have keywords left to parse, if this is false it will effectively end the parsing of the line.

If end of line has been reached we stop.

Editor.zig: flags.numbers
            // previous separator could be invalid if any character was
            // processed
            prev_sep = i == begin or str.isSeparator(row.render[i - 1]);

            // no matter the type of number, last character should be a digit
            if (!prev_digit) {
                NaN = true;
            }
            // after our number comes something that isn't a separator
            else if (i != rowlen and !str.isSeparator(row.render[i])) {
                NaN = true;
            }
            if (!NaN) {
                for (begin..i) |idx| {
                    row.hl[idx] = t.Highlight.number;
                }
            }
        }
        if (i == rowlen) break :toplevel;
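
To try out these rules without the rest of the parser, here is a sketch that works on a single token that has already been cut out of the line. It's a simplification: isDecimalNumber is a made-up name, only decimal integers and floats are handled (no hex, octal or binary), and allow_underscore plays the role of flags.uscn.

```zig
const std = @import("std");

/// Tells if `tok` is a decimal integer or float, with the same rules as
/// above: the last character must be a digit, only one dot, and only one
/// exponent, which is accepted only after a dot.
fn isDecimalNumber(tok: []const u8, allow_underscore: bool) bool {
    var prev_digit = false;
    var is_float = false;
    var has_exp = false;

    for (tok) |c| {
        switch (c) {
            '0'...'9' => prev_digit = true,
            '_' => {
                // must follow a digit, and must be followed by one
                if (!(prev_digit and allow_underscore)) return false;
                prev_digit = false;
            },
            '.' => {
                if (is_float) return false; // second dot
                is_float = true;
                prev_digit = false;
            },
            'e', 'E' => {
                if (!is_float or has_exp) return false;
                has_exp = true;
                prev_digit = false;
            },
            else => return false,
        }
    }
    // no matter what, the last character must be a digit
    return prev_digit;
}

test "decimal numbers" {
    try std.testing.expect(isDecimalNumber("1_000", true));
    try std.testing.expect(isDecimalNumber("1.5e10", false));
    try std.testing.expect(!isDecimalNumber("3.", false));
    try std.testing.expect(!isDecimalNumber("1.2.3", false));
}
```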

Keywords

Remember the constant we set before the top-level loop started:

    // all keywords in the syntax definition, subdivided by kinds
    // each kind has its own specific highlight
    const all_syn_keywords = [_]struct {
        kind: []const []const u8, // array with keywords of some kind
        hl: t.Highlight,
    }{
        .{ .kind = s.keywords, .hl = t.Highlight.keyword },
        .{ .kind = s.types,    .hl = t.Highlight.types },
        .{ .kind = s.builtin,  .hl = t.Highlight.builtin },
        .{ .kind = s.constant, .hl = t.Highlight.constant },
        .{ .kind = s.preproc,  .hl = t.Highlight.preproc },
    };

Now we iterate this array. Each element of this array is a set of keywords ([]const []const u8), together with the highlight which they will use.

We loop over each of these sets of keywords: if we find that what follows the current position is a keyword, and that it is followed by a separator or by the end of the row, we set the highlight and advance the position by the keyword length.

Editor.zig: inside updateHighlight() top-level loop
        // keywords
        if (prev_sep) {
            kwloop: for (all_syn_keywords) |keywords| {
                for (keywords.kind) |kw| {
                    const kwend = i + kw.len; // index where keyword would end

                    // separator or end of row after keyword
                    if ((kwend < rowlen and str.isSeparator(row.render[kwend]))
                        or kwend == rowlen)
                    {
                        if (str.eql(row.render[i..kwend], kw)) {
                            @memset(row.hl[i..kwend], keywords.hl);
                            i += kw.len;
                            break :kwloop;
                        }
                    }
                }
            }
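
The matching of a single set of keywords can be sketched as a standalone function. matchKeyword is a made-up name, and isSeparator is again only a hypothetical stand-in for str.isSeparator.

```zig
const std = @import("std");

/// Hypothetical stand-in for str.isSeparator; the real one may differ.
fn isSeparator(c: u8) bool {
    if (std.ascii.isWhitespace(c)) return true;
    return switch (c) {
        ',', '.', '(', ')', '+', '-', '/', '*', '=', '~', '%', '<', '>', '[', ']', ';' => true,
        else => false,
    };
}

/// Returns the length of the keyword that starts at `i`, or 0 if none does.
fn matchKeyword(line: []const u8, i: usize, keywords: []const []const u8) usize {
    for (keywords) |kw| {
        const kwend = i + kw.len; // index where keyword would end
        if (kwend > line.len) continue; // doesn't fit in the row
        // the keyword must be followed by a separator, or by the end of the row
        if (kwend < line.len and !isSeparator(line[kwend])) continue;
        if (std.mem.eql(u8, line[i..kwend], kw)) return kw.len;
    }
    return 0;
}

test "keywords need a separator after them" {
    const kws: []const []const u8 = &.{ "if", "return" };
    try std.testing.expect(matchKeyword("if (x) return;", 0, kws) == 2);
    try std.testing.expect(matchKeyword("if (x) return;", 7, kws) == 6);
    try std.testing.expect(matchKeyword("iffy", 0, kws) == 0);
}
```

The caller is still responsible for checking prev_sep, otherwise the if at the end of elif would match.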

Uppercase words

Similar process, but we don't loop over any array, we just check if there's a sequence of at least two uppercase characters or underscores that extends up to the next separator.

Important

We must reset the prev_sep variable before continuing, or the loop will hang.

Editor.zig: inside updateHighlight() continuing block
            if (flags.uppercase) {
                var upper = false;
                const begin = i;
                upp: while (i < rowlen and !str.isSeparator(row.render[i])) {
                    if (!asc.isUpper(row.render[i]) and row.render[i] != '_') {
                        upper = false;
                        break :upp;
                    }
                    upper = true;
                    i += 1;
                }
                if (upper and i - begin > 1) {
                    @memset(row.hl[begin..i], t.Highlight.uppercase);
                }
            }
            prev_sep = false;
            continue :toplevel;
        }
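
The uppercase check alone could look like this sketch. upperWordLen is a made-up name, and isSeparator is a hypothetical stand-in for str.isSeparator, as the real one isn't shown in this chapter.

```zig
const std = @import("std");

/// Hypothetical stand-in for str.isSeparator; the real one may differ.
fn isSeparator(c: u8) bool {
    if (std.ascii.isWhitespace(c)) return true;
    return switch (c) {
        ',', '.', '(', ')', '+', '-', '/', '*', '=', '~', '%', '<', '>', '[', ']', ';' => true,
        else => false,
    };
}

/// Returns the length of the uppercase word that starts at `i`, or 0 if
/// the word contains anything but uppercase letters and underscores, or
/// if it is shorter than two characters.
fn upperWordLen(line: []const u8, i: usize) usize {
    var j = i;
    while (j < line.len and !isSeparator(line[j])) : (j += 1) {
        if (!std.ascii.isUpper(line[j]) and line[j] != '_') return 0;
    }
    return if (j - i > 1) j - i else 0;
}

test "uppercase words" {
    try std.testing.expect(upperWordLen("MAX_LEN = 1", 0) == 7);
    try std.testing.expect(upperWordLen("Max", 0) == 0);
    try std.testing.expect(upperWordLen("X = 1", 0) == 0);
}
```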

Conclusion

I hope you found it interesting and/or useful.

Credits and thanks:

  • Andrew Kelley for creating the Zig programming language
  • Users of the Ziggit forum for answering questions
  • Especially user @vulpesx for advice on making the code more idiomatic for Zig
  • Paige Ruten, the writer of the original booklet
  • Salvatore Sanfilippo, the author of the original kilo editor

With that I don't mean the code is perfect, as I wrote in the introduction, but I did my best.

If you find mistakes, oversights or bad/wrong explanations of language concepts/features, please post an issue.

Thanks for reading.