Introduction
Some years ago a booklet was published, called Build your own text editor, a guide to creating a minimal text editor from scratch in the C programming language, based on the code of the kilo editor. It's a fun exercise to learn some C programming, because writing a toy text editor is fun.
In my attempt at learning the Zig programming language, I thought that rewriting that editor in Zig would be a good exercise as well, and since learning resources for Zig aren't overly abundant, I thought it would be a good idea to write a guide on how to do it, step by step, following the example of the booklet I mentioned before.
I want to make it clear that I'm neither an expert Zig programmer (this was my first Zig program) nor an expert programmer in general (I'm self-taught and I just dabble with programming, so to speak), so don't expect great technical insights or elaborate programming techniques. It's not the purpose of this document anyway. Moreover, I never claim that the way I solve a particular problem in this program is the best way to solve it, nor that it's the most idiomatic for the Zig programming language. Like its predecessor, the C programming language, Zig is rather free-form in the sense that it doesn't enforce a particular programming style or paradigm. Still, Zig also has its idioms and best practices, and I try to follow them in general, but sometimes I will also show different ways to approach the same problem.
As a matter of fact, in this guide I don't strive to find the optimal solutions, from the point of view of performance optimizations and memory usage for example, but generally the simplest ones that I consider still acceptable. It is a minimal text editor, after all.
Also remember that the program we're creating is just a toy, an exercise to learn something more about a programming language, and not a tool that can have any serious use.
Compared to the original C version, here we will not respect the 1024 lines of code limit (from which the name kilo stems) and we will not be limited to a single file, since pursuing (or even worse achieving) such conciseness would preclude us from using many useful features of the Zig programming language, such as importable modules and instantiable types. Having everything in a single file might make sense for small libraries, but it's not what we're doing here.
I do sometimes use collapsible notes:
Speaking of the knowledge required to understand this booklet, this is not a programming guide but rather an exercise, so I will expect that you have at least some familiarity with systems programming languages like C, in the sense that I will suppose that you already know what pointers are and how to use them, along with anything else that can be considered basic programming knowledge.
I will also expect that you know the basics of the Zig programming language, so if you haven't already, I suggest that you go through the exercises from the ziglings project before attempting this one. Other learning resources that I found useful are (in no particular order):
- Zig on exercism.org
- Learning Zig
- Zig Cookbook
- zig.guide (slightly outdated)
and the most important of all, always up to date:
Setup
This program was written in a Linux environment, for the Linux environment. Therefore, if you use Windows you should install WSL2 and a Linux distribution (I tested it on Ubuntu Preview). I don't think it can work on macOS, but you're free to try, and anyway it would not be too difficult to make it work there in the future.
Install zig
First thing, you need Zig itself. If you don't have it, or if you have a different version installed, you can download it here. Currently this document uses the 0.15.1 release, so it's the one that you should download.
Decompress the archive somewhere, for example in ~/.local/zig:
tar xf <archive> --directory=/home/username/.local
cd /home/username/.local
mv <name-of-extracted-directory> zig
Then add this directory to your path by adding this line to your .bashrc:
export PATH=$PATH:~/.local/zig
Now start a new terminal instance and see if zig is in your path:
zig version
And it should print
0.15.1
Finally, choose a directory for your project and initialize it:
mkdir kilo-zig
cd kilo-zig
zig init
Setup an editor
You will also need a text editor, before you can use your own. Which you shouldn't do anyway, so you need an editor.
I recommend you don't use any advanced tooling, like zls. I think that while still learning it's better not to use it: I find it's enough to rely on what the compiler tells you, then find and fix the mistakes by yourself.
ctags
Instead, if you use an editor that supports tags, I think it's a good idea to use them, to navigate faster between functions, types and other parts of our project. To use tags you must have universal-ctags installed, for example in Debian/Ubuntu you install it with:
sudo apt install universal-ctags
But ctags doesn't natively support Zig, so you should create a file at ~/.config/ctags/zig.ctags with this content:
--langdef=zig
--map-zig=.zig
--kinddef-zig=f,function,functions
--kinddef-zig=m,method,methods
--kinddef-zig=t,type,types
--kinddef-zig=v,field,fields
# functions
--regex-zig=/^(export +)?(pub +)?(inline +)?(extern .+ )?fn +([a-zA-Z0-9_]+)/\5/f/{exclusive}
# structs, union, enum
--regex-zig=/^(export +)?(pub +)?[\t ]*const +([a-zA-Z0-9_]+) = (struct|enum|union)/\3/t/{exclusive}{scope=push}
--regex-zig=/^}///{exclusive}{scope=pop}{placeholder}
# methods
--regex-zig=/^[\t ]+(pub +)?(inline +)?fn +([a-zA-Z0-9_]+)/\3/m/{exclusive}{scope=ref}
# public constants/variables
--regex-zig=/^(export +)?pub +(const|var) +([a-zA-Z0-9_]+)(:.*)? = .*/\3/v/{exclusive}
The Zig build system
From now on, I'll assume the directory of the project is located in ~/kilo-zig.
cd ~/kilo-zig
After having initialized the project with zig init inside that directory, a bunch of files will have been created. We don't need the src/root.zig file, because that is only useful if we are creating a library, and we're not, so we delete it:
rm src/root.zig
We'll also have to edit the build.zig file, which is the Zig equivalent of a Makefile. I will not go into details about how the Zig build system works, because I barely know it myself. What matters now is that currently the default build file is unsuitable to build our project. If we open it, we'll see that it does several things:
- it defines build options
- it defines a module (mod, which points at src/root.zig)
- it defines a main executable (exe, which points at src/main.zig)
- it adds steps for tests for both main executable and module
We'll have to remove all the steps that would build a module. So you remove:
- the mod variable
- the .imports field in the .addExecutable() argument
- other lines with mod: mod_tests, run_mod_tests and so on
You'll also change the .name field of the executable to "kilo".
This is the final build.zig with most comments removed:
const std = @import("std");

pub fn build(b: *std.Build) void {
    // Standard target options allow the person running `zig build` to choose
    // what target to build for.
    const target = b.standardTargetOptions(.{});

    // Standard optimization options allow the person running `zig build` to select
    // between Debug, ReleaseSafe, ReleaseFast, and ReleaseSmall.
    const optimize = b.standardOptimizeOption(.{});

    // Here we define an executable. An executable needs to have a root module
    // which needs to expose a `main` function.
    const exe = b.addExecutable(.{
        .name = "kilo",
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });

    // By default the install prefix is `zig-out/` but can be overridden by
    // passing `--prefix` or `-p`.
    b.installArtifact(exe);

    // This creates a top level step. Top level steps have a name and can be
    // invoked by name when running `zig build` (e.g. `zig build run`).
    // This will evaluate the `run` step rather than the default step.
    const run_step = b.step("run", "Run the app");

    // This creates a RunArtifact step in the build graph.
    const run_cmd = b.addRunArtifact(exe);
    run_step.dependOn(&run_cmd.step);

    // By making the run step depend on the default step, it will be run from the
    // installation directory rather than directly from within the cache directory.
    run_cmd.step.dependOn(b.getInstallStep());

    // This allows the user to pass arguments to the application in the build
    // command itself, like this: `zig build run -- arg1 arg2 etc`
    if (b.args) |args| {
        run_cmd.addArgs(args);
    }

    // Creates an executable that will run `test` blocks from the executable's
    // root module.
    const exe_tests = b.addTest(.{
        .root_module = exe.root_module,
    });

    // A run step that will run the test executable.
    const run_exe_tests = b.addRunArtifact(exe_tests);

    // A top level step for running all tests.
    const test_step = b.step("test", "Run tests");
    test_step.dependOn(&run_exe_tests.step);
}
The main.zig file
Every respectable program has an entry point, to let users actually execute it and do something with it. Our program is no exception.
Our entry point is located in src/main.zig, as we defined it in the build.zig script. The file doesn't have to be named this way, but it must contain a main() function.
I like to have big banners to separate sections of the source code. You don't have to follow my habits, of course: feel free to remove them if you don't like them.
zig init created a src/main.zig, which we'll have to replace entirely with this:
///////////////////////////////////////////////////////////////////////////////
//
// Main function
//
///////////////////////////////////////////////////////////////////////////////
pub fn main() !void {
    var da = std.heap.DebugAllocator(.{}){};
    defer _ = da.deinit();

    const allocator = switch (builtin.mode) {
        .Debug => da.allocator(),
        else => std.heap.smp_allocator,
    };
    _ = allocator;
}
///////////////////////////////////////////////////////////////////////////////
//
// Constants, variables
//
///////////////////////////////////////////////////////////////////////////////
const std = @import("std");
const builtin = @import("builtin");
We keep the constants at the bottom of the file, so they don't get too much in the way. There are only two now, but there's often a whole lot of them.
What we're doing for now is defining the allocators we'll be using. Zig refuses to compile code that declares local variables or constants without using them. So for now we have:
_ = allocator;
after we define the constant.
What this code means, at any rate, is that we use the debug allocator in Debug mode, and a much faster allocator in proper release modes.
builtin.mode defaults to .Debug, so if we simply run
zig build
it will build the program in debug mode. To use the faster allocator we'll need to pass an argument, for example:
zig build -Doptimize=ReleaseSmall # optimize for small binary size
zig build -Doptimize=ReleaseFast # optimize for performance
zig build -Doptimize=ReleaseSafe # optimize for safety
But we'll mostly build in debug mode, because if something goes wrong and the program panics, we'll get the most useful information about what has caused the panic, such as array access with index out of bounds (it happened often to me while writing the program).
Panic handler
Speaking of panic, we want to add our own panic handler. Normally, if the program panics, it will crash and invoke the default panic handler, which prints a stack trace about the error. We'll need more than that, so we change the panic handler to our own:
///////////////////////////////////////////////////////////////////////////////
//
// Panic handler
//
///////////////////////////////////////////////////////////////////////////////
pub const panic = std.debug.FullPanic(crashed);
fn crashed(msg: []const u8, trace: ?usize) noreturn {
    std.debug.defaultPanic(msg, trace);
}
Since we don't need it for anything yet, what it does is simply to call the default panic handler, passing the same arguments it receives.
You may have noticed that strange return type: noreturn. It means that the function doesn't just return nothing, like a void function would do: it doesn't return at all. This is so because when this function is called, our program has crashed already, and it couldn't return any value anyway. You shouldn't worry about it because it's the first and last time we'll see it in our program.
When the program encounters an error at runtime, depending on the kind of error, two things may happen:
- the program crashes (best case)
- the program keeps running, but its state is corrupted (worst case)
In the second case really nasty things can happen, so we want to avoid bugs at all costs. In safe build modes (Debug and ReleaseSafe), events that would normally cause a crash or undefined behavior cause a panic instead. The program terminates and you get a meaningful stack trace of what has caused the error.
Terminal configuration
When we write text in an editor, the character is immediately read and handled by the program. This is not what happens normally in a terminal, because the default way a terminal handles keypresses is the so-called canonical mode: in this mode, keys are sent to the program only after the user presses the Enter key.
Let's first write a function that reads bytes from the user's keypresses:
// Read from stdin into `buf`, return the number of read characters
fn readChar(buf: []u8) !usize {
    const stdin = std.posix.STDIN_FILENO;
    return try std.posix.read(stdin, buf);
}
This will read from stdin, store what it read in buf and return the number of bytes that have been read. Since we'll pass it a buffer with room for a single byte, it reads one character at a time.
buf should be a slice, because std.posix.read accepts a slice as parameter.
In general, you'll find that working with slices prevents a lot of headaches: the Zig type system is very strict, but most functions of the standard library that work with arrays are designed to take a slice as parameter. You still keep the ownership of the underlying array, of course.
Remember that to pass a slice of an array to a function we use one of the following notations:
&array // create the slice by taking the address of an array
array[0..] // a slice with all elements of an array
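As a small sketch of the two notations at work (the sum() function is made up for this example, it's not part of the editor):

```zig
const std = @import("std");

/// Hypothetical helper: it takes a slice, so it works with arrays of any length.
fn sum(values: []const u8) usize {
    var total: usize = 0;
    for (values) |v| total += v;
    return total;
}

test "an array can be passed as a slice in both ways" {
    const array = [_]u8{ 1, 2, 3 };
    // pointer to the whole array, coerced to a slice
    try std.testing.expectEqual(@as(usize, 6), sum(&array));
    // explicit slicing, here skipping the first element
    try std.testing.expectEqual(@as(usize, 5), sum(array[1..]));
}
```

Saved in a file of its own, this can be tried with zig test followed by the file name.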
Let's call it from main() by adding these lines:
};
_ = allocator;
var buf: [1]u8 = undefined;
while (try readChar(&buf) == 1 and buf[0] != 'q') {}
If you build and run the program in a terminal, you'll see that even if you press q the loop doesn't stop, you need to press Enter, and if you press any key after q, you'll find those characters in your command line prompt.
So you'll understand the need to change how the terminal sends what it reads to our program, and this is what raw mode is for.
For this purpose, we'll create a new module for our program, we'll call it linux, and it will handle all interactions with the operating system, such as reading characters.
The linux module
Create a file src/linux.zig and paste the following content:
//! Module that handles interactions with the operating system.
///////////////////////////////////////////////////////////////////////////////
//
// Raw mode
//
///////////////////////////////////////////////////////////////////////////////
/// Enable terminal raw mode, return previous configuration.
pub fn enableRawMode() !linux.termios {
    const orig_termios = try posix.tcgetattr(STDIN_FILENO);
    // stuff here
    return orig_termios;
}

/// Disable terminal raw mode by restoring the saved configuration.
pub fn disableRawMode(termios: linux.termios) void {
    posix.tcsetattr(STDIN_FILENO, .FLUSH, termios) catch @panic("Disabling raw mode failed!");
}
///////////////////////////////////////////////////////////////////////////////
//
// Constants, variables
//
///////////////////////////////////////////////////////////////////////////////
const std = @import("std");
const linux = std.os.linux;
const posix = std.posix;
const STDOUT_FILENO = posix.STDOUT_FILENO;
const STDIN_FILENO = posix.STDIN_FILENO;
For now, we have two functions:
enableRawMode | should change the terminal configuration, switching away from canonical mode, then should return the original configuration |
disableRawMode | should restore the original configuration |
We have to fill in the enableRawMode function, since right now it's not doing anything.
Enabling raw mode
The original booklet I mentioned in the introduction goes into great detail in explaining what all the flags mean. I have no intention to do that, if you are curious about them you can consult the original.
First we make a copy of the original configuration, so that we can modify it.
// stuff here
// make a copy
var termios = orig_termios;
We then set a number of flags in this copy. We disable echoing of the characters we type:
termios.lflag.ECHO = false; // don't echo input characters
We disable canonical mode, so that the terminal doesn't wait for Enter to be pressed when reading characters:
termios.lflag.ICANON = false; // read input byte-by-byte instead of line-by-line
We disable some key combinations that usually have a special behavior in terminals, so that they are available for us to use in our program:
termios.lflag.ISIG = false; // disable Ctrl-C and Ctrl-Z signals
termios.iflag.IXON = false; // disable Ctrl-S and Ctrl-Q flow control
termios.lflag.IEXTEN = false; // disable Ctrl-V
termios.iflag.ICRNL = false; // stop Ctrl-M being read as Ctrl-J
For reference:
key | default behavior |
---|---|
Ctrl-C | sends a SIGINT signal that causes the program to terminate |
Ctrl-Z | sends a SIGTSTP signal which causes the suspension of the program (which you can then resume with fg in the terminal command line) |
Ctrl-S | produces XOFF control character, halts data transmission |
Ctrl-Q | produces XON control character, resumes data transmission |
Ctrl-V | next character will be inserted literally |
Ctrl-M | read as ASCII 10 Ctrl-J instead of 13 Enter |
Let's disable output processing, to prevent the terminal from adding a carriage return (\r) to each new line (\n) that we print:
termios.oflag.OPOST = false; // disable output processing
You can see that the termios flags are grouped into structs named after their purpose: iflag (input), oflag (output), lflag (local) and cflag (control).
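As far as I can tell, these flag groups are packed structs with one bool per bit, which is why we can write termios.lflag.ECHO = false instead of masking bits as the C version does. Here is a minimal sketch of the idea; DemoFlags, its field layout and clearAll() are invented for illustration, the real definitions live in std.os.linux:

```zig
const std = @import("std");

/// Hypothetical flag set, laid out like a C bit field: one bit per flag.
/// The layout is made up and does not match the real termios one.
const DemoFlags = packed struct(u8) {
    ECHO: bool = false,
    ICANON: bool = false,
    ISIG: bool = false,
    _padding: u5 = 0,
};

/// Return a copy of `flags` with every named flag cleared.
fn clearAll(flags: DemoFlags) DemoFlags {
    var copy = flags;
    copy.ECHO = false;
    copy.ICANON = false;
    copy.ISIG = false;
    return copy;
}

test "flags are single bits of an integer" {
    const flags = DemoFlags{ .ECHO = true, .ISIG = true };
    // ECHO is bit 0 and ISIG is bit 2, so the integer value is 0b101
    try std.testing.expectEqual(@as(u8, 0b101), @as(u8, @bitCast(flags)));
    try std.testing.expectEqual(@as(u8, 0), @as(u8, @bitCast(clearAll(flags))));
}
```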
Let's disable more flags, which are even more obscure than the previous ones and that I won't even try to explain (sorry):
termios.iflag.BRKINT = false; // break conditions cause SIGINT signal
termios.iflag.INPCK = false; // disable parity checking (obsolete?)
termios.iflag.ISTRIP = false; // disable stripping of 8th bit
termios.cflag.CSIZE = .CS8; // set character size to 8 bits
From the original booklet
This step probably won’t have any observable effect for you, because these flags are either already turned off, or they don’t really apply to modern terminal emulators. But at one time or another, switching them off was considered (by someone) to be part of enabling “raw mode”, so we carry on the tradition (of whoever that someone was) in our program.
As far as I can tell:
- When BRKINT is turned on, a break condition will cause a SIGINT signal to be sent to the program, like pressing Ctrl-C.
- INPCK enables parity checking, which doesn’t seem to apply to modern terminal emulators.
- ISTRIP causes the 8th bit of each input byte to be stripped, meaning it will set it to 0. This is probably already turned off.
- CS8 is not a flag, it is a bit mask with multiple bits, which we set using the bitwise-OR (|) operator unlike all the flags we are turning off. It sets the character size (CS) to 8 bits per byte. On my system, it’s already set that way.
A timeout for read()
Finally, we want to set a timeout for read(), so that our editor will be able to discern an Esc keypress from an escape sequence. In fact, the terminal escape sequences that encode many keys all begin with an Esc (that's why they are called escape sequences), and we want to be able to handle them accordingly.
Here we use some constants that are defined in std.os.linux. Since they are in an enum, we'll have to use the builtin function @intFromEnum() so that we can use them for array indexing (which expects a usize type).
// Set read timeouts
termios.cc[@intFromEnum(linux.V.MIN)] = 0; // Return immediately when any bytes are available
termios.cc[@intFromEnum(linux.V.TIME)] = 1; // Wait up to 0.1 seconds for input
This took me hours to figure out. The original kilo editor uses constants that
come from the libc termios.h
header, but initially I simply used the values
from the C version, thinking they would apply also for the Zig version. They
didn't work, that is, there was no read timeout. I initially asked the AI, and
it didn't help. I then looked for other Zig implementations of this same editor
on the internet, but all of them repeated this mistake, until I found one
implementation that did the right thing, that is, to use the constants that are
provided by the Zig standard library (what is being done in the snippet of code
above).
The lesson was: don't try to reinvent a system-defined constant, use the system-defined constant, even if it means that you must look for it in the standard library.
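To make the indexing pattern itself clearer, here is a tiny sketch with an invented enum. Slot, its values and setTimeout() are hypothetical; the real names and values are the ones from std.os.linux, and they are exactly what you should not hardcode:

```zig
const std = @import("std");

/// Hypothetical enum standing in for linux.V; the values are invented.
const Slot = enum(u8) {
    TIME = 1,
    MIN = 2,
};

/// Set the two timeout slots of a small, made-up control character array.
fn setTimeout(cc: *[4]u8) void {
    // @intFromEnum() turns the enum tag into the integer used as array index
    cc[@intFromEnum(Slot.MIN)] = 0;
    cc[@intFromEnum(Slot.TIME)] = 1;
}

test "index an array with enum values" {
    var cc = [_]u8{9} ** 4;
    setTimeout(&cc);
    try std.testing.expectEqualSlices(u8, &[_]u8{ 9, 1, 0, 9 }, &cc);
}
```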
We're done, we can apply the new terminal configuration and return the original one:
// update config
try posix.tcsetattr(STDIN_FILENO, .FLUSH, termios);
return orig_termios;
Back to main.zig
We left our main function in this state:
pub fn main() !void {
    var da = std.heap.DebugAllocator(.{}){};
    defer _ = da.deinit();

    const allocator = switch (builtin.mode) {
        .Debug => da.allocator(),
        else => std.heap.smp_allocator,
    };
    _ = allocator;

    var buf: [1]u8 = undefined;
    while (try readChar(&buf) == 1 and buf[0] != 'q') {}
}
Now we want to enable raw mode, right? And it's the first thing that our main function will do. Add these lines at the top of it:
pub fn main() !void {
    orig_termios = try linux.enableRawMode();
    defer linux.disableRawMode(orig_termios);
The defer statement is important because we want to restore the original configuration when the program exits. We also want to update the bottom section with our new variables. Add this at the bottom of the file:
const linux = @import("linux.zig");
var orig_termios: std.os.linux.termios = undefined;
When variables and constants are placed at the root level of a file, that is, outside any functions, they behave like static identifiers in C, only visible to the code of the current file, unless they have the pub qualifier, meaning they can be accessed from files that import the current one.
Moreover, if the module is meant to be instantiated (it has fields defined at the root level), these variables and constants are, again, static, not part of the instances: all instances will share the same value, which is quite obvious for constants, less so for variables.
Why is it important to define orig_termios at the root level? Because we want to handle another case: our program crashes, and we don't want to leave the terminal in an unusable state if that happens. We'll have to update our crash handler as well:
/// Our panic handler disables terminal raw mode and calls the default panic
/// handler.
fn crashed(msg: []const u8, trace: ?usize) noreturn {
linux.disableRawMode(orig_termios);
As you can see, this function also needs to access the original terminal configuration, and since there's no way to pass it as an argument, it must read it from a variable.
Now, if you try to build and run the project, something strange happens: the program terminates immediately.
Can you guess why?
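Because of the timeout to read() in enableRawMode(). With the timeout in place, read() gives up after a tenth of a second and returns 0 if no key was pressed, so the condition readChar(&buf) == 1 is false and the loop ends right away. If you comment out the two lines where the timeout is set, you can recompile, run, and see that the program keeps reading characters until you press q, and only then it terminates.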
Getting the window size
Before we proceed, delete the last two lines in the main function (the ones that read from the input) and the readChar() function as well, we won't need them anymore.
We went past raw mode, which was possibly annoying. Unfortunately we must take care of the low level code before we can proceed to code the actual editor. And there's still a good bit to come.
Before we can draw anything on the screen, we must know its size, the number of rows and columns.
There are two ways to do this. The second method is attempted only if the first one fails.
The first method involves calling the Linux ioctl function to request the window size from the operating system.
The fallback method involves moving the cursor to the bottom-right corner of the window and asking the terminal for its position.
The ioctl method
We'll first create two new modules:
types.zig | hub for all the custom types of our editor |
ansi.zig | handles ansi escape sequences |
In src/types.zig we'll write this:
//! Collection of types used by the editor.
///////////////////////////////////////////////////////////////////////////////
//
// Editor types
//
///////////////////////////////////////////////////////////////////////////////
/// Dimensions of the terminal screen where the editor runs.
pub const Screen = struct {
    rows: usize = 0,
    cols: usize = 0,
};
In src/ansi.zig we'll write this:
//! Module that handles ansi terminal sequences.
///////////////////////////////////////////////////////////////////////////////
//
// Functions
//
///////////////////////////////////////////////////////////////////////////////
/// Get the window size.
pub fn getWindowSize() !t.Screen {
    // code to come...
}
///////////////////////////////////////////////////////////////////////////////
//
// Constants, variables
//
///////////////////////////////////////////////////////////////////////////////
const std = @import("std");
const linux = @import("linux.zig");
const t = @import("types.zig");
Now let's fill in the getWindowSize() function:
    var screen: t.Screen = undefined;
    var wsz: std.posix.winsize = undefined;

    if (linux.winsize(&wsz) == -1 or wsz.col == 0) {
        // fallback method will be here
    } else {
        screen = t.Screen{
            .rows = wsz.row,
            .cols = wsz.col,
        };
    }
    return screen;
Much like in the original C code, we use ioctl() to request the window size of the terminal, and this will be stored in the wsz struct which we pass by reference.
The ioctl() function returns -1 on failure, but we also consider a column value of 0 in the passed wsz struct to be a failure.
Note that in the second part of the condition (wsz.col == 0) wsz would already have a value because it's assumed that the ioctl() call was successful, since it didn't return -1.
The winsize() function
We'll also have to update our src/linux.zig module to add the winsize() function that is called in getWindowSize():
///////////////////////////////////////////////////////////////////////////////
//
// Functions
//
///////////////////////////////////////////////////////////////////////////////
/// Read the window size into the `wsz` struct.
pub fn winsize(wsz: *posix.winsize) usize {
    return linux.ioctl(STDOUT_FILENO, linux.T.IOCGWINSZ, @intFromPtr(wsz));
}
To know why std.os.linux.ioctl is invoked like that, we should look for it in the Zig standard library:
pub fn ioctl(fd: fd_t, request: u32, arg: usize) usize {
    return syscall3(.ioctl, @as(usize, @bitCast(@as(isize, fd))), request, arg);
}
The function doesn't have any documentation, so we just invoke it like we invoked the one in the original written in C, where the call was:
if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &wsz) == -1 || wsz.ws_col == 0)
The TIOCGWINSZ is replaced by the linux.T.IOCGWINSZ constant, found in the std.os.linux module of the Zig standard library.
The other difference is the third argument, which is a usize in Zig, so we must cast the pointer to an integer:
@intFromPtr(wsz)
I put this function in the linux module because I preferred to keep all the low level interactions with the operating system in it.
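One caveat, hedged because I may be missing something: std.os.linux.ioctl looks like a thin wrapper around the raw system call, and as far as I know a raw Linux system call reports failure as a negated errno value packed into the returned usize, not as -1 like the C library function does. If that's right, a comparison with -1 can never be true, and a stricter check would decode the return value. A minimal, Linux-only sketch, assuming std.posix.errno() is available for this; querySize() and its error names are invented for the example:

```zig
const std = @import("std");
const linux = std.os.linux;
const posix = std.posix;

/// Hypothetical variant of the size query that decodes the raw syscall result.
fn querySize(fd: posix.fd_t) !posix.winsize {
    var wsz: posix.winsize = undefined;
    const rc = linux.ioctl(fd, linux.T.IOCGWINSZ, @intFromPtr(&wsz));
    // turn the raw return value into an errno before looking at `wsz`
    if (posix.errno(rc) != .SUCCESS) return error.IoctlFailed;
    if (wsz.col == 0) return error.NoColumns;
    return wsz;
}

test "an invalid file descriptor is reported as an error" {
    try std.testing.expectError(error.IoctlFailed, querySize(-1));
}
```

The editor code in this guide keeps the simpler check, so take this only as a possible refinement.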
The cursor position method
In case of failure, we'll have to resort to a second method.
We replace the commented line in getWindowSize() with a call to the fallback function:
    if (linux.winsize(&wsz) == -1 or wsz.col == 0) {
        screen = try getCursorPosition();
Our getCursorPosition() function also goes just below getWindowSize():
/// Get the cursor position, to determine the window size.
pub fn getCursorPosition() !t.Screen {
    // code to come...
}
What should we do in there? The idea is to push the cursor as far right and as far down as it can go, so that it ends up in the bottom-right corner of the window, and read the current row and column from there.
For both things, we need to issue escape sequences to the terminal.
- to move the cursor to the corner, we'll issue two sequences in a row, one that moves it right by a very large number of columns and one that moves it down by a very large number of rows.
- to read the cursor position, we'll issue a sequence, and read the response of the terminal in a []u8 buffer
ANSI escape sequences
We'll define the following constants in ansi.zig:
/// Control Sequence Introducer: ESC key, followed by '[' character
pub const CSI = "\x1b[";

/// The ESC character
pub const ESC = '\x1b';

// Moves the cursor 999 columns to the right and 999 rows down. The terminal
// stops at the edges, so the cursor ends up in the bottom-right corner.
pub const WinMaximize = CSI ++ "999C" ++ CSI ++ "999B";

// Reports the cursor position (CPR) by transmitting ESC[n;mR, where n is the
// row and m is the column
pub const ReadCursorPos = CSI ++ "6n";
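The ++ operator joins the string literals at compile time, so each constant is a plain sequence of bytes. A quick sketch to check what ReadCursorPos actually contains (the constants are repeated here only to keep the example self-contained):

```zig
const std = @import("std");

const CSI = "\x1b[";
const ReadCursorPos = CSI ++ "6n";

test "sequences are joined at compile time" {
    // ESC, '[', '6', 'n': four bytes in total
    try std.testing.expectEqual(@as(usize, 4), ReadCursorPos.len);
    try std.testing.expectEqualStrings("\x1b[6n", ReadCursorPos);
    try std.testing.expectEqual(@as(u8, 0x1b), ReadCursorPos[0]);
}
```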
linux.write()
How exactly do we send these sequences? We're back in linux.zig.
// Write bytes to stdout, return error if the requested amount of bytes
// couldn't be written.
pub fn write(buf: []const u8) !void {
    if (try posix.write(STDOUT_FILENO, buf) != buf.len) {
        return error.WriteIncomplete;
    }
}
WriteIncomplete in this case is an error I just made up. It's probably not a very good way to handle incomplete writes, in the sense that we should probably retry. In my defense, I can say that the original C editor did this:
if (write(STDOUT_FILENO, "\x1b[6n", 4) != 4) return -1;
which means that it gave up all the same. Hey... I think we're trying hard enough already. At least for our humble editor, that is.
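For the curious, this is roughly what retrying could look like. It's only a sketch, not what the editor uses: writeAll() is a hypothetical helper, and it takes the file descriptor as a parameter so that it can be tried on something other than stdout.

```zig
const std = @import("std");
const posix = std.posix;

/// Hypothetical helper: keep calling posix.write() until all of `buf` has
/// been written, instead of giving up after the first short write.
pub fn writeAll(fd: posix.fd_t, buf: []const u8) !void {
    var written: usize = 0;
    while (written < buf.len) {
        const n = try posix.write(fd, buf[written..]);
        // a write of zero bytes would make us loop forever, so bail out
        if (n == 0) return error.WriteIncomplete;
        written += n;
    }
}
```

A call like try writeAll(posix.STDOUT_FILENO, "\x1b[6n") would then take the place of the single write.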
Back to getCursorPosition()
Now it's hopefully clear what we'll do:
- issue sequences to move the cursor to the bottom-right corner and to report the cursor position
- read the response in a []u8 buffer
- parse the result, to extract the screen size
var buf: [32]u8 = undefined;
try linux.write(WinMaximize ++ ReadCursorPos);
var nread = try linux.readChars(&buf);
What's that readChars() over there?
This is actually the function that we'll use to read all input from stdin, so it's worth taking care of it right now. It's not too different from the readChar() function we wrote in main.zig and that we carelessly deleted when we didn't need it anymore.
linux.readChars()
/// Keep reading from stdin until we get a valid character, ignoring
/// .WouldBlock errors.
pub fn readChars(buf: []u8) !usize {
    while (true) {
        const n = posix.read(STDIN_FILENO, buf) catch |err| switch (err) {
            error.WouldBlock => continue,
            else => return err,
        };
        if (n >= 1) return n;
    }
}
Let's compare it with the previous readChar() function which was:
// Read from stdin into `buf`, return the number of read characters.
fn readChar(buf: []u8) !usize {
    return try posix.read(STDIN_FILENO, buf);
}
The main difference is that now we are in raw mode, and there is a read() timeout in place, so we must handle the case where the timeout kicks in before anything was typed. On Linux a timed-out read() simply returns 0, while some environments report it as an error instead, which Zig exposes as error.WouldBlock. We ignore both, that is, we keep reading until we read something, or a different error is returned by posix.read().
If posix.read() finally returns a positive number because it read something, we return it. If it didn't read anything, it's probably because we didn't type anything, and the loop continues.
Back to getCursorPosition()
So now we got the response from the terminal, and we read it inside our []u8 buffer.
var nread = try linux.readChars(&buf);
if (nread < 5) return error.CursorError;
For a response to be valid, it should follow this format:
ESC [ rows ; cols R
for example, \x1b[50;120R. This sequence has a minimum of 5 characters, plus the final R. I think on some occasions I couldn't read the R character immediately, but maybe I've been doing something wrong? Anyway this is what we do:
    // we should ignore the final R character
    if (buf[nread - 1] == 'R') {
        nread -= 1;
    }
    // not there yet? we will ignore it, but it should be there
    else if (try linux.readChars(buf[nread..]) != 1 or buf[nread] != 'R') {
        return error.CursorError;
    }
That is, if the R character is not yet in our buffer, we read once more, expecting exactly that character. Since we don't want to overwrite our previous response, we pass a slice that starts at nread, which is the number of characters that have been read until now. When R is finally read, buf[nread] should hold it.
If the first two characters aren't ESC [, we error out:
if (buf[0] != ESC or buf[1] != '[') return error.CursorError;
Finally we must parse the number of rows and columns. The original C code used sscanf() for this purpose, but we won't use libc in this project. We parse it by hand.
var screen = t.Screen{};
var semicolon: bool = false;
var digits: u8 = 0;
// no sscanf, format to read is "row;col"
// read it right to left, so we can read number of digits
// stop before the CSI, so at index 2
var i = nread;
while (i > 2) {
i -= 1;
if (buf[i] == ';') {
semicolon = true;
digits = 0;
}
else if (semicolon) {
screen.rows += (buf[i] - '0') * try std.math.powi(usize, 10, digits);
digits += 1;
} else {
screen.cols += (buf[i] - '0') * try std.math.powi(usize, 10, digits);
digits += 1;
}
}
if (screen.cols == 0 or screen.rows == 0) {
return error.CursorError;
}
return screen;
If you did programming exercises before, this method of parsing integers should be familiar. The Zig standard library has a function for this purpose (std.fmt.parseInt), but in this case it wouldn't have spared us much trouble. There's a semicolon between the numbers, and we would have needed to track the start and end position of both numbers.
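For comparison, this is roughly what the std.fmt.parseInt route could look like. It's only a sketch, not the code used by the editor: Size and parseSize() are hypothetical names, and the function expects just the "rows;cols" part of the response, that is, without the leading ESC [ and the final R.

```zig
const std = @import("std");

/// Hypothetical result type, standing in for the Screen type.
const Size = struct { rows: usize, cols: usize };

/// Parse "rows;cols": find the semicolon, then parse the text on each side.
fn parseSize(body: []const u8) !Size {
    const sep = std.mem.indexOfScalar(u8, body, ';') orelse return error.CursorError;
    return .{
        .rows = try std.fmt.parseInt(usize, body[0..sep], 10),
        .cols = try std.fmt.parseInt(usize, body[sep + 1 ..], 10),
    };
}

test "parse a cursor position report" {
    const size = try parseSize("50;120");
    try std.testing.expectEqual(@as(usize, 50), size.rows);
    try std.testing.expectEqual(@as(usize, 120), size.cols);
}
```

One difference from the hand-written loop is that parseInt rejects anything that is not a digit, while the loop trusts the terminal to send well-formed numbers.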
First test
This test is special: because it needs an interactive terminal, it will not be executed with:
zig build test
but with:
zig test src/term_tests.zig
It will be the only test of this kind; unfortunately it's also the first one.
We want to test if our functions work. Specifically, we'll test getWindowSize() and getCursorPosition(), which also exercise raw mode and readChars() along the way.
We'll add a couple of constants at the bottom of ansi.zig:
const builtin = @import("builtin");
// CSI sequence to clear the screen.
pub const ClearScreen = CSI ++ "2J" ++ CSI ++ "H";
We'll create a new file named src/term_tests.zig, with this content:
//! Additional tests that need an interactive terminal, not testable with:
//!
//! zig build test
//!
//! Must be tested with:
//!
//! zig test src/term_tests.zig
test "getWindowSize" {
const orig_termios = try linux.enableRawMode();
defer linux.disableRawMode(orig_termios);
const s1 = try ansi.getWindowSize();
try std.testing.expect(s1.rows > 0 and s1.cols > 0);
const s2 = try ansi.getCursorPosition();
try linux.write(ansi.ClearScreen);
try std.testing.expect(s1.rows == s2.rows and s1.cols == s2.cols);
}
const std = @import("std");
const linux = @import("linux.zig");
const ansi = @import("ansi.zig");
We clear the screen after calling getCursorPosition(), because that function call has the side effect of maximizing the terminal screen, which messes up the output of the test result.
To ensure that our getWindowSize() works and doesn't fall back, we must add a check in that function:
if (linux.winsize(&wsz) == -1 or wsz.col == 0) {
if (builtin.is_test) return error.getWindowSizeFailed;
This will cause the function to error out if the ioctl method fails. We will then get the window size with the fallback method, and ensure the resulting sizes are the same.
Digression: the comptime keyword
You probably know of comptime in Zig. Here we have an application of the concept: since the builtin.is_test variable is evaluated at compile time, the whole branch in getWindowSize() can be resolved at compile time; in a normal build the corresponding code is removed and is never executed at runtime.
This has the same effect as an #ifdef block in C for conditional compilation, but the syntax looks much less intrusive. You can even force any expression to be evaluated at compile time by using the comptime keyword before the expression, but here it's not needed, because the builtin.is_test variable is guaranteed to be compile-time known.
When using the comptime keyword, sometimes the compiler complains that the keyword is redundant, because the expression is always compile-time known; other times it doesn't complain, as in the case above, even if I'm pretty sure that all builtin variables are compile-time known. We saw another example in the main() function, where the allocator was chosen by testing the builtin.mode variable.
To my understanding, also from reading several posts made by the original creator of Zig (Andrew Kelley), most of the time it's not necessary to use the keyword, the compiler is smart enough to evaluate at compile time what it can, even if you don't specify it expressly. But sometimes the compiler says:
error: unable to resolve comptime value
In these cases the comptime keyword might fix the issue.
Bottom line: don't be compulsive in filling your code with comptime, it's not necessary.
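Both cases can be seen in a small file to run with zig test. It's only an illustration: buildKind() and square() are hypothetical functions that don't belong to the editor.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// The condition is compile-time known, so only one of the two strings
/// needs to end up in the compiled code. No comptime keyword is required.
fn buildKind() []const u8 {
    return if (builtin.is_test) "test" else "normal";
}

fn square(x: u32) u32 {
    return x * x;
}

test "builtin.is_test is true under zig test" {
    try std.testing.expectEqualStrings("test", buildKind());
}

test "forcing compile-time evaluation" {
    // square(3) is an ordinary runtime call unless we ask otherwise;
    // here the compiler evaluates it and only the result is kept
    const nine = comptime square(3);
    try std.testing.expectEqual(@as(u32, 9), nine);
}
```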
The editor types
Now it's time for the first steps towards the creation of our editor.
The original C code of kilo is single-file, with a global variable E that holds the Editor struct, and all functionalities are implemented there.
Initially I wrote this program pretty much in the same way, and it worked, as
a demonstration that you can write code in Zig that uses global variables, just
as in old-fashioned C programs.
In Zig you can also (and probably should) use instantiable types, whose functions can be called in an OOP fashion: when the first parameter is of the type itself, passed either by reference or by value, it is omitted from the call and the instance before the dot takes its place. You should know this already, so I won't elaborate.
It may be useful to remember that a Zig module is essentially a struct, that is, you can think of the content of a file as wrapped in
struct {
// the file content
}
which means that we can define at the root level of a file the members of our type, then treat the whole file as an instantiable type. That's what we'll do with the main types of our editor, which will be:
Editor | for the editor functionalities |
Buffer | for the file contents |
Row | each row of the buffer |
View | tracks cursor position and offsets of the editing window |
To keep the code simple, we'll code most functionalities in the Editor type, while the others will be lightweight structs that never modify the state of the editor.
The types module holds all our types
Even though each of these types will have its own importable module, all other modules will access them through the src/types.zig module, which serves as a centralized hub for all our types.
We can do this because the program is small, but it probably wouldn't be a wise thing to do in a large program. Still, the Zig standard library also often makes types defined in submodules accessible from the root module. An example is std.ArrayList.
We'll do it right away: open the types module and add this below the Screen definition:
pub const Editor = @import("Editor.zig");
pub const Buffer = @import("Buffer.zig");
pub const Row = @import("Row.zig");
pub const View = @import("View.zig");
The files don't exist yet, but we'll create them soon. We'll build up the types little by little, adding more stuff only when we need it.
We also create a section for other miscellaneous types:
///////////////////////////////////////////////////////////////////////////////
//
// Other types
//
///////////////////////////////////////////////////////////////////////////////
/// A dynamic string.
pub const Chars = std.ArrayList(u8);
And the usual Constants section:
///////////////////////////////////////////////////////////////////////////////
//
// Constants, variables
//
///////////////////////////////////////////////////////////////////////////////
const std = @import("std");
The Editor type
Create src/Editor.zig and let's start adding the struct members. This is an instantiable module, that's why the filename starts with a capital letter. It's not enforced, but it's the Zig convention for types to be capitalized.
At the top, we add comments that start with //! (two slashes followed by an exclamation mark): it's the module description. Such special comments may be used for documentation generation.
//! Type that manages most of the editor functionalities.
//! It draws the main window, the statusline and the message area, and controls
//! the event loop.
At the bottom, we put our usual section with constants:
///////////////////////////////////////////////////////////////////////////////
//
// Constants, variables
//
///////////////////////////////////////////////////////////////////////////////
const Editor = @This();
const std = @import("std");
const t = @import("types.zig");
@This() is a builtin function that returns the type of the struct. It is capitalized like all functions that return types. This constant means that in this file Editor refers to the same type we're defining. Others prefer to name such constants Self. I prefer more descriptive names.
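The same mechanism works for a struct declared inline, which makes it easier to try out in a single file. Counter is a hypothetical type, used here only to show @This() and the two equivalent ways of calling a method.

```zig
const std = @import("std");

/// Hypothetical type, only for illustration.
const Counter = struct {
    const Self = @This();

    count: usize,

    /// Not a method: it has no parameter of type Self.
    fn init() Self {
        return .{ .count = 0 };
    }

    /// A method: the first parameter is a pointer to the instance.
    fn increment(self: *Self) void {
        self.count += 1;
    }
};

test "method call syntax is sugar for a plain function call" {
    var c = Counter.init();
    c.increment(); // the instance is passed implicitly
    Counter.increment(&c); // same call, written explicitly
    try std.testing.expectEqual(@as(usize, 2), c.count);
}
```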
Fields
Back at the top, below the module description, we start adding the type members:
/// Allocator used by the editor instance
alc: std.mem.Allocator,
We'll use a single allocator for now; the Editor will pass its own to the types that require it. We call the field simply alc, because it will be passed as an argument so often that I prefer to keep the name short.
/// The size of the terminal window where the editor runs
screen: t.Screen,
/// Text buffer the user is currently editing
buffer: t.Buffer,
/// Tracks cursor position and part of the buffer that fits the screen
view: t.View,
/// Becomes true when the main loop should stop, causing the editor to quit
should_quit: bool,
We didn't create the Buffer or the View type yet.
should_quit is the variable that we'll use to control the main event loop. When this variable becomes true, the loop is interrupted and the program quits.
Initialization
Now we'll create functions to initialize/deinitialize the editor:
///////////////////////////////////////////////////////////////////////////////
//
// Init/deinit
//
///////////////////////////////////////////////////////////////////////////////
/// Return the initialized editor instance.
pub fn init(allocator: std.mem.Allocator, screen: t.Screen) !Editor {
return .{
.alc = allocator,
.screen = .{
.rows = screen.rows - 2, // make room for statusline/message area
.cols = screen.cols,
},
.buffer = try t.Buffer.init(allocator),
.view = .{},
.should_quit = false,
};
}
This is a simple init() function that returns a new instance of the Editor. It's not a method because its first argument is not of type Editor.
Since it returns an error union, it is invoked with try:
var editor = try Editor.init(allocator, screen);
The deinit() function, on the other hand, is a proper method, because it is used to deinitialize an instance.
/// Deinitialize the editor.
pub fn deinit(e: *Editor) void {
e.buffer.deinit();
}
Accordingly, it is invoked like this:
editor.deinit();
Everything that has used an allocator should be deinitialized here. If you forget to deinitialize/deallocate something while still using the DebugAllocator, you'll be told when exiting the program that it has leaked memory, along with the corresponding stack trace.
We'll also add a method called startUp(). This function will handle the event loop, and is also called from main().
/// Start up the editor: open the path in args if valid, start the event loop.
pub fn startUp(e: *Editor, path: ?[]const u8) !void {
if (path) |name| {
_ = name;
// we open the file
}
else {
// we generate the welcome message
}
while (e.should_quit == false) {
// refresh the screen
// process keypresses
}
}
It's only a stub, but you can see what it should do.
Before continuing the Editor type, we must define the other ones.
The Buffer type
Description at the top:
//! A Buffer holds the representation of a file, divided in rows.
//! If modified, it is marked as dirty until saved.
Let's add the constants: as usual, they'll stay at the bottom. Here too we set a constant to @This(), so that we can refer to our type inside its own definition.
///////////////////////////////////////////////////////////////////////////////
//
// Constants, variables
//
///////////////////////////////////////////////////////////////////////////////
const Buffer = @This();
const std = @import("std");
const t = @import("types.zig");
/// Initial allocation size for Buffer.rows
const initial_rows_capacity = 40;
Fields
Also in this case, as you can see, no default initializers. Some members are optional, meaning that they can be null, and null will be their initial value when the Buffer is initialized.
alc: std.mem.Allocator,
// Modified state
dirty: bool,
// Buffer rows
rows: std.ArrayList(t.Row),
// Path of the file
filename: ?[]u8,
// Name of the syntax
syntax: ?[]const u8,
Initialization
All in all, this type is quite simple. It doesn't handle single row initialization, because rows are created and inserted by the Editor, but it will deinitialize them. Possibly I'm making a questionable choice here; maybe I should let the Buffer initialize the single rows, since it's here that they're finally freed. Especially if we intend to give a Buffer its own different allocator (an arena allocator would probably fit it best). But it's a small detail, since the Editor can access the Buffer allocator just fine: there are no private fields in Zig.
///////////////////////////////////////////////////////////////////////////////
//
// Init/deinit
//
///////////////////////////////////////////////////////////////////////////////
pub fn init(allocator: std.mem.Allocator) !Buffer {
return Buffer{
.alc = allocator,
.rows = try .initCapacity(allocator, initial_rows_capacity),
.dirty = false,
.filename = null,
.syntax = null,
};
}
pub fn deinit(buf: *Buffer) void {
t.freeOptional(buf.alc, buf.filename);
t.freeOptional(buf.alc, buf.syntax);
for (buf.rows.items) |*row| {
row.deinit(buf.alc);
}
buf.rows.deinit(buf.alc);
}
There is one new function, freeOptional(), which we didn't define yet. It's a simple helper, but it doesn't hurt to have some helper functions. I put it in the types module, right above the bottom section:
///////////////////////////////////////////////////////////////////////////////
//
// Functions
//
///////////////////////////////////////////////////////////////////////////////
/// Free an optional slice if not null.
pub fn freeOptional(allocator: std.mem.Allocator, sl: anytype) void {
if (sl) |slice| {
allocator.free(slice);
}
}
I put this function here only because the types module is accessed by most other modules, so it's easily accessible. But since it doesn't return a type, I think it's slightly misplaced.
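A quick way to check the helper is a test that uses std.testing.allocator, which makes the test fail if some memory is leaked. The sketch repeats the function so that it can be run on its own with zig test; the file name in it is made up.

```zig
const std = @import("std");

/// Free an optional slice if not null (same helper as above).
fn freeOptional(allocator: std.mem.Allocator, sl: anytype) void {
    if (sl) |slice| {
        allocator.free(slice);
    }
}

test "freeOptional handles both null and allocated slices" {
    const allocator = std.testing.allocator;

    // nothing to free, nothing happens
    const none: ?[]u8 = null;
    freeOptional(allocator, none);

    // an allocated slice is freed, otherwise the test would report a leak
    const some: ?[]u8 = try allocator.dupe(u8, "kilo.zig");
    freeOptional(allocator, some);
}
```

Thanks to anytype, the same function accepts both ?[]u8 and ?[]const u8, which is what the filename and syntax fields of the Buffer need.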
The Row type
Description at the top:
//! A Row contains 3 arrays, one for the actual characters, one for how it is
//! rendered on the screen, and one with the highlight of each element of the
//! rendered array.
Constants:
///////////////////////////////////////////////////////////////////////////////
//
// Constants, variables
//
///////////////////////////////////////////////////////////////////////////////
const Row = @This();
const std = @import("std");
const t = @import("types.zig");
const initial_row_size = 80;
Also for this type, we keep it simple: no operations are performed by it. We will add more things to this type as soon as we need them, this is only a partial implementation.
/// The ArrayList with the actual row characters
chars: t.Chars,
/// Array with the visual representation of the row
render: []u8,
///////////////////////////////////////////////////////////////////////////////
//
// Init/deinit
//
///////////////////////////////////////////////////////////////////////////////
pub fn init(allocator: std.mem.Allocator) !Row {
return Row{
.chars = try .initCapacity(allocator, initial_row_size),
.render = &.{},
};
}
pub fn deinit(row: *Row, allocator: std.mem.Allocator) void {
row.chars.deinit(allocator);
allocator.free(row.render);
}
Some explanations:
- our chars field is a dynamic string, it contains the actual characters of the row, it expands or shrinks as characters are typed/deleted. We set an initial capacity, to reduce the need for later allocations.
- the render field is a simple array of u8. This is probably not optimal, but we'll see later if we can improve the implementation. The point is that this array doesn't need to grow dynamically, when it is updated its new size can be precalculated, so at most it would need a single reallocation, which may result in no new allocation at all. For now we keep it simple.
- as usual, the init() function returns a new instance, the deinit() method frees the memory.
We also add some methods that will help us keep the code concise:
///////////////////////////////////////////////////////////////////////////////
//
// Methods
//
///////////////////////////////////////////////////////////////////////////////
/// Length of the real row.
pub fn clen(row: *Row) usize {
return row.chars.items.len;
}
/// Length of the rendered row.
pub fn rlen(row: *Row) usize {
return row.render.len;
}
In this line
.render = &.{},
you might wonder why that notation: &.{}. It's a zero-length slice. The official documentation says:
A zero-length initialization can always be used to create an empty slice,
even if the slice is mutable. This is because the pointed-to data is zero
bits long, so its immutability is irrelevant.
It's different from initializing a slice to undefined, because here the slice has a known length, which is 0. So you can check its length and loop over it safely; just don't access any index, since it's empty.
const empty: []u8 = &.{};
for (empty) |c| { _ = c; } // ok, the body never runs
I think it's always preferable to initialize a slice this way, rather than with undefined.
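The properties of an empty slice can be checked with a short test. It's a sketch written for this purpose: the variable is named render only to recall the field of Row. As far as I can tell, freeing a zero-length slice does nothing, which would explain why Row.deinit() can free render without checking whether it was ever allocated.

```zig
const std = @import("std");

test "an empty slice can be measured, looped over and freed" {
    const allocator = std.testing.allocator;

    var render: []u8 = &.{};
    try std.testing.expectEqual(@as(usize, 0), render.len);

    // the loop body never runs
    var iterations: usize = 0;
    for (render) |_| iterations += 1;
    try std.testing.expectEqual(@as(usize, 0), iterations);

    // there is nothing to free in a zero-length slice
    allocator.free(render);

    // later the same variable can hold real memory
    render = try allocator.alloc(u8, 4);
    defer allocator.free(render);
    try std.testing.expectEqual(@as(usize, 4), render.len);
}
```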
The View type
This type only contains fields. This is the full type, nothing more will be added. It tracks the cursor position and the portion of the buffer that is shown in the main window.
//! A View of the current buffer is what we can see of it, and where the
//! cursor lies in it. It's basically the editor window where the file is
//! shown.
/// cursor column
cx: usize = 0,
/// cursor line
cy: usize = 0,
/// column in the rendered row
rx: usize = 0,
/// wanted column when moving vertically across shorter lines
cwant: usize = 0,
/// the top visible line, increases as we scroll down
rowoff: usize = 0,
/// the leftmost visible column
coloff: usize = 0,
Digression: default initializers
When defining the Screen type, I wrote that Zig supports default initializers for structs, with a catch. The catch is that they may be the source of illegal behaviors, as stated by the official language reference.
In the example from the language reference, it's not too clear at first sight what the problem with that struct is, so it's worth pointing it out.
Given this struct:
const std = @import("std");
const assert = std.debug.assert;
const Threshold = struct {
    minimum: f32 = 0.25,
    maximum: f32 = 0.75,
    fn validate(t: Threshold) void {
        assert(t.maximum >= t.minimum);
    }
};
If we create a variable like this:
var a = Threshold{ .maximum = 0.2 };
we created a variable where the maximum is smaller than the minimum, and the validate() function would panic at runtime. So if in your code you rely on the assumption that maximum is never smaller than minimum, you could fall into some illegal behavior.
For this reason in this program I avoid default initializers for complex types that have methods, which may access those values. I only use them for simple types without methods, because it's hard to give up the convenience of being able to write:
var a = SomeType{};
For more complex types I use an init() function that returns the instance, as it's customary to have such functions, and set default values there.
undefined as default value
undefined is generally used for local variables whose lifetime is limited and obvious, as obvious is the place where they acquire a meaningful value. The compiler will not warn you if you use a variable set to undefined. Instead, it will complain if you don't initialize a member. Therefore you should have a really good reason to set an undefined default value inside structs.
Anyway, why use undefined at all? For example, sometimes you need a variable declared beforehand in an outer scope. In this case, setting it to a value that is meant to be overwritten would cause confusion: why am I setting it to that value? The intent of undefined is clear instead: this variable must acquire a meaningful value later on.
Sze from the Ziggit forum says:
You are telling the compiler that you want the value to be undefined. And there aren’t enough safety checks yet so that all ways to use such an undefined value would be caught reliably. So for now you have to be careful and it is better to only use undefined, when you are making sure that you are setting it to a valid value before you actually use it. In cases where some field sometimes needs to be set to undefined, it is better to avoid using a field default value for that and instead pass undefined for that field value explicitly during initialization/setup.
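Two typical uses of undefined, as a sketch: a variable declared before the branches that assign it, and a buffer that is filled before being read. The label() function is hypothetical and only serves the example.

```zig
const std = @import("std");

/// Hypothetical function: `text` is declared in the outer scope and gets
/// its value in exactly one of the two branches.
fn label(is_dirty: bool) []const u8 {
    var text: []const u8 = undefined;
    if (is_dirty) {
        text = "modified";
    } else {
        text = "saved";
    }
    return text;
}

test "undefined values are overwritten before being used" {
    try std.testing.expectEqualStrings("modified", label(true));

    // the content of the buffer is meaningless until bufPrint writes to it
    var buf: [16]u8 = undefined;
    const written = try std.fmt.bufPrint(&buf, "{d};{d}", .{ 50, 120 });
    try std.testing.expectEqualStrings("50;120", written);
}
```

In both cases the place where the value becomes meaningful is a few lines away from the declaration, which is what makes undefined acceptable here.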
Initialize the editor
Let's open main.zig and initialize the editor. In the main() function, remove this line:
_ = allocator;
and put this in its place:
var e = try t.Editor.init(allocator, try ansi.getWindowSize());
defer e.deinit();
var args = std.process.args();
_ = args.next(); // ignore first arg
try e.startUp(args.next()); // possible file to open
If you remember, the Editor.init() function had this signature:
pub fn init(allocator: std.mem.Allocator, screen: t.Screen) !Editor
which means that, besides an allocator, it wants to know the size of the screen, which is what getWindowSize() fetches.
If the ansi and types modules aren't being imported, add them to the constants.
const t = @import("types.zig");
const ansi = @import("ansi.zig");
Next we process the command line arguments: we skip the first one, since it's the name of our executable, and finally we start up the editor passing the second argument, which could be null.
Keypress processing
Before starting to draw anything, let's handle keypresses, because it's easier, shorter, and as a bonus we'll have a way to quit the editor if we build it (remember that our raw mode disables Ctrl-C).
By the way, did you follow my advice to install ctags? Because from now on we'll move very often from function to function and from file to file, and having to spend half a minute to find something completely kills the fun, believe me.
This is our event loop in Editor.startUp():
while (e.should_quit == false) {
// refresh the screen
// process keypresses
}
We replace the second placeholder comment with a call to a new function:
while (e.should_quit == false) {
// refresh the screen
try e.processKeypress();
}
Let's create the function:
///////////////////////////////////////////////////////////////////////////////
//
// Keys processing
//
///////////////////////////////////////////////////////////////////////////////
/// Process a keypress: will wait indefinitely for readKey, which loops until
/// a key is actually pressed.
fn processKeypress(e: *Editor) !void {
const k = try ansi.readKey();
const static = struct {
var q: u8 = 3;
};
switch (k) {
.ctrl_q => {
if (static.q > 1) {
static.q -= 1;
return;
}
try ansi.clearScreen();
e.should_quit = true;
},
else => {},
}
// reset quit counter for any keypress that isn't Ctrl-Q
static.q = 3;
}
This function calls ansi.readKey() (which we didn't write yet), then handles the keypress. The only keypress that we handle for now is Ctrl-Q, and we want to press it 3 times in a row before quitting.
It needs to import ansi.zig:
const ansi = @import("ansi.zig");
Static variables in Zig
See that static struct? Zig doesn't have the concept of static variables that are local to a function, like in C. But you can achieve the same effect by declaring a constant struct inside the function, and defining variables (not fields!) inside of it. You don't need to call it static, of course, it can have any name.
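Here is the same technique reduced to the bare minimum, in a file that can be run with zig test. nextId() is a hypothetical function: its counter lives in a struct declared inside the function, so it keeps its value between calls but isn't visible from outside.

```zig
const std = @import("std");

/// Hypothetical function: returns a new number at every call.
fn nextId() u32 {
    const static = struct {
        // a container-level variable, not a field: there is only one
        // of it, and it outlives the function call
        var counter: u32 = 0;
    };
    static.counter += 1;
    return static.counter;
}

test "the counter survives between calls" {
    const first = nextId();
    const second = nextId();
    try std.testing.expectEqual(first + 1, second);
}
```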
And that .ctrl_q? It's an enum member, of an enum that we didn't write yet.
The Key enum
Let's start with the enum, it will be placed in the 'other types' section of the types module:
/// ASCII codes of the keys, as they are read from stdin.
pub const Key = enum(u8) {
ctrl_b = 2,
ctrl_c = 3,
ctrl_d = 4,
ctrl_f = 6,
ctrl_g = 7,
ctrl_h = 8,
tab = 9,
ctrl_j = 10,
ctrl_k = 11,
ctrl_l = 12,
enter = 13,
ctrl_q = 17,
ctrl_s = 19,
ctrl_t = 20,
ctrl_u = 21,
ctrl_z = 26,
esc = 27,
backspace = 127,
left = 128,
right = 129,
up = 130,
down = 131,
del = 132,
home = 133,
end = 134,
page_up = 135,
page_down = 136,
_,
};
This is a non-exhaustive enum: it has an underscore as last element.
Generally, enums are a strongly namespaced type. You can't convert an integer into an enum value if the enum doesn't have a member with that value. Non-exhaustive enums are more permissive: they are like a set of all integers of a certain type, some of which have been given a name.
This means that we will be able to cast an integer to an enum value (with @enumFromInt), even if the enum doesn't have a member for that integer.
Why do we want this? Because we aren't going to give a name to all possible keys:
- readKey() will read u8 characters through readChars()
- readKey() will return a Key, so it must be able to turn any u8 character into a Key enum member
But this character may be a letter, a digit, or anything that doesn't have a field in that enum. We want full interoperability with all possible u8 values.
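A reduced version of the enum is enough to see this in action. The Key below has only two named members and describe() is a hypothetical function, added to show how a switch deals with the unnamed values.

```zig
const std = @import("std");

/// Reduced version of the Key enum, for illustration only.
const Key = enum(u8) {
    ctrl_q = 17,
    esc = 27,
    _,
};

/// Hypothetical function that switches on a Key.
fn describe(k: Key) []const u8 {
    return switch (k) {
        .ctrl_q => "quit",
        .esc => "escape",
        // values without a name end up here
        _ => "some other byte",
    };
}

test "any u8 can become a Key" {
    const named: Key = @enumFromInt(17);
    const unnamed: Key = @enumFromInt('a');
    try std.testing.expectEqual(Key.ctrl_q, named);
    try std.testing.expectEqual(@as(u8, 'a'), @intFromEnum(unnamed));
    try std.testing.expectEqualStrings("some other byte", describe(unnamed));
}
```

Without the final underscore in the enum, the second @enumFromInt would be illegal, because 'a' doesn't correspond to any member.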
Reading keys
The ansi module needs a new constant, from the Zig standard library:
const asc = std.ascii;
Let's write ansi.readKey().
/// Read a character from stdin. Wait until at least one character is
/// available.
pub fn readKey() !t.Key {
// code to come...
}
We'll use a [4]u8 buffer to store the keys that will be read. We'll feed this to the same readChars() that we've used before.
// we read a sequence of characters in a buffer
var seq: [4]u8 = undefined;
const nread = try linux.readChars(&seq);
// if the first character is ESC, it could be part of an escape sequence
// in this case, nread will be > 2, that means that more than two
// characters have been read into the buffer, and it's an escape sequence
// for sure, if we can't recognize this sequence we return ESC anyway
If you remember, that function has a loop that ignores .WouldBlock errors, and it's guaranteed to read at least one byte from stdin before returning. If the keypress is a special key which uses CSI escape sequences, there will be more characters. We read up to 4 characters, then we decide what to do with them.
You can verify that the sequences are correct by opening a terminal, pressing Ctrl-V and then the key. For example:
keys | sequence | character-by-character |
---|---|---|
Ctrl-V Left | ^[[D | ESC [ D |
Ctrl-V Del | ^[[3~ | ESC [ 3 ~ |
We use @enumFromInt to cast a character in the sequence to a Key enum member, which might not be defined, but it won't be a problem since our enum is non-exhaustive.
const k: t.Key = @enumFromInt(seq[0]);
Note that this function doesn't guarantee that we interpret all possible escape sequences: if a sequence isn't recognized, ESC is returned.
We also handle the case that more than one character has been read, but it's not an escape sequence (nread > 1). It's possibly a multi-byte character and we don't handle those, so we return ESC.
If instead it's a single character, it is returned as-is.
if (k == .esc and nread > 2) {
if (seq[1] == '[') {
if (nread > 3 and asc.isDigit(seq[2])) {
if (seq[3] == '~') {
switch (seq[2]) {
'1' => return .home,
'3' => return .del,
'4' => return .end,
'5' => return .page_up,
'6' => return .page_down,
'7' => return .home,
'8' => return .end,
else => {},
}
}
}
switch (seq[2]) {
'A' => return .up,
'B' => return .down,
'C' => return .right,
'D' => return .left,
'H' => return .home,
'F' => return .end,
else => {},
}
}
else if (seq[1] == 'O') {
switch (seq[2]) {
'H' => return .home,
'F' => return .end,
else => {},
}
}
return .esc;
}
else if (nread > 1) {
return .esc;
}
return k;
clearScreen()
We also add a clearScreen() function:
/// Clear the screen.
pub fn clearScreen() !void {
try linux.write(ClearScreen);
}
At this point, if we compile and run we should get an empty prompt; if we then press Ctrl-Q three times in a row, the program should clear the screen and quit.
Reading and Writing
Before we can draw anything, we must be able to open a file, read all of its lines and store them in our Buffer.
In main(), the first command line argument after the executable name is passed to the Editor.startUp() function. If it is non-null, the file will be opened, if it exists.
To handle read/write operations, we'll use the Io.Reader and Io.Writer interfaces. They have methods to process incoming/outgoing data and can do buffered reading and writing. They are interfaces, meaning that regardless of what they are attached to, they operate in the same way. So whether you read from stdin or from a file, you'll have access to the same ways of processing data.
They have been only recently added to the Zig standard library and are a vast subject, so I will only mention that they exist, and that we'll be using them for some tasks.
For now we can only read a file, because we don't have the means to fill our Buffer rows yet.
Opening a file
In order, we're going to:
- update our buffer filename, to match the path of the file we're going to open
- try to open the file itself and read its lines
- if that fails, we start editing an empty file with the given name
Let's update our Editor.startUp():
if (path) |name| {
_ = name;
// we open the file
}
The stub above becomes:
if (path) |name| {
try e.openFile(name);
}
Just below startUp(), we inaugurate a new section for file operations, and we add an openFile() function:
///////////////////////////////////////////////////////////////////////////////
//
// File operations
//
///////////////////////////////////////////////////////////////////////////////
/// Open a file with `path`.
fn openFile(e: *Editor, path: []const u8) !void {
// code to come...
}
Naming the buffer
We update the buffer name from the path argument:
var B = &e.buffer;
// store the filename into the buffer
B.filename = try e.updateString(B.filename, path);
To update the filename, we write a helper function (I put the Helpers section at the bottom, above the Constants section):
///////////////////////////////////////////////////////////////////////////////
//
// Helpers
//
///////////////////////////////////////////////////////////////////////////////
/// Update the string, freeing the old one and allocating from `path`.
fn updateString(e: *Editor, old: ?[]u8, path: []const u8) ![]u8 {
t.freeOptional(e.alc, old);
return try e.alc.dupe(u8, path);
}
For now we can't rename a buffer, so the old filename will always be null, which is OK only because we made our Buffer.filename an optional type.
Open the file
After having stored the new filename into the Buffer, we try to open the file.
std.fs.cwd().openFile() is how we open files, and it works on both relative and absolute paths, so we don't have to worry about that.
// read lines if the file could be opened
const file = std.fs.cwd().openFile(path, .{ .mode = .read_only });
if (file) |f| {
defer f.close();
try e.readLines(f);
}
The .mode field of the flags passed to openFile() expects an OpenMode enum value, which is one of the following:
pub const OpenMode = enum {
read_only,
write_only,
read_write,
};
We're opening to read, so our .mode is .read_only.
The function openFile() returns an error union, so we must do a capture in our if statement, to get the value or handle the error. If the file doesn't exist (error.FileNotFound) we don't want to quit, instead we assume we're editing a new file. If the file exists, we read its lines, without forgetting to close() the file handle.
else |err| switch (err) {
error.FileNotFound => {}, // new unsaved file
else => return err,
}
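The whole if/else construct is easier to follow when seen in one piece. The sketch below doesn't touch the file system: open() is a hypothetical stand-in for openFile() that fails in two different ways depending on the path, and load() is a made-up caller with the same shape as our code.

```zig
const std = @import("std");

/// Hypothetical stand-in for openFile(): the "handle" is just a number.
fn open(path: []const u8) error{ FileNotFound, AccessDenied }!usize {
    if (std.mem.eql(u8, path, "missing.txt")) return error.FileNotFound;
    if (std.mem.eql(u8, path, "secret.txt")) return error.AccessDenied;
    return path.len;
}

/// Return true if an existing file was opened, false if it is a new file.
fn load(path: []const u8) !bool {
    if (open(path)) |handle| {
        _ = handle; // the real code would read the lines here
        return true;
    } else |err| switch (err) {
        error.FileNotFound => return false, // new unsaved file
        else => return err,
    }
}

test "a missing file is not an error, other failures are" {
    try std.testing.expect(try load("main.zig"));
    try std.testing.expect(!try load("missing.txt"));
    try std.testing.expectError(error.AccessDenied, load("secret.txt"));
}
```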
Io.Reader
std.fs.File can hand us a reader that exposes the Io.Reader interface, so we'll use that to read its lines. A simple pattern would be like the following:
/// Read all lines from file.
fn readLines(e: *Editor, file: std.fs.File) !void {
    _ = e;
    var buf: [1024]u8 = undefined;
    var reader = file.reader(&buf);
    while (reader.interface.takeDelimiterExclusive('\n')) |line| {
        // we print the line to stderr, to see if it works
        std.debug.print("{s}\n", .{line});
    }
    else |err| if (err != error.EndOfStream) return err;
}
file is the file that has already been opened and is ready to be read. We create a buffer on the stack, then we initialize the file reader with it. The Io.Reader actually lives in reader.interface, so Io.Reader methods will be called on the interface.
We stop at error.EndOfStream, which means our file has been fully read. Any other error is returned to the caller.
Now, this implementation is simple, but it has a problem: the buffer is on the stack, and has fixed size. Which means that we can't read lines longer than its size. If a file has lines that are longer than that, it will error out. We'll fix this later.
Anyway, let's test this. Create a file named kilo at the root of the project:
#!/bin/sh
~/kilo-zig/zig-out/bin/kilo "$@" 2>err.txt
Then
chmod u+x kilo
It will run the program and write stderr output to err.txt. Compile and run with an argument, and the lines of the file should be written into err.txt:
./kilo src/main.zig
Remember that we still have to press Ctrl-Q three times to quit.
Digression: assignments in Zig
In Zig, it's really important that you pay attention to details, which you may find confusing or even frustrating if your experience is mainly with OOP languages. It has to do with the fact that in Zig, as in C, all assignments imply a copy by value.
This is especially important when assigning struct values. Most OOP languages, when assigning an object, take a reference to it. But in Zig structs are not references, they are values, and they are copied when assigned.
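A small experiment makes the difference visible. This is just a sketch with a hypothetical Point type and two hypothetical helper functions; it is not part of the editor.

```zig
const std = @import("std");

const Point = struct { x: i32, y: i32 };

/// Assigning a struct copies it: changing the copy leaves the original alone.
fn modifyCopy(original: Point) Point {
    var copy = original; // copy by value
    copy.x = 100;
    return copy;
}

/// A pointer is also copied by value, but the copy still points to the
/// same struct, so the change is visible through the original variable.
fn modifyThroughPointer(original: *Point) void {
    const ptr = original; // copies the pointer, not the struct
    ptr.x = 100;
}

pub fn main() void {
    var a = Point{ .x = 1, .y = 2 };
    const b = modifyCopy(a);
    std.debug.print("a.x = {d}, b.x = {d}\n", .{ a.x, b.x });
    modifyThroughPointer(&a);
    std.debug.print("a.x = {d}\n", .{a.x});
}
```

The first line printed shows that a is untouched by the change made to its copy; the second shows that the change made through the pointer did reach a.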
The case of the Io.Reader interface
The interface is a nested struct in the reader. To work, it uses a builtin function called @fieldParentPtr() that derives the address of its parent, so that the interface knows the address of the struct that contains it. But if instead of writing:
var reader = file.reader(&buf);
while (reader.interface.takeDelimiterExclusive('\n')) |line|
you write:
var reader = file.reader(&buf).interface;
while (reader.takeDelimiterExclusive('\n')) |line|
then you make a copy of that interface. The copy is an orphan: it can't compute a valid address for its parent because it doesn't have one, so it is essentially broken.
There's also the problem that file.reader(&buf), which is the legitimate parent, doesn't have a stable address in the second form, because it's not assigned to any variable: it's temporary memory that becomes invalid right after the assignment. So even if interface wasn't a copy and could still get the address of its parent, that address would point to invalid memory anyway.
The program will panic at runtime (in safe builds!), and the error reported can be hard to understand. Unfortunately Zig documentation is still immature, so right now you'll have to find out the hard way how these things work.
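To see the mechanism without the standard library in the way, here is a toy version of the same layout. Interface and Counter are hypothetical types invented for this sketch (they are not the real Io.Reader), and the sketch assumes a recent Zig where @fieldParentPtr() takes the field name and the field pointer.

```zig
const std = @import("std");

/// Hypothetical miniature "interface": it only holds a function pointer.
const Interface = struct {
    nextFn: *const fn (*Interface) u32,

    fn next(i: *Interface) u32 {
        return i.nextFn(i);
    }
};

/// Hypothetical implementation that embeds the interface as a field.
const Counter = struct {
    count: u32 = 0,
    interface: Interface = .{ .nextFn = increment },

    fn increment(i: *Interface) u32 {
        // Recover the address of the Counter that contains `i`.
        // This is only valid if `i` really points to the field of a Counter.
        const self: *Counter = @fieldParentPtr("interface", i);
        self.count += 1;
        return self.count;
    }
};

pub fn main() void {
    var counter = Counter{};
    // Correct: call through the field, so the parent address is valid.
    _ = counter.interface.next();
    _ = counter.interface.next();
    std.debug.print("count = {d}\n", .{counter.count});

    // Wrong (do not do this): `var i = counter.interface;` would copy the
    // interface out of its parent, and @fieldParentPtr would then compute
    // an address that doesn't belong to any Counter.
}
```

The wrong form is left as a comment on purpose: running it would be illegal behavior, not something that can be demonstrated safely.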
These kinds of issues can be frustrating if you're used to OOP languages, which are generally designed to perform complex operations under the hood, hiding the details of the implementation from the user for the sake of ease of use.
In OOP languages, when you assign something, often you aren't copying by value, but taking a reference to an object. In Zig you are expected to understand what assignments do (they always copy by value), and what you are really assigning.
Another example: many OOP languages have private fields, which can't be accessed outside of a certain scope. Zig has nothing like that: everything is in plain sight, but you are expected to know what you're doing. As the creator of Zig said:
it all comes down to simplicity. Other languages hide complex details from
you; Zig keeps things simpler but in exchange requires you to understand
those details.
That said, there's probably room for improvement, and possibly there will be ways, in the future, to at least prevent accidental mistakes.
Filling rows
Now that we can read a file line by line, we must store these lines in our Buffer rows.
We'll modify readLines() so that it inserts each row. Remove the _ = e; discard and the debug print, and call insertRow() in the loop body instead:
var buf: [1024]u8 = undefined;
var reader = file.reader(&buf);
while (reader.interface.takeDelimiterExclusive('\n')) |line| {
    try e.insertRow(e.buffer.rows.items.len, line);
which means that we'll append the row at the end of Buffer.rows.
Watch out for the reading buffer
We'll also fix one problem with the way we're currently reading the file. We're using a fixed buffer which is placed on the stack, and that's OK, because our file.reader needs a buffer. But the way this reader works is that the buffer is filled with the line that is being read, then a row is inserted with the content of this buffer.
If the line is longer than the buffer, the program will quit with an error:
error: StreamTooLong
I don't know if there's a way to salvage the line that has just been read and be able to handle the error in the else branch. My first guess is no.
We could allocate a very large buffer and use that:
const buf = try e.alc.alloc(u8, 60 * 1024 * 1024);
defer e.alc.free(buf);
var reader = file.reader(buf);
But this approach has multiple problems:
- it's very slow, because allocating such a large buffer is expensive
- we could get an OutOfMemory error
- it doesn't solve the problem that you might still have files with lines longer than that
Using an allocating Reader
So we use another solution (suggested on the Ziggit forum), and replace the reading loop with this:
var line_writer = std.Io.Writer.Allocating.init(e.alc);
defer line_writer.deinit();
while (reader.interface.streamDelimiter(&line_writer.writer, '\n')) |_| {
    try e.insertRow(e.buffer.rows.items.len, line_writer.written());
    line_writer.clearRetainingCapacity();
    reader.interface.toss(1); // skip the newline
}
else |err| if (err != error.EndOfStream) return err;
With this approach the reader doesn't store the line it is reading in a line slice, but feeds it to an allocating Writer, which stores the line in itself, allocating as much memory as needed.
It uses another method of the Reader interface:
- instead of takeDelimiterExclusive, which doesn't take a Writer as argument, it uses streamDelimiter, which does
- it must toss the last character, because streamDelimiter doesn't skip it, like takeDelimiterExclusive would do
Way too complex?
You can see that this is quite complex. I needed the help of experienced Zig users just to read the lines of a file. But this is a temporary problem, because the Reader and Writer interfaces are very new and still lack convenience functions, which have already been promised and should come in the next Zig versions.
Inserting a row
If you remember, our Row type had two arrays:
/// The ArrayList with the actual row characters
chars: t.Chars,
/// Array with the visual representation of the row
render: []u8,
where Chars is actually a std.ArrayList(u8), which we'll be using a lot.
In our insertRow() function, what we'll do is:
- initialize a new Row
- copy the line into row.chars
- insert the row in Buffer.rows
Finally we'll update the row, and set the dirty flag.
///////////////////////////////////////////////////////////////////////////////
//
// Row operations
//
///////////////////////////////////////////////////////////////////////////////
/// Insert a row at index `ix` with content `line`, then update it.
fn insertRow(e: *Editor, ix: usize, line: []const u8) !void {
    const B = &e.buffer;
    var row = try t.Row.init(B.alc);
    try row.chars.appendSlice(B.alc, line);
    try B.rows.insert(B.alc, ix, row);
    try e.updateRow(ix);
    B.dirty = true;
}
We set the dirty flag because the same function will be used while modifying the buffer. For now we're just reading the file, so the flag will be reset in openFile().
Add this at the bottom of openFile():
else |err| switch (err) {
error.FileNotFound => {}, // new unsaved file
else => return err,
}
B.dirty = false;
}
Updating a row
Updating the row means that we must update the render field from the chars field. That is, we must generate what will actually be rendered on screen.
The only way they will differ, at this point, is given by the possible presence of tab characters in our chars ArrayList.
Let's say we want to make the tabstop an option, so that it can be configured. We create a src/option.zig file and paste the following:
//! Editor options. For now they are hard-coded and cannot be modified from
//! inside the editor, nor are they read from a configuration file.
/// Number of spaces a tab character accounts for
pub var tabstop: u8 = 8;
As the description says, they're hard coded, but we'll still use a module, so that we can test different options ourselves if we want.
We'll also have to import it in the Constants section:
const opt = @import("option.zig");
rowAt() and currentRow()
We'll write other helper functions that we'll use a lot:
/// Get the row pointer at index `ix`.
fn rowAt(e: *Editor, ix: usize) *t.Row {
    return &e.buffer.rows.items[ix];
}
/// Get the row pointer at cursor position.
fn currentRow(e: *Editor) *t.Row {
    return &e.buffer.rows.items[e.view.cy];
}
Because frankly, taking that pointer all the time becomes annoying after a while.
We shouldn't worry about a performance loss from too many function calls: Zig has no macros, and small functions are what we write in their place. In optimized builds the compiler will usually inline them when it can.
updateRow()
The purpose of this function is to update the rendered row, which is what we see on screen.
/// Update row.render, that is the visual representation of the row.
/// Performs a syntax update at the end.
fn updateRow(e: *Editor, ix: usize) !void {
// code to come...
}
Allocator.realloc()
const row = e.rowAt(ix);
// get the length of the rendered row and reallocate
const rlen = // ??? total size of the rendered row ???
row.render = try e.alc.realloc(row.render, rlen);
As explained before, I chose to make row.render a simple array because we can determine its size before any reallocation happens. Most of the time a reallocation would not result in a new allocation, because realloc() does the following:
- if the previous size is 0 (first time the row is updated) and the new size is bigger, there is an allocation
- if the new size is smaller (characters are deleted), it is resized
- if the new size is slightly bigger (such as when inserting a single character while typing), most of the time it will extend the array without reallocating
- it would only allocate when the size is bigger and it's not possible to extend the array
An ArrayList would bring some benefits, but also increase total memory usage. For now we'll keep it simple, but we'll keep it in mind.
Looping characters of the real row
var idx: usize = 0;
var i: usize = 0;
while (i < row.chars.items.len) : (i += 1) {
    if (row.chars.items[i] == '\t') {
        row.render[idx] = ' ';
        idx += 1;
        while (idx % opt.tabstop != 0) : (idx += 1) {
            row.render[idx] = ' ';
        }
    }
    else {
        row.render[idx] = row.chars.items[i];
        idx += 1;
    }
}
What the loop does is copy the same character into row.render when it's not a tab; otherwise it converts the tab to spaces, making some considerations in the process:
- inside the loop, idx is the current column in the rendered row
- we want a minimum of one space, so we add it, and increase idx
- we want to see if there are more spaces to add, and this is true if (idx % tabstop != 0)
For example, assuming tabstop = 8, at the start of a line, where idx is 0, a Tab would insert 8 spaces.
But a Tab typed in the middle of a row won't necessarily add tabstop spaces, because the starting column in the rendered row may not be a multiple of tabstop. If we insert a tab at idx = 12, we first insert one space, which makes idx = 13, then 3 more spaces to reach idx = 16, because 13 % 8 = 5 and 8 - 5 = 3.
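The same loop can be tried outside the editor. The sketch below uses a hypothetical expandTabs() function that works on plain slices and takes the tabstop as a parameter; the output buffer is assumed to be large enough, which the real code guarantees by sizing row.render beforehand.

```zig
const std = @import("std");

/// Expand tabs in `chars` into `out`, using the same loop as updateRow().
/// `out` is assumed to be large enough. Returns the filled part of `out`.
fn expandTabs(out: []u8, chars: []const u8, tabstop: usize) []u8 {
    var idx: usize = 0;
    for (chars) |c| {
        if (c == '\t') {
            // at least one space, then pad up to the next tab stop
            out[idx] = ' ';
            idx += 1;
            while (idx % tabstop != 0) : (idx += 1) {
                out[idx] = ' ';
            }
        } else {
            out[idx] = c;
            idx += 1;
        }
    }
    return out[0..idx];
}

pub fn main() void {
    var buf: [64]u8 = undefined;
    // '|' marks the end of the rendered text
    std.debug.print("{s}|\n", .{expandTabs(&buf, "\tx", 8)});
    std.debug.print("{s}|\n", .{expandTabs(&buf, "abcdefghijkl\tx", 8)});
}
```

In the second call the tab sits at column 12, so it becomes 4 spaces in total (1 + 3) and the x lands on column 16.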
Computing beforehand the size of the rendered row
// get the length of the rendered row and reallocate
const rlen = // ??? total size of the rendered row ???
row.render = try e.alc.realloc(row.render, rlen);
We didn't assign anything to rlen. How do we know how long our rendered row will be? We'll have to do something similar to what we do inside the loop in updateRow(), but we just increase idx and return the final value. Since in our program we'll often have to convert a real column index to an index in the rendered row, we write a function that does that.
We call the function cxToRx() and the assignment becomes:
const rlen = row.cxToRx(row.chars.items.len);
That is, we calculate the index in the rendered row for the last column of the real row.
We put this function in Row.zig, because it is in agreement with how we wanted to design our types: they shouldn't change the state of the Editor, but they can return their own state. Here Row will not modify itself, so it's OK.
/// Calculate the position of a real column in the rendered row.
pub fn cxToRx(row: *Row, cx: usize) usize {
    var rx: usize = 0;
    for (0..cx) |i| {
        if (row.chars.items[i] == '\t') {
            rx += (opt.tabstop - 1) - (rx % opt.tabstop);
        }
        rx += 1;
    }
    return rx;
}
The loop is a bit different here, because instead of two nested loops we have only one. That's because we don't need to modify the row in any way, so we can calculate the needed spaces in a single operation. Which is quite a bit more difficult to understand, to be honest. Feel free to recreate an example loop step by step as we did above.
This function also needs to import option.zig, so do that.
We don't handle multi-byte characters, and we don't have virtual text of any kind. In a real editor this function would be more complex.
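If the single-loop arithmetic is hard to follow, it can help to run it on a few inputs. This sketch is a standalone variant of cxToRx() that takes a plain slice and an explicit tabstop (both are changes made only for the example, so that it doesn't depend on Row or option.zig).

```zig
const std = @import("std");

/// Same arithmetic as Row.cxToRx(), but on a plain slice and with an
/// explicit tabstop, so that it can be tried without the Row type.
fn cxToRx(chars: []const u8, cx: usize, tabstop: usize) usize {
    var rx: usize = 0;
    for (chars[0..cx]) |c| {
        if (c == '\t') {
            // spaces needed to reach the column just before the next tab stop
            rx += (tabstop - 1) - (rx % tabstop);
        }
        rx += 1; // the character itself, or the last space of the tab
    }
    return rx;
}

pub fn main() void {
    // tab at column 0: 7 + 1 = 8
    std.debug.print("{d}\n", .{cxToRx("\tx", 1, 8)});
    // tab at column 12: 12 + 3 + 1 = 16
    std.debug.print("{d}\n", .{cxToRx("abcdefghijkl\tx", 13, 8)});
}
```

Passing the full length of the row as cx gives the size of the rendered row, which is exactly how rlen is computed in updateRow().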
Write a test
Let's write a test to see if what we wrote is working.
Run tests from main.zig
To run this test with zig build test we must add a section to our main.zig module:
///////////////////////////////////////////////////////////////////////////////
//
// Tests
//
///////////////////////////////////////////////////////////////////////////////
comptime {
    if (builtin.is_test) {
        _ = @import("Editor.zig");
    }
}
In fact, our build.zig is set up in a way that running zig build test executes the tests that are in main.zig. When tests are executed from a module, all tests placed in imported modules are executed too.
We don't want any test in main.zig, but with this comptime block we import the modules we want to test if builtin.is_test is true, and this happens only when we're running tests.
Importing these modules will cause their tests to be executed, which is what we want.
Add the test in Editor
You should add some constants in Editor:
const linux = @import("linux.zig");
const mem = std.mem;
const expect = std.testing.expect;
Then we add a test to src/Editor.zig. Add the test section just above the Constants section.
///////////////////////////////////////////////////////////////////////////////
//
// Tests
//
///////////////////////////////////////////////////////////////////////////////
test "insert rows" {
    var da = std.heap.DebugAllocator(.{}){};
    defer _ = da.deinit();
    var e = try t.Editor.init(da.allocator(), .{ .rows = 50, .cols = 180 });
    // deinit is deferred right after init, so it also runs if openFile() fails
    defer e.deinit();
    try e.openFile("src/main.zig");
    const row = e.rowAt(6).chars.items;
    try expect(mem.eql(u8, "pub fn main() !void {", row));
}
It's a simple test that verifies that the content of one row actually matches the one in the file.
I initialize the editor with a 'fake' screen, because this isn't an interactive terminal. Also, we avoid the event loop by reading the file directly with openFile(), otherwise processKeypress() would hang the test.
If we modify main.zig again, this test could fail, of course. I will not, but maybe you will.
The screen surface
Now we're finally ready to start drawing on the screen.
We'll use an ArrayList to hold all characters that will be printed on every screen refresh.
We'll add a new field to our Editor type:
/// String that is printed on the terminal at every screen redraw
surface: t.Chars,
It is initialized in the init() function:
// multiply * 10, because each cell could contain escape sequences
const surface_capacity = screen.rows * screen.cols * 10;
return .{
    .alc = allocator,
    .surface = try t.Chars.initCapacity(allocator, surface_capacity),
We give our surface an initial capacity, so that it will probably never reallocate. We make enough room for escape sequences: potentially, almost every cell of the screen could contain an escape sequence.
surface must be deinitialized in deinit(), or it will leak:
/// Deinitialize the editor.
pub fn deinit(e: *Editor) void {
e.buffer.deinit();
e.surface.deinit(e.alc);
Note that we must pass the allocator as argument when deinitializing an ArrayList.
Appending to the surface
Every time we want to append to the surface, we'd need either:
try e.surface.appendSlice(e.alc, slice);
or
try e.surface.append(e.alc, character);
Let's create a helper function, because we'll append to the surface in lots of places, and we want our code to be more concise and readable.
/// Append either a slice or a character to the editor surface.
fn toSurface(e: *Editor, value: anytype) !void {
    switch (@typeInfo(@TypeOf(value))) {
        .pointer => try e.surface.appendSlice(e.alc, value),
        else => try e.surface.append(e.alc, value),
    }
}
With this function we just need to do:
try e.toSurface(slice_or_character);
The builtin function @TypeOf() returns a type, which can only be evaluated at compile time, hence our helper doesn't have any runtime cost, because the operation to perform is decided at compile time. As proof of this, you will get a compile error if you pass something wrong to this function.
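The dispatch on the argument type can be tested on its own. In this sketch, Surface is a hypothetical fixed-size stand-in for the editor surface (no allocator, so no errors to handle), and the .pointer tag assumes a recent Zig version, like the rest of this guide.

```zig
const std = @import("std");

/// Hypothetical fixed-size stand-in for the editor surface.
const Surface = struct {
    buf: [64]u8 = undefined,
    len: usize = 0,

    /// Append either a slice (or string literal) or a single character.
    /// The branch is chosen at compile time from the type of `value`.
    fn add(s: *Surface, value: anytype) void {
        switch (@typeInfo(@TypeOf(value))) {
            .pointer => {
                const bytes: []const u8 = value;
                @memcpy(s.buf[s.len..][0..bytes.len], bytes);
                s.len += bytes.len;
            },
            else => {
                s.buf[s.len] = value;
                s.len += 1;
            },
        }
    }

    fn items(s: *const Surface) []const u8 {
        return s.buf[0..s.len];
    }
};

pub fn main() void {
    var s = Surface{};
    s.add("ab"); // string literal: a pointer to an array
    s.add('~'); // character literal: a comptime_int
    const slice: []const u8 = "cd";
    s.add(slice); // slice: also a pointer type
    std.debug.print("{s}\n", .{s.items()});
}
```

Both string literals and slices have a pointer type, which is why a single .pointer branch covers them.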
More escape sequences
We also add a bunch of constants to the bottom of ansi.zig. These are escape sequences that we'll use while drawing, at one point or another, so let's just add them all now:
/// Background color (40 is black; 49 would be the terminal default)
pub const BgDefault = CSI ++ "40m";
/// Foreground color
pub const FgDefault = CSI ++ "39m";
/// Hide the terminal cursor
pub const HideCursor = CSI ++ "?25l";
/// Show the terminal cursor
pub const ShowCursor = CSI ++ "?25h";
/// Move cursor to position 1,1
pub const CursorTopLeft = CSI ++ "H";
/// Start reversing colors
pub const ReverseColors = CSI ++ "7m";
/// Reset colors to terminal default
pub const ResetColors = CSI ++ "m";
/// Clear the content of the line
pub const ClearLine = CSI ++ "K";
/// Color used for error messages
pub const ErrorColor = CSI ++ "91m";
Refresh the screen
In startUp(), replace the commented placeholder in the event loop:
while (e.should_quit == false) {
// refresh the screen
try e.refreshScreen();
We'll do the drawing with this function, that goes in a new section, which I put above the Helpers section:
///////////////////////////////////////////////////////////////////////////////
//
// Screen update
//
///////////////////////////////////////////////////////////////////////////////
/// Full refresh of the screen.
fn refreshScreen(e: *Editor) !void {
// code to come...
}
Let's explain what goes on:
- we clear our ArrayList, which will eventually contain the characters that must be printed
- we set the background color, hide the terminal cursor so that it doesn't get in the way, and move the cursor to the top left position
- we draw the rows; later we'll also draw the statusline and the message area
e.surface.clearRetainingCapacity();
try e.toSurface(ansi.BgDefault);
try e.toSurface(ansi.HideCursor);
try e.toSurface(ansi.CursorTopLeft);
try e.drawRows();
// try e.drawStatusline();
// try e.drawMessageBar();
- we move the cursor to its current position and we show it again
- we print the whole thing with a write() call
const V = &e.view;
// move cursor to its current position (could have been moved with keys)
var buf: [32]u8 = undefined;
const row = V.cy - V.rowoff + 1;
const col = V.rx - V.coloff + 1;
try e.toSurface(try ansi.moveCursorTo(&buf, row, col));
try e.toSurface(ansi.ShowCursor);
try linux.write(e.surface.items);
moveCursorTo()
To move the cursor we'll need a new function in ansi.zig:
/// Return the escape sequence to move the cursor to a position.
pub fn moveCursorTo(buf: []u8, row: usize, col: usize) ![]const u8 {
    return std.fmt.bufPrint(buf, CSI ++ "{};{}H", .{ row, col });
}
It takes a slice buf and uses it as storage for the formatted escape sequence that will move the cursor to a position. The returned slice points into buf.
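Here is a standalone sketch of the same function, useful to check what bufPrint() produces and what happens when the buffer is too small. It assumes that CSI is defined as ESC followed by [, and it uses the {d} specifier, which for integers is equivalent to the {} used above.

```zig
const std = @import("std");

// Assumed definition: CSI is the two-byte sequence ESC followed by '['.
const CSI = "\x1b[";

fn moveCursorTo(buf: []u8, row: usize, col: usize) ![]const u8 {
    return std.fmt.bufPrint(buf, CSI ++ "{d};{d}H", .{ row, col });
}

pub fn main() !void {
    var buf: [32]u8 = undefined;
    const seq = try moveCursorTo(&buf, 3, 7);
    // print the bytes after ESC, since ESC itself is not visible
    std.debug.print("ESC{s}\n", .{seq[1..]});

    // A buffer that is too small is reported as an error, not truncated.
    var tiny: [4]u8 = undefined;
    if (moveCursorTo(&tiny, 100, 100)) |_| {
        std.debug.print("unexpected success\n", .{});
    } else |err| {
        std.debug.print("error: {s}\n", .{@errorName(err)});
    }
}
```

A 32-byte buffer, as used in refreshScreen(), is far more than any realistic row and column pair will need.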
Drawing the rows
This function will be expanded later, but for now all it needs to do is to draw the rows without any highlight.
/// Append rows to be drawn to the surface. Handles escape sequences for syntax
/// highlighting.
fn drawRows(e: *Editor) !void {
// code to come...
}
We can print a number of rows which is equal to the height of our main window, which is e.screen.rows. We use a for loop with a range, but to the index y we must add e.view.rowoff, which is the current row offset. This will be greater than 0 if we scrolled our window down and the first row went off-screen.
const V = &e.view;
const rows = e.buffer.rows.items;
for (0 .. e.screen.rows) |y| {
const ix: usize = y + V.rowoff;
Since we draw by screen rows, and not by Buffer rows, ix may be equal to or greater than the number of Buffer rows, which means we are past the end of the file. In this case we draw a ~ to point that out.
// past buffer content
if (ix >= rows.len) {
try e.toSurface('~');
}
Otherwise, we are within the file content, but it doesn't mean that there is something to print in all cases:
// within buffer content
else {
// length of the rendered line
const rowlen = rows[ix].render.len;
// actual length that should be drawn because visible
var len = if (V.coloff > rowlen) 0 else rowlen - V.coloff;
For example, if we scrolled the window to the right, the leftmost columns would go off-screen, and e.view.coloff would become positive. If the line is shorter than that, nothing will be printed, because it's completely off-screen.
We also limit len to the number of screen columns:
len = @min(len, e.screen.cols);
If len > 0 there's something to print, which is the slice of the rendered line that starts at coloff and is len characters long. We append this slice to the surface ArrayList.
// draw the visible part of the row
if (len > 0) {
try e.toSurface(rows[ix].render[V.coloff .. V.coloff + len]);
}
}
We end the line after that:
try e.toSurface(ansi.ClearLine);
try e.toSurface("\r\n"); // end the line
}
Again: V.coloff is 0 unless a part of the row went off-screen on the left side.
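The computation of the visible part of a row can be isolated in a small function. This is only a sketch: visiblePart() is a hypothetical helper that doesn't exist in the editor, where the same logic is written inline in drawRows().

```zig
const std = @import("std");

/// Return the part of `render` that is visible in a window that is `cols`
/// columns wide and scrolled `coloff` columns to the right.
fn visiblePart(render: []const u8, coloff: usize, cols: usize) []const u8 {
    // the whole row is off-screen on the left side
    if (coloff >= render.len) return "";
    // what remains after the offset, limited to the screen width
    const len = @min(render.len - coloff, cols);
    return render[coloff .. coloff + len];
}

pub fn main() void {
    std.debug.print("[{s}]\n", .{visiblePart("hello world", 0, 5)});
    std.debug.print("[{s}]\n", .{visiblePart("hello world", 6, 80)});
    std.debug.print("[{s}]\n", .{visiblePart("hi", 5, 80)});
}
```

The last call shows the case of a row that is shorter than the column offset: the result is empty, and nothing is drawn for that row.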
At this point, if you compile and run:
./kilo kilo
you should already be able to visualize the file on the screen! That's big progress. You can't move the cursor yet, but you can still quit the editor by pressing Ctrl-Q 3 times.
You will notice that the last 2 lines of the screen don't have the ~ character: that's because in init() we subtracted 2 from the real screen height, to make room for statusline and message area.
The statusline
Uncomment the line in refreshScreen() where we draw the statusline.
try e.drawRows();
try e.drawStatusline();
The drawStatusline() function
I put this below drawRows():
/// Append the statusline to the surface.
fn drawStatusline(e: *Editor) !void {
const V = &e.view;
// code to come...
}
We want the color of the statusline to be the inverse of the normal text color, with dark text over bright background. We want two sections, so we declare two buffers.
try e.toSurface(ansi.ReverseColors);
var lbuf: [200]u8 = undefined;
var rbuf: [80]u8 = undefined;
- on the left side we want to display the filename, or [No Name] for a newly created file, and the modified state of the buffer
- on the right side, the filetype (or no ft) and the current cursor position
// left side of the statusline
var ls = std.fmt.bufPrint(&lbuf, "{s} - {} lines{s}", .{
    e.buffer.filename orelse "[No Name]",
    e.buffer.rows.items.len,
    if (e.buffer.dirty) " [modified]" else "",
}) catch "";
// right side of the statusline (leading space to guarantee separation)
var rs = std.fmt.bufPrint(&rbuf, " | {s} | col {}, ln {}/{} ", .{
    e.buffer.syntax orelse "no ft",
    V.cx + 1,
    V.cy + 1,
    e.buffer.rows.items.len,
}) catch "";
We'll use std.fmt.bufPrint to format the two sides of the statusline, then we'll fill with spaces the room between them, to cover the whole e.screen.cols dimension, which is the width of the screen.
Note that we use the orelse operator to provide fallbacks for our optional variables (e.buffer.filename and e.buffer.syntax).
Since we use fixed buffers on the stack for bufPrint, there's the risk of having filenames that are so long that they won't fit; in that case we just print nothing. We do the same for the right side.
We'll prioritize the left side, in case there isn't enough room for both.
We'll have to ensure we reset colors and insert a new line at the end. We could use a defer statement for this purpose, but inside defer statements error handling isn't allowed, so we would have to ignore the errors, and hope for the best. Instead we'll create a small helper function so that errors can still be handled. In Zig the goto statement doesn't exist, so we must get used to these kinds of alternatives.
var room_left = e.screen.cols;
// prioritize left side
if (ls.len > room_left) {
    ls = ls[0 .. room_left];
}
room_left -= ls.len;
try e.toSurface(ls);
if (room_left == 0) {
    try e.finalizeStatusline();
    return;
}
Labeled blocks as goto alternative
Another alternative to goto is a labeled block, for example:
do: {
    std.debug.print("do block\n", .{});
    var i: usize = 0;
    while (i < 10) : (i += 1) {
        if (i == 5) {
            break :do;
        }
    }
    std.debug.print("no break\n", .{});
}
std.debug.print("exit\n", .{});
prints:
do block
exit
This increases the indentation level of the whole block, though, so I prefer other solutions, when possible.
To make sure we only append if there's enough room, we track the available room in the room_left variable, which is initially equal to e.screen.cols, and we reduce it as we determine the size of the left and right sides.
Append the right side and we're done:
// add right side and spaces if there is room left for them
if (rs.len > room_left) {
    rs = rs[0 .. room_left];
}
room_left -= rs.len;
try e.surface.appendNTimes(e.alc, ' ', room_left);
try e.toSurface(rs);
try e.finalizeStatusline();
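The whole layout logic (truncate the left side, truncate the right side, fill the gap with spaces) can be checked with a standalone sketch. layoutStatusline() is a hypothetical function written for this example: it writes into a caller-provided buffer instead of the surface, and it leaves out the colors.

```zig
const std = @import("std");

/// Lay out `left` and `right` in a line of exactly `cols` columns, giving
/// priority to the left side. `out` must be at least `cols` bytes long.
fn layoutStatusline(out: []u8, left: []const u8, right: []const u8, cols: usize) []u8 {
    var room_left = cols;
    var pos: usize = 0;

    // left side first, truncated if it doesn't fit
    const ls = left[0..@min(left.len, room_left)];
    @memcpy(out[pos..][0..ls.len], ls);
    pos += ls.len;
    room_left -= ls.len;

    // the right side only gets what remains
    const rs = right[0..@min(right.len, room_left)];
    room_left -= rs.len;

    // fill the gap with spaces, then append the right side
    @memset(out[pos..][0..room_left], ' ');
    pos += room_left;
    @memcpy(out[pos..][0..rs.len], rs);
    pos += rs.len;

    return out[0..pos];
}

pub fn main() void {
    var buf: [80]u8 = undefined;
    std.debug.print("|{s}|\n", .{layoutStatusline(&buf, "main.zig - 12 lines", " | zig | col 1, ln 1/12 ", 60)});
    std.debug.print("|{s}|\n", .{layoutStatusline(&buf, "main.zig - 12 lines", " | zig ", 10)});
}
```

Whatever the inputs are, the result is always exactly cols characters long, so the reversed colors cover the full width of the screen.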
finalizeStatusline()
This is the helper function to finalize the statusline, and still be able to handle errors.
/// Reset colors and append new line after statusline
fn finalizeStatusline(e: *Editor) !void {
    try e.toSurface(ansi.ResetColors);
    try e.toSurface("\r\n");
}
Compile and run, and enjoy your statusline!
The message area
For the message area we'll need more Editor fields:
/// String to be printed in the message area (can be a prompt)
status_msg: t.Chars,
/// Controls the visibility of the status message
status_msg_time: i64,
Also add these constants:
const time = std.time.timestamp;
const time_ms = std.time.milliTimestamp;
const initial_msg_size = 80;
Add to init():
.status_msg = try t.Chars.initCapacity(allocator, initial_msg_size),
.status_msg_time = 0,
and to deinit() (always deinitialize ArrayLists or they will leak):
e.status_msg.deinit(e.alc);
Uncomment the line in refreshScreen() where we draw the message area.
try e.drawRows();
try e.drawStatusline();
try e.drawMessageBar();
The drawMessageBar() function
I put this below finalizeStatusline():
/// Append the message bar to the surface.
fn drawMessageBar(e: *Editor) !void {
    try e.toSurface(ansi.ClearLine);
    var msglen = e.status_msg.items.len;
    if (msglen > e.screen.cols) {
        msglen = e.screen.cols;
    }
    if (msglen > 0 and time() - e.status_msg_time < 5) {
        try e.toSurface(e.status_msg.items[0 .. msglen]);
    }
}
As you can see, it's pretty simple. We clear the line, then if there's a message to be printed, we append it to the surface.
We also have some sort of timer: it's not a real timer, in the sense that there's no async timer that runs independently from the main thread. Remember that the screen is redrawn in the event loop, whose iterations are controlled by the processKeypress() function, since it's that function that halts the loop while waiting for new keys pressed by the user. So what this "timer" does is check, at each redraw, if 5 seconds have passed since the message was set: if they haven't, the message is appended to the surface, otherwise nothing is appended and the message won't be printed.
It will be the function that sets a status message that updates status_msg_time, but we don't have a way to set a status message yet.
The welcome message
The original kilo editor would print a welcome message when the program is started without arguments, which results in a new, empty, unnamed buffer being created.
We also want it because it's cool and reminds us (or at least me) of vim.
New fields in Editor:
/// String to be displayed when the editor is started without loading a file
welcome_msg: t.Chars,
/// Becomes false after the first screen redraw
just_started: bool,
Initialize in init():
.welcome_msg = try t.Chars.initCapacity(allocator, 0),
.just_started = true,
Deinitialize in deinit():
e.welcome_msg.deinit(e.alc);
When, how, and where do we want the welcome message to appear?
- we generate it when the argument for startUp() is null, which means there's no file to open
- we want to generate it dynamically because the message should be centered on screen, and we can assess that only at runtime
- we render the message in drawRows()
A module for messages
When generating the message, we must fetch the base string from somewhere. It will be the same for other text constants and messages that we'll use in the editor in the future. So we create a message module and we import it in Editor:
const message = @import("message.zig");
This module for now will look like this:
//! Module that holds various strings for the message area, either status or
//! error messages, or prompts.
const std = @import("std");
const opt = @import("option.zig");
const status_messages = .{
.{ "welcome", "Kilo editor -- version " ++ opt.version_str },
};
pub const status = std.StaticStringMap([]const u8).initComptime(status_messages);
We also create a version_str in our option module, so that it contains the current version number, as a string:
pub const version_str = "0.1";
The StaticStringMap is created at compile time (see how it's initialized), and will be accessed in Editor with message.status.get(), that returns an optional value which is null if the key couldn't be found.
Keys of a StaticStringMap will always be strings, but values can be of any type. In our case they are also strings ([]const u8).
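A short sketch of how such a map behaves when queried. The entries and the messageOrEmpty() helper are hypothetical, chosen only to show a lookup that succeeds and one that fails.

```zig
const std = @import("std");

// Hypothetical entries, only for illustration.
const messages = std.StaticStringMap([]const u8).initComptime(.{
    .{ "welcome", "Kilo editor" },
    .{ "bye", "See you" },
});

/// Look up a message, falling back to an empty string for unknown keys.
fn messageOrEmpty(key: []const u8) []const u8 {
    return messages.get(key) orelse "";
}

pub fn main() void {
    // `.?` unwraps the optional: safe here because we know the key exists
    std.debug.print("{s}\n", .{messages.get("welcome").?});
    std.debug.print("[{s}]\n", .{messageOrEmpty("no such key")});
}
```

Using .? on a key that isn't in the map would be a bug (a panic in safe builds), so it's only appropriate for keys that we define ourselves, like "welcome".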
Generate the message
We had a commented placeholder in startUp(), so we must replace it with the actual function call.
else {
    try e.generateWelcome();
}
The function to generate the message is:
/// Generate the welcome message.
fn generateWelcome(e: *Editor) !void {
// code to come...
}
The line with the welcome message starts with a ~, because we're in an empty buffer.
The length of the message must be limited to the screen columns - 1, because of the ~ that we append first.
try e.welcome_msg.append(e.alc, '~');
var msg = message.status.get("welcome").?;
if (msg.len >= e.screen.cols) {
    msg = msg[0 .. e.screen.cols - 1];
}
The padding will be inserted before the message.
const padding: usize = (e.screen.cols - msg.len) / 2;
try e.welcome_msg.appendNTimes(e.alc, ' ', padding);
try e.welcome_msg.appendSlice(e.alc, msg);
Render the message
In drawRows(), all we have to do is replace the if branch for when the row is past the end of the buffer with this:
// past buffer content
if (ix >= rows.len) {
    if (e.just_started
        and e.buffer.filename == null
        and e.buffer.rows.items.len == 0
        and y == e.screen.rows / 3) {
        try e.toSurface(e.welcome_msg.items);
    }
    else {
        try e.toSurface('~');
    }
We append the message to the surface only if the editor has just started, the buffer is empty and doesn't even have a name, and the current row is at about 1/3 of the height of the screen.
Remember to set just_started to false at the bottom of refreshScreen(), if you didn't already.
e.just_started = false;
try linux.write(e.surface.items);
Once just_started is false, our welcome message won't be printed again.
Compile and run with
./kilo
to see an empty buffer and the welcome message. You can try to run again with a narrower terminal window, to verify that the message and the statusline are displayed correctly.
A text viewer
Right now we're able to open a file and display it, but since we can't move the cursor, we're stuck in the top-left corner of the screen.
Our processKeypress() must detect more keys, and we must bind these keys to actions to perform.
We change our function to this:
/// Process a keypress: will wait indefinitely for readKey, which loops until
/// a key is actually pressed.
fn processKeypress(e: *Editor) !void {
const k = try ansi.readKey();
const static = struct {
var q: u8 = opt.quit_times;
};
const B = &e.buffer;
switch (k) {
.ctrl_q => {
if (B.dirty and static.q > 0) {
static.q -= 1;
return;
}
try ansi.clearScreen();
e.should_quit = true;
},
else => {},
}
// reset quit counter for any keypress that isn't Ctrl-Q
static.q = opt.quit_times;
}
opt.quit_times
First thing, we want to remove that magic number and bind static.q to an option, so in option.zig we'll add:
pub const quit_times = 3;
and we replace 3 with opt.quit_times. We also want Ctrl-Q to require repeated presses only if the buffer has been modified.
Next, we'll handle more keypresses.
Before we deal with movements, we must complete our Row type.
The rxToCx() method
This does the opposite of the cxToRx() method, that is, it finds the real column for an index of the rendered row. It must still iterate the real row, not the rendered one, because from the latter we couldn't know what was a tab and what a real space character. Therefore we iterate the real row like in cxToRx(), tracking both the rendered column and the current index in the real row, and as soon as the rendered column becomes greater than the requested column, we return the current index in the real row.
/// Calculate the position of a rendered column in the real row.
pub fn rxToCx(row: *Row, rx: usize) usize {
var cur_rx: usize = 0;
var cx: usize = 0;
while (cx < row.chars.items.len) : (cx += 1) {
if (row.chars.items[cx] == '\t') {
cur_rx += (opt.tabstop - 1) - (cur_rx % opt.tabstop);
}
cur_rx += 1;
if (cur_rx > rx) {
return cx;
}
}
return cx;
}
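To convince ourselves that the two conversions really are opposites, we can test them on plain slices. This is just a sketch: the tabstop value is assumed to be 8, and this cxToRx() is my guess at the shape of the method we wrote earlier, reduced to work on a slice instead of a Row.

```zig
const std = @import("std");

const tabstop = 8; // assumed value of opt.tabstop

/// Assumed shape of cxToRx(): real column -> rendered column.
fn cxToRx(chars: []const u8, cx: usize) usize {
    var rx: usize = 0;
    for (chars[0..cx]) |c| {
        if (c == '\t') rx += (tabstop - 1) - (rx % tabstop);
        rx += 1;
    }
    return rx;
}

/// Same logic as rxToCx() above, but on a plain slice.
fn rxToCx(chars: []const u8, rx: usize) usize {
    var cur_rx: usize = 0;
    var cx: usize = 0;
    while (cx < chars.len) : (cx += 1) {
        if (chars[cx] == '\t') cur_rx += (tabstop - 1) - (cur_rx % tabstop);
        cur_rx += 1;
        if (cur_rx > rx) return cx;
    }
    return cx;
}

test "rxToCx undoes cxToRx" {
    const row = "\tab"; // the tab is rendered as 8 columns
    try std.testing.expect(cxToRx(row, 1) == 8);
    // any rendered column inside the tab maps back to the tab itself
    try std.testing.expect(rxToCx(row, 5) == 0);
    // converting there and back gives the starting column
    for (0..row.len + 1) |cx| {
        try std.testing.expect(rxToCx(row, cxToRx(row, cx)) == cx);
    }
}
```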
More keypress handling
Inside the switch that handles keypresses, we add a variable and more prongs:
const B = &e.buffer;
const V = &e.view;
.ctrl_d, .ctrl_u, .page_up, .page_down => {
// by how many rows we'll jump
const leap = e.screen.rows - 1;
// place the cursor at the top of the window, then jump
if (k == .ctrl_u or k == .page_up) {
V.cy = V.rowoff;
V.cy -= @min(V.cy, leap);
}
// place the cursor at the bottom of the window, then jump
else {
V.cy = V.rowoff + e.screen.rows - 1;
V.cy = @min(V.cy + leap, B.rows.items.len);
}
},
.home => {
V.cx = 0;
},
.end => {
// last row doesn't have characters!
if (V.cy < B.rows.items.len) {
V.cx = B.rows.items[V.cy].clen();
}
},
.left, .right => {
e.moveCursorWithKey(k);
},
.up, .down => {
e.moveCursorWithKey(k);
},
I added comments so that what happens should be self-explanatory.
One of my favorite Zig features is how you can omit the enum type when using its values, whenever the type can be inferred from the context. It makes the code very expressive and avoids redundancy, without resorting to macros or untyped constants. It also makes it easier to write this kind of guide.
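A tiny example of this feature, outside of the editor. The Key enum below is a hypothetical, much reduced version of ours, and isHorizontal() exists only for this demonstration.

```zig
const std = @import("std");

// Hypothetical, reduced version of the Key enum.
const Key = enum { left, right, up, down };

fn isHorizontal(key: Key) bool {
    // the switch operand is a Key, so prongs can be written as `.left`
    return switch (key) {
        .left, .right => true,
        .up, .down => false,
    };
}

test "enum literals" {
    // the parameter type is Key, so `.left` means `Key.left`
    try std.testing.expect(isHorizontal(.left));
    // same for a declaration with an explicit type, and for comparisons
    const k: Key = .up;
    try std.testing.expect(k == .up);
    try std.testing.expect(!isHorizontal(k));
}
```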
We see a new function, moveCursorWithKey(), which we'll cover next.
Move with keys
This function will let us move the cursor with arrow keys. Also in this case the code is self-explanatory.
With keys Left and Right we can also change row, if we are respectively in the first or last column of the row.
/// Update the cursor position after a key has been pressed.
fn moveCursorWithKey(e: *Editor, key: t.Key) void {
const V = &e.view;
const numrows = e.buffer.rows.items.len;
switch (key) {
.left => {
if (V.cx != 0) { // not the first column
V.cx -= 1;
}
else if (V.cy > 0) { // move back to the previous row
V.cy -= 1;
V.cx = e.currentRow().clen();
}
},
.right => {
if (V.cy < numrows) {
if (V.cx < e.currentRow().clen()) { // not the last column
V.cx += 1;
}
else { // move to the next row
V.cy += 1;
V.cx = 0;
}
}
},
.up => {
if (V.cy != 0) {
V.cy -= 1;
}
},
.down => {
if (V.cy < numrows) {
V.cy += 1;
}
},
else => {},
}
}
Handling the wanted column
When we move vertically, the cursor keeps its current column. That's pretty obvious. But when it moves to a shorter line and we don't keep track of the previous value, it will keep moving along the column of the shorter line, while we want it to go back to the column from which we started. That is the wanted column, and in our View type it is the cwant field.
This variable should be:
- restored when moving vertically, either with arrow keys or by page
- set to the current column when moving left or right, or to the beginning of the line (Home key), or after typing/deleting something
- set to a special value when using the End key, meaning: always stick to the end of the line when moving vertically
The special value we use is std.math.maxInt(usize), which we store in a constant:
const maxUsize = std.math.maxInt(usize);
The Cwant enum
These different behaviors are listed in an enum, which will go in our types module:
/// Controls handling of the wanted column.
pub const Cwant = enum(u8) {
/// To set cwant to a new value
set,
/// To restore current cwant, or to the last column if too big
restore,
/// To set cwant to maxUsize, which means 'always the last column'
maxcol,
};
The doCwant() function
Unlike the original kilo editor, here the cwant field will track the rendered column, not the real one, which makes more sense in an editor.
///////////////////////////////////////////////////////////////////////////////
//
// View operations
//
///////////////////////////////////////////////////////////////////////////////
/// Handle wanted column. `want` can be:
/// .set: set e.view.cwant to a new value
/// .maxcol: set to maxUsize, which means 'always the last column'
/// .restore: set current column to cwant, or to the last column if too big
fn doCwant(e: *Editor, want: t.Cwant) void {
const V = &e.view;
const numrows = e.buffer.rows.items.len;
switch (want) {
// code to come...
}
}
So when we set cwant, we assign to it the current column of the rendered row.
If want is .maxcol, we set cwant to our special value.
.set => {
V.cwant = if (V.cy < numrows) e.currentRow().cxToRx(V.cx) else 0;
},
.maxcol => {
V.cwant = maxUsize;
},
When we restore it, since cwant is an index in the rendered row, we use rxToCx() to find out the real column, to which cx must be set.
When we restore cwant, we'll check if we can actually restore it. If the current row is shorter, the cursor will be moved to the last column.
If the value of cwant is our special value, the cursor will always be placed in the last column, even if the starting line was shorter than the following ones.
.restore => {
if (V.cy == numrows) { // past end of file
V.cx = 0;
}
else if (V.cwant == maxUsize) { // wants end of line
V.cx = e.currentRow().clen();
}
else {
const row = e.currentRow();
const rowlen = row.clen();
if (rowlen == 0) {
V.cx = 0;
}
else {
// cwant is an index of the rendered column, must convert
V.cx = row.rxToCx(V.cwant);
if (V.cx > rowlen) {
V.cx = rowlen;
}
}
}
},
Calls to doCwant()
Where should the wanted column be handled? Right in the processKeypress() function. You'll have to add calls to doCwant() as follows:
// after handling <ctrl-d>, <ctrl-u>, <page-up>, <page-down>
e.doCwant(.restore);
// after handling <up>, <down>
e.doCwant(.restore);
// after handling <left>, <right> and <home>
e.doCwant(.set);
// after handling <end>
e.doCwant(.maxcol);
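To see what these calls achieve, here is a sketch of the restore step alone. It's a simplification: restoreColumn is a hypothetical function that assumes rows without tabs (so that real and rendered columns coincide), and max_usize plays the role of our maxUsize constant.

```zig
const std = @import("std");

const max_usize = std.math.maxInt(usize);

/// Hypothetical, tab-free reduction of the `.restore` case: given the length
/// of the row we landed on and the wanted column, return the new cursor column.
fn restoreColumn(row_len: usize, cwant: usize) usize {
    if (cwant == max_usize) return row_len; // End was pressed: stick to the end
    return @min(cwant, row_len); // can't go past the end of a shorter row
}

test "moving down through rows of different length" {
    const row_lengths = [_]usize{ 20, 3, 15 };

    // cursor at column 10 of the first row, then <down> twice
    const want: usize = 10;
    try std.testing.expect(restoreColumn(row_lengths[1], want) == 3);
    try std.testing.expect(restoreColumn(row_lengths[2], want) == 10);

    // after <end>, the cursor follows the end of each row
    try std.testing.expect(restoreColumn(row_lengths[1], max_usize) == 3);
    try std.testing.expect(restoreColumn(row_lengths[2], max_usize) == 15);
}
```

The important detail is that cwant itself is never changed by a restore: the short row only limits the cursor column, and the wanted column comes back as soon as a row is long enough.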
Scroll the view
There's one more thing to write, before all this begins to actually work. Until now, movement keys would set the row (View.cy) and the real column (View.cx).
But in our refreshScreen() function, the escape sequence that actually moves the cursor to the new position will need View.rx, that is the column in the rendered row.
This value will be set in another function, scroll(), which will be invoked at the top of the refreshScreen() function. So place the call now:
/// Full refresh of the screen.
fn refreshScreen(e: *Editor) !void {
e.scroll();
We must define another option:
/// Minimal number of screen lines to keep above and below the cursor
pub var scroll_off: u8 = 2;
The actual scroll() function has 3 purposes:
- adapt the view to respect the scroll_off option
- set the visual column (column in the rendered row)
- set View.rowoff and View.coloff, which control the visible part of the buffer relative to the first row and the first column
/// Scroll the view, respecting scroll_off.
fn scroll(e: *Editor) void {
const V = &e.view;
const numrows = e.buffer.rows.items.len;
// handle scroll_off here...
// update rendered column here...
// update rowoff and coloff here...
}
The scroll_off option
This is how the Vim documentation describes it:
Minimal number of screen lines to keep above and below the cursor. This will make some context visible around where you are working.
//////////////////////////////////////////
// scrolloff option
//////////////////////////////////////////
if (opt.scroll_off > 0 and numrows > e.screen.rows) {
while (V.rowoff + e.screen.rows < numrows
and V.cy + opt.scroll_off >= e.screen.rows + V.rowoff)
{
V.rowoff += 1;
}
while (V.rowoff > 0 and V.rowoff + opt.scroll_off > V.cy) {
V.rowoff -= 1;
}
}
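The two loops are easier to follow with some numbers. In this sketch the same logic is moved into a hypothetical free function, applyScrollOff(), that takes all the values as parameters and returns the new row offset.

```zig
const std = @import("std");

/// Hypothetical free-function version of the scrolloff block above.
fn applyScrollOff(
    old_rowoff: usize,
    cy: usize,
    screen_rows: usize,
    numrows: usize,
    scroll_off: usize,
) usize {
    var rowoff = old_rowoff;
    if (scroll_off > 0 and numrows > screen_rows) {
        // cursor too close to the bottom: scroll down, unless at end of file
        while (rowoff + screen_rows < numrows and
            cy + scroll_off >= screen_rows + rowoff)
        {
            rowoff += 1;
        }
        // cursor too close to the top: scroll up, unless at start of file
        while (rowoff > 0 and rowoff + scroll_off > cy) {
            rowoff -= 1;
        }
    }
    return rowoff;
}

test "scroll_off keeps context around the cursor" {
    // 10 screen rows, 100 rows in the buffer, scroll_off = 2
    try std.testing.expect(applyScrollOff(0, 7, 10, 100, 2) == 0); // still fine
    try std.testing.expect(applyScrollOff(0, 8, 10, 100, 2) == 1); // scroll down
    try std.testing.expect(applyScrollOff(20, 21, 10, 100, 2) == 19); // scroll up
    try std.testing.expect(applyScrollOff(90, 99, 10, 100, 2) == 90); // end of file
}
```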
The rendered column
//////////////////////////////////////////
// update rendered column
//////////////////////////////////////////
V.rx = 0;
if (V.cy < numrows) {
V.rx = e.currentRow().cxToRx(V.cx);
}
We just use the cxToRx() function, for all lines except the last one, which is completely empty, not even a \n character, so we can't index it in any way (the program would panic).
rowoff, coloff
rowoff is the topmost visible row, coloff is the leftmost visible column. While the latter is rarely positive, the former will be positive whenever we can't see the first line of the file.
When the function is called, cy (the cursor row) can have a new value, but rowoff still has the old value, so it must be updated. The same goes for rx and coloff.
//////////////////////////////////////////
// update rowoff and coloff
//////////////////////////////////////////
// cursor has moved above the visible window
if (V.cy < V.rowoff) {
V.rowoff = V.cy;
}
// cursor has moved below the visible window
if (V.cy >= V.rowoff + e.screen.rows) {
V.rowoff = V.cy - e.screen.rows + 1;
}
// cursor has moved beyond the left edge of the window
if (V.rx < V.coloff) {
V.coloff = V.rx;
}
// cursor has moved beyond the right edge of the window
if (V.rx >= V.coloff + e.screen.cols) {
V.coloff = V.rx - e.screen.cols + 1;
}
Casting numbers
When calculating a value with unsigned integer types (like in this case), we should avoid subtractions, unless we are absolutely sure that the left operand is not smaller than the right operand.
Casts in Zig tend to be quite verbose, since the Zig philosophy is to make everything as explicit as possible. The verbosity is also part of the concept of friction that Zig has adopted: make safe things easy, and unsafe things uncomfortable, even if not impossible, so that one becomes inclined to take the safer route to the solution of a problem.
In this program we don't do any casting, but we don't have to deal with floating point numbers either.
To avoid casting unsigned integers, sometimes it's enough to move the subtracted operand to the other side of the comparison, making it a positive operand. It's what we're doing here, even though it can make the operation less intuitive.
This thread on Ziggit forum is an interesting read about casts.
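As a small illustration of moving the operand to the other side, here is a sketch with a hypothetical belowWindow() function. It also shows the saturating subtraction operator, which we don't use in the editor, but which is another way to keep unsigned arithmetic from overflowing.

```zig
const std = @import("std");

/// Hypothetical helper: true when the cursor row is below the visible window.
fn belowWindow(cy: usize, rowoff: usize, screen_rows: usize) bool {
    // The intuitive form would be `cy - rowoff >= screen_rows`, but that
    // subtraction overflows (and panics in safe builds) when cy < rowoff.
    // Moving rowoff to the right side turns it into an addition.
    return cy >= rowoff + screen_rows;
}

test "comparisons without unsigned subtraction" {
    // cy < rowoff: the intuitive form would have overflowed here
    try std.testing.expect(!belowWindow(3, 10, 24));
    try std.testing.expect(belowWindow(40, 10, 24));

    // `-|` is the saturating subtraction: it stops at 0 instead of overflowing
    const a: usize = 3;
    const b: usize = 10;
    try std.testing.expect((a -| b) == 0);
}
```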
Compile and run!
Our text viewer is complete. You should be able to open any file and navigate it with ease.
A text editor
Now we want to turn our text viewer into a proper editor. I guess it's the natural progression for this kind of thing. Not to mention that our guide is called "Build a text editor", not "Build a text viewer". Let's not forget that.
Let's start by handling more keypresses in the processKeypress() function.
We add new switch prongs for Backspace, Del and Enter:
.backspace, .ctrl_h, .del => {
if (k == .del) {
e.moveCursorWithKey(.right);
}
try e.deleteChar();
e.doCwant(.set);
},
.enter => try e.insertNewLine(),
We also change our else branch to handle characters to be inserted. We only handle Tab and printable characters, for now.
else => {
const c = @intFromEnum(k);
if (k == .tab or asc.isPrint(c)) {
try e.insertChar(c);
e.doCwant(.set);
}
},
There is a new constant to set:
const asc = std.ascii;
And new functions to implement:
- insertChar will insert a character at cursor position
- deleteChar will delete the character on the left of the cursor
- insertNewLine will start editing a new line after the current one
Insert characters
Before inserting a character, we check if we are in a new row, and if so, we insert the row in the buffer. After that, we can just insert the character and move forward. We already wrote our insertRow() function, so there's nothing to add (for now).
///////////////////////////////////////////////////////////////////////////////
//
// In-row operations
//
///////////////////////////////////////////////////////////////////////////////
/// Insert a character at current cursor position. Handle textwidth.
fn insertChar(e: *Editor, c: u8) !void {
const V = &e.view;
// last row, insert a new row before inserting the character
if (V.cy == e.buffer.rows.items.len) {
try e.insertRow(e.buffer.rows.items.len, "");
}
// insert the character and move the cursor forward
try e.rowInsertChar(V.cy, V.cx, c);
V.cx += 1;
}
rowInsertChar()
This will perform the actual character insertion in the row.chars ArrayList, update the rendered row, and set the modified flag.
/// Insert character `c` in the row with index `ix`, at column `at`.
fn rowInsertChar(e: *Editor, ix: usize, at: usize, c: u8) !void {
try e.rowAt(ix).chars.insert(e.buffer.alc, at, c);
try e.updateRow(ix);
e.buffer.dirty = true;
}
Deleting a character
By deleting a character, we mean deleting the character to the left of our cursor, what the Backspace key normally does.
/// Delete a character before cursor position (backspace).
fn deleteChar(e: *Editor) !void {
const V = &e.view;
const B = &e.buffer;
// code to come...
}
We'll want to handle different cases:
Cursor is past the end of file: move to the end of the previous line, don't return, we will possibly delete a character.
// past the end of the file
if (V.cy == B.rows.items.len) {
e.moveCursorWithKey(.left);
}
Cursor at the start of the file: nothing to do.
// start of file
if (V.cx == 0 and V.cy == 0) {
return;
}
Cursor after the first column: delete the character at the column before the current one.
// delete character in current line
if (V.cx > 0) {
try e.rowDelChar(V.cy, V.cx - 1);
V.cx -= 1;
}
Cursor is at the start of a line which isn't the first one: we'll append the current line to the previous one, then delete the current row. The cursor will then be moved to the row above, at a column that is the length of the previous row before the lines were joined.
// join with previous line
else {
V.cx = B.rows.items[V.cy - 1].clen();
try e.rowInsertString(V.cy - 1, V.cx, e.currentRow().chars.items);
e.deleteRow(V.cy);
V.cy -= 1;
}
rowDelChar()
For the actual character deletion we write rowDelChar(), which closely resembles rowInsertChar():
/// Delete a character in the row with index `ix`, at column `at`.
fn rowDelChar(e: *Editor, ix: usize, at: usize) !void {
_ = e.rowAt(ix).chars.orderedRemove(at);
try e.updateRow(ix);
e.buffer.dirty = true;
}
rowInsertString()
In case we want to join lines, we'll need two new functions.
/// Insert a string at position `at`, in the row at index `ix`.
fn rowInsertString(e: *Editor, ix: usize, at: usize, chars: []const u8) !void {
try e.rowAt(ix).chars.insertSlice(e.buffer.alc, at, chars);
try e.updateRow(ix);
e.buffer.dirty = true;
}
This is very similar to rowInsertChar(), but inserts a slice instead of inserting a character. Here we're just appending at the end of the row, since we're passing an at argument that is equal to the length of the row.
deleteRow()
The last function we need for now is the one that deletes a row from the Buffer. I put this function below insertRow().
As mentioned when we talked about the Buffer type, we're sometimes deinitializing individual rows in the Editor methods, which isn't ideal, but I don't think that creating a method in Buffer just for this is that much better. We can access the Buffer allocator just fine, but we must remember that a Row uses the Buffer allocator, not the Editor one. It's only happening right now that both Editor and Buffer use the same allocator, but things might change in the future.
/// Delete a row and deinitialize it.
fn deleteRow(e: *Editor, ix: usize) void {
var row = e.buffer.rows.orderedRemove(ix);
row.deinit(e.buffer.alc);
e.buffer.dirty = true;
}
The string module
Before we proceed, let's add a new module called string.zig. It will be quite simple, just a few helpers for string operations.
It will contain a single function for now. Don't forget to import it in Editor (the code that follows refers to it as str).
//! Module with functions handling strings.
///////////////////////////////////////////////////////////////////////////////
//
// Functions
//
///////////////////////////////////////////////////////////////////////////////
/// Return the number of leading whitespace characters
pub fn leadingWhitespaces(src: []const u8) usize {
var i: usize = 0;
while (i < src.len and asc.isWhitespace(src[i])) : (i += 1) {}
return i;
}
///////////////////////////////////////////////////////////////////////////////
//
// Constants, variables
//
///////////////////////////////////////////////////////////////////////////////
const std = @import("std");
const asc = std.ascii;
const mem = std.mem;
Insert a new line
We insert a new line when we press Enter. Nothing simpler right? This operation is a bit more complex than it seems, especially if we want to copy indentation, which is optional, but it's so useful that we don't want to miss it.
Let's ignore indentation for now, and write the basic function.
///////////////////////////////////////////////////////////////////////////////
//
// Insert lines
//
///////////////////////////////////////////////////////////////////////////////
/// Insert a new line at cursor position. Will carry to the next line
/// everything that is after the cursor.
fn insertNewLine(e: *Editor) !void {
const V = &e.view;
// make sure the beginning of the line is visible
V.coloff = 0;
// code to come...
// row operations have been concluded, update rows
try e.updateRow(V.cy - 1);
try e.updateRow(V.cy);
// set cursor position at the start of the new line
V.cx = 0;
V.cwant = 0;
}
We want to handle at least these cases:
- are we at the beginning of the line (cx = 0)? We insert an empty line above the current line, then increase the row number
// at first column, just insert an empty line above the cursor
if (V.cx == 0) {
try e.insertRow(V.cy, "");
V.cy += 1;
return;
}
- is there any whitespace that follows the cursor? Then we want to remove it when carrying over the text that follows
// leading whitespace removed from characters after cursor
var skipw: usize = 0;
var oldrow = e.currentRow().chars.items;
// any whitespace before the text that is going into the new row
if (V.cx < oldrow.len) {
skipw = str.leadingWhitespaces(oldrow[V.cx..]);
}
We already know that we are in the middle of a line, so we must carry everything that comes after the cursor to the new line.
After the row has been inserted, we proceed to the new row and shrink the row above. We perform this operation last, because we needed those characters to be able to append them. Cut and paste is actually a copy then delete operation in our case.
// will insert a row with the characters to the right of the cursor
// skipping whitespace after the cursor
try e.insertRow(V.cy + 1, oldrow[V.cx + skipw ..]);
// proceed to the new row
V.cy += 1;
// delete from the row above the content that we moved to the next row
e.rowAt(V.cy - 1).chars.shrinkAndFree(e.buffer.alc, V.cx);
We are using the shrinkAndFree method, which is not optimal, because in many cases we would like to retain the ArrayList capacity, at least partially.
We could use the shrinkRetainingCapacity method instead, which does what it says. But this could lead to excessive memory usage, because rows would always keep the biggest capacity they had at any time, always growing, never shrinking.
A better option might be to do a shrinkAndFree while keeping some extra room, followed by a resize to set the correct length.
The same concepts would apply to row.render, if it was made an ArrayList.
These are all optimizations that can wait, anyway. For now, we keep it simple.
You might want to compile and run at this point, to check that everything is working. You should be able to insert characters, delete them, and insert new lines.
Autoindent
We also want an option for autoindent.
Let's add the option:
/// Copy indent from current line when starting a new line
pub var autoindent = true;
Autoindent brings additional concerns:
- we should copy the indent from the line above
- are we inserting the line while in the middle of the indent? Then we want to shorten the indent and remove the part of it that lies after the cursor
Add the ind variable: it is the number of whitespace characters that we must copy from the line above.
// leading whitespace removed from characters after cursor
var skipw: usize = 0;
// extra characters for indent
var ind: usize = 0;
What if we hit Enter in the middle of the indentation? We want to reduce it to the current column.
// any whitespace before the text that is going into the new row
if (V.cx < oldrow.len) {
skipw = str.leadingWhitespaces(oldrow[V.cx..]);
}
if (opt.autoindent) {
ind = str.leadingWhitespaces(oldrow);
// reduce indent if current column is within it
if (V.cx < ind) {
ind = V.cx;
}
}
After we proceed to the new row, we must copy over the indent from the line above. Before copying, we fetch the characters of the row above again: a row insertion in Buffer.rows has happened, which could have invalidated all row pointers, and shrinking that row may have moved its characters too.
// proceed to the new row
V.cy += 1;
if (ind > 0) {
// reassign pointer, invalidated by row insertion
oldrow = e.rowAt(V.cy - 1).chars.items;
// in new row, shift the old content forward, to make room for indent
const newrow = try e.currentRow().chars.addManyAt(e.buffer.alc, 0, ind);
// Copy the indent from the previous row.
for (0..ind) |i| {
newrow[i] = oldrow[i];
}
}
Finally, we must update the last two lines to set the cursor column after the indent:
// set cursor position at the start of the new line
V.cx = 0;
V.cwant = 0;
// set cursor position right after the indent in the new line
V.cx = ind;
V.cwant = ind;
Compile and try it!
Handling text wrapping
There's one last thing that we should handle, one little thing that will make our editor much more usable.
After we type a certain number of characters in the line, we want our text to be automatically wrapped into a new line, so that the line doesn't become too long.
We call this option textwidth and we add it to our option module.
/// Wrap text over a new line, when current line becomes longer than this value
pub var textwidth = struct {
enabled: bool = true,
len: u8 = 79,
} {};
Thinking more about it, it's not always desirable, especially when writing code. Our implementation will be particularly stubborn and absolutely refuse to let us write differently. In the future we might introduce ways to change option values with key combinations, and allow different options for different filetypes. For now, this is it, and we must accept it.
We need a new string
module function:
/// Return true if `c` is a word character.
pub fn isWord(c: u8) bool {
return switch (c) {
'0'...'9', 'a'...'z', 'A'...'Z', '_' => true,
else => false,
};
}
Handling of text wrapping happens in insertChar(), right after inserting the character.
// insert the character and move the cursor forward
try e.rowInsertChar(V.cy, V.cx, c);
V.cx += 1;
//////////////////////////////////////////
// textwidth
//////////////////////////////////////////
const row = e.currentRow();
const rx = row.cxToRx(V.cx);
if (opt.textwidth.enabled and rx > opt.textwidth.len and str.isWord(c)) {
The logic can be split in two phases.
Phase 1
- we must find the start of the current word, crawling back along the current row
- if this word is preceded by a space character, we push back the cursor again, because we want to remove a single space while wrapping text, but not more than one
- if this word is preceded by another kind of separator, we don't remove it, we just wrap the word
// will be 1 if a space before the wrapped word must be removed
var skipw: usize = 0;
// find the start of the current word
var start: usize = rx - 1;
while (start > 0) {
if (!str.isWord(row.render[start - 1])) {
// we want to remove a space before the wrapped word, but not
// other kinds of separators (not even a tab, just in case)
if (row.render[start - 1] == ' ') {
skipw = 1;
}
break;
}
start -= 1;
}
Phase 2
We crawled back in the row, and we found where this word began. If the column is 0, it means it's a single very long sequence of word characters, we can't wrap anything.
If instead we can wrap it, we proceed as follows:
- we set the cursor before the word, and also before the space character that precedes it (if there is one)
- we insert a new line: the same things that would happen when pressing Enter would happen now, the extra space would be deleted and the word would be carried to the new line
- we move forward the cursor to the end of the word we wrapped
// only wrap if the word doesn't start at the beginning
if (start > 0) {
const wlen = rx - start;
// move the cursor to the start of the word, also skipping a space
V.cx = row.rxToCx(start - skipw);
// new line insertion will carry over the word and delete the space
try e.insertNewLine();
// move forward the cursor to the end of the word
V.cx += wlen;
}
}
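Phase 1 can be tried out on its own. In the following sketch the search for the start of the word is moved into a hypothetical function, findWordStart(), that works on a plain slice and returns both values in a small struct. It assumes that rx is at least 1, which is always true right after inserting a character.

```zig
const std = @import("std");

fn isWord(c: u8) bool {
    return switch (c) {
        '0'...'9', 'a'...'z', 'A'...'Z', '_' => true,
        else => false,
    };
}

/// Hypothetical result type for this sketch.
const WordStart = struct { start: usize, skipw: usize };

/// Find where the word that ends at `rx` begins. Assumes rx >= 1.
fn findWordStart(render: []const u8, rx: usize) WordStart {
    var skipw: usize = 0;
    var start: usize = rx - 1;
    while (start > 0) {
        if (!isWord(render[start - 1])) {
            // a single space before the word will be removed
            if (render[start - 1] == ' ') skipw = 1;
            break;
        }
        start -= 1;
    }
    return .{ .start = start, .skipw = skipw };
}

test "find the word to wrap" {
    // cursor right after "world": the word starts at 6, the space is dropped
    const a = findWordStart("hello world", 11);
    try std.testing.expect(a.start == 6 and a.skipw == 1);

    // other separators are kept
    const b = findWordStart("foo-bar", 7);
    try std.testing.expect(b.start == 4 and b.skipw == 0);

    // a single long word: start is 0, nothing can be wrapped
    const c = findWordStart("abcdef", 6);
    try std.testing.expect(c.start == 0 and c.skipw == 0);
}
```

In the first case the word length would be 11 - 6 = 5, which is how far the cursor is moved forward after the new line has been inserted.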
This completes the editor chapter. We still can't save our edits, but before getting there we need to expand the capabilities of our message area, so that it can actually print something.
Interacting with the user
At various points of our program, we'll want to interact with the users, either by notifying them of something, or by requesting something.
For example, we want to print a "help" sort of message when the editor starts, we must prompt for a filename when trying to save an unnamed buffer, or for a word when using the searching functionality.
We have already added the status_msg field in Editor, so we must add a function that prints it.
We'll have two ways to print, either normal messages (or prompts) using regular highlight, or error messages, which we'll print in a bright red color.
statusMessage()
This function clears the previous content and replaces it with a new one, which we'll format on the fly by using the ArrayList(u8) method print(). Note that this method only works if the base type of the array is u8.
///////////////////////////////////////////////////////////////////////////////
//
// Message area
//
///////////////////////////////////////////////////////////////////////////////
/// Set a status message, using regular highlight.
pub fn statusMessage(e: *Editor, comptime format: []const u8, args: anytype) !void {
assert(format.len > 0);
e.status_msg.clearRetainingCapacity();
try e.status_msg.print(e.alc, format, args);
e.status_msg_time = time();
}
print() uses std.Io.Writer; we'll see this interface again when we save a file.
We never pass an empty format, so we assert() that the format is not empty. You have to define an assert constant (do it yourself).
Finally we update status_msg_time, so that the message will actually be printed, then cleared after a while.
This function doesn't really print anything on screen: the actual printing will be done in drawMessageBar(), which we already wrote.
Compile and run to see your "help" message printed in the message area when you start up the editor.
errorMessage()
This function is similar, but it will color the message in bright red, since it's supposed to be an error. Note that we can use the ++ string concatenation operator, since all values are comptime-known.
/// Print an error message, using error highlight.
pub fn errorMessage(e: *Editor, comptime format: []const u8, args: anytype) !void {
assert(format.len > 0);
e.status_msg.clearRetainingCapacity();
const fmt = ansi.ErrorColor ++ format ++ ansi.ResetColors;
try e.status_msg.print(e.alc, fmt, args);
e.status_msg_time = time();
}
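The comptime concatenation can be tried without the rest of the editor. This is a sketch with assumptions: formatError is a hypothetical function that formats into a fixed buffer with std.fmt.bufPrint, and the two escape sequences are my guess at what the ansi module might contain, not necessarily its real values.

```zig
const std = @import("std");

// Assumed escape sequences; the real ones live in the ansi module.
const error_color = "\x1b[91m";
const reset_colors = "\x1b[m";

/// Hypothetical stand-in for errorMessage(), writing into a fixed buffer.
fn formatError(buf: []u8, comptime format: []const u8, args: anytype) ![]u8 {
    // all three operands are comptime-known, so `++` is allowed
    const fmt = error_color ++ format ++ reset_colors;
    return std.fmt.bufPrint(buf, fmt, args);
}

test "wrap a format string in color codes" {
    var buf: [64]u8 = undefined;
    const out = try formatError(&buf, "I/O error: {s}", .{"AccessDenied"});
    try std.testing.expectEqualStrings("\x1b[91mI/O error: AccessDenied\x1b[m", out);
}
```

If format were a runtime value, the ++ operator couldn't be used, and we would have to write the three parts separately.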
The 'help' message
Let's take care of the "help" message.
pub fn startUp(e: *Editor, path: ?[]const u8) !void {
try e.statusMessage(message.status.get("help").?, .{});
help should be a key in our message string map, but we don't have it yet, so add it to status_messages:
.{ "help", "HELP: Ctrl-S = save | Ctrl-Q = quit | Ctrl-F = find" },
The 'unsaved' message
Let's add a message that warns us when we press Ctrl-Q and there are unsaved changes:
.{ "unsaved", "WARNING!!! File has unsaved changes. Press Ctrl-Q {d} more times to quit." },
We print this message in processKeypress:
.ctrl_q => {
if (B.dirty and static.q > 0) {
try e.statusMessage(message.status.get("unsaved").?, .{static.q});
Now, if we have unsaved changes, we'll get this warning, telling us how many times we must press Ctrl-Q to quit.
Needed constants:
const assert = std.debug.assert;
I/O: writing
To save files we'll use the Io.Writer interface. I'm not going to explain in detail what is possible to do with it, because it has been recently introduced into the Zig standard library, it's a vast subject and I'm not familiar with it. So I'll stick to the minimum of information needed to make our use case work.
Let's handle first the case where the filename is known, and we just want to save the current file.
We add another key-value pair to our status_messages
string map:
.{ "bufwrite", "\"{s}\" {d} lines, {d} bytes written" },
So that we'll print a message if the save is successful.
ioerr() and the error messages StringMap
Whenever a write operation fails, we'll handle the error in a helper function, ioerr():
/// Handle an error of type IoError by printing an error message, without
/// quitting the editor.
fn ioerr(e: *Editor, err: t.IoError) !void {
try e.errorMessage(message.errors.get("ioerr").?, .{@errorName(err)});
return;
}
As you can see, this function doesn't make the process terminate only because we couldn't save the file for some reason. Instead, it will print an error in the message area, with the name of the error.
IoError
The ioerr function accepts an argument of type IoError. This is an error set, obtained by merging other error sets, that we'll define in types:
///////////////////////////////////////////////////////////////////////////////
//
// Error sets
//
///////////////////////////////////////////////////////////////////////////////
/// Error set for both read and write operations.
pub const IoError = std.fs.File.OpenError
|| std.fs.File.WriteError
|| std.Io.Reader.Error
|| std.Io.Writer.Error;
It includes errors for both reading and writing, because to write a file we must first be able to open it, and that can fail too.
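To see the merge operator in isolation, here is a minimal sketch. The two small error sets are made up for the example (they only stand in for the std.fs and std.Io ones), and describe() is a hypothetical helper:

```zig
const std = @import("std");

// Hypothetical error sets, standing in for the standard library ones.
const OpenError = error{ FileNotFound, AccessDenied };
const WriteError = error{ NoSpaceLeft, AccessDenied };

// `||` merges error sets; the duplicate AccessDenied appears only once.
const IoError = OpenError || WriteError;

/// Accepts any error from either set and returns its name.
fn describe(err: IoError) []const u8 {
    return @errorName(err);
}

fn open(fail: bool) OpenError!void {
    if (fail) return error.FileNotFound;
}

pub fn main() void {
    // An OpenError coerces to the wider IoError without a cast.
    open(true) catch |err| std.debug.print("I/O error: {s}\n", .{describe(err)});
}
```

A function that takes the merged set can be handed errors from any of its parts, which is why ioerr() can be called from both the open and the write paths.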
Error messages
We keep all the error messages we'll be using in message.zig, in another StringMap that we'll call errors:
const error_messages = .{
.{ "ioerr", "Can't save! I/O error: {s}" },
};
pub const errors = std.StaticStringMap([]const u8).initComptime(error_messages);
Saving a file
/// Try to save the current file, prompt for a file name if currently not set.
/// Currently saving the file fails if directory doesn't exist, and there is no
/// tilde expansion.
fn saveFile(e: *Editor) !void {
var B = &e.buffer;
if (B.filename == null) {
// will prompt for a filename
return;
}
// code to come...
}
Before saving, we want to determine in advance how many bytes we'll write to disk, so that we can print it in a message.
Since e.buffer.filename is optional, once we are certain that it can't be null, we can safely access its non-null value with the .? notation.
    // determine number of bytes to write, make room for \n characters
    var fsize: usize = B.rows.items.len;
    for (B.rows.items) |row| {
        fsize += row.chars.items.len;
    }
    const file = std.fs.cwd().createFile(B.filename.?, .{ .truncate = true });
    if (file) |f| {
        // write lines to file
    } else |err| {
        e.alc.free(B.filename.?);
        B.filename = null;
        return e.ioerr(err);
    }
We will try to open the file for writing, truncating it and replacing all bytes. Here the key std function is std.fs.cwd().createFile().
In this block we write the lines:
    if (file) |f| {
        // write lines to file
        var buf: [1024]u8 = undefined;
        var writer = f.writer(&buf);
        defer f.close();
        // for each line, write the bytes, then the \n character
        for (B.rows.items) |row| {
            writer.interface.writeAll(row.chars.items) catch |err| return e.ioerr(err);
            writer.interface.writeByte('\n') catch |err| return e.ioerr(err);
        }
        // write what's left in the buffer; a failed flush is reported like
        // any other write error instead of being propagated with `try`
        writer.interface.flush() catch |err| return e.ioerr(err);
        try e.statusMessage(message.status.get("bufwrite").?, .{
            B.filename.?, B.rows.items.len, fsize
        });
        B.dirty = false;
        return;
Before writing, we need a buffered writer. I don't think the buffer size matters much, but a buffer that is too small would behave almost as if it were unbuffered.
To actually write the file, we use the Io.Writer interface, which is accessed at writer.interface.
After we wrote all bytes, we have to flush the writer. This is what happens:
1. we provide a small buffer, that lives on the stack
2. this buffer is filled by the writer with characters that have to be written
3. when the buffer is full, the writer actually writes the data, then empties the buffer and repeats
4. when there's nothing more to write, there can be something left in the buffer, because the writer only writes the buffer when it's full
5. so we flush the buffer: the writer empties it and writes what's left
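The steps above can be sketched with a toy writer. This is not how Io.Writer is implemented: TinyWriter is a hypothetical type written only for this illustration, and its sink array stands in for the file on disk.

```zig
const std = @import("std");

/// Hypothetical toy writer: bytes are collected in a small buffer and copied
/// to `sink` (standing in for the file) only when the buffer is full or when
/// flush() is called. No bounds check on `sink`, to keep the sketch short.
const TinyWriter = struct {
    buf: [4]u8 = undefined,
    len: usize = 0,
    sink: [64]u8 = undefined,
    written: usize = 0,

    fn writeByte(w: *TinyWriter, c: u8) void {
        if (w.len == w.buf.len) w.flush(); // buffer full: drain it first
        w.buf[w.len] = c;
        w.len += 1;
    }

    fn writeAll(w: *TinyWriter, bytes: []const u8) void {
        for (bytes) |c| w.writeByte(c);
    }

    /// Write out whatever is still sitting in the buffer.
    fn flush(w: *TinyWriter) void {
        @memcpy(w.sink[w.written..][0..w.len], w.buf[0..w.len]);
        w.written += w.len;
        w.len = 0;
    }
};

pub fn main() void {
    var w: TinyWriter = .{};
    w.writeAll("hello\n");
    std.debug.print("before flush: {d} bytes reached the sink\n", .{w.written});
    w.flush();
    std.debug.print("after flush: {d} bytes reached the sink\n", .{w.written});
}
```

With a 4-byte buffer and 6 bytes of input, the last 2 bytes only reach the sink after the flush, which is exactly why the flush can't be skipped.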
When we're done we print a message that says the name of the written file, how many lines and bytes have been written to disk.
If for some reason the file can't be opened for writing, the buffer filename is freed and set to null.
The same remarks that have been made for the Io.Reader interface are valid here: you can't make a copy of the interface by assigning it directly:
const interface = f.writer(&buf).interface; // WRONG
Prompts
We're back to the point where we need to interact with the user, in this case to obtain a filename for a buffer that doesn't have one, so that we can save it.
The prompt function
We'll put this function in the Message Area section, right above the statusMessage and errorMessage functions.
For now this is a simplified version. We'll expand it later, when we want the prompt to accept a callback as argument, so that the callback can be invoked at each user input. Right now we don't need that, so we keep it as simple as possible.
/// Start a prompt in the message area, return the user input.
/// Prompt is terminated with either .esc or .enter keys.
/// Prompt is also terminated by .backspace if there is no character left in
/// the input.
fn promptForInput(e: *Editor, prompt: []const u8) !t.Chars {
var al = try t.Chars.initCapacity(e.alc, 80);
while (true) {
// read keys
}
e.clearStatusMessage();
return al;
}
This function returns an ArrayList, which is allocated inside the function itself. It's not a pointer to an existing ArrayList, it's a new one. The caller must remember to deinitialize this ArrayList with a defer statement.
Note that in this case, returning a pointer to the ArrayList created in promptForInput() would mean returning a dangling pointer, so we should either:
- return a copy (what we're doing)
- pass a pointer to an existing ArrayList as argument
To be more explicit, we could pass the allocator to promptForInput(), but I'm not doing it here.
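The ownership rule can be shown with a smaller case. This sketch uses a plain allocated slice instead of an ArrayList, and readInput() is a hypothetical function, not part of the editor:

```zig
const std = @import("std");

/// Returns a newly allocated copy of `input`. The slice itself is returned
/// by value and points to heap memory, so nothing dangles; the caller owns
/// that memory from now on.
/// Returning the address of a local variable instead would be the dangling
/// pointer case described above.
fn readInput(alc: std.mem.Allocator, input: []const u8) ![]u8 {
    return alc.dupe(u8, input);
}

pub fn main() !void {
    const alc = std.heap.page_allocator;
    const name = try readInput(alc, "notes.txt");
    defer alc.free(name); // the caller releases what the callee allocated
    std.debug.print("got: {s}\n", .{name});
}
```

The defer right after the call plays the same role as the defer al.deinit(e.alc) that callers of promptForInput() need.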
The loop
The loop reads typed characters into the ArrayList. Input is terminated with Esc or Enter, and also with Backspace if the prompt is empty. If you wondered whether we can move the cursor inside the prompt, the answer is no. But we can press Backspace to delete characters.
try e.statusMessage("{s}{s}", .{ prompt, al.items });
try e.refreshScreen();
const k = try ansi.readKey();
const c = @intFromEnum(k);
switch (k) {
.ctrl_h, .backspace => {
if (al.items.len == 0) {
break;
}
_ = al.pop();
},
.esc, .enter => break,
else => if (k == .tab or asc.isPrint(c)) {
try al.append(e.alc, c);
},
}
When all is done, we clear the message area with this function, which we'll put in the Helpers section:
/// Clear the message area. Can't fail because it won't reallocate.
fn clearStatusMessage(e: *Editor) void {
e.status_msg.clearRetainingCapacity();
}
Prompting for a filename
In saveFile() we had a placeholder for this case, and we'll replace it with:
     // will prompt for a filename
-    return;
+    var al = try e.promptForInput(message.prompt.get("fname").?);
+    defer al.deinit(e.alc);
+    if (al.items.len > 0) {
+        B.filename = try e.updateString(B.filename, al.items);
+    } else {
+        try e.statusMessage("Save aborted", .{});
+        return;
+    }
We need a new StringMap in our message module:
const prompt_messages = .{
.{ "fname", "Enter filename, or ESC to cancel: " },
};
pub const prompt = std.StaticStringMap([]const u8).initComptime(prompt_messages);
Binding Ctrl-S to save the file
We don't yet have a way to save, because we haven't bound a key. We add a new branch to the processKeypress function:
.ctrl_s => try e.saveFile(),
And that's about it. If you compile and run with:
./kilo some_new_file
you should be able to edit the file, give it a name and save it.
Highlight
We have two features left to implement: searching and syntax highlighting. Both of them require the ability to apply a different highlight to our text, so we'll do that.
We'll do everything in the types module, but first we must define the color codes that we'll be using. In ansi define these namespaced constants:
/// Codes for 16-colors terminal escape sequences (foreground)
pub const FgColor = struct {
pub const default: u8 = 39;
pub const black: u8 = 30;
pub const red: u8 = 31;
pub const green: u8 = 32;
pub const yellow: u8 = 33;
pub const blue: u8 = 34;
pub const magenta: u8 = 35;
pub const cyan: u8 = 36;
pub const white: u8 = 37;
pub const black_bright: u8 = 90;
pub const red_bright: u8 = 91;
pub const green_bright: u8 = 92;
pub const yellow_bright: u8 = 93;
pub const blue_bright: u8 = 94;
pub const magenta_bright: u8 = 95;
pub const cyan_bright: u8 = 96;
pub const white_bright: u8 = 97;
};
/// Codes for 16-colors terminal escape sequences (background)
pub const BgColor = struct {
pub const default: u8 = 49;
pub const black: u8 = 40;
pub const red: u8 = 41;
pub const green: u8 = 42;
pub const yellow: u8 = 43;
pub const blue: u8 = 44;
pub const magenta: u8 = 45;
pub const cyan: u8 = 46;
pub const white: u8 = 47;
pub const black_bright: u8 = 100;
pub const red_bright: u8 = 101;
pub const green_bright: u8 = 102;
pub const yellow_bright: u8 = 103;
pub const blue_bright: u8 = 104;
pub const magenta_bright: u8 = 105;
pub const cyan_bright: u8 = 106;
pub const white_bright: u8 = 107;
};
Highlight enum
We need to define the Highlight enum, which goes in types. We start with a few values and will expand it later:
///////////////////////////////////////////////////////////////////////////////
//
// Highlight
//
///////////////////////////////////////////////////////////////////////////////
/// All available highlight types.
pub const Highlight = enum(u8) {
/// The normal highlight
normal = 0,
/// Incremental search highlight
incsearch,
/// Highlight for error messages
err,
};
An array for highlight
Our Row type must have an additional array, which will have the same length as the render array, and which will contain the Highlight for each element of the render array:
/// Array with the highlight of the rendered row
hl: []t.Highlight,
We'll initialize this array in Row.init():
.hl = &.{},
deinitialize it in Row.deinit():
allocator.free(row.hl);
and will fill it in a new function:
///////////////////////////////////////////////////////////////////////////////
//
// Syntax highlighting
//
///////////////////////////////////////////////////////////////////////////////
/// Update highlight for a row.
fn updateHighlight(e: *Editor, ix: usize) !void {
const row = e.rowAt(ix);
// reset the row highlight to normal
row.hl = try e.alc.realloc(row.hl, row.render.len);
@memset(row.hl, .normal);
}
Later we'll do syntax highlighting here. This function is called at the end of updateRow(), because every time the rendered row is updated, its highlight must be too.
try e.updateHighlight(ix);
}
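Here is the resize-and-reset step on its own, as a sketch. The two-member Highlight enum and the resetHighlight() helper are assumptions made for the example; the editor does the same thing directly on row.hl:

```zig
const std = @import("std");

const Highlight = enum(u8) { normal = 0, incsearch };

/// Resize `hl` to `len` elements and reset them all to .normal.
/// realloc() on an empty slice simply allocates, so starting from `&.{}`
/// is fine.
fn resetHighlight(alc: std.mem.Allocator, hl: []Highlight, len: usize) ![]Highlight {
    const new = try alc.realloc(hl, len);
    @memset(new, Highlight.normal);
    return new;
}

pub fn main() !void {
    const alc = std.heap.page_allocator;
    var hl: []Highlight = &.{}; // empty, as in Row.init()
    defer alc.free(hl);
    hl = try resetHighlight(alc, hl, 5);
    hl[2] = .incsearch;
    // the row was re-rendered with a new length: highlight starts over
    hl = try resetHighlight(alc, hl, 8);
    std.debug.print("len = {d}, hl[2] = {s}\n", .{ hl.len, @tagName(hl[2]) });
}
```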
Highlight groups
Highlight groups have properties, which we define in a new type.
/// Attributes of a highlight group.
pub const HlGroup = struct {
/// Foreground CSI color code
fg: u8,
/// Background CSI color code
bg: u8,
reverse: bool,
bold: bool,
italic: bool,
underline: bool,
};
An array of highlight groups
We create the array with the highlight groups in a new module hlgroups.zig, since an array is a value, not a type, and so it doesn't belong in types.
We also add a helper right away, to get the array index when initializing it.
///////////////////////////////////////////////////////////////////////////////
//
// Highlight groups
//
///////////////////////////////////////////////////////////////////////////////
// here goes the hlGroups array
// Get the enum value as integer, so that it can be used as array index.
fn int(ef: t.Highlight) usize {
return @intFromEnum(ef);
}
///////////////////////////////////////////////////////////////////////////////
//
// Constants, variables
//
///////////////////////////////////////////////////////////////////////////////
const std = @import("std");
const t = @import("types.zig");
const ansi = @import("ansi.zig");
const CSI = ansi.CSI;
const FgColor = ansi.FgColor;
const BgColor = ansi.BgColor;
Here things become really interesting, so pay attention.
We must define an array of highlight groups. Zig has no designated initializers for arrays (the [index] = value syntax of C), so we use a labeled block to make up for them. At the same time, you'll see that these blocks let us do some wondrous things.
This block must return an array of HlGroup, with a size that is the number of fields of the Highlight enum. We don't want to guess how many highlight types we have, so we get the exact number of them. We can do so with:
@typeInfo(EnumType).@"enum".fields.len
@" notation for identifiers
From the official documentation:
Variable identifiers are never allowed to shadow identifiers from an outer
scope. Identifiers must start with an alphabetic character or underscore
and may be followed by any number of alphanumeric characters or
underscores. They must not overlap with any keywords.
If an identifier wouldn't be valid according to these rules, we can use the @" notation. In our case we write @"enum" because enum is a keyword.
// Number of members in the Highlight enum
const n_hl = @typeInfo(t.Highlight).@"enum".fields.len;
/// Array with highlight groups.
pub const hlGroups: [n_hl]t.HlGroup = arr: {
// Initialize the hlGroups array at compile time. A []HlGroup array is
// first declared undefined, then it is filled with all highlight groups.
var hlg: [n_hl]t.HlGroup = undefined;
hlg[int(.normal)] = .{
.fg = FgColor.default,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.incsearch)] = .{
.fg = FgColor.green,
.bg = BgColor.default,
.reverse = true,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.err)] = .{
.fg = FgColor.red_bright,
.bg = BgColor.default,
.reverse = false,
.bold = true,
.italic = false,
.underline = false,
};
break :arr hlg;
};
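The same pattern in miniature, as a sketch: an enum, its member count taken from @typeInfo, and an array filled inside a labeled block and indexed by enum value. The Fruit enum and the prices are made up for the example.

```zig
const std = @import("std");

const Fruit = enum(u8) { apple, pear, plum };

// Number of members in the enum, queried from the type itself.
const n_fruit = @typeInfo(Fruit).@"enum".fields.len;

// The labeled block is evaluated at compile time and yields the array.
const prices: [n_fruit]u32 = blk: {
    var p: [n_fruit]u32 = undefined;
    p[@intFromEnum(Fruit.apple)] = 30;
    p[@intFromEnum(Fruit.pear)] = 45;
    p[@intFromEnum(Fruit.plum)] = 60;
    break :blk p;
};

fn price(f: Fruit) u32 {
    return prices[@intFromEnum(f)];
}

pub fn main() void {
    std.debug.print("{d} fruits, a pear costs {d}\n", .{ n_fruit, price(.pear) });
}
```

If a member is added to Fruit, n_fruit follows automatically and the array grows with it.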
An array of highlight attributes
We also define an array with the attributes (the generated CSI sequences) for all highlight groups. This one is also created with a labeled block.
In this last block there's a loop: from the previously defined highlight groups, it will generate the CSI escape sequence (the attribute) of the group itself. This sequence is what we will actually use in the program to apply the highlight.
/// Array with highlight attributes.
pub const hlAttrs: [n_hl][]const u8 = arr: {
// generate the attribute for each of the highlight groups
// bold/italic/etc: either set them, or reset them to avoid their
// propagation from previous groups
var hla: [n_hl][]const u8 = undefined;
for (hlGroups, 0..) |hlg, i| {
hla[i] = CSI ++ std.fmt.comptimePrint("{s}{s}{s}{s}{};{}m", .{
if (hlg.bold) "1;" else "22;",
if (hlg.italic) "3;" else "23;",
if (hlg.underline) "4;" else "24;",
if (hlg.reverse) "7;" else "27;",
hlg.fg,
hlg.bg,
});
}
break :arr hla;
};
Maybe you haven't realized yet why it's so awesome: everything here is done at compile time! There won't be a trace of this in the binary executable, except the resulting hlAttrs array. The block doesn't use the comptime keyword; if you use it, the compiler will tell you
error: redundant comptime keyword in already comptime scope
This shows that the comptime keyword is unnecessary most of the time.
The hlGroups array isn't used at runtime. Still, defining it is useful because it makes the highlight groups easier to change. The compiler keeps out of the executable what isn't used at runtime anyway.
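A reduced sketch of the same technique, with std.fmt.comptimePrint generating strings inside a labeled block. The codes array and the shortened sequences are assumptions for the example, not the editor's real attributes:

```zig
const std = @import("std");

// Hypothetical input: three foreground color codes.
const codes = [_]u8{ 31, 32, 34 };

// Evaluated at compile time: the loop and the formatting run in the
// compiler, and the program only sees the finished strings.
const sequences: [codes.len][]const u8 = blk: {
    var out: [codes.len][]const u8 = undefined;
    for (codes, 0..) |code, i| {
        out[i] = "\x1b[" ++ std.fmt.comptimePrint("{d}m", .{code});
    }
    break :blk out;
};

pub fn main() void {
    // Skip the ESC byte so the sequences are shown instead of interpreted.
    for (sequences) |s| std.debug.print("ESC{s}\n", .{s[1..]});
}
```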
How we access the attribute
We'll create a method in the HlGroup type that returns the attribute for that highlight type:
underline: bool,
/// Get the attribute of a HlGroup from the hlAttrs array.
pub fn attr(color: Highlight) []const u8 {
return hlAttrs[@intFromEnum(color)];
}
And import the array:
const hlAttrs = @import("hlgroups.zig").hlAttrs;
CSI escape sequences
The attribute of each highlight group is a string: the escape sequence that is fed to the terminal to get the highlight we want. The format is:
ESC[{bold};{italic};{underline};{reverse};{fg-color};{bg-color}m
For example, if a group wants bold text, it will start with
\x1b[1;
If it doesn't want it, it will reset the bold attribute with
\x1b[22;
Otherwise it would inherit the value of the group that preceded it, whatever it was.
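To make the format concrete, here is a runtime sketch that builds one such sequence with std.fmt.bufPrint. The editor generates these strings at compile time instead; the Group struct and attrSeq() below are hypothetical and exist only for this illustration.

```zig
const std = @import("std");

// Hypothetical stand-in for HlGroup, with defaults to keep the call short.
const Group = struct {
    fg: u8,
    bg: u8,
    reverse: bool = false,
    bold: bool = false,
    italic: bool = false,
    underline: bool = false,
};

/// Format the escape sequence for `g` into `buf`:
/// ESC[{bold};{italic};{underline};{reverse};{fg};{bg}m
fn attrSeq(buf: []u8, g: Group) ![]u8 {
    // each attribute is either set or explicitly reset
    const bold: []const u8 = if (g.bold) "1;" else "22;";
    const italic: []const u8 = if (g.italic) "3;" else "23;";
    const underline: []const u8 = if (g.underline) "4;" else "24;";
    const reverse: []const u8 = if (g.reverse) "7;" else "27;";
    return std.fmt.bufPrint(buf, "\x1b[{s}{s}{s}{s}{d};{d}m", .{
        bold, italic, underline, reverse, g.fg, g.bg,
    });
}

pub fn main() !void {
    var buf: [32]u8 = undefined;
    // bright red on default background, bold: like our .err group
    const seq = try attrSeq(&buf, .{ .fg = 91, .bg = 49, .bold = true });
    // skip the ESC byte so the sequence is visible instead of interpreted
    std.debug.print("ESC{s}\n", .{seq[1..]});
}
```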
More comptime
We used a hard-coded ErrorColor when printing errors in the message area; time to change it in errorMessage():
-    const fmt = ansi.ErrorColor ++ format ++ ansi.ResetColors;
+    const fmt = comptime t.HlGroup.attr(.err) ++ format ++ ansi.ResetColors;
You should now delete the ErrorColor constant from ansi.
Note the comptime keyword here. Without it, the compiler would say:
error: unable to resolve comptime value
note: slice being concatenated must be comptime-known
With the comptime keyword, you force the compiler to at least try to get that value at compile time. In this case, it succeeds. Also note that comptime can precede any expression, to force it to be evaluated at compile time: function calls, assignments, etc.
Again: you generally don't need the comptime keyword. But if the compiler complains with that sort of error, and you think it should be able to get the value, it's worth a try.
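A small sketch of the situation: a function call whose result is concatenated with ++. The attrs table, the reset constant and errorFormat() are assumptions made for the example; only the shape of the expression matches what we do in errorMessage().

```zig
const std = @import("std");

// Hypothetical values; the real ones live in the ansi and hlgroups modules.
const reset = "\x1b[m";
const attrs = [_][]const u8{ "\x1b[39m", "\x1b[1;91m" };

fn attr(i: usize) []const u8 {
    return attrs[i];
}

/// `++` needs comptime-known operands. Without `comptime`, attr(1) would be
/// an ordinary runtime call and the concatenation would not compile.
fn errorFormat(comptime format: []const u8) []const u8 {
    return comptime attr(1) ++ format ++ reset;
}

pub fn main() void {
    std.debug.print("{s}\n", .{errorFormat("Can't save!")});
}
```

Removing the comptime keyword from errorFormat() should reproduce the "unable to resolve comptime value" error quoted above.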
Applying the highlight
We now have all we need to apply the highlight. This should be done where rows are drawn, in drawRows(). There, until now, we were simply drawing the rendered row as-is. This must change.
We get the portion of the line that starts at coloff, and we iterate over it for len characters, so that we only iterate over the part of the line that fits on the screen:
-    if (len > 0) {
-        try e.toSurface(rows[ix].render[V.coloff .. V.coloff + len]);
-    }
+    // part of the line after coloff, and its highlight
+    const rline = if (len > 0) rows[ix].render[V.coloff..] else &.{};
+    const hl = if (len > 0) rows[ix].hl[V.coloff..] else &.{};
Inside the inner loop we check the character highlight. If it's different, we apply the highlight attribute, which will remain enabled until a different highlight is found in the row.hl array:
var current_color = t.Highlight.normal;
// loop characters of the rendered row
for (rline[0..len], 0..) |c, i| {
if (hl[i] != current_color) {
const color = hl[i];
current_color = color;
try e.toSurface(t.HlGroup.attr(color));
}
We draw the character. At the end of the line we restore default highlight, otherwise the last highlight would carry over beyond the end of the line, and onto the next line:
try e.toSurface(c);
}
// end of the line, reset highlight
try e.toSurface(ansi.ResetColors);
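The whole loop can be tried outside the editor with a sketch like this one. It writes into a fixed buffer instead of the editor surface, and the attribute strings are shortened, made-up ones; drawLine() and put() are hypothetical helpers.

```zig
const std = @import("std");

const Highlight = enum(u8) { normal = 0, incsearch };

const reset = "\x1b[m";

// Hypothetical attributes, much shorter than the generated ones.
fn attr(h: Highlight) []const u8 {
    return switch (h) {
        .normal => "\x1b[27m",
        .incsearch => "\x1b[7m",
    };
}

/// Copy `bytes` into `out` at offset `n.*`, then advance the offset.
fn put(out: []u8, n: *usize, bytes: []const u8) void {
    @memcpy(out[n.*..][0..bytes.len], bytes);
    n.* += bytes.len;
}

/// Draw `text` into `out`, emitting an attribute only when the highlight
/// changes. `text` and `hl` must have the same length.
fn drawLine(out: []u8, text: []const u8, hl: []const Highlight) []const u8 {
    var n: usize = 0;
    var current = Highlight.normal;
    for (text, hl) |c, h| {
        if (h != current) {
            current = h;
            put(out, &n, attr(h));
        }
        out[n] = c;
        n += 1;
    }
    // end of the line, reset highlight
    put(out, &n, reset);
    return out[0..n];
}

pub fn main() void {
    var buf: [64]u8 = undefined;
    const hl = [_]Highlight{ .normal, .incsearch, .incsearch, .normal };
    std.debug.print("{s}\n", .{drawLine(&buf, "abcd", &hl)});
}
```

Only two attributes are emitted for the four characters, one at each change, plus the final reset.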
Safe to iterate zero-length slices?
We can safely iterate a zero-length slice with a for loop. For example this just prints nothing:
const line: []const u8 = &.{};
for (line) |c| {
std.debug.print("{}\n", .{c});
break;
} else {
std.debug.print("nothing\n", .{});
}
We could not do this as directly with a while loop, because there we would have to access the line by index ourselves.
Highlight for non-printable characters
As a first proof-of-concept for our highlight, we want non-printable characters to be printed with a reversed highlight (black on white). For example, we'll turn Ctrl-A into A with reversed colors. If the character is not a Ctrl character, it will be printed as ? with reversed colors.
It won't work for some characters like Tab or Backspace, but for now it will do.
This kind of highlight will work with all filetypes, so we aren't talking about syntax highlighting yet.
We'll need a way to insert non-printable characters, so we define a key (Ctrl-K) which will let us insert characters verbatim, even those that we couldn't otherwise type. For example, Ctrl-Q would normally quit rather than insert a character, but while inserting verbatim we'll be able to type it.
Process verbatim keypresses
In processKeypress(), we add a variable verbatim in the static struct:
const static = struct {
var q: u8 = opt.quit_times;
var verbatim: bool = false;
Just below the static struct definition, before processing keypresses, we check if the variable was set. If so, we reset the variable, insert the character and return. There is a set of characters that we don't insert, because we cannot handle them at this point: they would just break our text.
if (static.verbatim) {
static.verbatim = false;
switch (k) {
// these cause trouble, don't insert them
.enter,
.ctrl_h,
.backspace,
.ctrl_j,
.ctrl_k,
.ctrl_l,
.ctrl_u,
.ctrl_z,
=> {
try e.errorMessage(message.errors.get("nonprint").?, .{ k });
return;
},
else => try e.insertChar(@intFromEnum(k)),
}
return;
}
We'll make Ctrl-K set this variable to true:
.ctrl_k => static.verbatim = true,
For the error, we need the nonprint error message:
.{ "nonprint", "Can't insert character: {any}" },
Highlight the verbatim characters
This highlight group is filetype-independent, so we just handle it in the drawRows() inner loop:
-    if (hl[i] != current_color) {
+    if (c != '\t' and !asc.isPrint(c)) {
+        // for example, turn Ctrl-A into 'A' with reversed colors
+        current_color = t.Highlight.nonprint;
+        try e.toSurface(t.HlGroup.attr(.nonprint));
+        try e.toSurface(switch (c) {
+            0...26 => '@' + c,
+            else => '?',
+        });
+    } else if (hl[i] != current_color) {
We also need to add nonprint to the Highlight enum:
/// Highlight for non-printable characters
nonprint,
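The character mapping used in drawRows() can be checked in isolation. This is a sketch: needsReplacement() and display() are hypothetical helpers that only repeat the condition and the switch from the loop above.

```zig
const std = @import("std");

/// True for bytes that are drawn with the nonprint highlight.
fn needsReplacement(c: u8) bool {
    return c != '\t' and !std.ascii.isPrint(c);
}

/// Character shown in place of a non-printable byte.
fn display(c: u8) u8 {
    return switch (c) {
        // Ctrl-A is byte 1 and '@' is 64, so 64 + 1 gives 'A';
        // Ctrl-Z (26) gives 'Z'
        0...26 => '@' + c,
        else => '?',
    };
}

pub fn main() void {
    for ([_]u8{ 1, 26, 27, 0x7f }) |c| {
        std.debug.print("byte {d} is shown as '{c}'\n", .{ c, display(c) });
    }
}
```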
Define the highlight group
Now, if you try to compile, the compiler will say something like:
src/types.zig|162 col 20| error: use of undefined value here causes illegal behavior
|| if (hlg.bold) "1;" else "22;",
That's because we didn't define the highlight group in hlGroups, but the hlAttrs initializer tries to access it. This means that our system works as intended: we can't forget to define a group without the compiler telling us.
So we add the highlight group in the hlGroups labeled block:
hlg[int(.nonprint)] = .{
.fg = FgColor.white,
.bg = BgColor.default,
.reverse = true,
.bold = false,
.italic = false,
.underline = false,
};
Now it should compile and the following should work:
- try inserting a non-printable character with Ctrl-K followed by Ctrl-A
- now try pressing Ctrl-K twice: we decided not to insert certain characters and to print an error message instead, which should have the .err highlight.
Searching
Now that we can prompt the user for input, and we can apply highlight to the text, we could give our editor the capability to search for words in the file.
To be able to do this, we'll need several changes. We already defined the incsearch highlight, so we don't need to do that.
Instead, we must change how promptForInput() works. Until now, it only prompted the user for a string and returned it, without doing anything in between.
Now we want the currently typed pattern to be searched every time the user types a character and, if found, highlighted on the screen.
The prompt callback
To achieve this, we will need our promptForInput() function to accept a callback function as parameter, and call it repeatedly inside its body.
We define the callback types as follows:
///////////////////////////////////////////////////////////////////////////////
//
// Callbacks
//
///////////////////////////////////////////////////////////////////////////////
/// The prompt callback function type
pub const PromptCb = fn (*Editor, PromptCbArgs) EditorError!void;
/// Arguments for the prompt callback
pub const PromptCbArgs = struct {
/// Current input entered by user
input: *Chars,
/// Last typed key
key: Key,
/// Saved view, in case it needs to be restored
saved: View,
/// Becomes true in the last callback invocation
final: bool = false,
};
Note how easy and clear it is in Zig to define what C calls typedefs, as we do for PromptCb.
Then we change the promptForInput() signature to:
-/// Start a prompt in the message area, return the user input.
-/// Prompt is terminated with either .esc or .enter keys.
-/// Prompt is also terminated by .backspace if there is no character left in
-/// the input.
-fn promptForInput(e: *Editor, prompt: []const u8) !t.Chars {
+/// Start a prompt in the message area, return the user input.
+/// At each keypress, the prompt callback is invoked, with a final invocation
+/// after the prompt has been terminated with either .esc or .enter keys.
+/// Prompt is also terminated by .backspace if there is no character left in
+/// the input.
+fn promptForInput(e: *Editor, prompt: []const u8, saved: t.View, cb: ?t.PromptCb) !t.Chars {
+    _ = cb;
+    _ = saved;
We'll have to fix the previous invocation:
-    var al = try e.promptForInput(message.prompt.get("fname").?);
+    var al = try e.promptForInput(message.prompt.get("fname").?, .{}, null);
EditorError set
If you try to compile now, the compiler will tell you that EditorError doesn't exist. If you try to remove it from the PromptCb return value, the compiler will tell you
error: function type cannot have an inferred error set
So we need an explicit error set for our callback. We don't know how many kinds of errors could cause a PromptCb to fail. The callback we'll be using for the searching function will have an error set of
error{OutOfMemory}
So we could just write that. But PromptCb is a 'generic' callback, which could do just about anything, and we'd need to add more errors to that set.
Instead, we create our EditorError set, and if we need to handle more errors, we'll add them to this set.
Just add it above our previous IoError set:
/// Error set for functions requiring explicit error handling.
pub const EditorError = error{
OutOfMemory,
};
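Here is a reduced sketch of an optional callback with an explicit error set. It is a variation, not the editor's code: the callback is declared as a function pointer (*const fn), and State, count() and feed() are hypothetical names used only here.

```zig
const std = @import("std");

const EditorError = error{OutOfMemory};

const State = struct { calls: usize = 0 };

// The error set of a function type must be written out, it can't be inferred.
// A pointer type is used here so the callback can be passed around at runtime.
const Callback = *const fn (*State, u8) EditorError!void;

fn count(s: *State, key: u8) EditorError!void {
    _ = key;
    s.calls += 1;
}

/// Invoke `cb` once for each key, if a callback was given.
fn feed(s: *State, keys: []const u8, cb: ?Callback) EditorError!void {
    for (keys) |k| {
        if (cb) |callback| try callback(s, k);
    }
}

pub fn main() !void {
    var s: State = .{};
    try feed(&s, "abc", &count); // callback runs three times
    try feed(&s, "xyz", null); // no callback: nothing happens
    std.debug.print("callback ran {d} times\n", .{s.calls});
}
```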
Updated promptForInput()
Remove those assignments at the top:
-    _ = cb;
-    _ = saved;
Now our prompt function needs to invoke this PromptCb callback.
Before the loop starts, we want to define some variables:
var k: t.Key = undefined;
var c: u8 = undefined;
var cb_args: t.PromptCbArgs = undefined;
while (true) {
which we'll assign inside the loop:
     while (true) {
         try e.statusMessage("{s}{s}", .{ prompt, al.items });
         try e.refreshScreen();
-        const k = try ansi.readKey();
-        const c = @intFromEnum(k);
+        k = try ansi.readKey();
+        c = @intFromEnum(k);
+        cb_args = .{ .input = &al, .key = k, .saved = saved };
Before the loop ends, we run the callback, if not null:
if (cb) |callback| try callback(e, cb_args);
}
e.clearStatusMessage();
After the loop, we call it one last time before returning the input:
     e.clearStatusMessage();
+    cb_args.final = true;
+    if (cb) |callback| try callback(e, cb_args);
     return al;
Doing the search
Our prompt accepts a callback now, so we're ready to implement the search functionality.
We bind a new key:
.ctrl_f => try e.find(),
Then we define our function:
///////////////////////////////////////////////////////////////////////////////
//
// Find
//
///////////////////////////////////////////////////////////////////////////////
/// Start the search prompt.
fn find(e: *Editor) !void {
const saved = e.view;
var query = try e.promptForInput("/", saved, findCallback);
query.deinit(e.alc);
}
In this function, we make a copy of the current View, so that we can restore the cursor position in the case that the search is interrupted.
We get our query, then deinitialize it. It's clear we're missing some piece of the puzzle...
Which brings us to the findCallback() function, which is passed to the prompt.
The find callback: preparations
We'll have to break the code for the find callback in pieces somehow.
We also need some additional preparations.
Pos type
We need a type that represents a position in the buffer.
/// A position in the buffer.
pub const Pos = struct {
lnr: usize = 0,
col: usize = 0,
};
wrapscan option
We need an option for the searching behavior: should the search continue when the end of file is reached, by repeating the search from the start of the file? This also works while searching backwards:
/// Searches wrap around the end of the file
pub var wrapscan = true;
Constants
We need two new constants:
const lastIndexOf = mem.lastIndexOf;
const indexOf = mem.indexOf;
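Both functions return an optional index. A sketch of how they can drive a forward and a backward search on a single line follows; nextMatch() and prevMatch() are hypothetical helpers, not the findForward() and findBackward() functions of the editor.

```zig
const std = @import("std");
const mem = std.mem;
const lastIndexOf = mem.lastIndexOf;
const indexOf = mem.indexOf;

/// Index of the first match of `query` at or after column `from`.
fn nextMatch(line: []const u8, query: []const u8, from: usize) ?usize {
    if (from > line.len) return null;
    const i = indexOf(u8, line[from..], query) orelse return null;
    return from + i; // the index was relative to the sub-slice
}

/// Index of the last match of `query` that ends at or before column `before`.
fn prevMatch(line: []const u8, query: []const u8, before: usize) ?usize {
    return lastIndexOf(u8, line[0..@min(before, line.len)], query);
}

pub fn main() void {
    const line = "one two one";
    if (nextMatch(line, "one", 1)) |i| std.debug.print("next match at {d}\n", .{i});
    if (prevMatch(line, "one", 8)) |i| std.debug.print("previous match at {d}\n", .{i});
    if (nextMatch(line, "six", 0) == null) std.debug.print("no match\n", .{});
}
```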
The findCallback() function
This one is big. We'll start with a stub, filled with placeholders. We reset the highlight and we handle clean up at the start of the function, so that later it can return at any point.
/// Called by promptForInput() for every valid inserted character.
/// The saved view is restored when the current query isn't found, or when
/// backspace clears the query, so that the search starts from the original
/// position.
fn findCallback(e: *Editor, ca: t.PromptCbArgs) t.EditorError!void {
// 1. variables
// 2. restore line highlight
// 3. clean up
// 4. query is empty so no need to search, but restore position
// 5. handle backspace
// 6. find the starting line and the column offset for the search
// 7. start the search
}
Variables
As we did before, we have a static struct which will save the current state of the search.
const static = struct {
var found: bool = false;
var view: t.View = .{};
var pos: t.Pos = .{};
var oldhl: []t.Highlight = &.{};
};
We also define some constants:
const empty = ca.input.items.len == 0;
const numrows = e.buffer.rows.items.len;
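If the static struct trick is still unfamiliar, this sketch shows it in its smallest form. nextId() is a made-up function: the point is only that a var declared inside a struct keeps its value between calls.

```zig
const std = @import("std");

/// Returns how many times it has been called so far.
fn nextId() u32 {
    // Declarations inside a struct are container-level: `n` lives for the
    // whole program, not just for one call of the function.
    const static = struct {
        var n: u32 = 0;
    };
    static.n += 1;
    return static.n;
}

pub fn main() void {
    _ = nextId();
    _ = nextId();
    std.debug.print("third call returns {d}\n", .{nextId()});
}
```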
Restore line highlight before incsearch highlight
Before a new search attempt, we restore the underlying line highlight, so that if the search fails, the search highlight has been cleared already.
// restore line highlight before incsearch highlight, or clean up
if (static.oldhl.len > 0) {
@memcpy(e.rowAt(static.pos.lnr).hl, static.oldhl);
}
Clean up
The clean up must also be handled early. This block runs during the last invocation of the callback, that is done for this exact purpose.
In this step we free the search highlight, reset our static variables and restore the view if necessary.
// clean up
if (ca.final) {
e.alc.free(static.oldhl);
static.oldhl = &.{};
if (empty or ca.key == .esc) {
e.view = ca.saved;
}
if (!static.found and ca.key == .enter) {
try e.statusMessage("No match found", .{});
}
static.found = false;
return;
}
Empty query
This happens after we press Backspace and the query is now empty.
We don't cancel the search yet, but we restore the original view.
Search will be canceled if we press Backspace again.
We also reset static.found because it was true if the character we just deleted was a match.
// Query is empty so no need to search, but restore position
if (empty) {
static.found = false;
e.view = ca.saved;
return;
}
Handle Backspace
This happens when we press Backspace, but the query is not empty. In this case we restore our static view, which is set later on. Note that if the current query can't be found, this would be the same as the original view, but what matters is that we must restore it, whatever it is.
// when pressing backspace we restore the previously saved view
// cursor might move or not, depending on whether there is a match at
// cursor position
if (ca.key == .backspace or ca.key == .ctrl_h) {
e.view = static.view;
}
Find the starting position for the search
We define some constants, to make the function flow more understandable.
//////////////////////////////////////////
// Find the starting position
//////////////////////////////////////////
const V = &e.view;
const prev = ca.key == .ctrl_t;
const next = ca.key == .ctrl_g;
// current cursor position
var pos = t.Pos{ .lnr = V.cy, .col = V.cx };
const eof = V.cy == numrows;
const last_char_in_row = !eof and V.rx == e.currentRow().render.len;
// check numrows first: numrows - 1 would overflow on an empty buffer
const last_row = numrows > 0 and V.cy == numrows - 1;
// must move the cursor forward before searching when we don't want to
// match at cursor position
const step_fwd = next or empty or !static.found;
If we skipped the !eof check when defining last_char_in_row, we would cause a panic when starting a search at the end of the file. This happens because e.currentRow() tries to get a pointer to a line that doesn't exist. Watch out for these things!
We are determining where the search must start, and that's either at cursor position, or just after that (one character to the right). That is, we must decide whether to accept a match at cursor position or not.
We want to step forward:
- if we press Ctrl-G, looking for the next match
- if we are at the starting position, because either:
  - we just started a search
  - query is empty
  - a match hasn't been found
In any of these cases:
if (step_fwd) {
if (eof or (last_row and last_char_in_row)) {
if (!opt.wrapscan) { // restart from the beginning of the file?
return;
}
}
else if (last_char_in_row) { // start searching from next line
pos.lnr = V.cy + 1;
}
else { // start searching after current column
pos.col = V.cx + 1;
pos.lnr = V.cy;
}
}
Start the search
Our match is an optional slice of the chars.items array of the Row where the match was found. We try to find it with the appropriate functions, which we'll define later.
//////////////////////////////////////////
// Start the search
//////////////////////////////////////////
var match: ?[]const u8 = null;
if (!prev) {
match = e.findForward(ca.input.items, &pos);
}
else {
match = e.findBackward(ca.input.items, &pos);
}
static.found = match != null;
If a match is found, we update the cursor position and the static variables.
Since match is a slice of the original array, we can find the column with pointer arithmetic, by subtracting the address of the first character of the chars.items array from the address of the first character of our match.
const row = e.rowAt(pos.lnr);
if (match) |m| {
    V.cy = pos.lnr;
    V.cx = &m[0] - &row.chars.items[0];
    static.view = e.view;
    static.pos = .{ .lnr = pos.lnr, .col = V.cx };
Since we pass &pos to the functions, we could set the column there, but this works anyway (it's actually less trouble). Initially I wasn't using Pos, and I'm keeping this approach to show an example of pointer arithmetic in Zig. Feel free to refactor it if it suits you better.
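If you prefer to see the address arithmetic spelled out, here is a minimal sketch of the same idea. It is not the code used in the editor: offsetInParent is a hypothetical helper, and it uses @intFromPtr on the .ptr field of the slices instead of subtracting pointers directly. Since the elements are u8, the difference between the two addresses is also the index.

```zig
const std = @import("std");

/// Return the index of `sub` inside `parent`, assuming that `sub` was
/// obtained by slicing `parent` (hypothetical helper, for illustration).
fn offsetInParent(parent: []const u8, sub: []const u8) usize {
    // elements are u8 (1 byte each), so the address difference is the index
    return @intFromPtr(sub.ptr) - @intFromPtr(parent.ptr);
}

test "offset of a sub-slice" {
    const line: []const u8 = "\tadd\tadd";
    const idx = std.mem.indexOf(u8, line, "dd").?;
    const match = line[idx .. idx + 2];
    try std.testing.expect(offsetInParent(line, match) == 2);
}
```

One small difference: reading .ptr doesn't index the slice, so it doesn't trip the bounds check that &m[0] would hit if the match were ever an empty slice.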
Before setting the new highlight, we store a copy of the current one in static.oldhl. It will be restored at the top of the callback, every time the callback is invoked.
Note that we are matching against row.chars.items (the real row), but the highlight must match the characters in the rendered row, so we must first convert our match position with cxToRx.
    // first make a copy of current highlight, to be restored later
    static.oldhl = try e.alc.realloc(static.oldhl, row.render.len);
    @memcpy(static.oldhl, row.hl);
    // apply search highlight
    const start = row.cxToRx(V.cx);
    const end = row.cxToRx(V.cx + m.len);
    @memset(row.hl[start .. end], t.Highlight.incsearch);
}
If a match wasn't found, we restore the initial view (from before we started searching).
We must also handle the case where wrapscan is disabled and a match isn't found in the current search direction, but there was possibly a match before. In that case we just remain there and set the highlight at the current position. We need to set it again because the original highlight has been restored at the top of the callback.
Here we do the same conversion, but we use the saved position.
else if (next or prev) {
    // the next match wasn't found in the searching direction
    // we still set the highlight for the current match, since the original
    // highlight has been restored at the top of the function
    // this can definitely happen with !wrapscan
    const start = row.cxToRx(static.pos.col);
    const end = row.cxToRx(static.pos.col + ca.input.items.len);
    @memset(row.hl[start .. end], t.Highlight.incsearch);
}
else {
    // a match wasn't found because the input couldn't be found
    // restore the original view (from before the start of the search)
    e.view = ca.saved;
}
Search forwards
When searching forwards for a match, we start searching at the given position, in the current row. We use the std.mem.indexOf function, which finds the relative position of a slice in another slice, or returns null if the slice isn't contained in the other slice.
The following steps are performed in order, until a match is returned:
- search a slice of the current row [col..]
- reset search column to 0
- search the following lines
- end of file, no wrapscan? return null
- restart from the beginning of the file
- finally, search the initial line again (the code below searches the whole line)
If a match is found, pos.lnr is updated, because the callback will need the line where it was found.
/// Start a search forwards.
fn findForward(e: *Editor, query: []const u8, pos: *t.Pos) ?[]const u8 {
    var col = pos.col;
    var i = pos.lnr;
    while (i < e.buffer.rows.items.len) : (i += 1) {
        const rowchars = e.rowAt(i).chars.items;
        if (indexOf(u8, rowchars[col..], query)) |m| {
            pos.lnr = i;
            return rowchars[(col + m)..(col + m + query.len)];
        }
        col = 0; // reset search column
    }
    if (!opt.wrapscan) {
        return null;
    }
    // wrapscan enabled, search from start of the file to current row
    i = 0;
    while (i <= pos.lnr) : (i += 1) {
        const rowchars = e.rowAt(i).chars.items;
        if (indexOf(u8, rowchars, query)) |m| {
            pos.lnr = i;
            return rowchars[m .. m + query.len];
        }
    }
    return null;
}
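The part that is easy to get wrong in the function above is that std.mem.indexOf returns an index relative to the slice it was given, not to the whole row. Here is a small sketch of that single detail, outside the editor. findFrom is a hypothetical helper that I only use for this example.

```zig
const std = @import("std");
const indexOf = std.mem.indexOf;

/// Find `query` in `line` starting at column `col`, and return the column
/// of the match in the full line (hypothetical helper, for illustration).
fn findFrom(line: []const u8, col: usize, query: []const u8) ?usize {
    // indexOf returns an index relative to line[col..], so we must add
    // `col` back to get the column in the full line
    const rel = indexOf(u8, line[col..], query) orelse return null;
    return col + rel;
}

test "indexOf is relative to the searched slice" {
    const line = "\tadd\tadd";
    try std.testing.expect(findFrom(line, 0, "\ta").? == 0);
    try std.testing.expect(findFrom(line, 1, "\ta").? == 4);
    try std.testing.expect(findFrom(line, 5, "\ta") == null);
}
```

This is the same reason why findForward returns rowchars[(col + m)..(col + m + query.len)] in the first loop, and plain rowchars[m .. m + query.len] after col has been reset to 0.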
Search backward
The process is very similar, but in reverse. This time we use the std.mem.lastIndexOf function, which finds the position of the last occurrence of a slice in another slice, or returns null if the slice isn't contained in the other slice. To search only before a certain column, we pass it a sub-slice of the row.
The following steps are performed in order, until a match is returned:
- search a slice of the current row [0..col]
- search the previous lines
- start of file, no wrapscan? return null
- restart from the end of the file
- if you reach the initial line, only search [col..]
If a match is found, pos.lnr is updated, because the callback will need the line where it was found.
/// Start a search backwards.
fn findBackward(e: *Editor, query: []const u8, pos: *t.Pos) ?[]const u8 {
    // first line, search up to col
    const row = e.rowAt(pos.lnr);
    const col = pos.col;
    var rowchars = row.chars.items;
    var i: usize = undefined;
    if (lastIndexOf(u8, rowchars[0..col], query)) |m| {
        return rowchars[m .. m + query.len];
    }
    else if (pos.lnr > 0) {
        // previous lines, search full line
        i = pos.lnr - 1;
        while (true) : (i -= 1) {
            rowchars = e.rowAt(i).chars.items;
            if (lastIndexOf(u8, rowchars, query)) |m| {
                pos.lnr = i;
                return rowchars[m .. m + query.len];
            }
            if (i == 0) break;
        }
    }
    if (!opt.wrapscan) {
        return null;
    }
    i = e.buffer.rows.items.len - 1;
    while (i > pos.lnr) : (i -= 1) {
        rowchars = e.rowAt(i).chars.items;
        if (lastIndexOf(u8, rowchars, query)) |m| {
            pos.lnr = i;
            return rowchars[m .. m + query.len];
        }
    }
    // check again the starting line, this time in the part after the offset
    rowchars = e.rowAt(pos.lnr).chars.items;
    if (lastIndexOf(u8, rowchars[col..], query)) |m| {
        // m is the index in the substring starting from `col`, therefore we
        // must add `col` to get the real index in the row
        return rowchars[m + col .. m + col + query.len];
    }
    return null;
}
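Note that std.mem.lastIndexOf has no parameter for the position where the search should stop: the limit comes from the slice we give it. The sketch below shows this with a hypothetical helper named findBefore, which is not part of the editor.

```zig
const std = @import("std");
const lastIndexOf = std.mem.lastIndexOf;

/// Find the last occurrence of `query` that lies entirely before column
/// `col` (hypothetical helper, for illustration).
fn findBefore(line: []const u8, col: usize, query: []const u8) ?usize {
    // line[0..col] starts at column 0, so the index needs no adjustment
    return lastIndexOf(u8, line[0..col], query);
}

test "lastIndexOf on a sub-slice" {
    const line = "\tadd\tadd";
    try std.testing.expect(findBefore(line, line.len, "\ta").? == 4);
    try std.testing.expect(findBefore(line, 4, "\ta").? == 0);
    try std.testing.expect(findBefore(line, 1, "\ta") == null);
}
```

When the slice starts at column 0 the result is already the real column. Only the last step of findBackward, which searches rowchars[col..], needs to add col back.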
Write a test
By now you should be able to compile, run and test the feature yourself.
Anyway, the searching feature is way more complex than anything we did before, and it's worth writing a test for it.
I don't know how to simulate keystrokes, so I'm just calling the callback repeatedly.
I initialize the editor with a 'fake' screen, because this isn't an interactive terminal.
Remember that we can do array multiplication (**) and concatenation (++), but only when the operands are known at compile time.
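As a quick refresher, this is a minimal sketch of how the expected highlight arrays are built. Hl is only a stand-in for t.Highlight, to keep the example self-contained. With a tabstop of 8, a tab at the start of a line is rendered as 8 cells, so a match for "\ta" covers 9 cells of the rendered row, and the row "\tabb" has 11 cells in total.

```zig
const std = @import("std");

// stand-in for t.Highlight, with only the two values needed here
const Hl = enum(u8) { normal, incsearch };

const n = [1]Hl{.normal};
const s = [1]Hl{.incsearch};

// "\ta" highlighted (8 cells for the tab + 1 for 'a'), then "bb" as normal
const hl = s ** 9 ++ n ** 2;

test "array multiplication and concatenation" {
    try std.testing.expect(hl.len == 11);
    try std.testing.expect(hl[8] == .incsearch);
    try std.testing.expect(hl[9] == .normal);
}
```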
I won't explain what the test does, hopefully you'll be able to understand it.
test "find" {
var da = std.heap.DebugAllocator(.{}){};
defer _ = da.deinit();
var e = try t.Editor.init(da.allocator(), .{ .rows = 50, .cols = 180 });
defer e.deinit();
opt.wrapscan = true;
opt.tabstop = 8;
// our test buffer
try e.insertRow(e.buffer.rows.items.len, "\tabb");
try e.insertRow(e.buffer.rows.items.len, "\tacc");
try e.insertRow(e.buffer.rows.items.len, "\tadd\tadd");
const n = [1]t.Highlight{ .normal };
const s = [1]t.Highlight{ .incsearch };
// Row.hl has the same number of elements as the rendered row, and here we
// have tabs
// first 2 lines: normal highlight
const norm1 = n ** 11;
// third line: normal highlight
const norm2 = n ** 19;
// \t + 1 letter in lines 1-2
const hl = s ** 9 ++ n ** 2;
// \t + 2 letters in lines 1-2
const hl2 = s ** 10 ++ n ** 1;
// \t + 2 letters in line 3, first match
const hl3 = s ** 10 ++ n ** 9;
// \t + 1 letter in line 3, first match
const hl4 = s ** 9 ++ n ** 10;
// \t + 1 letter in line 3, second match
const hl5 = n ** 11 ++ s ** 6 ++ n ** 2;
var al = try t.Chars.initCapacity(e.alc, 80);
defer al.deinit(e.alc);
// our prompt is "\ta", it should be found in line 2, because we skip the
// match at cursor position
try al.appendSlice(e.alc, "\ta");
var ca: t.PromptCbArgs = .{ .input = &al, .key = @enumFromInt('a'), .saved = e.view };
try e.findCallback(ca);
try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &hl));
try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &norm2));
// now it's "\tac", extending the current match
try al.append(e.alc, 'c');
ca = .{ .input = &al, .key = @enumFromInt('c'), .saved = e.view };
try e.findCallback(ca);
try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &hl2));
try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &norm2));
// now it's "\ta", resizing the current match
_ = al.pop();
ca = .{ .input = &al, .key = .backspace, .saved = e.view };
try e.findCallback(ca);
try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &hl));
try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &norm2));
// now it's "\tad", found in line 3
try al.append(e.alc, 'd');
ca = .{ .input = &al, .key = @enumFromInt('d'), .saved = e.view };
try e.findCallback(ca);
try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl3));
// now it's "\ta", resizes the current match
_ = al.pop();
ca = .{ .input = &al, .key = .backspace, .saved = e.view };
try e.findCallback(ca);
try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl4));
// find next: finds another "\ta" in the same row
ca = .{ .input = &al, .key = .ctrl_g, .saved = e.view };
try e.findCallback(ca);
try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl5));
// find next again: finds "\ta" in the first line
ca = .{ .input = &al, .key = .ctrl_g, .saved = e.view };
try e.findCallback(ca);
try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &hl));
try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &norm2));
// find prev: goes back to last line (2nd match)
ca = .{ .input = &al, .key = .ctrl_t, .saved = e.view };
try e.findCallback(ca);
try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl5));
opt.wrapscan = false;
// find next should fail (stays the same)
ca = .{ .input = &al, .key = .ctrl_g, .saved = e.view };
try e.findCallback(ca);
try expect(mem.eql(t.Highlight, e.rowAt(0).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(1).hl, &norm1));
try expect(mem.eql(t.Highlight, e.rowAt(2).hl, &hl5));
// clean up
ca.final = true;
try e.findCallback(ca);
}
Syntax highlighting
The last feature to implement is syntax highlighting.
New fields
Add a new field in the Buffer type:
// Pointer to the syntax definition
syndef: ?*const t.Syntax,
And one in the Row type:
/// True when the row has a multiline comment continuing into next line
ml_comment: bool,
This one becomes true when a line contains the leader that opens a multi-line comment, and stays true in all following rows until the end of the block is found; in that row it becomes false again.
Initialize them in their respective init() functions, to null and false.
Add imports where necessary.
Fill the rest of Highlight enum
This is the full Highlight enum, with all needed highlight names:
/// All available highlight types.
pub const Highlight = enum(u8) {
/// The normal highlight
normal = 0,
/// Line comments highlight
comment,
/// Multiline comments highlight
mlcomment,
/// Numbers highlight
number,
/// String highlight
string,
/// Highlight for keywords of type 'keyword'
keyword,
/// Highlight for keywords of type 'types'
types,
/// Highlight for keywords of type 'builtin'
builtin,
/// Highlight for keywords of type 'constant'
constant,
/// Highlight for keywords of type 'preproc'
preproc,
/// Highlight for uppercase words
uppercase,
/// Highlight for escape sequences in strings
escape,
/// Incremental search highlight
incsearch,
/// Highlight for non-printable characters
nonprint,
/// Highlight for error messages
err,
};
Fill the rest of hlGroups array
This is the full initializer of the hlGroups array; replace the previous one with it.
/// Array with highlight groups.
pub const hlGroups: [n_hl]t.HlGroup = arr: {
// Initialize the hlGroups array at compile time. A []HlGroup array is
// first declared undefined, then it is filled with all highlight groups.
var hlg: [n_hl]t.HlGroup = undefined;
hlg[int(.normal)] = .{
.fg = FgColor.default,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.comment)] = .{
.fg = FgColor.black_bright,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.mlcomment)] = .{
.fg = FgColor.blue_bright,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.number)] = .{
.fg = FgColor.white_bright,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.string)] = .{
.fg = FgColor.green,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.keyword)] = .{
.fg = FgColor.cyan,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.types)] = .{
.fg = FgColor.cyan_bright,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.builtin)] = .{
.fg = FgColor.magenta,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.constant)] = .{
.fg = FgColor.yellow,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.preproc)] = .{
.fg = FgColor.red_bright,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.uppercase)] = .{
.fg = FgColor.yellow_bright,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.escape)] = .{
.fg = FgColor.red,
.bg = BgColor.default,
.reverse = false,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.incsearch)] = .{
.fg = FgColor.green,
.bg = BgColor.default,
.reverse = true,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.nonprint)] = .{
.fg = FgColor.white,
.bg = BgColor.default,
.reverse = true,
.bold = false,
.italic = false,
.underline = false,
};
hlg[int(.err)] = .{
.fg = FgColor.red_bright,
.bg = BgColor.default,
.reverse = false,
.bold = true,
.italic = false,
.underline = false,
};
break :arr hlg;
};
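If the arr: { ... break :arr hlg; } construct looks unfamiliar, here is the same pattern reduced to a few lines. It's only a sketch: Color, n_colors, fg_codes and this version of int are names I made up for the example (int stands in for the helper of the same name used above), and the values are the usual SGR foreground codes.

```zig
const std = @import("std");

const Color = enum(u8) { normal = 0, comment, number };

// number of fields in the enum, used as the array length
const n_colors = std.meta.fields(Color).len;

// stand-in for the `int` helper used in the hlGroups initializer
fn int(c: Color) usize {
    return @intFromEnum(c);
}

// The labeled block is evaluated at compile time: the array is declared
// undefined, filled, then returned as the value of the block.
const fg_codes: [n_colors]u8 = arr: {
    var a: [n_colors]u8 = undefined;
    a[int(.normal)] = 39; // default foreground
    a[int(.comment)] = 90; // bright black
    a[int(.number)] = 97; // bright white
    break :arr a;
};

test "array filled in a labeled block" {
    try std.testing.expect(fg_codes.len == 3);
    try std.testing.expect(fg_codes[int(.comment)] == 90);
}
```

Indexing by enum value means the order of the assignments doesn't matter, only the order of the fields in the enum does.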
Syntax types
You can either define them in the types module (which I do) or in a different file, which will be imported by the types module, so that they are accessible from there as well.
The Syntax type
This type defines many properties of a syntax: extensions used for filetype detection, comment leaders, keywords and syntax-specific editor options.
///////////////////////////////////////////////////////////////////////////////
//
// Syntax types
//
///////////////////////////////////////////////////////////////////////////////
pub const Syntax = struct {
/// Name of filetype
ft_name: []const u8,
/// Array of extensions for filetype detection
ft_ext: []const []const u8,
/// Array of names for filetype detection, to be matched against the tail
/// of any path, so for example ".git/config" will match against any git
/// configuration file in any directory.
ft_fntails: []const []const u8,
/// Leaders for single-line comments
lcmt: []const []const u8,
/// Array with multiline comment leaders
/// [0] is start of block
/// [1] is leader for lines between start and end
/// [2] is end of block
mlcmt: ?[3][]const u8,
/// Array of words with 'Keywords' highlight
keywords: []const []const u8,
/// Array of words with 'Types' highlight
types: []const []const u8,
/// Array of words with 'Builtin' highlight
builtin: []const []const u8,
/// Array of words with 'Constant' highlight
constant: []const []const u8,
/// Array of words with 'Preproc' highlight
preproc: []const []const u8,
/// Bit field with supported syntax groups
flags: SyntaxFlags,
};
The syntax flags
This type is important because it controls the kinds of highlight that a syntax supports, that is what the syntax highlighter will actually highlight when parsing the buffer.
pub const SyntaxFlags = packed struct {
/// Should highlight integer and floating point numbers
numbers: bool = false,
/// Should highlight 0x[0-9a-fA-F]+ numbers
hex: bool = false,
/// Should highlight 0b[01]+ numbers
bin: bool = false,
/// Should highlight 0o[0-7]+ numbers
octal: bool = false,
/// Supports underscores in numeric literals
uscn: bool = false,
/// Should highlight strings
strings: bool = false,
/// Supports double-quoted strings
dquotes: bool = false,
/// Supports single-quoted strings
squotes: bool = false,
/// Highlight backticks as strings
backticks: bool = false,
/// Single-quotes are used for char literals instead
chars: bool = false,
/// Should highlight uppercase words
uppercase: bool = false,
};
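A short sketch of how such a type behaves, using a reduced Flags struct that I made up for the example: every field that isn't mentioned in the initializer keeps its default value of false, and since the struct is packed, each bool takes a single bit.

```zig
const std = @import("std");

// reduced version of SyntaxFlags, only for this example
const Flags = packed struct {
    numbers: bool = false,
    strings: bool = false,
    hex: bool = false,
};

test "flags default to false and are packed into bits" {
    const f: Flags = .{ .numbers = true, .hex = true };
    try std.testing.expect(f.numbers and f.hex);
    try std.testing.expect(!f.strings);
    // three bool fields occupy three bits
    try std.testing.expect(@bitSizeOf(Flags) == 3);
}
```

This is why the syntax definitions only need to list the flags they enable.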
Syntax definitions
I won't explain much of what goes on here: it's a list of syntax definitions, with their flags, keywords and so on. Create a module named syndefs.zig and copy-paste them.
You might note that all arrays have a & in front of them. That's because the Syntax type's fields are slices, for example []const []const u8. The only exception is mlcmt, because it has a fixed size, besides being optional.
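The difference between the two kinds of fields can be shown with a reduced type. Def is a made-up struct for this example, with one slice field and one optional fixed-size array, like ft_ext and mlcmt in Syntax.

```zig
const std = @import("std");

// made-up type with the two kinds of fields discussed above
const Def = struct {
    /// A slice: the number of elements is not part of the type
    ext: []const []const u8,
    /// A fixed-size array: the number of elements is part of the type
    pair: ?[2][]const u8,
};

const c_like: Def = .{
    .ext = &.{ "c", "h" }, // `&` takes the address of the array literal
    .pair = .{ "/*", "*/" }, // no `&`, the array is stored by value
};

const bare: Def = .{ .ext = &.{}, .pair = null };

test "slices and fixed-size arrays in a definition" {
    try std.testing.expect(c_like.ext.len == 2);
    try std.testing.expect(std.mem.eql(u8, c_like.ext[1], "h"));
    try std.testing.expect(c_like.pair.?[0].len == 2);
    try std.testing.expect(bare.ext.len == 0);
}
```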
//! Module that contains all syntax definitions.
pub const Syntaxes = [_]t.Syntax{
//////////////////////////////////////////////////////////////////////////
//// zig
.{
.ft_name = "zig",
.ft_ext = &.{
"zig",
"zon",
},
.ft_fntails = &.{},
.lcmt = &.{"//"},
.mlcmt = null,
.keywords = &.{
"addrspace", "align", "allowzero", "and",
"anyframe", "anytype", "catch", "const",
"else", "enum", "error", "fn",
"for", "if", "opaque", "or",
"orelse", "packed", "struct", "switch",
"try", "union", "usingnamespace", "var",
"volatile", "while",
},
.types = &.{
"i8", "u8", "i16", "u16",
"i32", "u32", "i64", "u64",
"i128", "u128", "isize", "usize",
"c_char", "c_short", "c_ushort", "c_int",
"c_uint", "c_long", "c_ulong", "c_longlong",
"c_ulonglong", "c_longdouble", "f16", "f32",
"f64", "f80", "f128", "bool",
"anyopaque", "void", "noreturn", "type",
"anyerror", "comptime_int", "comptime_float",
},
.builtin = &.{
"export", "extern", "noinline", "nosuspend",
"inline", "suspend", "async", "await",
"defer", "errdefer", "unreachable", "comptime",
"continue", "return", "resume", "threadlocal",
"callconv", "linksection", "asm", "noalias",
"test", "pub", "break",
},
.constant = &.{
"undefined", "true", "false", "null",
},
.preproc = &.{
"@addrSpaceCast", "@addWithOverflow", "@alignCast",
"@alignOf", "@as", "@atomicLoad",
"@atomicRmw", "@atomicStore", "@bitCast",
"@bitOffsetOf", "@bitSizeOf", "@branchHint",
"@breakpoint", "@mulAdd", "@byteSwap",
"@bitReverse", "@offsetOf", "@call",
"@cDefine", "@cImport", "@cInclude",
"@clz", "@cmpxchgStrong", "@cmpxchgWeak",
"@compileError", "@compileLog", "@constCast",
"@ctz", "@cUndef", "@cVaArg",
"@cVaCopy", "@cVaEnd", "@cVaStart",
"@divExact", "@divFloor", "@divTrunc",
"@embedFile", "@enumFromInt", "@errorFromInt",
"@errorName", "@errorReturnTrace", "@errorCast",
"@export", "@extern", "@field",
"@fieldParentPtr", "@FieldType", "@floatCast",
"@floatFromInt", "@frameAddress", "@hasDecl",
"@hasField", "@import", "@inComptime",
"@intCast", "@intFromBool", "@intFromEnum",
"@intFromError", "@intFromFloat", "@intFromPtr",
"@max", "@memcpy", "@memset",
"@min", "@wasmMemorySize", "@wasmMemoryGrow",
"@mod", "@mulWithOverflow", "@panic",
"@popCount", "@prefetch", "@ptrCast",
"@ptrFromInt", "@rem", "@returnAddress",
"@select", "@setEvalBranchQuota", "@setFloatMode",
"@setRuntimeSafety", "@shlExact", "@shlWithOverflow",
"@shrExact", "@shuffle", "@sizeOf",
"@splat", "@reduce", "@src",
"@sqrt", "@sin", "@cos",
"@tan", "@exp", "@exp2",
"@log", "@log2", "@log10",
"@abs", "@floor", "@ceil",
"@trunc", "@round", "@subWithOverflow",
"@tagName", "@This", "@trap",
"@truncate", "@Type", "@typeInfo",
"@typeName", "@TypeOf", "@unionInit",
"@Vector", "@volatileCast", "@workGroupId",
"@workGroupSize", "@workItemId",
},
.flags = .{
.numbers = true,
.strings = true,
.dquotes = true,
.chars = true,
.uppercase = true,
.hex = true,
.bin = true,
.octal = true,
.uscn = true,
},
},
//////////////////////////////////////////////////////////////////////////
//// c
.{
.ft_name = "c",
.ft_ext = &.{
"c", "h",
},
.ft_fntails = &.{},
.lcmt = &.{"//"},
.mlcmt = .{ "/*", " *", "*/" },
.keywords = &.{
"auto", "case", "const",
"default", "do", "else",
"enum", "extern", "for",
"goto", "if", "inline",
"register", "restrict", "sizeof",
"static", "struct", "switch",
"typedef", "union", "volatile",
"while", "_Alignas", "_Alignof",
"_Atomic", "_Generic", "_Noreturn",
"_Static_assert", "_Thread_local",
},
.types = &.{
"void", "char", "short", "int", "long",
"float", "double", "signed", "unsigned", "_Bool",
"_Complex", "_Imaginary", "size_t", "ptrdiff_t", "wchar_t",
"int8_t", "int16_t", "int32_t", "int64_t", "uint8_t",
"uint16_t", "uint32_t", "uint64_t", "intptr_t", "uintptr_t",
},
.builtin = &.{
"continue", "return", "break",
},
.constant = &.{
"NULL", "EOF", "true", "false", "TRUE", "FALSE",
},
.preproc = &.{
"#include", "#define", "#undef", "#ifdef",
"#ifndef", "#if", "#endif", "#else",
"#elif", "#line", "#error", "#pragma",
"#warning", "__FILE__", "__LINE__", "__DATE__",
"__TIME__", "__STDC__", "__STDC_VERSION__", "__func__",
"__FUNCTION__",
},
.flags = .{
.numbers = true,
.strings = true,
.dquotes = true,
.chars = true,
.uppercase = true,
.hex = true,
},
},
//////////////////////////////////////////////////////////////////////////
//// c++
.{
.ft_name = "cpp",
.ft_ext = &.{
"cpp", "cc", "cxx", "c++", "hpp", "hh", "hxx", "h++",
},
.ft_fntails = &.{},
.lcmt = &.{"//"},
.mlcmt = .{ "/*", " *", "*/" },
.keywords = &.{
"alignas", "alignof", "and", "and_eq",
"asm", "auto", "bitand", "bitor",
"case", "catch", "class", "compl",
"const", "consteval", "constexpr", "constinit",
"const_cast", "co_await", "co_return", "co_yield",
"decltype", "default", "delete", "do",
"dynamic_cast", "else", "enum", "explicit",
"export", "extern", "for", "friend",
"goto", "if", "inline", "mutable",
"namespace", "new", "noexcept", "not",
"not_eq", "operator", "or", "or_eq",
"private", "protected", "public", "register",
"reinterpret_cast", "requires", "sizeof", "static",
"static_assert", "static_cast", "struct", "switch",
"template", "this", "thread_local", "throw",
"try", "typedef", "typeid", "typename",
"union", "using", "virtual", "volatile",
"while", "xor", "xor_eq",
},
.types = &.{
"void", "char", "char8_t", "char16_t",
"char32_t", "wchar_t", "short", "int",
"long", "float", "double", "signed",
"unsigned", "bool", "size_t", "ptrdiff_t",
"int8_t", "int16_t", "int32_t", "int64_t",
"uint8_t", "uint16_t", "uint32_t", "uint64_t",
"intptr_t", "uintptr_t",
},
.builtin = &.{
"std", "override", "final", "concept",
"continue", "return", "break",
},
.constant = &.{
"nullptr", "true", "false",
},
.preproc = &.{
"#include", "#define", "#undef", "#ifdef",
"#ifndef", "#if", "#endif", "#else",
"#elif", "#line", "#error", "#pragma",
"#warning", "__cplusplus", "__FILE__", "__LINE__",
"__DATE__", "__TIME__", "__func__", "__FUNCTION__",
"__PRETTY_FUNCTION__",
},
.flags = .{
.numbers = true,
.strings = true,
.dquotes = true,
.chars = true,
.uppercase = true,
.hex = true,
.bin = true,
},
},
//////////////////////////////////////////////////////////////////////////
//// python
.{
.ft_name = "python",
.ft_ext = &.{
"py", "pyw", "pyx", "pxd", "pxi",
},
.ft_fntails = &.{},
.lcmt = &.{"#"},
.mlcmt = .{ "\"\"\"", "", "\"\"\"" },
.keywords = &.{
"and", "as", "assert", "class",
"def", "del", "elif", "else",
"except", "finally", "for", "from",
"global", "if", "import", "in",
"is", "lambda", "nonlocal", "not",
"or", "pass", "raise", "try",
"while", "with", "yield", "async",
"await",
},
.types = &.{
"int", "float", "complex", "str", "bytes",
"bytearray", "bool", "list", "tuple", "dict",
"set", "frozenset", "object", "type",
},
.builtin = &.{
"__debug__", "self", "cls",
"continue", "return", "break",
},
.constant = &.{
"True", "False", "None", "NotImplemented", "Ellipsis",
},
.preproc = &.{
"abs", "all", "any", "ascii",
"bin", "bool", "callable", "chr",
"classmethod", "compile", "delattr", "dir",
"divmod", "enumerate", "eval", "exec",
"filter", "format", "getattr", "globals",
"hasattr", "hash", "help", "hex",
"id", "input", "isinstance", "issubclass",
"iter", "len", "locals", "map",
"max", "memoryview", "min", "next",
"oct", "open", "ord", "pow",
"print", "property", "range", "repr",
"reversed", "round", "setattr", "slice",
"sorted", "staticmethod", "sum", "super",
"vars", "zip", "__import__",
},
.flags = .{
.numbers = true,
.strings = true,
.dquotes = true,
.squotes = true,
.uppercase = true,
.hex = true,
.bin = true,
.octal = true,
.uscn = true,
},
},
//////////////////////////////////////////////////////////////////////////
//// lua
.{
.ft_name = "lua",
.ft_ext = &.{
"lua",
},
.ft_fntails = &.{},
.lcmt = &.{"--"},
.mlcmt = .{ "--[[", "", "]]" },
.keywords = &.{
"and", "do", "else", "elseif", "end",
"for", "function", "if", "in", "local",
"not", "or", "repeat", "then", "until",
"while",
},
.types = &.{
"boolean", "number", "string", "userdata",
"function", "thread", "table",
},
.builtin = &.{
"_G", "_VERSION", "self",
"goto", "return", "break",
},
.constant = &.{
"true", "false", "nil",
},
.preproc = &.{
"assert", "collectgarbage", "dofile", "error",
"getfenv", "getmetatable", "ipairs", "load",
"loadfile", "loadstring", "next", "pairs",
"pcall", "print", "rawequal", "rawget",
"rawlen", "rawset", "require", "select",
"setfenv", "setmetatable", "tonumber", "tostring",
"type", "unpack", "xpcall", "coroutine",
"debug", "io", "math", "os",
"package", "string", "table",
},
.flags = .{
.numbers = true,
.strings = true,
.dquotes = true,
.squotes = true,
.uppercase = true,
.hex = true,
},
},
//////////////////////////////////////////////////////////////////////////
//// javascript
.{
.ft_name = "javascript",
.ft_ext = &.{
"js", "jsx", "mjs", "cjs",
},
.ft_fntails = &.{},
.lcmt = &.{"//"},
.mlcmt = .{ "/*", " *", "*/" },
.keywords = &.{
"case", "catch", "class", "const", "debugger",
"default", "delete", "do", "else", "export",
"extends", "finally", "for", "function", "if",
"import", "in", "instanceof", "let", "new",
"super", "switch", "this", "throw", "try",
"typeof", "var", "void", "while", "with",
"yield", "async", "await", "static", "get",
"set",
},
.types = &.{
"boolean", "number", "bigint", "string", "symbol",
"object", "Array", "Object", "Function", "String",
"Number", "Boolean", "Date", "RegExp", "Error",
"Map", "Set", "Promise", "Symbol", "BigInt",
},
.builtin = &.{
"globalThis", "console", "window", "document",
"global", "process", "continue", "return",
"break",
},
.constant = &.{
"true", "false", "null", "undefined", "NaN", "Infinity",
},
.preproc = &.{
"parseInt", "parseFloat", "isNaN",
"isFinite", "encodeURI", "encodeURIComponent",
"decodeURI", "decodeURIComponent", "eval",
"setTimeout", "setInterval", "clearTimeout",
"clearInterval", "JSON", "Math",
"console", "alert", "confirm",
"prompt", "require", "module",
"exports", "__dirname", "__filename",
},
.flags = .{
.numbers = true,
.strings = true,
.dquotes = true,
.squotes = true,
.uppercase = true,
.hex = true,
.bin = true,
.octal = true,
},
},
//////////////////////////////////////////////////////////////////////////
//// bash
.{
.ft_name = "bash",
.ft_ext = &.{
"sh", "bash", "zsh", "fish",
},
.ft_fntails = &.{ ".bashrc", ".bash_profile", ".zshrc", ".profile" },
.lcmt = &.{"#"},
.mlcmt = null,
.keywords = &.{
"if", "then", "else", "elif", "fi",
"case", "esac", "for", "while", "until",
"do", "done", "function", "select", "time",
"in", "break", "continue", "return", "exit",
"local", "readonly", "declare", "typeset", "export",
"unset", "shift", "set", "unalias", "alias",
"source", "eval", "exec", "trap", "wait",
"jobs", "bg", "fg", "disown", "suspend",
"kill", "killall", "nohup", "logout",
},
.types = &.{
"array", "string", "integer", "associative",
},
.builtin = &.{
"echo", "printf", "read", "test", "cd",
"pwd", "pushd", "popd", "dirs", "history",
"fc", "hash", "type", "which", "command",
"builtin", "enable", "help", "bind", "complete",
"compgen", "caller", "getopts", "let", "mapfile",
"readarray", "ulimit", "umask", "shopt", "times",
},
.constant = &.{
"true", "false",
},
.preproc = &.{
"$0", "$1", "$2", "$3",
"$4", "$5", "$6", "$7",
"$8", "$9", "$@", "$*",
"$#", "$", "$!", "$?",
"$-", "$_", "$HOME", "$PATH",
"$PWD", "$OLDPWD", "$USER", "$UID",
"$SHELL", "$TERM", "$LANG", "$LC_ALL",
"$TMPDIR", "$IFS", "$PS1", "$PS2",
"$PS3", "$PS4", "$PROMPT_COMMAND", "$BASH",
"$BASH_VERSION", "$BASHPID", "$BASH_SUBSHELL", "$LINENO",
"$FUNCNAME", "$BASH_SOURCE", "$BASH_LINENO", "$SECONDS",
"$RANDOM", "$REPLY", "$OPTARG", "$OPTIND",
"$HOSTNAME", "$HOSTTYPE", "$MACHTYPE", "$OSTYPE",
"$PIPESTATUS", "$SHELLOPTS", "$BASHOPTS",
},
.flags = .{
.numbers = true,
.strings = true,
.dquotes = true,
.squotes = true,
.backticks = true,
.uppercase = true,
},
},
//////////////////////////////////////////////////////////////////////////
//// gitconfig
.{
.ft_name = "gitconfig",
.ft_ext = &.{},
.ft_fntails = &.{ ".gitconfig", ".git/config" },
.lcmt = &.{ "#", ";" },
.mlcmt = null,
.keywords = &.{
"auto", "always", "never", "local", "global", "system",
"worktree",
},
.types = &.{
"core", "user", "remote", "branch",
"merge", "push", "pull", "fetch",
"alias", "color", "diff", "log",
"status", "commit", "tag", "rebase",
"rerere", "submodule", "credential", "http",
"https", "url", "init", "clone",
"gc", "fsck", "pack", "receive",
"transfer", "uploadpack", "uploadarchive", "advice",
"apply", "blame", "browser", "clean",
"column", "format", "grep", "gui",
"help", "i18n", "imap", "instaweb",
"interactive", "mailinfo", "mailmap", "man",
"notes", "pager", "pretty", "protocol",
"sendemail", "sequence", "showbranch", "web",
},
.builtin = &.{
"HEAD", "FETCH_HEAD", "ORIG_HEAD", "MERGE_HEAD",
"master", "main", "origin", "upstream",
"refs", "heads", "tags", "remotes",
},
.constant = &.{
"true", "false", "yes", "no", "on", "off",
},
.preproc = &.{
"name", "email",
"editor", "pager",
"excludesfile", "attributesfile",
"hooksPath", "templatedir",
"gitProxy", "sshCommand",
"askpass", "autocrlf",
"safecrlf", "filemode",
"ignorecase", "precomposeUnicode",
"hideDotFiles", "symlinks",
"bare", "worktree",
"logAllRefUpdates", "repositoryformatversion",
"sharedrepository", "denyCurrentBranch",
"denyNonFastforwards", "fsckObjects",
"transferFsckObjects", "receivefsckObjects",
"allowTipSHA1InWant", "allowReachableSHA1InWant",
"allowAnySHA1InWant", "advertiseRefs",
"allowUnadvertisedObjectRequest", "keepAlive",
"maxStartups", "timeout",
"uploadpack", "uploadarchive",
},
.flags = .{
.numbers = true,
.strings = true,
.dquotes = true,
.squotes = true,
},
},
};
const t = @import("types.zig");
New string functions
We'll complete our string module with a new set of functions that we'll use for syntax highlighting.
str.eql
We could just call mem.eql(u8, ...) everywhere; this is just a shorthand. I don't know if it's good practice, but we'll call it many times and the meaning is immediately obvious, so it's ok for me.
/// Return `true` if slices have the same content.
pub fn eql(a: []const u8, b: []const u8) bool {
    return mem.eql(u8, a, b);
}
str.isTail
This is used for filetype detection.
/// Return `true` if the tail of haystack is exactly `needle`.
pub fn isTail(haystack: []const u8, needle: []const u8) bool {
    const idx = mem.lastIndexOfLinear(u8, haystack, needle);
    return idx != null and idx.? + needle.len == haystack.len;
}
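As far as I can tell, the standard library function std.mem.endsWith performs the same check, so the function could also be written as in the sketch below. I keep the name isTail only to make the comparison easier; the paths are made up for the example.

```zig
const std = @import("std");

/// Alternative sketch of isTail, based on std.mem.endsWith.
fn isTail(haystack: []const u8, needle: []const u8) bool {
    return std.mem.endsWith(u8, haystack, needle);
}

test "tail of a path" {
    try std.testing.expect(isTail("/home/me/repo/.git/config", ".git/config"));
    try std.testing.expect(!isTail("/home/me/config", ".git/config"));
    try std.testing.expect(isTail("/home/me/.bashrc", ".bashrc"));
}
```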
str.getExtension
Also used for filetype detection.
/// Get the extension of a filename.
pub fn getExtension(path: []u8) ?[]u8 {
    const ix = mem.lastIndexOfScalar(u8, path, '.');
    if (ix == null or ix == path.len - 1) {
        return null;
    }
    return path[ix.? + 1 ..];
}
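To see what the function returns in a few typical cases, here is a sketch with the same logic. extensionOf is a variation written for this example: it takes a const slice, so that it can be called with string literals.

```zig
const std = @import("std");

/// Same logic as getExtension, but with const slices (variation written
/// for this example, so that string literals can be passed).
fn extensionOf(path: []const u8) ?[]const u8 {
    const ix = std.mem.lastIndexOfScalar(u8, path, '.') orelse return null;
    if (ix == path.len - 1) return null; // the dot is the last character
    return path[ix + 1 ..];
}

test "extension of a filename" {
    try std.testing.expect(std.mem.eql(u8, extensionOf("main.zig").?, "zig"));
    try std.testing.expect(extensionOf("Makefile") == null);
    try std.testing.expect(extensionOf("notes.") == null);
    // a leading dot counts as well
    try std.testing.expect(std.mem.eql(u8, extensionOf(".bashrc").?, "bashrc"));
    // the last dot wins, even if it belongs to a directory name
    try std.testing.expect(std.mem.eql(u8, extensionOf("my.dir/file").?, "dir/file"));
}
```

The last two cases are limits of this simple approach. For dotfiles it doesn't matter much, since the syntax definitions list names such as .bashrc under ft_fntails.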
str.isSeparator
This one is very similar to str.isWord. It's actually the opposite. I only make it a different function to be able to check whitespace before other characters, since whitespace is the most common way to separate words, and should be prioritized when deciding if something is a separator or not.
But I'm not sure it really makes a difference. If it doesn't, this function should be removed and !str.isWord() used instead.
/// Return true if character is a separator (not a word character).
pub fn isSeparator(c: u8) bool {
    if (c == ' ' or c == '\t') return true;
    return switch (c) {
        '0'...'9', 'a'...'z', 'A'...'Z', '_' => false,
        else => true,
    };
}
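To give an idea of how such a function can be used, the sketch below scans a line to find where a word ends. wordEnd is a hypothetical helper written for this example; it's not necessarily how the highlighter will use isSeparator.

```zig
const std = @import("std");

/// Copy of isSeparator from above, to keep the example self-contained.
fn isSeparator(c: u8) bool {
    if (c == ' ' or c == '\t') return true;
    return switch (c) {
        '0'...'9', 'a'...'z', 'A'...'Z', '_' => false,
        else => true,
    };
}

/// Return the index just past the word that starts at `start`
/// (hypothetical helper, for illustration).
fn wordEnd(line: []const u8, start: usize) usize {
    var i = start;
    while (i < line.len and !isSeparator(line[i])) : (i += 1) {}
    return i;
}

test "find the end of a word" {
    const line = "const x_1 = foo(42);";
    try std.testing.expect(wordEnd(line, 0) == 5);
    try std.testing.expect(std.mem.eql(u8, line[6..wordEnd(line, 6)], "x_1"));
    // starting on a separator: the word is empty
    try std.testing.expect(wordEnd(line, 5) == 5);
}
```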
Digression: inline keyword
inline
with functions
The inline calling convention forces a function to be inlined at all call sites.
If the function cannot be inlined, it is a compile-time error.
This is what the creator of the Zig language wrote:
It’s best to let the compiler decide when to inline a function, except for these scenarios:
- You want to change how many stack frames are in the call stack, for debugging purposes
- You want the comptime-ness of the arguments to propagate to the return value of the function
- Performance measurements demand it. Don’t guess!
Otherwise you actually end up restricting what the compiler is allowed to do when you use inline which can harm binary size, compilation speed, and even runtime performance.
So basically he's recommending not to use it unless you have a good and measurable reason to do so.
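The second scenario (comptime-ness propagating to the return value) is easier to see in code. This sketch follows the example given in the language reference; the function name add is just a placeholder:

```zig
const std = @import("std");

// Because `add` is inline, a call with comptime-known arguments produces a
// comptime-known result, so the `if` below is decided at compile time and
// the @compileError branch is never analyzed.
inline fn add(a: i32, b: i32) i32 {
    return a + b;
}

test "inline function call" {
    if (add(1200, 34) != 1234) {
        @compileError("bad");
    }
}
```

Without the inline keyword the result would be a runtime value, both branches would be analyzed, and compilation would stop at the @compileError.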
Other uses of inline
From the official language reference:
Other uses of inline (inline for and inline while) are very different, because they unroll a loop at compile time, so that its body can work with compile-time values. I've never used them, since I never felt the need for them, so I can't tell you more.
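For the curious, here is a minimal sketch of an inline for, along the lines of what the language reference shows. The function totalSize is invented for this example; the point is only that the loop variable is a type, something a normal runtime loop could not iterate over:

```zig
const std = @import("std");

/// Sum the sizes in bytes of a fixed list of types.
fn totalSize() usize {
    var total: usize = 0;
    // The loop is unrolled at compile time, so `T` is a comptime-known type
    // in every iteration.
    inline for (.{ u8, u16, u32 }) |T| {
        total += @sizeOf(T);
    }
    return total;
}

test "inline for over types" {
    try std.testing.expectEqual(@as(usize, 7), totalSize());
}
```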
Filetype detection
We'll write a function named selectSyntax to detect and set the buffer syntax. This function will be invoked in two places:
- in openFile():
B.filename = try e.updateString(B.filename, path);
B.syntax = try e.selectSyntax();
- in saveFile(), so that we can set a syntax for newly created files, after we give them a name:
B.syntax = try e.selectSyntax();
// determine number of bytes to write, make room for \n characters
var fsize: usize = B.rows.items.len;
The selectSyntax() function
I put this in the "Syntax highlighting" section.
We start by freeing the old syntax, then we try to assign it again. For now unnamed buffers can't set a syntax, but it will be selected when the buffer is named and saved.
/// Return the syntax name for the current file, or null.
fn selectSyntax(e: *Editor) !?[]const u8 {
var B = &e.buffer;
// free the old syntax, if any
t.freeOptional(e.alc, B.syntax);
B.syntax = null;
// we might allow setting a syntax even without a filename, actually...
// but for now it's not possible
if (B.filename == null) {
return null;
}
// code to come...
}
We get the extension of the filename, then we loop over all syntax definitions to see if any of them matches that extension.
For each syntax definition, if none of its extensions matches, we match its filename tails against the tail of the filename.
const fileExt = str.getExtension(B.filename.?);
for (&syndefs.Syntaxes) |*syntax| {
if (fileExt) |extension| {
for (syntax.ft_ext) |ext| {
if (str.eql(ext, extension)) {
B.syndef = syntax;
return try e.alc.dupe(u8, syntax.ft_name);
}
}
}
for (syntax.ft_fntails) |name| {
if (str.isTail(B.filename.?, name)) {
B.syndef = syntax;
return try e.alc.dupe(u8, syntax.ft_name);
}
}
}
return null;
Needed constants:
const syndefs = @import("syndefs.zig");
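To try the matching logic in isolation, here is a standalone sketch. The Syntax struct and the two definitions are stripped-down stand-ins for what lives in syndefs.zig, and findSyntax is a hypothetical name; no allocation or editor state is involved:

```zig
const std = @import("std");

/// Reduced stand-in for the real syntax definition.
const Syntax = struct {
    ft_name: []const u8,
    ft_ext: []const []const u8,
    ft_fntails: []const []const u8,
};

const syntaxes = [_]Syntax{
    .{ .ft_name = "zig", .ft_ext = &.{"zig"}, .ft_fntails = &.{} },
    .{ .ft_name = "make", .ft_ext = &.{"mk"}, .ft_fntails = &.{"Makefile"} },
};

/// Return the first definition matching the extension or the filename tail.
fn findSyntax(filename: []const u8) ?*const Syntax {
    const ext: ?[]const u8 = if (std.mem.lastIndexOfScalar(u8, filename, '.')) |ix|
        filename[ix + 1 ..]
    else
        null;
    for (&syntaxes) |*syntax| {
        if (ext) |extension| {
            for (syntax.ft_ext) |candidate| {
                if (std.mem.eql(u8, candidate, extension)) return syntax;
            }
        }
        for (syntax.ft_fntails) |tail| {
            if (std.mem.endsWith(u8, filename, tail)) return syntax;
        }
    }
    return null;
}

test "findSyntax by extension and by tail" {
    try std.testing.expectEqualStrings("zig", findSyntax("src/main.zig").?.ft_name);
    try std.testing.expectEqualStrings("make", findSyntax("/tmp/Makefile").?.ft_name);
    try std.testing.expect(findSyntax("notes.txt") == null);
}
```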
Doing the highlight
First, let's add a new option, which controls globally if syntax highlighting should be done or not:
/// Enable syntax highlighting
pub var syntax = true;
In updateHighlight, we'll return early if the buffer has no filetype, or this option is disabled.
// reset the row highlight to normal
row.hl = try e.alc.realloc(row.hl, row.render.len);
@memset(row.hl, .normal);
if (e.buffer.syntax == null or opt.syntax == false) {
return;
}
We do the highlight of the whole rendered row. This is certainly not ideal, because certain files have very long lines, and only a part of each line is actually visible. At the same time, if we restrict parsing to only what we can see, we will certainly have bad highlight in all those cases where the highlight of a character depends on what precedes it, or even follows.
We could try to do it anyway and add some safety margin, both on the left and the right side of the rendered part of the line, so that parsing starts before coloff and ends after coloff + screen.cols, but it wouldn't be perfect (think of very long line comments).
We could make it optional, to have a fast highlight mode, but we can't change options inside the editor.
Doing it properly would need some serious changes, but we'll pass this time. I said it is a toy editor for reasons, and this isn't the only one.
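Just to make the "safety margin" idea concrete, this is roughly how the parsing window could be computed. It's only a sketch of something we are not implementing: parseWindow, Window and the margin parameter are all hypothetical.

```zig
const std = @import("std");

const Window = struct { start: usize, end: usize };

/// Hypothetical helper: the part of a row to parse when only `cols` columns
/// starting at `coloff` are visible, widened by `margin` on both sides.
fn parseWindow(rowlen: usize, coloff: usize, cols: usize, margin: usize) Window {
    // saturating operators avoid underflow and overflow at the edges
    const start = @min(rowlen, coloff -| margin);
    const end = @min(rowlen, coloff +| cols +| margin);
    return .{ .start = start, .end = end };
}

test "window is clamped to the row" {
    const w = parseWindow(1000, 100, 80, 20);
    try std.testing.expectEqual(@as(usize, 80), w.start);
    try std.testing.expectEqual(@as(usize, 200), w.end);
}
```

A line comment starting before the window would still be missed, which is the imperfection mentioned above.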
Top-level symbols
Before we start the loop that iterates over all characters of the rendered row, we define some constants and variables.
The most important one is prev_sep: it controls when we can start to parse something new. If this variable isn't set correctly where it needs to be, highlighting will likely be broken.
in_string, which tells us if we're in a string or not, is checked early since inside strings we should ignore everything else, except escaped characters (for which we have an escaped variable).
Similarly for in_mlcomment: also in this case we don't parse anything until we find the sequence that closes the comment.
//////////////////////////////////////////
// Top-level symbols
//////////////////////////////////////////
// length of the rendered row
const rowlen = row.render.len;
// syntax definition
const s = e.buffer.syndef.?;
// line comment leader
const lc = s.lcmt;
// multiline comment leaders
const mlc = s.mlcmt;
// syntax flags
const flags = s.flags;
// character is preceded by a separator
var prev_sep = true;
// character is preceded by a backslash
var escaped = false;
// character is inside a string or char literal
var in_string = false;
var in_char = false;
var delimiter: u8 = 0;
// line is in a multiline comment
var in_mlcomment = ix > 0 and e.buffer.rows.items[ix - 1].ml_comment;
// all keywords in the syntax definition, subdivided by kinds
// each kind has its own specific highlight
const all_syn_keywords = [_]struct {
kind: []const []const u8, // array with keywords of some kind
hl: t.Highlight,
}{
.{ .kind = s.keywords, .hl = t.Highlight.keyword },
.{ .kind = s.types, .hl = t.Highlight.types },
.{ .kind = s.builtin, .hl = t.Highlight.builtin },
.{ .kind = s.constant, .hl = t.Highlight.constant },
.{ .kind = s.preproc, .hl = t.Highlight.preproc },
};
The top-level loop
We'll have multiple nested loops, so we will use labels to break to an outer loop. The top-level loop has the toplevel label.
We'll use labels for all loops, and all break and continue statements. This way it should be clearer from which loop we're breaking.
At the bottom of the top-level loop we'll increase the character index and set the critical prev_sep variable.
The first thing we do is skip whitespace, which is also a valid separator.
var i: usize = 0;
toplevel: while (i < rowlen) {
if (asc.isWhitespace(row.render[i])) { // skip whitespaces
prev_sep = true;
i += 1;
continue :toplevel;
}
// rest of parsing goes here...
prev_sep = str.isSeparator(row.render[i]);
i += 1;
}
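To get a feeling for how prev_sep drives the loop before any real parsing is added, here is a standalone sketch with the same skeleton. Instead of highlighting, it counts the positions where a new token could start; countTokenStarts is a made-up name and isSeparator is a whitespace-free copy of the one in our string module:

```zig
const std = @import("std");

fn isSeparator(c: u8) bool {
    return switch (c) {
        '0'...'9', 'a'...'z', 'A'...'Z', '_' => false,
        else => true,
    };
}

/// Count the word characters that are preceded by a separator.
fn countTokenStarts(line: []const u8) usize {
    var count: usize = 0;
    var prev_sep = true; // start of line counts as a separator
    var i: usize = 0;
    toplevel: while (i < line.len) {
        if (std.ascii.isWhitespace(line[i])) { // skip whitespace
            prev_sep = true;
            i += 1;
            continue :toplevel;
        }
        // "rest of parsing": here we only count
        if (prev_sep and !isSeparator(line[i])) count += 1;

        prev_sep = isSeparator(line[i]);
        i += 1;
    }
    return count;
}

test "tokens start only after a separator" {
    try std.testing.expectEqual(@as(usize, 3), countTokenStarts("foo bar_baz+qux"));
}
```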
Multi-line comments
Remember that we had, when defining constants and variables:
// line is in a multiline comment
var in_mlcomment = ix > 0 and e.buffer.rows.items[ix - 1].ml_comment;
If we have ML comments...
Our mlcmt field is an optional field, so we must check if it's null.
If not null, it's an array of three strings ([3][]const u8) with start marker, middle marker and end marker.
// ML comments
if (mlc != null and mlc.?.len > 0 and !in_string) {
const mc = mlc.?;
If we are in a ML comment...
... this we can know because in_mlcomment is true if the previous row's ml_comment field is true.
In this case we paint the character as ML comment, and keep looking for the end marker.
if (in_mlcomment) {
const len = mc[2].len;
row.hl[i] = t.Highlight.mlcomment;
If we do find the end marker...
... then in_mlcomment becomes false. After the marker, normal parsing resumes in this row.
We don't break out of the top-level loop, we continue it, because unlike line comments, multi-line ones can end in the same line where they started.
if (i + len <= rowlen and str.eql(row.render[i .. i + len], mc[2])) { // END
@memset(row.hl[i .. i + len], t.Highlight.mlcomment);
i += len;
in_mlcomment = false;
prev_sep = true;
continue :toplevel;
}
If we don't find the end marker...
... then in_mlcomment keeps being true also for this line. We keep painting everything as ML comment.
else {
i += 1;
continue :toplevel;
}
}
If we aren't in a ML comment yet...
... and we find the start marker, in_mlcomment becomes true. From then onwards, characters are painted as ML comment.
else {
const len = mc[0].len;
if (i + len <= rowlen and str.eql(row.render[i .. i + len], mc[0])) { // START
@memset(row.hl[i .. i + len], t.Highlight.mlcomment);
i += len;
in_mlcomment = true;
continue :toplevel;
}
}
}
If the comment is closed before the end of this row, the following row will start with in_mlcomment set to false.
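Stripped of the highlighting, the logic above is a small two-state machine. The following standalone sketch only tracks the state; the markers are hard-coded to the C-style ones as an assumption, strings are ignored, and endsInComment is a hypothetical name:

```zig
const std = @import("std");

/// Return the comment state at the end of `line`, given the state at its start.
fn endsInComment(line: []const u8, start_state: bool) bool {
    var in_mlcomment = start_state;
    var i: usize = 0;
    while (i < line.len) {
        // inside a comment we look for the end marker, outside for the start
        const marker: []const u8 = if (in_mlcomment) "*/" else "/*";
        if (std.mem.startsWith(u8, line[i..], marker)) {
            in_mlcomment = !in_mlcomment;
            i += marker.len;
        } else {
            i += 1;
        }
    }
    return in_mlcomment;
}

test "comment state across rows" {
    try std.testing.expect(endsInComment("a /* b", false)); // opened, not closed
    try std.testing.expect(endsInComment("still inside", true)); // unchanged
    try std.testing.expect(!endsInComment("c */ d", true)); // closed here
}
```

The value returned for one row is what the next row starts with, which is exactly the role of row.ml_comment.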
A change in comment state triggers a chain update
Normally we only update the row that has changed, but for multi-line patterns, we must update following rows too, otherwise their highlight would stay the same.
We must keep updating following rows, until the value of in_mlcomment matches the value of row.ml_comment: only in this case we know that the row wasn't affected by the multi-line pattern. Only then we can stop the chain of row updates.
This is done at the very bottom of the updateHighlight function. Add it now, so that you can have a clearer picture.
// If a multiline comment state has changed (either a comment started, or
// a previous one has been closed) we must update the following row, which
// will in turn update others, until all rows affected by the comment are
// updated.
const mlc_state_changed = row.ml_comment != in_mlcomment;
row.ml_comment = in_mlcomment;
if (mlc_state_changed and ix + 1 < e.buffer.rows.items.len) {
try e.updateHighlight(ix + 1);
}
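If you still didn't get it, imagine 10 rows, no ML comments. Their row.ml_comment is false. Open a comment in the first row: its ml_comment becomes true, which triggers the update of the second row, whose ml_comment also flips to true, and so on until the last row. Close the comment in the fifth row, and the rows after it flip back to false in the same way.
This chain update is probably inefficient, since after the rows that follow are updated, they will be updated again when it's their turn in drawRows() to be updated. We could use a Buffer field to track how many lines could skip the update, because they've been updated this way. We're not doing it, though.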
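The chain can also be simulated without any editor around it. In this sketch each row is reduced to whether it opens a comment, closes one, or does neither; Row, kind and update are stand-ins invented for the example, and the counter only shows how many rows get touched:

```zig
const std = @import("std");

const Row = struct {
    kind: enum { plain, opens, closes },
    ml_comment: bool = false, // comment state at the end of the row
};

/// Recompute the state of row `ix`; continue with the next row if it changed.
fn update(rows: []Row, ix: usize, updates: *usize) void {
    updates.* += 1;
    const row = &rows[ix];
    const start = ix > 0 and rows[ix - 1].ml_comment;
    const in_mlcomment = switch (row.kind) {
        .plain => start,
        .opens => true,
        .closes => false,
    };
    const mlc_state_changed = row.ml_comment != in_mlcomment;
    row.ml_comment = in_mlcomment;
    if (mlc_state_changed and ix + 1 < rows.len) update(rows, ix + 1, updates);
}

test "the chain stops at the first unaffected row" {
    var rows = [_]Row{
        .{ .kind = .opens }, .{ .kind = .plain },
        .{ .kind = .closes }, .{ .kind = .plain },
    };
    var updates: usize = 0;
    update(&rows, 0, &updates);
    try std.testing.expect(rows[0].ml_comment and rows[1].ml_comment);
    try std.testing.expect(!rows[2].ml_comment and !rows[3].ml_comment);
    try std.testing.expectEqual(@as(usize, 3), updates); // the last row was never touched
}
```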
Line comments
For line comments we just check we aren't in a string or in a multiline comment, and we look for the comment leader. If found, the rest of the line is a comment, no need to continue parsing this line.
// single-line comment
if (lc.len > 0 and !in_string and !in_mlcomment) {
for (lc) |ldr| {
if (i + ldr.len <= rowlen and str.eql(row.render[i .. i + ldr.len], ldr)) {
@memset(row.hl[i..], t.Highlight.comment);
break :toplevel;
}
}
}
Strings
Highlighting of strings is controlled by Syntax.flags.strings, but that's not enough. Syntaxes can support double quoted strings, single quoted strings, backticks as strings or char literals, or more often a combination of them.
in_string and in_char differ because the highlight is different (string vs number). Moreover different delimiters must be handled independently: if a double quote is found and a string starts, a single quote after that is still part of the string. Same is true for double quotes after single quotes.
Whatever the delimiter and the string type, an escaped character is an escaped character, and it gets the .escape highlight, together with the escaping backslash.
If the start of a string or a char literal is found, delimiter is set to the character, and the appropriate highlight is set until delimiter is found again.
Multi-line strings aren't supported.
if (flags.strings) {
if (in_string or in_char) {
if (escaped or row.render[i] == '\\') {
escaped = !escaped;
row.hl[i] = t.Highlight.escape;
}
else {
row.hl[i] = if (in_char) t.Highlight.number else t.Highlight.string;
if (row.render[i] == delimiter) {
in_string = false;
in_char = false;
}
}
i += 1;
continue :toplevel;
}
else if (flags.dquotes and row.render[i] == '"'
or flags.squotes and row.render[i] == '\''
or flags.backticks and row.render[i] == '`') {
in_string = true;
delimiter = row.render[i];
row.hl[i] = t.Highlight.string;
i += 1;
continue :toplevel;
}
else if (flags.chars and row.render[i] == '\'') {
in_char = true;
delimiter = row.render[i];
row.hl[i] = t.Highlight.number;
i += 1;
continue :toplevel;
}
}
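Here is the same state machine in a reduced, standalone form that can be tested on its own. It assumes only double and single quoted strings (no flags, no char literals), and the Hl enum and highlightStrings function are simplified stand-ins, not the editor's real types:

```zig
const std = @import("std");

const Hl = enum { normal, string, escape };

/// Fill `hl` (same length as `line`) with string and escape highlights.
fn highlightStrings(line: []const u8, hl: []Hl) void {
    var in_string = false;
    var escaped = false;
    var delimiter: u8 = 0;
    for (line, 0..) |c, i| {
        if (in_string) {
            if (escaped or c == '\\') {
                // the backslash and the character after it
                escaped = !escaped;
                hl[i] = .escape;
            } else {
                hl[i] = .string;
                if (c == delimiter) in_string = false;
            }
        } else if (c == '"' or c == '\'') {
            in_string = true;
            delimiter = c;
            hl[i] = .string;
        } else {
            hl[i] = .normal;
        }
    }
}

test "escaped quote and foreign quote stay inside the string" {
    const line = "x \"a\\\"'\" y"; // x "a\"'" y
    var hl: [line.len]Hl = undefined;
    highlightStrings(line, &hl);
    try std.testing.expect(hl[0] == .normal);
    try std.testing.expect(hl[2] == .string); // opening quote
    try std.testing.expect(hl[4] == .escape and hl[5] == .escape); // \"
    try std.testing.expect(hl[6] == .string); // single quote inside
    try std.testing.expect(hl[7] == .string and hl[8] == .normal); // closed
}
```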
Numbers
Parsing numbers depends on syntax flags: different filetypes support different number formats. The formats we support are:
type | format | flag |
---|---|---|
integers | N | numbers |
floats | N.N([eE]N)? | numbers |
hex | 0[xX]N | hex |
octal | 0[oO]N | octal |
binary | 0[bB]N | bin |
Integers and floats are always parsed if flags.numbers is true.
First we check if it's some special number notation (hex, octal, binary). If true, we set the appropriate boolean variable and advance the index by 2 characters.
// numbers
if (flags.numbers and prev_sep) {
var prev_digit = false;
var is_float = false;
var has_exp = false;
var is_hex = false;
var is_bin = false;
var is_octal = false;
var NaN = false;
const begin = i;
// hex, binary, octal notations
if (i + 1 < rowlen) {
if (row.render[i] == '0') {
switch (row.render[i + 1]) {
'x', 'X' => if (flags.hex) {
is_hex = true;
i += 2;
},
'b', 'B' => if (flags.bin) {
is_bin = true;
i += 2;
},
'o', 'O' => if (flags.octal) {
is_octal = true;
i += 2;
},
else => {},
}
}
}
Then we parse the actual number. What counts as a digit depends on the type that has been detected. If it's not a special notation, we only accept digits and a dot.
The variable prev_digit is true if the previous character was a valid digit for the type. This variable must be true at the end of the parsing, or this simply isn't a number.
If flags.uscn is true, we also accept underscores as separator. They are part of the number, but they aren't digits themselves, so if they aren't followed by a digit, it won't be a number.
For the dot, it's similar: it requires to be followed by a digit, otherwise it's a simple separator. Not only that, but there can be only one dot in the number. Finding a dot the first time sets is_float, finding it twice means it's not a number.
Same goes for e/E (exponent): they must be followed by digits, and may not appear more than once. If it's a hex digit, though, e/E are digits, not exponents.
// accept consecutive digits, or a dot followed by a number
digits: while (true) : (i += 1) {
if (i == rowlen) break :digits;
switch (row.render[i]) {
'0'...'1' => prev_digit = true,
// invalid for binary numbers
'2'...'7' => {
if (!is_bin) {
prev_digit = true;
}
else {
prev_digit = false;
break :digits;
}
},
// invalid for binary and octal numbers
'8'...'9' => {
if (!is_bin and !is_octal) {
prev_digit = true;
}
else {
prev_digit = false;
break :digits;
}
},
// underscores as delimiters in numeric literals
'_' => {
if (prev_digit and flags.uscn) {
prev_digit = false;
}
else {
break :digits;
}
},
// could be an exponent, or a hex digit
'e', 'E' => {
if (is_float and !has_exp) {
has_exp = true;
prev_digit = false;
}
else if (is_hex) {
prev_digit = true;
}
else {
break :digits;
}
},
// hex digits
'a'...'d', 'f', 'A'...'D', 'F' => {
if (is_hex) prev_digit = true else break :digits;
},
// floating point
'.' => {
prev_sep = true;
prev_digit = false;
if (!is_float and !is_hex and !is_bin and !is_octal) {
is_float = true;
}
else {
break :digits;
}
},
else => break :digits,
}
}
After the loop ends, because a character has been found that is not valid for the type of number, we check if it's actually a number:
- last character must be a valid digit
- it must be followed by either a separator or end of line
We must also set the very important prev_sep variable, which controls whether the following characters may be parsed as new tokens, or as part of the previous one, or not at all. In this case, since we only have keywords left to parse, if this is false the keyword check is skipped for the current character.
If end of line has been reached we stop.
// previous separator could be invalid if any character was
// processed
prev_sep = i == begin or str.isSeparator(row.render[i - 1]);
// no matter the type of number, last character should be a digit
if (!prev_digit) {
NaN = true;
}
// after our number comes something that isn't a separator
else if (i != rowlen and !str.isSeparator(row.render[i])) {
NaN = true;
}
if (!NaN) {
for (begin..i) |idx| {
row.hl[idx] = t.Highlight.number;
}
}
}
if (i == rowlen) break :toplevel;
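The rules are easier to verify on a reduced version. This standalone sketch handles only decimal numbers, floats with a single dot, hex, and underscores (always allowed here, as if flags.uscn were set); exponents, binary and octal are left out. numberLen is a hypothetical helper that returns how many characters would be highlighted:

```zig
const std = @import("std");

fn isSeparator(c: u8) bool {
    return switch (c) {
        '0'...'9', 'a'...'z', 'A'...'Z', '_' => false,
        else => true,
    };
}

/// Length of the number at the start of `text`, or 0 if there is none.
fn numberLen(text: []const u8) usize {
    var i: usize = 0;
    var is_hex = false;
    var is_float = false;
    var prev_digit = false;
    if (text.len > 1 and text[0] == '0' and (text[1] == 'x' or text[1] == 'X')) {
        is_hex = true;
        i = 2;
    }
    digits: while (i < text.len) : (i += 1) {
        switch (text[i]) {
            '0'...'9' => prev_digit = true,
            'a'...'f', 'A'...'F' => {
                if (is_hex) prev_digit = true else break :digits;
            },
            '_' => {
                if (prev_digit) prev_digit = false else break :digits;
            },
            '.' => {
                prev_digit = false; // a digit must follow
                if (is_float or is_hex) break :digits;
                is_float = true;
            },
            else => break :digits,
        }
    }
    // last character must be a digit
    if (!prev_digit) return 0;
    // and it must be followed by a separator or end of line
    if (i != text.len and !isSeparator(text[i])) return 0;
    return i;
}

test "what counts as a number" {
    try std.testing.expectEqual(@as(usize, 2), numberLen("42;"));
    try std.testing.expectEqual(@as(usize, 4), numberLen("3.14"));
    try std.testing.expectEqual(@as(usize, 7), numberLen("0xFF_FF "));
    try std.testing.expectEqual(@as(usize, 0), numberLen("1.2.3"));
    try std.testing.expectEqual(@as(usize, 0), numberLen("12abc"));
}
```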
Keywords
Remember the constant we set before the top-level loop started:
// all keywords in the syntax definition, subdivided by kinds
// each kind has its own specific highlight
const all_syn_keywords = [_]struct {
kind: []const []const u8, // array with keywords of some kind
hl: t.Highlight,
}{
.{ .kind = s.keywords, .hl = t.Highlight.keyword },
.{ .kind = s.types, .hl = t.Highlight.types },
.{ .kind = s.builtin, .hl = t.Highlight.builtin },
.{ .kind = s.constant, .hl = t.Highlight.constant },
.{ .kind = s.preproc, .hl = t.Highlight.preproc },
};
Now we iterate this array. Each element of this array is a set of keywords ([]const []const u8), together with the highlight which they will use.
We loop over each of these sets of keywords: if we find that what follows the current position is a keyword, followed in turn by a separator or the end of the row, we set the highlight and advance the position by the keyword length.
// keywords
if (prev_sep) {
kwloop: for (all_syn_keywords) |keywords| {
for (keywords.kind) |kw| {
const kwend = i + kw.len; // index where keyword would end
// separator or end of row after keyword
if ((kwend < rowlen and str.isSeparator(row.render[kwend]))
or kwend == rowlen)
{
if (str.eql(row.render[i..kwend], kw)) {
@memset(row.hl[i..kwend], keywords.hl);
i += kw.len;
break :kwloop;
}
}
}
}
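The boundary check is the part that is easy to get wrong, so here is the match for a single set of keywords as a standalone sketch. keywordAt is a hypothetical helper that returns the matched keyword instead of setting a highlight:

```zig
const std = @import("std");

fn isSeparator(c: u8) bool {
    return switch (c) {
        '0'...'9', 'a'...'z', 'A'...'Z', '_' => false,
        else => true,
    };
}

/// Return the keyword that starts at `line[i]` and ends at a separator or
/// at the end of the line, if any.
fn keywordAt(line: []const u8, i: usize, keywords: []const []const u8) ?[]const u8 {
    for (keywords) |kw| {
        const kwend = i + kw.len; // index where keyword would end
        if (kwend > line.len) continue; // keyword longer than what's left
        if (kwend < line.len and !isSeparator(line[kwend])) continue;
        if (std.mem.eql(u8, line[i..kwend], kw)) return kw;
    }
    return null;
}

test "keywords must end at a separator" {
    const kws = [_][]const u8{ "if", "else", "fn" };
    try std.testing.expectEqualStrings("if", keywordAt("if (x)", 0, &kws).?);
    try std.testing.expectEqualStrings("else", keywordAt("x else", 2, &kws).?);
    try std.testing.expect(keywordAt("iffy", 0, &kws) == null);
}
```

The check on the character before the keyword is not needed here, because in the editor that is what prev_sep guarantees.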
Uppercase words
Similar process, but we don't loop over any array, we just check if there's a sequence of uppercase characters or underscores, longer than one character.
if (flags.uppercase) {
var upper = false;
const begin = i;
upp: while (i < rowlen and !str.isSeparator(row.render[i])) {
if (!asc.isUpper(row.render[i]) and row.render[i] != '_') {
upper = false;
break :upp;
}
upper = true;
i += 1;
}
if (upper and i - begin > 1) {
@memset(row.hl[begin..i], t.Highlight.uppercase);
}
}
prev_sep = false;
continue :toplevel;
}
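The same test, applied to a word that has already been isolated, could look like this. It's a standalone sketch and isUpperWord is a made-up name:

```zig
const std = @import("std");

/// Return true if `word` is at least two characters long and made only of
/// uppercase ASCII letters and underscores.
fn isUpperWord(word: []const u8) bool {
    if (word.len < 2) return false;
    for (word) |c| {
        if (!std.ascii.isUpper(c) and c != '_') return false;
    }
    return true;
}

test "uppercase words" {
    try std.testing.expect(isUpperWord("MAX_LEN"));
    try std.testing.expect(!isUpperWord("Max"));
    try std.testing.expect(!isUpperWord("X"));
}
```

Note that digits aren't accepted, so a name like UTF8 would not get the uppercase highlight with this rule.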
Conclusion
I hope you found it interesting and/or useful.
Credits and thanks:
- Andrew Kelley for creating the Zig programming language
- Users of the Ziggit forum for answering questions
- Especially user @vulpesx for advice on making the code more idiomatic for Zig
- Paige Ruten, the writer of the original booklet
- Salvatore Sanfilippo, the author of the original kilo editor
With that I don't mean the code is perfect, as I wrote in the introduction, but I did my best.
If you find mistakes, oversights or bad/wrong explanations of language concepts/features, please post an issue.
Thanks for reading.