El malabarista

Motivation

I start paying attention to the parsing of my little lang TablaM with the desire of improving the display of errors. This leads me to a rabbit hole where re-architecting the parser I need a better way to store the AST in a Tree.

Of course, I check several crates for this, but then I remember a different way to do them!

The Arena is the new pointer on Rust

In Rust, building graphs/trees is challengin, because of the way it enforces the ownership of data. Luckily, it pushes towards a more simple & performant pattern that is HOT! HOT! HOT! on Rust:

The Arena design pattern.

Several crates exploit it for great effect, however differently to how I see is done elsewhere when applied to Trees, with the arenas-ids are nested with enums like:

struct Node<T> {
    parent: Option<NodeId>,
    prev_sibling: Option<NodeId>,
    next_sibling: Option<NodeId>,
    children: Option<(NodeId, NodeId)>,
    value: T,
}

This one is inspired by the talk “High-performance Tree Wrangling, the APL Way”, where the tree is made with vectors.

So, a Tree:

. Users
├── jhon_doe
├   ├── file1.rs
├   ├── file2.rs
├── jane_doe
└────── cat.jpg

... is stored in pre-order on 3 vectors, that store the data, the level/deep & the parent node:

DATA:	Users	jhon_doe	file1.rs	file2.rs	jane_doe	cat.jpg
LEVEl:	0	1	2	2	1	2
PARENT:	0	0	1	1	0	4

This means that all the searching/iterating/navigation is linear on the vectors.

Fix the data layout, and the algorithms are simpler

Similar to things like bumpalo, TreeFlat is not made to cover all the needs for a tree, is just a one-trick pony that is tailored for the use-case of a pre-order tree, and only know how to push nodes at the end.

With this, instead of "jump" chashing pointers/ids the default iteration is purely linear:

impl<'a, T: Debug> Iterator for TreeIter<'a, T> {
    type Item = Node<'a, T>;

    fn next(&mut self) -> Option<Self::Item> {
        let id = NodeId(self.pos);
        self.pos += 1;
        if self.pos <= self.tree.len() {
            Some(self.tree._make_node(id))
        } else {
            None
        }
    }
}

More interesting, navigating the children, parents can exploit this observation:

The children are at the right/up of the parent
The parents are at the left/down of the children
The siblings are all that share the same level

So this means that in the case of navigating the children of jhon_doe:

. Users				
		 ⇡ parents

├── jhon_doe	= Index: 1, Level: 1

		 ⇩ children start at 
			jhon_doe + 1,
			level 	 > jhon_doe

├   ├── file1.rs ?: Level 2 is child!
├   ├── file2.rs ?: Level 2 is child!
├── jane_doe	 ?: Level 1 is below, stop!

└────── cat.jpg

With this, instead of searching a potentially large array, it jumps directly after the node and iterates as long the nodes are above it!.

Performance?

For the first time, I tried to check if the performance claims of the APL way were real, and if I manage to capture it.

I mean: not only the first time on Rust doing this, first time since I have memory I apply a "proper" benchmarking tool more sophisticated than "print the elapsed time in a function". You see, I work mostly in business apps, where I don't need to worry about nanoseconds of difference. With Rust, is another level...

To measure, I use the excellent Criterion.

I also pick the excellent crate ego-tree to compare against... because it already stores the data flat in a vector, the API is almost similar; just the node building is like the "obvious" way. ego-tree is more mature and battle-tested, so that is an excellent candidate to test ambitions (you wanna test against the best!).

Doing it I also hit several bugs in my implementation, and also get my dream crushed when I see that ego-tree brutally defeat my code... and here is where criterion saves my day!.

Looking at the graphs I see something suspicious: All the ego-tree tests show a "flat" line in the execution graph, so yeah, my initial code for the benchmarking was VERY wrong.

I finally make sure to use the same code for both, and now I get more comparable results.

(All this is still some silly benchmarks made by an amateur, so check yourself before believing!).

For example, iterating by children show a nice improvement thanks to the combination of tricks:

Building a tree of 47.370 nodes, with 10 levels, like:

├── 379
├   ├── 380
├   ├   ├── 381
├   ├   ├── 382
├   ├   ├── 383
├   ├   ├── 384
├   ├   ├── 385
├   ├   ├── 386
├   ├   ├── 387
├   ├   ├── 388
├   ├   ├── 389
├   ├   ├   ├── 390
├   ├   ├   ├── 391
├   ├   ├   ├── 392
├   ├   ├   ├── 393
├   ├   ├   ├── 394
├   ├   ├   ├── 395
├   ├   ├   ├── 396
├   ├   ├   ├── 397
├   ├   ├   ├── 398
├   ├   ├   ├   ├── 399
├   ├   ├   ├   ├── 400
├   ├   ├   ├   ├── 401
├   ├   ├   ├   ├── 402
├   ├   ├   ├   ├── 403
├   ├   ├   ├   ├── 404
├   ├   ├   ├   ├── 405
├   ├   ├   ├   ├── 406
├   ├   ├   ├   ├── 407
├   ├   ├   ├   ├   ├── 408
├   ├   ├   ├   ├   ├── 409
├   ├   ├   ├   ├   ├── 410
├   ├   ├   ├   ├   ├── 411
├   ├   ├   ├   ├   ├── 412
├   ├   ├   ├   ├   ├── 413
├   ├   ├   ├   ├   ├── 414
├   ├   ├   ├   ├   ├── 415
├   ├   ├   ├   ├   ├── 416
├   ├   ├   ├   ├   ├   ├── 417
├   ├   ├   ├   ├   ├   ├── 418
├   ├   ├   ├   ├   ├   ├── 419
├   ├   ├   ├   ├   ├   ├── 420
├   ├   ├   ├   ├   ├   ├── 421
├   ├   ├   ├   ├   ├   ├── 422
├   ├   ├   ├   ├   ├   ├── 423
├   ├   ├   ├   ├   ├   ├── 424
├   ├   ├   ├   ├   ├   ├── 425
├   ├   ├   ├   ├   ├   ├   ├── 426
├   ├   ├   ├   ├   ├   ├   ├── 427
├   ├   ├   ├   ├   ├   ├   ├── 428
├   ├   ├   ├   ├   ├   ├   ├── 429
├   ├   ├   ├   ├   ├   ├   ├── 430
├   ├   ├   ├   ├   ├   ├   ├── 431
├   ├   ├   ├   ├   ├   ├   ├── 432
├   ├   ├   ├   ├   ├   ├   ├── 433
├   ├   ├   ├   ├   ├   ├   ├── 434
├   ├   ├   ├   ├   ├   ├   ├   ├── 435
├   ├   ├   ├   ├   ├   ├   ├   ├── 436
├   ├   ├   ├   ├   ├   ├   ├   ├── 437
├   ├   ├   ├   ├   ├   ├   ├   ├── 438
├   ├   ├   ├   ├   ├   ├   ├   ├── 439
├   ├   ├   ├   ├   ├   ├   ├   ├── 440
├   ├   ├   ├   ├   ├   ├   ├   ├── 441
├   ├   ├   ├   ├   ├   ├   ├   ├── 442
├   ├   ├   ├   ├   ├   ├   ├   ├── 443
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 444
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 445
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 446
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 447
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 448
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 449
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 450
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 451
├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 452
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 453
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 454
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 455
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 456
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 457
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 458
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 459
├   ├   ├   ├   ├   ├   ├   ├   ├   ├   ├── 460
├── 469

Iter Tree children/Ego/4 #5
                        time:   [174.10 us 176.36 us 178.90 us]
                        thrpt:  [558.98 Kelem/s 567.01 Kelem/s 574.37 Kelem/s]
Iter Tree children/Flat/4 #5
                        time:   [32.950 us 33.451 us 33.962 us]
                        thrpt:  [2.9445 Melem/s 2.9894 Melem/s 3.0349 Melem/s]

MacBook Air (M1, 2020) : 16GB

Get the code!

The crate is at crates.io: tree-flat, & the code is at repo: tree-flat.