I start paying attention to the parsing of my little lang TablaM with the desire of improving the display of errors. This leads me to a rabbit hole where re-architecting the parser I need a better way to store the AST in a Tree.
Of course, I check several crates for this, but then I remember a different way to do them!
In Rust, building graphs/trees is challengin, because of the way it enforces the ownership of data. Luckily, it pushes towards a more simple & performant pattern that is HOT! HOT! HOT! on Rust:
The Arena design pattern.
Several crates exploit it for great effect, however differently to how I see is done elsewhere when applied to Trees, with the arenas-ids are nested with enums like:
struct Node<T> {
parent: Option<NodeId>,
prev_sibling: Option<NodeId>,
next_sibling: Option<NodeId>,
children: Option<(NodeId, NodeId)>,
value: T,
}
This one is inspired by the talk “High-performance Tree Wrangling, the APL Way”, where the tree is made with vectors.
So, a Tree:
. Users
├── jhon_doe
├ ├── file1.rs
├ ├── file2.rs
├── jane_doe
└────── cat.jpg
... is stored in pre-order on 3 vectors, that store the data, the level/deep & the parent node:
DATA: | Users | jhon_doe | file1.rs | file2.rs | jane_doe | cat.jpg |
---|---|---|---|---|---|---|
LEVEl: | 0 | 1 | 2 | 2 | 1 | 2 |
PARENT: | 0 | 0 | 1 | 1 | 0 | 4 |
This means that all the searching/iterating/navigation is linear on the vectors.
Similar to things like bumpalo, TreeFlat
is not made to cover all the needs for a tree, is just a one-trick pony that is tailored for the use-case of a pre-order tree, and only know how to push
nodes at the end.
With this, instead of "jump" chashing pointers/ids the default iteration is purely linear:
impl<'a, T: Debug> Iterator for TreeIter<'a, T> {
type Item = Node<'a, T>;
fn next(&mut self) -> Option<Self::Item> {
let id = NodeId(self.pos);
self.pos += 1;
if self.pos <= self.tree.len() {
Some(self.tree._make_node(id))
} else {
None
}
}
}
More interesting, navigating the children, parents can exploit this observation:
So this means that in the case of navigating the children of jhon_doe
:
. Users
⇡ parents
├── jhon_doe = Index: 1, Level: 1
⇩ children start at
jhon_doe + 1,
level > jhon_doe
├ ├── file1.rs ?: Level 2 is child!
├ ├── file2.rs ?: Level 2 is child!
├── jane_doe ?: Level 1 is below, stop!
└────── cat.jpg
With this, instead of searching a potentially large array, it jumps directly after the node and iterates as long the nodes are above it!.
For the first time, I tried to check if the performance claims of the APL way were real, and if I manage to capture it.
I mean: not only the first time on Rust doing this, first time since I have memory I apply a "proper" benchmarking tool more sophisticated than "print the elapsed time in a function". You see, I work mostly in business apps, where I don't need to worry about nanoseconds of difference. With Rust, is another level...
To measure, I use the excellent Criterion.
I also pick the excellent crate ego-tree to compare against... because it already stores the data flat in a vector, the API is almost similar; just the node building is like the "obvious" way. ego-tree
is more mature and battle-tested, so that is an excellent candidate to test ambitions (you wanna test against the best!).
Doing it I also hit several bugs in my implementation, and also get my dream crushed when I see that ego-tree
brutally defeat my code... and here is where criterion
saves my day!.
Looking at the graphs I see something suspicious: All the ego-tree
tests show a "flat" line in the execution graph, so yeah, my initial code for the benchmarking was VERY wrong.
I finally make sure to use the same code for both, and now I get more comparable results.
(All this is still some silly benchmarks made by an amateur, so check yourself before believing!).
For example, iterating by children show a nice improvement thanks to the combination of tricks:
Building a tree of 47.370 nodes, with 10 levels, like:
├── 379
├ ├── 380
├ ├ ├── 381
├ ├ ├── 382
├ ├ ├── 383
├ ├ ├── 384
├ ├ ├── 385
├ ├ ├── 386
├ ├ ├── 387
├ ├ ├── 388
├ ├ ├── 389
├ ├ ├ ├── 390
├ ├ ├ ├── 391
├ ├ ├ ├── 392
├ ├ ├ ├── 393
├ ├ ├ ├── 394
├ ├ ├ ├── 395
├ ├ ├ ├── 396
├ ├ ├ ├── 397
├ ├ ├ ├── 398
├ ├ ├ ├ ├── 399
├ ├ ├ ├ ├── 400
├ ├ ├ ├ ├── 401
├ ├ ├ ├ ├── 402
├ ├ ├ ├ ├── 403
├ ├ ├ ├ ├── 404
├ ├ ├ ├ ├── 405
├ ├ ├ ├ ├── 406
├ ├ ├ ├ ├── 407
├ ├ ├ ├ ├ ├── 408
├ ├ ├ ├ ├ ├── 409
├ ├ ├ ├ ├ ├── 410
├ ├ ├ ├ ├ ├── 411
├ ├ ├ ├ ├ ├── 412
├ ├ ├ ├ ├ ├── 413
├ ├ ├ ├ ├ ├── 414
├ ├ ├ ├ ├ ├── 415
├ ├ ├ ├ ├ ├── 416
├ ├ ├ ├ ├ ├ ├── 417
├ ├ ├ ├ ├ ├ ├── 418
├ ├ ├ ├ ├ ├ ├── 419
├ ├ ├ ├ ├ ├ ├── 420
├ ├ ├ ├ ├ ├ ├── 421
├ ├ ├ ├ ├ ├ ├── 422
├ ├ ├ ├ ├ ├ ├── 423
├ ├ ├ ├ ├ ├ ├── 424
├ ├ ├ ├ ├ ├ ├── 425
├ ├ ├ ├ ├ ├ ├ ├── 426
├ ├ ├ ├ ├ ├ ├ ├── 427
├ ├ ├ ├ ├ ├ ├ ├── 428
├ ├ ├ ├ ├ ├ ├ ├── 429
├ ├ ├ ├ ├ ├ ├ ├── 430
├ ├ ├ ├ ├ ├ ├ ├── 431
├ ├ ├ ├ ├ ├ ├ ├── 432
├ ├ ├ ├ ├ ├ ├ ├── 433
├ ├ ├ ├ ├ ├ ├ ├── 434
├ ├ ├ ├ ├ ├ ├ ├ ├── 435
├ ├ ├ ├ ├ ├ ├ ├ ├── 436
├ ├ ├ ├ ├ ├ ├ ├ ├── 437
├ ├ ├ ├ ├ ├ ├ ├ ├── 438
├ ├ ├ ├ ├ ├ ├ ├ ├── 439
├ ├ ├ ├ ├ ├ ├ ├ ├── 440
├ ├ ├ ├ ├ ├ ├ ├ ├── 441
├ ├ ├ ├ ├ ├ ├ ├ ├── 442
├ ├ ├ ├ ├ ├ ├ ├ ├── 443
├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 444
├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 445
├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 446
├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 447
├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 448
├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 449
├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 450
├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 451
├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 452
├ ├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 453
├ ├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 454
├ ├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 455
├ ├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 456
├ ├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 457
├ ├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 458
├ ├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 459
├ ├ ├ ├ ├ ├ ├ ├ ├ ├ ├── 460
├── 469
Iter Tree children/Ego/4 #5
time: [174.10 us 176.36 us 178.90 us]
thrpt: [558.98 Kelem/s 567.01 Kelem/s 574.37 Kelem/s]
Iter Tree children/Flat/4 #5
time: [32.950 us 33.451 us 33.962 us]
thrpt: [2.9445 Melem/s 2.9894 Melem/s 3.0349 Melem/s]
MacBook Air (M1, 2020) : 16GB
The crate is at crates.io: tree-flat, & the code is at repo: tree-flat.