Commit fe9b840
perf: UTF-8 fast path, pre-allocated output, flattened binary detection
- Add UTF-8 fast path in decode_to_string: skip decoder loop entirely
  when encoding is UTF-8 and input validates (common case for modern web)
- Pre-allocate output String with html.len() capacity to avoid reallocs
- Increase decode buffer from 2048 to 8192 for large (>15KB) documents,
  reducing decode loop iterations by 4x
- Flatten is_binary_file: replace double PHF lookup (first byte -> string
  key -> magic bytes) with sorted static table + binary search on first
  byte, eliminating string hashing entirely
- Keep PHF-based is_binary_file_phf for backwards compat, add parity test
- Add UTF-8 fast path tests (small + 20KB large)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 2151f53 commit fe9b840
3 files changed
Lines changed: 107 additions & 7 deletions
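The decoding changes described in the commit message can be sketched as follows. This is a minimal illustration using only the standard library, not the code from this commit: the `Charset` enum, the `decode_to_string` signature, the Latin-1 stand-in decoder, and reading ">15KB" as `15 * 1024` bytes are all assumptions made for the example.

```rust
/// Hypothetical charset tag; the commit's real encoding type is not shown.
#[derive(Clone, Copy, PartialEq)]
enum Charset {
    Utf8,
    Latin1,
}

/// Sketch of a decode function with a UTF-8 fast path.
/// The name mirrors the commit message; the signature is assumed.
fn decode_to_string(html: &[u8], charset: Charset) -> String {
    // Fast path: declared UTF-8 and the bytes validate, so the decoder
    // loop is skipped and the input is copied once.
    if charset == Charset::Utf8 {
        if let Ok(text) = std::str::from_utf8(html) {
            return text.to_owned();
        }
        // Declared UTF-8 but invalid: replace bad sequences with U+FFFD.
        return String::from_utf8_lossy(html).into_owned();
    }

    // Slow path: reserve html.len() bytes up front. This is a lower bound,
    // since non-ASCII Latin-1 bytes expand to two bytes in UTF-8.
    let mut out = String::with_capacity(html.len());

    // Buffer sizes taken from the commit message; the exact threshold
    // (15 * 1024) is an assumption.
    let chunk_size = if html.len() > 15 * 1024 { 8192 } else { 2048 };

    for chunk in html.chunks(chunk_size) {
        // Stand-in decoder: ISO-8859-1 maps each byte to the code point
        // with the same value.
        out.extend(chunk.iter().map(|&b| b as char));
    }
    out
}
```

With these sizes, a 20KB non-UTF-8 document is processed in 3 chunks of up to 8192 bytes instead of 10 chunks of 2048 bytes, while valid UTF-8 input never enters the loop at all.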
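The flattened binary detection from the commit message (sorted static table plus binary search on the first byte) might look roughly like the sketch below. The table entries, the `Magic` struct, and the linear reference function are illustrative assumptions; the commit's actual table and its `is_binary_file_phf` implementation are not shown on this page.

```rust
/// Hypothetical table entry: a magic-byte prefix keyed by its first byte.
struct Magic {
    first: u8,
    bytes: &'static [u8],
}

/// Illustrative signatures only, kept sorted by `first` so that a binary
/// search can locate the run of candidates for a given first byte.
static MAGIC: &[Magic] = &[
    Magic { first: 0x1F, bytes: b"\x1F\x8B" },     // gzip
    Magic { first: 0x25, bytes: b"%PDF" },         // PDF
    Magic { first: 0x47, bytes: b"GIF8" },         // GIF
    Magic { first: 0x49, bytes: b"ID3" },          // MP3 with ID3 tag
    Magic { first: 0x49, bytes: b"II*\x00" },      // TIFF, little-endian
    Magic { first: 0x50, bytes: b"PK\x03\x04" },   // ZIP
    Magic { first: 0x89, bytes: b"\x89PNG" },      // PNG
    Magic { first: 0xFF, bytes: b"\xFF\xD8\xFF" }, // JPEG
];

/// Binary search on the first byte, then a prefix check on each entry
/// that shares it. No string keys and no hashing are involved.
fn is_binary_file(data: &[u8]) -> bool {
    let Some(&first) = data.first() else {
        return false;
    };
    // Index of the first entry whose key is >= `first`.
    let start = MAGIC.partition_point(|m| m.first < first);
    MAGIC[start..]
        .iter()
        .take_while(|m| m.first == first)
        .any(|m| data.starts_with(m.bytes))
}

/// Linear-scan reference used here as a stand-in for the older lookup
/// in a parity check; it is not the PHF-based function from the commit.
fn is_binary_file_linear(data: &[u8]) -> bool {
    MAGIC.iter().any(|m| data.starts_with(m.bytes))
}
```

Using `partition_point` instead of `binary_search_by` lands on the start of a run, which matters when several signatures share a first byte (the two `0x49` entries above). A parity test in the spirit of the one the commit mentions would assert that both functions agree on every sample input.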