1 files changed, 312 insertions, 157 deletions
diff --git a/vendor/github.com/klauspost/compress/s2/README.md b/vendor/github.com/klauspost/compress/s2/README.md
index 73c0c462..8284bb08 100644
--- a/vendor/github.com/klauspost/compress/s2/README.md
+++ b/vendor/github.com/klauspost/compress/s2/README.md
@@ -20,11 +20,12 @@ This is important, so you don't have to worry about spending CPU cycles on alrea
 * Concurrent stream compression
 * Faster decompression, even for Snappy compatible content
 * Concurrent Snappy/S2 stream decompression
-* Ability to quickly skip forward in compressed stream
+* Skip forward in compressed stream
 * Random seeking with indexes
 * Compatible with reading Snappy compressed content
 * Smaller block size overhead on incompressible blocks
 * Block concatenation
+* Block Dictionary support
 * Uncompressed stream mode
 * Automatic stream size padding
 * Snappy compatible block compression
@@ -325,35 +326,35 @@ The content compressed in this mode is fully compatible with the standard decode
 
 Snappy vs S2 **compression** speed on 16 core (32 thread) computer, using all threads and a single thread (1 CPU):
 
-| File                                                                                                | S2 speed | S2 Throughput | S2 % smaller | S2 "better" | "better" throughput | "better" % smaller |
-|-----------------------------------------------------------------------------------------------------|----------|---------------|--------------|-------------|---------------------|--------------------|
-| [rawstudio-mint14.tar](https://files.klauspost.com/compress/rawstudio-mint14.7z)                    | 12.70x   | 10556 MB/s    | 7.35%        | 4.15x       | 3455 MB/s           | 12.79%             |
-| (1 CPU)                                                                                             | 1.14x    | 948 MB/s      | -            | 0.42x       | 349 MB/s            | -                  |
-| [github-june-2days-2019.json](https://files.klauspost.com/compress/github-june-2days-2019.json.zst) | 17.13x   | 14484 MB/s    | 31.60%       | 10.09x      | 8533 MB/s           | 37.71%             |
-| (1 CPU)                                                                                             | 1.33x    | 1127 MB/s     | -            | 0.70x       | 589 MB/s            | -                  |
-| [github-ranks-backup.bin](https://files.klauspost.com/compress/github-ranks-backup.bin.zst)         | 15.14x   | 12000 MB/s    | -5.79%       | 6.59x       | 5223 MB/s           | 5.80%              |
-| (1 CPU)                                                                                             | 1.11x    | 877 MB/s      | -            | 0.47x       | 370 MB/s            | -                  |
-| [consensus.db.10gb](https://files.klauspost.com/compress/consensus.db.10gb.zst)                     | 14.62x   | 12116 MB/s    | 15.90%       | 5.35x       | 4430 MB/s           | 16.08%             |
-| (1 CPU)                                                                                             | 1.38x    | 1146 MB/s     | -            | 0.38x       | 312 MB/s            | -                  |
-| [adresser.json](https://files.klauspost.com/compress/adresser.json.zst)                             | 8.83x    | 17579 MB/s    | 43.86%       | 6.54x       | 13011 MB/s          | 47.23%             |
-| (1 CPU)                                                                                             | 1.14x    | 2259 MB/s     | -            | 0.74x       | 1475 MB/s           | -                  |
-| [gob-stream](https://files.klauspost.com/compress/gob-stream.7z)                                    | 16.72x   | 14019 MB/s    | 24.02%       | 10.11x      | 8477 MB/s           | 30.48%             |
-| (1 CPU)                                                                                             | 1.24x    | 1043 MB/s     | -            | 0.70x       | 586 MB/s            | -                  |
-| [10gb.tar](http://mattmahoney.net/dc/10gb.html)                                                     | 13.33x   | 9254 MB/s     | 1.84%        | 6.75x       | 4686 MB/s           | 6.72%              |
-| (1 CPU)                                                                                             | 0.97x    | 672 MB/s      | -            | 0.53x       | 366 MB/s            | -                  |
-| sharnd.out.2gb                                                                                      | 2.11x    | 12639 MB/s    | 0.01%        | 1.98x       | 11833 MB/s          | 0.01%              |
-| (1 CPU)                                                                                             | 0.93x    | 5594 MB/s     | -            | 1.34x       | 8030 MB/s           | -                  |
-| [enwik9](http://mattmahoney.net/dc/textdata.html)                                                   | 19.34x   | 8220 MB/s     | 3.98%        | 7.87x       | 3345 MB/s           | 15.82%             |
-| (1 CPU)                                                                                             | 1.06x    | 452 MB/s      | -            | 0.50x       | 213 MB/s            | -                  |
-| [silesia.tar](http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip)                                    | 10.48x   | 6124 MB/s     | 5.67%        | 3.76x       | 2197 MB/s           | 12.60%             |
-| (1 CPU)                                                                                             | 0.97x    | 568 MB/s      | -            | 0.46x       | 271 MB/s            | -                  |
-| [enwik10](https://encode.su/threads/3315-enwik10-benchmark-results)                                 | 21.07x   | 9020 MB/s     | 6.36%        | 6.91x       | 2959 MB/s           | 16.95%             |
-| (1 CPU)                                                                                             | 1.07x    | 460 MB/s      | -            | 0.51x       | 220 MB/s            | -                  |
+| File                                                                                                    | S2 Speed | S2 Throughput | S2 % smaller | S2 "better" | "better" throughput | "better" % smaller |
+|---------------------------------------------------------------------------------------------------------|----------|---------------|--------------|-------------|---------------------|--------------------|
+| [rawstudio-mint14.tar](https://files.klauspost.com/compress/rawstudio-mint14.7z)                        | 16.33x   | 10556 MB/s    | 8.0%         | 6.04x       | 5252 MB/s           | 14.7%              |
+| (1 CPU)                                                                                                 | 1.08x    | 940 MB/s      | -            | 0.46x       | 400 MB/s            | -                  |
+| [github-june-2days-2019.json](https://files.klauspost.com/compress/github-june-2days-2019.json.zst)     | 16.51x   | 15224 MB/s    | 31.70%       | 9.47x       | 8734 MB/s           | 37.71%             |
+| (1 CPU)                                                                                                 | 1.26x    | 1157 MB/s     | -            | 0.60x       | 556 MB/s            | -                  |
+| [github-ranks-backup.bin](https://files.klauspost.com/compress/github-ranks-backup.bin.zst)             | 15.14x   | 12598 MB/s    | -5.76%       | 6.23x       | 5675 MB/s           | 3.62%              |
+| (1 CPU)                                                                                                 | 1.02x    | 932 MB/s      | -            | 0.47x       | 432 MB/s            | -                  |
+| [consensus.db.10gb](https://files.klauspost.com/compress/consensus.db.10gb.zst)                         | 11.21x   | 12116 MB/s    | 15.95%       | 3.24x       | 3500 MB/s           | 18.00%             |
+| (1 CPU)                                                                                                 | 1.05x    | 1135 MB/s     | -            | 0.27x       | 292 MB/s            | -                  |
+| [apache.log](https://files.klauspost.com/compress/apache.log.zst)                                       | 8.55x    | 16673 MB/s    | 20.54%       | 5.85x       | 11420 MB/s          | 24.97%             |
+| (1 CPU)                                                                                                 | 1.91x    | 1771 MB/s     | -            | 0.53x       | 1041 MB/s           | -                  |
+| [gob-stream](https://files.klauspost.com/compress/gob-stream.7z)                                        | 15.76x   | 14357 MB/s    | 24.01%       | 8.67x       | 7891 MB/s           | 33.68%             |
+| (1 CPU)                                                                                                 | 1.17x    | 1064 MB/s     | -            | 0.65x       | 595 MB/s            | -                  |
+| [10gb.tar](http://mattmahoney.net/dc/10gb.html)                                                         | 13.33x   | 9835 MB/s     | 2.34%        | 6.85x       | 4863 MB/s           | 9.96%              |
+| (1 CPU)                                                                                                 | 0.97x    | 689 MB/s      | -            | 0.55x       | 387 MB/s            | -                  |
+| sharnd.out.2gb                                                                                          | 9.11x    | 13213 MB/s    | 0.01%        | 1.49x       | 9184 MB/s           | 0.01%              |
+| (1 CPU)                                                                                                 | 0.88x    | 5418 MB/s     | -            | 0.77x       | 5417 MB/s           | -                  |
+| [sofia-air-quality-dataset csv](https://files.klauspost.com/compress/sofia-air-quality-dataset.tar.zst) | 22.00x   | 11477 MB/s    | 18.73%       | 11.15x      | 5817 MB/s           | 27.88%             |
+| (1 CPU)                                                                                                 | 1.23x    | 642 MB/s      | -            | 0.71x       | 642 MB/s            | -                  |
+| [silesia.tar](http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip)                                        | 11.23x   | 6520 MB/s     | 5.9%         | 5.35x       | 3109 MB/s           | 15.88%             |
+| (1 CPU)                                                                                                 | 1.05x    | 607 MB/s      | -            | 0.52x       | 304 MB/s            | -                  |
+| [enwik9](https://files.klauspost.com/compress/enwik9.zst)                                               | 19.28x   | 8440 MB/s     | 4.04%        | 9.31x       | 4076 MB/s           | 18.04%             |
+| (1 CPU)                                                                                                 | 1.12x    | 488 MB/s      | -            | 0.57x       | 250 MB/s            | -                  |
 
 ### Legend
 
-* `S2 speed`: Speed of S2 compared to Snappy, using 16 cores and 1 core.
-* `S2 throughput`: Throughput of S2 in MB/s. 
+* `S2 Speed`: Speed of S2 compared to Snappy, using 16 cores and 1 core.
+* `S2 Throughput`: Throughput of S2 in MB/s. 
 * `S2 % smaller`: How many percent of the Snappy output size is S2 better.
 * `S2 "better"`: Speed when enabling "better" compression mode in S2 compared to Snappy. 
 * `"better" throughput`: Speed when enabling "better" compression mode in S2 compared to Snappy. 
@@ -361,7 +362,7 @@ Snappy vs S2 **compression** speed on 16 core (32 thread) computer, using all th
 
 There is a good speedup across the board when using a single thread and a significant speedup when using multiple threads.
 
-Machine generated data gets by far the biggest compression boost, with size being being reduced by up to 45% of Snappy size.
+Machine generated data gets by far the biggest compression boost, with size being reduced by up to 35% of Snappy size.
 
 The "better" compression mode sees a good improvement in all cases, but usually at a performance cost.
 
@@ -404,15 +405,15 @@ The "better" compression mode will actively look for shorter matches, which is w
 Without assembly decompression is also very fast; single goroutine decompression speed. No assembly:
 
 | File                           | S2 Throughput | S2 throughput |
-|--------------------------------|--------------|---------------|
-| consensus.db.10gb.s2           | 1.84x        | 2289.8 MB/s   |
-| 10gb.tar.s2                    | 1.30x        | 867.07 MB/s   |
-| rawstudio-mint14.tar.s2        | 1.66x        | 1329.65 MB/s  |
-| github-june-2days-2019.json.s2 | 2.36x        | 1831.59 MB/s  |
-| github-ranks-backup.bin.s2     | 1.73x        | 1390.7 MB/s   |
-| enwik9.s2                      | 1.67x        | 681.53 MB/s   |
-| adresser.json.s2               | 3.41x        | 4230.53 MB/s  |
-| silesia.tar.s2                 | 1.52x        | 811.58        |
+|--------------------------------|---------------|---------------|
+| consensus.db.10gb.s2           | 1.84x         | 2289.8 MB/s   |
+| 10gb.tar.s2                    | 1.30x         | 867.07 MB/s   |
+| rawstudio-mint14.tar.s2        | 1.66x         | 1329.65 MB/s  |
+| github-june-2days-2019.json.s2 | 2.36x         | 1831.59 MB/s  |
+| github-ranks-backup.bin.s2     | 1.73x         | 1390.7 MB/s   |
+| enwik9.s2                      | 1.67x         | 681.53 MB/s   |
+| adresser.json.s2               | 3.41x         | 4230.53 MB/s  |
+| silesia.tar.s2                 | 1.52x         | 811.58 MB/s   |
 
 Even though S2 typically compresses better than Snappy, decompression speed is always better. 
 
@@ -450,14 +451,14 @@ The most reliable is a wide dataset.
 For this we use [`webdevdata.org-2015-01-07-subset`](https://files.klauspost.com/compress/webdevdata.org-2015-01-07-4GB-subset.7z),
 53927 files, total input size: 4,014,735,833 bytes. Single goroutine used.
 
-| *                 | Input      | Output     | Reduction | MB/s   |
-|-------------------|------------|------------|-----------|--------|
-| S2                | 4014735833 | 1059723369 | 73.60%    | **934.34** |
-| S2 Better         | 4014735833 | 969670507  | 75.85%    | 532.70 |
-| S2 Best           | 4014735833 | 906625668  | **77.85%** | 46.84 |
-| Snappy            | 4014735833 | 1128706759 | 71.89%    | 762.59 |
-| S2, Snappy Output | 4014735833 | 1093821420 | 72.75%    | 908.60 |
-| LZ4               | 4014735833 | 1079259294 | 73.12%    | 526.94 |
+| *                 | Input      | Output     | Reduction  | MB/s       |
+|-------------------|------------|------------|------------|------------|
+| S2                | 4014735833 | 1059723369 | 73.60%     | **936.73** |
+| S2 Better         | 4014735833 | 961580539  | 76.05%     | 451.10     |
+| S2 Best           | 4014735833 | 899182886  | **77.60%** | 46.84      |
+| Snappy            | 4014735833 | 1128706759 | 71.89%     | 790.15     |
+| S2, Snappy Output | 4014735833 | 1093823291 | 72.75%     | 936.60     |
+| LZ4               | 4014735833 | 1063768713 | 73.50%     | 452.02     |
 
 S2 delivers both the best single threaded throughput with regular mode and the best compression rate with "best".
 "Better" mode provides the same compression speed as LZ4 with better compression ratio. 
@@ -489,42 +490,23 @@ AMD64 assembly is use for both S2 and Snappy.
 
 | Absolute Perf         | Snappy size | S2 Size | Snappy Speed | S2 Speed    | Snappy dec  | S2 dec      |
 |-----------------------|-------------|---------|--------------|-------------|-------------|-------------|
-| html                  | 22843       | 21111   | 16246 MB/s   | 17438 MB/s  | 40972 MB/s  | 49263 MB/s  |
-| urls.10K              | 335492      | 287326  | 7943 MB/s    | 9693 MB/s   | 22523 MB/s  | 26484 MB/s  |
-| fireworks.jpeg        | 123034      | 123100  | 349544 MB/s  | 273889 MB/s | 718321 MB/s | 827552 MB/s |
-| fireworks.jpeg (200B) | 146         | 155     | 8869 MB/s    | 17773 MB/s  | 33691 MB/s  | 52421 MB/s  |
-| paper-100k.pdf        | 85304       | 84459   | 167546 MB/s  | 101263 MB/s | 326905 MB/s | 291944 MB/s |
-| html_x_4              | 92234       | 21113   | 15194 MB/s   | 50670 MB/s  | 30843 MB/s  | 32217 MB/s  |
-| alice29.txt           | 88034       | 85975   | 5936 MB/s    | 6139 MB/s   | 12882 MB/s  | 20044 MB/s  |
-| asyoulik.txt          | 77503       | 79650   | 5517 MB/s    | 6366 MB/s   | 12735 MB/s  | 22806 MB/s  |
-| lcet10.txt            | 234661      | 220670  | 6235 MB/s    | 6067 MB/s   | 14519 MB/s  | 18697 MB/s  |
-| plrabn12.txt          | 319267      | 317985  | 5159 MB/s    | 5726 MB/s   | 11923 MB/s  | 19901 MB/s  |
-| geo.protodata         | 23335       | 18690   | 21220 MB/s   | 26529 MB/s  | 56271 MB/s  | 62540 MB/s  |
-| kppkn.gtb             | 69526       | 65312   | 9732 MB/s    | 8559 MB/s   | 18491 MB/s  | 18969 MB/s  |
-| alice29.txt (128B)    | 80          | 82      | 6691 MB/s    | 15489 MB/s  | 31883 MB/s  | 38874 MB/s  |
-| alice29.txt (1000B)   | 774         | 774     | 12204 MB/s   | 13000 MB/s  | 48056 MB/s  | 52341 MB/s  |
-| alice29.txt (10000B)  | 6648        | 6933    | 10044 MB/s   | 12806 MB/s  | 32378 MB/s  | 46322 MB/s  |
-| alice29.txt (20000B)  | 12686       | 13574   | 7733 MB/s    | 11210 MB/s  | 30566 MB/s  | 58969 MB/s  |
-
-
-| Relative Perf         | Snappy size | S2 size improved | S2 Speed | S2 Dec Speed |
-|-----------------------|-------------|------------------|----------|--------------|
-| html                  | 22.31%      | 7.58%            | 1.07x    | 1.20x        |
-| urls.10K              | 47.78%      | 14.36%           | 1.22x    | 1.18x        |
-| fireworks.jpeg        | 99.95%      | -0.05%           | 0.78x    | 1.15x        |
-| fireworks.jpeg (200B) | 73.00%      | -6.16%           | 2.00x    | 1.56x        |
-| paper-100k.pdf        | 83.30%      | 0.99%            | 0.60x    | 0.89x        |
-| html_x_4              | 22.52%      | 77.11%           | 3.33x    | 1.04x        |
-| alice29.txt           | 57.88%      | 2.34%            | 1.03x    | 1.56x        |
-| asyoulik.txt          | 61.91%      | -2.77%           | 1.15x    | 1.79x        |
-| lcet10.txt            | 54.99%      | 5.96%            | 0.97x    | 1.29x        |
-| plrabn12.txt          | 66.26%      | 0.40%            | 1.11x    | 1.67x        |
-| geo.protodata         | 19.68%      | 19.91%           | 1.25x    | 1.11x        |
-| kppkn.gtb             | 37.72%      | 6.06%            | 0.88x    | 1.03x        |
-| alice29.txt (128B)    | 62.50%      | -2.50%           | 2.31x    | 1.22x        |
-| alice29.txt (1000B)   | 77.40%      | 0.00%            | 1.07x    | 1.09x        |
-| alice29.txt (10000B)  | 66.48%      | -4.29%           | 1.27x    | 1.43x        |
-| alice29.txt (20000B)  | 63.43%      | -7.00%           | 1.45x    | 1.93x        |
+| html                  | 22843       | 20868   | 16246 MB/s   | 18617 MB/s  | 40972 MB/s  | 49263 MB/s  |
+| urls.10K              | 335492      | 286541  | 7943 MB/s    | 10201 MB/s  | 22523 MB/s  | 26484 MB/s  |
+| fireworks.jpeg        | 123034      | 123100  | 349544 MB/s  | 303228 MB/s | 718321 MB/s | 827552 MB/s |
+| fireworks.jpeg (200B) | 146         | 155     | 8869 MB/s    | 20180 MB/s  | 33691 MB/s  | 52421 MB/s  |
+| paper-100k.pdf        | 85304       | 84202   | 167546 MB/s  | 112988 MB/s | 326905 MB/s | 291944 MB/s |
+| html_x_4              | 92234       | 20870   | 15194 MB/s   | 54457 MB/s  | 30843 MB/s  | 32217 MB/s  |
+| alice29.txt           | 88034       | 85934   | 5936 MB/s    | 6540 MB/s   | 12882 MB/s  | 20044 MB/s  |
+| asyoulik.txt          | 77503       | 79575   | 5517 MB/s    | 6657 MB/s   | 12735 MB/s  | 22806 MB/s  |
+| lcet10.txt            | 234661      | 220383  | 6235 MB/s    | 6303 MB/s   | 14519 MB/s  | 18697 MB/s  |
+| plrabn12.txt          | 319267      | 318196  | 5159 MB/s    | 6074 MB/s   | 11923 MB/s  | 19901 MB/s  |
+| geo.protodata         | 23335       | 18606   | 21220 MB/s   | 25432 MB/s  | 56271 MB/s  | 62540 MB/s  |
+| kppkn.gtb             | 69526       | 65019   | 9732 MB/s    | 8905 MB/s   | 18491 MB/s  | 18969 MB/s  |
+| alice29.txt (128B)    | 80          | 82      | 6691 MB/s    | 17179 MB/s  | 31883 MB/s  | 38874 MB/s  |
+| alice29.txt (1000B)   | 774         | 774     | 12204 MB/s   | 13273 MB/s  | 48056 MB/s  | 52341 MB/s  |
+| alice29.txt (10000B)  | 6648        | 6933    | 10044 MB/s   | 12824 MB/s  | 32378 MB/s  | 46322 MB/s  |
+| alice29.txt (20000B)  | 12686       | 13516   | 7733 MB/s    | 12160 MB/s  | 30566 MB/s  | 58969 MB/s  |
+
 
 Speed is generally at or above Snappy. Small blocks gets a significant speedup, although at the expense of size. 
 
@@ -543,42 +525,23 @@ So individual benchmarks should only be seen as a guideline and the overall pict
 
 | Absolute Perf         | Snappy size | Better Size | Snappy Speed | Better Speed | Snappy dec  | Better dec  |
 |-----------------------|-------------|-------------|--------------|--------------|-------------|-------------|
-| html                  | 22843       | 19833       | 16246 MB/s   | 7731 MB/s    | 40972 MB/s  | 40292 MB/s  |
-| urls.10K              | 335492      | 253529      | 7943 MB/s    | 3980 MB/s    | 22523 MB/s  | 20981 MB/s  |
-| fireworks.jpeg        | 123034      | 123100      | 349544 MB/s  | 9760 MB/s    | 718321 MB/s | 823698 MB/s |
-| fireworks.jpeg (200B) | 146         | 142         | 8869 MB/s    | 594 MB/s     | 33691 MB/s  | 30101 MB/s  |
-| paper-100k.pdf        | 85304       | 82915       | 167546 MB/s  | 7470 MB/s    | 326905 MB/s | 198869 MB/s |
-| html_x_4              | 92234       | 19841       | 15194 MB/s   | 23403 MB/s   | 30843 MB/s  | 30937 MB/s  |
-| alice29.txt           | 88034       | 73218       | 5936 MB/s    | 2945 MB/s    | 12882 MB/s  | 16611 MB/s  |
-| asyoulik.txt          | 77503       | 66844       | 5517 MB/s    | 2739 MB/s    | 12735 MB/s  | 14975 MB/s  |
-| lcet10.txt            | 234661      | 190589      | 6235 MB/s    | 3099 MB/s    | 14519 MB/s  | 16634 MB/s  |
-| plrabn12.txt          | 319267      | 270828      | 5159 MB/s    | 2600 MB/s    | 11923 MB/s  | 13382 MB/s  |
-| geo.protodata         | 23335       | 18278       | 21220 MB/s   | 11208 MB/s   | 56271 MB/s  | 57961 MB/s  |
-| kppkn.gtb             | 69526       | 61851       | 9732 MB/s    | 4556 MB/s    | 18491 MB/s  | 16524 MB/s  |
-| alice29.txt (128B)    | 80          | 81          | 6691 MB/s    | 529 MB/s     | 31883 MB/s  | 34225 MB/s  |
-| alice29.txt (1000B)   | 774         | 748         | 12204 MB/s   | 1943 MB/s    | 48056 MB/s  | 42068 MB/s  |
-| alice29.txt (10000B)  | 6648        | 6234        | 10044 MB/s   | 2949 MB/s    | 32378 MB/s  | 28813 MB/s  |
-| alice29.txt (20000B)  | 12686       | 11584       | 7733 MB/s    | 2822 MB/s    | 30566 MB/s  | 27315 MB/s  |
-
-
-| Relative Perf         | Snappy size | Better size | Better Speed | Better dec |
-|-----------------------|-------------|-------------|--------------|------------|
-| html                  | 22.31%      | 13.18%      | 0.48x        | 0.98x      |
-| urls.10K              | 47.78%      | 24.43%      | 0.50x        | 0.93x      |
-| fireworks.jpeg        | 99.95%      | -0.05%      | 0.03x        | 1.15x      |
-| fireworks.jpeg (200B) | 73.00%      | 2.74%       | 0.07x        | 0.89x      |
-| paper-100k.pdf        | 83.30%      | 2.80%       | 0.07x        | 0.61x      |
-| html_x_4              | 22.52%      | 78.49%      | 0.04x        | 1.00x      |
-| alice29.txt           | 57.88%      | 16.83%      | 1.54x        | 1.29x      |
-| asyoulik.txt          | 61.91%      | 13.75%      | 0.50x        | 1.18x      |
-| lcet10.txt            | 54.99%      | 18.78%      | 0.50x        | 1.15x      |
-| plrabn12.txt          | 66.26%      | 15.17%      | 0.50x        | 1.12x      |
-| geo.protodata         | 19.68%      | 21.67%      | 0.50x        | 1.03x      |
-| kppkn.gtb             | 37.72%      | 11.04%      | 0.53x        | 0.89x      |
-| alice29.txt (128B)    | 62.50%      | -1.25%      | 0.47x        | 1.07x      |
-| alice29.txt (1000B)   | 77.40%      | 3.36%       | 0.08x        | 0.88x      |
-| alice29.txt (10000B)  | 66.48%      | 6.23%       | 0.16x        | 0.89x      |
-| alice29.txt (20000B)  | 63.43%      | 8.69%       | 0.29x        | 0.89x      |
+| html                  | 22843       | 18972       | 16246 MB/s   | 8621 MB/s    | 40972 MB/s  | 40292 MB/s  |
+| urls.10K              | 335492      | 248079      | 7943 MB/s    | 5104 MB/s    | 22523 MB/s  | 20981 MB/s  |
+| fireworks.jpeg        | 123034      | 123100      | 349544 MB/s  | 84429 MB/s   | 718321 MB/s | 823698 MB/s |
+| fireworks.jpeg (200B) | 146         | 149         | 8869 MB/s    | 7125 MB/s    | 33691 MB/s  | 30101 MB/s  |
+| paper-100k.pdf        | 85304       | 82887       | 167546 MB/s  | 11087 MB/s   | 326905 MB/s | 198869 MB/s |
+| html_x_4              | 92234       | 18982       | 15194 MB/s   | 29316 MB/s   | 30843 MB/s  | 30937 MB/s  |
+| alice29.txt           | 88034       | 71611       | 5936 MB/s    | 3709 MB/s    | 12882 MB/s  | 16611 MB/s  |
+| asyoulik.txt          | 77503       | 65941       | 5517 MB/s    | 3380 MB/s    | 12735 MB/s  | 14975 MB/s  |
+| lcet10.txt            | 234661      | 184939      | 6235 MB/s    | 3537 MB/s    | 14519 MB/s  | 16634 MB/s  |
+| plrabn12.txt          | 319267      | 264990      | 5159 MB/s    | 2960 MB/s    | 11923 MB/s  | 13382 MB/s  |
+| geo.protodata         | 23335       | 17689       | 21220 MB/s   | 10859 MB/s   | 56271 MB/s  | 57961 MB/s  |
+| kppkn.gtb             | 69526       | 55398       | 9732 MB/s    | 5206 MB/s    | 18491 MB/s  | 16524 MB/s  |
+| alice29.txt (128B)    | 80          | 78          | 6691 MB/s    | 7422 MB/s    | 31883 MB/s  | 34225 MB/s  |
+| alice29.txt (1000B)   | 774         | 746         | 12204 MB/s   | 5734 MB/s    | 48056 MB/s  | 42068 MB/s  |
+| alice29.txt (10000B)  | 6648        | 6218        | 10044 MB/s   | 6055 MB/s    | 32378 MB/s  | 28813 MB/s  |
+| alice29.txt (20000B)  | 12686       | 11492       | 7733 MB/s    | 3143 MB/s    | 30566 MB/s  | 27315 MB/s  |
+
 
 Except for the mostly incompressible JPEG image compression is better and usually in the 
 double digits in terms of percentage reduction over Snappy.
@@ -605,33 +568,150 @@ Some examples compared on 16 core CPU, amd64 assembly used:
 
 ```
 * enwik10
-Default... 10000000000 -> 4761467548 [47.61%]; 1.098s, 8685.6MB/s
-Better...  10000000000 -> 4219438251 [42.19%]; 1.925s, 4954.2MB/s
-Best...    10000000000 -> 3627364337 [36.27%]; 43.051s, 221.5MB/s
+Default... 10000000000 -> 4759950115 [47.60%]; 1.03s, 9263.0MB/s
+Better...  10000000000 -> 4084706676 [40.85%]; 2.16s, 4415.4MB/s
+Best...    10000000000 -> 3615520079 [36.16%]; 42.259s, 225.7MB/s
 
 * github-june-2days-2019.json
-Default... 6273951764 -> 1043196283 [16.63%]; 431ms, 13882.3MB/s
-Better...  6273951764 -> 949146808 [15.13%]; 547ms, 10938.4MB/s
-Best...    6273951764 -> 832855506 [13.27%]; 9.455s, 632.8MB/s
+Default... 6273951764 -> 1041700255 [16.60%]; 431ms, 13882.3MB/s
+Better...  6273951764 -> 945841238 [15.08%]; 547ms, 10938.4MB/s
+Best...    6273951764 -> 826392576 [13.17%]; 9.455s, 632.8MB/s
 
 * nyc-taxi-data-10M.csv
-Default... 3325605752 -> 1095998837 [32.96%]; 324ms, 9788.7MB/s
-Better...  3325605752 -> 954776589 [28.71%]; 491ms, 6459.4MB/s
-Best...    3325605752 -> 779098746 [23.43%]; 8.29s, 382.6MB/s
+Default... 3325605752 -> 1093516949 [32.88%]; 324ms, 9788.7MB/s
+Better...  3325605752 -> 885394158 [26.62%]; 491ms, 6459.4MB/s
+Best...    3325605752 -> 773681257 [23.26%]; 8.29s, 412.0MB/s
 
 * 10gb.tar
-Default... 10065157632 -> 5916578242 [58.78%]; 1.028s, 9337.4MB/s
-Better...  10065157632 -> 5649207485 [56.13%]; 1.597s, 6010.6MB/s
-Best...    10065157632 -> 5208719802 [51.75%]; 32.78s, 292.8MB/
+Default... 10065157632 -> 5915541066 [58.77%]; 1.028s, 9337.4MB/s
+Better...  10065157632 -> 5453844650 [54.19%]; 1.597s, 4862.7MB/s
+Best...    10065157632 -> 5192495021 [51.59%]; 32.78s, 308.2MB/s
 
 * consensus.db.10gb
-Default... 10737418240 -> 4562648848 [42.49%]; 882ms, 11610.0MB/s
-Better...  10737418240 -> 4542428129 [42.30%]; 1.533s, 6679.7MB/s
-Best...    10737418240 -> 4244773384 [39.53%]; 42.96s, 238.4MB/s
+Default... 10737418240 -> 4549762344 [42.37%]; 882ms, 12118.4MB/s
+Better...  10737418240 -> 4438535064 [41.34%]; 1.533s, 3500.9MB/s
+Best...    10737418240 -> 4210602774 [39.21%]; 42.96s, 254.4MB/s
 ```
 
 Decompression speed should be around the same as using the 'better' compression mode. 
 
+## Dictionaries
+
+*Note: S2 dictionary compression is currently at an early implementation stage, with no assembly for
+either encoding or decoding. Performance improvements can be expected in the future.*
+
+Adding a dictionary lets you provide custom data that serves as a lookup source at the beginning of blocks.
+
+The same dictionary *must* be used for both encoding and decoding. 
+S2 does not keep track of whether the same dictionary is used,
+and using the wrong dictionary will most often not result in an error when decompressing.
+
+Blocks encoded *without* dictionaries can be decompressed seamlessly *with* a dictionary.
+This means it is possible to switch from an encoding without dictionaries to an encoding with dictionaries
+and treat the blocks similarly.
+
+The same usage scenario applies to S2 dictionaries as to 
+[Zstandard dictionaries](https://github.com/facebook/zstd#the-case-for-small-data-compression):
+
+> Training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no universal dictionary). Hence, deploying one dictionary per type of data will provide the greatest benefits. Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.
+
+S2 further limits the dictionary to only be enabled on the first 64KB of a block.
+This will remove any negative (speed) impacts of the dictionaries on bigger blocks. 
+
+### Compression
+
+Using the [github_users_sample_set](https://github.com/facebook/zstd/releases/download/v1.1.3/github_users_sample_set.tar.zst) 
+and a 64KB dictionary trained with Zstandard, the following sizes can be achieved. 
+
+|                    | Default          | Better           | Best                  |
+|--------------------|------------------|------------------|-----------------------|
+| Without Dictionary | 3362023 (44.92%) | 3083163 (41.19%) | 3057944 (40.86%)      |
+| With Dictionary    | 921524 (12.31%)  | 873154 (11.67%)  | 785503 (10.49%)       |
+
+So for highly repetitive content, the dictionary provides a more than 3.5x reduction in size in this case.
+
+For less uniform data we will use the Go source code tree.
+Compressing the first 64KB of all `.go` files in `go/src`, Go 1.19.5, 8912 files, 51253563 bytes input:
+
+|                    | Default           | Better            | Best              |
+|--------------------|-------------------|-------------------|-------------------|
+| Without Dictionary | 22955767 (44.79%) | 20189613 (39.39%) | 19482828 (38.01%) |
+| With Dictionary    | 19654568 (38.35%) | 16289357 (31.78%) | 15184589 (29.63%) |
+| Saving/file        | 362 bytes         | 428 bytes         | 472 bytes         |
+
+
+### Creating Dictionaries
+
+There are no tools to create dictionaries in S2. 
+However, there are multiple ways to create a useful dictionary:
+
+#### Using a Sample File
+
+If your input is very uniform, you can just use a sample file as the dictionary.
+
+For example in the `github_users_sample_set` above, the average compression only goes up from 
+10.49% to 11.48% by using the first file as dictionary compared to using a dedicated dictionary.
+
+```Go
+    // Read a sample
+    sample, err := os.ReadFile("sample.json")
+    if err != nil {
+        panic(err)
+    }
+
+    // Create a dictionary.
+    dict := s2.MakeDict(sample, nil)
+
+    // b := dict.Bytes() will provide a dictionary that can be saved
+    // and reloaded with s2.NewDict(b).
+
+    // To encode:
+    encoded := dict.Encode(nil, file)
+
+    // To decode, pass the encoded bytes, not the original file:
+    decoded, err := dict.Decode(nil, encoded)
+```
+
+#### Using Zstandard
+
+Zstandard dictionaries can easily be converted to S2 dictionaries.
+
+This can be helpful to generate dictionaries for files that don't have a fixed structure.
+
+
+Example, with training set files placed in `./training-set`:
+
+`λ zstd -r --train-fastcover training-set/* --maxdict=65536 -o name.dict`
+
+This will create a 64KB Zstandard dictionary that can be converted to an S2 dictionary like this:
+
+```Go
+    // Decode the Zstandard dictionary.
+    insp, err := zstd.InspectDictionary(zdict)
+    if err != nil {
+        panic(err)
+    }
+	
+    // We are only interested in the contents.
+    // Assume that files start with "// Copyright (c) 2023".
+    // Search for the longest match for that.
+    // This may save a few bytes.
+    dict := s2.MakeDict(insp.Content(), []byte("// Copyright (c) 2023"))
+
+    // b := dict.Bytes() will provide a dictionary that can be saved
+    // and reloaded with s2.NewDict(b).
+
+    // We can now encode using this dictionary
+    encodedWithDict := dict.Encode(nil, payload)
+
+    // To decode content:
+    decoded, err := dict.Decode(nil, encodedWithDict)
+```
+
+It is recommended to save the dictionary returned by `b := dict.Bytes()`, since that will contain only the S2 dictionary.
+
+This dictionary can later be loaded using `s2.NewDict(b)`. The dictionary then no longer requires `zstd` to be initialized.
+
+Also note how `s2.MakeDict` allows you to search for a common starting sequence of your files.
+This can be omitted, at the expense of a few bytes.
+
 # Snappy Compatibility
 
 S2 now offers full compatibility with Snappy.
@@ -648,10 +728,10 @@ If you would like more control, you can use the s2 package as described below:
 Snappy compatible blocks can be generated with the S2 encoder. 
 Compression and speed is typically a bit better `MaxEncodedLen` is also smaller for smaller memory usage. Replace 
 
-| Snappy                     | S2 replacement          |
-|----------------------------|-------------------------|
-| snappy.Encode(...)         | s2.EncodeSnappy(...)   |
-| snappy.MaxEncodedLen(...)  | s2.MaxEncodedLen(...)   |
+| Snappy                    | S2 replacement        |
+|---------------------------|-----------------------|
+| snappy.Encode(...)        | s2.EncodeSnappy(...)  |
+| snappy.MaxEncodedLen(...) | s2.MaxEncodedLen(...) |
 
 `s2.EncodeSnappy` can be replaced with `s2.EncodeSnappyBetter` or `s2.EncodeSnappyBest` to get more efficiently compressed snappy compatible output. 
 
@@ -660,12 +740,12 @@ Compression and speed is typically a bit better `MaxEncodedLen` is also smaller
 Comparison of [`webdevdata.org-2015-01-07-subset`](https://files.klauspost.com/compress/webdevdata.org-2015-01-07-4GB-subset.7z),
 53927 files, total input size: 4,014,735,833 bytes. amd64, single goroutine used:
 
-| Encoder               | Size       | MB/s       | Reduction |
-|-----------------------|------------|------------|------------
-| snappy.Encode         | 1128706759 | 725.59     | 71.89%    |
-| s2.EncodeSnappy       | 1093823291 | **899.16** | 72.75%    |
-| s2.EncodeSnappyBetter | 1001158548 | 578.49     | 75.06%    |
-| s2.EncodeSnappyBest   | 944507998  | 66.00      | **76.47%**|
+| Encoder               | Size       | MB/s       | Reduction  |
+|-----------------------|------------|------------|------------|
+| snappy.Encode         | 1128706759 | 725.59     | 71.89%     |
+| s2.EncodeSnappy       | 1093823291 | **899.16** | 72.75%     |
+| s2.EncodeSnappyBetter | 1001158548 | 578.49     | 75.06%     |
+| s2.EncodeSnappyBest   | 944507998  | 66.00      | **76.47%** |
 
 ## Streams
 
@@ -835,6 +915,13 @@ This is done using the regular "Skip" function:
 
 This will ensure that we are at exactly the offset we want, and reading from `dec` will start at the requested offset.
 
+# Compact storage
+
+For compact storage, [RemoveIndexHeaders](https://pkg.go.dev/github.com/klauspost/compress/s2#RemoveIndexHeaders) can be used to remove any redundant info from 
+a serialized index. If you remove the headers, they must be restored before [Loading](https://pkg.go.dev/github.com/klauspost/compress/s2#Index.Load).
+
+This is expected to save 20 bytes. The headers can be restored using [RestoreIndexHeaders](https://pkg.go.dev/github.com/klauspost/compress/s2#RestoreIndexHeaders). Removing them drops a layer of security, but gives the most compact representation. `RemoveIndexHeaders` returns nil if the headers contain errors.
+
 ## Index Format:
 
 Each block is structured as a snappy skippable block, with the chunk ID 0x99.
@@ -844,20 +931,20 @@ The block can be read from the front, but contains information so it can be read
 Numbers are stored as fixed size little endian values or [zigzag encoded](https://developers.google.com/protocol-buffers/docs/encoding#signed_integers) [base 128 varints](https://developers.google.com/protocol-buffers/docs/encoding), 
 with un-encoded value length of 64 bits, unless other limits are specified. 
 
-| Content                                                                   | Format                                                                                                                      |
-|---------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------|
-| ID, `[1]byte`                                                           | Always 0x99.                                                                                                                  |
-| Data Length, `[3]byte`                                                  | 3 byte little-endian length of the chunk in bytes, following this.                                                            |
-| Header `[6]byte`                                                        | Header, must be `[115, 50, 105, 100, 120, 0]` or in text: "s2idx\x00".                                                        |
-| UncompressedSize, Varint                                                | Total Uncompressed size.                                                                                                      |
-| CompressedSize, Varint                                                  | Total Compressed size if known. Should be -1 if unknown.                                                                      |
-| EstBlockSize, Varint                                                    | Block Size, used for guessing uncompressed offsets. Must be >= 0.                                                             |
-| Entries, Varint                                                         | Number of Entries in index, must be < 65536 and >=0.                                                                          |
-| HasUncompressedOffsets `byte`                                           | 0 if no uncompressed offsets are present, 1 if present. Other values are invalid.                                             |
-| UncompressedOffsets, [Entries]VarInt                                    | Uncompressed offsets. See below how to decode.                                                                                |
-| CompressedOffsets, [Entries]VarInt                                      | Compressed offsets. See below how to decode.                                                                                  |
-| Block Size, `[4]byte`                                                   | Little Endian total encoded size (including header and trailer). Can be used for searching backwards to start of block.       |
-| Trailer `[6]byte`                                                       | Trailer, must be `[0, 120, 100, 105, 50, 115]` or in text: "\x00xdi2s". Can be used for identifying block from end of stream. |
+| Content                              | Format                                                                                                                        |
+|--------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
+| ID, `[1]byte`                        | Always 0x99.                                                                                                                  |
+| Data Length, `[3]byte`               | 3 byte little-endian length of the chunk in bytes, following this.                                                            |
+| Header `[6]byte`                     | Header, must be `[115, 50, 105, 100, 120, 0]` or in text: "s2idx\x00".                                                        |
+| UncompressedSize, Varint             | Total Uncompressed size.                                                                                                      |
+| CompressedSize, Varint               | Total Compressed size if known. Should be -1 if unknown.                                                                      |
+| EstBlockSize, Varint                 | Block Size, used for guessing uncompressed offsets. Must be >= 0.                                                             |
+| Entries, Varint                      | Number of Entries in index, must be < 65536 and >=0.                                                                          |
+| HasUncompressedOffsets `byte`        | 0 if no uncompressed offsets are present, 1 if present. Other values are invalid.                                             |
+| UncompressedOffsets, [Entries]VarInt | Uncompressed offsets. See below how to decode.                                                                                |
+| CompressedOffsets, [Entries]VarInt   | Compressed offsets. See below how to decode.                                                                                  |
+| Block Size, `[4]byte`                | Little Endian total encoded size (including header and trailer). Can be used for searching backwards to start of block.       |
+| Trailer `[6]byte`                    | Trailer, must be `[0, 120, 100, 105, 50, 115]` or in text: "\x00xdi2s". Can be used for identifying block from end of stream. |
 
 For regular streams the uncompressed offsets are fully predictable,
 so `HasUncompressedOffsets` allows to specify that compressed blocks all have 
@@ -929,6 +1016,7 @@ To decode from any given uncompressed offset `(wantOffset)`:
 
 See [using indexes](https://github.com/klauspost/compress/tree/master/s2#using-indexes) for functions that perform the operations with a simpler interface.
 
+
 # Format Extensions
 
 * Frame [Stream identifier](https://github.com/google/snappy/blob/master/framing_format.txt#L68) changed from `sNaPpY` to `S2sTwO`.
@@ -951,13 +1039,80 @@ The length is specified by reading the 3-bit length specified in the tag and dec
 | 7      | 65540 + read 3 bytes |
 
 This allows any repeat offset + length to be represented by 2 to 5 bytes.
+It also allows matches longer than 64 bytes to be emitted with one copy + one repeat instead of several 64 byte copies.
 
 Lengths are stored as little endian values.
 
-The first copy of a block cannot be a repeat offset and the offset is not carried across blocks in streams.
+The first copy of a block cannot be a repeat offset and the offset is reset on every block in streams.
 
 Default streaming block size is 1MB.
 
+# Dictionary Encoding
+
+Dictionaries allow providing custom content that serves as a lookup source at the beginning of blocks.
+
+A dictionary provides an initial repeat value that can be used to point to a common header.
+
+Other than that, the dictionary contains values that can be used as back-references.
+
+Frequently used data should be placed at the *end* of the dictionary, since offsets < 2048 bytes have a shorter encoding.
+
+## Format
+
+Dictionary *content* must be at least 16 bytes and at most 64KiB (65536 bytes).
+
+Encoding: `[repeat value (uvarint)][dictionary content...]`
+
+The dictionary content is preceded by an unsigned base-128 (uvarint) encoded value specifying the initial repeat offset.
+This value is an offset into the dictionary content and not a back-reference offset,
+so setting this to 0 will make the repeat value point to the first value of the dictionary.
+
+The value must be less than the dictionary length minus 8.
+
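The layout above can be sketched with the standard library alone. This is a minimal illustration, not the s2 implementation: `serializeDict` and `parseDict` are hypothetical helpers, and the limits they enforce are simply the ones stated in this section. For real use, `s2.MakeDict` and `s2.NewDict` should be preferred.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Limits as stated in the Format section.
const (
	minDictContent = 16
	maxDictContent = 65536
)

// validate applies the size and repeat rules described in the text.
func validate(repeat uint64, content []byte) error {
	if len(content) < minDictContent || len(content) > maxDictContent {
		return errors.New("dictionary content must be 16 to 65536 bytes")
	}
	// The text states the repeat value must be less than length minus 8.
	if repeat >= uint64(len(content)-8) {
		return errors.New("repeat value too large")
	}
	return nil
}

// serializeDict (hypothetical) emits [repeat value (uvarint)][content...].
func serializeDict(repeat uint64, content []byte) ([]byte, error) {
	if err := validate(repeat, content); err != nil {
		return nil, err
	}
	hdr := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(hdr, repeat)
	return append(hdr[:n], content...), nil
}

// parseDict (hypothetical) reverses serializeDict.
func parseDict(b []byte) (uint64, []byte, error) {
	repeat, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, nil, errors.New("invalid uvarint")
	}
	content := b[n:]
	if err := validate(repeat, content); err != nil {
		return 0, nil, err
	}
	return repeat, content, nil
}

func main() {
	content := []byte("Yesterday 25 bananas were added to Benjamins brown bag")
	b, err := serializeDict(10, content)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, first byte 0x%02x\n", len(b), b[0])
}
```

A repeat value below 128 fits in a single uvarint byte, so small dictionaries carry only one byte of overhead in this layout.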
+## Encoding
+
+From the decoder point of view the dictionary content is seen as preceding the encoded content.
+
+`[dictionary content][decoded output]`
+
+Backreferences to the dictionary are encoded as ordinary backreferences that have an offset before the start of the decoded block.
+
+Matches copying from the dictionary are **not** allowed to cross from the dictionary into the decoded data.
+However, if a copy ends at the end of the dictionary the next repeat will point to the start of the decoded buffer, which is allowed.
+
+The first match can be a repeat value, which will use the repeat offset stored in the dictionary.
+
+Once 64KB (65536 bytes) has been encoded or decoded, the dictionary may no longer be referenced, 
+neither by copy nor by repeat operations. 
+If the boundary is crossed while copying from the dictionary, the operation should complete, 
+but the next instruction is not allowed to reference the dictionary.
+
+Valid blocks encoded *without* a dictionary can be decoded with any dictionary. 
+There are no checks whether the supplied dictionary is the correct one for a block.
+Because of this, there is no overhead from using a dictionary.
+
+## Example
+
+This is the dictionary content. Elements are separated by `[]`.
+
+Dictionary: `[0x0a][Yesterday 25 bananas were added to Benjamins brown bag]`.
+
+Initial repeat offset is set at 10, which points at the character `2`.
+
+Encoded `[LIT "10"][REPEAT len=10][LIT "hich"][MATCH off=50 len=6][MATCH off=31 len=6][MATCH off=61 len=10]`
+
+Decoded: `[10][ bananas w][hich][ were ][brown ][were added]`
+
+Output: `10 bananas which were brown were added`
+
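The example above can be replayed with a small simulation. This is a sketch under stated assumptions, not the s2 decoder: `op` and `decodeOps` are hypothetical, instructions are given already parsed instead of as tagged bytes, and the initial repeat value is assumed to translate to a back-reference offset of `len(dict) - repeat` as seen from the start of the decoded output.

```go
package main

import (
	"errors"
	"fmt"
)

// op is a simplified, already-parsed instruction (hypothetical type).
type op struct {
	kind string // "LIT", "MATCH" or "REPEAT"
	lit  string // literal bytes for LIT
	off  int    // back-reference offset for MATCH
	n    int    // length for MATCH and REPEAT
}

// decodeOps (hypothetical) replays ops with the dictionary seen as
// preceding the output: [dictionary content][decoded output].
func decodeOps(dict []byte, repeat int, ops []op) ([]byte, error) {
	buf := append([]byte{}, dict...)
	// Assumption: turn the dictionary index into a back-reference offset.
	offset := len(dict) - repeat
	for _, o := range ops {
		switch o.kind {
		case "LIT":
			buf = append(buf, o.lit...)
			continue
		case "MATCH":
			offset = o.off
		case "REPEAT":
			// Reuse the previous offset.
		default:
			return nil, errors.New("unknown op")
		}
		src := len(buf) - offset
		if offset <= 0 || src < 0 {
			return nil, errors.New("offset out of range")
		}
		if src < len(dict) {
			// No dictionary references after 64KB of output.
			if len(buf)-len(dict) >= 65536 {
				return nil, errors.New("dictionary reference after 64KB")
			}
			// Copies must not cross from the dictionary into the output.
			if src+o.n > len(dict) {
				return nil, errors.New("copy crosses dictionary boundary")
			}
		}
		// Copy byte by byte so overlapping matches work.
		for i := 0; i < o.n; i++ {
			buf = append(buf, buf[src+i])
		}
	}
	return buf[len(dict):], nil
}

func main() {
	dict := []byte("Yesterday 25 bananas were added to Benjamins brown bag")
	ops := []op{
		{kind: "LIT", lit: "10"},
		{kind: "REPEAT", n: 10},
		{kind: "LIT", lit: "hich"},
		{kind: "MATCH", off: 50, n: 6},
		{kind: "MATCH", off: 31, n: 6},
		{kind: "MATCH", off: 61, n: 10},
	}
	out, err := decodeOps(dict, 10, ops)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // 10 bananas which were brown were added
}
```

With a 54 byte dictionary and a repeat value of 10, the first repeat uses offset 44, which lands on the space before `bananas` once the two literal bytes have been written.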
+
+## Streams
+
+For streams each block can use the dictionary.
+
+The dictionary cannot currently be provided on the stream.
+
+
 # LICENSE
 
 This code is based on the [Snappy-Go](https://github.com/golang/snappy) implementation.