Commit b76c339

First real README.md
1 parent 1354e2a commit b76c339

1 file changed

+66
-0
lines changed

README.md

Lines changed: 66 additions & 0 deletions
@@ -1,2 +1,68 @@
# bscrypt
A cache hard password hash/KDF

## Why Cache Hard

Cache hard algorithms are better than memory hard algorithms at shorter run times.
Basically, cache hard algorithms force GPUs to use only 1/4 to 1/16 of their memory bandwidth because of the large bus width (commonly 256 to 1024 bits).
Another way to look at it is memory transactions vs bandwidth.
Also, the low latency of L2 cache on CPUs and the 8 parallel lookups let us make a lot of random reads.
With memory hard algorithms, there is a point where doubling the memory quarters a GPU attacker's speed.
So there is a point at which a memory hard algorithm will overtake a cache hard algorithm.
Cache hard algorithms don't care that GPUs will get ~100% utilization of memory transactions because that rate is already very limiting.

## Settings

* `m` (`memoryKiB`)
* `t` (`iterations`)
* `p` (`parallelism`)

Set `m` to the largest per-core cache size.
For current CPUs, this is the L2 cache, commonly 256 KiB, 512 KiB, 1 MiB, or 1.25 MiB per core.
You shouldn't currently go below 128 KiB.
When in doubt, use `m=256` (256 KiB).

If you are hashing server side, set `p` to 1.
But if you set up a queuing system, then set `p` to the number of cores or less.
You may want to benchmark different values of `p` alongside your normal workloads to find the best `p`.

Now set `t` to at least `1900000 / (1024 * m * p)`.
This is likely only a few milliseconds, so if you want it to be stronger, change 1'900'000 to 19'000'000.
This will limit GPU attackers to <1 KH/s/GPU, which is good for encryption.
Note that next gen GPUs are launching around November 2022 and these are just bare minimums for `t`.
I recommend using settings that are at least twice as hard on current hardware to account for future advances.
That gives you time to upgrade settings while the old settings are still <10 KH/s/GPU.
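
As a rough sketch of the arithmetic behind the minimum `t`, the snippet below evaluates `1900000 / (1024 * m * p)` and rounds up to a whole number of iterations.
The function name `min_iterations` and the `work` parameter are hypothetical names used only for illustration and are not part of bscrypt, and rounding up is an assumption based on the "at least" wording.

```python
def min_iterations(m_kib: int, p: int, work: int = 1_900_000) -> int:
    """Smallest whole t with t >= work / (1024 * m_kib * p).

    Hypothetical helper, not part of bscrypt. `work` is the constant from
    the README: 1'900'000 for the bare minimum, 19'000'000 for the
    stronger setting.
    """
    if m_kib <= 0 or p <= 0:
        raise ValueError("m_kib and p must be positive")
    denom = 1024 * m_kib * p
    # Integer ceiling division (assumption: round up so t is never below the bound).
    return max(1, -(-work // denom))


print(min_iterations(256, 1))              # 8
print(min_iterations(256, 1, 19_000_000))  # 73
```

With `m=256` and `p=1`, the stronger constant gives a minimum of 73.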

### Easy Settings
Just use `m=256`, `t=80`, `p=1`; that should still be good in 2030.

Looking at historical GPU memory transaction rates and fitting an exponential trend line, these settings are still good in 2043 for AMD and in 2034 for Nvidia.
This assumes GPU cache sizes per SM aren't something like 10x higher by then.
44+
## "Not BLAKE2b"
45+
46+
Not a BLAKE2b mix calculation.
47+
There is no message and the rotates were changed from 32,24,16,63 to 8,1,16,11,40,32.
48+
These were found to give a faster mix by a program that checked 2 any rotates, 3 byte rotates, and a 32 bit rotate.
49+
This was picked out of several equivalent ones because it looked similar to the "best" 4 rotates.
50+
These are 8,1,24,32 that are 1 any rotate, 2 byte rotates, and a 32 bit rotate.
51+
52+
Related: https://twitter.com/Sc00bzT/status/1461894336052973573
53+
54+
I went with the 6 rotates because it mixed faster.
55+
I was going to do either:
56+
57+
* 2 rounds of 6 rotates
58+
* 3 rounds of 4 rotates
59+
60+
Oh "3 rounds of 4 rotates" has a "coverage" of 87.5% and I believe "2 rounds of 6 rotates" has a "coverage" of 100%.
61+
I need to check this it's been a almost a year since I looked at the data.
62+
"Coverage" is the percent of bits from the block that have affected other bits.
63+
You need 1'024 variables representing the 1'024 bits in the block.
64+
Each variable has 1'024 bits representing which bit from the block has influenced its value.
65+
You rotate those and OR them together instead of add and XOR.
66+
Then count the bits that are set.
67+
This may not be the best way to check for the best rotates.
68+
Also addition influences higher bits which this doesn't check for.
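
The coverage check described above could be sketched roughly as follows.
This is not the program that was used to pick the rotates.
It assumes the standard BLAKE2b `G` layout (column step, then diagonal step) with the message words removed and 4 rotates per `G`.
The 6 rotate layout isn't spelled out here, so it is not modelled.
Reporting "coverage" as the fraction of set bits out of 1'024 x 1'024 is also an interpretation, and all function names are made up for illustration.
Like the description above, it ignores carries from addition.

```python
WORD_BITS = 64
WORDS = 16
BLOCK_BITS = WORD_BITS * WORDS  # 1'024 bits in the block


def initial_state():
    # One 1'024-bit influence mask per block bit; each bit starts out
    # influenced only by itself.
    return [[1 << (w * WORD_BITS + i) for i in range(WORD_BITS)]
            for w in range(WORDS)]


def combine(x, y):
    # Stand-in for both add and XOR: OR the influence masks bit by bit.
    return [a | b for a, b in zip(x, y)]


def rotr(x, r):
    # Rotate right by r: new bit i takes the mask of old bit (i + r) mod 64.
    return [x[(i + r) % WORD_BITS] for i in range(WORD_BITS)]


def g(s, a, b, c, d, rots):
    # Assumed layout: BLAKE2b G without message words, 4 rotates.
    r0, r1, r2, r3 = rots
    s[a] = combine(s[a], s[b])
    s[d] = rotr(combine(s[d], s[a]), r0)
    s[c] = combine(s[c], s[d])
    s[b] = rotr(combine(s[b], s[c]), r1)
    s[a] = combine(s[a], s[b])
    s[d] = rotr(combine(s[d], s[a]), r2)
    s[c] = combine(s[c], s[d])
    s[b] = rotr(combine(s[b], s[c]), r3)


def one_round(s, rots):
    for col in range(4):  # column step
        g(s, col, col + 4, col + 8, col + 12, rots)
    for a, b, c, d in ((0, 5, 10, 15), (1, 6, 11, 12),
                       (2, 7, 8, 13), (3, 4, 9, 14)):  # diagonal step
        g(s, a, b, c, d, rots)


def coverage(s):
    # Count the set bits across all masks, as a fraction of 1'024 * 1'024.
    set_bits = sum(bin(mask).count("1") for word in s for mask in word)
    return set_bits / (BLOCK_BITS * BLOCK_BITS)


state = initial_state()
for n in range(1, 4):
    one_round(state, (32, 24, 16, 63))  # BLAKE2b's own rotates, as a baseline
    print(n, "round(s):", coverage(state))
```

Swapping in another tuple of 4 rotates lets you compare candidates under the same assumptions.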
