Valid HTML 4.01 Transitional
Prev: GPG Operation Cancelled
(Index)
Jim Carter's Bugfixes

/dev/random Blocks Forever

James F. Carter
2017-02-15
Symptom:

You are creating a secret (private) key for SSH, GPG or TLS (host key), or are connecting to another host using TLS, and the operation takes ridiculously long. It will finish eventually but is much slower than usual.

What's happening:

The crypto software reads /dev/random to get some truly (you hope) random numbers, but the machine's entropy pool is depleted and /dev/random blocks until more entropy is captured. On a less active machine, particularly one where the user is just sitting there waiting for an operation to finish, entropy is produced slowly.

You are supposed to be running a daemon which collects entropy from a variety of sources not practical for the kernel to tap into, but it has gone catatonic or has died.

See the end of this page for a discussion of entropy.

How to fix:

First, does your host have the kernel's hardware-specific driver that collects entropy? Do grep '_rng_driver' /proc/kallsyms and look for $ARCH_rng_driver where on x86 (Intel, AMD, Via) processors you may see ARCH = amd intel tpm via virtio; on Broadcom 2835 ARM (Raspberry Pi) look for ARCH = bcm-2835.

If the kernel has no trace of a random driver, look in /lib/modules/$VERS/kernel/drivers/char/hw_random/ (get $VERS from uname -r) and see which one(s) will load. If you have no modules, look for your architecture in the kernel source directory /usr/src/linux/drivers/char/hw_random. Details for some modules:

If you do have a functioning hardware random number generator then you should have a device /dev/hwrng and it should produce random bytes at a reasonable rate. To test: dd if=/dev/hwrng of=/dev/null count=8 iflag=fullblock. (Default blocksize for dd is 512 bytes.) On one of my virtual machines this stole 4096 bytes from the host in 0.14 seconds.

On a x86 (Intel or AMD) CPU, you also need to know if it has the RDRAND instruction. Look in /proc/cpuinfo, and among the CPU flags look for rdrand.

Now you need to inject this entropy into the kernel's pool. Currently there are two entropy daemons that do this. If you have either /dev/hwrng or the RDRAND instruction, you should run rngd from the rng-tools package. (When RDRAND is present it reports the DRNG entropy source.) It is aware when the kernel's entropy pool drops below a limit. Then it reads enough random bytes from its sources to fill the pool, configured to 3700 bits (not 4096) on a SuSE system. Under heavy load it can refill the pool very fast; I've never seen the pool under 3600 bits.

If you have no official hardware entropy source, as on my three older AMD machines, you need to run haveged from the haveged package. See this page for where haveged gets entropy. It has to work harder to get entropy, and is therefore slower at refilling the pool, but it does serve the purpose.

It is not harmful to run both daemons, but haveged uses more CPU (actually not too much) for no real added benefit, and testing the random number generator with two entropy daemons can get confusing, so it's recommended to pick one or the other.

What Is Entropy?

In thermodynamics it is the logarithm of the number of states a system (e.g. an engine) can be in, which are assumed to be equally probable and unknown to the observer; hence the choice of a state is random or unpredictable.

In computing it is the number of bits of random or unpredictable state information collected in the kernel's entropy pool. This is the same definition because the number of bits equals the logarithm of the number of states (each times a scale factor).

The essential feature of a random number, that counts as entropy, is that an attacker cannot predict it. Thus the number, however many bits you need, can be used as a secret key in a cryptographic algorithm, and the attacker can only capture the payload by trying every possible value for the key; enough bits are used that a brute force attack like that requires impractically much resources or run time.

When numbers are unpredictable they have certain statistical characteristics, e.g. the probability is 1/2 that a bit will be 1 or 0, the autocorrelation function is zero, and various more complex properties. To be accepted as truly random, the output of the hardware generator must pass these tests. Remember that truly random data will fail tests in a certain fraction of the trials, inversely proportional to the quantity of random numbers tested. Random numbers that never fail tests have been tampered with and have lost some of their unpredictability.

A pseudo-random generator is a deterministic algorithm (a program) designed to produce output that is indistinguishable from truly random data. However, the pseudo-random generator needs to start from some initial state, called its seed, and if the attacker discovers the seed he can predict exactly what the pseudo-random generator will produce. But if the seed is truly random, so is the pseudo-random output, within limits. But the pseudo-random generator necessarily repeats eventually, and the more of the generator's output is available to the attacker, the better he can deduce the algorithm and the seed. With weaker algorithms the needed history is surprisingly short (and you need to use a cryptographically secure algorithm). Thus it's prudent to re-seed the generator from time to time. See this blog post about RDRAND and RDSEED by John M (Intel) (2012-11-17) as well as Intel's implementation guide for the Digital Random Number Generator (DRNG). Intel's RDRAND instruction delivers the result of a pseudo-random generator with a long repetition period, that is re-seeded from a noise-based source after up to 511 outputs (random). RDSEED delivers, more slowly, direct from the source, skipping seeds used for RDRAND.

Manufacturing constraints limit what kinds of random sources a CPU chip can include easily; thermal (Johnson) noise is most often used, and shot noise is also practical. With today's emphasis on security, modern chips generally include a noise-based bit source, and its output can be transferred by the entropy daemon to the entropy pool. In addition, the haveged daemon uses variations in the speed of instruction execution, caused by competing processes, as a source of randomness, and the kernel drivers for the disc, network, keyboard and mouse similarly report unpredictable timings to the entropy pool.

Hyper-paranoid cryptography experts point out that the noise-based bit generator is closed source, so we cannot be sure just what it is doing, and it is possible (jimc says, possible but very unlikely) that adversaries have induced the manufacturer to design it so its output is predictable if a procedure is used that is known only to the adversaries. For example the adversary might be able to set the seed and turn off re-seeding. Thus if very difficult conditions are met, the adversary could steal a secret key and decrypt an encrypted message. The conditions include:

How does the Linux kernel convert incoming entropy pool bits into the output of /dev/random? See this blog post about the Linux random generator by Aaron Toponce (2014-07-21). In summary, the various entropy sources are added to the input pool using a fast mixing algorithm. When random numbers are needed, the pool is hashed using SHA-1 and the result is used as the seed for a pseudorandom number generator (PRNG) producing 32bit integers. These are cached in two pools of 1024 bits (32 ints) each, one for /dev/random and one for dev/urandom. When random data is sent out by /dev/random, the entropy estimate of the input pool is decreased by the number of bits sent, and if it reaches zero, /dev/random blocks until more entropy is added to the input pool. /dev/urandom does not block, and its PRNG is re-seeded not more frequently than once every 60 seconds, decreasing the entropy count unless already 0.

This means that Linux random numbers are cryptographically secure, meaning that they are truly random and unpredictable, but for regulatory compliance they cannot be considered independent, having been produced deterministically by a PRNG (between re-seeds).

Jimc says (following the advice of others):


Prev: GPG Operation Cancelled
(Index)