Posted by Colin B. on May 11, 2007, 5:15 pm
ottomeister@mail.com wrote:
> wrote:
>
>> $ dd if=/dev/random of=<filename> bs=x count=y conv=sync.
>>
>> Now assuming that we keep the filesize the same (i.e. x*y=constant),
>> the time to generate files goes up as count increases and bs decreases.
>> The interesting thing is that files created with low count and high bs...
>> - compress much better
>> - generate far fewer lines (as measured by wc -l)
>>
>> Now since compress and gzip are apparently entropy-based algorithms, it
>> stands to reason (at least by me!) that the small-count file has less
>> entropy. The question is, what does this actually mean, and what are the
>> consequences of it?
>
> 'conv=sync' tells 'dd' that if it gets a short read from its input then
> it should pad the output record to the specified blocksize with zeroes.
> /dev/random can produce short reads if its entropy pool gets depleted.
> If you examine the compressible output files I expect you'll find that
> they contain lots of runs of zeroes, and those runs of zeroes are
> highly compressible.
>
> This is also the reason why the large 'bs' causes the file to be
> generated more quickly.
Ah hah! That explains some other behaviour I noticed after posting this,
namely that until a certain point, increasing bs (and decreasing count)
didn't seem to produce the behaviour I described.
Now that I actually look at the output from dd, I can see the same thing:
0 full records and count partial records if bs is high enough (> 1040,
in this case).
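For anyone following along, the padding is easy to reproduce without
/dev/random at all. This is only a minimal sketch, assuming a POSIX-ish
shell with dd, od, tr and wc available; the function name short_read_demo
is made up for illustration. It feeds dd a 3-byte input with an 8-byte
block size, so the single read comes up short and conv=sync fills the
rest of the block with NULs.

```shell
# Hypothetical helper: hand dd 3 bytes when it asks for an 8-byte block.
# The read is short (3 < 8), so conv=sync pads the block out with NUL bytes.
short_read_demo() {
    printf 'abc' | dd bs=8 count=1 conv=sync 2>/dev/null
}

# Dump the output block: 'a', 'b', 'c', followed by NUL padding.
short_read_demo | od -An -c

# Count only the NUL bytes, i.e. how much of the block is padding.
short_read_demo | tr -cd '\0' | wc -c
```

The same tr/wc pipeline can be pointed at one of the generated files
(reading it on stdin) to see roughly how much of it is zero padding
rather than random data.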
But shouldn't /dev/random (on Solaris, BTW) block until it can fill the
request for whatever block size? Or can it only block between calls?
Thanks,
Colin