Posted by poster3814 on June 5, 2007, 3:49 pm
Moe Trin wrote:
> On Sun, 20 May 2007, in the Usenet newsgroup comp.security.misc, in article
>
>> Moe Trin wrote:
>
>>> [compton ~]$ size.of.words /usr/local/share/dict/web2
>>> Source /usr/local/share/dict/web2 has 235882 words
>>>  .             52      ........           29988
>>>  ..           160      .........          32403
>>>  ...         1420      ..........         30878
>>>  ....        5272      ...........        26013
>>>  .....      10228      ............       20462
>>>  ......     17705      more than 12 char  37432
>>>  .......    23869
>>> [compton ~]$ echo "52+160+1420+5272+10228+17705" | bc
>>> 34837
>>> [compton ~]$ echo "34837^3" | bc
>>> 42278760414253
>>> [compton ~]$
>>>
>>> That's still quite a few words to mash together ;-)
>>>
>>> /usr/local/share/dict/web2 is the "Webster's Second International"
>>> available through any search engine
>
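A tally like the one quoted above can be sketched in a few lines of Python. This is only a guess at what size.of.words does, since the script itself isn't shown; the function names, the cap of 12, and the choice to lump longer words under the key cap + 1 are all assumptions made for illustration.

```python
from collections import Counter


def load_words(path):
    """Read a one-word-per-line list such as /usr/share/dict/words."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.strip() for line in handle if line.strip()]


def length_histogram(words, cap=12):
    """Count words by length; anything longer than `cap` goes under cap + 1."""
    counts = Counter()
    for word in words:
        word = word.strip()
        if word:
            counts[min(len(word), cap + 1)] += 1
    return counts


def ordered_triples(counts, max_len=6):
    """Number of ways to mash together three words of at most max_len letters."""
    short = sum(n for length, n in counts.items() if length <= max_len)
    return short ** 3


# Counts for lengths 1 to 7 copied from the web2 listing quoted above.
web2_counts = {1: 52, 2: 160, 3: 1420, 4: 5272, 5: 10228, 6: 17705, 7: 23869}
print(ordered_triples(web2_counts))  # same figure as the bc session: 42278760414253
```

To run it against a real file, something like length_histogram(load_words("/usr/share/dict/words")) should do, assuming that file exists on your system.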
>> To be honest, I was thinking there might be a variety of dictionary
>> files out there of this type that aren't "complete" dictionaries. For
>> example, ones with only "everyday" words, so to speak, or ones with no
>> scientific terms, or no proper nouns, no numerals, etc. I thought it
>> feasible that such a file would be of a manageable size.
>
> The size of a dictionary is nearly always an advertising gimmick. I have
> two paperback dictionaries on this desk with the number of definitions
> prominently displayed as if more is better. A more commonly used
> computer word list (not a dictionary, because it lacks definitions) has
>
> [compton ~]$ size.of.words /usr/share/dict/words
> Source /usr/share/dict/words has 45402 words
>  .          0      ......       6175      ...........         3069
>  ..        49      .......      7370      ............        1881
>  ...      536      ........     7075      .............       1136
>  ....    2238      .........    6086      ..............       545
>  .....   4179      ..........   4592      15 or more char      471
> [compton ~]$
>
> which (assuming English is your original language) is more like what you
> would be using in normal conversation. In the 1950s, international short
> wave radio was an important tool used to exchange news, ideas and culture
> among nations. The official USA service was The Voice Of America, which
> (at the peak in the 1960s) had dozens of transmitters broadcasting 24/7
> in dozens of languages. One of those languages was called "Special English"
> and used a vocabulary of just 1500 words, for people who used English as a
> second or third language. While they did speak slower, even that limited
> number of words didn't make the language seem out of place for a primary
> English speaker.
>
>> I was also thinking that if there were programs that do what I was
>> asking that perhaps the user could select criteria, such as only
>> concatenating 2 words of 5 letters each. Then the user could run it
>> again later concatenating 2 words of 4 letters each, etc.
>
> I don't know why one would be needed, as this is trivial to accomplish
> using virtually any programming language from BASIC to perl to ruby to
> you name it. Creating a dictionary of such combinations is pretty much a
> waste of CPU cycles and disk-space. Using the word-list noted above, there
> are 2238 words of four characters, and 4179 of five. Any two five-letter
> words, and you have about 4179^2 or 1.75e7 results. Ignoring case, and
> using a 5 bit (Baudot) alphabet, storing those strings would require over
> a hundred megabytes of space - closer to 180 megabytes using ASCII.
>
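To put the "trivial to accomplish" point and the storage figures side by side, here is a minimal Python sketch. The generator name pairs and the sample words are made up for illustration; the count 4179 is the five-letter figure from the listing above, and the byte counts assume ten characters per string with no separators.

```python
from itertools import product


def pairs(words, first_len, second_len):
    """Lazily yield every concatenation of a first_len word and a second_len word."""
    firsts = [w for w in words if len(w) == first_len]
    seconds = [w for w in words if len(w) == second_len]
    for a, b in product(firsts, seconds):
        yield a + b


# Nothing is stored: candidates are produced one at a time as they are needed.
sample = ["apple", "grape", "pear", "kiwi"]
print(list(pairs(sample, 5, 4)))

# Storage arithmetic for every pair of five-letter words.
FIVE_LETTER_WORDS = 4179                 # from the /usr/share/dict/words listing
n_pairs = FIVE_LETTER_WORDS ** 2         # 17464041, about 1.75e7
packed_bytes = n_pairs * 10 * 5 // 8     # 5 bits per character: about 109 MB
ascii_bytes = n_pairs * 10               # 1 byte per character: about 175 MB
print(n_pairs, packed_bytes, ascii_bytes)
```

Because the generator yields combinations on demand, running it again later with (4, 4) or any other pair of lengths costs nothing in disk space.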
> To what end? Do you want to make a book that takes these words and
> creates a password hash for each one? For the normal UNIX 'crypt'
> mechanism which adds two 'salt' characters to "spice up" (vary) the
> hashing algorithm, those 1.75e7 passwords become 7.15e10 different
> 13 character result hashes which would take 930 Gigabytes to store in
> ASCII. Allowing the password to contain upper and lower case multiplies
> the storage space needed by several orders of magnitude.
>
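The hash-table arithmetic can be checked the same way. This sketch assumes the traditional crypt salt of two characters drawn from a 64-character set, which gives 4096 salts, and it assumes "upper and lower case" means each of the ten letters can independently be either case. It only counts; it does not compute any hashes.

```python
SALT_ALPHABET = 64                      # assumed: a-z, A-Z, 0-9, '.' and '/'
SALTS = SALT_ALPHABET ** 2              # two salt characters: 4096 variants
PASSWORDS = 4179 ** 2                   # pairs of five-letter words
HASH_CHARS = 13                         # length of one stored result

hashes = PASSWORDS * SALTS              # 71532711936, about 7.15e10
storage_bytes = hashes * HASH_CHARS     # about 9.3e11 bytes, roughly 930 GB

# Assumed reading of mixed case: 2 choices for each of 10 letters.
case_factor = 2 ** 10                   # 1024

print(hashes, storage_bytes, storage_bytes * case_factor)
```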
> Old guy
Eesh. That's a lot of data.
Your post makes a lot of sense, and I appreciate your time in
replying. Since it seems that what I was asking about would turn out to
be a big mess of a solution, and it's really not that big a deal to me
anyway, I don't want to waste anyone's time further with it.
Thanks again for everyone's time and effort. I appreciate it.
--
Please respond to the newsgroup only. Email sent to this account goes
unread.