Page 1 of 1

Generating random words

Posted: Wed May 14, 2008 1:19 pm
by Tegiri_Nenashi
I took "context free" adjective literally and trying to generate random words in a grammar. For arbitrary grammar rule, e.g.

A : a B C;

it might be challenging to establish correspondence between terminal symbol positions in the derived word, on one side, and shape coordinates on the plane, on the other. However, if restricted to regular language, then the grammar can be always represented in a normal form so that each rule

B : a C;

has a terminal symbol in the first position and optional nonterminal in the second. Then, it is immediate that Y should be plotted shifted one position (whatever this may mean in the plane coordinates).

To be more concrete, here is a word in free language on alphabet {a,b}:

startshape GRAMMAR

include lower_alphabet.cfdg

rule GRAMMAR 10 {
a_lower {}
GRAMMAR { x 20 }
}

rule GRAMMAR 10 {
b_lower {}
GRAMMAR { x 20 }
}

rule GRAMMAR 1 {
EMPTY_WORD {}
}

rule EMPTY_WORD {
SQUARE { s 5 1 y -5 b 0.9 }
}

Now, I expected it to render a random word in 10 symbols, it outputs abbb_. In fact, it behaves not intuitively on the parameter ranged from 1 to 1000:

* from 1 to 4 it gives the words of increasing lengths as expected
* then up to 300 the output doesn't change
* after the 300 it generates long word which disappears

Any comments?

Posted: Wed May 14, 2008 9:32 pm
by mtnviewmark
It wasn't clear from your post, but I'm assuming by "parameter" you mean the weight value given to the rules (the '10' in the example given.)

If so, you are misinterpreting what that weight means. It is a weighting factor for a random choice. Indeed, as that number is increased, the grammar will tend to produce longer strings - but only on average. Even with a large weight factor, it is possible that the very first replacement for the starting non-terminal GRAMMER will be the rule that replaces with EMPTY_WORD, leading to the string '_'. Multiple renders will show you different length lines.

Now, the expected line length can be computed using something like:

p(n,m) :: probability of a line of length n, given the weights of a and b are n, and the weight of stopping is 1

p(0,m) = 1/(2m+1)
p(i,m) = (1-sum(p(j,m) for j < i))*1/(2m+1)

You can do the math on this and find the median length for a given value of m. Some values:

at m = 1, median length is 1
at m = 2, median length is 3
at m = 3, median length is 4
at m = 4, median length is 5
at m = 10, median length is 14
at m = 100, median length is 138
at m = 200, median length is 277
at m = 300, median length is 417

It isn't surprising that at large numbers the image is so long and thin that it doesn't render anything big enough to see in your window.

Posted: Wed Mar 04, 2009 7:51 pm
by askiopop
I used html of a random word genorator (modifided code)

http://h1.ripway.com/vness/Namemaker.html