Generating random words

If you're having trouble using Context Free or don't understand the language, ask for help here.

Moderators: MtnViewJohn, chris, mtnviewmark

Post Reply
Tegiri_Nenashi
Posts: 9
Joined: Wed May 14, 2008 12:52 pm

Generating random words

Post by Tegiri_Nenashi »

I took "context free" adjective literally and trying to generate random words in a grammar. For arbitrary grammar rule, e.g.

A : a B C;

it might be challenging to establish correspondence between terminal symbol positions in the derived word, on one side, and shape coordinates on the plane, on the other. However, if restricted to regular language, then the grammar can be always represented in a normal form so that each rule

B : a C;

has a terminal symbol in the first position and optional nonterminal in the second. Then, it is immediate that Y should be plotted shifted one position (whatever this may mean in the plane coordinates).

To be more concrete, here is a word in free language on alphabet {a,b}:

startshape GRAMMAR

include lower_alphabet.cfdg

rule GRAMMAR 10 {
a_lower {}
GRAMMAR { x 20 }
}

rule GRAMMAR 10 {
b_lower {}
GRAMMAR { x 20 }
}

rule GRAMMAR 1 {
EMPTY_WORD {}
}

rule EMPTY_WORD {
SQUARE { s 5 1 y -5 b 0.9 }
}

Now, I expected it to render a random word in 10 symbols, it outputs abbb_. In fact, it behaves not intuitively on the parameter ranged from 1 to 1000:

* from 1 to 4 it gives the words of increasing lengths as expected
* then up to 300 the output doesn't change
* after the 300 it generates long word which disappears

Any comments?

User avatar
mtnviewmark
Site Admin
Posts: 81
Joined: Wed May 04, 2005 12:46 pm
Location: Mountain View, CA
Contact:

Post by mtnviewmark »

It wasn't clear from your post, but I'm assuming by "parameter" you mean the weight value given to the rules (the '10' in the example given.)

If so, you are misinterpreting what that weight means. It is a weighting factor for a random choice. Indeed, as that number is increased, the grammar will tend to produce longer strings - but only on average. Even with a large weight factor, it is possible that the very first replacement for the starting non-terminal GRAMMER will be the rule that replaces with EMPTY_WORD, leading to the string '_'. Multiple renders will show you different length lines.

Now, the expected line length can be computed using something like:

p(n,m) :: probability of a line of length n, given the weights of a and b are n, and the weight of stopping is 1

p(0,m) = 1/(2m+1)
p(i,m) = (1-sum(p(j,m) for j < i))*1/(2m+1)

You can do the math on this and find the median length for a given value of m. Some values:

at m = 1, median length is 1
at m = 2, median length is 3
at m = 3, median length is 4
at m = 4, median length is 5
at m = 10, median length is 14
at m = 100, median length is 138
at m = 200, median length is 277
at m = 300, median length is 417

It isn't surprising that at large numbers the image is so long and thin that it doesn't render anything big enough to see in your window.
I'm the "m" in "mtree.cfdg"

User avatar
askiopop
Posts: 12
Joined: Wed Mar 04, 2009 7:07 pm
Location: Guess

Post by askiopop »

I used html of a random word genorator (modifided code)

http://h1.ripway.com/vness/Namemaker.html
thank you

Post Reply