A filter-based word generator


Hidden in the procedures for generating words of the CV language type, there is a model of word structure somewhat more developed than a simple string. Each call to get.string constructs, in effect, an intermediate-sized unit: a syllable. When we try to generate words in a language with more complex phonotactic patterns, the need for syllables and similar constructs becomes more pressing.

Syllable parts

Suppose a language allows syllables to begin with groups of up to three consonants. In these circumstances, the situation will certainly not be so free and easy that any consonant can appear in any position in a group. There will typically be restrictions on the type of consonant selectable for each of the three positions, and most likely further restrictions on the specific combinations of sounds allowed in those positions.

In specifying these consonantal constraints, however, we will normally not need to step outside the limits of the consonant group. In other words, a syllable initial consonant sequence typically functions as a self-contained unit. For that reason, it makes sense not only to consider a word to be composed of syllables, but also to consider the syllables in turn to be composed of smaller (and yet smaller) constituents, amongst which are the onset (the initial consonant group) and the rhyme (the rest). From a phonological point of view, words are hierarchically structured, just as sentences are from a syntactic point of view.
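The hierarchy can be pictured as nested data: a word contains syllables, and each syllable contains an onset and a rhyme. The following Python sketch is illustrative only; the class names Syllable and Word are hypothetical and are not part of the original procedures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Syllable:
    onset: List[str]  # the initial consonant group (possibly empty)
    rhyme: List[str]  # the rest of the syllable

    def segments(self) -> List[str]:
        return self.onset + self.rhyme


@dataclass
class Word:
    syllables: List[Syllable] = field(default_factory=list)

    def spell(self) -> str:
        # Flatten the hierarchy back into a plain string of phonemes.
        return "".join("".join(s.segments()) for s in self.syllables)


word = Word([Syllable(["S", "T"], ["A", "M"]), Syllable(["P"], ["I"])])
print(word.spell())  # STAMPI
```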

Distributional classes and phonetic features

We have also noted that the restrictions which govern sound sequencing are phonetically based: they refer to whole classes of sounds defined phonetically. The reasons for restricting a particular place to a particular class of sounds lie partly in our articulatory or perceptual abilities. Sounds which are juxtaposed need to meet two separate and potentially conflicting requirements. They must be similar enough (in some articulatory sense) to be easy to pronounce when strung together, and they must be differentiated clearly enough acoustically for us to perceive them as distinct. To handle complex onset structures, then, we will clearly need a set of phonetic features going well beyond the basic pair consonant and vowel. In fact, we are going to take advantage of the notion of sonority and use the following sonority scale (derived from Selkirk 1984), ordered from least to most sonorous, to identify phonetic classes of phonemes:

[[P T K] [B D G] [F %] [V $ Z] [S] [M N] [L] [R] [I U] [E O] [A]]

Voiceless and voiced dental fricatives are (arbitrarily) represented by % and $.

General characteristics of a filter-based implementation

The approach to phonological word building we are going to adopt here assumes that words are freely composed of syllables, and only syllables. (This is not an assumption which you should accept without question.) On this basis, word generation proceeds in two separate steps:
  1. the construction of syllables
  2. the combination of syllables into words
Syllable construction is handled by passing arbitrarily constructed phoneme sequences through a set of filters or well-formedness conditions (WFCs) until some such sequence manages to avoid rejection. A word is simply derived by concatenating some arbitrary number of output sequences from the syllable generator.
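The two steps can be sketched as a generate-and-filter loop followed by concatenation. In this Python sketch the function names are hypothetical, and cv_only is a toy filter standing in for the real syllable filter, which the following sections describe.

```python
import random
from typing import Callable, List


def make_syllable(inventory: List[str], is_well_formed: Callable[[List[str]], bool],
                  rng: random.Random, max_len: int = 5) -> List[str]:
    """Draw arbitrary phoneme sequences until one avoids rejection by the filter."""
    while True:
        length = rng.randint(1, max_len)  # 5 = number of slots O2 O1 R1 R2 R3
        candidate = [rng.choice(inventory) for _ in range(length)]
        if is_well_formed(candidate):
            return candidate


def make_word(inventory: List[str], is_well_formed: Callable[[List[str]], bool],
              rng: random.Random, n_syllables: int) -> str:
    """A word is simply a concatenation of filtered syllables."""
    return "".join("".join(make_syllable(inventory, is_well_formed, rng))
                   for _ in range(n_syllables))


# Toy filter (hypothetical): accept exactly one consonant followed by one vowel.
VOWELS = {"A", "E", "I", "O", "U"}


def cv_only(seq: List[str]) -> bool:
    return len(seq) == 2 and seq[0] not in VOWELS and seq[1] in VOWELS


print(make_word(["P", "T", "K", "A", "I", "U"], cv_only, random.Random(1), n_syllables=3))
```

Because rejection is cheap, the loop simply retries; the filter alone decides what counts as a syllable.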

You can develop your own syllable concatenator quite easily, and you can also experiment with a sub-module for the syllable generator which outputs random sequences of phonemes. We are going to restrict our attention to the syllable filter, which guarantees the well-formedness of any phoneme sequence it allows to pass through.

The syllable filter

The syllable filter uses two tests:
  1. conformity with the universal sonority sequencing condition
  2. conformity with language specific settings of sonority values
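The sonority sequencing condition requires sonority to rise towards the syllable peak and fall after it. A minimal Python sketch, assuming the 1-based numbering of the scale and a strict rise and fall (plateaus rejected); some formulations of the condition tolerate plateaus, so treat that choice as an assumption.

```python
from typing import List

SCALE = [["P", "T", "K"], ["B", "D", "G"], ["F", "%"], ["V", "$", "Z"], ["S"],
         ["M", "N"], ["L"], ["R"], ["I", "U"], ["E", "O"], ["A"]]
SONORITY = {ph: rank for rank, cls in enumerate(SCALE, start=1) for ph in cls}


def obeys_ssc(syllable: List[str]) -> bool:
    """Sonority must rise strictly to a single peak, then fall strictly."""
    values = [SONORITY[p] for p in syllable]
    if not values:
        return False
    peak = values.index(max(values))
    rising = all(a < b for a, b in zip(values[:peak], values[1:peak + 1]))
    falling = all(a > b for a, b in zip(values[peak:], values[peak + 1:]))
    return rising and falling


print(obeys_ssc(["P", "L", "A", "N"]), obeys_ssc(["L", "P", "A"]))  # True False
```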
The language specific conditions are assumed to be expressible in the form:
  1. IF R1 THEN :R1 > 6
  2. IF R1 R2 THEN :R2 < :R1 - 1
where R1 and R2 are names for positions in the structure of the rhyme and the values associated with these names are sonority values. (The possible segment positions are O1 and O2 in the onset and R1, R2 and R3 in the rhyme.)

Condition 1 can be paraphrased as: if there is some segment at position R1 in the rhyme, then its sonority value must be greater than 6. Condition 2 says: if there are segments in positions R1 and R2, then the sonority of the segment at R2 must be lower than that at R1 by at least 2 points. (Or, in a closer translation: it must be lower than the sonority of R1 minus 1; since sonority values are whole numbers, this amounts to the same thing.) See Selkirk 1984.
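The IF ... THEN ... form can be mirrored in Python by pairing each list of required positions with a test. In this sketch the names CONDITIONS and satisfies are hypothetical, and lambdas stand in for the run lists of the original; a condition whose IF part is not met holds vacuously.

```python
from typing import Callable, Dict, List, Tuple

Slots = Dict[str, int]  # position name -> sonority value of the segment in it

# Each condition: (positions that must be filled for it to apply, test to run).
CONDITIONS: List[Tuple[List[str], Callable[[Slots], bool]]] = [
    (["R1"], lambda s: s["R1"] > 6),                   # IF R1 THEN :R1 > 6
    (["R1", "R2"], lambda s: s["R2"] < s["R1"] - 1),   # IF R1 R2 THEN :R2 < :R1 - 1
]


def satisfies(slots: Slots, conditions=CONDITIONS) -> bool:
    """Run every condition whose positions are all filled; skip the rest."""
    return all(test(slots) for needed, test in conditions
               if all(name in slots for name in needed))


print(satisfies({"R1": 11, "R2": 6}))   # True: 11 > 6 and 6 < 10
print(satisfies({"R1": 11, "R2": 10}))  # False: 10 is only 1 point below 11
print(satisfies({"R1": 5}))             # False: 5 is not greater than 6
```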

The language specific conditions and the internal structural characteristics of onset and rhyme are implemented as lists:

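; each condition pairs the positions it refers to with a test on their sonority values
make "conditions [[[R1] [:R1 > 6]] [[R1 R2] [:R2 < :R1 - 1]]]
; position names within the onset and the rhyme
make "onset [O2 O1]
make "rhyme [R1 R2 R3]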
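Before the conditions can be tested, the segments of a candidate syllable have to be assigned to these positions. The text does not say how this is done, so the following Python sketch makes a hypothetical choice: R1 holds the most sonorous segment, anything before it is right-aligned into the onset (O1 next to the rhyme), and anything after it fills R2 and R3 in order.

```python
from typing import Dict, List, Optional

SCALE = [["P", "T", "K"], ["B", "D", "G"], ["F", "%"], ["V", "$", "Z"], ["S"],
         ["M", "N"], ["L"], ["R"], ["I", "U"], ["E", "O"], ["A"]]
SONORITY = {ph: rank for rank, cls in enumerate(SCALE, start=1) for ph in cls}

ONSET = ["O2", "O1"]        # as in make "onset [O2 O1]
RHYME = ["R1", "R2", "R3"]  # as in make "rhyme [R1 R2 R3]


def assign_slots(syllable: List[str]) -> Optional[Dict[str, int]]:
    """Map a phoneme list onto the position names, returning name -> sonority.

    Assumption: the sonority peak goes in R1. Returns None if the sequence
    has too many segments for the onset or rhyme template.
    """
    if not syllable:
        return None
    values = [SONORITY[p] for p in syllable]
    peak = values.index(max(values))
    before, after = values[:peak], values[peak:]
    if len(before) > len(ONSET) or len(after) > len(RHYME):
        return None
    slots = dict(zip(ONSET[len(ONSET) - len(before):], before))  # right-align onset
    slots.update(zip(RHYME, after))
    return slots


print(assign_slots(["P", "L", "A", "N"]))  # {'O2': 1, 'O1': 7, 'R1': 11, 'R2': 6}
```

The resulting dictionary is the kind of input a condition checker would need.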

The procedures which interpret these well-formedness conditions are presented separately.

Ron Brasington
Department of Linguistic Science
The University of Reading

E-mail: ron.brasington@rdg.ac.uk