Removing redundant features

The problem

The segment filter which we based on negative well-formedness conditions outputs what a phonologist would call a fully specified feature matrix - i.e. a bundle in which every possible feature is present and specified as having some value. The output is necessarily of this type because the system randomly generates fully specified matrices and does no more than reject bundles which do not meet the WFCs. It does not at any stage modify the contents of a feature bundle.

Now while such an output might be appropriate as a way of providing representations of permissible allophones (or at any rate phones) in a given language, it cannot provide for the representation of phonemes, whose feature matrices (if they are to capture the contrastive possibilities of the language) must be free of all redundancies.

Interpreting WFCs as feature deletion rules

The solution proposed here eliminates redundant feature specifications by adding an extra phonemicizer module to the segment filter. The algorithm used in this module depends on interpreting a set of well-formedness conditions (WFCs), implemented in essentially the same manner as Condition 3 of the segment building system, as rules for removing non-distinctive features from fully specified feature bundles. The WFCs are implemented as a list called "conditions which contains, as its members, two-element sub-lists of the type:

[[+syll] [+son +voice -str]]

The first element of each such sub-list provides the structural description of the rule, i.e. it specifies the feature(s) a bundle must contain if it is to be modified. The second element, which lists the predictable features to be removed, determines the structural change.
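The interpretation of a single condition can be sketched in Python (an illustrative translation, not part of the original Logo system; the names apply_condition, description and change are my own):

```python
# Sketch of how one WFC works as a feature-deletion rule, assuming
# features are written as plain strings like "+syll".

def apply_condition(condition, bundle):
    """Apply one WFC pair [description, change] to a feature bundle."""
    description, change = condition
    # Structural description: every required feature must be in the bundle.
    if set(description) <= set(bundle):
        # Structural change: drop the predictable features.
        return [f for f in bundle if f not in change]
    return bundle

condition = [["+syll"], ["+son", "+voice", "-str"]]
bundle = ["+syll", "+son", "+voice", "-str", "+high"]
print(apply_condition(condition, bundle))  # -> ['+syll', '+high']
```

A bundle that fails the structural description is passed through unchanged.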

Arguably the feature redundancies could be even more economically expressed using the following format:

[[+syll] [son voice str]]

which might be read as "Values of the features sonorant, voice and strident are redundant in syllabic phonemes". But at this stage we will restrict changes to the WFC format to the minimum in order to emphasize the fact that effectively the same set of conditions may be used (with different interpretations) by either feature insertion or feature removal algorithms. The only change you may have noticed by comparison with the format used in the segment building system is that the second element of the condition has been reduced to a single flat list rather than retaining the more heavily structured pattern [[+syll] [[+son +voice -str]]]. (For present purposes, any embedding of the list of features to be changed in a yet higher-level list is quite pointless, given that no choice is ever available in the features to be removed and that, as a result, pickrandom is not required.)

Despite the limitations accepted here, you will none the less find it interesting to experiment with the economical specification of feature redundancies suggested above. You might also consider rewriting your segment building system so that the two types of conditions - one governing the insertion of distinctive features, the other the insertion of non-distinctive (redundant, predictable) features - are clearly separated, differently formatted and applied by different algorithms. In this way precisely the same set of conditions - identical data structures - could be used to control feature insertion and feature deletion.
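The more economical format suggested above could be interpreted by a rule that removes a feature by name, whatever its value. A hypothetical Python sketch (the function name strip_named is my own, and I assume values are marked by a leading + or -):

```python
# Sketch of the valueless condition format: the change part names bare
# features ("son"), and any value of that feature ("+son" or "-son")
# is removed from the bundle.

def strip_named(names, bundle):
    """Drop every feature whose name (value sign stripped) is listed."""
    return [f for f in bundle if f.lstrip("+-") not in names]

print(strip_named(["son", "voice", "str"], ["+syll", "+son", "-str"]))
# -> ['+syll']
```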

Phonemicize - a feature deletion procedure

The general behaviour of a procedure which interprets well-formedness conditions as feature deletion rules is easy enough to describe with a flow diagram.

Since the intended output of such a procedure is a phonemic representation, we might as well call the procedure phonemicize. It will clearly be recursive and will require two inputs:

  1. the segment to be modified (initially fully specified)
  2. a list representing the set of WFCs (initially the full set).

On each recursive step, the daughter procedure will be called with these inputs modified as follows:

  1. a new segment, reduced, if it matches the requirements, by killing off the features listed in the second part of the current condition
  2. the butfirst of the condition list.

Once the terminating condition is introduced, the procedure will, in other words, look like this:

to phonemicize :segment :conditions
if empty? :conditions [op :segment]
if subset? first first :conditions :segment
   [op phonemicize (killoff last first :conditions :segment) bf :conditions]
op phonemicize :segment bf :conditions
end

At any stage, what we have called the current condition is first :conditions.

The structural description of the corresponding rule is therefore first first :conditions and the test for a match is provided by subset? first first :conditions :segment.

The structural change - the removal of the last first :conditions from :segment - is handled by a subordinate procedure called killoff, which, of course, needs to be defined. Notice that it is killoff, in the parenthesized expression, which provides the first input to the next clone of phonemicize by outputting a modified form of :segment.
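The recursive shape of phonemicize may be easier to see in a Python sketch (an illustrative translation of the Logo, not the original; here the deletion step is written inline rather than delegated to killoff):

```python
# Sketch of the recursion in phonemicize: work through the condition
# list, shrinking the segment whenever a condition's structural
# description matches.

def phonemicize(segment, conditions):
    if not conditions:                      # empty? :conditions
        return segment
    description, change = conditions[0]     # first :conditions
    if set(description) <= set(segment):    # the subset? test
        reduced = [f for f in segment if f not in change]
        return phonemicize(reduced, conditions[1:])   # bf :conditions
    return phonemicize(segment, conditions[1:])

conds = [[["+syll"], ["+son", "+voice", "-str"]]]
print(phonemicize(["+syll", "+son", "+voice", "-str"], conds))
# -> ['+syll']
```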

Killoff - the real villain

From the way in which killoff is called it clearly expects two inputs, both of which are simple flat lists. To do its work, it must output a new list formed by deleting the elements of the first input from those of the second - i.e. it must eliminate the redundant features (input 1) from the feature bundle (input 2). Since this sounds very much like a job that could be handled by the general purpose tool remove, you might say: Why reinvent the wheel? Why not just use remove? Well, here is remove as defined in the tools list:
to remove :old.item :list
if empty? :list [op []]
if :old.item = first :list [op bf :list]
op fput first :list remove :old.item bf :list
end

The problem is that remove removes - or attempts to remove - input 1 as a single object from input 2. Killoff, by comparison, must remove each of the elements of input 1 individually from input 2. Suppose input 1 is [@son @voice]: killoff must remove @son and @voice separately from input 2, not least because input 2 - as a flat list - could not possibly contain anything like a sub-list [@son @voice].

This does not mean, however, that we give up all ideas of exploiting remove. What it means is that if we are to take advantage of this ready-made procedure, we must find some way to apply it recursively, working through each of the elements of the list of old items turn by turn. This is where killoff comes in. Killoff is a recursive higher-level parent of (the already recursive) remove which nibbles its way through an old-items list (:old.items), passing on to each clone a second input (representing a feature bundle) already modified by one application of remove:

to killoff :old.items :list
if empty? :old.items [op :list]
op killoff bf :old.items remove first :old.items :list
end

(Killoff as defined here is specifically tied to the running of remove. For a more general approach to this kind of problem look here.)
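The division of labour between the two procedures can be mirrored in Python (a sketch of the Logo definitions above, not a replacement for them):

```python
# remove strips the first occurrence of one item; killoff recursively
# applies remove once per element of the old-items list.

def remove(old_item, lst):
    """Remove the first occurrence of old_item from lst."""
    if not lst:
        return []
    if old_item == lst[0]:
        return lst[1:]
    return [lst[0]] + remove(old_item, lst[1:])

def killoff(old_items, lst):
    """Remove each element of old_items from lst, one remove per step."""
    if not old_items:
        return lst
    return killoff(old_items[1:], remove(old_items[0], lst))

print(killoff(["@son", "@voice"], ["@son", "@voice", "@nas"]))
# -> ['@nas']
```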

Putting it all together

To test-run the new module nothing more is needed than to load the phonemicizer procedures alongside those of the segment filter. With both sets of procedures in your workspace you can remove the redundant features from a randomly generated fully specified segment by passing the output of possible.segment to phonemicize, as its first input, with :conditions as input 2:

phonemicize possible.segment :conditions

To get anything like sensible results from the system you will obviously need to work on the contents of :conditions to make it reflect the constraints of some particular language. If you do this, you will realise that there is considerable duplication of activity between the linguistic facts represented on the one hand by the filters (which will, let us suppose, disallow [@nas !voice]) and on the other by the conditions (which, in the same circumstances, will include [[@nas] [@voice]]). Saying the same thing twice, even if expressing it differently, is not a good idea. The segment building system, especially if made to distinguish the insertion of distinctive features from the insertion of non-distinctive features, is from this point of view arguably preferable.

Ron Brasington
Department of Linguistic Science
The University of Reading