
This file contains the following subfiles:


10 - common uncertainties
11 - classes and templates
11.1 - Why the definitions are unstable
11.25 - words to templates and vice versa: why?
11.27 - analyzing for essence
11.5 - automatically associate subsets
11.55 - musical motives and managing subgroupings of data


(subfile 10: dealing with uncertainties)

When dealing with language it is essential to have an organized way of dealing with such problems as approximation, unknown elements of a definition, elements which comprise a range of values, and definitions which specify classes of other definitions. In this project's model these four ideas are represented in similar ways. Imagine a printed set of star-light spectra, and that we want to find a subset of those spectra with certain characteristics: e.g., values must be present for some frequencies, and those values must fall in a certain range. It would be possible to construct a piece of cardboard – a template – with spans cut in certain places defined by these characteristics. By holding the template over the individual spectra, we could see which ones correspond to the set of characteristics we defined. The size of the hole cut for a given frequency would be one parameter associated with that frequency. Also imagine that the templates’ spans come with conditions, so that one can specify things like “if at 750 angstroms the value is between x and y, then ignore the value you see through that span at 800 angstroms.” It is the expansion of information at any given axis number from a single value to a range and a probability function that expands the associated meaning of the object to represent the uncertainties mentioned above.
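The cardboard-template picture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Span`, `matches` and `ignore_if` are hypothetical, the numeric values are made up, and the probability function mentioned above is left out, so each span is a plain range.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Spectrum = Dict[int, float]  # wavelength in angstroms -> measured value


@dataclass
class Span:
    """One hole in the cardboard: an allowed range of values at one wavelength."""
    low: float
    high: float
    # Optional condition: when it returns True for a spectrum, this span is ignored.
    ignore_if: Optional[Callable[[Spectrum], bool]] = None


def matches(template: Dict[int, Span], spectrum: Spectrum) -> bool:
    """Return True if the spectrum shows through every active span."""
    for wavelength, span in template.items():
        if span.ignore_if is not None and span.ignore_if(spectrum):
            continue  # the condition switches this span off
        value = spectrum.get(wavelength)
        if value is None or not (span.low <= value <= span.high):
            return False
    return True


# "If at 750 angstroms the value is between x and y, ignore the span at 800."
x, y = 0.2, 0.4  # made-up bounds
template = {
    750: Span(0.0, 1.0),
    800: Span(0.5, 0.9, ignore_if=lambda s: x <= s.get(750, float("nan")) <= y),
}

spectra = {
    "A": {750: 0.3, 800: 0.1},  # 800 is out of range, but ignored: 750 lies in [x, y]
    "B": {750: 0.6, 800: 0.1},  # 800 is out of range and the condition does not apply
    "C": {750: 0.6, 800: 0.7},  # everything in range
}
selected = [name for name, s in spectra.items() if matches(template, s)]
```

Holding the template over the three spectra selects A and C. Replacing the hard range test with a weighting function would be one way to bring the probability function back in.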




(subfile 11: classes)

The removal of one axis from a definition often leaves behind the definition of a class-word. An apple is a plant-product that's red, sugary and edible. Take away “red”, and what you're left with looks a lot like “fruit”. Fuzzy-storage methods provide the mechanism for finding definitions close together in MS, so that these class-word definitions, made by dismembering other definitions, can always be found (see "Fuzzy data, fuzzy logic" main file, page 22).
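As a rough illustration of the apple example, the sketch below drops one axis from a definition and looks up the nearest remaining definition. The coordinates, the axis names and the Euclidean metric are all assumptions made for the example; the fuzzy-storage method of the main file is not reproduced here.

```python
import math

# Hypothetical definitions: axis name -> coordinate. Missing axes are unspecified.
definitions = {
    "apple": {"plant_product": 1.0, "red": 1.0, "sugary": 0.8, "edible": 1.0},
    "fruit": {"plant_product": 1.0, "sugary": 0.7, "edible": 1.0},
    "brick": {"red": 0.9, "edible": 0.0},
}


def distance(a, b):
    """Euclidean distance over the union of axes; an unspecified axis counts as 0."""
    axes = set(a) | set(b)
    return math.sqrt(sum((a.get(x, 0.0) - b.get(x, 0.0)) ** 2 for x in axes))


def nearest(probe, exclude=()):
    """Return the stored word whose definition lies closest to the probe."""
    candidates = {w: d for w, d in definitions.items() if w not in exclude}
    return min(candidates, key=lambda w: distance(probe, candidates[w]))


# Dismember "apple" by removing the "red" axis, then see what lies nearby.
without_red = {axis: v for axis, v in definitions["apple"].items() if axis != "red"}
class_word = nearest(without_red, exclude=("apple",))
```

With these made-up coordinates the dismembered definition lands next to "fruit".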



(Subfile 11.1: Why the definitions are unstable)

The various reasons for the instability of definitions will be described in detail later, after some relevant and necessary constructions are presented. In brief:

1)
In the training of an entity such as this program, at any given time there are unknown amounts of as-yet-untaught information. Additionally, there are extensive means by which the program processes and changes its own database - each of these also takes time, and there is no way to know how much relevant processing may have been completed when some current interaction requires accessing a particular node. Any calculation made today may be obviated by additional information or analysis made available tomorrow.

2)
All information-storage in this paradigm is "fuzzy" (see p. 21) - that is, while definitions appear to specify a point, in fact all the routines in this program treat such specifications as the central cores of objects that are extensive in MS. Accessing an object might therefore return slightly different coordinates either at random or according to other influences affecting the probe. Not only are objects (and probes) fuzzy, but different aspects of objects are activated by context; therefore two calculations that may appear to involve the same object can lead to different results, depending on this activation (see "context-controlled axis activation", p.5).

3)
Definitions of objects, definition complements, summed objects, etc., all consist of collections of coordinates and relations between them. The axes that create the coordinate space are sometimes only marginally connected to any dependable reality, and there are limited means for optimizing them. Primarily, the axes, and the metrics that allow values to be assigned along them, merely provide one of an undoubted infinity of systems for labeling things that are perceived by the computer (see "MS definitions: they're not real!" below, p. 34).





(subfile 11.25: WHY words to templates and vice versa?)


Frames and scripts are venerable AI concepts having to do with the organization of everyday knowledge (see below, subfile 39.6). A learning program should be able to construct and progressively refine both structures. Script learning is comparatively straightforward, since it involves series of behaviors; every branch of the short-term-memory in this program consists of statements and actions, and the series of actions are automatically stored and evaluated both by statistical usage and by reinforcement (such storage and evaluation are parts of the simplest Purr-Puss formulations, as described later).

Frames, however, consist of  associations required for the sensible consideration of a particular 'subject' – these and their default values need to be built up from both experience and intrinsic relationships among definitions. The experiential learning is similar to script learning, and depends on the appearance and use of objects in conversations with Teacher. Some relations among definitions, and some questions that can elucidate these relations, however, are discoverable by the program on its own (see also 'deduction' in subfile #39.7).

An essential part of "apple-ness" is "crunchy". Let's imagine that the texture-axis has received a value, but that little else has as yet been discovered about apples beyond that they are liked – that is, good to eat. (For example, one might see an object on the table, and one's horse might come along and eat it, making a sharp, sudden sound incompatible with "soft & mushy".) A routine concerned with newish words might well loop through axes that possess values, so we can easily imagine a situation in which a comparatively un-defined word would be in one register while the texture axis would be in another. The simplest of question-generators would merely have to proceed from 'word' to 'template', along the texture axis, to know that it should ask "Do you also like liquid apples?" An affirmative answer always indicates reinforcement, and so the motion from apple to apple-juice would be a candidate for membership in the frame of an apple. Since "apple" is a small point in the region of MS known as "fruit" the same motion would (more weakly) be a candidate for membership in the frame for fruit. Later this frame element could independently postulate the existence of any fruit having an associated juice.

The series of events just described requires that a question be put to Teacher. More important is the establishment of frame elements without requiring the participation of any entity outside the program.

Frame terminals are those sub-contexts that are known to be associated with the central idea of the frame; for example, the function-frame for “chair” includes a back (if you’re talking about sitting in a chair, you must be able to lean back against something, or else it’s a bench). It is essential, for a program that purports "to learn", to be able to construct frame terminals on its own, and the opening and closing of spans on axes can be a central part of this.

Suppose “running” has been discussed, but the frame for “walking” is as yet incomplete. The most basic PUSS would learn to recall words that came after “running” in Teacher's sentences - and in this example “running” is taken to be adjectival. (This means it is actually located in a different region of MS than is the present participle of the verb.) There would be “running water” and “running shoes” and not too much more. “Running water” would not match any aspects of the current context if pedal locomotion were the subject, so “running shoes” would appear near the top of the heap of relevant associations.



Expanding the “speed” axis in “running” would point to “walking”, among other things (perhaps “jogging” and “sprinting”). This is clearly a short enough list that each element could be considered - this means that “walking” would at some point come to the top of the association stack. Then, having a strong association to the words “walking” and “shoes”, one is in a position to add “shoes” to the “walking” frame. An addition like this is more like a postulate until repetition confirms that it is an appropriate part of the frame.
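Opening the span on a single axis can be sketched as follows. The coordinates, the `tolerance` parameter and the function name `expand_axis` are hypothetical; the point is only that the probe stops constraining one axis while still constraining the others.

```python
# Hypothetical definitions: axis -> coordinate in [0, 1].
definitions = {
    "running":   {"pedal_locomotion": 1.0, "speed": 0.7},
    "jogging":   {"pedal_locomotion": 1.0, "speed": 0.5},
    "walking":   {"pedal_locomotion": 1.0, "speed": 0.3},
    "sprinting": {"pedal_locomotion": 1.0, "speed": 0.95},
    "swimming":  {"pedal_locomotion": 0.0, "speed": 0.3},
}


def expand_axis(word, axis, tolerance=0.05):
    """Open the span on one axis completely; keep every other axis within tolerance."""
    probe = definitions[word]
    fixed_axes = [a for a in probe if a != axis]
    found = []
    for other, coords in definitions.items():
        if other == word:
            continue
        if all(a in coords and abs(coords[a] - probe[a]) <= tolerance
               for a in fixed_axes):
            found.append(other)
    return found


neighbors = expand_axis("running", "speed")
```

Here the expanded probe returns "jogging", "walking" and "sprinting" but not "swimming", which differs on an axis that was left closed.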

Three mechanisms exist for removing unconfirmed frame elements. First, all additions to data structures like frames trigger the dark-cycle crawler that constructs questions for Teacher (see main file, p. 22). This is no complex matter: in the case being considered, the crawler simply adds to a stack of questions needing answers, that here might say “are shoes relevant to walking?” and subsequently “what default value(s) should there be?”

Second, reinforcement drives many processes, and if, in conversation, Teacher refers to shoes and walking in close proximity, then the "shoes" terminal in the "walking" frame will be reinforced. This is no complicated matter, either; all frames associated with active words are present in the buffer (see main file, p. 25) as soon as they appear, and Teacher's references are so important that they immediately become addresses for content-probes. This would lead to the "walking" frame without any search whatever. In this case, a frame element that is "not walking" would not immediately be removed because of this reinforcement of "walking", but would sink lower into the bin of relevant objects; each such sinking reduces the lifetime of the object in the environment (see "Big Buffer", p. 25).
 
Third, all the terminals in a frame are stored (like everything else around here) as PUSS-records, which already include a feature that simply counts the number of times a particular association is made or accessed. After a large amount of teaching has occurred, the counters for each postulated terminal could be examined, and those whose appearances are too few could be suppressed.
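The third mechanism amounts to a threshold on access counters. A minimal sketch, assuming a hypothetical `Frame` class and an arbitrary threshold (the actual PUSS-record layout is not shown in this file):

```python
class Frame:
    """Toy frame: each terminal carries a counter of how often it was confirmed."""

    def __init__(self, subject):
        self.subject = subject
        self.counts = {}  # terminal -> number of times the association was made

    def postulate(self, terminal):
        # A postulated terminal starts with no confirmations.
        self.counts.setdefault(terminal, 0)

    def reinforce(self, terminal):
        self.counts[terminal] = self.counts.get(terminal, 0) + 1

    def prune(self, min_count):
        """Suppress terminals whose appearances are too few; return what was removed."""
        suppressed = [t for t, n in self.counts.items() if n < min_count]
        for t in suppressed:
            del self.counts[t]
        return suppressed


walking = Frame("walking")
walking.postulate("shoes")
walking.postulate("water")
for _ in range(3):
    walking.reinforce("shoes")  # Teacher mentions shoes and walking together
suppressed = walking.prune(min_count=2)
```

After pruning, "shoes" remains in the frame and the never-confirmed "water" is suppressed.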

(subfile 11.27: analyzing for essence)

Suppose a number of defined words are seen to be isolated in a region of MS, with regions of relative emptiness all around. If all of the words consist of fully expanded definitions (that is, ones in which there appear only axes and pointers, no sub-words) then it may be useful to find out which word best defines the entire group, which words divide the group into classes, what the characteristics of those classes are, and what makes each word a distinct individual. All this is accomplished with a dark-cycle tabulation of subset appearances.

1) best group definition

It is perfectly reasonable to construct all the subsets of axes present in a set of words. By "reasonable" I mean practical from a computational point of view: it doesn't take too long. (It is also reasonable to imagine the neurological analog: many (all?) subsets of sense-inputs must cause different cortical states to arise.)
Different subsets occur in different numbers of words, allowing the simple summing up of a score for each subset. Call the largest subset that occurs in all the words of the group Smax. Then, of the words containing Smax, that word with the fewest additional axis values is likely to be the best class-word for the group.
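The tabulation can be sketched directly. In this example a definition is reduced to the set of axes it uses (axis values are ignored), and the three words and their axes are invented for illustration.

```python
from itertools import combinations

# Hypothetical fully expanded definitions, reduced to the axes they use.
words = {
    "fruit":  {"plant_product", "sugary", "edible"},
    "apple":  {"plant_product", "sugary", "edible", "red", "crunchy"},
    "banana": {"plant_product", "sugary", "edible", "yellow"},
}


def subset_scores(words):
    """Score every non-empty subset of axes by the number of words containing it."""
    scores = {}
    all_axes = sorted(set().union(*words.values()))
    for size in range(1, len(all_axes) + 1):
        for subset in combinations(all_axes, size):
            s = frozenset(subset)
            n = sum(1 for axes in words.values() if s <= axes)
            if n:
                scores[s] = n
    return scores


scores = subset_scores(words)

# Smax: the largest subset that occurs in every word of the group.
in_all = [s for s, n in scores.items() if n == len(words)]
s_max = max(in_all, key=len)

# Best class-word: the word with the fewest axes beyond Smax.
best = min(words, key=lambda w: len(words[w] - s_max))
```

Enumerating all subsets is exponential in the number of axes, so this is practical only for small groups like the isolated clusters described above. Smax itself is just the intersection of the axis sets and could be computed directly; the full score table is kept because the class divisions discussed next depend on it.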

2) division of the group into classes

Examination of the ranked list of scored subsets would not in itself necessarily provide a list of words, since a subset might not constitute a word and vice versa. Subsets by definition divide MS up into regions that overlap according to the amount of common axis information in the subsets. There is no obvious and simple way, on the basis of axis content alone, to decide which subsets constitute more meaningful or useful partitions of the space.

Fortunately, the combination of subsets with extant (English) words can serve that purpose. We assert that human language must have evolved a sensible subdivision of meanings, and use the presence or absence of a word to score a subset that overlaps the word's location. This is of no help in non-linguistic realms, of course.
In those cases one must make some assumption about scoring. For example, one could assign higher scores to groups of subsets on the basis of isolation (the less overlap, the higher the score). Assumptions such as these would be among the program parameters to be influenced by reinforcement.

3) characteristics of the classes and individuation

Once a region has been divided, one has by definition differentiable sets of axis-content, and that axis content IS "the characteristics of the classes". Likewise, each actual word's definition, minus the axis content of the class in which it is found, constitutes the properties that make that word an individual among classmates.

Such a dark-cycle analysis of subsets can be running continuously, setting up scored boundaries within MS. In this case, a boundary would receive a higher score the greater were the distances to word-clusters on the sides of the boundary. This is conceptually trivial: we wish to know what regions of the space are incompatible with other regions. The extent to which a possible region is separated from its neighbors is a purely numerical matter. Once a region is found, or once it is decided to consider an area to be a candidate for "region-hood", then its compatibility with non-contiguous regions can be analyzed, just as was its region-hood, by a tabulation of bombs and other associations originating in the region.


(subfile 11.5: learning word fragments, and what is a "syllable" in a melody, an action by an arm, or a path?)


The syllable function will be performed by a dark-cycle crawler (which see: main file, p. 22). The goal of that procedure is to examine the whole vocabulary of defined words, in order to determine what syllables can be extracted that have related meanings. Initially we identify word-fragments that have similar spelling or sound. Any words containing the fragment are treated as a 'cluster' (see main file p.27):


    1 - create a group of words with some suspected internal commonalities: such a commonality is
        suspected - initially - if there is a similarly spelt fragment or if there is a sound with
        a similar subset of I.P.A. designators

    2 - the coordinate sets comprising the words' definitions provide a list of axes

    3 - each axis receives a score
        a - proportional to the frequency of its appearance in the whole group, and
        b - inversely proportional to the MS distance between the coordinates in which it appears.
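Step 3 can be sketched as below. The exact scoring formula is not given in the text, so the form `frequency / (1 + spread)` is an assumption chosen to be proportional to (a) and to fall as the distance in (b) grows; the cluster and its coordinates are invented.

```python
from itertools import combinations

# Hypothetical cluster of words sharing the spelling "quant": word -> {axis: coordinate}
cluster = {
    "quantity": {"amount": 0.9, "number": 0.8},
    "quantify": {"amount": 0.8, "number": 0.7, "action": 0.9},
    "quantize": {"amount": 0.85, "action": 0.8, "discrete": 0.9},
    "Quantz":   {"person": 1.0, "music": 0.9},
}


def axis_scores(cluster):
    """Score each axis by how often it appears and how tightly its values agree."""
    scores = {}
    axes = set().union(*cluster.values())
    for axis in axes:
        values = [d[axis] for d in cluster.values() if axis in d]
        frequency = len(values) / len(cluster)
        pairs = list(combinations(values, 2))
        # Mean pairwise distance along this axis (0 if the axis appears only once).
        spread = sum(abs(a - b) for a, b in pairs) / len(pairs) if pairs else 0.0
        scores[axis] = frequency / (1.0 + spread)  # assumed combination of (a) and (b)
    return scores


scores = axis_scores(cluster)
ranked = sorted(scores, key=scores.get, reverse=True)
```

The "amount" axis, shared closely by three of the four words, comes out on top, while the axes contributed only by "Quantz" stay at the bottom of the ranking.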

If some of the words in the group contain an identifiable fragment with a common meaning, then the axes associated with the common fragment will have created peaks in the distribution. (This is a somewhat simplified description; because of the difficulty of determining appropriate thresholds for the peaks, the quantity actually used is the first derivative of the change in score as the end of the examination of the group is approached.) At that point the fragments can be stored away as if they were words; they are treated like suffixes, prefixes, and conjugated verbs.

An example of such a fragment would be "quant" from the word "quantity". All the words like "quantify", "quantize", etc., would include axis subsets that would show up as peaks when this function is used. The vocabulary might include "Quantz" (the 18th century composer) or "Quantum" (the model of car), but these words would share no significant axes with the others, and therefore would not contribute to the statistics that would define the fragment (which is, in this realm, a syllable).

Likewise, syllables that can have two separate meanings will form two peaks, and will be separated and dealt with just as homonyms. An example would be "duct", whose usages in "ductile" and "conduct" would have no overlap in axes.



Finally, it is necessary to avoid inappropriate assignment of meaning to subgroups that have so many meanings that they should never be considered alone. Consider "con". A large number of words contain this spelling and this sound, and so this collection of letters would cause the function to be initiated. The function would create a number of peaks, some of which would be inefficient as members of a working vocabulary. "Con" has rather different meanings when it is followed by "t", "sent", "v", etc. "Co" would be even worse. The difference between these "problem syllables" and ones that are useful is extremely simple: multiple peaks in the axis histogram created by a proposed syllable always imply that the selection made is too short. Additional letters need to be added and the procedure repeated. Eventually so many letters will be added that all peaks will disappear. (The entire collection "conversion", for example, would create no peaks: it would appear only in one word, and would not be considered as a candidate. Note the similarity to window resolution issues in Purr-Puss.)
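The lengthening rule can be sketched as a small recursion. Several simplifications are assumed here: a "peak" is approximated as a group of two or more words that share at least one axis (rather than a peak in a scored histogram), fragments are lengthened only to the right, and the vocabulary and axis names are invented.

```python
# Hypothetical vocabulary: word -> set of axes in its definition.
vocabulary = {
    "contain":  {"hold", "inside"},
    "content":  {"hold", "inside", "satisfied"},
    "convert":  {"change", "turn"},
    "converse": {"change", "turn", "talk"},
    "console":  {"comfort"},
}


def peaks(fragment):
    """Group words containing the fragment by shared axes; 2+ words make one peak."""
    groups = []  # list of (words, axes); axis sets of different groups stay disjoint
    for word, axes in vocabulary.items():
        if fragment not in word:
            continue
        merged_words, merged_axes, rest = [word], set(axes), []
        for g_words, g_axes in groups:
            if g_axes & merged_axes:
                merged_words += g_words
                merged_axes |= g_axes
            else:
                rest.append((g_words, g_axes))
        groups = rest + [(merged_words, merged_axes)]
    return [sorted(w) for w, _ in groups if len(w) >= 2]


def refine(fragment):
    """Lengthen the fragment one letter at a time while it yields several peaks."""
    found = peaks(fragment)
    if len(found) <= 1:
        return {fragment: found[0]} if found else {}
    extensions = set()
    for word in vocabulary:
        i = word.find(fragment)
        if i != -1 and i + len(fragment) < len(word):
            extensions.add(word[i:i + len(fragment) + 1])
    result = {}
    for longer in sorted(extensions):
        result.update(refine(longer))
    return result


fragments = refine("con")
```

With this toy vocabulary "con" produces two peaks and is therefore rejected; the procedure settles on "cont" and "conv", and "cons" drops out because it appears in only one word.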

Successfully established fragments have a form exactly like all other objects, and therefore they can be extracted into templates like any other word. This provides a sort of class-word for morphemes. Combining templates of word fragments will be a principal means of elucidating (guessing at?) the meaning of unknown words, and of postulating the existence of words not yet entered into the dictionary.

Different bodies of information about words are expected to be available depending on the initial clustering function. Fragments identified by spelling will not always coincide with fragments identified by sound. Relationships such as those between "clean" and "kleenex", "focused" and "focussed", or "Pittsburgh" and "Allenberg" would be missed if syllables are only defined by spelling.

We are quite ready to accept the possibility that word fragments thus established might be "real", useful and consistent parts of the language. These fragments can be found by a brute-force comparison of vocabulary elements as well. That is, instead of seeking an initial group of words that is suspected to contain meaningful subunits, we simply set the comparator going and accept any group that is extracted as a candidate for containing useful sub-items for isolation. Thus, for example, similarities related to the structural elements of words would come out (see 'deftrans', main file p. 14).

Since linguistic words look the same as the objects in the musical, navigational and manipulator realms, the program will attempt to find parallel relationships among non-linguistic objects as well. Suppose, for example, that the robot arm has learned to do lots of things in the blocks world. Each task would have presented series of actions that could be used (interpreted) the same way that series of letters are. Then certain subsets of actions would emerge as being consistently used, just like syllables. Grasping a block, for instance, would undoubtedly be followed by such a small set of actions that each such pair would be found by the function described above. We could expect "grasp-lift", "grasp-rotate.left", and "grasp-compress" to emerge as useful "words"; in fact, "grasp" might never be used alone after considerable learning has taken place (except perhaps in sequences of actions used to debug ‘grasping’).


(subfile 11.55: musical motives and managing subgroupings of data)



The idea of a motive in composition covers almost any organizational element created for a particular piece. Music theorists do not usually use the word for elements that are common to all music in the style (such as 'major triad' in common-practice-period Western music), but this program does not make that distinction. Any perceptible process or relation will be learned (if it is repeated).  Some separate mechanism would have to be active if the learned elements are to be sorted into subgroups, such as "motives appropriate for a morning raga" or "common harmonic progressions in Bach". Without such subgrouping, the program, when asked to "create", would combine styles without prejudice.

These subgroupings in music are easy to understand; linguistic analogs include "the current subject" but also "the people present". It is another delightful convenience of Purr-Puss that these subgroupings can be separated from one another in a trivially simple way - and in a way that is closely related to the core function of Purr-Puss.

Without involving any other code, one separates such regions from one another merely by adding one window element to the feature vectors used in the Puss storage method. (At this point readers not familiar with the workings of learning algorithms using content-addressable memory should read the section on OTHELLO, p.42, main file.) For example, suppose the program is learning to move blocks. A large number of actions might work perfectly well when all the objects present are dry. These actions would be governed by feature vectors such as

    goal:          "no red blocks"
    action:        remove (block with property: red)
    end condition: (find(block with property red) fails).

The action would be a composite daemon, whose subDaemons would include "grasp" and "lift".

This would be found to fail under the circumstance "oily". Grasping wouldn't work, and instead, maybe scooping the block up would be an available option. If this succeeded, then the feature vector for the task could have added to it a channel for dryness, and if it had the value "oily", then a different series of behaviors would have to be learned. This technique has a lot of problems not addressed here.

When this program uses separation of data regions, it requires an additional storage cycle for each item. Likewise, the separated data is available both as part of the whole and as a subgroup. For example, if a game is being played at an unrestricted level, and a move has been recovered from the unseparated part of MS, then with one additional probe of memory, the program can discover that some move would be the best one available at a specified restricted level.
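The extra storage cycle and the extra probe can be sketched with a toy content-addressable store. The `Memory` class, its method names and the string features are hypothetical stand-ins; the real Puss window is described in the main file.

```python
class Memory:
    """Toy content-addressable store keyed on the window (a tuple of features)."""

    def __init__(self):
        self.records = {}

    def store(self, window, action, tag=None):
        # First storage cycle: the item becomes part of the unseparated whole.
        self.records.setdefault(window, []).append(action)
        # Additional storage cycle: the same item with one more window element.
        if tag is not None:
            self.records.setdefault(window + (tag,), []).append(action)

    def probe(self, window, tag=None):
        key = window if tag is None else window + (tag,)
        return self.records.get(key, [])


memory = Memory()
goal = ("goal: no red blocks",)
memory.store(goal, "grasp-lift", tag="dry")
memory.store(goal, "scoop", tag="oily")

everything = memory.probe(goal)             # the whole, unseparated region
when_oily = memory.probe(goal, tag="oily")  # one further probe, restricted region
```

The untagged probe sees both learned actions, while the probe carrying the "oily" element sees only the one that worked under that circumstance.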


The following examples are equivalent with respect to establishing boundaries between subgroupings of data:

    1 - conversation participants
    2 - current subject
    3 - whose opinion is currently in play
    4 - which branch of an argument is being considered
    5 - what musical style is involved, or what mode, or which raga, etc.
    6 - what level of game play is being used
    7 - which team is "on offense"
    8 - what "tense" is current (past, future, hypothetical, wishful, present-sarcastic, etc.)

One way to clarify this equivalence is to characterize the sorts of data regions that are being separated.

    1 - Clearly, in normal conversation, one knows to whom one is speaking. This knowledge is critical: I have an acquaintance who, as an infant, spoke English with Dad, French with the Nanny, and German with Mom. She was unaware she was tri-lingual: the different languages were just "how you talk to <this person>".

   

    Another aspect of "keeping track" of conversations involves "who has been informed of which facts".

    As with all of the examples that follow, both of these types of regions can be separated by adding single identifiers to all the feature vectors. If one is talking to a person with whom no previous conversation has taken place, all sorts of conversational tacks are available that become irrelevant with partners who are very familiar. If "the person present" can be part of the feature vector (the Puss window) that is involved in making a decision about where a conversation should go, then the decisions made (which are based on the points in MS accessed) can be sensibly limited, for example, to things not yet discussed with this person.

    2 - Suppose we're discussing skateboards. Different facts are useful depending on whether one is considering 'safety' or 'fun'. If the current conversation involves one or the other of these, then a single coordinate can be added to the windows to prevent irrelevant matters from being brought to the fore. Whole different regions of MS - that is, whole different sets of associations - must be enabled or inhibited by whatever the current context includes.

    On the other hand, even a conversation concerning the 'fun' available with a skateboard might occasionally turn to issues of safety; re-enabling those associations is accomplished by removing the limiting window element and probing the memory anew.

    3, 4, 5 - In normal conversation we commonly evaluate different opinions in turn. Reasoning appropriate for one side of a debate, or logic of a sort used by one particular person, might be out of place for others. In order to follow a single "train of thought" without contamination, for example, by facts unknown to that side of the debate, it is once again necessary to limit the availability of portions of MS. The limiting could be accomplished by inserting an identifier into feature vectors as long as the limiting is required.

    This procedure can be taken to extremes: if every feature vector ever used includes some arbitrary marker, then an entirely separate 'learning entity' can be running using the same code and memory, without interference with the original program. (An interesting side effect of doing this would be the ability of the entities to enjoy any degree of common access to knowledge desired.) This is the process for using a single program image and memory block to learn to compose melodies in different styles. One batch of learning, accomplished with a training set from one style, can be kept entirely separate from other styles learned, or, the output routines could be allowed access to both batches of learning, producing a mixture of styles.

    This implies a functional analogy between musical style and personal opinion; not surprisingly, since the choices of any given composer must of necessity be based on that composer's 'personal opinion' about "what will sound best".

    6 - Many computer programs for executing game strategies are endowed with an option to limit the level of advancement the program can employ (sometimes this limitation is desirable to reduce the amount of time required by the computer to make its next move). Sometimes these limitations are a matter of limiting the depth of exhaustive search, but if the adjustment is more heuristic, then the limitation of ability amounts to a limitation on the regions of MS available to the decision-making routines.

    7 - See SOCCER, main file, p. 47, in which the effective bit-width of the window is doubled by the use of a single bit whose value reflects 'possession'. In this case, the value of the possession bit not only determines which half of MS is available, but it also allows the meaning of input channels to be toggled whenever possession changes. This would be rather the same as separating the two halves of memory needed for chess openings. One could allow the function of a program to phase from opening-moves into middle-game moves by allowing successively more access to the other half of memory as the two sides (black and white) become more equivalent.


    8 - We all have modes of discourse that are determined either by social considerations (how polite one should be at some moment) or by what might be called literary or theatrical "stance". A sarcastic reply, called for when some disagreement is sensed, would call for a different set of linguistic paths than one required when comforting a child. Of course, merely having the equipment to express different stances doesn't tell one when to use them - this is an entirely more difficult question.