
This file contains the following subfiles:


38 - use of frequency in data structures
39 - changing the subject/cancelling a context
39.3 - tendencies of word-objects --> parts of speech
39.6 - frames
39.7 - subset-types and vocabulary extraction

40 - part-of-speech templates




(subfile 38: frequency, amplitude, and damping)

The frequency of vibration is a rather direct matter: imagine looking at two paths with similar constructions, and in one, a word is vibrating at a high frequency, and in another, it vibrates only rarely. The one vibrating rapidly spends more time at the opposite pole of its vibration than the slow one, making its vibrational partner more of a candidate for perception by, or use in, another part of the program. The occurrence of a related word, or of an axis present in both words, might be a signal to increase the frequency or otherwise change the energy of an element’s vibration. (These processes are expected to be directed, sorted, and evaluated automatically via the reinforcement mechanisms described in the main text.)


Amplitude has an equally direct analog in this program. A cloud surrounding a word extends further into MS if its amplitude is higher - providing another numerical analog for degree of generalization. A number of structures in the program incorporate the general premise that any point in MS essentially defines the center of a larger region. Axes with higher intrinsic mass would be stimulated to less vibration by the same input; for example, suppose we are discussing my cup of coffee. The density of water is pretty stable: that axis would have a high mass, implying first less vibration and then fewer conversational options. And in fact we seldom discuss the density of our coffee. We are much more prone to discuss aspects of it that are more changeable, like temperature or concentration. Thus knowing that water's temperature is less "massive" leads to easier, faster, and wider vibration, which leads to more conversational activity. It is essential to have such easily calculated ways of knowing what's best to discuss.
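
The mass idea can be sketched numerically. This is only an illustration: the rule that response equals stimulus divided by mass is an assumption (the text says only that higher mass means less vibration), and the function names and mass values are hypothetical.

```python
# Hypothetical sketch: axes with higher "mass" respond less to the same input.
# The inverse-proportional rule (stimulus / mass) is an assumption for illustration.

def vibration_amplitude(stimulus: float, mass: float) -> float:
    """Amplitude an axis reaches for a given stimulus (assumed: stimulus / mass)."""
    return stimulus / mass


def rank_topics(axis_mass: dict[str, float], stimulus: float = 1.0) -> list[str]:
    """Order axes from most to least responsive, i.e. most to least discussable."""
    return sorted(
        axis_mass,
        key=lambda axis: vibration_amplitude(stimulus, axis_mass[axis]),
        reverse=True,
    )


# Invented masses for the cup-of-coffee example: density is stable (massive),
# temperature and concentration are changeable (light).
coffee_axes = {"density": 50.0, "temperature": 2.0, "concentration": 4.0}
ranking = rank_topics(coffee_axes)
```

With these invented masses, temperature ranks ahead of concentration, and density comes last.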

Most vibrations in nature are damped, and slowly decay to a degenerate state. Some oscillations used by this program will be part of the current context only, rather than a permanent learned part of the word’s definition (these impermanent vibrations exist in a quasi-temporary structure called the BigBuffer, and are not stored in the dictionary with the word). These context-dependent vibrations decay as they age, losing frequency until they drop out of the buffer.
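
A minimal sketch of such a buffer follows. Only the name BigBuffer comes from the text; the multiplicative decay, the drop-out threshold, and every method name are assumptions chosen for illustration.

```python
# Hypothetical sketch of context-dependent vibrations that decay with age.
# The decay factor and the drop-out threshold are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Vibration:
    word: str
    frequency: float


@dataclass
class BigBuffer:
    decay: float = 0.5       # assumed fraction of frequency kept per aging step
    threshold: float = 0.1   # assumed frequency below which an entry drops out
    entries: list[Vibration] = field(default_factory=list)

    def add(self, word: str, frequency: float) -> None:
        self.entries.append(Vibration(word, frequency))

    def age(self) -> None:
        """Damp every vibration, then drop those that fell below the threshold."""
        for vibration in self.entries:
            vibration.frequency *= self.decay
        self.entries = [v for v in self.entries if v.frequency >= self.threshold]

    def cancel_context(self) -> None:
        """Quash all context-driven oscillation at once (a change of subject)."""
        self.entries.clear()


buffer = BigBuffer()
buffer.add("coffee", 1.0)
buffer.add("cup", 0.3)
buffer.age()
buffer.age()
remaining = [v.word for v in buffer.entries]
```

After two aging steps the weaker vibration has fallen below the threshold and left the buffer, while the stronger one persists at reduced frequency.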

This provides an imperfect method of ensuring that context can evolve. Unfortunately, there are many times in thought and conversation when contexts are entirely canceled, and I do not yet know how to teach the program to recognize automatically when this happens. Perhaps it will be shown to be similar to the cadential calculations (add link) that might signal when internal conversation should "go public".







(subfile 39: cancellation of a context)

For example, a new participant in the conversation could arrive, and could introduce a new subject. The old one might never return, and in such a case, all this context-driven oscillation should be suddenly quashed. Perhaps some signal will become clear that can stimulate the program to ask “have you changed the subject?” Quite a good deal of computation would have to be thrown out, or at least allowed to subside into the background.


(Subfile 39.3: Vibration, resonance, association, and parts-of-speech)


Consider the resonances, vibrational partners, bombs, and Purr-Puss associations for nouns. These will all be quite different from the resonances, bombs, etc., for verbs. Such consistent groupings of qualities can be characterized numerically, summarized, reduced to templates, and so on. Three types of object are easily calculated:

    1) a word-type object that holds elements that are common to all nouns
    2) one that holds elements that differ between nouns and verbs
    3) one that expresses this difference functionally - that is, the transform between them.
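
As a rough illustration of these three objects, suppose each word is represented as a dictionary of axis-value pairs. The representation, the axis names, and the function names below are all hypothetical; they show one way the three calculations might look, not the program's actual method.

```python
# Hypothetical sketch: words as dictionaries of axis -> value.

def common_elements(words):
    """Object 1: axis-value pairs shared by every word in the group."""
    shared = set(words[0].items())
    for word in words[1:]:
        shared &= set(word.items())
    return dict(shared)


def differing_axes(a, b):
    """Object 2: axes on which two summary objects disagree."""
    return {axis for axis in a.keys() | b.keys() if a.get(axis) != b.get(axis)}


def transform(a, b):
    """Object 3: per-axis (from, to) pairs that carry object a onto object b."""
    return {axis: (a.get(axis), b.get(axis)) for axis in differing_axes(a, b)}


# Invented axes and values, for illustration only.
nouns = [
    {"takes-article": "yes", "has-tense": "no", "animate": "yes"},
    {"takes-article": "yes", "has-tense": "no", "animate": "no"},
]
verbs = [
    {"takes-article": "no", "has-tense": "yes", "transitive": "yes"},
    {"takes-article": "no", "has-tense": "yes", "transitive": "no"},
]

noun_template = common_elements(nouns)
verb_template = common_elements(verbs)
noun_to_verb = transform(noun_template, verb_template)
```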

Forming objects that consist of the sets of these characterizations is also straightforward. For example, "truck" would have 1) Purr-Puss associations with items that could be cargo, 2) resonances with other means of transport, 3) argument-partners (see "partnership" in subfile 12.25) with objects that specify destination and route,  and so on. An object consisting of this set of characterizations would have respective coordinate sets as diagrammed below.

There are three types of association (the left-most column), each represented in the canonical MS fashion as a "coordinate", namely, a particular axis and a value. Then there are the three bits of content to which the associations point (the right-most column), again expressed as coordinates.


      axis                        value      |  axis     value  |  axis      value
1)    technique-of-association    purr-puss  |  pointer  next   |  contents  cows
2)    technique-of-association    resonance  |  pointer  next   |  vehicle   railroad
3)    technique-of-association    partner    |  pointer  next   |  route     TBA

Restated, in 1) a truck is associated via Purr-Puss prediction with its contents, "cows". In 2) a truck is associated via resonance with (another) vehicle, a railroad. In 3) truck is associated with its own route because "truck" and "route" were both necessary (together) as arguments to some other function.
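
The three coordinate sequences for "truck" could be held in a structure like the following. The class name and the lookup are hypothetical; the axes and values are those of the truck example.

```python
# Hypothetical sketch: each association is a sequence of three coordinates,
# and each coordinate is an (axis, value) pair.
from typing import NamedTuple


class Coordinate(NamedTuple):
    axis: str
    value: str


truck_associations = [
    (Coordinate("technique-of-association", "purr-puss"),
     Coordinate("pointer", "next"),
     Coordinate("contents", "cows")),
    (Coordinate("technique-of-association", "resonance"),
     Coordinate("pointer", "next"),
     Coordinate("vehicle", "railroad")),
    (Coordinate("technique-of-association", "partner"),
     Coordinate("pointer", "next"),
     Coordinate("route", "TBA")),
]

# Index the content coordinate (right-most) by technique (left-most).
by_technique = {assoc[0].value: assoc[2] for assoc in truck_associations}
```

Because every entry is just a run of axis-value pairs, the same operations that apply to words can in principle be applied to these sequences.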

Each one of these three sequences looks exactly like a part of a word – thus each can be transformed in all the ways words can be: reduced to templates, etc.
  
The templates for these three objects are like spectra, in that the activity of the words (the words' relations to other words) is what is summarized, not the meaning or content or definition. It is the function - the activity - of words that is involved with grammatical parts-of-speech. "Drive" interacts differently depending on whether one means 'cause a car to move' or 'what you take on Sunday afternoon', even though the definitions contain many identical coordinates.

(Subfile 39.6: frames)

Frames were introduced by Marvin Minsky in the article "A Framework for Representing Knowledge." A frame is a data structure that divides knowledge into substructures by representing "stereotyped situations"; frames may be connected together to form a complete idea. The frame contains information on its use, on what might come next, and on what to do when these expectations are not satisfied. Some information in the frame is generally unchanged, while other information, stored in "terminals," usually changes. (Different frames may share the same terminals.) A frame's terminals may be filled initially with default values, a design that, according to Minsky, reflects how the human mind works. More about this program's interaction with frames appears in subfile 11.25.
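
A frame with default-filled terminals might be sketched as below. This is a simplified illustration and not Minsky's formulation; the field names, the lookup order, and the "room" example values are assumptions.

```python
# Hypothetical, simplified sketch of a frame with default-filled terminals.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    name: str
    fixed: dict[str, str] = field(default_factory=dict)      # generally unchanged
    defaults: dict[str, str] = field(default_factory=dict)   # initial terminal fillers
    terminals: dict[str, str] = field(default_factory=dict)  # fillers actually observed

    def fill(self, terminal: str, value: str) -> None:
        """Replace a default with what the current situation supplies."""
        self.terminals[terminal] = value

    def value_of(self, slot: str) -> Optional[str]:
        """Observed filler first, then the default, then fixed information."""
        if slot in self.terminals:
            return self.terminals[slot]
        if slot in self.defaults:
            return self.defaults[slot]
        return self.fixed.get(slot)


room = Frame("room", fixed={"has-walls": "yes"}, defaults={"lighting": "ceiling lamp"})
before = room.value_of("lighting")
room.fill("lighting", "window")
after = room.value_of("lighting")
```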


(Subfile 39.7: subset-types and vocabulary extraction)

Why subset-types?

Parts-of-speech are obviously important and useful, even if some of the standard groups contain elements that are quite diverse.

Music theorists need to be able to discuss functional parts of melody and harmony, and a number of subset types exist for this purpose, for example

    Ornamental and contrapuntal functions (passing tones, anticipations, appoggiature),
    Scale degree (final, tenor cadence, reciting tone, dominant), and
    Roman numeral chord analysis.

In chant, raga, and various Near Eastern melodic styles, there exist consistently used, small groups of notes; the particular allowed set of such subsets is determined according to mode or raga. These are well known to the performers and constitute a vocabulary of things to “say”, just as a rock guitarist’s riffs provide a standard group of resources.

Navigation also requires that diverse behaviors be exhibited, and that realm, like language, is one in which the same action can perform different functions in different contexts. For instance, a left rotation can be part of a search, an orientation toward a known goal, an optimal escape route, etc. The movements of a robot arm have functional subunits as well, that must be put together into useful (“grammatically correct”) sequences. Many of these short series become parts of a vocabulary, and may never need to be separated out, after they have been incorporated into some standard action.

Extraction of a vocabulary

Before a discussion of a subset-type can occur, a vocabulary must be defined; ‘parts-of-speech’ refer to ‘words’. To a computer, a text consisting of words appears as a sea of undifferentiated and ungrouped characters – there’s no a priori way for the program to know that “a” is an element that only occurs as a part of vocabulary elements, and that “ “ only occurs as a separator.  After all, “a” is just 97, and “ “ is just 32.

How are we to extract useful sets of events from such an ocean? We start with a logical assertion: if there are “events” in an “ocean” then there must be some “things” separated from other “things”. The only way to tell what “things” there are is by extended observation, with hypotheses. We agree to start with the simplest hypotheses; thus we assert that there might be single-character separators. For instance, if the previous paragraph is considered, the first possible separator is the second character, “e”. Such a separator yields a set of possible vocabulary elements to test, as follows (hypothesized elements here are enclosed within arrows, to make the existence of spaces visible):

→B←    →for←    → a discussion of a subs←    →t-typ←    → can occur, a vocabulary must b←

These proposed elements are then simply sought out, throughout all of the available text, and, whenever one is found (again) we increment its score. We can predict that the proposed vocabulary element → a discussion of a subs← will almost never recur, and will acquire a very low score.

The “e” separator would receive a score equal to the sum of scores for all the vocabulary elements it proposes. The next step is to proceed to the next possible separator, namely, “f”, and score the subset-type it would create. We know without testing that no separator will acquire a score even close to that of “space” when examining English texts. Thus a vocabulary can be extracted from a sea of events such as text, because there is a single-character separator;  such a thing is a very obvious, simple, low-level thing to look for.
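
The scoring of single-character separators can be sketched as follows. This is one possible reading of the procedure, with assumptions stated plainly: an element's score is taken to be the number of times it recurs among the pieces produced by the split, empty pieces are ignored, and the function names are hypothetical.

```python
# Hypothetical sketch of scoring single-character separators.
# Assumption: an element scores one point for each time it is found again
# among the pieces produced by splitting on the candidate separator.
from collections import Counter


def separator_score(text: str, sep: str) -> int:
    """Sum of recurrence scores of the vocabulary elements proposed by sep."""
    pieces = [piece for piece in text.split(sep) if piece]
    counts = Counter(pieces)
    return sum(n - 1 for n in counts.values())


def best_separator(text: str) -> str:
    """Try every character that occurs in the text and keep the top scorer."""
    candidates = sorted(set(text))
    return max(candidates, key=lambda c: separator_score(text, c))


sample = "ab ab ab cd cd ab"
scores = {c: separator_score(sample, c) for c in sorted(set(sample))}
winner = best_separator(sample)
```

On this toy string the space wins, because only the space yields pieces ("ab", "cd") that recur often; splitting on a letter yields fragments that rarely repeat.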

If we imagine setting up a modern desktop computer to perform this search on a sufficiently long text (maybe 10 pages?), it clearly will discover that “space” is the best separator very, very quickly. Imagine, then, a crawler that is expected to operate, say, overnight. Quite a large number of separators could be tested, including all two-character separators, as well as many in which the function of each character changes according to some simple function.
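
Such a crawler might enumerate candidates shortest first and stop when its time budget runs out. The sketch below assumes that "simplest" means "shortest", and the function names and the scoring callback are hypothetical.

```python
# Hypothetical sketch: test candidate separators simplest-first under a time budget.
import itertools
import time


def candidate_separators(alphabet: str, max_length: int):
    """Yield all 1-character separators, then all 2-character ones, and so on."""
    for length in range(1, max_length + 1):
        for combo in itertools.product(alphabet, repeat=length):
            yield "".join(combo)


def search(alphabet: str, max_length: int, evaluate, budget_seconds: float):
    """Return (best separator, number tested) when the list or the time runs out."""
    deadline = time.monotonic() + budget_seconds
    best, best_score, tested = None, float("-inf"), 0
    for sep in candidate_separators(alphabet, max_length):
        if time.monotonic() >= deadline:
            break  # "morning arrives": report the best of what was tested
        score = evaluate(sep)
        tested += 1
        if score > best_score:
            best, best_score = sep, score
    return best, tested


order = list(candidate_separators("ab", 2))
```

Whatever the budget, the candidates that were tested are always the simplest ones, so an interrupted search still has a well-defined meaning.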

Hofstadter’s problem

This problem can be discussed in terms of the paradigm described by Hofstadter, in which a series of integers is presented, presumably ordered in some way; the puzzle is to find the rules for generating the series.  For example, the following series

    1.1.0.1.1.0.1.2.1.1.3.2.1.5.4.1.8.7.1.13.12.1.21.20.

must be found to be defined as: a series of triplets, all starting with “1”, followed by a second number determined by the Fibonacci series, followed by a third number equal to the second number minus one.

    1        1        1        1        1        1
       1        1        2        3        5        8
          0        0        1        2        4        7


The relationships that we are asked to discover are

    repeat: triplets
    define triplet:
        first element is “1”
        second element is: “1” for the first two triplets, and thereafter,
            second_element number ‘n’ = (second_element number ‘n-1’) + (second_element number ‘n-2’)
        third element = second element minus one
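
These rules can be written as a short generator that reproduces the series. The function name is hypothetical; the rules are those just listed.

```python
# Sketch of the triplet rules as a generator (function name is hypothetical).
from itertools import islice


def triplet_series():
    """Yield 1, F(n), F(n) - 1 for successive Fibonacci numbers F(n) = 1, 1, 2, 3, ..."""
    current, following = 1, 1
    while True:
        yield 1              # first element is always 1
        yield current        # second element follows the Fibonacci series
        yield current - 1    # third element is the second minus one
        current, following = following, current + following


series = list(islice(triplet_series(), 24))
as_text = ".".join(str(n) for n in series) + "."
```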

For the purposes of this demonstration, assume that we are using a base larger than the largest number present, so that the problem of 13 followed by 12 is avoided (that’s why I include the periods as separators above).

A method for finding these divisions using “clusters” is described at <>.

Here we allow ourselves to use whatever is available to a calculating machine, but we require that we always start with the “simplest” things. Thus if we say “you can use arithmetic operators” we start by using only one at a time, or if any rule-type has a number in it (such as “repeat ‘X’ five times”) we always start with the rule-form that has the smallest number. This limitation on the search parameter arranges it so that the simplest options will be tested first, and, if morning arrives and the search must end, wherever we have arrived will always be the same: “We tested everything we had time for, starting with the simplest”.

We assume that the elements in the series are “different”. We therefore initially seek these differences. With integers, this means subtraction; with complex word-type objects with large numbers of coordinates, a more complicated “subtraction” must be defined (see “difference clusters” below).

It is very important to remember, however, that the computer doesn’t know, or care, how complex a function is involved. In the absence of the rule “start with the simplest version of any process you propose”, we could test for vocabularies based on any function whatever. Such indifference (on the part of the method as a whole) is relevant whenever there exists some aspect of the current context that suggests starting with something other than the simplest possibilities. In the case of the algorithms described in this paper, there is almost always such information.

Having selected subtraction, we apply the “vocabulary defining” method described above.
1)    choose a pair of arguments (subtraction, in its simplest form, needs two)
    and choose them the simplest way first, namely, adjacent elements

for this example, pairs would include: 1,1   1,2   2,1    3,2    1,5   etc

2)    perform your chosen function on the arguments, defining, one by one, a set of transforms between argument_pairs

for these pairs, the transforms are  (=)  (+1)  (-1)  (-1)  (+4)

3)    label each transform, and increment the score associated with the label

4)    repeat from 1)

Using adjacent elements and subtraction, the transform-score histogram for the series above would be

transform    score
   -11       |                 1
    -6       |                 1
    -3       |                 1
    -1       |||||||||         9
     =       |||               3
    +1       |||               3
    +2       |                 1
    +4       |                 1
    +7       |                 1
   +12       |                 1
   +20       |                 1
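
The four steps can be sketched as a short tally over adjacent pairs. The function name is hypothetical; the transform is taken as the second element minus the first, which matches the labels used for the example pairs (1,2 gives +1; 2,1 gives -1).

```python
# Sketch of the transform-scoring steps for adjacent elements and subtraction.
from collections import Counter

SERIES = [1, 1, 0, 1, 1, 0, 1, 2, 1, 1, 3, 2, 1, 5, 4, 1, 8, 7, 1, 13, 12, 1, 21, 20]


def transform_histogram(series):
    """Score each transform (second minus first) over all adjacent pairs."""
    scores = Counter()
    for first, second in zip(series, series[1:]):  # step 1: adjacent pairs
        transform = second - first                 # step 2: apply subtraction
        scores[transform] += 1                     # step 3: label and increment
    return scores                                  # step 4 is the loop itself


histogram = transform_histogram(SERIES)
top_transform, top_score = histogram.most_common(1)[0]
```

The top-scoring label is the transform that links the second and third elements of each triplet.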

This shows that one of the relationships (from the second_element to the third_element of every triplet) rapidly overwhelms the other transforms. If we then write down the series with breakers at each occurrence of this transform, the “tripletness” of the series is also suggested (but not completely proven, as the sequence “ 3 2 1 5 4” confuses the issue).


1.1.0 | 1.1.0 | 1.2.1 | 1.3.2 | 1 | 5.4 | 1.8.7 | 1.13.12 | 1.21.20


Such calculations therefore suggest two of the essential things about the initial series, including a likely separator.

Having found a separator we have by definition found vocabulary elements.
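
Placing a breaker after each occurrence of the dominant transform can be sketched as below. The function name is hypothetical, and the series is assumed to be non-empty.

```python
# Hypothetical sketch: cut the series after every occurrence of a given transform.

def split_after_transform(series, transform):
    """Start a new segment each time (next - previous) equals the transform."""
    segments, current = [], [series[0]]
    for previous, following in zip(series, series[1:]):
        current.append(following)
        if following - previous == transform:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


series = [1, 1, 0, 1, 1, 0, 1, 2, 1, 1, 3, 2, 1, 5, 4, 1, 8, 7, 1, 13, 12, 1, 21, 20]
segments = split_after_transform(series, -1)
```

Most segments come out as triplets; the run 3, 2, 1, 5, 4 is the exception, because the drop from 2 to 1 across a triplet boundary also matches the transform.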




(subfile 40: templates for parts-of-speech)

Consider a word (we'll refer to it as "ego") defined initially as a noun modifier:

    1) ego will have connections and associations with nouns, primarily
    2) ego will resonate with other modifiers whose targets share properties among themselves – namely nouns.

Number 1)
    Consider the group of words (or summed objects) with which ego has acquired connections and associations. Now, we would usually call these words nouns (or noun phrases), but the program knows nothing of our assignments of parts-of-speech. It can, however, make certain calculations that result in the possibility of classifying words according to use - see below for a description of the crawler that will set up these groupings.
   
    The process here entails making a single object that is a particular type of summed cluster. In signal processing, one can deal with noisy data, as long as it is synchronizable and repetitive. For example, in recording electrical response in the retina, one takes hundreds of traces, all created by the same stimulus. These are summed to provide a view of their common features, even if the common features are well below the amplitude of the noise. This statistical idea has been applied to a variety of situations in the history of my experiments with Purr-Puss. In summing MS objects, each new occurrence of a value on an axis that has already appeared contributes to a data structure that produces one or more peaks; the "peak-i-ness" can be quantified, and thresholds set. If the values that the cluster's words present to the sum are unified, obvious peaks arise and the axis is allowed to remain in the final result. The use of thresholds is dicey; in fact, there is a fairly simple way to quantify the resultant cluster in terms of its simplicity and compactness.
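
    A summed cluster of this kind might be sketched as follows. The text does not say how "peak-i-ness" is measured; here it is assumed to be the fraction of words that agree on the most common value of an axis. The threshold, the names, and the example axes are hypothetical.

```python
# Hypothetical sketch of a summed cluster. "Peak-i-ness" is assumed to be the
# share of words agreeing on the most common value of an axis.
from collections import Counter, defaultdict


def summed_cluster(words, threshold=0.6):
    """Keep each axis whose most common value is shared by enough of the words."""
    tallies = defaultdict(Counter)
    for word in words:
        for axis, value in word.items():
            tallies[axis][value] += 1  # each occurrence adds to that axis's sum

    cluster = {}
    for axis, counts in tallies.items():
        value, count = counts.most_common(1)[0]
        peakiness = count / len(words)
        if peakiness >= threshold:     # a clear peak: the axis remains
            cluster[axis] = value
    return cluster


# Invented axes and values, for illustration only.
words = [
    {"countable": "yes", "color": "red"},
    {"countable": "yes", "color": "green"},
    {"countable": "yes", "size": "large"},
]
cluster = summed_cluster(words)
```

    The shared axis survives the sum, while axes on which the words scatter fall below the threshold and drop out, much as noise averages away in the summed traces.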

Number 2)
    This goes one step further; take the same group of words collected for Number 1), and then collect all the words connected to those by the same operator that connected ego to the group. The technique of localizing these is the same, but it provides a group of objects that may have completely unrelated definitions, but that are used in similar situations. The cluster formed from these objects will contain the information about that use, and will, in a sense, tautologically define the operator that formed the group.
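
    The collection step for Number 2) might be sketched as below, assuming (hypothetically) that connections are stored as (source, operator, target) triples. The operator names and words are invented for illustration.

```python
# Hypothetical sketch: connections as (source, operator, target) triples.

def targets_of(word, operator, links):
    """Number 1): the words that ego reaches through the operator."""
    return {t for s, op, t in links if s == word and op == operator}


def usage_group(ego, operator, links):
    """Number 2): other words connected to ego's targets by the same operator."""
    targets = targets_of(ego, operator, links)
    sources = {s for s, op, t in links if op == operator and t in targets}
    return sources - {ego}


links = [
    ("red", "modifies", "apple"),
    ("red", "modifies", "truck"),
    ("big", "modifies", "apple"),
    ("fast", "modifies", "truck"),
    ("eat", "acts-on", "apple"),
]
group = usage_group("red", "modifies", links)
```

    The words in the resulting group need not share any definition; what they share is the situation in which they are used.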

Connection to "direction"

    The third step in this process is to relate ego to the clusters. The clusters will, like any other object, have a specific location in MS. The various clusters - that is, those statistically created property sets probably corresponding to our "parts-of-speech" - can then be reached, one from another, by bombs, the activity of which, along the various axes, is determined by the statistical measures of simplicity and compactness presented by the data.

    The relation between clusters is general - say, between "foodstuffs" and "metabolism" - so that there is useful knowledge contained in the relationship between clustered information constructed from a conversation involving these two concepts. Specific words in these arenas, however, lie at distances from the clusters that will limit their participation in any meaning taken from the relation between the clusters. For example, sugar-free gum is a foodstuff, but its properties locate it rather distantly from an energy drink. Likewise, lung-membrane permeability is related to metabolism, but its properties locate it rather distantly from Krebs. These sorts of pairs of entities would exist at quite different directions from each other than would the clusters.

    The utility of all this is that, having established these relations by making calculations based on initial definitions and conversation, the program can, without instruction, deduce that sugar-free gum is not related to energy-creating metabolism, without ever hearing anything about this, either in conversation or in the definitions themselves. This is independent analytical reasoning.


Unfortunately, a consideration of the data structure (that results from defining a part-of-speech as a template) suggests a complication not obviously accounted for in the usual discussions of grammar. Since the data structure is similar at all levels of construction, it is clear that it allows for entities that function like pronouns in several other “places”. Substitutions, "wild-cards", receptors, and possibly Minsky's 'pronomes' are related to this problem.


    * pronoun:      subs for a noun        (George likes apples)              (He likes apples)

    * auxiliary:    subs for a verb        (question: George likes apples?)   (Yes, he does)

    * phrase:       subs for a modifier    (George is green?)                 (Yes, he looks-thus)

    * “yes”:        subs for a sentence    (Are you green?)                   (Yes) ((I am green))

    * a word is a substitute (label) for a set of axis-value pairs + pointers + operators

    * a phrase-number substitutes for a phrase concatenated from words

When a wildcard function substitutes for the value on an axis in a definition, a class-word results (“apple” minus “red” equals “fruit”), but exactly what is implied by a data structure in which the axis itself is replaced by a variable is less obvious. For example, let's imagine that we have collected together all the axes that, like "color", can point at apples as modifiers. Now, we know that replacing the value "red" by a variable produces a class-word specifying something like "types-of-apples", but what class is specified by an object in which the axis "color" is replaced by a variable - a variable that can take as values all those pointing-modifiers? Such a replacement creates a class of objects that can take any of the usual characteristics of object-ness whatever; this seems useless. On the other hand, the creation of small groups of classes (like the parts-of-speech) that manage, between them, to encompass all words may be useful (see "parts-of-thought" following in the main text).
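
Substituting a wildcard for a value can be sketched as follows. The representation of a definition as a dictionary, the sentinel object, and the function names are assumptions; the axes are invented for illustration.

```python
# Hypothetical sketch: replacing a value with a wildcard yields a class-word pattern.

WILDCARD = object()  # sentinel meaning "any value on this axis"


def generalize(definition, axis):
    """Copy a definition, replacing the value on one axis with the wildcard."""
    pattern = dict(definition)
    pattern[axis] = WILDCARD
    return pattern


def matches(pattern, definition):
    """True if the definition has every axis of the pattern with a compatible value."""
    return all(
        axis in definition and (value is WILDCARD or definition[axis] == value)
        for axis, value in pattern.items()
    )


# Invented axes and values, for illustration only.
red_apple = {"edible": "yes", "grows-on": "tree", "color": "red"}
green_apple = {"edible": "yes", "grows-on": "tree", "color": "green"}
tomato = {"edible": "yes", "grows-on": "vine", "color": "red"}

any_color = generalize(red_apple, "color")
```

The generalized pattern accepts both apples and rejects the tomato. Replacing the axis itself with a variable has no equally obvious counterpart in this sketch, which is the difficulty raised above.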