
This file contains the following subfiles:

32 - example
32.2 - deciding what the program should do
32.9 - internal & external transforms
33 - Illusion of transforms - Ur Path - philosophical difficulty

(Subfile #32: example)


One of the model conversations I enjoy considering is:

    1) Want an umbrella?
    2) Is it raining?
    3) Well, duh.

Sentence 1) alone requires preprocessing with transforms, and it is sufficient for this example.

For sentence 1) to be useful, it must be changed into a) below. This change is partly the work of the internal conversation, but it is also necessary before other unspoken calculations can take place.

    a) Do you want to be given an umbrella for your walk outside?

We understand that fragment X below is an idiom that can mean b), c), or d).

    X:  "Want a <>?"

    b) "I have one here for you" or
    c) "We can acquire one" or
    d) "We should spend time getting one".

The first programming consideration is to ensure that there is a mechanism that can learn from experience that these choices exist, and that both ends of the process are appropriately general. Second, transforming "Want an..." from 1) into "Do you want to be given an..." in a) requires deciding which of the available contexts b), c), or d) to use. Finally, the words actually to be substituted for "Want a" must be assembled.

Generalizing Input

A number of common phrases have the same content as X:
        Would you like to have…
        Does madam care for a…
        Do you want…
        Is it your wish to acquire a…

    And slightly less similar:

        Can I get you a…
        Does George want a…
        Did you want some…

Each of these can accept the same objects ("Would you like to have a cup of tea?"). Were the program to see these seven similar phrases as completely different objects, it would have to learn to associate every object with each of the seven phrases separately. Relating all seven to a single point in MS, however, means that this unacceptable situation does not arise.
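A minimal sketch of the saving, under stated assumptions: a hand-written table (the hypothetical `CANONICAL`, with the made-up key `"OFFER_Q"`) stands in for the single shared point in MS, and an "association" is reduced to a (key, object) pair. In the program the mapping would be learned, not written out.

```python
from itertools import product

# The seven phrases from the list above.
PHRASES = [
    "would you like to have", "does madam care for a", "do you want",
    "is it your wish to acquire a", "can i get you a", "does george want a",
    "did you want some",
]
# Hypothetical stand-in for "relating all seven to a single point in MS".
CANONICAL = {phrase: "OFFER_Q" for phrase in PHRASES}

OBJECTS = ["umbrella", "cup of tea", "coat"]


def associations_without_point(phrases, objects):
    """Each phrase is a separate object, so every pairing is learned separately."""
    return set(product(phrases, objects))


def associations_with_point(phrases, objects, canonical):
    """Each phrase collapses to its shared point, so each pairing is learned once."""
    return {(canonical[p], o) for p, o in product(phrases, objects)}


separate = associations_without_point(PHRASES, OBJECTS)
shared = associations_with_point(PHRASES, OBJECTS, CANONICAL)
```

With three objects, `separate` holds 21 pairings and `shared` holds 3; each newly wanted object costs one association instead of seven.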

To bring about this common association, the internal conversation of the program needs to extract from each fragment the same primitive elements and form them into a single feature vector with which to make associations. "Extract" is what this process looks like to a human, but it is in fact nothing more than other associations. For example, one of the primitives common to each fragment is the "question" plus-word. In each case the word order of auxiliary verbs has been used as the LH-side for the prediction "question". This process can also look like a transformation, since it reorders words as follows:

    Do you want                  becomes    Question: you (do) want
    Would you like to have       becomes    Question: you want to have
    Does Madam care for a        becomes    Question: Madam wants

We find it convenient to describe this in several ways (extraction, association, transformation), but the program uses just one process.
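A toy sketch of the "question" plus-word, assuming that an auxiliary verb in first position is the LH-side that predicts "question" and that the auxiliary is then moved out of the way. The auxiliary set and the synonym rewrites (`AUXILIARIES`, `NORMALIZE`) are hypothetical hand-written stand-ins for associations the program would learn.

```python
# Hypothetical auxiliaries whose sentence-initial position predicts "question".
AUXILIARIES = {"do", "does", "did", "would", "can", "is"}

# Hypothetical rewrites standing in for learned synonym associations.
NORMALIZE = {("like", "to", "have"): ("want", "to", "have"),
             ("care", "for"): ("wants",)}


def to_question_form(fragment: str) -> tuple[str, list[str]]:
    """Return ("question", reordered words) or ("statement", words)."""
    words = fragment.lower().rstrip("?.…").split()
    if not words or words[0] not in AUXILIARIES:
        return "statement", words
    rest = words[1:]  # drop the auxiliary; its position has done its work
    for old, new in NORMALIZE.items():
        n = len(old)
        for i in range(len(rest) - n + 1):
            if tuple(rest[i:i + n]) == old:
                rest = rest[:i] + list(new) + rest[i + n:]
                break  # one rewrite per pattern is enough here
    return "question", rest
```

For example, `to_question_form("Would you like to have")` gives `("question", ["you", "want", "to", "have"])`, matching the second row of the table.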


Generalizing Output

In transforming “Want a…” into one of the phrases above, we understand that a central point in MS is involved, near which all the possible transformations lie. Any of these nearby phrases, if seen as input in a conversation, could be treated by the program in the same way (i.e., to produce the same output).

The output of the transformation also needs to be general. It would be unacceptably inefficient if the program had to repeat every aspect of its learning for each possible object that could be wanted. The same list of phrases can also be considered as output. If some transform resulted in the use of any of these closely related phrases, all of its partners would then become equally available. (Other aspects of the current situation would have to provide a reason to choose one over another: if we knew that we were addressing an honored, older female guest, we would use “Madam, would you care for a…”.)
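A sketch of choosing among equally available partners, assuming each partner phrase carries hypothetical situational features with illustrative weights; the phrase whose features best match the current situation wins. The feature names and weights are invented only to reproduce the "Madam" example.

```python
# Hypothetical partner phrases sharing one point in MS, each with the
# situational features that favour it (weights are illustrative only).
PARTNERS = {
    "Would you care for a": {"formal": 1.0},
    "Madam, would you care for a": {"formal": 1.0, "addressee_female": 1.0,
                                    "addressee_older": 0.5},
    "Want a": {"informal": 1.0},
    "Can I get you a": {"informal": 0.5, "formal": 0.5},
}


def choose_phrase(situation: set[str]) -> str:
    """Return the partner phrase whose features best match the situation."""
    def score(features: dict[str, float]) -> float:
        return sum(w for name, w in features.items() if name in situation)
    # max() keeps the first phrase listed when scores tie.
    return max(PARTNERS, key=lambda phrase: score(PARTNERS[phrase]))
```

Any other source of situational evidence could feed the same `situation` set; the point is only that the choice happens after the whole group has become available.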

Learning from experience that a set of choices exists

For the kind of learning we want to occur, the seven similar phrases (“want an umbrella?”, “would you like to have an umbrella?”, etc.) would each have to appear in the experience of the program, either in input or in internal calculation. Although it is certainly not necessary for the program to hear something before it can produce it as output, that is not the question here; we want to be certain that the program CAN learn about a set of choices, regardless of their origin.

Like any other object, the phrases can be collapsed to points; these points in fact exist at different coordinates in MS. They are close, as such points go, but not exactly coincident. For this reason the program needs ways of deciding whether such a group of phrases is interrelated. The mechanisms by which the program learns to associate such nearby points are all discussed in detail elsewhere:

    1) overlap of fuzzy data       (points are actually clouds - <fuzzy data: p.21>)
    2) crawler-optimized search    (searching for nearby objects is taken care of outside of
                                    run-time, or by a separate computer - <pp.36 & 22>)
    3) dark-cycle-created bombs    (associating pointers connect regions of MS and are also
                                    created by crawlers - <p.10>)
    4) resonance                   (two objects associated in common to a third are said
                                    to resonate - <p.17>)

Each of these methods requires no heuristic cleverness, and each results in an easily calculated numerical measure of relatedness.
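Two of these measures can be sketched numerically, under assumptions that are not taken from the main file. For fuzzy overlap, each point is modeled as an isotropic Gaussian cloud (a centre and a spread) and the overlap is the Bhattacharyya coefficient, a standard measure swapped in here for illustration. For resonance, the measure is the Jaccard index of the two objects' sets of associated third objects, again an assumed choice. The crawler and bomb mechanisms are not sketched.

```python
import math


def cloud_overlap(c1, s1, c2, s2):
    """Bhattacharyya coefficient of two isotropic Gaussian clouds.

    c1, c2: equal-length coordinate sequences (cloud centres in MS);
    s1, s2: positive spreads. Returns a value in (0, 1]; 1 means identical.
    """
    d = len(c1)
    dist2 = sum((a - b) ** 2 for a, b in zip(c1, c2))
    var = (s1 ** 2 + s2 ** 2) / 2  # variance of the averaged cloud
    return math.exp(-dist2 / (8 * var)) * (s1 * s2 / var) ** (d / 2)


def resonance(assoc, x, y):
    """Share of third objects associated with both x and y (Jaccard index).

    assoc: mapping from an object to the set of objects it is associated with.
    """
    ax, ay = assoc.get(x, set()), assoc.get(y, set())
    union = ax | ay
    return len(ax & ay) / len(union) if union else 0.0
```

Both return a single number in [0, 1], which is the property the paragraph above relies on: relatedness that can be compared and thresholded without heuristics.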


Learning that a set of phrases may all serve equivalent purposes requires that they somehow be labeled, or stored on a list, or (as is the Purr-Puss way) associated as RH-sides with some point in MS. There may be groups of phrases (held together by their common ability to communicate a single meaning) that overlap with other groups: a phrase might serve different idiomatic purposes under different circumstances. These "different circumstances" are just other feature vectors, however, and a particular cluster type can be associated with them to limit and precisely define the regions of MS that can be considered equivalent, such as the members of our list ("would you like a. . ."  "do you want a. . ."  etc.). See "region clusters" under "Cluster Types", p.27 in the main file.
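A sketch of overlapping phrase groups, assuming strings stand in both for the meaning point and for the context feature vector. The store `groups`, the functions, and the labels `"OFFER"` and `"CHALLENGE"` are hypothetical; the point is that one phrase can sit in two groups and the context selects which group's members count as equivalent.

```python
from collections import defaultdict

# Hypothetical store: (meaning point, context) -> phrases held as RH-sides.
groups = defaultdict(set)


def associate(point, context, phrase):
    """Store `phrase` as an RH-side of `point` under `context`."""
    groups[(point, context)].add(phrase)


def equivalents(phrase, context):
    """Phrases that share a group with `phrase` under `context`."""
    result = set()
    for (_point, ctx), members in groups.items():
        if ctx == context and phrase in members:
            result |= members
    result.discard(phrase)
    return result


# "do you want a" belongs to two groups; context decides which one applies.
associate("OFFER", "host", "would you like a")
associate("OFFER", "host", "do you want a")
associate("OFFER", "host", "can I get you a")
associate("CHALLENGE", "argument", "do you want a")
associate("CHALLENGE", "argument", "are you looking for a")
```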


Deciding on a context


1) Want an umbrella?

    b) "I have one here for you" or
    c) "We can acquire one" or
    d) "We should spend time getting one".

Currently I am not considering writing a program that lies; that is, this will not be a program that "acts as if" it were a person. It will converse as if it assumes that it is, itself, conscious. Among other things, this means it cannot "have" any physical objects (unless it gets connected to a real robot arm, and even then it can really only have a few things. . .). Therefore the first context, b), will appear only when a human says it, so it is not worth worrying about yet. The program can, however, have knowledge about the whereabouts of things.

Sentence 1) tells us what object we are concerned with, so that object will be in the current-objects buffer (see "BigBuffer", main file p.25). One of the activities of this buffer is amassing all learned associations as soon as an object appears; if the location of the object has been discussed recently enough, that location will therefore be indicated by a pointer in the buffer. The presence of a pointer to <object> will unquestionably be related to the idea of acquisition very early in learning; therefore c) is a natural result of the sequence:

    sentence 1) --> place umbrella in buffer --> acquire "umbrella"'s associations --> comment on them
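The sequence above can be sketched as two steps, assuming a plain dictionary stands in for learned memory and a `"location"` key stands in for a location pointer. `BufferEntry`, `place_in_buffer`, and `choose_context` are hypothetical names. Context b) is never chosen because the program cannot "have" objects; falling back to d) when no location is known is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class BufferEntry:
    """Hypothetical current-objects buffer entry (cf. "BigBuffer")."""
    name: str
    associations: dict[str, str] = field(default_factory=dict)


def place_in_buffer(obj: str, memory: dict[str, dict[str, str]]) -> BufferEntry:
    """Amass every learned association as soon as the object appears."""
    return BufferEntry(obj, dict(memory.get(obj, {})))


def choose_context(entry: BufferEntry) -> str:
    """A location pointer relates the object to acquisition, giving c)."""
    if "location" in entry.associations:
        return "c) we can acquire one"
    return "d) we should spend time getting one"  # assumed fallback
```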

It is at this point that the defTrans would become active. The input contains a word that is quite rare (umbrella) and a phrase that is quite common. The common phrase is a pointer to an object ("an") and an interrogative ("Want"(?)); the summed object formed from this pair provides the content of the defTrans. It is this content that becomes the context (puss-window, LH-side, feature vector) at which the grammatical patterns of the sentences that followed are stored. Each of the contexts b), c), and d) corresponds to one of those patterns. In this way the program learns that the grammatical structure of part of the input (i.e. of "Want an...") can be associated with any of the structures of b), c), or d) when the input appears again later.
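A sketch of the defTrans step under several assumptions: rarity is judged against hypothetical word counts (`FREQ`) with an arbitrary threshold, the primitives "pointer to an object" and "interrogative" are hypothetical two-element vectors (`FEATURES`), and the "summed object" is their element-wise sum, used as the key at which the patterns of following sentences are stored.

```python
from collections import Counter, defaultdict

# Hypothetical word counts from past conversation; unseen words count as 0.
FREQ = Counter({"want": 900, "an": 5000, "a": 6000, "umbrella": 3})

# Hypothetical primitive vectors: (pointer-to-object, interrogative).
FEATURES = {"an": (1.0, 0.0), "a": (1.0, 0.0), "want": (0.0, 1.0)}

# defTrans content (a summed vector) -> reply patterns stored at it.
patterns = defaultdict(list)


def split_rare(words, threshold=50):
    """Separate rare content words from the common phrase around them."""
    rare = [w for w in words if FREQ[w] < threshold]
    common = [w for w in words if FREQ[w] >= threshold]
    return rare, common


def def_trans_context(common):
    """Sum the feature vectors of the common words into one context key."""
    vectors = [FEATURES[w] for w in common if w in FEATURES]
    return tuple(sum(parts) for parts in zip(*vectors))


def store_reply_pattern(words, reply_pattern):
    """Store the pattern of a following sentence at the defTrans context."""
    _, common = split_rare(words)
    patterns[def_trans_context(common)].append(reply_pattern)


store_reply_pattern(["want", "an", "umbrella"], "I have one here for you")
store_reply_pattern(["want", "a", "coat"], "We can acquire one")
```

Because "Want an umbrella?" and "Want a coat?" reduce to the same summed context, both reply patterns become available the next time either fragment appears.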


Transforming the sentence fragment 

Above we have considered the transform type in which the structure of words’ definitions is used to learn associations to other structures. It is understood that most of these structures could underlie multiple meanings, probably too many for them to be directly useful in predicting specific content. If the input to the transform exists only in the realm of structure, the process is trapped at that one level.

Fortunately, there is no reason to limit the variety of data types used as inputs (freedom in such combinations is clearest in the simple context of ECONOMIST <main file p.44>). In this discussion, "structure" means "axis ID, pointers, frame terminals", etc., but not the values on the axes: those are "content". The information in the structure of some of the words can be combined with specific content from others, reducing the generality of the result in different ways according to the meaning chosen for input, and by an amount proportional to the amount of content chosen.

In the example above, the structural information in “Want a…” can be used to predict structures appropriate for the next sentence, while the content of the definition of "umbrella" can relate to appropriate contexts. These two types of information can combine to form the LH-side of Purr-Puss production rules that act as transforms, that is, rules that allow the learning/linkage of a) grammatical and definitional structure, plus some content, to b) future content. This procedure is one of several that resemble the use of intersecting MS surfaces <pp. 15 and 31>.
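A sketch of such a production rule, assuming an LH-side is a set of tagged features: `("structure", ...)` features carry no axis values, `("content", ...)` features do. A rule fires when its LH-side is contained in the current features. The feature names are hypothetical, and `generality` (the structural share of the LH-side) is an assumed measure that falls as content is added.

```python
from dataclasses import dataclass

# Hypothetical features: structure from "Want a...", content from "umbrella".
WANT_A = {("structure", "interrogative"), ("structure", "object-pointer")}
UMBRELLA = {("content", "protects-from", "rain"), ("content", "portable", 1)}


@dataclass(frozen=True)
class Rule:
    lh_side: frozenset
    rh_side: str

    def generality(self) -> float:
        """Fraction of the LH-side that is structure rather than content."""
        structural = sum(1 for f in self.lh_side if f[0] == "structure")
        return structural / len(self.lh_side)

    def fires_on(self, features: set) -> bool:
        """The rule applies when every LH-side feature is present."""
        return self.lh_side <= features


pure = Rule(frozenset(WANT_A), "offer-reply structure")
mixed = Rule(frozenset(WANT_A | UMBRELLA), "mention the weather")
```

`pure` fires on any "Want a..." input but predicts only structure; `mixed` fires only when umbrella-like content is present, trading generality for specificity.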


------------

Finally, imagine that a series of defTrans labels has succeeded at predicting a reply-formation-transform (rft), and that the result of using that rft receives positive reinforcement. A typical process would then be to store the defTrans label(s) at a location provided by the rft itself. This reverse-storage is the natural way, at some later point when back-chaining is going on, for the program to infer from known rfts that some particular definitional structure should be relevant. Knowing that something should be relevant provides another inroad on the problem of choosing sensible replies. More interestingly, this "prediction" of what grammatical structure should have preceded the current sentence provides the program with a way to evaluate the progress of this conversation as opposed to previous ones. Such means of numerically evaluating arguments must be found if internal conversation is to be managed successfully.
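A sketch of reverse-storage and back-chaining, assuming labels and rfts are plain strings and the store is a dictionary keyed by rft. The `progress` score (share of expected structures actually observed) is one hypothetical way to turn the "prediction" into a number for comparing conversations.

```python
from collections import defaultdict

# Hypothetical store: rft -> defTrans labels that predicted it and were rewarded.
reverse_store = defaultdict(set)


def reinforce(def_trans_labels, rft, reward):
    """Store the predicting labels at the rft's own location if rewarded."""
    if reward > 0:
        reverse_store[rft].update(def_trans_labels)


def expected_structures(rft):
    """Back-chaining: which definitional structures should precede this rft?"""
    return set(reverse_store.get(rft, set()))


def progress(observed_labels, rft):
    """Share of the expected structures actually seen in this conversation."""
    expected = expected_structures(rft)
    if not expected:
        return 0.0
    return len(expected & set(observed_labels)) / len(expected)


reinforce({"interrogative+pointer", "rare-object"}, "offer-reply", 1.0)
reinforce({"greeting"}, "offer-reply", -1.0)  # negative reward: not stored
```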


(Subfile #32.2: deciding what the program should do)

Where do we draw the lines between

    correctness of communication                     (factual)
    efficiency of communication                      (bits per concept?)
    grammatical correctness of reply construction    (adherence to rules of grammar)
    elegance of language use                         (artistic composition)
    language as art                                  (poetry)

Should a program start with correctness and build through the four stages to elegance? This does not seem to me to be a realistic model for most actual conversation; people abbreviate, leave out "understood" elements, use redundancy for emphasis, and so on. The first two of these depend on a person's internal model of the other person in the conversation, something as yet well beyond the current task, and the third requires some active mechanism of social motivation; such motivation, like hunger and pain, is something computers just don't do <see subfile #65>.


(Subfile #32.9: examples of internal and external transforms)

Internal transforms:

    Input                                        Output

    <word> arrives as input                      command: retrieve definition

    previous sentence negatively reinforced      execute script: decrement reinforcement
                                                 of fuzzily related sentences
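A sketch of the second internal transform, assuming word-overlap (Jaccard) similarity stands in for fuzzy relatedness in MS and that the penalty scales with similarity. The `cutoff` and `rate` parameters are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, a stand-in for fuzzy relatedness in MS."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def decrement_related(reinforcement: dict[str, float], punished: str,
                      rate: float = 1.0, cutoff: float = 0.3) -> None:
    """Lower the reinforcement of sentences fuzzily related to `punished`.

    Each stored sentence at least `cutoff`-similar loses rate * similarity,
    so the punished sentence itself takes the largest decrement.
    """
    for sentence in reinforcement:
        sim = jaccard(sentence, punished)
        if sim >= cutoff:
            reinforcement[sentence] -= rate * sim
```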




External transforms:

    Input                                        Output

    Sentence from Teacher                        Pre-processed path for beginning calculations

    Content from calculations                    Grammatical output for human listeners



(Subfile #33: philosophical problems)



An early plan for this program involved the use of shape transforms to morph input sentences into output replies. A variety of transform styles were foreseen. Perhaps the most obvious was to use geometric rotation, translation, and dimensional changes to learn how Teacher’s input relates to previous paths. Such transforms would use pattern recognition to allow the program to recognize grammatical structures from Teacher that generated reinforced behavior in the past. The transform, with its fuzzy generalization at both input and output, would then be applied to the current conversation, and the output would be a morphed form of the most recent sentence. In this view, a transform is a daemon, activated (like an antibody) by the appearance of input that fits its template; its action moves directly from input to output.
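One standard way to learn the rotation and translation part of such a shape transform is least-squares (orthogonal Procrustes) fitting; it is swapped in here as an illustration, since the early plan did not name a method. The sketch assumes 2D paths given as equal-length lists of paired points, and omits dimensional (scale) changes. Once fitted on one pair of paths, the same transform can be applied to new input.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst.

    src, dst: equal-length lists of (x, y) points, paired by index.
    Returns (theta, (tx, ty)) such that R(theta) @ p + t approximates q.
    """
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    cross = dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)  # best rotation of the centred points
    c, s = math.cos(theta), math.sin(theta)
    # Translation carries the rotated source centroid onto the target centroid.
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def apply_rigid_2d(theta, t, points):
    """Rotate each point by theta, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]
```

Fitting assumes a smooth coordinate space, which is exactly the continuity assumption that MS does not reliably satisfy.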

This early formulation still has its place, but I no longer believe that a “previous sentence” would be sufficient input, nor do I expect output to consist primarily of grammatical replies. Additionally, the simple shape transform suffers from the same continuity problems encountered elsewhere (MS is not a well-behaved continuous space). Fortunately, it is no harder to use any other extant object in the program as input than it is to use the immediately preceding sentence (as will be described below; cf. the use of diverse time series in ECONOMIST, see main file p.45). Reinforcement is expected to drive input selection, in the same way as it is hoped to drive the actual procedural decision-making process (that is, in any given circumstance, Purr-Puss can learn what input has been useful before).
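A sketch of reinforcement-driven input selection, using an epsilon-greedy bandit as a swapped-in technique rather than Purr-Puss's own mechanism. For each circumstance it tracks the mean reinforcement earned by each candidate input and usually picks the best, occasionally exploring. `InputSelector` and the candidate names are hypothetical.

```python
import random
from collections import defaultdict


class InputSelector:
    """Epsilon-greedy choice of which object to use as transform input."""

    def __init__(self, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = defaultdict(float)  # (circumstance, input) -> reward sum
        self.counts = defaultdict(int)    # (circumstance, input) -> times used

    def mean(self, circumstance, candidate):
        key = (circumstance, candidate)
        return self.totals[key] / self.counts[key] if self.counts[key] else 0.0

    def choose(self, circumstance, candidates):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)  # explore
        # Exploit: highest mean reward; ties go to the first candidate.
        return max(candidates, key=lambda c: self.mean(circumstance, c))

    def reinforce(self, circumstance, candidate, reward):
        key = (circumstance, candidate)
        self.totals[key] += reward
        self.counts[key] += 1
```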


Bertrand Russell explains clearly how Zeno’s paradox is resolved by a modern understanding of different sorts of infinities, but Zeno’s conclusion that motion is impossible remains a valid conclusion for a person unaware of the niceties of 20th-century mathematics. To Zeno, motion had to be construed as a succession, in time, of discrete states.

It is almost certainly illusory to imagine that a series of statements in a human conversation moves from one sentence to the next as a result of direct transformation. There is a huge amount of internal “conversation” (see subfile #29) that must happen inside each human as the external conversation progresses. To possess any kind of “truth” or “reality” (or, presumably, utility), such transforms would have to include all of this internal processing, which is unlikely to be possible without postulating intermediate states and representing them in some way. Each participant in the conversation moves from many internal states to others, and there are no two internal representational states between which another cannot be inserted. This makes reaching the goal of a reply as impossible as it was for Zeno’s runner to reach his goal. And yet Achilles beats the tortoise, and replies are, in reality, constantly being constructed.
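The runner's case can be put in numbers, as a standard convergent-series illustration (not necessarily the form Russell's explanation takes). Assume Achilles runs at speed v, the tortoise at speed rv with 0 < r < 1, and the tortoise starts a distance d ahead. The k-th catch-up stage covers r^k d and takes r^k d / v, and infinitely many stages sum to a finite time:

```latex
T \;=\; \sum_{k=0}^{\infty} \frac{r^{k} d}{v}
  \;=\; \frac{d}{v} \cdot \frac{1}{1 - r}
  \;=\; \frac{d}{v(1 - r)}, \qquad 0 < r < 1,
```

which is exactly the time needed to close a gap d at the relative speed v(1 - r).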

For the moment, let’s not concern ourselves with the problem of initialization, so we’re not going to try to think about what a baby’s brain does, starting from its first perceptions. Let’s just imagine that an adult mind is conscious of its own current state, that such a state can at least partially be represented by a path or constellation of operator-related paths within an MS, and that the MS involved has been enriched by decades of conversational experience. The "current state" of the adult mind is assumed not to include any entities invulnerable to logical investigation (that is, we assume for the moment that there are no souls, spirits, devils, or angels operating on the mind); therefore the current state, if it can be represented in a computer at all, must consist of objects and their interrelations. Any such state, again if representable at all, can be thought of as some sort of basic low-level path. Every element of this ur-path is represented by a fuzzy cloud of some size. Perhaps consciousness, or at least the progression of the internal conversation necessary for reply-generation, could be thought of as the crawling of this active path through MS as elements of the path interact with stored associations, via the mechanism of resonance and oscillation described below. Such an image allows a practical computation to be constructed that mirrors the internal “conversation” we know to be taking place.

Then the problem becomes “at what point in your calculations do you reply?”, or “how do you know when the internal conversation should become external?” In melody, there exist "cadential" norms that always occur at the ends of sections or phrases. In navigation, the difference engine operating on the robot's location sees a value of zero when the task is successfully completed. Seeking cadential calculations in the linguistic reply-formation process will be a necessary part of what the program learns on its own.
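A toy sketch of a crawling path with a cadence test, under loud assumptions: path elements are points rather than fuzzy clouds, "interacting with stored associations" is reduced to each element moving part of the way toward its nearest stored point, and the "difference engine" is the total movement in one step. A cadence (time to reply) is declared when that movement falls below a hypothetical tolerance. The pull rate and tolerance are fixed here, whereas the text expects the program to learn its own cadences.

```python
import math


def crawl_until_cadence(path, stored, pull=0.5, tol=1e-3, max_steps=1000):
    """Move each path element toward its nearest stored association.

    path, stored: lists of coordinate tuples in a toy MS.
    Returns (final_path, steps) once the total movement in one step
    (the "difference engine") falls below tol, i.e. at a cadence,
    or after max_steps if no cadence is reached.
    """
    path = [tuple(p) for p in path]
    for step in range(1, max_steps + 1):
        moved = 0.0
        new_path = []
        for p in path:
            target = min(stored, key=lambda s: math.dist(p, s))
            q = tuple(a + pull * (b - a) for a, b in zip(p, target))
            moved += math.dist(p, q)
            new_path.append(q)
        path = new_path
        if moved < tol:
            return path, step  # cadence: the internal conversation may go external
    return path, max_steps
```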