I have found out that the intelligence of an NLI system is not something that emerges. Every bit of it needs to be hard-coded. The complexity arises from the ability to recombine these hard-coded pieces at will.
Last year I was working on an NLI implementation in Go (nli-go, on GitHub). This year I read about CALO, a very complex system with many NLI aspects. At the moment I am reworking all the features I have gathered so far.
I get it! It is different. I took a better look at TEAM and now I understand. TEAM distinguishes between semantic analysis and pragmatic analysis. Semantic analysis is generic; pragmatic analysis is domain specific. In the semantic analysis phase the syntax tree is converted into a 'semantic' representation. This representation contains absolutely no knowledge of the domain (or context) at hand. It is just a simple representation, close to its syntactic counterpart, but without the overhead of all the phrase constructs, and with the additional information that is implicit in the syntax (e.g. plurality). In the pragmatic analysis phase the semantic representation is converted into a pragmatic representation, or domain-specific form. Abstract concepts like 'have' and 'be' are converted into structures that are part of the domain (have-child, color-of). What does this mean for ambiguous words? A bank is either an institution or a riverside. In the semantic representation this just becomes 'bank'; no attempt is made to turn it into either bank-institution or bank-river. Only in the pragmatic phase is an interpretation chosen. At this point of the analysis, context is part of the input: the interpretation rules of the active context are selected. When the context is banking, 'bank' may be converted into 'has-bank'. The same goes for the event-based form: only in the pragmatic phase is an event-based or a timeless form chosen. In one context events may be relevant; in another they may not matter at all.
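To make the two phases concrete, here is a minimal Go sketch. It assumes the semantic form is a flat list of predicates and that each context carries simple one-to-one rewrite rules; the context names and the rules are hypothetical, based only on the examples above (have-child, color-of, has-bank). TEAM's real pragmatic rules are much richer.

```go
package main

import (
	"fmt"
	"strings"
)

// Relation is a predicate with arguments, e.g. have(X, Y).
type Relation struct {
	Predicate string
	Args      []string
}

func (r Relation) String() string {
	return r.Predicate + "(" + strings.Join(r.Args, ", ") + ")"
}

// rulesByContext maps each context to its pragmatic rewrite rules
// (generic predicate -> domain predicate). Hypothetical contexts and rules.
var rulesByContext = map[string]map[string]string{
	"family":  {"have": "have-child"},
	"colors":  {"be": "color-of"},
	"banking": {"bank": "has-bank"},
}

// Pragmatic turns a domain-neutral semantic form into a domain-specific form
// using only the rules of the active context. Predicates without a rule in
// that context are left unchanged.
func Pragmatic(sem []Relation, context string) []Relation {
	rules := rulesByContext[context]
	out := make([]Relation, 0, len(sem))
	for _, rel := range sem {
		if to, ok := rules[rel.Predicate]; ok {
			rel = Relation{Predicate: to, Args: rel.Args}
		}
		out = append(out, rel)
	}
	return out
}

func main() {
	// A generic 'have' relation: no domain knowledge yet.
	sem := []Relation{{Predicate: "have", Args: []string{"john", "S"}}}
	fmt.Println(Pragmatic(sem, "family"))  // [have-child(john, S)]
	fmt.Println(Pragmatic(sem, "banking")) // [have(john, S)]
}
```

The semantic form is never touched by domain knowledge; only the choice of context changes the outcome.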
In the last few days I found out something important about semantic interpretation. In my own thinking I had included an extra phase between syntax and domain-specific semantics: the step of 'generalized semantics'. By this I mean that I thought each sentence should first be turned into a standardized, generalized form, and only after that be transformed into a domain-specific form.
The systems I investigated don't do that. They go from syntax directly to a domain-specific form, and there's a good reason for that. In the system I had in mind, the first semantic form would be event based. This form allows the user to be most expressive. However, this generality comes at a big cost. Most databases are not event based, so the event-based generic form would have to be turned into a fact-based, domain-specific form. This transformation is hard, much harder than it needs to be.
This insight dawned on me when I was reading about CHAT-80 and saw the predication `exports(france, wine, britain)`. In my generalized form this would have looked something like `export(e), subject(e, c), name(c, "france"), object(e, w), ...`. As you can see, this creates a lot of predicates that are not useful in this domain.
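The difference in size between the two forms is easy to show. This Go sketch collapses an event-based description into the CHAT-80-style fact. The `to` predicate and the `name` predicates for wine and Britain are my own assumptions standing in for the elided part above, and the function assumes the input describes a single event.

```go
package main

import "fmt"

// Pred is a predicate with arguments, e.g. subject(e, c).
type Pred struct {
	Name string
	Args []string
}

// collapseExport turns an event-based description of a single export event
// into the fact-based relation exports(Exporter, Goods, Importer).
// Simplification: it assumes exactly one event, so the event variable is not checked.
func collapseExport(preds []Pred) (string, bool) {
	names := map[string]string{} // entity variable -> proper name
	roles := map[string]string{} // role predicate -> entity variable
	isExport := false
	for _, p := range preds {
		switch {
		case p.Name == "export" && len(p.Args) == 1:
			isExport = true
		case p.Name == "name" && len(p.Args) == 2:
			names[p.Args[0]] = p.Args[1]
		case len(p.Args) == 2 && (p.Name == "subject" || p.Name == "object" || p.Name == "to"):
			roles[p.Name] = p.Args[1]
		}
	}
	exporter, goods, importer := names[roles["subject"]], names[roles["object"]], names[roles["to"]]
	if !isExport || exporter == "" || goods == "" || importer == "" {
		return "", false
	}
	return fmt.Sprintf("exports(%s, %s, %s)", exporter, goods, importer), true
}

func main() {
	// Event-based form; "to" and the last two name predicates are assumed.
	eventForm := []Pred{
		{"export", []string{"e"}},
		{"subject", []string{"e", "c"}},
		{"name", []string{"c", "france"}},
		{"object", []string{"e", "w"}},
		{"name", []string{"w", "wine"}},
		{"to", []string{"e", "b"}},
		{"name", []string{"b", "britain"}},
	}
	fact, ok := collapseExport(eventForm)
	fmt.Println(len(eventForm), "predicates ->", fact, ok)
}
```

Seven predicates reduce to one, and all the bookkeeping lives in a transformation that a fact-based domain never needed in the first place.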
An event-based form may be useful in some domains, but quite useless in others. The domain should decide.
An objection might be: but you can't let the database tell you what logical form to use! This objection does not hold. It is not the database but the domain model that dictates the form.
Another objection: if the system supports multiple domains, you would need to know the active domain before you can correctly parse the sentence. This is a valid remark, but it is less of a problem than it might seem. One always needs to know the context in which to place a sentence; the same sentence will be interpreted differently in another mental context.
Domain-specific models also provide a 'namespace' for a set of words and concepts. This solves the problem of homonymy: words that have several meanings. Each word has only a single meaning in a single domain context. Once you know the domain, there is no need to select the right homonym from several candidates, and this may prevent some embarrassing interpretation mistakes.
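A domain namespace can be as simple as one lexicon per domain. A minimal Go sketch, assuming hypothetical domain and sense names:

```go
package main

import "fmt"

// lexicons gives every domain its own namespace: within one domain a word
// has exactly one sense. Domains and sense names are hypothetical.
var lexicons = map[string]map[string]string{
	"finance":   {"bank": "financial-institution", "interest": "interest-rate"},
	"geography": {"bank": "river-side", "mouth": "river-mouth"},
}

// Sense returns the single meaning of word in the active domain, so no
// homonym selection step is needed.
func Sense(domain, word string) (string, error) {
	lex, ok := lexicons[domain]
	if !ok {
		return "", fmt.Errorf("unknown domain %q", domain)
	}
	sense, ok := lex[word]
	if !ok {
		return "", fmt.Errorf("word %q is not part of domain %q", word, domain)
	}
	return sense, nil
}

func main() {
	for _, d := range []string{"finance", "geography"} {
		s, err := Sense(d, "bank")
		fmt.Println(d, s, err)
	}
}
```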
These are my current thoughts about the parsing and semantic analysis phases.
The parsing process is restricted to syntax only; no semantic attachments are made. Person and number agreement is checked, which rules out some parse trees. Due to quantifier scoping and conjunction ambiguities, this first phase is likely to produce multiple results. This is not a problem because parsing is simple and fast.
The next phase is syntax-to-semantics conversion. It converts the syntax trees into literal semantic representations (relational, list based, or otherwise), using the semantic attachments from the lexicon. At that point we have multiple semantic representations that literally represent the user's sentence, but they are probably not what he or she meant. This is the next challenge.
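Agreement checking needs no semantics at all. This Go sketch reduces a parse tree to a subject and a verb with their person and number features (a strong simplification) and drops the readings that do not agree. The example sentence and the feature encoding are assumptions for illustration.

```go
package main

import "fmt"

// Features are the agreement features of a noun phrase or verb.
type Features struct {
	Person int    // 1, 2 or 3
	Number string // "sg" or "pl"
}

// Parse is a drastically simplified parse tree: a subject and a verb
// with the features this reading assigns to them.
type Parse struct {
	Subject, Verb        string
	SubjFeats, VerbFeats Features
}

// filterByAgreement keeps only the parses whose subject and verb agree in
// person and number. No semantic information is used.
func filterByAgreement(parses []Parse) []Parse {
	var kept []Parse
	for _, p := range parses {
		if p.SubjFeats == p.VerbFeats {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	// "The sheep eats": 'sheep' may be singular or plural, 'eats' is 3rd person singular.
	eats := Features{3, "sg"}
	parses := []Parse{
		{"the sheep", "eats", Features{3, "sg"}, eats},
		{"the sheep", "eats", Features{3, "pl"}, eats},
	}
	for _, p := range filterByAgreement(parses) {
		fmt.Printf("%s (%d%s) %s\n", p.Subject, p.SubjFeats.Person, p.SubjFeats.Number, p.Verb)
	}
}
```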
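A minimal Go sketch of this conversion, assuming a toy syntax tree for the imperative "Give me the products" and a hypothetical lexicon in which each word's semantic attachment is a single predicate name. Each noun phrase gets a fresh variable, and the verb takes the variables of its noun phrases as arguments. The output is literal: it says nothing yet about what "give" means in the domain.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a minimal syntax tree node. Leaves carry a word.
type Node struct {
	Cat      string // syntactic category: "S", "V", "NP", "Det", "N", "Pro"
	Word     string // set on leaves only
	Children []*Node
}

// attachments is a hypothetical lexicon: the semantic attachment of a word
// is the predicate it contributes to the literal representation.
var attachments = map[string]string{
	"give":     "give",
	"me":       "speaker",
	"products": "product",
}

type builder struct {
	vars int
	out  []string
}

func (b *builder) fresh() string {
	b.vars++
	return fmt.Sprintf("X%d", b.vars)
}

// np gives a noun phrase a fresh variable and emits the attachments of its
// words. Determiners have no attachment here; a real system would turn them
// into quantifiers.
func (b *builder) np(n *Node) string {
	v := b.fresh()
	for _, c := range n.Children {
		if pred, ok := attachments[c.Word]; ok {
			b.out = append(b.out, fmt.Sprintf("%s(%s)", pred, v))
		}
	}
	return v
}

// imperative handles a clause consisting of a verb followed by NP arguments.
func (b *builder) imperative(s *Node) {
	var args []string
	for _, c := range s.Children[1:] {
		args = append(args, b.np(c))
	}
	verb := fmt.Sprintf("%s(%s)", attachments[s.Children[0].Word], strings.Join(args, ", "))
	b.out = append([]string{verb}, b.out...)
}

// Literal converts an imperative syntax tree into a literal semantic form.
func Literal(s *Node) string {
	b := &builder{}
	b.imperative(s)
	return strings.Join(b.out, ", ")
}

func main() {
	// Syntax tree for "Give me the products".
	tree := &Node{Cat: "S", Children: []*Node{
		{Cat: "V", Word: "give"},
		{Cat: "NP", Children: []*Node{{Cat: "Pro", Word: "me"}}},
		{Cat: "NP", Children: []*Node{{Cat: "Det", Word: "the"}, {Cat: "N", Word: "products"}}},
	}}
	fmt.Println(Literal(tree)) // give(X1, X2), speaker(X1), product(X2)
}
```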
In the next phase, domain-specific expressions are converted into their logical and unambiguous counterparts. For example, when someone says "Give me the products with the highest sales figures.", this sentence contains several expressions not found in other domains. These expressions may be collected in an acquisition phase by asking users what they would ask the system. This phase converts the expressions into concepts from the domain ontology. Domain ontology? Yes, the acquisition process also contains a domain ontology building step. "Give me .. products" could be transformed into "print the full name and article number of .. products". The literal representation is thus transformed into the domain model representation.
When this is done, we can apply selectional restrictions to rule out impossible interpretations. That is the next step.
Now there may still be multiple representations left; we need to ask the user which of these he or she meant and go with that.
In the next step the domain model representation is converted into a knowledge base (database) representation, using 'domain model to knowledge base model' conversion rules.
Probably an extra step is needed to transform the database representation into an actual database query, but only if that makes things simpler, not harder.
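Selectional restrictions can be expressed as the required types of a predicate's arguments. In this Go sketch, which assumes that one-argument predicates like `product(X)` assert types and that a hypothetical `sales_of` predicate requires a product, interpretations that give a variable a conflicting type are dropped. The domain model is made up for the example.

```go
package main

import "fmt"

// Pred is a predicate with arguments; one-argument predicates such as
// product(X) are treated as type assertions in this sketch.
type Pred struct {
	Name string
	Args []string
}

// restrictions lists the required type of each argument of a domain
// predicate. The domain model here is hypothetical.
var restrictions = map[string][]string{
	"sales_of": {"product", "amount"},
}

// consistent reports whether no argument violates a selectional restriction.
// A variable without a known type is accepted.
func consistent(interp []Pred) bool {
	types := map[string]map[string]bool{}
	for _, p := range interp {
		if len(p.Args) == 1 {
			if types[p.Args[0]] == nil {
				types[p.Args[0]] = map[string]bool{}
			}
			types[p.Args[0]][p.Name] = true
		}
	}
	for _, p := range interp {
		for i, want := range restrictions[p.Name] {
			if i >= len(p.Args) {
				break
			}
			if known := types[p.Args[i]]; known != nil && !known[want] {
				return false
			}
		}
	}
	return true
}

// filter keeps the interpretations that pass the restrictions.
func filter(interps [][]Pred) [][]Pred {
	var kept [][]Pred
	for _, in := range interps {
		if consistent(in) {
			kept = append(kept, in)
		}
	}
	return kept
}

func main() {
	interps := [][]Pred{
		{{"product", []string{"X"}}, {"sales_of", []string{"X", "Y"}}},
		{{"person", []string{"X"}}, {"sales_of", []string{"X", "Y"}}},
	}
	fmt.Println(len(filter(interps)), "of", len(interps), "interpretations remain")
}
```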
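If that extra step is taken, it can stay small. This Go sketch assumes relations in which each argument is a variable bound to a table column; it gives every relation its own table alias and turns shared variables into join conditions. The table and column names are hypothetical, and constants, aggregation and ordering (needed for "highest sales figures") are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// DBRel is a database-level relation: a table with its columns bound to variables.
type DBRel struct {
	Table   string
	Columns []string
	Args    []string // one variable per column
}

// ToSQL builds a SELECT statement: every relation gets its own table alias,
// and a variable occurring in several relations becomes an equality join.
func ToSQL(rels []DBRel, selectVars []string) (string, error) {
	firstCol := map[string]string{} // variable -> first alias.column it occurs in
	var from, where []string
	for i, r := range rels {
		alias := fmt.Sprintf("t%d", i)
		from = append(from, r.Table+" "+alias)
		for j, v := range r.Args {
			col := alias + "." + r.Columns[j]
			if prev, ok := firstCol[v]; ok {
				where = append(where, prev+" = "+col)
			} else {
				firstCol[v] = col
			}
		}
	}
	var sel []string
	for _, v := range selectVars {
		col, ok := firstCol[v]
		if !ok {
			return "", fmt.Errorf("variable %s does not occur in any relation", v)
		}
		sel = append(sel, col)
	}
	q := "SELECT " + strings.Join(sel, ", ") + " FROM " + strings.Join(from, ", ")
	if len(where) > 0 {
		q += " WHERE " + strings.Join(where, " AND ")
	}
	return q, nil
}

func main() {
	rels := []DBRel{
		{"product", []string{"id", "name"}, []string{"X", "N"}},
		{"product", []string{"id", "article_no"}, []string{"X", "A"}},
	}
	q, err := ToSQL(rels, []string{"N", "A"})
	fmt.Println(q, err)
}
```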
When the results come back from the database, they need to be represented in the way expected by the user. This representation was stored in the 'literal to domain specific' conversion phase and we use it here to convert the database results to the way the user expects.
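A minimal Go sketch of this last step, assuming the stored representation is just a list of column labels; the labels and rows are made up. In a real system the presentation would be recorded by the literal-to-domain step rather than written by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// Presentation records how the user expects the answer to look.
// Here it is filled in by hand; the labels are hypothetical.
type Presentation struct {
	Labels []string // one label per result column
}

// Render formats database rows according to the stored presentation.
func Render(p Presentation, rows [][]string) string {
	var b strings.Builder
	for _, row := range rows {
		parts := make([]string, len(row))
		for i, val := range row {
			label := "?"
			if i < len(p.Labels) {
				label = p.Labels[i]
			}
			parts[i] = label + ": " + val
		}
		b.WriteString(strings.Join(parts, ", ") + "\n")
	}
	return b.String()
}

func main() {
	p := Presentation{Labels: []string{"full name", "article number"}}
	rows := [][]string{{"Widget", "A-100"}, {"Gadget", "B-200"}} // made-up rows
	fmt.Print(Render(p, rows))
}
```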
On transportability. There is a chicken-and-egg problem. To make a system transportable, not one but several types of mappings need to be made. The more such a system is used, the less time it takes to extend it to another domain. However, the more mappings that need to be made, the harder it is to get the system accepted. A system that requires just a simple mapping has an advantage here, since getting an NLI system accepted by multiple groups of people is still a real problem.
'Syntactic Structures' by Chomsky, although historically significant, is in actual practice not very useful. In fact, it held back progress for a while by insisting that phrase structure grammar alone would not be sufficient to deal with natural language. His solution, transformational grammar, has proved impossible to use for efficient parsing, and phrase structure grammar is still very much alive today.
While studying EUFID, I realized that there should be a separate model for syntax-to-semantics mapping rules. Such rules are always present, except when a system uses custom procedures.
Since we are talking about models: to me a model is a data structure, or a set of identical data structures if you will. A model is never just a set of coded functions in the programming language of the system. But I can see that Lisp and Prolog applications have chosen these languages precisely because it is easy to create such mapping functions in them. I would make an exception for such systems if they have developed a formalism that imposes a distinct set of restrictions on these functions. So, a Lisp function that converts a piece of syntax into a piece of semantics is only a model if its form is highly restricted and its possibilities limited.
One might think that it should be possible to create an abstract architecture for NLI that allows independent teams to work on a single module. I am talking about standardization of the interfaces between the modules. I don't think it is possible to do such a thing. There is just not enough overlap between the different systems. For one thing, I don't think that science has settled on the right formalism for the task. As long as the science keeps progressing, no team will want to be tied to any single formalism. Different techniques are suitable for different domains.
I still have my doubts about the specialness of EUFID. But the more I read about it, especially the paper "Semantic database mapping" by John F. Burger, the more impressed I become. I don't know if EUFID is the first to stress the mapping of the user's view of a domain to the knowledge base representation, but the makers have done a good job of it.
I noticed that William A. Woods, the creator of Lunar, was also the inventor of the Augmented Transition Network. In Lunar he used this formalism. I read his article and the graphs are very nice. However, I think the implementation quickly gets unwieldy and hard to master.
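To show what the formalism looks like, here is a very small ATN recognizer in Go. It supports only CAT arcs (consume a word of a category), PUSH arcs (call a sub-network) and POP from final states, with registers filled along the way and backtracking through continuations. The toy lexicon and the S and NP networks are my own illustration, not Woods' Lunar grammar; a realistic grammar also needs tests and actions on its arcs, which is where the complexity comes from.

```go
package main

import (
	"fmt"
	"strings"
)

// Arc is one transition of an augmented transition network. CAT arcs consume
// a word of a lexical category; PUSH arcs call a sub-network. Both store what
// they recognized in register Reg.
type Arc struct {
	Kind, Label, To, Reg string
}

// Network is a set of states with outgoing arcs; final states may POP.
type Network struct {
	Start string
	Final map[string]bool
	Arcs  map[string][]Arc
}

type Regs map[string]string

// Toy lexicon and grammar, made up for this sketch.
var category = map[string]string{
	"the": "det", "minister": "noun", "ambassador": "noun", "received": "verb",
}

var networks = map[string]Network{
	"S": {Start: "s0", Final: map[string]bool{"s3": true}, Arcs: map[string][]Arc{
		"s0": {{"PUSH", "NP", "s1", "subj"}},
		"s1": {{"CAT", "verb", "s2", "verb"}},
		"s2": {{"PUSH", "NP", "s3", "obj"}},
	}},
	"NP": {Start: "n0", Final: map[string]bool{"n2": true}, Arcs: map[string][]Arc{
		"n0": {{"CAT", "det", "n1", "det"}},
		"n1": {{"CAT", "noun", "n2", "head"}},
	}},
}

// with returns a copy of r with key set to val, so backtracking never
// sees registers set on an abandoned path.
func with(r Regs, key, val string) Regs {
	c := Regs{}
	for k, v := range r {
		c[k] = v
	}
	c[key] = val
	return c
}

// walk explores every path from state, backtracking on failure. When it can
// POP from a final state it calls k with the position and registers.
func walk(net, state string, words []string, pos int, regs Regs, k func(int, Regs) bool) bool {
	n := networks[net]
	for _, a := range n.Arcs[state] {
		switch a.Kind {
		case "CAT":
			if pos < len(words) && category[words[pos]] == a.Label {
				if walk(net, a.To, words, pos+1, with(regs, a.Reg, words[pos]), k) {
					return true
				}
			}
		case "PUSH":
			// The register receives the words the sub-network spanned.
			found := walk(a.Label, networks[a.Label].Start, words, pos, Regs{}, func(end int, _ Regs) bool {
				phrase := strings.Join(words[pos:end], " ")
				return walk(net, a.To, words, end, with(regs, a.Reg, phrase), k)
			})
			if found {
				return true
			}
		}
	}
	return n.Final[state] && k(pos, regs)
}

// Parse recognizes a whole sentence with the S network and returns its registers.
func Parse(sentence string) (Regs, bool) {
	words := strings.Fields(sentence)
	var result Regs
	ok := walk("S", networks["S"].Start, words, 0, Regs{}, func(end int, r Regs) bool {
		if end == len(words) {
			result = r
			return true
		}
		return false
	})
	return result, ok
}

func main() {
	regs, ok := Parse("the minister received the ambassador")
	fmt.Println(ok, regs["subj"], "|", regs["verb"], "|", regs["obj"])
}
```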
I contacted Erik T. Mueller and asked him about the meaning and the origin of the name 'ThoughtTreasure'. It was quite a story and I have placed it on the ThoughtTreasure page.
I also asked him about the main influences on the system. He said they were: Syntactic Structures by Chomsky (1957); the Cyc commonsense knowledge base, started by Douglas Lenat (1984); and The Society of Mind by Marvin Minsky (1986). It is quite interesting because it emphasises that ThoughtTreasure aimed to be a general purpose AI system rather than just an NLI system.
Mr. Mueller also gave me some feedback on the ThoughtTreasure features I listed.
One thing I learned from ThoughtTreasure is the importance of 'phrasal verbs'. The verb 'receive' means something completely different from 'receive from'. Compare: 'The minister received the ambassador.' with 'The minister received a compliment from his secretary.' The first verb means something like 'to meet', the second something like 'to get'. The particles and prepositions (here: 'from') are part of the verb and determine its meaning. It is a good idea to store these as separate verbs in the lexicon.
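In the lexicon this can be as simple as a separate key for each phrasal verb. A Go sketch, assuming the preposition has already been identified by the parser and using illustrative sense labels:

```go
package main

import "fmt"

// verbs stores phrasal verbs as separate lexicon entries, keyed by the verb
// and its particle or preposition. The sense labels are illustrative.
var verbs = map[string]string{
	"receive":      "meet",
	"receive from": "get",
}

// verbSense prefers a phrasal entry whose preposition occurs in the clause
// and falls back to the bare verb otherwise.
func verbSense(lemma string, preps []string) string {
	for _, p := range preps {
		if sense, ok := verbs[lemma+" "+p]; ok {
			return sense
		}
	}
	return verbs[lemma]
}

func main() {
	// "The minister received the ambassador."
	fmt.Println(verbSense("receive", nil))
	// "The minister received a compliment from his secretary."
	fmt.Println(verbSense("receive", []string{"from"}))
}
```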
Working on ThoughtTreasure.
After SHRDLU I thought I had a strong basis with many features, and that the other systems would mainly be a matter of ticking the checkboxes.
I was wrong. In fact, I found it very frustrating to see that ThoughtTreasure has all kinds of features that were not available in SHRDLU. Again, I had to think of new concepts, and change existing divisions of the domain space.
But as soon as I started entering the data into the computer, I found it fun again. It is amazing how these systems differ, and it is intriguing to chart this domain. ThoughtTreasure has explicit higher-order cognitive functions (named agents), like emotions and goal recognition. It handles the answering of questions in a more explicit way. I like this idea very much. Mueller groups quite different functions under the term 'agents', so we can't say that all agents are higher-order functions. The grouping of functionality itself is interesting.
After going through several systems in a somewhat superficial manner, I started to deal with them more thoroughly.
I started with SHRDLU, because it is the first system I consider, and at the same time it has profound coverage of the problem domain. SHRDLU answers questions about its own internal processes. This is quite remarkable. I would like to show that this is awesome, but at the same time how it can be done, and that with the proper data structures it does not need to be rocket science to pull it off.
I think it is inevitable that my interpretation of the problem domain, the features I discern, and the way I group them are coloured by what I think is important, and by my incomplete understanding of the field. I hope to reduce my personal influence and I will strive for an objective representation.
After finding out how many features NLI systems offer, and that there was some overlap but also great differences between these systems, I decided to create a web application to visualize these similarities and differences.
During the holidays with my family in England I started reading several available articles on natural language interaction systems. I wanted to chart the complete problem domain in order to create a generic architecture for a modular, extendable NLI system.
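One way to make such questions answerable is to keep a goal tree: every action remembers which goal it served and which subgoals it needed, so "why" looks one level up and "how" one level down. This Go sketch is my own illustration of that idea, not SHRDLU's actual implementation (which was written in Lisp and Micro-Planner), and the example goals are hypothetical.

```go
package main

import "fmt"

// Goal is a node in a goal tree: every action remembers the goal it served
// and the subgoals it needed.
type Goal struct {
	Action   string
	Parent   *Goal
	Subgoals []*Goal
}

// Push records a subgoal under g, so the reason for it is kept.
func (g *Goal) Push(action string) *Goal {
	sub := &Goal{Action: action, Parent: g}
	g.Subgoals = append(g.Subgoals, sub)
	return sub
}

// Why answers "Why did you do X?" by looking one level up.
func Why(g *Goal) string {
	if g.Parent == nil {
		return "because you asked me to"
	}
	return "to " + g.Parent.Action
}

// How answers "How did you do X?" by listing the subgoals.
func How(g *Goal) []string {
	var steps []string
	for _, s := range g.Subgoals {
		steps = append(steps, s.Action)
	}
	return steps
}

func main() {
	// A hypothetical command history.
	top := &Goal{Action: "put the pyramid in the box"}
	grasp := top.Push("grasp the pyramid")
	clearOff := grasp.Push("get rid of the block on top of the pyramid")
	top.Push("move the pyramid into the box")
	fmt.Println(Why(clearOff))
	fmt.Println(Why(top))
	fmt.Println(How(top))
}
```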