Text Generation and User Modelling on the Web

Maria Milosavljevic and Robert Dale
Microsoft Institute of Advanced Software Technology
65 Epping Road
NSW 2113 Australia
{t-mariam, rdale}@microsoft.com


Peba-II is a text generation system which produces hypertext descriptions of animals on the World Wide Web (WWW) from a taxonomic knowledge base. A user model modulates the user's view of the animal taxonomy depending on whether she elects to be an expert or novice. In this paper, we describe the user model of the existing system and point to a number of interesting research directions it opens up.

1 Introduction

Existing multimedia systems do not ordinarily have the flexibility to tailor their output to the needs of a specific user. A single user model is hard-wired into the system, and the user must search for relevant information herself, relying on the structure of the system to provide some cues. Using a hypertext interface (such as the WWW) provides some degree of relief by enabling the user to follow links to find relevant information, but the text still remains static. In this paper, we describe Peba-II, a hypertext generation system which varies the descriptions of entities in a taxonomic knowledge base depending on the context of use. At this stage, it incorporates a simple user model that produces different texts for novices and experts, but one might also imagine taking other factors into account, including age, nationality, time pressures or previous discourse. In particular, we are currently exploring the potential of introducing new concepts to the user by comparison with concepts she knows about, either from a model of the user's knowledge or from the discourse history (see Milosavljevic [1996]).

Peba-II dynamically generates descriptions as hypertext documents available on the WWW. We use a variation on McKeown's [1985] schemas as a way of specifying discourse structure, but extended to produce hypertext documents.

2 Text Generation and Hypertext

Employing text generation to dynamically create on-line documents adds considerable utility to a hypertext-based system. Instead of storing documents with a single audience hard-wired, text generation may be employed to dynamically tailor information provision to the user's knowledge, task type, current context and any previous interactions (see Paris [1987]). This also results in reduced document maintenance costs as new information is automatically included. If a fact in the knowledge base is updated, its effect is immediate, not requiring each document containing that fact to be updated. One function of Peba-II is to generate comparisons of animals. We cannot predict which animals a user might compare, and storing every pairwise comparison of 100 animals would require 4950 documents, so generating comparisons on demand has the added benefit of reduced document storage space. A hypertext generation system can also include links to other areas of interest based on its model of the user (see Carenini et al [1993]).
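The figure of 4950 is the number of unordered pairs that can be drawn from 100 animals, that is, 100 x 99 / 2. A minimal check of this arithmetic in Python (the function name is ours, chosen for illustration, and is not part of Peba-II):

```python
from math import comb

def comparison_documents(num_animals: int) -> int:
    """Number of distinct unordered pairs of animals (one stored document per pair)."""
    return comb(num_animals, 2)

print(comparison_documents(100))  # 4950
```

The count grows quadratically with the size of the knowledge base, which is why storing every comparison quickly becomes impractical.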

Using a hypertext interface for a text generation system enables the user to perform high-level discourse planning, directing the goals of the system herself, rather than relying on the system to reason more deeply about her needs. Peba-II generates descriptions of animals on the WWW, adapted to the current context of use. As with Reiter et al's [1992] IDAS system, Peba-II is able to generate smaller pieces of text than a typical text generation system might, since the user can solicit more information by selecting hypertext tags. In essence, the user drives the system, indicating new discourse goals by selecting hypertext labels. Using the WWW interface for a text generation system has the added benefit of a built-in multi-modal display.

3 Peba-II Architecture

The architecture of the Peba-II text generation system is shown in Figure 1. The knowledge base consists of animal facts extracted from existing encyclopedias, and is described further in Section 4. The system's discourse goals are determined by the user's selection of hypertext tags in the WWW browser interface. Each new discourse goal is passed to the text generator, which produces another WWW page, including additional hypertext links which the user may select, and so on. Peba-II can currently generate two types of texts: a description of one animal or a comparison of two animals in the knowledge base.

Figure 1: The Peba-II Architecture

4 The Taxonomic Knowledge Base

The current Peba-II knowledge base of animal facts was hand-constructed from an analysis of existing encyclopedia texts. The Linnaean animal taxonomy provides an underlying classification structure which is used to build up a semantic network of animal classes. Each node within this hierarchy provides a place where we can hang information about the animal class. An example fragment of the semantic network is shown graphically in Figure 2.

Figure 2: An example knowledge base hierarchy

The animal taxonomy provides the main backbone for hypertext generation, discussed further in Section 5. It allows us to describe an animal with respect to its position in the hierarchy and to provide inheritance of features from higher classes. It also allows us to infer relationships between animals to produce comparisons.

There are essentially two types of features within the animal knowledge base: distinguishing-characteristic (DC) and hasprop. The distinguishing characteristic of an animal provides the justification for its taxonomic distinction in the hierarchy, and shows how the animal differs from its siblings. For example, the Monotreme class is distinguished from all other Mammals by its egg-laying method of reproduction. It also inherits the milk-producing characteristic from its supertype, the Mammal.

The hasprop clauses identify additional characteristics for an animal and provide a basis for animal descriptions and comparisons. A substantial analysis of animal encyclopedia articles revealed an inherent categorisation of properties within our domain. A taxonomy of these is used to encode relationships between features, a portion of which is shown graphically in Figure 3. This taxonomy is used to augment each property in the knowledge base with its associated type, and employed to construct comparisons between animal properties, as demonstrated in Section 5.

Figure 3: A fragment of the feature taxonomy

A fragment of the knowledge base is given in Figure 4. A phrasal lexicon is currently utilised in the realisation of knowledge base entities (see Milosavljevic et al [1996]). The entire knowledge base at present contains 1137 clauses describing 401 classes.

(hasprop Echidna (linean-classification Family))
(distinguishing-characteristic Echidna Monotreme (body-covering sharp-spines))
(hasprop Echidna (nose long-snout))
(hasprop Echidna (social-living-status lives-by-itself))
(hasprop Echidna (diet eats-ants-termites-earthworms))
(hasprop Echidna (activity-time active-at-dusk-dawn))
(hasprop Echidna (colouring browny-black-coat-paler-coloured-spines))
(hasprop Echidna (lifespan lifespan-50-years-captivity))

Figure 4: A knowledge base segment for the Echidna
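To make the clause format concrete, the following is a minimal sketch of how clauses like those in Figure 4 could be read into a program and queried. It is written in Python purely for illustration; the parser and the helper names `parse_sexprs` and `properties_of` are our assumptions and say nothing about how Peba-II itself stores or accesses its knowledge base.

```python
import re

# A few clauses copied from Figure 4 (identifiers spelled exactly as in the figure).
KB_TEXT = """
(hasprop Echidna (linean-classification Family))
(distinguishing-characteristic Echidna Monotreme (body-covering sharp-spines))
(hasprop Echidna (nose long-snout))
(hasprop Echidna (diet eats-ants-termites-earthworms))
"""

def parse_sexprs(text):
    """Parse a string of s-expressions into nested Python lists of symbols."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new nested list
        elif tok == ")":
            done = stack.pop()        # close it and attach to its parent
            stack[-1].append(done)
        else:
            stack[-1].append(tok)     # a plain symbol
    return stack[0]

def properties_of(clauses, animal):
    """Collect {feature: value} from the hasprop clauses for one animal."""
    props = {}
    for clause in clauses:
        if clause[0] == "hasprop" and clause[1] == animal:
            feature, value = clause[2]
            props[feature] = value
    return props

clauses = parse_sexprs(KB_TEXT)
print(properties_of(clauses, "Echidna")["nose"])  # long-snout
```

Under this reading, a distinguishing-characteristic clause names the animal, the class it is distinguished within, and the feature-value pair that sets it apart.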

5 Generating Hypertext using Schemas

Peba-II applies text structuring schemas similar to those of McKeown [1985], but adapted to generating hypertext. Each schema essentially provides a discourse grammar, implemented in Peba-II as an augmented transition network, that is stepped through by the text generation system. The schema imposes ordering constraints on a set of RHETORICAL PREDICATES so that the resulting text is coherent and fluent.

5.1 The Identification Schema

From the outset, the Peba-II system has been designed as a hypertext generation system. As discussed earlier, the nature of hypertext allows the user to perform high-level discourse planning and alleviates some of the burden from the text generation system. As a result of this, and due to the nature of our domain, we can conflate McKeown's Identification, Constituency and Attributive schemas into a single Identification schema, shown graphically in Figure 5. This schema dictates that to describe an entity, we first name the entity (Name-Entity), then list any subtypes of that entity (Name-Subtype), and then describe each property of the entity in turn (Describe-Property). The Name-Entity, Name-Subtype and Describe-Property rhetorical predicates are matched onto appropriate clauses in the knowledge base, and each currently produces one sentence in our system.

Figure 5: The Identification Schema
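The ordering that the Identification schema imposes can be sketched as a simple generator that emits one rhetorical predicate at a time. This is an illustration only: Peba-II implements its schemas as an augmented transition network, whereas the sketch below is a plain Python function, and the dictionary layout of `kb` and the function name are our assumptions.

```python
def identification_schema(entity, kb):
    """Yield (predicate, clause) pairs in the order the schema prescribes."""
    # 1. Name the entity with respect to its supertype.
    yield ("Name-Entity", (entity, kb["supertype"][entity]))
    # 2. List its subtypes, if it has any.
    subtypes = kb["subtypes"].get(entity, [])
    if subtypes:
        yield ("Name-Subtype", (entity, subtypes))
    # 3. Describe each property in turn (one sentence per property).
    for feature, value in kb["hasprop"].get(entity, []):
        yield ("Describe-Property", (entity, feature, value))

# Toy knowledge base using facts stated in the text and in Figure 4.
kb = {
    "supertype": {"Echidna": "Monotreme"},
    "subtypes": {"Echidna": ["short-beaked Echidna", "long-beaked Echidna"]},
    "hasprop": {
        "Echidna": [
            ("nose", "long-snout"),
            ("diet", "eats-ants-termites-earthworms"),
        ]
    },
}

plan = list(identification_schema("Echidna", kb))
for predicate, clause in plan:
    print(predicate, clause)
```

Each emitted pair would then be handed to a realisation component to produce one sentence; that step is omitted here.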

An example text generated using this schema for the Echidna is given in Figure 6. The supertype and list of the subtypes of the active node are generated as hypertext items. The user performs high-level discourse planning by selecting hypertext tags within the WWW page, and in this case, this could be either the Monotreme or the two subtypes of the Echidna.

Figure 6: An Identification Schema WWW Page

Each WWW page generated by Peba-II also allows the user to swap between naive and expert mode as shown in Figure 6.

5.2 The Compare and Contrast Schema

Peba-II generates comparisons of animals based on the feature hierarchy discussed in Section 4. The schema first identifies how two animals are related within the animal taxonomy, and then attempts to compare individual properties. The feature categorisation hierarchy also allows us to draw comparisons between related features like height and length.

The Linnaean relationship between two nodes in the hierarchy is determined by traversing up the animal taxonomy to find their lowest common ancestor. The distinguishing properties of the subtypes of this ancestor to which the two animals belong are adopted as the main basis for their distinction. For example, from Figure 2, the lowest common ancestor of the Kangaroo and Echidna is the Mammal, and their main difference is determined to be the distinguishing properties of the subtypes of the Mammal which they belong to. Hence, the Kangaroo is a Marsupial, so it keeps its young in a pouch, and the Echidna is a Monotreme, so it lays eggs.
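The lowest-common-ancestor step described above can be sketched as follows. The taxonomy here is deliberately simplified to the nodes named in the example (the real hierarchy has more intervening levels), and the table and function names are our assumptions rather than Peba-II's own.

```python
# Child -> parent links for a simplified fragment of the taxonomy.
PARENT = {
    "Kangaroo": "Marsupial",
    "Echidna": "Monotreme",
    "Platypus": "Monotreme",
    "Marsupial": "Mammal",
    "Monotreme": "Mammal",
}

# Distinguishing properties of the two Mammal subtypes, as stated in the text.
DISTINGUISHING = {
    "Marsupial": "keeps its young in a pouch",
    "Monotreme": "lays eggs",
}

def path_to_root(node):
    """Return the list [node, parent, grandparent, ...] up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def branch_below(path, ancestor):
    """The node on `path` directly beneath `ancestor`, or None if the path starts there."""
    i = path.index(ancestor)
    return path[i - 1] if i > 0 else None

def main_difference(a, b):
    """Return (lca, branch_a, branch_b) for two animals."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_b = set(path_b)
    lca = next(n for n in path_a if n in ancestors_b)  # first shared node going up
    return lca, branch_below(path_a, lca), branch_below(path_b, lca)

lca, branch_a, branch_b = main_difference("Kangaroo", "Echidna")
print(lca)                                   # Mammal
print(branch_a, DISTINGUISHING[branch_a])    # Marsupial keeps its young in a pouch
print(branch_b, DISTINGUISHING[branch_b])    # Monotreme lays eggs
```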

The feature hierarchy is used to bring together the properties of the animals which may be compared. An example WWW page generated by Peba-II is given in Figure 7.

Figure 7: A Compare and Contrast Schema WWW Page
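One way to picture how a feature hierarchy brings comparable properties together is to pair up features that share a type. The sketch below is hypothetical throughout: the type assignments in `FEATURE_TYPE` and the property values for the two animals are invented for illustration and are not taken from the Peba-II feature taxonomy or knowledge base.

```python
# Hypothetical feature -> type mapping standing in for the feature taxonomy.
FEATURE_TYPE = {
    "height": "size",
    "length": "size",
    "diet": "feeding",
    "nose": "body-part",
}

def comparable_pairs(props_a, props_b):
    """Pair each feature of A with every feature of B that has the same type."""
    pairs = []
    for feat_a, val_a in props_a.items():
        type_a = FEATURE_TYPE.get(feat_a)
        if type_a is None:
            continue  # untyped features cannot be aligned
        for feat_b, val_b in props_b.items():
            if FEATURE_TYPE.get(feat_b) == type_a:
                pairs.append(((feat_a, val_a), (feat_b, val_b)))
    return pairs

# Invented property values, for illustration only.
animal_a = {"height": "height-1-metre", "diet": "eats-grass"}
animal_b = {"length": "length-40-cm", "diet": "eats-ants"}

for pair in comparable_pairs(animal_a, animal_b):
    print(pair)
```

Because height and length share the type "size" in this mapping, they are aligned even though the feature names differ, which mirrors the kind of related-feature comparison described in Section 5.2.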

6 The User Model

Peba-II embodies a dynamic user model which distinguishes between naive and expert users. The Linnaean taxonomy allows us to implement an automatic user model that filters information through animal naming and the user's view of the taxonomy.

6.1 Animal Naming

Each node within the Linnaean animal taxonomy always has an attached Linnaean name, often has a true name and sometimes a common name. Expert users are presented with scientific Linnaean naming, whereas naive users are given true naming and are only shown Linnaean naming if a true name does not exist for the animal class. This results in different sentences generated by the Name-Entity rhetorical predicate:
  1. The Peludo, also known as the six-banded Armadillo, is a type of Armadillo which has six flexible bands.

  2. Euphractus sexcinctus, also known as the Peludo, is a member of the Euphractus Genus which has six flexible bands.

Sentence (1) above is generated for a naive user and sentence (2) for an expert.
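The naming rule can be sketched as a small selection function. The `Names` record and `primary_name` function are our assumptions for illustration; the second test node, with only a Linnaean name, is a hypothetical entry used to exercise the fallback case.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Names:
    linnaean: str                       # always present
    true_name: Optional[str] = None     # often present
    common_name: Optional[str] = None   # sometimes present

def primary_name(names: Names, expert: bool) -> str:
    """Experts see the Linnaean name; naive users see the true name when one exists."""
    if expert or names.true_name is None:
        return names.linnaean
    return names.true_name

peludo = Names("Euphractus sexcinctus", "Peludo", "six-banded Armadillo")
print(primary_name(peludo, expert=False))  # Peludo
print(primary_name(peludo, expert=True))   # Euphractus sexcinctus
```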

6.2 The Collapsible Taxonomy

What is considered to be the superordinate or subordinate of a given node depends on who the reader is. In (1) above, the supertype of the Peludo is taken to be the Armadillo, a family node, whereas in (2) the supertype is the intervening node, Genus Euphractus. This forms the basis for the second distinction between users: the view of the Linnaean taxonomy. Figures 8a and 8b show the novice and expert views of the same portion of the knowledge base.

Figure 8a: A novice's view of the Linnaean hierarchy

Essentially, the more technical distinctions are ignored for the naive user. This means that wherever a level of the animal taxonomy introduces no division, that level is collapsed. Hence, the supertype of the Platypus is the Monotreme for a naive user, since the intervening steps in the taxonomy do not provide her with any additional or meaningful knowledge. Divisions within the taxonomy are where differentia occur and identify groupings of concepts. This is important in defining distinguishing features for entities and in determining similarity between them.

Figure 8b: An expert's view of the Linnaean hierarchy

This also results in a different list of subtypes for a node within the taxonomy. For example, the subtypes of the Echidna will be the short-beaked Echidna and long-beaked Echidna under the naive user model, and Genus Tachyglossus and Genus Zaglossus under the expert user model. This can be seen by comparing Figures 6 and 9.
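A minimal sketch of the collapsible taxonomy, assuming that a node is collapsed for the naive user exactly when it has a single subtype (our reading of "no division"). The fragment below uses the Echidna genera named in the text; the Platypus family and genus nodes are filled in from standard Linnaean classification for illustration, and all function names are ours.

```python
# Parent -> children links for a fragment of the expert (full) taxonomy.
CHILDREN = {
    "Monotreme": ["Echidna", "Family Ornithorhynchidae"],
    "Echidna": ["Genus Tachyglossus", "Genus Zaglossus"],
    "Genus Tachyglossus": ["short-beaked Echidna"],
    "Genus Zaglossus": ["long-beaked Echidna"],
    "Family Ornithorhynchidae": ["Genus Ornithorhynchus"],
    "Genus Ornithorhynchus": ["Platypus"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}

def is_collapsed(node):
    """A node with exactly one subtype introduces no division."""
    return len(CHILDREN.get(node, [])) == 1

def supertype(node, expert):
    """Nearest ancestor visible to this kind of user."""
    parent = PARENT.get(node)
    if not expert:
        while parent is not None and is_collapsed(parent):
            parent = PARENT.get(parent)   # skip levels with no division
    return parent

def subtypes(node, expert):
    """Visible subtypes: naive users see through single-child levels."""
    result = []
    for child in CHILDREN.get(node, []):
        if not expert:
            while is_collapsed(child):
                child = CHILDREN[child][0]
        result.append(child)
    return result

print(supertype("Platypus", expert=False))  # Monotreme
print(subtypes("Echidna", expert=False))    # ['short-beaked Echidna', 'long-beaked Echidna']
print(subtypes("Echidna", expert=True))     # ['Genus Tachyglossus', 'Genus Zaglossus']
```

Both views are computed from the same underlying hierarchy, so no separate novice taxonomy needs to be maintained.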

Figure 9: An Identification Schema WWW Page for the Expert User

7 Conclusions and Future Work

In this paper, we have described Peba-II, a text generation system that produces hypertext descriptions of animals on the WWW, adapted to the context of use. The application of text generation and user modelling to electronic publishing on the WWW provides substantial leverage, since we can provide information which is tailored to the user's knowledge, task and past interactions. We are currently extending Peba-II in a number of ways:

These research directions will allow us to demonstrate the practical applicability of hypertext generation and user modelling in real-world applications.


References

Carenini G., Pianesi F., Ponzi M. and Stock O. [1993] Natural Language Generation and Hypertext Access. In Applied Artificial Intelligence, 7(2), Taylor and Francis/Hemisphere Publishing, New York, pp. 135-164.

Dale R. and Milosavljevic M. [1996] Authoring on Demand: Natural Language Generation of Hypermedia Documents. Submitted to the First Australian Document Computing Symposium (ADCS'96), Melbourne, Australia.

McKeown K.R. [1985] Text Generation. Cambridge: Cambridge University Press.

Milosavljevic M. [1996] Introducing New Concepts Via Comparison: A New Look at User Modeling in Text Generation. In Proceedings of the Fifth International Conference on User Modelling, Doctoral Consortium.

Milosavljevic M., Tulloch A. and Dale R. [1996] Text Generation in a Dynamic Hypertext Environment. In Proceedings of the Nineteenth Australasian Computer Science Conference (ACSC'96).

Paris C.L. [1987] The Use of Explicit User Models in Text Generation: Tailoring to a User's Level of Expertise, PhD Thesis, Columbia University.

Reiter E., Mellish C. and Levine J. [1992] Automatic Generation of On-Line Documentation in the IDAS Project. In Proceedings of the Third Conference on Applied Natural Language Processing, Trento, Italy, pp. 64-71.


For more information contact Maria Milosavljevic mariam@mpce.mq.edu.au