Authors: Jon Dron, Chris Boyne (University of Brighton) and Richard Mitchell (Inferdata)
From Proceedings of UKAIS 2001
In this paper we present an overview of some of the issues involved in negotiating the ‘stuff swamp’ of the World Wide Web and take a critical look at some of the solutions that already exist to assist learners seeking new information. We go on to describe a solution which we have been developing called CoFIND, a web-based learner-support system designed using evolutionary principles, which organises itself using the combined efforts of its users to produce emergent order. We observe some of the issues which arise in its use in three course modules.
"First, what's there is stuff: partly information, partly pure nonsense - and it's not always easy to distinguish the two. Second, it's not a superhighway, it's a swamp, albeit a swamp with many remarkable hillocks of well-organized, first-rate data and information." (Crawford, 1999).
There was perhaps a time when a learned scholar could claim to know a significant proportion of all formal academic knowledge, but that time has long gone. This is an issue of some concern to teachers, as it defines one of their central roles. The teacher of any subject, no matter how obscure, is amongst other things a filter to a vast body of knowledge. When this knowledge was primarily embodied in books and periodicals the problem was difficult enough, but the massive growth of the Internet has led not to an information superhighway but to what Crawford describes as the ‘stuff swamp.’ If teachers are to use the Internet in its fullest form as a resource for their students, they must become experts at finding hillocks: at discovering the nuggets of knowledge found in the 1247 million or so pages indexed by Google alone (SearchEngineWatch, 2000), which conservative estimates reckon to be a fraction of the content of a small percentage of the total websites available on the web. All this must be done without the traditional mechanisms by which reliability is verified - the judgement of peers and critics, editors and publishing houses - which in the world of traditional publication at least hint at potential quality. With little more than the ability to recognise trustworthy names and perhaps the occasional reference from a trusted page or book, teachers must struggle through the stuff swamp a little ahead of their students.
At least, this is one perception of the problem.
In this paper we present an alternative model, where (with appropriate support) students can themselves become the discoverers of their own resources. This alternative is perhaps a little more disconcerting even for those teachers who wish to become guides on the side, and anathema to teachers of the sage on the stage model. If it were simply a question of the blind leading the blind we would probably agree. However, in this paper we will present a support mechanism which we have developed to capitalise on any partial sight learners may possess.
No matter what anyone may say to the contrary, finding stuff in the stuff swamp is easy. The author recently ran a search on Google for ‘Ethernet tutorial’ and found 39,700 results. Although many of the results were clearly not relevant, there was a sufficiently large number to make the task of sorting through them fiendish if not impossible, despite Google’s excellent second generation search engine technology. It is not possible to tell how many useful sites were lost in the farther reaches of the search results. Random browsing through the pages of results continued to reveal potentially useful resources, up to the point at which Google automatically cut off, 794 results into the search. Adding the word "beginners" reduced Google’s response to 3,200 potential candidates. Unfortunately there is no clue as to what else might have been lost along the way by narrowing down the search query. For example, we might have missed "Ethernet 101", "Ethernet Getting-Started" and so on. Equally, there are many aspects of Ethernet that we might wish to tell our students about, and we would not often be looking for tutorials to accomplish this. There is a big difference, for example, between a training manual on how to implement the technology and a theoretical model explaining the concepts behind how it works.
Directories such as Yahoo (http://www.yahoo.com) bypass the need for automation by using human editors to provide simple seals of approval (SOAPs). This is an effective means of helping to ensure that content is at least something like what it is supposed to be, but there is very little help with identifying the qualities of resources that might be valuable in a given context, and the relatively small number of pages indexed means that a lot goes undiscovered. This problem is addressed better by the wide range of specialist directories which are specific to particular subjects (programmers, for example, might use http://www.sourcebank.com/), where a tighter focus by a more interested community can often lead to fairly high quality lists of resources. Once again, these systems seldom provide a good indication of what it is about the sites which is considered useful, and at their best are a pale imitation of a critical review for a journal.
Sites such as Amazon (http://www.amazon.com/) go one step further than directories by attaching customer reviews to their resources and some of these can be very effective. Take for example M.J. Rose, author of a phone-sex novel ‘Lip Service.’ Unable to interest publishers, the author published the book herself then offered it through Amazon.com, where rave reviews posted by ordinary users caused the novel to sell, leading to a substantial five-figure offer from a traditional publishing house. (Piller 1999).
There are various mechanisms which we consciously or tacitly use to assess the quality of a resource, such as the (recognised) names of its authors/producers, the fact that we have followed links from reliable sources, or the recommendation of friends, colleagues, mailing lists and newsgroups. Although these are fairly reliable mechanisms, the range of such sites is small. Similarly, there may be features of documents that make them inherently more reliable than others - for instance, those which include an XML-Signature, or which contain a specific set of metadata keywords - although both of these features would in themselves tell us little about the quality of a document. The current vogue for blogs (web logs of individuals’ browsing activities) continues this theme of the SOAP, making the search for good sites similar to that often used by readers of fiction: find authors that you like and stick to them. In the stuff swamp, this is akin to finding a hillock and staying there.
Another approach might be to explicitly develop a mechanism to reify trust. Such systems have been suggested by Chislenko (1997) and Zellouf et al (1999). Unfortunately there are no widespread implementations of these methods and, even if there were, they would still be of little assistance in replacing the editorial skills of a good teacher.
First-generation search engines such as Lycos, Alta Vista, WebCrawler and their ilk use a variety of content-based algorithms, usually based on keyword frequency and proximity within different parts of a web document, with a variety of weighting mechanisms to improve the probable relevance of results. They continue to suffer from the attentions of ever more sophisticated spammers but more fatally from the rampant synonymy and polysemy which plagues the English language (and most other languages, for that matter). Even semantic techniques which enhance the power of such systems using sophisticated thesauri fail to cope with the huge range of potential meanings of words and sentences in the English language alone, not to mention those found at foreign sites. Mostly such search engines fail to make the grade of hillock in the stuff swamp, bearing a closer resemblance to quicksand. Above all, they provide little if any indication of the quality of a site.
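The kind of content-based scoring described above can be illustrated in miniature. The sketch below uses a simple term-frequency count with an extra weight for words appearing in the title; the weights and documents are hypothetical, and real engines of this generation combined many more signals.

```python
# A minimal sketch of first-generation keyword-frequency scoring:
# term frequency, with title occurrences weighted more heavily.
def score(doc, query_terms, title_weight=3.0):
    title = doc["title"].lower().split()
    body = doc["body"].lower().split()
    s = 0.0
    for term in query_terms:
        # occurrences in the title count more than occurrences in the body
        s += title_weight * title.count(term) + body.count(term)
    return s

docs = [
    {"title": "Ethernet tutorial", "body": "a tutorial on ethernet frames"},
    {"title": "Networking news", "body": "ethernet mentioned once"},
]

# rank hypothetical documents against the query 'Ethernet tutorial'
best = max(docs, key=lambda d: score(d, ["ethernet", "tutorial"]))
```

Note how nothing in this scheme speaks to quality: a spammer who repeats the query terms often enough wins, which is precisely the weakness described above.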
Second-generation systems typified by Google (http://www.google.com/), Alta Vista’s Raging (http://www.raging.com/) and IBM’s seminal Clever use techniques closer to citation analysis than semantic analysis. The popular Google’s PageRank algorithm is an example of this kind of technique. In most of these systems, pages are divided into authorities and hubs. Authorities are the pages being sought, hubs are pages containing links to other pages. Weighting algorithms are applied iteratively based on the degree to which a page is an authority, i.e. pointed to by hubs and their antecedents. Kleinberg christened this as latent human annotation or LHA (Kleinberg 1998). A less sophisticated but equally powerful technique underlies PHOAKS (http://www.phoaks.com), which bases its recommendations on a citation analysis of newsgroups. All such systems are hillock-finders in the stuff swamp, exploring patterns and emergent properties which spring from the nature of the Web and how people use it.
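The hub/authority iteration underlying systems such as Clever can be sketched briefly. The following is a minimal rendering of Kleinberg's scheme over a tiny hypothetical link graph; production systems operate on focused subgraphs of the Web and add many refinements.

```python
# A minimal sketch of the hub/authority iteration (Kleinberg, 1998)
# over a hypothetical link graph.
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth2", "auth3"],
    "auth1": [], "auth2": [], "auth3": [],
}

pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(20):  # iterate until the scores stabilise
    # a page's authority score is the sum of the hub scores pointing at it
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    # a page's hub score is the sum of the authority scores it points at
    hub = {p: sum(auth[t] for t in links[p]) for p in pages}
    # normalise so the scores do not grow without bound
    na = sum(v * v for v in auth.values()) ** 0.5 or 1.0
    nh = sum(v * v for v in hub.values()) ** 0.5 or 1.0
    auth = {p: v / na for p, v in auth.items()}
    hub = {p: v / nh for p, v in hub.items()}

best = max(auth, key=auth.get)  # the page most pointed-at by good hubs
```

The page cited by both hubs emerges as the strongest authority, which is the latent human annotation at work: each link is treated as a small act of endorsement.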
If the author or other annotator has provided metadata about a resource then it should theoretically make the search for relevant information easier. PICS and its vastly more powerful successor RDF (Resource Description Framework) are mechanisms for achieving this flexibly and openly, but they are only tools to assist those seeking to classify data. Whilst some such classifications may be fairly unequivocal, generating useful schema is an ongoing task. In the field of learning resources, the Dublin Core and IMS metadata standards go a long way towards addressing the needs of learners and educators, but suffer from a limited number of implementations, lack of agreement beyond the core metadata as to classifications, and a limited range of value-related metadata. Whilst we may be able to assess, say, the level of a resource, there may yet be no clue as to whether it is funny, or complicated or exciting. The ability to describe resources is extremely useful, but it is no more than an ability, and implementations are diverse and sparse. For that reason, RDF-enabled searching remains a useful chain of hillocks in the stuff swamp, not a cure-all for the masses.
Second-generation search tools such as PHOAKS make use of the techniques of collaborative filtering, where user ratings (explicit or implicit) for resources are matched. Collaborative filters are widespread, continuing a line which began with Tapestry in the early 1990s (Resnick & Varian, 1997) and extends to this day through a very wide range of research and commercial products. They make use of a wide range of algorithms such as latent semantic indexing, SVD, Pearson correlation and so on. Collaborative filters seek similar patterns in user preferences so as to make predictions. Although some of these systems are being used to help discover general resources on the web, the technologies are more often called into service to apply ratings to resources such as books, films and music. One good reason for this is that virtually all collaborative filters are only concerned with a single dimension of value. Resources are considered as good or bad, useful or useless and so on. This is perfectly acceptable when dealing with matching tastes, but is less useful when applied to the discovery of resources in a more general context. If I am seeking cheap air flights I will not be using the same criteria as if I am seeking a good poem, and my preferences for one are unlikely to affect my feelings for the other. Consider as further examples the range of criteria which might be appropriate if I am seeking a resource to teach me about systems analysis, to discover a gift for my children or to tell me the latest currency prices or the weather. As each type of quality of a resource is added to the repertoire of what I look for in resources, the chances of finding a user with even an approximation of my preferences decreases.
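The single-dimensional matching at the heart of such filters can be sketched with Pearson correlation, one of the algorithms mentioned above. The users and ratings below are hypothetical; a real filter would then weight neighbours' ratings by these correlations to predict unseen preferences.

```python
# A minimal sketch of single-dimension collaborative filtering:
# Pearson correlation between users' rating vectors over co-rated items.
from math import sqrt

ratings = {
    "alice": {"page_a": 5, "page_b": 3, "page_c": 4},
    "bob":   {"page_a": 4, "page_b": 2, "page_c": 5},
    "carol": {"page_a": 1, "page_b": 5, "page_c": 2},
}

def pearson(u, v):
    common = set(ratings[u]) & set(ratings[v])  # items both users rated
    n = len(common)
    if n == 0:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / n
    mv = sum(ratings[v][i] for i in common) / n
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    du = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    dv = sqrt(sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0
```

The limitation discussed above is visible in the data structure itself: each rating is a single number, so there is nowhere to record *why* a resource was liked.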
Delgado’s RAAP and MyLinx make use of a combination of categories and single-dimensional ratings to somewhat overcome the problems of multiple topics (Delgado et al 1998), but categories are only part of the problem for learners. Learners have needs which are highly context-sensitive and (of necessity) constantly changing. My previous high rating for Sesame Street as a learning resource may have little bearing on my current need for resources about astrophysics, although there has probably been a fairly continuous pattern of growth from one to the other. As my needs as a learner change so does my ability to learn and style of learning. The essence of learning is change, so past preferences may have little bearing on future needs. It is this environment that provides the rationale for CoFIND.
CoFIND (Collaborative Filter In N Dimensions) has many functions to support a learning community, but for this paper we shall be concentrating on the manner in which it enables the emergence of structures which generate lists of resources through the use of categories (called topics) and value metadata (called qualities).
Written using Microsoft’s ASP and an Access back end, CoFIND is designed to adapt considerably to the behaviour of its users. Indeed, without users it is as informative as a blank screen. Learners collaboratively generate the data and metadata which constitute each CoFIND instance. No two CoFINDs are alike in how they structure resources, nor in the individual resources which they use. Structure and content are achieved without the intercession of an overall designer, nor even a discernible pattern to follow. Design emerges as a property of the system, borne by its users’ individual actions.
CoFIND makes use of two central forms of metadata:
Resources (usually web pages and sites, although we enable free text entry as well as file uploads) may be added by any user of CoFIND. In a sense the system thus behaves a little like a collaborative bookmark database.
Resources may be rated by any user. A rating is for a resource within a specific topic and using a specific quality. Thus I might rate Google (resource) as a very useful (quality) search engine (topic). Broadly speaking, resources are displayed according to how well they have been rated given a selected quality and topic. Unless a search term is entered, the system always displays all available resources- only the order in which they are displayed is changed, according to relevance to the topic and ratings given for the selected quality.
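The rating model described above can be sketched as a mapping from (resource, topic, quality) triples to accumulated votes. The names and helper functions below are hypothetical illustrations, not the ASP implementation, but they show the key property: every resource is always displayed, and ratings only change the order.

```python
# A minimal sketch of CoFIND's three-part rating model:
# a rating attaches a quality to a resource within a topic.
from collections import defaultdict

votes = defaultdict(int)  # (resource, topic, quality) -> accumulated ratings

def rate(resource, topic, quality, value=1):
    votes[(resource, topic, quality)] += value

def display_order(resources, topic, quality):
    # all resources are always shown; only the order changes,
    # according to ratings for the selected topic and quality
    return sorted(resources,
                  key=lambda r: votes[(r, topic, quality)],
                  reverse=True)

# e.g. rate Google (resource) as very useful (quality) among search engines (topic)
rate("Google", "search engines", "very useful")
rate("Google", "search engines", "very useful")
rate("AltaVista", "search engines", "very useful")
order = display_order(["AltaVista", "Google"], "search engines", "very useful")
```

Because the quality is part of the key, the same resource can simultaneously hold high ratings for ‘very useful’ and low ratings for, say, ‘good for beginners’ - the n-dimensionality from which CoFIND takes its name.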
Were the system as described left to run, it might be useful for a while, but before long a proliferation of topics and qualities would make it very difficult to make any sense of them as a structure or framework. CoFIND thus incorporates evolutionary mechanisms which place topics in conflict with each other and qualities in a competitive jungle where they struggle for supremacy. In both cases, usage determines strength. Qualities and topics that are not used fade away and eventually die, whilst those which are used more often thrive in an obvious and visual fashion.
The evolutionary mechanism of topics
CoFIND’s topic selection screen is split into four independent sectors. Each sector represents a separate competitive arena or ecosystem. Users may add topics to any of these areas, as they see fit. When a user clicks on a topic, its font size is increased by a number proportional to the number of topics it is fighting with and at the same time, all those other topics within the sector decrease in size, by a smaller but similarly proportional sum. A typical example of the effects of this is shown in figure 1. In this example, a group of MSc students who had been asked to investigate LANs instead showed a strong interest in web design, so the system shaped itself to their needs, based on usage. Like any ecosystem, there are positive feedback loops where success leads to more success.
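The font-size feedback rule described above can be sketched as follows. The step sizes and minimum size here are assumptions for illustration, not the published parameters; what matters is the shape of the rule, where the clicked topic grows in proportion to its competition and the rest shrink by a smaller, similarly proportional amount.

```python
# A minimal sketch of the topic feedback rule: a click enlarges the
# chosen topic and shrinks its competitors within the same sector.
def click(sizes, chosen, grow=0.5, shrink=0.2, floor=6):
    n = len(sizes) - 1  # number of competing topics in the sector
    sizes[chosen] += grow * n  # winner grows in proportion to competition
    for t in sizes:
        if t != chosen:
            # losers shrink by a smaller but similarly proportional amount
            sizes[t] = max(floor, sizes[t] - shrink * n)
    return sizes

# hypothetical sector: students asked to investigate LANs prefer web design
sector = {"LANs": 12.0, "web design": 12.0, "Ethernet": 12.0}
for _ in range(5):
    click(sector, "web design")  # repeated clicks: positive feedback
```

After a handful of clicks the favoured topic dominates the sector visually, which in turn attracts further clicks - the positive feedback loop described above.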
Within this topic screen, dynamic change is rapid. The system bends to the will of the majority, yet the positive feedback loop means that it also shapes the will of the majority - users without specific goals are far more likely to click on the large text than the small. CoFIND thus lends itself to a cohort of learners with similar learning goals. Were the group of users too diverse then there is a danger that inappropriate topics would present themselves to learners at the "wrong" times.
The evolutionary mechanism of qualities
Like any environment, different parts of the system move at different rates. Whereas the rate of topic adaptation is high because it needs to adjust in something approximating real time, the rate of evolution for qualities is lower. Rather like the large trees which determine the ecologies of rain forests or the street plans of cities which determine the shapes and forms of buildings within them (Brand 1997), qualities grow slowly and change sedately over time. From a user perspective, qualities are presented in a listbox which shows the first four (figure 2). Which qualities appear at the top is dependent on the amount that they have been used to rate resources, the number of times they have been used to seek resources and an adaptive weighting which is given to new qualities as a reward for novelty. This weighting falls rapidly in relation to overall usage of the system if a new quality is not used. Qualities which are used more often will be those which return useful lists of resources, as useful lists of resources give the qualities that selected them a greater utility value than those which did not. These qualities are thus used more often to rate resources, leading to ever more useful lists of resources. As with topics, we see a positive feedback loop, where strength builds more strength and weakness is its own undoing. This polarisation allows speciation to occur, with distinct ecologies developing around different qualities. As with natural ecologies, the role of chance in survival is significant. Strength is defined relative to the system, not in absolute terms. This leads to different CoFIND variants when implemented with different groups. The systems shape themselves to the cohorts of users who use them.
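The ranking of qualities described above can be sketched as a strength function combining the three ingredients: rating use, seeking use, and a novelty bonus that decays as overall system usage grows. The relative weights and the decay rate below are assumptions for illustration.

```python
# A minimal sketch of quality ranking: strength combines how often a
# quality is used to rate, how often it is used to seek, and a novelty
# bonus that fades with overall system usage.
def strength(q, total_usage, decay=0.1):
    # the novelty reward falls rapidly relative to overall usage
    novelty = q["novelty"] / (1.0 + decay * total_usage)
    return q["times_rated"] + q["times_sought"] + novelty

qualities = [
    {"name": "useful", "times_rated": 40, "times_sought": 25, "novelty": 0},
    {"name": "good for beginners", "times_rated": 15, "times_sought": 10, "novelty": 0},
    {"name": "funny", "times_rated": 0, "times_sought": 0, "novelty": 30},
]

total_usage = 90  # hypothetical overall system usage
ranked = sorted(qualities, key=lambda q: strength(q, total_usage), reverse=True)
listbox = [q["name"] for q in ranked[:4]]  # the first four appear in the listbox
```

A new quality starts with a visible novelty bonus, but unless users take it up its strength collapses towards zero - unused qualities fade and die, as described above.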
CoFIND and the stuff swamp
Our small scale experiments will not put the stuff swamp on the retreat, but our system points towards another way to negotiate it. The approach employed by CoFIND differs from the LHA algorithms of Google, Clever and their ilk. Instead of seeking patterns within existing data, CoFIND imposes new ones, making new hillocks in the stuff swamp. However, in contrast to directories such as Yahoo or traditional information retrieval approaches, these hillocks are emergent properties akin to the clusters within the Web itself, not artificial constructs of an individual or co-ordinated group.
We have implemented various iterations of CoFIND across a number of modules and courses at a variety of levels over the past two years, although it is only recently that all of the evolutionary mechanisms reached their present level of maturity. The system is currently implemented in three separate iterations in support of three course modules.
Collecting and structuring resources
CoFIND has been moderately successful as a collector and shaper of resources. Its collaborative model requires a relatively low level of input from its users but gives potentially large rewards. It is significant that in most of the versions of CoFIND implemented so far, with groups of forty to over one hundred students at a time, those students were able to find a wider range of resources of far greater quality than the author (as a subject expert and a web enthusiast) was able to find alone. More importantly, qualities, ratings and topics have helped to provide a level of structure to that knowledge. Like most knowledge bases there are anomalies - for example, the qualities ‘useful’ and ‘to some extent useful’ appearing in competition with each other (‘to some extent useful’ lost by a massive margin). Perhaps due to an ambiguous interface, we also occasionally see quite inappropriate resources being rated for specific topics - pages on web design appearing under a topic of Windows NT, for example. In giving up experts we expect the occasional oddity, but the structuring of the system in general tends to keep survival rates low for all but the most useful, thus indicating that in general the self-organising mechanisms are functioning correctly.
The quality of qualities
The value of qualities is still uncertain. In a series of experiments we divided students into randomly selected groups, around half of whom could rate and view resources using a full range of qualities, whilst the rest could only rate on a single-dimensional scale of good to bad. Initial results showed a small but significant difference in rating behaviour, where students using qualities gave higher average ratings to resources than those using the single-dimensional scale. However, this behaviour tailed off as the system was used more, and more ratings became available. In two out of three systems there was no significant difference in average rating between those using qualities and those not.
In two out of three systems, there were proportionally fewer ratings overall by users given access to qualities than by those who were given the single-dimensional scale. Partly this seems to be due to cognitive overload: it is often easier to decide that something is good or bad than whether it is, say, good for beginners or not. More interestingly however, the number of previously rated resources appears to have some bearing on the desire to rate. When qualities are used, votes are distributed between those qualities, whilst a single-dimensional scale means that all votes are taken into account when displaying resources. This means that the list of rated resources is always shorter when qualities are used than when all users effectively use the same quality. To compensate for this and therefore to ensure that relevant resources are returned for everyone, we have applied an aggregate function to resources which have been rated in the current topic, but which were rated using other qualities. Thus, a combination of all ratings for all qualities decides the order of any resources not rated for the selected quality. The user sees resources rated using the selected quality for the current topic first (with an indication of their current rating), followed by those resources rated for the current topic using other qualities (with no indication of that rating). From a user perspective, the main difference between the list returned when using qualities as opposed to that returned when using a single-dimensional scale is that there is no explicit indication that those other resources have already been rated. This seems to imply another positive feedback loop: ratings lead to more ratings. The problem therefore seems a subset of the cold-start phenomenon.
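The aggregate fallback described above can be sketched as follows: resources rated with the selected quality come first with their ratings shown, followed by resources rated for the topic under other qualities, ordered by their combined ratings but with no rating displayed. The data and names are hypothetical.

```python
# A minimal sketch of the aggregate fallback for sparsely-rated qualities.
def resource_list(votes, topic, quality):
    # votes maps (resource, topic, quality) -> number of ratings
    rated, fallback = [], {}
    for (r, t, q), n in votes.items():
        if t != topic:
            continue
        if q == quality:
            rated.append((r, n))
        else:
            fallback[r] = fallback.get(r, 0) + n  # aggregate other qualities
    shown = {r for r, _ in rated}
    rated.sort(key=lambda x: x[1], reverse=True)
    rest = sorted((r for r in fallback if r not in shown),
                  key=lambda r: fallback[r], reverse=True)
    # rated resources carry their rating; fallback resources show none
    return rated + [(r, None) for r in rest]

votes = {
    ("pageA", "Ethernet", "useful"): 3,
    ("pageB", "Ethernet", "clear"): 5,
    ("pageC", "Ethernet", "clear"): 2,
}
order = resource_list(votes, "Ethernet", "useful")
```

Because the fallback entries carry no visible rating, a user browsing under a little-used quality sees no evidence that others have already rated them - which is the missing feedback cue conjectured above.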
The cold-start phenomenon
The cold-start phenomenon is an issue often raised in the collaborative filtering literature (Harney, 2000). The system only becomes useful once it has a reasonably sized body of resources which have been categorised and rated. Until that point there is little incentive to use it, as it will fail to return useful results, leading to a vicious circle of negative feedback where a lack of resources discourages further use, hence resulting in a lack of resources. Our only effective solution thus far has been to seed the systems with a range of useful rated resources and to provide other incentives. These incentives range from encouragements like discussions groups and chat-rooms, to giving marks for participation or penalties for non-participation. These latter incentives are not in the spirit of self-organisation which drives the development of the system, but help to provide experimental results as well as notable pedagogic benefits to the students.
We have built a system which exhibits behaviour not built into it by design, but which arises from the combined interactions of its users. The system results in collections of useful resources and structures those resources in unpredictable but coherent and useful ways. The topic mechanism works and adapts to and shapes the behaviour of its users. Whether the mechanism of qualities is essential to the structuring process or whether it just gets in the way is still undecided and is clearly affected disproportionately by the cold start phenomenon. We believe on pedagogic grounds that it is in principle better suited to the needs of learners in the long run than a simple one-dimensional rating scale, but have not yet built systems of a sufficient size that the cold-start problem goes away. CoFIND continues to evolve and we will continue to report on it. Versions of the system and related papers and presentations may be found at http://www.it.brighton.ac.uk/staff/jd29/cofind.html.
Brand, Stewart (1997), "How Buildings Learn- what happens after they’re built" Phoenix Illustrated, London, 1997
Chislenko, Alexander, (1997) Collaborative Information Filtering and Semantic Transports, http://www.lucifer.com/~sasha/articles/ACF.html , (visited 1999 November 19)
Crawford, Walt (1999). "The Card Catalog and Other Digital Controversies." American Libraries 30:1 (Jan 1999), 52
Delgado, J, Ishii N, and Ura, T. (1998) Content-based Collaborative Information Filtering: Actively Learning to Classify and Recommend Documents in Cooperative Agents II, Proceedings/CIA 98 201-215, ed Matthias Klush, Gerhard Weiss, LNAI Series vol. 1435, Springer Verlag
Harney, John (2000), IT self-help, in Knowledge Management, June 2000
Kleinberg, Jon M. (1998) Authoritative Sources in a Hyperlinked Environment in Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, ed. H. Karloff (SIAM/ACM-SIGACT, 1998)
Piller, Charles (1999) "Everyone is a Critic in Cyberspace" from the LA Times, online at http://www.latimes.com/news/nation/updates/lat_opine991203.htm (visited 7/8/2000)
PHOAKS (2000) http://www.phoaks.com/ visited November 9 2000
Resnick, Paul, Varian, Hal (1997) "Recommender Systems," in Communications of the ACM, March 1997, Volume 40, Number 3
SearchEngineWatch (2000), http://www.searchenginewatch.com/ visited November 9 2000
Zellouf, Yamina, Jaillon P, Girardot J J (1999) Providing Rated Documents on the Net, Proceedings of WebNet 99, AACE, Honolulu, Hawaii, October 24-30 1999