Here’s the problem: searching for words isn’t really what you want to do. You’d like to search for ideas, for concepts, for solutions, for answers. Instead, your typical search engine moronically sorts through its postings, and tries to solve your problems by looking at which words appear where, and how often, and so on. What we’d really like is an intelligent search engine. This essay is mostly about why we’re not likely to get one any time soon.
The Verity Pitch · I remember this like it was yesterday, even though it was around 1990. At Open Text, we were just getting into the commercial search business, and I attended a presentation at a trade show by a marketing guy from Verity, probably the leading vendor in the market, at least in terms of mindshare. It was brilliant.
What’s the most valuable thing in your business? Is it your computers? No, you’re going to be replacing them next year. Is it your data? No, that’s just a bunch of ones and zeroes. Is it the information hidden in your data? Well, that’s a little closer, but not the right answer. The most important thing in your business is your people, and the information and knowledge they possess. And that’s what Verity’s Topic Search captures for you.
You could practically hear the businesspeople in the room quivering; this guy was stroking their strings the way Yo-Yo Ma does a cello’s.
Now in fact, what Verity’s engine really did was support a hierarchical weighted thesaurus: you could define Topics, and associate each one with various terms, each with its own weighting, and with various other Topics, each with its own weighting too. Then, when you did a search, you could search for Topics and it would search for the terms and do the math and give you a result list. On their carefully crafted demos, it worked like a champ.
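To make the "do the math" step concrete, here is a minimal sketch of a hierarchical weighted thesaurus in Python. It is not Verity's actual design, which I never saw from the inside; the `TOPICS` table, the choice to multiply weights down the hierarchy, and the scoring by weighted term counts are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical topic table: each topic lists weighted terms and weighted sub-topics.
TOPICS = {
    "livestock": {"terms": {"cow": 1.0, "cattle": 0.9}, "topics": {"dairy": 0.5}},
    "dairy": {"terms": {"milk": 1.0, "cheese": 0.6}, "topics": {}},
}

def expand(topic, weight=1.0, path=()):
    """Flatten a topic into {term: weight}, multiplying weights down the hierarchy."""
    if topic in path:  # guard against a topic that (indirectly) includes itself
        return {}
    node = TOPICS[topic]
    weights = {}
    for term, w in node["terms"].items():
        weights[term] = max(weights.get(term, 0.0), weight * w)
    for sub, w in node["topics"].items():
        for term, tw in expand(sub, weight * w, path + (topic,)).items():
            # If a term is reachable by several routes, keep its strongest weight.
            weights[term] = max(weights.get(term, 0.0), tw)
    return weights

def search(topic, docs):
    """Score each document by the weighted counts of the topic's terms, best first."""
    weights = expand(topic)
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(w * counts[term] for term, w in weights.items())
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `search("livestock", docs)` on a small dictionary of document texts returns the documents that mention cows, cattle, milk, or cheese, ranked by weighted count. Note that all the "knowledge" lives in the hand-built table, which is exactly the part somebody has to agree on.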
Here’s a funny Verity story: They made a big sale of the search engine to the Justice Department of the Canadian Federal Government. Since Justice thought it was strategic, they formed a committee of very senior policy-oriented lawyers to reach agreement on their terms and Topics. They vanished into a committee room and were never seen again.
Now, thesauri are useful (more on that later), and I gather Verity’s implementation was pretty good, but nobody would call what’s happening “intelligence.”
Optimism Reigns · In fact, nobody has ever really offered “intelligent” search. But the desire for it is so strong that every decade or so, someone comes along and announces they’ve cracked the nut, and carves a brief, bright trajectory across the search-technology firmament.
The most recent one I remember is from around 1995. In the Web search engine market at that time, the leaders were Lycos, Infoseek, and us at Open Text. One day my CEO called me into the office and looked real serious and said “Tim, I think we’re in trouble. There’s a new search engine about to launch, called Architext, and they don’t search for words like we do, they search for concepts. See, it says so right here in their PowerPoint slides that this VC slipped me.”
All of a sudden it was 1990 again, and I was listening to the Verity pitch once more. Anyhow, Architext morphed into Excite and then @Home (I think; it’s hard to keep that history straight), and their search engine, while not bad, never showed signs of doing anything useful with concepts.
The Turing Test · Consider what a really intelligent search engine would have to do. It would have to read an arbitrary selection of documents in an arbitrary selection of dialects and styles, and ascertain what they are about. Then, it would have to look at an arbitrary query, once again in an arbitrary dialect and style, and ascertain what it is about. Then it would have to match the about-ness of the query against that of the documents and return the right documents. This is the kind of behavior you expect from an intelligent human expert with deep subject expertise in some particular area.
A huge amount of money and time and virtuosity has been invested over the last few decades attempting to create intelligent search through the application of statistical and linguistic techniques. The results are worthwhile, but they are not intelligence.
Here’s what I believe, based on a lot of experience: any search system that could exhibit the kind of intelligence I’ve described could, as a side effect, pass the Turing test, and perhaps qualify for citizenship and protection under the laws of the land.
Put another way: intelligence in search requires deep processing of human language, and facility with language is (many believe) the single most important defining characteristic of human intelligence.
Thus, intelligent search is among the hardest of the hard AI problems. So, don’t expect to be buying software that does it by this time next year.
So What? · This means that if we want better search (and we do), we’d better not count on AI voodoo or linguistic juju or semantic mojo. We need to work with good sound statistical techniques, and be clever about generating and using metadata, and we need to get our APIs right. All of these things are hard, and there is good work being done in all of them.
Human language is a slippery beast, and there’s no reason at all to think that it’s going to be easy to teach it to machines. Next time, I’ll look at some of the better-understood linguistic problems, such as inflexion and part-of-speech and language. For example, if you need to know about cow farming you’re probably also searching for cattle ranching, beef (or dairy) production, and Kuhbauernhof, whether you know it or not.
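The cow-farming example can be sketched as a simple query expansion in Python. This is only an illustration of the idea, not a real linguistic component: the `EQUIVALENTS` table is hypothetical and hand-built, and plain substring matching stands in for a real index.

```python
# Hypothetical, hand-built table of equivalent phrases; a real one would be far larger.
EQUIVALENTS = {
    "cow farming": [
        "cattle ranching",
        "beef production",
        "dairy production",
        "Kuhbauernhof",
    ],
}

def expand_query(query):
    """Return the query followed by any phrases listed as equivalent to it."""
    cleaned = query.strip()
    return [cleaned] + EQUIVALENTS.get(cleaned.lower(), [])

def matches(query, text):
    """True if the text contains the query or any equivalent phrase, ignoring case."""
    haystack = text.lower()
    return any(phrase.lower() in haystack for phrase in expand_query(query))
```

With this table, `matches("cow farming", "Urlaub auf dem Kuhbauernhof")` is true even though the query and the text share no words. Somebody still had to type the table in, which is the hard part.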