TopicTagger FAQ

How to enhance search and browsing on your web site

Is TopicTagger a search engine? How is it better than what I've got?

We aren't a better search engine. We are a way to make your search engine more effective, no matter which one you use.

TopicTagger helps you organize your content, to improve both the search and browsing experience. Manual tagging procedures are often slow, expensive and inconsistent, when they are followed at all. TopicTagger can augment or replace these efforts with a fast, consistent, and highly cost-effective process.

We happen to bundle with Lucene because it's a solid product, and it's free. But we can work with any search engine that you have in place.


Why are there so many search engines, anyway? Why is the enterprise search market so fragmented?

Enterprise search is a highly consultative business. In practice, it doesn't matter much which of many popular products you use. What matters is how you set up the input data and configure the engine.

Successful search projects depend on having a good process for the initial creation, and an ongoing tuning process as new data comes online. Even though a huge effort often goes into product selection, the choice seldom has much impact on a project's success. Therefore, no single product has majority market share.

Why does Google do such a good job in the web search market? Why doesn't that translate to the enterprise market?

Google was the first search engine to systematically make use of available link data. It also had a cheap and effective distributed computing platform. Then-market-leader AltaVista began as a demo for high-end hardware, and could not change its DNA fast enough to keep up with the rapidly growing internet. Other companies were locked into legacy algorithmic approaches, or distracted by portal strategies, or both.

PageRank is a useful algorithm that gave Google an early leg up. But its effects can be replicated with simpler techniques that are long off-patent. The key to Google's success is not algorithmic. It is due primarily to three principles:

  • Making maximum effective use of available data

  • Being test driven

  • Being end-user driven

Making use of link data is a good example of the first principle. Just as important, Google does not use data unless it actually helps results. Its test-driven culture makes it easy to try out new ideas, to drive them into the product if they work, and to exclude them if they don't. The company also pays close attention to what end users are trying to do, not just to traditional measures of relevance. A huge amount of quality information flows constantly from the end users to the developers who make critical technical decisions.

This approach doesn't translate to enterprise search because it puts the onus on enterprise customers, who expect the software to work like Google web search out of the box. Google does not expose its quality assurance tools and methods to end users. Many enterprise users expect the Google Search Appliance to magically solve their problems as soon as it is installed. It doesn't take long for them to see that reality is very different.

How does TopicTagger work?

TopicTagger is based on a psychological approach, trying to understand what users have in mind when they type a particular query against a particular set of documents. There is no single special algorithm, because there is no single way that people judge how valuable a document is to them.

Like Google, TopicTagger tries to connect the true desires of end users to the underlying technology of your search engine. As with Google, the key is to provide the correct input data in a way the engine can use effectively. But instead of hiring thousands of engineers, you can get great results for a small monthly fee.

TopicTagger automates the key parts of a consultative process that is usually time-consuming and expensive. In many cases, this process isn't performed at all, because it's judged too expensive. With TopicTagger, it becomes radically cost-effective to make your search engine work correctly for end users.

Ok, how does it really work?

As the name implies, the most important thing TopicTagger does is apply tags. These are the terms that your end users care the most about. TopicTagger helps the search engine understand which documents are actually good responses to these search terms. The tags are then incorporated into the search index via your normal indexing process.
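To make the idea of incorporating tags into the index concrete, here is a minimal sketch in Python. It is not TopicTagger's actual method or API, and it does not use any real search engine. The document layout, the function names (build_index, search) and the weights are all assumptions chosen for illustration. It only shows why a tag stored alongside a document can help the engine rank that document higher for the matching search term.

```python
from collections import defaultdict

# Hypothetical weights: a tag match counts more than a word in the body.
TAG_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def build_index(docs):
    """Map each term to {doc_id: score}, scoring tag matches above body matches."""
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, doc in docs.items():
        for word in doc["body"].lower().split():
            index[word][doc_id] += BODY_WEIGHT
        for tag in doc["tags"]:
            index[tag.lower()][doc_id] += TAG_WEIGHT
    return index


def search(index, term):
    """Return doc ids for a single term, best score first (ties broken by id)."""
    hits = index.get(term.lower(), {})
    return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))


# Illustrative documents; the tags stand in for what a tagging step might supply.
docs = {
    "a": {"body": "Quarterly budget review mentions travel once", "tags": ["budget"]},
    "b": {"body": "Travel policy and travel booking rules", "tags": ["travel"]},
}

index = build_index(docs)
results = search(index, "travel")
```

In this toy example both documents mention "travel", but the one tagged "travel" ranks first. A real engine such as Lucene would typically achieve a similar effect by indexing the tags as a separate field of each document.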

Unlike human taggers, TopicTagger is extremely consistent in its labeling, and the tags are geared to maximize the effectiveness of search and browsing. TopicTagger can give far better results for a fraction of the cost of human tagging.

That sounds interesting, but I mainly use tags to help users browse, not search.

Search and browsing are tightly related. TopicTagger helps with both.

When you make a browse page for a topic X, in a sense you are making the page a user would most like to see when they search on X. If you put a prominent link in your navigation to topic X, then users will tend to click on it rather than search. If they don't see what they want in the navigation, then they'll either click the closest topic they see, or they'll enter a search and try to get there directly.

In practice, it is hard to keep lots of topic pages manually consistent and up to date. Editors can be inconsistent in which tags they apply, and how many they put on each article. When you add a new tag, it is a lot of work to manually go back and tag relevant content. When you import content in bulk, it can be very expensive and time-consuming to tag it all by hand. You can use TopicTagger to automatically create or supplement topic pages in real time, at a fraction of the expense.
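As a rough illustration of how consistently tagged content can feed topic pages, the sketch below groups articles by tag and lists the newest first. It is an assumption-laden example, not a description of how TopicTagger builds pages: the article fields (title, published, tags) and the function name build_topic_pages are hypothetical.

```python
from collections import defaultdict


def build_topic_pages(articles, max_items=10):
    """Group article titles by tag, newest first, keeping at most max_items per topic."""
    by_tag = defaultdict(list)
    for article in articles:
        for tag in article["tags"]:
            by_tag[tag].append(article)
    pages = {}
    for tag, items in by_tag.items():
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        newest_first = sorted(items, key=lambda a: a["published"], reverse=True)
        pages[tag] = [a["title"] for a in newest_first[:max_items]]
    return pages


# Illustrative data only.
articles = [
    {"title": "Budget basics", "published": "2023-01-10", "tags": ["finance"]},
    {"title": "Travel expenses", "published": "2023-03-02", "tags": ["finance", "travel"]},
]

topic_pages = build_topic_pages(articles)
```

Because the pages are derived from the tags, adding a new tag or importing content in bulk only requires re-running the grouping step, rather than editing each topic page by hand.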

Content creators usually have more work than they can handle – they have other things to do than study long lists of tags, and then think long and hard about which should be applied to a given article. TopicTagger frees up your editors for more useful work. It gives consistent results that work well with your search engine and enhance end-user browsing.

Can you help me with SEO and SEM?

Yes! The point of driving users to your site is to keep them there. TopicTagger is a great tool to create or supplement landing pages that absorb traffic from search engines. You only get one chance to make a first impression. With TopicTagger, you can be sure that the user's first experience will be a page with all the most relevant, up-to-date content for the search that brought them to your site.