Blog Search Engine – an overview | ScienceDirect Topics

Blogscope – full-featured blog searching from Canada

The blog search engine and visualization tool Blogscope is the result of an ongoing research project at the University of Toronto. Several PhD students are contributing to the project, which is headed by Professor Nick Koudas. Currently Blogscope tracks over 40 million blogs and has indexed in the region of 1.2 billion blog posts. The aim of the project is to develop technology for extracting public opinion on a range of topics from what bloggers write. The visualization component plots bursts of interest in the blogosphere along a timeline in a way that resembles BlogPulse. Widgets are available for embedding Blogscope comparisons and trend graphs on your own website. The publicly accessible implementation of Blogscope, which you can use for free, is just a limited preview of the project’s search engine technology.

The public version of Blogscope has lots of advanced search options. Searching is normally done from the search box in the upper right corner of the screen. Using this you’re limited to entering keywords and choosing how results should be ranked. However, by clicking ‘Options’ and then ‘Advanced options’ you get a pop-up window where you can construct more complex queries. Below the simple search box there’s a link to a Boolean search constructor page where you can type words or phrases that you want to include in or exclude from the search. You can also search by blog-hosting services, such as Blogspot, Livejournal, Live Spaces and WordPress – you can type in any domain name you want.
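
To make the include/exclude and hosting-service options concrete, here is a minimal sketch in Python of how such a filter could behave. It is not Blogscope's implementation or API: the `Post` class, the `matches` function and the sample URLs are all hypothetical, and simple substring matching stands in for a real index lookup.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Post:
    url: str
    text: str


def matches(post, include=(), exclude=(), hosts=()):
    """Return True if the post satisfies a simple include/exclude query.

    Hypothetical helper for illustration only; not Blogscope's API.
    """
    text = post.text.lower()
    # Every included word or phrase must appear in the post text.
    if not all(term.lower() in text for term in include):
        return False
    # No excluded word or phrase may appear.
    if any(term.lower() in text for term in exclude):
        return False
    # If hosting domains are given, the post's host must belong to one of them.
    if hosts:
        host = (urlparse(post.url).hostname or "").lower()
        if not any(host == h or host.endswith("." + h) for h in hosts):
            return False
    return True


# Made-up sample posts on three hosting services.
posts = [
    Post("http://alice.blogspot.com/2009/01/a.html", "Open source search engines compared"),
    Post("http://bob.wordpress.com/2009/01/b.html", "Search engines and spam"),
    Post("http://carol.livejournal.com/1.html", "Gardening notes"),
]

hits = [
    p.url
    for p in posts
    if matches(p, include=["search engines"], exclude=["spam"], hosts=["blogspot.com"])
]
print(hits)
```

With these sample posts, only the Blogspot post survives: the WordPress post contains the excluded word and the LiveJournal post lacks the required phrase.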

The standard ranking or scoring gives some preference to more recent posts. You can boost the impact of recency on ranking or make it the sole ranking criterion. Two other ranking options are by relevance alone, as calculated by the search engine, or by the influence of the blogs. You can also combine recency and influence (Figure 6.7). This composite scoring principle is called ‘Enhanced’ in the Advanced Search form. You may want to experiment a bit with these settings to find what best suits your type of query. The next thing to consider is which index to use. There are three: blog standard, blog stemmed and news standard. If you use the index called blog stemmed, your search word will be reduced to its root form and then expanded to all derivations of that root; this works only with English words.

Figure 6.7. Trying out different scoring options in Blogscope
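
The text does not say how Blogscope computes its composite score, so the following Python sketch only illustrates the general idea of blending recency and influence. The exponential decay, the 30-day half-life, the equal weights and the function names are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def recency_score(posted, now, half_life_days=30.0):
    """Exponential decay: 1.0 for a brand-new post, 0.5 after one half-life.

    The half-life value is an assumption chosen for illustration.
    """
    age_days = max((now - posted).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def combined_score(relevance, influence, posted, now, w_recency=0.5, w_influence=0.5):
    """Scale text relevance by a weighted blend of recency and blog influence.

    `relevance` and `influence` are assumed to lie in [0, 1]; the weights
    are hypothetical tuning knobs, not documented Blogscope settings.
    """
    blend = w_recency * recency_score(posted, now) + w_influence * influence
    return relevance * blend


now = datetime(2009, 3, 31)
# A fresh post on a little-known blog versus a month-old post on an influential one.
fresh_minor = combined_score(0.8, influence=0.1, posted=now - timedelta(days=1), now=now)
old_major = combined_score(0.8, influence=0.9, posted=now - timedelta(days=30), now=now)
print(fresh_minor, old_major)
```

Setting `w_influence` to zero gives a ranking driven purely by recency, which loosely corresponds to making recency the sole criterion; changing the weights is the code equivalent of experimenting with the scoring options in the form.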

Some other advanced search options are related to the time-span that you’re interested in. You can set a start date and an end date for your search, or specify maximum and minimum ages for the posts you want to see. Blogscope’s geographical awareness is evident in the next few options, which allow you to choose countries, provinces and even cities. The countries search box will guess as you start typing, so you don’t have to type many characters before your country is suggested. The most exotic search parameter in the advanced options is by gender. Naturally, only two choices are offered – but we think these should be sufficient. How well it works is another question.
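
The two ways of limiting the time-span are equivalent: a minimum and maximum post age can always be translated into a start and end date. A small Python sketch of that conversion follows; the function name and the use of whole days are assumptions, since the text does not say which units Blogscope accepts.

```python
from datetime import date, timedelta


def age_window(today, min_age_days, max_age_days):
    """Convert minimum/maximum post ages into an inclusive (start, end) date range."""
    if min_age_days > max_age_days:
        raise ValueError("minimum age must not exceed maximum age")
    start = today - timedelta(days=max_age_days)  # oldest post allowed
    end = today - timedelta(days=min_age_days)    # newest post allowed
    return start, end


# Posts between one week and thirty days old, as seen on 31 March 2009.
start, end = age_window(date(2009, 3, 31), min_age_days=7, max_age_days=30)
print(start, end)
```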

You will find several interesting things in the results lists. For example, an icon indicates whether a post includes images, audio or video. To the right is a list of Related Terms. Three different types of search can be performed by clicking on a related term. The default is to add the related term to your search and use it as a filter within the results list. You can also use the related term instead of your own and search all documents in the index. The third option is to compare the popularity of your original search term and the related term.

Underneath the related terms is the popularity curve, which is a trend graph for your search query spanning the last two months. When you click on it you get a more detailed view and you can also choose other time spans. You do this either interactively or by choosing links for the last 3 or 6 months. Blogscope will also generate images of the popularity curve in sizes up to 600 × 400 pixels that you can embed in your own website. The necessary HTML code is provided in a window to the left. To work interactively with the graphs, position the cursor over the region you want to select and then click to get a results list for the selected time-span. This enables you to identify why bursts of interest occurred in the blogosphere during the selected time. Another great feature is that you can include another search query in the graph and make quick comparisons between query results.
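
A popularity curve of this kind is essentially a count of matching posts per day, and a burst is a day on which the count rises well above its usual level. The Python sketch below shows one simple way to find such days, flagging any day whose count exceeds the mean by more than two standard deviations. This rule, the function names and the sample data are assumptions for illustration; the text does not describe the method Blogscope itself uses.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, pstdev


def daily_counts(post_dates):
    """Count matching posts per calendar day."""
    return Counter(post_dates)


def bursts(counts, start, end, threshold=2.0):
    """Return the days in [start, end] whose count exceeds mean + threshold * stdev."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    series = [counts.get(d, 0) for d in days]
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []  # a perfectly flat curve has no bursts
    return [d for d, c in zip(days, series) if c > mu + threshold * sigma]


# Made-up data: one matching post per day for ten days, plus a spike on 5 January.
post_dates = [date(2009, 1, d) for d in range(1, 11)]
post_dates += [date(2009, 1, 5)] * 19

counts = daily_counts(post_dates)
burst_days = bursts(counts, date(2009, 1, 1), date(2009, 1, 10))
print(burst_days)
```

Selecting a flagged day and listing the posts from that day is the programmatic counterpart of clicking a peak on the graph to see what caused it.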

The outstanding feature of the results lists is the facility to read the posts in small preview windows. These are based on Blogscope’s cached copies and they eliminate the need to click through to the actual blog. Previews of web pages have been used by a handful of web search engines, but never quite so elegantly as this. The reason why this can be done so well, of course, is that blog post texts are stable, readily available and easily handled by a search engine. The feature is so useful that it makes you wonder why every blog search engine doesn’t use it.