Tuesday, January 18, 2011

A Collection of Links

Today I posted four new links to From the Halls of Devon, and it got me thinking about links and the quality of the sites that are out there. Perhaps I should be more specific: not the quality of the site itself, but the quality of its search engine. One final point before I launch into my reviews: all of these sites are worthwhile. I wouldn't include them if I didn't think so, but some have search engine issues that can significantly reduce their usefulness as resources.


Canadiana

As far as I understand, Canadiana is both a collection of collections and a collection in its own right. It contains magazines, photos, books, and other materials, a large amount of it in the public domain. It is divided into two sections, Early Canadiana Online and a Canadian Discovery Portal.

I have two criticisms of the site. The first is that many of the documents are only available by subscribing to the site. A related criticism is that there is only one subscription option, annual, and the cost is quite high for a site that is essentially a collection with very little "social" component. I suspect that they could significantly increase their subscriber base if they allowed shorter subscription terms or lowered their annual rate. The current pricing certainly discourages casual research, and in my opinion it limits their ability to raise funds for further digitization projects.

The second is that only one of the two parts of the site (Early Canadiana Online) can search for exact phrases. This is a very large drawback when searching a large online collection. Certainly when I am searching for my own surname I don't want every bingo hall, community hall, church hall, banquet hall, halls of power, residence halls, and, well, I'm sure you get the idea, turning up in my search.

The Internet Archive

The Internet Archive is a very large collection of sound, video, text, images, etc., all in the public domain. My personal interest is the text portion of the archive. It has a very limited search engine that searches only the titles of the documents, but you can choose which collections or sub-collections to search. I have often found that the best way to search the Archive is to do an advanced Google search and restrict it to the Internet Archive domain. An added, major advantage of the Internet Archive is that you can download the texts in a number of different formats, including those that can be read on e-readers.
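
For anyone who hasn't tried the Google approach, it amounts to a quoted phrase plus the site: operator. Here is a minimal sketch in Python, assuming the Archive's domain is archive.org. The function name and the example phrase are made up for illustration.

```python
from urllib.parse import urlencode

def site_search_url(phrase, domain="archive.org"):
    """Build a Google search URL for an exact phrase restricted to one domain."""
    # Quotation marks ask for the exact phrase; site: limits results to the domain.
    query = f'"{phrase}" site:{domain}'
    return "https://www.google.com/search?" + urlencode({"q": query})

# Hypothetical example phrase, purely for illustration.
print(site_search_url("Hall family Devon"))
```

Pasting the resulting address into a browser should do the same thing as typing the quoted phrase and site:archive.org into the Google search box by hand.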

The Canada Gazette Archive

The Canada Gazette archive covers 1841 to 1997. As with most Government of Canada websites, it is a little quirky (click the search links on the left of the page), and worse still, the search engine sucks. I don't know what algorithms they use, but they sometimes return the oddest results, in the oddest order. Sadly there is no way to sort the results, and it would be really handy to sort by date. There are other issues too.
  • In addition to the results presentation, there are significant limitations on how you do a search. You cannot specify your own date range. This is all the stranger when you consider that you can specify a single date.
  • Word stemming is automatic, and it does not look like it can be disabled. Word stemming means that in a search "halls" and "hall" are treated the same, i.e. the "s" is ignored, which is not what I want in a search for my surname.
  • When you go back to the main search page to modify your search you have to retype your entire search as the form automatically clears.
  • The results are shown as PDF or GIF, which is fine, but the search terms are not highlighted in the GIF version, meaning you have to read the entire page. In a PDF you have to search again using the PDF search tools to see if the result is one that you want.
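
To make the stemming complaint concrete, here is a rough sketch in Python. The stemmer is a toy that only strips a trailing "s"; it is my own assumption for illustration, not the Gazette's actual algorithm, which I don't know. The sample lines and function names are invented too.

```python
import re

def naive_stem(word):
    """Toy stemmer: lowercase and drop one trailing 's'. Not a real algorithm."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def stemmed_match(term, text):
    """True if any word in text shares a stem with term."""
    target = naive_stem(term)
    return any(naive_stem(w) == target for w in re.findall(r"[A-Za-z]+", text))

def exact_match(term, text):
    """True only if term appears as a whole word, exactly as typed."""
    return re.search(r"\b" + re.escape(term) + r"\b", text) is not None

# Invented sample lines, for illustration only.
lines = [
    "Notice is given by John Halls of Devon",
    "Tenders for repairs to the parish hall",
]

for line in lines:
    print(stemmed_match("Halls", line), exact_match("Halls", line), line)
```

With stemming, a search for "Halls" matches both lines; an exact match keeps only the first, which is the behaviour I would like to be able to choose.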

The site has been redesigned, and I have to say I like what they have done. You can drill down through region and sub-region to see the documents you want. In many cases you can actually view the documents themselves online without ordering the microfilms. Much of their Newfoundland collection is now available with images. The Newfoundland collection is not indexed, but they have grouped the documents by date, location, and occasionally church. The same goes for many other collections, though there are collections that are both indexed and have images of the original documents. Finally, one can choose to search historical records, trees, and the catalogue for books, microfilms, etc.

They seem to have refined their search engine too. The results that come back are much more sensible, and specifying an exact match now actually shows the exact matches, which was not always the case before.
