Saturday, November 20, 2010

Unit 11 Reading Notes: Web Search & OAI Protocol

Here are some notes on a couple of the articles assigned for Unit 11. Enjoy! :D

Current Developments and Future Trends~
  • The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has been accepted and used by many since its inception in 2001.
  • This was created to federate access to electronic archives via metadata harvesting.
  • Also, this model has been shown to be potentially of great use to a wide and diverse range of communities.
  • In this sense, it has not been shown to be limited or exclusive.
  • At the time of the article's publication, there were over 300 organizations providing data from a wide range of domains. *Emphasis here on diverse data providers*
  • Throughout the article: focus on efficient dissemination of information and knowledge through commonly understood terms and standards.
  • Another great and very helpful aspect of the protocol: it gives access to areas of the Web that are not easy to navigate or reach via search engines (good example: information stored within databases can be next to impossible to find using a traditional search engine).
  • There are many communities within the OAI that focus on specific areas.  In many ways, this recalls the structure of the Dublin Core.
  • As with so many growing initiatives, it has become increasingly difficult for providers to use the many available repositories effectively.
  • The research group is working on several new initiatives, such as: creating a "harvest bag" component, which would essentially allow users to create their own list or collection of repositories that they feel would be useful; making the registry's data available in ways that are more useful for machine processing (they did this by making the registry an actual OAI repository); further automating maintenance of the registry; improving search and discovery with collection-specific descriptions of repositories; and developing more collection-specific metadata.
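
To make the "metadata harvesting" idea in the notes above a bit more concrete, here is a minimal sketch of what a harvester does: it sends a request with a protocol verb such as ListRecords to a repository's base URL, then reads Dublin Core fields out of the XML that comes back. The repository address (repository.example.org), the helper names, and the sample response are all made up for illustration; only the verb, the metadataPrefix and set parameters, and the XML namespaces come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses carrying Dublin Core records.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL (hypothetical helper, not from the article)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set within the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Pull every dc:title out of a harvested response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iterfind(".//dc:title", OAI_NS)]


# Hand-written, heavily trimmed stand-in for a real repository response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# The base URL is a placeholder, not a real repository.
url = build_list_records_url("http://repository.example.org/oai")
titles = extract_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

A real harvester would fetch the URL over HTTP and keep following resumption tokens until the repository has no more records to send; that part is left out here to keep the sketch short.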

The Deep Web: Surfacing Hidden Value~
  • This article provides some really great insight into the world of e-publishing. *Highly useful for those who are trying to get a better understanding of how the Web is structured/designed.*
  • Like net fishing--we skim the surface of what's available on the Web when we are searching for information.  Yes, there is much to be found using this strategy.   In reality, though, some of the very best information is hidden deep in the Web, and therefore many do not have access to certain knowledge because of this barrier.
  • There's a definite need to dig deeper and find new information for patrons.
  • Deep Web--quite different from surface Web. The distinction is quite evident in this article.
  • The statistics about search engines were interesting and helped to contextualize the information.
  • Not surprisingly, Google currently has the largest number of indexed items.
  • Direct crawling and indexing are replacing random link chain clicking for a more efficient approach. Essential point/goal: documents that are cross-referenced more frequently take priority over other documents when it comes to results pages and crawling.
  • Interesting information and background on Deep Web analysis and characteristics.
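
The point above about frequently cross-referenced documents taking priority can be illustrated with a toy example. This is only a rough sketch that counts incoming links; it is not the ranking method of Google or of any engine described in the article, and the page names and the rank_by_inlinks function are invented for illustration.

```python
from collections import Counter


def rank_by_inlinks(links):
    """Order pages by how many other pages link to them (toy illustration)."""
    # Start every known page at zero so unlinked pages still appear.
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # count each linking page only once
            if target != source:     # ignore self-links
                counts[target] += 1
    # Most-referenced first; ties broken alphabetically for a stable order.
    return sorted(counts, key=lambda page: (-counts[page], page))


# Hypothetical mini-Web: each page maps to the pages it links to.
LINKS = {
    "a.html": ["c.html"],
    "b.html": ["c.html", "a.html"],
    "c.html": [],
    "d.html": ["c.html"],
}

ranking = rank_by_inlinks(LINKS)
print(ranking)
```

Pages that nothing links to, such as content generated from a database query, end up at the bottom of a ranking like this or never get discovered by the crawler at all, which is the deep Web problem the article describes.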

*Good synthesizing quote from the article:

"There is tremendous value that resides deeper than this surface content.  The information is there, but it is hiding beneath the surface of the Web."   
