Simmons College
LS 434:  Medical Librarianship
Spring 2002


Using the WWW to develop a multi-functional library website
Instructor
Mary McKeon, MSLS
Information Services Librarian & Head of Circulation
617-638-4253, mamckeon@bu.edu
Class goals
Examine the Alumni Medical Library’s multi-functional website
Improve information retrieval skills using the WWW
Practice search strategies using general search engines, web directories, and meta-search engines
Use established ‘virtual library’ collections as an alternative strategy for locating resources on the WWW
Evaluate Web sites for validity, source, content, and currency of information
Functions of the Alumni Medical Library’s Website
The website serves as an electronic representation of the physical library
Expands access to library services & resources from remote locations
Supports reference activities
Supports user education/curriculum support activities
Supports outreach grants
Serves as a department newsletter
World Wide Web
Created in 1989 by Tim Berners-Lee and his colleagues at CERN, a physics laboratory in Switzerland
Their goal was to provide shared documents and graphics more easily on the Internet
World Wide Web
Internet services and protocols include the World Wide Web, Gopher, FTP, HTTP, telnet, electronic mail, etc.
Hypertext transfer protocol (HTTP) allows for text AND hypermedia
Documents are usually written in hypertext markup language (HTML)
Data is requested by a client and provided by a server -- any computer can be a client and/or server
What are WWW browsers?
Browsers navigate the WWW
Browsers are simple to use because one user application communicates with many servers, and one server can support many user interfaces
Mosaic was released in 1993 by the National Center for Supercomputing Applications (NCSA) and became the first widely used graphical WWW browser
Netscape
Internet Explorer
URL -- Uniform Resource Locator
URL Domain Names
Finding Information on the Web:
About Search Engines
Acknowledgements:
Search Engine Watch   http://searchenginewatch.com/
The WWW is currently estimated to contain tens of millions of documents
How does one go about sifting through all of these documents to find information that’s relevant to the question or topic?
Trends & impact
How do you find WWW pages/sites?
other web pages 88%
search engines 82%
Internet directories (e.g., Yahoo) 65%
print media 62%
friends 58%
email signatures 33%
TV advertising 37%
Use health/medical print resources to identify websites & resources
Professional journals such as the Bulletin of the Medical Library Association
Professional news sources such as College & Research Libraries News
Specialty publications such as Medicine on the Net
Search MEDLINE for medical journal articles identifying and evaluating topical websites
What is a search engine?
The term "search engine" is often used generically to describe both search engines and Web directories.
They are not the same.  The difference is how listings are compiled.
True search engines
Directories
Hybrid search engines
Meta search engines
True search engines
The listings of “true” search engines are compiled by machines. Databases are built by “robots” (also called “spiders”), computer programs that roam the WWW finding sites new to their home database, updating old ones and deleting obsolete sites.
A spider visits a web page, reads it, and then follows links to other pages within the site (this process is called being "spidered" or "crawled")
The spider returns to the site on a regular basis to look for changes, and this may affect how sites are listed and retrieved.
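The crawling process described above can be sketched in a few lines. This is only a toy illustration: the miniature "web" (TOY_WEB), its page names, and the crawl function are hypothetical and stand in for real pages fetched over HTTP. It shows the visit, read, and follow-links cycle, and why a page that nothing links to never reaches the database.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to (text, links to other pages).
TOY_WEB = {
    "/index.html": ("Alumni Medical Library home", ["/hours.html", "/services.html"]),
    "/hours.html": ("Library hours", ["/index.html"]),
    "/services.html": ("Reference and circulation services", ["/hours.html", "/ill.html"]),
    "/ill.html": ("Interlibrary loan request form", []),
    "/orphan.html": ("A page nothing links to", []),
}


def crawl(start, web):
    """Visit pages breadth-first from `start`, following links; return {url: text}."""
    database = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in database or url not in web:
            continue  # already visited, or a dead link
        text, links = web[url]
        database[url] = text   # "read" the page into the engine's database
        queue.extend(links)    # follow its links later
    return database


database = crawl("/index.html", TOY_WEB)
print(sorted(database))
# ['/hours.html', '/ill.html', '/index.html', '/services.html']
```

The orphan page is missing from the result, which is one reason no engine's database covers every page in existence.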
True search engines
Page titles, body copy and other elements play a role in how a page is indexed and retrieved
The software sifts through the millions of site records in the index to find matches to a search request
The software also ranks the retrieved sites by relevancy
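A minimal sketch of the matching step, assuming the engine stores an inverted index (a word-to-pages lookup table). The page records, identifiers, and function names are hypothetical; real engines use far larger and more elaborate structures.

```python
import re
from collections import defaultdict

# Hypothetical site records already gathered by a spider.
PAGES = {
    "lib-home": "Alumni Medical Library homepage",
    "lib-hours": "Library hours and directions",
    "cdc-hiv": "HIV AIDS surveillance reports",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(search(index, "medical library"))  # {'lib-home'}
```

The search never touches the pages themselves, only the index built from them.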
Directories
A directory such as Yahoo! depends on humans for its listings
Site creators submit a description to the directory for the entire site, or editors write one for sites they review
A search looks for matches only in the submitted descriptions
Directories usually have much smaller databases than true or hybrid search engines
Hybrid search engines
(both engine and directory)
These days, almost every search tool is part engine, part directory
Being included in a search engine's directory is usually a combination of luck and quality
Site producers can “submit” their sites for review, but there is no guarantee that they will be included in directories.
Meta search engines
Meta-search engines do not maintain databases of their own; they act as “middle agents”
Transmit your search query simultaneously to multiple search engines
Search results represent a compilation of results from all engines queried
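The "middle agent" idea can be sketched as follows. The three engine functions and their example.org results are hypothetical stand-ins for real search services, and the merge rule (more engines first, then best position) is just one plausible choice, not how any particular meta-search engine works.

```python
# Hypothetical engines: each returns its own ranked list of URLs for a query.
def engine_a(query):
    return ["http://example.org/a", "http://example.org/b", "http://example.org/c"]


def engine_b(query):
    return ["http://example.org/b", "http://example.org/d"]


def engine_c(query):
    return ["http://example.org/b", "http://example.org/a"]


def meta_search(query, engines):
    """Send the query to every engine and compile one de-duplicated list."""
    votes = {}      # url -> number of engines that returned it
    best_rank = {}  # url -> best (lowest) position in any engine's list
    for engine in engines:
        for rank, url in enumerate(engine(query)):
            votes[url] = votes.get(url, 0) + 1
            best_rank[url] = min(rank, best_rank.get(url, rank))
    # URLs found by more engines come first; ties go to the better position.
    return sorted(votes, key=lambda u: (-votes[u], best_rank[u], u))


merged = meta_search("alumni medical library", [engine_a, engine_b, engine_c])
for url in merged:
    print(url)
```

Note that the meta-search function holds no database of its own; everything it returns comes from the engines it queries.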
Meta search engines
Useful for saving time by searching multiple engines at once
Useful for obtaining an overview of “what’s out there”
Beware:  if you enter a complex search strategy, not all of the engines searched may be able to interpret it
Try these meta-search engines:
MetaCrawler
Inference Find
Metafind
Ask Jeeves
Why do search results vary from engine to engine?
A search engine searches the contents of its database -- not the World Wide Web directly
None of these databases includes all the WWW pages in existence, so results vary
Each database has different features
Why do search results vary from engine to engine?
Some Web search databases are maintained with little human evaluation (true search engines)
In others, sites are hand-picked and evaluated or reviewed (directories)
Some search tools do both (hybrid search engines)
Search tools vary in features, size and comprehensiveness
How do search engines rank relevancy?

Location of search term
Title words are assumed to be most relevant
Keywords appearing near the top of a web page are assumed to be relevant
This assumes that any page relevant to the search term will mention that term right from the beginning
How do search engines rank relevancy?

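Frequency of search term
Pages in which the search term appears more often, relative to the other words on the page, are assumed to be more relevant
Because each engine gathers and indexes pages differently, no two search engines have exactly the same collection (database) of web pages to search
Search engines may also give a web page a relevancy boost if it has a lot of links pointing to it or if it has been favorably reviewed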
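The location, frequency, and link factors above can be combined into a single score. The sketch below is an assumption-laden illustration: the weights (3.0, 2.0, 10.0, 0.1), the 25-word "top of page" cutoff, and the function names are all invented for the example, and real engines do not publish their formulas.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevancy(term, title, body, inbound_links=0, top_n=25):
    """Score one page for one search term (all weights are arbitrary)."""
    term = term.lower()
    words = tokenize(body)
    score = 0.0
    if term in tokenize(title):
        score += 3.0                                     # location: in the title
    if term in words[:top_n]:
        score += 2.0                                     # location: near the top
    if words:
        score += 10.0 * words.count(term) / len(words)   # frequency
    score += 0.1 * inbound_links                         # link popularity boost
    return score


focused = relevancy(
    "medline",
    "Searching MEDLINE",
    "MEDLINE is a bibliographic database of journal articles",
)
passing = relevancy(
    "medline",
    "Library news",
    "Our newsletter lists new books and mentions MEDLINE once at the end",
)
print(focused > passing)  # True
```

Changing any weight changes the ordering, which is one way to see why the same query ranks differently from engine to engine.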
Let’s take a look at how some of these search engines work...
Example #1:

Searching for the Alumni Medical Library’s homepage in HotBot
(a search engine with an associated directory)
What do these search results reveal about the search engine?
HotBot searches all the pages on a particular site (not just the main page or the actual homepage)
HotBot uses the first few words of the page for its descriptions
HotBot’s database is probably compiled from the first few words (usually the first 100) of any page’s text, which means that if the search terms are 150 words into the page, that particular page won’t be retrieved
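The consequence of indexing only the start of a page can be shown with a short sketch. The 100-word limit is the guess made above about HotBot, not documented behavior, and the function names and sample page are hypothetical.

```python
def indexed_words(page_text, limit=100):
    """Return the set of words an engine would index if it reads only the
    first `limit` words of a page (the limit is an assumption)."""
    return set(page_text.lower().split()[:limit])


def is_retrieved(page_text, term, limit=100):
    """True if `term` falls inside the indexed portion of the page."""
    return term.lower() in indexed_words(page_text, limit)


# A page whose only mention of "library" is its 150th word.
page = " ".join(["welcome"] * 149) + " library"

print(is_retrieved(page, "library"))             # False
print(is_retrieved(page, "library", limit=200))  # True
```

A page author who wants the page found would therefore put the key terms near the top.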
What do these search results reveal about the search engine?
HotBot does not appear to search headings or graphical text (if it did, it would have retrieved the library’s homepage)
Example #2:

Searching for the Alumni Medical Library’s homepage in Yahoo
(a Web directory)
The point of all this is…
It’s important to understand what the search engine is actually doing.
It’s important to recognize that no two engines work exactly alike.
The more you know about how a search engine works, the better able you will be to manipulate it to its fullest advantage.
But, most search engines don’t readily explain what and how they are searching.
Using Search Tools Effectively
DO become familiar with one or two favorite search tools and learn to use their advanced features
DO enter singular terms -- many search engines will find substrings:
searching for game will usually retrieve games too
DO NOT expect these features to replicate the kind of precision you’d find within a bibliographic database
DO use collections that have been organized and quality-filtered by libraries & other organizations
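Substring matching, and its cost in precision, can be illustrated with a small sketch. The list of titles and both function names are hypothetical; the point is only the contrast between substring and whole-word matching.

```python
import re

# Hypothetical page titles in an engine's database.
TITLES = [
    "Board games for kids",
    "Game theory primer",
    "Gamete biology",
    "Chess openings",
]


def substring_search(term, titles):
    """Match the term anywhere inside a title, as many engines do."""
    term = term.lower()
    return [t for t in titles if term in t.lower()]


def whole_word_search(term, titles):
    """Match the term only as a complete word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [t for t in titles if pattern.search(t)]


print(substring_search("game", titles=TITLES))
# ['Board games for kids', 'Game theory primer', 'Gamete biology']
print(substring_search("games", titles=TITLES))
# ['Board games for kids']
print(whole_word_search("game", titles=TITLES))
# ['Game theory primer']
```

The singular term retrieves the plural too, but it also pulls in the unrelated "Gamete" page, which is the kind of imprecision a bibliographic database with controlled vocabulary avoids.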
Reality check…
WWW is highly unstructured and unorganized:
No thesaurus or controlled vocabulary is used
No indexing process occurs
No standardization in the types of materials that are mounted on the Web
No quality controls or review process when WWW sites are mounted
Each search engine’s database works differently and is developed based on different criteria -- no uniformity regarding what parts of a Web page the engine is searching
Alternatives to using Search Engines
Instead of doing a “cold” search in a search engine, think about the information another way:
First, think about information in terms of category, then find a site that fits that category.
Information need:
HIV/AIDS surveillance reports
Who might produce or distribute that information?
U.S. government agency
Centers for Disease Control
How to approach the search?
Go to the CDC’s Web site and look for “surveillance reports”
Information need:
latest HIV/AIDS treatments
Who might produce or distribute that information?
A variety of different places, depending on your perspective!
Is the information for:
a researcher? a social worker?
an administrator? a patient?
a caregiver? a partner or loved one?
How to approach the search?
Instead of searching for “Ryan White” or “RFP”, search for the “Boston Department of Public Health AIDS Information Service”
Alternatives to using Search Engines
Instead of doing a “cold” search in a search engine, use the virtual libraries compiled by:
libraries
professional organizations and associations
government agencies
city, county, & state agencies
Evaluating Internet Resources



Acknowledgements: 
Jan Alexander & Marsha Tate
Wolfgram Memorial Library
Widener University
Chester, PA
www.science.widener.edu/~withers/evalout.html
Evaluating Internet Resources
Once you find a Web site, you have to determine whether the information is relevant.
What are some of the criteria that you could use to evaluate Internet resources?
Evaluating Internet Resources
Criterion #1:  Content
Accuracy
Disclaimer
Completeness
Evaluating Internet Resources
Criterion #2:  Credibility
A site should display the name & logo of the institution responsible for the information, as well as the names of particular authors.  Disclosing sponsorship can help users assess the motivations of information providers and potential conflicts of interest.
Evaluating Internet Resources
Criterion #3:  Currency
The date of the original document on which the information is based and the date of posting on the Web help users judge timeliness.
Evaluating Internet Resources
Criterion #4:  Site Evaluation
Sites should indicate whether the information provided has been subject to review
Is the site fact-checked or verified in some way?
Is the information accurate and factual?
Or, is the site sponsored by the agency that produces the informational content?
Evaluating Internet Resources
Criterion #5:  Design, Software requirements
Do you find the perfect site only to find that your computer doesn’t have the appropriate software to view/manipulate the site?
Does your browser alter the appearance of the page?
Can you tell whether the software has limited the amount of information on the page?
Does the site have a “text only” version for low-level browsers?
Evaluating Internet Resources
Criterion #6:  Purpose, Target Audience, Point-of-View
The best Web sites are clearly focused on their purpose and target audience
The point-of-view or agenda should be stated or made obvious.
The purpose of the site should be clearly stated, and the information provided should be appropriate to that purpose or mission.
Evaluating Internet Resources
Criterion #7:  Disclosure, Profiling, Confidentiality
Web sites request and use information for purposes of which the user may be unaware.
Users must be informed if any information about them is gathered or used by the Web site.
Evaluating Internet Resources
Criterion #8:  Internal Search Capabilities
An internal search engine with an easy user interface is highly desirable.  It should be capable of keyword or search string searching.
Evaluating Internet Resources
Criterion #9:  Evaluation of Quality of Links
The person(s) responsible for link selection should have the expertise and credentials to critically evaluate the links’ appropriateness.
The site “architecture” or design of pointers to linked sites is important for ease of navigation.
The content of links should be accurate, current, credible, and relevant.  The content of the originating site is enhanced if it includes links to high-quality sites.
Evaluating Internet Resources
Criterion #10:  Style & Functionality
Is the site organized clearly and logically?
Is the site well-written?
Is the site easy to navigate?
Do the links work?
Does the site have an internal search engine?
Evaluating Internet Resources: Challenges
Blurred distinction between advertising and the actual information (infomercials)
Is the advertising provided by the same organization that provides informational content?
Does advertising bias informational content?
“Infomercial” Web sites
Is informational content mixed with entertainment or advertising?
Evaluating Internet Resources: Challenges
Web pages out-of-context
Does a search land you in the middle of a site, so that you don’t know its origin or intended audience?
Always return to the site’s “home” to determine its source.
Evaluating Internet Resources: Challenges
Instability
Does a favorite site disappear or move without notice?
Try to determine the stability of a site before linking to it or becoming reliant on it.
Document the URL, producer or location of the site so that you can locate it later.
Evaluating Internet Resources: Challenges
Site alterations, updates
Does a site suddenly change?
Is information moved around?
Is the site altered without notice?
Is the information archived?
If this is the case, attempt to verify information using other sources.
Evaluating Internet Resources: Challenges
“Teasers” & limited free-of-charge access
Does a site contain only “teasers” -- leading you to think the information is comprehensive when it actually is not?
Does a formerly “free” site suddenly require a fee-based subscription?
Are certain sections or pages of a site restricted to paying customers only?
Evaluating Internet Resources: Challenges
Privacy & confidentiality
Is the information you input about yourself confidential?
Does a site “sell” your email address to advertisers?
Does a site require registration?  If so, how do you determine what is done with the information you’ve provided?
In-Class Exercises