WebNet - June 3, 1997

This month's meeting featured presentations by Jacqueline Craig of IS&T on UCB web policy, Aron Roberts of IS&T on the Phantom web search engine, and Debra Bartling of EERC on SWISH and other search engines.

Jacqueline Craig

Jacqueline Craig, from User and Account services of IS&T, discussed the UCB draft web policy, available at http://amber.berkeley.edu:5014/. This policy page is not intended to represent an oversight body - UCB has no central authority monitoring web sites. Instead, it tries to draw together existing University policies so that the campus community will be aware of them.

Jacqueline encouraged visitors to follow the links rather than stop at the top page of the site. Links to other campus web policies are included. The linked sections include:

Copyright
This is an area of great contention: copyright law is still emerging, and there is a great deal of disagreement as the federal laws are interpreted differently in different states. Individual providers are responsible for compliance; there is no central authority at UCB to consult. A secondary-level page links to many copyright resources.

Accessibility
Any information that an individual must use on the web must be accessible. In general, all information should be as accessible as possible. WebNet will hold a meeting on accessibility in the future.

Student information disclosure
Disclosure requirements apply wherever students' names may be listed. A link to UC policy on student disclosure provides guidance on this subject.

Responsible authorities
This section has gotten the most feedback. Departments are encouraged to create their own web steering committees to identify preferred standards and aim for consensus via discussion.

Policy violations
Web policy violations may be reported to abuse@uclink. Reports will be referred immediately to whoever can resolve the problem. In the case of a severe problem or risk, the site may need to be shut down. This section of the policy web page will be expanded, with more guidance on problem resolution.

Jacqueline encouraged continued review of this draft policy page. The policy group scheduled a June 11 meeting to make modifications based on suggestions received to date. She then took questions, which included:

Q: How does the web policy apply to pages that are duplicates of other media, such as mailing lists or newsgroups, where the author may not have control of the content?
A: Always have a disclaimer for when people leave the site. Also have the URL showing.

Q: How much of this would apply to an intranet (only w/in a dept)?
A: Anything that is linked through a home page applies. Any web site with .berkeley.edu applies if you give someone that URL. If it's password-protected, then it's private and would probably be considered an internal document.

Q: What kind of control would the dept. have over personal pages?
A: If it's a University resource, it's subject to University regulations. The physical server is a University resource and must be used in support of University activity.

Q: What about sponsorship messages on web site, for companies that help design the site?
A: Must not imply endorsement of the product. Use a disclaimer, make sure it's clear you're not endorsing the company.

Aron Roberts

Aron Roberts, of Distributed Computing Support (formerly WSSG), presented Phantom, a web search engine from Maxum. This product acts as an Internet robot which scans, indexes, and searches. It is written in 4th Dimension and distributed as a runtime application, and requires little system administration experience.

An online slide presentation (generated with Office 97 PowerPoint) of Aron's talk is available. Highlights include:

A full-featured, downloadable 30-day demo is available at http://www.maxum.com/.

Debra Bartling

Debra Bartling, of the Earthquake Engineering Research Center, discussed site and database searching.

Debra began by discussing SWISH, a free search engine which EERC started using about 6 months ago. SWISH includes the following features:

Debra recommended keeping up good site housekeeping (as SWISH will index pages whether there are links to them or not), and using meaningful, descriptive titles for pages.
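The housekeeping advice above can be checked mechanically. The following is a minimal sketch, not part of Debra's talk and not a SWISH feature: it assumes the site is a local directory of .html files, and the names PageScanner and audit_site are hypothetical. It lists pages that no other local page links to (and that a file-based indexer would still pick up), along with pages that have no title.

```python
from html.parser import HTMLParser
from pathlib import Path


class PageScanner(HTMLParser):
    """Collects the <title> text and the href targets of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit_site(root):
    """Return (unlinked_pages, untitled_pages) as lists of relative paths."""
    root = Path(root).resolve()
    pages = sorted(root.rglob("*.html"))
    linked, untitled = set(), []
    for page in pages:
        scanner = PageScanner()
        scanner.feed(page.read_text(encoding="utf-8", errors="replace"))
        if not scanner.title.strip():
            untitled.append(page.relative_to(root).as_posix())
        for href in scanner.links:
            # Follow only relative links to local files; drop fragments and queries.
            target = href.split("#")[0].split("?")[0]
            if not target or "://" in target or target.startswith(("mailto:", "/")):
                continue
            resolved = (page.parent / target).resolve()
            if resolved != page:  # a page linking to itself does not count
                linked.add(resolved)
    unlinked = [p.relative_to(root).as_posix() for p in pages if p not in linked]
    return unlinked, untitled
```

Calling audit_site on the document root returns two lists to review by hand. An entry page that nothing links back to will appear in the first list, so that list is a starting point for cleanup rather than a list of files to delete.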

Unfortunately, SWISH is not working as well for EERC now, because the site is evolving and much of its content is in databases. Since the database content does not exist as HTML pages, no single search button will retrieve everything.

EERC maintains EQIIS, a Sybase database of earthquake damage images. For searching they are using Sybperl (available at http://www.perl.org/CPAN/authors/id/MEWP/sybperl-2.07.tar.gz). Pages are cached so that visitors don't have to start all over if there's a network interruption. EERC has also added the ability to refine a search without actually retrieving any images.

Another EERC project is the Earthquake Engineering Abstracts database. It uses KE Texpress, which has an SQL-like interface. It supports a full complement of regular expressions and complex boolean queries.

EERC keeps a log file of searches. This lets them see where visitors are searching for things EERC does not have, and refer them to more appropriate locations. The log has also shown where visitors are looking for things that EERC does have but that SWISH cannot find because the content is in a database, so EERC has added a page advising visitors to search the database if the SWISH search is unsuccessful.
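As an illustration of this kind of log review, here is a minimal sketch. The format of EERC's log was not described, so the layout assumed here (one search per line: the query text, a tab, then the number of hits) and the function name failed_queries are hypothetical.

```python
from collections import Counter


def failed_queries(log_lines, top=10):
    """Return the most frequent queries that returned zero hits.

    Assumes a hypothetical log format of one search per line:
    query text, a tab, then the number of hits returned.
    """
    misses = Counter()
    for line in log_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        query, _, hits = line.rpartition("\t")
        if not query or not hits.strip().isdigit():
            continue  # skip malformed lines
        if int(hits) == 0:
            # Normalize case and spacing so repeated queries are counted together.
            misses[" ".join(query.lower().split())] += 1
    return misses.most_common(top)
```

Passing an open log file (or any iterable of lines) to failed_queries gives a ranked list of searches that found nothing, which is the raw material for deciding where a referral or an explanatory page would help.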

We then had another question and answer session, including the following:

Q: How much do the products cost?
A: SWISH is free, and can be downloaded from ftp://sunsite.berkeley.edu/pub/swish-e/ and http://sunsite.berkeley.edu/SWISH-E/ (prototype web site). Cost information for Sybase and KE Texpress can be found on their web pages (http://www.sybase.com/ and http://www.kesoftware.com/).

Q: Harvest is really hard to configure and install - how about SWISH?
A: Very easy. Did it in less than a morning. SWISH is only for local sites though.

Q: Inktomi was developed here at UCB; would there be any way we could use that for campus?
A: Inktomi has been commercially licensed (HotBot). Inktomi is not currently available - it's not a licensing restriction. We're not automatically entitled to the commercial derivatives of the research code. Ask the CS dept?

Q: When using AltaVista to search UCB sites, host:berkeley.edu will find only Berkeley sites. But you have to use their results page, so you end up at AltaVista even if you build your own front end. It is also hard to keep up to date; you don't know when they index.
A: UCB is thinking of getting AltaVista for UCB home page, but it's very expensive; in the tens of thousands. But maybe there is an educational discount.


Last modified June 16, 1997 by Julie Bernstein.