The Deep Web
Is there life after Google?
Introduction
As a basic requirement for the work we do, we've been researching
the web and organizing data since our inception. In 2004 we needed
to compile grantmaker data for our client Amara Conservation, and
in the process realized we're pretty good at locating information
via the web, and should perhaps offer web research as a separate
service. This led to more research, and in March 2005 we decided
to commit to the task.
Overview
When we say "finding information via the web", we're not
simply referring to using search engines like Google to find information.
We're referring more specifically to the "Deep Web". As
of this writing, Google claims to have indexed about 8 billion pages
and MSN 5 billion pages. Yahoo doesn't officially announce
such figures, but says its index is "competitive" with
them. What isn't often discussed are two very important
factors:
1.) Page & Directory Index Depth
2.) Content that can't easily be "spidered"
Although a major search engine may have indexed
8 billion pages, what may be more important is how deeply it spiders
a site. We know for a fact that Google in particular "likes"
top-level pages, and may take several passes before it finishes
spidering even one directory level deep.
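To make "index depth" concrete, here is a minimal Python sketch that counts how many directory levels below the site root a URL sits. The function name `directory_depth` and the example.org URLs are our own illustrative assumptions, and the rule that a path without a trailing slash ends in a page rather than a directory is a simplification. It says nothing about how any search engine actually measures depth.

```python
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Return how many directory levels below the site root a URL sits.

    Assumption: a path that does not end in "/" ends in a page (file),
    so its last segment is not counted as a directory.
    """
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    if segments and not path.endswith("/"):
        segments = segments[:-1]  # drop the page name, keep directories
    return len(segments)


# Hypothetical URLs, used only for illustration.
for url in (
    "http://example.org/index.html",
    "http://example.org/grants/list.html",
    "http://example.org/grants/2005/africa/",
):
    print(directory_depth(url), url)
```

Comparing the depths of the pages a search engine returns for a site against the depths of the pages the site really has gives a rough sense of how far down the spider has gone.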
More importantly, a great deal of data that is
accessible by other means on the web - specifically databases whose
contents appear only in dynamically generated pages when specific
data is requested - may not be indexed by search engines at all.
This is where humans become relevant once again, and why we're laying
groundwork for a human-edited directory focusing on specialized
content.
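As a rough illustration of what "dynamically-generated" can look like from the outside, the Python sketch below flags URLs that carry a query string or end in a common server-side script extension. The function name `looks_dynamic`, the extension list, and the example.gov URLs are all our own assumptions. This is only a heuristic: many dynamic pages have plain-looking addresses, and a flagged page may still be indexed.

```python
from urllib.parse import urlparse

# Assumed list of script extensions; not exhaustive.
DYNAMIC_EXTENSIONS = (".php", ".asp", ".aspx", ".cgi", ".jsp", ".cfm")


def looks_dynamic(url: str) -> bool:
    """Heuristically guess whether a URL points to a dynamically generated page."""
    parts = urlparse(url)
    if parts.query:  # e.g. "?id=42" usually means data was requested
        return True
    return parts.path.lower().endswith(DYNAMIC_EXTENSIONS)


# Hypothetical URLs, used only for illustration.
for url in (
    "http://example.gov/about.html",
    "http://example.gov/records/search.cgi",
    "http://example.gov/grants?state=CA&year=2005",
):
    print(looks_dynamic(url), url)
```

A human editor could use a check like this to shortlist database-backed pages worth reviewing by hand for a directory listing.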
Summary
We're still formalizing our mission and assembling volunteer editors
for our project. So far we've focused on grantmaking sources and
deep content on U.S. government web sites. We'll continue to post
updates here as our plans move ahead.
If you're interested in our services or in becoming
a volunteer editor or researcher, please contact
us for details.
Last Updated: April 2005