Searching for Everything

Google’s mission in life — apart from making tons of money, becoming a verb, and (mostly) refraining from things evil — appears to be to provide the greatest amount of information possible to the greatest number of people.

Fine idea. And Google’s certainly doing a mighty fine job as far as the information on the web goes. But what about the great gobs of information that exist outside of the web itself? Some of this information sits on dusty library shelves, not yet in electronic form, and Google has, in fact, embarked on a very ambitious project to digitize entire college libraries (and copyright be damned, I say).

But there’s also a whole universe of information out there that already is in electronic format, and that already is accessible via the internet. But none of it is — yet — available through a Google search. If I had to venture a guess (and it appears that, yes, I have to!), I’d say these missing pieces of electronica are probably as large as the current, Google-searchable, piece of the internet.

And not only are they large…they are mostly high-quality stuff. Materials that are consciously archived are generally deemed worthy of the effort precisely because they represent high-quality information (searching the net is a blast, but we all know that there’s a lot of garbáge out there). But that high-quality-but-invisible information isn’t showing up when you Google for it. Information like…

PACER. Public Access to Court Electronic Records is the US government’s effort at bringing the federal courts into the electronic age. They have semi-succeeded. PACER is an enormous dataset of federal court records that includes not only opinions, but also the extensive, often arcane filings that go into making up a case docket. It’s a combination of pointers to records along with actual documents themselves. It’s a mixed-up, unwieldy, hit-and-miss, and vast collection of information. It needs to be Googlized. There is a lot of other court information available at some federal courts that aren’t part of PACER, and at state and local courts as well, along with many court systems in other countries. All, all, all should be Googlized.

And while we’re on the topic of courts, there’s also…

Lexis-Nexis. How big is Lexis-Nexis? Can you say exabytes? OK, maybe not that large (yet)…but it’s well on its way. Lex-Nex is another source of court cases (including a lot of historical material) but it’s also much more: full-text newspapers, magazines, directories, professional forms, credit reports, public filings, attorney general opinions, and stuff I’m sure I haven’t discovered yet.

I have mixed feelings about linking Lex-Nex and Google. Google is free, Lex-Nex isn’t, and that can make for an unhappy marriage. But the content that Lex-Nex has to offer is so compelling that it’s probably worth exploring opportunities. After all, Google already makes available snippets of information from subscription sources that you have to pay for if you want the full text. The bulk of Google Scholar materials seems to fall into this territory.

Internet Archive. How can you not love the Wayback Machine? In spirit, this is the undertaking most like Google itself. In practice, though, it’s badly in need of a system upgrade. If Google would take Wayback by the hand, meld it with its own vast collection of archived pages, make the whole thing searchable, and basically lead Wayback into the light, then Oh, What a Wonderful World This Would Be.

Newspapers. The first rough draft of history is available online back at least to the 1700s, in multiple archives, and from a variety of nations. In addition to mainstream publications like the New York Times, there are a host of others that offer important insights into corners of world history, like the Baltimore Afro-American, the Johannesburg Sunday Times, or the Sydney Morning Herald.

I haven’t even mentioned patents, copyrights, trademarks, corporate annual reports (a particularly easy one!), and other electronic materials that are already available, though not search-engine-available.

Wouldn’t adding all this to a Google search just hopelessly clutter up the results page? It could, but it doesn’t have to. A search that recognizes that there is a lot of highly relevant material in, say, a court case, could simply ask a question like “Do you want to see these results with court cases included?”

Well…do you?

pafalafaga aka David Sarokin

One Response to “Searching for Everything”

  1. eiffel says:

    Google goes part-way there with patents. Try this search:

    patent 5123123

    The catch, of course, is that you need to already know what you want.

    If nothing else, this example shows that a user-friendly interface exists for this kind of search.