Archive for the 'Search Engines' Category

Google as predicted in 1964

Monday, June 25th, 2007

I do enjoy looking at old predictions of the future. Eventually, the future arrives and we can compare it with the predictions.

Sometimes, the predictions are better than the reality. Sometimes, reality outpaces not only the predictions but even the dreams of the past. And sometimes, the predictions end up being pretty much spot on.

That’s the case with a piece about the “answer machine” of the future, which appeared in the book Childcraft Volume 6: How Things Change, published by Field Enterprises Educational Corporation in 1964. (Thanks to Paleo-Future for bringing this to my attention.)

Here’s how it starts:

a1.jpg

I think Google can handle that:

g1.png

What else can our Answer Machine do for us?

a2.jpg

A single click from Google’s first result shows us this picture:

phono.jpg

a3.jpg

Yep, “File | Print” does the job nicely.

a4.jpg

The original “Mary Had A Little Lamb” recording was not kept, but we can listen to Edison re-enacting it or to an 1899 recording made on Edison’s 1878 tinfoil phonograph.

a5.jpg

A Google Video search doesn’t disappoint, although you do need to scroll past movies about Edison Lighthouse. I especially like this movie, filmed by Edison, which demonstrates that the more things change, the more they stay the same.

a6.jpg

Someday? I already have an answer machine that can do all those things. And if that fails, I can ask my question at Uclue. I’m feeling lucky.

Content based image retrieval

Wednesday, November 22nd, 2006

Would you like to search for images by visual similarity? I thought so. This is a hot research topic, and there are even a few content-based image applications that you can already play with.

CogniSign’s xcavator application seems quite advanced. You can choose a starting image, then click to define “important bits” that must be present in the matching images. As you click, the set of matching images is constantly refined to match your clicks. The demo seems to work well, but their database contains only a sample of images from Flickr. Be sure to watch their video intro before you try the demo.

LTU Technologies’ Image-Seeker examines both appearance and keywords to help you browse to similar images. This is perhaps less ambitious than the approach taken by xcavator, and I felt that keyword similarity was weighted too heavily compared to visual similarity. There’s a demo which enables you to browse the Corbis royalty-free images by choosing a starting image from a random set and progressively clicking on images that are closer and closer to your target image.

VIMA Technologies’ Visual Image Search lets you search a sample of 40000 Flickr images. Each image has attached to it a plus button and a minus button, which enable you to refine your query by clicking on images that either match well or that match poorly.

In contrast to the above applications, which are essentially technology showcases, imgSeek is an open-source application that you can download and run on your own system. It’s a photo collection manager which provides for the usual kinds of browsing and adds a similarity search. You can either provide an existing image as the seed for the similarity search, or you can use the mouse to sketch a few lines and blobs in relevant colours. It seems to do a pretty good job of finding matching images (check out the screenshots).

imgseek.jpg
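To give a feel for what a "similarity search" involves, here is a minimal sketch in Python. It is not how imgSeek or any of the applications above actually work; it is one simple, well-known approach (comparing coarse colour histograms), and it assumes each image is already available as a non-empty list of (r, g, b) pixel tuples. The function names are my own invention for illustration.

```python
def colour_histogram(pixels, bins=4):
    """Count pixels per coarse RGB bucket, normalised so the counts sum to 1.

    Assumes `pixels` is a non-empty list of (r, g, b) tuples with values 0-255.
    """
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        index = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical colour mixes, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def rank_by_similarity(seed_pixels, collection):
    """Return the names in `collection` (name -> pixels), most similar to the seed first."""
    seed_hist = colour_histogram(seed_pixels)
    scored = [
        (similarity(seed_hist, colour_histogram(pixels)), name)
        for name, pixels in collection.items()
    ]
    return [name for _score, name in sorted(scored, reverse=True)]
```

A sketch like this ignores shape and texture entirely, so a red car and a red sunset look identical to it. That gap is part of why the real applications are so much more elaborate.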

But imgSeek won’t scale to searching all the images on the internet, which is what many of us would like to do. That’s something which is “not quite there yet”, and you can be sure that the likes of Google are working furiously on it.

55 ways to have fun with Google

Saturday, June 24th, 2006

Philipp Lenssen has written a fun little book about fun things to do with Google.

55-ways-cover.png

The 55 chapters cover searching games, graphical games, Google history, Google trivia, Google gadgets, Googledromes and more.

The second-best thing about this book is that Philipp has released it under the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 license, which means that you are allowed - encouraged even - to copy, read, share, remix, convert, quote, browse, and print the PDF to your liking. (Philipp asks that if you do create conversions, e.g. an HTML version, please send him the URL.)

You can download the PDF for free, or buy the paper book. The paper version is certainly easier to read on the train, but somehow I think this book is best read on-screen from the PDF, with a Google window open to the side of it so that you can try everything out as you read about it.

I’ve saved the best thing until last. Look what I found tucked away on page 154:

pink55.jpg

Searching for free content with Google

Thursday, June 22nd, 2006

If you’re looking for free clip-art, free stock photos, free music, free articles, you can make use of Google’s “Usage Rights” search.

This little-known search feature is found on Google’s Advanced Search page, and lets you specify what kind of content you wish to find, according to whether it’s free to use or share, whether it’s free to modify, and whether it’s free even for commercial use.

ccsearch.png

As Google’s Usage Rights help page explains, Google is selecting the results based on links back from the content website to the free-licensing site Creative Commons. This alone doesn’t guarantee that every search result returned will be free to use. You will still need to double-check the item you wish to use, but those search results will be the best place to start looking.

A “onebox” for Google Answers

Monday, June 12th, 2006

Google Answers Researcher Philipp Lenssen has designed some possible extensions to Google’s “onebox” - the box of specialized results that sometimes appears at the top of the search results.

The most interesting of these (to us) is the onebox for Google Answers. Imagine if this was shown whenever Google detected the kind of query that might be well-suited to a human response…

onebox.png

Searching for phrases on Google

Monday, June 12th, 2006

You probably know that if you want to find this blog using the Google search engine, it won’t help to search for web owls. That search currently returns nine million results, because it finds every page that contains the words “web” and “owls” on it, whether the words are adjacent or not.

You can search for a phrase by enclosing the words in doublequotes. A search for “web owls” returns 144 pages, each of which contains the word “web” followed immediately by the word “owls”.
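The difference between the two kinds of match can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how Google implements it; the function names and the simple tokenizer are my own assumptions.

```python
import re


def tokens(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all_words(page, query):
    """True if every query word appears somewhere on the page, in any order."""
    page_words = set(tokens(page))
    return all(word in page_words for word in tokens(query))


def matches_phrase(page, phrase):
    """True only if the phrase's words appear on the page adjacent and in order."""
    words, wanted = tokens(page), tokens(phrase)
    return any(
        words[i:i + len(wanted)] == wanted
        for i in range(len(words) - len(wanted) + 1)
    )
```

A page that says "Owls build nests; spiders build a web" passes the first test for web owls but fails the second, which is why the quoted search returns so many fewer results.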

Google offers an alternative syntax which works the same way, and is not so widely known, but is sometimes easier to type. Simply use a dot between each word: web.owls for example.

Why use this form? If you have searched without quotes and received an unmanageably large number of results, it’s tedious to click-at-the-start, shift-doublequote, click-at-the-end, shift-doublequote. It’s easier to simply replace the space by a dot.

I also use the dot-form in another situation. Suppose I am trying to find a specific phrase, but don’t remember it properly. For example, suppose I have searched for “able I saw elba” and received no results. It takes a lot of fiddling around with doublequotes to search for various subphrases until I find that a search for able “I saw elba” returns the pages that contain “able was I, ere I saw Elba!”.

Instead, if my original search was for able.i.saw.elba then it is a simple matter to replace various dots by spaces until I search for able i.saw.elba and get the results I want.
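If you would rather see all the candidates at once than edit the query by hand, the dot-for-space swapping is easy to enumerate. Here is a small Python sketch; the function name is hypothetical, and it only builds the query strings, it doesn’t run any searches.

```python
from itertools import product


def dot_phrase_variants(words):
    """Yield every query formed by joining adjacent words with '.' or ' '."""
    gaps = len(words) - 1
    for separators in product(". ", repeat=gaps):
        query = words[0]
        for separator, word in zip(separators, words[1:]):
            query += separator + word
        yield query


variants = list(dot_phrase_variants(["able", "i", "saw", "elba"]))
```

Four words have three gaps, so there are eight variants, starting with the fully dotted able.i.saw.elba and including able i.saw.elba.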

Incidentally, there are other characters you can use instead of dots. Slashes, apostrophes and the equals sign work the same way (I’m only interested in punctuation that doesn’t use the shift key on my keyboard).

Other characters, such as hash and hyphen, work differently.

Google’s synonym operator

Monday, May 22nd, 2006

Sometimes you know that the page you seek is out there, but you can’t find it. You’ve tried searching for all the relevant keywords you can think of, but the page must be using different words.

That’s when Google’s synonym operator comes into its own. Just prefix a word with a tilde (~) and Google will search for the synonyms that it knows about.

tilde.jpg

So, for example, a search for ~earth will find pages about world in addition to those about earth.

Suppose you want to look up acronyms: most people wouldn’t think to include finder amongst their search terms, but a Google search for acronym ~search will return Acronym Finder amongst its results.

You can put the synonym operator in front of more than one search term if you like, but it doesn’t work in front of quoted phrases.
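As a rough sketch of those rules, here is a Python helper that adds the tilde to chosen terms and leaves quoted phrases alone. The function name and its arguments are my own invention; it just assembles the query text as described above.

```python
def with_synonyms(terms, expand):
    """Join terms into a query, prefixing '~' to each single word listed in `expand`.

    Quoted phrases and multi-word terms are left untouched, since the
    synonym operator doesn't work in front of them.
    """
    parts = []
    for term in terms:
        is_single_word = " " not in term and not term.startswith('"')
        if term in expand and is_single_word:
            parts.append("~" + term)
        else:
            parts.append(term)
    return " ".join(parts)


query = with_synonyms(["acronym", "search"], expand={"search"})
```

Here query comes out as acronym ~search, the same search used in the example above.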

Google sometimes searches for very close synonyms without asking, but the synonym operator makes it use a fairly broad range. For example, Google’s synonyms for garden include plant and landscaping, synonyms for car include BMW, automobile, motor etc.

(photo by Bjarne Kvaale)

Searching for Everything

Saturday, May 20th, 2006

Google’s mission in life — apart from making tons of money, becoming a verb, and (mostly) refraining from things evil — appears to be to provide the greatest amount of information possible to the greatest number of people.

Fine idea. And Google’s certainly doing a mighty fine job as far as the information on the web goes. But what about the great gobs of information that exist outside of the web itself? Some of this information sits on dusty library shelves, not yet in electronic form, and Google has, in fact, embarked on a very ambitious project to digitize entire college libraries (and copyright be damned, I say).

But there’s also a whole universe of information out there that already is in electronic format, and that already is accessible via the internet. But none of it is — yet — available through a Google search. If I had to venture a guess (and it appears that, yes, I have to!), I’d say these missing pieces of electronica are probably as large as the current, Google-searchable, piece of the internet.

And not only are they large…most of it is high-quality stuff. Materials that are consciously archived are generally deemed to be worthy of the effort exactly because they represent high quality information (searching the net is a blast, but we all know that there’s a lot of garbáge out there). But that high-quality-but-invisible information isn’t showing up when you Google for it. Information like…

PACER. The Public Access to Court Electronic Records is the US government’s effort at bringing the federal courts into the electronic age. They have semi-succeeded. PACER is an enormous dataset of federal court records that includes not only opinions, but the extensive, often arcane filings, that go into making up a case docket. It’s a combination of pointers to records along with actual documents themselves. It’s a mixed up, unwieldy, hit and miss, and very vast collection of information. It needs to be Googlized. There is a lot of other court information available at some federal courts that aren’t part of PACER, and at state and local courts as well, along with many court systems in other countries. All, all, all should be Googlized.

And while we’re on the topic of courts, there’s also…

Lexis-Nexis. How big is Lexis-Nexis? Can you say exabytes? OK, maybe not that large (yet)…but it’s well on its way. Lex-Nex is another source of court cases (including a lot of historical material) but it’s also much more: full-text newspapers, magazines, directories, professional forms, credit reports, public filings, attorney general opinions, and stuff I’m sure I haven’t discovered yet.

I have mixed feelings about linking Lex-Nex and Google. Google is free, Lex-Nex isn’t, and that can make for an unhappy marriage. But the content that Lex-Nex has to offer is so compelling that it’s probably worth exploring opportunities. After all, Google already makes available snippets of information from subscription sources where you have to pay if you want the full text. The bulk of Google Scholar materials seems to fall into this territory.

Internet Archive. How can you not love the Wayback Machine? In spirit, this is the undertaking most like Google itself. In practice, though, it’s badly in need of a system upgrade. If Google would take Wayback by the hand, meld it with its own vast collection of archived pages, make the whole thing searchable, and basically lead Wayback into the light, then Oh, What a Wonderful World This Would Be.

Newspapers. The first rough draft of history is available online back at least to the 1700s, in multiple archives, and from a variety of nations. In addition to mainstream publications like the New York Times, there are a host of others that offer important insights into corners of world history, like the Baltimore Afro American, the Johannesburg Sunday Times or the Sydney Morning Herald.

I haven’t even mentioned patents, copyright, trademarks, corporate annual reports (a particularly easy one!), and other electronic materials already available, though not search-engine-available.

Wouldn’t adding all this to a Google search just hopelessly clutter up the results page? It could, but it doesn’t have to. A search that recognizes that there is a lot of highly-relevant material in, say, a court case, could simply ask a question like “Do you want to see these results with court cases included?”

Well…do you?

pafalafaga aka David Sarokin

Google searches and the plus sign

Friday, May 19th, 2006

The Google search syntax accepts a plus sign in front of a search term. There are two situations where you might want to do this.

Firstly, a plus sign tells Google that you really truly honestly do want to search for a common term. A simple search for book a love returns mostly results that contain the more common phrase “Book of Love”.

Searching for book +a love fixes this; so does searching for “book a love” if you want the words to appear in that order - because common terms within a phrase are always matched.

A second use of the plus sign is to tell Google not to mess with your search term. A search for well yields a page cluttered with results for Wells Fargo, whereas these are absent from a search for +well.

The plus sign is not a search operator that I would use often, but just occasionally it’s a useful tool to get some unwanted entries off the search results list.

Search Engines trust their own Answers

Wednesday, May 17th, 2006

Yahoo also has an Answers service. It’s very different from Google Answers, because the Yahoo questions are limited to 110 characters, and no cash changes hands. Yahoo Answers has now come out of beta and is being “integrated” into their core search services.

Yahoo was corporate-blogging about how “questions and answers are being surfaced within results”, and gave as an example a search for best dog for apartment, where the YA question/answer appears on the first page of the results.

Nicholas Carr decided to look into this more deeply. He found that a Google search for best dog for apartment returned a Google Answers question/answer on the first page of the results, with YA on page three.

Back on Yahoo, GA was nowhere to be seen, and MSN Search didn’t return either the GA or YA pages.

Search engines trust their own answers.