Archive for the 'Search Engines' Category

Even More Image Search Tools (and pretty nifty ones, at that)

Thursday, March 11th, 2010

We’ve looked at TinEye already, and I have to tell you, I find myself using it more and more as a tool for pinning down mystery photos, or looking for copyright infringements. I won’t be surprised if this becomes one of the search services that Google scoops up one day.

Idée Inc, the creators of TinEye, have a Labs page, where they play around with new search tools. It’s really worth a look.

The Multicolour Search Lab (or Multicolor, or Multicolr…your choice!) allows you to pick one or several colors from a typical web palette. With each pick, the search returns photos matching those colors, from a collection of more than 10 million Flickr Creative Commons photos. The effect is really quite striking.

The Visual Search Lab lets you pick an image from a random presentation, then finds dozens of similar images. You can then refine the results by entering keywords, which will further narrow down the similarities. Hard to explain, but easy to understand once you give it a try.

Lastly, there’s a BYO Image Search. In theory, this allows you to upload your own picture and use some of the similarity tools to find others like it. It didn’t work for me (wouldn’t allow an upload)…if anyone gets it to work, please post a comment here to let us know.

All in all, a fun set of tools to play around with.

Google Knol. The new “Invisible Web”?

Tuesday, July 29th, 2008

In case you missed it, Google launched its new next-big-thing. Google Knol.

A knol is a “unit of knowledge” according to Google, and their Wikipedia-like Knol site is a platform where anyone can contribute their knowledge on pretty much any topic. While Google is encouraging ‘experts’ to contribute, any Tom, Dick or Mary Jane can write whatever they feel like.

All this is well and good, and Knol is an easy site to use. But — so far, at least — the bulk of its content is invisible. That is, content in Knol does not show up on an ordinary Google search. It usually takes just a day or so for new content to show up in a Google search (this Web Owls article, for instance, should show up tomorrow…UPDATE: Actually, it only took about an hour).

But content in Google Knol that has been posted for five days or more is not showing up in routine Google searches, or in results from other search engines, like Yahoo Search. In other words, most of Knol is invisible to search engines.

Don’t get me wrong. Some Knol content is making its way through to search results. These seem to be chiefly the articles that are featured on the Knol front page. A test knol by search guru Danny Sullivan also quickly made it into Google search results, causing no small amount of finger-pointing about Google cooking the results.

Whatever happened with Sullivan’s knol, the simple truth is that the bulk of Knol’s material is nowhere to be found. For instance, I ran a search at Knol on the term money, and pulled up 62 knols containing that term. One by one, I have been going through the list, trying to find one of these knols in ordinary Google search results (usually, I search on the author’s name and title of the knol. I also tried some exact phrase searching). So far, not one of the 62 has shown up.

Here’s an example. A fellow named James Burchill wrote a knol on July 24, called How to Make Money on Elance. A Google search on [ james burchill how to make money on elance ] turns up five results mentioning Burchill articles with the same title. But none of the results has anything remotely to do with Google Knol.

The knols all have nofollow code in the html, but I didn’t see anything that would disallow spidering of the articles. It’s not at all clear why Knol content is not being indexed by any of the search engines.
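For anyone who wants to repeat that check on a page's source, here is a minimal Python sketch. It is only an illustration: the function and class names are made up for this post, and it looks at the HTML alone (it does not fetch or read a site's robots.txt). The distinction it helps to show is that nofollow asks spiders not to follow a page's links, while it is a noindex directive that asks them to keep the page itself out of the index.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collects robots meta directives and rel="nofollow" links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta_directives = []  # e.g. ["noindex", "nofollow"]
        self.nofollow_links = []   # href values of links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.meta_directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]
        elif tag == "a" and "nofollow" in (attrs.get("rel") or "").lower().split():
            self.nofollow_links.append(attrs.get("href"))


def robots_hints(html):
    """Return (meta robots directives, nofollow link targets) found in the HTML."""
    parser = RobotsHintParser()
    parser.feed(html)
    return parser.meta_directives, parser.nofollow_links


sample = """
<html><head><meta name="robots" content="index, nofollow"></head>
<body><a rel="nofollow" href="http://example.com/a">a</a>
<a href="http://example.com/b">b</a></body></html>
"""
directives, links = robots_hints(sample)
print(directives)  # ['index', 'nofollow']
print(links)       # ['http://example.com/a']
```

A page that turned up "noindex" in the first list would have an obvious reason for being missing from search results; a page with only nofollow would not.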

But it sure ain’t showing up!

One of the main attractions of Knol is that it allows authors to link their Adsense accounts to their knol articles, thereby collecting any ad revenue the page generates. But if the page never appears in Google search results, the odds of generating very much traffic appear pretty slim.

Stay tuned for more on the mysterious saga that is Knol…

News Flash! Andy Czernek, one-time Google Answers wunderkind (I think he was omnivorous-ga…hard to remember…everything fading…) has an article on people searching that made the Knol front page. Three cheers for Andy.

Content based image retrieval

Wednesday, November 22nd, 2006

Would you like to search for images by visual similarity? I thought so. This is a hot research topic, and there are even a few content-based image applications that you can already play with.

CogniSign’s xcavator application seems quite advanced. You can choose a starting image, then click to define “important bits” that must be present in the matching images. As you click, the set of matching images is constantly refined to match your clicks. The demo seems to work well, but their database contains only a sample of images from Flickr. Be sure to watch their video intro before you try the demo.

LTU Technologies’ Image-Seeker examines both appearance and keywords to help you browse to similar images. This is perhaps less ambitious than the approach taken by xcavator, and I felt that keyword similarity was weighted too heavily compared to visual similarity. There’s a demo which enables you to browse the Corbis royalty-free images by choosing a starting image from a random set and progressively clicking on images that are closer and closer to your target image.

VIMA Technologies’ Visual Image Search lets you search a sample of 40000 Flickr images. Each image has attached to it a plus button and a minus button, which enable you to refine your query by clicking on images that either match well or that match poorly.

In contrast to the above applications, which are essentially technology showcases, imgSeek is an open-source application that you can download and run on your own system. It’s a photo collection manager which provides for the usual kinds of browsing and adds a similarity search. You can either provide an existing image as the seed for the similarity search, or you can use the mouse to sketch a few lines and blobs in relevant colours. It seems to do a pretty good job of finding matching images (check out the screenshots).

imgseek.jpg
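To give a feel for what "similarity search" can mean in practice, here is a rough Python sketch based on colour histograms. This is not imgSeek's actual algorithm (which this post doesn't go into); it is a simple stand-in technique, and every name in it is invented for the example. Images are represented as plain lists of (r, g, b) tuples so that no imaging library is needed.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Return a normalised histogram over coarse RGB buckets.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    """
    step = 256 // bins_per_channel

    def bucket(value):
        # Clamp so that odd bin counts never index past the last bucket.
        return min(value // step, bins_per_channel - 1)

    counts = [0] * (bins_per_channel ** 3)
    total = 0
    for r, g, b in pixels:
        index = (bucket(r) * bins_per_channel + bucket(g)) * bins_per_channel + bucket(b)
        counts[index] += 1
        total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return [c / total for c in counts]


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def rank_by_similarity(seed_pixels, collection):
    """Sort (name, pixels) pairs from most to least similar to the seed image."""
    seed = colour_histogram(seed_pixels)
    scored = [(similarity(seed, colour_histogram(px)), name) for name, px in collection]
    return [name for _, name in sorted(scored, reverse=True)]


# Tiny made-up "images": a warm orange one, a blue one, and an orange seed.
sunset = [(250, 120, 30)] * 10
sea = [(10, 60, 200)] * 10
seed = [(240, 100, 20)] * 10
print(rank_by_similarity(seed, [("sea", sea), ("sunset", sunset)]))  # ['sunset', 'sea']
```

A sketch of "a few lines and blobs in relevant colours" could be fed in as the seed in exactly the same way, since only the colour distribution is compared.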

But imgSeek won’t scale to searching all the images on the internet, which is what many of us would like to do. That’s something which is “not quite there yet”, and you can be sure that the likes of Google are working furiously on it.

[Update: for the state-of-the-art in 2009, see Searching for Similar Images]

55 ways to have fun with Google

Saturday, June 24th, 2006

Philipp Lenssen has written a fun little book about fun things to do with Google.

55-ways-cover.png

The 55 chapters cover searching games, graphical games, Google history, Google trivia, Google gadgets, Googledromes and more.

The second-best thing about this book is that Philipp has released it under the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 license, which means that you are allowed – encouraged even – to copy, read, share, remix, convert, quote, browse, and print the PDF to your liking. (Philipp asks that if you do create conversions, e.g. an HTML version, please send him the URL.)

You can download the PDF for free, or buy the paper book. The paper version is certainly easier to read on the train, but somehow I think this book is best read on-screen from the PDF, with a Google window open to the side of it so that you can try everything out as you read about it.

I've saved the best thing until last. Look what I found tucked away on page 154:

pink55.jpg

Searching for free content with Google

Thursday, June 22nd, 2006

If you’re looking for free clip-art, free stock photos, free music, free articles, you can make use of Google’s “Usage Rights” search.

This little-known search feature is found on Google’s Advanced Search page, and lets you specify what kind of content you wish to find, according to whether it’s free to use or share, whether it’s free to modify, and whether it’s free even for commercial use.

ccsearch.png

As Google’s Usage Rights help page explains, Google is selecting the results based on links back from the content website to the free-licensing site Creative Commons. This alone doesn’t guarantee that every search result returned will be free to use. You will still need to double-check the item you wish to use, but those search results will be the best place to start looking.
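When you do that double-check, the link back to Creative Commons is the thing to look for. The Python sketch below shows one way to find it and read off what it permits. The helper names are invented for this example, and it only recognises the attribution-based license codes (by, by-nc, by-nd, by-sa, by-nc-nd, by-nc-sa) that appear in URLs such as creativecommons.org/licenses/by-nc-sa/2.0/. It says nothing about how Google itself processes these links.

```python
import re
from html.parser import HTMLParser

# Matches the license code in URLs like
# http://creativecommons.org/licenses/by-nc-sa/2.0/
LICENSE_URL = re.compile(
    r"creativecommons\.org/licenses/(by(?:-nc)?(?:-(?:nd|sa))?)/", re.IGNORECASE
)


class LicenseLinkParser(HTMLParser):
    """Collects CC license codes from <a> and <link> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.codes = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        href = dict(attrs).get("href") or ""
        match = LICENSE_URL.search(href)
        if match:
            self.codes.append(match.group(1).lower())


def usage_rights(html):
    """Return one dict per Creative Commons license link found in the HTML."""
    parser = LicenseLinkParser()
    parser.feed(html)
    results = []
    for code in parser.codes:
        elements = set(code.split("-"))
        results.append({
            "license": code,
            "free_to_share": True,  # all six codes handled here permit sharing
            "free_to_modify": "nd" not in elements,           # nd = no derivatives
            "free_for_commercial_use": "nc" not in elements,  # nc = non-commercial
        })
    return results


page = '<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.0/">CC</a>'
print(usage_rights(page))
```

A page with no such link returns an empty list, which is a good hint that it deserves a closer look before you reuse anything from it.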

A “onebox” for Google Answers

Monday, June 12th, 2006

Google Answers Researcher Philipp Lenssen has designed some possible extensions to Google’s “onebox” – the box of specialized results that sometimes appears at the top of the search results.

The most interesting of these (to us) is the onebox for Google Answers. Imagine if this was shown whenever Google detected the kind of query that might be well-suited to a human response…

onebox.png

Searching for phrases on Google

Monday, June 12th, 2006

You probably know that if you want to find this blog using the Google search engine, it won’t help to search for web owls. That search currently returns nine million results, because it finds every page that contains the words “web” and “owls” on it, whether the words are adjacent or not.

You can search for a phrase by enclosing the words in doublequotes. A search for “web owls” returns 144 pages, each of which contains the word “web” followed immediately by the word “owls”.

Google offers an alternative syntax which works the same way, and is not so widely known, but is sometimes easier to type. Simply use a dot between each word: web.owls for example.

Why use this form? If you have searched without quotes and received an unmanageably large number of results, it’s tedious to click-at-the-start, shift-doublequote, click-at-the-end, shift-doublequote. It’s easier to simply replace the space by a dot.

I also use the dot-form in another situation. Suppose I am trying to find a specific phrase, but don’t remember it properly. For example, suppose I have searched for “able I saw elba” and received no results. It takes a lot of fiddling around with doublequotes to search for various subphrases until I find that a search for able “I saw elba” returns the pages that contain “able was I, ere I saw Elba!”.

Instead, if my original search was for able.i.saw.elba then it is a simple matter to replace various dots by spaces until I search for able i.saw.elba and get the results I want.
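If you would rather not fiddle by hand, the dot-to-space substitutions are easy to enumerate. Here is a small Python sketch (the function name is my own invention) that lists every variant of a phrase, from the strictest all-dots form down to plain words. It only builds the query strings; it assumes the dot syntax behaves as described above and does not talk to Google.

```python
from itertools import product


def relaxed_queries(words):
    """Return every query made by joining adjacent words with '.' or ' '.

    Variants are ordered from most dots (strictest) to fewest (loosest).
    """
    if not words:
        return []
    variants = []
    # Each gap between two words gets either a dot or a space.
    for joiners in product(". ", repeat=len(words) - 1):
        query = words[0]
        for joiner, word in zip(joiners, words[1:]):
            query += joiner + word
        variants.append(query)
    return sorted(variants, key=lambda q: -q.count("."))


for query in relaxed_queries(["able", "i", "saw", "elba"]):
    print(query)
```

For four words there are eight variants, starting with able.i.saw.elba and ending with able i saw elba; the one that worked for me, able i.saw.elba, is among them.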

Incidentally, there are other characters you can use instead of dots. Slashes, apostrophes and the equals sign work the same way (I’m only interested in punctuation that doesn’t use the shift key on my keyboard).

Other characters, such as hash and hyphen, work differently.

Google’s synonym operator

Monday, May 22nd, 2006

Sometimes you know that the page you seek is out there, but you can’t find it. You’ve tried searching for all the relevant keywords you can think of, but the page must be using different words.

That’s when Google’s synonym operator comes into its own. Just prefix a word with a tilde (~) and Google will search for the synonyms that it knows about.

tilde.jpg

So, for example, a search for ~earth will find pages about world in addition to those about earth.

Suppose you want to look up acronyms: most people wouldn’t think to include finder amongst their search terms, but a Google search for acronym ~search will return Acronym Finder amongst its results.

You can put the synonym operator in front of more than one search term if you like, but it doesn’t work in front of quoted phrases.
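As a small illustration of those two rules, here is a Python sketch that builds a query string from a list of terms. The function name and its arguments are made up for this example, and it assumes the operator behaves as described in this post.

```python
def with_synonyms(terms, expand):
    """Build a query, prefixing '~' to each term that is listed in `expand`.

    Quoted phrases are left alone, since the operator does not work
    in front of them.
    """
    parts = []
    for term in terms:
        is_phrase = term.startswith('"')
        if term in expand and not is_phrase:
            parts.append("~" + term)
        else:
            parts.append(term)
    return " ".join(parts)


print(with_synonyms(["acronym", "search"], {"search"}))           # acronym ~search
print(with_synonyms(["garden", "car"], {"garden", "car"}))        # ~garden ~car
print(with_synonyms(['"web owls"', "earth"], {'"web owls"', "earth"}))  # "web owls" ~earth
```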

Google sometimes searches for very close synonyms without asking, but the synonym operator makes it use a fairly broad range. For example, Google’s synonyms for garden include plant and landscaping, synonyms for car include BMW, automobile, motor etc.

(photo by Bjarne Kvaale)

Searching for Everything

Saturday, May 20th, 2006

Google’s mission in life — apart from making tons of money, becoming a verb, and (mostly) refraining from things evil — appears to be to provide the greatest amount of information possible to the greatest number of people.

Fine idea. And Google’s certainly doing a mighty fine job as far as the information on the web goes. But what about the great gobs of information that exist outside of the web itself? Some of this information sits on dusty library shelves, not yet in electronic form, and Google has, in fact, embarked on a very ambitious project to digitize entire college libraries (and copyright be damned, I say).

But there’s also a whole universe of information out there that already is in electronic format, and that already is accessible via the internet. But none of it is — yet — available through a Google search. If I had to venture a guess (and it appears that, yes, I have to!), I’d say these missing pieces of electronica are probably as large as the current, Google-searchable, piece of the internet.

And not only are they large…most of it is high-quality stuff. Materials that are consciously archived are generally deemed to be worthy of the effort exactly because they represent high quality information (searching the net is a blast, but we all know that there’s a lot of garbáge out there). But that high-quality-but-invisible information isn’t showing up when you Google for it. Information like…

PACER. The Public Access to Court Electronic Records is the US government’s effort at bringing the federal courts into the electronic age. They have semi-succeeded. PACER is an enormous dataset of federal court records that includes not only opinions, but the extensive, often arcane filings, that go into making up a case docket. It’s a combination of pointers to records along with actual documents themselves. It’s a mixed up, unwieldy, hit and miss, and very vast collection of information. It needs to be Googlized. There is a lot of other court information available at some federal courts that aren’t part of PACER, and at state and local courts as well, along with many court systems in other countries. All, all, all should be Googlized.

And while we’re on the topic of courts, there’s also…

Lexis-Nexis. How big is Lexis-Nexis? Can you say exabytes? OK, maybe not that large (yet)…but it’s well on its way. Lex-Nex is another source of court cases (including a lot of historical material) but it’s also much more: full-text newspapers, magazines, directories, professional forms, credit reports, public filings, attorney general opinions, and stuff I’m sure I haven’t discovered yet.

I have mixed feelings about linking Lex-Nex and Google. Google is free, Lex-Nex isn’t, and that can make for an unhappy marriage. But the content that Lex-Nex has to offer is so compelling that it’s probably worth exploring opportunities. After all, Google already makes available snippets of information from subscription sources where you have to pay if you want the full text. The bulk of Google Scholar materials seems to fall into this territory.

Internet Archive. How can you not love the Wayback Machine? In spirit, this is the undertaking most like Google itself. In practice, though, it’s badly in need of a system upgrade. If Google would take Wayback by the hand, meld it with its own vast collection of archived pages, make the whole thing searchable, and basically lead Wayback into the light, then Oh, What a Wonderful World This Would Be.

Newspapers. The first rough draft of history is available online back at least to the 1700s, in multiple archives, and from a variety of nations. In addition to mainstream publications like the New York Times, there are a host of others that offer important insights into corners of world history, like the Baltimore Afro American, the Johannesburg Sunday Times or the Sydney Morning Herald.

I haven’t even mentioned patents, copyright, trademarks, corporate annual reports (a particularly easy one!), and other electronic materials already available, though not search-engine-available.

Wouldn’t adding all this to a Google search just hopelessly clutter up the results page? It could, but it doesn’t have to. A search that recognizes that there is a lot of highly-relevant material in, say, a court case, could simply ask a question like “Do you want to see these results with court cases included?”

Well…do you?

pafalafaga aka David Sarokin

Google searches and the plus sign

Friday, May 19th, 2006

The Google search syntax accepts a plus sign in front of a search term. There are two situations where you might want to do this.

Firstly, a plus sign tells Google that you really truly honestly do want to search for a common term. A simple search for book a love returns mostly results that contain the more common phrase “Book of Love”.

Searching for book +a love fixes this; so does searching for “book a love” if you want the words to appear in that order – because common terms within a phrase are always matched.

A second use of the plus sign is to tell Google not to mess with your search term. A search for well yields a page cluttered with results for Wells Fargo, whereas these are absent from a search for +well.
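Both uses boil down to the same edit on the query string, which the following Python sketch makes explicit. The function name is invented for this example, and it assumes the plus sign behaves as described in this post. Quoted phrases are treated as single tokens and left untouched, because common terms inside a phrase are matched anyway.

```python
import re

# A token is either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')


def force_terms(query, forced):
    """Prefix '+' to every bare word in `query` that appears in `forced`."""
    out = []
    for token in TOKEN.findall(query):
        if token.lower() in forced:
            out.append("+" + token)
        else:
            out.append(token)
    return " ".join(out)


print(force_terms("book a love", {"a"}))    # book +a love
print(force_terms("well", {"well"}))        # +well
print(force_terms('"book a love"', {"a"}))  # "book a love"
```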

The plus sign is not a search operator that I would use often, but just occasionally it’s a useful tool to get some unwanted entries off the search results list.