Searching for "similar images"

Monday, November 2nd, 2009

We’ve discussed content-based image search before, but that was three years ago. Since then, the technology has matured into a useful everyday tool.

Suppose you have an image, and you want to find others like it. You can use the TinEye image search engine, where you can either upload an image file or submit a URL. TinEye will then display a list of similar images.

TinEye claims to index over a billion images, but it doesn’t always find a match. The matches that it does find tend to be very good. In particular, it will find images that include any one of the major components of the original image.

The other alternative is Google Images, which has rolled out a “Find similar images” option. Perform a regular image search, and you will find that most of the result images have a “Find similar images” link beneath them.

Google finds a different kind of match than TinEye. TinEye is quite literal, whereas the Google matches are broader. Google seems to take account of the textual context on the image’s target page, as well as the visual characteristics of the image. TinEye seems to match some kind of literal measurement of the image components, such as their angle and height/width ratio.
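To make the idea of a "literal" visual match concrete, here is a minimal sketch of one well-known fingerprinting technique, the average hash. This is not TinEye's or Google's actual method, which neither company has described here; the function names and the tiny 4x4 grayscale grids are invented purely for illustration.

```python
def average_hash(pixels):
    """Return a list of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Hypothetical 4x4 grayscale images (0 = black, 255 = white).
original = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
# The same picture, uniformly brightened.
brightened = [[min(255, value + 30) for value in row] for row in original]
# A different layout: bright top half, dark bottom half.
different = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

same_score = hamming_distance(average_hash(original), average_hash(brightened))
other_score = hamming_distance(average_hash(original), average_hash(different))
print(same_score, other_score)
```

A small distance suggests the two images share the same layout of light and dark regions, which is the kind of literal match described above. A context-aware search would need additional signals, such as the text on the page.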

Google’s interface is easier to use, because you can keep refining your search by clicking “Find similar images” on your best match so far. TinEye requires you to start each search afresh, although they also offer a browser plugin for easy searching on any image that you find on any webpage.

As at November 2009, TinEye requires free registration. Google Images is free to use without registration.

Similar images found by Google Image Search (left column) and TinEye (right column)

Google Maps adds photos and Wikipedia

Friday, May 16th, 2008

Google has rolled out some new options for their Google Maps service. If you click “More”, you can activate an overlay of photos or an overlay of Wikipedia buttons.

The photo overlay adds thumbnail photos of points of interest, from Panoramio. Clicking on a thumbnail displays a larger version of the photo.

The Wikipedia overlay adds a Wikipedia icon for any location that has a geotagged Wikipedia article. Both overlays can get quite cluttered in popular areas, but two things make this more manageable: as you zoom out the number of items in the overlay decreases, and more popular/important items are indicated with a larger icon or thumbnail.

Thursday, March 6th, 2008

UNdata isn’t the antithesis of data (in the way that UNcyclopedia is the antithesis of an encyclopedia).


UNdata is a remarkably convenient way to access statistical data collected from the many and varied international governmental organizations that make up the United Nations.

Over 55 million database records are held on the site. A free-text search box helps you locate the data sets of interest, as does an “Explore” link. Once you locate the data you are looking for, you can refine it by applying filters, and can choose a column as a “pivot” to produce a cross-tabulation.

Best of all, you can then click “Link to this page” to create a static URL to your customized version of the data. For example, I looked up the land area of protected spaces (parks, etc) and applied a filter to restrict the results to Australia and Iraq. Choosing “Country” as the pivot column changed it to a two-dimensional table showing changes to protected spaces by country and year, allowing me to obtain this link to my data table.
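For readers who would like to reproduce the filter-then-pivot step on downloaded data, here is a minimal sketch in Python. The record layout, the field names and the figures are all placeholders invented for illustration; they are not real UNdata values, and the `pivot` helper is hypothetical rather than anything the site provides.

```python
from collections import defaultdict


def pivot(records, row_key, pivot_key, value_key):
    """Cross-tabulate flat records: one row per row_key value,
    one column per distinct pivot_key value."""
    table = defaultdict(dict)
    for record in records:
        table[record[row_key]][record[pivot_key]] = record[value_key]
    return {row: dict(columns) for row, columns in table.items()}


# Placeholder figures for illustration only, not real UNdata values.
records = [
    {"country": "Australia", "year": 2004, "protected_area": 1.0},
    {"country": "Iraq", "year": 2004, "protected_area": 2.0},
    {"country": "Chile", "year": 2004, "protected_area": 9.0},
    {"country": "Australia", "year": 2005, "protected_area": 3.0},
    {"country": "Iraq", "year": 2005, "protected_area": 4.0},
]

# Apply the filter first, then pivot on the country column.
wanted = {"Australia", "Iraq"}
filtered = [record for record in records if record["country"] in wanted]
table = pivot(filtered, row_key="year", pivot_key="country",
              value_key="protected_area")

for year in sorted(table):
    print(year, table[year])
```

Each row of the result is a year and each column a country, which is the two-dimensional shape that choosing “Country” as the pivot column produces.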

If the presentation of complex data is important to you, you should visit Gapminder. Choose the Gapminder World option, and you can view five-dimensional data in a very intuitive way. Suppose you were interested in trends in carbon output on a country-by-country basis, correlated with wealth and life expectancy, and varying over time – it’s no problem. Put carbon output on one axis, wealth (GDP) on the other, make the size of each data point represent life expectancy, and hit the “Play” button for a dramatic presentation of how it changes over time. Colour coding is used to indicate countries, which can be highlighted or labelled if you like.


Does that sound like hard work? Well, sit back and watch Hans Rosling use the Gapminder animated charts to take you on a dramatic video tour of world data. He presents several data sets to illustrate world problems and to suggest insights into possible solutions. He’s an engaging presenter, so do watch the video until the end, where there is a most entertaining finale!

ResearchWikis for free market research

Thursday, January 3rd, 2008

If I had to pick a topic that I thought was unsuited to wikis, market research would be that topic. Sources are closely guarded, figures are usually unverifiable and sometimes unsubstantiated, and the entire market research industry is built on some rather flimsy assumptions. No, a market research wiki could never work.


That hasn’t stopped ResearchWikis from making a good go of it. I checked a few of their pages, such as Aluminum Market Research, and found usable though fairly basic information covering market background, market structure, industry definitions, market metrics, industry players, trends, recent developments, and some sources. This would make a good first port of call; an overview and familiarization pass before settling down to some serious market research work.

The initial market research looks like it has been seeded by ResearchBuy, who are more than happy to sell you a more advanced report or to provide you with custom research.

The site is being actively maintained, but just by one user named John. It will be interesting to see how ResearchWikis holds up once people start editing the site in earnest.

A Great New Resource…sigh

Friday, November 30th, 2007

I’m here to blow off some steam.

What is it about university-based search engines that makes them — without exception — so frustratingly clumsy?

The latest entry from Carnegie Mellon University, the Universal Digital Library aka The Million Book Project, should have us all jumping for joy.  The Million Book Project does exactly what it says: it makes a million-plus volumes available for immediate online access.

Wow!  This is a phenomenal accomplishment.  Amazing.  Undreamed of a mere two decades ago.  The entire world now has instant access to a large research library, covering just about any topic under the sun, and in multiple languages too.

But just try using the danged thing, and you might find your enthusiasm quickly fading.

First off, the images aren’t web-compatible, nor are they based on a common add-in like Adobe PDF.  You need to download not one, but two, separate viewers in order to see the books themselves. 

The viewer downloads don’t happen automatically when you try to view a book.  Instead, your viewing attempt will simply fail, with no explanation of why.  You need to find the instructions squirreled away in the FAQs, and go through the (unusually cumbersome…including a requirement to register) process for obtaining the software.

Then, if you know exactly what book you’re looking for, you can do a quick Title or Author search.   My search for “Oliver Twist” pulled up 18 copies of, essentially, the same book.  (While this may be useful for scholars wanting to compare editions, one wonders whether it was the best use of limited resources?)

Ready to read Oliver Twist?  Perhaps the book you click on will open, perhaps not.  The volumes housed on the library’s China server, in particular, seem to go through 45 minutes’ worth of firewalls before deciding whether to grant access or not.

But if you’re lucky enough to get an image, you can begin reading…one page at a time!  Click to open the page.  Wait. Adjust the viewer format.  Read the page.  Click.  Wait.  Adjust the viewer format again. Read the next page.  Click. Wait.  Adjust the…

There’s no way to access a chapter at a time or, heaven forbid, download the entire book.

Want to search within Oliver Twist for a particular passage you recall from your school days?  Sorry.  No in-the-book searching is available!

I don’t mean to diminish the exceptional accomplishment of the Universal Digital Library…it really is a momentous achievement.  But it just doesn’t flow the way one has come to expect tools on the internet to flow.  For some reason, university-based systems just don’t seem able to manage the flow. 

I’ve written before about the Making of America, and other digital content online at the University of Michigan.  MoA is one of my favorite historic research tools, but it’s so damned slow and cumbersome — right down to its unwieldy URLs — that it seems to be deliberately designed to hide itself from the research community, and to frustrate its users once they happen to find it.

The Internet Archive, which grew up at UC Berkeley, is another university-launched frustration.  Without a doubt, this is one of the internet’s great resources, but still, it’s so hard to maneuver around and search that it can make you crazy.  They toyed with full-text search capabilities a few years ago, but it never worked well, and has long since disappeared from view (I can’t even find it in the archive of the Archive!).

Like internet-savvy researchers everywhere, I’ve grown familiar with the fast, easy-access capabilities of in-the-book search engines like Amazon and Google Books, or commercial services like Questia.  Perhaps I’m being unreasonable, but I expect to see these in any online collection, whether of library books, or web pages.  Why can’t universities seem to manage this?

Of course, the Universal Digital Library, MoA, and the Internet Archive all operate on a shoestring, and don’t have the resources of Google or Amazon…or even tiny Questia…to add a lot of capabilities like a user-friendly design and full-text searching.

But somehow, the non-profit Wikipedia manages to do it!

Decoding acronyms

Wednesday, October 10th, 2007

Sometimes you want to know the meaning of an acronym. Perhaps it’s new, or perhaps it’s jargon — a term used within a particular community or subculture.

There are two ways to go here. You could look it up on a comprehensive site such as The Free Dictionary, where you may get dozens of possible meanings. Take a look at this Free Dictionary search for the acronym OP, for example.

Or, you could look it up at the Urban Dictionary. A Uclue user, willdeans, describes the advantage of this approach:

People submit their own definitions and then vote on the definition. As a result, the most common uses of an acronym become immediately clear. Chances are the definition you are looking for will be in the top few.

Thanks, Will!

Prices, Inflation, Stocks, Interest Rates

Tuesday, October 2nd, 2007

Historical information about prices, inflation, stocks, interest rates, exchange rates, labor prices, the value of gold, etc goes under the name of Economic History.

A wealth of economic history data can be found at the Economic History Services website, where you can explore a variety of data sets and use a variety of calculators to answer questions such as these:

  • How many modern dollars would I need to buy the same goods that I could have bought for $10 in 1793?
  • How has the purchasing power of the pound changed since 1264?
  • How much did unskilled labor cost in the past?
  • What has happened to interest rates, exchange rates, the cost of living, the stock market, and savings in the past?

Other services at the site include book reviews, databases, an encyclopedia of Economic and Business History, and a massive set of useful links to related sites.

Some of the calculators redirect to the Measuring Worth site, where a number of useful data sets are hosted, together with a glossary and explanatory article about Measures of Worth.

The Rolls Royce of Patent Searching

Thursday, September 27th, 2007

If you need to go beyond what Google Patent Search can deliver, you could consider LexisNexis TotalPatent, a pay service. If you have to ask how much it costs, you can’t afford it (unless you’re a Patent Attorney).


The following blurb comes via Amy Storey, who sent it on behalf of LexisNexis TotalPatent:

I’ve enjoyed reading Web Owls tips on searching patents and thought your readers might be interested in LexisNexis and their recently launched TotalPatent … While TotalPatent isn’t a free service, users of TotalPatent will get the benefits of:

  • Exclusive back files – we have patents that go back as far as 1836 for the US patents. The European Patent Office is 1978; World Intellectual Property Office is 1978; Great Britain is 1890 and the Netherlands is 1915.
  • In some cases, TotalPatent actually has more documentation of these patents online than the national patent office. For example, we are loading the full text of Granted patents published by the British Patent Office back to 1979 that are unavailable from any other source, including the British Patent Office. We are able to do so because we had heard a library was going to throw away their older patent records due to a space problem.
  • 22 full-text authorities in one source
  • 3 times more full text collections than anyone else offers
  • 65 million compressed, multi-page, searchable PDF documents.
  • Chisum on Patents, Milgrim on Licensing, and many others
  • Prior-art content from Elsevier Science Journals.
  • World’s largest collection of searchable full-text and bibliographic patent databases—in the language of publication and English translations—including images, citations, legal status and patent family collections.
  • Alert and profiling tools used to monitor industry trends and technology issues, allowing the user to stay ahead of technological developments and competitive activity.

What can you do with an ISBN?

Tuesday, September 18th, 2007


Suppose you have the ISBN for a book – what can you do with it?

You can look it up in WorldCat, the free catalog of the world’s libraries. You can look it up using Google Book Search, or the Internet Book Database, or the Internet Book List. You can check LibraryThing to see how the book has been tagged, and to find a list of similarly-tagged books.

You can find the book on search engines. You can do a citation metasearch. You can access the bibliographical information in MLA or APA format.

You can find the catalog entry for the book at numerous libraries worldwide, whether famous (such as Oxford University) or obscure (such as the Waikato Institute of Technology).

You can find this book at online booksellers such as Amazon. Or maybe you want it for free, in which case you could check BookMooch, or perhaps you can swap it at BookHopper. If you want to know where your copy has been before you bought it, you could find the book at BookCrossing.

You could look up the book at a price comparison site, or you could see if it’s listed by sellers of rare books. Or perhaps it’s a technical book that’s available in the online reference library at Safari Books?

How do you do all these things? I could give you a long list of links, but I don’t need to. Those good wikipedians have set up a wonderful page for this.

Visit Wikipedia’s Book Sources page and enter your ISBN. In return you’ll get a page full of links, all customized to that ISBN, with which you can access that book at all the services listed above and many more too.
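If you look up ISBNs often, you can build the link directly instead of typing into the form. The sketch below assumes the English Wikipedia exposes the page at the address `Special:BookSources/` followed by the ISBN; treat that URL pattern as an assumption to check in your browser. The helper names are hypothetical.

```python
from urllib.parse import quote

# Assumed URL pattern for the English Wikipedia's Book Sources page.
BOOK_SOURCES_BASE = "https://en.wikipedia.org/wiki/Special:BookSources/"


def normalize_isbn(raw):
    """Strip hyphens and spaces, keeping only digits and a possible X."""
    cleaned = "".join(ch for ch in raw if ch in "0123456789xX").upper()
    if len(cleaned) not in (10, 13):
        raise ValueError(f"expected 10 or 13 characters, got {len(cleaned)}")
    return cleaned


def book_sources_url(raw_isbn):
    """Build a Book Sources link for an ISBN written in any common style."""
    return BOOK_SOURCES_BASE + quote(normalize_isbn(raw_isbn))


print(book_sources_url("0-306-40615-2"))
```

The normalization step only removes punctuation and checks the length; it does not verify that the number is a genuine ISBN.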

Wikipedia warns that possession of an ISBN doesn’t prove that a book was necessarily issued, as the publication may have been cancelled after the ISBN was assigned. Also, an ISBN identifies one specific edition of a book, so a single book might have multiple ISBNs (paperback, hardcover, second edition, etc). Wikipedia even comes to the rescue here – the Book Sources page also generates links to thingISBN and xISBN, services that will help you find different editions of the same book.
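One small complication is that the very same edition can be written as a 10-digit or a 13-digit ISBN. The sketch below uses the standard check-digit rules to validate each form and to convert an ISBN-10 to its ISBN-13 equivalent by adding the 978 prefix. It does not find other editions of a book; that still needs a lookup service such as the ones mentioned above. The function names are hypothetical, and the input is assumed to be already stripped of hyphens.

```python
DIGITS = "0123456789"


def is_valid_isbn10(isbn):
    """Check an ISBN-10: weights 10 down to 1, total divisible by 11."""
    if len(isbn) != 10 or any(ch not in DIGITS for ch in isbn[:9]):
        return False
    last = isbn[9]
    if last not in DIGITS and last != "X":
        return False
    values = [int(ch) for ch in isbn[:9]] + [10 if last == "X" else int(last)]
    total = sum(weight * value for weight, value in zip(range(10, 0, -1), values))
    return total % 11 == 0


def isbn13_check_digit(first12):
    """Compute the ISBN-13 check digit: weights alternate 1, 3, 1, 3, ..."""
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn13(isbn):
    """Check an ISBN-13 by recomputing its final digit."""
    if len(isbn) != 13 or any(ch not in DIGITS for ch in isbn):
        return False
    return isbn13_check_digit(isbn[:12]) == isbn[12]


def isbn10_to_isbn13(isbn10):
    """Convert a valid ISBN-10 to the equivalent 978-prefixed ISBN-13."""
    if not is_valid_isbn10(isbn10):
        raise ValueError("not a valid ISBN-10")
    first12 = "978" + isbn10[:9]
    return first12 + isbn13_check_digit(first12)


print(isbn10_to_isbn13("0306406152"))
```

A check like this catches most typing mistakes before you paste a number into a search form, but a number that passes is not guaranteed to belong to a published book.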

And if you want the same service for sites in umpteen other languages including Slovenian and Persian, that’s available too. Book Sources is truly a comprehensive service.

Oh, and see this page if you don’t know your ISBN from your ICBM.

Searching US Patents

Wednesday, September 12th, 2007


It was IBM who first brought practical free patent searching to the masses. But IBM’s service has now been replaced by one that requires registration for even a basic search, and charges you to download the fruits of your search.

So what can you use instead? James Ryley, who runs a patent search site, naturally suggests his own, which lets you search without registering. You can view the text plus a representative image online; however, you need to register for anything beyond that.

The natural place from which to search is surely the US Patent and Trademark Office Patent Search. Here you can view the text and all the images. But not so fast! The images are in TIFF format, which most browsers won’t be able to see without installing a plugin. And the site is somewhat arcane – for example, there are complicated instructions for linking to individual patents.

Once again, Google comes to the rescue. Google Patent Search makes it easy to search over 7 million patents, and it doesn’t make you sing and dance before you see the results. You get the text, the images (directly viewable from the browser) and a PDF download if you need it. No fuss, no muss.

So much for the technical resources, but how do you actually find what you need to know? Alice Kawakami, Information Specialist at the University of Southern California, shares tips about Patents and Patent Searching, or you may prefer eHow’s more basic description of How to Conduct a Patent Search.

If you want to know much more about how the patent system works, there’s a huge and informative document on Patent Search.

On the other hand, if the whole patent system strikes you as absurd, dysfunctional and self-serving, then you are not alone. One site explains and evangelizes all the problems, then offers legal resources and tools to help you “survive the patenting frenzy of the Internet, Bioinformatics, and Electronic Commerce”.