Archive for the 'Search Engines' Category

Search Engines trust their own Answers

Wednesday, May 17th, 2006

Yahoo also has an Answers service. It's very different from Google Answers, because the Yahoo questions are limited to 110 characters, and no cash changes hands. Yahoo Answers has now come out of beta and is being “integrated” into their core search services.

Yahoo was corporate-blogging about how “questions and answers are being surfaced within results”, and gave as an example a search for best dog for apartment, where the YA question/answer appears on the first page of the results.

Nicholas Carr decided to look into this more deeply. He found that a Google search for best dog for apartment returned a Google Answers question/answer on the first page of the results, with YA on page three.

Back on Yahoo, GA was nowhere to be seen, and MSN Search didn't return either the GA or YA pages.

Search engines trust their own answers.

For the Greater Google…Part II

Wednesday, May 17th, 2006

Back to the big question of “How can we make Google even better?” (and I’m not sure a death ray is the best way to go here)

My big suggestion for the day is…

Fixing Ctrl-C. One of the most basic tasks in working with text is the old cut-and-paste. It’s also one of the most @#$%^&*! intolerable. How many times have you tried to simply paste text from a web page into a document, only to have it come out as gibberish, or with screwy line breaks, or with a ton of unwanted characters, or with invisible code suddenly made visible?

And tables…oy vey! Pasting text from web columns into a spreadsheet is a fool’s errand, as likely to result in all the text being dumped into a single cell as in any sort of neat, usable, formatted table. Cutting and pasting text and tables from PDF files is a Sisyphean task, and attempting to cut/paste from a Google cache will more than likely freeze your window.

Use ‘Paste-Special’, you say? To which I say…Hah!

This isn’t all Google’s fault, of course. But that’s just the problem: it’s nobody’s fault…and nobody’s working to fix it. If only Google would make it their mission, what an act of public service they would be performing. Heck, they’re all geniuses. Coming up with a good Ctrl-C/Ctrl-V fix should take ’em about ten minutes.

They’ve already got their Google Notebook in beta, so what better platform for working on the cut-and-paste problem?

It’s 11:00 a.m. in Washington DC as I post this. I’ll be looking for the fix by, shall we say, noon?

David Sarokin aka pafalafaga

The “allintext:” modifier

Wednesday, May 17th, 2006

Google supports various modifiers that you can use to refine your search query. An interesting one is “allintext:”. If you place this modifier before your query, Google will only return pages which contain all the query terms in the text of the page.

You can see this modifier in action by searching for [“to be or not to be”] (the square brackets indicate the beginning and end of the search text; you don’t type them in). Amongst the pages returned are some that don’t match the phrase exactly.

For example, in fourth position is a children’s page about two bees called 2Bee and Queen Nottoobee. Perhaps some other web page links to this page with the text “to be or not to be” as the hyperlink, or perhaps Google is just being terribly clever in returning this page, which certainly doesn’t contain the search term.

No problem: you can search instead for [allintext:“to be or not to be”]. Now the results drop by about a million, and every page has “to be or not to be” highlighted in its snippet.

Similarly, the “allintext:” modifier is useful for removing pages where the search words appear in the URL or page title but not in the page content.
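To make the difference concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the `Page` record and the matching functions are hypothetical, it checks individual words rather than exact phrases, and it ignores link text and ranking entirely. It simply contrasts a match that may come from the URL, title or body with an “allintext:”-style match that only looks at the body.

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical page record used only for this illustration.
    url: str
    title: str
    text: str


def words(s: str) -> set[str]:
    # Lower-case and split on anything that is not a letter or digit,
    # so "hamlet-soliloquy" yields {"hamlet", "soliloquy"}.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def default_match(page: Page, terms: list[str]) -> bool:
    # Each term may come from the URL, the title or the body text.
    found = words(page.url) | words(page.title) | words(page.text)
    return all(t in found for t in terms)


def allintext_match(page: Page, terms: list[str]) -> bool:
    # "allintext:"-style: every term must appear in the body text itself.
    body = words(page.text)
    return all(t in body for t in terms)


query = ["hamlet", "soliloquy"]
url_only = Page("http://example.com/hamlet-soliloquy", "Hamlet", "A famous speech from Act 3.")
in_body = Page("http://example.com/notes", "Notes", "Hamlet's soliloquy begins here.")

print(default_match(url_only, query), allintext_match(url_only, query))  # True False
print(allintext_match(in_body, query))  # True
```

The first page only “matches” because of its URL, so the body-only check drops it, while the second page survives because the words really are in its text.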

I can see you!

Monday, May 15th, 2006

Probably one of the most widespread, yet also most overlooked, phenomena of the Internet Age is that almost everybody has at least a vague idea of the incredible amount of information that can be found online, while hardly anyone can imagine that he or she is leaving footprints in the Data Desert that anyone can find without being a hacker.

A few days ago, for example, I met the sister of a friend of mine. Being of Indian descent (India in the Orient, that is), she has a name that is not exactly one you would stumble across on every street corner. Just for fun, I googled her name, and I instantly knew what she was studying, at which university, and some other details. Nothing spectacular. The next time I met her, I casually mentioned what I knew and she was utterly surprised.

On another occasion, when I was working on a question for Google Answers, I started with the name of a little-known, highly specialized medical journal a customer wanted to contact. After half an hour, I not only knew who the publisher was, I also had his exact address, a photo of his house, and even the kind of wood the floor in his living room is made of, to mention only a few things. And I'm pretty sure I could have found out even more about him.

It is a bit frightening how much information about ordinary private individuals you can find if you only know how to search for it and how to combine the bits and pieces. And the number of puzzle pieces available to the experienced online researcher seems to be growing enormously. Next time you walk down the street, look at the people around you and ask yourself: Do they have the slightest idea of what you could find out about many of them, without even having to leave your computer desk at home?

Maybe it's better they don't know. Bismarck once said: “People can sleep peacefully as long as they do not know how sausages and politics are made.” And what can be found about them on the Web, I'd like to add.

For the Greater Google…

Monday, May 15th, 2006

All right, Google. You and Yahoo are cloning each other’s services (gigabyte e-mail, anyone?). Microsoft’s breathing down your neck. AltaVista’s looking to make a comeback with cool, new, knock-yer-socks-off search features.

How do you stay on top in the searching-for-everything business?

There are a few things I’d like to see added to Google searching that can really make a difference. I’ll begin by mentioning a few today, and will grow the list over the course of a few posts.

It would be nice to also see the list grow with comments from fellow bloggers…I can’t be the only one with an unrequited yearning for new and cool power-search features, such as…

Special characters

No, I don’t mean Donald Duck or Donald Trump (not that they’re not both special). I’m talking about keyboard characters that we all use a zillion times over, but that no search engine on the face of the planet has opted to fully index.

Take the almighty dollar sign. Sometimes Google recognizes it, sometimes not. Google has a splendid number range feature that can search for all numbers between, say, 5,000 and 10,000. But suppose I’m not interested in all numbers. Suppose I just want to zero in on webpages that mention prices and costs…and not in Yen or Pounds or Euros. Just good ol’ dollars. A search that consistently recognized $ could distinguish between ‘5,000 people spent $100 each…’ and ‘100 people spent $5,000 each…’. And of course, the same feature for Yen and Pounds and Euros makes just as much searching sense in this globally-connected age of ours.
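A search engine that treats $ as meaningful could, for instance, pull dollar amounts out of a page with a pattern like the one below. This is a hypothetical Python sketch, not how Google (or anyone else) actually indexes prices; the regular expression only handles simple forms such as $100, $5,000 and $4.99.

```python
import re

# Hypothetical pattern: "$", then digits (optionally grouped with commas),
# then optional cents. Bare numbers such as "5,000" do not match.
DOLLARS = re.compile(r"\$(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?")


def dollar_amounts(text: str) -> list[float]:
    """Return every dollar amount mentioned in the text, as a number."""
    amounts = []
    for m in DOLLARS.finditer(text):
        whole = int(m.group(1).replace(",", ""))
        cents = int(m.group(2)) if m.group(2) else 0
        amounts.append(whole + cents / 100)
    return amounts


def mentions_price_in_range(text: str, low: float, high: float) -> bool:
    # A "number range" search restricted to dollar amounts only.
    return any(low <= a <= high for a in dollar_amounts(text))


a = "5,000 people spent $100 each"
b = "100 people spent $5,000 each"
print(dollar_amounts(a))  # [100.0]
print(dollar_amounts(b))  # [5000.0]
```

With that in place, a price search between 5,000 and 10,000 dollars would pick up the second sentence but not the first, even though both contain the numbers 100 and 5,000.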

Percents. Same thing, sort of, for the percent sign. It would be very cool and very convenient to be able to specify that a search is looking for numbers-as-percents, rather than any old number that happens to happen by. The percents issue also brings up the topic of symbol and numeral ‘translations’ (for want of a better term), of which there is more below.
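Recognizing numbers-as-percents is a similar pattern-matching job. The sketch below (Python, with a deliberately simple and hypothetical pattern) keeps a number only when it is followed by %, “percent” or “per cent”, so “25 people” is ignored while “10%” is kept.

```python
import re

# Hypothetical pattern: a number followed by "%", "percent" or "per cent".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*(?:%|percent\b|per cent\b)", re.IGNORECASE)


def percentages(text: str) -> list[float]:
    """Return only the numbers that are written as percentages."""
    return [float(m.group(1)) for m in PERCENT.finditer(text)]


text = "Sales rose 10% while 25 people left; the margin was 3.5 per cent."
print(percentages(text))  # [10.0, 3.5]
```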

Escargot. Bet you didn’t know that’s what the French sometimes call the @ sign. It does look like a snail, non? The nefarious, ubiquitous, eminently spammable at-sign…Why the hell can’t we search for it? Obvious, you say? Because the spammers would then be able to get email addresses off the internet? Well…big news flash!…they already can, using specialized harvesting software. But for the rest of us, searching for the occasional email address is made overly difficult by the inability of search engines to recognize @ as part of a query. I, for one, would love to be able to find some companies’ email addresses, so I could contact a few of their customer service humans when I’m feeling the need for some human-to-human customer service.
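Part of the problem is tokenization: if an indexer splits text on every non-alphanumeric character, the @ vanishes and an address becomes a handful of unrelated words. The Python sketch below contrasts such a naive tokenizer with one that keeps e-mail addresses whole. Both functions are hypothetical, and the address pattern is a rough approximation, not a full implementation of the e-mail address standard.

```python
import re

# Rough e-mail pattern (an approximation; real addresses can be more varied).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def naive_tokens(text: str) -> list[str]:
    # Split on anything that is not a letter or digit: the "@" disappears
    # and an address turns into several unrelated words.
    return re.findall(r"[a-z0-9]+", text.lower())


def at_aware_tokens(text: str) -> list[str]:
    # Pull e-mail addresses out first and index each one as a single token,
    # then tokenize whatever text is left over as usual.
    emails = [e.lower() for e in EMAIL.findall(text)]
    rest = EMAIL.sub(" ", text)
    return emails + naive_tokens(rest)


line = "Write to help@example.com for service."
print(naive_tokens(line))     # ['write', 'to', 'help', 'example', 'com', 'for', 'service']
print(at_aware_tokens(line))  # ['help@example.com', 'write', 'to', 'for', 'service']
```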

Symbol/Number/Name/Abbreviation Translations.

Have you ever wanted to search for an exact hit on 10%? I have. But how many ways can 10% be written…! 10%. 10 percent. 10 per cent. Ten percent. Ten per cent. And there are probably a few that I’m missing. So, not only should Google learn to recognize the percent sign, it should learn to ‘translate’ some mainstream items like %, so that my poor carpal-tunneled fingers don’t have to type the same phrase six different ways. In addition to percents, Google should be able to translate “a hundred dollars” as “100 dollars” and “$100”. Bob and Bill should translate as Robert and William, Corp as Corporation, Ltd as Limited, and “Baltimore & Ohio” as “Baltimore and Ohio”. In essence, Google already ‘translates’ misspellings, so why not take the next logical step?
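Here is a rough Python sketch of what such ‘translation’ might look like as query expansion. The equivalence tables are hypothetical and tiny, made up for illustration; a real search engine would need far larger lists (or would learn them from data).

```python
from itertools import product

# Hypothetical, hand-made equivalence tables for illustration only.
NUMBER_WORDS = {"10": "ten", "100": "a hundred"}
SYNONYMS = {
    "bob": ["robert"],
    "bill": ["william"],
    "corp": ["corporation"],
    "ltd": ["limited"],
    "&": ["and"],
}


def expand_percent(n: str) -> list[str]:
    forms = [f"{n}%", f"{n} percent", f"{n} per cent"]
    if n in NUMBER_WORDS:
        word = NUMBER_WORDS[n]
        forms += [f"{word} percent", f"{word} per cent"]
    return forms


def expand_dollars(n: str) -> list[str]:
    forms = [f"${n}", f"{n} dollars"]
    if n in NUMBER_WORDS:
        forms.append(f"{NUMBER_WORDS[n]} dollars")
    return forms


def phrase_variants(query: str) -> list[str]:
    # Offer each word plus its known equivalents, then build every combination.
    options = [[w] + SYNONYMS.get(w.lower(), []) for w in query.split()]
    return [" ".join(combo) for combo in product(*options)]


print(expand_percent("10"))
# ['10%', '10 percent', '10 per cent', 'ten percent', 'ten per cent']
print(expand_dollars("100"))
# ['$100', '100 dollars', 'a hundred dollars']
print(phrase_variants("Baltimore & Ohio"))
# ['Baltimore & Ohio', 'Baltimore and Ohio']
```

The engine would then quietly search for any of the variants, so nobody has to type the same phrase six ways.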

That’s all the brain dump I’m dumping for now.

Till next time…

Dave aka pafalafaga

Google knows all

Friday, May 12th, 2006

If I’m logged in when I do a Google search, and I click on one or more of the results, Google remembers my click.

If I later do another search, and those same web pages appear in the results, Google tells me how many times I’ve been there and when my most recent visit was.

Google truly knows all.


Google Trends

Wednesday, May 10th, 2006

Google Trends is a new service from Google Labs that shows you the search trends for any high-volume search term. Here are a few examples:

You can even break down the results by region. It’s like having your own Google Zeitgeist. One could waste a lot of time here…

How Google overtook AltaVista

Tuesday, May 9th, 2006

Before Google rose to dominance, AltaVista was king of search.

Google's PageRank algorithm is usually credited as enabling Google to steal the crown from AltaVista, but it's not quite as simple as that.

AltaVista ranked web pages for relevance based on the content of the page – and did a reasonable job of it – but AltaVista did not analyse the structure of the web itself.

Google looked at the incoming and outgoing links for each page, and deduced a page rank from that data, on the assumption that people were more likely to link to a worthwhile page than to a useless one.
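The link-counting idea can be sketched with the textbook PageRank formulation: each page passes a share of its rank to the pages it links to, plus a small ‘random jump’ share to every page, repeated until the numbers settle. This Python version, with the commonly cited damping factor of 0.85, is a simplified illustration and certainly not Google’s production ranking.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank computed by power iteration.

    `links` maps each page to the pages it links to. A page with no
    outgoing links spreads its rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


web = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
    "orphan": ["home"],  # links out, but nothing links to it
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:10} {score:.3f}")
```

The page everyone links to ends up on top, and the page nobody links to ends up at the bottom, which is exactly the “worthwhile pages attract links” assumption at work.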

That simplistic assumption was true in those days. Link farms and other forms of link spam did not yet exist. Sure, there was plenty of keyword stuffing (particularly in the webpage meta-tags, which AltaVista paid some attention to), but Google's PageRank did the trick and separated the quality pages from the chaff.

The real brilliance, though, was a marketing coup – the “I'm Feeling Lucky” button. Google was the only search engine with good enough results that you could get directly to, say, the Hewlett Packard home page by entering hewlett packard into a search box and clicking a button. Nowadays, the “I'm Feeling Lucky” button doesn't seem to work quite as well as it used to – probably due to spam – but there are still people who use it regularly.

Even so, I don't think that PageRank and “I'm Feeling Lucky” alone would have been enough for Google to take over, but Google made two other important innovations.

The first is so simple that it now seems obvious, yet other search engines had not done it. I'm talking about making search terms combine with “AND” by default. In other words, if you enter purple robin into the search box you will only see pages that include the word robin AND the word purple.

In contrast, earlier search engines usually combined search terms with “OR” by default (although they generally provided an “AND” operator for advanced queries).

Google's default choice of “AND” produced shorter and more relevant results, and also allowed a search to be easily refined by simply adding more search terms, making it easy to “home in” on the desired page. There was also no more need for a “search within results” option, although Google still provides this – I suppose for sentimental reasons.
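Under the hood, the difference between the two defaults is just set intersection versus set union over an inverted index (a map from each word to the pages containing it). A minimal Python sketch, with made-up documents:

```python
def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: word -> ids of the documents containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str, mode: str = "AND") -> set[int]:
    postings = [index.get(w, set()) for w in query.lower().split()]
    if not postings:
        return set()
    if mode == "AND":
        return set.intersection(*postings)  # every term must be present
    return set.union(*postings)             # any term will do


docs = {
    1: "purple robin spotted",
    2: "robin hood film",
    3: "purple prose",
    4: "a purple robin nest",
}
idx = build_index(docs)
print(search(idx, "purple robin", "OR"))        # {1, 2, 3, 4}
print(search(idx, "purple robin", "AND"))       # {1, 4}
print(search(idx, "purple robin nest", "AND"))  # {4}
```

Under AND, each extra word can only shrink the result list, which is what makes homing in so easy; under OR, each extra word makes the list longer.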

Finally, Google advanced the state-of-the-art by providing decent snippets. AltaVista snippets consisted of the first few sentences of the web page, but this often included navigation words and menu choices. Google showed a relevant extract with the search terms highlighted, which made it much easier to see which results were worth clicking on.
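A query-dependent snippet can be approximated by sliding a window over the page’s words, keeping the window with the most query-term hits, and highlighting those hits. The Python sketch below is a hypothetical illustration of the idea, not Google’s actual snippet algorithm; the `width` and `lead` parameters are arbitrary choices.

```python
import re


def snippet(text: str, terms: list[str], width: int = 8, lead: int = 2) -> str:
    """Pick the `width`-word window with the most query-term hits, start it
    `lead` words before its first hit, and wrap hits in <b>...</b>."""
    words = text.split()
    wanted = {t.lower() for t in terms}

    def is_hit(word: str) -> bool:
        # Compare without punctuation, so "robin," matches "robin".
        return re.sub(r"\W", "", word.lower()) in wanted

    best_start, best_hits = 0, -1
    for start in range(max(1, len(words) - width + 1)):
        hits = sum(is_hit(w) for w in words[start:start + width])
        if hits > best_hits:
            best_start, best_hits = start, hits

    # Shift right so the snippet opens just before the first hit; this never
    # drops a hit, because no hit lies between best_start and first_hit.
    window = words[best_start:best_start + width]
    first_hit = next((best_start + i for i, w in enumerate(window) if is_hit(w)), best_start)
    start = max(best_start, first_hit - lead)
    return " ".join(f"<b>{w}</b>" if is_hit(w) else w for w in words[start:start + width])


page = ("Home About Contact Login Search Our guide to the purple robin, "
        "a rare bird seen in spring.")
print(" ".join(page.split()[:8]))          # first-words snippet
# Home About Contact Login Search Our guide to
print(snippet(page, ["purple", "robin"]))  # query-dependent snippet
# to the <b>purple</b> <b>robin,</b> a rare bird seen
```

The first line mimics the “first few words of the page” approach and is all navigation; the second skips straight to the part of the page the searcher actually cares about.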

In my opinion, those factors are what enabled Google to rise to dominance. Of course, there were also some things that AltaVista did better. I will make them the subject of a later post – but suffice to say that they weren't enough!