Archive for the 'Research' Category

UofM says paid Q&A sites get best results

Tuesday, April 15th, 2008

Harper, Raban, Rafaeli & Konstan from the University of Minnesota have investigated online Q&A services to find predictors of answer quality. In their paper, they report that:

First, you get what you pay for in Q&A sites. Answer quality was typically higher in Google Answers (a fee-based site) than in the free sites we studied, and paying more money for an answer led to better outcomes. Second, we find that a Q&A site’s community of users contributes to its success.

This doesn’t come as a surprise to me, but the breakdown of judged answer quality is interesting:

  • Quality score 0.68 – Google Answers $30 questions
  • Quality score 0.59 – Google Answers $10 questions
  • Quality score 0.51 – Yahoo Answers
  • Quality score 0.41 – Google Answers $3 questions
  • Quality score 0.41 – Library reference services
  • Quality score 0.40 – Microsoft Live QnA
  • Quality score 0.33 – AllExperts

Bobbie7’s answer to “Which actress has the first female line in a talking movie?” was highlighted in the report.

One of the luxuries of academic research is being able to take your time. The paper was published earlier this month, but Google Answers closed in 2006.

Dodging misinformation

Tuesday, February 5th, 2008

In my previous post I mentioned a heuristic which we can use to help judge the reliability of what we read: Has a fact been derived from a single grand vision, or from many different ideas?

The idea is that a fact derived from the convergence of many different ideas is more likely to be reliable and robust than one derived from a single grand vision (which might turn out to be a single grand delusion).

What are some other heuristics that we can use when trying to dodge misinformation? There are plenty of good suggestions at Google Answers, on a question entitled Quality of Information. The question itself is delightful to read, because it’s so beautifully written.

Here is a synthesis of ideas offered by pinkfreud-ga (who posted the answer), plus commenters journalist-ga, j-philipp-ga, aceresearcher-ga, luciaphile-ga and voila-ga (all of whom were Google Answers Researchers).

  • The context of a website can cast doubt on the authority of its content. Can you trust information placed on a site whose purpose seems to be not the spread of knowledge, but the spread of animated graphics, uninvited MIDI music and intrusive pop-ups? (it's a joke)
  • Likewise, the purported authorship of a website can cast doubt on the authority of its content. Would you rely on a source identified only as armadillogirl?
  • Poor spelling can be a warning (Kemlo’s posts excepted, of course). If someone hasn’t taken the time to check the spelling of their text, are they likely to have taken the time to check its correctness? It’s like your local takeaway: there’s no hygiene reason why the outside of their windows must be clean, but if they have cleaned their windows then they have probably also cleaned in more important places.
  • Does the website have an identifiable agenda? Even if it’s unrelated to the information that you are interested in, an agenda increases the possibility that the information has not been assembled in a rigorous and balanced way.
  • Is the site hosted at a location that is generally used for the dissemination of information? It’s prejudice I know, but a MySpace site may not be as reliable as an .edu site.

This might all sound a bit discouraging, but there are also some positive indicators:

  • Are sources cited?
  • Has time and care been taken to lay out the site?
  • Does the site appear to be motivated by the desire to spread knowledge?
  • Is the site regularly kept up-to-date?

Some types of content should raise warning flags. This kind of content is not always untrustworthy, but you do need to cross-check:

  • Quotes and their origins. Misinformation gets repeated as gospel. You need to check the original source if it is at all possible.
  • Something unbelievable yet somehow compelling. This is the stuff of which urban legends are made. If it was so unbelievable yet true, it would be more widely mentioned and discussed than on niche websites.
  • A “fact” that could hurt the reputation of a person or a company. It’s quite likely that it’s nothing more than malicious fiction.
  • Someone wants to sell something. Maybe use their website if you want to buy what they are selling. Don’t rely on the website for anything else.
  • Someone is seeking help of some sort by a mass appeal. Way less than one percent of these are going to be genuine. If you want to help, cross-check with other sources. A genuine appeal will be verifiable.
  • Someone is warning that your health (or that of your PC) is in peril. As if they would really know which files you should delete on your computer, or where you should send your bank account details, to fix this fake “problem”.
  • Something is claimed to be true but cannot be explained by science. Well maybe it is true, but if neither they nor you can prove it, then all bets are off. Believe in it if you like, but don’t pass it on as a fact.
  • Knowledge is claimed to be suppressed by a conspiracy. It’s tempting to dismiss all of these claims as being made by nut-cases, but history shows that some of the conspiracy theories turn out to be true. The problem is that usually no-one knows which ones are true until the government archives are released fifty years later. If you want to establish claims of a conspiracy as fact, you need to look for evidence elsewhere, and not simply accept the word of the conspiracy site.
  • An extremely good or bad claim. So you’ve apparently won the lottery, or have a rare disease and will die tomorrow? It might be worth looking for evidence elsewhere.

J_philipp-ga warns us about statistics in general: There’s a saying that statistics are like bikinis. (“What they reveal is suggestive, but what they conceal is vital.”) Unless a website has a good reason to make a statistic out of the information, look further and use the original data instead.

Finally, pinkfreud-ga gives us a thought to ponder:

Brownie points for good humor and wit (genuinely funny people are, in my experience, more careful with facts and tend to be more trustworthy than are humorless wretches).

I never thought of it that way before, but I agree, and I like it that way.

Authoritative Misinformation

Thursday, January 10th, 2008

So you want some information, and you need it to be more reliable than the average web page. Who do you turn to?

You make the effort to track down an authority.

Not so fast. It doesn’t always work.

Pope Urban VIII, who was learned in the sciences, ratified the statement that “The proposition that the Earth is not the centre of the world and immovable but that it moves, and also with a diurnal motion, is equally absurd and false philosophically and theologically considered at least erroneous in faith.”

George Bush Junior and Tony Blair were quite sure that Iraq was stockpiling weapons of mass destruction.

OK, so maybe it’s not the best idea to depend upon political or religious authorities. What kind of authorities can we rely on?


Not so fast.

All scientists? Would you rely on scientists employed by a drug company, if you’re seeking information about that company’s drug? No? How about government scientists then?

That wouldn’t have helped if you had wanted to know about a possible connection between BSE and CJD in the 1990s. New Statesman suggests that the “independent” scientists contracted by the Ministry of Agriculture, Fisheries and Food were anything but. And Dr Harash Narang, the BSE/CJD whistleblower, was stripped of his authority as a result of his persistent warnings about BSE’s linkage to CJD.

What about a scientist of Einstein’s calibre? Sure, he got the theory of relativity right, but he was too quick to dismiss Alexander Friedman’s “expanding universe” solution to his (Einstein’s) equations, and also to dismiss Lemaitre’s early insights towards what would become Hubble’s Law.

So what can you do, as a researcher, if you can’t depend on authority? In a slow-moving field, you can look for a developed consensus. The Wikipedia “talk” pages can help you to discover whether consensus has formed or not.

In a quick-moving field, you can examine the evidence and draw your own conclusions – but that’s not easy unless you’re a specialist in the field. It can help to use sites such as zFacts which attempt to distill the salient facts about controversial issues.

Sometimes it’s worth checking out the “debunking” sites. Uncomfortable though it may seem, sometimes it’s easier to establish the validity of a debunking than the validity of the original postulate. is often worth checking out. It’s not intellectual but it often has its finger “on the pulse”.


Whether you have faith in what is said by authority figures, or whether you know that they can be as fallible and misguided as the rest of us, you may enjoy Christopher Cerf’s book The Experts Speak: The Definitive Compendium of Authoritative Misinformation. It’s by no means the only book in this nook though. There’s also 776 Stupidest Things Ever Said and Facts and fallacies: A book of definitive mistakes and misguided predictions.

On a more serious note there’s Expert Political Judgment: How Good Is It? How Can We Know? One of the reviews at the Amazon page for that book says in part:

This book is a rather dry description of good research into the forecasting abilities of people who are regarded as political experts. It is unusually fair and unbiased. His most important finding about what distinguishes the worst from the not-so-bad is that those on the hedgehog end of Isaiah Berlin’s spectrum (who derive predictions from a single grand vision) are wrong more often than those near the fox end (who use many different ideas).

Now that’s really interesting because it gives us a heuristic which we can use to help judge what we read: Has a fact been derived from a single grand vision, or from many different ideas?

I like that.

In a future Web Owls post I will explore other heuristics that we can make use of when trying to dodge misinformation.

Researching with Wikipedia

Thursday, November 16th, 2006

Previously on this blog we’ve considered whether Wikipedia can be a legitimate research source – and generally agreed that it can. Wikipedia itself has a page of tips on Researching with Wikipedia.

The page points out that “not everything in Wikipedia is accurate, comprehensive, or unbiased”, then discusses how to work around this. Most of the points they make will be obvious to seasoned researchers, but there are a few interesting twists.

Unlike a printed encyclopedia, you can examine the entire editing history of an article. This often gives an insight into opposing viewpoints on controversial issues.

You can also click on “what links here” (in the “Toolbox” section) to see which other Wikipedia articles consider the current one worth linking to. The range and nature of the linking articles can provide insight into how the subject matter fits into the bigger picture.

Another approach is to turn the research process into an active two-way exchange. Is there something in the article that is unclear, or which might be misleading or incomplete? Go to the corresponding “Talk” page and begin a dialogue with other Wikipedians. That way, you don’t just have to adopt a “take it or leave it” approach to Wikipedia – you can have a much more active involvement.

So, Wikipedia is very different from a conventional research source. Sometimes you can exploit those differences and make them work for you.

British surname?

Sunday, September 3rd, 2006

If you are British, or are descended from a British ancestor, then the surname profiler web site may be of interest.

Type in your surname and a map will show the distribution in the UK of the name for 1881 and 1981.

I suggest you read the ‘small print’ to understand how the data was collected and what it means.

Google Answers Researcher Interviews

Wednesday, August 23rd, 2006

Back in 2003, Philipp Lenssen of Google Blogoscoped ran a series of interviews with Google Answers researchers. The interviews have been approved by Google, but do not in any way represent Google’s views.


The interviews make interesting reading, and most of the researchers interviewed are still active on Google Answers.

There are interviews with clouseau-ga, digsalot-ga, easterangel-ga, journalist-ga, justaskscott-ga, knowledge_seeker-ga, larre-ga, leli-ga, missy-ga, omnivorous-ga, pafalafa-ga, pinkfreud-ga, politicalguru-ga, robertskelton-ga, scriptor-ga, sublime-ga, tehuti-ga, till-ga, tisme-ga, tlspiegel-ga and voila-ga (who writes that a book about her life would start with the sentence “Albatross, flavored dental floss, wing-ed fleeber’s dross; roll softballs in my salad”).

Incidentally, Philipp Lenssen recently had a positive experience with Yahoo Answers.

Domesday Book now online.

Monday, August 7th, 2006

In 1085 William the Conqueror’s England was under threat of invasion by the Danes. In order to find out the financial and military resources available to him, he ordered that a survey should be undertaken throughout England. The results of this great survey are known as The Domesday Book.

Copies of the survey are now available online at the UK’s National Archives. There is a searchable database for place and people’s names, and plenty of background information on how the survey was conducted, the questions asked, and what life was like in 11th century England. Images of the pages of the book require a small payment.

Everyone v Google — Google lawsuits

Wednesday, July 5th, 2006

Taking a look at Google’s latest quarterly report (and, no, I don’t…Sob!…own any Google stock), I am reminded how often a large company like Google gets sued, and not just by litigious Americans:

Certain companies have filed trademark infringement and related claims against us over the display of ads in response to user queries that include trademark terms. The outcomes of these lawsuits have differed from jurisdiction to jurisdiction. Courts in France have held us liable for allowing advertisers to select certain trademarked terms as keywords. We are appealing those decisions. We were also subject to two lawsuits in Germany on similar matters where the courts held that we are not liable for the actions of our advertisers prior to notification of trademark rights. We are litigating or recently have litigated similar issues in other cases in the U.S., France, Germany, Italy, Israel and Austria. Adverse results in these lawsuits may result in, or even compel, a change in this practice which could result in a loss of revenue for us, which could harm our business.

Certain entities have also filed intellectual property claims against us, alleging that features of certain of our products, including Google Web Search, Google News, Google Image Search, and Google Book Search, infringe their rights. Adverse results in these lawsuits may include awards of damages and may also result in, or even compel, a change in our business practices, which could result in a loss of revenue for us or otherwise harm our business.

From time to time, we may also become a party to other litigation and subject to claims incident to the ordinary course of business, including intellectual property claims (in addition to the trademark and copyright matters noted above), labor and employment claims, breach of contract claims, and other matters…

This prompted me to do a nicely self-referential Google search on “v Google” to see what popped up:

Perfect 10 v. Google, Inc.

…Perfect 10 v. Google, Inc., et al….was a U.S. court case between an adult men’s magazine and the world’s leading search engine company, decided by the district court of the Central District of California in early 2006. The plaintiff requested an injunction for Google to stop creating and distributing thumbnails of its images in its Google Image Search service, and for it to stop indexing and linking to sites hosting such images. The court granted the request in part and denied it in part, ruling that the thumbnails were infringing but the links were not.

GEICO v. Google

…GEICO has filed suit against two major Internet search engine operators, Google Inc. and Overture Services Inc., in an effort to suppress advertising by competing insurance companies and online insurance brokers.


…Field contends that by allowing Internet users to access copies of 51 of his copyrighted works stored by Google in an online repository, Google violated Field’s exclusive rights to reproduce copies and distribute copies of those works.

Lane’s Gifts v. Google

…You may remember that last February, Google was sued in Arkansas over what is commonly called click fraud. We’re very near a resolution in that case, so we thought we’d offer an update…Google currently allows advertisers to apply for reimbursement for clicks they believe are invalid. They can do this for clicks that happen during the 60 days prior to notifying Google. Under the agreement with the plaintiffs, we are going to open up that window for all advertisers, regardless of when the questionable clicks occurred.

Gonzales v. Google, Inc.

…The U.S. Department of Justice filed a motion in federal court seeking a court order that would compel search engine company Google, Inc. to turn over “a multi-stage random sample of one million URL’s” from Google’s database, and a computer file with “the text of each search string entered onto Google’s search engine over a one-week period (absent any information identifying the person who entered such query

Authors Guild v. Google

…Class Action Lawsuit Alleging Copyright Infringement


…The eleven claims are: (1) direct copyright infringement, (2) contributory copyright infringement, (3) vicarious copyright infringement, (4) defamation, (5) invasion of privacy, (6) negligence, (7) Lanham Act violations, (8) and (9) racketeering, (10) abuse of process, and (11) civil conspiracy.

v. Google

…Google has been sued for downgrading the PageRank of websites in contravention of its stated “objective” policies. In KinderStart’s case, they got kicked out of Google in March 2005 and immediately lost 70% of their traffic. Google is now 0.01% of KinderStart’s referral traffic.

Toback v. Google

…Sensing that his 15 minutes of fame was up, Jeffrey Toback has withdrawn his lawsuit against Google regarding Google’s alleged facilitation of child pornography.

Agence France-Presse v. Google Inc.

…The French news agency AFP (Agence France-Presse) sued Google Inc. before the U.S. District Court in Washington, D.C., for pulling together photos and story excerpts from thousands of news Web sites (see Update 29). In its brief filed Oct. 12 Google argued that news headlines that are purely factual and merely ten words long lack sufficient originality to preclude others from copying them. Google also seeks dismissal of the lawsuit on the ground that Agence France has failed to identify the allegedly infringed works with sufficient precision.

McGraw-Hill v. Google

…McGraw-Hill v. Google is the latest publisher lawsuit against the Google Library Project (via Copyfight). The complaint seems basic, again claiming the project is copyright infringement.

Digital Envoy, Inc. v. Google, Inc

Google has won a case brought against it by a specialist in location technology, reports. A district judge in northern California last month dismissed a lawsuit filed by Digital Envoy that alleged that Google was in breach of a contract and was illegally profiting from Digital Envoy’s software. Digital Envoy owns software that pinpoints the physical locations of Internet surfers to deliver advertisements.


And from Lexis-Nexis (no links…sorry!) I also came up with:

Advanced Internet Techs., Inc. v. Google, Inc

…Google sells advertising online. The plaintiffs in the present action allege that Google has fraudulently over-billed those who purchase advertising from it on a per-click basis.


…Plaintiff, NetJumper, Inc…initiated this lawsuit, in which it complained that Google, through an application of its “Google Toolbar”, had infringed upon two of its patents…

Elwell v. Google, Inc

…Elwell challenges her demotion and reduction in pay during a high-risk pregnancy

Search King, Inc. v. Google Tech., Inc

…This case involves the interrelationship between Internet search engines and Internet advertising, and their collective connection to the First Amendment. More specifically, the questions at issue are whether a representation of the relative significance of a web site as it corresponds to a search query is a form of protected speech, and if so, whether the “speaker” is therefore insulated from tort liability arising out of the intentional manipulation of such a representation under Oklahoma law.

And lastly, we have:

SEC v. Google

in a case involving flagrant stock fraud:

…This is an action for permanent injunction, accounting, and for the surrender of monies received in connection with…Company’s unregistered public offering and sale of securities.


Involved in stock fraud!!! 

How could I not have known about this?

This warranted some further investigation.  Using my much-heralded skills as a researcher, I determined the full name of the case:


No doubt, Jonathan N. will soon be suing the Google corporate namesake for dragging his good name through the mud.

Or should it be the other way around…?

pafalafa-ga aka Dave Sarokin


Google Unanswers

Tuesday, June 6th, 2006

I know that you, like I, are fascinated by all things having to do with Google Answers; otherwise, how did you get here, and why are you reading this?

And sooner or later, every aficionado-ga comes face-to-face with an obvious conundrum: Why do so many Google Answers questions go unanswered?

This is a good question. But it, itself, is a hard one to answer.

Some reasons are obvious, like the customers who offer up their two bucks and want a list of every zip code in the United States, the names and addresses of each person within each zip code, whether they own or rent, and a special check mark next to all those who have blue eyes.

And there are other GA clients who establish a certain, er, reputation among the researchers for being difficult to please — endless clarifications, poor ratings, and unrelenting mission creep. Out of a sense of professional decorum, I offer no links to such queries.

Still, there are plenty of questions like the one about diesel and gas engines that seem answerable, but don’t get answered (the poor guy even asked a follow up question wanting to know why his other question wasn’t answered…but the follow-up is in danger of going unanswered too!).

Here’s another one that went begging, on medieval church history, and this one at a hundred bucks.

Nor is there any shortage of unanswered $200 questions, like the customer looking for marketing contacts at banks, or any of these.

Which brings us back to the main topic: Why do so many questions get no answers?

I have my pet notions about this, and I hope to post these in days to come (I’m learning not to promise ahead on posts that I may or may not be able to get around to…).

But in the meantime, what of my fellow researchers and GA question-askers, and even the inveterate peanut gallery…and of course, anyone else who stumbles across this post?

What do you all think?

pafalafa-ga aka David Sarokin


P.S. Here’s a good example of a problematic question, due to:

–extreme lowballing on the price

–not knowing quite when to stop asking for things (had they stopped at #1, they had a chance at getting an answer), and

–asking for information that probably doesn’t exist

Choosing a question to answer

Monday, June 5th, 2006

How do researchers choose which Google Answers questions they wish to answer?

Obviously, one major factor is to consider the price set by the customer, weighed against the expected time to find the answer. Another factor to consider is how interesting the research process will be, or how satisfying the answer will be. A further factor is whether the subject is one with which the researcher is already familiar.

For me, there’s one criterion that is more important than all the others. Perhaps surprisingly, the difficulty of the question is not the main issue. The main criterion is this: when I have found the answer, am I confident that the asker will find the answer acceptable?

If I can’t be confident that my answer will be acceptable, it opens the door to all kinds of disappointments: an answer that the asker would rather not be told, or an answer to a different question from the one that the asker really had in mind, or an answer that’s correct but at quite the wrong level. Occasionally the customer doesn’t even want an answer but wants a dialogue.

Sometimes these situations can be repaired using the clarification process, but sometimes it just isn’t possible to provide an alternative answer that satisfies.

So, if anyone reading this is considering asking a question, please make it crystal-clear what kind of answer would satisfy you!