Sunday, February 07, 2010

WSDM, Vowels Ltd.

WSDM 2010 conf took place at NYU/Poly last week. It is a conf primarily focused on web search and web data mining, with a strong attendance of 200 or so. It was good to see the Yahoo! folks (Andrei, Evgeniy, Vanya, Sergei, Baeza-Yates, ...), Microsoft folks (Rakesh Agrawal, Rina, Sreenivas Gollapudi, ...), some of the alg/database folks (A* Das Sarma brothers, Laks L, John Byers, Tomasz I, ...) and students (Aleksandra, ...).

Soumen Chakrabarti gave a plenary talk. Between the two extremes of current web search based on query terms and, say, natural language or an SQL-complete query language, Soumen identified a language (S-language?) with variables, predicates, and certain aggregates, suitable for web searches that are now a challenge. Then he discussed the main tasks in supporting this language. These include spotting (entities + context => larger labels, called spots), disambiguation (by connecting spots to Wiki) and ranking (proximity models + context scoring -> ranking, e.g., by reducing it to selecting rectangles from 2-dim points). This S-approach ultimately generates billions of microlinks between web pages and creates new info pathways in the web. Tom Mitchell asked, what is the weakest link in this approach? Quantifying the accuracy of various annotations (in particular, in terms of search accuracy) is hard. Of course, ranking beyond single snippets is a challenge that, to be formal, needs reasoning about probabilistic data. Other questions were how the S-approach differs from the semantic web (which orders the web, whereas this approach works with the disorder), NLP (there are other languages here besides natural languages, such as the language of tables, site organization, etc.), IR TREC, Question & Answer systems, and so on.

Susan Athey gave a plenary talk on Ad Marketplaces. She started with the mantra of Economics Theory + Empirics + Experiments => exciting world of ad auctions, exciting even for Economists who have seen big successes with auctions in reality. She gave a high-level view of auction-based platforms: the aspects of information feedback and dispersal that make them unique, the objectives that include long-term participation, first-order issues of competition, etc. She then argued for building structural and behavioral models for ad auctions so we can learn counterfactuals and predict out-of-sample situations, all disclaimers notwithstanding on the challenges of making such models stick. The bulk of her talk was on her forthcoming work with Denis Nekipelov on "Equilibrium and Uncertainty in Sponsored Search Advertising", which follows this methodology. One of the outcomes of this analysis is the guidance we now know well in sponsored search, that bidders need to operate where marginal cost per click equals their value, but the more interesting outcome seemed to be that they could plot data for what would happen if we had 20% more bidders, and other what-ifs. Also, stochastic budget optimization problems pop right out of their models. Tom Mitchell asked why there were negative dips in the plot (I gave Tom the answer after the talk: when you increase bids, you qualify for new keywords with higher reserves and spend more per click, reducing total clicks for your budget; there are other discontinuities in the system). I asked if they had looked at applying this methodology to display ads (vs sponsored search), which is a bigger beast, not yet understood, where the potential for new impactful work is much larger.
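
On the "marginal cost per click equals value" guidance above, here is a minimal sketch of how a bidder might apply it. This is not the model from the talk: the bid landscape numbers, the names `LANDSCAPE`, `marginal_cpc` and `best_bid`, and the assumption that marginal cost per click rises with the bid are all made up for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical bid landscape (illustrative numbers, not real data):
# each point is (bid, expected clicks, expected total cost), sorted by bid.
Point = Tuple[float, float, float]
LANDSCAPE: List[Point] = [
    (0.50, 10.0, 2.0),
    (1.00, 18.0, 6.0),
    (1.50, 24.0, 12.0),
    (2.00, 27.0, 18.0),
]


def marginal_cpc(landscape: List[Point]) -> List[Tuple[float, float]]:
    """Return (bid, incremental cost / incremental clicks) for each step up."""
    result = []
    prev_clicks, prev_cost = 0.0, 0.0
    for bid, clicks, cost in landscape:
        extra_clicks = clicks - prev_clicks
        extra_cost = cost - prev_cost
        if extra_clicks <= 0:
            raise ValueError("clicks must strictly increase with the bid")
        result.append((bid, extra_cost / extra_clicks))
        prev_clicks, prev_cost = clicks, cost
    return result


def best_bid(landscape: List[Point], value_per_click: float) -> Optional[float]:
    """Highest bid whose marginal cost per click does not exceed the value.

    Assumes marginal cost per click is non-decreasing in the bid, so we can
    stop at the first step that costs more per extra click than it is worth.
    Returns None if even the lowest bid is not worth it.
    """
    chosen = None
    for bid, mcpc in marginal_cpc(landscape):
        if mcpc > value_per_click:
            break
        chosen = bid
    return chosen


if __name__ == "__main__":
    print(marginal_cpc(LANDSCAPE))
    print(best_bid(LANDSCAPE, value_per_click=1.2))
```

With a value of 1.2 per click, the step from bid 1.50 to 2.00 buys 3 extra clicks for 6 extra in cost, i.e. 2.0 per marginal click, so the bidder stops at 1.50. The average cost per click at the 2.00 bid is still well below 1.2, which is exactly why the marginal (not average) quantity is the one to watch.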

Notes:
  • My small contribution (with Sergei and Sihem) to WSDM was to propose The Park for the banquet. The space worked out well, drinks flowed, food worked out less well.
  • There was a tweet stream running live on the screen during the talks. The speakers don't get to see the stream! :) There are a bunch of blogs about WSDM, e.g., Daniel.
Some random thoughts:
  • Did Wiki save IR research?
  • Is there a non-vector of features (bag of words) view of the world in IR, please?
  • Michael Jordan saves NBA, speaks for Nike and is a perennial web search example.
  • First price auction is the whipping post of ad auction research.

2 Comments:

Anonymous said...

# Did Wiki save IR research?

Not sure what this means. IR has been pretty active inside of web search with tasks such as learning to rank; outside of web search with domain-specific tasks and ranking (e.g. legal, enterprise) and evaluation (e.g. retrieval metrics, labeling policies).

# Is there a non-vector of features (bag of words) view of the world in IR, please?

Most recent ranking models are feature-based although few of the "learning to rank" approaches use words as features (although individual features may be derived from some sort of bow analysis). If you're looking for higher order n-grams, the research consistently suggests that the value really depends on the task, with a lot of results showing that 1-grams are frustratingly difficult to beat in performance.

10:47 AM  
xl pharmacy said...

Interesting piece of information. I'm very interested in this one; it is what I'm studying.

6:58 AM  
