Friday, March 21, 2014

Reading notes for Unit 9

19 Web search basics
A user searching for maui golf real estate is not merely seeking news or entertainment on the subject of housing on golf courses on the island of Maui, but is instead likely seeking to purchase such a property.
It is crucial that we understand the users of web search as well. This is again a significant change from traditional information retrieval, where users were typically professionals with at least some training in the art of phrasing queries over a well-authored collection whose style and structure they understood well. In contrast, web search users tend not to know (or care) about the heterogeneity of web content, the syntax of query languages, or the art of phrasing queries; indeed, a mainstream tool (as web search has become) should not place such onerous demands on billions of people.

To a first approximation, comprehensiveness grows with index size, although it does matter which specific pages a search engine indexes, since some pages are more informative than others. It is also difficult to reason about the fraction of the Web indexed by a search engine, because there is an infinite number of dynamic web pages.

21 Link analysis
In this chapter, I learned that the analysis of hyperlinks and of the graph structure of the Web has been instrumental in the development of web search. The chapter focuses on the use of hyperlinks for ranking web search results. Such link analysis is one of many factors a web search engine considers when computing a composite score for a web page on a given query.

Link analysis for web search has intellectual antecedents in the field of citation analysis, aspects of which overlap with an area known as bibliometrics. These disciplines seek to quantify the influence of scholarly articles by analyzing the pattern of citations amongst them. Much as a citation represents the conferral of authority from one scholarly article to another, link analysis on the Web treats a hyperlink from one web page to another as a conferral of authority. Clearly, not every citation or hyperlink implies such authority conferral; for this reason, simply measuring the quality of a web page by the number of in-links (citations from other pages) is not robust enough.
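To see why raw in-link counts are fragile, here is a minimal sketch that counts in-links from a list of (source, target) hyperlinks. The graph and page names are hypothetical: page "a" is cited by two independent pages, while "spam_target" receives links from three pages that one owner could create cheaply. Counting alone ranks the spam target first, which is the weakness the chapter points out.

```python
from collections import Counter


def in_link_counts(links):
    """Count in-links per page from (source, target) pairs, ignoring self-links."""
    # A page linking to itself confers no outside authority, so skip it.
    return Counter(target for source, target in links if source != target)


# Hypothetical toy graph: two independent pages cite "a", while three
# throwaway "farm" pages all point at "spam_target".
links = [
    ("b", "a"), ("c", "a"),
    ("farm1", "spam_target"), ("farm2", "spam_target"), ("farm3", "spam_target"),
]
counts = in_link_counts(links)
print(counts.most_common(2))  # [('spam_target', 3), ('a', 2)]
```

Nothing in the count distinguishes a link from an independent author from one manufactured for ranking purposes, which is part of what motivates the weighted schemes discussed in the chapter.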

Paper: Authoritative Sources in a Hyperlinked Environment
The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. The author develops a set of algorithmic tools for extracting information from the link structures of such environments, and reports on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue is the distillation of broad search topics through the discovery of "authoritative" information sources on those topics. The paper proposes an algorithmic formulation based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. This formulation has connections to the eigenvectors of certain matrices associated with the link graph, and these connections in turn motivate additional heuristics for link-based analysis. This is the core idea of the paper.
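The hub/authority relationship can be illustrated with a minimal pure-Python sketch of the mutually reinforcing iteration: a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of pages it links to. This is a sketch under stated assumptions, not the paper's exact procedure: it runs a fixed number of iterations (a hypothetical `iterations=50`) instead of testing for convergence, and it skips the paper's step of building a focused subgraph from a root set of search results. Repeated normalization is what ties the scores to the eigenvectors mentioned above: the authority vector tends toward the principal eigenvector of A^T A and the hub vector toward that of A A^T, where A is the adjacency matrix.

```python
import math


def hits(links, iterations=50):
    """Sketch of hub/authority scoring over (source, target) links."""
    pages = sorted({page for link in links for page in link})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages pointing to each page.
        auth = {p: 0.0 for p in pages}
        for src, dst in links:
            auth[dst] += hub[src]
        # Hub update: sum the new authority scores of the pages each page points to.
        hub = {p: 0.0 for p in pages}
        for src, dst in links:
            hub[src] += auth[dst]
        # Normalize both vectors to unit length so scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical toy graph: h1 and h2 link to both a1 and a2; h3 links only to a1.
links = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
hub, auth = hits(links)
print(max(auth, key=auth.get))  # a1
```

Here a1 ends up as the strongest authority because every hub points to it, and h1 and h2 outrank h3 as hubs because they point to more good authorities.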


Paper: The Anatomy of a Large-Scale Hypertextual Web Search Engine
