A user searching for maui golf real estate is not merely seeking news or entertainment on the subject of housing on golf courses on the island of Maui, but is instead likely to be seeking to purchase such a property.
It is crucial that we understand the users of web search as well. This is again a significant change from traditional information retrieval, where users were typically professionals with at least some training in the art of phrasing queries over a well-authored collection whose style and structure they understood well. In contrast, web search users tend not to know (or care) about the heterogeneity of web content, the syntax of query languages and the art of phrasing queries; indeed, a mainstream tool (as web search has become) should not place such onerous demands on billions of people. To a first approximation, comprehensiveness grows with index size, although it does matter which specific pages a search engine indexes: some pages are more informative than others. It is also difficult to reason about the fraction of the Web indexed by a search engine, because there is an infinite number of dynamic web pages.
21 Link analysis
What I learned from this chapter is that the analysis of hyperlinks and the graph structure of the Web has been instrumental in the development of web search. The chapter focuses on the use of hyperlinks for ranking web search results. Such link analysis is one of many factors considered by web search engines in computing a composite score for a web page on any given query.
Link analysis for web search has intellectual antecedents in the field of citation analysis, aspects of which overlap with an area known as bibliometrics. These disciplines seek to quantify the influence of scholarly articles by analyzing the pattern of citations amongst them. Much as citations represent the conferral of authority from a scholarly article to others, link analysis on the Web treats hyperlinks from one web page to another as a conferral of authority. Clearly, not every citation or hyperlink implies such authority conferral; for this reason, simply measuring the quality of a web page by the number of in-links (citations from other pages) is not robust enough.
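To make the last point concrete, here is a minimal sketch of plain in-link counting on a made-up graph. The page names and the helper in_link_counts are hypothetical, chosen only for illustration; the point is that a page linked only by pages created to boost it can outscore a genuinely cited page.

```python
from collections import Counter


def in_link_counts(links):
    """Count in-links per page from a list of (source, target) pairs."""
    counts = Counter()
    for _source, target in links:
        counts[target] += 1
    return counts


# Hypothetical toy graph: "spam" is linked only by three pages
# that exist to inflate its count; "reference" has two real citations.
links = [
    ("a", "reference"),
    ("b", "reference"),
    ("farm1", "spam"),
    ("farm2", "spam"),
    ("farm3", "spam"),
]

counts = in_link_counts(links)
for page, n in counts.most_common():
    print(page, n)
```

Raw counts treat every link as an equal vote, which is why link analysis methods weight a link by the standing of the page it comes from.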
Paper: Authoritative Sources in a Hyperlinked Environment
The structure of a hyperlinked environment can be a rich source of information
about the content of that environment, provided there are effective means for
understanding it. The author develops algorithmic tools for extracting
information from the link structures of such environments and reports on
experiments that demonstrate their effectiveness in a variety of contexts on
the World Wide Web. The central issue is the distillation of broad search
topics through the discovery of “authoritative” information sources on such
topics. The paper proposes an algorithmic formulation of authority based on the
relationship between a set of relevant authoritative pages and the set of “hub
pages” that join them together in the link structure. The formulation has
connections to the eigenvectors of certain matrices associated with the link
graph, which in turn motivate additional heuristics for link-based analysis.
This is the main point of the paper.
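The hub and authority relationship can be sketched as a simple iteration: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The code below is a rough illustration under stated assumptions, not the paper's implementation: the function name hits, the toy graph, the fixed iteration count and the use of L2 normalization are all my own choices, and the step of first building a query-focused subgraph is skipped entirely.

```python
import math


def hits(links, iterations=50):
    """Iterate hub/authority updates over (source, target) link pairs.

    Returns two dicts: hub scores and authority scores per page.
    """
    pages = sorted({page for edge in links for page in edge})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: add up hub scores of pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, dst in links:
            new_auth[dst] += hub[src]

        # Hub update: add up (new) authority scores of pages linked to.
        new_hub = {p: 0.0 for p in pages}
        for src, dst in links:
            new_hub[src] += new_auth[dst]

        # Normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Hypothetical toy graph: h1, h2, h3 act as hubs; a1, a2 as authorities.
links = [
    ("h1", "a1"), ("h1", "a2"),
    ("h2", "a1"), ("h2", "a2"),
    ("h3", "a1"),
]

hub, auth = hits(links)
print(sorted(auth.items(), key=lambda kv: -kv[1]))
print(sorted(hub.items(), key=lambda kv: -kv[1]))
```

This is where the eigenvector connection shows up: if A is the adjacency matrix of the link graph, repeating these two updates is power iteration, so the authority scores tend toward the principal eigenvector of A^T A and the hub scores toward that of A A^T.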
Paper: The Anatomy of a Large-Scale Hypertextual Web Search Engine
This paper presents Google, a prototype of a large-scale search engine which
makes heavy use of the structure present in hypertext. Google is designed to
crawl and index the Web efficiently and produce much more satisfying search
results than existing systems. Engineering a search engine is a challenging
task. Search engines index tens to hundreds of millions of web pages involving
a comparable number of distinct terms. They answer tens of millions of queries
every day. Even though large-scale search engines on the web are important,
little academic research has been done on them. Also, because of rapid advances
in technology and web proliferation, creating a web search engine today is very
different from three years ago. The paper gives an in-depth description of the
authors' large-scale web search engine, which they say is the first such
detailed public description they know of to date. Apart from the problems of
scaling traditional search techniques to data of this magnitude, there are new
technical challenges involved in using the additional information present in
hypertext to produce better search results. I learned how to build a practical
large-scale system which can exploit the additional information present in
hypertext, and how to deal effectively with uncontrolled hypertext collections.