Friday, March 28, 2014

Reading notes for Unit 10

Information retrieval systems often have to deal with very large amounts of data. They must be able to process many gigabytes or even terabytes of text, and to build and maintain an index for millions of documents.
Parallel query processing is an efficient way to search a very large index at query time. What I took away is that a real search engine retrieves its data from huge postings and index files, so parallel query processing becomes necessary, at least in part. The index is split into pieces that are searched simultaneously, and each piece returns its top K results to a central server, which merges them into the final ranking.
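To make the split-and-merge idea concrete for myself, here is a minimal sketch in Python. It assumes the index is partitioned by document, so each document lives in exactly one shard. The toy shards, the integer scores, and the function names search_shard and parallel_search are all made up for illustration and are not from the reading. Threads stand in for the separate index servers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy shards: each maps a term to {doc_id: score}.
# Every document appears in exactly one shard (document partitioning).
SHARDS = [
    {"parallel": {"d1": 3, "d2": 1}, "query": {"d1": 2}},
    {"parallel": {"d3": 2}, "query": {"d3": 1, "d4": 4}},
    {"query": {"d5": 2}},
]


def search_shard(shard, terms, k):
    """Score the documents of one shard and return its local top-k
    as (score, doc_id) pairs, best first."""
    scores = {}
    for term in terms:
        for doc_id, score in shard.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0) + score
    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


def parallel_search(shards, terms, k):
    """Query every shard at the same time, then merge the partial
    top-k lists into one global top-k on the 'central server'."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, terms, k), shards)
        candidates = [pair for partial in partials for pair in partial]
    return heapq.nlargest(k, candidates)


results = parallel_search(SHARDS, ["parallel", "query"], k=2)
print(results)  # [(5, 'd1'), (4, 'd4')]
```

Because each shard sends back only its own top K, the central server merges at most K times the number of shards candidates, no matter how large the full index is.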

I also learned something about redundancy and fault tolerance in distributed search engines, about index construction, and about the statistical analysis of a text corpus.
