Information retrieval systems often have to deal with very large
amounts of data. They must be able to process many gigabytes or even terabytes
of text, and to build and maintain an index for millions of documents.
Parallel query processing is an efficient way to search very large index files
when answering a query. I learned that, in practice, a search engine retrieves
results from huge posting and index files, which makes parallel query
processing necessary: the index is split into shards that are searched
simultaneously, and each shard returns its top-K results to a central server
for merging.
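Here is a minimal sketch of that idea, assuming each shard holds its own posting lists mapping a term to per-document scores (the shard data, the `query_shard` helper, and the scores below are all made up for illustration; a real engine would run each shard on its own machine):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy shards: term -> {doc_id: score}.
SHARDS = [
    {"ir": {"d1": 0.9, "d2": 0.4}, "index": {"d1": 0.5}},
    {"ir": {"d3": 0.8}, "index": {"d3": 0.7, "d4": 0.2}},
    {"ir": {"d5": 0.6}, "index": {"d5": 0.1}},
]

def query_shard(shard, term, k):
    """Look up the term's postings in one shard and return its local top-K."""
    postings = shard.get(term, {})
    return heapq.nlargest(k, postings.items(), key=lambda kv: kv[1])

def parallel_query(term, k=3):
    """Fan the query out to all shards simultaneously, then merge the
    local top-K lists into a global top-K at the central server."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda s: query_shard(s, term, k), SHARDS)
    merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged, key=lambda kv: kv[1])

print(parallel_query("ir", k=3))  # [('d1', 0.9), ('d3', 0.8), ('d5', 0.6)]
```

Note that each shard only has to send K results over the network, so the merge step at the central server stays cheap no matter how large the full index is.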
I also learned about redundancy and fault tolerance in distributed search
engines, as well as index construction and statistical analysis of a text
corpus.