MapReduce: Now Google Invents New Ways to Manage Data!
Once upon a time, if you wished to make sense of large sets of data, you needed two things. First, a meticulously managed and maintained database, using tags and categories as data landmarks. Second, a very large computer to sift through your data using complex queries.
This is all fine until your data grows to petabyte scale; then the old way simply isn’t feasible. Tagging, sorting, and categorizing would take an immense amount of time. A single computer, no matter how large, just can’t crunch that many numbers at once.
Google uses a very different approach when sifting and ordering the World Wide Web. Their solution for working with colossal data sets is an approach called MapReduce.
It works like this:
1. Collect
MapReduce does not depend on a traditional database, where information is collected and then categorized. For this example, we’ll just gather up the full text of every book Google has scanned.
2. Map
You then write a function to map the data: “Count every use of every word in Google Books.” The request is then split among all the computers in your army, and each one is assigned a chunk of data to work with.
3. Save
Each PC doing a map writes its results to its local hard drive, cutting down on data transfer time. Then the computers that have been assigned a “reduce” function grab the lists from the mappers.
4. Reduce
The reduce computers then combine the mappers’ lists of words. Now you know how often a particular word is used, and in which books.
5. Solve
The system finally creates a data set about your data! In this example, the final list of words is stored as separate sets so it can be quickly referenced or queried, and you don’t have to plow through unrelated data to get your answer.
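To make the five steps above a little more concrete, here is a minimal single-machine sketch of the word-count example in Python. It is only an illustration and not Google’s actual implementation: the tiny `BOOKS` corpus and the function names `map_words`, `shuffle`, `reduce_word`, and `word_count` are all made up for this post, and a real cluster would run the map and reduce calls on many computers at once.

```python
from collections import defaultdict

# Step 1 (Collect): a hypothetical stand-in for the scanned-book corpus.
BOOKS = {
    "book_a": "the cat sat on the mat",
    "book_b": "the dog sat",
}


def map_words(title, text):
    """Step 2 (Map): emit one (word, (title, 1)) pair per word in a chunk."""
    return [(word, (title, 1)) for word in text.lower().split()]


def shuffle(mapped_chunks):
    """Step 3 (Save): group every emitted value under its word,
    the way reducers collect the lists written by the mappers."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, value in pairs:
            grouped[word].append(value)
    return grouped


def reduce_word(values):
    """Step 4 (Reduce): total count for one word, plus a per-book breakdown."""
    per_book = defaultdict(int)
    for title, count in values:
        per_book[title] += count
    return sum(per_book.values()), dict(per_book)


def word_count(books):
    """Step 5 (Solve): build the final word -> (total, per-book counts) data set."""
    # On a real cluster each map call would run on a different machine.
    mapped = [map_words(title, text) for title, text in books.items()]
    grouped = shuffle(mapped)
    return {word: reduce_word(values) for word, values in grouped.items()}


index = word_count(BOOKS)
print(index["the"])  # (3, {'book_a': 2, 'book_b': 1})
```

Looking up a single word in `index` is the “quickly referenced or queried” part: once the reduce step has run, you never have to reread the books themselves.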
Please drop us a comment, and share our lovely and insightful post… thanks!
Posted February 3, 2010



