MapReduce: Now Google Invents New Ways to Manage Data!

Once upon a time, if you wanted to make sense of a large set of data, you needed two things. First, a meticulously managed and maintained database, using tags and categories as data landmarks. Second, a very large computer to sift through your data using complex queries.

This is all fine until your data grows to petabyte scale; then the old way simply isn’t feasible. Tagging, sorting, and categorizing would take an immense amount of time, and a single computer, no matter how large, just can’t crunch that many numbers at once.

Google Data Structure

Google uses a very different approach when sifting and ordering the World Wide Web. Their solution for working with colossal data sets is an approach called MapReduce.

It works like this:

1. Collect

MapReduce does not depend on a traditional database, where information is categorized as it is collected. For this example, we’ll just gather up the full text of every book Google has scanned.

2. Map

You then write a function to map the data: “Count every use of every word in Google Books.” The request is then split among all the computers in your army, and each is assigned a chunk of data to work with.

3. Save

Each PC doing a map writes its results to its local hard drive, cutting down on data transfer time. Then the computers that have been assigned a “reduce” function grab the lists from the mappers.

4. Reduce

The Reduce computers then combine the lists of words. Now you know how frequently each word is used, and in which books.

5. Solve

Finally, the system creates a data set about your data! In my example, the final list of words is stored as separate sets, so it can be quickly referenced or queried, and you don’t have to plow through unrelated data to get your answer.
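To make the five steps a bit more concrete, here is a minimal single-machine sketch of the word-count example in Python. It is not Google's implementation: the three tiny "books", the function names (map_chunk, shuffle, reduce_partition, word_count), and the simple byte-sum partitioner are all made up for illustration, and the "computers" are just loop iterations rather than a real cluster.

```python
from collections import defaultdict

# 1. Collect: a hypothetical stand-in for the scanned books (title -> full text).
BOOKS = {
    "book_a": "the cat sat on the mat",
    "book_b": "the dog chased the cat",
    "book_c": "a dog is a friend",
}

def split_into_chunks(books, n_workers):
    """Assign each book to one of n_workers chunks, round-robin."""
    chunks = [{} for _ in range(n_workers)]
    for i, (title, text) in enumerate(sorted(books.items())):
        chunks[i % n_workers][title] = text
    return chunks

def map_chunk(chunk):
    """2. Map: emit a (word, (title, 1)) pair for every word in the chunk."""
    pairs = []
    for title, text in chunk.items():
        for word in text.lower().split():
            pairs.append((word, (title, 1)))
    return pairs

def shuffle(mapped_outputs, n_reducers):
    """3. Save/fetch: route all pairs for the same word to the same reducer."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for pairs in mapped_outputs:
        for word, value in pairs:
            # Illustrative deterministic partitioner (sum of the word's bytes).
            idx = sum(word.encode("utf-8")) % n_reducers
            partitions[idx][word].append(value)
    return partitions

def reduce_partition(partition):
    """4. Reduce: total count per word, plus the count in each book."""
    result = {}
    for word, values in partition.items():
        per_book = defaultdict(int)
        for title, count in values:
            per_book[title] += count
        result[word] = {"total": sum(per_book.values()), "books": dict(per_book)}
    return result

def word_count(books, n_workers=2, n_reducers=2):
    """5. Solve: run all steps and merge the reducers' outputs into one index."""
    chunks = split_into_chunks(books, n_workers)
    mapped = [map_chunk(c) for c in chunks]  # on a cluster these would run in parallel
    index = {}
    for partition in shuffle(mapped, n_reducers):
        index.update(reduce_partition(partition))
    return index

index = word_count(BOOKS)
print(index["the"])  # {'total': 4, 'books': {'book_a': 2, 'book_b': 2}}
```

Because every pair for a given word lands on exactly one reducer, the reducers never need to talk to each other, and the final index can be looked up word by word without rescanning the books.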

Please drop us a comment and share our lovely and insightful post… thanks! ;-)
