August 26, 2008
Known applications of MapReduce
Most of the actual MapReduce applications I’ve heard of fall into a few areas:
- Text tokenization, indexing, and search
- Creation of other kinds of data structures (e.g., graphs)
- Data mining and machine learning
That covers all MapReduce apps I recall hearing about via commercial companies and users, and also includes most of what’s in the two big sources I found online. To wit:
1. In a slide presentation, Google offers the following applications of MapReduce:
- distributed grep
- distributed sort
- web link-graph reversal
- term-vector per host
- web access log stats
- inverted index construction
- document clustering
- machine learning
- statistical machine translation
2. The Hadoop applications page offers a rich trove of applications. Excerpts include:
- Aggregate, store, and analyze data related to in-stream viewing behavior of Internet video audiences.
- Analytics
- Analyze and index textual information
- Analyzing similarities of user’s behavior.
- Build scalable machine learning algorithms like canopy clustering, k-means and many more to come (naive bayes classifiers, others)
- Charts calculation and web log analysis
- Crawl Blog posts and later process them.
- Crawling, processing, serving and log analysis
- Data mining and blog crawling
- Facial similarity and recognition across large datasets.
- Filter and index our listings, removing exact duplicates and grouping similar ones.
- Filtering and indexing listing, processing log analysis, and for recommendation data.
- Flexible web search engine software
- Gathering world wide DNS data in order to discover content distribution networks and configuration issues
- Generating web graphs
- Image based video copyright protection.
- Image content based advertising and auto-tagging for social media.
- Image processing environment for image-based product recommendation system
- Image retrieval engine
- Large scale image conversions
- Latent Semantic Analysis, Collaborative Filtering
- Log analysis, data mining and machine learning
- Natural Language Search
- Open source social search tools.
- Parses and indexes mail logs for search
- Plot the entire internet
- Process apache log, analyzing user’s action and click flow and the links click with any specified page in site and more.
- Process clickstream and demographic data in order to create web analytic reports.
- Process data relating to people on the web
- Process documents from a continuous web crawl and distributed training of support vector machines
- Process whole price data user input with map/reduce.
- Produce statistics.
- Product search indices
- Recommender system for behavioral targeting, plus other clickstream analytics
- Reduce usage data for internal metrics, for search indexing and for recommendation data.
- Research for Ad Systems and Web Search
- Retrieving and Analyzing Biomedical Knowledge
- Run Naive Bayes classifiers in parallel over crawl data to discover event information
- Search engine for chiropractic information, local chiropractors, products and schools
- Serve large Lucene indexes
- Session analysis and report generation
- Source code search engine
- Statistical analysis and modeling at scale.
- Storage, log analysis, and pattern discovery/analysis.
- Store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.
- Teaching and general research activities on natural language processing and machine learning.
- Vertical search engine for trustworthy wine information
There also were some research apps and some general processing speed-up apps I found harder to excerpt.
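To make a few of the items on Google's list concrete, here is a minimal single-process sketch of inverted index construction, distributed grep, and web link-graph reversal, each written as a map step plus a reduce step. It is only an illustration: the `map_reduce` helper, the `DOCS` and `LINKS` toy data, and all function names are assumptions made up for this example, not the Google or Hadoop API, and a real framework would spread the same two steps across many machines.

```python
import re
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy in-memory driver: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical toy inputs, invented for this sketch.
DOCS = [
    ("d1", "map reduce sorts keys"),
    ("d2", "reduce sums values"),
    ("d3", "map emits keys and values"),
]
LINKS = [("a", "b"), ("a", "c"), ("b", "c")]  # (source page, target page)


# Inverted index construction: word -> sorted list of documents containing it.
def index_mapper(record):
    doc_id, text = record
    for word in set(re.findall(r"\w+", text.lower())):
        yield word, doc_id


def index_reducer(word, doc_ids):
    return sorted(set(doc_ids))


# Distributed grep: the map step emits only matching lines; reduce passes them through.
def make_grep_mapper(pattern):
    regex = re.compile(pattern)

    def grep_mapper(record):
        doc_id, text = record
        if regex.search(text):
            yield doc_id, text

    return grep_mapper


def identity_reducer(key, values):
    return values


# Web link-graph reversal: for each target page, collect the pages that link to it.
def reverse_mapper(edge):
    source, target = edge
    yield target, source


def reverse_reducer(target, sources):
    return sorted(sources)


inverted_index = map_reduce(DOCS, index_mapper, index_reducer)
grep_hits = map_reduce(DOCS, make_grep_mapper(r"\bkeys\b"), identity_reducer)
reversed_links = map_reduce(LINKS, reverse_mapper, reverse_reducer)
```

In all three cases the map step looks at one record at a time and the reduce step only combines values that share a key, which is why the same small driver can serve every example.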
Some of our recent links about MapReduce
- The integration of MapReduce with SQL data warehousing
- Three major applications of MapReduce
- Another application of MapReduce
- Sound bites about MapReduce
- Other links about MapReduce
Comments
16 Responses to “Known applications of MapReduce”
[…] the sky is really the limit for anyone to build powerful analytic apps. Curt Monash has posted an excellent compendium of applications that are successfully leveraging the MapReduce paradigm […]
[…] Three major applications of MapReduce […]
[…] third approach is my Subject Of The Week: MapReduce. When I posted a list of canonical MapReduce applications, my friends at Aster Data offered one pushback — I left out the area of data transformation. […]
There is a coding tutorial available at this link in the middle of the page: http://www.greenplum.com/resources/mapreduce/
Key things to note about Greenplum’s MR implementation:
– It’s very similar in form and expression to Google and Hadoop
– Extensions for Joins and Pipelined task execution
– Native parallel file access
– Parallelism is full and transparent to the programmer
In summary: we have implemented a MapReduce framework in which you can write programs in SQL, Perl, Python, and many other languages. It is straightforward to port MR programs written for Hadoop or Google to Greenplum.
[…] If you are unable to attend, or eager to understand, here are some MapReduce resources you may find informative: Aster’s whitepaper on In-Database MapReduce; Google Labs’ MapReduce research paper; Curt Monash’s post on Known Applications of MapReduce. […]
[…] In essence, you can do anything you like with a single record*: that is the map step. But you are severely limited in how you can combine information about many (often intermediate) records: that is the reduce step. Nevertheless, the reduce step lets you perform counting, summation, and other aggregation operations. This fact, together with the general-purpose power of map steps, makes MapReduce useful for at least three important classes of applications: […]
[…] “Subject of the Week”: MapReduce. When I published a list of canonical MapReduce applications, my friends at Aster Data offered me one more […]
[…] Three major applications of MapReduce […]
[…] Examples abound. Consider a SQL/MR function which applies a complex model to score the data in the database, whether it’s scoring a customer for insurance risk, scoring an internet user for an ad’s effectiveness, or scoring a snippet of text for its sentiment. These functions often construct a data structure in memory to accelerate scoring, which works very well with the SQL/MR API: build the data structure once and reuse it across a large number of rows. […]
Hadoop-MR Use Cases…
I’m trying to collect known uses of Hadoop/GFS/MapReduce, and categorize them somewhat. When possible, citations are great…
[…] Datameer seems to be designed for the classic MapReduce use cases of ETL and heavy data […]
[…] Google points out that MapReduce is a powerful tool that can be applied for a variety of purposes including distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning and statistical machine translation. A much longer list of MapReduce applications is available at http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/. […]
Hi there, just wanted to give you a quick heads up. The text in your post seems to be running off the screen in IE. I’m not sure if this is a formatting issue or something to do with web browser compatibility, but I thought I’d post to let you know. The design looks great though! Hope you get the issue solved soon.
Thanks