Twitter is considering using MapReduce
From a Twitter job listing (formatting mine). The most interesting section is “Additional preferred experience.”
Responsibilities
1. Data Warehouse Design and Development
* Translate business requirements into system design and implementation.
* Hands-on build of end-to-end BI infrastructure, from raw data to warehouse to reporting.
2. Support
* Design, code, monitor and document highly automated and repeatable processes for maintaining data loading, distribution, integrity, backup, etc.
3. Analytics
* Write and interpret complex SQL queries for standard as well as ad-hoc data mining purposes.
* Summarize and report key analytical findings in both oral and written form.
* Develop, publish and maintain reports from business requirements.
Requirements
1. Conceptual Frameworks
- Strong understanding of fundamental relational database design, ETL and business intelligence concepts.
- Breadth of knowledge. You will need to understand a variety of constantly evolving business requirements, tools and platforms.
- Speed. Can you work intelligently with speed? You will be prototyping systems that have never been built before with little or no technical documentation/requirements.
- Work experience in scalable software / database development preferably in the internet space.
2. Specific Tools
- Strong SQL skills.
- Strong UNIX / Linux background.
- Strong compiled object-oriented language skills (Java, C++).
3. Additional Preferred Experience
- Ideally experienced in working with large unstructured and structured data sets. Large = multi-terabyte+, 100MM+ daily transaction volumes.
- Experience with map/reduce-like architectures (Hadoop, Hive, Pig, etc.).
- Knowledge of web logging, publishing platforms and/or content syndication architecture.
- Experience with regular expressions.
- Interest in discrete math and/or statistics.
Bonus
- Previous startup experience
- Interest in functional programming
- Active user of Twitter
Comments
Interest in functional programming is an interesting twist there at the end, especially with the nod towards Java/C++ in section 2.
Interesting – it’s a far cry from their RoR roots, huh? With all of this money, do you think that they’ll explore Vertica- or Greenplum-type solutions? Looks like an awesome Cloudera prospect!
All I know is I haven’t been able to log in to Twitter on and off for a week now…lame.
Twitter’s OLTP performance/reliability has indeed been bad again recently.
RoR is still in the mix.
As for which data warehouse DBMS to use for web analytics — there are a lot of good possibilities. 🙂 MapReduce favors Aster or Greenplum, of course.