Cloudera Sentry and other security subjects
I chatted with Charles Zedlewski of Cloudera on Thursday about security — especially Cloudera’s new offering Sentry — and other Hadoop subjects.
Sentry is:
- Developed by Cloudera.
- An Apache incubator project.
- Slated to be rolled into CDH — Cloudera’s Hadoop distribution — over the next couple of weeks.
- Useful only with Hive in Version 1, but planned to work with other Hadoop data access systems such as Pig and search in future releases.
- Lacking in administrative scalability in Version 1, something that is also slated to be fixed in future releases.
Apparently, Hadoop security options pre-Sentry boil down to:
- Kerberos, which only works down to directory or file levels of granularity.
- Third-party products.
- Roll-your-own.
Sentry adds role-based permissions for SQL access to Hadoop:
- By server.
- By database.
- By table.
- By view.
for a variety of actions — selections, transformations, schema changes, etc. Sentry does this by examining a query plan and checking whether each step in the plan is permissible.
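To make the plan-checking idea concrete, here is a minimal sketch in Python. It is not Sentry's actual code or API. The class names, the function names, and the assumption that a privilege granted at a higher level (server or database) covers everything beneath it are all hypothetical, chosen only to illustrate "check each step in the plan against the user's roles".

```python
from dataclasses import dataclass

# Hypothetical model, not Sentry's real data structures.
@dataclass(frozen=True)
class Privilege:
    scope: tuple   # e.g. ("server1",), ("server1", "sales"), ("server1", "sales", "orders_view")
    action: str    # e.g. "select", "insert", "alter", or "*" for any action

@dataclass(frozen=True)
class PlanStep:
    target: tuple  # fully qualified object the step touches (server, database, table or view)
    action: str    # what the step does to that object

def covers(priv, step):
    """True if the privilege's scope is a prefix of the step's target and the action matches."""
    scope_ok = step.target[:len(priv.scope)] == priv.scope
    action_ok = priv.action in ("*", step.action)
    return scope_ok and action_ok

def is_permitted(plan, roles, role_privileges):
    """A query is allowed only if every step in its plan is covered by some privilege of some role."""
    privs = [p for role in roles for p in role_privileges.get(role, [])]
    return all(any(covers(p, step) for p in privs) for step in plan)

# Assumed example roles: an analyst who may only read one view, and an admin with full server rights.
role_privileges = {
    "analyst": [Privilege(("server1", "sales", "orders_view"), "select")],
    "admin": [Privilege(("server1",), "*")],
}

read_view = [PlanStep(("server1", "sales", "orders_view"), "select")]
read_and_alter = read_view + [PlanStep(("server1", "sales", "orders"), "alter")]

print(is_permitted(read_view, ["analyst"], role_privileges))
print(is_permitted(read_and_alter, ["analyst"], role_privileges))
print(is_permitted(read_and_alter, ["admin"], role_privileges))
```

In this sketch the analyst's query against the view passes, while the same query with a schema change against the underlying table is rejected as a whole because one step is not covered.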
What Sentry doesn’t have is cell-based security, for which Charles perceives relatively little demand. I agree, but also note that traditional RDBMS implementations of cell-based security — notably Oracle Label Security — can have unpleasant performance consequences. From there, I segued the discussion to Accumulo. Unlike Hortonworks, Cloudera sees Accumulo demand strictly in the Federal government, where Accumulo is baked into some major reference architectures.
Charles also walked me through the use cases for some security requests he does frequently hear:
- Encryption at rest is important for compliance, for example for credit card numbers.
- Masking is also of particular interest for credit card numbers.
- Audit arises frequently for Sarbanes-Oxley compliance, and also in financial services (not necessarily for compliance).
- View-based security — a big point of Sentry — is usually to satisfy internal (i.e. non-regulatory) policies.
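As an illustration of the masking item, here is a small Python sketch of one common approach: hide all but the last few digits of a card number while keeping its formatting. The conversation did not cover how any product implements masking, so the function name and its parameters are assumptions made purely for this example.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `visible` digits with `mask_char`, leaving separators in place."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    to_mask = max(total_digits - visible, 0)  # number of leading digits to hide
    out = []
    for ch in card_number:
        if ch.isdigit() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)  # separators and the trailing digits pass through unchanged
    return "".join(out)

# The number below is a widely used test value, not real card data.
print(mask_card_number("4111 1111 1111 1111"))
```

Masking of this kind changes what a user sees, whereas encryption at rest protects the stored bytes; the two address different requirements.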
Related link
- Other issues in regulatory compliance (July, 2012)
Comments
7 Responses to “Cloudera Sentry and other security subjects”
[…] we scheduled a call to talk about Sentry, Cloudera’s Charles Zedlewski and I found time to discuss other stuff as well. One […]
Interesting attempt by Cloudera to match what Intel is already shipping with a security framework under Project Rhino https://github.com/intel-hadoop/project-rhino/. Today Intel has encryption of data files, with access from MapReduce, Pig and Hive, that keeps the data fully secure while processing and at rest. Future releases will include cell-level authorization and encryption of HBase. I encourage you to see what Intel has for security by reviewing the Intel distribution Security Guide located here: http://hadoop.intel.com/pdfs/IDH-SecurityGuide_R2-4-1_EN.pdf
@Greg – I encourage you to re-read Curt’s post and then your product documentation. Sentry is focused on fine-grained authorization by securing views of data such as a selection of columns or a range of rows in the Hive metastore as well as securing select SQL operations as they’re applied to those views. This is totally unrelated to the encryption you’re referring to.
Apache Sentry (incubating) is open source, and I encourage your colleagues to consider incorporating it into a future release of your own products and perhaps even contributing patches, much as Cloudera will review and incorporate the HDFS, MapReduce and HBase features that Intel proposes to add under the umbrella of “Rhino” (once they are committed upstream, of course).
A talk description for the Intel Developer Forum next month. Emphasis mine. However, I have no idea how granular the bolded part is.
EDCS004 – Protect Your Big Data with Intel® Xeon® Processors and Intel® Distribution for Apache Hadoop* Software
Big data and data analytics are helping businesses to become smarter, more productive, and better at making predictions. These tools also present numerous security, compliance, and performance challenges. The Apache* open source projects do not provide adequate mechanisms for data protection or access controls, which are necessary for typical enterprise production use. The Intel® Distribution for Apache Hadoop* software provides significant enhancements to deal with these gaps.
In this session, we will discuss:
Hi guys
I implemented a 25-node Hive/Hadoop cluster in production last month. I have been struggling with implementing Cloudera Sentry for the past 3 weeks. I read the documentation but it's still not working. Basically I can do the LDAP authentication from a Beeline client, but I am not clear whether I need to create roles through the Hive CLI or just through the Sentry config files. Either way, the authorization is not working. Many questions were answered well by the CDH users group, but I still can't get it to work. Where can I find help?
https://groups.google.com/a/cloudera.org/forum/#!mydiscussions/cdh-user/wEEcDWWqUBI
https://groups.google.com/a/cloudera.org/forum/#!mydiscussions/cdh-user/y6nwB2-gpoo
thanks
sanjay
Would data security for Hive count in this case?
http://www.datasunrise.com/masking/hive/
http://www.datasunrise.com/firewall/hive