EMR Hive JDBC over SSL with ELB

Recently we needed to set up a Hive cluster that reads S3 objects so that we could run queries from our Java server (Tomcat) via JDBC.

Several challenges:

  1. Our Java server is on premises (it will move to AWS in 2017), so we have to secure the channel to the EMR cluster in AWS.
    Solution: use SSL across the board.
  2. Every time the EMR cluster restarts, its IP changes. For the JDBC connection, we would like a constant DNS name.
    Solution: use an ELB to obtain a stable DNS name.
  3. The EMR master needs to be attached to the ELB every time the cluster is created.
    Solution: in the bootstrap script, we poll the state of the EMR cluster while it is being created and, once it is ready, grab the master's instance ID and attach it to the ELB.

For challenge 1

We need to install the certificate chain and the private key on the ELB so that it can accept SSL connections from the client. On the client side, we add ';ssl=true' to the JDBC connection string to instruct the driver to initiate the connection over SSL. We also import the CA certificate into the cacerts truststore under the JRE's lib/security directory, so that the CA that issued the ELB's certificate is trusted by the Java client during the SSL handshake. For our test, we use a self-signed CA (finrarcselfca.crt is its certificate):

sudo keytool -keystore cacerts -importcert -alias finrarcselfca -file finrarcselfca.crt
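
As a rough illustration of the client side, the sketch below assembles a JDBC URL with ';ssl=true' appended. The helper name build_hive_jdbc_url, the host my-elb.example.com and the default database are placeholders invented for this example; port 443 is the ELB listener port described under challenge 2.

```shell
#!/usr/bin/env bash
# Sketch only: build a HiveServer2 JDBC URL that asks the driver to use SSL.
# The function name and the default values are assumptions for illustration.
build_hive_jdbc_url() {
  local host="$1"
  local port="${2:-443}"          # ELB listener port
  local database="${3:-default}"  # Hive database name
  printf 'jdbc:hive2://%s:%s/%s;ssl=true\n' "$host" "$port" "$database"
}

# Placeholder ELB DNS name, not a real endpoint.
build_hive_jdbc_url "my-elb.example.com"
```

The printed string is the kind of value the Java application would pass to DriverManager.getConnection.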

For the connection between the ELB and the EMR master, we can just use an arbitrary self-signed keystore, since the ELB does not need to verify the EMR cluster's certificate. First, generate a self-signed keystore:

keytool -genkey -keyalg RSA -alias selfsigned -keystore self.keystore -storepass changeit -validity 360 -keysize 2048

Then add the configuration below to hive-site.xml:

    <property>
        <name>hive.server2.use.SSL</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.server2.keystore.path</name>
        <value>/home/hadoop/hive/conf/hive2.keystore</value>
    </property>
    <property>
        <name>hive.server2.keystore.password</name>
        <value>changeit</value>
    </property>

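Note: the password in hive-site.xml must match the -storepass value used when the keystore was created, and hive.server2.keystore.path must point to the location of that keystore file on the master node.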
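
Run as shown, the keytool command prompts for the certificate details, which is awkward inside a bootstrap script. The sketch below is one possible non-interactive variant. The function name generate_hive_keystore and the -dname value CN=emr-master are assumptions made for this example, not something taken from our setup.

```shell
#!/usr/bin/env bash
# Sketch only: create the self-signed keystore without interactive prompts.
# The function name and the -dname value are assumptions for illustration.
generate_hive_keystore() {
  local keystore_path="$1"
  local store_pass="$2"
  keytool -genkeypair \
    -alias selfsigned \
    -keyalg RSA -keysize 2048 \
    -validity 360 \
    -dname "CN=emr-master" \
    -keystore "$keystore_path" \
    -storepass "$store_pass" \
    -keypass "$store_pass"
}

# Usage (writes the file that hive.server2.keystore.path points to):
#   generate_hive_keystore /home/hadoop/hive/conf/hive2.keystore changeit
#   keytool -list -keystore /home/hadoop/hive/conf/hive2.keystore -storepass changeit
```

Writing the keystore directly to the configured path and reusing one password for the store and the key keeps it consistent with the two hive-site.xml values.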

For challenge 2

In the ELB listener, we forward SSL (secure TCP) on port 443 to SSL on port 10000 of the instance. This way, when our Java client initiates the JDBC-over-SSL connection, the ELB terminates the client's SSL session, unwraps the message, and then opens a separate SSL connection to the EMR master on port 10000.
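
A listener like this can also be created from the command line. The following is a minimal sketch, assuming a Classic Load Balancer that already exists and a server certificate that has already been uploaded. The function names, the load balancer name and the certificate ARN are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: describe and create an SSL-to-SSL listener on a Classic ELB.
# Function names, the ELB name and the certificate ARN are placeholders.

# Print the listener definition in the AWS CLI shorthand syntax.
ssl_listener_spec() {
  local cert_arn="$1"
  printf 'Protocol=SSL,LoadBalancerPort=443,InstanceProtocol=SSL,InstancePort=10000,SSLCertificateId=%s\n' "$cert_arn"
}

# Add the listener to an existing load balancer.
add_ssl_listener() {
  local elb_name="$1"
  local cert_arn="$2"
  aws elb create-load-balancer-listeners \
    --load-balancer-name "$elb_name" \
    --listeners "$(ssl_listener_spec "$cert_arn")"
}

# Usage:
#   add_ssl_listener my-elb arn:aws:iam::123456789012:server-certificate/my-cert
```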

For challenge 3

Previously we used the following shell script to wait for the EMR cluster to start and then attach the master to the ELB. The bash approach, shown below, is quick and dirty.
We now use Node.js with the AWS SDK for JavaScript to manage our cluster, which is more robust and easier to maintain and understand.

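# $result and $exitStatus are set by the cluster-creation step, which is not shown here.
if [ "$exitStatus" -eq 0 ]
then
   # Strip the JSON punctuation and the "ClusterId" key, then trim whitespace.
   # The unquoted echo is intentional: it collapses the JSON onto one line.
   clusterIdLine=$(echo $result | sed 's/{\|"\|:\|}\|ClusterId//g')
   clusterId=$(echo $clusterIdLine | awk '{$1=$1;print}')
   Mail "Adhoc Query Cluster $clusterId started..."
else
   Mail "Error while creating Adhoc Query Cluster $result"
   # Without a cluster ID there is nothing to wait for.
   exit 1
fi

sleep 30

# Poll every 30 seconds until the cluster reaches the WAITING state.
while :
do
   # Join the "Id" and "State" lines of each cluster, keep the one for our cluster, and extract the state.
   clusterState=$(aws emr list-clusters --active | grep '"Id":\|"State":' | sed 'N;s/\n/ /' | grep "$clusterId" | sed 's/"\|,\|Name\|Id//g' | cut -d':' -f2 | awk '{$1=$1;print}')
   echo "$clusterId"
   echo "$clusterState"
   # Quoting avoids a test error while the state is still empty.
   if [ "$clusterState" = 'WAITING' ]
   then
      break
   fi
   echo 'Waiting for cluster to be created'
   sleep 30
done

# Look up the EC2 instance ID of the master node.
masterInstanceID=$(aws emr list-instances --cluster-id "$clusterId" --instance-group-types MASTER | grep '"Ec2InstanceId":' | sed 'N;s/\n/ /' | sed 's/"\|,\|Ec2InstanceId//g' | cut -d':' -f2 | awk '{$1=$1;print}')

echo "$masterInstanceID"

# Register the master with the load balancer ($elbName must be set beforehand).
result=$(aws elb register-instances-with-load-balancer --load-balancer-name "$elbName" --instances "$masterInstanceID")

echo "$result"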
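
The grep/sed pipelines above depend on the exact layout of the CLI's JSON output. As a hedged alternative, the sketch below gets the same values with the CLI's built-in --query and --output text options. The function names and the POLL_SECONDS variable are assumptions made for this example. Unlike the script above, it also stops if the cluster terminates instead of looping forever.

```shell
#!/usr/bin/env bash
# Sketch only: wait for the cluster, then register its master with the ELB.
# Function names and POLL_SECONDS are assumptions made for this example.

# Poll describe-cluster until the cluster is WAITING; fail if it terminates.
wait_for_cluster() {
  local cluster_id="$1"
  local state
  while :; do
    state="$(aws emr describe-cluster --cluster-id "$cluster_id" \
      --query 'Cluster.Status.State' --output text)" || return 1
    case "$state" in
      WAITING) return 0 ;;
      TERMINATING|TERMINATED|TERMINATED_WITH_ERRORS) return 1 ;;
    esac
    echo "Cluster $cluster_id is $state, waiting..." >&2
    sleep "${POLL_SECONDS:-30}"
  done
}

# Look up the master's EC2 instance ID and attach it to the load balancer.
attach_master_to_elb() {
  local cluster_id="$1"
  local elb_name="$2"
  local master_id
  master_id="$(aws emr list-instances --cluster-id "$cluster_id" \
    --instance-group-types MASTER \
    --query 'Instances[0].Ec2InstanceId' --output text)" || return 1
  aws elb register-instances-with-load-balancer \
    --load-balancer-name "$elb_name" --instances "$master_id"
}

# Usage:
#   wait_for_cluster "$clusterId" && attach_master_to_elb "$clusterId" "$elbName"
```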
