Hive JDBC Architecture

Part 1: What is the actual contract that Hive provides us with? Hive’s contract to users is defined in the HiveInterface class. That is, Thrift is a communication channel that Hive uses to expose its main service: the translation of SQL commands into Hadoop/MapReduce commands. The ultimate class invoked…

How JDBC URLs get mapped to connections at runtime

Who cares? I recently found the need to mock out a JDBC URL to experiment with a new way of testing Sqoop without a hard dependency on a particular database installation. In order to do this, you first need to understand how, at runtime, the URLs passed in JDBC connection calls get routed to implementation…
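To make the routing concrete, here is a minimal sketch of a mock driver, not the code from the post itself. `DriverManager` asks each registered driver to `connect`, and a driver returns `null` for URLs it does not own. The `jdbc:mock:` prefix, the `MockDriver` class, and the proxy-based connection are all assumptions made up for illustration.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

// Hypothetical mock driver: claims every URL that starts with "jdbc:mock:".
class MockDriver implements Driver {
    static final String PREFIX = "jdbc:mock:"; // assumed prefix, not a real driver's

    @Override
    public boolean acceptsURL(String url) {
        return url != null && url.startsWith(PREFIX);
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        // The JDBC contract: return null when the URL belongs to another driver,
        // so DriverManager moves on to the next registered driver.
        if (!acceptsURL(url)) {
            return null;
        }
        // A dynamic proxy stands in for a real Connection; only a few methods work.
        return (Connection) Proxy.newProxyInstance(
                MockDriver.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "isClosed": return false;
                        case "close": return null;
                        case "toString": return "MockConnection(" + url + ")";
                        default: throw new UnsupportedOperationException(method.getName());
                    }
                });
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }

    @Override
    public int getMajorVersion() { return 1; }

    @Override
    public int getMinorVersion() { return 0; }

    @Override
    public boolean jdbcCompliant() { return false; }

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException("no logging in this sketch");
    }

    public static void main(String[] args) throws SQLException {
        DriverManager.registerDriver(new MockDriver());
        // DriverManager walks its registered drivers until one returns a connection.
        try (Connection c = DriverManager.getConnection("jdbc:mock:anything")) {
            System.out.println(c); // prints "MockConnection(jdbc:mock:anything)"
        }
    }
}
```

Real drivers usually register themselves from a static initializer or through the `META-INF/services/java.sql.Driver` service file. The explicit `registerDriver` call here just keeps the sketch in one file.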

Generate Large file via Hive JDBC

Recently in our project, we had a requirement to generate a comparatively large file via Hive JDBC for web app users to download; by large I mean millions of rows. Since our previous use cases were all small files containing fewer than maybe 50,000 rows, our approach was to grab all the rows into memory and dump them into a file…
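One common alternative to buffering every row is to stream the `ResultSet` straight to the output, one row at a time. The sketch below shows that general pattern and is not necessarily the approach the post ends up with. The `CsvStreamer` class, its method names, and the CSV format are assumptions for illustration, and whether the driver honors the `setFetchSize` hint depends on the driver.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: writes rows as CSV without holding the result set in memory.
class CsvStreamer {

    /** Quotes a field if it contains a comma, quote, or line break; null becomes empty. */
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        String escaped = value.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Writes each row as soon as it is read; returns the number of rows written. */
    static long stream(ResultSet rs, Writer out) throws SQLException, IOException {
        ResultSetMetaData meta = rs.getMetaData();
        int cols = meta.getColumnCount();
        long rows = 0;
        while (rs.next()) {
            for (int i = 1; i <= cols; i++) { // JDBC columns are 1-indexed
                if (i > 1) {
                    out.write(',');
                }
                out.write(escape(rs.getString(i)));
            }
            out.write('\n');
            rows++;
        }
        return rows;
    }

    /** Runs the query and streams the result to a file. */
    static long export(Connection conn, String sql, Path target, int fetchSize)
            throws SQLException, IOException {
        try (Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(fetchSize); // a hint only; drivers may ignore it
            try (ResultSet rs = stmt.executeQuery(sql);
                 BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                return stream(rs, out);
            }
        }
    }
}
```

Memory use then depends on the fetch size and the writer's buffer, not on the total number of rows.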

Hive JDBC with Spring BeanPropertyRowMapper

In our project we need to port some Hive table data to our local RDBMS (Oracle). For tables with a lot of columns (hundreds), it can be very tedious to write the Hive SQL and convert the ResultSet to the JPA entity object. Spring JdbcTemplate provides us with a good class which does the conversion between camel case and underscores for us…
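The name conversion itself is simple. The sketch below shows the idea in plain Java and is not Spring's actual implementation. `NameMapper` and `toUnderscore` are hypothetical names, and real mappers handle more edge cases, such as digits and runs of capitals.

```java
// Hypothetical helper illustrating camel-case to underscore conversion.
class NameMapper {

    /** Converts a camel-case property name to a lower-case underscore column name. */
    static String toUnderscore(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            if (Character.isUpperCase(ch)) {
                if (i > 0) {
                    sb.append('_'); // word boundary before each capital letter
                }
                sb.append(Character.toLowerCase(ch));
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toUnderscore("firstName")); // prints "first_name"
        System.out.println(toUnderscore("orderId"));   // prints "order_id"
    }
}
```

With Spring on the classpath, the usual route is to pass `new BeanPropertyRowMapper<>(SomeEntity.class)` to `JdbcTemplate.query`, where `SomeEntity` is a placeholder for your own bean class. Spring then matches column names to bean properties for you.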

EMR hive JDBC over SSL with ELB

Recently we needed to set up a Hive cluster consuming S3 objects so that we could run queries from our Java server (Tomcat) via JDBC. Several challenges: our Java server is on-prem (it will move to AWS in 2017), so we have to secure the channel to the EMR cluster. Solution: use SSL across the board…

JPA performance over JDBC for large table

I have a table with about 80 million records. When I ran a simple query using JPA with 2-3 predicates, it took about 120s to get the result, compared with 1s using JDBC. Note that I am using exactly the same query that JPA generates. This is somewhat frustrating. To be honest, I have…