large file from hive to rdbms(oracle)

Recently we had a requirement to dump a sizable file (4+ GB) from S3 into Oracle. The file itself is Hive-compatible, so instead of downloading the file and generating SQL for it, we decided to transfer the content through the Hive JDBC driver and persist it via JPA/Hibernate.


On the Hive side, one important thing is to make sure a fetch size is set on the JDBC ResultSet, so rows are streamed in chunks instead of buffered all at once.

hiveJdbcTemplate.query(sqlToExecute, rs -> {
    rs.setFetchSize(1000); // hint the driver to stream rows in chunks
    while (rs.next()) {
        // your row handling here
    }
    return null;
});


on the relational database side

  1. make sure indexes are turned off; otherwise each insertion will trigger a B-tree index update.
  2. make sure to leverage the Hibernate batch size,
    hibernate.jdbc.batch_size. I set it to 50 since my table has over 200 columns. For example, if you save() 100 records and your hibernate.jdbc.batch_size is set to 50, then during flushing, instead of issuing the following SQL 100 times:

    insert into TableA (id , fields) values (1, 'val1');
    insert into TableA (id , fields) values (2, 'val2');
    insert into TableA (id , fields) values (3, 'val3');
    ......
    insert into TableA (id , fields) values (100, 'val100');

    Hibernate will group them into batches of 50 and issue only 2 round trips to the DB, effectively like this:

    insert into TableA (id , fields) values (1, 'val1') , (2, 'val2') ,(3, 'val3') ,(4, 'val4') ,......,(50, 'val50')
    insert into TableA (id , fields) values (51, 'val51') , (52, 'val52') ,(53, 'val53') ,(54, 'val54'),...... ,(100, 'val100')  

    Please note that Hibernate transparently disables insert batching at the JDBC level if the primary key of the table being inserted uses GenerationType.IDENTITY.

  3. make sure to flush()/clear() every so many records so that memory is not eaten up by the millions of objects built on the fly.
    flush() makes sure the queued inserts are executed and the objects are saved (synced) to the DB.
    clear() clears the persistence context so all managed entities are detached; entities that have not been flushed will not be persisted.
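Regarding the GenerationType.IDENTITY caveat in point 2: a sequence-based ID mapping keeps JDBC insert batching enabled. A minimal sketch (the entity, generator, and sequence names here are made up for illustration):

```java
import javax.persistence.*;

@Entity
public class TableA {
    // SEQUENCE generation lets Hibernate pre-fetch IDs and keep JDBC insert batching;
    // IDENTITY would force one round trip per row and silently disable batching
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "tableA_seq")
    @SequenceGenerator(name = "tableA_seq", sequenceName = "SEQ_TABLE_A", allocationSize = 50)
    private Long id;

    private String fields;

    protected TableA() {} // required by JPA
}
```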

My main code is something like:

    public int doImport(int limit) {
        String sql = "SELECT * FROM erd.ERD_PRDCT_FIXED_INCM_MNCPL_HS_prc_txt";
        if (limit >= 0) {
            sql = sql + " LIMIT " + limit;
        }
        HiveBeanPropertyRowMapper<SrcErdFixedIncmMuniEntity> mapper =
                new HiveBeanPropertyRowMapper<>(SrcErdFixedIncmMuniEntity.class);
        int batchSize = 5000;
        int[] inc = {0};
        Instant start = Instant.now();
        List<SrcErdFixedIncmMuniEntity> listToPersist = new ArrayList<>(batchSize);
        hiveJdbcTemplate.query(sql, (rs) -> {
            while (rs.next()) {
                listToPersist.add(mapper.mapRow(rs, -1));
                inc[0]++;
                if (inc[0] % batchSize == 0) {
                    persistAndClear(inc, listToPersist);
                }
            }
            // left-overs (last n items)
            persistAndClear(inc, listToPersist);
            return null;
        });
        Instant end = Instant.now();
        System.out.println("Data Intake took: " + Duration.between(start, end));
        return inc[0];
    }

    private void persistAndClear(int[] inc, List<SrcErdFixedIncmMuniEntity> listToPersist) {
        listToPersist.forEach(em::persist); // queue inserts in the persistence context
        em.flush();                         // push the current batch to the DB
        em.clear();                         // detach everything to free memory
        listToPersist.clear();
        log.info("Saved record milestone: " + inc[0]);
    }


Not bad: ~3.5 million records get loaded in about an hour.

serialize enum fields with gson

By default, Gson only serializes the 'name' of an enum, which might not be enough since we may also want to carry all the fields during serialization. To achieve this we need to write our own Gson adapter factory and make use of reflection.

public class EnumAdapterFactory implements TypeAdapterFactory {

    @Override
    public <T> TypeAdapter<T> create(final Gson gson, final TypeToken<T> type) {
        Class<? super T> rawType = type.getRawType();
        if (rawType.isEnum()) {
            return new EnumTypeAdapter<T>();
        }
        return null;
    }

    public class EnumTypeAdapter<T> extends TypeAdapter<T> {
        @Override
        public void write(JsonWriter out, T value) throws IOException {
            if (value == null || !value.getClass().isEnum()) {
                out.nullValue();
                return;
            }
            try {
                out.beginObject();
                out.name("value").value(value.toString());
                // reflect over the enum's bean properties and write each one out
                Arrays.stream(Introspector.getBeanInfo(value.getClass()).getPropertyDescriptors())
                      .filter(pd -> pd.getReadMethod() != null && !"class".equals(pd.getName()) && !"declaringClass".equals(pd.getName()))
                      .forEach(pd -> {
                          try {
                              out.name(pd.getName()).value(String.valueOf(pd.getReadMethod().invoke(value)));
                          } catch (IllegalAccessException | InvocationTargetException | IOException e) {
                              throw new RuntimeException(e);
                          }
                      });
                out.endObject();
            } catch (IntrospectionException e) { throw new RuntimeException(e); }
        }

        @Override
        public T read(JsonReader in) throws IOException {
            // Properly deserialize the input (if you use deserialization)
            return null;
        }
    }
}

Enum class:

public enum ReportTypes {
    SP(1), CA(2), ADF(3), ORF(4), CTO(5), CDS(6), TSP(7);

    private final int reportTypeId;

    ReportTypes(int reportTypeId) {
        this.reportTypeId = reportTypeId;
    }

    public int getReportTypeId() {
        return reportTypeId;
    }
}

Test Code:

    public void testReportTypesGsonSerialization() {
        GsonBuilder builder = new GsonBuilder();
        builder.registerTypeAdapterFactory(new EnumAdapterFactory());
        Gson gson = builder.create();
        System.out.println(gson.toJson(ReportTypes.values()));
    }


    "value": "SP",
    "reportTypeId": "1"
    "value": "CA",
    "reportTypeId": "2"
    "value": "ADF",
    "reportTypeId": "3"
    "value": "ORF",
    "reportTypeId": "4"
    "value": "CTO",
    "reportTypeId": "5"
    "value": "CDS",
    "reportTypeId": "6"
    "value": "TSP",
    "reportTypeId": "7"

DNS basics and A Record, NS Record, CNAME

A great post from Ruan Yifeng (阮一峰) about DNS. I especially like its explanation of the hierarchical query process and of A records, NS records, and CNAME, which is simple and clear, so I reposted that part below:

One level below the root domain is the "top-level domain" (TLD), for example .com. The next level down is the "second-level domain" (SLD), for example .example in http://www.example.com; this is the level users can register. Below that is the host name, for example www in http://www.example.com, also called the "third-level domain". This is a name the user assigns to a server within their own domain, and it can be assigned freely.


  1. query the "root name servers" for the NS records and A records (IP addresses) of the "TLD name servers"
  2. query the "TLD name servers" for the NS records and A records (IP addresses) of the "second-level domain name servers"
  3. query the "second-level domain name servers" for the IP address of the "host name"

$ dig +trace

7. Querying NS records


$ dig ns com
$ dig ns


$ dig +short ns com
$ dig +short ns

(1) A: Address record, returns the IP address that the domain name points to.

(2) NS: Name Server record, returns the address of the server that holds the next level of domain name information. This record can only be set to a domain name, not an IP address.

(3) MX: Mail eXchange record, returns the address of the server that receives email for the domain.

(4) CNAME: Canonical Name record, returns another domain name, meaning the queried domain name is a redirect to another domain name; see below.

(5) PTR: Pointer Record, used only for reverse lookups from an IP address to a domain name; see below.

$ dig


;; ANSWER SECTION:
<queried domain>      3370    IN    CNAME    <canonical domain>
<canonical domain>     600    IN    A        <IP address>

$ dig -x

$ dig a
$ dig ns
$ dig mx

JPA SequenceGenerator with allocationSize 1 performance tuning

I had a blog post last year about fixing sequence numbers going wild by setting the allocationSize to 1.

Overall it solves the inconsistency problem if you are using a sequence with an 'INCREMENT BY' value of 1 in the database.


One problem that came up today is a performance issue with the above setting when persisting a lot of records (entities): for every entity, Hibernate needs to perform a 'select SEQ.nextval from DUAL' to get an ID from the specified sequence. When persisting hundreds of thousands of entities, this becomes a problem.

First Try

After some searching, I tried setting my allocationSize to 500 and also increased my sequence's 'INCREMENT BY' value to 500:

alter sequence SEQ_EQUITY_PROC_DAILY_ID increment by 500

After doing this, the saving process is much faster (about 10x). However, when I queried the database, I found another inconsistency: my sequence's next value is '2549522' but the IDs in the table are around '1274761000'. That is the problem with the hi/lo style SequenceHiLoGenerator, where the generated ID is roughly allocationSize * sequenceValue. This generator is perfectly fine if you have a new table with a sequence initial value of 1, provided you can tolerate this kind of gap between the ID value and the actual sequence value. How it works: with the default allocation size of 50, Hibernate fetches sequence value 1 and hands out IDs 1-50 for the current entities; the next round it uses 51-100 when the sequence value is 2. The drawback is that if some other JDBC connection or JPA instance uses a different setting, we will probably get ID collisions.
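The two numbers above are consistent once you apply the hi/lo math. A toy sketch of the arithmetic (a simplification for illustration, not Hibernate's actual code):

```java
public class HiLoArithmetic {
    public static void main(String[] args) {
        long allocationSize = 500L;
        long sequenceNextVal = 2_549_522L; // next value observed in the Oracle sequence
        // simplified hi/lo formula: each sequence value reserves a whole block of IDs
        long highestGeneratedId = sequenceNextVal * allocationSize;
        System.out.println(highestGeneratedId); // prints 1274761000
    }
}
```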


To solve this problem, we need to set a Hibernate property:

properties.setProperty("hibernate.id.new_generator_mappings", Boolean.toString(true));

This 'hibernate.id.new_generator_mappings' property defaults to false, which means Hibernate uses the 'SequenceHiLoGenerator' with the multiplying behavior described above. Once we set it to true, Hibernate uses the 'SequenceStyleGenerator' instead, which is more JPA- and Oracle-friendly. It generates identifier values based on a sequence-style database structure; variations range from actually using a sequence to using a table to mimic one.

3d css with perspective and transform rotate

The perspective CSS property gives an element a 3D-space by affecting the distance between the Z plane and the user.

The strength of the effect is determined by the value. The smaller the value, the closer you get to the Z plane and the more impressive the visual effect; the greater the value, the more subtle the effect.

Important: please note the perspective property doesn't affect how the element itself is rendered; it simply enables a 3D-space for its children elements. This is the main difference between the transform: perspective() function and the perspective property. The former gives the element itself depth while the latter creates a 3D-space shared by all its transformed children.

 * Syntax
 * perspective: none | <length>

.parent {
    perspective: 1000px;
}

.child {
    transform: rotateY(50deg);
}
Below are some other examples that create a cool 3D hover effect (transform: rotate(0)).

java generic multi-level wild card

Was having an interesting issue with generics, which I thought I was pretty familiar with.

Basically I need to pass a

Map<String, List<SubClass>>

to a method that accepts a

Map<String, List<SuperClass>>

I was thinking I could just put

Map<String, List<? extends SuperClass>>

in the method signature and I should be fine. But I got a compiler error complaining the types are not compatible.

It turns out that the Map I pass in is actually a

HashMap<String, ArrayList<SubClass>>

and generic type parameters are invariant: even though ArrayList<SubClass> is itself a List<? extends SuperClass>, a Map<String, ArrayList<SubClass>> is not a Map<String, List<? extends SuperClass>>, because the value type parameter must match exactly.

To fix this, I need to add a wildcard on the value type as well:

Map<String, ? extends List<? extends SuperClass>>
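A minimal compilable sketch of the fix (the class names are placeholders):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WildcardDemo {
    static class SuperClass {}
    static class SubClass extends SuperClass {}

    // A parameter typed Map<String, List<? extends SuperClass>> would NOT accept
    // the HashMap<String, ArrayList<SubClass>> built below; the extra
    // "? extends List<...>" wildcard on the value type is what makes it compile.
    static int countValues(Map<String, ? extends List<? extends SuperClass>> map) {
        int n = 0;
        for (List<? extends SuperClass> values : map.values()) {
            n += values.size();
        }
        return n;
    }

    public static void main(String[] args) {
        HashMap<String, ArrayList<SubClass>> map = new HashMap<>();
        ArrayList<SubClass> subs = new ArrayList<>();
        subs.add(new SubClass());
        subs.add(new SubClass());
        map.put("a", subs);
        System.out.println(countValues(map)); // prints 2
    }
}
```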


hive jdbc with Spring Beanpropertyrowmapper

In our project we need to port some Hive table data to our local RDBMS (Oracle). For tables with a lot of columns (hundreds), it could be very tedious to write the Hive SQL and convert the ResultSet to the JPA entity object by hand.

Spring's JdbcTemplate provides a handy class, BeanPropertyRowMapper, which does the camel-case to underscore conversion for us. To leverage that, we just need to make sure the Hive table has the same column names as the RDBMS table. Then we just need to call:

hiveJdbcTemplate.query(sql, new BeanPropertyRowMapper<>(YourOwnEntity.class));

However, you might find that the result size looks right but all the entity fields are null. That is because beginning with Hive 0.13, a new Hive property called hive.resultset.use.unique.column.names was introduced. The default value of this property is true.

If the default value is used, the Hive table name is prepended to all column names. This is a change in behavior from previous versions of Hive.

This change causes the all-nulls behavior described above: Spring converts all the fields of the target entity class to underscore format (col_name1, col_name2), but when it compares them against the ResultSet, the ResultSetMetaData returned by the Hive JDBC driver contains headers like 'table.col_name1', 'table.col_name2', etc. As a result, no field matches and everything stays null.
To prevent the Hive table name from being prepended to the column names, set this property in the hive-site.xml file.
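For reference, the hive-site.xml entry would look like this (the description text is mine); it can also be set per session with `set hive.resultset.use.unique.column.names=false;`:

```xml
<property>
  <name>hive.resultset.use.unique.column.names</name>
  <value>false</value>
  <description>Keep plain column names in ResultSetMetaData so BeanPropertyRowMapper can match entity fields</description>
</property>
```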