jpa performance over jdbc for large table

I have a table with about 80 million records. While I was doing a simple query using JPA with 2-3 predicates. It takes about 120s to get the result, comparing the 1s using JDBC.


        CriteriaBuilder cb = entityManager.getCriteriaBuilder();
        CriteriaQuery<Long> cq = cb.createQuery(Long.class);
        Root<SrcMpodrSalesDtlEntity> root = cq.from(SrcMpodrSalesDtlEntity.class);
        List<Predicate> predicates = new ArrayList<Predicate>();
        predicates.add(cb.greaterThanOrEqualTo(root.get(SrcMpodrSalesDtlEntity_.trdDt), startDate));
        predicates.add(cb.lessThanOrEqualTo(root.get(SrcMpodrSalesDtlEntity_.trdDt), endDate));
        cq.select(cb.count(root)).where(predicates.toArray(new Predicate[predicates.size()]));
        return entityManager.createQuery(cq).getSingleResult();

Notice, i am using exactly the same query that the jpa generates.

    select
        * 
    from
        ( select
            count(srcmpodrsa0_.SRC_MPODR_SALES_DTL_ID) as totalTrade 
        from
            SRC_MPODR_SALES_DTL srcmpodrsa0_ 
        where
            srcmpodrsa0_.TRD_DT>=TO_DATE('2011-07-01-00:00:00', 'yyyy-MM-dd-HH24:mi:ss') 
            and srcmpodrsa0_.TRD_DT<=TO_DATE('2011-07-31-23:59:59', 'yyyy-MM-dd-HH24:mi:ss')
        ) 
    where
        rownum <= 1

This is somehow frustrating.

To be honest, I have to leave JPA and stick with JDBC (but certainly using JdbcTemplate support class or such like). JPA (and other ORM providers/specifications) is not designed to operate on many objects within one transaction as they assume everything loaded should stay in first-level cache (hence the need for clear() in JPA).

Also I am recommending more low level solution because the overhead of ORM (reflection is only a tip of an iceberg) might be so significant, that iterating over plain ResultSet, even using some lightweight support like mentioned JdbcTemplate will be much faster.

JPA is simply not designed to perform operations on a large amount of entities. You might play with flush()/clear() to avoid OutOfMemoryError, but consider this once again. You gain very little paying the price of huge resource consumption.

There is no “proper” what to do this, this isn’t what JPA or JDO or any other ORM is intended to do, straight JDBC will be your best alternative, as you can configure it to bring back a small number of rows at a time and flush them as they are used, that is why server side cursors exist.

ORM tools are not designed for bulk processing, they are designed to let you manipulate objects and attempt to make the RDBMS that the data is stored in be as transparent as possible, most fail at the transparent part at least to some degree. At this scale, there is no way to process hundreds of thousands of rows ( Objects ), much less millions with any ORM and have it execute in any reasonable amount of time because of the object instantiation overhead, plain and simple.

REASON

Turns out it is because of the java.sql.Timestamp that JPA uses for the java.util.Date in the preparedStatement. And it will cause THIS!!!

The JPA has “BasicTypeRegistry” which register the sql type handling the java types. Among them, java.util.Date is regitered with TimeStampType. I found that if we use DateType, the performance will be the same as JDBC. But i did not find an easy way to do this.

configuration.getTypeResolver().registerTypeOverride( new DateType(){
            @Override
            public String[] getRegistrationKeys() {
                return new String[] {
                    getName(),
                    java.sql.Date.class.getName(),
                    java.util.Date.class.getName()
                };
            }
        } );

But since i am using JPA, i do not find a way go get the reference for hibernate Session Factory or the hibernate Configuration above.

One work around i found is annotate the field with ( @Type(type = “date”), it would use java.sql.Date ) and then using JPQL directly rather than the JPA Criteria. This way, the createQuery does not go to replace the Date time to the TimeStamp.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s