JPA SequenceGenerator with allocationSize 1 performance tuning

I wrote a blog post last year about fixing sequence numbers going wild by setting the allocationSize to 1.

Overall it solves the inconsistency problem if you are using a sequence with an ‘INCREMENT BY’ value of 1 in the database.
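
For reference, that mapping looks like this (a minimal sketch using the javax.persistence annotations; the entity and generator names are made up, the sequence name is the one used later in this post):

@Entity
public class EquityProcDaily
{
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "equitySeq")
    @SequenceGenerator(name = "equitySeq",
                       sequenceName = "SEQ_EQUITY_PROC_DAILY_ID",
                       allocationSize = 1) // hit the sequence for every single id
    private Long id;
}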

Issue

One problem came up today: I was facing a performance issue with the above setting when trying to persist a lot of records (entities), because every entity needs a ‘select SEQ.nextval from DUAL’ round trip to get an ID from the specified sequence. So when persisting hundreds of thousands of entities, this becomes a problem.

First Try

Did some searching and tried setting my allocationSize to 500, and also increased my sequence’s ‘INCREMENT BY’ value to 500 with

alter sequence SEQ_EQUITY_PROC_DAILY_ID increment by 500

After doing this, the saving process was much faster (about 10 times). However, when I queried the database, I found another inconsistency: my sequence’s next value was ‘2549522’ but the IDs in the db table were values like ‘1274761000’. That is the problem with the MultipleHiLoPerTableGenerator, where the ID is allocationSize * sequenceValue (note that 2549522 * 500 = 1274761000, which is exactly the ID observed). This generator is perfectly fine if you have a new table with a sequence whose initial value is 1, given that you can tolerate this kind of inconsistency between the ID value and the actual sequence value. Here is how it works: with the default allocation size of 50, Hibernate fetches the sequence once and uses IDs 1-50 for the current entities; on the next round, when the sequence value is 2, it uses 51-100. The drawback is that if some other JDBC connection or JPA instance is using a different setting, we will probably get ID collisions.

Solution

To solve this problem, we need to set a property in hibernate:

properties.setProperty("hibernate.id.new_generator_mappings", Boolean.toString(true));

This ‘hibernate.id.new_generator_mappings’ property defaults to false, which selects the ‘SequenceHiLoGenerator‘ with the multiplying behavior described above. Once we set it to true, Hibernate uses the ‘SequenceStyleGenerator‘ instead, which is more JPA- and Oracle-friendly. It generates identifier values based on a sequence-style database structure; variations range from actually using a sequence to using a table to mimic a sequence.

5 Coding Hacks to Reduce GC Overhead

Some background

The GC is built to handle large amounts of allocations of short-lived objects (think of something like rendering a web page, where most of the objects allocated become obsolete once the page is served).

The GC does this using what’s called a “young generation” – a heap segment where new objects are allocated. Each object has an “age” (kept in the object’s header bits) which counts how many collections it has “survived” without being reclaimed. Once a certain age is reached, the object is copied out of the young generation’s “survivor” spaces into another section of the heap called the “old” (tenured) generation.

The process, while efficient, still comes at a cost. Being able to reduce the number of temporary allocations can really help us increase throughput, especially in high-scale environments, or Android apps where resources are more limited.

Below are five ways we can write everyday code that’s more memory efficient, without having to spend a lot of time on it, or reducing code readability.

1. Avoid implicit Strings

Strings are an integral part of almost every data structure we manage. Being much heavier than other primitive values, they have a much stronger impact on memory usage.

One of the most important things to note is that Strings are immutable. They cannot be modified after allocation. Operators such as “+” for concatenation actually allocate a new String containing the contents of the strings being joined. What’s worse, is there’s an implicit StringBuilder object that’s allocated to actually do the work of combining them.

For example –

a = a + b; // a and b are Strings

The compiler generates comparable code behind the scenes:

StringBuilder temp = new StringBuilder(a);
temp.append(b);
a = temp.toString(); // a new String is allocated here.
// The previous "a" is now garbage.

But it gets worse.


Let’s look at this example –

String result = foo() + arg;
result += boo();
System.out.println("result = " + result);

In this example we have 3 StringBuilders allocated in the background – one for each plus operation, and two additional Strings – one to hold the result of the second assignment and another to hold the string passed into the print method. That’s 5 additional objects in what would otherwise appear to be a pretty trivial statement.

Think about what happens in real-world code scenarios such as generating a web page, working with XML or reading text from a file. Nested within loop structures, you could be looking at hundreds or thousands of objects that are implicitly allocated. While the VM has mechanisms to deal with this, it comes at a cost – one paid by your users.

The solution: One way of reducing this is being proactive with StringBuilder allocations. The example below achieves the same result as the code above while allocating only one StringBuilder and one String to hold the final result, instead of the original five objects.

StringBuilder value = new StringBuilder("result = ");
value.append(foo()).append(arg).append(boo());
System.out.println(value);

By being mindful of the way Strings and StringBuilders are implicitly allocated you can materially reduce the amount of short-term allocations in high-scale code locations.

2. Plan List capacities

Dynamic collections such as ArrayLists are among the most basic structures for holding dynamic-length data. ArrayLists and other collections such as HashMaps and TreeMaps are implemented using underlying Object[] arrays. Like Strings (themselves wrappers over char[] arrays), arrays are fixed in size once allocated. The obvious question then becomes – how can we add/put items into collections if their underlying array’s size is fixed? The answer is obvious as well – by allocating more arrays.

Let’s look at this example –

List<Item> items = new ArrayList<Item>();
for (int i = 0; i < len; i++)
{
    Item item = readNextItem();
    items.add(item);
}

The value of len determines the ultimate length of items once the loop finishes. This value, however, is unknown to the constructor of the ArrayList which allocates a new Object array with a default size. Whenever the capacity of the internal array is exceeded, it’s replaced with a new array of sufficient length, making the previous array garbage.

If you’re executing the loop thousands of times you may be forcing a new array to be allocated and a previous one to be collected multiple times. For code running in a high-scale environment, these allocations and deallocations are all deducted from your machine’s CPU cycles.

The solution: Whenever possible, allocate lists and maps with an initial capacity, like so:

List<MyObject> items = new ArrayList<MyObject>(len);

This ensures that no unnecessary allocations and deallocations of internal arrays occur at runtime, as the list now has sufficient capacity to begin with. If you don’t know the exact size, it’s better to go with an estimate (e.g. 1024, 4096) of what an average size would be, and add some buffer to reduce the chance of accidentally exceeding it.
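
The same goes for maps; a small sketch (the names here are made up):

// Pre-size a HashMap so it never rehashes while loading `len` entries.
// HashMap resizes once size > capacity * loadFactor (0.75 by default),
// so divide the expected size by the load factor for a safe initial capacity.
Map<Long, Item> itemsById = new HashMap<Long, Item>((int) (len / 0.75f) + 1);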

3. Use efficient primitive collections

The Java compiler lets you use collections with primitive keys or values through “autoboxing” – wrapping the primitive value in a standard object, which can be allocated and later reclaimed by the GC.

This can have some negative implications. Java implements most collections using internal arrays. For each key/value entry added to a HashMap, an internal entry object is allocated to hold both values. This is a necessary evil when dealing with maps; it means an extra allocation and possible deallocation every time you put an item into a map. There’s also the possible penalty of outgrowing capacity and having to reallocate a new internal array. When dealing with large maps containing thousands or more entries, these internal allocations can have increasing costs for your GC.

A very common case is to hold a map between a primitive value (such as an Id) and an object. Since Java’s HashMap is built to hold object types (vs. primitives), this means that every insertion into the map can potentially allocate yet another object to hold the primitive value (“boxing” it).

The standard Integer.valueOf method caches the values between -128 and 127, but for each number outside that range, a new object will be allocated in addition to the internal key / value entry object. This can potentially more than triple GC overhead for the map. For those coming from a C++ background this can really be troubling news, where STL templates solve this problem very efficiently.
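
To make the boxing cost concrete, here is a small sketch (the names are made up):

Map<Integer, String> names = new HashMap<Integer, String>();
for (int id = 0; id < 100000; id++)
{
    // `id` is autoboxed via Integer.valueOf(id); for values outside -128..127
    // this allocates a new Integer on top of the map's internal entry object.
    names.put(id, "user-" + id);
}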

Luckily, this problem is being worked on for future versions of Java. Until then, it’s been dealt with quite efficiently by some great libraries which provide primitive trees, maps and lists for each of Java’s primitive types. I strongly recommend Trove, which I’ve worked with for quite a while and found can really reduce GC overhead in high-scale code.
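
As a minimal sketch of what that looks like with Trove (assuming Trove 3 on the classpath; the names are made up):

import gnu.trove.map.hash.TIntObjectHashMap;

TIntObjectHashMap<String> names = new TIntObjectHashMap<String>();
names.put(42, "user-42");    // the key stays a primitive int: no boxing,
String name = names.get(42); // and no per-entry wrapper object to collect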

4. Use Streams instead of in-memory buffers

Most of the data we manipulate in server applications comes to us in the form of files or data streamed over the network from another web service or a DB. In most cases, the incoming data is in serialized form, and needs to be deserialized into Java objects before we can begin operating on it. This stage is very prone to large implicit allocations.

The easiest thing to do usually is read the data into memory using a ByteArrayInputStream or ByteBuffer, and then pass that on to the deserialization code.

This can be a bad move, as you’d need to allocate and later deallocate room for the data in its entirety while constructing new objects out of it. And since the data can be of unknown size, you guessed it – you’ll have to allocate and deallocate internal byte[] arrays to hold the data as it grows beyond the initial buffer’s capacity.

The solution is pretty straightforward. Most persistence libraries such as Java’s native serialization, Google’s Protocol Buffers, etc. are built to deserialize data directly from the incoming file or network stream, without ever having to keep it in memory, and without having to allocate new internal byte arrays to hold the data as it grows. If available, go for that approach vs. loading the data into memory. Your GC will thank you.
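
As a minimal sketch using Java’s native serialization (the socket, Item type and process method are stand-ins; a real application would use whichever library matches the data format):

// assumed imports: java.io.ObjectInputStream, java.util.List
try (ObjectInputStream in = new ObjectInputStream(socket.getInputStream()))
{
    // Objects are rebuilt straight off the network stream; the raw bytes are
    // never accumulated into an intermediate, ever-growing byte[] buffer.
    @SuppressWarnings("unchecked")
    List<Item> items = (List<Item>) in.readObject();
    process(items);
}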

5. Aggregate Lists

Immutability is a beautiful thing, but in some high-scale situations it can have some serious drawbacks. One scenario is when passing List objects between methods.

When returning a collection from a function, it’s usually advisable to create the collection object (e.g. ArrayList) within the method, fill it and return it in the form of an immutable Collection interface.

There are some cases where this doesn’t work well. The most noticeable one is when collections are aggregated from multiple method calls into a final collection. While immutability provides more clarity, in high-scale situations it can also mean massive allocation of interim collections.

The solution in this case would be not to return new collections, but instead aggregate values into a single collection that’s passed into those methods as a parameter.

Example 1 (inefficient) –

List<Item> items = new ArrayList<Item>();
for (FileData fileData : fileDatas)
{
    // Each invocation creates a new interim list with possible
    // internal interim arrays
    items.addAll(readFileItem(fileData));
}

Example 2 (efficient) –

List<Item> items =
    new ArrayList<Item>((int) (fileDatas.size() * avgFileDataSize * 1.5));
for (FileData fileData : fileDatas)
{
    readFileItem(fileData, items); // fill items inside
}

Example 2, while disobeying the rules of immutability (which should normally be adhered to), can save N list allocations (along with any interim array allocations). In high-scale situations this can be a boon to your GC.
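
For completeness, the fill-style variant of readFileItem used in Example 2 could look something like this (a hypothetical sketch; parse, toItem and RawRecord stand in for whatever actually reads a file):

void readFileItem(FileData fileData, List<Item> sink)
{
    // Append parsed items into the caller's list instead of allocating
    // and returning a new interim list (plus interim arrays) per file.
    for (RawRecord record : parse(fileData))
    {
        sink.add(toItem(record));
    }
}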

Additional reading

String interning – http://plumbr.eu/blog/reducing-memory-usage-with-string-intern

Efficient wrappers – http://vanillajava.blogspot.co.il/2013/04/low-gc-coding-efficient-listeners.html

Using Trove – http://java-performance.info/primitive-types-collections-trove-library/


JPA performance vs. JDBC for a large table

I have a table with about 80 million records. When I ran a simple count query using JPA with 2-3 predicates, it took about 120s to get the result, compared to 1s using JDBC.


        CriteriaBuilder cb = entityManager.getCriteriaBuilder();
        CriteriaQuery<Long> cq = cb.createQuery(Long.class);
        Root<SrcMpodrSalesDtlEntity> root = cq.from(SrcMpodrSalesDtlEntity.class);
        List<Predicate> predicates = new ArrayList<Predicate>();
        predicates.add(cb.greaterThanOrEqualTo(root.get(SrcMpodrSalesDtlEntity_.trdDt), startDate));
        predicates.add(cb.lessThanOrEqualTo(root.get(SrcMpodrSalesDtlEntity_.trdDt), endDate));
        cq.select(cb.count(root)).where(predicates.toArray(new Predicate[predicates.size()]));
        return entityManager.createQuery(cq).getSingleResult();

Notice that I am running exactly the same query that JPA generates:

    select
        *
    from
        ( select
            count(srcmpodrsa0_.SRC_MPODR_SALES_DTL_ID) as totalTrade
        from
            SRC_MPODR_SALES_DTL srcmpodrsa0_
        where
            srcmpodrsa0_.TRD_DT>=TO_DATE('2011-07-01-00:00:00', 'yyyy-MM-dd-HH24:mi:ss')
            and srcmpodrsa0_.TRD_DT<=TO_DATE('2011-07-31-23:59:59', 'yyyy-MM-dd-HH24:mi:ss')
        )
    where
        rownum <= 1

This is somewhat frustrating.

To be honest, I would have to leave JPA and stick with JDBC (though certainly using a support class like JdbcTemplate). JPA (and other ORM providers/specifications) is not designed to operate on many objects within one transaction, as it assumes everything loaded should stay in the first-level cache (hence the need for clear() in JPA).

I also recommend a more low-level solution because the overhead of ORM (reflection is only the tip of the iceberg) can be so significant that iterating over a plain ResultSet, even with some lightweight support like the mentioned JdbcTemplate, will be much faster.
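
For reference, the same count via Spring’s JdbcTemplate could look roughly like this (a sketch; dataSource, startDate and endDate are assumed to be in scope, and the date binding ties into the REASON section below):

JdbcTemplate jdbc = new JdbcTemplate(dataSource);
Long total = jdbc.queryForObject(
    "select count(SRC_MPODR_SALES_DTL_ID) from SRC_MPODR_SALES_DTL "
        + "where TRD_DT >= ? and TRD_DT <= ?",
    new Object[] {
        // bind java.sql.Date (not Timestamp) so Oracle can use the TRD_DT index
        new java.sql.Date(startDate.getTime()),
        new java.sql.Date(endDate.getTime()) },
    Long.class);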

JPA is simply not designed to perform operations on large numbers of entities. You might play with flush()/clear() to avoid OutOfMemoryError, but consider this once again: you gain very little while paying the price of huge resource consumption.
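
The flush()/clear() batching pattern mentioned above looks roughly like this (a sketch; the batch size of 500 is arbitrary):

for (int i = 0; i < entities.size(); i++)
{
    entityManager.persist(entities.get(i));
    if (i % 500 == 0)
    {
        entityManager.flush();  // push pending inserts to the database
        entityManager.clear();  // evict the first-level cache to cap memory use
    }
}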

There is no “proper” way to do this; it isn’t what JPA or JDO or any other ORM is intended to do. Straight JDBC will be your best alternative, as you can configure it to bring back a small number of rows at a time and discard them as they are used; that is why server-side cursors exist.

ORM tools are not designed for bulk processing; they are designed to let you manipulate objects and attempt to make the RDBMS the data is stored in as transparent as possible, and most fail at the transparent part at least to some degree. At this scale, there is no way to process hundreds of thousands of rows (objects), much less millions, with any ORM and have it execute in any reasonable amount of time, because of the object instantiation overhead, plain and simple.

REASON

It turns out this is because of the java.sql.Timestamp that JPA uses for java.util.Date in the PreparedStatement. Binding a TIMESTAMP parameter against the DATE column forces Oracle to convert the column for the comparison, which defeats the index on TRD_DT.

Hibernate has a “BasicTypeRegistry” which registers the SQL types that handle the Java types. Among them, java.util.Date is registered with TimestampType. I found that if we use DateType instead, the performance is the same as with JDBC, but I did not find an easy way to do this:

configuration.getTypeResolver().registerTypeOverride(new DateType() {
    @Override
    public String[] getRegistrationKeys() {
        return new String[] {
            getName(),
            java.sql.Date.class.getName(),
            java.util.Date.class.getName()
        };
    }
});

But since I am using JPA, I did not find a way to get a reference to the Hibernate SessionFactory or the Hibernate Configuration used above.

One workaround I found is to annotate the field with @Type(type = "date") (so it uses java.sql.Date) and then use JPQL directly rather than the JPA Criteria API. This way, createQuery does not replace the date with a Timestamp.
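
The annotated field would look something like this (a sketch; the field maps the TRD_DT column from the query above, and @Type is org.hibernate.annotations.Type):

// Maps the field with Hibernate's "date" type, so the bind parameter
// becomes a java.sql.Date instead of a java.sql.Timestamp.
@Type(type = "date")
@Column(name = "TRD_DT")
private java.util.Date trdDt;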

Optimize Exception performance by overriding the “fillInStackTrace” method

Exceptions Are Slow

Throwing exceptions is bad, right? It’s slow and makes the code unreadable. Well… kinda… maybe…

Webwork 1 has a flat configuration mechanism for looking up values by name.

It’s like a chained hash map of configuration providers that are asked in turn “do you know about this key?” – and if yes, what its value is; and if no, tell me that and I will go on to the next one. It’s classic tri-state return logic.

And it uses exceptions as the mechanism to say “I don’t know about that key”. Now before you snicker at this, remember that java.util.ResourceBundle does exactly this. At least it did up until 1.6, but that’s another story.

Here is some code for example:

public Object getImpl(final String aName) throws IllegalArgumentException
{
    // Delegate to the other configurations
    IllegalArgumentException e = null;
    for (final ConfigurationInterface config : configList)
    {
        try
        {
            return config.getImpl(aName);
        }
        catch (final IllegalArgumentException ex)
        {
            e = ex;
            // Try next config
        }
    }
    throw e;
}

In this example the stack trace of the signalling exception is not important. And I would also argue that this is very readable code from an exception-handling point of view. I might have declared my own specific runtime exception type rather than use IllegalArgumentException, but it’s very readable.

And even the fact that the last exception out is rethrown is not important, because most of the surrounding code does this:

try
{
    String classname = (String) Configuration.get(CLASS_NAME);
    if (classname != null && classname.length() > 0)
    {
        try
        {
            impl = (InjectionImpl) ClassLoaderUtils.loadClass(classname, InjectionUtils.class).newInstance();
        }
        catch (Exception e)
        {
            LogFactory.getLog(InjectionUtils.class).error("Could not load class " + classname + " or could not cast it to InjectionUtils.  Using default", e);
        }
    }
}
catch (IllegalArgumentException e)
{
    // do nothing - this is just because the property couldn't be found.
}

So we don’t need the stack trace.

But you get it always because Throwable, as the root of all exceptions, does this in all of its constructors:

public Throwable() {
    fillInStackTrace();
}

public Throwable(String message) {
    fillInStackTrace();
    detailMessage = message;
}

public Throwable(String message, Throwable cause) {
    fillInStackTrace();
    detailMessage = message;
    this.cause = cause;
}

Notice how every incarnation fills in the stack trace?

However Jed Wesley Smith let us in on a great Java trick: just don’t fill in the stack trace.

Speed Of Execution

Filling in exception stack traces is what takes all the time in exception handling.

Without it, then:

try {
    throw new ResourceNotFound(); // no stack trace is filled in (see the class below)
} catch (ResourceNotFound e) {
    // nothing to do: the throw was just a jump to here
}

is pretty much an object allocation and a goto statement.

And it’s simple not to fill out the stack trace.

In fact, Jed had already left an example in the JIRA source code:

/**
 * Efficient exception that has no stacktrace; we use this for flow-control.
 */
@Immutable
static final class ResourceNotFound extends Exception
{
    ResourceNotFound()
    {}

    @Override
    public Throwable fillInStackTrace()
    {
        return this;
    }
}

That’s it. No stack trace is available on this exception; if you call e.printStackTrace(), the trace is just empty.

I wrote a micro test on this comparing thrown stack traced filled exceptions against stack trace empty exceptions, at different levels of stack depth.

I found that it is 25% faster. But because I am a fail at maths, I was correctly told that it’s in fact a 400% increase.

Having no stack trace is FAST.

Rupert Shuttleworth initially found this hot spot while profiling JIRA during the functional test runs of dev speed week. He found that 5% of the server time was taken up by this exception handling.

There are many config lookups per request, and each delegates down a stack of about 10 config providers.

Once he used an exception class that did not fill in the stack trace, the code fell off the profiling list. Win!!

Readability Of The Code

You could imagine returning a tuple object in Java that encodes the tri-state return logic.

Something more like this:

public static class CompoundResult
{
    private final Object value;
    private final boolean found;

    public CompoundResult(final Object value, final boolean found)
    {
        this.value = value;
        this.found = found;
    }

    public boolean found()
    {
        return found;
    }

    public Object value()
    {
        return value;
    }
}

public CompoundResult getImpl(final String aName)
{
    // Delegate to the other configurations
    for (final ConfigurationInterface config : configList)
    {
        CompoundResult result = config.getImpl(aName);
        if (result.found())
        {
            return result;
        }
        // Try next config
    }
    return new CompoundResult(null, false);
}

But I think this is slightly less readable and also requires an intermediary class for the compound return object. I argue that the exception version is more readable. I initially didn’t think that, but I think I was coloured by my perception that exceptions are slow and therefore bad.

But if the exceptions cost you next to nothing, then it somehow becomes more elegant code.

Also in this case I was hamstrung by an existing design, so changing all the callers was going to be hard.

Jed has since shown me a great use of functional-style Options to better encode the compound object shown above. But let’s leave that for another blog post.

TL;DR Too Late

So the lesson is that if you use exceptions for flow control, make sure you use an exception class that does not fill in its stack trace.

It will be faster and more readable.
