hibernate caching strategy

For databases, the following applies to all levels: the best database accesses are those that do not have to be carried out. For this, caches are used within the database and also in Hibernate and JPA. Caching is relatively complex and one of the most misunderstood concepts of Hibernate. There are three different caches:

  • First level cache: always active and relates to a unit of work, i.e. mostly to a service call as it is attached to the current session.
  • Second level cache: can be configured for certain entities. In this case, the objects are deposited in a cross-transaction cache. The key to this deposit is the primary key of the entity. The cache can be configured cluster-wide or within a JVM.
  • Query cache: stores the result of an HQL/EJB-QL query in a cache. For this, only the primary keys of the result objects are stored which are then loaded via second level cache. Thus the query cache works only together with the second level cache.

The efficient usage of the caches depends on the application logic and the correct cache configuration. Hibernate allows the configuration of different cache providers. The most popular implementations areEhcache and JBoss Cache. First, an analysis must be made to see if an entity allows only reading access or if it is modified frequently. For reading access (e.g. for key values like country lists) a read-only access second level cache make much sense, because all entities are in the memory and database accesses can be prevented almost completely. However, the correct dimensioning of the caches is important. If there are too many instances of an object and if they cannot all be loaded into the memory, the usage of a cache should be analyzed – it could make sense in the case that not all data are used equally often.

If all entities in the cache are modified, too, you have to decide how important the consistency of the data in the cache with the database really is. Hibernate supports transactional caches for which each update on the cache is stored permanently directly in the database, but also less restrictive algorithms which control the validity of data based on timestamps. Depending on the frequency of modifications and the data consistence requirements, the gain in performance of the second level caches must be evaluated differently. An exact analysis of the access behaviour with the help of a profiler or monitoring tool is thus very helpful for the correct configuration of the second level cache.

Some other points must be observed, however, if the second level cache is used, because one might be astonished that even with active second level cache there can still be queries. The entities in the second level cache are identified via their primary key. This also means that only those queries are read from the cache which read an entity via primary key. If you have, for example, a class named person that can be identified via a unique, consecutive number as primary key, only those queries can be optimized by the second level cache which read a person via this consecutive number. If person is searched via family and first name, these queries would go directly to the database past the second level and there would not be any performance savings though a second level for person exists. To be able to use a cache for the search via first and last name, the query cache must be activated for the query. In this case all primary keys of person from the query output are stored in the cache. If a new search is carried out for the same name, the primary keys are read from the query cache and a query is send to the second level cache which then gives the results back from the cache. This is the only way to prevent direct database queries.

Another pitfall is that the data in the second level cache are not stored as objects, but in a so-called dehydrated form. The dehydrated form is a sort of serialization of the entities. If an entity is read from the second level cache, it must be “rehydrated”, that is de-serialized. If huge amounts of query outputs are given back, which are all in the second level cache, this may lead to performance problems. Each object must be read from the cache and be de-serialized. The advantage is that each transaction gets back its own object instance from the cache, which prevents concurrency problems. This means that the second level cache is thread-safe. Especially for key values which are accessed read-only, the second level cache can be slower compared to a normal object cache. It is a good idea to analyze the application thoroughly and, if necessary, to put the key objects in a cache above the database access layer. The details of the different cache configurations can only be sketched here but the correct usage of the different caches is absolutely necessary, especially for frequently used applications, to achieve a good performance.

From here

For more detail about these caches, read the previous posts:

session cache (1st leve)

cross-session cache (2nd level)

query cache

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s