Fix Hibernate JPA sequence generating odd id with allocationSize

I configured my JPA entity class as usual, using the IntelliJ-generated stub from the DB schema, with the configuration below:


@Entity
@javax.persistence.Table(name = "IMPORT_RUN", schema = "SCRCOWNER", catalog = "")
public class ImportRunEntity
{
    private Long importRunId;

    @SequenceGenerator(name = "IMPORT_RUN_ID_GEN", sequenceName = "SEQ_IMPORT_RUN_ID")
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "IMPORT_RUN_ID_GEN")
    @javax.persistence.Column(name = "IMPORT_RUN_ID", nullable = false, insertable = true, updatable = true, precision = -127)
    public Long getImportRunId()
    {
        return importRunId;
    }

    public void setImportRunId(Long importRunId)
    {
        this.importRunId = importRunId;
    }
.....
}

However, the ids inserted into the DB are far larger than they should be. For example, an id that should be 10012 comes out as 5223326. At first glance this odd number looks as if it came from some other sequence.

To fix this, I have to add allocationSize=1 to the @SequenceGenerator:

    @SequenceGenerator(name = "IMPORT_RUN_ID_GEN", sequenceName = "SEQ_IMPORT_RUN_ID", allocationSize = 1)

This makes the generator use its own sequence and advance the id by 1 each time. By default, allocationSize is 50.
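The "odd" value can be reproduced with simple arithmetic. The sketch below is a simplified model of hi/lo-style allocation (the exact formula differs between Hibernate versions and optimizers), and the sequence value 104466 is an assumption chosen to match the example id above:

```java
public class HiLoArithmetic {
    public static void main(String[] args) {
        // Assumption: the DB sequence itself is only around 104466; the huge
        // id is produced in memory, not fetched from another sequence.
        long seqVal = 104_466L;   // hypothetical result of SEQ_IMPORT_RUN_ID.nextval
        int allocationSize = 50;  // the JPA default when allocationSize is omitted

        // A hi/lo-style generator multiplies the fetched value by the
        // allocation size and hands out the whole block without touching the DB.
        long blockStart = seqVal * allocationSize; // 5_223_300
        long exampleId = blockStart + 26;          // 26th id of the block

        System.out.println(exampleId);             // prints 5223326
    }
}
```

With allocationSize = 1 the block collapses to a single value, so the generated id tracks the sequence exactly.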

The Javadoc of the javax.persistence.SequenceGenerator class provided a partial answer: it documents a property called allocationSize, which is specified as follows:

(Optional) The amount to increment by when allocating sequence numbers from the sequence.

After this modification, the value inserted into the id column of the Persons table was the same as the value fetched from the person_sequence sequence. I am not sure how common this problem is, but I never encountered it when building web applications with Hibernate without JPA. This makes me wonder whether it is a Hibernate-specific problem or whether other JPA 2 implementations behave the same way. In my opinion, the default value of the allocationSize property should be one, because drawing one id per sequence call is the most common use case of sequence-based id generation.
A related discussion contributes some useful notes:

allocationSize defines a range of primary-key values reserved for Hibernate. The select ….nextval from dual is issued only after Hibernate has consumed that range of primary keys.

So you MUST declare the same value for both allocationSize (Hibernate side) and the sequence's INCREMENT BY (DB side).

When you explicitly set allocationSize=500, e.g. on Oracle, the sequence must be created to match:

create sequence SEQ_ACE_WORKERS_QUEUE_STATS_ID
       MINVALUE 1 
       MAXVALUE 999999999999999999999999999 
       START WITH 1
       INCREMENT BY 500 
       NOCACHE 
       NOCYCLE;

Otherwise, you’ll see negative values or constraint errors raised by your DB because of primary-key collisions.

When the app server is restarted, you’ll notice a ‘jump’ between the last primary key allocated and the ‘new’ sequence number selected at restart.

Final comment: the default value is 50, so if you don’t specify allocationSize on the Hibernate side, you MUST declare INCREMENT BY 50 on the DB side.
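The consequence of mismatched increments can be simulated in a few lines. This is a sketch assuming a pooled-style optimizer in which the fetched sequence value marks the end of the reserved block; it is not Hibernate's exact algorithm:

```java
import java.util.HashSet;
import java.util.Set;

public class IncrementMismatch {
    public static void main(String[] args) {
        long seq = 0;
        int dbIncrement = 1;      // sequence mistakenly created with INCREMENT BY 1
        int allocationSize = 50;  // Hibernate side left at the default

        Set<Long> issued = new HashSet<>();
        boolean negative = false, collision = false;

        // Two sequence fetches, each reserving a block of allocationSize ids
        // ending at the fetched value (pooled-lo style, simplified).
        for (int fetch = 0; fetch < 2; fetch++) {
            seq += dbIncrement;                          // nextval
            long hi = seq;                               // end of the reserved block
            long lo = hi - allocationSize + 1;           // start of the reserved block
            for (long id = lo; id <= hi; id++) {
                if (id <= 0) negative = true;            // ids below MINVALUE
                if (!issued.add(id)) collision = true;   // duplicate primary key
            }
        }
        System.out.println("negative=" + negative + " collision=" + collision);
        // prints negative=true collision=true
    }
}
```

With INCREMENT BY 50 on the DB side the reserved blocks ([1..50], [51..100], …) are disjoint and both symptoms disappear.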

For clustering

If you’re using load balancers but the DB is shared between the instances, you cannot cache sequence values.

You have to ask for the next sequence value on every insert into the DB, and each insert has to run inside a transaction.

Hibernate has allocationSize = 50 by default, so to fix the problem you have to set it to 1 explicitly:

 @SequenceGenerator(allocationSize=1 ...)
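The clustering rule above can be illustrated with a small simulation, where an AtomicLong stands in for the shared database sequence and two id sets stand in for two load-balanced nodes (all names are illustrative):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class SharedSequenceSketch {
    public static void main(String[] args) {
        // One shared "database sequence" (simulated), two app-server nodes.
        AtomicLong dbSequence = new AtomicLong();

        Set<Long> node1Ids = new HashSet<>(), node2Ids = new HashSet<>();
        for (int i = 0; i < 100; i++) {
            // allocationSize = 1: every insert asks the DB for nextval,
            // so no node ever holds a cached block another node could reuse.
            node1Ids.add(dbSequence.incrementAndGet());
            node2Ids.add(dbSequence.incrementAndGet());
        }

        Set<Long> overlap = new HashSet<>(node1Ids);
        overlap.retainAll(node2Ids);
        System.out.println(overlap.isEmpty());  // prints true: no collisions
    }
}
```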

Hi/Lo generator

Using the JPA @SequenceGenerator along with the legacy Hibernate identifiers gives you the SequenceHiLoGenerator, which applies a hi/lo optimization algorithm anyway.

But for sequences, the actual identifier value is applied at flush time, so you won’t get the actual value until the session is flushed (either a manual flush or the commit-time flush).

For the IDENTITY generator, you get the identifier prior to flushing, but that disables JDBC batching, so it’s not a silver bullet either.

If you want full control, you need to resort to assigned identifiers, and UUID surrogate keys are perfect for this job.
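A minimal sketch of the assigned-identifier idea, using an illustrative Post class (the name and fields are made up): the entity assigns its own UUID in the constructor, so the id exists before any persist or flush and is independent of JDBC batching:

```java
import java.util.UUID;

public class Post {
    // Assigned identifier: set by the application, not by the database,
    // so it exists before the entity is ever persisted or flushed.
    private final UUID id;
    private String title;

    public Post(String title) {
        this.id = UUID.randomUUID();
        this.title = title;
    }

    public UUID getId() { return id; }

    public static void main(String[] args) {
        Post p = new Post("hello");
        // The id is usable right away, e.g. as a map key or in a log line,
        // with no dependency on flush timing.
        System.out.println(p.getId() != null);  // prints true
    }
}
```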

Composite primary keys vs unique object ID field

Most of the commonly used engines (MS SQL Server, Oracle, DB2, MySQL, etc.) would not experience noticeable issues using a surrogate key system. Some may even experience a performance boost from the use of a surrogate, but performance issues are highly platform-specific.

In general terms, the natural-key (and, by extension, composite-key) versus surrogate-key debate has a long history with no likely “right answer” in sight.

The arguments for natural keys (singular or composite) usually include some of the following:

1) They are already available in the data model. Most entities being modeled already include one or more attributes or combinations of attributes that meet the needs of a key for the purposes of creating relations. Adding an additional attribute to each table incorporates an unnecessary redundancy.

2) They eliminate the need for certain joins. For example, if you have customers with customer codes, and invoices with invoice numbers (both of which are “natural” keys), and you want to retrieve all the invoice numbers for a specific customer code, you can simply use "SELECT InvoiceNumber FROM Invoice WHERE CustomerCode = 'XYZ123'". In the classic surrogate key approach, the SQL would look something like this: "SELECT Invoice.InvoiceNumber FROM Invoice INNER JOIN Customer ON Invoice.CustomerID = Customer.CustomerID WHERE Customer.CustomerCode = 'XYZ123'".

3) They contribute to a more universally-applicable approach to data modeling. With natural keys, the same design can be used largely unchanged between different SQL engines. Many surrogate key approaches use specific SQL engine techniques for key generation, thus requiring more specialization of the data model to implement on different platforms.

Arguments for surrogate keys tend to revolve around issues that are SQL engine specific:

1) They enable easier changes to attributes when business requirements/rules change. This is because they allow the data attributes to be isolated to a single table. This is primarily an issue for SQL engines that do not efficiently implement standard SQL constructs such as DOMAINs. When an attribute is defined by a DOMAIN statement, changes to the attribute can be performed schema-wide using an ALTER DOMAIN statement. Different SQL engines have different performance characteristics for altering a domain, and some SQL engines do not implement DOMAINS at all, so data modelers compensate for these situations by adding surrogate keys to improve the ability to make changes to attributes.

2) They enable easier implementations of concurrency than natural keys. In the natural key case, if two users are concurrently working with the same information set, such as a customer row, and one of the users modifies the natural key value, then an update by the second user will fail because the customer code they are updating no longer exists in the database. In the surrogate key case, the update will process successfully because immutable ID values are used to identify the rows in the database, not mutable customer codes. However, it is not always desirable to allow the second update – if the customer code changed it is possible that the second user should not be allowed to proceed with their change because the actual “identity” of the row has changed – the second user may be updating the wrong row. Neither surrogate keys nor natural keys, by themselves, address this issue. Comprehensive concurrency solutions have to be addressed outside of the implementation of the key.

3) They perform better than natural keys. Performance is most directly affected by the SQL engine. The same database schema implemented on the same hardware using different SQL engines will often have dramatically different performance characteristics, due to the SQL engines data storage and retrieval mechanisms. Some SQL engines closely approximate flat-file systems, where data is actually stored redundantly when the same attribute, such as a Customer Code, appears in multiple places in the database schema. This redundant storage by the SQL engine can cause performance issues when changes need to be made to the data or schema. Other SQL engines provide a better separation between the data model and the storage/retrieval system, allowing for quicker changes of data and schema.

4) Surrogate keys function better with certain data access libraries and GUI frameworks. Due to the homogeneous nature of most surrogate key designs (example: all relational keys are integers), data access libraries, ORMs, and GUI frameworks can work with the information without needing special knowledge of the data. Natural keys, due to their heterogeneous nature (different data types, size etc.), do not work as well with automated or semi-automated toolkits and libraries. For specialized scenarios, such as embedded SQL databases, designing the database with a specific toolkit in mind may be acceptable. In other scenarios, databases are enterprise information resources, accessed concurrently by multiple platforms, applications, report systems, and devices, and therefore do not function as well when designed with a focus on any particular library or framework. In addition, databases designed to work with specific toolkits become a liability when the next great toolkit is introduced.

I tend to fall on the side of natural keys (obviously), but I am not fanatical about it. Due to the environment I work in, where any given database I help design may be used by a variety of applications, I use natural keys for the majority of the data modeling, and rarely introduce surrogates. However, I don’t go out of my way to try to re-implement existing databases that use surrogates. Surrogate-key systems work just fine – no need to change something that is already functioning well.

There are some excellent resources discussing the merits of each approach:

http://www.google.com/search?q=natural+key+surrogate+key

http://www.agiledata.org/essays/keys.html

http://www.informationweek.com/news/software/bi/201806814



Getting the id of a freshly persisted bean

This problem had been bothering me for days; today I fought with it until past two in the morning and finally solved it, and the solution turned out to be very simple.

Earlier I was working with contact and employee: after creating a contact in the JSP, it has to be added to the employee's list. It seems that if you persist the contact and then use it immediately, its id is null, which was quite frustrating! I don't know whether the cause was Oracle or the fact that the call was made from the client side.

Today, to finish one of the assignment requirements (three entity beans), I forced in a one-to-one phoneNumber-to-contact relation. I thought it would be simple, but it still took more than two hours. That's fine, though; it gave me a clearer understanding of ORM!

Once the phone number comes in from the form, a phoneNumber bean has to be instantiated immediately and persisted. The problem was the same as before: if that bean is used directly to create a new contact, its id is null, and of course a NullPointerException follows!

The fix is to adapt the persist method. Inside the DAO, the object does have an id after persist, so I added a wrapper around persist in the DAO that returns the just-persisted object, and that object carries the id! // Actually this is exactly what merge does, so no extra method is needed; I only discovered that today... (Nov 2)
For a one-to-many relation, I assumed I could simply use the list's add method, though I wasn't sure whether EJB would accept that; I'll try it tomorrow! If it works, the join table no longer has to be maintained by hand, and that SQL statement can be dropped! // It does work, but after add you still need a persist or merge; the changes do not reach the database automatically... (Nov 2)

Hope it works, momo!
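The wrapper idea can be sketched without a container. The in-memory DAO below (all names hypothetical) imitates the shape of the fix: the save method returns the instance that carries the freshly assigned id, which is also how EntityManager.merge behaves, since its return value is the managed copy:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class DaoSketch {
    static class Contact {
        Long id;          // null until the "database" assigns it
        String name;
        Contact(String name) { this.name = name; }
    }

    // Stand-in for a DAO backed by a real EntityManager.
    static class ContactDao {
        private final Map<Long, Contact> table = new HashMap<>();
        private final AtomicLong seq = new AtomicLong();

        // Wrapper that persists and returns the object WITH its id set,
        // mirroring the return value of EntityManager.merge.
        Contact save(Contact c) {
            c.id = seq.incrementAndGet();
            table.put(c.id, c);
            return c;
        }
    }

    public static void main(String[] args) {
        Contact saved = new ContactDao().save(new Contact("momo"));
        System.out.println(saved.id);  // prints 1: the id is usable immediately
    }
}
```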