batch / bulk insert/update in jpa/hibernate with flush and clear

For JPA

Use the EntityManager to flush and clear the persistence context periodically:

int batchSize = 1000;
for (int i = 0; i < taloes.size(); i++) {
    TalaoAIT talaoAIT = taloes.get(i);
    em.persist(talaoAIT);
    // flush and clear after every full batch (not at i == 0)
    if ((i + 1) % batchSize == 0) {
        em.flush();
        em.clear();
    }
}
em.flush();
em.clear();
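Since the modulo test is easy to get wrong, here is a minimal, database-free sketch of the batching loop. CountingEm is a counting stub standing in for the real EntityManager (an illustrative assumption, not a JPA class), so the flush/clear cadence can be checked without a database:

```java
import java.util.ArrayList;
import java.util.List;

// CountingEm is a stub standing in for the real EntityManager so the
// batching logic can be exercised without a database.
class CountingEm {
    int persisted = 0;
    int flushes = 0;
    int clears = 0;
    void persist(Object entity) { persisted++; }
    void flush()                { flushes++; }
    void clear()                { clears++; }
}

public class BatchSketch {
    // Persist all items, flushing and clearing after every full batch
    // and once more at the end for the final partial batch.
    static void persistInBatches(CountingEm em, List<Object> items, int batchSize) {
        for (int i = 0; i < items.size(); i++) {
            em.persist(items.get(i));
            if ((i + 1) % batchSize == 0) {
                em.flush();
                em.clear();
            }
        }
        em.flush();
        em.clear();
    }

    public static void main(String[] args) {
        List<Object> items = new ArrayList<>();
        for (int i = 0; i < 2500; i++) items.add(new Object());
        CountingEm em = new CountingEm();
        persistInBatches(em, items, 1000);
        // 2500 persists; flushes after items 1000 and 2000, plus the final one
        System.out.println(em.persisted + " persists, " + em.flushes + " flushes");
        // prints "2500 persists, 3 flushes"
    }
}
```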

For Hibernate

Basically, just swap the EntityManager for the Hibernate Session.

    @Transactional
    public void saveNiidsMessages(List<SrcNiidsXmlEntity> entities)
    {
        Session session = getSession();
        int batchSize = 1000;

        for (int i = 0; i < entities.size(); i++)
        {
            session.save(entities.get(i));
            // flush a batch of inserts and release memory:
            if ((i + 1) % batchSize == 0)
            {
                session.flush();
                session.clear();
            }
        }
        session.flush();
        session.clear();
    }

    protected Session getSession()
    {
        return sessionFactory.getCurrentSession();
    }

When making new objects persistent, flush() and then clear() the session regularly to keep the size of the first-level cache under control.

Hibernate suggests a batch size of 20-50. However, I found that 1500 works well in some of my scenarios.
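For the flushed batches to actually become JDBC batch statements, JDBC batching must also be enabled. A minimal sketch of the relevant Hibernate properties (the value 50 is only an example; tune it together with your flush interval):

```properties
hibernate.jdbc.batch_size=50
# recommended when batching versioned entities:
hibernate.jdbc.batch_versioned_data=true
# group statements of the same kind so they can be batched together:
hibernate.order_inserts=true
hibernate.order_updates=true
```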

More on flush and clear

Hibernate administers the persistent objects within a transaction in the so-called session. In JPA, the EntityManager takes over this task. In the following, the term “EntityManager” will also be used as a synonym for the Hibernate session, as both have a similar interface. As long as an object is attached to an EntityManager, all changes to the object will be synchronized with the database automatically. This is called flushing of the objects. The point in time of the synchronization with the database is not guaranteed: the later a flush occurs, the more optimization potential the EntityManager has, because, for example, several updates to an object can be bundled to avoid redundant SQL statements.

If you call clear(), all currently managed objects of the EntityManager will be detached and their state is no longer synchronized with the database. As long as the objects are not explicitly attached again, they are plain Java objects, whose changes do not have any effect on the database. In many applications that use Hibernate or JPA, flush() and clear() are called explicitly and frequently, which often has fatal effects on the performance and maintainability of the application. A manual call of flush() should be avoided through a clean application design; it is similar to a manual call of System.gc() requesting garbage collection. In both cases, the normal, optimized operation of the technology is prevented. For Hibernate and JPA this generally means that more updates are made than necessary, compared to letting the EntityManager decide the point in time itself.

The call of clear(), in many cases preceded by a manual flush(), leads to all objects being decoupled from the EntityManager. For this reason you should define clear architecture and design guidelines about where clear() may be called. A typical usage scenario for clear() is batch processing, where working with unnecessarily large sessions should be avoided. Apart from that, such calls should be noted explicitly in the Javadoc of the method; otherwise the application can show unpredictable behaviour, since calling the method wipes the complete EntityManager context. The objects must then be re-attached to the EntityManager, which normally requires re-reading their state from the database. Depending on the fetching strategies, there are cases in which the state of the objects must be read manually to have all associations attached again. In the worst case, even modified object data will not be saved permanently.

Reference 1

Reference 2


replace indexed but not stored data in lucene

I was trying to replace indexed-but-not-stored data in Lucene, and found this thread describing the same issue:

> > > I have a strange problem with Field.Store.NO and Field.Index.ANALYZED fields with Lucene 3.0.1.
> > >
> > > I'm testing my app with twenty test documents. Each has about ten fields. All fields except one, "Content", are set as Field.Store.YES. The "Content" field is set as Field.Store.NO and Field.Index.ANALYZED. Using Luke, I discovered that this "Content" field is not persisted to the disk, except on one document (neither the first nor the last in the list). This always happens for exactly the same document. When I examine the Document object before writing it, it has the "Content" field I expect.
> > >
> > > When I change the "Content" field from Field.Store.NO to Field.Store.YES, everything starts working. Every document has the "Content" field exactly as I expect, and searches produce the hits I expect to see. I really don't want to save the full "Content" data in the Lucene index, though. I'm baffled why Field.Store.NO results in nothing being written to the index even with Field.Index.ANALYZED.
>
> > I finally had time to go back and look at this problem. I discovered that the analyzed fields work fine for searching until I use IndexWriter.updateDocument().
> >
> > The way my application runs, it has to update documents several times to update one specific field. The update code queries out Document objects using a unique identifier, and updates the field. The problem is in the Document objects returned by the query. The querying code runs a search, and eventually calls IndexSearcher.doc(int). According to the API documentation, that method only returns Document objects with stored fields from the underlying index.
> >
> > I tried calling IndexSearcher.doc(int i, FieldSelector fieldSelector) with fieldSelector set to null: the documentation states that this returns Document objects with all fields, but that also only seems to return stored fields.
> >
> > So my question becomes: how can I update a document which contains non-stored analyzed fields without clobbering the analyzed-only fields? Note that I do not need to update the analyzed-only fields. I have found nothing helpful in the documentation.
>
> You cannot retrieve non-stored fields. They are analyzed and tokenized during indexing and this is a one-way transformation. If you update documents you have to reindex the contents. If you do not have access to the original contents anymore, you may consider adding a stored-only "raw document" field that contains everything to rebuild the indexed fields. In our installation, we have a stored field containing the JSON/XML source document to do this.

Adding to Uwe's comment, you may be operating under a false assumption. Lucene has no capability to update fields in a document. Period. This is one of the most frequently requested changes, but the nature of an inverted index makes this...er...tricky. Updates are really a document delete followed by a document add. And as a bonus, the new document won't even have the same internal Lucene doc id as the one it replaces.

So if you're reading a document from the index, non-stored fields are not part of the new update and your results will be...uhmmmm.... not what you expect...
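That point can be made concrete with a small simulation. This is plain Java, not the Lucene API; retrieve() stands in for IndexSearcher.doc(), which only returns stored fields, so rebuilding an "update" from a retrieved document silently loses the indexed-only data:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model of the pitfall above -- plain Java, NOT the Lucene API.
public class NonStoredUpdate {
    // Simulates IndexSearcher.doc(): only stored fields come back.
    static Map<String, String> retrieve(Map<String, String> doc, Set<String> storedFields) {
        Map<String, String> result = new HashMap<>();
        for (String field : storedFields) {
            if (doc.containsKey(field)) {
                result.put(field, doc.get(field));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> original = new HashMap<>();
        original.put("id", "42");
        original.put("title", "stored title");
        original.put("content", "indexed but not stored");
        Set<String> stored = Set.of("id", "title");

        // A search hands back only the stored fields...
        Map<String, String> retrieved = retrieve(original, stored);
        retrieved.put("title", "new title");

        // ...so using the retrieved document as the replacement (which is what
        // updateDocument effectively does: delete + add) drops "content".
        System.out.println(retrieved.containsKey("content")); // prints "false"
    }
}
```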

Also, this is a good link about common mistakes when using Lucene.

This is a good article for Lucene coding reference.

checkin code to branch in intellij

  1. Update the project with the branch URL under VCS -> Update Project… (or Ctrl+T), and check the checkbox below:
    Update/Switch to specific URL
    • Select this check box to synchronize your local working copy with a specific repository. Specify the source repository either in the URL text box through its full URL address or in the Use Branch text box through the branch name.
    • Clear this check box to bring in the changes from the repository that corresponds to the current working copy.
  2. After the files are synced with the branch code, commit.

 

It is somewhat odd that IntelliJ puts this option under Update Project. Personally I prefer the “switch” command in TortoiseSVN.

primefaces update current table

I have a data table. Each row of the table has a commandButton called ‘Remove’, which is supposed to remove that row from the model and the view and perform an update in-place.

When I click the button on one of the rows to remove it, it only partially works: the corresponding element is removed from the model, but the view is not updated.

In theory, update="userTable" is the way in this situation.

I tried update=":usersForm:userTable", but that did not work either.

Finally, update="@form" worked for me.

 

Another issue: if the commandButton and the component to be updated are not in the same form, the full client id must be specified, e.g. update=":otherFormId:tableToBeUpdatedId". Even if the component is at the top level, the leading colon is still necessary in the id. For example, the growl is very common, and we need to update it this way: update=":growl".
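A minimal sketch of the layout described in this post (the ids usersForm, userTable, and growl, and the bean userBean, are assumed names for illustration):

```xml
<p:growl id="growl" />

<h:form id="usersForm">
    <p:dataTable id="userTable" value="#{userBean.users}" var="user">
        <p:column>#{user.name}</p:column>
        <p:column>
            <!-- update="@form" re-renders the enclosing form; the leading
                 colon in ":growl" is required because growl sits outside it -->
            <p:commandButton value="Remove"
                             action="#{userBean.remove(user)}"
                             update="@form :growl" />
        </p:column>
    </p:dataTable>
</h:form>
```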