replace indexed but not stored data in lucene

Was trying to replace indexed but not stored data in lucene . found this thread has the same issue:

> > >> I have a strange problem with Field.Store.NO and Field.Index.ANALYZED
> > >> fields with Lucene 3.0.1.
> > >>
> > >> I'm testing my app with twenty test documents. Each has about ten
> > >> fields. All fields except one, "Content", are set as Field.Store.YES.
> > >> The "Content" field is set as Field.Store.NO and
> > >> Field.Index.ANALYZED. Using Luke, I discovered that this "Content"
> > >> field is not persisted to the disk, except on one document (neither
> > >> the first nor the last in the list). This always happens for exactly
> > >> the same document. When I examine the Document object before writing
> > >> it, it has the "Content" field I expect.
> > >>
> > >> When I change the "Content" field from Field.Store.NO to
> > >> Field.Store.YES, everything starts working. Every document has the
> > >> "Content" field exactly as I expect, and searches produce the hits I
> > >> expect to see. I really don't want to save the full "Content" data in
> > >> the Lucene index, though. I'm baffled why Field.Store.NO results in
> > >> nothing being written to the index even with Field.Index.ANALYZED.
> > I finally had time to go back and look at this problem. I discovered that
> the
> > analyzed fields work fine for searching until I use
> > IndexWriter.updateDocument().
> >
> > The way my application runs, it has to update documents several times to
> > update one specific field. The update code queries out Document objects
> using
> > a unique identifier, and updates the field. The problem is in Document
> objects
> > returned by the query. The querying code runs a search, and eventually
> calls
> > IndexSearcher.doc(int). According to the API documentation, that method
> only
> > returns Document objects with stored fields from the underlying index.
> >
> > I tried calling IndexSearcher.doc(int i, FieldSelector fieldSelector)
> with
> > fieldSelector set to null: the documentation states that this returns
> Document
> > objects with all fields, but that also only seems to return stored
> fields.
> >
> > So my question becomes: how can I update a document which contains non-
> > stored analyzed fields without clobbering the analyzed-only fields?
> > Note that I do not need to update the analyzed-only fields. I have found
> nothing
> > helpful in the documentation.
> You cannot retrieve non-stored fields. They are analyzed and tokenized
> during indexing and this is a one-way transformation. If you update
> documents you have to reindex the contents. If you do not have access to
> the
> original contents anymore, you may consider adding a stored-only "raw
> document" field, that contains everything to rebuild the indexed fields. In
> our installation, we have a stored field containing the JSON/XML source
> document to do this.
Adding to Uwe's comment, you may be operating under a false
assumption. Lucene has no capability to update fields in a document.
Period. This is one of the most frequently requested changes, but
the nature of an inverted index makes this...er...tricky. Updates
are really a document delete followed by a document add. And as
a bonus, the new document won't even have the same internal
Lucene doc id as the one it replaces.

So if you're reading a document from the index, non-stored fields
are not part of the new update and your results will be...uhmmmm....
not what you expect...

Also this is a good link for mistakes when using lucene.

This is a good article for lucene coding reference.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s