XPath Namespaces

XPath is a way to navigate through an XML document, modeled on filesystem paths. An XPath expression consists of a series of “node tests” that describe how to traverse from one element to another within a document. For example, given the following XML, the path “/foo/bar/baz” selects the single “baz” node:

<foo>
    <bar name='argle'>
        Argle content
    </bar>
    <bar name='bargle'>
        Bargle content
        <baz>Baz content</baz>
    </bar>
</foo>

Like a filesystem path, XPaths can be relative to a given node: the path “bar/baz” could be evaluated from node “foo” to produce the same result as the absolute path above. Also like a filesystem path, “.” refers to the current node, and “..” refers to the parent of the current node. This means that you could start from the first “bar” node, and use the relative path “../bar/baz” to again select the “baz” node — passing through the second “bar” node on the way.
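The three paths described above can be tried out with a sketch like the following. The class name RelativePathSketch and its helper methods are made up for this illustration, and the sample document is inlined without indentation to keep the text values short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RelativePathSketch {
    // The sample document from above, written on one logical line.
    static final String XML =
        "<foo>"
        + "<bar name='argle'>Argle content</bar>"
        + "<bar name='bargle'>Bargle content<baz>Baz content</baz></bar>"
        + "</foo>";

    // Parses an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates a path against a context node and returns its string value.
    static String eval(String path, Object context) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(path, context);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document dom = parse(XML);
        Node foo = dom.getDocumentElement();
        Node firstBar = dom.getElementsByTagName("bar").item(0);

        System.out.println(eval("/foo/bar/baz", dom));     // absolute path
        System.out.println(eval("bar/baz", foo));          // relative to "foo"
        System.out.println(eval("../bar/baz", firstBar));  // up to "foo", then down
    }
}
```

All three calls print "Baz content": only the starting point differs.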

Unlike filesystem paths, however, an XPath can specify a predicate: a logical test that’s applied to the set of nodes selected at that point in the path. For example, the XML fragment above has two nodes named “bar”. If you want to ensure that you select only the first, you could use a positional predicate: “/foo/bar[1]”. Or, you could use a predicate that selects nodes with a specific value for the attribute name: “/foo/bar[@name='argle']”. You can also combine predicates: “/foo/bar[2][@name='argle']” is legal, although it won’t select any nodes from the sample document.
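Here is a rough sketch of those predicates in action. PredicateSketch and its eval() helper are names invented for this example, and the sample document is again inlined without indentation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PredicateSketch {
    static final String XML =
        "<foo>"
        + "<bar name='argle'>Argle content</bar>"
        + "<bar name='bargle'>Bargle content<baz>Baz content</baz></bar>"
        + "</foo>";

    // Parses the sample document and returns the string value of the path.
    static String eval(String path) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(path, dom);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Positional predicate: the first "bar", then its name attribute.
        System.out.println(eval("/foo/bar[1]/@name"));             // argle

        // Attribute predicate: only the "bar" whose name is 'bargle'.
        System.out.println(eval("/foo/bar[@name='bargle']/baz"));  // Baz content

        // Combined predicates: the second "bar" is not named 'argle',
        // so nothing is selected and the string value is empty.
        System.out.println(eval("/foo/bar[2][@name='argle']").isEmpty());  // true
    }
}
```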

The XPath specification will tell you all of the variants of an XPath expression. This article is about how to evaluate XPaths in Java. So here’s the code to execute the first example above:

Document dom = // however you get the document
XPath xpath = XPathFactory.newInstance().newXPath();
String result = xpath.evaluate("/foo/bar/baz", dom);

Pretty simple, eh? Like everything in the javax.xml hierarchy, it uses a factory: XPath is an interface, and the actual implementation class will depend on your JRE (the Sun 1.5 JDK bundles an implementation derived from Apache Xalan). Also, like the other packages in javax.xml, as you start to use more of the features of XPath the complexity (and code required) increases.

Result Types

The first piece of complexity is what, exactly, you get when evaluating an XPath expression. The simple answer is that it returns the list of nodes that match the path and for which all predicates evaluate true, in document order. Yet the example above returns a string value. And what happens with a path like “/foo/bar”, which selects two nodes?

The answer is that the XPath specification requires any result be convertible into a single string value. The value of a single node is the text contained within that node and its descendants, and the string value of a list of nodes is the string value of the first node in that list. In many cases, the string value of an expression is all that you need, which is why the basic evaluate() method exists.
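To make the string conversion rules concrete, here is a small sketch. StringValueSketch and eval() are illustrative names, and the sample document is inlined without indentation so that the concatenated text is easy to read.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class StringValueSketch {
    static final String XML =
        "<foo>"
        + "<bar name='argle'>Argle content</bar>"
        + "<bar name='bargle'>Bargle content<baz>Baz content</baz></bar>"
        + "</foo>";

    // Parses the sample document and returns the string value of the path.
    static String eval(String path) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(path, dom);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two nodes are selected; only the first contributes to the string.
        System.out.println(eval("/foo/bar"));     // Argle content

        // One node: its own text plus the text of its descendants.
        System.out.println(eval("/foo/bar[2]"));  // Bargle contentBaz content
    }
}
```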

If you need more control over your results, such as getting the actual nodes selected, there’s a form of evaluate() that lets you specify the result type; the valid types are defined as constants in XPathConstants. For example, to return the selected nodes:

NodeList nodes = (NodeList)xpath.evaluate("/foo/bar", dom, XPathConstants.NODESET);

Note that the result type name is NODESET but the actual return type is NodeList. Chalk this up to different spec editors and reuse on the part of the Xerces team: NodeList is from the DOM Level 1 spec, and lives in the org.w3c.dom package.
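NODESET is not the only constant in XPathConstants. The sketch below runs the sample document through four of the result types and casts each result to the Java type it arrives as. ResultTypeSketch and eval() are names made up for this example.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ResultTypeSketch {
    static final String XML =
        "<foo>"
        + "<bar name='argle'>Argle content</bar>"
        + "<bar name='bargle'>Bargle content<baz>Baz content</baz></bar>"
        + "</foo>";

    // Parses the sample document and evaluates the path as the given type.
    static Object eval(String path, QName resultType) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(path, dom, resultType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // NODESET: every matching node, as a DOM NodeList.
        NodeList bars = (NodeList) eval("/foo/bar", XPathConstants.NODESET);
        System.out.println(bars.getLength());   // 2

        // NODE: only the first matching node.
        Node baz = (Node) eval("/foo/bar/baz", XPathConstants.NODE);
        System.out.println(baz.getNodeName());  // baz

        // NUMBER: numeric results arrive as a Double.
        Double count = (Double) eval("count(/foo/bar)", XPathConstants.NUMBER);
        System.out.println(count);              // 2.0

        // BOOLEAN: true when the path selects at least one node.
        Boolean hasBaz = (Boolean) eval("/foo/bar/baz", XPathConstants.BOOLEAN);
        System.out.println(hasBaz);             // true
    }
}
```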

The term “nodeset” is confusing in several ways. First, it is most definitely not a java.util.Set: XPath always returns nodes in document order.

Nor is it a java.util.List. Being defined by the DOM spec means that NodeList provides the methods getLength() and item() rather than anything from the Java collections framework. The Practical XML library provides several work-arounds ranging from a simple asList() method to NodeListIterable, a wrapper that allows NodeList objects to be used directly with JDK 1.5 for-each loops.
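If you would rather not pull in a library, copying the nodes into a java.util.List takes only a few lines. The sketch below is a hand-rolled stand-in, not the Practical XML implementation; NodeListSketch, asList() and barNames() are names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListSketch {
    static final String XML =
        "<foo>"
        + "<bar name='argle'>Argle content</bar>"
        + "<bar name='bargle'>Bargle content<baz>Baz content</baz></bar>"
        + "</foo>";

    // Copies a DOM NodeList into a List, preserving document order.
    static List<Node> asList(NodeList nodes) {
        List<Node> result = new ArrayList<>(nodes.getLength());
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i));
        }
        return result;
    }

    // Selects every "bar" element and collects its name attribute.
    static List<String> barNames() {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/foo/bar", dom, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (Node bar : asList(nodes)) {  // for-each now works
                names.add(((Element) bar).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(barNames());  // [argle, bargle]
    }
}
```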

Namespaces

Namespaces are perhaps the biggest source of pain in working with XPath. To see why, consider the following XML. It looks a lot like the example at the top of this article, with the addition of a default namespace. You might think that the example XPath expressions would work unchanged. You’d be wrong.

<foo xmlns='http://www.example.com'>
    <bar name='argle'>
        Argle content
    </bar>
    <bar name='bargle'>
        Bargle content
        <baz>Baz content</baz>
    </bar>
</foo>

The reason: XPath is fully namespace aware, and node tests match both localname and namespace. If you don’t specify a namespace in the path, the evaluator assumes that the node doesn’t have a namespace. Making life difficult, there’s no way to explicitly specify a namespace as part of the node test; you must instead use a “qualified name” (prefix:localname), and provide an external mapping from prefix to namespace.

So, to select node baz, we first have to add a prefix to each node in the expression: “/ns:foo/ns:bar/ns:baz”. And then, we have to give the evaluator a NamespaceContext object to tell it the namespace bound to that prefix:

        XPathFactory factory = XPathFactory.newInstance();

        XPath xpath = factory.newXPath();
        xpath.setNamespaceContext(new NamespaceContext()
        {
            @Override
            public String getNamespaceURI(String prefix)
            {
                // This returns the same URI for every prefix. With multiple
                // namespaces, branch on the prefix here, or delegate the
                // lookup to a map of prefix to URI.
                return "http://www.example.com";
            }

            @Override
            public String getPrefix(String namespaceURI)
            {
                return null;
            }

            @Override
            public Iterator getPrefixes(String namespaceURI)
            {
                return null;
            }
        });
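The map-based approach mentioned in the comments might look something like the following. This is only a sketch: MapNamespaceContext and its bind() method are names invented here, and it skips the handling of the reserved "xml" and "xmlns" prefixes that the NamespaceContext documentation describes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<>();

    // Registers a prefix; returns this so that calls can be chained.
    MapNamespaceContext bind(String prefix, String namespaceURI) {
        prefixToUri.put(prefix, namespaceURI);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        // Unbound prefixes map to the empty "no namespace" URI.
        return (uri != null) ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                prefixes.add(entry.getKey());
            }
        }
        return prefixes.iterator();
    }
}
```

With this in place, the setup call becomes xpath.setNamespaceContext(new MapNamespaceContext().bind("ns", "http://www.example.com")), and additional namespaces are just additional bind() calls.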

One important point: when you create the DocumentBuilderFactory used to parse the document, remember to make it namespace aware. It is not namespace aware by default.


		DocumentBuilderFactory builderFactory = DocumentBuilderFactory
				.newInstance();
		builderFactory.setNamespaceAware(true);
		DocumentBuilder builder = builderFactory.newDocumentBuilder();
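Putting the pieces together, a minimal end-to-end sketch could look like this. NamespaceAwareSketch and eval() are illustrative names, the namespaced sample document is inlined without indentation, and the anonymous NamespaceContext only knows the single prefix "ns".

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceAwareSketch {
    static final String NS = "http://www.example.com";
    static final String XML =
        "<foo xmlns='http://www.example.com'>"
        + "<bar name='argle'>Argle content</bar>"
        + "<bar name='bargle'>Bargle content<baz>Baz content</baz></bar>"
        + "</foo>";

    // Parses the sample with a namespace-aware parser and evaluates the path.
    static String eval(String path) {
        try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            Document dom = builderFactory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "ns".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    return null;  // not needed for this sketch
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return null;  // not needed for this sketch
                }
            });
            return xpath.evaluate(path, dom);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prefixed path: every step names the document's namespace.
        System.out.println(eval("/ns:foo/ns:bar/ns:baz"));   // Baz content

        // Unprefixed path: looks for nodes in no namespace, finds none.
        System.out.println(eval("/foo/bar/baz").isEmpty());  // true
    }
}
```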

One last thing to remember about namespaces and qualified names: the prefix doesn’t matter. It’s just a way to find the actual namespace URI in a lookup table. The prefixes in your XPath expression don’t have to match those in the source document. As shown above, the XPath expression has to use a prefix even when the source XML doesn’t. Similarly, the source XML could have a node named “foo:bar”, and the XPath expression could use “baz:bar”. As long as both prefixes resolve to the same URI, the XPath will return the correct element.
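A quick way to convince yourself of this is the sketch below: the document uses the prefix "foo", the expression uses "baz", and the result depends only on the URI that "baz" is bound to. PrefixSketch, eval() and the second URI are all made up for this example.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrefixSketch {
    // The source document binds the prefix "foo" to the namespace.
    static final String XML =
        "<foo:bar xmlns:foo='http://www.example.com'>Bar content</foo:bar>";

    // Evaluates "/baz:bar" with the prefix "baz" bound to the given URI.
    static String eval(final String uriForBaz) {
        try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            Document dom = builderFactory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "baz".equals(prefix) ? uriForBaz : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String namespaceURI) {
                    return null;  // not needed for this sketch
                }

                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return null;  // not needed for this sketch
                }
            });
            return xpath.evaluate("/baz:bar", dom);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Different prefix, same URI: the element is found.
        System.out.println(eval("http://www.example.com"));  // Bar content

        // Same prefix, different URI: nothing matches.
        System.out.println(eval("http://www.example.org/other").isEmpty());  // true
    }
}
```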

 

Here is a very good example. Download its source code for details.
