Understanding <optional>true</optional> in Maven dependencies

Today a colleague from another team was trying to mimic our approach to custom authentication on HiveServer2. He asked me why he could not get HiveConf.get(key) working. That call simply returns the value of a key we defined in hive-site.xml. It is convenient because if we put the key/value pair there, we do not have to worry about path issues on the cluster: just do HiveConf c = new HiveConf() and call c.get(key). (Side note: this approach does not seem to be officially recommended, since HiveConf now has a set of enum values that restrict what you can define there.) After looking at the source code, I found that get() is actually a method of the parent Configuration class.

import org.apache.hadoop.conf.Configuration;
public class HiveConf extends Configuration

In my pom, I had explicitly included the hadoop-core dependency:

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>${hadoop.core.version}</version>
    </dependency>

So I can look at it from the IDE directly. However, there is no such dependency in his pom, and I could not find the artifact in his dependency tree either. This was really confusing to me.

It turns out that org.apache.hive:hive-service depends on a hive-shims artifact, which declares its hadoop-core dependency as optional:

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>${hadoop-20.version}</version>
      <optional>true</optional>
    </dependency>

So what does optional=true mean? And how can his jar compile even without the hadoop-core dependency specified? The pictures below are from this post.

Meaning of <optional>

In short, if project D depends on project C, and project C optionally depends on project A, then project D does NOT depend on project A.

image

Project C has two classes that use classes from project A and project B, so project C cannot be compiled without dependencies on A and B. But these two classes are only optional features, which may not be used at all in project D, which depends on project C. So, to keep unnecessary dependencies out of the final war/ejb package, use <optional> to mark the dependency as optional; by default it will not be inherited by other projects.
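As a rough sketch, project C's pom might declare A and B like this. The com.example coordinates and the 1.0 versions are made-up placeholders for illustration, not taken from the original post:

```xml
<!-- Project C's pom.xml (fragment). All coordinates below are hypothetical. -->
<dependencies>
  <!-- Needed to compile C's optional feature one.
       Marked optional, so it is not passed on to projects that depend on C. -->
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>project-a</artifactId>
    <version>1.0</version>
    <optional>true</optional>
  </dependency>
  <!-- Needed to compile C's optional feature two; also not passed on. -->
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>project-b</artifactId>
    <version>1.0</version>
    <optional>true</optional>
  </dependency>
</dependencies>
```

With this in place, C itself still compiles against A and B, but a project that only declares a dependency on C gets neither of them transitively.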

What happens if project D really uses OptionalFeatureOne from project C? Then project A needs to be explicitly declared in the dependencies section of project D’s pom.

image

If optional feature one is used in project D, then project D’s pom needs to declare a dependency on project A in order to compile. Also, the final war package of project D doesn’t contain any classes from project B, since feature two is not used.
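From project D's side, that could look roughly like the fragment below. Again, the com.example coordinates and versions are hypothetical placeholders:

```xml
<!-- Project D's pom.xml (fragment). All coordinates below are hypothetical. -->
<dependencies>
  <!-- D depends on C as usual. -->
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>project-c</artifactId>
    <version>1.0</version>
  </dependency>
  <!-- D uses C's optional feature one, so it declares A itself,
       because C's optional dependency on A is not inherited. -->
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>project-a</artifactId>
    <version>1.0</version>
  </dependency>
  <!-- No entry for project-b: feature two is not used, so B stays out of the package. -->
</dependencies>
```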

Our case

In our scenario, hive-service depends on hive-exec, which depends on hive-shims, which has an optional dependency on hadoop-core. So as long as his code did not use Configuration.get(), his project could still compile even though HiveConf extends the Configuration class. Once the get() method is used, the compiler needs the Configuration class itself, so he needs to explicitly declare the dependency on hadoop-core.
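A minimal sketch of what his pom could end up with is shown below. The ${hive.version} and ${hadoop.core.version} properties are assumptions: they would have to be defined in his own pom and should match the Hive and Hadoop versions running on the cluster.

```xml
<!-- His pom.xml (fragment). The version properties are assumed to be defined elsewhere in the pom. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-service</artifactId>
    <version>${hive.version}</version>
  </dependency>
  <!-- Declared explicitly because hive-shims only lists hadoop-core as optional,
       so it never reaches this project transitively. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>${hadoop.core.version}</version>
  </dependency>
</dependencies>
```

If the cluster already provides the Hadoop jars at runtime, giving hadoop-core the provided scope may be worth considering so it is not bundled into the final artifact.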