regex dash – inside bracket []

We have a requirement to remove some illegal character in file name like slash etc… So I put on a regex in our code:

[^a-zA-Z0-9.-_]

I assume the above regex will match everything that is not number/char/dot/dash/underscore.

Turns out I was wrong! The - is special case inside [...] that is used for range. It should be in the beginning or in the last or escaped. Otherwise it will match all the character that is in between . and _ in ASCII character set. So in my case, the .-_ part will try to match characters from 46(.)-95(_) in ASCII char table.

The correct one should be putting it in the last  [^a-zA-Z0-9._-] . Or just escape it: [^a-zA-Z0-9.\-_]

Advertisements

print all data in paginated table/grid

direct Tabular data display

Recently our project has a page need to show tabular data from 30-6000 rows with 3-4 columns. At first, I thought this is pretty reasonable data to show in one page so I just throw the data into a ng-repeat table with my own implementation of filtering/sorting which is pretty straightforward in angular. Every time user select new category/type, fetch data from backend and replace the data in vm/$scope. Also with this implementation, it is quite easy to fulfill our another requirement which is export/print the page content. For export I just need to get the DOM content to the server and return as downloadable. For print, even easier, just call window.print() ,that’s it.

Performance issue with IE

Everything works fine until  our QA hits IE which is super slow when the data in the list is replaced from backend. Did some profiling in IE11, turns out the appendChild and removeChild calls are taking forever when it tries to clear the rows in the dom and put the new elements into dom. Also another slowness is from styleCalculation which it does for every column/row. Overall, IE takes 20s to render a page with 5000 rows and FF/safari/chrome need only 1-2 seconds. This forces us to abandon the straightforward way to use the more IE friendly way which is pagination with angular ui-grid. But this brings us to another problem which is print since data is now paginated and DOM only has 20 rows.

Server side render and client side print

What I eventually did is sending the model data back to server and do server side rendering and eventually send back to browser where an iFrame is created on the fly for printing. The pros of doing this is we have a lot of flexibility on content/layout by whatever manipulation/styling etc… The cons is we added more stuff to the stack and one more round trip comparing to the direct print.

server side

So on server side, when we get the REST call for print, we have a Thymeleaf template there for generating the html. I compared different java server side rendering engines like Velocity/Freemaker/Rythm etc, looks like Thymeleaf has the best Spring integration and most active development/release.

@Configuration
public class ThymeleafConfig
{
    @Autowired
    private Environment env;

    @Bean
    @Description("Thymeleaf template rendering HTML ")
    public ClassLoaderTemplateResolver exportTemplateResolver() {
        ClassLoaderTemplateResolver exportTemplateResolver = new ClassLoaderTemplateResolver();
        exportTemplateResolver.setPrefix("thymeleaf/");
        exportTemplateResolver.setSuffix(".html");
        exportTemplateResolver.setTemplateMode("HTML5");
        exportTemplateResolver.setCharacterEncoding(CharEncoding.UTF_8);
        exportTemplateResolver.setOrder(1);
        //for local development, we do not want template being cached so that we could do hot reload.
        if ("local".equals(env.getProperty("APP_ENV")))
        {
            exportTemplateResolver.setCacheable(false);
        }
        return exportTemplateResolver;
    }

    @Bean
    public SpringTemplateEngine templateEngine() {
        final SpringTemplateEngine engine = new SpringTemplateEngine();
        final Set<ITemplateResolver> templateResolvers = new HashSet<>();
        templateResolvers.add(exportTemplateResolver());
        engine.setTemplateResolvers(templateResolvers);
        return engine;
    }
}

With the engine we confined, we could used like:

            Context context = new Context();
            context.setVariable("firms", firms);
            context.setVariable("period", period);
            context.setVariable("rptName", rptName);
            context.setVariable("hasFirmId", hasFirmId);
            if (hasFirmId)
            {
                context.setVariable("firmIdType", FirmIdType.getFirmIdType(maybeFirmId).get());
            }

            return templateEngine.process("sroPrint", context);

Template with name sroPrint has some basic Theymleaf directives:

<html xmlns:th="http://www.thymeleaf.org">
<head>
<style>
    table thead tr th, table tbody tr td {
      border: 1px solid black;
      text-align: center;
    }
  </style>

</head>
<body>
<div>
<h4 th:text="${rptName}">report name</h4>
<div style="margin: 10px 0;"><b>Period:</b> <span th:text="${period}"></span>
<div>
  <h4 th:text="${rptName}">report name</h4>
  <div style="margin: 10px 0;"><b>Period:</b> <span th:text="${period}"></span></div>
  <table style="width: 100%; ">
    <thead>
    <tr>
      <th th:if="${hasFirmId}" th:text="${firmIdType}"></th>
      <th>crd #</th>
      <th>Firm Name</th>
    </tr>
    </thead>
    <tbody>
    <tr th:each="firm : ${firms}">
      <td th:if="${hasFirmId}" th:text="${firm.firmId}"></td>
      <td th:text="${firm.crdId}">CRD</td>
      <td th:text="${firm.firmName}">firm name</td>
    </tr>
    </tbody>
  </table>
</div>
</body>
</html>

client side

Now on the client side we need to consume the HTML string from the client side. The flow is we create an iFrame, write the html into it and call browser print on that iFrame and remove the element from DOM. The below implementation is inside the success callback of $http call for getting that dom string. It is in pure js without jQuery, with which it might be a bit more concise.


var printIFrame = document.createElement('iframe');
document.body.appendChild(printIFrame);
printIFrame.style.position = 'absolute';
printIFrame.style.top = '-9999px';
printIFrame.style.left = '-9999px';
var frameWindow = printIFrame.contentWindow || printIFrame.contentDocument || printIFrame;
var wdoc = frameWindow.document || frameWindow.contentDocument || frameWindow;
wdoc.write(res.data);
// tell browser write finished
wdoc.close();
$scope.$emit('UNLOAD');
// Fix for IE : Allow it to render the iframe
frameWindow.focus();
try {
    // Fix for IE11 - printng the whole page instead of the iframe content
    if (!frameWindow.document.execCommand('print', false, null)) {
        // document.execCommand returns false if it failed -http://stackoverflow.com/a/21336448/937891
        frameWindow.print();
    }
    // focus body as it is losing focus in iPad and content not getting printed
    document.body.focus();
}
catch (e) {
    frameWindow.print();
}
frameWindow.close();
setTimeout(function() {
    printIFrame.parentElement.removeChild(printIFrame);
}, 0);

PDF/XLS Export

For xls/pdf export, it is similar to the other POST that I have before. The only difference is the dom string was passed from client there. Here we generate the dom string in server side.

Understand optional true in maven dependency

Today one colleague  from the other team was trying to mimic our behavior doing custom authentication on Hive-Server2. He asked me why he could not get the HiveConf.get(key) working. It basically gets the key we defined in hive-site.xml. It is convenient because if we put key/value there, we do not have to worry about path issue in cluster, just do HiveConf c = new HiveConf() and call get(key) . (side note: this way seems to be not recommended officially since now they have a bunch of enum value to restrict what you can define there) After looking at the source code. get() is actually a method in the parent Configuration class.

import org.apache.hadoop.conf.Configuration;
public class HiveConf extends Configuration

On my pom, I explicitly included the

       <dependency>
         <groupId>org.apache.hadoop</groupId>
         <artifactId>hadoop-core</artifactId>
         <version>${hadoop.core.version}</version>
      </dependency>

So i can look at it from IDE directly. However there is no such dependency in his pom, I cannot find the artifact from the dependency tree either. This is really confusing to me.

It turns out the org.apache.hive -> hive-service has dependency on some hive-shims artifacts which has hadoop-core dependency specified as optional=true.

   <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>${hadoop-20.version}</version>
<optional>true</optional>
    </dependency>

so what is optional true? How come his jar can compile even without the hadoop-core dependency specified? The below pictures are from this POST.

Meaning of <optional>

In short, if project D depend on project C, Project C optionally depend on project A, then project D do NOT depend on project A.

image

Since project C has 2 classes use some classes from project A and project B. Project C can not get compiled without dependencies on A and B. But these two classes are only optional features, which may not be used at all in project D, which depend on project C. So to make the final war/ejb package don’t contain unnecessary dependencies, use to indicate the dependency is optional, be default will not be inherited by others.

What happens if project D really used OptionaFeatureOne in project C? Then in project D‘s pom, project A need to be explicitly declared in the dependencies section.

image

If optional feature one is used in project D, then project D‘s pom need to declare dependency on project A to pass compile. Also, the final war package of project D doesn’t contain any class from project B, since feature 2 is now used.

Our case

In our scenario, the hive-service depends on hive-exec which depends on hive-shims which has optional dependency on hadoop-core. So when the Configuration.get() is not used, his project could still compile even though HiveConf extends the Configuration class. Now if the get() method is to be used, then he need to explicitly declare the dependency on hadoop-core.