site diff tool with puppeteer

Background

 

We recently had a requirement to build a site diff tool so that we can compare our pages in qa and prod to make sure the changes going in are as expected.

Existing Stack

There used to be a version that worked, but it was a bit complicated. In the existing stack, the user provides a list of url paths and 2 hosts as input to an entry lambda function, which spawns a bunch of additional lambda functions (via SQS), one per path. Each of those connects to Sauce Labs, accesses the 2 hosts + path, takes a screenshot with selenium, uploads it to an s3 bucket, and finally writes a new sqs msg to another queue. That queue spawns more lambda functions to do the image diff, upload the diff image to s3, and write more sqs msgs, which eventually trigger a reduce lambda function that generates the summary json data for the UI.

[diagram: diff-tool-workflow]

To be honest, that is a lot of moving parts to reason about, troubleshoot, and maintain. Logs are scattered all around in CloudWatch since so many functions are created. So I decided to make a slightly easier-to-use version.

Basic flow of new stack

  1. get all the paths from the CMS via its API, so the user does not need to collect them manually and copy-paste them in multiple different places, which is error-prone
  2. use puppeteer in headless mode to access the site and manipulate the dom, then take screenshots (see the sketch after this list)
  3. use JIMP to create a diff image based on a configured threshold, then create the summary json locally
  4. upload the diff images and summary to the s3 bucket which populates the UI.
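A minimal sketch of steps 2 and 3, assuming recent puppeteer and jimp releases; the banner selector, file paths, and both threshold values are hypothetical:

const puppeteer = require('puppeteer');
const Jimp = require('jimp');

// step 2: screenshot one host + path with the noisy banner removed
async function snap(url, outPath) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  // drop the prod-only feedback banner before the shot (selector is hypothetical)
  await page.evaluate(() => {
    const banner = document.querySelector('.feedback-banner');
    if (banner) banner.remove();
  });
  await page.screenshot({ path: outPath, fullPage: true });
  await browser.close();
}

// step 3: diff the qa/prod screenshots, keeping an image only when they differ enough
async function diffShots(pathA, pathB, outPath) {
  const [a, b] = await Promise.all([Jimp.read(pathA), Jimp.read(pathB)]);
  const result = Jimp.diff(a, b, 0.1); // per-pixel color distance threshold
  if (result.percent > 0.01) {         // configured "similar enough" threshold
    await result.image.writeAsync(outPath);
  }
  return result.percent;
}

Running snap() against both hosts for each path and feeding the pairs to diffShots() yields the per-path percentages that go into the summary json.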

Result

With the above flow,

  • No complex stack on aws; everything is done in one place, so debugging/logging is really easy.
  • A threshold is introduced so we do not need to create diff images for paths that are similar enough or identical.
  • Previously there were always some diffs between prod and qa due to an extra prod-only feedback banner, which created big noise for screenshot comparison as it would sometimes cause pixel shift. Now we can easily remove it via puppeteer's DOM API before taking the screenshot, so the result is much more accurate and concise.
  • Saves a lot of space on s3, as we only need to store the paths that are above the threshold. And since we are doing everything in one shot, we also do not have to upload the original screenshots to s3, which was required previously due to the multi-stage process.
  • If needed we can integrate with Jenkins and run from there, ideally with a slave that has puppeteer installed.

unsubscribe in rxjs(angular 2+)

background

In the reactive world (rxjs/ng2+), it is common and convenient to just create some subject/observable and subscribe to it for event handling etc. It is like the GoF observer pattern out of the box.

issue

One caveat we hit recently: we call subscribe() on some subjects from our service in our ngOnInit or ngAfterViewInit functions but forget to unsubscribe in the component. The consequence is that each time the component is recreated during a route change, one more subscription is added to the subject. This is pretty bad if we are doing something heavy in the callback, or even worse, making some http call.

solution 1 – unsubscribe in ngOnDestroy

One solution is to keep a reference to the subscription returned by the subscribe() function and then call its unsubscribe() function in angular's ngOnDestroy() lifecycle hook, as sketched below. It works and is fine if there are only a few of them; if there are many, and this needs to be done in each related component, it gets quite tedious.
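A minimal sketch, reusing the (hypothetical) service and handler names from the takeUntil example further down:

import { OnDestroy, OnInit } from '@angular/core';
import { Subscription } from 'rxjs/Subscription';

export class MyOwnComponent implements OnInit, OnDestroy {
  private allEventsSub: Subscription;

  constructor(private _eventsContainerService: any) {}

  ngOnInit() {
    // keep a reference to the subscription...
    this.allEventsSub = this._eventsContainerService.allEventsInfo
      .subscribe(result => result && this.handleAllEventsLoad(result));
  }

  ngOnDestroy() {
    // ...and release it when the component is destroyed
    if (this.allEventsSub) {
      this.allEventsSub.unsubscribe();
    }
  }

  private handleAllEventsLoad(result: any) { /* ... */ }
}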

Solution 2 – custom decorator calling ngOnDestroy

Another solution is to write a custom decorator which provides the logic for ngOnDestroy. The component itself still needs to keep a list of its subscriptions.
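A minimal sketch of such a decorator, assuming the component keeps its subscriptions in an array property (all names here are hypothetical):

// patches the class's ngOnDestroy to unsubscribe everything in `this[subsProp]`
export function AutoUnsubscribe(subsProp: string = 'subscriptions') {
  return function (constructor: any) {
    const original = constructor.prototype.ngOnDestroy;
    constructor.prototype.ngOnDestroy = function () {
      (this[subsProp] || []).forEach((sub: any) => {
        if (sub && typeof sub.unsubscribe === 'function') {
          sub.unsubscribe();
        }
      });
      // still run the component's own ngOnDestroy if it has one
      if (original) {
        original.apply(this);
      }
    };
  };
}

Then the component is annotated with @AutoUnsubscribe() and pushes every subscription into this.subscriptions.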

Solution 3 – use takeUntil operator

This way is to use a subject (one per component instance) to tell all of its subscriptions to stop taking values once it emits a value. It is more declarative IMHO.

import { OnDestroy } from '@angular/core';
import { Subject } from 'rxjs/Subject';

/**
 * extend this class if component has subscription need to be unsubscribed on destroy.
 *
 * example: myObservable.takeUntil(this.destroyed$).subscribe(...);
 */
export abstract class UnsubscribableComponent implements OnDestroy {
  // the subject used to notify end subscription(usually with `takeUntil` operator).
  protected destroyed$: Subject<boolean> = new Subject<boolean>();

  protected constructor() {}

  ngOnDestroy(): void {
    this.destroyed$.next(true);
    this.destroyed$.complete();
  }
}

So in the component it can be something like:

export class MyOwnComponent extends UnsubscribableComponent implements OnInit {
  ngOnInit() {
    this._eventsContainerService.allEventsInfo
      .takeUntil(this.destroyed$)
      .subscribe(result => {
        if (result) {
          this.handleAllEventsLoad(result);
        }
      });
  }
}
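Side note: the snippets above use the rxjs 5 prototype-patched operator style (myObservable.takeUntil(...)). On rxjs 5.5+ with pipeable operators, the same idea (with the same hypothetical service) looks like:

import { takeUntil } from 'rxjs/operators';

this._eventsContainerService.allEventsInfo
  .pipe(takeUntil(this.destroyed$))
  .subscribe(result => {
    if (result) {
      this.handleAllEventsLoad(result);
    }
  });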

 

sessionStorage/localStorage scope

Firstly, localStorage and sessionStorage are 2 objects on the window object. They are tied to the origin of the current window.

As a result they are bound to:

  1. protocol, http/https are different
  2. domain
    1. subdomain can share with parent by manually setting document.domain.
    2. xxx.capitalone.com cannot share with yyy.capitalone.com
  3. port

The same applies to 302 redirects: session/local storage values set on a page are not available on the post-redirect page if the two pages have different origins, even when they are in the SAME tab/window.
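To illustrate with hypothetical pages:

// on https://xxx.capitalone.com
localStorage.setItem('token', 'abc');
localStorage.getItem('token');  // 'abc'

// on https://yyy.capitalone.com -- different subdomain, different origin
localStorage.getItem('token');  // null

// on http://xxx.capitalone.com -- same host but http, still a different origin
localStorage.getItem('token');  // null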

It can also be understood as per-application storage, as the values can be viewed in the dev-tools Application tab.

 

WHATWG spec

MDN link

webpack custom plugin

Recently we worked on a platform which needs webpack to build some ng2/4 assets, plus some custom steps to pull data from a headless cms (via gulp) and eventually render components. One problem is that we cannot do live reload/recompile: every time we make a change, we have to run the npm command again to compile the resources.

To solve the issue, I decided to write a custom webpack plugin to make browser-sync and webpack work together.

The basic flow is:

  1. run webpack in watch mode, so that every time a resource (ts/css/html) changes, webpack auto re-compiles
  2. serve the resources via browser-sync; here browserSync just serves as a mini express server and provides browser reload capability
  3. use a webpack plugin to start the browser-sync server and register the reload event when webpack compilation is done

Plugin

The webpack api is pretty straightforward: it passes a compiler object to the plugin's apply function. The compiler represents the fully configured webpack environment. This object is built once upon starting webpack, and is configured with all operational settings including options, loaders, and plugins. When applying a plugin to the webpack environment, the plugin receives a reference to this compiler; use it to access the main webpack environment.

const browserSync = require('browser-sync');

function WebpackAndromedaPlugin(options) {
  console.log(`WebpackAndromedaPlugin options: ${JSON.stringify(options, null, 2)}`);

  // start browser-sync once, when the plugin is instantiated
  let browserSyncConfig = require('./lib/dev-server').getBrowserSyncConfig(options);
  browserSyncConfig.server.middleware.push(require('./lib/dev-server').makeServeOnDemandMiddleware(options));
  browserSync.init(browserSyncConfig);
}

WebpackAndromedaPlugin.prototype.apply = (compiler) => {

  // fires at the start of every (re)compilation
  compiler.plugin("compile", function (params) {
    console.log('--->>> andromeda started compiling');
  });

  // fires after the compiled assets are written, so the browser can safely reload
  compiler.plugin('after-emit', (_compilation, callback) => {
    console.log('--->>> files prepared');
    browserSync.reload();
    callback();
  });
}

As we can see above, we can register our callbacks with compiler.plugin(), where webpack exposes different stages for us to hook into.

Another important object is compilation, which represents a single build of versioned assets. While running webpack development middleware, a new compilation is created each time a file change is detected, generating a new set of compiled assets. A compilation surfaces information about the present state of module resources, compiled assets, changed files, and watched dependencies. It also provides many callback points at which a plugin may choose to perform custom actions.

For example, all the generated files will be in the compilation.assets object, as the sketch below shows.
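A plugin (same webpack 1-3 style API as the snippet above) could list everything about to be written during the emit phase:

compiler.plugin('emit', (compilation, callback) => {
  // compilation.assets maps output file names to source objects
  Object.keys(compilation.assets).forEach((name) => {
    console.log(name, compilation.assets[name].size(), 'bytes');
  });
  callback();
});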

webpack-dev-middleware

The webpack-dev-middleware is a nice little express middleware that serves the files emitted from webpack over a connect server. One good feature is serving files from memory, since it uses an in-memory file system which exposes some simple methods to read/write/check-existence in its MemoryFileSystem.js. The webpack-dev-middleware also exposes some hooks like close/waitUntilValid etc.; unfortunately the callback that waitUntilValid registers will only be called once, according to the compileDone function here. Anyway, it is still an efficient tool to serve webpack resources and very easy to integrate with the webpack nodejs APIs:

~function () {
  const browserSync = require('browser-sync');
  const { getBrowserSyncConfig, makeServeOnDemandMiddleware } = require('./lib/dev-server');
  const options = require('./config');
  const webpackMiddleware = require("webpack-dev-middleware");

  const webpack = require('webpack');
  let browserSyncConfig = getBrowserSyncConfig(options);
  browserSyncConfig.server.middleware.push(makeServeOnDemandMiddleware(options));
  const compiler = webpack(require('./webpack.dev'));
  // full reload whenever a watch-mode compilation finishes
  compiler.plugin('done', () => browserSync.reload());
  // serves webpack's output from memory under /assets
  let inMemoryServer = webpackMiddleware(compiler, {noInfo: true, publicPath: '/assets'});
  browserSyncConfig.server.middleware.push(inMemoryServer);
  browserSync.init(browserSyncConfig);
}();

 

webpack-dev-server

The webpack-dev-server is basically a wrapper over the above webpack-dev-middleware. It is good for simple resource serving since it does not expose much. I was trying to find a hook into it to intercept the resources it generates/serves but did not find a good solution. If you need more customization, it is better to go with webpack-dev-middleware.
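For completeness, a minimal webpack-dev-server setup of that era might look like this (the config path and options are illustrative):

const webpack = require('webpack');
const WebpackDevServer = require('webpack-dev-server');

const compiler = webpack(require('./webpack.dev'));
const server = new WebpackDevServer(compiler, {
  publicPath: '/assets',  // where the bundles are served from
  noInfo: true,
});
server.listen(8080, 'localhost', () => console.log('dev server at http://localhost:8080'));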

a very detailed webpack intro article

debug hover item in chrome devtools

Chrome devtools is our friend, always.

Today, while developing an angular 4.x app with the primeng library, I had to check the class set on the tooltip component. As we know, the tooltip is hover-event based, so if we hover over it to make it show up and then shift our focus to the devtools Elements tab, the tooltip disappears.

Chrome devtools has a feature to force the hover state (:hover) on a specific element for CSS purposes. It is quite handy, but it obviously does not apply in this use case since this tooltip is js based.

Searching around, I finally found a solution: press F8 (or CMD + \) to pause script execution.

Steps are quite straightforward:

Mouse over the tooltip, and press F8 while it is displayed.

Now you can use the inspector to look at the CSS.

A deeper look at event loop (micro/macro tasks)

One common question

(function test() {
    setTimeout(function() {console.log(4)}, 0);
    new Promise(function executor(resolve) {
        console.log(1);
        for( var i=0 ; i<10000 ; i++ ) {
            i == 9999 && resolve();
        }
        console.log(2);
    }).then(function() {
        console.log(5);
    });
    console.log(3);
})()

So why is the result 1, 2, 3, 5, 4 rather than 1, 2, 3, 4, 5?

If we look at the details, it appears the async behavior of setTimeout is different from that of Promise.then; at least they are not in the same async queue.

The answer is here in the whatwg SPEC.

  • An event loop has one or more task queues. (task queue means macrotask queue)
  • Each event loop has a microtask queue.
  • task queue = macrotask queue != microtask queue
  • a task may be pushed into the macrotask queue or the microtask queue
  • when a task is pushed into a queue (micro/macro), we mean the preparation work is finished, so the task can be executed now.

And the event loop process model is as follows:

When the call stack is empty, do these steps:

  1. select the oldest task (task A) in the task queues
  2. if task A is null (the task queues are empty), jump to step 6
  3. set "currently running task" to "task A"
  4. run "task A" (i.e. run the callback function)
  5. set "currently running task" to null, remove "task A"
  6. perform the microtask queue:
    • (a) select the oldest task (task x) in the microtask queue
    • (b) if task x is null (the microtask queue is empty), jump to step (g)
    • (c) set "currently running task" to "task x"
    • (d) run "task x"
    • (e) set "currently running task" to null, remove "task x"
    • (f) select the next oldest task in the microtask queue, jump to step (b)
    • (g) finish the microtask queue
  7. jump to step 1.

A simplified process model is as follows:

  1. run the oldest task in the macrotask queue, then remove it.
  2. run all available tasks in the microtask queue, then remove them.
  3. next round: run the next task in the macrotask queue (jump to step 2)

Something to remember:

  1. when a task (in the macrotask queue) is running, new events may be registered, so new tasks may be created. Below are two newly created tasks:
    • promiseA.then()'s callback is a task
      • promiseA is resolved/rejected: the task is pushed into the microtask queue in the current round of the event loop.
      • promiseA is pending: the task is pushed into the microtask queue in a future round of the event loop (maybe the next round)
    • setTimeout(callback, n)'s callback is a task, and is pushed into the macrotask queue, even if n is 0
  2. a task in the microtask queue runs in the current round, while a task in the macrotask queue has to wait for the next round of the event loop.
  3. we all know the callbacks of "click", "scroll", "ajax", "setTimeout"... are tasks; however, we should also remember that the js code as a whole in a script tag is a task (a macrotask) too.

 

In the nodejs world: setImmediate() is a macro/task, and process.nextTick() is a micro/job.
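A quick sanity check of that ordering in nodejs:

setImmediate(() => console.log('setImmediate (macrotask)'));
process.nextTick(() => console.log('process.nextTick (micro/job)'));
Promise.resolve().then(() => console.log('promise.then (micro/job)'));
console.log('sync script');

// prints: sync script, process.nextTick, promise.then, setImmediate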

 

There is one good discussion in Chinese and a blog post.

Fighting with browser popup block

Background

Recently in our project, we needed to refactor some old struts actions to rest-based pages. This way we avoid multiple page navigations for our users, so that everything can be done in a single page.

One example is file download. Previously, in the struts-based app, if a page had 12 files, the user had to click the download link on the main page; if the file was available, the user was taken to the download page where the real download link is, then downloaded it. If it was not available, the user was taken to a request page for confirmation and then, once confirmed, to the download page to wait. So to download all the files, the user had to constantly navigate between different pages with a lot of clicks, which is kind of crazy. In the coming single-page application, everything (request/confirm/download) happens in the same page, which is much better.

Issue

However, we hit one issue. When the user clicks the download link, just as in the flow above, we first need to make an ajax call back to the server to check: if the file is not available, a modal shows up to confirm the request; otherwise we get the download id and open a new tab to download the stream. The problem is that at this point the browser (chrome/FF; safari does not) blocks the download tab from opening. We tried both form submit and window open. What is really bad is that in chrome the block notification is really not noticeable, just a tiny icon in the upper-left that the user can barely see.

check status

        // 'that' is the enclosing service instance; modalService drives the confirmation modal
        this.requestDetail = function (requestObj, modalService) {
            that.checkDetailStatus(requestObj).then(
                function success(res) {
                    var status = res.data.status;
                    switch (status) {
                        case 'AVAIL_NOT_REQ':
                            that.createNewRequest(requestObj, modalService);
                            break;
                        case 'NO_DATA':
                            $.bootstrapGrowl('No data available!', {type: 'info'});
                            break;
                        case 'EXISTING_RPT':
                            that.downloadFile(res.data.requestId);
                            break;
                        case 'PENDING':
                            //add user to notify list then redirect
                            that.mapNotifyUser(res.data.requestId).then(
                                function success(res) {
                                    var DETAIL_RUN_INTERVAL = 3;
                                    var minute = DETAIL_RUN_INTERVAL - res.data.minute % DETAIL_RUN_INTERVAL;
                                    $.bootstrapGrowl('Your detail data file will be ready in ' + minute + ' minutes.', {type: 'info'});
                                });
                            break;
                        case 'ERROR':
                            $.bootstrapGrowl('Error Getting Detail data! Contact Admin or Wait for 24 hour to re-request.', {type: 'danger'});
                            break;
                        default:
                            $.bootstrapGrowl('Error Getting Detail data, Contact ADMIN', {type: 'danger'});
                    }
                },
                function error(err) {
                    console.log(err);
                    $.bootstrapGrowl('Network error or Server error!', {type: 'danger'});
                }
            );
        };

with form


        this.downloadFile = function (requestId) {
            //create a form which calls the download REST service on the fly
            var formElement = angular.element("<form></form>");
            formElement.attr("action", "/scrc/rest/download/detail/requestId/" + requestId);
            formElement.attr("method", "get");
            formElement.attr("target", "_blank");
            // the form must be attached to the document before submit() works (e.g. in ie8)
            angular.element('body').append(formElement);
            //call the service
            formElement.submit();
        };

With window

        this.downloadFile = function (requestId) {
            $window.open('/scrc/rest/download/detail/requestId/' + requestId);
        };

Cause

Turns out the issue is: a browser will only open a tab/popup without the popup blocker warning if the command to open the tab/popup comes from a trusted event. That means the user has to actively click somewhere to open a popup.

In this case, the user performs a click, so we have the trusted event. However, we lose that trusted context by performing the ajax request: our success handler does not have that event anymore.

Possible Solutions

  1. open the popup on click and manipulate it later when the callback fires

      var popup = $window.open('', '_blank');
      popup.document.write('loading ...');
      ...
      inCallBack() {
        // existing: point the already-open tab at the download url
        popup.location.href = '/scrc/rest/download/detail/requestId/' + res.data.requestId;
        // other: nothing to download, so close the tab
        popup.close();
      }
    

    this will work but is not elegant: when there is nothing to download, it opens a tab and closes it instantly, which still creates a flash in the browser that the user could notice.

  2. require the user to click some button again to trigger the popup. This works because we can update the link once the file exists; when the user clicks again, we initiate the download, so the popup is triggered directly by the user. But it is still not quite user friendly.

  3. Notify the user to unblock our site.
    This is eventually what we did: we detect on the client side whether the popup was blocked, and if so, we ask the user to unblock our site in the browser settings. The reason we chose this is that the unblock/trust action is really a one-time thing; the browser remembers the choice and will not bother the user again.

            this.downloadFile = function (requestId) {
                var downloadWindow = $window.open('/scrc/rest/download/detail/requestId/' + requestId);
                // if no window object came back, or it was instantly closed, the popup was blocked
                if (!downloadWindow || downloadWindow.closed || typeof downloadWindow.closed == 'undefined') {
                    $.bootstrapGrowl('Download Blocked!<br/> Please allow popups from our site in your browser settings!', {type: 'danger', delay: 8000});
                }
            };