site diff tool with puppeteer



We recently had a requirement to build a site diff tool so that we can compare our pages in QA and prod and make sure the changes that went in are as expected.

Existing Stack

There used to be a version that worked, but it was a bit complicated. In the existing stack, the user provides a list of URL paths and 2 hosts as input to an entry lambda function, which spawns additional lambda functions (via SQS) for each path. Each of those connects to Sauce Labs, accesses the path on both hosts, takes screenshots with Selenium, uploads them to an S3 bucket, and writes a new SQS message to another queue. That queue spawns further lambda functions to do the image diff and upload the diff image to S3, which write more SQS messages and eventually trigger a reduce lambda function that generates summary JSON data for the UI.


To be honest, that is a lot of moving parts to reason about, troubleshoot, and maintain. Logs are scattered all over CloudWatch since so many functions are created. So I decided to build a slightly easier-to-use version.

Basic flow of new stack

  1. get all the paths from the CMS via API, so the user does not need to collect them manually and copy-paste them into multiple places, which is error-prone
  2. use puppeteer in headless mode to access the site, manipulate the DOM, then take a screenshot.
  3. use JIMP to create a diff image based on a configured threshold, then create the summary json locally
  4. upload the diff images and summary to the s3 bucket which populates the UI.
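The threshold logic in step 3 is essentially a pixel count. Here is a minimal stand-alone sketch of the idea, with plain arrays standing in for decoded images (all names are hypothetical, this is not the JIMP API):

```javascript
// Count differing pixels between two equal-sized pixel buffers and decide
// whether this page pair needs a diff image at all.
const diffRatio = (qaPixels, prodPixels) => {
  let diff = 0;
  for (let i = 0; i < qaPixels.length; i++) {
    if (qaPixels[i] !== prodPixels[i]) diff++;
  }
  return diff / qaPixels.length;
};

const THRESHOLD = 0.01; // up to 1% of pixels may differ before we keep a diff
const needsDiffImage = (qa, prod) => diffRatio(qa, prod) > THRESHOLD;

console.log(needsDiffImage([0, 0, 0, 0], [0, 0, 0, 0])); // false – identical
console.log(needsDiffImage([0, 0, 0, 0], [9, 9, 0, 0])); // true – 50% differ
```

Pages below the threshold are simply skipped, which is what saves the S3 space mentioned below.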


With the above flow,

  • No complex stack on aws; everything is done in one place, so debugging/logging is really easy.
  • A threshold is introduced, so we do not need to create diff images for paths that are similar enough or identical.
  • Previously there were always some diffs between prod and qa due to an extra prod-only feedback banner, which created big noise for screenshot comparison as it would sometimes cause pixel shift. Now we can easily remove it via puppeteer’s DOM API before taking the screenshot, so the result is much more accurate and concise.
  • Saves a lot of space on s3, as we only need to store paths that are above the threshold. And since we do everything in one shot, we also no longer have to upload the original screenshots to S3, which was previously required due to the multi-stage process.
  • If needed we can integrate with Jenkins and run from there, ideally with a slave that has puppeteer installed.

pipe operator in rxjs

pipe was introduced so that we can combine any number of operators without patching Observable.prototype.

const source$ = Observable.range(0, 10)
  .filter(x => x % 2)
  .reduce((acc, next) => acc + next, 0)
  .map(value => value * 2)
  .subscribe(x => console.log(x)); // 50

The above can be converted to:

const source$ = Observable.range(0, 10).pipe(
  filter(x => x % 2),
  reduce((acc, next) => acc + next, 0),
  map(value => value * 2)
).subscribe(x => console.log(x)); // 50

The pros, as the RxJS documentation puts it:

“Problems with the patched operators for dot-chaining are:

  1. Any library that imports a patch operator will augment the Observable.prototype for all consumers of that library, creating blind dependencies. If the library removes their usage, they unknowingly break everyone else. With pipeables, you have to import the operators you need into each file you use them in.
  2. Operators patched directly onto the prototype are not “tree-shakeable” by tools like rollup or webpack. Pipeable operators will be as they are just functions pulled in from modules directly.
  3. Unused operators that are being imported in apps cannot be detected reliably by any sort of build tooling or lint rule. That means that you might import scan, but stop using it, and it’s still being added to your output bundle. With pipeable operators, if you’re not using it, a lint rule can pick it up for you.
  4. Functional composition is awesome. Building your own custom operators becomes much, much easier, and now they work and look just like all other operators from rxjs. You don’t need to extend Observable or override lift anymore.”
We can also compose several operators into a single reusable operator:
import { Observable, pipe } from 'rxjs/Rx';
import { filter, map, reduce } from 'rxjs/operators';

const filterOutEvens = filter(x => x % 2);
const sum = reduce((acc, next) => acc + next, 0);
const doubleBy = x => map(value => value * x);

const complicatedLogic = pipe(
  filterOutEvens,
  doubleBy(2),
  sum
);

const source$ = Observable.range(0, 10);

source$.let(complicatedLogic).subscribe(x => console.log(x)); // 50
As for the tap operator, we can basically run side-effect logic inside it, and it
passes along the original values, unaffected by whatever we do inside.
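Under the hood, pipe is just left-to-right function composition. A minimal stand-alone sketch of the idea, using plain arrays instead of observables (all names here are illustrative, not the RxJS implementation):

```javascript
// pipe composes unary functions left-to-right: pipe(f, g)(x) === g(f(x)).
const pipe = (...fns) => (input) => fns.reduce((acc, fn) => fn(acc), input);

// Array-based stand-ins for the operators used above.
const filter = (pred) => (arr) => arr.filter(pred);
const map = (fn) => (arr) => arr.map(fn);
const reduce = (fn, seed) => (arr) => arr.reduce(fn, seed);

const complicatedLogic = pipe(
  filter(x => x % 2),          // keep odds: 1, 3, 5, 7, 9
  map(x => x * 2),             // double:    2, 6, 10, 14, 18
  reduce((acc, next) => acc + next, 0) // sum: 50
);

console.log(complicatedLogic([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])); // 50
```

Because these are just plain functions, composing, reusing, and unit-testing them is trivial, which is exactly the fourth point in the quote above.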

debug typescript mocha and server in vscode

We are currently developing a graphql api using apollo server and Typeorm on top of Aws Lambda. Code-wise it is fairly straightforward: a schema, then resolvers, then a service layer, then a dao layer, with models defined in typeorm via its annotations/decorators. However there are 2 issues related to debugging: unit tests and running graphql locally.

unit test

For unit tests, our ci/cd pipeline uses nyc/mocha as the runner. Those are good for running all test suites and generating coverage reports etc. However, when it comes to debugging we need to go to the ide, and since we are using typescript there is one more transpile layer than with vanilla es5/6, which makes this a bit more complicated.

The good news is vscode comes with a powerful built-in node debugger. With the below config, we can just open a ts file with mocha tests, set a breakpoint and start debugging:

{
  "name": "TS Mocha Tests File",
  "type": "node",
  "request": "launch",
  "program": "${workspaceRoot}/node_modules/mocha/bin/_mocha",
  "args": ["-r", "ts-node/register", "${relativeFile}"],
  "cwd": "${workspaceRoot}",
  "protocol": "inspector",
  "env": { "TS_NODE_PROJECT": "${workspaceRoot}/tsconfig.json" }
}
  • Sets up a node task that launches mocha
  • Passes a -r argument that tells mocha to require ts-node
  • Passes in the currently open file – ${relativeFile}
  • Sets the working directory to the project root – ${workspaceRoot}
  • Sets the node debug protocol to V8 Inspector mode
  • The last one, TS_NODE_PROJECT, I had to set because Typeorm relies on annotations/decorators, which require emitDecoratorMetadata to be set to true, and that is not the default.
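For reference, the relevant flags in that tsconfig.json look like this (a minimal fragment; the rest of the compiler options are assumed to already be in place):

```
{
  "compilerOptions": {
    "experimentalDecorators": true,
    "emitDecoratorMetadata": true
  }
}
```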

Local Run with nodemon

Another issue: since we are using aws lambda, it is not easy to run our graphql server locally.
We need to set up a local Koa server with the same schema that the Apollo lambda uses. This way we can access the graphiql service at localhost:8080/graphiql.

import 'reflect-metadata';
import * as Koa from 'koa';
import { initDatabase } from '../../dao/data-source';
import * as Router from 'koa-router';
import * as koaBody from 'koa-bodyparser';
import {
  graphqlKoa,
  graphiqlKoa
} from 'apollo-server-koa';
import { schema } from '../../gq-schema';
import { localConf } from '../../config/config';

export const routes = new Router();

// API entrypoint
const apiEntrypointPath = '/graphql';
const graphQlOpts = graphqlKoa({
    schema,
    context: {msg: 'hello context'}
});

routes.get(apiEntrypointPath, graphQlOpts);
routes.post(apiEntrypointPath, koaBody(), graphQlOpts);

// GraphiQL entrypoint
routes.get('/graphiql', graphiqlKoa({ endpointURL: apiEntrypointPath }));

(async () => {
  await initDatabase(localConf);
  const app = new Koa();
  app.use(routes.routes());
  app.use(routes.allowedMethods());
  app.listen(8080);
})();
Now we can have nodemon run this server, so that every time we make a code change, the server reloads with the new content. Put the below content into nodemon.json in the project root:

{
  "watch": ["./src"],
  "ext": "ts",
  "exec": "ts-node --inspect=9229 ./path/to/above/server.ts"
}

Notice we run ts-node with the --inspect=9229 flag, 9229 being the default Node.js inspector port, so that we can later debug in Chrome’s built-in node debugger, the green cube icon in Chrome’s dev tool console.

Now we can run the local server by adding a command to package.json:

"local": "npm run build && nodemon"

Then run npm run local or yarn local.

Option 2: debug server with vscode

To debug the above server with vscode, we need to add some config to launch.json.

{
  "name": "Local Graphql Server",
  "type": "node",
  "request": "launch",
  "args": ["${relativeFile}"],
  "runtimeArgs": ["--nolazy", "-r", "ts-node/register"],
  "sourceMaps": true,
  "cwd": "${workspaceRoot}",
  "protocol": "inspector"
}

  • Sets up a node task that starts the currently open file in VS Code (the ${relativeFile} variable contains the currently open file)
  • Passes in the --nolazy arg for node, which tells v8 to compile your code ahead of time, so that breakpoints work correctly
  • Passes in -r ts-node/register for node, which ensures that ts-node is loaded before it tries to execute your code
  • Sets the working directory to the project root – ${workspaceRoot}
  • Sets the node debug protocol to V8 Inspector mode (see above)

Now we can set break point in vscode and start debugging.

PS: No Enum in x.d.ts

One thing I noticed today: in an xxx.d.ts module definition file, never define things like an Enum. This file is for type/interface definitions only, and its content will NOT be compiled to js, hence it will not be available at run time. So if you define an enum there, it will compile fine, but when you run the application, as soon as you use one of those enums you get a runtime error.

One alternative solution is to use a custom type and define the list of strings:

export type MessageLevel = "Unknown" | "Fatal" | "Critical" | "Error";
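Since the union type is erased at compile time, any runtime validation needs a plain value to check against. A minimal sketch with a hypothetical helper (keep this in a regular source file, not the .d.ts):

```javascript
// The allowed strings live in a normal array that survives compilation,
// so they can back both the type and the runtime check.
const MESSAGE_LEVELS = ["Unknown", "Fatal", "Critical", "Error"];

const isMessageLevel = (value) => MESSAGE_LEVELS.includes(value);

console.log(isMessageLevel("Fatal")); // true
console.log(isMessageLevel("Debug")); // false
```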

href # does not always do nothing

Today we were facing an issue where Angular Universal generated an anchor element with href="false", so before the js loaded, the anchor would lead the user to /false.

My first thought was to just put href="#" so that it does nothing. But after adding the #, every click navigated away from the current page. It turns out we have a base tag defined in our html, <base href="/" />, which changes the behavior of the #: the fragment is resolved against the base href instead of the current page.

One solution is to use <a href="javascript:void(0);">, which makes the anchor do nothing. One drawback though: if we click the anchor before Angular finishes bootstrapping, the anchor’s directive has not been loaded yet, so it remains doing nothing forever…

Eventually what we did was add onclick="return false;" to the anchor, which is removed once the directive comes in and replaces the behavior. This way we make sure the anchor does nothing before all the js loads, and the js context works as expected even if the anchor was clicked before loading finished.
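The pattern boils down to shipping a handler that always cancels, then swapping it once the real code arrives. A minimal sketch with a plain object standing in for the DOM anchor (names hypothetical):

```javascript
// Server-rendered anchor ships with a handler that always cancels navigation.
const anchor = { onclick: () => false };

// Once the js bundle (or the Angular directive) loads, it swaps in the real logic.
const enhance = (a, realHandler) => {
  a.onclick = realHandler;
};

enhance(anchor, () => {
  // real navigation logic would run here
  return true;
});

console.log(anchor.onclick()); // true – the real handler took over
```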

javascript statement completion value

We all know how painful it is to write long and tedious CloudFormation json templates. Cloudform is a nice little library that lets you write CloudFormation templates in typescript and compiles them into json. Its types all come from amazon’s own library, so it is easy to keep up to date without the maintainer having to write new code to support new aws features.

As I was reading the source code of the cloudform compiler, which eventually emits the CloudFormation template via an eval() call, I was confused about how that works. Digging deeper, I found an interesting concept in js that I had never heard of: the statement completion value.

To test it, I ran some experiments in the node cli:

const s0 = 'const f1 = ()=>1;';
const s1 = 'const f1 = ()=>1; f1();';
const s2 = 'const f1 = ()=>1; f2 = ()=>2; f1(); f2()';
const s3 = 'const f1 = ()=>1; f1(); "2"';

eval(s0); // undefined
eval(s1); // 1
eval(s2); // 2
eval(s3); // "2"

So this means eval always returns the completion value of the last statement that produced one (declarations like const produce none, hence the undefined for s0).

Another note: an IIFE can also supply the value, as long as it has a return statement. And async is not well supported, since eval runs synchronously, so logic executed in later ticks never gets a chance to return, unless we do some monkey patching with async/await.
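For example, a quick check in node (the completion value of the final call expression is what eval hands back):

```javascript
// The IIFE's return value is the completion value of the last statement,
// so eval returns it to the caller.
const result = eval('const x = 40; (() => { return x + 2; })()');
console.log(result); // 42
```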

An interesting reference: a question asked by Paul Irish on Twitter in 01/2017 and answered by Brendan Eich: HERE

package json bin field

We can define executable js files in the bin section of package.json so that during installation, npm will symlink each file into prefix/bin for global installs, or into ./node_modules/.bin/ for local installs.

The js file needs to start with the node shebang: #!/usr/bin/env node, where env will find the path of node on the system for us.

And npm also adds ./node_modules/.bin to the PATH for its scripts section. So if we define something like `{ "bin": { "myapp": "./cli.js" } }`, then in `scripts` we only need:

`"run-myapp": "myapp"` rather than `"run-myapp": "./node_modules/.bin/myapp"`.
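Putting it together, a minimal package.json fragment (the module and file names follow the example above):

```
{
  "name": "myapp",
  "bin": { "myapp": "./cli.js" },
  "scripts": {
    "run-myapp": "myapp"
  }
}
```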

debug nodejs in chrome

The new chrome ships with chrome://inspect (also reachable via about:inspect) and a dedicated debugger for nodejs, which is super cool! At least we no longer have to rely solely on console.log() magic.

To do that, run script with an additional flag:

node --inspect myServer.js

This fires up the app. Then go to a new tab and enter about:inspect. Click on the Open dedicated DevTools for Node link, which opens a debugger for node. Now you can start putting in break points etc…

More options HERE and a video intro.