Recently I was building an MVP to replace an ELB/EC2/Docker based static site preview stack with a CloudFront/Lambda@Edge/S3 based one.
The goals are to:
- reduce the maintenance we have to do with the EC2 stack, like regular AMI updates.
- reduce the complexity of the stack, as the previous one involves building a custom image, storing the image, CloudFormation to bring up the stack, and EC2 user data to initialize the system (pull the image, run docker compose, etc.).
- reduce the cost, as the ELB and EC2 instances have to run 24/7.
- increase stability: Lambda does not rely on any specific host we manage, whereas our Docker containers still have to run on some instance, even though Docker has done a pretty good job on isolation.
The existing stack is like below. On init, the Docker containers pull code from GitHub, install the Node dependencies, and run the preview command, which starts a browser-sync server that pulls data from the CMS and returns the combined HTML to the client browser. When the code in GitHub is updated, we have to restart the EC2 instance to pick it up.
In the new stack, we build the bundle from the GitHub code via Jenkins and push it to S3, which sits behind a CloudFront distribution that invokes a Lambda@Edge function on request. When a user requests a page, if it is the entry point (/bank/xxx), which has no file extension, CloudFront has a cache miss and forwards the request to the origin. At this point, the Lambda function we registered on the origin request lifecycle receives the request before it reaches the origin, which is the perfect time to manipulate it. So in the Lambda function, we request the HTML file from the origin by adding the .html extension, request the dynamic data from the CMS, combine the two in the function, and return the result to the user directly. The browser then parses the HTML and sends requests for the resources to CloudFront, where we can either serve them from the CDN cache or fetch them from the S3 origin.
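The origin request manipulation can be sketched as a handler like the one below. This is a minimal illustration, not our production code: the ORIGIN_BASE and CMS_URL endpoints and the `<!--PREVIEW_DATA-->` placeholder are hypothetical, and it is written in Python for brevity (the event shape is the same in the Node.js runtime). Note that a response generated at the origin request hook is size-limited (around 1 MB).

```python
import json
import urllib.request

# Hypothetical endpoints -- replace with your real S3 origin and CMS URLs.
ORIGIN_BASE = "https://preview-site-bucket.s3.amazonaws.com"
CMS_URL = "https://cms.example.com/api/preview-data"


def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    uri = request["uri"]

    # Requests with a file extension are real assets: pass them through
    # untouched so CloudFront can fetch and cache them normally.
    if "." in uri.split("/")[-1]:
        return request

    # Entry-point request: fetch the shell HTML by appending .html ...
    with urllib.request.urlopen(f"{ORIGIN_BASE}{uri}.html") as resp:
        html = resp.read().decode("utf-8")

    # ... fetch the dynamic data from the CMS ...
    with urllib.request.urlopen(CMS_URL) as resp:
        data = json.load(resp)

    # ... combine them (a naive placeholder substitution here) and return
    # a generated response directly, short-circuiting the origin.
    body = html.replace("<!--PREVIEW_DATA-->", json.dumps(data))
    return {
        "status": "200",
        "statusDescription": "OK",
        "headers": {
            "content-type": [{"key": "Content-Type", "value": "text/html"}],
        },
        "body": body,
    }
```

The extension check is what keeps the function cheap: only extensionless entry-point URIs pay the cost of the two upstream fetches.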
When the code in GitHub updates, we just need a hook to trigger a Jenkins build that pushes the new artifacts to S3. One thing to note: we need to set the entry HTML file's TTL to 0 on CloudFront so that we do not have to invalidate it explicitly when deploying new code. It is a trade-off: the entry point is never served from the CDN cache, but new deploys take effect immediately.
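One way to get that behavior is to have the Jenkins job upload the entry HTML with a zero max-age, assuming the cache behavior is configured to honor the origin's Cache-Control headers. A sketch with boto3 (the bucket and key names are hypothetical):

```python
import boto3  # third-party; assumes AWS credentials are configured

s3 = boto3.client("s3")

# Hypothetical bucket/key; the rest of the bundle is uploaded without
# this header so CloudFront can cache it normally.
with open("dist/bank/index.html", "rb") as f:
    s3.put_object(
        Bucket="preview-site-bucket",
        Key="bank/index.html",
        Body=f,
        ContentType="text/html",
        # max-age=0 makes CloudFront revalidate the entry HTML on every
        # request, so a fresh deploy is picked up without an invalidation.
        CacheControl="max-age=0",
    )
```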
I had a hard time with Lambda@Edge logging on CloudWatch. The function logs fine when triggered from the Lambda test console; however, when it is triggered via CloudFront, nothing appears under the /aws/lambda/Function_Name log path. I had to open an enterprise AWS support ticket for it. It turns out that logs from functions triggered by CloudFront carry a region prefix in the log group name, like /aws/lambda/us-east-1.Function_Name, and are written to the region closest to the edge location that ran the function.
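Once you know the naming scheme, you can locate the logs programmatically rather than clicking through every region in the console. A sketch (the function names here are hypothetical, and boto3 is assumed for the region scan):

```python
def edge_log_group(function_name: str, deploy_region: str = "us-east-1") -> str:
    # Lambda@Edge functions are always deployed in us-east-1, and their
    # replicas log to a group whose name carries that region as a prefix.
    return f"/aws/lambda/{deploy_region}.{function_name}"


def find_edge_log_regions(function_name: str) -> list:
    # The log group lives in whichever region served the request, so scan
    # them all. boto3 is assumed installed with credentials configured.
    import boto3

    prefix = edge_log_group(function_name)
    ec2 = boto3.client("ec2", region_name="us-east-1")
    regions = [r["RegionName"] for r in ec2.describe_regions()["Regions"]]
    found = []
    for region in regions:
        logs = boto3.client("logs", region_name=region)
        if logs.describe_log_groups(logGroupNamePrefix=prefix)["logGroups"]:
            found.append(region)
    return found
```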
CloudFront Trigger Selection
There are currently (as of 09/15/2018) four triggers we can choose from:
- when a viewer request is received
- on a cache miss, when the request is sent to the origin
- when the response is received from the origin, before the object is cached
- when the content is returned to the viewer
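In API terms, these four hooks map to the CloudFront event-type names below; the keys are the strings you pass when associating a function with a cache behavior.

```python
# CloudFront's four Lambda@Edge trigger points, keyed by event-type name.
TRIGGERS = {
    "viewer-request": "every request, before the cache is checked",
    "origin-request": "cache miss only, before the request reaches the origin",
    "origin-response": "when the origin responds, before the object is cached",
    "viewer-response": "every response, before it is returned to the viewer",
}
```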
So the first and last (viewer request and viewer response) are the expensive, heavy hooks: they are triggered on every request no matter what! Be careful when selecting them, as they can increase latency as well as cost. The origin request trigger is the perfect lifecycle hook for this use case, as we only want the entry point to be manipulated. The subsequent requests for the real assets can still be handled by CloudFront and leverage its caching capability.
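For completeness, the association itself is just a fragment of the distribution's cache behavior config. A sketch of that fragment as the structure the CloudFront API expects (the ARN is hypothetical; note that Lambda@Edge requires a published function version, not $LATEST):

```python
# LambdaFunctionAssociations fragment of a CloudFront cache behavior,
# wiring the function to the origin-request trigger only.
lambda_function_associations = {
    "Quantity": 1,
    "Items": [
        {
            "EventType": "origin-request",
            # Hypothetical ARN; must end with a published version number.
            "LambdaFunctionARN": (
                "arn:aws:lambda:us-east-1:123456789012:"
                "function:preview-entry-rewrite:3"
            ),
        }
    ],
}
```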