nginx reverse proxy S3 files

China access issue

Recently some of our church site users reported that the sermon audio/video download feature does not work any more. We recently moved our large files from file system to s3. After some research looks like the aws s3 is blocked by the famous Chinese Great FireWall(GFW).

Possible Solutions

Moving files back to file system(EBS) is one option but maybe too much as we already decided to store files in S3 which is much cheaper and easier to maintain.

Tried 2nd way which is using our existing web server Nginx as a reverse proxy. The directives in nginx is quite tricky and there are a lot of conventions which is not easier to document as well as debug.

The config is quite simple though

location ^~ /s3/ {
    rewrite /s3/(.*) /$1 break;

    proxy_set_header Host '';
    proxy_set_header Authorization '';
    proxy_hide_header x-amz-id-2;
    proxy_hide_header x-amz-storage-class;
    proxy_hide_header x-amz-request-id;
    proxy_hide_header Set-Cookie;
    proxy_ignore_headers "Set-Cookie";

Basically have the url match whatever starts with /s3 and rewrite the current url with a regex so that we can get rid of the /s3 prefix and get the real key in s3. Then use break keyword so that it continues rather than the last which would try to find another handler. Next is use the proxy_pass to proxy the request to S3 with the processed real key. The other settings are just hide all the amazon response headers etc.

Nginx location order

From the HttpCoreModule docs:

  1. Directives with the “=” prefix that match the query exactly. If found, searching stops.
  2. All remaining directives with conventional strings. If this match used the “^~” prefix, searching stops.
  3. Regular expressions, in the order they are defined in the configuration file.
  4. If #3 yielded a match, that result is used. Otherwise, the match from #2 is used.

Example from the documentation:

location  = / {
  # matches the query / only.
  [ configuration A ] 
location  / {
  # matches any query, since all queries begin with /, but regular
  # expressions and any longer conventional blocks will be
  # matched first.
  [ configuration B ] 
location /documents/ {
  # matches any query beginning with /documents/ and continues searching,
  # so regular expressions will be checked. This will be matched only if
  # regular expressions don't find a match.
  [ configuration C ] 
location ^~ /images/ {
  # matches any query beginning with /images/ and halts searching,
  # so regular expressions will not be checked.
  [ configuration D ] 
location ~* \.(gif|jpg|jpeg)$ {
  # matches any request ending in gif, jpg, or jpeg. However, all
  # requests to the /images/ directory will be handled by
  # Configuration D.   
  [ configuration E ] 

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s