JS tilde IIFE

// Without a superfluous operator, we need to surround the anonymous ‘scoping’ function in
// parentheses to force it to be parsed as an expression instead of a *declaration*,
// which allows us to immediately invoke it.
   // ...

// By inserting a superfluous operator, we can omit those parentheses,
// as the operator forces the parser to view the anonymous function as
// an expression *within* the statement, instead of as the
// statement itself, which saves us a character overall, as well as some ugliness:
   // ...

// But, in all of the above examples, if one is depending on ASI, and
// doesn't needlessly scatter semicolons all over their code out of ignorance,
// a prepended semicolon is necessary to prevent snafus like the following:
var foo = 4
   // ...
// ... in which case, the variable `foo` would be set to a crazy
// addition / concatenation involving the (probably non-existent) *return value*
// of our anonymous ‘scoping’ function. Hence, our friend the bitflip:
var foo = 4
   // ...
// ... he solves all of our problems by obviating the prepended semicolon
// *and* the wrapping parentheses.
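Since the examples themselves are elided above, here is a hypothetical sketch of the three patterns being described (the variable names are mine, not the original author's):

```javascript
// 1. Parentheses force the anonymous function to parse as an expression:
var a = (function() { return 1; })();

// 2. A superfluous unary operator does the same job without the parentheses.
//    Note the return value gets bit-flipped: ~1 === -2.
var b = ~function() { return 1; }();

// 3. Relying on ASI: `4` cannot continue into a line starting with `~`,
//    so a semicolon is inserted and `foo` stays 4. With a leading `(`
//    instead, this would parse as `4(...)` and throw a TypeError.
var foo = 4
~function() { /* scoped work */ }()
```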

JVM JIT and running mode


At the beginning of Java, all code was executed in an interpreted manner, i.e. interpreted and executed line by line, which is slow, especially for frequently executed code.

So later the JIT was introduced: when some code is executed frequently, it becomes ‘Hot Spot Code’ and is compiled to machine code. Typically there are 2 types of hot spot code:

  1. A function that is called very frequently
  2. The body of a loop.

The JVM maintains a count of how many times a function is executed. If this count exceeds a predefined threshold, the JIT compiles the function into machine code, which can be executed directly by the processor (unlike the normal case, in which javac compiles the source into bytecode and then java, the interpreter, interprets this bytecode line by line, converting it into machine code as it executes).

Also, the next time this function is called, the same compiled code is executed again, unlike normal interpretation, in which the code is interpreted line by line all over again.
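As a sketch of this behaviour: a tiny method called in a tight loop quickly crosses HotSpot's invocation threshold. The class below (names are mine) can be run with the standard -XX:+PrintCompilation flag to watch methods being compiled to machine code:

```java
public class HotLoop {
    // A trivial method that becomes "hot" after many invocations.
    static int square(int n) {
        return n * n;
    }

    static long squareSum(int iters) {
        long sum = 0;
        for (int i = 0; i < iters; i++) {
            sum += square(i % 100);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Far more calls than HotSpot's compile thresholds, so square()
        // and the loop body get JIT-compiled during the run.
        System.out.println(squareSum(1_000_000));
        // Try: java -XX:+PrintCompilation HotLoop
    }
}
```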

Server vs client mode

When we check the Java version, three lines are printed; let's look at what the 3rd line means.

$ java -version

java version "1.8.0_66"

Java(TM) SE Runtime Environment (build 1.8.0_66-b17)

Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

The HotSpot JVM has 2 modes: client and server. We can use java -server xxx / java -client xxx to choose one at startup. The former does heavier compilation and optimization, and hence has a longer startup time (but better peak performance).


Use java -Xint / -Xcomp / -Xmixed to select the execution mode; mixed is the default.

java -Xint -version 

will show that it is in interpreted mode, so the code will be interpreted and executed line by line, which is quite slow, especially in loops. On the other hand, -Xcomp will compile all the code to machine code before executing it.


GC-friendly Java coding

1. Give a size when initializing collections

When initializing a Map/List etc., if we know the size, pass it into the constructor. This way the JVM does not have to repeatedly allocate new, larger backing arrays (and discard the old ones as garbage) as the collection grows.
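A small sketch of the idea (class and method names are mine):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Presized {
    // With the element count known up front, the backing array is
    // allocated once instead of being grown (and the old arrays
    // discarded as garbage) as elements are added.
    static List<Integer> buildList(int n) {
        List<Integer> list = new ArrayList<>(n);   // capacity hint
        for (int i = 0; i < n; i++) list.add(i);
        return list;
    }

    static Map<Integer, Integer> buildMap(int n) {
        // HashMap resizes when size exceeds capacity * loadFactor (0.75),
        // so pick the capacity accordingly to avoid rehashing.
        Map<Integer, Integer> map = new HashMap<>((int) (n / 0.75f) + 1);
        for (int i = 0; i < n; i++) map.put(i, i * i);
        return map;
    }
}
```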

2. Use immutables

Immutability has a lot of benefits. One of them is typically ignored which is its effect on GC.

We cannot modify the fields/references of an immutable object.

public class ObjectPair {

    private final Object first;
    private final Object second;

    public ObjectPair(Object first, Object second) {
        this.first = first;
        this.second = second;
    }

    public Object getFirst() {
        return first;
    }

    public Object getSecond() {
        return second;
    }
}

This means an immutable object cannot reference objects that are created after it. So when GC runs on the YoungGen, it can skip immutables in the OldGen, since they cannot hold references into the current YoungGen; that means fewer memory pages scanned and a shorter GC cycle.

3. Prefer Streams to big blobs

When dealing with a large file, reading it fully into memory creates a large object on the heap and can easily result in an OOM. Exposing the file as a Stream and processing it incrementally is more efficient, and we typically have many APIs for processing Streams.
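A minimal sketch using the JDK's Files.lines, which reads lazily instead of loading the whole file (method name is mine):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class LineCount {
    // Files.lines() streams the file line by line, so only a small
    // buffer lives on the heap at any moment, regardless of file size.
    static long countNonEmptyLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(l -> !l.isEmpty()).count();
        }
    }
}
```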

4. String concatenation in loops

Typically the Java compiler does a pretty good job optimizing String concatenation (with ‘+’) by using StringBuilder.

However, when it comes to a loop, it is a different story: the temporary strings in the loop cause a lot of new StringBuilder objects to be created. So a better way is to use a StringBuilder directly when concatenating in a loop. More detail can be found in this SO answer.
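A sketch of both styles (names are mine); the two produce the same result, but the slow version allocates a fresh StringBuilder and String on every iteration:

```java
public class Concat {
    // `s += p` inside a loop compiles to a new StringBuilder (and a new
    // String) per iteration, churning the young generation.
    static String joinSlow(String[] parts) {
        String s = "";
        for (String p : parts) s += p;   // new StringBuilder each iteration
        return s;
    }

    static String joinFast(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) sb.append(p);
        return sb.toString();            // one builder, one final String
    }
}
```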

Hoisting of var, let, const, function, function*, class

I have been playing with ES6 for a while and I noticed that while variables declared with var are hoisted as expected…

console.log(typeof name); // undefined
var name = "John";

…variables declared with let or const seem to have some problems with hoisting:

console.log(typeof name); // ReferenceError
let name = "John";


console.log(typeof name); // ReferenceError
const name = "John";


It looks like these variables cannot be accessed before they are declared. However, it’s a bit more complicated than that.

Are variables declared with let or const not hoisted? What is really going on here?

All declarations (var, let, const, function, function*, class) are hoisted in JavaScript. This means that if a name is declared in a scope, in that scope the identifier will always reference that particular variable:

x = "global";
// function scope:
(function() {
    x; // not "global"

    var/let/… x;
}());
// block scope (not for `var`s):
{
    x; // not "global"

    let/const/… x;
}

This is true for both function and block scopes.

The difference between var/function/function* declarations and let/const/class declarations is the initialisation.
The former are initialised with undefined or the (generator) function right when the binding is created at the top of the scope. The lexically declared variables however stay uninitialised. This means that a ReferenceError exception is thrown when you try to access one. They only get initialised when the let/const/class statement is evaluated; the region above that is called the temporal dead zone.

x = y = "global";
(function() {
    x; // undefined
    y; // ReferenceError: y is not defined

    var x = "local";
    let y = "local";
}());

Notice that a let y; statement initialises the variable with undefined, like let y = undefined; would have.

Is there any difference between let and const in this matter?

No, they work the same as far as hoisting is regarded. The only difference between them is that a constant must be and can only be assigned in the initialiser part of the declaration (const one = 1;, both const one; and later reassignments like one = 2 are invalid).




Hive Hadoop mapper size

Below is an article from MapR

A Hive table contains files in HDFS; if one table or one partition has too many small files, HiveQL performance may suffer.
Sometimes it may take a long time to prepare a MapReduce job before submitting it, since Hive needs to get the metadata from each file.
This article explains how to control the number of files of a Hive table after inserting data on MapR-FS; put simply, it explains how many files will be generated for the “target” table by the HiveQL below:


The HiveQL above has 2 major steps:

1. A MapReduce job (map-only in this example) to read the data from the “source” table.

The number of mappers determines the number of intermediate files, and the number of mappers is determined by the following 3 factors:

a. hive.input.format

Different input formats may start different numbers of mappers in this step.
The default value in Hive 0.13 is org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.
It combines all files together and then tries to split, which can improve performance if the table has too many small files.
An older default value is org.apache.hadoop.hive.ql.io.HiveInputFormat, which splits each file separately. E.g., if you have 10 small files and each file has only 1 row, Hive may spawn 10 mappers to read the whole table.
This article uses the default CombineHiveInputFormat as an example.

b. File split size

mapred.max.split.size and mapred.min.split.size control the “target” file split size.
(In the latest Hadoop 2.x, those 2 parameters are deprecated; the new ones are mapreduce.input.fileinputformat.split.maxsize and mapreduce.input.fileinputformat.split.minsize.)
For example, if a Hive table has one 1GB file, and the target split size is set to 100MB, 10 mappers MAY be spawned in this step. The reason for “MAY” is factor c below.

c. MapR-FS chunk size

Files in MapR-FS are split into chunks (similar to Hadoop blocks) that are normally 256 MB by default. Any multiple of 65,536 bytes is a valid chunk size.
The actual split size is max(target split size, chunk size).
Take above 1GB file with 100MB “target” split size for example, if the chunk size is 200MB, then the actual split size is 200MB, 5 mappers spawned; if the chunk size is 50MB, then the actual split size is 100MB, 10 mappers spawned.
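The rule above can be sketched in a few lines of arithmetic (names are mine; Hive's actual logic has more corner cases). It reproduces the mapper counts for the 644MB tables in the lab below:

```java
public class SplitMath {
    // The actual split size is the larger of the target split size
    // and the MapR-FS chunk size.
    static long actualSplitSize(long targetSplit, long chunkSize) {
        return Math.max(targetSplit, chunkSize);
    }

    // Number of mappers ~= ceil(fileSize / actualSplitSize).
    static long mappers(long fileSize, long targetSplit, long chunkSize) {
        long split = actualSplitSize(targetSplit, chunkSize);
        return (fileSize + split - 1) / split;
    }
}
```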

Lab time:

Imagine we have prepared 3 Hive tables of the same size — 644MB, with only 1 file per table.
The only difference is the chunk size of the 3 tables.
source  — chunk size=8GB.
source2 — chunk size=256MB(default in mfs).
source3 — chunk size=64k(Minimum).

# hadoop mfs -ls /user/hive/warehouse/|grep -i source
drwxr-xr-x Z U   - root root          1 2014-12-04 11:22 8589934592 /user/hive/warehouse/source
drwxr-xr-x Z U   - root root          1 2014-12-04 11:31  268435456 /user/hive/warehouse/source2
drwxr-xr-x Z U   - root root          1 2014-12-04 12:24      65536 /user/hive/warehouse/source3

Then the question is: how many mappers will be spawned for the INSERT below, after setting the target split size to 100MB?

set mapred.max.split.size=104857600;
set mapred.min.split.size=104857600;

1.  Table “source”
The whole table (644MB) is in 1 chunk (8GB each), so only 1 mapper.
2. Table “source2”
The whole table (644MB) is in 3 chunks (256MB each), so 3 mappers.
3. Table “source3”
The whole table (644MB) is in more than 10000 chunks (64KB each), and the target split size (100MB) is larger than the 64KB chunk size, so the actual split size is 100MB and 7 mappers are spawned.

Reasoning the same way, if the target split size is 10GB, only 1 mapper will be spawned in the 1st step for all 3 tables above; if the target split size is 1MB, the mapper counts are: source(1), source2(3), source3(645).
After figuring out the 1st step, let’s move to the 2nd step.

2. Small file merge MapReduce job

After the 1st MapReduce job finishes, Hive decides whether it needs to start another MapReduce job to merge the intermediate files. If small file merge is disabled, the number of target table files is the same as the number of mappers from the 1st MapReduce job. The following 4 parameters determine whether and how Hive merges small files.

  • hive.merge.mapfiles — Merge small files at the end of a map-only job.
  • hive.merge.mapredfiles — Merge small files at the end of a map-reduce job.
  • hive.merge.size.per.task — Size of merged files at the end of the job.
  • hive.merge.smallfiles.avgsize — When the average output file size of a job is less than this number, Hive will start an additional map-reduce job to merge the output files into bigger files. This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs if hive.merge.mapredfiles is true.

By default hive.merge.smallfiles.avgsize=16000000 and hive.merge.size.per.task=256000000, so if the average file size is about 17MB, the merge job will not be triggered. If we really want only 1 file to be generated in the end, we need to increase hive.merge.smallfiles.avgsize enough to trigger the merge, and also increase hive.merge.size.per.task to get the desired number of files in the end.

Quiz time:

In Hive 0.13 on MapR-FS, how many files will be generated in the end for the HiveQL with each group of settings below? Have a guess at the final file sizes too.
Reminder: chunk size for “source3” is 64KB.

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=104857600;
set hive.merge.size.per.task=209715200;
set mapred.max.split.size=68157440;
set mapred.min.split.size=68157440;


set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=283115520;
set hive.merge.size.per.task=209715200;
set mapred.max.split.size=68157440;
set mapred.min.split.size=68157440;

1. The target split size is 65MB and the chunk size is only 64KB, so the 1st job will spawn 10 mappers, and each mapper will generate one 65MB intermediate file.
The merge job will be triggered because the average file size from the previous job is less than 100MB (hive.merge.smallfiles.avgsize).
For each task, to achieve the 200MB file size (hive.merge.size.per.task), 4 x 65MB files will be merged into one 260MB file.
So in the end, 3 files will be generated for the target table — 644MB = 260MB+260MB+124MB.

[root@n2a warehouse]# ls -altr target
total 659725
-rwxr-xr-x  1 root root 130296036 Dec  5 17:26 000002_0
-rwxr-xr-x  1 root root 272629772 Dec  5 17:26 000001_0
drwxr-xr-x  2 root root         3 Dec  5 17:26 .
-rwxr-xr-x  1 root root 272629780 Dec  5 17:26 000000_0
drwxr-xr-x 38 mapr mapr        37 Dec  5 17:26 ..

2. The target split size is 65MB and the chunk size is only 64KB, so the 1st job will spawn 10 mappers, and each mapper will generate one 65MB intermediate file.
The merge job will be triggered because the average file size from the previous job is less than 270MB (hive.merge.smallfiles.avgsize).
For each task, to achieve the 200MB file size (hive.merge.size.per.task), 4 x 65MB files *should* be merged into one 260MB file. However, the average file size would then still be less than 270MB (hive.merge.smallfiles.avgsize), so they would still be considered “small files”.
In this case, 5 x 65MB files are merged into one 325MB file.
So in the end, 2 files will be generated for the target table — 644MB = 325MB+319MB.

[root@n1a warehouse]# ls -altr target
total 659724
-rwxr-xr-x  1 root root 334768396 Dec  8 10:46 000001_0
drwxr-xr-x  2 root root         2 Dec  8 10:46 .
-rwxr-xr-x  1 root root 340787192 Dec  8 10:46 000000_0
drwxr-xr-x 38 mapr mapr        37 Dec  8 10:46 ..

Key takeaways:

1.  MapR-FS chunk size and target split size determine the number of mappers and the number of intermediate files.
2. Small file merge job controls the final number of files for target table.
3. Too many small files for one table may introduce job performance overhead.


The Oracle RDBMS allows you to collect statistics of many different kinds as an aid to improving performance, but the DBMS_STATS package is concerned with optimizer statistics only. Given that Oracle enables automatic collection of these statistics by default, the DBMS_STATS package is needed only in specialized cases.


Most enterprise databases, Oracle included, use a cost-based optimizer to determine the appropriate query plan for a given SQL statement. This means that the optimizer uses information about the data to determine how to execute a query rather than relying on rules (this is what the older rule-based optimizer did).

For example, imagine a table for a simple bug-tracking application

CREATE TABLE issues (

  issue_id     NUMBER PRIMARY KEY,

  issue_text   CLOB,

  issue_status VARCHAR2(10)

);

CREATE INDEX idx_issue_status

    ON issues( issue_status );

If I’m a large company, I might have 1 million rows in this table. Of those, 100 have an issue_status of ACTIVE, 10,000 have an issue_status of QUEUED, and 989,900 have a status of COMPLETE. If I want to run a query against the table to find my active issues


SELECT *

  FROM issues

 WHERE issue_status = 'ACTIVE'

the optimizer has a choice. It can either use the index on issue_status and then do a single-row lookup in the table for each row in the index that matches or it can do a table scan on the issues table. Which plan is more efficient will depend on the data that is in the table. If Oracle expects the query to return a small fraction of the data in the table, using the index would be more efficient. If Oracle expects the query to return a substantial fraction of the data in the table, a table scan would be more efficient.

DBMS_STATS.GATHER_TABLE_STATS is what gathers the statistics that allow Oracle to make this determination. It tells Oracle that there are roughly 1 million rows in the table, that there are 3 distinct values for the issue_status column, and that the data is unevenly distributed. So Oracle knows to use the index for the query that finds all the active issues. But it also knows that when you turn around and look for all the completed issues


SELECT *

  FROM issues

 WHERE issue_status = 'COMPLETE'

that it will be more efficient to do a table scan.
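Gathering those statistics manually is a short PL/SQL call; the sketch below uses this article's example table (parameter choices are illustrative, not the only ones available):

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,        -- current schema
    tabname => 'ISSUES',
    cascade => TRUE         -- gather statistics for the indexes too
  );
END;
/
```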

Gathering statistics allows the query plans to change over time as the data volumes and data distributions change. When you first install the issue tracker, you’ll have very few COMPLETED issues and more ACTIVE and QUEUED issues. Over time, the number of COMPLETED issues rises much more quickly. As you get more rows in the table and the relative fraction of those rows that are in the various statuses change, the query plans will change so that, in the ideal world, you always get the most efficient plan possible.

DNS fundamentals, plus A records, NS records, and CNAME

A great blog post about DNS by Ruan Yifeng. I especially like its explanation of the hierarchical lookup process and of A records, NS records, and CNAME: simple and clear. So I've reproduced that part (translated) below:

The level below the root domain is called the "top-level domain" (TLD), e.g. .com and .net. The next level down is the "second-level domain" (SLD), e.g. .example in http://www.example.com; this is the level users can register. The level below that is the host name, e.g. www in http://www.example.com, also called the "third-level domain"; this is a name the user assigns to a server within their own domain, and it can be chosen freely.



# i.e.

  1. Query the "root name servers" for the NS records and A records (IP addresses) of the "TLD name servers"
  2. Query the "TLD name servers" for the NS records and A records (IP addresses) of the "SLD name servers"
  3. Query the "SLD name servers" for the IP address of the "host name"


$ dig +trace math.stackexchange.com


7. Querying NS records


$ dig ns com
$ dig ns stackexchange.com


$ dig +short ns com
$ dig +short ns stackexchange.com




(1) A: Address record; returns the IP address the domain points to.

(2) NS: Name Server record; returns the address of the server holding the next-level domain's information. This record can only be set to a domain name, not an IP address.

(3) MX: Mail eXchange record; returns the address of the server that receives email.

(4) CNAME: Canonical Name record; returns another domain name, i.e. the queried domain is an alias that jumps to another domain; more on this below.

(5) PTR: Pointer Record; used only for reverse lookup from an IP address to a domain name; more on this below.



$ dig facebook.github.io


facebook.github.io. 3370    IN  CNAME   github.map.fastly.net.
github.map.fastly.net.  600 IN  A




$ dig -x


;; ANSWER SECTION: 3600 IN    PTR pages.github.com.




$ dig a github.com
$ dig ns github.com
$ dig mx github.com