Stack and Heap

The stack is the memory set aside as scratch space for a thread of execution. When a function is called, a block is reserved on the top of the stack for local variables and some bookkeeping data. When that function returns, the block becomes unused and can be used the next time a function is called. The stack is always reserved in a LIFO order; the most recently reserved block is always the next block to be freed. This makes it really simple to keep track of the stack; freeing a block from the stack is nothing more than adjusting one pointer.

The heap is memory set aside for dynamic allocation. Unlike the stack, there’s no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time; there are many custom heap allocators available to tune heap performance for different usage patterns.

Each thread gets a stack, while there’s typically only one heap for the application (although it isn’t uncommon to have multiple heaps for different types of allocation).

To answer your questions directly:

  • The OS allocates the stack for each system-level thread when the thread is created. Typically the OS is called by the language runtime to allocate the heap for the application.
  • The stack is attached to a thread, so when the thread exits the stack is reclaimed. The heap is typically allocated at application startup by the runtime, and is reclaimed when the application (technically process) exits.
  • The size of the stack is set when a thread is created. The size of the heap is set on application startup, but can grow as space is needed (the allocator requests more memory from the operating system).
  • The stack is fasterbecause the access pattern makes it trivial to allocate and deallocate memory from it (a pointer/integer is simply incremented or decremented), while the heap has much more complex bookkeeping involved in an allocation or free. Also, each byte in the stack tends to be reused very frequently which means it tends to be mapped to the processor’s cache, making it very fast.Stack:
    • Stored in computer RAM like the heap.
    • Variables created on the stack will go out of scope and automatically deallocate.
    • Much faster to allocate in comparison to variables on the heap.
    • Implemented with an actual stack data structure.
    • Stores local data, return addresses, used for parameter passing
    • Can have a stack overflow when too much of the stack is used. (mostly from inifinite (or too much) recursion, very large allocations)
    • Data created on the stack can be used without pointers.
    • You would use the stack if you know exactly how much data you need to allocate before compile time and it is not too big.
    • Usually has a maximum size already determined when your program starts

    Heap:

    • Stored in computer RAM like the stack.
    • Variables on the heap must be destroyed manually and never fall out of scope. The data is freed with delete, delete[] or free
    • Slower to allocate in comparison to variables on the stack.
    • Used on demand to allocate a block of data for use by the program.
    • Can have fragmentation when there are a lot of allocations and deallocations
    • In C++ data created on the heap will be pointed to by pointers and allocated with new or malloc
    • Can have allocation failures if too big of a buffer is requested to be allocated.
    • You would use the heap if you don’t know exactly how much data you will need at runtime or if you need to allocate a lot of data.
    • Responsible for memory leaks

The most important point is that heap and stack are generic terms for ways in which memory can be allocated. They can be implemented in many different ways, and the terms apply to the basic concepts.

  • In a stack of items, items sit one on top of the other in the order they were placed there, and you can only remove the top one(without toppling the whole thing over).Stack like a stack of papers
  • In a heap, there is no particular order to the way items are placed. You can reach in and remove items in any order because there is no clear ‘top’ item.Heap like a heap of licorice allsorts

It does a fairly good job of describing the two ways of allocating and freeing memory in a stack and a heap. Yum!

Read More

Advertisements

KMP 算法

KMP算法用于字符串的模式匹配,也就是查找某个字符串里是否存在某个子串,并返回位置,也就是Java/C++里的String.indexOf()方法。  字符串匹配最朴素的算法就是一个一个去比较,出错了就回溯到下一个起点继续比较。这样的复杂度有O(mn),即原字符串×匹配串。而KMP算法能把时间复杂度降到O(m+n)。KMP这个缩写来自于Knuth,Morris和Pratt,瞬间想到了AVL树。

前面提到,朴素的匹配算法在出错后需要回溯到原来起点。但KMP不需要回溯到起点,直接从出错的位置继续进行比较,从而达到了O(n)。之所以不需要回溯,KMP的秘诀在于其用匹配串(Pattern)计算出来的一个数组F。每当进行字符串的比较卡壳在匹配串的第j个位置时(即Target[i] != Pattern[j]),查阅这个数组F,就可以知道下一次比较时j的值。举个例子:

  • Source串(Target):              a b a b a b a b a c b
  • 匹配串(Pattern):                a b a b a c b
  • 根据Pattern得出的数组F:  0 0 0 1 2 3 0  (P怎么计算的后面会讲)  数字代表如果这次mismatch需要回到退几位

一一比较到第五个时前面的字符串都相等,到第六个时,由于Target[5] != Pattern[5],所以需要知道下一次比较时,新的j值。j值越大,省略的比较越多。对于朴素的匹配算法,一旦卡壳在位置j,则下一次直接把j置0,即从第一个开始比较,并且回到这次比较的起点。而KMP则是得到新的j值(新的j值显然小于当前j值),并且不回到比较的起点,而是从这个地方继续比较下去,不吃回头草。

由于P只需要根据Pattern生成一次,所以KMP特别适用于多个Target字符串,单个Pattern串的情况。

/** * Knuth-Morris-Pratt Algorithm for Pattern Matching */
public class KMPMatch {

  private String text;
  private String pattern;
  private int[] failure;
  private int matchPoint;

  ...
   /** * Finds the first occurrence of the pattern in the text. */
  public boolean match() {
    int j = 0;
    if (text.length() == 0) return false;

    for (int i = 0; i < text.length(); i++) {
     // p1: only execute when previous match is found and mismatch happend
      while (j > 0 && pattern.charAt(j) != text.charAt(i)) {
        j = failure[j - 1];
      }
      //p2: this p2 always execute before p1 
      if (pattern.charAt(j) == text.charAt(i)) { j++; }
      if (j == pattern.length()) {
        matchPoint = i - pattern.length() + 1;
        return true;
      }
    }
    return false;
  }
   /** * Computes the failure function using a boot-strapping process, * where the pattern is matched against itself. */
  private void computeFailure() {
    int j = 0;
    for (int i = 1; i < pattern.length(); i++) {
      while (j > 0 && pattern.charAt(j) != pattern.charAt(i)) { 
        j = failure[j - 1];
      }
      if (pattern.charAt(j) == pattern.charAt(i)) { 
        j++; 
      }
      failure[i] = j;
    }
  }
  ...
}

Minimal Number of Coins for Change

Problem: Please implement a function which gets the minimal number of coins, whose value is v1, v2, …, vn, to make change for an amount of money with value t. Any coin with value vi may duplicate for any times to make change.
For example, the minimal number of coins to make change for 15 out of a set of coins with value 1, 3, 9, 10 is 3. We can choose two coins with value 3 and a coin with value 9. The number of coins for other choices should be greater than 3.
Analysis: Firstly let us define a function f(t) which is the minimal number of coins to make change for total value t. If there are n different coins, we have n choices to make change for value t: we can add a coin with value v1 into a set of coins whose total value is t-v1. The minimal number of coins to get value t-v1 is f(t-v1). Similarly, we can add a coin with value v2 into a set of coins whose total value is t-v2. The minimal number of coins to get value t-v2 is f(t-v2)…
Therefore, we divide a problem to calculate f(t) into n sub-problems: f(t-v1), f(t-v2), …, f(t-vn). We can get a formal equation for f(t) as the following accordingly:
This equation can be implemented with recursion easily. However, the recursive solution will cause serious performance issues since there are overlaps when we divide this problem into n sub-problems. A better solution is to utilize iteration, and store the result of sub-problems into a table (as the Table 1).
In the Table 1, each column except the first one is to denote the number of coins to make change for a specific value. We can calculate the numbers in the Table 1 from left to right, to simulate the iterative process to get the result of f(15).
For instance, there are two numbers 4 and 2 under the column title “6”. We have two alternatives to make change for 6: the first one is to add a coin with value 1 to a set of coins whose total value is 5. Since the minimal number of coins to get value 5 is 3 (highlighted number under the column tile “5”), the number in the first cell under column title “6” is 4 (4=3+1). The second choice is to add a coin with value 3 to a set of coins whose total value is 3. Since the minimal number of coins to get value 3 is 1 (highlighted number under the column tile “3”), the number in the second cell under column title “6” is 2 (2=1+1). We highlight the number 2 in the column under tile 6 because 2 is less than 4.
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
0
1
2
3
2
3
4
3
4
5
2
2
3
3
4
5
3
0
1
2
3
2
3
4
3
4
5
2
2
3
3
9
0
1
2
3
2
3
4
3
10
0
1
2
3
2
3
4
Table 1: The iterative process to calculate the minimal number of coins to make changes for 15.
Even though we have a 2-D matrix to show the iterative process, it only requires a 1-D array for coding, because it is only necessary to store the minimal number of coins to make change for each total value. The sample code is shown below:
int GetMinCount(int total, int coins[], int length)
{
    int* counts = new int[total + 1];
    counts[0] = 0;
    const int MAX = 0x7FFFFFFF;
    for(int i = 1; i <= total; ++ i)
    {
        int count = MAX;
        for(int j = 0; j < length; ++ j)
        {
            if(i – coins[j] >= 0 && count > counts[i – coins[j]])
                count = counts[i – coins[j]];
        }
        if(count < MAX)
            counts[i] = count + 1;
        else
            counts[i] = MAX;
    }
    int minCount = counts[total];
    delete[] counts;
    return minCount;

}

Maximum Sum of All Sub-arrays

Question: A sub-array has one number of some continuous numbers. Given an integer array with positive numbers and negative numbers, get the maximum sum of all sub-arrays. Time complexity should be O(n).
For example, in the array {1, -2, 3, 10, -4, 7, 2, -5}, its sub-array {3, 10, -4, 7, 2} has the maximum sum 18.
Solution : Analyze arrays number by number
store 1st element as the largest, add the 2nd, if sum is smaller the the current number, discard the current largest and use the current number.
class MaxSub{
 public int findMaxSub(int[] a){
 if(a== null || a.length == 0) return new ImporperInputException();
 if(a.length == 1) return a[0];
 int tempMax=a[0], startIndex=0, endIndex=0; sum=a[0];
 for(int i=1; i<a.length; i++){
 sum=a[i]+tempMax;
 if(a[i] >= tempMax){
 tempMax=a[i]
 startIndex = i;
 endIndex = i;
 sum=a[i];
} else{
 if(sum>tempMax){
 tempMax=sum; endIndex=i;}
}
}
 return tempMax;
}
}

The Least k Numbers

Question: Please find out the least k numbers out of n numbers. For example, if given the 8 numbers 4, 5, 1, 6, 2, 7, 3 and 8, please return the least 4 numbers 1, 2, 3 and 4.
Analysis: The naïve solution is sort the n input numbers increasingly, and the least k numbers should be the first k numbers. Since it needs to sort, its time complexity is O(nlogn). Interviewers will ask us explore more efficient solutions.

1.    use a k data container, and take first k numbers in. then compare the max in the container with the coming numbers, if coming number is smaller then replace. otherwise, next.

 if we use binary search tree, find&insert&delete would be logk, , the complexity would be O(nlogk).
 If max heap, O(1) to get max, but logk to insert and delete.

2. use the partition function which returns the index of pivot in quick sort.

 until we get the index = k-1, we have all the k-1 numbers in the left which are smaller than the pivot(kth number).
void GetLeastNumbers(int[] input, int n, int[] output, int k)
{
    if(input == NULL || output == NULL || k > n || n <= 0 || k <= 0)
        return;
    int start = 0;
    int end = n – 1;
    int index = Partition(input, n, start, end);
    while(index != k – 1)
    {
        if(index > k – 1)
        {
            end = index – 1;
            index = Partition(input, n, start, end);
        }
        else
        {
            start = index + 1;
            index = Partition(input, n, start, end);
        }
    }
    for(int i = 0; i < k; ++i)
        output[i] = input[i];
}

Post-order Traversal Sequences of Binary Search Trees

Problem: Determine whether an input array is a post-order traversal sequence of a binary tree or not. If it is, return true; otherwise return false. Assume all numbers in an input array are unique.
For example, if the input array is {5, 7, 6, 9, 11, 10, 8}, true should be returned, since it is a post-order traversal sequence of the binary search tree in Figure 1. If the input array is {7, 4, 6, 5}, false should be returned since there are no binary search trees whose post-order traversal sequence is such an array.
Analysis: The last number is a post-order traversal sequence is the value of root node. Other numbers in a sequence can be partitioned into two parts: The left numbers, which are less than the value of root node, are value of nodes in left sub-tree; the following numbers, which are greater than the value of root node, are value of nodes in right sub-tree.
every left and right part need to be “true”.
boolean VerifySquenceOfBST(int sequence[], int length)
{
    if(sequence == NULL || length <= 0)
        return false;
    int root = sequence[length – 1];
    // nodes in left sub-tree are less than root node
    int i = 0;
    for(; i < length – 1; ++ i)
    {
        if(sequence[i] > root)
            break;
    }
    // nodes in right sub-tree are greater than root node
    int j = i;
    for(; j < length – 1; ++ j)
    {
        if(sequence[j] < root)
            return false;
    }
    // Is left sub-tree a binary search tree?
    bool left = true;
    if(i > 0)
        left = VerifySquenceOfBST(sequence, i);
    // Is right sub-tree a binary search tree?
    bool right = true;
    if(i < length – 1)
        right = VerifySquenceOfBST(sequence + i, length – i – 1);
    return (left && right);

}