Subset Sum algorithm

It seems I am late to the party, here are my two cents. We will create a boolean[] solution[n+1][k+1] such that solution[i][j] is true if using first i items (index 0 to i-1) we can get sum j from the set; else false. We will return solution[k][n] finally:

We can deduce the following points:

  1. if sum is zero then always a possible answer (empty set) for any number of elements. So all true.
  2. if set is empty we cannot have any subset hence no way to get any K. So never a possible answer. All false.
  3. if a subset X1 (subset of X without last element in X) has a subset-sum for k then X also has it which is X1. E.g. for X1={1,3,5} and k=8, if X1 has a subset-sum then X={1,3,5,7} also has a subset-sum
  4. For i/p set X = {1,3,5,7,19} and k=20, if X wants to know possibility of subset-sum for 20 then it does if x1={1,3,5,7} can have a subset-sum of 20-19 i.e. 1. It applies only if k >= 19 i.e. last element in X.

Based on the above points we can easily write the algorithm as below.

public class SubSetSum {
    boolean[][] solution; 
    int[] input;
    int k;

    public SubSetSum(int[] input, int targetSum) {
        this.input = input;
        this.k = targetSum;
        this.solution = new boolean[input.length+1][k+1];
    }

    public boolean subsetSum() {
        int n = input.length;

        for (int i = 0; i <= n; i++) {     //case 1
            solution[i][0] = true;
        }

        for (int j = 0; j <= k; j++) {    // case 2
            solution[0][j] = false;
        }

        for (int i = 1; i <= n; i++) {                  // n times
            for (int j = 1; j <= k; j++) {              // k times and time complexity O(n*k)
                if(solution[i-1][j]) {
                    solution[i][j] = solution[i-1][j];      // case 3
                    continue;
                }
                if(j >= input[i-1])  {                       // case 4
                    solution[i][j] = solution[i-1][j-input[i-1]];
                }
            }
        }
        return solution[n][k];
    }
}

Subset Sum is the first NP-complete problem I learned at Macalester. This question is viewed 36000+ times yet I don't see a sufficient answer that explains the algorithm in detail with logic. So I thought I make an attempt to do so.

Assumption:

For the sake of simplicity first I made the assumption that the input set X contains only positive integers and k is positive. However, we can tweak the algorithm to handle negative integers and the case if k is negative.

Logic:

The key to this algorithm or really any DP problem is to break down the problem and start simply from a base case. then we can build on the base case using some knowledge that we know:

  1. we know that if the set X is empty then there is no way we can sum to any value of k.
  2. If a set X contains k then it has a subset sum to k.
  3. we know that if a subset of set x1 who is a subset of X sum to k1 then X will have a subset that sum to k1 namely x1.
  4. we have a set X = {x1, x1, x3, ......., xn, xn+1}. We know it has a subset sum to k1 if x1 = {x1, x1, x3, ......., xn} has a subset sum to k - k1.

Example to illustrate 1,2,3,4:

  1. it is easy. if you have an empty set {}. you can't have a subset thus you can't have any subset sum.
  2. A set X = {4} has a subset sum to 4 because 4 it self is part of the set

  3. say you have a set x1 = {1,3,5} who is a subset of set X = {1,3,5,2,8}. if x1 has a subset sum to k1 = 8 then that means X also has a subset sum to 8 because x1 is a subset of X

  4. say you have a set X = {1,3,5,2,19} and we want to know does it have a subset sum to 20. It does and one way to can know if that is x1 = {1,3,5,2} can sum to (20 - 19) = 1. Since x1 has a subset sum to 1 then when we add 19 to the set x1 we can take that new number 1 + 19 = 20 to create our desired sum 20.

Dynamically build a matrix Cool! now let us utilize the above four logics and start building from the base case. We are going to build a matrix m. We define:

  • matrix m has i+1 rows and k + 1 columns.

  • Each cell of the matrix has value true or false.

  • m[i][s] returns true or false to indicate the answer to this question: "using the first i items in the array can we find a subset sum to s? " m[i][s]returns true for yes and false for no

(note the Wikipedia answer or most of the people build a function m(i,s) but I thought the matrix is an easy way to understand dynamic programming. It works well when we only have positive numbers in the set or array. However the function route is better because you don't have to deal with index out of range, match index of the array and sum to the matrix.....)

Let us build the matrix using an example:

X = {1,3,5,2,8}
k = 9

We are going to build the matrix row by row. We ultimately want to know the cell m[n][k] contain true or false.

First Row: Logic 1. told us that the first row of the matrix should all be false.

   0 1 2 3 4 5 6 7 8 9
   _ _ _ _ _ _ _ _ _ _
0| F F F F F F F F F F
1|
2|
3|
4|
5|

Second Row and above: Then for the second row or above, we can use logic 2,3,4 to help us populate the matrix.

  • logic 2 tells us that m[i][s] = (X[i-1] == s) rememebr m[i] is referring to the ith item in X which is X[i-1]
  • logic 3 tells us that m[i][s] = (m[i-1][s]) this is looking at the cell direclty above.
  • logic 4 tells us that m[i][s] = (m[i-1][s - X[i-1]]) this is looking at the row above and left of X[i-1] cells.

If any of these is true then m[i][s] is true otherwise false. so we can rewrite 2,3,4 into m[i][s] = (m[i-1][s] || a[i-1] == s || m[i-1][s - a[i-1]])

Use these above logics to populate the matrix m. In our example, it looks like this.

   0 1 2 3 4 5 6 7 8 9
   _ _ _ _ _ _ _ _ _ _
0| F F F F F F F F F F
1| F T F F F F F F F F
2| F T F T T F F F F F 
3| F T F T T T T F T T
4| F T T T T T T T T T 
5| F T T T T T T T T T

Now use the matrix to answer your question:

look at m[5][9] which is the original question. using the first 5 items (which is all the items) can we find a subset sum to 9 (k)? and the answer is indicated by that cell which is true

Here is the Code:

import java.util.*;

public class SubSetSum {

    public static boolean subSetSum(int[] a, int k){

        if(a == null){
            return false;
        }

        //n items in the list
        int n = a.length; 
        //create matrix m
        boolean[][] m = new boolean[n + 1][k + 1]; //n + 1 to include 0, k + 1 to include 0 

        //set first row of matrix to false. This also prevent array index out of bounds: -1
        for(int s = 0; s <= k; s++){
            m[0][s] = false;
        }

        //populate matrix m
        for(int i = 1; i <= n; i++){
            for(int s = 0; s <= k; s++){    
                if(s - a[i-1] >= 0){ //when it goes left we don't want it to go out of bounds. (logic 4)
                    m[i][s] = (m[i-1][s] || a[i-1] == s || m[i-1][s - a[i-1]]); 
                } else {
                    m[i][s] = (m[i-1][s] || a[i-1] == s);
                }       

            }
        }

        //print matrix
        print(m);

        return m[n][k];

    }

    private static void print(boolean[][] m){
        for(int i = 0; i < m.length; i++){
            for(int j = 0; j < m[i].length; j++){
                if(m[i][j]){
                    System.out.print("T");
                } else {
                    System.out.print("F");
                }           
            }
            System.out.print("\n");
        }
    }

    public static void main(String[] args){
        int[] array = {1,3,5,2,8};
        int k = 9;

        System.out.println(subSetSum(array,k));

    }
}

To build the matrix m takes O((n+1)(k+1)) which is O(nk). it seems like it should be polynomial but it is not! It's actually pseudo polynomial. Read about it here

Again this only works if the input only contains positive numbers. You can easily tweak it to work with negative numbers tho. The matrix would still have n+1 rows but B - A + 1 columns. Where B is the upperbound and A is the lowerbound ( +1 to include zero).The matrix would still be You would have to offset s with the lowerbound.

It is pretty hard to explain DP problem over text from start to end. But I hope this helps those out there that try to understand this problem.

Note that in the examples above the rows of the DP table are sorted. That does not have to be the case.

Here is a DP table for the case of the question, i.e. given a set of {5, 3, 11, 8, 2}. For brevity, I have omitted the false values.

┌─────────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┐
│ (index) │  0   │  2   │  3   │  5   │  7   │  8   │  10  │  11  │  13  │  14  │  15  │  16  │
├─────────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┼──────┤
│    0    │ true │      │      │      │      │      │      │      │      │      │      │      │
│    5    │ true │      │      │ true │      │      │      │      │      │      │      │      │
│    3    │ true │      │ true │ true │      │ true │      │      │      │      │      │      │
│    11   │ true │      │ true │ true │      │ true │      │ true │      │ true │      │ true │
│    8    │ true │      │ true │ true │      │ true │      │ true │ true │ true │      │ true │
│    2    │ true │ true │ true │ true │ true │ true │ true │ true │ true │ true │ true │ true │
└─────────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┘

Below is an implementation in JavaScript which will output the target set {5, 11}:

var subSetSum = function(input, sum) {

    let y = input.length;
    let x = sum;

    if(input.length === 0) return 0;

    let d = [];

    //fill the rows
    for (let i = 0; i <= y; i++) {
      d[i] = [];
      d[i][0] = true;
    }
    
    for (let j = 1; j <= y; j++) { //j row
      for (let i = 1; i <= x; i++) { //i column
      let num = input[j-1];
        if(num === i) {
          d[j][i] = true;
        } else if(d[j-1][i]) {
          d[j][i] = true;
        } else if (d[j-1][i-num]) {
          d[j][i] = true;
        }
      }
    }
    
    //console.table(d); //uncomment to see the table
    if(!d[y][x]) return null;

    let searchedSet = [];
    for(let j=input.length, i=sum; j>0 && i != 0; j--) {
      if(input[j-1] !== i) {
        while(d[j-1][i]) { // go up
          j--;
        }
      }
      searchedSet.push(input[j-1]);
      i = i-input[j-1];
    }

    return searchedSet;
};

console.log('searched set:'+ JSON.stringify(subSetSum([5, 3, 11, 8, 2], 16)));

Since it looks like all your numbers are positive, you can solve this using dynamic programming:

Start will a boolean array possible of size K+1 with the first value true, the rest false. The ith value will represent whether a subset sum of i is possible to achieve. For each number n in your set, loop through the possible array and if the ith value is true, then set the i+nth value to true as well.

At the end, if the kth value in possible is true, then you can form a subset sum of k. Problem solved in O(NK) time.

Wikipedia's page on the subset sum problem has a detailed explanation of this algorithm applied to sets of integers not guaranteed to be positive.


I'd suggest to read Wiki's algorithm. The algorithm exists there, see Pseudo-polynomial time dynamic programming solution for the O(P*n) solution, The solution is not polynomial time, is polynomial in (p,n) but it's not polynomial in n+log P (size of input) and because P can be very large like 2^n, the solution P*n = (2^n)*n is not a polynomial time solution in general, but when p is bounded by some polynomial function of n is polynomial time algorithm.

This problem is NPC, but there is a Pseudo polynomial timealgorithm for it, and belongs to weakly NP-Complete problems, Also there are Strongly NP-Complete problems, which means, you can't find any pseudo polynomial time algorithm for them unless P=NP, and this problem is not in this range of problems, So somehow is easy.

I said this as simple as possible,but it's not an exact definition of a Strongly NP-Complete or Weakly NP-Complete problems.

For detail see Garey and Johnson chapter 4.