How to divide a set into two subsets such that difference between the sum of numbers in two sets is minimal?

The decision version of the problem you are describing is an NP-complete problem and it is called the partition problem. There are a number of approximations which provide, in many cases, optimal or, at least, good enough solutions.

The simple algorithm you described is a way playground kids would pick teams. This greedy algorithm performs remarkably well if the numbers in the set are of similar orders of magnitude.

The article The Easiest Hardest Problem, by American Scientist, gives an excellent analysis of the problem. You should go through and read it!


No, that doesn't work. There is no polynomial time solution (unless P=NP). The best you can do is just look at all different subsets. Have a look at the subset sum problem.

Consider the list [0, 1, 5, 6]. You will claim {0, 5} and {1, 6}, when the best answer is actually {0, 1, 5} and {6}.


No, Your algorithm is wrong. Your algo follows a greedy approach. I implemented your approach and it failed over this test case: (You may try here)

A greedy algorithm:

#include<bits/stdc++.h>
#define rep(i,_n) for(int i=0;i<_n;i++)
using namespace std;

#define MXN 55
int a[MXN];

int main() {
    //code
    int t,n,c;
    cin>>t;
    while(t--){
        cin>>n;
        rep(i,n) cin>>a[i];
        sort(a, a+n);
        reverse(a, a+n);
        ll sum1 = 0, sum2 = 0;
        rep(i,n){
            cout<<a[i]<<endl;
            if(sum1<=sum2) 
                sum1 += a[i]; 
            else 
                sum2 += a[i]; 
        }
        cout<<abs(sum1-sum2)<<endl;
    }
    return 0;
}

Test case:

1
8 
16 14 13 13 12 10 9 3

Wrong Ans: 6
16 13 10 9
14 13 12 3

Correct Ans: 0
16 13 13 3
14 12 10 9

The reason greedy algorithm fails is that it does not consider cases when taking a larger element in current larger sum set and later a much smaller in the larger sum set may result much better results. It always try to minimize current difference without exploring or knowing further possibilities, while in a correct solution you might include an element in a larger set and include a much smaller element later to compensate this difference, same as in above test case.

Correct Solution:

To understand the solution, you will need to understand all below problems in order:

  • 0/1 Knapsack with Dynamic Programming
  • Partition Equal Subset Sum with DP
  • Solution

My Code (Same logic as this):

#include<bits/stdc++.h>
#define rep(i,_n) for(int i=0;i<_n;i++)
using namespace std;

#define MXN 55
int arr[MXN];
int dp[MXN][MXN*MXN];

int main() {
    //code
    int t,N,c;
    cin>>t;
    while(t--){
        rep(i,MXN) fill(dp[i], dp[i]+MXN*MXN, 0);

        cin>>N;
        rep(i,N) cin>>arr[i];
        int sum = accumulate(arr, arr+N, 0);
        dp[0][0] = 1;
        for(int i=1; i<=N; i++)
            for(int j=sum; j>=0; j--)
                dp[i][j] |= (dp[i-1][j] | (j>=arr[i-1] ? dp[i-1][j-arr[i-1]] : 0));

        int res = sum;

        for(int i=0; i<=sum/2; i++)
            if(dp[N][i]) res = min(res, abs(i - (sum-i)));

        cout<<res<<endl;
    }
    return 0;
}