The subset sum problem with SQL
This is my favourite!
What is the subset sum problem? Find a fun explanation here:
https://xkcd.com/287
And a boring one here:
https://en.wikipedia.org/wiki/Subset_sum_problem
Essentially, for each of these totals…
| ID | TOTAL | |----|-------| | 1 | 25150 | | 2 | 19800 | | 3 | 27511 |
… we want to find the “best” (i.e. the closest) sum possible, consisting of any combination of these items:
| ID | ITEM | |------|-------| | 1 | 7120 | | 2 | 8150 | | 3 | 8255 | | 4 | 9051 | | 5 | 1220 | | 6 | 12515 | | 7 | 13555 | | 8 | 5221 | | 9 | 812 | | 10 | 6562 |
As you’re all quick with your mental mathemagic processing, you have immediately calculated these to be the best sums:
| TOTAL | BEST | CALCULATION |-------|-------|-------------------------------- | 25150 | 25133 | 7120 + 8150 + 9051 + 812 | 19800 | 19768 | 1220 + 12515 + 5221 + 812 | 27511 | 27488 | 8150 + 8255 + 9051 + 1220 + 812
How to do it with SQL? Easy. Just create a CTE that contains all the 2n*possible* sums and then find the closest one for each TOTAL
:
1
2
3
4
5
6
7
8
9
|
-- All the possible 2N sums WITH sums( sum , max_id, calc) AS (...)
-- Find the best sum per “TOTAL” SELECT totals.total,
something_something(total - sum ) AS best,
something_something(total - sum ) AS calc
FROM draw_the_rest_of_the_*bleep*_owl
|
As you’re reading this, you might be like my friend here:
But don’t worry, the solution is – again – not all that hard (although it doesn’t perform due to the nature of the algorithm):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
WITH sums( sum , id, calc) AS (
SELECT item, id, to_char(item) FROM items
UNION ALL
SELECT item + sum , items.id, calc || ' + ' || item
FROM sums JOIN items ON sums.id < items.id
) SELECT totals.id,
totals.total,
min ( sum ) KEEP (
DENSE_RANK FIRST ORDER BY abs (total - sum )
) AS best,
min (calc) KEEP (
DENSE_RANK FIRST ORDER BY abs (total - sum )
) AS calc,
FROM totals
CROSS JOIN sums
GROUP BY totals.id, totals.total
|
In this article, I won’t explain the details of this solution, because the example has been taken from a previous article that you can find here:
How to Find the Closest Subset Sum with SQL
Enjoy reading the details, but be sure to come back here for the remaining 4 tricks:
- ۹۵/۰۳/۱۱