Prime containment numbers (speed edition)
C++ (GCC) + x86 assembly, score 32 36 62 in 259 seconds (official)
Results computed so far. My computer runs out of memory after 65
.
1 2
2 23
3 235
4 2357
5 112357
6 113257
7 1131725
8 113171925
9 1131719235
10 113171923295
11 113171923295
12 1131719237295
13 11317237294195
14 1131723294194375
15 113172329419437475
16 1131723294194347537
17 113172329419434753759
18 2311329417434753759619
19 231132941743475375961967
20 2311294134347175375961967
21 23112941343471735375961967
22 231129413434717353759619679
23 23112941343471735359619678379
24 2311294134347173535961967837989
25 23112941343471735359619678378979
26 2310112941343471735359619678378979
27 231010329411343471735359619678378979
28 101031071132329417343475359619678378979
29 101031071091132329417343475359619678378979
30 101031071091132329417343475359619678378979
31 101031071091131272329417343475359619678378979
32 101031071091131272329417343475359619678378979
33 10103107109113127137232941734347535961967838979
34 10103107109113127137139232941734347535961967838979
35 10103107109113127137139149232941734347535961967838979
36 1010310710911312713713914923294151734347535961967838979
37 1010310710911312713713914915157232941734347535961967838979
38 1010310710911312713713914915157163232941734347535961967838979
39 10103107109113127137139149151571631672329417343475359619798389
40 10103107109113127137139149151571631672329417343475359619798389
41 1010310710911312713713914915157163167173232941794347535961978389
42 101031071091131271371391491515716316717323294179434753596181978389
43 101031071091131271371391491515716316723294173434753596181917978389
44 101031071091131271371391491515716316717323294179434753596181919383897
45 10103107109113127137139149151571631671731792329418191934347535961978389
46 10103107109113127137139149151571631671731791819193232941974347535961998389
47 101031071091271313714915157163167173179181919321139232941974347535961998389
48 1010310710912713137149151571631671731791819193211392232941974347535961998389
49 1010310710912713137149151571631671731791819193211392232272941974347535961998389
50 10103107109127131371491515716316717317918191932113922322722941974347535961998389
51 101031071091271313714915157163167173179181919321139223322722941974347535961998389
52 101031071091271313714915157163167173179181919321139223322722923941974347535961998389
53 1010310710912713137149151571631671731791819193211392233227229239241974347535961998389
54 101031071091271313714915157163167173179211392233227229239241819193251974347535961998389
55 101031071091271313714915157163167173179211392233227229239241819193251972574347535961998389
56 101031071091271313714915157163167173179211392233227229239241819193251972572634347535961998389
57 101031071091271313714915157163167173179211392233227229239241819193251972572632694347535961998389
58 101031071091271313714915157163167173179211392233227229239241819193251972572632694347535961998389
59 1010310710912713137149151571631671731792113922332277229239241819193251972572632694347535961998389
60 101031071091271313714915157163167173211392233227722923924179251819193257263269281974347535961998389
61 1010310710912713137149151571631671732113922332277229239241792518191932572632692819728343475359619989
62 10103107109127131371491515716316717321139223322772293239241792518191932572632692819728343475359619989
63 1010307107109127131371491515716316717321139223322772293239241792518191932572632692819728343475359619989
64 10103071071091271311371391491515716316721173223322772293239241792518191932572632692819728343475359619989
65 10103071071091271311371491515716313916721173223322772293239241792518191932572632692819728343475359619989
These all agree with the output of the Concorde-based solver, so they stand a good chance of being correct.
Changelog:
Wrong calculation for needed context length. The earlier version was 1 too large, and also had a bug. Score:
3234Added equal-context-group optimisation. Score:
3436Overhauled the algorithm to use context-free strings properly, plus some other optimizations. Score:
3662Added a proper write-up.
Added the prime numbers variant.
How it works
Warning: this is a brain dump. Scroll to the end if you just want the code.
Abbreviations:
- PCN: prime containment number problem i.e. this challenge
- SCS: shortest common superstring
- TSP: travelling salesman problem
This program basically uses the textbook dynamic programming algorithm for the TSP.
- Plus a reduction from PCN/SCS, the problem we're actually solving, to TSP.
- Plus using item contexts instead of all digits in each item.
- Plus subdividing the problem based on primes that cannot overlap with the ends of other primes.
- Plus merging calculations for primes with the same start/end digits.
- Plus precomputed lookup tables and a custom hash table.
- Plus some low-level prefetching and bit-packing.
That's a lot of potential bugs. After playing around with anselm's entry and failing to coax any wrong results from it, I should at least prove that my overall approach is correct.
Although the Concorde-based solution is (much, much) faster, it is based on the same reduction, so this explanation applies to both. Additionally, this solution can be adapted for OEIS A054260, the prime-containing primes sequence; I don't know how to solve that efficiently in the TSP framework. So it's still somewhat relevant.
TSP reduction
Let's start by actually proving that reducing to TSP is correct. We have a set of strings, say
A = 13, 31, 37, 113, 137, 211
and we want to find the smallest superstring that contains these items.
Knowing the length is enough
For the PCN, if there are multiple shortest strings, we have to return the lexicographically smallest one. But we will look at a different (and easier) problem.
- SCS: Given an initial prefix and a set of items, find any shortest string that contains all the items as substrings, and starts with that prefix.
- SCS-Length: Just find the length of the SCS.
If we can solve SCS-Length, we can reconstruct the smallest solution and obtain the PCN. If we know that the smallest solution starts with our prefix, we try to extend it by appending each item, in lexicographic order, and solving for the length again. When we find the smallest item for which the solution length is the same, we know that this must be the next item in the smallest solution (why?), so add it and recurse on the remaining items. This method of reaching the solution is called self-reduction.
Touring the maximal-overlap graph
Suppose we started solving SCS for the above example by hand. We would probably:
- Get rid of
13
and37
, because they're already substrings of the other items. Any solution that contains137
, for example, must also contain13
and37
. - Start considering the combinations
113,137 → 1137
,211,113 → 2113
, etc.
This is in fact the right thing to do, but let's prove it for the sake of completeness. Take any SCS solution; for example, a shortest superstring for A
is
2113137
and it can be decomposed into a concatenation of all items in A
:
211
113
31
137
(We ignore the redundant items 13, 37
.) Observe that:
- The start and end positions of each item increase by at least 1.
- Each item is overlapped with the previous item to the greatest extent possible.
We will show that every shortest superstring can be decomposed this way:
For every pair of adjacent items
x,y
,y
starts and ends at later positions thanx
. If this is not true, then eitherx
is a substring ofy
or vice versa. But we already removed all items which are substrings, so that cannot happen.Suppose adjacent items in the sequence have less-than-maximal overlap, e.g.
21113
instead of2113
. But that would make the extra1
redundant. No later item needs the initial1
(as in 21113), because it occurs earlier than113
, and all items that appear after113
cannot start with a digit before113
(see point 1). A similar argument prevents the last extra1
(as in 21113) being used by any item before211
. But our shortest superstring, by definition, won't have redundant digits, so such non-maximal overlaps will not occur.
With these properties, we can convert any SCS problem into a TSP:
- Remove all items which are substrings of other items.
- Create a directed graph which has one vertex for each item.
- For each pair of items
x
,y
, add an edge fromx
toy
whose weight is the number of extra symbols added by appendingy
tox
with maximal overlap. For example, we would add an edge from211
to113
with weight 1, because2113
adds one more digit over211
. Repeat for the edge fromy
tox
. - Add a vertex for the initial prefix, and edges from it to all the other items.
Any path on this graph, from the initial prefix, corresponds to a maximal-overlap concatenation of all items on that path, and the total weight of the path equals the concatenated string length. Therefore, every lowest-weight tour, that visits all the items at least once, corresponds to a shortest superstring.
And that's the reduction from SCS (and SCS-Length) to TSP.
Dynamic programming algorithm
This is a classic algorithm, but we will modify it quite a bit, so here's a quick reminder.
(I've written this as an algorithm for SCS-Length instead of TSP. They're essentially equivalent, but the SCS vocabulary helps when we get to the SCS-specific optimizations.)
Call the set of input items A
and the given prefix P
. For every k
-element subset S
in A
, and every element e
of S
, we calculate the length of the shortest string that starts with P
, contains all of S
, and ends with e
. This involves storing a table from values of (S, e)
to their SCS-Lengths.
When we get to each subset S
, the table needs to already contain the results for S - {e}
for all e
in S
. As the table can get quite large, I calculate the results for all k
-element subsets, then k+1
, etc. For this, we only need to store the results for k
and k+1
at any one time. This reduces memory usage by a factor of roughly sqrt(|A|)
.
One more detail: instead of calculating the minimum SCS-Length, I actually calculate the maximum total overlap between the items. (To get the SCS-Length, just subtract the total overlap from the sum of the items' lengths.) Using overlaps helps some of the following optimizations.
[2.] Item contexts
A context is the longest suffix of an item that can overlap with following items. If our items are 113,211,311
, then 11
is the context for 211
and 311
. (It's also the prefix context for 113
, which we'll look at in part [4.])
In the DP algorithm above, we kept track of SCS solutions that end with each item, but we don't actually care which item an SCS ends in. All we need to know is the context. Thus, for example, if two SCSs for the same set end in 23
and 43
, any SCS that continues from one will also work for the other.
This is a significant optimization, because non-trivial primes end in only the digits 1 3 7 9
. The four single-digit contexts 1,3,7,9
(plus the empty context) are in fact sufficient to compute the PCNs for primes up to 131
.
[3.] Context-free items
Others have already pointed out that many primes begin with the digits 2,4,5,6,8
, such as 23,29,41,43...
. None of these can overlap with a previous prime (aside from 2
and 5
, primes cannot end in these digits; 2
and 5
will already have been removed as redundant). In the code, these are referred to as context-free strings.
If our input has context-free items, every SCS solution can be split into blocks
<prefix>... 23... 29... 41... 43...
and the overlaps in each block are independent of the other blocks. We can shuffle the blocks or swap items between blocks that have the same context, without changing the SCS length.
Thus, we only need to keep track of the possible multisets of contexts, one for each block.
Full example: for the primes less than 100, we have 11 context-free items and their contexts:
23 29 41 43 47 53 59 61 67 83 89
3 9 1 3 7 3 9 1 7 3 9
Our initial multiset context:
1 1 3 3 3 3 7 7 9 9 9
The code refers to these as combined contexts, or ccontexts. Then, we only need to consider subsets of the remaining items:
11 13 17 19 31 37 71 73 79 97
[4.] Context merging
Once we get to primes with 3 digits or more, there are more redundancies:
101 151 181 191 ...
107 127 157 167 197 ...
109 149 1009 ...
These groups share the same starting and ending contexts (usually—it depends on which other primes are in the input), so they are indistinguishable when overlapping other items. We only care about overlaps, so we can treat primes in these equal-context groups as indistinguishable. Now our DP subsets are condensed into multisubsets
4 × 1_1
5 × 1_7
3 × 1_9
(This is also why the solver maximizes overlap length instead of minimizing SCS length: this optimization preserves overlap length.)
Summary: the high-level optimizations
Running with INFO
debug output will print statistics like
solve: N=43, N_search=26, ccontext_size=18, #contexts=7, #eq_context_groups=16
This particular line is for the SCS-Length of the first 62 primes, 2
to 293
.
- After removing redundant items, we're left with 43 primes that are not substrings of each other.
- There are 7 unique contexts:
1,3,7,11,13,27
plus the empty string. - 17 of the 43 primes are context-free:
43,47,53,59,61,89,211,223,227,229,241,251,257,263,269,281,283
. These and the given prefix (in this case, empty string) form the basis of the initial combined context. - In the remaining 26 items (
N_search
), there are 16 nontrivial equal-context groups.
By exploiting these structures, the SCS-Length calculation only needs to check 8498336 (multiset, ccontext)
combinations. Straightforward dynamic programming would take 43×2^43 > 3×10^14
steps, and brute forcing the permutations would take 6×10^52
steps. The program still needs to run SCS-Length several more times to reconstruct the PCN solution, but that doesn't take much longer.
[5., 6.] The low-level optimizations
Instead of doing string operations, the SCS-Length solver works with indices of items and contexts. I also precompute the overlap amount between each context and item pair.
The code initially used GCC's unordered_map
, which seems to be a hash table with linked list buckets and prime hash sizes (i.e. expensive divisions). So I wrote my own hash table with linear probing and power-of-two sizes. This nets a 3× speedup and 3× reduction in memory.
Each table state consists of a multiset of items, a combined context and an overlap count. These are packed into 128-bit entries: 8 for the overlap count, 56 for the multiset (as a bitset with run-length encoding), and 64 for the ccontext (1-delimited RLE). Encoding and decoding the ccontext was the trickiest part and I ended up using the new PDEP
instruction (it's so new, GCC doesn't have an intrinsic for it yet).
Finally, accessing a hash table is really slow when N
gets large, because the table doesn't fit in cache any more. But the only reason we write to the hash table, is to update the best known overlap count for each state. The program splits this step off into a prefetch queue, and the inner loop prefetches each table lookup a few iterations before actually updating that slot. Another 2× speedup on my computer.
Bonus: further improvements
AKA How is Concorde so fast?
I don't know much about TSP algorithms, so here is a rough guess.
Concorde uses the branch-and-cut method to solve TSPs.
- It encodes the TSP as an integer linear program
- It uses linear programming methods, as well as initial heuristics, to obtain lower and upper bounds on the optimal tour distance
- These bounds are then fed into a branch and bound recursive algorithm that searches for the optimal solution. Large parts of the search tree can be pruned, if the computed lower bound for a subtree exceeds a known upper bound
- It also searches for cutting planes to tighten the LP relaxation and obtain better bounds. Typically, these cuts encode knowledge of the fact that the decision variables must be integers
Obvious ideas we could try:
- Pruning in the SCS-Length solver, especially when reconstructing the PCN solution (at that point, we already know what the solution length is)
- Deriving some easy-to-calculate lower bounds for SCS, that can be used to help pruning
- Finding more symmetries or redundancies in the prime number distribution to exploit
However, the branch-and-cut combination is very powerful, so we might not be able to beat a state-of-the-art solver like Concorde, for large values of N
.
Bonus bonus: the prime containment primes
Unlike the Concorde-based solution, this program can be modified to find the smallest containing primes (OEIS A054260). This involves three changes:
When reconstructing the solution from SCS-Length, test whether the solution is also a prime number. If not, backtrack and try some other ordering. There are reasonably many prime numbers (\$1/\ln(n)\$ density), so this usually succeeds quickly. However, if the PCN happens to have a digit sum divisible by 3, then the PCN (and many of its sibling permutations) will be divisible by 3. To avoid getting stuck in this situation, we also…
Modify the SCS-Length solver code to categorize solutions based on whether their digit sums are divisible by 3. This involves adding another entry, the digit sum mod 3, to each DP state. This greatly reduces the odds of the main solver getting stuck with non-prime permutations. This is the change that I couldn't work out how to translate to TSP. It can be encoded with ILP, but then I'd have to learn about this thing called “subtour inequality” and how to generate those.
It may be that all of the shortest PCNs are divisible by 3. In that case, the smallest prime containment prime must be at least one digit longer than the PCN. If our SCS-Length solver detects this, the solution reconstruction code has the option to add one extra digit at any point in the process. It tries adding each possible digit
0..9
and each remaining item to the current solution prefix, in lexicographic order as before.
With these changes, I can obtain the solutions up to N=62
. Except for 47
, where the reconstruction code gets stuck and gives up after 1 million steps (I don't know why, yet). The prime containment primes are:
1 2
2 23
3 523
4 2357
5 112573
6 511327
7 1135217
8 1113251719
9 11171323519
10 113171952923
11 113171952923
12 11131951723729
13 11317237419529
14 1131723294375419
15 113172329541947437
16 1131723294195343747
17 1113172329419434753759
18 11231329417437475361959
19 231132941743475375967619
20 2311294134347175967619537
21 23112941343471735967619537
22 231129413434717359537679619
23 23112941343471735375961983679
24 11231294134347173535961967983789
25 23112941343471735359679837619789
26 2310112941343471735359619783789679
27 231010329411343471735359619678379897
28 101031071132329417343475359619798376789
29 101031071091132329417343475359619767898379
30 101031071091132329417343475359619767898379
31 1010310710911131272329417343475359619678979837
32 1010310710911131272329417343475359619678979837
33 10103107109113127137232941734347535978961967983
34 10103107109113127137139232941734347535961967838979
35 10103107109113127137139149232941734347535961976798389
36 1010310710911312713713914923294151734347535976198389679
37 1010310710911312713713914915157232941734347535967619798389
38 10103107109111312713713914915157163232941734347535967897961983
39 10103107109113127137139149151571631672329417343475961979838953
40 10103107109113127137139149151571631672329417343475961979838953
41 10103107109111312713713914915157163167173232941794347535976198983
42 1010310710911131271371391491515716316717323294179434761819535989783
43 1010310710911131271371391491515716316723294173434753596181917989783
44 101031071091131271371391491515716316717323294179434753836181919389597
45 10103107109113127137139149151571631671731792329418191934347538961975983
46 101031071091113127137139149151571631671731791819193232941974347535989836199
47 (failed)
48 1010310710912713137149151571631671731791819193211392232941974347895359836199
49 10103107109112713137149151571631671731791819193211392232272941974347619983535989
50 10103107109127131371491515716316717317918191932113922322722941974347595389836199
51 101031071091271313714915157163167173179181919321139223322722941974347595389619983
52 101031071091271313714915157163167173179181919321139223322722923941974347538361995989
53 10103107109112713137149151571631671731791819193211392233227229239241974347619983538959
54 101031071091271313714915157163167173179211392233227229239241819193251974347619953835989
55 1010310710911271313714915157163167173179211392233227229239241819193251974325747596199538983
56 101031071091271313714915157163167173179211392233227229239241819193251972572634347619959895383
57 101031071091271313714915157163167173179211392233227229239241819193251972572632694359538983619947
58 101031071091271313714915157163167173179211392233227229239241819193251972572632694359538983619947
59 1010310710912713137149151571631671731792113922332277229239241819193251972572632694347535983896199
60 1010310710911271313714915157163167173211392233227722923924179251819193257263269281974347535961998389
61 1010310710912713137149151571631671732113922332277229239241792518191932572632692819728343538947619959
62 10103107109127131371491515716316717321139223322772293239241792518191932572632692819728343534759896199
Code
Compile with
g++ -std=c++14 -O3 -march=native pcn.cpp -o pcn
For the prime number version, also link with GMPlib, e.g.
g++ -std=c++14 -O3 -march=native pcn-prime.cpp -o pcn-prime -lgmp -lgmpxx
This program uses the PDEP instruction, which is only available on recent (Haswell+) x86 processors. Both my computer and maxb's support it. If yours doesn't, the program will compile in a slow software version. A compile warning will be printed when this happens.
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <vector>
#include <unordered_map>
#include <string>
#include <algorithm>
#include <array>
using namespace std;
void debug_dummy(...) {
}
#ifndef INFO
//# define INFO(...) fprintf(stderr, __VA_ARGS__)
# define INFO debug_dummy
#endif
#ifndef DEBUG
//# define DEBUG(...) fprintf(stderr, __VA_ARGS__)
# define DEBUG debug_dummy
#endif
bool is_prime(size_t n)
{
for (size_t d = 2; d * d <= n; ++d) {
if (n % d == 0) {
return false;
}
}
return true;
}
// bitset, works for up to 64 strings
using bitset_t = uint64_t;
const size_t bitset_bits = 64;
// Find position of n-th set bit of x
uint64_t bit_select(uint64_t x, size_t n) {
#ifdef __BMI2__
// Bug: GCC doesn't seem to provide the _pdep_u64 intrinsic,
// despite what its manual claims. Neither does Clang!
//size_t r = _pdep_u64(ccontext_t(1) << new_context, ccontext1);
size_t r;
// NB: actual operand order is %2, %1 despite the intrinsic taking %1, %2
asm ("pdep %2, %1, %0"
: "=r" (r)
: "r" (uint64_t(1) << n), "r" (x)
);
return __builtin_ctzll(r);
#else
# warning "bit_select: no x86 BMI2 instruction set, falling back to slow code"
size_t k = 0, m = 0;
for (; m < 64; ++m) {
if (x & (uint64_t(1) << m)) {
if (k == n) {
break;
}
++k;
}
}
return m;
#endif
}
#ifndef likely
# define likely(x) __builtin_expect(x, 1)
#endif
#ifndef unlikely
# define unlikely(x) __builtin_expect(x, 0)
#endif
// Return the shortest string that begins with a and ends with b
string join_strings(string a, string b) {
for (size_t overlap = min(a.size(), b.size()); overlap > 0; --overlap) {
if (a.substr(a.size() - overlap) == b.substr(0, overlap)) {
return a + b.substr(overlap);
}
}
return a + b;
}
vector <string> dedup_items(string context0, vector <string> items)
{
vector <string> items2;
for (size_t i = 0; i < items.size(); ++i) {
bool dup = false;
if (context0.find(items[i]) != string::npos) {
dup = true;
} else {
for (size_t j = 0; j < items.size(); ++j) {
if (items[i] == items[j]?
i > j
: items[j].find(items[i]) != string::npos) {
dup = true;
break;
}
}
}
if (!dup) {
items2.push_back(items[i]);
}
}
return items2;
}
// Table entry used in main solver
const size_t solver_max_item_set = bitset_bits - 8;
struct Solver_entry
{
uint8_t score : 8;
bitset_t items : solver_max_item_set;
bitset_t context;
Solver_entry()
{
score = 0xff;
items = 0;
context = 0;
}
bool is_empty() const {
return score == 0xff;
}
};
// Simple hash table to avoid stdlib overhead
struct Solver_table
{
vector <Solver_entry> t;
size_t t_bits;
size_t size_;
size_t num_probes_;
Solver_table()
{
// 256 slots initially -- this needs to be not too small
// so that the load factor formula in update_score works
t_bits = 8;
size_ = 0;
num_probes_ = 0;
resize(t_bits);
}
static size_t entry_hash(bitset_t items, bitset_t context)
{
uint64_t h = 0x3141592627182818ULL;
// Add context first, since its bits are generally
// less well distributed than items
h += context;
h ^= h >> 23;
h *= 0x2127599bf4325c37ULL;
h ^= h >> 47;
h += items;
h ^= h >> 23;
h *= 0x2127599bf4325c37ULL;
h ^= h >> 47;
return h;
}
size_t probe_index(size_t hash) const {
return hash & ((size_t(1) << t_bits) - 1);
}
void resize(size_t t2_bits)
{
assert (size_ < size_t(1) << t2_bits);
vector <Solver_entry> t2(size_t(1) << t2_bits);
for (auto entry: t) {
if (!entry.is_empty()) {
size_t h = entry_hash(entry.items, entry.context);
size_t mask = (size_t(1) << t2_bits) - 1;
size_t idx = h & mask;
while (!t2[idx].is_empty()) {
idx = (idx + 1) & mask;
++num_probes_;
}
t2[idx] = entry;
}
}
t.swap(t2);
t_bits = t2_bits;
}
uint8_t update_score(bitset_t items, bitset_t context, uint8_t score)
{
// Ensure we can insert a new item without resizing
assert (size_ < t.size());
size_t index = probe_index(entry_hash(items, context));
size_t mask = (size_t(1) << t_bits) - 1;
for (size_t p = 0; p < t.size(); ++p, index = (index + 1) & mask) {
++num_probes_;
if (likely(t[index].items == items && t[index].context == context)) {
t[index].score = max(t[index].score, score);
return t[index].score;
}
if (t[index].is_empty()) {
// add entry
t[index].score = score;
t[index].items = items;
t[index].context = context;
++size_;
// load factor 4/5 -- ideally 2-3 average probes per lookup
if (5*size_ > 4*t.size()) {
resize(t_bits + 1);
}
return score;
}
}
assert (false && "bug: hash table probe loop");
}
size_t size() const {
return size_;
}
void swap(Solver_table table)
{
t.swap(table.t);
::swap(size_, table.size_);
::swap(t_bits, table.t_bits);
::swap(num_probes_, table.num_probes_);
}
};
/*
* Main solver code.
*/
struct Solver
{
// Inputs
vector <string> items;
string context0;
size_t context0_index;
// Mapping between strings and indices
vector <string> context_to_string;
unordered_map <string, size_t> string_to_context;
// Items that have context-free prefixes, i.e. prefixes that
// never overlap with the end of other items nor context0
vector <bool> contextfree;
// Precomputed contexts (suffixes) for each item
vector <size_t> item_context;
// Precomputed updates: (context, string) to overlap amount
vector <vector <size_t>> join_overlap;
Solver(vector <string> items, string context0)
:items(items), context0(context0)
{
items = dedup_items(context0, items);
init_context_();
}
void init_context_()
{
/*
* Generate all relevant item-item contexts.
*
* At this point, we know that no item is a substring of
* another, nor of context0. This means that the only contexts
* we need to care about, are those generated from maximal join
* overlaps between any two items.
*
* Proof:
* Suppose that the shortest containing string needs some other
* kind of context. Maybe it depends on a context spanning
* three or more items, say X,Y,Z. But if Z ends after Y and
* interacts with X, then Y must be a substring of Z.
* This cannot happen, because we removed all substrings.
*
* Alternatively, it depends on a non-maximal join overlap
* between two strings, say X,Y. But if this overlap does not
* interact with any other string, then we could maximise it
* and get a shorter solution. If it does, then call this
* other string Z. We would get the same contradiction as in
* the previous case with X,Y,Z.
*/
size_t N = items.size();
vector <size_t> max_prefix_overlap(N), max_suffix_overlap(N);
size_t context0_suffix_overlap = 0;
for (size_t i = 0; i < N; ++i) {
for (size_t j = 0; j < N; ++j) {
if (i == j) continue;
string joined = join_strings(items[j], items[i]);
size_t overlap = items[j].size() + items[i].size() - joined.size();
string context = items[i].substr(0, overlap);
max_prefix_overlap[i] = max(max_prefix_overlap[i], overlap);
max_suffix_overlap[j] = max(max_suffix_overlap[j], overlap);
if (string_to_context.find(context) == string_to_context.end()) {
string_to_context[context] = context_to_string.size();
context_to_string.push_back(context);
}
}
// Context for initial join with context0
{
string joined = join_strings(context0, items[i]);
size_t overlap = context0.size() + items[i].size() - joined.size();
string context = items[i].substr(0, overlap);
max_prefix_overlap[i] = max(max_prefix_overlap[i], overlap);
context0_suffix_overlap = max(context0_suffix_overlap, overlap);
if (string_to_context.find(context) == string_to_context.end()) {
string_to_context[context] = context_to_string.size();
context_to_string.push_back(context);
}
}
}
// Now compute all canonical trailing contexts
context0_index = string_to_context[
context0.substr(context0.size() - context0_suffix_overlap)];
item_context.resize(N);
for (size_t i = 0; i < N; ++i) {
item_context[i] = string_to_context[
items[i].substr(items[i].size() - max_suffix_overlap[i])];
}
// Now detect context-free items
contextfree.resize(N);
for (size_t i = 0; i < N; ++i) {
contextfree[i] = (max_prefix_overlap[i] == 0);
if (contextfree[i]) {
DEBUG(" contextfree: %s\n", items[i].c_str());
}
}
// Now compute all possible overlap amounts
join_overlap.resize(context_to_string.size(), vector <size_t> (N));
for (size_t c_index = 0; c_index < context_to_string.size(); ++c_index) {
const string& context = context_to_string[c_index];
for (size_t i = 0; i < N; ++i) {
string joined = join_strings(context, items[i]);
size_t overlap = context.size() + items[i].size() - joined.size();
join_overlap[c_index][i] = overlap;
}
}
}
// Main solver.
// Returns length of shortest string containing all items starting
// from context0 (context0's length not included).
size_t solve() const
{
size_t N = items.size();
// Length, if joined without overlaps. We try to improve this by
// finding overlaps in the main iteration
size_t base_length = 0;
for (auto s: items) {
base_length += s.size();
}
// Now take non-context-free items. We will only need to search
// over these items.
vector <size_t> search_items;
for (size_t i = 0; i < N; ++i) {
if (!contextfree[i]) {
search_items.push_back(i);
}
}
size_t N_search = search_items.size();
/*
* Some groups of strings have the same context transitions.
* For example "17", "107", "127", "167" all have an initial
* context of "1" and a trailing context of "7", no other
* overlaps are possible with other primes.
*
* We group these strings and treat them as indistinguishable
* during the main algorithm.
*/
auto eq_context = [&](size_t i, size_t j) {
if (item_context[i] != item_context[j]) {
return false;
}
for (size_t ci = 0; ci < context_to_string.size(); ++ci) {
if (join_overlap[ci][i] != join_overlap[ci][j]) {
return false;
}
}
return true;
};
vector <size_t> eq_context_group(N_search, size_t(-1));
for (size_t si = 0; si < N_search; ++si) {
for (size_t sj = si-1; sj+1 > 0; --sj) {
size_t i = search_items[si], j = search_items[sj];
if (!contextfree[j] && eq_context(i, j)) {
DEBUG(" eq context: %s =c= %s\n", items[i].c_str(), items[j].c_str());
eq_context_group[si] = sj;
break;
}
}
}
// Figure out the combined context size. A combined context has
// one entry for each context-free item plus one for context0.
size_t ccontext_size = N - N_search + 1;
// Assert that various parameters all fit into our data types
using ccontext_t = bitset_t;
assert (context_to_string.size() + ccontext_size <= bitset_bits);
assert (N_search <= solver_max_item_set);
assert (base_length < 0xff);
// Initial combined context.
unordered_map <size_t, size_t> cc0_full;
++cc0_full[context0_index];
for (size_t i = 0; i < N; ++i) {
if (contextfree[i]) {
++cc0_full[item_context[i]];
}
}
// Now pack into unary-encoded bitset. The bitset stores the
// count for each context as <count> number of 0 bits,
// followed by a 1 bit.
ccontext_t cc0 = 0;
for (size_t ci = 0, b = 0; ci < context_to_string.size(); ++ci, ++b) {
b += cc0_full[ci];
cc0 |= ccontext_t(1) << b;
}
// Map from (item set, context) to maximum achievable overlap
Solver_table k_solns;
// Base case: cc0 with empty set
k_solns.update_score(0, cc0, 0);
// Now start dynamic programming. k is current subset size
size_t eq_context_groups = 0;
for (size_t g: eq_context_group) eq_context_groups += (g != size_t(-1));
if (context0.empty()) {
INFO("solve: N=%zu, N_search=%zu, ccontext_size=%zu, #contexts=%zu, #eq_context_groups=%zu\n",
N, N_search, ccontext_size, context_to_string.size(), eq_context_groups);
} else {
DEBUG("solve: context=%s, N=%zu, N_search=%zu, ccontext_size=%zu, #contexts=%zu, #eq_context_groups=%zu\n",
context0.c_str(), N, N_search, ccontext_size, context_to_string.size(), eq_context_groups);
}
for (size_t k = 0; k < N_search; ++k) {
decltype(k_solns) k1_solns;
// The main bottleneck of this program is updating k1_solns,
// which (for larger N) becomes a huge table.
// We use a prefetch queue to reduce memory latency.
const size_t prefetch = 8;
array <Solver_entry, prefetch> entry_queue;
size_t update_i = 0;
// Iterate every k-subset
for (Solver_entry entry: k_solns.t) {
if (entry.is_empty()) continue;
bitset_t s = entry.items;
ccontext_t ccontext = entry.context;
size_t overlap = entry.score;
// Try adding a new item
for (size_t si = 0; si < N_search; ++si) {
bitset_t s1 = s | bitset_t(1) << si;
if (s == s1) {
continue;
}
// Add items in each eq_context_group sequentially
if (eq_context_group[si] != size_t(-1) &&
!(s & bitset_t(1) << eq_context_group[si])) {
continue;
}
size_t i = search_items[si]; // actual item index
size_t new_context = item_context[i];
// Increment ccontext's count for new_context.
// We need to find its delimiter 1 bit
size_t bit_n = bit_select(ccontext, new_context);
ccontext_t ccontext_n =
((ccontext & ((ccontext_t(1) << bit_n) - 1))
| ((ccontext >> bit_n << (bit_n + 1))));
// Select non-empty sub-contexts to substitute for new_context
for (size_t ci = 0, bit1 = 0, count;
ci < context_to_string.size();
++ci, bit1 += count + 1)
{
assert (ccontext_n >> bit1);
count = __builtin_ctzll(ccontext_n >> bit1);
if (!count
// We just added new_context; we can only remove an existing
// context entry there i.e. there must be at least two now
|| (ci == new_context && count < 2)) {
continue;
}
// Decrement ci in ccontext_n
bitset_t ccontext1 =
((ccontext_n & ((ccontext_t(1) << bit1) - 1))
| ((ccontext_n >> (bit1 + 1)) << bit1));
size_t overlap1 = overlap + join_overlap[ci][i];
// do previous prefetched update
if (update_i >= prefetch) {
Solver_entry entry = entry_queue[update_i % prefetch];
k1_solns.update_score(entry.items, entry.context, entry.score);
}
// queue the current update and prefetch
Solver_entry entry1;
size_t probe_index = k1_solns.probe_index(Solver_table::entry_hash(s1, ccontext1));
__builtin_prefetch(&k1_solns.t[probe_index]);
entry1.items = s1;
entry1.context = ccontext1;
entry1.score = overlap1;
entry_queue[update_i % prefetch] = entry1;
++update_i;
}
}
}
// do remaining queued updates
for (size_t j = 0; j < min(update_i, prefetch); ++j) {
Solver_entry entry = entry_queue[j];
k1_solns.update_score(entry.items, entry.context, entry.score);
}
if (context0.empty()) {
INFO(" hash stats: |solns[%zu]| = %zu, %zu lookups, %zu probes\n",
k+1, k1_solns.size(), update_i, k1_solns.num_probes_);
} else {
DEBUG(" hash stats: |solns[%zu]| = %zu, %zu lookups, %zu probes\n",
k+1, k1_solns.size(), update_i, k1_solns.num_probes_);
}
k_solns.swap(k1_solns);
}
// Overall solution
size_t max_overlap = 0;
for (Solver_entry entry: k_solns.t) {
if (entry.is_empty()) continue;
max_overlap = max(max_overlap, size_t(entry.score));
}
return base_length - max_overlap;
}
};
// Wrapper for Solver that also finds the smallest solution string
string smallest_containing_string(vector <string> items)
{
items = dedup_items("", items);
size_t soln_length;
{
Solver solver(items, "");
soln_length = solver.solve();
}
DEBUG("Found solution length: %zu\n", soln_length);
string soln;
vector <string> remaining_items = items;
while (remaining_items.size() > 1) {
// Add all possible next items, in lexicographic order
vector <pair <string, size_t>> next_solns;
for (size_t i = 0; i < remaining_items.size(); ++i) {
const string& item = remaining_items[i];
next_solns.push_back(make_pair(join_strings(soln, item), i));
}
assert (next_solns.size() == remaining_items.size());
sort(next_solns.begin(), next_solns.end());
// Now try every item in order
bool found_next = false;
for (auto ns: next_solns) {
size_t i;
string next_soln;
tie(next_soln, i) = ns;
DEBUG("Trying: %s + %s -> %s\n",
soln.c_str(), remaining_items[i].c_str(), next_soln.c_str());
vector <string> next_remaining;
for (size_t j = 0; j < remaining_items.size(); ++j) {
if (next_soln.find(remaining_items[j]) == string::npos) {
next_remaining.push_back(remaining_items[j]);
}
}
Solver solver(next_remaining, next_soln);
size_t next_size = solver.solve();
DEBUG(" ... next_size: %zu + %zu =?= %zu\n", next_size, next_soln.size(), soln_length);
if (next_size + next_soln.size() == soln_length) {
INFO(" found next item: %s\n", remaining_items[i].c_str());
soln = next_soln;
remaining_items = next_remaining;
// found lexicographically smallest solution, break now
found_next = true;
break;
}
}
assert (found_next);
}
soln = join_strings(soln, remaining_items[0]);
return soln;
}
int main()
{
string prev_soln;
vector <string> items;
size_t p = 1;
for (size_t N = 1;; ++N) {
for (++p; items.size() < N; ++p) {
if (is_prime(p)) {
char buf[99];
snprintf(buf, sizeof buf, "%zu", p);
items.push_back(buf);
break;
}
}
// Try to reuse previous solution (this works for N=11,30,32...)
string soln;
if (prev_soln.find(items.back()) != string::npos) {
soln = prev_soln;
} else {
soln = smallest_containing_string(items);
}
printf("%s\n", soln.c_str());
prev_soln = soln;
}
}
Try it online!
And the prime-only version on TIO. Sorry, but I didn't golf these programs and there's a post length limit.
JavaScript (Node.js), score 24 in 241 seconds
Results
- \$a(1)\$ to \$a(21)\$ are solved in ~15 seconds on TIO
- \$a(22)=231129413434717353759619679\$ was found in ~70 seconds on my laptop
- \$a(23)=23112941343471735359619678379\$ was found in approximately 2 minutes and 40 seconds on my laptop
- \$a(1)\$ to \$a(24)\$ was found in 241 seconds for the official scoring (edited by maxb)
Algorithm
This is a recursive search which tries all possible ways of merging numbers together, and eventually sort the resulting lists in lexicographical order when a leaf node is reached.
A merge is attempted on \$x\$ and \$y\$ if the first \$k\$ digits of \$x\$ equal the last \$k\$ digits of \$y\$, or the first \$k\$ digits of \$y\$ equal the last \$k\$ digits of \$x\$.
At the beginning of each iteration, any entry that can be found in another entry is removed from the list.
A significant speed-up was achieved by keeping track of visited nodes, so that we can abort early when different operations lead to the same list.
A small speed-up was achieved by updating and restoring the list when possible rather than generating a copy, as suggested by an anonymous user Neil.
Example
Below are the steps that are executed for \$n=7\$, with the primes \$[2,3,5,7,11,13,17]\$.
[] // start with an empty list
[ 2 ] // append 2
[ 2, 3 ] // append 3
[ 2, 3, 5 ] // append 5
[ 2, 3, 5, 7 ] // append 7
[ 2, 3, 5, 7, 11 ] // append 11
[ 2, 3, 5, 7, 11, 13 ] // append 13
[ 2, 5, 7, 11, 13 ] // remove 3, which appears in 13
[ 2, 5, 7, 113, 13 ] // try to merge 11 and 13 into 113
[ 2, 5, 7, 113 ] // remove 13, which now appears in 113
[ 2, 5, 7, 113, 17 ] // append 17
[ 2, 5, 113, 17 ] // remove 7, which appears in 17
--> leaf node: 1131725 // new best result
[ 2, 5, 7, 11, 13, 17 ] // append 17
[ 2, 5, 11, 13, 17 ] // remove 7, which appears in 17
[ 2, 5, 113, 13, 17 ] // try to merge 11 and 13 into 113
[ 2, 5, 113, 17 ] // remove 13, which now appears in 113
// abort because this node was already visited
// (it was a leaf node anyway, so we don't save much here)
[ 2, 5, 117, 13, 17 ] // try to merge 11 and 17 into 117
[ 2, 5, 117, 13 ] // remove 17, which now appears in 117
--> leaf node: 1171325 // not better than the previous one
--> leaf node: 11131725 // not better than the previous one
Code
Try it online!
let f = n => {
let visited = {},
a, d, k, best, search;
// build the list of primes, as strings
for(a = [ '2' ], n--, k = 3; n; k++) {
for(d = k; k % (d -= 2);) {}
d == 1 && n-- && a.push(k + '');
}
best = a.join('');
// recursive search function
(search = (a, n = 0, r = []) => {
let x, y, i, j, k, s;
// remove all entries in r[] that can be found in another entry
r = r.filter((p, i) => !r.some((q, j) => i != j && ~q.indexOf(p)));
// abort early if this node was already visited
if(visited[r]) {
return;
}
// otherwise, mark it as visited
visited[r] = 1;
// walk through all distinct pairs (x, y) in r[]
for(i = 0; i < r.length; i++) {
for(j = i + 1; j < r.length; j++) {
x = r[i];
y = r[j];
// try to merge x and y if:
// 1) the first k digits of x equal the last k digits of y
for(k = 1; x.slice(0, k) == y.slice(-k); k++) {
r[i] = y + x.slice(k);
search(a, n, r);
}
// or:
// 2) the first k digits of y equal the last k digits of x
for(k = 1; y.slice(0, k) == x.slice(-k); k++) {
r[i] = x + y.slice(k);
search(a, n, r);
}
r[i] = x;
}
}
if(x = a[n]) {
// there are other primes to process, so go on with the next one
search(a, n + 1, [...r, x]);
}
else {
// this is a leaf node: see if we've improved our current score
s = r.join('');
if(s.length <= best.length) {
s = r.sort().join('');
if(s.length < best.length || s < best) {
best = s;
}
}
}
})(a);
return best;
}
Concorde TSP solver, score 84 in 299 seconds
Well… I feel silly for only realising this now.
This whole thing is essentially a travelling salesman problem. For each pair of primes p
and q
, add an edge whose weight is the number of digits added by q
(removing overlapping digits). Also, add an initial edge to every prime p
, whose weight is the length of p
. The shortest travelling salesman path matches the length of the smallest prime containment number.
Then an industrial grade TSP solver, such as Concorde, will make short work of this problem.
This entry should probably be considered non-competing.
Results
The solver gets to N=350
in about 20 CPU hours. The full results are too long for one SE post, and the OEIS doesn't want that many terms anyway. Here are the first 200:
1 2
2 23
3 235
4 2357
5 112357
6 113257
7 1131725
8 113171925
9 1131719235
10 113171923295
11 113171923295
12 1131719237295
13 11317237294195
14 1131723294194375
15 113172329419437475
16 1131723294194347537
17 113172329419434753759
18 2311329417434753759619
19 231132941743475375961967
20 2311294134347175375961967
21 23112941343471735375961967
22 231129413434717353759619679
23 23112941343471735359619678379
24 2311294134347173535961967837989
25 23112941343471735359619678378979
26 2310112941343471735359619678378979
27 231010329411343471735359619678378979
28 101031071132329417343475359619678378979
29 101031071091132329417343475359619678378979
30 101031071091132329417343475359619678378979
31 101031071091131272329417343475359619678378979
32 101031071091131272329417343475359619678378979
33 10103107109113127137232941734347535961967838979
34 10103107109113127137139232941734347535961967838979
35 10103107109113127137139149232941734347535961967838979
36 1010310710911312713713914923294151734347535961967838979
37 1010310710911312713713914915157232941734347535961967838979
38 1010310710911312713713914915157163232941734347535961967838979
39 10103107109113127137139149151571631672329417343475359619798389
40 10103107109113127137139149151571631672329417343475359619798389
41 1010310710911312713713914915157163167173232941794347535961978389
42 101031071091131271371391491515716316717323294179434753596181978389
43 101031071091131271371391491515716316723294173434753596181917978389
44 101031071091131271371391491515716316717323294179434753596181919383897
45 10103107109113127137139149151571631671731792329418191934347535961978389
46 10103107109113127137139149151571631671731791819193232941974347535961998389
47 101031071091271313714915157163167173179181919321139232941974347535961998389
48 1010310710912713137149151571631671731791819193211392232941974347535961998389
49 1010310710912713137149151571631671731791819193211392232272941974347535961998389
50 10103107109127131371491515716316717317918191932113922322722941974347535961998389
51 101031071091271313714915157163167173179181919321139223322722941974347535961998389
52 101031071091271313714915157163167173179181919321139223322722923941974347535961998389
53 1010310710912713137149151571631671731791819193211392233227229239241974347535961998389
54 101031071091271313714915157163167173179211392233227229239241819193251974347535961998389
55 101031071091271313714915157163167173179211392233227229239241819193251972574347535961998389
56 101031071091271313714915157163167173179211392233227229239241819193251972572634347535961998389
57 101031071091271313714915157163167173179211392233227229239241819193251972572632694347535961998389
58 101031071091271313714915157163167173179211392233227229239241819193251972572632694347535961998389
59 1010310710912713137149151571631671731792113922332277229239241819193251972572632694347535961998389
60 101031071091271313714915157163167173211392233227722923924179251819193257263269281974347535961998389
61 1010310710912713137149151571631671732113922332277229239241792518191932572632692819728343475359619989
62 10103107109127131371491515716316717321139223322772293239241792518191932572632692819728343475359619989
63 1010307107109127131371491515716316717321139223322772293239241792518191932572632692819728343475359619989
64 10103071071091271311371391491515716316721173223322772293239241792518191932572632692819728343475359619989
65 10103071071091271311371491515716313916721173223322772293239241792518191932572632692819728343475359619989
66 10103071071091271311371491515716313921167223317322772293239241792518191932572632692819728343475359619989
67 10103071071091271311371491515716313921167223317322772293239241792518191932572632692819728343475359619989
68 1010307107109127131137149151571631392116722331732277229323924179251819193257263269281972833743475359619989
69 1010307107109127131137149151571631392116722331732277229323924179251819193257263269281972833743475359619989
70 101030710710912713113714915157163139211672233173227722932392417925181919325726326928197283374347534959619989
71 101030710710912713113714915157163139211672233173227722932392417925181919325726337269281972834743534959619989
72 101030710710912713113714915157163139211672233173227722932392417925181919337257263472692819728349435359619989
73 10103071071091271311371491515716313921167223317322772293372392417925181919347257263492692819728353594367619989
74 101030710710912713113714915157163139211672233173227722932392417925181919337347257263492692819728353594367619989
75 1010307107109127131137313914915157163211672233173227722933792392417925181919347257263492692819728353594367619989
76 101030710710912713113731391491515716321167223317322772293379239241792518191934725726349269281972835359438367619989
77 101030710710912713113731391491515716321167223317337922772293472392417925181919349257263535926928197283674383896199
78 1010307107109127131137313914915157163211672233173379227722934723972417925181919349257263535926928197283674383896199
79 101030710710912713113731391491515721163223317337922772293472397241672517925726349269281819193535928367401974383896199
80 101030710710912713113731391491515721163223317337922772293472397241672517925726349269281819193535928367401974094383896199
81 101030710710912713113731391491515721163223317337922772293472397241916725179257263492692818193535928367401974094383896199
82 1010307107109127131137313914915157223317322772293379239724191634725167257263492692817928353594018193674094211974383896199
83 1010307107109127131137313914922331515722772293379239724191634725167257263492692817353592836740181938389409421197431796199
84 101030710710912713113731391492233151572277229323972419163472516725726349269281735359283674018193838940942119743179433796199
85 101030710710912713113731391492233151572277229323924191634725167257263492692817353592836740181938389409421197431794337943976199
86 1010307107109127131137313914922331515722772293239241916347251672572634926928173535928367401819383894094211974317943379443976199
87 1010307107109127131137313914922331515722772293239241916347251672572634926928173535928367401819383894094211974317943379443974496199
88 1010307107109127131137313914922331515722772293239241916347251672572634926928173535928367401819383894094211974317943379443974494576199
89 10103071071091271311373139149223315157227722932392419163472516725726349269281735359283674018193838940942119743179433794439744945746199
90 10103071071091271311373139149223315157227722932392419163251672572634726928173492835359401819367409421197431794337944397449457461994638389
91 10103071071091271311373139149223315157227722932392419163251672572634726928173492835359401819367409421197431794337944397449457461994638389467
92 101030710710912713113731391492233151572277229323924191632516725726347926928173492835359401819367409421197431794337944397449457461994638389467
93 101030710710912713113731391492233151572277229323924191632516725726347926928173492835359401819367409421197431794337944397449457461994638389467487
94 101030710710912713113731392233149151572277229323924191632516725726347926928173492835359401819367409421197431794337944397449457461994638389467487
95 1010307107109127131137313922331491515722772293239241916325167257263479269281734928353594018193674094211974317943379443974499457461994638389467487
96 1010307107109127131137313922331491515722772293239241916325167257263269281734792834940181935359409421197431794337944397449945746199463674674875038389
97 1010307107109127131137313922331491515722772293239241916325167257263269281734792834940181935359409421197431794337944397449945746199463674674875038389509
98 101030710710912713113732233139227722932392419149151572516325726326928167283479401734940942118193535943179433794439744994574619746367467487503838950952199
99 1010307107109127131137322331392277229324191491515725163257263269281672834794017349409421181935359431794337944394499457461974636746748750383895095219952397
100 101030710710922331127131373227722932414915157251632572632692816728347940173494094211394317943379443944994574618191935359463674674875038389509521975239754199
101 101030710710922331127131373227722932414915157251632572632692816728347401734940942113943179433794439449945746181919353594636746748750383895095219752397541995479
102 101030710710922331127131373227722932414915157251632572632692816728347401734940942113943179433794439449945746181919353594636746748750383895095219752397541995479557
103 101030710710922331127131373227722932414915157251632572632692816728340173474094211394317943379443944945746181919349946353594674875036750952197523975419954795575638389
104 101030710710922331127131373227722932414915157251632572632692816728340173474094211394317943379443944945746181919349946353594674875036750952197523975419954795575638389569
105 101030710722331109227127722932413137325149151571632572632692816728340173474094211394317943379443944945746181919349946353594674875036750952197523975419954795575638389569
106 1010307107223311092271277229324131373251491515716325726326928167283401734740942113943179433794439449457461819193499463535946748750367509521975239754199547955775638389569
107 1010307107223311092271277229324131373251491515716325726326928167283401734740942113943179433794439449457461819193499463535946748750367509521975239754199547955775638389569587
108 10103071072233110922712772293241313732514915157163257263269281672834017340942113943179433794439449457461819193474634994674875035359367509521975239754199547955775638389569587
109 10103071072233110922712772293241313732514915157163257263269281672834017340942113943179433794439449457461819193474634994674875035359367509521975239754199547955775638389569587599
110 1010307223311072271092293241277251313732571491515726326928163283401674094211394317343379443944945746179463474674875034995095218191935359367523975419754795577563838956958759960199
111 1010307223311072271092293241277251313732571491515726326928163283401674094211394317343379443944945746179463474674875034995095218191935359367523975419754795577563838956958759960199607
112 1010307223311072271092293241277251491515716325726326928167283401734094211313734317943379443944945746139463474674875034995095218191935359367523975419754795577563838956958759960199607
113 22331101030722710722932410925127725714915157263269281632834016740942113137343173433794439449457461394634746748750349950952181919353593675239754197547955775638389569587599601996076179
114 2233110103072271072293241092512571277263269281491515728340163409421131373431734337944394494574613946347467487503499509521675239754191819353593675479557756383895695875996019760761796199
115 22331010307227107229324109251257126311277269281491515728340163409421131373431734337944394494574613946347467487503499509521675239754191819353593675479557756383895695875996019760761796199
116 22331010307227107229324109251257126311269281277283401491515740942113137343173433794439449457461394634674875034750952163499523975416754795577563535936756958759960181919383896076179619764199
117 223310103072271072293241092512571263112692812772834014915157409421131373431734433794494574613946346748750347509521634995239541675479557756353593675695875996018191938389607617961976419964397
118 223310103072271072293241092512571263112692812772834014915157409421131373431734433794494574613946346748750347509521634995239541675475577563535936756958759960181919383896076179619764199643976479
119 223310103072271072293241092512571263112692812772834014915157409421131373431734433794494574613946346748750347509521634995239541675475577563535695875935996018191936760761796197641996439764796538389
120 2233101030722710722932410925125712631126928127728340149151574094211313734317344337944945746139463467487503475095216349952395416754755775635356958760181919359367607617961976419964397647965383896599
121 22331010307227107229324109251257126311269281277283401491515740942113137343173443379449457461394634674875034750952163499523954167547557756353569587601819193593676076179641976439764796538389659966199
122 223310103072271072293241092512571263112692812772834014915157409421131373431734433794494574613946346734748750349950952163523954167547557756353569587601819193593676076179641976439764796538389659966199
123 2233101030722710722932410925125712631126928127728340149151574094211313734317344337944945746139463467347487503499509521635239541675475577563535695876018191935936776076179641976439764796538389659966199
124 2233101030722710722932410925125712631126928127728340149151574094211313734317344337944945746139463467347487503499509521635239541675475577563535695876018191935936076179641976439764796536776599661996838389
125 22331010307227107229324109251257126311269127728128340149151574094211313734317344337944945746139463467347487503499509521635239541675475577563535695876018191935936076179641976439764796536776599661996838389
126 2233101030701072271092293241251257126311269127728128340149151574094211313734317344337944945746139463467347487503499509521635239541675475577563535695876018191935936076179641976439764796536776599661996838389
127 223310103070107092271092293241251257126311269127728128340149151574094211313734317344337944945746139463467347487503499509521635239541675475577563535695876018191935936076179641976439764796536776599661996838389
128 223310103070107092271092293241251257191263112691277281283401491515740942113137343173443379449457461394634673474875034995095216352395416754755775635356958760181935936076179641976439764796536776599661996838389
129 22331010307010709227109229324125125719126311269127277281283401491515740942113137343173443379449457461394634673474875034995095216352395416754755775635356958760181935936076179641976439764796536776599661996838389
130 223307010103227092293241072510925712631126912719128128340140942113137331491515727743173443379449457461394634673474875034995095216352395416754755775635356958760181935936076179641976439764796536776599661996838389
131 2233070101032270922932410725109257126311269127191281283401409421131373314915157277431734433794494574613946346739487503475095216349952395416754755775635356958760181935936076179641976439764796536776599661996838389
132 2233070101032270922932410725109257126311269127191281283401409421131373314915157277431734433794494574613946346739487503475095216349952395416754755775635356958760181935936076179641976439764796536776599661996838389
133 223307010103227092293241072510925712631126912719128128340140942113137331443173449149457277433794613946346739487503475095215157516349952395416754755775635356958760181935936076179641976439764796536776599661996838389
134 22330701010322709229324107251092571263112691271912812834014094211313733144317344914945727743379461394634673948750347509521515751634995239541675475575635356958757760181935936076179641976439764796536776599661996838389
135 22330701010322709229324107251092571263112691271912812834014094211313733144317344914945727743379461394634673948750347509521515751634995239541675475575635356958757760181935936076179641976439764796536776599661996838389
136 2233070101032270922932410725109257126311269127191281283401409421131373314431734491494572774337946139463467394875034750952151575163499523954167547557563535695875776018193593607617964197643976479653677696599661996838389
137 22330701010322709229324107251092571263112691271912812834014094211313733144317344914945727734613946346739487433795034750952151575163499523954167547557563535695875776018193593607617964197643976479653677696599661996838389
138 2233070101032270922932410725109257126311269127191281283401409421131373314431734491494572773461394634673948743379503475095215157516349952395416754755756353569587577601819359360761796419764397647965367787696599661996838389
139 22330701010322709229324107251092571263112691271912812834014094211313733144317344914945727734613946346739487433795034750952151575163499523954167547557563535695875776018193593607617964197643976479765367787696599661996838389
140 22330701010322709229324107251092571263112691271912812834014094211313733144317344914945727734613946346739487433795034750952151575163499523954167547557563535695875776018193593607617964197643976479765367787696599661996838389809
141 223307010103227092293241072510925712631126912719128112834014094211313733144317344914945727734613946346739487433795034750952151575163499523954167547557563535695875776018193593607617964197643976479765367787696599661996838389809
142 223307010103227092293241072510925712631126912719128112834014094211313733144317344914572773461394634673948743379503475095214952395415157516349954755756353569587577601676076179641935936439764797653677659966197876968383898098218199
143 223070101032270922932410725109257126311269127191281128340140942113137331443173449145727734613946346739487433475034950952149952337954151575163535475575635695875776016760761796419359364396479765367765996619768383898098218199823978769
144 223070101032270922932410725109257126311269127191281128340140942113137331443173449145727433461394634673474875034950952149952337954151575163535475575635695875773960167607617964193593643964797653677659966197683838980982181998239769827787
145 223070101032270922924107251092571263112691271912811283401409421131373314431734491457274334613946346734748750349509521499523379541515751635354755756356958757739601676076179641935936439647976536599661976836776980982181998239782778782938389
146 2230701010322709229241072510925712631126912719128112834014094211313733144317344914572743346139463467347487503495095214995233795415157516353547557563569587577396016760761796419359364396479765367765996619768383976980982181998239827787829389
147 2230701010322709229241072510925712631126912719128112834014094211313733144317344914572743346139463467347487503495095214995233795415157516353547557563569587577396016760761796419359364396479765365996619768367769809821819982397827787829383985389
148 2230701010322709229241072510925712631126912719128112834014094211313733144317344914572743346139463467347487503495095214995233795415157516353547557563569587576016760761796419359364396479765365996619768367739769809821819982398277829383985389857787
149 2230701010322709229241072510925712631126912719128112834014094211313733144317344914572743346139463467347487503495095214995233795415157516353547557563569587576016760761796419359364396479765365966197683677397698098218199823982778293839853898577878599
150 2230701010322709229241072510925712631126912719128112834014094211313733144317344914572743346139463467347487503495095214995233795415157516353547557563569587576016760761796419359364396479765365966197683677397698098218199823982778293839853857787859986389
151 22307010103227092292410725109257126311269127191281128340140942113137331443173449145727433461394634673474875034950952149952337954151575163535475575635695875760167607617964193593643964797653659661976836773976980982181998239827782938398538577877859986389
152 22307010103227092292410725109257126311269127191281128340140942113137331443173449145727433461394634673474875034950952149952337954151547515755756353569587576016359360761796419364396479765365966197683676980982167739782398277829383985385778778599863898818199
153 22307010103227092292410725109257126311269127191281128340140942113137331443173449145727433461394634673474875034950952149952337954151547515755756353569587576016359360761796419364396479765365966197683676980982167739782398277829383853857787785998638988181998839
154 22307010103227092292410725109257126311269127191281128340140942113137331443173449145727433461394634673474875034950952149952337954151547515755756353569587576016359360761796419364396479765365966197683676980982167739782398277829383853857785998638988181998839887787
155 2230701010322709072292410725109257126311269127191281128340140942113137331443173449145727433461394634673474875034950952149952337954151547515755756353569587576016359360761796419364396479765365966197683676980982167739782398277829383853857785998638988181998839887787
156 22307010103227090722924107251092571263112691127191281128340140942113137331443173449145727433461394634673474875034950952149952337954151547515755756353569587576016359360761796419364396479765365966197683676980982167739782398277829383853857785998638988181998839887787
157 22307010103227090722924107251092571263112691127191281128340140942113137331443173449193457274334613946346734748750349509521499523379541515475155756353569587576015760761796419764396479765359365966199683676980982163823978277398293838538577859986389881816778778839887
158 2230701010322709072292410725109257126311269112719128112834014092934211313733144317344919345727433461394634673474875034950952149952337954151547515575635356958757601576076179641976439647976535936596619968367698098216382397827739829853838577859986389881816778778839887
159 22307010103227090722924107251092571263112691127191281128340140929342113137274314433173344919345746139463467347487503495095214995233735354151547515575635695875760157607617964197643964796535937976596619968367698098216382397827739829853838577859986389881816778778839887
160 2230701010322709072292410725109257126311269112719128112834014092934211313727431443317334491934574613941463467347487503495095214995233735354151547515575635695875760157607617964197643964796535937976596619968367698098216382397827739829853838577859986389881816778778839887
161 223070101032270907229241072510925712631126911271912811283401409293421131372743144331733449193457461394146346734748750349475095214995233735354151547515575635695875760157607617964197643964796535937976596619968367698098216382397827739829853838577859986389881816778778839887
162 22307010103227090722924107251092571263112691127191281128340140929342113137274314433173344919345746139414634673474875034947509521499523373535415154751557563569535875760157607617964197643964796535937976596619968367698098216382397827739829853838577859986389881816778778839887
163 2230701010322709072292410725109257126311269112719128112834014092934211313727431443317334491934574613941463467347487503494750952149952337353541515475155756356953587576015760761796419764396479653593797659661996768367698098216382397827739829853838577859986389881816778778839887
164 22307010103227090722924107251092571263112691127128112834014092934211313727431443317334491457461394146346734748750349475095214995233735354151547515575635695358757601576076179641919359379643964797197653659661996768367698098216382397827739829853838577859986389881816778778839887
165 223070101032270907229241072510925712631126911271281128340140929342113137274314433173344914574613941463467347487503494750952149952337353541515475155756356953587576015760761796419193593796439647971976536596619967683676980982163823977398277829853838577859986389881816778778839887
166 22307010103227090722924107251092571263112691127128112834014092934211313727431443317334491457461394146346734748750349475095214995233735354151547515575635695358757601576076179641919359379643964797197653659661996768367698098216382397739827782983838538577859986389881816778778839887
167 223070101032270907229241072510925712631126911271281128340140929342113137274314433173344914574613941463467347487503494750952149915152337353541547515575635695358757601576076179641919359379643964797197653659661996768367698098216382397739827782983838538577859986389881816778778839887
168 2230701010322709072292410725109257126311269112712811283401409293421131372743144331733449145746139414634673474875034947509521499151523373535415475155756356953587576015760761796419193593796439647971976536596619967683676980982163823977398277829838385385778599786389881816778778839887
169 2230701009070922710103229241072510925712631126911272728112834014092934211313733144317344914574334613941463467347487503494750952149915152337515415475575635356953587576015760761796419193593796439647971976536596619967683676980982163823977398277829838385385778599786389881816778778839887
170 22307010090709227101310322924107251092571263112691127272811283401409293421134431373317344914574334613941463467347487503494750952149915152337515415475575635356953587576015760761796419193593796439647971976536596619967683676980982163823977398277829838385385778599786389881816778778839887
171 22307010090709227101310191032292410725109257126311269112727281128340140929342113443137331734491457433461394146346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
172 22307010090709227101310191021032292410725109257126311269112727281128340140929342113443137331734491457433461394146346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
173 223070100907092271013101910210310722924109251257126311269112727281128340140929342113443137331734491457433461394146346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
174 223070100907092271013101910210310331107229241092512571263132691127272811283401409293421137334431734491457433461394146346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
175 223070100907092271013101910210310331103922924107251092571263132691127272811283401409293421137334431734491457433461394146346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
176 223070100907092271013101910210310331103922924104910725109257126313269112727281128340140929342113733443173449414574334613946346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
177 223070100907092271013101910210310331103922924104910510725109257126313269112727281128340140929342113733443173449414574334613946346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
178 223070100907092271013101910210310331103922924104910510610725109257126313269112727281128340140929342113733443173449414574334613946346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
179 223070100907092271013101910210310331103922924104910510610631325107257109263269112727281128340140929342113733443173449414574334613946346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
180 223070100907092271013101910210310331103922924104910510610631325106911072571092632692811272728340140929342113733443173449414574334613946346734748750349475095214991935233751515415475575635356953587576015760761796419643964796535937971976596619967683676980982163823977398277829838385385778599786389881816778778839887
181 223070100907092271013101910210310331103922924104910510610631325106911072571087263269281092834012727409293421137334431734494145743346139463467347487503494750952149919352337515154154755756353569535875760157607617964196439647965359379719765966199676836769809821638239773982778298383853857785997863898811816778778839887
182 2230701009070922710131019102103103311039229241049105106106313251069107257108726326928109112727283401409293421137334431734494145743346139463467347487503494750952149919352337515154154755756353569535875760157607617964196439647965359379719765966199676836769809821638239773982778298383853857785997863898811816778778839887
183 2230701009070922710131019102103103311039229241049105106106313251069107257108726326928109110932834012727409293421137334431734494145743346139463467347487503494750952149919352337515154154755756353569535875760157607617964196439647965359379719765966199676836769809821638239773982778298383853857785997863898811816778778839887
184 2230701009070922710131019102103103311039229241049105106106313251069107257108726326928109110932834010971929340941272742113733443173449457433461394634673474875034947509521499193523375151541547557563535695358757601576076179641976439647965359379765966199676836769809821638239773982778298383853857785997863898811816778778839887
185 2230701009070922710131019102103103311039229241049105106106313251069107257108726326928109110932834010971929340941272742113733443173449457433461394634673474875034947509521499193523375151541547557563535695358757601576076179641976439647965359379765966199676836769809821638239773982778298383853857785997863898811816778778839887
186 2230701009070922710131019102103103311039229241049105106106313251069107257108726326928109110932834010971929340941272742113733443173449457433461394634673474875034947509521499193523375151541547557563535695358757601576076179641976439647965359379765966199676836769809821638239773982778298383853857785997863898811816778778839887
187 223070100907092271013101910210310331103922924104910510610631325106910725710872632692810911093283401097192934094127274211173344317433449457461373463467347487503494750952149919352337515154157547557563535695358757601635937960761796419764396479765365966199676836769809821677397782398277829838385385778599786389881811398839887787
188 223070100907092271013101910210310331103922924104910510610631325106910725710872632692810911093283401097192934094111727421123344317334494574337346137463467347487503494750952127514991935235354151575475575635695358757601635937960761796419764396479765365966199676836769809821677397782398277829838385385778599786389881811398839887787
189 1009070101307092232271019102103103310491051061063110392292410691072510872571091109326326928109719283401117274092934211233443131733449411294574337346137463467347487503494750952127514991935235354151575475575635695358757601635937960761796419764396479765365966199676836769809821677397782398277829838385385778599786389881811398839887787
190 10090701013070922322710191021031033104910510610631103922924106910725108725710911093263269281097192834011172740929342112334431317334494112945743373461374634673474875034947509521139523535412751499193547557563569535875760157607617964197643964796535937976596619967683676980982163823977398277829838385385778599786389881151816778778839887
191 100907010130709101910210310331049105106106311039223227106910722924108725109110932571097192632692811172728340112334092934211294113137334431734494574337461394634673474875034947509521151153523535412751499193547557563569535875760157607617964197643964796535937976596619967683676980982163823977398277829838385385778599786389881816778778839887
192 1009070101307091019102103103310491051061063110392232271069107229241087251091109325710971926326928111727283401123340929342112941131373344317344945743374613946346734748750349475095211511535235354116354751275575635695358757601499193593796076179641976439647976536596619967683676980982157739778239827782983838538578599786389881816778778839887
193 1009070101307092232271019102103103310491051061063110392292410691072510872571091109326326928109711171928340112334092934211294113137274317334433734494574613946346734748750349475095211511535235354127514991935475575635695358757601576076179641976439647965359379765966199676836769809821677397782398277829838385385778599786388181163898839887787
194 10090701013070922322710191021031033104910510610631103922924106910725108725710911093263269281097111719283401123340929342112941131372743173344337344945746139463467347487503494750952115115352353541163547512755756356953587576014991935937960761796419764396479765365966199676836769809821577397782398277829838385385785997863898811816778778839887
195 100907010130709101910210310331049105106106311039223227106910722924108725109110932571097111719263269281123283401129293409411313727421151153443173344945743346139463467347487503494750952116352337353541181187512754755756356953587576014991935937960761796419764396479765365966199676836769809821577397782398277829838385385785997863898816778778839887
196 100907010130709101910210310331049105106106310691072231103922710872292410911093251097111711232571926326928112928340113137274092934211511534431733449411634574334613946346734748750349475095211811875119352337353541275475575635695358757601499196076179641976439647965359379765966199676836769809821577397782398277829838385385785997863898816778778839887
197 100907010130709101910210310331049105106106310691072231103922710872292410911093251097111711232571926326928112928340113137274092934211511534431733449411634574334613946346734748750349475095211811875119352337353541201275475575635695358757601499196076179641976439647965359379765966199676836769809821577397782398277829838385385785997863898816778778839887
198 1009070101307091019102103103310491051061063106910710872231103922710911093229241097111711232511292571926326928113132834011511534092934211634431733449411811872743345746137346346734748750349475095211935233751201213953535412754755756356958757601499196076179641976439647965359379765966199676836769809821577397782398277829838385385785997863898816778778839887
199 10090701013070910191021031033104910510610631069107108710911039223110932271097111711232292411292511313257192632692811511532834011634092934211811872743173344334494119345746137346346734748750349475095212012139523375121754127547557563535695358757601499196076179641976439647965359379765966199676836769809821577397782398277829838385385785997863898816778778839887
200 100907010130709101910210310331049105106106310691071087109109311039110971117112322711292292411313251151153257192632692811632834011811872740929342119344317334494120121373457433461394634673474875034947509521217512233752353541275475575635695358757601499196076179641976439647965359379765966199676836769809821577397782398277829838385385785997863898816778778839887
Code
Here's a Python 3 script to call the Concorde solver over and over until it constructs the solutions.
Concorde is free for academic use. You can download an executable binary of Concorde built with their own linear programming package QSopt, or if you somehow have a license for IBM CPLEX, you could build Concorde from source to use CPLEX.
#!/usr/bin/env python3
'''
Find prime containment numbers (OEIS A054261) using the Concorde
TSP solver.
The n-th prime containment number is the smallest natural number
which, when written in decimal, contains the first n primes.
'''
import argparse
import itertools
import os
import sys
import subprocess
import tempfile
def join_strings(a, b):
'''Shortest string that starts with a and ends with b.'''
for overlap in range(min(len(a), len(b)), 0, - 1):
if a[-overlap:] == b[:overlap]:
return a + b[overlap:]
return a + b
def is_prime(n):
if n < 2:
return False
d = 2
while d*d <= n:
if n % d == 0:
return False
d += 1
return True
def prime_list_reduced(n):
'''First n primes, with primes that are substrings of other
primes removed.'''
primes = []
p = 2
while len(primes) < n:
if is_prime(p):
primes.append(p)
p += 1
reduced = []
for p in primes:
if all(p == q or str(p) not in str(q) for q in primes):
reduced.append(p)
return reduced
# w_med is an offset for actual weights
# (we use zero as a dummy weight when splitting nodes)
w_med = 10**4
# w_big blocks edges from being taken
w_big = 10**8
def gen_tsplib(prefix, strs, start_candidates):
'''Generate TSP formulation in TSPLIB format.
Returns a TSPLIB format string that encodes the length of the
shortest string starting with 'prefix' and containing all 'strs'.
start_candidates is the set of strings that solution paths are
allowed to start with.
'''
N = len(strs)
# Concorde only supports symmetric TSPs. Therefore we encode the
# asymmetric TSP instances by doubling each node.
node_in = lambda i: 2*i
node_out = lambda i: node_in(i) + 1
# 2*(N+1) nodes because we add an artificial node with index N
# for the start/end of the tour. This node is also doubled.
num_nodes = 2*(N+1)
# Ensure special offsets are big enough
assert w_med > len(prefix) + sum(map(len, strs))
assert w_big > w_med * num_nodes
weight = [[w_big] * num_nodes for _ in range(num_nodes)]
def edge(src, dest, w):
weight[node_out(src)][node_in(dest)] = w
weight[node_in(dest)][node_out(src)] = w
# link every incoming node with the matching outgoing node
for i in range(N+1):
weight[node_in(i)][node_out(i)] = 0
weight[node_out(i)][node_in(i)] = 0
for i, p in enumerate(strs):
if p in start_candidates:
prefix_w = len(join_strings(prefix, p))
# Initial length
edge(N, i, w_med + prefix_w)
else:
edge(N, i, w_big)
# Link every str to the end to allow closed tours
edge(i, N, w_med)
for i, p in enumerate(strs):
for j, q in enumerate(strs):
if i != j:
w = len(join_strings(p, q)) - len(p)
edge(i, j, w_med + w)
out = '''NAME: prime-containment-number
TYPE: TSP
DIMENSION: %d
EDGE_WEIGHT_TYPE: EXPLICIT
EDGE_WEIGHT_FORMAT: FULL_MATRIX
EDGE_WEIGHT_SECTION
''' % num_nodes
out += '\n'.join(
' '.join(str(w) for w in row)
for row in weight
) + '\n'
out += 'EOF\n'
return out
def parse_tour_soln(prefix, strs, text):
'''This constructs the solution from Concorde's 'tour' output format.
The format simply consists of a permutation of the graph nodes.'''
N = len(strs)
node_in = lambda i: 2*i
node_out = lambda i: node_in(i) + 1
nums = list(map(int, text.split()))
# The file starts with the number of nodes
assert nums[0] == 2*(N+1)
nums = nums[1:]
# Then it should list a permutation of all nodes
assert len(nums) == 2*(N+1)
# Find and remove the artificial starting point
start = nums.index(node_out(N))
nums = nums[start+1:] + nums[:start]
# Also find and remove the end point
if nums[-1] == node_in(N):
nums = nums[:-1]
elif nums[0] == node_in(N):
# Tour printed in reverse order
nums = reversed(nums[1:])
else:
assert False, 'bad TSP tour'
soln = prefix
for i in nums:
# each prime appears in two adjacent nodes, pick one arbitrarily
if i % 2 == 0:
soln = join_strings(soln, strs[i // 2])
return soln
def scs_length(prefix, strs, start_candidates, concorde_path, concorde_verbose):
'''Find length of shortest containing string using one call to Concorde.'''
# Concorde's small-input solver CCHeldKarp, tends to fail with the
# cryptic error message 'edge too long'. Brute force instead
if len(strs) <= 5:
best = len(prefix) + sum(map(len, strs))
for perm in itertools.permutations(range(len(strs))):
if perm and strs[perm[0]] not in start_candidates:
continue
soln = prefix
for i in perm:
soln = join_strings(soln, strs[i])
best = min(best, len(soln))
return best
with tempfile.TemporaryDirectory() as tempdir:
concorde_path = os.path.join(os.getcwd(), concorde_path)
with open(os.path.join(tempdir, 'prime.tsplib'), 'w') as f:
f.write(gen_tsplib(prefix, strs, start_candidates))
if concorde_verbose:
subprocess.check_call([concorde_path, os.path.join(tempdir, 'prime.tsplib')],
cwd=tempdir)
else:
try:
subprocess.check_output([concorde_path, os.path.join(tempdir, 'prime.tsplib')],
cwd=tempdir, stderr=subprocess.STDOUT)
except subprocess.CalledProcessError as e:
print('Concorde exited with error code %d\nOutput log:\n%s' %
(e.returncode, e.stdout.decode('utf-8', errors='ignore')),
file=sys.stderr)
raise
with open(os.path.join(tempdir, 'prime.sol'), 'r') as f:
soln = parse_tour_soln(prefix, strs, f.read())
return len(soln)
# Cache results from previous N's
pcn_solve_cache = {} # (prefix fragment, strs) -> soln
def pcn(n, concorde_path, concorde_verbose):
'''Find smallest prime containment number for first n primes.'''
strs = list(map(str, prime_list_reduced(n)))
target_length = scs_length('', strs, strs, concorde_path, concorde_verbose)
def solve(prefix, strs, target_length):
if not strs:
return prefix
# Extract part of prefix that is relevant to cache
prefix_fragment = ''
for s in strs:
next_prefix = join_strings(prefix, s)
overlap = len(prefix) + len(s) - len(next_prefix)
fragment = prefix[len(prefix) - overlap:]
if len(fragment) > len(prefix_fragment):
prefix_fragment = fragment
fixed_prefix = prefix[:len(prefix) - len(prefix_fragment)]
assert fixed_prefix + prefix_fragment == prefix
cache_key = (prefix_fragment, tuple(strs))
if cache_key in pcn_solve_cache:
return fixed_prefix + pcn_solve_cache[cache_key]
# Not in cache, we need to calculate it.
soln = None
# Try strings in ascending order until scs_length reports a
# solution with equal length. That string will be the
# lexicographically smallest extension of our solution.
next_prefixes = sorted((join_strings(prefix, s), s)
for s in strs)
# Try first string -- often works
next_prefix, _ = next_prefixes[0]
next_prefixes = next_prefixes[1:]
next_strs = [s for s in strs if s not in next_prefix]
next_length = scs_length(next_prefix, next_strs, next_strs,
concorde_path, concorde_verbose)
if next_length == target_length:
soln = solve(next_prefix, next_strs, next_length)
else:
# If not, do a weighted binary search on remaining strings
while len(next_prefixes) > 1:
split = (len(next_prefixes) + 2) // 3
group = next_prefixes[:split]
group_length = scs_length(prefix, strs, [s for _, s in group],
concorde_path, concorde_verbose)
if group_length == target_length:
next_prefixes = group
else:
next_prefixes = next_prefixes[split:]
if next_prefixes:
next_prefix, _ = next_prefixes[0]
next_strs = [s for s in strs if s not in next_prefix]
check = True
# Uncomment if paranoid
#next_length = scs_length(next_prefix, next_strs, next_strs,
# concorde_path, concorde_verbose)
#check = (next_length == target_length)
if check:
soln = solve(next_prefix, next_strs, target_length)
assert soln is not None, (
'solve failed! prefix=%r, strs=%r, target_length=%d' %
(prefix, strs, target_length))
pcn_solve_cache[cache_key] = soln[len(fixed_prefix):]
return soln
return solve('', strs, target_length)
parser = argparse.ArgumentParser()
parser.add_argument('--concorde', type=str, default='concorde',
help='path to Concorde binary')
parser.add_argument('--verbose', action='store_true',
help='dump all Concorde output')
parser.add_argument('--start', type=int, metavar='N', default=1,
help='start at this N')
parser.add_argument('--end', type=int, metavar='N', default=1000000,
help='stop after this N')
parser.add_argument('--one', type=int, metavar='N',
help='solve for a single N and exit')
def main():
opts = parser.parse_args(sys.argv[1:])
if opts.one is not None:
opts.start = opts.one
opts.end = opts.one
prev_soln = ''
for n in range(opts.start, opts.end+1):
primes = map(str, prime_list_reduced(n))
if all(p in prev_soln for p in primes):
soln = prev_soln
else:
soln = pcn(n, opts.concorde, opts.verbose)
print('%d %s' % (n, soln))
sys.stdout.flush()
prev_soln = soln
if __name__ == '__main__':
main()