What is the minimum cost to connect all the islands?

To approach this problem, I would use an integer programming framework and define three sets of decision variables:

  • x_ij: A binary indicator variable for whether we build a bridge at water location (i, j).
  • y_ijbcn: A binary indicator for whether water location (i, j) is the n^th location linking island b to island c.
  • l_bc: A binary indicator variable for whether islands b and c are directly linked (aka you can walk only on bridge squares from b to c).

For bridge building costs c_ij, the objective value to minimize is sum_ij c_ij * x_ij. We need to add the following constraints to the model:

  • We need to ensure the y_ijbcn variables are valid. We can always only reach a water square if we build a bridge there, so y_ijbcn <= x_ij for every water location (i, j). Further, y_ijbc1 must equal 0 if (i, j) does not border island b. Finally, for n > 1, y_ijbcn can only be used if a neighboring water location was used in step n-1. Defining N(i, j) to be the water squares neighboring (i, j), this is equivalent to y_ijbcn <= sum_{(l, m) in N(i, j)} y_lmbc(n-1).
  • We need to ensure the l_bc variables are only set if b and c are linked. If we define I(c) to be the locations bordering island c, this can be accomplished with l_bc <= sum_{(i, j) in I(c), n} y_ijbcn.
  • We need to ensure that all islands are linked, either directly or indirectly. This can be accomplished in the following way: for every non-empty proper subset S of islands, require that at least one island in S is linked to at least one island in the complement of S, which we'll call S'. In constraints, we can implement this by adding a constraint for every non-empty set S of size <= K/2 (where K is the number of islands), sum_{b in S} sum_{c in S'} l_bc >= 1.

For a problem instance with K islands, W water squares, and specified maximum path length N, this is a mixed integer programming model with O(K^2WN) variables and O(K^2WN + 2^K) constraints. Obviously this will become intractable as the problem size becomes large, but it may be solvable for the sizes you care about. To get a sense of the scalability, I'll implement it in python using the pulp package. Let's first start with the smaller 7 x 9 map with 3 islands at the bottom of the question:

import itertools
import pulp
water = {(0, 2): 2.0, (0, 3): 1.0, (0, 4): 1.0, (0, 5): 1.0, (0, 6): 2.0,
         (1, 0): 2.0, (1, 1): 9.0, (1, 2): 1.0, (1, 3): 9.0, (1, 4): 9.0,
         (1, 5): 9.0, (1, 6): 1.0, (1, 7): 9.0, (1, 8): 2.0,
         (2, 0): 1.0, (2, 1): 9.0, (2, 2): 9.0, (2, 3): 1.0, (2, 4): 9.0,
         (2, 5): 1.0, (2, 6): 9.0, (2, 7): 9.0, (2, 8): 1.0,
         (3, 0): 9.0, (3, 1): 1.0, (3, 2): 9.0, (3, 3): 9.0, (3, 4): 5.0,
         (3, 5): 9.0, (3, 6): 9.0, (3, 7): 1.0, (3, 8): 9.0,
         (4, 0): 9.0, (4, 1): 9.0, (4, 2): 1.0, (4, 3): 9.0, (4, 4): 1.0,
         (4, 5): 9.0, (4, 6): 1.0, (4, 7): 9.0, (4, 8): 9.0,
         (5, 0): 9.0, (5, 1): 9.0, (5, 2): 9.0, (5, 3): 2.0, (5, 4): 1.0,
         (5, 5): 2.0, (5, 6): 9.0, (5, 7): 9.0, (5, 8): 9.0,
         (6, 0): 9.0, (6, 1): 9.0, (6, 2): 9.0, (6, 6): 9.0, (6, 7): 9.0,
         (6, 8): 9.0}
islands = {0: [(0, 0), (0, 1)], 1: [(0, 7), (0, 8)], 2: [(6, 3), (6, 4), (6, 5)]}
N = 6

# Island borders
iborders = {}
for k in islands:
    iborders[k] = {}
    for i, j in islands[k]:
        for dx in [-1, 0, 1]:
            for dy in [-1, 0, 1]:
                if (i+dx, j+dy) in water:
                    iborders[k][(i+dx, j+dy)] = True

# Create models with specified variables
x = pulp.LpVariable.dicts("x", water.keys(), lowBound=0, upBound=1, cat=pulp.LpInteger)
pairs = [(b, c) for b in islands for c in islands if b < c]
yvals = []
for i, j in water:
    for b, c in pairs:
        for n in range(N):
            yvals.append((i, j, b, c, n))

y = pulp.LpVariable.dicts("y", yvals, lowBound=0, upBound=1)
l = pulp.LpVariable.dicts("l", pairs, lowBound=0, upBound=1)
mod = pulp.LpProblem("Islands", pulp.LpMinimize)

# Objective
mod += sum([water[k] * x[k] for k in water])

# Valid y
for k in yvals:
    i, j, b, c, n = k
    mod += y[k] <= x[(i, j)]
    if n == 0 and not (i, j) in iborders[b]:
        mod += y[k] == 0
    elif n > 0:
        mod += y[k] <= sum([y[(i+dx, j+dy, b, c, n-1)] for dx in [-1, 0, 1] for dy in [-1, 0, 1] if (i+dx, j+dy) in water])

# Valid l
for b, c in pairs:
    mod += l[(b, c)] <= sum([y[(i, j, B, C, n)] for i, j, B, C, n in yvals if (i, j) in iborders[c] and B==b and C==c])

# All islands connected (directly or indirectly)
ikeys = islands.keys()
for size in range(1, len(ikeys)/2+1):
    for S in itertools.combinations(ikeys, size):
        thisSubset = {m: True for m in S}
        Sprime = [m for m in ikeys if not m in thisSubset]
        mod += sum([l[(min(b, c), max(b, c))] for b in S for c in Sprime]) >= 1

# Solve and output
mod.solve()
for row in range(min([m[0] for m in water]), max([m[0] for m in water])+1):
    for col in range(min([m[1] for m in water]), max([m[1] for m in water])+1):
        if (row, col) in water:
            if x[(row, col)].value() > 0.999:
                print "B",
            else:
                print "-",
        else:
            print "I",
    print ""

This takes 1.4 seconds to run using the default solver from the pulp package (the CBC solver) and outputs the correct solution:

I I - - - - - I I 
- - B - - - B - - 
- - - B - B - - - 
- - - - B - - - - 
- - - - B - - - - 
- - - - B - - - - 
- - - I I I - - - 

Next, consider the full problem at the top of the question, which is a 13 x 14 grid with 7 islands:

water = {(i, j): 1.0 for i in range(13) for j in range(14)}
islands = {0: [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)],
           1: [(9, 0), (9, 1), (10, 0), (10, 1), (10, 2), (11, 0), (11, 1),
               (11, 2), (12, 0)],
           2: [(0, 7), (0, 8), (1, 7), (1, 8), (2, 7)],
           3: [(7, 7), (8, 6), (8, 7), (8, 8), (9, 7)],
           4: [(0, 11), (0, 12), (0, 13), (1, 12)],
           5: [(4, 10), (4, 11), (5, 10), (5, 11)],
           6: [(11, 8), (11, 9), (11, 13), (12, 8), (12, 9), (12, 10), (12, 11),
               (12, 12), (12, 13)]}
for k in islands:
    for i, j in islands[k]:
        del water[(i, j)]

for i, j in [(10, 7), (10, 8), (10, 9), (10, 10), (10, 11), (10, 12),
             (11, 7), (12, 7)]:
    water[(i, j)] = 20.0

N = 7

MIP solvers often obtain good solutions relatively quickly and then spend a huge about of time trying to prove the optimality of the solution. Using the same solver code as above, the program does not complete within 30 minutes. However, you can provide a timeout to the solver to get an approximate solution:

mod.solve(pulp.solvers.PULP_CBC_CMD(maxSeconds=120))

This yields a solution with objective value 17:

I I - - - - - I I - - I I I 
I I - - - - - I I - - - I - 
I I - - - - - I - B - B - - 
- - B - - - B - - - B - - - 
- - - B - B - - - - I I - - 
- - - - B - - - - - I I - - 
- - - - - B - - - - - B - - 
- - - - - B - I - - - - B - 
- - - - B - I I I - - B - - 
I I - B - - - I - - - - B - 
I I I - - - - - - - - - - B 
I I I - - - - - I I - - - I 
I - - - - - - - I I I I I I 

To improve the quality of the solutions you obtain, you could use a commercial MIP solver (this is free if you are at an academic institution and likely not free otherwise). For instance, here's the performance of Gurobi 6.0.4, again with a 2-minute time limit (though from the solution log we read that the solver found the current best solution within 7 seconds):

mod.solve(pulp.solvers.GUROBI(timeLimit=120))

This actually finds a solution of objective value 16, one better than the OP was able to find by hand!

I I - - - - - I I - - I I I 
I I - - - - - I I - - - I - 
I I - - - - - I - B - B - - 
- - B - - - - - - - B - - - 
- - - B - - - - - - I I - - 
- - - - B - - - - - I I - - 
- - - - - B - - B B - - - - 
- - - - - B - I - - B - - - 
- - - - B - I I I - - B - - 
I I - B - - - I - - - - B - 
I I I - - - - - - - - - - B 
I I I - - - - - I I - - - I 
I - - - - - - - I I I I I I 

A brute-force approach, in pseudo-code:

start with a horrible "best" answer
given an nxm map,
    try all 2^(n*m) combinations of bridge/no-bridge for each cell
        if the result is connected, and better than previous best, store it

return best

In C++, this could be written as

// map = linearized map; map[x*n + y] is the equivalent of map2d[y][x]
// nm = n*m
// bridged = true if bridge there, false if not. Also linearized
// nBridged = depth of recursion (= current bridge being considered)
// cost = total cost of bridges in 'bridged'
// best, bestCost = best answer so far. Initialized to "horrible"
void findBestBridges(char map[], int nm,
   bool bridged[], int nBridged, int cost, bool best[], int &bestCost) {
   if (nBridged == nm) {
      if (connected(map, nm, bridged) && cost < bestCost) {
          memcpy(best, bridged, nBridged);
          bestCost = best;
      }
      return;
   }
   if (map[nBridged] != 0) {
      // try with a bridge there
      bridged[nBridged] = true;
      cost += map[nBridged];

      // see how it turns out
      findBestBridges(map, nm, bridged, nBridged+1, cost, best, bestCost);         

      // remove bridge for further recursion
      bridged[nBridged] = false;
      cost -= map[nBridged];
   }
   // and try without a bridge there
   findBestBridges(map, nm, bridged, nBridged+1, cost, best, bestCost);
}

After making a first call (I am assuming you are transforming your 2d maps to 1d arrays for ease of copying around), bestCost will contain the cost of the best answer, and best will contain the pattern of bridges that yields it. This is, however extremely slow.

Optimizations:

  • By using a "bridge limit", and running the algorithm for increasing maximum numbers of bridges, you can find minimal answers without exploring the whole tree. Finding a 1-bridge answer, if it existed, would be O(nm) instead of O(2^nm) - a drastic improvement.
  • You can avoid searching (by stopping recursion; this is also called "pruning") once you have exceeded bestCost, because it makes no sense to keep looking afterwards. If it can't get better, don't keep digging.
  • The above pruning works better if you look at "good" candidates before looking at "bad" ones (as it is, cells are all looked at in left-to-right, top-to-bottom order). A good heuristic would be to consider cells that are near to several unconnected components as higher-priority than cells that are not. However, once you add heuristics, your search starts to resemble A* (and you need some kind of priority queue, too).
  • Duplicate bridges and bridges to nowhere are to be avoided. Any bridge that does not disconnect the island network if removed is redundant.

A general search algorithm such as A* allows much faster search, although finding better heuristics is not a simple task. For a more problem-specific approach, using existing results on Steiner trees, as suggested by @Gassa, is the way to go. Note, however, that the problem of building Steiner trees on orthogonal grids is NP-Complete, according to this paper by Garey and Johnson.

If "good enough" is enough, a genetic algorithm can probably find acceptable solutions quickly, as long as you add a few key heuristics as to preferred bridge placement.