How can a genetic algorithm optimize a neural network's weights without knowing the search volume?

This seems to be a restatement of the core challenge of reinforcement learning with neural networks. You have a loss function that numerically quantifies how good the possible actions are in the current region of the solution space, so that taking an action moves you closer to or further from the global optimum (the answer), i.e. gradients with respect to the loss function.

Before you start, you cannot know exactly where the answer lies, so you define an exploration policy as part of the algorithm. This drives exploration of the solution space, guided by how much improvement particular actions yield in moving closer to the answer, as measured by the loss function.

At the outset the exploration is very aggressive and makes bold moves so that it can quickly cover the solution space. Then, as some regions of the solution space start to look more promising, the exploration becomes less bold in an attempt to converge on the solution.

In your case the exploration policy would vary the mutation size, mutation rate, and the crossover of the chromosomes. The mutation size and rate represent the move size within a local region, while crossover represents a dimensional transposition in solution space.
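To make that concrete, here is a minimal sketch of the two operators (the function names, shapes, and parameters are my own assumptions, not anything from your setup): mutation makes a bounded local move, crossover makes a jump between regions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate(genome, scale, rate):
    """Local move: perturb roughly a fraction `rate` of the weights
    with Gaussian noise of width `scale`."""
    mask = rng.random(genome.shape) < rate
    return genome + mask * rng.normal(0.0, scale, genome.shape)

def crossover(parent_a, parent_b):
    """Dimensional transposition: each weight comes from one parent
    or the other, jumping to a new region of the solution space."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)
```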

So rather than a max/min, you would have a starting position in solution space; assuming uniformly scaled and normalised features, a reasonable best guess is any random spot in the unit hypercube.

The exploration policy would then select mutation size, rate, and crossover to be initially aggressive, exploring widely. Selection of subsequent generations would prefer individuals that were both closer to the answer and carrying a less aggressive exploration strategy, so later generations would tend to sit near the answer, explore less boldly, and thus converge.
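A minimal sketch of that whole scheme, in the spirit of a self-adaptive evolution strategy (the toy fitness function and every constant here are placeholders I made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(weights):
    # Placeholder objective: stands in for the negative loss of the
    # network these weights encode.
    return -np.sum((weights - 0.7) ** 2)

DIM, POP, GENS = 20, 50, 200
weights = rng.random((POP, DIM))   # best guess: random spots in the unit hypercube
scales = np.full(POP, 0.5)         # each individual carries its own boldness

for gen in range(GENS):
    scores = np.array([fitness(w) for w in weights])
    keep = np.argsort(scores)[-POP // 2:]        # survivors: closest to the answer
    weights, scales = weights[keep], scales[keep]
    # Children inherit a (mutated) boldness, so selection also favours
    # whatever exploration strategy is working.
    child_scales = scales * np.exp(rng.normal(0.0, 0.2, scales.shape))
    children = weights + rng.normal(0.0, 1.0, weights.shape) * child_scales[:, None]
    weights = np.vstack([weights, children])
    scales = np.concatenate([scales, child_scales])
```

Because the mutation scale is itself inherited and selected, individuals that are near the answer with small scales out-compete bold wanderers, which is exactly the convergence behaviour described above.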

This article has a more formal review of the concepts.

https://towardsdatascience.com/reinforcement-learning-demystified-exploration-vs-exploitation-in-multi-armed-bandit-setting-be950d2ee9f6


Here's a story. There was once a presentation, probably of this paper, for genetic algorithms to configure the inputs, outputs, and architecture for indoor flight. That is, it hooked up stupid sensors to those floating indoor blimps and made them explore rooms while optimizing for straight and level flight.

The "genes" in this case were:

  • Choosing two or three input values from a list of responses to standard image-processing filters (vertical edge detection, low contrast, line detection, etc.)
  • Choosing two output connections from a list of standard voltage profiles for each engine (hard ramp/slow ramp/immediate to 0%, 50%, 100%, -50%, -100%, etc.)
  • Choosing connections between nodes in a two-level neural network, each layer having only five nodes. For example, "input 2 attaches to node 3 in layer 1". Only some fraction (30%?) of connections would be allowed.

So, one DNA consisted of two input nodes, fifty connections, and two output nodes. A population starts with a hundred random DNAs, runs the blimps (which trains the selected neural nets), calculates the level flight time, and breeds. By "breed", I mean it kills the lowest-scoring half and creates mutated copies of the winners. Success happened.
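Encoded naively, one such DNA might look like this (every constant and field name below is invented for illustration; the original used whatever the hardware offered):

```python
import random

N_FILTERS, N_PROFILES, LAYER_SIZE, N_CONNECTIONS = 8, 10, 5, 50

def random_dna():
    """One 'DNA': indices into fixed lists of image filters (inputs),
    sparse layer-1 -> layer-2 connections, and voltage profiles (outputs)."""
    return {
        "inputs": random.sample(range(N_FILTERS), 2),
        "connections": [(random.randrange(LAYER_SIZE), random.randrange(LAYER_SIZE))
                        for _ in range(N_CONNECTIONS)],
        "outputs": random.sample(range(N_PROFILES), 2),
    }
```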

Now, relating to your problem.

You need to be very clear on what your genes can be. Some good choices might be (a sketch of one possible genome follows the list):

  • Network architecture, as in the above story
  • Hyperparameters for dropout (knock-out), learning rates, restarts, loss functions, and more.
  • Initial weight distributions; really more parameters, including some for occasionally adding wild weights.
  • Wild kicks to one parameter or another, meaning picking an axis or two to search with wild values or with fine-grained precision.
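Here is a rough sketch of a genome drawing on those gene types (every field and range below is a made-up example, not a recommendation):

```python
import random

def random_genome():
    """One candidate: architecture, training hyperparameters, and
    weight-initialisation parameters are all evolvable genes."""
    return {
        "hidden_layers": random.choice([[32], [64, 32], [128, 64, 32]]),  # architecture
        "dropout": random.uniform(0.0, 0.5),                # knock-out rate
        "learning_rate": 10 ** random.uniform(-4, -1),      # log-uniform
        "loss": random.choice(["mse", "huber", "cross_entropy"]),
        "init_std": 10 ** random.uniform(-2, 0),            # initial weight spread
        "wild_weight_prob": random.uniform(0.0, 0.05),      # occasional wild weights
        "kick_axis": random.randrange(20),                  # axis to kick hard
    }
```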

Also remember that mutation and crossbreeding are different. You should allow wild mutations sometimes. A common tactic is to breed about 70% (make a copy, swapping some genes) and mutate about 30% (copy a survivor and make random changes).
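In code, that split might look roughly like this (the 70/30 numbers come from the paragraph above; `mutate` and `crossover` are whatever operators fit your genome):

```python
import random

def next_generation(survivors, pop_size, crossover, mutate):
    """Refill the population: ~70% of children by crossover of two
    survivors, ~30% by wilder mutated copies of a single survivor."""
    children = []
    while len(survivors) + len(children) < pop_size:
        if random.random() < 0.7:
            a, b = random.sample(survivors, 2)
            children.append(crossover(a, b))
        else:
            children.append(mutate(random.choice(survivors)))
    return survivors + children
```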

As is often the case with quick advice, I'm guessing at what's not said in your description. If I'm totally off base about what you are doing, pretend it's on base; you are likely to be the one to solve your problem.