Questions regarding the use of A* with the 15-square puzzle
An A* search finds the optimal solution by proving that no path it has not yet finished exploring could reach the goal in fewer moves than the current solution. You aren't looking for the best solution, but for a solution found quickly. Therefore, you can speed up your algorithm by returning the first solution it finds, by giving the number of moves made so far less weight than your heuristic estimate, and by allowing the heuristic to return an over-estimate.
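Here is a minimal sketch of those three shortcuts combined into weighted A*. It assumes a 4x4 board stored as a row-major tuple with 0 for the blank; that representation and the names `weighted_astar`, `manhattan` and `neighbors` are illustrative choices, not anything from the question. Nodes are ordered by `g + weight * h`, so with `weight > 1` the estimate counts for more than the moves already made, and the search returns the first goal it pops.

```python
import heapq
from itertools import count

N = 4                                   # 4x4 board, read row by row
GOAL = tuple(range(1, N * N)) + (0,)    # 0 marks the blank (assumed encoding)


def manhattan(state):
    """Sum over tiles of |row - goal row| + |col - goal col|."""
    total = 0
    for idx, tile in enumerate(state):
        if tile:
            goal = tile - 1
            total += abs(idx // N - goal // N) + abs(idx % N - goal % N)
    return total


def neighbors(state):
    """Yield each state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    r, c = divmod(blank, N)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:
            swap = nr * N + nc
            board = list(state)
            board[blank], board[swap] = board[swap], board[blank]
            yield tuple(board)


def weighted_astar(start, weight=2.0):
    """Return a list of states from start to GOAL, not necessarily the shortest.

    Nodes are ordered by g + weight * h; weight > 1 favours the heuristic over
    the moves already made, and the search stops at the first goal popped.
    """
    tie = count()  # tie-breaker so heapq never has to compare states
    open_heap = [(weight * manhattan(start), next(tie), 0, start)]
    parent = {start: None}
    best_g = {start: 0}
    while open_heap:
        _, _, g, state = heapq.heappop(open_heap)
        if state == GOAL:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        if g > best_g[state]:
            continue  # stale entry: a cheaper route to this state was found later
        for nxt in neighbors(state):
            ng = g + 1
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                parent[nxt] = state
                heapq.heappush(open_heap, (ng + weight * manhattan(nxt), next(tie), ng, nxt))
    return None  # search space exhausted without reaching GOAL
```

With `weight=1.0` and an admissible heuristic such as Manhattan distance this is ordinary A* and returns a shortest solution; raising the weight typically trades solution length for fewer expanded nodes.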
The heuristic function itself is usually best built from the Manhattan distance plus linear conflict. Manhattan distance is well explained in other answers and in the Wikipedia article, and you seem to have a handle on it. Linear conflict adds two to the Manhattan distance for each pair of tiles that are both in their goal row (or column) but in reversed order, because one of them has to leave the line to let the other pass. For example, if the top row contains "3 2 1 4", then the 1 and the 3 have to be swapped, and one of them would have to move to another row to do so.
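A sketch of Manhattan distance plus linear conflict, again assuming a row-major tuple with 0 as the blank (the function names are hypothetical). Instead of adding 2 for every reversed pair, it adds 2 for each tile that must leave its line, computed as the line length minus its longest increasing subsequence. For "3 2 1 4" that gives 4 rather than the 6 that pairwise counting would add, since pairwise counting may overestimate when three or more tiles are mutually reversed.

```python
from bisect import bisect_left

N = 4
GOAL = tuple(range(1, N * N)) + (0,)


def goal_rc(tile):
    """Goal (row, col) of a non-blank tile on the standard board."""
    return divmod(tile - 1, N)


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)


def manhattan(state):
    total = 0
    for idx, tile in enumerate(state):
        if tile:
            gr, gc = goal_rc(tile)
            total += abs(idx // N - gr) + abs(idx % N - gc)
    return total


def linear_conflict(state):
    """Extra moves forced by tiles that sit in their goal line in the wrong order.

    Tiles that never leave a line keep their relative order, so at least
    len(line) - LIS(line) of them must step out and back in: 2 moves each.
    """
    extra = 0
    for k in range(N):
        # goal columns, left to right, of tiles in row k whose goal row is k
        row = [goal_rc(t)[1] for t in state[k * N:(k + 1) * N] if t and goal_rc(t)[0] == k]
        # goal rows, top to bottom, of tiles in column k whose goal column is k
        col = [goal_rc(t)[0] for t in state[k::N] if t and goal_rc(t)[1] == k]
        extra += 2 * (len(row) - lis_length(row))
        extra += 2 * (len(col) - lis_length(col))
    return extra


def heuristic(state):
    return manhattan(state) + linear_conflict(state)
```

Row conflicts only add vertical moves and column conflicts only add horizontal ones, which is why the two can simply be summed on top of the Manhattan distance.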
Using a pattern database is another option. It gives a more informed heuristic, which helps your search avoid exploring certain dead ends, and the memory it needs for a 15-puzzle should be manageable.
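A toy sketch of how a pattern database is built, under loudly stated assumptions: `PATTERN = (1, 2, 3)` is a hypothetical, deliberately tiny choice so the table builds in about a second, and every slide costs 1 (a non-additive database). Practical 15-puzzle databases use larger, disjoint tile sets. All tiles outside the pattern are treated as indistinguishable, a breadth-first search runs backwards from the goal over (blank, pattern-tile positions), and the stored distance is an admissible estimate for any full board with that projection.

```python
from collections import deque

N = 4
GOAL = tuple(range(1, N * N)) + (0,)
PATTERN = (1, 2, 3)  # hypothetical pattern; real databases use bigger disjoint sets


def abstract(state):
    """Project a full board onto (blank position, positions of PATTERN tiles)."""
    return (state.index(0),) + tuple(state.index(t) for t in PATTERN)


def abstract_moves(key):
    """Yield abstract states one blank slide away."""
    blank, *tiles = key
    r, c = divmod(blank, N)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:
            target = nr * N + nc
            # a pattern tile on the target square slides into the old blank square
            moved = [blank if p == target else p for p in tiles]
            yield (target, *moved)


def build_pdb():
    """Breadth-first search from the abstract goal; slides are reversible."""
    start = abstract(GOAL)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        key = queue.popleft()
        for nxt in abstract_moves(key):
            if nxt not in dist:
                dist[nxt] = dist[key] + 1
                queue.append(nxt)
    return dist


PDB = build_pdb()


def pdb_heuristic(state):
    return PDB[abstract(state)]
```

The table is computed once and then each heuristic call is a dictionary lookup; taking the maximum of this and Manhattan distance keeps the estimate admissible.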
What are you using for test data? If it's a random arrangement of the tiles, about half of your puzzles will be unsolvable. It is not possible to swap two tiles while keeping the rest in the same position, so if you reach what is almost the end position but with two tiles interchanged, you can't possibly get it into the desired position, and no search algorithm can terminate successfully.
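You can reject such inputs before searching. This sketch (hypothetical `is_solvable`, same assumed tuple encoding with 0 as the blank) uses the standard parity argument: every slide swaps the blank with a neighbour, which flips the permutation's parity and also changes the blank's taxicab distance to its home square by one, so their combined parity never changes.

```python
N = 4
GOAL = tuple(range(1, N * N)) + (0,)


def is_solvable(state):
    """True if `state` can reach GOAL by legal slides.

    Each slide flips the permutation parity AND moves the blank one step,
    so (inversions + blank distance) % 2 is invariant; at GOAL it is 0.
    """
    ranks = [t if t else N * N for t in state]  # blank ranks last, as in GOAL
    inversions = sum(
        1
        for i in range(len(ranks))
        for j in range(i + 1, len(ranks))
        if ranks[i] > ranks[j]
    )
    r, c = divmod(state.index(0), N)
    blank_distance = (N - 1 - r) + (N - 1 - c)
    return (inversions + blank_distance) % 2 == 0
```

Generating test positions by making random legal moves from the solved board, instead of shuffling tiles, also guarantees solvable inputs.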
In the 19th century, the American puzzlemaster Sam Loyd sold these toys with the 15 and 14 reversed, and offered a big prize to anybody who could demonstrate a solution that switched them back (presumably other than the one I've got, a small screwdriver). In today's legal climate, I don't know if he'd have dared.
One possibility would be to accept either the correct configuration or the 15-14 configuration as the goal; every position can reach exactly one of the two.
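A small sketch of picking the right target up front rather than searching for both (names hypothetical, same assumed tuple encoding). The two goals differ by a single swap, so they fall into opposite classes of the move-invariant parity, and the start position's class decides which one it can reach.

```python
N = 4
GOAL = tuple(range(1, N * N)) + (0,)
ALT_GOAL = GOAL[:-3] + (15, 14, 0)  # Loyd's arrangement: 14 and 15 exchanged


def parity_class(state):
    """Move-invariant: (permutation inversions + blank's taxicab distance home) % 2."""
    ranks = [t if t else N * N for t in state]
    inversions = sum(a > b for i, a in enumerate(ranks) for b in ranks[i + 1:])
    r, c = divmod(state.index(0), N)
    return (inversions + (N - 1 - r) + (N - 1 - c)) % 2


def reachable_goal(state):
    """Return whichever of GOAL / ALT_GOAL lies in the same parity class as state."""
    return GOAL if parity_class(state) == parity_class(GOAL) else ALT_GOAL
```

The search then runs against `reachable_goal(start)`, and the heuristic must measure distances to that target rather than to the standard goal.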
Use IDA* instead of A*. It needs much less memory, because it only keeps the current path instead of an open list. As a heuristic, the "walking distance" developed by Ken'ichiro Takahashi is much more effective, while using only about 25 kB of memory.
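A minimal IDA* sketch, assuming the same row-major tuple with 0 as the blank. Walking distance is not implemented here: Manhattan distance stands in as a placeholder, and any admissible function of the board can be passed as `heuristic`. Each round is a depth-first search cut off where `g + h` exceeds the bound, and the next bound is the smallest value that exceeded it.

```python
N = 4
GOAL = tuple(range(1, N * N)) + (0,)
MOVES = ((-1, 0), (1, 0), (0, -1), (0, 1))


def manhattan(board):
    """Placeholder heuristic; a walking-distance function would plug in here."""
    total = 0
    for idx, tile in enumerate(board):
        if tile:
            gr, gc = divmod(tile - 1, N)
            total += abs(idx // N - gr) + abs(idx % N - gc)
    return total


def ida_star(start, heuristic=manhattan):
    """Return a shortest list of states from start to GOAL (start must be solvable)."""
    board = list(start)
    goal = list(GOAL)
    path = [tuple(board)]  # the only memory kept besides the recursion stack

    def search(g, bound, blank, prev_blank):
        f = g + heuristic(board)
        if f > bound:
            return f
        if board == goal:
            return True
        smallest = float("inf")
        r, c = divmod(blank, N)
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < N and 0 <= nc < N):
                continue
            nxt = nr * N + nc
            if nxt == prev_blank:
                continue  # never undo the previous move
            board[blank], board[nxt] = board[nxt], board[blank]
            path.append(tuple(board))
            result = search(g + 1, bound, nxt, blank)
            if result is True:
                return True
            path.pop()
            board[blank], board[nxt] = board[nxt], board[blank]
            smallest = min(smallest, result)
        return smallest

    bound = heuristic(board)
    while True:
        result = search(0, bound, board.index(0), -1)
        if result is True:
            return path
        bound = result  # on unsolvable input this grows forever, so check parity first
```

Because every round restarts from scratch, a stronger heuristic pays off twice: fewer rounds and smaller trees per round.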
Here and here are English translations.