How to speed up computation of medoids?

If you're on 10.3+, this should be faster (it's roughly two orders of magnitude faster than your first example on my laptop):

medioid=With[{m = #, d = Tr /@ DistanceMatrix[#, DistanceFunction ->SquaredEuclideanDistance]},
             {First@m[[#]], First@#} &@Pick[Range@Length@d, d, Min@d]]&;
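
For completeness, a quick usage sketch (the random test data here is my own assumption, mirroring the cluster used in the other answers; the result is {time, {medoid point, its index}}):

SeedRandom[0];
cluster = RandomReal[{0, 1}, {5000, 14}];
medioid[cluster] // AbsoluteTiming
(* {time, {medoid point, index of that point}} *)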

Update: You can go even faster with Compile, exploiting the Listable runtime attribute and the Parallelization option to great effect if you have a multi-core machine:

SeedRandom[0];
cluster = RandomReal[{0, 1}, {5000, 14}];

(* squared distances from one point to every point in the cluster;
   with Listable and Parallelization the map over all points runs in parallel *)
myDistMatrix = Compile[{{point, _Real, 1}, {tr, _Real, 2}}, 
                 Total[(point - tr)^2], 
                 RuntimeOptions -> "Speed", 
                 RuntimeAttributes -> {Listable}, 
                 Parallelization -> True,
                 CompilationTarget -> "C"];
medioidCompile[cluster_] := 
  Module[{distances, indexOfMin}, 
   distances = Tr /@ myDistMatrix[cluster, Transpose[cluster]];
   indexOfMin = First[Ordering[distances]];
   {cluster[[indexOfMin]], indexOfMin}];

AbsoluteTiming[medioidCompile[cluster];]
(* 0.35 seconds *)
(* Original: 29.05 seconds *)
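
As a sanity check (a small sketch of my own, not part of the timing comparison), the compiled version should agree with a brute-force computation on a small sample:

small = RandomReal[{0, 1}, {200, 14}];
bruteIndex = First@Ordering[Total /@ Outer[SquaredEuclideanDistance, small, small, 1], 1];
Last[medioidCompile[small]] == bruteIndex
(* should give True *)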

If you have Mathematica 10.3, you can use the experimental function DistanceMatrix for a speed-up of 50x:

medioidFast[cluster_] := 
  Module[{distances, indexOfMin}, 
   distances = Tr/@DistanceMatrix[cluster, DistanceFunction -> SquaredEuclideanDistance];
   indexOfMin = First[Ordering[distances]];
   {cluster[[indexOfMin]], indexOfMin}];

AbsoluteTiming[medioidFast[cluster];]
(* 0.521 seconds *)
(* Original: 29.05 seconds *)

(* Also, for fun -- medioidCiao is ciao's one-liner from above: *)
medioidFast[cluster]; // RepeatedTiming (* 0.522 seconds *)
medioidCiao[cluster]; // RepeatedTiming (* 0.522 seconds *)

For versions prior to 10.3, or if you don't want to use an experimental function, the following solution isn't bad. Taking inspiration from Leonid's answer here, you can achieve a speed-up of 12x:

medioid[cluster_] := Module[{distances, indexOfMin}, 
         distances = Tr/@With[{tr = Transpose[cluster]}, 
               Function[point, Total[(point - tr)^2]] /@ cluster];
         indexOfMin = First[Ordering[distances, 1]]; 
         {cluster[[indexOfMin]], indexOfMin}];

AbsoluteTiming[medioid[cluster];]
(* 2.44 seconds *)
(* Original: 29.05 seconds *)

This compiled approach is much faster than your original:

(* full matrix of pairwise squared Euclidean distances *)
distMatCompiled = Compile[{{cluster, _Real, 2}},
  Outer[Function[diff, diff.diff][#1 - #2] &, cluster, cluster, 1, 1]
  , CompilationTarget -> "C"
  ]

medoidCompiled[cluster_] := Block[{distances, indexOfMin},
  distances = Total@distMatCompiled[cluster];
  indexOfMin = First[Ordering[distances, 1]];
  {cluster[[indexOfMin]], indexOfMin}
  ]

On my machine:

SeedRandom[0];
cluster = RandomReal[{0, 1}, {5000, 14}];

(* medoidSimple is the uncompiled implementation from the question *)
AbsoluteTiming[m1 = medoidSimple[cluster]]
(* 48.116032 *)

AbsoluteTiming[m2 = medoidCompiled[cluster]]
(* 6.779268 *)

m1 == m2
(* True *)

The reason you couldn't compile efficiently in your example is that SquaredEuclideanDistance is not in the list of compilable functions.
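
One way to see this for yourself (a diagnostic sketch; note that Compile`CompilerFunctions[] is undocumented, so treat its output accordingly):

Needs["CompiledFunctionTools`"];
cf = Compile[{{a, _Real, 1}, {b, _Real, 1}}, SquaredEuclideanDistance[a, b]];
CompilePrint[cf]
(* the instruction listing contains a MainEvaluate call, i.e. a callback to the ordinary evaluator *)

MemberQ[Compile`CompilerFunctions[], SquaredEuclideanDistance]
(* False *)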

UPDATE:

Just for fun I wanted to implement your optimization idea:

medoidCompiled2 = Compile[{{cluster, _Real, 2}},
  Block[{len = Length[cluster], bestInd = 1, bestVal, partsum, j},
   (* total squared distance from point 1 serves as the initial bound *)
   bestVal = 
    Total[Table[
      Function[x, x.x][cluster[[1]] - cluster[[i]]], {i, len}]];
   Do[
    partsum = 0.;
    j = 1;
    (* accumulate distances for point i, bailing out once the running
       sum exceeds the best total found so far *)
    While[j <= len && partsum < bestVal,
     partsum += Function[x, x.x][cluster[[i]] - cluster[[j]]];
     j++
     ];
    If[j == len + 1 && partsum < bestVal, bestVal = partsum; 
     bestInd = i]
    , {i, 2, len}
    ];
   bestInd
   ]
  , CompilationTarget -> "C"
  ]

It performs a bit better than my first attempt:

AbsoluteTiming[m3 = medoidCompiled2[cluster]]
(* 4.286030 *)

Last[m1] == m3
(* True *)

Of course this cannot beat the insane timings ciao and blochwave have come up with ;)