How to make a picture grid of all Nobel Laureates in physics?

Here's something to get you started down to path of scraping the somewhat larger individual pictures from the Nobel website:

links = Import[
   "https://www.nobelprize.org/nobel_prizes/physics/laureates/index.html?images=yes", "Hyperlinks"];

individualpagelinks =
  Select[
   links,
   StringMatchQ[
    "https://www.nobelprize.org/nobel_prizes/physics/laureates/" ~~ NumberString ~~ "/" ~~ name__ ~~ "-facts.html"]
   ];

postcardpictures =
  StringCases[
     individualpagelinks,
     "https://www.nobelprize.org/nobel_prizes/physics/laureates/" ~~ year : NumberString ~~ "/" ~~ name__ ~~ "-facts.html"
      :>
      "https://www.nobelprize.org/nobel_prizes/physics/laureates/" <> year <> "/" <> name <> "_postcard.jpg"
     ] // Flatten // DeleteDuplicates;

Import /@ postcardpictures[[1 ;; 5]]

sample pictures

I found it easier to extract the rationale for the prizes from the Wikipedia table of Nobel Prize winners in physics:

wikidata = Import[
             "https://en.wikipedia.org/wiki/List_of_Nobel_laureates_in_Physics", 
             "Data"
           ];
Cases[
  wikidata,
  {year_, name_, _, rationale_}
  :>
  {year, StringDelete[rationale, {Whitespace ~~ "[" ~~ NumberString ~~ "]", "\""}]},
  Infinity
][[2 ;; -2]]

(* Out: 
{
 {1901, "in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him"}, 
 {1902, "in recognition of the extraordinary service they rendered by their researches into the influence of magnetism upon radiation phenomena"}, 
...
}
*)

Some manual cleanup will be necessary here: the somewhat naive method I proposed is confused by nested tables...

Using jSoupLink:

<< jSoupLink`
ParseHTML[
  "https://www.nobelprize.org/nobel_prizes/physics/laureates/1921/einstein-facts.html",
  ".laureate_info_wrapper p",
  "text"
  ] // TableForm

Mathematica graphics

It is possible to be more precise:

ParseHTML[
 "https://www.nobelprize.org/nobel_prizes/physics/laureates/1921/einstein-facts.html",
 "span[itemprop=birthDate]",
 "text"
 ]

{"14 March 1879"}

I don't intend to explain how I figure out the CSS rules but there are a lot of things you can do quite easily with this jSoupLink if you know how. You could write a script that starts from this directory of Nobel Prizes and recursively collect data from all laureates, for example.

Thanks to MarcoB and C. E. From them I learned how to deal with HTML contents using Mathematica.

I now summerize my final approach below (it is a bit long, so I make it an answer).

In this approach, I use information all from www.nobelprize.org and mathematica features that are all built-in.

individualpagelinks is a list of all Nobel Laureates information page hyperlinks(I learned from MarcoB)

links = DeleteDuplicates@
   Import["https://www.nobelprize.org/nobel_prizes/physics/laureates/\
index.html?images=yes", "Hyperlinks"];
individualpagelinks = 
  Select[links, 
   StringMatchQ[
    "https://www.nobelprize.org/nobel_prizes/physics/laureates/" ~~ 
     NumberString ~~ "/" ~~ __ ~~ "-facts.html"]];

To fetch essential information on "...-facts.html". The key is to import with option "XMLObject". like this

Import[individualpagelinks[[1]], "XMLObject"]

To know which expression contains the information you want, just Ctrl+F and search, for example search "birth" in the output cell, and you can find XMLElement["span", {"itemprop" -> "birthDate"}, {"9 March 1959"}] contains the information

Then use Cases to get all information you need.

In getData, The order of information: image, year, given name, family name, birth date, birth place, death date, death place, prize motivation, fields(if it exists)

Clear[getData];
getData[link_] := Module[{data},
  data = Import[link, "XMLObject"];
  {Import[StringReplace[link, "-facts.html" -> "_postcard.jpg"]],
   StringTrim /@ {StringCases[link, NumberString][[1]],
     Cases[data, 
       XMLElement["span", {"itemprop" -> "givenName"}, {x_}] -> x, 
       Infinity][[1]],
     Cases[data, 
       XMLElement["span", {"itemprop" -> "familyName"}, {x_}] -> x, 
       Infinity][[1]],
     Cases[data, {XMLElement["strong", {}, {"Born:"}], ___}, 
       Infinity][[1, 3, -1, 1]],
     StringSplit[
      Cases[data, {XMLElement["strong", {}, {"Born:"}], ___}, 
        Infinity][[1, -1]], ","],
     Sequence@
      If[tmp = 
        Cases[data, {XMLElement["strong", {}, {"Died:"}], ___}, 
         Infinity]; tmp =!= {},
       {tmp[[1, 3, -1, 1]], StringSplit[tmp[[1, -1]], ","]}, "live"],
     Cases[
       data, {XMLElement["strong", {}, {"Prize motivation:"}], x_} -> 
        x, Infinity][[1]],
     If[tmp = 
       Cases[data, {XMLElement["strong", {}, {"Field:"}], x_} -> x, 
        Infinity]; tmp =!= {}, tmp[[1]], Nothing]}}]

labeledPicture label image with year, name, country of birth

labeledPicture[dataEntry_] := Labeled[dataEntry[[1]],
  Column[{dataEntry[[2, 1]], 
    dataEntry[[2, 2]] <> " " <> dataEntry[[2, 3]], 
    " (" <> dataEntry[[2, 5, -1]] <> ")"}, "Center"]]

Here is an example with recent 10 Nobel Laureates in physics

data = ParallelMap[getData, individualpagelinks[[1 ;; 10]]];
Grid[Partition[labeledPicture /@ data, 5, 5, {1, 1}, {}]]

This will give

enter image description here

How to make a picture grid of all Nobel Laureates in physics?

Tags:

Web Access

Wolfram Alpha Queries

Data Acquisition

Related

Recent Posts