A Nice, Simple Web Scraping Code Golf

Bash + lynx, 73 bytes

lynx -dump http://nyti.ms/2BY72Ph|grep -Po '\[(1[6-9]|2[0-3])]\K[^.]*...'

Output is as follows:

General Electric Co          10.11
Advanced Micro Devices Inc   33.13
PG&E Corp                    6.14 
Fiat Chrysler Automobiles NV 14.98
Neuralstem Inc               1.83 
AT&T Inc                     38.20
GrubHub Inc                  34.00
Twitter Inc                  29.86
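
The one-liner above comes without an explanation, so here is a rough offline sketch of what the pattern appears to do. `lynx -dump` prefixes each link with a bracketed number, and the regex seems to rely on the eight stock links being numbered 16 through 23 when the answer was written. The sample text, its link numbers, and the `extract` helper are hypothetical stand-ins rather than real page output, and GNU grep with PCRE support (`-P`) is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a few lines of `lynx -dump` output; the real page
# layout and link numbers are assumptions inferred from the regex above.
sample='  [15]Most Active
  [16]General Electric Co  10.11  +1.20%
  [17]Advanced Micro Devices Inc  33.13  -0.45%
  [24]Some Other Link  99.99'

# \[(1[6-9]|2[0-3])]  match a link marker numbered 16 through 23
# \K                  drop the marker from the reported match
# [^.]*...            everything up to the first period, then three more
#                     characters (the decimal point and two price digits)
extract() {
  grep -Po '\[(1[6-9]|2[0-3])]\K[^.]*...'
}

result=$(printf '%s\n' "$sample" | extract)
printf '%s\n' "$result"
```

Only the lines marked [16] and [17] survive, each trimmed to the name and price. One consequence of `[^.]*` is that a company name containing a period would be cut short.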

R, 97 bytes

library(rvest)
matrix(html_text(html_nodes(html("http://nyti.ms/2BY72Ph"),"td"))[2:25],,3,T)[,-3]

This gets all the td elements of the page. Elements 2 to 25 contain everything we need plus the % change. Filling a 3-column matrix by row (the `,,3,T` arguments) puts the % change in the 3rd column, which `[,-3]` then drops.

Outputs as a character matrix, since this is fewer bytes than a data frame:

     [,1]                  [,2]   
[1,] "General Electric Co" "10.11"
[2,] "PG&E Corp"           "6.14" 
[3,] "Neuralstem Inc"      "1.83" 
[4,] "GrubHub Inc"         "34.00"
[5,] "Twitter Inc"         "29.86"
[6,] "Zynga Inc"           "6.21" 
[7,] "Iveric Bio Inc"      "3.56" 
[8,] "Pfizer Inc"          "38.48"
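
The cell selection described above (skip the first td, group the rest three per row, discard the third column) can be sketched without R. This is a loose illustration in shell with awk, not a port of the answer: the `cells` sample, its header cell and percentage values, and the `reshape` helper are all hypothetical, and only two rows are shown instead of the page's eight.

```shell
#!/usr/bin/env bash
# Hypothetical <td> contents, one cell per line; the leading header cell and
# the percentage values are made up for illustration.
cells='Header
General Electric Co
10.11
+1.20%
PG&E Corp
6.14
-0.80%'

# Skip cell 1, treat the remaining cells as rows of three
# (name, price, % change), and print only the first two columns.
reshape() {
  awk 'NR >= 2 {
         col = (NR - 2) % 3          # 0 = name, 1 = price, 2 = % change
         if (col == 0) name = $0
         else if (col == 1) print name "\t" $0
       }'
}

table=$(printf '%s\n' "$cells" | reshape)
printf '%s\n' "$table"
```

Each output line holds a tab-separated name and price, which corresponds to one row of the character matrix.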

PHP, 154 153 130 bytes

foreach(DOMDocument::loadHTMLFile('http://nyti.ms/2BY72Ph')->getElementsByTagName(td)as$i)$x<24&&print$x++%3?"$i->nodeValue ":"
";

-23 bytes thx to @Night2!

$ php nyt.php

Agile Therapeutics Inc 1.38
Advanced Micro Devices Inc 34.05
Facebook Inc 191.82
Kraft Heinz Co 32.34
Sirius XM Holdings Inc 6.72
Apple Inc 247.40
Zynga Inc 6.24
Snap Inc 15.11
