Using R to Analyze Balance Sheets and Income Statements

You are making the common mistake of confusing 'access to Yahoo or Google data' with 'everything I see on Yahoo or Google Finance can be downloaded'.

When R functions download historical stock price data, they almost always access an interface explicitly designed for this purpose as e.g. a cgi handler providing csv files given a stock symbol and start and end date. So this easy as all we need to do is form the appropriate query, hit the webserver, fetch the csv file an dparse it.

Now balance sheet information is (as far as I know) not available in such an interface. So you will need to 'screen scrape' and parse the html directly.

It is not clear that R is the best tool for this. I am aware of some Perl modules for the purpose of getting non-time-series data off Yahoo Finance but have not used them.


The question you want to ask, and get an answer to!, is where can I get free XBRL data for analysing corporate balance sheets, and is there a library for consuming such data in R?

XBRL (Extensible Business Reporting Language - http://en.wikipedia.org/wiki/XBRL) is a standard for marking up accounting statments (income statements, balance sheets, profit & loss statements) in XML format such that they can easily be parsed by computer and put into a spreadsheet.

As far as I know, a lot of corporate regulators (e.g. the SEC in the US, ASIC in Australia) are encouraging the companies under their jurisdiction to report using such a format, or running pilots, but I don't believe it has been mandated at this point. If you limited your investment universe (I am assuming you want this data in electronic format for investment purposes) to firms that have made their quarterly reports freely available in XBRL form, I expect you will have a pretty short list of firms to invest in!

Bloomberg, Reuters et al all have pricey feeds for obtaining corporate fundamental data. There may also be someone out there running a tidy business publishing balance sheets in XBRL format. Cheaper, but still paid for, are XIgnite's xFundamentals and xGlobalFundamentals web services, but you aren't getting full balance sheet data from them.


to read-in the financial information try this function ( I picked it up several months ago and made some small adjustments)

require(XML)
require(plyr)

getKeyStats_xpath <- function(symbol) {
  yahoo.URL <- "http://finance.yahoo.com/q/ks?s="
  html_text <- htmlParse(paste(yahoo.URL, symbol, sep = ""), encoding="UTF-8")

  #search for <td> nodes anywhere that have class 'yfnc_tablehead1'
  nodes <- getNodeSet(html_text, "/*//td[@class='yfnc_tablehead1']")

  if(length(nodes) > 0 ) {
    measures <- sapply(nodes, xmlValue)

    #Clean up the column name
    measures <- gsub(" *[0-9]*:", "", gsub(" \\(.*?\\)[0-9]*:","", measures))   

    #Remove dups
    dups <- which(duplicated(measures))
    #print(dups) 
    for(i in 1:length(dups)) 
      measures[dups[i]] = paste(measures[dups[i]], i, sep=" ")

    #use siblings function to get value
    values <- sapply(nodes, function(x)  xmlValue(getSibling(x)))

    df <- data.frame(t(values))
    colnames(df) <- measures
    return(df)
  } else {
    break
  }
}

to use it, compare for example 3 companies and write the data into a csv-file do the following:

tickers <- c("AAPL","GOOG","F")
stats <- ldply(tickers, getKeyStats_xpath)
rownames(stats) <- tickers
write.csv(t(stats), "FinancialStats_updated.csv",row.names=TRUE) 

Just tried it. Still working.

UPDATE as Yahoo changed it’s web site layout:

The function above does not work anymore as Yahoo again changed its web site layout. Fortunately its still easy to get the financial infos as the tags for getting fundamental data have not been changed. example for downloading a file with eps and P/E ratio for MSFT, AAPL and Ford insert the following into your browser:

http://finance.yahoo.com/d/quotes.csv?s=MSFT+AAPL+F&f=ser

and after entering the above URL into your browser’s address bar and hitting return/enter. The CSV will be automatically downloaded to your computer and you should get the cvs file as shown below (data as 7/22/2016):

enter image description here

some yahoo tags for fundamental data:

enter image description here


I have found on the net only a partial solution to your issue, for I managed to retrieve only the Balance sheet info and financial statement for one year. I don't know how to do it for more years. There is a package in R, called quantmod, which you can install from CRAN

install.packages('quantmod')

Then you can do the following: Suppose you want to get the financial info from a company listed at NYSE : General Electric. ticker: GE

 library(quantmod)
 getFinancials('GE')
 viewFinancials(GE.f)

To get only the income statement, reported anually, as a data frame use this:

viewFinancials(GE.f, "IS", "A")

Please let me know if you find out how to do this for multiple years.

Tags:

R

Finance