SQL query with comments import into R from file

Are you sure you can't just use it as is? This works despite taking up multiple lines and having a comment:

> library(sqldf)
> sql <- "select * -- my select statement
+ from BOD
+ "
> sqldf(sql)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

This works too:

> sql2 <- c("select * -- my select statement", "from BOD")
> sql2.paste <- paste(sql2, collapse = "\n")
> sqldf(sql2.paste)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

I had trouble with the other answer, so I modified Roman's and made a little function. This has worked for all my test cases, including multiple comments, single-line and partial-line comments.

read.sql <- function(filename, silent = TRUE) {
    q <- readLines(filename, warn = !silent)
    q <- q[!grepl(pattern = "^\\s*--", x = q)] # remove full-line comments
    q <- sub(pattern = "--.*", replacement="", x = q) # remove midline comments
    q <- paste(q, collapse = " ")
    return(q)
}

Summary

Function clean_query:

  • Removes all mixed comments
  • Creates single string output
  • Takes a SQL path or text string
  • Is simple

Function

require(tidyverse)

# pass in either a text query or path to a sql file
clean_query <- function( text_or_path = '//example/path/to/some_query.sql' ){


  # if sql path, read, otherwise assume text input
  if( str_detect(text_or_path, "(?i)\\.sql$") ){

    text_or_path <- text_or_path %>% read_lines() %>% str_c(sep = " ", collapse = "\n")

  }


  # echo original query to the console 
  #  (unnecessary, but helpful for status if passing sequential queries to a db)
  cat("\nThe query you're processing is: \n", text_or_path, "\n\n")


  # return
  text_or_path %>% 
    # remove all demarked /*  */ sql comments 
    gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>% 
    # remove all demarked -- comments 
    gsub(pattern = '--[^\r\n]*', replacement = ' ') %>% 
    # remove everything after the query-end semicolon 
    gsub(pattern = ';.*', replacement = ' ') %>% 
    #remove any line break, tab, etc.
    gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%  
    # remove extra whitespace 
    gsub(pattern = ' +', replacement = ' ') 

}

You could attach regexps together if you want incomprehensibly long expressions, but I recommend readable code.



Output for "query6.sql"

[1] " select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1 "



Additional Text Input Example

query <- "

    /* this query has 
    intentionally messy 
    comments
    */

    Select 
       COL_A -- with a comment here
      ,COL_B
      ,COL_C
    FROM 
      -- and some helpful comment here
      Database.Datatable
    ;
    -- or wherever

    /* and some more comments here */

"

Call function:

clean_query(query)

Output:

[1] " Select COL_A ,COL_B ,COL_C FROM Database.Datatable "



If you want to test reading from a .sql file:

temp_path <- path.expand("~/query.sql")

cat(query, file = temp_path)

clean_query(temp_path)

file.remove(temp_path)

Tags:

Sql

R