SQL query with comments import into R from file
Are you sure you can't just use it as is? This works despite taking up multiple lines and having a comment:
> library(sqldf)
> sql <- "select * -- my select statement
+ from BOD
+ "
> sqldf(sql)
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
This works too:
> sql2 <- c("select * -- my select statement", "from BOD")
> sql2.paste <- paste(sql2, collapse = "\n")
> sqldf(sql2.paste)
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
I had trouble with the other answer, so I modified Roman's and made a little function. This has worked for all my test cases, including multiple comments, single-line and partial-line comments.
read.sql <- function(filename, silent = TRUE) {
q <- readLines(filename, warn = !silent)
q <- q[!grepl(pattern = "^\\s*--", x = q)] # remove full-line comments
q <- sub(pattern = "--.*", replacement="", x = q) # remove midline comments
q <- paste(q, collapse = " ")
return(q)
}
Summary
Function clean_query
:
- Removes all mixed comments
- Creates single string output
- Takes a SQL path or text string
- Is simple
Function
require(tidyverse)
# pass in either a text query or path to a sql file
clean_query <- function( text_or_path = '//example/path/to/some_query.sql' ){
# if sql path, read, otherwise assume text input
if( str_detect(text_or_path, "(?i)\\.sql$") ){
text_or_path <- text_or_path %>% read_lines() %>% str_c(sep = " ", collapse = "\n")
}
# echo original query to the console
# (unnecessary, but helpful for status if passing sequential queries to a db)
cat("\nThe query you're processing is: \n", text_or_path, "\n\n")
# return
text_or_path %>%
# remove all demarked /* */ sql comments
gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>%
# remove all demarked -- comments
gsub(pattern = '--[^\r\n]*', replacement = ' ') %>%
# remove everything after the query-end semicolon
gsub(pattern = ';.*', replacement = ' ') %>%
#remove any line break, tab, etc.
gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%
# remove extra whitespace
gsub(pattern = ' +', replacement = ' ')
}
You could attach regexps together if you want incomprehensibly long expressions, but I recommend readable code.
Output for "query6.sql"
[1] " select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1 "
Additional Text Input Example
query <- "
/* this query has
intentionally messy
comments
*/
Select
COL_A -- with a comment here
,COL_B
,COL_C
FROM
-- and some helpful comment here
Database.Datatable
;
-- or wherever
/* and some more comments here */
"
Call function:
clean_query(query)
Output:
[1] " Select COL_A ,COL_B ,COL_C FROM Database.Datatable "
If you want to test reading from a .sql file:
temp_path <- path.expand("~/query.sql")
cat(query, file = temp_path)
clean_query(temp_path)
file.remove(temp_path)