How do I delete files in an HDFS directory after reading them using Scala?
fileStream already handles that for you - from its Scaladoc:
Create an input stream that monitors a Hadoop-compatible filesystem for new files and reads them using the given key-value types and input format.
This means that fileStream only loads new files (created after the streaming context was started); any files that already existed in the folder before you started your streaming application are ignored.
If you want to delete the files yourself after processing, you can use the Hadoop FileSystem API:
import org.apache.hadoop.fs.{FileSystem, Path}

// Obtain a FileSystem handle from the Spark context's Hadoop configuration
val fs = FileSystem.get(sc.hadoopConfiguration)
val outputPath = new Path("/abc")

if (fs.exists(outputPath)) {
  // The second argument (true) deletes the directory recursively
  fs.delete(outputPath, true)
}
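Putting the two pieces together, here is a minimal sketch of deleting consumed files at the end of each batch inside foreachRDD. The directory names (/abc/incoming, /abc/processed) are hypothetical, and note the caveat in the comments: files landing between processing and deletion could be lost, so moving files to an archive directory is often safer than deleting.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical setup: a streaming context with a 10-second batch interval
val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.textFileStream("/abc/incoming")

lines.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // ... process the batch, e.g. write results somewhere ...
    rdd.saveAsTextFile("/abc/processed/" + System.currentTimeMillis())

    // After the batch succeeds, remove the consumed input files.
    // Caveat: files arriving between processing and deletion would be
    // deleted unread; consider fs.rename(...) into an archive dir instead.
    val fs = FileSystem.get(rdd.sparkContext.hadoopConfiguration)
    fs.listStatus(new Path("/abc/incoming")).foreach { status =>
      fs.delete(status.getPath, false)
    }
  }
}

ssc.start()
ssc.awaitTermination()
```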