Check line for unprintable characters while reading text file

Open the file with a FileInputStream, then use an InputStreamReader with the UTF-8 Charset to read characters from the stream, and use a BufferedReader to read lines, e.g. via BufferedReader#readLine, which will give you a string. Once you have the string, you can check for characters that aren't what you consider to be printable.

E.g. (without error checking), using try-with-resources (which is in vaguely modern Java version):

String line;
try (
    InputStream fis = new FileInputStream("the_file_name");
    InputStreamReader isr = new InputStreamReader(fis, Charset.forName("UTF-8"));
    BufferedReader br = new BufferedReader(isr);
) {
    while ((line = br.readLine()) != null) {
        // Deal with the line
    }
}

While it's not hard to do this manually using BufferedReader and InputStreamReader, I'd use Guava:

List<String> lines = Files.readLines(file, Charsets.UTF_8);

You can then do whatever you like with those lines.

EDIT: Note that this will read the whole file into memory in one go. In most cases that's actually fine - and it's certainly simpler than reading it line by line, processing each line as you read it. If it's an enormous file, you may need to do it that way as per T.J. Crowder's answer.

Tags:

Java

File

File Io