How to get a random line of a text file in Java?
Either you
read the file twice - once to count the number of lines, the second time to extract a random line, or
use reservoir sampling
Reading the entire file if you want only one line seems a bit excessive. The following should be more efficient:
- Use RandomAccessFile to seek to a random byte position in the file.
- Seek left and right to the next line terminator. Let L the line between them.
- With probability (MIN_LINE_LENGTH / L.length) return L. Otherwise, start over at step 1.
This is a variant of rejection sampling.
Line lengths include the line terminator character(s), hence MIN_LINE_LENGTH >= 1. (All the better if you know a tighter bound on line length).
It is worth noting that the runtime of this algorithm does not depend on file size, only on line length, i.e. it scales much better than reading the entire file.
Here's a solution. Take a look at the choose() method which does the real thing (the main() method repeatedly exercises choose(), to show that the distribution is indeed quite uniform).
The idea is simple: when you read the first line it has a 100% chance of being chosen as the result. When you read the 2nd line it has a 50% chance of replacing the first line as the result. When you read the 3rd line it has a 33% chance of becoming the result. The fourth line has a 25%, and so on....
import java.io.*;
import java.util.*;
public class B {
public static void main(String[] args) throws FileNotFoundException {
Map<String,Integer> map = new HashMap<String,Integer>();
for(int i = 0; i < 1000; ++i)
{
String s = choose(new File("g:/temp/a.txt"));
if(!map.containsKey(s))
map.put(s, 0);
map.put(s, map.get(s) + 1);
}
System.out.println(map);
}
public static String choose(File f) throws FileNotFoundException
{
String result = null;
Random rand = new Random();
int n = 0;
for(Scanner sc = new Scanner(f); sc.hasNext(); )
{
++n;
String line = sc.nextLine();
if(rand.nextInt(n) == 0)
result = line;
}
return result;
}
}