How to parse non-standard month names with DateTimeFormatter
You could use a DateTimeFormatterBuilder:
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;
import java.util.stream.Stream;
private static final DateTimeFormatter formatter = new DateTimeFormatterBuilder()
        .appendOptional(DateTimeFormatter.ofPattern("d. MMM. HH:mm"))  // short month name followed by a dot, e.g. "10. Jan. 18:14"
        .appendOptional(DateTimeFormatter.ofPattern("d. MMMM HH:mm"))  // full month name without a dot, e.g. "5. Juni 08:25"
        .toFormatter(Locale.GERMAN);
Running it on this:
Stream.of(("10. Jan. 18:14\n" +
        "8. Feb. 19:02\n" +
        "1. Mär. 19:40\n" +
        "4. Apr. 18:55\n" +
        "2. Mai 21:55\n" +
        "5. Juni 08:25\n" +
        "5. Juli 20:09\n" +
        "1. Aug. 13:42").split("\n"))
        .map(formatter::parse)
        .forEach(System.out::println);
you get one parsed TemporalAccessor per line, holding the day of month, the month of year, the hour and the minute: day 10 of month 1 at 18:14 for the first line, day 5 of month 6 at 08:25 for "5. Juni 08:25", and so on up to day 1 of month 8 at 13:42 for the last line. Note that the time part must be HH:mm (hours and minutes); HH:ss would read the minutes as seconds.
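Which lines the two patterns accept depends on the German month names in the JDK's locale data. Since JDK 9 the default source is CLDR (JEP 252), whose German abbreviations may differ from the older JDK data (for instance in whether a trailing dot is part of the abbreviation), so the result above may not be reproducible on every JDK. A small diagnostic sketch (the class name GermanMonthNames is made up for illustration) that prints the names your JDK actually uses for MMM (TextStyle.SHORT) and MMMM (TextStyle.FULL):

```java
import java.time.Month;
import java.time.format.TextStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Prints the German month names the running JDK uses for the MMM and MMMM pattern letters.
class GermanMonthNames {

    // TextStyle.SHORT is what "MMM" uses, TextStyle.FULL is what "MMMM" uses.
    static List<String> names(TextStyle style) {
        List<String> result = new ArrayList<>();
        for (Month month : Month.values()) {
            result.add(month.getDisplayName(style, Locale.GERMAN));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> shortNames = names(TextStyle.SHORT);
        List<String> fullNames = names(TextStyle.FULL);
        for (int i = 0; i < shortNames.size(); i++) {
            // Brackets make a trailing dot (or its absence) easy to spot
            System.out.println("[" + shortNames.get(i) + "] [" + fullNames.get(i) + "]");
        }
    }
}
```

If the short names printed already end in a dot, the pattern "d. MMM." expects a second dot after the name, so that branch would no longer match input such as "10. Jan. 18:14".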
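You can regex-replace the month portion so that it is always three characters long, and then parse the result using "d. MMM HH:mm" (with Locale.GERMAN):
text = text.replaceFirst("^(\\S+\\s\\S{3})\\S*", "$1");
Explanation of the regex: at the start of the text (^), find one or more non-whitespace characters (\S+), followed by one whitespace character (\s), followed by three non-whitespace characters (\S{3}), followed by any further non-whitespace characters (\S*), and replace the whole match with the part inside the first pair of parentheses ($1).
For example, 10. Jan. 18:14 becomes 10. Jan 18:14, and 5. Juni 08:25 becomes 5. Jun 08:25. Because \S* can also match nothing, a line whose month is already three characters long, such as 2. Mai 21:55, is left unchanged.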
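Parsing the normalized text with "d. MMM HH:mm" and Locale.GERMAN still assumes that the JDK's German abbreviations are exactly these three-letter forms, which depends on the locale data in use. A sketch that keeps the regex step but replaces the locale lookup with an explicit map of three-letter prefixes via DateTimeFormatterBuilder.appendText(TemporalField, Map); the class name, the prefix list and the default year 2016 are assumptions for illustration:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Map;

// Sketch: normalize the month to three characters with the regex, then parse it
// with a fixed map of three-letter prefixes instead of the JDK's German locale data.
class NormalizedMonthParser {

    // Assumed prefixes; the umlaut in the March entry is written as a Unicode escape
    private static final String[] PREFIXES = {
        "Jan", "Feb", "M\u00e4r", "Apr", "Mai", "Jun",
        "Jul", "Aug", "Sep", "Okt", "Nov", "Dez"
    };

    // Declared after PREFIXES, so the array is already initialized when this runs
    private static final DateTimeFormatter FORMATTER = buildFormatter();

    private static DateTimeFormatter buildFormatter() {
        Map<Long, String> names = new HashMap<>();
        for (int i = 0; i < PREFIXES.length; i++) {
            names.put(i + 1L, PREFIXES[i]); // month-of-year values are 1..12
        }
        return new DateTimeFormatterBuilder()
                .appendPattern("d. ")
                .appendText(ChronoField.MONTH_OF_YEAR, names)
                .appendPattern(" HH:mm")
                .parseDefaulting(ChronoField.YEAR, 2016) // assumed year; the input has none
                .toFormatter();
    }

    // Keeps only the first three characters of the month token, e.g. "Juni" -> "Jun", "Jan." -> "Jan"
    static String normalize(String text) {
        return text.replaceFirst("^(\\S+\\s\\S{3})\\S*", "$1");
    }

    static LocalDateTime parse(String text) {
        return LocalDateTime.parse(normalize(text), FORMATTER);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"10. Jan. 18:14", "2. Mai 21:55", "5. Juni 08:25"}) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```

The design trade-off: the regex throws away the distinction between short and long names, so a single three-letter map covers both spellings.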
As pointed out, it would be easier to use a standard, consistent format; here you are mixing long and short month names.
One option (short of using a DateTimeFormatterBuilder) is to handle both cases separately:
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.Locale;
private static final DateTimeFormatter SHORT_MONTH = DateTimeFormatter.ofPattern("d. MMM. HH:mm", Locale.GERMAN);
private static final DateTimeFormatter LONG_MONTH = DateTimeFormatter.ofPattern("d. MMMM HH:mm", Locale.GERMAN);
private static TemporalAccessor parse(String s) {
    try {
        return SHORT_MONTH.parse(s);  // short name with a dot, e.g. "10. Jan. 18:14"
    } catch (DateTimeParseException e) {
        return LONG_MONTH.parse(s);   // fall back to the full name, e.g. "5. Juni 08:25"
    }
}
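The try/catch fallback generalizes to any number of formats: try each formatter in order and rethrow the last failure if none matches. A sketch with a hypothetical helper parseFirst; the numeric demo patterns are stand-ins chosen so the example does not depend on locale data, and SHORT_MONTH and LONG_MONTH could be passed in the same way:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.Arrays;
import java.util.List;

// Sketch: the two-formatter fallback, generalized to a list of formatters tried in order.
class FallbackParser {

    static TemporalAccessor parseFirst(String text, List<DateTimeFormatter> formatters) {
        DateTimeParseException last = null;
        for (DateTimeFormatter formatter : formatters) {
            try {
                return formatter.parse(text);
            } catch (DateTimeParseException e) {
                last = e; // remember the failure and try the next formatter
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("no formatters given");
        }
        throw last; // every formatter failed: rethrow the last exception
    }

    public static void main(String[] args) {
        // Hypothetical numeric formats, used only to keep the demo locale-independent
        List<DateTimeFormatter> formatters = Arrays.asList(
                DateTimeFormatter.ofPattern("d.M.uuuu"),
                DateTimeFormatter.ofPattern("uuuu-MM-dd"));
        System.out.println(LocalDate.from(parseFirst("10.1.2016", formatters)));
        System.out.println(LocalDate.from(parseFirst("2016-06-05", formatters)));
    }
}
```

Using exceptions for control flow keeps the code simple, but it may be slower than a single formatter with optional sections when most inputs hit a later format.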
The answer to the problem is the DateTimeFormatterBuilder class and its appendText(TemporalField, Map) method. It allows any text to be associated with a value when formatting or parsing, which solves the problem effectively and elegantly:
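import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Map;
Map<Long, String> monthNameMap = new HashMap<>();
monthNameMap.put(1L, "Jan.");
monthNameMap.put(2L, "Feb.");
monthNameMap.put(3L, "Mär.");
// ... and so on: all 12 months must be present
DateTimeFormatter fmt = new DateTimeFormatterBuilder()
        .appendPattern("d. ")
        .appendText(ChronoField.MONTH_OF_YEAR, monthNameMap)
        .appendPattern(" HH:mm")
        .parseDefaulting(ChronoField.YEAR, 2016)
        .toFormatter();
System.out.println(LocalDateTime.parse("10. Jan. 18:14", fmt));
System.out.println(LocalDateTime.parse("8. Feb. 19:02", fmt));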
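A fuller sketch of this approach, with all 12 months populated and the formatter held in a static final constant. The spellings mirror the question's data (a dot after abbreviated names, none after Mai, Juni and Juli); the entries for September to December do not appear in the question and are guesses, as is the class name GermanMonthFormatter:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Map;

// Sketch: appendText(TemporalField, Map) with all 12 months and a reusable constant formatter.
class GermanMonthFormatter {

    static final DateTimeFormatter FORMATTER = build();

    private static DateTimeFormatter build() {
        Map<Long, String> monthNames = new HashMap<>();
        monthNames.put(1L, "Jan.");
        monthNames.put(2L, "Feb.");
        monthNames.put(3L, "M\u00e4r.");   // umlaut written as a Unicode escape
        monthNames.put(4L, "Apr.");
        monthNames.put(5L, "Mai");
        monthNames.put(6L, "Juni");
        monthNames.put(7L, "Juli");
        monthNames.put(8L, "Aug.");
        monthNames.put(9L, "Sep.");        // assumed spelling
        monthNames.put(10L, "Okt.");       // assumed spelling
        monthNames.put(11L, "Nov.");       // assumed spelling
        monthNames.put(12L, "Dez.");       // assumed spelling
        return new DateTimeFormatterBuilder()
                .appendPattern("d. ")
                .appendText(ChronoField.MONTH_OF_YEAR, monthNames)
                .appendPattern(" HH:mm")
                .parseDefaulting(ChronoField.YEAR, 2016) // leap year, so "29. Feb." also parses
                .toFormatter();
    }

    public static void main(String[] args) {
        String[] lines = {"10. Jan. 18:14", "8. Feb. 19:02", "1. M\u00e4r. 19:40", "4. Apr. 18:55",
                          "2. Mai 21:55", "5. Juni 08:25", "5. Juli 20:09", "1. Aug. 13:42"};
        for (String line : lines) {
            System.out.println(LocalDateTime.parse(line, FORMATTER));
        }
    }
}
```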
Some notes:
- The monthNameMap must be populated with all 12 months.
- The formatter should normally be assigned to a static final constant, rather than being created every time.
- The parseDefaulting(YEAR, 2016) call has been added so that LocalDateTime.parse(String, DateTimeFormatter) can be used directly. Without it there would be no year, and thus nothing more than a TemporalAccessor could be parsed. The default year must be a leap year (2016 is one), in case 29 February is being parsed.
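The leap-year remark can be checked directly. With the default ResolverStyle.SMART, a non-leap default year does not reject 29 February but quietly moves it to the last valid day of the month, 28 February; ResolverStyle.STRICT throws instead. A sketch (the class name and the numeric "d.M. HH:mm" pattern are illustrative stand-ins for the month-name formatter):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.time.temporal.ChronoField;

// Sketch: why the year passed to parseDefaulting should be a leap year.
class DefaultYearDemo {

    // Numeric day and month keep the demo independent of month names and locale data
    static DateTimeFormatter formatter(long defaultYear, ResolverStyle style) {
        return new DateTimeFormatterBuilder()
                .appendPattern("d.M. HH:mm")
                .parseDefaulting(ChronoField.YEAR, defaultYear)
                .toFormatter()
                .withResolverStyle(style);
    }

    public static void main(String[] args) {
        String text = "29.2. 12:00";
        // Leap year: 29 February is valid
        System.out.println(LocalDateTime.parse(text, formatter(2016, ResolverStyle.SMART)));
        // Non-leap year with SMART (the default): the day is clamped to 28 February
        System.out.println(LocalDateTime.parse(text, formatter(2017, ResolverStyle.SMART)));
        // Non-leap year with STRICT: parsing fails instead
        try {
            LocalDateTime.parse(text, formatter(2017, ResolverStyle.STRICT));
        } catch (DateTimeParseException e) {
            System.out.println("STRICT: " + e.getMessage());
        }
    }
}
```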