
Tuesday, 19 June 2012

Java mbox parsing with Apache James Mime4j

Today I pushed a small change to the Apache James Mime4j trunk. The change-set adds mbox parsing capabilities to Mime4j. It's a single-class iterator that you can use to split an mbox file into individual messages, which you can then parse with Mime4j.

I also added a simple example here.

This is how you can use it:

for (CharBufferWrapper message : MboxIterator.fromFile(mbox).charset(ENCODER.charset()).build()) {
    System.out.println(messageSummary(message.asInputStream(ENCODER.charset())));
    count++;
}
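To make the splitting step more concrete, here is a minimal, standalone sketch of the general idea: walk the file line by line and start a new message at every From_ line that follows a blank line (or the start of the file). This is not the MboxIterator implementation. The class name MboxSplitSketch, the split method and the lenient From_ pattern are all assumptions made for illustration, and the sketch works on an in-memory String rather than a memory-mapped file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class MboxSplitSketch {

    // Hypothetical, lenient From_ pattern: "From ", a sender token (with or
    // without '@'), then the rest of the line. Not the pattern Mime4j uses.
    private static final Pattern FROM_LINE = Pattern.compile("From \\S+.*");

    /** Splits mbox text into messages; each message keeps its From_ line. */
    public static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        boolean previousBlank = true; // start of file counts as a boundary

        // Accept both LF and CRLF line terminators.
        for (String line : mbox.split("\\r?\\n")) {
            if (previousBlank && FROM_LINE.matcher(line).matches()) {
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
            }
            // Lines before the first From_ line are ignored.
            if (current != null) {
                current.append(line).append('\n');
            }
            previousBlank = line.isEmpty();
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        String mbox = "From william Tue Jul 3 13:05:05 2012\r\n"
                + "Subject: first\r\n\r\nHello\r\n\r\n"
                + "From bob@example.org Tue Jul 3 13:06:00 2012\r\n"
                + "Subject: second\r\n\r\nBye\r\n";
        System.out.println(split(mbox).size() + " messages"); // prints "2 messages"
    }
}
```

Requiring a blank line before each From_ line keeps body lines that happen to start with "From " from being treated as message boundaries, as long as they do not directly follow an empty line.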


5 comments:

  1. I get an exception when I try this on an mbox file exported from Gmail using fetchmail.

    Exception in thread "main" java.lang.IllegalArgumentException: File does not contain From_ lines! Maybe not be a vaild Mbox.
    at org.apache.james.mime4j.mboxiterator.MboxIterator.initMboxIterator(MboxIterator.java:85)

  2. Sorry, I got it now. The mbox generated by fetchmail for a Gmail export has an unexpected From_ line, like: "From william Tue Jul 3 13:05:05 2012". There's no @ sign, as you can see. After adjusting the regex pattern, your iterator works really well!

    The only remaining problem is that it doesn't work if the line terminators in the mbox are CRLF. It only works if the line terminator is plain LF; otherwise the fields are not parsed (the field map is empty). I can't find a way to fix that... Downloading an mbox from Google Apps for Business gives an mbox with CRLF after decrypting with the private key :(

  3. Thank you for the feedback. Sorry for not replying in time, I didn't get any email notification. The code is part of Apache Mime4j (Apache James project).

  4. Thanks for making mime4j more awesome.

    Is the latest version available via Maven?
    The latest version I see in Maven is 0.7.2, and I assume mbox support is implemented as part of 0.8.

  5. I also wanted to use the 0.8.snapshot version with the new MboxIterator. How can I get it through Maven?
