I have been experimenting with a Ruby library called “Readability”:
https://github.com/iterationlabs/ruby-readability
It is a Ruby implementation of the popular Readability bookmarklet. So far it has worked very well, though it could not extract readable content from a couple of poorly structured sites. Just follow the instructions on the GitHub page; no hacks are needed. I would recommend reading through the code to understand what it does.
In short, it first removes nodes it considers unacceptable and then assigns rankings to the remaining ones. For example, it may remove nodes whose class contains ‘comment’ but keep nodes whose class contains ‘article’.
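
To make the idea concrete, here is a minimal sketch of that kind of filter-then-rank heuristic in plain Ruby. It is not the library's actual code: the patterns, the weights, and the names Node, candidates, score and best_node are all made up for illustration, and a real implementation works on parsed HTML nodes rather than simple structs.

```ruby
# Hypothetical patterns for illustration only; not the library's real lists.
UNLIKELY = /comment|sidebar|footer|advert/i
LIKELY   = /article|content|post|body/i

# Stand-in for an HTML node: just a class attribute and its text.
Node = Struct.new(:css_class, :text)

# Step 1: drop nodes whose class looks like non-content,
# unless the class also looks like content.
def candidates(nodes)
  nodes.reject { |n| n.css_class =~ UNLIKELY && n.css_class !~ LIKELY }
end

# Step 2: rank what is left. The weights are arbitrary assumptions.
def score(node)
  points = 0
  points += 25 if node.css_class =~ LIKELY    # promising class name
  points += node.text.count(",")              # commas hint at prose
  points += [node.text.length / 100, 3].min   # longer text, capped bonus
  points
end

# The highest-ranked surviving node is treated as the main content.
def best_node(nodes)
  candidates(nodes).max_by { |n| score(n) }
end

nodes = [
  Node.new("comment-list", "Nice post, thanks!"),
  Node.new("article-body", "Long text, with commas, and several sentences. " * 5),
  Node.new("nav", "Home About")
]

puts best_node(nodes).css_class
```

Here the ‘comment-list’ node is thrown away before ranking, and the ‘article-body’ node wins because of its class name and the amount of prose-like text it holds.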
I came across other libraries, including a Java one, but for now I am sticking with this one for my Rails app.
I keep a small fork for my own use at https://github.com/murtada/ruby-readability