Body Text extraction from websites e.g. extract only article heading and text not all text in site
In computer science literature this problem is usually referred to as the page segmentation or boiler plate detection problem. See the report Boilerplate Detection using Shallow Text Features and its related blog post. Also, I have a few reports and software sites bookmarked that address the problem. Also, see this stackoverflow question.