tagsoup-0.1: Parsing and extracting information from (possibly malformed) HTML documents
Data.Html.TagSoup
Portability: portable
Stability: moving towards stable
Maintainer: http://www.cs.york.ac.uk/~ndm/
Contents
Data structures and parsing
Tag Combinators
Description

This module extracts information from unstructured HTML code, sometimes known as tag-soup. It is intended for situations where the author of the HTML is not cooperating with the person trying to extract the information, but is also not trying to hide it.

The standard practice is to parse a String into a list of Tag values using parseTags, then operate on that list to extract the necessary information.
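The "operate on the list" half of that workflow can be sketched without the library. The block below is a minimal sketch, not the module's own code: it redeclares Tag and Attribute locally so it compiles on its own, and exampleTags is a hand-written stand-in for what parseTags might return (an assumption, not captured library output). The helpers linkTargets and allText are hypothetical names introduced only for illustration.

```haskell
-- Local copies of the documented types, so the sketch compiles on its own.
type Attribute = (String, String)

data Tag
  = TagOpen String [Attribute]
  | TagClose String
  | TagText String
  deriving (Show, Eq)

-- Hand-written stand-in for the result of parseTags on
--   <p>See <a href="/docs">the docs</a></p>
-- (assumed shape, not produced by the library).
exampleTags :: [Tag]
exampleTags =
  [ TagOpen "p" []
  , TagText "See "
  , TagOpen "a" [("href", "/docs")]
  , TagText "the docs"
  , TagClose "a"
  , TagClose "p"
  ]

-- Hypothetical helper: the href value of every opening <a> tag that has one.
linkTargets :: [Tag] -> [String]
linkTargets tags =
  [url | TagOpen "a" attrs <- tags, Just url <- [lookup "href" attrs]]

-- Hypothetical helper: all text nodes concatenated, ignoring the markup.
allText :: [Tag] -> String
allText tags = concat [s | TagText s <- tags]
```

Loaded into GHCi, linkTargets exampleTags evaluates to ["/docs"] and allText exampleTags to "See the docs". With the real module, exampleTags would be replaced by the result of parseTags applied to the page source.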

Synopsis
data Tag
= TagOpen String [Attribute]
| TagClose String
| TagText String
type Attribute = (String, String)
parseTags :: String -> [Tag]
module Data.Html.Download
(~==) :: Tag -> Tag -> Bool
(~/=) :: Tag -> Tag -> Bool
isTagOpen :: Tag -> Bool
isTagClose :: Tag -> Bool
isTagText :: Tag -> Bool
fromTagText :: Tag -> String
isTagOpenName :: String -> Tag -> Bool
isTagCloseName :: String -> Tag -> Bool
sections :: (a -> Bool) -> [a] -> [[a]]
Data structures and parsing
data Tag
An HTML element. A document is represented as [Tag]. There is no requirement for TagOpen and TagClose to match.
Constructors
TagOpen String [Attribute]    An open tag with Attributes in their original order.
TagClose String               A closing tag.
TagText String                A text node, guaranteed not to be the empty string.
type Attribute = (String, String)
An HTML attribute. For example, id="name" generates ("id","name").
parseTags :: String -> [Tag]
Parse an HTML document into a list of Tag. Escape characters are expanded automatically.
module Data.Html.Download
Tag Combinators
(~==) :: Tag -> Tag -> Bool

Performs an inexact match. The first item is the tag being tested and the second item is the pattern to match against. A blank string in the second item is considered to match anything. For example:

 (TagText "test" ~== TagText ""    ) == True
 (TagText "test" ~== TagText "test") == True
 (TagText "test" ~== TagText "soup") == False

For TagOpen, missing attributes on the right are allowed: the right-hand tag does not have to list every attribute of the left-hand tag.
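The matching rules described above can be sketched as follows. This is an illustration under stated assumptions, not the library's definition of (~==): the function is deliberately given the hypothetical name matches, the types are redeclared locally, and the attribute rule assumes that every attribute listed on the right must appear on the left with the same name and value. The text does not say how blank attribute names or values are treated, so the sketch does not special-case them.

```haskell
-- Local copies of the documented types, so the sketch compiles on its own.
type Attribute = (String, String)

data Tag
  = TagOpen String [Attribute]
  | TagClose String
  | TagText String
  deriving (Show, Eq)

-- Hypothetical stand-in for (~==): the tag under test comes first,
-- the pattern second.
matches :: Tag -> Tag -> Bool
matches (TagText s)       (TagText p)        = fieldMatches s p
matches (TagClose n)      (TagClose p)       = fieldMatches n p
matches (TagOpen n attrs) (TagOpen p wanted) =
  -- Assumption: every attribute in the pattern must be present on the tag;
  -- extra attributes on the tag are ignored.
  fieldMatches n p && all (`elem` attrs) wanted
matches _ _ = False

-- A blank pattern string matches anything; otherwise the strings must be equal.
fieldMatches :: String -> String -> Bool
fieldMatches _      ""  = True
fieldMatches actual pat = actual == pat
```

Under these assumptions, matches (TagOpen "a" [("href","/x"),("class","nav")]) (TagOpen "a" [("href","/x")]) is True, while swapping the two arguments gives False, because the pattern would then require a class attribute that the tag lacks.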

(~/=) :: Tag -> Tag -> Bool
Negation of ~==
isTagOpen :: Tag -> Bool
Test if a Tag is a TagOpen
isTagClose :: Tag -> Bool
Test if a Tag is a TagClose
isTagText :: Tag -> Bool
Test if a Tag is a TagText
fromTagText :: Tag -> String
Extract the string from within a TagText. Crashes if the Tag is not a TagText, so check with isTagText first when the constructor is not known.
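Because fromTagText is partial, callers that cannot guarantee the constructor may prefer a total wrapper. The sketch below is one possible way to write one. The names textOf and textNodes are hypothetical and are not part of the module as documented here, and the Tag type is redeclared locally so the block stands alone.

```haskell
import Data.Maybe (mapMaybe)

-- Local copies of the documented types, so the sketch compiles on its own.
type Attribute = (String, String)

data Tag
  = TagOpen String [Attribute]
  | TagClose String
  | TagText String
  deriving (Show, Eq)

-- Hypothetical total alternative to fromTagText: Nothing instead of a crash.
textOf :: Tag -> Maybe String
textOf (TagText s) = Just s
textOf _           = Nothing

-- Hypothetical helper: the contents of every text node, in document order.
textNodes :: [Tag] -> [String]
textNodes = mapMaybe textOf
```

For instance, textNodes [TagOpen "b" [], TagText "bold", TagClose "b"] evaluates to ["bold"]. With the real module, filtering with isTagText before mapping fromTagText should achieve the same effect.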
isTagOpenName :: String -> Tag -> Bool
Returns True if the Tag is TagOpen and matches the given name
isTagCloseName :: String -> Tag -> Bool
Returns True if the Tag is TagClose and matches the given name
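A common use of the two name tests is to cut out the tags between an opening tag and its closing tag. The following is a minimal sketch of that idea. The local isTagOpenName and isTagCloseName are stand-ins that assume "matches the given name" means exact string equality, and inside is a hypothetical helper that does not handle nested elements of the same name.

```haskell
-- Local copies of the documented types, so the sketch compiles on its own.
type Attribute = (String, String)

data Tag
  = TagOpen String [Attribute]
  | TagClose String
  | TagText String
  deriving (Show, Eq)

-- Stand-in with the documented signature; assumes exact name equality.
isTagOpenName :: String -> Tag -> Bool
isTagOpenName name (TagOpen n _) = n == name
isTagOpenName _    _             = False

-- Stand-in with the documented signature; assumes exact name equality.
isTagCloseName :: String -> Tag -> Bool
isTagCloseName name (TagClose n) = n == name
isTagCloseName _    _            = False

-- Hypothetical helper: the tags strictly between the first <name> and the
-- next </name>. Returns [] if <name> never opens, and runs to the end of
-- the document if it is never closed.
inside :: String -> [Tag] -> [Tag]
inside name =
  takeWhile (not . isTagCloseName name)
    . drop 1
    . dropWhile (not . isTagOpenName name)
```

Given [TagOpen "html" [], TagOpen "title" [], TagText "Home", TagClose "title", TagText "x"], inside "title" evaluates to [TagText "Home"].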
sections :: (a -> Bool) -> [a] -> [[a]]
This function takes a predicate and a list, and returns every suffix of the list whose first item satisfies the predicate.
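Read as "every tail of the list that begins with a matching item", the behaviour can be sketched with Data.List.tails. This is an illustrative reimplementation under that reading, not the module's source, and it uses the hypothetical name sectionsSketch to keep the two apart.

```haskell
import Data.List (tails)

-- Every suffix of the input whose first item satisfies the predicate.
-- The pattern s@(x : _) skips the empty suffix that tails always produces.
sectionsSketch :: (a -> Bool) -> [a] -> [[a]]
sectionsSketch p xs = [s | s@(x : _) <- tails xs, p x]
```

For example, sectionsSketch (== 'a') "banana" evaluates to ["anana","ana","a"]. Applied to a [Tag] with a predicate such as isTagOpenName "a", each result would start at one opening tag and carry the rest of the document after it, which makes it easy to inspect the tags that follow.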