tagsoup-0.1: Parsing and extracting information from (possibly malformed) HTML documents
Data.Html.TagSoup
Portability: portable
Stability: moving towards stable
Maintainer: http://www.cs.york.ac.uk/~ndm/
Description
This module is for extracting information from unstructured HTML code,
sometimes known as tag-soup. It is intended for situations where the author of
the HTML is not cooperating with the person trying to extract the information,
but is also not trying to hide it.
The standard practice is to parse a String into a list of Tag values using parseTags,
then operate on that list to extract the necessary information.
Data structures and parsing
data Tag
A single item of an HTML document, such as an opening tag, a closing tag or a piece of text; a whole document is represented as [Tag].
There is no requirement for TagOpen and TagClose items to match.
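The constructors of Tag are not listed on this page, so the sketch below assumes a minimal stand-in with only the three constructors named elsewhere here (TagOpen, TagClose, TagText); the real definition may have different fields or extra constructors. Because open and close tags need not match, any balance checking is up to the caller. `unmatchedCloses` is a hypothetical helper for illustration, not part of tagsoup.

```haskell
import Data.List (delete)

-- Assumed minimal stand-in for tagsoup's Tag; the real constructors may differ.
type Attribute = (String, String)

data Tag
  = TagOpen String [Attribute] -- e.g. <p class="x">
  | TagClose String            -- e.g. </p>
  | TagText String             -- text between tags
  deriving (Show, Eq)

-- A "document" with a missing </p> and a stray </div>: still a valid [Tag].
soup :: [Tag]
soup = [TagOpen "p" [], TagText "one", TagOpen "p" [], TagText "two", TagClose "div"]

-- Hypothetical helper: names of TagClose items that have no still-open
-- TagOpen of the same name before them. Unclosed opening tags are not reported.
unmatchedCloses :: [Tag] -> [String]
unmatchedCloses = go []
  where
    go _ [] = []
    go open (TagOpen n _ : ts) = go (n : open) ts
    go open (TagClose n : ts)
      | n `elem` open = go (delete n open) ts -- closes the most recent match
      | otherwise = n : go open ts
    go open (_ : ts) = go open ts
```

Here `unmatchedCloses soup` evaluates to `["div"]`, while the two unclosed `p` tags are silently accepted, mirroring the permissive [Tag] representation.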
type Attribute = (String, String)
An HTML attribute as a (name, value) pair: id="name" is represented as ("id","name").
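Since an attribute list is just a list of pairs, the Prelude's `lookup` finds a value by name. The sketch assumes the same minimal stand-in Tag type as above; `attrValue` is a hypothetical helper, and it compares names exactly (whether tagsoup-0.1 normalises attribute-name case is not stated on this page).

```haskell
-- Assumed minimal stand-in for tagsoup's Tag; the real constructors may differ.
type Attribute = (String, String)

data Tag = TagOpen String [Attribute] | TagClose String | TagText String
  deriving (Show, Eq)

-- The attribute id="name" is stored as the pair ("id", "name").
exampleAttr :: Attribute
exampleAttr = ("id", "name")

-- Hypothetical helper: the value of a named attribute on an opening tag.
-- Exact comparison, so "ID" and "id" are different keys here.
attrValue :: String -> Tag -> Maybe String
attrValue key (TagOpen _ attrs) = lookup key attrs
attrValue _ _ = Nothing -- closing tags and text carry no attributes
```

For example, `attrValue "id" (TagOpen "div" [exampleAttr])` gives `Just "name"`.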
parseTags :: String -> [Tag]
Parse an HTML document into a list of Tag.
Escape characters are expanded automatically.
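The page does not say which escapes parseTags recognises. As an illustration only, the sketch below expands a few common named references (`&amp;`, `&lt;`, `&gt;`, `&quot;`) and decimal references such as `&#65;`, leaving anything it does not recognise untouched. `expandEscapes` and `decode` are hypothetical names, and the supported set is an assumption, not tagsoup's actual table.

```haskell
import Data.Char (chr, isDigit)

-- Hypothetical: expand a small, assumed set of HTML character references.
-- Unknown or malformed references (e.g. "AT&T", "&bogus;") are kept as-is.
expandEscapes :: String -> String
expandEscapes [] = []
expandEscapes ('&' : rest) =
  case break (== ';') rest of
    (name, ';' : after) | Just c <- decode name -> c : expandEscapes after
    _ -> '&' : expandEscapes rest
expandEscapes (c : rest) = c : expandEscapes rest

-- Map the text between '&' and ';' to a character, if recognised.
decode :: String -> Maybe Char
decode "amp"  = Just '&'
decode "lt"   = Just '<'
decode "gt"   = Just '>'
decode "quot" = Just '"'
decode ('#' : ds)
  | not (null ds), length ds <= 7, all isDigit ds
  , let n = read ds :: Int
  , n <= 0x10FFFF -- stay within the valid Unicode range for chr
  = Just (chr n)
decode _ = Nothing
```

With this sketch, `expandEscapes "a &lt;b&gt; &amp; &#65;"` yields `"a <b> & A"`.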
module Data.Html.Download |
Tag Combinators
(~==) :: Tag -> Tag -> Bool
Performs an inexact match: the first argument is the tag being tested and the second is the pattern to match against.
A blank string in the second argument matches anything.
For example:
(TagText "test" ~== TagText "" ) == True
(TagText "test" ~== TagText "test") == True
(TagText "test" ~== TagText "soup") == False
For TagOpen, the pattern on the right may omit attributes that are present on the tag on the left.
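One way to implement the matching rules described above is sketched below, again over an assumed minimal Tag type. It treats a blank name or text in the pattern as a wildcard and requires every attribute listed in the pattern to appear on the tag; how the original handles blank attribute names or values is not described here, so that part is an assumption.

```haskell
-- Assumed minimal stand-in for tagsoup's Tag; the real constructors may differ.
type Attribute = (String, String)

data Tag = TagOpen String [Attribute] | TagClose String | TagText String
  deriving (Show, Eq)

-- Sketch of the inexact match: left is the actual tag, right is the pattern.
(~==) :: Tag -> Tag -> Bool
(TagText s) ~== (TagText p) = matchStr s p
(TagClose s) ~== (TagClose p) = matchStr s p
(TagOpen s attrs) ~== (TagOpen p pats) =
  -- the pattern may omit attributes; each one it lists must be on the tag
  matchStr s p && all (`elem` attrs) pats
_ ~== _ = False -- different kinds of tag never match

-- A blank pattern string matches anything; otherwise require equality.
matchStr :: String -> String -> Bool
matchStr s p = null p || s == p
```

The three documented examples hold for this sketch, and `TagOpen "a" [("href","x"),("id","y")] ~== TagOpen "a" [("href","x")]` is `True`.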
(~/=) :: Tag -> Tag -> Bool
Negation of ~==.
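Negated matching combines naturally with `dropWhile` and `takeWhile` to cut out the tags between two markers. The sketch repeats a compact version of the assumed `~==` so it stands alone; `titleText` is a hypothetical helper that returns the text between the first `<title>` and the following `</title>`.

```haskell
-- Assumed minimal stand-in for tagsoup's Tag; the real constructors may differ.
type Attribute = (String, String)

data Tag = TagOpen String [Attribute] | TagClose String | TagText String
  deriving (Show, Eq)

-- Compact sketch of the inexact match (blank pattern strings match anything).
(~==) :: Tag -> Tag -> Bool
(TagText s) ~== (TagText p) = null p || s == p
(TagClose s) ~== (TagClose p) = null p || s == p
(TagOpen s attrs) ~== (TagOpen p pats) = (null p || s == p) && all (`elem` attrs) pats
_ ~== _ = False

-- Negation of ~==.
(~/=) :: Tag -> Tag -> Bool
a ~/= b = not (a ~== b)

-- Hypothetical helper: skip to <title>, drop it, keep text until </title>.
titleText :: [Tag] -> String
titleText tags =
  concat [s | TagText s <- takeWhile (~/= TagClose "title") afterOpen]
  where
    afterOpen = drop 1 (dropWhile (~/= TagOpen "title" []) tags)
```

A document with no `<title>` simply produces the empty string.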
isTagOpen :: Tag -> Bool
Test if a Tag is a TagOpen.

isTagClose :: Tag -> Bool
Test if a Tag is a TagClose.

isTagText :: Tag -> Bool
Test if a Tag is a TagText.
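These predicates are one-line pattern matches, which makes them handy arguments to `filter`. The sketch assumes the same minimal Tag stand-in; `markupOnly` is a hypothetical usage example.

```haskell
-- Assumed minimal stand-in for tagsoup's Tag; the real constructors may differ.
type Attribute = (String, String)

data Tag = TagOpen String [Attribute] | TagClose String | TagText String
  deriving (Show, Eq)

-- "C{}" matches constructor C whatever its fields are.
isTagOpen, isTagClose, isTagText :: Tag -> Bool
isTagOpen TagOpen{} = True
isTagOpen _ = False
isTagClose TagClose{} = True
isTagClose _ = False
isTagText TagText{} = True
isTagText _ = False

-- Hypothetical usage: keep only the markup, dropping all text.
markupOnly :: [Tag] -> [Tag]
markupOnly = filter (not . isTagText)
```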
fromTagText :: Tag -> String
Extract the string from within a TagText; crashes if the Tag is not a TagText.
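Because fromTagText is partial, a common pattern is to filter with isTagText first so it is only ever applied to text. The sketch below assumes the minimal Tag stand-in; the error message is made up, and `maybeTagText`, `allText` and `allTextSafe` are hypothetical helpers, the last showing a total alternative.

```haskell
import Data.Maybe (mapMaybe)

-- Assumed minimal stand-in for tagsoup's Tag; the real constructors may differ.
type Attribute = (String, String)

data Tag = TagOpen String [Attribute] | TagClose String | TagText String
  deriving (Show, Eq)

-- Partial, as documented: fails on anything other than TagText.
fromTagText :: Tag -> String
fromTagText (TagText s) = s
fromTagText t = error ("fromTagText: not a TagText: " ++ show t)

isTagText :: Tag -> Bool
isTagText TagText{} = True
isTagText _ = False

-- Safe use: filtering first guarantees fromTagText never fails.
allText :: [Tag] -> String
allText = concatMap fromTagText . filter isTagText

-- Hypothetical total alternative that returns Nothing instead of crashing.
maybeTagText :: Tag -> Maybe String
maybeTagText (TagText s) = Just s
maybeTagText _ = Nothing

allTextSafe :: [Tag] -> String
allTextSafe = concat . mapMaybe maybeTagText
```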
isTagOpenName :: String -> Tag -> Bool
Returns True if the Tag is a TagOpen with the given name.

isTagCloseName :: String -> Tag -> Bool
Returns True if the Tag is a TagClose with the given name.
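Taking the name as the first argument lets these predicates be partially applied, e.g. `isTagOpenName "li"`. The sketch assumes the minimal Tag stand-in and a case-sensitive comparison (whether tagsoup-0.1 normalises tag-name case is not stated here); `countItems` is a hypothetical usage example.

```haskell
-- Assumed minimal stand-in for tagsoup's Tag; the real constructors may differ.
type Attribute = (String, String)

data Tag = TagOpen String [Attribute] | TagClose String | TagText String
  deriving (Show, Eq)

-- True only for an opening tag whose name equals the given one.
isTagOpenName :: String -> Tag -> Bool
isTagOpenName name (TagOpen n _) = n == name
isTagOpenName _ _ = False

-- True only for a closing tag whose name equals the given one.
isTagCloseName :: String -> Tag -> Bool
isTagCloseName name (TagClose n) = n == name
isTagCloseName _ _ = False

-- Hypothetical usage: count <li> opening tags, matched or not.
countItems :: [Tag] -> Int
countItems = length . filter (isTagOpenName "li")
```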
sections :: (a -> Bool) -> [a] -> [[a]]
This function takes a list and returns all of its suffixes (tails) whose first item satisfies the predicate.
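A definition consistent with this description keeps every tail of the list that starts with a matching item; the sketch below is written to match the description, not copied from tagsoup's source. Each result keeps the rest of the document, so a caller can pattern-match on what follows a marker tag. The Tag stand-in and `cellTexts` are hypothetical.

```haskell
import Data.List (tails)

-- Every suffix of xs whose first element satisfies p (empty tail excluded).
sections :: (a -> Bool) -> [a] -> [[a]]
sections p xs = [t | t@(x : _) <- tails xs, p x]

-- Assumed minimal stand-in for tagsoup's Tag; the real constructors may differ.
type Attribute = (String, String)

data Tag = TagOpen String [Attribute] | TagClose String | TagText String
  deriving (Show, Eq)

-- Hypothetical usage: the text immediately following each <td>.
cellTexts :: [Tag] -> [String]
cellTexts tags = [s | (_ : TagText s : _) <- sections isTd tags]
  where
    isTd (TagOpen "td" _) = True
    isTd _ = False
```

For instance, `sections even [1, 2, 3, 4]` gives `[[2,3,4],[4]]`.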
Produced by Haddock version 0.8