How to use the HTML Parser
This tool operates as a DOM segmentation and HTML extraction system that isolates the <head>, <body>, and <footer> sections from a requested URL's rendered HTML. It removes non-semantic elements such as <script>, <style>, and <noscript> to expose the core structural layout, and it detects XML sitemaps through static path matching and hyperlink analysis. Each segment is returned minified, with unminify and copy options, and a truncated 128k variant of each segment is provided for use with large language models.
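The segmentation, stripping, sitemap detection, and truncation steps described above can be sketched in plain JavaScript. This is a simplified regex-based approximation, not the tool's actual implementation; all function names and the sitemap path list are illustrative assumptions:

```javascript
// Remove <script>, <style>, and <noscript> blocks, including their contents
// (a regex sketch; the real tool may use a full DOM parser).
function stripNonSemantic(html) {
  return html.replace(/<(script|style|noscript)\b[\s\S]*?<\/\1>/gi, "");
}

// Slice out a single segment (<head>, <body>, or <footer>) by tag name.
function extractSegment(html, tag) {
  const match = html.match(new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*)<\\/${tag}>`, "i"));
  return match ? match[1].trim() : null;
}

// Hyperlink analysis: collect hrefs that point at an XML sitemap.
// Static path matching would additionally probe well-known locations
// such as /sitemap.xml or /sitemap_index.xml (assumed paths).
function findSitemapLinks(html) {
  return [...html.matchAll(/href=["']([^"']*sitemap[^"']*\.xml)["']/gi)].map(m => m[1]);
}

// Produce the truncated "128k" variant for large-language-model input.
function truncateForLLM(text, limit = 128 * 1024) {
  return text.length > limit ? text.slice(0, limit) : text;
}
```

A stripped document can then be split zone by zone, e.g. `extractSegment(stripNonSemantic(html), "head")`.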
This mirrors Google’s Stage 0 rendering and Stage 2 DOM structure mapping, where the rendered DOM is parsed to extract content blocks, headings, and links before semantic scoring and indexing. The tool’s output can serve as preprocessing for LLMs that simulate Google’s page rendering stack as outlined in the Google Ranking Leaks.
Using this tool for SEO enables inspection of technical signals critical for indexing and ranking:

- The <head> section reveals metadata such as canonical URLs, meta descriptions, structured data, and hreflang tags, which directly impact Stage 2 metadata assignment.
- The <body> is where keyword proximity, heading hierarchy, early-term density, and entity mentions can be validated against neural passage extraction or ABC/T* scoring.
- The <footer> often includes redundant navigation or internal-linking elements that affect internal importance flow and crawl depth.
- XML sitemap enumeration supports analysis of URL discovery quality and frequency, aligning with Google’s XML sitemap expectations.

This segmentation allows LLM-based audits that replicate parts of Brisbane’s SEO framework by focusing on document quality, semantic clarity, and crawl prioritization across page zones.
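As a concrete illustration of the <head>-zone audit, a few of the indexing-relevant signals listed above (canonical URL, meta description, hreflang) can be pulled out of an extracted <head> segment. This is a hedged sketch using simplified regexes; the function name and structure are illustrative, not the tool's actual code:

```javascript
// Extract a handful of SEO-relevant signals from a <head> HTML fragment.
// Assumes attributes appear in the common order (rel before href,
// name before content); a production parser should not rely on that.
function auditHead(headHtml) {
  const grab = (re) => {
    const m = headHtml.match(re);
    return m ? m[1] : null;
  };
  return {
    // rel="canonical" link target
    canonical: grab(/<link[^>]*rel=["']canonical["'][^>]*href=["']([^"']+)["']/i),
    // meta description content
    description: grab(/<meta[^>]*name=["']description["'][^>]*content=["']([^"']+)["']/i),
    // all declared hreflang language codes
    hreflang: [...headHtml.matchAll(/<link[^>]*hreflang=["']([^"']+)["']/gi)].map(m => m[1]),
  };
}
```

Feeding the returned object to an LLM alongside the raw segment is one way to run the kind of zone-focused audit described above.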
This tool does not use any AI; it is built purely in JS and PHP.