
    Michal Cutler

Web pages are not purely text, nor are they solely HTML. This paper surveys HTML web pages, focusing not only on textual content but also on higher-order visual features and supplementary technology. Using a crawler with an in-house-developed rendering engine, data on a pseudo-random sample of web pages is collected. First, several basic attributes are collected to verify the collection process and confirm certain assumptions about web page text. Next, we examine the distribution of different types of page content (text, images, plug-in objects, and forms) in terms of rendered visual area. These content types are then broken down into a detailed view of the ways in which the content is used, including the prevalence and usage of scripts and styles. We conclude that more complex page elements play a significant and underestimated role in the visually attractive, media-rich, and highly interactive web pages that are currently being added to the World Wide...
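As a rough illustration of the area-based breakdown described above, the sketch below aggregates rendered bounding-box areas by content type. The element types, box coordinates, and the overlap-free assumption are hypothetical; the paper's actual measurements come from its own crawler and rendering engine.

```python
from collections import defaultdict

def area_share_by_type(boxes):
    """Aggregate rendered area per content type.

    `boxes` is a list of (content_type, x, y, width, height) tuples, e.g. as
    produced by a rendering engine; overlap between boxes is ignored here.
    """
    totals = defaultdict(float)
    for content_type, _x, _y, w, h in boxes:
        totals[content_type] += w * h
    grand_total = sum(totals.values()) or 1.0
    return {t: a / grand_total for t, a in totals.items()}

# Hypothetical rendered boxes from a single page.
sample = [
    ("text",   0,   0, 600, 400),
    ("image",  0, 400, 300, 250),
    ("form", 300, 400, 300, 100),
]
print(area_share_by_type(sample))  # fraction of visible area per content type
```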
Automatic genre classification of Web pages is currently young compared to other Web classification tasks. Corpora are just starting to be collected and organized in a systematic way, feature extraction techniques are inconsistent and not well detailed, genres are constantly in dispute, and novel applications have not been implemented. This paper attempts to review and make progress in the area of feature extraction, an area that we believe can benefit all Web page classification, and genre classification in particular. We first present a framework for the extraction of various Web-specific feature groups from distinct data models, based on a tree of potential models and the transformations that create them. Then we introduce the concept of cost-sensitivity to this tree and provide an algorithm for performing wrapper-based feature selection on this tree. Finally, we apply the cost-sensitive feature selection algorithm to two genre corpora and analyze the performance of the class...
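To convey the flavor of wrapper-based, cost-sensitive selection over feature groups, here is a minimal greedy sketch on synthetic data. The feature groups, extraction costs, penalty weight, and the flat group list (rather than the paper's tree of data models and transformations) are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for extracted Web-page features; in the paper the groups
# come from distinct data models (HTML, rendered text, etc.).
X, y = make_classification(n_samples=300, n_features=12, n_informative=6, random_state=0)
groups = {"tags": [0, 1, 2], "text": [3, 4, 5, 6], "links": [7, 8], "style": [9, 10, 11]}
extraction_cost = {"tags": 1.0, "text": 2.0, "links": 1.5, "style": 3.0}  # assumed costs
penalty = 0.02  # assumed trade-off between accuracy and extraction cost

def score(selected):
    # Wrapper evaluation: cross-validated accuracy minus a cost penalty.
    cols = [c for g in selected for c in groups[g]]
    acc = cross_val_score(DecisionTreeClassifier(random_state=0), X[:, cols], y, cv=5).mean()
    return acc - penalty * sum(extraction_cost[g] for g in selected)

selected, best = [], float("-inf")
while True:
    candidates = [(score(selected + [g]), g) for g in groups if g not in selected]
    if not candidates:
        break
    s, g = max(candidates)
    if s <= best:          # stop when no group improves the cost-adjusted score
        break
    best, selected = s, selected + [g]
print(selected, round(best, 3))
```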
This paper presents optimization models for selecting a subset of software libraries, viz., collections of programs residing on floppy disks or compact disks, available on the market. Each library contains a variety of programs whose reliabilities are assumed to be known. The objective is to maximize the reliability of the computer system subject to a budget constraint on the total cost of the libraries selected. The paper includes six models, each of which applies to a different software structure and set of assumptions. A detailed branch-and-bound algorithm for solving one of the six models is described; it contains a simple greedy procedure for generating an initial solution. For solving the rest of the models, see Berman & Cutler (1995).
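A minimal sketch of a greedy procedure that could seed such a branch-and-bound search is shown below. The library data, the budget, and the serial "best available copy per program" reliability structure are illustrative assumptions and do not correspond to any particular one of the paper's six models.

```python
# Illustrative data (assumed): each library has a cost and, for each required
# program it contains, the reliability of its copy of that program.
libraries = {
    "L1": {"cost": 40, "programs": {"edit": 0.95, "sort": 0.90}},
    "L2": {"cost": 30, "programs": {"sort": 0.97, "search": 0.85}},
    "L3": {"cost": 55, "programs": {"edit": 0.99, "search": 0.96}},
}
required = ["edit", "sort", "search"]
budget = 100

def system_reliability(chosen):
    """Serial system: every required program must work, and each program uses the
    most reliable copy among the chosen libraries (a simplifying assumption, not
    one of the paper's six structures)."""
    rel = 1.0
    for p in required:
        rel *= max((libraries[l]["programs"].get(p, 0.0) for l in chosen), default=0.0)
    return rel

# Greedy seed for a branch-and-bound search: repeatedly add the affordable library
# with the largest reliability gain per unit cost. While no selection yet covers
# every program the gains tie at zero and the tie-break is arbitrary.
chosen, spent = [], 0
while True:
    gains = [
        ((system_reliability(chosen + [name]) - system_reliability(chosen)) / lib["cost"], name)
        for name, lib in libraries.items()
        if name not in chosen and spent + lib["cost"] <= budget
    ]
    if not gains:
        break
    _, pick = max(gains)
    chosen.append(pick)
    spent += libraries[pick]["cost"]
print(chosen, spent, round(system_reliability(chosen), 4))
```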
A distributed self-diagnosis algorithm for VLSI mesh arrays with small clusters of faults is presented. It allows only fault-free cells to make decisions and to propagate diagnosis results. Its time complexity is constant with respect to the number of processors, and the diagnosability is proportional to the array size.
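The following toy simulation conveys the general idea of comparison-based mesh diagnosis: neighbouring cells run the same test computation, compare results, and the largest set of mutually agreeing cells is taken to be fault-free. It is a centralized, illustrative approximation with assumed fault locations, not the paper's distributed constant-time algorithm.

```python
import random
from collections import deque

# Toy n x n mesh with a few assumed fault clusters.
n, correct = 8, 42
random.seed(1)
faulty = {(2, 2), (2, 3), (5, 6)}

def output(cell):
    # Fault-free cells return the correct test result; faulty cells return garbage.
    return correct if cell not in faulty else random.randint(100, 10**6)

result = {(i, j): output((i, j)) for i in range(n) for j in range(n)}

# Agreement graph over horizontal/vertical neighbour pairs.
adj = {c: [] for c in result}
for i in range(n):
    for j in range(n):
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < n and nj < n and result[(i, j)] == result[(ni, nj)]:
                adj[(i, j)].append((ni, nj))
                adj[(ni, nj)].append((i, j))

# Breadth-first search for the largest agreeing component; everything outside it
# is diagnosed as faulty (reasonable while fault clusters stay small).
seen, best = set(), []
for start in adj:
    if start in seen:
        continue
    comp, queue = [start], deque([start])
    seen.add(start)
    while queue:
        for nb in adj[queue.popleft()]:
            if nb not in seen:
                seen.add(nb)
                comp.append(nb)
                queue.append(nb)
    best = max(best, comp, key=len)
print(sorted(c for c in adj if c not in best))  # diagnosed-faulty cells
```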
The categorization of a Web user query by topic or category can be used to select useful Web sources that contain the required information. In pursuit of this goal, we explore methods for mapping user queries to category hierarchies under which deep Web resources are also assumed to be classified. Our sources for these category hierarchies, or directories, are Yahoo! Directory and Wikipedia. Forwarding an unrefined query (in our case a typical fact-finding query sent to a question answering system) directly to these directory resources usually returns no directories or incorrect ones. Instead, we develop techniques to generate more specific directory-finding queries from an unrefined query and use these to retrieve better directories. Despite these engineered queries, our two resources often return multiple directories that include many incorrect results, i.e., directories whose categories are not related to the query and whose Web resources are therefore unlikely to contain the required information. We develop methods for selecting the most useful ones, considering a directory useful if Web sources for any of its narrow categories are likely to contain the searched-for information. We evaluate our mapping system on a set of 250 TREC questions and obtain precision and recall in the 0.8 to 1.0 range.
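For the evaluation step, a minimal precision/recall sketch over a couple of hypothetical questions and directories is shown below; the real evaluation uses 250 TREC questions and categories drawn from Yahoo! Directory and Wikipedia, and the directory names here are made up.

```python
def precision_recall(returned, relevant):
    """Precision and recall of the directories returned for one query."""
    returned, relevant = set(returned), set(relevant)
    tp = len(returned & relevant)
    precision = tp / len(returned) if returned else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical (returned, relevant) directory lists per question.
results = {
    "Who invented the telephone?": (
        ["Science/Inventions", "History/Telecommunications"],
        ["Science/Inventions"],
    ),
    "What is the capital of Ghana?": (
        ["Regional/Africa/Ghana"],
        ["Regional/Africa/Ghana", "Geography/Capitals"],
    ),
}
pairs = [precision_recall(ret, rel) for ret, rel in results.values()]
print("macro precision:", sum(p for p, _ in pairs) / len(pairs))
print("macro recall:   ", sum(r for _, r in pairs) / len(pairs))
```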
... Kelvin Wu, Lei Yu, and Michal Cutler, Computer Science, Binghamton University {kwu, lyu, cutler}@binghamton.edu ... to join the army, to make garlic bread, to remove an ink stain, to poach eggs, to install Linux, to fight hay fever, to move a refrigerator, to treat poison ivy, to fertilize a ...
The general algorithms and heuristics developed in the module-level automatic test pattern generator called MATEG are described. MATEG is a generalization of the branch-and-bound test generation algorithm for module-based circuits. It retains the accuracy of deterministic test generation and reduces computation time by exploiting the hierarchy in the circuit under test. Some experimental results are presented to demonstrate the efficiency of this approach.
The problems of laying out printed circuits and large-scale integrated chips are very complex and are therefore usually approached by heuristic methods. This paper presents a more analytic approach to an elementary subset of these problems, using combinatorial and graph-theoretic arguments.
A special layout problem is considered and shown to be NP-complete. For a system with a small number of channels k, a polynomial algorithm is given that finds a layout whose width differs from the minimum by at most k + 1. For a system with a large number of channels k, an approximation algorithm is presented: given an ϵ > 0, it finds a layout whose width differs from the minimum by at most ϵZ + k + 1, where Z is a lower bound on the width. The complexity of this algorithm depends on ϵ.
The traveling salesman problem, in both its path and cycle versions, is NP-complete, and all known exact solutions are exponential. In the N-line planar traveling salesman problem the points lie on N lines in the plane. In this paper, simple and efficient low-degree polynomial solutions based on dynamic programming are given for some N-line (N = 2, 3) planar traveling salesman problems. Such problems arise in practical applications, for example, connecting nets in printed circuits.
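A small dynamic-programming sketch for the two-line path case is given below. It assumes, for illustration, that each line's points are visited in left-to-right order; the point coordinates are made up and the formulation is not necessarily the one used in the paper.

```python
from math import hypot, inf

def two_line_tsp_path(A, B):
    """Shortest Hamiltonian path over points A (on one line) and B (on the other),
    assuming each line's points are visited in left-to-right order.
    dp[i][j][e] = best length after visiting A[:i] and B[:j], currently at
    A[i-1] (e=0) or B[j-1] (e=1)."""
    A, B = sorted(A), sorted(B)
    nA, nB = len(A), len(B)
    d = lambda p, q: hypot(p[0] - q[0], p[1] - q[1])
    dp = [[[inf, inf] for _ in range(nB + 1)] for _ in range(nA + 1)]
    if nA: dp[1][0][0] = 0.0   # path may start at the leftmost point of either line
    if nB: dp[0][1][1] = 0.0
    for i in range(nA + 1):
        for j in range(nB + 1):
            for e in range(2):
                cur = dp[i][j][e]
                if cur == inf:
                    continue
                at = A[i - 1] if e == 0 else B[j - 1]
                if i < nA:   # visit the next unvisited point on line A
                    dp[i + 1][j][0] = min(dp[i + 1][j][0], cur + d(at, A[i]))
                if j < nB:   # visit the next unvisited point on line B
                    dp[i][j + 1][1] = min(dp[i][j + 1][1], cur + d(at, B[j]))
    return min(dp[nA][nB])

# Points on two horizontal lines y=0 and y=1 (illustrative data).
A = [(0, 0), (2, 0), (5, 0)]
B = [(1, 1), (3, 1), (4, 1)]
print(round(two_line_tsp_path(A, B), 3))
```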
Fault-tolerant software uses redundancy to improve reliability, but such redundancy requires additional resources and tends to be costly; the redundancy level therefore needs to be optimized. Our optimization models determine the optimal level of redundancy within a software system under the assumption that functionally equivalent software components fail independently. A framework illustrates the tradeoff between the cost of using N-version programming and the improved reliability for a software system. The two models deal with single-task and multitask software. These software systems consist of several modules, where each module performs a subtask and a major task is performed by sequential execution of the modules. The major assumptions are: 1) several versions of each module, each with an estimated cost and reliability, are available; 2) these module versions fail independently. Optimization models are used to select the optimal set of versions for each module such that system reliability is maximized and total cost remains within budget.
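As a toy illustration of the single-task model's structure, the sketch below exhaustively enumerates version subsets per module and keeps the feasible combination with the highest system reliability. The module names, version costs and reliabilities, and the budget are assumed; a real instance would call for the paper's optimization models rather than brute force.

```python
from itertools import combinations, product

# Illustrative instance (assumed data): for each module, the available versions
# with (cost, reliability). System reliability is the product over modules of the
# probability that at least one selected version of that module works.
modules = {
    "parse":  [(3, 0.90), (5, 0.95), (4, 0.92)],
    "solve":  [(6, 0.93), (8, 0.97)],
    "report": [(2, 0.88), (3, 0.91)],
}
budget = 20

def choices(versions):
    # All non-empty version subsets of a module, with their total cost and reliability.
    for r in range(1, len(versions) + 1):
        for subset in combinations(versions, r):
            cost = sum(c for c, _ in subset)
            fail = 1.0
            for _, rel in subset:
                fail *= (1.0 - rel)
            yield cost, 1.0 - fail

best = None
for combo in product(*(list(choices(v)) for v in modules.values())):
    cost = sum(c for c, _ in combo)
    if cost > budget:
        continue
    rel = 1.0
    for _, r in combo:
        rel *= r
    if best is None or rel > best[0]:
        best = (rel, cost, combo)
print("reliability %.4f at cost %d" % best[:2])
```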
Software is becoming central to every aspect of our life. Therefore, developing highly reliable software while considering budget limitations is becoming very important. Because of their increasing complexity, software products are implemented using available modules and by performing many programming, testing, and integration tasks. The purpose of this research is to allocate resources efficiently to these activities so that the reliability of a software package is maximized. The paper includes an optimization model for deriving cost allocations while satisfying a budget constraint. The model allows a decision maker to consider the use of modules available on the market as well as the option of developing them in-house.

It is becoming increasingly difficult to create software products that simultaneously provide high reliability, rapid delivery, and low cost. This research deals with the cost of achieving reliable software. Assume a software package has been designed and is ready for implementation. To implement it, a set of modules will have to be purchased and many programming and integration tasks will have to be performed. A programming task consists of the detailed design of a module, coding, and unit testing. An integration task consists of the additional testing and debugging needed when the code of separately tested tasks is joined together. The implementation process ends when the package has been integrated and tested. A model for deriving cost allocations is presented. The objective of the model is to maximize reliability while satisfying a budget constraint. Both the option of developing modules in-house and the option of purchasing them where available are considered in the optimization. The paper includes a branch-and-bound scheme to derive an optimal solution.
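A compact branch-and-bound sketch in the same spirit is shown below: each module is either purchased or developed in-house, and an optimistic bound (every remaining module at its most reliable option) prunes the search. The costs, reliabilities, and budget are illustrative assumptions, and the scheme is not the paper's exact algorithm.

```python
# Illustrative buy-versus-build branch and bound: pick one option per module to
# maximize the product of module reliabilities within a budget (assumed data).
modules = [
    {"buy": (7, 0.96), "build": (4, 0.90)},
    {"buy": (5, 0.92), "build": (6, 0.95)},
    {"buy": (9, 0.99), "build": (5, 0.91)},
]
budget = 18

# Optimistic bound: assume every remaining module gets its most reliable option.
best_rel = [max(r for _, r in m.values()) for m in modules]
suffix_bound = [1.0] * (len(modules) + 1)
for i in range(len(modules) - 1, -1, -1):
    suffix_bound[i] = suffix_bound[i + 1] * best_rel[i]

best = {"rel": 0.0, "plan": None}

def branch(i, cost, rel, plan):
    if rel * suffix_bound[i] <= best["rel"]:      # prune: cannot beat the incumbent
        return
    if i == len(modules):
        best["rel"], best["plan"] = rel, plan
        return
    for option, (c, r) in modules[i].items():
        if cost + c <= budget:
            branch(i + 1, cost + c, rel * r, plan + [option])

branch(0, 0, 1.0, [])
print(best)
```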