The WebAIM Million
An accessibility analysis of the top 1,000,000 home pages
This page reflects data from February 2019. For complete and up-to-date information on the accessibility of the top one million home pages, see the main WebAIM Million Report.
In February 2019, WebAIM conducted an accessibility evaluation of the home pages for the top 1,000,000 web sites using the WAVE stand-alone API (with additional tools to collect site technology parameters). While this research focuses only on automatically detectable issues, the results paint a rather dismal picture of the current state of web accessibility for individuals with disabilities.
A re-analysis of these million pages was conducted in August 2019 to identify changes over time.
The "top" million web sites were gleaned primarily using the Majestic Million list, which ranks sites by their number of referring subnets. Because not all domains have home pages, the list of domains was supplemented with the top 250,000 domains from the Open PageRank Initiative that were not already in the Majestic Million list.
Home pages that returned errors (404, etc.) were not included. Pages with fewer than 10 HTML elements were also rejected, because these tended to be placeholder or empty documents rather than home pages that convey content.
Home pages from 730 unique top-level domains were analyzed, with .com (521,316), .org (76,489), and .net (39,757) being the most common. 6,010 distinct .edu home pages were analyzed.
The WAVE accessibility engine was used to analyze the rendered home pages (i.e., the DOM of all pages after scripting and styles were applied). The WAVE engine uses heuristics and logic to detect patterns in web page content that align with end user accessibility issues and Web Content Accessibility Guidelines (WCAG) conformance failures. All automated tools, including WAVE, are limited in their detection of accessibility issues: only around 25% of possible conformance failures can be automatically detected. Absence of detectable errors does not indicate that a site is accessible or compliant. Despite these limitations, the data presented in this report provide a meaningful representation of the state of web inaccessibility.
Why Only Home Pages?
We chose to focus only on home pages as a metric for web accessibility in general. Home pages are very often the most accessed pages on a web site and are the gateway to the rest of a web site's content. Home pages not only tend to receive the most attention from developers, but research indicates a correlation between issues detected on a home page and other site pages. Future research may explore additional pages beyond the home page.
Errors and Error Density
Errors are accessibility issues that are automatically detectable via WAVE, have notable end user impact, and are likely WCAG 2 conformance failures. 59,653,607 distinct accessibility errors were detected across the 1 million home pages, an average of 59.6 errors per page.
Error density (number of errors divided by number of page elements) was collected for all home pages. 782,481,056 distinct HTML elements were analyzed, an average of 782 elements per home page. This means that approximately 7.6% of all home page elements had a detectable accessibility error. Users with disabilities would expect to encounter detectable errors on 1 in every 13 elements with which they engage.
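The density figures above follow from simple division of the reported totals. A minimal Python sketch that reproduces them; the constant names are ours, chosen only for illustration:

```python
# Totals reported for the February 2019 sample.
TOTAL_ERRORS = 59_653_607
TOTAL_ELEMENTS = 782_481_056
PAGES = 1_000_000


def error_density(errors: int, elements: int) -> float:
    """Error density: detected errors divided by page elements."""
    return errors / elements


density = error_density(TOTAL_ERRORS, TOTAL_ELEMENTS)
print(f"elements per page: {TOTAL_ELEMENTS / PAGES:.0f}")  # 782
print(f"error density: {density:.1%}")  # 7.6%
print(f"about 1 error per {1 / density:.0f} elements")  # about 1 error per 13 elements
```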
Error density is an interesting metric and is provided in the site lookup. However, a significant increase in page elements may result in a lower error density (suggesting better accessibility), when in fact many new accessibility errors may have also been introduced. We have thus chosen to focus in this report on the average number of detectable errors (end user barriers) present, as opposed to error densities (how diluted those errors are within page elements).
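A small worked example, using made-up numbers for a hypothetical page, shows how density can fall even while the number of errors a user faces rises:

```python
def error_density(errors: int, elements: int) -> float:
    """Detected errors divided by page elements."""
    return errors / elements


# Hypothetical page before and after a redesign that adds 500 elements
# and introduces 10 new errors.
before = {"errors": 50, "elements": 500}
after = {"errors": 60, "elements": 1000}

for label, page in (("before", before), ("after", after)):
    d = error_density(page["errors"], page["elements"])
    print(f"{label}: {page['errors']} errors, density {d:.1%}")
# before: 50 errors, density 10.0%
# after: 60 errors, density 6.0%
```

The "after" page looks better by density yet has more barriers, which is why raw error counts are the headline metric here.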
There was no significant change in error counts or error density based on popularity rank. The home pages for the most popular domains had only slightly more errors and more elements than home pages for the least popular sites in the sample.
97.8% of home pages had detectable WCAG 2 failures! These are only automatically detectable errors that align with WCAG conformance failures with a high level of reliability. Because automatically detectable errors constitute a small portion of all possible WCAG failures, the actual WCAG 2 A/AA conformance level for the home pages of the most commonly accessed web sites is very low, perhaps below 1%.
| WCAG Failure Type | # of home pages | % of home pages |
|---|---|---|
| Low contrast text | 852,868 | 85.3% |
| Missing alternative text for images | 679,964 | 68% |
| Missing form input labels | 528,482 | 52.8% |
| Missing document language | 329,612 | 33.1% |
While failures are prevalent, the types of common errors are relatively few. Simply addressing these few types of issues would have a significant positive impact on web accessibility.
Low contrast text, below the WCAG 2 AA thresholds, was the most common accessibility issue detected. The vast majority (85.3%) of home pages analyzed had detectable WCAG contrast failures. Contrast errors were only detected on elements that contain text. On average, home pages had 36 distinct instances of text with insufficient contrast. Of all home page HTML elements analyzed (all elements, not just visible elements with text), 4.6% had insufficient contrast.
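WCAG 2 defines contrast in terms of relative luminance: the contrast ratio is (L1 + 0.05) / (L2 + 0.05), where L1 is the luminance of the lighter color, and Level AA requires at least 4.5:1 for normal text and 3:1 for large text. The sketch below implements that published formula for opaque 8-bit sRGB colors. It is not WAVE's code, and it ignores complications such as transparency, background images, and deciding whether text counts as large.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Ratio of lighter to darker luminance, each offset by 0.05 (1:1 to 21:1)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text: bool = False) -> bool:
    """WCAG 2 AA: 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


white = (255, 255, 255)
gray_777 = (0x77, 0x77, 0x77)
print(f"{contrast_ratio(gray_777, white):.2f}")  # 4.48
print(passes_aa(gray_777, white))                  # False
print(passes_aa(gray_777, white, large_text=True))  # True
```

The #777777-on-white case shows how easily common "subtle gray" text falls just under the 4.5:1 threshold.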
Images and Alternative Text
There were 36,713,043 images in the sample, or 36.7 images per home page on average. 33.6% of all images (12.3 per page on average) had missing alternative text (not counting alt=""). 18.5% of all images (6.7 per page on average) were linked images with missing or empty alternative text, resulting in both an alternative text issue and a link lacking any description. 16% of pages had images and no alt attributes at all.
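The counts above distinguish a missing alt attribute from an empty alt="", and flag linked images with no alternative text. A minimal sketch of that classification using Python's standard-library html.parser; the class and counter names are hypothetical, and the linked-image check is simplified because it ignores any other text inside the link.

```python
from html.parser import HTMLParser


class AltAudit(HTMLParser):
    """Counts <img> elements by the state of their alt attribute (simplified)."""

    def __init__(self) -> None:
        super().__init__()
        self.link_depth = 0          # > 0 while inside an <a> element
        self.missing_alt = 0         # no alt attribute at all
        self.empty_alt = 0           # alt="" (or whitespace only)
        self.has_alt = 0             # non-empty alt text
        self.linked_no_alt_text = 0  # images inside links with missing or empty alt

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
            return
        if tag != "img":
            return
        attributes = dict(attrs)
        if "alt" not in attributes:
            self.missing_alt += 1
            text = ""
        else:
            # html.parser reports a bare `alt` (no value) as None.
            text = (attributes["alt"] or "").strip()
            if text:
                self.has_alt += 1
            else:
                self.empty_alt += 1
        if self.link_depth and not text:
            self.linked_no_alt_text += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth:
            self.link_depth -= 1


audit = AltAudit()
audit.feed('<img src="a.png"><img src="b.png" alt="">'
           '<img src="c.png" alt="Company logo">'
           '<a href="/"><img src="d.png"></a>')
audit.close()
print(audit.missing_alt, audit.empty_alt, audit.has_alt, audit.linked_no_alt_text)  # 2 1 1 1
```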
16.8% of images that were assigned alternative text had questionable (such as alt="image", "graphic", "blank", a file name, etc.) or repetitive alternative text (alternative text identical to adjacent text or an adjacent image's alternative text).
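Questionable and redundant alternative text can be approximated with simple string checks. In this sketch the placeholder words are just the examples named above, and the file-name pattern is a hypothetical heuristic; neither is WAVE's actual rule set.

```python
import re

# Placeholder values taken from the examples in the text (an assumption,
# not WAVE's actual list).
PLACEHOLDER_ALTS = {"image", "graphic", "blank"}
# Hypothetical file-name pattern: "name.ext" with a common image extension.
FILE_NAME = re.compile(r"^[\w\-. ]+\.(?:png|jpe?g|gif|svg|webp|bmp)$", re.IGNORECASE)


def normalize(text: str) -> str:
    """Collapse whitespace and case-fold for comparison."""
    return " ".join(text.split()).casefold()


def is_questionable_alt(alt: str) -> bool:
    """True if alt text looks like a placeholder word or a file name."""
    value = normalize(alt)
    return value in PLACEHOLDER_ALTS or bool(FILE_NAME.match(value))


def is_redundant_alt(alt: str, adjacent_text: str) -> bool:
    """True if non-empty alt text merely repeats adjacent text."""
    return bool(alt.strip()) and normalize(alt) == normalize(adjacent_text)


print(is_questionable_alt("IMG_1234.JPG"))               # True
print(is_questionable_alt("Company logo"))               # False
print(is_redundant_alt("Contact us", " contact   us "))  # True
```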
If we assume that this million-page sample is indicative of the accessibility of the broader web, these data indicate that around half of the images encountered by users with disabilities would definitively have inappropriate alternative text. This, however, presumes that all other images were actually given equivalent alternative text, which is certainly not the case. As an example, 4.5 million non-linked images (12.2% of all images) had been given alt="", and it's likely that many of these images should have been assigned alternative text.
2,218 pages (.2% of the sample) had a longdesc attribute present. However, 49.7% of the 12,051 longdesc attributes encountered had invalid values, such as an empty value, an invalid URL, an image file name, etc.
59% of the 3.4 million form inputs identified were unlabeled (no label via a <label> element, aria-label, or aria-labelledby). The presence of unlabeled form controls was a strong indicator of broader errors: pages with at least one missing form label averaged nearly 30 more errors than those without any label errors.
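A rough check for unlabeled form controls looks for a wrapping <label>, a <label for> that points at the control's id, aria-label, or aria-labelledby. This is a simplified sketch: the exempt input types are an assumption, aria-labelledby references are not verified, and the title attribute and other naming sources are ignored.

```python
from html.parser import HTMLParser

# Input types that do not take a visible label (assumption for this sketch).
EXEMPT_TYPES = {"hidden", "submit", "reset", "button", "image"}


class FormLabelAudit(HTMLParser):
    """Finds form controls without a label, aria-label, or aria-labelledby (simplified)."""

    def __init__(self) -> None:
        super().__init__()
        self.label_depth = 0    # > 0 while inside a <label>
        self.label_for = set()  # ids referenced by <label for="...">
        self.controls = []      # (id or None, labelled without needing for=)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "label":
            self.label_depth += 1
            if a.get("for"):
                self.label_for.add(a["for"])
        elif tag in ("input", "select", "textarea"):
            if tag == "input" and (a.get("type") or "text").lower() in EXEMPT_TYPES:
                return
            labelled = (
                self.label_depth > 0
                or bool((a.get("aria-label") or "").strip())
                or bool((a.get("aria-labelledby") or "").strip())
            )
            self.controls.append((a.get("id"), labelled))

    def handle_endtag(self, tag):
        if tag == "label" and self.label_depth:
            self.label_depth -= 1

    def unlabeled(self):
        """Controls still unlabeled once all <label for> associations are known."""
        return [c for c in self.controls if not c[1] and c[0] not in self.label_for]


audit = FormLabelAudit()
audit.feed('<label for="q">Search</label><input id="q">'
           '<input type="text" name="email">'
           '<label>Name <input name="n"></label>'
           '<input aria-label="ZIP code"><input type="submit" value="Go">')
audit.close()
print(len(audit.controls), len(audit.unlabeled()))  # 4 1
```

Resolving <label for> only after the whole page is parsed matters, because a label can legitimately appear after its control.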
There were 18,910,980 headings detected. These break down to 1.7 million <h1>s (9.1%), 5.9 million <h2>s (31.4%), 6.5 million <h3>s (34.5%), 3.2 million <h4>s (16.7%), 1.1 million <h5>s (5.7%), and .5 million <h6>s.
There were 908,784 instances of skipped heading levels (e.g., jumping from <h2> to <h4>), so one in every 20 headings was improperly structured. Skipped headings were present on 362,659 home pages (36.3% of all pages). 148,573 home pages (14.9%) had no headings present at all.
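A skipped heading level is a heading whose level is more than one deeper than the heading before it. The sketch below collects heading levels in document order and flags those jumps. It assumes a skip is measured only against the immediately preceding heading, and it does not flag a page whose first heading is not an <h1>.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects heading levels (1-6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def skipped_levels(levels):
    """(previous, current) pairs where the level increases by more than one."""
    return [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]


collector = HeadingCollector()
collector.feed("<h1>Shop</h1><h2>Deals</h2><h4>Today</h4><h2>News</h2><h5>Archive</h5>")
collector.close()
print(collector.levels)                  # [1, 2, 4, 2, 5]
print(skipped_levels(collector.levels))  # [(2, 4), (2, 5)]
```

Moving back up the outline (here from <h4> to <h2>) is allowed; only jumps downward by more than one level count as skips.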
62.4% of home pages had at least one region defined. This includes pages with ARIA landmarks (e.g., a navigation region defined with the HTML <nav> element and/or ARIA role="navigation"). Pages with navigation regions (50.4%) were most common. 23.5% of home pages had a main element or landmark present, 19.1% had an aside/complementary region present, and 15.9% had a search landmark.
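Regions can come from HTML elements with implicit landmark roles or from explicit ARIA role attributes. The sketch below counts both under simplifying assumptions: <header> and <footer> are always treated as banner and contentinfo (HTML-AAM actually scopes that to elements not nested in sectioning content), "form" and "region" are skipped because they only become landmarks when named, and only the first token of a role attribute is considered.

```python
from collections import Counter
from html.parser import HTMLParser

# Simplified implicit mappings (see the assumptions above).
IMPLICIT_ROLES = {
    "nav": "navigation",
    "main": "main",
    "aside": "complementary",
    "header": "banner",
    "footer": "contentinfo",
}
LANDMARK_ROLES = {"banner", "complementary", "contentinfo", "main", "navigation", "search"}


class LandmarkCounter(HTMLParser):
    """Counts landmark regions by role."""

    def __init__(self) -> None:
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        role_tokens = (dict(attrs).get("role") or "").split()
        if role_tokens:
            # An explicit role overrides the element's implicit role.
            role = role_tokens[0].lower()
            if role in LANDMARK_ROLES:
                self.counts[role] += 1
        elif tag in IMPLICIT_ROLES:
            self.counts[IMPLICIT_ROLES[tag]] += 1


counter = LandmarkCounter()
counter.feed('<header></header><nav></nav><div role="navigation"></div>'
             '<main></main><form role="search"></form><nav role="presentation"></nav>')
counter.close()
print(dict(counter.counts))  # {'banner': 1, 'navigation': 2, 'main': 1, 'search': 1}
```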
Pages with at least one region averaged 7.6 regions. Pages with a
region defined, however, averaged notably more - 10.5 regions per page.
96.9% of pages with
have only one instance of
. Pages with a
element present averaged 2.1 of them per page, and pages with a
averaged 3.2 of them per page.
The presence of a <main> element was an indicator of better accessibility: those pages averaged 3 fewer errors than pages lacking a main region.
60.1% of the 1 million home pages had ARIA present. 22.3 million page elements with ARIA attributes were detected. The number of ARIA attributes outpaced both the number of images present and the number of headings present. Home pages that included ARIA had an average of 38.3 ARIA attributes each. 19% of the ARIA attributes were aria-describedby. NOTE: These figures do not include ARIA landmark roles.
Home pages with ARIA present averaged 26.7 more detectable errors than pages without ARIA! An increase in the number of ARIA attributes also had a moderate correlation with increased errors. In other words, the more ARIA in use, the more detectable errors. This does not necessarily mean that ARIA introduced these errors (it's likely these pages are simply more complex), but pages typically have more errors when ARIA is present, and even more so with higher ARIA usage.
9.6% of home pages had a "skip" link present. However, 14.3% of these pages had skip links that were broken: either they were hidden in a way that made them inaccessible, or the target for the skip link was not present in the page.
Pages with at least one non-broken "skip" link present averaged 10.4 fewer errors than those without a "skip" link. This was one of the strongest indicators of better accessibility.
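One of the two failure modes above, a skip link whose target is missing, can be checked from markup alone. This sketch treats any same-page link whose text mentions "skip" as a skip link (a hypothetical heuristic) and reports targets with no matching id or legacy <a name>. Detecting links hidden in an inaccessible way would require rendered styles, which this does not attempt.

```python
from html.parser import HTMLParser


class SkipLinkAudit(HTMLParser):
    """Finds same-page "skip" links whose target does not exist (simplified)."""

    def __init__(self) -> None:
        super().__init__()
        self.ids = set()        # every id (and legacy <a name>) in the page
        self.skip_targets = []  # fragment targets of links whose text mentions "skip"
        self._current = None    # [target, text parts] while inside a same-page link

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if a.get("id"):
            self.ids.add(a["id"])
        if tag == "a":
            if a.get("name"):
                self.ids.add(a["name"])
            href = a.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self._current = [href[1:], []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            target, parts = self._current
            if "skip" in "".join(parts).lower():
                self.skip_targets.append(target)
            self._current = None

    def broken(self):
        """Skip-link targets with no matching id anywhere in the page."""
        return [t for t in self.skip_targets if t not in self.ids]


ok = SkipLinkAudit()
ok.feed('<a href="#main">Skip to main content</a><nav>Menu</nav><main id="main"></main>')
ok.close()
bad = SkipLinkAudit()
bad.feed('<a href="#content">Skip navigation</a><div id="main"></div>')
bad.close()
print(ok.broken(), bad.broken())  # [] ['content']
```

Targets are resolved only after the whole document is parsed, since the skip link normally precedes the content it points to.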
74.1% of home pages had a valid HTML5 doctype. Pages with a valid HTML5 doctype had significantly more page elements (average of 844 vs. 605) and errors (average of 61.9 vs. 53.3) than those with other doctypes. 1,130 unique doctypes (most of these, obviously, being invalid) were encountered in the sample.
Pages from various top-level domains (TLDs) were analyzed for accessibility differences. Pages with .com (n=521,316) or .net (n=39,757) had just a few more errors on average than pages from other domains. Pages with .org (n=76,489), on the other hand, fared significantly better (47.4 errors on average) than those from other domains (60.6 errors).
Pages from the following highly common top-level domains (ordered by number of home pages in that TLD) had notably fewer errors than their counterparts:
- .de (Germany)
- .uk (United Kingdom)
- .jp (Japan)
- .nl (Netherlands)
- .edu (U.S.-based education institutions)
- .au (Australia)
- .ca (Canada)
Pages from the following highly common top-level domains had notably more errors than their counterparts:
- .ru (Russia)
- .cn (China)
- .pl (Poland)
- .br (Brazil)
- .it (Italy)
- .es (Spain)
Home pages with .edu (37.1 errors), .us (36.6 errors), and .gov (30.5 errors), which are all affiliated with U.S.-based entities, had the lowest average number of accessibility errors of all common (n>2000) domains.
Data regarding 1,195 different types of technologies used on the one million home pages were collected and analyzed. Technologies that were detected on more than 5,000 home pages (.5% of the sample) are listed below. The categorized tables below show the technology name, the number of home pages with that technology present, the average number of errors present on those pages, and the percent difference in number of average errors detected on pages with that technology present vs. those without. Technologies are ordered from "best" to "worst".
As an example, the first table indicates that home pages on the Squarespace CMS had 45.4% fewer errors (almost half as many) as pages that didn't utilize that technology, pages with WordPress exhibited little difference in accessibility errors, and pages on Blogger had 237% more errors (over 3 times as many) than other pages. It is important to note that correspondence of additional errors with a technology cannot automatically be attributed to that technology.
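The last column of the tables below is a percent difference relative to pages without the technology. A minimal sketch of that arithmetic; the function names are ours, for illustration only:

```python
def percent_difference(avg_with: float, avg_without: float) -> float:
    """Percent difference in average errors: pages with a technology vs. pages without it."""
    return (avg_with - avg_without) / avg_without * 100


def times_as_many(pct: float) -> float:
    """Convert a percent difference into a ratio (e.g., +237% means 3.37 times as many)."""
    return 1 + pct / 100


print(percent_difference(40, 20))     # 100.0 (twice as many errors)
print(percent_difference(10, 20))     # -50.0 (half as many)
print(round(times_as_many(237), 2))   # 3.37
```

Plugging in the rounded Google AdSense figures (100.9 errors with AdSense, and 100.9 - 47.2 = 53.7 without) gives roughly 87.9%, close to the 87.8% listed; the small gap presumably reflects rounding of the published averages.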
| CMS | # (and %) of home pages | Avg. # of errors | % decrease/increase of errors |
|---|---|---|---|
There is a wide diversity in the impact that the CMS choice appears to have on accessibility.
| Framework | # (and %) of home pages | Avg. # of errors | % decrease/increase of errors |
|---|---|---|---|
With the exception of MooTools and TweenMax, the adoption of any of these frameworks is aligned with additional accessibility errors. This does not necessarily mean that the frameworks caused these errors, but does indicate that home pages with these frameworks have more errors than pages without.
| Library | # (and %) of home pages | Avg. # of errors | % decrease/increase of errors |
|---|---|---|---|
| jQuery Migrate | 313,391 (31.3%) | 61.7 | 5.1% |
The vast majority of the top one million home pages utilize jQuery. Home pages with jQuery averaged 19.2 more errors than those without jQuery. The presence of jQuery corresponds with nearly 15 million detected errors, or over 25% of all of the accessibility errors we detected. Pages with jQuery were a bit more likely to have alternative text and contrast errors, but much more likely to have empty buttons (2.4 times as many), missing form labels (almost 3 times as many), and empty links (3.4 times as many) than non-jQuery pages. Interestingly, pages with jQuery were twice as likely as pages without it to have the document language identified. Pages with jQuery were also much more complex (844 elements on average) than other pages (605 elements on average).
| Web Framework | # (and %) of home pages | Avg. # of errors | % decrease/increase of errors |
|---|---|---|---|
| ZURB Foundation | 25,390 (2.5%) | 62.3 | 4.5% |
Home pages in the sample that utilize the popular Bootstrap framework had 1.3 million more accessibility errors than pages that did not utilize Bootstrap. We can't know from these data whether Bootstrap introduced these errors, but there is a strong correspondence of increased errors when Bootstrap is present.
| Ad Network | # (and %) of home pages | Avg. # of errors | % decrease/increase of errors |
|---|---|---|---|
| Google AdSense | 125,462 (12.5%) | 100.9 | 87.8% |
Pages that utilized any of these popular ad systems had more errors on average than those that did not. Home pages that utilize the very common Google AdSense system had 47.2 more errors on average than other pages, nearly double!
Other common technologies were also associated with pages having more errors. Pages with ReCaptcha had 14.9 more errors on average than those without. Pages with Google Maps averaged 13.9 more errors, those with PHP averaged 7.6 more errors, and those with Java averaged 4.7 more errors.
Here are several other fun facts regarding this research:
- The WebAIM Million database has 168,000,000 data points.
- It took 66.2 days of cumulative computer processing time to download and process all 1,000,000 home pages in the sample. This was shared among 5 AWS instances that ran continuously for 5 days.
- Despite it being 2019, 11,200 home pages had <marquee> elements and 570 home pages had blinking content (<blink>).
- 2,099,665 layout tables were detected compared to only 113,737 data tables.
- The most errors detected on a single home page was 26,680!
These data indicate that there is still significant work to be done to ensure the web is made accessible to everyone. We hope that this research will promote greater interest and effort toward this end. While the volume of errors is disconcerting, most of the significant errors are of just a few types. We will publish additional analyses of this data and will conduct similar, more extensive research in the future.
There are countless ways in which this data can be examined and explored. This report really only scratches the surface. If you have questions about this research or would like us to analyze the database for something specific, please contact us.