The deep Web, sometimes called the invisible Web, is the large part of the Internet that is inaccessible to conventional search engines. Deep Web content includes email messages, chat messages, private content on social media sites, electronic bank statements, electronic health records (EHRs) and other content that is accessible over the Internet but is not crawled and indexed by search engines like Google, Yahoo, Bing or DuckDuckGo.
The reasons for not indexing deep Web content are varied. The content may be proprietary, in which case it can only be accessed by approved visitors coming in through a virtual private network (VPN). The content may be commercial, in which case it resides behind a paywall and can only be accessed by customers who have paid a fee. Or the content may contain personally identifiable information (PII), in which case it is protected by compliance regulations and can only be accessed through a portal site by individuals who have been granted access privileges. When mashups are generated on the fly and their components lack a permanent uniform resource locator (URL), they also become part of the deep Web.
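To make the distinction concrete, the following is a minimal sketch of why a link-following crawler never reaches some content. It is an illustration under stated assumptions, not a description of how any real search engine works: the site map, its paths and the `requires_login` flag are all hypothetical, and the crawler is reduced to a breadth-first walk over an in-memory dictionary.

```python
from collections import deque

# Hypothetical toy site (assumption for illustration only): each page lists
# its outgoing links and whether it requires a login. No real site is implied.
SITE = {
    "/": {"links": ["/about", "/login", "/account/statement"], "requires_login": False},
    "/about": {"links": ["/"], "requires_login": False},
    "/login": {"links": [], "requires_login": False},
    # Protected content: an anonymous crawler is turned away.
    "/account/statement": {"links": [], "requires_login": True},
    # Generated on the fly: it exists, but no page links to it.
    "/report?session=abc123": {"links": [], "requires_login": False},
}


def crawl(site, start="/"):
    """Return the set of URLs an anonymous, link-following crawler can index."""
    indexed = set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in indexed or url not in site:
            continue
        page = site[url]
        if page["requires_login"]:
            continue  # access control blocks the crawler
        indexed.add(url)
        queue.extend(page["links"])
    return indexed


surface = crawl(SITE)          # what the crawler could index
deep = set(SITE) - surface     # reachable over the network, but never indexed

print("surface:", sorted(surface))
print("deep:", sorted(deep))
```

In this toy model the bank-statement page is excluded because it requires a login, and the on-the-fly report is excluded because nothing links to it, which mirrors two of the reasons given above.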
It is not known how large the deep Web is, but many experts estimate that search engines crawl and index less than 1% of all the content that can be accessed over the Internet. The part of the Internet that is crawled and indexed by search engines is sometimes referred to as the surface Web. The term “deep Web” was coined by BrightPlanet in a 2001 white paper entitled ‘The Deep Web: Surfacing Hidden Value’ and is often confused in the media with the term “dark Web.” Like deep Web content, dark Web content cannot be accessed by conventional search engines, but most often the reason is that the content is illegal.