Uses of Interface
org.apache.nutch.protocol.Protocol
Packages that use Protocol

org.apache.nutch.protocol
  Classes related to the Protocol interface, see also org.apache.nutch.net.protocols.
org.apache.nutch.protocol.file
  Protocol plugin which supports retrieving local file resources.
org.apache.nutch.protocol.ftp
  Protocol plugin which supports retrieving documents via the ftp protocol.
org.apache.nutch.protocol.htmlunit
  Protocol plugin which supports retrieving documents via HTTP/HTTPS using Selenium and the HtmlUnitDriver web driver for the HtmlUnit headless browser.
org.apache.nutch.protocol.http
  Protocol plugin which supports retrieving documents via the http protocol.
org.apache.nutch.protocol.http.api
  Common API used by HTTP plugins (http, httpclient, etc.)
org.apache.nutch.protocol.httpclient
  Protocol plugin which supports retrieving documents via the HTTP and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web servers as well as proxy servers.
org.apache.nutch.protocol.interactiveselenium
  Protocol plugin which supports retrieving documents using and interacting with Selenium.
org.apache.nutch.protocol.okhttp
  Protocol plugin for HTTP/HTTPS based on okhttp, supports HTTP/1.1 and/or HTTP/2.
org.apache.nutch.protocol.selenium
  Protocol plugin which supports retrieving documents via Selenium.

Uses of Protocol in org.apache.nutch.protocol
Methods in org.apache.nutch.protocol that return Protocol

Protocol ProtocolFactory.getProtocol(String urlString)
  Returns the appropriate Protocol implementation for a URL.
Protocol ProtocolFactory.getProtocol(URL url)
  Returns the appropriate Protocol implementation for a URL.
Protocol ProtocolFactory.getProtocolById(String id)

Methods in org.apache.nutch.protocol with parameters of type Protocol

abstract crawlercommons.robots.BaseRobotRules RobotRulesParser.getRobotRulesSet(Protocol protocol, URL url, List<Content> robotsTxtContent)
  Fetch robots.txt (or its protocol-specific equivalent) which applies to the given URL, parse it and return the set of robot rules applicable for the configured agent name(s).
crawlercommons.robots.BaseRobotRules RobotRulesParser.getRobotRulesSet(Protocol protocol, Text url, List<Content> robotsTxtContent)
  Fetch robots.txt (or its protocol-specific equivalent) which applies to the given URL, parse it and return the set of robot rules applicable for the configured agent name(s).

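The two getProtocol overloads listed for ProtocolFactory pick an implementation based on the URL. The following is only a self-contained sketch of scheme-based dispatch, not Nutch code: SchemeDispatchSketch, Handler, and the registry contents are hypothetical stand-ins, and the real ProtocolFactory may resolve plugins quite differently (for example, through the plugin system and configuration).

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SchemeDispatchSketch {

  /** Hypothetical stand-in for a protocol plugin; NOT the Nutch Protocol interface. */
  public interface Handler {
    String name();
  }

  // Hypothetical registry mapping URL schemes to handlers (assumption for illustration).
  private static final Map<String, Handler> REGISTRY = new HashMap<>();

  static {
    Handler http = () -> "http-handler";
    REGISTRY.put("http", http);
    REGISTRY.put("https", http); // one handler may serve several schemes
    REGISTRY.put("ftp", () -> "ftp-handler");
    REGISTRY.put("file", () -> "file-handler");
  }

  /** Looks up a handler by the scheme of an already parsed URL. */
  public static Handler lookup(URL url) {
    String scheme = url.getProtocol().toLowerCase(Locale.ROOT);
    Handler handler = REGISTRY.get(scheme);
    if (handler == null) {
      throw new IllegalArgumentException("No handler registered for scheme: " + scheme);
    }
    return handler;
  }

  /** Convenience overload: parses the string, then delegates to lookup(URL). */
  public static Handler lookup(String urlString) {
    try {
      return lookup(URI.create(urlString).toURL());
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Malformed URL: " + urlString, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(lookup("https://example.org/index.html").name());
    System.out.println(lookup("ftp://example.org/pub/readme.txt").name());
  }
}
```

The string overload simply parses and delegates, which mirrors the pairing of a String and a URL variant in the listing above; whether Nutch implements its overloads this way is not stated here.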
Uses of Protocol in org.apache.nutch.protocol.file
Classes in org.apache.nutch.protocol.file that implement Protocol

class File
  This class is a protocol plugin used for the file: scheme.

Uses of Protocol in org.apache.nutch.protocol.ftp
Classes in org.apache.nutch.protocol.ftp that implement Protocol

class Ftp
  This class is a protocol plugin used for the ftp: scheme.

Methods in org.apache.nutch.protocol.ftp with parameters of type Protocol

crawlercommons.robots.BaseRobotRules FtpRobotRulesParser.getRobotRulesSet(Protocol ftp, URL url, List<Content> robotsTxtContent)
  For hosts whose robots rules are not yet cached, sends an FTP request to the host of the given URL, gets the robots file, parses the rules, and caches the rules object to avoid repeated work.

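The fetch-once-then-cache behavior described for FtpRobotRulesParser.getRobotRulesSet can be illustrated with a small, self-contained sketch. Everything here is an assumption for illustration: RobotRulesCacheSketch, the scheme:host:port cache key, and the plain String standing in for a parsed rules object are hypothetical and are not taken from the Nutch source.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class RobotRulesCacheSketch {

  // Cached "rules" per host key; a String stands in for a parsed rules object.
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  // Hypothetical fetch-and-parse step supplied by the caller.
  private final Function<URL, String> fetchAndParse;
  // Counts how often the fetch step actually ran.
  private final AtomicInteger fetchCount = new AtomicInteger();

  public RobotRulesCacheSketch(Function<URL, String> fetchAndParse) {
    this.fetchAndParse = fetchAndParse;
  }

  /** Assumed cache key: lowercase scheme and host plus the effective port. */
  static String cacheKey(URL url) {
    int port = url.getPort() == -1 ? url.getDefaultPort() : url.getPort();
    return url.getProtocol().toLowerCase(Locale.ROOT) + ":"
        + url.getHost().toLowerCase(Locale.ROOT) + ":" + port;
  }

  /** Returns cached rules, fetching and parsing only on the first request per host. */
  public String rulesFor(URL url) {
    return cache.computeIfAbsent(cacheKey(url), key -> {
      fetchCount.incrementAndGet();
      return fetchAndParse.apply(url);
    });
  }

  /** Convenience overload that parses the URL string first. */
  public String rulesFor(String urlString) {
    try {
      return rulesFor(URI.create(urlString).toURL());
    } catch (MalformedURLException e) {
      throw new IllegalArgumentException("Malformed URL: " + urlString, e);
    }
  }

  public int fetchCount() {
    return fetchCount.get();
  }
}
```

With this sketch, two URLs on the same host trigger a single fetch, while a URL on a different host triggers a second one.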
Uses of Protocol in org.apache.nutch.protocol.htmlunit
Classes in org.apache.nutch.protocol.htmlunit that implement Protocol

class Http

Uses of Protocol in org.apache.nutch.protocol.http
Classes in org.apache.nutch.protocol.http that implement Protocol

class Http

Uses of Protocol in org.apache.nutch.protocol.http.api
Classes in org.apache.nutch.protocol.http.api that implement Protocol

class HttpBase

Methods in org.apache.nutch.protocol.http.api with parameters of type Protocol

crawlercommons.robots.BaseRobotRules HttpRobotRulesParser.getRobotRulesSet(Protocol http, URL url, List<Content> robotsTxtContent)
  Get the rules from robots.txt that apply to the given url.

Uses of Protocol in org.apache.nutch.protocol.httpclient
Classes in org.apache.nutch.protocol.httpclient that implement Protocol

class Http
  This class is a protocol plugin that configures an HTTP client for Basic, Digest and NTLM authentication schemes for web servers as well as proxy servers.

Uses of Protocol in org.apache.nutch.protocol.interactiveselenium
Classes in org.apache.nutch.protocol.interactiveselenium that implement Protocol

class Http

Uses of Protocol in org.apache.nutch.protocol.okhttp
Classes in org.apache.nutch.protocol.okhttp that implement Protocol

class OkHttp

Uses of Protocol in org.apache.nutch.protocol.selenium
Classes in org.apache.nutch.protocol.selenium that implement Protocol

class Http