NETWORK PROGRAMMING
CHAPTER 3 : URLS AND URIS
CHANDAN GUPTA BHAGAT
https://me.chandanbhagat.com.np
CONTENT
• URIs: URLs and Relative URLs
• The URL Class: Creating New URLs, Retrieving Data From a URL, Splitting a URL into Pieces, Equality &
Comparison and Conversion
• The URI Class: Constructing a URI, The Parts of the URI, Resolving Relative URIs, Equality & Comparison
and String Representation
• x-www-form-urlencoded: URLEncoder and URLDecoder
• Proxies: System Properties, The Proxy Class and The ProxySelector Class
• Communicating with Server-Side Programs Through GET
• Accessing Password-Protected Sites: The Authenticator Class, The PasswordAuthentication Class and The
JPasswordField Class
URIs: URLs and Relative URLs
• HTML is a hypertext markup language because it includes a way to specify links to
other documents identified by URLs.
• A URL unambiguously identifies the location of a resource on the Internet. A URL is
the most common type of URI, or Uniform Resource Identifier.
• A URI can identify a resource by its network location, as in a URL, or by its name,
number, or other characteristics.
• The URL class is the simplest way for a Java program to locate and retrieve data from
the network.
• You do not need to worry about the details of the protocol being used, or how to
communicate with the server; you simply tell Java the URL and it gets the data for
you.
URIs
• A Uniform Resource Identifier (URI) is a string of characters in a particular syntax that
identifies a resource.
• The resource identified may be a file on a server; but it may also be an email address, a
news message, a book, a person’s name, an Internet host, the current stock price of Oracle,
or something else.
• A resource is a thing that is identified by a URI. A URI is a string that identifies a resource.
Yes, it is exactly that circular.
• All you ever receive from a server is a representation of a resource which comes in the form
of bytes.
• However, a single resource may have different representations. For instance,
https://www.un.org/en/documents/udhr/ identifies the Universal Declaration of Human
Rights; but there are representations of the declaration in plain text, XML, PDF, and other
formats.
URIs
• One of the key principles of good web architecture is to be profligate with URIs. If
anyone might want to address something or refer to something, give it a URI (and in
practice a URL).
• Just because a resource is a part of another resource, or a collection of other
resources, or a state of another resource at a particular time, doesn’t mean it can’t
have its own URI.
• For instance, in an email service, every user, every message received, every message
sent, every filtered view of the inbox, every contact, every filter rule, and every
single page a user might ever look at should have a unique URI.
• Although architecturally URIs are opaque strings, in practice it’s useful to design
them with human-readable substructure.
URIs
• The syntax of a URI is composed of a scheme and a scheme-specific part, separated by a
colon, like this:
scheme:scheme-specific-part
• The syntax of the scheme-specific part depends on the scheme being used. Current
schemes include:
• data Base64-encoded data included directly in a link; see RFC 2397
• file A file on a local disk
• ftp An FTP server
• http A World Wide Web server using the Hypertext Transfer Protocol
• mailto An email address
• magnet A resource available for download via peer-to-peer networks such as BitTorrent
• telnet A connection to a Telnet-based service
• urn A Uniform Resource Name
• In addition, Java makes heavy use of nonstandard custom schemes such as rmi, jar,
jndi, and doc for various purposes.
• There is no specific syntax that applies to the scheme-specific parts of all URIs.
However, many have a hierarchical form, like this:
//authority/path?query
• The authority part of the URI names the authority responsible for resolving the rest
of the URI.
• The path, together with the optional query, identifies a specific resource within the
scope of that authority.
URIs
• If the authority is an Internet host, optional usernames and ports may also be
provided to make the authority more specific.
• For example, the URI ftp://mp3:mp3@ci43198-a.ashvil1.nc.home.com:33/VanHalen-Jump.mp3
has the authority mp3:mp3@ci43198-a.ashvil1.nc.home.com:33.
• This authority has the username mp3, the password mp3, the host
ci43198-a.ashvil1.nc.home.com, and the port 33. It has the scheme ftp and the path
/VanHalen-Jump.mp3. (In most cases, including the password in the URI is a big
security hole unless, as here, you really do want everyone in the universe to know
the password.)
URIs
• The scheme part is composed of lowercase letters, digits, and the plus sign, period, and
hyphen.
• The other three parts of a typical URI (authority, path, and query) should each be composed
of the ASCII alphanumeric characters (i.e., the letters A–Z, a–z, and the digits 0–9).
• In addition, the punctuation characters - _ . ! and ~ may also be used.
• Delimiters such as / ? & and = may be used for their predefined purposes.
• All other characters, including non-ASCII alphanumerics such as á and ζ as well as delimiters
not being used as delimiters should be escaped by a percent sign (%) followed by the
hexadecimal codes for the character as encoded in UTF-8.
• For instance, in UTF-8, á is the two bytes 0xC3 0xA1, so it would be encoded as %C3%A1. The
Chinese character 木 is Unicode code point U+6728.
• In UTF-8, this is encoded as the three bytes E6, 9C, and A8. Thus, in a URI it would be
encoded as %E6%9C%A8.
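The escaping rule above can be sketched in a few lines of Java. This is a minimal illustration, not a complete URI encoder: the class name PercentEncodeDemo and the helper percentEncode are hypothetical, and the helper treats every delimiter as data, so it is only suitable for a single path segment or query value.

```java
import java.nio.charset.StandardCharsets;

class PercentEncodeDemo {
    // Hypothetical helper: keeps ASCII letters, digits, and - _ . ! ~ as they are,
    // and replaces every other UTF-8 byte with %XX (uppercase hex).
    static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean unreserved = v < 128
                    && (Character.isLetterOrDigit(v) || "-_.!~".indexOf(v) >= 0);
            if (unreserved) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("\u00e1")); // á  -> %C3%A1
        System.out.println(percentEncode("\u6728")); // 木 -> %E6%9C%A8
        System.out.println(percentEncode("a b"));    // space -> a%20b
    }
}
```

The non-ASCII characters are written as Unicode escapes so the example does not depend on the source file's encoding.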
URLs
• A URL is a URI that, as well as identifying a resource, provides a specific network location for the
resource that a client can use to retrieve a representation of that resource.
• A generic URI may tell you what a resource is, but not actually tell you where or how to get that
resource.
• In the physical world, it’s the difference between the title “Harry Potter and The Deathly Hallows” and
the library location “Room 312, Row 28, Shelf 7”.
• In Java, it’s the difference between the java.net.URI class that only identifies resources and the
java.net.URL class that can both identify and retrieve resources.
• The network location in a URL usually includes the protocol used to access a server (e.g., FTP, HTTP),
the hostname or IP address of the server, and the path to the resource on that server.
• A typical URL looks like http://www.ibiblio.org/javafaq/javatutorial.html. This specifies that there is a file
called javatutorial.html in a directory called javafaq on the server www.ibiblio.org, and that this file can
be accessed via the HTTP protocol.
URLs
• The syntax of a URL is:
protocol://userInfo@host:port/path?query#fragment
• Here the protocol is another word for what was called the scheme of the URI. (Scheme is the word
used in the URI RFC. Protocol is the word used in the Java documentation.) In a URL, the protocol part
can be file, ftp, http, https, magnet, telnet, or various other strings (though not urn).
• The host part of a URL is the name of the server that provides the resource you want. It can be a
hostname such as www.oreilly.com or utopia.poly.edu or an IP address, such as 204.148.40.9 or
128.238.3.21.
• The userInfo is optional login information for the server. If present, it contains a username and, rarely, a
password.
• The port number is also optional. It’s not necessary if the service is running on its default port (port 80
for HTTP servers).
• Together, the userInfo, host, and port constitute the authority.
• Technically, a string that contains a fragment identifier is a URL reference, not a URL. Java, however,
does not distinguish between URLs and URL references.
Relative URLs
• A URL tells a web browser a lot about a document: the protocol used to retrieve the document, the host
where the document lives, and the path to the document on that host.
• Most of this information is likely to be the same for other URLs that are referenced in the document.
Therefore, rather than requiring each URL to be specified in its entirety, a URL may inherit the protocol,
hostname, and path of its parent document (i.e., the document in which it appears).
• URLs that aren’t complete but inherit pieces from their parent are called relative URLs. In contrast, a
completely specified URL is called an absolute URL.
• In a relative URL, any pieces that are missing are assumed to be the same as the corresponding
pieces from the URL of the document in which the URL is found. For example, suppose that while
browsing http://www.ibiblio.org/javafaq/javatutorial.html you click on this hyperlink:
<a href="javafaq.html">
• The browser cuts javatutorial.html off the end of http://www.ibiblio.org/javafaq/javatutorial.html to get
http://www.ibiblio.org/javafaq/.
Relative URLs
• Then it attaches javafaq.html onto the end of http://www.ibiblio.org/javafaq/ to get
http://www.ibiblio.org/javafaq/javafaq.html.
• Finally, it loads that document. If the relative link begins with a /, then it is relative to the document root
instead of relative to the current file. Thus, if you click on the following link while browsing
http://www.ibiblio.org/javafaq/javatutorial.html:
<a href="/projects/ipv6/">
• The browser would throw away /javafaq/javatutorial.html and attach /projects/ipv6/ to the end of
http://www.ibiblio.org to get http://www.ibiblio.org/projects/ipv6/.
• Relative URLs have a number of advantages. First, and least important, they save a little typing.
• More importantly, relative URLs allow a single document tree to be served by multiple protocols: for
instance, both HTTP and FTP. HTTP might be used for direct surfing, while FTP could be used for
mirroring the site.
• Most importantly of all, relative URLs allow entire trees of documents to be moved or copied from one
site to another without breaking all the internal links.
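The two resolution cases described above (a link relative to the current document and a link relative to the document root) can be reproduced with java.net.URI. This is a small sketch; the class name RelativeResolveDemo and the helper resolve are hypothetical names chosen for illustration.

```java
import java.net.URI;

class RelativeResolveDemo {
    // Hypothetical helper: resolves a relative reference against a base URI string.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.ibiblio.org/javafaq/javatutorial.html";
        // Relative to the current document's directory:
        System.out.println(resolve(base, "javafaq.html"));
        // prints http://www.ibiblio.org/javafaq/javafaq.html

        // Leading slash: relative to the document root:
        System.out.println(resolve(base, "/projects/ipv6/"));
        // prints http://www.ibiblio.org/projects/ipv6/
    }
}
```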
The URL Class: Creating New URLs, Retrieving Data
From a URL, Splitting a URL Into Pieces, Equality &
Comparison and Conversion
URL Class
• The java.net.URL class is an abstraction of a Uniform Resource Locator such as
http://www.lolcats.com/ or ftp://ftp.redhat.com/pub/.
• It extends java.lang.Object, and it is a final class that cannot be subclassed.
• Uses the strategy design pattern rather than relying on inheritance to configure instances for different
kinds of URLs. Protocol handlers are the strategies, and the URL class itself forms the context through
which the different strategies are selected.
• It is helpful to think of URLs as objects with fields that include the scheme (a.k.a. the protocol),
hostname, port, path, query string, and fragment identifier (a.k.a. the ref), each of which may be set
independently.
• URLs are immutable. After a URL object has been constructed, its fields do not change. This has the
side effect of making them thread safe.
Creating new URLs
• You can construct instances of java.net.URL. The constructors differ in the information they require:
public URL(String url) throws MalformedURLException
public URL(String protocol, String hostname, String file) throws MalformedURLException
public URL(String protocol, String host, int port, String file) throws MalformedURLException
public URL(URL base, String relative) throws MalformedURLException
• Which constructor you use depends on the information you have and the form it’s in. All these
constructors throw a MalformedURLException if you try to create a URL for an unsupported protocol
or if the URL is syntactically incorrect.
• Exactly which protocols are supported is implementation dependent. The only protocols that have
been available in all virtual machines are http and file, and the latter is notoriously flaky.
• Today, Java also supports the https, jar, and ftp protocols. Some virtual machines support mailto and
gopher as well as some custom protocols like doc, netdoc, systemresource, and verbatim used
internally by Java.
Creating new URLs
• If the protocol you need isn’t supported by a particular VM, you may be able to install a
protocol handler for that scheme to enable the URL class to speak that protocol.
• In practice, this is far more trouble than it’s worth. You’re better off using a library that
exposes a custom API just for that protocol.
• Other than verifying that it recognizes the URL scheme, Java does not check the correctness
of the URLs it constructs. The programmer is responsible for making sure that URLs created
are valid.
• For instance, Java does not check that the hostname in an HTTP URL does not contain
spaces or that the query string is x-www-form-URL-encoded. It does not check that a mailto
URL actually contains an email address. You can create URLs for hosts that don’t exist and
for hosts that do exist but that you won’t be allowed to connect to.
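A quick way to see how little is checked at construction time is to try a few strings. The sketch below uses a hypothetical helper canConstruct; host.invalid is a placeholder hostname that does not exist, and nosuchscheme is a made-up scheme with no protocol handler. On Java 20 and later the URL constructors are deprecated in favor of URI.toURL(), so this compiles with a deprecation warning there.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlValidationDemo {
    // Hypothetical helper: reports whether the URL constructor accepts the string.
    static boolean canConstruct(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Accepted: the scheme is known, and no DNS lookup happens at construction time.
        System.out.println(canConstruct("http://host.invalid/"));        // prints true
        // Rejected: there is no protocol handler for this made-up scheme.
        System.out.println(canConstruct("nosuchscheme://example.com/")); // prints false
    }
}
```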
Creating new URLs
• Constructing a URL from a string
The simplest URL constructor just takes an absolute URL in string form as its single
argument:
public URL(String url) throws MalformedURLException
• Like all constructors, this may only be called after the new operator, and like all URL
constructors, it can throw a MalformedURLException.
• The following code constructs a
URL object from a String, catching the exception that might be thrown:
try {
    URL u = new URL("http://www.audubon.org/");
} catch (MalformedURLException ex) {
    System.err.println(ex);
}
Creating new URLs
• Constructing a URL from its component parts
• You can also build a URL by specifying the protocol, the hostname, and the file:
public URL(String protocol, String hostname, String file) throws MalformedURLException
• This constructor sets the port to -1 so the default port for the protocol will be used. The
file argument should begin with a slash and include a path, a filename, and optionally
a fragment identifier. Forgetting the initial slash is a common mistake, and one that is
not easy to spot. Like all URL constructors, it can throw a MalformedURLException. For
example:
try {
    URL u = new URL("http", "www.eff.org", "/blueribbon.html#intro");
} catch (MalformedURLException ex) {
    // Keep the original exception as the cause so it is not lost
    throw new RuntimeException("shouldn't happen; all VMs recognize http", ex);
}
Creating new URLs
• This creates a URL object that points to http://www.eff.org/blueribbon.html#intro, using the
default port for the HTTP protocol (port 80). The file specification includes a reference to a
named anchor. The code catches the exception that would be thrown if the virtual machine
did not support the HTTP protocol. However, this shouldn’t happen in practice.
• For the rare occasions when the default port isn’t correct, the next constructor lets you
specify the port explicitly as an int. The other arguments are the same. For example, this
code fragment creates a URL object that points to
http://fourier.dur.ac.uk:8000/~dma3mjh/jsci/, specifying port 8000 explicitly:
try {
    URL u = new URL("http", "fourier.dur.ac.uk", 8000, "/~dma3mjh/jsci/");
} catch (MalformedURLException ex) {
    // Keep the original exception as the cause so it is not lost
    throw new RuntimeException("shouldn't happen; all VMs recognize http", ex);
}
Creating new URLs
• Constructing relative URLs
• This constructor builds an absolute URL from a relative URL and a base URL:
public URL(URL base, String relative) throws MalformedURLException
• For instance, you may be parsing an HTML document at http://www.ibiblio.org/javafaq/index.html
and encounter a link to a file called mailinglists.html with no further qualifying
information. In this case, you use the URL to the document that contains the link
to provide the missing information. The constructor computes the new URL as
http://www.ibiblio.org/javafaq/mailinglists.html. For example:
try {
    URL u1 = new URL("http://www.ibiblio.org/javafaq/index.html");
    URL u2 = new URL(u1, "mailinglists.html");
} catch (MalformedURLException ex) {
    System.err.println(ex);
}
Creating new URLs
• Other sources of URL objects
• Besides the constructors discussed here, a number of other methods in the Java
class library return URL objects. In applets, getDocumentBase() returns the URL of
the page that contains the applet and getCodeBase() returns the URL of the applet
.class file.
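• The java.io.File class has a toURL() method that returns a file URL matching the given
file. The exact format of the URL returned by this method is platform dependent. Note that
toURL() is deprecated because it does not escape characters that are illegal in URLs;
converting with toURI().toURL() is the usual replacement.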
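As a small sketch of getting a file URL from a java.io.File, the example below goes through File.toURI() first, which escapes characters such as spaces. The class name FileUrlDemo, the helper toUrl, and the filename are hypothetical; the full URL printed depends on the platform and the current directory, so no expected output is shown.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

class FileUrlDemo {
    // Hypothetical helper: converts a File to a file: URL by way of a URI.
    static URL toUrl(File f) {
        try {
            return f.toURI().toURL();
        } catch (MalformedURLException ex) {
            // Not expected for file: URIs; surfaced as an unchecked exception
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        File f = new File("notes with spaces.txt"); // the file need not exist
        URL u = toUrl(f);
        System.out.println(u.getProtocol()); // prints file
        System.out.println(u);               // platform-dependent absolute file URL
    }
}
```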
Retrieving data from a URL
• Naked URLs aren’t very exciting. What’s interesting is the data contained in the documents they point
to. The URL class has several methods that retrieve data from a URL:
public InputStream openStream() throws IOException
public URLConnection openConnection() throws IOException
public URLConnection openConnection(Proxy proxy) throws IOException
public Object getContent() throws IOException
public Object getContent(Class[] classes) throws IOException
• The most basic and most commonly used of these methods is openStream(), which returns an
InputStream from which you can read the data.
• If you need more control over the download process, call openConnection() instead, which gives you
a URLConnection which you can configure, and then get an InputStream from it.
• Finally, you can ask the URL for its content with getContent(), which may give you a more complete
object such as a String or an Image. Then again, it may just give you an InputStream anyway.
Retrieving data from a URL
• public final InputStream openStream() throws IOException
• The openStream() method connects to the resource referenced by the URL, performs any necessary handshaking
between the client and the server, and returns an InputStream from which data can be read.
• The data we get from this InputStream is the raw (i.e., uninterpreted) content the URL references: ASCII if we’re
reading an ASCII text file, raw HTML if we’re reading an HTML file, binary image data if we’re reading an image file, and
so forth. It does not include any of the HTTP headers or any other protocol-related information.
• We can read from this InputStream as we would read from any other InputStream. For example:
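try {
    URL u = new URL("http://www.lolcats.com");
    // try-with-resources closes the stream even if read() throws
    try (InputStream in = u.openStream()) {
        int c;
        while ((c = in.read()) != -1) System.out.write(c);
    }
    System.out.flush(); // System.out.write(int) may leave bytes buffered
} catch (IOException ex) {
    System.err.println(ex);
}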
• The preceding code fragment catches an IOException, which also catches the MalformedURLException that the URL
constructor can throw, since MalformedURLException subclasses IOException.
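The same read loop can be packaged as a reusable method. The sketch below is one possible shape, not the only one: the class OpenStreamDemo and the helpers readAll and demo are hypothetical, the content is assumed to be UTF-8 text, and the demo reads a temporary file through a file: URL so that it runs without network access.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OpenStreamDemo {
    // Hypothetical helper: reads everything the URL points to and decodes it as UTF-8.
    static String readAll(URL u) throws IOException {
        try (InputStream in = u.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Writes a temporary file, reads it back through a file: URL, then deletes it.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("openstream", ".txt");
            try {
                Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
                return readAll(tmp.toUri().toURL());
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints hello
    }
}
```

Reading in 4 KB chunks rather than one byte at a time avoids a method call per byte; readAll works the same way for an http URL.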
Retrieving data from a URL
• public URLConnection openConnection() throws IOException
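• The openConnection() method returns a URLConnection object for the specified URL. A
URLConnection represents a connection to a network resource. In current JDKs the call itself does not contact the
server; the connection is established later, when you call connect() or a method that needs it, such as
getInputStream(). If the call fails, openConnection() throws an IOException. For example: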
try {
    URL u = new URL("https://news.ycombinator.com/");
    try {
        URLConnection uc = u.openConnection();
        InputStream in = uc.getInputStream();
        // read from the connection...
    } catch (IOException ex) {
        System.err.println(ex);
    }
} catch (MalformedURLException ex) {
    System.err.println(ex);
}
• We should use this method when we want to communicate directly with the server. The URLConnection gives us
access to everything sent by the server: in addition to the document itself in its raw form (e.g., HTML, plain text, binary
image data), we can access all the metadata specified by the protocol.
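To illustrate the extra control a URLConnection offers, the sketch below sets timeouts before reading and prints two pieces of metadata. The class OpenConnectionDemo, the helpers fetch and demo, and the 5-second timeout values are assumptions made for illustration; the demo uses a temporary file: URL so it runs offline, and the metadata values printed depend on the protocol handler, so no expected output is shown for them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OpenConnectionDemo {
    // Hypothetical helper: configures the connection, prints metadata, returns the body.
    static String fetch(URL u) throws IOException {
        URLConnection uc = u.openConnection();
        uc.setConnectTimeout(5000); // milliseconds; arbitrary value for this sketch
        uc.setReadTimeout(5000);
        try (InputStream in = uc.getInputStream()) {
            System.out.println("Content type:   " + uc.getContentType());
            System.out.println("Content length: " + uc.getContentLengthLong());
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Reads a temporary file through a file: URL so the example needs no network.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("openconnection", ".txt");
            try {
                Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
                return fetch(tmp.toUri().toURL());
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```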
Retrieving data from a URL
• public final Object getContent() throws IOException
• The getContent() method is the third way to download data referenced by a URL. The getContent()
method retrieves the data referenced by the URL and tries to make it into some type of object. If the
URL refers to some kind of text such as an ASCII or HTML file, the object returned is usually some sort
of InputStream.
• If the URL refers to an image such as a GIF or a JPEG file, getContent() usually returns a
java.awt.image.ImageProducer. What unifies these two disparate classes is that they are not the thing itself
but a means by which a program can construct the thing:
URL u = new URL("http://mesola.obspm.fr/");
Object o = u.getContent();
• getContent() operates by looking at the Content-type field in the header of the data it gets from the
server. If the server does not use MIME headers or sends an unfamiliar Content-type, getContent()
returns some sort of InputStream with which the data can be read. An IOException is thrown if the
object can’t be retrieved.
Retrieving data from a URL
• public final Object getContent(Class[] classes) throws IOException
• A URL’s content handler may provide different views of a resource. This overloaded variant of the
getContent() method lets you choose which class you’d like the content to be returned as. The
method attempts to return the URL’s content in the first available format. For instance, if you prefer
an HTML file to be returned as a String, but your second choice is a Reader and your third choice is
an InputStream, write:
URL u = new URL("http://www.nwu.org");
Class<?>[] types = new Class[3];
types[0] = String.class;
types[1] = Reader.class;
types[2] = InputStream.class;
Object o = u.getContent(types);
• If the content handler knows how to return a string representation of the resource, then it returns a
String. If it doesn’t know how to return a string representation of the resource, then it returns a
Reader. And if it doesn’t know how to present the resource as a reader, then it returns an
InputStream. You have to test for the type of the returned object using instanceof:
Retrieving data from a URL
if (o instanceof String) {
System.out.println(o);
} else if (o instanceof Reader) {
int c;
Reader r = (Reader) o;
while ((c = r.read()) != -1) System.out.print((char) c);
r.close();
} else if (o instanceof InputStream) {
int c;
InputStream in = (InputStream) o;
while ((c = in.read()) != -1) System.out.write(c);
in.close();
} else {
System.out.println("Error: unexpected type " + o.getClass());
}
Splitting a URL into pieces
• URLs are composed of five pieces:
• The scheme, also known as the protocol
• The authority
• The path
• The fragment identifier, also known as the section or ref
• The query string
• For example, in the URL http://www.ibiblio.org/javafaq/books/jnp/index.html?isbn=1565922069#toc, the
scheme is http, the authority is www.ibiblio.org, the path is /javafaq/books/jnp/index.html, the fragment
identifier is toc, and the query string is isbn=1565922069. However, not all URLs have all these pieces. For
instance, the URL http://www.faqs.org/rfcs/rfc3986.html has a scheme, an authority, and a path, but no
fragment identifier or query string.
• The authority may further be divided into the user info, the host, and the port. For example, in the
URL http://admin@www.blackstar.com:8080/, the authority is admin@www.blackstar.com:8080.
This has the user info admin, the host www.blackstar.com, and the port 8080.
• Read-only access to these parts of a URL is provided by nine public methods: getFile(), getHost(),
getPort(), getProtocol(), getRef(), getQuery(), getPath(), getUserInfo(), and getAuthority().
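The accessor methods can be tried on the example URL from this slide. This is a sketch with a hypothetical class UrlPartsDemo and helper parse; constructing the URL does not contact the server.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsDemo {
    // Hypothetical helper: wraps the checked exception so callers stay short.
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        URL u = parse("http://www.ibiblio.org/javafaq/books/jnp/index.html?isbn=1565922069#toc");
        System.out.println("protocol:  " + u.getProtocol());  // http
        System.out.println("authority: " + u.getAuthority()); // www.ibiblio.org
        System.out.println("userInfo:  " + u.getUserInfo());  // null
        System.out.println("host:      " + u.getHost());      // www.ibiblio.org
        System.out.println("port:      " + u.getPort());      // -1 (not specified)
        System.out.println("path:      " + u.getPath());      // /javafaq/books/jnp/index.html
        System.out.println("query:     " + u.getQuery());     // isbn=1565922069
        System.out.println("file:      " + u.getFile());      // path plus ?isbn=1565922069
        System.out.println("ref:       " + u.getRef());       // toc
    }
}
```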
Splitting a URL into pieces
• public String getProtocol()
• The getProtocol() method returns a String containing the scheme of the URL (e.g., “http”,
“https”, or “file”). For example, this code fragment prints https:
URL u = new URL("https://xkcd.com/727/");
System.out.println(u.getProtocol());
• public String getHost()
• The getHost() method returns a String containing the hostname of the URL. For example, this
code fragment prints xkcd.com:
URL u = new URL("https://xkcd.com/727/");
System.out.println(u.getHost());
Splitting a URL into pieces
• public int getPort()
• The getPort() method returns the port number specified in the URL as an int. If no port was specified
in the URL, getPort() returns -1 to signify that the URL does not specify the port explicitly, and will use
the default port for the protocol.
• For example, if the URL is http://www.userfriendly.org/, getPort() returns -1; if the URL is
http://www.userfriendly.org:80/, getPort() returns 80. The following code prints -1 for the port
number because it isn’t specified in the URL:
URL u = new URL("http://www.ncsa.illinois.edu/AboutUs/");
System.out.println("The port part of " + u + " is " + u.getPort());
• public int getDefaultPort()
• The getDefaultPort() method returns the default port used for this URL’s protocol when none is
specified in the URL. If no default port is defined for the protocol, then getDefaultPort() returns -1.
• For example, if the URL is http://www.userfriendly.org/, getDefaultPort() returns 80; if the URL is
ftp://ftp.userfriendly.org:8000/, getDefaultPort() returns 21.
Splitting a URL into pieces
• public String getFile()
• The getFile() method returns a String that contains the path portion of a URL, followed by the
query string if there is one; remember that Java does not break a URL into separate path and file
parts. Everything from the first slash (/) after the hostname until the character preceding the # sign
that begins a fragment identifier is considered to be part of the file. For example:
URL page = this.getDocumentBase();
System.out.println("This page's path is " + page.getFile());
If the URL does not have a file part, Java sets the file to the empty string.
• public String getPath()
• The getPath() method is a near synonym for getFile(); that is, it returns a String containing the
path and file portion of a URL. However, unlike getFile(), it does not include the query string in
the String it returns, just the path.
Splitting a URL into pieces
• public String getRef()
• The getRef() method returns the fragment identifier part of the URL. If the URL doesn’t have a
fragment identifier, the method returns null. In the following code, getRef() returns the string
xtocid1902914:
URL u = new URL("http://www.ibiblio.org/javafaq/javafaq.html#xtocid1902914");
System.out.println("The fragment ID of " + u + " is " + u.getRef());
• public String getQuery()
• The getQuery() method returns the query string of the URL. If the URL doesn’t have a query
string, the method returns null. In the following code, getQuery() returns the string
category=Piano:
URL u = new URL("http://www.ibiblio.org/nywc/compositions.phtml?category=Piano");
System.out.println("The query string of " + u + " is " + u.getQuery());
Splitting a URL into pieces
• public String getUserInfo()
• Some URLs include usernames and occasionally even password information. This information
comes after the scheme and before the host; an @ symbol delimits it.
• For instance, in the URL http://elharo@java.oreilly.com/, the user info is elharo. Some URLs also
include passwords in the user info. For instance, in the URL
ftp://mp3:secret@ftp.example.com/c%3a/stuff/mp3/, the user info is mp3:secret.
• However, most of the time, including a password in a URL is a security risk. If the URL doesn’t
have any user info, getUserInfo() returns null. Mailto URLs may not behave like you expect.
• In a URL like mailto:elharo@ibiblio.org, “elharo@ibiblio.org” is the path, not the user info and
the host. That’s because the URL specifies the remote recipient of the message rather than the
username and host that’s sending the message.
Splitting a URL into pieces
• public String getAuthority()
• Between the scheme and the path of a URL, you’ll find the authority.
• This part of the URI indicates the authority that resolves the resource.
• In the most general case, the authority includes the user info, the host, and the port.
• For example, in the URL ftp://mp3:mp3@138.247.121.61:21000/c%3a/, the authority is
mp3:mp3@138.247.121.61:21000, the user info is mp3:mp3, the host is 138.247.121.61, and
the port is 21000.
• However, not all URLs have all parts. For instance, in the URL
http://conferences.oreilly.com/java/speakers/, the authority is simply the hostname
conferences.oreilly.com.
• The getAuthority() method returns the authority as it exists in the URL, with or without the user
info and port.
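The two authority examples above can be checked directly. In this sketch the class AuthorityDemo and the helper describe are hypothetical names; describe simply joins the results of getAuthority(), getUserInfo(), getHost(), and getPort() into one line.

```java
import java.net.MalformedURLException;
import java.net.URL;

class AuthorityDemo {
    // Hypothetical helper: summarizes the authority and its three sub-parts.
    static String describe(String spec) {
        try {
            URL u = new URL(spec);
            return u.getAuthority() + " | " + u.getUserInfo()
                    + " | " + u.getHost() + " | " + u.getPort();
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        // Full authority: user info, host, and port are all present.
        System.out.println(describe("ftp://mp3:mp3@138.247.121.61:21000/c%3a/"));
        // prints mp3:mp3@138.247.121.61:21000 | mp3:mp3 | 138.247.121.61 | 21000

        // Host only: user info is null and the port is reported as -1.
        System.out.println(describe("http://conferences.oreilly.com/java/speakers/"));
        // prints conferences.oreilly.com | null | conferences.oreilly.com | -1
    }
}
```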
Equality & Comparison
• The URL class contains the usual equals() and hashCode() methods.
• Two URLs are considered equal if and only if both URLs point to the same resource on the same host,
port, and path, with the same fragment identifier and query string. However, there is one surprise
here.
• The equals() method actually tries to resolve the host with DNS so that, for example, it can tell that
http://www.ibiblio.org/ and http://ibiblio.org/ are the same.
• This means that equals() on a URL is potentially a blocking I/O operation! For this reason, you should
avoid storing URLs in data structures that depend on equals() and hashCode(), such as java.util.HashMap. Prefer
java.net.URI for this, and convert back and forth from URIs to URLs when necessary.
• On the other hand, equals() does not go so far as to actually compare the resources identified by two
URLs.
• For example, http://www.oreilly.com/ is not equal to http://www.oreilly.com/index.html; and
http://www.oreilly.com:80 is not equal to http://www.oreilly.com/.
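The advice to key collections on URI rather than URL can be sketched as follows. The class UriKeysDemo and the helper countDistinct are hypothetical; the point is that URI.equals() and URI.hashCode() compare the parsed components as strings and never touch the network.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

class UriKeysDemo {
    // Hypothetical helper: counts how many distinct URIs the given strings denote,
    // according to URI.equals(), which performs no DNS lookups.
    static int countDistinct(String... specs) {
        Set<URI> seen = new HashSet<>();
        for (String spec : specs) {
            seen.add(URI.create(spec));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinct(
                "http://www.oreilly.com/",
                "http://www.oreilly.com/",             // exact duplicate
                "http://www.oreilly.com/index.html")); // different path
        // prints 2

        // An explicit port 80 is not treated as equal to an omitted port.
        System.out.println(countDistinct(
                "http://www.oreilly.com:80/",
                "http://www.oreilly.com/"));
        // prints 2
    }
}
```

When a stored URI has to be downloaded, it can be converted at that moment with uri.toURL().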
Conversion
• URL has three methods that convert an instance to another form: toString(), toExternalForm(), and
toURI(). Like all good classes, java.net.URL has a toString() method. The String produced by toString() is
always an absolute URL, such as http://www.cafeaulait.org/javatutorial.html. It’s uncommon to call
toString() explicitly. Print statements call toString() implicitly. Outside of print statements, it’s more proper
to use toExternalForm() instead:
public String toExternalForm()
• The toExternalForm() method converts a URL object to a string that can be used in an HTML link or a web
browser’s Open URL dialog.
• The toExternalForm() method returns a human-readable String representing the URL. It is identical to the
toString() method. In fact, all the toString() method does is return toExternalForm().
• Finally, the toURI() method converts a URL object to an equivalent URI object:
public URI toURI() throws URISyntaxException
• The URI class provides much more accurate, specification-conformant behavior than the URL class. For
operations like absolutization and encoding, you should prefer the URI class where you have the option.
You should also prefer the URI class if you need to store URLs in a hashtable or other data structure, since
its equals() method is not blocking. The URL class should be used primarily when you want to download
content from a server.
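A short sketch of the three conversions, using a hypothetical class ConversionDemo with helpers sameString and roundTrip. Nothing here contacts the server; the URL is only parsed and converted.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class ConversionDemo {
    // Hypothetical helper: checks that toString() and toExternalForm() agree.
    static boolean sameString(String spec) {
        try {
            URL u = new URL(spec);
            return u.toString().equals(u.toExternalForm());
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    // Hypothetical helper: URL -> URI -> URL, returned in external form.
    static String roundTrip(String spec) {
        try {
            URL u = new URL(spec);
            URI uri = u.toURI();      // may throw URISyntaxException
            return uri.toURL().toExternalForm();
        } catch (MalformedURLException | URISyntaxException ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        String spec = "http://www.cafeaulait.org/javatutorial.html";
        System.out.println(sameString(spec)); // prints true
        System.out.println(roundTrip(spec));  // prints http://www.cafeaulait.org/javatutorial.html
    }
}
```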
The URI Class: Constructing A URI, The Parts of the
URI, Resolving Relative URIs, Equality & Comparison
and String Representation
Constructing a URI
• URIs are built from strings. You can either pass the entire URI to the constructor in a single
string, or the individual pieces:
public URI(String uri) throws URISyntaxException
public URI(String scheme, String schemeSpecificPart, String fragment) throws URISyntaxException
public URI(String scheme, String host, String path, String fragment) throws URISyntaxException
public URI(String scheme, String authority, String path, String query, String fragment) throws
URISyntaxException
public URI(String scheme, String userInfo, String host, int port, String path, String query, String fragment) throws
URISyntaxException
• Unlike the URL class, the URI class does not depend on an underlying protocol handler.
As long as the URI is syntactically correct, Java does not need to understand its protocol
in order to create a representative URI object. Thus, unlike the URL class, the URI class
can be used for new and experimental URI schemes.
Constructing a URI
• The first constructor creates a new URI object from any convenient string. For example:
URI voice = new URI("tel:+1-800-9988-9938");
URI web = new URI("http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc");
URI book = new URI("urn:isbn:1-565-92870-9");
• If the string argument does not follow URI syntax rules (for example, if the URI begins with a
colon), this constructor throws a URISyntaxException. This is a checked exception, so either
catch it or declare that the method where the constructor is invoked can throw it.
• However, one syntax rule is not checked. In contradiction to the URI specification, the
characters used in the URI are not limited to ASCII. They can include other Unicode
characters, such as ø and é. Syntactically, there are very few restrictions on URIs, especially
once the need to encode non-ASCII characters is removed and relative URIs are allowed.
Almost any string can be interpreted as a URI.
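A minimal sketch of the string constructor and its checked exception follows. The class name UriFromString and the helper isValidUri() are illustrative names, not part of java.net; the rejected input simply begins with a colon, as described above.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriFromString {
    /** Returns true if the string parses as a URI, false if the constructor rejects it. */
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException ex) {
            // Checked exception: it must be caught here or declared by the caller.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUri("tel:+1-800-9988-9938"));
        System.out.println(isValidUri("urn:isbn:1-565-92870-9"));
        // Non-ASCII characters are accepted even though the URI specification excludes them.
        System.out.println(isValidUri("http://www.example.com/caf\u00e9"));
        // A leading colon leaves no scheme name, so parsing fails.
        System.out.println(isValidUri(":no-scheme"));
    }
}
```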
Constructing a URI
• The second constructor that takes a scheme specific part is mostly used for
nonhierarchical URIs.
• The scheme is the URI’s protocol, such as http, urn, tel, and so forth. It must be
composed exclusively of ASCII letters and digits and the three punctuation
characters +, -, and .. It must begin with a letter. Passing null for this argument omits
the scheme, thus creating a relative URI. For example:
URI absolute = new URI("http", "//www.ibiblio.org" , null);
URI relative = new URI(null, "/javafaq/index.shtml", "today");
• The scheme-specific part depends on the syntax of the URI scheme; it’s one thing
for an http URL, another for a mailto URL, and something else again for a tel URI.
Because the URI class encodes illegal characters with percent escapes, there’s
effectively no syntax error you can make in this part.
Constructing a URI
• The third constructor is used for hierarchical URIs such as http and ftp URLs. The
host and path together (separated by a /) form the scheme-specific part for this URI.
For example:
URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html", "today");
• This produces the URI http://www.ibiblio.org/javafaq/index.html#today.
• If the constructor cannot form a legal hierarchical URI from the supplied pieces—for
instance, if there is a scheme so the URI has to be absolute but the path doesn’t
start with /—then it throws a URISyntaxException.
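The behavior of the four-argument constructor can be sketched as follows. HierarchicalUri and build() are hypothetical names chosen for this illustration; the host and fragment are the ones used in the example above.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class HierarchicalUri {
    /** Assembles an http URI on www.ibiblio.org from the given path and the fragment "today". */
    static String build(String path) throws URISyntaxException {
        return new URI("http", "www.ibiblio.org", path, "today").toString();
    }

    public static void main(String[] args) {
        for (String path : new String[] {"/javafaq/index.html", "javafaq/index.html"}) {
            try {
                System.out.println(build(path));
            } catch (URISyntaxException ex) {
                // The second path has no leading slash, so no legal absolute URI can be formed.
                System.out.println("Rejected: " + ex.getMessage());
            }
        }
    }
}
```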
Constructing a URI
• The fourth constructor is basically the same as the third, with the addition of a
query string. For example:
URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html",
"referrer=cnet&date=2014-02-23", "today");
• This produces the URI http://www.ibiblio.org/javafaq/index.html?referrer=cnet&date=2014-02-23#today.
• As usual, any unescapable syntax errors cause a URISyntaxException to be thrown
and null can be passed to omit any of the arguments.
Constructing a URI
• The fifth constructor is the master hierarchical URI constructor that the previous two
invoke. It divides the authority into separate user info, host, and port parts, each of which has its own
syntax rules. For example:
URI styles = new URI("ftp", "anonymous:elharo@ibiblio.org", "ftp.oreilly.com", 21, "/pub/stylesheet",
null, null);
• However, the resulting URI still has to follow all the usual rules for URIs; and again null can be passed
for any argument to omit it from the result.
• If you’re sure your URIs are legal and do not violate any of the rules, you can use the static factory
URI.create() method instead. Unlike the constructors, it does not throw a URISyntaxException. For
example, this invocation creates a URI for anonymous FTP access using an email address as
password:
URI styles = URI.create("ftp://anonymous:elharo%40ibiblio.org@ftp.oreilly.com:21/pub/stylesheet");
• If the URI does prove to be malformed, then an IllegalArgumentException is thrown
by this method. This is a runtime exception, so you don’t have to explicitly declare it or
catch it.
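As a small illustration of the difference, the sketch below uses URI.create() for a literal that is known to be valid and shows the unchecked exception for a malformed one. UriCreateDemo and rejects() are made-up names for this example.

```java
import java.net.URI;

final class UriCreateDemo {
    // A literal known to be valid when the code is written, so the unchecked factory is convenient.
    static final URI STYLES =
        URI.create("ftp://anonymous:elharo%40ibiblio.org@ftp.oreilly.com:21/pub/stylesheet");

    /** Returns true if URI.create() rejects the string with an IllegalArgumentException. */
    static boolean rejects(String s) {
        try {
            URI.create(s);
            return false;
        } catch (IllegalArgumentException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(STYLES.getUserInfo());                       // decoded user info
        System.out.println(STYLES.getHost() + ":" + STYLES.getPort());
        System.out.println(rejects("not a uri"));                       // unescaped spaces are illegal
    }
}
```

Because the exception is unchecked, URI.create() is best reserved for constants and other strings you control; input from users or files is safer with the constructors.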
Parts of the URI
• A URI reference has up to three parts: a scheme, a scheme-specific part, and a fragment identifier. The general format
is:
scheme:scheme-specific-part#fragment
• If the scheme is omitted, the URI reference is relative. If the fragment identifier is omitted, the URI reference is a pure
URI. The URI class has getter methods that return these three parts of each URI object. The getRawFoo() methods
return the encoded forms of the parts of the URI, while the equivalent getFoo() methods first decode any
percent-escaped characters and then return the decoded part:
public String getScheme()
public String getSchemeSpecificPart()
public String getRawSchemeSpecificPart()
public String getFragment()
public String getRawFragment()
• These methods all return null if the particular URI object does not have the relevant component.
• A URI that has a scheme is an absolute URI. A URI without a scheme is relative. The isAbsolute() method returns true if
the URI is absolute, false if it’s relative:
public boolean isAbsolute()
Parts of the URI
• The details of the scheme-specific part vary depending on the type of the scheme. For example, in a tel URL, the
scheme-specific part has the syntax of a telephone number. However, in many useful URIs, including the very
common file and http URLs, the scheme-specific part has a particular hierarchical format divided into an
authority, a path, and a query string. The authority is further divided into user info, host, and port. The
isOpaque() method returns false if the URI is hierarchical, true if it’s not hierarchical—that is, if it’s opaque:
public boolean isOpaque()
• If the URI is opaque, all you can get is the scheme, scheme-specific part, and fragment identifier. However, if the
URI is hierarchical, there are getter methods for all the different parts of a hierarchical URI:
public String getAuthority()
public String getFragment()
public String getHost()
public String getPath()
public int getPort()
public String getQuery()
public String getUserInfo()
Parts of the URI
• These methods all return the decoded parts; in other words, percent escapes, such as %3C, are changed into the
characters they represent, such as <. If you want the raw, encoded parts of the URI, there are five parallel
getRawFoo() methods:
public String getRawAuthority()
public String getRawFragment()
public String getRawPath()
public String getRawQuery()
public String getRawUserInfo()
• Remember the URI class differs from the URI specification in that non-ASCII characters such as é and ü are never
percent escaped in the first place, and thus will still be present in the strings returned by the getRawFoo()
methods unless the strings originally used to construct the URI object were encoded.
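The getters above can be exercised with a short sketch. The class UriParts, its describe() method, and the sample URIs are assumptions made for illustration only.

```java
import java.net.URI;

final class UriParts {
    /** Summarizes a URI using only the getters that are meaningful for its kind. */
    static String describe(URI u) {
        if (u.isOpaque()) {
            // Opaque URIs expose only scheme, scheme-specific part, and fragment.
            return u.getScheme() + " | " + u.getSchemeSpecificPart() + " | " + u.getFragment();
        }
        // Hierarchical URIs also expose host, port, and path; getRawPath() keeps the escapes.
        return u.getHost() + " | " + u.getPort() + " | " + u.getPath() + " | " + u.getRawPath();
    }

    public static void main(String[] args) {
        System.out.println(describe(URI.create("urn:isbn:1-565-92870-9")));
        System.out.println(describe(
            URI.create("http://user@www.example.com:8080/a%20b/index.html?q=x%3Cy#frag")));
    }
}
```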
Parts of the URI
• For various technical reasons that don’t have a lot of practical impact, Java can’t always initially detect syntax
errors in the authority component. The immediate symptom of this failing is normally an inability to return the
individual parts of the authority: port, host, and user info. In this event, you can call parseServerAuthority() to
force the authority to be reparsed:
public URI parseServerAuthority() throws URISyntaxException
• The original URI does not change (URI objects are immutable), but the URI returned will have separate authority
parts for user info, host, and port. If the authority cannot be parsed, a URISyntaxException is thrown.
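One case where this appears to matter is a hostname containing an underscore, which the constructor accepts but which does not parse as a server-based authority. The sketch below assumes that behavior (it is what OpenJDK does); AuthorityCheck and the hostnames are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class AuthorityCheck {
    /** Returns true if the authority (if any) can be parsed into user info, host, and port. */
    static boolean hasParseableAuthority(URI u) {
        try {
            u.parseServerAuthority();
            return true;
        } catch (URISyntaxException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        URI ok = URI.create("http://www.example.com:80/");
        URI odd = URI.create("http://a_b.example.com/");   // underscore is not legal in a hostname
        System.out.println(ok.getHost() + " " + hasParseableAuthority(ok));
        // getHost() is null here because the authority was not recognized as server-based.
        System.out.println(odd.getHost() + " " + hasParseableAuthority(odd));
    }
}
```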
Resolving Relative URIs
• The URI class has three methods for converting back and forth between relative and absolute URIs:
public URI resolve(URI uri)
public URI resolve(String uri)
public URI relativize(URI uri)
• The resolve() methods compare the uri argument to this URI and use it to construct a new URI object that wraps
an absolute URI. For example, consider these three lines of code:
URI absolute = new URI("http://www.example.com/");
URI relative = new URI("images/logo.png");
URI resolved = absolute.resolve(relative);
• After they execute, resolved contains the absolute URI http://www.example.com/images/logo.png.
• If the invoking URI does not contain an absolute URI itself, the resolve() method resolves as much of the URI as
it can and returns a new relative URI object as a result. For example, take these two statements:
URI top = new URI("javafaq/books/");
URI resolved = top.resolve("jnp3/examples/07/index.html");
• After they execute, resolved contains the relative URI javafaq/books/jnp3/examples/07/index.html, with no
scheme or authority.
Resolving Relative URIs
• It’s also possible to reverse this procedure; that is, to go from an absolute URI to a relative one. The
relativize() method creates a new URI object from the uri argument that is relative to the invoking URI.
The argument is not changed. For example:
URI absolute = new URI("http://www.example.com/images/logo.png");
URI top = new URI("http://www.example.com/");
URI relative = top.relativize(absolute);
• The URI object relative now contains the relative URI images/logo.png.
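The resolve() and relativize() calls can be tried together in a small sketch. ResolveDemo and its two helper methods are illustrative names; the URIs are the example.com and javafaq ones used in this section.

```java
import java.net.URI;

final class ResolveDemo {
    /** Resolves ref against base; the result is absolute only if base is absolute. */
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    /** Expresses target relative to base, the inverse of resolve(). */
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://www.example.com/", "images/logo.png"));
        System.out.println(resolve("javafaq/books/", "jnp3/examples/07/index.html"));
        System.out.println(relativize("http://www.example.com/",
                                      "http://www.example.com/images/logo.png"));
    }
}
```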
Equality And Comparisons
• Equal URIs must both either be hierarchical or opaque. The scheme and authority parts are compared
without considering case. That is, http and HTTP are the same scheme, and www.example.com is the
same authority as www.EXAMPLE.com.
• The rest of the URI is case sensitive, except for hexadecimal digits used to escape illegal characters.
Escapes are not decoded before comparing. http://www.example.com/A and
http://www.example.com/%41 are unequal URIs. The hashCode() method is consistent with equals.
• Equal URIs do have the same hash code and unequal URIs are fairly unlikely to share the same hash
code.
• URI implements Comparable, and thus URIs can be ordered. The ordering is based on string
comparison of the individual parts, in this sequence:
• If the schemes are different, the schemes are compared, without considering case.
• Otherwise, if the schemes are the same, a hierarchical URI is considered to be less than an
opaque URI with the same scheme.
• If both URIs are opaque URIs, they’re ordered according to their scheme-specific parts.
• If both the scheme and the opaque scheme-specific parts are equal, the URIs are compared by
their fragments.
Equality And Comparisons
• If both URIs are hierarchical, they’re ordered according to their authority components, which are
themselves ordered according to user info, host, and port, in that order. Hosts are case insensitive.
• If the schemes and the authorities are equal, the path is used to distinguish them.
• If the paths are also equal, the query strings are compared.
• If the query strings are equal, the fragments are compared.
• URIs are not comparable to any type except themselves. Comparing a URI to anything except another
URI causes a ClassCastException.
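The equality and ordering rules can be checked with a brief sketch. UriEquality, same(), and sorted() are names invented for this example, and the sample URIs are arbitrary.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class UriEquality {
    /** Compares two URI strings using URI.equals(). */
    static boolean same(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    /** Sorts URI strings by URI's natural ordering (compareTo). */
    static List<String> sorted(String... uris) {
        List<URI> parsed = new ArrayList<>();
        for (String s : uris) {
            parsed.add(URI.create(s));
        }
        Collections.sort(parsed);
        List<String> result = new ArrayList<>();
        for (URI u : parsed) {
            result.add(u.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(same("HTTP://www.EXAMPLE.com/a", "http://www.example.com/a")); // case of scheme and host ignored
        System.out.println(same("http://www.example.com/A", "http://www.example.com/%41")); // escapes are not decoded
        System.out.println(same("http://www.example.com/%3c", "http://www.example.com/%3C")); // hex digit case ignored
        System.out.println(sorted("urn:isbn:1", "http://b.example.com/", "http://a.example.com/"));
    }
}
```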
String Representations
• Two methods convert URI objects to strings, toString() and toASCIIString():
public String toString()
public String toASCIIString()
• The toString() method returns an unencoded string form of the URI (i.e., characters such as é are not
percent escaped). Therefore, the result of calling this method is not guaranteed to be a syntactically
correct URI, though it is in fact a syntactically correct IRI. This form is sometimes useful for display to
human beings, but usually not for retrieval.
• The toASCIIString() method returns an encoded string form of the URI. Characters such as é are
always percent escaped whether or not they were originally escaped. This is the string form of the URI
you should use most of the time. Even if the form returned by toString() is more legible for humans,
they may still copy and paste it into areas that are not expecting an illegal URI. toASCIIString() always
returns a syntactically correct URI.
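A short sketch of the two string forms follows. UriStrings and its method names are illustrative, and the sample URI with é in its path is an assumption.

```java
import java.net.URI;

final class UriStrings {
    /** Legible form; may contain non-ASCII characters. */
    static String forDisplay(URI u) {
        return u.toString();
    }

    /** Pure ASCII form; non-ASCII characters are percent escaped as UTF-8 bytes. */
    static String forTransmission(URI u) {
        return u.toASCIIString();
    }

    public static void main(String[] args) {
        URI cafe = URI.create("http://www.example.com/caf\u00e9");
        System.out.println(forDisplay(cafe));
        System.out.println(forTransmission(cafe));
    }
}
```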
x-www-form-urlencoded: URL Encoder and URL
Decoder
x-www-form-urlencoded
• One of the challenges faced by the designers of the Web was dealing with the differences between
operating systems.
• These differences can cause problems with URLs: for example, some operating systems allow spaces
in filenames; some don’t. Most operating systems won’t complain about a # sign in a filename; but in a
URL, a # sign indicates that the filename has ended, and a fragment identifier follows.
• Other special characters, nonalphanumeric characters, and so on, all of which may have a special
meaning inside a URL or on another operating system, present similar problems.
• Furthermore, Unicode was not yet ubiquitous when the Web was invented, so not all systems could
handle characters such as é and 本. To solve these problems, characters used in URLs must come
from a fixed subset of ASCII, specifically:
• The capital letters A–Z
• The lowercase letters a–z
• The digits 0–9
• The punctuation characters - _ . ! ~ * ' ( and )
x-www-form-urlencoded
• The characters : / & ? @ # ; $ + = and % may also be used, but only for their specified purposes. If
these characters occur as part of a path or query string, they and all other characters should be
encoded.
• The encoding is very simple. Any characters that are not ASCII numerals, letters, or the punctuation
marks specified earlier are converted into bytes and each byte is written as a percent sign followed by
two hexadecimal digits. Spaces are a special case because they’re so common. Besides being
encoded as %20, they can be encoded as a plus sign (+). The plus sign itself is encoded as %2B. The
/ # = & and ? characters should be encoded when they are used as part of a name, and not as a
separator between parts of the URL.
• The URL class does not encode or decode automatically. You can construct URL objects that use
illegal ASCII and non-ASCII characters and/or percent escapes. Such characters and escapes are not
automatically encoded or decoded when output by methods such as getPath() and toExternalForm().
You are responsible for making sure all such characters are properly encoded in the strings used to
construct a URL object.
• Luckily, Java provides URLEncoder and URLDecoder classes to encode and decode strings in this format.
URLEncoder
• To URL encode a string, pass the string and the character set name to the URLEncoder.encode()
method. For example:
String encoded = URLEncoder.encode("This*string*has*asterisks", "UTF-8");
• URLEncoder.encode() returns a copy of the input string with a few changes. Any non‐alphanumeric
characters are converted into % sequences (except the space, underscore, hyphen, period, and
asterisk characters). It also encodes all non-ASCII characters. The space is converted into a plus sign.
• It also converts tildes, single quotes, exclamation points, and parentheses to percent escapes, even
though they don’t absolutely have to be. However, this change isn’t forbidden by the URL specification,
so web browsers deal reasonably with these excessively encoded URLs.
• Although this method allows you to specify the character set, the only such character set you should
ever pick is UTF-8. UTF-8 is compatible with the IRI specification, the URI class, modern web
browsers, and more additional software than any other encoding you could choose.
URLEncoder
Input string → result of URLEncoder.encode() with UTF-8:
• This string has spaces → This+string+has+spaces
• This*string*has*asterisks → This*string*has*asterisks
• This%string%has%percent%signs → This%25string%25has%25percent%25signs
• This+string+has+pluses → This%2Bstring%2Bhas%2Bpluses
• This/string/has/slashes → This%2Fstring%2Fhas%2Fslashes
• This"string"has"quote"marks → This%22string%22has%22quote%22marks
• This:string:has:colons → This%3Astring%3Ahas%3Acolons
• This~string~has~tildes → This%7Estring%7Ehas%7Etildes
• This(string)has(parentheses) → This%28string%29has%28parentheses%29
• This.string.has.periods → This.string.has.periods
• This=string=has=equals=signs → This%3Dstring%3Dhas%3Dequals%3Dsigns
• This&string&has&ampersands → This%26string%26has%26ampersands
• Thiséstringéhasénon-ASCII characters → This%C3%A9string%C3%A9has%C3%A9non-ASCII+characters
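A few of these conversions can be reproduced with a minimal sketch. EncodeDemo and its encode() wrapper are hypothetical; the wrapper only hides the checked UnsupportedEncodingException, which cannot occur for UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EncodeDemo {
    /** Encodes a string as x-www-form-urlencoded using UTF-8. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("This string has spaces"));
        System.out.println(encode("This*string*has*asterisks"));
        System.out.println(encode("This~string~has~tildes"));
        System.out.println(encode("This/string/has/slashes"));
        System.out.println(encode("caf\u00e9"));
    }
}
```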
URLDecoder
• The corresponding URLDecoder class has a static decode() method that decodes strings encoded in
x-www-form-urlencoded format. That is, it converts all plus signs to spaces and all percent escapes to
their corresponding character:
public static String decode(String s, String encoding) throws UnsupportedEncodingException
• If you have any doubt about which encoding to use, pick UTF-8. It’s more likely to be correct than
anything else.
• An IllegalArgumentException should be thrown if the string contains a percent sign that isn’t followed
by two hexadecimal digits or decodes into an illegal sequence. Since URLDecoder does not touch
non-escaped characters, you can pass an entire URL to it rather than splitting it into pieces first. For
example:
String input = "https://www.google.com/search?hl=en&as_q=Java&as_epq=I%2FO";
String output = URLDecoder.decode(input, "UTF-8");
System.out.println(output);
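The decoding rules, including the failure case, can be sketched as below. DecodeDemo and its helpers are illustrative names. The API documentation leaves the handling of malformed escapes to the implementation, so the rejection shown here is an assumption based on OpenJDK behavior.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class DecodeDemo {
    /** Decodes an x-www-form-urlencoded string using UTF-8. */
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(ex);
        }
    }

    /** Returns false if the decoder rejects the string as malformed. */
    static boolean isDecodable(String s) {
        try {
            decode(s);
            return true;
        } catch (IllegalArgumentException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("https://www.google.com/search?hl=en&as_q=Java&as_epq=I%2FO"));
        System.out.println(decode("I%2FO+streams"));
        System.out.println(isDecodable("100%"));   // percent sign with no hex digits after it
    }
}
```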
Proxies: System Properties, The Proxy Class and The
ProxySelector Class
Proxies
• Many systems access the Web and sometimes other non-HTTP parts of the Internet through proxy
servers. A proxy server receives a request for a remote server from a local client. The proxy server
makes the request to the remote server and forwards the result back to the local client.
• Sometimes this is done for security reasons, such as to prevent remote hosts from learning private
details about the local network configuration.
• Other times it’s done to prevent users from accessing forbidden sites by filtering outgoing requests and
limiting which sites can be viewed.
• For instance, an elementary school might want to block access to http://www.playboy.com. And still
other times it’s done purely for performance, to allow multiple users to retrieve the same popular
documents from a local cache rather than making repeated downloads from the remote server.
• Java programs based on the URL class can work through most common proxy servers and protocols.
• Indeed, this is one reason you might want to choose to use the URL class rather than rolling your own
HTTP or other client on top of raw sockets.
System Properties
• For basic operations, all you have to do is set a few system properties to point to the addresses of your local
proxy servers. If you are using a pure HTTP proxy, set http.proxyHost to the domain name or the IP address of
your proxy server and http.proxyPort to the port of the proxy server (the default is 80). There are several ways to
do this, including calling System.setProperty() from within your Java code or using the -D options when
launching the program. This example sets the proxy server to 192.168.254.254 and the port to 9000:
% java -Dhttp.proxyHost=192.168.254.254 -Dhttp.proxyPort=9000 com.domain.Program
• If the proxy requires a username and password, you’ll need to install an Authenticator.
• If you want to exclude a host from being proxied and connect directly instead, set the http.nonProxyHosts
system property to its hostname or IP address. To exclude multiple hosts, separate their names by vertical bars.
For example, this code fragment proxies everything except java.oreilly.com and xml.oreilly.com:
System.setProperty("http.proxyHost", "192.168.254.254");
System.setProperty("http.proxyPort", "9000");
System.setProperty("http.nonProxyHosts", "java.oreilly.com|xml.oreilly.com");
System Properties
• You can also use an asterisk as a wildcard to indicate that all the hosts within a particular domain or subdomain
should not be proxied. For example, to proxy everything except hosts in the oreilly.com domain:
% java -Dhttp.proxyHost=192.168.254.254 -Dhttp.nonProxyHosts=*.oreilly.com com.domain.Program
• If you are using an FTP proxy server, set the ftp.proxyHost, ftp.proxyPort, and ftp.nonProxyHosts properties in
the same way.
• Java does not support any other application layer proxies, but if you’re using a transport layer SOCKS proxy for
all TCP connections, you can identify it with the socksProxyHost and socksProxyPort system properties. Java
does not provide an option for non-proxying with SOCKS. It’s an all-or-nothing decision.
The Proxy Class
• The Proxy class allows more fine-grained control of proxy servers from within a Java program. Specifically, it
allows you to choose different proxy servers for different remote hosts. The proxies themselves are represented
by instances of the java.net.Proxy class.
• There are still only three kinds of proxies, HTTP, SOCKS, and direct connections (no proxy at all), represented by
three constants in the Proxy.Type enum:
• Proxy.Type.DIRECT
• Proxy.Type.HTTP
• Proxy.Type.SOCKS
• Besides its type, the other important piece of information about a proxy is its address and port, given as a
SocketAddress object. For example, this code fragment creates a Proxy object representing an HTTP proxy server
on port 80 of proxy.example.com:
SocketAddress address = new InetSocketAddress("proxy.example.com", 80);
Proxy proxy = new Proxy(Proxy.Type.HTTP, address);
• Although there are only three kinds of proxy objects, there can be many proxies of the same type for different
proxy servers on different hosts.
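A minimal sketch of building Proxy objects follows. ProxyDemo and httpProxy() are hypothetical names, and the use of InetSocketAddress.createUnresolved() (to avoid a DNS lookup while merely describing the proxy) is a choice made for this example rather than something the slides prescribe.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.SocketAddress;

final class ProxyDemo {
    /** Describes an HTTP proxy at the given host and port without resolving the hostname. */
    static Proxy httpProxy(String host, int port) {
        SocketAddress address = InetSocketAddress.createUnresolved(host, port);
        return new Proxy(Proxy.Type.HTTP, address);
    }

    public static void main(String[] args) {
        Proxy proxy = httpProxy("proxy.example.com", 80);
        System.out.println(proxy.type() + " at " + proxy.address());
        // Proxy.NO_PROXY is the predefined object that represents a direct connection.
        System.out.println(Proxy.NO_PROXY.type());
    }
}
```

A Proxy built this way can be passed to URL.openConnection(Proxy) to route a single connection through it, independently of the system properties.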
The ProxySelector Class
• Each running virtual machine has a single java.net.ProxySelector object it uses to locate the proxy server for
different connections. The default ProxySelector merely inspects the various system properties and the URL’s
protocol to decide how to connect to different hosts. However, you can install your own subclass of
ProxySelector in place of the default selector and use it to choose different proxies based on protocol, host,
path, time of day, or other criteria. The key to this class is the abstract select() method:
public abstract List<Proxy> select(URI uri)
• Java passes this method a URI object (not a URL object) representing the host to which a connection is needed.
For a connection made with the URL class, this object typically has the form http://www.example.com/ or
ftp://ftp.example.com/pub/files/, for example. For a pure TCP connection made with the Socket class, this URI
will have the form socket://host:port, for instance, socket://www.example.com:80. The ProxySelector object
then chooses the right proxies for this type of object and returns them in a List<Proxy>. The second abstract
method in this class you must implement is connectFailed():
public abstract void connectFailed(URI uri, SocketAddress address, IOException ex)
• This is a callback method used to warn a program that the proxy server isn’t actually making the
connection.
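One possible ProxySelector subclass is sketched below. The class name FallbackProxySelector, the proxy host passed to it, and the policy (use the proxy for http URIs, connect directly to anything else or to URIs whose proxy connection has already failed) are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class FallbackProxySelector extends ProxySelector {
    private final Proxy proxy;
    // URIs for which the proxy has already failed; thread-safe because select() may be called concurrently.
    private final Set<URI> failed = ConcurrentHashMap.newKeySet();

    FallbackProxySelector(String proxyHost, int proxyPort) {
        this.proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        boolean isHttp = "http".equalsIgnoreCase(uri.getScheme());
        if (!isHttp || failed.contains(uri)) {
            return Collections.singletonList(Proxy.NO_PROXY);
        }
        return Collections.singletonList(proxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress address, IOException ex) {
        // Remember the failure so later requests for this URI bypass the proxy.
        failed.add(uri);
    }
}
```

To put such a selector into effect for the whole virtual machine, pass an instance to ProxySelector.setDefault().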
COMMUNICATING WITH SERVER-SIDE PROGRAMS
THROUGH GET
Communicating with Server-Side Programs Through GET
• The URL class makes it easy for Java applets and applications to communicate with server-side programs such as
CGIs, servlets, PHP pages, and others that use the GET method.
• All you need to know is what combination of names and values the program expects to receive. Then you can
construct a URL with a query string that provides the requisite names and values. All names and values must be
x-www-form-urlencoded, as by the URLEncoder.encode() method.
• There are a number of ways to determine the exact syntax for a query string that talks to a particular program. If
you’ve written the server-side program yourself, you already know the name-value pairs it expects. If you’ve
installed a third-party program on your own server, the documentation for that program should tell you what it
expects. If you’re talking to a documented external network API, then the service usually provides fairly detailed
documentation to tell you exactly what data to send for which purposes.
• Many programs are designed to process form input. If this is the case, it’s straightforward to figure out what
input the program expects. The method the form uses should be the value of the METHOD attribute of the
FORM element. This value should be either GET, in which case you use the process described here, or POST. The
part of the URL that precedes the query string is given by the value of the ACTION attribute of the FORM
element. Note that this may be a relative URL, in which case you’ll need to determine the corresponding
absolute URL. Finally, the names in the name-value pairs are simply the values of the NAME attributes of the
INPUT elements. The values of the pairs are whatever the user types into the form.
Communicating with Server-Side Programs Through GET
• For example, consider this HTML form for the local search engine on my Cafe con Leche site. You can see that it uses
the GET method. The program that processes the form is accessed via the URL http://www.google.com/search. It has
four separate name-value pairs, three of which have default values:
<form name="search" action="http://www.google.com/search" method="get">
<input name="q" />
<input type="hidden" value="cafeconleche.org" name="domains" />
<input type="hidden" name="sitesearch" value="cafeconleche.org" />
<input type="hidden" name="sitesearch2" value="cafeconleche.org" />
<br />
<input type="image" height="22" width="55"
src="images/search_blue.gif" alt="search" border="0"
name="search-image" />
</form>
• The type of the INPUT field doesn’t matter. For instance, it doesn’t matter if it’s a set of checkboxes, a pop-up list, or a
text field. Only the name of each INPUT field and the value you give it is significant. The submit input tells the web
browser when to send the data but does not give the server any extra information. Sometimes you find hidden INPUT
fields that must have particular required default values. This form has three hidden INPUT fields. There are many
different form tags in HTML that produce pop-up menus, radio buttons, and more. However, although these input
widgets appear different to the user, the format of data they send to the server is the same. Each form element
provides a name and an encoded string value.
Communicating with Server-Side Programs Through GET
• In some cases, the program you’re talking to may not be able to handle arbitrary text strings for values of
particular inputs. However, since the form is meant to be read and filled in by human beings, it should provide
sufficient clues to figure out what input is expected; for instance, that a particular field is supposed to be a two-
letter state abbreviation or a phone number. Sometimes the inputs may not have such obvious names.
• There may not even be a form, just links to follow. In this case, you have to do some experimenting, first copying
some existing values and then tweaking them to see what values are and aren’t accepted. You don’t need to do
this in a Java program. You can simply edit the URL in the address or location bar of your web browser window.
• The likelihood that other hackers may experiment with your own server-side programs in such a fashion is a
good reason to make them extremely robust against unexpected input.
• Regardless of how you determine the set of name-value pairs the server expects, communicating with it once
you know them is simple. All you have to do is create a query string that includes the necessary name-value
pairs, then form a URL that includes that query string. Send the query string to the server and read its response
using the same methods you use to connect to a server and retrieve a static HTML page. There’s no special
protocol to follow once the URL is constructed.
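Building the query string can be sketched as follows. The QueryString class and its encode() method are hypothetical helpers; the names q and domains come from the form shown earlier, and the search text is an arbitrary example. The sketch only constructs the URL string and does not contact any server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class QueryString {
    /** Joins name-value pairs as name=value&name=value, encoding each name and value. */
    static String encode(Map<String, String> pairs) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> pair : pairs.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(enc(pair.getKey())).append('=').append(enc(pair.getValue()));
        }
        return query.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();   // keeps insertion order
        pairs.put("q", "java i/o");
        pairs.put("domains", "cafeconleche.org");
        System.out.println("http://www.google.com/search?" + encode(pairs));
    }
}
```

Encoding each name and value separately, before adding the & and = separators, keeps those separators from being escaped along with the data.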
ACCESSING PASSWORD-PROTECTED SITES: THE
AUTHENTICATOR CLASS, THE
PASSWORDAUTHENTICATION CLASS AND THE
JPASSWORDFIELD CLASS
Accessing Password-Protected Sites
• Many popular sites require a username and password for access. Some sites, such as the W3C member pages,
implement this through HTTP authentication. Others, such as the New York Times website, implement it through
cookies and HTML forms. Java’s URL class can access sites that use HTTP authentication, although you’ll
of course need to tell it which username and password to use.
• Supporting sites that use nonstandard, cookie-based authentication is more challenging, not least
because this varies a lot from one site to another. Implementing cookie authentication is hard short of
implementing a complete web browser with full HTML forms and cookie support.
The Authenticator Class
• The java.net package includes an Authenticator class you can use to provide a username and password for
sites that protect themselves using HTTP authentication:
public abstract class Authenticator extends Object
• Since Authenticator is an abstract class, you must subclass it. Different subclasses may retrieve the
information in different ways. For example, a character mode program might just ask the user to type the
username and password on System.in.
• To make the URL class use the subclass, install it as the default authenticator by passing it to the static
Authenticator.setDefault() method:
public static void setDefault(Authenticator a)
• For example, if you’ve written an Authenticator subclass named DialogAuthenticator, you’d install it like this:
Authenticator.setDefault(new DialogAuthenticator());
• You only need to do this once. From this point forward, when the URL class needs a username and
password, it will ask the DialogAuthenticator using the static Authenticator.requestPasswordAuthentication()
method:
public static PasswordAuthentication requestPasswordAuthentication(InetAddress address, int port,
String protocol, String prompt, String scheme) throws SecurityException
The Authenticator Class
• Untrusted applets are not allowed to ask the user for a name and password. Trusted applets can do so, but only
if they possess the requestPasswordAuthentication NetPermission. Otherwise,
Authenticator.requestPasswordAuthentication() throws a SecurityException. The Authenticator subclass must
override the getPasswordAuthentication() method. Inside this method, you collect the username and password
from the user or some other source and return it as an instance of the java.net.PasswordAuthentication class:
protected PasswordAuthentication getPasswordAuthentication()
• If you don’t want to authenticate this request, return null, and Java will tell the server it doesn’t know
how to authenticate the connection. If you submit an incorrect username or password, Java will call
getPasswordAuthentication() again to give you another chance to provide the right data. You normally
have five tries to get the username and password correct; after that, openStream() throws a
ProtocolException.
• Usernames and passwords are cached within the same virtual machine session. Once you set the
correct password for a realm, you shouldn’t be asked for it again unless you’ve explicitly deleted the
password by zeroing out the char array that contains it.
The Authenticator Class
• We can get more details about the request by invoking any of these methods inherited from the Authenticator
superclass:
• protected final InetAddress getRequestingSite()
• protected final int getRequestingPort()
• protected final String getRequestingProtocol()
• protected final String getRequestingPrompt()
• protected final String getRequestingScheme()
• protected final String getRequestingHost()
• protected URL getRequestingURL()
• protected Authenticator.RequestorType getRequestorType()
• These methods either return the information as given in the last call to requestPasswordAuthentication() or
return null if that information is not available. (If the port isn’t available, getRequestingPort() returns -1.)
• The getRequestingURL() method returns the complete URL for which authentication has been requested, an
important detail if a site uses different names and passwords for different files. The getRequestorType() method
returns one of the two named constants (i.e., Authenticator.RequestorType.PROXY or
Authenticator.RequestorType.SERVER) to indicate whether the server or the proxy server is requesting the authentication.
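A bare-bones Authenticator subclass might look like the sketch below. FixedAuthenticator is a hypothetical class that returns credentials supplied in code, which is suitable only for demonstration; a real program would prompt the user or read from a secure store. Its policy of declining proxy requests is likewise an assumption.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

final class FixedAuthenticator extends Authenticator {
    private final String user;
    private final char[] password;

    FixedAuthenticator(String user, char[] password) {
        this.user = user;
        this.password = password.clone();   // keep our own copy of the caller's array
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // Answer only origin servers; returning null declines to authenticate.
        if (getRequestorType() != RequestorType.SERVER) {
            return null;
        }
        System.out.println("Credentials requested for: " + getRequestingPrompt());
        return new PasswordAuthentication(user, password);
    }

    public static void main(String[] args) {
        // Install once; later URL connections that need HTTP authentication consult it.
        Authenticator.setDefault(new FixedAuthenticator("demo", "secret".toCharArray()));
    }
}
```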
The PasswordAuthentication Class
• PasswordAuthentication is a very simple final class that supports two read-only properties: username
and password. The username is a String. The password is a char array so that the password can be
erased when it’s no longer needed. A String would have to wait to be garbage collected before it could
be erased, and even then it might still exist somewhere in memory on the local system, possibly even
on disk if the block of memory that contained it had been swapped out to virtual memory at one point.
Both username and password are set in the constructor:
public PasswordAuthentication(String userName, char[] password)
• Each is accessed via a getter method:
public String getUserName()
public char[] getPassword()
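A minimal sketch of how these pieces fit together, assuming a hypothetical helper class PasswordAuthenticationDemo and placeholder credentials. It relies on two documented behaviors: the constructor stores its own copy of the password array, and getPassword() returns a reference to that stored array, so the caller is responsible for zeroing it.

```java
import java.net.PasswordAuthentication;
import java.util.Arrays;

// Hypothetical demo class; the username and password are placeholders.
class PasswordAuthenticationDemo {

    // Wraps the credentials, then wipes the caller's array.
    // The constructor keeps its own copy of the password, so the returned
    // object is still usable after the caller's array has been cleared.
    static PasswordAuthentication wrapAndWipe(String userName, char[] password) {
        PasswordAuthentication pa = new PasswordAuthentication(userName, password);
        Arrays.fill(password, '\0');
        return pa;
    }

    public static void main(String[] args) {
        char[] typed = {'s', '3', 'c', 'r', 'e', 't'};
        PasswordAuthentication pa = wrapAndWipe("elharo", typed);

        System.out.println("User: " + pa.getUserName());
        System.out.println("Password length: " + pa.getPassword().length);

        // getPassword() hands back the stored array itself, so filling it
        // with zeros erases the password held inside the object.
        Arrays.fill(pa.getPassword(), '\0');
    }
}
```

Wiping both the caller's array and the stored array is what the char[] design makes possible; a String password could not be overwritten in this way.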
The JPasswordField Class
• One useful tool for asking users for their passwords in a more or less secure fashion is the
JPasswordField component from Swing:
public class JPasswordField extends JTextField
• This lightweight component behaves almost exactly like a text field. However, anything the user types
into it is echoed as a masking character (an asterisk by default, although the look and feel may substitute
another symbol such as a bullet). This way, the password is safe from anyone looking over the user’s
shoulder at what’s being typed on the screen. JPasswordField also stores the password as a char
array so that when you’re done with the password you can overwrite it with zeros. It provides the
getPassword() method to return this:
public char[] getPassword()
• Otherwise, you mostly use the methods it inherits from the JTextField superclass.
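The following sketch shows one way a JPasswordField might be used to collect a password and then erase it. The class name PasswordPrompt, the helper matchesAndWipe(), the dialog title, and the hard-coded expected password are assumptions for illustration only; a real program would pass the characters to an Authenticator or a login routine instead of comparing them against a literal.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import javax.swing.JOptionPane;
import javax.swing.JPasswordField;

// Hypothetical demo class; the dialog text and expected password are placeholders.
class PasswordPrompt {

    // Compares the typed characters with the expected ones, then wipes the typed array.
    static boolean matchesAndWipe(char[] typed, char[] expected) {
        try {
            return Arrays.equals(typed, expected);
        } finally {
            Arrays.fill(typed, '\0'); // overwrite the password once it has been used
        }
    }

    public static void main(String[] args) {
        if (GraphicsEnvironment.isHeadless()) {
            System.err.println("No display available; skipping the dialog.");
            return;
        }
        JPasswordField field = new JPasswordField(20);
        field.setEchoChar('*'); // mask every typed character with an asterisk

        int choice = JOptionPane.showConfirmDialog(null, field, "Enter password",
            JOptionPane.OK_CANCEL_OPTION, JOptionPane.PLAIN_MESSAGE);
        if (choice == JOptionPane.OK_OPTION) {
            char[] typed = field.getPassword(); // a char array, never a String
            boolean ok = matchesAndWipe(typed, "swordfish".toCharArray());
            System.out.println(ok ? "Password accepted" : "Password rejected");
        }
    }
}
```

Keeping the comparison and the wipe in one helper means the typed characters are zeroed even if the check fails or throws.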
THANK YOU
YOU CAN FIND THESE SLIDES/NOTES IN
https://chandanbhagat.com.np/docs/network-programming/
AND CODES IN
https://github.com/chandan-g-bhagat/network-programming

Unit 3 - URLs and URIs

  • 1. NETWORK PROGRAMMING CHAPTER 3 : URLS AND URIS CHANDAN GUPTA BHAGAT https://me.chandanbhagat.com.np
  • 2. CONTENT • URIs: URLs and Relative URLs • The URL Class: Creating New URLs, Retrieving Data From a URL, Splitting a URL into Pieces, Equality & Comparison and Conversion • The URIClass: Constructing a URI,The Parts of the URI, Resolving Relative URIs, Equality & Comparison and String Representation • x-www-form-urlencoded: URL Encoder and URL Decoder • Proxies: System Properties,The ProxyClass andThe ProxySelector Class • Communicating with Server-Side ProgramsThroughGET • Accessing Password-Protected Sites:The Authenticator Class,The PasswordAuthenticationClass andThe JPasswordField Class
  • 3. URIs: URLs and Relative URLs
  • 4. • HTML is a hypertext markup language because it includes a way to specify links to other documents identified by URLs. • A URL unambiguously identifies the location of a re‐ source on the Internet. A URL is the most common type of URI, or Uniform Resource Identifier. • A URI can identify a resource by its network location, as in a URL, or by its name, number, or other characteristics. • The URL class is the simplest way for a Java program to locate and retrieve data from the network. • You do not need to worry about the details of the protocol being used, or how to communicate with the server; you simply tell Java the URL and it gets the data for you.
  • 5. URIs • A Uniform Resource Identifier (URI) is a string of characters in a particular syntax that identifies a resource. • The resource identified may be a file on a server; but it may also be an email address, a news message, a book, a person’s name, an Internet host, the current stock price of Oracle, or something else. • A resource is a thing that is identified by a URI. A URI is a string that identifies a resource. Yes, it is exactly that circular. • All you ever receive from a server is a representation of a resource which comes in the form of bytes. • However, a single resource may have different representations. For instance, https://www.un.org/en/documents/udhr/ identifies the Universal Declaration of Human Rights; but there are representations of the declaration in plain text, XML, PDF, and other formats.
  • 6. URIs • One of the key principles of good web architecture is to be profligate with URIs. If anyone might want to address something or refer to something, give it a URI (and in practice a URL). • Just because a resource is a part of another resource, or a collection of other resources, or a state of another resource at a particular time, doesn’t mean it can’t have its own URI. • For instance, in an email service, every user, every message received, every message sent, every filtered view of the inbox, every contact, every filter rule, and every single page a user might ever look at should have a unique URI. • Although architecturally URIs are opaque strings, in practice it’s use‐ ful to design them with human-readable substructure.
  • 7. URIs • The syntax of a URI is composed of a scheme and a scheme-specific part, separated by a colon, like this: scheme:scheme-specific-part • The syntax of the scheme-specific part depends on the scheme being used. Current schemes include: • data Base64-encoded data included directly in a link; see RFC 2397 • file A file on a local disk • ftp An FTP server • http A World Wide Web server using the Hypertext Transfer Protocol • mailto An email address • magnet A resource available for download via peer-to-peer networks such as BitTorrent
  • 8. URIs • telnet A connection to a Telnet-based service • urn A Uniform Resource Name • In addition, Java makes heavy use of nonstandard custom schemes such as rmi, jar, jndi, and doc for various purposes. • There is no specific syntax that applies to the scheme-specific parts of all URIs. However, many have a hierarchical form, like this: //authority/path?query • The authority part of the URI names the authority responsible for resolving the rest of the URI • The path is responsible for mapping the rest of the resource
  • 9. URIs • If the authority is an Internet host, optional usernames and ports may also be provided to make the authority more specific. • For example, the URI ftp://mp3:mp3@ci4319- a.ashvil1.nc.home.com:33/VanHalen-Jump.mp3 has the authority mp3:mp3@ci43198-a.ashvil1.nc.home.com:33. • This authority has the username mp3, the password mp3, the host ci43198- a.ashvil1.nc.home.com, and the port 33. It has the scheme ftp and the path /VanHalen-Jump.mp3. (In most cases, including the pass‐ word in the URI is a big security hole unless, as here, you really do want everyone in the universe to know the password.)
  • 10. URIs • The scheme part is composed of lowercase letters, digits, and the plus sign, period, and hyphen. • The other three parts of a typical URI (authority, path, and query) should each be composed of the ASCII alphanumeric characters (i.e., the letters A–Z, a–z, and the digits 0–9). • In addition, the punctuation characters - _ . ! and ~ may also be used. • Delimiters such as / ? & and = may be used for their predefined purposes. • All other characters, including non-ASCII alphanumerics such as á and ζ as well as delimiters not being used as delimiters should be escaped by a percent sign (%) followed by the hexadecimal codes for the character as encoded in UTF-8. • For instance, in UTF-8, á is the two bytes 0xC3 0xA1 so it would be encoded as %c3%a1. The Chinese character 木 is Unicode code point 0x6728. • In UTF-8, this is encoded as the three bytes E6, 9C, and A8. Thus, in a URI it would be encoded as %E6%9C%A8.
  • 11. URLs • A URL is a URI that, as well as identifying a resource, provides a specific network location for the resource that a client can use to retrieve a representation of that resource. • A generic URI may tell you what a resource is, but not actually tell you where or how to get that resource. • In the physical world, it’s the difference between the title “Harry Potter and The Deathly Hallows” and the library location “Room 312, Row 28, Shelf 7”. • In Java, it’s the difference between the java.net.URI class that only identifies resources and the java.net.URL class that can both identify and retrieve resources. • The network location in a URL usually includes the protocol used to access a server (e.g., FTP, HTTP), the hostname or IP address of the server, and the path to the resource on that server. • A typical URL looks like http://www.ibiblio.org/javafaq/javatutorial.html. This specifies that there is a file called javatutorial.html in a directory called javafaq on the server www.ibiblio.org, and that this file can be accessed via the HTTP protocol.
  • 12. URLs • The syntax of a URL is: protocol://userInfo@host:port/path?query#fragment • Here the protocol is another word for what was called the scheme of the URI. (Scheme is the word used in the URI RFC. Protocol is the word used in the Java documentation.) In a URL, the protocol part can be file, ftp, http, https, magnet, telnet, or various other strings (though not urn). • The host part of a URL is the name of the server that provides the resource you want. It can be a hostname such as www.oreilly.com or utopia.poly.edu or an IP address, such as 204.148.40.9 or 128.238.3.21. • The userInfo is optional login information for the server. If present, it contains a username and, rarely, a password. • The port number is also optional. It’s not necessary if the service is running on its default port (port 80 for HTTP servers). • Together, the userInfo, host, and port constitute the authority • Technically, a string that contains a fragment identifier is a URL reference, not a URL. Java, however, does not distinguish between URLs and URL references.
  • 13. Relative URLs • A URL tells a web browser a lot about a document: the protocol used to retrieve the document, the host where the document lives, and the path to the document on that host. • Most of this information is likely to be the same for other URLs that are referenced in the document. Therefore, rather than requiring each URL to be specified in its entirety, a URL may inherit the protocol, hostname, and path of its parent document (i.e., the document in which it appears). • URLs that aren’t complete but inherit pieces from their parent are called relative URLs. In contrast, a completely specified URL is called an absolute URL. • In a relative URL, any pieces that are missing are assumed to be the same as the corresponding pieces from the URL of the document in which the URL is found. For example, suppose that while browsing http://www.ibiblio.org/javafaq/javatutorial.html you click on this hyperlink: <a href="javafaq.html"> • The browser cuts javatutorial.html off the end of http://www.ibiblio.org/javafaq/javatutorial.html to get http://www.ibiblio.org/javafaq/.
  • 14. Relative URLs • Then it attaches javafaq.html onto the end of http://www.ibiblio.org/javafaq/ to get http://www.ibiblio.org/javafaq/javafaq.html. • Finally, it loads that document. If the relative link begins with a /, then it is relative to the document root instead of relative to the current file. Thus, if you click on the following link while browsing http://www.ibiblio.org/javafaq/javatutorial.html: <a href="/projects/ipv6/"> • The browser would throw away /javafaq/javatutorial.html and attach /projects/ipv6/ to the end of http://www.ibiblio.org to get http://www.ibiblio.org/projects/ipv6/. • Relative URLs have a number of advantages. First and least important they save a little typing. • More importantly, relative URLs allow a single document tree to be served by multiple protocols: for instance, both HTTP and FTP. HTTP might be used for direct surfing, while FTP could be used for mirroring the site. • Most importantly of all, relative URLs allow entire trees of documents to be moved or copied from one site to another without breaking all the internal links.
  • 15. The URL Class: Creating New URLs, Retrieving Data From a URL, Splitting a URL Into Pieces, Equality & Comparison and Conversion
  • 16. URL Class • The java.net.URL class is an abstraction of a Uniform Resource Locator such as http://www.lolcats.com/ or ftp://ftp.redhat.com/pub/. • Extends java.lang.Object, and it is a final class that cannot be subclassed. • Uses the strategy design pattern rather than relying on inheritance to configure instances for different kinds of URLs. Protocol handlers are the strategies, and the URL class itself forms the context through which the different strategies are selected. • It is helpful to think of URLs as objects with fields that include the scheme (a.k.a. the protocol), hostname, port, path, query string, and fragment identifier (a.k.a. the ref), each of which may be set independently. • URLs are immutable. After a URL object has been constructed, its fields do not change. This has the side effect of making them thread safe.
  • 17. Creating new URLs • you can construct instances of java.net.URL. The constructors differ in the information they require: public URL(String url) throws MalformedURLException public URL(String protocol, String hostname, String file) throws MalformedURLException public URL(String protocol, String host, int port, String file) throws MalformedURLException public URL(URL base, String relative) throws MalformedURLException • Which constructor you use depends on the information you have and the form it’s in. All these constructors throw a MalformedURLException if you try to create a URL for an unsupported protocol or if the URL is syntactically incorrect. • Exactly which protocols are supported is implementation dependent. The only protocols that have been available in all virtual machines are http and file, and the latter is notoriously flaky. • Today, Java also supports the https, jar, and ftp protocols. Some virtual machines support mailto and gopher as well as some custom protocols like doc, netdoc, systemresource, and verbatim used internally by Java.
  • 18. Creating new URLs • If the protocol you need isn’t supported by a particular VM, you may be able to install a protocol handler for that scheme to enable the URL class to speak that protocol. • In practice, this is way more trouble than it’s worth. You’re better off using a library that exposes a custom API just for that protocol • Other than verifying that it recognizes the URL scheme, Java does not check the correctness of the URLs it constructs. The programmer is responsible for making sure that URLs created are valid. • For instance, Java does not check that the hostname in an HTTP URL does not contain spaces or that the query string is x-www-form-URL-encoded. It does not check that a mailto URL actually contains an email address. You can create URLs for hosts that don’t exist and for hosts that do exist but that you won’t be allowed to connect to.
  • 19. Creating new URLs • Constructing a URL from a string The simplest URL constructor just takes an absolute URL in string form as its single argument: public URL(String url) throws MalformedURLException • Like all constructors, this may only be called after the new operator, and like all URL constructors, it can throw a MalformedURLException. • The following code constructs a URL object from a String, catching the exception that might be thrown: try { URL u = new URL("http://www.audubon.org/"); } catch (MalformedURLException ex) { System.err.println(ex); }
  • 20. Creating new URLs • Constructing a URL from its component parts • You can also build a URL by specifying the protocol, the hostname, and the file: public URL(String protocol, String hostname, String file) throws MalformedURLException • This constructor sets the port to -1 so the default port for the protocol will be used. The file argument should begin with a slash and include a path, a filename, and optionally a fragment identifier. Forgetting the initial slash is a common mistake, and one that is not easy to spot. Like all URL constructors, it can throw a MalformedURLException. For example: try { URL u = new URL("http", "www.eff.org", "/blueribbon.html#intro"); } catch (MalformedURLException ex) { throw new RuntimeException("shouldn't happen; all VMs recognize http"); }
  • 21. Creating new URLs • This creates a URL object that points to http://www.eff.org/blueribbon.html#intro, using the default port for the HTTP protocol (port 80). The file specification includes a reference to a named anchor. The code catches the exception that would be thrown if the virtual machine did not support the HTTP protocol. However, this shouldn’t happen in practice. • For the rare occasions when the default port isn’t correct, the next constructor lets you specify the port explicitly as an int. The other arguments are the same. For example, this code fragment creates a URL object that points to http://fourier.dur.ac.uk:8000/~dma3mjh/jsci/, specifying port 8000 explicitly: try { URL u = new URL("http", "fourier.dur.ac.uk", 8000, "/~dma3mjh/jsci/"); } catch (MalformedURLException ex) { throw new RuntimeException("shouldn't happen; all VMs recognize http"); }
  • 22. Creating new URLs • Constructing relative URLs • This constructor builds an absolute URL from a relative URL and a base URL: public URL(URL base, String relative) throws MalformedURLException • For instance, you may be parsing an HTML document at http://www.ibiblio.org/javafaq/ index.html and encounter a link to a file called mailinglists.html with no further quali‐ fying information. In this case, you use the URL to the document that contains the link to provide the missing information. The constructor computes the new URL as http:// www.ibiblio.org/javafaq/mailinglists.html. For example: try { URL u1 = new URL("http://www.ibiblio.org/javafaq/index.html"); URL u2 = new URL (u1, "mailinglists.html"); } catch (MalformedURLException ex) { System.err.println(ex); }
  • 23. Creating new URLs • Other sources of URL objects • Besides the constructors discussed here, a number of other methods in the Java class library return URL objects. In applets, getDocumentBase() returns the URL of the page that contains the applet and getCodeBase() returns the URL of the applet .class file. • The java.io.File class has a toURL() method that returns a file URL matching the given file. The exact format of the URL returned by this method is platform dependent.
  • 24. Retrieving data from a URL • Naked URLs aren’t very exciting. What’s interesting is the data contained in the documents they point to. The URL class has several methods that retrieve data from a URL: public InputStream openStream() throws IOException public URLConnection openConnection() throws IOException public URLConnection openConnection(Proxy proxy) throws IOException public Object getContent() throws IOException public Object getContent(Class[] classes) throws IOException • The most basic and most commonly used of these methods is openStream(), which returns an InputStream from which you can read the data. • If you need more control over the download process, call openConnection() instead, which gives you a URLConnection which you can configure, and then get an InputStream from it. • Finally, you can ask the URL for its content with getContent() which may give you a more complete object such as String or an Image. Then again, it may just give you an InputStream anyway
  • 25. Retrieving data from a URL • public final InputStream openStream() throws IOException • The openStream() method connects to the resource referenced by the URL, performs any necessary handshaking between the client and the server, and returns an Input Stream from which data can be read. • The data we get from this InputStream is the raw (i.e., uninterpreted) content the URL references: ASCII if we’re reading an ASCII text file, raw HTML if we’re reading an HTML file, binary image data if we’re reading an image file, and so forth. It does not include any of the HTTP headers or any other protocol-related information. • We can read from this InputStream as we would read from any other InputStream. For example: try { URL u = new URL("http://www.lolcats.com"); InputStream in = u.openStream(); int c; while ((c = in.read()) != -1) System.out.write(c); in.close(); } catch (IOException ex) { System.err.println(ex); } • The preceding code fragment catches an IOException, which also catches the MalformedURLException that the URL constructor can throw, since MalformedURLException subclasses IOException.
  • 26. Retrieving data from a URL • public URLConnection openConnection() throws IOException • The openConnection() method opens a socket to the specified URL and returns a URLConnection object. A URLConnection represents an open connection to a network resource. If the call fails, openConnection() throws an IOException. For example: try { URL u = new URL("https://news.ycombinator.com/"); try { URLConnection uc = u.openConnection(); InputStream in = uc.getInputStream(); // read from the connection... } catch (IOException ex) { System.err.println(ex); } } catch (MalformedURLException ex) { System.err.println(ex); } • We should use this method when you want to communicate directly with the server. The URLConnection gives us access to everything sent by the server: in addition to the document itself in its raw form (e.g., HTML, plain text, binary image data), we can access all the metadata specified by the protocol.
  • 27. Retrieving data from a URL • public final Object getContent() throws IOException • The getContent() method is the third way to download data referenced by a URL. The getContent() method retrieves the data referenced by the URL and tries to make it into some type of object. If the URL refers to some kind of text such as an ASCII or HTML file, the object returned is usually some sort of InputStream. • If the URL refers to an image such as a GIF or a JPEG file, getContent() usually returns a java.awt.Image Producer. What unifies these two disparate classes is that they are not the thing itself but a means by which a program can construct the thing: URL u = new URL("http://mesola.obspm.fr/"); Object o = u.getContent(); • getContent() operates by looking at the Content-type field in the header of the data it gets from the server. If the server does not use MIME headers or sends an unfamiliar Content-type, getContent() returns some sort of InputStream with which the data can be read. An IOException is thrown if the object can’t be retrieved.
  • 28. Retrieving data from a URL • public final Object getContent(Class[] classes) throws IOException • A URL’s content handler may provide different views of a resource. This overloaded variant of the getContent() method lets you choose which class you’d like the content to be returned as. The method attempts to return the URL’s content in the first available format. For instance, if you prefer an HTML file to be returned as a String, but your second choice is a Reader and your third choice is an InputStream, write: URL u = new URL("http://www.nwu.org"); Class<?>[] types = new Class[3]; types[0] = String.class; types[1] = Reader.class; types[2] = InputStream.class; Object o = u.getContent(types); • If the content handler knows how to return a string representation of the resource, then it returns a String. If it doesn’t know how to return a string representation of the resource, then it returns a Reader. And if it doesn’t know how to present the resource as a reader, then it returns an InputStream. You have to test for the type of the returned object using instanceof
  • 29. Retrieving data from a URL if (o instanceof String) { System.out.println(o); } else if (o instanceof Reader) { int c; Reader r = (Reader) o; while ((c = r.read()) != -1) System.out.print((char) c); r.close(); } else if (o instanceof InputStream) { int c; InputStream in = (InputStream) o; while ((c = in.read()) != -1) System.out.write(c); in.close(); } else { System.out.println("Error: unexpected type " + o.getClass()); }
  • 30. Splitting a URL into pieces • URLs are composed of five pieces: • The scheme, also known as the protocol • The authority • The path • The fragment identifier, also known as the section or ref • The query string • For example, in the URL http://www.ibiblio.org/javafaq/books/jnp/index.html?isbn=1565922069#toc, the scheme is http, the authority is www.ibiblio.org, the path is /javafaq/books/jnp/index.html, the fragment identifier is toc, and the query string is isbn=1565922069. However, not all URLs have all these pieces. For instance, the URL http://www.faqs.org/rfcs/rfc3986.html has a scheme, an authority, and a path, but no fragment identifier or query string. • The authority may further be divided into the user info, the host, and the port. For example, in the URL http://admin@www.blackstar.com:8080/, the authority is admin@www.blackstar.com:8080. This has the user info admin, the host www.black‐star.com, and the port 8080. • Read-only access to these parts of a URL is provided by nine public methods: getFile(), getHost(), getPort(), getProtocol(), getRef(), getQuery(), getPath(), getUserInfo(), and getAuthority().
  • 31. Splitting a URL into pieces • public String getProtocol() • The getProtocol() method returns a String containing the scheme of the URL (e.g., “http”, “https”, or “file”). For example, this code fragment prints https: URL u = new URL("https://xkcd.com/727/"); System.out.println(u.getProtocol()); • public String getHost() • The getHost() method returns a String containing the hostname of the URL. For example, this code fragment prints xkcd.com: URL u = new URL("https://xkcd.com/727/"); System.out.println(u.getHost());
  • 32. Splitting a URL into pieces • public int getPort() • The getPort() method returns the port number specified in the URL as an int. If no port was specified in the URL, getPort() returns -1 to signify that the URL does not specify the port explicitly, and will use the default port for the protocol. • For example, if the URL is http://www.userfriendly.org/, getPort() returns -1; if the URL is http://www.userfriendly.org:80/, getPort() returns 80. The following code prints -1 for the port number because it isn’t specified in the URL: URL u = new URL("http://www.ncsa.illinois.edu/AboutUs/"); System.out.println("The port part of " + u + " is " + u.getPort()); • public int getDefaultPort() • The getDefaultPort() method returns the default port used for this URL’s protocol when none is specified in the URL. If no default port is defined for the protocol, then getDefaultPort() returns -1. • For example, if the URL is http://www.userfriendly.org/, getDefaultPort() returns 80; if the URL is ftp://ftp.userfriendly.org:8000/, getDefault Port() returns 21.
  • 33. Splitting a URL into pieces • public String getFile() • The getFile() method returns a String that contains the path portion of a URL; remember that Java does not break a URL into separate path and file parts. Everything from the first slash (/) after the hostname until the character preceding the # sign that begins a fragment identifier is considered to be part of the file. For example: URL page = this.getDocumentBase(); System.out.println("This page's path is " + page.getFile()); If the URL does not have a file part, Java sets the file to the empty string. • public String getPath() • The getPath() method is a near synonym for getFile(); that is, it returns a String containing the path and file portion of a URL. However, unlike getFile(), it does not include the query string in the String it returns, just the path.
  • 34. Splitting a URL into pieces • public String getRef() • The getRef() method returns the fragment identifier part of the URL. If the URL doesn’t have a fragment identifier, the method returns null. In the following code, getRef() returns the string xtocid1902914: URL u = new URL("http://www.ibiblio.org/javafaq/javafaq.html#xtocid1902914"); System.out.println("The fragment ID of " + u + " is " + u.getRef()); • public String getQuery() • The getQuery() method returns the query string of the URL. If the URL doesn’t have a query string, the method returns null. In the following code, getQuery() returns the string category=Piano: URL u = new URL("http://www.ibiblio.org/nywc/compositions.phtml?category=Piano"); System.out.println("The query string of " + u + " is " + u.getQuery());
  • 35. Splitting a URL into pieces • public String getUserInfo() • Some URLs include usernames and occasionally even password information. This information comes after the scheme and before the host; an @ symbol delimits it. • For instance, in the URL http://elharo@java.oreilly.com/, the user info is elharo. Some URLs also include passwords in the user info. For instance, in the URL ftp://mp3:secret@ftp.example.com/c%3a/stuff/mp3/, the user info is mp3:secret. • However, most of the time, including a password in a URL is a security risk. If the URL doesn’t have any user info, getUserInfo() returns null. Mailto URLs may not behave like you expect. • In a URL like mailto:elharo@ibiblio.org, “elharo@ibiblio.org” is the path, not the user info and the host. That’s because the URL specifies the remote recipient of the message rather than the username and host that’s sending the message.
  • 36. SPLITTING A URL INTO PIECES • public String getAuthority() • Between the scheme and the path of a URL, you’ll find the authority. • This part of the URI indicates the authority that resolves the resource. • In the most general case, the authority includes the user info, the host, and the port. • For example, in the URL ftp://mp3:mp3@138.247.121.61:21000/c%3a/, the authority is mp3:mp3@138.247.121.61:21000, the user info is mp3:mp3, the host is 138.247.121.61, and the port is 21000. • However, not all URLs have all parts. For instance, in the URL http://conferences.oreilly.com/java/speakers/, the authority is simply the hostname conferences.oreilly.com. • The getAuthority() method returns the authority as it exists in the URL, with or without the user info and port.
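The getter methods described on the preceding slides can be exercised together in one small program. The following is a minimal sketch, not part of the original slides: the class name UrlParts and the combined example URL (user info, port, query string, and fragment all in one address) are assumptions made for illustration. Note that recent JDKs (20 and later) mark the URL(String) constructor as deprecated, although it still works.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class (not from the slides) that collects every piece of a URL.
class UrlParts {
    // Splits a URL string into the pieces exposed by java.net.URL's getters.
    static Map<String, String> split(String spec) {
        URL u;
        try {
            u = new URL(spec); // deprecated since JDK 20, kept to match the slides
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException("Not a URL Java can parse: " + spec, ex);
        }
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("protocol", u.getProtocol());
        parts.put("authority", u.getAuthority());
        parts.put("userInfo", u.getUserInfo());                      // null if absent
        parts.put("host", u.getHost());
        parts.put("port", String.valueOf(u.getPort()));              // -1 if not specified
        parts.put("defaultPort", String.valueOf(u.getDefaultPort()));
        parts.put("path", u.getPath());                              // path only
        parts.put("file", u.getFile());                              // path plus query string
        parts.put("query", u.getQuery());                            // null if absent
        parts.put("ref", u.getRef());                                // null if absent
        return parts;
    }

    public static void main(String[] args) {
        // Assumed example URL combining the pieces discussed on the slides.
        String spec = "http://admin@www.blackstar.com:8080/javafaq/books/jnp/index.html?isbn=1565922069#toc";
        split(spec).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running main prints one line per piece, which makes the difference between getPath() and getFile() visible: only getFile() carries the query string.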
  • 37. Equality & Comparison • The URL class contains the usual equals() and hashCode() methods. • Two URLs are considered equal if and only if both URLs point to the same resource on the same host, port, and path, with the same fragment identifier and query string. However, there is one surprise here. • The equals() method actually tries to resolve the host with DNS so that, for example, it can tell that http://www.ibiblio.org/ and http://ibiblio.org/ are the same. • This means that equals() on a URL is potentially a blocking I/O operation! For this reason, you should avoid storing URLs in data structures that depend on equals(), such as java.util.HashMap. Prefer java.net.URI for this, and convert back and forth from URIs to URLs when necessary. • On the other hand, equals() does not go so far as to actually compare the resources identified by two URLs. • For example, http://www.oreilly.com/ is not equal to http://www.oreilly.com/index.html; and http://www.oreilly.com:80 is not equal to http://www.oreilly.com/.
  • 38. Conversion • URL has three methods that convert an instance to another form: toString(), toExternalForm(), and toURI(). Like all good classes, java.net.URL has a toString() method. The String produced by toString() is always an absolute URL, such as http://www.cafeaulait.org/javatutorial.html. It’s uncommon to call toString() explicitly. Print statements call toString() implicitly. Outside of print statements, it’s more proper to use toExternalForm() instead: public String toExternalForm() • The toExternalForm() method converts a URL object to a string that can be used in an HTML link or a web browser’s Open URL dialog. • The toExternalForm() method returns a human-readable String representing the URL. It is identical to the toString() method. In fact, all the toString() method does is return toExternalForm(). • Finally, the toURI() method converts a URL object to an equivalent URI object: public URI toURI() throws URISyntaxException • The URI class provides much more accurate, specification-conformant behavior than the URL class. For operations like absolutization and encoding, you should prefer the URI class where you have the option. You should also prefer the URI class if you need to store URLs in a hashtable or other data structure, since its equals() method is not blocking. The URL class should be used primarily when you want to download content from a server.
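The advice to keep URIs rather than URLs in hash-based collections can be sketched as follows. This is an illustrative example only: the class name UrlConversion and its two helper methods are assumptions, not part of the slides. Each URL is converted with toURI() before it enters the set, so the collection never calls URL.equals() and therefore never triggers a DNS lookup.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class (not from the slides).
class UrlConversion {
    // Stores links as URIs so that hashing and equality stay purely textual.
    static Set<URI> uniqueLinks(String... specs) {
        Set<URI> seen = new HashSet<>();
        for (String spec : specs) {
            try {
                URL url = new URL(spec);   // deprecated since JDK 20, kept to match the slides
                seen.add(url.toURI());     // URL -> URI conversion
            } catch (MalformedURLException | URISyntaxException ex) {
                throw new IllegalArgumentException("Bad URL: " + spec, ex);
            }
        }
        return seen;
    }

    // Returns the string form suitable for an HTML link; same text as toString().
    static String external(String spec) {
        try {
            return new URL(spec).toExternalForm();
        } catch (MalformedURLException ex) {
            throw new IllegalArgumentException("Bad URL: " + spec, ex);
        }
    }

    public static void main(String[] args) {
        Set<URI> links = uniqueLinks(
            "http://www.oreilly.com/",
            "http://www.oreilly.com/",
            "http://www.oreilly.com/index.html");
        System.out.println(links.size() + " distinct links");
        System.out.println(external("http://www.cafeaulait.org/javatutorial.html"));
    }
}
```

When the content actually has to be downloaded, a stored URI can be turned back into a URL with its toURL() method.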
  • 39. The URI Class: Constructing A URI, The Parts of the URI, Resolving Relative URIs, Equality & Comparison and String Representation
  • 40. Constructing a URI • URIs are built from strings. You can either pass the entire URI to the constructor in a single string, or the individual pieces: public URI(String uri) throws URISyntaxException public URI(String scheme, String schemeSpecificPart, String fragment) throws URISyntaxException public URI(String scheme, String host, String path, String fragment) throws URISyntaxException public URI(String scheme, String authority, String path, String query, String fragment) throws URISyntaxException public URI(String scheme, String userInfo, String host, int port, String path, String query, String fragment) throws URISyntaxException • Unlike the URL class, the URI class does not depend on an underlying protocol handler. As long as the URI is syntactically correct, Java does not need to understand its protocol in order to create a representative URI object. Thus, unlike the URL class, the URI class can be used for new and experimental URI schemes.
  • 41. Constructing a URI • The first constructor creates a new URI object from any convenient string. For example: URI voice = new URI("tel:+1-800-9988-9938"); URI web = new URI("http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc"); URI book = new URI("urn:isbn:1-565-92870-9"); • If the string argument does not follow URI syntax rules (for example, if the URI begins with a colon), this constructor throws a URISyntaxException. This is a checked exception, so either catch it or declare that the method where the constructor is invoked can throw it. • However, one syntax rule is not checked. In contradiction to the URI specification, the characters used in the URI are not limited to ASCII. They can include other Unicode characters, such as ø and é. Syntactically, there are very few restrictions on URIs, especially once the need to encode non-ASCII characters is removed and relative URIs are allowed. Almost any string can be interpreted as a URI.
  • 42. Constructing a URI • The second constructor that takes a scheme specific part is mostly used for nonhierarchical URIs. • The scheme is the URI’s protocol, such as http, urn, tel, and so forth. It must be composed exclusively of ASCII letters and digits and the three punctuation characters +, -, and .. It must begin with a letter. Passing null for this argument omits the scheme, thus creating a relative URI. For example: URI absolute = new URI("http", "//www.ibiblio.org" , null); URI relative = new URI(null, "/javafaq/index.shtml", "today"); • The scheme-specific part depends on the syntax of the URI scheme; it’s one thing for an http URL, another for a mailto URL, and something else again for a tel URI. Because the URI class encodes illegal characters with percent escapes, there’s effectively no syntax error you can make in this part.
  • 43. Constructing a URI • The third constructor is used for hierarchical URIs such as http and ftp URLs. The host and path together (separated by a /) form the scheme-specific part for this URI. For example: URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html", "today"); • This produces the URI http://www.ibiblio.org/javafaq/index.html#today. • If the constructor cannot form a legal hierarchical URI from the supplied pieces (for instance, if there is a scheme so the URI has to be absolute but the path doesn’t start with /), then it throws a URISyntaxException.
  • 44. Constructing a URI • The fourth constructor is basically the same as the third, with the addition of a query string. For example: URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html", "referrer=cnet&date=2014-02-23", "today"); • This produces the URI http://www.ibiblio.org/javafaq/index.html?referrer=cnet&date=2014-02-23#today. • As usual, any unescapable syntax errors cause a URISyntaxException to be thrown and null can be passed to omit any of the arguments.
  • 45. Constructing a URI • The fifth constructor is the master hierarchical URI constructor that the previous two invoke. It divides the authority into separate user info, host, and port parts, each of which has its own syntax rules. For example URI styles = new URI("ftp", "anonymous:elharo@ibiblio.org", "ftp.oreilly.com", 21, "/pub/stylesheet", null, null); • However, the resulting URI still has to follow all the usual rules for URIs; and again null can be passed for any argument to omit it from the result. • If you’re sure your URIs are legal and do not violate any of the rules, you can use the static factory URI.create() method instead. Unlike the constructors, it does not throw a URISyntaxException. For example, this invocation creates a URI for anonymous FTP access using an email address as password: URI styles = URI.create("ftp://anonymous:elharo%40ibiblio.org@ftp.oreilly.com:21/pub/stylesheet"); • If the URI does prove to be malformed, then an IllegalArgumentException is thrown by this method. This is a runtime exception, so you don’t have to explicitly declare it or catch it.
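The constructors and the URI.create() factory described above can be compared side by side. The sketch below is illustrative only: the class name UriConstruction and its helper methods are assumptions, while the argument values are taken from the slides. The checked URISyntaxException is converted to an unchecked exception inside a small helper so the calling code stays short.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demonstration class (not from the slides).
class UriConstruction {
    // Four-argument constructor: scheme, host, path, fragment.
    static URI withFragment() {
        try {
            return new URI("http", "www.ibiblio.org", "/javafaq/index.html", "today");
        } catch (URISyntaxException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Five-argument constructor: scheme, authority, path, query, fragment.
    static URI withQuery() {
        try {
            return new URI("http", "www.ibiblio.org", "/javafaq/index.html",
                           "referrer=cnet&date=2014-02-23", "today");
        } catch (URISyntaxException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Seven-argument constructor: the authority is split into user info, host, and port.
    static URI withUserInfo() {
        try {
            return new URI("ftp", "anonymous:elharo@ibiblio.org", "ftp.oreilly.com", 21,
                           "/pub/stylesheet", null, null);
        } catch (URISyntaxException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // URI.create() reports bad syntax with an unchecked IllegalArgumentException.
    static boolean isWellFormed(String text) {
        try {
            URI.create(text);
            return true;
        } catch (IllegalArgumentException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(withFragment());
        System.out.println(withQuery());
        System.out.println(withUserInfo());
        System.out.println(isWellFormed("urn:isbn:1-565-92870-9")); // true
        System.out.println(isWellFormed(":begins-with-colon"));     // false
    }
}
```

URI.create() is convenient for literals known to be valid at compile time; the constructors are the safer choice for strings that come from users or files, because the compiler forces the caller to handle the failure.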
  • 46. Parts of the URI • A URI reference has up to three parts: a scheme, a scheme-specific part, and a fragment identifier. The general format is: scheme:scheme-specific-part#fragment • If the scheme is omitted, the URI reference is relative. If the fragment identifier is omitted, the URI reference is a pure URI. The URI class has getter methods that return these three parts of each URI object. The getRawFoo() methods return the encoded forms of the parts of the URI, while the equivalent getFoo() methods first decode any percent-escaped characters and then return the decoded part: public String getScheme() public String getSchemeSpecificPart() public String getRawSchemeSpecificPart() public String getFragment() public String getRawFragment() • These methods all return null if the particular URI object does not have the relevant component. • A URI that has a scheme is an absolute URI. A URI without a scheme is relative. The isAbsolute() method returns true if the URI is absolute, false if it’s relative: public boolean isAbsolute()
  • 47. Parts of the URI • The details of the scheme-specific part vary depending on the type of the scheme. For example, in a tel URL, the scheme-specific part has the syntax of a telephone number. However, in many useful URIs, including the very common file and http URLs, the scheme-specific part has a particular hierarchical format divided into an authority, a path, and a query string. The authority is further divided into user info, host, and port. The isOpaque() method returns false if the URI is hierarchical, true if it’s not hierarchical (that is, if it’s opaque): public boolean isOpaque() • If the URI is opaque, all you can get is the scheme, scheme-specific part, and fragment identifier. However, if the URI is hierarchical, there are getter methods for all the different parts of a hierarchical URI: public String getAuthority() public String getFragment() public String getHost() public String getPath() public int getPort() public String getQuery() public String getUserInfo()
  • 48. Parts of the URI • These methods all return the decoded parts; in other words, percent escapes, such as %3C, are changed into the characters they represent, such as <. If you want the raw, encoded parts of the URI, there are five parallel getRawFoo() methods: public String getRawAuthority() public String getRawFragment() public String getRawPath() public String getRawQuery() public String getRawUserInfo() • Remember the URI class differs from the URI specification in that non-ASCII characters such as é and ü are never percent escaped in the first place, and thus will still be present in the strings returned by the getRawFoo() methods unless the strings originally used to construct the URI object were encoded.
  • 49. Parts of the URI • For various technical reasons that don’t have a lot of practical impact, Java can’t always initially detect syntax errors in the authority component. The immediate symptom of this failing is normally an inability to return the individual parts of the authority, port, host, and user info. In this event, you can call parseServerAuthority() to force the authority to be reparsed: public URI parseServerAuthority() throws URISyntaxException • The original URI does not change (URI objects are immutable), but the URI returned will have separate authority parts for user info, host, and port. If the authority cannot be parsed, a URISyntaxException is thrown.
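The difference between opaque and hierarchical URIs, and between the decoded and raw getters, is easiest to see on concrete values. The following sketch is illustrative: the class name UriParts, the summary() helper, and the example.com address with its percent escapes are assumptions; the urn:isbn URI comes from the slides.

```java
import java.net.URI;

// Hypothetical demonstration class (not from the slides).
class UriParts {
    // Opaque URIs expose only scheme and scheme-specific part (plus fragment);
    // hierarchical URIs also expose host, port, path, and query.
    static String summary(String text) {
        URI uri = URI.create(text);
        if (uri.isOpaque()) {
            return "opaque scheme=" + uri.getScheme()
                + " ssp=" + uri.getSchemeSpecificPart();
        }
        return (uri.isAbsolute() ? "absolute" : "relative")
            + " host=" + uri.getHost()
            + " port=" + uri.getPort()          // -1 when no port is given
            + " path=" + uri.getPath()          // percent escapes decoded
            + " rawPath=" + uri.getRawPath()    // percent escapes preserved
            + " query=" + uri.getQuery()
            + " rawQuery=" + uri.getRawQuery()
            + " fragment=" + uri.getFragment();
    }

    public static void main(String[] args) {
        System.out.println(summary("urn:isbn:1-565-92870-9"));
        // Assumed example with an escaped space (%20) and escaped angle brackets (%3C, %3E).
        System.out.println(summary("http://www.example.com/a%20b/list?tag=%3Cjava%3E#top"));
        System.out.println(summary("images/logo.png"));
    }
}
```

Components that are absent come back as null (or -1 for the port), so code that consumes these getters should check before using the values.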
  • 50. Resolving Relative URIs • The URI class has three methods for converting back and forth between relative and absolute URIs: public URI resolve(URI uri) public URI resolve(String uri) public URI relativize(URI uri) • The resolve() methods compare the uri argument to this URI and use it to construct a new URI object that wraps an absolute URI. For example, consider these three lines of code: URI absolute = new URI("http://www.example.com/"); URI relative = new URI("images/logo.png"); URI resolved = absolute.resolve(relative); • After they have executed, resolved contains the absolute URI http://www.example.com/images/logo.png. • If the invoking URI does not contain an absolute URI itself, the resolve() method resolves as much of the URI as it can and returns a new relative URI object as a result. For example, take these two statements: URI top = new URI("javafaq/books/"); URI resolved = top.resolve("jnp3/examples/07/index.html"); • After they have executed, resolved contains the relative URI javafaq/books/jnp3/examples/07/index.html, with no scheme or authority.
  • 51. Resolving Relative URIs • It’s also possible to reverse this procedure; that is, to go from an absolute URI to a relative one. The relativize() method creates a new URI object from the uri argument that is relative to the invoking URI. The argument is not changed. For example: URI absolute = new URI("http://www.example.com/images/logo.png"); URI top = new URI("http://www.example.com/"); URI relative = top.relativize(absolute); • The URI object relative now contains the relative URI images/logo.png.
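The resolve() and relativize() examples from the slides can be checked with a short program. This is a minimal sketch: the class name UriResolution and its two wrapper methods are assumptions, and the other.example.com address is a made-up value used to show what happens when the base is not a prefix of the target.

```java
import java.net.URI;

// Hypothetical demonstration class (not from the slides).
class UriResolution {
    // Resolves a (possibly relative) reference against a base URI.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    // Expresses target relative to base; returns target unchanged if that is impossible.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        // Absolute base: the result is absolute.
        System.out.println(resolve("http://www.example.com/", "images/logo.png"));
        // Relative base: the result stays relative.
        System.out.println(resolve("javafaq/books/", "jnp3/examples/07/index.html"));
        // Reverse direction.
        System.out.println(relativize("http://www.example.com/",
                                      "http://www.example.com/images/logo.png"));
        // Different authority (assumed address): the target comes back unchanged.
        System.out.println(relativize("http://www.example.com/",
                                      "http://other.example.com/images/logo.png"));
    }
}
```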
  • 52. Equality And Comparisons • Equal URIs must both either be hierarchical or opaque. The scheme and authority parts are compared without considering case. That is, http and HTTP are the same scheme, and www.example.com is the same authority as www.EXAMPLE.com. • The rest of the URI is case sensitive, except for hexadecimal digits used to escape illegal characters. Escapes are not decoded before comparing. http://www.example.com/A and http://www.example.com/%41 are unequal URIs. The hashCode() method is consistent with equals. • Equal URIs do have the same hash code and unequal URIs are fairly unlikely to share the same hash code. • URI implements Comparable, and thus URIs can be ordered. The ordering is based on string comparison of the individual parts, in this sequence: • If the schemes are different, the schemes are compared, without considering case. • Otherwise, if the schemes are the same, a hierarchical URI is considered to be less than an opaque URI with the same scheme. • If both URIs are opaque URIs, they’re ordered according to their scheme-specific parts. • If both the scheme and the opaque scheme-specific parts are equal, the URIs are compared by their fragments.
  • 53. Equality And Comparisons • If both URIs are hierarchical, they’re ordered according to their authority components, which are themselves ordered according to user info, host, and port, in that order. Hosts are case insensitive. • If the schemes and the authorities are equal, the path is used to distinguish them. • If the paths are also equal, the query strings are compared. • If the query strings are equal, the fragments are compared. • URIs are not comparable to any type except themselves. Comparing a URI to anything except another URI causes a ClassCastException.
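The equality and ordering rules above can be observed directly. The sketch below is illustrative only: the class name UriOrdering, its helpers, and the sample URIs (including the contrived opaque URI http:opaque, chosen so that it shares a scheme with hierarchical URIs) are assumptions.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demonstration class (not from the slides).
class UriOrdering {
    // Compares two URI strings using URI.equals().
    static boolean same(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    // Sorts URI strings by URI's natural ordering (its Comparable implementation).
    static List<String> sorted(String... texts) {
        List<URI> uris = new ArrayList<>();
        for (String text : texts) {
            uris.add(URI.create(text));
        }
        Collections.sort(uris);
        List<String> result = new ArrayList<>();
        for (URI uri : uris) {
            result.add(uri.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Scheme and host ignore case.
        System.out.println(same("http://www.example.com/A", "HTTP://www.EXAMPLE.com/A"));
        // Escapes are not decoded before comparing.
        System.out.println(same("http://www.example.com/A", "http://www.example.com/%41"));
        // Hex digits inside an escape ignore case.
        System.out.println(same("http://www.example.com/%3c", "http://www.example.com/%3C"));
        // Scheme first, then hierarchical before opaque, then host.
        System.out.println(sorted("urn:isbn:2", "http://b.example.com/",
                                  "http:opaque", "http://a.example.com/"));
    }
}
```

Because hashCode() is consistent with equals(), URIs that compare equal here also land in the same bucket of a HashMap or HashSet.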
  • 54. String Representations • Two methods convert URI objects to strings, toString() and toASCIIString(): public String toString() public String toASCIIString() • The toString() method returns an unencoded string form of the URI (i.e., characters like é are not percent escaped). Therefore, the result of calling this method is not guaranteed to be a syntactically correct URI, though it is in fact a syntactically correct IRI. This form is sometimes useful for display to human beings, but usually not for retrieval. • The toASCIIString() method returns an encoded string form of the URI. Characters like é are always percent escaped whether or not they were originally escaped. This is the string form of the URI you should use most of the time. Even if the form returned by toString() is more legible for humans, they may still copy and paste it into areas that are not expecting an illegal URI. toASCIIString() always returns a syntactically correct URI.
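A short comparison of the two string forms follows. It is a sketch under stated assumptions: the class name UriStrings and the example address containing é are made up for illustration. The non-ASCII character is written as the Unicode escape \u00e9 so the source file compiles regardless of its encoding.

```java
import java.net.URI;

// Hypothetical demonstration class (not from the slides).
class UriStrings {
    // Form intended for display: non-ASCII characters are left as they are.
    static String display(String text) {
        return URI.create(text).toString();
    }

    // Form intended for retrieval: non-ASCII characters are percent escaped as UTF-8.
    static String wire(String text) {
        return URI.create(text).toASCIIString();
    }

    public static void main(String[] args) {
        String cafe = "http://www.example.com/caf\u00e9"; // assumed example URI ending in é
        System.out.println(display(cafe));
        System.out.println(wire(cafe));
        // A URI that was already escaped is returned unchanged by both methods.
        System.out.println(wire("http://www.example.com/caf%C3%A9"));
    }
}
```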
  • 56. x-www-form-urlencoded • One of the challenges faced by the designers of the Web was dealing with the differences between operating systems. • These differences can cause problems with URLs: for example, some operating systems allow spaces in filenames; some don’t. Most operating systems won’t complain about a # sign in a filename; but in a URL, a # sign indicates that the filename has ended, and a fragment identifier follows. • Other special characters, nonalphanumeric characters, and so on, all of which may have a special meaning inside a URL or on another operating system, present similar problems. • Furthermore, Unicode was not yet ubiquitous when the Web was invented, so not all systems could handle characters such as é and 本. To solve these problems, characters used in URLs must come from a fixed subset of ASCII, specifically: • The capital letters A–Z • The lowercase letters a–z • The digits 0–9 • The punctuation characters - _ . ! ~ * ' ( and )
  • 57. x-www-form-urlencoded • The characters : / & ? @ # ; $ + = and % may also be used, but only for their specified purposes. If these characters occur as part of a path or query string, they and all other characters should be encoded. • The encoding is very simple. Any characters that are not ASCII numerals, letters, or the punctuation marks specified earlier are converted into bytes and each byte is written as a percent sign followed by two hexadecimal digits. Spaces are a special case because they’re so common. Besides being encoded as %20, they can be encoded as a plus sign (+). The plus sign itself is encoded as %2B. The / # = & and ? characters should be encoded when they are used as part of a name, and not as a separator between parts of the URL. • The URL class does not encode or decode automatically. You can construct URL objects that use illegal ASCII and non-ASCII characters and/or percent escapes. Such characters and escapes are not automatically encoded or decoded when output by methods such as getPath() and toExternalForm(). You are responsible for making sure all such characters are properly encoded in the strings used to construct a URL object. • Luckily, Java provides URLEncoder and URLDecoder classes to cipher strings in this format.
  • 58. URLEncoder • To URL encode a string, pass the string and the character set name to the URLEncoder.encode() method. For example: String encoded = URLEncoder.encode("This*string*has*asterisks", "UTF-8"); • URLEncoder.encode() returns a copy of the input string with a few changes. Any non-alphanumeric characters are converted into % sequences (except the space, underscore, hyphen, period, and asterisk characters). It also encodes all non-ASCII characters. The space is converted into a plus sign. • It also converts tildes, single quotes, exclamation points, and parentheses to percent escapes, even though they don’t absolutely have to be. However, this change isn’t forbidden by the URL specification, so web browsers deal reasonably with these excessively encoded URLs. • Although this method allows you to specify the character set, the only such character set you should ever pick is UTF-8. UTF-8 is compatible with the IRI specification, the URI class, modern web browsers, and more additional software than any other encoding you could choose.
  • 59. URLEncoder • Sample input strings and the result of passing each to URLEncoder.encode() with UTF-8: • This string has spaces → This+string+has+spaces • This*string*has*asterisks → This*string*has*asterisks • This%string%has%percent%signs → This%25string%25has%25percent%25signs • This+string+has+pluses → This%2Bstring%2Bhas%2Bpluses • This/string/has/slashes → This%2Fstring%2Fhas%2Fslashes • This"string"has"quote"marks → This%22string%22has%22quote%22marks • This:string:has:colons → This%3Astring%3Ahas%3Acolons • This~string~has~tildes → This%7Estring%7Ehas%7Etildes • This(string)has(parentheses) → This%28string%29has%28parentheses%29 • This.string.has.periods → This.string.has.periods • This=string=has=equals=signs → This%3Dstring%3Dhas%3Dequals%3Dsigns • This&string&has&ampersands → This%26string%26has%26ampersands • Thiséstringéhasénon-ASCII characters → This%C3%A9string%C3%A9has%C3%A9non-ASCII+characters
  • 60. URLDecoder • The corresponding URLDecoder class has a static decode() method that decodes strings encoded in x-www-form-urlencoded format. That is, it converts all plus signs to spaces and all percent escapes to their corresponding character: public static String decode(String s, String encoding) throws UnsupportedEncodingException • If you have any doubt about which encoding to use, pick UTF-8. It’s more likely to be correct than anything else. An IllegalArgumentException should be thrown if the string contains a percent sign that isn’t followed by two hexadecimal digits or decodes into an illegal sequence. Since URLDecoder does not touch non-escaped characters, you can pass an entire URL to it rather than splitting it into pieces first. For example: String input = "https://www.google.com/search?hl=en&as_q=Java&as_epq=I%2FO"; String output = URLDecoder.decode(input, "UTF-8"); System.out.println(output);
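The encoder and decoder can be paired in a small round-trip check. The following sketch is illustrative: the class name FormEncoding and its two wrapper methods are assumptions, and the sample strings are taken from the slides. The wrappers convert the checked UnsupportedEncodingException into an unchecked one, since UTF-8 support is required on every Java platform.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper class (not from the slides).
class FormEncoding {
    // Encodes a single name or value in x-www-form-urlencoded format using UTF-8.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 must be supported by every JVM", ex);
        }
    }

    // Reverses encode(): plus signs become spaces, percent escapes become characters.
    static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 must be supported by every JVM", ex);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "This string has spaces",
            "This*string*has*asterisks",
            "This/string/has/slashes",
            "This~string~has~tildes"
        };
        for (String sample : samples) {
            String encoded = encode(sample);
            System.out.println(sample + " -> " + encoded + " -> " + decode(encoded));
        }
        // Decoding a whole URL leaves the unescaped characters alone.
        System.out.println(decode("https://www.google.com/search?hl=en&as_q=Java&as_epq=I%2FO"));
    }
}
```

Encode each name and value separately, before joining them with = and &; encoding a complete URL in one call would also escape the separators.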
  • 61. Proxies: System Properties, The Proxy Class And The ProxySelector Class
  • 62. Proxies • Many systems access the Web and sometimes other non-HTTP parts of the Internet through proxy servers. A proxy server receives a request for a remote server from a local client. The proxy server makes the request to the remote server and forwards the result back to the local client. • Sometimes this is done for security reasons, such as to prevent remote hosts from learning private details about the local network configuration. • Other times it’s done to prevent users from accessing forbidden sites by filtering outgoing requests and limiting which sites can be viewed. • For instance, an elementary school might want to block access to http://www.playboy.com. And still other times it’s done purely for performance, to allow multiple users to retrieve the same popular documents from a local cache rather than making repeated downloads from the remote server. • Java programs based on the URL class can work through most common proxy servers and protocols. • Indeed, this is one reason you might want to choose to use the URL class rather than rolling your own HTTP or other client on top of raw sockets.
  • 63. System Properties • For basic operations, all you have to do is set a few system properties to point to the addresses of your local proxy servers. If you are using a pure HTTP proxy, set http.proxyHost to the domain name or the IP address of your proxy server and http.proxyPort to the port of the proxy server (the default is 80). There are several ways to do this, including calling System.setProperty() from within your Java code or using the -D options when launching the program. This example sets the proxy server to 192.168.254.254 and the port to 9000: % java -Dhttp.proxyHost=192.168.254.254 -Dhttp.proxyPort=9000 com.domain.Program • If the proxy requires a username and password, you’ll need to install an Authenticator. • If you want to exclude a host from being proxied and connect directly instead, set the http.nonProxyHosts system property to its hostname or IP address. To exclude multiple hosts, separate their names by vertical bars. For example, this code fragment proxies everything except java.oreilly.com and xml.oreilly.com: System.setProperty("http.proxyHost", "192.168.254.254"); System.setProperty("http.proxyPort", "9000"); System.setProperty("http.nonProxyHosts", "java.oreilly.com|xml.oreilly.com");
  • 64. System Properties • You can also use an asterisk as a wildcard to indicate that all the hosts within a particular domain or subdomain should not be proxied. For example, to proxy everything except hosts in the oreilly.com domain: % java -Dhttp.proxyHost=192.168.254.254 -Dhttp.nonProxyHosts=*.oreilly.com com.domain.Program • If you are using an FTP proxy server, set the ftp.proxyHost, ftp.proxyPort, and ftp.nonProxyHosts properties in the same way. • Java does not support any other application layer proxies, but if you’re using a transport layer SOCKS proxy for all TCP connections, you can identify it with the socksProxyHost and socksProxyPort system properties. Java does not provide an option for non-proxying with SOCKS. It’s an all-or-nothing decision.
  • 65. The Proxy Class • The Proxy class allows more fine-grained control of proxy servers from within a Java program. Specifically, it allows you to choose different proxy servers for different remote hosts. The proxies themselves are represented by instances of the java.net.Proxy class. • There are still only three kinds of proxies, HTTP, SOCKS, and direct connections (no proxy at all), represented by three constants in the Proxy.Type enum: • Proxy.Type.DIRECT • Proxy.Type.HTTP • Proxy.Type.SOCKS • Besides its type, the other important piece of information about a proxy is its address and port, given as a SocketAddress object. For example, this code fragment creates a Proxy object representing an HTTP proxy server on port 80 of proxy.example.com: SocketAddress address = new InetSocketAddress("proxy.example.com", 80); Proxy proxy = new Proxy(Proxy.Type.HTTP, address); • Although there are only three kinds of proxy objects, there can be many proxies of the same type for different proxy servers on different hosts.
  • 66. The ProxySelector Class • Each running virtual machine has a single java.net.ProxySelector object it uses to locate the proxy server for different connections. The default ProxySelector merely inspects the various system properties and the URL’s protocol to decide how to connect to different hosts. However, you can install your own subclass of ProxySelector in place of the default selector and use it to choose different proxies based on protocol, host, path, time of day, or other criteria. The key to this class is the abstract select() method: public abstract List<Proxy> select(URI uri) • Java passes this method a URI object (not a URL object) representing the host to which a connection is needed. For a connection made with the URL class, this object typically has the form http://www.example.com/ or ftp://ftp.example.com/pub/files/, for example. For a pure TCP connection made with the Socket class, this URI will have the form socket://host:port; for instance, socket://www.example.com:80. The ProxySelector object then chooses the right proxies for this type of object and returns them in a List<Proxy>. The second abstract method in this class you must implement is connectFailed(): public abstract void connectFailed(URI uri, SocketAddress address, IOException ex) • This is a callback method used to warn a program that the proxy server isn’t actually making the connection.
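A custom selector only has to implement the two abstract methods just described. The sketch below is one possible policy, not the only one: the class name LocalProxySelector, the proxy host proxy.example.com, and port 8000 are all assumptions made for illustration. It routes http connections through a single HTTP proxy, connects directly for everything else, and stops proxying any URI for which connectFailed() has been reported.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical selector (not from the slides): one HTTP proxy, with a fallback to direct connections.
class LocalProxySelector extends ProxySelector {
    private final Proxy httpProxy;
    // URIs whose proxied connection failed; thread-safe because select() may be called concurrently.
    private final Set<URI> failed = ConcurrentHashMap.newKeySet();

    LocalProxySelector(String proxyHost, int proxyPort) {
        // createUnresolved() avoids a DNS lookup while the selector is being built.
        SocketAddress address = InetSocketAddress.createUnresolved(proxyHost, proxyPort);
        this.httpProxy = new Proxy(Proxy.Type.HTTP, address);
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        if (failed.contains(uri) || !"http".equalsIgnoreCase(uri.getScheme())) {
            return Collections.singletonList(Proxy.NO_PROXY); // connect directly
        }
        return Collections.singletonList(httpProxy);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress address, IOException ex) {
        failed.add(uri); // remember the failure so later requests bypass the proxy
    }

    public static void main(String[] args) {
        // Assumed proxy location; replace with a real host and port before use.
        ProxySelector selector = new LocalProxySelector("proxy.example.com", 8000);
        ProxySelector.setDefault(selector); // install for the whole virtual machine
        System.out.println(selector.select(URI.create("http://www.example.com/")));
        System.out.println(selector.select(URI.create("ftp://ftp.example.com/pub/files/")));
    }
}
```

Because ProxySelector.setDefault() affects every connection in the virtual machine, a selector like this belongs in application startup code rather than in a library.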
  • 67. COMMUNICATING WITH SERVER-SIDE PROGRAMS THROUGH GET
  • 68. Communicating with Server-Side Programs Through GET • The URL class makes it easy for Java applets and applications to communicate with server-side programs such as CGIs, servlets, PHP pages, and others that use the GET method. • All you need to know is what combination of names and values the program expects to receive. Then you can construct a URL with a query string that provides the requisite names and values. All names and values must be x-www-form-urlencoded, as by the URLEncoder.encode() method. • There are a number of ways to determine the exact syntax for a query string that talks to a particular program. If you’ve written the server-side program yourself, you already know the name-value pairs it expects. If you’ve installed a third-party program on your own server, the documentation for that program should tell you what it expects. If you’re talking to a documented external network API, then the service usually provides fairly detailed documentation to tell you exactly what data to send for which purposes. • Many programs are designed to process form input. If this is the case, it’s straightforward to figure out what input the program expects. The method the form uses should be the value of the METHOD attribute of the FORM element. This value should be either GET, in which case you use the process described here, or POST. The part of the URL that precedes the query string is given by the value of the ACTION attribute of the FORM element. Note that this may be a relative URL, in which case you’ll need to determine the corresponding absolute URL. Finally, the names in the name-value pairs are simply the values of the NAME attributes of the INPUT elements. The values of the pairs are whatever the user types into the form.
  • 69. Communicating with Server-Side Programs Through GET • For example, consider this HTML form for the local search engine on my Cafe con Leche site. You can see that it uses the GET method. The program that processes the form is accessed via the URL http://www.google.com/search. It has four separate name-value pairs, three of which have default values: <form name="search" action="http://www.google.com/search" method="get"> <input name="q" /> <input type="hidden" value="cafeconleche.org" name="domains" /> <input type="hidden" name="sitesearch" value="cafeconleche.org" /> <input type="hidden" name="sitesearch2" value="cafeconleche.org" /> <br /> <input type="image" height="22" width="55" src="images/search_blue.gif" alt="search" border="0" name="search-image" /> </form> • The type of the INPUT field doesn’t matter. For instance, it doesn’t matter if it’s a set of checkboxes, a pop-up list, or a text field. Only the name of each INPUT field and the value you give it is significant. The submit input tells the web browser when to send the data but does not give the server any extra information. Sometimes you find hidden INPUT fields that must have particular required default values. This form has three hidden INPUT fields. There are many different form tags in HTML that produce pop-up menus, radio buttons, and more. However, although these input widgets appear different to the user, the format of data they send to the server is the same. Each form element provides a name and an encoded string value.
  • 70. Communicating with Server-Side Programs Through GET • In some cases, the program you’re talking to may not be able to handle arbitrary text strings for values of particular inputs. However, since the form is meant to be read and filled in by human beings, it should provide sufficient clues to figure out what input is expected; for instance, that a particular field is supposed to be a two-letter state abbreviation or a phone number. Sometimes the inputs may not have such obvious names. • There may not even be a form, just links to follow. In this case, you have to do some experimenting, first copying some existing values and then tweaking them to see what values are and aren’t accepted. You don’t need to do this in a Java program. You can simply edit the URL in the address or location bar of your web browser window. • The likelihood that other hackers may experiment with your own server-side programs in such a fashion is a good reason to make them extremely robust against unexpected input. • Regardless of how you determine the set of name-value pairs the server expects, communicating with it once you know them is simple. All you have to do is create a query string that includes the necessary name-value pairs, then form a URL that includes that query string. Send the query string to the server and read its response using the same methods you use to connect to a server and retrieve a static HTML page. There’s no special protocol to follow once the URL is constructed.
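The last bullet can be sketched in Java roughly as follows, reusing the four fields (q, domains, sitesearch, sitesearch2) of the search form from slide 69. This is a minimal sketch under stated assumptions, not the author's code: the class name, the method names, and the --fetch command-line flag are invented for the example, the response is assumed to be UTF-8 text, and a real search service may refuse or redirect automated requests.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical class; the names are illustrative only.
class GetFormSketch {

    // Builds the GET URL the browser would request when the form is submitted.
    static String searchUrl(String action, String searchTerm, String site) {
        String encodedTerm = URLEncoder.encode(searchTerm, StandardCharsets.UTF_8);
        String encodedSite = URLEncoder.encode(site, StandardCharsets.UTF_8);
        return action
            + "?q=" + encodedTerm
            + "&domains=" + encodedSite
            + "&sitesearch=" + encodedSite
            + "&sitesearch2=" + encodedSite;
    }

    // Reads the response exactly as one would read a static page: open the stream and read lines.
    static String fetch(String url) throws IOException {
        URL u = URI.create(url).toURL();
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(u.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        String url = searchUrl("http://www.google.com/search", "java i/o", "cafeconleche.org");
        System.out.println(url);
        // Contact the network only when explicitly asked to (invented flag).
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(fetch(url));
        }
    }
}
```

Nothing in fetch() is specific to query strings, which is the point of the slide: once the URL is built, reading the response is ordinary stream handling.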
  • 71. ACCESSING PASSWORD-PROTECTED SITES: THE AUTHENTICATOR CLASS, THE PASSWORDAUTHENTICATION CLASS AND THE JPASSWORDFIELD CLASS
  • 72. Accessing Password-Protected Sites • Many popular sites require a username and password for access. Some sites, such as the W3C member pages, implement this through HTTP authentication. Others, such as the New York Times website, implement it through cookies and HTML forms. Java’s URL class can access sites that use HTTP authentication, although you’ll of course need to tell it which username and password to use. • Supporting sites that use nonstandard, cookie-based authentication is more challenging, not least because this varies a lot from one site to another. Implementing cookie authentication is hard short of implementing a complete web browser with full HTML forms and cookie support.
  • 73. The Authenticator Class • The java.net package includes an Authenticator class you can use to provide a username and password for sites that protect themselves using HTTP authentication: public abstract class Authenticator extends Object • Since Authenticator is an abstract class, you must subclass it. Different subclasses may retrieve the information in different ways. For example, a character mode program might just ask the user to type the username and password on System.in. • To make the URL class use the subclass, install it as the default authenticator by passing it to the static Authenticator.setDefault() method: public static void setDefault(Authenticator a) • For example, if you’ve written an Authenticator subclass named DialogAuthenticator, you’d install it like this: Authenticator.setDefault(new DialogAuthenticator()); • You only need to do this once. From this point forward, when the URL class needs a username and password, it will ask the DialogAuthenticator using the static Authenticator.requestPasswordAuthentication() method: public static PasswordAuthentication requestPasswordAuthentication(InetAddress address, int port, String protocol, String prompt, String scheme) throws SecurityException
  • 74. The Authenticator Class • Untrusted applets are not allowed to ask the user for a name and password. Trusted applets can do so, but only if they possess the requestPasswordAuthentication NetPermission. Otherwise, Authenticator.requestPasswordAuthentication() throws a SecurityException. • The Authenticator subclass must override the getPasswordAuthentication() method. Inside this method, you collect the username and password from the user or some other source and return it as an instance of the java.net.PasswordAuthentication class: protected PasswordAuthentication getPasswordAuthentication() • If you don’t want to authenticate this request, return null, and Java will tell the server it doesn’t know how to authenticate the connection. If you submit an incorrect username or password, Java will call getPasswordAuthentication() again to give you another chance to provide the right data. You normally have five tries to get the username and password correct; after that, openStream() throws a ProtocolException. • Usernames and passwords are cached within the same virtual machine session. Once you set the correct password for a realm, you shouldn’t be asked for it again unless you’ve explicitly deleted the password by zeroing out the char array that contains it.
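The subclass-and-install pattern from slides 73 and 74 might look like the Java sketch below. It is an assumption-laden stand-in for the DialogAuthenticator mentioned in the slides: StaticAuthenticator is a hypothetical class that simply returns credentials handed to its constructor, where a real program would more likely prompt the user. The user name and password are placeholders.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical Authenticator subclass that answers every request with fixed credentials.
class StaticAuthenticator extends Authenticator {
    private final String userName;
    private final char[] password;

    StaticAuthenticator(String userName, char[] password) {
        this.userName = userName;
        this.password = password.clone(); // keep a private copy of the caller's array
    }

    // Called by the networking code whenever a server asks for HTTP authentication.
    // Returning null instead would mean "do not authenticate this request".
    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        return new PasswordAuthentication(userName, password);
    }

    public static void main(String[] args) {
        // Install once; later URL connections in this JVM consult this authenticator.
        Authenticator.setDefault(new StaticAuthenticator("alice", "secret".toCharArray()));

        // Simulate the request the URL class would make on a 401 response.
        PasswordAuthentication answer = Authenticator.requestPasswordAuthentication(
            null, 80, "http", "Members only", "basic");
        System.out.println("User supplied: " + answer.getUserName());
    }
}
```

Because fixed credentials never change between retries, a wrong password here would simply be resubmitted on each retry; an interactive subclass gives the user a chance to correct it.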
  • 75. The Authenticator Class • We can get more details about the request by invoking any of these methods inherited from the Authenticator superclass: • protected final InetAddress getRequestingSite() • protected final int getRequestingPort() • protected final String getRequestingProtocol() • protected final String getRequestingPrompt() • protected final String getRequestingScheme() • protected final String getRequestingHost() • protected URL getRequestingURL() • protected Authenticator.RequestorType getRequestorType() • These methods either return the information as given in the last call to requestPasswordAuthentication() or return null if that information is not available. (If the port isn’t available, getRequestingPort() returns -1.) • The getRequestingURL() method returns the complete URL for which authentication has been requested, an important detail if a site uses different names and passwords for different files. The getRequestorType() method returns one of the two named constants (i.e., Authenticator.RequestorType.PROXY or Authenticator.RequestorType.SERVER) to indicate whether the server or the proxy server is requesting the authentication.
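To show how the inherited getter methods can drive a decision, the following Java sketch picks credentials by requesting host and declines proxy requests. HostAwareAuthenticator, its map of credentials, and the example.com host are hypothetical; the sketch only illustrates where getRequestingHost() and getRequestorType() would be called.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.util.HashMap;
import java.util.Map;

// Hypothetical subclass that selects credentials based on details of the request.
class HostAwareAuthenticator extends Authenticator {
    private final Map<String, PasswordAuthentication> credentialsByHost;

    HostAwareAuthenticator(Map<String, PasswordAuthentication> credentialsByHost) {
        this.credentialsByHost = new HashMap<>(credentialsByHost);
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // In this sketch, never answer a proxy; only origin servers get credentials.
        if (getRequestorType() == Authenticator.RequestorType.PROXY) {
            return null;
        }
        String host = getRequestingHost(); // may be null if the caller did not supply it
        return host == null ? null : credentialsByHost.get(host);
    }

    public static void main(String[] args) {
        Map<String, PasswordAuthentication> known = new HashMap<>();
        known.put("www.example.com", new PasswordAuthentication("alice", "secret".toCharArray()));
        Authenticator.setDefault(new HostAwareAuthenticator(known));

        // Simulated server request; the URL argument is left null for brevity.
        PasswordAuthentication answer = Authenticator.requestPasswordAuthentication(
            "www.example.com", null, 443, "https", "Members only", "basic",
            null, Authenticator.RequestorType.SERVER);
        System.out.println(answer == null ? "declined" : answer.getUserName());
    }
}
```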
  • 76. The PasswordAuthentication Class • PasswordAuthentication is a very simple final class that supports two read-only properties: username and password. The username is a String. The password is a char array so that the password can be erased when it’s no longer needed. A String would have to wait to be garbage collected before it could be erased, and even then it might still exist somewhere in memory on the local system, possibly even on disk if the block of memory that contained it had been swapped out to virtual memory at one point. Both username and password are set in the constructor: public PasswordAuthentication(String userName, char[] password) • Each is accessed via a getter method: public String getUserName() public char[] getPassword()
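The reason for the char array can be made concrete with a short Java sketch that erases a password once it is no longer needed. The class and method names are invented for the example. The sketch wipes both the caller's array and the array returned by getPassword(), on the assumption that the object may hold its own copy of the characters.

```java
import java.net.PasswordAuthentication;
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class PasswordWipeSketch {

    // Overwrites every character so the secret no longer sits in this array.
    static void wipe(char[] secret) {
        Arrays.fill(secret, '\0');
    }

    public static void main(String[] args) {
        char[] typed = {'s', 'e', 'c', 'r', 'e', 't'}; // placeholder password
        PasswordAuthentication auth = new PasswordAuthentication("alice", typed);

        System.out.println(auth.getUserName() + " has a password of "
            + auth.getPassword().length + " characters");

        // Done with the credentials: erase the caller's array and the one held by the object.
        wipe(typed);
        wipe(auth.getPassword());
    }
}
```

A String offers no equivalent of wipe(): it is immutable, so its characters remain in memory until the garbage collector reclaims them.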
  • 77. The JPasswordField Class • One useful tool for asking users for their passwords in a more or less secure fashion is the JPasswordField component from Swing: public class JPasswordField extends JTextField • This lightweight component behaves almost exactly like a text field. However, anything the user types into it is echoed as an asterisk. This way, the password is safe from anyone looking over the user’s shoulder at what’s being typed on the screen. JPasswordField also stores the passwords as a char array so that when you’re done with the password you can overwrite it with zeros. It provides the getPassword() method to return this: public char[] getPassword() • Otherwise, you mostly use the methods it inherits from the JTextField superclass.
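A minimal Java sketch of the read-then-overwrite pattern described above follows. The class name, the matches() helper, and the placeholder password are assumptions; the dialog in main() needs a graphical display, so it is only an illustration of how the field might be shown to a user.

```java
import java.util.Arrays;
import javax.swing.JOptionPane;
import javax.swing.JPasswordField;

// Hypothetical helper class; the names are illustrative only.
class PasswordFieldSketch {

    // Compares what was typed with the expected password, then zeroes the retrieved copy.
    static boolean matches(JPasswordField field, char[] expected) {
        char[] typed = field.getPassword();
        try {
            return Arrays.equals(typed, expected);
        } finally {
            Arrays.fill(typed, '\0'); // overwrite the copy as soon as we are done with it
        }
    }

    public static void main(String[] args) {
        JPasswordField field = new JPasswordField(20);
        // Show the field inside a standard OK/Cancel dialog (requires a display).
        int choice = JOptionPane.showConfirmDialog(
            null, field, "Enter password", JOptionPane.OK_CANCEL_OPTION);
        if (choice == JOptionPane.OK_OPTION) {
            boolean ok = matches(field, "secret".toCharArray()); // placeholder password
            System.out.println(ok ? "Password accepted" : "Password rejected");
        }
    }
}
```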
  • 78. THANK YOU • YOU CAN FIND THESE SLIDES/NOTES AT https://chandanbhagat.com.np/docs/network-programming/ AND THE CODE AT https://github.com/chandan-g-bhagat/network-programming

Editor's Notes

  • #23, #24: The filename is removed from the path of u1 and the new filename mailinglists.html is appended to make u2. This constructor is particularly useful when you want to loop through a list of files that are all in the same directory. You can create a URL for the first file and then use this initial URL to create URL objects for the other files by substituting their filenames.
  • #27: For example, if the scheme is HTTP or HTTPS, the URLConnection lets you access the HTTP headers as well as the raw HTML. The URLConnection class also lets you write data to a URL as well as read from it, for instance, in order to send email to a mailto URL or post form data.