Saturday, January 3, 2015

How Hypertext Transfer Protocol (HTTP) Works

It's really important to understand the details of the HTTP protocol in order to build and debug effective cloud services, so let's talk about those details in some more depth. HTTP is always organized around a client sending a request to a server. So we have our server and our client, and we have a request that's being sent from the client to the server.


Now, one of the key parts of that request is the action that the client is asking the server to take on its behalf. Every request has a request method, and the request method is the action, or verb, that the client is asking the server to take. Every request is expressed as a request method that should be applied to a specific resource on the server. So, for example, when you access a webpage using your browser, what you're typically doing is sending a request that has the GET request method. It is a GET request, and the resource is usually some webpage like index.html, which is usually the core webpage at a website when you go to that address.

So GET is the request method, and index.html is the resource. The resource is typically specified as a path to a resource on the server, so you'll see something like GET /index.html or GET /foo/mypage, or some other resource that you would like to access. Here again, it's a request method and then a path to a resource. So let's talk about the HTTP request methods. There's a variety of them in the protocol, but there's a subset of them that we really care about for communicating between mobile devices and the cloud.
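To make the "method, then path" idea concrete, here is a minimal sketch that splits a request line into its pieces. The helper name parse_request_line is made up for illustration. On the wire the line also ends with a protocol version such as HTTP/1.1, which the discussion above doesn't cover, so the sketch simply carries it along.

```python
def parse_request_line(line):
    """Split a request line such as 'GET /index.html HTTP/1.1' into its parts."""
    # The three pieces are separated by single spaces.
    method, path, version = line.strip().split(" ")
    return method, path, version


method, path, version = parse_request_line("GET /index.html HTTP/1.1")
print(method)  # GET
print(path)    # /index.html
```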

One of the most important ones that we want to understand is GET. GET is a simple request to the server. It can be sent without data, or possibly include a little bit of data, and it asks the server to get some resource that's there and return it to us.

Another really important one is POST. POST is typically used when you want to send a lot of data to the server. For example, if you want to post an image to the server that it can then store and serve up at some later point in time, POST is probably what you will be using to do that. You probably aren't going to be sending an image through GET; you're going to be sending some small amount of data through GET. POST is your more general-purpose way to send data to the server. These are your two most important methods, but there are two more that are worth knowing about and having some familiarity with.

Another one is PUT, which asks the server to store some data that is contained within the request. And the last one you probably want to know about is DELETE, and it's fairly self-explanatory: we're asking the server to delete some information on the server. So when you're sending HTTP requests, you're always going to specify an HTTP request method plus the path of the resource that you want that method to be applied to.
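As a rough sketch of these four methods, the snippet below uses Python's standard urllib.request.Request to describe one request of each kind. The host example.com and the /images paths are placeholders invented for this example, and the requests are only constructed, never actually sent over the network.

```python
from urllib.request import Request

BASE = "http://example.com"  # hypothetical host, for illustration only

# GET: ask the server to return a resource.
get_req = Request(BASE + "/index.html", method="GET")
# POST: send data (here, stand-in bytes for an image) for the server to handle.
post_req = Request(BASE + "/images", data=b"fake image bytes", method="POST")
# PUT: ask the server to store the data carried in the request.
put_req = Request(BASE + "/images/42", data=b"fake image bytes", method="PUT")
# DELETE: ask the server to delete a resource.
delete_req = Request(BASE + "/images/42", method="DELETE")

for req in (get_req, post_req, put_req, delete_req):
    # Every request pairs a method with the path of a resource.
    print(req.get_method(), req.selector)
```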

Whenever a client sends a request to a server using HTTP, that request has to have a very specific format. I mean, just think about it: if you're that server receiving all of these requests from all over the place, you have to make absolutely sure that they all have the same format so you can interpret them properly and figure out the right thing to do with them. And that's why HTTP very strictly specifies the format and the rules for sending that request.

So we have our HTTP request. Each HTTP request has a number of parts, some mandatory and some optional, that are sent to the server. The first part that has to be sent to the server is the request line, and the request line specifies two really important things. The first is the request method. The second is the resource that we want to take action on, which is typically specified as a path.

So that is the first part of a request, and this is the key thing: we're saying, server, please go and take this action on this particular resource. But when the server gets that request, there are lots of cases where it may have multiple options for how it completes that request. For example, let's say that we have a webpage on the server, and it's the homepage of that particular web address, so you go to the equivalent of google.com. Well, sometimes the server needs some information about the client to help it complete that request, things like the language that the client would like to receive the response in.

The next piece of the request is a series of headers. We can think of these as extra information to help the server. These are things like the language that we would like the response to come back in, the character set that we would like to see the response in, or the content type of what we're sending to the server, so it knows how to interpret and process it. Or possibly something like cookies, which are small pieces of data that were sent from the server to the client in response to a past request and that the client is now providing back to the server to help it figure out where it was or who this person is, and associate them with past requests.

So the headers are the second key component of a request, and they provide extra information to help the server figure out the right way to process the request. A final piece of information that can be part of an HTTP request is the request body. The body is any data that the client is sending to the server in order to help it complete the request. Now, this is an optional part of a request and not 100% required, but let's distinguish it a little bit from the headers. The headers are meta-information, things that help the server know the right way to process the request.

The body is core data that is being sent to the server to process the request. One way to think about this is that, in most cases, if you didn't include the headers but you did include the body, the server could still process the request. It just may not give the response back to you in the format that you expected or exactly the way that you expected it. But if you didn't include the body and the server needed the body, it wouldn't be able to process the request. So the body is the data the client is sending that the server absolutely has to have in order to complete that request. The headers are extra information that the client is giving the server to help it complete that request.
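Putting the three parts together, here is a minimal sketch of what a request looks like as bytes: the request line, then the headers, then a blank line, then the body. The build_request helper, the host example.com, the /images path, and the JSON payload are all assumptions made up for this illustration; in practice an HTTP library assembles this for you.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1"]                              # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]   # headers
    head = "\r\n".join(lines) + "\r\n\r\n"                             # blank line ends the headers
    return head.encode("ascii") + body                                 # body is optional


body = b'{"caption": "sunset"}'  # hypothetical payload
raw = build_request(
    "POST",
    "/images",
    {
        "Host": "example.com",               # hypothetical host
        "Accept-Language": "en",             # language we'd like the response in
        "Content-Type": "application/json",  # how the server should interpret the body
        "Content-Length": str(len(body)),
    },
    body,
)
print(raw.decode("ascii"))
```

Printing the result shows the request line first, one header per line, an empty line, and then the body.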


Every time we send a request to the server, we're asking it to take some action on some resource that we're interested in. So, obviously, a really important part of understanding how to write clients that operate on HTTP is understanding how we talk about resources to the server and how we identify the resources that we're interested in, so that the server can locate them and then take the appropriate action. In HTTP, resources are identified with what's called a uniform resource locator, or URL, which everybody should be familiar with. A URL consists of the specification of the protocol, which in this case is http://; a host or server that's going to be talked to; potentially a port number, preceded by a colon, meaning that we're going to connect to a specific port on that host; and then a path that we would like to access, representing that resource.

So the URL tells us which server to go and talk to on the network and which port on that server we want to deliver our HTTP messages to. This is typically going to be the port of the web application container or web server, whatever the entity is that's providing the server side of the HTTP communication. It is the port on this host that it's listening on. And then finally you have the path to the resource that we are interested in on that particular host.
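A small sketch of those URL pieces, using urlsplit from Python's standard library. The URL itself, including the host example.com and port 8080, is made up for the example.

```python
from urllib.parse import urlsplit

# Hypothetical URL: protocol, host, port, and path.
parts = urlsplit("http://example.com:8080/foo/mypage")

print(parts.scheme)    # http
print(parts.hostname)  # example.com
print(parts.port)      # 8080
print(parts.path)      # /foo/mypage
```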


When the host receives this, what it's really interested in is typically the component at the end, which is the path. It already knows where it is and it already knows its port, but it needs to know the path to the resource that you're looking for. One other component of URLs that is very important in HTTP communication, from mobile clients and other clients trying to communicate with web-based applications, is the concept of query parameters. You may have seen URLs that end with something like ?A=B. What we have at the end there are called query parameters. Query parameters are additional information attached to the end of the URL and passed along to the server in order to communicate which particular aspect of that resource the client is interested in. Query parameters must follow a question mark that is placed at the end of the core part of the URL.

So right after the resource path, we have a question mark. Then we have a series of key-value pairs, where in this case we have a key, which is A, and a value, which is B. These query parameters follow at the end of the path that's being specified. In this case we only have a single key-value pair, A equals B, but we can also pass multiple parameters. For example, if we had a question mark and then A equals B, we can add an ampersand to the query parameters and then have a second key-value pair, C equals D. So, in this case, the first key, key 1, is A, and the second key, key 2, is C. B is the value for key 1, and D is the value for key 2.
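The A=B and C=D example can be sketched with Python's standard urllib.parse module, which pulls the key-value pairs back out of a URL. The host and path in the URL are placeholders; only the query parameters come from the discussion above.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL carrying the two key-value pairs A=B and C=D.
url = "http://example.com/foo/mypage?A=B&C=D"

query = urlsplit(url).query  # everything after the question mark
pairs = parse_qsl(query)     # split on '&', then on '='

print(query)  # A=B&C=D
print(pairs)  # [('A', 'B'), ('C', 'D')]
```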

So using query parameters, we can pass extra information about a specific aspect of a resource to the server, and we can pass as many of these as we want, up to a limit. We just keep separating the query parameters with ampersands so that it's clear where one key-value pair ends and the next begins. There are a couple of important concepts that we need to know about if we're going to pass query parameters to the server. One of them is that certain special characters can't appear within our query parameter keys and values. As you can imagine, on the server side, the server needs to be able to easily parse these query parameters by looking for the equals signs and the ampersands, as well as the question mark.

And if we began inserting these special symbols into our query parameters, it'd be hard for the server to extract them. So, by default, there is a set of characters that we're allowed to have in our query parameters and a set of characters that we aren't. Any characters that aren't allowed have to be encoded using a process called URL encoding. The idea is that, if we want to pass a value that has characters that aren't allowed in the query parameter spec, we can instead replace those characters with what are called their URL-encoded equivalents, so that we can still represent them and the server understands how we represent them.

When we actually look at the characters themselves, none of these symbols show up. They get replaced with other things that mean the same thing. So if we have an equals sign within the value of a particular key-value pair, it will get replaced with the URL-encoded equivalent of the equals sign. Now, you could go and look up all of the specific ways that you URL encode different characters into a query parameter, but you don't need to. Almost every programming language that you're going to work with has libraries and functions available to automatically encode your query parameters into a properly URL-encoded string.

So you don't need to go and learn them; you just need to understand the concept that, if you pass specific things like the equals sign, the ampersand, the question mark, the slash, or a space inside one of your keys or values, then you need to make sure that your URL is URL encoded. Other than that, you don't need to do anything, but you do need to recognize the cases where putting something into one of your query parameters means you need to URL encode it. Often, when we're writing clients that are going to be sending data or requests to the server, we want to go ahead and URL encode all of our URLs that are dynamically constructed from some type of data that we're taking in, because there's a possibility that that data may not strictly adhere to the query parameter specification.
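As one example of such a library function, here is a minimal sketch using urlencode from Python's standard urllib.parse module. The parameter names q and path, their values, and the example.com URL are invented for illustration; the point is only that the special characters in the values get replaced before they go into the URL, and that decoding recovers the original values.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical values containing '=', '&', '/', a space, and '?'.
params = {"q": "a=b&c", "path": "docs/my file?"}

query = urlencode(params)  # encodes each key and value, joins pairs with '&'
url = "http://example.com/search?" + query

print(query)  # q=a%3Db%26c&path=docs%2Fmy+file%3F

# Decoding on the receiving side recovers the original values.
print(parse_qs(query))  # {'q': ['a=b&c'], 'path': ['docs/my file?']}
```

Note that urlencode writes a space as a plus sign, which is the convention for form-style query strings, while the other special characters become percent-encoded sequences such as %3D for the equals sign.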
